Series comparison

 [PULL 00/26] target-arm queue
-Small pile of bug fixes for rc1. I've included my patches to get
+Hi; here's a target-arm pullreq. Mostly this is RTH's FEAT_RME
-our docs building with Sphinx 3, just for convenience...
+series; there are also a handful of bug fixes including some
 which aren't arm-specific but which it's convenient to include
 here.
+thanks
 -- PMM
-The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:
+The following changes since commit b455ce4c2f300c8ba47cba7232dd03261368a4cb:
-  Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)
+  Merge tag 'q800-for-8.1-pull-request' of https://github.com/vivier/qemu-m68k into staging (2023-06-22 10:18:32 +0200)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230623
-for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:
+for you to fetch changes up to 497fad38979c16b6412388927401e577eba43d26:
-  tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)
+  pc-bios/keymaps: Use the official xkb name for Arabic layout, not the legacy synonym (2023-06-23 11:46:02 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * target/arm: Fix Neon emulation bugs on big-endian hosts
+ * Add (experimental) support for FEAT_RME
- * target/arm: fix handling of HCR.FB
+ * host-utils: Avoid using __builtin_subcll on buggy versions of Apple Clang
- * target/arm: fix LORID_EL1 access check
+ * target/arm: Restructure has_vfp_d32 test
- * disas/capstone: Fix monitor disassembly of >32 bytes
+ * hw/arm/sbsa-ref: add ITS support in SBSA GIC
- * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+ * target/arm: Fix sve predicate store, 8 <= VQ <= 15
- * hw/arm/boot: fix SVE for EL3 direct kernel boot
+ * pc-bios/keymaps: Use the official xkb name for Arabic layout, not the legacy synonym
  * hw/display/omap_lcdc: Fix potential NULL pointer dereference
  * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
  * target/arm: Get correct MMU index for other-security-state
  * configure: Test that gio libs from pkg-config work
  * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
  * docs: Fix building with Sphinx 3
  * tests/qtest/npcm7xx_rng-test: Disable randomness tests
 ----------------------------------------------------------------
-AlexChen (2):
+Peter Maydell (2):
-      hw/display/omap_lcdc: Fix potential NULL pointer dereference
+      host-utils: Avoid using __builtin_subcll on buggy versions of Apple Clang
-      hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
+      pc-bios/keymaps: Use the official xkb name for Arabic layout, not the legacy synonym
-Peter Maydell (9):
+Richard Henderson (23):
-      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
+      target/arm: Add isar_feature_aa64_rme
-      target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
+      target/arm: Update SCR and HCR for RME
-      disas/capstone: Fix monitor disassembly of >32 bytes
+      target/arm: SCR_EL3.NS may be RES1
-      target/arm: Get correct MMU index for other-security-state
+      target/arm: Add RME cpregs
-      configure: Test that gio libs from pkg-config work
+      target/arm: Introduce ARMSecuritySpace
-      hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
+      include/exec/memattrs: Add two bits of space to MemTxAttrs
-      scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
+      target/arm: Adjust the order of Phys and Stage2 ARMMMUIdx
-      qemu-option-trace.rst.inc: Don't use option:: markup
+      target/arm: Introduce ARMMMUIdx_Phys_{Realm,Root}
-      tests/qtest/npcm7xx_rng-test: Disable randomness tests
+      target/arm: Remove __attribute__((nonnull)) from ptw.c
       target/arm: Pipe ARMSecuritySpace through ptw.c
       target/arm: NSTable is RES0 for the RME EL3 regime
       target/arm: Handle Block and Page bits for security space
       target/arm: Handle no-execute for Realm and Root regimes
       target/arm: Use get_phys_addr_with_struct in S1_ptw_translate
       target/arm: Move s1_is_el0 into S1Translate
       target/arm: Use get_phys_addr_with_struct for stage2
       target/arm: Add GPC syndrome
       target/arm: Implement GPC exceptions
       target/arm: Implement the granule protection check
       target/arm: Add cpu properties for enabling FEAT_RME
       docs/system/arm: Document FEAT_RME
       target/arm: Restructure has_vfp_d32 test
       target/arm: Fix sve predicate store, 8 <= VQ <= 15
-Philippe Mathieu-Daudé (1):
+Shashi Mallela (1):
-      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+      hw/arm/sbsa-ref: add ITS support in SBSA GIC
-Richard Henderson (11):
+ docs/system/arm/cpu-features.rst |  23 ++
-      target/arm: Introduce neon_full_reg_offset
+ docs/system/arm/emulation.rst    |   1 +
-      target/arm: Move neon_element_offset to translate.c
+ docs/system/arm/sbsa.rst         |  14 +
-      target/arm: Use neon_element_offset in neon_load/store_reg
+ include/exec/memattrs.h          |   9 +-
-      target/arm: Use neon_element_offset in vfp_reg_offset
+ include/qemu/compiler.h          |  13 +
-      target/arm: Add read/write_neon_element32
+ include/qemu/host-utils.h        |   2 +-
-      target/arm: Expand read/write_neon_element32 to all MemOp
+ target/arm/cpu.h                 | 151 ++++++++---
-      target/arm: Rename neon_load_reg32 to vfp_load_reg32
+ target/arm/internals.h           |  27 ++
-      target/arm: Add read/write_neon_element64
+ target/arm/syndrome.h            |  10 +
-      target/arm: Rename neon_load_reg64 to vfp_load_reg64
+ hw/arm/sbsa-ref.c                |  33 ++-
-      target/arm: Simplify do_long_3d and do_2scalar_long
+ target/arm/cpu.c                 |  32 ++-
-      target/arm: Improve do_prewiden_3d
+ target/arm/helper.c              | 162 ++++++++++-
+ target/arm/ptw.c                 | 570 +++++++++++++++++++++++++++++++--------
-Rémi Denis-Courmont (3):
+ target/arm/tcg/cpu64.c           |  53 ++++
-      target/arm: fix handling of HCR.FB
+ target/arm/tcg/tlb_helper.c      |  96 ++++++-
-      target/arm: fix LORID_EL1 access check
+ target/arm/tcg/translate-sve.c   |   2 +-
-      hw/arm/boot: fix SVE for EL3 direct kernel boot
+ pc-bios/keymaps/meson.build      |   2 +-
+files changed, 1034 insertions(+), 166 deletions(-)
  docs/qemu-option-trace.rst.inc     |   6 +-
  configure                          |  10 +-
  include/hw/intc/arm_gicv3_common.h |   1 -
  disas/capstone.c                   |   2 +-
  hw/arm/boot.c                      |   3 +
  hw/arm/smmuv3.c                    |   3 +-
  hw/display/exynos4210_fimd.c       |   4 +-
  hw/display/omap_lcdc.c             |  10 +-
  hw/intc/arm_gicv3_cpuif.c          |   5 +-
  target/arm/helper.c                |  24 +-
  target/arm/m_helper.c              |   3 +-
  target/arm/translate.c             | 153 +++++++++---
  target/arm/vec_helper.c            |  12 +-
  tests/qtest/npcm7xx_rng-test.c     |  14 +-
  scripts/kernel-doc                 |  18 +-
  target/arm/translate-neon.c.inc    | 472 ++++++++++++++++++++-----------------
  target/arm/translate-vfp.c.inc     | 341 +++++++++++----------------
 files changed, 588 insertions(+), 493 deletions(-)

-[PULL 01/26] target/arm: Introduce neon_full_reg_offset
+[PULL 01/26] target/arm: Add isar_feature_aa64_rme
 From: Richard Henderson <richard.henderson@linaro.org>
-This function makes it clear that we're talking about the whole
+Add the missing field for ID_AA64PFR0, and the predicate.
-register, and not the 32-bit piece at index 0.  This fixes a bug
+Disable it if EL3 is forced off by the board or command-line.
 when running on a big-endian host.
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
+Message-id: 20230620124418.805717-2-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  8 ++++++
+ target/arm/cpu.h | 6 ++++++
- target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
+ target/arm/cpu.c | 4 ++++
- target/arm/translate-vfp.c.inc  |  2 +-
+files changed, 10 insertions(+)
 files changed, 31 insertions(+), 23 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/cpu.h
-+++ b/target/arm/translate.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
+@@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64PFR0, SEL2, 36, 4)
-     unallocated_encoding(s);
+ FIELD(ID_AA64PFR0, MPAM, 40, 4)
  FIELD(ID_AA64PFR0, AMU, 44, 4)
  FIELD(ID_AA64PFR0, DIT, 48, 4)
 +FIELD(ID_AA64PFR0, RME, 52, 4)
  FIELD(ID_AA64PFR0, CSV2, 56, 4)
  FIELD(ID_AA64PFR0, CSV3, 60, 4)
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sel2(const ARMISARegisters *id)
      return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, SEL2) != 0;
  }
-+/*
++static inline bool isar_feature_aa64_rme(const ARMISARegisters *id)
 + * Return the offset of a "full" NEON Dreg.
 + */
 +static long neon_full_reg_offset(unsigned reg)
 +{
-+    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
++    return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, RME) != 0;
 +}
 +
- static inline long vfp_reg_offset(bool dp, unsigned reg)
+ static inline bool isar_feature_aa64_vh(const ARMISARegisters *id)
  {
-     if (dp) {
+     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, VH) != 0;
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
+--- a/target/arm/cpu.c
-+++ b/target/arm/translate-neon.c.inc
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
-         ofs ^= 8 - element_size;
+         cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, COPSDBG, 0);
          cpu->isar.id_aa64pfr0 = FIELD_DP64(cpu->isar.id_aa64pfr0,
                                             ID_AA64PFR0, EL3, 0);
 +
 +        /* Disable the realm management extension, which requires EL3. */
 +        cpu->isar.id_aa64pfr0 = FIELD_DP64(cpu->isar.id_aa64pfr0,
 +                                           ID_AA64PFR0, RME, 0);
      }
- #endif
--    return neon_reg_offset(reg, 0) + ofs;
+     if (!cpu->has_el2) {
 +    return neon_full_reg_offset(reg) + ofs;
  }
  static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
               * We cannot write 16 bytes at once because the
               * destination is unaligned.
               */
 -            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
 +            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                   8, 8, tmp);
 -            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
 -                             neon_reg_offset(vd, 0), 8, 8);
 +            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
 +                             neon_full_reg_offset(vd), 8, 8);
          } else {
 -            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
 +            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                   vec_size, vec_size, tmp);
          }
          tcg_gen_addi_i32(addr, addr, 1 << size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
  static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
  {
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rn_ofs = neon_reg_offset(a->vn, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rn_ofs = neon_full_reg_offset(a->vn);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
  {
      /* Handle a 2-reg-shift insn which can be vectorized. */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
  {
      /* FP operations in 2-reg-and-shift group */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      TCGv_ptr fpst;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
          return true;
      }
 -    reg_ofs = neon_reg_offset(a->vd, 0);
 +    reg_ofs = neon_full_reg_offset(a->vd);
      vec_size = a->q ? 16 : 8;
      imm = asimd_imm_const(a->imm, a->cmode, a->op);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
          return true;
      }
 -    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
 -                       neon_reg_offset(a->vn, 0),
 -                       neon_reg_offset(a->vm, 0),
 +    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
 +                       neon_full_reg_offset(a->vn),
 +                       neon_full_reg_offset(a->vm),
                         16, 16, 0, fn_gvec);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
  {
      /* Two registers and a scalar, using gvec */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rn_ofs = neon_reg_offset(a->vn, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rn_ofs = neon_full_reg_offset(a->vn);
      int rm_ofs;
      int idx;
      TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
      /* a->vm is M:Vm, which encodes both register and index */
      idx = extract32(a->vm, a->size + 2, 2);
      a->vm = extract32(a->vm, 0, a->size + 2);
 -    rm_ofs = neon_reg_offset(a->vm, 0);
 +    rm_ofs = neon_full_reg_offset(a->vm);
      fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
      tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
          return true;
      }
 -    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
 +    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                           neon_element_offset(a->vm, a->index, a->size),
                           a->q ? 16 : 8, a->q ? 16 : 8);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
  static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
  {
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
      }
      tmp = load_reg(s, a->rt);
 -    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
 +    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                           vec_size, vec_size, tmp);
      tcg_temp_free_i32(tmp);
 --
-.20.1
+.34.1
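
For readers unfamiliar with the FIELD() helpers, the following is a minimal standalone sketch of what the new side of patch 01 does with ID_AA64PFR0.RME (bits [55:52]). The names field_ex64, field_dp64, has_rme and fixup_pfr0 are hypothetical stand-ins invented for this illustration; they are not QEMU's FIELD_EX64/FIELD_DP64 macros, and the real realize function also clears the EL3 field, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Position of ID_AA64PFR0.RME, taken from FIELD(ID_AA64PFR0, RME, 52, 4). */
#define RME_SHIFT  52
#define RME_LENGTH 4

/* Hypothetical stand-in for FIELD_EX64: extract `length` bits at `shift`. */
static inline uint64_t field_ex64(uint64_t reg, unsigned shift, unsigned length)
{
    assert(length > 0 && length <= 64 - shift);
    return (reg >> shift) & (~0ULL >> (64 - length));
}

/* Hypothetical stand-in for FIELD_DP64: deposit `val` into the field. */
static inline uint64_t field_dp64(uint64_t reg, unsigned shift,
                                  unsigned length, uint64_t val)
{
    assert(length > 0 && length <= 64 - shift);
    uint64_t mask = (~0ULL >> (64 - length)) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Same shape as isar_feature_aa64_rme(): a nonzero field means "present". */
static inline bool has_rme(uint64_t id_aa64pfr0)
{
    return field_ex64(id_aa64pfr0, RME_SHIFT, RME_LENGTH) != 0;
}

/* Simplified realize-time fixup: RME requires EL3, so clear it without EL3. */
static inline uint64_t fixup_pfr0(uint64_t id_aa64pfr0, bool has_el3)
{
    if (!has_el3) {
        id_aa64pfr0 = field_dp64(id_aa64pfr0, RME_SHIFT, RME_LENGTH, 0);
    }
    return id_aa64pfr0;
}
```

Calling fixup_pfr0() with has_el3 set to false on a value that advertises RME yields a value for which has_rme() is false; with has_el3 set to true the value is returned unchanged.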

-[PULL 24/26] scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
+[PULL 02/26] target/arm: Update SCR and HCR for RME
-The kerneldoc script currently emits Sphinx markup for a macro with
+From: Richard Henderson <richard.henderson@linaro.org>
 arguments that uses the c:function directive. This is correct for
 Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
 documentation of macros with arguments and c:function is not picky
 about the syntax of what it is passed. However, in Sphinx 3 the
 c:macro directive was enhanced to support macros with arguments,
 and c:function was made more picky about what syntax it accepted.
-When kerneldoc is told that it needs to produce output for Sphinx
+Define the missing SCR and HCR bits, allow SCR_NSE and {SCR,HCR}_GPF
-or later, make it emit c:function only for functions and c:macro
+to be set, and invalidate TLBs when NSE changes.
 for macros with arguments. We assume that anything with a return
 type is a function and anything without is a macro.
-This fixes the Sphinx error:
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20230620124418.805717-3-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/cpu.h    |  5 +++--
  target/arm/helper.c | 10 ++++++++--
 files changed, 11 insertions(+), 4 deletions(-)
-/home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-If declarator-id with parameters (e.g., 'void f(int arg)'):
+index XXXXXXX..XXXXXXX 100644
-  Invalid C declaration: Expected identifier in nested name. [error at 25]
+--- a/target/arm/cpu.h
-    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
++++ b/target/arm/cpu.h
-    -------------------------^
+@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
-If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
+ #define HCR_TERR      (1ULL << 36)
-  Error in declarator or parameters
+ #define HCR_TEA       (1ULL << 37)
-  Invalid C declaration: Expecting "(" in parameters. [error at 39]
+ #define HCR_MIOCNCE   (1ULL << 38)
-    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
+-/* RES0 bit 39 */
-    ---------------------------------------^
++#define HCR_TME       (1ULL << 39)
+ #define HCR_APK       (1ULL << 40)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+ #define HCR_API       (1ULL << 41)
-Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
+ #define HCR_NV        (1ULL << 42)
-Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
+@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
-Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
+ #define HCR_NV2       (1ULL << 45)
----
+ #define HCR_FWB       (1ULL << 46)
- scripts/kernel-doc | 18 +++++++++++++++++-
+ #define HCR_FIEN      (1ULL << 47)
-file changed, 17 insertions(+), 1 deletion(-)
+-/* RES0 bit 48 */
++#define HCR_GPF       (1ULL << 48)
-diff --git a/scripts/kernel-doc b/scripts/kernel-doc
+ #define HCR_TID4      (1ULL << 49)
-index XXXXXXX..XXXXXXX 100755
+ #define HCR_TICAB     (1ULL << 50)
---- a/scripts/kernel-doc
+ #define HCR_AMVOFFEN  (1ULL << 51)
-+++ b/scripts/kernel-doc
+@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
-@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
+ #define SCR_TRNDR             (1ULL << 40)
-     output_highlight_rst($args{'purpose'});
+ #define SCR_ENTP2             (1ULL << 41)
-     $start = "\n\n**Syntax**\n\n  ``";
+ #define SCR_GPF               (1ULL << 48)
 +#define SCR_NSE               (1ULL << 62)
  #define HSTR_TTEE (1 << 16)
  #define HSTR_TJDBX (1 << 17)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
          if (cpu_isar_feature(aa64_fgt, cpu)) {
              valid_mask |= SCR_FGTEN;
          }
 +        if (cpu_isar_feature(aa64_rme, cpu)) {
 +            valid_mask |= SCR_NSE | SCR_GPF;
 +        }
      } else {
--    print ".. c:function:: ";
+         valid_mask &= ~(SCR_RW | SCR_ST);
-+        if ((split(/\./, $sphinx_version))[0] >= 3) {
+         if (cpu_isar_feature(aa32_ras, cpu)) {
-+            # Sphinx 3 and later distinguish macros and functions and
+@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
-+            # complain if you use c:function with something that's not
+     env->cp15.scr_el3 = value;
-+            # syntactically valid as a function declaration.
-+            # We assume that anything with a return type is a function
+     /*
-+            # and anything without is a macro.
+-     * If SCR_EL3.NS changes, i.e. arm_is_secure_below_el3, then
-+            if ($args{'functiontype'} ne "") {
++     * If SCR_EL3.{NS,NSE} changes, i.e. change of security state,
-+                print ".. c:function:: ";
+      * we must invalidate all TLBs below EL3.
-+            } else {
+      */
-+                print ".. c:macro:: ";
+-    if (changed & SCR_NS) {
-+            }
++    if (changed & (SCR_NS | SCR_NSE)) {
-+        } else {
+         tlb_flush_by_mmuidx(env_cpu(env), (ARMMMUIdxBit_E10_0 |
-+            # Older Sphinx don't support documenting macros that take
+                                            ARMMMUIdxBit_E20_0 |
-+            # arguments with c:macro, and don't complain about the use
+                                            ARMMMUIdxBit_E10_1 |
-+            # of c:function for this.
+@@ -XXX,XX +XXX,XX @@ static void do_hcr_write(CPUARMState *env, uint64_t value, uint64_t valid_mask)
-+            print ".. c:function:: ";
+         if (cpu_isar_feature(aa64_fwb, cpu)) {
              valid_mask |= HCR_FWB;
          }
 +        if (cpu_isar_feature(aa64_rme, cpu)) {
 +            valid_mask |= HCR_GPF;
 +        }
      }
-     if ($args{'functiontype'} ne "") {
-     $start .= $args{'functiontype'} . " " . $args{'function'} . " (";
+     if (cpu_isar_feature(any_evt, cpu)) {
 --
-.20.1
+.34.1
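
The SCR_EL3 side of patch 02 can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the actual scr_write(): ToyCPU, toy_scr_write and the tlb_flushes counter are hypothetical names for this illustration, the valid mask is reduced to the bits discussed here, and a counter stands in for tlb_flush_by_mmuidx(). The bit positions for SCR_GPF and SCR_NSE come from the hunk above; SCR_NS is bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCR_NS   (1ULL << 0)
#define SCR_GPF  (1ULL << 48)
#define SCR_NSE  (1ULL << 62)

/* Hypothetical, heavily reduced CPU state for this sketch. */
typedef struct {
    uint64_t scr_el3;
    bool has_rme;          /* stands in for cpu_isar_feature(aa64_rme, cpu) */
    unsigned tlb_flushes;  /* counts flushes instead of performing them */
} ToyCPU;

static void toy_scr_write(ToyCPU *cpu, uint64_t value)
{
    uint64_t valid_mask = SCR_NS;
    uint64_t changed;

    assert(cpu != NULL);

    /* NSE and GPF are only writable when FEAT_RME is implemented. */
    if (cpu->has_rme) {
        valid_mask |= SCR_NSE | SCR_GPF;
    }

    /* Bits outside the valid mask are RES0 and read back as zero. */
    value &= valid_mask;
    changed = cpu->scr_el3 ^ value;
    cpu->scr_el3 = value;

    /* A change of {NS,NSE} is a change of security state: flush TLBs. */
    if (changed & (SCR_NS | SCR_NSE)) {
        cpu->tlb_flushes++;
    }
}
```

In this model, writing SCR_NSE on a CPU without RME leaves scr_el3 untouched and triggers no flush, while toggling only SCR_GPF on a CPU with RME updates the register without a flush.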

-[PULL 14/26] target/arm: fix handling of HCR.FB
+[PULL 03/26] target/arm: SCR_EL3.NS may be RES1
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-HCR should be applied when NS is set, not when it is cleared.
+With RME, SEL2 must also be present to support secure state.
 The NS bit is RES1 if SEL2 is not present.
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230620124418.805717-4-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 5 ++---
+ target/arm/helper.c | 3 +++
-file changed, 2 insertions(+), 3 deletions(-)
+file changed, 3 insertions(+)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
+         }
- /*
+         if (cpu_isar_feature(aa64_sel2, cpu)) {
-  * Non-IS variants of TLB operations are upgraded to
+             valid_mask |= SCR_EEL2;
-- * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
++        } else if (cpu_isar_feature(aa64_rme, cpu)) {
-+ * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
++            /* With RME and without SEL2, NS is RES1 (R_GSWWH, I_DJJQJ). */
-  * force broadcast of these operations.
++            value |= SCR_NS;
-  */
+         }
- static bool tlb_force_broadcast(CPUARMState *env)
+         if (cpu_isar_feature(aa64_mte, cpu)) {
- {
+             valid_mask |= SCR_ATA;
 -    return (env->cp15.hcr_el2 & HCR_FB) &&
 -        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
 +    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
  }
  static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
 --
-.20.1
+.34.1
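
The RES1 behaviour added by patch 03 can be illustrated with a small standalone function. This is a sketch only: toy_scr_apply_sel2_rme is a hypothetical name, the two booleans stand in for the aa64_sel2 and aa64_rme feature tests, and the rest of scr_write() is left out. SCR_EEL2 is bit 18 and SCR_NS is bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCR_NS   (1ULL << 0)
#define SCR_EEL2 (1ULL << 18)

/*
 * Returns the adjusted SCR_EL3 value and widens *valid_mask if needed.
 * With SEL2, EEL2 becomes writable. With RME but without SEL2 there is
 * no Secure state below EL3, so NS is forced to 1.
 */
static uint64_t toy_scr_apply_sel2_rme(uint64_t value, uint64_t *valid_mask,
                                       bool has_sel2, bool has_rme)
{
    assert(valid_mask != NULL);

    if (has_sel2) {
        *valid_mask |= SCR_EEL2;
    } else if (has_rme) {
        value |= SCR_NS;
    }
    return value;
}
```

Note that the order of the tests matters: the NS override applies only when SEL2 is absent, so a CPU with both SEL2 and RME keeps a guest-written NS of 0.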

-[PULL 15/26] target/arm: fix LORID_EL1 access check
+[PULL 04/26] target/arm: Add RME cpregs
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
+This includes GPCCR, GPTBR, MFAR, the TLB flush insns PAALL, PAALLOS,
-future HCR_EL2.TLOR when S-EL2 is enabled.
+RPALOS, RPAOS, and the cache flush insns CIPAPA and CIGDPAPA.
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230620124418.805717-5-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 19 +++++--------------
+ target/arm/cpu.h    | 19 ++++++++++
-file changed, 5 insertions(+), 14 deletions(-)
+ target/arm/helper.c | 84 +++++++++++++++++++++++++++++++++++++++++++++
 files changed, 103 insertions(+)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
+         uint64_t fgt_read[2]; /* HFGRTR, HDFGRTR */
+         uint64_t fgt_write[2]; /* HFGWTR, HDFGWTR */
+         uint64_t fgt_exec[1]; /* HFGITR */
++
++        /* RME registers */
++        uint64_t gpccr_el3;
++        uint64_t gptbr_el3;
++        uint64_t mfar_el3;
+     } cp15;
+     struct {
+@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
+     uint64_t reset_cbar;
+     uint32_t reset_auxcr;
+     bool reset_hivecs;
++    uint8_t reset_l0gptsz;
+     /*
+      * Intermediate values used during property parsing.
+@@ -XXX,XX +XXX,XX @@ FIELD(MVFR1, SIMDFMAC, 28, 4)
+ FIELD(MVFR2, SIMDMISC, 0, 4)
+ FIELD(MVFR2, FPMISC, 4, 4)
++FIELD(GPCCR, PPS, 0, 3)
++FIELD(GPCCR, IRGN, 8, 2)
++FIELD(GPCCR, ORGN, 10, 2)
++FIELD(GPCCR, SH, 12, 2)
++FIELD(GPCCR, PGS, 14, 2)
++FIELD(GPCCR, GPC, 16, 1)
++FIELD(GPCCR, GPCP, 17, 1)
++FIELD(GPCCR, L0GPTSZ, 20, 4)
++
++FIELD(MFAR, FPA, 12, 40)
++FIELD(MFAR, NSE, 62, 1)
++FIELD(MFAR, NS, 63, 1)
++
+ QEMU_BUILD_BUG_ON(ARRAY_SIZE(((ARMCPU *)0)->ccsidr) <= R_V7M_CSSELR_INDEX_MASK);
+ /* If adding a feature bit which corresponds to a Linux ELF
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo sme_reginfo[] = {
        .access = PL2_RW, .accessfn = access_esm,
        .type = ARM_CP_CONST, .resetvalue = 0 },
  };
 +
 +static void tlbi_aa64_paall_write(CPUARMState *env, const ARMCPRegInfo *ri,
 +                                  uint64_t value)
 +{
 +    CPUState *cs = env_cpu(env);
 +
 +    tlb_flush(cs);
 +}
 +
 +static void gpccr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 +                        uint64_t value)
 +{
 +    /* L0GPTSZ is RO; other bits not mentioned are RES0. */
 +    uint64_t rw_mask = R_GPCCR_PPS_MASK | R_GPCCR_IRGN_MASK |
 +        R_GPCCR_ORGN_MASK | R_GPCCR_SH_MASK | R_GPCCR_PGS_MASK |
 +        R_GPCCR_GPC_MASK | R_GPCCR_GPCP_MASK;
 +
 +    env->cp15.gpccr_el3 = (value & rw_mask) | (env->cp15.gpccr_el3 & ~rw_mask);
 +}
 +
 +static void gpccr_reset(CPUARMState *env, const ARMCPRegInfo *ri)
 +{
 +    env->cp15.gpccr_el3 = FIELD_DP64(0, GPCCR, L0GPTSZ,
 +                                     env_archcpu(env)->reset_l0gptsz);
 +}
 +
 +static void tlbi_aa64_paallos_write(CPUARMState *env, const ARMCPRegInfo *ri,
 +                                    uint64_t value)
 +{
 +    CPUState *cs = env_cpu(env);
 +
 +    tlb_flush_all_cpus_synced(cs);
 +}
 +
 +static const ARMCPRegInfo rme_reginfo[] = {
 +    { .name = "GPCCR_EL3", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 3, .opc1 = 6, .crn = 2, .crm = 1, .opc2 = 6,
 +      .access = PL3_RW, .writefn = gpccr_write, .resetfn = gpccr_reset,
 +      .fieldoffset = offsetof(CPUARMState, cp15.gpccr_el3) },
 +    { .name = "GPTBR_EL3", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 3, .opc1 = 6, .crn = 2, .crm = 1, .opc2 = 4,
 +      .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.gptbr_el3) },
 +    { .name = "MFAR_EL3", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 3, .opc1 = 6, .crn = 6, .crm = 0, .opc2 = 5,
 +      .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.mfar_el3) },
 +    { .name = "TLBI_PAALL", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 4,
 +      .access = PL3_W, .type = ARM_CP_NO_RAW,
 +      .writefn = tlbi_aa64_paall_write },
 +    { .name = "TLBI_PAALLOS", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 4,
 +      .access = PL3_W, .type = ARM_CP_NO_RAW,
 +      .writefn = tlbi_aa64_paallos_write },
 +    /*
 +     * QEMU does not have a way to invalidate by physical address, thus
 +     * invalidating a range of physical addresses is accomplished by
 +     * flushing all tlb entries in the outer sharable domain,
 +     * just like PAALLOS.
 +     */
 +    { .name = "TLBI_RPALOS", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 4, .opc2 = 7,
 +      .access = PL3_W, .type = ARM_CP_NO_RAW,
 +      .writefn = tlbi_aa64_paallos_write },
 +    { .name = "TLBI_RPAOS", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 4, .opc2 = 3,
 +      .access = PL3_W, .type = ARM_CP_NO_RAW,
 +      .writefn = tlbi_aa64_paallos_write },
 +    { .name = "DC_CIPAPA", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 1, .opc1 = 6, .crn = 7, .crm = 14, .opc2 = 1,
 +      .access = PL3_W, .type = ARM_CP_NOP },
 +};
 +
 +static const ARMCPRegInfo rme_mte_reginfo[] = {
 +    { .name = "DC_CIGDPAPA", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 1, .opc1 = 6, .crn = 7, .crm = 14, .opc2 = 5,
 +      .access = PL3_W, .type = ARM_CP_NOP },
 +};
  #endif /* TARGET_AARCH64 */
  static void define_pmu_regs(ARMCPU *cpu)
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
      if (cpu_isar_feature(aa64_fgt, cpu)) {
          define_arm_cp_regs(cpu, fgt_reginfo);
      }
 +
 +    if (cpu_isar_feature(aa64_rme, cpu)) {
 +        define_arm_cp_regs(cpu, rme_reginfo);
 +        if (cpu_isar_feature(aa64_mte, cpu)) {
 +            define_arm_cp_regs(cpu, rme_mte_reginfo);
 +        }
 +    }
  #endif
- /* Shared logic between LORID and the rest of the LOR* registers.
+     if (cpu_isar_feature(any_predinv, cpu)) {
 - * Secure state has already been delt with.
 + * Secure state exclusion has already been dealt with.
   */
 -static CPAccessResult access_lor_ns(CPUARMState *env)
 +static CPAccessResult access_lor_ns(CPUARMState *env,
 +                                    const ARMCPRegInfo *ri, bool isread)
  {
      int el = arm_current_el(env);
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
      return CP_ACCESS_OK;
  }
 -static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
 -                                   bool isread)
 -{
 -    if (arm_is_secure_below_el3(env)) {
 -        /* Access ok in secure mode.  */
 -        return CP_ACCESS_OK;
 -    }
 -    return access_lor_ns(env);
 -}
 -
  static CPAccessResult access_lor_other(CPUARMState *env,
                                         const ARMCPRegInfo *ri, bool isread)
  {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
          /* Access denied in secure mode.  */
          return CP_ACCESS_TRAP;
      }
 -    return access_lor_ns(env);
 +    return access_lor_ns(env, ri, isread);
  }
  /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
        .type = ARM_CP_CONST, .resetvalue = 0 },
      { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
 -      .access = PL1_R, .accessfn = access_lorid,
 +      .access = PL1_R, .accessfn = access_lor_ns,
        .type = ARM_CP_CONST, .resetvalue = 0 },
      REGINFO_SENTINEL
  };
 --
-.20.1
+.34.1

-[PULL 06/26] target/arm: Expand read/write_neon_element32 to all MemOp
+[PULL 05/26] target/arm: Introduce ARMSecuritySpace
 From: Richard Henderson <richard.henderson@linaro.org>
-We can then use this to improve VMOV (scalar to gp) and
+Introduce both the enumeration and functions to retrieve
-VMOV (gp to scalar) so that we simply perform the memory
+the current state, and state outside of EL3.
 operation that we wanted, rather than inserting or
 extracting from a 32-bit quantity.
-These were the last uses of neon_load/store_reg, so remove them.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
+Message-id: 20230620124418.805717-6-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         | 50 +++++++++++++-----------
+ target/arm/cpu.h    | 89 ++++++++++++++++++++++++++++++++++-----------
- target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
+ target/arm/helper.c | 60 ++++++++++++++++++++++++++++++
-files changed, 37 insertions(+), 84 deletions(-)
+files changed, 127 insertions(+), 22 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/cpu.h
-+++ b/target/arm/translate.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+@@ -XXX,XX +XXX,XX @@ static inline int arm_feature(CPUARMState *env, int feature)
-  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
-  * where 0 is the least significant end of the register.
+ void arm_cpu_finalize_features(ARMCPU *cpu, Error **errp);
 -#if !defined(CONFIG_USER_ONLY)
  /*
 + * ARM v9 security states.
 + * The ordering of the enumeration corresponds to the low 2 bits
 + * of the GPI value, and (except for Root) the concat of NSE:NS.
 + */
 +
 +typedef enum ARMSecuritySpace {
 +    ARMSS_Secure     = 0,
 +    ARMSS_NonSecure  = 1,
 +    ARMSS_Root       = 2,
 +    ARMSS_Realm      = 3,
 +} ARMSecuritySpace;
 +
 +/* Return true if @space is secure, in the pre-v9 sense. */
 +static inline bool arm_space_is_secure(ARMSecuritySpace space)
 +{
 +    return space == ARMSS_Secure || space == ARMSS_Root;
 +}
 +
 +/* Return the ARMSecuritySpace for @secure, assuming !RME or EL[0-2]. */
 +static inline ARMSecuritySpace arm_secure_to_space(bool secure)
 +{
 +    return secure ? ARMSS_Secure : ARMSS_NonSecure;
 +}
 +
 +#if !defined(CONFIG_USER_ONLY)
 +/**
 + * arm_security_space_below_el3:
 + * @env: cpu context
 + *
 + * Return the security space of exception levels below EL3, following
 + * an exception return to those levels.  Unlike arm_security_space,
 + * this doesn't care about the current EL.
 + */
 +ARMSecuritySpace arm_security_space_below_el3(CPUARMState *env);
 +
 +/**
 + * arm_is_secure_below_el3:
 + * @env: cpu context
 + *
   * Return true if exception levels below EL3 are in secure state,
 - * or would be following an exception return to that level.
 - * Unlike arm_is_secure() (which is always a question about the
 - * _current_ state of the CPU) this doesn't care about the current
 - * EL or mode.
 + * or would be following an exception return to those levels.
   */
--static long neon_element_offset(int reg, int element, MemOp size)
+ static inline bool arm_is_secure_below_el3(CPUARMState *env)
-+static long neon_element_offset(int reg, int element, MemOp memop)
+ {
- {
+-    assert(!arm_feature(env, ARM_FEATURE_M));
--    int element_size = 1 << size;
+-    if (arm_feature(env, ARM_FEATURE_EL3)) {
-+    int element_size = 1 << (memop & MO_SIZE);
+-        return !(env->cp15.scr_el3 & SCR_NS);
-     int ofs = element * element_size;
+-    } else {
- #ifdef HOST_WORDS_BIGENDIAN
+-        /* If EL3 is not supported then the secure state is implementation
-     /*
+-         * defined, in which case QEMU defaults to non-secure.
-@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
+-         */
 -        return false;
 -    }
 +    ARMSecuritySpace ss = arm_security_space_below_el3(env);
 +    return ss == ARMSS_Secure;
  }
  /* Return true if the CPU is AArch64 EL3 or AArch32 Mon */
@@ -XXX,XX +XXX,XX @@ static inline bool arm_is_el3_or_mon(CPUARMState *env)
      return false;
  }
 -/* Return true if the processor is in secure state */
 +/**
 + * arm_security_space:
 + * @env: cpu context
 + *
 + * Return the current security space of the cpu.
 + */
 +ARMSecuritySpace arm_security_space(CPUARMState *env);
 +
 +/**
 + * arm_is_secure:
 + * @env: cpu context
 + *
 + * Return true if the processor is in secure state.
 + */
  static inline bool arm_is_secure(CPUARMState *env)
  {
 -    if (arm_feature(env, ARM_FEATURE_M)) {
 -        return env->v7m.secure;
 -    }
 -    if (arm_is_el3_or_mon(env)) {
 -        return true;
 -    }
 -    return arm_is_secure_below_el3(env);
 +    return arm_space_is_secure(arm_security_space(env));
  }
  /*
@@ -XXX,XX +XXX,XX @@ static inline bool arm_is_el2_enabled(CPUARMState *env)
  }
  #else
 +static inline ARMSecuritySpace arm_security_space_below_el3(CPUARMState *env)
 +{
 +    return ARMSS_NonSecure;
 +}
 +
  static inline bool arm_is_secure_below_el3(CPUARMState *env)
  {
      return false;
  }
 +static inline ARMSecuritySpace arm_security_space(CPUARMState *env)
 +{
 +    return ARMSS_NonSecure;
 +}
 +
  static inline bool arm_is_secure(CPUARMState *env)
  {
      return false;
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void aarch64_sve_change_el(CPUARMState *env, int old_el,
      }
  }
+ #endif
--static TCGv_i32 neon_load_reg(int reg, int pass)
++
--{
++#ifndef CONFIG_USER_ONLY
--    TCGv_i32 tmp = tcg_temp_new_i32();
++ARMSecuritySpace arm_security_space(CPUARMState *env)
--    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
++{
--    return tmp;
++    if (arm_feature(env, ARM_FEATURE_M)) {
--}
++        return arm_secure_to_space(env->v7m.secure);
--
++    }
--static void neon_store_reg(int reg, int pass, TCGv_i32 var)
++
--{
++    /*
--    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
++     * If EL3 is not supported then the secure state is implementation
--    tcg_temp_free_i32(var);
++     * defined, in which case QEMU defaults to non-secure.
--}
++     */
--
++    if (!arm_feature(env, ARM_FEATURE_EL3)) {
- static inline void neon_load_reg64(TCGv_i64 var, int reg)
++        return ARMSS_NonSecure;
- {
++    }
-     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
++
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
++    /* Check for AArch64 EL3 or AArch32 Mon. */
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
++    if (is_a64(env)) {
- }
++        if (extract32(env->pstate, 2, 2) == 3) {
++            if (cpu_isar_feature(aa64_rme, env_archcpu(env))) {
--static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
++                return ARMSS_Root;
-+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
++            } else {
- {
++                return ARMSS_Secure;
--    long off = neon_element_offset(reg, ele, size);
++            }
-+    long off = neon_element_offset(reg, ele, memop);
++        }
++    } else {
--    switch (size) {
++        if ((env->uncached_cpsr & CPSR_M) == ARM_CPU_MODE_MON) {
--    case MO_32:
++            return ARMSS_Secure;
-+    switch (memop) {
++        }
-+    case MO_SB:
++    }
-+        tcg_gen_ld8s_i32(dest, cpu_env, off);
++
-+        break;
++    return arm_security_space_below_el3(env);
-+    case MO_UB:
++}
-+        tcg_gen_ld8u_i32(dest, cpu_env, off);
++
-+        break;
++ARMSecuritySpace arm_security_space_below_el3(CPUARMState *env)
-+    case MO_SW:
++{
-+        tcg_gen_ld16s_i32(dest, cpu_env, off);
++    assert(!arm_feature(env, ARM_FEATURE_M));
-+        break;
++
-+    case MO_UW:
++    /*
-+        tcg_gen_ld16u_i32(dest, cpu_env, off);
++     * If EL3 is not supported then the secure state is implementation
-+        break;
++     * defined, in which case QEMU defaults to non-secure.
-+    case MO_UL:
++     */
-+    case MO_SL:
++    if (!arm_feature(env, ARM_FEATURE_EL3)) {
-         tcg_gen_ld_i32(dest, cpu_env, off);
++        return ARMSS_NonSecure;
-         break;
++    }
-     default:
++
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
++    /*
-     }
++     * Note NSE cannot be set without RME, and NSE & !NS is Reserved.
- }
++     * Ignoring NSE when !NS retains consistency without having to
++     * modify other predicates.
--static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
++     */
-+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
++    if (!(env->cp15.scr_el3 & SCR_NS)) {
- {
++        return ARMSS_Secure;
--    long off = neon_element_offset(reg, ele, size);
++    } else if (env->cp15.scr_el3 & SCR_NSE) {
-+    long off = neon_element_offset(reg, ele, memop);
++        return ARMSS_Realm;
++    } else {
--    switch (size) {
++        return ARMSS_NonSecure;
-+    switch (memop) {
++    }
-+    case MO_8:
++}
-+        tcg_gen_st8_i32(src, cpu_env, off);
++#endif /* !CONFIG_USER_ONLY */
 +        break;
 +    case MO_16:
 +        tcg_gen_st16_i32(src, cpu_env, off);
 +        break;
      case MO_32:
          tcg_gen_st_i32(src, cpu_env, off);
          break;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
  {
      /* VMOV scalar to general purpose register */
      TCGv_i32 tmp;
 -    int pass;
 -    uint32_t offset;
 -    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == 2
 +    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == MO_32
          ? !dc_isar_feature(aa32_fpsp_v2, s)
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
 -    tmp = neon_load_reg(a->vn, pass);
 -    switch (a->size) {
 -    case 0:
 -        if (offset) {
 -            tcg_gen_shri_i32(tmp, tmp, offset);
 -        }
 -        if (a->u) {
 -            gen_uxtb(tmp);
 -        } else {
 -            gen_sxtb(tmp);
 -        }
 -        break;
 -    case 1:
 -        if (a->u) {
 -            if (offset) {
 -                tcg_gen_shri_i32(tmp, tmp, 16);
 -            } else {
 -                gen_uxth(tmp);
 -            }
 -        } else {
 -            if (offset) {
 -                tcg_gen_sari_i32(tmp, tmp, 16);
 -            } else {
 -                gen_sxth(tmp);
 -            }
 -        }
 -        break;
 -    case 2:
 -        break;
 -    }
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
      store_reg(s, a->rt, tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
  static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
  {
      /* VMOV general purpose register to scalar */
 -    TCGv_i32 tmp, tmp2;
 -    int pass;
 -    uint32_t offset;
 +    TCGv_i32 tmp;
 -    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == 2
 +    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == MO_32
          ? !dc_isar_feature(aa32_fpsp_v2, s)
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
      tmp = load_reg(s, a->rt);
 -    switch (a->size) {
 -    case 0:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 1:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 2:
 -        break;
 -    }
 -    neon_store_reg(a->vn, pass, tmp);
 +    write_neon_element32(tmp, a->vn, a->index, a->size);
 +    tcg_temp_free_i32(tmp);
      return true;
  }
 --
-.20.1
+.34.1
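
For reference, the selection done by arm_security_space_below_el3() in the ARMSecuritySpace patch can be reproduced outside QEMU. The following is a minimal standalone sketch, not the QEMU code itself: the `Sketch`/`sketch_` names and the flattened boolean parameters (`have_el3`, `ns`, `nse`) are assumptions made for illustration, standing in for ARM_FEATURE_EL3 and the SCR_EL3.NS/NSE bits read from env->cp15.scr_el3.

```c
#include <assert.h>
#include <stdbool.h>

/* Same numeric values as the ARMSecuritySpace enumeration in the patch. */
typedef enum SketchSpace {
    SKETCH_Secure    = 0,
    SKETCH_NonSecure = 1,
    SKETCH_Root      = 2,
    SKETCH_Realm     = 3,
} SketchSpace;

/*
 * Hypothetical stand-in for arm_security_space_below_el3().
 * have_el3 models ARM_FEATURE_EL3; ns and nse model SCR_EL3.NS and
 * SCR_EL3.NSE.
 */
static inline SketchSpace sketch_space_below_el3(bool have_el3,
                                                 bool ns, bool nse)
{
    if (!have_el3) {
        /* Without EL3 the patch defaults to non-secure. */
        return SKETCH_NonSecure;
    }
    if (!ns) {
        /* NSE is ignored when NS is clear (NSE & !NS is Reserved). */
        return SKETCH_Secure;
    }
    return nse ? SKETCH_Realm : SKETCH_NonSecure;
}

/* Mirrors arm_space_is_secure(): secure in the pre-v9 sense. */
static inline bool sketch_space_is_secure(SketchSpace space)
{
    return space == SKETCH_Secure || space == SKETCH_Root;
}
```

With these definitions, NS=1/NSE=1 selects Realm, which sketch_space_is_secure() reports as not secure, so the pre-v9 predicate arm_is_secure_below_el3() keeps returning false for Realm, as in the patch.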

-[PULL 26/26] tests/qtest/npcm7xx_rng-test: Disable randomness tests
+[PULL 06/26] include/exec/memattrs: Add two bits of space to MemTxAttrs
-The randomness tests in the NPCM7xx RNG test fail intermittently
+From: Richard Henderson <richard.henderson@linaro.org>
 but fairly frequently. On my machine running the test in a loop:
  while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done
-will fail in less than a minute with an error like:
+We will need 2 bits to represent ARMSecuritySpace.
 ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
 assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)
-(Failures have been observed on all 4 of the randomness tests,
+Do not attempt to replace or widen secure, even though it
-not just first_byte_runs.)
+logically overlaps the new field -- there are uses within
 e.g. hw/block/pflash_cfi01.c, which don't know anything
 specific about ARM.
-It's not clear why these tests are failing like this, but intermittent
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-failures make CI and merge testing awkward, so disable running them
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
+Message-id: 20230620124418.805717-7-richard.henderson@linaro.org
-running the test suite, until we work out the cause.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  include/exec/memattrs.h | 9 ++++++++-
 file changed, 8 insertions(+), 1 deletion(-)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
 Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
 ---
  tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
 file changed, 10 insertions(+), 4 deletions(-)
 diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/npcm7xx_rng-test.c
+--- a/include/exec/memattrs.h
-+++ b/tests/qtest/npcm7xx_rng-test.c
++++ b/include/exec/memattrs.h
-@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
+@@ -XXX,XX +XXX,XX @@ typedef struct MemTxAttrs {
+      * "didn't specify" if necessary.
-     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
+      */
-     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
+     unsigned int unspecified:1;
--    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+-    /* ARM/AMBA: TrustZone Secure access
 -    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
 -    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
 -    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
 +    /*
-+     * These tests fail intermittently; only run them on explicit
++     * ARM/AMBA: TrustZone Secure access
-+     * request until we figure out why.
+      * x86: System Management Mode access
       */
      unsigned int secure:1;
 +    /*
 +     * ARM: ARMSecuritySpace.  This partially overlaps secure, but it is
 +     * easier to have both fields to assist code that does not understand
 +     * ARMv9 RME, or has no specific knowledge of ARM at all (e.g. pflash).
 +     */
-+    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
++    unsigned int space:2;
-+        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+     /* Memory access is usermode (unprivileged) */
-+        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+     unsigned int user:1;
-+        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
+     /*
 +        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
 +    }
      qtest_start("-machine npcm750-evb");
      ret = g_test_run();
 --
-.20.1
+.34.1
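
To illustrate why the memattrs patch keeps `secure` next to the new 2-bit `space` field, here is a minimal standalone sketch. The struct is a cut-down model of MemTxAttrs that keeps only the fields visible in the hunk, and `sketch_attrs_for_space()` is a hypothetical helper. Filling both fields together is an assumption about how a caller might use them, not something this patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down model of MemTxAttrs: only the fields shown in the hunk. */
typedef struct SketchMemTxAttrs {
    unsigned int unspecified:1;
    unsigned int secure:1;   /* TrustZone Secure access (pre-existing) */
    unsigned int space:2;    /* two bits: enough for four spaces */
    unsigned int user:1;
} SketchMemTxAttrs;

/*
 * Hypothetical helper: fill in both fields so that code which only
 * looks at 'secure' (e.g. a flash model) still sees a sensible value.
 * Space encoding follows the patch: 0 Secure, 1 NonSecure, 2 Root,
 * 3 Realm.
 */
static inline SketchMemTxAttrs sketch_attrs_for_space(unsigned int space)
{
    SketchMemTxAttrs attrs = { 0 };

    attrs.space = space & 3;
    attrs.secure = (attrs.space == 0 || attrs.space == 2);
    return attrs;
}
```

Device code with no knowledge of RME can keep testing `secure`, while RME-aware code can read the full `space` value.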

-[PULL 25/26] qemu-option-trace.rst.inc: Don't use option:: markup
+[PULL 07/26] target/arm: Adjust the order of Phys and Stage2 ARMMMUIdx
-Sphinx 3.2 is pickier than earlier versions about the option:: markup,
+From: Richard Henderson <richard.henderson@linaro.org>
 and complains about our usage in qemu-option-trace.rst:
-../../docs/qemu-option-trace.rst.inc:4:Malformed option description
+It will be helpful to have ARMMMUIdx_Phys_* in the same
-  '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
+relative order as ARMSecuritySpace enumerators. This requires
-  "/opt args" or "+opt args"
+the adjustment to the nstable check. While there, check for being
 in secure state rather than rely on clearing the low bit making
 no change to non-secure state.
-In this file, we're really trying to document the different parts of
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-have already introduced with an option:: markup.  So it's not right
+Message-id: 20230620124418.805717-8-richard.henderson@linaro.org
-to use option:: here anyway.  Switch to a different markup
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-(definition lists) which gives about the same formatted output.
+---
  target/arm/cpu.h | 12 ++++++------
  target/arm/ptw.c | 12 +++++-------
 files changed, 11 insertions(+), 13 deletions(-)
-(Unlike option::, this markup doesn't produce index entries; but
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 at the moment we don't do anything much with indexes anyway, and
 in any case I think it doesn't make much sense to have individual
 index entries for the sub-parts of the --trace option.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
 Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
 Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
 ---
  docs/qemu-option-trace.rst.inc | 6 +++---
 file changed, 3 insertions(+), 3 deletions(-)
 diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
 index XXXXXXX..XXXXXXX 100644
---- a/docs/qemu-option-trace.rst.inc
+--- a/target/arm/cpu.h
-+++ b/docs/qemu-option-trace.rst.inc
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
+     ARMMMUIdx_E2        = 6 | ARM_MMU_IDX_A,
- Specify tracing options.
+     ARMMMUIdx_E3        = 7 | ARM_MMU_IDX_A,
--.. option:: [enable=]PATTERN
+-    /* TLBs with 1-1 mapping to the physical address spaces. */
-+``[enable=]PATTERN``
+-    ARMMMUIdx_Phys_NS   = 8 | ARM_MMU_IDX_A,
+-    ARMMMUIdx_Phys_S    = 9 | ARM_MMU_IDX_A,
-   Immediately enable events matching *PATTERN*
+-
-   (either event name or a globbing pattern).  This option is only
+     /*
-@@ -XXX,XX +XXX,XX @@ Specify tracing options.
+      * Used for second stage of an S12 page table walk, or for descriptor
+      * loads during first stage of an S1 page table walk.  Note that both
-   Use :option:`-trace help` to print a list of names of trace points.
+      * are in use simultaneously for SecureEL2: the security state for
+      * the S2 ptw is selected by the NS bit from the S1 ptw.
--.. option:: events=FILE
+      */
-+``events=FILE``
+-    ARMMMUIdx_Stage2    = 10 | ARM_MMU_IDX_A,
+-    ARMMMUIdx_Stage2_S  = 11 | ARM_MMU_IDX_A,
-   Immediately enable events listed in *FILE*.
++    ARMMMUIdx_Stage2_S  = 8 | ARM_MMU_IDX_A,
-   The file must contain one event name (as listed in the ``trace-events-all``
++    ARMMMUIdx_Stage2    = 9 | ARM_MMU_IDX_A,
-@@ -XXX,XX +XXX,XX @@ Specify tracing options.
++
-   available if QEMU has been compiled with the ``simple``, ``log`` or
++    /* TLBs with 1-1 mapping to the physical address spaces. */
-   ``ftrace`` tracing backend.
++    ARMMMUIdx_Phys_S    = 10 | ARM_MMU_IDX_A,
++    ARMMMUIdx_Phys_NS   = 11 | ARM_MMU_IDX_A,
--.. option:: file=FILE
-+``file=FILE``
+     /*
+      * These are not allocated TLBs and are used only for AT system
-   Log output traces to *FILE*.
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
-   This option is only available if QEMU has been compiled with
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
      descaddr |= (address >> (stride * (4 - level))) & indexmask;
      descaddr &= ~7ULL;
      nstable = !regime_is_stage2(mmu_idx) && extract32(tableattrs, 4, 1);
 -    if (nstable) {
 +    if (nstable && ptw->in_secure) {
          /*
           * Stage2_S -> Stage2 or Phys_S -> Phys_NS
 -         * Assert that the non-secure idx are even, and relative order.
 +         * Assert the relative order of the secure/non-secure indexes.
           */
 -        QEMU_BUILD_BUG_ON((ARMMMUIdx_Phys_NS & 1) != 0);
 -        QEMU_BUILD_BUG_ON((ARMMMUIdx_Stage2 & 1) != 0);
 -        QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS + 1 != ARMMMUIdx_Phys_S);
 -        QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2 + 1 != ARMMMUIdx_Stage2_S);
 -        ptw->in_ptw_idx &= ~1;
 +        QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_S + 1 != ARMMMUIdx_Phys_NS);
 +        QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2_S + 1 != ARMMMUIdx_Stage2);
 +        ptw->in_ptw_idx += 1;
          ptw->in_secure = false;
      }
      if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
 --
-.20.1
+.34.1
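
The reordered indexes and the new nstable adjustment can be modelled in isolation. This is a minimal sketch under stated assumptions: the `SKETCH_` constants reuse the numeric values from the patch but omit the ARM_MMU_IDX_A flag, and `SketchWalk` is a hypothetical two-field stand-in for the S1Translate members the hunk touches.

```c
#include <assert.h>
#include <stdbool.h>

/* Index values from the patch, without the ARM_MMU_IDX_A flag. */
enum {
    SKETCH_Stage2_S = 8,
    SKETCH_Stage2   = 9,
    SKETCH_Phys_S   = 10,
    SKETCH_Phys_NS  = 11,
};

/* Build-time checks equivalent to the QEMU_BUILD_BUG_ON lines. */
_Static_assert(SKETCH_Phys_S + 1 == SKETCH_Phys_NS,
               "secure phys index must precede non-secure");
_Static_assert(SKETCH_Stage2_S + 1 == SKETCH_Stage2,
               "secure stage2 index must precede non-secure");

/* Hypothetical stand-in for the S1Translate fields used here. */
typedef struct SketchWalk {
    int in_ptw_idx;
    bool in_secure;
} SketchWalk;

/*
 * With the secure index placed first, dropping to non-secure is
 * "+ 1", and it must only happen when the walk is currently secure.
 */
static inline void sketch_apply_nstable(SketchWalk *w, bool nstable)
{
    if (nstable && w->in_secure) {
        w->in_ptw_idx += 1;
        w->in_secure = false;
    }
}
```

The explicit `in_secure` test matters because the old code relied on `&= ~1` being a no-op for the non-secure indexes, whereas `+= 1` applied to an already non-secure index would select the wrong entry.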

-[PULL 23/26] hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
+[PULL 08/26] target/arm: Introduce ARMMMUIdx_Phys_{Realm,Root}
-In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
+From: Richard Henderson <richard.henderson@linaro.org>
 into the GICv3CPUState struct's maintenance_irq field.  This will
 only work if the board happens to have already wired up the CPU
 maintenance IRQ before the GIC was realized.  Unfortunately this is
 not the case for the 'virt' board, and so the value that gets copied
 is NULL (since a qemu_irq is really a pointer to an IRQState struct
 under the hood).  The effect is that the CPU interface code never
 actually raises the maintenance interrupt line.
-Instead, since the GICv3CPUState has a pointer to the CPUState, make
+With FEAT_RME, there are four physical address spaces.
-the dereference at the point where we want to raise the interrupt, to
+For now, just define the symbols, and mention them in
-avoid an implicit requirement on board code to wire things up in a
+the same spots as the other Phys indexes in ptw.c.
 particular order.
-Reported-by: Jose Martins <josemartins90@gmail.com>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20230620124418.805717-9-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
-Reviewed-by: Luc Michel <luc@lmichel.fr>
 ---
- include/hw/intc/arm_gicv3_common.h | 1 -
+ target/arm/cpu.h | 23 +++++++++++++++++++++--
- hw/intc/arm_gicv3_cpuif.c          | 5 ++---
+ target/arm/ptw.c | 10 ++++++++--
-files changed, 2 insertions(+), 4 deletions(-)
+files changed, 29 insertions(+), 4 deletions(-)
-diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/intc/arm_gicv3_common.h
+--- a/target/arm/cpu.h
-+++ b/include/hw/intc/arm_gicv3_common.h
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
+@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
-     qemu_irq parent_fiq;
+     ARMMMUIdx_Stage2    = 9 | ARM_MMU_IDX_A,
-     qemu_irq parent_virq;
-     qemu_irq parent_vfiq;
+     /* TLBs with 1-1 mapping to the physical address spaces. */
--    qemu_irq maintenance_irq;
+-    ARMMMUIdx_Phys_S    = 10 | ARM_MMU_IDX_A,
+-    ARMMMUIdx_Phys_NS   = 11 | ARM_MMU_IDX_A,
-     /* Redistributor */
++    ARMMMUIdx_Phys_S     = 10 | ARM_MMU_IDX_A,
-     uint32_t level;                  /* Current IRQ level */
++    ARMMMUIdx_Phys_NS    = 11 | ARM_MMU_IDX_A,
-diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
++    ARMMMUIdx_Phys_Root  = 12 | ARM_MMU_IDX_A,
 +    ARMMMUIdx_Phys_Realm = 13 | ARM_MMU_IDX_A,
      /*
       * These are not allocated TLBs and are used only for AT system
@@ -XXX,XX +XXX,XX @@ typedef enum ARMASIdx {
      ARMASIdx_TagS = 3,
  } ARMASIdx;
 +static inline ARMMMUIdx arm_space_to_phys(ARMSecuritySpace space)
 +{
 +    /* Assert the relative order of the physical mmu indexes. */
 +    QEMU_BUILD_BUG_ON(ARMSS_Secure != 0);
 +    QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS != ARMMMUIdx_Phys_S + ARMSS_NonSecure);
 +    QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_Root != ARMMMUIdx_Phys_S + ARMSS_Root);
 +    QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_Realm != ARMMMUIdx_Phys_S + ARMSS_Realm);
 +
 +    return ARMMMUIdx_Phys_S + space;
 +}
 +
 +static inline ARMSecuritySpace arm_phys_to_space(ARMMMUIdx idx)
 +{
 +    assert(idx >= ARMMMUIdx_Phys_S && idx <= ARMMMUIdx_Phys_Realm);
 +    return idx - ARMMMUIdx_Phys_S;
 +}
 +
  static inline bool arm_v7m_csselr_razwi(ARMCPU *cpu)
  {
      /* If all the CLIDR.Ctypem bits are 0 there are no caches, and
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/arm_gicv3_cpuif.c
+--- a/target/arm/ptw.c
-+++ b/hw/intc/arm_gicv3_cpuif.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
+@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
-     int irqlevel = 0;
+     case ARMMMUIdx_E3:
-     int fiqlevel = 0;
+         break;
-     int maintlevel = 0;
-+    ARMCPU *cpu = ARM_CPU(cs->cpu);
+-    case ARMMMUIdx_Phys_NS:
+     case ARMMMUIdx_Phys_S:
-     idx = hppvi_index(cs);
++    case ARMMMUIdx_Phys_NS:
-     trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
++    case ARMMMUIdx_Phys_Root:
-@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
++    case ARMMMUIdx_Phys_Realm:
+         /* No translation for physical address spaces. */
-     qemu_set_irq(cs->parent_vfiq, fiqlevel);
+         return true;
-     qemu_set_irq(cs->parent_virq, irqlevel);
--    qemu_set_irq(cs->maintenance_irq, maintlevel);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
-+    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
+     switch (mmu_idx) {
- }
+     case ARMMMUIdx_Stage2:
+     case ARMMMUIdx_Stage2_S:
- static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
+-    case ARMMMUIdx_Phys_NS:
-@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
+     case ARMMMUIdx_Phys_S:
-             && cpu->gic_num_lrs) {
++    case ARMMMUIdx_Phys_NS:
-             int j;
++    case ARMMMUIdx_Phys_Root:
++    case ARMMMUIdx_Phys_Realm:
--            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
+         break;
--
-             cs->num_list_regs = cpu->gic_num_lrs;
+     default:
-             cs->vpribits = cpu->gic_vpribits;
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
-             cs->vprebits = cpu->gic_vprebits;
+     switch (mmu_idx) {
      case ARMMMUIdx_Phys_S:
      case ARMMMUIdx_Phys_NS:
 +    case ARMMMUIdx_Phys_Root:
 +    case ARMMMUIdx_Phys_Realm:
          /* Checking Phys early avoids special casing later vs regime_el. */
          return get_phys_addr_disabled(env, address, access_type, mmu_idx,
                                        is_secure, result, fi);
 --
-.20.1
+.34.1
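
The mapping added by arm_space_to_phys() and arm_phys_to_space() depends only on the two enumerations sharing an order. A minimal standalone sketch of that round trip follows. The `SKETCH_` names are illustrative assumptions, and the index values again omit ARM_MMU_IDX_A, which the real ARMMMUIdx values carry.

```c
#include <assert.h>

/* Same ordering as ARMSecuritySpace in the series. */
typedef enum SketchSpace {
    SKETCH_SS_Secure    = 0,
    SKETCH_SS_NonSecure = 1,
    SKETCH_SS_Root      = 2,
    SKETCH_SS_Realm     = 3,
} SketchSpace;

/* Physical-space MMU indexes in the order introduced by the patch. */
enum {
    SKETCH_Phys_S     = 10,
    SKETCH_Phys_NS    = 11,
    SKETCH_Phys_Root  = 12,
    SKETCH_Phys_Realm = 13,
};

/* Build-time checks mirroring the QEMU_BUILD_BUG_ON lines. */
_Static_assert(SKETCH_SS_Secure == 0, "Secure must be the base");
_Static_assert(SKETCH_Phys_NS == SKETCH_Phys_S + SKETCH_SS_NonSecure, "order");
_Static_assert(SKETCH_Phys_Root == SKETCH_Phys_S + SKETCH_SS_Root, "order");
_Static_assert(SKETCH_Phys_Realm == SKETCH_Phys_S + SKETCH_SS_Realm, "order");

/* Space -> physical index: a plain offset from the Secure index. */
static inline int sketch_space_to_phys(SketchSpace space)
{
    return SKETCH_Phys_S + (int)space;
}

/* Physical index -> space: the inverse, valid only for Phys indexes. */
static inline SketchSpace sketch_phys_to_space(int idx)
{
    assert(idx >= SKETCH_Phys_S && idx <= SKETCH_Phys_Realm);
    return (SketchSpace)(idx - SKETCH_Phys_S);
}
```

Because the conversion is a single addition or subtraction, any reordering of either enumeration would break it silently, which is what the build-time assertions guard against.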

-[PULL 22/26] configure: Test that gio libs from pkg-config work
+[PULL 09/26] target/arm: Remove __attribute__((nonnull)) from ptw.c
-On some hosts (eg Ubuntu Bionic) pkg-config returns a set of
+From: Richard Henderson <richard.henderson@linaro.org>
 libraries for gio-2.0 which don't actually work when compiling
 statically. (Specifically, the returned library string includes
 -lmount, but not -lblkid which -lmount depends upon, so linking
 fails due to missing symbols.)
-Check that the libraries work, and don't enable gio if they don't,
+This was added in 7e98e21c098 as part of a reorg in which
-in the same way we do for gnutls.
+one of the arguments had been legally NULL, and this caught
 actual instances.  Now that the reorg is complete, this
 serves little purpose.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230620124418.805717-10-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
 ---
- configure | 10 +++++++++-
+ target/arm/ptw.c | 6 ++----
-file changed, 9 insertions(+), 1 deletion(-)
+file changed, 2 insertions(+), 4 deletions(-)
-diff --git a/configure b/configure
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
-index XXXXXXX..XXXXXXX 100755
+index XXXXXXX..XXXXXXX 100644
---- a/configure
+--- a/target/arm/ptw.c
-+++ b/configure
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
+@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
- fi
+ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
+                                uint64_t address,
- if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
+                                MMUAccessType access_type, bool s1_is_el0,
--    gio=yes
+-                               GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
-     gio_cflags=$($pkg_config --cflags gio-2.0)
+-    __attribute__((nonnull));
-     gio_libs=$($pkg_config --libs gio-2.0)
++                               GetPhysAddrResult *result, ARMMMUFaultInfo *fi);
-     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
-     if [ ! -x "$gdbus_codegen" ]; then
+ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
-         gdbus_codegen=
+                                       target_ulong address,
-     fi
+                                       MMUAccessType access_type,
-+    # Check that the libraries actually work -- Ubuntu 18.04 ships
+                                       GetPhysAddrResult *result,
-+    # with pkg-config --static --libs data for gio-2.0 that is missing
+-                                      ARMMMUFaultInfo *fi)
-+    # -lblkid and will give a link error.
+-    __attribute__((nonnull));
-+    write_c_skeleton
++                                      ARMMMUFaultInfo *fi);
-+    if compile_prog "" "gio_libs" ; then
-+        gio=yes
+ /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
-+    else
+ static const uint8_t pamax_map[] = {
 +        gio=no
 +    fi
  else
      gio=no
  fi
 --
-.20.1
+.34.1
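For readers unfamiliar with the attribute that the patch above removes, the following is a minimal standalone sketch of what `__attribute__((nonnull))` does under GCC and Clang. It is not QEMU code; the names `add_through_pointers` and `demo_nonnull` are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from ptw.c.  Written without an
 * argument list, nonnull applies to every pointer parameter: GCC and
 * Clang can warn (-Wnonnull) when a call site passes a literal NULL,
 * and the optimizer may assume the pointers are never NULL.
 */
static int add_through_pointers(const int *a, const int *b)
    __attribute__((nonnull));

static int add_through_pointers(const int *a, const int *b)
{
    /* No NULL checks here: the attribute makes them the caller's job. */
    return *a + *b;
}

/* Valid use: both pointers refer to real objects. */
static int demo_nonnull(void)
{
    int x = 2;
    int y = 40;

    return add_through_pointers(&x, &y);
}
```

The attribute is mainly useful while an interface is in flux, which matches the commit message: once no caller can legitimately pass NULL, the annotation adds little.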

-[PULL 11/26] target/arm: Improve do_prewiden_3d
+[PULL 10/26] target/arm: Pipe ARMSecuritySpace through ptw.c
 From: Richard Henderson <richard.henderson@linaro.org>
-We can use proper widening loads to extend 32-bit inputs,
+Add input and output space members to S1Translate.  Set and adjust
-and skip the "widenfn" step.
+them in S1_ptw_translate, and the various points at which we drop
 secure state.  Initialize the space in get_phys_addr; for now leave
 get_phys_addr_with_secure considering only secure vs non-secure spaces.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
+Message-id: 20230620124418.805717-11-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  6 +++
+ target/arm/ptw.c | 86 +++++++++++++++++++++++++++++++++++++++---------
- target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
+file changed, 71 insertions(+), 15 deletions(-)
 files changed, 43 insertions(+), 29 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/ptw.c
-+++ b/target/arm/translate.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+@@ -XXX,XX +XXX,XX @@
-     long off = neon_element_offset(reg, ele, memop);
+ typedef struct S1Translate {
+     ARMMMUIdx in_mmu_idx;
-     switch (memop) {
+     ARMMMUIdx in_ptw_idx;
-+    case MO_SL:
++    ARMSecuritySpace in_space;
-+        tcg_gen_ld32s_i64(dest, cpu_env, off);
+     bool in_secure;
-+        break;
+     bool in_debug;
-+    case MO_UL:
+     bool out_secure;
-+        tcg_gen_ld32u_i64(dest, cpu_env, off);
+     bool out_rw;
-+        break;
+     bool out_be;
-     case MO_Q:
++    ARMSecuritySpace out_space;
-         tcg_gen_ld_i64(dest, cpu_env, off);
+     hwaddr out_virt;
-         break;
+     hwaddr out_phys;
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+     void *out_host;
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ static bool S2_attrs_are_device(uint64_t hcr, uint8_t attrs)
---- a/target/arm/translate-neon.c.inc
+ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
-+++ b/target/arm/translate-neon.c.inc
+                              hwaddr addr, ARMMMUFaultInfo *fi)
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
  static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
                             NeonGenWidenFn *widenfn,
                             NeonGenTwo64OpFn *opfn,
 -                           bool src1_wide)
 +                           int src1_mop, int src2_mop)
  {
-     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
++    ARMSecuritySpace space = ptw->in_space;
-     TCGv_i64 rn0_64, rn1_64, rm_64;
+     bool is_secure = ptw->in_secure;
--    TCGv_i32 rm;
+     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
+     ARMMMUIdx s2_mmu_idx = ptw->in_ptw_idx;
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
-         return false;
+                 .in_mmu_idx = s2_mmu_idx,
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+                 .in_ptw_idx = ptw_idx_for_stage_2(env, s2_mmu_idx),
-         return false;
+                 .in_secure = s2_mmu_idx == ARMMMUIdx_Stage2_S,
-     }
++                .in_space = (s2_mmu_idx == ARMMMUIdx_Stage2_S ? ARMSS_Secure
++                             : space == ARMSS_Realm ? ARMSS_Realm
--    if (!widenfn || !opfn) {
++                             : ARMSS_NonSecure),
-+    if (!opfn) {
+                 .in_debug = true,
-         /* size == 3 case, which is an entirely different insn group */
+             };
-         return false;
+             GetPhysAddrResult s2 = { };
-     }
+@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
+             ptw->out_phys = s2.f.phys_addr;
--    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+             pte_attrs = s2.cacheattrs.attrs;
-+    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
+             ptw->out_secure = s2.f.attrs.secure;
-         return false;
++            ptw->out_space = s2.f.attrs.space;
-     }
+         } else {
+             /* Regime is physical. */
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+             ptw->out_phys = addr;
-     rn1_64 = tcg_temp_new_i64();
+             pte_attrs = 0;
-     rm_64 = tcg_temp_new_i64();
+             ptw->out_secure = s2_mmu_idx == ARMMMUIdx_Phys_S;
++            ptw->out_space = (s2_mmu_idx == ARMMMUIdx_Phys_S ? ARMSS_Secure
--    if (src1_wide) {
++                              : space == ARMSS_Realm ? ARMSS_Realm
--        read_neon_element64(rn0_64, a->vn, 0, MO_64);
++                              : ARMSS_NonSecure);
-+    if (src1_mop >= 0) {
+         }
-+        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
+         ptw->out_host = NULL;
          ptw->out_rw = false;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
          ptw->out_rw = full->prot & PAGE_WRITE;
          pte_attrs = full->pte_attrs;
          ptw->out_secure = full->attrs.secure;
 +        ptw->out_space = full->attrs.space;
  #else
          g_assert_not_reached();
  #endif
@@ -XXX,XX +XXX,XX @@ static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw,
          }
      } else {
-         TCGv_i32 tmp = tcg_temp_new_i32();
+         /* Page tables are in MMIO. */
-         read_neon_element32(tmp, a->vn, 0, MO_32);
+-        MemTxAttrs attrs = { .secure = ptw->out_secure };
-         widenfn(rn0_64, tmp);
++        MemTxAttrs attrs = {
-         tcg_temp_free_i32(tmp);
++            .secure = ptw->out_secure,
-     }
++            .space = ptw->out_space,
--    rm = tcg_temp_new_i32();
++        };
--    read_neon_element32(rm, a->vm, 0, MO_32);
+         AddressSpace *as = arm_addressspace(cs, attrs);
-+    if (src2_mop >= 0) {
+         MemTxResult result = MEMTX_OK;
-+        read_neon_element64(rm_64, a->vm, 0, src2_mop);
-+    } else {
+@@ -XXX,XX +XXX,XX @@ static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw,
-+        TCGv_i32 tmp = tcg_temp_new_i32();
+ #endif
-+        read_neon_element32(tmp, a->vm, 0, MO_32);
+     } else {
-+        widenfn(rm_64, tmp);
+         /* Page tables are in MMIO. */
-+        tcg_temp_free_i32(tmp);
+-        MemTxAttrs attrs = { .secure = ptw->out_secure };
-+    }
++        MemTxAttrs attrs = {
++            .secure = ptw->out_secure,
--    widenfn(rm_64, rm);
++            .space = ptw->out_space,
--    tcg_temp_free_i32(rm);
++        };
-     opfn(rn0_64, rn0_64, rm_64);
+         AddressSpace *as = arm_addressspace(cs, attrs);
          MemTxResult result = MEMTX_OK;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
           * regime, because the attribute will already be non-secure.
           */
          result->f.attrs.secure = false;
 +        result->f.attrs.space = ARMSS_NonSecure;
      }
      result->f.phys_addr = phys_addr;
      return false;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
           * regime, because the attribute will already be non-secure.
           */
          result->f.attrs.secure = false;
 +        result->f.attrs.space = ARMSS_NonSecure;
      }
      if (regime_is_stage2(mmu_idx)) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
               */
              if (sattrs.ns) {
                  result->f.attrs.secure = false;
 +                result->f.attrs.space = ARMSS_NonSecure;
              } else if (!secure) {
                  /*
                   * NS access to S memory must fault.
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
      bool is_secure = ptw->in_secure;
      bool ret, ipa_secure;
      ARMCacheAttrs cacheattrs1;
 +    ARMSecuritySpace ipa_space;
      bool is_el0;
      uint64_t hcr;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
      ipa = result->f.phys_addr;
      ipa_secure = result->f.attrs.secure;
 +    ipa_space = result->f.attrs.space;
      is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
      ptw->in_mmu_idx = ipa_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
      ptw->in_secure = ipa_secure;
 +    ptw->in_space = ipa_space;
      ptw->in_ptw_idx = ptw_idx_for_stage_2(env, ptw->in_mmu_idx);
      /*
-      * Load second pass inputs before storing the first pass result, to
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
-      * avoid incorrect results if a narrow input overlaps with the result.
+     ARMMMUIdx s1_mmu_idx;
      /*
 -     * The page table entries may downgrade secure to non-secure, but
 -     * cannot upgrade an non-secure translation regime's attributes
 -     * to secure.
 +     * The page table entries may downgrade Secure to NonSecure, but
 +     * cannot upgrade a NonSecure translation regime's attributes
 +     * to Secure or Realm.
       */
--    if (src1_wide) {
+     result->f.attrs.secure = is_secure;
--        read_neon_element64(rn1_64, a->vn, 1, MO_64);
++    result->f.attrs.space = ptw->in_space;
-+    if (src1_mop >= 0) {
-+        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
+     switch (mmu_idx) {
-     } else {
+     case ARMMMUIdx_Phys_S:
-         TCGv_i32 tmp = tcg_temp_new_i32();
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
-         read_neon_element32(tmp, a->vn, 1, MO_32);
-         widenfn(rn1_64, tmp);
+     default:
-         tcg_temp_free_i32(tmp);
+         /* Single stage uses physical for ptw. */
-     }
+-        ptw->in_ptw_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
--    rm = tcg_temp_new_i32();
++        ptw->in_ptw_idx = arm_space_to_phys(ptw->in_space);
--    read_neon_element32(rm, a->vm, 1, MO_32);
+         break;
-+    if (src2_mop >= 0) {
+     }
-+        read_neon_element64(rm_64, a->vm, 1, src2_mop);
-+    } else {
+@@ -XXX,XX +XXX,XX @@ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
-+        TCGv_i32 tmp = tcg_temp_new_i32();
+     S1Translate ptw = {
-+        read_neon_element32(tmp, a->vm, 1, MO_32);
+         .in_mmu_idx = mmu_idx,
-+        widenfn(rm_64, tmp);
+         .in_secure = is_secure,
-+        tcg_temp_free_i32(tmp);
++        .in_space = arm_secure_to_space(is_secure),
-+    }
+     };
+     return get_phys_addr_with_struct(env, &ptw, address, access_type,
-     write_neon_element64(rn0_64, a->vd, 0, MO_64);
+                                      result, fi);
+@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
--    widenfn(rm_64, rm);
+                    MMUAccessType access_type, ARMMMUIdx mmu_idx,
--    tcg_temp_free_i32(rm);
+                    GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
-     opfn(rn1_64, rn1_64, rm_64);
+ {
-     write_neon_element64(rn1_64, a->vd, 1, MO_64);
+-    bool is_secure;
++    S1Translate ptw = {
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
++        .in_mmu_idx = mmu_idx,
-     return true;
++    };
 +    ARMSecuritySpace ss;
      switch (mmu_idx) {
      case ARMMMUIdx_E10_0:
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
      case ARMMMUIdx_Stage1_E1:
      case ARMMMUIdx_Stage1_E1_PAN:
      case ARMMMUIdx_E2:
 -        is_secure = arm_is_secure_below_el3(env);
 +        ss = arm_security_space_below_el3(env);
          break;
      case ARMMMUIdx_Stage2:
 +        /*
 +         * For Secure EL2, we need this index to be NonSecure;
 +         * otherwise this will already be NonSecure or Realm.
 +         */
 +        ss = arm_security_space_below_el3(env);
 +        if (ss == ARMSS_Secure) {
 +            ss = ARMSS_NonSecure;
 +        }
 +        break;
      case ARMMMUIdx_Phys_NS:
      case ARMMMUIdx_MPrivNegPri:
      case ARMMMUIdx_MUserNegPri:
      case ARMMMUIdx_MPriv:
      case ARMMMUIdx_MUser:
 -        is_secure = false;
 +        ss = ARMSS_NonSecure;
          break;
 -    case ARMMMUIdx_E3:
      case ARMMMUIdx_Stage2_S:
      case ARMMMUIdx_Phys_S:
      case ARMMMUIdx_MSPrivNegPri:
      case ARMMMUIdx_MSUserNegPri:
      case ARMMMUIdx_MSPriv:
      case ARMMMUIdx_MSUser:
 -        is_secure = true;
 +        ss = ARMSS_Secure;
 +        break;
 +    case ARMMMUIdx_E3:
 +        if (arm_feature(env, ARM_FEATURE_AARCH64) &&
 +            cpu_isar_feature(aa64_rme, env_archcpu(env))) {
 +            ss = ARMSS_Root;
 +        } else {
 +            ss = ARMSS_Secure;
 +        }
 +        break;
 +    case ARMMMUIdx_Phys_Root:
 +        ss = ARMSS_Root;
 +        break;
 +    case ARMMMUIdx_Phys_Realm:
 +        ss = ARMSS_Realm;
          break;
      default:
          g_assert_not_reached();
      }
 -    return get_phys_addr_with_secure(env, address, access_type, mmu_idx,
 -                                     is_secure, result, fi);
 +
 +    ptw.in_space = ss;
 +    ptw.in_secure = arm_space_is_secure(ss);
 +    return get_phys_addr_with_struct(env, &ptw, address, access_type,
 +                                     result, fi);
  }
--#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
-+#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
+@@ -XXX,XX +XXX,XX @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
-     static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+ {
-     {                                                                   \
+     ARMCPU *cpu = ARM_CPU(cs);
-         static NeonGenWidenFn * const widenfn[] = {                     \
+     CPUARMState *env = &cpu->env;
-             gen_helper_neon_widen_##S##8,                               \
++    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
-             gen_helper_neon_widen_##S##16,                              \
++    ARMSecuritySpace ss = arm_security_space(env);
--            tcg_gen_##EXT##_i32_i64,                                    \
+     S1Translate ptw = {
--            NULL,                                                       \
+-        .in_mmu_idx = arm_mmu_idx(env),
-+            NULL, NULL,                                                 \
+-        .in_secure = arm_is_secure(env),
-         };                                                              \
++        .in_mmu_idx = mmu_idx,
-         static NeonGenTwo64OpFn * const addfn[] = {                     \
++        .in_space = ss,
-             gen_helper_neon_##OP##l_u16,                                \
++        .in_secure = arm_space_is_secure(ss),
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+         .in_debug = true,
-             tcg_gen_##OP##_i64,                                         \
+     };
-             NULL,                                                       \
+     GetPhysAddrResult res = {};
          };                                                              \
 -        return do_prewiden_3d(s, a, widenfn[a->size],                   \
 -                              addfn[a->size], SRC1WIDE);                \
 +        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
 +        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
 +                              SRC1WIDE ? MO_Q : narrow_mop,             \
 +                              narrow_mop);                              \
      }
 -DO_PREWIDEN(VADDL_S, s, ext, add, false)
 -DO_PREWIDEN(VADDL_U, u, extu, add, false)
 -DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
 -DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
 -DO_PREWIDEN(VADDW_S, s, ext, add, true)
 -DO_PREWIDEN(VADDW_U, u, extu, add, true)
 -DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 -DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
 +DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
 +DO_PREWIDEN(VADDL_U, u, add, false, 0)
 +DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
 +DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
 +DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
 +DO_PREWIDEN(VADDW_U, u, add, true, 0)
 +DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
 +DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
  static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
                           NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
 --
-.20.1
+.34.1
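The patch above carries a four-valued security space alongside the older two-state `in_secure` flag. The sketch below models that relationship in isolation. It is a hedged illustration, not QEMU's implementation: the enum names and values, `secure_to_space`, `space_is_secure`, and `stage2_in_space` are stand-ins invented here, and the rule that Root counts as "secure" for the legacy flag is an assumption.

```c
#include <stdbool.h>

/* Illustrative stand-ins; names and values are assumptions, not QEMU's. */
typedef enum {
    SPACE_SECURE,
    SPACE_NONSECURE,
    SPACE_ROOT,
    SPACE_REALM,
} SecuritySpace;

typedef enum {
    IDX_STAGE2_S,   /* models the Secure stage 2 index */
    IDX_STAGE2,     /* models the NonSecure/Realm stage 2 index */
} Stage2Idx;

/* Legacy two-state flag to space, as in get_phys_addr_with_secure. */
static SecuritySpace secure_to_space(bool secure)
{
    return secure ? SPACE_SECURE : SPACE_NONSECURE;
}

/* Space to legacy flag.  Assumption: Root is secure, Realm is not. */
static bool space_is_secure(SecuritySpace ss)
{
    return ss == SPACE_SECURE || ss == SPACE_ROOT;
}

/*
 * Input space for a stage 2 walk, following the ternary chain added to
 * S1_ptw_translate: the Secure stage 2 index forces Secure; otherwise
 * Realm is preserved and everything else becomes NonSecure.
 */
static SecuritySpace stage2_in_space(Stage2Idx s2_idx, SecuritySpace space)
{
    if (s2_idx == IDX_STAGE2_S) {
        return SPACE_SECURE;
    }
    return space == SPACE_REALM ? SPACE_REALM : SPACE_NONSECURE;
}
```

Keeping both the flag and the space in the structure, as the patch does, lets existing `in_secure` users keep working while callers are converted incrementally.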

-[PULL 20/26] hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
+[PULL 11/26] target/arm: NSTable is RES0 for the RME EL3 regime
-From: AlexChen <alex.chen@huawei.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-In exynos4210_fimd_update(), the pointer s is dereferenced before
+Test in_space instead of in_secure so that we don't
-being checked for validity, which may lead to a NULL pointer dereference.
+switch out of Root space.
 So move the assignment to global_width after checking that s is valid.
-Reported-by: Euler Robot <euler.robot@huawei.com>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Alex Chen <alex.chen@huawei.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Message-id: 20230620124418.805717-12-richard.henderson@linaro.org
 Message-id: 5F9F8D88.9030102@huawei.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/display/exynos4210_fimd.c | 4 +++-
+ target/arm/ptw.c | 28 ++++++++++++++--------------
-file changed, 3 insertions(+), 1 deletion(-)
+file changed, 14 insertions(+), 14 deletions(-)
-diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/exynos4210_fimd.c
+--- a/target/arm/ptw.c
-+++ b/hw/display/exynos4210_fimd.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-     bool blend = false;
+ {
-     uint8_t *host_fb_addr;
+     ARMCPU *cpu = env_archcpu(env);
-     bool is_dirty = false;
+     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
--    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
+-    bool is_secure = ptw->in_secure;
-+    int global_width;
+     int32_t level;
+     ARMVAParameters param;
-     if (!s || !s->console || !s->enabled ||
+     uint64_t ttbr;
-         surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-         return;
+     uint64_t descaddrmask;
      bool aarch64 = arm_el_is_aa64(env, el);
      uint64_t descriptor, new_descriptor;
 -    bool nstable;
      /* TODO: This code does not support shareability levels. */
      if (aarch64) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
          descaddrmask = MAKE_64BIT_MASK(0, 40);
      }
      descaddrmask &= ~indexmask_grainsize;
 -
 -    /*
 -     * Secure stage 1 accesses start with the page table in secure memory and
 -     * can be downgraded to non-secure at any step. Non-secure accesses
 -     * remain non-secure. We implement this by just ORing in the NSTable/NS
 -     * bits at each step.
 -     * Stage 2 never gets this kind of downgrade.
 -     */
 -    tableattrs = is_secure ? 0 : (1 << 4);
 +    tableattrs = 0;
   next_level:
      descaddr |= (address >> (stride * (4 - level))) & indexmask;
      descaddr &= ~7ULL;
 -    nstable = !regime_is_stage2(mmu_idx) && extract32(tableattrs, 4, 1);
 -    if (nstable && ptw->in_secure) {
 +
 +    /*
 +     * Process the NSTable bit from the previous level.  This changes
 +     * the table address space and the output space from Secure to
 +     * NonSecure.  With RME, the EL3 translation regime does not change
 +     * from Root to NonSecure.
 +     */
 +    if (ptw->in_space == ARMSS_Secure
 +        && !regime_is_stage2(mmu_idx)
 +        && extract32(tableattrs, 4, 1)) {
          /*
           * Stage2_S -> Stage2 or Phys_S -> Phys_NS
           * Assert the relative order of the secure/non-secure indexes.
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
          QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2_S + 1 != ARMMMUIdx_Stage2);
          ptw->in_ptw_idx += 1;
          ptw->in_secure = false;
 +        ptw->in_space = ARMSS_NonSecure;
      }
 +
-+    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
+     if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
-     exynos4210_update_resolution(s);
+         goto do_fault;
-     surface = qemu_console_surface(s->console);
+     }
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
       */
      attrs = new_descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
      if (!regime_is_stage2(mmu_idx)) {
 -        attrs |= nstable << 5; /* NS */
 +        attrs |= !ptw->in_secure << 5; /* NS */
          if (!param.hpd) {
              attrs |= extract64(tableattrs, 0, 2) << 53;     /* XN, PXN */
              /*
 --
-.20.1
+.34.1
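The NSTable rule introduced above can be read as a small pure function: the downgrade happens only when the walk is currently in Secure space, the regime is not stage 2, and the NSTable bit from the previous level is set. The sketch below is a simplified model under those assumptions; `apply_nstable`, `NSTABLE_BIT`, and the enum are hypothetical names, and only the bit position (bit 4 of `tableattrs`) is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the security space; not QEMU's definition. */
typedef enum {
    SPACE_SECURE,
    SPACE_NONSECURE,
    SPACE_ROOT,
    SPACE_REALM,
} SecuritySpace;

/* The patch reads NSTable with extract32(tableattrs, 4, 1). */
#define NSTABLE_BIT (1u << 4)

/*
 * Return the space to use for the next table fetch.  Only a Secure,
 * non-stage-2 walk is downgraded; Root (the RME EL3 regime) is left
 * alone because NSTable is RES0 there.
 */
static SecuritySpace apply_nstable(SecuritySpace in_space, bool is_stage2,
                                   uint32_t tableattrs)
{
    if (in_space == SPACE_SECURE && !is_stage2 &&
        (tableattrs & NSTABLE_BIT)) {
        return SPACE_NONSECURE;
    }
    return in_space;
}
```

Testing the space rather than the boolean flag is what keeps Root from being downgraded: under the assumption that Root also sets the legacy secure flag, a check on `in_secure` alone could not tell the two apart.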

-[PULL 07/26] target/arm: Rename neon_load_reg32 to vfp_load_reg32
+[PULL 12/26] target/arm: Handle Block and Page bits for security space
 From: Richard Henderson <richard.henderson@linaro.org>
-The only uses of this function are for loading VFP
+With Realm security state, bit 55 of a block or page descriptor during
-single-precision values, and nothing to do with NEON.
+the stage2 walk becomes the NS bit; during the stage1 walk the bit 5
 NS bit is RES0.  With Root security state, bit 11 of the block or page
 descriptor during the stage1 walk becomes the NSE bit.
+Rather than collecting an NS bit and applying it later, compute the
+output pa space from the input pa space and unconditionally assign.
+This means that we no longer need to adjust the output space earlier
+for the NSTable bit.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
+Message-id: 20230620124418.805717-13-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         |   4 +-
+ target/arm/ptw.c | 89 +++++++++++++++++++++++++++++++++++++++---------
- target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
+file changed, 73 insertions(+), 16 deletions(-)
 files changed, 94 insertions(+), 94 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/ptw.c
-+++ b/target/arm/translate.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
+@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
-     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
+  * @mmu_idx: MMU index indicating required translation regime
- }
+  * @is_aa64: TRUE if AArch64
+  * @ap:      The 2-bit simple AP (AP[2:1])
--static inline void neon_load_reg32(TCGv_i32 var, int reg)
+- * @ns:      NS (non-secure) bit
-+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
+  * @xn:      XN (execute-never) bit
   * @pxn:     PXN (privileged execute-never) bit
 + * @in_pa:   The original input pa space
 + * @out_pa:  The output pa space, modified by NSTable, NS, and NSE
   */
  static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
 -                      int ap, int ns, int xn, int pxn)
 +                      int ap, int xn, int pxn,
 +                      ARMSecuritySpace in_pa, ARMSecuritySpace out_pa)
  {
-     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
+     ARMCPU *cpu = env_archcpu(env);
- }
+     bool is_user = regime_is_user(env, mmu_idx);
+@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
 -static inline void neon_store_reg32(TCGv_i32 var, int reg)
 +static inline void vfp_store_reg32(TCGv_i32 var, int reg)
  {
      tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
  }
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          frn = tcg_temp_new_i32();
          frm = tcg_temp_new_i32();
          dest = tcg_temp_new_i32();
 -        neon_load_reg32(frn, rn);
 -        neon_load_reg32(frm, rm);
 +        vfp_load_reg32(frn, rn);
 +        vfp_load_reg32(frm, rm);
          switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          if (sz == 1) {
              tcg_gen_andi_i32(dest, dest, 0xffff);
          }
 -        neon_store_reg32(dest, rd);
 +        vfp_store_reg32(dest, rd);
          tcg_temp_free_i32(frn);
          tcg_temp_free_i32(frm);
          tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i32 tcg_res;
          tcg_op = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_op, rm);
 +        vfp_load_reg32(tcg_op, rm);
          if (sz == 1) {
              gen_helper_rinth(tcg_res, tcg_op, fpst);
          } else {
              gen_helper_rints(tcg_res, tcg_op, fpst);
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_op);
          tcg_temp_free_i32(tcg_res);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
              gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
          }
          tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
 -        neon_store_reg32(tcg_tmp, rd);
 +        vfp_store_reg32(tcg_tmp, rd);
          tcg_temp_free_i32(tcg_tmp);
          tcg_temp_free_i64(tcg_res);
          tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          TCGv_i32 tcg_single, tcg_res;
          tcg_single = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_single, rm);
 +        vfp_load_reg32(tcg_single, rm);
          if (sz == 1) {
              if (is_signed) {
                  gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                  gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
              }
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_res);
          tcg_temp_free_i32(tcg_single);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
      if (a->l) {
          /* VFP to general purpose register */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vn);
 +        vfp_load_reg32(tmp, a->vn);
          tcg_gen_andi_i32(tmp, tmp, 0xffff);
          store_reg(s, a->rt, tmp);
      } else {
          /* general purpose register to VFP */
          tmp = load_reg(s, a->rt);
          tcg_gen_andi_i32(tmp, tmp, 0xffff);
 -        neon_store_reg32(tmp, a->vn);
 +        vfp_store_reg32(tmp, a->vn);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
      if (a->l) {
          /* VFP to general purpose register */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vn);
 +        vfp_load_reg32(tmp, a->vn);
          if (a->rt == 15) {
              /* Set the 4 flag bits in the CPSR.  */
              gen_set_nzcv(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
      } else {
          /* general purpose register to VFP */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vn);
 +        vfp_store_reg32(tmp, a->vn);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
      if (a->op) {
          /* fpreg to gpreg */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm);
 +        vfp_load_reg32(tmp, a->vm);
          store_reg(s, a->rt, tmp);
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm + 1);
 +        vfp_load_reg32(tmp, a->vm + 1);
          store_reg(s, a->rt2, tmp);
      } else {
          /* gpreg to fpreg */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vm);
 +        vfp_store_reg32(tmp, a->vm);
          tcg_temp_free_i32(tmp);
          tmp = load_reg(s, a->rt2);
 -        neon_store_reg32(tmp, a->vm + 1);
 +        vfp_store_reg32(tmp, a->vm + 1);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
      if (a->op) {
          /* fpreg to gpreg */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm * 2);
 +        vfp_load_reg32(tmp, a->vm * 2);
          store_reg(s, a->rt, tmp);
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm * 2 + 1);
 +        vfp_load_reg32(tmp, a->vm * 2 + 1);
          store_reg(s, a->rt2, tmp);
      } else {
          /* gpreg to fpreg */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vm * 2);
 +        vfp_store_reg32(tmp, a->vm * 2);
          tcg_temp_free_i32(tmp);
          tmp = load_reg(s, a->rt2);
 -        neon_store_reg32(tmp, a->vm * 2 + 1);
 +        vfp_store_reg32(tmp, a->vm * 2 + 1);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      tmp = tcg_temp_new_i32();
      if (a->l) {
          gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg32(tmp, a->vd);
 +        vfp_store_reg32(tmp, a->vd);
      } else {
 -        neon_load_reg32(tmp, a->vd);
 +        vfp_load_reg32(tmp, a->vd);
          gen_aa32_st16(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      tmp = tcg_temp_new_i32();
      if (a->l) {
          gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg32(tmp, a->vd);
 +        vfp_store_reg32(tmp, a->vd);
      } else {
 -        neon_load_reg32(tmp, a->vd);
 +        vfp_load_reg32(tmp, a->vd);
          gen_aa32_st32(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
          if (a->l) {
              /* load */
              gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
 -            neon_store_reg32(tmp, a->vd + i);
 +            vfp_store_reg32(tmp, a->vd + i);
          } else {
              /* store */
 -            neon_load_reg32(tmp, a->vd + i);
 +            vfp_load_reg32(tmp, a->vd + i);
              gen_aa32_st32(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
      fd = tcg_temp_new_i32();
      fpst = fpstatus_ptr(FPST_FPCR);
 -    neon_load_reg32(f0, vn);
 -    neon_load_reg32(f1, vm);
 +    vfp_load_reg32(f0, vn);
 +    vfp_load_reg32(f1, vm);
      for (;;) {
          if (reads_vd) {
 -            neon_load_reg32(fd, vd);
 +            vfp_load_reg32(fd, vd);
          }
          fn(fd, f0, f1, fpst);
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
          veclen--;
          vd = vfp_advance_sreg(vd, delta_d);
          vn = vfp_advance_sreg(vn, delta_d);
 -        neon_load_reg32(f0, vn);
 +        vfp_load_reg32(f0, vn);
          if (delta_m) {
              vm = vfp_advance_sreg(vm, delta_m);
 -            neon_load_reg32(f1, vm);
 +            vfp_load_reg32(f1, vm);
          }
      }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
+-    if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
-     fd = tcg_temp_new_i32();
++    if (out_pa == ARMSS_NonSecure && in_pa == ARMSS_Secure &&
-     fpst = fpstatus_ptr(FPST_FPCR_F16);
++        (env->cp15.scr_el3 & SCR_SIF)) {
+         return prot_rw;
 -    neon_load_reg32(f0, vn);
 -    neon_load_reg32(f1, vm);
 +    vfp_load_reg32(f0, vn);
 +    vfp_load_reg32(f1, vm);
      if (reads_vd) {
 -        neon_load_reg32(fd, vd);
 +        vfp_load_reg32(fd, vd);
      }
-     fn(fd, f0, f1, fpst);
--    neon_store_reg32(fd, vd);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-+    vfp_store_reg32(fd, vd);
+     int32_t stride;
+     int addrsize, inputsize, outputsize;
-     tcg_temp_free_i32(f0);
+     uint64_t tcr = regime_tcr(env, mmu_idx);
-     tcg_temp_free_i32(f1);
+-    int ap, ns, xn, pxn;
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
++    int ap, xn, pxn;
-     f0 = tcg_temp_new_i32();
+     uint32_t el = regime_el(env, mmu_idx);
-     fd = tcg_temp_new_i32();
+     uint64_t descaddrmask;
+     bool aarch64 = arm_el_is_aa64(env, el);
--    neon_load_reg32(f0, vm);
+     uint64_t descriptor, new_descriptor;
-+    vfp_load_reg32(f0, vm);
++    ARMSecuritySpace out_space;
-     for (;;) {
+     /* TODO: This code does not support shareability levels. */
-         fn(fd, f0);
+     if (aarch64) {
--        neon_store_reg32(fd, vd);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
              /* single source one-many */
              while (veclen--) {
                  vd = vfp_advance_sreg(vd, delta_d);
 -                neon_store_reg32(fd, vd);
 +                vfp_store_reg32(fd, vd);
              }
              break;
          }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
          veclen--;
          vd = vfp_advance_sreg(vd, delta_d);
          vm = vfp_advance_sreg(vm, delta_m);
 -        neon_load_reg32(f0, vm);
 +        vfp_load_reg32(f0, vm);
      }
-     tcg_temp_free_i32(f0);
+     ap = extract32(attrs, 6, 2);
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
++    out_space = ptw->in_space;
      if (regime_is_stage2(mmu_idx)) {
 -        ns = mmu_idx == ARMMMUIdx_Stage2;
 +        /*
 +         * R_GYNXY: For stage2 in Realm security state, bit 55 is NS.
 +         * The bit remains ignored for other security states.
 +         */
 +        if (out_space == ARMSS_Realm && extract64(attrs, 55, 1)) {
 +            out_space = ARMSS_NonSecure;
 +        }
          xn = extract64(attrs, 53, 2);
          result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
      } else {
 -        ns = extract32(attrs, 5, 1);
 +        int nse, ns = extract32(attrs, 5, 1);
 +        switch (out_space) {
 +        case ARMSS_Root:
 +            /*
 +             * R_GVZML: Bit 11 becomes the NSE field in the EL3 regime.
 +             * R_XTYPW: NSE and NS together select the output pa space.
 +             */
 +            nse = extract32(attrs, 11, 1);
 +            out_space = (nse << 1) | ns;
 +            if (out_space == ARMSS_Secure &&
 +                !cpu_isar_feature(aa64_sel2, cpu)) {
 +                out_space = ARMSS_NonSecure;
 +            }
 +            break;
 +        case ARMSS_Secure:
 +            if (ns) {
 +                out_space = ARMSS_NonSecure;
 +            }
 +            break;
 +        case ARMSS_Realm:
 +            switch (mmu_idx) {
 +            case ARMMMUIdx_Stage1_E0:
 +            case ARMMMUIdx_Stage1_E1:
 +            case ARMMMUIdx_Stage1_E1_PAN:
 +                /* I_CZPRF: For Realm EL1&0 stage1, NS bit is RES0. */
 +                break;
 +            case ARMMMUIdx_E2:
 +            case ARMMMUIdx_E20_0:
 +            case ARMMMUIdx_E20_2:
 +            case ARMMMUIdx_E20_2_PAN:
 +                /*
 +                 * R_LYKFZ, R_WGRZN: For Realm EL2 and EL2&0,
 +                 * NS changes the output to non-secure space.
 +                 */
 +                if (ns) {
 +                    out_space = ARMSS_NonSecure;
 +                }
 +                break;
 +            default:
 +                g_assert_not_reached();
 +            }
 +            break;
 +        case ARMSS_NonSecure:
 +            /* R_QRMFF: For NonSecure state, the NS bit is RES0. */
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
          xn = extract64(attrs, 54, 1);
          pxn = extract64(attrs, 53, 1);
 -        result->f.prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
 +
 +        /*
 +         * Note that we modified ptw->in_space earlier for NSTable, but
 +         * result->f.attrs retains a copy of the original security space.
 +         */
 +        result->f.prot = get_S1prot(env, mmu_idx, aarch64, ap, xn, pxn,
 +                                    result->f.attrs.space, out_space);
      }
-     f0 = tcg_temp_new_i32();
+     if (!(result->f.prot & (1 << access_type))) {
--    neon_load_reg32(f0, vm);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
 +    vfp_load_reg32(f0, vm);
      fn(f0, f0);
 -    neon_store_reg32(f0, vd);
 +    vfp_store_reg32(f0, vd);
      tcg_temp_free_i32(f0);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vn, a->vn);
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vn, a->vn);
 +    vfp_load_reg32(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negh(vn, vn);
      }
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negh(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vn, a->vn);
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vn, a->vn);
 +    vfp_load_reg32(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negs(vn, vn);
      }
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negs(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
      }
      fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
 -    neon_store_reg32(fd, a->vd);
 +    vfp_store_reg32(fd, a->vd);
      tcg_temp_free_i32(fd);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
      fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
      for (;;) {
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i32(vm, 0);
      } else {
 -        neon_load_reg32(vm, a->vm);
 +        vfp_load_reg32(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i32(vm, 0);
      } else {
 -        neon_load_reg32(vm, a->vm);
 +        vfp_load_reg32(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
      /* The T bit tells us if we want the low or high 16 bits of Vm */
      tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_i32(ahp_mode);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
      ahp_mode = get_ahp_flag();
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
      tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
      tcg_temp_free_i32(ahp_mode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_rinth(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rinth(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rints(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_rinth_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
      neon_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vm = tcg_temp_new_i64();
      neon_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      if (a->s) {
          /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f16 */
          gen_helper_vfp_uitoh(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f32 */
          gen_helper_vfp_uitos(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vd = tcg_temp_new_i32();
      neon_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_i32(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      if (a->s) {
          if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
              gen_helper_vfp_touih(vm, vm, fpst);
          }
      }
--    neon_store_reg32(vm, a->vd);
-+    vfp_store_reg32(vm, a->vd);
+-    if (ns) {
-     tcg_temp_free_i32(vm);
+-        /*
-     tcg_temp_free_ptr(fpst);
+-         * The NS bit will (as required by the architecture) have no effect if
-     return true;
+-         * the CPU doesn't support TZ or this is a non-secure translation
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
+-         * regime, because the attribute will already be non-secure.
+-         */
-     fpst = fpstatus_ptr(FPST_FPCR);
+-        result->f.attrs.secure = false;
-     vm = tcg_temp_new_i32();
+-        result->f.attrs.space = ARMSS_NonSecure;
--    neon_load_reg32(vm, a->vm);
+-    }
-+    vfp_load_reg32(vm, a->vm);
++    result->f.attrs.space = out_space;
++    result->f.attrs.secure = arm_space_is_secure(out_space);
-     if (a->s) {
-         if (a->rz) {
+     if (regime_is_stage2(mmu_idx)) {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
+         result->cacheattrs.is_s2_format = true;
              gen_helper_vfp_touis(vm, vm, fpst);
          }
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
              gen_helper_vfp_touid(vd, vm, fpst);
          }
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
      /* Insert low half of Vm into high half of Vd */
      rm = tcg_temp_new_i32();
      rd = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 -    neon_load_reg32(rd, a->vd);
 +    vfp_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rd, a->vd);
      tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
 -    neon_store_reg32(rd, a->vd);
 +    vfp_store_reg32(rd, a->vd);
      tcg_temp_free_i32(rm);
      tcg_temp_free_i32(rd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
      /* Set Vd to high half of Vm */
      rm = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rm, a->vm);
      tcg_gen_shri_i32(rm, rm, 16);
 -    neon_store_reg32(rm, a->vd);
 +    vfp_store_reg32(rm, a->vd);
      tcg_temp_free_i32(rm);
      return true;
  }
 --
-.20.1
+.34.1

-[PULL 09/26] target/arm: Rename neon_load_reg64 to vfp_load_reg64
+[PULL 13/26] target/arm: Handle no-execute for Realm and Root regimes
 From: Richard Henderson <richard.henderson@linaro.org>
-The only uses of this function are for loading VFP
+While Root and Realm may read and write data from other spaces,
-double-precision values, and nothing to do with NEON.
+neither may execute from other pa spaces.
+This happens for Stage1 EL3, EL2, EL2&0, and Stage2 EL1&0.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
+Message-id: 20230620124418.805717-14-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
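
A minimal standalone sketch of the rule described in the commit message above
(Root and Realm may read and write other spaces but not execute from them).
It is not the code in target/arm/ptw.c: the Space enum, its numeric values,
and the exec_denied() helper with its is_el2_regime and sif parameters are
hypothetical stand-ins for ARMSecuritySpace, the mmu_idx check, and
SCR_EL3.SIF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for ARMSecuritySpace; the values are assumptions. */
typedef enum {
    SPACE_SECURE = 0,
    SPACE_NONSECURE = 1,
    SPACE_ROOT = 2,
    SPACE_REALM = 3,
} Space;

/*
 * Return true if execute permission must be dropped because the output
 * pa space differs from the input space.
 *
 * is_el2_regime: the stage 1 regime is EL2 or EL2&0 (only used for Realm).
 * sif:           models the SCR_EL3.SIF bit (only used for Secure).
 */
static bool exec_denied(Space in_pa, Space out_pa, bool is_el2_regime, bool sif)
{
    if (in_pa == out_pa) {
        return false;           /* same space: normal XN/PXN rules apply */
    }
    switch (in_pa) {
    case SPACE_ROOT:
        return true;            /* Root never executes from non-Root */
    case SPACE_REALM:
        return is_el2_regime;   /* the EL1&0 case is left to stage 2 */
    case SPACE_SECURE:
        return sif;             /* Secure executes NonSecure unless SIF */
    default:
        /* NonSecure input must have NonSecure output. */
        assert(!"unreachable: NonSecure input with a different output");
        return false;
    }
}
```

For example, exec_denied(SPACE_REALM, SPACE_NONSECURE, false, false) is false
in this sketch, because for Realm EL1&0 the fault is expected from the
stage 2 check rather than from stage 1.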
- target/arm/translate.c         |  8 ++--
+ target/arm/ptw.c | 52 ++++++++++++++++++++++++++++++++++++++++++------
- target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
+file changed, 46 insertions(+), 6 deletions(-)
 files changed, 46 insertions(+), 46 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/ptw.c
-+++ b/target/arm/translate.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
+@@ -XXX,XX +XXX,XX @@ do_fault:
   * @xn:      XN (execute-never) bits
   * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
   */
 -static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
 +static int get_S2prot_noexecute(int s2ap)
  {
      int prot = 0;
@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
      if (s2ap & 2) {
          prot |= PAGE_WRITE;
      }
- }
++    return prot;
++}
--static inline void neon_load_reg64(TCGv_i64 var, int reg)
++
-+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
++static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
- {
++{
--    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
++    int prot = get_S2prot_noexecute(s2ap);
-+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
- }
+     if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
+         switch (xn) {
--static inline void neon_store_reg64(TCGv_i64 var, int reg)
+@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
 +static inline void vfp_store_reg64(TCGv_i64 var, int reg)
  {
 -    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
 +    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
  }
  static inline void vfp_load_reg32(TCGv_i32 var, int reg)
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          tcg_gen_ext_i32_i64(nf, cpu_NF);
          tcg_gen_ext_i32_i64(vf, cpu_VF);
 -        neon_load_reg64(frn, rn);
 -        neon_load_reg64(frm, rm);
 +        vfp_load_reg64(frn, rn);
 +        vfp_load_reg64(frm, rm);
          switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
              tcg_temp_free_i64(tmp);
              break;
          }
 -        neon_store_reg64(dest, rd);
 +        vfp_store_reg64(dest, rd);
          tcg_temp_free_i64(frn);
          tcg_temp_free_i64(frm);
          tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i64 tcg_res;
          tcg_op = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
 -        neon_load_reg64(tcg_op, rm);
 +        vfp_load_reg64(tcg_op, rm);
          gen_helper_rintd(tcg_res, tcg_op, fpst);
 -        neon_store_reg64(tcg_res, rd);
 +        vfp_store_reg64(tcg_res, rd);
          tcg_temp_free_i64(tcg_op);
          tcg_temp_free_i64(tcg_res);
      } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          tcg_double = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
          tcg_tmp = tcg_temp_new_i32();
 -        neon_load_reg64(tcg_double, rm);
 +        vfp_load_reg64(tcg_double, rm);
          if (is_signed) {
              gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
          } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
      tmp = tcg_temp_new_i64();
      if (a->l) {
          gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg64(tmp, a->vd);
 +        vfp_store_reg64(tmp, a->vd);
      } else {
 -        neon_load_reg64(tmp, a->vd);
 +        vfp_load_reg64(tmp, a->vd);
          gen_aa32_st64(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i64(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
          if (a->l) {
              /* load */
              gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
 -            neon_store_reg64(tmp, a->vd + i);
 +            vfp_store_reg64(tmp, a->vd + i);
          } else {
              /* store */
 -            neon_load_reg64(tmp, a->vd + i);
 +            vfp_load_reg64(tmp, a->vd + i);
              gen_aa32_st64(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
      fd = tcg_temp_new_i64();
      fpst = fpstatus_ptr(FPST_FPCR);
 -    neon_load_reg64(f0, vn);
 -    neon_load_reg64(f1, vm);
 +    vfp_load_reg64(f0, vn);
 +    vfp_load_reg64(f1, vm);
      for (;;) {
          if (reads_vd) {
 -            neon_load_reg64(fd, vd);
 +            vfp_load_reg64(fd, vd);
          }
          fn(fd, f0, f1, fpst);
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
          veclen--;
          vd = vfp_advance_dreg(vd, delta_d);
          vn = vfp_advance_dreg(vn, delta_d);
 -        neon_load_reg64(f0, vn);
 +        vfp_load_reg64(f0, vn);
          if (delta_m) {
              vm = vfp_advance_dreg(vm, delta_m);
 -            neon_load_reg64(f1, vm);
 +            vfp_load_reg64(f1, vm);
          }
      }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
+-    if (out_pa == ARMSS_NonSecure && in_pa == ARMSS_Secure &&
-     f0 = tcg_temp_new_i64();
+-        (env->cp15.scr_el3 & SCR_SIF)) {
-     fd = tcg_temp_new_i64();
+-        return prot_rw;
++    if (in_pa != out_pa) {
--    neon_load_reg64(f0, vm);
++        switch (in_pa) {
-+    vfp_load_reg64(f0, vm);
++        case ARMSS_Root:
++            /*
-     for (;;) {
++             * R_ZWRVD: permission fault for insn fetched from non-Root,
-         fn(fd, f0);
++             * I_WWBFB: SIF has no effect in EL3.
--        neon_store_reg64(fd, vd);
++             */
-+        vfp_store_reg64(fd, vd);
++            return prot_rw;
++        case ARMSS_Realm:
-         if (veclen == 0) {
++            /*
-             break;
++             * R_PKTDS: permission fault for insn fetched from non-Realm,
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
++             * for Realm EL2 or EL2&0.  The corresponding fault for EL1&0
-             /* single source one-many */
++             * happens during any stage2 translation.
-             while (veclen--) {
++             */
-                 vd = vfp_advance_dreg(vd, delta_d);
++            switch (mmu_idx) {
--                neon_store_reg64(fd, vd);
++            case ARMMMUIdx_E2:
-+                vfp_store_reg64(fd, vd);
++            case ARMMMUIdx_E20_0:
-             }
++            case ARMMMUIdx_E20_2:
-             break;
++            case ARMMMUIdx_E20_2_PAN:
 +                return prot_rw;
 +            default:
 +                break;
 +            }
 +            break;
 +        case ARMSS_Secure:
 +            if (env->cp15.scr_el3 & SCR_SIF) {
 +                return prot_rw;
 +            }
 +            break;
 +        default:
 +            /* Input NonSecure must have output NonSecure. */
 +            g_assert_not_reached();
 +        }
      }
      /* TODO have_wxn should be replaced with
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
          /*
           * R_GYNXY: For stage2 in Realm security state, bit 55 is NS.
           * The bit remains ignored for other security states.
 +         * R_YMCSL: Executing an insn fetched from non-Realm causes
 +         * a stage2 permission fault.
           */
          if (out_space == ARMSS_Realm && extract64(attrs, 55, 1)) {
              out_space = ARMSS_NonSecure;
 +            result->f.prot = get_S2prot_noexecute(ap);
 +        } else {
 +            xn = extract64(attrs, 53, 2);
 +            result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
          }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
+-        xn = extract64(attrs, 53, 2);
-         veclen--;
+-        result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
          vd = vfp_advance_dreg(vd, delta_d);
          vm = vfp_advance_dreg(vm, delta_m);
 -        neon_load_reg64(f0, vm);
 +        vfp_load_reg64(f0, vm);
      }
      tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vn, a->vn);
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vn, a->vn);
 +    vfp_load_reg64(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negd(vn, vn);
      }
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negd(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
      fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
      for (;;) {
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
      vd = tcg_temp_new_i64();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i64(vm, 0);
      } else {
--        neon_load_reg64(vm, a->vm);
+         int nse, ns = extract32(attrs, 5, 1);
-+        vfp_load_reg64(vm, a->vm);
+         switch (out_space) {
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
      tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
      vd = tcg_temp_new_i64();
      gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(ahp_mode);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
      tmp = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
      tcg_temp_free_i64(vm);
      tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rintd(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd_exact(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vd = tcg_temp_new_i64();
      vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
          /* u32 -> f64 */
          gen_helper_vfp_uitod(vd, vm, fpst);
      }
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i32();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i64(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR);
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i32();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      if (a->s) {
          if (a->rz) {
 --
-.20.1
+.34.1

-[PULL 21/26] target/arm: Get correct MMU index for other-security-state
+[PULL 14/26] target/arm: Use get_phys_addr_with_struct in S1_ptw_translate
-In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
+From: Richard Henderson <richard.henderson@linaro.org>
 armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
 This is incorrect when the security state being queried is not the
 current one, because arm_current_el() uses the current security state
 to determine which of the banked CONTROL.nPRIV bits to look at.
 The effect was that if (for instance) Secure state was in privileged
 mode but Non-Secure was not then we would return the wrong MMU index.
-The only places where we are using this function in a way that could
+Do not provide a fast-path for physical addresses,
-trigger this bug are for the stack loads during a v8M function-return
+as those will need to be validated for GPC.
 and for the instruction fetch of a v8M SG insn.
-Fix the bug by expanding out the M-profile version of the
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-arm_current_el() logic inline so it can use the passed in secstate
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-rather than env->v7m.secure.
+Message-id: 20230620124418.805717-15-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/ptw.c | 44 +++++++++++++++++---------------------------
 file changed, 17 insertions(+), 27 deletions(-)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
 ---
  target/arm/m_helper.c | 3 ++-
 file changed, 2 insertions(+), 1 deletion(-)
 diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/m_helper.c
+--- a/target/arm/ptw.c
-+++ b/target/arm/m_helper.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
- /* Return the MMU index for a v7M CPU in the specified security state */
+          * From gdbstub, do not use softmmu so that we don't modify the
- ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
+          * state of the cpu at all, including softmmu tlb contents.
- {
+          */
--    bool priv = arm_current_el(env) != 0;
+-        if (regime_is_stage2(s2_mmu_idx)) {
-+    bool priv = arm_v7m_is_handler_mode(env) ||
+-            S1Translate s2ptw = {
-+        !(env->v7m.control[secstate] & 1);
+-                .in_mmu_idx = s2_mmu_idx,
+-                .in_ptw_idx = ptw_idx_for_stage_2(env, s2_mmu_idx),
-     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
+-                .in_secure = s2_mmu_idx == ARMMMUIdx_Stage2_S,
- }
+-                .in_space = (s2_mmu_idx == ARMMMUIdx_Stage2_S ? ARMSS_Secure
 -                             : space == ARMSS_Realm ? ARMSS_Realm
 -                             : ARMSS_NonSecure),
 -                .in_debug = true,
 -            };
 -            GetPhysAddrResult s2 = { };
 +        S1Translate s2ptw = {
 +            .in_mmu_idx = s2_mmu_idx,
 +            .in_ptw_idx = ptw_idx_for_stage_2(env, s2_mmu_idx),
 +            .in_secure = s2_mmu_idx == ARMMMUIdx_Stage2_S,
 +            .in_space = (s2_mmu_idx == ARMMMUIdx_Stage2_S ? ARMSS_Secure
 +                         : space == ARMSS_Realm ? ARMSS_Realm
 +                         : ARMSS_NonSecure),
 +            .in_debug = true,
 +        };
 +        GetPhysAddrResult s2 = { };
 -            if (get_phys_addr_lpae(env, &s2ptw, addr, MMU_DATA_LOAD,
 -                                   false, &s2, fi)) {
 -                goto fail;
 -            }
 -            ptw->out_phys = s2.f.phys_addr;
 -            pte_attrs = s2.cacheattrs.attrs;
 -            ptw->out_secure = s2.f.attrs.secure;
 -            ptw->out_space = s2.f.attrs.space;
 -        } else {
 -            /* Regime is physical. */
 -            ptw->out_phys = addr;
 -            pte_attrs = 0;
 -            ptw->out_secure = s2_mmu_idx == ARMMMUIdx_Phys_S;
 -            ptw->out_space = (s2_mmu_idx == ARMMMUIdx_Phys_S ? ARMSS_Secure
 -                              : space == ARMSS_Realm ? ARMSS_Realm
 -                              : ARMSS_NonSecure);
 +        if (get_phys_addr_with_struct(env, &s2ptw, addr,
 +                                      MMU_DATA_LOAD, &s2, fi)) {
 +            goto fail;
          }
 +        ptw->out_phys = s2.f.phys_addr;
 +        pte_attrs = s2.cacheattrs.attrs;
          ptw->out_host = NULL;
          ptw->out_rw = false;
 +        ptw->out_secure = s2.f.attrs.secure;
 +        ptw->out_space = s2.f.attrs.space;
      } else {
  #ifdef CONFIG_TCG
          CPUTLBEntryFull *full;
 --
-.20.1
+.34.1

-[PULL 18/26] hw/arm/boot: fix SVE for EL3 direct kernel boot
+[PULL 15/26] target/arm: Move s1_is_el0 into S1Translate
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
+Instead of passing this to get_phys_addr_lpae, stash it
-that SVE will not trap to EL3.
+in the S1Translate structure.
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20201030151541.11976-1-remi@remlab.net
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20230620124418.805717-16-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/boot.c | 3 +++
+ target/arm/ptw.c | 27 ++++++++++++---------------
-file changed, 3 insertions(+)
+file changed, 12 insertions(+), 15 deletions(-)
-diff --git a/hw/arm/boot.c b/hw/arm/boot.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/boot.c
+--- a/target/arm/ptw.c
-+++ b/hw/arm/boot.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
+@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
-                     if (cpu_isar_feature(aa64_mte, cpu)) {
+     ARMSecuritySpace in_space;
-                         env->cp15.scr_el3 |= SCR_ATA;
+     bool in_secure;
-                     }
+     bool in_debug;
-+                    if (cpu_isar_feature(aa64_sve, cpu)) {
++    /*
-+                        env->cp15.cptr_el[3] |= CPTR_EZ;
++     * If this is stage 2 of a stage 1+2 page table walk, then this must
-+                    }
++     * be true if stage 1 is an EL0 access; otherwise this is ignored.
-                     /* AArch64 kernels never boot in secure mode */
++     * Stage 2 is indicated by in_mmu_idx set to ARMMMUIdx_Stage2{,_S}.
-                     assert(!info->secure_boot);
++     */
-                     /* This hook is only supported for AArch32 currently:
++    bool in_s1_is_el0;
      bool out_secure;
      bool out_rw;
      bool out_be;
@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
  } S1Translate;
  static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
 -                               uint64_t address,
 -                               MMUAccessType access_type, bool s1_is_el0,
 +                               uint64_t address, MMUAccessType access_type,
                                 GetPhysAddrResult *result, ARMMMUFaultInfo *fi);
  static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
@@ -XXX,XX +XXX,XX @@ static int check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, uint64_t tcr,
   * @ptw: Current and next stage parameters for the walk.
   * @address: virtual address to get physical address for
   * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
 - * @s1_is_el0: if @ptw->in_mmu_idx is ARMMMUIdx_Stage2
 - *             (so this is a stage 2 page table walk),
 - *             must be true if this is stage 2 of a stage 1+2
 - *             walk for an EL0 access. If @mmu_idx is anything else,
 - *             @s1_is_el0 is ignored.
   * @result: set on translation success,
   * @fi: set to fault info if the translation fails
   */
  static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
                                 uint64_t address,
 -                               MMUAccessType access_type, bool s1_is_el0,
 +                               MMUAccessType access_type,
                                 GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
  {
      ARMCPU *cpu = env_archcpu(env);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
              result->f.prot = get_S2prot_noexecute(ap);
          } else {
              xn = extract64(attrs, 53, 2);
 -            result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
 +            result->f.prot = get_S2prot(env, ap, xn, ptw->in_s1_is_el0);
          }
      } else {
          int nse, ns = extract32(attrs, 5, 1);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
      bool ret, ipa_secure;
      ARMCacheAttrs cacheattrs1;
      ARMSecuritySpace ipa_space;
 -    bool is_el0;
      uint64_t hcr;
      ret = get_phys_addr_with_struct(env, ptw, address, access_type, result, fi);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
      ipa_secure = result->f.attrs.secure;
      ipa_space = result->f.attrs.space;
 -    is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
 +    ptw->in_s1_is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
      ptw->in_mmu_idx = ipa_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
      ptw->in_secure = ipa_secure;
      ptw->in_space = ipa_space;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
          ret = get_phys_addr_pmsav8(env, ipa, access_type,
                                     ptw->in_mmu_idx, is_secure, result, fi);
      } else {
 -        ret = get_phys_addr_lpae(env, ptw, ipa, access_type,
 -                                 is_el0, result, fi);
 +        ret = get_phys_addr_lpae(env, ptw, ipa, access_type, result, fi);
      }
      fi->s2addr = ipa;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
      }
      if (regime_using_lpae_format(env, mmu_idx)) {
 -        return get_phys_addr_lpae(env, ptw, address, access_type, false,
 -                                  result, fi);
 +        return get_phys_addr_lpae(env, ptw, address, access_type, result, fi);
      } else if (arm_feature(env, ARM_FEATURE_V7) ||
                 regime_sctlr(env, mmu_idx) & SCTLR_XP) {
          return get_phys_addr_v6(env, ptw, address, access_type, result, fi);
 --
-.20.1
+.34.1
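
For readers skimming the series, the "Move s1_is_el0 into S1Translate" change can be pictured with a reduced sketch. Everything below is hypothetical and is not QEMU code: WalkCtx stands in for S1Translate, in_stage2 stands in for the in_mmu_idx check, and the two functions stand in for get_phys_addr_twostage and get_phys_addr_lpae. It only illustrates the pattern of recording a stage 1 fact in the walk context before switching that context to stage 2, so that intermediate callers need no extra argument.

```c
#include <stdbool.h>

/* Hypothetical stand-in for S1Translate: only the fields the sketch needs. */
typedef struct WalkCtx {
    bool in_stage2;     /* assumption: true when this walk is stage 2 */
    bool in_s1_is_el0;  /* meaningful only when in_stage2 is true */
} WalkCtx;

/* Hypothetical leaf walker: reads the flag from the context, not an argument. */
static bool walk_uses_el0_rules(const WalkCtx *ctx)
{
    return ctx->in_stage2 && ctx->in_s1_is_el0;
}

/*
 * Hypothetical two-stage driver: stash whether stage 1 was an EL0 access,
 * then retarget the same context at stage 2 and call the leaf walker.
 */
static bool two_stage_walk(WalkCtx *ctx, bool stage1_was_el0)
{
    ctx->in_s1_is_el0 = stage1_was_el0;
    ctx->in_stage2 = true;
    return walk_uses_el0_rules(ctx);
}
```

With the flag held in the context, a generic dispatcher sitting between the driver and the leaf can forward the context unchanged, which is presumably what lets the following patch route stage 2 through get_phys_addr_with_struct.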

-[PULL 19/26] hw/display/omap_lcdc: Fix potential NULL pointer dereference
+[PULL 16/26] target/arm: Use get_phys_addr_with_struct for stage2
-From: AlexChen <alex.chen@huawei.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-In omap_lcd_interrupts(), the pointer omap_lcd is dereferenced before
+This fixes a bug in which we failed to initialize
-being checked for validity, which may lead to a NULL pointer dereference.
+the result attributes properly after the memset.
 So move the assignment to surface after checking that the omap_lcd is valid
 and move surface_bits_per_pixel(surface) to after the surface assignment.
-Reported-by: Euler Robot <euler.robot@huawei.com>
-Signed-off-by: AlexChen <alex.chen@huawei.com>
-Message-id: 5F9CDB8A.9000001@huawei.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230620124418.805717-17-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/display/omap_lcdc.c | 10 +++++++---
+ target/arm/ptw.c | 11 +----------
-file changed, 7 insertions(+), 3 deletions(-)
+file changed, 1 insertion(+), 10 deletions(-)
-diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/omap_lcdc.c
+--- a/target/arm/ptw.c
-+++ b/hw/display/omap_lcdc.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
+@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
- static void omap_update_display(void *opaque)
+     void *out_host;
- {
+ } S1Translate;
-     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
--    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
+-static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-+    DisplaySurface *surface;
+-                               uint64_t address, MMUAccessType access_type,
-     draw_line_func draw_line;
+-                               GetPhysAddrResult *result, ARMMMUFaultInfo *fi);
-     int size, height, first, last;
+-
-     int width, linesize, step, bpp, frame_offset;
+ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
-     hwaddr frame_base;
+                                       target_ulong address,
+                                       MMUAccessType access_type,
--    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
--        !surface_bits_per_pixel(surface)) {
+     cacheattrs1 = result->cacheattrs;
-+    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
+     memset(result, 0, sizeof(*result));
-+        return;
-+    }
+-    if (arm_feature(env, ARM_FEATURE_PMSA)) {
-+
+-        ret = get_phys_addr_pmsav8(env, ipa, access_type,
-+    surface = qemu_console_surface(omap_lcd->con);
+-                                   ptw->in_mmu_idx, is_secure, result, fi);
-+    if (!surface_bits_per_pixel(surface)) {
+-    } else {
-         return;
+-        ret = get_phys_addr_lpae(env, ptw, ipa, access_type, result, fi);
-     }
+-    }
++    ret = get_phys_addr_with_struct(env, ptw, ipa, access_type, result, fi);
      fi->s2addr = ipa;
      /* Combine the S1 and S2 perms.  */
 --
-.20.1
+.34.1

-[PULL 02/26] target/arm: Move neon_element_offset to translate.c
+[PULL 17/26] target/arm: Add GPC syndrome
 From: Richard Henderson <richard.henderson@linaro.org>
-This will shortly have users outside of translate-neon.c.inc.
+The function takes the fields as filled in by
 the Arm ARM pseudocode for TakeGPCException.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
+Message-id: 20230620124418.805717-18-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          | 20 ++++++++++++++++++++
+ target/arm/syndrome.h | 10 ++++++++++
- target/arm/translate-neon.c.inc | 19 -------------------
+file changed, 10 insertions(+)
 files changed, 20 insertions(+), 19 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/syndrome.h
-+++ b/target/arm/translate.c
++++ b/target/arm/syndrome.h
-@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+@@ -XXX,XX +XXX,XX @@ enum arm_exception_class {
-     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+     EC_SVEACCESSTRAP          = 0x19,
      EC_ERETTRAP               = 0x1a,
      EC_SMETRAP                = 0x1d,
 +    EC_GPC                    = 0x1e,
      EC_INSNABORT              = 0x20,
      EC_INSNABORT_SAME_EL      = 0x21,
      EC_PCALIGNMENT            = 0x22,
@@ -XXX,XX +XXX,XX @@ static inline uint32_t syn_bxjtrap(int cv, int cond, int rm)
          (cv << 24) | (cond << 20) | rm;
  }
-+/*
++static inline uint32_t syn_gpc(int s2ptw, int ind, int gpcsc,
-+ * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
++                               int cm, int s1ptw, int wnr, int fsc)
 + * where 0 is the least significant end of the register.
 + */
 +static long neon_element_offset(int reg, int element, MemOp size)
 +{
-+    int element_size = 1 << size;
++    /* TODO: FEAT_NV2 adds VNCR */
-+    int ofs = element * element_size;
++    return (EC_GPC << ARM_EL_EC_SHIFT) | ARM_EL_IL | (s2ptw << 21)
-+#ifdef HOST_WORDS_BIGENDIAN
++            | (ind << 20) | (gpcsc << 14) | (cm << 8) | (s1ptw << 7)
-+    /*
++            | (wnr << 6) | fsc;
 +     * Calculate the offset assuming fully little-endian,
 +     * then XOR to account for the order of the 8-byte units.
 +     */
 +    if (element_size < 8) {
 +        ofs ^= 8 - element_size;
 +    }
 +#endif
 +    return neon_full_reg_offset(reg) + ofs;
 +}
 +
- static inline long vfp_reg_offset(bool dp, unsigned reg)
+ static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
  {
-     if (dp) {
+     return (EC_INSNABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
  #include "decode-neon-ls.c.inc"
  #include "decode-neon-shared.c.inc"
 -/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
 - * where 0 is the least significant end of the register.
 - */
 -static inline long
 -neon_element_offset(int reg, int element, MemOp size)
 -{
 -    int element_size = 1 << size;
 -    int ofs = element * element_size;
 -#ifdef HOST_WORDS_BIGENDIAN
 -    /* Calculate the offset assuming fully little-endian,
 -     * then XOR to account for the order of the 8-byte units.
 -     */
 -    if (element_size < 8) {
 -        ofs ^= 8 - element_size;
 -    }
 -#endif
 -    return neon_full_reg_offset(reg) + ofs;
 -}
 -
  static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
  {
      long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
 --
-.20.1
+.34.1
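
As a reading aid for the syn_gpc() helper added in the patch above, here is a standalone sketch of the same bit packing. The constant values (ARM_EL_EC_SHIFT as 26, ARM_EL_IL as bit 25) are restated here as assumptions about target/arm/syndrome.h rather than taken from this series, and the name syn_gpc_sketch is made up for the example. The field positions are copied from the patch.

```c
#include <stdint.h>

/* Assumed to match target/arm/syndrome.h; restated for a standalone build. */
#define ARM_EL_EC_SHIFT 26
#define ARM_EL_IL       (1u << 25)
#define EC_GPC          0x1eu

/*
 * Pack the GPC syndrome fields at the bit positions used by the patch:
 * S2PTW at 21, InD at 20, GPCSC at 14, CM at 8, S1PTW at 7, WnR at 6,
 * and the fault status code in the low bits.
 */
static inline uint32_t syn_gpc_sketch(int s2ptw, int ind, int gpcsc,
                                      int cm, int s1ptw, int wnr, int fsc)
{
    return (EC_GPC << ARM_EL_EC_SHIFT) | ARM_EL_IL
           | ((uint32_t)s2ptw << 21) | ((uint32_t)ind << 20)
           | ((uint32_t)gpcsc << 14) | ((uint32_t)cm << 8)
           | ((uint32_t)s1ptw << 7) | ((uint32_t)wnr << 6)
           | (uint32_t)fsc;
}
```

Under those assumptions, a data store with gpcsc 0b001101 and fsc 0b101000 packs to 0x7a034068: the exception class and IL bits contribute 0x7a000000, GPCSC contributes 0x34000, WnR contributes 0x40, and the fault status code contributes 0x28.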

-[PULL 05/26] target/arm: Add read/write_neon_element32
+[PULL 18/26] target/arm: Implement GPC exceptions
 From: Richard Henderson <richard.henderson@linaro.org>
-Model these off the aa64 read/write_vec_element functions.
+Handle GPC Fault types in arm_deliver_fault, reporting as
-Use it within translate-neon.c.inc.  The new functions do
+either a GPC exception at EL3, or falling through to insn
-not allocate or free temps, so this rearranges the calling
+or data aborts at various exception levels.
 code a bit.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
+Message-id: 20230620124418.805717-19-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  26 ++++
+ target/arm/cpu.h            |  1 +
- target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
+ target/arm/internals.h      | 27 +++++++++++
-files changed, 183 insertions(+), 99 deletions(-)
+ target/arm/helper.c         |  5 ++
  target/arm/tcg/tlb_helper.c | 96 +++++++++++++++++++++++++++++++++++--
 files changed, 126 insertions(+), 3 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/cpu.h
-+++ b/target/arm/translate.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
+@@ -XXX,XX +XXX,XX @@
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+ #define EXCP_UNALIGNED      22   /* v7M UNALIGNED UsageFault */
  #define EXCP_DIVBYZERO      23   /* v7M DIVBYZERO UsageFault */
  #define EXCP_VSERR          24
 +#define EXCP_GPC            25   /* v9 Granule Protection Check Fault */
  /* NB: add new EXCP_ defines to the array in arm_log_exception() too */
  #define ARMV7M_EXCP_RESET   1
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/internals.h
 +++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFaultType {
      ARMFault_ICacheMaint,
      ARMFault_QEMU_NSCExec, /* v8M: NS executing in S&NSC memory */
      ARMFault_QEMU_SFault, /* v8M: SecureFault INVTRAN, INVEP or AUVIOL */
 +    ARMFault_GPCFOnWalk,
 +    ARMFault_GPCFOnOutput,
  } ARMFaultType;
 +typedef enum ARMGPCF {
 +    GPCF_None,
 +    GPCF_AddressSize,
 +    GPCF_Walk,
 +    GPCF_EABT,
 +    GPCF_Fail,
 +} ARMGPCF;
 +
  /**
   * ARMMMUFaultInfo: Information describing an ARM MMU Fault
   * @type: Type of fault
 + * @gpcf: Subtype of ARMFault_GPCFOn{Walk,Output}.
   * @level: Table walk level (for translation, access flag and permission faults)
   * @domain: Domain of the fault address (for non-LPAE CPUs only)
   * @s2addr: Address that caused a fault at stage 2
 + * @paddr: physical address that caused a fault for gpc
 + * @paddr_space: physical address space that caused a fault for gpc
   * @stage2: True if we faulted at stage 2
   * @s1ptw: True if we faulted at stage 2 while doing a stage 1 page-table walk
   * @s1ns: True if we faulted on a non-secure IPA while in secure state
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFaultType {
  typedef struct ARMMMUFaultInfo ARMMMUFaultInfo;
  struct ARMMMUFaultInfo {
      ARMFaultType type;
 +    ARMGPCF gpcf;
      target_ulong s2addr;
 +    target_ulong paddr;
 +    ARMSecuritySpace paddr_space;
      int level;
      int domain;
      bool stage2;
@@ -XXX,XX +XXX,XX @@ static inline uint32_t arm_fi_to_lfsc(ARMMMUFaultInfo *fi)
      case ARMFault_Exclusive:
          fsc = 0x35;
          break;
 +    case ARMFault_GPCFOnWalk:
 +        assert(fi->level >= -1 && fi->level <= 3);
 +        if (fi->level < 0) {
 +            fsc = 0b100011;
 +        } else {
 +            fsc = 0b100100 | fi->level;
 +        }
 +        break;
 +    case ARMFault_GPCFOnOutput:
 +        fsc = 0b101000;
 +        break;
      default:
          /* Other faults can't occur in a context that requires a
           * long-format status code.
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void arm_log_exception(CPUState *cs)
              [EXCP_UNALIGNED] = "v7M UNALIGNED UsageFault",
              [EXCP_DIVBYZERO] = "v7M DIVBYZERO UsageFault",
              [EXCP_VSERR] = "Virtual SERR",
 +            [EXCP_GPC] = "Granule Protection Check",
          };
          if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
      }
      switch (cs->exception_index) {
 +    case EXCP_GPC:
 +        qemu_log_mask(CPU_LOG_INT, "...with MFAR 0x%" PRIx64 "\n",
 +                      env->cp15.mfar_el3);
 +        /* fall through */
      case EXCP_PREFETCH_ABORT:
      case EXCP_DATA_ABORT:
          /*
 diff --git a/target/arm/tcg/tlb_helper.c b/target/arm/tcg/tlb_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/tlb_helper.c
 +++ b/target/arm/tcg/tlb_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t compute_fsr_fsc(CPUARMState *env, ARMMMUFaultInfo *fi,
      return fsr;
  }
-+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
++static bool report_as_gpc_exception(ARMCPU *cpu, int current_el,
 +                                    ARMMMUFaultInfo *fi)
 +{
-+    long off = neon_element_offset(reg, ele, size);
++    bool ret;
 +
-+    switch (size) {
++    switch (fi->gpcf) {
-+    case MO_32:
++    case GPCF_None:
-+        tcg_gen_ld_i32(dest, cpu_env, off);
++        return false;
 +    case GPCF_AddressSize:
 +    case GPCF_Walk:
 +    case GPCF_EABT:
 +        /* R_PYTGX: GPT faults are reported as GPC. */
 +        ret = true;
 +        break;
 +    case GPCF_Fail:
 +        /*
 +         * R_BLYPM: A GPF at EL3 is reported as insn or data abort.
 +         * R_VBZMW, R_LXHQR: A GPF at EL[0-2] is reported as a GPC
 +         * if SCR_EL3.GPF is set, otherwise an insn or data abort.
 +         */
 +        ret = (cpu->env.cp15.scr_el3 & SCR_GPF) && current_el != 3;
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
++
++    assert(cpu_isar_feature(aa64_rme, cpu));
++    assert(fi->type == ARMFault_GPCFOnWalk ||
++           fi->type == ARMFault_GPCFOnOutput);
++    if (fi->gpcf == GPCF_AddressSize) {
++        assert(fi->level == 0);
++    } else {
++        assert(fi->level >= 0 && fi->level <= 1);
++    }
++
++    return ret;
 +}
 +
-+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
++static unsigned encode_gpcsc(ARMMMUFaultInfo *fi)
 +{
-+    long off = neon_element_offset(reg, ele, size);
++    static uint8_t const gpcsc[] = {
-+
++        [GPCF_AddressSize] = 0b000000,
-+    switch (size) {
++        [GPCF_Walk]        = 0b000100,
-+    case MO_32:
++        [GPCF_Fail]        = 0b001100,
-+        tcg_gen_st_i32(src, cpu_env, off);
++        [GPCF_EABT]        = 0b010100,
-+        break;
++    };
-+    default:
++
-+        g_assert_not_reached();
++    /* Note that we've validated fi->gpcf and fi->level above. */
-+    }
++    return gpcsc[fi->gpcf] | fi->level;
 +}
 +
- static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
+ static G_NORETURN
  void arm_deliver_fault(ARMCPU *cpu, vaddr addr,
                         MMUAccessType access_type,
                         int mmu_idx, ARMMMUFaultInfo *fi)
  {
-     TCGv_ptr ret = tcg_temp_new_ptr();
+     CPUARMState *env = &cpu->env;
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+-    int target_el;
-index XXXXXXX..XXXXXXX 100644
++    int target_el = exception_target_el(env);
---- a/target/arm/translate-neon.c.inc
++    int current_el = arm_current_el(env);
-+++ b/target/arm/translate-neon.c.inc
+     bool same_el;
-@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
+     uint32_t syn, exc, fsr, fsc;
-      * early. Since Q is 0 there are always just two passes, so instead
-      * of a complicated loop over each pass we just unroll.
+-    target_el = exception_target_el(env);
-      */
++    if (report_as_gpc_exception(cpu, current_el, fi)) {
--    tmp = neon_load_reg(a->vn, 0);
++        target_el = 3;
--    tmp2 = neon_load_reg(a->vn, 1);
++
-+    tmp = tcg_temp_new_i32();
++        fsr = compute_fsr_fsc(env, fi, target_el, mmu_idx, &fsc);
-+    tmp2 = tcg_temp_new_i32();
++
-+    tmp3 = tcg_temp_new_i32();
++        syn = syn_gpc(fi->stage2 && fi->type == ARMFault_GPCFOnWalk,
-+
++                      access_type == MMU_INST_FETCH,
-+    read_neon_element32(tmp, a->vn, 0, MO_32);
++                      encode_gpcsc(fi), 0, fi->s1ptw,
-+    read_neon_element32(tmp2, a->vn, 1, MO_32);
++                      access_type == MMU_DATA_STORE, fsc);
-     fn(tmp, tmp, tmp2);
++
--    tcg_temp_free_i32(tmp2);
++        env->cp15.mfar_el3 = fi->paddr;
++        switch (fi->paddr_space) {
--    tmp3 = neon_load_reg(a->vm, 0);
++        case ARMSS_Secure:
--    tmp2 = neon_load_reg(a->vm, 1);
++            break;
-+    read_neon_element32(tmp3, a->vm, 0, MO_32);
++        case ARMSS_NonSecure:
-+    read_neon_element32(tmp2, a->vm, 1, MO_32);
++            env->cp15.mfar_el3 |= R_MFAR_NS_MASK;
-     fn(tmp3, tmp3, tmp2);
++            break;
--    tcg_temp_free_i32(tmp2);
++        case ARMSS_Root:
++            env->cp15.mfar_el3 |= R_MFAR_NSE_MASK;
--    neon_store_reg(a->vd, 0, tmp);
++            break;
--    neon_store_reg(a->vd, 1, tmp3);
++        case ARMSS_Realm:
-+    write_neon_element32(tmp, a->vd, 0, MO_32);
++            env->cp15.mfar_el3 |= R_MFAR_NSE_MASK | R_MFAR_NS_MASK;
-+    write_neon_element32(tmp3, a->vd, 1, MO_32);
++            break;
-+
++        default:
-+    tcg_temp_free_i32(tmp);
++            g_assert_not_reached();
-+    tcg_temp_free_i32(tmp2);
++        }
-+    tcg_temp_free_i32(tmp3);
++
-     return true;
++        exc = EXCP_GPC;
- }
++        goto do_raise;
++    }
-@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
++
-      * 2-reg-and-shift operations, size < 3 case, where the
++    /* If SCR_EL3.GPF is unset, GPF may still be routed to EL2. */
-      * helper needs to be passed cpu_env.
++    if (fi->gpcf == GPCF_Fail && target_el < 2) {
-      */
++        if (arm_hcr_el2_eff(env) & HCR_GPF) {
--    TCGv_i32 constimm;
++            target_el = 2;
-+    TCGv_i32 constimm, tmp;
++        }
-     int pass;
++    }
++
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+     if (fi->stage2) {
-@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
+         target_el = 2;
-      * by immediate using the variable shift operations.
+         env->cp15.hpfar_el2 = extract64(fi->s2addr, 12, 47) << 4;
-      */
+@@ -XXX,XX +XXX,XX @@ void arm_deliver_fault(ARMCPU *cpu, vaddr addr,
-     constimm = tcg_const_i32(dup_const(a->size, a->shift));
+             env->cp15.hpfar_el2 |= HPFAR_NS;
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, cpu_env, tmp, constimm);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(constimm);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
      constimm = tcg_const_i64(-a->shift);
      rm1 = tcg_temp_new_i64();
      rm2 = tcg_temp_new_i64();
 +    rd = tcg_temp_new_i32();
      /* Load both inputs first to avoid potential overwrite if rm == rd */
      neon_load_reg64(rm1, a->vm);
      neon_load_reg64(rm2, a->vm + 1);
      shiftfn(rm1, rm1, constimm);
 -    rd = tcg_temp_new_i32();
      narrowfn(rd, cpu_env, rm1);
 -    neon_store_reg(a->vd, 0, rd);
 +    write_neon_element32(rd, a->vd, 0, MO_32);
      shiftfn(rm2, rm2, constimm);
 -    rd = tcg_temp_new_i32();
      narrowfn(rd, cpu_env, rm2);
 -    neon_store_reg(a->vd, 1, rd);
 +    write_neon_element32(rd, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i64(rm1);
      tcg_temp_free_i64(rm2);
      tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      constimm = tcg_const_i32(imm);
      /* Load all inputs first to avoid potential overwrite */
 -    rm1 = neon_load_reg(a->vm, 0);
 -    rm2 = neon_load_reg(a->vm, 1);
 -    rm3 = neon_load_reg(a->vm + 1, 0);
 -    rm4 = neon_load_reg(a->vm + 1, 1);
 +    rm1 = tcg_temp_new_i32();
 +    rm2 = tcg_temp_new_i32();
 +    rm3 = tcg_temp_new_i32();
 +    rm4 = tcg_temp_new_i32();
 +    read_neon_element32(rm1, a->vm, 0, MO_32);
 +    read_neon_element32(rm2, a->vm, 1, MO_32);
 +    read_neon_element32(rm3, a->vm, 2, MO_32);
 +    read_neon_element32(rm4, a->vm, 3, MO_32);
      rtmp = tcg_temp_new_i64();
      shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      tcg_temp_free_i32(rm2);
      narrowfn(rm1, cpu_env, rtmp);
 -    neon_store_reg(a->vd, 0, rm1);
 +    write_neon_element32(rm1, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(rm1);
      shiftfn(rm3, rm3, constimm);
      shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      narrowfn(rm3, cpu_env, rtmp);
      tcg_temp_free_i64(rtmp);
 -    neon_store_reg(a->vd, 1, rm3);
 +    write_neon_element32(rm3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rm3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          widen_mask = dup_const(a->size + 1, widen_mask);
      }
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      tmp = tcg_temp_new_i64();
      widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn0_64, a->vn);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 0);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn1_64, a->vn + 1);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 1);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      neon_store_reg64(rn0_64, a->vd);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      narrowfn(rd1, rn_64);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rn_64);
      tcg_temp_free_i64(rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i64();
      rd1 = tcg_temp_new_i64();
 -    rn = neon_load_reg(a->vn, 0);
 -    rm = neon_load_reg(a->vm, 0);
 +    rn = tcg_temp_new_i32();
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      opfn(rd0, rn, rm);
 -    tcg_temp_free_i32(rn);
 -    tcg_temp_free_i32(rm);
 -    rn = neon_load_reg(a->vn, 1);
 -    rm = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      opfn(rd1, rn, rm);
      tcg_temp_free_i32(rn);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
  static inline TCGv_i32 neon_get_scalar(int size, int reg)
  {
 -    TCGv_i32 tmp;
 -    if (size == 1) {
 -        tmp = neon_load_reg(reg & 7, reg >> 4);
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +    if (size == MO_16) {
 +        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
          if (reg & 8) {
              gen_neon_dup_high16(tmp);
          } else {
              gen_neon_dup_low16(tmp);
          }
      } else {
 -        tmp = neon_load_reg(reg & 15, reg >> 4);
 +        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
      }
      return tmp;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
       * perform an accumulation operation of that result into the
       * destination.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
 +        read_neon_element32(tmp, a->vn, pass, MO_32);
          opfn(tmp, tmp, scalar);
          if (accfn) {
 -            TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +            TCGv_i32 rd = tcg_temp_new_i32();
 +            read_neon_element32(rd, a->vd, pass, MO_32);
              accfn(tmp, rd, tmp);
              tcg_temp_free_i32(rd);
          }
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(scalar);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
       * performs a kind of fused op-then-accumulate using a helper
       * function that takes all of rd, rn and the scalar at once.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, rn, rd;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    rn = tcg_temp_new_i32();
 +    rd = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 rn = neon_load_reg(a->vn, pass);
 -        TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +        read_neon_element32(rn, a->vn, pass, MO_32);
 +        read_neon_element32(rd, a->vd, pass, MO_32);
          opfn(rd, cpu_env, rn, scalar, rd);
 -        tcg_temp_free_i32(rn);
 -        neon_store_reg(a->vd, pass, rd);
 +        write_neon_element32(rd, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i32(scalar);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      scalar = neon_get_scalar(a->size, a->vm);
      /* Load all inputs before writing any outputs, in case of overlap */
 -    rn = neon_load_reg(a->vn, 0);
 +    rn = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
      rn0_64 = tcg_temp_new_i64();
      opfn(rn0_64, rn, scalar);
 -    tcg_temp_free_i32(rn);
 -    rn = neon_load_reg(a->vn, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
      rn1_64 = tcg_temp_new_i64();
      opfn(rn1_64, rn, scalar);
      tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
          return false;
      }
      n <<= 3;
 +    tmp = tcg_temp_new_i32();
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 0);
 +        read_neon_element32(tmp, a->vd, 0, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp2 = neon_load_reg(a->vm, 0);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 0, MO_32);
      ptr1 = vfp_reg_ptr(true, a->vn);
      tmp4 = tcg_const_i32(n);
      gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
 -    tcg_temp_free_i32(tmp);
 +
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 1);
 +        read_neon_element32(tmp, a->vd, 1, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp3 = neon_load_reg(a->vm, 1);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 1, MO_32);
      gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(tmp4);
      tcg_temp_free_ptr(ptr1);
 -    neon_store_reg(a->vd, 0, tmp2);
 -    neon_store_reg(a->vd, 1, tmp3);
 -    tcg_temp_free_i32(tmp);
 +
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
  static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
  {
      int pass, half;
 +    TCGv_i32 tmp[2];
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
          return true;
      }
 -    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        TCGv_i32 tmp[2];
 +    tmp[0] = tcg_temp_new_i32();
 +    tmp[1] = tcg_temp_new_i32();
 +    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
          for (half = 0; half < 2; half++) {
 -            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
 +            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
              switch (a->size) {
              case 0:
                  tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                  g_assert_not_reached();
              }
          }
 -        neon_store_reg(a->vd, pass * 2, tmp[1]);
 -        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
 +        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
 +        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
      }
 +
 +    tcg_temp_free_i32(tmp[0]);
 +    tcg_temp_free_i32(tmp[1]);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          rm0_64 = tcg_temp_new_i64();
          rm1_64 = tcg_temp_new_i64();
          rd_64 = tcg_temp_new_i64();
 -        tmp = neon_load_reg(a->vm, pass * 2);
 +
 +        tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
          widenfn(rm0_64, tmp);
 -        tcg_temp_free_i32(tmp);
 -        tmp = neon_load_reg(a->vm, pass * 2 + 1);
 +        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
          widenfn(rm1_64, tmp);
          tcg_temp_free_i32(tmp);
 +
          opfn(rd_64, rm0_64, rm1_64);
          tcg_temp_free_i64(rm0_64);
          tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      narrowfn(rd0, cpu_env, rm);
      neon_load_reg64(rm, a->vm + 1);
      narrowfn(rd1, cpu_env, rm);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rm);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      }
      rd = tcg_temp_new_i64();
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
 -    tmp = neon_load_reg(a->vm, 0);
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
      tcg_gen_shli_i32(tmp2, tmp2, 16);
      tcg_gen_or_i32(tmp2, tmp2, tmp);
 -    tcg_temp_free_i32(tmp);
 -    tmp = neon_load_reg(a->vm, 2);
 +    read_neon_element32(tmp, a->vm, 2, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp3 = neon_load_reg(a->vm, 3);
 -    neon_store_reg(a->vd, 0, tmp2);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 3, MO_32);
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(tmp2);
      gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
      tcg_gen_shli_i32(tmp3, tmp3, 16);
      tcg_gen_or_i32(tmp3, tmp3, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
      tmp3 = tcg_temp_new_i32();
 -    tmp = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      tcg_gen_ext16u_i32(tmp3, tmp);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 0, tmp3);
 +    write_neon_element32(tmp3, a->vd, 0, MO_32);
      tcg_gen_shri_i32(tmp, tmp, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
 -    neon_store_reg(a->vd, 1, tmp);
 -    tmp3 = tcg_temp_new_i32();
 +    write_neon_element32(tmp, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp);
      tcg_gen_ext16u_i32(tmp3, tmp2);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 2, tmp3);
 +    write_neon_element32(tmp3, a->vd, 2, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_gen_shri_i32(tmp2, tmp2, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
 -    neon_store_reg(a->vd, 3, tmp2);
 +    write_neon_element32(tmp2, a->vd, 3, MO_32);
 +    tcg_temp_free_i32(tmp2);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
  static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
  {
 +    TCGv_i32 tmp;
      int pass;
      /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
          return true;
      }
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, tmp);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
          return true;
      }
 -    if (a->size == 2) {
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    if (a->size == MO_32) {
          for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass + 1);
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass + 1, tmp);
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
          }
      } else {
          for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass);
 -            if (a->size == 0) {
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass, MO_32);
 +            if (a->size == MO_8) {
                  gen_neon_trn_u8(tmp, tmp2);
              } else {
                  gen_neon_trn_u16(tmp, tmp2);
              }
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass, tmp);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass, MO_32);
          }
      }
-+    tcg_temp_free_i32(tmp);
+-    same_el = (arm_current_el(env) == target_el);
-+    tcg_temp_free_i32(tmp2);
-     return true;
++    same_el = current_el == target_el;
- }
+     fsr = compute_fsr_fsc(env, fi, target_el, mmu_idx, &fsc);
      if (access_type == MMU_INST_FETCH) {
@@ -XXX,XX +XXX,XX @@ void arm_deliver_fault(ARMCPU *cpu, vaddr addr,
          exc = EXCP_DATA_ABORT;
      }
 + do_raise:
      env->exception.vaddress = addr;
      env->exception.fsr = fsr;
      raise_exception(env, exc, syn, target_el);
 --
-2.20.1
+2.34.1

-[PULL 04/26] target/arm: Use neon_element_offset in vfp_reg_offset
+[PULL 19/26] target/arm: Implement the granule protection check
 From: Richard Henderson <richard.henderson@linaro.org>
-This seems a bit more readable than using offsetof CPU_DoubleU.
+Place the check at the end of get_phys_addr_with_struct,
 so that we check all physical results.
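
For orientation only (this sketch is not part of the patch): the level 1 lookup in granule_protection_check() uses one slice of the physical address to select a 64-bit GPT entry and the next four bits above the granule offset to select a 4-bit GPI slot inside that entry. The GPI is then compared against the security space of the access. The standalone C below is a minimal illustration under stated assumptions: extract_bits() is a hypothetical local stand-in for QEMU's extract64(), the function names are invented for the example, and the security-space values 0..3 are assumed to mean Secure, NonSecure, Root and Realm, as the `pspace == (gpi & 3)` comparison in the patch suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's extract64(): 'length' bits from 'start'. */
static uint64_t extract_bits(uint64_t value, unsigned start, unsigned length)
{
    assert(length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Index of the 64-bit level 1 entry that covers 'paddress'. */
static uint64_t gpt_l1_index(uint64_t paddress, unsigned pgs, unsigned l0gptsz)
{
    return extract_bits(paddress, pgs + 4, l0gptsz - pgs - 4);
}

/* GPI for 'paddress' taken from a granules descriptor (16 slots of 4 bits). */
static unsigned gpt_l1_gpi(uint64_t entry, uint64_t paddress, unsigned pgs)
{
    unsigned slot = (unsigned)extract_bits(paddress, pgs, 4);
    return (unsigned)extract_bits(entry, slot * 4, 4);
}

/*
 * Simplified permission decision. Reserved encodings are simply denied
 * here, whereas the patch reports them as a GPT walk fault.
 */
static bool gpi_allows(unsigned gpi, unsigned pspace)
{
    if (gpi == 0xf) {
        return true;                    /* all access */
    }
    if (gpi >= 0x8 && gpi <= 0xb) {
        return pspace == (gpi & 3);     /* exactly one space permitted */
    }
    return false;                       /* no access, or reserved */
}
```

With 4KB granules (pgs = 12) and a 1GB level 0 region (l0gptsz = 30), each level 1 entry in this sketch covers 16 granules, that is 64KB of physical address space.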
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
+Message-id: 20230620124418.805717-20-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c | 13 ++++---------
+ target/arm/ptw.c | 249 +++++++++++++++++++++++++++++++++++++++++++----
- 1 file changed, 4 insertions(+), 9 deletions(-)
+ 1 file changed, 232 insertions(+), 17 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/ptw.c
-+++ b/target/arm/translate.c
++++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
+@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
-     return neon_full_reg_offset(reg) + ofs;
+     void *out_host;
- }
+ } S1Translate;
--static inline long vfp_reg_offset(bool dp, unsigned reg)
+-static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
-+/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
+-                                      target_ulong address,
-+static long vfp_reg_offset(bool dp, unsigned reg)
+-                                      MMUAccessType access_type,
 -                                      GetPhysAddrResult *result,
 -                                      ARMMMUFaultInfo *fi);
 +static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
 +                                target_ulong address,
 +                                MMUAccessType access_type,
 +                                GetPhysAddrResult *result,
 +                                ARMMMUFaultInfo *fi);
 +
 +static bool get_phys_addr_gpc(CPUARMState *env, S1Translate *ptw,
 +                              target_ulong address,
 +                              MMUAccessType access_type,
 +                              GetPhysAddrResult *result,
 +                              ARMMMUFaultInfo *fi);
  /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
  static const uint8_t pamax_map[] = {
@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
      return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
  }
 +static bool granule_protection_check(CPUARMState *env, uint64_t paddress,
 +                                     ARMSecuritySpace pspace,
 +                                     ARMMMUFaultInfo *fi)
 +{
 +    MemTxAttrs attrs = {
 +        .secure = true,
 +        .space = ARMSS_Root,
 +    };
 +    ARMCPU *cpu = env_archcpu(env);
 +    uint64_t gpccr = env->cp15.gpccr_el3;
 +    unsigned pps, pgs, l0gptsz, level = 0;
 +    uint64_t tableaddr, pps_mask, align, entry, index;
 +    AddressSpace *as;
 +    MemTxResult result;
 +    int gpi;
 +
 +    if (!FIELD_EX64(gpccr, GPCCR, GPC)) {
 +        return true;
 +    }
 +
 +    /*
 +     * GPC Priority 1 (R_GMGRR):
 +     * R_JWCSM: If the configuration of GPCCR_EL3 is invalid,
 +     * the access fails as GPT walk fault at level 0.
 +     */
 +
 +    /*
 +     * Configuration of PPS to a value exceeding the implemented
 +     * physical address size is invalid.
 +     */
 +    pps = FIELD_EX64(gpccr, GPCCR, PPS);
 +    if (pps > FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE)) {
 +        goto fault_walk;
 +    }
 +    pps = pamax_map[pps];
 +    pps_mask = MAKE_64BIT_MASK(0, pps);
 +
 +    switch (FIELD_EX64(gpccr, GPCCR, SH)) {
 +    case 0b10: /* outer shareable */
 +        break;
 +    case 0b00: /* non-shareable */
 +    case 0b11: /* inner shareable */
 +        /* Inner and Outer non-cacheable requires Outer shareable. */
 +        if (FIELD_EX64(gpccr, GPCCR, ORGN) == 0 &&
 +            FIELD_EX64(gpccr, GPCCR, IRGN) == 0) {
 +            goto fault_walk;
 +        }
 +        break;
 +    default:   /* reserved */
 +        goto fault_walk;
 +    }
 +
 +    switch (FIELD_EX64(gpccr, GPCCR, PGS)) {
 +    case 0b00: /* 4KB */
 +        pgs = 12;
 +        break;
 +    case 0b01: /* 64KB */
 +        pgs = 16;
 +        break;
 +    case 0b10: /* 16KB */
 +        pgs = 14;
 +        break;
 +    default: /* reserved */
 +        goto fault_walk;
 +    }
 +
 +    /* Note this field is read-only and fixed at reset. */
 +    l0gptsz = 30 + FIELD_EX64(gpccr, GPCCR, L0GPTSZ);
 +
 +    /*
 +     * GPC Priority 2: Secure, Realm or Root address exceeds PPS.
 +     * R_CPDSB: A NonSecure physical address input exceeding PPS
 +     * does not experience any fault.
 +     */
 +    if (paddress & ~pps_mask) {
 +        if (pspace == ARMSS_NonSecure) {
 +            return true;
 +        }
 +        goto fault_size;
 +    }
 +
 +    /* GPC Priority 3: the base address of GPTBR_EL3 exceeds PPS. */
 +    tableaddr = env->cp15.gptbr_el3 << 12;
 +    if (tableaddr & ~pps_mask) {
 +        goto fault_size;
 +    }
 +
 +    /*
 +     * BADDR is aligned per a function of PPS and L0GPTSZ.
 +     * These bits of GPTBR_EL3 are RES0, but are not a configuration error,
 +     * unlike the RES0 bits of the GPT entries (R_XNKFZ).
 +     */
 +    align = MAX(pps - l0gptsz + 3, 12);
 +    align = MAKE_64BIT_MASK(0, align);
 +    tableaddr &= ~align;
 +
 +    as = arm_addressspace(env_cpu(env), attrs);
 +
 +    /* Level 0 lookup. */
 +    index = extract64(paddress, l0gptsz, pps - l0gptsz);
 +    tableaddr += index * 8;
 +    entry = address_space_ldq_le(as, tableaddr, attrs, &result);
 +    if (result != MEMTX_OK) {
 +        goto fault_eabt;
 +    }
 +
 +    switch (extract32(entry, 0, 4)) {
 +    case 1: /* block descriptor */
 +        if (entry >> 8) {
 +            goto fault_walk; /* RES0 bits not 0 */
 +        }
 +        gpi = extract32(entry, 4, 4);
 +        goto found;
 +    case 3: /* table descriptor */
 +        tableaddr = entry & ~0xf;
 +        align = MAX(l0gptsz - pgs - 1, 12);
 +        align = MAKE_64BIT_MASK(0, align);
 +        if (tableaddr & (~pps_mask | align)) {
 +            goto fault_walk; /* RES0 bits not 0 */
 +        }
 +        break;
 +    default: /* invalid */
 +        goto fault_walk;
 +    }
 +
 +    /* Level 1 lookup */
 +    level = 1;
 +    index = extract64(paddress, pgs + 4, l0gptsz - pgs - 4);
 +    tableaddr += index * 8;
 +    entry = address_space_ldq_le(as, tableaddr, attrs, &result);
 +    if (result != MEMTX_OK) {
 +        goto fault_eabt;
 +    }
 +
 +    switch (extract32(entry, 0, 4)) {
 +    case 1: /* contiguous descriptor */
 +        if (entry >> 10) {
 +            goto fault_walk; /* RES0 bits not 0 */
 +        }
 +        /*
 +         * Because the softmmu tlb only works on units of TARGET_PAGE_SIZE,
 +         * and because we cannot invalidate by pa, and thus will always
 +         * flush entire tlbs, we don't actually care about the range here
 +         * and can simply extract the GPI as the result.
 +         */
 +        if (extract32(entry, 8, 2) == 0) {
 +            goto fault_walk; /* reserved contig */
 +        }
 +        gpi = extract32(entry, 4, 4);
 +        break;
 +    default:
 +        index = extract64(paddress, pgs, 4);
 +        gpi = extract64(entry, index * 4, 4);
 +        break;
 +    }
 +
 + found:
 +    switch (gpi) {
 +    case 0b0000: /* no access */
 +        break;
 +    case 0b1111: /* all access */
 +        return true;
 +    case 0b1000:
 +    case 0b1001:
 +    case 0b1010:
 +    case 0b1011:
 +        if (pspace == (gpi & 3)) {
 +            return true;
 +        }
 +        break;
 +    default:
 +        goto fault_walk; /* reserved */
 +    }
 +
 +    fi->gpcf = GPCF_Fail;
 +    goto fault_common;
 + fault_eabt:
 +    fi->gpcf = GPCF_EABT;
 +    goto fault_common;
 + fault_size:
 +    fi->gpcf = GPCF_AddressSize;
 +    goto fault_common;
 + fault_walk:
 +    fi->gpcf = GPCF_Walk;
 + fault_common:
 +    fi->level = level;
 +    fi->paddr = paddress;
 +    fi->paddr_space = pspace;
 +    return false;
 +}
 +
  static bool S2_attrs_are_device(uint64_t hcr, uint8_t attrs)
  {
-     if (dp) {
+     /*
--        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
-+        return neon_element_offset(reg, 0, MO_64);
+         };
-     } else {
+         GetPhysAddrResult s2 = { };
--        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
--        if (reg & 1) {
+-        if (get_phys_addr_with_struct(env, &s2ptw, addr,
--            ofs += offsetof(CPU_DoubleU, l.upper);
+-                                      MMU_DATA_LOAD, &s2, fi)) {
--        } else {
++        if (get_phys_addr_gpc(env, &s2ptw, addr, MMU_DATA_LOAD, &s2, fi)) {
--            ofs += offsetof(CPU_DoubleU, l.lower);
+             goto fail;
--        }
+         }
--        return ofs;
++
-+        return neon_element_offset(reg >> 1, reg & 1, MO_32);
+         ptw->out_phys = s2.f.phys_addr;
          pte_attrs = s2.cacheattrs.attrs;
          ptw->out_host = NULL;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
   fail:
      assert(fi->type != ARMFault_None);
 +    if (fi->type == ARMFault_GPCFOnOutput) {
 +        fi->type = ARMFault_GPCFOnWalk;
 +    }
      fi->s2addr = addr;
      fi->stage2 = true;
      fi->s1ptw = true;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
                                     ARMMMUFaultInfo *fi)
  {
      uint8_t memattr = 0x00;    /* Device nGnRnE */
 -    uint8_t shareability = 0;  /* non-sharable */
 +    uint8_t shareability = 0;  /* non-shareable */
      int r_el;
      switch (mmu_idx) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
              } else {
                  memattr = 0x44;  /* Normal, NC, No */
              }
 -            shareability = 2; /* outer sharable */
 +            shareability = 2; /* outer shareable */
          }
          result->cacheattrs.is_s2_format = false;
          break;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
      ARMSecuritySpace ipa_space;
      uint64_t hcr;
 -    ret = get_phys_addr_with_struct(env, ptw, address, access_type, result, fi);
 +    ret = get_phys_addr_nogpc(env, ptw, address, access_type, result, fi);
      /* If S1 fails, return early.  */
      if (ret) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
      cacheattrs1 = result->cacheattrs;
      memset(result, 0, sizeof(*result));
 -    ret = get_phys_addr_with_struct(env, ptw, ipa, access_type, result, fi);
 +    ret = get_phys_addr_nogpc(env, ptw, ipa, access_type, result, fi);
      fi->s2addr = ipa;
      /* Combine the S1 and S2 perms.  */
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
      return false;
  }
 -static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
 +static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
                                        target_ulong address,
                                        MMUAccessType access_type,
                                        GetPhysAddrResult *result,
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
      }
  }
++static bool get_phys_addr_gpc(CPUARMState *env, S1Translate *ptw,
++                              target_ulong address,
++                              MMUAccessType access_type,
++                              GetPhysAddrResult *result,
++                              ARMMMUFaultInfo *fi)
++{
++    if (get_phys_addr_nogpc(env, ptw, address, access_type, result, fi)) {
++        return true;
++    }
++    if (!granule_protection_check(env, result->f.phys_addr,
++                                  result->f.attrs.space, fi)) {
++        fi->type = ARMFault_GPCFOnOutput;
++        return true;
++    }
++    return false;
++}
++
+ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
+                                MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                                bool is_secure, GetPhysAddrResult *result,
+@@ -XXX,XX +XXX,XX @@ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
+         .in_secure = is_secure,
+         .in_space = arm_secure_to_space(is_secure),
+     };
+-    return get_phys_addr_with_struct(env, &ptw, address, access_type,
+-                                     result, fi);
++    return get_phys_addr_gpc(env, &ptw, address, access_type, result, fi);
+ }
+ bool get_phys_addr(CPUARMState *env, target_ulong address,
+@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
+     ptw.in_space = ss;
+     ptw.in_secure = arm_space_is_secure(ss);
+-    return get_phys_addr_with_struct(env, &ptw, address, access_type,
+-                                     result, fi);
++    return get_phys_addr_gpc(env, &ptw, address, access_type, result, fi);
+ }
+ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
+@@ -XXX,XX +XXX,XX @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
+     ARMMMUFaultInfo fi = {};
+     bool ret;
+-    ret = get_phys_addr_with_struct(env, &ptw, addr, MMU_DATA_LOAD, &res, &fi);
++    ret = get_phys_addr_gpc(env, &ptw, addr, MMU_DATA_LOAD, &res, &fi);
+     *attrs = res.f.attrs;
+     if (ret) {
 --
-.20.1
+.34.1
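
The wrapper pattern in the patch above (translate first, then run the granule protection check on the output address, and relabel an output fault as a walk fault when it happens during a stage 1 table walk) can be sketched in isolation. This is a minimal illustration, not QEMU code: the types, the toy identity mapping, and the names translate_nogpc, gpc_permits, translate_gpc and walk_translate are hypothetical stand-ins for get_phys_addr_nogpc, granule_protection_check, get_phys_addr_gpc and the S1_ptw_translate fail path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for ARMFaultType / ARMMMUFaultInfo. */
typedef enum {
    FAULT_NONE,
    FAULT_TRANSLATION,
    FAULT_GPCF_ON_WALK,
    FAULT_GPCF_ON_OUTPUT,
} FaultType;

typedef struct {
    FaultType type;
} FaultInfo;

/* Toy translation without GPC: identity map, address 0 is unmapped. */
static bool translate_nogpc(uint64_t addr, uint64_t *phys, FaultInfo *fi)
{
    if (addr == 0) {
        fi->type = FAULT_TRANSLATION;
        return true;            /* true means "faulted", as in the patch */
    }
    *phys = addr;
    return false;
}

/* Toy granule protection check: only addresses below 0x1000 are allowed. */
static bool gpc_permits(uint64_t phys)
{
    return phys < 0x1000;
}

/* Same shape as get_phys_addr_gpc(): translate, then check the output. */
static bool translate_gpc(uint64_t addr, uint64_t *phys, FaultInfo *fi)
{
    if (translate_nogpc(addr, phys, fi)) {
        return true;
    }
    if (!gpc_permits(*phys)) {
        fi->type = FAULT_GPCF_ON_OUTPUT;
        return true;
    }
    return false;
}

/* Same shape as the S1_ptw_translate() fail path: relabel output faults. */
static bool walk_translate(uint64_t table_addr, uint64_t *phys, FaultInfo *fi)
{
    if (translate_gpc(table_addr, phys, fi)) {
        assert(fi->type != FAULT_NONE);
        if (fi->type == FAULT_GPCF_ON_OUTPUT) {
            fi->type = FAULT_GPCF_ON_WALK;
        }
        return true;
    }
    return false;
}
```

Keeping the unchecked translation as a separate function is what lets the two-stage path call it twice without running the check on the intermediate address, while every external entry point goes through the checked wrapper.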

-[PULL 08/26] target/arm: Add read/write_neon_element64
+[PULL 20/26] target/arm: Add cpu properties for enabling FEAT_RME
 From: Richard Henderson <richard.henderson@linaro.org>
-Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.
+Add an x-rme cpu property to enable FEAT_RME.
 Add an x-l0gptsz property to set GPCCR_EL3.L0GPTSZ,
 for testing various possible configurations.
 We're not currently completely sure whether FEAT_RME will
 be OK to enable purely as a CPU-level property, or if it will
 need board co-operation, so we're making these experimental
 x- properties, so that the people developing the system
 level software for RME can try to start using this and let
 us know how it goes. The command line syntax for enabling
 this will change in future, without backwards-compatibility.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
+Message-id: 20230620124418.805717-21-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          | 26 +++++++++
+ target/arm/tcg/cpu64.c | 53 ++++++++++++++++++++++++++++++++++++++++++
- target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
+file changed, 53 insertions(+)
 files changed, 73 insertions(+), 47 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/cpu64.c
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
+@@ -XXX,XX +XXX,XX @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
-     }
+     cpu->sve_max_vq = max_vq;
  }
-+static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
++static bool cpu_arm_get_rme(Object *obj, Error **errp)
 +{
-+    long off = neon_element_offset(reg, ele, memop);
++    ARMCPU *cpu = ARM_CPU(obj);
 +    return cpu_isar_feature(aa64_rme, cpu);
 +}
 +
-+    switch (memop) {
++static void cpu_arm_set_rme(Object *obj, bool value, Error **errp)
-+    case MO_Q:
++{
-+        tcg_gen_ld_i64(dest, cpu_env, off);
++    ARMCPU *cpu = ARM_CPU(obj);
 +    uint64_t t;
 +
 +    t = cpu->isar.id_aa64pfr0;
 +    t = FIELD_DP64(t, ID_AA64PFR0, RME, value);
 +    cpu->isar.id_aa64pfr0 = t;
 +}
 +
 +static void cpu_max_set_l0gptsz(Object *obj, Visitor *v, const char *name,
 +                                void *opaque, Error **errp)
 +{
 +    ARMCPU *cpu = ARM_CPU(obj);
 +    uint32_t value;
 +
 +    if (!visit_type_uint32(v, name, &value, errp)) {
 +        return;
 +    }
 +
 +    /* Encode the value for the GPCCR_EL3 field. */
 +    switch (value) {
 +    case 30:
 +    case 34:
 +    case 36:
 +    case 39:
 +        cpu->reset_l0gptsz = value - 30;
 +        break;
 +    default:
-+        g_assert_not_reached();
++        error_setg(errp, "invalid value for l0gptsz");
 +        error_append_hint(errp, "valid values are 30, 34, 36, 39\n");
 +        break;
 +    }
 +}
 +
- static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
++static void cpu_max_get_l0gptsz(Object *obj, Visitor *v, const char *name,
- {
++                                void *opaque, Error **errp)
      long off = neon_element_offset(reg, ele, memop);
@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
      }
  }
 +static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
 +{
-+    long off = neon_element_offset(reg, ele, memop);
++    ARMCPU *cpu = ARM_CPU(obj);
 +    uint32_t value = cpu->reset_l0gptsz + 30;
 +
-+    switch (memop) {
++    visit_type_uint32(v, name, &value, errp);
 +    case MO_64:
 +        tcg_gen_st_i64(src, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
- static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
+ static Property arm_cpu_lpa2_property =
- {
+     DEFINE_PROP_BOOL("lpa2", ARMCPU, prop_lpa2, true);
-     TCGv_ptr ret = tcg_temp_new_ptr();
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
-index XXXXXXX..XXXXXXX 100644
+     aarch64_add_sme_properties(obj);
---- a/target/arm/translate-neon.c.inc
+     object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
-+++ b/target/arm/translate-neon.c.inc
+                         cpu_max_set_sve_max_vq, NULL, NULL);
-@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
++    object_property_add_bool(obj, "x-rme", cpu_arm_get_rme, cpu_arm_set_rme);
-     for (pass = 0; pass < a->q + 1; pass++) {
++    object_property_add(obj, "x-l0gptsz", "uint32", cpu_max_get_l0gptsz,
-         TCGv_i64 tmp = tcg_temp_new_i64();
++                        cpu_max_set_l0gptsz, NULL, NULL);
+     qdev_property_add_static(DEVICE(obj), &arm_cpu_lpa2_property);
 -        neon_load_reg64(tmp, a->vm + pass);
 +        read_neon_element64(tmp, a->vm, pass, MO_64);
          fn(tmp, cpu_env, tmp, constimm);
 -        neon_store_reg64(tmp, a->vd + pass);
 +        write_neon_element64(tmp, a->vd, pass, MO_64);
          tcg_temp_free_i64(tmp);
      }
      tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
      rd = tcg_temp_new_i32();
      /* Load both inputs first to avoid potential overwrite if rm == rd */
 -    neon_load_reg64(rm1, a->vm);
 -    neon_load_reg64(rm2, a->vm + 1);
 +    read_neon_element64(rm1, a->vm, 0, MO_64);
 +    read_neon_element64(rm2, a->vm, 1, MO_64);
      shiftfn(rm1, rm1, constimm);
      narrowfn(rd, cpu_env, rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
      }
 -    neon_store_reg64(tmp, a->vd);
 +    write_neon_element64(tmp, a->vd, 0, MO_64);
      widenfn(tmp, rm1);
      tcg_temp_free_i32(rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
      }
 -    neon_store_reg64(tmp, a->vd + 1);
 +    write_neon_element64(tmp, a->vd, 1, MO_64);
      tcg_temp_free_i64(tmp);
      return true;
  }
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm_64 = tcg_temp_new_i64();
      if (src1_wide) {
 -        neon_load_reg64(rn0_64, a->vn);
 +        read_neon_element64(rn0_64, a->vn, 0, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
       * avoid incorrect results if a narrow input overlaps with the result.
       */
      if (src1_wide) {
 -        neon_load_reg64(rn1_64, a->vn + 1);
 +        read_neon_element64(rn1_64, a->vn, 1, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm = tcg_temp_new_i32();
      read_neon_element32(rm, a->vm, 1, MO_32);
 -    neon_store_reg64(rn0_64, a->vd);
 +    write_neon_element64(rn0_64, a->vd, 0, MO_64);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
 -    neon_store_reg64(rn1_64, a->vd + 1);
 +    write_neon_element64(rn1_64, a->vd, 1, MO_64);
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rn_64, a->vn);
 -    neon_load_reg64(rm_64, a->vm);
 +    read_neon_element64(rn_64, a->vn, 0, MO_64);
 +    read_neon_element64(rm_64, a->vm, 0, MO_64);
      opfn(rn_64, rn_64, rm_64);
      narrowfn(rd0, rn_64);
 -    neon_load_reg64(rn_64, a->vn + 1);
 -    neon_load_reg64(rm_64, a->vm + 1);
 +    read_neon_element64(rn_64, a->vn, 1, MO_64);
 +    read_neon_element64(rm_64, a->vm, 1, MO_64);
      opfn(rn_64, rn_64, rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      /* Don't store results until after all loads: they might overlap */
      if (accfn) {
          tmp = tcg_temp_new_i64();
 -        neon_load_reg64(tmp, a->vd);
 +        read_neon_element64(tmp, a->vd, 0, MO_64);
          accfn(tmp, tmp, rd0);
 -        neon_store_reg64(tmp, a->vd);
 -        neon_load_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 0, MO_64);
 +        read_neon_element64(tmp, a->vd, 1, MO_64);
          accfn(tmp, tmp, rd1);
 -        neon_store_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 1, MO_64);
          tcg_temp_free_i64(tmp);
      } else {
 -        neon_store_reg64(rd0, a->vd);
 -        neon_store_reg64(rd1, a->vd + 1);
 +        write_neon_element64(rd0, a->vd, 0, MO_64);
 +        write_neon_element64(rd1, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      if (accfn) {
          TCGv_i64 t64 = tcg_temp_new_i64();
 -        neon_load_reg64(t64, a->vd);
 +        read_neon_element64(t64, a->vd, 0, MO_64);
          accfn(t64, t64, rn0_64);
 -        neon_store_reg64(t64, a->vd);
 -        neon_load_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 0, MO_64);
 +        read_neon_element64(t64, a->vd, 1, MO_64);
          accfn(t64, t64, rn1_64);
 -        neon_store_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 1, MO_64);
          tcg_temp_free_i64(t64);
      } else {
 -        neon_store_reg64(rn0_64, a->vd);
 -        neon_store_reg64(rn1_64, a->vd + 1);
 +        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 +        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          right = tcg_temp_new_i64();
          dest = tcg_temp_new_i64();
 -        neon_load_reg64(right, a->vn);
 -        neon_load_reg64(left, a->vm);
 +        read_neon_element64(right, a->vn, 0, MO_64);
 +        read_neon_element64(left, a->vm, 0, MO_64);
          tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
 -        neon_store_reg64(dest, a->vd);
 +        write_neon_element64(dest, a->vd, 0, MO_64);
          tcg_temp_free_i64(left);
          tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          destright = tcg_temp_new_i64();
          if (a->imm < 8) {
 -            neon_load_reg64(right, a->vn);
 -            neon_load_reg64(middle, a->vn + 1);
 +            read_neon_element64(right, a->vn, 0, MO_64);
 +            read_neon_element64(middle, a->vn, 1, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
 -            neon_load_reg64(left, a->vm);
 +            read_neon_element64(left, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
          } else {
 -            neon_load_reg64(right, a->vn + 1);
 -            neon_load_reg64(middle, a->vm);
 +            read_neon_element64(right, a->vn, 1, MO_64);
 +            read_neon_element64(middle, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
 -            neon_load_reg64(left, a->vm + 1);
 +            read_neon_element64(left, a->vm, 1, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
          }
 -        neon_store_reg64(destright, a->vd);
 -        neon_store_reg64(destleft, a->vd + 1);
 +        write_neon_element64(destright, a->vd, 0, MO_64);
 +        write_neon_element64(destleft, a->vd, 1, MO_64);
          tcg_temp_free_i64(destright);
          tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          if (accfn) {
              TCGv_i64 tmp64 = tcg_temp_new_i64();
 -            neon_load_reg64(tmp64, a->vd + pass);
 +            read_neon_element64(tmp64, a->vd, pass, MO_64);
              accfn(rd_64, tmp64, rd_64);
              tcg_temp_free_i64(tmp64);
          }
 -        neon_store_reg64(rd_64, a->vd + pass);
 +        write_neon_element64(rd_64, a->vd, pass, MO_64);
          tcg_temp_free_i64(rd_64);
      }
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rm, a->vm);
 +    read_neon_element64(rm, a->vm, 0, MO_64);
      narrowfn(rd0, cpu_env, rm);
 -    neon_load_reg64(rm, a->vm + 1);
 +    read_neon_element64(rm, a->vm, 1, MO_64);
      narrowfn(rd1, cpu_env, rm);
      write_neon_element32(rd0, a->vd, 0, MO_32);
      write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd);
 +    write_neon_element64(rd, a->vd, 0, MO_64);
      widenfn(rd, rm1);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd + 1);
 +    write_neon_element64(rd, a->vd, 1, MO_64);
      tcg_temp_free_i64(rd);
      tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
      rm = tcg_temp_new_i64();
      rd = tcg_temp_new_i64();
      for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        neon_load_reg64(rm, a->vm + pass);
 -        neon_load_reg64(rd, a->vd + pass);
 -        neon_store_reg64(rm, a->vd + pass);
 -        neon_store_reg64(rd, a->vm + pass);
 +        read_neon_element64(rm, a->vm, pass, MO_64);
 +        read_neon_element64(rd, a->vd, pass, MO_64);
 +        write_neon_element64(rm, a->vd, pass, MO_64);
 +        write_neon_element64(rd, a->vm, pass, MO_64);
      }
      tcg_temp_free_i64(rm);
      tcg_temp_free_i64(rd);
 --
-.20.1
+.34.1
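
The x-l0gptsz setter in the FEAT_RME properties patch accepts only 30, 34, 36 and 39 and stores value - 30 as the reset value of the GPCCR_EL3.L0GPTSZ field; the getter adds 30 back. The standalone sketch below restates that mapping outside QEMU. The helper names l0gptsz_encode and l0gptsz_decode are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: map a user-visible x-l0gptsz
 * value to the field encoding used by the patch (value - 30).
 * Returns false for values the property setter would reject.
 */
static bool l0gptsz_encode(uint32_t value, uint8_t *field)
{
    switch (value) {
    case 30:
    case 34:
    case 36:
    case 39:
        *field = (uint8_t)(value - 30);
        return true;
    default:
        return false;
    }
}

/* Inverse mapping, as done by the property getter in the patch. */
static uint32_t l0gptsz_decode(uint8_t field)
{
    /* Only the four encodings produced above are expected here. */
    assert(field == 0 || field == 4 || field == 6 || field == 9);
    return (uint32_t)field + 30;
}
```

On the command line this would presumably look like -cpu max,x-rme=on,x-l0gptsz=36, going by the property names registered in aarch64_max_tcg_initfn; as the commit message notes, that syntax is experimental and expected to change.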

-[PULL 13/26] target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
+[PULL 21/26] docs/system/arm: Document FEAT_RME
-The helper functions for performing the udot/sdot operations against
+From: Richard Henderson <richard.henderson@linaro.org>
 a scalar were not using an address-swizzling macro when converting
 the index of the scalar element into a pointer into the vm array.
 This had no effect on little-endian hosts but meant we generated
 incorrect results on big-endian hosts.
-For these insns, the index is indexing over group of 4 8-bit values,
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-so 32 bits per indexed entity, and H4() is therefore what we want.
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-(For Neon the only possible input indexes are 0 and 1.)
+Message-id: 20230622143046.1578160-1-richard.henderson@linaro.org
 [PMM: fixed typo; note experimental status in emulation.rst too]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  docs/system/arm/cpu-features.rst | 23 +++++++++++++++++++++++
  docs/system/arm/emulation.rst    |  1 +
 files changed, 24 insertions(+)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
 ---
  target/arm/vec_helper.c | 4 ++--
 file changed, 2 insertions(+), 2 deletions(-)
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
+--- a/docs/system/arm/cpu-features.rst
-+++ b/target/arm/vec_helper.c
++++ b/docs/system/arm/cpu-features.rst
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+@@ -XXX,XX +XXX,XX @@ As with ``sve-default-vector-length``, if the default length is larger
-     intptr_t index = simd_data(desc);
+ than the maximum vector length enabled, the actual vector length will
-     uint32_t *d = vd;
+ be reduced.  If this property is set to ``-1`` then the default vector
-     int8_t *n = vn;
+ length is set to the maximum possible length.
--    int8_t *m_indexed = (int8_t *)vm + index * 4;
++
-+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
++RME CPU Properties
++==================
-     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
++
-      * Otherwise opr_sz is a multiple of 16.
++The status of RME support with QEMU is experimental.  At this time we
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
++only support RME within the CPU proper, not within the SMMU or GIC.
-     intptr_t index = simd_data(desc);
++The feature is enabled by the CPU property ``x-rme``, with the ``x-``
-     uint32_t *d = vd;
++prefix present as a reminder of the experimental status, and defaults off.
-     uint8_t *n = vn;
++
--    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
++The method for enabling RME will change in some future QEMU release
-+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
++without notice or backward compatibility.
++
-     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
++RME Level 0 GPT Size Property
-      * Otherwise opr_sz is a multiple of 16.
++-----------------------------
 +
 +To aid firmware developers in testing different possible CPU
 +configurations, ``x-l0gptsz=S`` may be used to specify the value
 +to encode into ``GPCCR_EL3.L0GPTSZ``, a read-only field that
 +specifies the size of the Level 0 Granule Protection Table.
 +Legal values for ``S`` are 30, 34, 36, and 39; the default is 30.
 +
 +As with ``x-rme``, the ``x-l0gptsz`` property may be renamed or
 +removed in some future QEMU release.
 diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/system/arm/emulation.rst
 +++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
  - FEAT_RAS (Reliability, availability, and serviceability)
  - FEAT_RASv1p1 (RAS Extension v1.1)
  - FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
 +- FEAT_RME (Realm Management Extension) (NB: support status in QEMU is experimental)
  - FEAT_RNG (Random number generator)
  - FEAT_S2FWB (Stage 2 forced Write-Back)
  - FEAT_SB (Speculation Barrier)
 --
-.20.1
+.34.1

-[PULL 12/26] target/arm: Fix float16 pairwise Neon ops on big-endian hosts
+[PULL 22/26] host-utils: Avoid using __builtin_subcll on buggy versions of Apple Clang
-In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
+We use __builtin_subcll() to do a 64-bit subtract with borrow-in and
-meant we were using the H4() address swizzler macro rather than the
+borrow-out when the host compiler supports it.  Unfortunately some
-H2() which is required for 2-byte data.  This had no effect on
+versions of Apple Clang have a bug in their implementation of this
-little-endian hosts but meant we put the result data into the
+intrinsic which means it returns the wrong value.  The effect is that
-destination Dreg in the wrong order on big-endian hosts.
+a QEMU built with the affected compiler will hang when emulating x86
 or m68k float80 division.
+The upstream LLVM issue is:
+https://github.com/llvm/llvm-project/issues/55253
+The commit that introduced the bug apparently never made it into an
+upstream LLVM release without the subsequent fix
+https://github.com/llvm/llvm-project/commit/fffb6e6afdbaba563189c1f715058ed401fbc88d
+but unfortunately it did make it into Apple Clang 14.0, as shipped
+in Xcode 14.3 (14.2 is reported to be OK). The Apple bug number is
+FB12210478.
+Add ifdefs to avoid use of __builtin_subcll() on Apple Clang version 14
+or greater.  There is not currently a version of Apple Clang which
+has the bug fix -- when one appears we should be able to add an upper
+bound to the ifdef condition so we can start using the builtin again.
+We make the lower bound a conservative "any Apple clang with major
+version 14 or greater" because the consequences of incorrectly
+disabling the builtin when it would work are pretty small and the
+consequences of not disabling it when we should are pretty bad.
+Many thanks to those users who both reported this bug and also
+did a lot of work in identifying the root cause; in particular
+to Daniel Bertalan and osy.
+Cc: qemu-stable@nongnu.org
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1631
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1659
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
-Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
+Tested-by: Daniel Bertalan <dani@danielbertalan.dev>
 Tested-by: Solra Bizna <solra@bizna.name>
 Message-id: 20230622130823.1631719-1-peter.maydell@linaro.org
 ---
- target/arm/vec_helper.c | 8 ++++----
+ include/qemu/compiler.h   | 13 +++++++++++++
-file changed, 4 insertions(+), 4 deletions(-)
+ include/qemu/host-utils.h |  2 +-
 files changed, 14 insertions(+), 1 deletion(-)
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
+diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
+--- a/include/qemu/compiler.h
-+++ b/target/arm/vec_helper.c
++++ b/include/qemu/compiler.h
-@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
+@@ -XXX,XX +XXX,XX @@
-         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
+ #define QEMU_DISABLE_CFI
-         r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
+ #endif
-                                                                         \
--        d[H4(0)] = r0;                                                  \
++/*
--        d[H4(1)] = r1;                                                  \
++ * Apple clang version 14 has a bug in its __builtin_subcll(); define
--        d[H4(2)] = r2;                                                  \
++ * BUILTIN_SUBCLL_BROKEN for the offending versions so we can avoid it.
--        d[H4(3)] = r3;                                                  \
++ * When a version of Apple clang which has this bug fixed is released
-+        d[H2(0)] = r0;                                                  \
++ * we can add an upper bound to this check.
-+        d[H2(1)] = r1;                                                  \
++ * See https://gitlab.com/qemu-project/qemu/-/issues/1631
-+        d[H2(2)] = r2;                                                  \
++ * and https://gitlab.com/qemu-project/qemu/-/issues/1659 for details.
-+        d[H2(3)] = r3;                                                  \
++ * The bug never made it into any upstream LLVM releases, only Apple ones.
-     }
++ */
++#if defined(__apple_build_version__) && __clang_major__ >= 14
- DO_NEON_PAIRWISE(neon_padd, add)
++#define BUILTIN_SUBCLL_BROKEN
 +#endif
 +
  #endif /* COMPILER_H */
 diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/qemu/host-utils.h
 +++ b/include/qemu/host-utils.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
   */
  static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
  {
 -#if __has_builtin(__builtin_subcll)
 +#if __has_builtin(__builtin_subcll) && !defined(BUILTIN_SUBCLL_BROKEN)
      unsigned long long b = *pborrow;
      x = __builtin_subcll(x, y, b, &b);
      *pborrow = b & 1;
 --
-.20.1
+.34.1
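
The guard added by the host-utils patch can be tried outside the QEMU tree with the standalone sketch below. The preprocessor condition is copied from the patch; the __has_builtin shim, the portable fallback branch, and the self-check function are assumptions written for this illustration and are not claimed to match the code in include/qemu/host-utils.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compilers without __has_builtin: treat every builtin as absent. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Same condition as the patch: any Apple clang with major version >= 14. */
#if defined(__apple_build_version__) && __clang_major__ >= 14
#define BUILTIN_SUBCLL_BROKEN
#endif

/* Compute x - y - *pborrow and store the borrow-out back in *pborrow. */
static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
{
#if __has_builtin(__builtin_subcll) && !defined(BUILTIN_SUBCLL_BROKEN)
    unsigned long long b = *pborrow;
    x = __builtin_subcll(x, y, b, &b);
    *pborrow = b & 1;
    return x;
#else
    /* Portable fallback written for this sketch (an assumption). */
    bool b = *pborrow;
    uint64_t t = x - y;         /* wraps modulo 2^64 when x < y */
    uint64_t r = t - b;         /* wraps when t == 0 and b == 1 */
    *pborrow = (x < y) || (t < b);
    return r;
#endif
}

/* Small self-check that should pass on either branch. */
static void usub64_borrow_selfcheck(void)
{
    bool b = true;
    assert(usub64_borrow(0, 0, &b) == UINT64_MAX && b);
    b = false;
    assert(usub64_borrow(7, 2, &b) == 5 && !b);
}
```

Because both branches must agree on every input, a self-check like this is one way to notice a miscompiled builtin early, though it is no substitute for the version guard itself.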

-[PULL 10/26] target/arm: Simplify do_long_3d and do_2scalar_long
+[PULL 23/26] target/arm: Restructure has_vfp_d32 test
 From: Richard Henderson <richard.henderson@linaro.org>
-In both cases, we can sink the write-back and perform
+One cannot test for feature aa32_simd_r32 without first
-the accumulate into the normal destination temps.
+testing if AArch32 mode is supported at all.  This leads to
+qemu-system-aarch64: ARM CPUs must have both VFP-D32 and Neon or neither
+for Apple M1 cpus.
+We already have a check for ARMv8-A never setting vfp-d32 true,
+so restructure the code so that AArch64 avoids the test entirely.
+Reported-by: Mads Ynddal <mads@ynddal.dk>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
+Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Tested-by: Mads Ynddal <m.ynddal@samsung.com>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Cédric Le Goater <clg@kaod.org>
 Reviewed-by: Mads Ynddal <m.ynddal@samsung.com>
 Message-id: 20230619140216.402530-1-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-neon.c.inc | 23 +++++++++--------------
+ target/arm/cpu.c | 28 +++++++++++++++-------------
-file changed, 9 insertions(+), 14 deletions(-)
+file changed, 15 insertions(+), 13 deletions(-)
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
+--- a/target/arm/cpu.c
-+++ b/target/arm/translate-neon.c.inc
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
+@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
-     if (accfn) {
+      * KVM does not currently allow us to lie to the guest about its
-         tmp = tcg_temp_new_i64();
+      * ID/feature registers, so the guest always sees what the host has.
-         read_neon_element64(tmp, a->vd, 0, MO_64);
+      */
--        accfn(tmp, tmp, rd0);
+-    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)
--        write_neon_element64(tmp, a->vd, 0, MO_64);
+-        ? cpu_isar_feature(aa64_fp_simd, cpu)
-+        accfn(rd0, tmp, rd0);
+-        : cpu_isar_feature(aa32_vfp, cpu)) {
-         read_neon_element64(tmp, a->vd, 1, MO_64);
+-        cpu->has_vfp = true;
--        accfn(tmp, tmp, rd1);
+-        if (!kvm_enabled()) {
--        write_neon_element64(tmp, a->vd, 1, MO_64);
+-            qdev_property_add_static(DEVICE(obj), &arm_cpu_has_vfp_property);
-+        accfn(rd1, tmp, rd1);
++    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
-         tcg_temp_free_i64(tmp);
++        if (cpu_isar_feature(aa64_fp_simd, cpu)) {
--    } else {
++            cpu->has_vfp = true;
--        write_neon_element64(rd0, a->vd, 0, MO_64);
++            cpu->has_vfp_d32 = true;
--        write_neon_element64(rd1, a->vd, 1, MO_64);
++            if (tcg_enabled() || qtest_enabled()) {
-     }
++                qdev_property_add_static(DEVICE(obj),
++                                         &arm_cpu_has_vfp_property);
-+    write_neon_element64(rd0, a->vd, 0, MO_64);
++            }
-+    write_neon_element64(rd1, a->vd, 1, MO_64);
+         }
-     tcg_temp_free_i64(rd0);
+-    }
-     tcg_temp_free_i64(rd1);
+-
+-    if (cpu->has_vfp && cpu_isar_feature(aa32_simd_r32, cpu)) {
-@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
+-        cpu->has_vfp_d32 = true;
-     if (accfn) {
+-        if (!kvm_enabled()) {
-         TCGv_i64 t64 = tcg_temp_new_i64();
++    } else if (cpu_isar_feature(aa32_vfp, cpu)) {
-         read_neon_element64(t64, a->vd, 0, MO_64);
++        cpu->has_vfp = true;
--        accfn(t64, t64, rn0_64);
++        if (cpu_isar_feature(aa32_simd_r32, cpu)) {
--        write_neon_element64(t64, a->vd, 0, MO_64);
++            cpu->has_vfp_d32 = true;
-+        accfn(rn0_64, t64, rn0_64);
+             /*
-         read_neon_element64(t64, a->vd, 1, MO_64);
+              * The permitted values of the SIMDReg bits [3:0] on
--        accfn(t64, t64, rn1_64);
+              * Armv8-A are either 0b0000 or 0b0010. On such CPUs,
--        write_neon_element64(t64, a->vd, 1, MO_64);
+              * make sure that has_vfp_d32 can not be set to false.
-+        accfn(rn1_64, t64, rn1_64);
+              */
-         tcg_temp_free_i64(t64);
+-            if (!(arm_feature(&cpu->env, ARM_FEATURE_V8) &&
--    } else {
+-                  !arm_feature(&cpu->env, ARM_FEATURE_M))) {
--        write_neon_element64(rn0_64, a->vd, 0, MO_64);
++            if ((tcg_enabled() || qtest_enabled())
--        write_neon_element64(rn1_64, a->vd, 1, MO_64);
++                && !(arm_feature(&cpu->env, ARM_FEATURE_V8)
-     }
++                     && !arm_feature(&cpu->env, ARM_FEATURE_M))) {
-+
+                 qdev_property_add_static(DEVICE(obj),
-+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
+                                          &arm_cpu_has_vfp_d32_property);
-+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
+             }
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
      return true;
 --
-2.20.1
+2.34.1

-[PULL 17/26] hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+[PULL 24/26] hw/arm/sbsa-ref: add ITS support in SBSA GIC
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: Shashi Mallela <shashi.mallela@linaro.org>
-Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
+Create ITS as part of SBSA platform GIC initialization.
 This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):
-  CID 1432363 (#1 of 1): Unintentional integer overflow:
+GIC ITS information is in DeviceTree so TF-A can pass it to EDK2.
-  overflow_before_widen:
+Bumping platform version to 0.2 as this is an important hardware change.
     Potentially overflowing expression 1 << scale with type int
     (32 bits, signed) is evaluated using 32-bit arithmetic, and
     then used in a context that expects an expression of type
     hwaddr (64 bits, unsigned).
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Signed-off-by: Shashi Mallela <shashi.mallela@linaro.org>
-Acked-by: Eric Auger <eric.auger@redhat.com>
+Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
-Message-id: 20201030144617.1535064-1-philmd@redhat.com
+Message-id: 20230619170913.517373-2-marcin.juszkiewicz@linaro.org
 Co-authored-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
 Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/smmuv3.c | 3 ++-
+ docs/system/arm/sbsa.rst | 14 ++++++++++++++
- 1 file changed, 2 insertions(+), 1 deletion(-)
+ hw/arm/sbsa-ref.c        | 33 ++++++++++++++++++++++++++++++---
  2 files changed, 44 insertions(+), 3 deletions(-)
-diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
+diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/smmuv3.c
+--- a/docs/system/arm/sbsa.rst
-+++ b/hw/arm/smmuv3.c
++++ b/docs/system/arm/sbsa.rst
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ to be a complete compliant DT. It currently reports:
-  */
+    - platform version
+    - GIC addresses
- #include "qemu/osdep.h"
-+#include "qemu/bitops.h"
++Platform version
- #include "hw/irq.h"
++''''''''''''''''
- #include "hw/sysbus.h"
++
- #include "migration/vmstate.h"
+ The platform version is only for informing platform firmware about
-@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
+ what kind of ``sbsa-ref`` board it is running on. It is neither
-         scale = CMD_SCALE(cmd);
+ a QEMU versioned machine type nor a reflection of the level of the
-         num = CMD_NUM(cmd);
+@@ -XXX,XX +XXX,XX @@ SBSA/SystemReady SR support provided.
-         ttl = CMD_TTL(cmd);
+ The ``machine-version-major`` value is updated when changes breaking
--        num_pages = (num + 1) * (1 << (scale));
+ fw compatibility are introduced. The ``machine-version-minor`` value
-+        num_pages = (num + 1) * BIT_ULL(scale);
+ is updated when features are added that don't break fw compatibility.
 +
 +Platform version changes:
 +
 +0.0
 +  Devicetree holds information about CPUs, memory and platform version.
 +
 +0.1
 +  GIC information is present in devicetree.
 +
 +0.2
 +  GIC ITS information is present in devicetree.
 diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/sbsa-ref.c
 +++ b/hw/arm/sbsa-ref.c
@@ -XXX,XX +XXX,XX @@ enum {
      SBSA_CPUPERIPHS,
      SBSA_GIC_DIST,
      SBSA_GIC_REDIST,
 +    SBSA_GIC_ITS,
      SBSA_SECURE_EC,
      SBSA_GWDT_WS0,
      SBSA_GWDT_REFRESH,
@@ -XXX,XX +XXX,XX @@ static const MemMapEntry sbsa_ref_memmap[] = {
      [SBSA_CPUPERIPHS] =         { 0x40000000, 0x00040000 },
      [SBSA_GIC_DIST] =           { 0x40060000, 0x00010000 },
      [SBSA_GIC_REDIST] =         { 0x40080000, 0x04000000 },
 +    [SBSA_GIC_ITS] =            { 0x44081000, 0x00020000 },
      [SBSA_SECURE_EC] =          { 0x50000000, 0x00001000 },
      [SBSA_GWDT_REFRESH] =       { 0x50010000, 0x00001000 },
      [SBSA_GWDT_CONTROL] =       { 0x50011000, 0x00001000 },
@@ -XXX,XX +XXX,XX @@ static void sbsa_fdt_add_gic_node(SBSAMachineState *sms)
                                  2, sbsa_ref_memmap[SBSA_GIC_REDIST].base,
                                  2, sbsa_ref_memmap[SBSA_GIC_REDIST].size);
 +    nodename = g_strdup_printf("/intc/its");
 +    qemu_fdt_add_subnode(sms->fdt, nodename);
 +    qemu_fdt_setprop_sized_cells(sms->fdt, nodename, "reg",
 +                                 2, sbsa_ref_memmap[SBSA_GIC_ITS].base,
 +                                 2, sbsa_ref_memmap[SBSA_GIC_ITS].size);
 +
      g_free(nodename);
  }
 +
  /*
   * Firmware on this machine only uses ACPI table to load OS, these limited
   * device tree nodes are just to let firmware know the info which varies from
@@ -XXX,XX +XXX,XX @@ static void create_fdt(SBSAMachineState *sms)
       *                        fw compatibility.
       */
      qemu_fdt_setprop_cell(fdt, "/", "machine-version-major", 0);
 -    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 1);
 +    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 2);
      if (ms->numa_state->have_numa_distance) {
          int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
@@ -XXX,XX +XXX,XX @@ static void create_secure_ram(SBSAMachineState *sms,
      memory_region_add_subregion(secure_sysmem, base, secram);
  }
 -static void create_gic(SBSAMachineState *sms)
 +static void create_its(SBSAMachineState *sms)
 +{
 +    const char *itsclass = its_class_name();
 +    DeviceState *dev;
 +
 +    dev = qdev_new(itsclass);
 +
 +    object_property_set_link(OBJECT(dev), "parent-gicv3", OBJECT(sms->gic),
 +                             &error_abort);
 +    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
 +    sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, sbsa_ref_memmap[SBSA_GIC_ITS].base);
 +}
 +
 +static void create_gic(SBSAMachineState *sms, MemoryRegion *mem)
  {
      unsigned int smp_cpus = MACHINE(sms)->smp.cpus;
      SysBusDevice *gicbusdev;
@@ -XXX,XX +XXX,XX @@ static void create_gic(SBSAMachineState *sms)
      qdev_prop_set_uint32(sms->gic, "len-redist-region-count", 1);
      qdev_prop_set_uint32(sms->gic, "redist-region-count[0]", redist0_count);
 +    object_property_set_link(OBJECT(sms->gic), "sysmem",
 +                             OBJECT(mem), &error_fatal);
 +    qdev_prop_set_bit(sms->gic, "has-lpi", true);
 +
      gicbusdev = SYS_BUS_DEVICE(sms->gic);
      sysbus_realize_and_unref(gicbusdev, &error_fatal);
      sysbus_mmio_map(gicbusdev, 0, sbsa_ref_memmap[SBSA_GIC_DIST].base);
@@ -XXX,XX +XXX,XX @@ static void create_gic(SBSAMachineState *sms)
          sysbus_connect_irq(gicbusdev, i + 3 * smp_cpus,
                             qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ));
      }
++    create_its(sms);
-     if (type == SMMU_CMD_TLBI_NH_VA) {
+ }
  static void create_uart(const SBSAMachineState *sms, int uart,
@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_init(MachineState *machine)
      create_secure_ram(sms, secure_sysmem);
 -    create_gic(sms);
 +    create_gic(sms, sysmem);
      create_uart(sms, SBSA_UART, sysmem, serial_hd(0));
      create_uart(sms, SBSA_SECURE_UART, secure_sysmem, serial_hd(1));
 --
-2.20.1
+2.34.1
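
The overflow_before_widen report quoted in the old smmuv3 patch above can be illustrated with a small stand-alone sketch. This is not QEMU code: MODEL_BIT_ULL is a local stand-in for BIT_ULL(), the function names are hypothetical, and the "narrow" variant only models the 32-bit result by truncating an exact 64-bit product (the real signed overflow is undefined behaviour, so it is not executed here). The bound of 32 on num is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for QEMU's BIT_ULL(); defined here so the sketch is
 * self-contained. */
#define MODEL_BIT_ULL(nr) (1ULL << (nr))

/* Model of "(num + 1) * (1 << scale)" evaluated in 32 bits and only then
 * widened: compute the exact value, then keep the low 32 bits. */
static uint64_t num_pages_narrow(unsigned num, unsigned scale)
{
    assert(num < 32 && scale < 32);   /* assumed field widths */
    uint64_t exact = (uint64_t)(num + 1) << scale;
    return (uint32_t)exact;
}

/* The fixed form: the shift is already 64-bit, so nothing is lost. */
static uint64_t num_pages_wide(unsigned num, unsigned scale)
{
    assert(num < 32 && scale < 64);
    return (uint64_t)(num + 1) * MODEL_BIT_ULL(scale);
}
```

For small scale values both variants agree; they diverge once the product no longer fits in 32 bits, which is the case Coverity flagged.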

-[PULL 03/26] target/arm: Use neon_element_offset in neon_load/store_reg
+[PULL 25/26] target/arm: Fix sve predicate store, 8 <= VQ <= 15
 From: Richard Henderson <richard.henderson@linaro.org>
-These are the only users of neon_reg_offset, so remove that.
+Brown bag time: store instead of load results in uninitialized temp.
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1704
+Reported-by: Mark Rutland <mark.rutland@arm.com>
+Tested-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
+Message-id: 20230620134659.817559-1-richard.henderson@linaro.org
 Fixes: e6dd5e782be ("target/arm: Use tcg_gen_qemu_{ld, st}_i128 in gen_sve_{ld, st}r")
 Tested-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c | 14 ++------------
+ target/arm/tcg/translate-sve.c | 2 +-
- 1 file changed, 2 insertions(+), 12 deletions(-)
+ 1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/tcg/translate-sve.c
-+++ b/target/arm/translate.c
++++ b/target/arm/tcg/translate-sve.c
-@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
+@@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs,
-     }
+     /* Predicate register stores can be any multiple of 2.  */
- }
+     if (len_remain >= 8) {
+         t0 = tcg_temp_new_i64();
--/* Return the offset of a 32-bit piece of a NEON register.
+-        tcg_gen_st_i64(t0, base, vofs + len_align);
--   zero is the least significant end of the register.  */
++        tcg_gen_ld_i64(t0, base, vofs + len_align);
--static inline long
+         tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE);
--neon_reg_offset (int reg, int n)
+         len_remain -= 8;
--{
+         len_align += 8;
 -    int sreg;
 -    sreg = reg * 2 + n;
 -    return vfp_reg_offset(0, sreg);
 -}
 -
  static TCGv_i32 neon_load_reg(int reg, int pass)
  {
      TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
 +    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
      return tmp;
  }
  static void neon_store_reg(int reg, int pass, TCGv_i32 var)
  {
 -    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
 +    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
      tcg_temp_free_i32(var);
  }
 --
-2.20.1
+2.34.1

-[PULL 16/26] disas/capstone: Fix monitor disassembly of >32 bytes
+[PULL 26/26] pc-bios/keymaps: Use the official xkb name for Arabic layout, not the legacy synonym
-If we're using the capstone disassembler, disassembly of a run of
+The xkb official name for the Arabic keyboard layout is 'ara'.
-instructions more than 32 bytes long disassembles the wrong data for
+However xkb has for at least the past 15 years also permitted it to
-instructions beyond the 32 byte mark:
+be named via the legacy synonym 'ar'.  In xkeyboard-config 2.39 this
 synonym was removed, which breaks compilation of QEMU:
-(qemu) xp /16x 0x100
+FAILED: pc-bios/keymaps/ar
-0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
+/home/fred/qemu-git/src/qemu/build-full/qemu-keymap -f pc-bios/keymaps/ar -l ar
-0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
+xkbcommon: ERROR: Couldn't find file "symbols/ar" in include paths
-0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
+xkbcommon: ERROR: 1 include paths searched:
-0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
+xkbcommon: ERROR:     /usr/share/X11/xkb
-(qemu) xp /16i 0x100
+xkbcommon: ERROR: 3 include paths could not be added:
-0x00000100: 00000005 andeq r0, r0, r5
+xkbcommon: ERROR:     /home/fred/.config/xkb
-0x00000104: 54410001 strbpl r0, [r1], #-1
+xkbcommon: ERROR:     /home/fred/.xkb
-0x00000108: 00000001 andeq r0, r0, r1
+xkbcommon: ERROR:     /etc/xkb
-0x0000010c: 00001000 andeq r1, r0, r0
+xkbcommon: ERROR: Abandoning symbols file "(unnamed)"
-0x00000110: 00000000 andeq r0, r0, r0
+xkbcommon: ERROR: Failed to compile xkb_symbols
-0x00000114: 00000004 andeq r0, r0, r4
+xkbcommon: ERROR: Failed to compile keymap
 0x00000118: 54410002 strbpl r0, [r1], #-2
 0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
 0x00000120: 54410001 strbpl r0, [r1], #-1
 0x00000124: 00000001 andeq r0, r0, r1
 0x00000128: 00001000 andeq r1, r0, r0
 0x0000012c: 00000000 andeq r0, r0, r0
 0x00000130: 00000004 andeq r0, r0, r4
 0x00000134: 54410002 strbpl r0, [r1], #-2
 0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
 0x0000013c: 00000000 andeq r0, r0, r0
-Here the disassembly of 0x120..0x13f is using the data that is in
+The upstream xkeyboard-config change removing the compat
-0x104..0x123.
+mapping is:
 https://gitlab.freedesktop.org/xkeyboard-config/xkeyboard-config/-/commit/470ad2cd8fea84d7210377161d86b31999bb5ea6
-This is caused by passing the wrong value to the read_memory_func().
+Make QEMU always ask for the 'ara' xkb layout, which should work on
-The intention is that at this point in the loop the 'cap_buf' buffer
+both older and newer xkeyboard-config.  We leave the QEMU name for
-already contains 'csize' bytes of data for the instruction at guest
+this keyboard layout as 'ar'; it is not the only one where our name
-addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
+for it deviates from the xkb standard name.
 extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
 time through the loop 'csize' happens to be zero, so the initial read
 of 32 bytes into cap_buf is correct and as long as the disassembly
 never needs to read more data we return the correct information.
 Use the correct guest address in the call to read_memory_func().
 Cc: qemu-stable@nongnu.org
-Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
 Message-id: 20230620162024.1132013-1-peter.maydell@linaro.org
 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1709
 ---
- disas/capstone.c | 2 +-
+ pc-bios/keymaps/meson.build | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/disas/capstone.c b/disas/capstone.c
+diff --git a/pc-bios/keymaps/meson.build b/pc-bios/keymaps/meson.build
 index XXXXXXX..XXXXXXX 100644
---- a/disas/capstone.c
+--- a/pc-bios/keymaps/meson.build
-+++ b/disas/capstone.c
++++ b/pc-bios/keymaps/meson.build
-@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
+@@ -XXX,XX +XXX,XX @@
+ keymaps = {
-         /* Make certain that we can make progress.  */
+-  'ar': '-l ar',
-         assert(tsize != 0);
++  'ar': '-l ara',
--        info->read_memory_func(pc, cap_buf + csize, tsize, info);
+   'bepo': '-l fr -v dvorak',
-+        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
+   'cz': '-l cz',
-         csize += tsize;
+   'da': '-l dk',
          if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
 --
-2.20.1
+2.34.1
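
The disas/capstone fix in the old series (reading the extra bytes from 'pc + csize' rather than 'pc') can be reproduced in miniature. The sketch below is an assumption-laden model, not the QEMU code: model_read_memory(), refill() and buffer_is_contiguous() are hypothetical names, and guest memory is faked so that the byte at address A holds the low 8 bits of A.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest memory: the byte at address A is (uint8_t)A. */
static void model_read_memory(uint64_t addr, uint8_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst[i] = (uint8_t)(addr + i);
    }
}

/*
 * Append tsize bytes after the csize bytes already buffered for the
 * instruction stream that starts at guest address pc.  With use_fix
 * false the read starts at pc again, as in the code before the patch.
 */
static size_t refill(uint8_t *buf, size_t csize, size_t tsize,
                     uint64_t pc, bool use_fix)
{
    uint64_t src = use_fix ? pc + csize : pc;

    model_read_memory(src, buf + csize, tsize);
    return csize + tsize;
}

/* True if buf[0..len) matches guest memory starting at pc. */
static bool buffer_is_contiguous(const uint8_t *buf, size_t len, uint64_t pc)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != (uint8_t)(pc + i)) {
            return false;
        }
    }
    return true;
}
```

The first refill (csize == 0) is correct either way, which matches the commit message's observation that the bug only shows once more than 32 bytes are needed.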

Small pile of bug fixes for rc1. I've included my patches to get
our docs building with Sphinx 3, just for convenience...

-- PMM

The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:

Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102

for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:

tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)

----------------------------------------------------------------
target-arm queue:
 * target/arm: Fix Neon emulation bugs on big-endian hosts
 * target/arm: fix handling of HCR.FB
 * target/arm: fix LORID_EL1 access check
 * disas/capstone: Fix monitor disassembly of >32 bytes
 * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
 * hw/arm/boot: fix SVE for EL3 direct kernel boot
 * hw/display/omap_lcdc: Fix potential NULL pointer dereference
 * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
 * target/arm: Get correct MMU index for other-security-state
 * configure: Test that gio libs from pkg-config work
 * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
 * docs: Fix building with Sphinx 3
 * tests/qtest/npcm7xx_rng-test: Disable randomness tests

----------------------------------------------------------------
AlexChen (2):
      hw/display/omap_lcdc: Fix potential NULL pointer dereference
      hw/display/exynos4210_fimd: Fix potential NULL pointer dereference

Peter Maydell (9):
      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
      target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
      disas/capstone: Fix monitor disassembly of >32 bytes
      target/arm: Get correct MMU index for other-security-state
      configure: Test that gio libs from pkg-config work
      hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
      scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
      qemu-option-trace.rst.inc: Don't use option:: markup
      tests/qtest/npcm7xx_rng-test: Disable randomness tests

Philippe Mathieu-Daudé (1):
      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)

Richard Henderson (11):
      target/arm: Introduce neon_full_reg_offset
      target/arm: Move neon_element_offset to translate.c
      target/arm: Use neon_element_offset in neon_load/store_reg
      target/arm: Use neon_element_offset in vfp_reg_offset
      target/arm: Add read/write_neon_element32
      target/arm: Expand read/write_neon_element32 to all MemOp
      target/arm: Rename neon_load_reg32 to vfp_load_reg32
      target/arm: Add read/write_neon_element64
      target/arm: Rename neon_load_reg64 to vfp_load_reg64
      target/arm: Simplify do_long_3d and do_2scalar_long
      target/arm: Improve do_prewiden_3d

Rémi Denis-Courmont (3):
      target/arm: fix handling of HCR.FB
      target/arm: fix LORID_EL1 access check
      hw/arm/boot: fix SVE for EL3 direct kernel boot

From: Richard Henderson <richard.henderson@linaro.org>

This function makes it clear that we're talking about the whole
register, and not the 32-bit piece at index 0.  This fixes a bug
when running on a big-endian host.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  8 ++++++
 target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
 target/arm/translate-vfp.c.inc  |  2 +-
 3 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
     unallocated_encoding(s);
 }
 
+/*
+ * Return the offset of a "full" NEON Dreg.
+ */
+static long neon_full_reg_offset(unsigned reg)
+{
+    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
         ofs ^= 8 - element_size;
     }
 #endif
-    return neon_reg_offset(reg, 0) + ofs;
+    return neon_full_reg_offset(reg) + ofs;
 }
 
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
              * We cannot write 16 bytes at once because the
              * destination is unaligned.
              */
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  8, 8, tmp);
-            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
-                             neon_reg_offset(vd, 0), 8, 8);
+            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
+                             neon_full_reg_offset(vd), 8, 8);
         } else {
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  vec_size, vec_size, tmp);
         }
         tcg_gen_addi_i32(addr, addr, 1 << size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
 static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
 {
     /* Handle a 2-reg-shift insn which can be vectorized. */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
 {
     /* FP operations in 2-reg-and-shift group */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
     TCGv_ptr fpst;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
         return true;
     }
 
-    reg_ofs = neon_reg_offset(a->vd, 0);
+    reg_ofs = neon_full_reg_offset(a->vd);
     vec_size = a->q ? 16 : 8;
     imm = asimd_imm_const(a->imm, a->cmode, a->op);
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
         return true;
     }
 
-    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
-                       neon_reg_offset(a->vn, 0),
-                       neon_reg_offset(a->vm, 0),
+    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
+                       neon_full_reg_offset(a->vn),
+                       neon_full_reg_offset(a->vm),
                        16, 16, 0, fn_gvec);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
 {
     /* Two registers and a scalar, using gvec */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
     int rm_ofs;
     int idx;
     TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
     /* a->vm is M:Vm, which encodes both register and index */
     idx = extract32(a->vm, a->size + 2, 2);
     a->vm = extract32(a->vm, 0, a->size + 2);
-    rm_ofs = neon_reg_offset(a->vm, 0);
+    rm_ofs = neon_full_reg_offset(a->vm);
 
     fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
     tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
         return true;
     }
 
-    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
+    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                          neon_element_offset(a->vm, a->index, a->size),
                          a->q ? 16 : 8, a->q ? 16 : 8);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
 static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
     }
 
     tmp = load_reg(s, a->rt);
-    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
+    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                          vec_size, vec_size, tmp);
     tcg_temp_free_i32(tmp);
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This will shortly have users outside of translate-neon.c.inc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          | 20 ++++++++++++++++++++
 target/arm/translate-neon.c.inc | 19 -------------------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
 }
 
+/*
+ * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+ * where 0 is the least significant end of the register.
+ */
+static long neon_element_offset(int reg, int element, MemOp size)
+{
+    int element_size = 1 << size;
+    int ofs = element * element_size;
+#ifdef HOST_WORDS_BIGENDIAN
+    /*
+     * Calculate the offset assuming fully little-endian,
+     * then XOR to account for the order of the 8-byte units.
+     */
+    if (element_size < 8) {
+        ofs ^= 8 - element_size;
+    }
+#endif
+    return neon_full_reg_offset(reg) + ofs;
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
 #include "decode-neon-ls.c.inc"
 #include "decode-neon-shared.c.inc"
 
-/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
- * where 0 is the least significant end of the register.
- */
-static inline long
-neon_element_offset(int reg, int element, MemOp size)
-{
-    int element_size = 1 << size;
-    int ofs = element * element_size;
-#ifdef HOST_WORDS_BIGENDIAN
-    /* Calculate the offset assuming fully little-endian,
-     * then XOR to account for the order of the 8-byte units.
-     */
-    if (element_size < 8) {
-        ofs ^= 8 - element_size;
-    }
-#endif
-    return neon_full_reg_offset(reg) + ofs;
-}
-
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
 {
     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
-- 
2.20.1
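
As an aside on the XOR in neon_element_offset(): the following stand-alone sketch models only the in-register part of the calculation, with host endianness passed as a parameter instead of HOST_WORDS_BIGENDIAN. The function name and the boolean parameter are hypothetical, chosen for illustration; the arithmetic follows the function quoted in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Byte offset of element number 'element' (each 1 << size_log2 bytes),
 * relative to the start of the register, where element 0 is the least
 * significant end.
 */
static long model_element_offset(int element, unsigned size_log2,
                                 bool host_big_endian)
{
    assert(size_log2 <= 3);
    int element_size = 1 << size_log2;
    int ofs = element * element_size;

    /*
     * Compute the offset as if fully little-endian, then flip the
     * position inside each 8-byte unit on a big-endian host.
     */
    if (host_big_endian && element_size < 8) {
        ofs ^= 8 - element_size;
    }
    return ofs;
}
```

For example, 32-bit element 0 sits at byte offset 0 on a little-endian host and at offset 4 on a big-endian one, while 64-bit elements are unaffected.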

From: Richard Henderson <richard.henderson@linaro.org>

These are the only users of neon_reg_offset, so remove that.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-/* Return the offset of a 32-bit piece of a NEON register.
-   zero is the least significant end of the register.  */
-static inline long
-neon_reg_offset (int reg, int n)
-{
-    int sreg;
-    sreg = reg * 2 + n;
-    return vfp_reg_offset(0, sreg);
-}
-
 static TCGv_i32 neon_load_reg(int reg, int pass)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
     return tmp;
 }
 
 static void neon_store_reg(int reg, int pass, TCGv_i32 var)
 {
-    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
     tcg_temp_free_i32(var);
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This seems a bit more readable than using offsetof CPU_DoubleU.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
     return neon_full_reg_offset(reg) + ofs;
 }
 
-static inline long vfp_reg_offset(bool dp, unsigned reg)
+/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
+static long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
-        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+        return neon_element_offset(reg, 0, MO_64);
     } else {
-        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
-        if (reg & 1) {
-            ofs += offsetof(CPU_DoubleU, l.upper);
-        } else {
-            ofs += offsetof(CPU_DoubleU, l.lower);
-        }
-        return ofs;
+        return neon_element_offset(reg >> 1, reg & 1, MO_32);
     }
 }
 
-- 
2.20.1
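
One way to convince oneself that the vfp_reg_offset() rewrite above is behaviour-preserving is to compare the old and new Sreg calculations in a simplified model. The sketch assumes each vfp.zregs[] entry is modelled as 16 bytes holding two 8-byte d[] slots, takes offsets relative to the start of zregs, and assumes CPU_DoubleU places l.upper first on big-endian hosts and l.lower first otherwise. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Offset of a full Dreg in the modelled layout. */
static long model_full_reg_offset(unsigned dreg)
{
    return (long)(dreg >> 1) * 16 + (long)(dreg & 1) * 8;
}

/* Model of neon_element_offset() with endianness as a parameter. */
static long dreg_element_offset(unsigned dreg, int element,
                                unsigned size_log2, bool host_big_endian)
{
    assert(size_log2 <= 3);
    int element_size = 1 << size_log2;
    int ofs = element * element_size;

    if (host_big_endian && element_size < 8) {
        ofs ^= 8 - element_size;
    }
    return model_full_reg_offset(dreg) + ofs;
}

/* Old Sreg path: pick l.upper or l.lower inside the 8-byte slot. */
static long old_sreg_offset(unsigned sreg, bool host_big_endian)
{
    long ofs = (long)(sreg >> 2) * 16 + (long)((sreg >> 1) & 1) * 8;
    long lower = host_big_endian ? 4 : 0;   /* assumed CPU_DoubleU layout */
    long upper = host_big_endian ? 0 : 4;

    return ofs + ((sreg & 1) ? upper : lower);
}

/* New Sreg path, as in the patch: element (sreg & 1) of Dreg (sreg >> 1). */
static long new_sreg_offset(unsigned sreg, bool host_big_endian)
{
    return dreg_element_offset(sreg >> 1, sreg & 1, 2, host_big_endian);
}

/* Check all 32 Sregs for both host endiannesses. */
static bool sreg_offsets_agree(void)
{
    for (unsigned sreg = 0; sreg < 32; sreg++) {
        if (old_sreg_offset(sreg, false) != new_sreg_offset(sreg, false) ||
            old_sreg_offset(sreg, true) != new_sreg_offset(sreg, true)) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions the two forms give identical offsets, which is consistent with the patch being described as a readability change.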

From: Richard Henderson <richard.henderson@linaro.org>

Model these off the aa64 read/write_vec_element functions.
Use it within translate-neon.c.inc.  The new functions do
not allocate or free temps, so this rearranges the calling
code a bit.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  26 ++++
 target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
 2 files changed, 183 insertions(+), 99 deletions(-)
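
Note for readers: a minimal standalone sketch, not part of the patch,
of the ownership change described above. It models temps with a plain
heap object and two counters; ModelTemp and every model_* function are
hypothetical stand-ins, not TCG APIs. The old-style helpers allocate
on load and free on store, while the new-style helpers leave both to
the caller, so a loop can reuse one temp across passes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a 32-bit TCG temp. */
typedef struct { uint32_t val; } ModelTemp;

static int temps_allocated;   /* total allocations since last reset */
static int temps_live;        /* allocations not yet freed */

static void model_reset(void)
{
    temps_allocated = 0;
    temps_live = 0;
}

static ModelTemp *model_temp_new(void)
{
    temps_allocated++;
    temps_live++;
    return calloc(1, sizeof(ModelTemp));
}

static void model_temp_free(ModelTemp *t)
{
    temps_live--;
    free(t);
}

/* Old style: load allocates a temp, store consumes (frees) it. */
static ModelTemp *model_old_load(const uint32_t *reg, int pass)
{
    ModelTemp *t = model_temp_new();
    t->val = reg[pass];
    return t;
}

static void model_old_store(uint32_t *reg, int pass, ModelTemp *t)
{
    reg[pass] = t->val;
    model_temp_free(t);
}

/* New style: neither helper allocates or frees. */
static void model_read_element32(ModelTemp *dest, const uint32_t *reg, int ele)
{
    dest->val = reg[ele];
}

static void model_write_element32(const ModelTemp *src, uint32_t *reg, int ele)
{
    reg[ele] = src->val;
}

/* A do_2misc-like loop (bitwise NOT per element), old style. */
static void model_invert_old(uint32_t *vd, const uint32_t *vm, int passes)
{
    for (int pass = 0; pass < passes; pass++) {
        ModelTemp *tmp = model_old_load(vm, pass);
        tmp->val = ~tmp->val;
        model_old_store(vd, pass, tmp);
    }
}

/* The same loop, new style: one temp allocated outside the loop. */
static void model_invert_new(uint32_t *vd, const uint32_t *vm, int passes)
{
    ModelTemp *tmp = model_temp_new();

    for (int pass = 0; pass < passes; pass++) {
        model_read_element32(tmp, vm, pass);
        tmp->val = ~tmp->val;
        model_write_element32(tmp, vd, pass);
    }
    model_temp_free(tmp);
}
```

In this model a four-pass loop allocates four temps in the old style
and one in the new style, with identical results in the destination.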

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_ld_i32(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_st_i32(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
      * early. Since Q is 0 there are always just two passes, so instead
      * of a complicated loop over each pass we just unroll.
      */
-    tmp = neon_load_reg(a->vn, 0);
-    tmp2 = neon_load_reg(a->vn, 1);
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    tmp3 = tcg_temp_new_i32();
+
+    read_neon_element32(tmp, a->vn, 0, MO_32);
+    read_neon_element32(tmp2, a->vn, 1, MO_32);
     fn(tmp, tmp, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    tmp3 = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    read_neon_element32(tmp3, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     fn(tmp3, tmp3, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    neon_store_reg(a->vd, 0, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * 2-reg-and-shift operations, size < 3 case, where the
      * helper needs to be passed cpu_env.
      */
-    TCGv_i32 constimm;
+    TCGv_i32 constimm, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * by immediate using the variable shift operations.
      */
     constimm = tcg_const_i32(dup_const(a->size, a->shift));
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(constimm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i64(-a->shift);
     rm1 = tcg_temp_new_i64();
     rm2 = tcg_temp_new_i64();
+    rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
     neon_load_reg64(rm1, a->vm);
     neon_load_reg64(rm2, a->vm + 1);
 
     shiftfn(rm1, rm1, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm1);
-    neon_store_reg(a->vd, 0, rd);
+    write_neon_element32(rd, a->vd, 0, MO_32);
 
     shiftfn(rm2, rm2, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm2);
-    neon_store_reg(a->vd, 1, rd);
+    write_neon_element32(rd, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i64(rm1);
     tcg_temp_free_i64(rm2);
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i32(imm);
 
     /* Load all inputs first to avoid potential overwrite */
-    rm1 = neon_load_reg(a->vm, 0);
-    rm2 = neon_load_reg(a->vm, 1);
-    rm3 = neon_load_reg(a->vm + 1, 0);
-    rm4 = neon_load_reg(a->vm + 1, 1);
+    rm1 = tcg_temp_new_i32();
+    rm2 = tcg_temp_new_i32();
+    rm3 = tcg_temp_new_i32();
+    rm4 = tcg_temp_new_i32();
+    read_neon_element32(rm1, a->vm, 0, MO_32);
+    read_neon_element32(rm2, a->vm, 1, MO_32);
+    read_neon_element32(rm3, a->vm, 2, MO_32);
+    read_neon_element32(rm4, a->vm, 3, MO_32);
     rtmp = tcg_temp_new_i64();
 
     shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     tcg_temp_free_i32(rm2);
 
     narrowfn(rm1, cpu_env, rtmp);
-    neon_store_reg(a->vd, 0, rm1);
+    write_neon_element32(rm1, a->vd, 0, MO_32);
+    tcg_temp_free_i32(rm1);
 
     shiftfn(rm3, rm3, constimm);
     shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
 
     narrowfn(rm3, cpu_env, rtmp);
     tcg_temp_free_i64(rtmp);
-    neon_store_reg(a->vd, 1, rm3);
+    write_neon_element32(rm3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rm3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         widen_mask = dup_const(a->size + 1, widen_mask);
     }
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn0_64, a->vn);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 0);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 0, MO_32);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn1_64, a->vn + 1);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 1);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 1, MO_32);
 
     neon_store_reg64(rn0_64, a->vd);
 
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
 
     narrowfn(rd1, rn_64);
 
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rn_64);
     tcg_temp_free_i64(rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i64();
     rd1 = tcg_temp_new_i64();
 
-    rn = neon_load_reg(a->vn, 0);
-    rm = neon_load_reg(a->vm, 0);
+    rn = tcg_temp_new_i32();
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
+    read_neon_element32(rm, a->vm, 0, MO_32);
     opfn(rd0, rn, rm);
-    tcg_temp_free_i32(rn);
-    tcg_temp_free_i32(rm);
 
-    rn = neon_load_reg(a->vn, 1);
-    rm = neon_load_reg(a->vm, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
+    read_neon_element32(rm, a->vm, 1, MO_32);
     opfn(rd1, rn, rm);
     tcg_temp_free_i32(rn);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
 
 static inline TCGv_i32 neon_get_scalar(int size, int reg)
 {
-    TCGv_i32 tmp;
-    if (size == 1) {
-        tmp = neon_load_reg(reg & 7, reg >> 4);
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    if (size == MO_16) {
+        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
         if (reg & 8) {
             gen_neon_dup_high16(tmp);
         } else {
             gen_neon_dup_low16(tmp);
         }
     } else {
-        tmp = neon_load_reg(reg & 15, reg >> 4);
+        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
     }
     return tmp;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      * perform an accumulation operation of that result into the
      * destination.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
+        read_neon_element32(tmp, a->vn, pass, MO_32);
         opfn(tmp, tmp, scalar);
         if (accfn) {
-            TCGv_i32 rd = neon_load_reg(a->vd, pass);
+            TCGv_i32 rd = tcg_temp_new_i32();
+            read_neon_element32(rd, a->vd, pass, MO_32);
             accfn(tmp, rd, tmp);
             tcg_temp_free_i32(rd);
         }
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(scalar);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      * performs a kind of fused op-then-accumulate using a helper
      * function that takes all of rd, rn and the scalar at once.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, rn, rd;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    rn = tcg_temp_new_i32();
+    rd = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 rn = neon_load_reg(a->vn, pass);
-        TCGv_i32 rd = neon_load_reg(a->vd, pass);
+        read_neon_element32(rn, a->vn, pass, MO_32);
+        read_neon_element32(rd, a->vd, pass, MO_32);
         opfn(rd, cpu_env, rn, scalar, rd);
-        tcg_temp_free_i32(rn);
-        neon_store_reg(a->vd, pass, rd);
+        write_neon_element32(rd, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i32(scalar);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     scalar = neon_get_scalar(a->size, a->vm);
 
     /* Load all inputs before writing any outputs, in case of overlap */
-    rn = neon_load_reg(a->vn, 0);
+    rn = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
     rn0_64 = tcg_temp_new_i64();
     opfn(rn0_64, rn, scalar);
-    tcg_temp_free_i32(rn);
 
-    rn = neon_load_reg(a->vn, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
     rn1_64 = tcg_temp_new_i64();
     opfn(rn1_64, rn, scalar);
     tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
         return false;
     }
     n <<= 3;
+    tmp = tcg_temp_new_i32();
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 0);
+        read_neon_element32(tmp, a->vd, 0, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp2 = neon_load_reg(a->vm, 0);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 0, MO_32);
     ptr1 = vfp_reg_ptr(true, a->vn);
     tmp4 = tcg_const_i32(n);
     gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
-    tcg_temp_free_i32(tmp);
+
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 1);
+        read_neon_element32(tmp, a->vd, 1, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp3 = neon_load_reg(a->vm, 1);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 1, MO_32);
     gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(tmp4);
     tcg_temp_free_ptr(ptr1);
-    neon_store_reg(a->vd, 0, tmp2);
-    neon_store_reg(a->vd, 1, tmp3);
-    tcg_temp_free_i32(tmp);
+
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
 static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
 {
     int pass, half;
+    TCGv_i32 tmp[2];
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
         return true;
     }
 
-    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        TCGv_i32 tmp[2];
+    tmp[0] = tcg_temp_new_i32();
+    tmp[1] = tcg_temp_new_i32();
 
+    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
         for (half = 0; half < 2; half++) {
-            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
+            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
             switch (a->size) {
             case 0:
                 tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                 g_assert_not_reached();
             }
         }
-        neon_store_reg(a->vd, pass * 2, tmp[1]);
-        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
+        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
+        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
     }
+
+    tcg_temp_free_i32(tmp[0]);
+    tcg_temp_free_i32(tmp[1]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
         rm0_64 = tcg_temp_new_i64();
         rm1_64 = tcg_temp_new_i64();
         rd_64 = tcg_temp_new_i64();
-        tmp = neon_load_reg(a->vm, pass * 2);
+
+        tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
         widenfn(rm0_64, tmp);
-        tcg_temp_free_i32(tmp);
-        tmp = neon_load_reg(a->vm, pass * 2 + 1);
+        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
         widenfn(rm1_64, tmp);
         tcg_temp_free_i32(tmp);
+
         opfn(rd_64, rm0_64, rm1_64);
         tcg_temp_free_i64(rm0_64);
         tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     narrowfn(rd0, cpu_env, rm);
     neon_load_reg64(rm, a->vm + 1);
     narrowfn(rd1, cpu_env, rm);
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
     }
 
     rd = tcg_temp_new_i64();
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
 
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
-    tmp = neon_load_reg(a->vm, 0);
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
     tcg_gen_shli_i32(tmp2, tmp2, 16);
     tcg_gen_or_i32(tmp2, tmp2, tmp);
-    tcg_temp_free_i32(tmp);
-    tmp = neon_load_reg(a->vm, 2);
+    read_neon_element32(tmp, a->vm, 2, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp3 = neon_load_reg(a->vm, 3);
-    neon_store_reg(a->vd, 0, tmp2);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 3, MO_32);
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    tcg_temp_free_i32(tmp2);
     gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
     tcg_gen_shli_i32(tmp3, tmp3, 16);
     tcg_gen_or_i32(tmp3, tmp3, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
     tmp3 = tcg_temp_new_i32();
-    tmp = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     tcg_gen_ext16u_i32(tmp3, tmp);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 0, tmp3);
+    write_neon_element32(tmp3, a->vd, 0, MO_32);
     tcg_gen_shri_i32(tmp, tmp, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
-    neon_store_reg(a->vd, 1, tmp);
-    tmp3 = tcg_temp_new_i32();
+    write_neon_element32(tmp, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp);
     tcg_gen_ext16u_i32(tmp3, tmp2);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 2, tmp3);
+    write_neon_element32(tmp3, a->vd, 2, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_gen_shri_i32(tmp2, tmp2, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
-    neon_store_reg(a->vd, 3, tmp2);
+    write_neon_element32(tmp2, a->vd, 3, MO_32);
+    tcg_temp_free_i32(tmp2);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
 
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
 
 static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
 {
+    TCGv_i32 tmp;
     int pass;
 
     /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
         return true;
     }
 
+    tmp = tcg_temp_new_i32();
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, tmp);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
         return true;
     }
 
-    if (a->size == 2) {
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    if (a->size == MO_32) {
         for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass + 1);
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass + 1, tmp);
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
         }
     } else {
         for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass);
-            if (a->size == 0) {
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass, MO_32);
+            if (a->size == MO_8) {
                 gen_neon_trn_u8(tmp, tmp2);
             } else {
                 gen_neon_trn_u16(tmp, tmp2);
             }
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass, tmp);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass, MO_32);
         }
     }
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We can then use this to improve VMOV (scalar to gp) and
VMOV (gp to scalar) so that we simply perform the memory
operation that we wanted, rather than inserting or
extracting from a 32-bit quantity.

These were the last uses of neon_load/store_reg, so remove them.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         | 50 +++++++++++++-----------
 target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
 2 files changed, 37 insertions(+), 84 deletions(-)
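
Note for readers: a minimal standalone sketch, not part of the patch,
comparing the two strategies for VMOV (scalar to gp). It models a D
register as a single uint64_t with element 0 in the least significant
bits, so host byte order does not enter into it. The function names
are hypothetical, and the "old" path is a slightly simplified form of
the removed shift-and-extend code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old approach: fetch the containing 32-bit word, then shift and extend. */
static uint32_t model_vmov_to_gp_old(uint64_t dreg, unsigned size,
                                     unsigned index, bool is_unsigned)
{
    unsigned byte_off = index << size;       /* byte offset within the Dreg */
    unsigned pass = (byte_off >> 2) & 1;     /* which 32-bit half to load */
    unsigned bit_off = (byte_off & 3) * 8;   /* bit offset inside that half */
    uint32_t tmp = (uint32_t)(dreg >> (32 * pass));
    uint32_t v;

    switch (size) {
    case 0:
        v = (tmp >> bit_off) & 0xffu;
        return is_unsigned ? v : (v ^ 0x80u) - 0x80u;
    case 1:
        v = (tmp >> bit_off) & 0xffffu;
        return is_unsigned ? v : (v ^ 0x8000u) - 0x8000u;
    default:
        return tmp;
    }
}

/* New approach: read exactly the element wanted, extending as requested. */
static uint32_t model_vmov_to_gp_new(uint64_t dreg, unsigned size,
                                     unsigned index, bool is_unsigned)
{
    unsigned bits = 8u << size;
    uint64_t shifted = dreg >> (index * bits);
    uint32_t mask, sign, elt;

    if (bits == 32) {
        return (uint32_t)shifted;
    }
    mask = (1u << bits) - 1;
    sign = 1u << (bits - 1);
    elt = (uint32_t)shifted & mask;
    return is_unsigned ? elt : (elt ^ sign) - sign;
}

/* Return true if both models agree for every size, index and signedness. */
static bool model_vmov_agree(uint64_t dreg)
{
    for (unsigned size = 0; size <= 2; size++) {
        for (unsigned index = 0; index < (8u >> size); index++) {
            for (int u = 0; u < 2; u++) {
                if (model_vmov_to_gp_old(dreg, size, index, u) !=
                    model_vmov_to_gp_new(dreg, size, index, u)) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

The second function corresponds to the idea of performing the sized,
optionally sign-extending load directly instead of extracting the
element from a 32-bit quantity afterwards.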

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
  * where 0 is the least significant end of the register.
  */
-static long neon_element_offset(int reg, int element, MemOp size)
+static long neon_element_offset(int reg, int element, MemOp memop)
 {
-    int element_size = 1 << size;
+    int element_size = 1 << (memop & MO_SIZE);
     int ofs = element * element_size;
 #ifdef HOST_WORDS_BIGENDIAN
     /*
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static TCGv_i32 neon_load_reg(int reg, int pass)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
-    return tmp;
-}
-
-static void neon_store_reg(int reg, int pass, TCGv_i32 var)
-{
-    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
-    tcg_temp_free_i32(var);
-}
-
 static inline void neon_load_reg64(TCGv_i64 var, int reg)
 {
     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
-    case MO_32:
+    switch (memop) {
+    case MO_SB:
+        tcg_gen_ld8s_i32(dest, cpu_env, off);
+        break;
+    case MO_UB:
+        tcg_gen_ld8u_i32(dest, cpu_env, off);
+        break;
+    case MO_SW:
+        tcg_gen_ld16s_i32(dest, cpu_env, off);
+        break;
+    case MO_UW:
+        tcg_gen_ld16u_i32(dest, cpu_env, off);
+        break;
+    case MO_UL:
+    case MO_SL:
         tcg_gen_ld_i32(dest, cpu_env, off);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
     }
 }
 
-static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
+    switch (memop) {
+    case MO_8:
+        tcg_gen_st8_i32(src, cpu_env, off);
+        break;
+    case MO_16:
+        tcg_gen_st16_i32(src, cpu_env, off);
+        break;
     case MO_32:
         tcg_gen_st_i32(src, cpu_env, off);
         break;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 {
     /* VMOV scalar to general purpose register */
     TCGv_i32 tmp;
-    int pass;
-    uint32_t offset;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
-    tmp = neon_load_reg(a->vn, pass);
-    switch (a->size) {
-    case 0:
-        if (offset) {
-            tcg_gen_shri_i32(tmp, tmp, offset);
-        }
-        if (a->u) {
-            gen_uxtb(tmp);
-        } else {
-            gen_sxtb(tmp);
-        }
-        break;
-    case 1:
-        if (a->u) {
-            if (offset) {
-                tcg_gen_shri_i32(tmp, tmp, 16);
-            } else {
-                gen_uxth(tmp);
-            }
-        } else {
-            if (offset) {
-                tcg_gen_sari_i32(tmp, tmp, 16);
-            } else {
-                gen_sxth(tmp);
-            }
-        }
-        break;
-    case 2:
-        break;
-    }
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
     store_reg(s, a->rt, tmp);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
 {
     /* VMOV general purpose register to scalar */
-    TCGv_i32 tmp, tmp2;
-    int pass;
-    uint32_t offset;
+    TCGv_i32 tmp;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
     tmp = load_reg(s, a->rt);
-    switch (a->size) {
-    case 0:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 1:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 2:
-        break;
-    }
-    neon_store_reg(a->vn, pass, tmp);
+    write_neon_element32(tmp, a->vn, a->index, a->size);
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The only uses of these functions are for loading and storing VFP
single-precision values; they have nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |   4 +-
 target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
 2 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
 }
 
-static inline void neon_load_reg32(TCGv_i32 var, int reg)
+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static inline void neon_store_reg32(TCGv_i32 var, int reg)
+static inline void vfp_store_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         frn = tcg_temp_new_i32();
         frm = tcg_temp_new_i32();
         dest = tcg_temp_new_i32();
-        neon_load_reg32(frn, rn);
-        neon_load_reg32(frm, rm);
+        vfp_load_reg32(frn, rn);
+        vfp_load_reg32(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         if (sz == 1) {
             tcg_gen_andi_i32(dest, dest, 0xffff);
         }
-        neon_store_reg32(dest, rd);
+        vfp_store_reg32(dest, rd);
         tcg_temp_free_i32(frn);
         tcg_temp_free_i32(frm);
         tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i32 tcg_res;
         tcg_op = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_op, rm);
+        vfp_load_reg32(tcg_op, rm);
         if (sz == 1) {
             gen_helper_rinth(tcg_res, tcg_op, fpst);
         } else {
             gen_helper_rints(tcg_res, tcg_op, fpst);
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_op);
         tcg_temp_free_i32(tcg_res);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
             gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
         }
         tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
-        neon_store_reg32(tcg_tmp, rd);
+        vfp_store_reg32(tcg_tmp, rd);
         tcg_temp_free_i32(tcg_tmp);
         tcg_temp_free_i64(tcg_res);
         tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         TCGv_i32 tcg_single, tcg_res;
         tcg_single = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_single, rm);
+        vfp_load_reg32(tcg_single, rm);
         if (sz == 1) {
             if (is_signed) {
                 gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                 gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
             }
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_res);
         tcg_temp_free_i32(tcg_single);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
         store_reg(s, a->rt, tmp);
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         if (a->rt == 15) {
             /* Set the 4 flag bits in the CPSR.  */
             gen_set_nzcv(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm);
+        vfp_load_reg32(tmp, a->vm);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm + 1);
+        vfp_load_reg32(tmp, a->vm + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm);
+        vfp_store_reg32(tmp, a->vm);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm + 1);
+        vfp_store_reg32(tmp, a->vm + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2);
+        vfp_load_reg32(tmp, a->vm * 2);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2 + 1);
+        vfp_load_reg32(tmp, a->vm * 2 + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm * 2);
+        vfp_store_reg32(tmp, a->vm * 2);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm * 2 + 1);
+        vfp_store_reg32(tmp, a->vm * 2 + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st16(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st32(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-            neon_store_reg32(tmp, a->vd + i);
+            vfp_store_reg32(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg32(tmp, a->vd + i);
+            vfp_load_reg32(tmp, a->vd + i);
             gen_aa32_st32(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg32(fd, vd);
+            vfp_load_reg32(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vn = vfp_advance_sreg(vn, delta_d);
-        neon_load_reg32(f0, vn);
+        vfp_load_reg32(f0, vn);
         if (delta_m) {
             vm = vfp_advance_sreg(vm, delta_m);
-            neon_load_reg32(f1, vm);
+            vfp_load_reg32(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR_F16);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     if (reads_vd) {
-        neon_load_reg32(fd, vd);
+        vfp_load_reg32(fd, vd);
     }
     fn(fd, f0, f1, fpst);
-    neon_store_reg32(fd, vd);
+    vfp_store_reg32(fd, vd);
 
     tcg_temp_free_i32(f0);
     tcg_temp_free_i32(f1);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i32();
     fd = tcg_temp_new_i32();
 
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_sreg(vd, delta_d);
-                neon_store_reg32(fd, vd);
+                vfp_store_reg32(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vm = vfp_advance_sreg(vm, delta_m);
-        neon_load_reg32(f0, vm);
+        vfp_load_reg32(f0, vm);
     }
 
     tcg_temp_free_i32(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     }
 
     f0 = tcg_temp_new_i32();
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
     fn(f0, f0);
-    neon_store_reg32(f0, vd);
+    vfp_store_reg32(f0, vd);
     tcg_temp_free_i32(f0);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negh(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negh(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negs(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negs(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
     }
 
     fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
-    neon_store_reg32(fd, a->vd);
+    vfp_store_reg32(fd, a->vd);
     tcg_temp_free_i32(fd);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
     fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
 
     for (;;) {
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
     /* The T bit tells us if we want the low or high 16 bits of Vm */
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
     ahp_mode = get_ahp_flag();
     tmp = tcg_temp_new_i32();
 
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
     tcg_temp_free_i32(ahp_mode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rinth(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rints(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
     neon_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
     vm = tcg_temp_new_i64();
     neon_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     if (a->s) {
         /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f16 */
         gen_helper_vfp_uitoh(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f32 */
         gen_helper_vfp_uitos(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
     vd = tcg_temp_new_i32();
     neon_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_i32(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touih(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touis(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
             gen_helper_vfp_touid(vd, vm, fpst);
         }
     }
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
     /* Insert low half of Vm into high half of Vd */
     rm = tcg_temp_new_i32();
     rd = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
-    neon_load_reg32(rd, a->vd);
+    vfp_load_reg32(rm, a->vm);
+    vfp_load_reg32(rd, a->vd);
     tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
-    neon_store_reg32(rd, a->vd);
+    vfp_store_reg32(rd, a->vd);
     tcg_temp_free_i32(rm);
     tcg_temp_free_i32(rd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
 
     /* Set Vd to high half of Vm */
     rm = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
+    vfp_load_reg32(rm, a->vm);
     tcg_gen_shri_i32(rm, rm, 16);
-    neon_store_reg32(rm, a->vd);
+    vfp_store_reg32(rm, a->vd);
     tcg_temp_free_i32(rm);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          | 26 +++++++++
 target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
 2 files changed, 73 insertions(+), 47 deletions(-)
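
Note on the conversion pattern: the hunks below turn each
neon_load_reg64(tmp, reg + pass) into read_neon_element64(tmp, reg, pass, MO_64),
and likewise for stores. The following is a minimal standalone sketch of why
the two addressing styles name the same storage for Q-register operands. It is
not the QEMU code: the ModelXxx/model_xxx names and the 32-byte register
stride are hypothetical, chosen only to make the arithmetic visible.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: each vector register slot holds two 64-bit
 * D-register halves and is padded to MODEL_VREG_BYTES. The stride is
 * an assumption for illustration, not taken from the patch.
 */
#define MODEL_VREG_BYTES 32u

/* Old style: byte offset of D register 'reg', addressed by number. */
size_t model_dreg_offset(unsigned reg)
{
    return (size_t)(reg >> 1) * MODEL_VREG_BYTES
         + (size_t)(reg & 1) * sizeof(uint64_t);
}

/* New style: byte offset of 64-bit element 'ele' counted from 'reg'. */
size_t model_element64_offset(unsigned reg, unsigned ele)
{
    return model_dreg_offset(reg) + (size_t)ele * sizeof(uint64_t);
}

/*
 * Count the Q-register operands (even base D register, pass 0 or 1)
 * for which the two styles disagree. In this model the count is 0.
 */
int model_count_mismatches(void)
{
    int bad = 0;
    for (unsigned reg = 0; reg < 32; reg += 2) {
        for (unsigned pass = 0; pass < 2; pass++) {
            if (model_dreg_offset(reg + pass) !=
                model_element64_offset(reg, pass)) {
                bad++;
            }
        }
    }
    return bad;
}
```

In this model the (reg, element) form matches the old form whenever the
element index is 0, or the base register is even and the index is 1. With a
padded stride, an odd base with element 1 would land in the padding rather
than in the next register, so the sketch covers only the D-register and
Q-register cases.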

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
     }
 }
 
+static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_Q:
+        tcg_gen_ld_i64(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
     long off = neon_element_offset(reg, ele, memop);
@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
     }
 }
 
+static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_64:
+        tcg_gen_st_i64(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
     for (pass = 0; pass < a->q + 1; pass++) {
         TCGv_i64 tmp = tcg_temp_new_i64();
 
-        neon_load_reg64(tmp, a->vm + pass);
+        read_neon_element64(tmp, a->vm, pass, MO_64);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg64(tmp, a->vd + pass);
+        write_neon_element64(tmp, a->vd, pass, MO_64);
         tcg_temp_free_i64(tmp);
     }
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
-    neon_load_reg64(rm1, a->vm);
-    neon_load_reg64(rm2, a->vm + 1);
+    read_neon_element64(rm1, a->vm, 0, MO_64);
+    read_neon_element64(rm2, a->vm, 1, MO_64);
 
     shiftfn(rm1, rm1, constimm);
     narrowfn(rd, cpu_env, rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd);
+    write_neon_element64(tmp, a->vd, 0, MO_64);
 
     widenfn(tmp, rm1);
     tcg_temp_free_i32(rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd + 1);
+    write_neon_element64(tmp, a->vd, 1, MO_64);
     tcg_temp_free_i64(tmp);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm_64 = tcg_temp_new_i64();
 
     if (src1_wide) {
-        neon_load_reg64(rn0_64, a->vn);
+        read_neon_element64(rn0_64, a->vn, 0, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      * avoid incorrect results if a narrow input overlaps with the result.
      */
     if (src1_wide) {
-        neon_load_reg64(rn1_64, a->vn + 1);
+        read_neon_element64(rn1_64, a->vn, 1, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm = tcg_temp_new_i32();
     read_neon_element32(rm, a->vm, 1, MO_32);
 
-    neon_store_reg64(rn0_64, a->vd);
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
-    neon_store_reg64(rn1_64, a->vd + 1);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rn_64, a->vn);
-    neon_load_reg64(rm_64, a->vm);
+    read_neon_element64(rn_64, a->vn, 0, MO_64);
+    read_neon_element64(rm_64, a->vm, 0, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
     narrowfn(rd0, rn_64);
 
-    neon_load_reg64(rn_64, a->vn + 1);
-    neon_load_reg64(rm_64, a->vm + 1);
+    read_neon_element64(rn_64, a->vn, 1, MO_64);
+    read_neon_element64(rm_64, a->vm, 1, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     /* Don't store results until after all loads: they might overlap */
     if (accfn) {
         tmp = tcg_temp_new_i64();
-        neon_load_reg64(tmp, a->vd);
+        read_neon_element64(tmp, a->vd, 0, MO_64);
         accfn(tmp, tmp, rd0);
-        neon_store_reg64(tmp, a->vd);
-        neon_load_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 0, MO_64);
+        read_neon_element64(tmp, a->vd, 1, MO_64);
         accfn(tmp, tmp, rd1);
-        neon_store_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 1, MO_64);
         tcg_temp_free_i64(tmp);
     } else {
-        neon_store_reg64(rd0, a->vd);
-        neon_store_reg64(rd1, a->vd + 1);
+        write_neon_element64(rd0, a->vd, 0, MO_64);
+        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
     tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
 
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
-        neon_load_reg64(t64, a->vd);
+        read_neon_element64(t64, a->vd, 0, MO_64);
         accfn(t64, t64, rn0_64);
-        neon_store_reg64(t64, a->vd);
-        neon_load_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 0, MO_64);
+        read_neon_element64(t64, a->vd, 1, MO_64);
         accfn(t64, t64, rn1_64);
-        neon_store_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 1, MO_64);
         tcg_temp_free_i64(t64);
     } else {
-        neon_store_reg64(rn0_64, a->vd);
-        neon_store_reg64(rn1_64, a->vd + 1);
+        write_neon_element64(rn0_64, a->vd, 0, MO_64);
+        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         right = tcg_temp_new_i64();
         dest = tcg_temp_new_i64();
 
-        neon_load_reg64(right, a->vn);
-        neon_load_reg64(left, a->vm);
+        read_neon_element64(right, a->vn, 0, MO_64);
+        read_neon_element64(left, a->vm, 0, MO_64);
         tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
-        neon_store_reg64(dest, a->vd);
+        write_neon_element64(dest, a->vd, 0, MO_64);
 
         tcg_temp_free_i64(left);
         tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         destright = tcg_temp_new_i64();
 
         if (a->imm < 8) {
-            neon_load_reg64(right, a->vn);
-            neon_load_reg64(middle, a->vn + 1);
+            read_neon_element64(right, a->vn, 0, MO_64);
+            read_neon_element64(middle, a->vn, 1, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
-            neon_load_reg64(left, a->vm);
+            read_neon_element64(left, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
         } else {
-            neon_load_reg64(right, a->vn + 1);
-            neon_load_reg64(middle, a->vm);
+            read_neon_element64(right, a->vn, 1, MO_64);
+            read_neon_element64(middle, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
-            neon_load_reg64(left, a->vm + 1);
+            read_neon_element64(left, a->vm, 1, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
         }
 
-        neon_store_reg64(destright, a->vd);
-        neon_store_reg64(destleft, a->vd + 1);
+        write_neon_element64(destright, a->vd, 0, MO_64);
+        write_neon_element64(destleft, a->vd, 1, MO_64);
 
         tcg_temp_free_i64(destright);
         tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
 
         if (accfn) {
             TCGv_i64 tmp64 = tcg_temp_new_i64();
-            neon_load_reg64(tmp64, a->vd + pass);
+            read_neon_element64(tmp64, a->vd, pass, MO_64);
             accfn(rd_64, tmp64, rd_64);
             tcg_temp_free_i64(tmp64);
         }
-        neon_store_reg64(rd_64, a->vd + pass);
+        write_neon_element64(rd_64, a->vd, pass, MO_64);
         tcg_temp_free_i64(rd_64);
     }
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rm, a->vm);
+    read_neon_element64(rm, a->vm, 0, MO_64);
     narrowfn(rd0, cpu_env, rm);
-    neon_load_reg64(rm, a->vm + 1);
+    read_neon_element64(rm, a->vm, 1, MO_64);
     narrowfn(rd1, cpu_env, rm);
     write_neon_element32(rd0, a->vd, 0, MO_32);
     write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd);
+    write_neon_element64(rd, a->vd, 0, MO_64);
     widenfn(rd, rm1);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd + 1);
+    write_neon_element64(rd, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rd);
     tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
     rm = tcg_temp_new_i64();
     rd = tcg_temp_new_i64();
     for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        neon_load_reg64(rm, a->vm + pass);
-        neon_load_reg64(rd, a->vd + pass);
-        neon_store_reg64(rm, a->vd + pass);
-        neon_store_reg64(rd, a->vm + pass);
+        read_neon_element64(rm, a->vm, pass, MO_64);
+        read_neon_element64(rd, a->vd, pass, MO_64);
+        write_neon_element64(rm, a->vd, pass, MO_64);
+        write_neon_element64(rd, a->vm, pass, MO_64);
     }
     tcg_temp_free_i64(rm);
     tcg_temp_free_i64(rd);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The only remaining uses of these functions are for loading and storing
VFP double-precision values, and have nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |  8 ++--
 target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
 2 files changed, 46 insertions(+), 46 deletions(-)
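
Note on the register view: the renamed helpers address a double-precision
register by its D-register number, while the 32-bit helpers used elsewhere in
this series address single-precision registers. Architecturally, S<2n> is the
low half of D<n> and S<2n+1> is the high half. The sketch below models that
relationship with plain integers. It is an illustration only: the
ModelVfpState type and the model_vfp_* functions are hypothetical and do not
reproduce QEMU's offset-based implementation.

```c
#include <stdint.h>

#define MODEL_NUM_DREGS 32

/* Hypothetical register file: one 64-bit value per D register. */
typedef struct {
    uint64_t dregs[MODEL_NUM_DREGS];
} ModelVfpState;

/* 64-bit access by D-register number. */
uint64_t model_vfp_load_reg64(const ModelVfpState *st, unsigned reg)
{
    return st->dregs[reg];
}

void model_vfp_store_reg64(ModelVfpState *st, uint64_t val, unsigned reg)
{
    st->dregs[reg] = val;
}

/* 32-bit access by S-register number: even = low half, odd = high half. */
uint32_t model_vfp_load_reg32(const ModelVfpState *st, unsigned reg)
{
    uint64_t d = st->dregs[reg >> 1];
    return (uint32_t)((reg & 1) ? (d >> 32) : d);
}

void model_vfp_store_reg32(ModelVfpState *st, uint32_t val, unsigned reg)
{
    unsigned shift = (reg & 1) ? 32 : 0;
    uint64_t mask = (uint64_t)0xffffffffu << shift;

    /* Replace only the selected half; the other half is preserved. */
    st->dregs[reg >> 1] = (st->dregs[reg >> 1] & ~mask)
                        | ((uint64_t)val << shift);
}
```

Using shifts rather than pointer casts keeps the model independent of host
byte order, so it demonstrates only the architectural aliasing and says
nothing about how the real state is laid out in memory.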

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static inline void neon_load_reg64(TCGv_i64 var, int reg)
+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
-static inline void neon_store_reg64(TCGv_i64 var, int reg)
+static inline void vfp_store_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
 static inline void vfp_load_reg32(TCGv_i32 var, int reg)
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         tcg_gen_ext_i32_i64(nf, cpu_NF);
         tcg_gen_ext_i32_i64(vf, cpu_VF);
 
-        neon_load_reg64(frn, rn);
-        neon_load_reg64(frm, rm);
+        vfp_load_reg64(frn, rn);
+        vfp_load_reg64(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
             tcg_temp_free_i64(tmp);
             break;
         }
-        neon_store_reg64(dest, rd);
+        vfp_store_reg64(dest, rd);
         tcg_temp_free_i64(frn);
         tcg_temp_free_i64(frm);
         tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i64 tcg_res;
         tcg_op = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
-        neon_load_reg64(tcg_op, rm);
+        vfp_load_reg64(tcg_op, rm);
         gen_helper_rintd(tcg_res, tcg_op, fpst);
-        neon_store_reg64(tcg_res, rd);
+        vfp_store_reg64(tcg_res, rd);
         tcg_temp_free_i64(tcg_op);
         tcg_temp_free_i64(tcg_res);
     } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         tcg_double = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
         tcg_tmp = tcg_temp_new_i32();
-        neon_load_reg64(tcg_double, rm);
+        vfp_load_reg64(tcg_double, rm);
         if (is_signed) {
             gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
     tmp = tcg_temp_new_i64();
     if (a->l) {
         gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-        neon_store_reg64(tmp, a->vd);
+        vfp_store_reg64(tmp, a->vd);
     } else {
-        neon_load_reg64(tmp, a->vd);
+        vfp_load_reg64(tmp, a->vd);
         gen_aa32_st64(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i64(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-            neon_store_reg64(tmp, a->vd + i);
+            vfp_store_reg64(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg64(tmp, a->vd + i);
+            vfp_load_reg64(tmp, a->vd + i);
             gen_aa32_st64(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     fd = tcg_temp_new_i64();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg64(f0, vn);
-    neon_load_reg64(f1, vm);
+    vfp_load_reg64(f0, vn);
+    vfp_load_reg64(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg64(fd, vd);
+            vfp_load_reg64(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vn = vfp_advance_dreg(vn, delta_d);
-        neon_load_reg64(f0, vn);
+        vfp_load_reg64(f0, vn);
         if (delta_m) {
             vm = vfp_advance_dreg(vm, delta_m);
-            neon_load_reg64(f1, vm);
+            vfp_load_reg64(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i64();
     fd = tcg_temp_new_i64();
 
-    neon_load_reg64(f0, vm);
+    vfp_load_reg64(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_dreg(vd, delta_d);
-                neon_store_reg64(fd, vd);
+                vfp_store_reg64(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vd = vfp_advance_dreg(vm, delta_m);
-        neon_load_reg64(f0, vm);
+        vfp_load_reg64(f0, vm);
     }
 
     tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i64();
 
-    neon_load_reg64(vn, a->vn);
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vn, a->vn);
+    vfp_load_reg64(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negd(vn, vn);
     }
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negd(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
 
     for (;;) {
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
     vd = tcg_temp_new_i64();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i64(vm, 0);
     } else {
-        neon_load_reg64(vm, a->vm);
+        vfp_load_reg64(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     vd = tcg_temp_new_i64();
     gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
     tmp = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
     tcg_temp_free_i64(vm);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rintd(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd_exact(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
     vd = tcg_temp_new_i64();
     vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
 
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
         /* u32 -> f64 */
         gen_helper_vfp_uitod(vd, vm, fpst);
     }
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
 
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i64();
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i64(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

In both cases, we can sink the write-back and perform
the accumulate into the normal destination temps.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.c.inc | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     if (accfn) {
         tmp = tcg_temp_new_i64();
         read_neon_element64(tmp, a->vd, 0, MO_64);
-        accfn(tmp, tmp, rd0);
-        write_neon_element64(tmp, a->vd, 0, MO_64);
+        accfn(rd0, tmp, rd0);
         read_neon_element64(tmp, a->vd, 1, MO_64);
-        accfn(tmp, tmp, rd1);
-        write_neon_element64(tmp, a->vd, 1, MO_64);
+        accfn(rd1, tmp, rd1);
         tcg_temp_free_i64(tmp);
-    } else {
-        write_neon_element64(rd0, a->vd, 0, MO_64);
-        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
+    write_neon_element64(rd0, a->vd, 0, MO_64);
+    write_neon_element64(rd1, a->vd, 1, MO_64);
     tcg_temp_free_i64(rd0);
     tcg_temp_free_i64(rd1);
 
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
         read_neon_element64(t64, a->vd, 0, MO_64);
-        accfn(t64, t64, rn0_64);
-        write_neon_element64(t64, a->vd, 0, MO_64);
+        accfn(rn0_64, t64, rn0_64);
         read_neon_element64(t64, a->vd, 1, MO_64);
-        accfn(t64, t64, rn1_64);
-        write_neon_element64(t64, a->vd, 1, MO_64);
+        accfn(rn1_64, t64, rn1_64);
         tcg_temp_free_i64(t64);
-    } else {
-        write_neon_element64(rn0_64, a->vd, 0, MO_64);
-        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
+
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
     return true;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We can use proper widening loads to extend 32-bit inputs,
and skip the "widenfn" step.
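
As a rough illustration of what a widening load does, here is a minimal
host-side sketch. The helper names load32_sext() and load32_zext() are
hypothetical and are not part of this patch; they only mirror the
difference between a sign-extending (MO_SL-style) and a zero-extending
(MO_UL-style) 32-bit load into a 64-bit value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 32 bits and sign-extend to 64 bits. */
static int64_t load32_sext(const void *p)
{
    int32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));   /* alignment-safe 32-bit load */
    return (int64_t)v;          /* sign-extending widen */
}

/* Hypothetical helper: load 32 bits and zero-extend to 64 bits. */
static uint64_t load32_zext(const void *p)
{
    uint32_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));   /* alignment-safe 32-bit load */
    return (uint64_t)v;         /* zero-extending widen */
}
```

With either helper the widening happens as part of the load itself, so
no separate 32-to-64-bit extension step is needed afterwards.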

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  6 +++
 target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
 2 files changed, 43 insertions(+), 29 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
     long off = neon_element_offset(reg, ele, memop);
 
     switch (memop) {
+    case MO_SL:
+        tcg_gen_ld32s_i64(dest, cpu_env, off);
+        break;
+    case MO_UL:
+        tcg_gen_ld32u_i64(dest, cpu_env, off);
+        break;
     case MO_Q:
         tcg_gen_ld_i64(dest, cpu_env, off);
         break;
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
 static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
                            NeonGenWidenFn *widenfn,
                            NeonGenTwo64OpFn *opfn,
-                           bool src1_wide)
+                           int src1_mop, int src2_mop)
 {
     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
     TCGv_i64 rn0_64, rn1_64, rm_64;
-    TCGv_i32 rm;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
         return false;
     }
 
-    if (!widenfn || !opfn) {
+    if (!opfn) {
         /* size == 3 case, which is an entirely different insn group */
         return false;
     }
 
-    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rn1_64 = tcg_temp_new_i64();
     rm_64 = tcg_temp_new_i64();
 
-    if (src1_wide) {
-        read_neon_element64(rn0_64, a->vn, 0, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 0, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 0, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 0, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn0_64, rn0_64, rm_64);
 
     /*
      * Load second pass inputs before storing the first pass result, to
      * avoid incorrect results if a narrow input overlaps with the result.
      */
-    if (src1_wide) {
-        read_neon_element64(rn1_64, a->vn, 1, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 1, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 1, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 1, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
     write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
     write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     return true;
 }
 
-#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
     static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
     {                                                                   \
         static NeonGenWidenFn * const widenfn[] = {                     \
             gen_helper_neon_widen_##S##8,                               \
             gen_helper_neon_widen_##S##16,                              \
-            tcg_gen_##EXT##_i32_i64,                                    \
-            NULL,                                                       \
+            NULL, NULL,                                                 \
         };                                                              \
         static NeonGenTwo64OpFn * const addfn[] = {                     \
             gen_helper_neon_##OP##l_u16,                                \
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
             tcg_gen_##OP##_i64,                                         \
             NULL,                                                       \
         };                                                              \
-        return do_prewiden_3d(s, a, widenfn[a->size],                   \
-                              addfn[a->size], SRC1WIDE);                \
+        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
+        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
+                              SRC1WIDE ? MO_Q : narrow_mop,             \
+                              narrow_mop);                              \
     }
 
-DO_PREWIDEN(VADDL_S, s, ext, add, false)
-DO_PREWIDEN(VADDL_U, u, extu, add, false)
-DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
-DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
-DO_PREWIDEN(VADDW_S, s, ext, add, true)
-DO_PREWIDEN(VADDW_U, u, extu, add, true)
-DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
-DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
+DO_PREWIDEN(VADDL_U, u, add, false, 0)
+DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
+DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
+DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
+DO_PREWIDEN(VADDW_U, u, add, true, 0)
+DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
+DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
 
 static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
                          NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
-- 
2.20.1

In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
meant we were using the H4() address swizzler macro rather than the
H2() which is required for 2-byte data.  This had no effect on
little-endian hosts but meant we put the result data into the
destination Dreg in the wrong order on big-endian hosts.
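
To illustrate the effect, here is a small sketch of the index mapping.
It assumes the big-endian-host forms of the swizzles are (x ^ 3) for
2-byte elements and (x ^ 1) for 4-byte elements; h2_be(), h4_be() and
store_results() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed big-endian-host swizzles, written as functions for illustration. */
static unsigned h2_be(unsigned i) { return i ^ 3; }  /* 2-byte elements */
static unsigned h4_be(unsigned i) { return i ^ 1; }  /* 4-byte elements */

/* Store four 16-bit results into a 64-bit register image via a swizzle. */
static void store_results(uint16_t d[4], const uint16_t r[4],
                          unsigned (*swizzle)(unsigned))
{
    for (unsigned i = 0; i < 4; i++) {
        unsigned host_idx = swizzle(i);

        assert(host_idx < 4);
        d[host_idx] = r[i];
    }
}
```

With the 2-byte swizzle, results 0..3 land at host indexes 3, 2, 1, 0.
With the 4-byte swizzle they land at 1, 0, 3, 2, which is the wrong
order for 16-bit data.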

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
---
 target/arm/vec_helper.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
         r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
                                                                         \
-        d[H4(0)] = r0;                                                  \
-        d[H4(1)] = r1;                                                  \
-        d[H4(2)] = r2;                                                  \
-        d[H4(3)] = r3;                                                  \
+        d[H2(0)] = r0;                                                  \
+        d[H2(1)] = r1;                                                  \
+        d[H2(2)] = r2;                                                  \
+        d[H2(3)] = r3;                                                  \
     }
 
 DO_NEON_PAIRWISE(neon_padd, add)
-- 
2.20.1

The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index selects a group of four 8-bit values, so
each indexed entity is 32 bits wide, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
---
 target/arm/vec_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     int8_t *n = vn;
-    int8_t *m_indexed = (int8_t *)vm + index * 4;
+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     uint8_t *n = vn;
-    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

HCR should be applied when NS is set, not when it is cleared.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
 /*
  * Non-IS variants of TLB operations are upgraded to
- * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
+ * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
  * force broadcast of these operations.
  */
 static bool tlb_force_broadcast(CPUARMState *env)
 {
-    return (env->cp15.hcr_el2 & HCR_FB) &&
-        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
+    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
 }
 
 static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

Secure mode is not exempt from the SCR_EL3.TLOR check, nor, once
S-EL2 is enabled in the future, from the HCR_EL2.TLOR check.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
 #endif
 
 /* Shared logic between LORID and the rest of the LOR* registers.
- * Secure state has already been delt with.
+ * Secure state exclusion has already been dealt with.
  */
-static CPAccessResult access_lor_ns(CPUARMState *env)
+static CPAccessResult access_lor_ns(CPUARMState *env,
+                                    const ARMCPRegInfo *ri, bool isread)
 {
     int el = arm_current_el(env);
 
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
     return CP_ACCESS_OK;
 }
 
-static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
-                                   bool isread)
-{
-    if (arm_is_secure_below_el3(env)) {
-        /* Access ok in secure mode.  */
-        return CP_ACCESS_OK;
-    }
-    return access_lor_ns(env);
-}
-
 static CPAccessResult access_lor_other(CPUARMState *env,
                                        const ARMCPRegInfo *ri, bool isread)
 {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
         /* Access denied in secure mode.  */
         return CP_ACCESS_TRAP;
     }
-    return access_lor_ns(env);
+    return access_lor_ns(env, ri, isread);
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
-      .access = PL1_R, .accessfn = access_lorid,
+      .access = PL1_R, .accessfn = access_lor_ns,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     REGINFO_SENTINEL
 };
-- 
2.20.1

If we're using the capstone disassembler, disassembly of a run of
instructions more than 32 bytes long disassembles the wrong data for
instructions beyond the 32 byte mark:

(qemu) xp /16x 0x100
0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
(qemu) xp /16i 0x100
0x00000100: 00000005 andeq r0, r0, r5
0x00000104: 54410001 strbpl r0, [r1], #-1
0x00000108: 00000001 andeq r0, r0, r1
0x0000010c: 00001000 andeq r1, r0, r0
0x00000110: 00000000 andeq r0, r0, r0
0x00000114: 00000004 andeq r0, r0, r4
0x00000118: 54410002 strbpl r0, [r1], #-2
0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x00000120: 54410001 strbpl r0, [r1], #-1
0x00000124: 00000001 andeq r0, r0, r1
0x00000128: 00001000 andeq r1, r0, r0
0x0000012c: 00000000 andeq r0, r0, r0
0x00000130: 00000004 andeq r0, r0, r4
0x00000134: 54410002 strbpl r0, [r1], #-2
0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x0000013c: 00000000 andeq r0, r0, r0

Here the disassembly of 0x120..0x13f is using the data that is in
0x104..0x123.

This is caused by passing the wrong value to the read_memory_func().
The intention is that at this point in the loop the 'cap_buf' buffer
already contains 'csize' bytes of data for the instruction at guest
addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
time through the loop 'csize' happens to be zero, so the initial read
of 32 bytes into cap_buf is correct and as long as the disassembly
never needs to read more data we return the correct information.

Use the correct guest address in the call to read_memory_func().
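
The refill step can be sketched in isolation as follows. This is only
an illustration: read_guest() is a hypothetical stand-in for
read_memory_func() that copies from a flat byte array, and refill() is
not a function in disas/capstone.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for read_memory_func(): copy from a flat image. */
static void read_guest(const uint8_t *guest, uint64_t addr,
                       uint8_t *dst, size_t len)
{
    memcpy(dst, guest + addr, len);
}

/*
 * Append tsize bytes to a buffer that already holds csize bytes
 * starting at guest address pc. Returns the new byte count.
 */
static size_t refill(const uint8_t *guest, uint64_t pc,
                     uint8_t *buf, size_t csize, size_t tsize)
{
    /* Make certain that we can make progress. */
    assert(tsize != 0);
    /* The new bytes follow those already buffered: pc + csize, not pc. */
    read_guest(guest, pc + csize, buf + csize, tsize);
    return csize + tsize;
}
```

Calling refill() twice with the same pc leaves one contiguous run of
guest bytes in the buffer. Reading from pc on the second call would
instead duplicate the first chunk.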

Cc: qemu-stable@nongnu.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
---
 disas/capstone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/disas/capstone.c b/disas/capstone.c
index XXXXXXX..XXXXXXX 100644
--- a/disas/capstone.c
+++ b/disas/capstone.c
@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
 
         /* Make certain that we can make progress.  */
         assert(tsize != 0);
-        info->read_memory_func(pc, cap_buf + csize, tsize, info);
+        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
         csize += tsize;
 
         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):

CID 1432363 (#1 of 1): Unintentional integer overflow:

overflow_before_widen:
    Potentially overflowing expression 1 << scale with type int
    (32 bits, signed) is evaluated using 32-bit arithmetic, and
    then used in a context that expects an expression of type
    hwaddr (64 bits, unsigned).
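
A minimal sketch of the difference, under stated assumptions:
EXAMPLE_BIT_ULL() is a local stand-in for a BIT_ULL()-style macro, and
num_pages_narrow()/num_pages_wide() are hypothetical helpers. The
narrow variant uses unsigned 32-bit arithmetic so that the wrap is
well defined in this example; the expression flagged by Coverity uses
a signed int.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for a BIT_ULL()-style macro: 64-bit value, bit nr set. */
#define EXAMPLE_BIT_ULL(nr) (1ULL << (nr))

/* 32-bit arithmetic: the product wraps before it is widened. */
static uint64_t num_pages_narrow(uint32_t num, uint32_t scale)
{
    uint32_t pages;

    assert(scale < 32);
    pages = (uint32_t)((num + 1u) * ((uint32_t)1 << scale));
    return pages;
}

/* 64-bit arithmetic throughout: the product is computed at full width. */
static uint64_t num_pages_wide(uint32_t num, uint32_t scale)
{
    assert(scale < 64);
    return (uint64_t)(num + 1u) * EXAMPLE_BIT_ULL(scale);
}
```

For small inputs the two agree. For num = 31 and scale = 31 the narrow
version wraps to 0, while the wide version yields 2^36.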

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20201030144617.1535064-1-philmd@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/smmuv3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bitops.h"
 #include "hw/irq.h"
 #include "hw/sysbus.h"
 #include "migration/vmstate.h"
@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
         scale = CMD_SCALE(cmd);
         num = CMD_NUM(cmd);
         ttl = CMD_TTL(cmd);
-        num_pages = (num + 1) * (1 << (scale));
+        num_pages = (num + 1) * BIT_ULL(scale);
     }
 
     if (type == SMMU_CMD_TLBI_NH_VA) {
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
that SVE will not trap to EL3.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030151541.11976-1-remi@remlab.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
                     if (cpu_isar_feature(aa64_mte, cpu)) {
                         env->cp15.scr_el3 |= SCR_ATA;
                     }
+                    if (cpu_isar_feature(aa64_sve, cpu)) {
+                        env->cp15.cptr_el[3] |= CPTR_EZ;
+                    }
                     /* AArch64 kernels never boot in secure mode */
                     assert(!info->secure_boot);
                     /* This hook is only supported for AArch32 currently:
-- 
2.20.1

From: AlexChen <alex.chen@huawei.com>

In omap_update_display(), the pointer omap_lcd is dereferenced before
being checked for validity, which may lead to a NULL pointer dereference.
So move the assignment to surface after checking that omap_lcd is valid,
and move the surface_bits_per_pixel(surface) check to after the surface
assignment.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: AlexChen <alex.chen@huawei.com>
Message-id: 5F9CDB8A.9000001@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/omap_lcdc.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/omap_lcdc.c
+++ b/hw/display/omap_lcdc.c
@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
 static void omap_update_display(void *opaque)
 {
     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
-    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
+    DisplaySurface *surface;
     draw_line_func draw_line;
     int size, height, first, last;
     int width, linesize, step, bpp, frame_offset;
     hwaddr frame_base;
 
-    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
-        !surface_bits_per_pixel(surface)) {
+    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
+        return;
+    }
+
+    surface = qemu_console_surface(omap_lcd->con);
+    if (!surface_bits_per_pixel(surface)) {
         return;
     }
 
-- 
2.20.1
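
The check-before-dereference ordering used in the patch above can be sketched on its own. This is an illustration only: `struct panel`, `struct surface` and `panel_needs_update` are hypothetical stand-ins, not the QEMU types or functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the device state and its display surface. */
struct surface { int bits_per_pixel; };
struct panel {
    bool enable;
    struct surface *con;
};

/* Returns true if the panel should be redrawn. */
bool panel_needs_update(struct panel *p)
{
    struct surface *surface;   /* declared, but not yet initialised */

    if (!p || !p->enable) {
        return false;          /* p is never dereferenced when NULL */
    }

    surface = p->con;          /* safe: p was checked above */
    return surface != NULL && surface->bits_per_pixel != 0;
}
```

Declaring the variable without an initialiser and assigning it only after the guard is what removes the early dereference.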

From: AlexChen <alex.chen@huawei.com>

In exynos4210_fimd_update(), the pointer s is dereferenced before
being checked for validity, which may lead to a NULL pointer dereference.
So move the assignment to global_width after checking that s is valid.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 5F9F8D88.9030102@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/exynos4210_fimd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/exynos4210_fimd.c
+++ b/hw/display/exynos4210_fimd.c
@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
     bool blend = false;
     uint8_t *host_fb_addr;
     bool is_dirty = false;
-    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
+    int global_width;
 
     if (!s || !s->console || !s->enabled ||
         surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
         return;
     }
+
+    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
     exynos4210_update_resolution(s);
     surface = qemu_console_surface(s->console);
 
-- 
2.20.1

In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
This is incorrect when the security state being queried is not the
current one, because arm_current_el() uses the current security state
to determine which of the banked CONTROL.nPRIV bits to look at.
The effect was that if (for instance) Secure state was in privileged
mode but Non-Secure was not then we would return the wrong MMU index.

The only places where we are using this function in a way that could
trigger this bug are for the stack loads during a v8M function-return
and for the instruction fetch of a v8M SG insn.

Fix the bug by expanding out the M-profile version of the
arm_current_el() logic inline so it can use the passed in secstate
rather than env->v7m.secure.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
---
 target/arm/m_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
 /* Return the MMU index for a v7M CPU in the specified security state */
 ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 {
-    bool priv = arm_current_el(env) != 0;
+    bool priv = arm_v7m_is_handler_mode(env) ||
+        !(env->v7m.control[secstate] & 1);
 
     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
 }
-- 
2.20.1
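
The privilege calculation in the m_helper patch above can be modelled in a few lines. This is a cut-down sketch under stated assumptions: `struct v7m_state` and `v7m_priv_for_secstate` are hypothetical names, and the banked CONTROL array is assumed to be indexed with 0 for Non-secure and 1 for Secure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down model of the M-profile state involved. */
struct v7m_state {
    bool handler_mode;      /* true while executing an exception handler */
    uint32_t control[2];    /* banked CONTROL: [0] Non-secure, [1] Secure */
};

/* Privilege for the *requested* security state, not the current one. */
bool v7m_priv_for_secstate(const struct v7m_state *s, bool secstate)
{
    /* Bit 0 of CONTROL is nPRIV: set means unprivileged thread mode. */
    return s->handler_mode || !(s->control[secstate] & 1);
}
```

With Secure privileged and Non-secure unprivileged in thread mode, the two security states give different answers, which is the case the old code got wrong.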

On some hosts (eg Ubuntu Bionic) pkg-config returns a set of
libraries for gio-2.0 which don't actually work when compiling
statically. (Specifically, the returned library string includes
-lmount, but not -lblkid which -lmount depends upon, so linking
fails due to missing symbols.)

Check that the libraries work, and don't enable gio if they don't,
in the same way we do for gnutls.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
---
 configure | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index XXXXXXX..XXXXXXX 100755
--- a/configure
+++ b/configure
@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
 fi
 
 if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
-    gio=yes
     gio_cflags=$($pkg_config --cflags gio-2.0)
     gio_libs=$($pkg_config --libs gio-2.0)
     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
     if [ ! -x "$gdbus_codegen" ]; then
         gdbus_codegen=
     fi
+    # Check that the libraries actually work -- Ubuntu 18.04 ships
+    # with pkg-config --static --libs data for gio-2.0 that is missing
+    # -lblkid and will give a link error.
+    write_c_skeleton
+    if compile_prog "" "$gio_libs" ; then
+        gio=yes
+    else
+        gio=no
+    fi
 else
     gio=no
 fi
-- 
2.20.1

In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
into the GICv3CPUState struct's maintenance_irq field.  This will
only work if the board happens to have already wired up the CPU
maintenance IRQ before the GIC was realized.  Unfortunately this is
not the case for the 'virt' board, and so the value that gets copied
is NULL (since a qemu_irq is really a pointer to an IRQState struct
under the hood).  The effect is that the CPU interface code never
actually raises the maintenance interrupt line.

Instead, since the GICv3CPUState has a pointer to the CPUState, make
the dereference at the point where we want to raise the interrupt, to
avoid an implicit requirement on board code to wire things up in a
particular order.

Reported-by: Jose Martins <josemartins90@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
Reviewed-by: Luc Michel <luc@lmichel.fr>
---
 include/hw/intc/arm_gicv3_common.h | 1 -
 hw/intc/arm_gicv3_cpuif.c          | 5 ++---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
     qemu_irq parent_fiq;
     qemu_irq parent_virq;
     qemu_irq parent_vfiq;
-    qemu_irq maintenance_irq;
 
     /* Redistributor */
     uint32_t level;                  /* Current IRQ level */
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
     int irqlevel = 0;
     int fiqlevel = 0;
     int maintlevel = 0;
+    ARMCPU *cpu = ARM_CPU(cs->cpu);
 
     idx = hppvi_index(cs);
     trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
 
     qemu_set_irq(cs->parent_vfiq, fiqlevel);
     qemu_set_irq(cs->parent_virq, irqlevel);
-    qemu_set_irq(cs->maintenance_irq, maintlevel);
+    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
 }
 
 static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
             && cpu->gic_num_lrs) {
             int j;
 
-            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
-
             cs->num_list_regs = cpu->gic_num_lrs;
             cs->vpribits = cpu->gic_vpribits;
             cs->vprebits = cpu->gic_vprebits;
-- 
2.20.1
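
The ordering problem described in the GICv3 patch above comes down to copying a pointer too early versus reading it when it is needed. The sketch below uses hypothetical stand-in types (`struct irq_line`, `struct cpu`, `struct cpuif`), not the QEMU ones, and treats an interrupt line as a plain pointer.

```c
#include <stddef.h>

/* Hypothetical stand-ins: an interrupt line is just a pointer here. */
struct irq_line { int level; };
struct cpu { struct irq_line *maintenance_irq; };
struct cpuif { struct cpu *cpu; };

/* Look the line up when raising it, rather than caching it at init. */
void cpuif_set_maintenance(struct cpuif *cs, int level)
{
    struct irq_line *line = cs->cpu->maintenance_irq;

    if (line) {             /* an unwired line is silently ignored */
        line->level = level;
    }
}
```

Because the lookup happens at use time, it does not matter whether the board wires the line before or after the interface is initialised.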

The kerneldoc script currently emits Sphinx markup for a macro with
arguments that uses the c:function directive. This is correct for
Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
documentation of macros with arguments and c:function is not picky
about the syntax of what it is passed. However, in Sphinx 3 the
c:macro directive was enhanced to support macros with arguments,
and c:function was made more picky about what syntax it accepted.

When kerneldoc is told that it needs to produce output for Sphinx
3 or later, make it emit c:function only for functions and c:macro
for macros with arguments. We assume that anything with a return
type is a function and anything without is a macro.

This fixes the Sphinx error:

/home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
If declarator-id with parameters (e.g., 'void f(int arg)'):
  Invalid C declaration: Expected identifier in nested name. [error at 25]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    -------------------------^
If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
  Error in declarator or parameters
  Invalid C declaration: Expecting "(" in parameters. [error at 39]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    ---------------------------------------^

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
---
 scripts/kernel-doc | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index XXXXXXX..XXXXXXX 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
 	output_highlight_rst($args{'purpose'});
 	$start = "\n\n**Syntax**\n\n  ``";
     } else {
-	print ".. c:function:: ";
+        if ((split(/\./, $sphinx_version))[0] >= 3) {
+            # Sphinx 3 and later distinguish macros and functions and
+            # complain if you use c:function with something that's not
+            # syntactically valid as a function declaration.
+            # We assume that anything with a return type is a function
+            # and anything without is a macro.
+            if ($args{'functiontype'} ne "") {
+                print ".. c:function:: ";
+            } else {
+                print ".. c:macro:: ";
+            }
+        } else {
+            # Older Sphinx don't support documenting macros that take
+            # arguments with c:macro, and don't complain about the use
+            # of c:function for this.
+            print ".. c:function:: ";
+        }
     }
     if ($args{'functiontype'} ne "") {
 	$start .= $args{'functiontype'} . " " . $args{'function'} . " (";
-- 
2.20.1

Sphinx 3.2 is pickier than earlier versions about the option:: markup,
and complains about our usage in qemu-option-trace.rst:

../../docs/qemu-option-trace.rst.inc:4:Malformed option description
  '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
  "/opt args" or "+opt args"

In this file, we're really trying to document the different parts of
the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
have already introduced with an option:: markup.  So it's not right
to use option:: here anyway.  Switch to a different markup
(definition lists) which gives about the same formatted output.

(Unlike option::, this markup doesn't produce index entries; but
at the moment we don't do anything much with indexes anyway, and
in any case I think it doesn't make much sense to have individual
index entries for the sub-parts of the --trace option.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
---
 docs/qemu-option-trace.rst.inc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
index XXXXXXX..XXXXXXX 100644
--- a/docs/qemu-option-trace.rst.inc
+++ b/docs/qemu-option-trace.rst.inc
@@ -XXX,XX +XXX,XX @@
 
 Specify tracing options.
 
-.. option:: [enable=]PATTERN
+``[enable=]PATTERN``
 
   Immediately enable events matching *PATTERN*
   (either event name or a globbing pattern).  This option is only
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
 
   Use :option:`-trace help` to print a list of names of trace points.
 
-.. option:: events=FILE
+``events=FILE``
 
   Immediately enable events listed in *FILE*.
   The file must contain one event name (as listed in the ``trace-events-all``
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
   available if QEMU has been compiled with the ``simple``, ``log`` or
   ``ftrace`` tracing backend.
 
-.. option:: file=FILE
+``file=FILE``
 
   Log output traces to *FILE*.
   This option is only available if QEMU has been compiled with
-- 
2.20.1

The randomness tests in the NPCM7xx RNG test fail intermittently
but fairly frequently. On my machine running the test in a loop:
 while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done

will fail in less than a minute with an error like:
ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)

(Failures have been observed on all 4 of the randomness tests,
not just first_byte_runs.)

It's not clear why these tests are failing like this, but intermittent
failures make CI and merge testing awkward, so disable running them
unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
running the test suite, until we work out the cause.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
---
 tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/npcm7xx_rng-test.c
+++ b/tests/qtest/npcm7xx_rng-test.c
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
 
     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
-    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
-    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
-    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
-    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    /*
+     * These tests fail intermittently; only run them on explicit
+     * request until we figure out why.
+     */
+    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
+        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
+        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    }
 
     qtest_start("-machine npcm750-evb");
     ret = g_test_run();
-- 
2.20.1
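
The opt-in gate used in the RNG test patch above is a plain environment-variable check. A minimal sketch, with `flaky_rng_tests_enabled` as a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Returns true only when the developer has opted in to the flaky tests. */
bool flaky_rng_tests_enabled(void)
{
    /* Any value, even an empty string, counts as "set". */
    return getenv("QEMU_TEST_FLAKY_RNG_TESTS") != NULL;
}
```

Under this scheme a developer would presumably export QEMU_TEST_FLAKY_RNG_TESTS (to any value) before running the test binary; CI runs that leave it unset skip the four randomness tests.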

Hi; here's a target-arm pullreq. Mostly this is RTH's FEAT_RME
series; there are also a handful of bug fixes including some
which aren't arm-specific but which it's convenient to include
here.

thanks
-- PMM

The following changes since commit b455ce4c2f300c8ba47cba7232dd03261368a4cb:

Merge tag 'q800-for-8.1-pull-request' of https://github.com/vivier/qemu-m68k into staging (2023-06-22 10:18:32 +0200)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230623

for you to fetch changes up to 497fad38979c16b6412388927401e577eba43d26:

pc-bios/keymaps: Use the official xkb name for Arabic layout, not the legacy synonym (2023-06-23 11:46:02 +0100)

----------------------------------------------------------------
target-arm queue:
 * Add (experimental) support for FEAT_RME
 * host-utils: Avoid using __builtin_subcll on buggy versions of Apple Clang
 * target/arm: Restructure has_vfp_d32 test
 * hw/arm/sbsa-ref: add ITS support in SBSA GIC
 * target/arm: Fix sve predicate store, 8 <= VQ <= 15
 * pc-bios/keymaps: Use the official xkb name for Arabic layout, not the legacy synonym

----------------------------------------------------------------
Peter Maydell (2):
      host-utils: Avoid using __builtin_subcll on buggy versions of Apple Clang
      pc-bios/keymaps: Use the official xkb name for Arabic layout, not the legacy synonym

Richard Henderson (23):
      target/arm: Add isar_feature_aa64_rme
      target/arm: Update SCR and HCR for RME
      target/arm: SCR_EL3.NS may be RES1
      target/arm: Add RME cpregs
      target/arm: Introduce ARMSecuritySpace
      include/exec/memattrs: Add two bits of space to MemTxAttrs
      target/arm: Adjust the order of Phys and Stage2 ARMMMUIdx
      target/arm: Introduce ARMMMUIdx_Phys_{Realm,Root}
      target/arm: Remove __attribute__((nonnull)) from ptw.c
      target/arm: Pipe ARMSecuritySpace through ptw.c
      target/arm: NSTable is RES0 for the RME EL3 regime
      target/arm: Handle Block and Page bits for security space
      target/arm: Handle no-execute for Realm and Root regimes
      target/arm: Use get_phys_addr_with_struct in S1_ptw_translate
      target/arm: Move s1_is_el0 into S1Translate
      target/arm: Use get_phys_addr_with_struct for stage2
      target/arm: Add GPC syndrome
      target/arm: Implement GPC exceptions
      target/arm: Implement the granule protection check
      target/arm: Add cpu properties for enabling FEAT_RME
      docs/system/arm: Document FEAT_RME
      target/arm: Restructure has_vfp_d32 test
      target/arm: Fix sve predicate store, 8 <= VQ <= 15

Shashi Mallela (1):
      hw/arm/sbsa-ref: add ITS support in SBSA GIC

From: Richard Henderson <richard.henderson@linaro.org>

Add the missing field for ID_AA64PFR0, and the predicate.
Disable it if EL3 is forced off by the board or command-line.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h | 6 ++++++
 target/arm/cpu.c | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64PFR0, SEL2, 36, 4)
 FIELD(ID_AA64PFR0, MPAM, 40, 4)
 FIELD(ID_AA64PFR0, AMU, 44, 4)
 FIELD(ID_AA64PFR0, DIT, 48, 4)
+FIELD(ID_AA64PFR0, RME, 52, 4)
 FIELD(ID_AA64PFR0, CSV2, 56, 4)
 FIELD(ID_AA64PFR0, CSV3, 60, 4)
 
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sel2(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, SEL2) != 0;
 }
 
+static inline bool isar_feature_aa64_rme(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, RME) != 0;
+}
+
 static inline bool isar_feature_aa64_vh(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, VH) != 0;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, COPSDBG, 0);
         cpu->isar.id_aa64pfr0 = FIELD_DP64(cpu->isar.id_aa64pfr0,
                                            ID_AA64PFR0, EL3, 0);
+
+        /* Disable the realm management extension, which requires EL3. */
+        cpu->isar.id_aa64pfr0 = FIELD_DP64(cpu->isar.id_aa64pfr0,
+                                           ID_AA64PFR0, RME, 0);
     }
 
     if (!cpu->has_el2) {
-- 
2.34.1
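
The feature test and the "force off" step in the patch above are both operations on a 4-bit field of ID_AA64PFR0. The sketch below does the same thing with plain masks; the helper names are hypothetical, and the bit position is taken from the FIELD() line in the patch (shift 52, length 4).

```c
#include <stdbool.h>
#include <stdint.h>

/* Layout taken from the FIELD() line in the patch: bits [55:52]. */
#define PFR0_RME_SHIFT 52
#define PFR0_RME_MASK  (0xfULL << PFR0_RME_SHIFT)

/* Any non-zero field value means RME is implemented. */
bool pfr0_has_rme(uint64_t id_aa64pfr0)
{
    return (id_aa64pfr0 & PFR0_RME_MASK) != 0;
}

/* Clear the field, as done when EL3 is disabled. */
uint64_t pfr0_clear_rme(uint64_t id_aa64pfr0)
{
    return id_aa64pfr0 & ~PFR0_RME_MASK;
}
```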

From: Richard Henderson <richard.henderson@linaro.org>

Define the missing SCR and HCR bits, allow SCR_NSE and {SCR,HCR}_GPF
to be set, and invalidate TLBs when NSE changes.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h    |  5 +++--
 target/arm/helper.c | 10 ++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
 #define HCR_TERR      (1ULL << 36)
 #define HCR_TEA       (1ULL << 37)
 #define HCR_MIOCNCE   (1ULL << 38)
-/* RES0 bit 39 */
+#define HCR_TME       (1ULL << 39)
 #define HCR_APK       (1ULL << 40)
 #define HCR_API       (1ULL << 41)
 #define HCR_NV        (1ULL << 42)
@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
 #define HCR_NV2       (1ULL << 45)
 #define HCR_FWB       (1ULL << 46)
 #define HCR_FIEN      (1ULL << 47)
-/* RES0 bit 48 */
+#define HCR_GPF       (1ULL << 48)
 #define HCR_TID4      (1ULL << 49)
 #define HCR_TICAB     (1ULL << 50)
 #define HCR_AMVOFFEN  (1ULL << 51)
@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
 #define SCR_TRNDR             (1ULL << 40)
 #define SCR_ENTP2             (1ULL << 41)
 #define SCR_GPF               (1ULL << 48)
+#define SCR_NSE               (1ULL << 62)
 
 #define HSTR_TTEE (1 << 16)
 #define HSTR_TJDBX (1 << 17)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
         if (cpu_isar_feature(aa64_fgt, cpu)) {
             valid_mask |= SCR_FGTEN;
         }
+        if (cpu_isar_feature(aa64_rme, cpu)) {
+            valid_mask |= SCR_NSE | SCR_GPF;
+        }
     } else {
         valid_mask &= ~(SCR_RW | SCR_ST);
         if (cpu_isar_feature(aa32_ras, cpu)) {
@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
     env->cp15.scr_el3 = value;
 
     /*
-     * If SCR_EL3.NS changes, i.e. arm_is_secure_below_el3, then
+     * If SCR_EL3.{NS,NSE} changes, i.e. change of security state,
      * we must invalidate all TLBs below EL3.
      */
-    if (changed & SCR_NS) {
+    if (changed & (SCR_NS | SCR_NSE)) {
         tlb_flush_by_mmuidx(env_cpu(env), (ARMMMUIdxBit_E10_0 |
                                            ARMMMUIdxBit_E20_0 |
                                            ARMMMUIdxBit_E10_1 |
@@ -XXX,XX +XXX,XX @@ static void do_hcr_write(CPUARMState *env, uint64_t value, uint64_t valid_mask)
         if (cpu_isar_feature(aa64_fwb, cpu)) {
             valid_mask |= HCR_FWB;
         }
+        if (cpu_isar_feature(aa64_rme, cpu)) {
+            valid_mask |= HCR_GPF;
+        }
     }
 
     if (cpu_isar_feature(any_evt, cpu)) {
-- 
2.34.1
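
The interaction between the valid mask and the TLB-flush condition in the patch above can be modelled as follows. This is a reduced sketch: `scr_write_model` is a hypothetical function that tracks only NS, NSE and GPF, and it returns whether a flush would be needed rather than performing one.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCR_NS   (1ULL << 0)
#define SCR_GPF  (1ULL << 48)
#define SCR_NSE  (1ULL << 62)

/* Returns true if the write changes the security state (NS or NSE). */
bool scr_write_model(uint64_t *scr, uint64_t value, bool have_rme)
{
    uint64_t valid_mask = SCR_NS;      /* only the bits modelled here */
    uint64_t changed;

    if (have_rme) {
        valid_mask |= SCR_NSE | SCR_GPF;
    }
    value &= valid_mask;               /* unsupported bits are dropped */
    changed = *scr ^ value;
    *scr = value;

    return (changed & (SCR_NS | SCR_NSE)) != 0;
}
```

Without RME the NSE bit is masked out before the comparison, so writing it cannot trigger a flush; with RME, a change to either NS or NSE does.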

From: Richard Henderson <richard.henderson@linaro.org>

With RME, SEL2 must also be present to support secure state.
The NS bit is RES1 if SEL2 is not present.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
         }
         if (cpu_isar_feature(aa64_sel2, cpu)) {
             valid_mask |= SCR_EEL2;
+        } else if (cpu_isar_feature(aa64_rme, cpu)) {
+            /* With RME and without SEL2, NS is RES1 (R_GSWWH, I_DJJQJ). */
+            value |= SCR_NS;
         }
         if (cpu_isar_feature(aa64_mte, cpu)) {
             valid_mask |= SCR_ATA;
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

This includes GPCCR, GPTBR, MFAR, the TLB flush insns PAALL, PAALLOS,
RPALOS, RPAOS, and the cache flush insns CIPAPA and CIGDPAPA.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h    | 19 ++++++++++
 target/arm/helper.c | 84 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         uint64_t fgt_read[2]; /* HFGRTR, HDFGRTR */
         uint64_t fgt_write[2]; /* HFGWTR, HDFGWTR */
         uint64_t fgt_exec[1]; /* HFGITR */
+
+        /* RME registers */
+        uint64_t gpccr_el3;
+        uint64_t gptbr_el3;
+        uint64_t mfar_el3;
     } cp15;
 
     struct {
@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
     uint64_t reset_cbar;
     uint32_t reset_auxcr;
     bool reset_hivecs;
+    uint8_t reset_l0gptsz;
 
     /*
      * Intermediate values used during property parsing.
@@ -XXX,XX +XXX,XX @@ FIELD(MVFR1, SIMDFMAC, 28, 4)
 FIELD(MVFR2, SIMDMISC, 0, 4)
 FIELD(MVFR2, FPMISC, 4, 4)
 
+FIELD(GPCCR, PPS, 0, 3)
+FIELD(GPCCR, IRGN, 8, 2)
+FIELD(GPCCR, ORGN, 10, 2)
+FIELD(GPCCR, SH, 12, 2)
+FIELD(GPCCR, PGS, 14, 2)
+FIELD(GPCCR, GPC, 16, 1)
+FIELD(GPCCR, GPCP, 17, 1)
+FIELD(GPCCR, L0GPTSZ, 20, 4)
+
+FIELD(MFAR, FPA, 12, 40)
+FIELD(MFAR, NSE, 62, 1)
+FIELD(MFAR, NS, 63, 1)
+
 QEMU_BUILD_BUG_ON(ARRAY_SIZE(((ARMCPU *)0)->ccsidr) <= R_V7M_CSSELR_INDEX_MASK);
 
 /* If adding a feature bit which corresponds to a Linux ELF
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo sme_reginfo[] = {
       .access = PL2_RW, .accessfn = access_esm,
       .type = ARM_CP_CONST, .resetvalue = 0 },
 };
+
+static void tlbi_aa64_paall_write(CPUARMState *env, const ARMCPRegInfo *ri,
+                                  uint64_t value)
+{
+    CPUState *cs = env_cpu(env);
+
+    tlb_flush(cs);
+}
+
+static void gpccr_write(CPUARMState *env, const ARMCPRegInfo *ri,
+                        uint64_t value)
+{
+    /* L0GPTSZ is RO; other bits not mentioned are RES0. */
+    uint64_t rw_mask = R_GPCCR_PPS_MASK | R_GPCCR_IRGN_MASK |
+        R_GPCCR_ORGN_MASK | R_GPCCR_SH_MASK | R_GPCCR_PGS_MASK |
+        R_GPCCR_GPC_MASK | R_GPCCR_GPCP_MASK;
+
+    env->cp15.gpccr_el3 = (value & rw_mask) | (env->cp15.gpccr_el3 & ~rw_mask);
+}
+
+static void gpccr_reset(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    env->cp15.gpccr_el3 = FIELD_DP64(0, GPCCR, L0GPTSZ,
+                                     env_archcpu(env)->reset_l0gptsz);
+}
+
+static void tlbi_aa64_paallos_write(CPUARMState *env, const ARMCPRegInfo *ri,
+                                    uint64_t value)
+{
+    CPUState *cs = env_cpu(env);
+
+    tlb_flush_all_cpus_synced(cs);
+}
+
+static const ARMCPRegInfo rme_reginfo[] = {
+    { .name = "GPCCR_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 2, .crm = 1, .opc2 = 6,
+      .access = PL3_RW, .writefn = gpccr_write, .resetfn = gpccr_reset,
+      .fieldoffset = offsetof(CPUARMState, cp15.gpccr_el3) },
+    { .name = "GPTBR_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 2, .crm = 1, .opc2 = 4,
+      .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.gptbr_el3) },
+    { .name = "MFAR_EL3", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 6, .crn = 6, .crm = 0, .opc2 = 5,
+      .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.mfar_el3) },
+    { .name = "TLBI_PAALL", .state = ARM_CP_STATE_AA64,
+      .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 4,
+      .access = PL3_W, .type = ARM_CP_NO_RAW,
+      .writefn = tlbi_aa64_paall_write },
+    { .name = "TLBI_PAALLOS", .state = ARM_CP_STATE_AA64,
+      .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 4,
+      .access = PL3_W, .type = ARM_CP_NO_RAW,
+      .writefn = tlbi_aa64_paallos_write },
+    /*
+     * QEMU does not have a way to invalidate by physical address, thus
+     * invalidating a range of physical addresses is accomplished by
+     * flushing all tlb entries in the outer shareable domain,
+     * just like PAALLOS.
+     */
+    { .name = "TLBI_RPALOS", .state = ARM_CP_STATE_AA64,
+      .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 4, .opc2 = 7,
+      .access = PL3_W, .type = ARM_CP_NO_RAW,
+      .writefn = tlbi_aa64_paallos_write },
+    { .name = "TLBI_RPAOS", .state = ARM_CP_STATE_AA64,
+      .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 4, .opc2 = 3,
+      .access = PL3_W, .type = ARM_CP_NO_RAW,
+      .writefn = tlbi_aa64_paallos_write },
+    { .name = "DC_CIPAPA", .state = ARM_CP_STATE_AA64,
+      .opc0 = 1, .opc1 = 6, .crn = 7, .crm = 14, .opc2 = 1,
+      .access = PL3_W, .type = ARM_CP_NOP },
+};
+
+static const ARMCPRegInfo rme_mte_reginfo[] = {
+    { .name = "DC_CIGDPAPA", .state = ARM_CP_STATE_AA64,
+      .opc0 = 1, .opc1 = 6, .crn = 7, .crm = 14, .opc2 = 5,
+      .access = PL3_W, .type = ARM_CP_NOP },
+};
 #endif /* TARGET_AARCH64 */
 
 static void define_pmu_regs(ARMCPU *cpu)
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
     if (cpu_isar_feature(aa64_fgt, cpu)) {
         define_arm_cp_regs(cpu, fgt_reginfo);
     }
+
+    if (cpu_isar_feature(aa64_rme, cpu)) {
+        define_arm_cp_regs(cpu, rme_reginfo);
+        if (cpu_isar_feature(aa64_mte, cpu)) {
+            define_arm_cp_regs(cpu, rme_mte_reginfo);
+        }
+    }
 #endif
 
     if (cpu_isar_feature(any_predinv, cpu)) {
-- 
2.34.1
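
The read-only handling of L0GPTSZ in gpccr_write() above is a masked merge of the old and new register values. A minimal sketch, with `gpccr_merge` as a hypothetical name and the mask computed by hand from the FIELD(GPCCR, ...) definitions in the patch:

```c
#include <stdint.h>

/*
 * Writable bits, derived from the FIELD(GPCCR, ...) lines in the patch:
 * PPS [2:0], IRGN [9:8], ORGN [11:10], SH [13:12], PGS [15:14],
 * GPC [16], GPCP [17]. L0GPTSZ [23:20] is deliberately left out.
 */
#define GPCCR_RW_MASK 0x3ff07ULL

/* Take writable bits from the new value, everything else from the old. */
uint64_t gpccr_merge(uint64_t old, uint64_t value)
{
    return (value & GPCCR_RW_MASK) | (old & ~GPCCR_RW_MASK);
}
```

Whatever the guest writes, the L0GPTSZ value installed at reset survives, and bits outside the mask that were zero stay zero.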

From: Richard Henderson <richard.henderson@linaro.org>

Introduce both the enumeration and functions to retrieve
the current state, and state outside of EL3.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h    | 89 ++++++++++++++++++++++++++++++++++-----------
 target/arm/helper.c | 60 ++++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+), 22 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline int arm_feature(CPUARMState *env, int feature)
 
 void arm_cpu_finalize_features(ARMCPU *cpu, Error **errp);
 
-#if !defined(CONFIG_USER_ONLY)
 /*
+ * ARM v9 security states.
+ * The ordering of the enumeration corresponds to the low 2 bits
+ * of the GPI value, and (except for Root) the concat of NSE:NS.
+ */
+
+typedef enum ARMSecuritySpace {
+    ARMSS_Secure     = 0,
+    ARMSS_NonSecure  = 1,
+    ARMSS_Root       = 2,
+    ARMSS_Realm      = 3,
+} ARMSecuritySpace;
+
+/* Return true if @space is secure, in the pre-v9 sense. */
+static inline bool arm_space_is_secure(ARMSecuritySpace space)
+{
+    return space == ARMSS_Secure || space == ARMSS_Root;
+}
+
+/* Return the ARMSecuritySpace for @secure, assuming !RME or EL[0-2]. */
+static inline ARMSecuritySpace arm_secure_to_space(bool secure)
+{
+    return secure ? ARMSS_Secure : ARMSS_NonSecure;
+}
+
+#if !defined(CONFIG_USER_ONLY)
+/**
+ * arm_security_space_below_el3:
+ * @env: cpu context
+ *
+ * Return the security space of exception levels below EL3, following
+ * an exception return to those levels.  Unlike arm_security_space,
+ * this doesn't care about the current EL.
+ */
+ARMSecuritySpace arm_security_space_below_el3(CPUARMState *env);
+
+/**
+ * arm_is_secure_below_el3:
+ * @env: cpu context
+ *
  * Return true if exception levels below EL3 are in secure state,
- * or would be following an exception return to that level.
- * Unlike arm_is_secure() (which is always a question about the
- * _current_ state of the CPU) this doesn't care about the current
- * EL or mode.
+ * or would be following an exception return to those levels.
  */
 static inline bool arm_is_secure_below_el3(CPUARMState *env)
 {
-    assert(!arm_feature(env, ARM_FEATURE_M));
-    if (arm_feature(env, ARM_FEATURE_EL3)) {
-        return !(env->cp15.scr_el3 & SCR_NS);
-    } else {
-        /* If EL3 is not supported then the secure state is implementation
-         * defined, in which case QEMU defaults to non-secure.
-         */
-        return false;
-    }
+    ARMSecuritySpace ss = arm_security_space_below_el3(env);
+    return ss == ARMSS_Secure;
 }
 
 /* Return true if the CPU is AArch64 EL3 or AArch32 Mon */
@@ -XXX,XX +XXX,XX @@ static inline bool arm_is_el3_or_mon(CPUARMState *env)
     return false;
 }
 
-/* Return true if the processor is in secure state */
+/**
+ * arm_security_space:
+ * @env: cpu context
+ *
+ * Return the current security space of the cpu.
+ */
+ARMSecuritySpace arm_security_space(CPUARMState *env);
+
+/**
+ * arm_is_secure:
+ * @env: cpu context
+ *
+ * Return true if the processor is in secure state.
+ */
 static inline bool arm_is_secure(CPUARMState *env)
 {
-    if (arm_feature(env, ARM_FEATURE_M)) {
-        return env->v7m.secure;
-    }
-    if (arm_is_el3_or_mon(env)) {
-        return true;
-    }
-    return arm_is_secure_below_el3(env);
+    return arm_space_is_secure(arm_security_space(env));
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static inline bool arm_is_el2_enabled(CPUARMState *env)
 }
 
 #else
+static inline ARMSecuritySpace arm_security_space_below_el3(CPUARMState *env)
+{
+    return ARMSS_NonSecure;
+}
+
 static inline bool arm_is_secure_below_el3(CPUARMState *env)
 {
     return false;
 }
 
+static inline ARMSecuritySpace arm_security_space(CPUARMState *env)
+{
+    return ARMSS_NonSecure;
+}
+
 static inline bool arm_is_secure(CPUARMState *env)
 {
     return false;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void aarch64_sve_change_el(CPUARMState *env, int old_el,
     }
 }
 #endif
+
+#ifndef CONFIG_USER_ONLY
+ARMSecuritySpace arm_security_space(CPUARMState *env)
+{
+    if (arm_feature(env, ARM_FEATURE_M)) {
+        return arm_secure_to_space(env->v7m.secure);
+    }
+
+    /*
+     * If EL3 is not supported then the secure state is implementation
+     * defined, in which case QEMU defaults to non-secure.
+     */
+    if (!arm_feature(env, ARM_FEATURE_EL3)) {
+        return ARMSS_NonSecure;
+    }
+
+    /* Check for AArch64 EL3 or AArch32 Mon. */
+    if (is_a64(env)) {
+        if (extract32(env->pstate, 2, 2) == 3) {
+            if (cpu_isar_feature(aa64_rme, env_archcpu(env))) {
+                return ARMSS_Root;
+            } else {
+                return ARMSS_Secure;
+            }
+        }
+    } else {
+        if ((env->uncached_cpsr & CPSR_M) == ARM_CPU_MODE_MON) {
+            return ARMSS_Secure;
+        }
+    }
+
+    return arm_security_space_below_el3(env);
+}
+
+ARMSecuritySpace arm_security_space_below_el3(CPUARMState *env)
+{
+    assert(!arm_feature(env, ARM_FEATURE_M));
+
+    /*
+     * If EL3 is not supported then the secure state is implementation
+     * defined, in which case QEMU defaults to non-secure.
+     */
+    if (!arm_feature(env, ARM_FEATURE_EL3)) {
+        return ARMSS_NonSecure;
+    }
+
+    /*
+     * Note NSE cannot be set without RME, and NSE & !NS is Reserved.
+     * Ignoring NSE when !NS retains consistency without having to
+     * modify other predicates.
+     */
+    if (!(env->cp15.scr_el3 & SCR_NS)) {
+        return ARMSS_Secure;
+    } else if (env->cp15.scr_el3 & SCR_NSE) {
+        return ARMSS_Realm;
+    } else {
+        return ARMSS_NonSecure;
+    }
+}
+#endif /* !CONFIG_USER_ONLY */
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

We will need 2 bits to represent ARMSecuritySpace.

Do not attempt to replace or widen secure, even though it
logically overlaps the new field -- there are uses within
e.g. hw/block/pflash_cfi01.c, which don't know anything
specific about ARM.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/exec/memattrs.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
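
Reader's note, not part of the commit: a minimal sketch of why both fields
can coexist. DemoTxAttrs and demo_attrs_for_space() are hypothetical; the
real structure is MemTxAttrs and has more members. The sketch assumes the
legacy "secure" bit is derived from the 2-bit space the same way
arm_space_is_secure() does it earlier in the series.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SPACE_SECURE = 0,
    SPACE_NONSECURE = 1,
    SPACE_ROOT = 2,
    SPACE_REALM = 3,
} DemoSpace;

/* Cut-down stand-in for MemTxAttrs: only the two overlapping fields. */
typedef struct {
    unsigned int secure:1;  /* legacy TrustZone view, read by generic code */
    unsigned int space:2;   /* full four-way space, read by RME-aware code */
} DemoTxAttrs;

/* Fill both fields consistently from a single security space. */
static DemoTxAttrs demo_attrs_for_space(DemoSpace space)
{
    DemoTxAttrs attrs = {
        .secure = (space == SPACE_SECURE || space == SPACE_ROOT),
        .space = space,
    };
    return attrs;
}
```

A device model that only tests attrs.secure keeps working unchanged, while
RME-aware code can distinguish Root from Secure and Realm from NonSecure.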

diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/memattrs.h
+++ b/include/exec/memattrs.h
@@ -XXX,XX +XXX,XX @@ typedef struct MemTxAttrs {
      * "didn't specify" if necessary.
      */
     unsigned int unspecified:1;
-    /* ARM/AMBA: TrustZone Secure access
+    /*
+     * ARM/AMBA: TrustZone Secure access
      * x86: System Management Mode access
      */
     unsigned int secure:1;
+    /*
+     * ARM: ARMSecuritySpace.  This partially overlaps secure, but it is
+     * easier to have both fields to assist code that does not understand
+     * ARMv9 RME, or has no specific knowledge of ARM at all (e.g. pflash).
+     */
+    unsigned int space:2;
     /* Memory access is usermode (unprivileged) */
     unsigned int user:1;
     /*
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

It will be helpful to have the ARMMMUIdx_Phys_* indexes in the same
relative order as the ARMSecuritySpace enumerators. This requires an
adjustment to the nstable check. While there, check explicitly for
being in secure state, rather than relying on the fact that clearing
the low bit makes no change to a non-secure index.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h | 12 ++++++------
 target/arm/ptw.c | 12 +++++-------
 2 files changed, 11 insertions(+), 13 deletions(-)
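
Reader's note, not part of the commit: a minimal sketch of the reordered
indexes and the new downgrade step. The DEMO_* values copy only the low
bits from the patch (the ARM_MMU_IDX_A flag is omitted), and
demo_drop_to_nonsecure() is a hypothetical helper, not a function in
ptw.c.

```c
#include <assert.h>
#include <stdbool.h>

/* New ordering: each secure index is immediately before its NS twin. */
enum {
    DEMO_STAGE2_S = 8,
    DEMO_STAGE2   = 9,
    DEMO_PHYS_S   = 10,
    DEMO_PHYS_NS  = 11,
};

_Static_assert(DEMO_PHYS_S + 1 == DEMO_PHYS_NS, "Phys_S precedes Phys_NS");
_Static_assert(DEMO_STAGE2_S + 1 == DEMO_STAGE2, "Stage2_S precedes Stage2");

/*
 * Apply NSTable from the previous level.  Because the index is only
 * advanced when the walk is still secure, a non-secure index is never
 * touched; the old "&= ~1" relied on non-secure indexes being even.
 */
static int demo_drop_to_nonsecure(int ptw_idx, bool nstable, bool *in_secure)
{
    if (nstable && *in_secure) {
        ptw_idx += 1;
        *in_secure = false;
    }
    return ptw_idx;
}
```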

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
     ARMMMUIdx_E2        = 6 | ARM_MMU_IDX_A,
     ARMMMUIdx_E3        = 7 | ARM_MMU_IDX_A,
 
-    /* TLBs with 1-1 mapping to the physical address spaces. */
-    ARMMMUIdx_Phys_NS   = 8 | ARM_MMU_IDX_A,
-    ARMMMUIdx_Phys_S    = 9 | ARM_MMU_IDX_A,
-
     /*
      * Used for second stage of an S12 page table walk, or for descriptor
      * loads during first stage of an S1 page table walk.  Note that both
      * are in use simultaneously for SecureEL2: the security state for
      * the S2 ptw is selected by the NS bit from the S1 ptw.
      */
-    ARMMMUIdx_Stage2    = 10 | ARM_MMU_IDX_A,
-    ARMMMUIdx_Stage2_S  = 11 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Stage2_S  = 8 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Stage2    = 9 | ARM_MMU_IDX_A,
+
+    /* TLBs with 1-1 mapping to the physical address spaces. */
+    ARMMMUIdx_Phys_S    = 10 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Phys_NS   = 11 | ARM_MMU_IDX_A,
 
     /*
      * These are not allocated TLBs and are used only for AT system
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     descaddr |= (address >> (stride * (4 - level))) & indexmask;
     descaddr &= ~7ULL;
     nstable = !regime_is_stage2(mmu_idx) && extract32(tableattrs, 4, 1);
-    if (nstable) {
+    if (nstable && ptw->in_secure) {
         /*
          * Stage2_S -> Stage2 or Phys_S -> Phys_NS
-         * Assert that the non-secure idx are even, and relative order.
+         * Assert the relative order of the secure/non-secure indexes.
          */
-        QEMU_BUILD_BUG_ON((ARMMMUIdx_Phys_NS & 1) != 0);
-        QEMU_BUILD_BUG_ON((ARMMMUIdx_Stage2 & 1) != 0);
-        QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS + 1 != ARMMMUIdx_Phys_S);
-        QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2 + 1 != ARMMMUIdx_Stage2_S);
-        ptw->in_ptw_idx &= ~1;
+        QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_S + 1 != ARMMMUIdx_Phys_NS);
+        QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2_S + 1 != ARMMMUIdx_Stage2);
+        ptw->in_ptw_idx += 1;
         ptw->in_secure = false;
     }
     if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

With FEAT_RME, there are four physical address spaces.
For now, just define the symbols, and mention them in
the same spots as the other Phys indexes in ptw.c.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h | 23 +++++++++++++++++++++--
 target/arm/ptw.c | 10 ++++++++--
 2 files changed, 29 insertions(+), 4 deletions(-)
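
Reader's note, not part of the commit: a minimal sketch of the
space-to-index mapping that the matching order makes possible. The DEMO_*
constants and demo_* functions are hypothetical stand-ins for
ARMMMUIdx_Phys_* and arm_space_to_phys()/arm_phys_to_space(); the
ARM_MMU_IDX_A flag is left out for brevity.

```c
#include <assert.h>

typedef enum {
    SPACE_SECURE = 0,
    SPACE_NONSECURE = 1,
    SPACE_ROOT = 2,
    SPACE_REALM = 3,
} DemoSpace;

enum {
    DEMO_PHYS_S     = 10,
    DEMO_PHYS_NS    = 11,
    DEMO_PHYS_ROOT  = 12,
    DEMO_PHYS_REALM = 13,
};

/* Compile-time checks that the two enumerations stay in step. */
_Static_assert(SPACE_SECURE == 0, "Secure must be the base");
_Static_assert(DEMO_PHYS_NS == DEMO_PHYS_S + SPACE_NONSECURE, "NS order");
_Static_assert(DEMO_PHYS_ROOT == DEMO_PHYS_S + SPACE_ROOT, "Root order");
_Static_assert(DEMO_PHYS_REALM == DEMO_PHYS_S + SPACE_REALM, "Realm order");

/* With matching order, the conversion is a single addition. */
static int demo_space_to_phys(DemoSpace space)
{
    return DEMO_PHYS_S + space;
}

/* The inverse is a subtraction, valid only for the four Phys indexes. */
static DemoSpace demo_phys_to_space(int idx)
{
    assert(idx >= DEMO_PHYS_S && idx <= DEMO_PHYS_REALM);
    return (DemoSpace)(idx - DEMO_PHYS_S);
}
```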

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
     ARMMMUIdx_Stage2    = 9 | ARM_MMU_IDX_A,
 
     /* TLBs with 1-1 mapping to the physical address spaces. */
-    ARMMMUIdx_Phys_S    = 10 | ARM_MMU_IDX_A,
-    ARMMMUIdx_Phys_NS   = 11 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Phys_S     = 10 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Phys_NS    = 11 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Phys_Root  = 12 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Phys_Realm = 13 | ARM_MMU_IDX_A,
 
     /*
      * These are not allocated TLBs and are used only for AT system
@@ -XXX,XX +XXX,XX @@ typedef enum ARMASIdx {
     ARMASIdx_TagS = 3,
 } ARMASIdx;
 
+static inline ARMMMUIdx arm_space_to_phys(ARMSecuritySpace space)
+{
+    /* Assert the relative order of the physical mmu indexes. */
+    QEMU_BUILD_BUG_ON(ARMSS_Secure != 0);
+    QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS != ARMMMUIdx_Phys_S + ARMSS_NonSecure);
+    QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_Root != ARMMMUIdx_Phys_S + ARMSS_Root);
+    QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_Realm != ARMMMUIdx_Phys_S + ARMSS_Realm);
+
+    return ARMMMUIdx_Phys_S + space;
+}
+
+static inline ARMSecuritySpace arm_phys_to_space(ARMMMUIdx idx)
+{
+    assert(idx >= ARMMMUIdx_Phys_S && idx <= ARMMMUIdx_Phys_Realm);
+    return idx - ARMMMUIdx_Phys_S;
+}
+
 static inline bool arm_v7m_csselr_razwi(ARMCPU *cpu)
 {
     /* If all the CLIDR.Ctypem bits are 0 there are no caches, and
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
     case ARMMMUIdx_E3:
         break;
 
-    case ARMMMUIdx_Phys_NS:
     case ARMMMUIdx_Phys_S:
+    case ARMMMUIdx_Phys_NS:
+    case ARMMMUIdx_Phys_Root:
+    case ARMMMUIdx_Phys_Realm:
         /* No translation for physical address spaces. */
         return true;
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
     switch (mmu_idx) {
     case ARMMMUIdx_Stage2:
     case ARMMMUIdx_Stage2_S:
-    case ARMMMUIdx_Phys_NS:
     case ARMMMUIdx_Phys_S:
+    case ARMMMUIdx_Phys_NS:
+    case ARMMMUIdx_Phys_Root:
+    case ARMMMUIdx_Phys_Realm:
         break;
 
     default:
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
     switch (mmu_idx) {
     case ARMMMUIdx_Phys_S:
     case ARMMMUIdx_Phys_NS:
+    case ARMMMUIdx_Phys_Root:
+    case ARMMMUIdx_Phys_Realm:
         /* Checking Phys early avoids special casing later vs regime_el. */
         return get_phys_addr_disabled(env, address, access_type, mmu_idx,
                                       is_secure, result, fi);
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

This was added in 7e98e21c098 as part of a reorg in which
one of the arguments had been legally NULL, and this caught
actual instances.  Now that the reorg is complete, this
serves little purpose.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
 static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
                                uint64_t address,
                                MMUAccessType access_type, bool s1_is_el0,
-                               GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
-    __attribute__((nonnull));
+                               GetPhysAddrResult *result, ARMMMUFaultInfo *fi);
 
 static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
                                       target_ulong address,
                                       MMUAccessType access_type,
                                       GetPhysAddrResult *result,
-                                      ARMMMUFaultInfo *fi)
-    __attribute__((nonnull));
+                                      ARMMMUFaultInfo *fi);
 
 /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
 static const uint8_t pamax_map[] = {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Add input and output space members to S1Translate.  Set and adjust
them in S1_ptw_translate, and at the various points at which we drop
secure state.  Initialize the space in get_phys_addr; for now leave
get_phys_addr_with_secure considering only secure vs non-secure spaces.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 86 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 71 insertions(+), 15 deletions(-)
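
Reader's note, not part of the commit: a minimal sketch of the two rules
this patch uses to pick a space for stage 2. Both helpers are
hypothetical; they restate the ternary used for .in_space in
S1_ptw_translate and the ARMMMUIdx_Stage2 case in get_phys_addr, with
plain parameters in place of the S1Translate and CPUARMState fields.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SPACE_SECURE = 0,
    SPACE_NONSECURE = 1,
    SPACE_ROOT = 2,
    SPACE_REALM = 3,
} DemoSpace;

/*
 * Space for the stage 2 lookup of a stage 1 descriptor address:
 * the secure stage 2 index forces Secure; otherwise Realm is kept
 * and everything else becomes NonSecure.
 */
static DemoSpace demo_s2_walk_space(bool s2_idx_is_secure, DemoSpace cur)
{
    if (s2_idx_is_secure) {
        return SPACE_SECURE;
    }
    return cur == SPACE_REALM ? SPACE_REALM : SPACE_NONSECURE;
}

/*
 * Space for a direct lookup through the non-secure stage 2 index:
 * under Secure EL2 this index must be NonSecure; otherwise the space
 * below EL3 is already NonSecure or Realm.
 */
static DemoSpace demo_stage2_idx_space(DemoSpace below_el3)
{
    return below_el3 == SPACE_SECURE ? SPACE_NONSECURE : below_el3;
}
```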

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@
 typedef struct S1Translate {
     ARMMMUIdx in_mmu_idx;
     ARMMMUIdx in_ptw_idx;
+    ARMSecuritySpace in_space;
     bool in_secure;
     bool in_debug;
     bool out_secure;
     bool out_rw;
     bool out_be;
+    ARMSecuritySpace out_space;
     hwaddr out_virt;
     hwaddr out_phys;
     void *out_host;
@@ -XXX,XX +XXX,XX @@ static bool S2_attrs_are_device(uint64_t hcr, uint8_t attrs)
 static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
                              hwaddr addr, ARMMMUFaultInfo *fi)
 {
+    ARMSecuritySpace space = ptw->in_space;
     bool is_secure = ptw->in_secure;
     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
     ARMMMUIdx s2_mmu_idx = ptw->in_ptw_idx;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
                 .in_mmu_idx = s2_mmu_idx,
                 .in_ptw_idx = ptw_idx_for_stage_2(env, s2_mmu_idx),
                 .in_secure = s2_mmu_idx == ARMMMUIdx_Stage2_S,
+                .in_space = (s2_mmu_idx == ARMMMUIdx_Stage2_S ? ARMSS_Secure
+                             : space == ARMSS_Realm ? ARMSS_Realm
+                             : ARMSS_NonSecure),
                 .in_debug = true,
             };
             GetPhysAddrResult s2 = { };
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
             ptw->out_phys = s2.f.phys_addr;
             pte_attrs = s2.cacheattrs.attrs;
             ptw->out_secure = s2.f.attrs.secure;
+            ptw->out_space = s2.f.attrs.space;
         } else {
             /* Regime is physical. */
             ptw->out_phys = addr;
             pte_attrs = 0;
             ptw->out_secure = s2_mmu_idx == ARMMMUIdx_Phys_S;
+            ptw->out_space = (s2_mmu_idx == ARMMMUIdx_Phys_S ? ARMSS_Secure
+                              : space == ARMSS_Realm ? ARMSS_Realm
+                              : ARMSS_NonSecure);
         }
         ptw->out_host = NULL;
         ptw->out_rw = false;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
         ptw->out_rw = full->prot & PAGE_WRITE;
         pte_attrs = full->pte_attrs;
         ptw->out_secure = full->attrs.secure;
+        ptw->out_space = full->attrs.space;
 #else
         g_assert_not_reached();
 #endif
@@ -XXX,XX +XXX,XX @@ static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate *ptw,
         }
     } else {
         /* Page tables are in MMIO. */
-        MemTxAttrs attrs = { .secure = ptw->out_secure };
+        MemTxAttrs attrs = {
+            .secure = ptw->out_secure,
+            .space = ptw->out_space,
+        };
         AddressSpace *as = arm_addressspace(cs, attrs);
         MemTxResult result = MEMTX_OK;
 
@@ -XXX,XX +XXX,XX @@ static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate *ptw,
 #endif
     } else {
         /* Page tables are in MMIO. */
-        MemTxAttrs attrs = { .secure = ptw->out_secure };
+        MemTxAttrs attrs = {
+            .secure = ptw->out_secure,
+            .space = ptw->out_space,
+        };
         AddressSpace *as = arm_addressspace(cs, attrs);
         MemTxResult result = MEMTX_OK;
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate *ptw,
          * regime, because the attribute will already be non-secure.
          */
         result->f.attrs.secure = false;
+        result->f.attrs.space = ARMSS_NonSecure;
     }
     result->f.phys_addr = phys_addr;
     return false;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
          * regime, because the attribute will already be non-secure.
          */
         result->f.attrs.secure = false;
+        result->f.attrs.space = ARMSS_NonSecure;
     }
 
     if (regime_is_stage2(mmu_idx)) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
              */
             if (sattrs.ns) {
                 result->f.attrs.secure = false;
+                result->f.attrs.space = ARMSS_NonSecure;
             } else if (!secure) {
                 /*
                  * NS access to S memory must fault.
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
     bool is_secure = ptw->in_secure;
     bool ret, ipa_secure;
     ARMCacheAttrs cacheattrs1;
+    ARMSecuritySpace ipa_space;
     bool is_el0;
     uint64_t hcr;
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
 
     ipa = result->f.phys_addr;
     ipa_secure = result->f.attrs.secure;
+    ipa_space = result->f.attrs.space;
 
     is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
     ptw->in_mmu_idx = ipa_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
     ptw->in_secure = ipa_secure;
+    ptw->in_space = ipa_space;
     ptw->in_ptw_idx = ptw_idx_for_stage_2(env, ptw->in_mmu_idx);
 
     /*
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
     ARMMMUIdx s1_mmu_idx;
 
     /*
-     * The page table entries may downgrade secure to non-secure, but
-     * cannot upgrade an non-secure translation regime's attributes
-     * to secure.
+     * The page table entries may downgrade Secure to NonSecure, but
+     * cannot upgrade a NonSecure translation regime's attributes
+     * to Secure or Realm.
      */
     result->f.attrs.secure = is_secure;
+    result->f.attrs.space = ptw->in_space;
 
     switch (mmu_idx) {
     case ARMMMUIdx_Phys_S:
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
 
     default:
         /* Single stage uses physical for ptw. */
-        ptw->in_ptw_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
+        ptw->in_ptw_idx = arm_space_to_phys(ptw->in_space);
         break;
     }
 
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
     S1Translate ptw = {
         .in_mmu_idx = mmu_idx,
         .in_secure = is_secure,
+        .in_space = arm_secure_to_space(is_secure),
     };
     return get_phys_addr_with_struct(env, &ptw, address, access_type,
                                      result, fi);
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
                    MMUAccessType access_type, ARMMMUIdx mmu_idx,
                    GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 {
-    bool is_secure;
+    S1Translate ptw = {
+        .in_mmu_idx = mmu_idx,
+    };
+    ARMSecuritySpace ss;
 
     switch (mmu_idx) {
     case ARMMMUIdx_E10_0:
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
     case ARMMMUIdx_Stage1_E1:
     case ARMMMUIdx_Stage1_E1_PAN:
     case ARMMMUIdx_E2:
-        is_secure = arm_is_secure_below_el3(env);
+        ss = arm_security_space_below_el3(env);
         break;
     case ARMMMUIdx_Stage2:
+        /*
+         * For Secure EL2, we need this index to be NonSecure;
+         * otherwise this will already be NonSecure or Realm.
+         */
+        ss = arm_security_space_below_el3(env);
+        if (ss == ARMSS_Secure) {
+            ss = ARMSS_NonSecure;
+        }
+        break;
     case ARMMMUIdx_Phys_NS:
     case ARMMMUIdx_MPrivNegPri:
     case ARMMMUIdx_MUserNegPri:
     case ARMMMUIdx_MPriv:
     case ARMMMUIdx_MUser:
-        is_secure = false;
+        ss = ARMSS_NonSecure;
         break;
-    case ARMMMUIdx_E3:
     case ARMMMUIdx_Stage2_S:
     case ARMMMUIdx_Phys_S:
     case ARMMMUIdx_MSPrivNegPri:
     case ARMMMUIdx_MSUserNegPri:
     case ARMMMUIdx_MSPriv:
     case ARMMMUIdx_MSUser:
-        is_secure = true;
+        ss = ARMSS_Secure;
+        break;
+    case ARMMMUIdx_E3:
+        if (arm_feature(env, ARM_FEATURE_AARCH64) &&
+            cpu_isar_feature(aa64_rme, env_archcpu(env))) {
+            ss = ARMSS_Root;
+        } else {
+            ss = ARMSS_Secure;
+        }
+        break;
+    case ARMMMUIdx_Phys_Root:
+        ss = ARMSS_Root;
+        break;
+    case ARMMMUIdx_Phys_Realm:
+        ss = ARMSS_Realm;
         break;
     default:
         g_assert_not_reached();
     }
-    return get_phys_addr_with_secure(env, address, access_type, mmu_idx,
-                                     is_secure, result, fi);
+
+    ptw.in_space = ss;
+    ptw.in_secure = arm_space_is_secure(ss);
+    return get_phys_addr_with_struct(env, &ptw, address, access_type,
+                                     result, fi);
 }
 
 hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
@@ -XXX,XX +XXX,XX @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
 {
     ARMCPU *cpu = ARM_CPU(cs);
     CPUARMState *env = &cpu->env;
+    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
+    ARMSecuritySpace ss = arm_security_space(env);
     S1Translate ptw = {
-        .in_mmu_idx = arm_mmu_idx(env),
-        .in_secure = arm_is_secure(env),
+        .in_mmu_idx = mmu_idx,
+        .in_space = ss,
+        .in_secure = arm_space_is_secure(ss),
         .in_debug = true,
     };
     GetPhysAddrResult res = {};
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Test in_space instead of in_secure so that we don't
switch out of Root space.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)
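
Reader's note, not part of the commit: a minimal sketch of the condition
this patch introduces. demo_apply_nstable() is a hypothetical helper that
restates the new "if" in get_phys_addr_lpae with plain parameters.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SPACE_SECURE = 0,
    SPACE_NONSECURE = 1,
    SPACE_ROOT = 2,
    SPACE_REALM = 3,
} DemoSpace;

/*
 * NSTable from the previous level only moves a Secure stage 1 walk to
 * NonSecure.  Root is "secure" in the pre-v9 sense, so a test on a bool
 * would wrongly downgrade it; testing the space leaves Root alone.
 */
static DemoSpace demo_apply_nstable(DemoSpace in_space, bool is_stage2,
                                    bool nstable)
{
    if (in_space == SPACE_SECURE && !is_stage2 && nstable) {
        return SPACE_NONSECURE;
    }
    return in_space;
}
```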

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
 {
     ARMCPU *cpu = env_archcpu(env);
     ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
-    bool is_secure = ptw->in_secure;
     int32_t level;
     ARMVAParameters param;
     uint64_t ttbr;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     uint64_t descaddrmask;
     bool aarch64 = arm_el_is_aa64(env, el);
     uint64_t descriptor, new_descriptor;
-    bool nstable;
 
     /* TODO: This code does not support shareability levels. */
     if (aarch64) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         descaddrmask = MAKE_64BIT_MASK(0, 40);
     }
     descaddrmask &= ~indexmask_grainsize;
-
-    /*
-     * Secure stage 1 accesses start with the page table in secure memory and
-     * can be downgraded to non-secure at any step. Non-secure accesses
-     * remain non-secure. We implement this by just ORing in the NSTable/NS
-     * bits at each step.
-     * Stage 2 never gets this kind of downgrade.
-     */
-    tableattrs = is_secure ? 0 : (1 << 4);
+    tableattrs = 0;
 
  next_level:
     descaddr |= (address >> (stride * (4 - level))) & indexmask;
     descaddr &= ~7ULL;
-    nstable = !regime_is_stage2(mmu_idx) && extract32(tableattrs, 4, 1);
-    if (nstable && ptw->in_secure) {
+
+    /*
+     * Process the NSTable bit from the previous level.  This changes
+     * the table address space and the output space from Secure to
+     * NonSecure.  With RME, the EL3 translation regime does not change
+     * from Root to NonSecure.
+     */
+    if (ptw->in_space == ARMSS_Secure
+        && !regime_is_stage2(mmu_idx)
+        && extract32(tableattrs, 4, 1)) {
         /*
          * Stage2_S -> Stage2 or Phys_S -> Phys_NS
          * Assert the relative order of the secure/non-secure indexes.
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2_S + 1 != ARMMMUIdx_Stage2);
         ptw->in_ptw_idx += 1;
         ptw->in_secure = false;
+        ptw->in_space = ARMSS_NonSecure;
     }
+
     if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
         goto do_fault;
     }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
      */
     attrs = new_descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
     if (!regime_is_stage2(mmu_idx)) {
-        attrs |= nstable << 5; /* NS */
+        attrs |= !ptw->in_secure << 5; /* NS */
         if (!param.hpd) {
             attrs |= extract64(tableattrs, 0, 2) << 53;     /* XN, PXN */
             /*
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

With Realm security state, bit 55 of a block or page descriptor during
the stage2 walk becomes the NS bit; during the stage1 walk, the NS bit
(bit 5) is RES0.  With Root security state, bit 11 of the block or page
descriptor during the stage1 walk becomes the NSE bit.

Rather than collecting an NS bit and applying it later, compute the
output pa space from the input pa space and unconditionally assign.
This means that we no longer need to adjust the output space earlier
for the NSTable bit.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 89 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 73 insertions(+), 16 deletions(-)
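
Reader's note, not part of the commit: a minimal sketch of the output
space computation described above. The demo_* helpers are hypothetical
and take the descriptor bits as plain bools; realm_el2_regime stands in
for the mmu_idx switch that separates the Realm EL2 and EL2&0 regimes
from Realm EL1&0.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SPACE_SECURE = 0,
    SPACE_NONSECURE = 1,
    SPACE_ROOT = 2,
    SPACE_REALM = 3,
} DemoSpace;

/* Stage 1: derive the output space from the input space and NS/NSE. */
static DemoSpace demo_s1_out_space(DemoSpace in, bool ns, bool nse,
                                   bool have_sel2, bool realm_el2_regime)
{
    switch (in) {
    case SPACE_ROOT: {
        /* NSE:NS select the output space directly. */
        DemoSpace out = (DemoSpace)((nse << 1) | ns);
        if (out == SPACE_SECURE && !have_sel2) {
            out = SPACE_NONSECURE;
        }
        return out;
    }
    case SPACE_SECURE:
        return ns ? SPACE_NONSECURE : SPACE_SECURE;
    case SPACE_REALM:
        /* NS is RES0 for Realm EL1&0, effective for Realm EL2. */
        return (realm_el2_regime && ns) ? SPACE_NONSECURE : SPACE_REALM;
    case SPACE_NONSECURE:
    default:
        return SPACE_NONSECURE;     /* NS is RES0 */
    }
}

/* Stage 2: bit 55 acts as NS only in Realm state. */
static DemoSpace demo_s2_out_space(DemoSpace in, bool bit55)
{
    return (in == SPACE_REALM && bit55) ? SPACE_NONSECURE : in;
}
```

Computing the space once and assigning it unconditionally avoids carrying
a separate NS flag to the end of the walk.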

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
  * @mmu_idx: MMU index indicating required translation regime
  * @is_aa64: TRUE if AArch64
  * @ap:      The 2-bit simple AP (AP[2:1])
- * @ns:      NS (non-secure) bit
  * @xn:      XN (execute-never) bit
  * @pxn:     PXN (privileged execute-never) bit
+ * @in_pa:   The original input pa space
+ * @out_pa:  The output pa space, modified by NSTable, NS, and NSE
  */
 static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
-                      int ap, int ns, int xn, int pxn)
+                      int ap, int xn, int pxn,
+                      ARMSecuritySpace in_pa, ARMSecuritySpace out_pa)
 {
     ARMCPU *cpu = env_archcpu(env);
     bool is_user = regime_is_user(env, mmu_idx);
@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
         }
     }
 
-    if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
+    if (out_pa == ARMSS_NonSecure && in_pa == ARMSS_Secure &&
+        (env->cp15.scr_el3 & SCR_SIF)) {
         return prot_rw;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     int32_t stride;
     int addrsize, inputsize, outputsize;
     uint64_t tcr = regime_tcr(env, mmu_idx);
-    int ap, ns, xn, pxn;
+    int ap, xn, pxn;
     uint32_t el = regime_el(env, mmu_idx);
     uint64_t descaddrmask;
     bool aarch64 = arm_el_is_aa64(env, el);
     uint64_t descriptor, new_descriptor;
+    ARMSecuritySpace out_space;
 
     /* TODO: This code does not support shareability levels. */
     if (aarch64) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
     }
 
     ap = extract32(attrs, 6, 2);
+    out_space = ptw->in_space;
     if (regime_is_stage2(mmu_idx)) {
-        ns = mmu_idx == ARMMMUIdx_Stage2;
+        /*
+         * R_GYNXY: For stage2 in Realm security state, bit 55 is NS.
+         * The bit remains ignored for other security states.
+         */
+        if (out_space == ARMSS_Realm && extract64(attrs, 55, 1)) {
+            out_space = ARMSS_NonSecure;
+        }
         xn = extract64(attrs, 53, 2);
         result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
     } else {
-        ns = extract32(attrs, 5, 1);
+        int nse, ns = extract32(attrs, 5, 1);
+        switch (out_space) {
+        case ARMSS_Root:
+            /*
+             * R_GVZML: Bit 11 becomes the NSE field in the EL3 regime.
+             * R_XTYPW: NSE and NS together select the output pa space.
+             */
+            nse = extract32(attrs, 11, 1);
+            out_space = (nse << 1) | ns;
+            if (out_space == ARMSS_Secure &&
+                !cpu_isar_feature(aa64_sel2, cpu)) {
+                out_space = ARMSS_NonSecure;
+            }
+            break;
+        case ARMSS_Secure:
+            if (ns) {
+                out_space = ARMSS_NonSecure;
+            }
+            break;
+        case ARMSS_Realm:
+            switch (mmu_idx) {
+            case ARMMMUIdx_Stage1_E0:
+            case ARMMMUIdx_Stage1_E1:
+            case ARMMMUIdx_Stage1_E1_PAN:
+                /* I_CZPRF: For Realm EL1&0 stage1, NS bit is RES0. */
+                break;
+            case ARMMMUIdx_E2:
+            case ARMMMUIdx_E20_0:
+            case ARMMMUIdx_E20_2:
+            case ARMMMUIdx_E20_2_PAN:
+                /*
+                 * R_LYKFZ, R_WGRZN: For Realm EL2 and EL2&1,
+                 * NS changes the output to non-secure space.
+                 */
+                if (ns) {
+                    out_space = ARMSS_NonSecure;
+                }
+                break;
+            default:
+                g_assert_not_reached();
+            }
+            break;
+        case ARMSS_NonSecure:
+            /* R_QRMFF: For NonSecure state, the NS bit is RES0. */
+            break;
+        default:
+            g_assert_not_reached();
+        }
         xn = extract64(attrs, 54, 1);
         pxn = extract64(attrs, 53, 1);
-        result->f.prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
+
+        /*
+         * Note that we modified ptw->in_space earlier for NSTable, but
+         * result->f.attrs retains a copy of the original security space.
+         */
+        result->f.prot = get_S1prot(env, mmu_idx, aarch64, ap, xn, pxn,
+                                    result->f.attrs.space, out_space);
     }
 
     if (!(result->f.prot & (1 << access_type))) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         }
     }
 
-    if (ns) {
-        /*
-         * The NS bit will (as required by the architecture) have no effect if
-         * the CPU doesn't support TZ or this is a non-secure translation
-         * regime, because the attribute will already be non-secure.
-         */
-        result->f.attrs.secure = false;
-        result->f.attrs.space = ARMSS_NonSecure;
-    }
+    result->f.attrs.space = out_space;
+    result->f.attrs.secure = arm_space_is_secure(out_space);
 
     if (regime_is_stage2(mmu_idx)) {
         result->cacheattrs.is_s2_format = true;
-- 
2.34.1
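
Editorial sketch (not part of the patch): the EL3 case in the hunk above
selects the output PA space from the NSE and NS descriptor bits. The
standalone C below restates that rule under stated assumptions. The Space
enum, its numeric values, and el3_output_space() are hypothetical stand-ins
for ARMSecuritySpace and the inline switch; have_sel2 stands in for
cpu_isar_feature(aa64_sel2, cpu).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative stand-in for ARMSecuritySpace. The numeric values are an
 * assumption, chosen so that (nse << 1) | ns maps directly onto the enum,
 * which is what the expression in the hunk relies on.
 */
typedef enum {
    SPACE_SECURE    = 0,
    SPACE_NONSECURE = 1,
    SPACE_ROOT      = 2,
    SPACE_REALM     = 3,
} Space;

/* Hypothetical helper: output PA space for an EL3 stage 1 descriptor. */
static Space el3_output_space(unsigned nse, unsigned ns, bool have_sel2)
{
    assert(nse <= 1 && ns <= 1);

    Space out = (Space)((nse << 1) | ns);

    /* Mirrors the aa64_sel2 test: Secure output becomes NonSecure. */
    if (out == SPACE_SECURE && !have_sel2) {
        out = SPACE_NONSECURE;
    }
    return out;
}
```

With NSE=1 and NS=1 the sketch yields the Realm space; with both clear it
yields Secure only when have_sel2 is true.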

From: Richard Henderson <richard.henderson@linaro.org>

While Root and Realm may read and write data from other spaces,
neither may execute from other pa spaces.

This happens for Stage1 EL3, EL2, EL2&0, and Stage2 EL1&0.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 52 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 6 deletions(-)
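
Editorial sketch (not part of the patch): the stage 1 rule described in the
commit message can be read as a small predicate. The version below is a
simplified restatement, assuming a hypothetical Space enum in place of
ARMSecuritySpace, an el2_regime flag in place of the ARMMMUIdx_E2 and
ARMMMUIdx_E20_* cases, and a sif flag in place of SCR_EL3.SIF.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for ARMSecuritySpace. */
typedef enum { SPACE_SECURE, SPACE_NONSECURE, SPACE_ROOT, SPACE_REALM } Space;

/*
 * Returns true when stage 1 must drop execute permission because the
 * output PA space differs from the input space.
 */
static bool s1_exec_forbidden(Space in, Space out, bool el2_regime, bool sif)
{
    if (in == out) {
        return false;
    }
    switch (in) {
    case SPACE_ROOT:
        return true;        /* Root never executes from non-Root. */
    case SPACE_REALM:
        return el2_regime;  /* EL1&0 is handled at stage 2 instead. */
    case SPACE_SECURE:
        return sif;         /* Only when SCR_EL3.SIF is set. */
    default:
        /* Input NonSecure must have output NonSecure. */
        assert(!"NonSecure input cannot change space");
        return false;
    }
}
```

The Realm EL1&0 case returns false here because, per the patch, the
corresponding fault is raised by the stage 2 translation.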

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ do_fault:
  * @xn:      XN (execute-never) bits
  * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
  */
-static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
+static int get_S2prot_noexecute(int s2ap)
 {
     int prot = 0;
 
@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
     if (s2ap & 2) {
         prot |= PAGE_WRITE;
     }
+    return prot;
+}
+
+static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
+{
+    int prot = get_S2prot_noexecute(s2ap);
 
     if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
         switch (xn) {
@@ -XXX,XX +XXX,XX @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
         }
     }
 
-    if (out_pa == ARMSS_NonSecure && in_pa == ARMSS_Secure &&
-        (env->cp15.scr_el3 & SCR_SIF)) {
-        return prot_rw;
+    if (in_pa != out_pa) {
+        switch (in_pa) {
+        case ARMSS_Root:
+            /*
+             * R_ZWRVD: permission fault for insn fetched from non-Root,
+             * I_WWBFB: SIF has no effect in EL3.
+             */
+            return prot_rw;
+        case ARMSS_Realm:
+            /*
+             * R_PKTDS: permission fault for insn fetched from non-Realm,
+             * for Realm EL2 or EL2&0.  The corresponding fault for EL1&0
+             * happens during any stage2 translation.
+             */
+            switch (mmu_idx) {
+            case ARMMMUIdx_E2:
+            case ARMMMUIdx_E20_0:
+            case ARMMMUIdx_E20_2:
+            case ARMMMUIdx_E20_2_PAN:
+                return prot_rw;
+            default:
+                break;
+            }
+            break;
+        case ARMSS_Secure:
+            if (env->cp15.scr_el3 & SCR_SIF) {
+                return prot_rw;
+            }
+            break;
+        default:
+            /* Input NonSecure must have output NonSecure. */
+            g_assert_not_reached();
+        }
     }
 
     /* TODO have_wxn should be replaced with
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
         /*
          * R_GYNXY: For stage2 in Realm security state, bit 55 is NS.
          * The bit remains ignored for other security states.
+         * R_YMCSL: Executing an insn fetched from non-Realm causes
+         * a stage2 permission fault.
          */
         if (out_space == ARMSS_Realm && extract64(attrs, 55, 1)) {
             out_space = ARMSS_NonSecure;
+            result->f.prot = get_S2prot_noexecute(ap);
+        } else {
+            xn = extract64(attrs, 53, 2);
+            result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
         }
-        xn = extract64(attrs, 53, 2);
-        result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
     } else {
         int nse, ns = extract32(attrs, 5, 1);
         switch (out_space) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Do not provide a fast-path for physical addresses in S1_ptw_translate,
as those will need to be validated for GPC.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-15-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 44 +++++++++++++++++---------------------------
 1 file changed, 17 insertions(+), 27 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
          * From gdbstub, do not use softmmu so that we don't modify the
          * state of the cpu at all, including softmmu tlb contents.
          */
-        if (regime_is_stage2(s2_mmu_idx)) {
-            S1Translate s2ptw = {
-                .in_mmu_idx = s2_mmu_idx,
-                .in_ptw_idx = ptw_idx_for_stage_2(env, s2_mmu_idx),
-                .in_secure = s2_mmu_idx == ARMMMUIdx_Stage2_S,
-                .in_space = (s2_mmu_idx == ARMMMUIdx_Stage2_S ? ARMSS_Secure
-                             : space == ARMSS_Realm ? ARMSS_Realm
-                             : ARMSS_NonSecure),
-                .in_debug = true,
-            };
-            GetPhysAddrResult s2 = { };
+        S1Translate s2ptw = {
+            .in_mmu_idx = s2_mmu_idx,
+            .in_ptw_idx = ptw_idx_for_stage_2(env, s2_mmu_idx),
+            .in_secure = s2_mmu_idx == ARMMMUIdx_Stage2_S,
+            .in_space = (s2_mmu_idx == ARMMMUIdx_Stage2_S ? ARMSS_Secure
+                         : space == ARMSS_Realm ? ARMSS_Realm
+                         : ARMSS_NonSecure),
+            .in_debug = true,
+        };
+        GetPhysAddrResult s2 = { };
 
-            if (get_phys_addr_lpae(env, &s2ptw, addr, MMU_DATA_LOAD,
-                                   false, &s2, fi)) {
-                goto fail;
-            }
-            ptw->out_phys = s2.f.phys_addr;
-            pte_attrs = s2.cacheattrs.attrs;
-            ptw->out_secure = s2.f.attrs.secure;
-            ptw->out_space = s2.f.attrs.space;
-        } else {
-            /* Regime is physical. */
-            ptw->out_phys = addr;
-            pte_attrs = 0;
-            ptw->out_secure = s2_mmu_idx == ARMMMUIdx_Phys_S;
-            ptw->out_space = (s2_mmu_idx == ARMMMUIdx_Phys_S ? ARMSS_Secure
-                              : space == ARMSS_Realm ? ARMSS_Realm
-                              : ARMSS_NonSecure);
+        if (get_phys_addr_with_struct(env, &s2ptw, addr,
+                                      MMU_DATA_LOAD, &s2, fi)) {
+            goto fail;
         }
+        ptw->out_phys = s2.f.phys_addr;
+        pte_attrs = s2.cacheattrs.attrs;
         ptw->out_host = NULL;
         ptw->out_rw = false;
+        ptw->out_secure = s2.f.attrs.secure;
+        ptw->out_space = s2.f.attrs.space;
     } else {
 #ifdef CONFIG_TCG
         CPUTLBEntryFull *full;
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Instead of passing s1_is_el0 to get_phys_addr_lpae, stash it
in the S1Translate structure.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-16-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
     ARMSecuritySpace in_space;
     bool in_secure;
     bool in_debug;
+    /*
+     * If this is stage 2 of a stage 1+2 page table walk, then this must
+     * be true if stage 1 is an EL0 access; otherwise this is ignored.
+     * Stage 2 is indicated by in_mmu_idx set to ARMMMUIdx_Stage2{,_S}.
+     */
+    bool in_s1_is_el0;
     bool out_secure;
     bool out_rw;
     bool out_be;
@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
 } S1Translate;
 
 static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-                               uint64_t address,
-                               MMUAccessType access_type, bool s1_is_el0,
+                               uint64_t address, MMUAccessType access_type,
                                GetPhysAddrResult *result, ARMMMUFaultInfo *fi);
 
 static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
@@ -XXX,XX +XXX,XX @@ static int check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, uint64_t tcr,
  * @ptw: Current and next stage parameters for the walk.
  * @address: virtual address to get physical address for
  * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
- * @s1_is_el0: if @ptw->in_mmu_idx is ARMMMUIdx_Stage2
- *             (so this is a stage 2 page table walk),
- *             must be true if this is stage 2 of a stage 1+2
- *             walk for an EL0 access. If @mmu_idx is anything else,
- *             @s1_is_el0 is ignored.
  * @result: set on translation success,
  * @fi: set to fault info if the translation fails
  */
 static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
                                uint64_t address,
-                               MMUAccessType access_type, bool s1_is_el0,
+                               MMUAccessType access_type,
                                GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 {
     ARMCPU *cpu = env_archcpu(env);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
             result->f.prot = get_S2prot_noexecute(ap);
         } else {
             xn = extract64(attrs, 53, 2);
-            result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
+            result->f.prot = get_S2prot(env, ap, xn, ptw->in_s1_is_el0);
         }
     } else {
         int nse, ns = extract32(attrs, 5, 1);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
     bool ret, ipa_secure;
     ARMCacheAttrs cacheattrs1;
     ARMSecuritySpace ipa_space;
-    bool is_el0;
     uint64_t hcr;
 
     ret = get_phys_addr_with_struct(env, ptw, address, access_type, result, fi);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
     ipa_secure = result->f.attrs.secure;
     ipa_space = result->f.attrs.space;
 
-    is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
+    ptw->in_s1_is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
     ptw->in_mmu_idx = ipa_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
     ptw->in_secure = ipa_secure;
     ptw->in_space = ipa_space;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
         ret = get_phys_addr_pmsav8(env, ipa, access_type,
                                    ptw->in_mmu_idx, is_secure, result, fi);
     } else {
-        ret = get_phys_addr_lpae(env, ptw, ipa, access_type,
-                                 is_el0, result, fi);
+        ret = get_phys_addr_lpae(env, ptw, ipa, access_type, result, fi);
     }
     fi->s2addr = ipa;
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
     }
 
     if (regime_using_lpae_format(env, mmu_idx)) {
-        return get_phys_addr_lpae(env, ptw, address, access_type, false,
-                                  result, fi);
+        return get_phys_addr_lpae(env, ptw, address, access_type, result, fi);
     } else if (arm_feature(env, ARM_FEATURE_V7) ||
                regime_sctlr(env, mmu_idx) & SCTLR_XP) {
         return get_phys_addr_v6(env, ptw, address, access_type, result, fi);
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

This fixes a bug in which we failed to initialize
the result attributes properly after the memset.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

The syn_gpc function takes the fields as filled in by
the Arm ARM pseudocode for TakeGPCException.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-18-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/syndrome.h | 10 ++++++++++
 1 file changed, 10 insertions(+)
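
Editorial sketch (not part of the patch): to make the bit layout of the new
syndrome easier to check by hand, the standalone version below packs the
same fields. The three macro values are assumptions copied from what
target/arm/syndrome.h is believed to define; syn_gpc_sketch() is a
hypothetical name, and the range asserts are an addition for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; in QEMU these come from target/arm/syndrome.h. */
#define EC_GPC          0x1eu
#define ARM_EL_EC_SHIFT 26
#define ARM_EL_IL       (1u << 25)

static uint32_t syn_gpc_sketch(unsigned s2ptw, unsigned ind, unsigned gpcsc,
                               unsigned cm, unsigned s1ptw, unsigned wnr,
                               unsigned fsc)
{
    /* Single-bit flags and 6-bit codes, as laid out in the patch. */
    assert(s2ptw <= 1 && ind <= 1 && cm <= 1 && s1ptw <= 1 && wnr <= 1);
    assert(gpcsc < 64 && fsc < 64);

    return (EC_GPC << ARM_EL_EC_SHIFT) | ARM_EL_IL
         | (s2ptw << 21) | (ind << 20) | (gpcsc << 14)
         | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
}
```

For example, syn_gpc_sketch(1, 1, 0x0c, 0, 1, 1, 0x28) evaluates to
0x7A3300E8 under these assumed constants.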

diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -XXX,XX +XXX,XX @@ enum arm_exception_class {
     EC_SVEACCESSTRAP          = 0x19,
     EC_ERETTRAP               = 0x1a,
     EC_SMETRAP                = 0x1d,
+    EC_GPC                    = 0x1e,
     EC_INSNABORT              = 0x20,
     EC_INSNABORT_SAME_EL      = 0x21,
     EC_PCALIGNMENT            = 0x22,
@@ -XXX,XX +XXX,XX @@ static inline uint32_t syn_bxjtrap(int cv, int cond, int rm)
         (cv << 24) | (cond << 20) | rm;
 }
 
+static inline uint32_t syn_gpc(int s2ptw, int ind, int gpcsc,
+                               int cm, int s1ptw, int wnr, int fsc)
+{
+    /* TODO: FEAT_NV2 adds VNCR */
+    return (EC_GPC << ARM_EL_EC_SHIFT) | ARM_EL_IL | (s2ptw << 21)
+            | (ind << 20) | (gpcsc << 14) | (cm << 8) | (s1ptw << 7)
+            | (wnr << 6) | fsc;
+}
+
 static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
 {
     return (EC_INSNABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Handle GPC fault types in arm_deliver_fault, reporting them either
as a GPC exception at EL3 or, by falling through, as insn or data
aborts at the appropriate exception level.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-19-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h            |  1 +
 target/arm/internals.h      | 27 +++++++++++
 target/arm/helper.c         |  5 ++
 target/arm/tcg/tlb_helper.c | 96 +++++++++++++++++++++++++++++++++++--
 4 files changed, 126 insertions(+), 3 deletions(-)
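
Editorial sketch (not part of the patch): the routing decision made by
report_as_gpc_exception and the HCR_GPF check in arm_deliver_fault can be
summarised as one function. This is a simplified model under stated
assumptions: Gpcf, Report, Routing and route_gpc_fault() are hypothetical
names, scr_gpf and hcr_gpf stand in for SCR_EL3.GPF and the effective
HCR_EL2.GPF, and the later fi->stage2 override of target_el is left out.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    GPCF_NONE, GPCF_ADDRESS_SIZE, GPCF_WALK, GPCF_EABT, GPCF_FAIL
} Gpcf;

typedef enum { REPORT_ABORT, REPORT_GPC } Report;

typedef struct {
    Report kind;
    int target_el;
} Routing;

static Routing route_gpc_fault(Gpcf gpcf, int current_el,
                               int default_target_el,
                               bool scr_gpf, bool hcr_gpf)
{
    assert(current_el >= 0 && current_el <= 3);

    bool as_gpc;
    switch (gpcf) {
    case GPCF_ADDRESS_SIZE:
    case GPCF_WALK:
    case GPCF_EABT:
        as_gpc = true;                          /* GPT faults: always GPC */
        break;
    case GPCF_FAIL:
        as_gpc = scr_gpf && current_el != 3;    /* GPF: GPC only below EL3 */
        break;
    default:
        as_gpc = false;                         /* not a GPC fault at all */
        break;
    }
    if (as_gpc) {
        return (Routing){ REPORT_GPC, 3 };
    }

    int target_el = default_target_el;
    /* With SCR_EL3.GPF clear, a GPF may still be routed to EL2. */
    if (gpcf == GPCF_FAIL && target_el < 2 && hcr_gpf) {
        target_el = 2;
    }
    return (Routing){ REPORT_ABORT, target_el };
}
```

A GPF taken at EL3 therefore always comes back as an ordinary abort in this
model, regardless of scr_gpf.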

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@
 #define EXCP_UNALIGNED      22   /* v7M UNALIGNED UsageFault */
 #define EXCP_DIVBYZERO      23   /* v7M DIVBYZERO UsageFault */
 #define EXCP_VSERR          24
+#define EXCP_GPC            25   /* v9 Granule Protection Check Fault */
 /* NB: add new EXCP_ defines to the array in arm_log_exception() too */
 
 #define ARMV7M_EXCP_RESET   1
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFaultType {
     ARMFault_ICacheMaint,
     ARMFault_QEMU_NSCExec, /* v8M: NS executing in S&NSC memory */
     ARMFault_QEMU_SFault, /* v8M: SecureFault INVTRAN, INVEP or AUVIOL */
+    ARMFault_GPCFOnWalk,
+    ARMFault_GPCFOnOutput,
 } ARMFaultType;
 
+typedef enum ARMGPCF {
+    GPCF_None,
+    GPCF_AddressSize,
+    GPCF_Walk,
+    GPCF_EABT,
+    GPCF_Fail,
+} ARMGPCF;
+
 /**
  * ARMMMUFaultInfo: Information describing an ARM MMU Fault
  * @type: Type of fault
+ * @gpcf: Subtype of ARMFault_GPCFOn{Walk,Output}.
  * @level: Table walk level (for translation, access flag and permission faults)
  * @domain: Domain of the fault address (for non-LPAE CPUs only)
  * @s2addr: Address that caused a fault at stage 2
+ * @paddr: physical address that caused a fault for gpc
+ * @paddr_space: physical address space that caused a fault for gpc
  * @stage2: True if we faulted at stage 2
  * @s1ptw: True if we faulted at stage 2 while doing a stage 1 page-table walk
  * @s1ns: True if we faulted on a non-secure IPA while in secure state
@@ -XXX,XX +XXX,XX @@ typedef enum ARMFaultType {
 typedef struct ARMMMUFaultInfo ARMMMUFaultInfo;
 struct ARMMMUFaultInfo {
     ARMFaultType type;
+    ARMGPCF gpcf;
     target_ulong s2addr;
+    target_ulong paddr;
+    ARMSecuritySpace paddr_space;
     int level;
     int domain;
     bool stage2;
@@ -XXX,XX +XXX,XX @@ static inline uint32_t arm_fi_to_lfsc(ARMMMUFaultInfo *fi)
     case ARMFault_Exclusive:
         fsc = 0x35;
         break;
+    case ARMFault_GPCFOnWalk:
+        assert(fi->level >= -1 && fi->level <= 3);
+        if (fi->level < 0) {
+            fsc = 0b100011;
+        } else {
+            fsc = 0b100100 | fi->level;
+        }
+        break;
+    case ARMFault_GPCFOnOutput:
+        fsc = 0b101000;
+        break;
     default:
         /* Other faults can't occur in a context that requires a
          * long-format status code.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void arm_log_exception(CPUState *cs)
             [EXCP_UNALIGNED] = "v7M UNALIGNED UsageFault",
             [EXCP_DIVBYZERO] = "v7M DIVBYZERO UsageFault",
             [EXCP_VSERR] = "Virtual SERR",
+            [EXCP_GPC] = "Granule Protection Check",
         };
 
         if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
     }
 
     switch (cs->exception_index) {
+    case EXCP_GPC:
+        qemu_log_mask(CPU_LOG_INT, "...with MFAR 0x%" PRIx64 "\n",
+                      env->cp15.mfar_el3);
+        /* fall through */
     case EXCP_PREFETCH_ABORT:
     case EXCP_DATA_ABORT:
         /*
diff --git a/target/arm/tcg/tlb_helper.c b/target/arm/tcg/tlb_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/tlb_helper.c
+++ b/target/arm/tcg/tlb_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t compute_fsr_fsc(CPUARMState *env, ARMMMUFaultInfo *fi,
     return fsr;
 }
 
+static bool report_as_gpc_exception(ARMCPU *cpu, int current_el,
+                                    ARMMMUFaultInfo *fi)
+{
+    bool ret;
+
+    switch (fi->gpcf) {
+    case GPCF_None:
+        return false;
+    case GPCF_AddressSize:
+    case GPCF_Walk:
+    case GPCF_EABT:
+        /* R_PYTGX: GPT faults are reported as GPC. */
+        ret = true;
+        break;
+    case GPCF_Fail:
+        /*
+         * R_BLYPM: A GPF at EL3 is reported as insn or data abort.
+         * R_VBZMW, R_LXHQR: A GPF at EL[0-2] is reported as a GPC
+         * if SCR_EL3.GPF is set, otherwise an insn or data abort.
+         */
+        ret = (cpu->env.cp15.scr_el3 & SCR_GPF) && current_el != 3;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    assert(cpu_isar_feature(aa64_rme, cpu));
+    assert(fi->type == ARMFault_GPCFOnWalk ||
+           fi->type == ARMFault_GPCFOnOutput);
+    if (fi->gpcf == GPCF_AddressSize) {
+        assert(fi->level == 0);
+    } else {
+        assert(fi->level >= 0 && fi->level <= 1);
+    }
+
+    return ret;
+}
+
+static unsigned encode_gpcsc(ARMMMUFaultInfo *fi)
+{
+    static uint8_t const gpcsc[] = {
+        [GPCF_AddressSize] = 0b000000,
+        [GPCF_Walk]        = 0b000100,
+        [GPCF_Fail]        = 0b001100,
+        [GPCF_EABT]        = 0b010100,
+    };
+
+    /* Note that we've validated fi->gpcf and fi->level above. */
+    return gpcsc[fi->gpcf] | fi->level;
+}
+
 static G_NORETURN
 void arm_deliver_fault(ARMCPU *cpu, vaddr addr,
                        MMUAccessType access_type,
                        int mmu_idx, ARMMMUFaultInfo *fi)
 {
     CPUARMState *env = &cpu->env;
-    int target_el;
+    int target_el = exception_target_el(env);
+    int current_el = arm_current_el(env);
     bool same_el;
     uint32_t syn, exc, fsr, fsc;
 
-    target_el = exception_target_el(env);
+    if (report_as_gpc_exception(cpu, current_el, fi)) {
+        target_el = 3;
+
+        fsr = compute_fsr_fsc(env, fi, target_el, mmu_idx, &fsc);
+
+        syn = syn_gpc(fi->stage2 && fi->type == ARMFault_GPCFOnWalk,
+                      access_type == MMU_INST_FETCH,
+                      encode_gpcsc(fi), 0, fi->s1ptw,
+                      access_type == MMU_DATA_STORE, fsc);
+
+        env->cp15.mfar_el3 = fi->paddr;
+        switch (fi->paddr_space) {
+        case ARMSS_Secure:
+            break;
+        case ARMSS_NonSecure:
+            env->cp15.mfar_el3 |= R_MFAR_NS_MASK;
+            break;
+        case ARMSS_Root:
+            env->cp15.mfar_el3 |= R_MFAR_NSE_MASK;
+            break;
+        case ARMSS_Realm:
+            env->cp15.mfar_el3 |= R_MFAR_NSE_MASK | R_MFAR_NS_MASK;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+
+        exc = EXCP_GPC;
+        goto do_raise;
+    }
+
+    /* If SCR_EL3.GPF is unset, GPF may still be routed to EL2. */
+    if (fi->gpcf == GPCF_Fail && target_el < 2) {
+        if (arm_hcr_el2_eff(env) & HCR_GPF) {
+            target_el = 2;
+        }
+    }
+
     if (fi->stage2) {
         target_el = 2;
         env->cp15.hpfar_el2 = extract64(fi->s2addr, 12, 47) << 4;
@@ -XXX,XX +XXX,XX @@ void arm_deliver_fault(ARMCPU *cpu, vaddr addr,
             env->cp15.hpfar_el2 |= HPFAR_NS;
         }
     }
-    same_el = (arm_current_el(env) == target_el);
 
+    same_el = current_el == target_el;
     fsr = compute_fsr_fsc(env, fi, target_el, mmu_idx, &fsc);
 
     if (access_type == MMU_INST_FETCH) {
@@ -XXX,XX +XXX,XX @@ void arm_deliver_fault(ARMCPU *cpu, vaddr addr,
         exc = EXCP_DATA_ABORT;
     }
 
+ do_raise:
     env->exception.vaddress = addr;
     env->exception.fsr = fsr;
     raise_exception(env, exc, syn, target_el);
-- 
2.34.1
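
Editorial sketch (not part of the patch): the two status encodings added
above (the GPCSC value from encode_gpcsc and the long-format FSC for
ARMFault_GPCFOnWalk) are small enough to tabulate. The standalone code
below reproduces them with hex constants instead of binary literals;
Gpcf, gpcsc_sketch() and gpcf_on_walk_fsc() are hypothetical names, and the
asserts restate the level ranges the patch validates.

```c
#include <assert.h>

typedef enum {
    GPCF_NONE, GPCF_ADDRESS_SIZE, GPCF_WALK, GPCF_EABT, GPCF_FAIL
} Gpcf;

/* GPCSC: a per-fault base code OR-ed with the GPT level (0 or 1). */
static unsigned gpcsc_sketch(Gpcf gpcf, int level)
{
    static const unsigned char base[] = {
        [GPCF_ADDRESS_SIZE] = 0x00,   /* 0b000000 */
        [GPCF_WALK]         = 0x04,   /* 0b000100 */
        [GPCF_FAIL]         = 0x0c,   /* 0b001100 */
        [GPCF_EABT]         = 0x14,   /* 0b010100 */
    };

    assert(gpcf != GPCF_NONE);
    /* Address size faults are only reported at level 0. */
    assert(level == 0 || (level == 1 && gpcf != GPCF_ADDRESS_SIZE));
    return base[gpcf] | (unsigned)level;
}

/* Long-format FSC for a GPC fault hit during a translation table walk. */
static unsigned gpcf_on_walk_fsc(int level)
{
    assert(level >= -1 && level <= 3);
    return level < 0 ? 0x23u : (0x24u | (unsigned)level);
}
```

The FSC for ARMFault_GPCFOnOutput is the single constant 0b101000 (0x28),
so it needs no helper.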

From: Richard Henderson <richard.henderson@linaro.org>

Place the granule protection check at the end of
get_phys_addr_with_struct, so that we check all physical results.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-20-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/ptw.c | 249 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 232 insertions(+), 17 deletions(-)
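
Editorial sketch (not part of the patch): the index arithmetic and the final
GPI comparison in granule_protection_check can be tried in isolation. The
code below is a minimal model, assuming a hypothetical Space enum whose
values 0..3 match the low two bits of the GPI encodings 0b1000..0b1011;
bits() is a local stand-in for extract64, and all function names are
illustrative only. Fault reporting and the contiguous descriptor case are
omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ARMSecuritySpace (values 0..3 assumed). */
typedef enum { SPACE_SECURE, SPACE_NONSECURE, SPACE_ROOT, SPACE_REALM } Space;

typedef enum { GPI_ALLOW, GPI_DENY, GPI_RESERVED } GpiResult;

/* Extract `len` bits of `value` starting at bit `start`. */
static uint64_t bits(uint64_t value, unsigned start, unsigned len)
{
    assert(len > 0 && len < 64 && start + len <= 64);
    return (value >> start) & ((UINT64_C(1) << len) - 1);
}

/* Level 0 index: PA bits [pps-1 : l0gptsz]. */
static uint64_t gpt_l0_index(uint64_t pa, unsigned pps, unsigned l0gptsz)
{
    return bits(pa, l0gptsz, pps - l0gptsz);
}

/* Level 1 index: each 64-bit entry holds 16 GPIs, hence the "+ 4". */
static uint64_t gpt_l1_index(uint64_t pa, unsigned pgs, unsigned l0gptsz)
{
    return bits(pa, pgs + 4, l0gptsz - pgs - 4);
}

/* Pick the 4-bit GPI that covers `pa` out of a level 1 granules entry. */
static unsigned gpt_l1_gpi(uint64_t entry, uint64_t pa, unsigned pgs)
{
    unsigned slot = (unsigned)bits(pa, pgs, 4);
    return (unsigned)bits(entry, slot * 4, 4);
}

/* Final decision: does this GPI permit an access from `pspace`? */
static GpiResult gpi_check(unsigned gpi, Space pspace)
{
    switch (gpi) {
    case 0x0:                       /* no access */
        return GPI_DENY;
    case 0xf:                       /* all access */
        return GPI_ALLOW;
    case 0x8: case 0x9: case 0xa: case 0xb:
        return (unsigned)pspace == (gpi & 3) ? GPI_ALLOW : GPI_DENY;
    default:                        /* reserved encoding: walk fault */
        return GPI_RESERVED;
    }
}
```

As a usage example with pps = 36, l0gptsz = 30 and pgs = 12 (4KB granules),
the address 0x140013000 selects level 0 index 5, level 1 index 1, and GPI
slot 3 within the level 1 entry.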

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ typedef struct S1Translate {
     void *out_host;
 } S1Translate;
 
-static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
-                                      target_ulong address,
-                                      MMUAccessType access_type,
-                                      GetPhysAddrResult *result,
-                                      ARMMMUFaultInfo *fi);
+static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
+                                target_ulong address,
+                                MMUAccessType access_type,
+                                GetPhysAddrResult *result,
+                                ARMMMUFaultInfo *fi);
+
+static bool get_phys_addr_gpc(CPUARMState *env, S1Translate *ptw,
+                              target_ulong address,
+                              MMUAccessType access_type,
+                              GetPhysAddrResult *result,
+                              ARMMMUFaultInfo *fi);
 
 /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
 static const uint8_t pamax_map[] = {
@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
     return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
 }
 
+static bool granule_protection_check(CPUARMState *env, uint64_t paddress,
+                                     ARMSecuritySpace pspace,
+                                     ARMMMUFaultInfo *fi)
+{
+    MemTxAttrs attrs = {
+        .secure = true,
+        .space = ARMSS_Root,
+    };
+    ARMCPU *cpu = env_archcpu(env);
+    uint64_t gpccr = env->cp15.gpccr_el3;
+    unsigned pps, pgs, l0gptsz, level = 0;
+    uint64_t tableaddr, pps_mask, align, entry, index;
+    AddressSpace *as;
+    MemTxResult result;
+    int gpi;
+
+    if (!FIELD_EX64(gpccr, GPCCR, GPC)) {
+        return true;
+    }
+
+    /*
+     * GPC Priority 1 (R_GMGRR):
+     * R_JWCSM: If the configuration of GPCCR_EL3 is invalid,
+     * the access fails as GPT walk fault at level 0.
+     */
+
+    /*
+     * Configuration of PPS to a value exceeding the implemented
+     * physical address size is invalid.
+     */
+    pps = FIELD_EX64(gpccr, GPCCR, PPS);
+    if (pps > FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE)) {
+        goto fault_walk;
+    }
+    pps = pamax_map[pps];
+    pps_mask = MAKE_64BIT_MASK(0, pps);
+
+    switch (FIELD_EX64(gpccr, GPCCR, SH)) {
+    case 0b10: /* outer shareable */
+        break;
+    case 0b00: /* non-shareable */
+    case 0b11: /* inner shareable */
+        /* Inner and Outer non-cacheable requires Outer shareable. */
+        if (FIELD_EX64(gpccr, GPCCR, ORGN) == 0 &&
+            FIELD_EX64(gpccr, GPCCR, IRGN) == 0) {
+            goto fault_walk;
+        }
+        break;
+    default:   /* reserved */
+        goto fault_walk;
+    }
+
+    switch (FIELD_EX64(gpccr, GPCCR, PGS)) {
+    case 0b00: /* 4KB */
+        pgs = 12;
+        break;
+    case 0b01: /* 64KB */
+        pgs = 16;
+        break;
+    case 0b10: /* 16KB */
+        pgs = 14;
+        break;
+    default: /* reserved */
+        goto fault_walk;
+    }
+
+    /* Note this field is read-only and fixed at reset. */
+    l0gptsz = 30 + FIELD_EX64(gpccr, GPCCR, L0GPTSZ);
+
+    /*
+     * GPC Priority 2: Secure, Realm or Root address exceeds PPS.
+     * R_CPDSB: A NonSecure physical address input exceeding PPS
+     * does not experience any fault.
+     */
+    if (paddress & ~pps_mask) {
+        if (pspace == ARMSS_NonSecure) {
+            return true;
+        }
+        goto fault_size;
+    }
+
+    /* GPC Priority 3: the base address of GPTBR_EL3 exceeds PPS. */
+    tableaddr = env->cp15.gptbr_el3 << 12;
+    if (tableaddr & ~pps_mask) {
+        goto fault_size;
+    }
+
+    /*
+     * BADDR is aligned per a function of PPS and L0GPTSZ.
+     * These bits of GPTBR_EL3 are RES0, but are not a configuration error,
+     * unlike the RES0 bits of the GPT entries (R_XNKFZ).
+     */
+    align = MAX(pps - l0gptsz + 3, 12);
+    align = MAKE_64BIT_MASK(0, align);
+    tableaddr &= ~align;
+
+    as = arm_addressspace(env_cpu(env), attrs);
+
+    /* Level 0 lookup. */
+    index = extract64(paddress, l0gptsz, pps - l0gptsz);
+    tableaddr += index * 8;
+    entry = address_space_ldq_le(as, tableaddr, attrs, &result);
+    if (result != MEMTX_OK) {
+        goto fault_eabt;
+    }
+
+    switch (extract32(entry, 0, 4)) {
+    case 1: /* block descriptor */
+        if (entry >> 8) {
+            goto fault_walk; /* RES0 bits not 0 */
+        }
+        gpi = extract32(entry, 4, 4);
+        goto found;
+    case 3: /* table descriptor */
+        tableaddr = entry & ~0xf;
+        align = MAX(l0gptsz - pgs - 1, 12);
+        align = MAKE_64BIT_MASK(0, align);
+        if (tableaddr & (~pps_mask | align)) {
+            goto fault_walk; /* RES0 bits not 0 */
+        }
+        break;
+    default: /* invalid */
+        goto fault_walk;
+    }
+
+    /* Level 1 lookup */
+    level = 1;
+    index = extract64(paddress, pgs + 4, l0gptsz - pgs - 4);
+    tableaddr += index * 8;
+    entry = address_space_ldq_le(as, tableaddr, attrs, &result);
+    if (result != MEMTX_OK) {
+        goto fault_eabt;
+    }
+
+    switch (extract32(entry, 0, 4)) {
+    case 1: /* contiguous descriptor */
+        if (entry >> 10) {
+            goto fault_walk; /* RES0 bits not 0 */
+        }
+        /*
+         * Because the softmmu tlb only works on units of TARGET_PAGE_SIZE,
+         * and because we cannot invalidate by pa, and thus will always
+         * flush entire tlbs, we don't actually care about the range here
+         * and can simply extract the GPI as the result.
+         */
+        if (extract32(entry, 8, 2) == 0) {
+            goto fault_walk; /* reserved contig */
+        }
+        gpi = extract32(entry, 4, 4);
+        break;
+    default:
+        index = extract64(paddress, pgs, 4);
+        gpi = extract64(entry, index * 4, 4);
+        break;
+    }
+
+ found:
+    switch (gpi) {
+    case 0b0000: /* no access */
+        break;
+    case 0b1111: /* all access */
+        return true;
+    case 0b1000:
+    case 0b1001:
+    case 0b1010:
+    case 0b1011:
+        if (pspace == (gpi & 3)) {
+            return true;
+        }
+        break;
+    default:
+        goto fault_walk; /* reserved */
+    }
+
+    fi->gpcf = GPCF_Fail;
+    goto fault_common;
+ fault_eabt:
+    fi->gpcf = GPCF_EABT;
+    goto fault_common;
+ fault_size:
+    fi->gpcf = GPCF_AddressSize;
+    goto fault_common;
+ fault_walk:
+    fi->gpcf = GPCF_Walk;
+ fault_common:
+    fi->level = level;
+    fi->paddr = paddress;
+    fi->paddr_space = pspace;
+    return false;
+}
+
 static bool S2_attrs_are_device(uint64_t hcr, uint8_t attrs)
 {
     /*
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
         };
         GetPhysAddrResult s2 = { };
 
-        if (get_phys_addr_with_struct(env, &s2ptw, addr,
-                                      MMU_DATA_LOAD, &s2, fi)) {
+        if (get_phys_addr_gpc(env, &s2ptw, addr, MMU_DATA_LOAD, &s2, fi)) {
             goto fail;
         }
+
         ptw->out_phys = s2.f.phys_addr;
         pte_attrs = s2.cacheattrs.attrs;
         ptw->out_host = NULL;
@@ -XXX,XX +XXX,XX @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
 
  fail:
     assert(fi->type != ARMFault_None);
+    if (fi->type == ARMFault_GPCFOnOutput) {
+        fi->type = ARMFault_GPCFOnWalk;
+    }
     fi->s2addr = addr;
     fi->stage2 = true;
     fi->s1ptw = true;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
                                    ARMMMUFaultInfo *fi)
 {
     uint8_t memattr = 0x00;    /* Device nGnRnE */
-    uint8_t shareability = 0;  /* non-sharable */
+    uint8_t shareability = 0;  /* non-shareable */
     int r_el;
 
     switch (mmu_idx) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
             } else {
                 memattr = 0x44;  /* Normal, NC, No */
             }
-            shareability = 2; /* outer sharable */
+            shareability = 2; /* outer shareable */
         }
         result->cacheattrs.is_s2_format = false;
         break;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
     ARMSecuritySpace ipa_space;
     uint64_t hcr;
 
-    ret = get_phys_addr_with_struct(env, ptw, address, access_type, result, fi);
+    ret = get_phys_addr_nogpc(env, ptw, address, access_type, result, fi);
 
     /* If S1 fails, return early.  */
     if (ret) {
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
     cacheattrs1 = result->cacheattrs;
     memset(result, 0, sizeof(*result));
 
-    ret = get_phys_addr_with_struct(env, ptw, ipa, access_type, result, fi);
+    ret = get_phys_addr_nogpc(env, ptw, ipa, access_type, result, fi);
     fi->s2addr = ipa;
 
     /* Combine the S1 and S2 perms.  */
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_twostage(CPUARMState *env, S1Translate *ptw,
     return false;
 }
 
-static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
+static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
                                       target_ulong address,
                                       MMUAccessType access_type,
                                       GetPhysAddrResult *result,
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
     }
 }
 
+static bool get_phys_addr_gpc(CPUARMState *env, S1Translate *ptw,
+                              target_ulong address,
+                              MMUAccessType access_type,
+                              GetPhysAddrResult *result,
+                              ARMMMUFaultInfo *fi)
+{
+    if (get_phys_addr_nogpc(env, ptw, address, access_type, result, fi)) {
+        return true;
+    }
+    if (!granule_protection_check(env, result->f.phys_addr,
+                                  result->f.attrs.space, fi)) {
+        fi->type = ARMFault_GPCFOnOutput;
+        return true;
+    }
+    return false;
+}
+
 bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
                                MMUAccessType access_type, ARMMMUIdx mmu_idx,
                                bool is_secure, GetPhysAddrResult *result,
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
         .in_secure = is_secure,
         .in_space = arm_secure_to_space(is_secure),
     };
-    return get_phys_addr_with_struct(env, &ptw, address, access_type,
-                                     result, fi);
+    return get_phys_addr_gpc(env, &ptw, address, access_type, result, fi);
 }
 
 bool get_phys_addr(CPUARMState *env, target_ulong address,
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 
     ptw.in_space = ss;
     ptw.in_secure = arm_space_is_secure(ss);
-    return get_phys_addr_with_struct(env, &ptw, address, access_type,
-                                     result, fi);
+    return get_phys_addr_gpc(env, &ptw, address, access_type, result, fi);
 }
 
 hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
@@ -XXX,XX +XXX,XX @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
     ARMMMUFaultInfo fi = {};
     bool ret;
 
-    ret = get_phys_addr_with_struct(env, &ptw, addr, MMU_DATA_LOAD, &res, &fi);
+    ret = get_phys_addr_gpc(env, &ptw, addr, MMU_DATA_LOAD, &res, &fi);
     *attrs = res.f.attrs;
 
     if (ret) {
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Add an x-rme cpu property to enable FEAT_RME.
Add an x-l0gptsz property to set GPCCR_EL3.L0GPTSZ,
for testing various possible configurations.

We are not yet sure whether FEAT_RME can be enabled purely as a
CPU-level property or whether it will need board co-operation, so
we are making these experimental x- properties. This lets the people
developing system-level software for RME start using it and tell us
how it goes. The command line syntax for enabling this will change
in future, without backwards-compatibility.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620124418.805717-21-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/cpu64.c | 53 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
     cpu->sve_max_vq = max_vq;
 }
 
+static bool cpu_arm_get_rme(Object *obj, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    return cpu_isar_feature(aa64_rme, cpu);
+}
+
+static void cpu_arm_set_rme(Object *obj, bool value, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    uint64_t t;
+
+    t = cpu->isar.id_aa64pfr0;
+    t = FIELD_DP64(t, ID_AA64PFR0, RME, value);
+    cpu->isar.id_aa64pfr0 = t;
+}
+
+static void cpu_max_set_l0gptsz(Object *obj, Visitor *v, const char *name,
+                                void *opaque, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    uint32_t value;
+
+    if (!visit_type_uint32(v, name, &value, errp)) {
+        return;
+    }
+
+    /* Encode the value for the GPCCR_EL3 field. */
+    switch (value) {
+    case 30:
+    case 34:
+    case 36:
+    case 39:
+        cpu->reset_l0gptsz = value - 30;
+        break;
+    default:
+        error_setg(errp, "invalid value for l0gptsz");
+        error_append_hint(errp, "valid values are 30, 34, 36, 39\n");
+        break;
+    }
+}
+
+static void cpu_max_get_l0gptsz(Object *obj, Visitor *v, const char *name,
+                                void *opaque, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    uint32_t value = cpu->reset_l0gptsz + 30;
+
+    visit_type_uint32(v, name, &value, errp);
+}
+
 static Property arm_cpu_lpa2_property =
     DEFINE_PROP_BOOL("lpa2", ARMCPU, prop_lpa2, true);
 
@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
     aarch64_add_sme_properties(obj);
     object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
                         cpu_max_set_sve_max_vq, NULL, NULL);
+    object_property_add_bool(obj, "x-rme", cpu_arm_get_rme, cpu_arm_set_rme);
+    object_property_add(obj, "x-l0gptsz", "uint32", cpu_max_get_l0gptsz,
+                        cpu_max_set_l0gptsz, NULL, NULL);
     qdev_property_add_static(DEVICE(obj), &arm_cpu_lpa2_property);
 }
 
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20230622143046.1578160-1-richard.henderson@linaro.org
[PMM: fixed typo; note experimental status in emulation.rst too]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/cpu-features.rst | 23 +++++++++++++++++++++++
 docs/system/arm/emulation.rst    |  1 +
 2 files changed, 24 insertions(+)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -XXX,XX +XXX,XX @@ As with ``sve-default-vector-length``, if the default length is larger
 than the maximum vector length enabled, the actual vector length will
 be reduced.  If this property is set to ``-1`` then the default vector
 length is set to the maximum possible length.
+
+RME CPU Properties
+==================
+
+The status of RME support with QEMU is experimental.  At this time we
+only support RME within the CPU proper, not within the SMMU or GIC.
+The feature is enabled by the CPU property ``x-rme``, with the ``x-``
+prefix present as a reminder of the experimental status, and defaults off.
+
+The method for enabling RME will change in some future QEMU release
+without notice or backward compatibility.
+
+RME Level 0 GPT Size Property
+-----------------------------
+
+To aid firmware developers in testing different possible CPU
+configurations, ``x-l0gptsz=S`` may be used to specify the value
+to encode into ``GPCCR_EL3.L0GPTSZ``, a read-only field that
+specifies the size of the Level 0 Granule Protection Table.
+Legal values for ``S`` are 30, 34, 36, and 39; the default is 30.
+
+As with ``x-rme``, the ``x-l0gptsz`` property may be renamed or
+removed in some future QEMU release.
diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_RAS (Reliability, availability, and serviceability)
 - FEAT_RASv1p1 (RAS Extension v1.1)
 - FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
+- FEAT_RME (Realm Management Extension) (NB: support status in QEMU is experimental)
 - FEAT_RNG (Random number generator)
 - FEAT_S2FWB (Stage 2 forced Write-Back)
 - FEAT_SB (Speculation Barrier)
-- 
2.34.1

We use __builtin_subcll() to do a 64-bit subtract with borrow-in and
borrow-out when the host compiler supports it.  Unfortunately some
versions of Apple Clang have a bug in their implementation of this
intrinsic which means it returns the wrong value.  The effect is that
a QEMU built with the affected compiler will hang when emulating x86
or m68k float80 division.

The upstream LLVM issue is:
https://github.com/llvm/llvm-project/issues/55253

The commit that introduced the bug apparently never made it into an
upstream LLVM release without the subsequent fix
https://github.com/llvm/llvm-project/commit/fffb6e6afdbaba563189c1f715058ed401fbc88d
but unfortunately it did make it into Apple Clang 14.0, as shipped
in Xcode 14.3 (14.2 is reported to be OK). The Apple bug number is
FB12210478.

Add ifdefs to avoid use of __builtin_subcll() on Apple Clang version
14 or greater.  There is not currently a version of Apple Clang which
has the bug fix -- when one appears we should be able to add an upper
bound to the ifdef condition so we can start using the builtin again.
We make the lower bound a conservative "any Apple clang with major
version 14 or greater" because the consequences of incorrectly
disabling the builtin when it would work are pretty small and the
consequences of not disabling it when we should are pretty bad.

Many thanks to those users who both reported this bug and also
did a lot of work in identifying the root cause; in particular
to Daniel Bertalan and osy.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1631
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1659
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Daniel Bertalan <dani@danielbertalan.dev>
Tested-by: Solra Bizna <solra@bizna.name>
Message-id: 20230622130823.1631719-1-peter.maydell@linaro.org
---
 include/qemu/compiler.h   | 13 +++++++++++++
 include/qemu/host-utils.h |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -XXX,XX +XXX,XX @@
 #define QEMU_DISABLE_CFI
 #endif
 
+/*
+ * Apple clang version 14 has a bug in its __builtin_subcll(); define
+ * BUILTIN_SUBCLL_BROKEN for the offending versions so we can avoid it.
+ * When a version of Apple clang which has this bug fixed is released
+ * we can add an upper bound to this check.
+ * See https://gitlab.com/qemu-project/qemu/-/issues/1631
+ * and https://gitlab.com/qemu-project/qemu/-/issues/1659 for details.
+ * The bug never made it into any upstream LLVM releases, only Apple ones.
+ */
+#if defined(__apple_build_version__) && __clang_major__ >= 14
+#define BUILTIN_SUBCLL_BROKEN
+#endif
+
 #endif /* COMPILER_H */
diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t uadd64_carry(uint64_t x, uint64_t y, bool *pcarry)
  */
 static inline uint64_t usub64_borrow(uint64_t x, uint64_t y, bool *pborrow)
 {
-#if __has_builtin(__builtin_subcll)
+#if __has_builtin(__builtin_subcll) && !defined(BUILTIN_SUBCLL_BROKEN)
     unsigned long long b = *pborrow;
     x = __builtin_subcll(x, y, b, &b);
     *pborrow = b & 1;
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

One cannot test for feature aa32_simd_r32 without first
testing if AArch32 mode is supported at all.  This leads to

qemu-system-aarch64: ARM CPUs must have both VFP-D32 and Neon or neither

for Apple M1 cpus.

We already have a check for ARMv8-A never setting vfp-d32 true,
so restructure the code so that AArch64 avoids the test entirely.

Reported-by: Mads Ynddal <mads@ynddal.dk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Mads Ynddal <m.ynddal@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Mads Ynddal <m.ynddal@samsung.com>
Message-id: 20230619140216.402530-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
      * KVM does not currently allow us to lie to the guest about its
      * ID/feature registers, so the guest always sees what the host has.
      */
-    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)
-        ? cpu_isar_feature(aa64_fp_simd, cpu)
-        : cpu_isar_feature(aa32_vfp, cpu)) {
-        cpu->has_vfp = true;
-        if (!kvm_enabled()) {
-            qdev_property_add_static(DEVICE(obj), &arm_cpu_has_vfp_property);
+    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
+        if (cpu_isar_feature(aa64_fp_simd, cpu)) {
+            cpu->has_vfp = true;
+            cpu->has_vfp_d32 = true;
+            if (tcg_enabled() || qtest_enabled()) {
+                qdev_property_add_static(DEVICE(obj),
+                                         &arm_cpu_has_vfp_property);
+            }
         }
-    }
-
-    if (cpu->has_vfp && cpu_isar_feature(aa32_simd_r32, cpu)) {
-        cpu->has_vfp_d32 = true;
-        if (!kvm_enabled()) {
+    } else if (cpu_isar_feature(aa32_vfp, cpu)) {
+        cpu->has_vfp = true;
+        if (cpu_isar_feature(aa32_simd_r32, cpu)) {
+            cpu->has_vfp_d32 = true;
             /*
              * The permitted values of the SIMDReg bits [3:0] on
              * Armv8-A are either 0b0000 and 0b0010. On such CPUs,
              * make sure that has_vfp_d32 can not be set to false.
              */
-            if (!(arm_feature(&cpu->env, ARM_FEATURE_V8) &&
-                  !arm_feature(&cpu->env, ARM_FEATURE_M))) {
+            if ((tcg_enabled() || qtest_enabled())
+                && !(arm_feature(&cpu->env, ARM_FEATURE_V8)
+                     && !arm_feature(&cpu->env, ARM_FEATURE_M))) {
                 qdev_property_add_static(DEVICE(obj),
                                          &arm_cpu_has_vfp_d32_property);
             }
-- 
2.34.1

From: Shashi Mallela <shashi.mallela@linaro.org>

Create ITS as part of SBSA platform GIC initialization.

GIC ITS information is in DeviceTree so TF-A can pass it to EDK2.

Bump the platform version to 0.2, as this is an important hardware change.

Signed-off-by: Shashi Mallela <shashi.mallela@linaro.org>
Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Message-id: 20230619170913.517373-2-marcin.juszkiewicz@linaro.org
Co-authored-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/sbsa.rst | 14 ++++++++++++++
 hw/arm/sbsa-ref.c        | 33 ++++++++++++++++++++++++++++++---
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/sbsa.rst
+++ b/docs/system/arm/sbsa.rst
@@ -XXX,XX +XXX,XX @@ to be a complete compliant DT. It currently reports:
    - platform version
    - GIC addresses
 
+Platform version
+''''''''''''''''
+
 The platform version is only for informing platform firmware about
 what kind of ``sbsa-ref`` board it is running on. It is neither
 a QEMU versioned machine type nor a reflection of the level of the
@@ -XXX,XX +XXX,XX @@ SBSA/SystemReady SR support provided.
 The ``machine-version-major`` value is updated when changes breaking
 fw compatibility are introduced. The ``machine-version-minor`` value
 is updated when features are added that don't break fw compatibility.
+
+Platform version changes:
+
+0.0
+  Devicetree holds information about CPUs, memory and platform version.
+
+0.1
+  GIC information is present in devicetree.
+
+0.2
+  GIC ITS information is present in devicetree.
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -XXX,XX +XXX,XX @@ enum {
     SBSA_CPUPERIPHS,
     SBSA_GIC_DIST,
     SBSA_GIC_REDIST,
+    SBSA_GIC_ITS,
     SBSA_SECURE_EC,
     SBSA_GWDT_WS0,
     SBSA_GWDT_REFRESH,
@@ -XXX,XX +XXX,XX @@ static const MemMapEntry sbsa_ref_memmap[] = {
     [SBSA_CPUPERIPHS] =         { 0x40000000, 0x00040000 },
     [SBSA_GIC_DIST] =           { 0x40060000, 0x00010000 },
     [SBSA_GIC_REDIST] =         { 0x40080000, 0x04000000 },
+    [SBSA_GIC_ITS] =            { 0x44081000, 0x00020000 },
     [SBSA_SECURE_EC] =          { 0x50000000, 0x00001000 },
     [SBSA_GWDT_REFRESH] =       { 0x50010000, 0x00001000 },
     [SBSA_GWDT_CONTROL] =       { 0x50011000, 0x00001000 },
@@ -XXX,XX +XXX,XX @@ static void sbsa_fdt_add_gic_node(SBSAMachineState *sms)
                                  2, sbsa_ref_memmap[SBSA_GIC_REDIST].base,
                                  2, sbsa_ref_memmap[SBSA_GIC_REDIST].size);
 
+    nodename = g_strdup_printf("/intc/its");
+    qemu_fdt_add_subnode(sms->fdt, nodename);
+    qemu_fdt_setprop_sized_cells(sms->fdt, nodename, "reg",
+                                 2, sbsa_ref_memmap[SBSA_GIC_ITS].base,
+                                 2, sbsa_ref_memmap[SBSA_GIC_ITS].size);
+
     g_free(nodename);
 }
+
 /*
  * Firmware on this machine only uses ACPI table to load OS, these limited
  * device tree nodes are just to let firmware know the info which varies from
@@ -XXX,XX +XXX,XX @@ static void create_fdt(SBSAMachineState *sms)
      *                        fw compatibility.
      */
     qemu_fdt_setprop_cell(fdt, "/", "machine-version-major", 0);
-    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 1);
+    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 2);
 
     if (ms->numa_state->have_numa_distance) {
         int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
@@ -XXX,XX +XXX,XX @@ static void create_secure_ram(SBSAMachineState *sms,
     memory_region_add_subregion(secure_sysmem, base, secram);
 }
 
-static void create_gic(SBSAMachineState *sms)
+static void create_its(SBSAMachineState *sms)
+{
+    const char *itsclass = its_class_name();
+    DeviceState *dev;
+
+    dev = qdev_new(itsclass);
+
+    object_property_set_link(OBJECT(dev), "parent-gicv3", OBJECT(sms->gic),
+                             &error_abort);
+    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+    sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, sbsa_ref_memmap[SBSA_GIC_ITS].base);
+}
+
+static void create_gic(SBSAMachineState *sms, MemoryRegion *mem)
 {
     unsigned int smp_cpus = MACHINE(sms)->smp.cpus;
     SysBusDevice *gicbusdev;
@@ -XXX,XX +XXX,XX @@ static void create_gic(SBSAMachineState *sms)
     qdev_prop_set_uint32(sms->gic, "len-redist-region-count", 1);
     qdev_prop_set_uint32(sms->gic, "redist-region-count[0]", redist0_count);
 
+    object_property_set_link(OBJECT(sms->gic), "sysmem",
+                             OBJECT(mem), &error_fatal);
+    qdev_prop_set_bit(sms->gic, "has-lpi", true);
+
     gicbusdev = SYS_BUS_DEVICE(sms->gic);
     sysbus_realize_and_unref(gicbusdev, &error_fatal);
     sysbus_mmio_map(gicbusdev, 0, sbsa_ref_memmap[SBSA_GIC_DIST].base);
@@ -XXX,XX +XXX,XX @@ static void create_gic(SBSAMachineState *sms)
         sysbus_connect_irq(gicbusdev, i + 3 * smp_cpus,
                            qdev_get_gpio_in(cpudev, ARM_CPU_VFIQ));
     }
+    create_its(sms);
 }
 
 static void create_uart(const SBSAMachineState *sms, int uart,
@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_init(MachineState *machine)
 
     create_secure_ram(sms, secure_sysmem);
 
-    create_gic(sms);
+    create_gic(sms, sysmem);
 
     create_uart(sms, SBSA_UART, sysmem, serial_hd(0));
     create_uart(sms, SBSA_SECURE_UART, secure_sysmem, serial_hd(1));
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Brown bag time: using a store instead of a load left the temporary
uninitialized.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1704
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230620134659.817559-1-richard.henderson@linaro.org
Fixes: e6dd5e782be ("target/arm: Use tcg_gen_qemu_{ld, st}_i128 in gen_sve_{ld, st}r")
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-sve.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -XXX,XX +XXX,XX @@ void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs,
     /* Predicate register stores can be any multiple of 2.  */
     if (len_remain >= 8) {
         t0 = tcg_temp_new_i64();
-        tcg_gen_st_i64(t0, base, vofs + len_align);
+        tcg_gen_ld_i64(t0, base, vofs + len_align);
         tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ | MO_ATOM_NONE);
         len_remain -= 8;
         len_align += 8;
-- 
2.34.1

The xkb official name for the Arabic keyboard layout is 'ara'.
However xkb has for at least the past 15 years also permitted it to
be named via the legacy synonym 'ar'.  In xkeyboard-config 2.39 this
synonym was removed, which breaks compilation of QEMU:

FAILED: pc-bios/keymaps/ar
/home/fred/qemu-git/src/qemu/build-full/qemu-keymap -f pc-bios/keymaps/ar -l ar
xkbcommon: ERROR: Couldn't find file "symbols/ar" in include paths
xkbcommon: ERROR: 1 include paths searched:
xkbcommon: ERROR: 	/usr/share/X11/xkb
xkbcommon: ERROR: 3 include paths could not be added:
xkbcommon: ERROR: 	/home/fred/.config/xkb
xkbcommon: ERROR: 	/home/fred/.xkb
xkbcommon: ERROR: 	/etc/xkb
xkbcommon: ERROR: Abandoning symbols file "(unnamed)"
xkbcommon: ERROR: Failed to compile xkb_symbols
xkbcommon: ERROR: Failed to compile keymap

The upstream xkeyboard-config change removing the compat
mapping is:
https://gitlab.freedesktop.org/xkeyboard-config/xkeyboard-config/-/commit/470ad2cd8fea84d7210377161d86b31999bb5ea6

Make QEMU always ask for the 'ara' xkb layout, which should work on
both older and newer xkeyboard-config.  We leave the QEMU name for
this keyboard layout as 'ar'; it is not the only one where our name
for it deviates from the xkb standard name.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20230620162024.1132013-1-peter.maydell@linaro.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1709
---
 pc-bios/keymaps/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pc-bios/keymaps/meson.build b/pc-bios/keymaps/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/pc-bios/keymaps/meson.build
+++ b/pc-bios/keymaps/meson.build
@@ -XXX,XX +XXX,XX @@
 keymaps = {
-  'ar': '-l ar',
+  'ar': '-l ara',
   'bepo': '-l fr -v dvorak',
   'cz': '-l cz',
   'da': '-l dk',
-- 
2.34.1