Series comparison

-[PULL 00/26] target-arm queue
+[PULL 00/45] target-arm queue
-Small pile of bug fixes for rc1. I've included my patches to get
+The following changes since commit a97978bcc2d1f650c7d411428806e5b03082b8c7:
 our docs building with Sphinx 3, just for convenience...
--- PMM
+  Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210603' into staging (2021-06-03 10:00:35 +0100)
 The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:
   Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210603
-for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:
+for you to fetch changes up to 1c861885894d840235954060050d240259f5340b:
-  tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)
+  tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed (2021-06-03 16:43:27 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * target/arm: Fix Neon emulation bugs on big-endian hosts
+ * Some not-yet-enabled preliminaries for M-profile MVE support
- * target/arm: fix handling of HCR.FB
+ * Consistently use "Cortex-Axx", not "Cortex Axx" in docs, comments
- * target/arm: fix LORID_EL1 access check
+ * docs: Fix installation of man pages with Sphinx 4.x
- * disas/capstone: Fix monitor disassembly of >32 bytes
+ * Mark LDS{MIN,MAX} as signed operations
- * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+ * Fix missing syndrome value for DAIF and PAC check exceptions
- * hw/arm/boot: fix SVE for EL3 direct kernel boot
+ * Implement BFloat16 extensions
- * hw/display/omap_lcdc: Fix potential NULL pointer dereference
+ * Refactoring of hvf accelerator code in preparation for aarch64 support
- * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
+ * Fix some coverity nits in test code
  * target/arm: Get correct MMU index for other-security-state
  * configure: Test that gio libs from pkg-config work
  * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
  * docs: Fix building with Sphinx 3
  * tests/qtest/npcm7xx_rng-test: Disable randomness tests
 ----------------------------------------------------------------
-AlexChen (2):
+Alexander Graf (12):
-      hw/display/omap_lcdc: Fix potential NULL pointer dereference
+      hvf: Move assert_hvf_ok() into common directory
-      hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
+      hvf: Move vcpu thread functions into common directory
       hvf: Move cpu functions into common directory
       hvf: Move hvf internal definitions into common header
       hvf: Make hvf_set_phys_mem() static
       hvf: Remove use of hv_uvaddr_t and hv_gpaddr_t
       hvf: Split out common code on vcpu init and destroy
       hvf: Use cpu_synchronize_state()
       hvf: Make synchronize functions static
       hvf: Remove hvf-accel-ops.h
       hvf: Introduce hvf vcpu struct
       hvf: Simplify post reset/init/loadvm hooks
-Peter Maydell (9):
+Damien Goutte-Gattat (1):
-      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
+      docs: Fix installation of man pages with Sphinx 4.x
       target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
       disas/capstone: Fix monitor disassembly of >32 bytes
       target/arm: Get correct MMU index for other-security-state
       configure: Test that gio libs from pkg-config work
       hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
       scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
       qemu-option-trace.rst.inc: Don't use option:: markup
       tests/qtest/npcm7xx_rng-test: Disable randomness tests
-Philippe Mathieu-Daudé (1):
+Jamie Iles (4):
-      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+      target/arm: fix missing exception class
       target/arm: fold do_raise_exception into raise_exception
       target/arm: use raise_exception_ra for MTE check failure
       target/arm: use raise_exception_ra for stack limit exception
-Richard Henderson (11):
+Peter Maydell (15):
-      target/arm: Introduce neon_full_reg_offset
+      target/arm: Add isar feature check functions for MVE
-      target/arm: Move neon_element_offset to translate.c
+      target/arm: Update feature checks for insns which are "MVE or FP"
-      target/arm: Use neon_element_offset in neon_load/store_reg
+      target/arm: Move fpsp/fpdp isar check into callers of do_vfp_2op_sp/dp
-      target/arm: Use neon_element_offset in vfp_reg_offset
+      target/arm: Add MVE check to VMOV_reg_sp and VMOV_reg_dp
-      target/arm: Add read/write_neon_element32
+      target/arm: Fix return values in fp_sysreg_checks()
-      target/arm: Expand read/write_neon_element32 to all MemOp
+      target/arm: Implement M-profile VPR register
-      target/arm: Rename neon_load_reg32 to vfp_load_reg32
+      target/arm: Make FPSCR.LTPSIZE writable for MVE
-      target/arm: Add read/write_neon_element64
+      target/arm: Allow board models to specify initial NS VTOR
-      target/arm: Rename neon_load_reg64 to vfp_load_reg64
+      arm: Consistently use "Cortex-Axx", not "Cortex Axx"
-      target/arm: Simplify do_long_3d and do_2scalar_long
+      tests/qtest/bios-tables-test: Check for dup2() failure
-      target/arm: Improve do_prewiden_3d
+      tests/qtest/e1000e-test: Check qemu_recv() succeeded
       tests/qtest/hd-geo-test: Fix checks on mkstemp() return value
       tests/qtest/pflash-cfi02-test: Avoid potential integer overflow
       tests/qtest/tpm-tests: Remove unnecessary NULL checks
       tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed
-Rémi Denis-Courmont (3):
+Richard Henderson (13):
-      target/arm: fix handling of HCR.FB
+      target/arm: Mark LDS{MIN,MAX} as signed operations
-      target/arm: fix LORID_EL1 access check
+      target/arm: Add isar_feature_{aa32, aa64, aa64_sve}_bf16
-      hw/arm/boot: fix SVE for EL3 direct kernel boot
+      target/arm: Unify unallocated path in disas_fp_1src
       target/arm: Implement scalar float32 to bfloat16 conversion
       target/arm: Implement vector float32 to bfloat16 conversion
       softfpu: Add float_round_to_odd_inf
       target/arm: Implement bfloat16 dot product (vector)
       target/arm: Implement bfloat16 dot product (indexed)
       target/arm: Implement bfloat16 matrix multiply accumulate
       target/arm: Implement bfloat widening fma (vector)
       target/arm: Implement bfloat widening fma (indexed)
       linux-user/aarch64: Enable hwcap bits for bfloat16
       target/arm: Enable BFloat16 extensions
- docs/qemu-option-trace.rst.inc     |   6 +-
+ docs/conf.py                    |   1 +
- configure                          |  10 +-
+ docs/system/arm/aspeed.rst      |   4 +-
- include/hw/intc/arm_gicv3_common.h |   1 -
+ docs/system/arm/nuvoton.rst     |   6 +-
- disas/capstone.c                   |   2 +-
+ docs/system/arm/sabrelite.rst   |   2 +-
- hw/arm/boot.c                      |   3 +
+ include/fpu/softfloat-types.h   |   4 +-
- hw/arm/smmuv3.c                    |   3 +-
+ include/hw/arm/allwinner-h3.h   |   2 +-
- hw/display/exynos4210_fimd.c       |   4 +-
+ include/hw/arm/armv7m.h         |   2 +
- hw/display/omap_lcdc.c             |  10 +-
+ include/hw/core/cpu.h           |   3 +-
- hw/intc/arm_gicv3_cpuif.c          |   5 +-
+ include/sysemu/hvf_int.h        |  58 +++++
- target/arm/helper.c                |  24 +-
+ target/arm/cpu.h                |  48 +++-
- target/arm/m_helper.c              |   3 +-
+ target/arm/helper-sve.h         |   4 +
- target/arm/translate.c             | 153 +++++++++---
+ target/arm/helper.h             |  15 ++
- target/arm/vec_helper.c            |  12 +-
+ target/i386/hvf/hvf-accel-ops.h |  23 --
- tests/qtest/npcm7xx_rng-test.c     |  14 +-
+ target/i386/hvf/hvf-i386.h      |  33 +--
- scripts/kernel-doc                 |  18 +-
+ target/i386/hvf/vmx.h           |  24 +-
- target/arm/translate-neon.c.inc    | 472 ++++++++++++++++++++-----------------
+ target/i386/hvf/x86hvf.h        |   2 -
- target/arm/translate-vfp.c.inc     | 341 +++++++++++----------------
+ target/arm/neon-dp.decode       |   1 +
-files changed, 588 insertions(+), 493 deletions(-)
+ target/arm/neon-shared.decode   |  11 +
  target/arm/sve.decode           |  19 +-
  target/arm/vfp.decode           |   2 +
  accel/hvf/hvf-accel-ops.c       | 471 ++++++++++++++++++++++++++++++++++++++++
  accel/hvf/hvf-all.c             |  47 ++++
  hw/arm/armv7m.c                 |   7 +
  hw/arm/aspeed.c                 |   6 +-
  hw/arm/mcimx6ul-evk.c           |   2 +-
  hw/arm/mcimx7d-sabre.c          |   2 +-
  hw/arm/npcm7xx_boards.c         |   4 +-
  hw/arm/sabrelite.c              |   2 +-
  hw/misc/npcm7xx_clk.c           |   2 +-
  linux-user/elfload.c            |   2 +
  target/arm/cpu.c                |  13 ++
  target/arm/cpu64.c              |   3 +
  target/arm/cpu_tcg.c            |   1 +
  target/arm/m_helper.c           |   5 +-
  target/arm/machine.c            |  20 ++
  target/arm/mte_helper.c         |  12 +-
  target/arm/op_helper.c          |  32 ++-
  target/arm/sve_helper.c         |   2 +
  target/arm/translate-a64.c      | 155 +++++++++++--
  target/arm/translate-neon.c     |  91 ++++++++
  target/arm/translate-sve.c      | 112 ++++++++++
  target/arm/translate-vfp.c      | 164 ++++++++++----
  target/arm/vec_helper.c         | 140 +++++++++++-
  target/arm/vfp_helper.c         |  21 +-
  target/i386/hvf/hvf-accel-ops.c | 146 -------------
  target/i386/hvf/hvf.c           | 464 +++++----------------------------------
  target/i386/hvf/x86.c           |  28 +--
  target/i386/hvf/x86_descr.c     |  26 +--
  target/i386/hvf/x86_emu.c       |  62 +++---
  target/i386/hvf/x86_mmu.c       |   4 +-
  target/i386/hvf/x86_task.c      |  12 +-
  target/i386/hvf/x86hvf.c        | 222 +++++++++----------
  tests/qtest/bios-tables-test.c  |   8 +-
  tests/qtest/e1000e-test.c       |   3 +-
  tests/qtest/hd-geo-test.c       |   4 +-
  tests/qtest/pflash-cfi02-test.c |   2 +-
  tests/qtest/tpm-tests.c         |  12 +-
  tests/unit/test-vmstate.c       |   5 +-
  fpu/softfloat-parts.c.inc       |   6 +-
  MAINTAINERS                     |   8 +
  accel/hvf/meson.build           |   7 +
  accel/meson.build               |   1 +
  target/i386/hvf/meson.build     |   1 -
 files changed, 1666 insertions(+), 935 deletions(-)
  create mode 100644 include/sysemu/hvf_int.h
  delete mode 100644 target/i386/hvf/hvf-accel-ops.h
  create mode 100644 accel/hvf/hvf-accel-ops.c
  create mode 100644 accel/hvf/hvf-all.c
  delete mode 100644 target/i386/hvf/hvf-accel-ops.c
  create mode 100644 accel/hvf/meson.build

-New patch
+[PULL 01/45] target/arm: Add isar feature check functions for MVE
+Add the isar feature check functions we will need for v8.1M MVE:
+ * a check for MVE present: this corresponds to the pseudocode's
+   CheckDecodeFaults(ExtType_Mve)
+ * a check for the optional floating-point part of MVE: this
+   corresponds to CheckDecodeFaults(ExtType_MveFp)
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210520152840.24453-2-peter.maydell@linaro.org
+---
+ target/arm/cpu.h | 22 ++++++++++++++++++++++
+file changed, 22 insertions(+)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
+     }
+ }
++static inline bool isar_feature_aa32_mve(const ARMISARegisters *id)
++{
++    /*
++     * Return true if MVE is supported (either integer or floating point).
++     * We must check for M-profile as the MVFR1 field means something
++     * else for A-profile.
++     */
++    return isar_feature_aa32_mprofile(id) &&
++        FIELD_EX32(id->mvfr1, MVFR1, MVE) > 0;
++}
++
++static inline bool isar_feature_aa32_mve_fp(const ARMISARegisters *id)
++{
++    /*
++     * Return true if the optional floating-point part of MVE is supported.
++     * We must check for M-profile as the MVFR1 field means something
++     * else for A-profile.
++     */
++    return isar_feature_aa32_mprofile(id) &&
++        FIELD_EX32(id->mvfr1, MVFR1, MVE) >= 2;
++}
++
+ static inline bool isar_feature_aa32_vfp_simd(const ARMISARegisters *id)
+ {
+     /*
+--
+.20.1
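
As a rough illustration of the two checks added by this patch, the
standalone sketch below models the MVE field test outside QEMU. The
field position (bits [11:8] of MVFR1), the mvfr1_mve() helper and the
is_mprofile flag are assumptions made for this example; QEMU itself
reads the field with FIELD_EX32() and tests the profile with
isar_feature_aa32_mprofile().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for this sketch: MVE occupies bits [11:8] of MVFR1. */
#define MVFR1_MVE_SHIFT 8
#define MVFR1_MVE_MASK  0xfu

/* Hypothetical stand-in for FIELD_EX32(id->mvfr1, MVFR1, MVE). */
static inline uint32_t mvfr1_mve(uint32_t mvfr1)
{
    return (mvfr1 >> MVFR1_MVE_SHIFT) & MVFR1_MVE_MASK;
}

/* MVE present at all: integer only, or integer plus floating point. */
static inline bool has_mve(bool is_mprofile, uint32_t mvfr1)
{
    return is_mprofile && mvfr1_mve(mvfr1) > 0;
}

/* Optional floating-point part of MVE present. */
static inline bool has_mve_fp(bool is_mprofile, uint32_t mvfr1)
{
    return is_mprofile && mvfr1_mve(mvfr1) >= 2;
}

/* Walk through the three field values the two checks distinguish. */
static void check_mve_examples(void)
{
    assert(!has_mve(true, 0x000u));     /* field 0: no MVE */
    assert(has_mve(true, 0x100u));      /* field 1: integer MVE */
    assert(!has_mve_fp(true, 0x100u));  /* ...but no MVE floating point */
    assert(has_mve_fp(true, 0x200u));   /* field 2: integer and FP */
    assert(!has_mve(false, 0x200u));    /* not M-profile: field ignored */
}
```

The M-profile test comes first because, as the patch comment notes,
the same MVFR1 bits mean something else on A-profile cores.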

-[PULL 11/26] target/arm: Improve do_prewiden_3d
+[PULL 02/45] target/arm: Update feature checks for insns which are "MVE or FP"
-From: Richard Henderson <richard.henderson@linaro.org>
+Some v8M instructions are present if either the floating point
 extension or MVE is implemented.  Update our implementation of them
 to check for MVE as well as for FP.
-We can use proper widening loads to extend 32-bit inputs,
+This is all the insns which use CheckDecodeFaults(ExtType_MveOrFp) or
-and skip the "widenfn" step.
+CheckDecodeFaults(ExtType_MveOrDpFp) in their pseudocode, which are
 essentially the loads and stores, moves and sysreg accesses, except
 for VMOV_reg_sp and VMOV_reg_dp, which we handle in subsequent
 patches because they need a refactor to provide a place to put the
 new MVE check.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210520152840.24453-3-peter.maydell@linaro.org
 ---
- target/arm/translate.c          |  6 +++
+ target/arm/translate-vfp.c | 48 +++++++++++++++++++++++---------------
- target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
+file changed, 29 insertions(+), 19 deletions(-)
 files changed, 43 insertions(+), 29 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/translate-vfp.c
-+++ b/target/arm/translate.c
++++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
-     long off = neon_element_offset(reg, ele, memop);
+     /* VMOV scalar to general purpose register */
+     TCGv_i32 tmp;
-     switch (memop) {
-+    case MO_SL:
+-    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-+        tcg_gen_ld32s_i64(dest, cpu_env, off);
+-    if (a->size == MO_32
-+        break;
+-        ? !dc_isar_feature(aa32_fpsp_v2, s)
-+    case MO_UL:
+-        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        tcg_gen_ld32u_i64(dest, cpu_env, off);
+-        return false;
-+        break;
++    /*
-     case MO_Q:
++     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
-         tcg_gen_ld_i64(dest, cpu_env, off);
++     * all sizes, whether the CPU has fp or not.
-         break;
++     */
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
++    if (!dc_isar_feature(aa32_mve, s)) {
-index XXXXXXX..XXXXXXX 100644
++        if (a->size == MO_32
---- a/target/arm/translate-neon.c.inc
++            ? !dc_isar_feature(aa32_fpsp_v2, s)
-+++ b/target/arm/translate-neon.c.inc
++            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
++            return false;
- static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
++        }
-                            NeonGenWidenFn *widenfn,
+     }
-                            NeonGenTwo64OpFn *opfn,
--                           bool src1_wide)
+     /* UNDEF accesses to D16-D31 if they don't exist */
-+                           int src1_mop, int src2_mop)
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
      /* VMOV general purpose register to scalar */
      TCGv_i32 tmp;
 -    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == MO_32
 -        ? !dc_isar_feature(aa32_fpsp_v2, s)
 -        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
 -        return false;
 +    /*
 +     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
 +     * all sizes, whether the CPU has fp or not.
 +     */
 +    if (!dc_isar_feature(aa32_mve, s)) {
 +        if (a->size == MO_32
 +            ? !dc_isar_feature(aa32_fpsp_v2, s)
 +            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +            return false;
 +        }
      }
      /* UNDEF accesses to D16-D31 if they don't exist */
@@ -XXX,XX +XXX,XX @@ typedef enum FPSysRegCheckResult {
  static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
  {
-     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
-     TCGv_i64 rn0_64, rn1_64, rm_64;
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
--    TCGv_i32 rm;
+         return FPSysRegCheckFailed;
+     }
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-         return false;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+ {
      TCGv_i32 tmp;
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
--    if (!widenfn || !opfn) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
-+    if (!opfn) {
+ {
-         /* size == 3 case, which is an entirely different insn group */
+     TCGv_i32 tmp;
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
--    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
-+    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
+      * floating point register.  Note that this does not require support
       * for double precision arithmetic.
       */
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
-     rn1_64 = tcg_temp_new_i64();
+     uint32_t offset;
-     rm_64 = tcg_temp_new_i64();
+     TCGv_i32 addr, tmp;
--    if (src1_wide) {
+-    if (!dc_isar_feature(aa32_fp16_arith, s)) {
--        read_neon_element64(rn0_64, a->vn, 0, MO_64);
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
-+    if (src1_mop >= 0) {
+         return false;
 +        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
--    rm = tcg_temp_new_i32();
--    read_neon_element32(rm, a->vm, 0, MO_32);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
-+    if (src2_mop >= 0) {
+     uint32_t offset;
-+        read_neon_element64(rm_64, a->vm, 0, src2_mop);
+     TCGv_i32 addr, tmp;
-+    } else {
-+        TCGv_i32 tmp = tcg_temp_new_i32();
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
-+        read_neon_element32(tmp, a->vm, 0, MO_32);
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
-+        widenfn(rm_64, tmp);
+         return false;
 +        tcg_temp_free_i32(tmp);
 +    }
 -    widenfn(rm_64, rm);
 -    tcg_temp_free_i32(rm);
      opfn(rn0_64, rn0_64, rm_64);
      /*
       * Load second pass inputs before storing the first pass result, to
       * avoid incorrect results if a narrow input overlaps with the result.
       */
 -    if (src1_wide) {
 -        read_neon_element64(rn1_64, a->vn, 1, MO_64);
 +    if (src1_mop >= 0) {
 +        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
--    rm = tcg_temp_new_i32();
--    read_neon_element32(rm, a->vm, 1, MO_32);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
-+    if (src2_mop >= 0) {
+     TCGv_i64 tmp;
-+        read_neon_element64(rm_64, a->vm, 1, src2_mop);
-+    } else {
+     /* Note that this does not require support for double arithmetic.  */
-+        TCGv_i32 tmp = tcg_temp_new_i32();
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
-+        read_neon_element32(tmp, a->vm, 1, MO_32);
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
-+        widenfn(rm_64, tmp);
+         return false;
 +        tcg_temp_free_i32(tmp);
 +    }
      write_neon_element64(rn0_64, a->vd, 0, MO_64);
 -    widenfn(rm_64, rm);
 -    tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
      write_neon_element64(rn1_64, a->vd, 1, MO_64);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      return true;
  }
 -#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
 +#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
      static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
      {                                                                   \
          static NeonGenWidenFn * const widenfn[] = {                     \
              gen_helper_neon_widen_##S##8,                               \
              gen_helper_neon_widen_##S##16,                              \
 -            tcg_gen_##EXT##_i32_i64,                                    \
 -            NULL,                                                       \
 +            NULL, NULL,                                                 \
          };                                                              \
          static NeonGenTwo64OpFn * const addfn[] = {                     \
              gen_helper_neon_##OP##l_u16,                                \
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
              tcg_gen_##OP##_i64,                                         \
              NULL,                                                       \
          };                                                              \
 -        return do_prewiden_3d(s, a, widenfn[a->size],                   \
 -                              addfn[a->size], SRC1WIDE);                \
 +        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
 +        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
 +                              SRC1WIDE ? MO_Q : narrow_mop,             \
 +                              narrow_mop);                              \
      }
--DO_PREWIDEN(VADDL_S, s, ext, add, false)
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
--DO_PREWIDEN(VADDL_U, u, extu, add, false)
+     TCGv_i32 addr, tmp;
--DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
+     int i, n;
--DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
--DO_PREWIDEN(VADDW_S, s, ext, add, true)
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
--DO_PREWIDEN(VADDW_U, u, extu, add, true)
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
--DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+         return false;
--DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+     }
-+DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
-+DO_PREWIDEN(VADDL_U, u, add, false, 0)
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
-+DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
+     int i, n;
-+DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
-+DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
+     /* Note that this does not require support for double arithmetic.  */
-+DO_PREWIDEN(VADDW_U, u, add, true, 0)
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
-+DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
-+DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
+         return false;
+     }
- static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
                           NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
 --
 .20.1
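
The "MVE or FP" gating described in the 02/45 commit message can be
sketched in isolation as below. This is a simplified model, not QEMU
code: the Features struct, the SIZE_* constants and both function
names are hypothetical, standing in for dc_isar_feature(),
arm_dc_feature() and the MO_* sizes used by the real translator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature set; QEMU derives these from the ID registers. */
typedef struct {
    bool fpsp_v2;   /* single-precision floating point */
    bool neon;
    bool mve;
} Features;

/* Stand-ins for MO_8 / MO_16 / MO_32. */
enum { SIZE_8, SIZE_16, SIZE_32 };

/* Plain "MVE or FP" check, as used by the loads, stores and moves. */
static bool fp_or_mve(const Features *f)
{
    return f->fpsp_v2 || f->mve;
}

/*
 * VMOV between a scalar and a general-purpose register: MVE permits
 * every element size; without MVE, 32-bit accesses need FP and the
 * narrower sizes need Neon.
 */
static bool vmov_scalar_allowed(const Features *f, int size)
{
    if (f->mve) {
        return true;
    }
    return size == SIZE_32 ? f->fpsp_v2 : f->neon;
}
```

With this shape, a core that has MVE but no FP passes both checks,
while a core with neither keeps returning false so the decoder can
treat the insn as undefined.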

-New patch
+[PULL 03/45] target/arm: Move fpsp/fpdp isar check into callers of do_vfp_2op_sp/dp
+The do_vfp_2op_sp() and do_vfp_2op_dp() functions currently check
+whether floating point is supported via the aa32_fpdp_v2 and
+aa32_fpsp_v2 isar checks.  For v8.1M MVE support, the VMOV_reg trans
+functions (but not any of the others) need to update this to also
+allow the insn if MVE is implemented.  Move the check out of the do_
+function and into its callsites (which are all implemented via the
+DO_VFP_2OP macro), so we have a place to change the check for the
+VMOV insns.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210520152840.24453-4-peter.maydell@linaro.org
+---
+ target/arm/translate-vfp.c | 37 +++++++++++++++++++------------------
+file changed, 19 insertions(+), 18 deletions(-)
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-vfp.c
++++ b/target/arm/translate-vfp.c
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+     int veclen = s->vec_len;
+     TCGv_i32 f0, fd;
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+-        return false;
+-    }
++    /* Note that the caller must check the aa32_fpsp_v2 feature. */
+     if (!dc_isar_feature(aa32_fpshvec, s) &&
+         (veclen != 0 || s->vec_stride != 0)) {
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+      */
+     TCGv_i32 f0;
++    /* Note that the caller must check the aa32_fp16_arith feature */
++
+     if (!dc_isar_feature(aa32_fp16_arith, s)) {
+         return false;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
+     int veclen = s->vec_len;
+     TCGv_i64 f0, fd;
+-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+-        return false;
+-    }
++    /* Note that the caller must check the aa32_fpdp_v2 feature. */
+     /* UNDEF accesses to D16-D31 if they don't exist */
+     if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
+     return true;
+ }
+-#define DO_VFP_2OP(INSN, PREC, FN)                              \
++#define DO_VFP_2OP(INSN, PREC, FN, CHECK)                       \
+     static bool trans_##INSN##_##PREC(DisasContext *s,          \
+                                       arg_##INSN##_##PREC *a)   \
+     {                                                           \
++        if (!dc_isar_feature(CHECK, s)) {                       \
++            return false;                                       \
++        }                                                       \
+         return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
+     }
+-DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
+-DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
++DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32, aa32_fpsp_v2)
++DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64, aa32_fpdp_v2)
+-DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh)
+-DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
+-DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
++DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith)
++DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2)
++DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd, aa32_fpdp_v2)
+-DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh)
+-DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
+-DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
++DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh, aa32_fp16_arith)
++DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs, aa32_fpsp_v2)
++DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd, aa32_fpdp_v2)
+ static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
+ {
+@@ -XXX,XX +XXX,XX @@ static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
+     gen_helper_vfp_sqrtd(vd, vm, cpu_env);
+ }
+-DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
+-DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
+-DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
++DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp, aa32_fp16_arith)
++DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp, aa32_fpsp_v2)
++DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp, aa32_fpdp_v2)
+ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
+ {
+--
+.20.1
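
The refactor in 03/45 moves the feature test out of the shared worker
and into each macro-generated caller. A minimal standalone sketch of
that pattern follows; the Ctx struct, the feature_*() functions and
the DEFINE_2OP macro are hypothetical names chosen for the example,
not QEMU identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical translation context carrying two feature flags. */
typedef struct {
    bool fpsp_v2;
    bool fpdp_v2;
} Ctx;

static bool feature_fpsp_v2(const Ctx *c) { return c->fpsp_v2; }
static bool feature_fpdp_v2(const Ctx *c) { return c->fpdp_v2; }

/* Shared worker: it no longer checks features; callers must do so. */
static bool do_2op(const Ctx *c)
{
    (void)c;
    return true;
}

/* Each generated function names the feature it requires via CHECK. */
#define DEFINE_2OP(NAME, CHECK)                 \
    static bool trans_##NAME(const Ctx *c)      \
    {                                           \
        if (!feature_##CHECK(c)) {              \
            return false;                       \
        }                                       \
        return do_2op(c);                       \
    }

DEFINE_2OP(vabs_sp, fpsp_v2)
DEFINE_2OP(vabs_dp, fpdp_v2)
```

Because the check now lives in the caller, one instruction can later
swap in a different condition without touching the worker or the
other callers.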

-New patch
+[PULL 04/45] target/arm: Add MVE check to VMOV_reg_sp and VMOV_reg_dp
+Split out the handling of VMOV_reg_sp and VMOV_reg_dp so that we can
+permit the insns if either FP or MVE is present.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210520152840.24453-5-peter.maydell@linaro.org
+---
+ target/arm/translate-vfp.c | 15 +++++++++++++--
+file changed, 13 insertions(+), 2 deletions(-)
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-vfp.c
++++ b/target/arm/translate-vfp.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
+         return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
+     }
+-DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32, aa32_fpsp_v2)
+-DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64, aa32_fpdp_v2)
++#define DO_VFP_VMOV(INSN, PREC, FN)                             \
++    static bool trans_##INSN##_##PREC(DisasContext *s,          \
++                                      arg_##INSN##_##PREC *a)   \
++    {                                                           \
++        if (!dc_isar_feature(aa32_fp##PREC##_v2, s) &&          \
++            !dc_isar_feature(aa32_mve, s)) {                    \
++            return false;                                       \
++        }                                                       \
++        return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
++    }
++
++DO_VFP_VMOV(VMOV_reg, sp, tcg_gen_mov_i32)
++DO_VFP_VMOV(VMOV_reg, dp, tcg_gen_mov_i64)
+ DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith)
+ DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2)
+--
+.20.1
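
The DO_VFP_VMOV macro in 04/45 builds the FP feature name by pasting
the precision suffix into the identifier and then adds MVE as an
alternative. The sketch below shows the same token-pasting idea in a
self-contained form; VmovCtx, the feature_*() functions and
DEFINE_VMOV are hypothetical and only mirror the shape of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical context with the three features the check consults. */
typedef struct {
    bool fpsp_v2;
    bool fpdp_v2;
    bool mve;
} VmovCtx;

static bool feature_fpsp_v2(const VmovCtx *c) { return c->fpsp_v2; }
static bool feature_fpdp_v2(const VmovCtx *c) { return c->fpdp_v2; }
static bool feature_mve(const VmovCtx *c)     { return c->mve; }

/*
 * feature_fp##PREC##_v2 expands to feature_fpsp_v2 or feature_fpdp_v2,
 * so one macro covers both precisions. The move is rejected only when
 * neither the matching FP feature nor MVE is present.
 */
#define DEFINE_VMOV(PREC)                                   \
    static bool trans_vmov_reg_##PREC(const VmovCtx *c)     \
    {                                                       \
        if (!feature_fp##PREC##_v2(c) && !feature_mve(c)) { \
            return false;                                   \
        }                                                   \
        return true;                                        \
    }

DEFINE_VMOV(sp)
DEFINE_VMOV(dp)
```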

-[PULL 25/26] qemu-option-trace.rst.inc: Don't use option:: markup
+[PULL 05/45] target/arm: Fix return values in fp_sysreg_checks()
-Sphinx 3.2 is pickier than earlier versions about the option:: markup,
+The fp_sysreg_checks() function is supposed to be returning an
-and complains about our usage in qemu-option-trace.rst:
+FPSysRegCheckResult, which is an enum with three possible values.
+However, three places in the function "return false" (a hangover from
-../../docs/qemu-option-trace.rst.inc:4:Malformed option description
+a previous iteration of the design where the function just returned a
-  '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
+bool).  Make these return FPSysRegCheckFailed instead (for no
-  "/opt args" or "+opt args"
+functional change, since both false and FPSysRegCheckFailed are
+zero).
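
The "no functional change" claim rests on the failure enumerator
being the first one in the enum, and therefore zero. A small
standalone sketch of that point, using a hypothetical CheckResult
enum rather than QEMU's FPSysRegCheckResult:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-valued result, failure listed first. */
typedef enum CheckResult {
    CheckFailed,    /* first enumerator, so its value is 0 */
    CheckDone,
    CheckContinue,
} CheckResult;

/* Old style: compiles in C because false converts to the value 0. */
static CheckResult check_old(bool ok)
{
    if (!ok) {
        return false;
    }
    return CheckContinue;
}

/* New style: same value, but the intent is visible at the call site. */
static CheckResult check_new(bool ok)
{
    if (!ok) {
        return CheckFailed;
    }
    return CheckContinue;
}
```

Both functions return the same values for every input; the second
form simply stops relying on the numeric coincidence.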
 In this file, we're really trying to document the different parts of
 the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
 have already introduced with an option:: markup.  So it's not right
 to use option:: here anyway.  Switch to a different markup
 (definition lists) which gives about the same formatted output.
 (Unlike option::, this markup doesn't produce index entries; but
 at the moment we don't do anything much with indexes anyway, and
 in any case I think it doesn't make much sense to have individual
 index entries for the sub-parts of the --trace option.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
+Message-id: 20210520152840.24453-6-peter.maydell@linaro.org
 Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
 ---
- docs/qemu-option-trace.rst.inc | 6 +++---
+ target/arm/translate-vfp.c | 6 +++---
 file changed, 3 insertions(+), 3 deletions(-)
-diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
---- a/docs/qemu-option-trace.rst.inc
+--- a/target/arm/translate-vfp.c
-+++ b/docs/qemu-option-trace.rst.inc
++++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
+         break;
- Specify tracing options.
+     case ARM_VFP_FPSCR_NZCVQC:
+         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
--.. option:: [enable=]PATTERN
+-            return false;
-+``[enable=]PATTERN``
++            return FPSysRegCheckFailed;
+         }
-   Immediately enable events matching *PATTERN*
+         break;
-   (either event name or a globbing pattern).  This option is only
+     case ARM_VFP_FPCXT_S:
-@@ -XXX,XX +XXX,XX @@ Specify tracing options.
+     case ARM_VFP_FPCXT_NS:
+         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-   Use :option:`-trace help` to print a list of names of trace points.
+-            return false;
++            return FPSysRegCheckFailed;
--.. option:: events=FILE
+         }
-+``events=FILE``
+         if (!s->v8m_secure) {
+-            return false;
-   Immediately enable events listed in *FILE*.
++            return FPSysRegCheckFailed;
-   The file must contain one event name (as listed in the ``trace-events-all``
+         }
-@@ -XXX,XX +XXX,XX @@ Specify tracing options.
+         break;
-   available if QEMU has been compiled with the ``simple``, ``log`` or
+     default:
    ``ftrace`` tracing backend.
 -.. option:: file=FILE
 +``file=FILE``
    Log output traces to *FILE*.
    This option is only available if QEMU has been compiled with
 --
 .20.1

-New patch
+[PULL 06/45] target/arm: Implement M-profile VPR register
+If MVE is implemented for an M-profile CPU then it has a VPR
+register, which tracks predication information.
+Implement the read and write handling of this register, and
+the migration of its state.
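As a rough illustration of the P0 handling, the following plain C models what the generated code does to the register value: a write to P0 replaces only bits [15:0] of VPR, and a read of P0 returns only those bits. This is a sketch under assumptions, not the TCG code itself. The field position comes from the FIELD(V7M_VPR, P0, 0, 16) definition in the patch; the helper names deposit_bits(), extract_bits(), vpr_write_p0() and vpr_read_p0() are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Field layout from the patch: P0 occupies bits [15:0] of VPR. */
#define VPR_P0_SHIFT  0
#define VPR_P0_LENGTH 16

/* Replace `length` bits of `value` starting at bit `start` with the low
 * bits of `field`, leaving all other bits unchanged. */
uint32_t deposit_bits(uint32_t value, int start, int length, uint32_t field)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    uint32_t mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Return `length` bits of `value` starting at bit `start`. */
uint32_t extract_bits(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Writing P0 must not disturb the MASK01/MASK23 fields above it. */
uint32_t vpr_write_p0(uint32_t vpr, uint32_t new_p0)
{
    return deposit_bits(vpr, VPR_P0_SHIFT, VPR_P0_LENGTH, new_p0);
}

/* Reading P0 exposes only the low 16 bits of VPR. */
uint32_t vpr_read_p0(uint32_t vpr)
{
    return extract_bits(vpr, VPR_P0_SHIFT, VPR_P0_LENGTH);
}
```

For example, writing 0xffff5678 to P0 when VPR holds 0x00ab1234 leaves the upper half untouched and yields 0x00ab5678.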
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210520152840.24453-7-peter.maydell@linaro.org
+---
+ target/arm/cpu.h           |  6 ++++++
+ target/arm/machine.c       | 19 +++++++++++++++++++
+ target/arm/translate-vfp.c | 38 ++++++++++++++++++++++++++++++++++++++
+files changed, 63 insertions(+)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
+         uint32_t cpacr[M_REG_NUM_BANKS];
+         uint32_t nsacr;
+         int ltpsize;
++        uint32_t vpr;
+     } v7m;
+     /* Information associated with an exception about to be taken:
+@@ -XXX,XX +XXX,XX @@ FIELD(V7M_FPCCR, ASPEN, 31, 1)
+      R_V7M_FPCCR_UFRDY_MASK |                   \
+      R_V7M_FPCCR_ASPEN_MASK)
++/* v7M VPR bits */
++FIELD(V7M_VPR, P0, 0, 16)
++FIELD(V7M_VPR, MASK01, 16, 4)
++FIELD(V7M_VPR, MASK23, 20, 4)
++
+ /*
+  * System register ID fields.
+  */
+diff --git a/target/arm/machine.c b/target/arm/machine.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/machine.c
++++ b/target/arm/machine.c
+@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_fp = {
+     }
+ };
++static bool mve_needed(void *opaque)
++{
++    ARMCPU *cpu = opaque;
++
++    return cpu_isar_feature(aa32_mve, cpu);
++}
++
++static const VMStateDescription vmstate_m_mve = {
++    .name = "cpu/m/mve",
++    .version_id = 1,
++    .minimum_version_id = 1,
++    .needed = mve_needed,
++    .fields = (VMStateField[]) {
++        VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
++        VMSTATE_END_OF_LIST()
++    },
++};
++
+ static const VMStateDescription vmstate_m = {
+     .name = "cpu/m",
+     .version_id = 4,
+@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m = {
+         &vmstate_m_other_sp,
+         &vmstate_m_v8m,
+         &vmstate_m_fp,
++        &vmstate_m_mve,
+         NULL
+     }
+ };
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-vfp.c
++++ b/target/arm/translate-vfp.c
+@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
+             return FPSysRegCheckFailed;
+         }
+         break;
++    case ARM_VFP_VPR:
++    case ARM_VFP_P0:
++        if (!dc_isar_feature(aa32_mve, s)) {
++            return FPSysRegCheckFailed;
++        }
++        break;
+     default:
+         return FPSysRegCheckFailed;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
+         tcg_temp_free_i32(sfpa);
+         break;
+     }
++    case ARM_VFP_VPR:
++        /* Behaves as NOP if not privileged */
++        if (IS_USER(s)) {
++            break;
++        }
++        tmp = loadfn(s, opaque);
++        store_cpu_field(tmp, v7m.vpr);
++        break;
++    case ARM_VFP_P0:
++    {
++        TCGv_i32 vpr;
++        tmp = loadfn(s, opaque);
++        vpr = load_cpu_field(v7m.vpr);
++        tcg_gen_deposit_i32(vpr, vpr, tmp,
++                            R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
++        store_cpu_field(vpr, v7m.vpr);
++        tcg_temp_free_i32(tmp);
++        break;
++    }
+     default:
+         g_assert_not_reached();
+     }
+@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
+         tcg_temp_free_i32(fpscr);
+         break;
+     }
++    case ARM_VFP_VPR:
++        /* Behaves as NOP if not privileged */
++        if (IS_USER(s)) {
++            break;
++        }
++        tmp = load_cpu_field(v7m.vpr);
++        storefn(s, opaque, tmp);
++        break;
++    case ARM_VFP_P0:
++        tmp = load_cpu_field(v7m.vpr);
++        tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
++        storefn(s, opaque, tmp);
++        break;
+     default:
+         g_assert_not_reached();
+     }
+--
+.20.1

-New patch
+[PULL 07/45] target/arm: Make FPSCR.LTPSIZE writable for MVE
+The M-profile FPSCR has an LTPSIZE field, but if MVE is not
+implemented it is read-only and always reads as 4; this is how QEMU
+currently handles it.
+Make the field writable when MVE is implemented.
+We can safely add the field to the MVE migration struct because
+currently no CPUs enable MVE and so the migration struct is never
+used.
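The effect on the field can be sketched in plain C as follows. This is a simplified model, assuming only the two cases described above (MVE absent, MVE present); it leaves out the A-profile vec_len/vec_stride branch that the real function also handles. The three FPCR_LTPSIZE_* constants match the patch, while ltpsize_after_write() is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCR_LTPSIZE_SHIFT  16
#define FPCR_LTPSIZE_LENGTH 3
#define FPCR_LTPSIZE_MASK   (7u << FPCR_LTPSIZE_SHIFT)

/* Return the LTPSIZE value that holds after the guest writes `fpscr_val`.
 * Without MVE the field is read-only, so the old value (4 at reset)
 * survives. With MVE the 3-bit field is taken from the written value. */
uint32_t ltpsize_after_write(uint32_t old_ltpsize, uint32_t fpscr_val,
                             bool have_mve)
{
    assert(old_ltpsize < (1u << FPCR_LTPSIZE_LENGTH));
    if (!have_mve) {
        return old_ltpsize;
    }
    return (fpscr_val & FPCR_LTPSIZE_MASK) >> FPCR_LTPSIZE_SHIFT;
}
```

With MVE, writing 0x00020000 sets LTPSIZE to 2; without MVE the same write leaves it at 4.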
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210520152840.24453-8-peter.maydell@linaro.org
+---
+ target/arm/cpu.h        | 3 ++-
+ target/arm/machine.c    | 1 +
+ target/arm/vfp_helper.c | 9 ++++++---
+files changed, 9 insertions(+), 4 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
+         uint32_t fpdscr[M_REG_NUM_BANKS];
+         uint32_t cpacr[M_REG_NUM_BANKS];
+         uint32_t nsacr;
+-        int ltpsize;
++        uint32_t ltpsize;
+         uint32_t vpr;
+     } v7m;
+@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
+ #define FPCR_LTPSIZE_SHIFT 16   /* LTPSIZE, M-profile only */
+ #define FPCR_LTPSIZE_MASK (7 << FPCR_LTPSIZE_SHIFT)
++#define FPCR_LTPSIZE_LENGTH 3
+ #define FPCR_NZCV_MASK (FPCR_N | FPCR_Z | FPCR_C | FPCR_V)
+ #define FPCR_NZCVQC_MASK (FPCR_NZCV_MASK | FPCR_QC)
+diff --git a/target/arm/machine.c b/target/arm/machine.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/machine.c
++++ b/target/arm/machine.c
+@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_mve = {
+     .needed = mve_needed,
+     .fields = (VMStateField[]) {
+         VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
++        VMSTATE_UINT32(env.v7m.ltpsize, ARMCPU),
+         VMSTATE_END_OF_LIST()
+     },
+ };
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpscr(CPUARMState *env)
+ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
+ {
++    ARMCPU *cpu = env_archcpu(env);
++
+     /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
+-    if (!cpu_isar_feature(any_fp16, env_archcpu(env))) {
++    if (!cpu_isar_feature(any_fp16, cpu)) {
+         val &= ~FPCR_FZ16;
+     }
+@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
+          * because in v7A no-short-vector-support cores still had to
+          * allow Stride/Len to be written with the only effect that
+          * some insns are required to UNDEF if the guest sets them.
+-         *
+-         * TODO: if M-profile MVE implemented, set LTPSIZE.
+          */
+         env->vfp.vec_len = extract32(val, 16, 3);
+         env->vfp.vec_stride = extract32(val, 20, 2);
++    } else if (cpu_isar_feature(aa32_mve, cpu)) {
++        env->v7m.ltpsize = extract32(val, FPCR_LTPSIZE_SHIFT,
++                                     FPCR_LTPSIZE_LENGTH);
+     }
+     if (arm_feature(env, ARM_FEATURE_NEON)) {
+--
+.20.1

-[PULL 24/26] scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
+[PULL 08/45] target/arm: Allow board models to specify initial NS VTOR
-The kerneldoc script currently emits Sphinx markup for a macro with
+Currently we allow board models to specify the initial value of the
-arguments that uses the c:function directive. This is correct for
+Secure VTOR register, using an init-svtor property on the TYPE_ARMV7M
-Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
+object which is plumbed through to the CPU.  Allow board models to
-documentation of macros with arguments and c:function is not picky
+also specify the initial value of the Non-secure VTOR via a similar
-about the syntax of what it is passed. However, in Sphinx 3 the
+init-nsvtor property.
 c:macro directive was enhanced to support macros with arguments,
 and c:function was made more picky about what syntax it accepted.
 When kerneldoc is told that it needs to produce output for Sphinx
 or later, make it emit c:function only for functions and c:macro
 for macros with arguments. We assume that anything with a return
 type is a function and anything without is a macro.
 This fixes the Sphinx error:
 /home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
 If declarator-id with parameters (e.g., 'void f(int arg)'):
   Invalid C declaration: Expected identifier in nested name. [error at 25]
     DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
     -------------------------^
 If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
   Error in declarator or parameters
   Invalid C declaration: Expecting "(" in parameters. [error at 39]
     DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
     ---------------------------------------^
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
+Message-id: 20210520152840.24453-10-peter.maydell@linaro.org
 Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
 ---
- scripts/kernel-doc | 18 +++++++++++++++++-
+ include/hw/arm/armv7m.h |  2 ++
-file changed, 17 insertions(+), 1 deletion(-)
+ target/arm/cpu.h        |  2 ++
  hw/arm/armv7m.c         |  7 +++++++
  target/arm/cpu.c        | 10 ++++++++++
 files changed, 21 insertions(+)
-diff --git a/scripts/kernel-doc b/scripts/kernel-doc
+diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
-index XXXXXXX..XXXXXXX 100755
+index XXXXXXX..XXXXXXX 100644
---- a/scripts/kernel-doc
+--- a/include/hw/arm/armv7m.h
-+++ b/scripts/kernel-doc
++++ b/include/hw/arm/armv7m.h
-@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
+@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(ARMv7MState, ARMV7M)
-     output_highlight_rst($args{'purpose'});
+  *   devices will be automatically layered on top of this view.)
-     $start = "\n\n**Syntax**\n\n  ``";
+  * + Property "idau": IDAU interface (forwarded to CPU object)
-     } else {
+  * + Property "init-svtor": secure VTOR reset value (forwarded to CPU object)
--    print ".. c:function:: ";
++ * + Property "init-nsvtor": non-secure VTOR reset value (forwarded to CPU object)
-+        if ((split(/\./, $sphinx_version))[0] >= 3) {
+  * + Property "vfp": enable VFP (forwarded to CPU object)
-+            # Sphinx 3 and later distinguish macros and functions and
+  * + Property "dsp": enable DSP (forwarded to CPU object)
-+            # complain if you use c:function with something that's not
+  * + Property "enable-bitband": expose bitbanded IO
-+            # syntactically valid as a function declaration.
+@@ -XXX,XX +XXX,XX @@ struct ARMv7MState {
-+            # We assume that anything with a return type is a function
+     MemoryRegion *board_memory;
-+            # and anything without is a macro.
+     Object *idau;
-+            if ($args{'functiontype'} ne "") {
+     uint32_t init_svtor;
-+                print ".. c:function:: ";
++    uint32_t init_nsvtor;
-+            } else {
+     bool enable_bitband;
-+                print ".. c:macro:: ";
+     bool start_powered_off;
-+            }
+     bool vfp;
-+        } else {
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-+            # Older Sphinx don't support documenting macros that take
+index XXXXXXX..XXXXXXX 100644
-+            # arguments with c:macro, and don't complain about the use
+--- a/target/arm/cpu.h
-+            # of c:function for this.
++++ b/target/arm/cpu.h
-+            print ".. c:function:: ";
+@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
      /* For v8M, initial value of the Secure VTOR */
      uint32_t init_svtor;
 +    /* For v8M, initial value of the Non-secure VTOR */
 +    uint32_t init_nsvtor;
      /* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or
       * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type.
 diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/armv7m.c
 +++ b/hw/arm/armv7m.c
@@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp)
              return;
          }
      }
 +    if (object_property_find(OBJECT(s->cpu), "init-nsvtor")) {
 +        if (!object_property_set_uint(OBJECT(s->cpu), "init-nsvtor",
 +                                      s->init_nsvtor, errp)) {
 +            return;
 +        }
++    }
+     if (object_property_find(OBJECT(s->cpu), "start-powered-off")) {
+         if (!object_property_set_bool(OBJECT(s->cpu), "start-powered-off",
+                                       s->start_powered_off, errp)) {
+@@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = {
+                      MemoryRegion *),
+     DEFINE_PROP_LINK("idau", ARMv7MState, idau, TYPE_IDAU_INTERFACE, Object *),
+     DEFINE_PROP_UINT32("init-svtor", ARMv7MState, init_svtor, 0),
++    DEFINE_PROP_UINT32("init-nsvtor", ARMv7MState, init_nsvtor, 0),
+     DEFINE_PROP_BOOL("enable-bitband", ARMv7MState, enable_bitband, false),
+     DEFINE_PROP_BOOL("start-powered-off", ARMv7MState, start_powered_off,
+                      false),
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
+         env->regs[14] = 0xffffffff;
+         env->v7m.vecbase[M_REG_S] = cpu->init_svtor & 0xffffff80;
++        env->v7m.vecbase[M_REG_NS] = cpu->init_nsvtor & 0xffffff80;
+         /* Load the initial SP and PC from offset 0 and 4 in the vector table */
+         vecbase = env->v7m.vecbase[env->v7m.secure];
+@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
+                                        &cpu->init_svtor,
+                                        OBJ_PROP_FLAG_READWRITE);
      }
-     if ($args{'functiontype'} ne "") {
++    if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
-     $start .= $args{'functiontype'} . " " . $args{'function'} . " (";
++        /*
 +         * Initial value of the NS VTOR (for cores without the Security
 +         * extension, this is the only VTOR)
 +         */
 +        object_property_add_uint32_ptr(obj, "init-nsvtor",
 +                                       &cpu->init_nsvtor,
 +                                       OBJ_PROP_FLAG_READWRITE);
 +    }
      qdev_property_add_static(DEVICE(obj), &arm_cpu_cfgend_property);
 --
 .20.1

-[PULL 22/26] configure: Test that gio libs from pkg-config work
+[PULL 09/45] arm: Consistently use "Cortex-Axx", not "Cortex Axx"
-On some hosts (eg Ubuntu Bionic) pkg-config returns a set of
+The official punctuation for Arm CPU names uses a hyphen, like
-libraries for gio-2.0 which don't actually work when compiling
+"Cortex-A9". We mostly follow this, but in a few places usage
-statically. (Specifically, the returned library string includes
+without the hyphen has crept in. Fix those so we consistently
--lmount, but not -lblkid which -lmount depends upon, so linking
+use the same way of writing the CPU name.
-fails due to missing symbols.)
+This commit was created with:
-Check that the libraries work, and don't enable gio if they don't,
+  git grep -z -l 'Cortex ' | xargs -0 sed -i 's/Cortex /Cortex-/'
 in the same way we do for gnutls.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Message-id: 20210527095152.10968-1-peter.maydell@linaro.org
 ---
- configure | 10 +++++++++-
+ docs/system/arm/aspeed.rst    | 4 ++--
-file changed, 9 insertions(+), 1 deletion(-)
+ docs/system/arm/nuvoton.rst   | 6 +++---
+ docs/system/arm/sabrelite.rst | 2 +-
-diff --git a/configure b/configure
+ include/hw/arm/allwinner-h3.h | 2 +-
-index XXXXXXX..XXXXXXX 100755
+ hw/arm/aspeed.c               | 6 +++---
---- a/configure
+ hw/arm/mcimx6ul-evk.c         | 2 +-
-+++ b/configure
+ hw/arm/mcimx7d-sabre.c        | 2 +-
-@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
+ hw/arm/npcm7xx_boards.c       | 4 ++--
- fi
+ hw/arm/sabrelite.c            | 2 +-
+ hw/misc/npcm7xx_clk.c         | 2 +-
- if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
+files changed, 16 insertions(+), 16 deletions(-)
--    gio=yes
-     gio_cflags=$($pkg_config --cflags gio-2.0)
+diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
-     gio_libs=$($pkg_config --libs gio-2.0)
+index XXXXXXX..XXXXXXX 100644
-     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
+--- a/docs/system/arm/aspeed.rst
-     if [ ! -x "$gdbus_codegen" ]; then
++++ b/docs/system/arm/aspeed.rst
-         gdbus_codegen=
+@@ -XXX,XX +XXX,XX @@ The QEMU Aspeed machines model BMCs of various OpenPOWER systems and
-     fi
+ Aspeed evaluation boards. They are based on different releases of the
-+    # Check that the libraries actually work -- Ubuntu 18.04 ships
+ Aspeed SoC : the AST2400 integrating an ARM926EJ-S CPU (400MHz), the
-+    # with pkg-config --static --libs data for gio-2.0 that is missing
+ AST2500 with an ARM1176JZS CPU (800MHz) and more recently the AST2600
-+    # -lblkid and will give a link error.
+-with dual cores ARM Cortex A7 CPUs (1.2GHz).
-+    write_c_skeleton
++with dual cores ARM Cortex-A7 CPUs (1.2GHz).
-+    if compile_prog "" "gio_libs" ; then
-+        gio=yes
+ The SoC comes with RAM, Gigabit ethernet, USB, SD/MMC, USB, SPI, I2C,
-+    else
+ etc.
-+        gio=no
+@@ -XXX,XX +XXX,XX @@ AST2500 SoC based machines :
-+    fi
- else
+ AST2600 SoC based machines :
-     gio=no
- fi
+-- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex A7)
 +- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex-A7)
  - ``tacoma-bmc``           OpenPOWER Witherspoon POWER9 AST2600 BMC
  Supported devices
 diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/system/arm/nuvoton.rst
 +++ b/docs/system/arm/nuvoton.rst
@@ -XXX,XX +XXX,XX @@ Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
  The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
  designed to be used as Baseboard Management Controllers (BMCs) in various
 -servers. They all feature one or two ARM Cortex A9 CPU cores, as well as an
 +servers. They all feature one or two ARM Cortex-A9 CPU cores, as well as an
  assortment of peripherals targeted for either Enterprise or Data Center /
  Hyperscale applications. The former is a superset of the latter, so NPCM750 has
  all the peripherals of NPCM730 and more.
  .. _Nuvoton iBMC: https://www.nuvoton.com/products/cloud-computing/ibmc/
 -The NPCM750 SoC has two Cortex A9 cores and is targeted for the Enterprise
 +The NPCM750 SoC has two Cortex-A9 cores and is targeted for the Enterprise
  segment. The following machines are based on this chip :
  - ``npcm750-evb``       Nuvoton NPCM750 Evaluation board
 -The NPCM730 SoC has two Cortex A9 cores and is targeted for Data Center and
 +The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
  Hyperscale applications. The following machines are based on this chip :
  - ``quanta-gsj``        Quanta GSJ server BMC
 diff --git a/docs/system/arm/sabrelite.rst b/docs/system/arm/sabrelite.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/system/arm/sabrelite.rst
 +++ b/docs/system/arm/sabrelite.rst
@@ -XXX,XX +XXX,XX @@ Supported devices
  The SABRE Lite machine supports the following devices:
 - * Up to 4 Cortex A9 cores
 + * Up to 4 Cortex-A9 cores
   * Generic Interrupt Controller
   * 1 Clock Controller Module
   * 1 System Reset Controller
 diff --git a/include/hw/arm/allwinner-h3.h b/include/hw/arm/allwinner-h3.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/arm/allwinner-h3.h
 +++ b/include/hw/arm/allwinner-h3.h
@@ -XXX,XX +XXX,XX @@
   */
  /*
 - * The Allwinner H3 is a System on Chip containing four ARM Cortex A7
 + * The Allwinner H3 is a System on Chip containing four ARM Cortex-A7
   * processor cores. Features and specifications include DDR2/DDR3 memory,
   * SD/MMC storage cards, 10/100/1000Mbit Ethernet, USB 2.0, HDMI and
   * various I/O modules.
 diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/aspeed.c
 +++ b/hw/arm/aspeed.c
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_ast2600_evb_class_init(ObjectClass *oc, void *data)
      MachineClass *mc = MACHINE_CLASS(oc);
      AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 -    mc->desc       = "Aspeed AST2600 EVB (Cortex A7)";
 +    mc->desc       = "Aspeed AST2600 EVB (Cortex-A7)";
      amc->soc_name  = "ast2600-a1";
      amc->hw_strap1 = AST2600_EVB_HW_STRAP1;
      amc->hw_strap2 = AST2600_EVB_HW_STRAP2;
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_tacoma_class_init(ObjectClass *oc, void *data)
      MachineClass *mc = MACHINE_CLASS(oc);
      AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 -    mc->desc       = "OpenPOWER Tacoma BMC (Cortex A7)";
 +    mc->desc       = "OpenPOWER Tacoma BMC (Cortex-A7)";
      amc->soc_name  = "ast2600-a1";
      amc->hw_strap1 = TACOMA_BMC_HW_STRAP1;
      amc->hw_strap2 = TACOMA_BMC_HW_STRAP2;
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_rainier_class_init(ObjectClass *oc, void *data)
      MachineClass *mc = MACHINE_CLASS(oc);
      AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 -    mc->desc       = "IBM Rainier BMC (Cortex A7)";
 +    mc->desc       = "IBM Rainier BMC (Cortex-A7)";
      amc->soc_name  = "ast2600-a1";
      amc->hw_strap1 = RAINIER_BMC_HW_STRAP1;
      amc->hw_strap2 = RAINIER_BMC_HW_STRAP2;
 diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/mcimx6ul-evk.c
 +++ b/hw/arm/mcimx6ul-evk.c
@@ -XXX,XX +XXX,XX @@ static void mcimx6ul_evk_init(MachineState *machine)
  static void mcimx6ul_evk_machine_init(MachineClass *mc)
  {
 -    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex A7)";
 +    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex-A7)";
      mc->init = mcimx6ul_evk_init;
      mc->max_cpus = FSL_IMX6UL_NUM_CPUS;
      mc->default_ram_id = "mcimx6ul-evk.ram";
 diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/mcimx7d-sabre.c
 +++ b/hw/arm/mcimx7d-sabre.c
@@ -XXX,XX +XXX,XX @@ static void mcimx7d_sabre_init(MachineState *machine)
  static void mcimx7d_sabre_machine_init(MachineClass *mc)
  {
 -    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex A7)";
 +    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex-A7)";
      mc->init = mcimx7d_sabre_init;
      mc->max_cpus = FSL_IMX7_NUM_CPUS;
      mc->default_ram_id = "mcimx7d-sabre.ram";
 diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/npcm7xx_boards.c
 +++ b/hw/arm/npcm7xx_boards.c
@@ -XXX,XX +XXX,XX @@ static void npcm750_evb_machine_class_init(ObjectClass *oc, void *data)
      npcm7xx_set_soc_type(nmc, TYPE_NPCM750);
 -    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex A9)";
 +    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex-A9)";
      mc->init = npcm750_evb_init;
      mc->default_ram_size = 512 * MiB;
  };
@@ -XXX,XX +XXX,XX @@ static void gsj_machine_class_init(ObjectClass *oc, void *data)
      npcm7xx_set_soc_type(nmc, TYPE_NPCM730);
 -    mc->desc = "Quanta GSJ (Cortex A9)";
 +    mc->desc = "Quanta GSJ (Cortex-A9)";
      mc->init = quanta_gsj_init;
      mc->default_ram_size = 512 * MiB;
  };
 diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/sabrelite.c
 +++ b/hw/arm/sabrelite.c
@@ -XXX,XX +XXX,XX @@ static void sabrelite_init(MachineState *machine)
  static void sabrelite_machine_init(MachineClass *mc)
  {
 -    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex A9)";
 +    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex-A9)";
      mc->init = sabrelite_init;
      mc->max_cpus = FSL_IMX6_NUM_CPUS;
      mc->ignore_memory_transaction_failures = true;
 diff --git a/hw/misc/npcm7xx_clk.c b/hw/misc/npcm7xx_clk.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/npcm7xx_clk.c
 +++ b/hw/misc/npcm7xx_clk.c
@@ -XXX,XX +XXX,XX @@
  #define NPCM7XX_CLOCK_REF_HZ            (25000000)
  /* Register Field Definitions */
 -#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex A9 Cores */
 +#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex-A9 Cores */
  #define PLLCON_LOKI     BIT(31)
  #define PLLCON_LOKS     BIT(30)
 --
 .20.1

-New patch
+[PULL 10/45] docs: Fix installation of man pages with Sphinx 4.x
+From: Damien Goutte-Gattat <dgouttegattat@incenp.org>
+The 4.x branch of Sphinx introduces a breaking change, as generated man
+pages are now written to subdirectories corresponding to the manual
+section they belong to. This results in `make install` erroring out when
+attempting to install the man pages, because they are not where it
+expects to find them.
+This patch restores the behavior of Sphinx 3.x regarding man pages.
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/256
+Signed-off-by: Damien Goutte-Gattat <dgouttegattat@incenp.org>
+Message-id: 20210503161422.15028-1-dgouttegattat@incenp.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ docs/conf.py | 1 +
+file changed, 1 insertion(+)
+diff --git a/docs/conf.py b/docs/conf.py
+index XXXXXXX..XXXXXXX 100644
+--- a/docs/conf.py
++++ b/docs/conf.py
+@@ -XXX,XX +XXX,XX @@
+      ['Stefan Hajnoczi <stefanha@redhat.com>',
+       'Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>'], 1),
+ ]
++man_make_section_directory = False
+ # -- Options for Texinfo output -------------------------------------------
+--
+.20.1

-New patch
+[PULL 11/45] target/arm: Mark LDS{MIN,MAX} as signed operations
+From: Richard Henderson <richard.henderson@linaro.org>
+The operands to tcg_gen_atomic_fetch_s{min,max}_i64 must
+be signed, so that the inputs are properly extended.
+Zero extend the result afterward, as needed.
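Why the extension matters can be seen with ordinary C arithmetic. This sketch only illustrates the comparison problem for a 32-bit operation done in 64-bit registers; it does not model the TCG atomic op or its memory access, and all function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Signed minimum on 64-bit values, standing in for the 64-bit operation. */
int64_t smin64(int64_t a, int64_t b)
{
    return a < b ? a : b;
}

/* Wrong: 32-bit operands are zero-extended, so 0xffffffff (-1 as a
 * 32-bit value) is compared as 4294967295. */
uint64_t smin32_zero_extended(uint32_t mem, uint32_t reg)
{
    return (uint64_t)smin64((int64_t)(uint64_t)mem, (int64_t)(uint64_t)reg);
}

/* Right: sign-extend both operands, then zero-extend the 32-bit result
 * so the upper half of the 64-bit register is clear. The casts to
 * int32_t assume the usual two's-complement wraparound. */
uint64_t smin32_sign_extended(uint32_t mem, uint32_t reg)
{
    int64_t r = smin64((int64_t)(int32_t)mem, (int64_t)(int32_t)reg);
    return (uint64_t)(uint32_t)r;
}

/* Memory holds -1 and the register holds 1: the signed minimum is -1. */
void smin32_selfcheck(void)
{
    assert(smin32_zero_extended(0xffffffffu, 1) == 1);            /* wrong */
    assert(smin32_sign_extended(0xffffffffu, 1) == 0xffffffffu);  /* -1 */
}
```

The final cast in smin32_sign_extended() plays the same role as the zero extension of the result mentioned above.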
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/364
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Message-id: 20210602020720.47679-1-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/translate-a64.c | 13 ++++++++++---
+file changed, 10 insertions(+), 3 deletions(-)
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a64.c
++++ b/target/arm/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
+     int o3_opc = extract32(insn, 12, 4);
+     bool r = extract32(insn, 22, 1);
+     bool a = extract32(insn, 23, 1);
+-    TCGv_i64 tcg_rs, clean_addr;
++    TCGv_i64 tcg_rs, tcg_rt, clean_addr;
+     AtomicThreeOpFn *fn = NULL;
++    MemOp mop = s->be_data | size | MO_ALIGN;
+     if (is_vector || !dc_isar_feature(aa64_atomics, s)) {
+         unallocated_encoding(s);
+@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
+         break;
+     case 004: /* LDSMAX */
+         fn = tcg_gen_atomic_fetch_smax_i64;
++        mop |= MO_SIGN;
+         break;
+     case 005: /* LDSMIN */
+         fn = tcg_gen_atomic_fetch_smin_i64;
++        mop |= MO_SIGN;
+         break;
+     case 006: /* LDUMAX */
+         fn = tcg_gen_atomic_fetch_umax_i64;
+@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
+     }
+     tcg_rs = read_cpu_reg(s, rs, true);
++    tcg_rt = cpu_reg(s, rt);
+     if (o3_opc == 1) { /* LDCLR */
+         tcg_gen_not_i64(tcg_rs, tcg_rs);
+@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
+     /* The tcg atomic primitives are all full barriers.  Therefore we
+      * can ignore the Acquire and Release bits of this instruction.
+      */
+-    fn(cpu_reg(s, rt), clean_addr, tcg_rs, get_mem_index(s),
+-       s->be_data | size | MO_ALIGN);
++    fn(tcg_rt, clean_addr, tcg_rs, get_mem_index(s), mop);
++
++    if ((mop & MO_SIGN) && size != MO_64) {
++        tcg_gen_ext32u_i64(tcg_rt, tcg_rt);
++    }
+ }
+ /*
+--
+.20.1

-[PULL 14/26] target/arm: fix handling of HCR.FB
+[PULL 12/45] target/arm: fix missing exception class
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Jamie Iles <jamie@nuviainc.com>
-HCR should be applied when NS is set, not when it is cleared.
+The DAIF and PAC checks used raise_exception_ra to raise an exception
 and unwind CPU state but raise_exception_ra is currently designed for
 handling data aborts as the syndrome is partially precomputed and
 encoded in the TB and then merged in merge_syn_data_abort when handling
 the data abort.  Using raise_exception_ra for DAIF and PAC checks
 results in an empty syndrome being retrieved from data[2] in
 restore_state_to_opc and setting ESR to 0.  This manifested as:
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+  kvm [571]: Unknown exception class: esr: 0x000000 –
   Unknown/Uncategorized
 when launching a KVM guest when the host qemu used a CPU supporting
 EL2+pointer authentication and enabling pointer authentication in the
 guest.
 Rework raise_exception_ra such that the state is restored before raising
 the exception so that the exception is not clobbered by
 restore_state_to_opc.
 Fixes: 0d43e1a2d29a ("target/arm: Add PAuth helpers")
 Cc: Richard Henderson <richard.henderson@linaro.org>
 Cc: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Jamie Iles <jamie@nuviainc.com>
 [PMM: added comment]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 5 ++---
+ target/arm/op_helper.c | 11 +++++++++--
-file changed, 2 insertions(+), 3 deletions(-)
+file changed, 9 insertions(+), 2 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/op_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/op_helper.c
-@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@ void raise_exception(CPUARMState *env, uint32_t excp,
+ void raise_exception_ra(CPUARMState *env, uint32_t excp, uint32_t syndrome,
- /*
+                         uint32_t target_el, uintptr_t ra)
   * Non-IS variants of TLB operations are upgraded to
 - * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
 + * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
   * force broadcast of these operations.
   */
  static bool tlb_force_broadcast(CPUARMState *env)
  {
--    return (env->cp15.hcr_el2 & HCR_FB) &&
+-    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
--        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
+-    cpu_loop_exit_restore(cs, ra);
-+    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
++    CPUState *cs = env_cpu(env);
 +
 +    /*
 +     * restore_state_to_opc() will set env->exception.syndrome, so
 +     * we must restore CPU state here before setting the syndrome
 +     * the caller passed us, and cannot use cpu_loop_exit_restore().
 +     */
 +    cpu_restore_state(cs, ra, true);
 +    raise_exception(env, excp, syndrome, target_el);
  }
- static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
+ uint64_t HELPER(neon_tbl)(CPUARMState *env, uint32_t desc,
 --
 .20.1
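
Note on the ordering fix above: the sketch below models why the state restore has to happen before the syndrome is recorded. It is a simplified illustration; `struct mock_env` and the `mock_*` functions are hypothetical stand-ins, and unlike the real helpers they return instead of leaving the CPU loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the exception-related fields of the CPU state. */
struct mock_env {
    int exception_index;
    uint32_t syndrome;
    uint32_t target_el;
};

/*
 * Models the unwind step: restoring state overwrites the syndrome with
 * whatever was recorded for the instruction (zero in this model).
 */
static void mock_restore_state(struct mock_env *env)
{
    env->syndrome = 0;
}

/* Models recording the exception the caller asked for. */
static void mock_record_exception(struct mock_env *env, int excp,
                                  uint32_t syndrome, uint32_t target_el)
{
    env->exception_index = excp;
    env->syndrome = syndrome;
    env->target_el = target_el;
}

/* Old ordering: record the exception, then unwind. The syndrome is lost. */
static void mock_raise_then_restore(struct mock_env *env, int excp,
                                    uint32_t syndrome, uint32_t target_el)
{
    mock_record_exception(env, excp, syndrome, target_el);
    mock_restore_state(env);
}

/* New ordering: unwind first, then record. The syndrome survives. */
static void mock_restore_then_raise(struct mock_env *env, int excp,
                                    uint32_t syndrome, uint32_t target_el)
{
    mock_restore_state(env);
    mock_record_exception(env, excp, syndrome, target_el);
}

/* Self-check comparing both orderings with an arbitrary non-zero syndrome. */
static void check_ordering_model(void)
{
    struct mock_env old_env = {0}, new_env = {0};

    mock_raise_then_restore(&old_env, 1, 0x12345678u, 2);
    mock_restore_then_raise(&new_env, 1, 0x12345678u, 2);

    assert(old_env.syndrome == 0);           /* clobbered by the unwind */
    assert(new_env.syndrome == 0x12345678u); /* preserved */
}
```

In this model the old ordering ends with a zero syndrome, which matches the "esr: 0x000000" symptom quoted in the commit message.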

-[PULL 15/26] target/arm: fix LORID_EL1 access check
+[PULL 13/45] target/arm: fold do_raise_exception into raise_exception
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Jamie Iles <jamie@nuviainc.com>
-Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
+Now that there are no other users of do_raise_exception, fold it into
-future HCR_EL2.TLOR when S-EL2 is enabled.
+raise_exception.
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+Cc: Richard Henderson <richard.henderson@linaro.org>
 Cc: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Jamie Iles <jamie@nuviainc.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 19 +++++--------------
+ target/arm/op_helper.c | 12 ++----------
-file changed, 5 insertions(+), 14 deletions(-)
+file changed, 2 insertions(+), 10 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/op_helper.c
-+++ b/target/arm/helper.c
++++ b/target/arm/op_helper.c
-@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
+@@ -XXX,XX +XXX,XX @@
- #endif
+ #define SIGNBIT (uint32_t)0x80000000
+ #define SIGNBIT64 ((uint64_t)1 << 63)
- /* Shared logic between LORID and the rest of the LOR* registers.
-- * Secure state has already been delt with.
+-static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
-+ * Secure state exclusion has already been dealt with.
+-                                    uint32_t syndrome, uint32_t target_el)
-  */
++void raise_exception(CPUARMState *env, uint32_t excp,
--static CPAccessResult access_lor_ns(CPUARMState *env)
++                     uint32_t syndrome, uint32_t target_el)
 +static CPAccessResult access_lor_ns(CPUARMState *env,
 +                                    const ARMCPRegInfo *ri, bool isread)
  {
-     int el = arm_current_el(env);
+     CPUState *cs = env_cpu(env);
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
-     return CP_ACCESS_OK;
+     cs->exception_index = excp;
- }
+     env->exception.syndrome = syndrome;
+     env->exception.target_el = target_el;
--static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
+-
--                                   bool isread)
+-    return cs;
 -{
 -    if (arm_is_secure_below_el3(env)) {
 -        /* Access ok in secure mode.  */
 -        return CP_ACCESS_OK;
 -    }
 -    return access_lor_ns(env);
 -}
 -
- static CPAccessResult access_lor_other(CPUARMState *env,
+-void raise_exception(CPUARMState *env, uint32_t excp,
-                                        const ARMCPRegInfo *ri, bool isread)
+-                     uint32_t syndrome, uint32_t target_el)
- {
+-{
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
+-    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
-         /* Access denied in secure mode.  */
+     cpu_loop_exit(cs);
          return CP_ACCESS_TRAP;
      }
 -    return access_lor_ns(env);
 +    return access_lor_ns(env, ri, isread);
  }
- /*
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
-       .type = ARM_CP_CONST, .resetvalue = 0 },
-     { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
-       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
--      .access = PL1_R, .accessfn = access_lorid,
-+      .access = PL1_R, .accessfn = access_lor_ns,
-       .type = ARM_CP_CONST, .resetvalue = 0 },
-     REGINFO_SENTINEL
- };
 --
 .20.1

-New patch
+[PULL 14/45] target/arm: use raise_exception_ra for MTE check failure
+From: Jamie Iles <jamie@nuviainc.com>
+Now that raise_exception_ra restores the state before raising the
+exception we can use raise_exception_ra to perform the state restore +
+exception raising without clobbering the syndrome.
+Cc: Richard Henderson <richard.henderson@linaro.org>
+Cc: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Jamie Iles <jamie@nuviainc.com>
+[PMM: Keep the one line of the comment that is still relevant]
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/mte_helper.c | 12 +++---------
+file changed, 3 insertions(+), 9 deletions(-)
+diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mte_helper.c
++++ b/target/arm/mte_helper.c
+@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
+     switch (tcf) {
+     case 1:
+-        /*
+-         * Tag check fail causes a synchronous exception.
+-         *
+-         * In restore_state_to_opc, we set the exception syndrome
+-         * for the load or store operation.  Unwind first so we
+-         * may overwrite that with the syndrome for the tag check.
+-         */
+-        cpu_restore_state(env_cpu(env), ra, true);
++        /* Tag check fail causes a synchronous exception. */
+         env->exception.vaddress = dirty_ptr;
+         is_write = FIELD_EX32(desc, MTEDESC, WRITE);
+         syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
+                                     is_write, 0x11);
+-        raise_exception(env, EXCP_DATA_ABORT, syn, exception_target_el(env));
++        raise_exception_ra(env, EXCP_DATA_ABORT, syn,
++                           exception_target_el(env), ra);
+         /* noreturn, but fall through to the assert anyway */
+     case 0:
+--
+.20.1

-[PULL 21/26] target/arm: Get correct MMU index for other-security-state
+[PULL 15/45] target/arm: use raise_exception_ra for stack limit exception
-In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
+From: Jamie Iles <jamie@nuviainc.com>
 armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
 This is incorrect when the security state being queried is not the
 current one, because arm_current_el() uses the current security state
 to determine which of the banked CONTROL.nPRIV bits to look at.
 The effect was that if (for instance) Secure state was in privileged
 mode but Non-Secure was not then we would return the wrong MMU index.
-The only places where we are using this function in a way that could
+The sequence cpu_restore_state() + raise_exception() is equivalent to
-trigger this bug are for the stack loads during a v8M function-return
+raise_exception_ra(), so use that instead.  (In this case we never
-and for the instruction fetch of a v8M SG insn.
+cared about the syndrome value, because M-profile doesn't use the
 syndrome; the old code was just written unnecessarily awkwardly.)
-Fix the bug by expanding out the M-profile version of the
+Cc: Richard Henderson <richard.henderson@linaro.org>
-arm_current_el() logic inline so it can use the passed in secstate
+Cc: Peter Maydell <peter.maydell@linaro.org>
-rather than env->v7m.secure.
+Signed-off-by: Jamie Iles <jamie@nuviainc.com>
+[PMM: Retain edited version of comment; rewrite commit message]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
 ---
- target/arm/m_helper.c | 3 ++-
+ target/arm/m_helper.c  | 5 +----
-file changed, 2 insertions(+), 1 deletion(-)
+ target/arm/op_helper.c | 9 +++------
 files changed, 4 insertions(+), 10 deletions(-)
 diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/m_helper.c
 +++ b/target/arm/m_helper.c
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
- /* Return the MMU index for a v7M CPU in the specified security state */
+             limit = is_psp ? env->v7m.psplim[false] : env->v7m.msplim[false];
- ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
- {
+             if (val < limit) {
--    bool priv = arm_current_el(env) != 0;
+-                CPUState *cs = env_cpu(env);
-+    bool priv = arm_v7m_is_handler_mode(env) ||
+-
-+        !(env->v7m.control[secstate] & 1);
+-                cpu_restore_state(cs, GETPC(), true);
+-                raise_exception(env, EXCP_STKOF, 0, 1);
-     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
++                raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
              }
              if (is_psp) {
 diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/op_helper.c
 +++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v8m_stackcheck)(CPUARMState *env, uint32_t newvalue)
       * raising an exception if the limit is breached.
       */
      if (newvalue < v7m_sp_limit(env)) {
 -        CPUState *cs = env_cpu(env);
 -
          /*
           * Stack limit exceptions are a rare case, so rather than syncing
 -         * PC/condbits before the call, we use cpu_restore_state() to
 -         * get them right before raising the exception.
 +         * PC/condbits before the call, we use raise_exception_ra() so
 +         * that cpu_restore_state() will sort them out.
           */
 -        cpu_restore_state(cs, GETPC(), true);
 -        raise_exception(env, EXCP_STKOF, 0, 1);
 +        raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
      }
  }
 --
 .20.1

-[PULL 02/26] target/arm: Move neon_element_offset to translate.c
+[PULL 16/45] target/arm: Add isar_feature_{aa32, aa64, aa64_sve}_bf16
 From: Richard Henderson <richard.henderson@linaro.org>
-This will shortly have users outside of translate-neon.c.inc.
+Note that the SVE BFLOAT16 support does not require SVE2;
 it is an independent extension.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
+Message-id: 20210525225817.400336-2-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          | 20 ++++++++++++++++++++
+ target/arm/cpu.h | 15 +++++++++++++++
- target/arm/translate-neon.c.inc | 19 -------------------
+file changed, 15 insertions(+)
 files changed, 20 insertions(+), 19 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/cpu.h
-+++ b/target/arm/translate.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_predinv(const ARMISARegisters *id)
-     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+     return FIELD_EX32(id->id_isar6, ID_ISAR6, SPECRES) != 0;
  }
-+/*
++static inline bool isar_feature_aa32_bf16(const ARMISARegisters *id)
 + * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
 + * where 0 is the least significant end of the register.
 + */
 +static long neon_element_offset(int reg, int element, MemOp size)
 +{
-+    int element_size = 1 << size;
++    return FIELD_EX32(id->id_isar6, ID_ISAR6, BF16) != 0;
 +    int ofs = element * element_size;
 +#ifdef HOST_WORDS_BIGENDIAN
 +    /*
 +     * Calculate the offset assuming fully little-endian,
 +     * then XOR to account for the order of the 8-byte units.
 +     */
 +    if (element_size < 8) {
 +        ofs ^= 8 - element_size;
 +    }
 +#endif
 +    return neon_full_reg_offset(reg) + ofs;
 +}
 +
- static inline long vfp_reg_offset(bool dp, unsigned reg)
+ static inline bool isar_feature_aa32_i8mm(const ARMISARegisters *id)
  {
-     if (dp) {
+     return FIELD_EX32(id->id_isar6, ID_ISAR6, I8MM) != 0;
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
-index XXXXXXX..XXXXXXX 100644
+     return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >= 2;
---- a/target/arm/translate-neon.c.inc
+ }
-+++ b/target/arm/translate-neon.c.inc
-@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
++static inline bool isar_feature_aa64_bf16(const ARMISARegisters *id)
- #include "decode-neon-ls.c.inc"
++{
- #include "decode-neon-shared.c.inc"
++    return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, BF16) != 0;
++}
--/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
++
-- * where 0 is the least significant end of the register.
+ static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
 - */
 -static inline long
 -neon_element_offset(int reg, int element, MemOp size)
 -{
 -    int element_size = 1 << size;
 -    int ofs = element * element_size;
 -#ifdef HOST_WORDS_BIGENDIAN
 -    /* Calculate the offset assuming fully little-endian,
 -     * then XOR to account for the order of the 8-byte units.
 -     */
 -    if (element_size < 8) {
 -        ofs ^= 8 - element_size;
 -    }
 -#endif
 -    return neon_full_reg_offset(reg) + ofs;
 -}
 -
  static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
  {
-     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
+     /* We always set the AdvSIMD and FP fields identically.  */
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve2_bitperm(const ARMISARegisters *id)
      return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BITPERM) != 0;
  }
 +static inline bool isar_feature_aa64_sve_bf16(const ARMISARegisters *id)
 +{
 +    return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BFLOAT16) != 0;
 +}
 +
  static inline bool isar_feature_aa64_sve2_sha3(const ARMISARegisters *id)
  {
      return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, SHA3) != 0;
 --
 .20.1
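
Note on the isar_feature_*_bf16 tests above: each one reads a bitfield of an ID register and reports the feature as present when the field is non-zero. A minimal standalone sketch of that pattern follows. The helper names are hypothetical, and the field position (ID_AA64ISAR1 BF16 at bits [47:44]) is an assumption of this sketch, not something the patch text states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract 'length' bits of 'value' starting at bit 'start'. */
static uint64_t extract_field64(uint64_t value, unsigned start, unsigned length)
{
    assert(start < 64 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Assumed layout for this sketch: BF16 field at bits [47:44]. */
#define MOCK_ID_AA64ISAR1_BF16_SHIFT  44
#define MOCK_ID_AA64ISAR1_BF16_LENGTH 4

/* Feature is reported as present when the ID field is non-zero. */
static bool mock_feature_aa64_bf16(uint64_t id_aa64isar1)
{
    return extract_field64(id_aa64isar1,
                           MOCK_ID_AA64ISAR1_BF16_SHIFT,
                           MOCK_ID_AA64ISAR1_BF16_LENGTH) != 0;
}

/* Self-check: only bits inside the assumed field affect the result. */
static void check_feature_model(void)
{
    assert(!mock_feature_aa64_bf16(0));
    assert(mock_feature_aa64_bf16(1ULL << 44));
    assert(!mock_feature_aa64_bf16(0xFULL << 40)); /* neighbouring field below */
    assert(!mock_feature_aa64_bf16(1ULL << 48));   /* neighbouring field above */
}
```

Testing for "!= 0" rather than "== 1" means a later revision of the field that encodes additional capabilities still reports the base feature as present.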

-New patch
+[PULL 17/45] target/arm: Unify unallocated path in disas_fp_1src
+From: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-3-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/translate-a64.c | 15 ++++++---------
+file changed, 6 insertions(+), 9 deletions(-)
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a64.c
++++ b/target/arm/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
+     int rd = extract32(insn, 0, 5);
+     if (mos) {
+-        unallocated_encoding(s);
+-        return;
++        goto do_unallocated;
+     }
+     switch (opcode) {
+@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
+         /* FCVT between half, single and double precision */
+         int dtype = extract32(opcode, 0, 2);
+         if (type == 2 || dtype == type) {
+-            unallocated_encoding(s);
+-            return;
++            goto do_unallocated;
+         }
+         if (!fp_access_check(s)) {
+             return;
+@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
+     case 0x10 ... 0x13: /* FRINT{32,64}{X,Z} */
+         if (type > 1 || !dc_isar_feature(aa64_frint, s)) {
+-            unallocated_encoding(s);
+-            return;
++            goto do_unallocated;
+         }
+         /* fall through */
+     case 0x0 ... 0x3:
+@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
+             break;
+         case 3:
+             if (!dc_isar_feature(aa64_fp16, s)) {
+-                unallocated_encoding(s);
+-                return;
++                goto do_unallocated;
+             }
+             if (!fp_access_check(s)) {
+@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
+             handle_fp_1src_half(s, opcode, rd, rn);
+             break;
+         default:
+-            unallocated_encoding(s);
++            goto do_unallocated;
+         }
+         break;
+     default:
++    do_unallocated:
+         unallocated_encoding(s);
+         break;
+     }
+--
+.20.1

-[PULL 08/26] target/arm: Add read/write_neon_element64
+[PULL 18/45] target/arm: Implement scalar float32 to bfloat16 conversion
 From: Richard Henderson <richard.henderson@linaro.org>
-Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.
+This is the 64-bit BFCVT and the 32-bit VCVT{B,T}.BF16.F32.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
+Message-id: 20210525225817.400336-4-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          | 26 +++++++++
+ target/arm/helper.h        |  1 +
- target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
+ target/arm/vfp.decode      |  2 ++
-files changed, 73 insertions(+), 47 deletions(-)
+ target/arm/translate-a64.c | 19 +++++++++++++++++++
  target/arm/translate-vfp.c | 24 ++++++++++++++++++++++++
  target/arm/vfp_helper.c    |  5 +++++
 files changed, 51 insertions(+)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/helper.h
-+++ b/target/arm/translate.c
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
-     }
  DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
  DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
 +DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
  DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
  DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
  # VCVTB and VCVTT to f16: Vd format is always vd_sp;
  # Vm format depends on size bit
 +VCVT_b16_f32 ---- 1110 1.11 0011 .... 1001 t:1 1.0 .... \
 +             vd=%vd_sp vm=%vm_sp
  VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
               vd=%vd_sp vm=%vm_sp
  VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn)
      case 0x3: /* FSQRT */
          gen_helper_vfp_sqrts(tcg_res, tcg_op, cpu_env);
          goto done;
 +    case 0x6: /* BFCVT */
 +        gen_fpst = gen_helper_bfcvt;
 +        break;
      case 0x8: /* FRINTN */
      case 0x9: /* FRINTP */
      case 0xa: /* FRINTM */
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
          }
          break;
 +    case 0x6:
 +        switch (type) {
 +        case 1: /* BFCVT */
 +            if (!dc_isar_feature(aa64_bf16, s)) {
 +                goto do_unallocated;
 +            }
 +            if (!fp_access_check(s)) {
 +                return;
 +            }
 +            handle_fp_1src_single(s, opcode, rd, rn);
 +            break;
 +        default:
 +            goto do_unallocated;
 +        }
 +        break;
 +
      default:
      do_unallocated:
          unallocated_encoding(s);
 diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c
 +++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
      return true;
  }
-+static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
++static bool trans_VCVT_b16_f32(DisasContext *s, arg_VCVT_b16_f32 *a)
 +{
-+    long off = neon_element_offset(reg, ele, memop);
++    TCGv_ptr fpst;
 +    TCGv_i32 tmp;
 +
-+    switch (memop) {
++    if (!dc_isar_feature(aa32_bf16, s)) {
-+    case MO_Q:
++        return false;
 +        tcg_gen_ld_i64(dest, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
++
++    if (!vfp_access_check(s)) {
++        return true;
++    }
++
++    fpst = fpstatus_ptr(FPST_FPCR);
++    tmp = tcg_temp_new_i32();
++
++    vfp_load_reg32(tmp, a->vm);
++    gen_helper_bfcvt(tmp, tmp, fpst);
++    tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
++    tcg_temp_free_ptr(fpst);
++    tcg_temp_free_i32(tmp);
++    return true;
 +}
 +
- static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
+ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
  {
-     long off = neon_element_offset(reg, ele, memop);
+     TCGv_ptr fpst;
-@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
-     }
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
      return float64_to_float32(x, &env->vfp.fp_status);
  }
-+static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
++uint32_t HELPER(bfcvt)(float32 x, void *status)
 +{
-+    long off = neon_element_offset(reg, ele, memop);
++    return float32_to_bfloat16(x, status);
 +
 +    switch (memop) {
 +    case MO_64:
 +        tcg_gen_st_i64(src, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
- static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
+ /*
- {
+  * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
-     TCGv_ptr ret = tcg_temp_new_ptr();
+  * must always round-to-nearest; the AArch64 ones honour the FPSCR
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
      for (pass = 0; pass < a->q + 1; pass++) {
          TCGv_i64 tmp = tcg_temp_new_i64();
 -        neon_load_reg64(tmp, a->vm + pass);
 +        read_neon_element64(tmp, a->vm, pass, MO_64);
          fn(tmp, cpu_env, tmp, constimm);
 -        neon_store_reg64(tmp, a->vd + pass);
 +        write_neon_element64(tmp, a->vd, pass, MO_64);
          tcg_temp_free_i64(tmp);
      }
      tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
      rd = tcg_temp_new_i32();
      /* Load both inputs first to avoid potential overwrite if rm == rd */
 -    neon_load_reg64(rm1, a->vm);
 -    neon_load_reg64(rm2, a->vm + 1);
 +    read_neon_element64(rm1, a->vm, 0, MO_64);
 +    read_neon_element64(rm2, a->vm, 1, MO_64);
      shiftfn(rm1, rm1, constimm);
      narrowfn(rd, cpu_env, rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
      }
 -    neon_store_reg64(tmp, a->vd);
 +    write_neon_element64(tmp, a->vd, 0, MO_64);
      widenfn(tmp, rm1);
      tcg_temp_free_i32(rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
      }
 -    neon_store_reg64(tmp, a->vd + 1);
 +    write_neon_element64(tmp, a->vd, 1, MO_64);
      tcg_temp_free_i64(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm_64 = tcg_temp_new_i64();
      if (src1_wide) {
 -        neon_load_reg64(rn0_64, a->vn);
 +        read_neon_element64(rn0_64, a->vn, 0, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
       * avoid incorrect results if a narrow input overlaps with the result.
       */
      if (src1_wide) {
 -        neon_load_reg64(rn1_64, a->vn + 1);
 +        read_neon_element64(rn1_64, a->vn, 1, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm = tcg_temp_new_i32();
      read_neon_element32(rm, a->vm, 1, MO_32);
 -    neon_store_reg64(rn0_64, a->vd);
 +    write_neon_element64(rn0_64, a->vd, 0, MO_64);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
 -    neon_store_reg64(rn1_64, a->vd + 1);
 +    write_neon_element64(rn1_64, a->vd, 1, MO_64);
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rn_64, a->vn);
 -    neon_load_reg64(rm_64, a->vm);
 +    read_neon_element64(rn_64, a->vn, 0, MO_64);
 +    read_neon_element64(rm_64, a->vm, 0, MO_64);
      opfn(rn_64, rn_64, rm_64);
      narrowfn(rd0, rn_64);
 -    neon_load_reg64(rn_64, a->vn + 1);
 -    neon_load_reg64(rm_64, a->vm + 1);
 +    read_neon_element64(rn_64, a->vn, 1, MO_64);
 +    read_neon_element64(rm_64, a->vm, 1, MO_64);
      opfn(rn_64, rn_64, rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      /* Don't store results until after all loads: they might overlap */
      if (accfn) {
          tmp = tcg_temp_new_i64();
 -        neon_load_reg64(tmp, a->vd);
 +        read_neon_element64(tmp, a->vd, 0, MO_64);
          accfn(tmp, tmp, rd0);
 -        neon_store_reg64(tmp, a->vd);
 -        neon_load_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 0, MO_64);
 +        read_neon_element64(tmp, a->vd, 1, MO_64);
          accfn(tmp, tmp, rd1);
 -        neon_store_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 1, MO_64);
          tcg_temp_free_i64(tmp);
      } else {
 -        neon_store_reg64(rd0, a->vd);
 -        neon_store_reg64(rd1, a->vd + 1);
 +        write_neon_element64(rd0, a->vd, 0, MO_64);
 +        write_neon_element64(rd1, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      if (accfn) {
          TCGv_i64 t64 = tcg_temp_new_i64();
 -        neon_load_reg64(t64, a->vd);
 +        read_neon_element64(t64, a->vd, 0, MO_64);
          accfn(t64, t64, rn0_64);
 -        neon_store_reg64(t64, a->vd);
 -        neon_load_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 0, MO_64);
 +        read_neon_element64(t64, a->vd, 1, MO_64);
          accfn(t64, t64, rn1_64);
 -        neon_store_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 1, MO_64);
          tcg_temp_free_i64(t64);
      } else {
 -        neon_store_reg64(rn0_64, a->vd);
 -        neon_store_reg64(rn1_64, a->vd + 1);
 +        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 +        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          right = tcg_temp_new_i64();
          dest = tcg_temp_new_i64();
 -        neon_load_reg64(right, a->vn);
 -        neon_load_reg64(left, a->vm);
 +        read_neon_element64(right, a->vn, 0, MO_64);
 +        read_neon_element64(left, a->vm, 0, MO_64);
          tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
 -        neon_store_reg64(dest, a->vd);
 +        write_neon_element64(dest, a->vd, 0, MO_64);
          tcg_temp_free_i64(left);
          tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          destright = tcg_temp_new_i64();
          if (a->imm < 8) {
 -            neon_load_reg64(right, a->vn);
 -            neon_load_reg64(middle, a->vn + 1);
 +            read_neon_element64(right, a->vn, 0, MO_64);
 +            read_neon_element64(middle, a->vn, 1, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
 -            neon_load_reg64(left, a->vm);
 +            read_neon_element64(left, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
          } else {
 -            neon_load_reg64(right, a->vn + 1);
 -            neon_load_reg64(middle, a->vm);
 +            read_neon_element64(right, a->vn, 1, MO_64);
 +            read_neon_element64(middle, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
 -            neon_load_reg64(left, a->vm + 1);
 +            read_neon_element64(left, a->vm, 1, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
          }
 -        neon_store_reg64(destright, a->vd);
 -        neon_store_reg64(destleft, a->vd + 1);
 +        write_neon_element64(destright, a->vd, 0, MO_64);
 +        write_neon_element64(destleft, a->vd, 1, MO_64);
          tcg_temp_free_i64(destright);
          tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          if (accfn) {
              TCGv_i64 tmp64 = tcg_temp_new_i64();
 -            neon_load_reg64(tmp64, a->vd + pass);
 +            read_neon_element64(tmp64, a->vd, pass, MO_64);
              accfn(rd_64, tmp64, rd_64);
              tcg_temp_free_i64(tmp64);
          }
 -        neon_store_reg64(rd_64, a->vd + pass);
 +        write_neon_element64(rd_64, a->vd, pass, MO_64);
          tcg_temp_free_i64(rd_64);
      }
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rm, a->vm);
 +    read_neon_element64(rm, a->vm, 0, MO_64);
      narrowfn(rd0, cpu_env, rm);
 -    neon_load_reg64(rm, a->vm + 1);
 +    read_neon_element64(rm, a->vm, 1, MO_64);
      narrowfn(rd1, cpu_env, rm);
      write_neon_element32(rd0, a->vd, 0, MO_32);
      write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd);
 +    write_neon_element64(rd, a->vd, 0, MO_64);
      widenfn(rd, rm1);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd + 1);
 +    write_neon_element64(rd, a->vd, 1, MO_64);
      tcg_temp_free_i64(rd);
      tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
      rm = tcg_temp_new_i64();
      rd = tcg_temp_new_i64();
      for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        neon_load_reg64(rm, a->vm + pass);
 -        neon_load_reg64(rd, a->vd + pass);
 -        neon_store_reg64(rm, a->vd + pass);
 -        neon_store_reg64(rd, a->vm + pass);
 +        read_neon_element64(rm, a->vm, pass, MO_64);
 +        read_neon_element64(rd, a->vd, pass, MO_64);
 +        write_neon_element64(rm, a->vd, pass, MO_64);
 +        write_neon_element64(rd, a->vm, pass, MO_64);
      }
      tcg_temp_free_i64(rm);
      tcg_temp_free_i64(rd);
 --
 .20.1

-New patch
+[PULL 19/45] target/arm: Implement vector float32 to bfloat16 conversion
+From: Richard Henderson <richard.henderson@linaro.org>
 This is BFCVT{N,T} for both AArch64 AdvSIMD and SVE,
 and VCVT.BF16.F32 for AArch32 NEON.
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210525225817.400336-5-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper-sve.h     |  4 ++++
  target/arm/helper.h         |  1 +
  target/arm/neon-dp.decode   |  1 +
  target/arm/sve.decode       |  2 ++
  target/arm/sve_helper.c     |  2 ++
  target/arm/translate-a64.c  | 17 ++++++++++++++
  target/arm/translate-neon.c | 45 +++++++++++++++++++++++++++++++++++++
  target/arm/translate-sve.c  | 16 +++++++++++++
  target/arm/vfp_helper.c     |  7 ++++++
 files changed, 95 insertions(+)
 diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper-sve.h
 +++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve_fcvt_hd, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(sve_fcvt_sd, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(sve_bfcvt, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(sve_fcvtzs_hh, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_fcvtnt_sh, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(sve2_fcvtnt_ds, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(sve_bfcvtnt, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_5(sve2_fcvtlt_hs, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
  DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
  DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
  DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
 +DEF_HELPER_FLAGS_2(bfcvt_pair, TCG_CALL_NO_RWG, i32, i64, ptr)
  DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
  DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
 diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-dp.decode
 +++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
      VRINTZ       1111 001 11 . 11 .. 10 .... 0 1011 . . 0 .... @2misc
      VCVT_F16_F32 1111 001 11 . 11 .. 10 .... 0 1100 0 . 0 .... @2misc_q0
 +    VCVT_B16_F32 1111 001 11 . 11 .. 10 .... 0 1100 1 . 0 .... @2misc_q0
      VRINTM       1111 001 11 . 11 .. 10 .... 0 1101 . . 0 .... @2misc
 diff --git a/target/arm/sve.decode b/target/arm/sve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/sve.decode
 +++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FNMLS_zpzzz     01100101 .. 1 ..... 111 ... ..... .....         @rdn_pg_rm_ra
  # SVE floating-point convert precision
  FCVT_sh         01100101 10 0010 00 101 ... ..... .....         @rd_pg_rn_e0
  FCVT_hs         01100101 10 0010 01 101 ... ..... .....         @rd_pg_rn_e0
 +BFCVT           01100101 10 0010 10 101 ... ..... .....         @rd_pg_rn_e0
  FCVT_dh         01100101 11 0010 00 101 ... ..... .....         @rd_pg_rn_e0
  FCVT_hd         01100101 11 0010 01 101 ... ..... .....         @rd_pg_rn_e0
  FCVT_ds         01100101 11 0010 10 101 ... ..... .....         @rd_pg_rn_e0
@@ -XXX,XX +XXX,XX @@ RAX1            01000101 00 1 ..... 11110 1 ..... .....  @rd_rn_rm_e0
  FCVTXNT_ds      01100100 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
  FCVTX_ds        01100101 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
  FCVTNT_sh       01100100 10 0010 00 101 ... ..... .....  @rd_pg_rn_e0
 +BFCVTNT         01100100 10 0010 10 101 ... ..... .....  @rd_pg_rn_e0
  FCVTLT_hs       01100100 10 0010 01 101 ... ..... .....  @rd_pg_rn_e0
  FCVTNT_ds       01100100 11 0010 10 101 ... ..... .....  @rd_pg_rn_e0
  FCVTLT_sd       01100100 11 0010 11 101 ... ..... .....  @rd_pg_rn_e0
 diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/sve_helper.c
 +++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint64_t vfp_float64_to_uint64_rtz(float64 f, float_status *s)
  DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, sve_f32_to_f16)
  DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, sve_f16_to_f32)
 +DO_ZPZ_FP(sve_bfcvt,   uint32_t, H1_4, float32_to_bfloat16)
  DO_ZPZ_FP(sve_fcvt_dh, uint64_t,     , sve_f64_to_f16)
  DO_ZPZ_FP(sve_fcvt_hd, uint64_t,     , sve_f16_to_f64)
  DO_ZPZ_FP(sve_fcvt_ds, uint64_t,     , float64_to_float32)
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
      } while (i != 0);                                                         \
  }
 +DO_FCVTNT(sve_bfcvtnt,    uint32_t, uint16_t, H1_4, H1_2, float32_to_bfloat16)
  DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16)
  DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t,     , H1_4, float64_to_float32)
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
                  tcg_temp_free_i32(ahp);
              }
              break;
 +        case 0x36: /* BFCVTN, BFCVTN2 */
 +            {
 +                TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
 +                gen_helper_bfcvt_pair(tcg_res[pass], tcg_op, fpst);
 +                tcg_temp_free_ptr(fpst);
 +            }
 +            break;
          case 0x56:  /* FCVTXN, FCVTXN2 */
              /* 64 bit to 32 bit float conversion
               * with von Neumann rounding (round to odd)
@@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
              }
              handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
              return;
 +        case 0x36: /* BFCVTN, BFCVTN2 */
 +            if (!dc_isar_feature(aa64_bf16, s) || size != 2) {
 +                unallocated_encoding(s);
 +                return;
 +            }
 +            if (!fp_access_check(s)) {
 +                return;
 +            }
 +            handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
 +            return;
          case 0x17: /* FCVTL, FCVTL2 */
              if (!fp_access_check(s)) {
                  return;
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      return true;
  }
 +static bool trans_VCVT_B16_F32(DisasContext *s, arg_2misc *a)
 +{
 +    TCGv_ptr fpst;
 +    TCGv_i64 tmp;
 +    TCGv_i32 dst0, dst1;
 +
 +    if (!dc_isar_feature(aa32_bf16, s)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if ((a->vm & 1) || (a->size != 1)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    fpst = fpstatus_ptr(FPST_STD);
 +    tmp = tcg_temp_new_i64();
 +    dst0 = tcg_temp_new_i32();
 +    dst1 = tcg_temp_new_i32();
 +
 +    read_neon_element64(tmp, a->vm, 0, MO_64);
 +    gen_helper_bfcvt_pair(dst0, tmp, fpst);
 +
 +    read_neon_element64(tmp, a->vm, 1, MO_64);
 +    gen_helper_bfcvt_pair(dst1, tmp, fpst);
 +
 +    write_neon_element32(dst0, a->vd, 0, MO_32);
 +    write_neon_element32(dst1, a->vd, 1, MO_32);
 +
 +    tcg_temp_free_i64(tmp);
 +    tcg_temp_free_i32(dst0);
 +    tcg_temp_free_i32(dst1);
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 +
  static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
  {
      TCGv_ptr fpst;
 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-sve.c
 +++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_hs(DisasContext *s, arg_rpr_esz *a)
      return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_hs);
  }
 +static bool trans_BFCVT(DisasContext *s, arg_rpr_esz *a)
 +{
 +    if (!dc_isar_feature(aa64_sve_bf16, s)) {
 +        return false;
 +    }
 +    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvt);
 +}
 +
  static bool trans_FCVT_dh(DisasContext *s, arg_rpr_esz *a)
  {
      return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_dh);
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVTNT_sh(DisasContext *s, arg_rpr_esz *a)
      return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtnt_sh);
  }
 +static bool trans_BFCVTNT(DisasContext *s, arg_rpr_esz *a)
 +{
 +    if (!dc_isar_feature(aa64_sve_bf16, s)) {
 +        return false;
 +    }
 +    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvtnt);
 +}
 +
  static bool trans_FCVTNT_ds(DisasContext *s, arg_rpr_esz *a)
  {
      if (!dc_isar_feature(aa64_sve2, s)) {
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(bfcvt)(float32 x, void *status)
      return float32_to_bfloat16(x, status);
  }
 +uint32_t HELPER(bfcvt_pair)(uint64_t pair, void *status)
 +{
 +    bfloat16 lo = float32_to_bfloat16(extract64(pair, 0, 32), status);
 +    bfloat16 hi = float32_to_bfloat16(extract64(pair, 32, 32), status);
 +    return deposit32(lo, 16, 16, hi);
 +}
 +
  /*
   * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
   * must always round-to-nearest; the AArch64 ones honour the FPSCR
 --
 .20.1
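
The packing done by HELPER(bfcvt_pair) above can be sketched outside QEMU as follows. This is a minimal sketch, not the QEMU implementation: sketch_f32_to_bf16() and sketch_bfcvt_pair() are hypothetical names, and the conversion assumes round-to-nearest-even with NaNs forced quiet, whereas the real float32_to_bfloat16() takes its rounding mode and exception flags from float_status.

```c
#include <stdint.h>

/*
 * Illustrative only: convert one IEEE binary32 bit pattern to a
 * bfloat16 bit pattern using round-to-nearest-even.
 * Assumption: this stands in for softfloat's float32_to_bfloat16(),
 * which additionally honours float_status (rounding mode, flags).
 */
static uint16_t sketch_f32_to_bf16(uint32_t bits)
{
    if ((bits & 0x7f800000u) == 0x7f800000u && (bits & 0x007fffffu)) {
        /* NaN: keep sign and top mantissa bits, force the quiet bit. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    /* Nearest-even: add 0x7fff plus the LSB of the truncated result. */
    uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7fffu + lsb;
    return (uint16_t)(bits >> 16);
}

/*
 * Same shape as the helper in the patch: the low float32 of the
 * 64-bit input lands in bits [15:0] of the result, the high float32
 * in bits [31:16].
 */
static uint32_t sketch_bfcvt_pair(uint64_t pair)
{
    uint16_t lo = sketch_f32_to_bf16((uint32_t)pair);
    uint16_t hi = sketch_f32_to_bf16((uint32_t)(pair >> 32));
    return (uint32_t)lo | ((uint32_t)hi << 16);
}
```

For example, passing the pair (2.0f in the high half, 1.0f in the low half) as 0x400000003f800000 yields 0x40003f80. Handling two inputs per call is what lets the AArch32 translator cover a whole Q register with two helper calls, one per 64-bit element.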

-[PULL 09/26] target/arm: Rename neon_load_reg64 to vfp_load_reg64
+[PULL 20/45] softfpu: Add float_round_to_odd_inf
 From: Richard Henderson <richard.henderson@linaro.org>
-The only uses of this function are for loading VFP
+For Arm BFDOT and BFMMLA, we need a version of round-to-odd
-double-precision values, and nothing to do with NEON.
+that overflows to infinity, instead of the max normal number.
+Cc: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
+Message-id: 20210525225817.400336-6-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         |  8 ++--
+ include/fpu/softfloat-types.h | 4 +++-
- target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
+ fpu/softfloat-parts.c.inc     | 6 ++++--
-files changed, 46 insertions(+), 46 deletions(-)
+files changed, 7 insertions(+), 3 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/include/fpu/softfloat-types.h
-+++ b/target/arm/translate.c
++++ b/include/fpu/softfloat-types.h
-@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
+@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
-     }
+     float_round_up           = 2,
- }
+     float_round_to_zero      = 3,
+     float_round_ties_away    = 4,
--static inline void neon_load_reg64(TCGv_i64 var, int reg)
+-    /* Not an IEEE rounding mode: round to the closest odd mantissa value */
-+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
++    /* Not an IEEE rounding mode: round to closest odd, overflow to max */
- {
+     float_round_to_odd       = 5,
--    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
++    /* Not an IEEE rounding mode: round to closest odd, overflow to inf */
-+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
++    float_round_to_odd_inf   = 6,
- }
+ } FloatRoundMode;
--static inline void neon_store_reg64(TCGv_i64 var, int reg)
+ /*
-+static inline void vfp_store_reg64(TCGv_i64 var, int reg)
+diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
  {
 -    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
 +    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
  }
  static inline void vfp_load_reg32(TCGv_i32 var, int reg)
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c.inc
+--- a/fpu/softfloat-parts.c.inc
-+++ b/target/arm/translate-vfp.c.inc
++++ b/fpu/softfloat-parts.c.inc
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
+@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
          tcg_gen_ext_i32_i64(nf, cpu_NF);
          tcg_gen_ext_i32_i64(vf, cpu_VF);
 -        neon_load_reg64(frn, rn);
 -        neon_load_reg64(frm, rm);
 +        vfp_load_reg64(frn, rn);
 +        vfp_load_reg64(frm, rm);
          switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
              tcg_temp_free_i64(tmp);
              break;
          }
 -        neon_store_reg64(dest, rd);
 +        vfp_store_reg64(dest, rd);
          tcg_temp_free_i64(frn);
          tcg_temp_free_i64(frm);
          tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i64 tcg_res;
          tcg_op = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
 -        neon_load_reg64(tcg_op, rm);
 +        vfp_load_reg64(tcg_op, rm);
          gen_helper_rintd(tcg_res, tcg_op, fpst);
 -        neon_store_reg64(tcg_res, rd);
 +        vfp_store_reg64(tcg_res, rd);
          tcg_temp_free_i64(tcg_op);
          tcg_temp_free_i64(tcg_res);
      } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          tcg_double = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
          tcg_tmp = tcg_temp_new_i32();
 -        neon_load_reg64(tcg_double, rm);
 +        vfp_load_reg64(tcg_double, rm);
          if (is_signed) {
              gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
          } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
      tmp = tcg_temp_new_i64();
      if (a->l) {
          gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg64(tmp, a->vd);
 +        vfp_store_reg64(tmp, a->vd);
      } else {
 -        neon_load_reg64(tmp, a->vd);
 +        vfp_load_reg64(tmp, a->vd);
          gen_aa32_st64(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i64(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
          if (a->l) {
              /* load */
              gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
 -            neon_store_reg64(tmp, a->vd + i);
 +            vfp_store_reg64(tmp, a->vd + i);
          } else {
              /* store */
 -            neon_load_reg64(tmp, a->vd + i);
 +            vfp_load_reg64(tmp, a->vd + i);
              gen_aa32_st64(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
      fd = tcg_temp_new_i64();
      fpst = fpstatus_ptr(FPST_FPCR);
 -    neon_load_reg64(f0, vn);
 -    neon_load_reg64(f1, vm);
 +    vfp_load_reg64(f0, vn);
 +    vfp_load_reg64(f1, vm);
      for (;;) {
          if (reads_vd) {
 -            neon_load_reg64(fd, vd);
 +            vfp_load_reg64(fd, vd);
          }
          fn(fd, f0, f1, fpst);
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
          veclen--;
          vd = vfp_advance_dreg(vd, delta_d);
          vn = vfp_advance_dreg(vn, delta_d);
 -        neon_load_reg64(f0, vn);
 +        vfp_load_reg64(f0, vn);
          if (delta_m) {
              vm = vfp_advance_dreg(vm, delta_m);
 -            neon_load_reg64(f1, vm);
 +            vfp_load_reg64(f1, vm);
          }
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
      f0 = tcg_temp_new_i64();
      fd = tcg_temp_new_i64();
 -    neon_load_reg64(f0, vm);
 +    vfp_load_reg64(f0, vm);
      for (;;) {
          fn(fd, f0);
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
              /* single source one-many */
              while (veclen--) {
                  vd = vfp_advance_dreg(vd, delta_d);
 -                neon_store_reg64(fd, vd);
 +                vfp_store_reg64(fd, vd);
              }
              break;
          }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
          veclen--;
          vd = vfp_advance_dreg(vd, delta_d);
         vm = vfp_advance_dreg(vm, delta_m);
 -        neon_load_reg64(f0, vm);
 +        vfp_load_reg64(f0, vm);
      }
      tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vn, a->vn);
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vn, a->vn);
 +    vfp_load_reg64(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negd(vn, vn);
      }
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negd(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
      fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
      for (;;) {
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
      vd = tcg_temp_new_i64();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i64(vm, 0);
      } else {
 -        neon_load_reg64(vm, a->vm);
 +        vfp_load_reg64(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
      tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
      vd = tcg_temp_new_i64();
      gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(ahp_mode);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
      tmp = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
      tcg_temp_free_i64(vm);
      tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rintd(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd_exact(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vd = tcg_temp_new_i64();
      vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
          /* u32 -> f64 */
          gen_helper_vfp_uitod(vd, vm, fpst);
      }
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i32();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
          g_assert_not_reached();
      }
--    neon_store_reg64(vd, a->vd);
++    overflow_norm = false;
-+    vfp_store_reg64(vd, a->vd);
+     switch (s->float_rounding_mode) {
-     tcg_temp_free_i64(vd);
+     case float_round_nearest_even:
-     tcg_temp_free_i32(shift);
+-        overflow_norm = false;
-     tcg_temp_free_ptr(fpst);
+         inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
+         break;
-     fpst = fpstatus_ptr(FPST_FPCR);
+     case float_round_ties_away:
-     vm = tcg_temp_new_i64();
+-        overflow_norm = false;
-     vd = tcg_temp_new_i32();
+         inc = frac_lsbm1;
--    neon_load_reg64(vm, a->vm);
+         break;
-+    vfp_load_reg64(vm, a->vm);
+     case float_round_to_zero:
+@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
-     if (a->s) {
+         break;
-         if (a->rz) {
+     case float_round_to_odd:
          overflow_norm = true;
 +        /* fall through */
 +    case float_round_to_odd_inf:
          inc = p->frac_lo & frac_lsb ? 0 : round_mask;
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
                         ? frac_lsbm1 : 0);
                  break;
              case float_round_to_odd:
 +            case float_round_to_odd_inf:
                  inc = p->frac_lo & frac_lsb ? 0 : round_mask;
                  break;
              default:
 --
 .20.1
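
The increment rule shared by float_round_to_odd and float_round_to_odd_inf in the hunks above can be illustrated on a plain integer fraction. This is a sketch under stated assumptions: sketch_round_to_odd() is a hypothetical name, it models only the "inc" computation and the discard of the low bits, and it does not model the overflow handling (max normal versus infinity) that distinguishes the two modes.

```c
#include <stdint.h>

/*
 * Illustrative only: drop the low `shift` bits of `frac` using the
 * round-to-odd rule. Assumes 0 < shift < 63.
 *
 * If the bit that will become the result LSB is already 1, truncate.
 * Otherwise add round_mask, so that any nonzero discarded bits carry
 * into the LSB and make the result odd. Exact values are unchanged.
 */
static uint64_t sketch_round_to_odd(uint64_t frac, unsigned shift)
{
    uint64_t frac_lsb = (uint64_t)1 << shift;
    uint64_t round_mask = frac_lsb - 1;
    uint64_t inc = (frac & frac_lsb) ? 0 : round_mask;

    /* The carry can reach at most the LSB, since that bit was 0. */
    frac += inc;
    return frac >> shift;
}
```

With shift = 2, the exact input 8 gives 2, the inexact input 9 gives 3, and 13 (result LSB already set) is truncated to 3. Because the carry never propagates past the LSB, the two modes can share this computation and differ only in what happens when the exponent is out of range.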

-[PULL 05/26] target/arm: Add read/write_neon_element32
+[PULL 21/45] target/arm: Implement bfloat16 dot product (vector)
 From: Richard Henderson <richard.henderson@linaro.org>
-Model these off the aa64 read/write_vec_element functions.
+This is BFDOT for both AArch64 AdvSIMD and SVE,
-Use it within translate-neon.c.inc.  The new functions do
+and VDOT.BF16 for AArch32 NEON.
 not allocate or free temps, so this rearranges the calling
 code a bit.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
+Message-id: 20210525225817.400336-7-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  26 ++++
+ target/arm/helper.h           |  3 +++
- target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
+ target/arm/neon-shared.decode |  2 ++
-files changed, 183 insertions(+), 99 deletions(-)
+ target/arm/sve.decode         |  3 +++
  target/arm/translate-a64.c    | 20 ++++++++++++++++++
  target/arm/translate-neon.c   |  9 ++++++++
  target/arm/translate-sve.c    | 12 +++++++++++
  target/arm/vec_helper.c       | 40 +++++++++++++++++++++++++++++++++++
 files changed, 89 insertions(+)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/helper.h
-+++ b/target/arm/translate.c
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_ummla_b, TCG_CALL_NO_RWG,
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+ DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
 +
  #ifdef TARGET_AARCH64
  #include "helper-a64.h"
  #include "helper-sve.h"
 diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-shared.decode
 +++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUDOT          1111 110 00 . 10 .... .... 1101 . q:1 . 1 .... \
                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
  VUSDOT         1111 110 01 . 10 .... .... 1101 . q:1 . 0 .... \
                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +VDOT_b16       1111 110 00 . 00 .... .... 1101 . q:1 . 0 .... \
 +               vm=%vm_dp vn=%vn_dp vd=%vd_dp
  # VFM[AS]L
  VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
 diff --git a/target/arm/sve.decode b/target/arm/sve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/sve.decode
 +++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
  FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
  FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
 +### SVE2 floating-point bfloat16 dot-product
 +BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
 +
  ### SVE2 floating-point multiply-add long (indexed)
  FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
  FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          feature = dc_isar_feature(aa64_fcma, s);
          break;
 +    case 0x1f: /* BFDOT */
 +        switch (size) {
 +        case 1:
 +            feature = dc_isar_feature(aa64_bf16, s);
 +            break;
 +        default:
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        break;
      default:
          unallocated_encoding(s);
          return;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          return;
 +    case 0xf: /* BFDOT */
 +        switch (size) {
 +        case 1:
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +        return;
 +
      default:
          g_assert_not_reached();
      }
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a)
                          gen_helper_gvec_usdot_b);
  }
-+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
++static bool trans_VDOT_b16(DisasContext *s, arg_VDOT_b16 *a)
 +{
-+    long off = neon_element_offset(reg, ele, size);
++    if (!dc_isar_feature(aa32_bf16, s)) {
-+
++        return false;
 +    switch (size) {
 +    case MO_32:
 +        tcg_gen_ld_i32(dest, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
++    return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0,
++                        gen_helper_gvec_bfdot);
 +}
 +
-+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+ static bool trans_VFML(DisasContext *s, arg_VFML *a)
  {
      int opr_sz;
 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-sve.c
 +++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_UMMLA(DisasContext *s, arg_rrrr_esz *a)
  {
      return do_i8mm_zzzz_ool(s, a, gen_helper_gvec_ummla_b, 0);
  }
 +
 +static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
 +{
-+    long off = neon_element_offset(reg, ele, size);
++    if (!dc_isar_feature(aa64_sve_bf16, s)) {
 +        return false;
 +    }
 +    if (sve_access_check(s)) {
 +        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot,
 +                          a->rd, a->rn, a->rm, a->ra, 0);
 +    }
 +    return true;
 +}
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
  DO_MMLA_B(gvec_smmla_b, do_smmla_b)
  DO_MMLA_B(gvec_ummla_b, do_ummla_b)
  DO_MMLA_B(gvec_usmmla_b, do_usmmla_b)
 +
-+    switch (size) {
++/*
-+    case MO_32:
++ * BFloat16 Dot Product
-+        tcg_gen_st_i32(src, cpu_env, off);
++ */
-+        break;
++
-+    default:
++static float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
-+        g_assert_not_reached();
++{
-+    }
++    /* FPCR is ignored for BFDOT and BFMMLA. */
 +    float_status bf_status = {
 +        .tininess_before_rounding = float_tininess_before_rounding,
 +        .float_rounding_mode = float_round_to_odd_inf,
 +        .flush_to_zero = true,
 +        .flush_inputs_to_zero = true,
 +        .default_nan_mode = true,
 +    };
 +    float32 t1, t2;
 +
 +    /*
 +     * Extract each BFloat16 from the element pair, and shift
 +     * them such that they become float32.
 +     */
 +    t1 = float32_mul(e1 << 16, e2 << 16, &bf_status);
 +    t2 = float32_mul(e1 & 0xffff0000u, e2 & 0xffff0000u, &bf_status);
 +    t1 = float32_add(t1, t2, &bf_status);
 +    t1 = float32_add(sum, t1, &bf_status);
 +
 +    return t1;
 +}
 +
- static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
++void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
- {
++{
-     TCGv_ptr ret = tcg_temp_new_ptr();
++    intptr_t i, opr_sz = simd_oprsz(desc);
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
++    float32 *d = vd, *a = va;
-index XXXXXXX..XXXXXXX 100644
++    uint32_t *n = vn, *m = vm;
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
       * early. Since Q is 0 there are always just two passes, so instead
       * of a complicated loop over each pass we just unroll.
       */
 -    tmp = neon_load_reg(a->vn, 0);
 -    tmp2 = neon_load_reg(a->vn, 1);
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    tmp3 = tcg_temp_new_i32();
 +
-+    read_neon_element32(tmp, a->vn, 0, MO_32);
++    for (i = 0; i < opr_sz / 4; ++i) {
-+    read_neon_element32(tmp2, a->vn, 1, MO_32);
++        d[i] = bfdotadd(a[i], n[i], m[i]);
-     fn(tmp, tmp, tmp2);
++    }
--    tcg_temp_free_i32(tmp2);
++    clear_tail(d, opr_sz, simd_maxsz(desc));
++}
 -    tmp3 = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(tmp3, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      fn(tmp3, tmp3, tmp2);
 -    tcg_temp_free_i32(tmp2);
 -    neon_store_reg(a->vd, 0, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
       * 2-reg-and-shift operations, size < 3 case, where the
       * helper needs to be passed cpu_env.
       */
 -    TCGv_i32 constimm;
 +    TCGv_i32 constimm, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
       * by immediate using the variable shift operations.
       */
      constimm = tcg_const_i32(dup_const(a->size, a->shift));
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, cpu_env, tmp, constimm);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(constimm);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
      constimm = tcg_const_i64(-a->shift);
      rm1 = tcg_temp_new_i64();
      rm2 = tcg_temp_new_i64();
 +    rd = tcg_temp_new_i32();
      /* Load both inputs first to avoid potential overwrite if rm == rd */
      neon_load_reg64(rm1, a->vm);
      neon_load_reg64(rm2, a->vm + 1);
      shiftfn(rm1, rm1, constimm);
 -    rd = tcg_temp_new_i32();
      narrowfn(rd, cpu_env, rm1);
 -    neon_store_reg(a->vd, 0, rd);
 +    write_neon_element32(rd, a->vd, 0, MO_32);
      shiftfn(rm2, rm2, constimm);
 -    rd = tcg_temp_new_i32();
      narrowfn(rd, cpu_env, rm2);
 -    neon_store_reg(a->vd, 1, rd);
 +    write_neon_element32(rd, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i64(rm1);
      tcg_temp_free_i64(rm2);
      tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      constimm = tcg_const_i32(imm);
      /* Load all inputs first to avoid potential overwrite */
 -    rm1 = neon_load_reg(a->vm, 0);
 -    rm2 = neon_load_reg(a->vm, 1);
 -    rm3 = neon_load_reg(a->vm + 1, 0);
 -    rm4 = neon_load_reg(a->vm + 1, 1);
 +    rm1 = tcg_temp_new_i32();
 +    rm2 = tcg_temp_new_i32();
 +    rm3 = tcg_temp_new_i32();
 +    rm4 = tcg_temp_new_i32();
 +    read_neon_element32(rm1, a->vm, 0, MO_32);
 +    read_neon_element32(rm2, a->vm, 1, MO_32);
 +    read_neon_element32(rm3, a->vm, 2, MO_32);
 +    read_neon_element32(rm4, a->vm, 3, MO_32);
      rtmp = tcg_temp_new_i64();
      shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      tcg_temp_free_i32(rm2);
      narrowfn(rm1, cpu_env, rtmp);
 -    neon_store_reg(a->vd, 0, rm1);
 +    write_neon_element32(rm1, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(rm1);
      shiftfn(rm3, rm3, constimm);
      shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      narrowfn(rm3, cpu_env, rtmp);
      tcg_temp_free_i64(rtmp);
 -    neon_store_reg(a->vd, 1, rm3);
 +    write_neon_element32(rm3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rm3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          widen_mask = dup_const(a->size + 1, widen_mask);
      }
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      tmp = tcg_temp_new_i64();
      widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn0_64, a->vn);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 0);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn1_64, a->vn + 1);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 1);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      neon_store_reg64(rn0_64, a->vd);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      narrowfn(rd1, rn_64);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rn_64);
      tcg_temp_free_i64(rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i64();
      rd1 = tcg_temp_new_i64();
 -    rn = neon_load_reg(a->vn, 0);
 -    rm = neon_load_reg(a->vm, 0);
 +    rn = tcg_temp_new_i32();
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      opfn(rd0, rn, rm);
 -    tcg_temp_free_i32(rn);
 -    tcg_temp_free_i32(rm);
 -    rn = neon_load_reg(a->vn, 1);
 -    rm = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      opfn(rd1, rn, rm);
      tcg_temp_free_i32(rn);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
  static inline TCGv_i32 neon_get_scalar(int size, int reg)
  {
 -    TCGv_i32 tmp;
 -    if (size == 1) {
 -        tmp = neon_load_reg(reg & 7, reg >> 4);
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +    if (size == MO_16) {
 +        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
          if (reg & 8) {
              gen_neon_dup_high16(tmp);
          } else {
              gen_neon_dup_low16(tmp);
          }
      } else {
 -        tmp = neon_load_reg(reg & 15, reg >> 4);
 +        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
      }
      return tmp;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
       * perform an accumulation operation of that result into the
       * destination.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
 +        read_neon_element32(tmp, a->vn, pass, MO_32);
          opfn(tmp, tmp, scalar);
          if (accfn) {
 -            TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +            TCGv_i32 rd = tcg_temp_new_i32();
 +            read_neon_element32(rd, a->vd, pass, MO_32);
              accfn(tmp, rd, tmp);
              tcg_temp_free_i32(rd);
          }
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(scalar);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
       * performs a kind of fused op-then-accumulate using a helper
       * function that takes all of rd, rn and the scalar at once.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, rn, rd;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    rn = tcg_temp_new_i32();
 +    rd = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 rn = neon_load_reg(a->vn, pass);
 -        TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +        read_neon_element32(rn, a->vn, pass, MO_32);
 +        read_neon_element32(rd, a->vd, pass, MO_32);
          opfn(rd, cpu_env, rn, scalar, rd);
 -        tcg_temp_free_i32(rn);
 -        neon_store_reg(a->vd, pass, rd);
 +        write_neon_element32(rd, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i32(scalar);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      scalar = neon_get_scalar(a->size, a->vm);
      /* Load all inputs before writing any outputs, in case of overlap */
 -    rn = neon_load_reg(a->vn, 0);
 +    rn = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
      rn0_64 = tcg_temp_new_i64();
      opfn(rn0_64, rn, scalar);
 -    tcg_temp_free_i32(rn);
 -    rn = neon_load_reg(a->vn, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
      rn1_64 = tcg_temp_new_i64();
      opfn(rn1_64, rn, scalar);
      tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
          return false;
      }
      n <<= 3;
 +    tmp = tcg_temp_new_i32();
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 0);
 +        read_neon_element32(tmp, a->vd, 0, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp2 = neon_load_reg(a->vm, 0);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 0, MO_32);
      ptr1 = vfp_reg_ptr(true, a->vn);
      tmp4 = tcg_const_i32(n);
      gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
 -    tcg_temp_free_i32(tmp);
 +
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 1);
 +        read_neon_element32(tmp, a->vd, 1, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp3 = neon_load_reg(a->vm, 1);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 1, MO_32);
      gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(tmp4);
      tcg_temp_free_ptr(ptr1);
 -    neon_store_reg(a->vd, 0, tmp2);
 -    neon_store_reg(a->vd, 1, tmp3);
 -    tcg_temp_free_i32(tmp);
 +
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
  static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
  {
      int pass, half;
 +    TCGv_i32 tmp[2];
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
          return true;
      }
 -    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        TCGv_i32 tmp[2];
 +    tmp[0] = tcg_temp_new_i32();
 +    tmp[1] = tcg_temp_new_i32();
 +    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
          for (half = 0; half < 2; half++) {
 -            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
 +            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
              switch (a->size) {
              case 0:
                  tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                  g_assert_not_reached();
              }
          }
 -        neon_store_reg(a->vd, pass * 2, tmp[1]);
 -        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
 +        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
 +        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
      }
 +
 +    tcg_temp_free_i32(tmp[0]);
 +    tcg_temp_free_i32(tmp[1]);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          rm0_64 = tcg_temp_new_i64();
          rm1_64 = tcg_temp_new_i64();
          rd_64 = tcg_temp_new_i64();
 -        tmp = neon_load_reg(a->vm, pass * 2);
 +
 +        tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
          widenfn(rm0_64, tmp);
 -        tcg_temp_free_i32(tmp);
 -        tmp = neon_load_reg(a->vm, pass * 2 + 1);
 +        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
          widenfn(rm1_64, tmp);
          tcg_temp_free_i32(tmp);
 +
          opfn(rd_64, rm0_64, rm1_64);
          tcg_temp_free_i64(rm0_64);
          tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      narrowfn(rd0, cpu_env, rm);
      neon_load_reg64(rm, a->vm + 1);
      narrowfn(rd1, cpu_env, rm);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rm);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      }
      rd = tcg_temp_new_i64();
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
 -    tmp = neon_load_reg(a->vm, 0);
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
      tcg_gen_shli_i32(tmp2, tmp2, 16);
      tcg_gen_or_i32(tmp2, tmp2, tmp);
 -    tcg_temp_free_i32(tmp);
 -    tmp = neon_load_reg(a->vm, 2);
 +    read_neon_element32(tmp, a->vm, 2, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp3 = neon_load_reg(a->vm, 3);
 -    neon_store_reg(a->vd, 0, tmp2);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 3, MO_32);
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(tmp2);
      gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
      tcg_gen_shli_i32(tmp3, tmp3, 16);
      tcg_gen_or_i32(tmp3, tmp3, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
      tmp3 = tcg_temp_new_i32();
 -    tmp = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      tcg_gen_ext16u_i32(tmp3, tmp);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 0, tmp3);
 +    write_neon_element32(tmp3, a->vd, 0, MO_32);
      tcg_gen_shri_i32(tmp, tmp, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
 -    neon_store_reg(a->vd, 1, tmp);
 -    tmp3 = tcg_temp_new_i32();
 +    write_neon_element32(tmp, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp);
      tcg_gen_ext16u_i32(tmp3, tmp2);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 2, tmp3);
 +    write_neon_element32(tmp3, a->vd, 2, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_gen_shri_i32(tmp2, tmp2, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
 -    neon_store_reg(a->vd, 3, tmp2);
 +    write_neon_element32(tmp2, a->vd, 3, MO_32);
 +    tcg_temp_free_i32(tmp2);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
  static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
  {
 +    TCGv_i32 tmp;
      int pass;
      /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
          return true;
      }
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, tmp);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
          return true;
      }
 -    if (a->size == 2) {
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    if (a->size == MO_32) {
          for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass + 1);
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass + 1, tmp);
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
          }
      } else {
          for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass);
 -            if (a->size == 0) {
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass, MO_32);
 +            if (a->size == MO_8) {
                  gen_neon_trn_u8(tmp, tmp2);
              } else {
                  gen_neon_trn_u16(tmp, tmp2);
              }
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass, tmp);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass, MO_32);
          }
      }
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
      return true;
  }
 --
 .20.1
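
A note on the element layout used by bfdotadd() in the patch above: each
32-bit element carries two BFloat16 values, and a BFloat16 widens to
float32 by placing its 16 bits in the top half of the word. The sketch
below illustrates only that bit manipulation. It is not the QEMU helper:
bits_to_float(), pack_bf16_pair() and bfdot_pair_sketch() are hypothetical
names, and it uses host float arithmetic, so it does not reproduce the
round-to-odd, flush-to-zero or default-NaN behaviour that the patch
selects through its private float_status.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a host float. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Hypothetical helper: pack two BFloat16 bit patterns into one element. */
static uint32_t pack_bf16_pair(uint16_t lo, uint16_t hi)
{
    return ((uint32_t)hi << 16) | lo;
}

/*
 * Illustration only: widen both halves of each element to float32,
 * multiply pairwise, and accumulate into sum. Host rounding applies,
 * unlike the softfloat settings used by the real helper.
 */
static float bfdot_pair_sketch(float sum, uint32_t e1, uint32_t e2)
{
    /* Low halves: shifting left by 16 moves them into the float32 top bits. */
    float lo = bits_to_float(e1 << 16) * bits_to_float(e2 << 16);
    /* High halves: already in place, so just mask off the low 16 bits. */
    float hi = bits_to_float(e1 & 0xffff0000u) * bits_to_float(e2 & 0xffff0000u);

    return sum + (lo + hi);
}
```

For example, with the BFloat16 encodings 0x3f80 (1.0), 0x4000 (2.0),
0x4040 (3.0) and 0x3f00 (0.5), the elements pack_bf16_pair(0x3f80, 0x4000)
and pack_bf16_pair(0x4040, 0x3f00) contribute 1.0 * 3.0 + 2.0 * 0.5 to the
accumulator.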

-[PULL 01/26] target/arm: Introduce neon_full_reg_offset
+[PULL 22/45] target/arm: Implement bfloat16 dot product (indexed)
 From: Richard Henderson <richard.henderson@linaro.org>
-This function makes it clear that we're talking about the whole
+This is BFDOT for both AArch64 AdvSIMD and SVE,
-register, and not the 32-bit piece at index 0.  This fixes a bug
+and VDOT.BF16 for AArch32 NEON.
 when running on a big-endian host.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
+Message-id: 20210525225817.400336-8-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  8 ++++++
+ target/arm/helper.h           |  2 ++
- target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
+ target/arm/neon-shared.decode |  2 ++
- target/arm/translate-vfp.c.inc  |  2 +-
+ target/arm/sve.decode         |  3 +++
-files changed, 31 insertions(+), 23 deletions(-)
+ target/arm/translate-a64.c    | 41 +++++++++++++++++++++++++++--------
  target/arm/translate-neon.c   |  9 ++++++++
  target/arm/translate-sve.c    | 12 ++++++++++
  target/arm/vec_helper.c       | 20 +++++++++++++++++
 files changed, 80 insertions(+), 9 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/helper.h
-+++ b/target/arm/translate.c
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
-     unallocated_encoding(s);
  DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, i32)
  #ifdef TARGET_AARCH64
  #include "helper-a64.h"
 diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-shared.decode
 +++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUSDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
                 vn=%vn_dp vd=%vd_dp
  VSUDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 1 vm:4 \
                 vn=%vn_dp vd=%vd_dp
 +VDOT_b16_scal  1111 1110 0 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
 +               vn=%vn_dp vd=%vd_dp
  %vfml_scalar_q0_rm 0:3 5:1
  %vfml_scalar_q1_index 5:1 3:1
 diff --git a/target/arm/sve.decode b/target/arm/sve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/sve.decode
 +++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
  FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
  FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
  FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
 +
 +### SVE2 floating-point bfloat16 dot-product (indexed)
 +BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
              return;
          }
          break;
 -    case 0x0f: /* SUDOT, USDOT */
 -        if (is_scalar || (size & 1) || !dc_isar_feature(aa64_i8mm, s)) {
 +    case 0x0f:
 +        switch (size) {
 +        case 0: /* SUDOT */
 +        case 2: /* USDOT */
 +            if (is_scalar || !dc_isar_feature(aa64_i8mm, s)) {
 +                unallocated_encoding(s);
 +                return;
 +            }
 +            break;
 +        case 1: /* BFDOT */
 +            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
 +                unallocated_encoding(s);
 +                return;
 +            }
 +            break;
 +        default:
              unallocated_encoding(s);
              return;
          }
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                           u ? gen_helper_gvec_udot_idx_b
                           : gen_helper_gvec_sdot_idx_b);
          return;
 -    case 0x0f: /* SUDOT, USDOT */
 -        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 -                         extract32(insn, 23, 1)
 -                         ? gen_helper_gvec_usdot_idx_b
 -                         : gen_helper_gvec_sudot_idx_b);
 -        return;
 -
 +    case 0x0f:
 +        switch (extract32(insn, 22, 2)) {
 +        case 0: /* SUDOT */
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 +                             gen_helper_gvec_sudot_idx_b);
 +            return;
 +        case 1: /* BFDOT */
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 +                             gen_helper_gvec_bfdot_idx);
 +            return;
 +        case 2: /* USDOT */
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 +                             gen_helper_gvec_usdot_idx_b);
 +            return;
 +        }
 +        g_assert_not_reached();
      case 0x11: /* FCMLA #0 */
      case 0x13: /* FCMLA #90 */
      case 0x15: /* FCMLA #180 */
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a)
                          gen_helper_gvec_sudot_idx_b);
  }
-+/*
++static bool trans_VDOT_b16_scal(DisasContext *s, arg_VDOT_b16_scal *a)
 + * Return the offset of a "full" NEON Dreg.
 + */
 +static long neon_full_reg_offset(unsigned reg)
 +{
-+    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
++    if (!dc_isar_feature(aa32_bf16, s)) {
 +        return false;
 +    }
 +    return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index,
 +                        gen_helper_gvec_bfdot_idx);
 +}
 +
- static inline long vfp_reg_offset(bool dp, unsigned reg)
+ static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
  {
-     if (dp) {
+     int opr_sz;
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
+--- a/target/arm/translate-sve.c
-+++ b/target/arm/translate-neon.c.inc
++++ b/target/arm/translate-sve.c
-@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
+@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
          ofs ^= 8 - element_size;
      }
- #endif
--    return neon_reg_offset(reg, 0) + ofs;
-+    return neon_full_reg_offset(reg) + ofs;
- }
- static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
-              * We cannot write 16 bytes at once because the
-              * destination is unaligned.
-              */
--            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
-+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
-, 8, tmp);
--            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
--                             neon_reg_offset(vd, 0), 8, 8);
-+            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
-+                             neon_full_reg_offset(vd), 8, 8);
-         } else {
--            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
-+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
-                                  vec_size, vec_size, tmp);
-         }
-         tcg_gen_addi_i32(addr, addr, 1 << size);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
- static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
- {
-     int vec_size = a->q ? 16 : 8;
--    int rd_ofs = neon_reg_offset(a->vd, 0);
--    int rn_ofs = neon_reg_offset(a->vn, 0);
--    int rm_ofs = neon_reg_offset(a->vm, 0);
-+    int rd_ofs = neon_full_reg_offset(a->vd);
-+    int rn_ofs = neon_full_reg_offset(a->vn);
-+    int rm_ofs = neon_full_reg_offset(a->vm);
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-         return false;
-@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
- {
-     /* Handle a 2-reg-shift insn which can be vectorized. */
-     int vec_size = a->q ? 16 : 8;
--    int rd_ofs = neon_reg_offset(a->vd, 0);
--    int rm_ofs = neon_reg_offset(a->vm, 0);
-+    int rd_ofs = neon_full_reg_offset(a->vd);
-+    int rm_ofs = neon_full_reg_offset(a->vm);
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-         return false;
-@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
- {
-     /* FP operations in 2-reg-and-shift group */
-     int vec_size = a->q ? 16 : 8;
--    int rd_ofs = neon_reg_offset(a->vd, 0);
--    int rm_ofs = neon_reg_offset(a->vm, 0);
-+    int rd_ofs = neon_full_reg_offset(a->vd);
-+    int rm_ofs = neon_full_reg_offset(a->vm);
-     TCGv_ptr fpst;
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
-         return true;
-     }
--    reg_ofs = neon_reg_offset(a->vd, 0);
-+    reg_ofs = neon_full_reg_offset(a->vd);
-     vec_size = a->q ? 16 : 8;
-     imm = asimd_imm_const(a->imm, a->cmode, a->op);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
-         return true;
-     }
--    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
--                       neon_reg_offset(a->vn, 0),
--                       neon_reg_offset(a->vm, 0),
-+    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
-+                       neon_full_reg_offset(a->vn),
-+                       neon_full_reg_offset(a->vm),
-, 16, 0, fn_gvec);
      return true;
  }
-@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
++
- {
++static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
-     /* Two registers and a scalar, using gvec */
++{
-     int vec_size = a->q ? 16 : 8;
++    if (!dc_isar_feature(aa64_sve_bf16, s)) {
--    int rd_ofs = neon_reg_offset(a->vd, 0);
++        return false;
--    int rn_ofs = neon_reg_offset(a->vn, 0);
++    }
-+    int rd_ofs = neon_full_reg_offset(a->vd);
++    if (sve_access_check(s)) {
-+    int rn_ofs = neon_full_reg_offset(a->vn);
++        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot_idx,
-     int rm_ofs;
++                          a->rd, a->rn, a->rm, a->ra, a->index);
-     int idx;
++    }
-     TCGv_ptr fpstatus;
++    return true;
-@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
++}
-     /* a->vm is M:Vm, which encodes both register and index */
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
-     idx = extract32(a->vm, a->size + 2, 2);
+index XXXXXXX..XXXXXXX 100644
-     a->vm = extract32(a->vm, 0, a->size + 2);
+--- a/target/arm/vec_helper.c
--    rm_ofs = neon_reg_offset(a->vm, 0);
++++ b/target/arm/vec_helper.c
-+    rm_ofs = neon_full_reg_offset(a->vm);
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
      fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
      tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
          return true;
      }
+     clear_tail(d, opr_sz, simd_maxsz(desc));
--    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
+ }
-+    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
++
-                          neon_element_offset(a->vm, a->index, a->size),
++void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
-                          a->q ? 16 : 8, a->q ? 16 : 8);
++                            void *va, uint32_t desc)
-     return true;
++{
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
++    intptr_t i, j, opr_sz = simd_oprsz(desc);
- static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
++    intptr_t index = simd_data(desc);
- {
++    intptr_t elements = opr_sz / 4;
-     int vec_size = a->q ? 16 : 8;
++    intptr_t eltspersegment = MIN(16 / 4, elements);
--    int rd_ofs = neon_reg_offset(a->vd, 0);
++    float32 *d = vd, *a = va;
--    int rm_ofs = neon_reg_offset(a->vm, 0);
++    uint32_t *n = vn, *m = vm;
-+    int rd_ofs = neon_full_reg_offset(a->vd);
++
-+    int rm_ofs = neon_full_reg_offset(a->vm);
++    for (i = 0; i < elements; i += eltspersegment) {
++        uint32_t m_idx = m[i + H4(index)];
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
++
-         return false;
++        for (j = i; j < i + eltspersegment; j++) {
-diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
++            d[j] = bfdotadd(a[j], n[j], m_idx);
-index XXXXXXX..XXXXXXX 100644
++        }
---- a/target/arm/translate-vfp.c.inc
++    }
-+++ b/target/arm/translate-vfp.c.inc
++    clear_tail(d, opr_sz, simd_maxsz(desc));
-@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
++}
      }
      tmp = load_reg(s, a->rt);
 -    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
 +    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                           vec_size, vec_size, tmp);
      tcg_temp_free_i32(tmp);
 --
 .20.1

-New patch
+[PULL 23/45] target/arm: Implement bfloat16 matrix multiply accumulate
+From: Richard Henderson <richard.henderson@linaro.org>
+This is BFMMLA for both AArch64 AdvSIMD and SVE,
+and VMMLA.BF16 for AArch32 NEON.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-9-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/helper.h           |  3 +++
+ target/arm/neon-shared.decode |  2 ++
+ target/arm/sve.decode         |  6 +++--
+ target/arm/translate-a64.c    | 10 +++++++++
+ target/arm/translate-neon.c   |  9 ++++++++
+ target/arm/translate-sve.c    | 12 ++++++++++
+ target/arm/vec_helper.c       | 42 ++++++++++++++++++++++++++++++++++-
+files changed, 81 insertions(+), 3 deletions(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, i32)
++
+ #ifdef TARGET_AARCH64
+ #include "helper-a64.h"
+ #include "helper-sve.h"
+diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-shared.decode
++++ b/target/arm/neon-shared.decode
+@@ -XXX,XX +XXX,XX @@ VUMMLA         1111 1100 0.10 .... .... 1100 .1.1 .... \
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp
++VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
++               vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
+                vn=%vn_dp vd=%vd_dp size=1
+diff --git a/target/arm/sve.decode b/target/arm/sve.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/sve.decode
++++ b/target/arm/sve.decode
+@@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz  01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5  ra=%reg_movprfx
+ USDOT_zzzz      01000100 .. 0 ..... 011 110 ..... .....  @rda_rn_rm
+ ### SVE2 floating point matrix multiply accumulate
+-
+-FMMLA           01100100 .. 1 ..... 111001 ..... .....  @rda_rn_rm
++{
++  BFMMLA        01100100 01 1 ..... 111 001 ..... .....  @rda_rn_rm_e0
++  FMMLA         01100100 .. 1 ..... 111 001 ..... .....  @rda_rn_rm
++}
+ ### SVE2 Memory Gather Load Group
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a64.c
++++ b/target/arm/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
+         }
+         feature = dc_isar_feature(aa64_fcma, s);
+         break;
++    case 0x1d: /* BFMMLA */
++        if (size != MO_16 || !is_q) {
++            unallocated_encoding(s);
++            return;
++        }
++        feature = dc_isar_feature(aa64_bf16, s);
++        break;
+     case 0x1f: /* BFDOT */
+         switch (size) {
+         case 1:
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
+         }
+         return;
++    case 0xd: /* BFMMLA */
++        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
++        return;
+     case 0xf: /* BFDOT */
+         switch (size) {
+         case 1:
+diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.c
++++ b/target/arm/translate-neon.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_VUSMMLA(DisasContext *s, arg_VUSMMLA *a)
+     return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
+                         gen_helper_gvec_usmmla_b);
+ }
++
++static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
++{
++    if (!dc_isar_feature(aa32_bf16, s)) {
++        return false;
++    }
++    return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
++                        gen_helper_gvec_bfmmla);
++}
+diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-sve.c
++++ b/target/arm/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
+     }
+     return true;
+ }
++
++static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
++{
++    if (!dc_isar_feature(aa64_sve_bf16, s)) {
++        return false;
++    }
++    if (sve_access_check(s)) {
++        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfmmla,
++                          a->rd, a->rn, a->rm, a->ra, 0);
++    }
++    return true;
++}
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vec_helper.c
++++ b/target/arm/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
+          * Process the entire segment at once, writing back the
+          * results only after we've consumed all of the inputs.
+          *
+-         * Key to indicies by column:
++         * Key to indices by column:
+          *          i   j                  i             j
+          */
+         sum0 = a[H4(0 + 0)];
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
+     }
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+ }
++
++void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
++{
++    intptr_t s, opr_sz = simd_oprsz(desc);
++    float32 *d = vd, *a = va;
++    uint32_t *n = vn, *m = vm;
++
++    for (s = 0; s < opr_sz / 4; s += 4) {
++        float32 sum00, sum01, sum10, sum11;
++
++        /*
++         * Process the entire segment at once, writing back the
++         * results only after we've consumed all of the inputs.
++         *
++         * Key to indices by column:
++         *               i   j           i   k             j   k
++         */
++        sum00 = a[s + H4(0 + 0)];
++        sum00 = bfdotadd(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)]);
++        sum00 = bfdotadd(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)]);
++
++        sum01 = a[s + H4(0 + 1)];
++        sum01 = bfdotadd(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)]);
++        sum01 = bfdotadd(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)]);
++
++        sum10 = a[s + H4(2 + 0)];
++        sum10 = bfdotadd(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)]);
++        sum10 = bfdotadd(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)]);
++
++        sum11 = a[s + H4(2 + 1)];
++        sum11 = bfdotadd(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)]);
++        sum11 = bfdotadd(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)]);
++
++        d[s + H4(0 + 0)] = sum00;
++        d[s + H4(0 + 1)] = sum01;
++        d[s + H4(2 + 0)] = sum10;
++        d[s + H4(2 + 1)] = sum11;
++    }
++    clear_tail(d, opr_sz, simd_maxsz(desc));
++}
+--
+.20.1
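
For reference, the per-segment arithmetic of gvec_bfmmla above can be sketched in plain C. This is a simplified illustration under stated assumptions, not the QEMU helper: `bf16_to_f32` and `bfmmla_segment_sketch` are hypothetical names, the inputs are unpacked into arrays of 16-bit values instead of pairs packed into uint32_t, the H4() host-endian swizzle is omitted, and ordinary float arithmetic stands in for bfdotadd(), so rounding may differ from the architected result.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a bfloat16 bit pattern to float by
 * placing it in the top 16 bits of an IEEE single. */
static float bf16_to_f32(uint16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/*
 * Hypothetical sketch of one 128-bit segment:
 *   D[i][j] = A[i][j] + sum over k of N[i][k] * M[j][k]
 * N and M are 2x4 matrices of bfloat16 stored row-major (8 values
 * each); A and D are 2x2 matrices of float stored row-major.
 */
static void bfmmla_segment_sketch(float d[4], const float a[4],
                                  const uint16_t n[8], const uint16_t m[8])
{
    float r[4];

    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            float sum = a[2 * i + j];
            for (int k = 0; k < 4; k++) {
                sum += bf16_to_f32(n[4 * i + k]) * bf16_to_f32(m[4 * j + k]);
            }
            r[2 * i + j] = sum;
        }
    }
    /* Write back only after all inputs are consumed, since d may alias a. */
    memcpy(d, r, sizeof(r));
}
```

Each pair of k values in the inner loop corresponds to one bfdotadd() call in the helper, which consumes two bfloat16 products per 32-bit input element.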

-[PULL 13/26] target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
+[PULL 24/45] target/arm: Implement bfloat widening fma (vector)
-The helper functions for performing the udot/sdot operations against
+From: Richard Henderson <richard.henderson@linaro.org>
 a scalar were not using an address-swizzling macro when converting
 the index of the scalar element into a pointer into the vm array.
 This had no effect on little-endian hosts but meant we generated
 incorrect results on big-endian hosts.
-For these insns, the index is indexing over group of 4 8-bit values,
+This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
-so 32 bits per indexed entity, and H4() is therefore what we want.
+and VFMA{B,T}.BF16 for AArch32 NEON.
 (For Neon the only possible input indexes are 0 and 1.)
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-10-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
 ---
- target/arm/vec_helper.c | 4 ++--
+ target/arm/helper.h           |  3 +++
-file changed, 2 insertions(+), 2 deletions(-)
+ target/arm/neon-shared.decode |  3 +++
  target/arm/sve.decode         |  3 +++
  target/arm/translate-a64.c    | 13 +++++++++----
  target/arm/translate-neon.c   |  9 +++++++++
  target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
  target/arm/vec_helper.c       | 16 ++++++++++++++++
 files changed, 73 insertions(+), 4 deletions(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, ptr, i32)
++
+ #ifdef TARGET_AARCH64
+ #include "helper-a64.h"
+ #include "helper-sve.h"
+diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-shared.decode
++++ b/target/arm/neon-shared.decode
+@@ -XXX,XX +XXX,XX @@ VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
+ VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp
++VFMA_b16       1111 110 0 0.11 .... .... 1000 . q:1 . 1 .... \
++               vm=%vm_dp vn=%vn_dp vd=%vd_dp
++
+ VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
+                vn=%vn_dp vd=%vd_dp size=1
+ VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
+diff --git a/target/arm/sve.decode b/target/arm/sve.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/sve.decode
++++ b/target/arm/sve.decode
+@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
+ FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
+ FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
++BFMLALB_zzzw    01100100 11 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
++BFMLALT_zzzw    01100100 11 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
++
+ ### SVE2 floating-point bfloat16 dot-product
+ BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a64.c
++++ b/target/arm/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
+         }
+         feature = dc_isar_feature(aa64_bf16, s);
+         break;
+-    case 0x1f: /* BFDOT */
++    case 0x1f:
+         switch (size) {
+-        case 1:
++        case 1: /* BFDOT */
++        case 3: /* BFMLAL{B,T} */
+             feature = dc_isar_feature(aa64_bf16, s);
+             break;
+         default:
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
+     case 0xd: /* BFMMLA */
+         gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
+         return;
+-    case 0xf: /* BFDOT */
++    case 0xf:
+         switch (size) {
+-        case 1:
++        case 1: /* BFDOT */
+             gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
+             break;
++        case 3: /* BFMLAL{B,T} */
++            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, false, is_q,
++                              gen_helper_gvec_bfmlal);
++            break;
+         default:
+             g_assert_not_reached();
+         }
+diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.c
++++ b/target/arm/translate-neon.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
+     return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
+                         gen_helper_gvec_bfmmla);
+ }
++
++static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
++{
++    if (!dc_isar_feature(aa32_bf16, s)) {
++        return false;
++    }
++    return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
++                             gen_helper_gvec_bfmlal);
++}
+diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-sve.c
++++ b/target/arm/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
+     }
+     return true;
+ }
++
++static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
++{
++    if (!dc_isar_feature(aa64_sve_bf16, s)) {
++        return false;
++    }
++    if (sve_access_check(s)) {
++        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
++        unsigned vsz = vec_full_reg_size(s);
++
++        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
++                           vec_full_reg_offset(s, a->rn),
++                           vec_full_reg_offset(s, a->rm),
++                           vec_full_reg_offset(s, a->ra),
++                           status, vsz, vsz, sel,
++                           gen_helper_gvec_bfmlal);
++        tcg_temp_free_ptr(status);
++    }
++    return true;
++}
++
++static bool trans_BFMLALB_zzzw(DisasContext *s, arg_rrrr_esz *a)
++{
++    return do_BFMLAL_zzzw(s, a, false);
++}
++
++static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
++{
++    return do_BFMLAL_zzzw(s, a, true);
++}
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
-     intptr_t index = simd_data(desc);
+     }
-     uint32_t *d = vd;
+     clear_tail(d, opr_sz, simd_maxsz(desc));
-     int8_t *n = vn;
+ }
--    int8_t *m_indexed = (int8_t *)vm + index * 4;
++
-+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
++void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
++                         void *stat, uint32_t desc)
-     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
++{
-      * Otherwise opr_sz is a multiple of 16.
++    intptr_t i, opr_sz = simd_oprsz(desc);
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
++    intptr_t sel = simd_data(desc);
-     intptr_t index = simd_data(desc);
++    float32 *d = vd, *a = va;
-     uint32_t *d = vd;
++    bfloat16 *n = vn, *m = vm;
-     uint8_t *n = vn;
++
--    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
++    for (i = 0; i < opr_sz / 4; ++i) {
-+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
++        float32 nn = n[H2(i * 2 + sel)] << 16;
++        float32 mm = m[H2(i * 2 + sel)] << 16;
-     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
++        d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], 0, stat);
-      * Otherwise opr_sz is a multiple of 16.
++    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
 .20.1
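
The bottom/top selection in gvec_bfmlal above can be illustrated with a small standalone C sketch. The names `bf16_widen` and `bfmlal_sketch` are hypothetical, the H2()/H4() host-endian swizzles are omitted, and a separate multiply and add replaces the fused float32_muladd() call, so this rounds twice where the helper rounds once. It is meant only to show which bfloat16 element feeds each 32-bit lane.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: a bfloat16 is the top half of an IEEE single,
 * so widening is a 16-bit left shift of the bit pattern. */
static float bf16_widen(uint16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/*
 * Hypothetical sketch: for each 32-bit lane i, take bfloat16 element
 * 2*i + sel from n and m (sel = 0 for the "B" form, 1 for the "T"
 * form), widen both, and accumulate their product into a[i].
 */
static void bfmlal_sketch(float *d, const float *a,
                          const uint16_t *n, const uint16_t *m,
                          int lanes, int sel)
{
    for (int i = 0; i < lanes; i++) {
        float nn = bf16_widen(n[2 * i + sel]);
        float mm = bf16_widen(m[2 * i + sel]);
        d[i] = nn * mm + a[i];   /* not fused, unlike the helper */
    }
}
```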

-[PULL 12/26] target/arm: Fix float16 pairwise Neon ops on big-endian hosts
+[PULL 25/45] target/arm: Implement bfloat widening fma (indexed)
-In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
+From: Richard Henderson <richard.henderson@linaro.org>
 meant we were using the H4() address swizzler macro rather than the
 H2() which is required for 2-byte data.  This had no effect on
 little-endian hosts but meant we put the result data into the
 destination Dreg in the wrong order on big-endian hosts.
+This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
+and VFMA{B,T}.BF16 for AArch32 NEON.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-11-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
 ---
- target/arm/vec_helper.c | 8 ++++----
+ target/arm/helper.h           |  2 ++
-file changed, 4 insertions(+), 4 deletions(-)
+ target/arm/neon-shared.decode |  2 ++
  target/arm/sve.decode         |  2 ++
  target/arm/translate-a64.c    | 15 ++++++++++++++-
  target/arm/translate-neon.c   | 10 ++++++++++
  target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
  target/arm/vec_helper.c       | 22 ++++++++++++++++++++++
 files changed, 82 insertions(+), 1 deletion(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, ptr, i32)
+ #ifdef TARGET_AARCH64
+ #include "helper-a64.h"
+diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon-shared.decode
++++ b/target/arm/neon-shared.decode
+@@ -XXX,XX +XXX,XX @@ VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
+                rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
+ VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
+                index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
++VFMA_b16_scal  1111 1110 0.11 .... .... 1000 . q:1 . 1 . vm:3 \
++               index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp
+diff --git a/target/arm/sve.decode b/target/arm/sve.decode
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/sve.decode
++++ b/target/arm/sve.decode
+@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
+ FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
+ FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
+ FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
++BFMLALB_zzxw    01100100 11 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
++BFMLALT_zzxw    01100100 11 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
+ ### SVE2 floating-point bfloat16 dot-product (indexed)
+ BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a64.c
++++ b/target/arm/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
+                 unallocated_encoding(s);
+                 return;
+             }
++            size = MO_32;
+             break;
+         case 1: /* BFDOT */
+             if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
+                 unallocated_encoding(s);
+                 return;
+             }
++            size = MO_32;
++            break;
++        case 3: /* BFMLAL{B,T} */
++            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
++                unallocated_encoding(s);
++                return;
++            }
++            /* can't set is_fp without other incorrect size checks */
++            size = MO_16;
+             break;
+         default:
+             unallocated_encoding(s);
+             return;
+         }
+-        size = MO_32;
+         break;
+     case 0x11: /* FCMLA #0 */
+     case 0x13: /* FCMLA #90 */
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
+             gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+                              gen_helper_gvec_usdot_idx_b);
+             return;
++        case 3: /* BFMLAL{B,T} */
++            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
++                              gen_helper_gvec_bfmlal_idx);
++            return;
+         }
+         g_assert_not_reached();
+     case 0x11: /* FCMLA #0 */
+diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.c
++++ b/target/arm/translate-neon.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
+     return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
+                              gen_helper_gvec_bfmlal);
+ }
++
++static bool trans_VFMA_b16_scal(DisasContext *s, arg_VFMA_b16_scal *a)
++{
++    if (!dc_isar_feature(aa32_bf16, s)) {
++        return false;
++    }
++    return do_neon_ddda_fpst(s, 6, a->vd, a->vn, a->vm,
++                             (a->index << 1) | a->q, FPST_STD,
++                             gen_helper_gvec_bfmlal_idx);
++}
+diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-sve.c
++++ b/target/arm/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
+ {
+     return do_BFMLAL_zzzw(s, a, true);
+ }
++
++static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
++{
++    if (!dc_isar_feature(aa64_sve_bf16, s)) {
++        return false;
++    }
++    if (sve_access_check(s)) {
++        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
++        unsigned vsz = vec_full_reg_size(s);
++
++        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
++                           vec_full_reg_offset(s, a->rn),
++                           vec_full_reg_offset(s, a->rm),
++                           vec_full_reg_offset(s, a->ra),
++                           status, vsz, vsz, (a->index << 1) | sel,
++                           gen_helper_gvec_bfmlal_idx);
++        tcg_temp_free_ptr(status);
++    }
++    return true;
++}
++
++static bool trans_BFMLALB_zzxw(DisasContext *s, arg_rrxr_esz *a)
++{
++    return do_BFMLAL_zzxw(s, a, false);
++}
++
++static bool trans_BFMLALT_zzxw(DisasContext *s, arg_rrxr_esz *a)
++{
++    return do_BFMLAL_zzxw(s, a, true);
++}
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
          r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
          r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
                                                                          \
 -        d[H4(0)] = r0;                                                  \
 -        d[H4(1)] = r1;                                                  \
 -        d[H4(2)] = r2;                                                  \
 -        d[H4(3)] = r3;                                                  \
 +        d[H2(0)] = r0;                                                  \
 +        d[H2(1)] = r1;                                                  \
 +        d[H2(2)] = r2;                                                  \
 +        d[H2(3)] = r3;                                                  \
      }
+     clear_tail(d, opr_sz, simd_maxsz(desc));
- DO_NEON_PAIRWISE(neon_padd, add)
+ }
 +
 +void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm,
 +                             void *va, void *stat, uint32_t desc)
 +{
 +    intptr_t i, j, opr_sz = simd_oprsz(desc);
 +    intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 1, 3);
 +    intptr_t elements = opr_sz / 4;
 +    intptr_t eltspersegment = MIN(16 / 4, elements);
 +    float32 *d = vd, *a = va;
 +    bfloat16 *n = vn, *m = vm;
 +
 +    for (i = 0; i < elements; i += eltspersegment) {
 +        float32 m_idx = m[H2(2 * i + index)] << 16;
 +
 +        for (j = i; j < i + eltspersegment; j++) {
 +            float32 n_j = n[H2(2 * j + sel)] << 16;
 +            d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], 0, stat);
 +        }
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
 .20.1
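
The indexed form packs the element index and the bottom/top selector into one data field as `(index << 1) | sel`, and gvec_bfmlal_idx unpacks them again. A minimal C sketch of that packing and of the per-segment indexing follows. The names `bf16_widen`, `pack_idx_sel` and `bfmlal_idx_sketch` are hypothetical, the data value is passed directly instead of through a simd descriptor, host-endian swizzling is omitted, and the multiply-add is not fused.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a bfloat16 bit pattern to float. */
static float bf16_widen(uint16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Hypothetical helper: bit 0 holds sel, bits 1..3 hold the index. */
static unsigned pack_idx_sel(unsigned index, unsigned sel)
{
    return (index << 1) | (sel & 1);
}

/*
 * Hypothetical sketch: within each 128-bit segment (four 32-bit
 * lanes, eight bfloat16 elements) one element of m, chosen by index,
 * is multiplied against the sel-selected element of every lane of n.
 */
static void bfmlal_idx_sketch(float *d, const float *a,
                              const uint16_t *n, const uint16_t *m,
                              int lanes, unsigned data)
{
    unsigned sel = data & 1;
    unsigned index = (data >> 1) & 7;
    int per_segment = lanes < 4 ? lanes : 4;

    for (int i = 0; i < lanes; i += per_segment) {
        /* 2 * i is the first bfloat16 element of this segment. */
        float m_idx = bf16_widen(m[2 * i + index]);

        for (int j = i; j < i + per_segment; j++) {
            d[j] = bf16_widen(n[2 * j + sel]) * m_idx + a[j];
        }
    }
}
```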

-[PULL 10/26] target/arm: Simplify do_long_3d and do_2scalar_long
+[PULL 26/45] linux-user/aarch64: Enable hwcap bits for bfloat16
 From: Richard Henderson <richard.henderson@linaro.org>
-In both cases, we can sink the write-back and perform
-the accumulate into the normal destination temps.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
+Message-id: 20210525225817.400336-12-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-neon.c.inc | 23 +++++++++--------------
+ linux-user/elfload.c | 2 ++
-file changed, 9 insertions(+), 14 deletions(-)
+file changed, 2 insertions(+)
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+diff --git a/linux-user/elfload.c b/linux-user/elfload.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
+--- a/linux-user/elfload.c
-+++ b/target/arm/translate-neon.c.inc
++++ b/linux-user/elfload.c
-@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
+@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap2(void)
-     if (accfn) {
+     GET_FEATURE_ID(aa64_sve_i8mm, ARM_HWCAP2_A64_SVEI8MM);
-         tmp = tcg_temp_new_i64();
+     GET_FEATURE_ID(aa64_sve_f32mm, ARM_HWCAP2_A64_SVEF32MM);
-         read_neon_element64(tmp, a->vd, 0, MO_64);
+     GET_FEATURE_ID(aa64_sve_f64mm, ARM_HWCAP2_A64_SVEF64MM);
--        accfn(tmp, tmp, rd0);
++    GET_FEATURE_ID(aa64_sve_bf16, ARM_HWCAP2_A64_SVEBF16);
--        write_neon_element64(tmp, a->vd, 0, MO_64);
+     GET_FEATURE_ID(aa64_i8mm, ARM_HWCAP2_A64_I8MM);
-+        accfn(rd0, tmp, rd0);
++    GET_FEATURE_ID(aa64_bf16, ARM_HWCAP2_A64_BF16);
-         read_neon_element64(tmp, a->vd, 1, MO_64);
+     GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
--        accfn(tmp, tmp, rd1);
+     GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
--        write_neon_element64(tmp, a->vd, 1, MO_64);
+     GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
 +        accfn(rd1, tmp, rd1);
          tcg_temp_free_i64(tmp);
 -    } else {
 -        write_neon_element64(rd0, a->vd, 0, MO_64);
 -        write_neon_element64(rd1, a->vd, 1, MO_64);
      }
 +    write_neon_element64(rd0, a->vd, 0, MO_64);
 +    write_neon_element64(rd1, a->vd, 1, MO_64);
      tcg_temp_free_i64(rd0);
      tcg_temp_free_i64(rd1);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      if (accfn) {
          TCGv_i64 t64 = tcg_temp_new_i64();
          read_neon_element64(t64, a->vd, 0, MO_64);
 -        accfn(t64, t64, rn0_64);
 -        write_neon_element64(t64, a->vd, 0, MO_64);
 +        accfn(rn0_64, t64, rn0_64);
          read_neon_element64(t64, a->vd, 1, MO_64);
 -        accfn(t64, t64, rn1_64);
 -        write_neon_element64(t64, a->vd, 1, MO_64);
 +        accfn(rn1_64, t64, rn1_64);
          tcg_temp_free_i64(t64);
 -    } else {
 -        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 -        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
 +
 +    write_neon_element64(rn0_64, a->vd, 0, MO_64);
 +    write_neon_element64(rn1_64, a->vd, 1, MO_64);
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
      return true;
 --
 .20.1

-[PULL 04/26] target/arm: Use neon_element_offset in vfp_reg_offset
+[PULL 27/45] target/arm: Enable BFloat16 extensions
 From: Richard Henderson <richard.henderson@linaro.org>
-This seems a bit more readable than using offsetof CPU_DoubleU.
+Disable BF16 again for !have_neon and !have_vfp during realize.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
+Message-id: 20210525225817.400336-13-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c | 13 ++++---------
+ target/arm/cpu.c     | 3 +++
-file changed, 4 insertions(+), 9 deletions(-)
+ target/arm/cpu64.c   | 3 +++
  target/arm/cpu_tcg.c | 1 +
 files changed, 7 insertions(+)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/cpu.c
-+++ b/target/arm/translate.c
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
-     return neon_full_reg_offset(reg) + ofs;
- }
+         u = cpu->isar.id_isar6;
+         u = FIELD_DP32(u, ID_ISAR6, JSCVT, 0);
--static inline long vfp_reg_offset(bool dp, unsigned reg)
++        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
-+/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
+         cpu->isar.id_isar6 = u;
-+static long vfp_reg_offset(bool dp, unsigned reg)
- {
+         u = cpu->isar.mvfr0;
-     if (dp) {
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
--        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
-+        return neon_element_offset(reg, 0, MO_64);
+         t = cpu->isar.id_aa64isar1;
-     } else {
+         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 0);
--        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
++        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 0);
--        if (reg & 1) {
+         t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 0);
--            ofs += offsetof(CPU_DoubleU, l.upper);
+         cpu->isar.id_aa64isar1 = t;
--        } else {
--            ofs += offsetof(CPU_DoubleU, l.lower);
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
--        }
+         u = cpu->isar.id_isar6;
--        return ofs;
+         u = FIELD_DP32(u, ID_ISAR6, DP, 0);
-+        return neon_element_offset(reg >> 1, reg & 1, MO_32);
+         u = FIELD_DP32(u, ID_ISAR6, FHM, 0);
-     }
++        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
- }
+         u = FIELD_DP32(u, ID_ISAR6, I8MM, 0);
          cpu->isar.id_isar6 = u;
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
          t = FIELD_DP64(t, ID_AA64ISAR1, SB, 1);
          t = FIELD_DP64(t, ID_AA64ISAR1, SPECRES, 1);
 +        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 1);
          t = FIELD_DP64(t, ID_AA64ISAR1, FRINTTS, 1);
          t = FIELD_DP64(t, ID_AA64ISAR1, LRCPC, 2); /* ARMv8.4-RCPC */
          t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 1);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1);
          t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2);  /* PMULL */
          t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1);
 +        t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 1);
          t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1);
          t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1);
          t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          u = FIELD_DP32(u, ID_ISAR6, FHM, 1);
          u = FIELD_DP32(u, ID_ISAR6, SB, 1);
          u = FIELD_DP32(u, ID_ISAR6, SPECRES, 1);
 +        u = FIELD_DP32(u, ID_ISAR6, BF16, 1);
          u = FIELD_DP32(u, ID_ISAR6, I8MM, 1);
          cpu->isar.id_isar6 = u;
 diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu_tcg.c
 +++ b/target/arm/cpu_tcg.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
          t = FIELD_DP32(t, ID_ISAR6, FHM, 1);
          t = FIELD_DP32(t, ID_ISAR6, SB, 1);
          t = FIELD_DP32(t, ID_ISAR6, SPECRES, 1);
 +        t = FIELD_DP32(t, ID_ISAR6, BF16, 1);
          t = FIELD_DP32(t, ID_ISAR6, I8MM, 1);
          cpu->isar.id_isar6 = t;
 --
 .20.1
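
Note on the patch above: the FIELD_DP32/FIELD_DP64 lines each overwrite one named bitfield of an ID register value and leave the other fields untouched. The sketch below illustrates that read-modify-write with a hypothetical helper; it is not QEMU's macro. The field position used for BF16 in ID_AA64ISAR1 (bits [47:44]) is an assumption taken from the Arm architecture documentation, not from this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not QEMU's FIELD_DP64): replace the `length`-bit
 * field of `reg` that starts at bit `shift` with `val`.
 * Requires 0 < length < 64 and shift + length <= 64.
 */
static uint64_t field_deposit64(uint64_t reg, unsigned shift,
                                unsigned length, uint64_t val)
{
    uint64_t mask = ((1ULL << length) - 1) << shift;

    return (reg & ~mask) | ((val << shift) & mask);
}

/* Assumed field layout for illustration only. */
#define SKETCH_BF16_SHIFT  44
#define SKETCH_BF16_LENGTH 4

static void field_deposit_selfcheck(void)
{
    uint64_t t = 0x00f0000000000001ULL;   /* other fields already set */

    /* Advertise the feature: the field becomes 1, neighbours are kept. */
    t = field_deposit64(t, SKETCH_BF16_SHIFT, SKETCH_BF16_LENGTH, 1);
    assert(t == 0x00f0100000000001ULL);

    /* Hide it again, as the realize path does when FP/Neon is disabled. */
    t = field_deposit64(t, SKETCH_BF16_SHIFT, SKETCH_BF16_LENGTH, 0);
    assert(t == 0x00f0000000000001ULL);
}
```

The same pattern explains why the "max" CPU init functions set the field to 1 while arm_cpu_realizefn() writes 0 when the prerequisite units are absent.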

-[PULL 19/26] hw/display/omap_lcdc: Fix potential NULL pointer dereference
+[PULL 28/45] hvf: Move assert_hvf_ok() into common directory
-From: AlexChen <alex.chen@huawei.com>
+From: Alexander Graf <agraf@csgraf.de>
-In omap_lcd_interrupts(), the pointer omap_lcd is dereferenced before
+Until now, Hypervisor.framework has only been available on x86_64 systems.
-being checked for validity, which may lead to a NULL pointer dereference.
+With Apple Silicon shipping now, it extends its reach to aarch64. To
-So move the assignment to surface after checking that the omap_lcd is valid
+prepare for support for multiple architectures, let's start moving common
-and move surface_bits_per_pixel(surface) to after the surface assignment.
+code out into its own accel directory.
-Reported-by: Euler Robot <euler.robot@huawei.com>
+This patch moves assert_hvf_ok() and introduces generic build infrastructure.
-Signed-off-by: AlexChen <alex.chen@huawei.com>
-Message-id: 5F9CDB8A.9000001@huawei.com
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
 Reviewed-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20210519202253.76782-2-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/display/omap_lcdc.c | 10 +++++++---
+ include/sysemu/hvf_int.h | 18 +++++++++++++++
-file changed, 7 insertions(+), 3 deletions(-)
+ accel/hvf/hvf-all.c      | 47 ++++++++++++++++++++++++++++++++++++++++
+ target/i386/hvf/hvf.c    | 33 +---------------------------
-diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
+ MAINTAINERS              |  8 +++++++
-index XXXXXXX..XXXXXXX 100644
+ accel/hvf/meson.build    |  6 +++++
---- a/hw/display/omap_lcdc.c
+ accel/meson.build        |  1 +
-+++ b/hw/display/omap_lcdc.c
+files changed, 81 insertions(+), 32 deletions(-)
-@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
+ create mode 100644 include/sysemu/hvf_int.h
- static void omap_update_display(void *opaque)
+ create mode 100644 accel/hvf/hvf-all.c
- {
+ create mode 100644 accel/hvf/meson.build
-     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
--    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
+diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
-+    DisplaySurface *surface;
+new file mode 100644
-     draw_line_func draw_line;
+index XXXXXXX..XXXXXXX
-     int size, height, first, last;
+--- /dev/null
-     int width, linesize, step, bpp, frame_offset;
++++ b/include/sysemu/hvf_int.h
-     hwaddr frame_base;
+@@ -XXX,XX +XXX,XX @@
++/*
--    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
++ * QEMU Hypervisor.framework (HVF) support
--        !surface_bits_per_pixel(surface)) {
++ *
-+    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
++ * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + *
 + */
 +
 +/* header to be included in HVF-specific code */
 +
 +#ifndef HVF_INT_H
 +#define HVF_INT_H
 +
 +#include <Hypervisor/hv.h>
 +
 +void assert_hvf_ok(hv_return_t ret);
 +
 +#endif
 diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/accel/hvf/hvf-all.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * QEMU Hypervisor.framework support
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.  See
 + * the COPYING file in the top-level directory.
 + *
 + * Contributions after 2012-01-13 are licensed under the terms of the
 + * GNU GPL, version 2 or (at your option) any later version.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu-common.h"
 +#include "qemu/error-report.h"
 +#include "sysemu/hvf.h"
 +#include "sysemu/hvf_int.h"
 +
 +void assert_hvf_ok(hv_return_t ret)
 +{
 +    if (ret == HV_SUCCESS) {
 +        return;
 +    }
 +
-+    surface = qemu_console_surface(omap_lcd->con);
++    switch (ret) {
-+    if (!surface_bits_per_pixel(surface)) {
++    case HV_ERROR:
-         return;
++        error_report("Error: HV_ERROR");
-     }
++        break;
++    case HV_BUSY:
 +        error_report("Error: HV_BUSY");
 +        break;
 +    case HV_BAD_ARGUMENT:
 +        error_report("Error: HV_BAD_ARGUMENT");
 +        break;
 +    case HV_NO_RESOURCES:
 +        error_report("Error: HV_NO_RESOURCES");
 +        break;
 +    case HV_NO_DEVICE:
 +        error_report("Error: HV_NO_DEVICE");
 +        break;
 +    case HV_UNSUPPORTED:
 +        error_report("Error: HV_UNSUPPORTED");
 +        break;
 +    default:
 +        error_report("Unknown Error");
 +    }
 +
 +    abort();
 +}
 diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/hvf.c
 +++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/error-report.h"
  #include "sysemu/hvf.h"
 +#include "sysemu/hvf_int.h"
  #include "sysemu/runstate.h"
  #include "hvf-i386.h"
  #include "vmcs.h"
@@ -XXX,XX +XXX,XX @@
  HVFState *hvf_state;
 -static void assert_hvf_ok(hv_return_t ret)
 -{
 -    if (ret == HV_SUCCESS) {
 -        return;
 -    }
 -
 -    switch (ret) {
 -    case HV_ERROR:
 -        error_report("Error: HV_ERROR");
 -        break;
 -    case HV_BUSY:
 -        error_report("Error: HV_BUSY");
 -        break;
 -    case HV_BAD_ARGUMENT:
 -        error_report("Error: HV_BAD_ARGUMENT");
 -        break;
 -    case HV_NO_RESOURCES:
 -        error_report("Error: HV_NO_RESOURCES");
 -        break;
 -    case HV_NO_DEVICE:
 -        error_report("Error: HV_NO_DEVICE");
 -        break;
 -    case HV_UNSUPPORTED:
 -        error_report("Error: HV_UNSUPPORTED");
 -        break;
 -    default:
 -        error_report("Unknown Error");
 -    }
 -
 -    abort();
 -}
 -
  /* Memory slots */
  hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
  {
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ M: Roman Bolshakov <r.bolshakov@yadro.com>
  W: https://wiki.qemu.org/Features/HVF
  S: Maintained
  F: target/i386/hvf/
 +
 +HVF
 +M: Cameron Esfahani <dirty@apple.com>
 +M: Roman Bolshakov <r.bolshakov@yadro.com>
 +W: https://wiki.qemu.org/Features/HVF
 +S: Maintained
 +F: accel/hvf/
  F: include/sysemu/hvf.h
 +F: include/sysemu/hvf_int.h
  WHPX CPUs
  M: Sunil Muthuswamy <sunilmut@microsoft.com>
 diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/accel/hvf/meson.build
@@ -XXX,XX +XXX,XX @@
 +hvf_ss = ss.source_set()
 +hvf_ss.add(files(
 +  'hvf-all.c',
 +))
 +
 +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
 diff --git a/accel/meson.build b/accel/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/meson.build
 +++ b/accel/meson.build
@@ -XXX,XX +XXX,XX @@ specific_ss.add(files('accel-common.c'))
  softmmu_ss.add(files('accel-softmmu.c'))
  user_ss.add(files('accel-user.c'))
 +subdir('hvf')
  subdir('qtest')
  subdir('kvm')
  subdir('tcg')
 --
 .20.1

-New patch
+[PULL 29/45] hvf: Move vcpu thread functions into common directory
+From: Alexander Graf <agraf@csgraf.de>
+Until now, Hypervisor.framework has only been available on x86_64 systems.
+With Apple Silicon shipping now, it extends its reach to aarch64. To
+prepare for support for multiple architectures, let's start moving common
+code out into its own accel directory.
+This patch moves the vCPU thread loop over.
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
+Message-id: 20210519202253.76782-3-agraf@csgraf.de
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ {target/i386 => accel}/hvf/hvf-accel-ops.h | 0
+ {target/i386 => accel}/hvf/hvf-accel-ops.c | 0
+ target/i386/hvf/x86hvf.c                   | 2 +-
+ accel/hvf/meson.build                      | 1 +
+ target/i386/hvf/meson.build                | 1 -
+files changed, 2 insertions(+), 2 deletions(-)
+ rename {target/i386 => accel}/hvf/hvf-accel-ops.h (100%)
+ rename {target/i386 => accel}/hvf/hvf-accel-ops.c (100%)
+diff --git a/target/i386/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
+similarity index 100%
+rename from target/i386/hvf/hvf-accel-ops.h
+rename to accel/hvf/hvf-accel-ops.h
+diff --git a/target/i386/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
+similarity index 100%
+rename from target/i386/hvf/hvf-accel-ops.c
+rename to accel/hvf/hvf-accel-ops.c
+diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/i386/hvf/x86hvf.c
++++ b/target/i386/hvf/x86hvf.c
+@@ -XXX,XX +XXX,XX @@
+ #include <Hypervisor/hv.h>
+ #include <Hypervisor/hv_vmx.h>
+-#include "hvf-accel-ops.h"
++#include "accel/hvf/hvf-accel-ops.h"
+ void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
+                      SegmentCache *qseg, bool is_tr)
+diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
+index XXXXXXX..XXXXXXX 100644
+--- a/accel/hvf/meson.build
++++ b/accel/hvf/meson.build
+@@ -XXX,XX +XXX,XX @@
+ hvf_ss = ss.source_set()
+ hvf_ss.add(files(
+   'hvf-all.c',
++  'hvf-accel-ops.c',
+ ))
+ specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
+diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
+index XXXXXXX..XXXXXXX 100644
+--- a/target/i386/hvf/meson.build
++++ b/target/i386/hvf/meson.build
+@@ -XXX,XX +XXX,XX @@
+ i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
+   'hvf.c',
+-  'hvf-accel-ops.c',
+   'x86.c',
+   'x86_cpuid.c',
+   'x86_decode.c',
+--
+.20.1

-[PULL 03/26] target/arm: Use neon_element_offset in neon_load/store_reg
+[PULL 30/45] hvf: Move cpu functions into common directory
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Alexander Graf <agraf@csgraf.de>
-These are the only users of neon_reg_offset, so remove that.
+Until now, Hypervisor.framework has only been available on x86_64 systems.
 With Apple Silicon shipping now, it extends its reach to aarch64. To
 prepare for support for multiple architectures, let's start moving common
 code out into its own accel directory.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+This patch moves CPU and memory operations over. While at it, make sure
-Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
+the code is consumable on non-i386 systems.
 Signed-off-by: Alexander Graf <agraf@csgraf.de>
 Reviewed-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20210519202253.76782-4-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c | 14 ++------------
+ include/sysemu/hvf_int.h   |   4 +
-file changed, 2 insertions(+), 12 deletions(-)
+ target/i386/hvf/hvf-i386.h |   2 -
  target/i386/hvf/x86hvf.h   |   2 -
  accel/hvf/hvf-accel-ops.c  | 308 ++++++++++++++++++++++++++++++++++++-
  target/i386/hvf/hvf.c      | 302 ------------------------------------
 files changed, 311 insertions(+), 307 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/include/sysemu/hvf_int.h
-+++ b/target/arm/translate.c
++++ b/include/sysemu/hvf_int.h
-@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
+@@ -XXX,XX +XXX,XX @@
  #include <Hypervisor/hv.h>
 +void hvf_set_phys_mem(MemoryRegionSection *, bool);
  void assert_hvf_ok(hv_return_t ret);
 +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
 +int hvf_put_registers(CPUState *);
 +int hvf_get_registers(CPUState *);
  #endif
 diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/hvf-i386.h
 +++ b/target/i386/hvf/hvf-i386.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
  };
  extern HVFState *hvf_state;
 -void hvf_set_phys_mem(MemoryRegionSection *, bool);
  void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
 -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
  #ifdef NEED_CPU_H
  /* Functions exported to host specific mode */
 diff --git a/target/i386/hvf/x86hvf.h b/target/i386/hvf/x86hvf.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/x86hvf.h
 +++ b/target/i386/hvf/x86hvf.h
@@ -XXX,XX +XXX,XX @@
  #include "x86_descr.h"
  int hvf_process_events(CPUState *);
 -int hvf_put_registers(CPUState *);
 -int hvf_get_registers(CPUState *);
  bool hvf_inject_interrupts(CPUState *);
  void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
                       SegmentCache *qseg, bool is_tr);
 diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/hvf/hvf-accel-ops.c
 +++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/osdep.h"
  #include "qemu/error-report.h"
  #include "qemu/main-loop.h"
 +#include "exec/address-spaces.h"
 +#include "exec/exec-all.h"
 +#include "sysemu/cpus.h"
  #include "sysemu/hvf.h"
 +#include "sysemu/hvf_int.h"
  #include "sysemu/runstate.h"
 -#include "target/i386/cpu.h"
  #include "qemu/guest-random.h"
  #include "hvf-accel-ops.h"
 +HVFState *hvf_state;
 +
 +/* Memory slots */
 +
 +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
 +{
 +    hvf_slot *slot;
 +    int x;
 +    for (x = 0; x < hvf_state->num_slots; ++x) {
 +        slot = &hvf_state->slots[x];
 +        if (slot->size && start < (slot->start + slot->size) &&
 +            (start + size) > slot->start) {
 +            return slot;
 +        }
 +    }
 +    return NULL;
 +}
 +
 +struct mac_slot {
 +    int present;
 +    uint64_t size;
 +    uint64_t gpa_start;
 +    uint64_t gva;
 +};
 +
 +struct mac_slot mac_slots[32];
 +
 +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
 +{
 +    struct mac_slot *macslot;
 +    hv_return_t ret;
 +
 +    macslot = &mac_slots[slot->slot_id];
 +
 +    if (macslot->present) {
 +        if (macslot->size != slot->size) {
 +            macslot->present = 0;
 +            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
 +            assert_hvf_ok(ret);
 +        }
 +    }
 +
 +    if (!slot->size) {
 +        return 0;
 +    }
 +
 +    macslot->present = 1;
 +    macslot->gpa_start = slot->start;
 +    macslot->size = slot->size;
 +    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
 +    assert_hvf_ok(ret);
 +    return 0;
 +}
 +
 +void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
 +{
 +    hvf_slot *mem;
 +    MemoryRegion *area = section->mr;
 +    bool writeable = !area->readonly && !area->rom_device;
 +    hv_memory_flags_t flags;
 +
 +    if (!memory_region_is_ram(area)) {
 +        if (writeable) {
 +            return;
 +        } else if (!memory_region_is_romd(area)) {
 +            /*
 +             * If the memory device is not in romd_mode, then we actually want
 +             * to remove the hvf memory slot so all accesses will trap.
 +             */
 +             add = false;
 +        }
 +    }
 +
 +    mem = hvf_find_overlap_slot(
 +            section->offset_within_address_space,
 +            int128_get64(section->size));
 +
 +    if (mem && add) {
 +        if (mem->size == int128_get64(section->size) &&
 +            mem->start == section->offset_within_address_space &&
 +            mem->mem == (memory_region_get_ram_ptr(area) +
 +            section->offset_within_region)) {
 +            return; /* Same region was attempted to register, go away. */
 +        }
 +    }
 +
 +    /* Region needs to be reset. set the size to 0 and remap it. */
 +    if (mem) {
 +        mem->size = 0;
 +        if (do_hvf_set_memory(mem, 0)) {
 +            error_report("Failed to reset overlapping slot");
 +            abort();
 +        }
 +    }
 +
 +    if (!add) {
 +        return;
 +    }
 +
 +    if (area->readonly ||
 +        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
 +        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
 +    } else {
 +        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
 +    }
 +
 +    /* Now make a new slot. */
 +    int x;
 +
 +    for (x = 0; x < hvf_state->num_slots; ++x) {
 +        mem = &hvf_state->slots[x];
 +        if (!mem->size) {
 +            break;
 +        }
 +    }
 +
 +    if (x == hvf_state->num_slots) {
 +        error_report("No free slots");
 +        abort();
 +    }
 +
 +    mem->size = int128_get64(section->size);
 +    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
 +    mem->start = section->offset_within_address_space;
 +    mem->region = area;
 +
 +    if (do_hvf_set_memory(mem, flags)) {
 +        error_report("Error registering new memory slot");
 +        abort();
 +    }
 +}
 +
 +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
 +{
 +    if (!cpu->vcpu_dirty) {
 +        hvf_get_registers(cpu);
 +        cpu->vcpu_dirty = true;
 +    }
 +}
 +
 +void hvf_cpu_synchronize_state(CPUState *cpu)
 +{
 +    if (!cpu->vcpu_dirty) {
 +        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
 +    }
 +}
 +
 +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
 +                                              run_on_cpu_data arg)
 +{
 +    hvf_put_registers(cpu);
 +    cpu->vcpu_dirty = false;
 +}
 +
 +void hvf_cpu_synchronize_post_reset(CPUState *cpu)
 +{
 +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
 +}
 +
 +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
 +                                             run_on_cpu_data arg)
 +{
 +    hvf_put_registers(cpu);
 +    cpu->vcpu_dirty = false;
 +}
 +
 +void hvf_cpu_synchronize_post_init(CPUState *cpu)
 +{
 +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 +}
 +
 +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
 +                                              run_on_cpu_data arg)
 +{
 +    cpu->vcpu_dirty = true;
 +}
 +
 +void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
 +{
 +    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
 +}
 +
 +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
 +{
 +    hvf_slot *slot;
 +
 +    slot = hvf_find_overlap_slot(
 +            section->offset_within_address_space,
 +            int128_get64(section->size));
 +
 +    /* protect region against writes; begin tracking it */
 +    if (on) {
 +        slot->flags |= HVF_SLOT_LOG;
 +        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
 +                      HV_MEMORY_READ);
 +    /* stop tracking region*/
 +    } else {
 +        slot->flags &= ~HVF_SLOT_LOG;
 +        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
 +                      HV_MEMORY_READ | HV_MEMORY_WRITE);
 +    }
 +}
 +
 +static void hvf_log_start(MemoryListener *listener,
 +                          MemoryRegionSection *section, int old, int new)
 +{
 +    if (old != 0) {
 +        return;
 +    }
 +
 +    hvf_set_dirty_tracking(section, 1);
 +}
 +
 +static void hvf_log_stop(MemoryListener *listener,
 +                         MemoryRegionSection *section, int old, int new)
 +{
 +    if (new != 0) {
 +        return;
 +    }
 +
 +    hvf_set_dirty_tracking(section, 0);
 +}
 +
 +static void hvf_log_sync(MemoryListener *listener,
 +                         MemoryRegionSection *section)
 +{
 +    /*
 +     * sync of dirty pages is handled elsewhere; just make sure we keep
 +     * tracking the region.
 +     */
 +    hvf_set_dirty_tracking(section, 1);
 +}
 +
 +static void hvf_region_add(MemoryListener *listener,
 +                           MemoryRegionSection *section)
 +{
 +    hvf_set_phys_mem(section, true);
 +}
 +
 +static void hvf_region_del(MemoryListener *listener,
 +                           MemoryRegionSection *section)
 +{
 +    hvf_set_phys_mem(section, false);
 +}
 +
 +static MemoryListener hvf_memory_listener = {
 +    .priority = 10,
 +    .region_add = hvf_region_add,
 +    .region_del = hvf_region_del,
 +    .log_start = hvf_log_start,
 +    .log_stop = hvf_log_stop,
 +    .log_sync = hvf_log_sync,
 +};
 +
 +static void dummy_signal(int sig)
 +{
 +}
 +
 +bool hvf_allowed;
 +
 +static int hvf_accel_init(MachineState *ms)
 +{
 +    int x;
 +    hv_return_t ret;
 +    HVFState *s;
 +
 +    ret = hv_vm_create(HV_VM_DEFAULT);
 +    assert_hvf_ok(ret);
 +
 +    s = g_new0(HVFState, 1);
 +
 +    s->num_slots = 32;
 +    for (x = 0; x < s->num_slots; ++x) {
 +        s->slots[x].size = 0;
 +        s->slots[x].slot_id = x;
 +    }
 +
 +    hvf_state = s;
 +    memory_listener_register(&hvf_memory_listener, &address_space_memory);
 +    return 0;
 +}
 +
 +static void hvf_accel_class_init(ObjectClass *oc, void *data)
 +{
 +    AccelClass *ac = ACCEL_CLASS(oc);
 +    ac->name = "HVF";
 +    ac->init_machine = hvf_accel_init;
 +    ac->allowed = &hvf_allowed;
 +}
 +
 +static const TypeInfo hvf_accel_type = {
 +    .name = TYPE_HVF_ACCEL,
 +    .parent = TYPE_ACCEL,
 +    .class_init = hvf_accel_class_init,
 +};
 +
 +static void hvf_type_init(void)
 +{
 +    type_register_static(&hvf_accel_type);
 +}
 +
 +type_init(hvf_type_init);
 +
  /*
   * The HVF-specific vCPU thread function. This one should only run when the host
   * CPU supports the VMX "unrestricted guest" feature.
 diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/hvf.c
 +++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
  #include "hvf-accel-ops.h"
 -HVFState *hvf_state;
 -
 -/* Memory slots */
 -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
 -{
 -    hvf_slot *slot;
 -    int x;
 -    for (x = 0; x < hvf_state->num_slots; ++x) {
 -        slot = &hvf_state->slots[x];
 -        if (slot->size && start < (slot->start + slot->size) &&
 -            (start + size) > slot->start) {
 -            return slot;
 -        }
 -    }
 -    return NULL;
 -}
 -
 -struct mac_slot {
 -    int present;
 -    uint64_t size;
 -    uint64_t gpa_start;
 -    uint64_t gva;
 -};
 -
 -struct mac_slot mac_slots[32];
 -
 -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
 -{
 -    struct mac_slot *macslot;
 -    hv_return_t ret;
 -
 -    macslot = &mac_slots[slot->slot_id];
 -
 -    if (macslot->present) {
 -        if (macslot->size != slot->size) {
 -            macslot->present = 0;
 -            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
 -            assert_hvf_ok(ret);
 -        }
 -    }
 -
 -    if (!slot->size) {
 -        return 0;
 -    }
 -
 -    macslot->present = 1;
 -    macslot->gpa_start = slot->start;
 -    macslot->size = slot->size;
 -    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
 -    assert_hvf_ok(ret);
 -    return 0;
 -}
 -
 -void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
 -{
 -    hvf_slot *mem;
 -    MemoryRegion *area = section->mr;
 -    bool writeable = !area->readonly && !area->rom_device;
 -    hv_memory_flags_t flags;
 -
 -    if (!memory_region_is_ram(area)) {
 -        if (writeable) {
 -            return;
 -        } else if (!memory_region_is_romd(area)) {
 -            /*
 -             * If the memory device is not in romd_mode, then we actually want
 -             * to remove the hvf memory slot so all accesses will trap.
 -             */
 -             add = false;
 -        }
 -    }
 -
 -    mem = hvf_find_overlap_slot(
 -            section->offset_within_address_space,
 -            int128_get64(section->size));
 -
 -    if (mem && add) {
 -        if (mem->size == int128_get64(section->size) &&
 -            mem->start == section->offset_within_address_space &&
 -            mem->mem == (memory_region_get_ram_ptr(area) +
 -            section->offset_within_region)) {
 -            return; /* Same region was attempted to register, go away. */
 -        }
 -    }
 -
 -    /* Region needs to be reset. set the size to 0 and remap it. */
 -    if (mem) {
 -        mem->size = 0;
 -        if (do_hvf_set_memory(mem, 0)) {
 -            error_report("Failed to reset overlapping slot");
 -            abort();
 -        }
 -    }
 -
 -    if (!add) {
 -        return;
 -    }
 -
 -    if (area->readonly ||
 -        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
 -        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
 -    } else {
 -        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
 -    }
 -
 -    /* Now make a new slot. */
 -    int x;
 -
 -    for (x = 0; x < hvf_state->num_slots; ++x) {
 -        mem = &hvf_state->slots[x];
 -        if (!mem->size) {
 -            break;
 -        }
 -    }
 -
 -    if (x == hvf_state->num_slots) {
 -        error_report("No free slots");
 -        abort();
 -    }
 -
 -    mem->size = int128_get64(section->size);
 -    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
 -    mem->start = section->offset_within_address_space;
 -    mem->region = area;
 -
 -    if (do_hvf_set_memory(mem, flags)) {
 -        error_report("Error registering new memory slot");
 -        abort();
 -    }
 -}
 -
  void vmx_update_tpr(CPUState *cpu)
  {
      /* TODO: need integrate APIC handling */
@@ -XXX,XX +XXX,XX @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
      }
  }
--/* Return the offset of a 32-bit piece of a NEON register.
+-static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
--   zero is the least significant end of the register.  */
+-{
--static inline long
+-    if (!cpu->vcpu_dirty) {
--neon_reg_offset (int reg, int n)
+-        hvf_get_registers(cpu);
--{
+-        cpu->vcpu_dirty = true;
--    int sreg;
+-    }
--    sreg = reg * 2 + n;
+-}
--    return vfp_reg_offset(0, sreg);
+-
--}
+-void hvf_cpu_synchronize_state(CPUState *cpu)
--
+-{
- static TCGv_i32 neon_load_reg(int reg, int pass)
+-    if (!cpu->vcpu_dirty) {
 -        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
 -    }
 -}
 -
 -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
 -                                              run_on_cpu_data arg)
 -{
 -    hvf_put_registers(cpu);
 -    cpu->vcpu_dirty = false;
 -}
 -
 -void hvf_cpu_synchronize_post_reset(CPUState *cpu)
 -{
 -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
 -}
 -
 -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
 -                                             run_on_cpu_data arg)
 -{
 -    hvf_put_registers(cpu);
 -    cpu->vcpu_dirty = false;
 -}
 -
 -void hvf_cpu_synchronize_post_init(CPUState *cpu)
 -{
 -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 -}
 -
 -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
 -                                              run_on_cpu_data arg)
 -{
 -    cpu->vcpu_dirty = true;
 -}
 -
 -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
 -{
 -    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
 -}
 -
  static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
  {
-     TCGv_i32 tmp = tcg_temp_new_i32();
+     int read, write;
--    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
+@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
-+    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
+     return false;
      return tmp;
  }
- static void neon_store_reg(int reg, int pass, TCGv_i32 var)
+-static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
 -{
 -    hvf_slot *slot;
 -
 -    slot = hvf_find_overlap_slot(
 -            section->offset_within_address_space,
 -            int128_get64(section->size));
 -
 -    /* protect region against writes; begin tracking it */
 -    if (on) {
 -        slot->flags |= HVF_SLOT_LOG;
 -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
 -                      HV_MEMORY_READ);
 -    /* stop tracking region*/
 -    } else {
 -        slot->flags &= ~HVF_SLOT_LOG;
 -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
 -                      HV_MEMORY_READ | HV_MEMORY_WRITE);
 -    }
 -}
 -
 -static void hvf_log_start(MemoryListener *listener,
 -                          MemoryRegionSection *section, int old, int new)
 -{
 -    if (old != 0) {
 -        return;
 -    }
 -
 -    hvf_set_dirty_tracking(section, 1);
 -}
 -
 -static void hvf_log_stop(MemoryListener *listener,
 -                         MemoryRegionSection *section, int old, int new)
 -{
 -    if (new != 0) {
 -        return;
 -    }
 -
 -    hvf_set_dirty_tracking(section, 0);
 -}
 -
 -static void hvf_log_sync(MemoryListener *listener,
 -                         MemoryRegionSection *section)
 -{
 -    /*
 -     * sync of dirty pages is handled elsewhere; just make sure we keep
 -     * tracking the region.
 -     */
 -    hvf_set_dirty_tracking(section, 1);
 -}
 -
 -static void hvf_region_add(MemoryListener *listener,
 -                           MemoryRegionSection *section)
 -{
 -    hvf_set_phys_mem(section, true);
 -}
 -
 -static void hvf_region_del(MemoryListener *listener,
 -                           MemoryRegionSection *section)
 -{
 -    hvf_set_phys_mem(section, false);
 -}
 -
 -static MemoryListener hvf_memory_listener = {
 -    .priority = 10,
 -    .region_add = hvf_region_add,
 -    .region_del = hvf_region_del,
 -    .log_start = hvf_log_start,
 -    .log_stop = hvf_log_stop,
 -    .log_sync = hvf_log_sync,
 -};
 -
  void hvf_vcpu_destroy(CPUState *cpu)
  {
--    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
+     X86CPU *x86_cpu = X86_CPU(cpu);
-+    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
+@@ -XXX,XX +XXX,XX @@ void hvf_vcpu_destroy(CPUState *cpu)
-     tcg_temp_free_i32(var);
+     assert_hvf_ok(ret);
  }
+-static void dummy_signal(int sig)
+-{
+-}
+-
+ static void init_tsc_freq(CPUX86State *env)
+ {
+     size_t length;
+@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
+     return ret;
+ }
+-
+-bool hvf_allowed;
+-
+-static int hvf_accel_init(MachineState *ms)
+-{
+-    int x;
+-    hv_return_t ret;
+-    HVFState *s;
+-
+-    ret = hv_vm_create(HV_VM_DEFAULT);
+-    assert_hvf_ok(ret);
+-
+-    s = g_new0(HVFState, 1);
+-
+-    s->num_slots = 32;
+-    for (x = 0; x < s->num_slots; ++x) {
+-        s->slots[x].size = 0;
+-        s->slots[x].slot_id = x;
+-    }
+-
+-    hvf_state = s;
+-    memory_listener_register(&hvf_memory_listener, &address_space_memory);
+-    return 0;
+-}
+-
+-static void hvf_accel_class_init(ObjectClass *oc, void *data)
+-{
+-    AccelClass *ac = ACCEL_CLASS(oc);
+-    ac->name = "HVF";
+-    ac->init_machine = hvf_accel_init;
+-    ac->allowed = &hvf_allowed;
+-}
+-
+-static const TypeInfo hvf_accel_type = {
+-    .name = TYPE_HVF_ACCEL,
+-    .parent = TYPE_ACCEL,
+-    .class_init = hvf_accel_class_init,
+-};
+-
+-static void hvf_type_init(void)
+-{
+-    type_register_static(&hvf_accel_type);
+-}
+-
+-type_init(hvf_type_init);
 --
 .20.1
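
Note on the removed hvf_cpu_synchronize_*() helpers: they follow a lazy
register-sync pattern keyed on cpu->vcpu_dirty. The sketch below is a
stand-alone illustration of that pattern only. ToyCPU, hw_reg, cached_reg
and the toy_* functions are hypothetical names invented for this note; the
real code calls hvf_get_registers()/hvf_put_registers() through run_on_cpu().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CPUState; only the fields the sketch needs. */
typedef struct ToyCPU {
    bool vcpu_dirty;   /* true: QEMU's copy was fetched and may be modified */
    int hw_reg;        /* stand-in for state held by the hypervisor */
    int cached_reg;    /* stand-in for QEMU's copy of that state */
} ToyCPU;

/* Fetch state from the "hypervisor" at most once, then mark it dirty. */
static void toy_synchronize_state(ToyCPU *cpu)
{
    if (!cpu->vcpu_dirty) {
        cpu->cached_reg = cpu->hw_reg;
        cpu->vcpu_dirty = true;
    }
}

/* Write state back and mark it clean, as the post_reset/post_init hooks do. */
static void toy_synchronize_post_reset(ToyCPU *cpu)
{
    cpu->hw_reg = cpu->cached_reg;
    cpu->vcpu_dirty = false;
}
```

Calling toy_synchronize_state() twice in a row fetches only once; the second
call is a no-op until a write-back clears the flag again.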

-New patch
+[PULL 31/45] hvf: Move hvf internal definitions into common header
+From: Alexander Graf <agraf@csgraf.de>
+Until now, Hypervisor.framework has only been available on x86_64 systems.
+With Apple Silicon shipping now, it extends its reach to aarch64. To
+prepare for support for multiple architectures, let's start moving common
+code out into its own accel directory.
+This patch moves a few internal struct and constant definitions over.
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
+Message-id: 20210519202253.76782-5-agraf@csgraf.de
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ include/sysemu/hvf_int.h   | 30 ++++++++++++++++++++++++++++++
+ target/i386/hvf/hvf-i386.h | 31 +------------------------------
+files changed, 31 insertions(+), 30 deletions(-)
+diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/sysemu/hvf_int.h
++++ b/include/sysemu/hvf_int.h
+@@ -XXX,XX +XXX,XX @@
+ #include <Hypervisor/hv.h>
++/* hvf_slot flags */
++#define HVF_SLOT_LOG (1 << 0)
++
++typedef struct hvf_slot {
++    uint64_t start;
++    uint64_t size;
++    uint8_t *mem;
++    int slot_id;
++    uint32_t flags;
++    MemoryRegion *region;
++} hvf_slot;
++
++typedef struct hvf_vcpu_caps {
++    uint64_t vmx_cap_pinbased;
++    uint64_t vmx_cap_procbased;
++    uint64_t vmx_cap_procbased2;
++    uint64_t vmx_cap_entry;
++    uint64_t vmx_cap_exit;
++    uint64_t vmx_cap_preemption_timer;
++} hvf_vcpu_caps;
++
++struct HVFState {
++    AccelState parent;
++    hvf_slot slots[32];
++    int num_slots;
++
++    hvf_vcpu_caps *hvf_caps;
++};
++extern HVFState *hvf_state;
++
+ void hvf_set_phys_mem(MemoryRegionSection *, bool);
+ void assert_hvf_ok(hv_return_t ret);
+ hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
+diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/i386/hvf/hvf-i386.h
++++ b/target/i386/hvf/hvf-i386.h
+@@ -XXX,XX +XXX,XX @@
+ #include "qemu/accel.h"
+ #include "sysemu/hvf.h"
++#include "sysemu/hvf_int.h"
+ #include "cpu.h"
+ #include "x86.h"
+-/* hvf_slot flags */
+-#define HVF_SLOT_LOG (1 << 0)
+-
+-typedef struct hvf_slot {
+-    uint64_t start;
+-    uint64_t size;
+-    uint8_t *mem;
+-    int slot_id;
+-    uint32_t flags;
+-    MemoryRegion *region;
+-} hvf_slot;
+-
+-typedef struct hvf_vcpu_caps {
+-    uint64_t vmx_cap_pinbased;
+-    uint64_t vmx_cap_procbased;
+-    uint64_t vmx_cap_procbased2;
+-    uint64_t vmx_cap_entry;
+-    uint64_t vmx_cap_exit;
+-    uint64_t vmx_cap_preemption_timer;
+-} hvf_vcpu_caps;
+-
+-struct HVFState {
+-    AccelState parent;
+-    hvf_slot slots[32];
+-    int num_slots;
+-
+-    hvf_vcpu_caps *hvf_caps;
+-};
+-extern HVFState *hvf_state;
+-
+ void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
+ #ifdef NEED_CPU_H
+--
+.20.1
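
Note on the slot table moved by the patch above: a slot with size == 0 is
treated as free, which is how the "Now make a new slot" loop in
hvf_set_phys_mem() picks an entry. The sketch below is a reduced,
self-contained illustration. toy_slot and the toy_* helpers are hypothetical
names, and the overlap test is an assumption about what
hvf_find_overlap_slot() does, since its body is not part of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_SLOTS 32   /* mirrors slots[32] in struct HVFState */

/* Reduced, hypothetical version of hvf_slot: address range only. */
typedef struct toy_slot {
    uint64_t start;
    uint64_t size;     /* size == 0 marks the slot as free */
    int slot_id;
} toy_slot;

/* Return the first free slot, or NULL if the table is full. */
static toy_slot *toy_find_free_slot(toy_slot *slots, int num_slots)
{
    for (int i = 0; i < num_slots; i++) {
        if (slots[i].size == 0) {
            return &slots[i];
        }
    }
    return NULL;
}

/* Return the first used slot intersecting [start, start + size). */
static toy_slot *toy_find_overlap_slot(toy_slot *slots, int num_slots,
                                       uint64_t start, uint64_t size)
{
    for (int i = 0; i < num_slots; i++) {
        toy_slot *s = &slots[i];
        if (s->size && start < s->start + s->size && s->start < start + size) {
            return s;
        }
    }
    return NULL;
}
```

Returning NULL instead of calling abort() keeps the sketch testable; the
original code reports "No free slots" and aborts when the table is full.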

-New patch
+[PULL 32/45] hvf: Make hvf_set_phys_mem() static
+From: Alexander Graf <agraf@csgraf.de>
+The hvf_set_phys_mem() function is only called within the same file.
+Make it static.
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
+Message-id: 20210519202253.76782-6-agraf@csgraf.de
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ include/sysemu/hvf_int.h  | 1 -
+ accel/hvf/hvf-accel-ops.c | 2 +-
+files changed, 1 insertion(+), 2 deletions(-)
+diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/sysemu/hvf_int.h
++++ b/include/sysemu/hvf_int.h
+@@ -XXX,XX +XXX,XX @@ struct HVFState {
+ };
+ extern HVFState *hvf_state;
+-void hvf_set_phys_mem(MemoryRegionSection *, bool);
+ void assert_hvf_ok(hv_return_t ret);
+ hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
+ int hvf_put_registers(CPUState *);
+diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
+index XXXXXXX..XXXXXXX 100644
+--- a/accel/hvf/hvf-accel-ops.c
++++ b/accel/hvf/hvf-accel-ops.c
+@@ -XXX,XX +XXX,XX @@ static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
+     return 0;
+ }
+-void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
++static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
+ {
+     hvf_slot *mem;
+     MemoryRegion *area = section->mr;
+--
+.20.1

-New patch
+[PULL 33/45] hvf: Remove use of hv_uvaddr_t and hv_gpaddr_t
+From: Alexander Graf <agraf@csgraf.de>
+The ARM version of Hypervisor.framework no longer defines these two
+types, so let's just revert to standard ones.
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
+Message-id: 20210519202253.76782-7-agraf@csgraf.de
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ accel/hvf/hvf-accel-ops.c | 6 +++---
+file changed, 3 insertions(+), 3 deletions(-)
+diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
+index XXXXXXX..XXXXXXX 100644
+--- a/accel/hvf/hvf-accel-ops.c
++++ b/accel/hvf/hvf-accel-ops.c
+@@ -XXX,XX +XXX,XX @@ static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
+     macslot->present = 1;
+     macslot->gpa_start = slot->start;
+     macslot->size = slot->size;
+-    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
++    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
+     assert_hvf_ok(ret);
+     return 0;
+ }
+@@ -XXX,XX +XXX,XX @@ static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
+     /* protect region against writes; begin tracking it */
+     if (on) {
+         slot->flags |= HVF_SLOT_LOG;
+-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
++        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
+                       HV_MEMORY_READ);
+     /* stop tracking region*/
+     } else {
+         slot->flags &= ~HVF_SLOT_LOG;
+-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
++        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
+                       HV_MEMORY_READ | HV_MEMORY_WRITE);
+     }
+ }
+--
+.20.1
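
Note on hvf_set_dirty_tracking(), touched by the patch above: enabling
tracking sets HVF_SLOT_LOG and drops write permission so that guest writes
trap; disabling clears the bit and restores write access. The sketch below
isolates that decision. TOY_MEM_READ/TOY_MEM_WRITE and
toy_set_dirty_tracking() are hypothetical; the real HV_MEMORY_* values come
from Hypervisor.framework and the real function calls hv_vm_protect().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HVF_SLOT_LOG (1 << 0)      /* as defined in hvf_int.h */

/* Hypothetical permission bits, not the real HV_MEMORY_* values. */
#define TOY_MEM_READ  (1u << 0)
#define TOY_MEM_WRITE (1u << 1)

/* Update the slot flags and return the protection that would be requested. */
static uint32_t toy_set_dirty_tracking(uint32_t *slot_flags, bool on)
{
    if (on) {
        /* protect region against writes; begin tracking it */
        *slot_flags |= HVF_SLOT_LOG;
        return TOY_MEM_READ;
    }
    /* stop tracking region */
    *slot_flags &= ~(uint32_t)HVF_SLOT_LOG;
    return TOY_MEM_READ | TOY_MEM_WRITE;
}
```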

-[PULL 07/26] target/arm: Rename neon_load_reg32 to vfp_load_reg32
+[PULL 34/45] hvf: Split out common code on vcpu init and destroy
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Alexander Graf <agraf@csgraf.de>
-The only uses of this function are for loading VFP
+Until now, Hypervisor.framework has only been available on x86_64 systems.
-single-precision values, and nothing to do with NEON.
+With Apple Silicon shipping now, it extends its reach to aarch64. To
 prepare for support for multiple architectures, let's start moving common
 code out into its own accel directory.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+This patch splits the vcpu init and destroy functions into a generic and
-Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
+an architecture specific portion. This also allows us to move the generic
 functions into the generic hvf code, removing exported functions.
 Signed-off-by: Alexander Graf <agraf@csgraf.de>
 Reviewed-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20210519202253.76782-8-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         |   4 +-
+ accel/hvf/hvf-accel-ops.h |  2 --
- target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
+ include/sysemu/hvf_int.h  |  2 ++
-files changed, 94 insertions(+), 94 deletions(-)
+ accel/hvf/hvf-accel-ops.c | 30 ++++++++++++++++++++++++++++++
  target/i386/hvf/hvf.c     | 23 ++---------------------
 files changed, 34 insertions(+), 23 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/accel/hvf/hvf-accel-ops.h
-+++ b/target/arm/translate.c
++++ b/accel/hvf/hvf-accel-ops.h
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
+@@ -XXX,XX +XXX,XX @@
-     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
  #include "sysemu/cpus.h"
 -int hvf_init_vcpu(CPUState *);
  int hvf_vcpu_exec(CPUState *);
  void hvf_cpu_synchronize_state(CPUState *);
  void hvf_cpu_synchronize_post_reset(CPUState *);
  void hvf_cpu_synchronize_post_init(CPUState *);
  void hvf_cpu_synchronize_pre_loadvm(CPUState *);
 -void hvf_vcpu_destroy(CPUState *);
  #endif /* HVF_CPUS_H */
 diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/sysemu/hvf_int.h
 +++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
  extern HVFState *hvf_state;
  void assert_hvf_ok(hv_return_t ret);
 +int hvf_arch_init_vcpu(CPUState *cpu);
 +void hvf_arch_vcpu_destroy(CPUState *cpu);
  hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
  int hvf_put_registers(CPUState *);
  int hvf_get_registers(CPUState *);
 diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/hvf/hvf-accel-ops.c
 +++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static void hvf_type_init(void)
  type_init(hvf_type_init);
 +static void hvf_vcpu_destroy(CPUState *cpu)
 +{
 +    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
 +    assert_hvf_ok(ret);
 +
 +    hvf_arch_vcpu_destroy(cpu);
 +}
 +
 +static int hvf_init_vcpu(CPUState *cpu)
 +{
 +    int r;
 +
 +    /* init cpu signals */
 +    sigset_t set;
 +    struct sigaction sigact;
 +
 +    memset(&sigact, 0, sizeof(sigact));
 +    sigact.sa_handler = dummy_signal;
 +    sigaction(SIG_IPI, &sigact, NULL);
 +
 +    pthread_sigmask(SIG_BLOCK, NULL, &set);
 +    sigdelset(&set, SIG_IPI);
 +
 +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
 +    cpu->vcpu_dirty = 1;
 +    assert_hvf_ok(r);
 +
 +    return hvf_arch_init_vcpu(cpu);
 +}
 +
  /*
   * The HVF-specific vCPU thread function. This one should only run when the host
   * CPU supports the VMX "unrestricted guest" feature.
 diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/hvf.c
 +++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
      return false;
  }
--static inline void neon_load_reg32(TCGv_i32 var, int reg)
+-void hvf_vcpu_destroy(CPUState *cpu)
-+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
++void hvf_arch_vcpu_destroy(CPUState *cpu)
  {
-     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
+     X86CPU *x86_cpu = X86_CPU(cpu);
      CPUX86State *env = &x86_cpu->env;
 -    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
      g_free(env->hvf_mmio_buf);
 -    assert_hvf_ok(ret);
  }
--static inline void neon_store_reg32(TCGv_i32 var, int reg)
+ static void init_tsc_freq(CPUX86State *env)
-+static inline void vfp_store_reg32(TCGv_i32 var, int reg)
+@@ -XXX,XX +XXX,XX @@ static inline bool apic_bus_freq_is_known(CPUX86State *env)
      return env->apic_bus_freq != 0;
  }
 -int hvf_init_vcpu(CPUState *cpu)
 +int hvf_arch_init_vcpu(CPUState *cpu)
  {
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+-
- }
+     X86CPU *x86cpu = X86_CPU(cpu);
-diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
+     CPUX86State *env = &x86cpu->env;
-index XXXXXXX..XXXXXXX 100644
+-    int r;
---- a/target/arm/translate-vfp.c.inc
+-
-+++ b/target/arm/translate-vfp.c.inc
+-    /* init cpu signals */
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
+-    sigset_t set;
-         frn = tcg_temp_new_i32();
+-    struct sigaction sigact;
-         frm = tcg_temp_new_i32();
+-
-         dest = tcg_temp_new_i32();
+-    memset(&sigact, 0, sizeof(sigact));
--        neon_load_reg32(frn, rn);
+-    sigact.sa_handler = dummy_signal;
--        neon_load_reg32(frm, rm);
+-    sigaction(SIG_IPI, &sigact, NULL);
-+        vfp_load_reg32(frn, rn);
+-
-+        vfp_load_reg32(frm, rm);
+-    pthread_sigmask(SIG_BLOCK, NULL, &set);
-         switch (a->cc) {
+-    sigdelset(&set, SIG_IPI);
-         case 0: /* eq: Z */
-             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
+     init_emu();
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
+     init_decoder();
-         if (sz == 1) {
+@@ -XXX,XX +XXX,XX @@ int hvf_init_vcpu(CPUState *cpu)
              tcg_gen_andi_i32(dest, dest, 0xffff);
          }
 -        neon_store_reg32(dest, rd);
 +        vfp_store_reg32(dest, rd);
          tcg_temp_free_i32(frn);
          tcg_temp_free_i32(frm);
          tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i32 tcg_res;
          tcg_op = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_op, rm);
 +        vfp_load_reg32(tcg_op, rm);
          if (sz == 1) {
              gen_helper_rinth(tcg_res, tcg_op, fpst);
          } else {
              gen_helper_rints(tcg_res, tcg_op, fpst);
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_op);
          tcg_temp_free_i32(tcg_res);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
              gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
          }
          tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
 -        neon_store_reg32(tcg_tmp, rd);
 +        vfp_store_reg32(tcg_tmp, rd);
          tcg_temp_free_i32(tcg_tmp);
          tcg_temp_free_i64(tcg_res);
          tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          TCGv_i32 tcg_single, tcg_res;
          tcg_single = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_single, rm);
 +        vfp_load_reg32(tcg_single, rm);
          if (sz == 1) {
              if (is_signed) {
                  gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                  gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
              }
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_res);
          tcg_temp_free_i32(tcg_single);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
      if (a->l) {
          /* VFP to general purpose register */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vn);
 +        vfp_load_reg32(tmp, a->vn);
          tcg_gen_andi_i32(tmp, tmp, 0xffff);
          store_reg(s, a->rt, tmp);
      } else {
          /* general purpose register to VFP */
          tmp = load_reg(s, a->rt);
          tcg_gen_andi_i32(tmp, tmp, 0xffff);
 -        neon_store_reg32(tmp, a->vn);
 +        vfp_store_reg32(tmp, a->vn);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
      if (a->l) {
          /* VFP to general purpose register */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vn);
 +        vfp_load_reg32(tmp, a->vn);
          if (a->rt == 15) {
              /* Set the 4 flag bits in the CPSR.  */
              gen_set_nzcv(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
      } else {
          /* general purpose register to VFP */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vn);
 +        vfp_store_reg32(tmp, a->vn);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
      if (a->op) {
          /* fpreg to gpreg */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm);
 +        vfp_load_reg32(tmp, a->vm);
          store_reg(s, a->rt, tmp);
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm + 1);
 +        vfp_load_reg32(tmp, a->vm + 1);
          store_reg(s, a->rt2, tmp);
      } else {
          /* gpreg to fpreg */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vm);
 +        vfp_store_reg32(tmp, a->vm);
          tcg_temp_free_i32(tmp);
          tmp = load_reg(s, a->rt2);
 -        neon_store_reg32(tmp, a->vm + 1);
 +        vfp_store_reg32(tmp, a->vm + 1);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
      if (a->op) {
          /* fpreg to gpreg */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm * 2);
 +        vfp_load_reg32(tmp, a->vm * 2);
          store_reg(s, a->rt, tmp);
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm * 2 + 1);
 +        vfp_load_reg32(tmp, a->vm * 2 + 1);
          store_reg(s, a->rt2, tmp);
      } else {
          /* gpreg to fpreg */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vm * 2);
 +        vfp_store_reg32(tmp, a->vm * 2);
          tcg_temp_free_i32(tmp);
          tmp = load_reg(s, a->rt2);
 -        neon_store_reg32(tmp, a->vm * 2 + 1);
 +        vfp_store_reg32(tmp, a->vm * 2 + 1);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      tmp = tcg_temp_new_i32();
      if (a->l) {
          gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg32(tmp, a->vd);
 +        vfp_store_reg32(tmp, a->vd);
      } else {
 -        neon_load_reg32(tmp, a->vd);
 +        vfp_load_reg32(tmp, a->vd);
          gen_aa32_st16(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      tmp = tcg_temp_new_i32();
      if (a->l) {
          gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg32(tmp, a->vd);
 +        vfp_store_reg32(tmp, a->vd);
      } else {
 -        neon_load_reg32(tmp, a->vd);
 +        vfp_load_reg32(tmp, a->vd);
          gen_aa32_st32(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
          if (a->l) {
              /* load */
              gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
 -            neon_store_reg32(tmp, a->vd + i);
 +            vfp_store_reg32(tmp, a->vd + i);
          } else {
              /* store */
 -            neon_load_reg32(tmp, a->vd + i);
 +            vfp_load_reg32(tmp, a->vd + i);
              gen_aa32_st32(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
      fd = tcg_temp_new_i32();
      fpst = fpstatus_ptr(FPST_FPCR);
 -    neon_load_reg32(f0, vn);
 -    neon_load_reg32(f1, vm);
 +    vfp_load_reg32(f0, vn);
 +    vfp_load_reg32(f1, vm);
      for (;;) {
          if (reads_vd) {
 -            neon_load_reg32(fd, vd);
 +            vfp_load_reg32(fd, vd);
          }
          fn(fd, f0, f1, fpst);
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
          veclen--;
          vd = vfp_advance_sreg(vd, delta_d);
          vn = vfp_advance_sreg(vn, delta_d);
 -        neon_load_reg32(f0, vn);
 +        vfp_load_reg32(f0, vn);
          if (delta_m) {
              vm = vfp_advance_sreg(vm, delta_m);
 -            neon_load_reg32(f1, vm);
 +            vfp_load_reg32(f1, vm);
          }
      }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
+-    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
-     fd = tcg_temp_new_i32();
+-    cpu->vcpu_dirty = 1;
-     fpst = fpstatus_ptr(FPST_FPCR_F16);
+-    assert_hvf_ok(r);
+-
--    neon_load_reg32(f0, vn);
+     if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
--    neon_load_reg32(f1, vm);
+         &hvf_state->hvf_caps->vmx_cap_pinbased)) {
-+    vfp_load_reg32(f0, vn);
+         abort();
 +    vfp_load_reg32(f1, vm);
      if (reads_vd) {
 -        neon_load_reg32(fd, vd);
 +        vfp_load_reg32(fd, vd);
      }
      fn(fd, f0, f1, fpst);
 -    neon_store_reg32(fd, vd);
 +    vfp_store_reg32(fd, vd);
      tcg_temp_free_i32(f0);
      tcg_temp_free_i32(f1);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      f0 = tcg_temp_new_i32();
      fd = tcg_temp_new_i32();
 -    neon_load_reg32(f0, vm);
 +    vfp_load_reg32(f0, vm);
      for (;;) {
          fn(fd, f0);
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
              /* single source one-many */
              while (veclen--) {
                  vd = vfp_advance_sreg(vd, delta_d);
 -                neon_store_reg32(fd, vd);
 +                vfp_store_reg32(fd, vd);
              }
              break;
          }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
          veclen--;
          vd = vfp_advance_sreg(vd, delta_d);
          vm = vfp_advance_sreg(vm, delta_m);
 -        neon_load_reg32(f0, vm);
 +        vfp_load_reg32(f0, vm);
      }
      tcg_temp_free_i32(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      }
      f0 = tcg_temp_new_i32();
 -    neon_load_reg32(f0, vm);
 +    vfp_load_reg32(f0, vm);
      fn(f0, f0);
 -    neon_store_reg32(f0, vd);
 +    vfp_store_reg32(f0, vd);
      tcg_temp_free_i32(f0);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vn, a->vn);
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vn, a->vn);
 +    vfp_load_reg32(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negh(vn, vn);
      }
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negh(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vn, a->vn);
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vn, a->vn);
 +    vfp_load_reg32(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negs(vn, vn);
      }
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negs(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
      }
      fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
 -    neon_store_reg32(fd, a->vd);
 +    vfp_store_reg32(fd, a->vd);
      tcg_temp_free_i32(fd);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
      fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
      for (;;) {
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i32(vm, 0);
      } else {
 -        neon_load_reg32(vm, a->vm);
 +        vfp_load_reg32(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i32(vm, 0);
      } else {
 -        neon_load_reg32(vm, a->vm);
 +        vfp_load_reg32(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
      /* The T bit tells us if we want the low or high 16 bits of Vm */
      tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_i32(ahp_mode);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
      ahp_mode = get_ahp_flag();
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
      tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
      tcg_temp_free_i32(ahp_mode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_rinth(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rinth(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rints(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_rinth_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
      neon_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vm = tcg_temp_new_i64();
      neon_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      if (a->s) {
          /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f16 */
          gen_helper_vfp_uitoh(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f32 */
          gen_helper_vfp_uitos(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vd = tcg_temp_new_i32();
      neon_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_i32(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      if (a->s) {
          if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
              gen_helper_vfp_touih(vm, vm, fpst);
          }
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR);
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      if (a->s) {
          if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
              gen_helper_vfp_touis(vm, vm, fpst);
          }
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
              gen_helper_vfp_touid(vd, vm, fpst);
          }
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
      /* Insert low half of Vm into high half of Vd */
      rm = tcg_temp_new_i32();
      rd = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 -    neon_load_reg32(rd, a->vd);
 +    vfp_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rd, a->vd);
      tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
 -    neon_store_reg32(rd, a->vd);
 +    vfp_store_reg32(rd, a->vd);
      tcg_temp_free_i32(rm);
      tcg_temp_free_i32(rd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
      /* Set Vd to high half of Vm */
      rm = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rm, a->vm);
      tcg_gen_shri_i32(rm, rm, 16);
 -    neon_store_reg32(rm, a->vd);
 +    vfp_store_reg32(rm, a->vd);
      tcg_temp_free_i32(rm);
      return true;
  }
 --
 .20.1
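
The trans_VINS and trans_VMOVX hunks above only rename the load/store helpers; the half-precision data movement itself is unchanged. As a rough illustration of what those two TCG sequences compute, here is a plain-C model operating on ordinary 32-bit values. The function names vins_model and vmovx_model are hypothetical and exist only for this sketch; they are not part of QEMU.

```c
#include <stdint.h>

/*
 * Hypothetical model of tcg_gen_deposit_i32(rd, rd, rm, 16, 16) as used
 * by trans_VINS: the low 16 bits of rm replace bits [31:16] of rd, and
 * the low 16 bits of rd are preserved.
 */
uint32_t vins_model(uint32_t rd, uint32_t rm)
{
    return (rd & 0x0000ffffu) | ((rm & 0x0000ffffu) << 16);
}

/*
 * Hypothetical model of tcg_gen_shri_i32(rm, rm, 16) as used by
 * trans_VMOVX: the result is the high half of rm, zero-extended.
 */
uint32_t vmovx_model(uint32_t rm)
{
    return rm >> 16;
}
```

For example, vins_model(0x11112222, 0xaaaabbbb) yields 0xbbbb2222, and vmovx_model(0xaaaabbbb) yields 0x0000aaaa.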

-[PULL 18/26] hw/arm/boot: fix SVE for EL3 direct kernel boot
+[PULL 35/45] hvf: Use cpu_synchronize_state()
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Alexander Graf <agraf@csgraf.de>
-When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
+There is no reason to call the hvf specific hvf_cpu_synchronize_state()
-that SVE will not trap to EL3.
+when we can just use the generic cpu_synchronize_state() instead. This
 allows us to have less dependency on internal function definitions and
 allows us to make hvf_cpu_synchronize_state() static.
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
-Message-id: 20201030151541.11976-1-remi@remlab.net
+Message-id: 20210519202253.76782-9-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/boot.c | 3 +++
+ accel/hvf/hvf-accel-ops.h | 1 -
-file changed, 3 insertions(+)
+ accel/hvf/hvf-accel-ops.c | 2 +-
  target/i386/hvf/x86hvf.c  | 9 ++++-----
 files changed, 5 insertions(+), 7 deletions(-)
-diff --git a/hw/arm/boot.c b/hw/arm/boot.c
+diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/boot.c
+--- a/accel/hvf/hvf-accel-ops.h
-+++ b/hw/arm/boot.c
++++ b/accel/hvf/hvf-accel-ops.h
-@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
+@@ -XXX,XX +XXX,XX @@
-                     if (cpu_isar_feature(aa64_mte, cpu)) {
+ #include "sysemu/cpus.h"
-                         env->cp15.scr_el3 |= SCR_ATA;
-                     }
+ int hvf_vcpu_exec(CPUState *);
-+                    if (cpu_isar_feature(aa64_sve, cpu)) {
+-void hvf_cpu_synchronize_state(CPUState *);
-+                        env->cp15.cptr_el[3] |= CPTR_EZ;
+ void hvf_cpu_synchronize_post_reset(CPUState *);
-+                    }
+ void hvf_cpu_synchronize_post_init(CPUState *);
-                     /* AArch64 kernels never boot in secure mode */
+ void hvf_cpu_synchronize_pre_loadvm(CPUState *);
-                     assert(!info->secure_boot);
+diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
-                     /* This hook is only supported for AArch32 currently:
+index XXXXXXX..XXXXXXX 100644
 --- a/accel/hvf/hvf-accel-ops.c
 +++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
      }
  }
 -void hvf_cpu_synchronize_state(CPUState *cpu)
 +static void hvf_cpu_synchronize_state(CPUState *cpu)
  {
      if (!cpu->vcpu_dirty) {
          run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
 diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/x86hvf.c
 +++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@
  #include "cpu.h"
  #include "x86_descr.h"
  #include "x86_decode.h"
 +#include "sysemu/hw_accel.h"
  #include "hw/i386/apic_internal.h"
  #include <Hypervisor/hv.h>
  #include <Hypervisor/hv_vmx.h>
 -#include "accel/hvf/hvf-accel-ops.h"
 -
  void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
                       SegmentCache *qseg, bool is_tr)
  {
@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
      env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
      if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
 -        hvf_cpu_synchronize_state(cpu_state);
 +        cpu_synchronize_state(cpu_state);
          do_cpu_init(cpu);
      }
@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
          cpu_state->halted = 0;
      }
      if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
 -        hvf_cpu_synchronize_state(cpu_state);
 +        cpu_synchronize_state(cpu_state);
          do_cpu_sipi(cpu);
      }
      if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
          cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
 -        hvf_cpu_synchronize_state(cpu_state);
 +        cpu_synchronize_state(cpu_state);
          apic_handle_tpr_access_report(cpu->apic_state, env->eip,
                                        env->tpr_access_type);
      }
 --
 .20.1
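
The patch above replaces direct calls to the hvf-specific function with the generic cpu_synchronize_state(), which lets the hvf function become static. The sketch below models that shape in isolation: a generic entry point dispatches through an ops table to a file-local callback. Every name in it (MockCPU, MockAccelOps, mock_cpu_synchronize_state, and so on) is a hypothetical stand-in, not QEMU's real definition, and setting the dirty flag inside the callback is an assumption made to keep the model small.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUState; not the real QEMU struct. */
typedef struct MockCPU {
    bool vcpu_dirty;   /* assumed: true once state has been fetched */
    int sync_count;    /* counts how often state was actually fetched */
} MockCPU;

/* Hypothetical stand-in for an accelerator ops table. */
typedef struct MockAccelOps {
    void (*synchronize_state)(MockCPU *cpu);
} MockAccelOps;

/*
 * Accelerator-specific implementation. It is static because callers
 * reach it only through the ops table, never by name.
 */
static void mock_hvf_synchronize_state(MockCPU *cpu)
{
    if (!cpu->vcpu_dirty) {
        cpu->sync_count++;       /* stands in for fetching registers */
        cpu->vcpu_dirty = true;
    }
}

static const MockAccelOps mock_hvf_ops = {
    .synchronize_state = mock_hvf_synchronize_state,
};

static const MockAccelOps *mock_current_ops = &mock_hvf_ops;

/* Generic entry point: the only symbol other files need to know about. */
void mock_cpu_synchronize_state(MockCPU *cpu)
{
    if (mock_current_ops && mock_current_ops->synchronize_state) {
        mock_current_ops->synchronize_state(cpu);
    }
}
```

Calling mock_cpu_synchronize_state() twice on a clean vCPU fetches state once; the second call sees the dirty flag and does nothing, which matches the "if (!cpu->vcpu_dirty)" guard visible in the hunk above.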

-[PULL 23/26] hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
+[PULL 36/45] hvf: Make synchronize functions static
-In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
+From: Alexander Graf <agraf@csgraf.de>
 into the GICv3CPUState struct's maintenance_irq field.  This will
 only work if the board happens to have already wired up the CPU
 maintenance IRQ before the GIC was realized.  Unfortunately this is
 not the case for the 'virt' board, and so the value that gets copied
 is NULL (since a qemu_irq is really a pointer to an IRQState struct
 under the hood).  The effect is that the CPU interface code never
 actually raises the maintenance interrupt line.
-Instead, since the GICv3CPUState has a pointer to the CPUState, make
+The hvf accel synchronize functions are only used as input for local
-the dereference at the point where we want to raise the interrupt, to
+callback functions, so we can make them static.
 avoid an implicit requirement on board code to wire things up in a
 particular order.
-Reported-by: Jose Martins <josemartins90@gmail.com>
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
 Reviewed-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20210519202253.76782-10-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
-Reviewed-by: Luc Michel <luc@lmichel.fr>
 ---
- include/hw/intc/arm_gicv3_common.h | 1 -
+ accel/hvf/hvf-accel-ops.h | 3 ---
- hw/intc/arm_gicv3_cpuif.c          | 5 ++---
+ accel/hvf/hvf-accel-ops.c | 6 +++---
-files changed, 2 insertions(+), 4 deletions(-)
+files changed, 3 insertions(+), 6 deletions(-)
-diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
+diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/intc/arm_gicv3_common.h
+--- a/accel/hvf/hvf-accel-ops.h
-+++ b/include/hw/intc/arm_gicv3_common.h
++++ b/accel/hvf/hvf-accel-ops.h
-@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
+@@ -XXX,XX +XXX,XX @@
-     qemu_irq parent_fiq;
+ #include "sysemu/cpus.h"
-     qemu_irq parent_virq;
-     qemu_irq parent_vfiq;
+ int hvf_vcpu_exec(CPUState *);
--    qemu_irq maintenance_irq;
+-void hvf_cpu_synchronize_post_reset(CPUState *);
+-void hvf_cpu_synchronize_post_init(CPUState *);
-     /* Redistributor */
+-void hvf_cpu_synchronize_pre_loadvm(CPUState *);
-     uint32_t level;                  /* Current IRQ level */
-diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
+ #endif /* HVF_CPUS_H */
 diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/arm_gicv3_cpuif.c
+--- a/accel/hvf/hvf-accel-ops.c
-+++ b/hw/intc/arm_gicv3_cpuif.c
++++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
+@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
-     int irqlevel = 0;
+     cpu->vcpu_dirty = false;
      int fiqlevel = 0;
      int maintlevel = 0;
 +    ARMCPU *cpu = ARM_CPU(cs->cpu);
      idx = hppvi_index(cs);
      trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
      qemu_set_irq(cs->parent_vfiq, fiqlevel);
      qemu_set_irq(cs->parent_virq, irqlevel);
 -    qemu_set_irq(cs->maintenance_irq, maintlevel);
 +    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
  }
- static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
+-void hvf_cpu_synchronize_post_reset(CPUState *cpu)
-@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
++static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
-             && cpu->gic_num_lrs) {
+ {
-             int j;
+     run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+ }
--            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
+@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
--
+     cpu->vcpu_dirty = false;
-             cs->num_list_regs = cpu->gic_num_lrs;
+ }
-             cs->vpribits = cpu->gic_vpribits;
-             cs->vprebits = cpu->gic_vprebits;
+-void hvf_cpu_synchronize_post_init(CPUState *cpu)
 +static void hvf_cpu_synchronize_post_init(CPUState *cpu)
  {
      run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
  }
@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
      cpu->vcpu_dirty = true;
  }
 -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
 +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
  {
      run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
  }
 --
 .20.1

-[PULL 17/26] hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+[PULL 37/45] hvf: Remove hvf-accel-ops.h
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: Alexander Graf <agraf@csgraf.de>
-Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
+We can move the definition of hvf_vcpu_exec() into our internal
-This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):
+hvf header, obsoleting the need for hvf-accel-ops.h.
-  CID 1432363 (#1 of 1): Unintentional integer overflow:
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
-  overflow_before_widen:
+Message-id: 20210519202253.76782-11-agraf@csgraf.de
     Potentially overflowing expression 1 << scale with type int
     (32 bits, signed) is evaluated using 32-bit arithmetic, and
     then used in a context that expects an expression of type
     hwaddr (64 bits, unsigned).
 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Acked-by: Eric Auger <eric.auger@redhat.com>
 Message-id: 20201030144617.1535064-1-philmd@redhat.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/smmuv3.c | 3 ++-
+ accel/hvf/hvf-accel-ops.h | 17 -----------------
-file changed, 2 insertions(+), 1 deletion(-)
+ include/sysemu/hvf_int.h  |  1 +
  accel/hvf/hvf-accel-ops.c |  2 --
  target/i386/hvf/hvf.c     |  2 --
 files changed, 1 insertion(+), 21 deletions(-)
  delete mode 100644 accel/hvf/hvf-accel-ops.h
-diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
+diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
 deleted file mode 100644
 index XXXXXXX..XXXXXXX
 --- a/accel/hvf/hvf-accel-ops.h
 +++ /dev/null
@@ -XXX,XX +XXX,XX @@
 -/*
 - * Accelerator CPUS Interface
 - *
 - * Copyright 2020 SUSE LLC
 - *
 - * This work is licensed under the terms of the GNU GPL, version 2 or later.
 - * See the COPYING file in the top-level directory.
 - */
 -
 -#ifndef HVF_CPUS_H
 -#define HVF_CPUS_H
 -
 -#include "sysemu/cpus.h"
 -
 -int hvf_vcpu_exec(CPUState *);
 -
 -#endif /* HVF_CPUS_H */
 diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/smmuv3.c
+--- a/include/sysemu/hvf_int.h
-+++ b/hw/arm/smmuv3.c
++++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ extern HVFState *hvf_state;
  void assert_hvf_ok(hv_return_t ret);
  int hvf_arch_init_vcpu(CPUState *cpu);
  void hvf_arch_vcpu_destroy(CPUState *cpu);
 +int hvf_vcpu_exec(CPUState *);
  hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
  int hvf_put_registers(CPUState *);
  int hvf_get_registers(CPUState *);
 diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/hvf/hvf-accel-ops.c
 +++ b/accel/hvf/hvf-accel-ops.c
 @@ -XXX,XX +XXX,XX @@
-  */
+ #include "sysemu/runstate.h"
+ #include "qemu/guest-random.h"
- #include "qemu/osdep.h"
-+#include "qemu/bitops.h"
+-#include "hvf-accel-ops.h"
- #include "hw/irq.h"
+-
- #include "hw/sysbus.h"
+ HVFState *hvf_state;
- #include "migration/vmstate.h"
-@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
+ /* Memory slots */
-         scale = CMD_SCALE(cmd);
+diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
-         num = CMD_NUM(cmd);
+index XXXXXXX..XXXXXXX 100644
-         ttl = CMD_TTL(cmd);
+--- a/target/i386/hvf/hvf.c
--        num_pages = (num + 1) * (1 << (scale));
++++ b/target/i386/hvf/hvf.c
-+        num_pages = (num + 1) * BIT_ULL(scale);
+@@ -XXX,XX +XXX,XX @@
-     }
+ #include "qemu/accel.h"
+ #include "target/i386/cpu.h"
-     if (type == SMMU_CMD_TLBI_NH_VA) {
 -#include "hvf-accel-ops.h"
 -
  void vmx_update_tpr(CPUState *cpu)
  {
      /* TODO: need integrate APIC handling */
 --
 .20.1

-[PULL 20/26] hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
+[PULL 38/45] hvf: Introduce hvf vcpu struct
-From: AlexChen <alex.chen@huawei.com>
+From: Alexander Graf <agraf@csgraf.de>
-In exynos4210_fimd_update(), the pointer s is dereferenced before
+We will need more than a single field for hvf going forward. To keep
-being checked for validity, which may lead to a NULL pointer dereference.
+the global vcpu struct uncluttered, let's allocate a special hvf vcpu
-So move the assignment to global_width after checking that s is valid.
+struct, similar to how hax does it.
-Reported-by: Euler Robot <euler.robot@huawei.com>
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Signed-off-by: Alex Chen <alex.chen@huawei.com>
+Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
-Message-id: 5F9F8D88.9030102@huawei.com
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Reviewed-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20210519202253.76782-12-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/display/exynos4210_fimd.c | 4 +++-
+ include/hw/core/cpu.h       |   3 +-
-file changed, 3 insertions(+), 1 deletion(-)
+ include/sysemu/hvf_int.h    |   4 +
  target/i386/hvf/vmx.h       |  24 +++--
  accel/hvf/hvf-accel-ops.c   |   8 +-
  target/i386/hvf/hvf.c       | 104 +++++++++---------
  target/i386/hvf/x86.c       |  28 ++---
  target/i386/hvf/x86_descr.c |  26 ++---
  target/i386/hvf/x86_emu.c   |  62 +++++------
  target/i386/hvf/x86_mmu.c   |   4 +-
  target/i386/hvf/x86_task.c  |  12 +--
  target/i386/hvf/x86hvf.c    | 210 ++++++++++++++++++------------------
 files changed, 248 insertions(+), 237 deletions(-)
-diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
+diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/exynos4210_fimd.c
+--- a/include/hw/core/cpu.h
-+++ b/hw/display/exynos4210_fimd.c
++++ b/include/hw/core/cpu.h
-@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
+@@ -XXX,XX +XXX,XX @@ struct KVMState;
-     bool blend = false;
+ struct kvm_run;
-     uint8_t *host_fb_addr;
-     bool is_dirty = false;
+ struct hax_vcpu_state;
--    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
++struct hvf_vcpu_state;
-+    int global_width;
+ #define TB_JMP_CACHE_BITS 12
-     if (!s || !s->console || !s->enabled ||
+ #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
-         surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
+@@ -XXX,XX +XXX,XX @@ struct CPUState {
      struct hax_vcpu_state *hax_vcpu;
 -    int hvf_fd;
 +    struct hvf_vcpu_state *hvf;
      /* track IOMMUs whose translations we've cached in the TCG TLB */
      GArray *iommu_notifiers;
 diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/sysemu/hvf_int.h
 +++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
  };
  extern HVFState *hvf_state;
 +struct hvf_vcpu_state {
 +    int fd;
 +};
 +
  void assert_hvf_ok(hv_return_t ret);
  int hvf_arch_init_vcpu(CPUState *cpu);
  void hvf_arch_vcpu_destroy(CPUState *cpu);
 diff --git a/target/i386/hvf/vmx.h b/target/i386/hvf/vmx.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/vmx.h
 +++ b/target/i386/hvf/vmx.h
@@ -XXX,XX +XXX,XX @@
  #include "vmcs.h"
  #include "cpu.h"
  #include "x86.h"
 +#include "sysemu/hvf.h"
 +#include "sysemu/hvf_int.h"
  #include "exec/address-spaces.h"
@@ -XXX,XX +XXX,XX @@ static inline void macvm_set_rip(CPUState *cpu, uint64_t rip)
      uint64_t val;
      /* BUG, should take considering overlap.. */
 -    wreg(cpu->hvf_fd, HV_X86_RIP, rip);
 +    wreg(cpu->hvf->fd, HV_X86_RIP, rip);
      env->eip = rip;
      /* after moving forward in rip, we need to clean INTERRUPTABILITY */
 -   val = rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
 +   val = rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
     if (val & (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
                 VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
          env->hflags &= ~HF_INHIBIT_IRQ_MASK;
 -        wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY,
 +        wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY,
                 val & ~(VMCS_INTERRUPTIBILITY_STI_BLOCKING |
                 VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING));
     }
@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_blocking(CPUState *cpu)
      CPUX86State *env = &x86_cpu->env;
      env->hflags2 &= ~HF2_NMI_MASK;
 -    uint32_t gi = (uint32_t) rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
 +    uint32_t gi = (uint32_t) rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
      gi &= ~VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
 -    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
 +    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
  }
  static inline void vmx_set_nmi_blocking(CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ static inline void vmx_set_nmi_blocking(CPUState *cpu)
      CPUX86State *env = &x86_cpu->env;
      env->hflags2 |= HF2_NMI_MASK;
 -    uint32_t gi = (uint32_t)rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
 +    uint32_t gi = (uint32_t)rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
      gi |= VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
 -    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
 +    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
  }
  static inline void vmx_set_nmi_window_exiting(CPUState *cpu)
  {
      uint64_t val;
 -    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
 -    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
 +    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
 +    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
            VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
  }
@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_window_exiting(CPUState *cpu)
  {
      uint64_t val;
 -    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
 -    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
 +    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
 +    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
            ~VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
  }
 diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/hvf/hvf-accel-ops.c
 +++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ type_init(hvf_type_init);
  static void hvf_vcpu_destroy(CPUState *cpu)
  {
 -    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
 +    hv_return_t ret = hv_vcpu_destroy(cpu->hvf->fd);
      assert_hvf_ok(ret);
      hvf_arch_vcpu_destroy(cpu);
 +    g_free(cpu->hvf);
 +    cpu->hvf = NULL;
  }
  static int hvf_init_vcpu(CPUState *cpu)
  {
      int r;
 +    cpu->hvf = g_malloc0(sizeof(*cpu->hvf));
 +
      /* init cpu signals */
      sigset_t set;
      struct sigaction sigact;
@@ -XXX,XX +XXX,XX @@ static int hvf_init_vcpu(CPUState *cpu)
      pthread_sigmask(SIG_BLOCK, NULL, &set);
      sigdelset(&set, SIG_IPI);
 -    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
 +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf->fd, HV_VCPU_DEFAULT);
      cpu->vcpu_dirty = 1;
      assert_hvf_ok(r);
 diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/hvf.c
 +++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
      int tpr = cpu_get_apic_tpr(x86_cpu->apic_state) << 4;
      int irr = apic_get_highest_priority_irr(x86_cpu->apic_state);
 -    wreg(cpu->hvf_fd, HV_X86_TPR, tpr);
 +    wreg(cpu->hvf->fd, HV_X86_TPR, tpr);
      if (irr == -1) {
 -        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
 +        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
      } else {
 -        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
 +        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
                irr >> 4);
      }
  }
@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
  static void update_apic_tpr(CPUState *cpu)
  {
      X86CPU *x86_cpu = X86_CPU(cpu);
 -    int tpr = rreg(cpu->hvf_fd, HV_X86_TPR) >> 4;
 +    int tpr = rreg(cpu->hvf->fd, HV_X86_TPR) >> 4;
      cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
  }
@@ -XXX,XX +XXX,XX @@ int hvf_arch_init_vcpu(CPUState *cpu)
      }
      /* set VMCS control fields */
 -    wvmcs(cpu->hvf_fd, VMCS_PIN_BASED_CTLS,
 +    wvmcs(cpu->hvf->fd, VMCS_PIN_BASED_CTLS,
            cap2ctrl(hvf_state->hvf_caps->vmx_cap_pinbased,
            VMCS_PIN_BASED_CTLS_EXTINT |
            VMCS_PIN_BASED_CTLS_NMI |
            VMCS_PIN_BASED_CTLS_VNMI));
 -    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS,
 +    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS,
            cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased,
            VMCS_PRI_PROC_BASED_CTLS_HLT |
            VMCS_PRI_PROC_BASED_CTLS_MWAIT |
            VMCS_PRI_PROC_BASED_CTLS_TSC_OFFSET |
            VMCS_PRI_PROC_BASED_CTLS_TPR_SHADOW) |
            VMCS_PRI_PROC_BASED_CTLS_SEC_CONTROL);
 -    wvmcs(cpu->hvf_fd, VMCS_SEC_PROC_BASED_CTLS,
 +    wvmcs(cpu->hvf->fd, VMCS_SEC_PROC_BASED_CTLS,
            cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased2,
                     VMCS_PRI_PROC_BASED2_CTLS_APIC_ACCESSES));
 -    wvmcs(cpu->hvf_fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
 +    wvmcs(cpu->hvf->fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
 ));
 -    wvmcs(cpu->hvf_fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
 +    wvmcs(cpu->hvf->fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
 -    wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
 +    wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
      x86cpu = X86_CPU(cpu);
      x86cpu->env.xsave_buf = qemu_memalign(4096, 4096);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_STAR, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_LSTAR, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_CSTAR, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FMASK, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FSBASE, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_GSBASE, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_KERNELGSBASE, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_TSC_AUX, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_TSC, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_CS, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_EIP, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_ESP, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_STAR, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_LSTAR, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_CSTAR, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FMASK, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FSBASE, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_GSBASE, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_KERNELGSBASE, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_TSC_AUX, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_TSC, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_CS, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_EIP, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_ESP, 1);
      return 0;
  }
@@ -XXX,XX +XXX,XX @@ static void hvf_store_events(CPUState *cpu, uint32_t ins_len, uint64_t idtvec_in
          }
          if (idtvec_info & VMCS_IDT_VEC_ERRCODE_VALID) {
              env->has_error_code = true;
 -            env->error_code = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_ERROR);
 +            env->error_code = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_ERROR);
          }
      }
 -    if ((rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
 +    if ((rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
          VMCS_INTERRUPTIBILITY_NMI_BLOCKING)) {
          env->hflags2 |= HF2_NMI_MASK;
      } else {
          env->hflags2 &= ~HF2_NMI_MASK;
      }
 -    if (rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
 +    if (rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
           (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
           VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
          env->hflags |= HF_INHIBIT_IRQ_MASK;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
              return EXCP_HLT;
          }
 -        hv_return_t r  = hv_vcpu_run(cpu->hvf_fd);
 +        hv_return_t r  = hv_vcpu_run(cpu->hvf->fd);
          assert_hvf_ok(r);
          /* handle VMEXIT */
 -        uint64_t exit_reason = rvmcs(cpu->hvf_fd, VMCS_EXIT_REASON);
 -        uint64_t exit_qual = rvmcs(cpu->hvf_fd, VMCS_EXIT_QUALIFICATION);
 -        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf_fd,
 +        uint64_t exit_reason = rvmcs(cpu->hvf->fd, VMCS_EXIT_REASON);
 +        uint64_t exit_qual = rvmcs(cpu->hvf->fd, VMCS_EXIT_QUALIFICATION);
 +        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf->fd,
                                             VMCS_EXIT_INSTRUCTION_LENGTH);
 -        uint64_t idtvec_info = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
 +        uint64_t idtvec_info = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
          hvf_store_events(cpu, ins_len, idtvec_info);
 -        rip = rreg(cpu->hvf_fd, HV_X86_RIP);
 -        env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
 +        rip = rreg(cpu->hvf->fd, HV_X86_RIP);
 +        env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
          qemu_mutex_lock_iothread();
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
          case EXIT_REASON_EPT_FAULT:
          {
              hvf_slot *slot;
 -            uint64_t gpa = rvmcs(cpu->hvf_fd, VMCS_GUEST_PHYSICAL_ADDRESS);
 +            uint64_t gpa = rvmcs(cpu->hvf->fd, VMCS_GUEST_PHYSICAL_ADDRESS);
              if (((idtvec_info & VMCS_IDT_VEC_VALID) == 0) &&
                  ((exit_qual & EXIT_QUAL_NMIUDTI) != 0)) {
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
                  store_regs(cpu);
                  break;
              } else if (!string && !in) {
 -                RAX(env) = rreg(cpu->hvf_fd, HV_X86_RAX);
 +                RAX(env) = rreg(cpu->hvf->fd, HV_X86_RAX);
                  hvf_handle_io(env, port, &RAX(env), 1, size, 1);
                  macvm_set_rip(cpu, rip + ins_len);
                  break;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
              break;
          }
          case EXIT_REASON_CPUID: {
 -            uint32_t rax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
 -            uint32_t rbx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RBX);
 -            uint32_t rcx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
 -            uint32_t rdx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
 +            uint32_t rax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
 +            uint32_t rbx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RBX);
 +            uint32_t rcx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
 +            uint32_t rdx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
              if (rax == 1) {
                  /* CPUID1.ecx.OSXSAVE needs to know CR4 */
 -                env->cr[4] = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
 +                env->cr[4] = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
              }
              hvf_cpu_x86_cpuid(env, rax, rcx, &rax, &rbx, &rcx, &rdx);
 -            wreg(cpu->hvf_fd, HV_X86_RAX, rax);
 -            wreg(cpu->hvf_fd, HV_X86_RBX, rbx);
 -            wreg(cpu->hvf_fd, HV_X86_RCX, rcx);
 -            wreg(cpu->hvf_fd, HV_X86_RDX, rdx);
 +            wreg(cpu->hvf->fd, HV_X86_RAX, rax);
 +            wreg(cpu->hvf->fd, HV_X86_RBX, rbx);
 +            wreg(cpu->hvf->fd, HV_X86_RCX, rcx);
 +            wreg(cpu->hvf->fd, HV_X86_RDX, rdx);
              macvm_set_rip(cpu, rip + ins_len);
              break;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
          case EXIT_REASON_XSETBV: {
              X86CPU *x86_cpu = X86_CPU(cpu);
              CPUX86State *env = &x86_cpu->env;
 -            uint32_t eax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
 -            uint32_t ecx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
 -            uint32_t edx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
 +            uint32_t eax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
 +            uint32_t ecx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
 +            uint32_t edx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
              if (ecx) {
                  macvm_set_rip(cpu, rip + ins_len);
                  break;
              }
              env->xcr0 = ((uint64_t)edx << 32) | eax;
 -            wreg(cpu->hvf_fd, HV_X86_XCR0, env->xcr0 | 1);
 +            wreg(cpu->hvf->fd, HV_X86_XCR0, env->xcr0 | 1);
              macvm_set_rip(cpu, rip + ins_len);
              break;
          }
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
              switch (cr) {
              case 0x0: {
 -                macvm_set_cr0(cpu->hvf_fd, RRX(env, reg));
 +                macvm_set_cr0(cpu->hvf->fd, RRX(env, reg));
                  break;
              }
              case 4: {
 -                macvm_set_cr4(cpu->hvf_fd, RRX(env, reg));
 +                macvm_set_cr4(cpu->hvf->fd, RRX(env, reg));
                  break;
              }
              case 8: {
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
              break;
          }
          case EXIT_REASON_TASK_SWITCH: {
 -            uint64_t vinfo = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
 +            uint64_t vinfo = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
              x68_segment_selector sel = {.sel = exit_qual & 0xffff};
              vmx_handle_task_switch(cpu, sel, (exit_qual >> 30) & 0x3,
               vinfo & VMCS_INTR_VALID, vinfo & VECTORING_INFO_VECTOR_MASK, vinfo
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
              break;
          }
          case EXIT_REASON_RDPMC:
 -            wreg(cpu->hvf_fd, HV_X86_RAX, 0);
 -            wreg(cpu->hvf_fd, HV_X86_RDX, 0);
 +            wreg(cpu->hvf->fd, HV_X86_RAX, 0);
 +            wreg(cpu->hvf->fd, HV_X86_RDX, 0);
              macvm_set_rip(cpu, rip + ins_len);
              break;
          case VMX_REASON_VMCALL:
 diff --git a/target/i386/hvf/x86.c b/target/i386/hvf/x86.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/x86.c
 +++ b/target/i386/hvf/x86.c
@@ -XXX,XX +XXX,XX @@ bool x86_read_segment_descriptor(struct CPUState *cpu,
      }
      if (GDT_SEL == sel.ti) {
 -        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
 -        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
 +        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
 +        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
      } else {
 -        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
 -        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
 +        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
 +        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
      }
      if (sel.index * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
      uint32_t limit;
      if (GDT_SEL == sel.ti) {
 -        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
 -        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
 +        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
 +        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
      } else {
 -        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
 -        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
 +        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
 +        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
      }
      if (sel.index * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
  bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
                          int gate)
  {
 -    target_ulong base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_BASE);
 -    uint32_t limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
 +    target_ulong base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_BASE);
 +    uint32_t limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
      memset(idt_desc, 0, sizeof(*idt_desc));
      if (gate * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
  bool x86_is_protected(struct CPUState *cpu)
  {
 -    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
 +    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
      return cr0 & CR0_PE;
  }
@@ -XXX,XX +XXX,XX @@ bool x86_is_v8086(struct CPUState *cpu)
  bool x86_is_long_mode(struct CPUState *cpu)
  {
 -    return rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
 +    return rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
  }
  bool x86_is_long64_mode(struct CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ bool x86_is_long64_mode(struct CPUState *cpu)
  bool x86_is_paging_mode(struct CPUState *cpu)
  {
 -    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
 +    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
      return cr0 & CR0_PG;
  }
  bool x86_is_pae_enabled(struct CPUState *cpu)
  {
 -    uint64_t cr4 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
 +    uint64_t cr4 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
      return cr4 & CR4_PAE;
  }
 diff --git a/target/i386/hvf/x86_descr.c b/target/i386/hvf/x86_descr.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/x86_descr.c
 +++ b/target/i386/hvf/x86_descr.c
@@ -XXX,XX +XXX,XX @@ static const struct vmx_segment_field {
  uint32_t vmx_read_segment_limit(CPUState *cpu, X86Seg seg)
  {
 -    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
 +    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
  }
  uint32_t vmx_read_segment_ar(CPUState *cpu, X86Seg seg)
  {
 -    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
 +    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
  }
  uint64_t vmx_read_segment_base(CPUState *cpu, X86Seg seg)
  {
 -    return rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
 +    return rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
  }
  x68_segment_selector vmx_read_segment_selector(CPUState *cpu, X86Seg seg)
  {
      x68_segment_selector sel;
 -    sel.sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
 +    sel.sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
      return sel;
  }
  void vmx_write_segment_selector(struct CPUState *cpu, x68_segment_selector selector, X86Seg seg)
  {
 -    wvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector, selector.sel);
 +    wvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector, selector.sel);
  }
  void vmx_read_segment_descriptor(struct CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
  {
 -    desc->sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
 -    desc->base = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
 -    desc->limit = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
 -    desc->ar = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
 +    desc->sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
 +    desc->base = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
 +    desc->limit = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
 +    desc->ar = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
  }
  void vmx_write_segment_descriptor(CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
  {
      const struct vmx_segment_field *sf = &vmx_segment_fields[seg];
 -    wvmcs(cpu->hvf_fd, sf->base, desc->base);
 -    wvmcs(cpu->hvf_fd, sf->limit, desc->limit);
 -    wvmcs(cpu->hvf_fd, sf->selector, desc->sel);
 -    wvmcs(cpu->hvf_fd, sf->ar_bytes, desc->ar);
 +    wvmcs(cpu->hvf->fd, sf->base, desc->base);
 +    wvmcs(cpu->hvf->fd, sf->limit, desc->limit);
 +    wvmcs(cpu->hvf->fd, sf->selector, desc->sel);
 +    wvmcs(cpu->hvf->fd, sf->ar_bytes, desc->ar);
  }
  void x86_segment_descriptor_to_vmx(struct CPUState *cpu, x68_segment_selector selector, struct x86_segment_descriptor *desc, struct vmx_segment *vmx_desc)
 diff --git a/target/i386/hvf/x86_emu.c b/target/i386/hvf/x86_emu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/x86_emu.c
 +++ b/target/i386/hvf/x86_emu.c
@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
      switch (msr) {
      case MSR_IA32_TSC:
 -        val = rdtscp() + rvmcs(cpu->hvf_fd, VMCS_TSC_OFFSET);
 +        val = rdtscp() + rvmcs(cpu->hvf->fd, VMCS_TSC_OFFSET);
          break;
      case MSR_IA32_APICBASE:
          val = cpu_get_apic_base(X86_CPU(cpu)->apic_state);
@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
          val = x86_cpu->ucode_rev;
          break;
      case MSR_EFER:
 -        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER);
 +        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER);
          break;
      case MSR_FSBASE:
 -        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE);
 +        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE);
          break;
      case MSR_GSBASE:
 -        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE);
 +        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE);
          break;
      case MSR_KERNELGSBASE:
 -        val = rvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE);
 +        val = rvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE);
          break;
      case MSR_STAR:
          abort();
@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
          cpu_set_apic_base(X86_CPU(cpu)->apic_state, data);
          break;
      case MSR_FSBASE:
 -        wvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE, data);
 +        wvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE, data);
          break;
      case MSR_GSBASE:
 -        wvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE, data);
 +        wvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE, data);
          break;
      case MSR_KERNELGSBASE:
 -        wvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE, data);
 +        wvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE, data);
          break;
      case MSR_STAR:
          abort();
@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
          break;
      case MSR_EFER:
          /*printf("new efer %llx\n", EFER(cpu));*/
 -        wvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER, data);
 +        wvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER, data);
          if (data & MSR_EFER_NXE) {
 -            hv_vcpu_invalidate_tlb(cpu->hvf_fd);
 +            hv_vcpu_invalidate_tlb(cpu->hvf->fd);
          }
          break;
      case MSR_MTRRphysBase(0):
@@ -XXX,XX +XXX,XX @@ void load_regs(struct CPUState *cpu)
      CPUX86State *env = &x86_cpu->env;
      int i = 0;
 -    RRX(env, R_EAX) = rreg(cpu->hvf_fd, HV_X86_RAX);
 -    RRX(env, R_EBX) = rreg(cpu->hvf_fd, HV_X86_RBX);
 -    RRX(env, R_ECX) = rreg(cpu->hvf_fd, HV_X86_RCX);
 -    RRX(env, R_EDX) = rreg(cpu->hvf_fd, HV_X86_RDX);
 -    RRX(env, R_ESI) = rreg(cpu->hvf_fd, HV_X86_RSI);
 -    RRX(env, R_EDI) = rreg(cpu->hvf_fd, HV_X86_RDI);
 -    RRX(env, R_ESP) = rreg(cpu->hvf_fd, HV_X86_RSP);
 -    RRX(env, R_EBP) = rreg(cpu->hvf_fd, HV_X86_RBP);
 +    RRX(env, R_EAX) = rreg(cpu->hvf->fd, HV_X86_RAX);
 +    RRX(env, R_EBX) = rreg(cpu->hvf->fd, HV_X86_RBX);
 +    RRX(env, R_ECX) = rreg(cpu->hvf->fd, HV_X86_RCX);
 +    RRX(env, R_EDX) = rreg(cpu->hvf->fd, HV_X86_RDX);
 +    RRX(env, R_ESI) = rreg(cpu->hvf->fd, HV_X86_RSI);
 +    RRX(env, R_EDI) = rreg(cpu->hvf->fd, HV_X86_RDI);
 +    RRX(env, R_ESP) = rreg(cpu->hvf->fd, HV_X86_RSP);
 +    RRX(env, R_EBP) = rreg(cpu->hvf->fd, HV_X86_RBP);
      for (i = 8; i < 16; i++) {
 -        RRX(env, i) = rreg(cpu->hvf_fd, HV_X86_RAX + i);
 +        RRX(env, i) = rreg(cpu->hvf->fd, HV_X86_RAX + i);
      }
 -    env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
 +    env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
      rflags_to_lflags(env);
 -    env->eip = rreg(cpu->hvf_fd, HV_X86_RIP);
 +    env->eip = rreg(cpu->hvf->fd, HV_X86_RIP);
  }
  void store_regs(struct CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ void store_regs(struct CPUState *cpu)
      CPUX86State *env = &x86_cpu->env;
      int i = 0;
 -    wreg(cpu->hvf_fd, HV_X86_RAX, RAX(env));
 -    wreg(cpu->hvf_fd, HV_X86_RBX, RBX(env));
 -    wreg(cpu->hvf_fd, HV_X86_RCX, RCX(env));
 -    wreg(cpu->hvf_fd, HV_X86_RDX, RDX(env));
 -    wreg(cpu->hvf_fd, HV_X86_RSI, RSI(env));
 -    wreg(cpu->hvf_fd, HV_X86_RDI, RDI(env));
 -    wreg(cpu->hvf_fd, HV_X86_RBP, RBP(env));
 -    wreg(cpu->hvf_fd, HV_X86_RSP, RSP(env));
 +    wreg(cpu->hvf->fd, HV_X86_RAX, RAX(env));
 +    wreg(cpu->hvf->fd, HV_X86_RBX, RBX(env));
 +    wreg(cpu->hvf->fd, HV_X86_RCX, RCX(env));
 +    wreg(cpu->hvf->fd, HV_X86_RDX, RDX(env));
 +    wreg(cpu->hvf->fd, HV_X86_RSI, RSI(env));
 +    wreg(cpu->hvf->fd, HV_X86_RDI, RDI(env));
 +    wreg(cpu->hvf->fd, HV_X86_RBP, RBP(env));
 +    wreg(cpu->hvf->fd, HV_X86_RSP, RSP(env));
      for (i = 8; i < 16; i++) {
 -        wreg(cpu->hvf_fd, HV_X86_RAX + i, RRX(env, i));
 +        wreg(cpu->hvf->fd, HV_X86_RAX + i, RRX(env, i));
      }
      lflags_to_rflags(env);
 -    wreg(cpu->hvf_fd, HV_X86_RFLAGS, env->eflags);
 +    wreg(cpu->hvf->fd, HV_X86_RFLAGS, env->eflags);
      macvm_set_rip(cpu, env->eip);
  }
 diff --git a/target/i386/hvf/x86_mmu.c b/target/i386/hvf/x86_mmu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/x86_mmu.c
 +++ b/target/i386/hvf/x86_mmu.c
@@ -XXX,XX +XXX,XX @@ static bool test_pt_entry(struct CPUState *cpu, struct gpt_translation *pt,
          pt->err_code |= MMU_PAGE_PT;
      }
 -    uint32_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
 +    uint32_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
      /* check protection */
      if (cr0 & CR0_WP) {
          if (pt->write_access && !pte_write_access(pte)) {
@@ -XXX,XX +XXX,XX @@ static bool walk_gpt(struct CPUState *cpu, target_ulong addr, int err_code,
  {
      int top_level, level;
      bool is_large = false;
 -    target_ulong cr3 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR3);
 +    target_ulong cr3 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR3);
      uint64_t page_mask = pae ? PAE_PTE_PAGE_MASK : LEGACY_PTE_PAGE_MASK;
      memset(pt, 0, sizeof(*pt));
 diff --git a/target/i386/hvf/x86_task.c b/target/i386/hvf/x86_task.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/x86_task.c
 +++ b/target/i386/hvf/x86_task.c
@@ -XXX,XX +XXX,XX @@ static void load_state_from_tss32(CPUState *cpu, struct x86_tss_segment32 *tss)
      X86CPU *x86_cpu = X86_CPU(cpu);
      CPUX86State *env = &x86_cpu->env;
 -    wvmcs(cpu->hvf_fd, VMCS_GUEST_CR3, tss->cr3);
 +    wvmcs(cpu->hvf->fd, VMCS_GUEST_CR3, tss->cr3);
      env->eip = tss->eip;
      env->eflags = tss->eflags | 2;
@@ -XXX,XX +XXX,XX @@ static int task_switch_32(CPUState *cpu, x68_segment_selector tss_sel, x68_segme
  void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int reason, bool gate_valid, uint8_t gate, uint64_t gate_type)
  {
 -    uint64_t rip = rreg(cpu->hvf_fd, HV_X86_RIP);
 +    uint64_t rip = rreg(cpu->hvf->fd, HV_X86_RIP);
      if (!gate_valid || (gate_type != VMCS_INTR_T_HWEXCEPTION &&
                          gate_type != VMCS_INTR_T_HWINTR &&
                          gate_type != VMCS_INTR_T_NMI)) {
 -        int ins_len = rvmcs(cpu->hvf_fd, VMCS_EXIT_INSTRUCTION_LENGTH);
 +        int ins_len = rvmcs(cpu->hvf->fd, VMCS_EXIT_INSTRUCTION_LENGTH);
          macvm_set_rip(cpu, rip + ins_len);
          return;
      }
-+
+@@ -XXX,XX +XXX,XX @@ void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int rea
-+    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
+         //ret = task_switch_16(cpu, tss_sel, old_tss_sel, old_tss_base, &next_tss_desc);
-     exynos4210_update_resolution(s);
+         VM_PANIC("task_switch_16");
-     surface = qemu_console_surface(s->console);
+-    macvm_set_cr0(cpu->hvf_fd, rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0) | CR0_TS);
 +    macvm_set_cr0(cpu->hvf->fd, rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0) | CR0_TS);
      x86_segment_descriptor_to_vmx(cpu, tss_sel, &next_tss_desc, &vmx_seg);
      vmx_write_segment_descriptor(cpu, &vmx_seg, R_TR);
      store_regs(cpu);
 -    hv_vcpu_invalidate_tlb(cpu->hvf_fd);
 -    hv_vcpu_flush(cpu->hvf_fd);
 +    hv_vcpu_invalidate_tlb(cpu->hvf->fd);
 +    hv_vcpu_flush(cpu->hvf->fd);
  }
 diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/x86hvf.c
 +++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@ void hvf_put_xsave(CPUState *cpu_state)
      x86_cpu_xsave_all_areas(X86_CPU(cpu_state), xsave);
 -    if (hv_vcpu_write_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
 +    if (hv_vcpu_write_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
          abort();
      }
  }
@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
      CPUX86State *env = &X86_CPU(cpu_state)->env;
      struct vmx_segment seg;
 -    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
 -    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
 +    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
 +    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
 -    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
 -    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
 +    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
 +    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
 -    /* wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR2, env->cr[2]); */
 -    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3, env->cr[3]);
 +    /* wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR2, env->cr[2]); */
 +    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3, env->cr[3]);
      vmx_update_tpr(cpu_state);
 -    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER, env->efer);
 +    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER, env->efer);
 -    macvm_set_cr4(cpu_state->hvf_fd, env->cr[4]);
 -    macvm_set_cr0(cpu_state->hvf_fd, env->cr[0]);
 +    macvm_set_cr4(cpu_state->hvf->fd, env->cr[4]);
 +    macvm_set_cr0(cpu_state->hvf->fd, env->cr[0]);
      hvf_set_segment(cpu_state, &seg, &env->segs[R_CS], false);
      vmx_write_segment_descriptor(cpu_state, &seg, R_CS);
@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
      hvf_set_segment(cpu_state, &seg, &env->ldt, false);
      vmx_write_segment_descriptor(cpu_state, &seg, R_LDTR);
 -    hv_vcpu_flush(cpu_state->hvf_fd);
 +    hv_vcpu_flush(cpu_state->hvf->fd);
  }
  void hvf_put_msrs(CPUState *cpu_state)
  {
      CPUX86State *env = &X86_CPU(cpu_state)->env;
 -    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS,
 +    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS,
                        env->sysenter_cs);
 -    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP,
 +    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP,
                        env->sysenter_esp);
 -    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP,
 +    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP,
                        env->sysenter_eip);
 -    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_STAR, env->star);
 +    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_STAR, env->star);
  #ifdef TARGET_X86_64
 -    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_CSTAR, env->cstar);
 -    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, env->kernelgsbase);
 -    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FMASK, env->fmask);
 -    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_LSTAR, env->lstar);
 +    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_CSTAR, env->cstar);
 +    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, env->kernelgsbase);
 +    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FMASK, env->fmask);
 +    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_LSTAR, env->lstar);
  #endif
 -    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_GSBASE, env->segs[R_GS].base);
 -    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FSBASE, env->segs[R_FS].base);
 +    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_GSBASE, env->segs[R_GS].base);
 +    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FSBASE, env->segs[R_FS].base);
  }
@@ -XXX,XX +XXX,XX @@ void hvf_get_xsave(CPUState *cpu_state)
      xsave = X86_CPU(cpu_state)->env.xsave_buf;
 -    if (hv_vcpu_read_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
 +    if (hv_vcpu_read_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
          abort();
      }
@@ -XXX,XX +XXX,XX @@ void hvf_get_segments(CPUState *cpu_state)
      vmx_read_segment_descriptor(cpu_state, &seg, R_LDTR);
      hvf_get_segment(&env->ldt, &seg);
 -    env->idt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
 -    env->idt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE);
 -    env->gdt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
 -    env->gdt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE);
 +    env->idt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
 +    env->idt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE);
 +    env->gdt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
 +    env->gdt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE);
 -    env->cr[0] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR0);
 +    env->cr[0] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR0);
      env->cr[2] = 0;
 -    env->cr[3] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3);
 -    env->cr[4] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR4);
 +    env->cr[3] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3);
 +    env->cr[4] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR4);
 -    env->efer = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER);
 +    env->efer = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER);
  }
  void hvf_get_msrs(CPUState *cpu_state)
@@ -XXX,XX +XXX,XX @@ void hvf_get_msrs(CPUState *cpu_state)
      CPUX86State *env = &X86_CPU(cpu_state)->env;
      uint64_t tmp;
 -    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS, &tmp);
 +    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS, &tmp);
      env->sysenter_cs = tmp;
 -    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP, &tmp);
 +    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP, &tmp);
      env->sysenter_esp = tmp;
 -    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP, &tmp);
 +    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP, &tmp);
      env->sysenter_eip = tmp;
 -    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_STAR, &env->star);
 +    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_STAR, &env->star);
  #ifdef TARGET_X86_64
 -    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_CSTAR, &env->cstar);
 -    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, &env->kernelgsbase);
 -    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_FMASK, &env->fmask);
 -    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_LSTAR, &env->lstar);
 +    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_CSTAR, &env->cstar);
 +    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, &env->kernelgsbase);
 +    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_FMASK, &env->fmask);
 +    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_LSTAR, &env->lstar);
  #endif
 -    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_APICBASE, &tmp);
 +    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_APICBASE, &tmp);
 -    env->tsc = rdtscp() + rvmcs(cpu_state->hvf_fd, VMCS_TSC_OFFSET);
 +    env->tsc = rdtscp() + rvmcs(cpu_state->hvf->fd, VMCS_TSC_OFFSET);
  }
  int hvf_put_registers(CPUState *cpu_state)
@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
      X86CPU *x86cpu = X86_CPU(cpu_state);
      CPUX86State *env = &x86cpu->env;
 -    wreg(cpu_state->hvf_fd, HV_X86_RAX, env->regs[R_EAX]);
 -    wreg(cpu_state->hvf_fd, HV_X86_RBX, env->regs[R_EBX]);
 -    wreg(cpu_state->hvf_fd, HV_X86_RCX, env->regs[R_ECX]);
 -    wreg(cpu_state->hvf_fd, HV_X86_RDX, env->regs[R_EDX]);
 -    wreg(cpu_state->hvf_fd, HV_X86_RBP, env->regs[R_EBP]);
 -    wreg(cpu_state->hvf_fd, HV_X86_RSP, env->regs[R_ESP]);
 -    wreg(cpu_state->hvf_fd, HV_X86_RSI, env->regs[R_ESI]);
 -    wreg(cpu_state->hvf_fd, HV_X86_RDI, env->regs[R_EDI]);
 -    wreg(cpu_state->hvf_fd, HV_X86_R8, env->regs[8]);
 -    wreg(cpu_state->hvf_fd, HV_X86_R9, env->regs[9]);
 -    wreg(cpu_state->hvf_fd, HV_X86_R10, env->regs[10]);
 -    wreg(cpu_state->hvf_fd, HV_X86_R11, env->regs[11]);
 -    wreg(cpu_state->hvf_fd, HV_X86_R12, env->regs[12]);
 -    wreg(cpu_state->hvf_fd, HV_X86_R13, env->regs[13]);
 -    wreg(cpu_state->hvf_fd, HV_X86_R14, env->regs[14]);
 -    wreg(cpu_state->hvf_fd, HV_X86_R15, env->regs[15]);
 -    wreg(cpu_state->hvf_fd, HV_X86_RFLAGS, env->eflags);
 -    wreg(cpu_state->hvf_fd, HV_X86_RIP, env->eip);
 +    wreg(cpu_state->hvf->fd, HV_X86_RAX, env->regs[R_EAX]);
 +    wreg(cpu_state->hvf->fd, HV_X86_RBX, env->regs[R_EBX]);
 +    wreg(cpu_state->hvf->fd, HV_X86_RCX, env->regs[R_ECX]);
 +    wreg(cpu_state->hvf->fd, HV_X86_RDX, env->regs[R_EDX]);
 +    wreg(cpu_state->hvf->fd, HV_X86_RBP, env->regs[R_EBP]);
 +    wreg(cpu_state->hvf->fd, HV_X86_RSP, env->regs[R_ESP]);
 +    wreg(cpu_state->hvf->fd, HV_X86_RSI, env->regs[R_ESI]);
 +    wreg(cpu_state->hvf->fd, HV_X86_RDI, env->regs[R_EDI]);
 +    wreg(cpu_state->hvf->fd, HV_X86_R8, env->regs[8]);
 +    wreg(cpu_state->hvf->fd, HV_X86_R9, env->regs[9]);
 +    wreg(cpu_state->hvf->fd, HV_X86_R10, env->regs[10]);
 +    wreg(cpu_state->hvf->fd, HV_X86_R11, env->regs[11]);
 +    wreg(cpu_state->hvf->fd, HV_X86_R12, env->regs[12]);
 +    wreg(cpu_state->hvf->fd, HV_X86_R13, env->regs[13]);
 +    wreg(cpu_state->hvf->fd, HV_X86_R14, env->regs[14]);
 +    wreg(cpu_state->hvf->fd, HV_X86_R15, env->regs[15]);
 +    wreg(cpu_state->hvf->fd, HV_X86_RFLAGS, env->eflags);
 +    wreg(cpu_state->hvf->fd, HV_X86_RIP, env->eip);
 -    wreg(cpu_state->hvf_fd, HV_X86_XCR0, env->xcr0);
 +    wreg(cpu_state->hvf->fd, HV_X86_XCR0, env->xcr0);
      hvf_put_xsave(cpu_state);
@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
      hvf_put_msrs(cpu_state);
 -    wreg(cpu_state->hvf_fd, HV_X86_DR0, env->dr[0]);
 -    wreg(cpu_state->hvf_fd, HV_X86_DR1, env->dr[1]);
 -    wreg(cpu_state->hvf_fd, HV_X86_DR2, env->dr[2]);
 -    wreg(cpu_state->hvf_fd, HV_X86_DR3, env->dr[3]);
 -    wreg(cpu_state->hvf_fd, HV_X86_DR4, env->dr[4]);
 -    wreg(cpu_state->hvf_fd, HV_X86_DR5, env->dr[5]);
 -    wreg(cpu_state->hvf_fd, HV_X86_DR6, env->dr[6]);
 -    wreg(cpu_state->hvf_fd, HV_X86_DR7, env->dr[7]);
 +    wreg(cpu_state->hvf->fd, HV_X86_DR0, env->dr[0]);
 +    wreg(cpu_state->hvf->fd, HV_X86_DR1, env->dr[1]);
 +    wreg(cpu_state->hvf->fd, HV_X86_DR2, env->dr[2]);
 +    wreg(cpu_state->hvf->fd, HV_X86_DR3, env->dr[3]);
 +    wreg(cpu_state->hvf->fd, HV_X86_DR4, env->dr[4]);
 +    wreg(cpu_state->hvf->fd, HV_X86_DR5, env->dr[5]);
 +    wreg(cpu_state->hvf->fd, HV_X86_DR6, env->dr[6]);
 +    wreg(cpu_state->hvf->fd, HV_X86_DR7, env->dr[7]);
      return 0;
  }
@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
      X86CPU *x86cpu = X86_CPU(cpu_state);
      CPUX86State *env = &x86cpu->env;
 -    env->regs[R_EAX] = rreg(cpu_state->hvf_fd, HV_X86_RAX);
 -    env->regs[R_EBX] = rreg(cpu_state->hvf_fd, HV_X86_RBX);
 -    env->regs[R_ECX] = rreg(cpu_state->hvf_fd, HV_X86_RCX);
 -    env->regs[R_EDX] = rreg(cpu_state->hvf_fd, HV_X86_RDX);
 -    env->regs[R_EBP] = rreg(cpu_state->hvf_fd, HV_X86_RBP);
 -    env->regs[R_ESP] = rreg(cpu_state->hvf_fd, HV_X86_RSP);
 -    env->regs[R_ESI] = rreg(cpu_state->hvf_fd, HV_X86_RSI);
 -    env->regs[R_EDI] = rreg(cpu_state->hvf_fd, HV_X86_RDI);
 -    env->regs[8] = rreg(cpu_state->hvf_fd, HV_X86_R8);
 -    env->regs[9] = rreg(cpu_state->hvf_fd, HV_X86_R9);
 -    env->regs[10] = rreg(cpu_state->hvf_fd, HV_X86_R10);
 -    env->regs[11] = rreg(cpu_state->hvf_fd, HV_X86_R11);
 -    env->regs[12] = rreg(cpu_state->hvf_fd, HV_X86_R12);
 -    env->regs[13] = rreg(cpu_state->hvf_fd, HV_X86_R13);
 -    env->regs[14] = rreg(cpu_state->hvf_fd, HV_X86_R14);
 -    env->regs[15] = rreg(cpu_state->hvf_fd, HV_X86_R15);
 +    env->regs[R_EAX] = rreg(cpu_state->hvf->fd, HV_X86_RAX);
 +    env->regs[R_EBX] = rreg(cpu_state->hvf->fd, HV_X86_RBX);
 +    env->regs[R_ECX] = rreg(cpu_state->hvf->fd, HV_X86_RCX);
 +    env->regs[R_EDX] = rreg(cpu_state->hvf->fd, HV_X86_RDX);
 +    env->regs[R_EBP] = rreg(cpu_state->hvf->fd, HV_X86_RBP);
 +    env->regs[R_ESP] = rreg(cpu_state->hvf->fd, HV_X86_RSP);
 +    env->regs[R_ESI] = rreg(cpu_state->hvf->fd, HV_X86_RSI);
 +    env->regs[R_EDI] = rreg(cpu_state->hvf->fd, HV_X86_RDI);
 +    env->regs[8] = rreg(cpu_state->hvf->fd, HV_X86_R8);
 +    env->regs[9] = rreg(cpu_state->hvf->fd, HV_X86_R9);
 +    env->regs[10] = rreg(cpu_state->hvf->fd, HV_X86_R10);
 +    env->regs[11] = rreg(cpu_state->hvf->fd, HV_X86_R11);
 +    env->regs[12] = rreg(cpu_state->hvf->fd, HV_X86_R12);
 +    env->regs[13] = rreg(cpu_state->hvf->fd, HV_X86_R13);
 +    env->regs[14] = rreg(cpu_state->hvf->fd, HV_X86_R14);
 +    env->regs[15] = rreg(cpu_state->hvf->fd, HV_X86_R15);
 -    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
 -    env->eip = rreg(cpu_state->hvf_fd, HV_X86_RIP);
 +    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
 +    env->eip = rreg(cpu_state->hvf->fd, HV_X86_RIP);
      hvf_get_xsave(cpu_state);
 -    env->xcr0 = rreg(cpu_state->hvf_fd, HV_X86_XCR0);
 +    env->xcr0 = rreg(cpu_state->hvf->fd, HV_X86_XCR0);
      hvf_get_segments(cpu_state);
      hvf_get_msrs(cpu_state);
 -    env->dr[0] = rreg(cpu_state->hvf_fd, HV_X86_DR0);
 -    env->dr[1] = rreg(cpu_state->hvf_fd, HV_X86_DR1);
 -    env->dr[2] = rreg(cpu_state->hvf_fd, HV_X86_DR2);
 -    env->dr[3] = rreg(cpu_state->hvf_fd, HV_X86_DR3);
 -    env->dr[4] = rreg(cpu_state->hvf_fd, HV_X86_DR4);
 -    env->dr[5] = rreg(cpu_state->hvf_fd, HV_X86_DR5);
 -    env->dr[6] = rreg(cpu_state->hvf_fd, HV_X86_DR6);
 -    env->dr[7] = rreg(cpu_state->hvf_fd, HV_X86_DR7);
 +    env->dr[0] = rreg(cpu_state->hvf->fd, HV_X86_DR0);
 +    env->dr[1] = rreg(cpu_state->hvf->fd, HV_X86_DR1);
 +    env->dr[2] = rreg(cpu_state->hvf->fd, HV_X86_DR2);
 +    env->dr[3] = rreg(cpu_state->hvf->fd, HV_X86_DR3);
 +    env->dr[4] = rreg(cpu_state->hvf->fd, HV_X86_DR4);
 +    env->dr[5] = rreg(cpu_state->hvf->fd, HV_X86_DR5);
 +    env->dr[6] = rreg(cpu_state->hvf->fd, HV_X86_DR6);
 +    env->dr[7] = rreg(cpu_state->hvf->fd, HV_X86_DR7);
      x86_update_hflags(env);
      return 0;
@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
  static void vmx_set_int_window_exiting(CPUState *cpu)
  {
       uint64_t val;
 -     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
 -     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
 +     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
 +     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
               VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
  }
  void vmx_clear_int_window_exiting(CPUState *cpu)
  {
       uint64_t val;
 -     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
 -     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
 +     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
 +     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
               ~VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
  }
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
      uint64_t info = 0;
      if (have_event) {
          info = vector | intr_type | VMCS_INTR_VALID;
 -        uint64_t reason = rvmcs(cpu_state->hvf_fd, VMCS_EXIT_REASON);
 +        uint64_t reason = rvmcs(cpu_state->hvf->fd, VMCS_EXIT_REASON);
          if (env->nmi_injected && reason != EXIT_REASON_TASK_SWITCH) {
              vmx_clear_nmi_blocking(cpu_state);
          }
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
              info &= ~(1 << 12); /* clear undefined bit */
              if (intr_type == VMCS_INTR_T_SWINTR ||
                  intr_type == VMCS_INTR_T_SWEXCEPTION) {
 -                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
 +                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
              }
              if (env->has_error_code) {
 -                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_EXCEPTION_ERROR,
 +                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_EXCEPTION_ERROR,
                        env->error_code);
                  /* Indicate that VMCS_ENTRY_EXCEPTION_ERROR is valid */
                  info |= VMCS_INTR_DEL_ERRCODE;
              }
              /*printf("reinject  %lx err %d\n", info, err);*/
 -            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
 +            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
          };
      }
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
          if (!(env->hflags2 & HF2_NMI_MASK) && !(info & VMCS_INTR_VALID)) {
              cpu_state->interrupt_request &= ~CPU_INTERRUPT_NMI;
              info = VMCS_INTR_VALID | VMCS_INTR_T_NMI | EXCP02_NMI;
 -            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
 +            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
          } else {
              vmx_set_nmi_window_exiting(cpu_state);
          }
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
          int line = cpu_get_pic_interrupt(&x86cpu->env);
          cpu_state->interrupt_request &= ~CPU_INTERRUPT_HARD;
          if (line >= 0) {
 -            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, line |
 +            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, line |
                    VMCS_INTR_VALID | VMCS_INTR_T_HWINTR);
          }
      }
@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
      X86CPU *cpu = X86_CPU(cpu_state);
      CPUX86State *env = &cpu->env;
 -    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
 +    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
      if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
          cpu_synchronize_state(cpu_state);
 --
 .20.1

-[PULL 06/26] target/arm: Expand read/write_neon_element32 to all MemOp
+[PULL 39/45] hvf: Simplify post reset/init/loadvm hooks
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Alexander Graf <agraf@csgraf.de>
-We can then use this to improve VMOV (scalar to gp) and
+The hooks we have that call us after reset, init and loadvm really all
-VMOV (gp to scalar) so that we simply perform the memory
+just want to say "The reference of all register state is in the QEMU
-operation that we wanted, rather than inserting or
+vcpu struct, please push it".
 extracting from a 32-bit quantity.
-These were the last uses of neon_load/store_reg, so remove them.
+We already have a working pushing mechanism though called cpu->vcpu_dirty,
 so we can just reuse that for all of the above, syncing state properly the
 next time we actually execute a vCPU.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+This fixes PSCI resets on ARM, as they modify CPU state even after the
-Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
+post init call has completed, but before we execute the vCPU again.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 To also make the scheme work for x86, we have to make sure we don't
 move stale eflags into our env when the vcpu state is dirty.
 Signed-off-by: Alexander Graf <agraf@csgraf.de>
 Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
 Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
 Reviewed-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20210519202253.76782-13-agraf@csgraf.de
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         | 50 +++++++++++++-----------
+ accel/hvf/hvf-accel-ops.c | 27 +++++++--------------------
- target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
+ target/i386/hvf/x86hvf.c  |  5 ++++-
-files changed, 37 insertions(+), 84 deletions(-)
+files changed, 11 insertions(+), 21 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/accel/hvf/hvf-accel-ops.c
-+++ b/target/arm/translate.c
++++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+@@ -XXX,XX +XXX,XX @@ static void hvf_cpu_synchronize_state(CPUState *cpu)
   * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
   * where 0 is the least significant end of the register.
   */
 -static long neon_element_offset(int reg, int element, MemOp size)
 +static long neon_element_offset(int reg, int element, MemOp memop)
  {
 -    int element_size = 1 << size;
 +    int element_size = 1 << (memop & MO_SIZE);
      int ofs = element * element_size;
  #ifdef HOST_WORDS_BIGENDIAN
      /*
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
      }
  }
--static TCGv_i32 neon_load_reg(int reg, int pass)
+-static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
--{
+-                                              run_on_cpu_data arg)
--    TCGv_i32 tmp = tcg_temp_new_i32();
++static void do_hvf_cpu_synchronize_set_dirty(CPUState *cpu,
--    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
++                                             run_on_cpu_data arg)
--    return tmp;
+ {
 -    hvf_put_registers(cpu);
 -    cpu->vcpu_dirty = false;
 +    /* QEMU state is the reference, push it to HVF now and on next entry */
 +    cpu->vcpu_dirty = true;
  }
  static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
  {
 -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
 -}
 -
--static void neon_store_reg(int reg, int pass, TCGv_i32 var)
+-static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
 -                                             run_on_cpu_data arg)
 -{
--    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
+-    hvf_put_registers(cpu);
--    tcg_temp_free_i32(var);
+-    cpu->vcpu_dirty = false;
 +    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
  }
  static void hvf_cpu_synchronize_post_init(CPUState *cpu)
  {
 -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 -}
 -
- static inline void neon_load_reg64(TCGv_i64 var, int reg)
+-static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
 -                                              run_on_cpu_data arg)
 -{
 -    cpu->vcpu_dirty = true;
 +    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
  }
  static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
  {
-     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+-    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
++    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
      tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
  }
--static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+ static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
-+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
+diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
  {
 -    long off = neon_element_offset(reg, ele, size);
 +    long off = neon_element_offset(reg, ele, memop);
 -    switch (size) {
 -    case MO_32:
 +    switch (memop) {
 +    case MO_SB:
 +        tcg_gen_ld8s_i32(dest, cpu_env, off);
 +        break;
 +    case MO_UB:
 +        tcg_gen_ld8u_i32(dest, cpu_env, off);
 +        break;
 +    case MO_SW:
 +        tcg_gen_ld16s_i32(dest, cpu_env, off);
 +        break;
 +    case MO_UW:
 +        tcg_gen_ld16u_i32(dest, cpu_env, off);
 +        break;
 +    case MO_UL:
 +    case MO_SL:
          tcg_gen_ld_i32(dest, cpu_env, off);
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
      }
  }
 -static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
 +static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
  {
 -    long off = neon_element_offset(reg, ele, size);
 +    long off = neon_element_offset(reg, ele, memop);
 -    switch (size) {
 +    switch (memop) {
 +    case MO_8:
 +        tcg_gen_st8_i32(src, cpu_env, off);
 +        break;
 +    case MO_16:
 +        tcg_gen_st16_i32(src, cpu_env, off);
 +        break;
      case MO_32:
          tcg_gen_st_i32(src, cpu_env, off);
          break;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c.inc
+--- a/target/i386/hvf/x86hvf.c
-+++ b/target/arm/translate-vfp.c.inc
++++ b/target/i386/hvf/x86hvf.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
+@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
- {
+     X86CPU *cpu = X86_CPU(cpu_state);
-     /* VMOV scalar to general purpose register */
+     CPUX86State *env = &cpu->env;
-     TCGv_i32 tmp;
--    int pass;
+-    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
--    uint32_t offset;
++    if (!cpu_state->vcpu_dirty) {
++        /* light weight sync for CPU_INTERRUPT_HARD and IF_MASK */
--    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
++        env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
--    if (a->size == 2
++    }
-+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-+    if (a->size == MO_32
+     if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
-         ? !dc_isar_feature(aa32_fpsp_v2, s)
+         cpu_synchronize_state(cpu_state);
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
 -    tmp = neon_load_reg(a->vn, pass);
 -    switch (a->size) {
 -    case 0:
 -        if (offset) {
 -            tcg_gen_shri_i32(tmp, tmp, offset);
 -        }
 -        if (a->u) {
 -            gen_uxtb(tmp);
 -        } else {
 -            gen_sxtb(tmp);
 -        }
 -        break;
 -    case 1:
 -        if (a->u) {
 -            if (offset) {
 -                tcg_gen_shri_i32(tmp, tmp, 16);
 -            } else {
 -                gen_uxth(tmp);
 -            }
 -        } else {
 -            if (offset) {
 -                tcg_gen_sari_i32(tmp, tmp, 16);
 -            } else {
 -                gen_sxth(tmp);
 -            }
 -        }
 -        break;
 -    case 2:
 -        break;
 -    }
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
      store_reg(s, a->rt, tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
  static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
  {
      /* VMOV general purpose register to scalar */
 -    TCGv_i32 tmp, tmp2;
 -    int pass;
 -    uint32_t offset;
 +    TCGv_i32 tmp;
 -    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == 2
 +    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == MO_32
          ? !dc_isar_feature(aa32_fpsp_v2, s)
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
      tmp = load_reg(s, a->rt);
 -    switch (a->size) {
 -    case 0:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 1:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 2:
 -        break;
 -    }
 -    neon_store_reg(a->vn, pass, tmp);
 +    write_neon_element32(tmp, a->vn, a->index, a->size);
 +    tcg_temp_free_i32(tmp);
      return true;
  }
 --
 .20.1
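
The deferred-sync scheme described in the [PULL 39/45] commit message can be sketched in isolation. This is a toy model only: ToyVcpu, toy_put_registers(), toy_mark_dirty() and toy_vcpu_run() are hypothetical names invented for illustration and are not QEMU or Hypervisor.framework APIs. It assumes a single register and a counter, so that the "push once, on next entry" behaviour is observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vCPU; this is not QEMU's CPUState. */
typedef struct ToyVcpu {
    uint64_t qemu_pc;     /* reference copy held by the emulator */
    uint64_t hv_pc;       /* copy held by the (simulated) hypervisor */
    bool vcpu_dirty;      /* true: emulator copy is newer, push before run */
    unsigned push_count;  /* number of times state was pushed */
} ToyVcpu;

/* Copy the emulator's state into the simulated hypervisor. */
static void toy_put_registers(ToyVcpu *cpu)
{
    cpu->hv_pc = cpu->qemu_pc;
    cpu->push_count++;
}

/* Post-reset, post-init and pre-loadvm hooks all reduce to this. */
static void toy_mark_dirty(ToyVcpu *cpu)
{
    assert(cpu != NULL);
    cpu->vcpu_dirty = true;
}

/* State is pushed lazily, once, immediately before the vCPU executes. */
static uint64_t toy_vcpu_run(ToyVcpu *cpu)
{
    assert(cpu != NULL);
    if (cpu->vcpu_dirty) {
        toy_put_registers(cpu);
        cpu->vcpu_dirty = false;
    }
    return cpu->hv_pc;
}
```

In this model, a modification made after a hook has run but before the next toy_vcpu_run() call is still picked up, which is the property the commit message relies on for PSCI resets.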

-New patch
+[PULL 40/45] tests/qtest/bios-tables-test: Check for dup2() failure
+Coverity notes that we don't check for dup2() failing.  Add some
+assertions so that if it does ever happen we get some indication.
+(This is similar to how we handle other "don't expect this syscall to
+fail" checks in this test code.)
+Fixes: Coverity CID 1432346
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
+Message-id: 20210525134458.6675-2-peter.maydell@linaro.org
+---
+ tests/qtest/bios-tables-test.c | 8 ++++++--
+file changed, 6 insertions(+), 2 deletions(-)
+diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/qtest/bios-tables-test.c
++++ b/tests/qtest/bios-tables-test.c
+@@ -XXX,XX +XXX,XX @@ static void test_acpi_asl(test_data *data)
+                                                  exp_sdt->asl_file, sdt->asl_file);
+                     int out = dup(STDOUT_FILENO);
+                     int ret G_GNUC_UNUSED;
++                    int dupret;
+-                    dup2(STDERR_FILENO, STDOUT_FILENO);
++                    g_assert(out >= 0);
++                    dupret = dup2(STDERR_FILENO, STDOUT_FILENO);
++                    g_assert(dupret >= 0);
+                     ret = system(diff) ;
+-                    dup2(out, STDOUT_FILENO);
++                    dupret = dup2(out, STDOUT_FILENO);
++                    g_assert(dupret >= 0);
+                     close(out);
+                     g_free(diff);
+                 }
+--
+.20.1
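
The save, redirect, restore pattern that [PULL 40/45] hardens can be shown standalone. This is a sketch assuming a POSIX environment; run_with_stdout_on_stderr() and say_hello() are hypothetical helpers, and the sketch returns an error code where the test code in the patch asserts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical callback whose output should land on stderr. */
static void say_hello(void)
{
    puts("hello");
}

/*
 * Temporarily point stdout at stderr, run fn, then restore stdout.
 * Returns 0 on success, -1 if any descriptor call failed.
 */
static int run_with_stdout_on_stderr(void (*fn)(void))
{
    int saved;
    int rc = 0;

    assert(fn != NULL);
    fflush(stdout);                 /* do not carry buffered data across */
    saved = dup(STDOUT_FILENO);
    if (saved < 0) {
        return -1;
    }
    if (dup2(STDERR_FILENO, STDOUT_FILENO) < 0) {
        close(saved);
        return -1;
    }
    fn();
    fflush(stdout);                 /* flush while stdout is still redirected */
    if (dup2(saved, STDOUT_FILENO) < 0) {
        rc = -1;
    }
    close(saved);
    return rc;
}
```

Every dup() and dup2() result is compared against zero, because both calls return -1 on failure and a non-negative descriptor on success.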

-New patch
+[PULL 41/45] tests/qtest/e1000e-test: Check qemu_recv() succeeded
+The e1000e_send_verify() test calls qemu_recv() but doesn't
+check that the call succeeded, which annoys Coverity. Add
+an explicit test check for the length of the data.
+(This is a test check, not a "we assume this syscall always
+succeeds", so we use g_assert_cmpint() rather than g_assert().)
+Fixes: Coverity CID 1432324
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
+Message-id: 20210525134458.6675-3-peter.maydell@linaro.org
+---
+ tests/qtest/e1000e-test.c | 3 ++-
+file changed, 2 insertions(+), 1 deletion(-)
+diff --git a/tests/qtest/e1000e-test.c b/tests/qtest/e1000e-test.c
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/qtest/e1000e-test.c
++++ b/tests/qtest/e1000e-test.c
+@@ -XXX,XX +XXX,XX @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a
+     /* Check data sent to the backend */
+     ret = qemu_recv(test_sockets[0], &recv_len, sizeof(recv_len), 0);
+     g_assert_cmpint(ret, == , sizeof(recv_len));
+-    qemu_recv(test_sockets[0], buffer, 64, 0);
++    ret = qemu_recv(test_sockets[0], buffer, 64, 0);
++    g_assert_cmpint(ret, >=, 5);
+     g_assert_cmpstr(buffer, == , "TEST");
+     /* Free test data buffer */
+--
+.20.1
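
The ordering used in [PULL 41/45], checking the received length before comparing buffer contents, can be sketched without qtest. The example below assumes a POSIX system and uses plain socketpair(), send() and recv() rather than QEMU's qemu_recv() wrapper; roundtrip_matches() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send msg, including its terminating NUL, across a local socket pair and
 * verify the received length before looking at the received bytes.
 */
static bool roundtrip_matches(const char *msg)
{
    int sv[2];
    char buffer[64] = {0};
    size_t want = strlen(msg) + 1;
    ssize_t ret;
    bool ok = false;

    if (want > sizeof(buffer)) {
        return false;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return false;
    }
    if (send(sv[1], msg, want, 0) == (ssize_t)want) {
        ret = recv(sv[0], buffer, sizeof(buffer), 0);
        /* Length first: a failed or short read must not reach strcmp(). */
        ok = ret >= (ssize_t)want && strcmp(buffer, msg) == 0;
    }
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

For the string "TEST" the required length is 5 bytes, four characters plus the NUL, which matches the bound asserted in the patch.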

-New patch
+[PULL 42/45] tests/qtest/hd-geo-test: Fix checks on mkstemp() return value
+Coverity notices that the checks against mkstemp() failing in
+create_qcow2_with_mbr() are wrong: mkstemp returns -1 on failure but
+the check is just "g_assert(fd)".  Fix to use "g_assert(fd >= 0)",
+matching the correct check in create_test_img().
+Fixes: Coverity CID 1432274
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
+Message-id: 20210525134458.6675-4-peter.maydell@linaro.org
+---
+ tests/qtest/hd-geo-test.c | 4 ++--
+file changed, 2 insertions(+), 2 deletions(-)
+diff --git a/tests/qtest/hd-geo-test.c b/tests/qtest/hd-geo-test.c
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/qtest/hd-geo-test.c
++++ b/tests/qtest/hd-geo-test.c
+@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
+     }
+     fd = mkstemp(raw_path);
+-    g_assert(fd);
++    g_assert(fd >= 0);
+     close(fd);
+     fd = open(raw_path, O_WRONLY);
+@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
+     close(fd);
+     fd = mkstemp(qcow2_path);
+-    g_assert(fd);
++    g_assert(fd >= 0);
+     close(fd);
+     qemu_img_path = getenv("QTEST_QEMU_IMG");
+--
+.20.1
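
The difference between the old and new checks in [PULL 42/45] is easy to demonstrate. In this sketch the two predicates mirror "g_assert(fd)" and "g_assert(fd >= 0)"; fd_check_truthy(), fd_check_nonnegative() and make_temp_file() are hypothetical names, and a POSIX mkstemp() is assumed.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* What the old check tested: any non-zero value passes, including -1. */
static bool fd_check_truthy(int fd)
{
    return fd != 0;
}

/* The corrected check: a descriptor is valid when it is non-negative. */
static bool fd_check_nonnegative(int fd)
{
    return fd >= 0;
}

/*
 * Create a temporary file from a template ending in "XXXXXX", then close
 * and remove it. Returns false if mkstemp() failed.
 */
static bool make_temp_file(char *path_template)
{
    int fd = mkstemp(path_template);

    if (!fd_check_nonnegative(fd)) {
        return false;
    }
    close(fd);
    unlink(path_template);
    return true;
}
```

The truthiness check is wrong in both directions: it accepts the failure value -1 and would reject descriptor 0, which is a valid descriptor.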

-[PULL 16/26] disas/capstone: Fix monitor disassembly of >32 bytes
+[PULL 43/45] tests/qtest/pflash-cfi02-test: Avoid potential integer overflow
-If we're using the capstone disassembler, disassembly of a run of
+Coverity points out that we calculate a 64-bit value using 32-bit
-instructions more than 32 bytes long disassembles the wrong data for
+arithmetic; add the cast to force the multiply to be done as 64-bits.
-instructions beyond the 32 byte mark:
+(The overflow will never happen with the current test data.)
-(qemu) xp /16x 0x100
+Fixes: Coverity CID 1432320
 0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
 0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
 0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
 0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
 (qemu) xp /16i 0x100
 x00000100: 00000005 andeq r0, r0, r5
 x00000104: 54410001 strbpl r0, [r1], #-1
 x00000108: 00000001 andeq r0, r0, r1
 x0000010c: 00001000 andeq r1, r0, r0
 x00000110: 00000000 andeq r0, r0, r0
 x00000114: 00000004 andeq r0, r0, r4
 x00000118: 54410002 strbpl r0, [r1], #-2
 x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
 x00000120: 54410001 strbpl r0, [r1], #-1
 x00000124: 00000001 andeq r0, r0, r1
 x00000128: 00001000 andeq r1, r0, r0
 x0000012c: 00000000 andeq r0, r0, r0
 x00000130: 00000004 andeq r0, r0, r4
 x00000134: 54410002 strbpl r0, [r1], #-2
 x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
 x0000013c: 00000000 andeq r0, r0, r0
 Here the disassembly of 0x120..0x13f is using the data that is in
 x104..0x123.
 This is caused by passing the wrong value to the read_memory_func().
 The intention is that at this point in the loop the 'cap_buf' buffer
 already contains 'csize' bytes of data for the instruction at guest
 addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
 extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
 time through the loop 'csize' happens to be zero, so the initial read
 of 32 bytes into cap_buf is correct and as long as the disassembly
 never needs to read more data we return the correct information.
 Use the correct guest address in the call to read_memory_func().
 Cc: qemu-stable@nongnu.org
 Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
+Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
 Message-id: 20210525134458.6675-5-peter.maydell@linaro.org
 ---
- disas/capstone.c | 2 +-
+ tests/qtest/pflash-cfi02-test.c | 2 +-
 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/disas/capstone.c b/disas/capstone.c
+diff --git a/tests/qtest/pflash-cfi02-test.c b/tests/qtest/pflash-cfi02-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/disas/capstone.c
+--- a/tests/qtest/pflash-cfi02-test.c
-+++ b/disas/capstone.c
++++ b/tests/qtest/pflash-cfi02-test.c
-@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
+@@ -XXX,XX +XXX,XX @@ static void test_geometry(const void *opaque)
-         /* Make certain that we can make progress.  */
+     for (int region = 0; region < nb_erase_regions; ++region) {
-         assert(tsize != 0);
+         for (uint32_t i = 0; i < c->nb_blocs[region]; ++i) {
--        info->read_memory_func(pc, cap_buf + csize, tsize, info);
+-            uint64_t byte_addr = i * c->sector_len[region];
-+        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
++            uint64_t byte_addr = (uint64_t)i * c->sector_len[region];
-         csize += tsize;
+             g_assert_cmphex(flash_read(c, byte_addr), ==, bank_mask(c));
+         }
-         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
+     }
 --
 .20.1
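
The effect of the cast added in [PULL 43/45] can be reproduced in a few lines. This sketch assumes both operands are 32-bit unsigned values; the field types in the real test may differ, and byte_addr_narrow() and byte_addr_wide() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* The multiply is done in 32 bits and only then widened: wraps mod 2^32. */
static uint64_t byte_addr_narrow(uint32_t index, uint32_t sector_len)
{
    return index * sector_len;
}

/* Casting one operand first makes the multiply itself a 64-bit operation. */
static uint64_t byte_addr_wide(uint32_t index, uint32_t sector_len)
{
    return (uint64_t)index * sector_len;
}
```

With index 0x10000 and sector_len 0x10000 the narrow form yields 0 while the wide form yields 0x100000000. For small inputs the two agree, which is consistent with the commit message's note that the overflow cannot happen with the current test data.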

-[PULL 26/26] tests/qtest/npcm7xx_rng-test: Disable randomness tests
+[PULL 44/45] tests/qtest/tpm-tests: Remove unnecessary NULL checks
-The randomness tests in the NPCM7xx RNG test fail intermittently
+Coverity points out that in tpm_test_swtpm_migration_test() we
-but fairly frequently. On my machine running the test in a loop:
+assume that src_tpm_addr and dst_tpm_addr are non-NULL (we
- while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done
+pass them to tpm_util_migration_start_qemu() which will
 unconditionally dereference them) but then later explicitly
 check them for NULL. Remove the pointless checks.
-will fail in less than a minute with an error like:
+Fixes: Coverity CID 1432367, 1432359
 ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
 assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)
 (Failures have been observed on all 4 of the randomness tests,
 not just first_byte_runs.)
 It's not clear why these tests are failing like this, but intermittent
 failures make CI and merge testing awkward, so disable running them
 unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
 running the test suite, until we work out the cause.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
+Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
-Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
+Message-id: 20210525134458.6675-6-peter.maydell@linaro.org
 ---
- tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
+ tests/qtest/tpm-tests.c | 12 ++++--------
-file changed, 10 insertions(+), 4 deletions(-)
+file changed, 4 insertions(+), 8 deletions(-)
-diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
+diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
 index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/npcm7xx_rng-test.c
+--- a/tests/qtest/tpm-tests.c
-+++ b/tests/qtest/npcm7xx_rng-test.c
++++ b/tests/qtest/tpm-tests.c
-@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
+@@ -XXX,XX +XXX,XX @@ void tpm_test_swtpm_migration_test(const char *src_tpm_path,
+     qtest_quit(src_qemu);
-     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
-     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
+     tpm_util_swtpm_kill(dst_tpm_pid);
--    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+-    if (dst_tpm_addr) {
--    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+-        g_unlink(dst_tpm_addr->u.q_unix.path);
--    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
+-        qapi_free_SocketAddress(dst_tpm_addr);
--    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+-    }
-+    /*
++    g_unlink(dst_tpm_addr->u.q_unix.path);
-+     * These tests fail intermittently; only run them on explicit
++    qapi_free_SocketAddress(dst_tpm_addr);
-+     * request until we figure out why.
-+     */
+     tpm_util_swtpm_kill(src_tpm_pid);
-+    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
+-    if (src_tpm_addr) {
-+        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+-        g_unlink(src_tpm_addr->u.q_unix.path);
-+        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+-        qapi_free_SocketAddress(src_tpm_addr);
-+        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
+-    }
-+        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
++    g_unlink(src_tpm_addr->u.q_unix.path);
-+    }
++    qapi_free_SocketAddress(src_tpm_addr);
+ }
      qtest_start("-machine npcm750-evb");
      ret = g_test_run();
 --
 .20.1

-New patch
+[PULL 45/45] tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed
+Coverity complains that we don't check for failures from dup()
+and mkstemp(); add asserts that these syscalls succeeded.
+Fixes: Coverity CID 1432516, 1432574
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Message-id: 20210525134458.6675-7-peter.maydell@linaro.org
+---
+ tests/unit/test-vmstate.c | 5 ++++-
+file changed, 4 insertions(+), 1 deletion(-)
+diff --git a/tests/unit/test-vmstate.c b/tests/unit/test-vmstate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/unit/test-vmstate.c
++++ b/tests/unit/test-vmstate.c
+@@ -XXX,XX +XXX,XX @@ static int temp_fd;
+ /* Duplicate temp_fd and seek to the beginning of the file */
+ static QEMUFile *open_test_file(bool write)
+ {
+-    int fd = dup(temp_fd);
++    int fd;
+     QIOChannel *ioc;
+     QEMUFile *f;
++    fd = dup(temp_fd);
++    g_assert(fd >= 0);
+     lseek(fd, 0, SEEK_SET);
+     if (write) {
+         g_assert_cmpint(ftruncate(fd, 0), ==, 0);
+@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
+     g_autofree char *temp_file = g_strdup_printf("%s/vmst.test.XXXXXX",
+                                                  g_get_tmp_dir());
+     temp_fd = mkstemp(temp_file);
++    g_assert(temp_fd >= 0);
+     module_call_init(MODULE_INIT_QOM);
+--
+.20.1

Small pile of bug fixes for rc1. I've included my patches to get
our docs building with Sphinx 3, just for convenience...

-- PMM

The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:

Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102

for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:

tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)

----------------------------------------------------------------
target-arm queue:
 * target/arm: Fix Neon emulation bugs on big-endian hosts
 * target/arm: fix handling of HCR.FB
 * target/arm: fix LORID_EL1 access check
 * disas/capstone: Fix monitor disassembly of >32 bytes
 * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
 * hw/arm/boot: fix SVE for EL3 direct kernel boot
 * hw/display/omap_lcdc: Fix potential NULL pointer dereference
 * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
 * target/arm: Get correct MMU index for other-security-state
 * configure: Test that gio libs from pkg-config work
 * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
 * docs: Fix building with Sphinx 3
 * tests/qtest/npcm7xx_rng-test: Disable randomness tests

----------------------------------------------------------------
AlexChen (2):
      hw/display/omap_lcdc: Fix potential NULL pointer dereference
      hw/display/exynos4210_fimd: Fix potential NULL pointer dereference

Peter Maydell (9):
      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
      target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
      disas/capstone: Fix monitor disassembly of >32 bytes
      target/arm: Get correct MMU index for other-security-state
      configure: Test that gio libs from pkg-config work
      hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
      scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
      qemu-option-trace.rst.inc: Don't use option:: markup
      tests/qtest/npcm7xx_rng-test: Disable randomness tests

Philippe Mathieu-Daudé (1):
      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)

Richard Henderson (11):
      target/arm: Introduce neon_full_reg_offset
      target/arm: Move neon_element_offset to translate.c
      target/arm: Use neon_element_offset in neon_load/store_reg
      target/arm: Use neon_element_offset in vfp_reg_offset
      target/arm: Add read/write_neon_element32
      target/arm: Expand read/write_neon_element32 to all MemOp
      target/arm: Rename neon_load_reg32 to vfp_load_reg32
      target/arm: Add read/write_neon_element64
      target/arm: Rename neon_load_reg64 to vfp_load_reg64
      target/arm: Simplify do_long_3d and do_2scalar_long
      target/arm: Improve do_prewiden_3d

Rémi Denis-Courmont (3):
      target/arm: fix handling of HCR.FB
      target/arm: fix LORID_EL1 access check
      hw/arm/boot: fix SVE for EL3 direct kernel boot

From: Richard Henderson <richard.henderson@linaro.org>

This function makes it clear that we're talking about the whole
register, and not the 32-bit piece at index 0.  This fixes a bug
when running on a big-endian host.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  8 ++++++
 target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
 target/arm/translate-vfp.c.inc  |  2 +-
 3 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
     unallocated_encoding(s);
 }
 
+/*
+ * Return the offset of a "full" NEON Dreg.
+ */
+static long neon_full_reg_offset(unsigned reg)
+{
+    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
         ofs ^= 8 - element_size;
     }
 #endif
-    return neon_reg_offset(reg, 0) + ofs;
+    return neon_full_reg_offset(reg) + ofs;
 }
 
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
              * We cannot write 16 bytes at once because the
              * destination is unaligned.
              */
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  8, 8, tmp);
-            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
-                             neon_reg_offset(vd, 0), 8, 8);
+            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
+                             neon_full_reg_offset(vd), 8, 8);
         } else {
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  vec_size, vec_size, tmp);
         }
         tcg_gen_addi_i32(addr, addr, 1 << size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
 static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
 {
     /* Handle a 2-reg-shift insn which can be vectorized. */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
 {
     /* FP operations in 2-reg-and-shift group */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
     TCGv_ptr fpst;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
         return true;
     }
 
-    reg_ofs = neon_reg_offset(a->vd, 0);
+    reg_ofs = neon_full_reg_offset(a->vd);
     vec_size = a->q ? 16 : 8;
     imm = asimd_imm_const(a->imm, a->cmode, a->op);
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
         return true;
     }
 
-    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
-                       neon_reg_offset(a->vn, 0),
-                       neon_reg_offset(a->vm, 0),
+    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
+                       neon_full_reg_offset(a->vn),
+                       neon_full_reg_offset(a->vm),
                        16, 16, 0, fn_gvec);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
 {
     /* Two registers and a scalar, using gvec */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
     int rm_ofs;
     int idx;
     TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
     /* a->vm is M:Vm, which encodes both register and index */
     idx = extract32(a->vm, a->size + 2, 2);
     a->vm = extract32(a->vm, 0, a->size + 2);
-    rm_ofs = neon_reg_offset(a->vm, 0);
+    rm_ofs = neon_full_reg_offset(a->vm);
 
     fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
     tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
         return true;
     }
 
-    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
+    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                          neon_element_offset(a->vm, a->index, a->size),
                          a->q ? 16 : 8, a->q ? 16 : 8);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
 static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
     }
 
     tmp = load_reg(s, a->rt);
-    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
+    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                          vec_size, vec_size, tmp);
     tcg_temp_free_i32(tmp);
 
-- 
2.20.1
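
The distinction drawn in the commit message above (the whole register versus
the 32-bit piece at index 0) can be sketched with a small stand-alone model.
This is only an illustration under stated assumptions: ModelState, ModelZReg
and both model_* functions are hypothetical names, the host endianness is
passed as a parameter rather than chosen at compile time, and the layout
assumes the low 32-bit half of a 64-bit value sits 4 bytes in on a
big-endian host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the vector register file in CPUARMState. */
typedef struct {
    uint64_t d[2];              /* one 128-bit register = two 64-bit Dregs */
} ModelZReg;

typedef struct {
    ModelZReg zregs[16];
} ModelState;

/* Offset of the whole Dreg: the quantity neon_full_reg_offset() names. */
static long model_full_reg_offset(unsigned reg)
{
    return (long)offsetof(ModelState, zregs)
         + (long)((reg >> 1) * sizeof(ModelZReg))
         + (long)((reg & 1) * sizeof(uint64_t));
}

/*
 * Offset of the least significant 32-bit piece of the same Dreg.
 * On a big-endian host (assumption of this model) that piece lives in
 * the second half of the 8-byte unit, so it is NOT the register start.
 */
static long model_low32_offset(unsigned reg, bool host_big_endian)
{
    long base = model_full_reg_offset(reg);
    return host_big_endian ? base + 4 : base;
}
```

In this model the two offsets coincide on a little-endian host and differ by
4 bytes on a big-endian one, which is why passing the "piece 0" offset to
whole-register gvec operations is only correct on little-endian hosts.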

From: Richard Henderson <richard.henderson@linaro.org>

This will shortly have users outside of translate-neon.c.inc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          | 20 ++++++++++++++++++++
 target/arm/translate-neon.c.inc | 19 -------------------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
 }
 
+/*
+ * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+ * where 0 is the least significant end of the register.
+ */
+static long neon_element_offset(int reg, int element, MemOp size)
+{
+    int element_size = 1 << size;
+    int ofs = element * element_size;
+#ifdef HOST_WORDS_BIGENDIAN
+    /*
+     * Calculate the offset assuming fully little-endian,
+     * then XOR to account for the order of the 8-byte units.
+     */
+    if (element_size < 8) {
+        ofs ^= 8 - element_size;
+    }
+#endif
+    return neon_full_reg_offset(reg) + ofs;
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
 #include "decode-neon-ls.c.inc"
 #include "decode-neon-shared.c.inc"
 
-/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
- * where 0 is the least significant end of the register.
- */
-static inline long
-neon_element_offset(int reg, int element, MemOp size)
-{
-    int element_size = 1 << size;
-    int ofs = element * element_size;
-#ifdef HOST_WORDS_BIGENDIAN
-    /* Calculate the offset assuming fully little-endian,
-     * then XOR to account for the order of the 8-byte units.
-     */
-    if (element_size < 8) {
-        ofs ^= 8 - element_size;
-    }
-#endif
-    return neon_full_reg_offset(reg) + ofs;
-}
-
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
 {
     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
-- 
2.20.1
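
The XOR adjustment in neon_element_offset() can be checked in isolation.
The sketch below reproduces only the offset arithmetic within the register;
model_element_offset is a hypothetical name, and the host endianness is a
run-time parameter here purely for illustration (the patch selects it with
HOST_WORDS_BIGENDIAN at compile time).

```c
#include <stdbool.h>

/*
 * Byte offset of element number `element`, each (1 << size_log2) bytes
 * wide, counted from the least significant end of the register.
 * The offset is first computed as if the host were little-endian, then
 * XORed so that it lands in the right place inside each 8-byte unit on
 * a big-endian host.
 */
static int model_element_offset(int element, int size_log2,
                                bool host_big_endian)
{
    int element_size = 1 << size_log2;
    int ofs = element * element_size;

    if (host_big_endian && element_size < 8) {
        ofs ^= 8 - element_size;
    }
    return ofs;
}
```

For example, with 32-bit elements (size_log2 == 2) on a big-endian host,
elements 0 and 1 map to byte offsets 4 and 0, and elements 2 and 3 map to
12 and 8: the order is reversed within each 8-byte unit, while the units
themselves stay in place.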

From: Richard Henderson <richard.henderson@linaro.org>

These are the only users of neon_reg_offset, so remove that.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-/* Return the offset of a 32-bit piece of a NEON register.
-   zero is the least significant end of the register.  */
-static inline long
-neon_reg_offset (int reg, int n)
-{
-    int sreg;
-    sreg = reg * 2 + n;
-    return vfp_reg_offset(0, sreg);
-}
-
 static TCGv_i32 neon_load_reg(int reg, int pass)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
     return tmp;
 }
 
 static void neon_store_reg(int reg, int pass, TCGv_i32 var)
 {
-    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
     tcg_temp_free_i32(var);
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This seems a bit more readable than using offsetof CPU_DoubleU.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
     return neon_full_reg_offset(reg) + ofs;
 }
 
-static inline long vfp_reg_offset(bool dp, unsigned reg)
+/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
+static long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
-        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+        return neon_element_offset(reg, 0, MO_64);
     } else {
-        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
-        if (reg & 1) {
-            ofs += offsetof(CPU_DoubleU, l.upper);
-        } else {
-            ofs += offsetof(CPU_DoubleU, l.lower);
-        }
-        return ofs;
+        return neon_element_offset(reg >> 1, reg & 1, MO_32);
     }
 }
 
-- 
2.20.1
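
As a sanity check on the rewrite above, the old and new Sreg offset
computations can be compared in a stand-alone model. All names below are
hypothetical, registers are assumed to be packed as 16-byte units holding
two 8-byte Dregs, and the model assumes that CPU_DoubleU places l.upper
before l.lower on a big-endian host and the reverse on a little-endian one.

```c
#include <stdbool.h>

/* Old form: locate the Dreg, then add the upper or lower 32-bit half. */
static long model_sreg_offset_old(unsigned reg, bool host_big_endian)
{
    long ofs = (long)(reg >> 2) * 16 + (long)((reg >> 1) & 1) * 8;
    long lower = host_big_endian ? 4 : 0;  /* models offsetof(.., l.lower) */
    long upper = host_big_endian ? 0 : 4;  /* models offsetof(.., l.upper) */

    return ofs + ((reg & 1) ? upper : lower);
}

/* New form: 32-bit element (reg & 1) of Dreg (reg >> 1). */
static long model_sreg_offset_new(unsigned reg, bool host_big_endian)
{
    unsigned dreg = reg >> 1;
    long ofs = (long)(reg & 1) * 4;

    if (host_big_endian) {
        ofs ^= 8 - 4;                      /* same XOR as neon_element_offset */
    }
    return (long)(dreg >> 1) * 16 + (long)(dreg & 1) * 8 + ofs;
}

/* Returns true if both forms agree for S0..S31 on the given host type. */
static bool model_sreg_offsets_agree(bool host_big_endian)
{
    for (unsigned reg = 0; reg < 32; reg++) {
        if (model_sreg_offset_old(reg, host_big_endian) !=
            model_sreg_offset_new(reg, host_big_endian)) {
            return false;
        }
    }
    return true;
}
```

Under these assumptions the two forms agree for every Sreg on both host
types, which matches the commit's framing of the change as a readability
improvement rather than a behaviour change.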

From: Richard Henderson <richard.henderson@linaro.org>

Model these off the aa64 read/write_vec_element functions.
Use it within translate-neon.c.inc.  The new functions do
not allocate or free temps, so this rearranges the calling
code a bit.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
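
The change in temp ownership described in the commit message can be
pictured with a toy model. Nothing here is TCG code: ModelTemps and the
model_* functions are hypothetical stand-ins that only count allocations,
to contrast helpers that allocate and free a temp on every pass with
helpers that leave allocation to the caller.

```c
/* Toy bookkeeping for temporaries; not a TCG API. */
typedef struct {
    int live;         /* temps currently allocated */
    int allocations;  /* total number of allocations made */
} ModelTemps;

static void model_temp_new(ModelTemps *t)
{
    t->live++;
    t->allocations++;
}

static void model_temp_free(ModelTemps *t)
{
    t->live--;
}

/* Old pattern: the load helper allocates, the store helper frees. */
static ModelTemps model_old_style(int passes)
{
    ModelTemps t = { 0, 0 };

    for (int pass = 0; pass < passes; pass++) {
        model_temp_new(&t);   /* done inside the load helper */
        /* ... operate on the loaded value ... */
        model_temp_free(&t);  /* done inside the store helper */
    }
    return t;
}

/* New pattern: the caller owns one temp for the whole loop. */
static ModelTemps model_new_style(int passes)
{
    ModelTemps t = { 0, 0 };

    model_temp_new(&t);
    for (int pass = 0; pass < passes; pass++) {
        /* read element, operate, write element: no alloc or free here */
    }
    model_temp_free(&t);
    return t;
}
```

Both patterns end with no live temps; the difference is that the caller
now decides when a temp is created and released, which is why the hunks
below move tcg_temp_new_i32() and tcg_temp_free_i32() calls out of loops
and next to the code that uses the value.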
 target/arm/translate.c          |  26 ++++
 target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
 2 files changed, 183 insertions(+), 99 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_ld_i32(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_st_i32(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
      * early. Since Q is 0 there are always just two passes, so instead
      * of a complicated loop over each pass we just unroll.
      */
-    tmp = neon_load_reg(a->vn, 0);
-    tmp2 = neon_load_reg(a->vn, 1);
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    tmp3 = tcg_temp_new_i32();
+
+    read_neon_element32(tmp, a->vn, 0, MO_32);
+    read_neon_element32(tmp2, a->vn, 1, MO_32);
     fn(tmp, tmp, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    tmp3 = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    read_neon_element32(tmp3, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     fn(tmp3, tmp3, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    neon_store_reg(a->vd, 0, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * 2-reg-and-shift operations, size < 3 case, where the
      * helper needs to be passed cpu_env.
      */
-    TCGv_i32 constimm;
+    TCGv_i32 constimm, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * by immediate using the variable shift operations.
      */
     constimm = tcg_const_i32(dup_const(a->size, a->shift));
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(constimm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i64(-a->shift);
     rm1 = tcg_temp_new_i64();
     rm2 = tcg_temp_new_i64();
+    rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
     neon_load_reg64(rm1, a->vm);
     neon_load_reg64(rm2, a->vm + 1);
 
     shiftfn(rm1, rm1, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm1);
-    neon_store_reg(a->vd, 0, rd);
+    write_neon_element32(rd, a->vd, 0, MO_32);
 
     shiftfn(rm2, rm2, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm2);
-    neon_store_reg(a->vd, 1, rd);
+    write_neon_element32(rd, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i64(rm1);
     tcg_temp_free_i64(rm2);
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i32(imm);
 
     /* Load all inputs first to avoid potential overwrite */
-    rm1 = neon_load_reg(a->vm, 0);
-    rm2 = neon_load_reg(a->vm, 1);
-    rm3 = neon_load_reg(a->vm + 1, 0);
-    rm4 = neon_load_reg(a->vm + 1, 1);
+    rm1 = tcg_temp_new_i32();
+    rm2 = tcg_temp_new_i32();
+    rm3 = tcg_temp_new_i32();
+    rm4 = tcg_temp_new_i32();
+    read_neon_element32(rm1, a->vm, 0, MO_32);
+    read_neon_element32(rm2, a->vm, 1, MO_32);
+    read_neon_element32(rm3, a->vm, 2, MO_32);
+    read_neon_element32(rm4, a->vm, 3, MO_32);
     rtmp = tcg_temp_new_i64();
 
     shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     tcg_temp_free_i32(rm2);
 
     narrowfn(rm1, cpu_env, rtmp);
-    neon_store_reg(a->vd, 0, rm1);
+    write_neon_element32(rm1, a->vd, 0, MO_32);
+    tcg_temp_free_i32(rm1);
 
     shiftfn(rm3, rm3, constimm);
     shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
 
     narrowfn(rm3, cpu_env, rtmp);
     tcg_temp_free_i64(rtmp);
-    neon_store_reg(a->vd, 1, rm3);
+    write_neon_element32(rm3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rm3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         widen_mask = dup_const(a->size + 1, widen_mask);
     }
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn0_64, a->vn);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 0);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 0, MO_32);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn1_64, a->vn + 1);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 1);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 1, MO_32);
 
     neon_store_reg64(rn0_64, a->vd);
 
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
 
     narrowfn(rd1, rn_64);
 
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rn_64);
     tcg_temp_free_i64(rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i64();
     rd1 = tcg_temp_new_i64();
 
-    rn = neon_load_reg(a->vn, 0);
-    rm = neon_load_reg(a->vm, 0);
+    rn = tcg_temp_new_i32();
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
+    read_neon_element32(rm, a->vm, 0, MO_32);
     opfn(rd0, rn, rm);
-    tcg_temp_free_i32(rn);
-    tcg_temp_free_i32(rm);
 
-    rn = neon_load_reg(a->vn, 1);
-    rm = neon_load_reg(a->vm, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
+    read_neon_element32(rm, a->vm, 1, MO_32);
     opfn(rd1, rn, rm);
     tcg_temp_free_i32(rn);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
 
 static inline TCGv_i32 neon_get_scalar(int size, int reg)
 {
-    TCGv_i32 tmp;
-    if (size == 1) {
-        tmp = neon_load_reg(reg & 7, reg >> 4);
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    if (size == MO_16) {
+        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
         if (reg & 8) {
             gen_neon_dup_high16(tmp);
         } else {
             gen_neon_dup_low16(tmp);
         }
     } else {
-        tmp = neon_load_reg(reg & 15, reg >> 4);
+        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
     }
     return tmp;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      * perform an accumulation operation of that result into the
      * destination.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
+        read_neon_element32(tmp, a->vn, pass, MO_32);
         opfn(tmp, tmp, scalar);
         if (accfn) {
-            TCGv_i32 rd = neon_load_reg(a->vd, pass);
+            TCGv_i32 rd = tcg_temp_new_i32();
+            read_neon_element32(rd, a->vd, pass, MO_32);
             accfn(tmp, rd, tmp);
             tcg_temp_free_i32(rd);
         }
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(scalar);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      * performs a kind of fused op-then-accumulate using a helper
      * function that takes all of rd, rn and the scalar at once.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, rn, rd;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    rn = tcg_temp_new_i32();
+    rd = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 rn = neon_load_reg(a->vn, pass);
-        TCGv_i32 rd = neon_load_reg(a->vd, pass);
+        read_neon_element32(rn, a->vn, pass, MO_32);
+        read_neon_element32(rd, a->vd, pass, MO_32);
         opfn(rd, cpu_env, rn, scalar, rd);
-        tcg_temp_free_i32(rn);
-        neon_store_reg(a->vd, pass, rd);
+        write_neon_element32(rd, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i32(scalar);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     scalar = neon_get_scalar(a->size, a->vm);
 
     /* Load all inputs before writing any outputs, in case of overlap */
-    rn = neon_load_reg(a->vn, 0);
+    rn = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
     rn0_64 = tcg_temp_new_i64();
     opfn(rn0_64, rn, scalar);
-    tcg_temp_free_i32(rn);
 
-    rn = neon_load_reg(a->vn, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
     rn1_64 = tcg_temp_new_i64();
     opfn(rn1_64, rn, scalar);
     tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
         return false;
     }
     n <<= 3;
+    tmp = tcg_temp_new_i32();
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 0);
+        read_neon_element32(tmp, a->vd, 0, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp2 = neon_load_reg(a->vm, 0);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 0, MO_32);
     ptr1 = vfp_reg_ptr(true, a->vn);
     tmp4 = tcg_const_i32(n);
     gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
-    tcg_temp_free_i32(tmp);
+
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 1);
+        read_neon_element32(tmp, a->vd, 1, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp3 = neon_load_reg(a->vm, 1);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 1, MO_32);
     gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(tmp4);
     tcg_temp_free_ptr(ptr1);
-    neon_store_reg(a->vd, 0, tmp2);
-    neon_store_reg(a->vd, 1, tmp3);
-    tcg_temp_free_i32(tmp);
+
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
 static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
 {
     int pass, half;
+    TCGv_i32 tmp[2];
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
         return true;
     }
 
-    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        TCGv_i32 tmp[2];
+    tmp[0] = tcg_temp_new_i32();
+    tmp[1] = tcg_temp_new_i32();
 
+    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
         for (half = 0; half < 2; half++) {
-            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
+            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
             switch (a->size) {
             case 0:
                 tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                 g_assert_not_reached();
             }
         }
-        neon_store_reg(a->vd, pass * 2, tmp[1]);
-        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
+        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
+        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
     }
+
+    tcg_temp_free_i32(tmp[0]);
+    tcg_temp_free_i32(tmp[1]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
         rm0_64 = tcg_temp_new_i64();
         rm1_64 = tcg_temp_new_i64();
         rd_64 = tcg_temp_new_i64();
-        tmp = neon_load_reg(a->vm, pass * 2);
+
+        tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
         widenfn(rm0_64, tmp);
-        tcg_temp_free_i32(tmp);
-        tmp = neon_load_reg(a->vm, pass * 2 + 1);
+        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
         widenfn(rm1_64, tmp);
         tcg_temp_free_i32(tmp);
+
         opfn(rd_64, rm0_64, rm1_64);
         tcg_temp_free_i64(rm0_64);
         tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     narrowfn(rd0, cpu_env, rm);
     neon_load_reg64(rm, a->vm + 1);
     narrowfn(rd1, cpu_env, rm);
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
     }
 
     rd = tcg_temp_new_i64();
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
 
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
-    tmp = neon_load_reg(a->vm, 0);
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
     tcg_gen_shli_i32(tmp2, tmp2, 16);
     tcg_gen_or_i32(tmp2, tmp2, tmp);
-    tcg_temp_free_i32(tmp);
-    tmp = neon_load_reg(a->vm, 2);
+    read_neon_element32(tmp, a->vm, 2, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp3 = neon_load_reg(a->vm, 3);
-    neon_store_reg(a->vd, 0, tmp2);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 3, MO_32);
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    tcg_temp_free_i32(tmp2);
     gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
     tcg_gen_shli_i32(tmp3, tmp3, 16);
     tcg_gen_or_i32(tmp3, tmp3, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
     tmp3 = tcg_temp_new_i32();
-    tmp = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     tcg_gen_ext16u_i32(tmp3, tmp);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 0, tmp3);
+    write_neon_element32(tmp3, a->vd, 0, MO_32);
     tcg_gen_shri_i32(tmp, tmp, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
-    neon_store_reg(a->vd, 1, tmp);
-    tmp3 = tcg_temp_new_i32();
+    write_neon_element32(tmp, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp);
     tcg_gen_ext16u_i32(tmp3, tmp2);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 2, tmp3);
+    write_neon_element32(tmp3, a->vd, 2, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_gen_shri_i32(tmp2, tmp2, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
-    neon_store_reg(a->vd, 3, tmp2);
+    write_neon_element32(tmp2, a->vd, 3, MO_32);
+    tcg_temp_free_i32(tmp2);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
 
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
 
 static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
 {
+    TCGv_i32 tmp;
     int pass;
 
     /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
         return true;
     }
 
+    tmp = tcg_temp_new_i32();
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, tmp);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
         return true;
     }
 
-    if (a->size == 2) {
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    if (a->size == MO_32) {
         for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass + 1);
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass + 1, tmp);
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
         }
     } else {
         for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass);
-            if (a->size == 0) {
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass, MO_32);
+            if (a->size == MO_8) {
                 gen_neon_trn_u8(tmp, tmp2);
             } else {
                 gen_neon_trn_u16(tmp, tmp2);
             }
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass, tmp);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass, MO_32);
         }
     }
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

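Extend read_neon_element32 and write_neon_element32 to handle 8-bit
and 16-bit elements, with sign or zero extension on reads. We can
then use them to improve VMOV (scalar to gp) and VMOV (gp to scalar)
so that we simply perform the memory operation that we wanted, rather
than inserting into or extracting from a 32-bit quantity.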

These were the last uses of neon_load/store_reg, so remove them.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
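Note (illustrative only, not part of the patch): a minimal host-side C
model of the two VMOV (scalar to gp) strategies, assuming one D register
is held as a host-order uint64_t. The names SZ_8/SZ_16/SZ_32,
element_offset, vmov_to_gp_old and vmov_to_gp_new are hypothetical
stand-ins rather than QEMU functions, and the big-endian adjustment is
modelled on what neon_element_offset is understood to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for MO_8/MO_16/MO_32: log2 of the element size. */
enum { SZ_8 = 0, SZ_16 = 1, SZ_32 = 2 };

/* Detect host byte order at run time so the model works on either. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/* Byte offset of element `ele` within one host-order 64-bit D register. */
static size_t element_offset(int ele, int size)
{
    size_t esize = (size_t)1 << size;
    size_t ofs = (size_t)ele * esize;

    if (host_is_big_endian()) {
        ofs ^= 8 - esize;   /* elements sit mirrored within the 8-byte unit */
    }
    return ofs;
}

/* Sign-extend the low `bits` bits (8 or 16) of v to 32 bits. */
static uint32_t sext32(uint32_t v, int bits)
{
    uint32_t sign = 1u << (bits - 1);

    v &= (sign << 1) - 1;
    return (v ^ sign) - sign;
}

/* Old shape: load the containing 32-bit word, then shift and extend. */
static uint32_t vmov_to_gp_old(const uint64_t *dreg, int index, int size,
                               int is_unsigned)
{
    uint32_t offset = (uint32_t)index << size;   /* byte offset in the D reg */
    int pass = (offset >> 2) & 1;                /* which 32-bit word */
    uint32_t shift = (offset & 3) * 8;           /* bit offset inside it */
    uint32_t word;

    memcpy(&word, (const uint8_t *)dreg + element_offset(pass, SZ_32), 4);
    word >>= shift;
    if (size == SZ_32) {
        return word;
    }
    return is_unsigned ? word & ((1u << (8 << size)) - 1)
                       : sext32(word, 8 << size);
}

/* New shape: one load of exactly the element, extended as requested. */
static uint32_t vmov_to_gp_new(const uint64_t *dreg, int index, int size,
                               int is_unsigned)
{
    const uint8_t *p = (const uint8_t *)dreg + element_offset(index, size);
    uint8_t b;
    uint16_t h;
    uint32_t w;

    switch (size) {
    case SZ_8:
        memcpy(&b, p, 1);
        return is_unsigned ? b : sext32(b, 8);
    case SZ_16:
        memcpy(&h, p, 2);
        return is_unsigned ? h : sext32(h, 16);
    default:
        memcpy(&w, p, 4);
        return w;
    }
}
```

Both functions should return the same value for every valid index, size
and signedness; the new shape just gets there with a single sized load.
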
 target/arm/translate.c         | 50 +++++++++++++-----------
 target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
 2 files changed, 37 insertions(+), 84 deletions(-)
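
Note (illustrative only): why neon_element_offset now masks its argument
with MO_SIZE. Once callers may pass a MemOp that carries a sign flag,
shifting by the raw value would give the wrong element size. The sketch
below assumes a MemOp layout with the size in the low two bits and the
sign flag in bit 2; the X_-prefixed constants are hypothetical copies
made for illustration, so check the MemOp definition in the tree for the
real values.

```c
#include <assert.h>

/* Hypothetical copies of the MemOp bits; assumed layout, see the note. */
enum {
    X_MO_8 = 0,
    X_MO_16 = 1,
    X_MO_32 = 2,
    X_MO_SIZE = 3,          /* mask for the log2(size) field */
    X_MO_SIGN = 4,          /* set when a load should sign-extend */
};

/* Element size in bytes, ignoring any sign flag, as the patched helper does. */
static int element_size(int memop)
{
    return 1 << (memop & X_MO_SIZE);
}

/* Element size as the pre-patch expression would compute it. */
static int element_size_unmasked(int memop)
{
    return 1 << memop;
}

/* Build the memop the way trans_VMOV_to_gp does: size, plus sign if !u. */
static int vmov_memop(int size, int is_unsigned)
{
    return size | (is_unsigned ? 0 : X_MO_SIGN);
}
```

Under these assumed values a signed 16-bit memop is 5, so the unmasked
shift would claim a 32-byte element while the masked one gives 2. The
same composition can also yield a signed 32-bit memop, which is
consistent with the read switch listing MO_SL next to MO_UL.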

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
  * where 0 is the least significant end of the register.
  */
-static long neon_element_offset(int reg, int element, MemOp size)
+static long neon_element_offset(int reg, int element, MemOp memop)
 {
-    int element_size = 1 << size;
+    int element_size = 1 << (memop & MO_SIZE);
     int ofs = element * element_size;
 #ifdef HOST_WORDS_BIGENDIAN
     /*
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static TCGv_i32 neon_load_reg(int reg, int pass)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
-    return tmp;
-}
-
-static void neon_store_reg(int reg, int pass, TCGv_i32 var)
-{
-    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
-    tcg_temp_free_i32(var);
-}
-
 static inline void neon_load_reg64(TCGv_i64 var, int reg)
 {
     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
-    case MO_32:
+    switch (memop) {
+    case MO_SB:
+        tcg_gen_ld8s_i32(dest, cpu_env, off);
+        break;
+    case MO_UB:
+        tcg_gen_ld8u_i32(dest, cpu_env, off);
+        break;
+    case MO_SW:
+        tcg_gen_ld16s_i32(dest, cpu_env, off);
+        break;
+    case MO_UW:
+        tcg_gen_ld16u_i32(dest, cpu_env, off);
+        break;
+    case MO_UL:
+    case MO_SL:
         tcg_gen_ld_i32(dest, cpu_env, off);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
     }
 }
 
-static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
+    switch (memop) {
+    case MO_8:
+        tcg_gen_st8_i32(src, cpu_env, off);
+        break;
+    case MO_16:
+        tcg_gen_st16_i32(src, cpu_env, off);
+        break;
     case MO_32:
         tcg_gen_st_i32(src, cpu_env, off);
         break;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 {
     /* VMOV scalar to general purpose register */
     TCGv_i32 tmp;
-    int pass;
-    uint32_t offset;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
-    tmp = neon_load_reg(a->vn, pass);
-    switch (a->size) {
-    case 0:
-        if (offset) {
-            tcg_gen_shri_i32(tmp, tmp, offset);
-        }
-        if (a->u) {
-            gen_uxtb(tmp);
-        } else {
-            gen_sxtb(tmp);
-        }
-        break;
-    case 1:
-        if (a->u) {
-            if (offset) {
-                tcg_gen_shri_i32(tmp, tmp, 16);
-            } else {
-                gen_uxth(tmp);
-            }
-        } else {
-            if (offset) {
-                tcg_gen_sari_i32(tmp, tmp, 16);
-            } else {
-                gen_sxth(tmp);
-            }
-        }
-        break;
-    case 2:
-        break;
-    }
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
     store_reg(s, a->rt, tmp);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
 {
     /* VMOV general purpose register to scalar */
-    TCGv_i32 tmp, tmp2;
-    int pass;
-    uint32_t offset;
+    TCGv_i32 tmp;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
     tmp = load_reg(s, a->rt);
-    switch (a->size) {
-    case 0:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 1:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 2:
-        break;
-    }
-    neon_store_reg(a->vn, pass, tmp);
+    write_neon_element32(tmp, a->vn, a->index, a->size);
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The only uses of these functions are for loading and storing VFP
single-precision values; they have nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |   4 +-
 target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
 2 files changed, 94 insertions(+), 94 deletions(-)
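
Note (illustrative only): the renamed helpers address single-precision
S registers, which alias halves of the D registers: S(2n) is the low 32
bits of D(n) and S(2n+1) is the high 32 bits. A minimal model of that
aliasing, assuming the D registers are kept as host-order uint64_t
values; sreg_read and sreg_write are hypothetical names, not QEMU
functions, and they work on values rather than on env offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Read S register `reg` from a bank of D registers (D0..D15 cover S0..S31). */
static uint32_t sreg_read(const uint64_t *dregs, unsigned reg)
{
    uint64_t d = dregs[reg >> 1];

    return (reg & 1) ? (uint32_t)(d >> 32) : (uint32_t)d;
}

/* Write S register `reg`, leaving the other half of the D register alone. */
static void sreg_write(uint64_t *dregs, unsigned reg, uint32_t val)
{
    unsigned shift = (reg & 1) * 32;
    uint64_t mask = (uint64_t)0xffffffffu << shift;

    dregs[reg >> 1] = (dregs[reg >> 1] & ~mask) | ((uint64_t)val << shift);
}
```

A write to S3 therefore changes only the upper half of D1, which is the
VFP view of the register file rather than a NEON element access.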

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
 }
 
-static inline void neon_load_reg32(TCGv_i32 var, int reg)
+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static inline void neon_store_reg32(TCGv_i32 var, int reg)
+static inline void vfp_store_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         frn = tcg_temp_new_i32();
         frm = tcg_temp_new_i32();
         dest = tcg_temp_new_i32();
-        neon_load_reg32(frn, rn);
-        neon_load_reg32(frm, rm);
+        vfp_load_reg32(frn, rn);
+        vfp_load_reg32(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         if (sz == 1) {
             tcg_gen_andi_i32(dest, dest, 0xffff);
         }
-        neon_store_reg32(dest, rd);
+        vfp_store_reg32(dest, rd);
         tcg_temp_free_i32(frn);
         tcg_temp_free_i32(frm);
         tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i32 tcg_res;
         tcg_op = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_op, rm);
+        vfp_load_reg32(tcg_op, rm);
         if (sz == 1) {
             gen_helper_rinth(tcg_res, tcg_op, fpst);
         } else {
             gen_helper_rints(tcg_res, tcg_op, fpst);
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_op);
         tcg_temp_free_i32(tcg_res);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
             gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
         }
         tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
-        neon_store_reg32(tcg_tmp, rd);
+        vfp_store_reg32(tcg_tmp, rd);
         tcg_temp_free_i32(tcg_tmp);
         tcg_temp_free_i64(tcg_res);
         tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         TCGv_i32 tcg_single, tcg_res;
         tcg_single = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_single, rm);
+        vfp_load_reg32(tcg_single, rm);
         if (sz == 1) {
             if (is_signed) {
                 gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                 gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
             }
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_res);
         tcg_temp_free_i32(tcg_single);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
         store_reg(s, a->rt, tmp);
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         if (a->rt == 15) {
             /* Set the 4 flag bits in the CPSR.  */
             gen_set_nzcv(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm);
+        vfp_load_reg32(tmp, a->vm);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm + 1);
+        vfp_load_reg32(tmp, a->vm + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm);
+        vfp_store_reg32(tmp, a->vm);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm + 1);
+        vfp_store_reg32(tmp, a->vm + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2);
+        vfp_load_reg32(tmp, a->vm * 2);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2 + 1);
+        vfp_load_reg32(tmp, a->vm * 2 + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm * 2);
+        vfp_store_reg32(tmp, a->vm * 2);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm * 2 + 1);
+        vfp_store_reg32(tmp, a->vm * 2 + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st16(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st32(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-            neon_store_reg32(tmp, a->vd + i);
+            vfp_store_reg32(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg32(tmp, a->vd + i);
+            vfp_load_reg32(tmp, a->vd + i);
             gen_aa32_st32(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg32(fd, vd);
+            vfp_load_reg32(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vn = vfp_advance_sreg(vn, delta_d);
-        neon_load_reg32(f0, vn);
+        vfp_load_reg32(f0, vn);
         if (delta_m) {
             vm = vfp_advance_sreg(vm, delta_m);
-            neon_load_reg32(f1, vm);
+            vfp_load_reg32(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR_F16);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     if (reads_vd) {
-        neon_load_reg32(fd, vd);
+        vfp_load_reg32(fd, vd);
     }
     fn(fd, f0, f1, fpst);
-    neon_store_reg32(fd, vd);
+    vfp_store_reg32(fd, vd);
 
     tcg_temp_free_i32(f0);
     tcg_temp_free_i32(f1);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i32();
     fd = tcg_temp_new_i32();
 
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_sreg(vd, delta_d);
-                neon_store_reg32(fd, vd);
+                vfp_store_reg32(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vm = vfp_advance_sreg(vm, delta_m);
-        neon_load_reg32(f0, vm);
+        vfp_load_reg32(f0, vm);
     }
 
     tcg_temp_free_i32(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     }
 
     f0 = tcg_temp_new_i32();
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
     fn(f0, f0);
-    neon_store_reg32(f0, vd);
+    vfp_store_reg32(f0, vd);
     tcg_temp_free_i32(f0);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negh(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negh(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negs(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negs(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
     }
 
     fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
-    neon_store_reg32(fd, a->vd);
+    vfp_store_reg32(fd, a->vd);
     tcg_temp_free_i32(fd);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
     fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
 
     for (;;) {
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
     /* The T bit tells us if we want the low or high 16 bits of Vm */
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
     ahp_mode = get_ahp_flag();
     tmp = tcg_temp_new_i32();
 
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
     tcg_temp_free_i32(ahp_mode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rinth(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rints(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
     neon_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
     vm = tcg_temp_new_i64();
     neon_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     if (a->s) {
         /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f16 */
         gen_helper_vfp_uitoh(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f32 */
         gen_helper_vfp_uitos(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
     vd = tcg_temp_new_i32();
     neon_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_i32(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touih(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touis(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
             gen_helper_vfp_touid(vd, vm, fpst);
         }
     }
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
     /* Insert low half of Vm into high half of Vd */
     rm = tcg_temp_new_i32();
     rd = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
-    neon_load_reg32(rd, a->vd);
+    vfp_load_reg32(rm, a->vm);
+    vfp_load_reg32(rd, a->vd);
     tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
-    neon_store_reg32(rd, a->vd);
+    vfp_store_reg32(rd, a->vd);
     tcg_temp_free_i32(rm);
     tcg_temp_free_i32(rd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
 
     /* Set Vd to high half of Vm */
     rm = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
+    vfp_load_reg32(rm, a->vm);
     tcg_gen_shri_i32(rm, rm, 16);
-    neon_store_reg32(rm, a->vd);
+    vfp_store_reg32(rm, a->vd);
     tcg_temp_free_i32(rm);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
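
Reviewer note, not part of the commit: the conversion below rewrites
"D register vN + pass" as "64-bit element 'pass' of register vN". A minimal
sketch of why the two agree, assuming a register file in which D registers
are paired inside wider vector slots. ModelVReg, ModelRegFile,
MODEL_D_PER_VREG and the model_* helpers are hypothetical names for
illustration only and are not QEMU's definitions; the host-endianness
adjustment that a real element-offset helper may need is ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of 64-bit lanes reserved per vector slot. */
#define MODEL_D_PER_VREG 4

typedef struct {
    uint64_t d[MODEL_D_PER_VREG];
} ModelVReg;

typedef struct {
    ModelVReg vreg[16];
} ModelRegFile;

/* Byte offset of D register 'reg': two D registers share one slot. */
static size_t model_dreg_offset(int reg)
{
    return offsetof(ModelRegFile, vreg)
        + (size_t)(reg >> 1) * sizeof(ModelVReg)
        + (size_t)(reg & 1) * sizeof(uint64_t);
}

/* Byte offset of 64-bit element 'ele', counted from D register 'reg'. */
static size_t model_element64_offset(int reg, int ele)
{
    return model_dreg_offset(reg) + (size_t)ele * sizeof(uint64_t);
}

/* For an even base register, element 1 is the next D register. */
void model_check_even_pairs(void)
{
    for (int reg = 0; reg < 32; reg += 2) {
        assert(model_element64_offset(reg, 0) == model_dreg_offset(reg));
        assert(model_element64_offset(reg, 1) == model_dreg_offset(reg + 1));
    }
}
```

In this model the equivalence holds only when the base register is even,
which matches the Q-register alignment checks that guard the two-pass
paths; with an odd base, element 1 would land in the slot's padding rather
than in the next D register.
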
 target/arm/translate.c          | 26 +++++++++
 target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
 2 files changed, 73 insertions(+), 47 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
     }
 }
 
+static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_Q:
+        tcg_gen_ld_i64(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
     long off = neon_element_offset(reg, ele, memop);
@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
     }
 }
 
+static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_64:
+        tcg_gen_st_i64(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
     for (pass = 0; pass < a->q + 1; pass++) {
         TCGv_i64 tmp = tcg_temp_new_i64();
 
-        neon_load_reg64(tmp, a->vm + pass);
+        read_neon_element64(tmp, a->vm, pass, MO_64);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg64(tmp, a->vd + pass);
+        write_neon_element64(tmp, a->vd, pass, MO_64);
         tcg_temp_free_i64(tmp);
     }
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
-    neon_load_reg64(rm1, a->vm);
-    neon_load_reg64(rm2, a->vm + 1);
+    read_neon_element64(rm1, a->vm, 0, MO_64);
+    read_neon_element64(rm2, a->vm, 1, MO_64);
 
     shiftfn(rm1, rm1, constimm);
     narrowfn(rd, cpu_env, rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd);
+    write_neon_element64(tmp, a->vd, 0, MO_64);
 
     widenfn(tmp, rm1);
     tcg_temp_free_i32(rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd + 1);
+    write_neon_element64(tmp, a->vd, 1, MO_64);
     tcg_temp_free_i64(tmp);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm_64 = tcg_temp_new_i64();
 
     if (src1_wide) {
-        neon_load_reg64(rn0_64, a->vn);
+        read_neon_element64(rn0_64, a->vn, 0, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      * avoid incorrect results if a narrow input overlaps with the result.
      */
     if (src1_wide) {
-        neon_load_reg64(rn1_64, a->vn + 1);
+        read_neon_element64(rn1_64, a->vn, 1, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm = tcg_temp_new_i32();
     read_neon_element32(rm, a->vm, 1, MO_32);
 
-    neon_store_reg64(rn0_64, a->vd);
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
-    neon_store_reg64(rn1_64, a->vd + 1);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rn_64, a->vn);
-    neon_load_reg64(rm_64, a->vm);
+    read_neon_element64(rn_64, a->vn, 0, MO_64);
+    read_neon_element64(rm_64, a->vm, 0, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
     narrowfn(rd0, rn_64);
 
-    neon_load_reg64(rn_64, a->vn + 1);
-    neon_load_reg64(rm_64, a->vm + 1);
+    read_neon_element64(rn_64, a->vn, 1, MO_64);
+    read_neon_element64(rm_64, a->vm, 1, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     /* Don't store results until after all loads: they might overlap */
     if (accfn) {
         tmp = tcg_temp_new_i64();
-        neon_load_reg64(tmp, a->vd);
+        read_neon_element64(tmp, a->vd, 0, MO_64);
         accfn(tmp, tmp, rd0);
-        neon_store_reg64(tmp, a->vd);
-        neon_load_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 0, MO_64);
+        read_neon_element64(tmp, a->vd, 1, MO_64);
         accfn(tmp, tmp, rd1);
-        neon_store_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 1, MO_64);
         tcg_temp_free_i64(tmp);
     } else {
-        neon_store_reg64(rd0, a->vd);
-        neon_store_reg64(rd1, a->vd + 1);
+        write_neon_element64(rd0, a->vd, 0, MO_64);
+        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
     tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
 
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
-        neon_load_reg64(t64, a->vd);
+        read_neon_element64(t64, a->vd, 0, MO_64);
         accfn(t64, t64, rn0_64);
-        neon_store_reg64(t64, a->vd);
-        neon_load_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 0, MO_64);
+        read_neon_element64(t64, a->vd, 1, MO_64);
         accfn(t64, t64, rn1_64);
-        neon_store_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 1, MO_64);
         tcg_temp_free_i64(t64);
     } else {
-        neon_store_reg64(rn0_64, a->vd);
-        neon_store_reg64(rn1_64, a->vd + 1);
+        write_neon_element64(rn0_64, a->vd, 0, MO_64);
+        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         right = tcg_temp_new_i64();
         dest = tcg_temp_new_i64();
 
-        neon_load_reg64(right, a->vn);
-        neon_load_reg64(left, a->vm);
+        read_neon_element64(right, a->vn, 0, MO_64);
+        read_neon_element64(left, a->vm, 0, MO_64);
         tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
-        neon_store_reg64(dest, a->vd);
+        write_neon_element64(dest, a->vd, 0, MO_64);
 
         tcg_temp_free_i64(left);
         tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         destright = tcg_temp_new_i64();
 
         if (a->imm < 8) {
-            neon_load_reg64(right, a->vn);
-            neon_load_reg64(middle, a->vn + 1);
+            read_neon_element64(right, a->vn, 0, MO_64);
+            read_neon_element64(middle, a->vn, 1, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
-            neon_load_reg64(left, a->vm);
+            read_neon_element64(left, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
         } else {
-            neon_load_reg64(right, a->vn + 1);
-            neon_load_reg64(middle, a->vm);
+            read_neon_element64(right, a->vn, 1, MO_64);
+            read_neon_element64(middle, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
-            neon_load_reg64(left, a->vm + 1);
+            read_neon_element64(left, a->vm, 1, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
         }
 
-        neon_store_reg64(destright, a->vd);
-        neon_store_reg64(destleft, a->vd + 1);
+        write_neon_element64(destright, a->vd, 0, MO_64);
+        write_neon_element64(destleft, a->vd, 1, MO_64);
 
         tcg_temp_free_i64(destright);
         tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
 
         if (accfn) {
             TCGv_i64 tmp64 = tcg_temp_new_i64();
-            neon_load_reg64(tmp64, a->vd + pass);
+            read_neon_element64(tmp64, a->vd, pass, MO_64);
             accfn(rd_64, tmp64, rd_64);
             tcg_temp_free_i64(tmp64);
         }
-        neon_store_reg64(rd_64, a->vd + pass);
+        write_neon_element64(rd_64, a->vd, pass, MO_64);
         tcg_temp_free_i64(rd_64);
     }
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rm, a->vm);
+    read_neon_element64(rm, a->vm, 0, MO_64);
     narrowfn(rd0, cpu_env, rm);
-    neon_load_reg64(rm, a->vm + 1);
+    read_neon_element64(rm, a->vm, 1, MO_64);
     narrowfn(rd1, cpu_env, rm);
     write_neon_element32(rd0, a->vd, 0, MO_32);
     write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd);
+    write_neon_element64(rd, a->vd, 0, MO_64);
     widenfn(rd, rm1);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd + 1);
+    write_neon_element64(rd, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rd);
     tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
     rm = tcg_temp_new_i64();
     rd = tcg_temp_new_i64();
     for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        neon_load_reg64(rm, a->vm + pass);
-        neon_load_reg64(rd, a->vd + pass);
-        neon_store_reg64(rm, a->vd + pass);
-        neon_store_reg64(rd, a->vm + pass);
+        read_neon_element64(rm, a->vm, pass, MO_64);
+        read_neon_element64(rd, a->vd, pass, MO_64);
+        write_neon_element64(rm, a->vd, pass, MO_64);
+        write_neon_element64(rd, a->vm, pass, MO_64);
     }
     tcg_temp_free_i64(rm);
     tcg_temp_free_i64(rd);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The only uses of these functions are for loading and storing VFP
double-precision values; they have nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |  8 ++--
 target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
 2 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static inline void neon_load_reg64(TCGv_i64 var, int reg)
+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
-static inline void neon_store_reg64(TCGv_i64 var, int reg)
+static inline void vfp_store_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
 static inline void vfp_load_reg32(TCGv_i32 var, int reg)
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         tcg_gen_ext_i32_i64(nf, cpu_NF);
         tcg_gen_ext_i32_i64(vf, cpu_VF);
 
-        neon_load_reg64(frn, rn);
-        neon_load_reg64(frm, rm);
+        vfp_load_reg64(frn, rn);
+        vfp_load_reg64(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
             tcg_temp_free_i64(tmp);
             break;
         }
-        neon_store_reg64(dest, rd);
+        vfp_store_reg64(dest, rd);
         tcg_temp_free_i64(frn);
         tcg_temp_free_i64(frm);
         tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i64 tcg_res;
         tcg_op = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
-        neon_load_reg64(tcg_op, rm);
+        vfp_load_reg64(tcg_op, rm);
         gen_helper_rintd(tcg_res, tcg_op, fpst);
-        neon_store_reg64(tcg_res, rd);
+        vfp_store_reg64(tcg_res, rd);
         tcg_temp_free_i64(tcg_op);
         tcg_temp_free_i64(tcg_res);
     } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         tcg_double = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
         tcg_tmp = tcg_temp_new_i32();
-        neon_load_reg64(tcg_double, rm);
+        vfp_load_reg64(tcg_double, rm);
         if (is_signed) {
             gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
     tmp = tcg_temp_new_i64();
     if (a->l) {
         gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-        neon_store_reg64(tmp, a->vd);
+        vfp_store_reg64(tmp, a->vd);
     } else {
-        neon_load_reg64(tmp, a->vd);
+        vfp_load_reg64(tmp, a->vd);
         gen_aa32_st64(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i64(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-            neon_store_reg64(tmp, a->vd + i);
+            vfp_store_reg64(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg64(tmp, a->vd + i);
+            vfp_load_reg64(tmp, a->vd + i);
             gen_aa32_st64(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     fd = tcg_temp_new_i64();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg64(f0, vn);
-    neon_load_reg64(f1, vm);
+    vfp_load_reg64(f0, vn);
+    vfp_load_reg64(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg64(fd, vd);
+            vfp_load_reg64(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vn = vfp_advance_dreg(vn, delta_d);
-        neon_load_reg64(f0, vn);
+        vfp_load_reg64(f0, vn);
         if (delta_m) {
             vm = vfp_advance_dreg(vm, delta_m);
-            neon_load_reg64(f1, vm);
+            vfp_load_reg64(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i64();
     fd = tcg_temp_new_i64();
 
-    neon_load_reg64(f0, vm);
+    vfp_load_reg64(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_dreg(vd, delta_d);
-                neon_store_reg64(fd, vd);
+                vfp_store_reg64(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vd = vfp_advance_dreg(vm, delta_m);
-        neon_load_reg64(f0, vm);
+        vfp_load_reg64(f0, vm);
     }
 
     tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i64();
 
-    neon_load_reg64(vn, a->vn);
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vn, a->vn);
+    vfp_load_reg64(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negd(vn, vn);
     }
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negd(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
 
     for (;;) {
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
     vd = tcg_temp_new_i64();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i64(vm, 0);
     } else {
-        neon_load_reg64(vm, a->vm);
+        vfp_load_reg64(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     vd = tcg_temp_new_i64();
     gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
     tmp = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
     tcg_temp_free_i64(vm);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rintd(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd_exact(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
     vd = tcg_temp_new_i64();
     vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
 
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
         /* u32 -> f64 */
         gen_helper_vfp_uitod(vd, vm, fpst);
     }
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
 
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i64();
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i64(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

In both cases, we can sink the write-back and perform
the accumulate into the normal destination temps.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
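
Reviewer note, not part of the commit: a minimal sketch of the control-flow
change, using plain integers in place of TCG temporaries. The model_* names
and ModelAccFn are hypothetical; the point is only that accumulating into
the result values and storing once after the branch gives the same
destination contents as storing separately in each branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*ModelAccFn)(uint64_t acc, uint64_t val);

/* Non-commutative on purpose, so operand order mistakes would show up. */
static uint64_t model_sub(uint64_t acc, uint64_t val)
{
    return acc - val;
}

/* Old shape: each branch writes the destination itself. */
static void model_store_per_branch(uint64_t vd[2], uint64_t rd0,
                                   uint64_t rd1, ModelAccFn accfn)
{
    if (accfn) {
        uint64_t tmp = vd[0];
        tmp = accfn(tmp, rd0);
        vd[0] = tmp;
        tmp = vd[1];
        tmp = accfn(tmp, rd1);
        vd[1] = tmp;
    } else {
        vd[0] = rd0;
        vd[1] = rd1;
    }
}

/* New shape: accumulate into the result values, then store once. */
static void model_store_sunk(uint64_t vd[2], uint64_t rd0,
                             uint64_t rd1, ModelAccFn accfn)
{
    if (accfn) {
        rd0 = accfn(vd[0], rd0);
        rd1 = accfn(vd[1], rd1);
    }
    vd[0] = rd0;
    vd[1] = rd1;
}

/* Both shapes must leave the same values in the destination. */
void model_check_sunk_store(void)
{
    ModelAccFn fns[] = { model_sub, NULL };

    for (size_t i = 0; i < 2; i++) {
        uint64_t a[2] = { 10, 20 };
        uint64_t b[2] = { 10, 20 };

        model_store_per_branch(a, 3, 4, fns[i]);
        model_store_sunk(b, 3, 4, fns[i]);
        assert(a[0] == b[0] && a[1] == b[1]);
    }
}
```

Both destination halves are still read before either is written, so the
"don't store until after all loads" constraint noted in the code is kept.
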
 target/arm/translate-neon.c.inc | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     if (accfn) {
         tmp = tcg_temp_new_i64();
         read_neon_element64(tmp, a->vd, 0, MO_64);
-        accfn(tmp, tmp, rd0);
-        write_neon_element64(tmp, a->vd, 0, MO_64);
+        accfn(rd0, tmp, rd0);
         read_neon_element64(tmp, a->vd, 1, MO_64);
-        accfn(tmp, tmp, rd1);
-        write_neon_element64(tmp, a->vd, 1, MO_64);
+        accfn(rd1, tmp, rd1);
         tcg_temp_free_i64(tmp);
-    } else {
-        write_neon_element64(rd0, a->vd, 0, MO_64);
-        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
+    write_neon_element64(rd0, a->vd, 0, MO_64);
+    write_neon_element64(rd1, a->vd, 1, MO_64);
     tcg_temp_free_i64(rd0);
     tcg_temp_free_i64(rd1);
 
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
         read_neon_element64(t64, a->vd, 0, MO_64);
-        accfn(t64, t64, rn0_64);
-        write_neon_element64(t64, a->vd, 0, MO_64);
+        accfn(rn0_64, t64, rn0_64);
         read_neon_element64(t64, a->vd, 1, MO_64);
-        accfn(t64, t64, rn1_64);
-        write_neon_element64(t64, a->vd, 1, MO_64);
+        accfn(rn1_64, t64, rn1_64);
         tcg_temp_free_i64(t64);
-    } else {
-        write_neon_element64(rn0_64, a->vd, 0, MO_64);
-        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
+
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
     return true;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We can use proper widening loads to extend 32-bit inputs,
and skip the "widenfn" step.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
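
Reviewer note, not part of the commit: a minimal sketch of what a widening
element read does, assuming the element sits at a known byte offset in a
flat buffer. ModelMemOp and model_read_element64 are hypothetical stand-ins
that mirror the shape of the switch extended below; they are not QEMU APIs,
and any host-endianness offset adjustment is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for signed 32, unsigned 32 and 64-bit reads. */
typedef enum { MODEL_SL, MODEL_UL, MODEL_Q } ModelMemOp;

/* Read one element at byte offset 'off' and widen it to 64 bits. */
static uint64_t model_read_element64(const void *base, size_t off,
                                     ModelMemOp memop)
{
    const unsigned char *p = (const unsigned char *)base + off;
    int32_t s32;
    uint32_t u32;
    uint64_t u64;

    switch (memop) {
    case MODEL_SL:
        /* Sign-extending 32-bit read. */
        memcpy(&s32, p, sizeof(s32));
        return (uint64_t)(int64_t)s32;
    case MODEL_UL:
        /* Zero-extending 32-bit read. */
        memcpy(&u32, p, sizeof(u32));
        return u32;
    case MODEL_Q:
        /* Full 64-bit read, no widening needed. */
        memcpy(&u64, p, sizeof(u64));
        return u64;
    default:
        abort();
    }
}

/* The same 32-bit pattern widens differently depending on signedness. */
void model_check_widening(void)
{
    uint32_t elems[2] = { 0xFFFFFFFFu, 0x7FFFFFFFu };

    assert(model_read_element64(elems, 0, MODEL_SL) == UINT64_MAX);
    assert(model_read_element64(elems, 0, MODEL_UL) == 0xFFFFFFFFull);
    assert(model_read_element64(elems, 4, MODEL_SL) == 0x7FFFFFFFull);
}
```

Folding the extension into the read removes the separate 32-bit temporary
and widening call for the size == 2 case; the narrower element sizes still
go through the widen helpers.
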
 target/arm/translate.c          |  6 +++
 target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
 2 files changed, 43 insertions(+), 29 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
     long off = neon_element_offset(reg, ele, memop);
 
     switch (memop) {
+    case MO_SL:
+        tcg_gen_ld32s_i64(dest, cpu_env, off);
+        break;
+    case MO_UL:
+        tcg_gen_ld32u_i64(dest, cpu_env, off);
+        break;
     case MO_Q:
         tcg_gen_ld_i64(dest, cpu_env, off);
         break;
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
 static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
                            NeonGenWidenFn *widenfn,
                            NeonGenTwo64OpFn *opfn,
-                           bool src1_wide)
+                           int src1_mop, int src2_mop)
 {
     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
     TCGv_i64 rn0_64, rn1_64, rm_64;
-    TCGv_i32 rm;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
         return false;
     }
 
-    if (!widenfn || !opfn) {
+    if (!opfn) {
         /* size == 3 case, which is an entirely different insn group */
         return false;
     }
 
-    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rn1_64 = tcg_temp_new_i64();
     rm_64 = tcg_temp_new_i64();
 
-    if (src1_wide) {
-        read_neon_element64(rn0_64, a->vn, 0, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 0, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 0, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 0, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn0_64, rn0_64, rm_64);
 
     /*
      * Load second pass inputs before storing the first pass result, to
      * avoid incorrect results if a narrow input overlaps with the result.
      */
-    if (src1_wide) {
-        read_neon_element64(rn1_64, a->vn, 1, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 1, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 1, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 1, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
     write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
     write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     return true;
 }
 
-#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
     static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
     {                                                                   \
         static NeonGenWidenFn * const widenfn[] = {                     \
             gen_helper_neon_widen_##S##8,                               \
             gen_helper_neon_widen_##S##16,                              \
-            tcg_gen_##EXT##_i32_i64,                                    \
-            NULL,                                                       \
+            NULL, NULL,                                                 \
         };                                                              \
         static NeonGenTwo64OpFn * const addfn[] = {                     \
             gen_helper_neon_##OP##l_u16,                                \
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
             tcg_gen_##OP##_i64,                                         \
             NULL,                                                       \
         };                                                              \
-        return do_prewiden_3d(s, a, widenfn[a->size],                   \
-                              addfn[a->size], SRC1WIDE);                \
+        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
+        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
+                              SRC1WIDE ? MO_Q : narrow_mop,             \
+                              narrow_mop);                              \
     }
 
-DO_PREWIDEN(VADDL_S, s, ext, add, false)
-DO_PREWIDEN(VADDL_U, u, extu, add, false)
-DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
-DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
-DO_PREWIDEN(VADDW_S, s, ext, add, true)
-DO_PREWIDEN(VADDW_U, u, extu, add, true)
-DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
-DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
+DO_PREWIDEN(VADDL_U, u, add, false, 0)
+DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
+DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
+DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
+DO_PREWIDEN(VADDW_U, u, add, true, 0)
+DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
+DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
 
 static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
                          NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
-- 
2.20.1

In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
meant we were using the H4() address swizzler macro rather than the
H2() which is required for 2-byte data.  This had no effect on
little-endian hosts but meant we put the result data into the
destination Dreg in the wrong order on big-endian hosts.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
---
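Illustration only, not part of the commit. The sketch below models why
the wrong swizzle macro scrambles the result on a big-endian host. It
assumes the big-endian forms of the macros are XOR 3 for 16-bit
elements and XOR 1 for 32-bit elements; BE_H2, BE_H4, be_store16() and
store_results() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed big-endian-host forms of the swizzle macros; on a
 * little-endian host both would be the identity.
 */
#define BE_H2(x) ((x) ^ 3)   /* index of a 16-bit element */
#define BE_H4(x) ((x) ^ 1)   /* index of a 32-bit element */

/*
 * Store 'val' into 16-bit memory slot 'slot' of a 64-bit register
 * image, numbering slots the way a big-endian host does (slot 0 is
 * the most significant halfword).
 */
static uint64_t be_store16(uint64_t reg, unsigned slot, uint16_t val)
{
    unsigned shift = 16 * (3 - slot);

    assert(slot < 4);
    reg &= ~((uint64_t)0xffff << shift);
    return reg | ((uint64_t)val << shift);
}

/* Write four results as d[H(i)] = r[i], using H2 or H4 for the index. */
static uint64_t store_results(int use_h2, uint16_t r0, uint16_t r1,
                              uint16_t r2, uint16_t r3)
{
    const uint16_t r[4] = { r0, r1, r2, r3 };
    uint64_t d = 0;

    for (unsigned i = 0; i < 4; i++) {
        d = be_store16(d, use_h2 ? BE_H2(i) : BE_H4(i), r[i]);
    }
    return d;
}
```

With the 16-bit swizzle, element 0 ends up in the least significant
halfword of the register image, as the architecture expects. With the
32-bit swizzle, the two halfwords of each 32-bit word are swapped.
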
 target/arm/vec_helper.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
         r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
                                                                         \
-        d[H4(0)] = r0;                                                  \
-        d[H4(1)] = r1;                                                  \
-        d[H4(2)] = r2;                                                  \
-        d[H4(3)] = r3;                                                  \
+        d[H2(0)] = r0;                                                  \
+        d[H2(1)] = r1;                                                  \
+        d[H2(2)] = r2;                                                  \
+        d[H2(3)] = r3;                                                  \
     }
 
 DO_NEON_PAIRWISE(neon_padd, add)
-- 
2.20.1

The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index selects a group of four 8-bit values, so
each indexed entity is 32 bits wide and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
---
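Illustration only, not part of the commit. A minimal sketch of the
offset calculation, assuming the 32-bit swizzle is the identity on a
little-endian host and XOR 1 on a big-endian host. The helper names
h4() and dot_group_offset() are hypothetical, and the sketch assumes a
vector of at most 128 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed swizzle for 32-bit elements: identity on LE, XOR 1 on BE. */
static size_t h4(int host_big_endian, size_t index)
{
    return host_big_endian ? (index ^ 1) : index;
}

/* Byte offset of the indexed group of four 8-bit values within vm. */
static size_t dot_group_offset(int host_big_endian, size_t index)
{
    assert(index < 4);   /* at most four 32-bit groups in 128 bits */
    return h4(host_big_endian, index) * 4;
}
```

On a little-endian host index 0 maps to byte offset 0 and index 1 to
byte offset 4. On a big-endian host the two are exchanged, which is
the adjustment the unswizzled code was missing.
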
 target/arm/vec_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     int8_t *n = vn;
-    int8_t *m_indexed = (int8_t *)vm + index * 4;
+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     uint8_t *n = vn;
-    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

HCR_EL2.FB should be applied when SCR_EL3.NS is set (Non-secure state),
not when it is cleared.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
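Illustration only, not part of the commit. The sketch below contrasts
the old and new predicates on a deliberately simplified model in which
the effective HCR_EL2 reads as zero in Secure state (no S-EL2). The
struct, hcr_el2_eff() and force_broadcast() are hypothetical stand-ins,
not the QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HCR_FB (1ULL << 9)   /* HCR_EL2.FB, force broadcast */

struct cpu_model {
    int current_el;
    bool secure;             /* true when SCR_EL3.NS is clear */
    uint64_t hcr_el2;
};

/* Simplified assumption: HCR_EL2 has no effect in Secure state. */
static uint64_t hcr_el2_eff(const struct cpu_model *c)
{
    return c->secure ? 0 : c->hcr_el2;
}

/* fixed == false reproduces the inverted check, true the corrected one. */
static bool force_broadcast(bool fixed, int el, bool secure, uint64_t hcr)
{
    const struct cpu_model c = { el, secure, hcr };

    assert(el >= 0 && el <= 3);
    if (!fixed) {
        return (c.hcr_el2 & HCR_FB) && c.current_el == 1 && c.secure;
    }
    return c.current_el == 1 && (hcr_el2_eff(&c) & HCR_FB);
}
```

In this model the old predicate fires only in Secure EL1, which is
exactly the case where it should not, and the new one fires only in
Non-secure EL1.
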
 target/arm/helper.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
 /*
  * Non-IS variants of TLB operations are upgraded to
- * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
+ * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
  * force broadcast of these operations.
  */
 static bool tlb_force_broadcast(CPUARMState *env)
 {
-    return (env->cp15.hcr_el2 & HCR_FB) &&
-        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
+    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
 }
 
 static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

Secure state is not exempt from the SCR_EL3.TLOR check, nor, once
S-EL2 is supported, from the HCR_EL2.TLOR check.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
 #endif
 
 /* Shared logic between LORID and the rest of the LOR* registers.
- * Secure state has already been delt with.
+ * Secure state exclusion has already been dealt with.
  */
-static CPAccessResult access_lor_ns(CPUARMState *env)
+static CPAccessResult access_lor_ns(CPUARMState *env,
+                                    const ARMCPRegInfo *ri, bool isread)
 {
     int el = arm_current_el(env);
 
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
     return CP_ACCESS_OK;
 }
 
-static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
-                                   bool isread)
-{
-    if (arm_is_secure_below_el3(env)) {
-        /* Access ok in secure mode.  */
-        return CP_ACCESS_OK;
-    }
-    return access_lor_ns(env);
-}
-
 static CPAccessResult access_lor_other(CPUARMState *env,
                                        const ARMCPRegInfo *ri, bool isread)
 {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
         /* Access denied in secure mode.  */
         return CP_ACCESS_TRAP;
     }
-    return access_lor_ns(env);
+    return access_lor_ns(env, ri, isread);
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
-      .access = PL1_R, .accessfn = access_lorid,
+      .access = PL1_R, .accessfn = access_lor_ns,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     REGINFO_SENTINEL
 };
-- 
2.20.1

If we're using the capstone disassembler, disassembly of a run of
instructions more than 32 bytes long disassembles the wrong data for
instructions beyond the 32 byte mark:

(qemu) xp /16x 0x100
0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
(qemu) xp /16i 0x100
0x00000100: 00000005 andeq r0, r0, r5
0x00000104: 54410001 strbpl r0, [r1], #-1
0x00000108: 00000001 andeq r0, r0, r1
0x0000010c: 00001000 andeq r1, r0, r0
0x00000110: 00000000 andeq r0, r0, r0
0x00000114: 00000004 andeq r0, r0, r4
0x00000118: 54410002 strbpl r0, [r1], #-2
0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x00000120: 54410001 strbpl r0, [r1], #-1
0x00000124: 00000001 andeq r0, r0, r1
0x00000128: 00001000 andeq r1, r0, r0
0x0000012c: 00000000 andeq r0, r0, r0
0x00000130: 00000004 andeq r0, r0, r4
0x00000134: 54410002 strbpl r0, [r1], #-2
0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x0000013c: 00000000 andeq r0, r0, r0

Here the disassembly of 0x120..0x13f is using the data that is in
0x104..0x123.

This is caused by passing the wrong value to the read_memory_func().
The intention is that at this point in the loop the 'cap_buf' buffer
already contains 'csize' bytes of data for the instruction at guest
addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
time through the loop 'csize' happens to be zero, so the initial read
of 32 bytes into cap_buf is correct and as long as the disassembly
never needs to read more data we return the correct information.

Use the correct guest address in the call to read_memory_func().

Cc: qemu-stable@nongnu.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
---
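Illustration only, not part of the commit. The sketch below reproduces
the buffer top-up logic on a fake guest memory array. All names
(guest_mem, read_guest(), top_up(), buffer_matches_guest()) are
hypothetical; only the address arithmetic mirrors the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t guest_mem[64];

/* Hypothetical stand-in for the read_memory_func() callback. */
static void read_guest(uint64_t addr, uint8_t *dst, size_t len)
{
    assert(addr + len <= sizeof(guest_mem));
    memcpy(dst, guest_mem + addr, len);
}

/*
 * 'buf' already holds 'csize' bytes for guest address 'pc'; append
 * 'tsize' more. The extra bytes live at pc + csize, not at pc.
 */
static size_t top_up(uint64_t pc, uint8_t *buf, size_t csize,
                     size_t tsize, int buggy)
{
    uint64_t src = buggy ? pc : pc + csize;

    read_guest(src, buf + csize, tsize);
    return csize + tsize;
}

/* Returns 1 if two successive top-ups leave buf equal to guest memory. */
static int buffer_matches_guest(int buggy)
{
    uint8_t buf[32];
    uint64_t pc = 8;
    size_t csize = 0;

    for (size_t i = 0; i < sizeof(guest_mem); i++) {
        guest_mem[i] = (uint8_t)i;
    }
    csize = top_up(pc, buf, csize, 16, buggy);   /* csize == 0: always fine */
    csize = top_up(pc, buf, csize, 16, buggy);   /* only correct with + csize */
    return memcmp(buf, guest_mem + pc, csize) == 0;
}
```

The first top-up is correct either way because csize is zero. The
second one re-reads the bytes at pc in the buggy variant, which is the
repetition visible in the monitor output quoted above.
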
 disas/capstone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/disas/capstone.c b/disas/capstone.c
index XXXXXXX..XXXXXXX 100644
--- a/disas/capstone.c
+++ b/disas/capstone.c
@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
 
         /* Make certain that we can make progress.  */
         assert(tsize != 0);
-        info->read_memory_func(pc, cap_buf + csize, tsize, info);
+        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
         csize += tsize;
 
         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):

CID 1432363 (#1 of 1): Unintentional integer overflow:

overflow_before_widen:
    Potentially overflowing expression 1 << scale with type int
    (32 bits, signed) is evaluated using 32-bit arithmetic, and
    then used in a context that expects an expression of type
    hwaddr (64 bits, unsigned).

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20201030144617.1535064-1-philmd@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
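Illustration only, not part of the commit. The sketch below shows the
difference between doing the shift and multiply in 32 bits and in 64
bits. The original expression used a signed int, where overflow is
undefined behaviour; to keep the example well defined it models the
narrow case with uint32_t, which wraps. BIT_ULL is redefined locally
and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BIT_ULL(nr) (1ULL << (nr))   /* local stand-in for the macro */

/* The whole product is computed in 32 bits and widened afterwards. */
static uint64_t num_pages_32bit(uint32_t num, uint32_t scale)
{
    uint32_t narrow;

    assert(scale < 32);
    narrow = (num + 1u) * ((uint32_t)1 << scale);
    return narrow;
}

/* The shift already yields a 64-bit value, so the product cannot wrap. */
static uint64_t num_pages_64bit(uint32_t num, uint32_t scale)
{
    assert(scale < 32);
    return (num + 1u) * BIT_ULL(scale);
}
```

For num = 3 and scale = 31 the 32-bit version wraps to 0, while the
64-bit version returns 0x200000000. For small inputs the two agree.
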
 hw/arm/smmuv3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bitops.h"
 #include "hw/irq.h"
 #include "hw/sysbus.h"
 #include "migration/vmstate.h"
@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
         scale = CMD_SCALE(cmd);
         num = CMD_NUM(cmd);
         ttl = CMD_TTL(cmd);
-        num_pages = (num + 1) * (1 << (scale));
+        num_pages = (num + 1) * BIT_ULL(scale);
     }
 
     if (type == SMMU_CMD_TLBI_NH_VA) {
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
that SVE will not trap to EL3.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030151541.11976-1-remi@remlab.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
                     if (cpu_isar_feature(aa64_mte, cpu)) {
                         env->cp15.scr_el3 |= SCR_ATA;
                     }
+                    if (cpu_isar_feature(aa64_sve, cpu)) {
+                        env->cp15.cptr_el[3] |= CPTR_EZ;
+                    }
                     /* AArch64 kernels never boot in secure mode */
                     assert(!info->secure_boot);
                     /* This hook is only supported for AArch32 currently:
-- 
2.20.1

From: AlexChen <alex.chen@huawei.com>

In omap_update_display(), the pointer omap_lcd is dereferenced before
it is checked for validity, which may lead to a NULL pointer dereference.
So move the assignment to surface to after the check that omap_lcd is
valid, and move the surface_bits_per_pixel(surface) test to after the
surface assignment.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: AlexChen <alex.chen@huawei.com>
Message-id: 5F9CDB8A.9000001@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
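Illustration only, not part of the commit. A minimal sketch of the
check-before-dereference ordering, using hypothetical struct and
function names rather than the QEMU display types.

```c
#include <assert.h>
#include <stddef.h>

struct console {
    int bits_per_pixel;
};

struct panel {
    int enable;
    struct console *con;
};

/* Returns the depth to draw at, or 0 if there is nothing to do. */
static int panel_update(const struct panel *p)
{
    const struct console *con;

    /* Validate p first; nothing above this line may touch p->... */
    if (!p || !p->enable) {
        return 0;
    }

    con = p->con;   /* safe: p is known to be non-NULL here */
    if (!con || !con->bits_per_pixel) {
        return 0;
    }
    return con->bits_per_pixel;
}
```

Initialising con in its declaration from p->con would reintroduce the
problem, because the initialiser runs before the NULL test.
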
 hw/display/omap_lcdc.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/omap_lcdc.c
+++ b/hw/display/omap_lcdc.c
@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
 static void omap_update_display(void *opaque)
 {
     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
-    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
+    DisplaySurface *surface;
     draw_line_func draw_line;
     int size, height, first, last;
     int width, linesize, step, bpp, frame_offset;
     hwaddr frame_base;
 
-    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
-        !surface_bits_per_pixel(surface)) {
+    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
+        return;
+    }
+
+    surface = qemu_console_surface(omap_lcd->con);
+    if (!surface_bits_per_pixel(surface)) {
         return;
     }
 
-- 
2.20.1

From: AlexChen <alex.chen@huawei.com>

In exynos4210_fimd_update(), the pointer s is dereferenced before
it is checked for validity, which may lead to a NULL pointer dereference.
So move the assignment to global_width to after the check that s is valid.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 5F9F8D88.9030102@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/exynos4210_fimd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/exynos4210_fimd.c
+++ b/hw/display/exynos4210_fimd.c
@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
     bool blend = false;
     uint8_t *host_fb_addr;
     bool is_dirty = false;
-    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
+    int global_width;
 
     if (!s || !s->console || !s->enabled ||
         surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
         return;
     }
+
+    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
     exynos4210_update_resolution(s);
     surface = qemu_console_surface(s->console);
 
-- 
2.20.1

In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
This is incorrect when the security state being queried is not the
current one, because arm_current_el() uses the current security state
to determine which of the banked CONTROL.nPRIV bits to look at.
The effect was that if (for instance) Secure state was in privileged
mode but Non-Secure was not then we would return the wrong MMU index.

The only places where we are using this function in a way that could
trigger this bug are for the stack loads during a v8M function-return
and for the instruction fetch of a v8M SG insn.

Fix the bug by expanding out the M-profile version of the
arm_current_el() logic inline so it can use the passed in secstate
rather than env->v7m.secure.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
---
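Illustration only, not part of the commit. The sketch below models the
banked CONTROL.nPRIV lookup with a hypothetical struct. The buggy
variant consults the bank for the current security state, the fixed
variant the bank for the state being queried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct m_cpu {
    bool handler_mode;
    bool secure;            /* current security state */
    uint32_t control[2];    /* [0] = Non-secure, [1] = Secure; bit 0 is nPRIV */
};

/* Privilege as seen from security state 'bank'. */
static bool priv_for_bank(const struct m_cpu *c, bool bank)
{
    return c->handler_mode || !(c->control[bank] & 1);
}

/* fixed == false uses the current state's bank, as the old code did. */
static bool query_priv(bool fixed, bool handler_mode, bool cur_secure,
                       uint32_t control_ns, uint32_t control_s,
                       bool secstate)
{
    const struct m_cpu c = {
        handler_mode, cur_secure, { control_ns, control_s }
    };

    return priv_for_bank(&c, fixed ? secstate : c.secure);
}
```

With the CPU in Secure thread mode, Secure privileged and Non-secure
unprivileged, a query about Non-secure state returns unprivileged with
the fix and privileged without it.
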
 target/arm/m_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
 /* Return the MMU index for a v7M CPU in the specified security state */
 ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 {
-    bool priv = arm_current_el(env) != 0;
+    bool priv = arm_v7m_is_handler_mode(env) ||
+        !(env->v7m.control[secstate] & 1);
 
     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
 }
-- 
2.20.1

On some hosts (e.g. Ubuntu Bionic) pkg-config returns a set of
libraries for gio-2.0 which don't actually work when compiling
statically. (Specifically, the returned library string includes
-lmount, but not -lblkid which -lmount depends upon, so linking
fails due to missing symbols.)

Check that the libraries work, and don't enable gio if they don't,
in the same way we do for gnutls.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
---
 configure | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index XXXXXXX..XXXXXXX 100755
--- a/configure
+++ b/configure
@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
 fi
 
 if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
-    gio=yes
     gio_cflags=$($pkg_config --cflags gio-2.0)
     gio_libs=$($pkg_config --libs gio-2.0)
     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
     if [ ! -x "$gdbus_codegen" ]; then
         gdbus_codegen=
     fi
+    # Check that the libraries actually work -- Ubuntu 18.04 ships
+    # with pkg-config --static --libs data for gio-2.0 that is missing
+    # -lblkid and will give a link error.
+    write_c_skeleton
+    if compile_prog "" "$gio_libs" ; then
+        gio=yes
+    else
+        gio=no
+    fi
 else
     gio=no
 fi
-- 
2.20.1

In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
into the GICv3CPUState struct's maintenance_irq field.  This will
only work if the board happens to have already wired up the CPU
maintenance IRQ before the GIC was realized.  Unfortunately this is
not the case for the 'virt' board, and so the value that gets copied
is NULL (since a qemu_irq is really a pointer to an IRQState struct
under the hood).  The effect is that the CPU interface code never
actually raises the maintenance interrupt line.

Instead, since the GICv3CPUState has a pointer to the CPUState, make
the dereference at the point where we want to raise the interrupt, to
avoid an implicit requirement on board code to wire things up in a
particular order.

Reported-by: Jose Martins <josemartins90@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
Reviewed-by: Luc Michel <luc@lmichel.fr>
---
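Illustration only, not part of the commit. The sketch below models the
ordering problem with hypothetical types: a pointer copied at init time
is still NULL if the board wires the line up later, whereas looking it
up at the point of use sees the wired line. set_irq() ignores a NULL
line, which is why the old behaviour failed silently.

```c
#include <assert.h>
#include <stddef.h>

struct irq_line {
    int level;
};

struct cpu {
    struct irq_line *maintenance_irq;
};

struct cpuif {
    struct cpu *cpu;
    struct irq_line *cached_irq;   /* the field the commit removes */
};

/* A NULL line is silently ignored. */
static void set_irq(struct irq_line *irq, int level)
{
    if (irq) {
        irq->level = level;
    }
}

/* Returns the level the interrupt line ends up with. */
static int maintenance_level_seen(int lookup_at_use)
{
    struct irq_line line = { 0 };
    struct cpu cpu = { NULL };
    struct cpuif cs = { &cpu, NULL };

    cs.cached_irq = cpu.maintenance_irq;   /* init: not wired yet, NULL */
    cpu.maintenance_irq = &line;           /* board wires the line later */

    set_irq(lookup_at_use ? cs.cpu->maintenance_irq : cs.cached_irq, 1);
    return line.level;
}
```

The cached copy never raises the line. The late lookup does, without
requiring any particular wiring order from the board code.
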
 include/hw/intc/arm_gicv3_common.h | 1 -
 hw/intc/arm_gicv3_cpuif.c          | 5 ++---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
     qemu_irq parent_fiq;
     qemu_irq parent_virq;
     qemu_irq parent_vfiq;
-    qemu_irq maintenance_irq;
 
     /* Redistributor */
     uint32_t level;                  /* Current IRQ level */
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
     int irqlevel = 0;
     int fiqlevel = 0;
     int maintlevel = 0;
+    ARMCPU *cpu = ARM_CPU(cs->cpu);
 
     idx = hppvi_index(cs);
     trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
 
     qemu_set_irq(cs->parent_vfiq, fiqlevel);
     qemu_set_irq(cs->parent_virq, irqlevel);
-    qemu_set_irq(cs->maintenance_irq, maintlevel);
+    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
 }
 
 static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
             && cpu->gic_num_lrs) {
             int j;
 
-            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
-
             cs->num_list_regs = cpu->gic_num_lrs;
             cs->vpribits = cpu->gic_vpribits;
             cs->vprebits = cpu->gic_vprebits;
-- 
2.20.1

The kerneldoc script currently emits Sphinx markup for a macro with
arguments that uses the c:function directive. This is correct for
Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
documentation of macros with arguments and c:function is not picky
about the syntax of what it is passed. However, in Sphinx 3 the
c:macro directive was enhanced to support macros with arguments,
and c:function was made more picky about what syntax it accepted.

When kerneldoc is told that it needs to produce output for Sphinx
3 or later, make it emit c:function only for functions and c:macro
for macros with arguments. We assume that anything with a return
type is a function and anything without is a macro.

This fixes the Sphinx error:

/home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
If declarator-id with parameters (e.g., 'void f(int arg)'):
  Invalid C declaration: Expected identifier in nested name. [error at 25]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    -------------------------^
If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
  Error in declarator or parameters
  Invalid C declaration: Expecting "(" in parameters. [error at 39]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    ---------------------------------------^

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
---
 scripts/kernel-doc | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index XXXXXXX..XXXXXXX 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
 	output_highlight_rst($args{'purpose'});
 	$start = "\n\n**Syntax**\n\n  ``";
     } else {
-	print ".. c:function:: ";
+        if ((split(/\./, $sphinx_version))[0] >= 3) {
+            # Sphinx 3 and later distinguish macros and functions and
+            # complain if you use c:function with something that's not
+            # syntactically valid as a function declaration.
+            # We assume that anything with a return type is a function
+            # and anything without is a macro.
+            if ($args{'functiontype'} ne "") {
+                print ".. c:function:: ";
+            } else {
+                print ".. c:macro:: ";
+            }
+        } else {
+            # Older Sphinx don't support documenting macros that take
+            # arguments with c:macro, and don't complain about the use
+            # of c:function for this.
+            print ".. c:function:: ";
+        }
     }
     if ($args{'functiontype'} ne "") {
 	$start .= $args{'functiontype'} . " " . $args{'function'} . " (";
-- 
2.20.1

Sphinx 3.2 is pickier than earlier versions about the option:: markup,
and complains about our usage in qemu-option-trace.rst:

../../docs/qemu-option-trace.rst.inc:4:Malformed option description
  '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
  "/opt args" or "+opt args"

In this file, we're really trying to document the different parts of
the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
have already introduced with an option:: markup.  So it's not right
to use option:: here anyway.  Switch to a different markup
(definition lists) which gives about the same formatted output.

(Unlike option::, this markup doesn't produce index entries; but
at the moment we don't do anything much with indexes anyway, and
in any case I think it doesn't make much sense to have individual
index entries for the sub-parts of the --trace option.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
---
 docs/qemu-option-trace.rst.inc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
index XXXXXXX..XXXXXXX 100644
--- a/docs/qemu-option-trace.rst.inc
+++ b/docs/qemu-option-trace.rst.inc
@@ -XXX,XX +XXX,XX @@
 
 Specify tracing options.
 
-.. option:: [enable=]PATTERN
+``[enable=]PATTERN``
 
   Immediately enable events matching *PATTERN*
   (either event name or a globbing pattern).  This option is only
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
 
   Use :option:`-trace help` to print a list of names of trace points.
 
-.. option:: events=FILE
+``events=FILE``
 
   Immediately enable events listed in *FILE*.
   The file must contain one event name (as listed in the ``trace-events-all``
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
   available if QEMU has been compiled with the ``simple``, ``log`` or
   ``ftrace`` tracing backend.
 
-.. option:: file=FILE
+``file=FILE``
 
   Log output traces to *FILE*.
   This option is only available if QEMU has been compiled with
-- 
2.20.1

The randomness tests in the NPCM7xx RNG test fail intermittently
but fairly frequently. On my machine running the test in a loop:
 while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done

will fail in less than a minute with an error like:
ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)

(Failures have been observed on all 4 of the randomness tests,
not just first_byte_runs.)

It's not clear why these tests are failing like this, but intermittent
failures make CI and merge testing awkward, so disable running them
unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
running the test suite, until we work out the cause.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
---
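Illustration only, not part of the commit. A minimal sketch of gating
test registration on an environment variable. The counting helpers are
hypothetical; only the getenv() test and the variable name come from
the patch. Note that the test is for presence, so any value, including
an empty string or "0", enables the flaky tests.

```c
#include <assert.h>
#include <stdlib.h>

/* The gate only checks whether the variable exists at all. */
static int flaky_tests_requested(const char *env_value)
{
    return env_value != NULL;
}

/* Two tests always run; the four randomness tests are opt-in. */
static int count_tests(const char *env_value)
{
    int n = 2;

    if (flaky_tests_requested(env_value)) {
        n += 4;
    }
    return n;
}

/* How a caller would consult the real environment. */
int count_tests_from_env(void)
{
    return count_tests(getenv("QEMU_TEST_FLAKY_RNG_TESTS"));
}
```

A developer who wants the randomness tests would set the variable in
the environment of the test run, for example by prefixing the command
with QEMU_TEST_FLAKY_RNG_TESTS=1.
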
 tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/npcm7xx_rng-test.c
+++ b/tests/qtest/npcm7xx_rng-test.c
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
 
     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
-    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
-    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
-    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
-    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    /*
+     * These tests fail intermittently; only run them on explicit
+     * request until we figure out why.
+     */
+    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
+        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
+        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    }
 
     qtest_start("-machine npcm750-evb");
     ret = g_test_run();
-- 
2.20.1

The following changes since commit a97978bcc2d1f650c7d411428806e5b03082b8c7:

  Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210603' into staging (2021-06-03 10:00:35 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210603

for you to fetch changes up to 1c861885894d840235954060050d240259f5340b:

  tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed (2021-06-03 16:43:27 +0100)

----------------------------------------------------------------
target-arm queue:
 * Some not-yet-enabled preliminaries for M-profile MVE support
 * Consistently use "Cortex-Axx", not "Cortex Axx" in docs, comments
 * docs: Fix installation of man pages with Sphinx 4.x
 * Mark LDS{MIN,MAX} as signed operations
 * Fix missing syndrome value for DAIF and PAC check exceptions
 * Implement BFloat16 extensions
 * Refactoring of hvf accelerator code in preparation for aarch64 support
 * Fix some coverity nits in test code

----------------------------------------------------------------
Alexander Graf (12):
      hvf: Move assert_hvf_ok() into common directory
      hvf: Move vcpu thread functions into common directory
      hvf: Move cpu functions into common directory
      hvf: Move hvf internal definitions into common header
      hvf: Make hvf_set_phys_mem() static
      hvf: Remove use of hv_uvaddr_t and hv_gpaddr_t
      hvf: Split out common code on vcpu init and destroy
      hvf: Use cpu_synchronize_state()
      hvf: Make synchronize functions static
      hvf: Remove hvf-accel-ops.h
      hvf: Introduce hvf vcpu struct
      hvf: Simplify post reset/init/loadvm hooks

Damien Goutte-Gattat (1):
      docs: Fix installation of man pages with Sphinx 4.x

Jamie Iles (4):
      target/arm: fix missing exception class
      target/arm: fold do_raise_exception into raise_exception
      target/arm: use raise_exception_ra for MTE check failure
      target/arm: use raise_exception_ra for stack limit exception

Peter Maydell (15):
      target/arm: Add isar feature check functions for MVE
      target/arm: Update feature checks for insns which are "MVE or FP"
      target/arm: Move fpsp/fpdp isar check into callers of do_vfp_2op_sp/dp
      target/arm: Add MVE check to VMOV_reg_sp and VMOV_reg_dp
      target/arm: Fix return values in fp_sysreg_checks()
      target/arm: Implement M-profile VPR register
      target/arm: Make FPSCR.LTPSIZE writable for MVE
      target/arm: Allow board models to specify initial NS VTOR
      arm: Consistently use "Cortex-Axx", not "Cortex Axx"
      tests/qtest/bios-tables-test: Check for dup2() failure
      tests/qtest/e1000e-test: Check qemu_recv() succeeded
      tests/qtest/hd-geo-test: Fix checks on mkstemp() return value
      tests/qtest/pflash-cfi02-test: Avoid potential integer overflow
      tests/qtest/tpm-tests: Remove unnecessary NULL checks
      tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed

Richard Henderson (13):
      target/arm: Mark LDS{MIN,MAX} as signed operations
      target/arm: Add isar_feature_{aa32, aa64, aa64_sve}_bf16
      target/arm: Unify unallocated path in disas_fp_1src
      target/arm: Implement scalar float32 to bfloat16 conversion
      target/arm: Implement vector float32 to bfloat16 conversion
      softfpu: Add float_round_to_odd_inf
      target/arm: Implement bfloat16 dot product (vector)
      target/arm: Implement bfloat16 dot product (indexed)
      target/arm: Implement bfloat16 matrix multiply accumulate
      target/arm: Implement bfloat widening fma (vector)
      target/arm: Implement bfloat widening fma (indexed)
      linux-user/aarch64: Enable hwcap bits for bfloat16
      target/arm: Enable BFloat16 extensions

Add the isar feature check functions we will need for v8.1M MVE:
 * a check for MVE present: this corresponds to the pseudocode's
   CheckDecodeFaults(ExtType_Mve)
 * a check for the optional floating-point part of MVE: this
   corresponds to CheckDecodeFaults(ExtType_MveFp)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-2-peter.maydell@linaro.org
---
 target/arm/cpu.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
     }
 }
 
+static inline bool isar_feature_aa32_mve(const ARMISARegisters *id)
+{
+    /*
+     * Return true if MVE is supported (either integer or floating point).
+     * We must check for M-profile as the MVFR1 field means something
+     * else for A-profile.
+     */
+    return isar_feature_aa32_mprofile(id) &&
+        FIELD_EX32(id->mvfr1, MVFR1, MVE) > 0;
+}
+
+static inline bool isar_feature_aa32_mve_fp(const ARMISARegisters *id)
+{
+    /*
+     * Return true if MVE is supported with the floating-point extension.
+     * We must check for M-profile as the MVFR1 field means something
+     * else for A-profile.
+     */
+    return isar_feature_aa32_mprofile(id) &&
+        FIELD_EX32(id->mvfr1, MVFR1, MVE) >= 2;
+}
+
 static inline bool isar_feature_aa32_vfp_simd(const ARMISARegisters *id)
 {
     /*
-- 
2.20.1
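
The two feature checks added above differ only in the threshold applied to the MVFR1.MVE field. A minimal standalone sketch follows. It assumes the field occupies bits [11:8] of MVFR1 on M-profile (the patch does not show the field definition), and `field_ex32()`, `has_mve()` and `has_mve_fp()` are hypothetical names, not the QEMU helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: MVFR1.MVE is bits [11:8] on M-profile CPUs. */
#define MVFR1_MVE_SHIFT  8
#define MVFR1_MVE_LENGTH 4

/* Extract an unsigned bitfield of 'length' bits starting at bit 'shift'. */
static uint32_t field_ex32(uint32_t reg, unsigned shift, unsigned length)
{
    assert(length > 0 && length < 32 && shift + length <= 32);
    return (reg >> shift) & ((1u << length) - 1);
}

/* MVE present at all (integer-only or with floating point). */
static bool has_mve(bool mprofile, uint32_t mvfr1)
{
    return mprofile &&
        field_ex32(mvfr1, MVFR1_MVE_SHIFT, MVFR1_MVE_LENGTH) > 0;
}

/* MVE present together with its optional floating-point part. */
static bool has_mve_fp(bool mprofile, uint32_t mvfr1)
{
    return mprofile &&
        field_ex32(mvfr1, MVFR1_MVE_SHIFT, MVFR1_MVE_LENGTH) >= 2;
}
```

The `mprofile` argument mirrors the comment in the patch: the same MVFR1 bits mean something else on A-profile, so the field value alone is not enough.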

Some v8M instructions are present if either the floating point
extension or MVE is implemented.  Update our implementation of them
to check for MVE as well as for FP.

This is all the insns which use CheckDecodeFaults(ExtType_MveOrFp) or
CheckDecodeFaults(ExtType_MveOrDpFp) in their pseudocode, which are
essentially the loads and stores, moves and sysreg accesses, except
for VMOV_reg_sp and VMOV_reg_dp, which we handle in subsequent
patches because they need a refactor to provide a place to put the
new MVE check.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-3-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 48 +++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
     /* VMOV scalar to general purpose register */
     TCGv_i32 tmp;
 
-    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-    if (a->size == MO_32
-        ? !dc_isar_feature(aa32_fpsp_v2, s)
-        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-        return false;
+    /*
+     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
+     * all sizes, whether the CPU has fp or not.
+     */
+    if (!dc_isar_feature(aa32_mve, s)) {
+        if (a->size == MO_32
+            ? !dc_isar_feature(aa32_fpsp_v2, s)
+            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+            return false;
+        }
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
     /* VMOV general purpose register to scalar */
     TCGv_i32 tmp;
 
-    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-    if (a->size == MO_32
-        ? !dc_isar_feature(aa32_fpsp_v2, s)
-        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-        return false;
+    /*
+     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
+     * all sizes, whether the CPU has fp or not.
+     */
+    if (!dc_isar_feature(aa32_mve, s)) {
+        if (a->size == MO_32
+            ? !dc_isar_feature(aa32_fpsp_v2, s)
+            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+            return false;
+        }
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
@@ -XXX,XX +XXX,XX @@ typedef enum FPSysRegCheckResult {
 
 static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
 {
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return FPSysRegCheckFailed;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
 {
     TCGv_i32 tmp;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
 {
     TCGv_i32 tmp;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
      * floating point register.  Note that this does not require support
      * for double precision arithmetic.
      */
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     uint32_t offset;
     TCGv_i32 addr, tmp;
 
-    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     uint32_t offset;
     TCGv_i32 addr, tmp;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
     TCGv_i64 tmp;
 
     /* Note that this does not require support for double arithmetic.  */
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
     TCGv_i32 addr, tmp;
     int i, n;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
     int i, n;
 
     /* Note that this does not require support for double arithmetic.  */
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
-- 
2.20.1
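
The decision logic introduced by this patch reduces to two predicates. The sketch below restates them outside the translator. The `CpuFeatures` struct and both function names are hypothetical, standing in for the `dc_isar_feature()` and `arm_dc_feature()` queries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature summary; QEMU derives these from ID registers. */
typedef struct CpuFeatures {
    bool mve;
    bool fpsp_v2;
    bool neon;
} CpuFeatures;

/*
 * VMOV between a scalar and a general purpose register: MVE provides
 * every element size; otherwise 32-bit needs VFP and the rest need Neon.
 */
static bool vmov_scalar_allowed(const CpuFeatures *f, int size_bits)
{
    assert(f != NULL);
    if (f->mve) {
        return true;
    }
    return size_bits == 32 ? f->fpsp_v2 : f->neon;
}

/* Loads, stores, moves and sysreg accesses: either extension suffices. */
static bool fp_or_mve(const CpuFeatures *f)
{
    assert(f != NULL);
    return f->fpsp_v2 || f->mve;
}
```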

The do_vfp_2op_sp() and do_vfp_2op_dp() functions currently check
whether floating point is supported via the aa32_fpdp_v2 and
aa32_fpsp_v2 isar checks.  For v8.1M MVE support, the VMOV_reg trans
functions (but not any of the others) need to update this to also
allow the insn if MVE is implemented.  Move the check out of the do_
function and into its callsites (which are all implemented via the
DO_VFP_2OP macro), so we have a place to change the check for the
VMOV insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-4-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     int veclen = s->vec_len;
     TCGv_i32 f0, fd;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
-        return false;
-    }
+    /* Note that the caller must check the aa32_fpsp_v2 feature. */
 
     if (!dc_isar_feature(aa32_fpshvec, s) &&
         (veclen != 0 || s->vec_stride != 0)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      */
     TCGv_i32 f0;
 
+    /* Note that the caller must check the aa32_fp16_arith feature */
+
     if (!dc_isar_feature(aa32_fp16_arith, s)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     int veclen = s->vec_len;
     TCGv_i64 f0, fd;
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
-        return false;
-    }
+    /* Note that the caller must check the aa32_fpdp_v2 feature. */
 
     /* UNDEF accesses to D16-D31 if they don't exist */
     if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     return true;
 }
 
-#define DO_VFP_2OP(INSN, PREC, FN)                              \
+#define DO_VFP_2OP(INSN, PREC, FN, CHECK)                       \
     static bool trans_##INSN##_##PREC(DisasContext *s,          \
                                       arg_##INSN##_##PREC *a)   \
     {                                                           \
+        if (!dc_isar_feature(CHECK, s)) {                       \
+            return false;                                       \
+        }                                                       \
         return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
     }
 
-DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
-DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
+DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32, aa32_fpsp_v2)
+DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64, aa32_fpdp_v2)
 
-DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh)
-DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
-DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
+DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith)
+DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2)
+DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd, aa32_fpdp_v2)
 
-DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh)
-DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
-DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
+DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh, aa32_fp16_arith)
+DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs, aa32_fpsp_v2)
+DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd, aa32_fpdp_v2)
 
 static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
 {
@@ -XXX,XX +XXX,XX @@ static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
     gen_helper_vfp_sqrtd(vd, vm, cpu_env);
 }
 
-DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
-DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
-DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
+DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp, aa32_fp16_arith)
+DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp, aa32_fpsp_v2)
+DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp, aa32_fpdp_v2)
 
 static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
 {
-- 
2.20.1
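
The refactoring above moves the feature test from the shared body into each macro-generated wrapper, so one instruction can later use a different test. A reduced sketch of that shape is below. `Ctx`, `HAS_FEATURE`, `DO_2OP` and the `do_2op_*` functions are hypothetical simplifications, not the QEMU definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decode context holding precomputed feature bits. */
typedef struct Ctx {
    bool fpsp_v2;
    bool fpdp_v2;
} Ctx;

/* Stand-in for dc_isar_feature(): look the feature up by member name. */
#define HAS_FEATURE(name, s) ((s)->name)

/* Shared bodies: after the refactor they no longer test the feature. */
static bool do_2op_sp(const Ctx *s)
{
    assert(s != NULL);
    return true;
}

static bool do_2op_dp(const Ctx *s)
{
    assert(s != NULL);
    return true;
}

/* Each generated wrapper names the check it needs as a macro argument. */
#define DO_2OP(INSN, PREC, CHECK)                       \
    static bool trans_##INSN##_##PREC(const Ctx *s)     \
    {                                                   \
        if (!HAS_FEATURE(CHECK, s)) {                   \
            return false;                               \
        }                                               \
        return do_2op_##PREC(s);                        \
    }

DO_2OP(VABS, sp, fpsp_v2)
DO_2OP(VABS, dp, fpdp_v2)
```

With the check in the wrapper, an instruction that needs a looser condition can be written by hand instead of through the macro, without touching the shared body.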

Split out the handling of VMOV_reg_sp and VMOV_reg_dp so that we can
permit the insns if either FP or MVE are present.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-5-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

The fp_sysreg_checks() function is supposed to be returning an
FPSysRegCheckResult, which is an enum with three possible values.
However, three places in the function "return false" (a hangover from
a previous iteration of the design where the function just returned a
bool).  Make these return FPSysRegCheckFailed instead (for no
functional change, since both false and FPSysRegCheckFailed are
zero).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-6-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
         break;
     case ARM_VFP_FPSCR_NZCVQC:
         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-            return false;
+            return FPSysRegCheckFailed;
         }
         break;
     case ARM_VFP_FPCXT_S:
     case ARM_VFP_FPCXT_NS:
         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-            return false;
+            return FPSysRegCheckFailed;
         }
         if (!s->v8m_secure) {
-            return false;
+            return FPSysRegCheckFailed;
         }
         break;
     default:
-- 
2.20.1
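
The "no functional change" claim rests on the failure value being the first enumerator, which C defines as zero. The following sketch checks that with a hypothetical three-value enum. `CheckResult` and its enumerator names are illustrative and are not the QEMU type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-state result; the first enumerator has value 0. */
typedef enum CheckResult {
    CheckFailed,
    CheckDone,
    CheckContinue,
} CheckResult;

static_assert(CheckFailed == 0, "failure must be the zero enumerator");

/* Old style: leaks a bool into an enum-returning function. */
static CheckResult check_old(bool ok)
{
    if (!ok) {
        return false;
    }
    return CheckContinue;
}

/* New style: same value, but the intent is visible in the type. */
static CheckResult check_new(bool ok)
{
    if (!ok) {
        return CheckFailed;
    }
    return CheckContinue;
}
```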

If MVE is implemented for an M-profile CPU then it has a VPR
register, which tracks predication information.

Implement the read and write handling of this register, and
the migration of its state.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-7-peter.maydell@linaro.org
---
 target/arm/cpu.h           |  6 ++++++
 target/arm/machine.c       | 19 +++++++++++++++++++
 target/arm/translate-vfp.c | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
         uint32_t cpacr[M_REG_NUM_BANKS];
         uint32_t nsacr;
         int ltpsize;
+        uint32_t vpr;
     } v7m;
 
     /* Information associated with an exception about to be taken:
@@ -XXX,XX +XXX,XX @@ FIELD(V7M_FPCCR, ASPEN, 31, 1)
      R_V7M_FPCCR_UFRDY_MASK |                   \
      R_V7M_FPCCR_ASPEN_MASK)
 
+/* v7M VPR bits */
+FIELD(V7M_VPR, P0, 0, 16)
+FIELD(V7M_VPR, MASK01, 16, 4)
+FIELD(V7M_VPR, MASK23, 20, 4)
+
 /*
  * System register ID fields.
  */
diff --git a/target/arm/machine.c b/target/arm/machine.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_fp = {
     }
 };
 
+static bool mve_needed(void *opaque)
+{
+    ARMCPU *cpu = opaque;
+
+    return cpu_isar_feature(aa32_mve, cpu);
+}
+
+static const VMStateDescription vmstate_m_mve = {
+    .name = "cpu/m/mve",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = mve_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_m = {
     .name = "cpu/m",
     .version_id = 4,
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m = {
         &vmstate_m_other_sp,
         &vmstate_m_v8m,
         &vmstate_m_fp,
+        &vmstate_m_mve,
         NULL
     }
 };
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
             return FPSysRegCheckFailed;
         }
         break;
+    case ARM_VFP_VPR:
+    case ARM_VFP_P0:
+        if (!dc_isar_feature(aa32_mve, s)) {
+            return FPSysRegCheckFailed;
+        }
+        break;
     default:
         return FPSysRegCheckFailed;
     }
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
         tcg_temp_free_i32(sfpa);
         break;
     }
+    case ARM_VFP_VPR:
+        /* Behaves as NOP if not privileged */
+        if (IS_USER(s)) {
+            break;
+        }
+        tmp = loadfn(s, opaque);
+        store_cpu_field(tmp, v7m.vpr);
+        break;
+    case ARM_VFP_P0:
+    {
+        TCGv_i32 vpr;
+        tmp = loadfn(s, opaque);
+        vpr = load_cpu_field(v7m.vpr);
+        tcg_gen_deposit_i32(vpr, vpr, tmp,
+                            R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
+        store_cpu_field(vpr, v7m.vpr);
+        tcg_temp_free_i32(tmp);
+        break;
+    }
     default:
         g_assert_not_reached();
     }
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
         tcg_temp_free_i32(fpscr);
         break;
     }
+    case ARM_VFP_VPR:
+        /* Behaves as NOP if not privileged */
+        if (IS_USER(s)) {
+            break;
+        }
+        tmp = load_cpu_field(v7m.vpr);
+        storefn(s, opaque, tmp);
+        break;
+    case ARM_VFP_P0:
+        tmp = load_cpu_field(v7m.vpr);
+        tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
+        storefn(s, opaque, tmp);
+        break;
     default:
         g_assert_not_reached();
     }
-- 
2.20.1
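
The P0 accessors in this patch treat P0 as a view onto the low 16 bits of VPR: a write replaces only that field, and a read returns only that field. The plain-C sketch below shows the same deposit and extract arithmetic on ordinary integers. The helpers here are local reimplementations written for this example, and `vpr_write_p0()` and `vpr_read_p0()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout taken from the FIELD(V7M_VPR, P0, 0, 16) line in the patch. */
#define VPR_P0_SHIFT  0
#define VPR_P0_LENGTH 16

/* Return 'length' bits of 'value' starting at bit 'start'. */
static uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0u >> (32 - length));
}

/* Return 'value' with the given field replaced by 'fieldval'. */
static uint32_t deposit32(uint32_t value, int start, int length,
                          uint32_t fieldval)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (~0u >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Writing P0 leaves the MASK01 and MASK23 fields of VPR untouched. */
static uint32_t vpr_write_p0(uint32_t vpr, uint32_t newval)
{
    return deposit32(vpr, VPR_P0_SHIFT, VPR_P0_LENGTH, newval);
}

/* Reading P0 returns only the low 16 bits of VPR. */
static uint32_t vpr_read_p0(uint32_t vpr)
{
    return extract32(vpr, VPR_P0_SHIFT, VPR_P0_LENGTH);
}
```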

The M-profile FPSCR has an LTPSIZE field, but if MVE is not
implemented it is read-only and always reads as 4; this is how QEMU
currently handles it.

Make the field writable when MVE is implemented.

We can safely add the field to the MVE migration struct because
currently no CPUs enable MVE and so the migration struct is never
used.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-8-peter.maydell@linaro.org
---
 target/arm/cpu.h        | 3 ++-
 target/arm/machine.c    | 1 +
 target/arm/vfp_helper.c | 9 ++++++---
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
         uint32_t fpdscr[M_REG_NUM_BANKS];
         uint32_t cpacr[M_REG_NUM_BANKS];
         uint32_t nsacr;
-        int ltpsize;
+        uint32_t ltpsize;
         uint32_t vpr;
     } v7m;
 
@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
 
 #define FPCR_LTPSIZE_SHIFT 16   /* LTPSIZE, M-profile only */
 #define FPCR_LTPSIZE_MASK (7 << FPCR_LTPSIZE_SHIFT)
+#define FPCR_LTPSIZE_LENGTH 3
 
 #define FPCR_NZCV_MASK (FPCR_N | FPCR_Z | FPCR_C | FPCR_V)
 #define FPCR_NZCVQC_MASK (FPCR_NZCV_MASK | FPCR_QC)
diff --git a/target/arm/machine.c b/target/arm/machine.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_mve = {
     .needed = mve_needed,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
+        VMSTATE_UINT32(env.v7m.ltpsize, ARMCPU),
         VMSTATE_END_OF_LIST()
     },
 };
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpscr(CPUARMState *env)
 
 void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
 {
+    ARMCPU *cpu = env_archcpu(env);
+
     /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
-    if (!cpu_isar_feature(any_fp16, env_archcpu(env))) {
+    if (!cpu_isar_feature(any_fp16, cpu)) {
         val &= ~FPCR_FZ16;
     }
 
@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
          * because in v7A no-short-vector-support cores still had to
          * allow Stride/Len to be written with the only effect that
          * some insns are required to UNDEF if the guest sets them.
-         *
-         * TODO: if M-profile MVE implemented, set LTPSIZE.
          */
         env->vfp.vec_len = extract32(val, 16, 3);
         env->vfp.vec_stride = extract32(val, 20, 2);
+    } else if (cpu_isar_feature(aa32_mve, cpu)) {
+        env->v7m.ltpsize = extract32(val, FPCR_LTPSIZE_SHIFT,
+                                     FPCR_LTPSIZE_LENGTH);
     }
 
     if (arm_feature(env, ARM_FEATURE_NEON)) {
-- 
2.20.1
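
For an M-profile CPU, the behaviour described in this commit message can be summarised as a small pure function: without MVE a write to FPSCR leaves LTPSIZE alone, and with MVE the new value comes from bits [18:16]. The sketch below uses the shift and length from the patch. `ltpsize_after_write()` is a hypothetical helper, not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch. */
#define FPCR_LTPSIZE_SHIFT  16
#define FPCR_LTPSIZE_LENGTH 3
#define FPCR_LTPSIZE_MASK   (7u << FPCR_LTPSIZE_SHIFT)

static_assert(FPCR_LTPSIZE_MASK == 0x70000u, "3-bit field at bit 16");

/*
 * Hypothetical helper: the LTPSIZE value an M-profile CPU holds after
 * the guest writes 'fpscr_val' to FPSCR.
 */
static uint32_t ltpsize_after_write(bool has_mve, uint32_t old_ltpsize,
                                    uint32_t fpscr_val)
{
    assert(old_ltpsize <= 7);
    if (!has_mve) {
        /* Field is read-only without MVE, so the write is ignored. */
        return old_ltpsize;
    }
    return (fpscr_val & FPCR_LTPSIZE_MASK) >> FPCR_LTPSIZE_SHIFT;
}
```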

Currently we allow board models to specify the initial value of the
Secure VTOR register, using an init-svtor property on the TYPE_ARMV7M
object which is plumbed through to the CPU.  Allow board models to
also specify the initial value of the Non-secure VTOR via a similar
init-nsvtor property.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-10-peter.maydell@linaro.org
---
 include/hw/arm/armv7m.h |  2 ++
 target/arm/cpu.h        |  2 ++
 hw/arm/armv7m.c         |  7 +++++++
 target/arm/cpu.c        | 10 ++++++++++
 4 files changed, 21 insertions(+)

diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/armv7m.h
+++ b/include/hw/arm/armv7m.h
@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(ARMv7MState, ARMV7M)
  *   devices will be automatically layered on top of this view.)
  * + Property "idau": IDAU interface (forwarded to CPU object)
  * + Property "init-svtor": secure VTOR reset value (forwarded to CPU object)
+ * + Property "init-nsvtor": non-secure VTOR reset value (forwarded to CPU object)
  * + Property "vfp": enable VFP (forwarded to CPU object)
  * + Property "dsp": enable DSP (forwarded to CPU object)
  * + Property "enable-bitband": expose bitbanded IO
@@ -XXX,XX +XXX,XX @@ struct ARMv7MState {
     MemoryRegion *board_memory;
     Object *idau;
     uint32_t init_svtor;
+    uint32_t init_nsvtor;
     bool enable_bitband;
     bool start_powered_off;
     bool vfp;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
 
     /* For v8M, initial value of the Secure VTOR */
     uint32_t init_svtor;
+    /* For v8M, initial value of the Non-secure VTOR */
+    uint32_t init_nsvtor;
 
     /* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or
      * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type.
diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp)
             return;
         }
     }
+    if (object_property_find(OBJECT(s->cpu), "init-nsvtor")) {
+        if (!object_property_set_uint(OBJECT(s->cpu), "init-nsvtor",
+                                      s->init_nsvtor, errp)) {
+            return;
+        }
+    }
     if (object_property_find(OBJECT(s->cpu), "start-powered-off")) {
         if (!object_property_set_bool(OBJECT(s->cpu), "start-powered-off",
                                       s->start_powered_off, errp)) {
@@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = {
                      MemoryRegion *),
     DEFINE_PROP_LINK("idau", ARMv7MState, idau, TYPE_IDAU_INTERFACE, Object *),
     DEFINE_PROP_UINT32("init-svtor", ARMv7MState, init_svtor, 0),
+    DEFINE_PROP_UINT32("init-nsvtor", ARMv7MState, init_nsvtor, 0),
     DEFINE_PROP_BOOL("enable-bitband", ARMv7MState, enable_bitband, false),
     DEFINE_PROP_BOOL("start-powered-off", ARMv7MState, start_powered_off,
                      false),
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
         env->regs[14] = 0xffffffff;
 
         env->v7m.vecbase[M_REG_S] = cpu->init_svtor & 0xffffff80;
+        env->v7m.vecbase[M_REG_NS] = cpu->init_nsvtor & 0xffffff80;
 
         /* Load the initial SP and PC from offset 0 and 4 in the vector table */
         vecbase = env->v7m.vecbase[env->v7m.secure];
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
                                        &cpu->init_svtor,
                                        OBJ_PROP_FLAG_READWRITE);
     }
+    if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
+        /*
+         * Initial value of the NS VTOR (for cores without the Security
+         * extension, this is the only VTOR)
+         */
+        object_property_add_uint32_ptr(obj, "init-nsvtor",
+                                       &cpu->init_nsvtor,
+                                       OBJ_PROP_FLAG_READWRITE);
+    }
 
     qdev_property_add_static(DEVICE(obj), &arm_cpu_cfgend_property);
 
-- 
2.20.1
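
At reset, both property values go through the same mask before they become the banked vector table bases. A minimal sketch of that step is below. The `0xffffff80` mask comes from the patch, while `MState`, the bank index names and `reset_vtor()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask from the patch: the low seven bits of a VTOR value are dropped. */
#define VTOR_RESET_MASK 0xffffff80u

static_assert((VTOR_RESET_MASK & 0x7fu) == 0, "low seven bits are clear");

/* Hypothetical bank indices, standing in for M_REG_NS and M_REG_S. */
enum { BANK_NS, BANK_S, NUM_BANKS };

typedef struct MState {
    uint32_t vecbase[NUM_BANKS];
} MState;

/* Compute the reset-time vector bases from the two board properties. */
static MState reset_vtor(uint32_t init_svtor, uint32_t init_nsvtor)
{
    MState m;

    m.vecbase[BANK_S] = init_svtor & VTOR_RESET_MASK;
    m.vecbase[BANK_NS] = init_nsvtor & VTOR_RESET_MASK;
    return m;
}
```

A board that leaves `init-nsvtor` at its default of 0 gets the same Non-secure vector base as before this patch.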

The official punctuation for Arm CPU names uses a hyphen, like
"Cortex-A9". We mostly follow this, but in a few places usage
without the hyphen has crept in. Fix those so we consistently
use the same way of writing the CPU name.

This commit was created with:
  git grep -z -l 'Cortex ' | xargs -0 sed -i 's/Cortex /Cortex-/'

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20210527095152.10968-1-peter.maydell@linaro.org
---
 docs/system/arm/aspeed.rst    | 4 ++--
 docs/system/arm/nuvoton.rst   | 6 +++---
 docs/system/arm/sabrelite.rst | 2 +-
 include/hw/arm/allwinner-h3.h | 2 +-
 hw/arm/aspeed.c               | 6 +++---
 hw/arm/mcimx6ul-evk.c         | 2 +-
 hw/arm/mcimx7d-sabre.c        | 2 +-
 hw/arm/npcm7xx_boards.c       | 4 ++--
 hw/arm/sabrelite.c            | 2 +-
 hw/misc/npcm7xx_clk.c         | 2 +-
 10 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/aspeed.rst
+++ b/docs/system/arm/aspeed.rst
@@ -XXX,XX +XXX,XX @@ The QEMU Aspeed machines model BMCs of various OpenPOWER systems and
 Aspeed evaluation boards. They are based on different releases of the
 Aspeed SoC : the AST2400 integrating an ARM926EJ-S CPU (400MHz), the
 AST2500 with an ARM1176JZS CPU (800MHz) and more recently the AST2600
-with dual cores ARM Cortex A7 CPUs (1.2GHz).
+with dual cores ARM Cortex-A7 CPUs (1.2GHz).
 
 The SoC comes with RAM, Gigabit ethernet, USB, SD/MMC, USB, SPI, I2C,
 etc.
@@ -XXX,XX +XXX,XX @@ AST2500 SoC based machines :
 
 AST2600 SoC based machines :
 
-- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex A7)
+- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex-A7)
 - ``tacoma-bmc``           OpenPOWER Witherspoon POWER9 AST2600 BMC
 
 Supported devices
diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/nuvoton.rst
+++ b/docs/system/arm/nuvoton.rst
@@ -XXX,XX +XXX,XX @@ Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
 
 The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
 designed to be used as Baseboard Management Controllers (BMCs) in various
-servers. They all feature one or two ARM Cortex A9 CPU cores, as well as an
+servers. They all feature one or two ARM Cortex-A9 CPU cores, as well as an
 assortment of peripherals targeted for either Enterprise or Data Center /
 Hyperscale applications. The former is a superset of the latter, so NPCM750 has
 all the peripherals of NPCM730 and more.
 
 .. _Nuvoton iBMC: https://www.nuvoton.com/products/cloud-computing/ibmc/
 
-The NPCM750 SoC has two Cortex A9 cores and is targeted for the Enterprise
+The NPCM750 SoC has two Cortex-A9 cores and is targeted for the Enterprise
 segment. The following machines are based on this chip :
 
 - ``npcm750-evb``       Nuvoton NPCM750 Evaluation board
 
-The NPCM730 SoC has two Cortex A9 cores and is targeted for Data Center and
+The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
 Hyperscale applications. The following machines are based on this chip :
 
 - ``quanta-gsj``        Quanta GSJ server BMC
diff --git a/docs/system/arm/sabrelite.rst b/docs/system/arm/sabrelite.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/sabrelite.rst
+++ b/docs/system/arm/sabrelite.rst
@@ -XXX,XX +XXX,XX @@ Supported devices
 
 The SABRE Lite machine supports the following devices:
 
- * Up to 4 Cortex A9 cores
+ * Up to 4 Cortex-A9 cores
  * Generic Interrupt Controller
  * 1 Clock Controller Module
  * 1 System Reset Controller
diff --git a/include/hw/arm/allwinner-h3.h b/include/hw/arm/allwinner-h3.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/allwinner-h3.h
+++ b/include/hw/arm/allwinner-h3.h
@@ -XXX,XX +XXX,XX @@
  */
 
 /*
- * The Allwinner H3 is a System on Chip containing four ARM Cortex A7
+ * The Allwinner H3 is a System on Chip containing four ARM Cortex-A7
  * processor cores. Features and specifications include DDR2/DDR3 memory,
  * SD/MMC storage cards, 10/100/1000Mbit Ethernet, USB 2.0, HDMI and
  * various I/O modules.
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_ast2600_evb_class_init(ObjectClass *oc, void *data)
     MachineClass *mc = MACHINE_CLASS(oc);
     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 
-    mc->desc       = "Aspeed AST2600 EVB (Cortex A7)";
+    mc->desc       = "Aspeed AST2600 EVB (Cortex-A7)";
     amc->soc_name  = "ast2600-a1";
     amc->hw_strap1 = AST2600_EVB_HW_STRAP1;
     amc->hw_strap2 = AST2600_EVB_HW_STRAP2;
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_tacoma_class_init(ObjectClass *oc, void *data)
     MachineClass *mc = MACHINE_CLASS(oc);
     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 
-    mc->desc       = "OpenPOWER Tacoma BMC (Cortex A7)";
+    mc->desc       = "OpenPOWER Tacoma BMC (Cortex-A7)";
     amc->soc_name  = "ast2600-a1";
     amc->hw_strap1 = TACOMA_BMC_HW_STRAP1;
     amc->hw_strap2 = TACOMA_BMC_HW_STRAP2;
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_rainier_class_init(ObjectClass *oc, void *data)
     MachineClass *mc = MACHINE_CLASS(oc);
     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 
-    mc->desc       = "IBM Rainier BMC (Cortex A7)";
+    mc->desc       = "IBM Rainier BMC (Cortex-A7)";
     amc->soc_name  = "ast2600-a1";
     amc->hw_strap1 = RAINIER_BMC_HW_STRAP1;
     amc->hw_strap2 = RAINIER_BMC_HW_STRAP2;
diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mcimx6ul-evk.c
+++ b/hw/arm/mcimx6ul-evk.c
@@ -XXX,XX +XXX,XX @@ static void mcimx6ul_evk_init(MachineState *machine)
 
 static void mcimx6ul_evk_machine_init(MachineClass *mc)
 {
-    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex A7)";
+    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex-A7)";
     mc->init = mcimx6ul_evk_init;
     mc->max_cpus = FSL_IMX6UL_NUM_CPUS;
     mc->default_ram_id = "mcimx6ul-evk.ram";
diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mcimx7d-sabre.c
+++ b/hw/arm/mcimx7d-sabre.c
@@ -XXX,XX +XXX,XX @@ static void mcimx7d_sabre_init(MachineState *machine)
 
 static void mcimx7d_sabre_machine_init(MachineClass *mc)
 {
-    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex A7)";
+    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex-A7)";
     mc->init = mcimx7d_sabre_init;
     mc->max_cpus = FSL_IMX7_NUM_CPUS;
     mc->default_ram_id = "mcimx7d-sabre.ram";
diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/npcm7xx_boards.c
+++ b/hw/arm/npcm7xx_boards.c
@@ -XXX,XX +XXX,XX @@ static void npcm750_evb_machine_class_init(ObjectClass *oc, void *data)
 
     npcm7xx_set_soc_type(nmc, TYPE_NPCM750);
 
-    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex A9)";
+    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex-A9)";
     mc->init = npcm750_evb_init;
     mc->default_ram_size = 512 * MiB;
 };
@@ -XXX,XX +XXX,XX @@ static void gsj_machine_class_init(ObjectClass *oc, void *data)
 
     npcm7xx_set_soc_type(nmc, TYPE_NPCM730);
 
-    mc->desc = "Quanta GSJ (Cortex A9)";
+    mc->desc = "Quanta GSJ (Cortex-A9)";
     mc->init = quanta_gsj_init;
     mc->default_ram_size = 512 * MiB;
 };
diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/sabrelite.c
+++ b/hw/arm/sabrelite.c
@@ -XXX,XX +XXX,XX @@ static void sabrelite_init(MachineState *machine)
 
 static void sabrelite_machine_init(MachineClass *mc)
 {
-    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex A9)";
+    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex-A9)";
     mc->init = sabrelite_init;
     mc->max_cpus = FSL_IMX6_NUM_CPUS;
     mc->ignore_memory_transaction_failures = true;
diff --git a/hw/misc/npcm7xx_clk.c b/hw/misc/npcm7xx_clk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/npcm7xx_clk.c
+++ b/hw/misc/npcm7xx_clk.c
@@ -XXX,XX +XXX,XX @@
 #define NPCM7XX_CLOCK_REF_HZ            (25000000)
 
 /* Register Field Definitions */
-#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex A9 Cores */
+#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex-A9 Cores */
 
 #define PLLCON_LOKI     BIT(31)
 #define PLLCON_LOKS     BIT(30)
-- 
2.20.1

From: Damien Goutte-Gattat <dgouttegattat@incenp.org>

The 4.x branch of Sphinx introduces a breaking change, as generated man
pages are now written to subdirectories corresponding to the manual
section they belong to. This results in `make install` erroring out when
attempting to install the man pages, because they are not where it
expects to find them.

This patch restores the behavior of Sphinx 3.x regarding man pages.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/256
Signed-off-by: Damien Goutte-Gattat <dgouttegattat@incenp.org>
Message-id: 20210503161422.15028-1-dgouttegattat@incenp.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/conf.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/conf.py b/docs/conf.py
index XXXXXXX..XXXXXXX 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -XXX,XX +XXX,XX @@
      ['Stefan Hajnoczi <stefanha@redhat.com>',
       'Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>'], 1),
 ]
+man_make_section_directory = False
 
 # -- Options for Texinfo output -------------------------------------------
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The operands to tcg_gen_atomic_fetch_s{min,max}_i64 must
be signed, so that the inputs are properly extended.
Zero extend the result afterward, as needed.
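
As a rough illustration of why the extension matters, here is a
standalone toy model of a 32-bit fetch-and-signed-min working on 64-bit
temporaries. It is not the TCG implementation; fetch_smin32, ldsmin_w
and the is_signed flag (standing in for MO_SIGN) are hypothetical names
used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Widen a 32-bit value to 64 bits, either sign- or zero-extending it. */
static int64_t extend32(uint64_t v, bool is_signed)
{
    return is_signed ? (int64_t)(int32_t)(uint32_t)v
                     : (int64_t)(uint32_t)v;
}

/*
 * Toy fetch-and-signed-min on a 32-bit memory cell, computed with a
 * 64-bit signed comparison.  Returns the old memory value as it would
 * sit in a 64-bit temporary; *mem receives the minimum.
 */
static uint64_t fetch_smin32(uint32_t *mem, uint64_t operand, bool is_signed)
{
    assert(mem != NULL);
    int64_t old = extend32(*mem, is_signed);
    int64_t val = extend32(operand, is_signed);

    *mem = (uint32_t)(old < val ? old : val);
    return (uint64_t)old;
}

/* 32-bit LDSMIN model: sign-extend the inputs, zero-extend the result. */
static uint64_t ldsmin_w(uint32_t *mem, uint64_t operand)
{
    uint64_t old = fetch_smin32(mem, operand, true);

    return old & UINT64_C(0xffffffff);
}
```

With zero extension, a memory value of 0xffffffff (-1) compares as a
large positive number, so the toy stores 1 instead of keeping -1.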

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/364
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20210602020720.47679-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     int o3_opc = extract32(insn, 12, 4);
     bool r = extract32(insn, 22, 1);
     bool a = extract32(insn, 23, 1);
-    TCGv_i64 tcg_rs, clean_addr;
+    TCGv_i64 tcg_rs, tcg_rt, clean_addr;
     AtomicThreeOpFn *fn = NULL;
+    MemOp mop = s->be_data | size | MO_ALIGN;
 
     if (is_vector || !dc_isar_feature(aa64_atomics, s)) {
         unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
         break;
     case 004: /* LDSMAX */
         fn = tcg_gen_atomic_fetch_smax_i64;
+        mop |= MO_SIGN;
         break;
     case 005: /* LDSMIN */
         fn = tcg_gen_atomic_fetch_smin_i64;
+        mop |= MO_SIGN;
         break;
     case 006: /* LDUMAX */
         fn = tcg_gen_atomic_fetch_umax_i64;
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     }
 
     tcg_rs = read_cpu_reg(s, rs, true);
+    tcg_rt = cpu_reg(s, rt);
 
     if (o3_opc == 1) { /* LDCLR */
         tcg_gen_not_i64(tcg_rs, tcg_rs);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     /* The tcg atomic primitives are all full barriers.  Therefore we
      * can ignore the Acquire and Release bits of this instruction.
      */
-    fn(cpu_reg(s, rt), clean_addr, tcg_rs, get_mem_index(s),
-       s->be_data | size | MO_ALIGN);
+    fn(tcg_rt, clean_addr, tcg_rs, get_mem_index(s), mop);
+
+    if ((mop & MO_SIGN) && size != MO_64) {
+        tcg_gen_ext32u_i64(tcg_rt, tcg_rt);
+    }
 }
 
 /*
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

The DAIF and PAC checks used raise_exception_ra to raise an exception
and unwind CPU state.  However, raise_exception_ra is currently designed
for handling data aborts: the syndrome is partially precomputed and
encoded in the TB, then merged in merge_syn_data_abort when the data
abort is handled.  Using raise_exception_ra for DAIF and PAC checks
results in an empty syndrome being retrieved from data[2] in
restore_state_to_opc, which sets ESR to 0.  This manifested as:

kvm [571]: Unknown exception class: esr: 0x000000 –
  Unknown/Uncategorized

when launching a KVM guest with pointer authentication enabled, where
the host was running under QEMU with a CPU supporting EL2 and pointer
authentication.

Rework raise_exception_ra to restore the state before raising the
exception, so that the syndrome passed by the caller is not clobbered
by restore_state_to_opc.
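
The ordering problem can be sketched with a standalone toy model. This
is not QEMU code: ToyCPU and the toy_* functions are hypothetical, and
the "recorded" argument stands in for the per-instruction syndrome data
that the unwind step writes back (zero for a non-load/store insn).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the pending-exception state of a CPU. */
typedef struct {
    uint32_t syndrome;
    int exception_index;
} ToyCPU;

/* Models the unwind step, which overwrites the syndrome field. */
static void toy_restore_state(ToyCPU *cpu, uint32_t recorded)
{
    assert(cpu != NULL);
    cpu->syndrome = recorded;
}

/* Models recording the exception number and the caller's syndrome. */
static void toy_set_exception(ToyCPU *cpu, int excp, uint32_t syndrome)
{
    assert(cpu != NULL);
    cpu->exception_index = excp;
    cpu->syndrome = syndrome;
}

/* Old order: set the exception first; the unwind then clobbers it. */
static uint32_t toy_raise_then_restore(ToyCPU *cpu, int excp,
                                       uint32_t syndrome, uint32_t recorded)
{
    toy_set_exception(cpu, excp, syndrome);
    toy_restore_state(cpu, recorded);
    return cpu->syndrome;
}

/* New order: unwind first, then set the syndrome the caller passed. */
static uint32_t toy_restore_then_raise(ToyCPU *cpu, int excp,
                                       uint32_t syndrome, uint32_t recorded)
{
    toy_restore_state(cpu, recorded);
    toy_set_exception(cpu, excp, syndrome);
    return cpu->syndrome;
}
```

In the toy, the first ordering reports whatever the unwind step
recorded, while the second reports the caller's syndrome.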

Fixes: 0d43e1a2d29a ("target/arm: Add PAuth helpers")
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[PMM: added comment]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/op_helper.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ void raise_exception(CPUARMState *env, uint32_t excp,
 void raise_exception_ra(CPUARMState *env, uint32_t excp, uint32_t syndrome,
                         uint32_t target_el, uintptr_t ra)
 {
-    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
-    cpu_loop_exit_restore(cs, ra);
+    CPUState *cs = env_cpu(env);
+
+    /*
+     * restore_state_to_opc() will set env->exception.syndrome, so
+     * we must restore CPU state here before setting the syndrome
+     * the caller passed us, and cannot use cpu_loop_exit_restore().
+     */
+    cpu_restore_state(cs, ra, true);
+    raise_exception(env, excp, syndrome, target_el);
 }
 
 uint64_t HELPER(neon_tbl)(CPUARMState *env, uint32_t desc,
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

Now that there are no other users of do_raise_exception, fold it into
raise_exception.

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/op_helper.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@
 #define SIGNBIT (uint32_t)0x80000000
 #define SIGNBIT64 ((uint64_t)1 << 63)
 
-static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
-                                    uint32_t syndrome, uint32_t target_el)
+void raise_exception(CPUARMState *env, uint32_t excp,
+                     uint32_t syndrome, uint32_t target_el)
 {
     CPUState *cs = env_cpu(env);
 
@@ -XXX,XX +XXX,XX @@ static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
     cs->exception_index = excp;
     env->exception.syndrome = syndrome;
     env->exception.target_el = target_el;
-
-    return cs;
-}
-
-void raise_exception(CPUARMState *env, uint32_t excp,
-                     uint32_t syndrome, uint32_t target_el)
-{
-    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
     cpu_loop_exit(cs);
 }
 
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

Now that raise_exception_ra restores the state before raising the
exception, we can use raise_exception_ra to perform the state restore
and exception raising without clobbering the syndrome.

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[PMM: Keep the one line of the comment that is still relevant]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/mte_helper.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
 
     switch (tcf) {
     case 1:
-        /*
-         * Tag check fail causes a synchronous exception.
-         *
-         * In restore_state_to_opc, we set the exception syndrome
-         * for the load or store operation.  Unwind first so we
-         * may overwrite that with the syndrome for the tag check.
-         */
-        cpu_restore_state(env_cpu(env), ra, true);
+        /* Tag check fail causes a synchronous exception. */
         env->exception.vaddress = dirty_ptr;
 
         is_write = FIELD_EX32(desc, MTEDESC, WRITE);
         syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
                                     is_write, 0x11);
-        raise_exception(env, EXCP_DATA_ABORT, syn, exception_target_el(env));
+        raise_exception_ra(env, EXCP_DATA_ABORT, syn,
+                           exception_target_el(env), ra);
         /* noreturn, but fall through to the assert anyway */
 
     case 0:
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

The sequence cpu_restore_state() + raise_exception() is equivalent to
raise_exception_ra(), so use that instead.  (In this case we never
cared about the syndrome value, because M-profile doesn't use the
syndrome; the old code was just written unnecessarily awkwardly.)

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[PMM: Retain edited version of comment; rewrite commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/m_helper.c  | 5 +----
 target/arm/op_helper.c | 9 +++------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
             limit = is_psp ? env->v7m.psplim[false] : env->v7m.msplim[false];
 
             if (val < limit) {
-                CPUState *cs = env_cpu(env);
-
-                cpu_restore_state(cs, GETPC(), true);
-                raise_exception(env, EXCP_STKOF, 0, 1);
+                raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
             }
 
             if (is_psp) {
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v8m_stackcheck)(CPUARMState *env, uint32_t newvalue)
      * raising an exception if the limit is breached.
      */
     if (newvalue < v7m_sp_limit(env)) {
-        CPUState *cs = env_cpu(env);
-
         /*
          * Stack limit exceptions are a rare case, so rather than syncing
-         * PC/condbits before the call, we use cpu_restore_state() to
-         * get them right before raising the exception.
+         * PC/condbits before the call, we use raise_exception_ra() so
+         * that cpu_restore_state() will sort them out.
          */
-        cpu_restore_state(cs, GETPC(), true);
-        raise_exception(env, EXCP_STKOF, 0, 1);
+        raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
     }
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Note that the SVE BFLOAT16 support does not require SVE2;
it is an independent extension.
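
A standalone sketch of what "independent" means for the ID register
checks, outside of QEMU's FIELD_EX64 machinery. The bit positions are
assumptions to be checked against the Arm ARM (BF16 at [47:44] of
ID_AA64ISAR1_EL1 and at [23:20] of ID_AA64ZFR0_EL1, SVEver at [3:0] of
ID_AA64ZFR0_EL1), and field64 and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract 'length' bits starting at bit 'start' of a 64-bit register. */
static uint64_t field64(uint64_t reg, unsigned start, unsigned length)
{
    assert(length > 0 && length < 64 && start + length <= 64);
    return (reg >> start) & ((UINT64_C(1) << length) - 1);
}

/* Assumed field positions; verify against the architecture manual. */
#define TOY_ISAR1_BF16_SHIFT   44
#define TOY_ZFR0_BF16_SHIFT    20
#define TOY_ZFR0_SVEVER_SHIFT  0

/* AdvSIMD/FP BFloat16: nonzero BF16 field in ID_AA64ISAR1_EL1. */
static bool toy_has_aa64_bf16(uint64_t id_aa64isar1)
{
    return field64(id_aa64isar1, TOY_ISAR1_BF16_SHIFT, 4) != 0;
}

/* SVE BFloat16: nonzero BF16 field in ID_AA64ZFR0_EL1. */
static bool toy_has_sve_bf16(uint64_t id_aa64zfr0)
{
    return field64(id_aa64zfr0, TOY_ZFR0_BF16_SHIFT, 4) != 0;
}

/* SVE2: nonzero SVEver field in the same register, tested separately. */
static bool toy_has_sve2(uint64_t id_aa64zfr0)
{
    return field64(id_aa64zfr0, TOY_ZFR0_SVEVER_SHIFT, 4) != 0;
}
```

Because the two ID_AA64ZFR0_EL1 fields are tested separately, a
register value can report SVE BFloat16 while reporting no SVE2.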

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_predinv(const ARMISARegisters *id)
     return FIELD_EX32(id->id_isar6, ID_ISAR6, SPECRES) != 0;
 }
 
+static inline bool isar_feature_aa32_bf16(const ARMISARegisters *id)
+{
+    return FIELD_EX32(id->id_isar6, ID_ISAR6, BF16) != 0;
+}
+
 static inline bool isar_feature_aa32_i8mm(const ARMISARegisters *id)
 {
     return FIELD_EX32(id->id_isar6, ID_ISAR6, I8MM) != 0;
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >= 2;
 }
 
+static inline bool isar_feature_aa64_bf16(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, BF16) != 0;
+}
+
 static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
 {
     /* We always set the AdvSIMD and FP fields identically.  */
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve2_bitperm(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BITPERM) != 0;
 }
 
+static inline bool isar_feature_aa64_sve_bf16(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BFLOAT16) != 0;
+}
+
 static inline bool isar_feature_aa64_sve2_sha3(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, SHA3) != 0;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
     int rd = extract32(insn, 0, 5);
 
     if (mos) {
-        unallocated_encoding(s);
-        return;
+        goto do_unallocated;
     }
 
     switch (opcode) {
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
         /* FCVT between half, single and double precision */
         int dtype = extract32(opcode, 0, 2);
         if (type == 2 || dtype == type) {
-            unallocated_encoding(s);
-            return;
+            goto do_unallocated;
         }
         if (!fp_access_check(s)) {
             return;
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
 
     case 0x10 ... 0x13: /* FRINT{32,64}{X,Z} */
         if (type > 1 || !dc_isar_feature(aa64_frint, s)) {
-            unallocated_encoding(s);
-            return;
+            goto do_unallocated;
         }
         /* fall through */
     case 0x0 ... 0x3:
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
             break;
         case 3:
             if (!dc_isar_feature(aa64_fp16, s)) {
-                unallocated_encoding(s);
-                return;
+                goto do_unallocated;
             }
 
             if (!fp_access_check(s)) {
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
             handle_fp_1src_half(s, opcode, rd, rn);
             break;
         default:
-            unallocated_encoding(s);
+            goto do_unallocated;
         }
         break;
 
     default:
+    do_unallocated:
         unallocated_encoding(s);
         break;
     }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is the AArch64 BFCVT and the AArch32 VCVT{B,T}.BF16.F32.
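
For readers unfamiliar with the format, the narrowing can be sketched
as keeping the top 16 bits of the float32 encoding after rounding. The
toy below is not the softfloat float32_to_bfloat16 used by the patch:
it assumes round-to-nearest-even only, quiets NaNs, and ignores
exception flags and flush-to-zero. toy_f32_bits_to_bf16 and
toy_insert_half are hypothetical names; the latter models a B/T form
writing one half of a 32-bit register and leaving the other untouched.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a float32 bit pattern to a bfloat16 bit pattern using
 * round-to-nearest-even.  Simplified: no status flags, no FPCR modes.
 */
static uint16_t toy_f32_bits_to_bf16(uint32_t f32_bits)
{
    if ((f32_bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: keep sign and upper payload, force the quiet bit. */
        return (uint16_t)((f32_bits >> 16) | 0x0040u);
    }

    /* Add 0x7fff plus the result's LSB so that ties round to even. */
    uint32_t lsb = (f32_bits >> 16) & 1u;
    uint32_t rounded = f32_bits + 0x7fffu + lsb;

    return (uint16_t)(rounded >> 16);
}

/* Write a 16-bit result into the bottom (0) or top (1) half of sreg. */
static uint32_t toy_insert_half(uint32_t sreg, uint16_t value, unsigned half)
{
    assert(half < 2);
    unsigned shift = half * 16;

    return (sreg & ~(UINT32_C(0xffff) << shift)) | ((uint32_t)value << shift);
}
```

Exact ties go to the even neighbour, so in this toy 0x3f808000 narrows
to 0x3f80 and 0x3f818000 narrows to 0x3f82.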

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  1 +
 target/arm/vfp.decode      |  2 ++
 target/arm/translate-a64.c | 19 +++++++++++++++++++
 target/arm/translate-vfp.c | 24 ++++++++++++++++++++++++
 target/arm/vfp_helper.c    |  5 +++++
 5 files changed, 51 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
 
 DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
 DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
+DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
 
 DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
 DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
 
 # VCVTB and VCVTT to f16: Vd format is always vd_sp;
 # Vm format depends on size bit
+VCVT_b16_f32 ---- 1110 1.11 0011 .... 1001 t:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
 VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn)
     case 0x3: /* FSQRT */
         gen_helper_vfp_sqrts(tcg_res, tcg_op, cpu_env);
         goto done;
+    case 0x6: /* BFCVT */
+        gen_fpst = gen_helper_bfcvt;
+        break;
     case 0x8: /* FRINTN */
     case 0x9: /* FRINTP */
     case 0xa: /* FRINTM */
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
         }
         break;
 
+    case 0x6:
+        switch (type) {
+        case 1: /* BFCVT */
+            if (!dc_isar_feature(aa64_bf16, s)) {
+                goto do_unallocated;
+            }
+            if (!fp_access_check(s)) {
+                return;
+            }
+            handle_fp_1src_single(s, opcode, rd, rn);
+            break;
+        default:
+            goto do_unallocated;
+        }
+        break;
+
     default:
     do_unallocated:
         unallocated_encoding(s);
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     return true;
 }
 
+static bool trans_VCVT_b16_f32(DisasContext *s, arg_VCVT_b16_f32 *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = fpstatus_ptr(FPST_FPCR);
+    tmp = tcg_temp_new_i32();
+
+    vfp_load_reg32(tmp, a->vm);
+    gen_helper_bfcvt(tmp, tmp, fpst);
+    tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
 static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
 {
     TCGv_ptr fpst;
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
     return float64_to_float32(x, &env->vfp.fp_status);
 }
 
+uint32_t HELPER(bfcvt)(float32 x, void *status)
+{
+    return float32_to_bfloat16(x, status);
+}
+
 /*
  * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
  * must always round-to-nearest; the AArch64 ones honour the FPSCR
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFCVT{N,T} for both AArch64 AdvSIMD and SVE,
and VCVT.BF16.F32 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h     |  4 ++++
 target/arm/helper.h         |  1 +
 target/arm/neon-dp.decode   |  1 +
 target/arm/sve.decode       |  2 ++
 target/arm/sve_helper.c     |  2 ++
 target/arm/translate-a64.c  | 17 ++++++++++++++
 target/arm/translate-neon.c | 45 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c  | 16 +++++++++++++
 target/arm/vfp_helper.c     |  7 ++++++
 9 files changed, 95 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve_fcvt_hd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_fcvt_sd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bfcvt, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(sve_fcvtzs_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_fcvtnt_sh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve2_fcvtnt_ds, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bfcvtnt, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(sve2_fcvtlt_hs, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
 DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
 DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
 DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
+DEF_HELPER_FLAGS_2(bfcvt_pair, TCG_CALL_NO_RWG, i32, i64, ptr)
 
 DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
 DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     VRINTZ       1111 001 11 . 11 .. 10 .... 0 1011 . . 0 .... @2misc
 
     VCVT_F16_F32 1111 001 11 . 11 .. 10 .... 0 1100 0 . 0 .... @2misc_q0
+    VCVT_B16_F32 1111 001 11 . 11 .. 10 .... 0 1100 1 . 0 .... @2misc_q0
 
     VRINTM       1111 001 11 . 11 .. 10 .... 0 1101 . . 0 .... @2misc
 
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FNMLS_zpzzz     01100101 .. 1 ..... 111 ... ..... .....         @rdn_pg_rm_ra
 # SVE floating-point convert precision
 FCVT_sh         01100101 10 0010 00 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_hs         01100101 10 0010 01 101 ... ..... .....         @rd_pg_rn_e0
+BFCVT           01100101 10 0010 10 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_dh         01100101 11 0010 00 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_hd         01100101 11 0010 01 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_ds         01100101 11 0010 10 101 ... ..... .....         @rd_pg_rn_e0
@@ -XXX,XX +XXX,XX @@ RAX1            01000101 00 1 ..... 11110 1 ..... .....  @rd_rn_rm_e0
 FCVTXNT_ds      01100100 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTX_ds        01100101 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTNT_sh       01100100 10 0010 00 101 ... ..... .....  @rd_pg_rn_e0
+BFCVTNT         01100100 10 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTLT_hs       01100100 10 0010 01 101 ... ..... .....  @rd_pg_rn_e0
 FCVTNT_ds       01100100 11 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTLT_sd       01100100 11 0010 11 101 ... ..... .....  @rd_pg_rn_e0
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint64_t vfp_float64_to_uint64_rtz(float64 f, float_status *s)
 
 DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, sve_f32_to_f16)
 DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, sve_f16_to_f32)
+DO_ZPZ_FP(sve_bfcvt,   uint32_t, H1_4, float32_to_bfloat16)
 DO_ZPZ_FP(sve_fcvt_dh, uint64_t,     , sve_f64_to_f16)
 DO_ZPZ_FP(sve_fcvt_hd, uint64_t,     , sve_f16_to_f64)
 DO_ZPZ_FP(sve_fcvt_ds, uint64_t,     , float64_to_float32)
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
     } while (i != 0);                                                         \
 }
 
+DO_FCVTNT(sve_bfcvtnt,    uint32_t, uint16_t, H1_4, H1_2, float32_to_bfloat16)
 DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16)
 DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t,     , H1_4, float64_to_float32)
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
                 tcg_temp_free_i32(ahp);
             }
             break;
+        case 0x36: /* BFCVTN, BFCVTN2 */
+            {
+                TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
+                gen_helper_bfcvt_pair(tcg_res[pass], tcg_op, fpst);
+                tcg_temp_free_ptr(fpst);
+            }
+            break;
         case 0x56:  /* FCVTXN, FCVTXN2 */
             /* 64 bit to 32 bit float conversion
              * with von Neumann rounding (round to odd)
@@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
             }
             handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
             return;
+        case 0x36: /* BFCVTN, BFCVTN2 */
+            if (!dc_isar_feature(aa64_bf16, s) || size != 2) {
+                unallocated_encoding(s);
+                return;
+            }
+            if (!fp_access_check(s)) {
+                return;
+            }
+            handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
+            return;
         case 0x17: /* FCVTL, FCVTL2 */
             if (!fp_access_check(s)) {
                 return;
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
     return true;
 }
 
+static bool trans_VCVT_B16_F32(DisasContext *s, arg_2misc *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i64 tmp;
+    TCGv_i32 dst0, dst1;
+
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vm & 1) || (a->size != 1)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = fpstatus_ptr(FPST_STD);
+    tmp = tcg_temp_new_i64();
+    dst0 = tcg_temp_new_i32();
+    dst1 = tcg_temp_new_i32();
+
+    read_neon_element64(tmp, a->vm, 0, MO_64);
+    gen_helper_bfcvt_pair(dst0, tmp, fpst);
+
+    read_neon_element64(tmp, a->vm, 1, MO_64);
+    gen_helper_bfcvt_pair(dst1, tmp, fpst);
+
+    write_neon_element32(dst0, a->vd, 0, MO_32);
+    write_neon_element32(dst1, a->vd, 1, MO_32);
+
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i32(dst0);
+    tcg_temp_free_i32(dst1);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
 static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
 {
     TCGv_ptr fpst;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_hs(DisasContext *s, arg_rpr_esz *a)
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_hs);
 }
 
+static bool trans_BFCVT(DisasContext *s, arg_rpr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvt);
+}
+
 static bool trans_FCVT_dh(DisasContext *s, arg_rpr_esz *a)
 {
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_dh);
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVTNT_sh(DisasContext *s, arg_rpr_esz *a)
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtnt_sh);
 }
 
+static bool trans_BFCVTNT(DisasContext *s, arg_rpr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvtnt);
+}
+
 static bool trans_FCVTNT_ds(DisasContext *s, arg_rpr_esz *a)
 {
     if (!dc_isar_feature(aa64_sve2, s)) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(bfcvt)(float32 x, void *status)
     return float32_to_bfloat16(x, status);
 }
 
+uint32_t HELPER(bfcvt_pair)(uint64_t pair, void *status)
+{
+    bfloat16 lo = float32_to_bfloat16(extract64(pair, 0, 32), status);
+    bfloat16 hi = float32_to_bfloat16(extract64(pair, 32, 32), status);
+    return deposit32(lo, 16, 16, hi);
+}
+
 /*
  * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
  * must always round-to-nearest; the AArch64 ones honour the FPSCR
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

For Arm BFDOT and BFMMLA, we need a version of round-to-odd
that overflows to infinity, instead of the max normal number.
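
As a rough illustration only (this is not the softfloat code), the
round-to-odd increment and the two overflow policies can be sketched
in standalone C. The names round_to_odd_shift, overflow_to_max_normal
and the SKETCH_* enum are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two rounding modes discussed above. */
enum sketch_round_mode {
    SKETCH_ROUND_TO_ODD,      /* overflow saturates to the max normal */
    SKETCH_ROUND_TO_ODD_INF,  /* overflow produces infinity */
};

/*
 * Drop the low 'shift' bits of 'frac' with round-to-odd: if any
 * discarded bit is set, the kept least significant bit is forced to 1.
 * Exactly representable values are returned unchanged.
 */
static uint64_t round_to_odd_shift(uint64_t frac, unsigned shift)
{
    assert(shift > 0 && shift < 63);

    uint64_t frac_lsb = UINT64_C(1) << shift;
    uint64_t round_mask = frac_lsb - 1;

    /* This sketch assumes the increment cannot wrap around. */
    assert(frac <= UINT64_MAX - round_mask);

    /* Same shape as the patch: no increment when already odd. */
    uint64_t inc = (frac & frac_lsb) ? 0 : round_mask;

    return (frac + inc) >> shift;
}

/* True if an overflowing result should become the max normal number. */
static bool overflow_to_max_normal(enum sketch_round_mode mode)
{
    return mode == SKETCH_ROUND_TO_ODD;
}
```

The only difference between the two modes in this sketch is the
overflow policy; the increment computation is shared.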

Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/fpu/softfloat-types.h | 4 +++-
 fpu/softfloat-parts.c.inc     | 6 ++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
     float_round_up           = 2,
     float_round_to_zero      = 3,
     float_round_ties_away    = 4,
-    /* Not an IEEE rounding mode: round to the closest odd mantissa value */
+    /* Not an IEEE rounding mode: round to closest odd, overflow to max */
     float_round_to_odd       = 5,
+    /* Not an IEEE rounding mode: round to closest odd, overflow to inf */
+    float_round_to_odd_inf   = 6,
 } FloatRoundMode;
 
 /*
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
         g_assert_not_reached();
     }
 
+    overflow_norm = false;
     switch (s->float_rounding_mode) {
     case float_round_nearest_even:
-        overflow_norm = false;
         inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
         break;
     case float_round_ties_away:
-        overflow_norm = false;
         inc = frac_lsbm1;
         break;
     case float_round_to_zero:
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
         break;
     case float_round_to_odd:
         overflow_norm = true;
+        /* fall through */
+    case float_round_to_odd_inf:
         inc = p->frac_lo & frac_lsb ? 0 : round_mask;
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
                        ? frac_lsbm1 : 0);
                 break;
             case float_round_to_odd:
+            case float_round_to_odd_inf:
                 inc = p->frac_lo & frac_lsb ? 0 : round_mask;
                 break;
             default:
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFDOT for both AArch64 AdvSIMD and SVE,
and VDOT.BF16 for AArch32 NEON.
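
A minimal sketch of what one 32-bit lane computes, assuming an IEEE
host float and ignoring the round-to-odd, flush-to-zero and default-NaN
behaviour that the real helper gets from softfloat. The names
bits_to_float and bfdot_lane_sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE single (host float assumed). */
static float bits_to_float(uint32_t bits)
{
    float f;

    assert(sizeof(f) == sizeof(bits));
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/*
 * Hypothetical model of one BFDOT lane. e1 and e2 each hold two
 * bfloat16 values: element 0 in bits [15:0], element 1 in bits [31:16].
 * A bfloat16 is the top half of a float32, so moving it into the
 * high 16 bits yields the float32 with the same value.
 */
static float bfdot_lane_sketch(float sum, uint32_t e1, uint32_t e2)
{
    float lo = bits_to_float(e1 << 16) * bits_to_float(e2 << 16);
    float hi = bits_to_float(e1 & 0xffff0000u) *
               bits_to_float(e2 & 0xffff0000u);

    /* Host rounding is used here, unlike the real helper. */
    return sum + (lo + hi);
}
```

For inputs whose products and sums are exactly representable, this
gives the same value as the helper; for other inputs the rounding
will in general differ.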

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h           |  3 +++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  3 +++
 target/arm/translate-a64.c    | 20 ++++++++++++++++++
 target/arm/translate-neon.c   |  9 ++++++++
 target/arm/translate-sve.c    | 12 +++++++++++
 target/arm/vec_helper.c       | 40 +++++++++++++++++++++++++++++++++++
 7 files changed, 89 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_ummla_b, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUDOT          1111 110 00 . 10 .... .... 1101 . q:1 . 1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
 VUSDOT         1111 110 01 . 10 .... .... 1101 . q:1 . 0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+VDOT_b16       1111 110 00 . 00 .... .... 1101 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
 # VFM[AS]L
 VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
 FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
 FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
 
+### SVE2 floating-point bfloat16 dot-product
+BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
+
 ### SVE2 floating-point multiply-add long (indexed)
 FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
 FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         feature = dc_isar_feature(aa64_fcma, s);
         break;
+    case 0x1f: /* BFDOT */
+        switch (size) {
+        case 1:
+            feature = dc_isar_feature(aa64_bf16, s);
+            break;
+        default:
+            unallocated_encoding(s);
+            return;
+        }
+        break;
     default:
         unallocated_encoding(s);
         return;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         return;
 
+    case 0xf: /* BFDOT */
+        switch (size) {
+        case 1:
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        return;
+
     default:
         g_assert_not_reached();
     }
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a)
                         gen_helper_gvec_usdot_b);
 }
 
+static bool trans_VDOT_b16(DisasContext *s, arg_VDOT_b16 *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0,
+                        gen_helper_gvec_bfdot);
+}
+
 static bool trans_VFML(DisasContext *s, arg_VFML *a)
 {
     int opr_sz;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_UMMLA(DisasContext *s, arg_rrrr_esz *a)
 {
     return do_i8mm_zzzz_ool(s, a, gen_helper_gvec_ummla_b, 0);
 }
+
+static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot,
+                          a->rd, a->rn, a->rm, a->ra, 0);
+    }
+    return true;
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
 DO_MMLA_B(gvec_smmla_b, do_smmla_b)
 DO_MMLA_B(gvec_ummla_b, do_ummla_b)
 DO_MMLA_B(gvec_usmmla_b, do_usmmla_b)
+
+/*
+ * BFloat16 Dot Product
+ */
+
+static float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
+{
+    /* FPCR is ignored for BFDOT and BFMMLA. */
+    float_status bf_status = {
+        .tininess_before_rounding = float_tininess_before_rounding,
+        .float_rounding_mode = float_round_to_odd_inf,
+        .flush_to_zero = true,
+        .flush_inputs_to_zero = true,
+        .default_nan_mode = true,
+    };
+    float32 t1, t2;
+
+    /*
+     * Extract each BFloat16 from the element pair, and shift
+     * them such that they become float32.
+     */
+    t1 = float32_mul(e1 << 16, e2 << 16, &bf_status);
+    t2 = float32_mul(e1 & 0xffff0000u, e2 & 0xffff0000u, &bf_status);
+    t1 = float32_add(t1, t2, &bf_status);
+    t1 = float32_add(sum, t1, &bf_status);
+
+    return t1;
+}
+
+void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    float32 *d = vd, *a = va;
+    uint32_t *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        d[i] = bfdotadd(a[i], n[i], m[i]);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is the indexed (by-element) form of BFDOT for both AArch64
AdvSIMD and SVE, and of VDOT.BF16 for AArch32 NEON.
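
The per-segment element selection done by the indexed helper can be
sketched as follows. This assumes a little-endian host (so the H4()
adjustment is omitted) and whole 128-bit segments; the function name
select_indexed_sketch is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * For each 32-bit lane j, store in out[j] the element of m that an
 * indexed operation would pair with it: m[index] taken from the
 * 128-bit segment (four 32-bit lanes) that contains lane j.
 */
static void select_indexed_sketch(uint32_t *out, const uint32_t *m,
                                  size_t elements, size_t index)
{
    /* Sketch restriction: only whole 128-bit segments are handled. */
    assert(elements % 4 == 0);
    assert(index < 4);

    for (size_t i = 0; i < elements; i += 4) {
        uint32_t m_idx = m[i + index];

        for (size_t j = i; j < i + 4; j++) {
            out[j] = m_idx;
        }
    }
}
```

Each selected 32-bit value holds a pair of bfloat16 elements, which is
then combined with every lane of the same segment.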

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h           |  2 ++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  3 +++
 target/arm/translate-a64.c    | 41 +++++++++++++++++++++++++++--------
 target/arm/translate-neon.c   |  9 ++++++++
 target/arm/translate-sve.c    | 12 ++++++++++
 target/arm/vec_helper.c       | 20 +++++++++++++++++
 7 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
 
 DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
 
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUSDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
                vn=%vn_dp vd=%vd_dp
 VSUDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 1 vm:4 \
                vn=%vn_dp vd=%vd_dp
+VDOT_b16_scal  1111 1110 0 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
+               vn=%vn_dp vd=%vd_dp
 
 %vfml_scalar_q0_rm 0:3 5:1
 %vfml_scalar_q1_index 5:1 3:1
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
 FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
 FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
 FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
+
+### SVE2 floating-point bfloat16 dot-product (indexed)
+BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             return;
         }
         break;
-    case 0x0f: /* SUDOT, USDOT */
-        if (is_scalar || (size & 1) || !dc_isar_feature(aa64_i8mm, s)) {
+    case 0x0f:
+        switch (size) {
+        case 0: /* SUDOT */
+        case 2: /* USDOT */
+            if (is_scalar || !dc_isar_feature(aa64_i8mm, s)) {
+                unallocated_encoding(s);
+                return;
+            }
+            break;
+        case 1: /* BFDOT */
+            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
+                unallocated_encoding(s);
+                return;
+            }
+            break;
+        default:
             unallocated_encoding(s);
             return;
         }
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                          u ? gen_helper_gvec_udot_idx_b
                          : gen_helper_gvec_sdot_idx_b);
         return;
-    case 0x0f: /* SUDOT, USDOT */
-        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
-                         extract32(insn, 23, 1)
-                         ? gen_helper_gvec_usdot_idx_b
-                         : gen_helper_gvec_sudot_idx_b);
-        return;
-
+    case 0x0f:
+        switch (extract32(insn, 22, 2)) {
+        case 0: /* SUDOT */
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+                             gen_helper_gvec_sudot_idx_b);
+            return;
+        case 1: /* BFDOT */
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+                             gen_helper_gvec_bfdot_idx);
+            return;
+        case 2: /* USDOT */
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+                             gen_helper_gvec_usdot_idx_b);
+            return;
+        }
+        g_assert_not_reached();
     case 0x11: /* FCMLA #0 */
     case 0x13: /* FCMLA #90 */
     case 0x15: /* FCMLA #180 */
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a)
                         gen_helper_gvec_sudot_idx_b);
 }
 
+static bool trans_VDOT_b16_scal(DisasContext *s, arg_VDOT_b16_scal *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index,
+                        gen_helper_gvec_bfdot_idx);
+}
+
 static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
 {
     int opr_sz;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
     }
     return true;
 }
+
+static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot_idx,
+                          a->rd, a->rn, a->rm, a->ra, a->index);
+    }
+    return true;
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
+                            void *va, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc);
+    intptr_t index = simd_data(desc);
+    intptr_t elements = opr_sz / 4;
+    intptr_t eltspersegment = MIN(16 / 4, elements);
+    float32 *d = vd, *a = va;
+    uint32_t *n = vn, *m = vm;
+
+    for (i = 0; i < elements; i += eltspersegment) {
+        uint32_t m_idx = m[i + H4(index)];
+
+        for (j = i; j < i + eltspersegment; j++) {
+            d[j] = bfdotadd(a[j], n[j], m_idx);
+        }
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFMMLA for both AArch64 AdvSIMD and SVE,
and VMMLA.BF16 for AArch32 NEON.
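
A hedged sketch of the arithmetic for one 128-bit segment, written
with host floats instead of bfloat16 bit patterns and softfloat. The
function name bfmmla_segment_sketch is hypothetical, and the order of
accumulation differs from the helper, which adds products in pairs.

```c
#include <assert.h>

/*
 * Hypothetical model of one 128-bit BFMMLA segment using host floats:
 * n and m are 2x4 matrices (two rows of four bfloat16 values in the
 * real instruction), a and d are 2x2 float32 matrices, and
 * d = a + n * transpose(m).
 */
static void bfmmla_segment_sketch(float d[2][2], float a[2][2],
                                  float n[2][4], float m[2][4])
{
    float tmp[2][2];

    assert(d && a && n && m);

    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            float sum = a[i][j];

            for (int k = 0; k < 4; k++) {
                sum += n[i][k] * m[j][k];
            }
            tmp[i][j] = sum;
        }
    }
    /* Write back only after all inputs are consumed: d may alias a. */
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            d[i][j] = tmp[i][j];
        }
    }
}
```

Buffering the results before the write-back mirrors the helper, which
stores the four sums only after every input has been read.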

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h           |  3 +++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  6 +++--
 target/arm/translate-a64.c    | 10 +++++++++
 target/arm/translate-neon.c   |  9 ++++++++
 target/arm/translate-sve.c    | 12 ++++++++++
 target/arm/vec_helper.c       | 42 ++++++++++++++++++++++++++++++++++-
 7 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUMMLA         1111 1100 0.10 .... .... 1100 .1.1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
 VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
 VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
                vn=%vn_dp vd=%vd_dp size=1
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz  01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5  ra=%reg_movprfx
 USDOT_zzzz      01000100 .. 0 ..... 011 110 ..... .....  @rda_rn_rm
 
 ### SVE2 floating point matrix multiply accumulate
-
-FMMLA           01100100 .. 1 ..... 111001 ..... .....  @rda_rn_rm
+{
+  BFMMLA        01100100 01 1 ..... 111 001 ..... .....  @rda_rn_rm_e0
+  FMMLA         01100100 .. 1 ..... 111 001 ..... .....  @rda_rn_rm
+}
 
 ### SVE2 Memory Gather Load Group
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         feature = dc_isar_feature(aa64_fcma, s);
         break;
+    case 0x1d: /* BFMMLA */
+        if (size != MO_16 || !is_q) {
+            unallocated_encoding(s);
+            return;
+        }
+        feature = dc_isar_feature(aa64_bf16, s);
+        break;
     case 0x1f: /* BFDOT */
         switch (size) {
         case 1:
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         return;
 
+    case 0xd: /* BFMMLA */
+        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
+        return;
     case 0xf: /* BFDOT */
         switch (size) {
         case 1:
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VUSMMLA(DisasContext *s, arg_VUSMMLA *a)
     return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
                         gen_helper_gvec_usmmla_b);
 }
+
+static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
+                        gen_helper_gvec_bfmmla);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
     }
     return true;
 }
+
+static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfmmla,
+                          a->rd, a->rn, a->rm, a->ra, 0);
+    }
+    return true;
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
          * Process the entire segment at once, writing back the
          * results only after we've consumed all of the inputs.
          *
-         * Key to indicies by column:
+         * Key to indices by column:
          *          i   j                  i             j
          */
         sum0 = a[H4(0 + 0)];
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
+{
+    intptr_t s, opr_sz = simd_oprsz(desc);
+    float32 *d = vd, *a = va;
+    uint32_t *n = vn, *m = vm;
+
+    for (s = 0; s < opr_sz / 4; s += 4) {
+        float32 sum00, sum01, sum10, sum11;
+
+        /*
+         * Process the entire segment at once, writing back the
+         * results only after we've consumed all of the inputs.
+         *
+         * Key to indices by column:
+         *               i   j           i   k             j   k
+         */
+        sum00 = a[s + H4(0 + 0)];
+        sum00 = bfdotadd(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)]);
+        sum00 = bfdotadd(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)]);
+
+        sum01 = a[s + H4(0 + 1)];
+        sum01 = bfdotadd(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)]);
+        sum01 = bfdotadd(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)]);
+
+        sum10 = a[s + H4(2 + 0)];
+        sum10 = bfdotadd(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)]);
+        sum10 = bfdotadd(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)]);
+
+        sum11 = a[s + H4(2 + 1)];
+        sum11 = bfdotadd(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)]);
+        sum11 = bfdotadd(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)]);
+
+        d[s + H4(0 + 0)] = sum00;
+        d[s + H4(0 + 1)] = sum01;
+        d[s + H4(2 + 0)] = sum10;
+        d[s + H4(2 + 1)] = sum11;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
and VFMA{B,T}.BF16 for AArch32 NEON.
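
The bottom/top element selection can be sketched as below, assuming a
little-endian host (so the H2()/H4() adjustments are omitted) and an
IEEE host float. The names bf16_to_float and bfmlal_sketch are
hypothetical, and the sketch uses a separate multiply and add where
the real helper uses a fused float32_muladd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen a bfloat16 bit pattern to the float32 with the same value. */
static float bf16_to_float(uint16_t bf)
{
    uint32_t bits = (uint32_t)bf << 16;
    float f;

    assert(sizeof(f) == sizeof(bits));
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/*
 * Hypothetical model of BFMLALB (sel = 0) and BFMLALT (sel = 1):
 * float32 lane i accumulates the product of the even ("bottom") or
 * odd ("top") bfloat16 element of the matching 32-bit container.
 */
static void bfmlal_sketch(float *d, const float *a, const uint16_t *n,
                          const uint16_t *m, size_t lanes, int sel)
{
    assert(sel == 0 || sel == 1);

    for (size_t i = 0; i < lanes; i++) {
        float nn = bf16_to_float(n[i * 2 + sel]);
        float mm = bf16_to_float(m[i * 2 + sel]);

        /* Not fused here; results can differ in the last bit. */
        d[i] = a[i] + nn * mm;
    }
}
```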

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h           |  3 +++
 target/arm/neon-shared.decode |  3 +++
 target/arm/sve.decode         |  3 +++
 target/arm/translate-a64.c    | 13 +++++++++----
 target/arm/translate-neon.c   |  9 +++++++++
 target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
 target/arm/vec_helper.c       | 16 ++++++++++++++++
 7 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
 VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+VFMA_b16       1111 110 0 0.11 .... .... 1000 . q:1 . 1 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
 VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
                vn=%vn_dp vd=%vd_dp size=1
 VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
 FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
 FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
 
+BFMLALB_zzzw    01100100 11 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
+BFMLALT_zzzw    01100100 11 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
+
 ### SVE2 floating-point bfloat16 dot-product
 BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         feature = dc_isar_feature(aa64_bf16, s);
         break;
-    case 0x1f: /* BFDOT */
+    case 0x1f:
         switch (size) {
-        case 1:
+        case 1: /* BFDOT */
+        case 3: /* BFMLAL{B,T} */
             feature = dc_isar_feature(aa64_bf16, s);
             break;
         default:
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
     case 0xd: /* BFMMLA */
         gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
         return;
-    case 0xf: /* BFDOT */
+    case 0xf:
         switch (size) {
-        case 1:
+        case 1: /* BFDOT */
             gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
             break;
+        case 3: /* BFMLAL{B,T} */
+            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, false, is_q,
+                              gen_helper_gvec_bfmlal);
+            break;
         default:
             g_assert_not_reached();
         }
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
     return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
                         gen_helper_gvec_bfmmla);
 }
+
+static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
+                             gen_helper_gvec_bfmlal);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
     }
     return true;
 }
+
+static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
+        unsigned vsz = vec_full_reg_size(s);
+
+        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vec_full_reg_offset(s, a->ra),
+                           status, vsz, vsz, sel,
+                           gen_helper_gvec_bfmlal);
+        tcg_temp_free_ptr(status);
+    }
+    return true;
+}
+
+static bool trans_BFMLALB_zzzw(DisasContext *s, arg_rrrr_esz *a)
+{
+    return do_BFMLAL_zzzw(s, a, false);
+}
+
+static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
+{
+    return do_BFMLAL_zzzw(s, a, true);
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
+                         void *stat, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    intptr_t sel = simd_data(desc);
+    float32 *d = vd, *a = va;
+    bfloat16 *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        float32 nn = n[H2(i * 2 + sel)] << 16;
+        float32 mm = m[H2(i * 2 + sel)] << 16;
+        d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], 0, stat);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is the indexed form of BFMLAL{B,T} for both AArch64 AdvSIMD
and SVE, and of VFMA{B,T}.BF16 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h           |  2 ++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  2 ++
 target/arm/translate-a64.c    | 15 ++++++++++++++-
 target/arm/translate-neon.c   | 10 ++++++++++
 target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
 target/arm/vec_helper.c       | 22 ++++++++++++++++++++++
 7 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
 
 DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
 
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
                rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
 VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
                index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
+VFMA_b16_scal  1111 1110 0.11 .... .... 1000 . q:1 . 1 . vm:3 \
+               index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
 FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
 FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
 FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
+BFMLALB_zzxw    01100100 11 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
+BFMLALT_zzxw    01100100 11 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
 
 ### SVE2 floating-point bfloat16 dot-product (indexed)
 BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                 unallocated_encoding(s);
                 return;
             }
+            size = MO_32;
             break;
         case 1: /* BFDOT */
             if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
                 unallocated_encoding(s);
                 return;
             }
+            size = MO_32;
+            break;
+        case 3: /* BFMLAL{B,T} */
+            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
+                unallocated_encoding(s);
+                return;
+            }
+            /* can't set is_fp without other incorrect size checks */
+            size = MO_16;
             break;
         default:
             unallocated_encoding(s);
             return;
         }
-        size = MO_32;
         break;
     case 0x11: /* FCMLA #0 */
     case 0x13: /* FCMLA #90 */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
                              gen_helper_gvec_usdot_idx_b);
             return;
+        case 3: /* BFMLAL{B,T} */
+            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
+                              gen_helper_gvec_bfmlal_idx);
+            return;
         }
         g_assert_not_reached();
     case 0x11: /* FCMLA #0 */
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
     return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
                              gen_helper_gvec_bfmlal);
 }
+
+static bool trans_VFMA_b16_scal(DisasContext *s, arg_VFMA_b16_scal *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda_fpst(s, 6, a->vd, a->vn, a->vm,
+                             (a->index << 1) | a->q, FPST_STD,
+                             gen_helper_gvec_bfmlal_idx);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
 {
     return do_BFMLAL_zzzw(s, a, true);
 }
+
+static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
+        unsigned vsz = vec_full_reg_size(s);
+
+        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vec_full_reg_offset(s, a->ra),
+                           status, vsz, vsz, (a->index << 1) | sel,
+                           gen_helper_gvec_bfmlal_idx);
+        tcg_temp_free_ptr(status);
+    }
+    return true;
+}
+
+static bool trans_BFMLALB_zzxw(DisasContext *s, arg_rrxr_esz *a)
+{
+    return do_BFMLAL_zzxw(s, a, false);
+}
+
+static bool trans_BFMLALT_zzxw(DisasContext *s, arg_rrxr_esz *a)
+{
+    return do_BFMLAL_zzxw(s, a, true);
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm,
+                             void *va, void *stat, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc);
+    intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1);
+    intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 1, 3);
+    intptr_t elements = opr_sz / 4;
+    intptr_t eltspersegment = MIN(16 / 4, elements);
+    float32 *d = vd, *a = va;
+    bfloat16 *n = vn, *m = vm;
+
+    for (i = 0; i < elements; i += eltspersegment) {
+        float32 m_idx = m[H2(2 * i + index)] << 16;
+
+        for (j = i; j < i + eltspersegment; j++) {
+            float32 n_j = n[H2(2 * j + sel)] << 16;
+            d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], 0, stat);
+        }
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/elfload.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap2(void)
     GET_FEATURE_ID(aa64_sve_i8mm, ARM_HWCAP2_A64_SVEI8MM);
     GET_FEATURE_ID(aa64_sve_f32mm, ARM_HWCAP2_A64_SVEF32MM);
     GET_FEATURE_ID(aa64_sve_f64mm, ARM_HWCAP2_A64_SVEF64MM);
+    GET_FEATURE_ID(aa64_sve_bf16, ARM_HWCAP2_A64_SVEBF16);
     GET_FEATURE_ID(aa64_i8mm, ARM_HWCAP2_A64_I8MM);
+    GET_FEATURE_ID(aa64_bf16, ARM_HWCAP2_A64_BF16);
     GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
     GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
     GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Disable BF16 again for !have_neon and !have_vfp during realize.
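
For readers unfamiliar with the FIELD_DP32() idiom used in the hunks below,
clearing an ID register field amounts to a masked deposit. The following is
a standalone sketch under stated assumptions: deposit_field32 and the DEMO_*
macros are hypothetical names, and the BF16 field position (bits [23:20] of
ID_ISAR6) reflects my reading of the Arm architecture rather than anything
stated in this patch.

```c
#include <stdint.h>

/* Assumed position of ID_ISAR6.BF16: a 4-bit field at bits [23:20]. */
#define DEMO_ID_ISAR6_BF16_SHIFT  20
#define DEMO_ID_ISAR6_BF16_LENGTH 4

/*
 * Hypothetical helper: replace the 'length'-bit field at 'shift' in 'reg'
 * with 'val', leaving all other bits untouched.
 */
static uint32_t deposit_field32(uint32_t reg, unsigned shift,
                                unsigned length, uint32_t val)
{
    uint32_t ones = length < 32 ? (1u << length) - 1 : ~0u;
    uint32_t mask = ones << shift;

    return (reg & ~mask) | ((val << shift) & mask);
}
```

Depositing 0 into the field hides the feature from the guest while leaving
neighbouring fields such as I8MM as they were.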

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-13-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c     | 3 +++
 target/arm/cpu64.c   | 3 +++
 target/arm/cpu_tcg.c | 1 +
 3 files changed, 7 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 
         u = cpu->isar.id_isar6;
         u = FIELD_DP32(u, ID_ISAR6, JSCVT, 0);
+        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
         cpu->isar.id_isar6 = u;
 
         u = cpu->isar.mvfr0;
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 
         t = cpu->isar.id_aa64isar1;
         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 0);
+        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 0);
         t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 0);
         cpu->isar.id_aa64isar1 = t;
 
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         u = cpu->isar.id_isar6;
         u = FIELD_DP32(u, ID_ISAR6, DP, 0);
         u = FIELD_DP32(u, ID_ISAR6, FHM, 0);
+        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
         u = FIELD_DP32(u, ID_ISAR6, I8MM, 0);
         cpu->isar.id_isar6 = u;
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, SB, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, SPECRES, 1);
+        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, FRINTTS, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, LRCPC, 2); /* ARMv8.4-RCPC */
         t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 1);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2);  /* PMULL */
         t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1);
+        t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = FIELD_DP32(u, ID_ISAR6, FHM, 1);
         u = FIELD_DP32(u, ID_ISAR6, SB, 1);
         u = FIELD_DP32(u, ID_ISAR6, SPECRES, 1);
+        u = FIELD_DP32(u, ID_ISAR6, BF16, 1);
         u = FIELD_DP32(u, ID_ISAR6, I8MM, 1);
         cpu->isar.id_isar6 = u;
 
diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu_tcg.c
+++ b/target/arm/cpu_tcg.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
         t = FIELD_DP32(t, ID_ISAR6, FHM, 1);
         t = FIELD_DP32(t, ID_ISAR6, SB, 1);
         t = FIELD_DP32(t, ID_ISAR6, SPECRES, 1);
+        t = FIELD_DP32(t, ID_ISAR6, BF16, 1);
         t = FIELD_DP32(t, ID_ISAR6, I8MM, 1);
         cpu->isar.id_isar6 = t;
 
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

Until now, Hypervisor.framework has only been available on x86_64 systems.
With Apple Silicon now shipping, it extends its reach to aarch64. To
prepare for supporting multiple architectures, let's start moving common
code out into its own accel directory.

This patch moves assert_hvf_ok() and introduces generic build infrastructure.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-2-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h | 18 +++++++++++++++
 accel/hvf/hvf-all.c      | 47 ++++++++++++++++++++++++++++++++++++++++
 target/i386/hvf/hvf.c    | 33 +---------------------------
 MAINTAINERS              |  8 +++++++
 accel/hvf/meson.build    |  6 +++++
 accel/meson.build        |  1 +
 6 files changed, 81 insertions(+), 32 deletions(-)
 create mode 100644 include/sysemu/hvf_int.h
 create mode 100644 accel/hvf/hvf-all.c
 create mode 100644 accel/hvf/meson.build

diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * QEMU Hypervisor.framework (HVF) support
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/* header to be included in HVF-specific code */
+
+#ifndef HVF_INT_H
+#define HVF_INT_H
+
+#include <Hypervisor/hv.h>
+
+void assert_hvf_ok(hv_return_t ret);
+
+#endif
diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/accel/hvf/hvf-all.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * QEMU Hypervisor.framework support
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
+
+void assert_hvf_ok(hv_return_t ret)
+{
+    if (ret == HV_SUCCESS) {
+        return;
+    }
+
+    switch (ret) {
+    case HV_ERROR:
+        error_report("Error: HV_ERROR");
+        break;
+    case HV_BUSY:
+        error_report("Error: HV_BUSY");
+        break;
+    case HV_BAD_ARGUMENT:
+        error_report("Error: HV_BAD_ARGUMENT");
+        break;
+    case HV_NO_RESOURCES:
+        error_report("Error: HV_NO_RESOURCES");
+        break;
+    case HV_NO_DEVICE:
+        error_report("Error: HV_NO_DEVICE");
+        break;
+    case HV_UNSUPPORTED:
+        error_report("Error: HV_UNSUPPORTED");
+        break;
+    default:
+        error_report("Unknown Error");
+    }
+
+    abort();
+}
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/error-report.h"
 
 #include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 #include "sysemu/runstate.h"
 #include "hvf-i386.h"
 #include "vmcs.h"
@@ -XXX,XX +XXX,XX @@
 
 HVFState *hvf_state;
 
-static void assert_hvf_ok(hv_return_t ret)
-{
-    if (ret == HV_SUCCESS) {
-        return;
-    }
-
-    switch (ret) {
-    case HV_ERROR:
-        error_report("Error: HV_ERROR");
-        break;
-    case HV_BUSY:
-        error_report("Error: HV_BUSY");
-        break;
-    case HV_BAD_ARGUMENT:
-        error_report("Error: HV_BAD_ARGUMENT");
-        break;
-    case HV_NO_RESOURCES:
-        error_report("Error: HV_NO_RESOURCES");
-        break;
-    case HV_NO_DEVICE:
-        error_report("Error: HV_NO_DEVICE");
-        break;
-    case HV_UNSUPPORTED:
-        error_report("Error: HV_UNSUPPORTED");
-        break;
-    default:
-        error_report("Unknown Error");
-    }
-
-    abort();
-}
-
 /* Memory slots */
 hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
 {
diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ M: Roman Bolshakov <r.bolshakov@yadro.com>
 W: https://wiki.qemu.org/Features/HVF
 S: Maintained
 F: target/i386/hvf/
+
+HVF
+M: Cameron Esfahani <dirty@apple.com>
+M: Roman Bolshakov <r.bolshakov@yadro.com>
+W: https://wiki.qemu.org/Features/HVF
+S: Maintained
+F: accel/hvf/
 F: include/sysemu/hvf.h
+F: include/sysemu/hvf_int.h
 
 WHPX CPUs
 M: Sunil Muthuswamy <sunilmut@microsoft.com>
diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/accel/hvf/meson.build
@@ -XXX,XX +XXX,XX @@
+hvf_ss = ss.source_set()
+hvf_ss.add(files(
+  'hvf-all.c',
+))
+
+specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
diff --git a/accel/meson.build b/accel/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/accel/meson.build
+++ b/accel/meson.build
@@ -XXX,XX +XXX,XX @@ specific_ss.add(files('accel-common.c'))
 softmmu_ss.add(files('accel-softmmu.c'))
 user_ss.add(files('accel-user.c'))
 
+subdir('hvf')
 subdir('qtest')
 subdir('kvm')
 subdir('tcg')
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch moves the vCPU thread loop over.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-3-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 {target/i386 => accel}/hvf/hvf-accel-ops.h | 0
 {target/i386 => accel}/hvf/hvf-accel-ops.c | 0
 target/i386/hvf/x86hvf.c                   | 2 +-
 accel/hvf/meson.build                      | 1 +
 target/i386/hvf/meson.build                | 1 -
 5 files changed, 2 insertions(+), 2 deletions(-)
 rename {target/i386 => accel}/hvf/hvf-accel-ops.h (100%)
 rename {target/i386 => accel}/hvf/hvf-accel-ops.c (100%)

diff --git a/target/i386/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
similarity index 100%
rename from target/i386/hvf/hvf-accel-ops.h
rename to accel/hvf/hvf-accel-ops.h
diff --git a/target/i386/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
similarity index 100%
rename from target/i386/hvf/hvf-accel-ops.c
rename to accel/hvf/hvf-accel-ops.c
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@
 #include <Hypervisor/hv.h>
 #include <Hypervisor/hv_vmx.h>
 
-#include "hvf-accel-ops.h"
+#include "accel/hvf/hvf-accel-ops.h"
 
 void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
                      SegmentCache *qseg, bool is_tr)
diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/meson.build
+++ b/accel/hvf/meson.build
@@ -XXX,XX +XXX,XX @@
 hvf_ss = ss.source_set()
 hvf_ss.add(files(
   'hvf-all.c',
+  'hvf-accel-ops.c',
 ))
 
 specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/meson.build
+++ b/target/i386/hvf/meson.build
@@ -XXX,XX +XXX,XX @@
 i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
   'hvf.c',
-  'hvf-accel-ops.c',
   'x86.c',
   'x86_cpuid.c',
   'x86_decode.c',
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch moves CPU and memory operations over. While at it, make sure
the code is consumable on non-i386 systems.
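
The slot lookup being moved relies on a half-open interval overlap test.
As a rough standalone sketch (demo_slot and demo_find_overlap_slot are
hypothetical names that only mirror the shape of hvf_slot and
hvf_find_overlap_slot(); they are not the QEMU types):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for hvf_slot: a guest-physical range. */
typedef struct {
    uint64_t start;
    uint64_t size;   /* 0 means the slot is unused */
} demo_slot;

/*
 * Return the first in-use slot whose range [start, start + size)
 * intersects the queried range, or NULL if there is none.
 * Ranges that merely touch at a boundary do not overlap.
 */
static demo_slot *demo_find_overlap_slot(demo_slot *slots, int num_slots,
                                         uint64_t start, uint64_t size)
{
    for (int x = 0; x < num_slots; ++x) {
        demo_slot *slot = &slots[x];

        if (slot->size && start < slot->start + slot->size &&
            start + size > slot->start) {
            return slot;
        }
    }
    return NULL;
}
```

Because unused slots have size 0, the same array can be scanned both for
overlaps and for a free entry.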

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-4-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h   |   4 +
 target/i386/hvf/hvf-i386.h |   2 -
 target/i386/hvf/x86hvf.h   |   2 -
 accel/hvf/hvf-accel-ops.c  | 308 ++++++++++++++++++++++++++++++++++++-
 target/i386/hvf/hvf.c      | 302 ------------------------------------
 5 files changed, 311 insertions(+), 307 deletions(-)

diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@
 
 #include <Hypervisor/hv.h>
 
+void hvf_set_phys_mem(MemoryRegionSection *, bool);
 void assert_hvf_ok(hv_return_t ret);
+hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
+int hvf_put_registers(CPUState *);
+int hvf_get_registers(CPUState *);
 
 #endif
diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf-i386.h
+++ b/target/i386/hvf/hvf-i386.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
 };
 extern HVFState *hvf_state;
 
-void hvf_set_phys_mem(MemoryRegionSection *, bool);
 void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
-hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
 
 #ifdef NEED_CPU_H
 /* Functions exported to host specific mode */
diff --git a/target/i386/hvf/x86hvf.h b/target/i386/hvf/x86hvf.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.h
+++ b/target/i386/hvf/x86hvf.h
@@ -XXX,XX +XXX,XX @@
 #include "x86_descr.h"
 
 int hvf_process_events(CPUState *);
-int hvf_put_registers(CPUState *);
-int hvf_get_registers(CPUState *);
 bool hvf_inject_interrupts(CPUState *);
 void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
                      SegmentCache *qseg, bool is_tr);
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
+#include "exec/address-spaces.h"
+#include "exec/exec-all.h"
+#include "sysemu/cpus.h"
 #include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 #include "sysemu/runstate.h"
-#include "target/i386/cpu.h"
 #include "qemu/guest-random.h"
 
 #include "hvf-accel-ops.h"
 
+HVFState *hvf_state;
+
+/* Memory slots */
+
+hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
+{
+    hvf_slot *slot;
+    int x;
+    for (x = 0; x < hvf_state->num_slots; ++x) {
+        slot = &hvf_state->slots[x];
+        if (slot->size && start < (slot->start + slot->size) &&
+            (start + size) > slot->start) {
+            return slot;
+        }
+    }
+    return NULL;
+}
+
+struct mac_slot {
+    int present;
+    uint64_t size;
+    uint64_t gpa_start;
+    uint64_t gva;
+};
+
+struct mac_slot mac_slots[32];
+
+static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
+{
+    struct mac_slot *macslot;
+    hv_return_t ret;
+
+    macslot = &mac_slots[slot->slot_id];
+
+    if (macslot->present) {
+        if (macslot->size != slot->size) {
+            macslot->present = 0;
+            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
+            assert_hvf_ok(ret);
+        }
+    }
+
+    if (!slot->size) {
+        return 0;
+    }
+
+    macslot->present = 1;
+    macslot->gpa_start = slot->start;
+    macslot->size = slot->size;
+    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
+    assert_hvf_ok(ret);
+    return 0;
+}
+
+void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
+{
+    hvf_slot *mem;
+    MemoryRegion *area = section->mr;
+    bool writeable = !area->readonly && !area->rom_device;
+    hv_memory_flags_t flags;
+
+    if (!memory_region_is_ram(area)) {
+        if (writeable) {
+            return;
+        } else if (!memory_region_is_romd(area)) {
+            /*
+             * If the memory device is not in romd_mode, then we actually want
+             * to remove the hvf memory slot so all accesses will trap.
+             */
+             add = false;
+        }
+    }
+
+    mem = hvf_find_overlap_slot(
+            section->offset_within_address_space,
+            int128_get64(section->size));
+
+    if (mem && add) {
+        if (mem->size == int128_get64(section->size) &&
+            mem->start == section->offset_within_address_space &&
+            mem->mem == (memory_region_get_ram_ptr(area) +
+            section->offset_within_region)) {
+            return; /* Same region was attempted to register, go away. */
+        }
+    }
+
+    /* Region needs to be reset. set the size to 0 and remap it. */
+    if (mem) {
+        mem->size = 0;
+        if (do_hvf_set_memory(mem, 0)) {
+            error_report("Failed to reset overlapping slot");
+            abort();
+        }
+    }
+
+    if (!add) {
+        return;
+    }
+
+    if (area->readonly ||
+        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
+        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
+    } else {
+        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
+    }
+
+    /* Now make a new slot. */
+    int x;
+
+    for (x = 0; x < hvf_state->num_slots; ++x) {
+        mem = &hvf_state->slots[x];
+        if (!mem->size) {
+            break;
+        }
+    }
+
+    if (x == hvf_state->num_slots) {
+        error_report("No free slots");
+        abort();
+    }
+
+    mem->size = int128_get64(section->size);
+    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
+    mem->start = section->offset_within_address_space;
+    mem->region = area;
+
+    if (do_hvf_set_memory(mem, flags)) {
+        error_report("Error registering new memory slot");
+        abort();
+    }
+}
+
+static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
+{
+    if (!cpu->vcpu_dirty) {
+        hvf_get_registers(cpu);
+        cpu->vcpu_dirty = true;
+    }
+}
+
+void hvf_cpu_synchronize_state(CPUState *cpu)
+{
+    if (!cpu->vcpu_dirty) {
+        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
+    }
+}
+
+static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
+                                              run_on_cpu_data arg)
+{
+    hvf_put_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+void hvf_cpu_synchronize_post_reset(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+}
+
+static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
+                                             run_on_cpu_data arg)
+{
+    hvf_put_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+void hvf_cpu_synchronize_post_init(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+}
+
+static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
+                                              run_on_cpu_data arg)
+{
+    cpu->vcpu_dirty = true;
+}
+
+void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
+{
+    hvf_slot *slot;
+
+    slot = hvf_find_overlap_slot(
+            section->offset_within_address_space,
+            int128_get64(section->size));
+
+    /* protect region against writes; begin tracking it */
+    if (on) {
+        slot->flags |= HVF_SLOT_LOG;
+        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+                      HV_MEMORY_READ);
+    /* stop tracking region*/
+    } else {
+        slot->flags &= ~HVF_SLOT_LOG;
+        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+                      HV_MEMORY_READ | HV_MEMORY_WRITE);
+    }
+}
+
+static void hvf_log_start(MemoryListener *listener,
+                          MemoryRegionSection *section, int old, int new)
+{
+    if (old != 0) {
+        return;
+    }
+
+    hvf_set_dirty_tracking(section, 1);
+}
+
+static void hvf_log_stop(MemoryListener *listener,
+                         MemoryRegionSection *section, int old, int new)
+{
+    if (new != 0) {
+        return;
+    }
+
+    hvf_set_dirty_tracking(section, 0);
+}
+
+static void hvf_log_sync(MemoryListener *listener,
+                         MemoryRegionSection *section)
+{
+    /*
+     * sync of dirty pages is handled elsewhere; just make sure we keep
+     * tracking the region.
+     */
+    hvf_set_dirty_tracking(section, 1);
+}
+
+static void hvf_region_add(MemoryListener *listener,
+                           MemoryRegionSection *section)
+{
+    hvf_set_phys_mem(section, true);
+}
+
+static void hvf_region_del(MemoryListener *listener,
+                           MemoryRegionSection *section)
+{
+    hvf_set_phys_mem(section, false);
+}
+
+static MemoryListener hvf_memory_listener = {
+    .priority = 10,
+    .region_add = hvf_region_add,
+    .region_del = hvf_region_del,
+    .log_start = hvf_log_start,
+    .log_stop = hvf_log_stop,
+    .log_sync = hvf_log_sync,
+};
+
+static void dummy_signal(int sig)
+{
+}
+
+bool hvf_allowed;
+
+static int hvf_accel_init(MachineState *ms)
+{
+    int x;
+    hv_return_t ret;
+    HVFState *s;
+
+    ret = hv_vm_create(HV_VM_DEFAULT);
+    assert_hvf_ok(ret);
+
+    s = g_new0(HVFState, 1);
+
+    s->num_slots = 32;
+    for (x = 0; x < s->num_slots; ++x) {
+        s->slots[x].size = 0;
+        s->slots[x].slot_id = x;
+    }
+
+    hvf_state = s;
+    memory_listener_register(&hvf_memory_listener, &address_space_memory);
+    return 0;
+}
+
+static void hvf_accel_class_init(ObjectClass *oc, void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+    ac->name = "HVF";
+    ac->init_machine = hvf_accel_init;
+    ac->allowed = &hvf_allowed;
+}
+
+static const TypeInfo hvf_accel_type = {
+    .name = TYPE_HVF_ACCEL,
+    .parent = TYPE_ACCEL,
+    .class_init = hvf_accel_class_init,
+};
+
+static void hvf_type_init(void)
+{
+    type_register_static(&hvf_accel_type);
+}
+
+type_init(hvf_type_init);
+
 /*
  * The HVF-specific vCPU thread function. This one should only run when the host
  * CPU supports the VMX "unrestricted guest" feature.
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
 
 #include "hvf-accel-ops.h"
 
-HVFState *hvf_state;
-
-/* Memory slots */
-hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
-{
-    hvf_slot *slot;
-    int x;
-    for (x = 0; x < hvf_state->num_slots; ++x) {
-        slot = &hvf_state->slots[x];
-        if (slot->size && start < (slot->start + slot->size) &&
-            (start + size) > slot->start) {
-            return slot;
-        }
-    }
-    return NULL;
-}
-
-struct mac_slot {
-    int present;
-    uint64_t size;
-    uint64_t gpa_start;
-    uint64_t gva;
-};
-
-struct mac_slot mac_slots[32];
-
-static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
-{
-    struct mac_slot *macslot;
-    hv_return_t ret;
-
-    macslot = &mac_slots[slot->slot_id];
-
-    if (macslot->present) {
-        if (macslot->size != slot->size) {
-            macslot->present = 0;
-            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
-            assert_hvf_ok(ret);
-        }
-    }
-
-    if (!slot->size) {
-        return 0;
-    }
-
-    macslot->present = 1;
-    macslot->gpa_start = slot->start;
-    macslot->size = slot->size;
-    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
-    assert_hvf_ok(ret);
-    return 0;
-}
-
-void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
-{
-    hvf_slot *mem;
-    MemoryRegion *area = section->mr;
-    bool writeable = !area->readonly && !area->rom_device;
-    hv_memory_flags_t flags;
-
-    if (!memory_region_is_ram(area)) {
-        if (writeable) {
-            return;
-        } else if (!memory_region_is_romd(area)) {
-            /*
-             * If the memory device is not in romd_mode, then we actually want
-             * to remove the hvf memory slot so all accesses will trap.
-             */
-             add = false;
-        }
-    }
-
-    mem = hvf_find_overlap_slot(
-            section->offset_within_address_space,
-            int128_get64(section->size));
-
-    if (mem && add) {
-        if (mem->size == int128_get64(section->size) &&
-            mem->start == section->offset_within_address_space &&
-            mem->mem == (memory_region_get_ram_ptr(area) +
-            section->offset_within_region)) {
-            return; /* Same region was attempted to register, go away. */
-        }
-    }
-
-    /* Region needs to be reset. set the size to 0 and remap it. */
-    if (mem) {
-        mem->size = 0;
-        if (do_hvf_set_memory(mem, 0)) {
-            error_report("Failed to reset overlapping slot");
-            abort();
-        }
-    }
-
-    if (!add) {
-        return;
-    }
-
-    if (area->readonly ||
-        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
-        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
-    } else {
-        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
-    }
-
-    /* Now make a new slot. */
-    int x;
-
-    for (x = 0; x < hvf_state->num_slots; ++x) {
-        mem = &hvf_state->slots[x];
-        if (!mem->size) {
-            break;
-        }
-    }
-
-    if (x == hvf_state->num_slots) {
-        error_report("No free slots");
-        abort();
-    }
-
-    mem->size = int128_get64(section->size);
-    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
-    mem->start = section->offset_within_address_space;
-    mem->region = area;
-
-    if (do_hvf_set_memory(mem, flags)) {
-        error_report("Error registering new memory slot");
-        abort();
-    }
-}
-
 void vmx_update_tpr(CPUState *cpu)
 {
     /* TODO: need integrate APIC handling */
@@ -XXX,XX +XXX,XX @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
     }
 }
 
-static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
-{
-    if (!cpu->vcpu_dirty) {
-        hvf_get_registers(cpu);
-        cpu->vcpu_dirty = true;
-    }
-}
-
-void hvf_cpu_synchronize_state(CPUState *cpu)
-{
-    if (!cpu->vcpu_dirty) {
-        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
-    }
-}
-
-static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
-                                              run_on_cpu_data arg)
-{
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
-}
-
-void hvf_cpu_synchronize_post_reset(CPUState *cpu)
-{
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
-                                             run_on_cpu_data arg)
-{
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
-}
-
-void hvf_cpu_synchronize_post_init(CPUState *cpu)
-{
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
-                                              run_on_cpu_data arg)
-{
-    cpu->vcpu_dirty = true;
-}
-
-void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
-{
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
-}
-
 static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
 {
     int read, write;
@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
     return false;
 }
 
-static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
-{
-    hvf_slot *slot;
-
-    slot = hvf_find_overlap_slot(
-            section->offset_within_address_space,
-            int128_get64(section->size));
-
-    /* protect region against writes; begin tracking it */
-    if (on) {
-        slot->flags |= HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
-                      HV_MEMORY_READ);
-    /* stop tracking region*/
-    } else {
-        slot->flags &= ~HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
-                      HV_MEMORY_READ | HV_MEMORY_WRITE);
-    }
-}
-
-static void hvf_log_start(MemoryListener *listener,
-                          MemoryRegionSection *section, int old, int new)
-{
-    if (old != 0) {
-        return;
-    }
-
-    hvf_set_dirty_tracking(section, 1);
-}
-
-static void hvf_log_stop(MemoryListener *listener,
-                         MemoryRegionSection *section, int old, int new)
-{
-    if (new != 0) {
-        return;
-    }
-
-    hvf_set_dirty_tracking(section, 0);
-}
-
-static void hvf_log_sync(MemoryListener *listener,
-                         MemoryRegionSection *section)
-{
-    /*
-     * sync of dirty pages is handled elsewhere; just make sure we keep
-     * tracking the region.
-     */
-    hvf_set_dirty_tracking(section, 1);
-}
-
-static void hvf_region_add(MemoryListener *listener,
-                           MemoryRegionSection *section)
-{
-    hvf_set_phys_mem(section, true);
-}
-
-static void hvf_region_del(MemoryListener *listener,
-                           MemoryRegionSection *section)
-{
-    hvf_set_phys_mem(section, false);
-}
-
-static MemoryListener hvf_memory_listener = {
-    .priority = 10,
-    .region_add = hvf_region_add,
-    .region_del = hvf_region_del,
-    .log_start = hvf_log_start,
-    .log_stop = hvf_log_stop,
-    .log_sync = hvf_log_sync,
-};
-
 void hvf_vcpu_destroy(CPUState *cpu)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
@@ -XXX,XX +XXX,XX @@ void hvf_vcpu_destroy(CPUState *cpu)
     assert_hvf_ok(ret);
 }
 
-static void dummy_signal(int sig)
-{
-}
-
 static void init_tsc_freq(CPUX86State *env)
 {
     size_t length;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
 
     return ret;
 }
-
-bool hvf_allowed;
-
-static int hvf_accel_init(MachineState *ms)
-{
-    int x;
-    hv_return_t ret;
-    HVFState *s;
-
-    ret = hv_vm_create(HV_VM_DEFAULT);
-    assert_hvf_ok(ret);
-
-    s = g_new0(HVFState, 1);
- 
-    s->num_slots = 32;
-    for (x = 0; x < s->num_slots; ++x) {
-        s->slots[x].size = 0;
-        s->slots[x].slot_id = x;
-    }
-  
-    hvf_state = s;
-    memory_listener_register(&hvf_memory_listener, &address_space_memory);
-    return 0;
-}
-
-static void hvf_accel_class_init(ObjectClass *oc, void *data)
-{
-    AccelClass *ac = ACCEL_CLASS(oc);
-    ac->name = "HVF";
-    ac->init_machine = hvf_accel_init;
-    ac->allowed = &hvf_allowed;
-}
-
-static const TypeInfo hvf_accel_type = {
-    .name = TYPE_HVF_ACCEL,
-    .parent = TYPE_ACCEL,
-    .class_init = hvf_accel_class_init,
-};
-
-static void hvf_type_init(void)
-{
-    type_register_static(&hvf_accel_type);
-}
-
-type_init(hvf_type_init);
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch moves a few internal struct and constant defines from
target/i386/hvf/hvf-i386.h to include/sysemu/hvf_int.h.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-5-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h   | 30 ++++++++++++++++++++++++++++++
 target/i386/hvf/hvf-i386.h | 31 +------------------------------
 2 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@
 
 #include <Hypervisor/hv.h>
 
+/* hvf_slot flags */
+#define HVF_SLOT_LOG (1 << 0)
+
+typedef struct hvf_slot {
+    uint64_t start;
+    uint64_t size;
+    uint8_t *mem;
+    int slot_id;
+    uint32_t flags;
+    MemoryRegion *region;
+} hvf_slot;
+
+typedef struct hvf_vcpu_caps {
+    uint64_t vmx_cap_pinbased;
+    uint64_t vmx_cap_procbased;
+    uint64_t vmx_cap_procbased2;
+    uint64_t vmx_cap_entry;
+    uint64_t vmx_cap_exit;
+    uint64_t vmx_cap_preemption_timer;
+} hvf_vcpu_caps;
+
+struct HVFState {
+    AccelState parent;
+    hvf_slot slots[32];
+    int num_slots;
+
+    hvf_vcpu_caps *hvf_caps;
+};
+extern HVFState *hvf_state;
+
 void hvf_set_phys_mem(MemoryRegionSection *, bool);
 void assert_hvf_ok(hv_return_t ret);
 hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf-i386.h
+++ b/target/i386/hvf/hvf-i386.h
@@ -XXX,XX +XXX,XX @@
 
 #include "qemu/accel.h"
 #include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 #include "cpu.h"
 #include "x86.h"
 
-/* hvf_slot flags */
-#define HVF_SLOT_LOG (1 << 0)
-
-typedef struct hvf_slot {
-    uint64_t start;
-    uint64_t size;
-    uint8_t *mem;
-    int slot_id;
-    uint32_t flags;
-    MemoryRegion *region;
-} hvf_slot;
-
-typedef struct hvf_vcpu_caps {
-    uint64_t vmx_cap_pinbased;
-    uint64_t vmx_cap_procbased;
-    uint64_t vmx_cap_procbased2;
-    uint64_t vmx_cap_entry;
-    uint64_t vmx_cap_exit;
-    uint64_t vmx_cap_preemption_timer;
-} hvf_vcpu_caps;
-
-struct HVFState {
-    AccelState parent;
-    hvf_slot slots[32];
-    int num_slots;
-
-    hvf_vcpu_caps *hvf_caps;
-};
-extern HVFState *hvf_state;
-
 void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
 
 #ifdef NEED_CPU_H
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

The hvf_set_phys_mem() function is only called within the same file.
Make it static.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-6-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h  | 1 -
 accel/hvf/hvf-accel-ops.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

From: Alexander Graf <agraf@csgraf.de>

The ARM version of Hypervisor.framework no longer defines the
hv_uvaddr_t and hv_gpaddr_t types, so let's just use standard types
instead.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-7-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
     macslot->present = 1;
     macslot->gpa_start = slot->start;
     macslot->size = slot->size;
-    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
+    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
     assert_hvf_ok(ret);
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
     /* protect region against writes; begin tracking it */
     if (on) {
         slot->flags |= HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
                       HV_MEMORY_READ);
     /* stop tracking region*/
     } else {
         slot->flags &= ~HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
                       HV_MEMORY_READ | HV_MEMORY_WRITE);
     }
 }
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch splits the vcpu init and destroy functions into a generic and
an architecture specific portion. This also allows us to move the generic
functions into the generic hvf code, removing exported functions.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-8-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h |  2 --
 include/sysemu/hvf_int.h  |  2 ++
 accel/hvf/hvf-accel-ops.c | 30 ++++++++++++++++++++++++++++++
 target/i386/hvf/hvf.c     | 23 ++---------------------
 4 files changed, 34 insertions(+), 23 deletions(-)
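
Note: the generic/arch split described above can be sketched in isolation
as follows. This is a minimal illustration, not the QEMU code: DemoCPU,
the mock_vcpu_* calls and the demo_* names are hypothetical stand-ins for
CPUState, the Hypervisor.framework calls and the hvf_* functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for CPUState; only the fields the sketch needs. */
typedef struct DemoCPU {
    int vcpu_id;      /* handle returned by the mock hypervisor */
    int vcpu_dirty;   /* set once the vCPU exists */
    void *arch_buf;   /* resource owned by the architecture-specific code */
} DemoCPU;

/* Mock hypervisor calls (assumption): they only count live vCPUs. */
static int live_vcpus;

static int mock_vcpu_create(int *id)
{
    *id = ++live_vcpus;
    return 0;
}

static int mock_vcpu_destroy(int id)
{
    (void)id;
    live_vcpus--;
    return 0;
}

/* Architecture-specific hooks: one implementation per target. */
static int demo_arch_init_vcpu(DemoCPU *cpu)
{
    cpu->arch_buf = malloc(4096);
    return cpu->arch_buf ? 0 : -1;
}

static void demo_arch_vcpu_destroy(DemoCPU *cpu)
{
    free(cpu->arch_buf);
    cpu->arch_buf = NULL;
}

/* Generic part: creates the vCPU, then hands over to the arch hook. */
static int demo_init_vcpu(DemoCPU *cpu)
{
    int r = mock_vcpu_create(&cpu->vcpu_id);

    assert(r == 0);
    cpu->vcpu_dirty = 1;
    return demo_arch_init_vcpu(cpu);
}

/* Generic part: destroys the vCPU, then lets the arch hook clean up. */
static void demo_vcpu_destroy(DemoCPU *cpu)
{
    int r = mock_vcpu_destroy(cpu->vcpu_id);

    assert(r == 0);
    demo_arch_vcpu_destroy(cpu);
}
```

Only the two demo_arch_* hooks would need a per-target implementation in
this sketch; the generic wrappers can stay static in one file.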

diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.h
+++ b/accel/hvf/hvf-accel-ops.h
@@ -XXX,XX +XXX,XX @@
 
 #include "sysemu/cpus.h"
 
-int hvf_init_vcpu(CPUState *);
 int hvf_vcpu_exec(CPUState *);
 void hvf_cpu_synchronize_state(CPUState *);
 void hvf_cpu_synchronize_post_reset(CPUState *);
 void hvf_cpu_synchronize_post_init(CPUState *);
 void hvf_cpu_synchronize_pre_loadvm(CPUState *);
-void hvf_vcpu_destroy(CPUState *);
 
 #endif /* HVF_CPUS_H */
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
 extern HVFState *hvf_state;
 
 void assert_hvf_ok(hv_return_t ret);
+int hvf_arch_init_vcpu(CPUState *cpu);
+void hvf_arch_vcpu_destroy(CPUState *cpu);
 hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
 int hvf_put_registers(CPUState *);
 int hvf_get_registers(CPUState *);
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static void hvf_type_init(void)
 
 type_init(hvf_type_init);
 
+static void hvf_vcpu_destroy(CPUState *cpu)
+{
+    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
+    assert_hvf_ok(ret);
+
+    hvf_arch_vcpu_destroy(cpu);
+}
+
+static int hvf_init_vcpu(CPUState *cpu)
+{
+    int r;
+
+    /* init cpu signals */
+    sigset_t set;
+    struct sigaction sigact;
+
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = dummy_signal;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    pthread_sigmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+
+    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
+    cpu->vcpu_dirty = 1;
+    assert_hvf_ok(r);
+
+    return hvf_arch_init_vcpu(cpu);
+}
+
 /*
  * The HVF-specific vCPU thread function. This one should only run when the host
  * CPU supports the VMX "unrestricted guest" feature.
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
     return false;
 }
 
-void hvf_vcpu_destroy(CPUState *cpu)
+void hvf_arch_vcpu_destroy(CPUState *cpu)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
     CPUX86State *env = &x86_cpu->env;
 
-    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
     g_free(env->hvf_mmio_buf);
-    assert_hvf_ok(ret);
 }
 
 static void init_tsc_freq(CPUX86State *env)
@@ -XXX,XX +XXX,XX @@ static inline bool apic_bus_freq_is_known(CPUX86State *env)
     return env->apic_bus_freq != 0;
 }
 
-int hvf_init_vcpu(CPUState *cpu)
+int hvf_arch_init_vcpu(CPUState *cpu)
 {
-
     X86CPU *x86cpu = X86_CPU(cpu);
     CPUX86State *env = &x86cpu->env;
-    int r;
-
-    /* init cpu signals */
-    sigset_t set;
-    struct sigaction sigact;
-
-    memset(&sigact, 0, sizeof(sigact));
-    sigact.sa_handler = dummy_signal;
-    sigaction(SIG_IPI, &sigact, NULL);
-
-    pthread_sigmask(SIG_BLOCK, NULL, &set);
-    sigdelset(&set, SIG_IPI);
 
     init_emu();
     init_decoder();
@@ -XXX,XX +XXX,XX @@ int hvf_init_vcpu(CPUState *cpu)
         }
     }
 
-    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
-    cpu->vcpu_dirty = 1;
-    assert_hvf_ok(r);
-
     if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
         &hvf_state->hvf_caps->vmx_cap_pinbased)) {
         abort();
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

There is no reason to call the hvf specific hvf_cpu_synchronize_state()
when we can just use the generic cpu_synchronize_state() instead. This
allows us to have less dependency on internal function definitions and
allows us to make hvf_cpu_synchronize_state() static.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-9-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h | 1 -
 accel/hvf/hvf-accel-ops.c | 2 +-
 target/i386/hvf/x86hvf.c  | 9 ++++-----
 3 files changed, 5 insertions(+), 7 deletions(-)
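
Note: a rough sketch of why callers can go through a generic entry point
instead of the accelerator-specific one. It assumes a simple ops table;
DemoCPU, DemoAccelOps and the demo_* functions are hypothetical names for
illustration and do not mirror the real QEMU structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUState. */
typedef struct DemoCPU {
    bool vcpu_dirty;    /* true once the local register copy is authoritative */
    int regs_fetched;   /* counts how often registers were read back */
} DemoCPU;

/* Hypothetical per-accelerator callback table. */
typedef struct DemoAccelOps {
    void (*synchronize_state)(DemoCPU *cpu);
} DemoAccelOps;

/* Accelerator-specific implementation; static, so not visible to callers. */
static void demo_hvf_synchronize_state(DemoCPU *cpu)
{
    if (!cpu->vcpu_dirty) {
        cpu->regs_fetched++;    /* stands in for fetching the registers */
        cpu->vcpu_dirty = true;
    }
}

static const DemoAccelOps demo_hvf_ops = {
    .synchronize_state = demo_hvf_synchronize_state,
};

/* The accelerator selected at startup (assumption: exactly one is active). */
static const DemoAccelOps *demo_current_ops = &demo_hvf_ops;

/* Generic entry point: callers need no accelerator-specific header. */
void demo_cpu_synchronize_state(DemoCPU *cpu)
{
    if (demo_current_ops && demo_current_ops->synchronize_state) {
        demo_current_ops->synchronize_state(cpu);
    }
}
```

Calling the generic function twice fetches the registers only once,
because the first call marks the vCPU state dirty.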

From: Alexander Graf <agraf@csgraf.de>

The hvf accel synchronize functions are only used as input for local
callback functions, so we can make them static.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-10-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h | 3 ---
 accel/hvf/hvf-accel-ops.c | 6 +++---
 2 files changed, 3 insertions(+), 6 deletions(-)

From: Alexander Graf <agraf@csgraf.de>

We can move the declaration of hvf_vcpu_exec() into our internal
hvf header, removing the need for hvf-accel-ops.h.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-11-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h | 17 -----------------
 include/sysemu/hvf_int.h  |  1 +
 accel/hvf/hvf-accel-ops.c |  2 --
 target/i386/hvf/hvf.c     |  2 --
 4 files changed, 1 insertion(+), 21 deletions(-)
 delete mode 100644 accel/hvf/hvf-accel-ops.h

diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
deleted file mode 100644
index XXXXXXX..XXXXXXX
--- a/accel/hvf/hvf-accel-ops.h
+++ /dev/null
@@ -XXX,XX +XXX,XX @@
-/*
- * Accelerator CPUS Interface
- *
- * Copyright 2020 SUSE LLC
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- */
-
-#ifndef HVF_CPUS_H
-#define HVF_CPUS_H
-
-#include "sysemu/cpus.h"
-
-int hvf_vcpu_exec(CPUState *);
-
-#endif /* HVF_CPUS_H */
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ extern HVFState *hvf_state;
 void assert_hvf_ok(hv_return_t ret);
 int hvf_arch_init_vcpu(CPUState *cpu);
 void hvf_arch_vcpu_destroy(CPUState *cpu);
+int hvf_vcpu_exec(CPUState *);
 hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
 int hvf_put_registers(CPUState *);
 int hvf_get_registers(CPUState *);
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/runstate.h"
 #include "qemu/guest-random.h"
 
-#include "hvf-accel-ops.h"
-
 HVFState *hvf_state;
 
 /* Memory slots */
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/accel.h"
 #include "target/i386/cpu.h"
 
-#include "hvf-accel-ops.h"
-
 void vmx_update_tpr(CPUState *cpu)
 {
     /* TODO: need integrate APIC handling */
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

We will need more than a single field for hvf going forward. To keep
the global vcpu struct uncluttered, let's allocate a special hvf vcpu
struct, similar to how hax does it.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-12-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/core/cpu.h       |   3 +-
 include/sysemu/hvf_int.h    |   4 +
 target/i386/hvf/vmx.h       |  24 +++--
 accel/hvf/hvf-accel-ops.c   |   8 +-
 target/i386/hvf/hvf.c       | 104 +++++++++---------
 target/i386/hvf/x86.c       |  28 ++---
 target/i386/hvf/x86_descr.c |  26 ++---
 target/i386/hvf/x86_emu.c   |  62 +++++------
 target/i386/hvf/x86_mmu.c   |   4 +-
 target/i386/hvf/x86_task.c  |  12 +--
 target/i386/hvf/x86hvf.c    | 210 ++++++++++++++++++------------------
 11 files changed, 248 insertions(+), 237 deletions(-)
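
Note: the pattern used here, reduced to a standalone sketch. The generic
CPU struct keeps a single pointer to an accelerator-private struct, which
is allocated at vCPU init and freed at destroy. All names below (DemoCPU,
demo_accel_vcpu_state, demo_accel_*) are hypothetical and chosen only for
this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Accelerator-private per-vCPU state. New fields can be added here
 * later without touching the generic CPU struct.
 */
struct demo_accel_vcpu_state {
    int fd;
};

/* Hypothetical generic CPU struct: it only carries the pointer. */
typedef struct DemoCPU {
    struct demo_accel_vcpu_state *accel;
} DemoCPU;

/* Allocate the zero-initialized private state and record the handle. */
static int demo_accel_init_vcpu(DemoCPU *cpu, int fd)
{
    cpu->accel = calloc(1, sizeof(*cpu->accel));
    if (!cpu->accel) {
        return -1;
    }
    cpu->accel->fd = fd;
    return 0;
}

/* Free the private state and clear the pointer to catch stale use. */
static void demo_accel_vcpu_destroy(DemoCPU *cpu)
{
    free(cpu->accel);
    cpu->accel = NULL;
}
```

Code that only needs the pointer can get by with a forward declaration of
the struct; the full definition stays in the accelerator's own header.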

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -XXX,XX +XXX,XX @@ struct KVMState;
 struct kvm_run;
 
 struct hax_vcpu_state;
+struct hvf_vcpu_state;
 
 #define TB_JMP_CACHE_BITS 12
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
@@ -XXX,XX +XXX,XX @@ struct CPUState {
 
     struct hax_vcpu_state *hax_vcpu;
 
-    int hvf_fd;
+    struct hvf_vcpu_state *hvf;
 
     /* track IOMMUs whose translations we've cached in the TCG TLB */
     GArray *iommu_notifiers;
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
 };
 extern HVFState *hvf_state;
 
+struct hvf_vcpu_state {
+    int fd;
+};
+
 void assert_hvf_ok(hv_return_t ret);
 int hvf_arch_init_vcpu(CPUState *cpu);
 void hvf_arch_vcpu_destroy(CPUState *cpu);
diff --git a/target/i386/hvf/vmx.h b/target/i386/hvf/vmx.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/vmx.h
+++ b/target/i386/hvf/vmx.h
@@ -XXX,XX +XXX,XX @@
 #include "vmcs.h"
 #include "cpu.h"
 #include "x86.h"
+#include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 
 #include "exec/address-spaces.h"
 
@@ -XXX,XX +XXX,XX @@ static inline void macvm_set_rip(CPUState *cpu, uint64_t rip)
     uint64_t val;
 
     /* BUG, should take considering overlap.. */
-    wreg(cpu->hvf_fd, HV_X86_RIP, rip);
+    wreg(cpu->hvf->fd, HV_X86_RIP, rip);
     env->eip = rip;
 
     /* after moving forward in rip, we need to clean INTERRUPTABILITY */
-   val = rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
+   val = rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
    if (val & (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
                VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
         env->hflags &= ~HF_INHIBIT_IRQ_MASK;
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY,
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY,
                val & ~(VMCS_INTERRUPTIBILITY_STI_BLOCKING |
                VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING));
    }
@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_blocking(CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     env->hflags2 &= ~HF2_NMI_MASK;
-    uint32_t gi = (uint32_t) rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
+    uint32_t gi = (uint32_t) rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
     gi &= ~VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
-    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
+    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
 }
 
 static inline void vmx_set_nmi_blocking(CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ static inline void vmx_set_nmi_blocking(CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     env->hflags2 |= HF2_NMI_MASK;
-    uint32_t gi = (uint32_t)rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
+    uint32_t gi = (uint32_t)rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
     gi |= VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
-    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
+    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
 }
 
 static inline void vmx_set_nmi_window_exiting(CPUState *cpu)
 {
     uint64_t val;
-    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
+    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
           VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
 
 }
@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_window_exiting(CPUState *cpu)
 {
 
     uint64_t val;
-    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
+    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
           ~VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
 }
 
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ type_init(hvf_type_init);
 
 static void hvf_vcpu_destroy(CPUState *cpu)
 {
-    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
+    hv_return_t ret = hv_vcpu_destroy(cpu->hvf->fd);
     assert_hvf_ok(ret);
 
     hvf_arch_vcpu_destroy(cpu);
+    g_free(cpu->hvf);
+    cpu->hvf = NULL;
 }
 
 static int hvf_init_vcpu(CPUState *cpu)
 {
     int r;
 
+    cpu->hvf = g_malloc0(sizeof(*cpu->hvf));
+
     /* init cpu signals */
     sigset_t set;
     struct sigaction sigact;
@@ -XXX,XX +XXX,XX @@ static int hvf_init_vcpu(CPUState *cpu)
     pthread_sigmask(SIG_BLOCK, NULL, &set);
     sigdelset(&set, SIG_IPI);
 
-    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
+    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf->fd, HV_VCPU_DEFAULT);
     cpu->vcpu_dirty = 1;
     assert_hvf_ok(r);
 
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
     int tpr = cpu_get_apic_tpr(x86_cpu->apic_state) << 4;
     int irr = apic_get_highest_priority_irr(x86_cpu->apic_state);
 
-    wreg(cpu->hvf_fd, HV_X86_TPR, tpr);
+    wreg(cpu->hvf->fd, HV_X86_TPR, tpr);
     if (irr == -1) {
-        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
+        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
     } else {
-        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
+        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
               irr >> 4);
     }
 }
@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
 static void update_apic_tpr(CPUState *cpu)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
-    int tpr = rreg(cpu->hvf_fd, HV_X86_TPR) >> 4;
+    int tpr = rreg(cpu->hvf->fd, HV_X86_TPR) >> 4;
     cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
 }
 
@@ -XXX,XX +XXX,XX @@ int hvf_arch_init_vcpu(CPUState *cpu)
     }
 
     /* set VMCS control fields */
-    wvmcs(cpu->hvf_fd, VMCS_PIN_BASED_CTLS,
+    wvmcs(cpu->hvf->fd, VMCS_PIN_BASED_CTLS,
           cap2ctrl(hvf_state->hvf_caps->vmx_cap_pinbased,
           VMCS_PIN_BASED_CTLS_EXTINT |
           VMCS_PIN_BASED_CTLS_NMI |
           VMCS_PIN_BASED_CTLS_VNMI));
-    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS,
+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS,
           cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased,
           VMCS_PRI_PROC_BASED_CTLS_HLT |
           VMCS_PRI_PROC_BASED_CTLS_MWAIT |
           VMCS_PRI_PROC_BASED_CTLS_TSC_OFFSET |
           VMCS_PRI_PROC_BASED_CTLS_TPR_SHADOW) |
           VMCS_PRI_PROC_BASED_CTLS_SEC_CONTROL);
-    wvmcs(cpu->hvf_fd, VMCS_SEC_PROC_BASED_CTLS,
+    wvmcs(cpu->hvf->fd, VMCS_SEC_PROC_BASED_CTLS,
           cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased2,
                    VMCS_PRI_PROC_BASED2_CTLS_APIC_ACCESSES));
 
-    wvmcs(cpu->hvf_fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
+    wvmcs(cpu->hvf->fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
           0));
-    wvmcs(cpu->hvf_fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
+    wvmcs(cpu->hvf->fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
 
-    wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
+    wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
 
     x86cpu = X86_CPU(cpu);
     x86cpu->env.xsave_buf = qemu_memalign(4096, 4096);
 
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_STAR, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_LSTAR, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_CSTAR, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FMASK, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FSBASE, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_GSBASE, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_KERNELGSBASE, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_TSC_AUX, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_TSC, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_CS, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_EIP, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_ESP, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_STAR, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_LSTAR, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_CSTAR, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FMASK, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FSBASE, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_GSBASE, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_KERNELGSBASE, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_TSC_AUX, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_TSC, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_CS, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_EIP, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_ESP, 1);
 
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ static void hvf_store_events(CPUState *cpu, uint32_t ins_len, uint64_t idtvec_in
         }
         if (idtvec_info & VMCS_IDT_VEC_ERRCODE_VALID) {
             env->has_error_code = true;
-            env->error_code = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_ERROR);
+            env->error_code = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_ERROR);
         }
     }
-    if ((rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
+    if ((rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
         VMCS_INTERRUPTIBILITY_NMI_BLOCKING)) {
         env->hflags2 |= HF2_NMI_MASK;
     } else {
         env->hflags2 &= ~HF2_NMI_MASK;
     }
-    if (rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
+    if (rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
          (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
          VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
         env->hflags |= HF_INHIBIT_IRQ_MASK;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             return EXCP_HLT;
         }
 
-        hv_return_t r  = hv_vcpu_run(cpu->hvf_fd);
+        hv_return_t r  = hv_vcpu_run(cpu->hvf->fd);
         assert_hvf_ok(r);
 
         /* handle VMEXIT */
-        uint64_t exit_reason = rvmcs(cpu->hvf_fd, VMCS_EXIT_REASON);
-        uint64_t exit_qual = rvmcs(cpu->hvf_fd, VMCS_EXIT_QUALIFICATION);
-        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf_fd,
+        uint64_t exit_reason = rvmcs(cpu->hvf->fd, VMCS_EXIT_REASON);
+        uint64_t exit_qual = rvmcs(cpu->hvf->fd, VMCS_EXIT_QUALIFICATION);
+        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf->fd,
                                            VMCS_EXIT_INSTRUCTION_LENGTH);
 
-        uint64_t idtvec_info = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
+        uint64_t idtvec_info = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
 
         hvf_store_events(cpu, ins_len, idtvec_info);
-        rip = rreg(cpu->hvf_fd, HV_X86_RIP);
-        env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
+        rip = rreg(cpu->hvf->fd, HV_X86_RIP);
+        env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
 
         qemu_mutex_lock_iothread();
 
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
         case EXIT_REASON_EPT_FAULT:
         {
             hvf_slot *slot;
-            uint64_t gpa = rvmcs(cpu->hvf_fd, VMCS_GUEST_PHYSICAL_ADDRESS);
+            uint64_t gpa = rvmcs(cpu->hvf->fd, VMCS_GUEST_PHYSICAL_ADDRESS);
 
             if (((idtvec_info & VMCS_IDT_VEC_VALID) == 0) &&
                 ((exit_qual & EXIT_QUAL_NMIUDTI) != 0)) {
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
                 store_regs(cpu);
                 break;
             } else if (!string && !in) {
-                RAX(env) = rreg(cpu->hvf_fd, HV_X86_RAX);
+                RAX(env) = rreg(cpu->hvf->fd, HV_X86_RAX);
                 hvf_handle_io(env, port, &RAX(env), 1, size, 1);
                 macvm_set_rip(cpu, rip + ins_len);
                 break;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             break;
         }
         case EXIT_REASON_CPUID: {
-            uint32_t rax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
-            uint32_t rbx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RBX);
-            uint32_t rcx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
-            uint32_t rdx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
+            uint32_t rax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
+            uint32_t rbx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RBX);
+            uint32_t rcx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
+            uint32_t rdx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
 
             if (rax == 1) {
                 /* CPUID1.ecx.OSXSAVE needs to know CR4 */
-                env->cr[4] = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
+                env->cr[4] = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
             }
             hvf_cpu_x86_cpuid(env, rax, rcx, &rax, &rbx, &rcx, &rdx);
 
-            wreg(cpu->hvf_fd, HV_X86_RAX, rax);
-            wreg(cpu->hvf_fd, HV_X86_RBX, rbx);
-            wreg(cpu->hvf_fd, HV_X86_RCX, rcx);
-            wreg(cpu->hvf_fd, HV_X86_RDX, rdx);
+            wreg(cpu->hvf->fd, HV_X86_RAX, rax);
+            wreg(cpu->hvf->fd, HV_X86_RBX, rbx);
+            wreg(cpu->hvf->fd, HV_X86_RCX, rcx);
+            wreg(cpu->hvf->fd, HV_X86_RDX, rdx);
 
             macvm_set_rip(cpu, rip + ins_len);
             break;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
         case EXIT_REASON_XSETBV: {
             X86CPU *x86_cpu = X86_CPU(cpu);
             CPUX86State *env = &x86_cpu->env;
-            uint32_t eax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
-            uint32_t ecx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
-            uint32_t edx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
+            uint32_t eax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
+            uint32_t ecx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
+            uint32_t edx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
 
             if (ecx) {
                 macvm_set_rip(cpu, rip + ins_len);
                 break;
             }
             env->xcr0 = ((uint64_t)edx << 32) | eax;
-            wreg(cpu->hvf_fd, HV_X86_XCR0, env->xcr0 | 1);
+            wreg(cpu->hvf->fd, HV_X86_XCR0, env->xcr0 | 1);
             macvm_set_rip(cpu, rip + ins_len);
             break;
         }
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
 
             switch (cr) {
             case 0x0: {
-                macvm_set_cr0(cpu->hvf_fd, RRX(env, reg));
+                macvm_set_cr0(cpu->hvf->fd, RRX(env, reg));
                 break;
             }
             case 4: {
-                macvm_set_cr4(cpu->hvf_fd, RRX(env, reg));
+                macvm_set_cr4(cpu->hvf->fd, RRX(env, reg));
                 break;
             }
             case 8: {
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             break;
         }
         case EXIT_REASON_TASK_SWITCH: {
-            uint64_t vinfo = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
+            uint64_t vinfo = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
             x68_segment_selector sel = {.sel = exit_qual & 0xffff};
             vmx_handle_task_switch(cpu, sel, (exit_qual >> 30) & 0x3,
              vinfo & VMCS_INTR_VALID, vinfo & VECTORING_INFO_VECTOR_MASK, vinfo
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             break;
         }
         case EXIT_REASON_RDPMC:
-            wreg(cpu->hvf_fd, HV_X86_RAX, 0);
-            wreg(cpu->hvf_fd, HV_X86_RDX, 0);
+            wreg(cpu->hvf->fd, HV_X86_RAX, 0);
+            wreg(cpu->hvf->fd, HV_X86_RDX, 0);
             macvm_set_rip(cpu, rip + ins_len);
             break;
         case VMX_REASON_VMCALL:
diff --git a/target/i386/hvf/x86.c b/target/i386/hvf/x86.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86.c
+++ b/target/i386/hvf/x86.c
@@ -XXX,XX +XXX,XX @@ bool x86_read_segment_descriptor(struct CPUState *cpu,
     }
 
     if (GDT_SEL == sel.ti) {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
     } else {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
     }
 
     if (sel.index * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
     uint32_t limit;
     
     if (GDT_SEL == sel.ti) {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
     } else {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
     }
     
     if (sel.index * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
 bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
                         int gate)
 {
-    target_ulong base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_BASE);
-    uint32_t limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
+    target_ulong base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_BASE);
+    uint32_t limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
 
     memset(idt_desc, 0, sizeof(*idt_desc));
     if (gate * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
 
 bool x86_is_protected(struct CPUState *cpu)
 {
-    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
+    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
     return cr0 & CR0_PE;
 }
 
@@ -XXX,XX +XXX,XX @@ bool x86_is_v8086(struct CPUState *cpu)
 
 bool x86_is_long_mode(struct CPUState *cpu)
 {
-    return rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
+    return rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
 }
 
 bool x86_is_long64_mode(struct CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ bool x86_is_long64_mode(struct CPUState *cpu)
 
 bool x86_is_paging_mode(struct CPUState *cpu)
 {
-    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
+    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
     return cr0 & CR0_PG;
 }
 
 bool x86_is_pae_enabled(struct CPUState *cpu)
 {
-    uint64_t cr4 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
+    uint64_t cr4 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
     return cr4 & CR4_PAE;
 }
 
diff --git a/target/i386/hvf/x86_descr.c b/target/i386/hvf/x86_descr.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_descr.c
+++ b/target/i386/hvf/x86_descr.c
@@ -XXX,XX +XXX,XX @@ static const struct vmx_segment_field {
 
 uint32_t vmx_read_segment_limit(CPUState *cpu, X86Seg seg)
 {
-    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
+    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
 }
 
 uint32_t vmx_read_segment_ar(CPUState *cpu, X86Seg seg)
 {
-    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
+    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
 }
 
 uint64_t vmx_read_segment_base(CPUState *cpu, X86Seg seg)
 {
-    return rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
+    return rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
 }
 
 x68_segment_selector vmx_read_segment_selector(CPUState *cpu, X86Seg seg)
 {
     x68_segment_selector sel;
-    sel.sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
+    sel.sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
     return sel;
 }
 
 void vmx_write_segment_selector(struct CPUState *cpu, x68_segment_selector selector, X86Seg seg)
 {
-    wvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector, selector.sel);
+    wvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector, selector.sel);
 }
 
 void vmx_read_segment_descriptor(struct CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
 {
-    desc->sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
-    desc->base = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
-    desc->limit = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
-    desc->ar = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
+    desc->sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
+    desc->base = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
+    desc->limit = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
+    desc->ar = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
 }
 
 void vmx_write_segment_descriptor(CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
 {
     const struct vmx_segment_field *sf = &vmx_segment_fields[seg];
 
-    wvmcs(cpu->hvf_fd, sf->base, desc->base);
-    wvmcs(cpu->hvf_fd, sf->limit, desc->limit);
-    wvmcs(cpu->hvf_fd, sf->selector, desc->sel);
-    wvmcs(cpu->hvf_fd, sf->ar_bytes, desc->ar);
+    wvmcs(cpu->hvf->fd, sf->base, desc->base);
+    wvmcs(cpu->hvf->fd, sf->limit, desc->limit);
+    wvmcs(cpu->hvf->fd, sf->selector, desc->sel);
+    wvmcs(cpu->hvf->fd, sf->ar_bytes, desc->ar);
 }
 
 void x86_segment_descriptor_to_vmx(struct CPUState *cpu, x68_segment_selector selector, struct x86_segment_descriptor *desc, struct vmx_segment *vmx_desc)
diff --git a/target/i386/hvf/x86_emu.c b/target/i386/hvf/x86_emu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_emu.c
+++ b/target/i386/hvf/x86_emu.c
@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
 
     switch (msr) {
     case MSR_IA32_TSC:
-        val = rdtscp() + rvmcs(cpu->hvf_fd, VMCS_TSC_OFFSET);
+        val = rdtscp() + rvmcs(cpu->hvf->fd, VMCS_TSC_OFFSET);
         break;
     case MSR_IA32_APICBASE:
         val = cpu_get_apic_base(X86_CPU(cpu)->apic_state);
@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
         val = x86_cpu->ucode_rev;
         break;
     case MSR_EFER:
-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER);
+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER);
         break;
     case MSR_FSBASE:
-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE);
+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE);
         break;
     case MSR_GSBASE:
-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE);
+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE);
         break;
     case MSR_KERNELGSBASE:
-        val = rvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE);
+        val = rvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE);
         break;
     case MSR_STAR:
         abort();
@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
         cpu_set_apic_base(X86_CPU(cpu)->apic_state, data);
         break;
     case MSR_FSBASE:
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE, data);
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE, data);
         break;
     case MSR_GSBASE:
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE, data);
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE, data);
         break;
     case MSR_KERNELGSBASE:
-        wvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE, data);
+        wvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE, data);
         break;
     case MSR_STAR:
         abort();
@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
         break;
     case MSR_EFER:
         /*printf("new efer %llx\n", EFER(cpu));*/
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER, data);
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER, data);
         if (data & MSR_EFER_NXE) {
-            hv_vcpu_invalidate_tlb(cpu->hvf_fd);
+            hv_vcpu_invalidate_tlb(cpu->hvf->fd);
         }
         break;
     case MSR_MTRRphysBase(0):
@@ -XXX,XX +XXX,XX @@ void load_regs(struct CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     int i = 0;
-    RRX(env, R_EAX) = rreg(cpu->hvf_fd, HV_X86_RAX);
-    RRX(env, R_EBX) = rreg(cpu->hvf_fd, HV_X86_RBX);
-    RRX(env, R_ECX) = rreg(cpu->hvf_fd, HV_X86_RCX);
-    RRX(env, R_EDX) = rreg(cpu->hvf_fd, HV_X86_RDX);
-    RRX(env, R_ESI) = rreg(cpu->hvf_fd, HV_X86_RSI);
-    RRX(env, R_EDI) = rreg(cpu->hvf_fd, HV_X86_RDI);
-    RRX(env, R_ESP) = rreg(cpu->hvf_fd, HV_X86_RSP);
-    RRX(env, R_EBP) = rreg(cpu->hvf_fd, HV_X86_RBP);
+    RRX(env, R_EAX) = rreg(cpu->hvf->fd, HV_X86_RAX);
+    RRX(env, R_EBX) = rreg(cpu->hvf->fd, HV_X86_RBX);
+    RRX(env, R_ECX) = rreg(cpu->hvf->fd, HV_X86_RCX);
+    RRX(env, R_EDX) = rreg(cpu->hvf->fd, HV_X86_RDX);
+    RRX(env, R_ESI) = rreg(cpu->hvf->fd, HV_X86_RSI);
+    RRX(env, R_EDI) = rreg(cpu->hvf->fd, HV_X86_RDI);
+    RRX(env, R_ESP) = rreg(cpu->hvf->fd, HV_X86_RSP);
+    RRX(env, R_EBP) = rreg(cpu->hvf->fd, HV_X86_RBP);
     for (i = 8; i < 16; i++) {
-        RRX(env, i) = rreg(cpu->hvf_fd, HV_X86_RAX + i);
+        RRX(env, i) = rreg(cpu->hvf->fd, HV_X86_RAX + i);
     }
 
-    env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
+    env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
     rflags_to_lflags(env);
-    env->eip = rreg(cpu->hvf_fd, HV_X86_RIP);
+    env->eip = rreg(cpu->hvf->fd, HV_X86_RIP);
 }
 
 void store_regs(struct CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ void store_regs(struct CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     int i = 0;
-    wreg(cpu->hvf_fd, HV_X86_RAX, RAX(env));
-    wreg(cpu->hvf_fd, HV_X86_RBX, RBX(env));
-    wreg(cpu->hvf_fd, HV_X86_RCX, RCX(env));
-    wreg(cpu->hvf_fd, HV_X86_RDX, RDX(env));
-    wreg(cpu->hvf_fd, HV_X86_RSI, RSI(env));
-    wreg(cpu->hvf_fd, HV_X86_RDI, RDI(env));
-    wreg(cpu->hvf_fd, HV_X86_RBP, RBP(env));
-    wreg(cpu->hvf_fd, HV_X86_RSP, RSP(env));
+    wreg(cpu->hvf->fd, HV_X86_RAX, RAX(env));
+    wreg(cpu->hvf->fd, HV_X86_RBX, RBX(env));
+    wreg(cpu->hvf->fd, HV_X86_RCX, RCX(env));
+    wreg(cpu->hvf->fd, HV_X86_RDX, RDX(env));
+    wreg(cpu->hvf->fd, HV_X86_RSI, RSI(env));
+    wreg(cpu->hvf->fd, HV_X86_RDI, RDI(env));
+    wreg(cpu->hvf->fd, HV_X86_RBP, RBP(env));
+    wreg(cpu->hvf->fd, HV_X86_RSP, RSP(env));
     for (i = 8; i < 16; i++) {
-        wreg(cpu->hvf_fd, HV_X86_RAX + i, RRX(env, i));
+        wreg(cpu->hvf->fd, HV_X86_RAX + i, RRX(env, i));
     }
 
     lflags_to_rflags(env);
-    wreg(cpu->hvf_fd, HV_X86_RFLAGS, env->eflags);
+    wreg(cpu->hvf->fd, HV_X86_RFLAGS, env->eflags);
     macvm_set_rip(cpu, env->eip);
 }
 
diff --git a/target/i386/hvf/x86_mmu.c b/target/i386/hvf/x86_mmu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_mmu.c
+++ b/target/i386/hvf/x86_mmu.c
@@ -XXX,XX +XXX,XX @@ static bool test_pt_entry(struct CPUState *cpu, struct gpt_translation *pt,
         pt->err_code |= MMU_PAGE_PT;
     }
 
-    uint32_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
+    uint32_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
     /* check protection */
     if (cr0 & CR0_WP) {
         if (pt->write_access && !pte_write_access(pte)) {
@@ -XXX,XX +XXX,XX @@ static bool walk_gpt(struct CPUState *cpu, target_ulong addr, int err_code,
 {
     int top_level, level;
     bool is_large = false;
-    target_ulong cr3 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR3);
+    target_ulong cr3 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR3);
     uint64_t page_mask = pae ? PAE_PTE_PAGE_MASK : LEGACY_PTE_PAGE_MASK;
     
     memset(pt, 0, sizeof(*pt));
diff --git a/target/i386/hvf/x86_task.c b/target/i386/hvf/x86_task.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_task.c
+++ b/target/i386/hvf/x86_task.c
@@ -XXX,XX +XXX,XX @@ static void load_state_from_tss32(CPUState *cpu, struct x86_tss_segment32 *tss)
     X86CPU *x86_cpu = X86_CPU(cpu);
     CPUX86State *env = &x86_cpu->env;
 
-    wvmcs(cpu->hvf_fd, VMCS_GUEST_CR3, tss->cr3);
+    wvmcs(cpu->hvf->fd, VMCS_GUEST_CR3, tss->cr3);
 
     env->eip = tss->eip;
     env->eflags = tss->eflags | 2;
@@ -XXX,XX +XXX,XX @@ static int task_switch_32(CPUState *cpu, x68_segment_selector tss_sel, x68_segme
 
 void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int reason, bool gate_valid, uint8_t gate, uint64_t gate_type)
 {
-    uint64_t rip = rreg(cpu->hvf_fd, HV_X86_RIP);
+    uint64_t rip = rreg(cpu->hvf->fd, HV_X86_RIP);
     if (!gate_valid || (gate_type != VMCS_INTR_T_HWEXCEPTION &&
                         gate_type != VMCS_INTR_T_HWINTR &&
                         gate_type != VMCS_INTR_T_NMI)) {
-        int ins_len = rvmcs(cpu->hvf_fd, VMCS_EXIT_INSTRUCTION_LENGTH);
+        int ins_len = rvmcs(cpu->hvf->fd, VMCS_EXIT_INSTRUCTION_LENGTH);
         macvm_set_rip(cpu, rip + ins_len);
         return;
     }
@@ -XXX,XX +XXX,XX @@ void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int rea
         //ret = task_switch_16(cpu, tss_sel, old_tss_sel, old_tss_base, &next_tss_desc);
         VM_PANIC("task_switch_16");
 
-    macvm_set_cr0(cpu->hvf_fd, rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0) | CR0_TS);
+    macvm_set_cr0(cpu->hvf->fd, rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0) | CR0_TS);
     x86_segment_descriptor_to_vmx(cpu, tss_sel, &next_tss_desc, &vmx_seg);
     vmx_write_segment_descriptor(cpu, &vmx_seg, R_TR);
 
     store_regs(cpu);
 
-    hv_vcpu_invalidate_tlb(cpu->hvf_fd);
-    hv_vcpu_flush(cpu->hvf_fd);
+    hv_vcpu_invalidate_tlb(cpu->hvf->fd);
+    hv_vcpu_flush(cpu->hvf->fd);
 }
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@ void hvf_put_xsave(CPUState *cpu_state)
 
     x86_cpu_xsave_all_areas(X86_CPU(cpu_state), xsave);
 
-    if (hv_vcpu_write_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
+    if (hv_vcpu_write_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
         abort();
     }
 }
@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
     CPUX86State *env = &X86_CPU(cpu_state)->env;
     struct vmx_segment seg;
     
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
 
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
 
-    /* wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR2, env->cr[2]); */
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3, env->cr[3]);
+    /* wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR2, env->cr[2]); */
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3, env->cr[3]);
     vmx_update_tpr(cpu_state);
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER, env->efer);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER, env->efer);
 
-    macvm_set_cr4(cpu_state->hvf_fd, env->cr[4]);
-    macvm_set_cr0(cpu_state->hvf_fd, env->cr[0]);
+    macvm_set_cr4(cpu_state->hvf->fd, env->cr[4]);
+    macvm_set_cr0(cpu_state->hvf->fd, env->cr[0]);
 
     hvf_set_segment(cpu_state, &seg, &env->segs[R_CS], false);
     vmx_write_segment_descriptor(cpu_state, &seg, R_CS);
@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
     hvf_set_segment(cpu_state, &seg, &env->ldt, false);
     vmx_write_segment_descriptor(cpu_state, &seg, R_LDTR);
     
-    hv_vcpu_flush(cpu_state->hvf_fd);
+    hv_vcpu_flush(cpu_state->hvf->fd);
 }
     
 void hvf_put_msrs(CPUState *cpu_state)
 {
     CPUX86State *env = &X86_CPU(cpu_state)->env;
 
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS,
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS,
                       env->sysenter_cs);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP,
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP,
                       env->sysenter_esp);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP,
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP,
                       env->sysenter_eip);
 
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_STAR, env->star);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_STAR, env->star);
 
 #ifdef TARGET_X86_64
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_CSTAR, env->cstar);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, env->kernelgsbase);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FMASK, env->fmask);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_LSTAR, env->lstar);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_CSTAR, env->cstar);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, env->kernelgsbase);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FMASK, env->fmask);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_LSTAR, env->lstar);
 #endif
 
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_GSBASE, env->segs[R_GS].base);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FSBASE, env->segs[R_FS].base);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_GSBASE, env->segs[R_GS].base);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FSBASE, env->segs[R_FS].base);
 }
 
 
@@ -XXX,XX +XXX,XX @@ void hvf_get_xsave(CPUState *cpu_state)
 
     xsave = X86_CPU(cpu_state)->env.xsave_buf;
 
-    if (hv_vcpu_read_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
+    if (hv_vcpu_read_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
         abort();
     }
 
@@ -XXX,XX +XXX,XX @@ void hvf_get_segments(CPUState *cpu_state)
     vmx_read_segment_descriptor(cpu_state, &seg, R_LDTR);
     hvf_get_segment(&env->ldt, &seg);
 
-    env->idt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
-    env->idt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE);
-    env->gdt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
-    env->gdt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE);
+    env->idt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
+    env->idt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE);
+    env->gdt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
+    env->gdt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE);
 
-    env->cr[0] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR0);
+    env->cr[0] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR0);
     env->cr[2] = 0;
-    env->cr[3] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3);
-    env->cr[4] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR4);
+    env->cr[3] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3);
+    env->cr[4] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR4);
     
-    env->efer = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER);
+    env->efer = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER);
 }
 
 void hvf_get_msrs(CPUState *cpu_state)
@@ -XXX,XX +XXX,XX @@ void hvf_get_msrs(CPUState *cpu_state)
     CPUX86State *env = &X86_CPU(cpu_state)->env;
     uint64_t tmp;
     
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS, &tmp);
     env->sysenter_cs = tmp;
     
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP, &tmp);
     env->sysenter_esp = tmp;
 
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP, &tmp);
     env->sysenter_eip = tmp;
 
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_STAR, &env->star);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_STAR, &env->star);
 
 #ifdef TARGET_X86_64
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_CSTAR, &env->cstar);
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, &env->kernelgsbase);
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_FMASK, &env->fmask);
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_LSTAR, &env->lstar);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_CSTAR, &env->cstar);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, &env->kernelgsbase);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_FMASK, &env->fmask);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_LSTAR, &env->lstar);
 #endif
 
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_APICBASE, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_APICBASE, &tmp);
     
-    env->tsc = rdtscp() + rvmcs(cpu_state->hvf_fd, VMCS_TSC_OFFSET);
+    env->tsc = rdtscp() + rvmcs(cpu_state->hvf->fd, VMCS_TSC_OFFSET);
 }
 
 int hvf_put_registers(CPUState *cpu_state)
@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
     X86CPU *x86cpu = X86_CPU(cpu_state);
     CPUX86State *env = &x86cpu->env;
 
-    wreg(cpu_state->hvf_fd, HV_X86_RAX, env->regs[R_EAX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RBX, env->regs[R_EBX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RCX, env->regs[R_ECX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RDX, env->regs[R_EDX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RBP, env->regs[R_EBP]);
-    wreg(cpu_state->hvf_fd, HV_X86_RSP, env->regs[R_ESP]);
-    wreg(cpu_state->hvf_fd, HV_X86_RSI, env->regs[R_ESI]);
-    wreg(cpu_state->hvf_fd, HV_X86_RDI, env->regs[R_EDI]);
-    wreg(cpu_state->hvf_fd, HV_X86_R8, env->regs[8]);
-    wreg(cpu_state->hvf_fd, HV_X86_R9, env->regs[9]);
-    wreg(cpu_state->hvf_fd, HV_X86_R10, env->regs[10]);
-    wreg(cpu_state->hvf_fd, HV_X86_R11, env->regs[11]);
-    wreg(cpu_state->hvf_fd, HV_X86_R12, env->regs[12]);
-    wreg(cpu_state->hvf_fd, HV_X86_R13, env->regs[13]);
-    wreg(cpu_state->hvf_fd, HV_X86_R14, env->regs[14]);
-    wreg(cpu_state->hvf_fd, HV_X86_R15, env->regs[15]);
-    wreg(cpu_state->hvf_fd, HV_X86_RFLAGS, env->eflags);
-    wreg(cpu_state->hvf_fd, HV_X86_RIP, env->eip);
+    wreg(cpu_state->hvf->fd, HV_X86_RAX, env->regs[R_EAX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RBX, env->regs[R_EBX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RCX, env->regs[R_ECX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RDX, env->regs[R_EDX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RBP, env->regs[R_EBP]);
+    wreg(cpu_state->hvf->fd, HV_X86_RSP, env->regs[R_ESP]);
+    wreg(cpu_state->hvf->fd, HV_X86_RSI, env->regs[R_ESI]);
+    wreg(cpu_state->hvf->fd, HV_X86_RDI, env->regs[R_EDI]);
+    wreg(cpu_state->hvf->fd, HV_X86_R8, env->regs[8]);
+    wreg(cpu_state->hvf->fd, HV_X86_R9, env->regs[9]);
+    wreg(cpu_state->hvf->fd, HV_X86_R10, env->regs[10]);
+    wreg(cpu_state->hvf->fd, HV_X86_R11, env->regs[11]);
+    wreg(cpu_state->hvf->fd, HV_X86_R12, env->regs[12]);
+    wreg(cpu_state->hvf->fd, HV_X86_R13, env->regs[13]);
+    wreg(cpu_state->hvf->fd, HV_X86_R14, env->regs[14]);
+    wreg(cpu_state->hvf->fd, HV_X86_R15, env->regs[15]);
+    wreg(cpu_state->hvf->fd, HV_X86_RFLAGS, env->eflags);
+    wreg(cpu_state->hvf->fd, HV_X86_RIP, env->eip);
    
-    wreg(cpu_state->hvf_fd, HV_X86_XCR0, env->xcr0);
+    wreg(cpu_state->hvf->fd, HV_X86_XCR0, env->xcr0);
     
     hvf_put_xsave(cpu_state);
     
@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
     
     hvf_put_msrs(cpu_state);
     
-    wreg(cpu_state->hvf_fd, HV_X86_DR0, env->dr[0]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR1, env->dr[1]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR2, env->dr[2]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR3, env->dr[3]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR4, env->dr[4]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR5, env->dr[5]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR6, env->dr[6]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR7, env->dr[7]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR0, env->dr[0]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR1, env->dr[1]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR2, env->dr[2]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR3, env->dr[3]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR4, env->dr[4]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR5, env->dr[5]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR6, env->dr[6]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR7, env->dr[7]);
     
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
     X86CPU *x86cpu = X86_CPU(cpu_state);
     CPUX86State *env = &x86cpu->env;
 
-    env->regs[R_EAX] = rreg(cpu_state->hvf_fd, HV_X86_RAX);
-    env->regs[R_EBX] = rreg(cpu_state->hvf_fd, HV_X86_RBX);
-    env->regs[R_ECX] = rreg(cpu_state->hvf_fd, HV_X86_RCX);
-    env->regs[R_EDX] = rreg(cpu_state->hvf_fd, HV_X86_RDX);
-    env->regs[R_EBP] = rreg(cpu_state->hvf_fd, HV_X86_RBP);
-    env->regs[R_ESP] = rreg(cpu_state->hvf_fd, HV_X86_RSP);
-    env->regs[R_ESI] = rreg(cpu_state->hvf_fd, HV_X86_RSI);
-    env->regs[R_EDI] = rreg(cpu_state->hvf_fd, HV_X86_RDI);
-    env->regs[8] = rreg(cpu_state->hvf_fd, HV_X86_R8);
-    env->regs[9] = rreg(cpu_state->hvf_fd, HV_X86_R9);
-    env->regs[10] = rreg(cpu_state->hvf_fd, HV_X86_R10);
-    env->regs[11] = rreg(cpu_state->hvf_fd, HV_X86_R11);
-    env->regs[12] = rreg(cpu_state->hvf_fd, HV_X86_R12);
-    env->regs[13] = rreg(cpu_state->hvf_fd, HV_X86_R13);
-    env->regs[14] = rreg(cpu_state->hvf_fd, HV_X86_R14);
-    env->regs[15] = rreg(cpu_state->hvf_fd, HV_X86_R15);
+    env->regs[R_EAX] = rreg(cpu_state->hvf->fd, HV_X86_RAX);
+    env->regs[R_EBX] = rreg(cpu_state->hvf->fd, HV_X86_RBX);
+    env->regs[R_ECX] = rreg(cpu_state->hvf->fd, HV_X86_RCX);
+    env->regs[R_EDX] = rreg(cpu_state->hvf->fd, HV_X86_RDX);
+    env->regs[R_EBP] = rreg(cpu_state->hvf->fd, HV_X86_RBP);
+    env->regs[R_ESP] = rreg(cpu_state->hvf->fd, HV_X86_RSP);
+    env->regs[R_ESI] = rreg(cpu_state->hvf->fd, HV_X86_RSI);
+    env->regs[R_EDI] = rreg(cpu_state->hvf->fd, HV_X86_RDI);
+    env->regs[8] = rreg(cpu_state->hvf->fd, HV_X86_R8);
+    env->regs[9] = rreg(cpu_state->hvf->fd, HV_X86_R9);
+    env->regs[10] = rreg(cpu_state->hvf->fd, HV_X86_R10);
+    env->regs[11] = rreg(cpu_state->hvf->fd, HV_X86_R11);
+    env->regs[12] = rreg(cpu_state->hvf->fd, HV_X86_R12);
+    env->regs[13] = rreg(cpu_state->hvf->fd, HV_X86_R13);
+    env->regs[14] = rreg(cpu_state->hvf->fd, HV_X86_R14);
+    env->regs[15] = rreg(cpu_state->hvf->fd, HV_X86_R15);
     
-    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
-    env->eip = rreg(cpu_state->hvf_fd, HV_X86_RIP);
+    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
+    env->eip = rreg(cpu_state->hvf->fd, HV_X86_RIP);
    
     hvf_get_xsave(cpu_state);
-    env->xcr0 = rreg(cpu_state->hvf_fd, HV_X86_XCR0);
+    env->xcr0 = rreg(cpu_state->hvf->fd, HV_X86_XCR0);
     
     hvf_get_segments(cpu_state);
     hvf_get_msrs(cpu_state);
     
-    env->dr[0] = rreg(cpu_state->hvf_fd, HV_X86_DR0);
-    env->dr[1] = rreg(cpu_state->hvf_fd, HV_X86_DR1);
-    env->dr[2] = rreg(cpu_state->hvf_fd, HV_X86_DR2);
-    env->dr[3] = rreg(cpu_state->hvf_fd, HV_X86_DR3);
-    env->dr[4] = rreg(cpu_state->hvf_fd, HV_X86_DR4);
-    env->dr[5] = rreg(cpu_state->hvf_fd, HV_X86_DR5);
-    env->dr[6] = rreg(cpu_state->hvf_fd, HV_X86_DR6);
-    env->dr[7] = rreg(cpu_state->hvf_fd, HV_X86_DR7);
+    env->dr[0] = rreg(cpu_state->hvf->fd, HV_X86_DR0);
+    env->dr[1] = rreg(cpu_state->hvf->fd, HV_X86_DR1);
+    env->dr[2] = rreg(cpu_state->hvf->fd, HV_X86_DR2);
+    env->dr[3] = rreg(cpu_state->hvf->fd, HV_X86_DR3);
+    env->dr[4] = rreg(cpu_state->hvf->fd, HV_X86_DR4);
+    env->dr[5] = rreg(cpu_state->hvf->fd, HV_X86_DR5);
+    env->dr[6] = rreg(cpu_state->hvf->fd, HV_X86_DR6);
+    env->dr[7] = rreg(cpu_state->hvf->fd, HV_X86_DR7);
     
     x86_update_hflags(env);
     return 0;
@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
 static void vmx_set_int_window_exiting(CPUState *cpu)
 {
      uint64_t val;
-     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
+     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
              VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
 }
 
 void vmx_clear_int_window_exiting(CPUState *cpu)
 {
      uint64_t val;
-     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
+     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
              ~VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
 }
 
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
     uint64_t info = 0;
     if (have_event) {
         info = vector | intr_type | VMCS_INTR_VALID;
-        uint64_t reason = rvmcs(cpu_state->hvf_fd, VMCS_EXIT_REASON);
+        uint64_t reason = rvmcs(cpu_state->hvf->fd, VMCS_EXIT_REASON);
         if (env->nmi_injected && reason != EXIT_REASON_TASK_SWITCH) {
             vmx_clear_nmi_blocking(cpu_state);
         }
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
             info &= ~(1 << 12); /* clear undefined bit */
             if (intr_type == VMCS_INTR_T_SWINTR ||
                 intr_type == VMCS_INTR_T_SWEXCEPTION) {
-                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
+                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
             }
             
             if (env->has_error_code) {
-                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_EXCEPTION_ERROR,
+                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_EXCEPTION_ERROR,
                       env->error_code);
                 /* Indicate that VMCS_ENTRY_EXCEPTION_ERROR is valid */
                 info |= VMCS_INTR_DEL_ERRCODE;
             }
             /*printf("reinject  %lx err %d\n", info, err);*/
-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
         };
     }
 
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
         if (!(env->hflags2 & HF2_NMI_MASK) && !(info & VMCS_INTR_VALID)) {
             cpu_state->interrupt_request &= ~CPU_INTERRUPT_NMI;
             info = VMCS_INTR_VALID | VMCS_INTR_T_NMI | EXCP02_NMI;
-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
         } else {
             vmx_set_nmi_window_exiting(cpu_state);
         }
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
         int line = cpu_get_pic_interrupt(&x86cpu->env);
         cpu_state->interrupt_request &= ~CPU_INTERRUPT_HARD;
         if (line >= 0) {
-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, line |
+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, line |
                   VMCS_INTR_VALID | VMCS_INTR_T_HWINTR);
         }
     }
@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
     X86CPU *cpu = X86_CPU(cpu_state);
     CPUX86State *env = &cpu->env;
 
-    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
+    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
 
     if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
         cpu_synchronize_state(cpu_state);
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

The hooks we have that call us after reset, init and loadvm really all
just want to say "The reference of all register state is in the QEMU
vcpu struct, please push it".

We already have a working push mechanism, though: cpu->vcpu_dirty. We
can reuse it for all of the above, so that state is synced properly the
next time we actually execute a vCPU.

This fixes PSCI resets on ARM, as they modify CPU state even after the
post init call has completed, but before we execute the vCPU again.

To also make the scheme work for x86, we have to make sure we don't
move stale eflags into our env when the vcpu state is dirty.
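
As a rough illustration of the deferred push described above, here is a minimal sketch. ToyCPU and the toy_* helpers are hypothetical stand-ins, not the real CPUState or HVF API; the only thing taken from this patch is the idea that the hooks set vcpu_dirty and the actual register push happens on the next vCPU execution.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CPUState plus the hypervisor's register copy. */
typedef struct {
    int qemu_reg;     /* register value held in the QEMU-side struct */
    int hv_reg;       /* register value held by the hypervisor */
    bool vcpu_dirty;  /* true: the QEMU copy is the reference */
} ToyCPU;

/* Push the QEMU-side state to the (toy) hypervisor. */
static void toy_put_registers(ToyCPU *cpu)
{
    cpu->hv_reg = cpu->qemu_reg;
}

/* Shared by the post_reset, post_init and pre_loadvm hooks: only mark dirty. */
static void toy_synchronize_set_dirty(ToyCPU *cpu)
{
    cpu->vcpu_dirty = true;
}

/* Assumed execution path: push pending state right before running the guest. */
static void toy_vcpu_exec(ToyCPU *cpu)
{
    if (cpu->vcpu_dirty) {
        toy_put_registers(cpu);
        cpu->vcpu_dirty = false;
    }
    /* ... the guest would run here ... */
}
```

In this sketch, a change made to qemu_reg after toy_synchronize_set_dirty() but before toy_vcpu_exec() still reaches hv_reg, which is the property the PSCI reset case relies on.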

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-13-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.c | 27 +++++++--------------------
 target/i386/hvf/x86hvf.c  |  5 ++++-
 2 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static void hvf_cpu_synchronize_state(CPUState *cpu)
     }
 }
 
-static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
-                                              run_on_cpu_data arg)
+static void do_hvf_cpu_synchronize_set_dirty(CPUState *cpu,
+                                             run_on_cpu_data arg)
 {
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
+    /* QEMU state is the reference, push it to HVF now and on next entry */
+    cpu->vcpu_dirty = true;
 }
 
 static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
-                                             run_on_cpu_data arg)
-{
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 }
 
 static void hvf_cpu_synchronize_post_init(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
-                                              run_on_cpu_data arg)
-{
-    cpu->vcpu_dirty = true;
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 }
 
 static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 }
 
 static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
     X86CPU *cpu = X86_CPU(cpu_state);
     CPUX86State *env = &cpu->env;
 
-    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
+    if (!cpu_state->vcpu_dirty) {
+        /* light weight sync for CPU_INTERRUPT_HARD and IF_MASK */
+        env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
+    }
 
     if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
         cpu_synchronize_state(cpu_state);
-- 
2.20.1

Coverity notes that we don't check for dup2() failing.  Add some
assertions so that if it does ever happen we get some indication.
(This is similar to how we handle other "don't expect this syscall to
fail" checks in this test code.)
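
The pattern can be sketched outside the test harness as follows. This is only an illustration: with_stdout_on_stderr() and count_call() are hypothetical names, and plain assert() stands in for g_assert().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

static int calls; /* number of times the sample callback has run */

/* Sample callback used to show that the helper invokes its argument. */
static void count_call(void)
{
    calls++;
}

/*
 * Hypothetical helper: run fn() with stdout temporarily pointing at
 * stderr, asserting that every dup()/dup2() call succeeded.
 */
static void with_stdout_on_stderr(void (*fn)(void))
{
    int rc;
    int saved = dup(STDOUT_FILENO);

    assert(saved >= 0);
    rc = dup2(STDERR_FILENO, STDOUT_FILENO);
    assert(rc >= 0);
    fn();
    rc = dup2(saved, STDOUT_FILENO);
    assert(rc >= 0);
    close(saved);
}
```

Calling with_stdout_on_stderr(count_call) runs the callback once and leaves stdout usable afterwards.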

Fixes: Coverity CID 1432346
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-2-peter.maydell@linaro.org
---
 tests/qtest/bios-tables-test.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -XXX,XX +XXX,XX @@ static void test_acpi_asl(test_data *data)
                                                  exp_sdt->asl_file, sdt->asl_file);
                     int out = dup(STDOUT_FILENO);
                     int ret G_GNUC_UNUSED;
+                    int dupret;
 
-                    dup2(STDERR_FILENO, STDOUT_FILENO);
+                    g_assert(out >= 0);
+                    dupret = dup2(STDERR_FILENO, STDOUT_FILENO);
+                    g_assert(dupret >= 0);
                     ret = system(diff) ;
-                    dup2(out, STDOUT_FILENO);
+                    dupret = dup2(out, STDOUT_FILENO);
+                    g_assert(dupret >= 0);
                     close(out);
                     g_free(diff);
                 }
-- 
2.20.1

The e1000e_send_verify() test calls qemu_recv() but doesn't
check that the call succeeded, which annoys Coverity. Add
an explicit test check for the length of the data.

(This is a test check, not a "we assume this syscall always
succeeds", so we use g_assert_cmpint() rather than g_assert().)
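
A self-contained sketch of the same kind of length check, using a plain socketpair() rather than the qtest sockets. roundtrip_ok() is a hypothetical helper, and the reading of the threshold 5 as the four bytes of "TEST" plus the terminating NUL is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: send "TEST" (including its NUL) over a socket
 * pair and check the received length before comparing the contents.
 * Returns 1 if every step succeeded, 0 otherwise.
 */
static int roundtrip_ok(void)
{
    int sv[2];
    char buffer[64] = { 0 };
    ssize_t sent, got;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return 0;
    }
    sent = send(sv[1], "TEST", 5, 0);
    got = recv(sv[0], buffer, sizeof(buffer), 0);
    close(sv[0]);
    close(sv[1]);

    /* Only compare the string once we know enough bytes arrived. */
    return sent == 5 && got >= 5 && strcmp(buffer, "TEST") == 0;
}
```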

Fixes: Coverity CID 1432324
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-3-peter.maydell@linaro.org
---
 tests/qtest/e1000e-test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/e1000e-test.c b/tests/qtest/e1000e-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/e1000e-test.c
+++ b/tests/qtest/e1000e-test.c
@@ -XXX,XX +XXX,XX @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a
     /* Check data sent to the backend */
     ret = qemu_recv(test_sockets[0], &recv_len, sizeof(recv_len), 0);
     g_assert_cmpint(ret, == , sizeof(recv_len));
-    qemu_recv(test_sockets[0], buffer, 64, 0);
+    ret = qemu_recv(test_sockets[0], buffer, 64, 0);
+    g_assert_cmpint(ret, >=, 5);
     g_assert_cmpstr(buffer, == , "TEST");
 
     /* Free test data buffer */
-- 
2.20.1

Coverity notices that the checks against mkstemp() failing in
create_qcow2_with_mbr() are wrong: mkstemp returns -1 on failure but
the check is just "g_assert(fd)".  Fix to use "g_assert(fd >= 0)",
matching the correct check in create_test_img().
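
To see why the old check could not catch a failure: -1 is nonzero, so a truthiness test accepts it, while it would reject the valid descriptor 0. A minimal sketch, in which the helper names and the temporary file path are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* The old style of check: any nonzero value passes, including -1. */
static int old_check_passes(int fd)
{
    return fd != 0;
}

/* The corrected check: only non-negative descriptors pass. */
static int new_check_passes(int fd)
{
    return fd >= 0;
}

/* Hypothetical helper: create a temp file, validate the fd, clean up. */
static int make_temp_ok(void)
{
    char path[] = "/tmp/hd-geo-sketch.XXXXXX";
    int fd = mkstemp(path);

    if (!new_check_passes(fd)) {
        return 0;
    }
    close(fd);
    unlink(path);
    return 1;
}
```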

Fixes: Coverity CID 1432274
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-4-peter.maydell@linaro.org
---
 tests/qtest/hd-geo-test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/hd-geo-test.c b/tests/qtest/hd-geo-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/hd-geo-test.c
+++ b/tests/qtest/hd-geo-test.c
@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
     }
 
     fd = mkstemp(raw_path);
-    g_assert(fd);
+    g_assert(fd >= 0);
     close(fd);
 
     fd = open(raw_path, O_WRONLY);
@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
     close(fd);
 
     fd = mkstemp(qcow2_path);
-    g_assert(fd);
+    g_assert(fd >= 0);
     close(fd);
 
     qemu_img_path = getenv("QTEST_QEMU_IMG");
-- 
2.20.1

Coverity points out that we calculate a 64-bit value using 32-bit
arithmetic; add a cast to force the multiply to be done in 64 bits.
(The overflow will never happen with the current test data.)
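
The difference the cast makes can be sketched as below. The function names are hypothetical, and both operands are assumed to be uint32_t here; the values in the usage note are chosen to overflow and are not taken from the test data.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply in 32 bits, then widen: the product wraps modulo 2^32. */
static uint64_t byte_addr_32bit_mul(uint32_t i, uint32_t sector_len)
{
    return i * sector_len;
}

/* Widen one operand first, so the multiply itself is done in 64 bits. */
static uint64_t byte_addr_64bit_mul(uint32_t i, uint32_t sector_len)
{
    return (uint64_t)i * sector_len;
}
```

For example, with i = 0x10000 and sector_len = 0x10000 the first function returns 0 while the second returns 0x100000000.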

Fixes: Coverity CID 1432320
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-5-peter.maydell@linaro.org
---
 tests/qtest/pflash-cfi02-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/pflash-cfi02-test.c b/tests/qtest/pflash-cfi02-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/pflash-cfi02-test.c
+++ b/tests/qtest/pflash-cfi02-test.c
@@ -XXX,XX +XXX,XX @@ static void test_geometry(const void *opaque)
 
     for (int region = 0; region < nb_erase_regions; ++region) {
         for (uint32_t i = 0; i < c->nb_blocs[region]; ++i) {
-            uint64_t byte_addr = i * c->sector_len[region];
+            uint64_t byte_addr = (uint64_t)i * c->sector_len[region];
             g_assert_cmphex(flash_read(c, byte_addr), ==, bank_mask(c));
         }
     }
-- 
2.20.1

Coverity points out that in tpm_test_swtpm_migration_test() we
assume that src_tpm_addr and dst_tpm_addr are non-NULL (we
pass them to tpm_util_migration_start_qemu() which will
unconditionally dereference them) but then later explicitly
check them for NULL. Remove the pointless checks.

Fixes: Coverity CID 1432367, 1432359

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-6-peter.maydell@linaro.org
---
 tests/qtest/tpm-tests.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/tpm-tests.c
+++ b/tests/qtest/tpm-tests.c
@@ -XXX,XX +XXX,XX @@ void tpm_test_swtpm_migration_test(const char *src_tpm_path,
     qtest_quit(src_qemu);
 
     tpm_util_swtpm_kill(dst_tpm_pid);
-    if (dst_tpm_addr) {
-        g_unlink(dst_tpm_addr->u.q_unix.path);
-        qapi_free_SocketAddress(dst_tpm_addr);
-    }
+    g_unlink(dst_tpm_addr->u.q_unix.path);
+    qapi_free_SocketAddress(dst_tpm_addr);
 
     tpm_util_swtpm_kill(src_tpm_pid);
-    if (src_tpm_addr) {
-        g_unlink(src_tpm_addr->u.q_unix.path);
-        qapi_free_SocketAddress(src_tpm_addr);
-    }
+    g_unlink(src_tpm_addr->u.q_unix.path);
+    qapi_free_SocketAddress(src_tpm_addr);
 }
-- 
2.20.1

Coverity complains that we don't check for failures from dup()
and mkstemp(); add asserts that these syscalls succeeded.

Fixes: Coverity CID 1432516, 1432574
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210525134458.6675-7-peter.maydell@linaro.org
---
 tests/unit/test-vmstate.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/unit/test-vmstate.c b/tests/unit/test-vmstate.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/unit/test-vmstate.c
+++ b/tests/unit/test-vmstate.c
@@ -XXX,XX +XXX,XX @@ static int temp_fd;
 /* Duplicate temp_fd and seek to the beginning of the file */
 static QEMUFile *open_test_file(bool write)
 {
-    int fd = dup(temp_fd);
+    int fd;
     QIOChannel *ioc;
     QEMUFile *f;
 
+    fd = dup(temp_fd);
+    g_assert(fd >= 0);
     lseek(fd, 0, SEEK_SET);
     if (write) {
         g_assert_cmpint(ftruncate(fd, 0), ==, 0);
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
     g_autofree char *temp_file = g_strdup_printf("%s/vmst.test.XXXXXX",
                                                  g_get_tmp_dir());
     temp_fd = mkstemp(temp_file);
+    g_assert(temp_fd >= 0);
 
     module_call_init(MODULE_INIT_QOM);
 
-- 
2.20.1