Series comparison

-[PULL 00/39] target-arm queue
+[PULL 00/45] target-arm queue
-Most of this is the Neon decodetree patches, followed by Edgar's versal cleanups.
+The following changes since commit a97978bcc2d1f650c7d411428806e5b03082b8c7:
-thanks
+  Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210603' into staging (2021-06-03 10:00:35 +0100)
 -- PMM
 The following changes since commit 2ef486e76d64436be90f7359a3071fb2a56ce835:
   Merge remote-tracking branch 'remotes/marcel/tags/rdma-pull-request' into staging (2020-05-03 14:12:56 +0100)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200504
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210603
-for you to fetch changes up to 9aefc6cf9b73f66062d2f914a0136756e7a28211:
+for you to fetch changes up to 1c861885894d840235954060050d240259f5340b:
-  target/arm: Move gen_ function typedefs to translate.h (2020-05-04 12:59:26 +0100)
+  tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed (2021-06-03 16:43:27 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * Start of conversion of Neon insns to decodetree
+ * Some not-yet-enabled preliminaries for M-profile MVE support
- * versal board: support SD and RTC
+ * Consistently use "Cortex-Axx", not "Cortex Axx" in docs, comments
- * Implement ARMv8.2-TTS2UXN
+ * docs: Fix installation of man pages with Sphinx 4.x
- * Make VQDMULL undefined when U=1
+ * Mark LDS{MIN,MAX} as signed operations
- * Some minor code cleanups
+ * Fix missing syndrome value for DAIF and PAC check exceptions
  * Implement BFloat16 extensions
  * Refactoring of hvf accelerator code in preparation for aarch64 support
  * Fix some coverity nits in test code
 ----------------------------------------------------------------
-Edgar E. Iglesias (11):
+Alexander Graf (12):
-      hw/arm: versal: Remove inclusion of arm_gicv3_common.h
+      hvf: Move assert_hvf_ok() into common directory
-      hw/arm: versal: Move misplaced comment
+      hvf: Move vcpu thread functions into common directory
-      hw/arm: versal-virt: Fix typo xlnx-ve -> xlnx-versal
+      hvf: Move cpu functions into common directory
-      hw/arm: versal: Embed the UARTs into the SoC type
+      hvf: Move hvf internal definitions into common header
-      hw/arm: versal: Embed the GEMs into the SoC type
+      hvf: Make hvf_set_phys_mem() static
-      hw/arm: versal: Embed the ADMAs into the SoC type
+      hvf: Remove use of hv_uvaddr_t and hv_gpaddr_t
-      hw/arm: versal: Embed the APUs into the SoC type
+      hvf: Split out common code on vcpu init and destroy
-      hw/arm: versal: Add support for SD
+      hvf: Use cpu_synchronize_state()
-      hw/arm: versal: Add support for the RTC
+      hvf: Make synchronize functions static
-      hw/arm: versal-virt: Add support for SD
+      hvf: Remove hvf-accel-ops.h
-      hw/arm: versal-virt: Add support for the RTC
+      hvf: Introduce hvf vcpu struct
       hvf: Simplify post reset/init/loadvm hooks
-Fredrik Strupe (1):
+Damien Goutte-Gattat (1):
-      target/arm: Make VQDMULL undefined when U=1
+      docs: Fix installation of man pages with Sphinx 4.x
-Peter Maydell (25):
+Jamie Iles (4):
-      target/arm: Don't use a TLB for ARMMMUIdx_Stage2
+      target/arm: fix missing exception class
-      target/arm: Use enum constant in get_phys_addr_lpae() call
+      target/arm: fold do_raise_exception into raise_exception
-      target/arm: Add new 's1_is_el0' argument to get_phys_addr_lpae()
+      target/arm: use raise_exception_ra for MTE check failure
-      target/arm: Implement ARMv8.2-TTS2UXN
+      target/arm: use raise_exception_ra for stack limit exception
       target/arm: Use correct variable for setting 'max' cpu's ID_AA64DFR0
       target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
       target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
       target/arm: Add stubs for AArch32 Neon decodetree
       target/arm: Convert VCMLA (vector) to decodetree
       target/arm: Convert VCADD (vector) to decodetree
       target/arm: Convert V[US]DOT (vector) to decodetree
       target/arm: Convert VFM[AS]L (vector) to decodetree
       target/arm: Convert VCMLA (scalar) to decodetree
       target/arm: Convert V[US]DOT (scalar) to decodetree
       target/arm: Convert VFM[AS]L (scalar) to decodetree
       target/arm: Convert Neon load/store multiple structures to decodetree
       target/arm: Convert Neon 'load single structure to all lanes' to decodetree
       target/arm: Convert Neon 'load/store single structure' to decodetree
       target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
       target/arm: Convert Neon 3-reg-same logic ops to decodetree
       target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
       target/arm: Convert Neon 3-reg-same comparisons to decodetree
       target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
       target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree
       target/arm: Move gen_ function typedefs to translate.h
-Philippe Mathieu-Daudé (2):
+Peter Maydell (15):
-      hw/arm/mps2-tz: Use TYPE_IOTKIT instead of hardcoded string
+      target/arm: Add isar feature check functions for MVE
-      target/arm: Use uint64_t for midr field in CPU state struct
+      target/arm: Update feature checks for insns which are "MVE or FP"
       target/arm: Move fpsp/fpdp isar check into callers of do_vfp_2op_sp/dp
       target/arm: Add MVE check to VMOV_reg_sp and VMOV_reg_dp
       target/arm: Fix return values in fp_sysreg_checks()
       target/arm: Implement M-profile VPR register
       target/arm: Make FPSCR.LTPSIZE writable for MVE
       target/arm: Allow board models to specify initial NS VTOR
       arm: Consistently use "Cortex-Axx", not "Cortex Axx"
       tests/qtest/bios-tables-test: Check for dup2() failure
       tests/qtest/e1000e-test: Check qemu_recv() succeeded
       tests/qtest/hd-geo-test: Fix checks on mkstemp() return value
       tests/qtest/pflash-cfi02-test: Avoid potential integer overflow
       tests/qtest/tpm-tests: Remove unnecessary NULL checks
       tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed
- include/hw/arm/xlnx-versal.h    |  31 +-
+Richard Henderson (13):
- target/arm/cpu-param.h          |   2 +-
+      target/arm: Mark LDS{MIN,MAX} as signed operations
- target/arm/cpu.h                |  38 ++-
+      target/arm: Add isar_feature_{aa32, aa64, aa64_sve}_bf16
- target/arm/translate-a64.h      |   9 -
+      target/arm: Unify unallocated path in disas_fp_1src
- target/arm/translate.h          |  26 ++
+      target/arm: Implement scalar float32 to bfloat16 conversion
- target/arm/neon-dp.decode       |  86 +++++
+      target/arm: Implement vector float32 to bfloat16 conversion
- target/arm/neon-ls.decode       |  52 +++
+      softfpu: Add float_round_to_odd_inf
- target/arm/neon-shared.decode   |  66 ++++
+      target/arm: Implement bfloat16 dot product (vector)
- hw/arm/mps2-tz.c                |   2 +-
+      target/arm: Implement bfloat16 dot product (indexed)
- hw/arm/xlnx-versal-virt.c       |  74 ++++-
+      target/arm: Implement bfloat16 matrix multiply accumulate
- hw/arm/xlnx-versal.c            | 115 +++++--
+      target/arm: Implement bfloat widening fma (vector)
- target/arm/cpu.c                |   3 +-
+      target/arm: Implement bfloat widening fma (indexed)
- target/arm/cpu64.c              |   8 +-
+      linux-user/aarch64: Enable hwcap bits for bfloat16
- target/arm/helper.c             | 183 ++++------
+      target/arm: Enable BFloat16 extensions
  target/arm/translate-a64.c      |  17 -
  target/arm/translate-neon.inc.c | 714 +++++++++++++++++++++++++++++++++++++++
  target/arm/translate-vfp.inc.c  |   6 -
  target/arm/translate.c          | 716 +++-------------------------------------
  target/arm/Makefile.objs        |  18 +
 files changed, 1302 insertions(+), 864 deletions(-)
  create mode 100644 target/arm/neon-dp.decode
  create mode 100644 target/arm/neon-ls.decode
  create mode 100644 target/arm/neon-shared.decode
  create mode 100644 target/arm/translate-neon.inc.c
+ docs/conf.py                    |   1 +
+ docs/system/arm/aspeed.rst      |   4 +-
+ docs/system/arm/nuvoton.rst     |   6 +-
+ docs/system/arm/sabrelite.rst   |   2 +-
+ include/fpu/softfloat-types.h   |   4 +-
+ include/hw/arm/allwinner-h3.h   |   2 +-
+ include/hw/arm/armv7m.h         |   2 +
+ include/hw/core/cpu.h           |   3 +-
+ include/sysemu/hvf_int.h        |  58 +++++
+ target/arm/cpu.h                |  48 +++-
+ target/arm/helper-sve.h         |   4 +
+ target/arm/helper.h             |  15 ++
+ target/i386/hvf/hvf-accel-ops.h |  23 --
+ target/i386/hvf/hvf-i386.h      |  33 +--
+ target/i386/hvf/vmx.h           |  24 +-
+ target/i386/hvf/x86hvf.h        |   2 -
+ target/arm/neon-dp.decode       |   1 +
+ target/arm/neon-shared.decode   |  11 +
+ target/arm/sve.decode           |  19 +-
+ target/arm/vfp.decode           |   2 +
+ accel/hvf/hvf-accel-ops.c       | 471 ++++++++++++++++++++++++++++++++++++++++
+ accel/hvf/hvf-all.c             |  47 ++++
+ hw/arm/armv7m.c                 |   7 +
+ hw/arm/aspeed.c                 |   6 +-
+ hw/arm/mcimx6ul-evk.c           |   2 +-
+ hw/arm/mcimx7d-sabre.c          |   2 +-
+ hw/arm/npcm7xx_boards.c         |   4 +-
+ hw/arm/sabrelite.c              |   2 +-
+ hw/misc/npcm7xx_clk.c           |   2 +-
+ linux-user/elfload.c            |   2 +
+ target/arm/cpu.c                |  13 ++
+ target/arm/cpu64.c              |   3 +
+ target/arm/cpu_tcg.c            |   1 +
+ target/arm/m_helper.c           |   5 +-
+ target/arm/machine.c            |  20 ++
+ target/arm/mte_helper.c         |  12 +-
+ target/arm/op_helper.c          |  32 ++-
+ target/arm/sve_helper.c         |   2 +
+ target/arm/translate-a64.c      | 155 +++++++++++--
+ target/arm/translate-neon.c     |  91 ++++++++
+ target/arm/translate-sve.c      | 112 ++++++++++
+ target/arm/translate-vfp.c      | 164 ++++++++++----
+ target/arm/vec_helper.c         | 140 +++++++++++-
+ target/arm/vfp_helper.c         |  21 +-
+ target/i386/hvf/hvf-accel-ops.c | 146 -------------
+ target/i386/hvf/hvf.c           | 464 +++++----------------------------------
+ target/i386/hvf/x86.c           |  28 +--
+ target/i386/hvf/x86_descr.c     |  26 +--
+ target/i386/hvf/x86_emu.c       |  62 +++---
+ target/i386/hvf/x86_mmu.c       |   4 +-
+ target/i386/hvf/x86_task.c      |  12 +-
+ target/i386/hvf/x86hvf.c        | 222 +++++++++----------
+ tests/qtest/bios-tables-test.c  |   8 +-
+ tests/qtest/e1000e-test.c       |   3 +-
+ tests/qtest/hd-geo-test.c       |   4 +-
+ tests/qtest/pflash-cfi02-test.c |   2 +-
+ tests/qtest/tpm-tests.c         |  12 +-
+ tests/unit/test-vmstate.c       |   5 +-
+ fpu/softfloat-parts.c.inc       |   6 +-
+ MAINTAINERS                     |   8 +
+ accel/hvf/meson.build           |   7 +
+ accel/meson.build               |   1 +
+ target/i386/hvf/meson.build     |   1 -
+files changed, 1666 insertions(+), 935 deletions(-)
+ create mode 100644 include/sysemu/hvf_int.h
+ delete mode 100644 target/i386/hvf/hvf-accel-ops.h
+ create mode 100644 accel/hvf/hvf-accel-ops.c
+ create mode 100644 accel/hvf/hvf-all.c
+ delete mode 100644 target/i386/hvf/hvf-accel-ops.c
+ create mode 100644 accel/hvf/meson.build

-[PULL 38/39] target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree
+[PULL 01/45] target/arm: Add isar feature check functions for MVE
-Convert the Neon VMUL, VMLA, VMLS and VSHL insns in the
+Add the isar feature check functions we will need for v8.1M MVE:
-3-reg-same grouping to decodetree.
+ * a check for MVE present: this corresponds to the pseudocode's
    CheckDecodeFaults(ExtType_Mve)
  * a check for the optional floating-point part of MVE: this
    corresponds to CheckDecodeFaults(ExtType_MveFp)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-20-peter.maydell@linaro.org
+Message-id: 20210520152840.24453-2-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  9 +++++++
+ target/arm/cpu.h | 22 ++++++++++++++++++++++
- target/arm/translate-neon.inc.c | 44 +++++++++++++++++++++++++++++++++
+ 1 file changed, 22 insertions(+)
  target/arm/translate.c          | 28 +++------------------
 files changed, 56 insertions(+), 25 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/cpu.h
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
- VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
+     }
- VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
+ }
-+VSHL_S_3s        1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same
++static inline bool isar_feature_aa32_mve(const ARMISARegisters *id)
 +VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same
 +
  VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
  VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
  VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
@@ -XXX,XX +XXX,XX @@ VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
  VTST_3s          1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
  VCEQ_3s          1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
 +
 +VMLA_3s          1111 001 0 0 . .. .... .... 1001 . . . 0 .... @3same
 +VMLS_3s          1111 001 1 0 . .. .... .... 1001 . . . 0 .... @3same
 +
 +VMUL_3s          1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
 +VMUL_p_3s        1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
  DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
  DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
  DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
 +DO_3SAME_NO_SZ_3(VMUL, tcg_gen_gvec_mul)
  #define DO_3SAME_CMP(INSN, COND)                                        \
      static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
@@ -XXX,XX +XXX,XX @@ DO_3SAME_GVEC4(VQADD_S, sqadd_op)
  DO_3SAME_GVEC4(VQADD_U, uqadd_op)
  DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
  DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
 +
 +static void gen_VMUL_p_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 +                           uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
 +{
-+    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz,
++    /*
-+                       0, gen_helper_gvec_pmul_b);
++     * Return true if MVE is supported (either integer or floating point).
 +     * We must check for M-profile as the MVFR1 field means something
 +     * else for A-profile.
 +     */
 +    return isar_feature_aa32_mprofile(id) &&
 +        FIELD_EX32(id->mvfr1, MVFR1, MVE) > 0;
 +}
 +
-+static bool trans_VMUL_p_3s(DisasContext *s, arg_3same *a)
++static inline bool isar_feature_aa32_mve_fp(const ARMISARegisters *id)
 +{
-+    if (a->size != 0) {
++    /*
-+        return false;
+     * Return true if the floating-point part of MVE is supported.
-+    }
++     * We must check for M-profile as the MVFR1 field means something
-+    return do_3same(s, a, gen_VMUL_p_3s);
++     * else for A-profile.
 +     */
 +    return isar_feature_aa32_mprofile(id) &&
 +        FIELD_EX32(id->mvfr1, MVFR1, MVE) >= 2;
 +}
 +
-+#define DO_3SAME_GVEC3_NO_SZ_3(INSN, OPARRAY)                           \
+ static inline bool isar_feature_aa32_vfp_simd(const ARMISARegisters *id)
-+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+ {
-+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+     /*
 +                                uint32_t oprsz, uint32_t maxsz)         \
 +    {                                                                   \
 +        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,                          \
 +                       oprsz, maxsz, &OPARRAY[vece]);                   \
 +    }                                                                   \
 +    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
 +
 +
 +DO_3SAME_GVEC3_NO_SZ_3(VMLA, mla_op)
 +DO_3SAME_GVEC3_NO_SZ_3(VMLS, mls_op)
 +
 +#define DO_3SAME_GVEC3_SHIFT(INSN, OPARRAY)                             \
 +    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
 +                                uint32_t rn_ofs, uint32_t rm_ofs,       \
 +                                uint32_t oprsz, uint32_t maxsz)         \
 +    {                                                                   \
 +        /* Note the operation is vshl vd,vm,vn */                       \
 +        tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs,                          \
 +                       oprsz, maxsz, &OPARRAY[vece]);                   \
 +    }                                                                   \
 +    DO_3SAME(INSN, gen_##INSN##_3s)
 +
 +DO_3SAME_GVEC3_SHIFT(VSHL_S, sshl_op)
 +DO_3SAME_GVEC3_SHIFT(VSHL_U, ushl_op)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
              }
              return 1;
 -        case NEON_3R_VMUL: /* VMUL */
 -            if (u) {
 -                /* Polynomial case allows only P8.  */
 -                if (size != 0) {
 -                    return 1;
 -                }
 -                tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
 -                                   0, gen_helper_gvec_pmul_b);
 -            } else {
 -                tcg_gen_gvec_mul(size, rd_ofs, rn_ofs, rm_ofs,
 -                                 vec_size, vec_size);
 -            }
 -            return 0;
 -
 -        case NEON_3R_VML: /* VMLA, VMLS */
 -            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
 -                           u ? &mls_op[size] : &mla_op[size]);
 -            return 0;
 -
 -        case NEON_3R_VSHL:
 -            /* Note the operation is vshl vd,vm,vn */
 -            tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
 -                           u ? &ushl_op[size] : &sshl_op[size]);
 -            return 0;
 -
          case NEON_3R_VADD_VSUB:
          case NEON_3R_LOGIC:
          case NEON_3R_VMAX:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
          case NEON_3R_VCGE:
          case NEON_3R_VQADD:
          case NEON_3R_VQSUB:
 +        case NEON_3R_VMUL:
 +        case NEON_3R_VML:
 +        case NEON_3R_VSHL:
              /* Already handled by decodetree */
              return 1;
          }
 --
 .20.1
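
The two MVE feature checks added in patch 01/45 can be illustrated outside QEMU. The following is a minimal standalone sketch, not the QEMU implementation: `DemoISARegs`, `demo_extract32()` and the `demo_feature_*` names are hypothetical, the `m_profile` flag stands in for `isar_feature_aa32_mprofile()`, and the MVE field is assumed to sit in MVFR1 bits [11:8].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's ARMISARegisters (illustration only). */
typedef struct {
    uint32_t mvfr1;
    bool m_profile; /* assumption: replaces isar_feature_aa32_mprofile() */
} DemoISARegs;

/* Assumed layout: the MVE field occupies MVFR1 bits [11:8]. */
enum { DEMO_MVE_SHIFT = 8, DEMO_MVE_LENGTH = 4 };

/* Extract 'length' bits of 'value' starting at bit 'start'. */
static inline uint32_t demo_extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

static inline uint32_t demo_mve_field(const DemoISARegs *id)
{
    return demo_extract32(id->mvfr1, DEMO_MVE_SHIFT, DEMO_MVE_LENGTH);
}

/* MVE present at all (integer only, or integer plus floating point). */
static inline bool demo_feature_mve(const DemoISARegs *id)
{
    return id->m_profile && demo_mve_field(id) > 0;
}

/* The optional floating-point part of MVE is present. */
static inline bool demo_feature_mve_fp(const DemoISARegs *id)
{
    return id->m_profile && demo_mve_field(id) >= 2;
}
```

The M-profile guard matters because, as the patch comment notes, the same MVFR1 bits mean something else on A-profile, so a nonzero field there must not be read as MVE.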

-[PULL 31/39] target/arm: Convert Neon 'load single structure to all lanes' to decodetree
+[PULL 02/45] target/arm: Update feature checks for insns which are "MVE or FP"
-Convert the Neon "load single structure to all lanes" insns to
+Some v8M instructions are present if either the floating point
-decodetree.
+extension or MVE is implemented.  Update our implementation of them
 to check for MVE as well as for FP.
 This is all the insns which use CheckDecodeFaults(ExtType_MveOrFp) or
 CheckDecodeFaults(ExtType_MveOrDpFp) in their pseudocode, which are
 essentially the loads and stores, moves and sysreg accesses, except
 for VMOV_reg_sp and VMOV_reg_dp, which we handle in subsequent
 patches because they need a refactor to provide a place to put the
 new MVE check.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-13-peter.maydell@linaro.org
+Message-id: 20210520152840.24453-3-peter.maydell@linaro.org
 ---
- target/arm/neon-ls.decode       |  5 +++
+ target/arm/translate-vfp.c | 48 +++++++++++++++++++++++---------------
- target/arm/translate-neon.inc.c | 73 +++++++++++++++++++++++++++++++++
+ 1 file changed, 29 insertions(+), 19 deletions(-)
  target/arm/translate.c          | 55 +------------------------
 files changed, 80 insertions(+), 53 deletions(-)
-diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-ls.decode
+--- a/target/arm/translate-vfp.c
-+++ b/target/arm/neon-ls.decode
++++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
+     /* VMOV scalar to general purpose register */
- VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
+     TCGv_i32 tmp;
-                vd=%vd_dp
-+
+-    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-+# Neon load single element to all lanes
+-    if (a->size == MO_32
-+
+-        ? !dc_isar_feature(aa32_fpsp_v2, s)
-+VLD_all_lanes  1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
+-        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+               vd=%vd_dp
+-        return false;
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
++    /*
-index XXXXXXX..XXXXXXX 100644
++     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
---- a/target/arm/translate-neon.inc.c
++     * all sizes, whether the CPU has fp or not.
-+++ b/target/arm/translate-neon.inc.c
++     */
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
++    if (!dc_isar_feature(aa32_mve, s)) {
-     gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
++        if (a->size == MO_32
-     return true;
++            ? !dc_isar_feature(aa32_fpsp_v2, s)
- }
++            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +
 +static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
 +{
 +    /* Neon load single structure to all lanes */
 +    int reg, stride, vec_size;
 +    int vd = a->vd;
 +    int size = a->size;
 +    int nregs = a->n + 1;
 +    TCGv_i32 addr, tmp;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
 +        return false;
 +    }
 +
 +    if (size == 3) {
 +        if (nregs != 4 || a->a == 0) {
 +            return false;
 +        }
-+        /* For VLD4 size == 3 a == 1 means 32 bits at 16 byte alignment */
+     }
-+        size = 2;
-+    }
+     /* UNDEF accesses to D16-D31 if they don't exist */
-+    if (nregs == 1 && a->a == 1 && size == 0) {
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
-+        return false;
+     /* VMOV general purpose register to scalar */
-+    }
+     TCGv_i32 tmp;
-+    if (nregs == 3 && a->a == 1) {
-+        return false;
+-    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-+    }
+-    if (a->size == MO_32
-+
+-        ? !dc_isar_feature(aa32_fpsp_v2, s)
-+    if (!vfp_access_check(s)) {
+-        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return true;
+-        return false;
 +    }
 +
 +    /*
-+     * VLD1 to all lanes: T bit indicates how many Dregs to write.
++     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
-+     * VLD2/3/4 to all lanes: T bit indicates register stride.
++     * all sizes, whether the CPU has fp or not.
 +     */
-+    stride = a->t ? 2 : 1;
++    if (!dc_isar_feature(aa32_mve, s)) {
-+    vec_size = nregs == 1 ? stride * 8 : 8;
++        if (a->size == MO_32
-+
++            ? !dc_isar_feature(aa32_fpsp_v2, s)
-+    tmp = tcg_temp_new_i32();
++            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+    addr = tcg_temp_new_i32();
++            return false;
 +    load_reg_var(s, addr, a->rn);
 +    for (reg = 0; reg < nregs; reg++) {
 +        gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
 +                        s->be_data | size);
 +        if ((vd & 1) && vec_size == 16) {
 +            /*
 +             * We cannot write 16 bytes at once because the
 +             * destination is unaligned.
 +             */
 +            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
 +                                 8, 8, tmp);
 +            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
 +                             neon_reg_offset(vd, 0), 8, 8);
 +        } else {
 +            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
 +                                 vec_size, vec_size, tmp);
 +        }
-+        tcg_gen_addi_i32(addr, addr, 1 << size);
+     }
-+        vd += stride;
-+    }
+     /* UNDEF accesses to D16-D31 if they don't exist */
-+    tcg_temp_free_i32(tmp);
+@@ -XXX,XX +XXX,XX @@ typedef enum FPSysRegCheckResult {
-+    tcg_temp_free_i32(addr);
-+
+ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
-+    gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << size) * nregs);
+ {
-+
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
-+    return true;
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
-+}
+         return FPSysRegCheckFailed;
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+     }
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
-+++ b/target/arm/translate.c
+ {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
      int size;
      int reg;
      int load;
 -    int vec_size;
      TCGv_i32 addr;
      TCGv_i32 tmp;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
-     } else {
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
-         size = (insn >> 10) & 3;
+         return false;
-         if (size == 3) {
+     }
--            /* Load single element to all lanes.  */
--            int a = (insn >> 4) & 1;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
--            if (!load) {
+ {
--                return 1;
+     TCGv_i32 tmp;
--            }
--            size = (insn >> 6) & 3;
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
--            nregs = ((insn >> 8) & 3) + 1;
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
--
+         return false;
--            if (size == 3) {
+     }
--                if (nregs != 4 || a == 0) {
--                    return 1;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
--                }
+      * floating point register.  Note that this does not require support
--                /* For VLD4 size==3 a == 1 means 32 bits at 16 byte alignment */
+      * for double precision arithmetic.
--                size = 2;
+      */
--            }
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
--            if (nregs == 1 && a == 1 && size == 0) {
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
--                return 1;
+         return false;
--            }
+     }
--            if (nregs == 3 && a == 1) {
--                return 1;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
--            }
+     uint32_t offset;
--            addr = tcg_temp_new_i32();
+     TCGv_i32 addr, tmp;
--            load_reg_var(s, addr, rn);
--
+-    if (!dc_isar_feature(aa32_fp16_arith, s)) {
--            /* VLD1 to all lanes: bit 5 indicates how many Dregs to write.
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
--             * VLD2/3/4 to all lanes: bit 5 indicates register stride.
+         return false;
--             */
+     }
--            stride = (insn & (1 << 5)) ? 2 : 1;
--            vec_size = nregs == 1 ? stride * 8 : 8;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
--
+     uint32_t offset;
--            tmp = tcg_temp_new_i32();
+     TCGv_i32 addr, tmp;
--            for (reg = 0; reg < nregs; reg++) {
--                gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
--                                s->be_data | size);
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
--                if ((rd & 1) && vec_size == 16) {
+         return false;
--                    /* We cannot write 16 bytes at once because the
+     }
--                     * destination is unaligned.
--                     */
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
--                    tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0),
+     TCGv_i64 tmp;
--                                         8, 8, tmp);
--                    tcg_gen_gvec_mov(0, neon_reg_offset(rd + 1, 0),
+     /* Note that this does not require support for double arithmetic.  */
--                                     neon_reg_offset(rd, 0), 8, 8);
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
--                } else {
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
--                    tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0),
+         return false;
--                                         vec_size, vec_size, tmp);
+     }
--                }
--                tcg_gen_addi_i32(addr, addr, 1 << size);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
--                rd += stride;
+     TCGv_i32 addr, tmp;
--            }
+     int i, n;
--            tcg_temp_free_i32(tmp);
--            tcg_temp_free_i32(addr);
+-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
--            stride = (1 << size) * nregs;
++    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
-+            /* Load single element to all lanes -- handled by decodetree  */
+         return false;
-+            return 1;
+     }
-         } else {
-             /* Single element.  */
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
-             int idx = (insn >> 4) & 0xf;
+     int i, n;
      /* Note that this does not require support for double arithmetic.  */
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
 --
 .20.1
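
The feature gating that patch 02/45 introduces reduces to two small predicates. The sketch below models only that gating and assumes nothing else about the translator: `DemoFeatures`, the `DEMO_MO_*` constants and both function names are hypothetical, and the later checks in the real functions (such as the D16-D31 test) are left out.

```c
#include <stdbool.h>

/* Hypothetical feature flags; in QEMU these come from dc_isar_feature()
 * and arm_dc_feature() on the DisasContext. */
typedef struct {
    bool fpsp_v2; /* single-precision floating point present */
    bool neon;    /* Neon present */
    bool mve;     /* M-profile MVE present */
} DemoFeatures;

/* Hypothetical element-size codes mirroring MO_8 / MO_16 / MO_32. */
enum { DEMO_MO_8 = 0, DEMO_MO_16 = 1, DEMO_MO_32 = 2 };

/* Loads, stores, moves and sysreg accesses: allowed with FP or MVE. */
static bool demo_mve_or_fp_ok(const DemoFeatures *f)
{
    return f->fpsp_v2 || f->mve;
}

/*
 * VMOV between a scalar and a general purpose register: MVE provides all
 * element sizes; without MVE, 32-bit is a VFP insn and smaller sizes
 * need Neon.
 */
static bool demo_vmov_scalar_ok(const DemoFeatures *f, int size)
{
    if (f->mve) {
        return true;
    }
    return size == DEMO_MO_32 ? f->fpsp_v2 : f->neon;
}
```

A `false` result here corresponds to the trans function returning `false`, which makes the instruction UNDEF.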

-[PULL 20/39] target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
+[PULL 03/45] target/arm: Move fpsp/fpdp isar check into callers of do_vfp_2op_sp/dp
-Somewhere along the line we accidentally added a duplicate
+The do_vfp_2op_sp() and do_vfp_2op_dp() functions currently check
-"using D16-D31 when they don't exist" check to do_vfm_dp()
+whether floating point is supported via the aa32_fpdp_v2 and
-(probably an artifact of a patch series rebase). Remove it.
+aa32_fpsp_v2 isar checks.  For v8.1M MVE support, the VMOV_reg trans
 functions (but not any of the others) need to update this to also
 allow the insn if MVE is implemented.  Move the check out of the do_
 function and into its callsites (which are all implemented via the
 DO_VFP_2OP macro), so we have a place to change the check for the
 VMOV insns.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210520152840.24453-4-peter.maydell@linaro.org
 Message-id: 20200430181003.21682-2-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.inc.c | 6 ------
+ target/arm/translate-vfp.c | 37 +++++++++++++++++++------------------
-file changed, 6 deletions(-)
+file changed, 19 insertions(+), 18 deletions(-)
-diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.inc.c
+--- a/target/arm/translate-vfp.c
-+++ b/target/arm/translate-vfp.inc.c
++++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      int veclen = s->vec_len;
      TCGv_i32 f0, fd;
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 -        return false;
 -    }
 +    /* Note that the caller must check the aa32_fpsp_v2 feature. */
      if (!dc_isar_feature(aa32_fpshvec, s) &&
          (veclen != 0 || s->vec_stride != 0)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
       */
      TCGv_i32 f0;
 +    /* Note that the caller must check the aa32_fp16_arith feature */
 +
      if (!dc_isar_feature(aa32_fp16_arith, s)) {
          return false;
      }
+@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
--    /* UNDEF accesses to D16-D31 if they don't exist. */
+     int veclen = s->vec_len;
--    if (!dc_isar_feature(aa32_simd_r32, s) &&
+     TCGv_i64 f0, fd;
--        ((a->vd | a->vn | a->vm) & 0x10)) {
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 -        return false;
 -    }
--
++    /* Note that the caller must check the aa32_fpdp_v2 feature. */
-     if (!vfp_access_check(s)) {
-         return true;
+     /* UNDEF accesses to D16-D31 if they don't exist */
      if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
      return true;
  }
 -#define DO_VFP_2OP(INSN, PREC, FN)                              \
 +#define DO_VFP_2OP(INSN, PREC, FN, CHECK)                       \
      static bool trans_##INSN##_##PREC(DisasContext *s,          \
                                        arg_##INSN##_##PREC *a)   \
      {                                                           \
 +        if (!dc_isar_feature(CHECK, s)) {                       \
 +            return false;                                       \
 +        }                                                       \
          return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
      }
+-DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
+-DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
++DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32, aa32_fpsp_v2)
++DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64, aa32_fpdp_v2)
+-DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh)
+-DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
+-DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
++DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith)
++DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2)
++DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd, aa32_fpdp_v2)
+-DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh)
+-DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
+-DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
++DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh, aa32_fp16_arith)
++DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs, aa32_fpsp_v2)
++DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd, aa32_fpdp_v2)
+ static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
+ {
+@@ -XXX,XX +XXX,XX @@ static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
+     gen_helper_vfp_sqrtd(vd, vm, cpu_env);
+ }
+-DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
+-DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
+-DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
++DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp, aa32_fp16_arith)
++DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp, aa32_fpsp_v2)
++DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp, aa32_fpdp_v2)
+ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
+ {
 --
 .20.1

-[PULL 35/39] target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
+[PULL 04/45] target/arm: Add MVE check to VMOV_reg_sp and VMOV_reg_dp
-Convert the Neon 3-reg-same VMAX and VMIN insns to decodetree.
+Split out the handling of VMOV_reg_sp and VMOV_reg_dp so that we can
 permit the insns if either FP or MVE are present.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-17-peter.maydell@linaro.org
+Message-id: 20210520152840.24453-5-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  5 +++++
+ target/arm/translate-vfp.c | 15 +++++++++++++--
- target/arm/translate-neon.inc.c | 14 ++++++++++++++
+file changed, 13 insertions(+), 2 deletions(-)
  target/arm/translate.c          | 21 ++-------------------
 files changed, 21 insertions(+), 19 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/translate-vfp.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
- VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+         return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
- VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
+     }
-+VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
+-DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32, aa32_fpsp_v2)
-+VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
+-DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64, aa32_fpdp_v2)
-+VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
++#define DO_VFP_VMOV(INSN, PREC, FN)                             \
-+VMIN_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
++    static bool trans_##INSN##_##PREC(DisasContext *s,          \
-+
++                                      arg_##INSN##_##PREC *a)   \
- VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
++    {                                                           \
- VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
++        if (!dc_isar_feature(aa32_fp##PREC##_v2, s) &&          \
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
++            !dc_isar_feature(aa32_mve, s)) {                    \
-index XXXXXXX..XXXXXXX 100644
++            return false;                                       \
---- a/target/arm/translate-neon.inc.c
++        }                                                       \
-+++ b/target/arm/translate-neon.inc.c
++        return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
@@ -XXX,XX +XXX,XX @@ DO_3SAME(VEOR, tcg_gen_gvec_xor)
  DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
  DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
  DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
 +
 +#define DO_3SAME_NO_SZ_3(INSN, FUNC)                                    \
 +    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
 +    {                                                                   \
 +        if (a->size == 3) {                                             \
 +            return false;                                               \
 +        }                                                               \
 +        return do_3same(s, a, FUNC);                                    \
 +    }
 +
-+DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
++DO_VFP_VMOV(VMOV_reg, sp, tcg_gen_mov_i32)
-+DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
++DO_VFP_VMOV(VMOV_reg, dp, tcg_gen_mov_i64)
-+DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
-+DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
+ DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+ DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2)
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                               rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
              return 0;
 -        case NEON_3R_VMAX:
 -            if (u) {
 -                tcg_gen_gvec_umax(size, rd_ofs, rn_ofs, rm_ofs,
 -                                  vec_size, vec_size);
 -            } else {
 -                tcg_gen_gvec_smax(size, rd_ofs, rn_ofs, rm_ofs,
 -                                  vec_size, vec_size);
 -            }
 -            return 0;
 -        case NEON_3R_VMIN:
 -            if (u) {
 -                tcg_gen_gvec_umin(size, rd_ofs, rn_ofs, rm_ofs,
 -                                  vec_size, vec_size);
 -            } else {
 -                tcg_gen_gvec_smin(size, rd_ofs, rn_ofs, rm_ofs,
 -                                  vec_size, vec_size);
 -            }
 -            return 0;
 -
          case NEON_3R_VSHL:
              /* Note the operation is vshl vd,vm,vn */
              tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
          case NEON_3R_VADD_VSUB:
          case NEON_3R_LOGIC:
 +        case NEON_3R_VMAX:
 +        case NEON_3R_VMIN:
              /* Already handled by decodetree */
              return 1;
          }
 --
 .20.1

-[PULL 37/39] target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
+[PULL 05/45] target/arm: Fix return values in fp_sysreg_checks()
-Convert the Neon VQADD/VQSUB insns in the 3-reg-same grouping
+The fp_sysreg_checks() function is supposed to be returning an
-to decodetree.
+FPSysRegCheckResult, which is an enum with three possible values.
 However, three places in the function "return false" (a hangover from
 a previous iteration of the design where the function just returned a
 bool).  Make these return FPSysRegCheckFailed instead (for no
 functional change, since both false and FPSysRegCheckFailed are
 zero).
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-19-peter.maydell@linaro.org
+Message-id: 20210520152840.24453-6-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  6 ++++++
+ target/arm/translate-vfp.c | 6 +++---
- target/arm/translate-neon.inc.c | 15 +++++++++++++++
+file changed, 3 insertions(+), 3 deletions(-)
  target/arm/translate.c          | 14 ++------------
 files changed, 23 insertions(+), 12 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/translate-vfp.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
- @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
+         break;
-                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
+     case ARM_VFP_FPSCR_NZCVQC:
+         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-+VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
+-            return false;
-+VQADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 1 .... @3same
++            return FPSysRegCheckFailed;
 +
  @3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
                   &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
@@ -XXX,XX +XXX,XX @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
  VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
  VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
 +VQSUB_S_3s       1111 001 0 0 . .. .... .... 0010 . . . 1 .... @3same
 +VQSUB_U_3s       1111 001 1 0 . .. .... .... 0010 . . . 1 .... @3same
 +
  VCGT_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
  VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
  VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
      tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
  }
  DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
 +
 +#define DO_3SAME_GVEC4(INSN, OPARRAY)                                   \
 +    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
 +                                uint32_t rn_ofs, uint32_t rm_ofs,       \
 +                                uint32_t oprsz, uint32_t maxsz)         \
 +    {                                                                   \
 +        tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),           \
 +                       rn_ofs, rm_ofs, oprsz, maxsz, &OPARRAY[vece]);   \
 +    }                                                                   \
 +    DO_3SAME(INSN, gen_##INSN##_3s)
 +
 +DO_3SAME_GVEC4(VQADD_S, sqadd_op)
 +DO_3SAME_GVEC4(VQADD_U, uqadd_op)
 +DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
 +DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
              }
              return 1;
 -        case NEON_3R_VQADD:
 -            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
 -                           rn_ofs, rm_ofs, vec_size, vec_size,
 -                           (u ? uqadd_op : sqadd_op) + size);
 -            return 0;
 -
 -        case NEON_3R_VQSUB:
 -            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
 -                           rn_ofs, rm_ofs, vec_size, vec_size,
 -                           (u ? uqsub_op : sqsub_op) + size);
 -            return 0;
 -
          case NEON_3R_VMUL: /* VMUL */
              if (u) {
                  /* Polynomial case allows only P8.  */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
          case NEON_3R_VTST_VCEQ:
          case NEON_3R_VCGT:
          case NEON_3R_VCGE:
 +        case NEON_3R_VQADD:
 +        case NEON_3R_VQSUB:
              /* Already handled by decodetree */
              return 1;
          }
+         break;
+     case ARM_VFP_FPCXT_S:
+     case ARM_VFP_FPCXT_NS:
+         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+-            return false;
++            return FPSysRegCheckFailed;
+         }
+         if (!s->v8m_secure) {
+-            return false;
++            return FPSysRegCheckFailed;
+         }
+         break;
+     default:
 --
 .20.1
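
Note on 05/45: why "return false" from an enum-returning function is legal C
and causes no functional change can be shown with a small standalone sketch.
The enum and function names here are hypothetical; the only property taken
from the commit message is that the "failed" value is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-valued result; "failed" is deliberately zero. */
typedef enum {
    CheckFailed = 0,
    CheckDone,
    CheckContinue,
} CheckResult;

/* The two spellings are interchangeable only because both are zero. */
static_assert(CheckFailed == false, "failed value must equal false");

/* Old style: compiles because false converts to 0, but hides the intent. */
static CheckResult check_old(bool feature_present)
{
    if (!feature_present) {
        return false;
    }
    return CheckContinue;
}

/* New style: same value, but the return type is respected. */
static CheckResult check_new(bool feature_present)
{
    if (!feature_present) {
        return CheckFailed;
    }
    return CheckContinue;
}
```

If the enum were ever reordered so that the failure value was no longer zero,
the old spelling would silently start returning a different result.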

-[PULL 30/39] target/arm: Convert Neon load/store multiple structures to decodetree
+[PULL 06/45] target/arm: Implement M-profile VPR register
-Convert the Neon "load/store multiple structures" insns to decodetree.
+If MVE is implemented for an M-profile CPU then it has a VPR
 register, which tracks predication information.
 Implement the read and write handling of this register, and
 the migration of its state.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-12-peter.maydell@linaro.org
+Message-id: 20210520152840.24453-7-peter.maydell@linaro.org
 ---
- target/arm/neon-ls.decode       |   7 ++
+ target/arm/cpu.h           |  6 ++++++
- target/arm/translate-neon.inc.c | 124 ++++++++++++++++++++++++++++++++
+ target/arm/machine.c       | 19 +++++++++++++++++++
- target/arm/translate.c          |  91 +----------------------
+ target/arm/translate-vfp.c | 38 ++++++++++++++++++++++++++++++++++++++
-files changed, 133 insertions(+), 89 deletions(-)
+files changed, 63 insertions(+)
-diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-ls.decode
+--- a/target/arm/cpu.h
-+++ b/target/arm/neon-ls.decode
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
- #   0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
+         uint32_t cpacr[M_REG_NUM_BANKS];
- # This file works on the A32 encoding only; calling code for T32 has to
+         uint32_t nsacr;
- # transform the insn into the A32 version first.
+         int ltpsize;
 +        uint32_t vpr;
      } v7m;
      /* Information associated with an exception about to be taken:
@@ -XXX,XX +XXX,XX @@ FIELD(V7M_FPCCR, ASPEN, 31, 1)
       R_V7M_FPCCR_UFRDY_MASK |                   \
       R_V7M_FPCCR_ASPEN_MASK)
 +/* v7M VPR bits */
 +FIELD(V7M_VPR, P0, 0, 16)
 +FIELD(V7M_VPR, MASK01, 16, 4)
 +FIELD(V7M_VPR, MASK23, 20, 4)
 +
-+%vd_dp  22:1 12:4
+ /*
   * System register ID fields.
   */
 diff --git a/target/arm/machine.c b/target/arm/machine.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/machine.c
 +++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_fp = {
      }
  };
 +static bool mve_needed(void *opaque)
 +{
 +    ARMCPU *cpu = opaque;
 +
-+# Neon load/store multiple structures
++    return cpu_isar_feature(aa32_mve, cpu);
 +}
 +
-+VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
++static const VMStateDescription vmstate_m_mve = {
-+               vd=%vd_dp
++    .name = "cpu/m/mve",
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
++    .version_id = 1,
-index XXXXXXX..XXXXXXX 100644
++    .minimum_version_id = 1,
---- a/target/arm/translate-neon.inc.c
++    .needed = mve_needed,
-+++ b/target/arm/translate-neon.inc.c
++    .fields = (VMStateField[]) {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
++        VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
-                        gen_helper_gvec_fmlal_idx_a32);
++        VMSTATE_END_OF_LIST()
-     return true;
++    },
  }
 +
 +static struct {
 +    int nregs;
 +    int interleave;
 +    int spacing;
 +} const neon_ls_element_type[11] = {
 +    {1, 4, 1},
 +    {1, 4, 2},
 +    {4, 1, 1},
 +    {2, 2, 2},
 +    {1, 3, 1},
 +    {1, 3, 2},
 +    {3, 1, 1},
 +    {1, 1, 1},
 +    {1, 2, 1},
 +    {1, 2, 2},
 +    {2, 1, 1}
 +};
 +
-+static void gen_neon_ldst_base_update(DisasContext *s, int rm, int rn,
+ static const VMStateDescription vmstate_m = {
-+                                      int stride)
+     .name = "cpu/m",
-+{
+     .version_id = 4,
-+    if (rm != 15) {
+@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m = {
-+        TCGv_i32 base;
+         &vmstate_m_other_sp,
-+
+         &vmstate_m_v8m,
-+        base = load_reg(s, rn);
+         &vmstate_m_fp,
-+        if (rm == 13) {
++        &vmstate_m_mve,
-+            tcg_gen_addi_i32(base, base, stride);
+         NULL
-+        } else {
+     }
-+            TCGv_i32 index;
+ };
-+            index = load_reg(s, rm);
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
-+            tcg_gen_add_i32(base, base, index);
+index XXXXXXX..XXXXXXX 100644
-+            tcg_temp_free_i32(index);
+--- a/target/arm/translate-vfp.c
-+        }
++++ b/target/arm/translate-vfp.c
-+        store_reg(s, rn, base);
+@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
-+    }
+             return FPSysRegCheckFailed;
-+}
+         }
-+
+         break;
-+static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
++    case ARM_VFP_VPR:
-+{
++    case ARM_VFP_P0:
-+    /* Neon load/store multiple structures */
++        if (!dc_isar_feature(aa32_mve, s)) {
-+    int nregs, interleave, spacing, reg, n;
++            return FPSysRegCheckFailed;
 +    MemOp endian = s->be_data;
 +    int mmu_idx = get_mem_index(s);
 +    int size = a->size;
 +    TCGv_i64 tmp64;
 +    TCGv_i32 addr, tmp;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
 +        return false;
 +    }
 +    if (a->itype > 10) {
 +        return false;
 +    }
 +    /* Catch UNDEF cases for bad values of align field */
 +    switch (a->itype & 0xc) {
 +    case 4:
 +        if (a->align >= 2) {
 +            return false;
 +        }
 +        break;
-+    case 8:
+     default:
-+        if (a->align == 3) {
+         return FPSysRegCheckFailed;
-+            return false;
+     }
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
          tcg_temp_free_i32(sfpa);
          break;
      }
 +    case ARM_VFP_VPR:
 +        /* Behaves as NOP if not privileged */
 +        if (IS_USER(s)) {
 +            break;
 +        }
++        tmp = loadfn(s, opaque);
++        store_cpu_field(tmp, v7m.vpr);
 +        break;
-+    default:
++    case ARM_VFP_P0:
 +    {
 +        TCGv_i32 vpr;
 +        tmp = loadfn(s, opaque);
 +        vpr = load_cpu_field(v7m.vpr);
 +        tcg_gen_deposit_i32(vpr, vpr, tmp,
 +                            R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
 +        store_cpu_field(vpr, v7m.vpr);
 +        tcg_temp_free_i32(tmp);
 +        break;
 +    }
-+    nregs = neon_ls_element_type[a->itype].nregs;
+     default:
-+    interleave = neon_ls_element_type[a->itype].interleave;
+         g_assert_not_reached();
-+    spacing = neon_ls_element_type[a->itype].spacing;
+     }
-+    if (size == 3 && (interleave | spacing) != 1) {
+@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
-+        return false;
+         tcg_temp_free_i32(fpscr);
-+    }
+         break;
-+
+     }
-+    if (!vfp_access_check(s)) {
++    case ARM_VFP_VPR:
-+        return true;
++        /* Behaves as NOP if not privileged */
-+    }
++        if (IS_USER(s)) {
-+
++            break;
 +    /* For our purposes, bytes are always little-endian.  */
 +    if (size == 0) {
 +        endian = MO_LE;
 +    }
 +    /*
 +     * Consecutive little-endian elements from a single register
 +     * can be promoted to a larger little-endian operation.
 +     */
 +    if (interleave == 1 && endian == MO_LE) {
 +        size = 3;
 +    }
 +    tmp64 = tcg_temp_new_i64();
 +    addr = tcg_temp_new_i32();
 +    tmp = tcg_const_i32(1 << size);
 +    load_reg_var(s, addr, a->rn);
 +    for (reg = 0; reg < nregs; reg++) {
 +        for (n = 0; n < 8 >> size; n++) {
 +            int xs;
 +            for (xs = 0; xs < interleave; xs++) {
 +                int tt = a->vd + reg + spacing * xs;
 +
 +                if (a->l) {
 +                    gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
 +                    neon_store_element64(tt, n, size, tmp64);
 +                } else {
 +                    neon_load_element64(tmp64, tt, n, size);
 +                    gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
 +                }
 +                tcg_gen_add_i32(addr, addr, tmp);
 +            }
 +        }
-+    }
++        tmp = load_cpu_field(v7m.vpr);
-+    tcg_temp_free_i32(addr);
++        storefn(s, opaque, tmp);
-+    tcg_temp_free_i32(tmp);
++        break;
-+    tcg_temp_free_i64(tmp64);
++    case ARM_VFP_P0:
-+
++        tmp = load_cpu_field(v7m.vpr);
-+    gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
++        tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
-+    return true;
++        storefn(s, opaque, tmp);
-+}
++        break;
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+     default:
-index XXXXXXX..XXXXXXX 100644
+         g_assert_not_reached();
---- a/target/arm/translate.c
+     }
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
  }
 -static struct {
 -    int nregs;
 -    int interleave;
 -    int spacing;
 -} const neon_ls_element_type[11] = {
 -    {1, 4, 1},
 -    {1, 4, 2},
 -    {4, 1, 1},
 -    {2, 2, 2},
 -    {1, 3, 1},
 -    {1, 3, 2},
 -    {3, 1, 1},
 -    {1, 1, 1},
 -    {1, 2, 1},
 -    {1, 2, 2},
 -    {2, 1, 1}
 -};
 -
  /* Translate a NEON load/store element instruction.  Return nonzero if the
     instruction is invalid.  */
  static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
  {
      int rd, rn, rm;
 -    int op;
      int nregs;
 -    int interleave;
 -    int spacing;
      int stride;
      int size;
      int reg;
      int load;
 -    int n;
      int vec_size;
 -    int mmu_idx;
 -    MemOp endian;
      TCGv_i32 addr;
      TCGv_i32 tmp;
 -    TCGv_i32 tmp2;
 -    TCGv_i64 tmp64;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
      rn = (insn >> 16) & 0xf;
      rm = insn & 0xf;
      load = (insn & (1 << 21)) != 0;
 -    endian = s->be_data;
 -    mmu_idx = get_mem_index(s);
      if ((insn & (1 << 23)) == 0) {
 -        /* Load store all elements.  */
 -        op = (insn >> 8) & 0xf;
 -        size = (insn >> 6) & 3;
 -        if (op > 10)
 -            return 1;
 -        /* Catch UNDEF cases for bad values of align field */
 -        switch (op & 0xc) {
 -        case 4:
 -            if (((insn >> 5) & 1) == 1) {
 -                return 1;
 -            }
 -            break;
 -        case 8:
 -            if (((insn >> 4) & 3) == 3) {
 -                return 1;
 -            }
 -            break;
 -        default:
 -            break;
 -        }
 -        nregs = neon_ls_element_type[op].nregs;
 -        interleave = neon_ls_element_type[op].interleave;
 -        spacing = neon_ls_element_type[op].spacing;
 -        if (size == 3 && (interleave | spacing) != 1) {
 -            return 1;
 -        }
 -        /* For our purposes, bytes are always little-endian.  */
 -        if (size == 0) {
 -            endian = MO_LE;
 -        }
 -        /* Consecutive little-endian elements from a single register
 -         * can be promoted to a larger little-endian operation.
 -         */
 -        if (interleave == 1 && endian == MO_LE) {
 -            size = 3;
 -        }
 -        tmp64 = tcg_temp_new_i64();
 -        addr = tcg_temp_new_i32();
 -        tmp2 = tcg_const_i32(1 << size);
 -        load_reg_var(s, addr, rn);
 -        for (reg = 0; reg < nregs; reg++) {
 -            for (n = 0; n < 8 >> size; n++) {
 -                int xs;
 -                for (xs = 0; xs < interleave; xs++) {
 -                    int tt = rd + reg + spacing * xs;
 -
 -                    if (load) {
 -                        gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
 -                        neon_store_element64(tt, n, size, tmp64);
 -                    } else {
 -                        neon_load_element64(tmp64, tt, n, size);
 -                        gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
 -                    }
 -                    tcg_gen_add_i32(addr, addr, tmp2);
 -                }
 -            }
 -        }
 -        tcg_temp_free_i32(addr);
 -        tcg_temp_free_i32(tmp2);
 -        tcg_temp_free_i64(tmp64);
 -        stride = nregs * interleave * 8;
 +        /* Load store all elements -- handled already by decodetree */
 +        return 1;
      } else {
          size = (insn >> 10) & 3;
          if (size == 3) {
 --
 .20.1
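
Note on 06/45: P0 is the low 16 bits of VPR, so a write to P0 has to preserve
the mask fields above it and a read has to return only those 16 bits. A
minimal sketch of that bit handling in plain C follows. The helper names are
hypothetical, and they only model the extract/deposit operations that the
patch emits as TCG ops; the field position is taken from the FIELD()
definitions in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* P0 field position, from FIELD(V7M_VPR, P0, 0, 16) in the patch. */
#define VPR_P0_SHIFT   0
#define VPR_P0_LENGTH  16

/* Return the length-bit field of value that starts at bit start. */
static uint32_t field_extract(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (UINT32_MAX >> (32 - length));
}

/* Return value with the field at [start, start + length) replaced. */
static uint32_t field_deposit(uint32_t value, int start, int length,
                              uint32_t field)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    uint32_t mask = (UINT32_MAX >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Reading P0 yields only the low 16 bits of VPR. */
static uint32_t vpr_read_p0(uint32_t vpr)
{
    return field_extract(vpr, VPR_P0_SHIFT, VPR_P0_LENGTH);
}

/* Writing P0 leaves the other VPR bits untouched. */
static uint32_t vpr_write_p0(uint32_t vpr, uint32_t val)
{
    return field_deposit(vpr, VPR_P0_SHIFT, VPR_P0_LENGTH, val);
}
```

For example, writing 0xffffbeef to P0 when VPR holds 0x00ab1234 gives
0x00abbeef: the upper half of the written value is discarded and the upper
half of VPR is kept.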

-[PULL 36/39] target/arm: Convert Neon 3-reg-same comparisons to decodetree
+[PULL 07/45] target/arm: Make FPSCR.LTPSIZE writable for MVE
-Convert the Neon comparison ops in the 3-reg-same grouping
+The M-profile FPSCR has an LTPSIZE field, but if MVE is not
-to decodetree.
+implemented it is read-only and always reads as 4; this is how QEMU
 currently handles it.
 Make the field writable when MVE is implemented.
 We can safely add the field to the MVE migration struct because
 currently no CPUs enable MVE and so the migration struct is never
 used.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-18-peter.maydell@linaro.org
+Message-id: 20210520152840.24453-8-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  8 ++++++++
+ target/arm/cpu.h        | 3 ++-
- target/arm/translate-neon.inc.c | 22 ++++++++++++++++++++++
+ target/arm/machine.c    | 1 +
- target/arm/translate.c          | 23 +++--------------------
+ target/arm/vfp_helper.c | 9 ++++++---
-files changed, 33 insertions(+), 20 deletions(-)
+files changed, 9 insertions(+), 4 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/cpu.h
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
- VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+         uint32_t fpdscr[M_REG_NUM_BANKS];
- VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
+         uint32_t cpacr[M_REG_NUM_BANKS];
+         uint32_t nsacr;
-+VCGT_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
+-        int ltpsize;
-+VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
++        uint32_t ltpsize;
-+VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
+         uint32_t vpr;
-+VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
+     } v7m;
@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
  #define FPCR_LTPSIZE_SHIFT 16   /* LTPSIZE, M-profile only */
  #define FPCR_LTPSIZE_MASK (7 << FPCR_LTPSIZE_SHIFT)
 +#define FPCR_LTPSIZE_LENGTH 3
  #define FPCR_NZCV_MASK (FPCR_N | FPCR_Z | FPCR_C | FPCR_V)
  #define FPCR_NZCVQC_MASK (FPCR_NZCV_MASK | FPCR_QC)
 diff --git a/target/arm/machine.c b/target/arm/machine.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/machine.c
 +++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_mve = {
      .needed = mve_needed,
      .fields = (VMStateField[]) {
          VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
 +        VMSTATE_UINT32(env.v7m.ltpsize, ARMCPU),
          VMSTATE_END_OF_LIST()
      },
  };
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpscr(CPUARMState *env)
  void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
  {
 +    ARMCPU *cpu = env_archcpu(env);
 +
- VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
+     /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
- VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
+-    if (!cpu_isar_feature(any_fp16, env_archcpu(env))) {
- VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
++    if (!cpu_isar_feature(any_fp16, cpu)) {
-@@ -XXX,XX +XXX,XX @@ VMIN_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
+         val &= ~FPCR_FZ16;
+     }
- VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
- VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
+@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
-+
+          * because in v7A no-short-vector-support cores still had to
-+VTST_3s          1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
+          * allow Stride/Len to be written with the only effect that
-+VCEQ_3s          1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
+          * some insns are required to UNDEF if the guest sets them.
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+-         *
-index XXXXXXX..XXXXXXX 100644
+-         * TODO: if M-profile MVE implemented, set LTPSIZE.
---- a/target/arm/translate-neon.inc.c
+          */
-+++ b/target/arm/translate-neon.inc.c
+         env->vfp.vec_len = extract32(val, 16, 3);
-@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
+         env->vfp.vec_stride = extract32(val, 20, 2);
- DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
++    } else if (cpu_isar_feature(aa32_mve, cpu)) {
- DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
++        env->v7m.ltpsize = extract32(val, FPCR_LTPSIZE_SHIFT,
- DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
++                                     FPCR_LTPSIZE_LENGTH);
-+
+     }
-+#define DO_3SAME_CMP(INSN, COND)                                        \
-+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+     if (arm_feature(env, ARM_FEATURE_NEON)) {
 +                                uint32_t rn_ofs, uint32_t rm_ofs,       \
 +                                uint32_t oprsz, uint32_t maxsz)         \
 +    {                                                                   \
 +        tcg_gen_gvec_cmp(COND, vece, rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz); \
 +    }                                                                   \
 +    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
 +
 +DO_3SAME_CMP(VCGT_S, TCG_COND_GT)
 +DO_3SAME_CMP(VCGT_U, TCG_COND_GTU)
 +DO_3SAME_CMP(VCGE_S, TCG_COND_GE)
 +DO_3SAME_CMP(VCGE_U, TCG_COND_GEU)
 +DO_3SAME_CMP(VCEQ, TCG_COND_EQ)
 +
 +static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 +                         uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
 +{
 +    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
 +}
 +DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             u ? &mls_op[size] : &mla_op[size]);
              return 0;
 -        case NEON_3R_VTST_VCEQ:
 -            if (u) { /* VCEQ */
 -                tcg_gen_gvec_cmp(TCG_COND_EQ, size, rd_ofs, rn_ofs, rm_ofs,
 -                                 vec_size, vec_size);
 -            } else { /* VTST */
 -                tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
 -                               vec_size, vec_size, &cmtst_op[size]);
 -            }
 -            return 0;
 -
 -        case NEON_3R_VCGT:
 -            tcg_gen_gvec_cmp(u ? TCG_COND_GTU : TCG_COND_GT, size,
 -                             rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
 -            return 0;
 -
 -        case NEON_3R_VCGE:
 -            tcg_gen_gvec_cmp(u ? TCG_COND_GEU : TCG_COND_GE, size,
 -                             rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
 -            return 0;
 -
          case NEON_3R_VSHL:
              /* Note the operation is vshl vd,vm,vn */
              tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
          case NEON_3R_LOGIC:
          case NEON_3R_VMAX:
          case NEON_3R_VMIN:
 +        case NEON_3R_VTST_VCEQ:
 +        case NEON_3R_VCGT:
 +        case NEON_3R_VCGE:
              /* Already handled by decodetree */
              return 1;
          }
 --
 .20.1
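
The LTPSIZE handling in the vfp_helper.c hunk above reads a 3-bit field starting at bit 16 of the written FPSCR value. A minimal sketch of that field extraction follows. The helper extract_field32 and the wrapper fpcr_ltpsize are hypothetical names for illustration; only the three FPCR_LTPSIZE_* macros are taken from the patch, and the helper's semantics (shift right, then mask to the field length) are an assumption about what a bitfield extract does here.

```c
#include <assert.h>
#include <stdint.h>

/* Field definitions as they appear in the patch. */
#define FPCR_LTPSIZE_SHIFT 16
#define FPCR_LTPSIZE_MASK (7 << FPCR_LTPSIZE_SHIFT)
#define FPCR_LTPSIZE_LENGTH 3

/*
 * Hypothetical local stand-in for a bitfield extract helper:
 * returns 'length' bits of 'value' starting at bit 'start'.
 */
static uint32_t extract_field32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0u >> (32 - length));
}

/* Hypothetical wrapper: pull LTPSIZE out of a written FPSCR value. */
static uint32_t fpcr_ltpsize(uint32_t fpcr)
{
    return extract_field32(fpcr, FPCR_LTPSIZE_SHIFT, FPCR_LTPSIZE_LENGTH);
}
```

Under these assumptions, fpcr_ltpsize(0x00040000) yields 4, and the result always agrees with masking by FPCR_LTPSIZE_MASK and shifting right by FPCR_LTPSIZE_SHIFT.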

-[PULL 08/39] target/arm: Use uint64_t for midr field in CPU state struct
+[PULL 08/45] target/arm: Allow board models to specify initial NS VTOR
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Currently we allow board models to specify the initial value of the
 Secure VTOR register, using an init-svtor property on the TYPE_ARMV7M
 object which is plumbed through to the CPU.  Allow board models to
 also specify the initial value of the Non-secure VTOR via a similar
 init-nsvtor property.
-MIDR_EL1 is a 64-bit system register with the top 32-bit being RES0.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Represent it in QEMU's ARMCPU struct with a uint64_t, not a
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-uint32_t.
+Message-id: 20210520152840.24453-10-peter.maydell@linaro.org
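
As a rough illustration of what the two properties feed into at reset, the sketch below applies the same 0xffffff80 alignment mask that the target/arm/cpu.c hunk of this patch uses to both banked vector base values. The struct VtorInit, the bank indices and reset_vecbase are hypothetical names for this example only; they are not the QEMU types.

```c
#include <assert.h>
#include <stdint.h>

/* Same alignment mask as the reset code in the patch applies. */
#define VTOR_ALIGN_MASK 0xffffff80u

/* Hypothetical bank indices for this sketch. */
enum { BANK_NS = 0, BANK_S = 1, NUM_BANKS = 2 };

/* Hypothetical stand-in for the two board-supplied reset values. */
typedef struct {
    uint32_t init_svtor;
    uint32_t init_nsvtor;
} VtorInit;

/* Copy both reset values into the banked vector bases, masked. */
static void reset_vecbase(const VtorInit *init, uint32_t vecbase[NUM_BANKS])
{
    vecbase[BANK_S] = init->init_svtor & VTOR_ALIGN_MASK;
    vecbase[BANK_NS] = init->init_nsvtor & VTOR_ALIGN_MASK;
}
```

With init_svtor = 0x10000000 and init_nsvtor = 0x00200044, the sketch gives a Secure base of 0x10000000 and a Non-secure base of 0x00200000, since the low seven bits are dropped.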
 ---
  include/hw/arm/armv7m.h |  2 ++
  target/arm/cpu.h        |  2 ++
  hw/arm/armv7m.c         |  7 +++++++
  target/arm/cpu.c        | 10 ++++++++++
 files changed, 21 insertions(+)
-This fixes an error when compiling with -Werror=conversion
+diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
-because we were manipulating the register value using a
+index XXXXXXX..XXXXXXX 100644
-local uint64_t variable:
+--- a/include/hw/arm/armv7m.h
++++ b/include/hw/arm/armv7m.h
-  target/arm/cpu64.c: In function ‘aarch64_max_initfn’:
+@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(ARMv7MState, ARMV7M)
-  target/arm/cpu64.c:628:21: error: conversion from ‘uint64_t’ {aka ‘long unsigned int’} to ‘uint32_t’ {aka ‘unsigned int’} may change value [-Werror=conversion]
+  *   devices will be automatically layered on top of this view.)
-|         cpu->midr = t;
+  * + Property "idau": IDAU interface (forwarded to CPU object)
-        |                     ^
+  * + Property "init-svtor": secure VTOR reset value (forwarded to CPU object)
++ * + Property "init-nsvtor": non-secure VTOR reset value (forwarded to CPU object)
-and future-proofs us against a possible future architecture
+  * + Property "vfp": enable VFP (forwarded to CPU object)
-change using some of the top 32 bits.
+  * + Property "dsp": enable DSP (forwarded to CPU object)
+  * + Property "enable-bitband": expose bitbanded IO
-Suggested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
+@@ -XXX,XX +XXX,XX @@ struct ARMv7MState {
-Suggested-by: Peter Maydell <peter.maydell@linaro.org>
+     MemoryRegion *board_memory;
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+     Object *idau;
-Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
+     uint32_t init_svtor;
-Message-id: 20200428172634.29707-1-f4bug@amsat.org
++    uint32_t init_nsvtor;
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+     bool enable_bitband;
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+     bool start_powered_off;
----
+     bool vfp;
  target/arm/cpu.h | 2 +-
  target/arm/cpu.c | 2 +-
 files changed, 2 insertions(+), 2 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
 @@ -XXX,XX +XXX,XX @@ struct ARMCPU {
-         uint64_t id_aa64dfr0;
-         uint64_t id_aa64dfr1;
+     /* For v8M, initial value of the Secure VTOR */
-     } isar;
+     uint32_t init_svtor;
--    uint32_t midr;
++    /* For v8M, initial value of the Non-secure VTOR */
-+    uint64_t midr;
++    uint32_t init_nsvtor;
-     uint32_t revidr;
-     uint32_t reset_fpsid;
+     /* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or
-     uint32_t ctr;
+      * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type.
 diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/armv7m.c
 +++ b/hw/arm/armv7m.c
@@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp)
              return;
          }
      }
 +    if (object_property_find(OBJECT(s->cpu), "init-nsvtor")) {
 +        if (!object_property_set_uint(OBJECT(s->cpu), "init-nsvtor",
 +                                      s->init_nsvtor, errp)) {
 +            return;
 +        }
 +    }
      if (object_property_find(OBJECT(s->cpu), "start-powered-off")) {
          if (!object_property_set_bool(OBJECT(s->cpu), "start-powered-off",
                                        s->start_powered_off, errp)) {
@@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = {
                       MemoryRegion *),
      DEFINE_PROP_LINK("idau", ARMv7MState, idau, TYPE_IDAU_INTERFACE, Object *),
      DEFINE_PROP_UINT32("init-svtor", ARMv7MState, init_svtor, 0),
 +    DEFINE_PROP_UINT32("init-nsvtor", ARMv7MState, init_nsvtor, 0),
      DEFINE_PROP_BOOL("enable-bitband", ARMv7MState, enable_bitband, false),
      DEFINE_PROP_BOOL("start-powered-off", ARMv7MState, start_powered_off,
                       false),
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_cpus[] = {
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
- static Property arm_cpu_properties[] = {
+         env->regs[14] = 0xffffffff;
-     DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
-     DEFINE_PROP_UINT32("psci-conduit", ARMCPU, psci_conduit, 0),
+         env->v7m.vecbase[M_REG_S] = cpu->init_svtor & 0xffffff80;
--    DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
++        env->v7m.vecbase[M_REG_NS] = cpu->init_nsvtor & 0xffffff80;
-+    DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
-     DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
+         /* Load the initial SP and PC from offset 0 and 4 in the vector table */
-                         mp_affinity, ARM64_AFFINITY_INVALID),
+         vecbase = env->v7m.vecbase[env->v7m.secure];
-     DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
+@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
                                         &cpu->init_svtor,
                                         OBJ_PROP_FLAG_READWRITE);
      }
 +    if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
 +        /*
 +         * Initial value of the NS VTOR (for cores without the Security
 +         * extension, this is the only VTOR)
 +         */
 +        object_property_add_uint32_ptr(obj, "init-nsvtor",
 +                                       &cpu->init_nsvtor,
 +                                       OBJ_PROP_FLAG_READWRITE);
 +    }
      qdev_property_add_static(DEVICE(obj), &arm_cpu_cfgend_property);
 --
 .20.1

-[PULL 34/39] target/arm: Convert Neon 3-reg-same logic ops to decodetree
+[PULL 09/45] arm: Consistently use "Cortex-Axx", not "Cortex Axx"
-Convert the Neon logic ops in the 3-reg-same grouping to decodetree.
+The official punctuation for Arm CPU names uses a hyphen, like
-Note that for the logic ops the 'size' field forms part of their
+"Cortex-A9". We mostly follow this, but in a few places usage
-decode and the actual operations are always bitwise.
+without the hyphen has crept in. Fix those so we consistently
 use the same way of writing the CPU name.
 This commit was created with:
   git grep -z -l 'Cortex ' | xargs -0 sed -i 's/Cortex /Cortex-/'
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-16-peter.maydell@linaro.org
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Message-id: 20210527095152.10968-1-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       | 12 +++++++++++
+ docs/system/arm/aspeed.rst    | 4 ++--
- target/arm/translate-neon.inc.c | 19 +++++++++++++++++
+ docs/system/arm/nuvoton.rst   | 6 +++---
- target/arm/translate.c          | 38 +--------------------------------
+ docs/system/arm/sabrelite.rst | 2 +-
-files changed, 32 insertions(+), 37 deletions(-)
+ include/hw/arm/allwinner-h3.h | 2 +-
+ hw/arm/aspeed.c               | 6 +++---
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+ hw/arm/mcimx6ul-evk.c         | 2 +-
-index XXXXXXX..XXXXXXX 100644
+ hw/arm/mcimx7d-sabre.c        | 2 +-
---- a/target/arm/neon-dp.decode
+ hw/arm/npcm7xx_boards.c       | 4 ++--
-+++ b/target/arm/neon-dp.decode
+ hw/arm/sabrelite.c            | 2 +-
  hw/misc/npcm7xx_clk.c         | 2 +-
 files changed, 16 insertions(+), 16 deletions(-)
 diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/system/arm/aspeed.rst
 +++ b/docs/system/arm/aspeed.rst
@@ -XXX,XX +XXX,XX @@ The QEMU Aspeed machines model BMCs of various OpenPOWER systems and
  Aspeed evaluation boards. They are based on different releases of the
  Aspeed SoC : the AST2400 integrating an ARM926EJ-S CPU (400MHz), the
  AST2500 with an ARM1176JZS CPU (800MHz) and more recently the AST2600
 -with dual cores ARM Cortex A7 CPUs (1.2GHz).
 +with dual cores ARM Cortex-A7 CPUs (1.2GHz).
  The SoC comes with RAM, Gigabit ethernet, USB, SD/MMC, USB, SPI, I2C,
  etc.
@@ -XXX,XX +XXX,XX @@ AST2500 SoC based machines :
  AST2600 SoC based machines :
 -- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex A7)
 +- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex-A7)
  - ``tacoma-bmc``           OpenPOWER Witherspoon POWER9 AST2600 BMC
  Supported devices
 diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/system/arm/nuvoton.rst
 +++ b/docs/system/arm/nuvoton.rst
@@ -XXX,XX +XXX,XX @@ Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
  The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
  designed to be used as Baseboard Management Controllers (BMCs) in various
 -servers. They all feature one or two ARM Cortex A9 CPU cores, as well as an
 +servers. They all feature one or two ARM Cortex-A9 CPU cores, as well as an
  assortment of peripherals targeted for either Enterprise or Data Center /
  Hyperscale applications. The former is a superset of the latter, so NPCM750 has
  all the peripherals of NPCM730 and more.
  .. _Nuvoton iBMC: https://www.nuvoton.com/products/cloud-computing/ibmc/
 -The NPCM750 SoC has two Cortex A9 cores and is targeted for the Enterprise
 +The NPCM750 SoC has two Cortex-A9 cores and is targeted for the Enterprise
  segment. The following machines are based on this chip :
  - ``npcm750-evb``       Nuvoton NPCM750 Evaluation board
 -The NPCM730 SoC has two Cortex A9 cores and is targeted for Data Center and
 +The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
  Hyperscale applications. The following machines are based on this chip :
  - ``quanta-gsj``        Quanta GSJ server BMC
 diff --git a/docs/system/arm/sabrelite.rst b/docs/system/arm/sabrelite.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/system/arm/sabrelite.rst
 +++ b/docs/system/arm/sabrelite.rst
@@ -XXX,XX +XXX,XX @@ Supported devices
  The SABRE Lite machine supports the following devices:
 - * Up to 4 Cortex A9 cores
 + * Up to 4 Cortex-A9 cores
   * Generic Interrupt Controller
   * 1 Clock Controller Module
   * 1 System Reset Controller
 diff --git a/include/hw/arm/allwinner-h3.h b/include/hw/arm/allwinner-h3.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/arm/allwinner-h3.h
 +++ b/include/hw/arm/allwinner-h3.h
 @@ -XXX,XX +XXX,XX @@
- @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
+  */
-                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ /*
-+@3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
+- * The Allwinner H3 is a System on Chip containing four ARM Cortex A7
-+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
++ * The Allwinner H3 is a System on Chip containing four ARM Cortex-A7
-+
+  * processor cores. Features and specifications include DDR2/DDR3 memory,
-+VAND_3s          1111 001 0 0 . 00 .... .... 0001 ... 1 .... @3same_logic
+  * SD/MMC storage cards, 10/100/1000Mbit Ethernet, USB 2.0, HDMI and
-+VBIC_3s          1111 001 0 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+  * various I/O modules.
-+VORR_3s          1111 001 0 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
-+VORN_3s          1111 001 0 0 . 11 .... .... 0001 ... 1 .... @3same_logic
+index XXXXXXX..XXXXXXX 100644
-+VEOR_3s          1111 001 1 0 . 00 .... .... 0001 ... 1 .... @3same_logic
+--- a/hw/arm/aspeed.c
-+VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
++++ b/hw/arm/aspeed.c
-+VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_ast2600_evb_class_init(ObjectClass *oc, void *data)
-+VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
+     MachineClass *mc = MACHINE_CLASS(oc);
-+
+     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
- VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
- VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
+-    mc->desc       = "Aspeed AST2600 EVB (Cortex A7)";
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
++    mc->desc       = "Aspeed AST2600 EVB (Cortex-A7)";
-index XXXXXXX..XXXXXXX 100644
+     amc->soc_name  = "ast2600-a1";
---- a/target/arm/translate-neon.inc.c
+     amc->hw_strap1 = AST2600_EVB_HW_STRAP1;
-+++ b/target/arm/translate-neon.inc.c
+     amc->hw_strap2 = AST2600_EVB_HW_STRAP2;
-@@ -XXX,XX +XXX,XX @@ static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
+@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_tacoma_class_init(ObjectClass *oc, void *data)
+     MachineClass *mc = MACHINE_CLASS(oc);
- DO_3SAME(VADD, tcg_gen_gvec_add)
+     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
- DO_3SAME(VSUB, tcg_gen_gvec_sub)
-+DO_3SAME(VAND, tcg_gen_gvec_and)
+-    mc->desc       = "OpenPOWER Tacoma BMC (Cortex A7)";
-+DO_3SAME(VBIC, tcg_gen_gvec_andc)
++    mc->desc       = "OpenPOWER Tacoma BMC (Cortex-A7)";
-+DO_3SAME(VORR, tcg_gen_gvec_or)
+     amc->soc_name  = "ast2600-a1";
-+DO_3SAME(VORN, tcg_gen_gvec_orc)
+     amc->hw_strap1 = TACOMA_BMC_HW_STRAP1;
-+DO_3SAME(VEOR, tcg_gen_gvec_xor)
+     amc->hw_strap2 = TACOMA_BMC_HW_STRAP2;
-+
+@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_rainier_class_init(ObjectClass *oc, void *data)
-+/* These insns are all gvec_bitsel but with the inputs in various orders. */
+     MachineClass *mc = MACHINE_CLASS(oc);
-+#define DO_3SAME_BITSEL(INSN, O1, O2, O3)                               \
+     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
-+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
-+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+-    mc->desc       = "IBM Rainier BMC (Cortex A7)";
-+                                uint32_t oprsz, uint32_t maxsz)         \
++    mc->desc       = "IBM Rainier BMC (Cortex-A7)";
-+    {                                                                   \
+     amc->soc_name  = "ast2600-a1";
-+        tcg_gen_gvec_bitsel(vece, rd_ofs, O1, O2, O3, oprsz, maxsz);    \
+     amc->hw_strap1 = RAINIER_BMC_HW_STRAP1;
-+    }                                                                   \
+     amc->hw_strap2 = RAINIER_BMC_HW_STRAP2;
-+    DO_3SAME(INSN, gen_##INSN##_3s)
+diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
-+
+index XXXXXXX..XXXXXXX 100644
-+DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
+--- a/hw/arm/mcimx6ul-evk.c
-+DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
++++ b/hw/arm/mcimx6ul-evk.c
-+DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
+@@ -XXX,XX +XXX,XX @@ static void mcimx6ul_evk_init(MachineState *machine)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
+ static void mcimx6ul_evk_machine_init(MachineClass *mc)
---- a/target/arm/translate.c
+ {
-+++ b/target/arm/translate.c
+-    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex A7)";
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex-A7)";
-             }
+     mc->init = mcimx6ul_evk_init;
-             return 1;
+     mc->max_cpus = FSL_IMX6UL_NUM_CPUS;
+     mc->default_ram_id = "mcimx6ul-evk.ram";
--        case NEON_3R_LOGIC: /* Logic ops.  */
+diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
--            switch ((u << 2) | size) {
+index XXXXXXX..XXXXXXX 100644
--            case 0: /* VAND */
+--- a/hw/arm/mcimx7d-sabre.c
--                tcg_gen_gvec_and(0, rd_ofs, rn_ofs, rm_ofs,
++++ b/hw/arm/mcimx7d-sabre.c
--                                 vec_size, vec_size);
+@@ -XXX,XX +XXX,XX @@ static void mcimx7d_sabre_init(MachineState *machine)
--                break;
--            case 1: /* VBIC */
+ static void mcimx7d_sabre_machine_init(MachineClass *mc)
--                tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs,
+ {
--                                  vec_size, vec_size);
+-    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex A7)";
--                break;
++    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex-A7)";
--            case 2: /* VORR */
+     mc->init = mcimx7d_sabre_init;
--                tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
+     mc->max_cpus = FSL_IMX7_NUM_CPUS;
--                                vec_size, vec_size);
+     mc->default_ram_id = "mcimx7d-sabre.ram";
--                break;
+diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
--            case 3: /* VORN */
+index XXXXXXX..XXXXXXX 100644
--                tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs,
+--- a/hw/arm/npcm7xx_boards.c
--                                 vec_size, vec_size);
++++ b/hw/arm/npcm7xx_boards.c
--                break;
+@@ -XXX,XX +XXX,XX @@ static void npcm750_evb_machine_class_init(ObjectClass *oc, void *data)
--            case 4: /* VEOR */
--                tcg_gen_gvec_xor(0, rd_ofs, rn_ofs, rm_ofs,
+     npcm7xx_set_soc_type(nmc, TYPE_NPCM750);
--                                 vec_size, vec_size);
--                break;
+-    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex A9)";
--            case 5: /* VBSL */
++    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex-A9)";
--                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rd_ofs, rn_ofs, rm_ofs,
+     mc->init = npcm750_evb_init;
--                                    vec_size, vec_size);
+     mc->default_ram_size = 512 * MiB;
--                break;
+ };
--            case 6: /* VBIT */
+@@ -XXX,XX +XXX,XX @@ static void gsj_machine_class_init(ObjectClass *oc, void *data)
--                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rn_ofs, rd_ofs,
--                                    vec_size, vec_size);
+     npcm7xx_set_soc_type(nmc, TYPE_NPCM730);
--                break;
--            case 7: /* VBIF */
+-    mc->desc = "Quanta GSJ (Cortex A9)";
--                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rd_ofs, rn_ofs,
++    mc->desc = "Quanta GSJ (Cortex-A9)";
--                                    vec_size, vec_size);
+     mc->init = quanta_gsj_init;
--                break;
+     mc->default_ram_size = 512 * MiB;
--            }
+ };
--            return 0;
+diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
--
+index XXXXXXX..XXXXXXX 100644
-         case NEON_3R_VQADD:
+--- a/hw/arm/sabrelite.c
-             tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
++++ b/hw/arm/sabrelite.c
-                            rn_ofs, rm_ofs, vec_size, vec_size,
+@@ -XXX,XX +XXX,XX @@ static void sabrelite_init(MachineState *machine)
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-             return 0;
+ static void sabrelite_machine_init(MachineClass *mc)
+ {
-         case NEON_3R_VADD_VSUB:
+-    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex A9)";
-+        case NEON_3R_LOGIC:
++    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex-A9)";
-             /* Already handled by decodetree */
+     mc->init = sabrelite_init;
-             return 1;
+     mc->max_cpus = FSL_IMX6_NUM_CPUS;
-         }
+     mc->ignore_memory_transaction_failures = true;
 diff --git a/hw/misc/npcm7xx_clk.c b/hw/misc/npcm7xx_clk.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/npcm7xx_clk.c
 +++ b/hw/misc/npcm7xx_clk.c
@@ -XXX,XX +XXX,XX @@
  #define NPCM7XX_CLOCK_REF_HZ            (25000000)
  /* Register Field Definitions */
 -#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex A9 Cores */
 +#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex-A9 Cores */
  #define PLLCON_LOKI     BIT(31)
  #define PLLCON_LOKS     BIT(30)
 --
 .20.1

-New patch
+[PULL 10/45] docs: Fix installation of man pages with Sphinx 4.x
+From: Damien Goutte-Gattat <dgouttegattat@incenp.org>
+The 4.x branch of Sphinx introduces a breaking change, as generated man
+pages are now written to subdirectories corresponding to the manual
+section they belong to. This results in `make install` erroring out when
+attempting to install the man pages, because they are not where it
+expects to find them.
+This patch restores the behavior of Sphinx 3.x regarding man pages.
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/256
+Signed-off-by: Damien Goutte-Gattat <dgouttegattat@incenp.org>
+Message-id: 20210503161422.15028-1-dgouttegattat@incenp.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ docs/conf.py | 1 +
+file changed, 1 insertion(+)
+diff --git a/docs/conf.py b/docs/conf.py
+index XXXXXXX..XXXXXXX 100644
+--- a/docs/conf.py
++++ b/docs/conf.py
+@@ -XXX,XX +XXX,XX @@
+      ['Stefan Hajnoczi <stefanha@redhat.com>',
+       'Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>'], 1),
+ ]
++man_make_section_directory = False
+ # -- Options for Texinfo output -------------------------------------------
+--
+.20.1

-[PULL 39/39] target/arm: Move gen_ function typedefs to translate.h
+[PULL 11/45] target/arm: Mark LDS{MIN,MAX} as signed operations
-We're going to want at least some of the NeonGen* typedefs
+From: Richard Henderson <richard.henderson@linaro.org>
 for the refactored 32-bit Neon decoder, so move them all
 to translate.h since it makes more sense to keep them in
 one group.
+The operands to tcg_gen_atomic_fetch_s{min,max}_i64 must
+be signed, so that the inputs are properly extended.
+Zero extend the result afterward, as needed.
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/364
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Message-id: 20210602020720.47679-1-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
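
To see why the extension of the operands matters, here is a small model of a 32-bit fetch-and-signed-minimum carried out in 64-bit arithmetic. It is a sketch under stated assumptions, not the TCG implementation: fetch_smin32 is a hypothetical function, it is not atomic, and it covers only the 32-bit case. The sign_extend flag plays the role of MO_SIGN in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: store min(*mem, rs) back to memory using a
 * signed 64-bit comparison, and return the old memory value as it
 * would appear in a 64-bit destination register.
 */
static uint64_t fetch_smin32(uint32_t *mem, uint64_t rs, int sign_extend)
{
    uint32_t m = *mem;
    uint32_t r = (uint32_t)rs;
    /* Casting to int32_t assumes the usual two's complement conversion. */
    int64_t old = sign_extend ? (int64_t)(int32_t)m : (int64_t)m;
    int64_t val = sign_extend ? (int64_t)(int32_t)r : (int64_t)r;
    int64_t min = old < val ? old : val;

    *mem = (uint32_t)min;
    /* Zero-extend the 32-bit result afterward, as the patch does. */
    return (uint64_t)(uint32_t)old;
}
```

With memory holding 0xffffffff (that is, -1) and an operand of 1, the zero-extended variant compares 4294967295 with 1 and wrongly stores 1. The sign-extended variant compares -1 with 1, leaves 0xffffffff in memory, and returns 0x00000000ffffffff.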
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-23-peter.maydell@linaro.org
 ---
- target/arm/translate.h     | 17 +++++++++++++++++
+ target/arm/translate-a64.c | 13 ++++++++++---
- target/arm/translate-a64.c | 17 -----------------
+file changed, 10 insertions(+), 3 deletions(-)
 files changed, 17 insertions(+), 17 deletions(-)
-diff --git a/target/arm/translate.h b/target/arm/translate.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.h
-+++ b/target/arm/translate.h
-@@ -XXX,XX +XXX,XX @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
- typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
-                         uint32_t, uint32_t, uint32_t);
-+/* Function prototype for gen_ functions for calling Neon helpers */
-+typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
-+typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
-+typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
-+typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
-+typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
-+typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
-+typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
-+typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
-+typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
-+typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
-+typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
-+typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
-+typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
-+typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-+typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
-+
- #endif /* TARGET_ARM_TRANSLATE_H */
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ typedef struct AArch64DecodeTable {
+@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
-     AArch64DecodeFn *disas_fn;
+     int o3_opc = extract32(insn, 12, 4);
- } AArch64DecodeTable;
+     bool r = extract32(insn, 22, 1);
+     bool a = extract32(insn, 23, 1);
--/* Function prototype for gen_ functions for calling Neon helpers */
+-    TCGv_i64 tcg_rs, clean_addr;
--typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
++    TCGv_i64 tcg_rs, tcg_rt, clean_addr;
--typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
+     AtomicThreeOpFn *fn = NULL;
--typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
++    MemOp mop = s->be_data | size | MO_ALIGN;
--typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
--typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+     if (is_vector || !dc_isar_feature(aa64_atomics, s)) {
--typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
+         unallocated_encoding(s);
--typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
+@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
--typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+         break;
--typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+     case 004: /* LDSMAX */
--typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
+         fn = tcg_gen_atomic_fetch_smax_i64;
--typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
++        mop |= MO_SIGN;
--typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
+         break;
--typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+     case 005: /* LDSMIN */
--typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
+         fn = tcg_gen_atomic_fetch_smin_i64;
--typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
++        mop |= MO_SIGN;
--
+         break;
- /* initialize TCG globals.  */
+     case 006: /* LDUMAX */
- void a64_translate_init(void)
+         fn = tcg_gen_atomic_fetch_umax_i64;
- {
+@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
      }
      tcg_rs = read_cpu_reg(s, rs, true);
 +    tcg_rt = cpu_reg(s, rt);
      if (o3_opc == 1) { /* LDCLR */
          tcg_gen_not_i64(tcg_rs, tcg_rs);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
      /* The tcg atomic primitives are all full barriers.  Therefore we
       * can ignore the Acquire and Release bits of this instruction.
       */
 -    fn(cpu_reg(s, rt), clean_addr, tcg_rs, get_mem_index(s),
 -       s->be_data | size | MO_ALIGN);
 +    fn(tcg_rt, clean_addr, tcg_rs, get_mem_index(s), mop);
 +
 +    if ((mop & MO_SIGN) && size != MO_64) {
 +        tcg_gen_ext32u_i64(tcg_rt, tcg_rt);
 +    }
  }
  /*
 --
 .20.1

-New patch
+[PULL 12/45] target/arm: fix missing exception class
+From: Jamie Iles <jamie@nuviainc.com>
+The DAIF and PAC checks used raise_exception_ra to raise an exception
+and unwind CPU state. However, raise_exception_ra is currently designed
+for handling data aborts: the syndrome is partially precomputed and
+encoded in the TB, then merged in merge_syn_data_abort when the data
+abort is handled.  Using raise_exception_ra for DAIF and PAC checks
+results in an empty syndrome being retrieved from data[2] in
+restore_state_to_opc, which sets ESR to 0.  This manifested as:
+  kvm [571]: Unknown exception class: esr: 0x000000 --
+  Unknown/Uncategorized
+when launching a KVM guest with pointer authentication enabled, where
+the host QEMU emulated a CPU supporting EL2 and pointer authentication.
+Rework raise_exception_ra such that the state is restored before raising
+the exception so that the exception is not clobbered by
+restore_state_to_opc.
+Fixes: 0d43e1a2d29a ("target/arm: Add PAuth helpers")
+Cc: Richard Henderson <richard.henderson@linaro.org>
+Cc: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Jamie Iles <jamie@nuviainc.com>
+[PMM: added comment]
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/op_helper.c | 11 +++++++++--
+file changed, 9 insertions(+), 2 deletions(-)
+diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/op_helper.c
++++ b/target/arm/op_helper.c
+@@ -XXX,XX +XXX,XX @@ void raise_exception(CPUARMState *env, uint32_t excp,
+ void raise_exception_ra(CPUARMState *env, uint32_t excp, uint32_t syndrome,
+                         uint32_t target_el, uintptr_t ra)
+ {
+-    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
+-    cpu_loop_exit_restore(cs, ra);
++    CPUState *cs = env_cpu(env);
++
++    /*
++     * restore_state_to_opc() will set env->exception.syndrome, so
++     * we must restore CPU state here before setting the syndrome
++     * the caller passed us, and cannot use cpu_loop_exit_restore().
++     */
++    cpu_restore_state(cs, ra, true);
++    raise_exception(env, excp, syndrome, target_el);
+ }
+ uint64_t HELPER(neon_tbl)(CPUARMState *env, uint32_t desc,
+--
+.20.1
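
The ordering problem this patch fixes can be shown with a toy model.
This is only a sketch under stated assumptions: struct cpu_model,
model_restore_state() and the two wrapper functions are hypothetical
names, and the restore step is reduced to the one side effect the commit
message describes, namely overwriting the syndrome with the value
recorded for the translation block (zero when nothing was precomputed).

```c
#include <stdint.h>

/* Hypothetical stand-in for the piece of CPU state involved. */
struct cpu_model {
    uint32_t syndrome;
};

/*
 * Models the side effect of the state restore described above: the
 * syndrome is overwritten with the value recorded at translation time.
 */
void model_restore_state(struct cpu_model *cpu, uint32_t recorded)
{
    cpu->syndrome = recorded;
}

/* Old ordering: set the caller's syndrome, then restore state. */
uint32_t raise_then_restore(struct cpu_model *cpu, uint32_t syndrome)
{
    cpu->syndrome = syndrome;
    model_restore_state(cpu, 0);   /* clobbers the syndrome */
    return cpu->syndrome;
}

/* New ordering: restore state first, then set the caller's syndrome. */
uint32_t restore_then_raise(struct cpu_model *cpu, uint32_t syndrome)
{
    model_restore_state(cpu, 0);
    cpu->syndrome = syndrome;      /* survives until exception entry */
    return cpu->syndrome;
}
```

In this model the old ordering always ends with a syndrome of zero,
which matches the "esr: 0x000000" symptom quoted in the commit message,
while the new ordering keeps whatever value the caller passed in.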

-[PULL 09/39] hw/arm: versal: Remove inclusion of arm_gicv3_common.h
+[PULL 13/45] target/arm: fold do_raise_exception into raise_exception
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Jamie Iles <jamie@nuviainc.com>
-Remove inclusion of arm_gicv3_common.h, this already gets
+Now that there are no other users of do_raise_exception, fold it into
-included via xlnx-versal.h.
+raise_exception.
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Cc: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Cc: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Luc Michel <luc.michel@greensocs.com>
+Signed-off-by: Jamie Iles <jamie@nuviainc.com>
-Message-id: 20200427181649.26851-2-edgar.iglesias@gmail.com
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/xlnx-versal.c | 1 -
+ target/arm/op_helper.c | 12 ++----------
-file changed, 1 deletion(-)
+file changed, 2 insertions(+), 10 deletions(-)
-diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
+diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-versal.c
+--- a/target/arm/op_helper.c
-+++ b/hw/arm/xlnx-versal.c
++++ b/target/arm/op_helper.c
 @@ -XXX,XX +XXX,XX @@
- #include "hw/arm/boot.h"
+ #define SIGNBIT (uint32_t)0x80000000
- #include "kvm_arm.h"
+ #define SIGNBIT64 ((uint64_t)1 << 63)
- #include "hw/misc/unimp.h"
--#include "hw/intc/arm_gicv3_common.h"
+-static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
- #include "hw/arm/xlnx-versal.h"
+-                                    uint32_t syndrome, uint32_t target_el)
- #include "hw/char/pl011.h"
++void raise_exception(CPUARMState *env, uint32_t excp,
 +                     uint32_t syndrome, uint32_t target_el)
  {
      CPUState *cs = env_cpu(env);
@@ -XXX,XX +XXX,XX @@ static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
      cs->exception_index = excp;
      env->exception.syndrome = syndrome;
      env->exception.target_el = target_el;
 -
 -    return cs;
 -}
 -
 -void raise_exception(CPUARMState *env, uint32_t excp,
 -                     uint32_t syndrome, uint32_t target_el)
 -{
 -    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
      cpu_loop_exit(cs);
  }
 --
 .20.1

-New patch
+[PULL 14/45] target/arm: use raise_exception_ra for MTE check failure
+From: Jamie Iles <jamie@nuviainc.com>
+Now that raise_exception_ra restores the state before raising the
+exception we can use raise_exception_ra to perform the state restore +
+exception raising without clobbering the syndrome.
+Cc: Richard Henderson <richard.henderson@linaro.org>
+Cc: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Jamie Iles <jamie@nuviainc.com>
+[PMM: Keep the one line of the comment that is still relevant]
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/mte_helper.c | 12 +++---------
+file changed, 3 insertions(+), 9 deletions(-)
+diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mte_helper.c
++++ b/target/arm/mte_helper.c
+@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
+     switch (tcf) {
+     case 1:
+-        /*
+-         * Tag check fail causes a synchronous exception.
+-         *
+-         * In restore_state_to_opc, we set the exception syndrome
+-         * for the load or store operation.  Unwind first so we
+-         * may overwrite that with the syndrome for the tag check.
+-         */
+-        cpu_restore_state(env_cpu(env), ra, true);
++        /* Tag check fail causes a synchronous exception. */
+         env->exception.vaddress = dirty_ptr;
+         is_write = FIELD_EX32(desc, MTEDESC, WRITE);
+         syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
+                                     is_write, 0x11);
+-        raise_exception(env, EXCP_DATA_ABORT, syn, exception_target_el(env));
++        raise_exception_ra(env, EXCP_DATA_ABORT, syn,
++                           exception_target_el(env), ra);
+         /* noreturn, but fall through to the assert anyway */
+     case 0:
+--
+.20.1

-[PULL 19/39] hw/arm: versal-virt: Add support for the RTC
+[PULL 15/45] target/arm: use raise_exception_ra for stack limit exception
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Jamie Iles <jamie@nuviainc.com>
-Add support for the RTC.
+The sequence cpu_restore_state() + raise_exception() is equivalent to
 raise_exception_ra(), so use that instead.  (In this case we never
 cared about the syndrome value, because M-profile doesn't use the
 syndrome; the old code was just written unnecessarily awkwardly.)
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Cc: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Cc: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Luc Michel <luc.michel@greensocs.com>
+Signed-off-by: Jamie Iles <jamie@nuviainc.com>
-Message-id: 20200427181649.26851-12-edgar.iglesias@gmail.com
+[PMM: Retain edited version of comment; rewrite commit message]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/xlnx-versal-virt.c | 22 ++++++++++++++++++++++
+ target/arm/m_helper.c  | 5 +----
-file changed, 22 insertions(+)
+ target/arm/op_helper.c | 9 +++------
 files changed, 4 insertions(+), 10 deletions(-)
-diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
+diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-versal-virt.c
+--- a/target/arm/m_helper.c
-+++ b/hw/arm/xlnx-versal-virt.c
++++ b/target/arm/m_helper.c
-@@ -XXX,XX +XXX,XX @@ static void fdt_add_sd_nodes(VersalVirt *s)
+@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
              limit = is_psp ? env->v7m.psplim[false] : env->v7m.msplim[false];
              if (val < limit) {
 -                CPUState *cs = env_cpu(env);
 -
 -                cpu_restore_state(cs, GETPC(), true);
 -                raise_exception(env, EXCP_STKOF, 0, 1);
 +                raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
              }
              if (is_psp) {
 diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/op_helper.c
 +++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v8m_stackcheck)(CPUARMState *env, uint32_t newvalue)
       * raising an exception if the limit is breached.
       */
      if (newvalue < v7m_sp_limit(env)) {
 -        CPUState *cs = env_cpu(env);
 -
          /*
           * Stack limit exceptions are a rare case, so rather than syncing
 -         * PC/condbits before the call, we use cpu_restore_state() to
 -         * get them right before raising the exception.
 +         * PC/condbits before the call, we use raise_exception_ra() so
 +         * that cpu_restore_state() will sort them out.
           */
 -        cpu_restore_state(cs, GETPC(), true);
 -        raise_exception(env, EXCP_STKOF, 0, 1);
 +        raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
      }
  }
-+static void fdt_add_rtc_node(VersalVirt *s)
-+{
-+    const char compat[] = "xlnx,zynqmp-rtc";
-+    const char interrupt_names[] = "alarm\0sec";
-+    char *name = g_strdup_printf("/rtc@%x", MM_PMC_RTC);
-+
-+    qemu_fdt_add_subnode(s->fdt, name);
-+
-+    qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
-+                           GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_ALARM_IRQ,
-+                           GIC_FDT_IRQ_FLAGS_LEVEL_HI,
-+                           GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_SECONDS_IRQ,
-+                           GIC_FDT_IRQ_FLAGS_LEVEL_HI);
-+    qemu_fdt_setprop(s->fdt, name, "interrupt-names",
-+                     interrupt_names, sizeof(interrupt_names));
-+    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
-+                                 2, MM_PMC_RTC, 2, MM_PMC_RTC_SIZE);
-+    qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
-+    g_free(name);
-+}
-+
- static void fdt_nop_memory_nodes(void *fdt, Error **errp)
- {
-     Error *err = NULL;
-@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
-     fdt_add_timer_nodes(s);
-     fdt_add_zdma_nodes(s);
-     fdt_add_sd_nodes(s);
-+    fdt_add_rtc_node(s);
-     fdt_add_cpu_nodes(s, psci_conduit);
-     fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
-     fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
 --
 .20.1

-[PULL 06/39] target/arm: Implement ARMv8.2-TTS2UXN
+[PULL 16/45] target/arm: Add isar_feature_{aa32, aa64, aa64_sve}_bf16
-The ARMv8.2-TTS2UXN feature extends the XN field in stage 2
+From: Richard Henderson <richard.henderson@linaro.org>
 translation table descriptors from just bit [54] to bits [54:53],
 allowing stage 2 to control execution permissions separately for EL0
 and EL1. Implement the new semantics of the XN field and enable
 the feature for our 'max' CPU.
+Note that the SVE BFLOAT16 support does not require SVE2,
+it is an independent extension.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-2-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200330210400.11724-5-peter.maydell@linaro.org
 ---
- target/arm/cpu.h    | 15 +++++++++++++++
+ target/arm/cpu.h | 15 +++++++++++++++
- target/arm/cpu.c    |  1 +
+file changed, 15 insertions(+)
  target/arm/cpu64.c  |  2 ++
  target/arm/helper.c | 37 +++++++++++++++++++++++++++++++------
 files changed, 49 insertions(+), 6 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_ccidx(const ARMISARegisters *id)
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_predinv(const ARMISARegisters *id)
-     return FIELD_EX32(id->id_mmfr4, ID_MMFR4, CCIDX) != 0;
+     return FIELD_EX32(id->id_isar6, ID_ISAR6, SPECRES) != 0;
  }
-+static inline bool isar_feature_aa32_tts2uxn(const ARMISARegisters *id)
++static inline bool isar_feature_aa32_bf16(const ARMISARegisters *id)
 +{
-+    return FIELD_EX32(id->id_mmfr4, ID_MMFR4, XNX) != 0;
++    return FIELD_EX32(id->id_isar6, ID_ISAR6, BF16) != 0;
 +}
 +
- /*
+ static inline bool isar_feature_aa32_i8mm(const ARMISARegisters *id)
-  * 64-bit feature tests via id registers.
+ {
-  */
+     return FIELD_EX32(id->id_isar6, ID_ISAR6, I8MM) != 0;
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
-     return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
+     return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >= 2;
  }
-+static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
++static inline bool isar_feature_aa64_bf16(const ARMISARegisters *id)
 +{
-+    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
++    return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, BF16) != 0;
 +}
 +
- /*
+ static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
-  * Feature tests for "does this exist in either 32-bit or 64-bit?"
+ {
-  */
+     /* We always set the AdvSIMD and FP fields identically.  */
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_any_ccidx(const ARMISARegisters *id)
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve2_bitperm(const ARMISARegisters *id)
-     return isar_feature_aa64_ccidx(id) || isar_feature_aa32_ccidx(id);
+     return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BITPERM) != 0;
  }
-+static inline bool isar_feature_any_tts2uxn(const ARMISARegisters *id)
++static inline bool isar_feature_aa64_sve_bf16(const ARMISARegisters *id)
 +{
-+    return isar_feature_aa64_tts2uxn(id) || isar_feature_aa32_tts2uxn(id);
++    return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BFLOAT16) != 0;
 +}
 +
- /*
+ static inline bool isar_feature_aa64_sve2_sha3(const ARMISARegisters *id)
   * Forward to the above feature tests given an ARMCPU pointer.
   */
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
              t = FIELD_DP32(t, ID_MMFR4, HPDS, 1); /* AA32HPD */
              t = FIELD_DP32(t, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
              t = FIELD_DP32(t, ID_MMFR4, CNP, 1); /* TTCNP */
 +            t = FIELD_DP32(t, ID_MMFR4, XNX, 1); /* TTS2UXN */
              cpu->isar.id_mmfr4 = t;
          }
  #endif
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);
          t = FIELD_DP64(t, ID_AA64MMFR1, PAN, 2); /* ATS1E1 */
          t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* VMID16 */
 +        t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1); /* TTS2UXN */
          cpu->isar.id_aa64mmfr1 = t;
          t = cpu->isar.id_aa64mmfr2;
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          u = FIELD_DP32(u, ID_MMFR4, HPDS, 1); /* AA32HPD */
          u = FIELD_DP32(u, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
          u = FIELD_DP32(u, ID_MMFR4, CNP, 1); /* TTCNP */
 +        u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
          cpu->isar.id_mmfr4 = u;
          u = cpu->isar.id_aa64dfr0;
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
   *
   * @env:     CPUARMState
   * @s2ap:    The 2-bit stage2 access permissions (S2AP)
 - * @xn:      XN (execute-never) bit
 + * @xn:      XN (execute-never) bits
 + * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
   */
 -static int get_S2prot(CPUARMState *env, int s2ap, int xn)
 +static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
  {
-     int prot = 0;
+     return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, SHA3) != 0;
@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn)
      if (s2ap & 2) {
          prot |= PAGE_WRITE;
      }
 -    if (!xn) {
 -        if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
 +
 +    if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
 +        switch (xn) {
 +        case 0:
              prot |= PAGE_EXEC;
 +            break;
 +        case 1:
 +            if (s1_is_el0) {
 +                prot |= PAGE_EXEC;
 +            }
 +            break;
 +        case 2:
 +            break;
 +        case 3:
 +            if (!s1_is_el0) {
 +                prot |= PAGE_EXEC;
 +            }
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +    } else {
 +        if (!extract32(xn, 1, 1)) {
 +            if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
 +                prot |= PAGE_EXEC;
 +            }
          }
      }
      return prot;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
      }
      ap = extract32(attrs, 4, 2);
 -    xn = extract32(attrs, 12, 1);
      if (mmu_idx == ARMMMUIdx_Stage2) {
          ns = true;
 -        *prot = get_S2prot(env, ap, xn);
 +        xn = extract32(attrs, 11, 2);
 +        *prot = get_S2prot(env, ap, xn, s1_is_el0);
      } else {
          ns = extract32(attrs, 3, 1);
 +        xn = extract32(attrs, 12, 1);
          pxn = extract32(attrs, 11, 1);
          *prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
      }
 --
 .20.1
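
The two-bit XN decode in the get_S2prot() hunk above can be read as a
small truth table. The sketch below restates it as standalone C. The
function names are hypothetical, and the legacy variant is simplified:
it ignores the extra condition in the original code that an AArch32 EL2
also requires the page to be readable.

```c
#include <stdbool.h>

/*
 * Stage 2 execute permission with TTS2UXN: xn is the two-bit field
 * XN[1:0], s1_is_el0 says whether the stage 1 walk was for EL0.
 */
bool s2_exec_allowed_tts2uxn(int xn, bool s1_is_el0)
{
    switch (xn & 3) {
    case 0:
        return true;          /* executable at EL1 and EL0 */
    case 1:
        return s1_is_el0;     /* executable at EL0 only */
    case 2:
        return false;         /* never executable */
    default: /* 3 */
        return !s1_is_el0;    /* executable at EL1 only */
    }
}

/*
 * Without the feature only the upper bit (the original single XN bit)
 * is honoured. Simplified: the AArch32 readable-page check is omitted.
 */
bool s2_exec_allowed_legacy(int xn)
{
    return ((xn >> 1) & 1) == 0;
}
```

Note that with the feature absent, encodings 0 and 1 behave the same,
as do 2 and 3, so descriptors written for the one-bit XN field keep
their meaning.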

-[PULL 11/39] hw/arm: versal-virt: Fix typo xlnx-ve -> xlnx-versal
+[PULL 17/45] target/arm: Unify unallocated path in disas_fp_1src
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Fix typo xlnx-ve -> xlnx-versal.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Message-id: 20210525225817.400336-3-richard.henderson@linaro.org
 Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-4-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/xlnx-versal-virt.c | 2 +-
+ target/arm/translate-a64.c | 15 ++++++---------
-file changed, 1 insertion(+), 1 deletion(-)
+file changed, 6 insertions(+), 9 deletions(-)
-diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-versal-virt.c
+--- a/target/arm/translate-a64.c
-+++ b/hw/arm/xlnx-versal-virt.c
++++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
-         psci_conduit = QEMU_PSCI_CONDUIT_SMC;
+     int rd = extract32(insn, 0, 5);
      if (mos) {
 -        unallocated_encoding(s);
 -        return;
 +        goto do_unallocated;
      }
--    sysbus_init_child_obj(OBJECT(machine), "xlnx-ve", &s->soc,
+     switch (opcode) {
-+    sysbus_init_child_obj(OBJECT(machine), "xlnx-versal", &s->soc,
+@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
-                           sizeof(s->soc), TYPE_XLNX_VERSAL);
+         /* FCVT between half, single and double precision */
-     object_property_set_link(OBJECT(&s->soc), OBJECT(machine->ram),
+         int dtype = extract32(opcode, 0, 2);
-                              "ddr", &error_abort);
+         if (type == 2 || dtype == type) {
 -            unallocated_encoding(s);
 -            return;
 +            goto do_unallocated;
          }
          if (!fp_access_check(s)) {
              return;
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
      case 0x10 ... 0x13: /* FRINT{32,64}{X,Z} */
          if (type > 1 || !dc_isar_feature(aa64_frint, s)) {
 -            unallocated_encoding(s);
 -            return;
 +            goto do_unallocated;
          }
          /* fall through */
      case 0x0 ... 0x3:
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
              break;
          case 3:
              if (!dc_isar_feature(aa64_fp16, s)) {
 -                unallocated_encoding(s);
 -                return;
 +                goto do_unallocated;
              }
              if (!fp_access_check(s)) {
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
              handle_fp_1src_half(s, opcode, rd, rn);
              break;
          default:
 -            unallocated_encoding(s);
 +            goto do_unallocated;
          }
          break;
      default:
 +    do_unallocated:
          unallocated_encoding(s);
          break;
      }
 --
 .20.1

-[PULL 32/39] target/arm: Convert Neon 'load/store single structure' to decodetree
+[PULL 18/45] target/arm: Implement scalar float32 to bfloat16 conversion
-Convert the Neon "load/store single structure to one lane" insns to
+From: Richard Henderson <richard.henderson@linaro.org>
 decodetree.
-As this is the last set of insns in the neon load/store group,
+This is the 64-bit BFCVT and the 32-bit VCVT{B,T}.BF16.F32.
 we can remove the whole disas_neon_ls_insn() function.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-4-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-14-peter.maydell@linaro.org
 ---
- target/arm/neon-ls.decode       |  11 +++
+ target/arm/helper.h        |  1 +
- target/arm/translate-neon.inc.c |  89 +++++++++++++++++++
+ target/arm/vfp.decode      |  2 ++
- target/arm/translate.c          | 147 --------------------------------
+ target/arm/translate-a64.c | 19 +++++++++++++++++++
-files changed, 100 insertions(+), 147 deletions(-)
+ target/arm/translate-vfp.c | 24 ++++++++++++++++++++++++
  target/arm/vfp_helper.c    |  5 +++++
 files changed, 51 insertions(+)
-diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-ls.decode
+--- a/target/arm/helper.h
-+++ b/target/arm/neon-ls.decode
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
- VLD_all_lanes  1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
+ DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
-                vd=%vd_dp
+ DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
 +DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
  DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
  DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
 diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp.decode
 +++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
  # VCVTB and VCVTT to f16: Vd format is always vd_sp;
  # Vm format depends on size bit
 +VCVT_b16_f32 ---- 1110 1.11 0011 .... 1001 t:1 1.0 .... \
 +             vd=%vd_sp vm=%vm_sp
  VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
               vd=%vd_sp vm=%vm_sp
  VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn)
      case 0x3: /* FSQRT */
          gen_helper_vfp_sqrts(tcg_res, tcg_op, cpu_env);
          goto done;
 +    case 0x6: /* BFCVT */
 +        gen_fpst = gen_helper_bfcvt;
 +        break;
      case 0x8: /* FRINTN */
      case 0x9: /* FRINTP */
      case 0xa: /* FRINTM */
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
          }
          break;
 +    case 0x6:
 +        switch (type) {
 +        case 1: /* BFCVT */
 +            if (!dc_isar_feature(aa64_bf16, s)) {
 +                goto do_unallocated;
 +            }
 +            if (!fp_access_check(s)) {
 +                return;
 +            }
 +            handle_fp_1src_single(s, opcode, rd, rn);
 +            break;
 +        default:
 +            goto do_unallocated;
 +        }
 +        break;
 +
-+# Neon load/store single structure to one lane
+     default:
-+%imm1_5_p1 5:1 !function=plus1
+     do_unallocated:
-+%imm1_6_p1 6:1 !function=plus1
+         unallocated_encoding(s);
-+
+diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 +VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 00 n:2 reg_idx:3 align:1 rm:4 \
 +               vd=%vd_dp size=0 stride=1
 +VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 01 n:2 reg_idx:2 align:2 rm:4 \
 +               vd=%vd_dp size=1 stride=%imm1_5_p1
 +VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 10 n:2 reg_idx:1 align:3 rm:4 \
 +               vd=%vd_dp size=2 stride=%imm1_6_p1
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/translate-vfp.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
   * It might be possible to convert it to a standalone .c file eventually.
   */
 +static inline int plus1(DisasContext *s, int x)
 +{
 +    return x + 1;
 +}
 +
  /* Include the generated Neon decoder */
  #include "decode-neon-dp.inc.c"
  #include "decode-neon-ls.inc.c"
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
      return true;
  }
++static bool trans_VCVT_b16_f32(DisasContext *s, arg_VCVT_b16_f32 *a)
++{
++    TCGv_ptr fpst;
++    TCGv_i32 tmp;
 +
-+static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
++    if (!dc_isar_feature(aa32_bf16, s)) {
 +{
 +    /* Neon load/store single structure to one lane */
 +    int reg;
 +    int nregs = a->n + 1;
 +    int vd = a->vd;
 +    TCGv_i32 addr, tmp;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
 +        return false;
 +    }
 +
 +    /* Catch the UNDEF cases. This is unavoidably a bit messy. */
 +    switch (nregs) {
 +    case 1:
 +        if (((a->align & (1 << a->size)) != 0) ||
 +            (a->size == 2 && ((a->align & 3) == 1 || (a->align & 3) == 2))) {
 +            return false;
 +        }
 +        break;
 +    case 3:
 +        if ((a->align & 1) != 0) {
 +            return false;
 +        }
 +        /* fall through */
 +    case 2:
 +        if (a->size == 2 && (a->align & 2) != 0) {
 +            return false;
 +        }
 +        break;
 +    case 4:
 +        if ((a->size == 2) && ((a->align & 3) == 3)) {
 +            return false;
 +        }
 +        break;
 +    default:
 +        abort();
 +    }
 +    if ((vd + a->stride * (nregs - 1)) > 31) {
 +        /*
 +         * Attempts to write off the end of the register file are
 +         * UNPREDICTABLE; we choose to UNDEF because otherwise we would
 +         * access off the end of the array that holds the register data.
 +         */
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
++    fpst = fpstatus_ptr(FPST_FPCR);
 +    tmp = tcg_temp_new_i32();
-+    addr = tcg_temp_new_i32();
++
-+    load_reg_var(s, addr, a->rn);
++    vfp_load_reg32(tmp, a->vm);
-+    /*
++    gen_helper_bfcvt(tmp, tmp, fpst);
-+     * TODO: if we implemented alignment exceptions, we should check
++    tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
-+     * addr against the alignment encoded in a->align here.
++    tcg_temp_free_ptr(fpst);
 +     */
 +    for (reg = 0; reg < nregs; reg++) {
 +        if (a->l) {
 +            gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
 +                            s->be_data | a->size);
 +            neon_store_element(vd, a->reg_idx, a->size, tmp);
 +        } else { /* Store */
 +            neon_load_element(tmp, vd, a->reg_idx, a->size);
 +            gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
 +                            s->be_data | a->size);
 +        }
 +        vd += a->stride;
 +        tcg_gen_addi_i32(addr, addr, 1 << a->size);
 +    }
 +    tcg_temp_free_i32(addr);
 +    tcg_temp_free_i32(tmp);
-+
-+    gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << a->size) * nregs);
-+
 +    return true;
 +}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
++
  static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
  {
      TCGv_ptr fpst;
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/vfp_helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
+@@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
-     tcg_temp_free_i32(rd);
+     return float64_to_float32(x, &env->vfp.fp_status);
  }
--
++uint32_t HELPER(bfcvt)(float32 x, void *status)
--/* Translate a NEON load/store element instruction.  Return nonzero if the
++{
--   instruction is invalid.  */
++    return float32_to_bfloat16(x, status);
--static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
++}
--{
++
--    int rd, rn, rm;
+ /*
--    int nregs;
+  * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
--    int stride;
+  * must always round-to-nearest; the AArch64 ones honour the FPSCR
 -    int size;
 -    int reg;
 -    int load;
 -    TCGv_i32 addr;
 -    TCGv_i32 tmp;
 -
 -    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 -        return 1;
 -    }
 -
 -    /* FIXME: this access check should not take precedence over UNDEF
 -     * for invalid encodings; we will generate incorrect syndrome information
 -     * for attempts to execute invalid vfp/neon encodings with FP disabled.
 -     */
 -    if (s->fp_excp_el) {
 -        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
 -                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
 -        return 0;
 -    }
 -
 -    if (!s->vfp_enabled)
 -      return 1;
 -    VFP_DREG_D(rd, insn);
 -    rn = (insn >> 16) & 0xf;
 -    rm = insn & 0xf;
 -    load = (insn & (1 << 21)) != 0;
 -    if ((insn & (1 << 23)) == 0) {
 -        /* Load store all elements -- handled already by decodetree */
 -        return 1;
 -    } else {
 -        size = (insn >> 10) & 3;
 -        if (size == 3) {
 -            /* Load single element to all lanes -- handled by decodetree  */
 -            return 1;
 -        } else {
 -            /* Single element.  */
 -            int idx = (insn >> 4) & 0xf;
 -            int reg_idx;
 -            switch (size) {
 -            case 0:
 -                reg_idx = (insn >> 5) & 7;
 -                stride = 1;
 -                break;
 -            case 1:
 -                reg_idx = (insn >> 6) & 3;
 -                stride = (insn & (1 << 5)) ? 2 : 1;
 -                break;
 -            case 2:
 -                reg_idx = (insn >> 7) & 1;
 -                stride = (insn & (1 << 6)) ? 2 : 1;
 -                break;
 -            default:
 -                abort();
 -            }
 -            nregs = ((insn >> 8) & 3) + 1;
 -            /* Catch the UNDEF cases. This is unavoidably a bit messy. */
 -            switch (nregs) {
 -            case 1:
 -                if (((idx & (1 << size)) != 0) ||
 -                    (size == 2 && ((idx & 3) == 1 || (idx & 3) == 2))) {
 -                    return 1;
 -                }
 -                break;
 -            case 3:
 -                if ((idx & 1) != 0) {
 -                    return 1;
 -                }
 -                /* fall through */
 -            case 2:
 -                if (size == 2 && (idx & 2) != 0) {
 -                    return 1;
 -                }
 -                break;
 -            case 4:
 -                if ((size == 2) && ((idx & 3) == 3)) {
 -                    return 1;
 -                }
 -                break;
 -            default:
 -                abort();
 -            }
 -            if ((rd + stride * (nregs - 1)) > 31) {
 -                /* Attempts to write off the end of the register file
 -                 * are UNPREDICTABLE; we choose to UNDEF because otherwise
 -                 * the neon_load_reg() would write off the end of the array.
 -                 */
 -                return 1;
 -            }
 -            tmp = tcg_temp_new_i32();
 -            addr = tcg_temp_new_i32();
 -            load_reg_var(s, addr, rn);
 -            for (reg = 0; reg < nregs; reg++) {
 -                if (load) {
 -                    gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
 -                                    s->be_data | size);
 -                    neon_store_element(rd, reg_idx, size, tmp);
 -                } else { /* Store */
 -                    neon_load_element(tmp, rd, reg_idx, size);
 -                    gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
 -                                    s->be_data | size);
 -                }
 -                rd += stride;
 -                tcg_gen_addi_i32(addr, addr, 1 << size);
 -            }
 -            tcg_temp_free_i32(addr);
 -            tcg_temp_free_i32(tmp);
 -            stride = nregs * (1 << size);
 -        }
 -    }
 -    if (rm != 15) {
 -        TCGv_i32 base;
 -
 -        base = load_reg(s, rn);
 -        if (rm == 13) {
 -            tcg_gen_addi_i32(base, base, stride);
 -        } else {
 -            TCGv_i32 index;
 -            index = load_reg(s, rm);
 -            tcg_gen_add_i32(base, base, index);
 -            tcg_temp_free_i32(index);
 -        }
 -        store_reg(s, rn, base);
 -    }
 -    return 0;
 -}
 -
  static inline void gen_neon_narrow(int size, TCGv_i32 dest, TCGv_i64 src)
  {
      switch (size) {
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
              }
              return;
          }
 -        if ((insn & 0x0f100000) == 0x04000000) {
 -            /* NEON load/store.  */
 -            if (disas_neon_ls_insn(s, insn)) {
 -                goto illegal_op;
 -            }
 -            return;
 -        }
          if ((insn & 0x0e000f00) == 0x0c000100) {
              if (arm_dc_feature(s, ARM_FEATURE_IWMMXT)) {
                  /* iWMMXt register transfer.  */
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
          }
          break;
      case 12:
 -        if ((insn & 0x01100000) == 0x01000000) {
 -            if (disas_neon_ls_insn(s, insn)) {
 -                goto illegal_op;
 -            }
 -            break;
 -        }
          goto illegal_op;
      default:
      illegal_op:
 --
 .20.1

-[PULL 33/39] target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
+[PULL 19/45] target/arm: Implement vector float32 to bfloat16 conversion
-Convert the Neon 3-reg-same VADD and VSUB insns to decodetree.
+From: Richard Henderson <richard.henderson@linaro.org>
-Note that we don't need the neon_3r_sizes[op] check here because all
+This is BFCVT{N,T} for both AArch64 AdvSIMD and SVE,
-size values are OK for VADD and VSUB; we'll add this when we convert
+and VCVT.BF16.F32 for AArch32 NEON.
-the first insn that has size restrictions.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-For this we need one of the GVecGen*Fn typedefs currently in
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-translate-a64.h; move them all to translate.h as a block so they
+Message-id: 20210525225817.400336-5-richard.henderson@linaro.org
 are visible to the 32-bit decoder.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-15-peter.maydell@linaro.org
 ---
- target/arm/translate-a64.h      |  9 --------
+ target/arm/helper-sve.h     |  4 ++++
- target/arm/translate.h          |  9 ++++++++
+ target/arm/helper.h         |  1 +
- target/arm/neon-dp.decode       | 17 +++++++++++++++
+ target/arm/neon-dp.decode   |  1 +
- target/arm/translate-neon.inc.c | 38 +++++++++++++++++++++++++++++++++
+ target/arm/sve.decode       |  2 ++
- target/arm/translate.c          | 14 ++++--------
+ target/arm/sve_helper.c     |  2 ++
-files changed, 68 insertions(+), 19 deletions(-)
+ target/arm/translate-a64.c  | 17 ++++++++++++++
+ target/arm/translate-neon.c | 45 +++++++++++++++++++++++++++++++++++++
-diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
+ target/arm/translate-sve.c  | 16 +++++++++++++
-index XXXXXXX..XXXXXXX 100644
+ target/arm/vfp_helper.c     |  7 ++++++
---- a/target/arm/translate-a64.h
+files changed, 95 insertions(+)
-+++ b/target/arm/translate-a64.h
-@@ -XXX,XX +XXX,XX @@ static inline int vec_full_reg_size(DisasContext *s)
+diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
+index XXXXXXX..XXXXXXX 100644
- bool disas_sve(DisasContext *, uint32_t);
+--- a/target/arm/helper-sve.h
++++ b/target/arm/helper-sve.h
--/* Note that the gvec expanders operate on offsets + sizes.  */
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve_fcvt_hd, TCG_CALL_NO_RWG,
--typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
+                    void, ptr, ptr, ptr, ptr, i32)
--typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
+ DEF_HELPER_FLAGS_5(sve_fcvt_sd, TCG_CALL_NO_RWG,
--                         uint32_t, uint32_t);
+                    void, ptr, ptr, ptr, ptr, i32)
--typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
++DEF_HELPER_FLAGS_5(sve_bfcvt, TCG_CALL_NO_RWG,
--                        uint32_t, uint32_t, uint32_t);
++                   void, ptr, ptr, ptr, ptr, i32)
--typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
--                        uint32_t, uint32_t, uint32_t);
+ DEF_HELPER_FLAGS_5(sve_fcvtzs_hh, TCG_CALL_NO_RWG,
--
+                    void, ptr, ptr, ptr, ptr, i32)
- #endif /* TARGET_ARM_TRANSLATE_A64_H */
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_fcvtnt_sh, TCG_CALL_NO_RWG,
-diff --git a/target/arm/translate.h b/target/arm/translate.h
+                    void, ptr, ptr, ptr, ptr, i32)
-index XXXXXXX..XXXXXXX 100644
+ DEF_HELPER_FLAGS_5(sve2_fcvtnt_ds, TCG_CALL_NO_RWG,
---- a/target/arm/translate.h
+                    void, ptr, ptr, ptr, ptr, i32)
-+++ b/target/arm/translate.h
++DEF_HELPER_FLAGS_5(sve_bfcvtnt, TCG_CALL_NO_RWG,
-@@ -XXX,XX +XXX,XX @@ void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
++                   void, ptr, ptr, ptr, ptr, i32)
- #define dc_isar_feature(name, ctx) \
-     ({ DisasContext *ctx_ = (ctx); isar_feature_##name(ctx_->isar); })
+ DEF_HELPER_FLAGS_5(sve2_fcvtlt_hs, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, i32)
-+/* Note that the gvec expanders operate on offsets + sizes.  */
+diff --git a/target/arm/helper.h b/target/arm/helper.h
-+typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
+index XXXXXXX..XXXXXXX 100644
-+typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
+--- a/target/arm/helper.h
-+                         uint32_t, uint32_t);
++++ b/target/arm/helper.h
-+typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
-+                        uint32_t, uint32_t, uint32_t);
+ DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
-+typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
+ DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
-+                        uint32_t, uint32_t, uint32_t);
+ DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
-+
++DEF_HELPER_FLAGS_2(bfcvt_pair, TCG_CALL_NO_RWG, i32, i64, ptr)
- #endif /* TARGET_ARM_TRANSLATE_H */
  DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
  DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
 diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-dp.decode
 +++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
- #
+     VRINTZ       1111 001 11 . 11 .. 10 .... 0 1011 . . 0 .... @2misc
- # This file is processed by scripts/decodetree.py
- #
+     VCVT_F16_F32 1111 001 11 . 11 .. 10 .... 0 1100 0 . 0 .... @2misc_q0
-+# VFP/Neon register fields; same as vfp.decode
++    VCVT_B16_F32 1111 001 11 . 11 .. 10 .... 0 1100 1 . 0 .... @2misc_q0
-+%vm_dp  5:1 0:4
-+%vn_dp  7:1 16:4
+     VRINTM       1111 001 11 . 11 .. 10 .... 0 1101 . . 0 .... @2misc
-+%vd_dp  22:1 12:4
+diff --git a/target/arm/sve.decode b/target/arm/sve.decode
- # Encodings for Neon data processing instructions where the T32 encoding
+index XXXXXXX..XXXXXXX 100644
- # is a simple transformation of the A32 encoding.
+--- a/target/arm/sve.decode
-@@ -XXX,XX +XXX,XX @@
++++ b/target/arm/sve.decode
- #   0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+@@ -XXX,XX +XXX,XX @@ FNMLS_zpzzz     01100101 .. 1 ..... 111 ... ..... .....         @rdn_pg_rm_ra
- # This file works on the A32 encoding only; calling code for T32 has to
+ # SVE floating-point convert precision
- # transform the insn into the A32 version first.
+ FCVT_sh         01100101 10 0010 00 101 ... ..... .....         @rd_pg_rn_e0
-+
+ FCVT_hs         01100101 10 0010 01 101 ... ..... .....         @rd_pg_rn_e0
-+######################################################################
++BFCVT           01100101 10 0010 10 101 ... ..... .....         @rd_pg_rn_e0
-+# 3-reg-same grouping:
+ FCVT_dh         01100101 11 0010 00 101 ... ..... .....         @rd_pg_rn_e0
-+# 1111 001 U 0 D sz:2 Vn:4 Vd:4 opc:4 N Q M op Vm:4
+ FCVT_hd         01100101 11 0010 01 101 ... ..... .....         @rd_pg_rn_e0
-+######################################################################
+ FCVT_ds         01100101 11 0010 10 101 ... ..... .....         @rd_pg_rn_e0
-+
+@@ -XXX,XX +XXX,XX @@ RAX1            01000101 00 1 ..... 11110 1 ..... .....  @rd_rn_rm_e0
-+&3same vm vn vd q size
+ FCVTXNT_ds      01100100 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
-+
+ FCVTX_ds        01100101 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
-+@3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
+ FCVTNT_sh       01100100 10 0010 00 101 ... ..... .....  @rd_pg_rn_e0
-+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
++BFCVTNT         01100100 10 0010 10 101 ... ..... .....  @rd_pg_rn_e0
-+
+ FCVTLT_hs       01100100 10 0010 01 101 ... ..... .....  @rd_pg_rn_e0
-+VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
+ FCVTNT_ds       01100100 11 0010 10 101 ... ..... .....  @rd_pg_rn_e0
-+VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
+ FCVTLT_sd       01100100 11 0010 11 101 ... ..... .....  @rd_pg_rn_e0
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/sve_helper.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/sve_helper.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
+@@ -XXX,XX +XXX,XX @@ static inline uint64_t vfp_float64_to_uint64_rtz(float64 f, float_status *s)
  DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, sve_f32_to_f16)
  DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, sve_f16_to_f32)
 +DO_ZPZ_FP(sve_bfcvt,   uint32_t, H1_4, float32_to_bfloat16)
  DO_ZPZ_FP(sve_fcvt_dh, uint64_t,     , sve_f64_to_f16)
  DO_ZPZ_FP(sve_fcvt_hd, uint64_t,     , sve_f16_to_f64)
  DO_ZPZ_FP(sve_fcvt_ds, uint64_t,     , float64_to_float32)
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
      } while (i != 0);                                                         \
  }
 +DO_FCVTNT(sve_bfcvtnt,    uint32_t, uint16_t, H1_4, H1_2, float32_to_bfloat16)
  DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16)
  DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t,     , H1_4, float64_to_float32)
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
                  tcg_temp_free_i32(ahp);
              }
              break;
 +        case 0x36: /* BFCVTN, BFCVTN2 */
 +            {
 +                TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
 +                gen_helper_bfcvt_pair(tcg_res[pass], tcg_op, fpst);
 +                tcg_temp_free_ptr(fpst);
 +            }
 +            break;
          case 0x56:  /* FCVTXN, FCVTXN2 */
              /* 64 bit to 32 bit float conversion
               * with von Neumann rounding (round to odd)
@@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
              }
              handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
              return;
 +        case 0x36: /* BFCVTN, BFCVTN2 */
 +            if (!dc_isar_feature(aa64_bf16, s) || size != 2) {
 +                unallocated_encoding(s);
 +                return;
 +            }
 +            if (!fp_access_check(s)) {
 +                return;
 +            }
 +            handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
 +            return;
          case 0x17: /* FCVTL, FCVTL2 */
              if (!fp_access_check(s)) {
                  return;
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      return true;
  }
-+
-+static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
++static bool trans_VCVT_B16_F32(DisasContext *s, arg_2misc *a)
 +{
-+    int vec_size = a->q ? 16 : 8;
++    TCGv_ptr fpst;
-+    int rd_ofs = neon_reg_offset(a->vd, 0);
++    TCGv_i64 tmp;
-+    int rn_ofs = neon_reg_offset(a->vn, 0);
++    TCGv_i32 dst0, dst1;
-+    int rm_ofs = neon_reg_offset(a->vm, 0);
++
-+
++    if (!dc_isar_feature(aa32_bf16, s)) {
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
++        ((a->vd | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
-+    if ((a->vn | a->vm | a->vd) & a->q) {
++    if ((a->vm & 1) || (a->size != 1)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
-+    fn(a->size, rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
++    fpst = fpstatus_ptr(FPST_STD);
 +    tmp = tcg_temp_new_i64();
 +    dst0 = tcg_temp_new_i32();
 +    dst1 = tcg_temp_new_i32();
 +
 +    read_neon_element64(tmp, a->vm, 0, MO_64);
 +    gen_helper_bfcvt_pair(dst0, tmp, fpst);
 +
 +    read_neon_element64(tmp, a->vm, 1, MO_64);
 +    gen_helper_bfcvt_pair(dst1, tmp, fpst);
 +
 +    write_neon_element32(dst0, a->vd, 0, MO_32);
 +    write_neon_element32(dst1, a->vd, 1, MO_32);
 +
 +    tcg_temp_free_i64(tmp);
 +    tcg_temp_free_i32(dst0);
 +    tcg_temp_free_i32(dst1);
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 +
-+#define DO_3SAME(INSN, FUNC)                                            \
+ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
-+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+ {
-+    {                                                                   \
+     TCGv_ptr fpst;
-+        return do_3same(s, a, FUNC);                                    \
+diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
-+    }
+index XXXXXXX..XXXXXXX 100644
-+
+--- a/target/arm/translate-sve.c
-+DO_3SAME(VADD, tcg_gen_gvec_add)
++++ b/target/arm/translate-sve.c
-+DO_3SAME(VSUB, tcg_gen_gvec_sub)
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_hs(DisasContext *s, arg_rpr_esz *a)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_hs);
-index XXXXXXX..XXXXXXX 100644
+ }
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
++static bool trans_BFCVT(DisasContext *s, arg_rpr_esz *a)
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++{
-             }
++    if (!dc_isar_feature(aa64_sve_bf16, s)) {
-             return 0;
++        return false;
++    }
--        case NEON_3R_VADD_VSUB:
++    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvt);
--            if (u) {
++}
--                tcg_gen_gvec_sub(size, rd_ofs, rn_ofs, rm_ofs,
++
--                                 vec_size, vec_size);
+ static bool trans_FCVT_dh(DisasContext *s, arg_rpr_esz *a)
--            } else {
+ {
--                tcg_gen_gvec_add(size, rd_ofs, rn_ofs, rm_ofs,
+     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_dh);
--                                 vec_size, vec_size);
+@@ -XXX,XX +XXX,XX @@ static bool trans_FCVTNT_sh(DisasContext *s, arg_rpr_esz *a)
--            }
+     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtnt_sh);
--            return 0;
+ }
--
-         case NEON_3R_VQADD:
++static bool trans_BFCVTNT(DisasContext *s, arg_rpr_esz *a)
-             tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
++{
-                            rn_ofs, rm_ofs, vec_size, vec_size,
++    if (!dc_isar_feature(aa64_sve_bf16, s)) {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++        return false;
-             tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
++    }
-                            u ? &ushl_op[size] : &sshl_op[size]);
++    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvtnt);
-             return 0;
++}
 +
-+        case NEON_3R_VADD_VSUB:
+ static bool trans_FCVTNT_ds(DisasContext *s, arg_rpr_esz *a)
-+            /* Already handled by decodetree */
+ {
-+            return 1;
+     if (!dc_isar_feature(aa64_sve2, s)) {
-         }
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
-         if (size == 3) {
+--- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(bfcvt)(float32 x, void *status)
      return float32_to_bfloat16(x, status);
  }
 +uint32_t HELPER(bfcvt_pair)(uint64_t pair, void *status)
 +{
 +    bfloat16 lo = float32_to_bfloat16(extract64(pair, 0, 32), status);
 +    bfloat16 hi = float32_to_bfloat16(extract64(pair, 32, 32), status);
 +    return deposit32(lo, 16, 16, hi);
 +}
 +
  /*
   * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
   * must always round-to-nearest; the AArch64 ones honour the FPSCR
 --
 .20.1
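
Note on the bfcvt_pair helper in the patch above: it takes two float32 values packed in one 64-bit word and returns two bfloat16 values packed in one 32-bit word, low element in the low half. The following is a minimal standalone sketch of that packing, not the QEMU implementation. The names f32_bits_to_bf16 and bfcvt_pair_sketch are hypothetical, and the conversion assumes round-to-nearest-even on raw bits with NaNs quieted; the real helper uses float32_to_bfloat16() and honours whatever float_status it is passed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for float32_to_bfloat16(): keep the top 16 bits
 * of the IEEE single-precision pattern, rounding to nearest, ties to even.
 * Assumption: no exception flags, no flush-to-zero handling.
 */
static uint16_t f32_bits_to_bf16(uint32_t bits)
{
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: keep sign and upper payload, force the quiet bit. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    /* Adding 0x7fff plus the kept LSB sends exact ties to the even value. */
    uint32_t round = 0x7fffu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + round) >> 16);
}

/* Convert the low and high float32 of 'pair' and pack the two results. */
static uint32_t bfcvt_pair_sketch(uint64_t pair)
{
    uint16_t lo = f32_bits_to_bf16((uint32_t)pair);
    uint16_t hi = f32_bits_to_bf16((uint32_t)(pair >> 32));
    return (uint32_t)lo | ((uint32_t)hi << 16);
}
```

Packing two results per call is what lets the AArch32 translator above produce one 64-bit destination from two helper calls, one per 64-bit half of the source Q register.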

-[PULL 02/39] hw/arm/mps2-tz: Use TYPE_IOTKIT instead of hardcoded string
+[PULL 20/45] softfpu: Add float_round_to_odd_inf
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+From: Richard Henderson <richard.henderson@linaro.org>
-By using the TYPE_* definitions for devices, we can:
+For Arm BFDOT and BFMMLA, we need a version of round-to-odd
- - quickly find where devices are used with 'git-grep'
+that overflows to infinity, instead of the max normal number.
  - easily rename a device (one-line change).
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Cc: Alex Bennée <alex.bennee@linaro.org>
-Message-id: 20200428154650.21991-1-f4bug@amsat.org
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210525225817.400336-6-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/mps2-tz.c | 2 +-
+ include/fpu/softfloat-types.h | 4 +++-
-file changed, 1 insertion(+), 1 deletion(-)
+ fpu/softfloat-parts.c.inc     | 6 ++++--
 files changed, 7 insertions(+), 3 deletions(-)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/include/fpu/softfloat-types.h
-+++ b/hw/arm/mps2-tz.c
++++ b/include/fpu/softfloat-types.h
-@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
-         exit(EXIT_FAILURE);
+     float_round_up           = 2,
      float_round_to_zero      = 3,
      float_round_ties_away    = 4,
 -    /* Not an IEEE rounding mode: round to the closest odd mantissa value */
 +    /* Not an IEEE rounding mode: round to closest odd, overflow to max */
      float_round_to_odd       = 5,
 +    /* Not an IEEE rounding mode: round to closest odd, overflow to inf */
 +    float_round_to_odd_inf   = 6,
  } FloatRoundMode;
  /*
 diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/fpu/softfloat-parts.c.inc
 +++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
          g_assert_not_reached();
      }
--    sysbus_init_child_obj(OBJECT(machine), "iotkit", &mms->iotkit,
++    overflow_norm = false;
-+    sysbus_init_child_obj(OBJECT(machine), TYPE_IOTKIT, &mms->iotkit,
+     switch (s->float_rounding_mode) {
-                           sizeof(mms->iotkit), mmc->armsse_type);
+     case float_round_nearest_even:
-     iotkitdev = DEVICE(&mms->iotkit);
+-        overflow_norm = false;
-     object_property_set_link(OBJECT(&mms->iotkit), OBJECT(system_memory),
+         inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
          break;
      case float_round_ties_away:
 -        overflow_norm = false;
          inc = frac_lsbm1;
          break;
      case float_round_to_zero:
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
          break;
      case float_round_to_odd:
          overflow_norm = true;
 +        /* fall through */
 +    case float_round_to_odd_inf:
          inc = p->frac_lo & frac_lsb ? 0 : round_mask;
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
                         ? frac_lsbm1 : 0);
                  break;
              case float_round_to_odd:
 +            case float_round_to_odd_inf:
                  inc = p->frac_lo & frac_lsb ? 0 : round_mask;
                  break;
              default:
 --
 .20.1
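
Note on the rounding change in the patch above: both round-to-odd modes compute the same increment (zero if the kept LSB is already set, otherwise round_mask) and differ only in what an overflow produces. The sketch below illustrates those two pieces on plain integers. It is an assumption-laden illustration, not softfloat code: round_to_odd_sketch and f32_overflow_bits are hypothetical names, and the caller is assumed to leave headroom so that frac + inc cannot wrap.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Drop the low 'drop_bits' bits of 'frac' with round-to-odd:
 * the result is exact if the dropped bits were zero, otherwise
 * its least significant bit is forced to 1.
 */
static uint64_t round_to_odd_sketch(uint64_t frac, unsigned drop_bits)
{
    assert(drop_bits > 0 && drop_bits < 63);
    uint64_t frac_lsb = 1ull << drop_bits;   /* lowest bit that is kept */
    uint64_t round_mask = frac_lsb - 1;      /* bits that are dropped */
    /* Same expression as in the patch: carry into the LSB only if needed. */
    uint64_t inc = (frac & frac_lsb) ? 0 : round_mask;
    return (frac + inc) >> drop_bits;
}

/*
 * Overflow result as a float32 bit pattern. overflow_norm != 0 models
 * float_round_to_odd (largest finite value); overflow_norm == 0 models
 * float_round_to_odd_inf (infinity).
 */
static uint32_t f32_overflow_bits(int sign, int overflow_norm)
{
    uint32_t magnitude = overflow_norm ? 0x7f7fffffu : 0x7f800000u;
    return ((uint32_t)(sign != 0) << 31) | magnitude;
}
```

For example, dropping two bits from binary 10001 gives 101 (inexact, forced odd), while binary 10000 gives 100 (exact, left even).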

-[PULL 23/39] target/arm: Convert VCMLA (vector) to decodetree
+[PULL 21/45] target/arm: Implement bfloat16 dot product (vector)
-Convert the VCMLA (vector) insns in the 3same extension group to
+From: Richard Henderson <richard.henderson@linaro.org>
 decodetree.
+This is BFDOT for both AArch64 AdvSIMD and SVE,
+and VDOT.BF16 for AArch32 NEON.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-7-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-5-peter.maydell@linaro.org
 ---
- target/arm/neon-shared.decode   | 11 ++++++++++
+ target/arm/helper.h           |  3 +++
- target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
+ target/arm/neon-shared.decode |  2 ++
- target/arm/translate.c          | 11 +---------
+ target/arm/sve.decode         |  3 +++
-files changed, 49 insertions(+), 10 deletions(-)
+ target/arm/translate-a64.c    | 20 ++++++++++++++++++
  target/arm/translate-neon.c   |  9 ++++++++
  target/arm/translate-sve.c    | 12 +++++++++++
  target/arm/vec_helper.c       | 40 +++++++++++++++++++++++++++++++++++
 files changed, 89 insertions(+)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_ummla_b, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, i32)
++
+ #ifdef TARGET_AARCH64
+ #include "helper-a64.h"
+ #include "helper-sve.h"
 diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-shared.decode
 +++ b/target/arm/neon-shared.decode
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ VUDOT          1111 110 00 . 10 .... .... 1101 . q:1 . 1 .... \
- # More specifically, this covers:
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp
- # 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
+ VUSDOT         1111 110 01 . 10 .... .... 1101 . q:1 . 0 .... \
- # 3same ext:       0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +VDOT_b16       1111 110 00 . 00 .... .... 1101 . q:1 . 0 .... \
 +               vm=%vm_dp vn=%vn_dp vd=%vd_dp
  # VFM[AS]L
  VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
 diff --git a/target/arm/sve.decode b/target/arm/sve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/sve.decode
 +++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
  FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
  FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
 +### SVE2 floating-point bfloat16 dot-product
 +BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
 +
-+# VFP/Neon register fields; same as vfp.decode
+ ### SVE2 floating-point multiply-add long (indexed)
-+%vm_dp  5:1 0:4
+ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
-+%vm_sp  0:4 5:1
+ FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
-+%vn_dp  7:1 16:4
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
-+%vn_sp  16:4 7:1
+index XXXXXXX..XXXXXXX 100644
-+%vd_dp  22:1 12:4
+--- a/target/arm/translate-a64.c
-+%vd_sp  12:4 22:1
++++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          feature = dc_isar_feature(aa64_fcma, s);
          break;
 +    case 0x1f: /* BFDOT */
 +        switch (size) {
 +        case 1:
 +            feature = dc_isar_feature(aa64_bf16, s);
 +            break;
 +        default:
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        break;
      default:
          unallocated_encoding(s);
          return;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          return;
 +    case 0xf: /* BFDOT */
 +        switch (size) {
 +        case 1:
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +        return;
 +
-+VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
+     default:
-+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
+         g_assert_not_reached();
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+     }
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/translate-neon.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-neon.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a)
- #include "decode-neon-dp.inc.c"
+                         gen_helper_gvec_usdot_b);
- #include "decode-neon-ls.inc.c"
+ }
- #include "decode-neon-shared.inc.c"
-+
++static bool trans_VDOT_b16(DisasContext *s, arg_VDOT_b16 *a)
 +static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
 +{
-+    int opr_sz;
++    if (!dc_isar_feature(aa32_bf16, s)) {
 +    TCGv_ptr fpst;
 +    gen_helper_gvec_3_ptr *fn_gvec_ptr;
 +
 +    if (!dc_isar_feature(aa32_vcma, s)
 +        || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
 +        return false;
 +    }
++    return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0,
++                        gen_helper_gvec_bfdot);
++}
 +
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
+ static bool trans_VFML(DisasContext *s, arg_VFML *a)
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+ {
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
+     int opr_sz;
 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-sve.c
 +++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_UMMLA(DisasContext *s, arg_rrrr_esz *a)
  {
      return do_i8mm_zzzz_ool(s, a, gen_helper_gvec_ummla_b, 0);
  }
 +
 +static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
 +{
 +    if (!dc_isar_feature(aa64_sve_bf16, s)) {
 +        return false;
 +    }
-+
++    if (sve_access_check(s)) {
-+    if ((a->vn | a->vm | a->vd) & a->q) {
++        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot,
-+        return false;
++                          a->rd, a->rn, a->rm, a->ra, 0);
 +    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    opr_sz = (1 + a->q) * 8;
-+    fpst = get_fpstatus_ptr(1);
-+    fn_gvec_ptr = a->size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
-+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
-+                       vfp_reg_offset(1, a->vn),
-+                       vfp_reg_offset(1, a->vm),
-+                       fpst, opr_sz, opr_sz, a->rot,
-+                       fn_gvec_ptr);
-+    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/vec_helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
-     bool is_long = false, q = extract32(insn, 6, 1);
+ DO_MMLA_B(gvec_smmla_b, do_smmla_b)
-     bool ptr_is_env = false;
+ DO_MMLA_B(gvec_ummla_b, do_ummla_b)
+ DO_MMLA_B(gvec_usmmla_b, do_usmmla_b)
--    if ((insn & 0xfe200f10) == 0xfc200800) {
++
--        /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */
++/*
--        int size = extract32(insn, 20, 1);
++ * BFloat16 Dot Product
--        data = extract32(insn, 23, 2); /* rot */
++ */
--        if (!dc_isar_feature(aa32_vcma, s)
++
--            || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
++static float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
--            return 1;
++{
--        }
++    /* FPCR is ignored for BFDOT and BFMMLA. */
--        fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
++    float_status bf_status = {
--    } else if ((insn & 0xfea00f10) == 0xfc800800) {
++        .tininess_before_rounding = float_tininess_before_rounding,
-+    if ((insn & 0xfea00f10) == 0xfc800800) {
++        .float_rounding_mode = float_round_to_odd_inf,
-         /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
++        .flush_to_zero = true,
-         int size = extract32(insn, 20, 1);
++        .flush_inputs_to_zero = true,
-         data = extract32(insn, 24, 1); /* rot */
++        .default_nan_mode = true,
 +    };
 +    float32 t1, t2;
 +
 +    /*
 +     * Extract each BFloat16 from the element pair, and shift
 +     * them such that they become float32.
 +     */
 +    t1 = float32_mul(e1 << 16, e2 << 16, &bf_status);
 +    t2 = float32_mul(e1 & 0xffff0000u, e2 & 0xffff0000u, &bf_status);
 +    t1 = float32_add(t1, t2, &bf_status);
 +    t1 = float32_add(sum, t1, &bf_status);
 +
 +    return t1;
 +}
 +
 +void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
 +{
 +    intptr_t i, opr_sz = simd_oprsz(desc);
 +    float32 *d = vd, *a = va;
 +    uint32_t *n = vn, *m = vm;
 +
 +    for (i = 0; i < opr_sz / 4; ++i) {
 +        d[i] = bfdotadd(a[i], n[i], m[i]);
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
 .20.1
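
Note on the arithmetic in bfdotadd() above: the following is a minimal standalone sketch, not the QEMU helper. It assumes the host float type is IEEE binary32 and uses ordinary host arithmetic, so it does not reproduce the round-to-odd, flush-to-zero or default-NaN settings that the patch places in bf_status. The names bf16_to_f32 and bfdot_pair_sketch are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Widen one BFloat16 bit pattern to float: it becomes the top 16 bits. */
static float bf16_to_f32(uint16_t bits)
{
    uint32_t u = (uint32_t)bits << 16;
    float f;

    memcpy(&f, &u, sizeof(f));
    return f;
}

/*
 * Each 32-bit element holds a pair of BFloat16 values.  Multiply the
 * low halves and the high halves, add the two products, then add the
 * accumulator, in the same order as the helper in the patch.
 * Hypothetical sketch: host rounding, not the softfloat bf_status.
 */
static float bfdot_pair_sketch(float sum, uint32_t e1, uint32_t e2)
{
    float lo = bf16_to_f32((uint16_t)(e1 & 0xffff)) *
               bf16_to_f32((uint16_t)(e2 & 0xffff));
    float hi = bf16_to_f32((uint16_t)(e1 >> 16)) *
               bf16_to_f32((uint16_t)(e2 >> 16));

    return sum + (lo + hi);
}
```

For inputs whose products and sums are exactly representable, the sketch and the helper should agree; they can differ in the last bit, and for denormals and NaNs, elsewhere.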

-[PULL 27/39] target/arm: Convert VCMLA (scalar) to decodetree
+[PULL 22/45] target/arm: Implement bfloat16 dot product (indexed)
-Convert VCMLA (scalar) in the 2reg-scalar-ext group to decodetree.
+From: Richard Henderson <richard.henderson@linaro.org>
+This is BFDOT for both AArch64 AdvSIMD and SVE,
+and VDOT.BF16 for AArch32 NEON.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-8-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-9-peter.maydell@linaro.org
 ---
- target/arm/neon-shared.decode   |  5 +++++
+ target/arm/helper.h           |  2 ++
- target/arm/translate-neon.inc.c | 40 +++++++++++++++++++++++++++++++++
+ target/arm/neon-shared.decode |  2 ++
- target/arm/translate.c          | 26 +--------------------
+ target/arm/sve.decode         |  3 +++
-files changed, 46 insertions(+), 25 deletions(-)
+ target/arm/translate-a64.c    | 41 +++++++++++++++++++++++++++--------
  target/arm/translate-neon.c   |  9 ++++++++
  target/arm/translate-sve.c    | 12 ++++++++++
  target/arm/vec_helper.c       | 20 +++++++++++++++++
 files changed, 80 insertions(+), 9 deletions(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, i32)
+ #ifdef TARGET_AARCH64
+ #include "helper-a64.h"
 diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-shared.decode
 +++ b/target/arm/neon-shared.decode
-@@ -XXX,XX +XXX,XX @@ VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
+@@ -XXX,XX +XXX,XX @@ VUSDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
-                vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
+                vn=%vn_dp vd=%vd_dp
- VFML           1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
+ VSUDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 1 vm:4 \
-                vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
+                vn=%vn_dp vd=%vd_dp
 +VDOT_b16_scal  1111 1110 0 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
 +               vn=%vn_dp vd=%vd_dp
  %vfml_scalar_q0_rm 0:3 5:1
  %vfml_scalar_q1_index 5:1 3:1
 diff --git a/target/arm/sve.decode b/target/arm/sve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/sve.decode
 +++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
  FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
  FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
  FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
 +
-+VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
++### SVE2 floating-point bfloat16 dot-product (indexed)
-+               vn=%vn_dp vd=%vd_dp size=0
++BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
-+VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 +               vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/translate-a64.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VFML(DisasContext *s, arg_VFML *a)
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
-                        gen_helper_gvec_fmlal_a32);
+             return;
          }
          break;
 -    case 0x0f: /* SUDOT, USDOT */
 -        if (is_scalar || (size & 1) || !dc_isar_feature(aa64_i8mm, s)) {
 +    case 0x0f:
 +        switch (size) {
 +        case 0: /* SUDOT */
 +        case 2: /* USDOT */
 +            if (is_scalar || !dc_isar_feature(aa64_i8mm, s)) {
 +                unallocated_encoding(s);
 +                return;
 +            }
 +            break;
 +        case 1: /* BFDOT */
 +            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
 +                unallocated_encoding(s);
 +                return;
 +            }
 +            break;
 +        default:
              unallocated_encoding(s);
              return;
          }
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                           u ? gen_helper_gvec_udot_idx_b
                           : gen_helper_gvec_sdot_idx_b);
          return;
 -    case 0x0f: /* SUDOT, USDOT */
 -        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 -                         extract32(insn, 23, 1)
 -                         ? gen_helper_gvec_usdot_idx_b
 -                         : gen_helper_gvec_sudot_idx_b);
 -        return;
 -
 +    case 0x0f:
 +        switch (extract32(insn, 22, 2)) {
 +        case 0: /* SUDOT */
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 +                             gen_helper_gvec_sudot_idx_b);
 +            return;
 +        case 1: /* BFDOT */
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 +                             gen_helper_gvec_bfdot_idx);
 +            return;
 +        case 2: /* USDOT */
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 +                             gen_helper_gvec_usdot_idx_b);
 +            return;
 +        }
 +        g_assert_not_reached();
      case 0x11: /* FCMLA #0 */
      case 0x13: /* FCMLA #90 */
      case 0x15: /* FCMLA #180 */
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a)
                          gen_helper_gvec_sudot_idx_b);
  }
 +static bool trans_VDOT_b16_scal(DisasContext *s, arg_VDOT_b16_scal *a)
 +{
 +    if (!dc_isar_feature(aa32_bf16, s)) {
 +        return false;
 +    }
 +    return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index,
 +                        gen_helper_gvec_bfdot_idx);
 +}
 +
  static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
  {
      int opr_sz;
 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-sve.c
 +++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
      }
      return true;
  }
 +
-+static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
++static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
 +{
-+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
++    if (!dc_isar_feature(aa64_sve_bf16, s)) {
 +    int opr_sz;
 +    TCGv_ptr fpst;
 +
 +    if (!dc_isar_feature(aa32_vcma, s)) {
 +        return false;
 +    }
-+    if (a->size == 0 && !dc_isar_feature(aa32_fp16_arith, s)) {
++    if (sve_access_check(s)) {
-+        return false;
++        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot_idx,
 +                          a->rd, a->rn, a->rm, a->ra, a->index);
 +    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if ((a->vd | a->vn) & a->q) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    fn_gvec_ptr = (a->size ? gen_helper_gvec_fcmlas_idx
-+                   : gen_helper_gvec_fcmlah_idx);
-+    opr_sz = (1 + a->q) * 8;
-+    fpst = get_fpstatus_ptr(1);
-+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
-+                       vfp_reg_offset(1, a->vn),
-+                       vfp_reg_offset(1, a->vm),
-+                       fpst, opr_sz, opr_sz,
-+                       (a->index << 2) | a->rot, fn_gvec_ptr);
-+    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/vec_helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
-     bool is_long = false, q = extract32(insn, 6, 1);
+     }
-     bool ptr_is_env = false;
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+ }
--    if ((insn & 0xff000f10) == 0xfe000800) {
++
--        /* VCMLA (indexed) -- 1111 1110 S.RR .... .... 1000 ...0 .... */
++void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
--        int rot = extract32(insn, 20, 2);
++                            void *va, uint32_t desc)
--        int size = extract32(insn, 23, 1);
++{
--        int index;
++    intptr_t i, j, opr_sz = simd_oprsz(desc);
--
++    intptr_t index = simd_data(desc);
--        if (!dc_isar_feature(aa32_vcma, s)) {
++    intptr_t elements = opr_sz / 4;
--            return 1;
++    intptr_t eltspersegment = MIN(16 / 4, elements);
--        }
++    float32 *d = vd, *a = va;
--        if (size == 0) {
++    uint32_t *n = vn, *m = vm;
--            if (!dc_isar_feature(aa32_fp16_arith, s)) {
++
--                return 1;
++    for (i = 0; i < elements; i += eltspersegment) {
--            }
++        uint32_t m_idx = m[i + H4(index)];
--            /* For fp16, rm is just Vm, and index is M.  */
++
--            rm = extract32(insn, 0, 4);
++        for (j = i; j < i + eltspersegment; j++) {
--            index = extract32(insn, 5, 1);
++            d[j] = bfdotadd(a[j], n[j], m_idx);
--        } else {
++        }
--            /* For fp32, rm is the usual M:Vm, and index is 0.  */
++    }
--            VFP_DREG_M(rm, insn);
++    clear_tail(d, opr_sz, simd_maxsz(desc));
--            index = 0;
++}
 -        }
 -        data = (index << 2) | rot;
 -        fn_gvec_ptr = (size ? gen_helper_gvec_fcmlas_idx
 -                       : gen_helper_gvec_fcmlah_idx);
 -    } else if ((insn & 0xffb00f00) == 0xfe200d00) {
 +    if ((insn & 0xffb00f00) == 0xfe200d00) {
          /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
          int u = extract32(insn, 4, 1);
 --
 .20.1
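
Note on the segment handling in gvec_bfdot_idx above: a minimal standalone sketch of how the indexed form reuses one element pair of m per 128-bit segment. It assumes IEEE binary32 host floats, host rounding rather than the patch's bf_status, and little-endian lane order (so the H4() adjustment is omitted). All function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen one BFloat16 bit pattern to float: it becomes the top 16 bits. */
static float bf16_to_f32(uint16_t bits)
{
    uint32_t u = (uint32_t)bits << 16;
    float f;

    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Two-way BFloat16 dot product plus accumulator (host rounding only). */
static float bfdot_pair_sketch(float sum, uint32_t e1, uint32_t e2)
{
    float lo = bf16_to_f32((uint16_t)(e1 & 0xffff)) *
               bf16_to_f32((uint16_t)(e2 & 0xffff));
    float hi = bf16_to_f32((uint16_t)(e1 >> 16)) *
               bf16_to_f32((uint16_t)(e2 >> 16));

    return sum + (lo + hi);
}

/*
 * Indexed form: a segment is up to four 32-bit elements (128 bits).
 * Within each segment, the element pair m[i + index] is applied to
 * every lane.  The caller must pass index < 4 (and < elements).
 */
static void bfdot_idx_sketch(float *d, const float *a, const uint32_t *n,
                             const uint32_t *m, size_t elements,
                             size_t index)
{
    size_t per_segment = elements < 4 ? elements : 4;

    for (size_t i = 0; i < elements; i += per_segment) {
        /* Read the selected pair before any lane of d is written. */
        uint32_t m_idx = m[i + index];

        for (size_t j = i; j < i + per_segment; j++) {
            d[j] = bfdot_pair_sketch(a[j], n[j], m_idx);
        }
    }
}
```

With a 64-bit AdvSIMD operand there are only two elements, which is why the segment size is capped by the element count, as in the MIN(16 / 4, elements) expression in the patch.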

-[PULL 28/39] target/arm: Convert V[US]DOT (scalar) to decodetree
+[PULL 23/45] target/arm: Implement bfloat16 matrix multiply accumulate
-Convert the V[US]DOT (scalar) insns in the 2reg-scalar-ext group
+From: Richard Henderson <richard.henderson@linaro.org>
 to decodetree.
+This is BFMMLA for both AArch64 AdvSIMD and SVE,
+and VMMLA.BF16 for AArch32 NEON.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-9-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-10-peter.maydell@linaro.org
 ---
- target/arm/neon-shared.decode   |  3 +++
+ target/arm/helper.h           |  3 +++
- target/arm/translate-neon.inc.c | 35 +++++++++++++++++++++++++++++++++
+ target/arm/neon-shared.decode |  2 ++
- target/arm/translate.c          | 13 +-----------
+ target/arm/sve.decode         |  6 +++--
-files changed, 39 insertions(+), 12 deletions(-)
+ target/arm/translate-a64.c    | 10 +++++++++
  target/arm/translate-neon.c   |  9 ++++++++
  target/arm/translate-sve.c    | 12 ++++++++++
  target/arm/vec_helper.c       | 42 ++++++++++++++++++++++++++++++++++-
 files changed, 81 insertions(+), 3 deletions(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, i32)
++
+ #ifdef TARGET_AARCH64
+ #include "helper-a64.h"
+ #include "helper-sve.h"
 diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-shared.decode
 +++ b/target/arm/neon-shared.decode
-@@ -XXX,XX +XXX,XX @@ VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
+@@ -XXX,XX +XXX,XX @@ VUMMLA         1111 1100 0.10 .... .... 1100 .1.1 .... \
-                vn=%vn_dp vd=%vd_dp size=0
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp
- VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
+ VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
-                vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
+                vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
 +               vm=%vm_dp vn=%vn_dp vd=%vd_dp
  VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
                 vn=%vn_dp vd=%vd_dp size=1
 diff --git a/target/arm/sve.decode b/target/arm/sve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/sve.decode
 +++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz  01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5  ra=%reg_movprfx
  USDOT_zzzz      01000100 .. 0 ..... 011 110 ..... .....  @rda_rn_rm
  ### SVE2 floating point matrix multiply accumulate
 -
 -FMMLA           01100100 .. 1 ..... 111001 ..... .....  @rda_rn_rm
 +{
 +  BFMMLA        01100100 01 1 ..... 111 001 ..... .....  @rda_rn_rm_e0
 +  FMMLA         01100100 .. 1 ..... 111 001 ..... .....  @rda_rn_rm
 +}
  ### SVE2 Memory Gather Load Group
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          feature = dc_isar_feature(aa64_fcma, s);
          break;
 +    case 0x1d: /* BFMMLA */
 +        if (size != MO_16 || !is_q) {
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        feature = dc_isar_feature(aa64_bf16, s);
 +        break;
      case 0x1f: /* BFDOT */
          switch (size) {
          case 1:
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          return;
 +    case 0xd: /* BFMMLA */
 +        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
 +        return;
      case 0xf: /* BFDOT */
          switch (size) {
          case 1:
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VUSMMLA(DisasContext *s, arg_VUSMMLA *a)
      return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
                          gen_helper_gvec_usmmla_b);
  }
 +
-+VDOT_scalar    1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
++static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
-+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
++{
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
++    if (!dc_isar_feature(aa32_bf16, s)) {
 +        return false;
 +    }
 +    return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
 +                        gen_helper_gvec_bfmmla);
 +}
 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/translate-sve.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/translate-sve.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
+@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
-     tcg_temp_free_ptr(fpst);
+     }
      return true;
  }
 +
-+static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
++static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
 +{
-+    gen_helper_gvec_3 *fn_gvec;
++    if (!dc_isar_feature(aa64_sve_bf16, s)) {
 +    int opr_sz;
 +    TCGv_ptr fpst;
 +
 +    if (!dc_isar_feature(aa32_dp, s)) {
 +        return false;
 +    }
-+
++    if (sve_access_check(s)) {
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfmmla,
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
++                          a->rd, a->rn, a->rm, a->ra, 0);
 +        ((a->vd | a->vn) & 0x10)) {
 +        return false;
 +    }
-+
-+    if ((a->vd | a->vn) & a->q) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    fn_gvec = a->u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
-+    opr_sz = (1 + a->q) * 8;
-+    fpst = get_fpstatus_ptr(1);
-+    tcg_gen_gvec_3_ool(vfp_reg_offset(1, a->vd),
-+                       vfp_reg_offset(1, a->vn),
-+                       vfp_reg_offset(1, a->rm),
-+                       opr_sz, opr_sz, a->index, fn_gvec);
-+    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/vec_helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
-     bool is_long = false, q = extract32(insn, 6, 1);
+          * Process the entire segment at once, writing back the
-     bool ptr_is_env = false;
+          * results only after we've consumed all of the inputs.
+          *
--    if ((insn & 0xffb00f00) == 0xfe200d00) {
+-         * Key to indicies by column:
--        /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
++         * Key to indices by column:
--        int u = extract32(insn, 4, 1);
+          *          i   j                  i             j
--
+          */
--        if (!dc_isar_feature(aa32_dp, s)) {
+         sum0 = a[H4(0 + 0)];
--            return 1;
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
--        }
+     }
--        fn_gvec = u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b;
+     clear_tail(d, opr_sz, simd_maxsz(desc));
--        /* rm is just Vm, and index is M.  */
+ }
--        data = extract32(insn, 5, 1); /* index */
++
--        rm = extract32(insn, 0, 4);
++void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
--    } else if ((insn & 0xffa00f10) == 0xfe000810) {
++{
-+    if ((insn & 0xffa00f10) == 0xfe000810) {
++    intptr_t s, opr_sz = simd_oprsz(desc);
-         /* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
++    float32 *d = vd, *a = va;
-         int is_s = extract32(insn, 20, 1);
++    uint32_t *n = vn, *m = vm;
-         int vm20 = extract32(insn, 0, 3);
++
 +    for (s = 0; s < opr_sz / 4; s += 4) {
 +        float32 sum00, sum01, sum10, sum11;
 +
 +        /*
 +         * Process the entire segment at once, writing back the
 +         * results only after we've consumed all of the inputs.
 +         *
 +         * Key to indices by column:
 +         *               i   j           i   k             j   k
 +         */
 +        sum00 = a[s + H4(0 + 0)];
 +        sum00 = bfdotadd(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)]);
 +        sum00 = bfdotadd(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)]);
 +
 +        sum01 = a[s + H4(0 + 1)];
 +        sum01 = bfdotadd(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)]);
 +        sum01 = bfdotadd(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)]);
 +
 +        sum10 = a[s + H4(2 + 0)];
 +        sum10 = bfdotadd(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)]);
 +        sum10 = bfdotadd(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)]);
 +
 +        sum11 = a[s + H4(2 + 1)];
 +        sum11 = bfdotadd(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)]);
 +        sum11 = bfdotadd(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)]);
 +
 +        d[s + H4(0 + 0)] = sum00;
 +        d[s + H4(0 + 1)] = sum01;
 +        d[s + H4(2 + 0)] = sum10;
 +        d[s + H4(2 + 1)] = sum11;
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
 .20.1
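
Note on the indexing in gvec_bfmmla above: read per 128-bit segment, the helper appears to compute a 2x2 float32 result d = a + n * transpose(m), where n and m are each a 2x4 matrix of BFloat16 values. The sketch below spells that out with explicit loops. It is an illustration under assumptions: IEEE binary32 host floats, host rounding rather than bf_status, row-major flat arrays, and hypothetical function names.

```c
#include <stdint.h>
#include <string.h>

/* Widen one BFloat16 bit pattern to float: it becomes the top 16 bits. */
static float bf16_to_f32(uint16_t bits)
{
    uint32_t u = (uint32_t)bits << 16;
    float f;

    memcpy(&f, &u, sizeof(f));
    return f;
}

/*
 * One segment: a and d are 2x2 float (4 values, row-major), n and m are
 * 2x4 BFloat16 (8 values, row-major).  d[i][j] = a[i][j] + sum over k of
 * n[i][k] * m[j][k], accumulated two products at a time as in the patch.
 */
static void bfmmla_segment_sketch(float *d, const float *a,
                                  const uint16_t *n, const uint16_t *m)
{
    float out[4];

    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            float sum = a[i * 2 + j];

            for (int k = 0; k < 4; k += 2) {
                float p0 = bf16_to_f32(n[i * 4 + k]) *
                           bf16_to_f32(m[j * 4 + k]);
                float p1 = bf16_to_f32(n[i * 4 + k + 1]) *
                           bf16_to_f32(m[j * 4 + k + 1]);

                sum = sum + (p0 + p1);
            }
            out[i * 2 + j] = sum;
        }
    }
    /* Write back only after all inputs are consumed; d may alias a. */
    memcpy(d, out, sizeof(out));
}
```

The temporary out[] plays the role of sum00..sum11 in the helper, which likewise defers the stores until the whole segment has been read.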

-[PULL 25/39] target/arm: Convert V[US]DOT (vector) to decodetree
+[PULL 24/45] target/arm: Implement bfloat widening fma (vector)
-Convert the V[US]DOT (vector) insns to decodetree.
+From: Richard Henderson <richard.henderson@linaro.org>
+This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
+and VFMA{B,T}.BF16 for AArch32 NEON.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-10-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-7-peter.maydell@linaro.org
 ---
- target/arm/neon-shared.decode   |  4 ++++
+ target/arm/helper.h           |  3 +++
- target/arm/translate-neon.inc.c | 32 ++++++++++++++++++++++++++++++++
+ target/arm/neon-shared.decode |  3 +++
- target/arm/translate.c          |  9 +--------
+ target/arm/sve.decode         |  3 +++
-files changed, 37 insertions(+), 8 deletions(-)
+ target/arm/translate-a64.c    | 13 +++++++++----
  target/arm/translate-neon.c   |  9 +++++++++
  target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
  target/arm/vec_helper.c       | 16 ++++++++++++++++
 files changed, 73 insertions(+), 4 deletions(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, ptr, i32)
++
+ #ifdef TARGET_AARCH64
+ #include "helper-a64.h"
+ #include "helper-sve.h"
 diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-shared.decode
 +++ b/target/arm/neon-shared.decode
-@@ -XXX,XX +XXX,XX @@ VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
+@@ -XXX,XX +XXX,XX @@ VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
+ VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
  VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
++VFMA_b16       1111 110 0 0.11 .... .... 1000 . q:1 . 1 .... \
++               vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +
-+# VUDOT and VSDOT
+ VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
-+VDOT           1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
+                vn=%vn_dp vd=%vd_dp size=1
-+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+diff --git a/target/arm/sve.decode b/target/arm/sve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/sve.decode
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/sve.decode
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
+@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
-     tcg_temp_free_ptr(fpst);
+ FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
  FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
 +BFMLALB_zzzw    01100100 11 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
 +BFMLALT_zzzw    01100100 11 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
 +
  ### SVE2 floating-point bfloat16 dot-product
  BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          feature = dc_isar_feature(aa64_bf16, s);
          break;
 -    case 0x1f: /* BFDOT */
 +    case 0x1f:
          switch (size) {
 -        case 1:
 +        case 1: /* BFDOT */
 +        case 3: /* BFMLAL{B,T} */
              feature = dc_isar_feature(aa64_bf16, s);
              break;
          default:
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
      case 0xd: /* BFMMLA */
          gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
          return;
 -    case 0xf: /* BFDOT */
 +    case 0xf:
          switch (size) {
 -        case 1:
 +        case 1: /* BFDOT */
              gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
              break;
 +        case 3: /* BFMLAL{B,T} */
 +            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, false, is_q,
 +                              gen_helper_gvec_bfmlal);
 +            break;
          default:
              g_assert_not_reached();
          }
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
      return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
                          gen_helper_gvec_bfmmla);
  }
 +
 +static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
 +{
 +    if (!dc_isar_feature(aa32_bf16, s)) {
 +        return false;
 +    }
 +    return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
 +                             gen_helper_gvec_bfmlal);
 +}
 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-sve.c
 +++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
      }
      return true;
  }
 +
-+static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
++static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
 +{
-+    int opr_sz;
++    if (!dc_isar_feature(aa64_sve_bf16, s)) {
 +    gen_helper_gvec_3 *fn_gvec;
 +
 +    if (!dc_isar_feature(aa32_dp, s)) {
 +        return false;
 +    }
++    if (sve_access_check(s)) {
++        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
++        unsigned vsz = vec_full_reg_size(s);
 +
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
++                           vec_full_reg_offset(s, a->rn),
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
++                           vec_full_reg_offset(s, a->rm),
-+        return false;
++                           vec_full_reg_offset(s, a->ra),
 +                           status, vsz, vsz, sel,
 +                           gen_helper_gvec_bfmlal);
 +        tcg_temp_free_ptr(status);
 +    }
-+
-+    if ((a->vn | a->vm | a->vd) & a->q) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    opr_sz = (1 + a->q) * 8;
-+    fn_gvec = a->u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
-+    tcg_gen_gvec_3_ool(vfp_reg_offset(1, a->vd),
-+                       vfp_reg_offset(1, a->vn),
-+                       vfp_reg_offset(1, a->vm),
-+                       opr_sz, opr_sz, 0, fn_gvec);
 +    return true;
 +}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
++
 +static bool trans_BFMLALB_zzzw(DisasContext *s, arg_rrrr_esz *a)
 +{
 +    return do_BFMLAL_zzzw(s, a, false);
 +}
 +
 +static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
 +{
 +    return do_BFMLAL_zzzw(s, a, true);
 +}
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/vec_helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
-     bool is_long = false, q = extract32(insn, 6, 1);
+     }
-     bool ptr_is_env = false;
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+ }
--    if ((insn & 0xfeb00f00) == 0xfc200d00) {
++
--        /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
++void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
--        bool u = extract32(insn, 4, 1);
++                         void *stat, uint32_t desc)
--        if (!dc_isar_feature(aa32_dp, s)) {
++{
--            return 1;
++    intptr_t i, opr_sz = simd_oprsz(desc);
--        }
++    intptr_t sel = simd_data(desc);
--        fn_gvec = u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b;
++    float32 *d = vd, *a = va;
--    } else if ((insn & 0xff300f10) == 0xfc200810) {
++    bfloat16 *n = vn, *m = vm;
-+    if ((insn & 0xff300f10) == 0xfc200810) {
++
-         /* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
++    for (i = 0; i < opr_sz / 4; ++i) {
-         int is_s = extract32(insn, 23, 1);
++        float32 nn = n[H2(i * 2 + sel)] << 16;
-         if (!dc_isar_feature(aa32_fhm, s)) {
++        float32 mm = m[H2(i * 2 + sel)] << 16;
 +        d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], 0, stat);
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
 .20.1

-[PULL 24/39] target/arm: Convert VCADD (vector) to decodetree
+[PULL 25/45] target/arm: Implement bfloat widening fma (indexed)
-Convert the VCADD (vector) insns to decodetree.
+From: Richard Henderson <richard.henderson@linaro.org>
+This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
+and VFMA{B,T}.BF16 for AArch32 NEON.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-11-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-6-peter.maydell@linaro.org
 ---
- target/arm/neon-shared.decode   |  3 +++
+ target/arm/helper.h           |  2 ++
- target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
+ target/arm/neon-shared.decode |  2 ++
- target/arm/translate.c          | 11 +---------
+ target/arm/sve.decode         |  2 ++
-files changed, 41 insertions(+), 10 deletions(-)
+ target/arm/translate-a64.c    | 15 ++++++++++++++-
  target/arm/translate-neon.c   | 10 ++++++++++
  target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
  target/arm/vec_helper.c       | 22 ++++++++++++++++++++++
 files changed, 82 insertions(+), 1 deletion(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
+                    void, ptr, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
++                   void, ptr, ptr, ptr, ptr, ptr, i32)
+ #ifdef TARGET_AARCH64
+ #include "helper-a64.h"
 diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon-shared.decode
 +++ b/target/arm/neon-shared.decode
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
+                rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
- VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
+ VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
-                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+                index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
-+
++VFMA_b16_scal  1111 1110 0.11 .... .... 1000 . q:1 . 1 . vm:3 \
-+VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
++               index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp
-+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
+diff --git a/target/arm/sve.decode b/target/arm/sve.decode
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/sve.decode
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/sve.decode
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
+@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
-     tcg_temp_free_ptr(fpst);
+ FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
-     return true;
+ FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
  FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
 +BFMLALB_zzxw    01100100 11 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
 +BFMLALT_zzxw    01100100 11 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
  ### SVE2 floating-point bfloat16 dot-product (indexed)
  BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                  unallocated_encoding(s);
                  return;
              }
 +            size = MO_32;
              break;
          case 1: /* BFDOT */
              if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
                  unallocated_encoding(s);
                  return;
              }
 +            size = MO_32;
 +            break;
 +        case 3: /* BFMLAL{B,T} */
 +            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
 +                unallocated_encoding(s);
 +                return;
 +            }
 +            /* can't set is_fp without other incorrect size checks */
 +            size = MO_16;
              break;
          default:
              unallocated_encoding(s);
              return;
          }
 -        size = MO_32;
          break;
      case 0x11: /* FCMLA #0 */
      case 0x13: /* FCMLA #90 */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
              gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
                               gen_helper_gvec_usdot_idx_b);
              return;
 +        case 3: /* BFMLAL{B,T} */
 +            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
 +                              gen_helper_gvec_bfmlal_idx);
 +            return;
          }
          g_assert_not_reached();
      case 0x11: /* FCMLA #0 */
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
      return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
                               gen_helper_gvec_bfmlal);
  }
 +
-+static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
++static bool trans_VFMA_b16_scal(DisasContext *s, arg_VFMA_b16_scal *a)
 +{
-+    int opr_sz;
++    if (!dc_isar_feature(aa32_bf16, s)) {
 +    TCGv_ptr fpst;
 +    gen_helper_gvec_3_ptr *fn_gvec_ptr;
 +
 +    if (!dc_isar_feature(aa32_vcma, s)
 +        || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
 +        return false;
 +    }
++    return do_neon_ddda_fpst(s, 6, a->vd, a->vn, a->vm,
++                             (a->index << 1) | a->q, FPST_STD,
++                             gen_helper_gvec_bfmlal_idx);
++}
+diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-sve.c
++++ b/target/arm/translate-sve.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
+ {
+     return do_BFMLAL_zzzw(s, a, true);
+ }
 +
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
++{
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
++    if (!dc_isar_feature(aa64_sve_bf16, s)) {
 +        return false;
 +    }
++    if (sve_access_check(s)) {
++        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
++        unsigned vsz = vec_full_reg_size(s);
 +
-+    if ((a->vn | a->vm | a->vd) & a->q) {
++        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
-+        return false;
++                           vec_full_reg_offset(s, a->rn),
 +                           vec_full_reg_offset(s, a->rm),
 +                           vec_full_reg_offset(s, a->ra),
 +                           status, vsz, vsz, (a->index << 1) | sel,
 +                           gen_helper_gvec_bfmlal_idx);
 +        tcg_temp_free_ptr(status);
 +    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    opr_sz = (1 + a->q) * 8;
-+    fpst = get_fpstatus_ptr(1);
-+    fn_gvec_ptr = a->size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
-+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
-+                       vfp_reg_offset(1, a->vn),
-+                       vfp_reg_offset(1, a->vm),
-+                       fpst, opr_sz, opr_sz, a->rot,
-+                       fn_gvec_ptr);
-+    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
++
 +static bool trans_BFMLALB_zzxw(DisasContext *s, arg_rrxr_esz *a)
 +{
 +    return do_BFMLAL_zzxw(s, a, false);
 +}
 +
 +static bool trans_BFMLALT_zzxw(DisasContext *s, arg_rrxr_esz *a)
 +{
 +    return do_BFMLAL_zzxw(s, a, true);
 +}
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/vec_helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
-     bool is_long = false, q = extract32(insn, 6, 1);
+     }
-     bool ptr_is_env = false;
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+ }
--    if ((insn & 0xfea00f10) == 0xfc800800) {
++
--        /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
++void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm,
--        int size = extract32(insn, 20, 1);
++                             void *va, void *stat, uint32_t desc)
--        data = extract32(insn, 24, 1); /* rot */
++{
--        if (!dc_isar_feature(aa32_vcma, s)
++    intptr_t i, j, opr_sz = simd_oprsz(desc);
--            || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
++    intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1);
--            return 1;
++    intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 1, 3);
--        }
++    intptr_t elements = opr_sz / 4;
--        fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
++    intptr_t eltspersegment = MIN(16 / 4, elements);
--    } else if ((insn & 0xfeb00f00) == 0xfc200d00) {
++    float32 *d = vd, *a = va;
-+    if ((insn & 0xfeb00f00) == 0xfc200d00) {
++    bfloat16 *n = vn, *m = vm;
-         /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
++
-         bool u = extract32(insn, 4, 1);
++    for (i = 0; i < elements; i += eltspersegment) {
-         if (!dc_isar_feature(aa32_dp, s)) {
++        float32 m_idx = m[H2(2 * i + index)] << 16;
 +
 +        for (j = i; j < i + eltspersegment; j++) {
 +            float32 n_j = n[H2(2 * j + sel)] << 16;
 +            d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], 0, stat);
 +        }
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
 .20.1
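
Editorial sketch (not part of the patch): the indexed helper above widens each bfloat16 operand with `<< 16`, which works because a bfloat16 value is the high 16 bits of an IEEE binary32 value. The standalone C below illustrates that widening and the per-128-bit-segment element selection. The names `bf16_to_f32` and `bfmlal_idx_ref` are hypothetical. It assumes a little-endian host (so the H2/H4 index adjustments are omitted), and it uses an unfused multiply-add rather than the softfloat `float32_muladd` that the patch calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Widen a bfloat16 bit pattern: it is the top half of a binary32 pattern. */
static float bf16_to_f32(uint16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

/*
 * Reference loop for the indexed widening multiply-add.
 * elements: number of 32-bit result lanes
 * sel:      0 picks the bottom (even) bf16 of each pair in n, 1 the top
 * index:    bf16 element of m to use, counted within each 128-bit segment
 */
static void bfmlal_idx_ref(float *d, const float *a,
                           const uint16_t *n, const uint16_t *m,
                           int elements, int sel, int index)
{
    int per_seg = elements < 4 ? elements : 4;

    for (int i = 0; i < elements; i += per_seg) {
        float m_idx = bf16_to_f32(m[2 * i + index]);

        for (int j = i; j < i + per_seg; j++) {
            /* Unfused here for brevity; the helper uses a fused muladd. */
            d[j] = a[j] + bf16_to_f32(n[2 * j + sel]) * m_idx;
        }
    }
}
```

With one 128-bit segment, `sel = 1` and `index = 2`, every result lane multiplies the odd-numbered bf16 element of its pair in `n` by the single element `m[2]`.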

-New patch
+[PULL 26/45] linux-user/aarch64: Enable hwcap bits for bfloat16
+From: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210525225817.400336-12-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ linux-user/elfload.c | 2 ++
+file changed, 2 insertions(+)
+diff --git a/linux-user/elfload.c b/linux-user/elfload.c
+index XXXXXXX..XXXXXXX 100644
+--- a/linux-user/elfload.c
++++ b/linux-user/elfload.c
+@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap2(void)
+     GET_FEATURE_ID(aa64_sve_i8mm, ARM_HWCAP2_A64_SVEI8MM);
+     GET_FEATURE_ID(aa64_sve_f32mm, ARM_HWCAP2_A64_SVEF32MM);
+     GET_FEATURE_ID(aa64_sve_f64mm, ARM_HWCAP2_A64_SVEF64MM);
++    GET_FEATURE_ID(aa64_sve_bf16, ARM_HWCAP2_A64_SVEBF16);
+     GET_FEATURE_ID(aa64_i8mm, ARM_HWCAP2_A64_I8MM);
++    GET_FEATURE_ID(aa64_bf16, ARM_HWCAP2_A64_BF16);
+     GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
+     GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
+     GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
+--
+.20.1

-[PULL 07/39] target/arm: Use correct variable for setting 'max' cpu's ID_AA64DFR0
+[PULL 27/45] target/arm: Enable BFloat16 extensions
-In aarch64_max_initfn() we update both 32-bit and 64-bit ID
+From: Richard Henderson <richard.henderson@linaro.org>
 registers.  The intended pattern is that for 64-bit ID registers we
 use FIELD_DP64 and the uint64_t 't' register, while 32-bit ID
 registers use FIELD_DP32 and the uint32_t 'u' register.  For
 ID_AA64DFR0 we accidentally used 'u', meaning that the top 32 bits of
 this 64-bit ID register would end up always zero.  Luckily at the
 moment that's what they should be anyway, so this bug has no visible
 effects.
-Use the right-sized variable.
+Disable BF16 again for !have_neon and !have_vfp during realize.
-Fixes: 3bec78447a958d481991
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210525225817.400336-13-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20200423110915.10527-1-peter.maydell@linaro.org
 ---
- target/arm/cpu64.c | 6 +++---
+ target/arm/cpu.c     | 3 +++
-file changed, 3 insertions(+), 3 deletions(-)
+ target/arm/cpu64.c   | 3 +++
  target/arm/cpu_tcg.c | 1 +
 files changed, 7 insertions(+)
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
+         u = cpu->isar.id_isar6;
+         u = FIELD_DP32(u, ID_ISAR6, JSCVT, 0);
++        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
+         cpu->isar.id_isar6 = u;
+         u = cpu->isar.mvfr0;
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
+         t = cpu->isar.id_aa64isar1;
+         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 0);
++        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 0);
+         t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 0);
+         cpu->isar.id_aa64isar1 = t;
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
+         u = cpu->isar.id_isar6;
+         u = FIELD_DP32(u, ID_ISAR6, DP, 0);
+         u = FIELD_DP32(u, ID_ISAR6, FHM, 0);
++        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
+         u = FIELD_DP32(u, ID_ISAR6, I8MM, 0);
+         cpu->isar.id_isar6 = u;
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
 @@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-         u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
+         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
-         cpu->isar.id_mmfr4 = u;
+         t = FIELD_DP64(t, ID_AA64ISAR1, SB, 1);
+         t = FIELD_DP64(t, ID_AA64ISAR1, SPECRES, 1);
--        u = cpu->isar.id_aa64dfr0;
++        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 1);
--        u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
+         t = FIELD_DP64(t, ID_AA64ISAR1, FRINTTS, 1);
--        cpu->isar.id_aa64dfr0 = u;
+         t = FIELD_DP64(t, ID_AA64ISAR1, LRCPC, 2); /* ARMv8.4-RCPC */
-+        t = cpu->isar.id_aa64dfr0;
+         t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 1);
-+        t = FIELD_DP64(t, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
+@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-+        cpu->isar.id_aa64dfr0 = t;
+         t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1);
+         t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2);  /* PMULL */
-         u = cpu->isar.id_dfr0;
+         t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1);
-         u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
++        t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 1);
          t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1);
          t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1);
          t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          u = FIELD_DP32(u, ID_ISAR6, FHM, 1);
          u = FIELD_DP32(u, ID_ISAR6, SB, 1);
          u = FIELD_DP32(u, ID_ISAR6, SPECRES, 1);
 +        u = FIELD_DP32(u, ID_ISAR6, BF16, 1);
          u = FIELD_DP32(u, ID_ISAR6, I8MM, 1);
          cpu->isar.id_isar6 = u;
 diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu_tcg.c
 +++ b/target/arm/cpu_tcg.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
          t = FIELD_DP32(t, ID_ISAR6, FHM, 1);
          t = FIELD_DP32(t, ID_ISAR6, SB, 1);
          t = FIELD_DP32(t, ID_ISAR6, SPECRES, 1);
 +        t = FIELD_DP32(t, ID_ISAR6, BF16, 1);
          t = FIELD_DP32(t, ID_ISAR6, I8MM, 1);
          cpu->isar.id_isar6 = t;
 --
 .20.1
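
Editorial sketch (not part of the patch): in the cpu64.c hunks above, 64-bit ID registers are updated through FIELD_DP64 with the uint64_t 't', and 32-bit ones through FIELD_DP32 with the uint32_t 'u'. The standalone C below shows why the variable width matters for a field that sits above bit 31. `deposit64_sketch` is a hypothetical stand-in for the field-deposit macros, and the BF16 field position (bits [47:44] of ID_AA64ISAR1) is an assumption taken from the architecture's register layout, not from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit field deposit. */
static uint64_t deposit64_sketch(uint64_t value, int start, int length,
                                 uint64_t field)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    uint64_t mask = (~0ULL >> (64 - length)) << start;

    return (value & ~mask) | ((field << start) & mask);
}

/* Assumed position of ID_AA64ISAR1.BF16. */
enum { BF16_SHIFT = 44, BF16_LENGTH = 4 };

/* Correct pattern: the 64-bit register lives in a 64-bit temporary. */
static uint64_t set_bf16_with_u64(uint64_t isar1)
{
    uint64_t t = isar1;

    t = deposit64_sketch(t, BF16_SHIFT, BF16_LENGTH, 1);
    return t;
}

/* Buggy pattern: a 32-bit temporary silently drops bits [63:32]. */
static uint64_t set_bf16_with_u32(uint64_t isar1)
{
    uint32_t u = (uint32_t)isar1;

    u = (uint32_t)deposit64_sketch(u, BF16_SHIFT, BF16_LENGTH, 1);
    return u;
}
```

With the 32-bit temporary, both the newly set field and any existing upper-half fields are lost when the value is written back.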

-[PULL 22/39] target/arm: Add stubs for AArch32 Neon decodetree
+[PULL 28/45] hvf: Move assert_hvf_ok() into common directory
-Add the infrastructure for building and invoking a decodetree decoder
+From: Alexander Graf <agraf@csgraf.de>
-for the AArch32 Neon encodings.  At the moment the new decoder covers
-nothing, so we always fall back to the existing hand-written decode.
+Until now, Hypervisor.framework has only been available on x86_64 systems.
+With Apple Silicon shipping now, it extends its reach to aarch64. To
-We follow the same pattern we did for the VFP decodetree conversion
+prepare for support for multiple architectures, let's start moving common
-(commit 78e138bc1f672c145ef6ace74617d and following): code that deals
+code out into its own accel directory.
-with Neon will be moving gradually out to translate-neon.vfp.inc,
-which we #include into translate.c.
+This patch moves assert_hvf_ok() and introduces generic build infrastructure.
-In order to share the decode files between A32 and T32, we
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
-split Neon into 3 parts:
+Reviewed-by: Sergio Lopez <slp@redhat.com>
- * data-processing
+Message-id: 20210519202253.76782-2-agraf@csgraf.de
- * load-store
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
  * 'shared' encodings
 The first two groups of instructions have similar but not identical
 A32 and T32 encodings, so we need to manually transform the T32
 encoding into the A32 one before calling the decoder; the third group
 covers the Neon instructions which are identical in A32 and T32.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-4-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       | 29 ++++++++++++++++++++++++++
+ include/sysemu/hvf_int.h | 18 +++++++++++++++
- target/arm/neon-ls.decode       | 29 ++++++++++++++++++++++++++
+ accel/hvf/hvf-all.c      | 47 ++++++++++++++++++++++++++++++++++++++++
- target/arm/neon-shared.decode   | 27 +++++++++++++++++++++++++
+ target/i386/hvf/hvf.c    | 33 +---------------------------
- target/arm/translate-neon.inc.c | 32 +++++++++++++++++++++++++++++
+ MAINTAINERS              |  8 +++++++
- target/arm/translate.c          | 36 +++++++++++++++++++++++++++++++--
+ accel/hvf/meson.build    |  6 +++++
- target/arm/Makefile.objs        | 18 +++++++++++++++++
+ accel/meson.build        |  1 +
-files changed, 169 insertions(+), 2 deletions(-)
+files changed, 81 insertions(+), 32 deletions(-)
- create mode 100644 target/arm/neon-dp.decode
+ create mode 100644 include/sysemu/hvf_int.h
- create mode 100644 target/arm/neon-ls.decode
+ create mode 100644 accel/hvf/hvf-all.c
- create mode 100644 target/arm/neon-shared.decode
+ create mode 100644 accel/hvf/meson.build
- create mode 100644 target/arm/translate-neon.inc.c
+diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/target/arm/neon-dp.decode
++++ b/include/sysemu/hvf_int.h
 @@ -XXX,XX +XXX,XX @@
-+# AArch32 Neon data-processing instruction descriptions
++/*
-+#
++ * QEMU Hypervisor.framework (HVF) support
-+#  Copyright (c) 2020 Linaro, Ltd
++ *
-+#
++ * This work is licensed under the terms of the GNU GPL, version 2 or later.
-+# This library is free software; you can redistribute it and/or
++ * See the COPYING file in the top-level directory.
-+# modify it under the terms of the GNU Lesser General Public
++ *
-+# License as published by the Free Software Foundation; either
++ */
-+# version 2 of the License, or (at your option) any later version.
++
-+#
++/* header to be included in HVF-specific code */
-+# This library is distributed in the hope that it will be useful,
++
-+# but WITHOUT ANY WARRANTY; without even the implied warranty of
++#ifndef HVF_INT_H
-+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
++#define HVF_INT_H
-+# Lesser General Public License for more details.
++
-+#
++#include <Hypervisor/hv.h>
-+# You should have received a copy of the GNU Lesser General Public
++
-+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
++void assert_hvf_ok(hv_return_t ret);
 +
-+#
++#endif
-+# This file is processed by scripts/decodetree.py
+diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
 +#
 +
 +# Encodings for Neon data processing instructions where the T32 encoding
 +# is a simple transformation of the A32 encoding.
 +# More specifically, this file covers instructions where the A32 encoding is
 +#   0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
 +# and the T32 encoding is
 +#   0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
 +# This file works on the A32 encoding only; calling code for T32 has to
 +# transform the insn into the A32 version first.
 diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/target/arm/neon-ls.decode
++++ b/accel/hvf/hvf-all.c
 @@ -XXX,XX +XXX,XX @@
-+# AArch32 Neon load/store instruction descriptions
++/*
-+#
++ * QEMU Hypervisor.framework support
-+#  Copyright (c) 2020 Linaro, Ltd
++ *
-+#
++ * This work is licensed under the terms of the GNU GPL, version 2.  See
-+# This library is free software; you can redistribute it and/or
++ * the COPYING file in the top-level directory.
-+# modify it under the terms of the GNU Lesser General Public
++ *
-+# License as published by the Free Software Foundation; either
++ * Contributions after 2012-01-13 are licensed under the terms of the
-+# version 2 of the License, or (at your option) any later version.
++ * GNU GPL, version 2 or (at your option) any later version.
-+#
++ */
-+# This library is distributed in the hope that it will be useful,
++
-+# but WITHOUT ANY WARRANTY; without even the implied warranty of
++#include "qemu/osdep.h"
-+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
++#include "qemu-common.h"
-+# Lesser General Public License for more details.
++#include "qemu/error-report.h"
-+#
++#include "sysemu/hvf.h"
-+# You should have received a copy of the GNU Lesser General Public
++#include "sysemu/hvf_int.h"
-+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
++
-+
++void assert_hvf_ok(hv_return_t ret)
-+#
++{
-+# This file is processed by scripts/decodetree.py
++    if (ret == HV_SUCCESS) {
-+#
++        return;
-+
++    }
-+# Encodings for Neon load/store instructions where the T32 encoding
++
-+# is a simple transformation of the A32 encoding.
++    switch (ret) {
-+# More specifically, this file covers instructions where the A32 encoding is
++    case HV_ERROR:
-+#   0b1111_0100_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
++        error_report("Error: HV_ERROR");
-+# and the T32 encoding is
++        break;
-+#   0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
++    case HV_BUSY:
-+# This file works on the A32 encoding only; calling code for T32 has to
++        error_report("Error: HV_BUSY");
-+# transform the insn into the A32 version first.
++        break;
-diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
++    case HV_BAD_ARGUMENT:
 +        error_report("Error: HV_BAD_ARGUMENT");
 +        break;
 +    case HV_NO_RESOURCES:
 +        error_report("Error: HV_NO_RESOURCES");
 +        break;
 +    case HV_NO_DEVICE:
 +        error_report("Error: HV_NO_DEVICE");
 +        break;
 +    case HV_UNSUPPORTED:
 +        error_report("Error: HV_UNSUPPORTED");
 +        break;
 +    default:
 +        error_report("Unknown Error");
 +    }
 +
 +    abort();
 +}
 diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/hvf.c
 +++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/error-report.h"
  #include "sysemu/hvf.h"
 +#include "sysemu/hvf_int.h"
  #include "sysemu/runstate.h"
  #include "hvf-i386.h"
  #include "vmcs.h"
@@ -XXX,XX +XXX,XX @@
  HVFState *hvf_state;
 -static void assert_hvf_ok(hv_return_t ret)
 -{
 -    if (ret == HV_SUCCESS) {
 -        return;
 -    }
 -
 -    switch (ret) {
 -    case HV_ERROR:
 -        error_report("Error: HV_ERROR");
 -        break;
 -    case HV_BUSY:
 -        error_report("Error: HV_BUSY");
 -        break;
 -    case HV_BAD_ARGUMENT:
 -        error_report("Error: HV_BAD_ARGUMENT");
 -        break;
 -    case HV_NO_RESOURCES:
 -        error_report("Error: HV_NO_RESOURCES");
 -        break;
 -    case HV_NO_DEVICE:
 -        error_report("Error: HV_NO_DEVICE");
 -        break;
 -    case HV_UNSUPPORTED:
 -        error_report("Error: HV_UNSUPPORTED");
 -        break;
 -    default:
 -        error_report("Unknown Error");
 -    }
 -
 -    abort();
 -}
 -
  /* Memory slots */
  hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
  {
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ M: Roman Bolshakov <r.bolshakov@yadro.com>
  W: https://wiki.qemu.org/Features/HVF
  S: Maintained
  F: target/i386/hvf/
 +
 +HVF
 +M: Cameron Esfahani <dirty@apple.com>
 +M: Roman Bolshakov <r.bolshakov@yadro.com>
 +W: https://wiki.qemu.org/Features/HVF
 +S: Maintained
 +F: accel/hvf/
  F: include/sysemu/hvf.h
 +F: include/sysemu/hvf_int.h
  WHPX CPUs
  M: Sunil Muthuswamy <sunilmut@microsoft.com>
 diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/target/arm/neon-shared.decode
++++ b/accel/hvf/meson.build
 @@ -XXX,XX +XXX,XX @@
-+# AArch32 Neon instruction descriptions
++hvf_ss = ss.source_set()
-+#
++hvf_ss.add(files(
-+#  Copyright (c) 2020 Linaro, Ltd
++  'hvf-all.c',
-+#
++))
-+# This library is free software; you can redistribute it and/or
++
-+# modify it under the terms of the GNU Lesser General Public
++specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
-+# License as published by the Free Software Foundation; either
+diff --git a/accel/meson.build b/accel/meson.build
 +# version 2 of the License, or (at your option) any later version.
 +#
 +# This library is distributed in the hope that it will be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 +# Lesser General Public License for more details.
 +#
 +# You should have received a copy of the GNU Lesser General Public
 +# License along with this library; if not, see <http://www.gnu.org/licenses/>.
 +
 +#
 +# This file is processed by scripts/decodetree.py
 +#
 +
 +# Encodings for Neon instructions whose encoding is the same for
 +# both A32 and T32.
 +
 +# More specifically, this covers:
 +# 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
 +# 3same ext:       0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + *  ARM translation: AArch32 Neon instructions
 + *
 + *  Copyright (c) 2003 Fabrice Bellard
 + *  Copyright (c) 2005-2007 CodeSourcery
 + *  Copyright (c) 2007 OpenedHand, Ltd.
 + *  Copyright (c) 2020 Linaro, Ltd.
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +/*
 + * This file is intended to be included from translate.c; it uses
 + * some macros and definitions provided by that file.
 + * It might be possible to convert it to a standalone .c file eventually.
 + */
 +
 +/* Include the generated Neon decoder */
 +#include "decode-neon-dp.inc.c"
 +#include "decode-neon-ls.inc.c"
 +#include "decode-neon-shared.inc.c"
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/accel/meson.build
-+++ b/target/arm/translate.c
++++ b/accel/meson.build
-@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
+@@ -XXX,XX +XXX,XX @@ specific_ss.add(files('accel-common.c'))
+ softmmu_ss.add(files('accel-softmmu.c'))
- #define ARM_CP_RW_BIT   (1 << 20)
+ user_ss.add(files('accel-user.c'))
--/* Include the VFP decoder */
++subdir('hvf')
-+/* Include the VFP and Neon decoders */
+ subdir('qtest')
- #include "translate-vfp.inc.c"
+ subdir('kvm')
-+#include "translate-neon.inc.c"
+ subdir('tcg')
  static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
  {
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
          /* Unconditional instructions.  */
          /* TODO: Perhaps merge these into one decodetree output file.  */
          if (disas_a32_uncond(s, insn) ||
 -            disas_vfp_uncond(s, insn)) {
 +            disas_vfp_uncond(s, insn) ||
 +            disas_neon_dp(s, insn) ||
 +            disas_neon_ls(s, insn) ||
 +            disas_neon_shared(s, insn)) {
              return;
          }
          /* fall back to legacy decoder */
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
          ARCH(6T2);
      }
 +    if ((insn & 0xef000000) == 0xef000000) {
 +        /*
 +         * T32 encodings 0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
 +         * transform into
 +         * A32 encodings 0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
 +         */
 +        uint32_t a32_insn = (insn & 0xe2ffffff) |
 +            ((insn & (1 << 28)) >> 4) | (1 << 28);
 +
 +        if (disas_neon_dp(s, a32_insn)) {
 +            return;
 +        }
 +    }
 +
 +    if ((insn & 0xff100000) == 0xf9000000) {
 +        /*
 +         * T32 encodings 0b1111_1001_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
 +         * transform into
 +         * A32 encodings 0b1111_0100_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
 +         */
 +        uint32_t a32_insn = (insn & 0x00ffffff) | 0xf4000000;
 +
 +        if (disas_neon_ls(s, a32_insn)) {
 +            return;
 +        }
 +    }
 +
      /*
       * TODO: Perhaps merge these into one decodetree output file.
       * Note disas_vfp is written for a32 with cond field in the
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
       */
      if (disas_t32(s, insn) ||
          disas_vfp_uncond(s, insn) ||
 +        disas_neon_shared(s, insn) ||
          ((insn >> 28) == 0xe && disas_vfp(s, insn))) {
          return;
      }
 diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/Makefile.objs
 +++ b/target/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
        $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
        "GEN", $(TARGET_DIR)$@)
 +target/arm/decode-neon-shared.inc.c: $(SRC_PATH)/target/arm/neon-shared.decode $(DECODETREE)
 +    $(call quiet-command,\
 +      $(PYTHON) $(DECODETREE) --static-decode disas_neon_shared -o $@ $<,\
 +      "GEN", $(TARGET_DIR)$@)
 +
 +target/arm/decode-neon-dp.inc.c: $(SRC_PATH)/target/arm/neon-dp.decode $(DECODETREE)
 +    $(call quiet-command,\
 +      $(PYTHON) $(DECODETREE) --static-decode disas_neon_dp -o $@ $<,\
 +      "GEN", $(TARGET_DIR)$@)
 +
 +target/arm/decode-neon-ls.inc.c: $(SRC_PATH)/target/arm/neon-ls.decode $(DECODETREE)
 +    $(call quiet-command,\
 +      $(PYTHON) $(DECODETREE) --static-decode disas_neon_ls -o $@ $<,\
 +      "GEN", $(TARGET_DIR)$@)
 +
  target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
      $(call quiet-command,\
        $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
@@ -XXX,XX +XXX,XX @@ target/arm/decode-t16.inc.c: $(SRC_PATH)/target/arm/t16.decode $(DECODETREE)
        "GEN", $(TARGET_DIR)$@)
  target/arm/translate-sve.o: target/arm/decode-sve.inc.c
 +target/arm/translate.o: target/arm/decode-neon-shared.inc.c
 +target/arm/translate.o: target/arm/decode-neon-dp.inc.c
 +target/arm/translate.o: target/arm/decode-neon-ls.inc.c
  target/arm/translate.o: target/arm/decode-vfp.inc.c
  target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
  target/arm/translate.o: target/arm/decode-a32.inc.c
 --
 .20.1
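
Note on the T32 handling in disas_thumb2_insn() above: the two new blocks rewrite a Thumb Neon encoding into the A32 form so that the same generated decoders can be reused. The following is a minimal standalone sketch of that bit manipulation, using the masks shown in the hunk. The function names t32_neon_dp_to_a32 and t32_neon_ls_to_a32 are hypothetical and exist only for illustration; in the patch the result is passed straight to disas_neon_dp() or disas_neon_ls().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Neon data-processing space (illustrative helper, not part of QEMU):
 *   T32 0b111p_1111_qqqq_...  ->  A32 0b1111_001p_qqqq_...
 * Returns false if the instruction is outside that space.
 */
static bool t32_neon_dp_to_a32(uint32_t t32, uint32_t *a32)
{
    assert(a32 != NULL);
    if ((t32 & 0xef000000u) != 0xef000000u) {
        return false;
    }
    *a32 = (t32 & 0xe2ffffffu)          /* keep bits 31..29, bit 25, low 24 bits */
         | ((t32 & (1u << 28)) >> 4)    /* move p from bit 28 down to bit 24 */
         | (1u << 28);                  /* bit 28 is always set in the A32 form */
    return true;
}

/*
 * Neon load/store space (illustrative helper, not part of QEMU):
 *   T32 0b1111_1001_ppp0_qqqq_...  ->  A32 0b1111_0100_ppp0_qqqq_...
 * Only the top byte changes.
 */
static bool t32_neon_ls_to_a32(uint32_t t32, uint32_t *a32)
{
    assert(a32 != NULL);
    if ((t32 & 0xff100000u) != 0xf9000000u) {
        return false;
    }
    *a32 = (t32 & 0x00ffffffu) | 0xf4000000u;
    return true;
}
```

For example, 0xff123456 (p = 1) maps to 0xf3123456 under the first helper, and 0xf9200123 maps to 0xf4200123 under the second.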

-[PULL 14/39] hw/arm: versal: Embed the ADMAs into the SoC type
+[PULL 29/45] hvf: Move vcpu thread functions into common directory
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Alexander Graf <agraf@csgraf.de>
-Embed the ADMAs into the SoC type.
+Until now, Hypervisor.framework has only been available on x86_64 systems.
 With Apple Silicon shipping now, it extends its reach to aarch64. To
 prepare for support for multiple architectures, let's start moving common
 code out into its own accel directory.
-Suggested-by: Peter Maydell <peter.maydell@linaro.org>
+This patch moves the vCPU thread loop over.
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
-Reviewed-by: Luc Michel <luc.michel@greensocs.com>
+Message-id: 20210519202253.76782-3-agraf@csgraf.de
-Message-id: 20200427181649.26851-7-edgar.iglesias@gmail.com
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/arm/xlnx-versal.h |  3 ++-
+ {target/i386 => accel}/hvf/hvf-accel-ops.h | 0
- hw/arm/xlnx-versal.c         | 14 +++++++-------
+ {target/i386 => accel}/hvf/hvf-accel-ops.c | 0
-files changed, 9 insertions(+), 8 deletions(-)
+ target/i386/hvf/x86hvf.c                   | 2 +-
  accel/hvf/meson.build                      | 1 +
  target/i386/hvf/meson.build                | 1 -
 files changed, 2 insertions(+), 2 deletions(-)
  rename {target/i386 => accel}/hvf/hvf-accel-ops.h (100%)
  rename {target/i386 => accel}/hvf/hvf-accel-ops.c (100%)
-diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
+diff --git a/target/i386/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
 similarity index 100%
 rename from target/i386/hvf/hvf-accel-ops.h
 rename to accel/hvf/hvf-accel-ops.h
 diff --git a/target/i386/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 similarity index 100%
 rename from target/i386/hvf/hvf-accel-ops.c
 rename to accel/hvf/hvf-accel-ops.c
 diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/xlnx-versal.h
+--- a/target/i386/hvf/x86hvf.c
-+++ b/include/hw/arm/xlnx-versal.h
++++ b/target/i386/hvf/x86hvf.c
 @@ -XXX,XX +XXX,XX @@
- #include "hw/arm/boot.h"
+ #include <Hypervisor/hv.h>
- #include "hw/intc/arm_gicv3.h"
+ #include <Hypervisor/hv_vmx.h>
- #include "hw/char/pl011.h"
-+#include "hw/dma/xlnx-zdma.h"
+-#include "hvf-accel-ops.h"
- #include "hw/net/cadence_gem.h"
++#include "accel/hvf/hvf-accel-ops.h"
- #define TYPE_XLNX_VERSAL "xlnx-versal"
+ void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
-@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
+                      SegmentCache *qseg, bool is_tr)
-         struct {
+diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
              PL011State uart[XLNX_VERSAL_NR_UARTS];
              CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
 -            SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
 +            XlnxZDMA adma[XLNX_VERSAL_NR_ADMAS];
          } iou;
      } lpd;
 diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-versal.c
+--- a/accel/hvf/meson.build
-+++ b/hw/arm/xlnx-versal.c
++++ b/accel/hvf/meson.build
-@@ -XXX,XX +XXX,XX @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
+@@ -XXX,XX +XXX,XX @@
-         DeviceState *dev;
+ hvf_ss = ss.source_set()
-         MemoryRegion *mr;
+ hvf_ss.add(files(
+   'hvf-all.c',
--        dev = qdev_create(NULL, "xlnx.zdma");
++  'hvf-accel-ops.c',
--        s->lpd.iou.adma[i] = SYS_BUS_DEVICE(dev);
+ ))
--        object_property_set_int(OBJECT(s->lpd.iou.adma[i]), 128, "bus-width",
--                                &error_abort);
+ specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
--        object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
+diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
-+        sysbus_init_child_obj(OBJECT(s), name,
+index XXXXXXX..XXXXXXX 100644
-+                              &s->lpd.iou.adma[i], sizeof(s->lpd.iou.adma[i]),
+--- a/target/i386/hvf/meson.build
-+                              TYPE_XLNX_ZDMA);
++++ b/target/i386/hvf/meson.build
-+        dev = DEVICE(&s->lpd.iou.adma[i]);
+@@ -XXX,XX +XXX,XX @@
-+        object_property_set_int(OBJECT(dev), 128, "bus-width", &error_abort);
+ i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
-         qdev_init_nofail(dev);
+   'hvf.c',
+-  'hvf-accel-ops.c',
--        mr = sysbus_mmio_get_region(s->lpd.iou.adma[i], 0);
+   'x86.c',
-+        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
+   'x86_cpuid.c',
-         memory_region_add_subregion(&s->mr_ps,
+   'x86_decode.c',
                                      MM_ADMA_CH0 + i * MM_ADMA_CH0_SIZE, mr);
 -        sysbus_connect_irq(s->lpd.iou.adma[i], 0, pic[VERSAL_ADMA_IRQ_0 + i]);
 +        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[VERSAL_ADMA_IRQ_0 + i]);
          g_free(name);
      }
  }
 --
 .20.1

-[PULL 29/39] target/arm: Convert VFM[AS]L (scalar) to decodetree
+[PULL 30/45] hvf: Move cpu functions into common directory
-Convert the VFM[AS]L (scalar) insns in the 2reg-scalar-ext group
+From: Alexander Graf <agraf@csgraf.de>
 to decodetree. These are the last ones in the group so we can remove
 all the legacy decode for the group.
-Note that in disas_thumb2_insn() the parts of this encoding space
+Until now, Hypervisor.framework has only been available on x86_64 systems.
-where the decodetree decoder returns false will correctly be directed
+With Apple Silicon shipping now, it extends its reach to aarch64. To
-to illegal_op by the "(insn & (1 << 28))" check so they won't fall
+prepare for support for multiple architectures, let's start moving common
-into disas_coproc_insn() by mistake.
+code out into its own accel directory.
+This patch moves CPU and memory operations over. While at it, make sure
+the code is consumable on non-i386 systems.
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
+Message-id: 20210519202253.76782-4-agraf@csgraf.de
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-11-peter.maydell@linaro.org
 ---
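
Note on the slot lookup moved by this patch: hvf_find_overlap_slot() treats a slot as overlapping a request when the two half-open ranges [start, start + size) intersect, and it skips slots whose size is zero (those are free). The sketch below restates that test outside QEMU. The demo_slot type and demo_find_overlap() are hypothetical stand-ins for hvf_slot and the real function, and the slot array is passed in explicitly instead of coming from hvf_state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for hvf_slot; only the fields the test needs. */
typedef struct demo_slot {
    uint64_t start;   /* guest physical start address */
    uint64_t size;    /* 0 means the slot is unused */
} demo_slot;

/*
 * Return the first used slot whose range [start, start + size)
 * intersects the requested range, or NULL if none does.
 */
static demo_slot *demo_find_overlap(demo_slot *slots, size_t n,
                                    uint64_t start, uint64_t size)
{
    assert(slots != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        demo_slot *s = &slots[i];
        if (s->size &&
            start < s->start + s->size &&
            start + size > s->start) {
            return s;
        }
    }
    return NULL;
}
```

With slots covering 0x1000..0x1fff and 0x4000..0x5fff, a request for 0x2000 bytes at 0x2000 touches neither slot and yields NULL, since ranges that are merely adjacent do not count as overlapping.
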
- target/arm/neon-shared.decode   |   7 +++
+ include/sysemu/hvf_int.h   |   4 +
- target/arm/translate-neon.inc.c |  32 ++++++++++
+ target/i386/hvf/hvf-i386.h |   2 -
- target/arm/translate.c          | 107 +-------------------------------
+ target/i386/hvf/x86hvf.h   |   2 -
-files changed, 40 insertions(+), 106 deletions(-)
+ accel/hvf/hvf-accel-ops.c  | 308 ++++++++++++++++++++++++++++++++++++-
  target/i386/hvf/hvf.c      | 302 ------------------------------------
 files changed, 311 insertions(+), 307 deletions(-)
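
Note on the vcpu_dirty flag used by the synchronize helpers moved here: registers are fetched from the hypervisor only when QEMU's copy is not already marked dirty, and they are written back (clearing the flag) after reset or init. The sketch below models that protocol in isolation. It is an assumption-laden simplification: demo_cpu and its fields are hypothetical, a single int stands in for the whole register file, and the run_on_cpu() dispatch to the vCPU thread is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model only; not QEMU's CPUState. */
typedef struct demo_cpu {
    bool dirty;      /* true: local_reg is authoritative, hw_reg is stale */
    int hw_reg;      /* stands in for state held by the hypervisor */
    int local_reg;   /* stands in for QEMU's copy of the registers */
    int fetches;     /* counts how often state was read back */
} demo_cpu;

/* Mirrors do_hvf_cpu_synchronize_state(): fetch once, then mark dirty. */
static void demo_synchronize_state(demo_cpu *cpu)
{
    assert(cpu != NULL);
    if (!cpu->dirty) {
        cpu->local_reg = cpu->hw_reg;
        cpu->fetches++;
        cpu->dirty = true;
    }
}

/* Mirrors the post_reset/post_init helpers: push state, clear the flag. */
static void demo_synchronize_post_reset(demo_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->hw_reg = cpu->local_reg;
    cpu->dirty = false;
}

/* Mirrors pre_loadvm: the incoming local state will be authoritative. */
static void demo_synchronize_pre_loadvm(demo_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->dirty = true;
}
```

Calling demo_synchronize_state() twice in a row performs a single fetch, which is the point of the flag: repeated synchronization requests are cheap until the state is pushed back.
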
-diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-shared.decode
+--- a/include/sysemu/hvf_int.h
-+++ b/target/arm/neon-shared.decode
++++ b/include/sysemu/hvf_int.h
-@@ -XXX,XX +XXX,XX @@ VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
+@@ -XXX,XX +XXX,XX @@
- VDOT_scalar    1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
+ #include <Hypervisor/hv.h>
-                vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
++void hvf_set_phys_mem(MemoryRegionSection *, bool);
-+%vfml_scalar_q0_rm 0:3 5:1
+ void assert_hvf_ok(hv_return_t ret);
-+%vfml_scalar_q1_index 5:1 3:1
++hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
-+VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
++int hvf_put_registers(CPUState *);
-+               rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
++int hvf_get_registers(CPUState *);
-+VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
-+               index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
+ #endif
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/i386/hvf/hvf-i386.h
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/i386/hvf/hvf-i386.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
+@@ -XXX,XX +XXX,XX @@ struct HVFState {
-     tcg_temp_free_ptr(fpst);
+ };
-     return true;
+ extern HVFState *hvf_state;
 -void hvf_set_phys_mem(MemoryRegionSection *, bool);
  void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
 -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
  #ifdef NEED_CPU_H
  /* Functions exported to host specific mode */
 diff --git a/target/i386/hvf/x86hvf.h b/target/i386/hvf/x86hvf.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/x86hvf.h
 +++ b/target/i386/hvf/x86hvf.h
@@ -XXX,XX +XXX,XX @@
  #include "x86_descr.h"
  int hvf_process_events(CPUState *);
 -int hvf_put_registers(CPUState *);
 -int hvf_get_registers(CPUState *);
  bool hvf_inject_interrupts(CPUState *);
  void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
                       SegmentCache *qseg, bool is_tr);
 diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/hvf/hvf-accel-ops.c
 +++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/osdep.h"
  #include "qemu/error-report.h"
  #include "qemu/main-loop.h"
 +#include "exec/address-spaces.h"
 +#include "exec/exec-all.h"
 +#include "sysemu/cpus.h"
  #include "sysemu/hvf.h"
 +#include "sysemu/hvf_int.h"
  #include "sysemu/runstate.h"
 -#include "target/i386/cpu.h"
  #include "qemu/guest-random.h"
  #include "hvf-accel-ops.h"
 +HVFState *hvf_state;
 +
 +/* Memory slots */
 +
 +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
 +{
 +    hvf_slot *slot;
 +    int x;
 +    for (x = 0; x < hvf_state->num_slots; ++x) {
 +        slot = &hvf_state->slots[x];
 +        if (slot->size && start < (slot->start + slot->size) &&
 +            (start + size) > slot->start) {
 +            return slot;
 +        }
 +    }
 +    return NULL;
 +}
 +
 +struct mac_slot {
 +    int present;
 +    uint64_t size;
 +    uint64_t gpa_start;
 +    uint64_t gva;
 +};
 +
 +struct mac_slot mac_slots[32];
 +
 +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
 +{
 +    struct mac_slot *macslot;
 +    hv_return_t ret;
 +
 +    macslot = &mac_slots[slot->slot_id];
 +
 +    if (macslot->present) {
 +        if (macslot->size != slot->size) {
 +            macslot->present = 0;
 +            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
 +            assert_hvf_ok(ret);
 +        }
 +    }
 +
 +    if (!slot->size) {
 +        return 0;
 +    }
 +
 +    macslot->present = 1;
 +    macslot->gpa_start = slot->start;
 +    macslot->size = slot->size;
 +    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
 +    assert_hvf_ok(ret);
 +    return 0;
 +}
 +
 +void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
 +{
 +    hvf_slot *mem;
 +    MemoryRegion *area = section->mr;
 +    bool writeable = !area->readonly && !area->rom_device;
 +    hv_memory_flags_t flags;
 +
 +    if (!memory_region_is_ram(area)) {
 +        if (writeable) {
 +            return;
 +        } else if (!memory_region_is_romd(area)) {
 +            /*
 +             * If the memory device is not in romd_mode, then we actually want
 +             * to remove the hvf memory slot so all accesses will trap.
 +             */
 +             add = false;
 +        }
 +    }
 +
 +    mem = hvf_find_overlap_slot(
 +            section->offset_within_address_space,
 +            int128_get64(section->size));
 +
 +    if (mem && add) {
 +        if (mem->size == int128_get64(section->size) &&
 +            mem->start == section->offset_within_address_space &&
 +            mem->mem == (memory_region_get_ram_ptr(area) +
 +            section->offset_within_region)) {
 +            return; /* Same region was attempted to register, go away. */
 +        }
 +    }
 +
 +    /* Region needs to be reset. set the size to 0 and remap it. */
 +    if (mem) {
 +        mem->size = 0;
 +        if (do_hvf_set_memory(mem, 0)) {
 +            error_report("Failed to reset overlapping slot");
 +            abort();
 +        }
 +    }
 +
 +    if (!add) {
 +        return;
 +    }
 +
 +    if (area->readonly ||
 +        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
 +        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
 +    } else {
 +        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
 +    }
 +
 +    /* Now make a new slot. */
 +    int x;
 +
 +    for (x = 0; x < hvf_state->num_slots; ++x) {
 +        mem = &hvf_state->slots[x];
 +        if (!mem->size) {
 +            break;
 +        }
 +    }
 +
 +    if (x == hvf_state->num_slots) {
 +        error_report("No free slots");
 +        abort();
 +    }
 +
 +    mem->size = int128_get64(section->size);
 +    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
 +    mem->start = section->offset_within_address_space;
 +    mem->region = area;
 +
 +    if (do_hvf_set_memory(mem, flags)) {
 +        error_report("Error registering new memory slot");
 +        abort();
 +    }
 +}
 +
 +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
 +{
 +    if (!cpu->vcpu_dirty) {
 +        hvf_get_registers(cpu);
 +        cpu->vcpu_dirty = true;
 +    }
 +}
 +
 +void hvf_cpu_synchronize_state(CPUState *cpu)
 +{
 +    if (!cpu->vcpu_dirty) {
 +        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
 +    }
 +}
 +
 +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
 +                                              run_on_cpu_data arg)
 +{
 +    hvf_put_registers(cpu);
 +    cpu->vcpu_dirty = false;
 +}
 +
 +void hvf_cpu_synchronize_post_reset(CPUState *cpu)
 +{
 +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
 +}
 +
 +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
 +                                             run_on_cpu_data arg)
 +{
 +    hvf_put_registers(cpu);
 +    cpu->vcpu_dirty = false;
 +}
 +
 +void hvf_cpu_synchronize_post_init(CPUState *cpu)
 +{
 +    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 +}
 +
 +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
 +                                              run_on_cpu_data arg)
 +{
 +    cpu->vcpu_dirty = true;
 +}
 +
 +void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
 +{
 +    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
 +}
 +
 +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
 +{
 +    hvf_slot *slot;
 +
 +    slot = hvf_find_overlap_slot(
 +            section->offset_within_address_space,
 +            int128_get64(section->size));
 +
 +    /* protect region against writes; begin tracking it */
 +    if (on) {
 +        slot->flags |= HVF_SLOT_LOG;
 +        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
 +                      HV_MEMORY_READ);
 +    /* stop tracking region*/
 +    } else {
 +        slot->flags &= ~HVF_SLOT_LOG;
 +        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
 +                      HV_MEMORY_READ | HV_MEMORY_WRITE);
 +    }
 +}
 +
 +static void hvf_log_start(MemoryListener *listener,
 +                          MemoryRegionSection *section, int old, int new)
 +{
 +    if (old != 0) {
 +        return;
 +    }
 +
 +    hvf_set_dirty_tracking(section, 1);
 +}
 +
 +static void hvf_log_stop(MemoryListener *listener,
 +                         MemoryRegionSection *section, int old, int new)
 +{
 +    if (new != 0) {
 +        return;
 +    }
 +
 +    hvf_set_dirty_tracking(section, 0);
 +}
 +
 +static void hvf_log_sync(MemoryListener *listener,
 +                         MemoryRegionSection *section)
 +{
 +    /*
 +     * sync of dirty pages is handled elsewhere; just make sure we keep
 +     * tracking the region.
 +     */
 +    hvf_set_dirty_tracking(section, 1);
 +}
 +
 +static void hvf_region_add(MemoryListener *listener,
 +                           MemoryRegionSection *section)
 +{
 +    hvf_set_phys_mem(section, true);
 +}
 +
 +static void hvf_region_del(MemoryListener *listener,
 +                           MemoryRegionSection *section)
 +{
 +    hvf_set_phys_mem(section, false);
 +}
 +
 +static MemoryListener hvf_memory_listener = {
 +    .priority = 10,
 +    .region_add = hvf_region_add,
 +    .region_del = hvf_region_del,
 +    .log_start = hvf_log_start,
 +    .log_stop = hvf_log_stop,
 +    .log_sync = hvf_log_sync,
 +};
 +
 +static void dummy_signal(int sig)
 +{
 +}
 +
 +bool hvf_allowed;
 +
 +static int hvf_accel_init(MachineState *ms)
 +{
 +    int x;
 +    hv_return_t ret;
 +    HVFState *s;
 +
 +    ret = hv_vm_create(HV_VM_DEFAULT);
 +    assert_hvf_ok(ret);
 +
 +    s = g_new0(HVFState, 1);
 +
 +    s->num_slots = 32;
 +    for (x = 0; x < s->num_slots; ++x) {
 +        s->slots[x].size = 0;
 +        s->slots[x].slot_id = x;
 +    }
 +
 +    hvf_state = s;
 +    memory_listener_register(&hvf_memory_listener, &address_space_memory);
 +    return 0;
 +}
 +
 +static void hvf_accel_class_init(ObjectClass *oc, void *data)
 +{
 +    AccelClass *ac = ACCEL_CLASS(oc);
 +    ac->name = "HVF";
 +    ac->init_machine = hvf_accel_init;
 +    ac->allowed = &hvf_allowed;
 +}
 +
 +static const TypeInfo hvf_accel_type = {
 +    .name = TYPE_HVF_ACCEL,
 +    .parent = TYPE_ACCEL,
 +    .class_init = hvf_accel_class_init,
 +};
 +
 +static void hvf_type_init(void)
 +{
 +    type_register_static(&hvf_accel_type);
 +}
 +
 +type_init(hvf_type_init);
 +
  /*
   * The HVF-specific vCPU thread function. This one should only run when the host
   * CPU supports the VMX "unrestricted guest" feature.
 diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/hvf.c
 +++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
  #include "hvf-accel-ops.h"
 -HVFState *hvf_state;
 -
 -/* Memory slots */
 -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
 -{
 -    hvf_slot *slot;
 -    int x;
 -    for (x = 0; x < hvf_state->num_slots; ++x) {
 -        slot = &hvf_state->slots[x];
 -        if (slot->size && start < (slot->start + slot->size) &&
 -            (start + size) > slot->start) {
 -            return slot;
 -        }
 -    }
 -    return NULL;
 -}
 -
 -struct mac_slot {
 -    int present;
 -    uint64_t size;
 -    uint64_t gpa_start;
 -    uint64_t gva;
 -};
 -
 -struct mac_slot mac_slots[32];
 -
 -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
 -{
 -    struct mac_slot *macslot;
 -    hv_return_t ret;
 -
 -    macslot = &mac_slots[slot->slot_id];
 -
 -    if (macslot->present) {
 -        if (macslot->size != slot->size) {
 -            macslot->present = 0;
 -            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
 -            assert_hvf_ok(ret);
 -        }
 -    }
 -
 -    if (!slot->size) {
 -        return 0;
 -    }
 -
 -    macslot->present = 1;
 -    macslot->gpa_start = slot->start;
 -    macslot->size = slot->size;
 -    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
 -    assert_hvf_ok(ret);
 -    return 0;
 -}
 -
 -void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
 -{
 -    hvf_slot *mem;
 -    MemoryRegion *area = section->mr;
 -    bool writeable = !area->readonly && !area->rom_device;
 -    hv_memory_flags_t flags;
 -
 -    if (!memory_region_is_ram(area)) {
 -        if (writeable) {
 -            return;
 -        } else if (!memory_region_is_romd(area)) {
 -            /*
 -             * If the memory device is not in romd_mode, then we actually want
 -             * to remove the hvf memory slot so all accesses will trap.
 -             */
 -             add = false;
 -        }
 -    }
 -
 -    mem = hvf_find_overlap_slot(
 -            section->offset_within_address_space,
 -            int128_get64(section->size));
 -
 -    if (mem && add) {
 -        if (mem->size == int128_get64(section->size) &&
 -            mem->start == section->offset_within_address_space &&
 -            mem->mem == (memory_region_get_ram_ptr(area) +
 -            section->offset_within_region)) {
 -            return; /* Same region was attempted to register, go away. */
 -        }
 -    }
 -
 -    /* Region needs to be reset. set the size to 0 and remap it. */
 -    if (mem) {
 -        mem->size = 0;
 -        if (do_hvf_set_memory(mem, 0)) {
 -            error_report("Failed to reset overlapping slot");
 -            abort();
 -        }
 -    }
 -
 -    if (!add) {
 -        return;
 -    }
 -
 -    if (area->readonly ||
 -        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
 -        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
 -    } else {
 -        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
 -    }
 -
 -    /* Now make a new slot. */
 -    int x;
 -
 -    for (x = 0; x < hvf_state->num_slots; ++x) {
 -        mem = &hvf_state->slots[x];
 -        if (!mem->size) {
 -            break;
 -        }
 -    }
 -
 -    if (x == hvf_state->num_slots) {
 -        error_report("No free slots");
 -        abort();
 -    }
 -
 -    mem->size = int128_get64(section->size);
 -    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
 -    mem->start = section->offset_within_address_space;
 -    mem->region = area;
 -
 -    if (do_hvf_set_memory(mem, flags)) {
 -        error_report("Error registering new memory slot");
 -        abort();
 -    }
 -}
 -
  void vmx_update_tpr(CPUState *cpu)
  {
      /* TODO: need integrate APIC handling */
@@ -XXX,XX +XXX,XX @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
      }
  }
-+
-+static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
+-static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
-+{
+-{
-+    int opr_sz;
+-    if (!cpu->vcpu_dirty) {
-+
+-        hvf_get_registers(cpu);
-+    if (!dc_isar_feature(aa32_fhm, s)) {
+-        cpu->vcpu_dirty = true;
-+        return false;
+-    }
-+    }
+-}
-+
+-
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
+-void hvf_cpu_synchronize_state(CPUState *cpu)
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+-{
-+        ((a->vd & 0x10) || (a->q && (a->vn & 0x10)))) {
+-    if (!cpu->vcpu_dirty) {
-+        return false;
+-        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
-+    }
+-    }
-+
+-}
-+    if (a->vd & a->q) {
+-
-+        return false;
+-static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
-+    }
+-                                              run_on_cpu_data arg)
-+
+-{
-+    if (!vfp_access_check(s)) {
+-    hvf_put_registers(cpu);
-+        return true;
+-    cpu->vcpu_dirty = false;
-+    }
+-}
-+
+-
-+    opr_sz = (1 + a->q) * 8;
+-void hvf_cpu_synchronize_post_reset(CPUState *cpu)
-+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+-{
-+                       vfp_reg_offset(a->q, a->vn),
+-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
-+                       vfp_reg_offset(a->q, a->rm),
+-}
-+                       cpu_env, opr_sz, opr_sz,
+-
-+                       (a->index << 2) | a->s, /* is_2 == 0 */
+-static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
-+                       gen_helper_gvec_fmlal_idx_a32);
+-                                             run_on_cpu_data arg)
-+    return true;
+-{
-+}
+-    hvf_put_registers(cpu);
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+-    cpu->vcpu_dirty = false;
-index XXXXXXX..XXXXXXX 100644
+-}
---- a/target/arm/translate.c
+-
-+++ b/target/arm/translate.c
+-void hvf_cpu_synchronize_post_init(CPUState *cpu)
-@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
+-{
 -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 -}
 -
 -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
 -                                              run_on_cpu_data arg)
 -{
 -    cpu->vcpu_dirty = true;
 -}
 -
 -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
 -{
 -    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
 -}
 -
  static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
  {
      int read, write;
@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
      return false;
  }
- #define VFP_REG_SHR(x, n) (((n) > 0) ? (x) >> (n) : (x) << -(n))
+-static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
--#define VFP_SREG(insn, bigbit, smallbit) \
+-{
--  ((VFP_REG_SHR(insn, bigbit - 1) & 0x1e) | (((insn) >> (smallbit)) & 1))
+-    hvf_slot *slot;
- #define VFP_DREG(reg, insn, bigbit, smallbit) do { \
+-
-     if (dc_isar_feature(aa32_simd_r32, s)) { \
+-    slot = hvf_find_overlap_slot(
-         reg = (((insn) >> (bigbit)) & 0x0f) \
+-            section->offset_within_address_space,
-@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
+-            int128_get64(section->size));
-         reg = ((insn) >> (bigbit)) & 0x0f; \
+-
-     }} while (0)
+-    /* protect region against writes; begin tracking it */
+-    if (on) {
--#define VFP_SREG_D(insn) VFP_SREG(insn, 12, 22)
+-        slot->flags |= HVF_SLOT_LOG;
- #define VFP_DREG_D(reg, insn) VFP_DREG(reg, insn, 12, 22)
+-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
--#define VFP_SREG_N(insn) VFP_SREG(insn, 16,  7)
+-                      HV_MEMORY_READ);
- #define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16,  7)
+-    /* stop tracking region*/
--#define VFP_SREG_M(insn) VFP_SREG(insn,  0,  5)
+-    } else {
- #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
+-        slot->flags &= ~HVF_SLOT_LOG;
+-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
- static void gen_neon_dup_low16(TCGv_i32 var)
+-                      HV_MEMORY_READ | HV_MEMORY_WRITE);
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+-    }
-     return 0;
+-}
 -
 -static void hvf_log_start(MemoryListener *listener,
 -                          MemoryRegionSection *section, int old, int new)
 -{
 -    if (old != 0) {
 -        return;
 -    }
 -
 -    hvf_set_dirty_tracking(section, 1);
 -}
 -
 -static void hvf_log_stop(MemoryListener *listener,
 -                         MemoryRegionSection *section, int old, int new)
 -{
 -    if (new != 0) {
 -        return;
 -    }
 -
 -    hvf_set_dirty_tracking(section, 0);
 -}
 -
 -static void hvf_log_sync(MemoryListener *listener,
 -                         MemoryRegionSection *section)
 -{
 -    /*
 -     * sync of dirty pages is handled elsewhere; just make sure we keep
 -     * tracking the region.
 -     */
 -    hvf_set_dirty_tracking(section, 1);
 -}
 -
 -static void hvf_region_add(MemoryListener *listener,
 -                           MemoryRegionSection *section)
 -{
 -    hvf_set_phys_mem(section, true);
 -}
 -
 -static void hvf_region_del(MemoryListener *listener,
 -                           MemoryRegionSection *section)
 -{
 -    hvf_set_phys_mem(section, false);
 -}
 -
 -static MemoryListener hvf_memory_listener = {
 -    .priority = 10,
 -    .region_add = hvf_region_add,
 -    .region_del = hvf_region_del,
 -    .log_start = hvf_log_start,
 -    .log_stop = hvf_log_stop,
 -    .log_sync = hvf_log_sync,
 -};
 -
  void hvf_vcpu_destroy(CPUState *cpu)
  {
      X86CPU *x86_cpu = X86_CPU(cpu);
@@ -XXX,XX +XXX,XX @@ void hvf_vcpu_destroy(CPUState *cpu)
      assert_hvf_ok(ret);
  }
--/* Advanced SIMD two registers and a scalar extension.
+-static void dummy_signal(int sig)
-- *  31             24   23  22   20   16   12  11   10   9    8        3     0
+-{
-- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
+-}
-- * | 1 1 1 1 1 1 1 0 | o1 | D | o2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
+-
-- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
+ static void init_tsc_freq(CPUX86State *env)
-- *
+ {
-- */
+     size_t length;
--
+@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
--static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
--{
+     return ret;
--    gen_helper_gvec_3 *fn_gvec = NULL;
+ }
--    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
+-
--    int rd, rn, rm, opr_sz, data;
+-bool hvf_allowed;
--    int off_rn, off_rm;
+-
--    bool is_long = false, q = extract32(insn, 6, 1);
+-static int hvf_accel_init(MachineState *ms)
--    bool ptr_is_env = false;
+-{
--
+-    int x;
--    if ((insn & 0xffa00f10) == 0xfe000810) {
+-    hv_return_t ret;
--        /* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
+-    HVFState *s;
--        int is_s = extract32(insn, 20, 1);
+-
--        int vm20 = extract32(insn, 0, 3);
+-    ret = hv_vm_create(HV_VM_DEFAULT);
--        int vm3 = extract32(insn, 3, 1);
+-    assert_hvf_ok(ret);
--        int m = extract32(insn, 5, 1);
+-
--        int index;
+-    s = g_new0(HVFState, 1);
 -
--        if (!dc_isar_feature(aa32_fhm, s)) {
+-    s->num_slots = 32;
--            return 1;
+-    for (x = 0; x < s->num_slots; ++x) {
--        }
+-        s->slots[x].size = 0;
--        if (q) {
+-        s->slots[x].slot_id = x;
--            rm = vm20;
+-    }
--            index = m * 2 + vm3;
+-
--        } else {
+-    hvf_state = s;
--            rm = vm20 * 2 + m;
+-    memory_listener_register(&hvf_memory_listener, &address_space_memory);
 -            index = vm3;
 -        }
 -        is_long = true;
 -        data = (index << 2) | is_s; /* is_2 == 0 */
 -        fn_gvec_ptr = gen_helper_gvec_fmlal_idx_a32;
 -        ptr_is_env = true;
 -    } else {
 -        return 1;
 -    }
 -
 -    VFP_DREG_D(rd, insn);
 -    if (rd & q) {
 -        return 1;
 -    }
 -    if (q || !is_long) {
 -        VFP_DREG_N(rn, insn);
 -        if (rn & q & !is_long) {
 -            return 1;
 -        }
 -        off_rn = vfp_reg_offset(1, rn);
 -        off_rm = vfp_reg_offset(1, rm);
 -    } else {
 -        rn = VFP_SREG_N(insn);
 -        off_rn = vfp_reg_offset(0, rn);
 -        off_rm = vfp_reg_offset(0, rm);
 -    }
 -    if (s->fp_excp_el) {
 -        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
 -                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
 -        return 0;
 -    }
 -    if (!s->vfp_enabled) {
 -        return 1;
 -    }
 -
 -    opr_sz = (1 + q) * 8;
 -    if (fn_gvec_ptr) {
 -        TCGv_ptr ptr;
 -        if (ptr_is_env) {
 -            ptr = cpu_env;
 -        } else {
 -            ptr = get_fpstatus_ptr(1);
 -        }
 -        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
 -                           opr_sz, opr_sz, data, fn_gvec_ptr);
 -        if (!ptr_is_env) {
 -            tcg_temp_free_ptr(ptr);
 -        }
 -    } else {
 -        tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
 -                           opr_sz, opr_sz, data, fn_gvec);
 -    }
 -    return 0;
 -}
 -
- static int disas_coproc_insn(DisasContext *s, uint32_t insn)
+-static void hvf_accel_class_init(ObjectClass *oc, void *data)
- {
+-{
-     int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
+-    AccelClass *ac = ACCEL_CLASS(oc);
-@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
+-    ac->name = "HVF";
-                     }
+-    ac->init_machine = hvf_accel_init;
-                 }
+-    ac->allowed = &hvf_allowed;
-             }
+-}
--        } else if ((insn & 0x0f000a00) == 0x0e000800
+-
--                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
+-static const TypeInfo hvf_accel_type = {
--            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
+-    .name = TYPE_HVF_ACCEL,
--                goto illegal_op;
+-    .parent = TYPE_ACCEL,
--            }
+-    .class_init = hvf_accel_class_init,
--            return;
+-};
-         }
+-
-         goto illegal_op;
+-static void hvf_type_init(void)
-     }
+-{
-@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
+-    type_register_static(&hvf_accel_type);
-             }
+-}
-             break;
+-
-         }
+-type_init(hvf_type_init);
 -        if ((insn & 0xff000a00) == 0xfe000800
 -            && arm_dc_feature(s, ARM_FEATURE_V8)) {
 -            /* The Thumb2 and ARM encodings are identical.  */
 -            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
 -                goto illegal_op;
 -            }
 -        } else if (((insn >> 24) & 3) == 3) {
 +        if (((insn >> 24) & 3) == 3) {
              /* Translate into the equivalent ARM encoding.  */
              insn = (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
              if (disas_neon_data_insn(s, insn)) {
 --
 .20.1

-[PULL 16/39] hw/arm: versal: Add support for SD
+[PULL 31/45] hvf: Move hvf internal definitions into common header
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Alexander Graf <agraf@csgraf.de>
-Add support for SD.
+Until now, Hypervisor.framework has only been available on x86_64 systems.
 With Apple Silicon shipping now, it extends its reach to aarch64. To
 prepare for support for multiple architectures, let's start moving common
 code out into its own accel directory.
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+This patch moves a few internal struct and constant defines over.
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Luc Michel <luc.michel@greensocs.com>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
-Message-id: 20200427181649.26851-9-edgar.iglesias@gmail.com
+Message-id: 20210519202253.76782-5-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/arm/xlnx-versal.h | 12 ++++++++++++
+ include/sysemu/hvf_int.h   | 30 ++++++++++++++++++++++++++++++
- hw/arm/xlnx-versal.c         | 31 +++++++++++++++++++++++++++++++
+ target/i386/hvf/hvf-i386.h | 31 +------------------------------
-files changed, 43 insertions(+)
+files changed, 31 insertions(+), 30 deletions(-)
-diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
+diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/xlnx-versal.h
+--- a/include/sysemu/hvf_int.h
-+++ b/include/hw/arm/xlnx-versal.h
++++ b/include/sysemu/hvf_int.h
 @@ -XXX,XX +XXX,XX @@
- #include "hw/sysbus.h"
+ #include <Hypervisor/hv.h>
- #include "hw/arm/boot.h"
-+#include "hw/sd/sdhci.h"
++/* hvf_slot flags */
- #include "hw/intc/arm_gicv3.h"
++#define HVF_SLOT_LOG (1 << 0)
- #include "hw/char/pl011.h"
++
- #include "hw/dma/xlnx-zdma.h"
++typedef struct hvf_slot {
 +    uint64_t start;
 +    uint64_t size;
 +    uint8_t *mem;
 +    int slot_id;
 +    uint32_t flags;
 +    MemoryRegion *region;
 +} hvf_slot;
 +
 +typedef struct hvf_vcpu_caps {
 +    uint64_t vmx_cap_pinbased;
 +    uint64_t vmx_cap_procbased;
 +    uint64_t vmx_cap_procbased2;
 +    uint64_t vmx_cap_entry;
 +    uint64_t vmx_cap_exit;
 +    uint64_t vmx_cap_preemption_timer;
 +} hvf_vcpu_caps;
 +
 +struct HVFState {
 +    AccelState parent;
 +    hvf_slot slots[32];
 +    int num_slots;
 +
 +    hvf_vcpu_caps *hvf_caps;
 +};
 +extern HVFState *hvf_state;
 +
  void hvf_set_phys_mem(MemoryRegionSection *, bool);
  void assert_hvf_ok(hv_return_t ret);
  hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
 diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/hvf-i386.h
 +++ b/target/i386/hvf/hvf-i386.h
 @@ -XXX,XX +XXX,XX @@
- #define XLNX_VERSAL_NR_UARTS   2
- #define XLNX_VERSAL_NR_GEMS    2
+ #include "qemu/accel.h"
- #define XLNX_VERSAL_NR_ADMAS   8
+ #include "sysemu/hvf.h"
-+#define XLNX_VERSAL_NR_SDS     2
++#include "sysemu/hvf_int.h"
- #define XLNX_VERSAL_NR_IRQS    192
+ #include "cpu.h"
+ #include "x86.h"
- typedef struct Versal {
-@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
+-/* hvf_slot flags */
-         } iou;
+-#define HVF_SLOT_LOG (1 << 0)
-     } lpd;
+-
+-typedef struct hvf_slot {
-+    /* The Platform Management Controller subsystem.  */
+-    uint64_t start;
-+    struct {
+-    uint64_t size;
-+        struct {
+-    uint8_t *mem;
-+            SDHCIState sd[XLNX_VERSAL_NR_SDS];
+-    int slot_id;
-+        } iou;
+-    uint32_t flags;
-+    } pmc;
+-    MemoryRegion *region;
-+
+-} hvf_slot;
-     struct {
+-
-         MemoryRegion *mr_ddr;
+-typedef struct hvf_vcpu_caps {
-         uint32_t psci_conduit;
+-    uint64_t vmx_cap_pinbased;
-@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
+-    uint64_t vmx_cap_procbased;
- #define VERSAL_GEM1_IRQ_0          58
+-    uint64_t vmx_cap_procbased2;
- #define VERSAL_GEM1_WAKE_IRQ_0     59
+-    uint64_t vmx_cap_entry;
- #define VERSAL_ADMA_IRQ_0          60
+-    uint64_t vmx_cap_exit;
-+#define VERSAL_SD0_IRQ_0           126
+-    uint64_t vmx_cap_preemption_timer;
+-} hvf_vcpu_caps;
- /* Architecturally reserved IRQs suitable for virtualization.  */
+-
- #define VERSAL_RSVD_IRQ_FIRST 111
+-struct HVFState {
-@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
+-    AccelState parent;
- #define MM_FPD_CRF                  0xfd1a0000U
+-    hvf_slot slots[32];
- #define MM_FPD_CRF_SIZE             0x140000
+-    int num_slots;
+-
-+#define MM_PMC_SD0                  0xf1040000U
+-    hvf_vcpu_caps *hvf_caps;
-+#define MM_PMC_SD0_SIZE             0x10000
+-};
- #define MM_PMC_CRP                  0xf1260000U
+-extern HVFState *hvf_state;
- #define MM_PMC_CRP_SIZE             0x10000
+-
- #endif
+ void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
-diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
-index XXXXXXX..XXXXXXX 100644
+ #ifdef NEED_CPU_H
 --- a/hw/arm/xlnx-versal.c
 +++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
      }
  }
 +#define SDHCI_CAPABILITIES  0x280737ec6481 /* Same as on ZynqMP.  */
 +static void versal_create_sds(Versal *s, qemu_irq *pic)
 +{
 +    int i;
 +
 +    for (i = 0; i < ARRAY_SIZE(s->pmc.iou.sd); i++) {
 +        DeviceState *dev;
 +        MemoryRegion *mr;
 +
 +        sysbus_init_child_obj(OBJECT(s), "sd[*]",
 +                              &s->pmc.iou.sd[i], sizeof(s->pmc.iou.sd[i]),
 +                              TYPE_SYSBUS_SDHCI);
 +        dev = DEVICE(&s->pmc.iou.sd[i]);
 +
 +        object_property_set_uint(OBJECT(dev),
 +                                 3, "sd-spec-version", &error_fatal);
 +        object_property_set_uint(OBJECT(dev), SDHCI_CAPABILITIES, "capareg",
 +                                 &error_fatal);
 +        object_property_set_uint(OBJECT(dev), UHS_I, "uhs", &error_fatal);
 +        qdev_init_nofail(dev);
 +
 +        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
 +        memory_region_add_subregion(&s->mr_ps,
 +                                    MM_PMC_SD0 + i * MM_PMC_SD0_SIZE, mr);
 +
 +        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0,
 +                           pic[VERSAL_SD0_IRQ_0 + i * 2]);
 +    }
 +}
 +
  /* This takes the board allocated linear DDR memory and creates aliases
   * for each split DDR range/aperture on the Versal address map.
   */
@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
      versal_create_uarts(s, pic);
      versal_create_gems(s, pic);
      versal_create_admas(s, pic);
 +    versal_create_sds(s, pic);
      versal_map_ddr(s);
      versal_unimp(s);
 --
 .20.1
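
Editorial sketch (not part of either series): the slot table that patch 31/45 moves into include/sysemu/hvf_int.h is a fixed array of 32 entries, and the same header declares hvf_find_overlap_slot(). The fragment below is a minimal model of how such a table could be initialised and searched, assuming a slot with size 0 is unused. The names demo_slot, demo_state, demo_state_init and demo_find_overlap_slot are hypothetical, and the real QEMU lookup may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_SLOTS 32          /* mirrors slots[32] in the patch */

/* Hypothetical, trimmed-down stand-in for hvf_slot. */
typedef struct demo_slot {
    uint64_t start;
    uint64_t size;                 /* assumption: 0 means the slot is unused */
    int slot_id;
} demo_slot;

typedef struct demo_state {
    demo_slot slots[DEMO_NUM_SLOTS];
    int num_slots;
} demo_state;

/* Mark every slot as unused and give it a stable id. */
static void demo_state_init(demo_state *s)
{
    s->num_slots = DEMO_NUM_SLOTS;
    for (int x = 0; x < s->num_slots; ++x) {
        s->slots[x].start = 0;
        s->slots[x].size = 0;
        s->slots[x].slot_id = x;
    }
}

/* Return the first in-use slot intersecting [start, start + size), or NULL. */
static demo_slot *demo_find_overlap_slot(demo_state *s,
                                         uint64_t start, uint64_t size)
{
    for (int x = 0; x < s->num_slots; ++x) {
        demo_slot *slot = &s->slots[x];
        if (slot->size != 0 &&
            start < slot->start + slot->size &&
            slot->start < start + size) {
            return slot;
        }
    }
    return NULL;
}
```

Keeping the table in a shared header is what lets architecture-independent code walk it without including target/i386 headers.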

-[PULL 17/39] hw/arm: versal: Add support for the RTC
+[PULL 32/45] hvf: Make hvf_set_phys_mem() static
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Alexander Graf <agraf@csgraf.de>
-hw/arm: versal: Add support for the RTC.
+The hvf_set_phys_mem() function is only called within the same file.
 Make it static.
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Message-id: 20210519202253.76782-6-agraf@csgraf.de
-Reviewed-by: Luc Michel <luc.michel@greensocs.com>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20200427181649.26851-10-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/arm/xlnx-versal.h |  8 ++++++++
+ include/sysemu/hvf_int.h  | 1 -
- hw/arm/xlnx-versal.c         | 21 +++++++++++++++++++++
+ accel/hvf/hvf-accel-ops.c | 2 +-
-files changed, 29 insertions(+)
+files changed, 1 insertion(+), 2 deletions(-)
-diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
+diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/xlnx-versal.h
+--- a/include/sysemu/hvf_int.h
-+++ b/include/hw/arm/xlnx-versal.h
++++ b/include/sysemu/hvf_int.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ struct HVFState {
- #include "hw/char/pl011.h"
+ };
- #include "hw/dma/xlnx-zdma.h"
+ extern HVFState *hvf_state;
- #include "hw/net/cadence_gem.h"
-+#include "hw/rtc/xlnx-zynqmp-rtc.h"
+-void hvf_set_phys_mem(MemoryRegionSection *, bool);
+ void assert_hvf_ok(hv_return_t ret);
- #define TYPE_XLNX_VERSAL "xlnx-versal"
+ hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
- #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
+ int hvf_put_registers(CPUState *);
-@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
+diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
          struct {
              SDHCIState sd[XLNX_VERSAL_NR_SDS];
          } iou;
 +
 +        XlnxZynqMPRTC rtc;
      } pmc;
      struct {
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
  #define VERSAL_GEM1_IRQ_0          58
  #define VERSAL_GEM1_WAKE_IRQ_0     59
  #define VERSAL_ADMA_IRQ_0          60
 +#define VERSAL_RTC_APB_ERR_IRQ     121
  #define VERSAL_SD0_IRQ_0           126
 +#define VERSAL_RTC_ALARM_IRQ       142
 +#define VERSAL_RTC_SECONDS_IRQ     143
  /* Architecturally reserved IRQs suitable for virtualization.  */
  #define VERSAL_RSVD_IRQ_FIRST 111
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
  #define MM_PMC_SD0_SIZE             0x10000
  #define MM_PMC_CRP                  0xf1260000U
  #define MM_PMC_CRP_SIZE             0x10000
 +#define MM_PMC_RTC                  0xf12a0000
 +#define MM_PMC_RTC_SIZE             0x10000
  #endif
 diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-versal.c
+--- a/accel/hvf/hvf-accel-ops.c
-+++ b/hw/arm/xlnx-versal.c
++++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@ static void versal_create_sds(Versal *s, qemu_irq *pic)
+@@ -XXX,XX +XXX,XX @@ static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
-     }
+     return 0;
  }
-+static void versal_create_rtc(Versal *s, qemu_irq *pic)
+-void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
-+{
++static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
-+    SysBusDevice *sbd;
+ {
-+    MemoryRegion *mr;
+     hvf_slot *mem;
-+
+     MemoryRegion *area = section->mr;
 +    sysbus_init_child_obj(OBJECT(s), "rtc", &s->pmc.rtc, sizeof(s->pmc.rtc),
 +                          TYPE_XLNX_ZYNQMP_RTC);
 +    sbd = SYS_BUS_DEVICE(&s->pmc.rtc);
 +    qdev_init_nofail(DEVICE(sbd));
 +
 +    mr = sysbus_mmio_get_region(sbd, 0);
 +    memory_region_add_subregion(&s->mr_ps, MM_PMC_RTC, mr);
 +
 +    /*
 +     * TODO: Connect the ALARM and SECONDS interrupts once our RTC model
 +     * supports them.
 +     */
 +    sysbus_connect_irq(sbd, 1, pic[VERSAL_RTC_APB_ERR_IRQ]);
 +}
 +
  /* This takes the board allocated linear DDR memory and creates aliases
   * for each split DDR range/aperture on the Versal address map.
   */
@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
      versal_create_gems(s, pic);
      versal_create_admas(s, pic);
      versal_create_sds(s, pic);
 +    versal_create_rtc(s, pic);
      versal_map_ddr(s);
      versal_unimp(s);
 --
 .20.1

-[PULL 13/39] hw/arm: versal: Embed the GEMs into the SoC type
+[PULL 33/45] hvf: Remove use of hv_uvaddr_t and hv_gpaddr_t
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Alexander Graf <agraf@csgraf.de>
-Embed the GEMs into the SoC type.
+The ARM version of Hypervisor.framework no longer defines these two
 types, so let's just revert to standard ones.
-Suggested-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Message-id: 20210519202253.76782-7-agraf@csgraf.de
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-6-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/arm/xlnx-versal.h |  3 ++-
+ accel/hvf/hvf-accel-ops.c | 6 +++---
- hw/arm/xlnx-versal.c         | 15 ++++++++-------
+file changed, 3 insertions(+), 3 deletions(-)
 files changed, 10 insertions(+), 8 deletions(-)
-diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
+diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/xlnx-versal.h
+--- a/accel/hvf/hvf-accel-ops.c
-+++ b/include/hw/arm/xlnx-versal.h
++++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
- #include "hw/arm/boot.h"
+     macslot->present = 1;
- #include "hw/intc/arm_gicv3.h"
+     macslot->gpa_start = slot->start;
- #include "hw/char/pl011.h"
+     macslot->size = slot->size;
-+#include "hw/net/cadence_gem.h"
+-    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
++    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
- #define TYPE_XLNX_VERSAL "xlnx-versal"
+     assert_hvf_ok(ret);
- #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
+     return 0;
-@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
+ }
+@@ -XXX,XX +XXX,XX @@ static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
-         struct {
+     /* protect region against writes; begin tracking it */
-             PL011State uart[XLNX_VERSAL_NR_UARTS];
+     if (on) {
--            SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
+         slot->flags |= HVF_SLOT_LOG;
-+            CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
+-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
-             SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
++        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
-         } iou;
+                       HV_MEMORY_READ);
-     } lpd;
+     /* stop tracking region*/
-diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
+     } else {
-index XXXXXXX..XXXXXXX 100644
+         slot->flags &= ~HVF_SLOT_LOG;
---- a/hw/arm/xlnx-versal.c
+-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
-+++ b/hw/arm/xlnx-versal.c
++        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
-@@ -XXX,XX +XXX,XX @@ static void versal_create_gems(Versal *s, qemu_irq *pic)
+                       HV_MEMORY_READ | HV_MEMORY_WRITE);
          DeviceState *dev;
          MemoryRegion *mr;
 -        dev = qdev_create(NULL, "cadence_gem");
 -        s->lpd.iou.gem[i] = SYS_BUS_DEVICE(dev);
 -        object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
 +        sysbus_init_child_obj(OBJECT(s), name,
 +                              &s->lpd.iou.gem[i], sizeof(s->lpd.iou.gem[i]),
 +                              TYPE_CADENCE_GEM);
 +        dev = DEVICE(&s->lpd.iou.gem[i]);
          if (nd->used) {
              qemu_check_nic_model(nd, "cadence_gem");
              qdev_set_nic_properties(dev, nd);
          }
 -        object_property_set_int(OBJECT(s->lpd.iou.gem[i]),
 +        object_property_set_int(OBJECT(dev),
 , "num-priority-queues",
                                  &error_abort);
 -        object_property_set_link(OBJECT(s->lpd.iou.gem[i]),
 +        object_property_set_link(OBJECT(dev),
                                   OBJECT(&s->mr_ps), "dma",
                                   &error_abort);
          qdev_init_nofail(dev);
 -        mr = sysbus_mmio_get_region(s->lpd.iou.gem[i], 0);
 +        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
          memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
 -        sysbus_connect_irq(s->lpd.iou.gem[i], 0, pic[irqs[i]]);
 +        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);
          g_free(name);
      }
  }
 --
 .20.1
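
Editorial sketch (not part of either series): the hvf_set_dirty_tracking() hunk in patch 33/45 toggles HVF_SLOT_LOG on the slot and then calls hv_vm_protect() with read-only permissions while tracking is on, and read/write permissions when it is off. The fragment below models only that decision. The track_* names and the TRACK_MEM_* bit values are hypothetical placeholders, not the real HV_MEMORY_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TRACK_SLOT_LOG   (1u << 0)  /* same bit position as HVF_SLOT_LOG */

/* Hypothetical permission bits; the real HV_MEMORY_* values are not assumed. */
#define TRACK_MEM_READ   (1u << 0)
#define TRACK_MEM_WRITE  (1u << 1)

/* Hypothetical, trimmed-down stand-in for hvf_slot. */
typedef struct track_slot {
    uint64_t start;
    uint64_t size;
    uint32_t flags;
} track_slot;

/*
 * Update the slot's logging flag and return the permissions a caller
 * would hand to the hypervisor's protect call for this slot.
 */
static uint32_t track_set_dirty_tracking(track_slot *slot, bool on)
{
    if (on) {
        /* Write-protect the region so guest writes fault and can be logged. */
        slot->flags |= TRACK_SLOT_LOG;
        return TRACK_MEM_READ;
    }
    /* Stop tracking: restore write access. */
    slot->flags &= ~TRACK_SLOT_LOG;
    return TRACK_MEM_READ | TRACK_MEM_WRITE;
}
```

Returning the permission mask instead of calling the hypervisor directly keeps the sketch testable on hosts without Hypervisor.framework.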

-[PULL 18/39] hw/arm: versal-virt: Add support for SD
+[PULL 34/45] hvf: Split out common code on vcpu init and destroy
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Alexander Graf <agraf@csgraf.de>
-Add support for SD.
+Until now, Hypervisor.framework has only been available on x86_64 systems.
 With Apple Silicon shipping now, it extends its reach to aarch64. To
 prepare for support for multiple architectures, let's start moving common
 code out into its own accel directory.
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+This patch splits the vcpu init and destroy functions into a generic and
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+an architecture specific portion. This also allows us to move the generic
-Reviewed-by: Luc Michel <luc.michel@greensocs.com>
+functions into the generic hvf code, removing exported functions.
-Message-id: 20200427181649.26851-11-edgar.iglesias@gmail.com
 Signed-off-by: Alexander Graf <agraf@csgraf.de>
 Reviewed-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20210519202253.76782-8-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/xlnx-versal-virt.c | 46 +++++++++++++++++++++++++++++++++++++++
+ accel/hvf/hvf-accel-ops.h |  2 --
-file changed, 46 insertions(+)
+ include/sysemu/hvf_int.h  |  2 ++
  accel/hvf/hvf-accel-ops.c | 30 ++++++++++++++++++++++++++++++
  target/i386/hvf/hvf.c     | 23 ++---------------------
 files changed, 34 insertions(+), 23 deletions(-)
-diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
+diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-versal-virt.c
+--- a/accel/hvf/hvf-accel-ops.h
-+++ b/hw/arm/xlnx-versal-virt.c
++++ b/accel/hvf/hvf-accel-ops.h
 @@ -XXX,XX +XXX,XX @@
- #include "hw/arm/sysbus-fdt.h"
- #include "hw/arm/fdt.h"
+ #include "sysemu/cpus.h"
- #include "cpu.h"
-+#include "hw/qdev-properties.h"
+-int hvf_init_vcpu(CPUState *);
- #include "hw/arm/xlnx-versal.h"
+ int hvf_vcpu_exec(CPUState *);
+ void hvf_cpu_synchronize_state(CPUState *);
- #define TYPE_XLNX_VERSAL_VIRT_MACHINE MACHINE_TYPE_NAME("xlnx-versal-virt")
+ void hvf_cpu_synchronize_post_reset(CPUState *);
-@@ -XXX,XX +XXX,XX @@ static void fdt_add_zdma_nodes(VersalVirt *s)
+ void hvf_cpu_synchronize_post_init(CPUState *);
-     }
+ void hvf_cpu_synchronize_pre_loadvm(CPUState *);
- }
+-void hvf_vcpu_destroy(CPUState *);
-+static void fdt_add_sd_nodes(VersalVirt *s)
+ #endif /* HVF_CPUS_H */
 diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/sysemu/hvf_int.h
 +++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
  extern HVFState *hvf_state;
  void assert_hvf_ok(hv_return_t ret);
 +int hvf_arch_init_vcpu(CPUState *cpu);
 +void hvf_arch_vcpu_destroy(CPUState *cpu);
  hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
  int hvf_put_registers(CPUState *);
  int hvf_get_registers(CPUState *);
 diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/hvf/hvf-accel-ops.c
 +++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static void hvf_type_init(void)
  type_init(hvf_type_init);
 +static void hvf_vcpu_destroy(CPUState *cpu)
 +{
-+    const char clocknames[] = "clk_xin\0clk_ahb";
++    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
-+    const char compat[] = "arasan,sdhci-8.9a";
++    assert_hvf_ok(ret);
 +    int i;
 +
-+    for (i = ARRAY_SIZE(s->soc.pmc.iou.sd) - 1; i >= 0; i--) {
++    hvf_arch_vcpu_destroy(cpu);
 +        uint64_t addr = MM_PMC_SD0 + MM_PMC_SD0_SIZE * i;
 +        char *name = g_strdup_printf("/sdhci@%" PRIx64, addr);
 +
 +        qemu_fdt_add_subnode(s->fdt, name);
 +
 +        qemu_fdt_setprop_cells(s->fdt, name, "clocks",
 +                               s->phandle.clk_25Mhz, s->phandle.clk_25Mhz);
 +        qemu_fdt_setprop(s->fdt, name, "clock-names",
 +                         clocknames, sizeof(clocknames));
 +        qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
 +                               GIC_FDT_IRQ_TYPE_SPI, VERSAL_SD0_IRQ_0 + i * 2,
 +                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
 +        qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
 +                                     2, addr, 2, MM_PMC_SD0_SIZE);
 +        qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
 +        g_free(name);
 +    }
 +}
 +
- static void fdt_nop_memory_nodes(void *fdt, Error **errp)
++static int hvf_init_vcpu(CPUState *cpu)
  {
      Error *err = NULL;
@@ -XXX,XX +XXX,XX @@ static void create_virtio_regions(VersalVirt *s)
      }
  }
 +static void sd_plugin_card(SDHCIState *sd, DriveInfo *di)
 +{
-+    BlockBackend *blk = di ? blk_by_legacy_dinfo(di) : NULL;
++    int r;
 +    DeviceState *card;
 +
-+    card = qdev_create(qdev_get_child_bus(DEVICE(sd), "sd-bus"), TYPE_SD_CARD);
++    /* init cpu signals */
-+    object_property_add_child(OBJECT(sd), "card[*]", OBJECT(card),
++    sigset_t set;
-+                              &error_fatal);
++    struct sigaction sigact;
-+    qdev_prop_set_drive(card, "drive", blk, &error_fatal);
++
-+    object_property_set_bool(OBJECT(card), true, "realized", &error_fatal);
++    memset(&sigact, 0, sizeof(sigact));
 +    sigact.sa_handler = dummy_signal;
 +    sigaction(SIG_IPI, &sigact, NULL);
 +
 +    pthread_sigmask(SIG_BLOCK, NULL, &set);
 +    sigdelset(&set, SIG_IPI);
 +
 +    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
 +    cpu->vcpu_dirty = 1;
 +    assert_hvf_ok(r);
 +
 +    return hvf_arch_init_vcpu(cpu);
 +}
 +
- static void versal_virt_init(MachineState *machine)
+ /*
   * The HVF-specific vCPU thread function. This one should only run when the host
   * CPU supports the VMX "unrestricted guest" feature.
 diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/hvf.c
 +++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
      return false;
  }
 -void hvf_vcpu_destroy(CPUState *cpu)
 +void hvf_arch_vcpu_destroy(CPUState *cpu)
  {
-     VersalVirt *s = XLNX_VERSAL_VIRT_MACHINE(machine);
+     X86CPU *x86_cpu = X86_CPU(cpu);
-     int psci_conduit = QEMU_PSCI_CONDUIT_DISABLED;
+     CPUX86State *env = &x86_cpu->env;
-+    int i;
+-    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
-     /*
+     g_free(env->hvf_mmio_buf);
-      * If the user provides an Operating System to be loaded, we expect them
+-    assert_hvf_ok(ret);
-@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
+ }
-     fdt_add_gic_nodes(s);
-     fdt_add_timer_nodes(s);
+ static void init_tsc_freq(CPUX86State *env)
-     fdt_add_zdma_nodes(s);
+@@ -XXX,XX +XXX,XX @@ static inline bool apic_bus_freq_is_known(CPUX86State *env)
-+    fdt_add_sd_nodes(s);
+     return env->apic_bus_freq != 0;
-     fdt_add_cpu_nodes(s, psci_conduit);
+ }
-     fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
-     fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
+-int hvf_init_vcpu(CPUState *cpu)
-@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
++int hvf_arch_init_vcpu(CPUState *cpu)
-     memory_region_add_subregion_overlap(get_system_memory(),
+ {
-, &s->soc.fpd.apu.mr, 0);
+-
+     X86CPU *x86cpu = X86_CPU(cpu);
-+    /* Plugin SD cards.  */
+     CPUX86State *env = &x86cpu->env;
-+    for (i = 0; i < ARRAY_SIZE(s->soc.pmc.iou.sd); i++) {
+-    int r;
-+        sd_plugin_card(&s->soc.pmc.iou.sd[i], drive_get_next(IF_SD));
+-
-+    }
+-    /* init cpu signals */
-+
+-    sigset_t set;
-     s->binfo.ram_size = machine->ram_size;
+-    struct sigaction sigact;
-     s->binfo.loader_start = 0x0;
+-
-     s->binfo.get_dtb = versal_virt_get_dtb;
+-    memset(&sigact, 0, sizeof(sigact));
 -    sigact.sa_handler = dummy_signal;
 -    sigaction(SIG_IPI, &sigact, NULL);
 -
 -    pthread_sigmask(SIG_BLOCK, NULL, &set);
 -    sigdelset(&set, SIG_IPI);
      init_emu();
      init_decoder();
@@ -XXX,XX +XXX,XX @@ int hvf_init_vcpu(CPUState *cpu)
          }
      }
 -    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
 -    cpu->vcpu_dirty = 1;
 -    assert_hvf_ok(r);
 -
      if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
          &hvf_state->hvf_caps->vmx_cap_pinbased)) {
          abort();
 --
 .20.1
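
Editorial sketch (not part of either series): patch 34/45 keeps the generic vCPU setup and teardown in accel/hvf and delegates the rest to hvf_arch_init_vcpu() and hvf_arch_vcpu_destroy(). The fragment below shows the shape of that split with a hypothetical DemoCPU type and demo_* functions. The hypervisor calls are replaced by plain field updates, so this is only a model of the control flow.

```c
#include <assert.h>

/* Hypothetical stand-in for CPUState; only what the sketch needs. */
typedef struct DemoCPU {
    int backend_created;   /* set by the generic layer */
    int arch_ready;        /* set by the architecture hook */
    int dirty;             /* mirrors cpu->vcpu_dirty */
} DemoCPU;

/* Architecture hooks: a real tree would have one definition per target. */
static int demo_arch_init_vcpu(DemoCPU *cpu)
{
    cpu->arch_ready = 1;
    return 0;
}

static void demo_arch_vcpu_destroy(DemoCPU *cpu)
{
    cpu->arch_ready = 0;
}

/* Generic init: create the vCPU, mark it dirty, then run the arch hook. */
static int demo_init_vcpu(DemoCPU *cpu)
{
    cpu->backend_created = 1;   /* stands in for hv_vcpu_create() */
    cpu->dirty = 1;
    return demo_arch_init_vcpu(cpu);
}

/* Generic destroy: tear down the vCPU first, then the arch state. */
static void demo_vcpu_destroy(DemoCPU *cpu)
{
    cpu->backend_created = 0;   /* stands in for hv_vcpu_destroy() */
    demo_arch_vcpu_destroy(cpu);
}
```

With this layout only the two arch hooks need external linkage, which matches the patch making hvf_init_vcpu() and hvf_vcpu_destroy() static.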

-[PULL 15/39] hw/arm: versal: Embed the APUs into the SoC type
+[PULL 35/45] hvf: Use cpu_synchronize_state()
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Alexander Graf <agraf@csgraf.de>
-Embed the APUs into the SoC type.
+There is no reason to call the hvf specific hvf_cpu_synchronize_state()
 when we can just use the generic cpu_synchronize_state() instead. This
 allows us to have less dependency on internal function definitions and
 allows us to make hvf_cpu_synchronize_state() static.
-Suggested-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Message-id: 20210519202253.76782-9-agraf@csgraf.de
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-8-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/hw/arm/xlnx-versal.h |  2 +-
+ accel/hvf/hvf-accel-ops.h | 1 -
- hw/arm/xlnx-versal-virt.c    |  4 ++--
+ accel/hvf/hvf-accel-ops.c | 2 +-
- hw/arm/xlnx-versal.c         | 19 +++++--------------
+ target/i386/hvf/x86hvf.c  | 9 ++++-----
-files changed, 8 insertions(+), 17 deletions(-)
+files changed, 5 insertions(+), 7 deletions(-)
-diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
+diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/xlnx-versal.h
+--- a/accel/hvf/hvf-accel-ops.h
-+++ b/include/hw/arm/xlnx-versal.h
++++ b/accel/hvf/hvf-accel-ops.h
-@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
+@@ -XXX,XX +XXX,XX @@
-     struct {
+ #include "sysemu/cpus.h"
-         struct {
-             MemoryRegion mr;
+ int hvf_vcpu_exec(CPUState *);
--            ARMCPU *cpu[XLNX_VERSAL_NR_ACPUS];
+-void hvf_cpu_synchronize_state(CPUState *);
-+            ARMCPU cpu[XLNX_VERSAL_NR_ACPUS];
+ void hvf_cpu_synchronize_post_reset(CPUState *);
-             GICv3State gic;
+ void hvf_cpu_synchronize_post_init(CPUState *);
-         } apu;
+ void hvf_cpu_synchronize_pre_loadvm(CPUState *);
-     } fpd;
+diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-versal-virt.c
+--- a/accel/hvf/hvf-accel-ops.c
-+++ b/hw/arm/xlnx-versal-virt.c
++++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
      s->binfo.get_dtb = versal_virt_get_dtb;
      s->binfo.modify_dtb = versal_virt_modify_dtb;
      if (machine->kernel_filename) {
 -        arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
 +        arm_load_kernel(&s->soc.fpd.apu.cpu[0], machine, &s->binfo);
      } else {
 -        AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
 +        AddressSpace *as = arm_boot_address_space(&s->soc.fpd.apu.cpu[0],
                                                    &s->binfo);
          /* Some boot-loaders (e.g u-boot) don't like blobs at address 0 (NULL).
           * Offset things by 4K.  */
 diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/xlnx-versal.c
 +++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
      for (i = 0; i < ARRAY_SIZE(s->fpd.apu.cpu); i++) {
          Object *obj;
 -        char *name;
 -
 -        obj = object_new(XLNX_VERSAL_ACPU_TYPE);
 -        if (!obj) {
 -            error_report("Unable to create apu.cpu[%d] of type %s",
 -                         i, XLNX_VERSAL_ACPU_TYPE);
 -            exit(EXIT_FAILURE);
 -        }
 -
 -        name = g_strdup_printf("apu-cpu[%d]", i);
 -        object_property_add_child(OBJECT(s), name, obj, &error_fatal);
 -        g_free(name);
 +        object_initialize_child(OBJECT(s), "apu-cpu[*]",
 +                                &s->fpd.apu.cpu[i], sizeof(s->fpd.apu.cpu[i]),
 +                                XLNX_VERSAL_ACPU_TYPE, &error_abort, NULL);
 +        obj = OBJECT(&s->fpd.apu.cpu[i]);
          object_property_set_int(obj, s->cfg.psci_conduit,
                                  "psci-conduit", &error_abort);
          if (i) {
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
          object_property_set_link(obj, OBJECT(&s->fpd.apu.mr), "memory",
                                   &error_abort);
          object_property_set_bool(obj, true, "realized", &error_fatal);
 -        s->fpd.apu.cpu[i] = ARM_CPU(obj);
      }
  }
-@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_gic(Versal *s, qemu_irq *pic)
+-void hvf_cpu_synchronize_state(CPUState *cpu)
 +static void hvf_cpu_synchronize_state(CPUState *cpu)
  {
      if (!cpu->vcpu_dirty) {
          run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
 diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/x86hvf.c
 +++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@
  #include "cpu.h"
  #include "x86_descr.h"
  #include "x86_decode.h"
 +#include "sysemu/hw_accel.h"
  #include "hw/i386/apic_internal.h"
  #include <Hypervisor/hv.h>
  #include <Hypervisor/hv_vmx.h>
 -#include "accel/hvf/hvf-accel-ops.h"
 -
  void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
                       SegmentCache *qseg, bool is_tr)
  {
@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
      env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
      if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
 -        hvf_cpu_synchronize_state(cpu_state);
 +        cpu_synchronize_state(cpu_state);
          do_cpu_init(cpu);
      }
-     for (i = 0; i < nr_apu_cpus; i++) {
+@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
--        DeviceState *cpudev = DEVICE(s->fpd.apu.cpu[i]);
+         cpu_state->halted = 0;
-+        DeviceState *cpudev = DEVICE(&s->fpd.apu.cpu[i]);
+     }
-         int ppibase = XLNX_VERSAL_NR_IRQS + i * GIC_INTERNAL + GIC_NR_SGIS;
+     if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
-         qemu_irq maint_irq;
+-        hvf_cpu_synchronize_state(cpu_state);
-         int ti;
++        cpu_synchronize_state(cpu_state);
          do_cpu_sipi(cpu);
      }
      if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
          cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
 -        hvf_cpu_synchronize_state(cpu_state);
 +        cpu_synchronize_state(cpu_state);
          apic_handle_tpr_access_report(cpu->apic_state, env->eip,
                                        env->tpr_access_type);
      }
 --
 .20.1

-[PULL 12/39] hw/arm: versal: Embed the UARTs into the SoC type
+[PULL 36/45] hvf: Make synchronize functions static
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Alexander Graf <agraf@csgraf.de>
-Embed the UARTs into the SoC type.
+The hvf accel synchronize functions are only used as input for local
 callback functions, so we can make them static.
-Suggested-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Message-id: 20210519202253.76782-10-agraf@csgraf.de
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-5-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
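Note: the commit message above says the synchronize functions are only used as callbacks, which is why they can be static. A minimal sketch of that pattern follows. It is not the QEMU code: `DemoCPU`, `DemoAccelOps`, `demo_get_ops()` and `demo_run_post_reset()` are hypothetical stand-ins for the real CPU state and accelerator ops table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU state. */
typedef struct DemoCPU {
    int vcpu_dirty;
} DemoCPU;

/* Hypothetical ops table: callers only ever see these function pointers. */
typedef struct DemoAccelOps {
    void (*synchronize_post_reset)(DemoCPU *cpu);
    void (*synchronize_pre_loadvm)(DemoCPU *cpu);
} DemoAccelOps;

/* File-local implementations: no prototype in any header is needed. */
static void demo_synchronize_post_reset(DemoCPU *cpu)
{
    cpu->vcpu_dirty = 0;
}

static void demo_synchronize_pre_loadvm(DemoCPU *cpu)
{
    cpu->vcpu_dirty = 1;
}

/* The table is the only thing that refers to the static functions. */
static const DemoAccelOps demo_ops = {
    .synchronize_post_reset = demo_synchronize_post_reset,
    .synchronize_pre_loadvm = demo_synchronize_pre_loadvm,
};

/* Single exported entry point; other files reach the callbacks through it. */
const DemoAccelOps *demo_get_ops(void)
{
    return &demo_ops;
}

/* Generic caller that dispatches through the table. */
void demo_run_post_reset(const DemoAccelOps *ops, DemoCPU *cpu)
{
    assert(ops != NULL && ops->synchronize_post_reset != NULL);
    ops->synchronize_post_reset(cpu);
}
```

Because every caller goes through the table, the header prototypes can be removed without touching other translation units.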
- include/hw/arm/xlnx-versal.h |  3 ++-
+ accel/hvf/hvf-accel-ops.h | 3 ---
- hw/arm/xlnx-versal.c         | 12 ++++++------
+ accel/hvf/hvf-accel-ops.c | 6 +++---
-files changed, 8 insertions(+), 7 deletions(-)
+files changed, 3 insertions(+), 6 deletions(-)
-diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
+diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/xlnx-versal.h
+--- a/accel/hvf/hvf-accel-ops.h
-+++ b/include/hw/arm/xlnx-versal.h
++++ b/accel/hvf/hvf-accel-ops.h
 @@ -XXX,XX +XXX,XX @@
- #include "hw/sysbus.h"
+ #include "sysemu/cpus.h"
- #include "hw/arm/boot.h"
- #include "hw/intc/arm_gicv3.h"
+ int hvf_vcpu_exec(CPUState *);
-+#include "hw/char/pl011.h"
+-void hvf_cpu_synchronize_post_reset(CPUState *);
+-void hvf_cpu_synchronize_post_init(CPUState *);
- #define TYPE_XLNX_VERSAL "xlnx-versal"
+-void hvf_cpu_synchronize_pre_loadvm(CPUState *);
- #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
-@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
+ #endif /* HVF_CPUS_H */
-         MemoryRegion mr_ocm;
+diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
          struct {
 -            SysBusDevice *uart[XLNX_VERSAL_NR_UARTS];
 +            PL011State uart[XLNX_VERSAL_NR_UARTS];
              SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
              SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
          } iou;
 diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-versal.c
+--- a/accel/hvf/hvf-accel-ops.c
-+++ b/hw/arm/xlnx-versal.c
++++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
- #include "kvm_arm.h"
+     cpu->vcpu_dirty = false;
- #include "hw/misc/unimp.h"
+ }
- #include "hw/arm/xlnx-versal.h"
--#include "hw/char/pl011.h"
+-void hvf_cpu_synchronize_post_reset(CPUState *cpu)
++static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
- #define XLNX_VERSAL_ACPU_TYPE ARM_CPU_TYPE_NAME("cortex-a72")
+ {
- #define GEM_REVISION        0x40070106
+     run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
-@@ -XXX,XX +XXX,XX @@ static void versal_create_uarts(Versal *s, qemu_irq *pic)
+ }
-         DeviceState *dev;
+@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
-         MemoryRegion *mr;
+     cpu->vcpu_dirty = false;
+ }
--        dev = qdev_create(NULL, TYPE_PL011);
--        s->lpd.iou.uart[i] = SYS_BUS_DEVICE(dev);
+-void hvf_cpu_synchronize_post_init(CPUState *cpu)
-+        sysbus_init_child_obj(OBJECT(s), name,
++static void hvf_cpu_synchronize_post_init(CPUState *cpu)
-+                              &s->lpd.iou.uart[i], sizeof(s->lpd.iou.uart[i]),
+ {
-+                              TYPE_PL011);
+     run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
-+        dev = DEVICE(&s->lpd.iou.uart[i]);
+ }
-         qdev_prop_set_chr(dev, "chardev", serial_hd(i));
+@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
--        object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
+     cpu->vcpu_dirty = true;
-         qdev_init_nofail(dev);
+ }
--        mr = sysbus_mmio_get_region(s->lpd.iou.uart[i], 0);
+-void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
-+        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
++static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
-         memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
+ {
+     run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
 -        sysbus_connect_irq(s->lpd.iou.uart[i], 0, pic[irqs[i]]);
 +        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);
          g_free(name);
      }
  }
 --
 .20.1

-[PULL 01/39] target/arm: Make VQDMULL undefined when U=1
+[PULL 37/45] hvf: Remove hvf-accel-ops.h
-From: Fredrik Strupe <fredrik@strupe.net>
+From: Alexander Graf <agraf@csgraf.de>
-According to Arm ARM, VQDMULL is only valid when U=0, while having
+We can move the definition of hvf_vcpu_exec() into our internal
-U=1 is unallocated.
+hvf header, obsoleting the need for hvf-accel-ops.h.
-Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Fixes: 695272dcb976 ("target-arm: Handle UNDEF cases for Neon 3-regs-different-widths")
+Reviewed-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20210519202253.76782-11-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c | 2 +-
+ accel/hvf/hvf-accel-ops.h | 17 -----------------
-file changed, 1 insertion(+), 1 deletion(-)
+ include/sysemu/hvf_int.h  |  1 +
  accel/hvf/hvf-accel-ops.c |  2 --
  target/i386/hvf/hvf.c     |  2 --
 files changed, 1 insertion(+), 21 deletions(-)
  delete mode 100644 accel/hvf/hvf-accel-ops.h
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
 deleted file mode 100644
 index XXXXXXX..XXXXXXX
 --- a/accel/hvf/hvf-accel-ops.h
 +++ /dev/null
@@ -XXX,XX +XXX,XX @@
 -/*
 - * Accelerator CPUS Interface
 - *
 - * Copyright 2020 SUSE LLC
 - *
 - * This work is licensed under the terms of the GNU GPL, version 2 or later.
 - * See the COPYING file in the top-level directory.
 - */
 -
 -#ifndef HVF_CPUS_H
 -#define HVF_CPUS_H
 -
 -#include "sysemu/cpus.h"
 -
 -int hvf_vcpu_exec(CPUState *);
 -
 -#endif /* HVF_CPUS_H */
 diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/include/sysemu/hvf_int.h
-+++ b/target/arm/translate.c
++++ b/include/sysemu/hvf_int.h
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ extern HVFState *hvf_state;
-                     {0, 0, 0, 0}, /* VMLSL */
+ void assert_hvf_ok(hv_return_t ret);
-                     {0, 0, 0, 9}, /* VQDMLSL */
+ int hvf_arch_init_vcpu(CPUState *cpu);
-                     {0, 0, 0, 0}, /* Integer VMULL */
+ void hvf_arch_vcpu_destroy(CPUState *cpu);
--                    {0, 0, 0, 1}, /* VQDMULL */
++int hvf_vcpu_exec(CPUState *);
-+                    {0, 0, 0, 9}, /* VQDMULL */
+ hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
-                     {0, 0, 0, 0xa}, /* Polynomial VMULL */
+ int hvf_put_registers(CPUState *);
-                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
+ int hvf_get_registers(CPUState *);
-                 };
+diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/hvf/hvf-accel-ops.c
 +++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@
  #include "sysemu/runstate.h"
  #include "qemu/guest-random.h"
 -#include "hvf-accel-ops.h"
 -
  HVFState *hvf_state;
  /* Memory slots */
 diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/hvf.c
 +++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/accel.h"
  #include "target/i386/cpu.h"
 -#include "hvf-accel-ops.h"
 -
  void vmx_update_tpr(CPUState *cpu)
  {
      /* TODO: need integrate APIC handling */
 --
 .20.1

-[PULL 26/39] target/arm: Convert VFM[AS]L (vector) to decodetree
+[PULL 38/45] hvf: Introduce hvf vcpu struct
-Convert the VFM[AS]L (vector) insns to decodetree.  This is the last
+From: Alexander Graf <agraf@csgraf.de>
 insn in the legacy decoder for the 3same_ext group, so we can
 delete the legacy decoder function for the group entirely.
-Note that in disas_thumb2_insn() the parts of this encoding space
+We will need more than a single field for hvf going forward. To keep
-where the decodetree decoder returns false will correctly be directed
+the global vcpu struct uncluttered, let's allocate a special hvf vcpu
-to illegal_op by the "(insn & (1 << 28))" check so they won't fall
+struct, similar to how hax does it.
 into disas_coproc_insn() by mistake.
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
+Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
+Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
+Message-id: 20210519202253.76782-12-agraf@csgraf.de
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200430181003.21682-8-peter.maydell@linaro.org
 ---
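Note: the commit message above describes replacing the single `hvf_fd` field with a separately allocated per-accelerator struct. A minimal sketch of that lifecycle follows, under stated assumptions: the `Demo`/`demo_` names are hypothetical, and plain `calloc()`/`free()` are used here in place of the GLib `g_malloc0()`/`g_free()` calls that appear in the patch, so that the example builds on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical accelerator-private state; more fields can be added later
 * without growing the generic CPU struct. */
struct demo_hvf_vcpu_state {
    int fd;
};

/* Hypothetical generic CPU struct: it only holds an opaque pointer. */
struct DemoCPUState {
    struct demo_hvf_vcpu_state *hvf;
};

/* Allocate the private state zero-initialised, then record the handle. */
static int demo_init_vcpu(struct DemoCPUState *cpu, int fd)
{
    cpu->hvf = calloc(1, sizeof(*cpu->hvf));
    if (cpu->hvf == NULL) {
        return -1;
    }
    cpu->hvf->fd = fd;
    return 0;
}

/* Release the private state and clear the pointer so that a stale
 * access fails visibly instead of touching freed memory. */
static void demo_vcpu_destroy(struct DemoCPUState *cpu)
{
    assert(cpu->hvf != NULL);
    free(cpu->hvf);
    cpu->hvf = NULL;
}
```

Call sites then change from `cpu->hvf_fd` to `cpu->hvf->fd`, which accounts for most of the churn in the diffstat below.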
- target/arm/neon-shared.decode   |  6 +++
+ include/hw/core/cpu.h       |   3 +-
- target/arm/translate-neon.inc.c | 31 +++++++++++
+ include/sysemu/hvf_int.h    |   4 +
- target/arm/translate.c          | 92 +--------------------------------
+ target/i386/hvf/vmx.h       |  24 +++--
-files changed, 38 insertions(+), 91 deletions(-)
+ accel/hvf/hvf-accel-ops.c   |   8 +-
  target/i386/hvf/hvf.c       | 104 +++++++++---------
  target/i386/hvf/x86.c       |  28 ++---
  target/i386/hvf/x86_descr.c |  26 ++---
  target/i386/hvf/x86_emu.c   |  62 +++++------
  target/i386/hvf/x86_mmu.c   |   4 +-
  target/i386/hvf/x86_task.c  |  12 +--
  target/i386/hvf/x86hvf.c    | 210 ++++++++++++++++++------------------
 files changed, 248 insertions(+), 237 deletions(-)
-diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-shared.decode
+--- a/include/hw/core/cpu.h
-+++ b/target/arm/neon-shared.decode
++++ b/include/hw/core/cpu.h
-@@ -XXX,XX +XXX,XX @@ VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
+@@ -XXX,XX +XXX,XX @@ struct KVMState;
- # VUDOT and VSDOT
+ struct kvm_run;
- VDOT           1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
-                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ struct hax_vcpu_state;
 +struct hvf_vcpu_state;
  #define TB_JMP_CACHE_BITS 12
  #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
@@ -XXX,XX +XXX,XX @@ struct CPUState {
      struct hax_vcpu_state *hax_vcpu;
 -    int hvf_fd;
 +    struct hvf_vcpu_state *hvf;
      /* track IOMMUs whose translations we've cached in the TCG TLB */
      GArray *iommu_notifiers;
 diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/sysemu/hvf_int.h
 +++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
  };
  extern HVFState *hvf_state;
 +struct hvf_vcpu_state {
 +    int fd;
 +};
 +
-+# VFM[AS]L
+ void assert_hvf_ok(hv_return_t ret);
-+VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
+ int hvf_arch_init_vcpu(CPUState *cpu);
-+               vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
+ void hvf_arch_vcpu_destroy(CPUState *cpu);
-+VFML           1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
+diff --git a/target/i386/hvf/vmx.h b/target/i386/hvf/vmx.h
 +               vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/i386/hvf/vmx.h
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/i386/hvf/vmx.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
+@@ -XXX,XX +XXX,XX @@
-                        opr_sz, opr_sz, 0, fn_gvec);
+ #include "vmcs.h"
-     return true;
+ #include "cpu.h"
- }
+ #include "x86.h"
 +#include "sysemu/hvf.h"
 +#include "sysemu/hvf_int.h"
  #include "exec/address-spaces.h"
@@ -XXX,XX +XXX,XX @@ static inline void macvm_set_rip(CPUState *cpu, uint64_t rip)
      uint64_t val;
      /* BUG, should take considering overlap.. */
 -    wreg(cpu->hvf_fd, HV_X86_RIP, rip);
 +    wreg(cpu->hvf->fd, HV_X86_RIP, rip);
      env->eip = rip;
      /* after moving forward in rip, we need to clean INTERRUPTABILITY */
 -   val = rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
 +   val = rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
     if (val & (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
                 VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
          env->hflags &= ~HF_INHIBIT_IRQ_MASK;
 -        wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY,
 +        wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY,
                 val & ~(VMCS_INTERRUPTIBILITY_STI_BLOCKING |
                 VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING));
     }
@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_blocking(CPUState *cpu)
      CPUX86State *env = &x86_cpu->env;
      env->hflags2 &= ~HF2_NMI_MASK;
 -    uint32_t gi = (uint32_t) rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
 +    uint32_t gi = (uint32_t) rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
      gi &= ~VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
 -    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
 +    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
  }
  static inline void vmx_set_nmi_blocking(CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ static inline void vmx_set_nmi_blocking(CPUState *cpu)
      CPUX86State *env = &x86_cpu->env;
      env->hflags2 |= HF2_NMI_MASK;
 -    uint32_t gi = (uint32_t)rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
 +    uint32_t gi = (uint32_t)rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
      gi |= VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
 -    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
 +    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
  }
  static inline void vmx_set_nmi_window_exiting(CPUState *cpu)
  {
      uint64_t val;
 -    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
 -    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
 +    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
 +    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
            VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
  }
@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_window_exiting(CPUState *cpu)
  {
      uint64_t val;
 -    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
 -    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
 +    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
 +    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
            ~VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
  }
 diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/hvf/hvf-accel-ops.c
 +++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ type_init(hvf_type_init);
  static void hvf_vcpu_destroy(CPUState *cpu)
  {
 -    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
 +    hv_return_t ret = hv_vcpu_destroy(cpu->hvf->fd);
      assert_hvf_ok(ret);
      hvf_arch_vcpu_destroy(cpu);
 +    g_free(cpu->hvf);
 +    cpu->hvf = NULL;
  }
  static int hvf_init_vcpu(CPUState *cpu)
  {
      int r;
 +    cpu->hvf = g_malloc0(sizeof(*cpu->hvf));
 +
-+static bool trans_VFML(DisasContext *s, arg_VFML *a)
+     /* init cpu signals */
-+{
+     sigset_t set;
-+    int opr_sz;
+     struct sigaction sigact;
-+
+@@ -XXX,XX +XXX,XX @@ static int hvf_init_vcpu(CPUState *cpu)
-+    if (!dc_isar_feature(aa32_fhm, s)) {
+     pthread_sigmask(SIG_BLOCK, NULL, &set);
-+        return false;
+     sigdelset(&set, SIG_IPI);
-+    }
-+
+-    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf->fd, HV_VCPU_DEFAULT);
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+     cpu->vcpu_dirty = 1;
-+        (a->vd & 0x10)) {
+     assert_hvf_ok(r);
-+        return false;
-+    }
+diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
 +
 +    if (a->vd & a->q) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    opr_sz = (1 + a->q) * 8;
 +    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
 +                       vfp_reg_offset(a->q, a->vn),
 +                       vfp_reg_offset(a->q, a->vm),
 +                       cpu_env, opr_sz, opr_sz, a->s, /* is_2 == 0 */
 +                       gen_helper_gvec_fmlal_a32);
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/i386/hvf/hvf.c
-+++ b/target/arm/translate.c
++++ b/target/i386/hvf/hvf.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
      int tpr = cpu_get_apic_tpr(x86_cpu->apic_state) << 4;
      int irr = apic_get_highest_priority_irr(x86_cpu->apic_state);
 -    wreg(cpu->hvf_fd, HV_X86_TPR, tpr);
 +    wreg(cpu->hvf->fd, HV_X86_TPR, tpr);
      if (irr == -1) {
 -        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
 +        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
      } else {
 -        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
 +        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
                irr >> 4);
      }
  }
@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
  static void update_apic_tpr(CPUState *cpu)
  {
      X86CPU *x86_cpu = X86_CPU(cpu);
 -    int tpr = rreg(cpu->hvf_fd, HV_X86_TPR) >> 4;
 +    int tpr = rreg(cpu->hvf->fd, HV_X86_TPR) >> 4;
      cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
  }
@@ -XXX,XX +XXX,XX @@ int hvf_arch_init_vcpu(CPUState *cpu)
      }
      /* set VMCS control fields */
 -    wvmcs(cpu->hvf_fd, VMCS_PIN_BASED_CTLS,
 +    wvmcs(cpu->hvf->fd, VMCS_PIN_BASED_CTLS,
            cap2ctrl(hvf_state->hvf_caps->vmx_cap_pinbased,
            VMCS_PIN_BASED_CTLS_EXTINT |
            VMCS_PIN_BASED_CTLS_NMI |
            VMCS_PIN_BASED_CTLS_VNMI));
 -    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS,
 +    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS,
            cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased,
            VMCS_PRI_PROC_BASED_CTLS_HLT |
            VMCS_PRI_PROC_BASED_CTLS_MWAIT |
            VMCS_PRI_PROC_BASED_CTLS_TSC_OFFSET |
            VMCS_PRI_PROC_BASED_CTLS_TPR_SHADOW) |
            VMCS_PRI_PROC_BASED_CTLS_SEC_CONTROL);
 -    wvmcs(cpu->hvf_fd, VMCS_SEC_PROC_BASED_CTLS,
 +    wvmcs(cpu->hvf->fd, VMCS_SEC_PROC_BASED_CTLS,
            cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased2,
                     VMCS_PRI_PROC_BASED2_CTLS_APIC_ACCESSES));
 -    wvmcs(cpu->hvf_fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
 +    wvmcs(cpu->hvf->fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
 ));
 -    wvmcs(cpu->hvf_fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
 +    wvmcs(cpu->hvf->fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
 -    wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
 +    wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
      x86cpu = X86_CPU(cpu);
      x86cpu->env.xsave_buf = qemu_memalign(4096, 4096);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_STAR, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_LSTAR, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_CSTAR, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FMASK, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FSBASE, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_GSBASE, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_KERNELGSBASE, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_TSC_AUX, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_TSC, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_CS, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_EIP, 1);
 -    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_ESP, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_STAR, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_LSTAR, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_CSTAR, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FMASK, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FSBASE, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_GSBASE, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_KERNELGSBASE, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_TSC_AUX, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_TSC, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_CS, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_EIP, 1);
 +    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_ESP, 1);
      return 0;
  }
+@@ -XXX,XX +XXX,XX @@ static void hvf_store_events(CPUState *cpu, uint32_t ins_len, uint64_t idtvec_in
--/* Advanced SIMD three registers of the same length extension.
+         }
-- *  31           25    23  22    20   16   12  11   10   9    8        3     0
+         if (idtvec_info & VMCS_IDT_VEC_ERRCODE_VALID) {
-- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
+             env->has_error_code = true;
-- * | 1 1 1 1 1 1 0 | op1 | D | op2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
+-            env->error_code = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_ERROR);
-- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
++            env->error_code = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_ERROR);
-- */
+         }
--static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
+     }
--{
+-    if ((rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
--    gen_helper_gvec_3 *fn_gvec = NULL;
++    if ((rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
--    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
+         VMCS_INTERRUPTIBILITY_NMI_BLOCKING)) {
--    int rd, rn, rm, opr_sz;
+         env->hflags2 |= HF2_NMI_MASK;
--    int data = 0;
+     } else {
--    int off_rn, off_rm;
+         env->hflags2 &= ~HF2_NMI_MASK;
--    bool is_long = false, q = extract32(insn, 6, 1);
+     }
--    bool ptr_is_env = false;
+-    if (rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
--
++    if (rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
--    if ((insn & 0xff300f10) == 0xfc200810) {
+          (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
--        /* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
+          VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
--        int is_s = extract32(insn, 23, 1);
+         env->hflags |= HF_INHIBIT_IRQ_MASK;
--        if (!dc_isar_feature(aa32_fhm, s)) {
+@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
--            return 1;
+             return EXCP_HLT;
--        }
+         }
--        is_long = true;
--        data = is_s; /* is_2 == 0 */
+-        hv_return_t r  = hv_vcpu_run(cpu->hvf_fd);
--        fn_gvec_ptr = gen_helper_gvec_fmlal_a32;
++        hv_return_t r  = hv_vcpu_run(cpu->hvf->fd);
--        ptr_is_env = true;
+         assert_hvf_ok(r);
--    } else {
--        return 1;
+         /* handle VMEXIT */
--    }
+-        uint64_t exit_reason = rvmcs(cpu->hvf_fd, VMCS_EXIT_REASON);
--
+-        uint64_t exit_qual = rvmcs(cpu->hvf_fd, VMCS_EXIT_QUALIFICATION);
--    VFP_DREG_D(rd, insn);
+-        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf_fd,
--    if (rd & q) {
++        uint64_t exit_reason = rvmcs(cpu->hvf->fd, VMCS_EXIT_REASON);
--        return 1;
++        uint64_t exit_qual = rvmcs(cpu->hvf->fd, VMCS_EXIT_QUALIFICATION);
--    }
++        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf->fd,
--    if (q || !is_long) {
+                                            VMCS_EXIT_INSTRUCTION_LENGTH);
--        VFP_DREG_N(rn, insn);
--        VFP_DREG_M(rm, insn);
+-        uint64_t idtvec_info = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
--        if ((rn | rm) & q & !is_long) {
++        uint64_t idtvec_info = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
--            return 1;
--        }
+         hvf_store_events(cpu, ins_len, idtvec_info);
--        off_rn = vfp_reg_offset(1, rn);
+-        rip = rreg(cpu->hvf_fd, HV_X86_RIP);
--        off_rm = vfp_reg_offset(1, rm);
+-        env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
--    } else {
++        rip = rreg(cpu->hvf->fd, HV_X86_RIP);
--        rn = VFP_SREG_N(insn);
++        env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
--        rm = VFP_SREG_M(insn);
--        off_rn = vfp_reg_offset(0, rn);
+         qemu_mutex_lock_iothread();
--        off_rm = vfp_reg_offset(0, rm);
--    }
+@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
--
+         case EXIT_REASON_EPT_FAULT:
--    if (s->fp_excp_el) {
+         {
--        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+             hvf_slot *slot;
--                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
+-            uint64_t gpa = rvmcs(cpu->hvf_fd, VMCS_GUEST_PHYSICAL_ADDRESS);
--        return 0;
++            uint64_t gpa = rvmcs(cpu->hvf->fd, VMCS_GUEST_PHYSICAL_ADDRESS);
--    }
--    if (!s->vfp_enabled) {
+             if (((idtvec_info & VMCS_IDT_VEC_VALID) == 0) &&
--        return 1;
+                 ((exit_qual & EXIT_QUAL_NMIUDTI) != 0)) {
--    }
+@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
--
+                 store_regs(cpu);
--    opr_sz = (1 + q) * 8;
+                 break;
--    if (fn_gvec_ptr) {
+             } else if (!string && !in) {
--        TCGv_ptr ptr;
+-                RAX(env) = rreg(cpu->hvf_fd, HV_X86_RAX);
--        if (ptr_is_env) {
++                RAX(env) = rreg(cpu->hvf->fd, HV_X86_RAX);
--            ptr = cpu_env;
+                 hvf_handle_io(env, port, &RAX(env), 1, size, 1);
--        } else {
+                 macvm_set_rip(cpu, rip + ins_len);
--            ptr = get_fpstatus_ptr(1);
+                 break;
--        }
+@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
 -        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
 -                           opr_sz, opr_sz, data, fn_gvec_ptr);
 -        if (!ptr_is_env) {
 -            tcg_temp_free_ptr(ptr);
 -        }
 -    } else {
 -        tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
 -                           opr_sz, opr_sz, data, fn_gvec);
 -    }
 -    return 0;
 -}
 -
  /* Advanced SIMD two registers and a scalar extension.
   *  31             24   23  22   20   16   12  11   10   9    8        3     0
   * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                      }
                  }
              }
 -        } else if ((insn & 0x0e000a00) == 0x0c000800
 -                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
 -            if (disas_neon_insn_3same_ext(s, insn)) {
 -                goto illegal_op;
 -            }
 -            return;
          } else if ((insn & 0x0f000a00) == 0x0e000800
                     && arm_dc_feature(s, ARM_FEATURE_V8)) {
              if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
              }
              break;
          }
--        if ((insn & 0xfe000a00) == 0xfc000800
+         case EXIT_REASON_CPUID: {
-+        if ((insn & 0xff000a00) == 0xfe000800
+-            uint32_t rax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
-             && arm_dc_feature(s, ARM_FEATURE_V8)) {
+-            uint32_t rbx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RBX);
-             /* The Thumb2 and ARM encodings are identical.  */
+-            uint32_t rcx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
--            if (disas_neon_insn_3same_ext(s, insn)) {
+-            uint32_t rdx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
--                goto illegal_op;
++            uint32_t rax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
--            }
++            uint32_t rbx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RBX);
--        } else if ((insn & 0xff000a00) == 0xfe000800
++            uint32_t rcx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
--                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
++            uint32_t rdx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
--            /* The Thumb2 and ARM encodings are identical.  */
-             if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
+             if (rax == 1) {
-                 goto illegal_op;
+                 /* CPUID1.ecx.OSXSAVE needs to know CR4 */
 -                env->cr[4] = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
 +                env->cr[4] = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
              }
+             hvf_cpu_x86_cpuid(env, rax, rcx, &rax, &rbx, &rcx, &rdx);
+-            wreg(cpu->hvf_fd, HV_X86_RAX, rax);
+-            wreg(cpu->hvf_fd, HV_X86_RBX, rbx);
+-            wreg(cpu->hvf_fd, HV_X86_RCX, rcx);
+-            wreg(cpu->hvf_fd, HV_X86_RDX, rdx);
++            wreg(cpu->hvf->fd, HV_X86_RAX, rax);
++            wreg(cpu->hvf->fd, HV_X86_RBX, rbx);
++            wreg(cpu->hvf->fd, HV_X86_RCX, rcx);
++            wreg(cpu->hvf->fd, HV_X86_RDX, rdx);
+             macvm_set_rip(cpu, rip + ins_len);
+             break;
+@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
+         case EXIT_REASON_XSETBV: {
+             X86CPU *x86_cpu = X86_CPU(cpu);
+             CPUX86State *env = &x86_cpu->env;
+-            uint32_t eax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
+-            uint32_t ecx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
+-            uint32_t edx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
++            uint32_t eax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
++            uint32_t ecx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
++            uint32_t edx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
+             if (ecx) {
+                 macvm_set_rip(cpu, rip + ins_len);
+                 break;
+             }
+             env->xcr0 = ((uint64_t)edx << 32) | eax;
+-            wreg(cpu->hvf_fd, HV_X86_XCR0, env->xcr0 | 1);
++            wreg(cpu->hvf->fd, HV_X86_XCR0, env->xcr0 | 1);
+             macvm_set_rip(cpu, rip + ins_len);
+             break;
+         }
+@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
+             switch (cr) {
+             case 0x0: {
+-                macvm_set_cr0(cpu->hvf_fd, RRX(env, reg));
++                macvm_set_cr0(cpu->hvf->fd, RRX(env, reg));
+                 break;
+             }
+             case 4: {
+-                macvm_set_cr4(cpu->hvf_fd, RRX(env, reg));
++                macvm_set_cr4(cpu->hvf->fd, RRX(env, reg));
+                 break;
+             }
+             case 8: {
+@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
+             break;
+         }
+         case EXIT_REASON_TASK_SWITCH: {
+-            uint64_t vinfo = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
++            uint64_t vinfo = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
+             x68_segment_selector sel = {.sel = exit_qual & 0xffff};
+             vmx_handle_task_switch(cpu, sel, (exit_qual >> 30) & 0x3,
+              vinfo & VMCS_INTR_VALID, vinfo & VECTORING_INFO_VECTOR_MASK, vinfo
+@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
+             break;
+         }
+         case EXIT_REASON_RDPMC:
+-            wreg(cpu->hvf_fd, HV_X86_RAX, 0);
+-            wreg(cpu->hvf_fd, HV_X86_RDX, 0);
++            wreg(cpu->hvf->fd, HV_X86_RAX, 0);
++            wreg(cpu->hvf->fd, HV_X86_RDX, 0);
+             macvm_set_rip(cpu, rip + ins_len);
+             break;
+         case VMX_REASON_VMCALL:
+diff --git a/target/i386/hvf/x86.c b/target/i386/hvf/x86.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/i386/hvf/x86.c
++++ b/target/i386/hvf/x86.c
+@@ -XXX,XX +XXX,XX @@ bool x86_read_segment_descriptor(struct CPUState *cpu,
+     }
+     if (GDT_SEL == sel.ti) {
+-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
+-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
++        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
++        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
+     } else {
+-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
+-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
++        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
++        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
+     }
+     if (sel.index * 8 >= limit) {
+@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
+     uint32_t limit;
+     if (GDT_SEL == sel.ti) {
+-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
+-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
++        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
++        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
+     } else {
+-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
+-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
++        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
++        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
+     }
+     if (sel.index * 8 >= limit) {
+@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
+ bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
+                         int gate)
+ {
+-    target_ulong base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_BASE);
+-    uint32_t limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
++    target_ulong base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_BASE);
++    uint32_t limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
+     memset(idt_desc, 0, sizeof(*idt_desc));
+     if (gate * 8 >= limit) {
+@@ -XXX,XX +XXX,XX @@ bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
+ bool x86_is_protected(struct CPUState *cpu)
+ {
+-    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
++    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
+     return cr0 & CR0_PE;
+ }
+@@ -XXX,XX +XXX,XX @@ bool x86_is_v8086(struct CPUState *cpu)
+ bool x86_is_long_mode(struct CPUState *cpu)
+ {
+-    return rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
++    return rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
+ }
+ bool x86_is_long64_mode(struct CPUState *cpu)
+@@ -XXX,XX +XXX,XX @@ bool x86_is_long64_mode(struct CPUState *cpu)
+ bool x86_is_paging_mode(struct CPUState *cpu)
+ {
+-    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
++    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
+     return cr0 & CR0_PG;
+ }
+ bool x86_is_pae_enabled(struct CPUState *cpu)
+ {
+-    uint64_t cr4 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
++    uint64_t cr4 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
+     return cr4 & CR4_PAE;
+ }
+diff --git a/target/i386/hvf/x86_descr.c b/target/i386/hvf/x86_descr.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/i386/hvf/x86_descr.c
++++ b/target/i386/hvf/x86_descr.c
+@@ -XXX,XX +XXX,XX @@ static const struct vmx_segment_field {
+ uint32_t vmx_read_segment_limit(CPUState *cpu, X86Seg seg)
+ {
+-    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
++    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
+ }
+ uint32_t vmx_read_segment_ar(CPUState *cpu, X86Seg seg)
+ {
+-    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
++    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
+ }
+ uint64_t vmx_read_segment_base(CPUState *cpu, X86Seg seg)
+ {
+-    return rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
++    return rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
+ }
+ x68_segment_selector vmx_read_segment_selector(CPUState *cpu, X86Seg seg)
+ {
+     x68_segment_selector sel;
+-    sel.sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
++    sel.sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
+     return sel;
+ }
+ void vmx_write_segment_selector(struct CPUState *cpu, x68_segment_selector selector, X86Seg seg)
+ {
+-    wvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector, selector.sel);
++    wvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector, selector.sel);
+ }
+ void vmx_read_segment_descriptor(struct CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
+ {
+-    desc->sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
+-    desc->base = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
+-    desc->limit = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
+-    desc->ar = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
++    desc->sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
++    desc->base = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
++    desc->limit = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
++    desc->ar = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
+ }
+ void vmx_write_segment_descriptor(CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
+ {
+     const struct vmx_segment_field *sf = &vmx_segment_fields[seg];
+-    wvmcs(cpu->hvf_fd, sf->base, desc->base);
+-    wvmcs(cpu->hvf_fd, sf->limit, desc->limit);
+-    wvmcs(cpu->hvf_fd, sf->selector, desc->sel);
+-    wvmcs(cpu->hvf_fd, sf->ar_bytes, desc->ar);
++    wvmcs(cpu->hvf->fd, sf->base, desc->base);
++    wvmcs(cpu->hvf->fd, sf->limit, desc->limit);
++    wvmcs(cpu->hvf->fd, sf->selector, desc->sel);
++    wvmcs(cpu->hvf->fd, sf->ar_bytes, desc->ar);
+ }
+ void x86_segment_descriptor_to_vmx(struct CPUState *cpu, x68_segment_selector selector, struct x86_segment_descriptor *desc, struct vmx_segment *vmx_desc)
+diff --git a/target/i386/hvf/x86_emu.c b/target/i386/hvf/x86_emu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/i386/hvf/x86_emu.c
++++ b/target/i386/hvf/x86_emu.c
+@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
+     switch (msr) {
+     case MSR_IA32_TSC:
+-        val = rdtscp() + rvmcs(cpu->hvf_fd, VMCS_TSC_OFFSET);
++        val = rdtscp() + rvmcs(cpu->hvf->fd, VMCS_TSC_OFFSET);
+         break;
+     case MSR_IA32_APICBASE:
+         val = cpu_get_apic_base(X86_CPU(cpu)->apic_state);
+@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
+         val = x86_cpu->ucode_rev;
+         break;
+     case MSR_EFER:
+-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER);
++        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER);
+         break;
+     case MSR_FSBASE:
+-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE);
++        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE);
+         break;
+     case MSR_GSBASE:
+-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE);
++        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE);
+         break;
+     case MSR_KERNELGSBASE:
+-        val = rvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE);
++        val = rvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE);
+         break;
+     case MSR_STAR:
+         abort();
+@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
+         cpu_set_apic_base(X86_CPU(cpu)->apic_state, data);
+         break;
+     case MSR_FSBASE:
+-        wvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE, data);
++        wvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE, data);
+         break;
+     case MSR_GSBASE:
+-        wvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE, data);
++        wvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE, data);
+         break;
+     case MSR_KERNELGSBASE:
+-        wvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE, data);
++        wvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE, data);
+         break;
+     case MSR_STAR:
+         abort();
+@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
+         break;
+     case MSR_EFER:
+         /*printf("new efer %llx\n", EFER(cpu));*/
+-        wvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER, data);
++        wvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER, data);
+         if (data & MSR_EFER_NXE) {
+-            hv_vcpu_invalidate_tlb(cpu->hvf_fd);
++            hv_vcpu_invalidate_tlb(cpu->hvf->fd);
+         }
+         break;
+     case MSR_MTRRphysBase(0):
+@@ -XXX,XX +XXX,XX @@ void load_regs(struct CPUState *cpu)
+     CPUX86State *env = &x86_cpu->env;
+     int i = 0;
+-    RRX(env, R_EAX) = rreg(cpu->hvf_fd, HV_X86_RAX);
+-    RRX(env, R_EBX) = rreg(cpu->hvf_fd, HV_X86_RBX);
+-    RRX(env, R_ECX) = rreg(cpu->hvf_fd, HV_X86_RCX);
+-    RRX(env, R_EDX) = rreg(cpu->hvf_fd, HV_X86_RDX);
+-    RRX(env, R_ESI) = rreg(cpu->hvf_fd, HV_X86_RSI);
+-    RRX(env, R_EDI) = rreg(cpu->hvf_fd, HV_X86_RDI);
+-    RRX(env, R_ESP) = rreg(cpu->hvf_fd, HV_X86_RSP);
+-    RRX(env, R_EBP) = rreg(cpu->hvf_fd, HV_X86_RBP);
++    RRX(env, R_EAX) = rreg(cpu->hvf->fd, HV_X86_RAX);
++    RRX(env, R_EBX) = rreg(cpu->hvf->fd, HV_X86_RBX);
++    RRX(env, R_ECX) = rreg(cpu->hvf->fd, HV_X86_RCX);
++    RRX(env, R_EDX) = rreg(cpu->hvf->fd, HV_X86_RDX);
++    RRX(env, R_ESI) = rreg(cpu->hvf->fd, HV_X86_RSI);
++    RRX(env, R_EDI) = rreg(cpu->hvf->fd, HV_X86_RDI);
++    RRX(env, R_ESP) = rreg(cpu->hvf->fd, HV_X86_RSP);
++    RRX(env, R_EBP) = rreg(cpu->hvf->fd, HV_X86_RBP);
+     for (i = 8; i < 16; i++) {
+-        RRX(env, i) = rreg(cpu->hvf_fd, HV_X86_RAX + i);
++        RRX(env, i) = rreg(cpu->hvf->fd, HV_X86_RAX + i);
+     }
+-    env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
++    env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
+     rflags_to_lflags(env);
+-    env->eip = rreg(cpu->hvf_fd, HV_X86_RIP);
++    env->eip = rreg(cpu->hvf->fd, HV_X86_RIP);
+ }
+ void store_regs(struct CPUState *cpu)
+@@ -XXX,XX +XXX,XX @@ void store_regs(struct CPUState *cpu)
+     CPUX86State *env = &x86_cpu->env;
+     int i = 0;
+-    wreg(cpu->hvf_fd, HV_X86_RAX, RAX(env));
+-    wreg(cpu->hvf_fd, HV_X86_RBX, RBX(env));
+-    wreg(cpu->hvf_fd, HV_X86_RCX, RCX(env));
+-    wreg(cpu->hvf_fd, HV_X86_RDX, RDX(env));
+-    wreg(cpu->hvf_fd, HV_X86_RSI, RSI(env));
+-    wreg(cpu->hvf_fd, HV_X86_RDI, RDI(env));
+-    wreg(cpu->hvf_fd, HV_X86_RBP, RBP(env));
+-    wreg(cpu->hvf_fd, HV_X86_RSP, RSP(env));
++    wreg(cpu->hvf->fd, HV_X86_RAX, RAX(env));
++    wreg(cpu->hvf->fd, HV_X86_RBX, RBX(env));
++    wreg(cpu->hvf->fd, HV_X86_RCX, RCX(env));
++    wreg(cpu->hvf->fd, HV_X86_RDX, RDX(env));
++    wreg(cpu->hvf->fd, HV_X86_RSI, RSI(env));
++    wreg(cpu->hvf->fd, HV_X86_RDI, RDI(env));
++    wreg(cpu->hvf->fd, HV_X86_RBP, RBP(env));
++    wreg(cpu->hvf->fd, HV_X86_RSP, RSP(env));
+     for (i = 8; i < 16; i++) {
+-        wreg(cpu->hvf_fd, HV_X86_RAX + i, RRX(env, i));
++        wreg(cpu->hvf->fd, HV_X86_RAX + i, RRX(env, i));
+     }
+     lflags_to_rflags(env);
+-    wreg(cpu->hvf_fd, HV_X86_RFLAGS, env->eflags);
++    wreg(cpu->hvf->fd, HV_X86_RFLAGS, env->eflags);
+     macvm_set_rip(cpu, env->eip);
+ }
+diff --git a/target/i386/hvf/x86_mmu.c b/target/i386/hvf/x86_mmu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/i386/hvf/x86_mmu.c
++++ b/target/i386/hvf/x86_mmu.c
+@@ -XXX,XX +XXX,XX @@ static bool test_pt_entry(struct CPUState *cpu, struct gpt_translation *pt,
+         pt->err_code |= MMU_PAGE_PT;
+     }
+-    uint32_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
++    uint32_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
+     /* check protection */
+     if (cr0 & CR0_WP) {
+         if (pt->write_access && !pte_write_access(pte)) {
+@@ -XXX,XX +XXX,XX @@ static bool walk_gpt(struct CPUState *cpu, target_ulong addr, int err_code,
+ {
+     int top_level, level;
+     bool is_large = false;
+-    target_ulong cr3 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR3);
++    target_ulong cr3 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR3);
+     uint64_t page_mask = pae ? PAE_PTE_PAGE_MASK : LEGACY_PTE_PAGE_MASK;
+     memset(pt, 0, sizeof(*pt));
+diff --git a/target/i386/hvf/x86_task.c b/target/i386/hvf/x86_task.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/i386/hvf/x86_task.c
++++ b/target/i386/hvf/x86_task.c
+@@ -XXX,XX +XXX,XX @@ static void load_state_from_tss32(CPUState *cpu, struct x86_tss_segment32 *tss)
+     X86CPU *x86_cpu = X86_CPU(cpu);
+     CPUX86State *env = &x86_cpu->env;
+-    wvmcs(cpu->hvf_fd, VMCS_GUEST_CR3, tss->cr3);
++    wvmcs(cpu->hvf->fd, VMCS_GUEST_CR3, tss->cr3);
+     env->eip = tss->eip;
+     env->eflags = tss->eflags | 2;
+@@ -XXX,XX +XXX,XX @@ static int task_switch_32(CPUState *cpu, x68_segment_selector tss_sel, x68_segme
+ void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int reason, bool gate_valid, uint8_t gate, uint64_t gate_type)
+ {
+-    uint64_t rip = rreg(cpu->hvf_fd, HV_X86_RIP);
++    uint64_t rip = rreg(cpu->hvf->fd, HV_X86_RIP);
+     if (!gate_valid || (gate_type != VMCS_INTR_T_HWEXCEPTION &&
+                         gate_type != VMCS_INTR_T_HWINTR &&
+                         gate_type != VMCS_INTR_T_NMI)) {
+-        int ins_len = rvmcs(cpu->hvf_fd, VMCS_EXIT_INSTRUCTION_LENGTH);
++        int ins_len = rvmcs(cpu->hvf->fd, VMCS_EXIT_INSTRUCTION_LENGTH);
+         macvm_set_rip(cpu, rip + ins_len);
+         return;
+     }
+@@ -XXX,XX +XXX,XX @@ void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int rea
+         //ret = task_switch_16(cpu, tss_sel, old_tss_sel, old_tss_base, &next_tss_desc);
+         VM_PANIC("task_switch_16");
+-    macvm_set_cr0(cpu->hvf_fd, rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0) | CR0_TS);
++    macvm_set_cr0(cpu->hvf->fd, rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0) | CR0_TS);
+     x86_segment_descriptor_to_vmx(cpu, tss_sel, &next_tss_desc, &vmx_seg);
+     vmx_write_segment_descriptor(cpu, &vmx_seg, R_TR);
+     store_regs(cpu);
+-    hv_vcpu_invalidate_tlb(cpu->hvf_fd);
+-    hv_vcpu_flush(cpu->hvf_fd);
++    hv_vcpu_invalidate_tlb(cpu->hvf->fd);
++    hv_vcpu_flush(cpu->hvf->fd);
+ }
+diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/i386/hvf/x86hvf.c
++++ b/target/i386/hvf/x86hvf.c
+@@ -XXX,XX +XXX,XX @@ void hvf_put_xsave(CPUState *cpu_state)
+     x86_cpu_xsave_all_areas(X86_CPU(cpu_state), xsave);
+-    if (hv_vcpu_write_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
++    if (hv_vcpu_write_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
+         abort();
+     }
+ }
+@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
+     CPUX86State *env = &X86_CPU(cpu_state)->env;
+     struct vmx_segment seg;
+-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
+-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
++    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
++    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
+-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
+-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
++    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
++    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
+-    /* wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR2, env->cr[2]); */
+-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3, env->cr[3]);
++    /* wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR2, env->cr[2]); */
++    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3, env->cr[3]);
+     vmx_update_tpr(cpu_state);
+-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER, env->efer);
++    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER, env->efer);
+-    macvm_set_cr4(cpu_state->hvf_fd, env->cr[4]);
+-    macvm_set_cr0(cpu_state->hvf_fd, env->cr[0]);
++    macvm_set_cr4(cpu_state->hvf->fd, env->cr[4]);
++    macvm_set_cr0(cpu_state->hvf->fd, env->cr[0]);
+     hvf_set_segment(cpu_state, &seg, &env->segs[R_CS], false);
+     vmx_write_segment_descriptor(cpu_state, &seg, R_CS);
+@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
+     hvf_set_segment(cpu_state, &seg, &env->ldt, false);
+     vmx_write_segment_descriptor(cpu_state, &seg, R_LDTR);
+-    hv_vcpu_flush(cpu_state->hvf_fd);
++    hv_vcpu_flush(cpu_state->hvf->fd);
+ }
+ void hvf_put_msrs(CPUState *cpu_state)
+ {
+     CPUX86State *env = &X86_CPU(cpu_state)->env;
+-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS,
++    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS,
+                       env->sysenter_cs);
+-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP,
++    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP,
+                       env->sysenter_esp);
+-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP,
++    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP,
+                       env->sysenter_eip);
+-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_STAR, env->star);
++    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_STAR, env->star);
+ #ifdef TARGET_X86_64
+-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_CSTAR, env->cstar);
+-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, env->kernelgsbase);
+-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FMASK, env->fmask);
+-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_LSTAR, env->lstar);
++    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_CSTAR, env->cstar);
++    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, env->kernelgsbase);
++    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FMASK, env->fmask);
++    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_LSTAR, env->lstar);
+ #endif
+-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_GSBASE, env->segs[R_GS].base);
+-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FSBASE, env->segs[R_FS].base);
++    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_GSBASE, env->segs[R_GS].base);
++    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FSBASE, env->segs[R_FS].base);
+ }
+@@ -XXX,XX +XXX,XX @@ void hvf_get_xsave(CPUState *cpu_state)
+     xsave = X86_CPU(cpu_state)->env.xsave_buf;
+-    if (hv_vcpu_read_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
++    if (hv_vcpu_read_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
+         abort();
+     }
+@@ -XXX,XX +XXX,XX @@ void hvf_get_segments(CPUState *cpu_state)
+     vmx_read_segment_descriptor(cpu_state, &seg, R_LDTR);
+     hvf_get_segment(&env->ldt, &seg);
+-    env->idt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
+-    env->idt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE);
+-    env->gdt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
+-    env->gdt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE);
++    env->idt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
++    env->idt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE);
++    env->gdt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
++    env->gdt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE);
+-    env->cr[0] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR0);
++    env->cr[0] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR0);
+     env->cr[2] = 0;
+-    env->cr[3] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3);
+-    env->cr[4] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR4);
++    env->cr[3] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3);
++    env->cr[4] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR4);
+-    env->efer = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER);
++    env->efer = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER);
+ }
+ void hvf_get_msrs(CPUState *cpu_state)
+@@ -XXX,XX +XXX,XX @@ void hvf_get_msrs(CPUState *cpu_state)
+     CPUX86State *env = &X86_CPU(cpu_state)->env;
+     uint64_t tmp;
+-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS, &tmp);
++    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS, &tmp);
+     env->sysenter_cs = tmp;
+-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP, &tmp);
++    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP, &tmp);
+     env->sysenter_esp = tmp;
+-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP, &tmp);
++    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP, &tmp);
+     env->sysenter_eip = tmp;
+-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_STAR, &env->star);
++    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_STAR, &env->star);
+ #ifdef TARGET_X86_64
+-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_CSTAR, &env->cstar);
+-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, &env->kernelgsbase);
+-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_FMASK, &env->fmask);
+-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_LSTAR, &env->lstar);
++    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_CSTAR, &env->cstar);
++    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, &env->kernelgsbase);
++    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_FMASK, &env->fmask);
++    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_LSTAR, &env->lstar);
+ #endif
+-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_APICBASE, &tmp);
++    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_APICBASE, &tmp);
+-    env->tsc = rdtscp() + rvmcs(cpu_state->hvf_fd, VMCS_TSC_OFFSET);
++    env->tsc = rdtscp() + rvmcs(cpu_state->hvf->fd, VMCS_TSC_OFFSET);
+ }
+ int hvf_put_registers(CPUState *cpu_state)
+@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
+     X86CPU *x86cpu = X86_CPU(cpu_state);
+     CPUX86State *env = &x86cpu->env;
+-    wreg(cpu_state->hvf_fd, HV_X86_RAX, env->regs[R_EAX]);
+-    wreg(cpu_state->hvf_fd, HV_X86_RBX, env->regs[R_EBX]);
+-    wreg(cpu_state->hvf_fd, HV_X86_RCX, env->regs[R_ECX]);
+-    wreg(cpu_state->hvf_fd, HV_X86_RDX, env->regs[R_EDX]);
+-    wreg(cpu_state->hvf_fd, HV_X86_RBP, env->regs[R_EBP]);
+-    wreg(cpu_state->hvf_fd, HV_X86_RSP, env->regs[R_ESP]);
+-    wreg(cpu_state->hvf_fd, HV_X86_RSI, env->regs[R_ESI]);
+-    wreg(cpu_state->hvf_fd, HV_X86_RDI, env->regs[R_EDI]);
+-    wreg(cpu_state->hvf_fd, HV_X86_R8, env->regs[8]);
+-    wreg(cpu_state->hvf_fd, HV_X86_R9, env->regs[9]);
+-    wreg(cpu_state->hvf_fd, HV_X86_R10, env->regs[10]);
+-    wreg(cpu_state->hvf_fd, HV_X86_R11, env->regs[11]);
+-    wreg(cpu_state->hvf_fd, HV_X86_R12, env->regs[12]);
+-    wreg(cpu_state->hvf_fd, HV_X86_R13, env->regs[13]);
+-    wreg(cpu_state->hvf_fd, HV_X86_R14, env->regs[14]);
+-    wreg(cpu_state->hvf_fd, HV_X86_R15, env->regs[15]);
+-    wreg(cpu_state->hvf_fd, HV_X86_RFLAGS, env->eflags);
+-    wreg(cpu_state->hvf_fd, HV_X86_RIP, env->eip);
++    wreg(cpu_state->hvf->fd, HV_X86_RAX, env->regs[R_EAX]);
++    wreg(cpu_state->hvf->fd, HV_X86_RBX, env->regs[R_EBX]);
++    wreg(cpu_state->hvf->fd, HV_X86_RCX, env->regs[R_ECX]);
++    wreg(cpu_state->hvf->fd, HV_X86_RDX, env->regs[R_EDX]);
++    wreg(cpu_state->hvf->fd, HV_X86_RBP, env->regs[R_EBP]);
++    wreg(cpu_state->hvf->fd, HV_X86_RSP, env->regs[R_ESP]);
++    wreg(cpu_state->hvf->fd, HV_X86_RSI, env->regs[R_ESI]);
++    wreg(cpu_state->hvf->fd, HV_X86_RDI, env->regs[R_EDI]);
++    wreg(cpu_state->hvf->fd, HV_X86_R8, env->regs[8]);
++    wreg(cpu_state->hvf->fd, HV_X86_R9, env->regs[9]);
++    wreg(cpu_state->hvf->fd, HV_X86_R10, env->regs[10]);
++    wreg(cpu_state->hvf->fd, HV_X86_R11, env->regs[11]);
++    wreg(cpu_state->hvf->fd, HV_X86_R12, env->regs[12]);
++    wreg(cpu_state->hvf->fd, HV_X86_R13, env->regs[13]);
++    wreg(cpu_state->hvf->fd, HV_X86_R14, env->regs[14]);
++    wreg(cpu_state->hvf->fd, HV_X86_R15, env->regs[15]);
++    wreg(cpu_state->hvf->fd, HV_X86_RFLAGS, env->eflags);
++    wreg(cpu_state->hvf->fd, HV_X86_RIP, env->eip);
+-    wreg(cpu_state->hvf_fd, HV_X86_XCR0, env->xcr0);
++    wreg(cpu_state->hvf->fd, HV_X86_XCR0, env->xcr0);
+     hvf_put_xsave(cpu_state);
+@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
+     hvf_put_msrs(cpu_state);
+-    wreg(cpu_state->hvf_fd, HV_X86_DR0, env->dr[0]);
+-    wreg(cpu_state->hvf_fd, HV_X86_DR1, env->dr[1]);
+-    wreg(cpu_state->hvf_fd, HV_X86_DR2, env->dr[2]);
+-    wreg(cpu_state->hvf_fd, HV_X86_DR3, env->dr[3]);
+-    wreg(cpu_state->hvf_fd, HV_X86_DR4, env->dr[4]);
+-    wreg(cpu_state->hvf_fd, HV_X86_DR5, env->dr[5]);
+-    wreg(cpu_state->hvf_fd, HV_X86_DR6, env->dr[6]);
+-    wreg(cpu_state->hvf_fd, HV_X86_DR7, env->dr[7]);
++    wreg(cpu_state->hvf->fd, HV_X86_DR0, env->dr[0]);
++    wreg(cpu_state->hvf->fd, HV_X86_DR1, env->dr[1]);
++    wreg(cpu_state->hvf->fd, HV_X86_DR2, env->dr[2]);
++    wreg(cpu_state->hvf->fd, HV_X86_DR3, env->dr[3]);
++    wreg(cpu_state->hvf->fd, HV_X86_DR4, env->dr[4]);
++    wreg(cpu_state->hvf->fd, HV_X86_DR5, env->dr[5]);
++    wreg(cpu_state->hvf->fd, HV_X86_DR6, env->dr[6]);
++    wreg(cpu_state->hvf->fd, HV_X86_DR7, env->dr[7]);
+     return 0;
+ }
+@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
+     X86CPU *x86cpu = X86_CPU(cpu_state);
+     CPUX86State *env = &x86cpu->env;
+-    env->regs[R_EAX] = rreg(cpu_state->hvf_fd, HV_X86_RAX);
+-    env->regs[R_EBX] = rreg(cpu_state->hvf_fd, HV_X86_RBX);
+-    env->regs[R_ECX] = rreg(cpu_state->hvf_fd, HV_X86_RCX);
+-    env->regs[R_EDX] = rreg(cpu_state->hvf_fd, HV_X86_RDX);
+-    env->regs[R_EBP] = rreg(cpu_state->hvf_fd, HV_X86_RBP);
+-    env->regs[R_ESP] = rreg(cpu_state->hvf_fd, HV_X86_RSP);
+-    env->regs[R_ESI] = rreg(cpu_state->hvf_fd, HV_X86_RSI);
+-    env->regs[R_EDI] = rreg(cpu_state->hvf_fd, HV_X86_RDI);
+-    env->regs[8] = rreg(cpu_state->hvf_fd, HV_X86_R8);
+-    env->regs[9] = rreg(cpu_state->hvf_fd, HV_X86_R9);
+-    env->regs[10] = rreg(cpu_state->hvf_fd, HV_X86_R10);
+-    env->regs[11] = rreg(cpu_state->hvf_fd, HV_X86_R11);
+-    env->regs[12] = rreg(cpu_state->hvf_fd, HV_X86_R12);
+-    env->regs[13] = rreg(cpu_state->hvf_fd, HV_X86_R13);
+-    env->regs[14] = rreg(cpu_state->hvf_fd, HV_X86_R14);
+-    env->regs[15] = rreg(cpu_state->hvf_fd, HV_X86_R15);
++    env->regs[R_EAX] = rreg(cpu_state->hvf->fd, HV_X86_RAX);
++    env->regs[R_EBX] = rreg(cpu_state->hvf->fd, HV_X86_RBX);
++    env->regs[R_ECX] = rreg(cpu_state->hvf->fd, HV_X86_RCX);
++    env->regs[R_EDX] = rreg(cpu_state->hvf->fd, HV_X86_RDX);
++    env->regs[R_EBP] = rreg(cpu_state->hvf->fd, HV_X86_RBP);
++    env->regs[R_ESP] = rreg(cpu_state->hvf->fd, HV_X86_RSP);
++    env->regs[R_ESI] = rreg(cpu_state->hvf->fd, HV_X86_RSI);
++    env->regs[R_EDI] = rreg(cpu_state->hvf->fd, HV_X86_RDI);
++    env->regs[8] = rreg(cpu_state->hvf->fd, HV_X86_R8);
++    env->regs[9] = rreg(cpu_state->hvf->fd, HV_X86_R9);
++    env->regs[10] = rreg(cpu_state->hvf->fd, HV_X86_R10);
++    env->regs[11] = rreg(cpu_state->hvf->fd, HV_X86_R11);
++    env->regs[12] = rreg(cpu_state->hvf->fd, HV_X86_R12);
++    env->regs[13] = rreg(cpu_state->hvf->fd, HV_X86_R13);
++    env->regs[14] = rreg(cpu_state->hvf->fd, HV_X86_R14);
++    env->regs[15] = rreg(cpu_state->hvf->fd, HV_X86_R15);
+-    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
+-    env->eip = rreg(cpu_state->hvf_fd, HV_X86_RIP);
++    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
++    env->eip = rreg(cpu_state->hvf->fd, HV_X86_RIP);
+     hvf_get_xsave(cpu_state);
+-    env->xcr0 = rreg(cpu_state->hvf_fd, HV_X86_XCR0);
++    env->xcr0 = rreg(cpu_state->hvf->fd, HV_X86_XCR0);
+     hvf_get_segments(cpu_state);
+     hvf_get_msrs(cpu_state);
+-    env->dr[0] = rreg(cpu_state->hvf_fd, HV_X86_DR0);
+-    env->dr[1] = rreg(cpu_state->hvf_fd, HV_X86_DR1);
+-    env->dr[2] = rreg(cpu_state->hvf_fd, HV_X86_DR2);
+-    env->dr[3] = rreg(cpu_state->hvf_fd, HV_X86_DR3);
+-    env->dr[4] = rreg(cpu_state->hvf_fd, HV_X86_DR4);
+-    env->dr[5] = rreg(cpu_state->hvf_fd, HV_X86_DR5);
+-    env->dr[6] = rreg(cpu_state->hvf_fd, HV_X86_DR6);
+-    env->dr[7] = rreg(cpu_state->hvf_fd, HV_X86_DR7);
++    env->dr[0] = rreg(cpu_state->hvf->fd, HV_X86_DR0);
++    env->dr[1] = rreg(cpu_state->hvf->fd, HV_X86_DR1);
++    env->dr[2] = rreg(cpu_state->hvf->fd, HV_X86_DR2);
++    env->dr[3] = rreg(cpu_state->hvf->fd, HV_X86_DR3);
++    env->dr[4] = rreg(cpu_state->hvf->fd, HV_X86_DR4);
++    env->dr[5] = rreg(cpu_state->hvf->fd, HV_X86_DR5);
++    env->dr[6] = rreg(cpu_state->hvf->fd, HV_X86_DR6);
++    env->dr[7] = rreg(cpu_state->hvf->fd, HV_X86_DR7);
+     x86_update_hflags(env);
+     return 0;
+@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
+ static void vmx_set_int_window_exiting(CPUState *cpu)
+ {
+      uint64_t val;
+-     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
+-     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
++     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
++     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
+              VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
+ }
+ void vmx_clear_int_window_exiting(CPUState *cpu)
+ {
+      uint64_t val;
+-     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
+-     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
++     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
++     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
+              ~VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
+ }
+@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
+     uint64_t info = 0;
+     if (have_event) {
+         info = vector | intr_type | VMCS_INTR_VALID;
+-        uint64_t reason = rvmcs(cpu_state->hvf_fd, VMCS_EXIT_REASON);
++        uint64_t reason = rvmcs(cpu_state->hvf->fd, VMCS_EXIT_REASON);
+         if (env->nmi_injected && reason != EXIT_REASON_TASK_SWITCH) {
+             vmx_clear_nmi_blocking(cpu_state);
+         }
+@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
+             info &= ~(1 << 12); /* clear undefined bit */
+             if (intr_type == VMCS_INTR_T_SWINTR ||
+                 intr_type == VMCS_INTR_T_SWEXCEPTION) {
+-                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
++                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
+             }
+             if (env->has_error_code) {
+-                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_EXCEPTION_ERROR,
++                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_EXCEPTION_ERROR,
+                       env->error_code);
+                 /* Indicate that VMCS_ENTRY_EXCEPTION_ERROR is valid */
+                 info |= VMCS_INTR_DEL_ERRCODE;
+             }
+             /*printf("reinject  %lx err %d\n", info, err);*/
+-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
++            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
+         };
+     }
+@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
+         if (!(env->hflags2 & HF2_NMI_MASK) && !(info & VMCS_INTR_VALID)) {
+             cpu_state->interrupt_request &= ~CPU_INTERRUPT_NMI;
+             info = VMCS_INTR_VALID | VMCS_INTR_T_NMI | EXCP02_NMI;
+-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
++            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
+         } else {
+             vmx_set_nmi_window_exiting(cpu_state);
+         }
+@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
+         int line = cpu_get_pic_interrupt(&x86cpu->env);
+         cpu_state->interrupt_request &= ~CPU_INTERRUPT_HARD;
+         if (line >= 0) {
+-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, line |
++            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, line |
+                   VMCS_INTR_VALID | VMCS_INTR_T_HWINTR);
+         }
+     }
+@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
+     X86CPU *cpu = X86_CPU(cpu_state);
+     CPUX86State *env = &cpu->env;
+-    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
++    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
+     if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
+         cpu_synchronize_state(cpu_state);
 --
 .20.1

-[PULL 03/39] target/arm: Don't use a TLB for ARMMMUIdx_Stage2
+[PULL 39/45] hvf: Simplify post reset/init/loadvm hooks
-We define ARMMMUIdx_Stage2 as being an MMU index which uses a QEMU
+From: Alexander Graf <agraf@csgraf.de>
 TLB.  However we never actually use the TLB -- all stage 2 lookups
 are done by direct calls to get_phys_addr_lpae() followed by a
 physical address load via address_space_ld*().
-Remove Stage2 from the list of ARM MMU indexes which correspond to
+The hooks we have that call us after reset, init and loadvm really all
-real core MMU indexes, and instead put it in the set of "NOTLB" ARM
+just want to say "The reference of all register state is in the QEMU
-MMU indexes.
+vcpu struct, please push it".
-This allows us to drop NB_MMU_MODES to 11.  It also means we can
+We already have a working pushing mechanism though called cpu->vcpu_dirty,
-safely add support for the ARMv8.3-TTS2UXN extension, which adds
+so we can just reuse that for all of the above, syncing state properly the
-permission bits to the stage 2 descriptors which define execute
+next time we actually execute a vCPU.
 permission separately for EL0 and EL1; supporting that while keeping
 Stage2 in a QEMU TLB would require us to use separate TLBs for
 "Stage2 for an EL0 access" and "Stage2 for an EL1 access", which is a
 lot of extra complication given we aren't even using the QEMU TLB.
-In the process of updating the comment on our MMU index use,
+This fixes PSCI resets on ARM, as they modify CPU state even after the
-fix a couple of other minor errors:
+post init call has completed, but before we execute the vCPU again.
  * NS EL2 EL2&0 was missing from the list in the comment
  * some text hadn't been updated from when we bumped NB_MMU_MODES
    above 8
+To also make the scheme work for x86, we have to make sure we don't
+move stale eflags into our env when the vcpu state is dirty.
+Signed-off-by: Alexander Graf <agraf@csgraf.de>
+Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
+Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
+Reviewed-by: Sergio Lopez <slp@redhat.com>
+Message-id: 20210519202253.76782-13-agraf@csgraf.de
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200330210400.11724-2-peter.maydell@linaro.org
 ---
- target/arm/cpu-param.h |   2 +-
+ accel/hvf/hvf-accel-ops.c | 27 +++++++--------------------
- target/arm/cpu.h       |  21 +++++---
+ target/i386/hvf/x86hvf.c  |  5 ++++-
- target/arm/helper.c    | 112 ++++-------------------------------------
+files changed, 11 insertions(+), 21 deletions(-)
 files changed, 27 insertions(+), 108 deletions(-)
-diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
+diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu-param.h
+--- a/accel/hvf/hvf-accel-ops.c
-+++ b/target/arm/cpu-param.h
++++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void hvf_cpu_synchronize_state(CPUState *cpu)
- # define TARGET_PAGE_BITS_MIN  10
+     }
  #endif
 -#define NB_MMU_MODES 12
 +#define NB_MMU_MODES 11
  #endif
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
   *     handling via the TLB. The only way to do a stage 1 translation without
   *     the immediate stage 2 translation is via the ATS or AT system insns,
   *     which can be slow-pathed and always do a page table walk.
 + *     The only use of stage 2 translations is either as part of an s1+2
 + *     lookup or when loading the descriptors during a stage 1 page table walk,
 + *     and in both those cases we don't use the TLB.
   *  4. we can also safely fold together the "32 bit EL3" and "64 bit EL3"
   *     translation regimes, because they map reasonably well to each other
   *     and they can't both be active at the same time.
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
   * NS EL1 EL1&0 stage 1+2 (aka NS PL1)
   * NS EL1 EL1&0 stage 1+2 +PAN
   * NS EL0 EL2&0
 + * NS EL2 EL2&0
   * NS EL2 EL2&0 +PAN
   * NS EL2 (aka NS PL2)
   * S EL0 EL1&0 (aka S PL0)
   * S EL1 EL1&0 (not used if EL3 is 32 bit)
   * S EL1 EL1&0 +PAN
   * S EL3 (aka S PL1)
 - * NS EL1&0 stage 2
   *
 - * for a total of 12 different mmu_idx.
 + * for a total of 11 different mmu_idx.
   *
   * R profile CPUs have an MPU, but can use the same set of MMU indexes
   * as A profile. They only need to distinguish NS EL0 and NS EL1 (and
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
   * are not quite the same -- different CPU types (most notably M profile
   * vs A/R profile) would like to use MMU indexes with different semantics,
   * but since we don't ever need to use all of those in a single CPU we
 - * can avoid setting NB_MMU_MODES to more than 8. The lower bits of
 + * can avoid having to set NB_MMU_MODES to "total number of A profile MMU
 + * modes + total number of M profile MMU modes". The lower bits of
   * ARMMMUIdx are the core TLB mmu index, and the higher bits are always
   * the same for any particular CPU.
   * Variables of type ARMMUIdx are always full values, and the core
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
      ARMMMUIdx_SE10_1_PAN = 9 | ARM_MMU_IDX_A,
      ARMMMUIdx_SE3        = 10 | ARM_MMU_IDX_A,
 -    ARMMMUIdx_Stage2     = 11 | ARM_MMU_IDX_A,
 -
      /*
       * These are not allocated TLBs and are used only for AT system
       * instructions or for the first stage of an S12 page table walk.
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
      ARMMMUIdx_Stage1_E0 = 0 | ARM_MMU_IDX_NOTLB,
      ARMMMUIdx_Stage1_E1 = 1 | ARM_MMU_IDX_NOTLB,
      ARMMMUIdx_Stage1_E1_PAN = 2 | ARM_MMU_IDX_NOTLB,
 +    /*
 +     * Not allocated a TLB: used only for second stage of an S12 page
 +     * table walk, or for descriptor loads during first stage of an S1
 +     * page table walk. Note that if we ever want to have a TLB for this
 +     * then various TLB flush insns which currently are no-ops or flush
 +     * only stage 1 MMU indexes will need to change to flush stage 2.
 +     */
 +    ARMMMUIdx_Stage2     = 3 | ARM_MMU_IDX_NOTLB,
      /*
       * M-profile.
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdxBit {
      TO_CORE_BIT(SE10_1),
      TO_CORE_BIT(SE10_1_PAN),
      TO_CORE_BIT(SE3),
 -    TO_CORE_BIT(Stage2),
      TO_CORE_BIT(MUser),
      TO_CORE_BIT(MPriv),
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri,
      tlb_flush_by_mmuidx(cs,
                          ARMMMUIdxBit_E10_1 |
                          ARMMMUIdxBit_E10_1_PAN |
 -                        ARMMMUIdxBit_E10_0 |
 -                        ARMMMUIdxBit_Stage2);
 +                        ARMMMUIdxBit_E10_0);
  }
- static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+-static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
-@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+-                                              run_on_cpu_data arg)
-     tlb_flush_by_mmuidx_all_cpus_synced(cs,
++static void do_hvf_cpu_synchronize_set_dirty(CPUState *cpu,
-                                         ARMMMUIdxBit_E10_1 |
++                                             run_on_cpu_data arg)
-                                         ARMMMUIdxBit_E10_1_PAN |
+ {
--                                        ARMMMUIdxBit_E10_0 |
+-    hvf_put_registers(cpu);
--                                        ARMMMUIdxBit_Stage2);
+-    cpu->vcpu_dirty = false;
-+                                        ARMMMUIdxBit_E10_0);
++    /* QEMU state is the reference, push it to HVF now and on next entry */
 +    cpu->vcpu_dirty = true;
  }
--static void tlbiipas2_write(CPUARMState *env, const ARMCPRegInfo *ri,
+ static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
--                            uint64_t value)
+ {
--{
+-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
 -    /* Invalidate by IPA. This has to invalidate any structures that
 -     * contain only stage 2 translation information, but does not need
 -     * to apply to structures that contain combined stage 1 and stage 2
 -     * translation information.
 -     * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
 -     */
 -    CPUState *cs = env_cpu(env);
 -    uint64_t pageaddr;
 -
 -    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
 -        return;
 -    }
 -
 -    pageaddr = sextract64(value << 12, 0, 40);
 -
 -    tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
 -}
 -
--static void tlbiipas2_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+-static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
--                               uint64_t value)
+-                                             run_on_cpu_data arg)
 -{
--    CPUState *cs = env_cpu(env);
+-    hvf_put_registers(cpu);
--    uint64_t pageaddr;
+-    cpu->vcpu_dirty = false;
--
++    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 -    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
 -        return;
 -    }
 -
 -    pageaddr = sextract64(value << 12, 0, 40);
 -
 -    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
 -                                             ARMMMUIdxBit_Stage2);
 -}
  static void tlbiall_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                uint64_t value)
@@ -XXX,XX +XXX,XX @@ static void vttbr_write(CPUARMState *env, const ARMCPRegInfo *ri,
          tlb_flush_by_mmuidx(cs,
                              ARMMMUIdxBit_E10_1 |
                              ARMMMUIdxBit_E10_1_PAN |
 -                            ARMMMUIdxBit_E10_0 |
 -                            ARMMMUIdxBit_Stage2);
 +                            ARMMMUIdxBit_E10_0);
          raw_write(env, ri, value);
      }
  }
-@@ -XXX,XX +XXX,XX @@ static int alle1_tlbmask(CPUARMState *env)
-         return ARMMMUIdxBit_SE10_1 |
+ static void hvf_cpu_synchronize_post_init(CPUState *cpu)
-                ARMMMUIdxBit_SE10_1_PAN |
+ {
-                ARMMMUIdxBit_SE10_0;
+-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 -    } else if (arm_feature(env, ARM_FEATURE_EL2)) {
 -        return ARMMMUIdxBit_E10_1 |
 -               ARMMMUIdxBit_E10_1_PAN |
 -               ARMMMUIdxBit_E10_0 |
 -               ARMMMUIdxBit_Stage2;
      } else {
          return ARMMMUIdxBit_E10_1 |
                 ARMMMUIdxBit_E10_1_PAN |
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                               ARMMMUIdxBit_SE3);
  }
 -static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri,
 -                                    uint64_t value)
 -{
 -    /* Invalidate by IPA. This has to invalidate any structures that
 -     * contain only stage 2 translation information, but does not need
 -     * to apply to structures that contain combined stage 1 and stage 2
 -     * translation information.
 -     * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
 -     */
 -    ARMCPU *cpu = env_archcpu(env);
 -    CPUState *cs = CPU(cpu);
 -    uint64_t pageaddr;
 -
 -    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
 -        return;
 -    }
 -
 -    pageaddr = sextract64(value << 12, 0, 48);
 -
 -    tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
 -}
 -
--static void tlbi_aa64_ipas2e1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+-static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
--                                      uint64_t value)
+-                                              run_on_cpu_data arg)
 -{
--    CPUState *cs = env_cpu(env);
+-    cpu->vcpu_dirty = true;
--    uint64_t pageaddr;
++    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
--
+ }
--    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
--        return;
+ static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
 -    }
 -
 -    pageaddr = sextract64(value << 12, 0, 48);
 -
 -    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
 -                                             ARMMMUIdxBit_Stage2);
 -}
 -
  static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri,
                                        bool isread)
  {
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
+-    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
-       .writefn = tlbi_aa64_vae1_write },
++    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
-     { .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
+ }
-       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
--      .access = PL2_W, .type = ARM_CP_NO_RAW,
+ static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
--      .writefn = tlbi_aa64_ipas2e1is_write },
+diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
-+      .access = PL2_W, .type = ARM_CP_NOP },
+index XXXXXXX..XXXXXXX 100644
-     { .name = "TLBI_IPAS2LE1IS", .state = ARM_CP_STATE_AA64,
+--- a/target/i386/hvf/x86hvf.c
-       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
++++ b/target/i386/hvf/x86hvf.c
--      .access = PL2_W, .type = ARM_CP_NO_RAW,
+@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
--      .writefn = tlbi_aa64_ipas2e1is_write },
+     X86CPU *cpu = X86_CPU(cpu_state);
-+      .access = PL2_W, .type = ARM_CP_NOP },
+     CPUX86State *env = &cpu->env;
-     { .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
+-    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
-       .access = PL2_W, .type = ARM_CP_NO_RAW,
++    if (!cpu_state->vcpu_dirty) {
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
++        /* light weight sync for CPU_INTERRUPT_HARD and IF_MASK */
-       .writefn = tlbi_aa64_alle1is_write },
++        env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
-     { .name = "TLBI_IPAS2E1", .state = ARM_CP_STATE_AA64,
++    }
-       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
--      .access = PL2_W, .type = ARM_CP_NO_RAW,
+     if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
--      .writefn = tlbi_aa64_ipas2e1_write },
+         cpu_synchronize_state(cpu_state);
 +      .access = PL2_W, .type = ARM_CP_NOP },
      { .name = "TLBI_IPAS2LE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
 -      .access = PL2_W, .type = ARM_CP_NO_RAW,
 -      .writefn = tlbi_aa64_ipas2e1_write },
 +      .access = PL2_W, .type = ARM_CP_NOP },
      { .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4,
        .access = PL2_W, .type = ARM_CP_NO_RAW,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
        .writefn = tlbimva_hyp_is_write },
      { .name = "TLBIIPAS2",
        .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
 -      .type = ARM_CP_NO_RAW, .access = PL2_W,
 -      .writefn = tlbiipas2_write },
 +      .type = ARM_CP_NOP, .access = PL2_W },
      { .name = "TLBIIPAS2IS",
        .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
 -      .type = ARM_CP_NO_RAW, .access = PL2_W,
 -      .writefn = tlbiipas2_is_write },
 +      .type = ARM_CP_NOP, .access = PL2_W },
      { .name = "TLBIIPAS2L",
        .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
 -      .type = ARM_CP_NO_RAW, .access = PL2_W,
 -      .writefn = tlbiipas2_write },
 +      .type = ARM_CP_NOP, .access = PL2_W },
      { .name = "TLBIIPAS2LIS",
        .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
 -      .type = ARM_CP_NO_RAW, .access = PL2_W,
 -      .writefn = tlbiipas2_is_write },
 +      .type = ARM_CP_NOP, .access = PL2_W },
      /* 32 bit cache operations */
      { .name = "ICIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
        .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
 --
 .20.1
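
The dirty-flag scheme described in the hvf commit message can be sketched in isolation. This is a toy model, not the HVF code: ToyVcpu and the toy_* functions are hypothetical names, and a single integer stands in for the whole register state. It only illustrates why marking the vCPU dirty, instead of pushing registers immediately, still picks up state changes made after the post-init hook, and why the light-weight read-back must be skipped while the state is dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: all names here are hypothetical, not QEMU API. */
typedef struct ToyVcpu {
    uint64_t qemu_reg;  /* QEMU-side copy of the register state */
    uint64_t hv_reg;    /* hypervisor-side copy */
    bool dirty;         /* true: the QEMU copy is the reference */
    int pushes;         /* number of times state was pushed */
} ToyVcpu;

/* Shared hook body for post-reset, post-init and pre-loadvm. */
static void toy_set_dirty(ToyVcpu *v)
{
    v->dirty = true;
}

/* Light-weight read-back: must not overwrite a dirty QEMU copy. */
static void toy_light_sync(ToyVcpu *v)
{
    if (!v->dirty) {
        v->qemu_reg = v->hv_reg;
    }
}

/* vCPU entry: push the QEMU copy once if it is marked dirty. */
static void toy_exec(ToyVcpu *v)
{
    if (v->dirty) {
        v->hv_reg = v->qemu_reg;
        v->pushes++;
        v->dirty = false;
    }
}

/* Returns 1 if a late state change still reaches the hypervisor copy. */
static int toy_demo(void)
{
    ToyVcpu v = { .qemu_reg = 1, .hv_reg = 0xdead, .dirty = false, .pushes = 0 };

    toy_set_dirty(&v);   /* the "post init" hook runs */
    v.qemu_reg = 2;      /* state modified after the hook, before execution */
    toy_light_sync(&v);  /* must not pull the stale hypervisor value */
    toy_exec(&v);        /* a single push happens on entry */

    return v.hv_reg == 2 && v.qemu_reg == 2 && v.pushes == 1 && !v.dirty;
}
```

With an eager push in the hook, the later write of 2 would not reach the hypervisor copy unless something else marked the state dirty again.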

-New patch
+[PULL 40/45] tests/qtest/bios-tables-test: Check for dup2() failure
+Coverity notes that we don't check for dup2() failing.  Add some
+assertions so that if it does ever happen we get some indication.
+(This is similar to how we handle other "don't expect this syscall to
+fail" checks in this test code.)
+Fixes: Coverity CID 1432346
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
+Message-id: 20210525134458.6675-2-peter.maydell@linaro.org
+---
+ tests/qtest/bios-tables-test.c | 8 ++++++--
+file changed, 6 insertions(+), 2 deletions(-)
+diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/qtest/bios-tables-test.c
++++ b/tests/qtest/bios-tables-test.c
+@@ -XXX,XX +XXX,XX @@ static void test_acpi_asl(test_data *data)
+                                                  exp_sdt->asl_file, sdt->asl_file);
+                     int out = dup(STDOUT_FILENO);
+                     int ret G_GNUC_UNUSED;
++                    int dupret;
+-                    dup2(STDERR_FILENO, STDOUT_FILENO);
++                    g_assert(out >= 0);
++                    dupret = dup2(STDERR_FILENO, STDOUT_FILENO);
++                    g_assert(dupret >= 0);
+                     ret = system(diff) ;
+-                    dup2(out, STDOUT_FILENO);
++                    dupret = dup2(out, STDOUT_FILENO);
++                    g_assert(dupret >= 0);
+                     close(out);
+                     g_free(diff);
+                 }
+--
+.20.1
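
The checked redirect pattern from the patch above can be tried outside the test suite. This is a minimal sketch, assuming a POSIX system; run_with_stdout_on_stderr is a hypothetical helper, and plain assert() stands in for g_assert() so that the example does not depend on GLib.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: run cmd with stdout temporarily pointed at stderr. */
static int run_with_stdout_on_stderr(const char *cmd)
{
    int out, dupret, ret;

    fflush(stdout);                 /* do not let buffered output cross over */
    out = dup(STDOUT_FILENO);       /* keep a handle on the real stdout */
    assert(out >= 0);
    dupret = dup2(STDERR_FILENO, STDOUT_FILENO);
    assert(dupret >= 0);
    ret = system(cmd);
    dupret = dup2(out, STDOUT_FILENO);  /* restore stdout */
    assert(dupret >= 0);
    close(out);
    return ret;
}
```

Each descriptor call is checked right where it is made, so a failure shows up at the call that failed rather than as confusing output later.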

-New patch
+[PULL 41/45] tests/qtest/e1000e-test: Check qemu_recv() succeeded
+The e1000e_send_verify() test calls qemu_recv() but doesn't
+check that the call succeeded, which annoys Coverity. Add
+an explicit test check for the length of the data.
+(This is a test check, not a "we assume this syscall always
+succeeds", so we use g_assert_cmpint() rather than g_assert().)
+Fixes: Coverity CID 1432324
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
+Message-id: 20210525134458.6675-3-peter.maydell@linaro.org
+---
+ tests/qtest/e1000e-test.c | 3 ++-
+file changed, 2 insertions(+), 1 deletion(-)
+diff --git a/tests/qtest/e1000e-test.c b/tests/qtest/e1000e-test.c
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/qtest/e1000e-test.c
++++ b/tests/qtest/e1000e-test.c
+@@ -XXX,XX +XXX,XX @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a
+     /* Check data sent to the backend */
+     ret = qemu_recv(test_sockets[0], &recv_len, sizeof(recv_len), 0);
+     g_assert_cmpint(ret, == , sizeof(recv_len));
+-    qemu_recv(test_sockets[0], buffer, 64, 0);
++    ret = qemu_recv(test_sockets[0], buffer, 64, 0);
++    g_assert_cmpint(ret, >=, 5);
+     g_assert_cmpstr(buffer, == , "TEST");
+     /* Free test data buffer */
+--
+.20.1
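
A standalone sketch of the same idea: check the receive length before comparing the buffer as a string. It assumes a POSIX system and uses socketpair(), send() and recv() directly rather than qemu_recv(); recv_and_check is a hypothetical helper. The threshold of 5 is read here as the four characters of "TEST" plus the terminating NUL, which is an interpretation rather than something the commit message states.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the payload arrives intact, else 0. */
static int recv_and_check(void)
{
    static const char payload[] = "TEST";   /* 5 bytes including the NUL */
    char buffer[64];
    int sv[2];
    ssize_t ret;
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return 0;
    }
    if (send(sv[0], payload, sizeof(payload), 0) != (ssize_t)sizeof(payload)) {
        close(sv[0]);
        close(sv[1]);
        return 0;
    }
    ret = recv(sv[1], buffer, sizeof(buffer), 0);
    /* Only treat buffer as a string once enough bytes are known to be there. */
    ok = ret >= 5 && strcmp(buffer, "TEST") == 0;
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```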

-[PULL 05/39] target/arm: Add new 's1_is_el0' argument to get_phys_addr_lpae()
+[PULL 42/45] tests/qtest/hd-geo-test: Fix checks on mkstemp() return value
-For ARMv8.2-TTS2UXN, the stage 2 page table walk wants to know
+Coverity notices that the checks against mkstemp() failing in
-whether the stage 1 access is for EL0 or not, because whether
+create_qcow2_with_mbr() are wrong: mkstemp returns -1 on failure but
-exec permission is given can depend on whether this is an EL0
+the check is just "g_assert(fd)".  Fix to use "g_assert(fd >= 0)",
-or EL1 access. Add a new argument to get_phys_addr_lpae() so
+matching the correct check in create_test_img().
 the call sites can pass this information in.
-Since get_phys_addr_lpae() doesn't already have a doc comment,
+Fixes: Coverity CID 1432274
-add one so we have a place to put the documentation of the
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-semantics of the new s1_is_el0 argument.
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
 Message-id: 20210525134458.6675-4-peter.maydell@linaro.org
 ---
  tests/qtest/hd-geo-test.c | 4 ++--
 file changed, 2 insertions(+), 2 deletions(-)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/tests/qtest/hd-geo-test.c b/tests/qtest/hd-geo-test.c
 Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200330210400.11724-4-peter.maydell@linaro.org
 ---
  target/arm/helper.c | 29 ++++++++++++++++++++++++++++-
 file changed, 28 insertions(+), 1 deletion(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/tests/qtest/hd-geo-test.c
-+++ b/target/arm/helper.c
++++ b/tests/qtest/hd-geo-test.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
  static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                               bool s1_is_el0,
                                 hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
                                 target_ulong *page_size_ptr,
                                 ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs);
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
          }
          ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
 +                                 false,
                                   &s2pa, &txattrs, &s2prot, &s2size, fi,
                                   pcacheattrs);
          if (ret) {
@@ -XXX,XX +XXX,XX @@ static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
      };
  }
 +/**
 + * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
 + *
 + * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
 + * prot and page_size may not be filled in, and the populated fsr value provides
 + * information on why the translation aborted, in the format of a long-format
 + * DFSR/IFSR fault register, with the following caveats:
 + *  * the WnR bit is never set (the caller must do this).
 + *
 + * @env: CPUARMState
 + * @address: virtual address to get physical address for
 + * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
 + * @mmu_idx: MMU index indicating required translation regime
 + * @s1_is_el0: if @mmu_idx is ARMMMUIdx_Stage2 (so this is a stage 2 page table
 + *             walk), must be true if this is stage 2 of a stage 1+2 walk for an
 + *             EL0 access. If @mmu_idx is anything else, @s1_is_el0 is ignored.
 + * @phys_ptr: set to the physical address corresponding to the virtual address
 + * @attrs: set to the memory transaction attributes to use
 + * @prot: set to the permissions for the page containing phys_ptr
 + * @page_size_ptr: set to the size of the page containing phys_ptr
 + * @fi: set to fault info if the translation fails
 + * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
 + */
  static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
                                 MMUAccessType access_type, ARMMMUIdx mmu_idx,
 +                               bool s1_is_el0,
                                 hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
                                 target_ulong *page_size_ptr,
                                 ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
              /* S1 is done. Now do S2 translation.  */
              ret = get_phys_addr_lpae(env, ipa, access_type, ARMMMUIdx_Stage2,
 +                                     mmu_idx == ARMMMUIdx_E10_0,
                                       phys_ptr, attrs, &s2_prot,
                                       page_size, fi,
                                       cacheattrs != NULL ? &cacheattrs2 : NULL);
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
      }
-     if (regime_using_lpae_format(env, mmu_idx)) {
+     fd = mkstemp(raw_path);
--        return get_phys_addr_lpae(env, address, access_type, mmu_idx,
+-    g_assert(fd);
-+        return get_phys_addr_lpae(env, address, access_type, mmu_idx, false,
++    g_assert(fd >= 0);
-                                   phys_ptr, attrs, prot, page_size,
+     close(fd);
-                                   fi, cacheattrs);
-     } else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
+     fd = open(raw_path, O_WRONLY);
@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
      close(fd);
      fd = mkstemp(qcow2_path);
 -    g_assert(fd);
 +    g_assert(fd >= 0);
      close(fd);
      qemu_img_path = getenv("QTEST_QEMU_IMG");
 --
 .20.1
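
The difference between the two checks is easy to see on the boundary values. In this small sketch, fd_check_truthy and fd_check_nonneg are hypothetical names that mirror "g_assert(fd)" and "g_assert(fd >= 0)" respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors "g_assert(fd)": accepts -1 (failure), rejects 0 (a valid fd). */
static bool fd_check_truthy(int fd)
{
    return fd != 0;
}

/* Mirrors "g_assert(fd >= 0)": rejects exactly the failure value -1. */
static bool fd_check_nonneg(int fd)
{
    return fd >= 0;
}
```

The truthy form is wrong in both directions: it lets the mkstemp() failure value through, and it would fire on descriptor 0 if that ever were returned.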

-[PULL 10/39] hw/arm: versal: Move misplaced comment
+[PULL 43/45] tests/qtest/pflash-cfi02-test: Avoid potential integer overflow
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+Coverity points out that we calculate a 64-bit value using 32-bit
 arithmetic; add the cast to force the multiply to be done as 64-bits.
 (The overflow will never happen with the current test data.)
-Move misplaced comment.
+Fixes: Coverity CID 1432320
 Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
 Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Message-id: 20200427181649.26851-3-edgar.iglesias@gmail.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
+Message-id: 20210525134458.6675-5-peter.maydell@linaro.org
 ---
- hw/arm/xlnx-versal.c | 2 +-
+ tests/qtest/pflash-cfi02-test.c | 2 +-
 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
+diff --git a/tests/qtest/pflash-cfi02-test.c b/tests/qtest/pflash-cfi02-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-versal.c
+--- a/tests/qtest/pflash-cfi02-test.c
-+++ b/hw/arm/xlnx-versal.c
++++ b/tests/qtest/pflash-cfi02-test.c
-@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
+@@ -XXX,XX +XXX,XX @@ static void test_geometry(const void *opaque)
-         obj = object_new(XLNX_VERSAL_ACPU_TYPE);
+     for (int region = 0; region < nb_erase_regions; ++region) {
-         if (!obj) {
+         for (uint32_t i = 0; i < c->nb_blocs[region]; ++i) {
--            /* Secondary CPUs start in PSCI powered-down state */
+-            uint64_t byte_addr = i * c->sector_len[region];
-             error_report("Unable to create apu.cpu[%d] of type %s",
++            uint64_t byte_addr = (uint64_t)i * c->sector_len[region];
-                          i, XLNX_VERSAL_ACPU_TYPE);
+             g_assert_cmphex(flash_read(c, byte_addr), ==, bank_mask(c));
              exit(EXIT_FAILURE);
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
          object_property_set_int(obj, s->cfg.psci_conduit,
                                  "psci-conduit", &error_abort);
          if (i) {
 +            /* Secondary CPUs start in PSCI powered-down state */
              object_property_set_bool(obj, true,
                                       "start-powered-off", &error_abort);
          }
+     }
 --
 .20.1
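
The effect of the cast in the pflash-cfi02 patch can be shown with two small functions. This is a sketch with hypothetical helpers; the operand types are assumed to be uint32_t for illustration and may not match the fields in the real test, and it assumes the usual platforms where int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply is carried out in 32 bits and wraps before being widened. */
static uint64_t byte_addr_narrow(uint32_t i, uint32_t sector_len)
{
    return i * sector_len;
}

/* Casting one operand first forces the multiply to be done in 64 bits. */
static uint64_t byte_addr_wide(uint32_t i, uint32_t sector_len)
{
    return (uint64_t)i * sector_len;
}
```

Assigning the result to a uint64_t does not help on its own, because the product has already been truncated by the time the conversion happens.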

-[PULL 21/39] target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
+[PULL 44/45] tests/qtest/tpm-tests: Remove unnecessary NULL checks
-We were accidentally permitting decode of Thumb Neon insns even if
+Coverity points out that in tpm_test_swtpm_migration_test() we
-the CPU didn't have the FEATURE_NEON bit set, because the feature
+assume that src_tpm_addr and dst_tpm_addr are non-NULL (we
-check was being done before the call to disas_neon_data_insn() and
+pass them to tpm_util_migration_start_qemu() which will
-disas_neon_ls_insn() in the Arm decoder but was omitted from the
+unconditionally dereference them) but then later explicitly
-Thumb decoder.  Push the feature bit check down into the called
+check them for NULL. Remove the pointless checks.
-functions so it is done for both Arm and Thumb encodings.
 Fixes: Coverity CID 1432367, 1432359
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
-Message-id: 20200430181003.21682-3-peter.maydell@linaro.org
+Message-id: 20210525134458.6675-6-peter.maydell@linaro.org
 ---
- target/arm/translate.c | 16 ++++++++--------
+ tests/qtest/tpm-tests.c | 12 ++++--------
-file changed, 8 insertions(+), 8 deletions(-)
+file changed, 4 insertions(+), 8 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/tests/qtest/tpm-tests.c
-+++ b/target/arm/translate.c
++++ b/tests/qtest/tpm-tests.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ void tpm_test_swtpm_migration_test(const char *src_tpm_path,
-     TCGv_i32 tmp2;
+     qtest_quit(src_qemu);
-     TCGv_i64 tmp64;
+     tpm_util_swtpm_kill(dst_tpm_pid);
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+-    if (dst_tpm_addr) {
-+        return 1;
+-        g_unlink(dst_tpm_addr->u.q_unix.path);
-+    }
+-        qapi_free_SocketAddress(dst_tpm_addr);
-+
+-    }
-     /* FIXME: this access check should not take precedence over UNDEF
++    g_unlink(dst_tpm_addr->u.q_unix.path);
-      * for invalid encodings; we will generate incorrect syndrome information
++    qapi_free_SocketAddress(dst_tpm_addr);
-      * for attempts to execute invalid vfp/neon encodings with FP disabled.
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+     tpm_util_swtpm_kill(src_tpm_pid);
-     TCGv_ptr ptr1, ptr2, ptr3;
+-    if (src_tpm_addr) {
-     TCGv_i64 tmp64;
+-        g_unlink(src_tpm_addr->u.q_unix.path);
+-        qapi_free_SocketAddress(src_tpm_addr);
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+-    }
-+        return 1;
++    g_unlink(src_tpm_addr->u.q_unix.path);
-+    }
++    qapi_free_SocketAddress(src_tpm_addr);
-+
+ }
      /* FIXME: this access check should not take precedence over UNDEF
       * for invalid encodings; we will generate incorrect syndrome information
       * for attempts to execute invalid vfp/neon encodings with FP disabled.
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
          if (((insn >> 25) & 7) == 1) {
              /* NEON Data processing.  */
 -            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 -                goto illegal_op;
 -            }
 -
              if (disas_neon_data_insn(s, insn)) {
                  goto illegal_op;
              }
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
          }
          if ((insn & 0x0f100000) == 0x04000000) {
              /* NEON load/store.  */
 -            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 -                goto illegal_op;
 -            }
 -
              if (disas_neon_ls_insn(s, insn)) {
                  goto illegal_op;
              }
 --
 .20.1

-[PULL 04/39] target/arm: Use enum constant in get_phys_addr_lpae() call
+[PULL 45/45] tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed
-The access_type argument to get_phys_addr_lpae() is an MMUAccessType;
+Coverity complains that we don't check for failures from dup()
-use the enum constant MMU_DATA_LOAD rather than a literal 0 when we
+and mkstemp(); add asserts that these syscalls succeeded.
 call it in S1_ptw_translate().
+Fixes: Coverity CID 1432516, 1432574
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20200330210400.11724-3-peter.maydell@linaro.org
+Message-id: 20210525134458.6675-7-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 5 +++--
+ tests/unit/test-vmstate.c | 5 ++++-
-file changed, 3 insertions(+), 2 deletions(-)
+file changed, 4 insertions(+), 1 deletion(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/tests/unit/test-vmstate.c b/tests/unit/test-vmstate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/tests/unit/test-vmstate.c
-+++ b/target/arm/helper.c
++++ b/tests/unit/test-vmstate.c
-@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
+@@ -XXX,XX +XXX,XX @@ static int temp_fd;
-             pcacheattrs = &cacheattrs;
+ /* Duplicate temp_fd and seek to the beginning of the file */
-         }
+ static QEMUFile *open_test_file(bool write)
+ {
--        ret = get_phys_addr_lpae(env, addr, 0, ARMMMUIdx_Stage2, &s2pa,
+-    int fd = dup(temp_fd);
--                                 &txattrs, &s2prot, &s2size, fi, pcacheattrs);
++    int fd;
-+        ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
+     QIOChannel *ioc;
-+                                 &s2pa, &txattrs, &s2prot, &s2size, fi,
+     QEMUFile *f;
-+                                 pcacheattrs);
-         if (ret) {
++    fd = dup(temp_fd);
-             assert(fi->type != ARMFault_None);
++    g_assert(fd >= 0);
-             fi->s2addr = addr;
+     lseek(fd, 0, SEEK_SET);
      if (write) {
          g_assert_cmpint(ftruncate(fd, 0), ==, 0);
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
      g_autofree char *temp_file = g_strdup_printf("%s/vmst.test.XXXXXX",
                                                   g_get_tmp_dir());
      temp_fd = mkstemp(temp_file);
 +    g_assert(temp_fd >= 0);
      module_call_init(MODULE_INIT_QOM);
 --
 .20.1

Most of this is the Neon decodetree patches, followed by Edgar's versal cleanups.

thanks
-- PMM

The following changes since commit 2ef486e76d64436be90f7359a3071fb2a56ce835:

Merge remote-tracking branch 'remotes/marcel/tags/rdma-pull-request' into staging (2020-05-03 14:12:56 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200504

for you to fetch changes up to 9aefc6cf9b73f66062d2f914a0136756e7a28211:

target/arm: Move gen_ function typedefs to translate.h (2020-05-04 12:59:26 +0100)

----------------------------------------------------------------
target-arm queue:
 * Start of conversion of Neon insns to decodetree
 * versal board: support SD and RTC
 * Implement ARMv8.2-TTS2UXN
 * Make VQDMULL undefined when U=1
 * Some minor code cleanups

----------------------------------------------------------------
Edgar E. Iglesias (11):
      hw/arm: versal: Remove inclusion of arm_gicv3_common.h
      hw/arm: versal: Move misplaced comment
      hw/arm: versal-virt: Fix typo xlnx-ve -> xlnx-versal
      hw/arm: versal: Embed the UARTs into the SoC type
      hw/arm: versal: Embed the GEMs into the SoC type
      hw/arm: versal: Embed the ADMAs into the SoC type
      hw/arm: versal: Embed the APUs into the SoC type
      hw/arm: versal: Add support for SD
      hw/arm: versal: Add support for the RTC
      hw/arm: versal-virt: Add support for SD
      hw/arm: versal-virt: Add support for the RTC

Fredrik Strupe (1):
      target/arm: Make VQDMULL undefined when U=1

Peter Maydell (25):
      target/arm: Don't use a TLB for ARMMMUIdx_Stage2
      target/arm: Use enum constant in get_phys_addr_lpae() call
      target/arm: Add new 's1_is_el0' argument to get_phys_addr_lpae()
      target/arm: Implement ARMv8.2-TTS2UXN
      target/arm: Use correct variable for setting 'max' cpu's ID_AA64DFR0
      target/arm/translate-vfp.inc.c: Remove duplicate simd_r32 check
      target/arm: Don't allow Thumb Neon insns without FEATURE_NEON
      target/arm: Add stubs for AArch32 Neon decodetree
      target/arm: Convert VCMLA (vector) to decodetree
      target/arm: Convert VCADD (vector) to decodetree
      target/arm: Convert V[US]DOT (vector) to decodetree
      target/arm: Convert VFM[AS]L (vector) to decodetree
      target/arm: Convert VCMLA (scalar) to decodetree
      target/arm: Convert V[US]DOT (scalar) to decodetree
      target/arm: Convert VFM[AS]L (scalar) to decodetree
      target/arm: Convert Neon load/store multiple structures to decodetree
      target/arm: Convert Neon 'load single structure to all lanes' to decodetree
      target/arm: Convert Neon 'load/store single structure' to decodetree
      target/arm: Convert Neon 3-reg-same VADD/VSUB to decodetree
      target/arm: Convert Neon 3-reg-same logic ops to decodetree
      target/arm: Convert Neon 3-reg-same VMAX/VMIN to decodetree
      target/arm: Convert Neon 3-reg-same comparisons to decodetree
      target/arm: Convert Neon 3-reg-same VQADD/VQSUB to decodetree
      target/arm: Convert Neon 3-reg-same VMUL, VMLA, VMLS, VSHL to decodetree
      target/arm: Move gen_ function typedefs to translate.h

Philippe Mathieu-Daudé (2):
      hw/arm/mps2-tz: Use TYPE_IOTKIT instead of hardcoded string
      target/arm: Use uint64_t for midr field in CPU state struct

 include/hw/arm/xlnx-versal.h    |  31 +-
 target/arm/cpu-param.h          |   2 +-
 target/arm/cpu.h                |  38 ++-
 target/arm/translate-a64.h      |   9 -
 target/arm/translate.h          |  26 ++
 target/arm/neon-dp.decode       |  86 +++++
 target/arm/neon-ls.decode       |  52 +++
 target/arm/neon-shared.decode   |  66 ++++
 hw/arm/mps2-tz.c                |   2 +-
 hw/arm/xlnx-versal-virt.c       |  74 ++++-
 hw/arm/xlnx-versal.c            | 115 +++++--
 target/arm/cpu.c                |   3 +-
 target/arm/cpu64.c              |   8 +-
 target/arm/helper.c             | 183 ++++------
 target/arm/translate-a64.c      |  17 -
 target/arm/translate-neon.inc.c | 714 +++++++++++++++++++++++++++++++++++++++
 target/arm/translate-vfp.inc.c  |   6 -
 target/arm/translate.c          | 716 +++-------------------------------------
 target/arm/Makefile.objs        |  18 +
 19 files changed, 1302 insertions(+), 864 deletions(-)
 create mode 100644 target/arm/neon-dp.decode
 create mode 100644 target/arm/neon-ls.decode
 create mode 100644 target/arm/neon-shared.decode
 create mode 100644 target/arm/translate-neon.inc.c

From: Fredrik Strupe <fredrik@strupe.net>

According to Arm ARM, VQDMULL is only valid when U=0, while having
U=1 is unallocated.
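
For illustration, the fourth column of each row in the table touched by
this patch can be read as a bitmask of UNDEF conditions. The sketch below
assumes the encoding documented next to that table in translate.c: bits
[2:0] request UNDEF for element size 0, 1 or 2, and bit 3 requests UNDEF
when U == 1. The helper name undef_required() is hypothetical and is not
part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a QEMU function): returns true if an
 * instruction with the given size and U fields must UNDEF according
 * to the 'undefreq' bitmask from its table row.
 * Assumed encoding: bit 0..2 = UNDEF if size == 0..2, bit 3 = UNDEF if U == 1.
 */
static bool undef_required(int undefreq, int size, int u)
{
    /* size == 3 selects a different instruction group, so only 0..2 here. */
    assert(size >= 0 && size <= 2);

    /* Bits [2:0]: UNDEF for the matching element size. */
    if (undefreq & (1 << size)) {
        return true;
    }
    /* Bit 3: UNDEF when the U bit is set. */
    return (undefreq & 8) && u;
}
```

Under that reading, the old VQDMULL value 1 rejected only size == 0, so
an encoding with U == 1 was accepted; the new value 9 also rejects U == 1,
matching the VQDMLSL row.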

Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
Fixes: 695272dcb976 ("target-arm: Handle UNDEF cases for Neon 3-regs-different-widths")
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 0}, /* VMLSL */
                     {0, 0, 0, 9}, /* VQDMLSL */
                     {0, 0, 0, 0}, /* Integer VMULL */
-                    {0, 0, 0, 1}, /* VQDMULL */
+                    {0, 0, 0, 9}, /* VQDMULL */
                     {0, 0, 0, 0xa}, /* Polynomial VMULL */
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

By using the TYPE_* definitions for devices, we can:
 - quickly find where devices are used with 'git-grep'
 - easily rename a device (one-line change).

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200428154650.21991-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/mps2-tz.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
         exit(EXIT_FAILURE);
     }
 
-    sysbus_init_child_obj(OBJECT(machine), "iotkit", &mms->iotkit,
+    sysbus_init_child_obj(OBJECT(machine), TYPE_IOTKIT, &mms->iotkit,
                           sizeof(mms->iotkit), mmc->armsse_type);
     iotkitdev = DEVICE(&mms->iotkit);
     object_property_set_link(OBJECT(&mms->iotkit), OBJECT(system_memory),
-- 
2.20.1

We define ARMMMUIdx_Stage2 as being an MMU index which uses a QEMU
TLB.  However we never actually use the TLB -- all stage 2 lookups
are done by direct calls to get_phys_addr_lpae() followed by a
physical address load via address_space_ld*().

Remove Stage2 from the list of ARM MMU indexes which correspond to
real core MMU indexes, and instead put it in the set of "NOTLB" ARM
MMU indexes.

This allows us to drop NB_MMU_MODES to 11.  It also means we can
safely add support for the ARMv8.2-TTS2UXN extension, which adds
permission bits to the stage 2 descriptors which define execute
permission separately for EL0 and EL1; supporting that while keeping
Stage2 in a QEMU TLB would require us to use separate TLBs for
"Stage2 for an EL0 access" and "Stage2 for an EL1 access", which is a
lot of extra complication given we aren't even using the QEMU TLB.

In the process of updating the comment on our MMU index use,
fix a couple of other minor errors:
 * NS EL2 EL2&0 was missing from the list in the comment
 * some text hadn't been updated from when we bumped NB_MMU_MODES
   above 8
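
As a rough sketch of why moving Stage2 into the "NOTLB" group lets
NB_MMU_MODES shrink: an ARMMMUIdx value combines a small index in the low
bits with a type tag in the higher bits, and only indexes tagged as
TLB-backed need a core TLB. The index values (10, 11, 3) are taken from
the cpu.h hunk in this patch; the numeric values of the tag and mask
macros, and the SKETCH_* and helper names, are assumptions made for the
example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed tag and mask values; check target/arm/cpu.h for the real ones. */
#define ARM_MMU_IDX_A            0x10  /* A profile, backed by a core TLB */
#define ARM_MMU_IDX_NOTLB        0x20  /* no core TLB behind this index */
#define ARM_MMU_IDX_TYPE_MASK    0x70
#define ARM_MMU_IDX_COREIDX_MASK 0x0f

/* Hypothetical names mirroring the enum entries in the patch. */
enum {
    SKETCH_SE3        = 10 | ARM_MMU_IDX_A,
    SKETCH_STAGE2_OLD = 11 | ARM_MMU_IDX_A,     /* before this patch */
    SKETCH_STAGE2_NEW = 3 | ARM_MMU_IDX_NOTLB,  /* after this patch */
};

/* True if the index is backed by a core TLB. */
static bool has_core_tlb(int idx)
{
    return (idx & ARM_MMU_IDX_TYPE_MASK) != ARM_MMU_IDX_NOTLB;
}

/* Core TLB number; only meaningful when has_core_tlb() is true. */
static int core_idx(int idx)
{
    return idx & ARM_MMU_IDX_COREIDX_MASK;
}
```

With Stage2 at core index 11, twelve core TLBs (0..11) were required;
once it is tagged NOTLB, the highest TLB-backed A-profile index is 10,
which gives the 11 modes stated above.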

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200330210400.11724-2-peter.maydell@linaro.org
---
 target/arm/cpu-param.h |   2 +-
 target/arm/cpu.h       |  21 +++++---
 target/arm/helper.c    | 112 ++++-------------------------------------
 3 files changed, 27 insertions(+), 108 deletions(-)

diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -XXX,XX +XXX,XX @@
 # define TARGET_PAGE_BITS_MIN  10
 #endif
 
-#define NB_MMU_MODES 12
+#define NB_MMU_MODES 11
 
 #endif
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  *     handling via the TLB. The only way to do a stage 1 translation without
  *     the immediate stage 2 translation is via the ATS or AT system insns,
  *     which can be slow-pathed and always do a page table walk.
+ *     The only use of stage 2 translations is either as part of an s1+2
+ *     lookup or when loading the descriptors during a stage 1 page table walk,
+ *     and in both those cases we don't use the TLB.
  *  4. we can also safely fold together the "32 bit EL3" and "64 bit EL3"
  *     translation regimes, because they map reasonably well to each other
  *     and they can't both be active at the same time.
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  * NS EL1 EL1&0 stage 1+2 (aka NS PL1)
  * NS EL1 EL1&0 stage 1+2 +PAN
  * NS EL0 EL2&0
+ * NS EL2 EL2&0
  * NS EL2 EL2&0 +PAN
  * NS EL2 (aka NS PL2)
  * S EL0 EL1&0 (aka S PL0)
  * S EL1 EL1&0 (not used if EL3 is 32 bit)
  * S EL1 EL1&0 +PAN
  * S EL3 (aka S PL1)
- * NS EL1&0 stage 2
  *
- * for a total of 12 different mmu_idx.
+ * for a total of 11 different mmu_idx.
  *
  * R profile CPUs have an MPU, but can use the same set of MMU indexes
  * as A profile. They only need to distinguish NS EL0 and NS EL1 (and
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  * are not quite the same -- different CPU types (most notably M profile
  * vs A/R profile) would like to use MMU indexes with different semantics,
  * but since we don't ever need to use all of those in a single CPU we
- * can avoid setting NB_MMU_MODES to more than 8. The lower bits of
+ * can avoid having to set NB_MMU_MODES to "total number of A profile MMU
+ * modes + total number of M profile MMU modes". The lower bits of
  * ARMMMUIdx are the core TLB mmu index, and the higher bits are always
  * the same for any particular CPU.
  * Variables of type ARMMUIdx are always full values, and the core
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
     ARMMMUIdx_SE10_1_PAN = 9 | ARM_MMU_IDX_A,
     ARMMMUIdx_SE3        = 10 | ARM_MMU_IDX_A,
 
-    ARMMMUIdx_Stage2     = 11 | ARM_MMU_IDX_A,
-
     /*
      * These are not allocated TLBs and are used only for AT system
      * instructions or for the first stage of an S12 page table walk.
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
     ARMMMUIdx_Stage1_E0 = 0 | ARM_MMU_IDX_NOTLB,
     ARMMMUIdx_Stage1_E1 = 1 | ARM_MMU_IDX_NOTLB,
     ARMMMUIdx_Stage1_E1_PAN = 2 | ARM_MMU_IDX_NOTLB,
+    /*
+     * Not allocated a TLB: used only for second stage of an S12 page
+     * table walk, or for descriptor loads during first stage of an S1
+     * page table walk. Note that if we ever want to have a TLB for this
+     * then various TLB flush insns which currently are no-ops or flush
+     * only stage 1 MMU indexes will need to change to flush stage 2.
+     */
+    ARMMMUIdx_Stage2     = 3 | ARM_MMU_IDX_NOTLB,
 
     /*
      * M-profile.
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdxBit {
     TO_CORE_BIT(SE10_1),
     TO_CORE_BIT(SE10_1_PAN),
     TO_CORE_BIT(SE3),
-    TO_CORE_BIT(Stage2),
 
     TO_CORE_BIT(MUser),
     TO_CORE_BIT(MPriv),
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri,
     tlb_flush_by_mmuidx(cs,
                         ARMMMUIdxBit_E10_1 |
                         ARMMMUIdxBit_E10_1_PAN |
-                        ARMMMUIdxBit_E10_0 |
-                        ARMMMUIdxBit_Stage2);
+                        ARMMMUIdxBit_E10_0);
 }
 
 static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ static void tlbiall_nsnh_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
     tlb_flush_by_mmuidx_all_cpus_synced(cs,
                                         ARMMMUIdxBit_E10_1 |
                                         ARMMMUIdxBit_E10_1_PAN |
-                                        ARMMMUIdxBit_E10_0 |
-                                        ARMMMUIdxBit_Stage2);
+                                        ARMMMUIdxBit_E10_0);
 }
 
-static void tlbiipas2_write(CPUARMState *env, const ARMCPRegInfo *ri,
-                            uint64_t value)
-{
-    /* Invalidate by IPA. This has to invalidate any structures that
-     * contain only stage 2 translation information, but does not need
-     * to apply to structures that contain combined stage 1 and stage 2
-     * translation information.
-     * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
-     */
-    CPUState *cs = env_cpu(env);
-    uint64_t pageaddr;
-
-    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
-        return;
-    }
-
-    pageaddr = sextract64(value << 12, 0, 40);
-
-    tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
-}
-
-static void tlbiipas2_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
-                               uint64_t value)
-{
-    CPUState *cs = env_cpu(env);
-    uint64_t pageaddr;
-
-    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
-        return;
-    }
-
-    pageaddr = sextract64(value << 12, 0, 40);
-
-    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
-                                             ARMMMUIdxBit_Stage2);
-}
 
 static void tlbiall_hyp_write(CPUARMState *env, const ARMCPRegInfo *ri,
                               uint64_t value)
@@ -XXX,XX +XXX,XX @@ static void vttbr_write(CPUARMState *env, const ARMCPRegInfo *ri,
         tlb_flush_by_mmuidx(cs,
                             ARMMMUIdxBit_E10_1 |
                             ARMMMUIdxBit_E10_1_PAN |
-                            ARMMMUIdxBit_E10_0 |
-                            ARMMMUIdxBit_Stage2);
+                            ARMMMUIdxBit_E10_0);
         raw_write(env, ri, value);
     }
 }
@@ -XXX,XX +XXX,XX @@ static int alle1_tlbmask(CPUARMState *env)
         return ARMMMUIdxBit_SE10_1 |
                ARMMMUIdxBit_SE10_1_PAN |
                ARMMMUIdxBit_SE10_0;
-    } else if (arm_feature(env, ARM_FEATURE_EL2)) {
-        return ARMMMUIdxBit_E10_1 |
-               ARMMMUIdxBit_E10_1_PAN |
-               ARMMMUIdxBit_E10_0 |
-               ARMMMUIdxBit_Stage2;
     } else {
         return ARMMMUIdxBit_E10_1 |
                ARMMMUIdxBit_E10_1_PAN |
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                              ARMMMUIdxBit_SE3);
 }
 
-static void tlbi_aa64_ipas2e1_write(CPUARMState *env, const ARMCPRegInfo *ri,
-                                    uint64_t value)
-{
-    /* Invalidate by IPA. This has to invalidate any structures that
-     * contain only stage 2 translation information, but does not need
-     * to apply to structures that contain combined stage 1 and stage 2
-     * translation information.
-     * This must NOP if EL2 isn't implemented or SCR_EL3.NS is zero.
-     */
-    ARMCPU *cpu = env_archcpu(env);
-    CPUState *cs = CPU(cpu);
-    uint64_t pageaddr;
-
-    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
-        return;
-    }
-
-    pageaddr = sextract64(value << 12, 0, 48);
-
-    tlb_flush_page_by_mmuidx(cs, pageaddr, ARMMMUIdxBit_Stage2);
-}
-
-static void tlbi_aa64_ipas2e1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
-                                      uint64_t value)
-{
-    CPUState *cs = env_cpu(env);
-    uint64_t pageaddr;
-
-    if (!arm_feature(env, ARM_FEATURE_EL2) || !(env->cp15.scr_el3 & SCR_NS)) {
-        return;
-    }
-
-    pageaddr = sextract64(value << 12, 0, 48);
-
-    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
-                                             ARMMMUIdxBit_Stage2);
-}
-
 static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri,
                                       bool isread)
 {
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbi_aa64_vae1_write },
     { .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
-      .writefn = tlbi_aa64_ipas2e1is_write },
+      .access = PL2_W, .type = ARM_CP_NOP },
     { .name = "TLBI_IPAS2LE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
-      .writefn = tlbi_aa64_ipas2e1is_write },
+      .access = PL2_W, .type = ARM_CP_NOP },
     { .name = "TLBI_ALLE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 3, .opc2 = 4,
       .access = PL2_W, .type = ARM_CP_NO_RAW,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbi_aa64_alle1is_write },
     { .name = "TLBI_IPAS2E1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
-      .writefn = tlbi_aa64_ipas2e1_write },
+      .access = PL2_W, .type = ARM_CP_NOP },
     { .name = "TLBI_IPAS2LE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
-      .access = PL2_W, .type = ARM_CP_NO_RAW,
-      .writefn = tlbi_aa64_ipas2e1_write },
+      .access = PL2_W, .type = ARM_CP_NOP },
     { .name = "TLBI_ALLE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 4,
       .access = PL2_W, .type = ARM_CP_NO_RAW,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbimva_hyp_is_write },
     { .name = "TLBIIPAS2",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-      .writefn = tlbiipas2_write },
+      .type = ARM_CP_NOP, .access = PL2_W },
     { .name = "TLBIIPAS2IS",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-      .writefn = tlbiipas2_is_write },
+      .type = ARM_CP_NOP, .access = PL2_W },
     { .name = "TLBIIPAS2L",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 4, .opc2 = 5,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-      .writefn = tlbiipas2_write },
+      .type = ARM_CP_NOP, .access = PL2_W },
     { .name = "TLBIIPAS2LIS",
       .cp = 15, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 5,
-      .type = ARM_CP_NO_RAW, .access = PL2_W,
-      .writefn = tlbiipas2_is_write },
+      .type = ARM_CP_NOP, .access = PL2_W },
     /* 32 bit cache operations */
     { .name = "ICIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
-- 
2.20.1

The access_type argument to get_phys_addr_lpae() is an MMUAccessType;
use the enum constant MMU_DATA_LOAD rather than a literal 0 when we
call it in S1_ptw_translate().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200330210400.11724-3-peter.maydell@linaro.org
---
 target/arm/helper.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
             pcacheattrs = &cacheattrs;
         }
 
-        ret = get_phys_addr_lpae(env, addr, 0, ARMMMUIdx_Stage2, &s2pa,
-                                 &txattrs, &s2prot, &s2size, fi, pcacheattrs);
+        ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
+                                 &s2pa, &txattrs, &s2prot, &s2size, fi,
+                                 pcacheattrs);
         if (ret) {
             assert(fi->type != ARMFault_None);
             fi->s2addr = addr;
-- 
2.20.1

For ARMv8.2-TTS2UXN, the stage 2 page table walk wants to know
whether the stage 1 access is for EL0 or not, because whether
exec permission is given can depend on whether this is an EL0
or EL1 access. Add a new argument to get_phys_addr_lpae() so
the call sites can pass this information in.

Since get_phys_addr_lpae() doesn't already have a doc comment,
add one so we have a place to put the documentation of the
semantics of the new s1_is_el0 argument.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200330210400.11724-4-peter.maydell@linaro.org
---
 target/arm/helper.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@
 
 static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
                                MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                               bool s1_is_el0,
                                hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
                                target_ulong *page_size_ptr,
                                ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs);
@@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
         }
 
         ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, ARMMMUIdx_Stage2,
+                                 false,
                                  &s2pa, &txattrs, &s2prot, &s2size, fi,
                                  pcacheattrs);
         if (ret) {
@@ -XXX,XX +XXX,XX @@ static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
     };
 }
 
+/**
+ * get_phys_addr_lpae: perform one stage of page table walk, LPAE format
+ *
+ * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
+ * prot and page_size may not be filled in, and the populated fsr value provides
+ * information on why the translation aborted, in the format of a long-format
+ * DFSR/IFSR fault register, with the following caveats:
+ *  * the WnR bit is never set (the caller must do this).
+ *
+ * @env: CPUARMState
+ * @address: virtual address to get physical address for
+ * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
+ * @mmu_idx: MMU index indicating required translation regime
+ * @s1_is_el0: if @mmu_idx is ARMMMUIdx_Stage2 (so this is a stage 2 page table
+ *             walk), must be true if this is stage 2 of a stage 1+2 walk for an
+ *             EL0 access. If @mmu_idx is anything else, @s1_is_el0 is ignored.
+ * @phys_ptr: set to the physical address corresponding to the virtual address
+ * @attrs: set to the memory transaction attributes to use
+ * @prot: set to the permissions for the page containing phys_ptr
+ * @page_size_ptr: set to the size of the page containing phys_ptr
+ * @fi: set to fault info if the translation fails
+ * @cacheattrs: (if non-NULL) set to the cacheability/shareability attributes
+ */
 static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
                                MMUAccessType access_type, ARMMMUIdx mmu_idx,
+                               bool s1_is_el0,
                                hwaddr *phys_ptr, MemTxAttrs *txattrs, int *prot,
                                target_ulong *page_size_ptr,
                                ARMMMUFaultInfo *fi, ARMCacheAttrs *cacheattrs)
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 
             /* S1 is done. Now do S2 translation.  */
             ret = get_phys_addr_lpae(env, ipa, access_type, ARMMMUIdx_Stage2,
+                                     mmu_idx == ARMMMUIdx_E10_0,
                                      phys_ptr, attrs, &s2_prot,
                                      page_size, fi,
                                      cacheattrs != NULL ? &cacheattrs2 : NULL);
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
     }
 
     if (regime_using_lpae_format(env, mmu_idx)) {
-        return get_phys_addr_lpae(env, address, access_type, mmu_idx,
+        return get_phys_addr_lpae(env, address, access_type, mmu_idx, false,
                                   phys_ptr, attrs, prot, page_size,
                                   fi, cacheattrs);
     } else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
-- 
2.20.1

The ARMv8.2-TTS2UXN feature extends the XN field in stage 2
translation table descriptors from just bit [54] to bits [54:53],
allowing stage 2 to control execution permissions separately for EL0
and EL1. Implement the new semantics of the XN field and enable
the feature for our 'max' CPU.
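
A minimal sketch of the extended XN semantics, assuming the two-bit
encoding as I understand it from the Arm ARM (00: executable at EL1 and
EL0; 01: EL0 only; 10: neither; 11: EL1 only). The function names are
hypothetical, and the sketch ignores the other conditions the real
permission check applies; verify the encoding against the architecture
manual before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: extract XN[1:0] from bits [54:53] of a descriptor. */
static unsigned s2_xn_field(uint64_t descriptor)
{
    return (unsigned)((descriptor >> 53) & 3);
}

/*
 * Hypothetical helper: does stage 2 permit execution?
 * s1_is_el0 is true when the stage 1 access came from EL0.
 */
static bool s2_exec_allowed(unsigned xn, bool s1_is_el0)
{
    switch (xn & 3) {
    case 0:
        return true;        /* executable at both EL1 and EL0 */
    case 1:
        return s1_is_el0;   /* executable at EL0 only */
    case 2:
        return false;       /* not executable at either level */
    case 3:
        return !s1_is_el0;  /* executable at EL1 only */
    }
    return false;           /* not reached: xn & 3 is always 0..3 */
}
```

Under this encoding a descriptor that sets only bit [54] yields XN == 2
and one that clears both bits yields XN == 0, so descriptors written for
the single-bit XN field keep their meaning.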

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200330210400.11724-5-peter.maydell@linaro.org
---
 target/arm/cpu.h    | 15 +++++++++++++++
 target/arm/cpu.c    |  1 +
 target/arm/cpu64.c  |  2 ++
 target/arm/helper.c | 37 +++++++++++++++++++++++++++++++------
 4 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_ccidx(const ARMISARegisters *id)
     return FIELD_EX32(id->id_mmfr4, ID_MMFR4, CCIDX) != 0;
 }
 
+static inline bool isar_feature_aa32_tts2uxn(const ARMISARegisters *id)
+{
+    return FIELD_EX32(id->id_mmfr4, ID_MMFR4, XNX) != 0;
+}
+
 /*
  * 64-bit feature tests via id registers.
  */
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
 }
 
+static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
+}
+
 /*
  * Feature tests for "does this exist in either 32-bit or 64-bit?"
  */
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_any_ccidx(const ARMISARegisters *id)
     return isar_feature_aa64_ccidx(id) || isar_feature_aa32_ccidx(id);
 }
 
+static inline bool isar_feature_any_tts2uxn(const ARMISARegisters *id)
+{
+    return isar_feature_aa64_tts2uxn(id) || isar_feature_aa32_tts2uxn(id);
+}
+
 /*
  * Forward to the above feature tests given an ARMCPU pointer.
  */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
             t = FIELD_DP32(t, ID_MMFR4, HPDS, 1); /* AA32HPD */
             t = FIELD_DP32(t, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
             t = FIELD_DP32(t, ID_MMFR4, CNP, 1); /* TTCNP */
+            t = FIELD_DP32(t, ID_MMFR4, XNX, 1); /* TTS2UXN */
             cpu->isar.id_mmfr4 = t;
         }
 #endif
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);
         t = FIELD_DP64(t, ID_AA64MMFR1, PAN, 2); /* ATS1E1 */
         t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* VMID16 */
+        t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1); /* TTS2UXN */
         cpu->isar.id_aa64mmfr1 = t;
 
         t = cpu->isar.id_aa64mmfr2;
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = FIELD_DP32(u, ID_MMFR4, HPDS, 1); /* AA32HPD */
         u = FIELD_DP32(u, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
         u = FIELD_DP32(u, ID_MMFR4, CNP, 1); /* TTCNP */
+        u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
         cpu->isar.id_mmfr4 = u;
 
         u = cpu->isar.id_aa64dfr0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
  *
  * @env:     CPUARMState
  * @s2ap:    The 2-bit stage2 access permissions (S2AP)
- * @xn:      XN (execute-never) bit
+ * @xn:      XN (execute-never) bits
+ * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
  */
-static int get_S2prot(CPUARMState *env, int s2ap, int xn)
+static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
 {
     int prot = 0;
 
@@ -XXX,XX +XXX,XX @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn)
     if (s2ap & 2) {
         prot |= PAGE_WRITE;
     }
-    if (!xn) {
-        if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
+
+    if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
+        switch (xn) {
+        case 0:
             prot |= PAGE_EXEC;
+            break;
+        case 1:
+            if (s1_is_el0) {
+                prot |= PAGE_EXEC;
+            }
+            break;
+        case 2:
+            break;
+        case 3:
+            if (!s1_is_el0) {
+                prot |= PAGE_EXEC;
+            }
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    } else {
+        if (!extract32(xn, 1, 1)) {
+            if (arm_el_is_aa64(env, 2) || prot & PAGE_READ) {
+                prot |= PAGE_EXEC;
+            }
         }
     }
     return prot;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
     }
 
     ap = extract32(attrs, 4, 2);
-    xn = extract32(attrs, 12, 1);
 
     if (mmu_idx == ARMMMUIdx_Stage2) {
         ns = true;
-        *prot = get_S2prot(env, ap, xn);
+        xn = extract32(attrs, 11, 2);
+        *prot = get_S2prot(env, ap, xn, s1_is_el0);
     } else {
         ns = extract32(attrs, 3, 1);
+        xn = extract32(attrs, 12, 1);
         pxn = extract32(attrs, 11, 1);
         *prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
     }
-- 
2.20.1
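
The XN decoding in the patch above can be condensed into a small truth-table function. This is only a sketch: s2_exec_allowed() and its have_tts2uxn parameter are hypothetical names, not QEMU functions, and the fallback path deliberately leaves out the extra "AArch64 EL2 or page readable" condition that get_S2prot() applies when the feature is absent.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper for illustration only; it is not a QEMU function.
 * xn is the 2-bit field taken from descriptor bits [54:53].
 */
static bool s2_exec_allowed(unsigned xn, bool s1_is_el0, bool have_tts2uxn)
{
    assert(xn <= 3);

    if (!have_tts2uxn) {
        /* Without the feature only XN[1] (descriptor bit 54) matters. */
        return (xn & 2) == 0;
    }

    switch (xn) {
    case 0:
        return true;        /* executable at both EL0 and EL1 */
    case 1:
        return s1_is_el0;   /* executable at EL0 only */
    case 2:
        return false;       /* never executable */
    default:
        return !s1_is_el0;  /* xn == 3: executable at EL1 only */
    }
}
```

With the feature present, encodings 1 and 3 are the ones whose result depends on whether the stage 1 walk was for EL0. Without it, bit [53] is ignored, so encoding 1 behaves like 0 and encoding 3 behaves like 2.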

In aarch64_max_initfn() we update both 32-bit and 64-bit ID
registers.  The intended pattern is that for 64-bit ID registers we
use FIELD_DP64 and the uint64_t 't' variable, while 32-bit ID
registers use FIELD_DP32 and the uint32_t 'u' variable.  For
ID_AA64DFR0 we accidentally used 'u', meaning that the top 32 bits of
this 64-bit ID register would end up always zero.  Luckily at the
moment that's what they should be anyway, so this bug has no visible
effects.

Use the right-sized variable.

Fixes: 3bec78447a958d481991
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200423110915.10527-1-peter.maydell@linaro.org
---
 target/arm/cpu64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = FIELD_DP32(u, ID_MMFR4, XNX, 1); /* TTS2UXN */
         cpu->isar.id_mmfr4 = u;
 
-        u = cpu->isar.id_aa64dfr0;
-        u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
-        cpu->isar.id_aa64dfr0 = u;
+        t = cpu->isar.id_aa64dfr0;
+        t = FIELD_DP64(t, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
+        cpu->isar.id_aa64dfr0 = t;
 
         u = cpu->isar.id_dfr0;
         u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
-- 
2.20.1
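
The truncation described in the commit message above can be reproduced outside QEMU. The sketch below is an illustration under assumptions: deposit_field64() is a hypothetical stand-in for a field deposit rather than QEMU's FIELD_DP64 macro, and the field position (4 bits at bit 8) is assumed here for PMUVER.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a field deposit; not QEMU's FIELD_DP64 macro. */
static uint64_t deposit_field64(uint64_t reg, unsigned shift, unsigned len,
                                uint64_t val)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    uint64_t mask = ((UINT64_C(1) << len) - 1) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Buggy pattern: a 64-bit register round-trips through a 32-bit temporary. */
static uint64_t update_via_u32(uint64_t reg)
{
    uint32_t u = (uint32_t)reg;                 /* top 32 bits dropped here */
    u = (uint32_t)deposit_field64(u, 8, 4, 5);
    return u;
}

/* Fixed pattern: the temporary is as wide as the register. */
static uint64_t update_via_u64(uint64_t reg)
{
    uint64_t t = reg;
    t = deposit_field64(t, 8, 4, 5);
    return t;
}
```

For any register value with the top half clear, both functions return the same result, which is why the bug had no visible effect.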

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

MIDR_EL1 is a 64-bit system register with the top 32 bits being RES0.
Represent it in QEMU's ARMCPU struct with a uint64_t, not a
uint32_t.

This fixes an error when compiling with -Werror=conversion
because we were manipulating the register value using a
local uint64_t variable:

target/arm/cpu64.c: In function ‘aarch64_max_initfn’:
  target/arm/cpu64.c:628:21: error: conversion from ‘uint64_t’ {aka ‘long unsigned int’} to ‘uint32_t’ {aka ‘unsigned int’} may change value [-Werror=conversion]
    628 |         cpu->midr = t;
        |                     ^

and future-proofs us against a possible future architecture
change using some of the top 32 bits.

Suggested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Message-id: 20200428172634.29707-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h | 2 +-
 target/arm/cpu.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
         uint64_t id_aa64dfr0;
         uint64_t id_aa64dfr1;
     } isar;
-    uint32_t midr;
+    uint64_t midr;
     uint32_t revidr;
     uint32_t reset_fpsid;
     uint32_t ctr;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_cpus[] = {
 static Property arm_cpu_properties[] = {
     DEFINE_PROP_BOOL("start-powered-off", ARMCPU, start_powered_off, false),
     DEFINE_PROP_UINT32("psci-conduit", ARMCPU, psci_conduit, 0),
-    DEFINE_PROP_UINT32("midr", ARMCPU, midr, 0),
+    DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
     DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
                         mp_affinity, ARM64_AFFINITY_INVALID),
     DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Move misplaced comment.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-3-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-versal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
 
         obj = object_new(XLNX_VERSAL_ACPU_TYPE);
         if (!obj) {
-            /* Secondary CPUs start in PSCI powered-down state */
             error_report("Unable to create apu.cpu[%d] of type %s",
                          i, XLNX_VERSAL_ACPU_TYPE);
             exit(EXIT_FAILURE);
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
         object_property_set_int(obj, s->cfg.psci_conduit,
                                 "psci-conduit", &error_abort);
         if (i) {
+            /* Secondary CPUs start in PSCI powered-down state */
             object_property_set_bool(obj, true,
                                      "start-powered-off", &error_abort);
         }
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Fix typo xlnx-ve -> xlnx-versal.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-4-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-versal-virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
         psci_conduit = QEMU_PSCI_CONDUIT_SMC;
     }
 
-    sysbus_init_child_obj(OBJECT(machine), "xlnx-ve", &s->soc,
+    sysbus_init_child_obj(OBJECT(machine), "xlnx-versal", &s->soc,
                           sizeof(s->soc), TYPE_XLNX_VERSAL);
     object_property_set_link(OBJECT(&s->soc), OBJECT(machine->ram),
                              "ddr", &error_abort);
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Embed the UARTs into the SoC type.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-5-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h |  3 ++-
 hw/arm/xlnx-versal.c         | 12 ++++++------
 2 files changed, 8 insertions(+), 7 deletions(-)

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Embed the GEMs into the SoC type.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-6-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h |  3 ++-
 hw/arm/xlnx-versal.c         | 15 ++++++++-------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-versal.h
+++ b/include/hw/arm/xlnx-versal.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/arm/boot.h"
 #include "hw/intc/arm_gicv3.h"
 #include "hw/char/pl011.h"
+#include "hw/net/cadence_gem.h"
 
 #define TYPE_XLNX_VERSAL "xlnx-versal"
 #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 
         struct {
             PL011State uart[XLNX_VERSAL_NR_UARTS];
-            SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
+            CadenceGEMState gem[XLNX_VERSAL_NR_GEMS];
             SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
         } iou;
     } lpd;
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_gems(Versal *s, qemu_irq *pic)
         DeviceState *dev;
         MemoryRegion *mr;
 
-        dev = qdev_create(NULL, "cadence_gem");
-        s->lpd.iou.gem[i] = SYS_BUS_DEVICE(dev);
-        object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
+        sysbus_init_child_obj(OBJECT(s), name,
+                              &s->lpd.iou.gem[i], sizeof(s->lpd.iou.gem[i]),
+                              TYPE_CADENCE_GEM);
+        dev = DEVICE(&s->lpd.iou.gem[i]);
         if (nd->used) {
             qemu_check_nic_model(nd, "cadence_gem");
             qdev_set_nic_properties(dev, nd);
         }
-        object_property_set_int(OBJECT(s->lpd.iou.gem[i]),
+        object_property_set_int(OBJECT(dev),
                                 2, "num-priority-queues",
                                 &error_abort);
-        object_property_set_link(OBJECT(s->lpd.iou.gem[i]),
+        object_property_set_link(OBJECT(dev),
                                  OBJECT(&s->mr_ps), "dma",
                                  &error_abort);
         qdev_init_nofail(dev);
 
-        mr = sysbus_mmio_get_region(s->lpd.iou.gem[i], 0);
+        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
         memory_region_add_subregion(&s->mr_ps, addrs[i], mr);
 
-        sysbus_connect_irq(s->lpd.iou.gem[i], 0, pic[irqs[i]]);
+        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[irqs[i]]);
         g_free(name);
     }
 }
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Embed the ADMAs into the SoC type.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-7-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h |  3 ++-
 hw/arm/xlnx-versal.c         | 14 +++++++-------
 2 files changed, 9 insertions(+), 8 deletions(-)

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Embed the APUs into the SoC type.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-8-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h |  2 +-
 hw/arm/xlnx-versal-virt.c    |  4 ++--
 hw/arm/xlnx-versal.c         | 19 +++++--------------
 3 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-versal.h
+++ b/include/hw/arm/xlnx-versal.h
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
     struct {
         struct {
             MemoryRegion mr;
-            ARMCPU *cpu[XLNX_VERSAL_NR_ACPUS];
+            ARMCPU cpu[XLNX_VERSAL_NR_ACPUS];
             GICv3State gic;
         } apu;
     } fpd;
diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
     s->binfo.get_dtb = versal_virt_get_dtb;
     s->binfo.modify_dtb = versal_virt_modify_dtb;
     if (machine->kernel_filename) {
-        arm_load_kernel(s->soc.fpd.apu.cpu[0], machine, &s->binfo);
+        arm_load_kernel(&s->soc.fpd.apu.cpu[0], machine, &s->binfo);
     } else {
-        AddressSpace *as = arm_boot_address_space(s->soc.fpd.apu.cpu[0],
+        AddressSpace *as = arm_boot_address_space(&s->soc.fpd.apu.cpu[0],
                                                   &s->binfo);
         /* Some boot-loaders (e.g u-boot) don't like blobs at address 0 (NULL).
          * Offset things by 4K.  */
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
 
     for (i = 0; i < ARRAY_SIZE(s->fpd.apu.cpu); i++) {
         Object *obj;
-        char *name;
-
-        obj = object_new(XLNX_VERSAL_ACPU_TYPE);
-        if (!obj) {
-            error_report("Unable to create apu.cpu[%d] of type %s",
-                         i, XLNX_VERSAL_ACPU_TYPE);
-            exit(EXIT_FAILURE);
-        }
-
-        name = g_strdup_printf("apu-cpu[%d]", i);
-        object_property_add_child(OBJECT(s), name, obj, &error_fatal);
-        g_free(name);
 
+        object_initialize_child(OBJECT(s), "apu-cpu[*]",
+                                &s->fpd.apu.cpu[i], sizeof(s->fpd.apu.cpu[i]),
+                                XLNX_VERSAL_ACPU_TYPE, &error_abort, NULL);
+        obj = OBJECT(&s->fpd.apu.cpu[i]);
         object_property_set_int(obj, s->cfg.psci_conduit,
                                 "psci-conduit", &error_abort);
         if (i) {
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_cpus(Versal *s)
         object_property_set_link(obj, OBJECT(&s->fpd.apu.mr), "memory",
                                  &error_abort);
         object_property_set_bool(obj, true, "realized", &error_fatal);
-        s->fpd.apu.cpu[i] = ARM_CPU(obj);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void versal_create_apu_gic(Versal *s, qemu_irq *pic)
     }
 
     for (i = 0; i < nr_apu_cpus; i++) {
-        DeviceState *cpudev = DEVICE(s->fpd.apu.cpu[i]);
+        DeviceState *cpudev = DEVICE(&s->fpd.apu.cpu[i]);
         int ppibase = XLNX_VERSAL_NR_IRQS + i * GIC_INTERNAL + GIC_NR_SGIS;
         qemu_irq maint_irq;
         int ti;
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Add support for SD.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-9-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h | 12 ++++++++++++
 hw/arm/xlnx-versal.c         | 31 +++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-versal.h
+++ b/include/hw/arm/xlnx-versal.h
@@ -XXX,XX +XXX,XX @@
 
 #include "hw/sysbus.h"
 #include "hw/arm/boot.h"
+#include "hw/sd/sdhci.h"
 #include "hw/intc/arm_gicv3.h"
 #include "hw/char/pl011.h"
 #include "hw/dma/xlnx-zdma.h"
@@ -XXX,XX +XXX,XX @@
 #define XLNX_VERSAL_NR_UARTS   2
 #define XLNX_VERSAL_NR_GEMS    2
 #define XLNX_VERSAL_NR_ADMAS   8
+#define XLNX_VERSAL_NR_SDS     2
 #define XLNX_VERSAL_NR_IRQS    192
 
 typedef struct Versal {
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
         } iou;
     } lpd;
 
+    /* The Platform Management Controller subsystem.  */
+    struct {
+        struct {
+            SDHCIState sd[XLNX_VERSAL_NR_SDS];
+        } iou;
+    } pmc;
+
     struct {
         MemoryRegion *mr_ddr;
         uint32_t psci_conduit;
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 #define VERSAL_GEM1_IRQ_0          58
 #define VERSAL_GEM1_WAKE_IRQ_0     59
 #define VERSAL_ADMA_IRQ_0          60
+#define VERSAL_SD0_IRQ_0           126
 
 /* Architecturally reserved IRQs suitable for virtualization.  */
 #define VERSAL_RSVD_IRQ_FIRST 111
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 #define MM_FPD_CRF                  0xfd1a0000U
 #define MM_FPD_CRF_SIZE             0x140000
 
+#define MM_PMC_SD0                  0xf1040000U
+#define MM_PMC_SD0_SIZE             0x10000
 #define MM_PMC_CRP                  0xf1260000U
 #define MM_PMC_CRP_SIZE             0x10000
 #endif
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_admas(Versal *s, qemu_irq *pic)
     }
 }
 
+#define SDHCI_CAPABILITIES  0x280737ec6481 /* Same as on ZynqMP.  */
+static void versal_create_sds(Versal *s, qemu_irq *pic)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(s->pmc.iou.sd); i++) {
+        DeviceState *dev;
+        MemoryRegion *mr;
+
+        sysbus_init_child_obj(OBJECT(s), "sd[*]",
+                              &s->pmc.iou.sd[i], sizeof(s->pmc.iou.sd[i]),
+                              TYPE_SYSBUS_SDHCI);
+        dev = DEVICE(&s->pmc.iou.sd[i]);
+
+        object_property_set_uint(OBJECT(dev),
+                                 3, "sd-spec-version", &error_fatal);
+        object_property_set_uint(OBJECT(dev), SDHCI_CAPABILITIES, "capareg",
+                                 &error_fatal);
+        object_property_set_uint(OBJECT(dev), UHS_I, "uhs", &error_fatal);
+        qdev_init_nofail(dev);
+
+        mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0);
+        memory_region_add_subregion(&s->mr_ps,
+                                    MM_PMC_SD0 + i * MM_PMC_SD0_SIZE, mr);
+
+        sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0,
+                           pic[VERSAL_SD0_IRQ_0 + i * 2]);
+    }
+}
+
 /* This takes the board allocated linear DDR memory and creates aliases
  * for each split DDR range/aperture on the Versal address map.
  */
@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
     versal_create_uarts(s, pic);
     versal_create_gems(s, pic);
     versal_create_admas(s, pic);
+    versal_create_sds(s, pic);
     versal_map_ddr(s);
     versal_unimp(s);
 
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

hw/arm: versal: Add support for the RTC.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-10-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h |  8 ++++++++
 hw/arm/xlnx-versal.c         | 21 +++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-versal.h
+++ b/include/hw/arm/xlnx-versal.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/char/pl011.h"
 #include "hw/dma/xlnx-zdma.h"
 #include "hw/net/cadence_gem.h"
+#include "hw/rtc/xlnx-zynqmp-rtc.h"
 
 #define TYPE_XLNX_VERSAL "xlnx-versal"
 #define XLNX_VERSAL(obj) OBJECT_CHECK(Versal, (obj), TYPE_XLNX_VERSAL)
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
         struct {
             SDHCIState sd[XLNX_VERSAL_NR_SDS];
         } iou;
+
+        XlnxZynqMPRTC rtc;
     } pmc;
 
     struct {
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 #define VERSAL_GEM1_IRQ_0          58
 #define VERSAL_GEM1_WAKE_IRQ_0     59
 #define VERSAL_ADMA_IRQ_0          60
+#define VERSAL_RTC_APB_ERR_IRQ     121
 #define VERSAL_SD0_IRQ_0           126
+#define VERSAL_RTC_ALARM_IRQ       142
+#define VERSAL_RTC_SECONDS_IRQ     143
 
 /* Architecturally reserved IRQs suitable for virtualization.  */
 #define VERSAL_RSVD_IRQ_FIRST 111
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 #define MM_PMC_SD0_SIZE             0x10000
 #define MM_PMC_CRP                  0xf1260000U
 #define MM_PMC_CRP_SIZE             0x10000
+#define MM_PMC_RTC                  0xf12a0000
+#define MM_PMC_RTC_SIZE             0x10000
 #endif
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_sds(Versal *s, qemu_irq *pic)
     }
 }
 
+static void versal_create_rtc(Versal *s, qemu_irq *pic)
+{
+    SysBusDevice *sbd;
+    MemoryRegion *mr;
+
+    sysbus_init_child_obj(OBJECT(s), "rtc", &s->pmc.rtc, sizeof(s->pmc.rtc),
+                          TYPE_XLNX_ZYNQMP_RTC);
+    sbd = SYS_BUS_DEVICE(&s->pmc.rtc);
+    qdev_init_nofail(DEVICE(sbd));
+
+    mr = sysbus_mmio_get_region(sbd, 0);
+    memory_region_add_subregion(&s->mr_ps, MM_PMC_RTC, mr);
+
+    /*
+     * TODO: Connect the ALARM and SECONDS interrupts once our RTC model
+     * supports them.
+     */
+    sysbus_connect_irq(sbd, 1, pic[VERSAL_RTC_APB_ERR_IRQ]);
+}
+
 /* This takes the board allocated linear DDR memory and creates aliases
  * for each split DDR range/aperture on the Versal address map.
  */
@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
     versal_create_gems(s, pic);
     versal_create_admas(s, pic);
     versal_create_sds(s, pic);
+    versal_create_rtc(s, pic);
     versal_map_ddr(s);
     versal_unimp(s);
 
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Add support for SD.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-11-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-versal-virt.c | 46 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/arm/sysbus-fdt.h"
 #include "hw/arm/fdt.h"
 #include "cpu.h"
+#include "hw/qdev-properties.h"
 #include "hw/arm/xlnx-versal.h"
 
 #define TYPE_XLNX_VERSAL_VIRT_MACHINE MACHINE_TYPE_NAME("xlnx-versal-virt")
@@ -XXX,XX +XXX,XX @@ static void fdt_add_zdma_nodes(VersalVirt *s)
     }
 }
 
+static void fdt_add_sd_nodes(VersalVirt *s)
+{
+    const char clocknames[] = "clk_xin\0clk_ahb";
+    const char compat[] = "arasan,sdhci-8.9a";
+    int i;
+
+    for (i = ARRAY_SIZE(s->soc.pmc.iou.sd) - 1; i >= 0; i--) {
+        uint64_t addr = MM_PMC_SD0 + MM_PMC_SD0_SIZE * i;
+        char *name = g_strdup_printf("/sdhci@%" PRIx64, addr);
+
+        qemu_fdt_add_subnode(s->fdt, name);
+
+        qemu_fdt_setprop_cells(s->fdt, name, "clocks",
+                               s->phandle.clk_25Mhz, s->phandle.clk_25Mhz);
+        qemu_fdt_setprop(s->fdt, name, "clock-names",
+                         clocknames, sizeof(clocknames));
+        qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
+                               GIC_FDT_IRQ_TYPE_SPI, VERSAL_SD0_IRQ_0 + i * 2,
+                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+        qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
+                                     2, addr, 2, MM_PMC_SD0_SIZE);
+        qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
+        g_free(name);
+    }
+}
+
 static void fdt_nop_memory_nodes(void *fdt, Error **errp)
 {
     Error *err = NULL;
@@ -XXX,XX +XXX,XX @@ static void create_virtio_regions(VersalVirt *s)
     }
 }
 
+static void sd_plugin_card(SDHCIState *sd, DriveInfo *di)
+{
+    BlockBackend *blk = di ? blk_by_legacy_dinfo(di) : NULL;
+    DeviceState *card;
+
+    card = qdev_create(qdev_get_child_bus(DEVICE(sd), "sd-bus"), TYPE_SD_CARD);
+    object_property_add_child(OBJECT(sd), "card[*]", OBJECT(card),
+                              &error_fatal);
+    qdev_prop_set_drive(card, "drive", blk, &error_fatal);
+    object_property_set_bool(OBJECT(card), true, "realized", &error_fatal);
+}
+
 static void versal_virt_init(MachineState *machine)
 {
     VersalVirt *s = XLNX_VERSAL_VIRT_MACHINE(machine);
     int psci_conduit = QEMU_PSCI_CONDUIT_DISABLED;
+    int i;
 
     /*
      * If the user provides an Operating System to be loaded, we expect them
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
     fdt_add_gic_nodes(s);
     fdt_add_timer_nodes(s);
     fdt_add_zdma_nodes(s);
+    fdt_add_sd_nodes(s);
     fdt_add_cpu_nodes(s, psci_conduit);
     fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
     fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
     memory_region_add_subregion_overlap(get_system_memory(),
                                         0, &s->soc.fpd.apu.mr, 0);
 
+    /* Plugin SD cards.  */
+    for (i = 0; i < ARRAY_SIZE(s->soc.pmc.iou.sd); i++) {
+        sd_plugin_card(&s->soc.pmc.iou.sd[i], drive_get_next(IF_SD));
+    }
+
     s->binfo.ram_size = machine->ram_size;
     s->binfo.loader_start = 0x0;
     s->binfo.get_dtb = versal_virt_get_dtb;
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Add support for the RTC.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20200427181649.26851-12-edgar.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-versal-virt.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -XXX,XX +XXX,XX @@ static void fdt_add_sd_nodes(VersalVirt *s)
     }
 }
 
+static void fdt_add_rtc_node(VersalVirt *s)
+{
+    const char compat[] = "xlnx,zynqmp-rtc";
+    const char interrupt_names[] = "alarm\0sec";
+    char *name = g_strdup_printf("/rtc@%x", MM_PMC_RTC);
+
+    qemu_fdt_add_subnode(s->fdt, name);
+
+    qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
+                           GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_ALARM_IRQ,
+                           GIC_FDT_IRQ_FLAGS_LEVEL_HI,
+                           GIC_FDT_IRQ_TYPE_SPI, VERSAL_RTC_SECONDS_IRQ,
+                           GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+    qemu_fdt_setprop(s->fdt, name, "interrupt-names",
+                     interrupt_names, sizeof(interrupt_names));
+    qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
+                                 2, MM_PMC_RTC, 2, MM_PMC_RTC_SIZE);
+    qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
+    g_free(name);
+}
+
 static void fdt_nop_memory_nodes(void *fdt, Error **errp)
 {
     Error *err = NULL;
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
     fdt_add_timer_nodes(s);
     fdt_add_zdma_nodes(s);
     fdt_add_sd_nodes(s);
+    fdt_add_rtc_node(s);
     fdt_add_cpu_nodes(s, psci_conduit);
     fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
     fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
-- 
2.20.1

Somewhere along the line we accidentally added a duplicate
"using D16-D31 when they don't exist" check to do_vfm_dp()
(probably an artifact of a patch series rebase). Remove it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200430181003.21682-2-peter.maydell@linaro.org
---
 target/arm/translate-vfp.inc.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
         return false;
     }
 
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) &&
-        ((a->vd | a->vn | a->vm) & 0x10)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
-- 
2.20.1

We were accidentally permitting decode of Thumb Neon insns even if
the CPU didn't have the FEATURE_NEON bit set, because the feature
check was being done before the call to disas_neon_data_insn() and
disas_neon_ls_insn() in the Arm decoder but was omitted from the
Thumb decoder.  Push the feature bit check down into the called
functions so it is done for both Arm and Thumb encodings.
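
As a rough illustration of why the check belongs in the callee, the
following standalone sketch models two decoders sharing one Neon
routine. The Cpu struct and all function names here are hypothetical
and are not QEMU code; only the "return 1 means UNDEF" convention is
taken from the legacy decoder.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU feature state. */
typedef struct {
    bool has_neon;
} Cpu;

/* Shared Neon routine: nonzero return means "treat as UNDEF". */
static int decode_neon(const Cpu *cpu)
{
    if (!cpu->has_neon) {
        return 1;   /* feature check lives here, so no caller can skip it */
    }
    return 0;       /* pretend the insn decoded successfully */
}

/* Both front ends call the shared routine without their own check. */
static int decode_arm(const Cpu *cpu)
{
    return decode_neon(cpu);
}

static int decode_thumb(const Cpu *cpu)
{
    return decode_neon(cpu);
}
```

With the check in decode_neon(), a CPU without Neon gets the same
result through either entry point.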

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200430181003.21682-3-peter.maydell@linaro.org
---
 target/arm/translate.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
     TCGv_i32 tmp2;
     TCGv_i64 tmp64;
 
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return 1;
+    }
+
     /* FIXME: this access check should not take precedence over UNDEF
      * for invalid encodings; we will generate incorrect syndrome information
      * for attempts to execute invalid vfp/neon encodings with FP disabled.
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     TCGv_ptr ptr1, ptr2, ptr3;
     TCGv_i64 tmp64;
 
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return 1;
+    }
+
     /* FIXME: this access check should not take precedence over UNDEF
      * for invalid encodings; we will generate incorrect syndrome information
      * for attempts to execute invalid vfp/neon encodings with FP disabled.
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
 
         if (((insn >> 25) & 7) == 1) {
             /* NEON Data processing.  */
-            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-                goto illegal_op;
-            }
-
             if (disas_neon_data_insn(s, insn)) {
                 goto illegal_op;
             }
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
         }
         if ((insn & 0x0f100000) == 0x04000000) {
             /* NEON load/store.  */
-            if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-                goto illegal_op;
-            }
-
             if (disas_neon_ls_insn(s, insn)) {
                 goto illegal_op;
             }
-- 
2.20.1

Add the infrastructure for building and invoking a decodetree decoder
for the AArch32 Neon encodings.  At the moment the new decoder covers
nothing, so we always fall back to the existing hand-written decode.

We follow the same pattern we did for the VFP decodetree conversion
(commit 78e138bc1f672c145ef6ace74617d and following): code that deals
with Neon will be moving gradually out to translate-neon.inc.c,
which we #include into translate.c.

In order to share the decode files between A32 and T32, we
split Neon into 3 parts:
 * data-processing
 * load-store
 * 'shared' encodings

The first two groups of instructions have similar but not identical
A32 and T32 encodings, so we need to manually transform the T32
encoding into the A32 one before calling the decoder; the third group
covers the Neon instructions which are identical in A32 and T32.
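
The two T32-to-A32 transformations can be tried out in isolation with
the sketch below. The masks and shifts are the ones used in this patch;
the function names and the bool/out-parameter interface are
assumptions made for the example only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Neon data-processing:
 *   T32 0b111p_1111_qqqq_... becomes A32 0b1111_001p_qqqq_...
 * Returns false if insn is outside that T32 encoding space.
 */
static bool t32_neon_dp_to_a32(uint32_t insn, uint32_t *a32_insn)
{
    if ((insn & 0xef000000u) != 0xef000000u) {
        return false;
    }
    /*
     * Keep bits [31:29], bit 25 and bits [23:0]; copy p from bit 28
     * down to bit 24; force bit 28 to 1.
     */
    *a32_insn = (insn & 0xe2ffffffu) |
                ((insn & (1u << 28)) >> 4) | (1u << 28);
    return true;
}

/*
 * Neon load/store:
 *   T32 0b1111_1001_xxx0_... becomes A32 0b1111_0100_xxx0_...
 * Returns false if insn is outside that T32 encoding space.
 */
static bool t32_neon_ls_to_a32(uint32_t insn, uint32_t *a32_insn)
{
    if ((insn & 0xff100000u) != 0xf9000000u) {
        return false;
    }
    /* Replace the top byte; the low 24 bits are unchanged. */
    *a32_insn = (insn & 0x00ffffffu) | 0xf4000000u;
    return true;
}
```

For example, the T32 word 0xff123456 (p = 1) maps to the A32 word
0xf3123456, and the T32 load/store word 0xf9234567 maps to 0xf4234567.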

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-4-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       | 29 ++++++++++++++++++++++++++
 target/arm/neon-ls.decode       | 29 ++++++++++++++++++++++++++
 target/arm/neon-shared.decode   | 27 +++++++++++++++++++++++++
 target/arm/translate-neon.inc.c | 32 +++++++++++++++++++++++++++++
 target/arm/translate.c          | 36 +++++++++++++++++++++++++++++++--
 target/arm/Makefile.objs        | 18 +++++++++++++++++
 6 files changed, 169 insertions(+), 2 deletions(-)
 create mode 100644 target/arm/neon-dp.decode
 create mode 100644 target/arm/neon-ls.decode
 create mode 100644 target/arm/neon-shared.decode
 create mode 100644 target/arm/translate-neon.inc.c

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 Neon data-processing instruction descriptions
+#
+#  Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon data processing instructions where the T32 encoding
+# is a simple transformation of the A32 encoding.
+# More specifically, this file covers instructions where the A32 encoding is
+#   0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+# and the T32 encoding is
+#   0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+# This file works on the A32 encoding only; calling code for T32 has to
+# transform the insn into the A32 version first.
diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 Neon load/store instruction descriptions
+#
+#  Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon load/store instructions where the T32 encoding
+# is a simple transformation of the A32 encoding.
+# More specifically, this file covers instructions where the A32 encoding is
+#   0b1111_0100_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
+# and the T32 encoding is
+#   0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
+# This file works on the A32 encoding only; calling code for T32 has to
+# transform the insn into the A32 version first.
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@
+# AArch32 Neon instruction descriptions
+#
+#  Copyright (c) 2020 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# Encodings for Neon instructions whose encoding is the same for
+# both A32 and T32.
+
+# More specifically, this covers:
+# 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
+# 3same ext:       0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@
+/*
+ *  ARM translation: AArch32 Neon instructions
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *  Copyright (c) 2005-2007 CodeSourcery
+ *  Copyright (c) 2007 OpenedHand, Ltd.
+ *  Copyright (c) 2020 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * This file is intended to be included from translate.c; it uses
+ * some macros and definitions provided by that file.
+ * It might be possible to convert it to a standalone .c file eventually.
+ */
+
+/* Include the generated Neon decoder */
+#include "decode-neon-dp.inc.c"
+#include "decode-neon-ls.inc.c"
+#include "decode-neon-shared.inc.c"
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 
 #define ARM_CP_RW_BIT   (1 << 20)
 
-/* Include the VFP decoder */
+/* Include the VFP and Neon decoders */
 #include "translate-vfp.inc.c"
+#include "translate-neon.inc.c"
 
 static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
 {
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
         /* Unconditional instructions.  */
         /* TODO: Perhaps merge these into one decodetree output file.  */
         if (disas_a32_uncond(s, insn) ||
-            disas_vfp_uncond(s, insn)) {
+            disas_vfp_uncond(s, insn) ||
+            disas_neon_dp(s, insn) ||
+            disas_neon_ls(s, insn) ||
+            disas_neon_shared(s, insn)) {
             return;
         }
         /* fall back to legacy decoder */
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
         ARCH(6T2);
     }
 
+    if ((insn & 0xef000000) == 0xef000000) {
+        /*
+         * T32 encodings 0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+         * transform into
+         * A32 encodings 0b1111_001p_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
+         */
+        uint32_t a32_insn = (insn & 0xe2ffffff) |
+            ((insn & (1 << 28)) >> 4) | (1 << 28);
+
+        if (disas_neon_dp(s, a32_insn)) {
+            return;
+        }
+    }
+
+    if ((insn & 0xff100000) == 0xf9000000) {
+        /*
+         * T32 encodings 0b1111_1001_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
+         * transform into
+         * A32 encodings 0b1111_0100_ppp0_qqqq_qqqq_qqqq_qqqq_qqqq
+         */
+        uint32_t a32_insn = (insn & 0x00ffffff) | 0xf4000000;
+
+        if (disas_neon_ls(s, a32_insn)) {
+            return;
+        }
+    }
+
     /*
      * TODO: Perhaps merge these into one decodetree output file.
      * Note disas_vfp is written for a32 with cond field in the
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
      */
     if (disas_t32(s, insn) ||
         disas_vfp_uncond(s, insn) ||
+        disas_neon_shared(s, insn) ||
         ((insn >> 28) == 0xe && disas_vfp(s, insn))) {
         return;
     }
diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ target/arm/decode-sve.inc.c: $(SRC_PATH)/target/arm/sve.decode $(DECODETREE)
 	  $(PYTHON) $(DECODETREE) --decode disas_sve -o $@ $<,\
 	  "GEN", $(TARGET_DIR)$@)
 
+target/arm/decode-neon-shared.inc.c: $(SRC_PATH)/target/arm/neon-shared.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --static-decode disas_neon_shared -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-neon-dp.inc.c: $(SRC_PATH)/target/arm/neon-dp.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --static-decode disas_neon_dp -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
+target/arm/decode-neon-ls.inc.c: $(SRC_PATH)/target/arm/neon-ls.decode $(DECODETREE)
+	$(call quiet-command,\
+	  $(PYTHON) $(DECODETREE) --static-decode disas_neon_ls -o $@ $<,\
+	  "GEN", $(TARGET_DIR)$@)
+
 target/arm/decode-vfp.inc.c: $(SRC_PATH)/target/arm/vfp.decode $(DECODETREE)
 	$(call quiet-command,\
 	  $(PYTHON) $(DECODETREE) --static-decode disas_vfp -o $@ $<,\
@@ -XXX,XX +XXX,XX @@ target/arm/decode-t16.inc.c: $(SRC_PATH)/target/arm/t16.decode $(DECODETREE)
 	  "GEN", $(TARGET_DIR)$@)
 
 target/arm/translate-sve.o: target/arm/decode-sve.inc.c
+target/arm/translate.o: target/arm/decode-neon-shared.inc.c
+target/arm/translate.o: target/arm/decode-neon-dp.inc.c
+target/arm/translate.o: target/arm/decode-neon-ls.inc.c
 target/arm/translate.o: target/arm/decode-vfp.inc.c
 target/arm/translate.o: target/arm/decode-vfp-uncond.inc.c
 target/arm/translate.o: target/arm/decode-a32.inc.c
-- 
2.20.1

Convert the VCMLA (vector) insns in the 3same extension group to
decodetree.
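
For readers unfamiliar with the decodetree syntax, here is a
hand-written approximation of what the VCMLA pattern in this patch
describes. It is a sketch only: the real decoder is generated by
scripts/decodetree.py, and the struct and function names below are
hypothetical. The fixed-bit mask and value are the ones the legacy
decoder tested, and the register fields follow the %vd_dp, %vn_dp and
%vm_dp definitions (first segment most significant).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the generated arg_VCMLA struct. */
typedef struct {
    int rot, size, q, vd, vn, vm;
} VcmlaFields;

/* Extract len bits of insn starting at bit position start. */
static uint32_t bits(uint32_t insn, int start, int len)
{
    return (insn >> start) & ((1u << len) - 1);
}

static bool decode_vcmla(uint32_t insn, VcmlaFields *a)
{
    /* Fixed bits: 1111 110R R.1S .... .... 1000 ...0 .... */
    if ((insn & 0xfe200f10u) != 0xfc200800u) {
        return false;
    }
    a->rot = bits(insn, 23, 2);
    a->size = bits(insn, 20, 1);
    a->q = bits(insn, 6, 1);
    /* D-register numbers: the single high bit goes above the 4-bit field. */
    a->vd = (bits(insn, 22, 1) << 4) | bits(insn, 12, 4);
    a->vn = (bits(insn, 7, 1) << 4) | bits(insn, 16, 4);
    a->vm = (bits(insn, 5, 1) << 4) | bits(insn, 0, 4);
    return true;
}
```

The UNDEF checks (feature bits, D16-D31 availability, Q-register
alignment) are deliberately left out here; in the patch they live in
trans_VCMLA().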

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-5-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   | 11 ++++++++++
 target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 11 +---------
 3 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@
 # More specifically, this covers:
 # 2reg scalar ext: 0b1111_1110_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
 # 3same ext:       0b1111_110x_xxxx_xxxx_xxxx_1x0x_xxxx_xxxx
+
+# VFP/Neon register fields; same as vfp.decode
+%vm_dp  5:1 0:4
+%vm_sp  0:4 5:1
+%vn_dp  7:1 16:4
+%vn_sp  16:4 7:1
+%vd_dp  22:1 12:4
+%vd_sp  12:4 22:1
+
+VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@
 #include "decode-neon-dp.inc.c"
 #include "decode-neon-ls.inc.c"
 #include "decode-neon-shared.inc.c"
+
+static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
+{
+    int opr_sz;
+    TCGv_ptr fpst;
+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
+
+    if (!dc_isar_feature(aa32_vcma, s)
+        || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    fn_gvec_ptr = a->size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->vm),
+                       fpst, opr_sz, opr_sz, a->rot,
+                       fn_gvec_ptr);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
     bool is_long = false, q = extract32(insn, 6, 1);
     bool ptr_is_env = false;
 
-    if ((insn & 0xfe200f10) == 0xfc200800) {
-        /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */
-        int size = extract32(insn, 20, 1);
-        data = extract32(insn, 23, 2); /* rot */
-        if (!dc_isar_feature(aa32_vcma, s)
-            || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
-            return 1;
-        }
-        fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah;
-    } else if ((insn & 0xfea00f10) == 0xfc800800) {
+    if ((insn & 0xfea00f10) == 0xfc800800) {
         /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
         int size = extract32(insn, 20, 1);
         data = extract32(insn, 24, 1); /* rot */
-- 
2.20.1

Convert the VCADD (vector) insns to decodetree.
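
Two small pieces of arithmetic in the new translate function are easy
to check by hand: the Q-register alignment test and the operation
size. The helpers below restate them outside QEMU; the helper names
are made up for this sketch.

```c
#include <stdbool.h>

/*
 * With q == 1 each operand is a Q register, which is an even/odd pair
 * of D registers, so every D-register number must be even. With
 * q == 0 any D-register number is acceptable.
 */
static bool neon_regs_valid(int vd, int vn, int vm, int q)
{
    return ((vd | vn | vm) & q) == 0;
}

/* Operation size in bytes: 8 for a D register, 16 for a Q register. */
static int neon_opr_sz(int q)
{
    return (1 + q) * 8;
}
```

So a Q-form insn naming D5 as an operand fails the check, while the
same register numbers are fine in the D form.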

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-6-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  3 +++
 target/arm/translate-neon.inc.c | 37 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 11 +---------
 3 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@
 
 VCMLA          1111 110 rot:2 . 1 size:1 .... .... 1000 . q:1 . 0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMLA(DisasContext *s, arg_VCMLA *a)
     tcg_temp_free_ptr(fpst);
     return true;
 }
+
+static bool trans_VCADD(DisasContext *s, arg_VCADD *a)
+{
+    int opr_sz;
+    TCGv_ptr fpst;
+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
+
+    if (!dc_isar_feature(aa32_vcma, s)
+        || (!a->size && !dc_isar_feature(aa32_fp16_arith, s))) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    fn_gvec_ptr = a->size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->vm),
+                       fpst, opr_sz, opr_sz, a->rot,
+                       fn_gvec_ptr);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
     bool is_long = false, q = extract32(insn, 6, 1);
     bool ptr_is_env = false;
 
-    if ((insn & 0xfea00f10) == 0xfc800800) {
-        /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */
-        int size = extract32(insn, 20, 1);
-        data = extract32(insn, 24, 1); /* rot */
-        if (!dc_isar_feature(aa32_vcma, s)
-            || (!size && !dc_isar_feature(aa32_fp16_arith, s))) {
-            return 1;
-        }
-        fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh;
-    } else if ((insn & 0xfeb00f00) == 0xfc200d00) {
+    if ((insn & 0xfeb00f00) == 0xfc200d00) {
         /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */
         bool u = extract32(insn, 4, 1);
         if (!dc_isar_feature(aa32_dp, s)) {
-- 
2.20.1

Convert the V[US]DOT (vector) insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-7-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  4 ++++
 target/arm/translate-neon.inc.c | 32 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  9 +--------
 3 files changed, 37 insertions(+), 8 deletions(-)

Convert the VFM[AS]L (vector) insns to decodetree.  This is the last
insn in the legacy decoder for the 3same_ext group, so we can
delete the legacy decoder function for the group entirely.

Note that in disas_thumb2_insn() the parts of this encoding space
where the decodetree decoder returns false will correctly be directed
to illegal_op by the "(insn & (1 << 28))" check so they won't fall
into disas_coproc_insn() by mistake.
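
The reason VFML needs two patterns is that the Vn and Vm operands are
S registers when Q is 0 and D registers when Q is 1. The sketch below
shows the resulting register numbers under that reading of the
%v*_sp and %v*_dp fields; the struct and function names are
hypothetical, and the fixed-bit mask is the one from the legacy
decoder removed by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the generated arg_VFML struct. */
typedef struct {
    int q, s, vd, vn, vm;
} VfmlFields;

static bool decode_vfml(uint32_t insn, VfmlFields *a)
{
    uint32_t n = (insn >> 7) & 1, vn = (insn >> 16) & 0xf;
    uint32_t m = (insn >> 5) & 1, vm = insn & 0xf;
    uint32_t d = (insn >> 22) & 1, vd = (insn >> 12) & 0xf;

    /* Fixed bits: 1111 1100 S.10 .... .... 1000 .Q.1 .... */
    if ((insn & 0xff300f10u) != 0xfc200810u) {
        return false;
    }
    a->s = (insn >> 23) & 1;
    a->q = (insn >> 6) & 1;
    a->vd = (d << 4) | vd;          /* destination is always a D register */
    if (a->q) {
        a->vn = (n << 4) | vn;      /* D register: N:Vn */
        a->vm = (m << 4) | vm;      /* D register: M:Vm */
    } else {
        a->vn = (vn << 1) | n;      /* S register: Vn:N */
        a->vm = (vm << 1) | m;      /* S register: Vm:M */
    }
    return true;
}
```

The same N, Vn, M and Vm bits therefore produce different register
numbers depending on Q, which matches the vfp_reg_offset(a->q, ...)
calls in trans_VFML().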

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-8-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  6 +++
 target/arm/translate-neon.inc.c | 31 +++++++++++
 target/arm/translate.c          | 92 +--------------------------------
 3 files changed, 38 insertions(+), 91 deletions(-)

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VCADD          1111 110 rot:1 1 . 0 size:1 .... .... 1000 . q:1 . 0 .... \
 # VUDOT and VSDOT
 VDOT           1111 110 00 . 10 .... .... 1101 . q:1 . u:1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+# VFM[AS]L
+VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
+               vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
+VFML           1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT(DisasContext *s, arg_VDOT *a)
                        opr_sz, opr_sz, 0, fn_gvec);
     return true;
 }
+
+static bool trans_VFML(DisasContext *s, arg_VFML *a)
+{
+    int opr_sz;
+
+    if (!dc_isar_feature(aa32_fhm, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        (a->vd & 0x10)) {
+        return false;
+    }
+
+    if (a->vd & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(a->q, a->vn),
+                       vfp_reg_offset(a->q, a->vm),
+                       cpu_env, opr_sz, opr_sz, a->s, /* is_2 == 0 */
+                       gen_helper_gvec_fmlal_a32);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     return 0;
 }
 
-/* Advanced SIMD three registers of the same length extension.
- *  31           25    23  22    20   16   12  11   10   9    8        3     0
- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
- * | 1 1 1 1 1 1 0 | op1 | D | op2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
- * +---------------+-----+---+-----+----+----+---+----+---+----+---------+----+
- */
-static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn)
-{
-    gen_helper_gvec_3 *fn_gvec = NULL;
-    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
-    int rd, rn, rm, opr_sz;
-    int data = 0;
-    int off_rn, off_rm;
-    bool is_long = false, q = extract32(insn, 6, 1);
-    bool ptr_is_env = false;
-
-    if ((insn & 0xff300f10) == 0xfc200810) {
-        /* VFM[AS]L -- 1111 1100 S.10 .... .... 1000 .Q.1 .... */
-        int is_s = extract32(insn, 23, 1);
-        if (!dc_isar_feature(aa32_fhm, s)) {
-            return 1;
-        }
-        is_long = true;
-        data = is_s; /* is_2 == 0 */
-        fn_gvec_ptr = gen_helper_gvec_fmlal_a32;
-        ptr_is_env = true;
-    } else {
-        return 1;
-    }
-
-    VFP_DREG_D(rd, insn);
-    if (rd & q) {
-        return 1;
-    }
-    if (q || !is_long) {
-        VFP_DREG_N(rn, insn);
-        VFP_DREG_M(rm, insn);
-        if ((rn | rm) & q & !is_long) {
-            return 1;
-        }
-        off_rn = vfp_reg_offset(1, rn);
-        off_rm = vfp_reg_offset(1, rm);
-    } else {
-        rn = VFP_SREG_N(insn);
-        rm = VFP_SREG_M(insn);
-        off_rn = vfp_reg_offset(0, rn);
-        off_rm = vfp_reg_offset(0, rm);
-    }
-
-    if (s->fp_excp_el) {
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
-        return 0;
-    }
-    if (!s->vfp_enabled) {
-        return 1;
-    }
-
-    opr_sz = (1 + q) * 8;
-    if (fn_gvec_ptr) {
-        TCGv_ptr ptr;
-        if (ptr_is_env) {
-            ptr = cpu_env;
-        } else {
-            ptr = get_fpstatus_ptr(1);
-        }
-        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
-                           opr_sz, opr_sz, data, fn_gvec_ptr);
-        if (!ptr_is_env) {
-            tcg_temp_free_ptr(ptr);
-        }
-    } else {
-        tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
-                           opr_sz, opr_sz, data, fn_gvec);
-    }
-    return 0;
-}
-
 /* Advanced SIMD two registers and a scalar extension.
  *  31             24   23  22   20   16   12  11   10   9    8        3     0
  * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                     }
                 }
             }
-        } else if ((insn & 0x0e000a00) == 0x0c000800
-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            if (disas_neon_insn_3same_ext(s, insn)) {
-                goto illegal_op;
-            }
-            return;
         } else if ((insn & 0x0f000a00) == 0x0e000800
                    && arm_dc_feature(s, ARM_FEATURE_V8)) {
             if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
             }
             break;
         }
-        if ((insn & 0xfe000a00) == 0xfc000800
+        if ((insn & 0xff000a00) == 0xfe000800
             && arm_dc_feature(s, ARM_FEATURE_V8)) {
             /* The Thumb2 and ARM encodings are identical.  */
-            if (disas_neon_insn_3same_ext(s, insn)) {
-                goto illegal_op;
-            }
-        } else if ((insn & 0xff000a00) == 0xfe000800
-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            /* The Thumb2 and ARM encodings are identical.  */
             if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
                 goto illegal_op;
             }
-- 
2.20.1

Convert VCMLA (scalar) in the 2reg-scalar-ext group to decodetree.
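
The scalar form interprets the M and Vm bits differently for fp16 and
fp32, which is why the decode file has two VCMLA_scalar patterns. The
sketch below restates that logic from the removed legacy code and the
packing of index and rot into the helper's data argument; the struct
and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical subset of the generated arg_VCMLA_scalar struct. */
typedef struct {
    int size, rot, index, vm;
} VcmlaScalarFields;

static VcmlaScalarFields vcmla_scalar_operands(uint32_t insn)
{
    VcmlaScalarFields a;
    uint32_t m = (insn >> 5) & 1;

    a.size = (insn >> 23) & 1;
    a.rot = (insn >> 20) & 3;
    if (a.size == 0) {
        /* fp16: the register is just Vm, and M is the element index. */
        a.vm = insn & 0xf;
        a.index = m;
    } else {
        /* fp32: the register is the usual M:Vm, and the index is 0. */
        a.vm = (m << 4) | (insn & 0xf);
        a.index = 0;
    }
    return a;
}

/* Value passed as the gvec "data" argument: index above the 2-bit rot. */
static int vcmla_scalar_data(const VcmlaScalarFields *a)
{
    return (a->index << 2) | a->rot;
}
```

For instance, with M = 1, Vm = 5 and rot = 3, the fp16 form selects D5
element 1 (data 7), while the fp32 form selects D21 element 0 (data 3).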

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-9-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  5 +++++
 target/arm/translate-neon.inc.c | 40 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 26 +--------------------
 3 files changed, 46 insertions(+), 25 deletions(-)

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
                vm=%vm_sp vn=%vn_sp vd=%vd_dp q=0
 VFML           1111 110 0 s:1 . 10 .... .... 1000 . 1 . 1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp q=1
+
+VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
+               vn=%vn_dp vd=%vd_dp size=0
+VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp size=1 index=0
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFML(DisasContext *s, arg_VFML *a)
                        gen_helper_gvec_fmlal_a32);
     return true;
 }
+
+static bool trans_VCMLA_scalar(DisasContext *s, arg_VCMLA_scalar *a)
+{
+    gen_helper_gvec_3_ptr *fn_gvec_ptr;
+    int opr_sz;
+    TCGv_ptr fpst;
+
+    if (!dc_isar_feature(aa32_vcma, s)) {
+        return false;
+    }
+    if (a->size == 0 && !dc_isar_feature(aa32_fp16_arith, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vd | a->vn) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fn_gvec_ptr = (a->size ? gen_helper_gvec_fcmlas_idx
+                   : gen_helper_gvec_fcmlah_idx);
+    opr_sz = (1 + a->q) * 8;
+    fpst = get_fpstatus_ptr(1);
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(1, a->vn),
+                       vfp_reg_offset(1, a->vm),
+                       fpst, opr_sz, opr_sz,
+                       (a->index << 2) | a->rot, fn_gvec_ptr);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
     bool is_long = false, q = extract32(insn, 6, 1);
     bool ptr_is_env = false;
 
-    if ((insn & 0xff000f10) == 0xfe000800) {
-        /* VCMLA (indexed) -- 1111 1110 S.RR .... .... 1000 ...0 .... */
-        int rot = extract32(insn, 20, 2);
-        int size = extract32(insn, 23, 1);
-        int index;
-
-        if (!dc_isar_feature(aa32_vcma, s)) {
-            return 1;
-        }
-        if (size == 0) {
-            if (!dc_isar_feature(aa32_fp16_arith, s)) {
-                return 1;
-            }
-            /* For fp16, rm is just Vm, and index is M.  */
-            rm = extract32(insn, 0, 4);
-            index = extract32(insn, 5, 1);
-        } else {
-            /* For fp32, rm is the usual M:Vm, and index is 0.  */
-            VFP_DREG_M(rm, insn);
-            index = 0;
-        }
-        data = (index << 2) | rot;
-        fn_gvec_ptr = (size ? gen_helper_gvec_fcmlas_idx
-                       : gen_helper_gvec_fcmlah_idx);
-    } else if ((insn & 0xffb00f00) == 0xfe200d00) {
+    if ((insn & 0xffb00f00) == 0xfe200d00) {
         /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */
         int u = extract32(insn, 4, 1);
 
-- 
2.20.1

Convert the V[US]DOT (scalar) insns in the 2reg-scalar-ext group
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-10-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |  3 +++
 target/arm/translate-neon.inc.c | 35 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 13 +-----------
 3 files changed, 39 insertions(+), 12 deletions(-)

Convert the VFM[AS]L (scalar) insns in the 2reg-scalar-ext group
to decodetree. These are the last ones in the group so we can remove
all the legacy decode for the group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-11-peter.maydell@linaro.org
---
 target/arm/neon-shared.decode   |   7 +++
 target/arm/translate-neon.inc.c |  32 ++++++++++
 target/arm/translate.c          | 107 +-------------------------------
 3 files changed, 40 insertions(+), 106 deletions(-)
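
Reviewer note: the two VFML_scalar patterns split the Q=0 and Q=1 cases
because rm and index are assembled from different bits in each. The sketch
below is standalone C, not QEMU code. It assumes decodetree's rule that a
multi-part field concatenates its parts with the first part most
significant, and checks that this matches the hand decode being removed.
The names bits(), vfml_legacy(), vfml_fields() and vfml_decodes_agree()
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 'length' bits starting at bit 'start' (same idea as extract32). */
uint32_t bits(uint32_t insn, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (insn >> start) & (~0U >> (32 - length));
}

/* Hand decode, following the code removed from translate.c. */
void vfml_legacy(uint32_t insn, int *rm, int *index)
{
    int q = bits(insn, 6, 1);
    int vm20 = bits(insn, 0, 3);
    int vm3 = bits(insn, 3, 1);
    int m = bits(insn, 5, 1);

    if (q) {
        *rm = vm20;
        *index = m * 2 + vm3;
    } else {
        *rm = vm20 * 2 + m;
        *index = vm3;
    }
}

/* Field-style decode: parts concatenate, first part most significant. */
void vfml_fields(uint32_t insn, int *rm, int *index)
{
    if (bits(insn, 6, 1)) {
        *rm = bits(insn, 0, 3);                               /* rm:3    */
        *index = (bits(insn, 5, 1) << 1) | bits(insn, 3, 1);  /* 5:1 3:1 */
    } else {
        *rm = (bits(insn, 0, 3) << 1) | bits(insn, 5, 1);     /* 0:3 5:1 */
        *index = bits(insn, 3, 1);                            /* index:1 */
    }
}

/* Returns 1 if both decodes agree for every value of insn bits [6:0]. */
int vfml_decodes_agree(void)
{
    for (uint32_t low = 0; low < 128; low++) {
        int rm_a, idx_a, rm_b, idx_b;

        vfml_legacy(low, &rm_a, &idx_a);
        vfml_fields(low, &rm_b, &idx_b);
        if (rm_a != rm_b || idx_a != idx_b) {
            return 0;
        }
    }
    return 1;
}
```

Only bits [6:0] are varied because Q, M and Vm all live there; the
remaining bits do not feed rm or index.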

diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
 
 VDOT_scalar    1111 1110 0 . 10 .... .... 1101 . q:1 index:1 u:1 rm:4 \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+%vfml_scalar_q0_rm 0:3 5:1
+%vfml_scalar_q1_index 5:1 3:1
+VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
+               rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
+VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
+               index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VDOT_scalar(DisasContext *s, arg_VDOT_scalar *a)
     tcg_temp_free_ptr(fpst);
     return true;
 }
+
+static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
+{
+    int opr_sz;
+
+    if (!dc_isar_feature(aa32_fhm, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd & 0x10) || (a->q && (a->vn & 0x10)))) {
+        return false;
+    }
+
+    if (a->vd & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    opr_sz = (1 + a->q) * 8;
+    tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
+                       vfp_reg_offset(a->q, a->vn),
+                       vfp_reg_offset(a->q, a->rm),
+                       cpu_env, opr_sz, opr_sz,
+                       (a->index << 2) | a->s, /* is_2 == 0 */
+                       gen_helper_gvec_fmlal_idx_a32);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
 }
 
 #define VFP_REG_SHR(x, n) (((n) > 0) ? (x) >> (n) : (x) << -(n))
-#define VFP_SREG(insn, bigbit, smallbit) \
-  ((VFP_REG_SHR(insn, bigbit - 1) & 0x1e) | (((insn) >> (smallbit)) & 1))
 #define VFP_DREG(reg, insn, bigbit, smallbit) do { \
     if (dc_isar_feature(aa32_simd_r32, s)) { \
         reg = (((insn) >> (bigbit)) & 0x0f) \
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
         reg = ((insn) >> (bigbit)) & 0x0f; \
     }} while (0)
 
-#define VFP_SREG_D(insn) VFP_SREG(insn, 12, 22)
 #define VFP_DREG_D(reg, insn) VFP_DREG(reg, insn, 12, 22)
-#define VFP_SREG_N(insn) VFP_SREG(insn, 16,  7)
 #define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16,  7)
-#define VFP_SREG_M(insn) VFP_SREG(insn,  0,  5)
 #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
 
 static void gen_neon_dup_low16(TCGv_i32 var)
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     return 0;
 }
 
-/* Advanced SIMD two registers and a scalar extension.
- *  31             24   23  22   20   16   12  11   10   9    8        3     0
- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
- * | 1 1 1 1 1 1 1 0 | o1 | D | o2 | Vn | Vd | 1 | o3 | 0 | o4 | N Q M U | Vm |
- * +-----------------+----+---+----+----+----+---+----+---+----+---------+----+
- *
- */
-
-static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn)
-{
-    gen_helper_gvec_3 *fn_gvec = NULL;
-    gen_helper_gvec_3_ptr *fn_gvec_ptr = NULL;
-    int rd, rn, rm, opr_sz, data;
-    int off_rn, off_rm;
-    bool is_long = false, q = extract32(insn, 6, 1);
-    bool ptr_is_env = false;
-
-    if ((insn & 0xffa00f10) == 0xfe000810) {
-        /* VFM[AS]L -- 1111 1110 0.0S .... .... 1000 .Q.1 .... */
-        int is_s = extract32(insn, 20, 1);
-        int vm20 = extract32(insn, 0, 3);
-        int vm3 = extract32(insn, 3, 1);
-        int m = extract32(insn, 5, 1);
-        int index;
-
-        if (!dc_isar_feature(aa32_fhm, s)) {
-            return 1;
-        }
-        if (q) {
-            rm = vm20;
-            index = m * 2 + vm3;
-        } else {
-            rm = vm20 * 2 + m;
-            index = vm3;
-        }
-        is_long = true;
-        data = (index << 2) | is_s; /* is_2 == 0 */
-        fn_gvec_ptr = gen_helper_gvec_fmlal_idx_a32;
-        ptr_is_env = true;
-    } else {
-        return 1;
-    }
-
-    VFP_DREG_D(rd, insn);
-    if (rd & q) {
-        return 1;
-    }
-    if (q || !is_long) {
-        VFP_DREG_N(rn, insn);
-        if (rn & q & !is_long) {
-            return 1;
-        }
-        off_rn = vfp_reg_offset(1, rn);
-        off_rm = vfp_reg_offset(1, rm);
-    } else {
-        rn = VFP_SREG_N(insn);
-        off_rn = vfp_reg_offset(0, rn);
-        off_rm = vfp_reg_offset(0, rm);
-    }
-    if (s->fp_excp_el) {
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
-        return 0;
-    }
-    if (!s->vfp_enabled) {
-        return 1;
-    }
-
-    opr_sz = (1 + q) * 8;
-    if (fn_gvec_ptr) {
-        TCGv_ptr ptr;
-        if (ptr_is_env) {
-            ptr = cpu_env;
-        } else {
-            ptr = get_fpstatus_ptr(1);
-        }
-        tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), off_rn, off_rm, ptr,
-                           opr_sz, opr_sz, data, fn_gvec_ptr);
-        if (!ptr_is_env) {
-            tcg_temp_free_ptr(ptr);
-        }
-    } else {
-        tcg_gen_gvec_3_ool(vfp_reg_offset(1, rd), off_rn, off_rm,
-                           opr_sz, opr_sz, data, fn_gvec);
-    }
-    return 0;
-}
-
 static int disas_coproc_insn(DisasContext *s, uint32_t insn)
 {
     int cpnum, is64, crn, crm, opc1, opc2, isread, rt, rt2;
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
                     }
                 }
             }
-        } else if ((insn & 0x0f000a00) == 0x0e000800
-                   && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
-                goto illegal_op;
-            }
-            return;
         }
         goto illegal_op;
     }
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
             }
             break;
         }
-        if ((insn & 0xff000a00) == 0xfe000800
-            && arm_dc_feature(s, ARM_FEATURE_V8)) {
-            /* The Thumb2 and ARM encodings are identical.  */
-            if (disas_neon_insn_2reg_scalar_ext(s, insn)) {
-                goto illegal_op;
-            }
-        } else if (((insn >> 24) & 3) == 3) {
+        if (((insn >> 24) & 3) == 3) {
             /* Translate into the equivalent ARM encoding.  */
             insn = (insn & 0xe2ffffff) | ((insn & (1 << 28)) >> 4) | (1 << 28);
             if (disas_neon_data_insn(s, insn)) {
-- 
2.20.1

Convert the Neon "load/store multiple structures" insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-12-peter.maydell@linaro.org
---
 target/arm/neon-ls.decode       |   7 ++
 target/arm/translate-neon.inc.c | 124 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  91 +----------------------
 3 files changed, 133 insertions(+), 89 deletions(-)
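
Reviewer note: the post-index stride passed to gen_neon_ldst_base_update()
is nregs * interleave * 8, which should equal the number of bytes the
triple loop transfers whatever the element size. The standalone sketch
below copies the table from the patch and models only the loop counters,
not the TCG operations. The function names are hypothetical.

```c
#include <assert.h>

struct ls_element_type {
    int nregs;
    int interleave;
    int spacing;
};

/* Same values as neon_ls_element_type[] in the patch, indexed by itype. */
static const struct ls_element_type ls_types[11] = {
    {1, 4, 1}, {1, 4, 2}, {4, 1, 1}, {2, 2, 2},
    {1, 3, 1}, {1, 3, 2}, {3, 1, 1}, {1, 1, 1},
    {1, 2, 1}, {1, 2, 2}, {2, 1, 1}
};

/* Base register increment used when rm == 13 (post-indexed form). */
int vldst_multiple_stride(int itype)
{
    assert(itype >= 0 && itype <= 10);
    return ls_types[itype].nregs * ls_types[itype].interleave * 8;
}

/* Bytes transferred, counted with the same loop structure as the patch. */
int vldst_multiple_bytes(int itype, int size)
{
    const struct ls_element_type *t;
    int bytes = 0;

    assert(itype >= 0 && itype <= 10 && size >= 0 && size <= 3);
    t = &ls_types[itype];
    for (int reg = 0; reg < t->nregs; reg++) {
        for (int n = 0; n < 8 >> size; n++) {
            for (int xs = 0; xs < t->interleave; xs++) {
                bytes += 1 << size;   /* one access of 2^size bytes */
            }
        }
    }
    return bytes;
}

/* Highest D register touched: tt = vd + reg + spacing * xs at its maximum. */
int vldst_multiple_last_reg(int vd, int itype)
{
    const struct ls_element_type *t;

    assert(itype >= 0 && itype <= 10);
    t = &ls_types[itype];
    return vd + (t->nregs - 1) + t->spacing * (t->interleave - 1);
}
```

Because each register contributes 8 >> size accesses of 1 << size bytes,
the byte count is independent of size, which is also why promoting size
to 3 for the interleave == 1 little-endian case leaves the stride
unchanged.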

diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-ls.decode
+++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@
 #   0b1111_1001_xxx0_xxxx_xxxx_xxxx_xxxx_xxxx
 # This file works on the A32 encoding only; calling code for T32 has to
 # transform the insn into the A32 version first.
+
+%vd_dp  22:1 12:4
+
+# Neon load/store multiple structures
+
+VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
+               vd=%vd_dp
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
                        gen_helper_gvec_fmlal_idx_a32);
     return true;
 }
+
+static struct {
+    int nregs;
+    int interleave;
+    int spacing;
+} const neon_ls_element_type[11] = {
+    {1, 4, 1},
+    {1, 4, 2},
+    {4, 1, 1},
+    {2, 2, 2},
+    {1, 3, 1},
+    {1, 3, 2},
+    {3, 1, 1},
+    {1, 1, 1},
+    {1, 2, 1},
+    {1, 2, 2},
+    {2, 1, 1}
+};
+
+static void gen_neon_ldst_base_update(DisasContext *s, int rm, int rn,
+                                      int stride)
+{
+    if (rm != 15) {
+        TCGv_i32 base;
+
+        base = load_reg(s, rn);
+        if (rm == 13) {
+            tcg_gen_addi_i32(base, base, stride);
+        } else {
+            TCGv_i32 index;
+            index = load_reg(s, rm);
+            tcg_gen_add_i32(base, base, index);
+            tcg_temp_free_i32(index);
+        }
+        store_reg(s, rn, base);
+    }
+}
+
+static bool trans_VLDST_multiple(DisasContext *s, arg_VLDST_multiple *a)
+{
+    /* Neon load/store multiple structures */
+    int nregs, interleave, spacing, reg, n;
+    MemOp endian = s->be_data;
+    int mmu_idx = get_mem_index(s);
+    int size = a->size;
+    TCGv_i64 tmp64;
+    TCGv_i32 addr, tmp;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+    if (a->itype > 10) {
+        return false;
+    }
+    /* Catch UNDEF cases for bad values of align field */
+    switch (a->itype & 0xc) {
+    case 4:
+        if (a->align >= 2) {
+            return false;
+        }
+        break;
+    case 8:
+        if (a->align == 3) {
+            return false;
+        }
+        break;
+    default:
+        break;
+    }
+    nregs = neon_ls_element_type[a->itype].nregs;
+    interleave = neon_ls_element_type[a->itype].interleave;
+    spacing = neon_ls_element_type[a->itype].spacing;
+    if (size == 3 && (interleave | spacing) != 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /* For our purposes, bytes are always little-endian.  */
+    if (size == 0) {
+        endian = MO_LE;
+    }
+    /*
+     * Consecutive little-endian elements from a single register
+     * can be promoted to a larger little-endian operation.
+     */
+    if (interleave == 1 && endian == MO_LE) {
+        size = 3;
+    }
+    tmp64 = tcg_temp_new_i64();
+    addr = tcg_temp_new_i32();
+    tmp = tcg_const_i32(1 << size);
+    load_reg_var(s, addr, a->rn);
+    for (reg = 0; reg < nregs; reg++) {
+        for (n = 0; n < 8 >> size; n++) {
+            int xs;
+            for (xs = 0; xs < interleave; xs++) {
+                int tt = a->vd + reg + spacing * xs;
+
+                if (a->l) {
+                    gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
+                    neon_store_element64(tt, n, size, tmp64);
+                } else {
+                    neon_load_element64(tmp64, tt, n, size);
+                    gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
+                }
+                tcg_gen_add_i32(addr, addr, tmp);
+            }
+        }
+    }
+    tcg_temp_free_i32(addr);
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i64(tmp64);
+
+    gen_neon_ldst_base_update(s, a->rm, a->rn, nregs * interleave * 8);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
 }
 
 
-static struct {
-    int nregs;
-    int interleave;
-    int spacing;
-} const neon_ls_element_type[11] = {
-    {1, 4, 1},
-    {1, 4, 2},
-    {4, 1, 1},
-    {2, 2, 2},
-    {1, 3, 1},
-    {1, 3, 2},
-    {3, 1, 1},
-    {1, 1, 1},
-    {1, 2, 1},
-    {1, 2, 2},
-    {2, 1, 1}
-};
-
 /* Translate a NEON load/store element instruction.  Return nonzero if the
    instruction is invalid.  */
 static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
 {
     int rd, rn, rm;
-    int op;
     int nregs;
-    int interleave;
-    int spacing;
     int stride;
     int size;
     int reg;
     int load;
-    int n;
     int vec_size;
-    int mmu_idx;
-    MemOp endian;
     TCGv_i32 addr;
     TCGv_i32 tmp;
-    TCGv_i32 tmp2;
-    TCGv_i64 tmp64;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
     rn = (insn >> 16) & 0xf;
     rm = insn & 0xf;
     load = (insn & (1 << 21)) != 0;
-    endian = s->be_data;
-    mmu_idx = get_mem_index(s);
     if ((insn & (1 << 23)) == 0) {
-        /* Load store all elements.  */
-        op = (insn >> 8) & 0xf;
-        size = (insn >> 6) & 3;
-        if (op > 10)
-            return 1;
-        /* Catch UNDEF cases for bad values of align field */
-        switch (op & 0xc) {
-        case 4:
-            if (((insn >> 5) & 1) == 1) {
-                return 1;
-            }
-            break;
-        case 8:
-            if (((insn >> 4) & 3) == 3) {
-                return 1;
-            }
-            break;
-        default:
-            break;
-        }
-        nregs = neon_ls_element_type[op].nregs;
-        interleave = neon_ls_element_type[op].interleave;
-        spacing = neon_ls_element_type[op].spacing;
-        if (size == 3 && (interleave | spacing) != 1) {
-            return 1;
-        }
-        /* For our purposes, bytes are always little-endian.  */
-        if (size == 0) {
-            endian = MO_LE;
-        }
-        /* Consecutive little-endian elements from a single register
-         * can be promoted to a larger little-endian operation.
-         */
-        if (interleave == 1 && endian == MO_LE) {
-            size = 3;
-        }
-        tmp64 = tcg_temp_new_i64();
-        addr = tcg_temp_new_i32();
-        tmp2 = tcg_const_i32(1 << size);
-        load_reg_var(s, addr, rn);
-        for (reg = 0; reg < nregs; reg++) {
-            for (n = 0; n < 8 >> size; n++) {
-                int xs;
-                for (xs = 0; xs < interleave; xs++) {
-                    int tt = rd + reg + spacing * xs;
-
-                    if (load) {
-                        gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size);
-                        neon_store_element64(tt, n, size, tmp64);
-                    } else {
-                        neon_load_element64(tmp64, tt, n, size);
-                        gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size);
-                    }
-                    tcg_gen_add_i32(addr, addr, tmp2);
-                }
-            }
-        }
-        tcg_temp_free_i32(addr);
-        tcg_temp_free_i32(tmp2);
-        tcg_temp_free_i64(tmp64);
-        stride = nregs * interleave * 8;
+        /* Load store all elements -- handled already by decodetree */
+        return 1;
     } else {
         size = (insn >> 10) & 3;
         if (size == 3) {
-- 
2.20.1

Convert the Neon "load single structure to all lanes" insns to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-13-peter.maydell@linaro.org
---
 target/arm/neon-ls.decode       |  5 +++
 target/arm/translate-neon.inc.c | 73 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 55 +------------------------
 3 files changed, 80 insertions(+), 53 deletions(-)

Convert the Neon "load/store single structure to one lane" insns to
decodetree.

As this is the last set of insns in the Neon load/store group,
we can remove the whole disas_neon_ls_insn() function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-14-peter.maydell@linaro.org
---
 target/arm/neon-ls.decode       |  11 +++
 target/arm/translate-neon.inc.c |  89 +++++++++++++++++++
 target/arm/translate.c          | 147 --------------------------------
 3 files changed, 100 insertions(+), 147 deletions(-)
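
Reviewer note: the register stride used to be decoded with a ternary on
insn bit 5 or bit 6; the patch expresses it as a one-bit field passed
through !function=plus1. The standalone sketch below checks that the two
agree and models the bound check on the last register. Here plus1() drops
the DisasContext argument, and legacy_stride(), field_stride() and
last_reg_in_range() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Same transform as the patch's plus1(), without the DisasContext. */
int plus1(int x)
{
    return x + 1;
}

/* Stride as decoded by the removed code, for element size 0, 1 or 2. */
int legacy_stride(uint32_t insn, int size)
{
    switch (size) {
    case 0:
        return 1;
    case 1:
        return (insn & (1u << 5)) ? 2 : 1;
    case 2:
        return (insn & (1u << 6)) ? 2 : 1;
    default:
        assert(0);
        return 0;
    }
}

/* Stride as the decodetree patterns produce it: stride=1 or plus1(bit). */
int field_stride(uint32_t insn, int size)
{
    switch (size) {
    case 0:
        return 1;                          /* stride=1          */
    case 1:
        return plus1((insn >> 5) & 1);     /* %imm1_5_p1        */
    case 2:
        return plus1((insn >> 6) & 1);     /* %imm1_6_p1        */
    default:
        assert(0);
        return 0;
    }
}

/* Returns 1 if the last register accessed is still within D0..D31. */
int last_reg_in_range(int vd, int stride, int nregs)
{
    return vd + stride * (nregs - 1) <= 31;
}
```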

diff --git a/target/arm/neon-ls.decode b/target/arm/neon-ls.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-ls.decode
+++ b/target/arm/neon-ls.decode
@@ -XXX,XX +XXX,XX @@ VLDST_multiple 1111 0100 0 . l:1 0 rn:4 .... itype:4 size:2 align:2 rm:4 \
 
 VLD_all_lanes  1111 0100 1 . 1 0 rn:4 .... 11 n:2 size:2 t:1 a:1 rm:4 \
                vd=%vd_dp
+
+# Neon load/store single structure to one lane
+%imm1_5_p1 5:1 !function=plus1
+%imm1_6_p1 6:1 !function=plus1
+
+VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 00 n:2 reg_idx:3 align:1 rm:4 \
+               vd=%vd_dp size=0 stride=1
+VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 01 n:2 reg_idx:2 align:2 rm:4 \
+               vd=%vd_dp size=1 stride=%imm1_5_p1
+VLDST_single   1111 0100 1 . l:1 0 rn:4 .... 10 n:2 reg_idx:1 align:3 rm:4 \
+               vd=%vd_dp size=2 stride=%imm1_6_p1
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@
  * It might be possible to convert it to a standalone .c file eventually.
  */
 
+static inline int plus1(DisasContext *s, int x)
+{
+    return x + 1;
+}
+
 /* Include the generated Neon decoder */
 #include "decode-neon-dp.inc.c"
 #include "decode-neon-ls.inc.c"
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
 
     return true;
 }
+
+static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
+{
+    /* Neon load/store single structure to one lane */
+    int reg;
+    int nregs = a->n + 1;
+    int vd = a->vd;
+    TCGv_i32 addr, tmp;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+        return false;
+    }
+
+    /* Catch the UNDEF cases. This is unavoidably a bit messy. */
+    switch (nregs) {
+    case 1:
+        if (((a->align & (1 << a->size)) != 0) ||
+            (a->size == 2 && ((a->align & 3) == 1 || (a->align & 3) == 2))) {
+            return false;
+        }
+        break;
+    case 3:
+        if ((a->align & 1) != 0) {
+            return false;
+        }
+        /* fall through */
+    case 2:
+        if (a->size == 2 && (a->align & 2) != 0) {
+            return false;
+        }
+        break;
+    case 4:
+        if ((a->size == 2) && ((a->align & 3) == 3)) {
+            return false;
+        }
+        break;
+    default:
+        abort();
+    }
+    if ((vd + a->stride * (nregs - 1)) > 31) {
+        /*
+         * Attempts to write off the end of the register file are
+         * UNPREDICTABLE; we choose to UNDEF because otherwise we would
+         * access off the end of the array that holds the register data.
+         */
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    tmp = tcg_temp_new_i32();
+    addr = tcg_temp_new_i32();
+    load_reg_var(s, addr, a->rn);
+    /*
+     * TODO: if we implemented alignment exceptions, we should check
+     * addr against the alignment encoded in a->align here.
+     */
+    for (reg = 0; reg < nregs; reg++) {
+        if (a->l) {
+            gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
+                            s->be_data | a->size);
+            neon_store_element(vd, a->reg_idx, a->size, tmp);
+        } else { /* Store */
+            neon_load_element(tmp, vd, a->reg_idx, a->size);
+            gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
+                            s->be_data | a->size);
+        }
+        vd += a->stride;
+        tcg_gen_addi_i32(addr, addr, 1 << a->size);
+    }
+    tcg_temp_free_i32(addr);
+    tcg_temp_free_i32(tmp);
+
+    gen_neon_ldst_base_update(s, a->rm, a->rn, (1 << a->size) * nregs);
+
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_neon_trn_u16(TCGv_i32 t0, TCGv_i32 t1)
     tcg_temp_free_i32(rd);
 }
 
-
-/* Translate a NEON load/store element instruction.  Return nonzero if the
-   instruction is invalid.  */
-static int disas_neon_ls_insn(DisasContext *s, uint32_t insn)
-{
-    int rd, rn, rm;
-    int nregs;
-    int stride;
-    int size;
-    int reg;
-    int load;
-    TCGv_i32 addr;
-    TCGv_i32 tmp;
-
-    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-        return 1;
-    }
-
-    /* FIXME: this access check should not take precedence over UNDEF
-     * for invalid encodings; we will generate incorrect syndrome information
-     * for attempts to execute invalid vfp/neon encodings with FP disabled.
-     */
-    if (s->fp_excp_el) {
-        gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
-                           syn_simd_access_trap(1, 0xe, false), s->fp_excp_el);
-        return 0;
-    }
-
-    if (!s->vfp_enabled)
-      return 1;
-    VFP_DREG_D(rd, insn);
-    rn = (insn >> 16) & 0xf;
-    rm = insn & 0xf;
-    load = (insn & (1 << 21)) != 0;
-    if ((insn & (1 << 23)) == 0) {
-        /* Load store all elements -- handled already by decodetree */
-        return 1;
-    } else {
-        size = (insn >> 10) & 3;
-        if (size == 3) {
-            /* Load single element to all lanes -- handled by decodetree  */
-            return 1;
-        } else {
-            /* Single element.  */
-            int idx = (insn >> 4) & 0xf;
-            int reg_idx;
-            switch (size) {
-            case 0:
-                reg_idx = (insn >> 5) & 7;
-                stride = 1;
-                break;
-            case 1:
-                reg_idx = (insn >> 6) & 3;
-                stride = (insn & (1 << 5)) ? 2 : 1;
-                break;
-            case 2:
-                reg_idx = (insn >> 7) & 1;
-                stride = (insn & (1 << 6)) ? 2 : 1;
-                break;
-            default:
-                abort();
-            }
-            nregs = ((insn >> 8) & 3) + 1;
-            /* Catch the UNDEF cases. This is unavoidably a bit messy. */
-            switch (nregs) {
-            case 1:
-                if (((idx & (1 << size)) != 0) ||
-                    (size == 2 && ((idx & 3) == 1 || (idx & 3) == 2))) {
-                    return 1;
-                }
-                break;
-            case 3:
-                if ((idx & 1) != 0) {
-                    return 1;
-                }
-                /* fall through */
-            case 2:
-                if (size == 2 && (idx & 2) != 0) {
-                    return 1;
-                }
-                break;
-            case 4:
-                if ((size == 2) && ((idx & 3) == 3)) {
-                    return 1;
-                }
-                break;
-            default:
-                abort();
-            }
-            if ((rd + stride * (nregs - 1)) > 31) {
-                /* Attempts to write off the end of the register file
-                 * are UNPREDICTABLE; we choose to UNDEF because otherwise
-                 * the neon_load_reg() would write off the end of the array.
-                 */
-                return 1;
-            }
-            tmp = tcg_temp_new_i32();
-            addr = tcg_temp_new_i32();
-            load_reg_var(s, addr, rn);
-            for (reg = 0; reg < nregs; reg++) {
-                if (load) {
-                    gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s),
-                                    s->be_data | size);
-                    neon_store_element(rd, reg_idx, size, tmp);
-                } else { /* Store */
-                    neon_load_element(tmp, rd, reg_idx, size);
-                    gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
-                                    s->be_data | size);
-                }
-                rd += stride;
-                tcg_gen_addi_i32(addr, addr, 1 << size);
-            }
-            tcg_temp_free_i32(addr);
-            tcg_temp_free_i32(tmp);
-            stride = nregs * (1 << size);
-        }
-    }
-    if (rm != 15) {
-        TCGv_i32 base;
-
-        base = load_reg(s, rn);
-        if (rm == 13) {
-            tcg_gen_addi_i32(base, base, stride);
-        } else {
-            TCGv_i32 index;
-            index = load_reg(s, rm);
-            tcg_gen_add_i32(base, base, index);
-            tcg_temp_free_i32(index);
-        }
-        store_reg(s, rn, base);
-    }
-    return 0;
-}
-
 static inline void gen_neon_narrow(int size, TCGv_i32 dest, TCGv_i64 src)
 {
     switch (size) {
@@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
             }
             return;
         }
-        if ((insn & 0x0f100000) == 0x04000000) {
-            /* NEON load/store.  */
-            if (disas_neon_ls_insn(s, insn)) {
-                goto illegal_op;
-            }
-            return;
-        }
         if ((insn & 0x0e000f00) == 0x0c000100) {
             if (arm_dc_feature(s, ARM_FEATURE_IWMMXT)) {
                 /* iWMMXt register transfer.  */
@@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
         }
         break;
     case 12:
-        if ((insn & 0x01100000) == 0x01000000) {
-            if (disas_neon_ls_insn(s, insn)) {
-                goto illegal_op;
-            }
-            break;
-        }
         goto illegal_op;
     default:
     illegal_op:
-- 
2.20.1

Convert the Neon 3-reg-same VADD and VSUB insns to decodetree.

Note that we don't need the neon_3r_sizes[op] check here because all
size values are OK for VADD and VSUB; we'll add that check when we convert
the first insn that has size restrictions.

For this we need one of the GVecGen*Fn typedefs currently in
translate-a64.h; move them all to translate.h as a block so they
are visible to the 32-bit decoder.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-15-peter.maydell@linaro.org
---
 target/arm/translate-a64.h      |  9 --------
 target/arm/translate.h          |  9 ++++++++
 target/arm/neon-dp.decode       | 17 +++++++++++++++
 target/arm/translate-neon.inc.c | 38 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 14 ++++--------
 5 files changed, 68 insertions(+), 19 deletions(-)
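
Reviewer note: do_3same() holds the checks common to the group and the
DO_3SAME macro stamps out one thin trans function per insn. The standalone
sketch below shows the same shape with toy types: Arg3Same mirrors the
&3same fields, while Toy3Fn, toy_regs[] and the has_d16_d31 flag are
hypothetical stand-ins for GVecGen3Fn, the real register file and the
aa32_simd_r32 feature test. It is not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the fields of the &3same argument set in the patch. */
typedef struct {
    int vm, vn, vd, q, size;
} Arg3Same;

/* Toy stand-in for GVecGen3Fn: works on plain integers, not TCG offsets. */
typedef uint32_t Toy3Fn(uint32_t n, uint32_t m);

uint32_t toy_add(uint32_t n, uint32_t m) { return n + m; }
uint32_t toy_sub(uint32_t n, uint32_t m) { return n - m; }

/* Toy register file: one 32-bit value per D register. */
uint32_t toy_regs[32];

/* Shared body: validate operands, then apply fn. Returns false for UNDEF. */
bool do_3same(const Arg3Same *a, bool has_d16_d31, Toy3Fn fn)
{
    /* UNDEF accesses to D16-D31 if they don't exist. */
    if (!has_d16_d31 && ((a->vd | a->vn | a->vm) & 0x10)) {
        return false;
    }
    /* With Q set each operand names a Q register: D numbers must be even. */
    if ((a->vn | a->vm | a->vd) & a->q) {
        return false;
    }
    toy_regs[a->vd] = fn(toy_regs[a->vn], toy_regs[a->vm]);
    return true;
}

/* One trans function per insn, differing only in the operation passed. */
#define DO_3SAME(INSN, FUNC)                                        \
    bool trans_##INSN##_3s(const Arg3Same *a, bool has_d16_d31)     \
    {                                                               \
        return do_3same(a, has_d16_d31, FUNC);                      \
    }

DO_3SAME(VADD, toy_add)
DO_3SAME(VSUB, toy_sub)
```

Keeping the checks in one helper means later conversions in the 3-reg-same
group only need a new pattern line and a one-line macro invocation.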

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -XXX,XX +XXX,XX @@ static inline int vec_full_reg_size(DisasContext *s)
 
 bool disas_sve(DisasContext *, uint32_t);
 
-/* Note that the gvec expanders operate on offsets + sizes.  */
-typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
-typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
-                         uint32_t, uint32_t);
-typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
-                        uint32_t, uint32_t, uint32_t);
-typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
-                        uint32_t, uint32_t, uint32_t);
-
 #endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 #define dc_isar_feature(name, ctx) \
     ({ DisasContext *ctx_ = (ctx); isar_feature_##name(ctx_->isar); })
 
+/* Note that the gvec expanders operate on offsets + sizes.  */
+typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
+typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
+                         uint32_t, uint32_t);
+typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
+typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
+                        uint32_t, uint32_t, uint32_t);
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
 #
 # This file is processed by scripts/decodetree.py
 #
+# VFP/Neon register fields; same as vfp.decode
+%vm_dp  5:1 0:4
+%vn_dp  7:1 16:4
+%vd_dp  22:1 12:4
 
 # Encodings for Neon data processing instructions where the T32 encoding
 # is a simple transformation of the A32 encoding.
@@ -XXX,XX +XXX,XX @@
 #   0b111p_1111_qqqq_qqqq_qqqq_qqqq_qqqq_qqqq
 # This file works on the A32 encoding only; calling code for T32 has to
 # transform the insn into the A32 version first.
+
+######################################################################
+# 3-reg-same grouping:
+# 1111 001 U 0 D sz:2 Vn:4 Vd:4 opc:4 N Q M op Vm:4
+######################################################################
+
+&3same vm vn vd q size
+
+@3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
+VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
 
     return true;
 }
+
+static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
+{
+    int vec_size = a->q ? 16 : 8;
+    int rd_ofs = neon_reg_offset(a->vd, 0);
+    int rn_ofs = neon_reg_offset(a->vn, 0);
+    int rm_ofs = neon_reg_offset(a->vm, 0);
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fn(a->size, rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
+    return true;
+}
+
+#define DO_3SAME(INSN, FUNC)                                            \
+    static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)        \
+    {                                                                   \
+        return do_3same(s, a, FUNC);                                    \
+    }
+
+DO_3SAME(VADD, tcg_gen_gvec_add)
+DO_3SAME(VSUB, tcg_gen_gvec_sub)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 0;
 
-        case NEON_3R_VADD_VSUB:
-            if (u) {
-                tcg_gen_gvec_sub(size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            } else {
-                tcg_gen_gvec_add(size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            }
-            return 0;
-
         case NEON_3R_VQADD:
             tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
                            rn_ofs, rm_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
                            u ? &ushl_op[size] : &sshl_op[size]);
             return 0;
+
+        case NEON_3R_VADD_VSUB:
+            /* Already handled by decodetree */
+            return 1;
         }
 
         if (size == 3) {
-- 
2.20.1
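
Note on the patch above: the %vm_dp, %vn_dp and %vd_dp fields each glue a single high bit onto a 4-bit register field, and do_3same() then rejects register numbers that are not valid for the operation. The following is a minimal sketch of that decode and of the two register checks, not the code decodetree generates. Arg3Same, bits(), decode_3same() and undef_3same() are hypothetical names used only for illustration, and has_d16_d31 stands in for the aa32_simd_r32 feature test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the generated arg_3same struct. */
typedef struct {
    int vm, vn, vd, q, size;
} Arg3Same;

/* Extract 'len' bits of 'insn' starting at bit position 'pos'. */
static uint32_t bits(uint32_t insn, int pos, int len)
{
    assert(len > 0 && len < 32);
    return (insn >> pos) & ((1u << len) - 1);
}

/* Decode the @3same format from an A32-encoded instruction word. */
Arg3Same decode_3same(uint32_t insn)
{
    Arg3Same a;

    a.vm = (int)((bits(insn, 5, 1) << 4) | bits(insn, 0, 4));   /* %vm_dp 5:1 0:4   */
    a.vn = (int)((bits(insn, 7, 1) << 4) | bits(insn, 16, 4));  /* %vn_dp 7:1 16:4  */
    a.vd = (int)((bits(insn, 22, 1) << 4) | bits(insn, 12, 4)); /* %vd_dp 22:1 12:4 */
    a.q = (int)bits(insn, 6, 1);
    a.size = (int)bits(insn, 20, 2);
    return a;
}

/* Register-number checks as in do_3same(): true means the insn UNDEFs. */
bool undef_3same(const Arg3Same *a, bool has_d16_d31)
{
    /* D16-D31 are only usable if the CPU implements them. */
    if (!has_d16_d31 && ((a->vd | a->vn | a->vm) & 0x10)) {
        return true;
    }
    /* With Q=1 every operand must be an even-numbered D register. */
    return ((a->vn | a->vm | a->vd) & a->q) != 0;
}
```

The last check works because q is either 0 or 1, so ANDing it with the OR of the register numbers tests bit 0 of all three registers at once, and only when Q is set.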

Convert the Neon logic ops in the 3-reg-same grouping to decodetree.
Note that for the logic ops the 'size' field forms part of their
decode and the actual operations are always bitwise.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-16-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       | 12 +++++++++++
 target/arm/translate-neon.inc.c | 19 +++++++++++++++++
 target/arm/translate.c          | 38 +--------------------------------
 3 files changed, 32 insertions(+), 37 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
 @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+@3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
+                 &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
+
+VAND_3s          1111 001 0 0 . 00 .... .... 0001 ... 1 .... @3same_logic
+VBIC_3s          1111 001 0 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+VORR_3s          1111 001 0 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+VORN_3s          1111 001 0 0 . 11 .... .... 0001 ... 1 .... @3same_logic
+VEOR_3s          1111 001 1 0 . 00 .... .... 0001 ... 1 .... @3same_logic
+VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
+VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
+VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
+
 VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
 VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
 
 DO_3SAME(VADD, tcg_gen_gvec_add)
 DO_3SAME(VSUB, tcg_gen_gvec_sub)
+DO_3SAME(VAND, tcg_gen_gvec_and)
+DO_3SAME(VBIC, tcg_gen_gvec_andc)
+DO_3SAME(VORR, tcg_gen_gvec_or)
+DO_3SAME(VORN, tcg_gen_gvec_orc)
+DO_3SAME(VEOR, tcg_gen_gvec_xor)
+
+/* These insns are all gvec_bitsel but with the inputs in various orders. */
+#define DO_3SAME_BITSEL(INSN, O1, O2, O3)                               \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_bitsel(vece, rd_ofs, O1, O2, O3, oprsz, maxsz);    \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_BITSEL(VBSL, rd_ofs, rn_ofs, rm_ofs)
+DO_3SAME_BITSEL(VBIT, rm_ofs, rn_ofs, rd_ofs)
+DO_3SAME_BITSEL(VBIF, rm_ofs, rd_ofs, rn_ofs)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 1;
 
-        case NEON_3R_LOGIC: /* Logic ops.  */
-            switch ((u << 2) | size) {
-            case 0: /* VAND */
-                tcg_gen_gvec_and(0, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-                break;
-            case 1: /* VBIC */
-                tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs,
-                                  vec_size, vec_size);
-                break;
-            case 2: /* VORR */
-                tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs,
-                                vec_size, vec_size);
-                break;
-            case 3: /* VORN */
-                tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-                break;
-            case 4: /* VEOR */
-                tcg_gen_gvec_xor(0, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-                break;
-            case 5: /* VBSL */
-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rd_ofs, rn_ofs, rm_ofs,
-                                    vec_size, vec_size);
-                break;
-            case 6: /* VBIT */
-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rn_ofs, rd_ofs,
-                                    vec_size, vec_size);
-                break;
-            case 7: /* VBIF */
-                tcg_gen_gvec_bitsel(MO_8, rd_ofs, rm_ofs, rd_ofs, rn_ofs,
-                                    vec_size, vec_size);
-                break;
-            }
-            return 0;
-
         case NEON_3R_VQADD:
             tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
                            rn_ofs, rm_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 0;
 
         case NEON_3R_VADD_VSUB:
+        case NEON_3R_LOGIC:
             /* Already handled by decodetree */
             return 1;
         }
-- 
2.20.1
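
Note on the patch above: VBSL, VBIT and VBIF differ only in which register supplies the selector and which two supply the data. The sketch below models this on a single 64-bit value. It assumes the documented TCG bitsel semantics, d = (b & a) | (c & ~a); the function names are illustrative and are not QEMU functions.

```c
#include <stdint.h>

/* Bitwise select: take bits of b where a is 1, bits of c where a is 0. */
static uint64_t bitsel(uint64_t a, uint64_t b, uint64_t c)
{
    return (b & a) | (c & ~a);
}

/* VBSL: the old destination value is the selector. */
uint64_t vbsl(uint64_t d, uint64_t n, uint64_t m)
{
    return bitsel(d, n, m);
}

/* VBIT: insert bits of n into d where m is 1. */
uint64_t vbit(uint64_t d, uint64_t n, uint64_t m)
{
    return bitsel(m, n, d);
}

/* VBIF: insert bits of n into d where m is 0. */
uint64_t vbif(uint64_t d, uint64_t n, uint64_t m)
{
    return bitsel(m, d, n);
}
```

The argument orders mirror the O1, O2, O3 arguments passed to DO_3SAME_BITSEL in the patch. Since the operation is purely bitwise, the element size makes no difference, which is why the logic-op format can fix size=0.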

Convert the Neon 3-reg-same VMAX and VMIN insns to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-17-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  5 +++++
 target/arm/translate-neon.inc.c | 14 ++++++++++++++
 target/arm/translate.c          | 21 ++-------------------
 3 files changed, 21 insertions(+), 19 deletions(-)

Convert the Neon comparison ops in the 3-reg-same grouping
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-18-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  8 ++++++++
 target/arm/translate-neon.inc.c | 22 ++++++++++++++++++++++
 target/arm/translate.c          | 23 +++--------------------
 3 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
 VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
 
+VCGT_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
+VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
+VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
+VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
+
 VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
 VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
 VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
@@ -XXX,XX +XXX,XX @@ VMIN_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 1 .... @3same
 
 VADD_3s          1111 001 0 0 . .. .... .... 1000 . . . 0 .... @3same
 VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
+
+VTST_3s          1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
+VCEQ_3s          1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
 DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
 DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
 DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
+
+#define DO_3SAME_CMP(INSN, COND)                                        \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_cmp(COND, vece, rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz); \
+    }                                                                   \
+    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
+
+DO_3SAME_CMP(VCGT_S, TCG_COND_GT)
+DO_3SAME_CMP(VCGT_U, TCG_COND_GTU)
+DO_3SAME_CMP(VCGE_S, TCG_COND_GE)
+DO_3SAME_CMP(VCGE_U, TCG_COND_GEU)
+DO_3SAME_CMP(VCEQ, TCG_COND_EQ)
+
+static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                         uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
+}
+DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                            u ? &mls_op[size] : &mla_op[size]);
             return 0;
 
-        case NEON_3R_VTST_VCEQ:
-            if (u) { /* VCEQ */
-                tcg_gen_gvec_cmp(TCG_COND_EQ, size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            } else { /* VTST */
-                tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,
-                               vec_size, vec_size, &cmtst_op[size]);
-            }
-            return 0;
-
-        case NEON_3R_VCGT:
-            tcg_gen_gvec_cmp(u ? TCG_COND_GTU : TCG_COND_GT, size,
-                             rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
-            return 0;
-
-        case NEON_3R_VCGE:
-            tcg_gen_gvec_cmp(u ? TCG_COND_GEU : TCG_COND_GE, size,
-                             rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size);
-            return 0;
-
         case NEON_3R_VSHL:
             /* Note the operation is vshl vd,vm,vn */
             tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_LOGIC:
         case NEON_3R_VMAX:
         case NEON_3R_VMIN:
+        case NEON_3R_VTST_VCEQ:
+        case NEON_3R_VCGT:
+        case NEON_3R_VCGE:
             /* Already handled by decodetree */
             return 1;
         }
-- 
2.20.1
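
Note on the patch above: each comparison writes an all-ones element where the condition holds and an all-zeroes element where it does not, and VTST tests whether the bitwise AND of the two inputs is nonzero. The sketch below models a few of these for eight 8-bit lanes packed into one 64-bit value. It is a scalar illustration only; map_lanes8() and the lane_* helpers are hypothetical names, and the real code emits TCG vector ops instead.

```c
#include <stdint.h>

/* Apply a per-lane operation to eight 8-bit lanes packed in 64 bits. */
static uint64_t map_lanes8(uint64_t n, uint64_t m,
                           uint8_t (*op)(uint8_t, uint8_t))
{
    uint64_t d = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t a = (uint8_t)(n >> (8 * i));
        uint8_t b = (uint8_t)(m >> (8 * i));
        d |= (uint64_t)op(a, b) << (8 * i);
    }
    return d;
}

/* Per-lane results are 0xFF for true and 0x00 for false. */
static uint8_t lane_cgt_s(uint8_t a, uint8_t b)
{
    return (int8_t)a > (int8_t)b ? 0xFF : 0x00;
}

static uint8_t lane_cgt_u(uint8_t a, uint8_t b)
{
    return a > b ? 0xFF : 0x00;
}

static uint8_t lane_ceq(uint8_t a, uint8_t b)
{
    return a == b ? 0xFF : 0x00;
}

static uint8_t lane_tst(uint8_t a, uint8_t b)
{
    return (a & b) != 0 ? 0xFF : 0x00;
}

uint64_t vcgt_s8(uint64_t n, uint64_t m) { return map_lanes8(n, m, lane_cgt_s); }
uint64_t vcgt_u8(uint64_t n, uint64_t m) { return map_lanes8(n, m, lane_cgt_u); }
uint64_t vceq8(uint64_t n, uint64_t m)   { return map_lanes8(n, m, lane_ceq); }
uint64_t vtst8(uint64_t n, uint64_t m)   { return map_lanes8(n, m, lane_tst); }
```

The signed and unsigned variants differ only in how the lane is interpreted, which corresponds to the U bit selecting between the TCG_COND_GT and TCG_COND_GTU patterns.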

Convert the Neon VQADD/VQSUB insns in the 3-reg-same grouping
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-19-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  6 ++++++
 target/arm/translate-neon.inc.c | 15 +++++++++++++++
 target/arm/translate.c          | 14 ++------------
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@
 @3same           .... ... . . . size:2 .... .... .... . q:1 . . .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+VQADD_S_3s       1111 001 0 0 . .. .... .... 0000 . . . 1 .... @3same
+VQADD_U_3s       1111 001 1 0 . .. .... .... 0000 . . . 1 .... @3same
+
 @3same_logic     .... ... . . . .. .... .... .... . q:1 .. .... \
                  &3same vm=%vm_dp vn=%vn_dp vd=%vd_dp size=0
 
@@ -XXX,XX +XXX,XX @@ VBSL_3s          1111 001 1 0 . 01 .... .... 0001 ... 1 .... @3same_logic
 VBIT_3s          1111 001 1 0 . 10 .... .... 0001 ... 1 .... @3same_logic
 VBIF_3s          1111 001 1 0 . 11 .... .... 0001 ... 1 .... @3same_logic
 
+VQSUB_S_3s       1111 001 0 0 . .. .... .... 0010 . . . 1 .... @3same
+VQSUB_U_3s       1111 001 1 0 . .. .... .... 0010 . . . 1 .... @3same
+
 VCGT_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 0 .... @3same
 VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
 VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static void gen_VTST_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
     tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz, &cmtst_op[vece]);
 }
 DO_3SAME_NO_SZ_3(VTST, gen_VTST_3s)
+
+#define DO_3SAME_GVEC4(INSN, OPARRAY)                                   \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),           \
+                       rn_ofs, rm_ofs, oprsz, maxsz, &OPARRAY[vece]);   \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_GVEC4(VQADD_S, sqadd_op)
+DO_3SAME_GVEC4(VQADD_U, uqadd_op)
+DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
+DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 1;
 
-        case NEON_3R_VQADD:
-            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
-                           rn_ofs, rm_ofs, vec_size, vec_size,
-                           (u ? uqadd_op : sqadd_op) + size);
-            return 0;
-
-        case NEON_3R_VQSUB:
-            tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
-                           rn_ofs, rm_ofs, vec_size, vec_size,
-                           (u ? uqsub_op : sqsub_op) + size);
-            return 0;
-
         case NEON_3R_VMUL: /* VMUL */
             if (u) {
                 /* Polynomial case allows only P8.  */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VTST_VCEQ:
         case NEON_3R_VCGT:
         case NEON_3R_VCGE:
+        case NEON_3R_VQADD:
+        case NEON_3R_VQSUB:
             /* Already handled by decodetree */
             return 1;
         }
-- 
2.20.1
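
Note on the patch above: the saturating ops go through tcg_gen_gvec_4 because they have a second output besides the destination register, namely the QC state at vfp.qc, which records that some element saturated. The sketch below is a simplified scalar model for 8-bit elements. It assumes QC can be treated as a single sticky boolean, which is a simplification of how the state is actually stored, and the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed saturating add; sets *qc if the result had to be clamped. */
int8_t sqadd8(int8_t a, int8_t b, bool *qc)
{
    int sum = a + b;

    if (sum > INT8_MAX) {
        *qc = true;
        return INT8_MAX;
    }
    if (sum < INT8_MIN) {
        *qc = true;
        return INT8_MIN;
    }
    return (int8_t)sum;
}

/* Unsigned saturating add. */
uint8_t uqadd8(uint8_t a, uint8_t b, bool *qc)
{
    unsigned sum = (unsigned)a + b;

    if (sum > UINT8_MAX) {
        *qc = true;
        return UINT8_MAX;
    }
    return (uint8_t)sum;
}

/* Unsigned saturating subtract: clamps at zero. */
uint8_t uqsub8(uint8_t a, uint8_t b, bool *qc)
{
    if (a < b) {
        *qc = true;
        return 0;
    }
    return (uint8_t)(a - b);
}
```

None of the functions ever clears *qc, which models the sticky behaviour: once any element has saturated, the flag stays set until software resets it.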

Convert the Neon VMUL, VMLA, VMLS and VSHL insns in the
3-reg-same grouping to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-20-peter.maydell@linaro.org
---
 target/arm/neon-dp.decode       |  9 +++++++
 target/arm/translate-neon.inc.c | 44 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 28 +++------------------
 3 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VCGT_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 0 .... @3same
 VCGE_S_3s        1111 001 0 0 . .. .... .... 0011 . . . 1 .... @3same
 VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
 
+VSHL_S_3s        1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same
+VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same
+
 VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
 VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
 VMIN_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 1 .... @3same
@@ -XXX,XX +XXX,XX @@ VSUB_3s          1111 001 1 0 . .. .... .... 1000 . . . 0 .... @3same
 
 VTST_3s          1111 001 0 0 . .. .... .... 1000 . . . 1 .... @3same
 VCEQ_3s          1111 001 1 0 . .. .... .... 1000 . . . 1 .... @3same
+
+VMLA_3s          1111 001 0 0 . .. .... .... 1001 . . . 0 .... @3same
+VMLS_3s          1111 001 1 0 . .. .... .... 1001 . . . 0 .... @3same
+
+VMUL_3s          1111 001 0 0 . .. .... .... 1001 . . . 1 .... @3same
+VMUL_p_3s        1111 001 1 0 . .. .... .... 1001 . . . 1 .... @3same
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_3SAME_NO_SZ_3(VMAX_S, tcg_gen_gvec_smax)
 DO_3SAME_NO_SZ_3(VMAX_U, tcg_gen_gvec_umax)
 DO_3SAME_NO_SZ_3(VMIN_S, tcg_gen_gvec_smin)
 DO_3SAME_NO_SZ_3(VMIN_U, tcg_gen_gvec_umin)
+DO_3SAME_NO_SZ_3(VMUL, tcg_gen_gvec_mul)
 
 #define DO_3SAME_CMP(INSN, COND)                                        \
     static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
@@ -XXX,XX +XXX,XX @@ DO_3SAME_GVEC4(VQADD_S, sqadd_op)
 DO_3SAME_GVEC4(VQADD_U, uqadd_op)
 DO_3SAME_GVEC4(VQSUB_S, sqsub_op)
 DO_3SAME_GVEC4(VQSUB_U, uqsub_op)
+
+static void gen_VMUL_p_3s(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                           uint32_t rm_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, oprsz, maxsz,
+                       0, gen_helper_gvec_pmul_b);
+}
+
+static bool trans_VMUL_p_3s(DisasContext *s, arg_3same *a)
+{
+    if (a->size != 0) {
+        return false;
+    }
+    return do_3same(s, a, gen_VMUL_p_3s);
+}
+
+#define DO_3SAME_GVEC3_NO_SZ_3(INSN, OPARRAY)                           \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs,                          \
+                       oprsz, maxsz, &OPARRAY[vece]);                   \
+    }                                                                   \
+    DO_3SAME_NO_SZ_3(INSN, gen_##INSN##_3s)
+
+
+DO_3SAME_GVEC3_NO_SZ_3(VMLA, mla_op)
+DO_3SAME_GVEC3_NO_SZ_3(VMLS, mls_op)
+
+#define DO_3SAME_GVEC3_SHIFT(INSN, OPARRAY)                             \
+    static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs,         \
+                                uint32_t rn_ofs, uint32_t rm_ofs,       \
+                                uint32_t oprsz, uint32_t maxsz)         \
+    {                                                                   \
+        /* Note the operation is vshl vd,vm,vn */                       \
+        tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs,                          \
+                       oprsz, maxsz, &OPARRAY[vece]);                   \
+    }                                                                   \
+    DO_3SAME(INSN, gen_##INSN##_3s)
+
+DO_3SAME_GVEC3_SHIFT(VSHL_S, sshl_op)
+DO_3SAME_GVEC3_SHIFT(VSHL_U, ushl_op)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             }
             return 1;
 
-        case NEON_3R_VMUL: /* VMUL */
-            if (u) {
-                /* Polynomial case allows only P8.  */
-                if (size != 0) {
-                    return 1;
-                }
-                tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
-                                   0, gen_helper_gvec_pmul_b);
-            } else {
-                tcg_gen_gvec_mul(size, rd_ofs, rn_ofs, rm_ofs,
-                                 vec_size, vec_size);
-            }
-            return 0;
-
-        case NEON_3R_VML: /* VMLA, VMLS */
-            tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
-                           u ? &mls_op[size] : &mla_op[size]);
-            return 0;
-
-        case NEON_3R_VSHL:
-            /* Note the operation is vshl vd,vm,vn */
-            tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
-                           u ? &ushl_op[size] : &sshl_op[size]);
-            return 0;
-
         case NEON_3R_VADD_VSUB:
         case NEON_3R_LOGIC:
         case NEON_3R_VMAX:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VCGE:
         case NEON_3R_VQADD:
         case NEON_3R_VQSUB:
+        case NEON_3R_VMUL:
+        case NEON_3R_VML:
+        case NEON_3R_VSHL:
             /* Already handled by decodetree */
             return 1;
         }
-- 
2.20.1
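
Note on the patch above: the polynomial form of VMUL is only valid for size == 0, that is for 8-bit polynomials, which is why trans_VMUL_p_3s() rejects other sizes before calling the helper. For reference, a polynomial multiply is a carry-less multiply in which the partial products are combined with XOR instead of addition. The sketch below shows one 8-bit lane, keeping the low 8 bits of the product. pmul8() is an illustrative name and is not the gvec_pmul_b helper itself.

```c
#include <stdint.h>

/*
 * Carry-less (polynomial) multiply of two 8-bit values over GF(2),
 * truncated to the low 8 bits of the product.
 */
uint8_t pmul8(uint8_t a, uint8_t b)
{
    uint8_t result = 0;

    for (int i = 0; i < 8; i++) {
        if (b & (1u << i)) {
            /* XOR in the shifted partial product; no carries propagate. */
            result ^= (uint8_t)(a << i);
        }
    }
    return result;
}
```

For example, 3 times 3 is 5 here and not 9, because (x + 1)(x + 1) = x^2 + 1 once the two middle x terms cancel under XOR.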

We're going to want at least some of the NeonGen* typedefs
for the refactored 32-bit Neon decoder, so move them all
to translate.h since it makes more sense to keep them in
one group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200430181003.21682-23-peter.maydell@linaro.org
---
 target/arm/translate.h     | 17 +++++++++++++++++
 target/arm/translate-a64.c | 17 -----------------
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
 typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
                         uint32_t, uint32_t, uint32_t);
 
+/* Function prototype for gen_ functions for calling Neon helpers */
+typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
+typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
+typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
+typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
+typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
+typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
+typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
+typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
+typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ typedef struct AArch64DecodeTable {
     AArch64DecodeFn *disas_fn;
 } AArch64DecodeTable;
 
-/* Function prototype for gen_ functions for calling Neon helpers */
-typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
-typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
-typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
-typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
-typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
-typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
-typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
-typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
-typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
-typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
-typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
-typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
-typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
-typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
-
 /* initialize TCG globals.  */
 void a64_translate_init(void)
 {
-- 
2.20.1

The following changes since commit a97978bcc2d1f650c7d411428806e5b03082b8c7:

Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210603' into staging (2021-06-03 10:00:35 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210603

for you to fetch changes up to 1c861885894d840235954060050d240259f5340b:

tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed (2021-06-03 16:43:27 +0100)

----------------------------------------------------------------
target-arm queue:
 * Some not-yet-enabled preliminaries for M-profile MVE support
 * Consistently use "Cortex-Axx", not "Cortex Axx" in docs, comments
 * docs: Fix installation of man pages with Sphinx 4.x
 * Mark LDS{MIN,MAX} as signed operations
 * Fix missing syndrome value for DAIF and PAC check exceptions
 * Implement BFloat16 extensions
 * Refactoring of hvf accelerator code in preparation for aarch64 support
 * Fix some coverity nits in test code

----------------------------------------------------------------
Alexander Graf (12):
      hvf: Move assert_hvf_ok() into common directory
      hvf: Move vcpu thread functions into common directory
      hvf: Move cpu functions into common directory
      hvf: Move hvf internal definitions into common header
      hvf: Make hvf_set_phys_mem() static
      hvf: Remove use of hv_uvaddr_t and hv_gpaddr_t
      hvf: Split out common code on vcpu init and destroy
      hvf: Use cpu_synchronize_state()
      hvf: Make synchronize functions static
      hvf: Remove hvf-accel-ops.h
      hvf: Introduce hvf vcpu struct
      hvf: Simplify post reset/init/loadvm hooks

Damien Goutte-Gattat (1):
      docs: Fix installation of man pages with Sphinx 4.x

Jamie Iles (4):
      target/arm: fix missing exception class
      target/arm: fold do_raise_exception into raise_exception
      target/arm: use raise_exception_ra for MTE check failure
      target/arm: use raise_exception_ra for stack limit exception

Peter Maydell (15):
      target/arm: Add isar feature check functions for MVE
      target/arm: Update feature checks for insns which are "MVE or FP"
      target/arm: Move fpsp/fpdp isar check into callers of do_vfp_2op_sp/dp
      target/arm: Add MVE check to VMOV_reg_sp and VMOV_reg_dp
      target/arm: Fix return values in fp_sysreg_checks()
      target/arm: Implement M-profile VPR register
      target/arm: Make FPSCR.LTPSIZE writable for MVE
      target/arm: Allow board models to specify initial NS VTOR
      arm: Consistently use "Cortex-Axx", not "Cortex Axx"
      tests/qtest/bios-tables-test: Check for dup2() failure
      tests/qtest/e1000e-test: Check qemu_recv() succeeded
      tests/qtest/hd-geo-test: Fix checks on mkstemp() return value
      tests/qtest/pflash-cfi02-test: Avoid potential integer overflow
      tests/qtest/tpm-tests: Remove unnecessary NULL checks
      tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed

Richard Henderson (13):
      target/arm: Mark LDS{MIN,MAX} as signed operations
      target/arm: Add isar_feature_{aa32, aa64, aa64_sve}_bf16
      target/arm: Unify unallocated path in disas_fp_1src
      target/arm: Implement scalar float32 to bfloat16 conversion
      target/arm: Implement vector float32 to bfloat16 conversion
      softfpu: Add float_round_to_odd_inf
      target/arm: Implement bfloat16 dot product (vector)
      target/arm: Implement bfloat16 dot product (indexed)
      target/arm: Implement bfloat16 matrix multiply accumulate
      target/arm: Implement bfloat widening fma (vector)
      target/arm: Implement bfloat widening fma (indexed)
      linux-user/aarch64: Enable hwcap bits for bfloat16
      target/arm: Enable BFloat16 extensions

Add the isar feature check functions we will need for v8.1M MVE:
 * a check for MVE present: this corresponds to the pseudocode's
   CheckDecodeFaults(ExtType_Mve)
 * a check for the optional floating-point part of MVE: this
   corresponds to CheckDecodeFaults(ExtType_MveFp)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-2-peter.maydell@linaro.org
---
 target/arm/cpu.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
     }
 }
 
+static inline bool isar_feature_aa32_mve(const ARMISARegisters *id)
+{
+    /*
+     * Return true if MVE is supported (either integer or floating point).
+     * We must check for M-profile as the MVFR1 field means something
+     * else for A-profile.
+     */
+    return isar_feature_aa32_mprofile(id) &&
+        FIELD_EX32(id->mvfr1, MVFR1, MVE) > 0;
+}
+
+static inline bool isar_feature_aa32_mve_fp(const ARMISARegisters *id)
+{
+    /*
+     * Return true if the optional floating-point part of MVE is supported.
+     * We must check for M-profile as the MVFR1 field means something
+     * else for A-profile.
+     */
+    return isar_feature_aa32_mprofile(id) &&
+        FIELD_EX32(id->mvfr1, MVFR1, MVE) >= 2;
+}
+
 static inline bool isar_feature_aa32_vfp_simd(const ARMISARegisters *id)
 {
     /*
-- 
2.20.1
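
Note on the patch above: the two checks read the same MVFR1.MVE field and differ only in the threshold, since a value of 1 or more means MVE is present and a value of 2 or more means the floating-point part is present too. The sketch below is a standalone model of that logic. IdRegs is a hypothetical stand-in for ARMISARegisters plus the M-profile test, and the field position at bits [11:8] is an assumption made for this example, because the patch only shows the FIELD_EX32() accessor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down view of the ID register state. */
typedef struct {
    bool mprofile;   /* stands in for isar_feature_aa32_mprofile() */
    uint32_t mvfr1;
} IdRegs;

/* Assumed field layout: MVFR1.MVE at bits [11:8]. */
static uint32_t mvfr1_mve(uint32_t mvfr1)
{
    return (mvfr1 >> 8) & 0xf;
}

/* MVE present at all (integer only, or integer plus floating point). */
bool has_mve(const IdRegs *id)
{
    return id->mprofile && mvfr1_mve(id->mvfr1) > 0;
}

/* Optional floating-point part of MVE present. */
bool has_mve_fp(const IdRegs *id)
{
    return id->mprofile && mvfr1_mve(id->mvfr1) >= 2;
}
```

The M-profile test comes first in both functions because, as the comment in the patch says, the same MVFR1 bits mean something else on A-profile.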

Some v8M instructions are present if either the floating point
extension or MVE is implemented.  Update our implementation of them
to check for MVE as well as for FP.

This covers all the insns which use CheckDecodeFaults(ExtType_MveOrFp) or
CheckDecodeFaults(ExtType_MveOrDpFp) in their pseudocode, which are
essentially the loads and stores, moves and sysreg accesses, except
for VMOV_reg_sp and VMOV_reg_dp, which we handle in subsequent
patches because they need a refactor to provide a place to put the
new MVE check.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-3-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 48 +++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
     /* VMOV scalar to general purpose register */
     TCGv_i32 tmp;
 
-    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-    if (a->size == MO_32
-        ? !dc_isar_feature(aa32_fpsp_v2, s)
-        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-        return false;
+    /*
+     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
+     * all sizes, whether the CPU has fp or not.
+     */
+    if (!dc_isar_feature(aa32_mve, s)) {
+        if (a->size == MO_32
+            ? !dc_isar_feature(aa32_fpsp_v2, s)
+            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+            return false;
+        }
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
     /* VMOV general purpose register to scalar */
     TCGv_i32 tmp;
 
-    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-    if (a->size == MO_32
-        ? !dc_isar_feature(aa32_fpsp_v2, s)
-        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-        return false;
+    /*
+     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
+     * all sizes, whether the CPU has fp or not.
+     */
+    if (!dc_isar_feature(aa32_mve, s)) {
+        if (a->size == MO_32
+            ? !dc_isar_feature(aa32_fpsp_v2, s)
+            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+            return false;
+        }
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
@@ -XXX,XX +XXX,XX @@ typedef enum FPSysRegCheckResult {
 
 static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
 {
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return FPSysRegCheckFailed;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
 {
     TCGv_i32 tmp;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
 {
     TCGv_i32 tmp;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
      * floating point register.  Note that this does not require support
      * for double precision arithmetic.
      */
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     uint32_t offset;
     TCGv_i32 addr, tmp;
 
-    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     uint32_t offset;
     TCGv_i32 addr, tmp;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
     TCGv_i64 tmp;
 
     /* Note that this does not require support for double arithmetic.  */
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
     TCGv_i32 addr, tmp;
     int i, n;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
     int i, n;
 
     /* Note that this does not require support for double arithmetic.  */
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
-- 
2.20.1

The do_vfp_2op_sp() and do_vfp_2op_dp() functions currently check
whether floating point is supported via the aa32_fpdp_v2 and
aa32_fpsp_v2 isar checks.  For v8.1M MVE support, the VMOV_reg trans
functions (but not any of the others) need to update this to also
allow the insn if MVE is implemented.  Move the check out of the do_
function and into its callsites (which are all implemented via the
DO_VFP_2OP macro), so we have a place to change the check for the
VMOV insns.
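
The effect of hoisting the check can be sketched in isolation as
follows.  SketchCtx, SKETCH_2OP and the sketch_* functions are
hypothetical names chosen for illustration; the real code uses
DisasContext and dc_isar_feature().  The hand-written VMOV function is
only one plausible shape for the later change, not a copy of it.

```c
#include <stdbool.h>

/* Hypothetical feature flags; QEMU's real checks use dc_isar_feature(). */
typedef struct SketchCtx {
    bool fpsp_v2;
    bool mve;
} SketchCtx;

/* Shared worker: it no longer checks features, the caller must. */
static bool sketch_do_2op(const SketchCtx *s)
{
    (void)s;
    return true;
}

/* Each generated trans function performs its own feature check. */
#define SKETCH_2OP(INSN, CHECK)                             \
    static bool sketch_trans_##INSN(const SketchCtx *s)     \
    {                                                       \
        if (!s->CHECK) {                                    \
            return false;                                   \
        }                                                   \
        return sketch_do_2op(s);                            \
    }

SKETCH_2OP(vabs_sp, fpsp_v2)

/* Written by hand so that the check can differ from the macro's. */
static bool sketch_trans_vmov_reg_sp(const SketchCtx *s)
{
    if (!s->fpsp_v2 && !s->mve) {
        return false;
    }
    return sketch_do_2op(s);
}
```

Because the worker no longer rejects anything itself, a caller that
wants a looser condition only needs to change its own check.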

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-4-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     int veclen = s->vec_len;
     TCGv_i32 f0, fd;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
-        return false;
-    }
+    /* Note that the caller must check the aa32_fpsp_v2 feature. */
 
     if (!dc_isar_feature(aa32_fpshvec, s) &&
         (veclen != 0 || s->vec_stride != 0)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      */
     TCGv_i32 f0;
 
+    /* Note that the caller must check the aa32_fp16_arith feature */
+
     if (!dc_isar_feature(aa32_fp16_arith, s)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     int veclen = s->vec_len;
     TCGv_i64 f0, fd;
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
-        return false;
-    }
+    /* Note that the caller must check the aa32_fpdp_v2 feature. */
 
     /* UNDEF accesses to D16-D31 if they don't exist */
     if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     return true;
 }
 
-#define DO_VFP_2OP(INSN, PREC, FN)                              \
+#define DO_VFP_2OP(INSN, PREC, FN, CHECK)                       \
     static bool trans_##INSN##_##PREC(DisasContext *s,          \
                                       arg_##INSN##_##PREC *a)   \
     {                                                           \
+        if (!dc_isar_feature(CHECK, s)) {                       \
+            return false;                                       \
+        }                                                       \
         return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
     }
 
-DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
-DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
+DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32, aa32_fpsp_v2)
+DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64, aa32_fpdp_v2)
 
-DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh)
-DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
-DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
+DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith)
+DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2)
+DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd, aa32_fpdp_v2)
 
-DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh)
-DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
-DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
+DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh, aa32_fp16_arith)
+DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs, aa32_fpsp_v2)
+DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd, aa32_fpdp_v2)
 
 static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
 {
@@ -XXX,XX +XXX,XX @@ static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
     gen_helper_vfp_sqrtd(vd, vm, cpu_env);
 }
 
-DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
-DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
-DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
+DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp, aa32_fp16_arith)
+DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp, aa32_fpsp_v2)
+DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp, aa32_fpdp_v2)
 
 static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
 {
-- 
2.20.1

Split out the handling of VMOV_reg_sp and VMOV_reg_dp so that we can
permit the insns if either FP or MVE is present.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-5-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

The fp_sysreg_checks() function is supposed to be returning an
FPSysRegCheckResult, which is an enum with three possible values.
However, three places in the function "return false" (a hangover from
a previous iteration of the design where the function just returned a
bool).  Make these return FPSysRegCheckFailed instead (for no
functional change, since both false and FPSysRegCheckFailed are
zero).
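
To see why this is not a functional change, consider the following
standalone sketch.  The enum and function names are hypothetical; the
only property assumed from the text above is that the "failed" value
is the first enumerator and therefore zero.

```c
#include <stdbool.h>

/* Hypothetical three-valued result, shaped like the enum described above. */
typedef enum SketchCheckResult {
    SketchCheckFailed,    /* first enumerator, so its value is 0 */
    SketchCheckDone,
    SketchCheckContinue,
} SketchCheckResult;

static SketchCheckResult sketch_check_old(bool allowed)
{
    if (!allowed) {
        return false;    /* compiles, but hides the intended enum value */
    }
    return SketchCheckContinue;
}

static SketchCheckResult sketch_check_new(bool allowed)
{
    if (!allowed) {
        return SketchCheckFailed;
    }
    return SketchCheckContinue;
}
```

Both variants return the same value on failure; the second simply
states the intent in terms of the enum.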

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-6-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
         break;
     case ARM_VFP_FPSCR_NZCVQC:
         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-            return false;
+            return FPSysRegCheckFailed;
         }
         break;
     case ARM_VFP_FPCXT_S:
     case ARM_VFP_FPCXT_NS:
         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-            return false;
+            return FPSysRegCheckFailed;
         }
         if (!s->v8m_secure) {
-            return false;
+            return FPSysRegCheckFailed;
         }
         break;
     default:
-- 
2.20.1

If MVE is implemented for an M-profile CPU then it has a VPR
register, which tracks predication information.

Implement the read and write handling of this register, and
the migration of its state.
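
For readers unfamiliar with the field handling, the P0 accesses in
this patch amount to a 16-bit deposit into, and extract from, the low
half of VPR.  The sketch below shows that arithmetic in plain C.  The
sketch_* helpers are hypothetical and only assume the P0 layout (bit
0, length 16) from the FIELD definition in this patch; they are not
QEMU's deposit32()/extract32() or the TCG ops.

```c
#include <stdint.h>

#define SKETCH_VPR_P0_SHIFT  0
#define SKETCH_VPR_P0_LENGTH 16

/* Extract 'length' bits starting at bit 'start'; requires 1 <= length <= 32. */
static uint32_t sketch_extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0U >> (32 - length));
}

/* Replace 'length' bits at 'start' with the low bits of 'field'. */
static uint32_t sketch_deposit32(uint32_t value, int start, int length,
                                 uint32_t field)
{
    uint32_t mask = (~0U >> (32 - length)) << start;

    return (value & ~mask) | ((field << start) & mask);
}

/* A write to P0 only updates the P0 field and leaves the masks alone. */
static uint32_t sketch_write_p0(uint32_t vpr, uint32_t val)
{
    return sketch_deposit32(vpr, SKETCH_VPR_P0_SHIFT,
                            SKETCH_VPR_P0_LENGTH, val);
}

/* A read of P0 returns just the P0 field. */
static uint32_t sketch_read_p0(uint32_t vpr)
{
    return sketch_extract32(vpr, SKETCH_VPR_P0_SHIFT, SKETCH_VPR_P0_LENGTH);
}
```

Writing P0 therefore preserves the MASK01 and MASK23 fields, whereas a
write to VPR itself replaces the whole register.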

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-7-peter.maydell@linaro.org
---
 target/arm/cpu.h           |  6 ++++++
 target/arm/machine.c       | 19 +++++++++++++++++++
 target/arm/translate-vfp.c | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
         uint32_t cpacr[M_REG_NUM_BANKS];
         uint32_t nsacr;
         int ltpsize;
+        uint32_t vpr;
     } v7m;
 
     /* Information associated with an exception about to be taken:
@@ -XXX,XX +XXX,XX @@ FIELD(V7M_FPCCR, ASPEN, 31, 1)
      R_V7M_FPCCR_UFRDY_MASK |                   \
      R_V7M_FPCCR_ASPEN_MASK)
 
+/* v7M VPR bits */
+FIELD(V7M_VPR, P0, 0, 16)
+FIELD(V7M_VPR, MASK01, 16, 4)
+FIELD(V7M_VPR, MASK23, 20, 4)
+
 /*
  * System register ID fields.
  */
diff --git a/target/arm/machine.c b/target/arm/machine.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_fp = {
     }
 };
 
+static bool mve_needed(void *opaque)
+{
+    ARMCPU *cpu = opaque;
+
+    return cpu_isar_feature(aa32_mve, cpu);
+}
+
+static const VMStateDescription vmstate_m_mve = {
+    .name = "cpu/m/mve",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = mve_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_m = {
     .name = "cpu/m",
     .version_id = 4,
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m = {
         &vmstate_m_other_sp,
         &vmstate_m_v8m,
         &vmstate_m_fp,
+        &vmstate_m_mve,
         NULL
     }
 };
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
             return FPSysRegCheckFailed;
         }
         break;
+    case ARM_VFP_VPR:
+    case ARM_VFP_P0:
+        if (!dc_isar_feature(aa32_mve, s)) {
+            return FPSysRegCheckFailed;
+        }
+        break;
     default:
         return FPSysRegCheckFailed;
     }
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
         tcg_temp_free_i32(sfpa);
         break;
     }
+    case ARM_VFP_VPR:
+        /* Behaves as NOP if not privileged */
+        if (IS_USER(s)) {
+            break;
+        }
+        tmp = loadfn(s, opaque);
+        store_cpu_field(tmp, v7m.vpr);
+        break;
+    case ARM_VFP_P0:
+    {
+        TCGv_i32 vpr;
+        tmp = loadfn(s, opaque);
+        vpr = load_cpu_field(v7m.vpr);
+        tcg_gen_deposit_i32(vpr, vpr, tmp,
+                            R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
+        store_cpu_field(vpr, v7m.vpr);
+        tcg_temp_free_i32(tmp);
+        break;
+    }
     default:
         g_assert_not_reached();
     }
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
         tcg_temp_free_i32(fpscr);
         break;
     }
+    case ARM_VFP_VPR:
+        /* Behaves as NOP if not privileged */
+        if (IS_USER(s)) {
+            break;
+        }
+        tmp = load_cpu_field(v7m.vpr);
+        storefn(s, opaque, tmp);
+        break;
+    case ARM_VFP_P0:
+        tmp = load_cpu_field(v7m.vpr);
+        tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
+        storefn(s, opaque, tmp);
+        break;
     default:
         g_assert_not_reached();
     }
-- 
2.20.1

The M-profile FPSCR has an LTPSIZE field, but if MVE is not
implemented it is read-only and always reads as 4; this is how QEMU
currently handles it.

Make the field writable when MVE is implemented.

We can safely add the field to the MVE migration struct because
currently no CPUs enable MVE and so the migration struct is never
used.
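
The resulting behaviour on an FPSCR write can be summarised by this
small sketch.  The function and macro names are hypothetical; the
shift (16) and length (3) match the FPCR_LTPSIZE_* definitions in this
patch, and the non-MVE case is modelled as "keep the old value".

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LTPSIZE_SHIFT  16
#define SKETCH_LTPSIZE_LENGTH 3

/* Return the LTPSIZE value that results from writing fpscr_val. */
static uint32_t sketch_ltpsize_after_write(bool has_mve, uint32_t old_ltpsize,
                                           uint32_t fpscr_val)
{
    if (!has_mve) {
        /* Field is read-only without MVE: the write is ignored. */
        return old_ltpsize;
    }
    return (fpscr_val >> SKETCH_LTPSIZE_SHIFT) &
           ((1U << SKETCH_LTPSIZE_LENGTH) - 1);
}
```

Without MVE the value stays at whatever it was (4 in practice); with
MVE the three-bit field from the written value is taken as-is.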

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-8-peter.maydell@linaro.org
---
 target/arm/cpu.h        | 3 ++-
 target/arm/machine.c    | 1 +
 target/arm/vfp_helper.c | 9 ++++++---
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
         uint32_t fpdscr[M_REG_NUM_BANKS];
         uint32_t cpacr[M_REG_NUM_BANKS];
         uint32_t nsacr;
-        int ltpsize;
+        uint32_t ltpsize;
         uint32_t vpr;
     } v7m;
 
@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
 
 #define FPCR_LTPSIZE_SHIFT 16   /* LTPSIZE, M-profile only */
 #define FPCR_LTPSIZE_MASK (7 << FPCR_LTPSIZE_SHIFT)
+#define FPCR_LTPSIZE_LENGTH 3
 
 #define FPCR_NZCV_MASK (FPCR_N | FPCR_Z | FPCR_C | FPCR_V)
 #define FPCR_NZCVQC_MASK (FPCR_NZCV_MASK | FPCR_QC)
diff --git a/target/arm/machine.c b/target/arm/machine.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_mve = {
     .needed = mve_needed,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
+        VMSTATE_UINT32(env.v7m.ltpsize, ARMCPU),
         VMSTATE_END_OF_LIST()
     },
 };
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpscr(CPUARMState *env)
 
 void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
 {
+    ARMCPU *cpu = env_archcpu(env);
+
     /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
-    if (!cpu_isar_feature(any_fp16, env_archcpu(env))) {
+    if (!cpu_isar_feature(any_fp16, cpu)) {
         val &= ~FPCR_FZ16;
     }
 
@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
          * because in v7A no-short-vector-support cores still had to
          * allow Stride/Len to be written with the only effect that
          * some insns are required to UNDEF if the guest sets them.
-         *
-         * TODO: if M-profile MVE implemented, set LTPSIZE.
          */
         env->vfp.vec_len = extract32(val, 16, 3);
         env->vfp.vec_stride = extract32(val, 20, 2);
+    } else if (cpu_isar_feature(aa32_mve, cpu)) {
+        env->v7m.ltpsize = extract32(val, FPCR_LTPSIZE_SHIFT,
+                                     FPCR_LTPSIZE_LENGTH);
     }
 
     if (arm_feature(env, ARM_FEATURE_NEON)) {
-- 
2.20.1

Currently we allow board models to specify the initial value of the
Secure VTOR register, using an init-svtor property on the TYPE_ARMV7M
object which is plumbed through to the CPU.  Allow board models to
also specify the initial value of the Non-secure VTOR via a similar
init-nsvtor property.
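
As a rough model of what the reset path does with the two properties,
the sketch below applies the same 0xffffff80 mask to both banks.
SketchMState and the SKETCH_* names are hypothetical; in QEMU the
values live in env->v7m.vecbase[] and are indexed by M_REG_S and
M_REG_NS.

```c
#include <stdint.h>

enum { SKETCH_REG_NS = 0, SKETCH_REG_S = 1, SKETCH_NUM_BANKS = 2 };

/* Hypothetical stand-in for the banked VTOR state. */
typedef struct SketchMState {
    uint32_t vecbase[SKETCH_NUM_BANKS];
} SketchMState;

/* Load both VTOR banks at reset, dropping the low 7 bits of each. */
static void sketch_reset_vtor(SketchMState *m, uint32_t init_svtor,
                              uint32_t init_nsvtor)
{
    m->vecbase[SKETCH_REG_S] = init_svtor & 0xffffff80;
    m->vecbase[SKETCH_REG_NS] = init_nsvtor & 0xffffff80;
}
```

A board model that sets neither property gets 0 for both banks, which
matches the default of the new property.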

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-10-peter.maydell@linaro.org
---
 include/hw/arm/armv7m.h |  2 ++
 target/arm/cpu.h        |  2 ++
 hw/arm/armv7m.c         |  7 +++++++
 target/arm/cpu.c        | 10 ++++++++++
 4 files changed, 21 insertions(+)

diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/armv7m.h
+++ b/include/hw/arm/armv7m.h
@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(ARMv7MState, ARMV7M)
  *   devices will be automatically layered on top of this view.)
  * + Property "idau": IDAU interface (forwarded to CPU object)
  * + Property "init-svtor": secure VTOR reset value (forwarded to CPU object)
+ * + Property "init-nsvtor": non-secure VTOR reset value (forwarded to CPU object)
  * + Property "vfp": enable VFP (forwarded to CPU object)
  * + Property "dsp": enable DSP (forwarded to CPU object)
  * + Property "enable-bitband": expose bitbanded IO
@@ -XXX,XX +XXX,XX @@ struct ARMv7MState {
     MemoryRegion *board_memory;
     Object *idau;
     uint32_t init_svtor;
+    uint32_t init_nsvtor;
     bool enable_bitband;
     bool start_powered_off;
     bool vfp;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
 
     /* For v8M, initial value of the Secure VTOR */
     uint32_t init_svtor;
+    /* For v8M, initial value of the Non-secure VTOR */
+    uint32_t init_nsvtor;
 
     /* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or
      * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type.
diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp)
             return;
         }
     }
+    if (object_property_find(OBJECT(s->cpu), "init-nsvtor")) {
+        if (!object_property_set_uint(OBJECT(s->cpu), "init-nsvtor",
+                                      s->init_nsvtor, errp)) {
+            return;
+        }
+    }
     if (object_property_find(OBJECT(s->cpu), "start-powered-off")) {
         if (!object_property_set_bool(OBJECT(s->cpu), "start-powered-off",
                                       s->start_powered_off, errp)) {
@@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = {
                      MemoryRegion *),
     DEFINE_PROP_LINK("idau", ARMv7MState, idau, TYPE_IDAU_INTERFACE, Object *),
     DEFINE_PROP_UINT32("init-svtor", ARMv7MState, init_svtor, 0),
+    DEFINE_PROP_UINT32("init-nsvtor", ARMv7MState, init_nsvtor, 0),
     DEFINE_PROP_BOOL("enable-bitband", ARMv7MState, enable_bitband, false),
     DEFINE_PROP_BOOL("start-powered-off", ARMv7MState, start_powered_off,
                      false),
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
         env->regs[14] = 0xffffffff;
 
         env->v7m.vecbase[M_REG_S] = cpu->init_svtor & 0xffffff80;
+        env->v7m.vecbase[M_REG_NS] = cpu->init_nsvtor & 0xffffff80;
 
         /* Load the initial SP and PC from offset 0 and 4 in the vector table */
         vecbase = env->v7m.vecbase[env->v7m.secure];
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
                                        &cpu->init_svtor,
                                        OBJ_PROP_FLAG_READWRITE);
     }
+    if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
+        /*
+         * Initial value of the NS VTOR (for cores without the Security
+         * extension, this is the only VTOR)
+         */
+        object_property_add_uint32_ptr(obj, "init-nsvtor",
+                                       &cpu->init_nsvtor,
+                                       OBJ_PROP_FLAG_READWRITE);
+    }
 
     qdev_property_add_static(DEVICE(obj), &arm_cpu_cfgend_property);
 
-- 
2.20.1

The official punctuation for Arm CPU names uses a hyphen, like
"Cortex-A9". We mostly follow this, but in a few places usage
without the hyphen has crept in. Fix those so we consistently
use the same way of writing the CPU name.

This commit was created with:
  git grep -z -l 'Cortex ' | xargs -0 sed -i 's/Cortex /Cortex-/'

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20210527095152.10968-1-peter.maydell@linaro.org
---
 docs/system/arm/aspeed.rst    | 4 ++--
 docs/system/arm/nuvoton.rst   | 6 +++---
 docs/system/arm/sabrelite.rst | 2 +-
 include/hw/arm/allwinner-h3.h | 2 +-
 hw/arm/aspeed.c               | 6 +++---
 hw/arm/mcimx6ul-evk.c         | 2 +-
 hw/arm/mcimx7d-sabre.c        | 2 +-
 hw/arm/npcm7xx_boards.c       | 4 ++--
 hw/arm/sabrelite.c            | 2 +-
 hw/misc/npcm7xx_clk.c         | 2 +-
 10 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/aspeed.rst
+++ b/docs/system/arm/aspeed.rst
@@ -XXX,XX +XXX,XX @@ The QEMU Aspeed machines model BMCs of various OpenPOWER systems and
 Aspeed evaluation boards. They are based on different releases of the
 Aspeed SoC : the AST2400 integrating an ARM926EJ-S CPU (400MHz), the
 AST2500 with an ARM1176JZS CPU (800MHz) and more recently the AST2600
-with dual cores ARM Cortex A7 CPUs (1.2GHz).
+with dual cores ARM Cortex-A7 CPUs (1.2GHz).
 
 The SoC comes with RAM, Gigabit ethernet, USB, SD/MMC, USB, SPI, I2C,
 etc.
@@ -XXX,XX +XXX,XX @@ AST2500 SoC based machines :
 
 AST2600 SoC based machines :
 
-- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex A7)
+- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex-A7)
 - ``tacoma-bmc``           OpenPOWER Witherspoon POWER9 AST2600 BMC
 
 Supported devices
diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/nuvoton.rst
+++ b/docs/system/arm/nuvoton.rst
@@ -XXX,XX +XXX,XX @@ Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
 
 The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
 designed to be used as Baseboard Management Controllers (BMCs) in various
-servers. They all feature one or two ARM Cortex A9 CPU cores, as well as an
+servers. They all feature one or two ARM Cortex-A9 CPU cores, as well as an
 assortment of peripherals targeted for either Enterprise or Data Center /
 Hyperscale applications. The former is a superset of the latter, so NPCM750 has
 all the peripherals of NPCM730 and more.
 
 .. _Nuvoton iBMC: https://www.nuvoton.com/products/cloud-computing/ibmc/
 
-The NPCM750 SoC has two Cortex A9 cores and is targeted for the Enterprise
+The NPCM750 SoC has two Cortex-A9 cores and is targeted for the Enterprise
 segment. The following machines are based on this chip :
 
 - ``npcm750-evb``       Nuvoton NPCM750 Evaluation board
 
-The NPCM730 SoC has two Cortex A9 cores and is targeted for Data Center and
+The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
 Hyperscale applications. The following machines are based on this chip :
 
 - ``quanta-gsj``        Quanta GSJ server BMC
diff --git a/docs/system/arm/sabrelite.rst b/docs/system/arm/sabrelite.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/sabrelite.rst
+++ b/docs/system/arm/sabrelite.rst
@@ -XXX,XX +XXX,XX @@ Supported devices
 
 The SABRE Lite machine supports the following devices:
 
- * Up to 4 Cortex A9 cores
+ * Up to 4 Cortex-A9 cores
  * Generic Interrupt Controller
  * 1 Clock Controller Module
  * 1 System Reset Controller
diff --git a/include/hw/arm/allwinner-h3.h b/include/hw/arm/allwinner-h3.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/allwinner-h3.h
+++ b/include/hw/arm/allwinner-h3.h
@@ -XXX,XX +XXX,XX @@
  */
 
 /*
- * The Allwinner H3 is a System on Chip containing four ARM Cortex A7
+ * The Allwinner H3 is a System on Chip containing four ARM Cortex-A7
  * processor cores. Features and specifications include DDR2/DDR3 memory,
  * SD/MMC storage cards, 10/100/1000Mbit Ethernet, USB 2.0, HDMI and
  * various I/O modules.
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_ast2600_evb_class_init(ObjectClass *oc, void *data)
     MachineClass *mc = MACHINE_CLASS(oc);
     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 
-    mc->desc       = "Aspeed AST2600 EVB (Cortex A7)";
+    mc->desc       = "Aspeed AST2600 EVB (Cortex-A7)";
     amc->soc_name  = "ast2600-a1";
     amc->hw_strap1 = AST2600_EVB_HW_STRAP1;
     amc->hw_strap2 = AST2600_EVB_HW_STRAP2;
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_tacoma_class_init(ObjectClass *oc, void *data)
     MachineClass *mc = MACHINE_CLASS(oc);
     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 
-    mc->desc       = "OpenPOWER Tacoma BMC (Cortex A7)";
+    mc->desc       = "OpenPOWER Tacoma BMC (Cortex-A7)";
     amc->soc_name  = "ast2600-a1";
     amc->hw_strap1 = TACOMA_BMC_HW_STRAP1;
     amc->hw_strap2 = TACOMA_BMC_HW_STRAP2;
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_rainier_class_init(ObjectClass *oc, void *data)
     MachineClass *mc = MACHINE_CLASS(oc);
     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 
-    mc->desc       = "IBM Rainier BMC (Cortex A7)";
+    mc->desc       = "IBM Rainier BMC (Cortex-A7)";
     amc->soc_name  = "ast2600-a1";
     amc->hw_strap1 = RAINIER_BMC_HW_STRAP1;
     amc->hw_strap2 = RAINIER_BMC_HW_STRAP2;
diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mcimx6ul-evk.c
+++ b/hw/arm/mcimx6ul-evk.c
@@ -XXX,XX +XXX,XX @@ static void mcimx6ul_evk_init(MachineState *machine)
 
 static void mcimx6ul_evk_machine_init(MachineClass *mc)
 {
-    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex A7)";
+    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex-A7)";
     mc->init = mcimx6ul_evk_init;
     mc->max_cpus = FSL_IMX6UL_NUM_CPUS;
     mc->default_ram_id = "mcimx6ul-evk.ram";
diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mcimx7d-sabre.c
+++ b/hw/arm/mcimx7d-sabre.c
@@ -XXX,XX +XXX,XX @@ static void mcimx7d_sabre_init(MachineState *machine)
 
 static void mcimx7d_sabre_machine_init(MachineClass *mc)
 {
-    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex A7)";
+    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex-A7)";
     mc->init = mcimx7d_sabre_init;
     mc->max_cpus = FSL_IMX7_NUM_CPUS;
     mc->default_ram_id = "mcimx7d-sabre.ram";
diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/npcm7xx_boards.c
+++ b/hw/arm/npcm7xx_boards.c
@@ -XXX,XX +XXX,XX @@ static void npcm750_evb_machine_class_init(ObjectClass *oc, void *data)
 
     npcm7xx_set_soc_type(nmc, TYPE_NPCM750);
 
-    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex A9)";
+    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex-A9)";
     mc->init = npcm750_evb_init;
     mc->default_ram_size = 512 * MiB;
 };
@@ -XXX,XX +XXX,XX @@ static void gsj_machine_class_init(ObjectClass *oc, void *data)
 
     npcm7xx_set_soc_type(nmc, TYPE_NPCM730);
 
-    mc->desc = "Quanta GSJ (Cortex A9)";
+    mc->desc = "Quanta GSJ (Cortex-A9)";
     mc->init = quanta_gsj_init;
     mc->default_ram_size = 512 * MiB;
 };
diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/sabrelite.c
+++ b/hw/arm/sabrelite.c
@@ -XXX,XX +XXX,XX @@ static void sabrelite_init(MachineState *machine)
 
 static void sabrelite_machine_init(MachineClass *mc)
 {
-    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex A9)";
+    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex-A9)";
     mc->init = sabrelite_init;
     mc->max_cpus = FSL_IMX6_NUM_CPUS;
     mc->ignore_memory_transaction_failures = true;
diff --git a/hw/misc/npcm7xx_clk.c b/hw/misc/npcm7xx_clk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/npcm7xx_clk.c
+++ b/hw/misc/npcm7xx_clk.c
@@ -XXX,XX +XXX,XX @@
 #define NPCM7XX_CLOCK_REF_HZ            (25000000)
 
 /* Register Field Definitions */
-#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex A9 Cores */
+#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex-A9 Cores */
 
 #define PLLCON_LOKI     BIT(31)
 #define PLLCON_LOKS     BIT(30)
-- 
2.20.1

From: Damien Goutte-Gattat <dgouttegattat@incenp.org>

The 4.x branch of Sphinx introduces a breaking change, as generated man
pages are now written to subdirectories corresponding to the manual
section they belong to. This results in `make install` erroring out when
attempting to install the man pages, because they are not where it
expects to find them.

This patch restores the behavior of Sphinx 3.x regarding man pages.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/256
Signed-off-by: Damien Goutte-Gattat <dgouttegattat@incenp.org>
Message-id: 20210503161422.15028-1-dgouttegattat@incenp.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/conf.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/conf.py b/docs/conf.py
index XXXXXXX..XXXXXXX 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -XXX,XX +XXX,XX @@
      ['Stefan Hajnoczi <stefanha@redhat.com>',
       'Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>'], 1),
 ]
+man_make_section_directory = False
 
 # -- Options for Texinfo output -------------------------------------------
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The operands to tcg_gen_atomic_fetch_s{min,max}_i64 must
be signed, so that the inputs are properly extended.
Zero extend the result afterward, as needed.
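
The underlying arithmetic issue can be shown with a non-atomic model
of a 32-bit fetch-and-smin carried out in 64-bit registers.  This is
only a sketch of the extension problem with hypothetical sketch_*
names; it does not describe how the TCG atomic ops are implemented.

```c
#include <stdint.h>

/* Signed minimum computed on zero-extended inputs: wrong for negatives. */
static uint32_t sketch_smin32_zero_extended(uint32_t a, uint32_t b)
{
    int64_t x = (int64_t)(uint64_t)a;
    int64_t y = (int64_t)(uint64_t)b;

    return (uint32_t)(x < y ? x : y);
}

/*
 * Fetch-and-smin on a 32-bit cell: sign-extend both inputs, store the
 * minimum, and return the old value zero-extended to 64 bits.
 * (Relies on the usual two's complement uint32_t -> int32_t conversion.)
 */
static uint64_t sketch_fetch_smin32(uint32_t *mem, uint64_t reg)
{
    int64_t old = (int32_t)*mem;
    int64_t operand = (int32_t)(uint32_t)reg;
    int64_t min = old < operand ? old : operand;

    *mem = (uint32_t)min;
    return (uint32_t)old;
}
```

With zero-extended inputs, 0xfffffffe (-2 as a 32-bit value) compares
as a large positive number and loses to 5; with sign-extended inputs
it wins, and the value returned to the destination register still has
clear upper bits.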

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/364
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20210602020720.47679-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     int o3_opc = extract32(insn, 12, 4);
     bool r = extract32(insn, 22, 1);
     bool a = extract32(insn, 23, 1);
-    TCGv_i64 tcg_rs, clean_addr;
+    TCGv_i64 tcg_rs, tcg_rt, clean_addr;
     AtomicThreeOpFn *fn = NULL;
+    MemOp mop = s->be_data | size | MO_ALIGN;
 
     if (is_vector || !dc_isar_feature(aa64_atomics, s)) {
         unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
         break;
     case 004: /* LDSMAX */
         fn = tcg_gen_atomic_fetch_smax_i64;
+        mop |= MO_SIGN;
         break;
     case 005: /* LDSMIN */
         fn = tcg_gen_atomic_fetch_smin_i64;
+        mop |= MO_SIGN;
         break;
     case 006: /* LDUMAX */
         fn = tcg_gen_atomic_fetch_umax_i64;
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     }
 
     tcg_rs = read_cpu_reg(s, rs, true);
+    tcg_rt = cpu_reg(s, rt);
 
     if (o3_opc == 1) { /* LDCLR */
         tcg_gen_not_i64(tcg_rs, tcg_rs);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     /* The tcg atomic primitives are all full barriers.  Therefore we
      * can ignore the Acquire and Release bits of this instruction.
      */
-    fn(cpu_reg(s, rt), clean_addr, tcg_rs, get_mem_index(s),
-       s->be_data | size | MO_ALIGN);
+    fn(tcg_rt, clean_addr, tcg_rs, get_mem_index(s), mop);
+
+    if ((mop & MO_SIGN) && size != MO_64) {
+        tcg_gen_ext32u_i64(tcg_rt, tcg_rt);
+    }
 }
 
 /*
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

The DAIF and PAC checks used raise_exception_ra to raise an exception
and unwind CPU state.  However, raise_exception_ra is currently designed
for handling data aborts: the syndrome is partially precomputed and
encoded in the TB, then merged in merge_syn_data_abort when handling
the data abort.  Using raise_exception_ra for DAIF and PAC checks
results in an empty syndrome being retrieved from data[2] in
restore_state_to_opc, which sets ESR to 0.  This manifested as:

kvm [571]: Unknown exception class: esr: 0x000000 –
  Unknown/Uncategorized

when launching a KVM guest with pointer authentication enabled on a
host running under QEMU with a CPU that supports EL2 and pointer
authentication.

Rework raise_exception_ra such that the state is restored before raising
the exception so that the exception is not clobbered by
restore_state_to_opc.
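
The ordering problem can be sketched in isolation. The code below is a
toy model under stated assumptions, not the QEMU implementation:
struct toy_env, toy_restore_state and the two wrapper functions are
hypothetical names, and the unwind step is reduced to a single store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the exception fields of the CPU state. */
struct toy_env {
    uint32_t syndrome;
};

/*
 * Stand-in for the unwind step: it overwrites the syndrome with the
 * value recorded for the instruction in the TB, which is zero when no
 * data-abort syndrome was precomputed.
 */
static void toy_restore_state(struct toy_env *env, uint32_t tb_syndrome)
{
    assert(env);
    env->syndrome = tb_syndrome;
}

/* Old ordering: set the syndrome, then unwind. The unwind wins. */
static uint32_t raise_then_restore(struct toy_env *env, uint32_t syn,
                                   uint32_t tb_syndrome)
{
    env->syndrome = syn;
    toy_restore_state(env, tb_syndrome);
    return env->syndrome;
}

/* New ordering: unwind first, then set the caller's syndrome. */
static uint32_t restore_then_raise(struct toy_env *env, uint32_t syn,
                                   uint32_t tb_syndrome)
{
    toy_restore_state(env, tb_syndrome);
    env->syndrome = syn;
    return env->syndrome;
}
```

In this model the first ordering reports whatever the TB recorded (zero
for a DAIF or PAC check), while the second preserves the value the
caller passed in.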

Fixes: 0d43e1a2d29a ("target/arm: Add PAuth helpers")
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[PMM: added comment]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/op_helper.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ void raise_exception(CPUARMState *env, uint32_t excp,
 void raise_exception_ra(CPUARMState *env, uint32_t excp, uint32_t syndrome,
                         uint32_t target_el, uintptr_t ra)
 {
-    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
-    cpu_loop_exit_restore(cs, ra);
+    CPUState *cs = env_cpu(env);
+
+    /*
+     * restore_state_to_opc() will set env->exception.syndrome, so
+     * we must restore CPU state here before setting the syndrome
+     * the caller passed us, and cannot use cpu_loop_exit_restore().
+     */
+    cpu_restore_state(cs, ra, true);
+    raise_exception(env, excp, syndrome, target_el);
 }
 
 uint64_t HELPER(neon_tbl)(CPUARMState *env, uint32_t desc,
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

Now that there are no other users of do_raise_exception, fold it into
raise_exception.

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/op_helper.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@
 #define SIGNBIT (uint32_t)0x80000000
 #define SIGNBIT64 ((uint64_t)1 << 63)
 
-static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
-                                    uint32_t syndrome, uint32_t target_el)
+void raise_exception(CPUARMState *env, uint32_t excp,
+                     uint32_t syndrome, uint32_t target_el)
 {
     CPUState *cs = env_cpu(env);
 
@@ -XXX,XX +XXX,XX @@ static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
     cs->exception_index = excp;
     env->exception.syndrome = syndrome;
     env->exception.target_el = target_el;
-
-    return cs;
-}
-
-void raise_exception(CPUARMState *env, uint32_t excp,
-                     uint32_t syndrome, uint32_t target_el)
-{
-    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
     cpu_loop_exit(cs);
 }
 
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

Now that raise_exception_ra restores the state before raising the
exception, mte_check_fail can use it to perform the state restore and
raise the exception without clobbering the syndrome.

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[PMM: Keep the one line of the comment that is still relevant]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/mte_helper.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
 
     switch (tcf) {
     case 1:
-        /*
-         * Tag check fail causes a synchronous exception.
-         *
-         * In restore_state_to_opc, we set the exception syndrome
-         * for the load or store operation.  Unwind first so we
-         * may overwrite that with the syndrome for the tag check.
-         */
-        cpu_restore_state(env_cpu(env), ra, true);
+        /* Tag check fail causes a synchronous exception. */
         env->exception.vaddress = dirty_ptr;
 
         is_write = FIELD_EX32(desc, MTEDESC, WRITE);
         syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
                                     is_write, 0x11);
-        raise_exception(env, EXCP_DATA_ABORT, syn, exception_target_el(env));
+        raise_exception_ra(env, EXCP_DATA_ABORT, syn,
+                           exception_target_el(env), ra);
         /* noreturn, but fall through to the assert anyway */
 
     case 0:
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

The sequence cpu_restore_state() + raise_exception() is equivalent to
raise_exception_ra(), so use that instead.  (In this case we never
cared about the syndrome value, because M-profile doesn't use the
syndrome; the old code was just written unnecessarily awkwardly.)

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[PMM: Retain edited version of comment; rewrite commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/m_helper.c  | 5 +----
 target/arm/op_helper.c | 9 +++------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
             limit = is_psp ? env->v7m.psplim[false] : env->v7m.msplim[false];
 
             if (val < limit) {
-                CPUState *cs = env_cpu(env);
-
-                cpu_restore_state(cs, GETPC(), true);
-                raise_exception(env, EXCP_STKOF, 0, 1);
+                raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
             }
 
             if (is_psp) {
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v8m_stackcheck)(CPUARMState *env, uint32_t newvalue)
      * raising an exception if the limit is breached.
      */
     if (newvalue < v7m_sp_limit(env)) {
-        CPUState *cs = env_cpu(env);
-
         /*
          * Stack limit exceptions are a rare case, so rather than syncing
-         * PC/condbits before the call, we use cpu_restore_state() to
-         * get them right before raising the exception.
+         * PC/condbits before the call, we use raise_exception_ra() so
+         * that cpu_restore_state() will sort them out.
          */
-        cpu_restore_state(cs, GETPC(), true);
-        raise_exception(env, EXCP_STKOF, 0, 1);
+        raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
     }
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Note that the SVE BFLOAT16 support does not require SVE2,
it is an independent extension.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
     int rd = extract32(insn, 0, 5);
 
     if (mos) {
-        unallocated_encoding(s);
-        return;
+        goto do_unallocated;
     }
 
     switch (opcode) {
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
         /* FCVT between half, single and double precision */
         int dtype = extract32(opcode, 0, 2);
         if (type == 2 || dtype == type) {
-            unallocated_encoding(s);
-            return;
+            goto do_unallocated;
         }
         if (!fp_access_check(s)) {
             return;
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
 
     case 0x10 ... 0x13: /* FRINT{32,64}{X,Z} */
         if (type > 1 || !dc_isar_feature(aa64_frint, s)) {
-            unallocated_encoding(s);
-            return;
+            goto do_unallocated;
         }
         /* fall through */
     case 0x0 ... 0x3:
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
             break;
         case 3:
             if (!dc_isar_feature(aa64_fp16, s)) {
-                unallocated_encoding(s);
-                return;
+                goto do_unallocated;
             }
 
             if (!fp_access_check(s)) {
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
             handle_fp_1src_half(s, opcode, rd, rn);
             break;
         default:
-            unallocated_encoding(s);
+            goto do_unallocated;
         }
         break;
 
     default:
+    do_unallocated:
         unallocated_encoding(s);
         break;
     }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is the 64-bit BFCVT and the 32-bit VCVT{B,T}.BF16.F32.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  1 +
 target/arm/vfp.decode      |  2 ++
 target/arm/translate-a64.c | 19 +++++++++++++++++++
 target/arm/translate-vfp.c | 24 ++++++++++++++++++++++++
 target/arm/vfp_helper.c    |  5 +++++
 5 files changed, 51 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
 
 DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
 DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
+DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
 
 DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
 DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
 
 # VCVTB and VCVTT to f16: Vd format is always vd_sp;
 # Vm format depends on size bit
+VCVT_b16_f32 ---- 1110 1.11 0011 .... 1001 t:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
 VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn)
     case 0x3: /* FSQRT */
         gen_helper_vfp_sqrts(tcg_res, tcg_op, cpu_env);
         goto done;
+    case 0x6: /* BFCVT */
+        gen_fpst = gen_helper_bfcvt;
+        break;
     case 0x8: /* FRINTN */
     case 0x9: /* FRINTP */
     case 0xa: /* FRINTM */
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
         }
         break;
 
+    case 0x6:
+        switch (type) {
+        case 1: /* BFCVT */
+            if (!dc_isar_feature(aa64_bf16, s)) {
+                goto do_unallocated;
+            }
+            if (!fp_access_check(s)) {
+                return;
+            }
+            handle_fp_1src_single(s, opcode, rd, rn);
+            break;
+        default:
+            goto do_unallocated;
+        }
+        break;
+
     default:
     do_unallocated:
         unallocated_encoding(s);
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     return true;
 }
 
+static bool trans_VCVT_b16_f32(DisasContext *s, arg_VCVT_b16_f32 *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = fpstatus_ptr(FPST_FPCR);
+    tmp = tcg_temp_new_i32();
+
+    vfp_load_reg32(tmp, a->vm);
+    gen_helper_bfcvt(tmp, tmp, fpst);
+    tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
 static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
 {
     TCGv_ptr fpst;
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
     return float64_to_float32(x, &env->vfp.fp_status);
 }
 
+uint32_t HELPER(bfcvt)(float32 x, void *status)
+{
+    return float32_to_bfloat16(x, status);
+}
+
 /*
  * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
  * must always round-to-nearest; the AArch64 ones honour the FPSCR
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFCVT{N,T} for both AArch64 AdvSIMD and SVE,
and VCVT.BF16.F32 for AArch32 NEON.
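
For readers unfamiliar with the format, here is a rough standalone
sketch of the narrowing that the bfcvt_pair helper performs on two
packed float32 values. It is an approximation under explicit
assumptions: it always rounds to nearest-even and quiets NaNs by
forcing the top fraction bit, whereas the softfloat routine used in the
patch takes its rounding mode and NaN handling from the float_status it
is given. All function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw bits (assumes IEEE binary32). */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    assert(sizeof(f) == sizeof(bits));
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}

/* Narrow float32 bits to bfloat16, rounding to nearest-even. */
static uint16_t f32_bits_to_bf16_rne(uint32_t bits)
{
    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: keep sign and top payload bits, force the quiet bit. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    /* Adding 0x7fff plus the kept LSB sends exact ties to even. */
    uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7fffu + lsb;
    return (uint16_t)(bits >> 16);
}

/* Convert two packed float32 values and pack the results, low first. */
static uint32_t bf16_pair(uint64_t pair)
{
    uint16_t lo = f32_bits_to_bf16_rne((uint32_t)pair);
    uint16_t hi = f32_bits_to_bf16_rne((uint32_t)(pair >> 32));
    return (uint32_t)lo | ((uint32_t)hi << 16);
}
```

Because bfloat16 keeps the float32 exponent width, the conversion only
discards the low 16 fraction bits; the result of the low input lands in
bits [15:0] and the high input in bits [31:16].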

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h     |  4 ++++
 target/arm/helper.h         |  1 +
 target/arm/neon-dp.decode   |  1 +
 target/arm/sve.decode       |  2 ++
 target/arm/sve_helper.c     |  2 ++
 target/arm/translate-a64.c  | 17 ++++++++++++++
 target/arm/translate-neon.c | 45 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c  | 16 +++++++++++++
 target/arm/vfp_helper.c     |  7 ++++++
 9 files changed, 95 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve_fcvt_hd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_fcvt_sd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bfcvt, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(sve_fcvtzs_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_fcvtnt_sh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve2_fcvtnt_ds, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bfcvtnt, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(sve2_fcvtlt_hs, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
 DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
 DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
 DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
+DEF_HELPER_FLAGS_2(bfcvt_pair, TCG_CALL_NO_RWG, i32, i64, ptr)
 
 DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
 DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     VRINTZ       1111 001 11 . 11 .. 10 .... 0 1011 . . 0 .... @2misc
 
     VCVT_F16_F32 1111 001 11 . 11 .. 10 .... 0 1100 0 . 0 .... @2misc_q0
+    VCVT_B16_F32 1111 001 11 . 11 .. 10 .... 0 1100 1 . 0 .... @2misc_q0
 
     VRINTM       1111 001 11 . 11 .. 10 .... 0 1101 . . 0 .... @2misc
 
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FNMLS_zpzzz     01100101 .. 1 ..... 111 ... ..... .....         @rdn_pg_rm_ra
 # SVE floating-point convert precision
 FCVT_sh         01100101 10 0010 00 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_hs         01100101 10 0010 01 101 ... ..... .....         @rd_pg_rn_e0
+BFCVT           01100101 10 0010 10 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_dh         01100101 11 0010 00 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_hd         01100101 11 0010 01 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_ds         01100101 11 0010 10 101 ... ..... .....         @rd_pg_rn_e0
@@ -XXX,XX +XXX,XX @@ RAX1            01000101 00 1 ..... 11110 1 ..... .....  @rd_rn_rm_e0
 FCVTXNT_ds      01100100 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTX_ds        01100101 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTNT_sh       01100100 10 0010 00 101 ... ..... .....  @rd_pg_rn_e0
+BFCVTNT         01100100 10 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTLT_hs       01100100 10 0010 01 101 ... ..... .....  @rd_pg_rn_e0
 FCVTNT_ds       01100100 11 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTLT_sd       01100100 11 0010 11 101 ... ..... .....  @rd_pg_rn_e0
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint64_t vfp_float64_to_uint64_rtz(float64 f, float_status *s)
 
 DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, sve_f32_to_f16)
 DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, sve_f16_to_f32)
+DO_ZPZ_FP(sve_bfcvt,   uint32_t, H1_4, float32_to_bfloat16)
 DO_ZPZ_FP(sve_fcvt_dh, uint64_t,     , sve_f64_to_f16)
 DO_ZPZ_FP(sve_fcvt_hd, uint64_t,     , sve_f16_to_f64)
 DO_ZPZ_FP(sve_fcvt_ds, uint64_t,     , float64_to_float32)
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
     } while (i != 0);                                                         \
 }
 
+DO_FCVTNT(sve_bfcvtnt,    uint32_t, uint16_t, H1_4, H1_2, float32_to_bfloat16)
 DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16)
 DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t,     , H1_4, float64_to_float32)
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
                 tcg_temp_free_i32(ahp);
             }
             break;
+        case 0x36: /* BFCVTN, BFCVTN2 */
+            {
+                TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
+                gen_helper_bfcvt_pair(tcg_res[pass], tcg_op, fpst);
+                tcg_temp_free_ptr(fpst);
+            }
+            break;
         case 0x56:  /* FCVTXN, FCVTXN2 */
             /* 64 bit to 32 bit float conversion
              * with von Neumann rounding (round to odd)
@@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
             }
             handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
             return;
+        case 0x36: /* BFCVTN, BFCVTN2 */
+            if (!dc_isar_feature(aa64_bf16, s) || size != 2) {
+                unallocated_encoding(s);
+                return;
+            }
+            if (!fp_access_check(s)) {
+                return;
+            }
+            handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
+            return;
         case 0x17: /* FCVTL, FCVTL2 */
             if (!fp_access_check(s)) {
                 return;
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
     return true;
 }
 
+static bool trans_VCVT_B16_F32(DisasContext *s, arg_2misc *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i64 tmp;
+    TCGv_i32 dst0, dst1;
+
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vm & 1) || (a->size != 1)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = fpstatus_ptr(FPST_STD);
+    tmp = tcg_temp_new_i64();
+    dst0 = tcg_temp_new_i32();
+    dst1 = tcg_temp_new_i32();
+
+    read_neon_element64(tmp, a->vm, 0, MO_64);
+    gen_helper_bfcvt_pair(dst0, tmp, fpst);
+
+    read_neon_element64(tmp, a->vm, 1, MO_64);
+    gen_helper_bfcvt_pair(dst1, tmp, fpst);
+
+    write_neon_element32(dst0, a->vd, 0, MO_32);
+    write_neon_element32(dst1, a->vd, 1, MO_32);
+
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i32(dst0);
+    tcg_temp_free_i32(dst1);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
 static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
 {
     TCGv_ptr fpst;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_hs(DisasContext *s, arg_rpr_esz *a)
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_hs);
 }
 
+static bool trans_BFCVT(DisasContext *s, arg_rpr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvt);
+}
+
 static bool trans_FCVT_dh(DisasContext *s, arg_rpr_esz *a)
 {
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_dh);
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVTNT_sh(DisasContext *s, arg_rpr_esz *a)
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtnt_sh);
 }
 
+static bool trans_BFCVTNT(DisasContext *s, arg_rpr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvtnt);
+}
+
 static bool trans_FCVTNT_ds(DisasContext *s, arg_rpr_esz *a)
 {
     if (!dc_isar_feature(aa64_sve2, s)) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(bfcvt)(float32 x, void *status)
     return float32_to_bfloat16(x, status);
 }
 
+uint32_t HELPER(bfcvt_pair)(uint64_t pair, void *status)
+{
+    bfloat16 lo = float32_to_bfloat16(extract64(pair, 0, 32), status);
+    bfloat16 hi = float32_to_bfloat16(extract64(pair, 32, 32), status);
+    return deposit32(lo, 16, 16, hi);
+}
+
 /*
  * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
  * must always round-to-nearest; the AArch64 ones honour the FPSCR
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

For Arm BFDOT and BFMMLA, we need a version of round-to-odd
that overflows to infinity, instead of the max normal number.
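
The distinction between the two modes can be illustrated with a small
standalone sketch. This is not the softfloat code: round_to_odd_shift
and f32_overflow_bits are hypothetical helpers that only model the
rounding rule on an integer significand and the float32 bit pattern
chosen when a result overflows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Round-to-odd while dropping 'shift' low bits: truncate, then set the
 * new least significant bit if any discarded bit was nonzero.
 */
static uint64_t round_to_odd_shift(uint64_t frac, unsigned shift)
{
    assert(shift > 0 && shift < 64);
    uint64_t discarded = frac & ((UINT64_C(1) << shift) - 1);
    return (frac >> shift) | (discarded != 0);
}

/*
 * float32 bit pattern produced on overflow: the largest finite value
 * for the existing mode, or infinity for the new variant.
 */
static uint32_t f32_overflow_bits(bool sign, bool overflow_to_inf)
{
    uint32_t mag = overflow_to_inf ? 0x7f800000u : 0x7f7fffffu;
    return mag | ((uint32_t)sign << 31);
}
```

The rounding step is the same in both modes; only the value substituted
on overflow differs, which matches the shared "inc" computation and the
separate overflow_norm setting in the patch.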

Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/fpu/softfloat-types.h | 4 +++-
 fpu/softfloat-parts.c.inc     | 6 ++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
     float_round_up           = 2,
     float_round_to_zero      = 3,
     float_round_ties_away    = 4,
-    /* Not an IEEE rounding mode: round to the closest odd mantissa value */
+    /* Not an IEEE rounding mode: round to closest odd, overflow to max */
     float_round_to_odd       = 5,
+    /* Not an IEEE rounding mode: round to closest odd, overflow to inf */
+    float_round_to_odd_inf   = 6,
 } FloatRoundMode;
 
 /*
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
         g_assert_not_reached();
     }
 
+    overflow_norm = false;
     switch (s->float_rounding_mode) {
     case float_round_nearest_even:
-        overflow_norm = false;
         inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
         break;
     case float_round_ties_away:
-        overflow_norm = false;
         inc = frac_lsbm1;
         break;
     case float_round_to_zero:
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
         break;
     case float_round_to_odd:
         overflow_norm = true;
+        /* fall through */
+    case float_round_to_odd_inf:
         inc = p->frac_lo & frac_lsb ? 0 : round_mask;
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
                        ? frac_lsbm1 : 0);
                 break;
             case float_round_to_odd:
+            case float_round_to_odd_inf:
                 inc = p->frac_lo & frac_lsb ? 0 : round_mask;
                 break;
             default:
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFDOT for both AArch64 AdvSIMD and SVE,
and VDOT.BF16 for AArch32 NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h           |  3 +++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  3 +++
 target/arm/translate-a64.c    | 20 ++++++++++++++++++
 target/arm/translate-neon.c   |  9 ++++++++
 target/arm/translate-sve.c    | 12 +++++++++++
 target/arm/vec_helper.c       | 40 +++++++++++++++++++++++++++++++++++
 7 files changed, 89 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_ummla_b, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUDOT          1111 110 00 . 10 .... .... 1101 . q:1 . 1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
 VUSDOT         1111 110 01 . 10 .... .... 1101 . q:1 . 0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+VDOT_b16       1111 110 00 . 00 .... .... 1101 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
 # VFM[AS]L
 VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
 FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
 FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
 
+### SVE2 floating-point bfloat16 dot-product
+BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
+
 ### SVE2 floating-point multiply-add long (indexed)
 FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
 FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         feature = dc_isar_feature(aa64_fcma, s);
         break;
+    case 0x1f: /* BFDOT */
+        switch (size) {
+        case 1:
+            feature = dc_isar_feature(aa64_bf16, s);
+            break;
+        default:
+            unallocated_encoding(s);
+            return;
+        }
+        break;
     default:
         unallocated_encoding(s);
         return;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         return;
 
+    case 0xf: /* BFDOT */
+        switch (size) {
+        case 1:
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        return;
+
     default:
         g_assert_not_reached();
     }
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a)
                         gen_helper_gvec_usdot_b);
 }
 
+static bool trans_VDOT_b16(DisasContext *s, arg_VDOT_b16 *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0,
+                        gen_helper_gvec_bfdot);
+}
+
 static bool trans_VFML(DisasContext *s, arg_VFML *a)
 {
     int opr_sz;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_UMMLA(DisasContext *s, arg_rrrr_esz *a)
 {
     return do_i8mm_zzzz_ool(s, a, gen_helper_gvec_ummla_b, 0);
 }
+
+static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot,
+                          a->rd, a->rn, a->rm, a->ra, 0);
+    }
+    return true;
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
 DO_MMLA_B(gvec_smmla_b, do_smmla_b)
 DO_MMLA_B(gvec_ummla_b, do_ummla_b)
 DO_MMLA_B(gvec_usmmla_b, do_usmmla_b)
+
+/*
+ * BFloat16 Dot Product
+ */
+
+static float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
+{
+    /* FPCR is ignored for BFDOT and BFMMLA. */
+    float_status bf_status = {
+        .tininess_before_rounding = float_tininess_before_rounding,
+        .float_rounding_mode = float_round_to_odd_inf,
+        .flush_to_zero = true,
+        .flush_inputs_to_zero = true,
+        .default_nan_mode = true,
+    };
+    float32 t1, t2;
+
+    /*
+     * Extract each BFloat16 from the element pair, and shift
+     * them such that they become float32.
+     */
+    t1 = float32_mul(e1 << 16, e2 << 16, &bf_status);
+    t2 = float32_mul(e1 & 0xffff0000u, e2 & 0xffff0000u, &bf_status);
+    t1 = float32_add(t1, t2, &bf_status);
+    t1 = float32_add(sum, t1, &bf_status);
+
+    return t1;
+}
+
+void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    float32 *d = vd, *a = va;
+    uint32_t *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        d[i] = bfdotadd(a[i], n[i], m[i]);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFDOT for both AArch64 AdvSIMD and SVE,
and VDOT.BF16 for AArch32 NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
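For reference, a minimal host-side sketch of the per-segment selection
performed by the indexed helper. It assumes a little-endian host (so the
H4() adjustment is the identity) and uses plain host float arithmetic
instead of softfloat with the fixed bf_status, so rounding and denormal
handling will differ from the real helper. The names bf16_to_float,
bfdot_pair and bfdot_indexed are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A bfloat16 is the top 16 bits of an IEEE binary32 value. */
static float bf16_to_float(uint16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

/*
 * Each uint32_t holds a pair of bfloat16 values.
 * Returns sum + lo(e1) * lo(e2) + hi(e1) * hi(e2).
 * Assumption: host float arithmetic, not round-to-odd softfloat.
 */
static float bfdot_pair(float sum, uint32_t e1, uint32_t e2)
{
    float lo = bf16_to_float((uint16_t)e1) * bf16_to_float((uint16_t)e2);
    float hi = bf16_to_float((uint16_t)(e1 >> 16)) *
               bf16_to_float((uint16_t)(e2 >> 16));

    return sum + (lo + hi);
}

/*
 * Indexed form: within each 128-bit segment (four 32-bit elements),
 * every lane multiplies by the same pair, m[segment start + index].
 */
static void bfdot_indexed(float *d, const float *a, const uint32_t *n,
                          const uint32_t *m, size_t elements, unsigned index)
{
    size_t per_seg = elements < 4 ? elements : 4;

    assert(index < per_seg);
    for (size_t i = 0; i < elements; i += per_seg) {
        uint32_t m_idx = m[i + index];

        for (size_t j = i; j < i + per_seg; j++) {
            d[j] = bfdot_pair(a[j], n[j], m_idx);
        }
    }
}
```

The inner loop never re-reads m, which mirrors why the helper loads
m_idx once per segment before iterating over the lanes.
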
 target/arm/helper.h           |  2 ++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  3 +++
 target/arm/translate-a64.c    | 41 +++++++++++++++++++++++++++--------
 target/arm/translate-neon.c   |  9 ++++++++
 target/arm/translate-sve.c    | 12 ++++++++++
 target/arm/vec_helper.c       | 20 +++++++++++++++++
 7 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
 
 DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
 
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUSDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
                vn=%vn_dp vd=%vd_dp
 VSUDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 1 vm:4 \
                vn=%vn_dp vd=%vd_dp
+VDOT_b16_scal  1111 1110 0 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
+               vn=%vn_dp vd=%vd_dp
 
 %vfml_scalar_q0_rm 0:3 5:1
 %vfml_scalar_q1_index 5:1 3:1
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
 FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
 FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
 FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
+
+### SVE2 floating-point bfloat16 dot-product (indexed)
+BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             return;
         }
         break;
-    case 0x0f: /* SUDOT, USDOT */
-        if (is_scalar || (size & 1) || !dc_isar_feature(aa64_i8mm, s)) {
+    case 0x0f:
+        switch (size) {
+        case 0: /* SUDOT */
+        case 2: /* USDOT */
+            if (is_scalar || !dc_isar_feature(aa64_i8mm, s)) {
+                unallocated_encoding(s);
+                return;
+            }
+            break;
+        case 1: /* BFDOT */
+            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
+                unallocated_encoding(s);
+                return;
+            }
+            break;
+        default:
             unallocated_encoding(s);
             return;
         }
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                          u ? gen_helper_gvec_udot_idx_b
                          : gen_helper_gvec_sdot_idx_b);
         return;
-    case 0x0f: /* SUDOT, USDOT */
-        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
-                         extract32(insn, 23, 1)
-                         ? gen_helper_gvec_usdot_idx_b
-                         : gen_helper_gvec_sudot_idx_b);
-        return;
-
+    case 0x0f:
+        switch (extract32(insn, 22, 2)) {
+        case 0: /* SUDOT */
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+                             gen_helper_gvec_sudot_idx_b);
+            return;
+        case 1: /* BFDOT */
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+                             gen_helper_gvec_bfdot_idx);
+            return;
+        case 2: /* USDOT */
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+                             gen_helper_gvec_usdot_idx_b);
+            return;
+        }
+        g_assert_not_reached();
     case 0x11: /* FCMLA #0 */
     case 0x13: /* FCMLA #90 */
     case 0x15: /* FCMLA #180 */
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a)
                         gen_helper_gvec_sudot_idx_b);
 }
 
+static bool trans_VDOT_b16_scal(DisasContext *s, arg_VDOT_b16_scal *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index,
+                        gen_helper_gvec_bfdot_idx);
+}
+
 static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
 {
     int opr_sz;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
     }
     return true;
 }
+
+static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot_idx,
+                          a->rd, a->rn, a->rm, a->ra, a->index);
+    }
+    return true;
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
+                            void *va, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc);
+    intptr_t index = simd_data(desc);
+    intptr_t elements = opr_sz / 4;
+    intptr_t eltspersegment = MIN(16 / 4, elements);
+    float32 *d = vd, *a = va;
+    uint32_t *n = vn, *m = vm;
+
+    for (i = 0; i < elements; i += eltspersegment) {
+        uint32_t m_idx = m[i + H4(index)];
+
+        for (j = i; j < i + eltspersegment; j++) {
+            d[j] = bfdotadd(a[j], n[j], m_idx);
+        }
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFMMLA for both AArch64 AdvSIMD and SVE,
and VMMLA.BF16 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
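For reference, a minimal sketch of the 2x2 matrix multiply-accumulate
done for each 128-bit segment, written with loops instead of the
unrolled sum00..sum11 form. It assumes a little-endian host (H4() is the
identity) and plain host float arithmetic rather than softfloat, so
results can differ from the helper for inexact inputs. The names
bf16_to_float, bfdot_pair and bfmmla_segment are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* A bfloat16 is the top 16 bits of an IEEE binary32 value. */
static float bf16_to_float(uint16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* sum + lo(e1) * lo(e2) + hi(e1) * hi(e2), host float arithmetic. */
static float bfdot_pair(float sum, uint32_t e1, uint32_t e2)
{
    float lo = bf16_to_float((uint16_t)e1) * bf16_to_float((uint16_t)e2);
    float hi = bf16_to_float((uint16_t)(e1 >> 16)) *
               bf16_to_float((uint16_t)(e2 >> 16));

    return sum + (lo + hi);
}

/*
 * One segment: d = a + n * transpose(m).
 * a and d are 2x2 float matrices in row-major order.
 * n and m each hold 2 rows of 4 bfloat16, i.e. 2 uint32_t pairs per row.
 */
static void bfmmla_segment(float d[4], const float a[4],
                           const uint32_t n[4], const uint32_t m[4])
{
    float r[4];

    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            float sum = a[2 * i + j];

            for (int k = 0; k < 2; k++) {
                sum = bfdot_pair(sum, n[2 * i + k], m[2 * j + k]);
            }
            r[2 * i + j] = sum;
        }
    }
    /* Write back only after all inputs are consumed: d may alias them. */
    memcpy(d, r, sizeof(r));
}
```

Results are staged in a temporary for the same reason the helper keeps
four local sums: the destination register may also be a source.
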
 target/arm/helper.h           |  3 +++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  6 +++--
 target/arm/translate-a64.c    | 10 +++++++++
 target/arm/translate-neon.c   |  9 ++++++++
 target/arm/translate-sve.c    | 12 ++++++++++
 target/arm/vec_helper.c       | 42 ++++++++++++++++++++++++++++++++++-
 7 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUMMLA         1111 1100 0.10 .... .... 1100 .1.1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
 VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
 VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
                vn=%vn_dp vd=%vd_dp size=1
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz  01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5  ra=%reg_movprfx
 USDOT_zzzz      01000100 .. 0 ..... 011 110 ..... .....  @rda_rn_rm
 
 ### SVE2 floating point matrix multiply accumulate
-
-FMMLA           01100100 .. 1 ..... 111001 ..... .....  @rda_rn_rm
+{
+  BFMMLA        01100100 01 1 ..... 111 001 ..... .....  @rda_rn_rm_e0
+  FMMLA         01100100 .. 1 ..... 111 001 ..... .....  @rda_rn_rm
+}
 
 ### SVE2 Memory Gather Load Group
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         feature = dc_isar_feature(aa64_fcma, s);
         break;
+    case 0x1d: /* BFMMLA */
+        if (size != MO_16 || !is_q) {
+            unallocated_encoding(s);
+            return;
+        }
+        feature = dc_isar_feature(aa64_bf16, s);
+        break;
     case 0x1f: /* BFDOT */
         switch (size) {
         case 1:
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         return;
 
+    case 0xd: /* BFMMLA */
+        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
+        return;
     case 0xf: /* BFDOT */
         switch (size) {
         case 1:
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VUSMMLA(DisasContext *s, arg_VUSMMLA *a)
     return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
                         gen_helper_gvec_usmmla_b);
 }
+
+static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
+                        gen_helper_gvec_bfmmla);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
     }
     return true;
 }
+
+static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfmmla,
+                          a->rd, a->rn, a->rm, a->ra, 0);
+    }
+    return true;
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
          * Process the entire segment at once, writing back the
          * results only after we've consumed all of the inputs.
          *
-         * Key to indicies by column:
+         * Key to indices by column:
          *          i   j                  i             j
          */
         sum0 = a[H4(0 + 0)];
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
+{
+    intptr_t s, opr_sz = simd_oprsz(desc);
+    float32 *d = vd, *a = va;
+    uint32_t *n = vn, *m = vm;
+
+    for (s = 0; s < opr_sz / 4; s += 4) {
+        float32 sum00, sum01, sum10, sum11;
+
+        /*
+         * Process the entire segment at once, writing back the
+         * results only after we've consumed all of the inputs.
+         *
+         * Key to indices by column:
+         *               i   j           i   k             j   k
+         */
+        sum00 = a[s + H4(0 + 0)];
+        sum00 = bfdotadd(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)]);
+        sum00 = bfdotadd(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)]);
+
+        sum01 = a[s + H4(0 + 1)];
+        sum01 = bfdotadd(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)]);
+        sum01 = bfdotadd(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)]);
+
+        sum10 = a[s + H4(2 + 0)];
+        sum10 = bfdotadd(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)]);
+        sum10 = bfdotadd(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)]);
+
+        sum11 = a[s + H4(2 + 1)];
+        sum11 = bfdotadd(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)]);
+        sum11 = bfdotadd(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)]);
+
+        d[s + H4(0 + 0)] = sum00;
+        d[s + H4(0 + 1)] = sum01;
+        d[s + H4(2 + 0)] = sum10;
+        d[s + H4(2 + 1)] = sum11;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
and VFMA{B,T}.BF16 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
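For reference, a minimal sketch of the bottom/top selection in the
widening multiply-add. It assumes a little-endian host (H2() and H4()
are the identity) and uses an unfused host multiply and add where the
helper calls float32_muladd() with the supplied float_status, so
rounding can differ for inexact inputs. The names bf16_to_float and
bfmlal_sketch are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A bfloat16 is the top 16 bits of an IEEE binary32 value. */
static float bf16_to_float(uint16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

/*
 * For each 32-bit result lane i, pick one bfloat16 from each source:
 * sel == 0 takes the even (bottom) element, sel == 1 the odd (top) one.
 * n and m must each hold 2 * elements bfloat16 values.
 */
static void bfmlal_sketch(float *d, const float *a, const uint16_t *n,
                          const uint16_t *m, size_t elements, int sel)
{
    for (size_t i = 0; i < elements; i++) {
        float nn = bf16_to_float(n[2 * i + sel]);
        float mm = bf16_to_float(m[2 * i + sel]);

        d[i] = a[i] + nn * mm;
    }
}
```

The B and T variants differ only in sel, which is why a single helper
can serve both with the selector passed in the descriptor data.
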
 target/arm/helper.h           |  3 +++
 target/arm/neon-shared.decode |  3 +++
 target/arm/sve.decode         |  3 +++
 target/arm/translate-a64.c    | 13 +++++++++----
 target/arm/translate-neon.c   |  9 +++++++++
 target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
 target/arm/vec_helper.c       | 16 ++++++++++++++++
 7 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
 VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+VFMA_b16       1111 110 0 0.11 .... .... 1000 . q:1 . 1 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
 VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
                vn=%vn_dp vd=%vd_dp size=1
 VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
 FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
 FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
 
+BFMLALB_zzzw    01100100 11 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
+BFMLALT_zzzw    01100100 11 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
+
 ### SVE2 floating-point bfloat16 dot-product
 BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         feature = dc_isar_feature(aa64_bf16, s);
         break;
-    case 0x1f: /* BFDOT */
+    case 0x1f:
         switch (size) {
-        case 1:
+        case 1: /* BFDOT */
+        case 3: /* BFMLAL{B,T} */
             feature = dc_isar_feature(aa64_bf16, s);
             break;
         default:
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
     case 0xd: /* BFMMLA */
         gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
         return;
-    case 0xf: /* BFDOT */
+    case 0xf:
         switch (size) {
-        case 1:
+        case 1: /* BFDOT */
             gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
             break;
+        case 3: /* BFMLAL{B,T} */
+            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, false, is_q,
+                              gen_helper_gvec_bfmlal);
+            break;
         default:
             g_assert_not_reached();
         }
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
     return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
                         gen_helper_gvec_bfmmla);
 }
+
+static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
+                             gen_helper_gvec_bfmlal);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
     }
     return true;
 }
+
+static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
+        unsigned vsz = vec_full_reg_size(s);
+
+        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vec_full_reg_offset(s, a->ra),
+                           status, vsz, vsz, sel,
+                           gen_helper_gvec_bfmlal);
+        tcg_temp_free_ptr(status);
+    }
+    return true;
+}
+
+static bool trans_BFMLALB_zzzw(DisasContext *s, arg_rrrr_esz *a)
+{
+    return do_BFMLAL_zzzw(s, a, false);
+}
+
+static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
+{
+    return do_BFMLAL_zzzw(s, a, true);
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
+                         void *stat, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    intptr_t sel = simd_data(desc);
+    float32 *d = vd, *a = va;
+    bfloat16 *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        float32 nn = n[H2(i * 2 + sel)] << 16;
+        float32 mm = m[H2(i * 2 + sel)] << 16;
+        d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], 0, stat);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
and VFMA{B,T}.BF16 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
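For reference, a minimal sketch of how the element index and the
bottom/top selector share the descriptor data field, and how the indexed
loop consumes them. It assumes a little-endian host (H2() and H4() are
the identity) and an unfused host multiply and add instead of
float32_muladd(). The names bf16_to_float, pack_idx_sel and
bfmlal_indexed are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A bfloat16 is the top 16 bits of an IEEE binary32 value. */
static float bf16_to_float(uint16_t b)
{
    uint32_t bits = (uint32_t)b << 16;
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Bit 0 selects bottom/top; bits 3:1 hold the bfloat16 index (0..7). */
static unsigned pack_idx_sel(unsigned index, unsigned sel)
{
    return ((index & 7u) << 1) | (sel & 1u);
}

/*
 * Within each 128-bit segment (four float lanes, eight bfloat16),
 * every lane multiplies by the single bfloat16 m[segment start + index].
 * 2 * i converts the float lane offset into a bfloat16 offset.
 */
static void bfmlal_indexed(float *d, const float *a, const uint16_t *n,
                           const uint16_t *m, size_t elements, unsigned data)
{
    unsigned sel = data & 1u;
    unsigned index = (data >> 1) & 7u;
    size_t per_seg = elements < 4 ? elements : 4;

    for (size_t i = 0; i < elements; i += per_seg) {
        float m_idx = bf16_to_float(m[2 * i + index]);

        for (size_t j = i; j < i + per_seg; j++) {
            d[j] = a[j] + bf16_to_float(n[2 * j + sel]) * m_idx;
        }
    }
}
```

Keeping the selector in bit 0 lets the AdvSIMD, Neon and SVE translators
all build the same value with (index << 1) | sel.
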
 target/arm/helper.h           |  2 ++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  2 ++
 target/arm/translate-a64.c    | 15 ++++++++++++++-
 target/arm/translate-neon.c   | 10 ++++++++++
 target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
 target/arm/vec_helper.c       | 22 ++++++++++++++++++++++
 7 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
 
 DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
 
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
                rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
 VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
                index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
+VFMA_b16_scal  1111 1110 0.11 .... .... 1000 . q:1 . 1 . vm:3 \
+               index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
 FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
 FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
 FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
+BFMLALB_zzxw    01100100 11 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
+BFMLALT_zzxw    01100100 11 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
 
 ### SVE2 floating-point bfloat16 dot-product (indexed)
 BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                 unallocated_encoding(s);
                 return;
             }
+            size = MO_32;
             break;
         case 1: /* BFDOT */
             if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
                 unallocated_encoding(s);
                 return;
             }
+            size = MO_32;
+            break;
+        case 3: /* BFMLAL{B,T} */
+            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
+                unallocated_encoding(s);
+                return;
+            }
+            /* can't set is_fp without other incorrect size checks */
+            size = MO_16;
             break;
         default:
             unallocated_encoding(s);
             return;
         }
-        size = MO_32;
         break;
     case 0x11: /* FCMLA #0 */
     case 0x13: /* FCMLA #90 */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
                              gen_helper_gvec_usdot_idx_b);
             return;
+        case 3: /* BFMLAL{B,T} */
+            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
+                              gen_helper_gvec_bfmlal_idx);
+            return;
         }
         g_assert_not_reached();
     case 0x11: /* FCMLA #0 */
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
     return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
                              gen_helper_gvec_bfmlal);
 }
+
+static bool trans_VFMA_b16_scal(DisasContext *s, arg_VFMA_b16_scal *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda_fpst(s, 6, a->vd, a->vn, a->vm,
+                             (a->index << 1) | a->q, FPST_STD,
+                             gen_helper_gvec_bfmlal_idx);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
 {
     return do_BFMLAL_zzzw(s, a, true);
 }
+
+static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
+        unsigned vsz = vec_full_reg_size(s);
+
+        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vec_full_reg_offset(s, a->ra),
+                           status, vsz, vsz, (a->index << 1) | sel,
+                           gen_helper_gvec_bfmlal_idx);
+        tcg_temp_free_ptr(status);
+    }
+    return true;
+}
+
+static bool trans_BFMLALB_zzxw(DisasContext *s, arg_rrxr_esz *a)
+{
+    return do_BFMLAL_zzxw(s, a, false);
+}
+
+static bool trans_BFMLALT_zzxw(DisasContext *s, arg_rrxr_esz *a)
+{
+    return do_BFMLAL_zzxw(s, a, true);
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm,
+                             void *va, void *stat, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc);
+    intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1);
+    intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 1, 3);
+    intptr_t elements = opr_sz / 4;
+    intptr_t eltspersegment = MIN(16 / 4, elements);
+    float32 *d = vd, *a = va;
+    bfloat16 *n = vn, *m = vm;
+
+    for (i = 0; i < elements; i += eltspersegment) {
+        float32 m_idx = m[H2(2 * i + index)] << 16;
+
+        for (j = i; j < i + eltspersegment; j++) {
+            float32 n_j = n[H2(2 * j + sel)] << 16;
+            d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], 0, stat);
+        }
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
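For reference, a minimal sketch of how a guest program might test the
two hwcap bits reported here. The bit positions below are an assumption
taken from my reading of the Linux arm64 uapi <asm/hwcap.h>
(HWCAP2_SVEBF16 and HWCAP2_BF16); the SKETCH_ macro names and the two
helper functions are hypothetical.

```c
#include <stdbool.h>

/* Assumed to match HWCAP2_SVEBF16 and HWCAP2_BF16 in <asm/hwcap.h>. */
#define SKETCH_HWCAP2_SVEBF16 (1UL << 12)
#define SKETCH_HWCAP2_BF16    (1UL << 14)

/* True if the AdvSIMD/FP bfloat16 instructions are advertised. */
static bool hwcap2_has_bf16(unsigned long hwcap2)
{
    return (hwcap2 & SKETCH_HWCAP2_BF16) != 0;
}

/* True if the SVE bfloat16 instructions are advertised. */
static bool hwcap2_has_sve_bf16(unsigned long hwcap2)
{
    return (hwcap2 & SKETCH_HWCAP2_SVEBF16) != 0;
}
```

On Linux the argument would typically come from getauxval(AT_HWCAP2)
in <sys/auxv.h>; it is passed in here so the check stays host-neutral.
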
 linux-user/elfload.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap2(void)
     GET_FEATURE_ID(aa64_sve_i8mm, ARM_HWCAP2_A64_SVEI8MM);
     GET_FEATURE_ID(aa64_sve_f32mm, ARM_HWCAP2_A64_SVEF32MM);
     GET_FEATURE_ID(aa64_sve_f64mm, ARM_HWCAP2_A64_SVEF64MM);
+    GET_FEATURE_ID(aa64_sve_bf16, ARM_HWCAP2_A64_SVEBF16);
     GET_FEATURE_ID(aa64_i8mm, ARM_HWCAP2_A64_I8MM);
+    GET_FEATURE_ID(aa64_bf16, ARM_HWCAP2_A64_BF16);
     GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
     GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
     GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Disable BF16 again for !have_neon and !have_vfp during realize.
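
As a rough illustration of what clearing an ID register field amounts to, here is a standalone sketch. field_dp32() is a hypothetical stand-in for QEMU's FIELD_DP32 macro, and the position of the BF16 field (bits [23:20] of ID_ISAR6) is an assumption taken from the Arm architecture manual rather than from this patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for FIELD_DP32: return reg with the
 * length-bit field starting at bit `shift` replaced by val.
 */
static uint32_t field_dp32(uint32_t reg, unsigned shift, unsigned length,
                           uint32_t val)
{
    uint32_t mask = (length < 32 ? (1u << length) - 1 : ~0u) << shift;

    return (reg & ~mask) | ((val << shift) & mask);
}

/* Assumed layout: ID_ISAR6.BF16 occupies bits [23:20]. */
enum { ISAR6_BF16_SHIFT = 20, ISAR6_BF16_LENGTH = 4 };

/* Advertise "no BF16" by zeroing the field; other fields are untouched. */
static uint32_t isar6_clear_bf16(uint32_t id_isar6)
{
    return field_dp32(id_isar6, ISAR6_BF16_SHIFT, ISAR6_BF16_LENGTH, 0);
}
```

With every field set to 1 (0x01111111), clearing BF16 yields 0x01011111.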

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-13-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c     | 3 +++
 target/arm/cpu64.c   | 3 +++
 target/arm/cpu_tcg.c | 1 +
 3 files changed, 7 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 
         u = cpu->isar.id_isar6;
         u = FIELD_DP32(u, ID_ISAR6, JSCVT, 0);
+        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
         cpu->isar.id_isar6 = u;
 
         u = cpu->isar.mvfr0;
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 
         t = cpu->isar.id_aa64isar1;
         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 0);
+        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 0);
         t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 0);
         cpu->isar.id_aa64isar1 = t;
 
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         u = cpu->isar.id_isar6;
         u = FIELD_DP32(u, ID_ISAR6, DP, 0);
         u = FIELD_DP32(u, ID_ISAR6, FHM, 0);
+        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
         u = FIELD_DP32(u, ID_ISAR6, I8MM, 0);
         cpu->isar.id_isar6 = u;
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, SB, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, SPECRES, 1);
+        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, FRINTTS, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, LRCPC, 2); /* ARMv8.4-RCPC */
         t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 1);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2);  /* PMULL */
         t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1);
+        t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = FIELD_DP32(u, ID_ISAR6, FHM, 1);
         u = FIELD_DP32(u, ID_ISAR6, SB, 1);
         u = FIELD_DP32(u, ID_ISAR6, SPECRES, 1);
+        u = FIELD_DP32(u, ID_ISAR6, BF16, 1);
         u = FIELD_DP32(u, ID_ISAR6, I8MM, 1);
         cpu->isar.id_isar6 = u;
 
diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu_tcg.c
+++ b/target/arm/cpu_tcg.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
         t = FIELD_DP32(t, ID_ISAR6, FHM, 1);
         t = FIELD_DP32(t, ID_ISAR6, SB, 1);
         t = FIELD_DP32(t, ID_ISAR6, SPECRES, 1);
+        t = FIELD_DP32(t, ID_ISAR6, BF16, 1);
         t = FIELD_DP32(t, ID_ISAR6, I8MM, 1);
         cpu->isar.id_isar6 = t;
 
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

Until now, Hypervisor.framework has only been available on x86_64 systems.
With Apple Silicon shipping now, it extends its reach to aarch64. To
prepare for supporting multiple architectures, let's start moving common
code out into its own accel directory.

This patch moves assert_hvf_ok() and introduces generic build infrastructure.
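
For readers without the macOS SDK at hand, the pattern behind assert_hvf_ok() is sketched below: return early on success, otherwise report a symbolic name for the code and abort. The demo_* enum and functions are hypothetical stand-ins, not the framework's hv_return_t or QEMU's error_report().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a few hv_return_t codes. */
typedef enum {
    DEMO_SUCCESS,
    DEMO_ERROR,
    DEMO_BUSY,
    DEMO_BAD_ARGUMENT,
} demo_return_t;

/* Map a return code to a printable name; unknown codes get a fallback. */
static const char *demo_return_name(demo_return_t ret)
{
    switch (ret) {
    case DEMO_SUCCESS:
        return "DEMO_SUCCESS";
    case DEMO_ERROR:
        return "DEMO_ERROR";
    case DEMO_BUSY:
        return "DEMO_BUSY";
    case DEMO_BAD_ARGUMENT:
        return "DEMO_BAD_ARGUMENT";
    default:
        return "Unknown Error";
    }
}

/* Return quietly on success; otherwise report the failure and abort. */
static void demo_assert_ok(demo_return_t ret)
{
    if (ret == DEMO_SUCCESS) {
        return;
    }
    fprintf(stderr, "Error: %s\n", demo_return_name(ret));
    abort();
}
```

Keeping the name lookup separate from the abort is a choice made for this sketch so the mapping can be checked in isolation; the patch itself keeps both in one function.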

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-2-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h | 18 +++++++++++++++
 accel/hvf/hvf-all.c      | 47 ++++++++++++++++++++++++++++++++++++++++
 target/i386/hvf/hvf.c    | 33 +---------------------------
 MAINTAINERS              |  8 +++++++
 accel/hvf/meson.build    |  6 +++++
 accel/meson.build        |  1 +
 6 files changed, 81 insertions(+), 32 deletions(-)
 create mode 100644 include/sysemu/hvf_int.h
 create mode 100644 accel/hvf/hvf-all.c
 create mode 100644 accel/hvf/meson.build

diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * QEMU Hypervisor.framework (HVF) support
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/* header to be included in HVF-specific code */
+
+#ifndef HVF_INT_H
+#define HVF_INT_H
+
+#include <Hypervisor/hv.h>
+
+void assert_hvf_ok(hv_return_t ret);
+
+#endif
diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/accel/hvf/hvf-all.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * QEMU Hypervisor.framework support
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
+
+void assert_hvf_ok(hv_return_t ret)
+{
+    if (ret == HV_SUCCESS) {
+        return;
+    }
+
+    switch (ret) {
+    case HV_ERROR:
+        error_report("Error: HV_ERROR");
+        break;
+    case HV_BUSY:
+        error_report("Error: HV_BUSY");
+        break;
+    case HV_BAD_ARGUMENT:
+        error_report("Error: HV_BAD_ARGUMENT");
+        break;
+    case HV_NO_RESOURCES:
+        error_report("Error: HV_NO_RESOURCES");
+        break;
+    case HV_NO_DEVICE:
+        error_report("Error: HV_NO_DEVICE");
+        break;
+    case HV_UNSUPPORTED:
+        error_report("Error: HV_UNSUPPORTED");
+        break;
+    default:
+        error_report("Unknown Error");
+    }
+
+    abort();
+}
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/error-report.h"
 
 #include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 #include "sysemu/runstate.h"
 #include "hvf-i386.h"
 #include "vmcs.h"
@@ -XXX,XX +XXX,XX @@
 
 HVFState *hvf_state;
 
-static void assert_hvf_ok(hv_return_t ret)
-{
-    if (ret == HV_SUCCESS) {
-        return;
-    }
-
-    switch (ret) {
-    case HV_ERROR:
-        error_report("Error: HV_ERROR");
-        break;
-    case HV_BUSY:
-        error_report("Error: HV_BUSY");
-        break;
-    case HV_BAD_ARGUMENT:
-        error_report("Error: HV_BAD_ARGUMENT");
-        break;
-    case HV_NO_RESOURCES:
-        error_report("Error: HV_NO_RESOURCES");
-        break;
-    case HV_NO_DEVICE:
-        error_report("Error: HV_NO_DEVICE");
-        break;
-    case HV_UNSUPPORTED:
-        error_report("Error: HV_UNSUPPORTED");
-        break;
-    default:
-        error_report("Unknown Error");
-    }
-
-    abort();
-}
-
 /* Memory slots */
 hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
 {
diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ M: Roman Bolshakov <r.bolshakov@yadro.com>
 W: https://wiki.qemu.org/Features/HVF
 S: Maintained
 F: target/i386/hvf/
+
+HVF
+M: Cameron Esfahani <dirty@apple.com>
+M: Roman Bolshakov <r.bolshakov@yadro.com>
+W: https://wiki.qemu.org/Features/HVF
+S: Maintained
+F: accel/hvf/
 F: include/sysemu/hvf.h
+F: include/sysemu/hvf_int.h
 
 WHPX CPUs
 M: Sunil Muthuswamy <sunilmut@microsoft.com>
diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/accel/hvf/meson.build
@@ -XXX,XX +XXX,XX @@
+hvf_ss = ss.source_set()
+hvf_ss.add(files(
+  'hvf-all.c',
+))
+
+specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
diff --git a/accel/meson.build b/accel/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/accel/meson.build
+++ b/accel/meson.build
@@ -XXX,XX +XXX,XX @@ specific_ss.add(files('accel-common.c'))
 softmmu_ss.add(files('accel-softmmu.c'))
 user_ss.add(files('accel-user.c'))
 
+subdir('hvf')
 subdir('qtest')
 subdir('kvm')
 subdir('tcg')
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch moves the vCPU thread loop over.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-3-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 {target/i386 => accel}/hvf/hvf-accel-ops.h | 0
 {target/i386 => accel}/hvf/hvf-accel-ops.c | 0
 target/i386/hvf/x86hvf.c                   | 2 +-
 accel/hvf/meson.build                      | 1 +
 target/i386/hvf/meson.build                | 1 -
 5 files changed, 2 insertions(+), 2 deletions(-)
 rename {target/i386 => accel}/hvf/hvf-accel-ops.h (100%)
 rename {target/i386 => accel}/hvf/hvf-accel-ops.c (100%)

diff --git a/target/i386/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
similarity index 100%
rename from target/i386/hvf/hvf-accel-ops.h
rename to accel/hvf/hvf-accel-ops.h
diff --git a/target/i386/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
similarity index 100%
rename from target/i386/hvf/hvf-accel-ops.c
rename to accel/hvf/hvf-accel-ops.c
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@
 #include <Hypervisor/hv.h>
 #include <Hypervisor/hv_vmx.h>
 
-#include "hvf-accel-ops.h"
+#include "accel/hvf/hvf-accel-ops.h"
 
 void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
                      SegmentCache *qseg, bool is_tr)
diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/meson.build
+++ b/accel/hvf/meson.build
@@ -XXX,XX +XXX,XX @@
 hvf_ss = ss.source_set()
 hvf_ss.add(files(
   'hvf-all.c',
+  'hvf-accel-ops.c',
 ))
 
 specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/meson.build
+++ b/target/i386/hvf/meson.build
@@ -XXX,XX +XXX,XX @@
 i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
   'hvf.c',
-  'hvf-accel-ops.c',
   'x86.c',
   'x86_cpuid.c',
   'x86_decode.c',
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch moves CPU and memory operations over. While at it, make sure
the code is consumable on non-i386 systems.
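
The slot lookup that moves with this patch, hvf_find_overlap_slot(), relies on a half-open interval overlap test. A minimal standalone sketch of that test follows; the function name and parameter names are hypothetical, and empty slots never match, mirroring the slot->size check in the moved code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Does the request [start, start + size) intersect the slot
 * [slot_start, slot_start + slot_size)? A slot of size 0 is unused
 * and never overlaps anything.
 */
static bool slot_overlaps(uint64_t slot_start, uint64_t slot_size,
                          uint64_t start, uint64_t size)
{
    return slot_size &&
           start < slot_start + slot_size &&
           start + size > slot_start;
}
```

Ranges that merely touch, such as a request ending at 0x2000 and a slot starting at 0x2000, do not count as overlapping.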

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-4-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h   |   4 +
 target/i386/hvf/hvf-i386.h |   2 -
 target/i386/hvf/x86hvf.h   |   2 -
 accel/hvf/hvf-accel-ops.c  | 308 ++++++++++++++++++++++++++++++++++++-
 target/i386/hvf/hvf.c      | 302 ------------------------------------
 5 files changed, 311 insertions(+), 307 deletions(-)

diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@
 
 #include <Hypervisor/hv.h>
 
+void hvf_set_phys_mem(MemoryRegionSection *, bool);
 void assert_hvf_ok(hv_return_t ret);
+hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
+int hvf_put_registers(CPUState *);
+int hvf_get_registers(CPUState *);
 
 #endif
diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf-i386.h
+++ b/target/i386/hvf/hvf-i386.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
 };
 extern HVFState *hvf_state;
 
-void hvf_set_phys_mem(MemoryRegionSection *, bool);
 void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
-hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
 
 #ifdef NEED_CPU_H
 /* Functions exported to host specific mode */
diff --git a/target/i386/hvf/x86hvf.h b/target/i386/hvf/x86hvf.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.h
+++ b/target/i386/hvf/x86hvf.h
@@ -XXX,XX +XXX,XX @@
 #include "x86_descr.h"
 
 int hvf_process_events(CPUState *);
-int hvf_put_registers(CPUState *);
-int hvf_get_registers(CPUState *);
 bool hvf_inject_interrupts(CPUState *);
 void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
                      SegmentCache *qseg, bool is_tr);
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
+#include "exec/address-spaces.h"
+#include "exec/exec-all.h"
+#include "sysemu/cpus.h"
 #include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 #include "sysemu/runstate.h"
-#include "target/i386/cpu.h"
 #include "qemu/guest-random.h"
 
 #include "hvf-accel-ops.h"
 
+HVFState *hvf_state;
+
+/* Memory slots */
+
+hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
+{
+    hvf_slot *slot;
+    int x;
+    for (x = 0; x < hvf_state->num_slots; ++x) {
+        slot = &hvf_state->slots[x];
+        if (slot->size && start < (slot->start + slot->size) &&
+            (start + size) > slot->start) {
+            return slot;
+        }
+    }
+    return NULL;
+}
+
+struct mac_slot {
+    int present;
+    uint64_t size;
+    uint64_t gpa_start;
+    uint64_t gva;
+};
+
+struct mac_slot mac_slots[32];
+
+static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
+{
+    struct mac_slot *macslot;
+    hv_return_t ret;
+
+    macslot = &mac_slots[slot->slot_id];
+
+    if (macslot->present) {
+        if (macslot->size != slot->size) {
+            macslot->present = 0;
+            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
+            assert_hvf_ok(ret);
+        }
+    }
+
+    if (!slot->size) {
+        return 0;
+    }
+
+    macslot->present = 1;
+    macslot->gpa_start = slot->start;
+    macslot->size = slot->size;
+    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
+    assert_hvf_ok(ret);
+    return 0;
+}
+
+void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
+{
+    hvf_slot *mem;
+    MemoryRegion *area = section->mr;
+    bool writeable = !area->readonly && !area->rom_device;
+    hv_memory_flags_t flags;
+
+    if (!memory_region_is_ram(area)) {
+        if (writeable) {
+            return;
+        } else if (!memory_region_is_romd(area)) {
+            /*
+             * If the memory device is not in romd_mode, then we actually want
+             * to remove the hvf memory slot so all accesses will trap.
+             */
+             add = false;
+        }
+    }
+
+    mem = hvf_find_overlap_slot(
+            section->offset_within_address_space,
+            int128_get64(section->size));
+
+    if (mem && add) {
+        if (mem->size == int128_get64(section->size) &&
+            mem->start == section->offset_within_address_space &&
+            mem->mem == (memory_region_get_ram_ptr(area) +
+            section->offset_within_region)) {
+            return; /* Same region was attempted to register, go away. */
+        }
+    }
+
+    /* Region needs to be reset. set the size to 0 and remap it. */
+    if (mem) {
+        mem->size = 0;
+        if (do_hvf_set_memory(mem, 0)) {
+            error_report("Failed to reset overlapping slot");
+            abort();
+        }
+    }
+
+    if (!add) {
+        return;
+    }
+
+    if (area->readonly ||
+        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
+        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
+    } else {
+        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
+    }
+
+    /* Now make a new slot. */
+    int x;
+
+    for (x = 0; x < hvf_state->num_slots; ++x) {
+        mem = &hvf_state->slots[x];
+        if (!mem->size) {
+            break;
+        }
+    }
+
+    if (x == hvf_state->num_slots) {
+        error_report("No free slots");
+        abort();
+    }
+
+    mem->size = int128_get64(section->size);
+    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
+    mem->start = section->offset_within_address_space;
+    mem->region = area;
+
+    if (do_hvf_set_memory(mem, flags)) {
+        error_report("Error registering new memory slot");
+        abort();
+    }
+}
+
+static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
+{
+    if (!cpu->vcpu_dirty) {
+        hvf_get_registers(cpu);
+        cpu->vcpu_dirty = true;
+    }
+}
+
+void hvf_cpu_synchronize_state(CPUState *cpu)
+{
+    if (!cpu->vcpu_dirty) {
+        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
+    }
+}
+
+static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
+                                              run_on_cpu_data arg)
+{
+    hvf_put_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+void hvf_cpu_synchronize_post_reset(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+}
+
+static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
+                                             run_on_cpu_data arg)
+{
+    hvf_put_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+void hvf_cpu_synchronize_post_init(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+}
+
+static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
+                                              run_on_cpu_data arg)
+{
+    cpu->vcpu_dirty = true;
+}
+
+void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
+{
+    hvf_slot *slot;
+
+    slot = hvf_find_overlap_slot(
+            section->offset_within_address_space,
+            int128_get64(section->size));
+
+    /* protect region against writes; begin tracking it */
+    if (on) {
+        slot->flags |= HVF_SLOT_LOG;
+        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+                      HV_MEMORY_READ);
+    /* stop tracking region*/
+    } else {
+        slot->flags &= ~HVF_SLOT_LOG;
+        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+                      HV_MEMORY_READ | HV_MEMORY_WRITE);
+    }
+}
+
+static void hvf_log_start(MemoryListener *listener,
+                          MemoryRegionSection *section, int old, int new)
+{
+    if (old != 0) {
+        return;
+    }
+
+    hvf_set_dirty_tracking(section, 1);
+}
+
+static void hvf_log_stop(MemoryListener *listener,
+                         MemoryRegionSection *section, int old, int new)
+{
+    if (new != 0) {
+        return;
+    }
+
+    hvf_set_dirty_tracking(section, 0);
+}
+
+static void hvf_log_sync(MemoryListener *listener,
+                         MemoryRegionSection *section)
+{
+    /*
+     * sync of dirty pages is handled elsewhere; just make sure we keep
+     * tracking the region.
+     */
+    hvf_set_dirty_tracking(section, 1);
+}
+
+static void hvf_region_add(MemoryListener *listener,
+                           MemoryRegionSection *section)
+{
+    hvf_set_phys_mem(section, true);
+}
+
+static void hvf_region_del(MemoryListener *listener,
+                           MemoryRegionSection *section)
+{
+    hvf_set_phys_mem(section, false);
+}
+
+static MemoryListener hvf_memory_listener = {
+    .priority = 10,
+    .region_add = hvf_region_add,
+    .region_del = hvf_region_del,
+    .log_start = hvf_log_start,
+    .log_stop = hvf_log_stop,
+    .log_sync = hvf_log_sync,
+};
+
+static void dummy_signal(int sig)
+{
+}
+
+bool hvf_allowed;
+
+static int hvf_accel_init(MachineState *ms)
+{
+    int x;
+    hv_return_t ret;
+    HVFState *s;
+
+    ret = hv_vm_create(HV_VM_DEFAULT);
+    assert_hvf_ok(ret);
+
+    s = g_new0(HVFState, 1);
+
+    s->num_slots = 32;
+    for (x = 0; x < s->num_slots; ++x) {
+        s->slots[x].size = 0;
+        s->slots[x].slot_id = x;
+    }
+
+    hvf_state = s;
+    memory_listener_register(&hvf_memory_listener, &address_space_memory);
+    return 0;
+}
+
+static void hvf_accel_class_init(ObjectClass *oc, void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+    ac->name = "HVF";
+    ac->init_machine = hvf_accel_init;
+    ac->allowed = &hvf_allowed;
+}
+
+static const TypeInfo hvf_accel_type = {
+    .name = TYPE_HVF_ACCEL,
+    .parent = TYPE_ACCEL,
+    .class_init = hvf_accel_class_init,
+};
+
+static void hvf_type_init(void)
+{
+    type_register_static(&hvf_accel_type);
+}
+
+type_init(hvf_type_init);
+
 /*
  * The HVF-specific vCPU thread function. This one should only run when the host
  * CPU supports the VMX "unrestricted guest" feature.
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
 
 #include "hvf-accel-ops.h"
 
-HVFState *hvf_state;
-
-/* Memory slots */
-hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
-{
-    hvf_slot *slot;
-    int x;
-    for (x = 0; x < hvf_state->num_slots; ++x) {
-        slot = &hvf_state->slots[x];
-        if (slot->size && start < (slot->start + slot->size) &&
-            (start + size) > slot->start) {
-            return slot;
-        }
-    }
-    return NULL;
-}
-
-struct mac_slot {
-    int present;
-    uint64_t size;
-    uint64_t gpa_start;
-    uint64_t gva;
-};
-
-struct mac_slot mac_slots[32];
-
-static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
-{
-    struct mac_slot *macslot;
-    hv_return_t ret;
-
-    macslot = &mac_slots[slot->slot_id];
-
-    if (macslot->present) {
-        if (macslot->size != slot->size) {
-            macslot->present = 0;
-            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
-            assert_hvf_ok(ret);
-        }
-    }
-
-    if (!slot->size) {
-        return 0;
-    }
-
-    macslot->present = 1;
-    macslot->gpa_start = slot->start;
-    macslot->size = slot->size;
-    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
-    assert_hvf_ok(ret);
-    return 0;
-}
-
-void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
-{
-    hvf_slot *mem;
-    MemoryRegion *area = section->mr;
-    bool writeable = !area->readonly && !area->rom_device;
-    hv_memory_flags_t flags;
-
-    if (!memory_region_is_ram(area)) {
-        if (writeable) {
-            return;
-        } else if (!memory_region_is_romd(area)) {
-            /*
-             * If the memory device is not in romd_mode, then we actually want
-             * to remove the hvf memory slot so all accesses will trap.
-             */
-             add = false;
-        }
-    }
-
-    mem = hvf_find_overlap_slot(
-            section->offset_within_address_space,
-            int128_get64(section->size));
-
-    if (mem && add) {
-        if (mem->size == int128_get64(section->size) &&
-            mem->start == section->offset_within_address_space &&
-            mem->mem == (memory_region_get_ram_ptr(area) +
-            section->offset_within_region)) {
-            return; /* Same region was attempted to register, go away. */
-        }
-    }
-
-    /* Region needs to be reset. set the size to 0 and remap it. */
-    if (mem) {
-        mem->size = 0;
-        if (do_hvf_set_memory(mem, 0)) {
-            error_report("Failed to reset overlapping slot");
-            abort();
-        }
-    }
-
-    if (!add) {
-        return;
-    }
-
-    if (area->readonly ||
-        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
-        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
-    } else {
-        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
-    }
-
-    /* Now make a new slot. */
-    int x;
-
-    for (x = 0; x < hvf_state->num_slots; ++x) {
-        mem = &hvf_state->slots[x];
-        if (!mem->size) {
-            break;
-        }
-    }
-
-    if (x == hvf_state->num_slots) {
-        error_report("No free slots");
-        abort();
-    }
-
-    mem->size = int128_get64(section->size);
-    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
-    mem->start = section->offset_within_address_space;
-    mem->region = area;
-
-    if (do_hvf_set_memory(mem, flags)) {
-        error_report("Error registering new memory slot");
-        abort();
-    }
-}
-
 void vmx_update_tpr(CPUState *cpu)
 {
     /* TODO: need integrate APIC handling */
@@ -XXX,XX +XXX,XX @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
     }
 }
 
-static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
-{
-    if (!cpu->vcpu_dirty) {
-        hvf_get_registers(cpu);
-        cpu->vcpu_dirty = true;
-    }
-}
-
-void hvf_cpu_synchronize_state(CPUState *cpu)
-{
-    if (!cpu->vcpu_dirty) {
-        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
-    }
-}
-
-static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
-                                              run_on_cpu_data arg)
-{
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
-}
-
-void hvf_cpu_synchronize_post_reset(CPUState *cpu)
-{
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
-                                             run_on_cpu_data arg)
-{
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
-}
-
-void hvf_cpu_synchronize_post_init(CPUState *cpu)
-{
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
-                                              run_on_cpu_data arg)
-{
-    cpu->vcpu_dirty = true;
-}
-
-void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
-{
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
-}
-
 static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
 {
     int read, write;
@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
     return false;
 }
 
-static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
-{
-    hvf_slot *slot;
-
-    slot = hvf_find_overlap_slot(
-            section->offset_within_address_space,
-            int128_get64(section->size));
-
-    /* protect region against writes; begin tracking it */
-    if (on) {
-        slot->flags |= HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
-                      HV_MEMORY_READ);
-    /* stop tracking region*/
-    } else {
-        slot->flags &= ~HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
-                      HV_MEMORY_READ | HV_MEMORY_WRITE);
-    }
-}
-
-static void hvf_log_start(MemoryListener *listener,
-                          MemoryRegionSection *section, int old, int new)
-{
-    if (old != 0) {
-        return;
-    }
-
-    hvf_set_dirty_tracking(section, 1);
-}
-
-static void hvf_log_stop(MemoryListener *listener,
-                         MemoryRegionSection *section, int old, int new)
-{
-    if (new != 0) {
-        return;
-    }
-
-    hvf_set_dirty_tracking(section, 0);
-}
-
-static void hvf_log_sync(MemoryListener *listener,
-                         MemoryRegionSection *section)
-{
-    /*
-     * sync of dirty pages is handled elsewhere; just make sure we keep
-     * tracking the region.
-     */
-    hvf_set_dirty_tracking(section, 1);
-}
-
-static void hvf_region_add(MemoryListener *listener,
-                           MemoryRegionSection *section)
-{
-    hvf_set_phys_mem(section, true);
-}
-
-static void hvf_region_del(MemoryListener *listener,
-                           MemoryRegionSection *section)
-{
-    hvf_set_phys_mem(section, false);
-}
-
-static MemoryListener hvf_memory_listener = {
-    .priority = 10,
-    .region_add = hvf_region_add,
-    .region_del = hvf_region_del,
-    .log_start = hvf_log_start,
-    .log_stop = hvf_log_stop,
-    .log_sync = hvf_log_sync,
-};
-
 void hvf_vcpu_destroy(CPUState *cpu)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
@@ -XXX,XX +XXX,XX @@ void hvf_vcpu_destroy(CPUState *cpu)
     assert_hvf_ok(ret);
 }
 
-static void dummy_signal(int sig)
-{
-}
-
 static void init_tsc_freq(CPUX86State *env)
 {
     size_t length;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
 
     return ret;
 }
-
-bool hvf_allowed;
-
-static int hvf_accel_init(MachineState *ms)
-{
-    int x;
-    hv_return_t ret;
-    HVFState *s;
-
-    ret = hv_vm_create(HV_VM_DEFAULT);
-    assert_hvf_ok(ret);
-
-    s = g_new0(HVFState, 1);
- 
-    s->num_slots = 32;
-    for (x = 0; x < s->num_slots; ++x) {
-        s->slots[x].size = 0;
-        s->slots[x].slot_id = x;
-    }
-  
-    hvf_state = s;
-    memory_listener_register(&hvf_memory_listener, &address_space_memory);
-    return 0;
-}
-
-static void hvf_accel_class_init(ObjectClass *oc, void *data)
-{
-    AccelClass *ac = ACCEL_CLASS(oc);
-    ac->name = "HVF";
-    ac->init_machine = hvf_accel_init;
-    ac->allowed = &hvf_allowed;
-}
-
-static const TypeInfo hvf_accel_type = {
-    .name = TYPE_HVF_ACCEL,
-    .parent = TYPE_ACCEL,
-    .class_init = hvf_accel_class_init,
-};
-
-static void hvf_type_init(void)
-{
-    type_register_static(&hvf_accel_type);
-}
-
-type_init(hvf_type_init);
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch moves a few internal struct and constant defines over.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-5-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h   | 30 ++++++++++++++++++++++++++++++
 target/i386/hvf/hvf-i386.h | 31 +------------------------------
 2 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@
 
 #include <Hypervisor/hv.h>
 
+/* hvf_slot flags */
+#define HVF_SLOT_LOG (1 << 0)
+
+typedef struct hvf_slot {
+    uint64_t start;
+    uint64_t size;
+    uint8_t *mem;
+    int slot_id;
+    uint32_t flags;
+    MemoryRegion *region;
+} hvf_slot;
+
+typedef struct hvf_vcpu_caps {
+    uint64_t vmx_cap_pinbased;
+    uint64_t vmx_cap_procbased;
+    uint64_t vmx_cap_procbased2;
+    uint64_t vmx_cap_entry;
+    uint64_t vmx_cap_exit;
+    uint64_t vmx_cap_preemption_timer;
+} hvf_vcpu_caps;
+
+struct HVFState {
+    AccelState parent;
+    hvf_slot slots[32];
+    int num_slots;
+
+    hvf_vcpu_caps *hvf_caps;
+};
+extern HVFState *hvf_state;
+
 void hvf_set_phys_mem(MemoryRegionSection *, bool);
 void assert_hvf_ok(hv_return_t ret);
 hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf-i386.h
+++ b/target/i386/hvf/hvf-i386.h
@@ -XXX,XX +XXX,XX @@
 
 #include "qemu/accel.h"
 #include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 #include "cpu.h"
 #include "x86.h"
 
-/* hvf_slot flags */
-#define HVF_SLOT_LOG (1 << 0)
-
-typedef struct hvf_slot {
-    uint64_t start;
-    uint64_t size;
-    uint8_t *mem;
-    int slot_id;
-    uint32_t flags;
-    MemoryRegion *region;
-} hvf_slot;
-
-typedef struct hvf_vcpu_caps {
-    uint64_t vmx_cap_pinbased;
-    uint64_t vmx_cap_procbased;
-    uint64_t vmx_cap_procbased2;
-    uint64_t vmx_cap_entry;
-    uint64_t vmx_cap_exit;
-    uint64_t vmx_cap_preemption_timer;
-} hvf_vcpu_caps;
-
-struct HVFState {
-    AccelState parent;
-    hvf_slot slots[32];
-    int num_slots;
-
-    hvf_vcpu_caps *hvf_caps;
-};
-extern HVFState *hvf_state;
-
 void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
 
 #ifdef NEED_CPU_H
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

The hvf_set_phys_mem() function is only called within the same file.
Make it static.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-6-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h  | 1 -
 accel/hvf/hvf-accel-ops.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

From: Alexander Graf <agraf@csgraf.de>

The ARM version of Hypervisor.framework no longer defines the
hv_uvaddr_t and hv_gpaddr_t types, so let's just revert to standard ones.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-7-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
     macslot->present = 1;
     macslot->gpa_start = slot->start;
     macslot->size = slot->size;
-    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
+    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
     assert_hvf_ok(ret);
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
     /* protect region against writes; begin tracking it */
     if (on) {
         slot->flags |= HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
                       HV_MEMORY_READ);
     /* stop tracking region*/
     } else {
         slot->flags &= ~HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
                       HV_MEMORY_READ | HV_MEMORY_WRITE);
     }
 }
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch splits the vcpu init and destroy functions into a generic and
an architecture-specific portion. This also allows us to move the generic
functions into the generic hvf code, removing exported functions.
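
As a rough illustration of the split described above (a sketch only, not
the QEMU code), the generic layer can do the work common to all targets
and then hand off to a per-architecture hook. Every name below
(demo_vcpu, demo_init_vcpu, demo_arch_init_vcpu and so on) is a
hypothetical stand-in and does not exist in the tree.

```c
#include <stdlib.h>

/* Hypothetical stand-in for CPUState; not the QEMU type. */
struct demo_vcpu {
    int created;      /* managed by the generic layer */
    void *arch_buf;   /* owned by the architecture-specific layer */
};

/* Architecture-specific hooks: each target would provide its own pair. */
static int demo_arch_init_vcpu(struct demo_vcpu *cpu)
{
    cpu->arch_buf = malloc(64);
    return cpu->arch_buf ? 0 : -1;
}

static void demo_arch_vcpu_destroy(struct demo_vcpu *cpu)
{
    free(cpu->arch_buf);
    cpu->arch_buf = NULL;
}

/* Generic entry points: do the common work, then call the arch hook. */
static int demo_init_vcpu(struct demo_vcpu *cpu)
{
    cpu->created = 1;
    return demo_arch_init_vcpu(cpu);
}

static void demo_vcpu_destroy(struct demo_vcpu *cpu)
{
    cpu->created = 0;
    demo_arch_vcpu_destroy(cpu);
}
```

Because only the generic pair is called from the accelerator code, those
two functions can be file-local, and only the arch hooks need a
declaration in a shared header.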

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-8-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h |  2 --
 include/sysemu/hvf_int.h  |  2 ++
 accel/hvf/hvf-accel-ops.c | 30 ++++++++++++++++++++++++++++++
 target/i386/hvf/hvf.c     | 23 ++---------------------
 4 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.h
+++ b/accel/hvf/hvf-accel-ops.h
@@ -XXX,XX +XXX,XX @@
 
 #include "sysemu/cpus.h"
 
-int hvf_init_vcpu(CPUState *);
 int hvf_vcpu_exec(CPUState *);
 void hvf_cpu_synchronize_state(CPUState *);
 void hvf_cpu_synchronize_post_reset(CPUState *);
 void hvf_cpu_synchronize_post_init(CPUState *);
 void hvf_cpu_synchronize_pre_loadvm(CPUState *);
-void hvf_vcpu_destroy(CPUState *);
 
 #endif /* HVF_CPUS_H */
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
 extern HVFState *hvf_state;
 
 void assert_hvf_ok(hv_return_t ret);
+int hvf_arch_init_vcpu(CPUState *cpu);
+void hvf_arch_vcpu_destroy(CPUState *cpu);
 hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
 int hvf_put_registers(CPUState *);
 int hvf_get_registers(CPUState *);
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static void hvf_type_init(void)
 
 type_init(hvf_type_init);
 
+static void hvf_vcpu_destroy(CPUState *cpu)
+{
+    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
+    assert_hvf_ok(ret);
+
+    hvf_arch_vcpu_destroy(cpu);
+}
+
+static int hvf_init_vcpu(CPUState *cpu)
+{
+    int r;
+
+    /* init cpu signals */
+    sigset_t set;
+    struct sigaction sigact;
+
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = dummy_signal;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    pthread_sigmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+
+    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
+    cpu->vcpu_dirty = 1;
+    assert_hvf_ok(r);
+
+    return hvf_arch_init_vcpu(cpu);
+}
+
 /*
  * The HVF-specific vCPU thread function. This one should only run when the host
  * CPU supports the VMX "unrestricted guest" feature.
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
     return false;
 }
 
-void hvf_vcpu_destroy(CPUState *cpu)
+void hvf_arch_vcpu_destroy(CPUState *cpu)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
     CPUX86State *env = &x86_cpu->env;
 
-    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
     g_free(env->hvf_mmio_buf);
-    assert_hvf_ok(ret);
 }
 
 static void init_tsc_freq(CPUX86State *env)
@@ -XXX,XX +XXX,XX @@ static inline bool apic_bus_freq_is_known(CPUX86State *env)
     return env->apic_bus_freq != 0;
 }
 
-int hvf_init_vcpu(CPUState *cpu)
+int hvf_arch_init_vcpu(CPUState *cpu)
 {
-
     X86CPU *x86cpu = X86_CPU(cpu);
     CPUX86State *env = &x86cpu->env;
-    int r;
-
-    /* init cpu signals */
-    sigset_t set;
-    struct sigaction sigact;
-
-    memset(&sigact, 0, sizeof(sigact));
-    sigact.sa_handler = dummy_signal;
-    sigaction(SIG_IPI, &sigact, NULL);
-
-    pthread_sigmask(SIG_BLOCK, NULL, &set);
-    sigdelset(&set, SIG_IPI);
 
     init_emu();
     init_decoder();
@@ -XXX,XX +XXX,XX @@ int hvf_init_vcpu(CPUState *cpu)
         }
     }
 
-    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
-    cpu->vcpu_dirty = 1;
-    assert_hvf_ok(r);
-
     if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
         &hvf_state->hvf_caps->vmx_cap_pinbased)) {
         abort();
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

There is no reason to call the hvf-specific hvf_cpu_synchronize_state()
when we can just use the generic cpu_synchronize_state() instead. This
reduces our dependency on internal function definitions and
allows us to make hvf_cpu_synchronize_state() static.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-9-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h | 1 -
 accel/hvf/hvf-accel-ops.c | 2 +-
 target/i386/hvf/x86hvf.c  | 9 ++++-----
 3 files changed, 5 insertions(+), 7 deletions(-)

From: Alexander Graf <agraf@csgraf.de>

The hvf accel synchronize functions are only used as input for local
callback functions, so we can make them static.
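
For readers unfamiliar with the pattern, here is a minimal sketch of why
functions that are only referenced from a callback table can be static.
The struct and function names (demo_ops, demo_synchronize_state, ...) are
made up for illustration and are not the QEMU definitions.

```c
/* Hypothetical ops table, loosely modelled on an accelerator callback struct. */
struct demo_ops {
    void (*synchronize_state)(int *state);
    void (*synchronize_post_reset)(int *state);
};

/* File-local: only reachable through the table below. */
static void demo_synchronize_state(int *state)
{
    *state = 1;
}

static void demo_synchronize_post_reset(int *state)
{
    *state = 0;
}

/* The table is the only symbol other files need to see. */
const struct demo_ops demo_ops_table = {
    .synchronize_state = demo_synchronize_state,
    .synchronize_post_reset = demo_synchronize_post_reset,
};
```

Callers go through the function pointers, so the functions themselves
need no prototype in a header.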

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-10-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h | 3 ---
 accel/hvf/hvf-accel-ops.c | 6 +++---
 2 files changed, 3 insertions(+), 6 deletions(-)

From: Alexander Graf <agraf@csgraf.de>

We can move the declaration of hvf_vcpu_exec() into our internal
hvf header, which makes hvf-accel-ops.h unnecessary.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-11-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h | 17 -----------------
 include/sysemu/hvf_int.h  |  1 +
 accel/hvf/hvf-accel-ops.c |  2 --
 target/i386/hvf/hvf.c     |  2 --
 4 files changed, 1 insertion(+), 21 deletions(-)
 delete mode 100644 accel/hvf/hvf-accel-ops.h

diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
deleted file mode 100644
index XXXXXXX..XXXXXXX
--- a/accel/hvf/hvf-accel-ops.h
+++ /dev/null
@@ -XXX,XX +XXX,XX @@
-/*
- * Accelerator CPUS Interface
- *
- * Copyright 2020 SUSE LLC
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- */
-
-#ifndef HVF_CPUS_H
-#define HVF_CPUS_H
-
-#include "sysemu/cpus.h"
-
-int hvf_vcpu_exec(CPUState *);
-
-#endif /* HVF_CPUS_H */
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ extern HVFState *hvf_state;
 void assert_hvf_ok(hv_return_t ret);
 int hvf_arch_init_vcpu(CPUState *cpu);
 void hvf_arch_vcpu_destroy(CPUState *cpu);
+int hvf_vcpu_exec(CPUState *);
 hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
 int hvf_put_registers(CPUState *);
 int hvf_get_registers(CPUState *);
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/runstate.h"
 #include "qemu/guest-random.h"
 
-#include "hvf-accel-ops.h"
-
 HVFState *hvf_state;
 
 /* Memory slots */
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/accel.h"
 #include "target/i386/cpu.h"
 
-#include "hvf-accel-ops.h"
-
 void vmx_update_tpr(CPUState *cpu)
 {
     /* TODO: need integrate APIC handling */
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

We will need more than a single field for hvf going forward. To keep
the global vcpu struct uncluttered, let's allocate a special hvf vcpu
struct, similar to how hax does it.
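
The idea can be sketched roughly as follows: the global vcpu struct keeps
a single opaque pointer, and the accelerator allocates its own state
behind it. This is an illustrative sketch under assumed names
(demo_cpu, demo_accel_vcpu_state, demo_cpu_attach, demo_cpu_detach); it
uses plain calloc()/free() where the patch itself uses g_malloc0() and
g_free().

```c
#include <stdlib.h>

/* Hypothetical accelerator-private per-vCPU state; can grow over time. */
struct demo_accel_vcpu_state {
    int fd;
};

/* Hypothetical stand-in for the global vCPU struct. */
struct demo_cpu {
    struct demo_accel_vcpu_state *accel;  /* NULL until the vCPU is set up */
};

/* Allocate zeroed accelerator state and record the vCPU handle. */
static int demo_cpu_attach(struct demo_cpu *cpu, int fd)
{
    cpu->accel = calloc(1, sizeof(*cpu->accel));
    if (!cpu->accel) {
        return -1;
    }
    cpu->accel->fd = fd;
    return 0;
}

/* Release the state and clear the pointer so stale use is easy to spot. */
static void demo_cpu_detach(struct demo_cpu *cpu)
{
    free(cpu->accel);
    cpu->accel = NULL;
}
```

Adding a field later then only touches the accelerator's own struct, not
the global one.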

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-12-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/core/cpu.h       |   3 +-
 include/sysemu/hvf_int.h    |   4 +
 target/i386/hvf/vmx.h       |  24 +++--
 accel/hvf/hvf-accel-ops.c   |   8 +-
 target/i386/hvf/hvf.c       | 104 +++++++++---------
 target/i386/hvf/x86.c       |  28 ++---
 target/i386/hvf/x86_descr.c |  26 ++---
 target/i386/hvf/x86_emu.c   |  62 +++++------
 target/i386/hvf/x86_mmu.c   |   4 +-
 target/i386/hvf/x86_task.c  |  12 +--
 target/i386/hvf/x86hvf.c    | 210 ++++++++++++++++++------------------
 11 files changed, 248 insertions(+), 237 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -XXX,XX +XXX,XX @@ struct KVMState;
 struct kvm_run;
 
 struct hax_vcpu_state;
+struct hvf_vcpu_state;
 
 #define TB_JMP_CACHE_BITS 12
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
@@ -XXX,XX +XXX,XX @@ struct CPUState {
 
     struct hax_vcpu_state *hax_vcpu;
 
-    int hvf_fd;
+    struct hvf_vcpu_state *hvf;
 
     /* track IOMMUs whose translations we've cached in the TCG TLB */
     GArray *iommu_notifiers;
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
 };
 extern HVFState *hvf_state;
 
+struct hvf_vcpu_state {
+    int fd;
+};
+
 void assert_hvf_ok(hv_return_t ret);
 int hvf_arch_init_vcpu(CPUState *cpu);
 void hvf_arch_vcpu_destroy(CPUState *cpu);
diff --git a/target/i386/hvf/vmx.h b/target/i386/hvf/vmx.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/vmx.h
+++ b/target/i386/hvf/vmx.h
@@ -XXX,XX +XXX,XX @@
 #include "vmcs.h"
 #include "cpu.h"
 #include "x86.h"
+#include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 
 #include "exec/address-spaces.h"
 
@@ -XXX,XX +XXX,XX @@ static inline void macvm_set_rip(CPUState *cpu, uint64_t rip)
     uint64_t val;
 
     /* BUG, should take considering overlap.. */
-    wreg(cpu->hvf_fd, HV_X86_RIP, rip);
+    wreg(cpu->hvf->fd, HV_X86_RIP, rip);
     env->eip = rip;
 
     /* after moving forward in rip, we need to clean INTERRUPTABILITY */
-   val = rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
+   val = rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
    if (val & (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
                VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
         env->hflags &= ~HF_INHIBIT_IRQ_MASK;
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY,
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY,
                val & ~(VMCS_INTERRUPTIBILITY_STI_BLOCKING |
                VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING));
    }
@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_blocking(CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     env->hflags2 &= ~HF2_NMI_MASK;
-    uint32_t gi = (uint32_t) rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
+    uint32_t gi = (uint32_t) rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
     gi &= ~VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
-    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
+    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
 }
 
 static inline void vmx_set_nmi_blocking(CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ static inline void vmx_set_nmi_blocking(CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     env->hflags2 |= HF2_NMI_MASK;
-    uint32_t gi = (uint32_t)rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
+    uint32_t gi = (uint32_t)rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
     gi |= VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
-    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
+    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
 }
 
 static inline void vmx_set_nmi_window_exiting(CPUState *cpu)
 {
     uint64_t val;
-    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
+    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
           VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
 
 }
@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_window_exiting(CPUState *cpu)
 {
 
     uint64_t val;
-    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
+    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
           ~VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
 }
 
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ type_init(hvf_type_init);
 
 static void hvf_vcpu_destroy(CPUState *cpu)
 {
-    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
+    hv_return_t ret = hv_vcpu_destroy(cpu->hvf->fd);
     assert_hvf_ok(ret);
 
     hvf_arch_vcpu_destroy(cpu);
+    g_free(cpu->hvf);
+    cpu->hvf = NULL;
 }
 
 static int hvf_init_vcpu(CPUState *cpu)
 {
     int r;
 
+    cpu->hvf = g_malloc0(sizeof(*cpu->hvf));
+
     /* init cpu signals */
     sigset_t set;
     struct sigaction sigact;
@@ -XXX,XX +XXX,XX @@ static int hvf_init_vcpu(CPUState *cpu)
     pthread_sigmask(SIG_BLOCK, NULL, &set);
     sigdelset(&set, SIG_IPI);
 
-    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
+    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf->fd, HV_VCPU_DEFAULT);
     cpu->vcpu_dirty = 1;
     assert_hvf_ok(r);
 
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
     int tpr = cpu_get_apic_tpr(x86_cpu->apic_state) << 4;
     int irr = apic_get_highest_priority_irr(x86_cpu->apic_state);
 
-    wreg(cpu->hvf_fd, HV_X86_TPR, tpr);
+    wreg(cpu->hvf->fd, HV_X86_TPR, tpr);
     if (irr == -1) {
-        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
+        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
     } else {
-        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
+        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
               irr >> 4);
     }
 }
@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
 static void update_apic_tpr(CPUState *cpu)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
-    int tpr = rreg(cpu->hvf_fd, HV_X86_TPR) >> 4;
+    int tpr = rreg(cpu->hvf->fd, HV_X86_TPR) >> 4;
     cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
 }
 
@@ -XXX,XX +XXX,XX @@ int hvf_arch_init_vcpu(CPUState *cpu)
     }
 
     /* set VMCS control fields */
-    wvmcs(cpu->hvf_fd, VMCS_PIN_BASED_CTLS,
+    wvmcs(cpu->hvf->fd, VMCS_PIN_BASED_CTLS,
           cap2ctrl(hvf_state->hvf_caps->vmx_cap_pinbased,
           VMCS_PIN_BASED_CTLS_EXTINT |
           VMCS_PIN_BASED_CTLS_NMI |
           VMCS_PIN_BASED_CTLS_VNMI));
-    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS,
+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS,
           cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased,
           VMCS_PRI_PROC_BASED_CTLS_HLT |
           VMCS_PRI_PROC_BASED_CTLS_MWAIT |
           VMCS_PRI_PROC_BASED_CTLS_TSC_OFFSET |
           VMCS_PRI_PROC_BASED_CTLS_TPR_SHADOW) |
           VMCS_PRI_PROC_BASED_CTLS_SEC_CONTROL);
-    wvmcs(cpu->hvf_fd, VMCS_SEC_PROC_BASED_CTLS,
+    wvmcs(cpu->hvf->fd, VMCS_SEC_PROC_BASED_CTLS,
           cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased2,
                    VMCS_PRI_PROC_BASED2_CTLS_APIC_ACCESSES));
 
-    wvmcs(cpu->hvf_fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
+    wvmcs(cpu->hvf->fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
           0));
-    wvmcs(cpu->hvf_fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
+    wvmcs(cpu->hvf->fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
 
-    wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
+    wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
 
     x86cpu = X86_CPU(cpu);
     x86cpu->env.xsave_buf = qemu_memalign(4096, 4096);
 
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_STAR, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_LSTAR, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_CSTAR, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FMASK, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FSBASE, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_GSBASE, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_KERNELGSBASE, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_TSC_AUX, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_TSC, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_CS, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_EIP, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_ESP, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_STAR, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_LSTAR, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_CSTAR, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FMASK, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FSBASE, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_GSBASE, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_KERNELGSBASE, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_TSC_AUX, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_TSC, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_CS, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_EIP, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_ESP, 1);
 
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ static void hvf_store_events(CPUState *cpu, uint32_t ins_len, uint64_t idtvec_in
         }
         if (idtvec_info & VMCS_IDT_VEC_ERRCODE_VALID) {
             env->has_error_code = true;
-            env->error_code = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_ERROR);
+            env->error_code = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_ERROR);
         }
     }
-    if ((rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
+    if ((rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
         VMCS_INTERRUPTIBILITY_NMI_BLOCKING)) {
         env->hflags2 |= HF2_NMI_MASK;
     } else {
         env->hflags2 &= ~HF2_NMI_MASK;
     }
-    if (rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
+    if (rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
          (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
          VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
         env->hflags |= HF_INHIBIT_IRQ_MASK;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             return EXCP_HLT;
         }
 
-        hv_return_t r  = hv_vcpu_run(cpu->hvf_fd);
+        hv_return_t r  = hv_vcpu_run(cpu->hvf->fd);
         assert_hvf_ok(r);
 
         /* handle VMEXIT */
-        uint64_t exit_reason = rvmcs(cpu->hvf_fd, VMCS_EXIT_REASON);
-        uint64_t exit_qual = rvmcs(cpu->hvf_fd, VMCS_EXIT_QUALIFICATION);
-        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf_fd,
+        uint64_t exit_reason = rvmcs(cpu->hvf->fd, VMCS_EXIT_REASON);
+        uint64_t exit_qual = rvmcs(cpu->hvf->fd, VMCS_EXIT_QUALIFICATION);
+        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf->fd,
                                            VMCS_EXIT_INSTRUCTION_LENGTH);
 
-        uint64_t idtvec_info = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
+        uint64_t idtvec_info = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
 
         hvf_store_events(cpu, ins_len, idtvec_info);
-        rip = rreg(cpu->hvf_fd, HV_X86_RIP);
-        env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
+        rip = rreg(cpu->hvf->fd, HV_X86_RIP);
+        env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
 
         qemu_mutex_lock_iothread();
 
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
         case EXIT_REASON_EPT_FAULT:
         {
             hvf_slot *slot;
-            uint64_t gpa = rvmcs(cpu->hvf_fd, VMCS_GUEST_PHYSICAL_ADDRESS);
+            uint64_t gpa = rvmcs(cpu->hvf->fd, VMCS_GUEST_PHYSICAL_ADDRESS);
 
             if (((idtvec_info & VMCS_IDT_VEC_VALID) == 0) &&
                 ((exit_qual & EXIT_QUAL_NMIUDTI) != 0)) {
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
                 store_regs(cpu);
                 break;
             } else if (!string && !in) {
-                RAX(env) = rreg(cpu->hvf_fd, HV_X86_RAX);
+                RAX(env) = rreg(cpu->hvf->fd, HV_X86_RAX);
                 hvf_handle_io(env, port, &RAX(env), 1, size, 1);
                 macvm_set_rip(cpu, rip + ins_len);
                 break;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             break;
         }
         case EXIT_REASON_CPUID: {
-            uint32_t rax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
-            uint32_t rbx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RBX);
-            uint32_t rcx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
-            uint32_t rdx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
+            uint32_t rax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
+            uint32_t rbx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RBX);
+            uint32_t rcx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
+            uint32_t rdx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
 
             if (rax == 1) {
                 /* CPUID1.ecx.OSXSAVE needs to know CR4 */
-                env->cr[4] = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
+                env->cr[4] = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
             }
             hvf_cpu_x86_cpuid(env, rax, rcx, &rax, &rbx, &rcx, &rdx);
 
-            wreg(cpu->hvf_fd, HV_X86_RAX, rax);
-            wreg(cpu->hvf_fd, HV_X86_RBX, rbx);
-            wreg(cpu->hvf_fd, HV_X86_RCX, rcx);
-            wreg(cpu->hvf_fd, HV_X86_RDX, rdx);
+            wreg(cpu->hvf->fd, HV_X86_RAX, rax);
+            wreg(cpu->hvf->fd, HV_X86_RBX, rbx);
+            wreg(cpu->hvf->fd, HV_X86_RCX, rcx);
+            wreg(cpu->hvf->fd, HV_X86_RDX, rdx);
 
             macvm_set_rip(cpu, rip + ins_len);
             break;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
         case EXIT_REASON_XSETBV: {
             X86CPU *x86_cpu = X86_CPU(cpu);
             CPUX86State *env = &x86_cpu->env;
-            uint32_t eax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
-            uint32_t ecx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
-            uint32_t edx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
+            uint32_t eax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
+            uint32_t ecx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
+            uint32_t edx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
 
             if (ecx) {
                 macvm_set_rip(cpu, rip + ins_len);
                 break;
             }
             env->xcr0 = ((uint64_t)edx << 32) | eax;
-            wreg(cpu->hvf_fd, HV_X86_XCR0, env->xcr0 | 1);
+            wreg(cpu->hvf->fd, HV_X86_XCR0, env->xcr0 | 1);
             macvm_set_rip(cpu, rip + ins_len);
             break;
         }
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
 
             switch (cr) {
             case 0x0: {
-                macvm_set_cr0(cpu->hvf_fd, RRX(env, reg));
+                macvm_set_cr0(cpu->hvf->fd, RRX(env, reg));
                 break;
             }
             case 4: {
-                macvm_set_cr4(cpu->hvf_fd, RRX(env, reg));
+                macvm_set_cr4(cpu->hvf->fd, RRX(env, reg));
                 break;
             }
             case 8: {
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             break;
         }
         case EXIT_REASON_TASK_SWITCH: {
-            uint64_t vinfo = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
+            uint64_t vinfo = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
             x68_segment_selector sel = {.sel = exit_qual & 0xffff};
             vmx_handle_task_switch(cpu, sel, (exit_qual >> 30) & 0x3,
              vinfo & VMCS_INTR_VALID, vinfo & VECTORING_INFO_VECTOR_MASK, vinfo
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             break;
         }
         case EXIT_REASON_RDPMC:
-            wreg(cpu->hvf_fd, HV_X86_RAX, 0);
-            wreg(cpu->hvf_fd, HV_X86_RDX, 0);
+            wreg(cpu->hvf->fd, HV_X86_RAX, 0);
+            wreg(cpu->hvf->fd, HV_X86_RDX, 0);
             macvm_set_rip(cpu, rip + ins_len);
             break;
         case VMX_REASON_VMCALL:
diff --git a/target/i386/hvf/x86.c b/target/i386/hvf/x86.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86.c
+++ b/target/i386/hvf/x86.c
@@ -XXX,XX +XXX,XX @@ bool x86_read_segment_descriptor(struct CPUState *cpu,
     }
 
     if (GDT_SEL == sel.ti) {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
     } else {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
     }
 
     if (sel.index * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
     uint32_t limit;
     
     if (GDT_SEL == sel.ti) {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
     } else {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
     }
     
     if (sel.index * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
 bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
                         int gate)
 {
-    target_ulong base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_BASE);
-    uint32_t limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
+    target_ulong base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_BASE);
+    uint32_t limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
 
     memset(idt_desc, 0, sizeof(*idt_desc));
     if (gate * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
 
 bool x86_is_protected(struct CPUState *cpu)
 {
-    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
+    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
     return cr0 & CR0_PE;
 }
 
@@ -XXX,XX +XXX,XX @@ bool x86_is_v8086(struct CPUState *cpu)
 
 bool x86_is_long_mode(struct CPUState *cpu)
 {
-    return rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
+    return rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
 }
 
 bool x86_is_long64_mode(struct CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ bool x86_is_long64_mode(struct CPUState *cpu)
 
 bool x86_is_paging_mode(struct CPUState *cpu)
 {
-    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
+    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
     return cr0 & CR0_PG;
 }
 
 bool x86_is_pae_enabled(struct CPUState *cpu)
 {
-    uint64_t cr4 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
+    uint64_t cr4 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
     return cr4 & CR4_PAE;
 }
 
diff --git a/target/i386/hvf/x86_descr.c b/target/i386/hvf/x86_descr.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_descr.c
+++ b/target/i386/hvf/x86_descr.c
@@ -XXX,XX +XXX,XX @@ static const struct vmx_segment_field {
 
 uint32_t vmx_read_segment_limit(CPUState *cpu, X86Seg seg)
 {
-    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
+    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
 }
 
 uint32_t vmx_read_segment_ar(CPUState *cpu, X86Seg seg)
 {
-    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
+    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
 }
 
 uint64_t vmx_read_segment_base(CPUState *cpu, X86Seg seg)
 {
-    return rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
+    return rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
 }
 
 x68_segment_selector vmx_read_segment_selector(CPUState *cpu, X86Seg seg)
 {
     x68_segment_selector sel;
-    sel.sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
+    sel.sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
     return sel;
 }
 
 void vmx_write_segment_selector(struct CPUState *cpu, x68_segment_selector selector, X86Seg seg)
 {
-    wvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector, selector.sel);
+    wvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector, selector.sel);
 }
 
 void vmx_read_segment_descriptor(struct CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
 {
-    desc->sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
-    desc->base = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
-    desc->limit = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
-    desc->ar = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
+    desc->sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
+    desc->base = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
+    desc->limit = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
+    desc->ar = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
 }
 
 void vmx_write_segment_descriptor(CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
 {
     const struct vmx_segment_field *sf = &vmx_segment_fields[seg];
 
-    wvmcs(cpu->hvf_fd, sf->base, desc->base);
-    wvmcs(cpu->hvf_fd, sf->limit, desc->limit);
-    wvmcs(cpu->hvf_fd, sf->selector, desc->sel);
-    wvmcs(cpu->hvf_fd, sf->ar_bytes, desc->ar);
+    wvmcs(cpu->hvf->fd, sf->base, desc->base);
+    wvmcs(cpu->hvf->fd, sf->limit, desc->limit);
+    wvmcs(cpu->hvf->fd, sf->selector, desc->sel);
+    wvmcs(cpu->hvf->fd, sf->ar_bytes, desc->ar);
 }
 
 void x86_segment_descriptor_to_vmx(struct CPUState *cpu, x68_segment_selector selector, struct x86_segment_descriptor *desc, struct vmx_segment *vmx_desc)
diff --git a/target/i386/hvf/x86_emu.c b/target/i386/hvf/x86_emu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_emu.c
+++ b/target/i386/hvf/x86_emu.c
@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
 
     switch (msr) {
     case MSR_IA32_TSC:
-        val = rdtscp() + rvmcs(cpu->hvf_fd, VMCS_TSC_OFFSET);
+        val = rdtscp() + rvmcs(cpu->hvf->fd, VMCS_TSC_OFFSET);
         break;
     case MSR_IA32_APICBASE:
         val = cpu_get_apic_base(X86_CPU(cpu)->apic_state);
@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
         val = x86_cpu->ucode_rev;
         break;
     case MSR_EFER:
-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER);
+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER);
         break;
     case MSR_FSBASE:
-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE);
+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE);
         break;
     case MSR_GSBASE:
-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE);
+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE);
         break;
     case MSR_KERNELGSBASE:
-        val = rvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE);
+        val = rvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE);
         break;
     case MSR_STAR:
         abort();
@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
         cpu_set_apic_base(X86_CPU(cpu)->apic_state, data);
         break;
     case MSR_FSBASE:
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE, data);
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE, data);
         break;
     case MSR_GSBASE:
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE, data);
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE, data);
         break;
     case MSR_KERNELGSBASE:
-        wvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE, data);
+        wvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE, data);
         break;
     case MSR_STAR:
         abort();
@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
         break;
     case MSR_EFER:
         /*printf("new efer %llx\n", EFER(cpu));*/
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER, data);
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER, data);
         if (data & MSR_EFER_NXE) {
-            hv_vcpu_invalidate_tlb(cpu->hvf_fd);
+            hv_vcpu_invalidate_tlb(cpu->hvf->fd);
         }
         break;
     case MSR_MTRRphysBase(0):
@@ -XXX,XX +XXX,XX @@ void load_regs(struct CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     int i = 0;
-    RRX(env, R_EAX) = rreg(cpu->hvf_fd, HV_X86_RAX);
-    RRX(env, R_EBX) = rreg(cpu->hvf_fd, HV_X86_RBX);
-    RRX(env, R_ECX) = rreg(cpu->hvf_fd, HV_X86_RCX);
-    RRX(env, R_EDX) = rreg(cpu->hvf_fd, HV_X86_RDX);
-    RRX(env, R_ESI) = rreg(cpu->hvf_fd, HV_X86_RSI);
-    RRX(env, R_EDI) = rreg(cpu->hvf_fd, HV_X86_RDI);
-    RRX(env, R_ESP) = rreg(cpu->hvf_fd, HV_X86_RSP);
-    RRX(env, R_EBP) = rreg(cpu->hvf_fd, HV_X86_RBP);
+    RRX(env, R_EAX) = rreg(cpu->hvf->fd, HV_X86_RAX);
+    RRX(env, R_EBX) = rreg(cpu->hvf->fd, HV_X86_RBX);
+    RRX(env, R_ECX) = rreg(cpu->hvf->fd, HV_X86_RCX);
+    RRX(env, R_EDX) = rreg(cpu->hvf->fd, HV_X86_RDX);
+    RRX(env, R_ESI) = rreg(cpu->hvf->fd, HV_X86_RSI);
+    RRX(env, R_EDI) = rreg(cpu->hvf->fd, HV_X86_RDI);
+    RRX(env, R_ESP) = rreg(cpu->hvf->fd, HV_X86_RSP);
+    RRX(env, R_EBP) = rreg(cpu->hvf->fd, HV_X86_RBP);
     for (i = 8; i < 16; i++) {
-        RRX(env, i) = rreg(cpu->hvf_fd, HV_X86_RAX + i);
+        RRX(env, i) = rreg(cpu->hvf->fd, HV_X86_RAX + i);
     }
 
-    env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
+    env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
     rflags_to_lflags(env);
-    env->eip = rreg(cpu->hvf_fd, HV_X86_RIP);
+    env->eip = rreg(cpu->hvf->fd, HV_X86_RIP);
 }
 
 void store_regs(struct CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ void store_regs(struct CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     int i = 0;
-    wreg(cpu->hvf_fd, HV_X86_RAX, RAX(env));
-    wreg(cpu->hvf_fd, HV_X86_RBX, RBX(env));
-    wreg(cpu->hvf_fd, HV_X86_RCX, RCX(env));
-    wreg(cpu->hvf_fd, HV_X86_RDX, RDX(env));
-    wreg(cpu->hvf_fd, HV_X86_RSI, RSI(env));
-    wreg(cpu->hvf_fd, HV_X86_RDI, RDI(env));
-    wreg(cpu->hvf_fd, HV_X86_RBP, RBP(env));
-    wreg(cpu->hvf_fd, HV_X86_RSP, RSP(env));
+    wreg(cpu->hvf->fd, HV_X86_RAX, RAX(env));
+    wreg(cpu->hvf->fd, HV_X86_RBX, RBX(env));
+    wreg(cpu->hvf->fd, HV_X86_RCX, RCX(env));
+    wreg(cpu->hvf->fd, HV_X86_RDX, RDX(env));
+    wreg(cpu->hvf->fd, HV_X86_RSI, RSI(env));
+    wreg(cpu->hvf->fd, HV_X86_RDI, RDI(env));
+    wreg(cpu->hvf->fd, HV_X86_RBP, RBP(env));
+    wreg(cpu->hvf->fd, HV_X86_RSP, RSP(env));
     for (i = 8; i < 16; i++) {
-        wreg(cpu->hvf_fd, HV_X86_RAX + i, RRX(env, i));
+        wreg(cpu->hvf->fd, HV_X86_RAX + i, RRX(env, i));
     }
 
     lflags_to_rflags(env);
-    wreg(cpu->hvf_fd, HV_X86_RFLAGS, env->eflags);
+    wreg(cpu->hvf->fd, HV_X86_RFLAGS, env->eflags);
     macvm_set_rip(cpu, env->eip);
 }
 
diff --git a/target/i386/hvf/x86_mmu.c b/target/i386/hvf/x86_mmu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_mmu.c
+++ b/target/i386/hvf/x86_mmu.c
@@ -XXX,XX +XXX,XX @@ static bool test_pt_entry(struct CPUState *cpu, struct gpt_translation *pt,
         pt->err_code |= MMU_PAGE_PT;
     }
 
-    uint32_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
+    uint32_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
     /* check protection */
     if (cr0 & CR0_WP) {
         if (pt->write_access && !pte_write_access(pte)) {
@@ -XXX,XX +XXX,XX @@ static bool walk_gpt(struct CPUState *cpu, target_ulong addr, int err_code,
 {
     int top_level, level;
     bool is_large = false;
-    target_ulong cr3 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR3);
+    target_ulong cr3 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR3);
     uint64_t page_mask = pae ? PAE_PTE_PAGE_MASK : LEGACY_PTE_PAGE_MASK;
     
     memset(pt, 0, sizeof(*pt));
diff --git a/target/i386/hvf/x86_task.c b/target/i386/hvf/x86_task.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_task.c
+++ b/target/i386/hvf/x86_task.c
@@ -XXX,XX +XXX,XX @@ static void load_state_from_tss32(CPUState *cpu, struct x86_tss_segment32 *tss)
     X86CPU *x86_cpu = X86_CPU(cpu);
     CPUX86State *env = &x86_cpu->env;
 
-    wvmcs(cpu->hvf_fd, VMCS_GUEST_CR3, tss->cr3);
+    wvmcs(cpu->hvf->fd, VMCS_GUEST_CR3, tss->cr3);
 
     env->eip = tss->eip;
     env->eflags = tss->eflags | 2;
@@ -XXX,XX +XXX,XX @@ static int task_switch_32(CPUState *cpu, x68_segment_selector tss_sel, x68_segme
 
 void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int reason, bool gate_valid, uint8_t gate, uint64_t gate_type)
 {
-    uint64_t rip = rreg(cpu->hvf_fd, HV_X86_RIP);
+    uint64_t rip = rreg(cpu->hvf->fd, HV_X86_RIP);
     if (!gate_valid || (gate_type != VMCS_INTR_T_HWEXCEPTION &&
                         gate_type != VMCS_INTR_T_HWINTR &&
                         gate_type != VMCS_INTR_T_NMI)) {
-        int ins_len = rvmcs(cpu->hvf_fd, VMCS_EXIT_INSTRUCTION_LENGTH);
+        int ins_len = rvmcs(cpu->hvf->fd, VMCS_EXIT_INSTRUCTION_LENGTH);
         macvm_set_rip(cpu, rip + ins_len);
         return;
     }
@@ -XXX,XX +XXX,XX @@ void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int rea
         //ret = task_switch_16(cpu, tss_sel, old_tss_sel, old_tss_base, &next_tss_desc);
         VM_PANIC("task_switch_16");
 
-    macvm_set_cr0(cpu->hvf_fd, rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0) | CR0_TS);
+    macvm_set_cr0(cpu->hvf->fd, rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0) | CR0_TS);
     x86_segment_descriptor_to_vmx(cpu, tss_sel, &next_tss_desc, &vmx_seg);
     vmx_write_segment_descriptor(cpu, &vmx_seg, R_TR);
 
     store_regs(cpu);
 
-    hv_vcpu_invalidate_tlb(cpu->hvf_fd);
-    hv_vcpu_flush(cpu->hvf_fd);
+    hv_vcpu_invalidate_tlb(cpu->hvf->fd);
+    hv_vcpu_flush(cpu->hvf->fd);
 }
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@ void hvf_put_xsave(CPUState *cpu_state)
 
     x86_cpu_xsave_all_areas(X86_CPU(cpu_state), xsave);
 
-    if (hv_vcpu_write_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
+    if (hv_vcpu_write_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
         abort();
     }
 }
@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
     CPUX86State *env = &X86_CPU(cpu_state)->env;
     struct vmx_segment seg;
     
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
 
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
 
-    /* wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR2, env->cr[2]); */
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3, env->cr[3]);
+    /* wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR2, env->cr[2]); */
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3, env->cr[3]);
     vmx_update_tpr(cpu_state);
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER, env->efer);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER, env->efer);
 
-    macvm_set_cr4(cpu_state->hvf_fd, env->cr[4]);
-    macvm_set_cr0(cpu_state->hvf_fd, env->cr[0]);
+    macvm_set_cr4(cpu_state->hvf->fd, env->cr[4]);
+    macvm_set_cr0(cpu_state->hvf->fd, env->cr[0]);
 
     hvf_set_segment(cpu_state, &seg, &env->segs[R_CS], false);
     vmx_write_segment_descriptor(cpu_state, &seg, R_CS);
@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
     hvf_set_segment(cpu_state, &seg, &env->ldt, false);
     vmx_write_segment_descriptor(cpu_state, &seg, R_LDTR);
     
-    hv_vcpu_flush(cpu_state->hvf_fd);
+    hv_vcpu_flush(cpu_state->hvf->fd);
 }
     
 void hvf_put_msrs(CPUState *cpu_state)
 {
     CPUX86State *env = &X86_CPU(cpu_state)->env;
 
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS,
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS,
                       env->sysenter_cs);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP,
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP,
                       env->sysenter_esp);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP,
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP,
                       env->sysenter_eip);
 
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_STAR, env->star);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_STAR, env->star);
 
 #ifdef TARGET_X86_64
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_CSTAR, env->cstar);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, env->kernelgsbase);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FMASK, env->fmask);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_LSTAR, env->lstar);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_CSTAR, env->cstar);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, env->kernelgsbase);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FMASK, env->fmask);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_LSTAR, env->lstar);
 #endif
 
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_GSBASE, env->segs[R_GS].base);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FSBASE, env->segs[R_FS].base);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_GSBASE, env->segs[R_GS].base);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FSBASE, env->segs[R_FS].base);
 }
 
 
@@ -XXX,XX +XXX,XX @@ void hvf_get_xsave(CPUState *cpu_state)
 
     xsave = X86_CPU(cpu_state)->env.xsave_buf;
 
-    if (hv_vcpu_read_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
+    if (hv_vcpu_read_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
         abort();
     }
 
@@ -XXX,XX +XXX,XX @@ void hvf_get_segments(CPUState *cpu_state)
     vmx_read_segment_descriptor(cpu_state, &seg, R_LDTR);
     hvf_get_segment(&env->ldt, &seg);
 
-    env->idt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
-    env->idt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE);
-    env->gdt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
-    env->gdt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE);
+    env->idt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
+    env->idt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE);
+    env->gdt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
+    env->gdt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE);
 
-    env->cr[0] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR0);
+    env->cr[0] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR0);
     env->cr[2] = 0;
-    env->cr[3] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3);
-    env->cr[4] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR4);
+    env->cr[3] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3);
+    env->cr[4] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR4);
     
-    env->efer = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER);
+    env->efer = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER);
 }
 
 void hvf_get_msrs(CPUState *cpu_state)
@@ -XXX,XX +XXX,XX @@ void hvf_get_msrs(CPUState *cpu_state)
     CPUX86State *env = &X86_CPU(cpu_state)->env;
     uint64_t tmp;
     
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS, &tmp);
     env->sysenter_cs = tmp;
     
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP, &tmp);
     env->sysenter_esp = tmp;
 
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP, &tmp);
     env->sysenter_eip = tmp;
 
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_STAR, &env->star);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_STAR, &env->star);
 
 #ifdef TARGET_X86_64
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_CSTAR, &env->cstar);
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, &env->kernelgsbase);
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_FMASK, &env->fmask);
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_LSTAR, &env->lstar);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_CSTAR, &env->cstar);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, &env->kernelgsbase);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_FMASK, &env->fmask);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_LSTAR, &env->lstar);
 #endif
 
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_APICBASE, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_APICBASE, &tmp);
     
-    env->tsc = rdtscp() + rvmcs(cpu_state->hvf_fd, VMCS_TSC_OFFSET);
+    env->tsc = rdtscp() + rvmcs(cpu_state->hvf->fd, VMCS_TSC_OFFSET);
 }
 
 int hvf_put_registers(CPUState *cpu_state)
@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
     X86CPU *x86cpu = X86_CPU(cpu_state);
     CPUX86State *env = &x86cpu->env;
 
-    wreg(cpu_state->hvf_fd, HV_X86_RAX, env->regs[R_EAX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RBX, env->regs[R_EBX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RCX, env->regs[R_ECX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RDX, env->regs[R_EDX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RBP, env->regs[R_EBP]);
-    wreg(cpu_state->hvf_fd, HV_X86_RSP, env->regs[R_ESP]);
-    wreg(cpu_state->hvf_fd, HV_X86_RSI, env->regs[R_ESI]);
-    wreg(cpu_state->hvf_fd, HV_X86_RDI, env->regs[R_EDI]);
-    wreg(cpu_state->hvf_fd, HV_X86_R8, env->regs[8]);
-    wreg(cpu_state->hvf_fd, HV_X86_R9, env->regs[9]);
-    wreg(cpu_state->hvf_fd, HV_X86_R10, env->regs[10]);
-    wreg(cpu_state->hvf_fd, HV_X86_R11, env->regs[11]);
-    wreg(cpu_state->hvf_fd, HV_X86_R12, env->regs[12]);
-    wreg(cpu_state->hvf_fd, HV_X86_R13, env->regs[13]);
-    wreg(cpu_state->hvf_fd, HV_X86_R14, env->regs[14]);
-    wreg(cpu_state->hvf_fd, HV_X86_R15, env->regs[15]);
-    wreg(cpu_state->hvf_fd, HV_X86_RFLAGS, env->eflags);
-    wreg(cpu_state->hvf_fd, HV_X86_RIP, env->eip);
+    wreg(cpu_state->hvf->fd, HV_X86_RAX, env->regs[R_EAX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RBX, env->regs[R_EBX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RCX, env->regs[R_ECX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RDX, env->regs[R_EDX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RBP, env->regs[R_EBP]);
+    wreg(cpu_state->hvf->fd, HV_X86_RSP, env->regs[R_ESP]);
+    wreg(cpu_state->hvf->fd, HV_X86_RSI, env->regs[R_ESI]);
+    wreg(cpu_state->hvf->fd, HV_X86_RDI, env->regs[R_EDI]);
+    wreg(cpu_state->hvf->fd, HV_X86_R8, env->regs[8]);
+    wreg(cpu_state->hvf->fd, HV_X86_R9, env->regs[9]);
+    wreg(cpu_state->hvf->fd, HV_X86_R10, env->regs[10]);
+    wreg(cpu_state->hvf->fd, HV_X86_R11, env->regs[11]);
+    wreg(cpu_state->hvf->fd, HV_X86_R12, env->regs[12]);
+    wreg(cpu_state->hvf->fd, HV_X86_R13, env->regs[13]);
+    wreg(cpu_state->hvf->fd, HV_X86_R14, env->regs[14]);
+    wreg(cpu_state->hvf->fd, HV_X86_R15, env->regs[15]);
+    wreg(cpu_state->hvf->fd, HV_X86_RFLAGS, env->eflags);
+    wreg(cpu_state->hvf->fd, HV_X86_RIP, env->eip);
    
-    wreg(cpu_state->hvf_fd, HV_X86_XCR0, env->xcr0);
+    wreg(cpu_state->hvf->fd, HV_X86_XCR0, env->xcr0);
     
     hvf_put_xsave(cpu_state);
     
@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
     
     hvf_put_msrs(cpu_state);
     
-    wreg(cpu_state->hvf_fd, HV_X86_DR0, env->dr[0]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR1, env->dr[1]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR2, env->dr[2]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR3, env->dr[3]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR4, env->dr[4]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR5, env->dr[5]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR6, env->dr[6]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR7, env->dr[7]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR0, env->dr[0]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR1, env->dr[1]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR2, env->dr[2]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR3, env->dr[3]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR4, env->dr[4]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR5, env->dr[5]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR6, env->dr[6]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR7, env->dr[7]);
     
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
     X86CPU *x86cpu = X86_CPU(cpu_state);
     CPUX86State *env = &x86cpu->env;
 
-    env->regs[R_EAX] = rreg(cpu_state->hvf_fd, HV_X86_RAX);
-    env->regs[R_EBX] = rreg(cpu_state->hvf_fd, HV_X86_RBX);
-    env->regs[R_ECX] = rreg(cpu_state->hvf_fd, HV_X86_RCX);
-    env->regs[R_EDX] = rreg(cpu_state->hvf_fd, HV_X86_RDX);
-    env->regs[R_EBP] = rreg(cpu_state->hvf_fd, HV_X86_RBP);
-    env->regs[R_ESP] = rreg(cpu_state->hvf_fd, HV_X86_RSP);
-    env->regs[R_ESI] = rreg(cpu_state->hvf_fd, HV_X86_RSI);
-    env->regs[R_EDI] = rreg(cpu_state->hvf_fd, HV_X86_RDI);
-    env->regs[8] = rreg(cpu_state->hvf_fd, HV_X86_R8);
-    env->regs[9] = rreg(cpu_state->hvf_fd, HV_X86_R9);
-    env->regs[10] = rreg(cpu_state->hvf_fd, HV_X86_R10);
-    env->regs[11] = rreg(cpu_state->hvf_fd, HV_X86_R11);
-    env->regs[12] = rreg(cpu_state->hvf_fd, HV_X86_R12);
-    env->regs[13] = rreg(cpu_state->hvf_fd, HV_X86_R13);
-    env->regs[14] = rreg(cpu_state->hvf_fd, HV_X86_R14);
-    env->regs[15] = rreg(cpu_state->hvf_fd, HV_X86_R15);
+    env->regs[R_EAX] = rreg(cpu_state->hvf->fd, HV_X86_RAX);
+    env->regs[R_EBX] = rreg(cpu_state->hvf->fd, HV_X86_RBX);
+    env->regs[R_ECX] = rreg(cpu_state->hvf->fd, HV_X86_RCX);
+    env->regs[R_EDX] = rreg(cpu_state->hvf->fd, HV_X86_RDX);
+    env->regs[R_EBP] = rreg(cpu_state->hvf->fd, HV_X86_RBP);
+    env->regs[R_ESP] = rreg(cpu_state->hvf->fd, HV_X86_RSP);
+    env->regs[R_ESI] = rreg(cpu_state->hvf->fd, HV_X86_RSI);
+    env->regs[R_EDI] = rreg(cpu_state->hvf->fd, HV_X86_RDI);
+    env->regs[8] = rreg(cpu_state->hvf->fd, HV_X86_R8);
+    env->regs[9] = rreg(cpu_state->hvf->fd, HV_X86_R9);
+    env->regs[10] = rreg(cpu_state->hvf->fd, HV_X86_R10);
+    env->regs[11] = rreg(cpu_state->hvf->fd, HV_X86_R11);
+    env->regs[12] = rreg(cpu_state->hvf->fd, HV_X86_R12);
+    env->regs[13] = rreg(cpu_state->hvf->fd, HV_X86_R13);
+    env->regs[14] = rreg(cpu_state->hvf->fd, HV_X86_R14);
+    env->regs[15] = rreg(cpu_state->hvf->fd, HV_X86_R15);
     
-    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
-    env->eip = rreg(cpu_state->hvf_fd, HV_X86_RIP);
+    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
+    env->eip = rreg(cpu_state->hvf->fd, HV_X86_RIP);
    
     hvf_get_xsave(cpu_state);
-    env->xcr0 = rreg(cpu_state->hvf_fd, HV_X86_XCR0);
+    env->xcr0 = rreg(cpu_state->hvf->fd, HV_X86_XCR0);
     
     hvf_get_segments(cpu_state);
     hvf_get_msrs(cpu_state);
     
-    env->dr[0] = rreg(cpu_state->hvf_fd, HV_X86_DR0);
-    env->dr[1] = rreg(cpu_state->hvf_fd, HV_X86_DR1);
-    env->dr[2] = rreg(cpu_state->hvf_fd, HV_X86_DR2);
-    env->dr[3] = rreg(cpu_state->hvf_fd, HV_X86_DR3);
-    env->dr[4] = rreg(cpu_state->hvf_fd, HV_X86_DR4);
-    env->dr[5] = rreg(cpu_state->hvf_fd, HV_X86_DR5);
-    env->dr[6] = rreg(cpu_state->hvf_fd, HV_X86_DR6);
-    env->dr[7] = rreg(cpu_state->hvf_fd, HV_X86_DR7);
+    env->dr[0] = rreg(cpu_state->hvf->fd, HV_X86_DR0);
+    env->dr[1] = rreg(cpu_state->hvf->fd, HV_X86_DR1);
+    env->dr[2] = rreg(cpu_state->hvf->fd, HV_X86_DR2);
+    env->dr[3] = rreg(cpu_state->hvf->fd, HV_X86_DR3);
+    env->dr[4] = rreg(cpu_state->hvf->fd, HV_X86_DR4);
+    env->dr[5] = rreg(cpu_state->hvf->fd, HV_X86_DR5);
+    env->dr[6] = rreg(cpu_state->hvf->fd, HV_X86_DR6);
+    env->dr[7] = rreg(cpu_state->hvf->fd, HV_X86_DR7);
     
     x86_update_hflags(env);
     return 0;
@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
 static void vmx_set_int_window_exiting(CPUState *cpu)
 {
      uint64_t val;
-     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
+     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
              VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
 }
 
 void vmx_clear_int_window_exiting(CPUState *cpu)
 {
      uint64_t val;
-     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
+     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
              ~VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
 }
 
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
     uint64_t info = 0;
     if (have_event) {
         info = vector | intr_type | VMCS_INTR_VALID;
-        uint64_t reason = rvmcs(cpu_state->hvf_fd, VMCS_EXIT_REASON);
+        uint64_t reason = rvmcs(cpu_state->hvf->fd, VMCS_EXIT_REASON);
         if (env->nmi_injected && reason != EXIT_REASON_TASK_SWITCH) {
             vmx_clear_nmi_blocking(cpu_state);
         }
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
             info &= ~(1 << 12); /* clear undefined bit */
             if (intr_type == VMCS_INTR_T_SWINTR ||
                 intr_type == VMCS_INTR_T_SWEXCEPTION) {
-                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
+                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
             }
             
             if (env->has_error_code) {
-                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_EXCEPTION_ERROR,
+                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_EXCEPTION_ERROR,
                       env->error_code);
                 /* Indicate that VMCS_ENTRY_EXCEPTION_ERROR is valid */
                 info |= VMCS_INTR_DEL_ERRCODE;
             }
             /*printf("reinject  %lx err %d\n", info, err);*/
-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
         };
     }
 
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
         if (!(env->hflags2 & HF2_NMI_MASK) && !(info & VMCS_INTR_VALID)) {
             cpu_state->interrupt_request &= ~CPU_INTERRUPT_NMI;
             info = VMCS_INTR_VALID | VMCS_INTR_T_NMI | EXCP02_NMI;
-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
         } else {
             vmx_set_nmi_window_exiting(cpu_state);
         }
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
         int line = cpu_get_pic_interrupt(&x86cpu->env);
         cpu_state->interrupt_request &= ~CPU_INTERRUPT_HARD;
         if (line >= 0) {
-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, line |
+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, line |
                   VMCS_INTR_VALID | VMCS_INTR_T_HWINTR);
         }
     }
@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
     X86CPU *cpu = X86_CPU(cpu_state);
     CPUX86State *env = &cpu->env;
 
-    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
+    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
 
     if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
         cpu_synchronize_state(cpu_state);
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

The hooks we have that call us after reset, init and loadvm really all
just want to say "The reference of all register state is in the QEMU
vcpu struct, please push it".

We already have a working push mechanism, cpu->vcpu_dirty, so we can
simply reuse it for all of the above; state is then synced properly the
next time we actually execute a vCPU.

This fixes PSCI resets on ARM, as they modify CPU state even after the
post init call has completed, but before we execute the vCPU again.

To also make the scheme work for x86, we have to make sure we don't
move stale eflags into our env when the vcpu state is dirty.
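
The scheme described above can be sketched in isolation. This is only a
minimal illustration, not the HVF code: struct toy_vcpu and the toy_*
functions are hypothetical stand-ins, and a single integer plays the role
of the whole register file. It shows the hooks merely setting the dirty
flag, the push happening once on the next guest entry, and the lightweight
read being skipped while the QEMU-side copy is newer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a vCPU; not the real CPUState. */
struct toy_vcpu {
    uint64_t regs_qemu;  /* register state as QEMU sees it (the reference) */
    uint64_t regs_hv;    /* register state held by the hypervisor */
    bool dirty;          /* true: QEMU copy is newer and must be pushed */
    int pushes;          /* how many times state was pushed */
};

/* What the reset/init/loadvm hooks do: mark dirty, push nothing yet. */
static void toy_mark_dirty(struct toy_vcpu *cpu)
{
    cpu->dirty = true;
}

/* Called right before the guest runs: push once if the state is dirty. */
static void toy_enter_guest(struct toy_vcpu *cpu)
{
    if (cpu->dirty) {
        cpu->regs_hv = cpu->regs_qemu;
        cpu->pushes++;
        cpu->dirty = false;
    }
}

/* Lightweight read-back that must not overwrite newer QEMU-side state. */
static void toy_light_sync(struct toy_vcpu *cpu)
{
    if (!cpu->dirty) {
        cpu->regs_qemu = cpu->regs_hv;
    }
}
```

With this shape, a modification made after toy_mark_dirty() but before
toy_enter_guest() (the PSCI reset case) still reaches the hypervisor, and
toy_light_sync() cannot replace it with a stale value in the meantime.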

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-13-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.c | 27 +++++++--------------------
 target/i386/hvf/x86hvf.c  |  5 ++++-
 2 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static void hvf_cpu_synchronize_state(CPUState *cpu)
     }
 }
 
-static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
-                                              run_on_cpu_data arg)
+static void do_hvf_cpu_synchronize_set_dirty(CPUState *cpu,
+                                             run_on_cpu_data arg)
 {
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
+    /* QEMU state is the reference, push it to HVF now and on next entry */
+    cpu->vcpu_dirty = true;
 }
 
 static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
-                                             run_on_cpu_data arg)
-{
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 }
 
 static void hvf_cpu_synchronize_post_init(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
-                                              run_on_cpu_data arg)
-{
-    cpu->vcpu_dirty = true;
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 }
 
 static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 }
 
 static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
     X86CPU *cpu = X86_CPU(cpu_state);
     CPUX86State *env = &cpu->env;
 
-    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
+    if (!cpu_state->vcpu_dirty) {
+        /* light weight sync for CPU_INTERRUPT_HARD and IF_MASK */
+        env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
+    }
 
     if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
         cpu_synchronize_state(cpu_state);
-- 
2.20.1

Coverity notes that we don't check for dup2() failing.  Add some
assertions so that if it does ever happen we get some indication.
(This is similar to how we handle other "don't expect this syscall to
fail" checks in this test code.)
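
As a standalone sketch of the pattern (using plain assert() rather than
g_assert() so that it builds without GLib; run_with_stdout_on_stderr() and
toy_command() are hypothetical names, not part of the test suite), the
save, redirect and restore sequence with every return value checked might
look like this:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the command whose output gets redirected. */
static int toy_command(void)
{
    return 7;
}

/*
 * Run fn() with stdout temporarily pointing at stderr, asserting that
 * each descriptor call succeeds. Returns whatever fn() returned.
 */
static int run_with_stdout_on_stderr(int (*fn)(void))
{
    int saved, rc, result;

    fflush(stdout);
    saved = dup(STDOUT_FILENO);              /* remember the real stdout */
    assert(saved >= 0);

    rc = dup2(STDERR_FILENO, STDOUT_FILENO); /* stdout now goes to stderr */
    assert(rc >= 0);

    result = fn();
    fflush(stdout);

    rc = dup2(saved, STDOUT_FILENO);         /* restore the real stdout */
    assert(rc >= 0);
    close(saved);
    return result;
}
```

The calls are kept outside the assert() expressions so that they still run
if assertions are compiled out.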

Fixes: Coverity CID 1432346
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-2-peter.maydell@linaro.org
---
 tests/qtest/bios-tables-test.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -XXX,XX +XXX,XX @@ static void test_acpi_asl(test_data *data)
                                                  exp_sdt->asl_file, sdt->asl_file);
                     int out = dup(STDOUT_FILENO);
                     int ret G_GNUC_UNUSED;
+                    int dupret;
 
-                    dup2(STDERR_FILENO, STDOUT_FILENO);
+                    g_assert(out >= 0);
+                    dupret = dup2(STDERR_FILENO, STDOUT_FILENO);
+                    g_assert(dupret >= 0);
                     ret = system(diff) ;
-                    dup2(out, STDOUT_FILENO);
+                    dupret = dup2(out, STDOUT_FILENO);
+                    g_assert(dupret >= 0);
                     close(out);
                     g_free(diff);
                 }
-- 
2.20.1

The e1000e_send_verify() test calls qemu_recv() but doesn't
check that the call succeeded, which annoys Coverity. Add
an explicit test check for the length of the data.

(This is a test check, not a "we assume this syscall always
succeeds", so we use g_assert_cmpint() rather than g_assert().)
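
The same idea can be sketched outside the qtest framework. The example below
is an assumption-laden illustration, not the test's code: it uses plain
send()/recv() on a socketpair() instead of qemu_recv(), and
received_test_string() is a hypothetical helper. It reports success only if at
least 5 bytes ("TEST" plus its terminating NUL) arrived before the buffer is
compared as a string.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: send "TEST" (including its NUL terminator) over
 * a local socket pair, then check the receive length before looking at
 * the buffer contents. Returns 1 on success, 0 otherwise.
 */
static int received_test_string(void)
{
    static const char payload[] = "TEST";   /* 5 bytes with the NUL */
    char buffer[64] = { 0 };
    int sv[2];
    ssize_t sent, received;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return 0;
    }

    sent = send(sv[0], payload, sizeof(payload), 0);
    received = recv(sv[1], buffer, sizeof(buffer), 0);
    close(sv[0]);
    close(sv[1]);

    if (sent != (ssize_t)sizeof(payload)) {
        return 0;
    }
    /* Check the length first: a short or failed read would leave the
     * buffer without a complete, terminated string. */
    if (received < (ssize_t)sizeof(payload)) {
        return 0;
    }
    return strcmp(buffer, "TEST") == 0;
}
```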

Fixes: Coverity CID 1432324
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-3-peter.maydell@linaro.org
---
 tests/qtest/e1000e-test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/e1000e-test.c b/tests/qtest/e1000e-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/e1000e-test.c
+++ b/tests/qtest/e1000e-test.c
@@ -XXX,XX +XXX,XX @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a
     /* Check data sent to the backend */
     ret = qemu_recv(test_sockets[0], &recv_len, sizeof(recv_len), 0);
     g_assert_cmpint(ret, == , sizeof(recv_len));
-    qemu_recv(test_sockets[0], buffer, 64, 0);
+    ret = qemu_recv(test_sockets[0], buffer, 64, 0);
+    g_assert_cmpint(ret, >=, 5);
     g_assert_cmpstr(buffer, == , "TEST");
 
     /* Free test data buffer */
-- 
2.20.1

Coverity notices that the checks against mkstemp() failing in
create_qcow2_with_mbr() are wrong: mkstemp returns -1 on failure but
the check is just "g_assert(fd)".  Fix to use "g_assert(fd >= 0)",
matching the correct check in create_test_img().
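
To see why the old check could not catch the failure case, here is a small
standalone sketch (the function names are hypothetical and not part of the
test): -1 is non-zero, so a bare truth test accepts it, and the same test
would reject the valid descriptor 0.

```c
#include <assert.h>
#include <stdbool.h>

/* What "g_assert(fd)" effectively tests: is fd non-zero? */
static bool naive_fd_check(int fd)
{
    return fd != 0;
}

/* What "g_assert(fd >= 0)" tests: is fd a valid descriptor value? */
static bool correct_fd_check(int fd)
{
    return fd >= 0;
}
```

With mkstemp()'s failure value of -1, naive_fd_check() reports success and
correct_fd_check() reports failure.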

Fixes: Coverity CID 1432274
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-4-peter.maydell@linaro.org
---
 tests/qtest/hd-geo-test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/hd-geo-test.c b/tests/qtest/hd-geo-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/hd-geo-test.c
+++ b/tests/qtest/hd-geo-test.c
@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
     }
 
     fd = mkstemp(raw_path);
-    g_assert(fd);
+    g_assert(fd >= 0);
     close(fd);
 
     fd = open(raw_path, O_WRONLY);
@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
     close(fd);
 
     fd = mkstemp(qcow2_path);
-    g_assert(fd);
+    g_assert(fd >= 0);
     close(fd);
 
     qemu_img_path = getenv("QTEST_QEMU_IMG");
-- 
2.20.1

Coverity points out that we calculate a 64-bit value using 32-bit
arithmetic; add a cast so that the multiplication is done in 64 bits.
(The overflow will never happen with the current test data.)
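
A minimal sketch of the difference, assuming both operands are uint32_t (the
type of sector_len is not visible in this patch, so that is an assumption, and
both helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Product truncated to 32 bits before being widened to 64 bits. */
static uint64_t byte_addr_narrow(uint32_t i, uint32_t sector_len)
{
    uint32_t product = i * sector_len;  /* high bits are lost here */
    return product;
}

/* Casting one operand first makes the multiply itself 64-bit. */
static uint64_t byte_addr_wide(uint32_t i, uint32_t sector_len)
{
    return (uint64_t)i * sector_len;
}
```

For inputs whose product fits in 32 bits the two helpers agree; they differ
only once the product exceeds UINT32_MAX, for example 0x10000 * 0x10000.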

Fixes: Coverity CID 1432320
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-5-peter.maydell@linaro.org
---
 tests/qtest/pflash-cfi02-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/pflash-cfi02-test.c b/tests/qtest/pflash-cfi02-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/pflash-cfi02-test.c
+++ b/tests/qtest/pflash-cfi02-test.c
@@ -XXX,XX +XXX,XX @@ static void test_geometry(const void *opaque)
 
     for (int region = 0; region < nb_erase_regions; ++region) {
         for (uint32_t i = 0; i < c->nb_blocs[region]; ++i) {
-            uint64_t byte_addr = i * c->sector_len[region];
+            uint64_t byte_addr = (uint64_t)i * c->sector_len[region];
             g_assert_cmphex(flash_read(c, byte_addr), ==, bank_mask(c));
         }
     }
-- 
2.20.1

Coverity points out that in tpm_test_swtpm_migration_test() we
assume that src_tpm_addr and dst_tpm_addr are non-NULL (we
pass them to tpm_util_migration_start_qemu() which will
unconditionally dereference them) but then later explicitly
check them for NULL. Remove the pointless checks.

Fixes: Coverity CID 1432367, 1432359

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-6-peter.maydell@linaro.org
---
 tests/qtest/tpm-tests.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/tpm-tests.c
+++ b/tests/qtest/tpm-tests.c
@@ -XXX,XX +XXX,XX @@ void tpm_test_swtpm_migration_test(const char *src_tpm_path,
     qtest_quit(src_qemu);
 
     tpm_util_swtpm_kill(dst_tpm_pid);
-    if (dst_tpm_addr) {
-        g_unlink(dst_tpm_addr->u.q_unix.path);
-        qapi_free_SocketAddress(dst_tpm_addr);
-    }
+    g_unlink(dst_tpm_addr->u.q_unix.path);
+    qapi_free_SocketAddress(dst_tpm_addr);
 
     tpm_util_swtpm_kill(src_tpm_pid);
-    if (src_tpm_addr) {
-        g_unlink(src_tpm_addr->u.q_unix.path);
-        qapi_free_SocketAddress(src_tpm_addr);
-    }
+    g_unlink(src_tpm_addr->u.q_unix.path);
+    qapi_free_SocketAddress(src_tpm_addr);
 }
-- 
2.20.1

Coverity complains that we don't check for failures from dup()
and mkstemp(); add asserts that these syscalls succeeded.

Fixes: Coverity CID 1432516, 1432574
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210525134458.6675-7-peter.maydell@linaro.org
---
 tests/unit/test-vmstate.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/unit/test-vmstate.c b/tests/unit/test-vmstate.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/unit/test-vmstate.c
+++ b/tests/unit/test-vmstate.c
@@ -XXX,XX +XXX,XX @@ static int temp_fd;
 /* Duplicate temp_fd and seek to the beginning of the file */
 static QEMUFile *open_test_file(bool write)
 {
-    int fd = dup(temp_fd);
+    int fd;
     QIOChannel *ioc;
     QEMUFile *f;
 
+    fd = dup(temp_fd);
+    g_assert(fd >= 0);
     lseek(fd, 0, SEEK_SET);
     if (write) {
         g_assert_cmpint(ftruncate(fd, 0), ==, 0);
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
     g_autofree char *temp_file = g_strdup_printf("%s/vmst.test.XXXXXX",
                                                  g_get_tmp_dir());
     temp_fd = mkstemp(temp_file);
+    g_assert(temp_fd >= 0);
 
     module_call_init(MODULE_INIT_QOM);
 
-- 
2.20.1