1 | As promised, another pullreq... This one's mostly RTH's patches. | 1 | The following changes since commit a97978bcc2d1f650c7d411428806e5b03082b8c7: |
---|---|---|---|
2 | 2 | ||
3 | thanks | 3 | Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210603' into staging (2021-06-03 10:00:35 +0100) |
4 | -- PMM | ||
5 | |||
6 | The following changes since commit 784c2e4f232adf5ef47a84a262ec72a07d068d6a: | ||
7 | |||
8 | Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging (2018-10-19 15:30:40 +0100) | ||
9 | 4 | ||
10 | are available in the Git repository at: | 5 | are available in the Git repository at: |
11 | 6 | ||
12 | https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20181019 | 7 | https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210603 |
13 | 8 | ||
14 | for you to fetch changes up to 88c9add25e7120e8622796c81ad3f3fb7f8d40e7: | 9 | for you to fetch changes up to 1c861885894d840235954060050d240259f5340b: |
15 | 10 | ||
16 | target/arm: Only flush tlb if ASID changes (2018-10-19 17:38:48 +0100) | 11 | tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed (2021-06-03 16:43:27 +0100) |
17 | 12 | ||
18 | ---------------------------------------------------------------- | 13 | ---------------------------------------------------------------- |
19 | target-arm queue: | 14 | target-arm queue: |
20 | * ssi-sd: Make devices picking up backends unavailable with -device | 15 | * Some not-yet-enabled preliminaries for M-profile MVE support |
21 | * Add support for VCPU event states | 16 | * Consistently use "Cortex-Axx", not "Cortex Axx" in docs, comments |
22 | * Move towards making ID registers the source of truth for | 17 | * docs: Fix installation of man pages with Sphinx 4.x |
23 | whether a guest CPU implements a feature, rather than having | 18 | * Mark LDS{MIN,MAX} as signed operations |
24 | parallel ID registers and feature bit flags | 19 | * Fix missing syndrome value for DAIF and PAC check exceptions |
25 | * Implement various HCR hypervisor trap/config bits | 20 | * Implement BFloat16 extensions |
26 | * Get IL bit correct for v7 syndrome values | 21 | * Refactoring of hvf accelerator code in preparation for aarch64 support |
27 | * Report correct syndrome for FP/SIMD traps to Hyp mode | 22 | * Fix some coverity nits in test code |
28 | * hw/arm/boot: Increase compliance with kernel arm64 boot protocol | ||
29 | * Refactor A32 Neon to use generic vector infrastructure | ||
30 | * Fix a bug in A32 VLD2 "(multiple 2-element structures)" insn | ||
31 | * net: cadence_gem: Report features correctly in ID register | ||
32 | * Avoid some unnecessary TLB flushes on TTBR register writes | ||
33 | 23 | ||
34 | ---------------------------------------------------------------- | 24 | ---------------------------------------------------------------- |
35 | Dongjiu Geng (1): | 25 | Alexander Graf (12): |
36 | target/arm: Add support for VCPU event states | 26 | hvf: Move assert_hvf_ok() into common directory |
27 | hvf: Move vcpu thread functions into common directory | ||
28 | hvf: Move cpu functions into common directory | ||
29 | hvf: Move hvf internal definitions into common header | ||
30 | hvf: Make hvf_set_phys_mem() static | ||
31 | hvf: Remove use of hv_uvaddr_t and hv_gpaddr_t | ||
32 | hvf: Split out common code on vcpu init and destroy | ||
33 | hvf: Use cpu_synchronize_state() | ||
34 | hvf: Make synchronize functions static | ||
35 | hvf: Remove hvf-accel-ops.h | ||
36 | hvf: Introduce hvf vcpu struct | ||
37 | hvf: Simplify post reset/init/loadvm hooks | ||
37 | 38 | ||
38 | Edgar E. Iglesias (2): | 39 | Damien Goutte-Gattat (1): |
39 | net: cadence_gem: Announce availability of priority queues | 40 | docs: Fix installation of man pages with Sphinx 4.x |
40 | net: cadence_gem: Announce 64bit addressing support | ||
41 | 41 | ||
42 | Markus Armbruster (1): | 42 | Jamie Iles (4): |
43 | ssi-sd: Make devices picking up backends unavailable with -device | 43 | target/arm: fix missing exception class |
44 | target/arm: fold do_raise_exception into raise_exception | ||
45 | target/arm: use raise_exception_ra for MTE check failure | ||
46 | target/arm: use raise_exception_ra for stack limit exception | ||
44 | 47 | ||
45 | Peter Maydell (10): | 48 | Peter Maydell (15): |
46 | target/arm: Improve debug logging of AArch32 exception return | 49 | target/arm: Add isar feature check functions for MVE |
47 | target/arm: Make switch_mode() file-local | 50 | target/arm: Update feature checks for insns which are "MVE or FP" |
48 | target/arm: Implement HCR.FB | 51 | target/arm: Move fpsp/fpdp isar check into callers of do_vfp_2op_sp/dp |
49 | target/arm: Implement HCR.DC | 52 | target/arm: Add MVE check to VMOV_reg_sp and VMOV_reg_dp |
50 | target/arm: ISR_EL1 bits track virtual interrupts if IMO/FMO set | 53 | target/arm: Fix return values in fp_sysreg_checks() |
51 | target/arm: Implement HCR.VI and VF | 54 | target/arm: Implement M-profile VPR register |
52 | target/arm: Implement HCR.PTW | 55 | target/arm: Make FPSCR.LTPSIZE writable for MVE |
53 | target/arm: New utility function to extract EC from syndrome | 56 | target/arm: Allow board models to specify initial NS VTOR |
54 | target/arm: Get IL bit correct for v7 syndrome values | 57 | arm: Consistently use "Cortex-Axx", not "Cortex Axx" |
55 | target/arm: Report correct syndrome for FP/SIMD traps to Hyp mode | 58 | tests/qtest/bios-tables-test: Check for dup2() failure |
59 | tests/qtest/e1000e-test: Check qemu_recv() succeeded | ||
60 | tests/qtest/hd-geo-test: Fix checks on mkstemp() return value | ||
61 | tests/qtest/pflash-cfi02-test: Avoid potential integer overflow | ||
62 | tests/qtest/tpm-tests: Remove unnecessary NULL checks | ||
63 | tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed | ||
56 | 64 | ||
57 | Richard Henderson (30): | 65 | Richard Henderson (13): |
58 | target/arm: Move some system registers into a substructure | 66 | target/arm: Mark LDS{MIN,MAX} as signed operations |
59 | target/arm: V8M should not imply V7VE | 67 | target/arm: Add isar_feature_{aa32, aa64, aa64_sve}_bf16 |
60 | target/arm: Convert v8 extensions from feature bits to isar tests | 68 | target/arm: Unify unallocated path in disas_fp_1src |
61 | target/arm: Convert division from feature bits to isar0 tests | 69 | target/arm: Implement scalar float32 to bfloat16 conversion |
62 | target/arm: Convert jazelle from feature bit to isar1 test | 70 | target/arm: Implement vector float32 to bfloat16 conversion |
63 | target/arm: Convert t32ee from feature bit to isar3 test | 71 | softfpu: Add float_round_to_odd_inf |
64 | target/arm: Convert sve from feature bit to aa64pfr0 test | 72 | target/arm: Implement bfloat16 dot product (vector) |
65 | target/arm: Convert v8.2-fp16 from feature bit to aa64pfr0 test | 73 | target/arm: Implement bfloat16 dot product (indexed) |
66 | target/arm: Hoist address increment for vector memory ops | 74 | target/arm: Implement bfloat16 matrix multiply accumulate |
67 | target/arm: Don't call tcg_clear_temp_count | 75 | target/arm: Implement bfloat widening fma (vector) |
68 | target/arm: Use tcg_gen_gvec_dup_i64 for LD[1-4]R | 76 | target/arm: Implement bfloat widening fma (indexed) |
69 | target/arm: Promote consecutive memory ops for aa64 | 77 | linux-user/aarch64: Enable hwcap bits for bfloat16 |
70 | target/arm: Mark some arrays const | 78 | target/arm: Enable BFloat16 extensions |
71 | target/arm: Use gvec for NEON VDUP | ||
72 | target/arm: Use gvec for NEON VMOV, VMVN, VBIC & VORR (immediate) | ||
73 | target/arm: Use gvec for NEON_3R_LOGIC insns | ||
74 | target/arm: Use gvec for NEON_3R_VADD_VSUB insns | ||
75 | target/arm: Use gvec for NEON_2RM_VMN, NEON_2RM_VNEG | ||
76 | target/arm: Use gvec for NEON_3R_VMUL | ||
77 | target/arm: Use gvec for VSHR, VSHL | ||
78 | target/arm: Use gvec for VSRA | ||
79 | target/arm: Use gvec for VSRI, VSLI | ||
80 | target/arm: Use gvec for NEON_3R_VML | ||
81 | target/arm: Use gvec for NEON_3R_VTST_VCEQ, NEON_3R_VCGT, NEON_3R_VCGE | ||
82 | target/arm: Use gvec for NEON VLD all lanes | ||
83 | target/arm: Reorg NEON VLD/VST all elements | ||
84 | target/arm: Promote consecutive memory ops for aa32 | ||
85 | target/arm: Reorg NEON VLD/VST single element to one lane | ||
86 | target/arm: Remove writefn from TTBR0_EL3 | ||
87 | target/arm: Only flush tlb if ASID changes | ||
88 | 79 | ||
89 | Stewart Hildebrand (1): | 80 | docs/conf.py | 1 + |
90 | hw/arm/boot: Increase compliance with kernel arm64 boot protocol | 81 | docs/system/arm/aspeed.rst | 4 +- |
82 | docs/system/arm/nuvoton.rst | 6 +- | ||
83 | docs/system/arm/sabrelite.rst | 2 +- | ||
84 | include/fpu/softfloat-types.h | 4 +- | ||
85 | include/hw/arm/allwinner-h3.h | 2 +- | ||
86 | include/hw/arm/armv7m.h | 2 + | ||
87 | include/hw/core/cpu.h | 3 +- | ||
88 | include/sysemu/hvf_int.h | 58 +++++ | ||
89 | target/arm/cpu.h | 48 +++- | ||
90 | target/arm/helper-sve.h | 4 + | ||
91 | target/arm/helper.h | 15 ++ | ||
92 | target/i386/hvf/hvf-accel-ops.h | 23 -- | ||
93 | target/i386/hvf/hvf-i386.h | 33 +-- | ||
94 | target/i386/hvf/vmx.h | 24 +- | ||
95 | target/i386/hvf/x86hvf.h | 2 - | ||
96 | target/arm/neon-dp.decode | 1 + | ||
97 | target/arm/neon-shared.decode | 11 + | ||
98 | target/arm/sve.decode | 19 +- | ||
99 | target/arm/vfp.decode | 2 + | ||
100 | accel/hvf/hvf-accel-ops.c | 471 ++++++++++++++++++++++++++++++++++++++++ | ||
101 | accel/hvf/hvf-all.c | 47 ++++ | ||
102 | hw/arm/armv7m.c | 7 + | ||
103 | hw/arm/aspeed.c | 6 +- | ||
104 | hw/arm/mcimx6ul-evk.c | 2 +- | ||
105 | hw/arm/mcimx7d-sabre.c | 2 +- | ||
106 | hw/arm/npcm7xx_boards.c | 4 +- | ||
107 | hw/arm/sabrelite.c | 2 +- | ||
108 | hw/misc/npcm7xx_clk.c | 2 +- | ||
109 | linux-user/elfload.c | 2 + | ||
110 | target/arm/cpu.c | 13 ++ | ||
111 | target/arm/cpu64.c | 3 + | ||
112 | target/arm/cpu_tcg.c | 1 + | ||
113 | target/arm/m_helper.c | 5 +- | ||
114 | target/arm/machine.c | 20 ++ | ||
115 | target/arm/mte_helper.c | 12 +- | ||
116 | target/arm/op_helper.c | 32 ++- | ||
117 | target/arm/sve_helper.c | 2 + | ||
118 | target/arm/translate-a64.c | 155 +++++++++++-- | ||
119 | target/arm/translate-neon.c | 91 ++++++++ | ||
120 | target/arm/translate-sve.c | 112 ++++++++++ | ||
121 | target/arm/translate-vfp.c | 164 ++++++++++---- | ||
122 | target/arm/vec_helper.c | 140 +++++++++++- | ||
123 | target/arm/vfp_helper.c | 21 +- | ||
124 | target/i386/hvf/hvf-accel-ops.c | 146 ------------- | ||
125 | target/i386/hvf/hvf.c | 464 +++++---------------------------------- | ||
126 | target/i386/hvf/x86.c | 28 +-- | ||
127 | target/i386/hvf/x86_descr.c | 26 +-- | ||
128 | target/i386/hvf/x86_emu.c | 62 +++--- | ||
129 | target/i386/hvf/x86_mmu.c | 4 +- | ||
130 | target/i386/hvf/x86_task.c | 12 +- | ||
131 | target/i386/hvf/x86hvf.c | 222 +++++++++---------- | ||
132 | tests/qtest/bios-tables-test.c | 8 +- | ||
133 | tests/qtest/e1000e-test.c | 3 +- | ||
134 | tests/qtest/hd-geo-test.c | 4 +- | ||
135 | tests/qtest/pflash-cfi02-test.c | 2 +- | ||
136 | tests/qtest/tpm-tests.c | 12 +- | ||
137 | tests/unit/test-vmstate.c | 5 +- | ||
138 | fpu/softfloat-parts.c.inc | 6 +- | ||
139 | MAINTAINERS | 8 + | ||
140 | accel/hvf/meson.build | 7 + | ||
141 | accel/meson.build | 1 + | ||
142 | target/i386/hvf/meson.build | 1 - | ||
143 | 63 files changed, 1666 insertions(+), 935 deletions(-) | ||
144 | create mode 100644 include/sysemu/hvf_int.h | ||
145 | delete mode 100644 target/i386/hvf/hvf-accel-ops.h | ||
146 | create mode 100644 accel/hvf/hvf-accel-ops.c | ||
147 | create mode 100644 accel/hvf/hvf-all.c | ||
148 | delete mode 100644 target/i386/hvf/hvf-accel-ops.c | ||
149 | create mode 100644 accel/hvf/meson.build | ||
91 | 150 | ||
92 | target/arm/cpu.h | 227 ++++++- | ||
93 | target/arm/internals.h | 45 +- | ||
94 | target/arm/kvm_arm.h | 24 + | ||
95 | target/arm/translate.h | 21 + | ||
96 | hw/arm/boot.c | 18 + | ||
97 | hw/intc/armv7m_nvic.c | 12 +- | ||
98 | hw/net/cadence_gem.c | 9 +- | ||
99 | hw/sd/ssi-sd.c | 2 + | ||
100 | linux-user/aarch64/signal.c | 4 +- | ||
101 | linux-user/elfload.c | 60 +- | ||
102 | linux-user/syscall.c | 10 +- | ||
103 | target/arm/cpu.c | 242 ++++---- | ||
104 | target/arm/cpu64.c | 148 +++-- | ||
105 | target/arm/helper.c | 397 ++++++++---- | ||
106 | target/arm/kvm.c | 60 ++ | ||
107 | target/arm/kvm32.c | 13 + | ||
108 | target/arm/kvm64.c | 15 +- | ||
109 | target/arm/machine.c | 28 +- | ||
110 | target/arm/op_helper.c | 2 +- | ||
111 | target/arm/translate-a64.c | 715 ++++----------------- | ||
112 | target/arm/translate.c | 1451 ++++++++++++++++++++++++++++--------------- | ||
113 | 21 files changed, 2021 insertions(+), 1482 deletions(-) | ||
1 | For AArch32, exception return happens through certain kinds | 1 | Add the isar feature check functions we will need for v8.1M MVE: |
---|---|---|---|
2 | of CPSR write. We don't currently have any CPU_LOG_INT logging | 2 | * a check for MVE present: this corresponds to the pseudocode's |
3 | of these events (unlike AArch64, where we log in the ERET | 3 | CheckDecodeFaults(ExtType_Mve) |
4 | instruction). Add some suitable logging. | 4 | * a check for the optional floating-point part of MVE: this |
5 | 5 | corresponds to CheckDecodeFaults(ExtType_MveFp) | |
6 | This will log exception returns like this: | ||
7 | Exception return from AArch32 hyp to usr PC 0x80100374 | ||
8 | |||
9 | paralleling the existing logging in the exception_return | ||
10 | helper for AArch64 exception returns: | ||
11 | Exception return from AArch64 EL2 to AArch64 EL0 PC 0x8003045c | ||
12 | Exception return from AArch64 EL2 to AArch32 EL0 PC 0x8003045c | ||
13 | |||
14 | (Note that an AArch32 exception return can only be | ||
15 | AArch32->AArch32, never to AArch64.) | ||
16 | 6 | ||
17 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 7 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
18 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | 8 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> |
19 | Message-id: 20181012144235.19646-2-peter.maydell@linaro.org | 9 | Message-id: 20210520152840.24453-2-peter.maydell@linaro.org |
20 | --- | 10 | --- |
21 | target/arm/internals.h | 18 ++++++++++++++++++ | 11 | target/arm/cpu.h | 22 ++++++++++++++++++++++ |
22 | target/arm/helper.c | 10 ++++++++++ | 12 | 1 file changed, 22 insertions(+) |
23 | target/arm/translate.c | 7 +------ | ||
24 | 3 files changed, 29 insertions(+), 6 deletions(-) | ||
25 | 13 | ||
26 | diff --git a/target/arm/internals.h b/target/arm/internals.h | 14 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h |
27 | index XXXXXXX..XXXXXXX 100644 | 15 | index XXXXXXX..XXXXXXX 100644 |
28 | --- a/target/arm/internals.h | 16 | --- a/target/arm/cpu.h |
29 | +++ b/target/arm/internals.h | 17 | +++ b/target/arm/cpu.h |
30 | @@ -XXX,XX +XXX,XX @@ static inline uint32_t v7m_sp_limit(CPUARMState *env) | 18 | @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id) |
31 | } | 19 | } |
32 | } | 20 | } |
33 | 21 | ||
34 | +/** | 22 | +static inline bool isar_feature_aa32_mve(const ARMISARegisters *id) |
35 | + * aarch32_mode_name(): Return name of the AArch32 CPU mode | ||
36 | + * @psr: Program Status Register indicating CPU mode | ||
37 | + * | ||
38 | + * Returns, for debug logging purposes, a printable representation | ||
39 | + * of the AArch32 CPU mode ("svc", "usr", etc) as indicated by | ||
40 | + * the low bits of the specified PSR. | ||
41 | + */ | ||
42 | +static inline const char *aarch32_mode_name(uint32_t psr) | ||
43 | +{ | 23 | +{ |
44 | + static const char cpu_mode_names[16][4] = { | 24 | + /* |
45 | + "usr", "fiq", "irq", "svc", "???", "???", "mon", "abt", | 25 | + * Return true if MVE is supported (either integer or floating point). |
46 | + "???", "???", "hyp", "und", "???", "???", "???", "sys" | 26 | + * We must check for M-profile as the MVFR1 field means something |
47 | + }; | 27 | + * else for A-profile. |
48 | + | 28 | + */ |
49 | + return cpu_mode_names[psr & 0xf]; | 29 | + return isar_feature_aa32_mprofile(id) && |
30 | + FIELD_EX32(id->mvfr1, MVFR1, MVE) > 0; | ||
50 | +} | 31 | +} |
51 | + | 32 | + |
52 | #endif | 33 | +static inline bool isar_feature_aa32_mve_fp(const ARMISARegisters *id) |
53 | diff --git a/target/arm/helper.c b/target/arm/helper.c | 34 | +{ |
54 | index XXXXXXX..XXXXXXX 100644 | 35 | + /* |
55 | --- a/target/arm/helper.c | 36 | + * Return true if MVE is supported (either integer or floating point). |
56 | +++ b/target/arm/helper.c | 37 | + * We must check for M-profile as the MVFR1 field means something |
57 | @@ -XXX,XX +XXX,XX @@ void cpsr_write(CPUARMState *env, uint32_t val, uint32_t mask, | 38 | + * else for A-profile. |
58 | mask |= CPSR_IL; | 39 | + */ |
59 | val |= CPSR_IL; | 40 | + return isar_feature_aa32_mprofile(id) && |
60 | } | 41 | + FIELD_EX32(id->mvfr1, MVFR1, MVE) >= 2; |
61 | + qemu_log_mask(LOG_GUEST_ERROR, | 42 | +} |
62 | + "Illegal AArch32 mode switch attempt from %s to %s\n", | 43 | + |
63 | + aarch32_mode_name(env->uncached_cpsr), | 44 | static inline bool isar_feature_aa32_vfp_simd(const ARMISARegisters *id) |
64 | + aarch32_mode_name(val)); | ||
65 | } else { | ||
66 | + qemu_log_mask(CPU_LOG_INT, "%s %s to %s PC 0x%" PRIx32 "\n", | ||
67 | + write_type == CPSRWriteExceptionReturn ? | ||
68 | + "Exception return from AArch32" : | ||
69 | + "AArch32 mode switch from", | ||
70 | + aarch32_mode_name(env->uncached_cpsr), | ||
71 | + aarch32_mode_name(val), env->regs[15]); | ||
72 | switch_mode(env, val & CPSR_M); | ||
73 | } | ||
74 | } | ||
75 | diff --git a/target/arm/translate.c b/target/arm/translate.c | ||
76 | index XXXXXXX..XXXXXXX 100644 | ||
77 | --- a/target/arm/translate.c | ||
78 | +++ b/target/arm/translate.c | ||
79 | @@ -XXX,XX +XXX,XX @@ void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb) | ||
80 | translator_loop(ops, &dc.base, cpu, tb); | ||
81 | } | ||
82 | |||
83 | -static const char *cpu_mode_names[16] = { | ||
84 | - "usr", "fiq", "irq", "svc", "???", "???", "mon", "abt", | ||
85 | - "???", "???", "hyp", "und", "???", "???", "???", "sys" | ||
86 | -}; | ||
87 | - | ||
88 | void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf, | ||
89 | int flags) | ||
90 | { | 45 | { |
91 | @@ -XXX,XX +XXX,XX @@ void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf, | 46 | /* |
92 | psr & CPSR_V ? 'V' : '-', | ||
93 | psr & CPSR_T ? 'T' : 'A', | ||
94 | ns_status, | ||
95 | - cpu_mode_names[psr & 0xf], (psr & 0x10) ? 32 : 26); | ||
96 | + aarch32_mode_name(psr), (psr & 0x10) ? 32 : 26); | ||
97 | } | ||
98 | |||
99 | if (flags & CPU_DUMP_FPU) { | ||
100 | -- | 47 | -- |
101 | 2.19.1 | 48 | 2.20.1 |
102 | 49 | ||
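
The two predicates added in the MVE patch above are used from the translator the same way as the existing aa32_* checks, via dc_isar_feature() (the next patch in the series does exactly that). Below is a rough standalone model of the logic; note the MVFR1.MVE field position of bits [11:8] is an assumption here, since the patch only accesses the field symbolically through FIELD_EX32().

    /* Standalone sketch, not QEMU code: models isar_feature_aa32_mve{,_fp}(). */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    static unsigned mvfr1_mve_field(uint32_t mvfr1)
    {
        return (mvfr1 >> 8) & 0xf;              /* assumed bit position [11:8] */
    }

    /* CheckDecodeFaults(ExtType_Mve): integer MVE is implemented */
    static bool mve_present(uint32_t mvfr1, bool is_mprofile)
    {
        return is_mprofile && mvfr1_mve_field(mvfr1) > 0;
    }

    /* CheckDecodeFaults(ExtType_MveFp): the floating-point part of MVE too */
    static bool mve_fp_present(uint32_t mvfr1, bool is_mprofile)
    {
        return is_mprofile && mvfr1_mve_field(mvfr1) >= 2;
    }

    int main(void)
    {
        printf("MVE=1: mve=%d mve_fp=%d\n",
               mve_present(1u << 8, true), mve_fp_present(1u << 8, true));
        printf("MVE=2: mve=%d mve_fp=%d\n",
               mve_present(2u << 8, true), mve_fp_present(2u << 8, true));
        return 0;
    }

The M-profile guard matters because the same MVFR1 bits encode a different field on A-profile, which is why both predicates check for M-profile before looking at the field at all.
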
1 | The HCR.DC virtualization configuration register bit has the | 1 | Some v8M instructions are present if either the floating point |
---|---|---|---|
2 | following effects: | 2 | extension or MVE is implemented. Update our implementation of them |
3 | * SCTLR.M behaves as if it is 0 for all purposes except | 3 | to check for MVE as well as for FP. |
4 | direct reads of the bit | ||
5 | * HCR.VM behaves as if it is 1 for all purposes except | ||
6 | direct reads of the bit | ||
7 | * the memory type produced by the first stage of the EL1&EL0 | ||
8 | translation regime is Normal Non-Shareable, | ||
9 | Inner Write-Back Read-Allocate Write-Allocate, | ||
10 | Outer Write-Back Read-Allocate Write-Allocate. | ||
11 | 4 | ||
12 | Implement this behaviour. | 5 | This is all the insns which use CheckDecodeFaults(ExtType_MveOrFp) or |
6 | CheckDecodeFaults(ExtType_MveOrDpFp) in their pseudocode, which are | ||
7 | essentially the loads and stores, moves and sysreg accesses, except | ||
8 | for VMOV_reg_sp and VMOV_reg_dp, which we handle in subsequent | ||
9 | patches because they need a refactor to provide a place to put the | ||
10 | new MVE check. | ||
13 | 11 | ||
14 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 12 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
15 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | 13 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> |
16 | Message-id: 20181012144235.19646-5-peter.maydell@linaro.org | 14 | Message-id: 20210520152840.24453-3-peter.maydell@linaro.org |
17 | --- | 15 | --- |
18 | target/arm/helper.c | 23 +++++++++++++++++++++-- | 16 | target/arm/translate-vfp.c | 48 +++++++++++++++++++++++--------------- |
19 | 1 file changed, 21 insertions(+), 2 deletions(-) | 17 | 1 file changed, 29 insertions(+), 19 deletions(-) |
20 | 18 | ||
21 | diff --git a/target/arm/helper.c b/target/arm/helper.c | 19 | diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c |
22 | index XXXXXXX..XXXXXXX 100644 | 20 | index XXXXXXX..XXXXXXX 100644 |
23 | --- a/target/arm/helper.c | 21 | --- a/target/arm/translate-vfp.c |
24 | +++ b/target/arm/helper.c | 22 | +++ b/target/arm/translate-vfp.c |
25 | @@ -XXX,XX +XXX,XX @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value, | 23 | @@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a) |
26 | * * The Non-secure TTBCR.EAE bit is set to 1 | 24 | /* VMOV scalar to general purpose register */ |
27 | * * The implementation includes EL2, and the value of HCR.VM is 1 | 25 | TCGv_i32 tmp; |
28 | * | 26 | |
29 | + * (Note that HCR.DC makes HCR.VM behave as if it is 1.) | 27 | - /* SIZE == MO_32 is a VFP instruction; otherwise NEON. */ |
30 | + * | 28 | - if (a->size == MO_32 |
31 | * ATS1Hx always uses the 64bit format (not supported yet). | 29 | - ? !dc_isar_feature(aa32_fpsp_v2, s) |
32 | */ | 30 | - : !arm_dc_feature(s, ARM_FEATURE_NEON)) { |
33 | format64 = arm_s1_regime_using_lpae_format(env, mmu_idx); | 31 | - return false; |
34 | 32 | + /* | |
35 | if (arm_feature(env, ARM_FEATURE_EL2)) { | 33 | + * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has |
36 | if (mmu_idx == ARMMMUIdx_S12NSE0 || mmu_idx == ARMMMUIdx_S12NSE1) { | 34 | + * all sizes, whether the CPU has fp or not. |
37 | - format64 |= env->cp15.hcr_el2 & HCR_VM; | 35 | + */ |
38 | + format64 |= env->cp15.hcr_el2 & (HCR_VM | HCR_DC); | 36 | + if (!dc_isar_feature(aa32_mve, s)) { |
39 | } else { | 37 | + if (a->size == MO_32 |
40 | format64 |= arm_current_el(env) == 2; | 38 | + ? !dc_isar_feature(aa32_fpsp_v2, s) |
41 | } | 39 | + : !arm_dc_feature(s, ARM_FEATURE_NEON)) { |
42 | @@ -XXX,XX +XXX,XX @@ static inline bool regime_translation_disabled(CPUARMState *env, | 40 | + return false; |
41 | + } | ||
43 | } | 42 | } |
44 | 43 | ||
45 | if (mmu_idx == ARMMMUIdx_S2NS) { | 44 | /* UNDEF accesses to D16-D31 if they don't exist */ |
46 | - return (env->cp15.hcr_el2 & HCR_VM) == 0; | 45 | @@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a) |
47 | + /* HCR.DC means HCR.VM behaves as 1 */ | 46 | /* VMOV general purpose register to scalar */ |
48 | + return (env->cp15.hcr_el2 & (HCR_DC | HCR_VM)) == 0; | 47 | TCGv_i32 tmp; |
48 | |||
49 | - /* SIZE == MO_32 is a VFP instruction; otherwise NEON. */ | ||
50 | - if (a->size == MO_32 | ||
51 | - ? !dc_isar_feature(aa32_fpsp_v2, s) | ||
52 | - : !arm_dc_feature(s, ARM_FEATURE_NEON)) { | ||
53 | - return false; | ||
54 | + /* | ||
55 | + * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has | ||
56 | + * all sizes, whether the CPU has fp or not. | ||
57 | + */ | ||
58 | + if (!dc_isar_feature(aa32_mve, s)) { | ||
59 | + if (a->size == MO_32 | ||
60 | + ? !dc_isar_feature(aa32_fpsp_v2, s) | ||
61 | + : !arm_dc_feature(s, ARM_FEATURE_NEON)) { | ||
62 | + return false; | ||
63 | + } | ||
49 | } | 64 | } |
50 | 65 | ||
51 | if (env->cp15.hcr_el2 & HCR_TGE) { | 66 | /* UNDEF accesses to D16-D31 if they don't exist */ |
52 | @@ -XXX,XX +XXX,XX @@ static inline bool regime_translation_disabled(CPUARMState *env, | 67 | @@ -XXX,XX +XXX,XX @@ typedef enum FPSysRegCheckResult { |
53 | } | 68 | |
69 | static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno) | ||
70 | { | ||
71 | - if (!dc_isar_feature(aa32_fpsp_v2, s)) { | ||
72 | + if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) { | ||
73 | return FPSysRegCheckFailed; | ||
54 | } | 74 | } |
55 | 75 | ||
56 | + if ((env->cp15.hcr_el2 & HCR_DC) && | 76 | @@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a) |
57 | + (mmu_idx == ARMMMUIdx_S1NSE0 || mmu_idx == ARMMMUIdx_S1NSE1)) { | 77 | { |
58 | + /* HCR.DC means SCTLR_EL1.M behaves as 0 */ | 78 | TCGv_i32 tmp; |
59 | + return true; | 79 | |
60 | + } | 80 | - if (!dc_isar_feature(aa32_fpsp_v2, s)) { |
61 | + | 81 | + if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) { |
62 | return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0; | 82 | return false; |
63 | } | 83 | } |
64 | 84 | ||
65 | @@ -XXX,XX +XXX,XX @@ static bool get_phys_addr(CPUARMState *env, target_ulong address, | 85 | @@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a) |
66 | 86 | { | |
67 | /* Combine the S1 and S2 cache attributes, if needed */ | 87 | TCGv_i32 tmp; |
68 | if (!ret && cacheattrs != NULL) { | 88 | |
69 | + if (env->cp15.hcr_el2 & HCR_DC) { | 89 | - if (!dc_isar_feature(aa32_fpsp_v2, s)) { |
70 | + /* | 90 | + if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) { |
71 | + * HCR.DC forces the first stage attributes to | 91 | return false; |
72 | + * Normal Non-Shareable, | 92 | } |
73 | + * Inner Write-Back Read-Allocate Write-Allocate, | 93 | |
74 | + * Outer Write-Back Read-Allocate Write-Allocate. | 94 | @@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a) |
75 | + */ | 95 | * floating point register. Note that this does not require support |
76 | + cacheattrs->attrs = 0xff; | 96 | * for double precision arithmetic. |
77 | + cacheattrs->shareability = 0; | 97 | */ |
78 | + } | 98 | - if (!dc_isar_feature(aa32_fpsp_v2, s)) { |
79 | *cacheattrs = combine_cacheattrs(*cacheattrs, cacheattrs2); | 99 | + if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) { |
80 | } | 100 | return false; |
101 | } | ||
102 | |||
103 | @@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a) | ||
104 | uint32_t offset; | ||
105 | TCGv_i32 addr, tmp; | ||
106 | |||
107 | - if (!dc_isar_feature(aa32_fp16_arith, s)) { | ||
108 | + if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) { | ||
109 | return false; | ||
110 | } | ||
111 | |||
112 | @@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a) | ||
113 | uint32_t offset; | ||
114 | TCGv_i32 addr, tmp; | ||
115 | |||
116 | - if (!dc_isar_feature(aa32_fpsp_v2, s)) { | ||
117 | + if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) { | ||
118 | return false; | ||
119 | } | ||
120 | |||
121 | @@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a) | ||
122 | TCGv_i64 tmp; | ||
123 | |||
124 | /* Note that this does not require support for double arithmetic. */ | ||
125 | - if (!dc_isar_feature(aa32_fpsp_v2, s)) { | ||
126 | + if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) { | ||
127 | return false; | ||
128 | } | ||
129 | |||
130 | @@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a) | ||
131 | TCGv_i32 addr, tmp; | ||
132 | int i, n; | ||
133 | |||
134 | - if (!dc_isar_feature(aa32_fpsp_v2, s)) { | ||
135 | + if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) { | ||
136 | return false; | ||
137 | } | ||
138 | |||
139 | @@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a) | ||
140 | int i, n; | ||
141 | |||
142 | /* Note that this does not require support for double arithmetic. */ | ||
143 | - if (!dc_isar_feature(aa32_fpsp_v2, s)) { | ||
144 | + if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) { | ||
145 | return false; | ||
146 | } | ||
81 | 147 | ||
82 | -- | 148 | -- |
83 | 2.19.1 | 149 | 2.20.1 |
84 | 150 | ||
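
The HCR.DC behaviour listed in the left-hand (HCR.DC) patch above can be summed up in a compact standalone sketch. This models the rules from the commit message rather than the QEMU helpers themselves; the 0xff attribute byte and shareability 0 mirror the values the patch writes.

    #include <stdbool.h>
    #include <stdint.h>

    struct cacheattrs { uint8_t attrs; uint8_t shareability; };

    /* HCR.DC makes SCTLR.M behave as 0: stage 1 translation is disabled */
    static bool stage1_disabled(bool sctlr_m, bool hcr_dc)
    {
        return hcr_dc || !sctlr_m;
    }

    /* HCR.DC makes HCR.VM behave as 1: stage 2 translation is enabled */
    static bool stage2_enabled(bool hcr_vm, bool hcr_dc)
    {
        return hcr_vm || hcr_dc;
    }

    /* HCR.DC forces the stage 1 memory type for the EL1&0 regime */
    static struct cacheattrs stage1_cacheattrs(struct cacheattrs s1, bool hcr_dc)
    {
        if (hcr_dc) {
            s1.attrs = 0xff;        /* Normal, Inner/Outer WB RA WA */
            s1.shareability = 0;    /* Non-Shareable */
        }
        return s1;
    }

    int main(void)
    {
        struct cacheattrs a = { 0x44, 2 };
        a = stage1_cacheattrs(a, true);
        return (stage1_disabled(true, true) && stage2_enabled(false, true) &&
                a.attrs == 0xff && a.shareability == 0) ? 0 : 1;
    }

Direct reads of SCTLR.M and HCR.VM still return the written values; only their effects are overridden, which is why the patch adjusts the translation helpers rather than the register state.
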
1 | The switch_mode() function is defined in target/arm/helper.c and used | 1 | The do_vfp_2op_sp() and do_vfp_2op_dp() functions currently check |
---|---|---|---|
2 | only in that file and nowhere else, so we can make it file-local | 2 | whether floating point is supported via the aa32_fpdp_v2 and |
3 | rather than global. | 3 | aa32_fpsp_v2 isar checks. For v8.1M MVE support, the VMOV_reg trans |
4 | functions (but not any of the others) need to update this to also | ||
5 | allow the insn if MVE is implemented. Move the check out of the do_ | ||
6 | function and into its callsites (which are all implemented via the | ||
7 | DO_VFP_2OP macro), so we have a place to change the check for the | ||
8 | VMOV insns. | ||
4 | 9 | ||
5 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
6 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | 11 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> |
7 | Message-id: 20181012144235.19646-3-peter.maydell@linaro.org | 12 | Message-id: 20210520152840.24453-4-peter.maydell@linaro.org |
8 | --- | 13 | --- |
9 | target/arm/internals.h | 1 - | 14 | target/arm/translate-vfp.c | 37 +++++++++++++++++++------------------ |
10 | target/arm/helper.c | 6 ++++-- | 15 | 1 file changed, 19 insertions(+), 18 deletions(-) |
11 | 2 files changed, 4 insertions(+), 3 deletions(-) | ||
12 | 16 | ||
13 | diff --git a/target/arm/internals.h b/target/arm/internals.h | 17 | diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c |
14 | index XXXXXXX..XXXXXXX 100644 | 18 | index XXXXXXX..XXXXXXX 100644 |
15 | --- a/target/arm/internals.h | 19 | --- a/target/arm/translate-vfp.c |
16 | +++ b/target/arm/internals.h | 20 | +++ b/target/arm/translate-vfp.c |
17 | @@ -XXX,XX +XXX,XX @@ static inline int bank_number(int mode) | 21 | @@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm) |
18 | g_assert_not_reached(); | 22 | int veclen = s->vec_len; |
23 | TCGv_i32 f0, fd; | ||
24 | |||
25 | - if (!dc_isar_feature(aa32_fpsp_v2, s)) { | ||
26 | - return false; | ||
27 | - } | ||
28 | + /* Note that the caller must check the aa32_fpsp_v2 feature. */ | ||
29 | |||
30 | if (!dc_isar_feature(aa32_fpshvec, s) && | ||
31 | (veclen != 0 || s->vec_stride != 0)) { | ||
32 | @@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm) | ||
33 | */ | ||
34 | TCGv_i32 f0; | ||
35 | |||
36 | + /* Note that the caller must check the aa32_fp16_arith feature */ | ||
37 | + | ||
38 | if (!dc_isar_feature(aa32_fp16_arith, s)) { | ||
39 | return false; | ||
40 | } | ||
41 | @@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm) | ||
42 | int veclen = s->vec_len; | ||
43 | TCGv_i64 f0, fd; | ||
44 | |||
45 | - if (!dc_isar_feature(aa32_fpdp_v2, s)) { | ||
46 | - return false; | ||
47 | - } | ||
48 | + /* Note that the caller must check the aa32_fpdp_v2 feature. */ | ||
49 | |||
50 | /* UNDEF accesses to D16-D31 if they don't exist */ | ||
51 | if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) { | ||
52 | @@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a) | ||
53 | return true; | ||
19 | } | 54 | } |
20 | 55 | ||
21 | -void switch_mode(CPUARMState *, int); | 56 | -#define DO_VFP_2OP(INSN, PREC, FN) \ |
22 | void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu); | 57 | +#define DO_VFP_2OP(INSN, PREC, FN, CHECK) \ |
23 | void arm_translate_init(void); | 58 | static bool trans_##INSN##_##PREC(DisasContext *s, \ |
24 | 59 | arg_##INSN##_##PREC *a) \ | |
25 | diff --git a/target/arm/helper.c b/target/arm/helper.c | 60 | { \ |
26 | index XXXXXXX..XXXXXXX 100644 | 61 | + if (!dc_isar_feature(CHECK, s)) { \ |
27 | --- a/target/arm/helper.c | 62 | + return false; \ |
28 | +++ b/target/arm/helper.c | 63 | + } \ |
29 | @@ -XXX,XX +XXX,XX @@ static void v8m_security_lookup(CPUARMState *env, uint32_t address, | 64 | return do_vfp_2op_##PREC(s, FN, a->vd, a->vm); \ |
30 | V8M_SAttributes *sattrs); | 65 | } |
31 | #endif | 66 | |
32 | 67 | -DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32) | |
33 | +static void switch_mode(CPUARMState *env, int mode); | 68 | -DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64) |
34 | + | 69 | +DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32, aa32_fpsp_v2) |
35 | static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg) | 70 | +DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64, aa32_fpdp_v2) |
71 | |||
72 | -DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh) | ||
73 | -DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss) | ||
74 | -DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd) | ||
75 | +DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith) | ||
76 | +DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2) | ||
77 | +DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd, aa32_fpdp_v2) | ||
78 | |||
79 | -DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh) | ||
80 | -DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs) | ||
81 | -DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd) | ||
82 | +DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh, aa32_fp16_arith) | ||
83 | +DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs, aa32_fpsp_v2) | ||
84 | +DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd, aa32_fpdp_v2) | ||
85 | |||
86 | static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm) | ||
36 | { | 87 | { |
37 | int nregs; | 88 | @@ -XXX,XX +XXX,XX @@ static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm) |
38 | @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(v7m_tt)(CPUARMState *env, uint32_t addr, uint32_t op) | 89 | gen_helper_vfp_sqrtd(vd, vm, cpu_env); |
39 | return 0; | ||
40 | } | 90 | } |
41 | 91 | ||
42 | -void switch_mode(CPUARMState *env, int mode) | 92 | -DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp) |
43 | +static void switch_mode(CPUARMState *env, int mode) | 93 | -DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp) |
94 | -DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp) | ||
95 | +DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp, aa32_fp16_arith) | ||
96 | +DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp, aa32_fpsp_v2) | ||
97 | +DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp, aa32_fpdp_v2) | ||
98 | |||
99 | static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a) | ||
44 | { | 100 | { |
45 | ARMCPU *cpu = arm_env_get_cpu(env); | ||
46 | |||
47 | @@ -XXX,XX +XXX,XX @@ void aarch64_sync_64_to_32(CPUARMState *env) | ||
48 | |||
49 | #else | ||
50 | |||
51 | -void switch_mode(CPUARMState *env, int mode) | ||
52 | +static void switch_mode(CPUARMState *env, int mode) | ||
53 | { | ||
54 | int old_mode; | ||
55 | int i; | ||
56 | -- | 101 | -- |
57 | 2.19.1 | 102 | 2.20.1 |
58 | 103 | ||
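
After the right-hand patch above, each DO_VFP_2OP() instantiation carries its own feature check, so a single expansion looks roughly like this (taking the VABS single-precision case; do_vfp_2op_sp() keeps its remaining internal checks such as the vector length/stride handling):

    static bool trans_VABS_sp(DisasContext *s, arg_VABS_sp *a)
    {
        /* The isar check now lives in the per-instruction caller... */
        if (!dc_isar_feature(aa32_fpsp_v2, s)) {
            return false;
        }
        /* ...so the shared helper no longer repeats it. */
        return do_vfp_2op_sp(s, gen_helper_vfp_abss, a->vd, a->vm);
    }
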
1 | For the v7 version of the Arm architecture, the IL bit in | 1 | Split out the handling of VMOV_reg_sp and VMOV_reg_dp so that we can |
---|---|---|---|
2 | syndrome register values where the field is not valid was | 2 | permit the insns if either FP or MVE is present. |
3 | defined to be UNK/SBZP. In v8 this is RES1, which is what | ||
4 | QEMU currently implements. Handle the desired v7 behaviour | ||
5 | by squashing the IL bit for the affected cases: | ||
6 | * EC == EC_UNCATEGORIZED | ||
7 | * prefetch aborts | ||
8 | * data aborts where ISV is 0 | ||
9 | |||
10 | (The fourth case listed in the v8 Arm ARM DDI 0487C.a in | ||
11 | section G7.2.70, "illegal state exception", can't happen | ||
12 | on a v7 CPU.) | ||
13 | |||
14 | This deals with a corner case noted in a comment. | ||
15 | 3 | ||
16 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 4 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
17 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | 5 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> |
18 | Message-id: 20181012144235.19646-10-peter.maydell@linaro.org | 6 | Message-id: 20210520152840.24453-5-peter.maydell@linaro.org |
19 | --- | 7 | --- |
20 | target/arm/internals.h | 7 ++----- | 8 | target/arm/translate-vfp.c | 15 +++++++++++++-- |
21 | target/arm/helper.c | 13 +++++++++++++ | 9 | 1 file changed, 13 insertions(+), 2 deletions(-) |
22 | 2 files changed, 15 insertions(+), 5 deletions(-) | ||
23 | 10 | ||
24 | diff --git a/target/arm/internals.h b/target/arm/internals.h | 11 | diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c |
25 | index XXXXXXX..XXXXXXX 100644 | 12 | index XXXXXXX..XXXXXXX 100644 |
26 | --- a/target/arm/internals.h | 13 | --- a/target/arm/translate-vfp.c |
27 | +++ b/target/arm/internals.h | 14 | +++ b/target/arm/translate-vfp.c |
28 | @@ -XXX,XX +XXX,XX @@ static inline uint32_t syn_get_ec(uint32_t syn) | 15 | @@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a) |
29 | /* Utility functions for constructing various kinds of syndrome value. | 16 | return do_vfp_2op_##PREC(s, FN, a->vd, a->vm); \ |
30 | * Note that in general we follow the AArch64 syndrome values; in a | ||
31 | * few cases the value in HSR for exceptions taken to AArch32 Hyp | ||
32 | - * mode differs slightly, so if we ever implemented Hyp mode then the | ||
33 | - * syndrome value would need some massaging on exception entry. | ||
34 | - * (One example of this is that AArch64 defaults to IL bit set for | ||
35 | - * exceptions which don't specifically indicate information about the | ||
36 | - * trapping instruction, whereas AArch32 defaults to IL bit clear.) | ||
37 | + * mode differs slightly, and we fix this up when populating HSR in | ||
38 | + * arm_cpu_do_interrupt_aarch32_hyp(). | ||
39 | */ | ||
40 | static inline uint32_t syn_uncategorized(void) | ||
41 | { | ||
42 | diff --git a/target/arm/helper.c b/target/arm/helper.c | ||
43 | index XXXXXXX..XXXXXXX 100644 | ||
44 | --- a/target/arm/helper.c | ||
45 | +++ b/target/arm/helper.c | ||
46 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch32_hyp(CPUState *cs) | ||
47 | } | 17 | } |
48 | 18 | ||
49 | if (cs->exception_index != EXCP_IRQ && cs->exception_index != EXCP_FIQ) { | 19 | -DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32, aa32_fpsp_v2) |
50 | + if (!arm_feature(env, ARM_FEATURE_V8)) { | 20 | -DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64, aa32_fpdp_v2) |
51 | + /* | 21 | +#define DO_VFP_VMOV(INSN, PREC, FN) \ |
52 | + * QEMU syndrome values are v8-style. v7 has the IL bit | 22 | + static bool trans_##INSN##_##PREC(DisasContext *s, \ |
53 | + * UNK/SBZP for "field not valid" cases, where v8 uses RES1. | 23 | + arg_##INSN##_##PREC *a) \ |
54 | + * If this is a v7 CPU, squash the IL bit in those cases. | 24 | + { \ |
55 | + */ | 25 | + if (!dc_isar_feature(aa32_fp##PREC##_v2, s) && \ |
56 | + if (cs->exception_index == EXCP_PREFETCH_ABORT || | 26 | + !dc_isar_feature(aa32_mve, s)) { \ |
57 | + (cs->exception_index == EXCP_DATA_ABORT && | 27 | + return false; \ |
58 | + !(env->exception.syndrome & ARM_EL_ISV)) || | 28 | + } \ |
59 | + syn_get_ec(env->exception.syndrome) == EC_UNCATEGORIZED) { | 29 | + return do_vfp_2op_##PREC(s, FN, a->vd, a->vm); \ |
60 | + env->exception.syndrome &= ~ARM_EL_IL; | 30 | + } |
61 | + } | 31 | + |
62 | + } | 32 | +DO_VFP_VMOV(VMOV_reg, sp, tcg_gen_mov_i32) |
63 | env->cp15.esr_el[2] = env->exception.syndrome; | 33 | +DO_VFP_VMOV(VMOV_reg, dp, tcg_gen_mov_i64) |
64 | } | 34 | |
65 | 35 | DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith) | |
36 | DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2) | ||
66 | -- | 37 | -- |
67 | 2.19.1 | 38 | 2.20.1 |
68 | 39 | ||
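
The v7 IL-bit fixup in the left-hand patch above can be modelled in isolation as follows. This is a standalone sketch, not the QEMU code; the layout used here — EC in bits [31:26], IL in bit 25, ISV in bit 24 of a data-abort syndrome, EC value 0 for "uncategorized" — is the architectural one.

    #include <stdbool.h>
    #include <stdint.h>

    #define SYN_IL           (1u << 25)
    #define SYN_ISV          (1u << 24)
    #define EC_UNCATEGORIZED 0u

    static unsigned syn_ec(uint32_t syndrome) { return syndrome >> 26; }

    /*
     * QEMU builds v8-style syndromes (IL is RES1 when the field is not
     * valid); on a v7 CPU those cases want IL as UNK/SBZP, so clear it.
     */
    static uint32_t v7_squash_il(uint32_t syndrome, bool prefetch_abort,
                                 bool data_abort)
    {
        if (prefetch_abort ||
            (data_abort && !(syndrome & SYN_ISV)) ||
            syn_ec(syndrome) == EC_UNCATEGORIZED) {
            syndrome &= ~SYN_IL;
        }
        return syndrome;
    }

    int main(void)
    {
        /* uncategorized exception (EC 0) with IL set: IL gets squashed */
        return v7_squash_il(SYN_IL, false, false) == 0 ? 0 : 1;
    }
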
1 | If the HCR_EL2 PTW virtualization configuration register bit | 1 | The fp_sysreg_checks() function is supposed to be returning an |
---|---|---|---|
2 | is set, then this means that a stage 2 Permission fault must | 2 | FPSysRegCheckResult, which is an enum with three possible values. |
3 | be generated if a stage 1 translation table access is made | 3 | However, three places in the function "return false" (a hangover from |
4 | to an address that is mapped as Device memory in stage 2. | 4 | a previous iteration of the design where the function just returned a |
5 | Implement this. | 5 | bool). Make these return FPSysRegCheckFailed instead (for no |
6 | functional change, since both false and FPSysRegCheckFailed are | ||
7 | zero). | ||
6 | 8 | ||
7 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
8 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | 10 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> |
9 | Message-id: 20181012144235.19646-8-peter.maydell@linaro.org | 11 | Message-id: 20210520152840.24453-6-peter.maydell@linaro.org |
10 | --- | 12 | --- |
11 | target/arm/helper.c | 21 ++++++++++++++++++++- | 13 | target/arm/translate-vfp.c | 6 +++--- |
12 | 1 file changed, 20 insertions(+), 1 deletion(-) | 14 | 1 file changed, 3 insertions(+), 3 deletions(-) |
13 | 15 | ||
14 | diff --git a/target/arm/helper.c b/target/arm/helper.c | 16 | diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c |
15 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
16 | --- a/target/arm/helper.c | 18 | --- a/target/arm/translate-vfp.c |
17 | +++ b/target/arm/helper.c | 19 | +++ b/target/arm/translate-vfp.c |
18 | @@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx, | 20 | @@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno) |
19 | hwaddr s2pa; | 21 | break; |
20 | int s2prot; | 22 | case ARM_VFP_FPSCR_NZCVQC: |
21 | int ret; | 23 | if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) { |
22 | + ARMCacheAttrs cacheattrs = {}; | 24 | - return false; |
23 | + ARMCacheAttrs *pcacheattrs = NULL; | 25 | + return FPSysRegCheckFailed; |
24 | + | ||
25 | + if (env->cp15.hcr_el2 & HCR_PTW) { | ||
26 | + /* | ||
27 | + * PTW means we must fault if this S1 walk touches S2 Device | ||
28 | + * memory; otherwise we don't care about the attributes and can | ||
29 | + * save the S2 translation the effort of computing them. | ||
30 | + */ | ||
31 | + pcacheattrs = &cacheattrs; | ||
32 | + } | ||
33 | |||
34 | ret = get_phys_addr_lpae(env, addr, 0, ARMMMUIdx_S2NS, &s2pa, | ||
35 | - &txattrs, &s2prot, &s2size, fi, NULL); | ||
36 | + &txattrs, &s2prot, &s2size, fi, pcacheattrs); | ||
37 | if (ret) { | ||
38 | assert(fi->type != ARMFault_None); | ||
39 | fi->s2addr = addr; | ||
40 | @@ -XXX,XX +XXX,XX @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx, | ||
41 | fi->s1ptw = true; | ||
42 | return ~0; | ||
43 | } | 26 | } |
44 | + if (pcacheattrs && (pcacheattrs->attrs & 0xf0) == 0) { | 27 | break; |
45 | + /* Access was to Device memory: generate Permission fault */ | 28 | case ARM_VFP_FPCXT_S: |
46 | + fi->type = ARMFault_Permission; | 29 | case ARM_VFP_FPCXT_NS: |
47 | + fi->s2addr = addr; | 30 | if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) { |
48 | + fi->stage2 = true; | 31 | - return false; |
49 | + fi->s1ptw = true; | 32 | + return FPSysRegCheckFailed; |
50 | + return ~0; | 33 | } |
51 | + } | 34 | if (!s->v8m_secure) { |
52 | addr = s2pa; | 35 | - return false; |
53 | } | 36 | + return FPSysRegCheckFailed; |
54 | return addr; | 37 | } |
38 | break; | ||
39 | default: | ||
55 | -- | 40 | -- |
56 | 2.19.1 | 41 | 2.20.1 |
57 | 42 | ||
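
The reason the stray "return false" statements fixed in the right-hand patch above were harmless is simply that the enum's failure value is its first (zero) enumerator, so the bool quietly converted to it. A standalone illustration — the names of the two non-failure enumerators below are stand-ins, since only FPSysRegCheckFailed appears in the excerpt:

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum {
        FPSysRegCheckFailed,   /* == 0, the same value as false */
        FPSysRegCheckOkA,      /* stand-in name */
        FPSysRegCheckOkB,      /* stand-in name */
    } FPSysRegCheckResult;

    static FPSysRegCheckResult old_style(bool ok)
    {
        if (!ok) {
            return false;      /* converts to FPSysRegCheckFailed (0) */
        }
        return FPSysRegCheckOkA;
    }

    int main(void)
    {
        printf("%d\n", old_style(false) == FPSysRegCheckFailed);   /* prints 1 */
        return 0;
    }

Returning the enumerator by name is purely a readability and type-hygiene fix, which is why the commit message notes there is no functional change.
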
1 | From: Dongjiu Geng <gengdongjiu@huawei.com> | 1 | If MVE is implemented for an M-profile CPU then it has a VPR |
---|---|---|---|
2 | register, which tracks predication information. | ||
2 | 3 | ||
3 | This patch extends the qemu-kvm state sync logic with support for | 4 | Implement the read and write handling of this register, and |
4 | KVM_GET/SET_VCPU_EVENTS, giving access to the previously missing SError | 5 | the migration of its state. |
5 | exception state, and adds support for migrating that state. | 6 | |
6 | 6 | ||
7 | The SError exception state consists of the SError pending state and the | ||
8 | ESR value; kvm_put/get_vcpu_events() is called when the system registers | ||
9 | are set or retrieved. On migration, if the source machine has an SError | ||
10 | pending, QEMU migrates it regardless of whether the target machine can | ||
11 | specify a guest ESR value, because a target that cannot do so can still | ||
12 | inject the SError with a zero ESR value. | ||
13 | |||
14 | Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> | ||
15 | Reviewed-by: Andrew Jones <drjones@redhat.com> | ||
16 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
17 | Message-id: 1538067351-23931-3-git-send-email-gengdongjiu@huawei.com | ||
18 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 7 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
8 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
9 | Message-id: 20210520152840.24453-7-peter.maydell@linaro.org | ||
19 | --- | 10 | --- |
20 | target/arm/cpu.h | 7 ++++++ | 11 | target/arm/cpu.h | 6 ++++++ |
21 | target/arm/kvm_arm.h | 24 ++++++++++++++++++ | 12 | target/arm/machine.c | 19 +++++++++++++++++++ |
22 | target/arm/kvm.c | 60 ++++++++++++++++++++++++++++++++++++++++++++ | 13 | target/arm/translate-vfp.c | 38 ++++++++++++++++++++++++++++++++++++++ |
23 | target/arm/kvm32.c | 13 ++++++++++ | 14 | 3 files changed, 63 insertions(+) |
24 | target/arm/kvm64.c | 13 ++++++++++ | ||
25 | target/arm/machine.c | 22 ++++++++++++++++ | ||
26 | 6 files changed, 139 insertions(+) | ||
27 | 15 | ||
28 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h | 16 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h |
29 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
30 | --- a/target/arm/cpu.h | 18 | --- a/target/arm/cpu.h |
31 | +++ b/target/arm/cpu.h | 19 | +++ b/target/arm/cpu.h |
32 | @@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState { | 20 | @@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState { |
33 | */ | 21 | uint32_t cpacr[M_REG_NUM_BANKS]; |
34 | } exception; | 22 | uint32_t nsacr; |
35 | 23 | int ltpsize; | |
36 | + /* Information associated with an SError */ | 24 | + uint32_t vpr; |
37 | + struct { | 25 | } v7m; |
38 | + uint8_t pending; | 26 | |
39 | + uint8_t has_esr; | 27 | /* Information associated with an exception about to be taken: |
40 | + uint64_t esr; | 28 | @@ -XXX,XX +XXX,XX @@ FIELD(V7M_FPCCR, ASPEN, 31, 1) |
41 | + } serror; | 29 | R_V7M_FPCCR_UFRDY_MASK | \ |
30 | R_V7M_FPCCR_ASPEN_MASK) | ||
31 | |||
32 | +/* v7M VPR bits */ | ||
33 | +FIELD(V7M_VPR, P0, 0, 16) | ||
34 | +FIELD(V7M_VPR, MASK01, 16, 4) | ||
35 | +FIELD(V7M_VPR, MASK23, 20, 4) | ||
42 | + | 36 | + |
43 | /* Thumb-2 EE state. */ | 37 | /* |
44 | uint32_t teecr; | 38 | * System register ID fields. |
45 | uint32_t teehbr; | ||
46 | diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h | ||
47 | index XXXXXXX..XXXXXXX 100644 | ||
48 | --- a/target/arm/kvm_arm.h | ||
49 | +++ b/target/arm/kvm_arm.h | ||
50 | @@ -XXX,XX +XXX,XX @@ bool write_kvmstate_to_list(ARMCPU *cpu); | ||
51 | */ | 39 | */ |
52 | void kvm_arm_reset_vcpu(ARMCPU *cpu); | ||
53 | |||
54 | +/** | ||
55 | + * kvm_arm_init_serror_injection: | ||
56 | + * @cs: CPUState | ||
57 | + * | ||
58 | + * Check whether KVM can set guest SError syndrome. | ||
59 | + */ | ||
60 | +void kvm_arm_init_serror_injection(CPUState *cs); | ||
61 | + | ||
62 | +/** | ||
63 | + * kvm_get_vcpu_events: | ||
64 | + * @cpu: ARMCPU | ||
65 | + * | ||
66 | + * Get VCPU related state from kvm. | ||
67 | + */ | ||
68 | +int kvm_get_vcpu_events(ARMCPU *cpu); | ||
69 | + | ||
70 | +/** | ||
71 | + * kvm_put_vcpu_events: | ||
72 | + * @cpu: ARMCPU | ||
73 | + * | ||
74 | + * Put VCPU related state to kvm. | ||
75 | + */ | ||
76 | +int kvm_put_vcpu_events(ARMCPU *cpu); | ||
77 | + | ||
78 | #ifdef CONFIG_KVM | ||
79 | /** | ||
80 | * kvm_arm_create_scratch_host_vcpu: | ||
81 | diff --git a/target/arm/kvm.c b/target/arm/kvm.c | ||
82 | index XXXXXXX..XXXXXXX 100644 | ||
83 | --- a/target/arm/kvm.c | ||
84 | +++ b/target/arm/kvm.c | ||
85 | @@ -XXX,XX +XXX,XX @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = { | ||
86 | }; | ||
87 | |||
88 | static bool cap_has_mp_state; | ||
89 | +static bool cap_has_inject_serror_esr; | ||
90 | |||
91 | static ARMHostCPUFeatures arm_host_cpu_features; | ||
92 | |||
93 | @@ -XXX,XX +XXX,XX @@ int kvm_arm_vcpu_init(CPUState *cs) | ||
94 | return kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_INIT, &init); | ||
95 | } | ||
96 | |||
97 | +void kvm_arm_init_serror_injection(CPUState *cs) | ||
98 | +{ | ||
99 | + cap_has_inject_serror_esr = kvm_check_extension(cs->kvm_state, | ||
100 | + KVM_CAP_ARM_INJECT_SERROR_ESR); | ||
101 | +} | ||
102 | + | ||
103 | bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try, | ||
104 | int *fdarray, | ||
105 | struct kvm_vcpu_init *init) | ||
106 | @@ -XXX,XX +XXX,XX @@ int kvm_arm_sync_mpstate_to_qemu(ARMCPU *cpu) | ||
107 | return 0; | ||
108 | } | ||
109 | |||
110 | +int kvm_put_vcpu_events(ARMCPU *cpu) | ||
111 | +{ | ||
112 | + CPUARMState *env = &cpu->env; | ||
113 | + struct kvm_vcpu_events events; | ||
114 | + int ret; | ||
115 | + | ||
116 | + if (!kvm_has_vcpu_events()) { | ||
117 | + return 0; | ||
118 | + } | ||
119 | + | ||
120 | + memset(&events, 0, sizeof(events)); | ||
121 | + events.exception.serror_pending = env->serror.pending; | ||
122 | + | ||
123 | + /* Inject SError to guest with specified syndrome if host kernel | ||
124 | + * supports it, otherwise inject SError without syndrome. | ||
125 | + */ | ||
126 | + if (cap_has_inject_serror_esr) { | ||
127 | + events.exception.serror_has_esr = env->serror.has_esr; | ||
128 | + events.exception.serror_esr = env->serror.esr; | ||
129 | + } | ||
130 | + | ||
131 | + ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events); | ||
132 | + if (ret) { | ||
133 | + error_report("failed to put vcpu events"); | ||
134 | + } | ||
135 | + | ||
136 | + return ret; | ||
137 | +} | ||
138 | + | ||
139 | +int kvm_get_vcpu_events(ARMCPU *cpu) | ||
140 | +{ | ||
141 | + CPUARMState *env = &cpu->env; | ||
142 | + struct kvm_vcpu_events events; | ||
143 | + int ret; | ||
144 | + | ||
145 | + if (!kvm_has_vcpu_events()) { | ||
146 | + return 0; | ||
147 | + } | ||
148 | + | ||
149 | + memset(&events, 0, sizeof(events)); | ||
150 | + ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_VCPU_EVENTS, &events); | ||
151 | + if (ret) { | ||
152 | + error_report("failed to get vcpu events"); | ||
153 | + return ret; | ||
154 | + } | ||
155 | + | ||
156 | + env->serror.pending = events.exception.serror_pending; | ||
157 | + env->serror.has_esr = events.exception.serror_has_esr; | ||
158 | + env->serror.esr = events.exception.serror_esr; | ||
159 | + | ||
160 | + return 0; | ||
161 | +} | ||
162 | + | ||
163 | void kvm_arch_pre_run(CPUState *cs, struct kvm_run *run) | ||
164 | { | ||
165 | } | ||
166 | diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c | ||
167 | index XXXXXXX..XXXXXXX 100644 | ||
168 | --- a/target/arm/kvm32.c | ||
169 | +++ b/target/arm/kvm32.c | ||
170 | @@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs) | ||
171 | } | ||
172 | cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK; | ||
173 | |||
174 | + /* Check whether userspace can specify guest syndrome value */ | ||
175 | + kvm_arm_init_serror_injection(cs); | ||
176 | + | ||
177 | return kvm_arm_init_cpreg_list(cpu); | ||
178 | } | ||
179 | |||
180 | @@ -XXX,XX +XXX,XX @@ int kvm_arch_put_registers(CPUState *cs, int level) | ||
181 | return ret; | ||
182 | } | ||
183 | |||
184 | + ret = kvm_put_vcpu_events(cpu); | ||
185 | + if (ret) { | ||
186 | + return ret; | ||
187 | + } | ||
188 | + | ||
189 | /* Note that we do not call write_cpustate_to_list() | ||
190 | * here, so we are only writing the tuple list back to | ||
191 | * KVM. This is safe because nothing can change the | ||
192 | @@ -XXX,XX +XXX,XX @@ int kvm_arch_get_registers(CPUState *cs) | ||
193 | } | ||
194 | vfp_set_fpscr(env, fpscr); | ||
195 | |||
196 | + ret = kvm_get_vcpu_events(cpu); | ||
197 | + if (ret) { | ||
198 | + return ret; | ||
199 | + } | ||
200 | + | ||
201 | if (!write_kvmstate_to_list(cpu)) { | ||
202 | return EINVAL; | ||
203 | } | ||
204 | diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c | ||
205 | index XXXXXXX..XXXXXXX 100644 | ||
206 | --- a/target/arm/kvm64.c | ||
207 | +++ b/target/arm/kvm64.c | ||
208 | @@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs) | ||
209 | |||
210 | kvm_arm_init_debug(cs); | ||
211 | |||
212 | + /* Check whether user space can specify guest syndrome value */ | ||
213 | + kvm_arm_init_serror_injection(cs); | ||
214 | + | ||
215 | return kvm_arm_init_cpreg_list(cpu); | ||
216 | } | ||
217 | |||
218 | @@ -XXX,XX +XXX,XX @@ int kvm_arch_put_registers(CPUState *cs, int level) | ||
219 | return ret; | ||
220 | } | ||
221 | |||
222 | + ret = kvm_put_vcpu_events(cpu); | ||
223 | + if (ret) { | ||
224 | + return ret; | ||
225 | + } | ||
226 | + | ||
227 | if (!write_list_to_kvmstate(cpu, level)) { | ||
228 | return EINVAL; | ||
229 | } | ||
230 | @@ -XXX,XX +XXX,XX @@ int kvm_arch_get_registers(CPUState *cs) | ||
231 | } | ||
232 | vfp_set_fpcr(env, fpr); | ||
233 | |||
234 | + ret = kvm_get_vcpu_events(cpu); | ||
235 | + if (ret) { | ||
236 | + return ret; | ||
237 | + } | ||
238 | + | ||
239 | if (!write_kvmstate_to_list(cpu)) { | ||
240 | return EINVAL; | ||
241 | } | ||
242 | diff --git a/target/arm/machine.c b/target/arm/machine.c | 40 | diff --git a/target/arm/machine.c b/target/arm/machine.c |
243 | index XXXXXXX..XXXXXXX 100644 | 41 | index XXXXXXX..XXXXXXX 100644 |
244 | --- a/target/arm/machine.c | 42 | --- a/target/arm/machine.c |
245 | +++ b/target/arm/machine.c | 43 | +++ b/target/arm/machine.c |
246 | @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_sve = { | 44 | @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_fp = { |
45 | } | ||
247 | }; | 46 | }; |
248 | #endif /* AARCH64 */ | 47 | |
249 | 48 | +static bool mve_needed(void *opaque) | |
250 | +static bool serror_needed(void *opaque) | ||
251 | +{ | 49 | +{ |
252 | + ARMCPU *cpu = opaque; | 50 | + ARMCPU *cpu = opaque; |
253 | + CPUARMState *env = &cpu->env; | ||
254 | + | 51 | + |
255 | + return env->serror.pending != 0; | 52 | + return cpu_isar_feature(aa32_mve, cpu); |
256 | +} | 53 | +} |
257 | + | 54 | + |
258 | +static const VMStateDescription vmstate_serror = { | 55 | +static const VMStateDescription vmstate_m_mve = { |
259 | + .name = "cpu/serror", | 56 | + .name = "cpu/m/mve", |
260 | + .version_id = 1, | 57 | + .version_id = 1, |
261 | + .minimum_version_id = 1, | 58 | + .minimum_version_id = 1, |
262 | + .needed = serror_needed, | 59 | + .needed = mve_needed, |
263 | + .fields = (VMStateField[]) { | 60 | + .fields = (VMStateField[]) { |
264 | + VMSTATE_UINT8(env.serror.pending, ARMCPU), | 61 | + VMSTATE_UINT32(env.v7m.vpr, ARMCPU), |
265 | + VMSTATE_UINT8(env.serror.has_esr, ARMCPU), | ||
266 | + VMSTATE_UINT64(env.serror.esr, ARMCPU), | ||
267 | + VMSTATE_END_OF_LIST() | 62 | + VMSTATE_END_OF_LIST() |
268 | + } | 63 | + }, |
269 | +}; | 64 | +}; |
270 | + | 65 | + |
271 | static bool m_needed(void *opaque) | 66 | static const VMStateDescription vmstate_m = { |
272 | { | 67 | .name = "cpu/m", |
273 | ARMCPU *cpu = opaque; | 68 | .version_id = 4, |
274 | @@ -XXX,XX +XXX,XX @@ const VMStateDescription vmstate_arm_cpu = { | 69 | @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m = { |
275 | #ifdef TARGET_AARCH64 | 70 | &vmstate_m_other_sp, |
276 | &vmstate_sve, | 71 | &vmstate_m_v8m, |
277 | #endif | 72 | &vmstate_m_fp, |
278 | + &vmstate_serror, | 73 | + &vmstate_m_mve, |
279 | NULL | 74 | NULL |
280 | } | 75 | } |
281 | }; | 76 | }; |
77 | diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c | ||
78 | index XXXXXXX..XXXXXXX 100644 | ||
79 | --- a/target/arm/translate-vfp.c | ||
80 | +++ b/target/arm/translate-vfp.c | ||
81 | @@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno) | ||
82 | return FPSysRegCheckFailed; | ||
83 | } | ||
84 | break; | ||
85 | + case ARM_VFP_VPR: | ||
86 | + case ARM_VFP_P0: | ||
87 | + if (!dc_isar_feature(aa32_mve, s)) { | ||
88 | + return FPSysRegCheckFailed; | ||
89 | + } | ||
90 | + break; | ||
91 | default: | ||
92 | return FPSysRegCheckFailed; | ||
93 | } | ||
94 | @@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno, | ||
95 | tcg_temp_free_i32(sfpa); | ||
96 | break; | ||
97 | } | ||
98 | + case ARM_VFP_VPR: | ||
99 | + /* Behaves as NOP if not privileged */ | ||
100 | + if (IS_USER(s)) { | ||
101 | + break; | ||
102 | + } | ||
103 | + tmp = loadfn(s, opaque); | ||
104 | + store_cpu_field(tmp, v7m.vpr); | ||
105 | + break; | ||
106 | + case ARM_VFP_P0: | ||
107 | + { | ||
108 | + TCGv_i32 vpr; | ||
109 | + tmp = loadfn(s, opaque); | ||
110 | + vpr = load_cpu_field(v7m.vpr); | ||
111 | + tcg_gen_deposit_i32(vpr, vpr, tmp, | ||
112 | + R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH); | ||
113 | + store_cpu_field(vpr, v7m.vpr); | ||
114 | + tcg_temp_free_i32(tmp); | ||
115 | + break; | ||
116 | + } | ||
117 | default: | ||
118 | g_assert_not_reached(); | ||
119 | } | ||
120 | @@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno, | ||
121 | tcg_temp_free_i32(fpscr); | ||
122 | break; | ||
123 | } | ||
124 | + case ARM_VFP_VPR: | ||
125 | + /* Behaves as NOP if not privileged */ | ||
126 | + if (IS_USER(s)) { | ||
127 | + break; | ||
128 | + } | ||
129 | + tmp = load_cpu_field(v7m.vpr); | ||
130 | + storefn(s, opaque, tmp); | ||
131 | + break; | ||
132 | + case ARM_VFP_P0: | ||
133 | + tmp = load_cpu_field(v7m.vpr); | ||
134 | + tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH); | ||
135 | + storefn(s, opaque, tmp); | ||
136 | + break; | ||
137 | default: | ||
138 | g_assert_not_reached(); | ||
139 | } | ||
282 | -- | 140 | -- |
283 | 2.19.1 | 141 | 2.20.1 |
284 | 142 | ||
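As a side note on the VPR/P0 handling above: the new translate-vfp.c cases read and write P0 as a bit-field of VPR via tcg_gen_extract_i32()/tcg_gen_deposit_i32(). The standalone sketch below mirrors that arithmetic; the field position (P0 assumed to sit in the low 16 bits of VPR) is a placeholder inferred from the R_V7M_VPR_P0_* names, not something stated in this diff.

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/* Assumed field layout: P0 in the low 16 bits of VPR. These two values are
 * placeholders standing in for R_V7M_VPR_P0_SHIFT / R_V7M_VPR_P0_LENGTH. */
#define P0_SHIFT  0
#define P0_LENGTH 16

/* What tcg_gen_extract_i32() computes: an unsigned bit-field read. */
static uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

/* What tcg_gen_deposit_i32() computes: overwrite one bit-field, keep the rest. */
static uint32_t deposit32(uint32_t value, int start, int length, uint32_t field)
{
    uint32_t mask = (~0u >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

int main(void)
{
    uint32_t vpr = 0xabcd1234;                            /* some VPR state */
    uint32_t p0  = extract32(vpr, P0_SHIFT, P0_LENGTH);   /* MRS-style P0 read */

    printf("P0 read            = 0x%04" PRIx32 "\n", p0); /* 0x1234 */

    vpr = deposit32(vpr, P0_SHIFT, P0_LENGTH, 0xffff);    /* MSR-style P0 write */
    printf("VPR after P0 write = 0x%08" PRIx32 "\n", vpr);/* 0xabcdffff */
    return 0;
}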
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | The M-profile FPSCR has an LTPSIZE field, but if MVE is not |
---|---|---|---|
2 | implemented, it is read-only and always reads as 4; this is how QEMU | ||
3 | currently handles it. | ||
2 | 4 | ||
3 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | 5 | Make the field writable when MVE is implemented. |
4 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 6 | |
5 | Message-id: 20181016223115.24100-8-richard.henderson@linaro.org | 7 | We can safely add the field to the MVE migration struct because |
6 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 8 | currently no CPUs enable MVE and so the migration struct is never |
9 | used. | ||
10 | |||
7 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 11 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
12 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
13 | Message-id: 20210520152840.24453-8-peter.maydell@linaro.org | ||
8 | --- | 14 | --- |
9 | target/arm/cpu.h | 16 +++++++++++++++- | 15 | target/arm/cpu.h | 3 ++- |
10 | linux-user/aarch64/signal.c | 4 ++-- | 16 | target/arm/machine.c | 1 + |
11 | linux-user/elfload.c | 2 +- | 17 | target/arm/vfp_helper.c | 9 ++++++--- |
12 | linux-user/syscall.c | 10 ++++++---- | 18 | 3 files changed, 9 insertions(+), 4 deletions(-) |
13 | target/arm/cpu64.c | 5 ++++- | ||
14 | target/arm/helper.c | 9 ++++++--- | ||
15 | target/arm/machine.c | 3 +-- | ||
16 | target/arm/translate-a64.c | 4 ++-- | ||
17 | 8 files changed, 37 insertions(+), 16 deletions(-) | ||
18 | 19 | ||
19 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h | 20 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h |
20 | index XXXXXXX..XXXXXXX 100644 | 21 | index XXXXXXX..XXXXXXX 100644 |
21 | --- a/target/arm/cpu.h | 22 | --- a/target/arm/cpu.h |
22 | +++ b/target/arm/cpu.h | 23 | +++ b/target/arm/cpu.h |
23 | @@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64ISAR1, FRINTTS, 32, 4) | 24 | @@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState { |
24 | FIELD(ID_AA64ISAR1, SB, 36, 4) | 25 | uint32_t fpdscr[M_REG_NUM_BANKS]; |
25 | FIELD(ID_AA64ISAR1, SPECRES, 40, 4) | 26 | uint32_t cpacr[M_REG_NUM_BANKS]; |
26 | 27 | uint32_t nsacr; | |
27 | +FIELD(ID_AA64PFR0, EL0, 0, 4) | 28 | - int ltpsize; |
28 | +FIELD(ID_AA64PFR0, EL1, 4, 4) | 29 | + uint32_t ltpsize; |
29 | +FIELD(ID_AA64PFR0, EL2, 8, 4) | 30 | uint32_t vpr; |
30 | +FIELD(ID_AA64PFR0, EL3, 12, 4) | 31 | } v7m; |
31 | +FIELD(ID_AA64PFR0, FP, 16, 4) | 32 | |
32 | +FIELD(ID_AA64PFR0, ADVSIMD, 20, 4) | 33 | @@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val); |
33 | +FIELD(ID_AA64PFR0, GIC, 24, 4) | 34 | |
34 | +FIELD(ID_AA64PFR0, RAS, 28, 4) | 35 | #define FPCR_LTPSIZE_SHIFT 16 /* LTPSIZE, M-profile only */ |
35 | +FIELD(ID_AA64PFR0, SVE, 32, 4) | 36 | #define FPCR_LTPSIZE_MASK (7 << FPCR_LTPSIZE_SHIFT) |
36 | + | 37 | +#define FPCR_LTPSIZE_LENGTH 3 |
37 | QEMU_BUILD_BUG_ON(ARRAY_SIZE(((ARMCPU *)0)->ccsidr) <= R_V7M_CSSELR_INDEX_MASK); | 38 | |
38 | 39 | #define FPCR_NZCV_MASK (FPCR_N | FPCR_Z | FPCR_C | FPCR_V) | |
39 | /* If adding a feature bit which corresponds to a Linux ELF | 40 | #define FPCR_NZCVQC_MASK (FPCR_NZCV_MASK | FPCR_QC) |
40 | @@ -XXX,XX +XXX,XX @@ enum arm_features { | ||
41 | ARM_FEATURE_PMU, /* has PMU support */ | ||
42 | ARM_FEATURE_VBAR, /* has cp15 VBAR */ | ||
43 | ARM_FEATURE_M_SECURITY, /* M profile Security Extension */ | ||
44 | - ARM_FEATURE_SVE, /* has Scalable Vector Extension */ | ||
45 | ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */ | ||
46 | ARM_FEATURE_M_MAIN, /* M profile Main Extension */ | ||
47 | }; | ||
48 | @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_fcma(const ARMISARegisters *id) | ||
49 | return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FCMA) != 0; | ||
50 | } | ||
51 | |||
52 | +static inline bool isar_feature_aa64_sve(const ARMISARegisters *id) | ||
53 | +{ | ||
54 | + return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, SVE) != 0; | ||
55 | +} | ||
56 | + | ||
57 | /* | ||
58 | * Forward to the above feature tests given an ARMCPU pointer. | ||
59 | */ | ||
60 | diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c | ||
61 | index XXXXXXX..XXXXXXX 100644 | ||
62 | --- a/linux-user/aarch64/signal.c | ||
63 | +++ b/linux-user/aarch64/signal.c | ||
64 | @@ -XXX,XX +XXX,XX @@ static int target_restore_sigframe(CPUARMState *env, | ||
65 | break; | ||
66 | |||
67 | case TARGET_SVE_MAGIC: | ||
68 | - if (arm_feature(env, ARM_FEATURE_SVE)) { | ||
69 | + if (cpu_isar_feature(aa64_sve, arm_env_get_cpu(env))) { | ||
70 | vq = (env->vfp.zcr_el[1] & 0xf) + 1; | ||
71 | sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16); | ||
72 | if (!sve && size == sve_size) { | ||
73 | @@ -XXX,XX +XXX,XX @@ static void target_setup_frame(int usig, struct target_sigaction *ka, | ||
74 | &layout); | ||
75 | |||
76 | /* SVE state needs saving only if it exists. */ | ||
77 | - if (arm_feature(env, ARM_FEATURE_SVE)) { | ||
78 | + if (cpu_isar_feature(aa64_sve, arm_env_get_cpu(env))) { | ||
79 | vq = (env->vfp.zcr_el[1] & 0xf) + 1; | ||
80 | sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16); | ||
81 | sve_ofs = alloc_sigframe_space(sve_size, &layout); | ||
82 | diff --git a/linux-user/elfload.c b/linux-user/elfload.c | ||
83 | index XXXXXXX..XXXXXXX 100644 | ||
84 | --- a/linux-user/elfload.c | ||
85 | +++ b/linux-user/elfload.c | ||
86 | @@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void) | ||
87 | GET_FEATURE_ID(aa64_rdm, ARM_HWCAP_A64_ASIMDRDM); | ||
88 | GET_FEATURE_ID(aa64_dp, ARM_HWCAP_A64_ASIMDDP); | ||
89 | GET_FEATURE_ID(aa64_fcma, ARM_HWCAP_A64_FCMA); | ||
90 | - GET_FEATURE(ARM_FEATURE_SVE, ARM_HWCAP_A64_SVE); | ||
91 | + GET_FEATURE_ID(aa64_sve, ARM_HWCAP_A64_SVE); | ||
92 | |||
93 | #undef GET_FEATURE | ||
94 | #undef GET_FEATURE_ID | ||
95 | diff --git a/linux-user/syscall.c b/linux-user/syscall.c | ||
96 | index XXXXXXX..XXXXXXX 100644 | ||
97 | --- a/linux-user/syscall.c | ||
98 | +++ b/linux-user/syscall.c | ||
99 | @@ -XXX,XX +XXX,XX @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, | ||
100 | * even though the current architectural maximum is VQ=16. | ||
101 | */ | ||
102 | ret = -TARGET_EINVAL; | ||
103 | - if (arm_feature(cpu_env, ARM_FEATURE_SVE) | ||
104 | + if (cpu_isar_feature(aa64_sve, arm_env_get_cpu(cpu_env)) | ||
105 | && arg2 >= 0 && arg2 <= 512 * 16 && !(arg2 & 15)) { | ||
106 | CPUARMState *env = cpu_env; | ||
107 | ARMCPU *cpu = arm_env_get_cpu(env); | ||
108 | @@ -XXX,XX +XXX,XX @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1, | ||
109 | return ret; | ||
110 | case TARGET_PR_SVE_GET_VL: | ||
111 | ret = -TARGET_EINVAL; | ||
112 | - if (arm_feature(cpu_env, ARM_FEATURE_SVE)) { | ||
113 | - CPUARMState *env = cpu_env; | ||
114 | - ret = ((env->vfp.zcr_el[1] & 0xf) + 1) * 16; | ||
115 | + { | ||
116 | + ARMCPU *cpu = arm_env_get_cpu(cpu_env); | ||
117 | + if (cpu_isar_feature(aa64_sve, cpu)) { | ||
118 | + ret = ((cpu->env.vfp.zcr_el[1] & 0xf) + 1) * 16; | ||
119 | + } | ||
120 | } | ||
121 | return ret; | ||
122 | #endif /* AARCH64 */ | ||
123 | diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c | ||
124 | index XXXXXXX..XXXXXXX 100644 | ||
125 | --- a/target/arm/cpu64.c | ||
126 | +++ b/target/arm/cpu64.c | ||
127 | @@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj) | ||
128 | t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1); | ||
129 | cpu->isar.id_aa64isar1 = t; | ||
130 | |||
131 | + t = cpu->isar.id_aa64pfr0; | ||
132 | + t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1); | ||
133 | + cpu->isar.id_aa64pfr0 = t; | ||
134 | + | ||
135 | /* Replicate the same data to the 32-bit id registers. */ | ||
136 | u = cpu->isar.id_isar5; | ||
137 | u = FIELD_DP32(u, ID_ISAR5, AES, 2); /* AES + PMULL */ | ||
138 | @@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj) | ||
139 | * present in either. | ||
140 | */ | ||
141 | set_feature(&cpu->env, ARM_FEATURE_V8_FP16); | ||
142 | - set_feature(&cpu->env, ARM_FEATURE_SVE); | ||
143 | /* For usermode -cpu max we can use a larger and more efficient DCZ | ||
144 | * blocksize since we don't have to follow what the hardware does. | ||
145 | */ | ||
146 | diff --git a/target/arm/helper.c b/target/arm/helper.c | ||
147 | index XXXXXXX..XXXXXXX 100644 | ||
148 | --- a/target/arm/helper.c | ||
149 | +++ b/target/arm/helper.c | ||
150 | @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) | ||
151 | define_one_arm_cp_reg(cpu, &sctlr); | ||
152 | } | ||
153 | |||
154 | - if (arm_feature(env, ARM_FEATURE_SVE)) { | ||
155 | + if (cpu_isar_feature(aa64_sve, cpu)) { | ||
156 | define_one_arm_cp_reg(cpu, &zcr_el1_reginfo); | ||
157 | if (arm_feature(env, ARM_FEATURE_EL2)) { | ||
158 | define_one_arm_cp_reg(cpu, &zcr_el2_reginfo); | ||
159 | @@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc, | ||
160 | uint32_t flags; | ||
161 | |||
162 | if (is_a64(env)) { | ||
163 | + ARMCPU *cpu = arm_env_get_cpu(env); | ||
164 | + | ||
165 | *pc = env->pc; | ||
166 | flags = ARM_TBFLAG_AARCH64_STATE_MASK; | ||
167 | /* Get control bits for tagged addresses */ | ||
168 | flags |= (arm_regime_tbi0(env, mmu_idx) << ARM_TBFLAG_TBI0_SHIFT); | ||
169 | flags |= (arm_regime_tbi1(env, mmu_idx) << ARM_TBFLAG_TBI1_SHIFT); | ||
170 | |||
171 | - if (arm_feature(env, ARM_FEATURE_SVE)) { | ||
172 | + if (cpu_isar_feature(aa64_sve, cpu)) { | ||
173 | int sve_el = sve_exception_el(env, current_el); | ||
174 | uint32_t zcr_len; | ||
175 | |||
176 | @@ -XXX,XX +XXX,XX @@ void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq) | ||
177 | void aarch64_sve_change_el(CPUARMState *env, int old_el, | ||
178 | int new_el, bool el0_a64) | ||
179 | { | ||
180 | + ARMCPU *cpu = arm_env_get_cpu(env); | ||
181 | int old_len, new_len; | ||
182 | bool old_a64, new_a64; | ||
183 | |||
184 | /* Nothing to do if no SVE. */ | ||
185 | - if (!arm_feature(env, ARM_FEATURE_SVE)) { | ||
186 | + if (!cpu_isar_feature(aa64_sve, cpu)) { | ||
187 | return; | ||
188 | } | ||
189 | |||
190 | diff --git a/target/arm/machine.c b/target/arm/machine.c | 41 | diff --git a/target/arm/machine.c b/target/arm/machine.c |
191 | index XXXXXXX..XXXXXXX 100644 | 42 | index XXXXXXX..XXXXXXX 100644 |
192 | --- a/target/arm/machine.c | 43 | --- a/target/arm/machine.c |
193 | +++ b/target/arm/machine.c | 44 | +++ b/target/arm/machine.c |
194 | @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_iwmmxt = { | 45 | @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_mve = { |
195 | static bool sve_needed(void *opaque) | 46 | .needed = mve_needed, |
47 | .fields = (VMStateField[]) { | ||
48 | VMSTATE_UINT32(env.v7m.vpr, ARMCPU), | ||
49 | + VMSTATE_UINT32(env.v7m.ltpsize, ARMCPU), | ||
50 | VMSTATE_END_OF_LIST() | ||
51 | }, | ||
52 | }; | ||
53 | diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c | ||
54 | index XXXXXXX..XXXXXXX 100644 | ||
55 | --- a/target/arm/vfp_helper.c | ||
56 | +++ b/target/arm/vfp_helper.c | ||
57 | @@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpscr(CPUARMState *env) | ||
58 | |||
59 | void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val) | ||
196 | { | 60 | { |
197 | ARMCPU *cpu = opaque; | 61 | + ARMCPU *cpu = env_archcpu(env); |
198 | - CPUARMState *env = &cpu->env; | 62 | + |
199 | 63 | /* When ARMv8.2-FP16 is not supported, FZ16 is RES0. */ | |
200 | - return arm_feature(env, ARM_FEATURE_SVE); | 64 | - if (!cpu_isar_feature(any_fp16, env_archcpu(env))) { |
201 | + return cpu_isar_feature(aa64_sve, cpu); | 65 | + if (!cpu_isar_feature(any_fp16, cpu)) { |
202 | } | 66 | val &= ~FPCR_FZ16; |
203 | 67 | } | |
204 | /* The first two words of each Zreg is stored in VFP state. */ | 68 | |
205 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | 69 | @@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val) |
206 | index XXXXXXX..XXXXXXX 100644 | 70 | * because in v7A no-short-vector-support cores still had to |
207 | --- a/target/arm/translate-a64.c | 71 | * allow Stride/Len to be written with the only effect that |
208 | +++ b/target/arm/translate-a64.c | 72 | * some insns are required to UNDEF if the guest sets them. |
209 | @@ -XXX,XX +XXX,XX @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f, | 73 | - * |
210 | cpu_fprintf(f, " FPCR=%08x FPSR=%08x\n", | 74 | - * TODO: if M-profile MVE implemented, set LTPSIZE. |
211 | vfp_get_fpcr(env), vfp_get_fpsr(env)); | 75 | */ |
212 | 76 | env->vfp.vec_len = extract32(val, 16, 3); | |
213 | - if (arm_feature(env, ARM_FEATURE_SVE) && sve_exception_el(env, el) == 0) { | 77 | env->vfp.vec_stride = extract32(val, 20, 2); |
214 | + if (cpu_isar_feature(aa64_sve, cpu) && sve_exception_el(env, el) == 0) { | 78 | + } else if (cpu_isar_feature(aa32_mve, cpu)) { |
215 | int j, zcr_len = sve_zcr_len_for_el(env, el); | 79 | + env->v7m.ltpsize = extract32(val, FPCR_LTPSIZE_SHIFT, |
216 | 80 | + FPCR_LTPSIZE_LENGTH); | |
217 | for (i = 0; i <= FFR_PRED_NUM; i++) { | 81 | } |
218 | @@ -XXX,XX +XXX,XX @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s) | 82 | |
219 | unallocated_encoding(s); | 83 | if (arm_feature(env, ARM_FEATURE_NEON)) { |
220 | break; | ||
221 | case 0x2: | ||
222 | - if (!arm_dc_feature(s, ARM_FEATURE_SVE) || !disas_sve(s, insn)) { | ||
223 | + if (!dc_isar_feature(aa64_sve, s) || !disas_sve(s, insn)) { | ||
224 | unallocated_encoding(s); | ||
225 | } | ||
226 | break; | ||
227 | -- | 84 | -- |
228 | 2.19.1 | 85 | 2.20.1 |
229 | 86 | ||
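For readers following the ID-register conversion on the left-hand side above, here is a minimal standalone sketch of what a FIELD_EX64(id_aa64pfr0, ID_AA64PFR0, SVE) test reduces to, using the field position declared in the hunk (FIELD(ID_AA64PFR0, SVE, 32, 4)); the ID register value itself is invented for the example.

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/* Positions taken from the FIELD(ID_AA64PFR0, SVE, 32, 4) line in the hunk
 * above: a 4-bit field starting at bit 32. */
#define SVE_SHIFT  32
#define SVE_LENGTH 4

/* Roughly what FIELD_EX64() expands to: shift down and mask. */
static uint64_t extract64(uint64_t value, int start, int length)
{
    return (value >> start) & (~(uint64_t)0 >> (64 - length));
}

int main(void)
{
    /* Made-up ID_AA64PFR0 value: EL0/EL1 fields set, SVE = 1. */
    uint64_t id_aa64pfr0 = 0x11u | ((uint64_t)1 << SVE_SHIFT);

    unsigned sve = (unsigned)extract64(id_aa64pfr0, SVE_SHIFT, SVE_LENGTH);
    printf("ID_AA64PFR0 = 0x%016" PRIx64 ", SVE field = %u -> SVE %s\n",
           id_aa64pfr0, sve, sve ? "implemented" : "not implemented");
    return 0;
}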
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | Currently we allow board models to specify the initial value of the |
---|---|---|---|
2 | Secure VTOR register, using an init-svtor property on the TYPE_ARMV7M | ||
3 | object which is plumbed through to the CPU. Allow board models to | ||
4 | also specify the initial value of the Non-secure VTOR via a similar | ||
5 | init-nsvtor property. | ||
2 | 6 | ||
3 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | ||
4 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | ||
5 | Message-id: 20181016223115.24100-7-richard.henderson@linaro.org | ||
6 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
7 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 7 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
8 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
9 | Message-id: 20210520152840.24453-10-peter.maydell@linaro.org | ||
8 | --- | 10 | --- |
9 | target/arm/cpu.h | 6 +++++- | 11 | include/hw/arm/armv7m.h | 2 ++ |
10 | linux-user/elfload.c | 2 +- | 12 | target/arm/cpu.h | 2 ++ |
11 | target/arm/cpu.c | 4 ---- | 13 | hw/arm/armv7m.c | 7 +++++++ |
12 | target/arm/helper.c | 2 +- | 14 | target/arm/cpu.c | 10 ++++++++++ |
13 | target/arm/machine.c | 3 +-- | 15 | 4 files changed, 21 insertions(+) |
14 | 5 files changed, 8 insertions(+), 9 deletions(-) | ||
15 | 16 | ||
17 | diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h | ||
18 | index XXXXXXX..XXXXXXX 100644 | ||
19 | --- a/include/hw/arm/armv7m.h | ||
20 | +++ b/include/hw/arm/armv7m.h | ||
21 | @@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(ARMv7MState, ARMV7M) | ||
22 | * devices will be automatically layered on top of this view.) | ||
23 | * + Property "idau": IDAU interface (forwarded to CPU object) | ||
24 | * + Property "init-svtor": secure VTOR reset value (forwarded to CPU object) | ||
25 | + * + Property "init-nsvtor": non-secure VTOR reset value (forwarded to CPU object) | ||
26 | * + Property "vfp": enable VFP (forwarded to CPU object) | ||
27 | * + Property "dsp": enable DSP (forwarded to CPU object) | ||
28 | * + Property "enable-bitband": expose bitbanded IO | ||
29 | @@ -XXX,XX +XXX,XX @@ struct ARMv7MState { | ||
30 | MemoryRegion *board_memory; | ||
31 | Object *idau; | ||
32 | uint32_t init_svtor; | ||
33 | + uint32_t init_nsvtor; | ||
34 | bool enable_bitband; | ||
35 | bool start_powered_off; | ||
36 | bool vfp; | ||
16 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h | 37 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h |
17 | index XXXXXXX..XXXXXXX 100644 | 38 | index XXXXXXX..XXXXXXX 100644 |
18 | --- a/target/arm/cpu.h | 39 | --- a/target/arm/cpu.h |
19 | +++ b/target/arm/cpu.h | 40 | +++ b/target/arm/cpu.h |
20 | @@ -XXX,XX +XXX,XX @@ enum arm_features { | 41 | @@ -XXX,XX +XXX,XX @@ struct ARMCPU { |
21 | ARM_FEATURE_NEON, | 42 | |
22 | ARM_FEATURE_M, /* Microcontroller profile. */ | 43 | /* For v8M, initial value of the Secure VTOR */ |
23 | ARM_FEATURE_OMAPCP, /* OMAP specific CP15 ops handling. */ | 44 | uint32_t init_svtor; |
24 | - ARM_FEATURE_THUMB2EE, | 45 | + /* For v8M, initial value of the Non-secure VTOR */ |
25 | ARM_FEATURE_V7MP, /* v7 Multiprocessing Extensions */ | 46 | + uint32_t init_nsvtor; |
26 | ARM_FEATURE_V7VE, /* v7 Virtualization Extensions (non-EL2 parts) */ | 47 | |
27 | ARM_FEATURE_V4T, | 48 | /* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or |
28 | @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_jazelle(const ARMISARegisters *id) | 49 | * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type. |
29 | return FIELD_EX32(id->id_isar1, ID_ISAR1, JAZELLE) != 0; | 50 | diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c |
30 | } | ||
31 | |||
32 | +static inline bool isar_feature_t32ee(const ARMISARegisters *id) | ||
33 | +{ | ||
34 | + return FIELD_EX32(id->id_isar3, ID_ISAR3, T32EE) != 0; | ||
35 | +} | ||
36 | + | ||
37 | static inline bool isar_feature_aa32_aes(const ARMISARegisters *id) | ||
38 | { | ||
39 | return FIELD_EX32(id->id_isar5, ID_ISAR5, AES) != 0; | ||
40 | diff --git a/linux-user/elfload.c b/linux-user/elfload.c | ||
41 | index XXXXXXX..XXXXXXX 100644 | 51 | index XXXXXXX..XXXXXXX 100644 |
42 | --- a/linux-user/elfload.c | 52 | --- a/hw/arm/armv7m.c |
43 | +++ b/linux-user/elfload.c | 53 | +++ b/hw/arm/armv7m.c |
44 | @@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void) | 54 | @@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp) |
45 | GET_FEATURE(ARM_FEATURE_V5, ARM_HWCAP_ARM_EDSP); | 55 | return; |
46 | GET_FEATURE(ARM_FEATURE_VFP, ARM_HWCAP_ARM_VFP); | 56 | } |
47 | GET_FEATURE(ARM_FEATURE_IWMMXT, ARM_HWCAP_ARM_IWMMXT); | 57 | } |
48 | - GET_FEATURE(ARM_FEATURE_THUMB2EE, ARM_HWCAP_ARM_THUMBEE); | 58 | + if (object_property_find(OBJECT(s->cpu), "init-nsvtor")) { |
49 | + GET_FEATURE_ID(t32ee, ARM_HWCAP_ARM_THUMBEE); | 59 | + if (!object_property_set_uint(OBJECT(s->cpu), "init-nsvtor", |
50 | GET_FEATURE(ARM_FEATURE_NEON, ARM_HWCAP_ARM_NEON); | 60 | + s->init_nsvtor, errp)) { |
51 | GET_FEATURE(ARM_FEATURE_VFP3, ARM_HWCAP_ARM_VFPv3); | 61 | + return; |
52 | GET_FEATURE(ARM_FEATURE_V6K, ARM_HWCAP_ARM_TLS); | 62 | + } |
63 | + } | ||
64 | if (object_property_find(OBJECT(s->cpu), "start-powered-off")) { | ||
65 | if (!object_property_set_bool(OBJECT(s->cpu), "start-powered-off", | ||
66 | s->start_powered_off, errp)) { | ||
67 | @@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = { | ||
68 | MemoryRegion *), | ||
69 | DEFINE_PROP_LINK("idau", ARMv7MState, idau, TYPE_IDAU_INTERFACE, Object *), | ||
70 | DEFINE_PROP_UINT32("init-svtor", ARMv7MState, init_svtor, 0), | ||
71 | + DEFINE_PROP_UINT32("init-nsvtor", ARMv7MState, init_nsvtor, 0), | ||
72 | DEFINE_PROP_BOOL("enable-bitband", ARMv7MState, enable_bitband, false), | ||
73 | DEFINE_PROP_BOOL("start-powered-off", ARMv7MState, start_powered_off, | ||
74 | false), | ||
53 | diff --git a/target/arm/cpu.c b/target/arm/cpu.c | 75 | diff --git a/target/arm/cpu.c b/target/arm/cpu.c |
54 | index XXXXXXX..XXXXXXX 100644 | 76 | index XXXXXXX..XXXXXXX 100644 |
55 | --- a/target/arm/cpu.c | 77 | --- a/target/arm/cpu.c |
56 | +++ b/target/arm/cpu.c | 78 | +++ b/target/arm/cpu.c |
57 | @@ -XXX,XX +XXX,XX @@ static void cortex_a8_initfn(Object *obj) | 79 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev) |
58 | set_feature(&cpu->env, ARM_FEATURE_V7); | 80 | env->regs[14] = 0xffffffff; |
59 | set_feature(&cpu->env, ARM_FEATURE_VFP3); | 81 | |
60 | set_feature(&cpu->env, ARM_FEATURE_NEON); | 82 | env->v7m.vecbase[M_REG_S] = cpu->init_svtor & 0xffffff80; |
61 | - set_feature(&cpu->env, ARM_FEATURE_THUMB2EE); | 83 | + env->v7m.vecbase[M_REG_NS] = cpu->init_nsvtor & 0xffffff80; |
62 | set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS); | 84 | |
63 | set_feature(&cpu->env, ARM_FEATURE_EL3); | 85 | /* Load the initial SP and PC from offset 0 and 4 in the vector table */ |
64 | cpu->midr = 0x410fc080; | 86 | vecbase = env->v7m.vecbase[env->v7m.secure]; |
65 | @@ -XXX,XX +XXX,XX @@ static void cortex_a9_initfn(Object *obj) | 87 | @@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj) |
66 | set_feature(&cpu->env, ARM_FEATURE_VFP3); | 88 | &cpu->init_svtor, |
67 | set_feature(&cpu->env, ARM_FEATURE_VFP_FP16); | 89 | OBJ_PROP_FLAG_READWRITE); |
68 | set_feature(&cpu->env, ARM_FEATURE_NEON); | ||
69 | - set_feature(&cpu->env, ARM_FEATURE_THUMB2EE); | ||
70 | set_feature(&cpu->env, ARM_FEATURE_EL3); | ||
71 | /* Note that A9 supports the MP extensions even for | ||
72 | * A9UP and single-core A9MP (which are both different | ||
73 | @@ -XXX,XX +XXX,XX @@ static void cortex_a7_initfn(Object *obj) | ||
74 | set_feature(&cpu->env, ARM_FEATURE_V7VE); | ||
75 | set_feature(&cpu->env, ARM_FEATURE_VFP4); | ||
76 | set_feature(&cpu->env, ARM_FEATURE_NEON); | ||
77 | - set_feature(&cpu->env, ARM_FEATURE_THUMB2EE); | ||
78 | set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER); | ||
79 | set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS); | ||
80 | set_feature(&cpu->env, ARM_FEATURE_CBAR_RO); | ||
81 | @@ -XXX,XX +XXX,XX @@ static void cortex_a15_initfn(Object *obj) | ||
82 | set_feature(&cpu->env, ARM_FEATURE_V7VE); | ||
83 | set_feature(&cpu->env, ARM_FEATURE_VFP4); | ||
84 | set_feature(&cpu->env, ARM_FEATURE_NEON); | ||
85 | - set_feature(&cpu->env, ARM_FEATURE_THUMB2EE); | ||
86 | set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER); | ||
87 | set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS); | ||
88 | set_feature(&cpu->env, ARM_FEATURE_CBAR_RO); | ||
89 | diff --git a/target/arm/helper.c b/target/arm/helper.c | ||
90 | index XXXXXXX..XXXXXXX 100644 | ||
91 | --- a/target/arm/helper.c | ||
92 | +++ b/target/arm/helper.c | ||
93 | @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) | ||
94 | define_arm_cp_regs(cpu, vmsa_pmsa_cp_reginfo); | ||
95 | define_arm_cp_regs(cpu, vmsa_cp_reginfo); | ||
96 | } | 90 | } |
97 | - if (arm_feature(env, ARM_FEATURE_THUMB2EE)) { | 91 | + if (arm_feature(&cpu->env, ARM_FEATURE_M)) { |
98 | + if (cpu_isar_feature(t32ee, cpu)) { | 92 | + /* |
99 | define_arm_cp_regs(cpu, t2ee_cp_reginfo); | 93 | + * Initial value of the NS VTOR (for cores without the Security |
100 | } | 94 | + * extension, this is the only VTOR) |
101 | if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) { | 95 | + */ |
102 | diff --git a/target/arm/machine.c b/target/arm/machine.c | 96 | + object_property_add_uint32_ptr(obj, "init-nsvtor", |
103 | index XXXXXXX..XXXXXXX 100644 | 97 | + &cpu->init_nsvtor, |
104 | --- a/target/arm/machine.c | 98 | + OBJ_PROP_FLAG_READWRITE); |
105 | +++ b/target/arm/machine.c | 99 | + } |
106 | @@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m = { | 100 | |
107 | static bool thumb2ee_needed(void *opaque) | 101 | qdev_property_add_static(DEVICE(obj), &arm_cpu_cfgend_property); |
108 | { | 102 | |
109 | ARMCPU *cpu = opaque; | ||
110 | - CPUARMState *env = &cpu->env; | ||
111 | |||
112 | - return arm_feature(env, ARM_FEATURE_THUMB2EE); | ||
113 | + return cpu_isar_feature(t32ee, cpu); | ||
114 | } | ||
115 | |||
116 | static const VMStateDescription vmstate_thumb2ee = { | ||
117 | -- | 103 | -- |
118 | 2.19.1 | 104 | 2.20.1 |
119 | 105 | ||
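To make the new init-nsvtor reset path concrete: the CPU reset code above masks the property value down to a 128-byte-aligned vector table base, and the core then fetches its initial SP and PC from offsets 0 and 4 of that table. The small self-contained sketch below replays that arithmetic with a made-up reset value and table.

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    /* Made-up Non-secure vector table: word 0 is the initial SP,
     * word 1 is the reset vector (initial PC). */
    uint32_t vector_table[2] = { 0x20002000u, 0x00000101u };

    uint32_t init_nsvtor = 0x00000007u;           /* deliberately unaligned */
    uint32_t vecbase = init_nsvtor & 0xffffff80u; /* same mask as the reset code */

    /* The CPU reads these two words from guest memory at vecbase + 0 and
     * vecbase + 4; the local array stands in for that memory here. */
    uint32_t initial_sp = vector_table[0];
    uint32_t initial_pc = vector_table[1];

    printf("vecbase = 0x%08" PRIx32 " (low 7 bits of init-nsvtor dropped)\n", vecbase);
    printf("initial SP = 0x%08" PRIx32 ", initial PC = 0x%08" PRIx32 "\n",
           initial_sp, initial_pc);
    return 0;
}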
1 | The HCR_EL2 VI and VF bits are supposed to track whether there is | 1 | The official punctuation for Arm CPU names uses a hyphen, like |
---|---|---|---|
2 | a pending virtual IRQ or virtual FIQ. For QEMU we store the | 2 | "Cortex-A9". We mostly follow this, but in a few places usage |
3 | pending VIRQ/VFIQ status in cs->interrupt_request, so this means: | 3 | without the hyphen has crept in. Fix those so we consistently |
4 | * if the register is read we must get these bit values from | 4 | use the same way of writing the CPU name. |
5 | cs->interrupt_request | 5 | |
6 | * if the register is written then we must write the bit | 6 | This commit was created with: |
7 | values back into cs->interrupt_request | 7 | git grep -z -l 'Cortex ' | xargs -0 sed -i 's/Cortex /Cortex-/' |
8 | 8 | ||
9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
10 | Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> | ||
10 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | 11 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> |
11 | Message-id: 20181012144235.19646-7-peter.maydell@linaro.org | 12 | Reviewed-by: Alex Bennée <alex.bennee@linaro.org> |
13 | Message-id: 20210527095152.10968-1-peter.maydell@linaro.org | ||
12 | --- | 14 | --- |
13 | target/arm/helper.c | 47 +++++++++++++++++++++++++++++++++++++++++---- | 15 | docs/system/arm/aspeed.rst | 4 ++-- |
14 | 1 file changed, 43 insertions(+), 4 deletions(-) | 16 | docs/system/arm/nuvoton.rst | 6 +++--- |
15 | 17 | docs/system/arm/sabrelite.rst | 2 +- | |
16 | diff --git a/target/arm/helper.c b/target/arm/helper.c | 18 | include/hw/arm/allwinner-h3.h | 2 +- |
17 | index XXXXXXX..XXXXXXX 100644 | 19 | hw/arm/aspeed.c | 6 +++--- |
18 | --- a/target/arm/helper.c | 20 | hw/arm/mcimx6ul-evk.c | 2 +- |
19 | +++ b/target/arm/helper.c | 21 | hw/arm/mcimx7d-sabre.c | 2 +- |
20 | @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el3_no_el2_v8_cp_reginfo[] = { | 22 | hw/arm/npcm7xx_boards.c | 4 ++-- |
21 | static void hcr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) | 23 | hw/arm/sabrelite.c | 2 +- |
24 | hw/misc/npcm7xx_clk.c | 2 +- | ||
25 | 10 files changed, 16 insertions(+), 16 deletions(-) | ||
26 | |||
27 | diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst | ||
28 | index XXXXXXX..XXXXXXX 100644 | ||
29 | --- a/docs/system/arm/aspeed.rst | ||
30 | +++ b/docs/system/arm/aspeed.rst | ||
31 | @@ -XXX,XX +XXX,XX @@ The QEMU Aspeed machines model BMCs of various OpenPOWER systems and | ||
32 | Aspeed evaluation boards. They are based on different releases of the | ||
33 | Aspeed SoC : the AST2400 integrating an ARM926EJ-S CPU (400MHz), the | ||
34 | AST2500 with an ARM1176JZS CPU (800MHz) and more recently the AST2600 | ||
35 | -with dual cores ARM Cortex A7 CPUs (1.2GHz). | ||
36 | +with dual cores ARM Cortex-A7 CPUs (1.2GHz). | ||
37 | |||
38 | The SoC comes with RAM, Gigabit ethernet, USB, SD/MMC, USB, SPI, I2C, | ||
39 | etc. | ||
40 | @@ -XXX,XX +XXX,XX @@ AST2500 SoC based machines : | ||
41 | |||
42 | AST2600 SoC based machines : | ||
43 | |||
44 | -- ``ast2600-evb`` Aspeed AST2600 Evaluation board (Cortex A7) | ||
45 | +- ``ast2600-evb`` Aspeed AST2600 Evaluation board (Cortex-A7) | ||
46 | - ``tacoma-bmc`` OpenPOWER Witherspoon POWER9 AST2600 BMC | ||
47 | |||
48 | Supported devices | ||
49 | diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst | ||
50 | index XXXXXXX..XXXXXXX 100644 | ||
51 | --- a/docs/system/arm/nuvoton.rst | ||
52 | +++ b/docs/system/arm/nuvoton.rst | ||
53 | @@ -XXX,XX +XXX,XX @@ Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``) | ||
54 | |||
55 | The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are | ||
56 | designed to be used as Baseboard Management Controllers (BMCs) in various | ||
57 | -servers. They all feature one or two ARM Cortex A9 CPU cores, as well as an | ||
58 | +servers. They all feature one or two ARM Cortex-A9 CPU cores, as well as an | ||
59 | assortment of peripherals targeted for either Enterprise or Data Center / | ||
60 | Hyperscale applications. The former is a superset of the latter, so NPCM750 has | ||
61 | all the peripherals of NPCM730 and more. | ||
62 | |||
63 | .. _Nuvoton iBMC: https://www.nuvoton.com/products/cloud-computing/ibmc/ | ||
64 | |||
65 | -The NPCM750 SoC has two Cortex A9 cores and is targeted for the Enterprise | ||
66 | +The NPCM750 SoC has two Cortex-A9 cores and is targeted for the Enterprise | ||
67 | segment. The following machines are based on this chip : | ||
68 | |||
69 | - ``npcm750-evb`` Nuvoton NPCM750 Evaluation board | ||
70 | |||
71 | -The NPCM730 SoC has two Cortex A9 cores and is targeted for Data Center and | ||
72 | +The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and | ||
73 | Hyperscale applications. The following machines are based on this chip : | ||
74 | |||
75 | - ``quanta-gsj`` Quanta GSJ server BMC | ||
76 | diff --git a/docs/system/arm/sabrelite.rst b/docs/system/arm/sabrelite.rst | ||
77 | index XXXXXXX..XXXXXXX 100644 | ||
78 | --- a/docs/system/arm/sabrelite.rst | ||
79 | +++ b/docs/system/arm/sabrelite.rst | ||
80 | @@ -XXX,XX +XXX,XX @@ Supported devices | ||
81 | |||
82 | The SABRE Lite machine supports the following devices: | ||
83 | |||
84 | - * Up to 4 Cortex A9 cores | ||
85 | + * Up to 4 Cortex-A9 cores | ||
86 | * Generic Interrupt Controller | ||
87 | * 1 Clock Controller Module | ||
88 | * 1 System Reset Controller | ||
89 | diff --git a/include/hw/arm/allwinner-h3.h b/include/hw/arm/allwinner-h3.h | ||
90 | index XXXXXXX..XXXXXXX 100644 | ||
91 | --- a/include/hw/arm/allwinner-h3.h | ||
92 | +++ b/include/hw/arm/allwinner-h3.h | ||
93 | @@ -XXX,XX +XXX,XX @@ | ||
94 | */ | ||
95 | |||
96 | /* | ||
97 | - * The Allwinner H3 is a System on Chip containing four ARM Cortex A7 | ||
98 | + * The Allwinner H3 is a System on Chip containing four ARM Cortex-A7 | ||
99 | * processor cores. Features and specifications include DDR2/DDR3 memory, | ||
100 | * SD/MMC storage cards, 10/100/1000Mbit Ethernet, USB 2.0, HDMI and | ||
101 | * various I/O modules. | ||
102 | diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c | ||
103 | index XXXXXXX..XXXXXXX 100644 | ||
104 | --- a/hw/arm/aspeed.c | ||
105 | +++ b/hw/arm/aspeed.c | ||
106 | @@ -XXX,XX +XXX,XX @@ static void aspeed_machine_ast2600_evb_class_init(ObjectClass *oc, void *data) | ||
107 | MachineClass *mc = MACHINE_CLASS(oc); | ||
108 | AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc); | ||
109 | |||
110 | - mc->desc = "Aspeed AST2600 EVB (Cortex A7)"; | ||
111 | + mc->desc = "Aspeed AST2600 EVB (Cortex-A7)"; | ||
112 | amc->soc_name = "ast2600-a1"; | ||
113 | amc->hw_strap1 = AST2600_EVB_HW_STRAP1; | ||
114 | amc->hw_strap2 = AST2600_EVB_HW_STRAP2; | ||
115 | @@ -XXX,XX +XXX,XX @@ static void aspeed_machine_tacoma_class_init(ObjectClass *oc, void *data) | ||
116 | MachineClass *mc = MACHINE_CLASS(oc); | ||
117 | AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc); | ||
118 | |||
119 | - mc->desc = "OpenPOWER Tacoma BMC (Cortex A7)"; | ||
120 | + mc->desc = "OpenPOWER Tacoma BMC (Cortex-A7)"; | ||
121 | amc->soc_name = "ast2600-a1"; | ||
122 | amc->hw_strap1 = TACOMA_BMC_HW_STRAP1; | ||
123 | amc->hw_strap2 = TACOMA_BMC_HW_STRAP2; | ||
124 | @@ -XXX,XX +XXX,XX @@ static void aspeed_machine_rainier_class_init(ObjectClass *oc, void *data) | ||
125 | MachineClass *mc = MACHINE_CLASS(oc); | ||
126 | AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc); | ||
127 | |||
128 | - mc->desc = "IBM Rainier BMC (Cortex A7)"; | ||
129 | + mc->desc = "IBM Rainier BMC (Cortex-A7)"; | ||
130 | amc->soc_name = "ast2600-a1"; | ||
131 | amc->hw_strap1 = RAINIER_BMC_HW_STRAP1; | ||
132 | amc->hw_strap2 = RAINIER_BMC_HW_STRAP2; | ||
133 | diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c | ||
134 | index XXXXXXX..XXXXXXX 100644 | ||
135 | --- a/hw/arm/mcimx6ul-evk.c | ||
136 | +++ b/hw/arm/mcimx6ul-evk.c | ||
137 | @@ -XXX,XX +XXX,XX @@ static void mcimx6ul_evk_init(MachineState *machine) | ||
138 | |||
139 | static void mcimx6ul_evk_machine_init(MachineClass *mc) | ||
22 | { | 140 | { |
23 | ARMCPU *cpu = arm_env_get_cpu(env); | 141 | - mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex A7)"; |
24 | + CPUState *cs = ENV_GET_CPU(env); | 142 | + mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex-A7)"; |
25 | uint64_t valid_mask = HCR_MASK; | 143 | mc->init = mcimx6ul_evk_init; |
26 | 144 | mc->max_cpus = FSL_IMX6UL_NUM_CPUS; | |
27 | if (arm_feature(env, ARM_FEATURE_EL3)) { | 145 | mc->default_ram_id = "mcimx6ul-evk.ram"; |
28 | @@ -XXX,XX +XXX,XX @@ static void hcr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value) | 146 | diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c |
29 | /* Clear RES0 bits. */ | 147 | index XXXXXXX..XXXXXXX 100644 |
30 | value &= valid_mask; | 148 | --- a/hw/arm/mcimx7d-sabre.c |
31 | 149 | +++ b/hw/arm/mcimx7d-sabre.c | |
32 | + /* | 150 | @@ -XXX,XX +XXX,XX @@ static void mcimx7d_sabre_init(MachineState *machine) |
33 | + * VI and VF are kept in cs->interrupt_request. Modifying that | 151 | |
34 | + * requires that we have the iothread lock, which is done by | 152 | static void mcimx7d_sabre_machine_init(MachineClass *mc) |
35 | + * marking the reginfo structs as ARM_CP_IO. | 153 | { |
36 | + * Note that if a write to HCR pends a VIRQ or VFIQ it is never | 154 | - mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex A7)"; |
37 | + * possible for it to be taken immediately, because VIRQ and | 155 | + mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex-A7)"; |
38 | + * VFIQ are masked unless running at EL0 or EL1, and HCR | 156 | mc->init = mcimx7d_sabre_init; |
39 | + * can only be written at EL2. | 157 | mc->max_cpus = FSL_IMX7_NUM_CPUS; |
40 | + */ | 158 | mc->default_ram_id = "mcimx7d-sabre.ram"; |
41 | + g_assert(qemu_mutex_iothread_locked()); | 159 | diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c |
42 | + if (value & HCR_VI) { | 160 | index XXXXXXX..XXXXXXX 100644 |
43 | + cs->interrupt_request |= CPU_INTERRUPT_VIRQ; | 161 | --- a/hw/arm/npcm7xx_boards.c |
44 | + } else { | 162 | +++ b/hw/arm/npcm7xx_boards.c |
45 | + cs->interrupt_request &= ~CPU_INTERRUPT_VIRQ; | 163 | @@ -XXX,XX +XXX,XX @@ static void npcm750_evb_machine_class_init(ObjectClass *oc, void *data) |
46 | + } | 164 | |
47 | + if (value & HCR_VF) { | 165 | npcm7xx_set_soc_type(nmc, TYPE_NPCM750); |
48 | + cs->interrupt_request |= CPU_INTERRUPT_VFIQ; | 166 | |
49 | + } else { | 167 | - mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex A9)"; |
50 | + cs->interrupt_request &= ~CPU_INTERRUPT_VFIQ; | 168 | + mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex-A9)"; |
51 | + } | 169 | mc->init = npcm750_evb_init; |
52 | + value &= ~(HCR_VI | HCR_VF); | 170 | mc->default_ram_size = 512 * MiB; |
53 | + | 171 | }; |
54 | /* These bits change the MMU setup: | 172 | @@ -XXX,XX +XXX,XX @@ static void gsj_machine_class_init(ObjectClass *oc, void *data) |
55 | * HCR_VM enables stage 2 translation | 173 | |
56 | * HCR_PTW forbids certain page-table setups | 174 | npcm7xx_set_soc_type(nmc, TYPE_NPCM730); |
57 | @@ -XXX,XX +XXX,XX @@ static void hcr_writelow(CPUARMState *env, const ARMCPRegInfo *ri, | 175 | |
58 | hcr_write(env, NULL, value); | 176 | - mc->desc = "Quanta GSJ (Cortex A9)"; |
59 | } | 177 | + mc->desc = "Quanta GSJ (Cortex-A9)"; |
60 | 178 | mc->init = quanta_gsj_init; | |
61 | +static uint64_t hcr_read(CPUARMState *env, const ARMCPRegInfo *ri) | 179 | mc->default_ram_size = 512 * MiB; |
62 | +{ | 180 | }; |
63 | + /* The VI and VF bits live in cs->interrupt_request */ | 181 | diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c |
64 | + uint64_t ret = env->cp15.hcr_el2 & ~(HCR_VI | HCR_VF); | 182 | index XXXXXXX..XXXXXXX 100644 |
65 | + CPUState *cs = ENV_GET_CPU(env); | 183 | --- a/hw/arm/sabrelite.c |
66 | + | 184 | +++ b/hw/arm/sabrelite.c |
67 | + if (cs->interrupt_request & CPU_INTERRUPT_VIRQ) { | 185 | @@ -XXX,XX +XXX,XX @@ static void sabrelite_init(MachineState *machine) |
68 | + ret |= HCR_VI; | 186 | |
69 | + } | 187 | static void sabrelite_machine_init(MachineClass *mc) |
70 | + if (cs->interrupt_request & CPU_INTERRUPT_VFIQ) { | 188 | { |
71 | + ret |= HCR_VF; | 189 | - mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex A9)"; |
72 | + } | 190 | + mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex-A9)"; |
73 | + return ret; | 191 | mc->init = sabrelite_init; |
74 | +} | 192 | mc->max_cpus = FSL_IMX6_NUM_CPUS; |
75 | + | 193 | mc->ignore_memory_transaction_failures = true; |
76 | static const ARMCPRegInfo el2_cp_reginfo[] = { | 194 | diff --git a/hw/misc/npcm7xx_clk.c b/hw/misc/npcm7xx_clk.c |
77 | { .name = "HCR_EL2", .state = ARM_CP_STATE_AA64, | 195 | index XXXXXXX..XXXXXXX 100644 |
78 | + .type = ARM_CP_IO, | 196 | --- a/hw/misc/npcm7xx_clk.c |
79 | .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0, | 197 | +++ b/hw/misc/npcm7xx_clk.c |
80 | .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.hcr_el2), | 198 | @@ -XXX,XX +XXX,XX @@ |
81 | - .writefn = hcr_write }, | 199 | #define NPCM7XX_CLOCK_REF_HZ (25000000) |
82 | + .writefn = hcr_write, .readfn = hcr_read }, | 200 | |
83 | { .name = "HCR", .state = ARM_CP_STATE_AA32, | 201 | /* Register Field Definitions */ |
84 | - .type = ARM_CP_ALIAS, | 202 | -#define NPCM7XX_CLK_WDRCR_CA9C BIT(0) /* Cortex A9 Cores */ |
85 | + .type = ARM_CP_ALIAS | ARM_CP_IO, | 203 | +#define NPCM7XX_CLK_WDRCR_CA9C BIT(0) /* Cortex-A9 Cores */ |
86 | .cp = 15, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0, | 204 | |
87 | .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.hcr_el2), | 205 | #define PLLCON_LOKI BIT(31) |
88 | - .writefn = hcr_writelow }, | 206 | #define PLLCON_LOKS BIT(30) |
89 | + .writefn = hcr_writelow, .readfn = hcr_read }, | ||
90 | { .name = "ELR_EL2", .state = ARM_CP_STATE_AA64, | ||
91 | .type = ARM_CP_ALIAS, | ||
92 | .opc0 = 3, .opc1 = 4, .crn = 4, .crm = 0, .opc2 = 1, | ||
93 | @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = { | ||
94 | |||
95 | static const ARMCPRegInfo el2_v8_cp_reginfo[] = { | ||
96 | { .name = "HCR2", .state = ARM_CP_STATE_AA32, | ||
97 | - .type = ARM_CP_ALIAS, | ||
98 | + .type = ARM_CP_ALIAS | ARM_CP_IO, | ||
99 | .cp = 15, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 4, | ||
100 | .access = PL2_RW, | ||
101 | .fieldoffset = offsetofhigh32(CPUARMState, cp15.hcr_el2), | ||
102 | -- | 207 | -- |
103 | 2.19.1 | 208 | 2.20.1 |
104 | 209 | ||
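The hcr_write()/hcr_read() changes in the left-hand hunks above keep the VI and VF state in cs->interrupt_request rather than in the stored HCR value. The toy program below replays that round-trip outside QEMU; the bit positions used for HCR_VI/HCR_VF and CPU_INTERRUPT_VIRQ/VFIQ are placeholders chosen for the sketch, not values taken from the diff.

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/* Placeholder bit assignments for this sketch only. */
#define HCR_VF (1u << 6)
#define HCR_VI (1u << 7)
#define CPU_INTERRUPT_VIRQ (1u << 0)
#define CPU_INTERRUPT_VFIQ (1u << 1)

static uint32_t interrupt_request; /* stands in for cs->interrupt_request */
static uint64_t hcr_el2;           /* stands in for env->cp15.hcr_el2     */

/* Write path: latch VI/VF into interrupt_request, keep the rest in hcr_el2. */
static void toy_hcr_write(uint64_t value)
{
    if (value & HCR_VI) {
        interrupt_request |= CPU_INTERRUPT_VIRQ;
    } else {
        interrupt_request &= ~CPU_INTERRUPT_VIRQ;
    }
    if (value & HCR_VF) {
        interrupt_request |= CPU_INTERRUPT_VFIQ;
    } else {
        interrupt_request &= ~CPU_INTERRUPT_VFIQ;
    }
    hcr_el2 = value & ~(uint64_t)(HCR_VI | HCR_VF);
}

/* Read path: reconstruct VI/VF from interrupt_request on every read. */
static uint64_t toy_hcr_read(void)
{
    uint64_t ret = hcr_el2;

    if (interrupt_request & CPU_INTERRUPT_VIRQ) {
        ret |= HCR_VI;
    }
    if (interrupt_request & CPU_INTERRUPT_VFIQ) {
        ret |= HCR_VF;
    }
    return ret;
}

int main(void)
{
    toy_hcr_write(HCR_VI | 0x1);   /* guest pends a virtual IRQ via HCR_EL2 */
    printf("HCR readback = 0x%" PRIx64 "\n", toy_hcr_read());          /* 0x81 */
    printf("VIRQ pending = %d\n", !!(interrupt_request & CPU_INTERRUPT_VIRQ));
    return 0;
}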
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Damien Goutte-Gattat <dgouttegattat@incenp.org> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 3 | The 4.x branch of Sphinx introduces a breaking change, as generated man |
4 | Message-id: 20181011205206.3552-13-richard.henderson@linaro.org | 4 | pages are now written to subdirectories corresponding to the manual |
5 | section they belong to. This results in `make install` erroring out when | ||
6 | attempting to install the man pages, because they are not where it | ||
7 | expects to find them. | ||
8 | |||
9 | This patch restores the behavior of Sphinx 3.x regarding man pages. | ||
10 | |||
11 | Resolves: https://gitlab.com/qemu-project/qemu/-/issues/256 | ||
12 | Signed-off-by: Damien Goutte-Gattat <dgouttegattat@incenp.org> | ||
13 | Message-id: 20210503161422.15028-1-dgouttegattat@incenp.org | ||
5 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 14 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
6 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 15 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
7 | --- | 16 | --- |
8 | target/arm/translate.c | 70 +++++++++++++++++++++++++++++------------- | 17 | docs/conf.py | 1 + |
9 | 1 file changed, 48 insertions(+), 22 deletions(-) | 18 | 1 file changed, 1 insertion(+) |
10 | 19 | ||
11 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 20 | diff --git a/docs/conf.py b/docs/conf.py |
12 | index XXXXXXX..XXXXXXX 100644 | 21 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/target/arm/translate.c | 22 | --- a/docs/conf.py |
14 | +++ b/target/arm/translate.c | 23 | +++ b/docs/conf.py |
15 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 24 | @@ -XXX,XX +XXX,XX @@ |
16 | size--; | 25 | ['Stefan Hajnoczi <stefanha@redhat.com>', |
17 | } | 26 | 'Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>'], 1), |
18 | shift = (insn >> 16) & ((1 << (3 + size)) - 1); | 27 | ] |
19 | - /* To avoid excessive duplication of ops we implement shift | 28 | +man_make_section_directory = False |
20 | - by immediate using the variable shift operations. */ | 29 | |
21 | if (op < 8) { | 30 | # -- Options for Texinfo output ------------------------------------------- |
22 | /* Shift by immediate: | ||
23 | VSHR, VSRA, VRSHR, VRSRA, VSRI, VSHL, VQSHL, VQSHLU. */ | ||
24 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
25 | } | ||
26 | /* Right shifts are encoded as N - shift, where N is the | ||
27 | element size in bits. */ | ||
28 | - if (op <= 4) | ||
29 | + if (op <= 4) { | ||
30 | shift = shift - (1 << (size + 3)); | ||
31 | + } | ||
32 | + | ||
33 | + switch (op) { | ||
34 | + case 0: /* VSHR */ | ||
35 | + /* Right shift comes here negative. */ | ||
36 | + shift = -shift; | ||
37 | + /* Shifts larger than the element size are architecturally | ||
38 | + * valid. Unsigned results in all zeros; signed results | ||
39 | + * in all sign bits. | ||
40 | + */ | ||
41 | + if (!u) { | ||
42 | + tcg_gen_gvec_sari(size, rd_ofs, rm_ofs, | ||
43 | + MIN(shift, (8 << size) - 1), | ||
44 | + vec_size, vec_size); | ||
45 | + } else if (shift >= 8 << size) { | ||
46 | + tcg_gen_gvec_dup8i(rd_ofs, vec_size, vec_size, 0); | ||
47 | + } else { | ||
48 | + tcg_gen_gvec_shri(size, rd_ofs, rm_ofs, shift, | ||
49 | + vec_size, vec_size); | ||
50 | + } | ||
51 | + return 0; | ||
52 | + | ||
53 | + case 5: /* VSHL, VSLI */ | ||
54 | + if (!u) { /* VSHL */ | ||
55 | + /* Shifts larger than the element size are | ||
56 | + * architecturally valid and results in zero. | ||
57 | + */ | ||
58 | + if (shift >= 8 << size) { | ||
59 | + tcg_gen_gvec_dup8i(rd_ofs, vec_size, vec_size, 0); | ||
60 | + } else { | ||
61 | + tcg_gen_gvec_shli(size, rd_ofs, rm_ofs, shift, | ||
62 | + vec_size, vec_size); | ||
63 | + } | ||
64 | + return 0; | ||
65 | + } | ||
66 | + break; | ||
67 | + } | ||
68 | + | ||
69 | if (size == 3) { | ||
70 | count = q + 1; | ||
71 | } else { | ||
72 | count = q ? 4: 2; | ||
73 | } | ||
74 | - switch (size) { | ||
75 | - case 0: | ||
76 | - imm = (uint8_t) shift; | ||
77 | - imm |= imm << 8; | ||
78 | - imm |= imm << 16; | ||
79 | - break; | ||
80 | - case 1: | ||
81 | - imm = (uint16_t) shift; | ||
82 | - imm |= imm << 16; | ||
83 | - break; | ||
84 | - case 2: | ||
85 | - case 3: | ||
86 | - imm = shift; | ||
87 | - break; | ||
88 | - default: | ||
89 | - abort(); | ||
90 | - } | ||
91 | + | ||
92 | + /* To avoid excessive duplication of ops we implement shift | ||
93 | + * by immediate using the variable shift operations. | ||
94 | + */ | ||
95 | + imm = dup_const(size, shift); | ||
96 | |||
97 | for (pass = 0; pass < count; pass++) { | ||
98 | if (size == 3) { | ||
99 | neon_load_reg64(cpu_V0, rm + pass); | ||
100 | tcg_gen_movi_i64(cpu_V1, imm); | ||
101 | switch (op) { | ||
102 | - case 0: /* VSHR */ | ||
103 | case 1: /* VSRA */ | ||
104 | if (u) | ||
105 | gen_helper_neon_shl_u64(cpu_V0, cpu_V0, cpu_V1); | ||
106 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
107 | cpu_V0, cpu_V1); | ||
108 | } | ||
109 | break; | ||
110 | + default: | ||
111 | + g_assert_not_reached(); | ||
112 | } | ||
113 | if (op == 1 || op == 3) { | ||
114 | /* Accumulate. */ | ||
115 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
116 | tmp2 = tcg_temp_new_i32(); | ||
117 | tcg_gen_movi_i32(tmp2, imm); | ||
118 | switch (op) { | ||
119 | - case 0: /* VSHR */ | ||
120 | case 1: /* VSRA */ | ||
121 | GEN_NEON_INTEGER_OP(shl); | ||
122 | break; | ||
123 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
124 | case 7: /* VQSHL */ | ||
125 | GEN_NEON_INTEGER_OP_ENV(qshl); | ||
126 | break; | ||
127 | + default: | ||
128 | + g_assert_not_reached(); | ||
129 | } | ||
130 | tcg_temp_free_i32(tmp2); | ||
131 | 31 | ||
132 | -- | 32 | -- |
133 | 2.19.1 | 33 | 2.20.1 |
134 | 34 | ||
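The translate.c hunks on the left-hand side above replace the hand-rolled byte/halfword replication of the shift immediate with dup_const(size, shift). The standalone check below confirms the two formulations agree for the 8-bit and 16-bit cases that the removed code handled; dup_const() is re-implemented locally, so treat it as an approximation of the QEMU helper rather than the helper itself.

#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* Local stand-in for dup_const(): replicate an element of 2^vece bytes across
 * a 64-bit word (vece: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit). */
static uint64_t dup_const(unsigned vece, uint64_t c)
{
    switch (vece) {
    case 0:
        return 0x0101010101010101ULL * (uint8_t)c;
    case 1:
        return 0x0001000100010001ULL * (uint16_t)c;
    case 2:
        return 0x0000000100000001ULL * (uint32_t)c;
    default:
        return c;
    }
}

int main(void)
{
    int shift = 0x5a;

    /* The replication the old code did by hand for size == 0 (bytes)... */
    uint32_t imm8 = (uint8_t)shift;
    imm8 |= imm8 << 8;
    imm8 |= imm8 << 16;
    assert(imm8 == (uint32_t)dup_const(0, shift));

    /* ...and for size == 1 (halfwords). */
    uint32_t imm16 = (uint16_t)shift;
    imm16 |= imm16 << 16;
    assert(imm16 == (uint32_t)dup_const(1, shift));

    printf("byte dup: 0x%08x, halfword dup: 0x%08x\n", imm8, imm16);
    return 0;
}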
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | This is done generically in translator_loop. | 3 | The operands to tcg_gen_atomic_fetch_s{min,max}_i64 must |
4 | be signed, so that the inputs are properly extended. | ||
5 | Zero-extend the result afterward, as needed. | ||
4 | 6 | ||
5 | Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com> | 7 | Resolves: https://gitlab.com/qemu-project/qemu/-/issues/364 |
6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 8 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
7 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | 9 | Reviewed-by: Alex Bennée <alex.bennee@linaro.org> |
8 | Message-id: 20181011205206.3552-3-richard.henderson@linaro.org | 10 | Message-id: 20210602020720.47679-1-richard.henderson@linaro.org |
9 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 11 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
11 | --- | 12 | --- |
12 | target/arm/translate-a64.c | 1 - | 13 | target/arm/translate-a64.c | 13 ++++++++++--- |
13 | target/arm/translate.c | 1 - | 14 | 1 file changed, 10 insertions(+), 3 deletions(-) |
14 | 2 files changed, 2 deletions(-) | ||
15 | 15 | ||
16 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | 16 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c |
17 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
18 | --- a/target/arm/translate-a64.c | 18 | --- a/target/arm/translate-a64.c |
19 | +++ b/target/arm/translate-a64.c | 19 | +++ b/target/arm/translate-a64.c |
20 | @@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase, | 20 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn, |
21 | 21 | int o3_opc = extract32(insn, 12, 4); | |
22 | static void aarch64_tr_tb_start(DisasContextBase *db, CPUState *cpu) | 22 | bool r = extract32(insn, 22, 1); |
23 | { | 23 | bool a = extract32(insn, 23, 1); |
24 | - tcg_clear_temp_count(); | 24 | - TCGv_i64 tcg_rs, clean_addr; |
25 | + TCGv_i64 tcg_rs, tcg_rt, clean_addr; | ||
26 | AtomicThreeOpFn *fn = NULL; | ||
27 | + MemOp mop = s->be_data | size | MO_ALIGN; | ||
28 | |||
29 | if (is_vector || !dc_isar_feature(aa64_atomics, s)) { | ||
30 | unallocated_encoding(s); | ||
31 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn, | ||
32 | break; | ||
33 | case 004: /* LDSMAX */ | ||
34 | fn = tcg_gen_atomic_fetch_smax_i64; | ||
35 | + mop |= MO_SIGN; | ||
36 | break; | ||
37 | case 005: /* LDSMIN */ | ||
38 | fn = tcg_gen_atomic_fetch_smin_i64; | ||
39 | + mop |= MO_SIGN; | ||
40 | break; | ||
41 | case 006: /* LDUMAX */ | ||
42 | fn = tcg_gen_atomic_fetch_umax_i64; | ||
43 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn, | ||
44 | } | ||
45 | |||
46 | tcg_rs = read_cpu_reg(s, rs, true); | ||
47 | + tcg_rt = cpu_reg(s, rt); | ||
48 | |||
49 | if (o3_opc == 1) { /* LDCLR */ | ||
50 | tcg_gen_not_i64(tcg_rs, tcg_rs); | ||
51 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn, | ||
52 | /* The tcg atomic primitives are all full barriers. Therefore we | ||
53 | * can ignore the Acquire and Release bits of this instruction. | ||
54 | */ | ||
55 | - fn(cpu_reg(s, rt), clean_addr, tcg_rs, get_mem_index(s), | ||
56 | - s->be_data | size | MO_ALIGN); | ||
57 | + fn(tcg_rt, clean_addr, tcg_rs, get_mem_index(s), mop); | ||
58 | + | ||
59 | + if ((mop & MO_SIGN) && size != MO_64) { | ||
60 | + tcg_gen_ext32u_i64(tcg_rt, tcg_rt); | ||
61 | + } | ||
25 | } | 62 | } |
26 | 63 | ||
27 | static void aarch64_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu) | 64 | /* |
28 | diff --git a/target/arm/translate.c b/target/arm/translate.c | ||
29 | index XXXXXXX..XXXXXXX 100644 | ||
30 | --- a/target/arm/translate.c | ||
31 | +++ b/target/arm/translate.c | ||
32 | @@ -XXX,XX +XXX,XX @@ static void arm_tr_tb_start(DisasContextBase *dcbase, CPUState *cpu) | ||
33 | tcg_gen_movi_i32(tmp, 0); | ||
34 | store_cpu_field(tmp, condexec_bits); | ||
35 | } | ||
36 | - tcg_clear_temp_count(); | ||
37 | } | ||
38 | |||
39 | static void arm_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu) | ||
40 | -- | 65 | -- |
41 | 2.19.1 | 66 | 2.20.1 |
42 | 67 | ||
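To see why the missing MO_SIGN mattered for the 32-bit forms of LDSMIN/LDSMAX, the self-contained comparison below contrasts an unsigned and a signed 64-bit minimum over a 32-bit memory value of 0xffffffff (that is, -1), and then zero-extends the signed result for the destination register, as the patch above now does.

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint32_t mem_val = 0xffffffffu;   /* -1 when interpreted as signed */
    uint32_t reg_val = 1;

    /* Without sign extension both operands are zero-extended, and the
     * "minimum" is wrongly 1, because 0xffffffff > 1 as unsigned. */
    uint64_t bad_min = (uint64_t)mem_val < (uint64_t)reg_val
                       ? (uint64_t)mem_val : (uint64_t)reg_val;

    /* With sign extension (MO_SIGN): -1 < 1, so the minimum is -1. */
    int64_t a = (int32_t)mem_val;
    int64_t b = (int32_t)reg_val;
    int64_t smin = a < b ? a : b;

    /* Like the patch, zero-extend the 32-bit result for the Xt register. */
    uint64_t result = (uint32_t)smin;

    printf("unsigned min        = %" PRIu64 "\n", bad_min);    /* 1          */
    printf("signed   min        = %" PRId64 "\n", smin);       /* -1         */
    printf("LDSMIN result in Xt = 0x%" PRIx64 "\n", result);   /* 0xffffffff */
    return 0;
}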
1 | Create and use a utility function to extract the EC field | 1 | From: Jamie Iles <jamie@nuviainc.com> |
---|---|---|---|
2 | from a syndrome, rather than open-coding the shift. | ||
3 | 2 | ||
3 | The DAIF and PAC checks used raise_exception_ra to raise an exception | ||
4 | and unwind CPU state but raise_exception_ra is currently designed for | ||
5 | handling data aborts as the syndrome is partially precomputed and | ||
6 | encoded in the TB and then merged in merge_syn_data_abort when handling | ||
7 | the data abort. Using raise_exception_ra for DAIF and PAC checks | ||
8 | results in an empty syndrome being retrieved from data[2] in | ||
9 | restore_state_to_opc and setting ESR to 0. This manifested as: | ||
10 | |||
11 | kvm [571]: Unknown exception class: esr: 0x000000 – | ||
12 | Unknown/Uncategorized | ||
13 | |||
14 | when launching a KVM guest when the host qemu used a CPU supporting | ||
15 | EL2+pointer authentication and enabling pointer authentication in the | ||
16 | guest. | ||
17 | |||
18 | Rework raise_exception_ra such that the state is restored before raising | ||
19 | the exception so that the exception is not clobbered by | ||
20 | restore_state_to_opc. | ||
21 | |||
22 | Fixes: 0d43e1a2d29a ("target/arm: Add PAuth helpers") | ||
23 | Cc: Richard Henderson <richard.henderson@linaro.org> | ||
24 | Cc: Peter Maydell <peter.maydell@linaro.org> | ||
25 | Signed-off-by: Jamie Iles <jamie@nuviainc.com> | ||
26 | [PMM: added comment] | ||
27 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
4 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 28 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
5 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
6 | Message-id: 20181012144235.19646-9-peter.maydell@linaro.org | ||
7 | --- | 29 | --- |
8 | target/arm/internals.h | 5 +++++ | 30 | target/arm/op_helper.c | 11 +++++++++-- |
9 | target/arm/helper.c | 4 ++-- | 31 | 1 file changed, 9 insertions(+), 2 deletions(-) |
10 | target/arm/kvm64.c | 2 +- | ||
11 | target/arm/op_helper.c | 2 +- | ||
12 | 4 files changed, 9 insertions(+), 4 deletions(-) | ||
13 | 32 | ||
14 | diff --git a/target/arm/internals.h b/target/arm/internals.h | ||
15 | index XXXXXXX..XXXXXXX 100644 | ||
16 | --- a/target/arm/internals.h | ||
17 | +++ b/target/arm/internals.h | ||
18 | @@ -XXX,XX +XXX,XX @@ enum arm_exception_class { | ||
19 | #define ARM_EL_IL (1 << ARM_EL_IL_SHIFT) | ||
20 | #define ARM_EL_ISV (1 << ARM_EL_ISV_SHIFT) | ||
21 | |||
22 | +static inline uint32_t syn_get_ec(uint32_t syn) | ||
23 | +{ | ||
24 | + return syn >> ARM_EL_EC_SHIFT; | ||
25 | +} | ||
26 | + | ||
27 | /* Utility functions for constructing various kinds of syndrome value. | ||
28 | * Note that in general we follow the AArch64 syndrome values; in a | ||
29 | * few cases the value in HSR for exceptions taken to AArch32 Hyp | ||
30 | diff --git a/target/arm/helper.c b/target/arm/helper.c | ||
31 | index XXXXXXX..XXXXXXX 100644 | ||
32 | --- a/target/arm/helper.c | ||
33 | +++ b/target/arm/helper.c | ||
34 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch32(CPUState *cs) | ||
35 | uint32_t moe; | ||
36 | |||
37 | /* If this is a debug exception we must update the DBGDSCR.MOE bits */ | ||
38 | - switch (env->exception.syndrome >> ARM_EL_EC_SHIFT) { | ||
39 | + switch (syn_get_ec(env->exception.syndrome)) { | ||
40 | case EC_BREAKPOINT: | ||
41 | case EC_BREAKPOINT_SAME_EL: | ||
42 | moe = 1; | ||
43 | @@ -XXX,XX +XXX,XX @@ void arm_cpu_do_interrupt(CPUState *cs) | ||
44 | if (qemu_loglevel_mask(CPU_LOG_INT) | ||
45 | && !excp_is_internal(cs->exception_index)) { | ||
46 | qemu_log_mask(CPU_LOG_INT, "...with ESR 0x%x/0x%" PRIx32 "\n", | ||
47 | - env->exception.syndrome >> ARM_EL_EC_SHIFT, | ||
48 | + syn_get_ec(env->exception.syndrome), | ||
49 | env->exception.syndrome); | ||
50 | } | ||
51 | |||
52 | diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c | ||
53 | index XXXXXXX..XXXXXXX 100644 | ||
54 | --- a/target/arm/kvm64.c | ||
55 | +++ b/target/arm/kvm64.c | ||
56 | @@ -XXX,XX +XXX,XX @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp) | ||
57 | |||
58 | bool kvm_arm_handle_debug(CPUState *cs, struct kvm_debug_exit_arch *debug_exit) | ||
59 | { | ||
60 | - int hsr_ec = debug_exit->hsr >> ARM_EL_EC_SHIFT; | ||
61 | + int hsr_ec = syn_get_ec(debug_exit->hsr); | ||
62 | ARMCPU *cpu = ARM_CPU(cs); | ||
63 | CPUClass *cc = CPU_GET_CLASS(cs); | ||
64 | CPUARMState *env = &cpu->env; | ||
65 | diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c | 33 | diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c |
66 | index XXXXXXX..XXXXXXX 100644 | 34 | index XXXXXXX..XXXXXXX 100644 |
67 | --- a/target/arm/op_helper.c | 35 | --- a/target/arm/op_helper.c |
68 | +++ b/target/arm/op_helper.c | 36 | +++ b/target/arm/op_helper.c |
69 | @@ -XXX,XX +XXX,XX @@ void raise_exception(CPUARMState *env, uint32_t excp, | 37 | @@ -XXX,XX +XXX,XX @@ void raise_exception(CPUARMState *env, uint32_t excp, |
70 | * (see DDI0478C.a D1.10.4) | 38 | void raise_exception_ra(CPUARMState *env, uint32_t excp, uint32_t syndrome, |
71 | */ | 39 | uint32_t target_el, uintptr_t ra) |
72 | target_el = 2; | 40 | { |
73 | - if (syndrome >> ARM_EL_EC_SHIFT == EC_ADVSIMDFPACCESSTRAP) { | 41 | - CPUState *cs = do_raise_exception(env, excp, syndrome, target_el); |
74 | + if (syn_get_ec(syndrome) == EC_ADVSIMDFPACCESSTRAP) { | 42 | - cpu_loop_exit_restore(cs, ra); |
75 | syndrome = syn_uncategorized(); | 43 | + CPUState *cs = env_cpu(env); |
76 | } | 44 | + |
77 | } | 45 | + /* |
46 | + * restore_state_to_opc() will set env->exception.syndrome, so | ||
47 | + * we must restore CPU state here before setting the syndrome | ||
48 | + * the caller passed us, and cannot use cpu_loop_exit_restore(). | ||
49 | + */ | ||
50 | + cpu_restore_state(cs, ra, true); | ||
51 | + raise_exception(env, excp, syndrome, target_el); | ||
52 | } | ||
53 | |||
54 | uint64_t HELPER(neon_tbl)(CPUARMState *env, uint32_t desc, | ||
78 | -- | 55 | -- |
79 | 2.19.1 | 56 | 2.20.1 |
80 | 57 | ||
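The heart of the raise_exception_ra() fix above is an ordering problem: restoring CPU state rewrites the syndrome, so the caller's syndrome has to be installed afterwards. The toy sequence below (purely illustrative, no QEMU APIs) shows how the old order loses the value and the new order keeps it.

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

struct toy_env {
    uint32_t syndrome;
};

/* Stand-in for restore_state_to_opc(): it rewrites the syndrome from the
 * data recorded in the TB, which is empty for these non-data-abort paths. */
static void toy_restore_state(struct toy_env *env)
{
    env->syndrome = 0;
}

int main(void)
{
    uint32_t caller_syndrome = 0x86000000u;   /* made-up non-zero syndrome */
    struct toy_env env = { 0 };

    /* Old order: set the syndrome, then restore; the value is clobbered. */
    env.syndrome = caller_syndrome;
    toy_restore_state(&env);
    printf("old order: ESR = 0x%08" PRIx32 "\n", env.syndrome);   /* 0 */

    /* New order: restore first, then set the syndrome; the value survives. */
    toy_restore_state(&env);
    env.syndrome = caller_syndrome;
    printf("new order: ESR = 0x%08" PRIx32 "\n", env.syndrome);   /* 0x86000000 */
    return 0;
}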
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Jamie Iles <jamie@nuviainc.com> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 3 | Now that there are no other users of do_raise_exception, fold it into |
4 | Message-id: 20181011205206.3552-18-richard.henderson@linaro.org | 4 | raise_exception. |
5 | [PMM: added parens in ?: expression] | 5 | |
6 | Cc: Richard Henderson <richard.henderson@linaro.org> | ||
7 | Cc: Peter Maydell <peter.maydell@linaro.org> | ||
8 | Signed-off-by: Jamie Iles <jamie@nuviainc.com> | ||
6 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
7 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
8 | --- | 11 | --- |
9 | target/arm/translate.c | 81 ++++++++++++++---------------------------- | 12 | target/arm/op_helper.c | 12 ++---------- |
10 | 1 file changed, 26 insertions(+), 55 deletions(-) | 13 | 1 file changed, 2 insertions(+), 10 deletions(-) |
11 | 14 | ||
12 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 15 | diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c |
13 | index XXXXXXX..XXXXXXX 100644 | 16 | index XXXXXXX..XXXXXXX 100644 |
14 | --- a/target/arm/translate.c | 17 | --- a/target/arm/op_helper.c |
15 | +++ b/target/arm/translate.c | 18 | +++ b/target/arm/op_helper.c |
16 | @@ -XXX,XX +XXX,XX @@ static void gen_vfp_msr(TCGv_i32 tmp) | 19 | @@ -XXX,XX +XXX,XX @@ |
17 | tcg_temp_free_i32(tmp); | 20 | #define SIGNBIT (uint32_t)0x80000000 |
18 | } | 21 | #define SIGNBIT64 ((uint64_t)1 << 63) |
19 | 22 | ||
20 | -static void gen_neon_dup_u8(TCGv_i32 var, int shift) | 23 | -static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp, |
21 | -{ | 24 | - uint32_t syndrome, uint32_t target_el) |
22 | - TCGv_i32 tmp = tcg_temp_new_i32(); | 25 | +void raise_exception(CPUARMState *env, uint32_t excp, |
23 | - if (shift) | 26 | + uint32_t syndrome, uint32_t target_el) |
24 | - tcg_gen_shri_i32(var, var, shift); | 27 | { |
25 | - tcg_gen_ext8u_i32(var, var); | 28 | CPUState *cs = env_cpu(env); |
26 | - tcg_gen_shli_i32(tmp, var, 8); | 29 | |
27 | - tcg_gen_or_i32(var, var, tmp); | 30 | @@ -XXX,XX +XXX,XX @@ static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp, |
28 | - tcg_gen_shli_i32(tmp, var, 16); | 31 | cs->exception_index = excp; |
29 | - tcg_gen_or_i32(var, var, tmp); | 32 | env->exception.syndrome = syndrome; |
30 | - tcg_temp_free_i32(tmp); | 33 | env->exception.target_el = target_el; |
34 | - | ||
35 | - return cs; | ||
31 | -} | 36 | -} |
32 | - | 37 | - |
33 | static void gen_neon_dup_low16(TCGv_i32 var) | 38 | -void raise_exception(CPUARMState *env, uint32_t excp, |
34 | { | 39 | - uint32_t syndrome, uint32_t target_el) |
35 | TCGv_i32 tmp = tcg_temp_new_i32(); | 40 | -{ |
36 | @@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var) | 41 | - CPUState *cs = do_raise_exception(env, excp, syndrome, target_el); |
37 | tcg_temp_free_i32(tmp); | 42 | cpu_loop_exit(cs); |
38 | } | 43 | } |
39 | 44 | ||
40 | -static TCGv_i32 gen_load_and_replicate(DisasContext *s, TCGv_i32 addr, int size) | ||
41 | -{ | ||
42 | - /* Load a single Neon element and replicate into a 32 bit TCG reg */ | ||
43 | - TCGv_i32 tmp = tcg_temp_new_i32(); | ||
44 | - switch (size) { | ||
45 | - case 0: | ||
46 | - gen_aa32_ld8u(s, tmp, addr, get_mem_index(s)); | ||
47 | - gen_neon_dup_u8(tmp, 0); | ||
48 | - break; | ||
49 | - case 1: | ||
50 | - gen_aa32_ld16u(s, tmp, addr, get_mem_index(s)); | ||
51 | - gen_neon_dup_low16(tmp); | ||
52 | - break; | ||
53 | - case 2: | ||
54 | - gen_aa32_ld32u(s, tmp, addr, get_mem_index(s)); | ||
55 | - break; | ||
56 | - default: /* Avoid compiler warnings. */ | ||
57 | - abort(); | ||
58 | - } | ||
59 | - return tmp; | ||
60 | -} | ||
61 | - | ||
62 | static int handle_vsel(uint32_t insn, uint32_t rd, uint32_t rn, uint32_t rm, | ||
63 | uint32_t dp) | ||
64 | { | ||
65 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) | ||
66 | int load; | ||
67 | int shift; | ||
68 | int n; | ||
69 | + int vec_size; | ||
70 | TCGv_i32 addr; | ||
71 | TCGv_i32 tmp; | ||
72 | TCGv_i32 tmp2; | ||
73 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) | ||
74 | } | ||
75 | addr = tcg_temp_new_i32(); | ||
76 | load_reg_var(s, addr, rn); | ||
77 | - if (nregs == 1) { | ||
78 | - /* VLD1 to all lanes: bit 5 indicates how many Dregs to write */ | ||
79 | - tmp = gen_load_and_replicate(s, addr, size); | ||
80 | - tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 0)); | ||
81 | - tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 1)); | ||
82 | - if (insn & (1 << 5)) { | ||
83 | - tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd + 1, 0)); | ||
84 | - tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd + 1, 1)); | ||
85 | - } | ||
86 | - tcg_temp_free_i32(tmp); | ||
87 | - } else { | ||
88 | - /* VLD2/3/4 to all lanes: bit 5 indicates register stride */ | ||
89 | - stride = (insn & (1 << 5)) ? 2 : 1; | ||
90 | - for (reg = 0; reg < nregs; reg++) { | ||
91 | - tmp = gen_load_and_replicate(s, addr, size); | ||
92 | - tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 0)); | ||
93 | - tcg_gen_st_i32(tmp, cpu_env, neon_reg_offset(rd, 1)); | ||
94 | - tcg_temp_free_i32(tmp); | ||
95 | - tcg_gen_addi_i32(addr, addr, 1 << size); | ||
96 | - rd += stride; | ||
97 | + | ||
98 | + /* VLD1 to all lanes: bit 5 indicates how many Dregs to write. | ||
99 | + * VLD2/3/4 to all lanes: bit 5 indicates register stride. | ||
100 | + */ | ||
101 | + stride = (insn & (1 << 5)) ? 2 : 1; | ||
102 | + vec_size = nregs == 1 ? stride * 8 : 8; | ||
103 | + | ||
104 | + tmp = tcg_temp_new_i32(); | ||
105 | + for (reg = 0; reg < nregs; reg++) { | ||
106 | + gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s), | ||
107 | + s->be_data | size); | ||
108 | + if ((rd & 1) && vec_size == 16) { | ||
109 | + /* We cannot write 16 bytes at once because the | ||
110 | + * destination is unaligned. | ||
111 | + */ | ||
112 | + tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0), | ||
113 | + 8, 8, tmp); | ||
114 | + tcg_gen_gvec_mov(0, neon_reg_offset(rd + 1, 0), | ||
115 | + neon_reg_offset(rd, 0), 8, 8); | ||
116 | + } else { | ||
117 | + tcg_gen_gvec_dup_i32(size, neon_reg_offset(rd, 0), | ||
118 | + vec_size, vec_size, tmp); | ||
119 | } | ||
120 | + tcg_gen_addi_i32(addr, addr, 1 << size); | ||
121 | + rd += stride; | ||
122 | } | ||
123 | + tcg_temp_free_i32(tmp); | ||
124 | tcg_temp_free_i32(addr); | ||
125 | stride = (1 << size) * nregs; | ||
126 | } else { | ||
127 | -- | 45 | -- |
128 | 2.19.1 | 46 | 2.20.1 |
129 | 47 | ||
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Jamie Iles <jamie@nuviainc.com> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 3 | Now that raise_exception_ra restores the state before raising the |
4 | Message-id: 20181011205206.3552-12-richard.henderson@linaro.org | 4 | exception we can use raise_exception_ra to perform the state restore + |
5 | exception raising without clobbering the syndrome. | ||
6 | |||
7 | Cc: Richard Henderson <richard.henderson@linaro.org> | ||
8 | Cc: Peter Maydell <peter.maydell@linaro.org> | ||
9 | Signed-off-by: Jamie Iles <jamie@nuviainc.com> | ||
10 | [PMM: Keep the one line of the comment that is still relevant] | ||
5 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 11 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
6 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 12 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
7 | --- | 13 | --- |
8 | target/arm/translate.c | 31 +++++++++++++++---------------- | 14 | target/arm/mte_helper.c | 12 +++--------- |
9 | 1 file changed, 15 insertions(+), 16 deletions(-) | 15 | 1 file changed, 3 insertions(+), 9 deletions(-) |
10 | 16 | ||
11 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 17 | diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c |
12 | index XXXXXXX..XXXXXXX 100644 | 18 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/target/arm/translate.c | 19 | --- a/target/arm/mte_helper.c |
14 | +++ b/target/arm/translate.c | 20 | +++ b/target/arm/mte_helper.c |
15 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 21 | @@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc, |
16 | vec_size, vec_size); | 22 | |
17 | } | 23 | switch (tcf) { |
18 | return 0; | 24 | case 1: |
19 | + | 25 | - /* |
20 | + case NEON_3R_VMUL: /* VMUL */ | 26 | - * Tag check fail causes a synchronous exception. |
21 | + if (u) { | 27 | - * |
22 | + /* Polynomial case allows only P8 and is handled below. */ | 28 | - * In restore_state_to_opc, we set the exception syndrome |
23 | + if (size != 0) { | 29 | - * for the load or store operation. Unwind first so we |
24 | + return 1; | 30 | - * may overwrite that with the syndrome for the tag check. |
25 | + } | 31 | - */ |
26 | + } else { | 32 | - cpu_restore_state(env_cpu(env), ra, true); |
27 | + tcg_gen_gvec_mul(size, rd_ofs, rn_ofs, rm_ofs, | 33 | + /* Tag check fail causes a synchronous exception. */ |
28 | + vec_size, vec_size); | 34 | env->exception.vaddress = dirty_ptr; |
29 | + return 0; | 35 | |
30 | + } | 36 | is_write = FIELD_EX32(desc, MTEDESC, WRITE); |
31 | + break; | 37 | syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0, |
32 | } | 38 | is_write, 0x11); |
33 | if (size == 3) { | 39 | - raise_exception(env, EXCP_DATA_ABORT, syn, exception_target_el(env)); |
34 | /* 64-bit element instructions. */ | 40 | + raise_exception_ra(env, EXCP_DATA_ABORT, syn, |
35 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 41 | + exception_target_el(env), ra); |
36 | return 1; | 42 | /* noreturn, but fall through to the assert anyway */ |
37 | } | 43 | |
38 | break; | 44 | case 0: |
39 | - case NEON_3R_VMUL: | ||
40 | - if (u && (size != 0)) { | ||
41 | - /* UNDEF on invalid size for polynomial subcase */ | ||
42 | - return 1; | ||
43 | - } | ||
44 | - break; | ||
45 | case NEON_3R_VFM_VQRDMLSH: | ||
46 | if (!arm_dc_feature(s, ARM_FEATURE_VFP4)) { | ||
47 | return 1; | ||
48 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
49 | } | ||
50 | break; | ||
51 | case NEON_3R_VMUL: | ||
52 | - if (u) { /* polynomial */ | ||
53 | - gen_helper_neon_mul_p8(tmp, tmp, tmp2); | ||
54 | - } else { /* Integer */ | ||
55 | - switch (size) { | ||
56 | - case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break; | ||
57 | - case 1: gen_helper_neon_mul_u16(tmp, tmp, tmp2); break; | ||
58 | - case 2: tcg_gen_mul_i32(tmp, tmp, tmp2); break; | ||
59 | - default: abort(); | ||
60 | - } | ||
61 | - } | ||
62 | + /* VMUL.P8; other cases already eliminated. */ | ||
63 | + gen_helper_neon_mul_p8(tmp, tmp, tmp2); | ||
64 | break; | ||
65 | case NEON_3R_VPMAX: | ||
66 | GEN_NEON_INTEGER_OP(pmax); | ||
67 | -- | 45 | -- |
68 | 2.19.1 | 46 | 2.20.1 |
69 | 47 | ||
1 | From: Markus Armbruster <armbru@redhat.com> | 1 | From: Jamie Iles <jamie@nuviainc.com> |
---|---|---|---|
2 | 2 | ||
3 | Device models aren't supposed to go on fishing expeditions for | 3 | The sequence cpu_restore_state() + raise_exception() is equivalent to |
4 | backends. They should expose suitable properties for the user to set. | 4 | raise_exception_ra(), so use that instead. (In this case we never |
5 | For onboard devices, board code sets them. | 5 | cared about the syndrome value, because M-profile doesn't use the |
6 | syndrome; the old code was just written unnecessarily awkwardly.) | ||
6 | 7 | ||
7 | Device ssi-sd picks up its block backend in its init() method with | 8 | Cc: Richard Henderson <richard.henderson@linaro.org> |
8 | drive_get_next() instead. This mistake is already marked FIXME since | 9 | Cc: Peter Maydell <peter.maydell@linaro.org> |
9 | commit af9e40a. | 10 | Signed-off-by: Jamie Iles <jamie@nuviainc.com> |
10 | 11 | [PMM: Retain edited version of comment; rewrite commit message] | |
11 | Unset user_creatable to remove the mistake from our external | 12 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
12 | interface. Since the SSI bus doesn't support hotplug, only -device | ||
13 | can be affected. Only certain ARM machines have ssi-sd and provide an | ||
14 | SSI bus for it; this patch breaks -device ssi-sd for these machines. | ||
15 | No actual use of -device ssi-sd is known. | ||
16 | |||
17 | Signed-off-by: Markus Armbruster <armbru@redhat.com> | ||
18 | Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org> | ||
19 | Acked-by: Thomas Huth <thuth@redhat.com> | ||
20 | Message-id: 20181009060835.4608-1-armbru@redhat.com | ||
21 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 13 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
22 | --- | 14 | --- |
23 | hw/sd/ssi-sd.c | 2 ++ | 15 | target/arm/m_helper.c | 5 +---- |
24 | 1 file changed, 2 insertions(+) | 16 | target/arm/op_helper.c | 9 +++------ |
17 | 2 files changed, 4 insertions(+), 10 deletions(-) | ||
25 | 18 | ||
26 | diff --git a/hw/sd/ssi-sd.c b/hw/sd/ssi-sd.c | 19 | diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c |
27 | index XXXXXXX..XXXXXXX 100644 | 20 | index XXXXXXX..XXXXXXX 100644 |
28 | --- a/hw/sd/ssi-sd.c | 21 | --- a/target/arm/m_helper.c |
29 | +++ b/hw/sd/ssi-sd.c | 22 | +++ b/target/arm/m_helper.c |
30 | @@ -XXX,XX +XXX,XX @@ static void ssi_sd_class_init(ObjectClass *klass, void *data) | 23 | @@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val) |
31 | k->cs_polarity = SSI_CS_LOW; | 24 | limit = is_psp ? env->v7m.psplim[false] : env->v7m.msplim[false]; |
32 | dc->vmsd = &vmstate_ssi_sd; | 25 | |
33 | dc->reset = ssi_sd_reset; | 26 | if (val < limit) { |
34 | + /* Reason: init() method uses drive_get_next() */ | 27 | - CPUState *cs = env_cpu(env); |
35 | + dc->user_creatable = false; | 28 | - |
29 | - cpu_restore_state(cs, GETPC(), true); | ||
30 | - raise_exception(env, EXCP_STKOF, 0, 1); | ||
31 | + raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC()); | ||
32 | } | ||
33 | |||
34 | if (is_psp) { | ||
35 | diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c | ||
36 | index XXXXXXX..XXXXXXX 100644 | ||
37 | --- a/target/arm/op_helper.c | ||
38 | +++ b/target/arm/op_helper.c | ||
39 | @@ -XXX,XX +XXX,XX @@ void HELPER(v8m_stackcheck)(CPUARMState *env, uint32_t newvalue) | ||
40 | * raising an exception if the limit is breached. | ||
41 | */ | ||
42 | if (newvalue < v7m_sp_limit(env)) { | ||
43 | - CPUState *cs = env_cpu(env); | ||
44 | - | ||
45 | /* | ||
46 | * Stack limit exceptions are a rare case, so rather than syncing | ||
47 | - * PC/condbits before the call, we use cpu_restore_state() to | ||
48 | - * get them right before raising the exception. | ||
49 | + * PC/condbits before the call, we use raise_exception_ra() so | ||
50 | + * that cpu_restore_state() will sort them out. | ||
51 | */ | ||
52 | - cpu_restore_state(cs, GETPC(), true); | ||
53 | - raise_exception(env, EXCP_STKOF, 0, 1); | ||
54 | + raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC()); | ||
55 | } | ||
36 | } | 56 | } |
37 | 57 | ||
38 | static const TypeInfo ssi_sd_info = { | ||
39 | -- | 58 | -- |
40 | 2.19.1 | 59 | 2.20.1 |
41 | 60 | ||
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | Having V6 alone imply jazelle was wrong for cortex-m0. | 3 | Note that the SVE BFLOAT16 support does not require SVE2, |
4 | Change to an assertion for V6 & !M. | 4 | it is an independent extension. |
5 | 5 | ||
6 | This was harmless, because the only place we tested ARM_FEATURE_JAZELLE | 6 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
7 | was for 'bxj' in disas_arm(), which is unreachable for M-profile cores. | ||
8 | |||
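Both patches in this pair lean on ID-register fields rather than ad-hoc feature flags: the left one synthesises an ID_ISAR1.Jazelle value for the ARMv5 cores, the right one adds tests of the BF16 fields. A minimal standalone sketch of that pattern follows; the extract/deposit helpers mirror QEMU's bitops semantics, but the bit offset used here is only an illustrative assumption, not a statement about the real ID_ISAR6 layout.

```c
#include <stdint.h>
#include <stdio.h>

/* A feature counts as "present" when its 4-bit field in the relevant
 * ID register is non-zero; the field offset below is illustrative only.
 */
static uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

static uint32_t deposit32(uint32_t value, int start, int length, uint32_t field)
{
    uint32_t mask = (~0u >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

int main(void)
{
    uint32_t id_reg = 0;
    id_reg = deposit32(id_reg, 20, 4, 1);       /* "implement" the feature */
    printf("feature present: %d\n", extract32(id_reg, 20, 4) != 0);
    return 0;
}
```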
9 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | ||
10 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 7 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
11 | Message-id: 20181016223115.24100-6-richard.henderson@linaro.org | 8 | Message-id: 20210525225817.400336-2-richard.henderson@linaro.org |
12 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
13 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
14 | --- | 10 | --- |
15 | target/arm/cpu.h | 6 +++++- | 11 | target/arm/cpu.h | 15 +++++++++++++++ |
16 | target/arm/cpu.c | 17 ++++++++++++++--- | 12 | 1 file changed, 15 insertions(+) |
17 | target/arm/translate.c | 2 +- | ||
18 | 3 files changed, 20 insertions(+), 5 deletions(-) | ||
19 | 13 | ||
20 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h | 14 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h |
21 | index XXXXXXX..XXXXXXX 100644 | 15 | index XXXXXXX..XXXXXXX 100644 |
22 | --- a/target/arm/cpu.h | 16 | --- a/target/arm/cpu.h |
23 | +++ b/target/arm/cpu.h | 17 | +++ b/target/arm/cpu.h |
24 | @@ -XXX,XX +XXX,XX @@ enum arm_features { | 18 | @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_predinv(const ARMISARegisters *id) |
25 | ARM_FEATURE_PMU, /* has PMU support */ | 19 | return FIELD_EX32(id->id_isar6, ID_ISAR6, SPECRES) != 0; |
26 | ARM_FEATURE_VBAR, /* has cp15 VBAR */ | ||
27 | ARM_FEATURE_M_SECURITY, /* M profile Security Extension */ | ||
28 | - ARM_FEATURE_JAZELLE, /* has (trivial) Jazelle implementation */ | ||
29 | ARM_FEATURE_SVE, /* has Scalable Vector Extension */ | ||
30 | ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */ | ||
31 | ARM_FEATURE_M_MAIN, /* M profile Main Extension */ | ||
32 | @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_arm_div(const ARMISARegisters *id) | ||
33 | return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) > 1; | ||
34 | } | 20 | } |
35 | 21 | ||
36 | +static inline bool isar_feature_jazelle(const ARMISARegisters *id) | 22 | +static inline bool isar_feature_aa32_bf16(const ARMISARegisters *id) |
37 | +{ | 23 | +{ |
38 | + return FIELD_EX32(id->id_isar1, ID_ISAR1, JAZELLE) != 0; | 24 | + return FIELD_EX32(id->id_isar6, ID_ISAR6, BF16) != 0; |
39 | +} | 25 | +} |
40 | + | 26 | + |
41 | static inline bool isar_feature_aa32_aes(const ARMISARegisters *id) | 27 | static inline bool isar_feature_aa32_i8mm(const ARMISARegisters *id) |
42 | { | 28 | { |
43 | return FIELD_EX32(id->id_isar5, ID_ISAR5, AES) != 0; | 29 | return FIELD_EX32(id->id_isar6, ID_ISAR6, I8MM) != 0; |
44 | diff --git a/target/arm/cpu.c b/target/arm/cpu.c | 30 | @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id) |
45 | index XXXXXXX..XXXXXXX 100644 | 31 | return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >= 2; |
46 | --- a/target/arm/cpu.c | 32 | } |
47 | +++ b/target/arm/cpu.c | 33 | |
48 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) | 34 | +static inline bool isar_feature_aa64_bf16(const ARMISARegisters *id) |
49 | } | 35 | +{ |
50 | if (arm_feature(env, ARM_FEATURE_V6)) { | 36 | + return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, BF16) != 0; |
51 | set_feature(env, ARM_FEATURE_V5); | 37 | +} |
52 | - set_feature(env, ARM_FEATURE_JAZELLE); | ||
53 | if (!arm_feature(env, ARM_FEATURE_M)) { | ||
54 | + assert(cpu_isar_feature(jazelle, cpu)); | ||
55 | set_feature(env, ARM_FEATURE_AUXCR); | ||
56 | } | ||
57 | } | ||
58 | @@ -XXX,XX +XXX,XX @@ static void arm926_initfn(Object *obj) | ||
59 | set_feature(&cpu->env, ARM_FEATURE_VFP); | ||
60 | set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS); | ||
61 | set_feature(&cpu->env, ARM_FEATURE_CACHE_TEST_CLEAN); | ||
62 | - set_feature(&cpu->env, ARM_FEATURE_JAZELLE); | ||
63 | cpu->midr = 0x41069265; | ||
64 | cpu->reset_fpsid = 0x41011090; | ||
65 | cpu->ctr = 0x1dd20d2; | ||
66 | cpu->reset_sctlr = 0x00090078; | ||
67 | + | 38 | + |
68 | + /* | 39 | static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id) |
69 | + * ARMv5 does not have the ID_ISAR registers, but we can still | 40 | { |
70 | + * set the field to indicate Jazelle support within QEMU. | 41 | /* We always set the AdvSIMD and FP fields identically. */ |
71 | + */ | 42 | @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve2_bitperm(const ARMISARegisters *id) |
72 | + cpu->isar.id_isar1 = FIELD_DP32(cpu->isar.id_isar1, ID_ISAR1, JAZELLE, 1); | 43 | return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BITPERM) != 0; |
73 | } | 44 | } |
74 | 45 | ||
75 | static void arm946_initfn(Object *obj) | 46 | +static inline bool isar_feature_aa64_sve_bf16(const ARMISARegisters *id) |
76 | @@ -XXX,XX +XXX,XX @@ static void arm1026_initfn(Object *obj) | 47 | +{ |
77 | set_feature(&cpu->env, ARM_FEATURE_AUXCR); | 48 | + return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BFLOAT16) != 0; |
78 | set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS); | 49 | +} |
79 | set_feature(&cpu->env, ARM_FEATURE_CACHE_TEST_CLEAN); | ||
80 | - set_feature(&cpu->env, ARM_FEATURE_JAZELLE); | ||
81 | cpu->midr = 0x4106a262; | ||
82 | cpu->reset_fpsid = 0x410110a0; | ||
83 | cpu->ctr = 0x1dd20d2; | ||
84 | cpu->reset_sctlr = 0x00090078; | ||
85 | cpu->reset_auxcr = 1; | ||
86 | + | 50 | + |
87 | + /* | 51 | static inline bool isar_feature_aa64_sve2_sha3(const ARMISARegisters *id) |
88 | + * ARMv5 does not have the ID_ISAR registers, but we can still | 52 | { |
89 | + * set the field to indicate Jazelle support within QEMU. | 53 | return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, SHA3) != 0; |
90 | + */ | ||
91 | + cpu->isar.id_isar1 = FIELD_DP32(cpu->isar.id_isar1, ID_ISAR1, JAZELLE, 1); | ||
92 | + | ||
93 | { | ||
94 | /* The 1026 had an IFAR at c6,c0,0,1 rather than the ARMv6 c6,c0,0,2 */ | ||
95 | ARMCPRegInfo ifar = { | ||
96 | diff --git a/target/arm/translate.c b/target/arm/translate.c | ||
97 | index XXXXXXX..XXXXXXX 100644 | ||
98 | --- a/target/arm/translate.c | ||
99 | +++ b/target/arm/translate.c | ||
100 | @@ -XXX,XX +XXX,XX @@ | ||
101 | #define ENABLE_ARCH_5 arm_dc_feature(s, ARM_FEATURE_V5) | ||
102 | /* currently all emulated v5 cores are also v5TE, so don't bother */ | ||
103 | #define ENABLE_ARCH_5TE arm_dc_feature(s, ARM_FEATURE_V5) | ||
104 | -#define ENABLE_ARCH_5J arm_dc_feature(s, ARM_FEATURE_JAZELLE) | ||
105 | +#define ENABLE_ARCH_5J dc_isar_feature(jazelle, s) | ||
106 | #define ENABLE_ARCH_6 arm_dc_feature(s, ARM_FEATURE_V6) | ||
107 | #define ENABLE_ARCH_6K arm_dc_feature(s, ARM_FEATURE_V6K) | ||
108 | #define ENABLE_ARCH_6T2 arm_dc_feature(s, ARM_FEATURE_THUMB2) | ||
109 | -- | 54 | -- |
110 | 2.19.1 | 55 | 2.20.1 |
111 | 56 | ||
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | For a sequence of loads or stores from a single register, | 3 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
4 | little-endian operations can be promoted to an 8-byte op. | ||
5 | This can reduce the number of operations by a factor of 8. | ||
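A standalone illustration (not QEMU code) of why the promotion described above is legal for little-endian element sequences: eight consecutive byte elements loaded into a register leave the same bit pattern as a single 8-byte little-endian load, so the per-element loop can collapse into one 64-bit operation. The memcpy comparison assumes a little-endian host.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint8_t mem[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    uint64_t elementwise = 0, promoted;

    for (int i = 0; i < 8; i++) {
        elementwise |= (uint64_t)mem[i] << (8 * i);   /* element i -> bits [8i+7:8i] */
    }
    memcpy(&promoted, mem, sizeof(promoted));         /* one 8-byte LE load (LE host assumed) */

    printf("%s\n", elementwise == promoted ? "same" : "different");
    return 0;
}
```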
6 | |||
7 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 4 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
8 | Message-id: 20181011205206.3552-5-richard.henderson@linaro.org | 5 | Message-id: 20210525225817.400336-3-richard.henderson@linaro.org |
9 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 6 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
11 | --- | 7 | --- |
12 | target/arm/translate-a64.c | 66 +++++++++++++++++++++++--------------- | 8 | target/arm/translate-a64.c | 15 ++++++--------- |
13 | 1 file changed, 40 insertions(+), 26 deletions(-) | 9 | 1 file changed, 6 insertions(+), 9 deletions(-) |
14 | 10 | ||
15 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | 11 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c |
16 | index XXXXXXX..XXXXXXX 100644 | 12 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/target/arm/translate-a64.c | 13 | --- a/target/arm/translate-a64.c |
18 | +++ b/target/arm/translate-a64.c | 14 | +++ b/target/arm/translate-a64.c |
19 | @@ -XXX,XX +XXX,XX @@ static void write_vec_element_i32(DisasContext *s, TCGv_i32 tcg_src, | 15 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) |
20 | 16 | int rd = extract32(insn, 0, 5); | |
21 | /* Store from vector register to memory */ | 17 | |
22 | static void do_vec_st(DisasContext *s, int srcidx, int element, | 18 | if (mos) { |
23 | - TCGv_i64 tcg_addr, int size) | 19 | - unallocated_encoding(s); |
24 | + TCGv_i64 tcg_addr, int size, TCGMemOp endian) | 20 | - return; |
25 | { | 21 | + goto do_unallocated; |
26 | - TCGMemOp memop = s->be_data + size; | ||
27 | TCGv_i64 tcg_tmp = tcg_temp_new_i64(); | ||
28 | |||
29 | read_vec_element(s, tcg_tmp, srcidx, element, size); | ||
30 | - tcg_gen_qemu_st_i64(tcg_tmp, tcg_addr, get_mem_index(s), memop); | ||
31 | + tcg_gen_qemu_st_i64(tcg_tmp, tcg_addr, get_mem_index(s), endian | size); | ||
32 | |||
33 | tcg_temp_free_i64(tcg_tmp); | ||
34 | } | ||
35 | |||
36 | /* Load from memory to vector register */ | ||
37 | static void do_vec_ld(DisasContext *s, int destidx, int element, | ||
38 | - TCGv_i64 tcg_addr, int size) | ||
39 | + TCGv_i64 tcg_addr, int size, TCGMemOp endian) | ||
40 | { | ||
41 | - TCGMemOp memop = s->be_data + size; | ||
42 | TCGv_i64 tcg_tmp = tcg_temp_new_i64(); | ||
43 | |||
44 | - tcg_gen_qemu_ld_i64(tcg_tmp, tcg_addr, get_mem_index(s), memop); | ||
45 | + tcg_gen_qemu_ld_i64(tcg_tmp, tcg_addr, get_mem_index(s), endian | size); | ||
46 | write_vec_element(s, tcg_tmp, destidx, element, size); | ||
47 | |||
48 | tcg_temp_free_i64(tcg_tmp); | ||
49 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn) | ||
50 | bool is_postidx = extract32(insn, 23, 1); | ||
51 | bool is_q = extract32(insn, 30, 1); | ||
52 | TCGv_i64 tcg_addr, tcg_rn, tcg_ebytes; | ||
53 | + TCGMemOp endian = s->be_data; | ||
54 | |||
55 | - int ebytes = 1 << size; | ||
56 | - int elements = (is_q ? 128 : 64) / (8 << size); | ||
57 | + int ebytes; /* bytes per element */ | ||
58 | + int elements; /* elements per vector */ | ||
59 | int rpt; /* num iterations */ | ||
60 | int selem; /* structure elements */ | ||
61 | int r; | ||
62 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn) | ||
63 | gen_check_sp_alignment(s); | ||
64 | } | 22 | } |
65 | 23 | ||
66 | + /* For our purposes, bytes are always little-endian. */ | 24 | switch (opcode) { |
67 | + if (size == 0) { | 25 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) |
68 | + endian = MO_LE; | 26 | /* FCVT between half, single and double precision */ |
69 | + } | 27 | int dtype = extract32(opcode, 0, 2); |
70 | + | 28 | if (type == 2 || dtype == type) { |
71 | + /* Consecutive little-endian elements from a single register | 29 | - unallocated_encoding(s); |
72 | + * can be promoted to a larger little-endian operation. | 30 | - return; |
73 | + */ | 31 | + goto do_unallocated; |
74 | + if (selem == 1 && endian == MO_LE) { | 32 | } |
75 | + size = 3; | 33 | if (!fp_access_check(s)) { |
76 | + } | 34 | return; |
77 | + ebytes = 1 << size; | 35 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) |
78 | + elements = (is_q ? 16 : 8) / ebytes; | 36 | |
79 | + | 37 | case 0x10 ... 0x13: /* FRINT{32,64}{X,Z} */ |
80 | tcg_rn = cpu_reg_sp(s, rn); | 38 | if (type > 1 || !dc_isar_feature(aa64_frint, s)) { |
81 | tcg_addr = tcg_temp_new_i64(); | 39 | - unallocated_encoding(s); |
82 | tcg_gen_mov_i64(tcg_addr, tcg_rn); | 40 | - return; |
83 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn) | 41 | + goto do_unallocated; |
84 | for (r = 0; r < rpt; r++) { | 42 | } |
85 | int e; | 43 | /* fall through */ |
86 | for (e = 0; e < elements; e++) { | 44 | case 0x0 ... 0x3: |
87 | - int tt = (rt + r) % 32; | 45 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) |
88 | int xs; | 46 | break; |
89 | for (xs = 0; xs < selem; xs++) { | 47 | case 3: |
90 | + int tt = (rt + r + xs) % 32; | 48 | if (!dc_isar_feature(aa64_fp16, s)) { |
91 | if (is_store) { | 49 | - unallocated_encoding(s); |
92 | - do_vec_st(s, tt, e, tcg_addr, size); | 50 | - return; |
93 | + do_vec_st(s, tt, e, tcg_addr, size, endian); | 51 | + goto do_unallocated; |
94 | } else { | ||
95 | - do_vec_ld(s, tt, e, tcg_addr, size); | ||
96 | - | ||
97 | - /* For non-quad operations, setting a slice of the low | ||
98 | - * 64 bits of the register clears the high 64 bits (in | ||
99 | - * the ARM ARM pseudocode this is implicit in the fact | ||
100 | - * that 'rval' is a 64 bit wide variable). | ||
101 | - * For quad operations, we might still need to zero the | ||
102 | - * high bits of SVE. We optimize by noticing that we only | ||
103 | - * need to do this the first time we touch a register. | ||
104 | - */ | ||
105 | - if (e == 0 && (r == 0 || xs == selem - 1)) { | ||
106 | - clear_vec_high(s, is_q, tt); | ||
107 | - } | ||
108 | + do_vec_ld(s, tt, e, tcg_addr, size, endian); | ||
109 | } | ||
110 | tcg_gen_add_i64(tcg_addr, tcg_addr, tcg_ebytes); | ||
111 | - tt = (tt + 1) % 32; | ||
112 | } | 52 | } |
53 | |||
54 | if (!fp_access_check(s)) { | ||
55 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) | ||
56 | handle_fp_1src_half(s, opcode, rd, rn); | ||
57 | break; | ||
58 | default: | ||
59 | - unallocated_encoding(s); | ||
60 | + goto do_unallocated; | ||
113 | } | 61 | } |
62 | break; | ||
63 | |||
64 | default: | ||
65 | + do_unallocated: | ||
66 | unallocated_encoding(s); | ||
67 | break; | ||
114 | } | 68 | } |
115 | |||
116 | + if (!is_store) { | ||
117 | + /* For non-quad operations, setting a slice of the low | ||
118 | + * 64 bits of the register clears the high 64 bits (in | ||
119 | + * the ARM ARM pseudocode this is implicit in the fact | ||
120 | + * that 'rval' is a 64 bit wide variable). | ||
121 | + * For quad operations, we might still need to zero the | ||
122 | + * high bits of SVE. | ||
123 | + */ | ||
124 | + for (r = 0; r < rpt * selem; r++) { | ||
125 | + int tt = (rt + r) % 32; | ||
126 | + clear_vec_high(s, is_q, tt); | ||
127 | + } | ||
128 | + } | ||
129 | + | ||
130 | if (is_postidx) { | ||
131 | int rm = extract32(insn, 16, 5); | ||
132 | if (rm == 31) { | ||
133 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn) | ||
134 | } else { | ||
135 | /* Load/store one element per register */ | ||
136 | if (is_load) { | ||
137 | - do_vec_ld(s, rt, index, tcg_addr, scale); | ||
138 | + do_vec_ld(s, rt, index, tcg_addr, scale, s->be_data); | ||
139 | } else { | ||
140 | - do_vec_st(s, rt, index, tcg_addr, scale); | ||
141 | + do_vec_st(s, rt, index, tcg_addr, scale, s->be_data); | ||
142 | } | ||
143 | } | ||
144 | tcg_gen_add_i64(tcg_addr, tcg_addr, tcg_ebytes); | ||
145 | -- | 69 | -- |
146 | 2.19.1 | 70 | 2.20.1 |
147 | 71 | ||
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | Instead of shifts and masks, use direct loads and stores from the neon | 3 | This is the 64-bit BFCVT and the 32-bit VCVT{B,T}.BF16.F32. |
4 | register file. Mirror the iteration structure of the ARM pseudocode | ||
5 | more closely. Correct the parameters of the VLD2 A2 insn. | ||
6 | 4 | ||
7 | Note that this includes a bugfix for handling of the insn | 5 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
8 | "VLD2 (multiple 2-element structures)" -- we were using an | ||
9 | incorrect stride value. | ||
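For the BFCVT/VCVT{B,T}.BF16.F32 side of this pair, here is a simplified standalone sketch of the conversion being wired up: keep the top 16 bits of an IEEE float32, rounding to nearest-even. QEMU's float32_to_bfloat16() goes through softfloat and also handles NaNs, exception flags and the FPCR rounding mode, none of which is modelled in this sketch.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint16_t f32_to_bf16_sketch(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    uint32_t lsb = (bits >> 16) & 1;      /* round to nearest, ties to even */
    bits += 0x7fffu + lsb;                /* NaN/Inf cases not handled here */
    return (uint16_t)(bits >> 16);
}

int main(void)
{
    printf("bf16(1.0f)    = 0x%04x\n", f32_to_bf16_sketch(1.0f));     /* 0x3f80 */
    printf("bf16(3.1416f) = 0x%04x\n", f32_to_bf16_sketch(3.1416f));  /* ~0x4049 */
    return 0;
}
```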
10 | |||
11 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
12 | Message-id: 20181011205206.3552-19-richard.henderson@linaro.org | 7 | Message-id: 20210525225817.400336-4-richard.henderson@linaro.org |
13 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
14 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 8 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
15 | --- | 9 | --- |
16 | target/arm/translate.c | 170 ++++++++++++++++++----------------------- | 10 | target/arm/helper.h | 1 + |
17 | 1 file changed, 74 insertions(+), 96 deletions(-) | 11 | target/arm/vfp.decode | 2 ++ |
12 | target/arm/translate-a64.c | 19 +++++++++++++++++++ | ||
13 | target/arm/translate-vfp.c | 24 ++++++++++++++++++++++++ | ||
14 | target/arm/vfp_helper.c | 5 +++++ | ||
15 | 5 files changed, 51 insertions(+) | ||
18 | 16 | ||
19 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 17 | diff --git a/target/arm/helper.h b/target/arm/helper.h |
20 | index XXXXXXX..XXXXXXX 100644 | 18 | index XXXXXXX..XXXXXXX 100644 |
21 | --- a/target/arm/translate.c | 19 | --- a/target/arm/helper.h |
22 | +++ b/target/arm/translate.c | 20 | +++ b/target/arm/helper.h |
23 | @@ -XXX,XX +XXX,XX @@ static TCGv_i32 neon_load_reg(int reg, int pass) | 21 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env) |
24 | return tmp; | 22 | |
23 | DEF_HELPER_2(vfp_fcvtds, f64, f32, env) | ||
24 | DEF_HELPER_2(vfp_fcvtsd, f32, f64, env) | ||
25 | +DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr) | ||
26 | |||
27 | DEF_HELPER_2(vfp_uitoh, f16, i32, ptr) | ||
28 | DEF_HELPER_2(vfp_uitos, f32, i32, ptr) | ||
29 | diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode | ||
30 | index XXXXXXX..XXXXXXX 100644 | ||
31 | --- a/target/arm/vfp.decode | ||
32 | +++ b/target/arm/vfp.decode | ||
33 | @@ -XXX,XX +XXX,XX @@ VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \ | ||
34 | |||
35 | # VCVTB and VCVTT to f16: Vd format is always vd_sp; | ||
36 | # Vm format depends on size bit | ||
37 | +VCVT_b16_f32 ---- 1110 1.11 0011 .... 1001 t:1 1.0 .... \ | ||
38 | + vd=%vd_sp vm=%vm_sp | ||
39 | VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \ | ||
40 | vd=%vd_sp vm=%vm_sp | ||
41 | VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \ | ||
42 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | ||
43 | index XXXXXXX..XXXXXXX 100644 | ||
44 | --- a/target/arm/translate-a64.c | ||
45 | +++ b/target/arm/translate-a64.c | ||
46 | @@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn) | ||
47 | case 0x3: /* FSQRT */ | ||
48 | gen_helper_vfp_sqrts(tcg_res, tcg_op, cpu_env); | ||
49 | goto done; | ||
50 | + case 0x6: /* BFCVT */ | ||
51 | + gen_fpst = gen_helper_bfcvt; | ||
52 | + break; | ||
53 | case 0x8: /* FRINTN */ | ||
54 | case 0x9: /* FRINTP */ | ||
55 | case 0xa: /* FRINTM */ | ||
56 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) | ||
57 | } | ||
58 | break; | ||
59 | |||
60 | + case 0x6: | ||
61 | + switch (type) { | ||
62 | + case 1: /* BFCVT */ | ||
63 | + if (!dc_isar_feature(aa64_bf16, s)) { | ||
64 | + goto do_unallocated; | ||
65 | + } | ||
66 | + if (!fp_access_check(s)) { | ||
67 | + return; | ||
68 | + } | ||
69 | + handle_fp_1src_single(s, opcode, rd, rn); | ||
70 | + break; | ||
71 | + default: | ||
72 | + goto do_unallocated; | ||
73 | + } | ||
74 | + break; | ||
75 | + | ||
76 | default: | ||
77 | do_unallocated: | ||
78 | unallocated_encoding(s); | ||
79 | diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c | ||
80 | index XXXXXXX..XXXXXXX 100644 | ||
81 | --- a/target/arm/translate-vfp.c | ||
82 | +++ b/target/arm/translate-vfp.c | ||
83 | @@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a) | ||
84 | return true; | ||
25 | } | 85 | } |
26 | 86 | ||
27 | +static void neon_load_element64(TCGv_i64 var, int reg, int ele, TCGMemOp mop) | 87 | +static bool trans_VCVT_b16_f32(DisasContext *s, arg_VCVT_b16_f32 *a) |
28 | +{ | 88 | +{ |
29 | + long offset = neon_element_offset(reg, ele, mop & MO_SIZE); | 89 | + TCGv_ptr fpst; |
90 | + TCGv_i32 tmp; | ||
30 | + | 91 | + |
31 | + switch (mop) { | 92 | + if (!dc_isar_feature(aa32_bf16, s)) { |
32 | + case MO_UB: | 93 | + return false; |
33 | + tcg_gen_ld8u_i64(var, cpu_env, offset); | ||
34 | + break; | ||
35 | + case MO_UW: | ||
36 | + tcg_gen_ld16u_i64(var, cpu_env, offset); | ||
37 | + break; | ||
38 | + case MO_UL: | ||
39 | + tcg_gen_ld32u_i64(var, cpu_env, offset); | ||
40 | + break; | ||
41 | + case MO_Q: | ||
42 | + tcg_gen_ld_i64(var, cpu_env, offset); | ||
43 | + break; | ||
44 | + default: | ||
45 | + g_assert_not_reached(); | ||
46 | + } | 94 | + } |
95 | + | ||
96 | + if (!vfp_access_check(s)) { | ||
97 | + return true; | ||
98 | + } | ||
99 | + | ||
100 | + fpst = fpstatus_ptr(FPST_FPCR); | ||
101 | + tmp = tcg_temp_new_i32(); | ||
102 | + | ||
103 | + vfp_load_reg32(tmp, a->vm); | ||
104 | + gen_helper_bfcvt(tmp, tmp, fpst); | ||
105 | + tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t)); | ||
106 | + tcg_temp_free_ptr(fpst); | ||
107 | + tcg_temp_free_i32(tmp); | ||
108 | + return true; | ||
47 | +} | 109 | +} |
48 | + | 110 | + |
49 | static void neon_store_reg(int reg, int pass, TCGv_i32 var) | 111 | static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a) |
50 | { | 112 | { |
51 | tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass)); | 113 | TCGv_ptr fpst; |
52 | tcg_temp_free_i32(var); | 114 | diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c |
115 | index XXXXXXX..XXXXXXX 100644 | ||
116 | --- a/target/arm/vfp_helper.c | ||
117 | +++ b/target/arm/vfp_helper.c | ||
118 | @@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env) | ||
119 | return float64_to_float32(x, &env->vfp.fp_status); | ||
53 | } | 120 | } |
54 | 121 | ||
55 | +static void neon_store_element64(int reg, int ele, TCGMemOp size, TCGv_i64 var) | 122 | +uint32_t HELPER(bfcvt)(float32 x, void *status) |
56 | +{ | 123 | +{ |
57 | + long offset = neon_element_offset(reg, ele, size); | 124 | + return float32_to_bfloat16(x, status); |
58 | + | ||
59 | + switch (size) { | ||
60 | + case MO_8: | ||
61 | + tcg_gen_st8_i64(var, cpu_env, offset); | ||
62 | + break; | ||
63 | + case MO_16: | ||
64 | + tcg_gen_st16_i64(var, cpu_env, offset); | ||
65 | + break; | ||
66 | + case MO_32: | ||
67 | + tcg_gen_st32_i64(var, cpu_env, offset); | ||
68 | + break; | ||
69 | + case MO_64: | ||
70 | + tcg_gen_st_i64(var, cpu_env, offset); | ||
71 | + break; | ||
72 | + default: | ||
73 | + g_assert_not_reached(); | ||
74 | + } | ||
75 | +} | 125 | +} |
76 | + | 126 | + |
77 | static inline void neon_load_reg64(TCGv_i64 var, int reg) | 127 | /* |
78 | { | 128 | * VFP3 fixed point conversion. The AArch32 versions of fix-to-float |
79 | tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg)); | 129 | * must always round-to-nearest; the AArch64 ones honour the FPSCR |
80 | @@ -XXX,XX +XXX,XX @@ static struct { | ||
81 | int interleave; | ||
82 | int spacing; | ||
83 | } const neon_ls_element_type[11] = { | ||
84 | - {4, 4, 1}, | ||
85 | - {4, 4, 2}, | ||
86 | + {1, 4, 1}, | ||
87 | + {1, 4, 2}, | ||
88 | {4, 1, 1}, | ||
89 | - {4, 2, 1}, | ||
90 | - {3, 3, 1}, | ||
91 | - {3, 3, 2}, | ||
92 | + {2, 2, 2}, | ||
93 | + {1, 3, 1}, | ||
94 | + {1, 3, 2}, | ||
95 | {3, 1, 1}, | ||
96 | {1, 1, 1}, | ||
97 | - {2, 2, 1}, | ||
98 | - {2, 2, 2}, | ||
99 | + {1, 2, 1}, | ||
100 | + {1, 2, 2}, | ||
101 | {2, 1, 1} | ||
102 | }; | ||
103 | |||
104 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) | ||
105 | int shift; | ||
106 | int n; | ||
107 | int vec_size; | ||
108 | + int mmu_idx; | ||
109 | + TCGMemOp endian; | ||
110 | TCGv_i32 addr; | ||
111 | TCGv_i32 tmp; | ||
112 | TCGv_i32 tmp2; | ||
113 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) | ||
114 | rn = (insn >> 16) & 0xf; | ||
115 | rm = insn & 0xf; | ||
116 | load = (insn & (1 << 21)) != 0; | ||
117 | + endian = s->be_data; | ||
118 | + mmu_idx = get_mem_index(s); | ||
119 | if ((insn & (1 << 23)) == 0) { | ||
120 | /* Load store all elements. */ | ||
121 | op = (insn >> 8) & 0xf; | ||
122 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) | ||
123 | nregs = neon_ls_element_type[op].nregs; | ||
124 | interleave = neon_ls_element_type[op].interleave; | ||
125 | spacing = neon_ls_element_type[op].spacing; | ||
126 | - if (size == 3 && (interleave | spacing) != 1) | ||
127 | + if (size == 3 && (interleave | spacing) != 1) { | ||
128 | return 1; | ||
129 | + } | ||
130 | + tmp64 = tcg_temp_new_i64(); | ||
131 | addr = tcg_temp_new_i32(); | ||
132 | + tmp2 = tcg_const_i32(1 << size); | ||
133 | load_reg_var(s, addr, rn); | ||
134 | - stride = (1 << size) * interleave; | ||
135 | for (reg = 0; reg < nregs; reg++) { | ||
136 | - if (interleave > 2 || (interleave == 2 && nregs == 2)) { | ||
137 | - load_reg_var(s, addr, rn); | ||
138 | - tcg_gen_addi_i32(addr, addr, (1 << size) * reg); | ||
139 | - } else if (interleave == 2 && nregs == 4 && reg == 2) { | ||
140 | - load_reg_var(s, addr, rn); | ||
141 | - tcg_gen_addi_i32(addr, addr, 1 << size); | ||
142 | - } | ||
143 | - if (size == 3) { | ||
144 | - tmp64 = tcg_temp_new_i64(); | ||
145 | - if (load) { | ||
146 | - gen_aa32_ld64(s, tmp64, addr, get_mem_index(s)); | ||
147 | - neon_store_reg64(tmp64, rd); | ||
148 | - } else { | ||
149 | - neon_load_reg64(tmp64, rd); | ||
150 | - gen_aa32_st64(s, tmp64, addr, get_mem_index(s)); | ||
151 | - } | ||
152 | - tcg_temp_free_i64(tmp64); | ||
153 | - tcg_gen_addi_i32(addr, addr, stride); | ||
154 | - } else { | ||
155 | - for (pass = 0; pass < 2; pass++) { | ||
156 | - if (size == 2) { | ||
157 | - if (load) { | ||
158 | - tmp = tcg_temp_new_i32(); | ||
159 | - gen_aa32_ld32u(s, tmp, addr, get_mem_index(s)); | ||
160 | - neon_store_reg(rd, pass, tmp); | ||
161 | - } else { | ||
162 | - tmp = neon_load_reg(rd, pass); | ||
163 | - gen_aa32_st32(s, tmp, addr, get_mem_index(s)); | ||
164 | - tcg_temp_free_i32(tmp); | ||
165 | - } | ||
166 | - tcg_gen_addi_i32(addr, addr, stride); | ||
167 | - } else if (size == 1) { | ||
168 | - if (load) { | ||
169 | - tmp = tcg_temp_new_i32(); | ||
170 | - gen_aa32_ld16u(s, tmp, addr, get_mem_index(s)); | ||
171 | - tcg_gen_addi_i32(addr, addr, stride); | ||
172 | - tmp2 = tcg_temp_new_i32(); | ||
173 | - gen_aa32_ld16u(s, tmp2, addr, get_mem_index(s)); | ||
174 | - tcg_gen_addi_i32(addr, addr, stride); | ||
175 | - tcg_gen_shli_i32(tmp2, tmp2, 16); | ||
176 | - tcg_gen_or_i32(tmp, tmp, tmp2); | ||
177 | - tcg_temp_free_i32(tmp2); | ||
178 | - neon_store_reg(rd, pass, tmp); | ||
179 | - } else { | ||
180 | - tmp = neon_load_reg(rd, pass); | ||
181 | - tmp2 = tcg_temp_new_i32(); | ||
182 | - tcg_gen_shri_i32(tmp2, tmp, 16); | ||
183 | - gen_aa32_st16(s, tmp, addr, get_mem_index(s)); | ||
184 | - tcg_temp_free_i32(tmp); | ||
185 | - tcg_gen_addi_i32(addr, addr, stride); | ||
186 | - gen_aa32_st16(s, tmp2, addr, get_mem_index(s)); | ||
187 | - tcg_temp_free_i32(tmp2); | ||
188 | - tcg_gen_addi_i32(addr, addr, stride); | ||
189 | - } | ||
190 | - } else /* size == 0 */ { | ||
191 | - if (load) { | ||
192 | - tmp2 = NULL; | ||
193 | - for (n = 0; n < 4; n++) { | ||
194 | - tmp = tcg_temp_new_i32(); | ||
195 | - gen_aa32_ld8u(s, tmp, addr, get_mem_index(s)); | ||
196 | - tcg_gen_addi_i32(addr, addr, stride); | ||
197 | - if (n == 0) { | ||
198 | - tmp2 = tmp; | ||
199 | - } else { | ||
200 | - tcg_gen_shli_i32(tmp, tmp, n * 8); | ||
201 | - tcg_gen_or_i32(tmp2, tmp2, tmp); | ||
202 | - tcg_temp_free_i32(tmp); | ||
203 | - } | ||
204 | - } | ||
205 | - neon_store_reg(rd, pass, tmp2); | ||
206 | - } else { | ||
207 | - tmp2 = neon_load_reg(rd, pass); | ||
208 | - for (n = 0; n < 4; n++) { | ||
209 | - tmp = tcg_temp_new_i32(); | ||
210 | - if (n == 0) { | ||
211 | - tcg_gen_mov_i32(tmp, tmp2); | ||
212 | - } else { | ||
213 | - tcg_gen_shri_i32(tmp, tmp2, n * 8); | ||
214 | - } | ||
215 | - gen_aa32_st8(s, tmp, addr, get_mem_index(s)); | ||
216 | - tcg_temp_free_i32(tmp); | ||
217 | - tcg_gen_addi_i32(addr, addr, stride); | ||
218 | - } | ||
219 | - tcg_temp_free_i32(tmp2); | ||
220 | - } | ||
221 | + for (n = 0; n < 8 >> size; n++) { | ||
222 | + int xs; | ||
223 | + for (xs = 0; xs < interleave; xs++) { | ||
224 | + int tt = rd + reg + spacing * xs; | ||
225 | + | ||
226 | + if (load) { | ||
227 | + gen_aa32_ld_i64(s, tmp64, addr, mmu_idx, endian | size); | ||
228 | + neon_store_element64(tt, n, size, tmp64); | ||
229 | + } else { | ||
230 | + neon_load_element64(tmp64, tt, n, size); | ||
231 | + gen_aa32_st_i64(s, tmp64, addr, mmu_idx, endian | size); | ||
232 | } | ||
233 | + tcg_gen_add_i32(addr, addr, tmp2); | ||
234 | } | ||
235 | } | ||
236 | - rd += spacing; | ||
237 | } | ||
238 | tcg_temp_free_i32(addr); | ||
239 | - stride = nregs * 8; | ||
240 | + tcg_temp_free_i32(tmp2); | ||
241 | + tcg_temp_free_i64(tmp64); | ||
242 | + stride = nregs * interleave * 8; | ||
243 | } else { | ||
244 | size = (insn >> 10) & 3; | ||
245 | if (size == 3) { | ||
246 | -- | 130 | -- |
247 | 2.19.1 | 131 | 2.20.1 |
248 | 132 | ||
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | Move mla_op and mls_op expanders from translate-a64.c. | 3 | This is BFCVT{N,T} for both AArch64 AdvSIMD and SVE, |
4 | 4 | and VCVT.BF16.F32 for AArch32 NEON. | |
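For the narrowing forms added in this patch, a hedged sketch of the "pair" operation: two float32 inputs become two bfloat16 results packed into one 32-bit word, low element in the low half, reusing the same simplified round-to-nearest-even as the scalar sketch earlier. The real helper performs the conversions through softfloat with full NaN and flag handling, which this illustration omits.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint16_t f32_to_bf16_sketch(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    bits += 0x7fffu + ((bits >> 16) & 1);     /* simplified nearest-even rounding */
    return (uint16_t)(bits >> 16);
}

static uint32_t bf16_pair_sketch(float lo, float hi)
{
    /* pack: low input -> bits [15:0], high input -> bits [31:16] */
    return f32_to_bf16_sketch(lo) | ((uint32_t)f32_to_bf16_sketch(hi) << 16);
}

int main(void)
{
    printf("0x%08x\n", bf16_pair_sketch(1.0f, 2.0f));   /* prints 0x40003f80 */
    return 0;
}
```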
5 | |||
6 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 7 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
6 | Message-id: 20181011205206.3552-16-richard.henderson@linaro.org | 8 | Message-id: 20210525225817.400336-5-richard.henderson@linaro.org |
7 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
8 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
9 | --- | 10 | --- |
10 | target/arm/translate.h | 2 + | 11 | target/arm/helper-sve.h | 4 ++++ |
11 | target/arm/translate-a64.c | 106 ----------------------------- | 12 | target/arm/helper.h | 1 + |
12 | target/arm/translate.c | 134 ++++++++++++++++++++++++++++++++----- | 13 | target/arm/neon-dp.decode | 1 + |
13 | 3 files changed, 120 insertions(+), 122 deletions(-) | 14 | target/arm/sve.decode | 2 ++ |
14 | 15 | target/arm/sve_helper.c | 2 ++ | |
15 | diff --git a/target/arm/translate.h b/target/arm/translate.h | 16 | target/arm/translate-a64.c | 17 ++++++++++++++ |
16 | index XXXXXXX..XXXXXXX 100644 | 17 | target/arm/translate-neon.c | 45 +++++++++++++++++++++++++++++++++++++ |
17 | --- a/target/arm/translate.h | 18 | target/arm/translate-sve.c | 16 +++++++++++++ |
18 | +++ b/target/arm/translate.h | 19 | target/arm/vfp_helper.c | 7 ++++++ |
19 | @@ -XXX,XX +XXX,XX @@ static inline TCGv_i32 get_ahp_flag(void) | 20 | 9 files changed, 95 insertions(+) |
20 | extern const GVecGen3 bsl_op; | 21 | |
21 | extern const GVecGen3 bit_op; | 22 | diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h |
22 | extern const GVecGen3 bif_op; | 23 | index XXXXXXX..XXXXXXX 100644 |
23 | +extern const GVecGen3 mla_op[4]; | 24 | --- a/target/arm/helper-sve.h |
24 | +extern const GVecGen3 mls_op[4]; | 25 | +++ b/target/arm/helper-sve.h |
25 | extern const GVecGen2i ssra_op[4]; | 26 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve_fcvt_hd, TCG_CALL_NO_RWG, |
26 | extern const GVecGen2i usra_op[4]; | 27 | void, ptr, ptr, ptr, ptr, i32) |
27 | extern const GVecGen2i sri_op[4]; | 28 | DEF_HELPER_FLAGS_5(sve_fcvt_sd, TCG_CALL_NO_RWG, |
29 | void, ptr, ptr, ptr, ptr, i32) | ||
30 | +DEF_HELPER_FLAGS_5(sve_bfcvt, TCG_CALL_NO_RWG, | ||
31 | + void, ptr, ptr, ptr, ptr, i32) | ||
32 | |||
33 | DEF_HELPER_FLAGS_5(sve_fcvtzs_hh, TCG_CALL_NO_RWG, | ||
34 | void, ptr, ptr, ptr, ptr, i32) | ||
35 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_fcvtnt_sh, TCG_CALL_NO_RWG, | ||
36 | void, ptr, ptr, ptr, ptr, i32) | ||
37 | DEF_HELPER_FLAGS_5(sve2_fcvtnt_ds, TCG_CALL_NO_RWG, | ||
38 | void, ptr, ptr, ptr, ptr, i32) | ||
39 | +DEF_HELPER_FLAGS_5(sve_bfcvtnt, TCG_CALL_NO_RWG, | ||
40 | + void, ptr, ptr, ptr, ptr, i32) | ||
41 | |||
42 | DEF_HELPER_FLAGS_5(sve2_fcvtlt_hs, TCG_CALL_NO_RWG, | ||
43 | void, ptr, ptr, ptr, ptr, i32) | ||
44 | diff --git a/target/arm/helper.h b/target/arm/helper.h | ||
45 | index XXXXXXX..XXXXXXX 100644 | ||
46 | --- a/target/arm/helper.h | ||
47 | +++ b/target/arm/helper.h | ||
48 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env) | ||
49 | DEF_HELPER_2(vfp_fcvtds, f64, f32, env) | ||
50 | DEF_HELPER_2(vfp_fcvtsd, f32, f64, env) | ||
51 | DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr) | ||
52 | +DEF_HELPER_FLAGS_2(bfcvt_pair, TCG_CALL_NO_RWG, i32, i64, ptr) | ||
53 | |||
54 | DEF_HELPER_2(vfp_uitoh, f16, i32, ptr) | ||
55 | DEF_HELPER_2(vfp_uitos, f32, i32, ptr) | ||
56 | diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode | ||
57 | index XXXXXXX..XXXXXXX 100644 | ||
58 | --- a/target/arm/neon-dp.decode | ||
59 | +++ b/target/arm/neon-dp.decode | ||
60 | @@ -XXX,XX +XXX,XX @@ Vimm_1r 1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm | ||
61 | VRINTZ 1111 001 11 . 11 .. 10 .... 0 1011 . . 0 .... @2misc | ||
62 | |||
63 | VCVT_F16_F32 1111 001 11 . 11 .. 10 .... 0 1100 0 . 0 .... @2misc_q0 | ||
64 | + VCVT_B16_F32 1111 001 11 . 11 .. 10 .... 0 1100 1 . 0 .... @2misc_q0 | ||
65 | |||
66 | VRINTM 1111 001 11 . 11 .. 10 .... 0 1101 . . 0 .... @2misc | ||
67 | |||
68 | diff --git a/target/arm/sve.decode b/target/arm/sve.decode | ||
69 | index XXXXXXX..XXXXXXX 100644 | ||
70 | --- a/target/arm/sve.decode | ||
71 | +++ b/target/arm/sve.decode | ||
72 | @@ -XXX,XX +XXX,XX @@ FNMLS_zpzzz 01100101 .. 1 ..... 111 ... ..... ..... @rdn_pg_rm_ra | ||
73 | # SVE floating-point convert precision | ||
74 | FCVT_sh 01100101 10 0010 00 101 ... ..... ..... @rd_pg_rn_e0 | ||
75 | FCVT_hs 01100101 10 0010 01 101 ... ..... ..... @rd_pg_rn_e0 | ||
76 | +BFCVT 01100101 10 0010 10 101 ... ..... ..... @rd_pg_rn_e0 | ||
77 | FCVT_dh 01100101 11 0010 00 101 ... ..... ..... @rd_pg_rn_e0 | ||
78 | FCVT_hd 01100101 11 0010 01 101 ... ..... ..... @rd_pg_rn_e0 | ||
79 | FCVT_ds 01100101 11 0010 10 101 ... ..... ..... @rd_pg_rn_e0 | ||
80 | @@ -XXX,XX +XXX,XX @@ RAX1 01000101 00 1 ..... 11110 1 ..... ..... @rd_rn_rm_e0 | ||
81 | FCVTXNT_ds 01100100 00 0010 10 101 ... ..... ..... @rd_pg_rn_e0 | ||
82 | FCVTX_ds 01100101 00 0010 10 101 ... ..... ..... @rd_pg_rn_e0 | ||
83 | FCVTNT_sh 01100100 10 0010 00 101 ... ..... ..... @rd_pg_rn_e0 | ||
84 | +BFCVTNT 01100100 10 0010 10 101 ... ..... ..... @rd_pg_rn_e0 | ||
85 | FCVTLT_hs 01100100 10 0010 01 101 ... ..... ..... @rd_pg_rn_e0 | ||
86 | FCVTNT_ds 01100100 11 0010 10 101 ... ..... ..... @rd_pg_rn_e0 | ||
87 | FCVTLT_sd 01100100 11 0010 11 101 ... ..... ..... @rd_pg_rn_e0 | ||
88 | diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c | ||
89 | index XXXXXXX..XXXXXXX 100644 | ||
90 | --- a/target/arm/sve_helper.c | ||
91 | +++ b/target/arm/sve_helper.c | ||
92 | @@ -XXX,XX +XXX,XX @@ static inline uint64_t vfp_float64_to_uint64_rtz(float64 f, float_status *s) | ||
93 | |||
94 | DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, sve_f32_to_f16) | ||
95 | DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, sve_f16_to_f32) | ||
96 | +DO_ZPZ_FP(sve_bfcvt, uint32_t, H1_4, float32_to_bfloat16) | ||
97 | DO_ZPZ_FP(sve_fcvt_dh, uint64_t, , sve_f64_to_f16) | ||
98 | DO_ZPZ_FP(sve_fcvt_hd, uint64_t, , sve_f16_to_f64) | ||
99 | DO_ZPZ_FP(sve_fcvt_ds, uint64_t, , float64_to_float32) | ||
100 | @@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc) \ | ||
101 | } while (i != 0); \ | ||
102 | } | ||
103 | |||
104 | +DO_FCVTNT(sve_bfcvtnt, uint32_t, uint16_t, H1_4, H1_2, float32_to_bfloat16) | ||
105 | DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16) | ||
106 | DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t, , H1_4, float64_to_float32) | ||
107 | |||
28 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | 108 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c |
29 | index XXXXXXX..XXXXXXX 100644 | 109 | index XXXXXXX..XXXXXXX 100644 |
30 | --- a/target/arm/translate-a64.c | 110 | --- a/target/arm/translate-a64.c |
31 | +++ b/target/arm/translate-a64.c | 111 | +++ b/target/arm/translate-a64.c |
32 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_float(DisasContext *s, uint32_t insn) | 112 | @@ -XXX,XX +XXX,XX @@ static void handle_2misc_narrow(DisasContext *s, bool scalar, |
33 | } | 113 | tcg_temp_free_i32(ahp); |
34 | } | ||
35 | |||
36 | -static void gen_mla8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
37 | -{ | ||
38 | - gen_helper_neon_mul_u8(a, a, b); | ||
39 | - gen_helper_neon_add_u8(d, d, a); | ||
40 | -} | ||
41 | - | ||
42 | -static void gen_mla16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
43 | -{ | ||
44 | - gen_helper_neon_mul_u16(a, a, b); | ||
45 | - gen_helper_neon_add_u16(d, d, a); | ||
46 | -} | ||
47 | - | ||
48 | -static void gen_mla32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
49 | -{ | ||
50 | - tcg_gen_mul_i32(a, a, b); | ||
51 | - tcg_gen_add_i32(d, d, a); | ||
52 | -} | ||
53 | - | ||
54 | -static void gen_mla64_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) | ||
55 | -{ | ||
56 | - tcg_gen_mul_i64(a, a, b); | ||
57 | - tcg_gen_add_i64(d, d, a); | ||
58 | -} | ||
59 | - | ||
60 | -static void gen_mla_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) | ||
61 | -{ | ||
62 | - tcg_gen_mul_vec(vece, a, a, b); | ||
63 | - tcg_gen_add_vec(vece, d, d, a); | ||
64 | -} | ||
65 | - | ||
66 | -static void gen_mls8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
67 | -{ | ||
68 | - gen_helper_neon_mul_u8(a, a, b); | ||
69 | - gen_helper_neon_sub_u8(d, d, a); | ||
70 | -} | ||
71 | - | ||
72 | -static void gen_mls16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
73 | -{ | ||
74 | - gen_helper_neon_mul_u16(a, a, b); | ||
75 | - gen_helper_neon_sub_u16(d, d, a); | ||
76 | -} | ||
77 | - | ||
78 | -static void gen_mls32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
79 | -{ | ||
80 | - tcg_gen_mul_i32(a, a, b); | ||
81 | - tcg_gen_sub_i32(d, d, a); | ||
82 | -} | ||
83 | - | ||
84 | -static void gen_mls64_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) | ||
85 | -{ | ||
86 | - tcg_gen_mul_i64(a, a, b); | ||
87 | - tcg_gen_sub_i64(d, d, a); | ||
88 | -} | ||
89 | - | ||
90 | -static void gen_mls_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) | ||
91 | -{ | ||
92 | - tcg_gen_mul_vec(vece, a, a, b); | ||
93 | - tcg_gen_sub_vec(vece, d, d, a); | ||
94 | -} | ||
95 | - | ||
96 | /* Integer op subgroup of C3.6.16. */ | ||
97 | static void disas_simd_3same_int(DisasContext *s, uint32_t insn) | ||
98 | { | ||
99 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn) | ||
100 | .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
101 | .vece = MO_64 }, | ||
102 | }; | ||
103 | - static const GVecGen3 mla_op[4] = { | ||
104 | - { .fni4 = gen_mla8_i32, | ||
105 | - .fniv = gen_mla_vec, | ||
106 | - .opc = INDEX_op_mul_vec, | ||
107 | - .load_dest = true, | ||
108 | - .vece = MO_8 }, | ||
109 | - { .fni4 = gen_mla16_i32, | ||
110 | - .fniv = gen_mla_vec, | ||
111 | - .opc = INDEX_op_mul_vec, | ||
112 | - .load_dest = true, | ||
113 | - .vece = MO_16 }, | ||
114 | - { .fni4 = gen_mla32_i32, | ||
115 | - .fniv = gen_mla_vec, | ||
116 | - .opc = INDEX_op_mul_vec, | ||
117 | - .load_dest = true, | ||
118 | - .vece = MO_32 }, | ||
119 | - { .fni8 = gen_mla64_i64, | ||
120 | - .fniv = gen_mla_vec, | ||
121 | - .opc = INDEX_op_mul_vec, | ||
122 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
123 | - .load_dest = true, | ||
124 | - .vece = MO_64 }, | ||
125 | - }; | ||
126 | - static const GVecGen3 mls_op[4] = { | ||
127 | - { .fni4 = gen_mls8_i32, | ||
128 | - .fniv = gen_mls_vec, | ||
129 | - .opc = INDEX_op_mul_vec, | ||
130 | - .load_dest = true, | ||
131 | - .vece = MO_8 }, | ||
132 | - { .fni4 = gen_mls16_i32, | ||
133 | - .fniv = gen_mls_vec, | ||
134 | - .opc = INDEX_op_mul_vec, | ||
135 | - .load_dest = true, | ||
136 | - .vece = MO_16 }, | ||
137 | - { .fni4 = gen_mls32_i32, | ||
138 | - .fniv = gen_mls_vec, | ||
139 | - .opc = INDEX_op_mul_vec, | ||
140 | - .load_dest = true, | ||
141 | - .vece = MO_32 }, | ||
142 | - { .fni8 = gen_mls64_i64, | ||
143 | - .fniv = gen_mls_vec, | ||
144 | - .opc = INDEX_op_mul_vec, | ||
145 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
146 | - .load_dest = true, | ||
147 | - .vece = MO_64 }, | ||
148 | - }; | ||
149 | |||
150 | int is_q = extract32(insn, 30, 1); | ||
151 | int u = extract32(insn, 29, 1); | ||
152 | diff --git a/target/arm/translate.c b/target/arm/translate.c | ||
153 | index XXXXXXX..XXXXXXX 100644 | ||
154 | --- a/target/arm/translate.c | ||
155 | +++ b/target/arm/translate.c | ||
156 | @@ -XXX,XX +XXX,XX @@ static void gen_neon_narrow_op(int op, int u, int size, | ||
157 | #define NEON_3R_VABA 15 | ||
158 | #define NEON_3R_VADD_VSUB 16 | ||
159 | #define NEON_3R_VTST_VCEQ 17 | ||
160 | -#define NEON_3R_VML 18 /* VMLA, VMLAL, VMLS, VMLSL */ | ||
161 | +#define NEON_3R_VML 18 /* VMLA, VMLS */ | ||
162 | #define NEON_3R_VMUL 19 | ||
163 | #define NEON_3R_VPMAX 20 | ||
164 | #define NEON_3R_VPMIN 21 | ||
165 | @@ -XXX,XX +XXX,XX @@ const GVecGen2i sli_op[4] = { | ||
166 | .vece = MO_64 }, | ||
167 | }; | ||
168 | |||
169 | +static void gen_mla8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
170 | +{ | ||
171 | + gen_helper_neon_mul_u8(a, a, b); | ||
172 | + gen_helper_neon_add_u8(d, d, a); | ||
173 | +} | ||
174 | + | ||
175 | +static void gen_mls8_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
176 | +{ | ||
177 | + gen_helper_neon_mul_u8(a, a, b); | ||
178 | + gen_helper_neon_sub_u8(d, d, a); | ||
179 | +} | ||
180 | + | ||
181 | +static void gen_mla16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
182 | +{ | ||
183 | + gen_helper_neon_mul_u16(a, a, b); | ||
184 | + gen_helper_neon_add_u16(d, d, a); | ||
185 | +} | ||
186 | + | ||
187 | +static void gen_mls16_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
188 | +{ | ||
189 | + gen_helper_neon_mul_u16(a, a, b); | ||
190 | + gen_helper_neon_sub_u16(d, d, a); | ||
191 | +} | ||
192 | + | ||
193 | +static void gen_mla32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
194 | +{ | ||
195 | + tcg_gen_mul_i32(a, a, b); | ||
196 | + tcg_gen_add_i32(d, d, a); | ||
197 | +} | ||
198 | + | ||
199 | +static void gen_mls32_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
200 | +{ | ||
201 | + tcg_gen_mul_i32(a, a, b); | ||
202 | + tcg_gen_sub_i32(d, d, a); | ||
203 | +} | ||
204 | + | ||
205 | +static void gen_mla64_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) | ||
206 | +{ | ||
207 | + tcg_gen_mul_i64(a, a, b); | ||
208 | + tcg_gen_add_i64(d, d, a); | ||
209 | +} | ||
210 | + | ||
211 | +static void gen_mls64_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) | ||
212 | +{ | ||
213 | + tcg_gen_mul_i64(a, a, b); | ||
214 | + tcg_gen_sub_i64(d, d, a); | ||
215 | +} | ||
216 | + | ||
217 | +static void gen_mla_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) | ||
218 | +{ | ||
219 | + tcg_gen_mul_vec(vece, a, a, b); | ||
220 | + tcg_gen_add_vec(vece, d, d, a); | ||
221 | +} | ||
222 | + | ||
223 | +static void gen_mls_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) | ||
224 | +{ | ||
225 | + tcg_gen_mul_vec(vece, a, a, b); | ||
226 | + tcg_gen_sub_vec(vece, d, d, a); | ||
227 | +} | ||
228 | + | ||
229 | +/* Note that while NEON does not support VMLA and VMLS as 64-bit ops, | ||
230 | + * these tables are shared with AArch64 which does support them. | ||
231 | + */ | ||
232 | +const GVecGen3 mla_op[4] = { | ||
233 | + { .fni4 = gen_mla8_i32, | ||
234 | + .fniv = gen_mla_vec, | ||
235 | + .opc = INDEX_op_mul_vec, | ||
236 | + .load_dest = true, | ||
237 | + .vece = MO_8 }, | ||
238 | + { .fni4 = gen_mla16_i32, | ||
239 | + .fniv = gen_mla_vec, | ||
240 | + .opc = INDEX_op_mul_vec, | ||
241 | + .load_dest = true, | ||
242 | + .vece = MO_16 }, | ||
243 | + { .fni4 = gen_mla32_i32, | ||
244 | + .fniv = gen_mla_vec, | ||
245 | + .opc = INDEX_op_mul_vec, | ||
246 | + .load_dest = true, | ||
247 | + .vece = MO_32 }, | ||
248 | + { .fni8 = gen_mla64_i64, | ||
249 | + .fniv = gen_mla_vec, | ||
250 | + .opc = INDEX_op_mul_vec, | ||
251 | + .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
252 | + .load_dest = true, | ||
253 | + .vece = MO_64 }, | ||
254 | +}; | ||
255 | + | ||
256 | +const GVecGen3 mls_op[4] = { | ||
257 | + { .fni4 = gen_mls8_i32, | ||
258 | + .fniv = gen_mls_vec, | ||
259 | + .opc = INDEX_op_mul_vec, | ||
260 | + .load_dest = true, | ||
261 | + .vece = MO_8 }, | ||
262 | + { .fni4 = gen_mls16_i32, | ||
263 | + .fniv = gen_mls_vec, | ||
264 | + .opc = INDEX_op_mul_vec, | ||
265 | + .load_dest = true, | ||
266 | + .vece = MO_16 }, | ||
267 | + { .fni4 = gen_mls32_i32, | ||
268 | + .fniv = gen_mls_vec, | ||
269 | + .opc = INDEX_op_mul_vec, | ||
270 | + .load_dest = true, | ||
271 | + .vece = MO_32 }, | ||
272 | + { .fni8 = gen_mls64_i64, | ||
273 | + .fniv = gen_mls_vec, | ||
274 | + .opc = INDEX_op_mul_vec, | ||
275 | + .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
276 | + .load_dest = true, | ||
277 | + .vece = MO_64 }, | ||
278 | +}; | ||
279 | + | ||
280 | /* Translate a NEON data processing instruction. Return nonzero if the | ||
281 | instruction is invalid. | ||
282 | We process data in a mixture of 32-bit and 64-bit chunks. | ||
283 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
284 | return 0; | ||
285 | } | 114 | } |
286 | break; | 115 | break; |
287 | + | 116 | + case 0x36: /* BFCVTN, BFCVTN2 */ |
288 | + case NEON_3R_VML: /* VMLA, VMLS */ | 117 | + { |
289 | + tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size, | 118 | + TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR); |
290 | + u ? &mls_op[size] : &mla_op[size]); | 119 | + gen_helper_bfcvt_pair(tcg_res[pass], tcg_op, fpst); |
291 | + return 0; | 120 | + tcg_temp_free_ptr(fpst); |
292 | } | 121 | + } |
293 | + | 122 | + break; |
294 | if (size == 3) { | 123 | case 0x56: /* FCVTXN, FCVTXN2 */ |
295 | /* 64-bit element instructions. */ | 124 | /* 64 bit to 32 bit float conversion |
296 | for (pass = 0; pass < (q ? 2 : 1); pass++) { | 125 | * with von Neumann rounding (round to odd) |
297 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 126 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) |
298 | } | ||
299 | } | 127 | } |
300 | break; | 128 | handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd); |
301 | - case NEON_3R_VML: /* VMLA, VMLAL, VMLS,VMLSL */ | 129 | return; |
302 | - switch (size) { | 130 | + case 0x36: /* BFCVTN, BFCVTN2 */ |
303 | - case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break; | 131 | + if (!dc_isar_feature(aa64_bf16, s) || size != 2) { |
304 | - case 1: gen_helper_neon_mul_u16(tmp, tmp, tmp2); break; | 132 | + unallocated_encoding(s); |
305 | - case 2: tcg_gen_mul_i32(tmp, tmp, tmp2); break; | 133 | + return; |
306 | - default: abort(); | 134 | + } |
307 | - } | 135 | + if (!fp_access_check(s)) { |
308 | - tcg_temp_free_i32(tmp2); | 136 | + return; |
309 | - tmp2 = neon_load_reg(rd, pass); | 137 | + } |
310 | - if (u) { /* VMLS */ | 138 | + handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd); |
311 | - gen_neon_rsb(size, tmp, tmp2); | 139 | + return; |
312 | - } else { /* VMLA */ | 140 | case 0x17: /* FCVTL, FCVTL2 */ |
313 | - gen_neon_add(size, tmp, tmp2); | 141 | if (!fp_access_check(s)) { |
314 | - } | 142 | return; |
315 | - break; | 143 | diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c |
316 | case NEON_3R_VMUL: | 144 | index XXXXXXX..XXXXXXX 100644 |
317 | /* VMUL.P8; other cases already eliminated. */ | 145 | --- a/target/arm/translate-neon.c |
318 | gen_helper_neon_mul_p8(tmp, tmp, tmp2); | 146 | +++ b/target/arm/translate-neon.c |
147 | @@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a) | ||
148 | return true; | ||
149 | } | ||
150 | |||
151 | +static bool trans_VCVT_B16_F32(DisasContext *s, arg_2misc *a) | ||
152 | +{ | ||
153 | + TCGv_ptr fpst; | ||
154 | + TCGv_i64 tmp; | ||
155 | + TCGv_i32 dst0, dst1; | ||
156 | + | ||
157 | + if (!dc_isar_feature(aa32_bf16, s)) { | ||
158 | + return false; | ||
159 | + } | ||
160 | + | ||
161 | + /* UNDEF accesses to D16-D31 if they don't exist. */ | ||
162 | + if (!dc_isar_feature(aa32_simd_r32, s) && | ||
163 | + ((a->vd | a->vm) & 0x10)) { | ||
164 | + return false; | ||
165 | + } | ||
166 | + | ||
167 | + if ((a->vm & 1) || (a->size != 1)) { | ||
168 | + return false; | ||
169 | + } | ||
170 | + | ||
171 | + if (!vfp_access_check(s)) { | ||
172 | + return true; | ||
173 | + } | ||
174 | + | ||
175 | + fpst = fpstatus_ptr(FPST_STD); | ||
176 | + tmp = tcg_temp_new_i64(); | ||
177 | + dst0 = tcg_temp_new_i32(); | ||
178 | + dst1 = tcg_temp_new_i32(); | ||
179 | + | ||
180 | + read_neon_element64(tmp, a->vm, 0, MO_64); | ||
181 | + gen_helper_bfcvt_pair(dst0, tmp, fpst); | ||
182 | + | ||
183 | + read_neon_element64(tmp, a->vm, 1, MO_64); | ||
184 | + gen_helper_bfcvt_pair(dst1, tmp, fpst); | ||
185 | + | ||
186 | + write_neon_element32(dst0, a->vd, 0, MO_32); | ||
187 | + write_neon_element32(dst1, a->vd, 1, MO_32); | ||
188 | + | ||
189 | + tcg_temp_free_i64(tmp); | ||
190 | + tcg_temp_free_i32(dst0); | ||
191 | + tcg_temp_free_i32(dst1); | ||
192 | + tcg_temp_free_ptr(fpst); | ||
193 | + return true; | ||
194 | +} | ||
195 | + | ||
196 | static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a) | ||
197 | { | ||
198 | TCGv_ptr fpst; | ||
199 | diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c | ||
200 | index XXXXXXX..XXXXXXX 100644 | ||
201 | --- a/target/arm/translate-sve.c | ||
202 | +++ b/target/arm/translate-sve.c | ||
203 | @@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_hs(DisasContext *s, arg_rpr_esz *a) | ||
204 | return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_hs); | ||
205 | } | ||
206 | |||
207 | +static bool trans_BFCVT(DisasContext *s, arg_rpr_esz *a) | ||
208 | +{ | ||
209 | + if (!dc_isar_feature(aa64_sve_bf16, s)) { | ||
210 | + return false; | ||
211 | + } | ||
212 | + return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvt); | ||
213 | +} | ||
214 | + | ||
215 | static bool trans_FCVT_dh(DisasContext *s, arg_rpr_esz *a) | ||
216 | { | ||
217 | return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_dh); | ||
218 | @@ -XXX,XX +XXX,XX @@ static bool trans_FCVTNT_sh(DisasContext *s, arg_rpr_esz *a) | ||
219 | return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtnt_sh); | ||
220 | } | ||
221 | |||
222 | +static bool trans_BFCVTNT(DisasContext *s, arg_rpr_esz *a) | ||
223 | +{ | ||
224 | + if (!dc_isar_feature(aa64_sve_bf16, s)) { | ||
225 | + return false; | ||
226 | + } | ||
227 | + return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvtnt); | ||
228 | +} | ||
229 | + | ||
230 | static bool trans_FCVTNT_ds(DisasContext *s, arg_rpr_esz *a) | ||
231 | { | ||
232 | if (!dc_isar_feature(aa64_sve2, s)) { | ||
233 | diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c | ||
234 | index XXXXXXX..XXXXXXX 100644 | ||
235 | --- a/target/arm/vfp_helper.c | ||
236 | +++ b/target/arm/vfp_helper.c | ||
237 | @@ -XXX,XX +XXX,XX @@ uint32_t HELPER(bfcvt)(float32 x, void *status) | ||
238 | return float32_to_bfloat16(x, status); | ||
239 | } | ||
240 | |||
241 | +uint32_t HELPER(bfcvt_pair)(uint64_t pair, void *status) | ||
242 | +{ | ||
243 | + bfloat16 lo = float32_to_bfloat16(extract64(pair, 0, 32), status); | ||
244 | + bfloat16 hi = float32_to_bfloat16(extract64(pair, 32, 32), status); | ||
245 | + return deposit32(lo, 16, 16, hi); | ||
246 | +} | ||
247 | + | ||
248 | /* | ||
249 | * VFP3 fixed point conversion. The AArch32 versions of fix-to-float | ||
250 | * must always round-to-nearest; the AArch64 ones honour the FPSCR | ||
319 | -- | 251 | -- |
320 | 2.19.1 | 252 | 2.20.1 |
321 | 253 | ||
322 | 254 | diff view generated by jsdifflib |
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | For Arm BFDOT and BFMMLA, we need a version of round-to-odd | ||
4 | that overflows to infinity, instead of the max normal number. | ||
5 | |||
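As a rough standalone illustration (plain host C, not QEMU's softfloat; every name below is made up for the sketch): the only behavioural difference between the two round-to-odd flavours is what an overflowing result becomes, so a toy version of that decision looks like this.

    #include <float.h>
    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of the overflow case only: float_round_to_odd saturates to
     * the largest finite float32, float_round_to_odd_inf goes to infinity. */
    static float overflow_result(bool round_to_odd_inf, bool negative)
    {
        float huge = round_to_odd_inf ? INFINITY : FLT_MAX;
        return negative ? -huge : huge;
    }

    int main(void)
    {
        printf("float_round_to_odd     -> %g\n", overflow_result(false, false));
        printf("float_round_to_odd_inf -> %g\n", overflow_result(true, false));
        return 0;
    }

Everything below the overflow decision (picking the odd last mantissa bit) is shared between the two modes, which is why the patch only adds a fall-through next to the existing float_round_to_odd handling of overflow_norm.
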
6 | Cc: Alex Bennée <alex.bennee@linaro.org> | ||
3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 7 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
4 | Message-id: 20181011205206.3552-11-richard.henderson@linaro.org | 8 | Message-id: 20210525225817.400336-6-richard.henderson@linaro.org |
5 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
6 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
7 | --- | 11 | --- |
8 | target/arm/translate.c | 16 ++++++++-------- | 12 | include/fpu/softfloat-types.h | 4 +++- |
9 | 1 file changed, 8 insertions(+), 8 deletions(-) | 13 | fpu/softfloat-parts.c.inc | 6 ++++-- |
14 | 2 files changed, 7 insertions(+), 3 deletions(-) | ||
10 | 15 | ||
11 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 16 | diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h |
12 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/target/arm/translate.c | 18 | --- a/include/fpu/softfloat-types.h |
14 | +++ b/target/arm/translate.c | 19 | +++ b/include/fpu/softfloat-types.h |
15 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 20 | @@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) { |
16 | tcg_temp_free_ptr(ptr1); | 21 | float_round_up = 2, |
17 | tcg_temp_free_ptr(ptr2); | 22 | float_round_to_zero = 3, |
18 | break; | 23 | float_round_ties_away = 4, |
19 | + | 24 | - /* Not an IEEE rounding mode: round to the closest odd mantissa value */ |
20 | + case NEON_2RM_VMVN: | 25 | + /* Not an IEEE rounding mode: round to closest odd, overflow to max */ |
21 | + tcg_gen_gvec_not(0, rd_ofs, rm_ofs, vec_size, vec_size); | 26 | float_round_to_odd = 5, |
22 | + break; | 27 | + /* Not an IEEE rounding mode: round to closest odd, overflow to inf */ |
23 | + case NEON_2RM_VNEG: | 28 | + float_round_to_odd_inf = 6, |
24 | + tcg_gen_gvec_neg(size, rd_ofs, rm_ofs, vec_size, vec_size); | 29 | } FloatRoundMode; |
25 | + break; | 30 | |
26 | + | 31 | /* |
27 | default: | 32 | diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc |
28 | elementwise: | 33 | index XXXXXXX..XXXXXXX 100644 |
29 | for (pass = 0; pass < (q ? 4 : 2); pass++) { | 34 | --- a/fpu/softfloat-parts.c.inc |
30 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 35 | +++ b/fpu/softfloat-parts.c.inc |
31 | case NEON_2RM_VCNT: | 36 | @@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s, |
32 | gen_helper_neon_cnt_u8(tmp, tmp); | 37 | g_assert_not_reached(); |
33 | break; | 38 | } |
34 | - case NEON_2RM_VMVN: | 39 | |
35 | - tcg_gen_not_i32(tmp, tmp); | 40 | + overflow_norm = false; |
36 | - break; | 41 | switch (s->float_rounding_mode) { |
37 | case NEON_2RM_VQABS: | 42 | case float_round_nearest_even: |
38 | switch (size) { | 43 | - overflow_norm = false; |
39 | case 0: | 44 | inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0); |
40 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 45 | break; |
41 | default: abort(); | 46 | case float_round_ties_away: |
42 | } | 47 | - overflow_norm = false; |
43 | break; | 48 | inc = frac_lsbm1; |
44 | - case NEON_2RM_VNEG: | 49 | break; |
45 | - tmp2 = tcg_const_i32(0); | 50 | case float_round_to_zero: |
46 | - gen_neon_rsb(size, tmp, tmp2); | 51 | @@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s, |
47 | - tcg_temp_free_i32(tmp2); | 52 | break; |
48 | - break; | 53 | case float_round_to_odd: |
49 | case NEON_2RM_VCGT0_F: | 54 | overflow_norm = true; |
50 | { | 55 | + /* fall through */ |
51 | TCGv_ptr fpstatus = get_fpstatus_ptr(1); | 56 | + case float_round_to_odd_inf: |
57 | inc = p->frac_lo & frac_lsb ? 0 : round_mask; | ||
58 | break; | ||
59 | default: | ||
60 | @@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s, | ||
61 | ? frac_lsbm1 : 0); | ||
62 | break; | ||
63 | case float_round_to_odd: | ||
64 | + case float_round_to_odd_inf: | ||
65 | inc = p->frac_lo & frac_lsb ? 0 : round_mask; | ||
66 | break; | ||
67 | default: | ||
52 | -- | 68 | -- |
53 | 2.19.1 | 69 | 2.20.1 |
54 | 70 | ||
55 | 71 | diff view generated by jsdifflib |
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | Move cmtst_op expanders from translate-a64.c. | 3 | This is BFDOT for both AArch64 AdvSIMD and SVE, |
4 | and VDOT.BF16 for AArch32 NEON. | ||
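For the right-hand patch, a rough reference for what one lane of the new bfdotadd() helper computes, written in plain host C: each 32-bit input packs two bfloat16 values, widening bfloat16 to float32 is an exact 16-bit shift, and the two products are summed into the float32 accumulator. The real helper (added to vec_helper.c below) does this with round-to-odd-inf, flush-to-zero and default-NaN semantics, which this sketch deliberately ignores; the names here are illustrative, not QEMU APIs.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Widening bfloat16 -> float32 is exact: shift into the top half. */
    static float bf16_to_f32(uint16_t h)
    {
        uint32_t bits = (uint32_t)h << 16;
        float f;
        memcpy(&f, &bits, sizeof(f));
        return f;
    }

    /* sum + lo(e1)*lo(e2) + hi(e1)*hi(e2), each ei packing two bfloat16. */
    static float bfdot_lane(float sum, uint32_t e1, uint32_t e2)
    {
        float p0 = bf16_to_f32((uint16_t)e1) * bf16_to_f32((uint16_t)e2);
        float p1 = bf16_to_f32(e1 >> 16) * bf16_to_f32(e2 >> 16);
        return sum + p0 + p1;
    }

    int main(void)
    {
        uint32_t a = 0x40003f80;   /* bfloat16 pair {1.0, 2.0}, low half first */
        uint32_t b = 0x3f803f80;   /* bfloat16 pair {1.0, 1.0} */
        printf("%g\n", bfdot_lane(0.0f, a, b));   /* 1*1 + 2*1 = 3 */
        return 0;
    }
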
4 | 5 | ||
5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
6 | Message-id: 20181011205206.3552-17-richard.henderson@linaro.org | 7 | Message-id: 20210525225817.400336-7-richard.henderson@linaro.org |
7 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 8 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
8 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
9 | --- | 10 | --- |
10 | target/arm/translate.h | 2 + | 11 | target/arm/helper.h | 3 +++ |
11 | target/arm/translate-a64.c | 38 ------------------ | 12 | target/arm/neon-shared.decode | 2 ++ |
12 | target/arm/translate.c | 81 +++++++++++++++++++++++++++----------- | 13 | target/arm/sve.decode | 3 +++ |
13 | 3 files changed, 60 insertions(+), 61 deletions(-) | 14 | target/arm/translate-a64.c | 20 ++++++++++++++++++ |
15 | target/arm/translate-neon.c | 9 ++++++++ | ||
16 | target/arm/translate-sve.c | 12 +++++++++++ | ||
17 | target/arm/vec_helper.c | 40 +++++++++++++++++++++++++++++++++++ | ||
18 | 7 files changed, 89 insertions(+) | ||
14 | 19 | ||
15 | diff --git a/target/arm/translate.h b/target/arm/translate.h | 20 | diff --git a/target/arm/helper.h b/target/arm/helper.h |
16 | index XXXXXXX..XXXXXXX 100644 | 21 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/target/arm/translate.h | 22 | --- a/target/arm/helper.h |
18 | +++ b/target/arm/translate.h | 23 | +++ b/target/arm/helper.h |
19 | @@ -XXX,XX +XXX,XX @@ extern const GVecGen3 bit_op; | 24 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_ummla_b, TCG_CALL_NO_RWG, |
20 | extern const GVecGen3 bif_op; | 25 | DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG, |
21 | extern const GVecGen3 mla_op[4]; | 26 | void, ptr, ptr, ptr, ptr, i32) |
22 | extern const GVecGen3 mls_op[4]; | 27 | |
23 | +extern const GVecGen3 cmtst_op[4]; | 28 | +DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG, |
24 | extern const GVecGen2i ssra_op[4]; | 29 | + void, ptr, ptr, ptr, ptr, i32) |
25 | extern const GVecGen2i usra_op[4]; | 30 | + |
26 | extern const GVecGen2i sri_op[4]; | 31 | #ifdef TARGET_AARCH64 |
27 | extern const GVecGen2i sli_op[4]; | 32 | #include "helper-a64.h" |
28 | +void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b); | 33 | #include "helper-sve.h" |
29 | 34 | diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode | |
30 | /* | 35 | index XXXXXXX..XXXXXXX 100644 |
31 | * Forward to the isar_feature_* tests given a DisasContext pointer. | 36 | --- a/target/arm/neon-shared.decode |
37 | +++ b/target/arm/neon-shared.decode | ||
38 | @@ -XXX,XX +XXX,XX @@ VUDOT 1111 110 00 . 10 .... .... 1101 . q:1 . 1 .... \ | ||
39 | vm=%vm_dp vn=%vn_dp vd=%vd_dp | ||
40 | VUSDOT 1111 110 01 . 10 .... .... 1101 . q:1 . 0 .... \ | ||
41 | vm=%vm_dp vn=%vn_dp vd=%vd_dp | ||
42 | +VDOT_b16 1111 110 00 . 00 .... .... 1101 . q:1 . 0 .... \ | ||
43 | + vm=%vm_dp vn=%vn_dp vd=%vd_dp | ||
44 | |||
45 | # VFM[AS]L | ||
46 | VFML 1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \ | ||
47 | diff --git a/target/arm/sve.decode b/target/arm/sve.decode | ||
48 | index XXXXXXX..XXXXXXX 100644 | ||
49 | --- a/target/arm/sve.decode | ||
50 | +++ b/target/arm/sve.decode | ||
51 | @@ -XXX,XX +XXX,XX @@ FMLALT_zzzw 01100100 10 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0 | ||
52 | FMLSLB_zzzw 01100100 10 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_e0 | ||
53 | FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_e0 | ||
54 | |||
55 | +### SVE2 floating-point bfloat16 dot-product | ||
56 | +BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 | ||
57 | + | ||
58 | ### SVE2 floating-point multiply-add long (indexed) | ||
59 | FMLALB_zzxw 01100100 10 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 | ||
60 | FMLALT_zzxw 01100100 10 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 | ||
32 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | 61 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c |
33 | index XXXXXXX..XXXXXXX 100644 | 62 | index XXXXXXX..XXXXXXX 100644 |
34 | --- a/target/arm/translate-a64.c | 63 | --- a/target/arm/translate-a64.c |
35 | +++ b/target/arm/translate-a64.c | 64 | +++ b/target/arm/translate-a64.c |
36 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn) | 65 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) |
66 | } | ||
67 | feature = dc_isar_feature(aa64_fcma, s); | ||
68 | break; | ||
69 | + case 0x1f: /* BFDOT */ | ||
70 | + switch (size) { | ||
71 | + case 1: | ||
72 | + feature = dc_isar_feature(aa64_bf16, s); | ||
73 | + break; | ||
74 | + default: | ||
75 | + unallocated_encoding(s); | ||
76 | + return; | ||
77 | + } | ||
78 | + break; | ||
79 | default: | ||
80 | unallocated_encoding(s); | ||
81 | return; | ||
82 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) | ||
83 | } | ||
84 | return; | ||
85 | |||
86 | + case 0xf: /* BFDOT */ | ||
87 | + switch (size) { | ||
88 | + case 1: | ||
89 | + gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot); | ||
90 | + break; | ||
91 | + default: | ||
92 | + g_assert_not_reached(); | ||
93 | + } | ||
94 | + return; | ||
95 | + | ||
96 | default: | ||
97 | g_assert_not_reached(); | ||
37 | } | 98 | } |
99 | diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c | ||
100 | index XXXXXXX..XXXXXXX 100644 | ||
101 | --- a/target/arm/translate-neon.c | ||
102 | +++ b/target/arm/translate-neon.c | ||
103 | @@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a) | ||
104 | gen_helper_gvec_usdot_b); | ||
38 | } | 105 | } |
39 | 106 | ||
40 | -/* CMTST : test is "if (X & Y != 0)". */ | 107 | +static bool trans_VDOT_b16(DisasContext *s, arg_VDOT_b16 *a) |
41 | -static void gen_cmtst_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
42 | -{ | ||
43 | - tcg_gen_and_i32(d, a, b); | ||
44 | - tcg_gen_setcondi_i32(TCG_COND_NE, d, d, 0); | ||
45 | - tcg_gen_neg_i32(d, d); | ||
46 | -} | ||
47 | - | ||
48 | -static void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) | ||
49 | -{ | ||
50 | - tcg_gen_and_i64(d, a, b); | ||
51 | - tcg_gen_setcondi_i64(TCG_COND_NE, d, d, 0); | ||
52 | - tcg_gen_neg_i64(d, d); | ||
53 | -} | ||
54 | - | ||
55 | -static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) | ||
56 | -{ | ||
57 | - tcg_gen_and_vec(vece, d, a, b); | ||
58 | - tcg_gen_dupi_vec(vece, a, 0); | ||
59 | - tcg_gen_cmp_vec(TCG_COND_NE, vece, d, d, a); | ||
60 | -} | ||
61 | - | ||
62 | static void handle_3same_64(DisasContext *s, int opcode, bool u, | ||
63 | TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, TCGv_i64 tcg_rm) | ||
64 | { | ||
65 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_float(DisasContext *s, uint32_t insn) | ||
66 | /* Integer op subgroup of C3.6.16. */ | ||
67 | static void disas_simd_3same_int(DisasContext *s, uint32_t insn) | ||
68 | { | ||
69 | - static const GVecGen3 cmtst_op[4] = { | ||
70 | - { .fni4 = gen_helper_neon_tst_u8, | ||
71 | - .fniv = gen_cmtst_vec, | ||
72 | - .vece = MO_8 }, | ||
73 | - { .fni4 = gen_helper_neon_tst_u16, | ||
74 | - .fniv = gen_cmtst_vec, | ||
75 | - .vece = MO_16 }, | ||
76 | - { .fni4 = gen_cmtst_i32, | ||
77 | - .fniv = gen_cmtst_vec, | ||
78 | - .vece = MO_32 }, | ||
79 | - { .fni8 = gen_cmtst_i64, | ||
80 | - .fniv = gen_cmtst_vec, | ||
81 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
82 | - .vece = MO_64 }, | ||
83 | - }; | ||
84 | - | ||
85 | int is_q = extract32(insn, 30, 1); | ||
86 | int u = extract32(insn, 29, 1); | ||
87 | int size = extract32(insn, 22, 2); | ||
88 | diff --git a/target/arm/translate.c b/target/arm/translate.c | ||
89 | index XXXXXXX..XXXXXXX 100644 | ||
90 | --- a/target/arm/translate.c | ||
91 | +++ b/target/arm/translate.c | ||
92 | @@ -XXX,XX +XXX,XX @@ const GVecGen3 mls_op[4] = { | ||
93 | .vece = MO_64 }, | ||
94 | }; | ||
95 | |||
96 | +/* CMTST : test is "if (X & Y != 0)". */ | ||
97 | +static void gen_cmtst_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b) | ||
98 | +{ | 108 | +{ |
99 | + tcg_gen_and_i32(d, a, b); | 109 | + if (!dc_isar_feature(aa32_bf16, s)) { |
100 | + tcg_gen_setcondi_i32(TCG_COND_NE, d, d, 0); | 110 | + return false; |
101 | + tcg_gen_neg_i32(d, d); | 111 | + } |
112 | + return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0, | ||
113 | + gen_helper_gvec_bfdot); | ||
102 | +} | 114 | +} |
103 | + | 115 | + |
104 | +void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b) | 116 | static bool trans_VFML(DisasContext *s, arg_VFML *a) |
117 | { | ||
118 | int opr_sz; | ||
119 | diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c | ||
120 | index XXXXXXX..XXXXXXX 100644 | ||
121 | --- a/target/arm/translate-sve.c | ||
122 | +++ b/target/arm/translate-sve.c | ||
123 | @@ -XXX,XX +XXX,XX @@ static bool trans_UMMLA(DisasContext *s, arg_rrrr_esz *a) | ||
124 | { | ||
125 | return do_i8mm_zzzz_ool(s, a, gen_helper_gvec_ummla_b, 0); | ||
126 | } | ||
127 | + | ||
128 | +static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a) | ||
105 | +{ | 129 | +{ |
106 | + tcg_gen_and_i64(d, a, b); | 130 | + if (!dc_isar_feature(aa64_sve_bf16, s)) { |
107 | + tcg_gen_setcondi_i64(TCG_COND_NE, d, d, 0); | 131 | + return false; |
108 | + tcg_gen_neg_i64(d, d); | 132 | + } |
133 | + if (sve_access_check(s)) { | ||
134 | + gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot, | ||
135 | + a->rd, a->rn, a->rm, a->ra, 0); | ||
136 | + } | ||
137 | + return true; | ||
138 | +} | ||
139 | diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c | ||
140 | index XXXXXXX..XXXXXXX 100644 | ||
141 | --- a/target/arm/vec_helper.c | ||
142 | +++ b/target/arm/vec_helper.c | ||
143 | @@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc, | ||
144 | DO_MMLA_B(gvec_smmla_b, do_smmla_b) | ||
145 | DO_MMLA_B(gvec_ummla_b, do_ummla_b) | ||
146 | DO_MMLA_B(gvec_usmmla_b, do_usmmla_b) | ||
147 | + | ||
148 | +/* | ||
149 | + * BFloat16 Dot Product | ||
150 | + */ | ||
151 | + | ||
152 | +static float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2) | ||
153 | +{ | ||
154 | + /* FPCR is ignored for BFDOT and BFMMLA. */ | ||
155 | + float_status bf_status = { | ||
156 | + .tininess_before_rounding = float_tininess_before_rounding, | ||
157 | + .float_rounding_mode = float_round_to_odd_inf, | ||
158 | + .flush_to_zero = true, | ||
159 | + .flush_inputs_to_zero = true, | ||
160 | + .default_nan_mode = true, | ||
161 | + }; | ||
162 | + float32 t1, t2; | ||
163 | + | ||
164 | + /* | ||
165 | + * Extract each BFloat16 from the element pair, and shift | ||
166 | + * them such that they become float32. | ||
167 | + */ | ||
168 | + t1 = float32_mul(e1 << 16, e2 << 16, &bf_status); | ||
169 | + t2 = float32_mul(e1 & 0xffff0000u, e2 & 0xffff0000u, &bf_status); | ||
170 | + t1 = float32_add(t1, t2, &bf_status); | ||
171 | + t1 = float32_add(sum, t1, &bf_status); | ||
172 | + | ||
173 | + return t1; | ||
109 | +} | 174 | +} |
110 | + | 175 | + |
111 | +static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b) | 176 | +void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc) |
112 | +{ | 177 | +{ |
113 | + tcg_gen_and_vec(vece, d, a, b); | 178 | + intptr_t i, opr_sz = simd_oprsz(desc); |
114 | + tcg_gen_dupi_vec(vece, a, 0); | 179 | + float32 *d = vd, *a = va; |
115 | + tcg_gen_cmp_vec(TCG_COND_NE, vece, d, d, a); | 180 | + uint32_t *n = vn, *m = vm; |
181 | + | ||
182 | + for (i = 0; i < opr_sz / 4; ++i) { | ||
183 | + d[i] = bfdotadd(a[i], n[i], m[i]); | ||
184 | + } | ||
185 | + clear_tail(d, opr_sz, simd_maxsz(desc)); | ||
116 | +} | 186 | +} |
117 | + | ||
118 | +const GVecGen3 cmtst_op[4] = { | ||
119 | + { .fni4 = gen_helper_neon_tst_u8, | ||
120 | + .fniv = gen_cmtst_vec, | ||
121 | + .vece = MO_8 }, | ||
122 | + { .fni4 = gen_helper_neon_tst_u16, | ||
123 | + .fniv = gen_cmtst_vec, | ||
124 | + .vece = MO_16 }, | ||
125 | + { .fni4 = gen_cmtst_i32, | ||
126 | + .fniv = gen_cmtst_vec, | ||
127 | + .vece = MO_32 }, | ||
128 | + { .fni8 = gen_cmtst_i64, | ||
129 | + .fniv = gen_cmtst_vec, | ||
130 | + .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
131 | + .vece = MO_64 }, | ||
132 | +}; | ||
133 | + | ||
134 | /* Translate a NEON data processing instruction. Return nonzero if the | ||
135 | instruction is invalid. | ||
136 | We process data in a mixture of 32-bit and 64-bit chunks. | ||
137 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
138 | tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size, | ||
139 | u ? &mls_op[size] : &mla_op[size]); | ||
140 | return 0; | ||
141 | + | ||
142 | + case NEON_3R_VTST_VCEQ: | ||
143 | + if (u) { /* VCEQ */ | ||
144 | + tcg_gen_gvec_cmp(TCG_COND_EQ, size, rd_ofs, rn_ofs, rm_ofs, | ||
145 | + vec_size, vec_size); | ||
146 | + } else { /* VTST */ | ||
147 | + tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, | ||
148 | + vec_size, vec_size, &cmtst_op[size]); | ||
149 | + } | ||
150 | + return 0; | ||
151 | + | ||
152 | + case NEON_3R_VCGT: | ||
153 | + tcg_gen_gvec_cmp(u ? TCG_COND_GTU : TCG_COND_GT, size, | ||
154 | + rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size); | ||
155 | + return 0; | ||
156 | + | ||
157 | + case NEON_3R_VCGE: | ||
158 | + tcg_gen_gvec_cmp(u ? TCG_COND_GEU : TCG_COND_GE, size, | ||
159 | + rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size); | ||
160 | + return 0; | ||
161 | } | ||
162 | |||
163 | if (size == 3) { | ||
164 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
165 | case NEON_3R_VQSUB: | ||
166 | GEN_NEON_INTEGER_OP_ENV(qsub); | ||
167 | break; | ||
168 | - case NEON_3R_VCGT: | ||
169 | - GEN_NEON_INTEGER_OP(cgt); | ||
170 | - break; | ||
171 | - case NEON_3R_VCGE: | ||
172 | - GEN_NEON_INTEGER_OP(cge); | ||
173 | - break; | ||
174 | case NEON_3R_VSHL: | ||
175 | GEN_NEON_INTEGER_OP(shl); | ||
176 | break; | ||
177 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
178 | tmp2 = neon_load_reg(rd, pass); | ||
179 | gen_neon_add(size, tmp, tmp2); | ||
180 | break; | ||
181 | - case NEON_3R_VTST_VCEQ: | ||
182 | - if (!u) { /* VTST */ | ||
183 | - switch (size) { | ||
184 | - case 0: gen_helper_neon_tst_u8(tmp, tmp, tmp2); break; | ||
185 | - case 1: gen_helper_neon_tst_u16(tmp, tmp, tmp2); break; | ||
186 | - case 2: gen_helper_neon_tst_u32(tmp, tmp, tmp2); break; | ||
187 | - default: abort(); | ||
188 | - } | ||
189 | - } else { /* VCEQ */ | ||
190 | - switch (size) { | ||
191 | - case 0: gen_helper_neon_ceq_u8(tmp, tmp, tmp2); break; | ||
192 | - case 1: gen_helper_neon_ceq_u16(tmp, tmp, tmp2); break; | ||
193 | - case 2: gen_helper_neon_ceq_u32(tmp, tmp, tmp2); break; | ||
194 | - default: abort(); | ||
195 | - } | ||
196 | - } | ||
197 | - break; | ||
198 | case NEON_3R_VMUL: | ||
199 | /* VMUL.P8; other cases already eliminated. */ | ||
200 | gen_helper_neon_mul_p8(tmp, tmp, tmp2); | ||
201 | -- | 187 | -- |
202 | 2.19.1 | 188 | 2.20.1 |
203 | 189 | ||
204 | 190 | diff view generated by jsdifflib |
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | Most of the v8 extensions are self-contained within the ISAR | 3 | This is BFDOT for both AArch64 AdvSIMD and SVE, |
4 | registers and are not implied by other feature bits, which | 4 | and VDOT.BF16 for AArch32 NEON. |
5 | makes them the easiest to convert. | ||
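For the indexed BFDOT/VDOT.BF16 patch on the right, the per-lane arithmetic is the same as in the earlier bfdot sketch; the only new ingredient is operand selection, where an immediate index picks one bfloat16 pair of the second source (per 128-bit segment) and reuses it for every lane. The host-only sketch below shows that selection; it ignores byte-order details and uses made-up names rather than the real QEMU helper, so treat it as an approximation of the architectural behaviour, not the implementation.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static float bf16_to_f32(uint16_t h)
    {
        uint32_t bits = (uint32_t)h << 16;
        float f;
        memcpy(&f, &bits, sizeof(f));
        return f;
    }

    static float bfdot_pair(float sum, uint32_t e1, uint32_t e2)
    {
        return sum + bf16_to_f32((uint16_t)e1) * bf16_to_f32((uint16_t)e2)
                   + bf16_to_f32(e1 >> 16) * bf16_to_f32(e2 >> 16);
    }

    /* 'elements' counts 32-bit bfloat16 pairs; four of them make one
     * 128-bit segment.  Within each segment the index selects one pair
     * of the second source, reused for every lane of that segment. */
    static void bfdot_indexed(float *d, const float *a, const uint32_t *n,
                              const uint32_t *m, int elements, int index)
    {
        for (int i = 0; i < elements; i++) {
            uint32_t pair = m[(i & ~3) + index];
            d[i] = bfdot_pair(a[i], n[i], pair);
        }
    }

    int main(void)
    {
        uint32_t n[4] = { 0x3f803f80, 0x3f803f80, 0x3f803f80, 0x3f803f80 };
        uint32_t m[4] = { 0x3f803f80, 0x40004000, 0x3f803f80, 0x3f803f80 };
        float a[4] = { 0 }, d[4];
        bfdot_indexed(d, a, n, m, 4, 1);    /* every lane uses m[1] = {2.0, 2.0} */
        printf("%g %g %g %g\n", d[0], d[1], d[2], d[3]);   /* 4 4 4 4 */
        return 0;
    }
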
6 | 5 | ||
7 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | ||
8 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
9 | Message-id: 20181016223115.24100-4-richard.henderson@linaro.org | 7 | Message-id: 20210525225817.400336-8-richard.henderson@linaro.org |
10 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 8 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
11 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
12 | --- | 10 | --- |
13 | target/arm/cpu.h | 131 +++++++++++++++++++++++++++++++++---- | 11 | target/arm/helper.h | 2 ++ |
14 | target/arm/translate.h | 7 ++ | 12 | target/arm/neon-shared.decode | 2 ++ |
15 | linux-user/elfload.c | 46 ++++++++----- | 13 | target/arm/sve.decode | 3 +++ |
16 | target/arm/cpu.c | 27 +++++--- | 14 | target/arm/translate-a64.c | 41 +++++++++++++++++++++++++++-------- |
17 | target/arm/cpu64.c | 57 +++++++++------- | 15 | target/arm/translate-neon.c | 9 ++++++++ |
18 | target/arm/translate-a64.c | 101 ++++++++++++++-------------- | 16 | target/arm/translate-sve.c | 12 ++++++++++ |
19 | target/arm/translate.c | 36 +++++----- | 17 | target/arm/vec_helper.c | 20 +++++++++++++++++ |
20 | 7 files changed, 273 insertions(+), 132 deletions(-) | 18 | 7 files changed, 80 insertions(+), 9 deletions(-) |
21 | 19 | ||
22 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h | 20 | diff --git a/target/arm/helper.h b/target/arm/helper.h |
23 | index XXXXXXX..XXXXXXX 100644 | 21 | index XXXXXXX..XXXXXXX 100644 |
24 | --- a/target/arm/cpu.h | 22 | --- a/target/arm/helper.h |
25 | +++ b/target/arm/cpu.h | 23 | +++ b/target/arm/helper.h |
26 | @@ -XXX,XX +XXX,XX @@ typedef enum ARMPSCIState { | 24 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG, |
27 | PSCI_ON_PENDING = 2 | 25 | |
28 | } ARMPSCIState; | 26 | DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG, |
29 | 27 | void, ptr, ptr, ptr, ptr, i32) | |
30 | +typedef struct ARMISARegisters ARMISARegisters; | 28 | +DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG, |
29 | + void, ptr, ptr, ptr, ptr, i32) | ||
30 | |||
31 | #ifdef TARGET_AARCH64 | ||
32 | #include "helper-a64.h" | ||
33 | diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode | ||
34 | index XXXXXXX..XXXXXXX 100644 | ||
35 | --- a/target/arm/neon-shared.decode | ||
36 | +++ b/target/arm/neon-shared.decode | ||
37 | @@ -XXX,XX +XXX,XX @@ VUSDOT_scalar 1111 1110 1 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \ | ||
38 | vn=%vn_dp vd=%vd_dp | ||
39 | VSUDOT_scalar 1111 1110 1 . 00 .... .... 1101 . q:1 index:1 1 vm:4 \ | ||
40 | vn=%vn_dp vd=%vd_dp | ||
41 | +VDOT_b16_scal 1111 1110 0 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \ | ||
42 | + vn=%vn_dp vd=%vd_dp | ||
43 | |||
44 | %vfml_scalar_q0_rm 0:3 5:1 | ||
45 | %vfml_scalar_q1_index 5:1 3:1 | ||
46 | diff --git a/target/arm/sve.decode b/target/arm/sve.decode | ||
47 | index XXXXXXX..XXXXXXX 100644 | ||
48 | --- a/target/arm/sve.decode | ||
49 | +++ b/target/arm/sve.decode | ||
50 | @@ -XXX,XX +XXX,XX @@ FMLALB_zzxw 01100100 10 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 | ||
51 | FMLALT_zzxw 01100100 10 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 | ||
52 | FMLSLB_zzxw 01100100 10 1 ..... 0110.0 ..... ..... @rrxr_3a esz=2 | ||
53 | FMLSLT_zzxw 01100100 10 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2 | ||
31 | + | 54 | + |
32 | /** | 55 | +### SVE2 floating-point bfloat16 dot-product (indexed) |
33 | * ARMCPU: | 56 | +BFDOT_zzxz 01100100 01 1 ..... 010000 ..... ..... @rrxr_2 esz=2 |
34 | * @env: #CPUARMState | ||
35 | @@ -XXX,XX +XXX,XX @@ enum arm_features { | ||
36 | ARM_FEATURE_LPAE, /* has Large Physical Address Extension */ | ||
37 | ARM_FEATURE_V8, | ||
38 | ARM_FEATURE_AARCH64, /* supports 64 bit mode */ | ||
39 | - ARM_FEATURE_V8_AES, /* implements AES part of v8 Crypto Extensions */ | ||
40 | ARM_FEATURE_CBAR, /* has cp15 CBAR */ | ||
41 | ARM_FEATURE_CRC, /* ARMv8 CRC instructions */ | ||
42 | ARM_FEATURE_CBAR_RO, /* has cp15 CBAR and it is read-only */ | ||
43 | ARM_FEATURE_EL2, /* has EL2 Virtualization support */ | ||
44 | ARM_FEATURE_EL3, /* has EL3 Secure monitor support */ | ||
45 | - ARM_FEATURE_V8_SHA1, /* implements SHA1 part of v8 Crypto Extensions */ | ||
46 | - ARM_FEATURE_V8_SHA256, /* implements SHA256 part of v8 Crypto Extensions */ | ||
47 | - ARM_FEATURE_V8_PMULL, /* implements PMULL part of v8 Crypto Extensions */ | ||
48 | ARM_FEATURE_THUMB_DSP, /* DSP insns supported in the Thumb encodings */ | ||
49 | ARM_FEATURE_PMU, /* has PMU support */ | ||
50 | ARM_FEATURE_VBAR, /* has cp15 VBAR */ | ||
51 | ARM_FEATURE_M_SECURITY, /* M profile Security Extension */ | ||
52 | ARM_FEATURE_JAZELLE, /* has (trivial) Jazelle implementation */ | ||
53 | ARM_FEATURE_SVE, /* has Scalable Vector Extension */ | ||
54 | - ARM_FEATURE_V8_SHA512, /* implements SHA512 part of v8 Crypto Extensions */ | ||
55 | - ARM_FEATURE_V8_SHA3, /* implements SHA3 part of v8 Crypto Extensions */ | ||
56 | - ARM_FEATURE_V8_SM3, /* implements SM3 part of v8 Crypto Extensions */ | ||
57 | - ARM_FEATURE_V8_SM4, /* implements SM4 part of v8 Crypto Extensions */ | ||
58 | - ARM_FEATURE_V8_ATOMICS, /* ARMv8.1-Atomics feature */ | ||
59 | - ARM_FEATURE_V8_RDM, /* implements v8.1 simd round multiply */ | ||
60 | - ARM_FEATURE_V8_DOTPROD, /* implements v8.2 simd dot product */ | ||
61 | ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */ | ||
62 | - ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions. */ | ||
63 | ARM_FEATURE_M_MAIN, /* M profile Main Extension */ | ||
64 | }; | ||
65 | |||
66 | @@ -XXX,XX +XXX,XX @@ static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno) | ||
67 | /* Shared between translate-sve.c and sve_helper.c. */ | ||
68 | extern const uint64_t pred_esz_masks[4]; | ||
69 | |||
70 | +/* | ||
71 | + * 32-bit feature tests via id registers. | ||
72 | + */ | ||
73 | +static inline bool isar_feature_aa32_aes(const ARMISARegisters *id) | ||
74 | +{ | ||
75 | + return FIELD_EX32(id->id_isar5, ID_ISAR5, AES) != 0; | ||
76 | +} | ||
77 | + | ||
78 | +static inline bool isar_feature_aa32_pmull(const ARMISARegisters *id) | ||
79 | +{ | ||
80 | + return FIELD_EX32(id->id_isar5, ID_ISAR5, AES) > 1; | ||
81 | +} | ||
82 | + | ||
83 | +static inline bool isar_feature_aa32_sha1(const ARMISARegisters *id) | ||
84 | +{ | ||
85 | + return FIELD_EX32(id->id_isar5, ID_ISAR5, SHA1) != 0; | ||
86 | +} | ||
87 | + | ||
88 | +static inline bool isar_feature_aa32_sha2(const ARMISARegisters *id) | ||
89 | +{ | ||
90 | + return FIELD_EX32(id->id_isar5, ID_ISAR5, SHA2) != 0; | ||
91 | +} | ||
92 | + | ||
93 | +static inline bool isar_feature_aa32_crc32(const ARMISARegisters *id) | ||
94 | +{ | ||
95 | + return FIELD_EX32(id->id_isar5, ID_ISAR5, CRC32) != 0; | ||
96 | +} | ||
97 | + | ||
98 | +static inline bool isar_feature_aa32_rdm(const ARMISARegisters *id) | ||
99 | +{ | ||
100 | + return FIELD_EX32(id->id_isar5, ID_ISAR5, RDM) != 0; | ||
101 | +} | ||
102 | + | ||
103 | +static inline bool isar_feature_aa32_vcma(const ARMISARegisters *id) | ||
104 | +{ | ||
105 | + return FIELD_EX32(id->id_isar5, ID_ISAR5, VCMA) != 0; | ||
106 | +} | ||
107 | + | ||
108 | +static inline bool isar_feature_aa32_dp(const ARMISARegisters *id) | ||
109 | +{ | ||
110 | + return FIELD_EX32(id->id_isar6, ID_ISAR6, DP) != 0; | ||
111 | +} | ||
112 | + | ||
113 | +/* | ||
114 | + * 64-bit feature tests via id registers. | ||
115 | + */ | ||
116 | +static inline bool isar_feature_aa64_aes(const ARMISARegisters *id) | ||
117 | +{ | ||
118 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, AES) != 0; | ||
119 | +} | ||
120 | + | ||
121 | +static inline bool isar_feature_aa64_pmull(const ARMISARegisters *id) | ||
122 | +{ | ||
123 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, AES) > 1; | ||
124 | +} | ||
125 | + | ||
126 | +static inline bool isar_feature_aa64_sha1(const ARMISARegisters *id) | ||
127 | +{ | ||
128 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SHA1) != 0; | ||
129 | +} | ||
130 | + | ||
131 | +static inline bool isar_feature_aa64_sha256(const ARMISARegisters *id) | ||
132 | +{ | ||
133 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SHA2) != 0; | ||
134 | +} | ||
135 | + | ||
136 | +static inline bool isar_feature_aa64_sha512(const ARMISARegisters *id) | ||
137 | +{ | ||
138 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SHA2) > 1; | ||
139 | +} | ||
140 | + | ||
141 | +static inline bool isar_feature_aa64_crc32(const ARMISARegisters *id) | ||
142 | +{ | ||
143 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, CRC32) != 0; | ||
144 | +} | ||
145 | + | ||
146 | +static inline bool isar_feature_aa64_atomics(const ARMISARegisters *id) | ||
147 | +{ | ||
148 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, ATOMIC) != 0; | ||
149 | +} | ||
150 | + | ||
151 | +static inline bool isar_feature_aa64_rdm(const ARMISARegisters *id) | ||
152 | +{ | ||
153 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, RDM) != 0; | ||
154 | +} | ||
155 | + | ||
156 | +static inline bool isar_feature_aa64_sha3(const ARMISARegisters *id) | ||
157 | +{ | ||
158 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SHA3) != 0; | ||
159 | +} | ||
160 | + | ||
161 | +static inline bool isar_feature_aa64_sm3(const ARMISARegisters *id) | ||
162 | +{ | ||
163 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SM3) != 0; | ||
164 | +} | ||
165 | + | ||
166 | +static inline bool isar_feature_aa64_sm4(const ARMISARegisters *id) | ||
167 | +{ | ||
168 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, SM4) != 0; | ||
169 | +} | ||
170 | + | ||
171 | +static inline bool isar_feature_aa64_dp(const ARMISARegisters *id) | ||
172 | +{ | ||
173 | + return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, DP) != 0; | ||
174 | +} | ||
175 | + | ||
176 | +static inline bool isar_feature_aa64_fcma(const ARMISARegisters *id) | ||
177 | +{ | ||
178 | + return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FCMA) != 0; | ||
179 | +} | ||
180 | + | ||
181 | +/* | ||
182 | + * Forward to the above feature tests given an ARMCPU pointer. | ||
183 | + */ | ||
184 | +#define cpu_isar_feature(name, cpu) \ | ||
185 | + ({ ARMCPU *cpu_ = (cpu); isar_feature_##name(&cpu_->isar); }) | ||
186 | + | ||
187 | #endif | ||
188 | diff --git a/target/arm/translate.h b/target/arm/translate.h | ||
189 | index XXXXXXX..XXXXXXX 100644 | ||
190 | --- a/target/arm/translate.h | ||
191 | +++ b/target/arm/translate.h | ||
192 | @@ -XXX,XX +XXX,XX @@ | ||
193 | /* internal defines */ | ||
194 | typedef struct DisasContext { | ||
195 | DisasContextBase base; | ||
196 | + const ARMISARegisters *isar; | ||
197 | |||
198 | target_ulong pc; | ||
199 | target_ulong page_start; | ||
200 | @@ -XXX,XX +XXX,XX @@ static inline TCGv_i32 get_ahp_flag(void) | ||
201 | return ret; | ||
202 | } | ||
203 | |||
204 | +/* | ||
205 | + * Forward to the isar_feature_* tests given a DisasContext pointer. | ||
206 | + */ | ||
207 | +#define dc_isar_feature(name, ctx) \ | ||
208 | + ({ DisasContext *ctx_ = (ctx); isar_feature_##name(ctx_->isar); }) | ||
209 | + | ||
210 | #endif /* TARGET_ARM_TRANSLATE_H */ | ||
211 | diff --git a/linux-user/elfload.c b/linux-user/elfload.c | ||
212 | index XXXXXXX..XXXXXXX 100644 | ||
213 | --- a/linux-user/elfload.c | ||
214 | +++ b/linux-user/elfload.c | ||
215 | @@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void) | ||
216 | /* probe for the extra features */ | ||
217 | #define GET_FEATURE(feat, hwcap) \ | ||
218 | do { if (arm_feature(&cpu->env, feat)) { hwcaps |= hwcap; } } while (0) | ||
219 | + | ||
220 | +#define GET_FEATURE_ID(feat, hwcap) \ | ||
221 | + do { if (cpu_isar_feature(feat, cpu)) { hwcaps |= hwcap; } } while (0) | ||
222 | + | ||
223 | /* EDSP is in v5TE and above, but all our v5 CPUs are v5TE */ | ||
224 | GET_FEATURE(ARM_FEATURE_V5, ARM_HWCAP_ARM_EDSP); | ||
225 | GET_FEATURE(ARM_FEATURE_VFP, ARM_HWCAP_ARM_VFP); | ||
226 | @@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap2(void) | ||
227 | ARMCPU *cpu = ARM_CPU(thread_cpu); | ||
228 | uint32_t hwcaps = 0; | ||
229 | |||
230 | - GET_FEATURE(ARM_FEATURE_V8_AES, ARM_HWCAP2_ARM_AES); | ||
231 | - GET_FEATURE(ARM_FEATURE_V8_PMULL, ARM_HWCAP2_ARM_PMULL); | ||
232 | - GET_FEATURE(ARM_FEATURE_V8_SHA1, ARM_HWCAP2_ARM_SHA1); | ||
233 | - GET_FEATURE(ARM_FEATURE_V8_SHA256, ARM_HWCAP2_ARM_SHA2); | ||
234 | - GET_FEATURE(ARM_FEATURE_CRC, ARM_HWCAP2_ARM_CRC32); | ||
235 | + GET_FEATURE_ID(aa32_aes, ARM_HWCAP2_ARM_AES); | ||
236 | + GET_FEATURE_ID(aa32_pmull, ARM_HWCAP2_ARM_PMULL); | ||
237 | + GET_FEATURE_ID(aa32_sha1, ARM_HWCAP2_ARM_SHA1); | ||
238 | + GET_FEATURE_ID(aa32_sha2, ARM_HWCAP2_ARM_SHA2); | ||
239 | + GET_FEATURE_ID(aa32_crc32, ARM_HWCAP2_ARM_CRC32); | ||
240 | return hwcaps; | ||
241 | } | ||
242 | |||
243 | #undef GET_FEATURE | ||
244 | +#undef GET_FEATURE_ID | ||
245 | |||
246 | #else | ||
247 | /* 64 bit ARM definitions */ | ||
248 | @@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void) | ||
249 | /* probe for the extra features */ | ||
250 | #define GET_FEATURE(feat, hwcap) \ | ||
251 | do { if (arm_feature(&cpu->env, feat)) { hwcaps |= hwcap; } } while (0) | ||
252 | - GET_FEATURE(ARM_FEATURE_V8_AES, ARM_HWCAP_A64_AES); | ||
253 | - GET_FEATURE(ARM_FEATURE_V8_PMULL, ARM_HWCAP_A64_PMULL); | ||
254 | - GET_FEATURE(ARM_FEATURE_V8_SHA1, ARM_HWCAP_A64_SHA1); | ||
255 | - GET_FEATURE(ARM_FEATURE_V8_SHA256, ARM_HWCAP_A64_SHA2); | ||
256 | - GET_FEATURE(ARM_FEATURE_CRC, ARM_HWCAP_A64_CRC32); | ||
257 | - GET_FEATURE(ARM_FEATURE_V8_SHA3, ARM_HWCAP_A64_SHA3); | ||
258 | - GET_FEATURE(ARM_FEATURE_V8_SM3, ARM_HWCAP_A64_SM3); | ||
259 | - GET_FEATURE(ARM_FEATURE_V8_SM4, ARM_HWCAP_A64_SM4); | ||
260 | - GET_FEATURE(ARM_FEATURE_V8_SHA512, ARM_HWCAP_A64_SHA512); | ||
261 | +#define GET_FEATURE_ID(feat, hwcap) \ | ||
262 | + do { if (cpu_isar_feature(feat, cpu)) { hwcaps |= hwcap; } } while (0) | ||
263 | + | ||
264 | + GET_FEATURE_ID(aa64_aes, ARM_HWCAP_A64_AES); | ||
265 | + GET_FEATURE_ID(aa64_pmull, ARM_HWCAP_A64_PMULL); | ||
266 | + GET_FEATURE_ID(aa64_sha1, ARM_HWCAP_A64_SHA1); | ||
267 | + GET_FEATURE_ID(aa64_sha256, ARM_HWCAP_A64_SHA2); | ||
268 | + GET_FEATURE_ID(aa64_sha512, ARM_HWCAP_A64_SHA512); | ||
269 | + GET_FEATURE_ID(aa64_crc32, ARM_HWCAP_A64_CRC32); | ||
270 | + GET_FEATURE_ID(aa64_sha3, ARM_HWCAP_A64_SHA3); | ||
271 | + GET_FEATURE_ID(aa64_sm3, ARM_HWCAP_A64_SM3); | ||
272 | + GET_FEATURE_ID(aa64_sm4, ARM_HWCAP_A64_SM4); | ||
273 | GET_FEATURE(ARM_FEATURE_V8_FP16, | ||
274 | ARM_HWCAP_A64_FPHP | ARM_HWCAP_A64_ASIMDHP); | ||
275 | - GET_FEATURE(ARM_FEATURE_V8_ATOMICS, ARM_HWCAP_A64_ATOMICS); | ||
276 | - GET_FEATURE(ARM_FEATURE_V8_RDM, ARM_HWCAP_A64_ASIMDRDM); | ||
277 | - GET_FEATURE(ARM_FEATURE_V8_DOTPROD, ARM_HWCAP_A64_ASIMDDP); | ||
278 | - GET_FEATURE(ARM_FEATURE_V8_FCMA, ARM_HWCAP_A64_FCMA); | ||
279 | + GET_FEATURE_ID(aa64_atomics, ARM_HWCAP_A64_ATOMICS); | ||
280 | + GET_FEATURE_ID(aa64_rdm, ARM_HWCAP_A64_ASIMDRDM); | ||
281 | + GET_FEATURE_ID(aa64_dp, ARM_HWCAP_A64_ASIMDDP); | ||
282 | + GET_FEATURE_ID(aa64_fcma, ARM_HWCAP_A64_FCMA); | ||
283 | GET_FEATURE(ARM_FEATURE_SVE, ARM_HWCAP_A64_SVE); | ||
284 | + | ||
285 | #undef GET_FEATURE | ||
286 | +#undef GET_FEATURE_ID | ||
287 | |||
288 | return hwcaps; | ||
289 | } | ||
290 | diff --git a/target/arm/cpu.c b/target/arm/cpu.c | ||
291 | index XXXXXXX..XXXXXXX 100644 | ||
292 | --- a/target/arm/cpu.c | ||
293 | +++ b/target/arm/cpu.c | ||
294 | @@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj) | ||
295 | cortex_a15_initfn(obj); | ||
296 | #ifdef CONFIG_USER_ONLY | ||
297 | /* We don't set these in system emulation mode for the moment, | ||
298 | - * since we don't correctly set the ID registers to advertise them, | ||
299 | + * since we don't correctly set (all of) the ID registers to | ||
300 | + * advertise them. | ||
301 | */ | ||
302 | set_feature(&cpu->env, ARM_FEATURE_V8); | ||
303 | - set_feature(&cpu->env, ARM_FEATURE_V8_AES); | ||
304 | - set_feature(&cpu->env, ARM_FEATURE_V8_SHA1); | ||
305 | - set_feature(&cpu->env, ARM_FEATURE_V8_SHA256); | ||
306 | - set_feature(&cpu->env, ARM_FEATURE_V8_PMULL); | ||
307 | - set_feature(&cpu->env, ARM_FEATURE_CRC); | ||
308 | - set_feature(&cpu->env, ARM_FEATURE_V8_RDM); | ||
309 | - set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD); | ||
310 | - set_feature(&cpu->env, ARM_FEATURE_V8_FCMA); | ||
311 | + { | ||
312 | + uint32_t t; | ||
313 | + | ||
314 | + t = cpu->isar.id_isar5; | ||
315 | + t = FIELD_DP32(t, ID_ISAR5, AES, 2); | ||
316 | + t = FIELD_DP32(t, ID_ISAR5, SHA1, 1); | ||
317 | + t = FIELD_DP32(t, ID_ISAR5, SHA2, 1); | ||
318 | + t = FIELD_DP32(t, ID_ISAR5, CRC32, 1); | ||
319 | + t = FIELD_DP32(t, ID_ISAR5, RDM, 1); | ||
320 | + t = FIELD_DP32(t, ID_ISAR5, VCMA, 1); | ||
321 | + cpu->isar.id_isar5 = t; | ||
322 | + | ||
323 | + t = cpu->isar.id_isar6; | ||
324 | + t = FIELD_DP32(t, ID_ISAR6, DP, 1); | ||
325 | + cpu->isar.id_isar6 = t; | ||
326 | + } | ||
327 | #endif | ||
328 | } | ||
329 | } | ||
330 | diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c | ||
331 | index XXXXXXX..XXXXXXX 100644 | ||
332 | --- a/target/arm/cpu64.c | ||
333 | +++ b/target/arm/cpu64.c | ||
334 | @@ -XXX,XX +XXX,XX @@ static void aarch64_a57_initfn(Object *obj) | ||
335 | set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER); | ||
336 | set_feature(&cpu->env, ARM_FEATURE_AARCH64); | ||
337 | set_feature(&cpu->env, ARM_FEATURE_CBAR_RO); | ||
338 | - set_feature(&cpu->env, ARM_FEATURE_V8_AES); | ||
339 | - set_feature(&cpu->env, ARM_FEATURE_V8_SHA1); | ||
340 | - set_feature(&cpu->env, ARM_FEATURE_V8_SHA256); | ||
341 | - set_feature(&cpu->env, ARM_FEATURE_V8_PMULL); | ||
342 | - set_feature(&cpu->env, ARM_FEATURE_CRC); | ||
343 | set_feature(&cpu->env, ARM_FEATURE_EL2); | ||
344 | set_feature(&cpu->env, ARM_FEATURE_EL3); | ||
345 | set_feature(&cpu->env, ARM_FEATURE_PMU); | ||
346 | @@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj) | ||
347 | set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER); | ||
348 | set_feature(&cpu->env, ARM_FEATURE_AARCH64); | ||
349 | set_feature(&cpu->env, ARM_FEATURE_CBAR_RO); | ||
350 | - set_feature(&cpu->env, ARM_FEATURE_V8_AES); | ||
351 | - set_feature(&cpu->env, ARM_FEATURE_V8_SHA1); | ||
352 | - set_feature(&cpu->env, ARM_FEATURE_V8_SHA256); | ||
353 | - set_feature(&cpu->env, ARM_FEATURE_V8_PMULL); | ||
354 | - set_feature(&cpu->env, ARM_FEATURE_CRC); | ||
355 | set_feature(&cpu->env, ARM_FEATURE_EL2); | ||
356 | set_feature(&cpu->env, ARM_FEATURE_EL3); | ||
357 | set_feature(&cpu->env, ARM_FEATURE_PMU); | ||
358 | @@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj) | ||
359 | set_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER); | ||
360 | set_feature(&cpu->env, ARM_FEATURE_AARCH64); | ||
361 | set_feature(&cpu->env, ARM_FEATURE_CBAR_RO); | ||
362 | - set_feature(&cpu->env, ARM_FEATURE_V8_AES); | ||
363 | - set_feature(&cpu->env, ARM_FEATURE_V8_SHA1); | ||
364 | - set_feature(&cpu->env, ARM_FEATURE_V8_SHA256); | ||
365 | - set_feature(&cpu->env, ARM_FEATURE_V8_PMULL); | ||
366 | - set_feature(&cpu->env, ARM_FEATURE_CRC); | ||
367 | set_feature(&cpu->env, ARM_FEATURE_EL2); | ||
368 | set_feature(&cpu->env, ARM_FEATURE_EL3); | ||
369 | set_feature(&cpu->env, ARM_FEATURE_PMU); | ||
370 | @@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj) | ||
371 | if (kvm_enabled()) { | ||
372 | kvm_arm_set_cpu_features_from_host(cpu); | ||
373 | } else { | ||
374 | + uint64_t t; | ||
375 | + uint32_t u; | ||
376 | aarch64_a57_initfn(obj); | ||
377 | + | ||
378 | + t = cpu->isar.id_aa64isar0; | ||
379 | + t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2); /* AES + PMULL */ | ||
380 | + t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1); | ||
381 | + t = FIELD_DP64(t, ID_AA64ISAR0, SHA2, 2); /* SHA512 */ | ||
382 | + t = FIELD_DP64(t, ID_AA64ISAR0, CRC32, 1); | ||
383 | + t = FIELD_DP64(t, ID_AA64ISAR0, ATOMIC, 2); | ||
384 | + t = FIELD_DP64(t, ID_AA64ISAR0, RDM, 1); | ||
385 | + t = FIELD_DP64(t, ID_AA64ISAR0, SHA3, 1); | ||
386 | + t = FIELD_DP64(t, ID_AA64ISAR0, SM3, 1); | ||
387 | + t = FIELD_DP64(t, ID_AA64ISAR0, SM4, 1); | ||
388 | + t = FIELD_DP64(t, ID_AA64ISAR0, DP, 1); | ||
389 | + cpu->isar.id_aa64isar0 = t; | ||
390 | + | ||
391 | + t = cpu->isar.id_aa64isar1; | ||
392 | + t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1); | ||
393 | + cpu->isar.id_aa64isar1 = t; | ||
394 | + | ||
395 | + /* Replicate the same data to the 32-bit id registers. */ | ||
396 | + u = cpu->isar.id_isar5; | ||
397 | + u = FIELD_DP32(u, ID_ISAR5, AES, 2); /* AES + PMULL */ | ||
398 | + u = FIELD_DP32(u, ID_ISAR5, SHA1, 1); | ||
399 | + u = FIELD_DP32(u, ID_ISAR5, SHA2, 1); | ||
400 | + u = FIELD_DP32(u, ID_ISAR5, CRC32, 1); | ||
401 | + u = FIELD_DP32(u, ID_ISAR5, RDM, 1); | ||
402 | + u = FIELD_DP32(u, ID_ISAR5, VCMA, 1); | ||
403 | + cpu->isar.id_isar5 = u; | ||
404 | + | ||
405 | + u = cpu->isar.id_isar6; | ||
406 | + u = FIELD_DP32(u, ID_ISAR6, DP, 1); | ||
407 | + cpu->isar.id_isar6 = u; | ||
408 | + | ||
409 | #ifdef CONFIG_USER_ONLY | ||
410 | /* We don't set these in system emulation mode for the moment, | ||
411 | * since we don't correctly set the ID registers to advertise them, | ||
412 | @@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj) | ||
413 | * whereas the architecture requires them to be present in both if | ||
414 | * present in either. | ||
415 | */ | ||
416 | - set_feature(&cpu->env, ARM_FEATURE_V8_SHA512); | ||
417 | - set_feature(&cpu->env, ARM_FEATURE_V8_SHA3); | ||
418 | - set_feature(&cpu->env, ARM_FEATURE_V8_SM3); | ||
419 | - set_feature(&cpu->env, ARM_FEATURE_V8_SM4); | ||
420 | - set_feature(&cpu->env, ARM_FEATURE_V8_ATOMICS); | ||
421 | - set_feature(&cpu->env, ARM_FEATURE_V8_RDM); | ||
422 | - set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD); | ||
423 | set_feature(&cpu->env, ARM_FEATURE_V8_FP16); | ||
424 | - set_feature(&cpu->env, ARM_FEATURE_V8_FCMA); | ||
425 | set_feature(&cpu->env, ARM_FEATURE_SVE); | ||
426 | /* For usermode -cpu max we can use a larger and more efficient DCZ | ||
427 | * blocksize since we don't have to follow what the hardware does. | ||
428 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | 57 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c |
429 | index XXXXXXX..XXXXXXX 100644 | 58 | index XXXXXXX..XXXXXXX 100644 |
430 | --- a/target/arm/translate-a64.c | 59 | --- a/target/arm/translate-a64.c |
431 | +++ b/target/arm/translate-a64.c | 60 | +++ b/target/arm/translate-a64.c |
432 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) | ||
433 | } | ||
434 | if (rt2 == 31 | ||
435 | && ((rt | rs) & 1) == 0 | ||
436 | - && arm_dc_feature(s, ARM_FEATURE_V8_ATOMICS)) { | ||
437 | + && dc_isar_feature(aa64_atomics, s)) { | ||
438 | /* CASP / CASPL */ | ||
439 | gen_compare_and_swap_pair(s, rs, rt, rn, size | 2); | ||
440 | return; | ||
441 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) | ||
442 | } | ||
443 | if (rt2 == 31 | ||
444 | && ((rt | rs) & 1) == 0 | ||
445 | - && arm_dc_feature(s, ARM_FEATURE_V8_ATOMICS)) { | ||
446 | + && dc_isar_feature(aa64_atomics, s)) { | ||
447 | /* CASPA / CASPAL */ | ||
448 | gen_compare_and_swap_pair(s, rs, rt, rn, size | 2); | ||
449 | return; | ||
450 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn) | ||
451 | case 0xb: /* CASL */ | ||
452 | case 0xe: /* CASA */ | ||
453 | case 0xf: /* CASAL */ | ||
454 | - if (rt2 == 31 && arm_dc_feature(s, ARM_FEATURE_V8_ATOMICS)) { | ||
455 | + if (rt2 == 31 && dc_isar_feature(aa64_atomics, s)) { | ||
456 | gen_compare_and_swap(s, rs, rt, rn, size); | ||
457 | return; | ||
458 | } | ||
459 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn, | ||
460 | int rs = extract32(insn, 16, 5); | ||
461 | int rn = extract32(insn, 5, 5); | ||
462 | int o3_opc = extract32(insn, 12, 4); | ||
463 | - int feature = ARM_FEATURE_V8_ATOMICS; | ||
464 | TCGv_i64 tcg_rn, tcg_rs; | ||
465 | AtomicThreeOpFn *fn; | ||
466 | |||
467 | - if (is_vector) { | ||
468 | + if (is_vector || !dc_isar_feature(aa64_atomics, s)) { | ||
469 | unallocated_encoding(s); | ||
470 | return; | ||
471 | } | ||
472 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn, | ||
473 | unallocated_encoding(s); | ||
474 | return; | ||
475 | } | ||
476 | - if (!arm_dc_feature(s, feature)) { | ||
477 | - unallocated_encoding(s); | ||
478 | - return; | ||
479 | - } | ||
480 | |||
481 | if (rn == 31) { | ||
482 | gen_check_sp_alignment(s); | ||
483 | @@ -XXX,XX +XXX,XX @@ static void handle_crc32(DisasContext *s, | ||
484 | TCGv_i64 tcg_acc, tcg_val; | ||
485 | TCGv_i32 tcg_bytes; | ||
486 | |||
487 | - if (!arm_dc_feature(s, ARM_FEATURE_CRC) | ||
488 | + if (!dc_isar_feature(aa64_crc32, s) | ||
489 | || (sf == 1 && sz != 3) | ||
490 | || (sf == 0 && sz == 3)) { | ||
491 | unallocated_encoding(s); | ||
492 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_same_extra(DisasContext *s, | ||
493 | bool u = extract32(insn, 29, 1); | ||
494 | TCGv_i32 ele1, ele2, ele3; | ||
495 | TCGv_i64 res; | ||
496 | - int feature; | ||
497 | + bool feature; | ||
498 | |||
499 | switch (u * 16 + opcode) { | ||
500 | case 0x10: /* SQRDMLAH (vector) */ | ||
501 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_same_extra(DisasContext *s, | ||
502 | unallocated_encoding(s); | ||
503 | return; | ||
504 | } | ||
505 | - feature = ARM_FEATURE_V8_RDM; | ||
506 | + feature = dc_isar_feature(aa64_rdm, s); | ||
507 | break; | ||
508 | default: | ||
509 | unallocated_encoding(s); | ||
510 | return; | ||
511 | } | ||
512 | - if (!arm_dc_feature(s, feature)) { | ||
513 | + if (!feature) { | ||
514 | unallocated_encoding(s); | ||
515 | return; | ||
516 | } | ||
517 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) | ||
518 | return; | ||
519 | } | ||
520 | if (size == 3) { | ||
521 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_PMULL)) { | ||
522 | + if (!dc_isar_feature(aa64_pmull, s)) { | ||
523 | unallocated_encoding(s); | ||
524 | return; | ||
525 | } | ||
526 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) | ||
527 | int size = extract32(insn, 22, 2); | ||
528 | bool u = extract32(insn, 29, 1); | ||
529 | bool is_q = extract32(insn, 30, 1); | ||
530 | - int feature, rot; | ||
531 | + bool feature; | ||
532 | + int rot; | ||
533 | |||
534 | switch (u * 16 + opcode) { | ||
535 | case 0x10: /* SQRDMLAH (vector) */ | ||
536 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) | ||
537 | unallocated_encoding(s); | ||
538 | return; | ||
539 | } | ||
540 | - feature = ARM_FEATURE_V8_RDM; | ||
541 | + feature = dc_isar_feature(aa64_rdm, s); | ||
542 | break; | ||
543 | case 0x02: /* SDOT (vector) */ | ||
544 | case 0x12: /* UDOT (vector) */ | ||
545 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) | ||
546 | unallocated_encoding(s); | ||
547 | return; | ||
548 | } | ||
549 | - feature = ARM_FEATURE_V8_DOTPROD; | ||
550 | + feature = dc_isar_feature(aa64_dp, s); | ||
551 | break; | ||
552 | case 0x18: /* FCMLA, #0 */ | ||
553 | case 0x19: /* FCMLA, #90 */ | ||
554 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) | ||
555 | unallocated_encoding(s); | ||
556 | return; | ||
557 | } | ||
558 | - feature = ARM_FEATURE_V8_FCMA; | ||
559 | + feature = dc_isar_feature(aa64_fcma, s); | ||
560 | break; | ||
561 | default: | ||
562 | unallocated_encoding(s); | ||
563 | return; | ||
564 | } | ||
565 | - if (!arm_dc_feature(s, feature)) { | ||
566 | + if (!feature) { | ||
567 | unallocated_encoding(s); | ||
568 | return; | ||
569 | } | ||
570 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn) | 61 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn) |
571 | break; | ||
572 | case 0x1d: /* SQRDMLAH */ | ||
573 | case 0x1f: /* SQRDMLSH */ | ||
574 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_RDM)) { | ||
575 | + if (!dc_isar_feature(aa64_rdm, s)) { | ||
576 | unallocated_encoding(s); | ||
577 | return; | 62 | return; |
578 | } | 63 | } |
579 | break; | 64 | break; |
580 | case 0x0e: /* SDOT */ | 65 | - case 0x0f: /* SUDOT, USDOT */ |
581 | case 0x1e: /* UDOT */ | 66 | - if (is_scalar || (size & 1) || !dc_isar_feature(aa64_i8mm, s)) { |
582 | - if (size != MO_32 || !arm_dc_feature(s, ARM_FEATURE_V8_DOTPROD)) { | 67 | + case 0x0f: |
583 | + if (size != MO_32 || !dc_isar_feature(aa64_dp, s)) { | 68 | + switch (size) { |
69 | + case 0: /* SUDOT */ | ||
70 | + case 2: /* USDOT */ | ||
71 | + if (is_scalar || !dc_isar_feature(aa64_i8mm, s)) { | ||
72 | + unallocated_encoding(s); | ||
73 | + return; | ||
74 | + } | ||
75 | + break; | ||
76 | + case 1: /* BFDOT */ | ||
77 | + if (is_scalar || !dc_isar_feature(aa64_bf16, s)) { | ||
78 | + unallocated_encoding(s); | ||
79 | + return; | ||
80 | + } | ||
81 | + break; | ||
82 | + default: | ||
584 | unallocated_encoding(s); | 83 | unallocated_encoding(s); |
585 | return; | 84 | return; |
586 | } | 85 | } |
587 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn) | 86 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn) |
87 | u ? gen_helper_gvec_udot_idx_b | ||
88 | : gen_helper_gvec_sdot_idx_b); | ||
89 | return; | ||
90 | - case 0x0f: /* SUDOT, USDOT */ | ||
91 | - gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index, | ||
92 | - extract32(insn, 23, 1) | ||
93 | - ? gen_helper_gvec_usdot_idx_b | ||
94 | - : gen_helper_gvec_sudot_idx_b); | ||
95 | - return; | ||
96 | - | ||
97 | + case 0x0f: | ||
98 | + switch (extract32(insn, 22, 2)) { | ||
99 | + case 0: /* SUDOT */ | ||
100 | + gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index, | ||
101 | + gen_helper_gvec_sudot_idx_b); | ||
102 | + return; | ||
103 | + case 1: /* BFDOT */ | ||
104 | + gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index, | ||
105 | + gen_helper_gvec_bfdot_idx); | ||
106 | + return; | ||
107 | + case 2: /* USDOT */ | ||
108 | + gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index, | ||
109 | + gen_helper_gvec_usdot_idx_b); | ||
110 | + return; | ||
111 | + } | ||
112 | + g_assert_not_reached(); | ||
113 | case 0x11: /* FCMLA #0 */ | ||
588 | case 0x13: /* FCMLA #90 */ | 114 | case 0x13: /* FCMLA #90 */ |
589 | case 0x15: /* FCMLA #180 */ | 115 | case 0x15: /* FCMLA #180 */ |
590 | case 0x17: /* FCMLA #270 */ | 116 | diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c |
591 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)) { | 117 | index XXXXXXX..XXXXXXX 100644 |
592 | + if (!dc_isar_feature(aa64_fcma, s)) { | 118 | --- a/target/arm/translate-neon.c |
593 | unallocated_encoding(s); | 119 | +++ b/target/arm/translate-neon.c |
594 | return; | 120 | @@ -XXX,XX +XXX,XX @@ static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a) |
595 | } | 121 | gen_helper_gvec_sudot_idx_b); |
596 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn) | 122 | } |
597 | TCGv_i32 tcg_decrypt; | 123 | |
598 | CryptoThreeOpIntFn *genfn; | 124 | +static bool trans_VDOT_b16_scal(DisasContext *s, arg_VDOT_b16_scal *a) |
599 | 125 | +{ | |
600 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_AES) | 126 | + if (!dc_isar_feature(aa32_bf16, s)) { |
601 | - || size != 0) { | 127 | + return false; |
602 | + if (!dc_isar_feature(aa64_aes, s) || size != 0) { | 128 | + } |
603 | unallocated_encoding(s); | 129 | + return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index, |
604 | return; | 130 | + gen_helper_gvec_bfdot_idx); |
131 | +} | ||
132 | + | ||
133 | static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a) | ||
134 | { | ||
135 | int opr_sz; | ||
136 | diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c | ||
137 | index XXXXXXX..XXXXXXX 100644 | ||
138 | --- a/target/arm/translate-sve.c | ||
139 | +++ b/target/arm/translate-sve.c | ||
140 | @@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a) | ||
605 | } | 141 | } |
606 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_three_reg_sha(DisasContext *s, uint32_t insn) | 142 | return true; |
607 | int rd = extract32(insn, 0, 5); | 143 | } |
608 | CryptoThreeOpFn *genfn; | 144 | + |
609 | TCGv_ptr tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr; | 145 | +static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a) |
610 | - int feature = ARM_FEATURE_V8_SHA256; | 146 | +{ |
611 | + bool feature; | 147 | + if (!dc_isar_feature(aa64_sve_bf16, s)) { |
612 | 148 | + return false; | |
613 | if (size != 0) { | 149 | + } |
614 | unallocated_encoding(s); | 150 | + if (sve_access_check(s)) { |
615 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_three_reg_sha(DisasContext *s, uint32_t insn) | 151 | + gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot_idx, |
616 | case 2: /* SHA1M */ | 152 | + a->rd, a->rn, a->rm, a->ra, a->index); |
617 | case 3: /* SHA1SU0 */ | 153 | + } |
618 | genfn = NULL; | 154 | + return true; |
619 | - feature = ARM_FEATURE_V8_SHA1; | 155 | +} |
620 | + feature = dc_isar_feature(aa64_sha1, s); | 156 | diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c |
621 | break; | 157 | index XXXXXXX..XXXXXXX 100644 |
622 | case 4: /* SHA256H */ | 158 | --- a/target/arm/vec_helper.c |
623 | genfn = gen_helper_crypto_sha256h; | 159 | +++ b/target/arm/vec_helper.c |
624 | + feature = dc_isar_feature(aa64_sha256, s); | 160 | @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc) |
625 | break; | ||
626 | case 5: /* SHA256H2 */ | ||
627 | genfn = gen_helper_crypto_sha256h2; | ||
628 | + feature = dc_isar_feature(aa64_sha256, s); | ||
629 | break; | ||
630 | case 6: /* SHA256SU1 */ | ||
631 | genfn = gen_helper_crypto_sha256su1; | ||
632 | + feature = dc_isar_feature(aa64_sha256, s); | ||
633 | break; | ||
634 | default: | ||
635 | unallocated_encoding(s); | ||
636 | return; | ||
637 | } | 161 | } |
638 | 162 | clear_tail(d, opr_sz, simd_maxsz(desc)); | |
639 | - if (!arm_dc_feature(s, feature)) { | 163 | } |
640 | + if (!feature) { | 164 | + |
641 | unallocated_encoding(s); | 165 | +void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm, |
642 | return; | 166 | + void *va, uint32_t desc) |
643 | } | 167 | +{ |
644 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_two_reg_sha(DisasContext *s, uint32_t insn) | 168 | + intptr_t i, j, opr_sz = simd_oprsz(desc); |
645 | int rn = extract32(insn, 5, 5); | 169 | + intptr_t index = simd_data(desc); |
646 | int rd = extract32(insn, 0, 5); | 170 | + intptr_t elements = opr_sz / 4; |
647 | CryptoTwoOpFn *genfn; | 171 | + intptr_t eltspersegment = MIN(16 / 4, elements); |
648 | - int feature; | 172 | + float32 *d = vd, *a = va; |
649 | + bool feature; | 173 | + uint32_t *n = vn, *m = vm; |
650 | TCGv_ptr tcg_rd_ptr, tcg_rn_ptr; | 174 | + |
651 | 175 | + for (i = 0; i < elements; i += eltspersegment) { | |
652 | if (size != 0) { | 176 | + uint32_t m_idx = m[i + H4(index)]; |
653 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_two_reg_sha(DisasContext *s, uint32_t insn) | 177 | + |
654 | 178 | + for (j = i; j < i + eltspersegment; j++) { | |
655 | switch (opcode) { | 179 | + d[j] = bfdotadd(a[j], n[j], m_idx); |
656 | case 0: /* SHA1H */ | 180 | + } |
657 | - feature = ARM_FEATURE_V8_SHA1; | 181 | + } |
658 | + feature = dc_isar_feature(aa64_sha1, s); | 182 | + clear_tail(d, opr_sz, simd_maxsz(desc)); |
659 | genfn = gen_helper_crypto_sha1h; | 183 | +} |
660 | break; | ||
661 | case 1: /* SHA1SU1 */ | ||
662 | - feature = ARM_FEATURE_V8_SHA1; | ||
663 | + feature = dc_isar_feature(aa64_sha1, s); | ||
664 | genfn = gen_helper_crypto_sha1su1; | ||
665 | break; | ||
666 | case 2: /* SHA256SU0 */ | ||
667 | - feature = ARM_FEATURE_V8_SHA256; | ||
668 | + feature = dc_isar_feature(aa64_sha256, s); | ||
669 | genfn = gen_helper_crypto_sha256su0; | ||
670 | break; | ||
671 | default: | ||
672 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_two_reg_sha(DisasContext *s, uint32_t insn) | ||
673 | return; | ||
674 | } | ||
675 | |||
676 | - if (!arm_dc_feature(s, feature)) { | ||
677 | + if (!feature) { | ||
678 | unallocated_encoding(s); | ||
679 | return; | ||
680 | } | ||
681 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn) | ||
682 | int rm = extract32(insn, 16, 5); | ||
683 | int rn = extract32(insn, 5, 5); | ||
684 | int rd = extract32(insn, 0, 5); | ||
685 | - int feature; | ||
686 | + bool feature; | ||
687 | CryptoThreeOpFn *genfn; | ||
688 | |||
689 | if (o == 0) { | ||
690 | switch (opcode) { | ||
691 | case 0: /* SHA512H */ | ||
692 | - feature = ARM_FEATURE_V8_SHA512; | ||
693 | + feature = dc_isar_feature(aa64_sha512, s); | ||
694 | genfn = gen_helper_crypto_sha512h; | ||
695 | break; | ||
696 | case 1: /* SHA512H2 */ | ||
697 | - feature = ARM_FEATURE_V8_SHA512; | ||
698 | + feature = dc_isar_feature(aa64_sha512, s); | ||
699 | genfn = gen_helper_crypto_sha512h2; | ||
700 | break; | ||
701 | case 2: /* SHA512SU1 */ | ||
702 | - feature = ARM_FEATURE_V8_SHA512; | ||
703 | + feature = dc_isar_feature(aa64_sha512, s); | ||
704 | genfn = gen_helper_crypto_sha512su1; | ||
705 | break; | ||
706 | case 3: /* RAX1 */ | ||
707 | - feature = ARM_FEATURE_V8_SHA3; | ||
708 | + feature = dc_isar_feature(aa64_sha3, s); | ||
709 | genfn = NULL; | ||
710 | break; | ||
711 | } | ||
712 | } else { | ||
713 | switch (opcode) { | ||
714 | case 0: /* SM3PARTW1 */ | ||
715 | - feature = ARM_FEATURE_V8_SM3; | ||
716 | + feature = dc_isar_feature(aa64_sm3, s); | ||
717 | genfn = gen_helper_crypto_sm3partw1; | ||
718 | break; | ||
719 | case 1: /* SM3PARTW2 */ | ||
720 | - feature = ARM_FEATURE_V8_SM3; | ||
721 | + feature = dc_isar_feature(aa64_sm3, s); | ||
722 | genfn = gen_helper_crypto_sm3partw2; | ||
723 | break; | ||
724 | case 2: /* SM4EKEY */ | ||
725 | - feature = ARM_FEATURE_V8_SM4; | ||
726 | + feature = dc_isar_feature(aa64_sm4, s); | ||
727 | genfn = gen_helper_crypto_sm4ekey; | ||
728 | break; | ||
729 | default: | ||
730 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn) | ||
731 | } | ||
732 | } | ||
733 | |||
734 | - if (!arm_dc_feature(s, feature)) { | ||
735 | + if (!feature) { | ||
736 | unallocated_encoding(s); | ||
737 | return; | ||
738 | } | ||
739 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_two_reg_sha512(DisasContext *s, uint32_t insn) | ||
740 | int rn = extract32(insn, 5, 5); | ||
741 | int rd = extract32(insn, 0, 5); | ||
742 | TCGv_ptr tcg_rd_ptr, tcg_rn_ptr; | ||
743 | - int feature; | ||
744 | + bool feature; | ||
745 | CryptoTwoOpFn *genfn; | ||
746 | |||
747 | switch (opcode) { | ||
748 | case 0: /* SHA512SU0 */ | ||
749 | - feature = ARM_FEATURE_V8_SHA512; | ||
750 | + feature = dc_isar_feature(aa64_sha512, s); | ||
751 | genfn = gen_helper_crypto_sha512su0; | ||
752 | break; | ||
753 | case 1: /* SM4E */ | ||
754 | - feature = ARM_FEATURE_V8_SM4; | ||
755 | + feature = dc_isar_feature(aa64_sm4, s); | ||
756 | genfn = gen_helper_crypto_sm4e; | ||
757 | break; | ||
758 | default: | ||
759 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_two_reg_sha512(DisasContext *s, uint32_t insn) | ||
760 | return; | ||
761 | } | ||
762 | |||
763 | - if (!arm_dc_feature(s, feature)) { | ||
764 | + if (!feature) { | ||
765 | unallocated_encoding(s); | ||
766 | return; | ||
767 | } | ||
768 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_four_reg(DisasContext *s, uint32_t insn) | ||
769 | int ra = extract32(insn, 10, 5); | ||
770 | int rn = extract32(insn, 5, 5); | ||
771 | int rd = extract32(insn, 0, 5); | ||
772 | - int feature; | ||
773 | + bool feature; | ||
774 | |||
775 | switch (op0) { | ||
776 | case 0: /* EOR3 */ | ||
777 | case 1: /* BCAX */ | ||
778 | - feature = ARM_FEATURE_V8_SHA3; | ||
779 | + feature = dc_isar_feature(aa64_sha3, s); | ||
780 | break; | ||
781 | case 2: /* SM3SS1 */ | ||
782 | - feature = ARM_FEATURE_V8_SM3; | ||
783 | + feature = dc_isar_feature(aa64_sm3, s); | ||
784 | break; | ||
785 | default: | ||
786 | unallocated_encoding(s); | ||
787 | return; | ||
788 | } | ||
789 | |||
790 | - if (!arm_dc_feature(s, feature)) { | ||
791 | + if (!feature) { | ||
792 | unallocated_encoding(s); | ||
793 | return; | ||
794 | } | ||
795 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_xar(DisasContext *s, uint32_t insn) | ||
796 | TCGv_i64 tcg_op1, tcg_op2, tcg_res[2]; | ||
797 | int pass; | ||
798 | |||
799 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_SHA3)) { | ||
800 | + if (!dc_isar_feature(aa64_sha3, s)) { | ||
801 | unallocated_encoding(s); | ||
802 | return; | ||
803 | } | ||
804 | @@ -XXX,XX +XXX,XX @@ static void disas_crypto_three_reg_imm2(DisasContext *s, uint32_t insn) | ||
805 | TCGv_ptr tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr; | ||
806 | TCGv_i32 tcg_imm2, tcg_opcode; | ||
807 | |||
808 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_SM3)) { | ||
809 | + if (!dc_isar_feature(aa64_sm3, s)) { | ||
810 | unallocated_encoding(s); | ||
811 | return; | ||
812 | } | ||
813 | @@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase, | ||
814 | ARMCPU *arm_cpu = arm_env_get_cpu(env); | ||
815 | int bound; | ||
816 | |||
817 | + dc->isar = &arm_cpu->isar; | ||
818 | dc->pc = dc->base.pc_first; | ||
819 | dc->condjmp = 0; | ||
820 | |||
821 | diff --git a/target/arm/translate.c b/target/arm/translate.c | ||
822 | index XXXXXXX..XXXXXXX 100644 | ||
823 | --- a/target/arm/translate.c | ||
824 | +++ b/target/arm/translate.c | ||
825 | @@ -XXX,XX +XXX,XX @@ static const uint8_t neon_2rm_sizes[] = { | ||
826 | static int do_v81_helper(DisasContext *s, gen_helper_gvec_3_ptr *fn, | ||
827 | int q, int rd, int rn, int rm) | ||
828 | { | ||
829 | - if (arm_dc_feature(s, ARM_FEATURE_V8_RDM)) { | ||
830 | + if (dc_isar_feature(aa32_rdm, s)) { | ||
831 | int opr_sz = (1 + q) * 8; | ||
832 | tcg_gen_gvec_3_ptr(vfp_reg_offset(1, rd), | ||
833 | vfp_reg_offset(1, rn), | ||
834 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
835 | return 1; | ||
836 | } | ||
837 | if (!u) { /* SHA-1 */ | ||
838 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_SHA1)) { | ||
839 | + if (!dc_isar_feature(aa32_sha1, s)) { | ||
840 | return 1; | ||
841 | } | ||
842 | ptr1 = vfp_reg_ptr(true, rd); | ||
843 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
844 | gen_helper_crypto_sha1_3reg(ptr1, ptr2, ptr3, tmp4); | ||
845 | tcg_temp_free_i32(tmp4); | ||
846 | } else { /* SHA-256 */ | ||
847 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_SHA256) || size == 3) { | ||
848 | + if (!dc_isar_feature(aa32_sha2, s) || size == 3) { | ||
849 | return 1; | ||
850 | } | ||
851 | ptr1 = vfp_reg_ptr(true, rd); | ||
852 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
853 | if (op == 14 && size == 2) { | ||
854 | TCGv_i64 tcg_rn, tcg_rm, tcg_rd; | ||
855 | |||
856 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_PMULL)) { | ||
857 | + if (!dc_isar_feature(aa32_pmull, s)) { | ||
858 | return 1; | ||
859 | } | ||
860 | tcg_rn = tcg_temp_new_i64(); | ||
861 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
862 | { | ||
863 | NeonGenThreeOpEnvFn *fn; | ||
864 | |||
865 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_RDM)) { | ||
866 | + if (!dc_isar_feature(aa32_rdm, s)) { | ||
867 | return 1; | ||
868 | } | ||
869 | if (u && ((rd | rn) & 1)) { | ||
870 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
871 | break; | ||
872 | } | ||
873 | case NEON_2RM_AESE: case NEON_2RM_AESMC: | ||
874 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_AES) | ||
875 | - || ((rm | rd) & 1)) { | ||
876 | + if (!dc_isar_feature(aa32_aes, s) || ((rm | rd) & 1)) { | ||
877 | return 1; | ||
878 | } | ||
879 | ptr1 = vfp_reg_ptr(true, rd); | ||
880 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
881 | tcg_temp_free_i32(tmp3); | ||
882 | break; | ||
883 | case NEON_2RM_SHA1H: | ||
884 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_SHA1) | ||
885 | - || ((rm | rd) & 1)) { | ||
886 | + if (!dc_isar_feature(aa32_sha1, s) || ((rm | rd) & 1)) { | ||
887 | return 1; | ||
888 | } | ||
889 | ptr1 = vfp_reg_ptr(true, rd); | ||
890 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
891 | } | ||
892 | /* bit 6 (q): set -> SHA256SU0, cleared -> SHA1SU1 */ | ||
893 | if (q) { | ||
894 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_SHA256)) { | ||
895 | + if (!dc_isar_feature(aa32_sha2, s)) { | ||
896 | return 1; | ||
897 | } | ||
898 | - } else if (!arm_dc_feature(s, ARM_FEATURE_V8_SHA1)) { | ||
899 | + } else if (!dc_isar_feature(aa32_sha1, s)) { | ||
900 | return 1; | ||
901 | } | ||
902 | ptr1 = vfp_reg_ptr(true, rd); | ||
903 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn) | ||
904 | /* VCMLA -- 1111 110R R.1S .... .... 1000 ...0 .... */ | ||
905 | int size = extract32(insn, 20, 1); | ||
906 | data = extract32(insn, 23, 2); /* rot */ | ||
907 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA) | ||
908 | + if (!dc_isar_feature(aa32_vcma, s) | ||
909 | || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) { | ||
910 | return 1; | ||
911 | } | ||
912 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn) | ||
913 | /* VCADD -- 1111 110R 1.0S .... .... 1000 ...0 .... */ | ||
914 | int size = extract32(insn, 20, 1); | ||
915 | data = extract32(insn, 24, 1); /* rot */ | ||
916 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA) | ||
917 | + if (!dc_isar_feature(aa32_vcma, s) | ||
918 | || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) { | ||
919 | return 1; | ||
920 | } | ||
921 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn) | ||
922 | } else if ((insn & 0xfeb00f00) == 0xfc200d00) { | ||
923 | /* V[US]DOT -- 1111 1100 0.10 .... .... 1101 .Q.U .... */ | ||
924 | bool u = extract32(insn, 4, 1); | ||
925 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_DOTPROD)) { | ||
926 | + if (!dc_isar_feature(aa32_dp, s)) { | ||
927 | return 1; | ||
928 | } | ||
929 | fn_gvec = u ? gen_helper_gvec_udot_b : gen_helper_gvec_sdot_b; | ||
930 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn) | ||
931 | int size = extract32(insn, 23, 1); | ||
932 | int index; | ||
933 | |||
934 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FCMA)) { | ||
935 | + if (!dc_isar_feature(aa32_vcma, s)) { | ||
936 | return 1; | ||
937 | } | ||
938 | if (size == 0) { | ||
939 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn) | ||
940 | } else if ((insn & 0xffb00f00) == 0xfe200d00) { | ||
941 | /* V[US]DOT -- 1111 1110 0.10 .... .... 1101 .Q.U .... */ | ||
942 | int u = extract32(insn, 4, 1); | ||
943 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_DOTPROD)) { | ||
944 | + if (!dc_isar_feature(aa32_dp, s)) { | ||
945 | return 1; | ||
946 | } | ||
947 | fn_gvec = u ? gen_helper_gvec_udot_idx_b : gen_helper_gvec_sdot_idx_b; | ||
948 | @@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn) | ||
949 | * op1 == 3 is UNPREDICTABLE but handle as UNDEFINED. | ||
950 | * Bits 8, 10 and 11 should be zero. | ||
951 | */ | ||
952 | - if (!arm_dc_feature(s, ARM_FEATURE_CRC) || op1 == 0x3 || | ||
953 | - (c & 0xd) != 0) { | ||
954 | + if (!dc_isar_feature(aa32_crc32, s) || op1 == 0x3 || (c & 0xd) != 0) { | ||
955 | goto illegal_op; | ||
956 | } | ||
957 | |||
958 | @@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn) | ||
959 | case 0x28: | ||
960 | case 0x29: | ||
961 | case 0x2a: | ||
962 | - if (!arm_dc_feature(s, ARM_FEATURE_CRC)) { | ||
963 | + if (!dc_isar_feature(aa32_crc32, s)) { | ||
964 | goto illegal_op; | ||
965 | } | ||
966 | break; | ||
967 | @@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs) | ||
968 | CPUARMState *env = cs->env_ptr; | ||
969 | ARMCPU *cpu = arm_env_get_cpu(env); | ||
970 | |||
971 | + dc->isar = &cpu->isar; | ||
972 | dc->pc = dc->base.pc_first; | ||
973 | dc->condjmp = 0; | ||
974 | |||
975 | -- | 184 | -- |
976 | 2.19.1 | 185 | 2.20.1 |
977 | 186 | ||
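
A note on the pattern used throughout the conversion hunks above: instead of
testing a parallel ARM_FEATURE_* flag, the translator now tests a field of the
guest's ID registers, reached through the new dc->isar pointer that the
aarch64_tr_init_disas_context and arm_tr_init_disas_context hunks cache from
the CPU. A minimal sketch of the shape such a test takes, with illustrative
names only (the real helpers live in target/arm/cpu.h and use QEMU's
ARMISARegisters/FIELD_EX64 machinery rather than the open-coded mask below):

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative stand-in for the CPU's snapshot of its ID registers. */
    typedef struct {
        uint64_t id_aa64isar0;
    } IdRegs;

    /* ID_AA64ISAR0_EL1.Atomic is bits [23:20]; nonzero means the LSE
     * atomic instructions are implemented, which is the condition the
     * aa64_atomics checks above correspond to. */
    static inline bool feature_aa64_atomics(const IdRegs *id)
    {
        return ((id->id_aa64isar0 >> 20) & 0xf) != 0;
    }

Because dc->isar points straight at the CPU's ID registers, each
dc_isar_feature() check is just a field extract at translate time, and a CPU
model can advertise a feature purely by setting the corresponding ID register
field.
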
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | This is BFMMLA for both AArch64 AdvSIMD and SVE, | ||
4 | and VMMLA.BF16 for AArch32 NEON. | ||
5 | |||
6 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 7 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
4 | Message-id: 20181011205206.3552-4-richard.henderson@linaro.org | 8 | Message-id: 20210525225817.400336-9-richard.henderson@linaro.org |
5 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
6 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
7 | --- | 10 | --- |
8 | target/arm/translate-a64.c | 28 +++------------------------- | 11 | target/arm/helper.h | 3 +++ |
9 | 1 file changed, 3 insertions(+), 25 deletions(-) | 12 | target/arm/neon-shared.decode | 2 ++ |
13 | target/arm/sve.decode | 6 +++-- | ||
14 | target/arm/translate-a64.c | 10 +++++++++ | ||
15 | target/arm/translate-neon.c | 9 ++++++++ | ||
16 | target/arm/translate-sve.c | 12 ++++++++++ | ||
17 | target/arm/vec_helper.c | 42 ++++++++++++++++++++++++++++++++++- | ||
18 | 7 files changed, 81 insertions(+), 3 deletions(-) | ||
10 | 19 | ||
20 | diff --git a/target/arm/helper.h b/target/arm/helper.h | ||
21 | index XXXXXXX..XXXXXXX 100644 | ||
22 | --- a/target/arm/helper.h | ||
23 | +++ b/target/arm/helper.h | ||
24 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG, | ||
25 | DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG, | ||
26 | void, ptr, ptr, ptr, ptr, i32) | ||
27 | |||
28 | +DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG, | ||
29 | + void, ptr, ptr, ptr, ptr, i32) | ||
30 | + | ||
31 | #ifdef TARGET_AARCH64 | ||
32 | #include "helper-a64.h" | ||
33 | #include "helper-sve.h" | ||
34 | diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode | ||
35 | index XXXXXXX..XXXXXXX 100644 | ||
36 | --- a/target/arm/neon-shared.decode | ||
37 | +++ b/target/arm/neon-shared.decode | ||
38 | @@ -XXX,XX +XXX,XX @@ VUMMLA 1111 1100 0.10 .... .... 1100 .1.1 .... \ | ||
39 | vm=%vm_dp vn=%vn_dp vd=%vd_dp | ||
40 | VUSMMLA 1111 1100 1.10 .... .... 1100 .1.0 .... \ | ||
41 | vm=%vm_dp vn=%vn_dp vd=%vd_dp | ||
42 | +VMMLA_b16 1111 1100 0.00 .... .... 1100 .1.0 .... \ | ||
43 | + vm=%vm_dp vn=%vn_dp vd=%vd_dp | ||
44 | |||
45 | VCMLA_scalar 1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \ | ||
46 | vn=%vn_dp vd=%vd_dp size=1 | ||
47 | diff --git a/target/arm/sve.decode b/target/arm/sve.decode | ||
48 | index XXXXXXX..XXXXXXX 100644 | ||
49 | --- a/target/arm/sve.decode | ||
50 | +++ b/target/arm/sve.decode | ||
51 | @@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz 01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5 ra=%reg_movprfx | ||
52 | USDOT_zzzz 01000100 .. 0 ..... 011 110 ..... ..... @rda_rn_rm | ||
53 | |||
54 | ### SVE2 floating point matrix multiply accumulate | ||
55 | - | ||
56 | -FMMLA 01100100 .. 1 ..... 111001 ..... ..... @rda_rn_rm | ||
57 | +{ | ||
58 | + BFMMLA 01100100 01 1 ..... 111 001 ..... ..... @rda_rn_rm_e0 | ||
59 | + FMMLA 01100100 .. 1 ..... 111 001 ..... ..... @rda_rn_rm | ||
60 | +} | ||
61 | |||
62 | ### SVE2 Memory Gather Load Group | ||
63 | |||
11 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | 64 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c |
12 | index XXXXXXX..XXXXXXX 100644 | 65 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/target/arm/translate-a64.c | 66 | --- a/target/arm/translate-a64.c |
14 | +++ b/target/arm/translate-a64.c | 67 | +++ b/target/arm/translate-a64.c |
15 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn) | 68 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) |
16 | for (xs = 0; xs < selem; xs++) { | 69 | } |
17 | if (replicate) { | 70 | feature = dc_isar_feature(aa64_fcma, s); |
18 | /* Load and replicate to all elements */ | 71 | break; |
19 | - uint64_t mulconst; | 72 | + case 0x1d: /* BFMMLA */ |
20 | TCGv_i64 tcg_tmp = tcg_temp_new_i64(); | 73 | + if (size != MO_16 || !is_q) { |
21 | 74 | + unallocated_encoding(s); | |
22 | tcg_gen_qemu_ld_i64(tcg_tmp, tcg_addr, | 75 | + return; |
23 | get_mem_index(s), s->be_data + scale); | 76 | + } |
24 | - switch (scale) { | 77 | + feature = dc_isar_feature(aa64_bf16, s); |
25 | - case 0: | 78 | + break; |
26 | - mulconst = 0x0101010101010101ULL; | 79 | case 0x1f: /* BFDOT */ |
27 | - break; | 80 | switch (size) { |
28 | - case 1: | 81 | case 1: |
29 | - mulconst = 0x0001000100010001ULL; | 82 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) |
30 | - break; | 83 | } |
31 | - case 2: | 84 | return; |
32 | - mulconst = 0x0000000100000001ULL; | 85 | |
33 | - break; | 86 | + case 0xd: /* BFMMLA */ |
34 | - case 3: | 87 | + gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla); |
35 | - mulconst = 0; | 88 | + return; |
36 | - break; | 89 | case 0xf: /* BFDOT */ |
37 | - default: | 90 | switch (size) { |
38 | - g_assert_not_reached(); | 91 | case 1: |
39 | - } | 92 | diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c |
40 | - if (mulconst) { | 93 | index XXXXXXX..XXXXXXX 100644 |
41 | - tcg_gen_muli_i64(tcg_tmp, tcg_tmp, mulconst); | 94 | --- a/target/arm/translate-neon.c |
42 | - } | 95 | +++ b/target/arm/translate-neon.c |
43 | - write_vec_element(s, tcg_tmp, rt, 0, MO_64); | 96 | @@ -XXX,XX +XXX,XX @@ static bool trans_VUSMMLA(DisasContext *s, arg_VUSMMLA *a) |
44 | - if (is_q) { | 97 | return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0, |
45 | - write_vec_element(s, tcg_tmp, rt, 1, MO_64); | 98 | gen_helper_gvec_usmmla_b); |
46 | - } | 99 | } |
47 | + tcg_gen_gvec_dup_i64(scale, vec_full_reg_offset(s, rt), | 100 | + |
48 | + (is_q + 1) * 8, vec_full_reg_size(s), | 101 | +static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a) |
49 | + tcg_tmp); | 102 | +{ |
50 | tcg_temp_free_i64(tcg_tmp); | 103 | + if (!dc_isar_feature(aa32_bf16, s)) { |
51 | - clear_vec_high(s, is_q, rt); | 104 | + return false; |
52 | } else { | 105 | + } |
53 | /* Load/store one element per register */ | 106 | + return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0, |
54 | if (is_load) { | 107 | + gen_helper_gvec_bfmmla); |
108 | +} | ||
109 | diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c | ||
110 | index XXXXXXX..XXXXXXX 100644 | ||
111 | --- a/target/arm/translate-sve.c | ||
112 | +++ b/target/arm/translate-sve.c | ||
113 | @@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a) | ||
114 | } | ||
115 | return true; | ||
116 | } | ||
117 | + | ||
118 | +static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a) | ||
119 | +{ | ||
120 | + if (!dc_isar_feature(aa64_sve_bf16, s)) { | ||
121 | + return false; | ||
122 | + } | ||
123 | + if (sve_access_check(s)) { | ||
124 | + gen_gvec_ool_zzzz(s, gen_helper_gvec_bfmmla, | ||
125 | + a->rd, a->rn, a->rm, a->ra, 0); | ||
126 | + } | ||
127 | + return true; | ||
128 | +} | ||
129 | diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c | ||
130 | index XXXXXXX..XXXXXXX 100644 | ||
131 | --- a/target/arm/vec_helper.c | ||
132 | +++ b/target/arm/vec_helper.c | ||
133 | @@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc, | ||
134 | * Process the entire segment at once, writing back the | ||
135 | * results only after we've consumed all of the inputs. | ||
136 | * | ||
137 | - * Key to indicies by column: | ||
138 | + * Key to indices by column: | ||
139 | * i j i j | ||
140 | */ | ||
141 | sum0 = a[H4(0 + 0)]; | ||
142 | @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm, | ||
143 | } | ||
144 | clear_tail(d, opr_sz, simd_maxsz(desc)); | ||
145 | } | ||
146 | + | ||
147 | +void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc) | ||
148 | +{ | ||
149 | + intptr_t s, opr_sz = simd_oprsz(desc); | ||
150 | + float32 *d = vd, *a = va; | ||
151 | + uint32_t *n = vn, *m = vm; | ||
152 | + | ||
153 | + for (s = 0; s < opr_sz / 4; s += 4) { | ||
154 | + float32 sum00, sum01, sum10, sum11; | ||
155 | + | ||
156 | + /* | ||
157 | + * Process the entire segment at once, writing back the | ||
158 | + * results only after we've consumed all of the inputs. | ||
159 | + * | ||
160 | + * Key to indicies by column: | ||
161 | + * i j i k j k | ||
162 | + */ | ||
163 | + sum00 = a[s + H4(0 + 0)]; | ||
164 | + sum00 = bfdotadd(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)]); | ||
165 | + sum00 = bfdotadd(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)]); | ||
166 | + | ||
167 | + sum01 = a[s + H4(0 + 1)]; | ||
168 | + sum01 = bfdotadd(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)]); | ||
169 | + sum01 = bfdotadd(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)]); | ||
170 | + | ||
171 | + sum10 = a[s + H4(2 + 0)]; | ||
172 | + sum10 = bfdotadd(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)]); | ||
173 | + sum10 = bfdotadd(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)]); | ||
174 | + | ||
175 | + sum11 = a[s + H4(2 + 1)]; | ||
176 | + sum11 = bfdotadd(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)]); | ||
177 | + sum11 = bfdotadd(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)]); | ||
178 | + | ||
179 | + d[s + H4(0 + 0)] = sum00; | ||
180 | + d[s + H4(0 + 1)] = sum01; | ||
181 | + d[s + H4(2 + 0)] = sum10; | ||
182 | + d[s + H4(2 + 1)] = sum11; | ||
183 | + } | ||
184 | + clear_tail(d, opr_sz, simd_maxsz(desc)); | ||
185 | +} | ||
55 | -- | 186 | -- |
56 | 2.19.1 | 187 | 2.20.1 |
57 | 188 | ||
58 | 189 | diff view generated by jsdifflib |
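
For reference, the new gvec_bfmmla helper above works segment by segment: each
16-byte segment of the destination is a 2x2 matrix of float32 accumulators,
updated with the product of a 2x4 and a 4x2 matrix of bfloat16 elements. Below
is a plain-C sketch of one segment under simplifying assumptions: ordinary host
float arithmetic stands in for the helper's bfdotadd()/float32 routines, and a
little-endian element layout stands in for the H4() host-endian macros, so it
illustrates the data flow rather than the exact rounding behaviour.

    #include <stdint.h>
    #include <string.h>

    /* Widen a bfloat16 bit pattern to float by padding the low 16 bits. */
    static float bf16_to_float(uint16_t h)
    {
        uint32_t bits = (uint32_t)h << 16;
        float f;
        memcpy(&f, &bits, sizeof(f));
        return f;
    }

    /* Sum of the two bfloat16 products packed in 32-bit words x and y. */
    static float bfdot(uint32_t x, uint32_t y)
    {
        return bf16_to_float((uint16_t)x) * bf16_to_float((uint16_t)y)
             + bf16_to_float((uint16_t)(x >> 16)) * bf16_to_float((uint16_t)(y >> 16));
    }

    /* One 16-byte BFMMLA segment: d = a + n(2x4) * m(4x2), using the same
     * row/column pairing as the sum00..sum11 computation in the hunk above. */
    static void bfmmla_segment(float d[4], const float a[4],
                               const uint32_t n[4], const uint32_t m[4])
    {
        for (int row = 0; row < 2; row++) {
            for (int col = 0; col < 2; col++) {
                d[2 * row + col] = a[2 * row + col]
                                 + bfdot(n[2 * row + 0], m[2 * col + 0])
                                 + bfdot(n[2 * row + 1], m[2 * col + 1]);
            }
        }
    }

The helper itself then steps through opr_sz in 16-byte chunks and clears the
tail, as shown in the vec_helper.c hunk.
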
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | Move ssra_op and usra_op expanders from translate-a64.c. | 3 | This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE, |
4 | and VFMA{B,T}.BF16 for AArch32 NEON. | ||
4 | 5 | ||
6 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 7 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
6 | Message-id: 20181011205206.3552-14-richard.henderson@linaro.org | 8 | Message-id: 20210525225817.400336-10-richard.henderson@linaro.org |
7 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
8 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
9 | --- | 10 | --- |
10 | target/arm/translate.h | 2 + | 11 | target/arm/helper.h | 3 +++ |
11 | target/arm/translate-a64.c | 106 ---------------------------- | 12 | target/arm/neon-shared.decode | 3 +++ |
12 | target/arm/translate.c | 139 ++++++++++++++++++++++++++++++++++--- | 13 | target/arm/sve.decode | 3 +++ |
13 | 3 files changed, 130 insertions(+), 117 deletions(-) | 14 | target/arm/translate-a64.c | 13 +++++++++---- |
15 | target/arm/translate-neon.c | 9 +++++++++ | ||
16 | target/arm/translate-sve.c | 30 ++++++++++++++++++++++++++++++ | ||
17 | target/arm/vec_helper.c | 16 ++++++++++++++++ | ||
18 | 7 files changed, 73 insertions(+), 4 deletions(-) | ||
14 | 19 | ||
15 | diff --git a/target/arm/translate.h b/target/arm/translate.h | 20 | diff --git a/target/arm/helper.h b/target/arm/helper.h |
16 | index XXXXXXX..XXXXXXX 100644 | 21 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/target/arm/translate.h | 22 | --- a/target/arm/helper.h |
18 | +++ b/target/arm/translate.h | 23 | +++ b/target/arm/helper.h |
19 | @@ -XXX,XX +XXX,XX @@ static inline TCGv_i32 get_ahp_flag(void) | 24 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG, |
20 | extern const GVecGen3 bsl_op; | 25 | DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG, |
21 | extern const GVecGen3 bit_op; | 26 | void, ptr, ptr, ptr, ptr, i32) |
22 | extern const GVecGen3 bif_op; | 27 | |
23 | +extern const GVecGen2i ssra_op[4]; | 28 | +DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG, |
24 | +extern const GVecGen2i usra_op[4]; | 29 | + void, ptr, ptr, ptr, ptr, ptr, i32) |
25 | 30 | + | |
26 | /* | 31 | #ifdef TARGET_AARCH64 |
27 | * Forward to the isar_feature_* tests given a DisasContext pointer. | 32 | #include "helper-a64.h" |
33 | #include "helper-sve.h" | ||
34 | diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode | ||
35 | index XXXXXXX..XXXXXXX 100644 | ||
36 | --- a/target/arm/neon-shared.decode | ||
37 | +++ b/target/arm/neon-shared.decode | ||
38 | @@ -XXX,XX +XXX,XX @@ VUSMMLA 1111 1100 1.10 .... .... 1100 .1.0 .... \ | ||
39 | VMMLA_b16 1111 1100 0.00 .... .... 1100 .1.0 .... \ | ||
40 | vm=%vm_dp vn=%vn_dp vd=%vd_dp | ||
41 | |||
42 | +VFMA_b16 1111 110 0 0.11 .... .... 1000 . q:1 . 1 .... \ | ||
43 | + vm=%vm_dp vn=%vn_dp vd=%vd_dp | ||
44 | + | ||
45 | VCMLA_scalar 1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \ | ||
46 | vn=%vn_dp vd=%vd_dp size=1 | ||
47 | VCMLA_scalar 1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \ | ||
48 | diff --git a/target/arm/sve.decode b/target/arm/sve.decode | ||
49 | index XXXXXXX..XXXXXXX 100644 | ||
50 | --- a/target/arm/sve.decode | ||
51 | +++ b/target/arm/sve.decode | ||
52 | @@ -XXX,XX +XXX,XX @@ FMLALT_zzzw 01100100 10 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0 | ||
53 | FMLSLB_zzzw 01100100 10 1 ..... 10 1 00 0 ..... ..... @rda_rn_rm_e0 | ||
54 | FMLSLT_zzzw 01100100 10 1 ..... 10 1 00 1 ..... ..... @rda_rn_rm_e0 | ||
55 | |||
56 | +BFMLALB_zzzw 01100100 11 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 | ||
57 | +BFMLALT_zzzw 01100100 11 1 ..... 10 0 00 1 ..... ..... @rda_rn_rm_e0 | ||
58 | + | ||
59 | ### SVE2 floating-point bfloat16 dot-product | ||
60 | BFDOT_zzzz 01100100 01 1 ..... 10 0 00 0 ..... ..... @rda_rn_rm_e0 | ||
61 | |||
28 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | 62 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c |
29 | index XXXXXXX..XXXXXXX 100644 | 63 | index XXXXXXX..XXXXXXX 100644 |
30 | --- a/target/arm/translate-a64.c | 64 | --- a/target/arm/translate-a64.c |
31 | +++ b/target/arm/translate-a64.c | 65 | +++ b/target/arm/translate-a64.c |
32 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) | 66 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) |
67 | } | ||
68 | feature = dc_isar_feature(aa64_bf16, s); | ||
69 | break; | ||
70 | - case 0x1f: /* BFDOT */ | ||
71 | + case 0x1f: | ||
72 | switch (size) { | ||
73 | - case 1: | ||
74 | + case 1: /* BFDOT */ | ||
75 | + case 3: /* BFMLAL{B,T} */ | ||
76 | feature = dc_isar_feature(aa64_bf16, s); | ||
77 | break; | ||
78 | default: | ||
79 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) | ||
80 | case 0xd: /* BFMMLA */ | ||
81 | gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla); | ||
82 | return; | ||
83 | - case 0xf: /* BFDOT */ | ||
84 | + case 0xf: | ||
85 | switch (size) { | ||
86 | - case 1: | ||
87 | + case 1: /* BFDOT */ | ||
88 | gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot); | ||
89 | break; | ||
90 | + case 3: /* BFMLAL{B,T} */ | ||
91 | + gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, false, is_q, | ||
92 | + gen_helper_gvec_bfmlal); | ||
93 | + break; | ||
94 | default: | ||
95 | g_assert_not_reached(); | ||
96 | } | ||
97 | diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c | ||
98 | index XXXXXXX..XXXXXXX 100644 | ||
99 | --- a/target/arm/translate-neon.c | ||
100 | +++ b/target/arm/translate-neon.c | ||
101 | @@ -XXX,XX +XXX,XX @@ static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a) | ||
102 | return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0, | ||
103 | gen_helper_gvec_bfmmla); | ||
104 | } | ||
105 | + | ||
106 | +static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a) | ||
107 | +{ | ||
108 | + if (!dc_isar_feature(aa32_bf16, s)) { | ||
109 | + return false; | ||
110 | + } | ||
111 | + return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD, | ||
112 | + gen_helper_gvec_bfmlal); | ||
113 | +} | ||
114 | diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c | ||
115 | index XXXXXXX..XXXXXXX 100644 | ||
116 | --- a/target/arm/translate-sve.c | ||
117 | +++ b/target/arm/translate-sve.c | ||
118 | @@ -XXX,XX +XXX,XX @@ static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a) | ||
33 | } | 119 | } |
120 | return true; | ||
34 | } | 121 | } |
35 | 122 | + | |
36 | -static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | 123 | +static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel) |
37 | -{ | ||
38 | - tcg_gen_vec_sar8i_i64(a, a, shift); | ||
39 | - tcg_gen_vec_add8_i64(d, d, a); | ||
40 | -} | ||
41 | - | ||
42 | -static void gen_ssra16_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
43 | -{ | ||
44 | - tcg_gen_vec_sar16i_i64(a, a, shift); | ||
45 | - tcg_gen_vec_add16_i64(d, d, a); | ||
46 | -} | ||
47 | - | ||
48 | -static void gen_ssra32_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) | ||
49 | -{ | ||
50 | - tcg_gen_sari_i32(a, a, shift); | ||
51 | - tcg_gen_add_i32(d, d, a); | ||
52 | -} | ||
53 | - | ||
54 | -static void gen_ssra64_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
55 | -{ | ||
56 | - tcg_gen_sari_i64(a, a, shift); | ||
57 | - tcg_gen_add_i64(d, d, a); | ||
58 | -} | ||
59 | - | ||
60 | -static void gen_ssra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) | ||
61 | -{ | ||
62 | - tcg_gen_sari_vec(vece, a, a, sh); | ||
63 | - tcg_gen_add_vec(vece, d, d, a); | ||
64 | -} | ||
65 | - | ||
66 | -static void gen_usra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
67 | -{ | ||
68 | - tcg_gen_vec_shr8i_i64(a, a, shift); | ||
69 | - tcg_gen_vec_add8_i64(d, d, a); | ||
70 | -} | ||
71 | - | ||
72 | -static void gen_usra16_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
73 | -{ | ||
74 | - tcg_gen_vec_shr16i_i64(a, a, shift); | ||
75 | - tcg_gen_vec_add16_i64(d, d, a); | ||
76 | -} | ||
77 | - | ||
78 | -static void gen_usra32_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) | ||
79 | -{ | ||
80 | - tcg_gen_shri_i32(a, a, shift); | ||
81 | - tcg_gen_add_i32(d, d, a); | ||
82 | -} | ||
83 | - | ||
84 | -static void gen_usra64_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
85 | -{ | ||
86 | - tcg_gen_shri_i64(a, a, shift); | ||
87 | - tcg_gen_add_i64(d, d, a); | ||
88 | -} | ||
89 | - | ||
90 | -static void gen_usra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) | ||
91 | -{ | ||
92 | - tcg_gen_shri_vec(vece, a, a, sh); | ||
93 | - tcg_gen_add_vec(vece, d, d, a); | ||
94 | -} | ||
95 | - | ||
96 | static void gen_shr8_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
97 | { | ||
98 | uint64_t mask = dup_const(MO_8, 0xff >> shift); | ||
99 | @@ -XXX,XX +XXX,XX @@ static void gen_shr_ins_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) | ||
100 | static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, | ||
101 | int immh, int immb, int opcode, int rn, int rd) | ||
102 | { | ||
103 | - static const GVecGen2i ssra_op[4] = { | ||
104 | - { .fni8 = gen_ssra8_i64, | ||
105 | - .fniv = gen_ssra_vec, | ||
106 | - .load_dest = true, | ||
107 | - .opc = INDEX_op_sari_vec, | ||
108 | - .vece = MO_8 }, | ||
109 | - { .fni8 = gen_ssra16_i64, | ||
110 | - .fniv = gen_ssra_vec, | ||
111 | - .load_dest = true, | ||
112 | - .opc = INDEX_op_sari_vec, | ||
113 | - .vece = MO_16 }, | ||
114 | - { .fni4 = gen_ssra32_i32, | ||
115 | - .fniv = gen_ssra_vec, | ||
116 | - .load_dest = true, | ||
117 | - .opc = INDEX_op_sari_vec, | ||
118 | - .vece = MO_32 }, | ||
119 | - { .fni8 = gen_ssra64_i64, | ||
120 | - .fniv = gen_ssra_vec, | ||
121 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
122 | - .load_dest = true, | ||
123 | - .opc = INDEX_op_sari_vec, | ||
124 | - .vece = MO_64 }, | ||
125 | - }; | ||
126 | - static const GVecGen2i usra_op[4] = { | ||
127 | - { .fni8 = gen_usra8_i64, | ||
128 | - .fniv = gen_usra_vec, | ||
129 | - .load_dest = true, | ||
130 | - .opc = INDEX_op_shri_vec, | ||
131 | - .vece = MO_8, }, | ||
132 | - { .fni8 = gen_usra16_i64, | ||
133 | - .fniv = gen_usra_vec, | ||
134 | - .load_dest = true, | ||
135 | - .opc = INDEX_op_shri_vec, | ||
136 | - .vece = MO_16, }, | ||
137 | - { .fni4 = gen_usra32_i32, | ||
138 | - .fniv = gen_usra_vec, | ||
139 | - .load_dest = true, | ||
140 | - .opc = INDEX_op_shri_vec, | ||
141 | - .vece = MO_32, }, | ||
142 | - { .fni8 = gen_usra64_i64, | ||
143 | - .fniv = gen_usra_vec, | ||
144 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
145 | - .load_dest = true, | ||
146 | - .opc = INDEX_op_shri_vec, | ||
147 | - .vece = MO_64, }, | ||
148 | - }; | ||
149 | static const GVecGen2i sri_op[4] = { | ||
150 | { .fni8 = gen_shr8_ins_i64, | ||
151 | .fniv = gen_shr_ins_vec, | ||
152 | diff --git a/target/arm/translate.c b/target/arm/translate.c | ||
153 | index XXXXXXX..XXXXXXX 100644 | ||
154 | --- a/target/arm/translate.c | ||
155 | +++ b/target/arm/translate.c | ||
156 | @@ -XXX,XX +XXX,XX @@ const GVecGen3 bif_op = { | ||
157 | .load_dest = true | ||
158 | }; | ||
159 | |||
160 | +static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
161 | +{ | 124 | +{ |
162 | + tcg_gen_vec_sar8i_i64(a, a, shift); | 125 | + if (!dc_isar_feature(aa64_sve_bf16, s)) { |
163 | + tcg_gen_vec_add8_i64(d, d, a); | 126 | + return false; |
127 | + } | ||
128 | + if (sve_access_check(s)) { | ||
129 | + TCGv_ptr status = fpstatus_ptr(FPST_FPCR); | ||
130 | + unsigned vsz = vec_full_reg_size(s); | ||
131 | + | ||
132 | + tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd), | ||
133 | + vec_full_reg_offset(s, a->rn), | ||
134 | + vec_full_reg_offset(s, a->rm), | ||
135 | + vec_full_reg_offset(s, a->ra), | ||
136 | + status, vsz, vsz, sel, | ||
137 | + gen_helper_gvec_bfmlal); | ||
138 | + tcg_temp_free_ptr(status); | ||
139 | + } | ||
140 | + return true; | ||
164 | +} | 141 | +} |
165 | + | 142 | + |
166 | +static void gen_ssra16_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | 143 | +static bool trans_BFMLALB_zzzw(DisasContext *s, arg_rrrr_esz *a) |
167 | +{ | 144 | +{ |
168 | + tcg_gen_vec_sar16i_i64(a, a, shift); | 145 | + return do_BFMLAL_zzzw(s, a, false); |
169 | + tcg_gen_vec_add16_i64(d, d, a); | ||
170 | +} | 146 | +} |
171 | + | 147 | + |
172 | +static void gen_ssra32_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) | 148 | +static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a) |
173 | +{ | 149 | +{ |
174 | + tcg_gen_sari_i32(a, a, shift); | 150 | + return do_BFMLAL_zzzw(s, a, true); |
175 | + tcg_gen_add_i32(d, d, a); | ||
176 | +} | 151 | +} |
152 | diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c | ||
153 | index XXXXXXX..XXXXXXX 100644 | ||
154 | --- a/target/arm/vec_helper.c | ||
155 | +++ b/target/arm/vec_helper.c | ||
156 | @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc) | ||
157 | } | ||
158 | clear_tail(d, opr_sz, simd_maxsz(desc)); | ||
159 | } | ||
177 | + | 160 | + |
178 | +static void gen_ssra64_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | 161 | +void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va, |
162 | + void *stat, uint32_t desc) | ||
179 | +{ | 163 | +{ |
180 | + tcg_gen_sari_i64(a, a, shift); | 164 | + intptr_t i, opr_sz = simd_oprsz(desc); |
181 | + tcg_gen_add_i64(d, d, a); | 165 | + intptr_t sel = simd_data(desc); |
166 | + float32 *d = vd, *a = va; | ||
167 | + bfloat16 *n = vn, *m = vm; | ||
168 | + | ||
169 | + for (i = 0; i < opr_sz / 4; ++i) { | ||
170 | + float32 nn = n[H2(i * 2 + sel)] << 16; | ||
171 | + float32 mm = m[H2(i * 2 + sel)] << 16; | ||
172 | + d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], 0, stat); | ||
173 | + } | ||
174 | + clear_tail(d, opr_sz, simd_maxsz(desc)); | ||
182 | +} | 175 | +} |
183 | + | ||
184 | +static void gen_ssra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) | ||
185 | +{ | ||
186 | + tcg_gen_sari_vec(vece, a, a, sh); | ||
187 | + tcg_gen_add_vec(vece, d, d, a); | ||
188 | +} | ||
189 | + | ||
190 | +const GVecGen2i ssra_op[4] = { | ||
191 | + { .fni8 = gen_ssra8_i64, | ||
192 | + .fniv = gen_ssra_vec, | ||
193 | + .load_dest = true, | ||
194 | + .opc = INDEX_op_sari_vec, | ||
195 | + .vece = MO_8 }, | ||
196 | + { .fni8 = gen_ssra16_i64, | ||
197 | + .fniv = gen_ssra_vec, | ||
198 | + .load_dest = true, | ||
199 | + .opc = INDEX_op_sari_vec, | ||
200 | + .vece = MO_16 }, | ||
201 | + { .fni4 = gen_ssra32_i32, | ||
202 | + .fniv = gen_ssra_vec, | ||
203 | + .load_dest = true, | ||
204 | + .opc = INDEX_op_sari_vec, | ||
205 | + .vece = MO_32 }, | ||
206 | + { .fni8 = gen_ssra64_i64, | ||
207 | + .fniv = gen_ssra_vec, | ||
208 | + .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
209 | + .load_dest = true, | ||
210 | + .opc = INDEX_op_sari_vec, | ||
211 | + .vece = MO_64 }, | ||
212 | +}; | ||
213 | + | ||
214 | +static void gen_usra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
215 | +{ | ||
216 | + tcg_gen_vec_shr8i_i64(a, a, shift); | ||
217 | + tcg_gen_vec_add8_i64(d, d, a); | ||
218 | +} | ||
219 | + | ||
220 | +static void gen_usra16_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
221 | +{ | ||
222 | + tcg_gen_vec_shr16i_i64(a, a, shift); | ||
223 | + tcg_gen_vec_add16_i64(d, d, a); | ||
224 | +} | ||
225 | + | ||
226 | +static void gen_usra32_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) | ||
227 | +{ | ||
228 | + tcg_gen_shri_i32(a, a, shift); | ||
229 | + tcg_gen_add_i32(d, d, a); | ||
230 | +} | ||
231 | + | ||
232 | +static void gen_usra64_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
233 | +{ | ||
234 | + tcg_gen_shri_i64(a, a, shift); | ||
235 | + tcg_gen_add_i64(d, d, a); | ||
236 | +} | ||
237 | + | ||
238 | +static void gen_usra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) | ||
239 | +{ | ||
240 | + tcg_gen_shri_vec(vece, a, a, sh); | ||
241 | + tcg_gen_add_vec(vece, d, d, a); | ||
242 | +} | ||
243 | + | ||
244 | +const GVecGen2i usra_op[4] = { | ||
245 | + { .fni8 = gen_usra8_i64, | ||
246 | + .fniv = gen_usra_vec, | ||
247 | + .load_dest = true, | ||
248 | + .opc = INDEX_op_shri_vec, | ||
249 | + .vece = MO_8, }, | ||
250 | + { .fni8 = gen_usra16_i64, | ||
251 | + .fniv = gen_usra_vec, | ||
252 | + .load_dest = true, | ||
253 | + .opc = INDEX_op_shri_vec, | ||
254 | + .vece = MO_16, }, | ||
255 | + { .fni4 = gen_usra32_i32, | ||
256 | + .fniv = gen_usra_vec, | ||
257 | + .load_dest = true, | ||
258 | + .opc = INDEX_op_shri_vec, | ||
259 | + .vece = MO_32, }, | ||
260 | + { .fni8 = gen_usra64_i64, | ||
261 | + .fniv = gen_usra_vec, | ||
262 | + .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
263 | + .load_dest = true, | ||
264 | + .opc = INDEX_op_shri_vec, | ||
265 | + .vece = MO_64, }, | ||
266 | +}; | ||
267 | |||
268 | /* Translate a NEON data processing instruction. Return nonzero if the | ||
269 | instruction is invalid. | ||
270 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
271 | } | ||
272 | return 0; | ||
273 | |||
274 | + case 1: /* VSRA */ | ||
275 | + /* Right shift comes here negative. */ | ||
276 | + shift = -shift; | ||
277 | + /* Shifts larger than the element size are architecturally | ||
278 | + * valid. Unsigned results in all zeros; signed results | ||
279 | + * in all sign bits. | ||
280 | + */ | ||
281 | + if (!u) { | ||
282 | + tcg_gen_gvec_2i(rd_ofs, rm_ofs, vec_size, vec_size, | ||
283 | + MIN(shift, (8 << size) - 1), | ||
284 | + &ssra_op[size]); | ||
285 | + } else if (shift >= 8 << size) { | ||
286 | + /* rd += 0 */ | ||
287 | + } else { | ||
288 | + tcg_gen_gvec_2i(rd_ofs, rm_ofs, vec_size, vec_size, | ||
289 | + shift, &usra_op[size]); | ||
290 | + } | ||
291 | + return 0; | ||
292 | + | ||
293 | case 5: /* VSHL, VSLI */ | ||
294 | if (!u) { /* VSHL */ | ||
295 | /* Shifts larger than the element size are | ||
296 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
297 | neon_load_reg64(cpu_V0, rm + pass); | ||
298 | tcg_gen_movi_i64(cpu_V1, imm); | ||
299 | switch (op) { | ||
300 | - case 1: /* VSRA */ | ||
301 | - if (u) | ||
302 | - gen_helper_neon_shl_u64(cpu_V0, cpu_V0, cpu_V1); | ||
303 | - else | ||
304 | - gen_helper_neon_shl_s64(cpu_V0, cpu_V0, cpu_V1); | ||
305 | - break; | ||
306 | case 2: /* VRSHR */ | ||
307 | case 3: /* VRSRA */ | ||
308 | if (u) | ||
309 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
310 | default: | ||
311 | g_assert_not_reached(); | ||
312 | } | ||
313 | - if (op == 1 || op == 3) { | ||
314 | + if (op == 3) { | ||
315 | /* Accumulate. */ | ||
316 | neon_load_reg64(cpu_V1, rd + pass); | ||
317 | tcg_gen_add_i64(cpu_V0, cpu_V0, cpu_V1); | ||
318 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
319 | tmp2 = tcg_temp_new_i32(); | ||
320 | tcg_gen_movi_i32(tmp2, imm); | ||
321 | switch (op) { | ||
322 | - case 1: /* VSRA */ | ||
323 | - GEN_NEON_INTEGER_OP(shl); | ||
324 | - break; | ||
325 | case 2: /* VRSHR */ | ||
326 | case 3: /* VRSRA */ | ||
327 | GEN_NEON_INTEGER_OP(rshl); | ||
328 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
329 | } | ||
330 | tcg_temp_free_i32(tmp2); | ||
331 | |||
332 | - if (op == 1 || op == 3) { | ||
333 | + if (op == 3) { | ||
334 | /* Accumulate. */ | ||
335 | tmp2 = neon_load_reg(rd, pass); | ||
336 | gen_neon_add(size, tmp, tmp2); | ||
337 | -- | 176 | -- |
338 | 2.19.1 | 177 | 2.20.1 |
339 | 178 | ||
340 | 179 | diff view generated by jsdifflib |
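
Similarly, the gvec_bfmlal helper added above computes the widening
multiply-accumulate one float32 lane at a time: the B (even) or T (odd)
bfloat16 element is selected from each 32-bit source lane, widened to float32
by shifting it into the top half of the word, and accumulated. A rough
per-lane equivalent, assuming a little-endian lane layout and plain C float
arithmetic in place of float32_muladd with the guest FP status:

    #include <stdint.h>
    #include <string.h>

    /* bfloat16 -> float by placing the 16 bits in the high half. */
    static float bf16_widen(uint16_t h)
    {
        uint32_t bits = (uint32_t)h << 16;
        float f;
        memcpy(&f, &bits, sizeof(f));
        return f;
    }

    /* One lane of BFMLALB (sel = 0) or BFMLALT (sel = 1): pick the even or
     * odd bfloat16 from each source lane, widen, accumulate into float32. */
    static float bfmlal_lane(float acc, uint32_t n_lane, uint32_t m_lane, int sel)
    {
        float nn = bf16_widen((uint16_t)(n_lane >> (16 * sel)));
        float mm = bf16_widen((uint16_t)(m_lane >> (16 * sel)));
        return acc + nn * mm;
    }

This is only a sketch of the semantics; the helper in the hunk above indexes
the bfloat16 elements through H2() and performs the accumulation with
float32_muladd so that flushing and rounding follow the guest FP state.
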
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | Move expanders for VBSL, VBIT, and VBIF from translate-a64.c. | 3 | This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE, |
4 | and VFMA{B,T}.BF16 for AArch32 NEON. | ||
4 | 5 | ||
6 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 7 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
6 | Message-id: 20181011205206.3552-9-richard.henderson@linaro.org | 8 | Message-id: 20210525225817.400336-11-richard.henderson@linaro.org |
7 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
8 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
9 | --- | 10 | --- |
10 | target/arm/translate.h | 6 ++ | 11 | target/arm/helper.h | 2 ++ |
11 | target/arm/translate-a64.c | 61 -------------- | 12 | target/arm/neon-shared.decode | 2 ++ |
12 | target/arm/translate.c | 162 +++++++++++++++++++++++++++---------- | 13 | target/arm/sve.decode | 2 ++ |
13 | 3 files changed, 124 insertions(+), 105 deletions(-) | 14 | target/arm/translate-a64.c | 15 ++++++++++++++- |
15 | target/arm/translate-neon.c | 10 ++++++++++ | ||
16 | target/arm/translate-sve.c | 30 ++++++++++++++++++++++++++++++ | ||
17 | target/arm/vec_helper.c | 22 ++++++++++++++++++++++ | ||
18 | 7 files changed, 82 insertions(+), 1 deletion(-) | ||
14 | 19 | ||
15 | diff --git a/target/arm/translate.h b/target/arm/translate.h | 20 | diff --git a/target/arm/helper.h b/target/arm/helper.h |
16 | index XXXXXXX..XXXXXXX 100644 | 21 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/target/arm/translate.h | 22 | --- a/target/arm/helper.h |
18 | +++ b/target/arm/translate.h | 23 | +++ b/target/arm/helper.h |
19 | @@ -XXX,XX +XXX,XX @@ static inline TCGv_i32 get_ahp_flag(void) | 24 | @@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG, |
20 | return ret; | 25 | |
21 | } | 26 | DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG, |
22 | 27 | void, ptr, ptr, ptr, ptr, ptr, i32) | |
23 | + | 28 | +DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG, |
24 | +/* Vector operations shared between ARM and AArch64. */ | 29 | + void, ptr, ptr, ptr, ptr, ptr, i32) |
25 | +extern const GVecGen3 bsl_op; | 30 | |
26 | +extern const GVecGen3 bit_op; | 31 | #ifdef TARGET_AARCH64 |
27 | +extern const GVecGen3 bif_op; | 32 | #include "helper-a64.h" |
28 | + | 33 | diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode |
29 | /* | 34 | index XXXXXXX..XXXXXXX 100644 |
30 | * Forward to the isar_feature_* tests given a DisasContext pointer. | 35 | --- a/target/arm/neon-shared.decode |
31 | */ | 36 | +++ b/target/arm/neon-shared.decode |
37 | @@ -XXX,XX +XXX,XX @@ VFML_scalar 1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \ | ||
38 | rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0 | ||
39 | VFML_scalar 1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \ | ||
40 | index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1 | ||
41 | +VFMA_b16_scal 1111 1110 0.11 .... .... 1000 . q:1 . 1 . vm:3 \ | ||
42 | + index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp | ||
43 | diff --git a/target/arm/sve.decode b/target/arm/sve.decode | ||
44 | index XXXXXXX..XXXXXXX 100644 | ||
45 | --- a/target/arm/sve.decode | ||
46 | +++ b/target/arm/sve.decode | ||
47 | @@ -XXX,XX +XXX,XX @@ FMLALB_zzxw 01100100 10 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 | ||
48 | FMLALT_zzxw 01100100 10 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 | ||
49 | FMLSLB_zzxw 01100100 10 1 ..... 0110.0 ..... ..... @rrxr_3a esz=2 | ||
50 | FMLSLT_zzxw 01100100 10 1 ..... 0110.1 ..... ..... @rrxr_3a esz=2 | ||
51 | +BFMLALB_zzxw 01100100 11 1 ..... 0100.0 ..... ..... @rrxr_3a esz=2 | ||
52 | +BFMLALT_zzxw 01100100 11 1 ..... 0100.1 ..... ..... @rrxr_3a esz=2 | ||
53 | |||
54 | ### SVE2 floating-point bfloat16 dot-product (indexed) | ||
55 | BFDOT_zzxz 01100100 01 1 ..... 010000 ..... ..... @rrxr_2 esz=2 | ||
32 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | 56 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c |
33 | index XXXXXXX..XXXXXXX 100644 | 57 | index XXXXXXX..XXXXXXX 100644 |
34 | --- a/target/arm/translate-a64.c | 58 | --- a/target/arm/translate-a64.c |
35 | +++ b/target/arm/translate-a64.c | 59 | +++ b/target/arm/translate-a64.c |
36 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) | 60 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn) |
37 | } | 61 | unallocated_encoding(s); |
62 | return; | ||
63 | } | ||
64 | + size = MO_32; | ||
65 | break; | ||
66 | case 1: /* BFDOT */ | ||
67 | if (is_scalar || !dc_isar_feature(aa64_bf16, s)) { | ||
68 | unallocated_encoding(s); | ||
69 | return; | ||
70 | } | ||
71 | + size = MO_32; | ||
72 | + break; | ||
73 | + case 3: /* BFMLAL{B,T} */ | ||
74 | + if (is_scalar || !dc_isar_feature(aa64_bf16, s)) { | ||
75 | + unallocated_encoding(s); | ||
76 | + return; | ||
77 | + } | ||
78 | + /* can't set is_fp without other incorrect size checks */ | ||
79 | + size = MO_16; | ||
80 | break; | ||
81 | default: | ||
82 | unallocated_encoding(s); | ||
83 | return; | ||
84 | } | ||
85 | - size = MO_32; | ||
86 | break; | ||
87 | case 0x11: /* FCMLA #0 */ | ||
88 | case 0x13: /* FCMLA #90 */ | ||
89 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn) | ||
90 | gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index, | ||
91 | gen_helper_gvec_usdot_idx_b); | ||
92 | return; | ||
93 | + case 3: /* BFMLAL{B,T} */ | ||
94 | + gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q, | ||
95 | + gen_helper_gvec_bfmlal_idx); | ||
96 | + return; | ||
97 | } | ||
98 | g_assert_not_reached(); | ||
99 | case 0x11: /* FCMLA #0 */ | ||
100 | diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c | ||
101 | index XXXXXXX..XXXXXXX 100644 | ||
102 | --- a/target/arm/translate-neon.c | ||
103 | +++ b/target/arm/translate-neon.c | ||
104 | @@ -XXX,XX +XXX,XX @@ static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a) | ||
105 | return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD, | ||
106 | gen_helper_gvec_bfmlal); | ||
38 | } | 107 | } |
39 | 108 | + | |
40 | -static void gen_bsl_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) | 109 | +static bool trans_VFMA_b16_scal(DisasContext *s, arg_VFMA_b16_scal *a) |
41 | -{ | 110 | +{ |
42 | - tcg_gen_xor_i64(rn, rn, rm); | 111 | + if (!dc_isar_feature(aa32_bf16, s)) { |
43 | - tcg_gen_and_i64(rn, rn, rd); | 112 | + return false; |
44 | - tcg_gen_xor_i64(rd, rm, rn); | 113 | + } |
45 | -} | 114 | + return do_neon_ddda_fpst(s, 6, a->vd, a->vn, a->vm, |
46 | - | 115 | + (a->index << 1) | a->q, FPST_STD, |
47 | -static void gen_bit_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) | 116 | + gen_helper_gvec_bfmlal_idx); |
48 | -{ | 117 | +} |
49 | - tcg_gen_xor_i64(rn, rn, rd); | 118 | diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c |
50 | - tcg_gen_and_i64(rn, rn, rm); | 119 | index XXXXXXX..XXXXXXX 100644 |
51 | - tcg_gen_xor_i64(rd, rd, rn); | 120 | --- a/target/arm/translate-sve.c |
52 | -} | 121 | +++ b/target/arm/translate-sve.c |
53 | - | 122 | @@ -XXX,XX +XXX,XX @@ static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a) |
54 | -static void gen_bif_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) | ||
55 | -{ | ||
56 | - tcg_gen_xor_i64(rn, rn, rd); | ||
57 | - tcg_gen_andc_i64(rn, rn, rm); | ||
58 | - tcg_gen_xor_i64(rd, rd, rn); | ||
59 | -} | ||
60 | - | ||
61 | -static void gen_bsl_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) | ||
62 | -{ | ||
63 | - tcg_gen_xor_vec(vece, rn, rn, rm); | ||
64 | - tcg_gen_and_vec(vece, rn, rn, rd); | ||
65 | - tcg_gen_xor_vec(vece, rd, rm, rn); | ||
66 | -} | ||
67 | - | ||
68 | -static void gen_bit_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) | ||
69 | -{ | ||
70 | - tcg_gen_xor_vec(vece, rn, rn, rd); | ||
71 | - tcg_gen_and_vec(vece, rn, rn, rm); | ||
72 | - tcg_gen_xor_vec(vece, rd, rd, rn); | ||
73 | -} | ||
74 | - | ||
75 | -static void gen_bif_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) | ||
76 | -{ | ||
77 | - tcg_gen_xor_vec(vece, rn, rn, rd); | ||
78 | - tcg_gen_andc_vec(vece, rn, rn, rm); | ||
79 | - tcg_gen_xor_vec(vece, rd, rd, rn); | ||
80 | -} | ||
81 | - | ||
82 | /* Logic op (opcode == 3) subgroup of C3.6.16. */ | ||
83 | static void disas_simd_3same_logic(DisasContext *s, uint32_t insn) | ||
84 | { | 123 | { |
85 | - static const GVecGen3 bsl_op = { | 124 | return do_BFMLAL_zzzw(s, a, true); |
86 | - .fni8 = gen_bsl_i64, | ||
87 | - .fniv = gen_bsl_vec, | ||
88 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
89 | - .load_dest = true | ||
90 | - }; | ||
91 | - static const GVecGen3 bit_op = { | ||
92 | - .fni8 = gen_bit_i64, | ||
93 | - .fniv = gen_bit_vec, | ||
94 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
95 | - .load_dest = true | ||
96 | - }; | ||
97 | - static const GVecGen3 bif_op = { | ||
98 | - .fni8 = gen_bif_i64, | ||
99 | - .fniv = gen_bif_vec, | ||
100 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
101 | - .load_dest = true | ||
102 | - }; | ||
103 | - | ||
104 | int rd = extract32(insn, 0, 5); | ||
105 | int rn = extract32(insn, 5, 5); | ||
106 | int rm = extract32(insn, 16, 5); | ||
107 | diff --git a/target/arm/translate.c b/target/arm/translate.c | ||
108 | index XXXXXXX..XXXXXXX 100644 | ||
109 | --- a/target/arm/translate.c | ||
110 | +++ b/target/arm/translate.c | ||
111 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) | ||
112 | return 0; | ||
113 | } | 125 | } |
114 | 126 | + | |
115 | -/* Bitwise select. dest = c ? t : f. Clobbers T and F. */ | 127 | +static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel) |
116 | -static void gen_neon_bsl(TCGv_i32 dest, TCGv_i32 t, TCGv_i32 f, TCGv_i32 c) | ||
117 | -{ | ||
118 | - tcg_gen_and_i32(t, t, c); | ||
119 | - tcg_gen_andc_i32(f, f, c); | ||
120 | - tcg_gen_or_i32(dest, t, f); | ||
121 | -} | ||
122 | - | ||
123 | static inline void gen_neon_narrow(int size, TCGv_i32 dest, TCGv_i64 src) | ||
124 | { | ||
125 | switch (size) { | ||
126 | @@ -XXX,XX +XXX,XX @@ static int do_v81_helper(DisasContext *s, gen_helper_gvec_3_ptr *fn, | ||
127 | return 1; | ||
128 | } | ||
129 | |||
130 | +/* | ||
131 | + * Expanders for VBitOps_VBIF, VBIT, VBSL. | ||
132 | + */ | ||
133 | +static void gen_bsl_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) | ||
134 | +{ | 128 | +{ |
135 | + tcg_gen_xor_i64(rn, rn, rm); | 129 | + if (!dc_isar_feature(aa64_sve_bf16, s)) { |
136 | + tcg_gen_and_i64(rn, rn, rd); | 130 | + return false; |
137 | + tcg_gen_xor_i64(rd, rm, rn); | 131 | + } |
132 | + if (sve_access_check(s)) { | ||
133 | + TCGv_ptr status = fpstatus_ptr(FPST_FPCR); | ||
134 | + unsigned vsz = vec_full_reg_size(s); | ||
135 | + | ||
136 | + tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd), | ||
137 | + vec_full_reg_offset(s, a->rn), | ||
138 | + vec_full_reg_offset(s, a->rm), | ||
139 | + vec_full_reg_offset(s, a->ra), | ||
140 | + status, vsz, vsz, (a->index << 1) | sel, | ||
141 | + gen_helper_gvec_bfmlal_idx); | ||
142 | + tcg_temp_free_ptr(status); | ||
143 | + } | ||
144 | + return true; | ||
138 | +} | 145 | +} |
139 | + | 146 | + |
140 | +static void gen_bit_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) | 147 | +static bool trans_BFMLALB_zzxw(DisasContext *s, arg_rrxr_esz *a) |
141 | +{ | 148 | +{ |
142 | + tcg_gen_xor_i64(rn, rn, rd); | 149 | + return do_BFMLAL_zzxw(s, a, false); |
143 | + tcg_gen_and_i64(rn, rn, rm); | ||
144 | + tcg_gen_xor_i64(rd, rd, rn); | ||
145 | +} | 150 | +} |
146 | + | 151 | + |
147 | +static void gen_bif_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm) | 152 | +static bool trans_BFMLALT_zzxw(DisasContext *s, arg_rrxr_esz *a) |
148 | +{ | 153 | +{ |
149 | + tcg_gen_xor_i64(rn, rn, rd); | 154 | + return do_BFMLAL_zzxw(s, a, true); |
150 | + tcg_gen_andc_i64(rn, rn, rm); | ||
151 | + tcg_gen_xor_i64(rd, rd, rn); | ||
152 | +} | 155 | +} |
156 | diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c | ||
157 | index XXXXXXX..XXXXXXX 100644 | ||
158 | --- a/target/arm/vec_helper.c | ||
159 | +++ b/target/arm/vec_helper.c | ||
160 | @@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va, | ||
161 | } | ||
162 | clear_tail(d, opr_sz, simd_maxsz(desc)); | ||
163 | } | ||
153 | + | 164 | + |
154 | +static void gen_bsl_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) | 165 | +void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm, |
166 | + void *va, void *stat, uint32_t desc) | ||
155 | +{ | 167 | +{ |
156 | + tcg_gen_xor_vec(vece, rn, rn, rm); | 168 | + intptr_t i, j, opr_sz = simd_oprsz(desc); |
157 | + tcg_gen_and_vec(vece, rn, rn, rd); | 169 | + intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1); |
158 | + tcg_gen_xor_vec(vece, rd, rm, rn); | 170 | + intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 1, 3); |
171 | + intptr_t elements = opr_sz / 4; | ||
172 | + intptr_t eltspersegment = MIN(16 / 4, elements); | ||
173 | + float32 *d = vd, *a = va; | ||
174 | + bfloat16 *n = vn, *m = vm; | ||
175 | + | ||
176 | + for (i = 0; i < elements; i += eltspersegment) { | ||
177 | + float32 m_idx = m[H2(2 * i + index)] << 16; | ||
178 | + | ||
179 | + for (j = i; j < i + eltspersegment; j++) { | ||
180 | + float32 n_j = n[H2(2 * j + sel)] << 16; | ||
181 | + d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], 0, stat); | ||
182 | + } | ||
183 | + } | ||
184 | + clear_tail(d, opr_sz, simd_maxsz(desc)); | ||
159 | +} | 185 | +} |
160 | + | ||
161 | +static void gen_bit_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) | ||
162 | +{ | ||
163 | + tcg_gen_xor_vec(vece, rn, rn, rd); | ||
164 | + tcg_gen_and_vec(vece, rn, rn, rm); | ||
165 | + tcg_gen_xor_vec(vece, rd, rd, rn); | ||
166 | +} | ||
167 | + | ||
168 | +static void gen_bif_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm) | ||
169 | +{ | ||
170 | + tcg_gen_xor_vec(vece, rn, rn, rd); | ||
171 | + tcg_gen_andc_vec(vece, rn, rn, rm); | ||
172 | + tcg_gen_xor_vec(vece, rd, rd, rn); | ||
173 | +} | ||
174 | + | ||
175 | +const GVecGen3 bsl_op = { | ||
176 | + .fni8 = gen_bsl_i64, | ||
177 | + .fniv = gen_bsl_vec, | ||
178 | + .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
179 | + .load_dest = true | ||
180 | +}; | ||
181 | + | ||
182 | +const GVecGen3 bit_op = { | ||
183 | + .fni8 = gen_bit_i64, | ||
184 | + .fniv = gen_bit_vec, | ||
185 | + .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
186 | + .load_dest = true | ||
187 | +}; | ||
188 | + | ||
189 | +const GVecGen3 bif_op = { | ||
190 | + .fni8 = gen_bif_i64, | ||
191 | + .fniv = gen_bif_vec, | ||
192 | + .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
193 | + .load_dest = true | ||
194 | +}; | ||
195 | + | ||
196 | + | ||
197 | /* Translate a NEON data processing instruction. Return nonzero if the | ||
198 | instruction is invalid. | ||
199 | We process data in a mixture of 32-bit and 64-bit chunks. | ||
200 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
201 | { | ||
202 | int op; | ||
203 | int q; | ||
204 | - int rd, rn, rm; | ||
205 | + int rd, rn, rm, rd_ofs, rn_ofs, rm_ofs; | ||
206 | int size; | ||
207 | int shift; | ||
208 | int pass; | ||
209 | int count; | ||
210 | int pairwise; | ||
211 | int u; | ||
212 | + int vec_size; | ||
213 | uint32_t imm, mask; | ||
214 | TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5; | ||
215 | TCGv_ptr ptr1, ptr2, ptr3; | ||
216 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
217 | VFP_DREG_N(rn, insn); | ||
218 | VFP_DREG_M(rm, insn); | ||
219 | size = (insn >> 20) & 3; | ||
220 | + vec_size = q ? 16 : 8; | ||
221 | + rd_ofs = neon_reg_offset(rd, 0); | ||
222 | + rn_ofs = neon_reg_offset(rn, 0); | ||
223 | + rm_ofs = neon_reg_offset(rm, 0); | ||
224 | + | ||
225 | if ((insn & (1 << 23)) == 0) { | ||
226 | /* Three register same length. */ | ||
227 | op = ((insn >> 7) & 0x1e) | ((insn >> 4) & 1); | ||
228 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
229 | q, rd, rn, rm); | ||
230 | } | ||
231 | return 1; | ||
232 | + | ||
233 | + case NEON_3R_LOGIC: /* Logic ops. */ | ||
234 | + switch ((u << 2) | size) { | ||
235 | + case 0: /* VAND */ | ||
236 | + tcg_gen_gvec_and(0, rd_ofs, rn_ofs, rm_ofs, | ||
237 | + vec_size, vec_size); | ||
238 | + break; | ||
239 | + case 1: /* VBIC */ | ||
240 | + tcg_gen_gvec_andc(0, rd_ofs, rn_ofs, rm_ofs, | ||
241 | + vec_size, vec_size); | ||
242 | + break; | ||
243 | + case 2: | ||
244 | + if (rn == rm) { | ||
245 | + /* VMOV */ | ||
246 | + tcg_gen_gvec_mov(0, rd_ofs, rn_ofs, vec_size, vec_size); | ||
247 | + } else { | ||
248 | + /* VORR */ | ||
249 | + tcg_gen_gvec_or(0, rd_ofs, rn_ofs, rm_ofs, | ||
250 | + vec_size, vec_size); | ||
251 | + } | ||
252 | + break; | ||
253 | + case 3: /* VORN */ | ||
254 | + tcg_gen_gvec_orc(0, rd_ofs, rn_ofs, rm_ofs, | ||
255 | + vec_size, vec_size); | ||
256 | + break; | ||
257 | + case 4: /* VEOR */ | ||
258 | + tcg_gen_gvec_xor(0, rd_ofs, rn_ofs, rm_ofs, | ||
259 | + vec_size, vec_size); | ||
260 | + break; | ||
261 | + case 5: /* VBSL */ | ||
262 | + tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, | ||
263 | + vec_size, vec_size, &bsl_op); | ||
264 | + break; | ||
265 | + case 6: /* VBIT */ | ||
266 | + tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, | ||
267 | + vec_size, vec_size, &bit_op); | ||
268 | + break; | ||
269 | + case 7: /* VBIF */ | ||
270 | + tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, | ||
271 | + vec_size, vec_size, &bif_op); | ||
272 | + break; | ||
273 | + } | ||
274 | + return 0; | ||
275 | } | ||
276 | - if (size == 3 && op != NEON_3R_LOGIC) { | ||
277 | + if (size == 3) { | ||
278 | /* 64-bit element instructions. */ | ||
279 | for (pass = 0; pass < (q ? 2 : 1); pass++) { | ||
280 | neon_load_reg64(cpu_V0, rn + pass); | ||
281 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
282 | case NEON_3R_VRHADD: | ||
283 | GEN_NEON_INTEGER_OP(rhadd); | ||
284 | break; | ||
285 | - case NEON_3R_LOGIC: /* Logic ops. */ | ||
286 | - switch ((u << 2) | size) { | ||
287 | - case 0: /* VAND */ | ||
288 | - tcg_gen_and_i32(tmp, tmp, tmp2); | ||
289 | - break; | ||
290 | - case 1: /* BIC */ | ||
291 | - tcg_gen_andc_i32(tmp, tmp, tmp2); | ||
292 | - break; | ||
293 | - case 2: /* VORR */ | ||
294 | - tcg_gen_or_i32(tmp, tmp, tmp2); | ||
295 | - break; | ||
296 | - case 3: /* VORN */ | ||
297 | - tcg_gen_orc_i32(tmp, tmp, tmp2); | ||
298 | - break; | ||
299 | - case 4: /* VEOR */ | ||
300 | - tcg_gen_xor_i32(tmp, tmp, tmp2); | ||
301 | - break; | ||
302 | - case 5: /* VBSL */ | ||
303 | - tmp3 = neon_load_reg(rd, pass); | ||
304 | - gen_neon_bsl(tmp, tmp, tmp2, tmp3); | ||
305 | - tcg_temp_free_i32(tmp3); | ||
306 | - break; | ||
307 | - case 6: /* VBIT */ | ||
308 | - tmp3 = neon_load_reg(rd, pass); | ||
309 | - gen_neon_bsl(tmp, tmp, tmp3, tmp2); | ||
310 | - tcg_temp_free_i32(tmp3); | ||
311 | - break; | ||
312 | - case 7: /* VBIF */ | ||
313 | - tmp3 = neon_load_reg(rd, pass); | ||
314 | - gen_neon_bsl(tmp, tmp3, tmp, tmp2); | ||
315 | - tcg_temp_free_i32(tmp3); | ||
316 | - break; | ||
317 | - } | ||
318 | - break; | ||
319 | case NEON_3R_VHSUB: | ||
320 | GEN_NEON_INTEGER_OP(hsub); | ||
321 | break; | ||
322 | -- | 186 | -- |
323 | 2.19.1 | 187 | 2.20.1 |
324 | 188 | ||
325 | 189 | diff view generated by jsdifflib |
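
(Aside on the arithmetic in the gvec_bfmlal_idx helper above: a bfloat16 value is just the top 16 bits of an IEEE float32, so the helper widens each operand with a plain 16-bit left shift before handing it to float32_muladd, and the indexed form reuses one widened element of "m" across each 128-bit segment. The stand-alone sketch below illustrates that widening step; the function names are invented for the illustration and are not part of the patch.)

    #include <stdint.h>
    #include <string.h>

    /* bfloat16 is the high half of a float32, so widening is a shift of
     * the 16-bit pattern into the top of a 32-bit word (the "<< 16" in
     * gvec_bfmlal_idx above). */
    static float bf16_to_f32(uint16_t bf)
    {
        uint32_t bits = (uint32_t)bf << 16;
        float f;
        memcpy(&f, &bits, sizeof(f));
        return f;
    }

    /* Scalar model of one BFMLAL element: widen both operands, multiply,
     * accumulate. The real helper uses float32_muladd() with the guest
     * FP status, so rounding differs from this plain C expression. */
    static float bfmlal_elem(float acc, uint16_t n_bf, uint16_t m_bf)
    {
        return bf16_to_f32(n_bf) * bf16_to_f32(m_bf) + acc;
    }
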
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | Both arm and thumb2 division are controlled by the same ISAR field, | ||
4 | which takes care of the arm implies thumb case. Having M imply | ||
5 | thumb2 division was wrong for cortex-m0, which is v6m and does not | ||
6 | have thumb2 at all, much less thumb2 division. | ||
7 | |||
8 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | ||
9 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
10 | Message-id: 20181016223115.24100-5-richard.henderson@linaro.org | 4 | Message-id: 20210525225817.400336-12-richard.henderson@linaro.org |
11 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 5 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
12 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 6 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
13 | --- | 7 | --- |
14 | target/arm/cpu.h | 12 ++++++++++-- | 8 | linux-user/elfload.c | 2 ++ |
15 | linux-user/elfload.c | 4 ++-- | 9 | 1 file changed, 2 insertions(+) |
16 | target/arm/cpu.c | 10 +--------- | ||
17 | target/arm/translate.c | 4 ++-- | ||
18 | 4 files changed, 15 insertions(+), 15 deletions(-) | ||
19 | 10 | ||
20 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h | ||
21 | index XXXXXXX..XXXXXXX 100644 | ||
22 | --- a/target/arm/cpu.h | ||
23 | +++ b/target/arm/cpu.h | ||
24 | @@ -XXX,XX +XXX,XX @@ enum arm_features { | ||
25 | ARM_FEATURE_VFP3, | ||
26 | ARM_FEATURE_VFP_FP16, | ||
27 | ARM_FEATURE_NEON, | ||
28 | - ARM_FEATURE_THUMB_DIV, /* divide supported in Thumb encoding */ | ||
29 | ARM_FEATURE_M, /* Microcontroller profile. */ | ||
30 | ARM_FEATURE_OMAPCP, /* OMAP specific CP15 ops handling. */ | ||
31 | ARM_FEATURE_THUMB2EE, | ||
32 | @@ -XXX,XX +XXX,XX @@ enum arm_features { | ||
33 | ARM_FEATURE_V5, | ||
34 | ARM_FEATURE_STRONGARM, | ||
35 | ARM_FEATURE_VAPA, /* cp15 VA to PA lookups */ | ||
36 | - ARM_FEATURE_ARM_DIV, /* divide supported in ARM encoding */ | ||
37 | ARM_FEATURE_VFP4, /* VFPv4 (implies that NEON is v2) */ | ||
38 | ARM_FEATURE_GENERIC_TIMER, | ||
39 | ARM_FEATURE_MVFR, /* Media and VFP Feature Registers 0 and 1 */ | ||
40 | @@ -XXX,XX +XXX,XX @@ extern const uint64_t pred_esz_masks[4]; | ||
41 | /* | ||
42 | * 32-bit feature tests via id registers. | ||
43 | */ | ||
44 | +static inline bool isar_feature_thumb_div(const ARMISARegisters *id) | ||
45 | +{ | ||
46 | + return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) != 0; | ||
47 | +} | ||
48 | + | ||
49 | +static inline bool isar_feature_arm_div(const ARMISARegisters *id) | ||
50 | +{ | ||
51 | + return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) > 1; | ||
52 | +} | ||
53 | + | ||
54 | static inline bool isar_feature_aa32_aes(const ARMISARegisters *id) | ||
55 | { | ||
56 | return FIELD_EX32(id->id_isar5, ID_ISAR5, AES) != 0; | ||
57 | diff --git a/linux-user/elfload.c b/linux-user/elfload.c | 11 | diff --git a/linux-user/elfload.c b/linux-user/elfload.c |
58 | index XXXXXXX..XXXXXXX 100644 | 12 | index XXXXXXX..XXXXXXX 100644 |
59 | --- a/linux-user/elfload.c | 13 | --- a/linux-user/elfload.c |
60 | +++ b/linux-user/elfload.c | 14 | +++ b/linux-user/elfload.c |
61 | @@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void) | 15 | @@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap2(void) |
62 | GET_FEATURE(ARM_FEATURE_VFP3, ARM_HWCAP_ARM_VFPv3); | 16 | GET_FEATURE_ID(aa64_sve_i8mm, ARM_HWCAP2_A64_SVEI8MM); |
63 | GET_FEATURE(ARM_FEATURE_V6K, ARM_HWCAP_ARM_TLS); | 17 | GET_FEATURE_ID(aa64_sve_f32mm, ARM_HWCAP2_A64_SVEF32MM); |
64 | GET_FEATURE(ARM_FEATURE_VFP4, ARM_HWCAP_ARM_VFPv4); | 18 | GET_FEATURE_ID(aa64_sve_f64mm, ARM_HWCAP2_A64_SVEF64MM); |
65 | - GET_FEATURE(ARM_FEATURE_ARM_DIV, ARM_HWCAP_ARM_IDIVA); | 19 | + GET_FEATURE_ID(aa64_sve_bf16, ARM_HWCAP2_A64_SVEBF16); |
66 | - GET_FEATURE(ARM_FEATURE_THUMB_DIV, ARM_HWCAP_ARM_IDIVT); | 20 | GET_FEATURE_ID(aa64_i8mm, ARM_HWCAP2_A64_I8MM); |
67 | + GET_FEATURE_ID(arm_div, ARM_HWCAP_ARM_IDIVA); | 21 | + GET_FEATURE_ID(aa64_bf16, ARM_HWCAP2_A64_BF16); |
68 | + GET_FEATURE_ID(thumb_div, ARM_HWCAP_ARM_IDIVT); | 22 | GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG); |
69 | /* All QEMU's VFPv3 CPUs have 32 registers, see VFP_DREG in translate.c. | 23 | GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI); |
70 | * Note that the ARM_HWCAP_ARM_VFPv3D16 bit is always the inverse of | 24 | GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE); |
71 | * ARM_HWCAP_ARM_VFPD32 (and so always clear for QEMU); it is unrelated | ||
72 | diff --git a/target/arm/cpu.c b/target/arm/cpu.c | ||
73 | index XXXXXXX..XXXXXXX 100644 | ||
74 | --- a/target/arm/cpu.c | ||
75 | +++ b/target/arm/cpu.c | ||
76 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) | ||
77 | * Presence of EL2 itself is ARM_FEATURE_EL2, and of the | ||
78 | * Security Extensions is ARM_FEATURE_EL3. | ||
79 | */ | ||
80 | - set_feature(env, ARM_FEATURE_ARM_DIV); | ||
81 | + assert(cpu_isar_feature(arm_div, cpu)); | ||
82 | set_feature(env, ARM_FEATURE_LPAE); | ||
83 | set_feature(env, ARM_FEATURE_V7); | ||
84 | } | ||
85 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) | ||
86 | if (arm_feature(env, ARM_FEATURE_V5)) { | ||
87 | set_feature(env, ARM_FEATURE_V4T); | ||
88 | } | ||
89 | - if (arm_feature(env, ARM_FEATURE_M)) { | ||
90 | - set_feature(env, ARM_FEATURE_THUMB_DIV); | ||
91 | - } | ||
92 | - if (arm_feature(env, ARM_FEATURE_ARM_DIV)) { | ||
93 | - set_feature(env, ARM_FEATURE_THUMB_DIV); | ||
94 | - } | ||
95 | if (arm_feature(env, ARM_FEATURE_VFP4)) { | ||
96 | set_feature(env, ARM_FEATURE_VFP3); | ||
97 | set_feature(env, ARM_FEATURE_VFP_FP16); | ||
98 | @@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj) | ||
99 | ARMCPU *cpu = ARM_CPU(obj); | ||
100 | |||
101 | set_feature(&cpu->env, ARM_FEATURE_V7); | ||
102 | - set_feature(&cpu->env, ARM_FEATURE_THUMB_DIV); | ||
103 | - set_feature(&cpu->env, ARM_FEATURE_ARM_DIV); | ||
104 | set_feature(&cpu->env, ARM_FEATURE_V7MP); | ||
105 | set_feature(&cpu->env, ARM_FEATURE_PMSA); | ||
106 | cpu->midr = 0x411fc153; /* r1p3 */ | ||
107 | diff --git a/target/arm/translate.c b/target/arm/translate.c | ||
108 | index XXXXXXX..XXXXXXX 100644 | ||
109 | --- a/target/arm/translate.c | ||
110 | +++ b/target/arm/translate.c | ||
111 | @@ -XXX,XX +XXX,XX @@ static void disas_arm_insn(DisasContext *s, unsigned int insn) | ||
112 | case 1: | ||
113 | case 3: | ||
114 | /* SDIV, UDIV */ | ||
115 | - if (!arm_dc_feature(s, ARM_FEATURE_ARM_DIV)) { | ||
116 | + if (!dc_isar_feature(arm_div, s)) { | ||
117 | goto illegal_op; | ||
118 | } | ||
119 | if (((insn >> 5) & 7) || (rd != 15)) { | ||
120 | @@ -XXX,XX +XXX,XX @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn) | ||
121 | tmp2 = load_reg(s, rm); | ||
122 | if ((op & 0x50) == 0x10) { | ||
123 | /* sdiv, udiv */ | ||
124 | - if (!arm_dc_feature(s, ARM_FEATURE_THUMB_DIV)) { | ||
125 | + if (!dc_isar_feature(thumb_div, s)) { | ||
126 | goto illegal_op; | ||
127 | } | ||
128 | if (op & 0x20) | ||
129 | -- | 25 | -- |
130 | 2.19.1 | 26 | 2.20.1 |
131 | 27 | ||
132 | 28 | diff view generated by jsdifflib |
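
(For reference, the isar_feature_thumb_div/isar_feature_arm_div predicates introduced above read the architectural ID_ISAR0.Divide field rather than a QEMU-private feature flag. The sketch below open-codes the same test in plain C for illustration; the bit positions follow the Arm ARM definition of ID_ISAR0 and the helper names are invented, since QEMU itself goes through the FIELD_EX32 machinery.)

    #include <stdbool.h>
    #include <stdint.h>

    /* ID_ISAR0.Divide is the 4-bit field at bits [27:24]:
     * 0 = no divide, 1 = SDIV/UDIV in Thumb only, 2 = ARM and Thumb. */
    static inline unsigned id_isar0_divide(uint32_t id_isar0)
    {
        return (id_isar0 >> 24) & 0xf;
    }

    static inline bool has_thumb_div(uint32_t id_isar0)
    {
        return id_isar0_divide(id_isar0) != 0;   /* DIVIDE != 0 */
    }

    static inline bool has_arm_div(uint32_t id_isar0)
    {
        return id_isar0_divide(id_isar0) > 1;    /* DIVIDE > 1 */
    }
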
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Richard Henderson <richard.henderson@linaro.org> |
---|---|---|---|
2 | 2 | ||
3 | Instantiating mps2-an505 (cortex-m33) will fail make check when | 3 | Disable BF16 again for !have_neon and !have_vfp during realize. |
4 | V7VE asserts that ID_ISAR0.Divide includes ARM division. It is | ||
5 | also wrong to include ARM_FEATURE_LPAE. | ||
6 | 4 | ||
7 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> |
8 | Message-id: 20181016223115.24100-3-richard.henderson@linaro.org | 6 | Message-id: 20210525225817.400336-13-richard.henderson@linaro.org |
9 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 7 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 8 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
11 | --- | 9 | --- |
12 | target/arm/cpu.c | 6 +++++- | 10 | target/arm/cpu.c | 3 +++ |
13 | 1 file changed, 5 insertions(+), 1 deletion(-) | 11 | target/arm/cpu64.c | 3 +++ |
12 | target/arm/cpu_tcg.c | 1 + | ||
13 | 3 files changed, 7 insertions(+) | ||
14 | 14 | ||
15 | diff --git a/target/arm/cpu.c b/target/arm/cpu.c | 15 | diff --git a/target/arm/cpu.c b/target/arm/cpu.c |
16 | index XXXXXXX..XXXXXXX 100644 | 16 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/target/arm/cpu.c | 17 | --- a/target/arm/cpu.c |
18 | +++ b/target/arm/cpu.c | 18 | +++ b/target/arm/cpu.c |
19 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) | 19 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) |
20 | 20 | ||
21 | /* Some features automatically imply others: */ | 21 | u = cpu->isar.id_isar6; |
22 | if (arm_feature(env, ARM_FEATURE_V8)) { | 22 | u = FIELD_DP32(u, ID_ISAR6, JSCVT, 0); |
23 | - set_feature(env, ARM_FEATURE_V7VE); | 23 | + u = FIELD_DP32(u, ID_ISAR6, BF16, 0); |
24 | + if (arm_feature(env, ARM_FEATURE_M)) { | 24 | cpu->isar.id_isar6 = u; |
25 | + set_feature(env, ARM_FEATURE_V7); | 25 | |
26 | + } else { | 26 | u = cpu->isar.mvfr0; |
27 | + set_feature(env, ARM_FEATURE_V7VE); | 27 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) |
28 | + } | 28 | |
29 | } | 29 | t = cpu->isar.id_aa64isar1; |
30 | if (arm_feature(env, ARM_FEATURE_V7VE)) { | 30 | t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 0); |
31 | /* v7 Virtualization Extensions. In real hardware this implies | 31 | + t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 0); |
32 | t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 0); | ||
33 | cpu->isar.id_aa64isar1 = t; | ||
34 | |||
35 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) | ||
36 | u = cpu->isar.id_isar6; | ||
37 | u = FIELD_DP32(u, ID_ISAR6, DP, 0); | ||
38 | u = FIELD_DP32(u, ID_ISAR6, FHM, 0); | ||
39 | + u = FIELD_DP32(u, ID_ISAR6, BF16, 0); | ||
40 | u = FIELD_DP32(u, ID_ISAR6, I8MM, 0); | ||
41 | cpu->isar.id_isar6 = u; | ||
42 | |||
43 | diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c | ||
44 | index XXXXXXX..XXXXXXX 100644 | ||
45 | --- a/target/arm/cpu64.c | ||
46 | +++ b/target/arm/cpu64.c | ||
47 | @@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj) | ||
48 | t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1); | ||
49 | t = FIELD_DP64(t, ID_AA64ISAR1, SB, 1); | ||
50 | t = FIELD_DP64(t, ID_AA64ISAR1, SPECRES, 1); | ||
51 | + t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 1); | ||
52 | t = FIELD_DP64(t, ID_AA64ISAR1, FRINTTS, 1); | ||
53 | t = FIELD_DP64(t, ID_AA64ISAR1, LRCPC, 2); /* ARMv8.4-RCPC */ | ||
54 | t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 1); | ||
55 | @@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj) | ||
56 | t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1); | ||
57 | t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2); /* PMULL */ | ||
58 | t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1); | ||
59 | + t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 1); | ||
60 | t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1); | ||
61 | t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1); | ||
62 | t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1); | ||
63 | @@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj) | ||
64 | u = FIELD_DP32(u, ID_ISAR6, FHM, 1); | ||
65 | u = FIELD_DP32(u, ID_ISAR6, SB, 1); | ||
66 | u = FIELD_DP32(u, ID_ISAR6, SPECRES, 1); | ||
67 | + u = FIELD_DP32(u, ID_ISAR6, BF16, 1); | ||
68 | u = FIELD_DP32(u, ID_ISAR6, I8MM, 1); | ||
69 | cpu->isar.id_isar6 = u; | ||
70 | |||
71 | diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c | ||
72 | index XXXXXXX..XXXXXXX 100644 | ||
73 | --- a/target/arm/cpu_tcg.c | ||
74 | +++ b/target/arm/cpu_tcg.c | ||
75 | @@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj) | ||
76 | t = FIELD_DP32(t, ID_ISAR6, FHM, 1); | ||
77 | t = FIELD_DP32(t, ID_ISAR6, SB, 1); | ||
78 | t = FIELD_DP32(t, ID_ISAR6, SPECRES, 1); | ||
79 | + t = FIELD_DP32(t, ID_ISAR6, BF16, 1); | ||
80 | t = FIELD_DP32(t, ID_ISAR6, I8MM, 1); | ||
81 | cpu->isar.id_isar6 = t; | ||
82 | |||
32 | -- | 83 | -- |
33 | 2.19.1 | 84 | 2.20.1 |
34 | 85 | ||
35 | 86 | diff view generated by jsdifflib |
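
(The BF16 disabling above works entirely through the ID registers: writing 0 into the relevant 4-bit field hides the feature from the guest and from QEMU's own isar_feature_*/dc_isar_feature() checks, which read the same registers. A generic sketch of that deposit pattern follows; the shift parameter is hypothetical, and QEMU itself uses the FIELD_DP32/FIELD_DP64 macros, which take the position and width from the named field definition.)

    #include <stdint.h>

    /* Deposit "val" into a 4-bit ID-register field located at "shift".
     * Passing val == 0 clears the field, which is how a feature such as
     * BF16 is made invisible when its prerequisites are absent. */
    static inline uint32_t id_field_dp32(uint32_t reg, unsigned shift,
                                         uint32_t val)
    {
        return (reg & ~(0xfu << shift)) | ((val & 0xf) << shift);
    }
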
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 3 | Until now, Hypervisor.framework has only been available on x86_64 systems. |
4 | Message-id: 20181011205206.3552-10-richard.henderson@linaro.org | 4 | With Apple Silicon shipping now, it extends its reach to aarch64. To |
5 | prepare for support for multiple architectures, let's start moving common | ||
6 | code out into its own accel directory. | ||
7 | |||
8 | This patch moves assert_hvf_ok() and introduces generic build infrastructure. | ||
9 | |||
10 | Signed-off-by: Alexander Graf <agraf@csgraf.de> | ||
11 | Reviewed-by: Sergio Lopez <slp@redhat.com> | ||
12 | Message-id: 20210519202253.76782-2-agraf@csgraf.de | ||
5 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 13 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
6 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 14 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
7 | --- | 15 | --- |
8 | target/arm/translate.c | 29 ++++++++++------------------- | 16 | include/sysemu/hvf_int.h | 18 +++++++++++++++ |
9 | 1 file changed, 10 insertions(+), 19 deletions(-) | 17 | accel/hvf/hvf-all.c | 47 ++++++++++++++++++++++++++++++++++++++++ |
10 | 18 | target/i386/hvf/hvf.c | 33 +--------------------------- | |
11 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 19 | MAINTAINERS | 8 +++++++ |
20 | accel/hvf/meson.build | 6 +++++ | ||
21 | accel/meson.build | 1 + | ||
22 | 6 files changed, 81 insertions(+), 32 deletions(-) | ||
23 | create mode 100644 include/sysemu/hvf_int.h | ||
24 | create mode 100644 accel/hvf/hvf-all.c | ||
25 | create mode 100644 accel/hvf/meson.build | ||
26 | |||
27 | diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h | ||
28 | new file mode 100644 | ||
29 | index XXXXXXX..XXXXXXX | ||
30 | --- /dev/null | ||
31 | +++ b/include/sysemu/hvf_int.h | ||
32 | @@ -XXX,XX +XXX,XX @@ | ||
33 | +/* | ||
34 | + * QEMU Hypervisor.framework (HVF) support | ||
35 | + * | ||
36 | + * This work is licensed under the terms of the GNU GPL, version 2 or later. | ||
37 | + * See the COPYING file in the top-level directory. | ||
38 | + * | ||
39 | + */ | ||
40 | + | ||
41 | +/* header to be included in HVF-specific code */ | ||
42 | + | ||
43 | +#ifndef HVF_INT_H | ||
44 | +#define HVF_INT_H | ||
45 | + | ||
46 | +#include <Hypervisor/hv.h> | ||
47 | + | ||
48 | +void assert_hvf_ok(hv_return_t ret); | ||
49 | + | ||
50 | +#endif | ||
51 | diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c | ||
52 | new file mode 100644 | ||
53 | index XXXXXXX..XXXXXXX | ||
54 | --- /dev/null | ||
55 | +++ b/accel/hvf/hvf-all.c | ||
56 | @@ -XXX,XX +XXX,XX @@ | ||
57 | +/* | ||
58 | + * QEMU Hypervisor.framework support | ||
59 | + * | ||
60 | + * This work is licensed under the terms of the GNU GPL, version 2. See | ||
61 | + * the COPYING file in the top-level directory. | ||
62 | + * | ||
63 | + * Contributions after 2012-01-13 are licensed under the terms of the | ||
64 | + * GNU GPL, version 2 or (at your option) any later version. | ||
65 | + */ | ||
66 | + | ||
67 | +#include "qemu/osdep.h" | ||
68 | +#include "qemu-common.h" | ||
69 | +#include "qemu/error-report.h" | ||
70 | +#include "sysemu/hvf.h" | ||
71 | +#include "sysemu/hvf_int.h" | ||
72 | + | ||
73 | +void assert_hvf_ok(hv_return_t ret) | ||
74 | +{ | ||
75 | + if (ret == HV_SUCCESS) { | ||
76 | + return; | ||
77 | + } | ||
78 | + | ||
79 | + switch (ret) { | ||
80 | + case HV_ERROR: | ||
81 | + error_report("Error: HV_ERROR"); | ||
82 | + break; | ||
83 | + case HV_BUSY: | ||
84 | + error_report("Error: HV_BUSY"); | ||
85 | + break; | ||
86 | + case HV_BAD_ARGUMENT: | ||
87 | + error_report("Error: HV_BAD_ARGUMENT"); | ||
88 | + break; | ||
89 | + case HV_NO_RESOURCES: | ||
90 | + error_report("Error: HV_NO_RESOURCES"); | ||
91 | + break; | ||
92 | + case HV_NO_DEVICE: | ||
93 | + error_report("Error: HV_NO_DEVICE"); | ||
94 | + break; | ||
95 | + case HV_UNSUPPORTED: | ||
96 | + error_report("Error: HV_UNSUPPORTED"); | ||
97 | + break; | ||
98 | + default: | ||
99 | + error_report("Unknown Error"); | ||
100 | + } | ||
101 | + | ||
102 | + abort(); | ||
103 | +} | ||
104 | diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c | ||
12 | index XXXXXXX..XXXXXXX 100644 | 105 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/target/arm/translate.c | 106 | --- a/target/i386/hvf/hvf.c |
14 | +++ b/target/arm/translate.c | 107 | +++ b/target/i386/hvf/hvf.c |
15 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 108 | @@ -XXX,XX +XXX,XX @@ |
16 | break; | 109 | #include "qemu/error-report.h" |
17 | } | 110 | |
18 | return 0; | 111 | #include "sysemu/hvf.h" |
19 | + | 112 | +#include "sysemu/hvf_int.h" |
20 | + case NEON_3R_VADD_VSUB: | 113 | #include "sysemu/runstate.h" |
21 | + if (u) { | 114 | #include "hvf-i386.h" |
22 | + tcg_gen_gvec_sub(size, rd_ofs, rn_ofs, rm_ofs, | 115 | #include "vmcs.h" |
23 | + vec_size, vec_size); | 116 | @@ -XXX,XX +XXX,XX @@ |
24 | + } else { | 117 | |
25 | + tcg_gen_gvec_add(size, rd_ofs, rn_ofs, rm_ofs, | 118 | HVFState *hvf_state; |
26 | + vec_size, vec_size); | 119 | |
27 | + } | 120 | -static void assert_hvf_ok(hv_return_t ret) |
28 | + return 0; | 121 | -{ |
29 | } | 122 | - if (ret == HV_SUCCESS) { |
30 | if (size == 3) { | 123 | - return; |
31 | /* 64-bit element instructions. */ | 124 | - } |
32 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 125 | - |
33 | cpu_V1, cpu_V0); | 126 | - switch (ret) { |
34 | } | 127 | - case HV_ERROR: |
35 | break; | 128 | - error_report("Error: HV_ERROR"); |
36 | - case NEON_3R_VADD_VSUB: | 129 | - break; |
37 | - if (u) { | 130 | - case HV_BUSY: |
38 | - tcg_gen_sub_i64(CPU_V001); | 131 | - error_report("Error: HV_BUSY"); |
39 | - } else { | 132 | - break; |
40 | - tcg_gen_add_i64(CPU_V001); | 133 | - case HV_BAD_ARGUMENT: |
41 | - } | 134 | - error_report("Error: HV_BAD_ARGUMENT"); |
42 | - break; | 135 | - break; |
43 | default: | 136 | - case HV_NO_RESOURCES: |
44 | abort(); | 137 | - error_report("Error: HV_NO_RESOURCES"); |
45 | } | 138 | - break; |
46 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 139 | - case HV_NO_DEVICE: |
47 | tmp2 = neon_load_reg(rd, pass); | 140 | - error_report("Error: HV_NO_DEVICE"); |
48 | gen_neon_add(size, tmp, tmp2); | 141 | - break; |
49 | break; | 142 | - case HV_UNSUPPORTED: |
50 | - case NEON_3R_VADD_VSUB: | 143 | - error_report("Error: HV_UNSUPPORTED"); |
51 | - if (!u) { /* VADD */ | 144 | - break; |
52 | - gen_neon_add(size, tmp, tmp2); | 145 | - default: |
53 | - } else { /* VSUB */ | 146 | - error_report("Unknown Error"); |
54 | - switch (size) { | 147 | - } |
55 | - case 0: gen_helper_neon_sub_u8(tmp, tmp, tmp2); break; | 148 | - |
56 | - case 1: gen_helper_neon_sub_u16(tmp, tmp, tmp2); break; | 149 | - abort(); |
57 | - case 2: tcg_gen_sub_i32(tmp, tmp, tmp2); break; | 150 | -} |
58 | - default: abort(); | 151 | - |
59 | - } | 152 | /* Memory slots */ |
60 | - } | 153 | hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) |
61 | - break; | 154 | { |
62 | case NEON_3R_VTST_VCEQ: | 155 | diff --git a/MAINTAINERS b/MAINTAINERS |
63 | if (!u) { /* VTST */ | 156 | index XXXXXXX..XXXXXXX 100644 |
64 | switch (size) { | 157 | --- a/MAINTAINERS |
158 | +++ b/MAINTAINERS | ||
159 | @@ -XXX,XX +XXX,XX @@ M: Roman Bolshakov <r.bolshakov@yadro.com> | ||
160 | W: https://wiki.qemu.org/Features/HVF | ||
161 | S: Maintained | ||
162 | F: target/i386/hvf/ | ||
163 | + | ||
164 | +HVF | ||
165 | +M: Cameron Esfahani <dirty@apple.com> | ||
166 | +M: Roman Bolshakov <r.bolshakov@yadro.com> | ||
167 | +W: https://wiki.qemu.org/Features/HVF | ||
168 | +S: Maintained | ||
169 | +F: accel/hvf/ | ||
170 | F: include/sysemu/hvf.h | ||
171 | +F: include/sysemu/hvf_int.h | ||
172 | |||
173 | WHPX CPUs | ||
174 | M: Sunil Muthuswamy <sunilmut@microsoft.com> | ||
175 | diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build | ||
176 | new file mode 100644 | ||
177 | index XXXXXXX..XXXXXXX | ||
178 | --- /dev/null | ||
179 | +++ b/accel/hvf/meson.build | ||
180 | @@ -XXX,XX +XXX,XX @@ | ||
181 | +hvf_ss = ss.source_set() | ||
182 | +hvf_ss.add(files( | ||
183 | + 'hvf-all.c', | ||
184 | +)) | ||
185 | + | ||
186 | +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss) | ||
187 | diff --git a/accel/meson.build b/accel/meson.build | ||
188 | index XXXXXXX..XXXXXXX 100644 | ||
189 | --- a/accel/meson.build | ||
190 | +++ b/accel/meson.build | ||
191 | @@ -XXX,XX +XXX,XX @@ specific_ss.add(files('accel-common.c')) | ||
192 | softmmu_ss.add(files('accel-softmmu.c')) | ||
193 | user_ss.add(files('accel-user.c')) | ||
194 | |||
195 | +subdir('hvf') | ||
196 | subdir('qtest') | ||
197 | subdir('kvm') | ||
198 | subdir('tcg') | ||
65 | -- | 199 | -- |
66 | 2.19.1 | 200 | 2.20.1 |
67 | 201 | ||
68 | 202 | diff view generated by jsdifflib |
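
(For context on the relocated assert_hvf_ok(): every Hypervisor.framework call returns an hv_return_t, and the convention is to wrap each call so that anything other than HV_SUCCESS aborts with a readable message. The fragment below is an illustrative usage sketch only, assuming the x86_64 macOS Hypervisor API; it is not code from the patch.)

    #include <Hypervisor/hv.h>
    #include "sysemu/hvf_int.h"

    /* Hypothetical init path: create the VM and one vCPU, letting
     * assert_hvf_ok() (now in accel/hvf/hvf-all.c) abort on failure. */
    static void example_hvf_init(hv_vcpuid_t *vcpu)
    {
        assert_hvf_ok(hv_vm_create(HV_VM_DEFAULT));
        assert_hvf_ok(hv_vcpu_create(vcpu, HV_VCPU_DEFAULT));
    }
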
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 3 | Until now, Hypervisor.framework has only been available on x86_64 systems. |
4 | Message-id: 20181011205206.3552-8-richard.henderson@linaro.org | 4 | With Apple Silicon shipping now, it extends its reach to aarch64. To |
5 | prepare for support for multiple architectures, let's start moving common | ||
6 | code out into its own accel directory. | ||
7 | |||
8 | This patch moves the vCPU thread loop over. | ||
9 | |||
10 | Signed-off-by: Alexander Graf <agraf@csgraf.de> | ||
11 | Reviewed-by: Sergio Lopez <slp@redhat.com> | ||
12 | Message-id: 20210519202253.76782-3-agraf@csgraf.de | ||
5 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 13 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
6 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 14 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
7 | --- | 15 | --- |
8 | target/arm/translate.c | 67 ++++++++++++++++++++++++------------------ | 16 | {target/i386 => accel}/hvf/hvf-accel-ops.h | 0 |
9 | 1 file changed, 39 insertions(+), 28 deletions(-) | 17 | {target/i386 => accel}/hvf/hvf-accel-ops.c | 0 |
18 | target/i386/hvf/x86hvf.c | 2 +- | ||
19 | accel/hvf/meson.build | 1 + | ||
20 | target/i386/hvf/meson.build | 1 - | ||
21 | 5 files changed, 2 insertions(+), 2 deletions(-) | ||
22 | rename {target/i386 => accel}/hvf/hvf-accel-ops.h (100%) | ||
23 | rename {target/i386 => accel}/hvf/hvf-accel-ops.c (100%) | ||
10 | 24 | ||
11 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 25 | diff --git a/target/i386/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h |
26 | similarity index 100% | ||
27 | rename from target/i386/hvf/hvf-accel-ops.h | ||
28 | rename to accel/hvf/hvf-accel-ops.h | ||
29 | diff --git a/target/i386/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c | ||
30 | similarity index 100% | ||
31 | rename from target/i386/hvf/hvf-accel-ops.c | ||
32 | rename to accel/hvf/hvf-accel-ops.c | ||
33 | diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c | ||
12 | index XXXXXXX..XXXXXXX 100644 | 34 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/target/arm/translate.c | 35 | --- a/target/i386/hvf/x86hvf.c |
14 | +++ b/target/arm/translate.c | 36 | +++ b/target/i386/hvf/x86hvf.c |
15 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 37 | @@ -XXX,XX +XXX,XX @@ |
16 | return 1; | 38 | #include <Hypervisor/hv.h> |
17 | } | 39 | #include <Hypervisor/hv_vmx.h> |
18 | } else { /* (insn & 0x00380080) == 0 */ | 40 | |
19 | - int invert; | 41 | -#include "hvf-accel-ops.h" |
20 | + int invert, reg_ofs, vec_size; | 42 | +#include "accel/hvf/hvf-accel-ops.h" |
21 | + | 43 | |
22 | if (q && (rd & 1)) { | 44 | void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg, |
23 | return 1; | 45 | SegmentCache *qseg, bool is_tr) |
24 | } | 46 | diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build |
25 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 47 | index XXXXXXX..XXXXXXX 100644 |
26 | break; | 48 | --- a/accel/hvf/meson.build |
27 | case 14: | 49 | +++ b/accel/hvf/meson.build |
28 | imm |= (imm << 8) | (imm << 16) | (imm << 24); | 50 | @@ -XXX,XX +XXX,XX @@ |
29 | - if (invert) | 51 | hvf_ss = ss.source_set() |
30 | + if (invert) { | 52 | hvf_ss.add(files( |
31 | imm = ~imm; | 53 | 'hvf-all.c', |
32 | + } | 54 | + 'hvf-accel-ops.c', |
33 | break; | 55 | )) |
34 | case 15: | 56 | |
35 | if (invert) { | 57 | specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss) |
36 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 58 | diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build |
37 | | ((imm & 0x40) ? (0x1f << 25) : (1 << 30)); | 59 | index XXXXXXX..XXXXXXX 100644 |
38 | break; | 60 | --- a/target/i386/hvf/meson.build |
39 | } | 61 | +++ b/target/i386/hvf/meson.build |
40 | - if (invert) | 62 | @@ -XXX,XX +XXX,XX @@ |
41 | + if (invert) { | 63 | i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files( |
42 | imm = ~imm; | 64 | 'hvf.c', |
43 | + } | 65 | - 'hvf-accel-ops.c', |
44 | 66 | 'x86.c', | |
45 | - for (pass = 0; pass < (q ? 4 : 2); pass++) { | 67 | 'x86_cpuid.c', |
46 | - if (op & 1 && op < 12) { | 68 | 'x86_decode.c', |
47 | - tmp = neon_load_reg(rd, pass); | ||
48 | - if (invert) { | ||
49 | - /* The immediate value has already been inverted, so | ||
50 | - BIC becomes AND. */ | ||
51 | - tcg_gen_andi_i32(tmp, tmp, imm); | ||
52 | - } else { | ||
53 | - tcg_gen_ori_i32(tmp, tmp, imm); | ||
54 | - } | ||
55 | + reg_ofs = neon_reg_offset(rd, 0); | ||
56 | + vec_size = q ? 16 : 8; | ||
57 | + | ||
58 | + if (op & 1 && op < 12) { | ||
59 | + if (invert) { | ||
60 | + /* The immediate value has already been inverted, | ||
61 | + * so BIC becomes AND. | ||
62 | + */ | ||
63 | + tcg_gen_gvec_andi(MO_32, reg_ofs, reg_ofs, imm, | ||
64 | + vec_size, vec_size); | ||
65 | } else { | ||
66 | - /* VMOV, VMVN. */ | ||
67 | - tmp = tcg_temp_new_i32(); | ||
68 | - if (op == 14 && invert) { | ||
69 | - int n; | ||
70 | - uint32_t val; | ||
71 | - val = 0; | ||
72 | - for (n = 0; n < 4; n++) { | ||
73 | - if (imm & (1 << (n + (pass & 1) * 4))) | ||
74 | - val |= 0xff << (n * 8); | ||
75 | - } | ||
76 | - tcg_gen_movi_i32(tmp, val); | ||
77 | - } else { | ||
78 | - tcg_gen_movi_i32(tmp, imm); | ||
79 | - } | ||
80 | + tcg_gen_gvec_ori(MO_32, reg_ofs, reg_ofs, imm, | ||
81 | + vec_size, vec_size); | ||
82 | + } | ||
83 | + } else { | ||
84 | + /* VMOV, VMVN. */ | ||
85 | + if (op == 14 && invert) { | ||
86 | + TCGv_i64 t64 = tcg_temp_new_i64(); | ||
87 | + | ||
88 | + for (pass = 0; pass <= q; ++pass) { | ||
89 | + uint64_t val = 0; | ||
90 | + int n; | ||
91 | + | ||
92 | + for (n = 0; n < 8; n++) { | ||
93 | + if (imm & (1 << (n + pass * 8))) { | ||
94 | + val |= 0xffull << (n * 8); | ||
95 | + } | ||
96 | + } | ||
97 | + tcg_gen_movi_i64(t64, val); | ||
98 | + neon_store_reg64(t64, rd + pass); | ||
99 | + } | ||
100 | + tcg_temp_free_i64(t64); | ||
101 | + } else { | ||
102 | + tcg_gen_gvec_dup32i(reg_ofs, vec_size, vec_size, imm); | ||
103 | } | ||
104 | - neon_store_reg(rd, pass, tmp); | ||
105 | } | ||
106 | } | ||
107 | } else { /* (insn & 0x00800010 == 0x00800000) */ | ||
108 | -- | 69 | -- |
109 | 2.19.1 | 70 | 2.20.1 |
110 | 71 | ||
111 | 72 | diff view generated by jsdifflib |
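
(One detail of the Neon immediate change above that is easy to miss: in the op == 14 inverted case, each bit of the 8-bit immediate selects whether the corresponding byte of the 64-bit result is 0xff or 0x00, and the per-pass loop rebuilds that pattern for each 64-bit half of the register. A stand-alone model of one pass of that expansion, with an invented function name:)

    #include <stdint.h>

    /* Expand an 8-bit immediate so that bit n selects byte n of the
     * result: 1 -> 0xff, 0 -> 0x00. This mirrors the inner loop added
     * in the patch (pass 0; later passes test the next 8 bits). */
    static uint64_t expand_imm_to_bytes(uint8_t imm)
    {
        uint64_t val = 0;
        for (int n = 0; n < 8; n++) {
            if (imm & (1u << n)) {
                val |= 0xffull << (n * 8);
            }
        }
        return val;
    }
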
1 | The HCR.FB virtualization configuration register bit requests that | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | TLB maintenance, branch predictor invalidate-all and icache | ||
3 | invalidate-all operations performed in NS EL1 should be upgraded | ||
4 | from "local CPU only to "broadcast within Inner Shareable domain". | ||
5 | For QEMU we NOP the branch predictor and icache operations, so | ||
6 | we only need to upgrade the TLB invalidates: | ||
7 | AArch32 TLBIALL, TLBIMVA, TLBIASID, DTLBIALL, DTLBIMVA, DTLBIASID, | ||
8 | ITLBIALL, ITLBIMVA, ITLBIASID, TLBIMVAA, TLBIMVAL, TLBIMVAAL | ||
9 | AArch64 TLBI VMALLE1, TLBI VAE1, TLBI ASIDE1, TLBI VAAE1, | ||
10 | TLBI VALE1, TLBI VAALE1 | ||
11 | 2 | ||
3 | Until now, Hypervisor.framework has only been available on x86_64 systems. | ||
4 | With Apple Silicon shipping now, it extends its reach to aarch64. To | ||
5 | prepare for support for multiple architectures, let's start moving common | ||
6 | code out into its own accel directory. | ||
7 | |||
8 | This patch moves CPU and memory operations over. While at it, make sure | ||
9 | the code is consumable on non-i386 systems. | ||
10 | |||
11 | Signed-off-by: Alexander Graf <agraf@csgraf.de> | ||
12 | Reviewed-by: Sergio Lopez <slp@redhat.com> | ||
13 | Message-id: 20210519202253.76782-4-agraf@csgraf.de | ||
14 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
12 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 15 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
13 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | ||
14 | Message-id: 20181012144235.19646-4-peter.maydell@linaro.org | ||
15 | --- | 16 | --- |
16 | target/arm/helper.c | 191 +++++++++++++++++++++++++++----------------- | 17 | include/sysemu/hvf_int.h | 4 + |
17 | 1 file changed, 116 insertions(+), 75 deletions(-) | 18 | target/i386/hvf/hvf-i386.h | 2 - |
19 | target/i386/hvf/x86hvf.h | 2 - | ||
20 | accel/hvf/hvf-accel-ops.c | 308 ++++++++++++++++++++++++++++++++++++- | ||
21 | target/i386/hvf/hvf.c | 302 ------------------------------------ | ||
22 | 5 files changed, 311 insertions(+), 307 deletions(-) | ||
18 | 23 | ||
19 | diff --git a/target/arm/helper.c b/target/arm/helper.c | 24 | diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h |
20 | index XXXXXXX..XXXXXXX 100644 | 25 | index XXXXXXX..XXXXXXX 100644 |
21 | --- a/target/arm/helper.c | 26 | --- a/include/sysemu/hvf_int.h |
22 | +++ b/target/arm/helper.c | 27 | +++ b/include/sysemu/hvf_int.h |
23 | @@ -XXX,XX +XXX,XX @@ static void contextidr_write(CPUARMState *env, const ARMCPRegInfo *ri, | 28 | @@ -XXX,XX +XXX,XX @@ |
24 | raw_write(env, ri, value); | 29 | |
25 | } | 30 | #include <Hypervisor/hv.h> |
26 | 31 | ||
27 | -static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri, | 32 | +void hvf_set_phys_mem(MemoryRegionSection *, bool); |
28 | - uint64_t value) | 33 | void assert_hvf_ok(hv_return_t ret); |
29 | -{ | 34 | +hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); |
30 | - /* Invalidate all (TLBIALL) */ | 35 | +int hvf_put_registers(CPUState *); |
31 | - ARMCPU *cpu = arm_env_get_cpu(env); | 36 | +int hvf_get_registers(CPUState *); |
32 | - | 37 | |
33 | - tlb_flush(CPU(cpu)); | 38 | #endif |
34 | -} | 39 | diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h |
35 | - | 40 | index XXXXXXX..XXXXXXX 100644 |
36 | -static void tlbimva_write(CPUARMState *env, const ARMCPRegInfo *ri, | 41 | --- a/target/i386/hvf/hvf-i386.h |
37 | - uint64_t value) | 42 | +++ b/target/i386/hvf/hvf-i386.h |
38 | -{ | 43 | @@ -XXX,XX +XXX,XX @@ struct HVFState { |
39 | - /* Invalidate single TLB entry by MVA and ASID (TLBIMVA) */ | 44 | }; |
40 | - ARMCPU *cpu = arm_env_get_cpu(env); | 45 | extern HVFState *hvf_state; |
41 | - | 46 | |
42 | - tlb_flush_page(CPU(cpu), value & TARGET_PAGE_MASK); | 47 | -void hvf_set_phys_mem(MemoryRegionSection *, bool); |
43 | -} | 48 | void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int); |
44 | - | 49 | -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); |
45 | -static void tlbiasid_write(CPUARMState *env, const ARMCPRegInfo *ri, | 50 | |
46 | - uint64_t value) | 51 | #ifdef NEED_CPU_H |
47 | -{ | 52 | /* Functions exported to host specific mode */ |
48 | - /* Invalidate by ASID (TLBIASID) */ | 53 | diff --git a/target/i386/hvf/x86hvf.h b/target/i386/hvf/x86hvf.h |
49 | - ARMCPU *cpu = arm_env_get_cpu(env); | 54 | index XXXXXXX..XXXXXXX 100644 |
50 | - | 55 | --- a/target/i386/hvf/x86hvf.h |
51 | - tlb_flush(CPU(cpu)); | 56 | +++ b/target/i386/hvf/x86hvf.h |
52 | -} | 57 | @@ -XXX,XX +XXX,XX @@ |
53 | - | 58 | #include "x86_descr.h" |
54 | -static void tlbimvaa_write(CPUARMState *env, const ARMCPRegInfo *ri, | 59 | |
55 | - uint64_t value) | 60 | int hvf_process_events(CPUState *); |
56 | -{ | 61 | -int hvf_put_registers(CPUState *); |
57 | - /* Invalidate single entry by MVA, all ASIDs (TLBIMVAA) */ | 62 | -int hvf_get_registers(CPUState *); |
58 | - ARMCPU *cpu = arm_env_get_cpu(env); | 63 | bool hvf_inject_interrupts(CPUState *); |
59 | - | 64 | void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg, |
60 | - tlb_flush_page(CPU(cpu), value & TARGET_PAGE_MASK); | 65 | SegmentCache *qseg, bool is_tr); |
61 | -} | 66 | diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c |
62 | - | 67 | index XXXXXXX..XXXXXXX 100644 |
63 | /* IS variants of TLB operations must affect all cores */ | 68 | --- a/accel/hvf/hvf-accel-ops.c |
64 | static void tlbiall_is_write(CPUARMState *env, const ARMCPRegInfo *ri, | 69 | +++ b/accel/hvf/hvf-accel-ops.c |
65 | uint64_t value) | 70 | @@ -XXX,XX +XXX,XX @@ |
66 | @@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri, | 71 | #include "qemu/osdep.h" |
67 | tlb_flush_page_all_cpus_synced(cs, value & TARGET_PAGE_MASK); | 72 | #include "qemu/error-report.h" |
68 | } | 73 | #include "qemu/main-loop.h" |
69 | 74 | +#include "exec/address-spaces.h" | |
70 | +/* | 75 | +#include "exec/exec-all.h" |
71 | + * Non-IS variants of TLB operations are upgraded to | 76 | +#include "sysemu/cpus.h" |
72 | + * IS versions if we are at NS EL1 and HCR_EL2.FB is set to | 77 | #include "sysemu/hvf.h" |
73 | + * force broadcast of these operations. | 78 | +#include "sysemu/hvf_int.h" |
74 | + */ | 79 | #include "sysemu/runstate.h" |
75 | +static bool tlb_force_broadcast(CPUARMState *env) | 80 | -#include "target/i386/cpu.h" |
76 | +{ | 81 | #include "qemu/guest-random.h" |
77 | + return (env->cp15.hcr_el2 & HCR_FB) && | 82 | |
78 | + arm_current_el(env) == 1 && arm_is_secure_below_el3(env); | 83 | #include "hvf-accel-ops.h" |
79 | +} | 84 | |
80 | + | 85 | +HVFState *hvf_state; |
81 | +static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri, | 86 | + |
82 | + uint64_t value) | 87 | +/* Memory slots */ |
83 | +{ | 88 | + |
84 | + /* Invalidate all (TLBIALL) */ | 89 | +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) |
85 | + ARMCPU *cpu = arm_env_get_cpu(env); | 90 | +{ |
86 | + | 91 | + hvf_slot *slot; |
87 | + if (tlb_force_broadcast(env)) { | 92 | + int x; |
88 | + tlbiall_is_write(env, NULL, value); | 93 | + for (x = 0; x < hvf_state->num_slots; ++x) { |
94 | + slot = &hvf_state->slots[x]; | ||
95 | + if (slot->size && start < (slot->start + slot->size) && | ||
96 | + (start + size) > slot->start) { | ||
97 | + return slot; | ||
98 | + } | ||
99 | + } | ||
100 | + return NULL; | ||
101 | +} | ||
102 | + | ||
103 | +struct mac_slot { | ||
104 | + int present; | ||
105 | + uint64_t size; | ||
106 | + uint64_t gpa_start; | ||
107 | + uint64_t gva; | ||
108 | +}; | ||
109 | + | ||
110 | +struct mac_slot mac_slots[32]; | ||
111 | + | ||
112 | +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) | ||
113 | +{ | ||
114 | + struct mac_slot *macslot; | ||
115 | + hv_return_t ret; | ||
116 | + | ||
117 | + macslot = &mac_slots[slot->slot_id]; | ||
118 | + | ||
119 | + if (macslot->present) { | ||
120 | + if (macslot->size != slot->size) { | ||
121 | + macslot->present = 0; | ||
122 | + ret = hv_vm_unmap(macslot->gpa_start, macslot->size); | ||
123 | + assert_hvf_ok(ret); | ||
124 | + } | ||
125 | + } | ||
126 | + | ||
127 | + if (!slot->size) { | ||
128 | + return 0; | ||
129 | + } | ||
130 | + | ||
131 | + macslot->present = 1; | ||
132 | + macslot->gpa_start = slot->start; | ||
133 | + macslot->size = slot->size; | ||
134 | + ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags); | ||
135 | + assert_hvf_ok(ret); | ||
136 | + return 0; | ||
137 | +} | ||
138 | + | ||
139 | +void hvf_set_phys_mem(MemoryRegionSection *section, bool add) | ||
140 | +{ | ||
141 | + hvf_slot *mem; | ||
142 | + MemoryRegion *area = section->mr; | ||
143 | + bool writeable = !area->readonly && !area->rom_device; | ||
144 | + hv_memory_flags_t flags; | ||
145 | + | ||
146 | + if (!memory_region_is_ram(area)) { | ||
147 | + if (writeable) { | ||
148 | + return; | ||
149 | + } else if (!memory_region_is_romd(area)) { | ||
150 | + /* | ||
151 | + * If the memory device is not in romd_mode, then we actually want | ||
152 | + * to remove the hvf memory slot so all accesses will trap. | ||
153 | + */ | ||
154 | + add = false; | ||
155 | + } | ||
156 | + } | ||
157 | + | ||
158 | + mem = hvf_find_overlap_slot( | ||
159 | + section->offset_within_address_space, | ||
160 | + int128_get64(section->size)); | ||
161 | + | ||
162 | + if (mem && add) { | ||
163 | + if (mem->size == int128_get64(section->size) && | ||
164 | + mem->start == section->offset_within_address_space && | ||
165 | + mem->mem == (memory_region_get_ram_ptr(area) + | ||
166 | + section->offset_within_region)) { | ||
167 | + return; /* Same region was attempted to register, go away. */ | ||
168 | + } | ||
169 | + } | ||
170 | + | ||
171 | + /* Region needs to be reset. set the size to 0 and remap it. */ | ||
172 | + if (mem) { | ||
173 | + mem->size = 0; | ||
174 | + if (do_hvf_set_memory(mem, 0)) { | ||
175 | + error_report("Failed to reset overlapping slot"); | ||
176 | + abort(); | ||
177 | + } | ||
178 | + } | ||
179 | + | ||
180 | + if (!add) { | ||
89 | + return; | 181 | + return; |
90 | + } | 182 | + } |
91 | + | 183 | + |
92 | + tlb_flush(CPU(cpu)); | 184 | + if (area->readonly || |
93 | +} | 185 | + (!memory_region_is_ram(area) && memory_region_is_romd(area))) { |
94 | + | 186 | + flags = HV_MEMORY_READ | HV_MEMORY_EXEC; |
95 | +static void tlbimva_write(CPUARMState *env, const ARMCPRegInfo *ri, | 187 | + } else { |
96 | + uint64_t value) | 188 | + flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; |
97 | +{ | 189 | + } |
98 | + /* Invalidate single TLB entry by MVA and ASID (TLBIMVA) */ | 190 | + |
99 | + ARMCPU *cpu = arm_env_get_cpu(env); | 191 | + /* Now make a new slot. */ |
100 | + | 192 | + int x; |
101 | + if (tlb_force_broadcast(env)) { | 193 | + |
102 | + tlbimva_is_write(env, NULL, value); | 194 | + for (x = 0; x < hvf_state->num_slots; ++x) { |
195 | + mem = &hvf_state->slots[x]; | ||
196 | + if (!mem->size) { | ||
197 | + break; | ||
198 | + } | ||
199 | + } | ||
200 | + | ||
201 | + if (x == hvf_state->num_slots) { | ||
202 | + error_report("No free slots"); | ||
203 | + abort(); | ||
204 | + } | ||
205 | + | ||
206 | + mem->size = int128_get64(section->size); | ||
207 | + mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region; | ||
208 | + mem->start = section->offset_within_address_space; | ||
209 | + mem->region = area; | ||
210 | + | ||
211 | + if (do_hvf_set_memory(mem, flags)) { | ||
212 | + error_report("Error registering new memory slot"); | ||
213 | + abort(); | ||
214 | + } | ||
215 | +} | ||
216 | + | ||
217 | +static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg) | ||
218 | +{ | ||
219 | + if (!cpu->vcpu_dirty) { | ||
220 | + hvf_get_registers(cpu); | ||
221 | + cpu->vcpu_dirty = true; | ||
222 | + } | ||
223 | +} | ||
224 | + | ||
225 | +void hvf_cpu_synchronize_state(CPUState *cpu) | ||
226 | +{ | ||
227 | + if (!cpu->vcpu_dirty) { | ||
228 | + run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); | ||
229 | + } | ||
230 | +} | ||
231 | + | ||
232 | +static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, | ||
233 | + run_on_cpu_data arg) | ||
234 | +{ | ||
235 | + hvf_put_registers(cpu); | ||
236 | + cpu->vcpu_dirty = false; | ||
237 | +} | ||
238 | + | ||
239 | +void hvf_cpu_synchronize_post_reset(CPUState *cpu) | ||
240 | +{ | ||
241 | + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); | ||
242 | +} | ||
243 | + | ||
244 | +static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, | ||
245 | + run_on_cpu_data arg) | ||
246 | +{ | ||
247 | + hvf_put_registers(cpu); | ||
248 | + cpu->vcpu_dirty = false; | ||
249 | +} | ||
250 | + | ||
251 | +void hvf_cpu_synchronize_post_init(CPUState *cpu) | ||
252 | +{ | ||
253 | + run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); | ||
254 | +} | ||
255 | + | ||
256 | +static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, | ||
257 | + run_on_cpu_data arg) | ||
258 | +{ | ||
259 | + cpu->vcpu_dirty = true; | ||
260 | +} | ||
261 | + | ||
262 | +void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) | ||
263 | +{ | ||
264 | + run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); | ||
265 | +} | ||
266 | + | ||
267 | +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) | ||
268 | +{ | ||
269 | + hvf_slot *slot; | ||
270 | + | ||
271 | + slot = hvf_find_overlap_slot( | ||
272 | + section->offset_within_address_space, | ||
273 | + int128_get64(section->size)); | ||
274 | + | ||
275 | + /* protect region against writes; begin tracking it */ | ||
276 | + if (on) { | ||
277 | + slot->flags |= HVF_SLOT_LOG; | ||
278 | + hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, | ||
279 | + HV_MEMORY_READ); | ||
280 | + /* stop tracking region*/ | ||
281 | + } else { | ||
282 | + slot->flags &= ~HVF_SLOT_LOG; | ||
283 | + hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, | ||
284 | + HV_MEMORY_READ | HV_MEMORY_WRITE); | ||
285 | + } | ||
286 | +} | ||
287 | + | ||
288 | +static void hvf_log_start(MemoryListener *listener, | ||
289 | + MemoryRegionSection *section, int old, int new) | ||
290 | +{ | ||
291 | + if (old != 0) { | ||
103 | + return; | 292 | + return; |
104 | + } | 293 | + } |
105 | + | 294 | + |
106 | + tlb_flush_page(CPU(cpu), value & TARGET_PAGE_MASK); | 295 | + hvf_set_dirty_tracking(section, 1); |
107 | +} | 296 | +} |
108 | + | 297 | + |
109 | +static void tlbiasid_write(CPUARMState *env, const ARMCPRegInfo *ri, | 298 | +static void hvf_log_stop(MemoryListener *listener, |
110 | + uint64_t value) | 299 | + MemoryRegionSection *section, int old, int new) |
111 | +{ | 300 | +{ |
112 | + /* Invalidate by ASID (TLBIASID) */ | 301 | + if (new != 0) { |
113 | + ARMCPU *cpu = arm_env_get_cpu(env); | ||
114 | + | ||
115 | + if (tlb_force_broadcast(env)) { | ||
116 | + tlbiasid_is_write(env, NULL, value); | ||
117 | + return; | 302 | + return; |
118 | + } | 303 | + } |
119 | + | 304 | + |
120 | + tlb_flush(CPU(cpu)); | 305 | + hvf_set_dirty_tracking(section, 0); |
121 | +} | 306 | +} |
122 | + | 307 | + |
123 | +static void tlbimvaa_write(CPUARMState *env, const ARMCPRegInfo *ri, | 308 | +static void hvf_log_sync(MemoryListener *listener, |
124 | + uint64_t value) | 309 | + MemoryRegionSection *section) |
125 | +{ | 310 | +{ |
126 | + /* Invalidate single entry by MVA, all ASIDs (TLBIMVAA) */ | 311 | + /* |
127 | + ARMCPU *cpu = arm_env_get_cpu(env); | 312 | + * sync of dirty pages is handled elsewhere; just make sure we keep |
128 | + | 313 | + * tracking the region. |
129 | + if (tlb_force_broadcast(env)) { | 314 | + */ |
130 | + tlbimvaa_is_write(env, NULL, value); | 315 | + hvf_set_dirty_tracking(section, 1); |
131 | + return; | 316 | +} |
132 | + } | 317 | + |
133 | + | 318 | +static void hvf_region_add(MemoryListener *listener, |
134 | + tlb_flush_page(CPU(cpu), value & TARGET_PAGE_MASK); | 319 | + MemoryRegionSection *section) |
135 | +} | 320 | +{ |
136 | + | 321 | + hvf_set_phys_mem(section, true); |
137 | static void tlbiall_nsnh_write(CPUARMState *env, const ARMCPRegInfo *ri, | 322 | +} |
138 | uint64_t value) | 323 | + |
324 | +static void hvf_region_del(MemoryListener *listener, | ||
325 | + MemoryRegionSection *section) | ||
326 | +{ | ||
327 | + hvf_set_phys_mem(section, false); | ||
328 | +} | ||
329 | + | ||
330 | +static MemoryListener hvf_memory_listener = { | ||
331 | + .priority = 10, | ||
332 | + .region_add = hvf_region_add, | ||
333 | + .region_del = hvf_region_del, | ||
334 | + .log_start = hvf_log_start, | ||
335 | + .log_stop = hvf_log_stop, | ||
336 | + .log_sync = hvf_log_sync, | ||
337 | +}; | ||
338 | + | ||
339 | +static void dummy_signal(int sig) | ||
340 | +{ | ||
341 | +} | ||
342 | + | ||
343 | +bool hvf_allowed; | ||
344 | + | ||
345 | +static int hvf_accel_init(MachineState *ms) | ||
346 | +{ | ||
347 | + int x; | ||
348 | + hv_return_t ret; | ||
349 | + HVFState *s; | ||
350 | + | ||
351 | + ret = hv_vm_create(HV_VM_DEFAULT); | ||
352 | + assert_hvf_ok(ret); | ||
353 | + | ||
354 | + s = g_new0(HVFState, 1); | ||
355 | + | ||
356 | + s->num_slots = 32; | ||
357 | + for (x = 0; x < s->num_slots; ++x) { | ||
358 | + s->slots[x].size = 0; | ||
359 | + s->slots[x].slot_id = x; | ||
360 | + } | ||
361 | + | ||
362 | + hvf_state = s; | ||
363 | + memory_listener_register(&hvf_memory_listener, &address_space_memory); | ||
364 | + return 0; | ||
365 | +} | ||
366 | + | ||
367 | +static void hvf_accel_class_init(ObjectClass *oc, void *data) | ||
368 | +{ | ||
369 | + AccelClass *ac = ACCEL_CLASS(oc); | ||
370 | + ac->name = "HVF"; | ||
371 | + ac->init_machine = hvf_accel_init; | ||
372 | + ac->allowed = &hvf_allowed; | ||
373 | +} | ||
374 | + | ||
375 | +static const TypeInfo hvf_accel_type = { | ||
376 | + .name = TYPE_HVF_ACCEL, | ||
377 | + .parent = TYPE_ACCEL, | ||
378 | + .class_init = hvf_accel_class_init, | ||
379 | +}; | ||
380 | + | ||
381 | +static void hvf_type_init(void) | ||
382 | +{ | ||
383 | + type_register_static(&hvf_accel_type); | ||
384 | +} | ||
385 | + | ||
386 | +type_init(hvf_type_init); | ||
387 | + | ||
388 | /* | ||
389 | * The HVF-specific vCPU thread function. This one should only run when the host | ||
390 | * CPU supports the VMX "unrestricted guest" feature. | ||
391 | diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c | ||
392 | index XXXXXXX..XXXXXXX 100644 | ||
393 | --- a/target/i386/hvf/hvf.c | ||
394 | +++ b/target/i386/hvf/hvf.c | ||
395 | @@ -XXX,XX +XXX,XX @@ | ||
396 | |||
397 | #include "hvf-accel-ops.h" | ||
398 | |||
399 | -HVFState *hvf_state; | ||
400 | - | ||
401 | -/* Memory slots */ | ||
402 | -hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size) | ||
403 | -{ | ||
404 | - hvf_slot *slot; | ||
405 | - int x; | ||
406 | - for (x = 0; x < hvf_state->num_slots; ++x) { | ||
407 | - slot = &hvf_state->slots[x]; | ||
408 | - if (slot->size && start < (slot->start + slot->size) && | ||
409 | - (start + size) > slot->start) { | ||
410 | - return slot; | ||
411 | - } | ||
412 | - } | ||
413 | - return NULL; | ||
414 | -} | ||
415 | - | ||
416 | -struct mac_slot { | ||
417 | - int present; | ||
418 | - uint64_t size; | ||
419 | - uint64_t gpa_start; | ||
420 | - uint64_t gva; | ||
421 | -}; | ||
422 | - | ||
423 | -struct mac_slot mac_slots[32]; | ||
424 | - | ||
425 | -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) | ||
426 | -{ | ||
427 | - struct mac_slot *macslot; | ||
428 | - hv_return_t ret; | ||
429 | - | ||
430 | - macslot = &mac_slots[slot->slot_id]; | ||
431 | - | ||
432 | - if (macslot->present) { | ||
433 | - if (macslot->size != slot->size) { | ||
434 | - macslot->present = 0; | ||
435 | - ret = hv_vm_unmap(macslot->gpa_start, macslot->size); | ||
436 | - assert_hvf_ok(ret); | ||
437 | - } | ||
438 | - } | ||
439 | - | ||
440 | - if (!slot->size) { | ||
441 | - return 0; | ||
442 | - } | ||
443 | - | ||
444 | - macslot->present = 1; | ||
445 | - macslot->gpa_start = slot->start; | ||
446 | - macslot->size = slot->size; | ||
447 | - ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags); | ||
448 | - assert_hvf_ok(ret); | ||
449 | - return 0; | ||
450 | -} | ||
451 | - | ||
452 | -void hvf_set_phys_mem(MemoryRegionSection *section, bool add) | ||
453 | -{ | ||
454 | - hvf_slot *mem; | ||
455 | - MemoryRegion *area = section->mr; | ||
456 | - bool writeable = !area->readonly && !area->rom_device; | ||
457 | - hv_memory_flags_t flags; | ||
458 | - | ||
459 | - if (!memory_region_is_ram(area)) { | ||
460 | - if (writeable) { | ||
461 | - return; | ||
462 | - } else if (!memory_region_is_romd(area)) { | ||
463 | - /* | ||
464 | - * If the memory device is not in romd_mode, then we actually want | ||
465 | - * to remove the hvf memory slot so all accesses will trap. | ||
466 | - */ | ||
467 | - add = false; | ||
468 | - } | ||
469 | - } | ||
470 | - | ||
471 | - mem = hvf_find_overlap_slot( | ||
472 | - section->offset_within_address_space, | ||
473 | - int128_get64(section->size)); | ||
474 | - | ||
475 | - if (mem && add) { | ||
476 | - if (mem->size == int128_get64(section->size) && | ||
477 | - mem->start == section->offset_within_address_space && | ||
478 | - mem->mem == (memory_region_get_ram_ptr(area) + | ||
479 | - section->offset_within_region)) { | ||
480 | - return; /* Same region was attempted to register, go away. */ | ||
481 | - } | ||
482 | - } | ||
483 | - | ||
484 | - /* Region needs to be reset. set the size to 0 and remap it. */ | ||
485 | - if (mem) { | ||
486 | - mem->size = 0; | ||
487 | - if (do_hvf_set_memory(mem, 0)) { | ||
488 | - error_report("Failed to reset overlapping slot"); | ||
489 | - abort(); | ||
490 | - } | ||
491 | - } | ||
492 | - | ||
493 | - if (!add) { | ||
494 | - return; | ||
495 | - } | ||
496 | - | ||
497 | - if (area->readonly || | ||
498 | - (!memory_region_is_ram(area) && memory_region_is_romd(area))) { | ||
499 | - flags = HV_MEMORY_READ | HV_MEMORY_EXEC; | ||
500 | - } else { | ||
501 | - flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC; | ||
502 | - } | ||
503 | - | ||
504 | - /* Now make a new slot. */ | ||
505 | - int x; | ||
506 | - | ||
507 | - for (x = 0; x < hvf_state->num_slots; ++x) { | ||
508 | - mem = &hvf_state->slots[x]; | ||
509 | - if (!mem->size) { | ||
510 | - break; | ||
511 | - } | ||
512 | - } | ||
513 | - | ||
514 | - if (x == hvf_state->num_slots) { | ||
515 | - error_report("No free slots"); | ||
516 | - abort(); | ||
517 | - } | ||
518 | - | ||
519 | - mem->size = int128_get64(section->size); | ||
520 | - mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region; | ||
521 | - mem->start = section->offset_within_address_space; | ||
522 | - mem->region = area; | ||
523 | - | ||
524 | - if (do_hvf_set_memory(mem, flags)) { | ||
525 | - error_report("Error registering new memory slot"); | ||
526 | - abort(); | ||
527 | - } | ||
528 | -} | ||
529 | - | ||
530 | void vmx_update_tpr(CPUState *cpu) | ||
139 | { | 531 | { |
140 | @@ -XXX,XX +XXX,XX @@ static CPAccessResult aa64_cacheop_access(CPUARMState *env, | 532 | /* TODO: need integrate APIC handling */ |
141 | * Page D4-1736 (DDI0487A.b) | 533 | @@ -XXX,XX +XXX,XX @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer, |
142 | */ | ||
143 | |||
144 | -static void tlbi_aa64_vmalle1_write(CPUARMState *env, const ARMCPRegInfo *ri, | ||
145 | - uint64_t value) | ||
146 | -{ | ||
147 | - CPUState *cs = ENV_GET_CPU(env); | ||
148 | - | ||
149 | - if (arm_is_secure_below_el3(env)) { | ||
150 | - tlb_flush_by_mmuidx(cs, | ||
151 | - ARMMMUIdxBit_S1SE1 | | ||
152 | - ARMMMUIdxBit_S1SE0); | ||
153 | - } else { | ||
154 | - tlb_flush_by_mmuidx(cs, | ||
155 | - ARMMMUIdxBit_S12NSE1 | | ||
156 | - ARMMMUIdxBit_S12NSE0); | ||
157 | - } | ||
158 | -} | ||
159 | - | ||
160 | static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, | ||
161 | uint64_t value) | ||
162 | { | ||
163 | @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri, | ||
164 | } | 534 | } |
165 | } | 535 | } |
166 | 536 | ||
167 | +static void tlbi_aa64_vmalle1_write(CPUARMState *env, const ARMCPRegInfo *ri, | 537 | -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg) |
168 | + uint64_t value) | 538 | -{ |
169 | +{ | 539 | - if (!cpu->vcpu_dirty) { |
170 | + CPUState *cs = ENV_GET_CPU(env); | 540 | - hvf_get_registers(cpu); |
171 | + | 541 | - cpu->vcpu_dirty = true; |
172 | + if (tlb_force_broadcast(env)) { | 542 | - } |
173 | + tlbi_aa64_vmalle1_write(env, NULL, value); | 543 | -} |
174 | + return; | 544 | - |
175 | + } | 545 | -void hvf_cpu_synchronize_state(CPUState *cpu) |
176 | + | 546 | -{ |
177 | + if (arm_is_secure_below_el3(env)) { | 547 | - if (!cpu->vcpu_dirty) { |
178 | + tlb_flush_by_mmuidx(cs, | 548 | - run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); |
179 | + ARMMMUIdxBit_S1SE1 | | 549 | - } |
180 | + ARMMMUIdxBit_S1SE0); | 550 | -} |
181 | + } else { | 551 | - |
182 | + tlb_flush_by_mmuidx(cs, | 552 | -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, |
183 | + ARMMMUIdxBit_S12NSE1 | | 553 | - run_on_cpu_data arg) |
184 | + ARMMMUIdxBit_S12NSE0); | 554 | -{ |
185 | + } | 555 | - hvf_put_registers(cpu); |
186 | +} | 556 | - cpu->vcpu_dirty = false; |
187 | + | 557 | -} |
188 | static void tlbi_aa64_alle1_write(CPUARMState *env, const ARMCPRegInfo *ri, | 558 | - |
189 | uint64_t value) | 559 | -void hvf_cpu_synchronize_post_reset(CPUState *cpu) |
560 | -{ | ||
561 | - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); | ||
562 | -} | ||
563 | - | ||
564 | -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, | ||
565 | - run_on_cpu_data arg) | ||
566 | -{ | ||
567 | - hvf_put_registers(cpu); | ||
568 | - cpu->vcpu_dirty = false; | ||
569 | -} | ||
570 | - | ||
571 | -void hvf_cpu_synchronize_post_init(CPUState *cpu) | ||
572 | -{ | ||
573 | - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); | ||
574 | -} | ||
575 | - | ||
576 | -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, | ||
577 | - run_on_cpu_data arg) | ||
578 | -{ | ||
579 | - cpu->vcpu_dirty = true; | ||
580 | -} | ||
581 | - | ||
582 | -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) | ||
583 | -{ | ||
584 | - run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); | ||
585 | -} | ||
586 | - | ||
587 | static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) | ||
190 | { | 588 | { |
191 | @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_alle3is_write(CPUARMState *env, const ARMCPRegInfo *ri, | 589 | int read, write; |
192 | tlb_flush_by_mmuidx_all_cpus_synced(cs, ARMMMUIdxBit_S1E3); | 590 | @@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) |
591 | return false; | ||
193 | } | 592 | } |
194 | 593 | ||
195 | -static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, | 594 | -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) |
196 | - uint64_t value) | 595 | -{ |
197 | -{ | 596 | - hvf_slot *slot; |
198 | - /* Invalidate by VA, EL1&0 (AArch64 version). | 597 | - |
199 | - * Currently handles all of VAE1, VAAE1, VAALE1 and VALE1, | 598 | - slot = hvf_find_overlap_slot( |
200 | - * since we don't support flush-for-specific-ASID-only or | 599 | - section->offset_within_address_space, |
201 | - * flush-last-level-only. | 600 | - int128_get64(section->size)); |
601 | - | ||
602 | - /* protect region against writes; begin tracking it */ | ||
603 | - if (on) { | ||
604 | - slot->flags |= HVF_SLOT_LOG; | ||
605 | - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, | ||
606 | - HV_MEMORY_READ); | ||
607 | - /* stop tracking region*/ | ||
608 | - } else { | ||
609 | - slot->flags &= ~HVF_SLOT_LOG; | ||
610 | - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, | ||
611 | - HV_MEMORY_READ | HV_MEMORY_WRITE); | ||
612 | - } | ||
613 | -} | ||
614 | - | ||
615 | -static void hvf_log_start(MemoryListener *listener, | ||
616 | - MemoryRegionSection *section, int old, int new) | ||
617 | -{ | ||
618 | - if (old != 0) { | ||
619 | - return; | ||
620 | - } | ||
621 | - | ||
622 | - hvf_set_dirty_tracking(section, 1); | ||
623 | -} | ||
624 | - | ||
625 | -static void hvf_log_stop(MemoryListener *listener, | ||
626 | - MemoryRegionSection *section, int old, int new) | ||
627 | -{ | ||
628 | - if (new != 0) { | ||
629 | - return; | ||
630 | - } | ||
631 | - | ||
632 | - hvf_set_dirty_tracking(section, 0); | ||
633 | -} | ||
634 | - | ||
635 | -static void hvf_log_sync(MemoryListener *listener, | ||
636 | - MemoryRegionSection *section) | ||
637 | -{ | ||
638 | - /* | ||
639 | - * sync of dirty pages is handled elsewhere; just make sure we keep | ||
640 | - * tracking the region. | ||
202 | - */ | 641 | - */ |
203 | - ARMCPU *cpu = arm_env_get_cpu(env); | 642 | - hvf_set_dirty_tracking(section, 1); |
204 | - CPUState *cs = CPU(cpu); | 643 | -} |
205 | - uint64_t pageaddr = sextract64(value << 12, 0, 56); | 644 | - |
206 | - | 645 | -static void hvf_region_add(MemoryListener *listener, |
207 | - if (arm_is_secure_below_el3(env)) { | 646 | - MemoryRegionSection *section) |
208 | - tlb_flush_page_by_mmuidx(cs, pageaddr, | 647 | -{ |
209 | - ARMMMUIdxBit_S1SE1 | | 648 | - hvf_set_phys_mem(section, true); |
210 | - ARMMMUIdxBit_S1SE0); | 649 | -} |
211 | - } else { | 650 | - |
212 | - tlb_flush_page_by_mmuidx(cs, pageaddr, | 651 | -static void hvf_region_del(MemoryListener *listener, |
213 | - ARMMMUIdxBit_S12NSE1 | | 652 | - MemoryRegionSection *section) |
214 | - ARMMMUIdxBit_S12NSE0); | 653 | -{ |
215 | - } | 654 | - hvf_set_phys_mem(section, false); |
216 | -} | 655 | -} |
217 | - | 656 | - |
218 | static void tlbi_aa64_vae2_write(CPUARMState *env, const ARMCPRegInfo *ri, | 657 | -static MemoryListener hvf_memory_listener = { |
219 | uint64_t value) | 658 | - .priority = 10, |
659 | - .region_add = hvf_region_add, | ||
660 | - .region_del = hvf_region_del, | ||
661 | - .log_start = hvf_log_start, | ||
662 | - .log_stop = hvf_log_stop, | ||
663 | - .log_sync = hvf_log_sync, | ||
664 | -}; | ||
665 | - | ||
666 | void hvf_vcpu_destroy(CPUState *cpu) | ||
220 | { | 667 | { |
221 | @@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri, | 668 | X86CPU *x86_cpu = X86_CPU(cpu); |
222 | } | 669 | @@ -XXX,XX +XXX,XX @@ void hvf_vcpu_destroy(CPUState *cpu) |
670 | assert_hvf_ok(ret); | ||
223 | } | 671 | } |
224 | 672 | ||
225 | +static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri, | 673 | -static void dummy_signal(int sig) |
226 | + uint64_t value) | 674 | -{ |
227 | +{ | 675 | -} |
228 | + /* Invalidate by VA, EL1&0 (AArch64 version). | 676 | - |
229 | + * Currently handles all of VAE1, VAAE1, VAALE1 and VALE1, | 677 | static void init_tsc_freq(CPUX86State *env) |
230 | + * since we don't support flush-for-specific-ASID-only or | ||
231 | + * flush-last-level-only. | ||
232 | + */ | ||
233 | + ARMCPU *cpu = arm_env_get_cpu(env); | ||
234 | + CPUState *cs = CPU(cpu); | ||
235 | + uint64_t pageaddr = sextract64(value << 12, 0, 56); | ||
236 | + | ||
237 | + if (tlb_force_broadcast(env)) { | ||
238 | + tlbi_aa64_vae1is_write(env, NULL, value); | ||
239 | + return; | ||
240 | + } | ||
241 | + | ||
242 | + if (arm_is_secure_below_el3(env)) { | ||
243 | + tlb_flush_page_by_mmuidx(cs, pageaddr, | ||
244 | + ARMMMUIdxBit_S1SE1 | | ||
245 | + ARMMMUIdxBit_S1SE0); | ||
246 | + } else { | ||
247 | + tlb_flush_page_by_mmuidx(cs, pageaddr, | ||
248 | + ARMMMUIdxBit_S12NSE1 | | ||
249 | + ARMMMUIdxBit_S12NSE0); | ||
250 | + } | ||
251 | +} | ||
252 | + | ||
253 | static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri, | ||
254 | uint64_t value) | ||
255 | { | 678 | { |
679 | size_t length; | ||
680 | @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) | ||
681 | |||
682 | return ret; | ||
683 | } | ||
684 | - | ||
685 | -bool hvf_allowed; | ||
686 | - | ||
687 | -static int hvf_accel_init(MachineState *ms) | ||
688 | -{ | ||
689 | - int x; | ||
690 | - hv_return_t ret; | ||
691 | - HVFState *s; | ||
692 | - | ||
693 | - ret = hv_vm_create(HV_VM_DEFAULT); | ||
694 | - assert_hvf_ok(ret); | ||
695 | - | ||
696 | - s = g_new0(HVFState, 1); | ||
697 | - | ||
698 | - s->num_slots = 32; | ||
699 | - for (x = 0; x < s->num_slots; ++x) { | ||
700 | - s->slots[x].size = 0; | ||
701 | - s->slots[x].slot_id = x; | ||
702 | - } | ||
703 | - | ||
704 | - hvf_state = s; | ||
705 | - memory_listener_register(&hvf_memory_listener, &address_space_memory); | ||
706 | - return 0; | ||
707 | -} | ||
708 | - | ||
709 | -static void hvf_accel_class_init(ObjectClass *oc, void *data) | ||
710 | -{ | ||
711 | - AccelClass *ac = ACCEL_CLASS(oc); | ||
712 | - ac->name = "HVF"; | ||
713 | - ac->init_machine = hvf_accel_init; | ||
714 | - ac->allowed = &hvf_allowed; | ||
715 | -} | ||
716 | - | ||
717 | -static const TypeInfo hvf_accel_type = { | ||
718 | - .name = TYPE_HVF_ACCEL, | ||
719 | - .parent = TYPE_ACCEL, | ||
720 | - .class_init = hvf_accel_class_init, | ||
721 | -}; | ||
722 | - | ||
723 | -static void hvf_type_init(void) | ||
724 | -{ | ||
725 | - type_register_static(&hvf_accel_type); | ||
726 | -} | ||
727 | - | ||
728 | -type_init(hvf_type_init); | ||
256 | -- | 729 | -- |
257 | 2.19.1 | 730 | 2.20.1 |
258 | 731 | ||
259 | 732 | diff view generated by jsdifflib |
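The memory-slot handling moved into accel/hvf/hvf-accel-ops.c above keeps a small fixed array of slots and, in hvf_find_overlap_slot(), returns the first slot whose guest-physical range intersects the requested one. The intersection test is the usual half-open interval check: two ranges overlap when each one starts below the other's end. A minimal standalone sketch of that predicate (range_t and the sample values are illustrative stand-ins, not QEMU's hvf_slot):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for QEMU's hvf_slot: just a guest-physical range. */
typedef struct {
    uint64_t start;
    uint64_t size;
} range_t;

/* Half-open interval overlap test, same shape as in hvf_find_overlap_slot(). */
static bool ranges_overlap(const range_t *slot, uint64_t start, uint64_t size)
{
    return slot->size &&
           start < slot->start + slot->size &&
           start + size > slot->start;
}

int main(void)
{
    range_t slot = { .start = 0x1000, .size = 0x2000 };      /* [0x1000, 0x3000) */

    printf("%d\n", ranges_overlap(&slot, 0x2000, 0x1000));   /* 1: inside the slot */
    printf("%d\n", ranges_overlap(&slot, 0x3000, 0x1000));   /* 0: starts right after it */
    printf("%d\n", ranges_overlap(&slot, 0x0800, 0x0800));   /* 0: ends right before it */
    return 0;
}
```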
1 | From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com> | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | 2 | ||
3 | Announce the availability of the various priority queues. | 3 | Until now, Hypervisor.framework has only been available on x86_64 systems. |
4 | This fixes an issue where guest kernels would fail to | 4 | With Apple Silicon shipping now, it extends its reach to aarch64. To |
5 | configure secondary queues due to improper feature bits. | 5 | prepare for support for multiple architectures, let's start moving common |
6 | code out into its own accel directory. | ||
6 | 7 | ||
7 | Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> | 8 | This patch moves a few internal struct and constant defines over. |
8 | Message-id: 20181017213932.19973-2-edgar.iglesias@gmail.com | 9 | |
10 | Signed-off-by: Alexander Graf <agraf@csgraf.de> | ||
11 | Reviewed-by: Sergio Lopez <slp@redhat.com> | ||
12 | Message-id: 20210519202253.76782-5-agraf@csgraf.de | ||
9 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 13 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 14 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
11 | --- | 15 | --- |
12 | hw/net/cadence_gem.c | 8 +++++++- | 16 | include/sysemu/hvf_int.h | 30 ++++++++++++++++++++++++++++++ |
13 | 1 file changed, 7 insertions(+), 1 deletion(-) | 17 | target/i386/hvf/hvf-i386.h | 31 +------------------------------ |
18 | 2 files changed, 31 insertions(+), 30 deletions(-) | ||
14 | 19 | ||
15 | diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c | 20 | diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h |
16 | index XXXXXXX..XXXXXXX 100644 | 21 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/hw/net/cadence_gem.c | 22 | --- a/include/sysemu/hvf_int.h |
18 | +++ b/hw/net/cadence_gem.c | 23 | +++ b/include/sysemu/hvf_int.h |
19 | @@ -XXX,XX +XXX,XX @@ static void gem_reset(DeviceState *d) | 24 | @@ -XXX,XX +XXX,XX @@ |
20 | int i; | 25 | |
21 | CadenceGEMState *s = CADENCE_GEM(d); | 26 | #include <Hypervisor/hv.h> |
22 | const uint8_t *a; | 27 | |
23 | + uint32_t queues_mask = 0; | 28 | +/* hvf_slot flags */ |
24 | 29 | +#define HVF_SLOT_LOG (1 << 0) | |
25 | DB_PRINT("\n"); | ||
26 | |||
27 | @@ -XXX,XX +XXX,XX @@ static void gem_reset(DeviceState *d) | ||
28 | s->regs[GEM_DESCONF] = 0x02500111; | ||
29 | s->regs[GEM_DESCONF2] = 0x2ab13fff; | ||
30 | s->regs[GEM_DESCONF5] = 0x002f2045; | ||
31 | - s->regs[GEM_DESCONF6] = 0x00000200; | ||
32 | + s->regs[GEM_DESCONF6] = 0x0; | ||
33 | + | 30 | + |
34 | + if (s->num_priority_queues > 1) { | 31 | +typedef struct hvf_slot { |
35 | + queues_mask = MAKE_64BIT_MASK(1, s->num_priority_queues - 1); | 32 | + uint64_t start; |
36 | + s->regs[GEM_DESCONF6] |= queues_mask; | 33 | + uint64_t size; |
37 | + } | 34 | + uint8_t *mem; |
38 | 35 | + int slot_id; | |
39 | /* Set MAC address */ | 36 | + uint32_t flags; |
40 | a = &s->conf.macaddr.a[0]; | 37 | + MemoryRegion *region; |
38 | +} hvf_slot; | ||
39 | + | ||
40 | +typedef struct hvf_vcpu_caps { | ||
41 | + uint64_t vmx_cap_pinbased; | ||
42 | + uint64_t vmx_cap_procbased; | ||
43 | + uint64_t vmx_cap_procbased2; | ||
44 | + uint64_t vmx_cap_entry; | ||
45 | + uint64_t vmx_cap_exit; | ||
46 | + uint64_t vmx_cap_preemption_timer; | ||
47 | +} hvf_vcpu_caps; | ||
48 | + | ||
49 | +struct HVFState { | ||
50 | + AccelState parent; | ||
51 | + hvf_slot slots[32]; | ||
52 | + int num_slots; | ||
53 | + | ||
54 | + hvf_vcpu_caps *hvf_caps; | ||
55 | +}; | ||
56 | +extern HVFState *hvf_state; | ||
57 | + | ||
58 | void hvf_set_phys_mem(MemoryRegionSection *, bool); | ||
59 | void assert_hvf_ok(hv_return_t ret); | ||
60 | hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); | ||
61 | diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h | ||
62 | index XXXXXXX..XXXXXXX 100644 | ||
63 | --- a/target/i386/hvf/hvf-i386.h | ||
64 | +++ b/target/i386/hvf/hvf-i386.h | ||
65 | @@ -XXX,XX +XXX,XX @@ | ||
66 | |||
67 | #include "qemu/accel.h" | ||
68 | #include "sysemu/hvf.h" | ||
69 | +#include "sysemu/hvf_int.h" | ||
70 | #include "cpu.h" | ||
71 | #include "x86.h" | ||
72 | |||
73 | -/* hvf_slot flags */ | ||
74 | -#define HVF_SLOT_LOG (1 << 0) | ||
75 | - | ||
76 | -typedef struct hvf_slot { | ||
77 | - uint64_t start; | ||
78 | - uint64_t size; | ||
79 | - uint8_t *mem; | ||
80 | - int slot_id; | ||
81 | - uint32_t flags; | ||
82 | - MemoryRegion *region; | ||
83 | -} hvf_slot; | ||
84 | - | ||
85 | -typedef struct hvf_vcpu_caps { | ||
86 | - uint64_t vmx_cap_pinbased; | ||
87 | - uint64_t vmx_cap_procbased; | ||
88 | - uint64_t vmx_cap_procbased2; | ||
89 | - uint64_t vmx_cap_entry; | ||
90 | - uint64_t vmx_cap_exit; | ||
91 | - uint64_t vmx_cap_preemption_timer; | ||
92 | -} hvf_vcpu_caps; | ||
93 | - | ||
94 | -struct HVFState { | ||
95 | - AccelState parent; | ||
96 | - hvf_slot slots[32]; | ||
97 | - int num_slots; | ||
98 | - | ||
99 | - hvf_vcpu_caps *hvf_caps; | ||
100 | -}; | ||
101 | -extern HVFState *hvf_state; | ||
102 | - | ||
103 | void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int); | ||
104 | |||
105 | #ifdef NEED_CPU_H | ||
41 | -- | 106 | -- |
42 | 2.19.1 | 107 | 2.20.1 |
43 | 108 | ||
44 | 109 | diff view generated by jsdifflib |
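In the cadence_gem change above, GEM_DESCONF6 advertises which of the optional priority queues exist: queue 0 is always present, so only bits 1..(num_priority_queues - 1) get set, hence MAKE_64BIT_MASK(1, s->num_priority_queues - 1). A small host-side sketch of the resulting bits for a few queue counts, using a local copy of the mask helper (the real macro lives in QEMU's qemu/bitops.h):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for QEMU's MAKE_64BIT_MASK(shift, length):
 * a mask of `length` set bits starting at bit `shift`. */
#define MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

int main(void)
{
    for (int num_queues = 2; num_queues <= 8; num_queues *= 2) {
        /* Queue 0 always exists; DESCONF6 only advertises queues 1..N-1. */
        uint64_t queues_mask = MAKE_64BIT_MASK(1, num_queues - 1);
        printf("queues=%d -> DESCONF6 |= 0x%02" PRIx64 "\n",
               num_queues, queues_mask);
    }
    return 0;
}
```

The `num_priority_queues > 1` guard in the patch is what keeps the length argument from reaching zero; MAKE_64BIT_MASK(1, 0) would shift by 64, which is undefined behaviour in C.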
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | 2 | ||
3 | Also introduces neon_element_offset to find the env offset | 3 | The hvf_set_phys_mem() function is only called within the same file. |
4 | of a specific element within a neon register. | 4 | Make it static. |
5 | 5 | ||
6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 6 | Signed-off-by: Alexander Graf <agraf@csgraf.de> |
7 | Message-id: 20181011205206.3552-7-richard.henderson@linaro.org | 7 | Reviewed-by: Sergio Lopez <slp@redhat.com> |
8 | Message-id: 20210519202253.76782-6-agraf@csgraf.de | ||
8 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
10 | --- | 11 | --- |
11 | target/arm/translate.c | 63 ++++++++++++++++++++++++------------------ | 12 | include/sysemu/hvf_int.h | 1 - |
12 | 1 file changed, 36 insertions(+), 27 deletions(-) | 13 | accel/hvf/hvf-accel-ops.c | 2 +- |
14 | 2 files changed, 1 insertion(+), 2 deletions(-) | ||
13 | 15 | ||
14 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 16 | diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h |
15 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
16 | --- a/target/arm/translate.c | 18 | --- a/include/sysemu/hvf_int.h |
17 | +++ b/target/arm/translate.c | 19 | +++ b/include/sysemu/hvf_int.h |
18 | @@ -XXX,XX +XXX,XX @@ neon_reg_offset (int reg, int n) | 20 | @@ -XXX,XX +XXX,XX @@ struct HVFState { |
19 | return vfp_reg_offset(0, sreg); | 21 | }; |
22 | extern HVFState *hvf_state; | ||
23 | |||
24 | -void hvf_set_phys_mem(MemoryRegionSection *, bool); | ||
25 | void assert_hvf_ok(hv_return_t ret); | ||
26 | hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); | ||
27 | int hvf_put_registers(CPUState *); | ||
28 | diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c | ||
29 | index XXXXXXX..XXXXXXX 100644 | ||
30 | --- a/accel/hvf/hvf-accel-ops.c | ||
31 | +++ b/accel/hvf/hvf-accel-ops.c | ||
32 | @@ -XXX,XX +XXX,XX @@ static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) | ||
33 | return 0; | ||
20 | } | 34 | } |
21 | 35 | ||
22 | +/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE, | 36 | -void hvf_set_phys_mem(MemoryRegionSection *section, bool add) |
23 | + * where 0 is the least significant end of the register. | 37 | +static void hvf_set_phys_mem(MemoryRegionSection *section, bool add) |
24 | + */ | ||
25 | +static inline long | ||
26 | +neon_element_offset(int reg, int element, TCGMemOp size) | ||
27 | +{ | ||
28 | + int element_size = 1 << size; | ||
29 | + int ofs = element * element_size; | ||
30 | +#ifdef HOST_WORDS_BIGENDIAN | ||
31 | + /* Calculate the offset assuming fully little-endian, | ||
32 | + * then XOR to account for the order of the 8-byte units. | ||
33 | + */ | ||
34 | + if (element_size < 8) { | ||
35 | + ofs ^= 8 - element_size; | ||
36 | + } | ||
37 | +#endif | ||
38 | + return neon_reg_offset(reg, 0) + ofs; | ||
39 | +} | ||
40 | + | ||
41 | static TCGv_i32 neon_load_reg(int reg, int pass) | ||
42 | { | 38 | { |
43 | TCGv_i32 tmp = tcg_temp_new_i32(); | 39 | hvf_slot *mem; |
44 | @@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn) | 40 | MemoryRegion *area = section->mr; |
45 | tmp = load_reg(s, rd); | ||
46 | if (insn & (1 << 23)) { | ||
47 | /* VDUP */ | ||
48 | - if (size == 0) { | ||
49 | - gen_neon_dup_u8(tmp, 0); | ||
50 | - } else if (size == 1) { | ||
51 | - gen_neon_dup_low16(tmp); | ||
52 | - } | ||
53 | - for (n = 0; n <= pass * 2; n++) { | ||
54 | - tmp2 = tcg_temp_new_i32(); | ||
55 | - tcg_gen_mov_i32(tmp2, tmp); | ||
56 | - neon_store_reg(rn, n, tmp2); | ||
57 | - } | ||
58 | - neon_store_reg(rn, n, tmp); | ||
59 | + int vec_size = pass ? 16 : 8; | ||
60 | + tcg_gen_gvec_dup_i32(size, neon_reg_offset(rn, 0), | ||
61 | + vec_size, vec_size, tmp); | ||
62 | + tcg_temp_free_i32(tmp); | ||
63 | } else { | ||
64 | /* VMOV */ | ||
65 | switch (size) { | ||
66 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
67 | tcg_temp_free_i32(tmp); | ||
68 | } else if ((insn & 0x380) == 0) { | ||
69 | /* VDUP */ | ||
70 | + int element; | ||
71 | + TCGMemOp size; | ||
72 | + | ||
73 | if ((insn & (7 << 16)) == 0 || (q && (rd & 1))) { | ||
74 | return 1; | ||
75 | } | ||
76 | - if (insn & (1 << 19)) { | ||
77 | - tmp = neon_load_reg(rm, 1); | ||
78 | - } else { | ||
79 | - tmp = neon_load_reg(rm, 0); | ||
80 | - } | ||
81 | if (insn & (1 << 16)) { | ||
82 | - gen_neon_dup_u8(tmp, ((insn >> 17) & 3) * 8); | ||
83 | + size = MO_8; | ||
84 | + element = (insn >> 17) & 7; | ||
85 | } else if (insn & (1 << 17)) { | ||
86 | - if ((insn >> 18) & 1) | ||
87 | - gen_neon_dup_high16(tmp); | ||
88 | - else | ||
89 | - gen_neon_dup_low16(tmp); | ||
90 | + size = MO_16; | ||
91 | + element = (insn >> 18) & 3; | ||
92 | + } else { | ||
93 | + size = MO_32; | ||
94 | + element = (insn >> 19) & 1; | ||
95 | } | ||
96 | - for (pass = 0; pass < (q ? 4 : 2); pass++) { | ||
97 | - tmp2 = tcg_temp_new_i32(); | ||
98 | - tcg_gen_mov_i32(tmp2, tmp); | ||
99 | - neon_store_reg(rd, pass, tmp2); | ||
100 | - } | ||
101 | - tcg_temp_free_i32(tmp); | ||
102 | + tcg_gen_gvec_dup_mem(size, neon_reg_offset(rd, 0), | ||
103 | + neon_element_offset(rm, element, size), | ||
104 | + q ? 16 : 8, q ? 16 : 8); | ||
105 | } else { | ||
106 | return 1; | ||
107 | } | ||
108 | -- | 41 | -- |
109 | 2.19.1 | 42 | 2.20.1 |
110 | 43 | ||
111 | 44 | diff view generated by jsdifflib |
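The neon_element_offset() helper introduced above computes where a given element lives inside a Neon register held in CPUARMState. On a little-endian host that is simply element * element_size; on a big-endian host each 64-bit unit stores its bytes in the opposite order, so the offset is XORed with (8 - element_size) to reach the same logical element. A standalone sketch of the arithmetic (it omits the neon_reg_offset() base the real helper adds):

```c
#include <stdio.h>

/* Offset of element `element` of size 2**size bytes within a Neon register,
 * mirroring the arithmetic in neon_element_offset() (minus the register base). */
static int offset_le(int element, int size)
{
    return element * (1 << size);
}

static int offset_be(int element, int size)
{
    int element_size = 1 << size;
    int ofs = element * element_size;
    if (element_size < 8) {
        ofs ^= 8 - element_size;    /* reorder within each 8-byte unit */
    }
    return ofs;
}

int main(void)
{
    /* 16-bit elements (size == 1, i.e. MO_16), elements 0..3 of a D register. */
    for (int el = 0; el < 4; el++) {
        printf("element %d: LE offset %d, BE offset %d\n",
               el, offset_le(el, 1), offset_be(el, 1));
    }
    return 0;
}
```

For 16-bit elements the XOR turns offsets 0, 2, 4, 6 into 6, 4, 2, 0: it reverses the lane order within the 8-byte unit without changing which logical lane is addressed.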
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | 2 | ||
3 | For a sequence of loads or stores from a single register, | 3 | The ARM version of Hypervisor.framework no longer defines these two |
4 | little-endian operations can be promoted to an 8-byte op. | 4 | types, so let's just revert to standard ones. |
5 | This can reduce the number of operations by a factor of 8. | ||
6 | 5 | ||
7 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 6 | Signed-off-by: Alexander Graf <agraf@csgraf.de> |
8 | Message-id: 20181011205206.3552-20-richard.henderson@linaro.org | 7 | Reviewed-by: Sergio Lopez <slp@redhat.com> |
9 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | 8 | Message-id: 20210519202253.76782-7-agraf@csgraf.de |
10 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
11 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
12 | --- | 11 | --- |
13 | target/arm/translate.c | 10 ++++++++++ | 12 | accel/hvf/hvf-accel-ops.c | 6 +++--- |
14 | 1 file changed, 10 insertions(+) | 13 | 1 file changed, 3 insertions(+), 3 deletions(-) |
15 | 14 | ||
16 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 15 | diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c |
17 | index XXXXXXX..XXXXXXX 100644 | 16 | index XXXXXXX..XXXXXXX 100644 |
18 | --- a/target/arm/translate.c | 17 | --- a/accel/hvf/hvf-accel-ops.c |
19 | +++ b/target/arm/translate.c | 18 | +++ b/accel/hvf/hvf-accel-ops.c |
20 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) | 19 | @@ -XXX,XX +XXX,XX @@ static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags) |
21 | if (size == 3 && (interleave | spacing) != 1) { | 20 | macslot->present = 1; |
22 | return 1; | 21 | macslot->gpa_start = slot->start; |
23 | } | 22 | macslot->size = slot->size; |
24 | + /* For our purposes, bytes are always little-endian. */ | 23 | - ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags); |
25 | + if (size == 0) { | 24 | + ret = hv_vm_map(slot->mem, slot->start, slot->size, flags); |
26 | + endian = MO_LE; | 25 | assert_hvf_ok(ret); |
27 | + } | 26 | return 0; |
28 | + /* Consecutive little-endian elements from a single register | 27 | } |
29 | + * can be promoted to a larger little-endian operation. | 28 | @@ -XXX,XX +XXX,XX @@ static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) |
30 | + */ | 29 | /* protect region against writes; begin tracking it */ |
31 | + if (interleave == 1 && endian == MO_LE) { | 30 | if (on) { |
32 | + size = 3; | 31 | slot->flags |= HVF_SLOT_LOG; |
33 | + } | 32 | - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, |
34 | tmp64 = tcg_temp_new_i64(); | 33 | + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, |
35 | addr = tcg_temp_new_i32(); | 34 | HV_MEMORY_READ); |
36 | tmp2 = tcg_const_i32(1 << size); | 35 | /* stop tracking region*/ |
36 | } else { | ||
37 | slot->flags &= ~HVF_SLOT_LOG; | ||
38 | - hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size, | ||
39 | + hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size, | ||
40 | HV_MEMORY_READ | HV_MEMORY_WRITE); | ||
41 | } | ||
42 | } | ||
37 | -- | 43 | -- |
38 | 2.19.1 | 44 | 2.20.1 |
39 | 45 | ||
40 | 46 | diff view generated by jsdifflib |
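The promotion above is legal because, for little-endian data, several consecutive elements written at increasing addresses produce exactly the same bytes as one wider little-endian element that contains them; only the number of memory operations changes. A tiny host-side check of that equivalence for the eight MO_8 accesses versus one MO_64 access (it assumes a little-endian host, matching the guest byte order being modelled):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Eight byte-sized elements, written at increasing addresses... */
    uint8_t bytes[8] = { 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88 };

    /* ...versus the single 64-bit little-endian value holding the same data. */
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)bytes[i] << (8 * i);
    }

    uint8_t via_u64[8];
    memcpy(via_u64, &v, sizeof(v));          /* assumes a little-endian host */

    printf("same memory image: %s\n",
           memcmp(bytes, via_u64, 8) == 0 ? "yes" : "no");
    return 0;
}
```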
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | 2 | ||
3 | Instead of shifts and masks, use direct loads and stores from | 3 | Until now, Hypervisor.framework has only been available on x86_64 systems. |
4 | the neon register file. | 4 | With Apple Silicon shipping now, it extends its reach to aarch64. To |
5 | prepare for support for multiple architectures, let's start moving common | ||
6 | code out into its own accel directory. | ||
5 | 7 | ||
6 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 8 | This patch splits the vcpu init and destroy functions into a generic and |
7 | Message-id: 20181011205206.3552-21-richard.henderson@linaro.org | 9 | an architecture-specific portion. This also allows us to move the generic |
10 | functions into the generic hvf code, removing exported functions. | ||
11 | |||
12 | Signed-off-by: Alexander Graf <agraf@csgraf.de> | ||
13 | Reviewed-by: Sergio Lopez <slp@redhat.com> | ||
14 | Message-id: 20210519202253.76782-8-agraf@csgraf.de | ||
8 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 15 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 16 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
10 | --- | 17 | --- |
11 | target/arm/translate.c | 92 +++++++++++++++++++++++------------------- | 18 | accel/hvf/hvf-accel-ops.h | 2 -- |
12 | 1 file changed, 50 insertions(+), 42 deletions(-) | 19 | include/sysemu/hvf_int.h | 2 ++ |
20 | accel/hvf/hvf-accel-ops.c | 30 ++++++++++++++++++++++++++++++ | ||
21 | target/i386/hvf/hvf.c | 23 ++--------------------- | ||
22 | 4 files changed, 34 insertions(+), 23 deletions(-) | ||
13 | 23 | ||
14 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 24 | diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h |
15 | index XXXXXXX..XXXXXXX 100644 | 25 | index XXXXXXX..XXXXXXX 100644 |
16 | --- a/target/arm/translate.c | 26 | --- a/accel/hvf/hvf-accel-ops.h |
17 | +++ b/target/arm/translate.c | 27 | +++ b/accel/hvf/hvf-accel-ops.h |
18 | @@ -XXX,XX +XXX,XX @@ static TCGv_i32 neon_load_reg(int reg, int pass) | 28 | @@ -XXX,XX +XXX,XX @@ |
19 | return tmp; | 29 | |
20 | } | 30 | #include "sysemu/cpus.h" |
21 | 31 | ||
22 | +static void neon_load_element(TCGv_i32 var, int reg, int ele, TCGMemOp mop) | 32 | -int hvf_init_vcpu(CPUState *); |
33 | int hvf_vcpu_exec(CPUState *); | ||
34 | void hvf_cpu_synchronize_state(CPUState *); | ||
35 | void hvf_cpu_synchronize_post_reset(CPUState *); | ||
36 | void hvf_cpu_synchronize_post_init(CPUState *); | ||
37 | void hvf_cpu_synchronize_pre_loadvm(CPUState *); | ||
38 | -void hvf_vcpu_destroy(CPUState *); | ||
39 | |||
40 | #endif /* HVF_CPUS_H */ | ||
41 | diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h | ||
42 | index XXXXXXX..XXXXXXX 100644 | ||
43 | --- a/include/sysemu/hvf_int.h | ||
44 | +++ b/include/sysemu/hvf_int.h | ||
45 | @@ -XXX,XX +XXX,XX @@ struct HVFState { | ||
46 | extern HVFState *hvf_state; | ||
47 | |||
48 | void assert_hvf_ok(hv_return_t ret); | ||
49 | +int hvf_arch_init_vcpu(CPUState *cpu); | ||
50 | +void hvf_arch_vcpu_destroy(CPUState *cpu); | ||
51 | hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); | ||
52 | int hvf_put_registers(CPUState *); | ||
53 | int hvf_get_registers(CPUState *); | ||
54 | diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c | ||
55 | index XXXXXXX..XXXXXXX 100644 | ||
56 | --- a/accel/hvf/hvf-accel-ops.c | ||
57 | +++ b/accel/hvf/hvf-accel-ops.c | ||
58 | @@ -XXX,XX +XXX,XX @@ static void hvf_type_init(void) | ||
59 | |||
60 | type_init(hvf_type_init); | ||
61 | |||
62 | +static void hvf_vcpu_destroy(CPUState *cpu) | ||
23 | +{ | 63 | +{ |
24 | + long offset = neon_element_offset(reg, ele, mop & MO_SIZE); | 64 | + hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd); |
65 | + assert_hvf_ok(ret); | ||
25 | + | 66 | + |
26 | + switch (mop) { | 67 | + hvf_arch_vcpu_destroy(cpu); |
27 | + case MO_UB: | ||
28 | + tcg_gen_ld8u_i32(var, cpu_env, offset); | ||
29 | + break; | ||
30 | + case MO_UW: | ||
31 | + tcg_gen_ld16u_i32(var, cpu_env, offset); | ||
32 | + break; | ||
33 | + case MO_UL: | ||
34 | + tcg_gen_ld_i32(var, cpu_env, offset); | ||
35 | + break; | ||
36 | + default: | ||
37 | + g_assert_not_reached(); | ||
38 | + } | ||
39 | +} | 68 | +} |
40 | + | 69 | + |
41 | static void neon_load_element64(TCGv_i64 var, int reg, int ele, TCGMemOp mop) | 70 | +static int hvf_init_vcpu(CPUState *cpu) |
42 | { | ||
43 | long offset = neon_element_offset(reg, ele, mop & MO_SIZE); | ||
44 | @@ -XXX,XX +XXX,XX @@ static void neon_store_reg(int reg, int pass, TCGv_i32 var) | ||
45 | tcg_temp_free_i32(var); | ||
46 | } | ||
47 | |||
48 | +static void neon_store_element(int reg, int ele, TCGMemOp size, TCGv_i32 var) | ||
49 | +{ | 71 | +{ |
50 | + long offset = neon_element_offset(reg, ele, size); | 72 | + int r; |
51 | + | 73 | + |
52 | + switch (size) { | 74 | + /* init cpu signals */ |
53 | + case MO_8: | 75 | + sigset_t set; |
54 | + tcg_gen_st8_i32(var, cpu_env, offset); | 76 | + struct sigaction sigact; |
55 | + break; | 77 | + |
56 | + case MO_16: | 78 | + memset(&sigact, 0, sizeof(sigact)); |
57 | + tcg_gen_st16_i32(var, cpu_env, offset); | 79 | + sigact.sa_handler = dummy_signal; |
58 | + break; | 80 | + sigaction(SIG_IPI, &sigact, NULL); |
59 | + case MO_32: | 81 | + |
60 | + tcg_gen_st_i32(var, cpu_env, offset); | 82 | + pthread_sigmask(SIG_BLOCK, NULL, &set); |
61 | + break; | 83 | + sigdelset(&set, SIG_IPI); |
62 | + default: | 84 | + |
63 | + g_assert_not_reached(); | 85 | + r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); |
64 | + } | 86 | + cpu->vcpu_dirty = 1; |
87 | + assert_hvf_ok(r); | ||
88 | + | ||
89 | + return hvf_arch_init_vcpu(cpu); | ||
65 | +} | 90 | +} |
66 | + | 91 | + |
67 | static void neon_store_element64(int reg, int ele, TCGMemOp size, TCGv_i64 var) | 92 | /* |
93 | * The HVF-specific vCPU thread function. This one should only run when the host | ||
94 | * CPU supports the VMX "unrestricted guest" feature. | ||
95 | diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c | ||
96 | index XXXXXXX..XXXXXXX 100644 | ||
97 | --- a/target/i386/hvf/hvf.c | ||
98 | +++ b/target/i386/hvf/hvf.c | ||
99 | @@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual) | ||
100 | return false; | ||
101 | } | ||
102 | |||
103 | -void hvf_vcpu_destroy(CPUState *cpu) | ||
104 | +void hvf_arch_vcpu_destroy(CPUState *cpu) | ||
68 | { | 105 | { |
69 | long offset = neon_element_offset(reg, ele, size); | 106 | X86CPU *x86_cpu = X86_CPU(cpu); |
70 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) | 107 | CPUX86State *env = &x86_cpu->env; |
71 | int stride; | 108 | |
72 | int size; | 109 | - hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd); |
73 | int reg; | 110 | g_free(env->hvf_mmio_buf); |
74 | - int pass; | 111 | - assert_hvf_ok(ret); |
75 | int load; | 112 | } |
76 | - int shift; | 113 | |
77 | int n; | 114 | static void init_tsc_freq(CPUX86State *env) |
78 | int vec_size; | 115 | @@ -XXX,XX +XXX,XX @@ static inline bool apic_bus_freq_is_known(CPUX86State *env) |
79 | int mmu_idx; | 116 | return env->apic_bus_freq != 0; |
80 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) | 117 | } |
81 | } else { | 118 | |
82 | /* Single element. */ | 119 | -int hvf_init_vcpu(CPUState *cpu) |
83 | int idx = (insn >> 4) & 0xf; | 120 | +int hvf_arch_init_vcpu(CPUState *cpu) |
84 | - pass = (insn >> 7) & 1; | 121 | { |
85 | + int reg_idx; | 122 | - |
86 | switch (size) { | 123 | X86CPU *x86cpu = X86_CPU(cpu); |
87 | case 0: | 124 | CPUX86State *env = &x86cpu->env; |
88 | - shift = ((insn >> 5) & 3) * 8; | 125 | - int r; |
89 | + reg_idx = (insn >> 5) & 7; | 126 | - |
90 | stride = 1; | 127 | - /* init cpu signals */ |
91 | break; | 128 | - sigset_t set; |
92 | case 1: | 129 | - struct sigaction sigact; |
93 | - shift = ((insn >> 6) & 1) * 16; | 130 | - |
94 | + reg_idx = (insn >> 6) & 3; | 131 | - memset(&sigact, 0, sizeof(sigact)); |
95 | stride = (insn & (1 << 5)) ? 2 : 1; | 132 | - sigact.sa_handler = dummy_signal; |
96 | break; | 133 | - sigaction(SIG_IPI, &sigact, NULL); |
97 | case 2: | 134 | - |
98 | - shift = 0; | 135 | - pthread_sigmask(SIG_BLOCK, NULL, &set); |
99 | + reg_idx = (insn >> 7) & 1; | 136 | - sigdelset(&set, SIG_IPI); |
100 | stride = (insn & (1 << 6)) ? 2 : 1; | 137 | |
101 | break; | 138 | init_emu(); |
102 | default: | 139 | init_decoder(); |
103 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) | 140 | @@ -XXX,XX +XXX,XX @@ int hvf_init_vcpu(CPUState *cpu) |
104 | */ | ||
105 | return 1; | ||
106 | } | ||
107 | + tmp = tcg_temp_new_i32(); | ||
108 | addr = tcg_temp_new_i32(); | ||
109 | load_reg_var(s, addr, rn); | ||
110 | for (reg = 0; reg < nregs; reg++) { | ||
111 | if (load) { | ||
112 | - tmp = tcg_temp_new_i32(); | ||
113 | - switch (size) { | ||
114 | - case 0: | ||
115 | - gen_aa32_ld8u(s, tmp, addr, get_mem_index(s)); | ||
116 | - break; | ||
117 | - case 1: | ||
118 | - gen_aa32_ld16u(s, tmp, addr, get_mem_index(s)); | ||
119 | - break; | ||
120 | - case 2: | ||
121 | - gen_aa32_ld32u(s, tmp, addr, get_mem_index(s)); | ||
122 | - break; | ||
123 | - default: /* Avoid compiler warnings. */ | ||
124 | - abort(); | ||
125 | - } | ||
126 | - if (size != 2) { | ||
127 | - tmp2 = neon_load_reg(rd, pass); | ||
128 | - tcg_gen_deposit_i32(tmp, tmp2, tmp, | ||
129 | - shift, size ? 16 : 8); | ||
130 | - tcg_temp_free_i32(tmp2); | ||
131 | - } | ||
132 | - neon_store_reg(rd, pass, tmp); | ||
133 | + gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s), | ||
134 | + s->be_data | size); | ||
135 | + neon_store_element(rd, reg_idx, size, tmp); | ||
136 | } else { /* Store */ | ||
137 | - tmp = neon_load_reg(rd, pass); | ||
138 | - if (shift) | ||
139 | - tcg_gen_shri_i32(tmp, tmp, shift); | ||
140 | - switch (size) { | ||
141 | - case 0: | ||
142 | - gen_aa32_st8(s, tmp, addr, get_mem_index(s)); | ||
143 | - break; | ||
144 | - case 1: | ||
145 | - gen_aa32_st16(s, tmp, addr, get_mem_index(s)); | ||
146 | - break; | ||
147 | - case 2: | ||
148 | - gen_aa32_st32(s, tmp, addr, get_mem_index(s)); | ||
149 | - break; | ||
150 | - } | ||
151 | - tcg_temp_free_i32(tmp); | ||
152 | + neon_load_element(tmp, rd, reg_idx, size); | ||
153 | + gen_aa32_st_i32(s, tmp, addr, get_mem_index(s), | ||
154 | + s->be_data | size); | ||
155 | } | ||
156 | rd += stride; | ||
157 | tcg_gen_addi_i32(addr, addr, 1 << size); | ||
158 | } | ||
159 | tcg_temp_free_i32(addr); | ||
160 | + tcg_temp_free_i32(tmp); | ||
161 | stride = nregs * (1 << size); | ||
162 | } | 141 | } |
163 | } | 142 | } |
143 | |||
144 | - r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); | ||
145 | - cpu->vcpu_dirty = 1; | ||
146 | - assert_hvf_ok(r); | ||
147 | - | ||
148 | if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED, | ||
149 | &hvf_state->hvf_caps->vmx_cap_pinbased)) { | ||
150 | abort(); | ||
164 | -- | 151 | -- |
165 | 2.19.1 | 152 | 2.20.1 |
166 | 153 | ||
167 | 154 | diff view generated by jsdifflib |
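The split above follows a simple layering rule: the generic accelerator code owns the vcpu lifecycle (signal setup, hv_vcpu_create() and hv_vcpu_destroy()) and the architecture only has to provide hvf_arch_init_vcpu() and hvf_arch_vcpu_destroy(), so nothing else needs to stay exported from the per-target files. A compilable miniature of that shape, with illustrative names rather than the QEMU ones:

```c
#include <stdio.h>

/* Illustrative names only -- not the QEMU functions. */
struct vcpu {
    int fd;          /* stands in for the handle from hv_vcpu_create() */
    int arch_ready;
};

/* Architecture-specific part: all an architecture has to provide. */
static int arch_init_vcpu(struct vcpu *v)
{
    v->arch_ready = 1;
    return 0;
}

/* Generic part: owns signal setup / vcpu creation, then defers to the arch hook. */
static int generic_init_vcpu(struct vcpu *v)
{
    v->fd = 42;
    return arch_init_vcpu(v);
}

int main(void)
{
    struct vcpu v = { 0 };
    int r = generic_init_vcpu(&v);
    printf("init=%d fd=%d arch_ready=%d\n", r, v.fd, v.arch_ready);
    return 0;
}
```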
1 | From: Richard Henderson <rth@twiddle.net> | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | 2 | ||
3 | This can reduce the number of opcodes required for certain | 3 | There is no reason to call the hvf-specific hvf_cpu_synchronize_state() |
4 | complex forms of load-multiple (e.g. ld4.16b). | 4 | when we can just use the generic cpu_synchronize_state() instead. This |
5 | allows us to have less dependency on internal function definitions and | ||
6 | allows us to make hvf_cpu_synchronize_state() static. | ||
5 | 7 | ||
6 | Signed-off-by: Richard Henderson <rth@twiddle.net> | 8 | Signed-off-by: Alexander Graf <agraf@csgraf.de> |
7 | Message-id: 20181011205206.3552-2-richard.henderson@linaro.org | 9 | Reviewed-by: Sergio Lopez <slp@redhat.com> |
10 | Message-id: 20210519202253.76782-9-agraf@csgraf.de | ||
8 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 11 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 12 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
10 | --- | 13 | --- |
11 | target/arm/translate-a64.c | 12 ++++++++---- | 14 | accel/hvf/hvf-accel-ops.h | 1 - |
12 | 1 file changed, 8 insertions(+), 4 deletions(-) | 15 | accel/hvf/hvf-accel-ops.c | 2 +- |
16 | target/i386/hvf/x86hvf.c | 9 ++++----- | ||
17 | 3 files changed, 5 insertions(+), 7 deletions(-) | ||
13 | 18 | ||
14 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | 19 | diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h |
15 | index XXXXXXX..XXXXXXX 100644 | 20 | index XXXXXXX..XXXXXXX 100644 |
16 | --- a/target/arm/translate-a64.c | 21 | --- a/accel/hvf/hvf-accel-ops.h |
17 | +++ b/target/arm/translate-a64.c | 22 | +++ b/accel/hvf/hvf-accel-ops.h |
18 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn) | 23 | @@ -XXX,XX +XXX,XX @@ |
19 | bool is_store = !extract32(insn, 22, 1); | 24 | #include "sysemu/cpus.h" |
20 | bool is_postidx = extract32(insn, 23, 1); | 25 | |
21 | bool is_q = extract32(insn, 30, 1); | 26 | int hvf_vcpu_exec(CPUState *); |
22 | - TCGv_i64 tcg_addr, tcg_rn; | 27 | -void hvf_cpu_synchronize_state(CPUState *); |
23 | + TCGv_i64 tcg_addr, tcg_rn, tcg_ebytes; | 28 | void hvf_cpu_synchronize_post_reset(CPUState *); |
24 | 29 | void hvf_cpu_synchronize_post_init(CPUState *); | |
25 | int ebytes = 1 << size; | 30 | void hvf_cpu_synchronize_pre_loadvm(CPUState *); |
26 | int elements = (is_q ? 128 : 64) / (8 << size); | 31 | diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c |
27 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn) | 32 | index XXXXXXX..XXXXXXX 100644 |
28 | tcg_rn = cpu_reg_sp(s, rn); | 33 | --- a/accel/hvf/hvf-accel-ops.c |
29 | tcg_addr = tcg_temp_new_i64(); | 34 | +++ b/accel/hvf/hvf-accel-ops.c |
30 | tcg_gen_mov_i64(tcg_addr, tcg_rn); | 35 | @@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg) |
31 | + tcg_ebytes = tcg_const_i64(ebytes); | ||
32 | |||
33 | for (r = 0; r < rpt; r++) { | ||
34 | int e; | ||
35 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn) | ||
36 | clear_vec_high(s, is_q, tt); | ||
37 | } | ||
38 | } | ||
39 | - tcg_gen_addi_i64(tcg_addr, tcg_addr, ebytes); | ||
40 | + tcg_gen_add_i64(tcg_addr, tcg_addr, tcg_ebytes); | ||
41 | tt = (tt + 1) % 32; | ||
42 | } | ||
43 | } | ||
44 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_multiple_struct(DisasContext *s, uint32_t insn) | ||
45 | tcg_gen_add_i64(tcg_rn, tcg_rn, cpu_reg(s, rm)); | ||
46 | } | ||
47 | } | 36 | } |
48 | + tcg_temp_free_i64(tcg_ebytes); | ||
49 | tcg_temp_free_i64(tcg_addr); | ||
50 | } | 37 | } |
51 | 38 | ||
52 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn) | 39 | -void hvf_cpu_synchronize_state(CPUState *cpu) |
53 | bool replicate = false; | 40 | +static void hvf_cpu_synchronize_state(CPUState *cpu) |
54 | int index = is_q << 3 | S << 2 | size; | 41 | { |
55 | int ebytes, xs; | 42 | if (!cpu->vcpu_dirty) { |
56 | - TCGv_i64 tcg_addr, tcg_rn; | 43 | run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL); |
57 | + TCGv_i64 tcg_addr, tcg_rn, tcg_ebytes; | 44 | diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c |
58 | 45 | index XXXXXXX..XXXXXXX 100644 | |
59 | switch (scale) { | 46 | --- a/target/i386/hvf/x86hvf.c |
60 | case 3: | 47 | +++ b/target/i386/hvf/x86hvf.c |
61 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn) | 48 | @@ -XXX,XX +XXX,XX @@ |
62 | tcg_rn = cpu_reg_sp(s, rn); | 49 | #include "cpu.h" |
63 | tcg_addr = tcg_temp_new_i64(); | 50 | #include "x86_descr.h" |
64 | tcg_gen_mov_i64(tcg_addr, tcg_rn); | 51 | #include "x86_decode.h" |
65 | + tcg_ebytes = tcg_const_i64(ebytes); | 52 | +#include "sysemu/hw_accel.h" |
66 | 53 | ||
67 | for (xs = 0; xs < selem; xs++) { | 54 | #include "hw/i386/apic_internal.h" |
68 | if (replicate) { | 55 | |
69 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn) | 56 | #include <Hypervisor/hv.h> |
70 | do_vec_st(s, rt, index, tcg_addr, scale); | 57 | #include <Hypervisor/hv_vmx.h> |
71 | } | 58 | |
72 | } | 59 | -#include "accel/hvf/hvf-accel-ops.h" |
73 | - tcg_gen_addi_i64(tcg_addr, tcg_addr, ebytes); | 60 | - |
74 | + tcg_gen_add_i64(tcg_addr, tcg_addr, tcg_ebytes); | 61 | void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg, |
75 | rt = (rt + 1) % 32; | 62 | SegmentCache *qseg, bool is_tr) |
63 | { | ||
64 | @@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state) | ||
65 | env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS); | ||
66 | |||
67 | if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) { | ||
68 | - hvf_cpu_synchronize_state(cpu_state); | ||
69 | + cpu_synchronize_state(cpu_state); | ||
70 | do_cpu_init(cpu); | ||
76 | } | 71 | } |
77 | 72 | ||
78 | @@ -XXX,XX +XXX,XX @@ static void disas_ldst_single_struct(DisasContext *s, uint32_t insn) | 73 | @@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state) |
79 | tcg_gen_add_i64(tcg_rn, tcg_rn, cpu_reg(s, rm)); | 74 | cpu_state->halted = 0; |
80 | } | ||
81 | } | 75 | } |
82 | + tcg_temp_free_i64(tcg_ebytes); | 76 | if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) { |
83 | tcg_temp_free_i64(tcg_addr); | 77 | - hvf_cpu_synchronize_state(cpu_state); |
84 | } | 78 | + cpu_synchronize_state(cpu_state); |
85 | 79 | do_cpu_sipi(cpu); | |
80 | } | ||
81 | if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) { | ||
82 | cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR; | ||
83 | - hvf_cpu_synchronize_state(cpu_state); | ||
84 | + cpu_synchronize_state(cpu_state); | ||
85 | apic_handle_tpr_access_report(cpu->apic_state, env->eip, | ||
86 | env->tpr_access_type); | ||
87 | } | ||
86 | -- | 88 | -- |
87 | 2.19.1 | 89 | 2.20.1 |
88 | 90 | ||
89 | 91 | diff view generated by jsdifflib |
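Behind cpu_synchronize_state() sits the same lazy-sync idiom the hvf code already uses: register state is pulled from the hypervisor at most once while the vCPU is stopped, and vcpu_dirty marks the cached copy as authoritative until the vCPU runs again. A small sketch of that idiom (toy_cpu and fetch_registers() are stand-ins, not QEMU APIs):

```c
#include <stdbool.h>
#include <stdio.h>

/* toy_cpu and fetch_registers() are stand-ins, not QEMU APIs. */
struct toy_cpu {
    bool vcpu_dirty;   /* cached register copy is authoritative */
    int fetches;       /* how often we really asked the hypervisor */
};

static void fetch_registers(struct toy_cpu *cpu)
{
    cpu->fetches++;    /* stands in for hvf_get_registers() */
}

static void synchronize_state(struct toy_cpu *cpu)
{
    if (!cpu->vcpu_dirty) {
        fetch_registers(cpu);
        cpu->vcpu_dirty = true;
    }
}

int main(void)
{
    struct toy_cpu cpu = { 0 };
    synchronize_state(&cpu);    /* first call really fetches */
    synchronize_state(&cpu);    /* second call is a no-op */
    printf("fetches = %d\n", cpu.fetches);    /* prints 1 */
    return 0;
}
```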
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | 2 | ||
3 | Create struct ARMISARegisters, to be accessed during translation. | 3 | The hvf accel synchronize functions are only used as input for local |
4 | callback functions, so we can make them static. | ||
4 | 5 | ||
5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 6 | Signed-off-by: Alexander Graf <agraf@csgraf.de> |
6 | Message-id: 20181016223115.24100-2-richard.henderson@linaro.org | 7 | Reviewed-by: Sergio Lopez <slp@redhat.com> |
8 | Message-id: 20210519202253.76782-10-agraf@csgraf.de | ||
7 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
8 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
9 | --- | 11 | --- |
10 | target/arm/cpu.h | 32 ++++---- | 12 | accel/hvf/hvf-accel-ops.h | 3 --- |
11 | hw/intc/armv7m_nvic.c | 12 +-- | 13 | accel/hvf/hvf-accel-ops.c | 6 +++--- |
12 | target/arm/cpu.c | 178 +++++++++++++++++++++--------------------- | 14 | 2 files changed, 3 insertions(+), 6 deletions(-) |
13 | target/arm/cpu64.c | 70 ++++++++--------- | ||
14 | target/arm/helper.c | 28 +++---- | ||
15 | 5 files changed, 162 insertions(+), 158 deletions(-) | ||
16 | 15 | ||
17 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h | 16 | diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h |
18 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
19 | --- a/target/arm/cpu.h | 18 | --- a/accel/hvf/hvf-accel-ops.h |
20 | +++ b/target/arm/cpu.h | 19 | +++ b/accel/hvf/hvf-accel-ops.h |
21 | @@ -XXX,XX +XXX,XX @@ struct ARMCPU { | 20 | @@ -XXX,XX +XXX,XX @@ |
22 | * ARMv7AR ARM Architecture Reference Manual. A reset_ prefix | 21 | #include "sysemu/cpus.h" |
23 | * is used for reset values of non-constant registers; no reset_ | 22 | |
24 | * prefix means a constant register. | 23 | int hvf_vcpu_exec(CPUState *); |
25 | + * Some of these registers are split out into a substructure that | 24 | -void hvf_cpu_synchronize_post_reset(CPUState *); |
26 | + * is shared with the translators to control the ISA. | 25 | -void hvf_cpu_synchronize_post_init(CPUState *); |
27 | */ | 26 | -void hvf_cpu_synchronize_pre_loadvm(CPUState *); |
28 | + struct ARMISARegisters { | 27 | |
29 | + uint32_t id_isar0; | 28 | #endif /* HVF_CPUS_H */ |
30 | + uint32_t id_isar1; | 29 | diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c |
31 | + uint32_t id_isar2; | ||
32 | + uint32_t id_isar3; | ||
33 | + uint32_t id_isar4; | ||
34 | + uint32_t id_isar5; | ||
35 | + uint32_t id_isar6; | ||
36 | + uint32_t mvfr0; | ||
37 | + uint32_t mvfr1; | ||
38 | + uint32_t mvfr2; | ||
39 | + uint64_t id_aa64isar0; | ||
40 | + uint64_t id_aa64isar1; | ||
41 | + uint64_t id_aa64pfr0; | ||
42 | + uint64_t id_aa64pfr1; | ||
43 | + } isar; | ||
44 | uint32_t midr; | ||
45 | uint32_t revidr; | ||
46 | uint32_t reset_fpsid; | ||
47 | - uint32_t mvfr0; | ||
48 | - uint32_t mvfr1; | ||
49 | - uint32_t mvfr2; | ||
50 | uint32_t ctr; | ||
51 | uint32_t reset_sctlr; | ||
52 | uint32_t id_pfr0; | ||
53 | @@ -XXX,XX +XXX,XX @@ struct ARMCPU { | ||
54 | uint32_t id_mmfr2; | ||
55 | uint32_t id_mmfr3; | ||
56 | uint32_t id_mmfr4; | ||
57 | - uint32_t id_isar0; | ||
58 | - uint32_t id_isar1; | ||
59 | - uint32_t id_isar2; | ||
60 | - uint32_t id_isar3; | ||
61 | - uint32_t id_isar4; | ||
62 | - uint32_t id_isar5; | ||
63 | - uint32_t id_isar6; | ||
64 | - uint64_t id_aa64pfr0; | ||
65 | - uint64_t id_aa64pfr1; | ||
66 | uint64_t id_aa64dfr0; | ||
67 | uint64_t id_aa64dfr1; | ||
68 | uint64_t id_aa64afr0; | ||
69 | uint64_t id_aa64afr1; | ||
70 | - uint64_t id_aa64isar0; | ||
71 | - uint64_t id_aa64isar1; | ||
72 | uint64_t id_aa64mmfr0; | ||
73 | uint64_t id_aa64mmfr1; | ||
74 | uint32_t dbgdidr; | ||
75 | diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c | ||
76 | index XXXXXXX..XXXXXXX 100644 | 30 | index XXXXXXX..XXXXXXX 100644 |
77 | --- a/hw/intc/armv7m_nvic.c | 31 | --- a/accel/hvf/hvf-accel-ops.c |
78 | +++ b/hw/intc/armv7m_nvic.c | 32 | +++ b/accel/hvf/hvf-accel-ops.c |
79 | @@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs) | 33 | @@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, |
80 | case 0xd5c: /* MMFR3. */ | 34 | cpu->vcpu_dirty = false; |
81 | return cpu->id_mmfr3; | ||
82 | case 0xd60: /* ISAR0. */ | ||
83 | - return cpu->id_isar0; | ||
84 | + return cpu->isar.id_isar0; | ||
85 | case 0xd64: /* ISAR1. */ | ||
86 | - return cpu->id_isar1; | ||
87 | + return cpu->isar.id_isar1; | ||
88 | case 0xd68: /* ISAR2. */ | ||
89 | - return cpu->id_isar2; | ||
90 | + return cpu->isar.id_isar2; | ||
91 | case 0xd6c: /* ISAR3. */ | ||
92 | - return cpu->id_isar3; | ||
93 | + return cpu->isar.id_isar3; | ||
94 | case 0xd70: /* ISAR4. */ | ||
95 | - return cpu->id_isar4; | ||
96 | + return cpu->isar.id_isar4; | ||
97 | case 0xd74: /* ISAR5. */ | ||
98 | - return cpu->id_isar5; | ||
99 | + return cpu->isar.id_isar5; | ||
100 | case 0xd78: /* CLIDR */ | ||
101 | return cpu->clidr; | ||
102 | case 0xd7c: /* CTR */ | ||
103 | diff --git a/target/arm/cpu.c b/target/arm/cpu.c | ||
104 | index XXXXXXX..XXXXXXX 100644 | ||
105 | --- a/target/arm/cpu.c | ||
106 | +++ b/target/arm/cpu.c | ||
107 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s) | ||
108 | g_hash_table_foreach(cpu->cp_regs, cp_reg_check_reset, cpu); | ||
109 | |||
110 | env->vfp.xregs[ARM_VFP_FPSID] = cpu->reset_fpsid; | ||
111 | - env->vfp.xregs[ARM_VFP_MVFR0] = cpu->mvfr0; | ||
112 | - env->vfp.xregs[ARM_VFP_MVFR1] = cpu->mvfr1; | ||
113 | - env->vfp.xregs[ARM_VFP_MVFR2] = cpu->mvfr2; | ||
114 | + env->vfp.xregs[ARM_VFP_MVFR0] = cpu->isar.mvfr0; | ||
115 | + env->vfp.xregs[ARM_VFP_MVFR1] = cpu->isar.mvfr1; | ||
116 | + env->vfp.xregs[ARM_VFP_MVFR2] = cpu->isar.mvfr2; | ||
117 | |||
118 | cpu->power_state = cpu->start_powered_off ? PSCI_OFF : PSCI_ON; | ||
119 | s->halted = cpu->start_powered_off; | ||
120 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) | ||
121 | * registers as well. These are id_pfr1[7:4] and id_aa64pfr0[15:12]. | ||
122 | */ | ||
123 | cpu->id_pfr1 &= ~0xf0; | ||
124 | - cpu->id_aa64pfr0 &= ~0xf000; | ||
125 | + cpu->isar.id_aa64pfr0 &= ~0xf000; | ||
126 | } | ||
127 | |||
128 | if (!cpu->has_el2) { | ||
129 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) | ||
130 | * registers if we don't have EL2. These are id_pfr1[15:12] and | ||
131 | * id_aa64pfr0_el1[11:8]. | ||
132 | */ | ||
133 | - cpu->id_aa64pfr0 &= ~0xf00; | ||
134 | + cpu->isar.id_aa64pfr0 &= ~0xf00; | ||
135 | cpu->id_pfr1 &= ~0xf000; | ||
136 | } | ||
137 | |||
138 | @@ -XXX,XX +XXX,XX @@ static void arm1136_r2_initfn(Object *obj) | ||
139 | set_feature(&cpu->env, ARM_FEATURE_CACHE_BLOCK_OPS); | ||
140 | cpu->midr = 0x4107b362; | ||
141 | cpu->reset_fpsid = 0x410120b4; | ||
142 | - cpu->mvfr0 = 0x11111111; | ||
143 | - cpu->mvfr1 = 0x00000000; | ||
144 | + cpu->isar.mvfr0 = 0x11111111; | ||
145 | + cpu->isar.mvfr1 = 0x00000000; | ||
146 | cpu->ctr = 0x1dd20d2; | ||
147 | cpu->reset_sctlr = 0x00050078; | ||
148 | cpu->id_pfr0 = 0x111; | ||
149 | @@ -XXX,XX +XXX,XX @@ static void arm1136_r2_initfn(Object *obj) | ||
150 | cpu->id_mmfr0 = 0x01130003; | ||
151 | cpu->id_mmfr1 = 0x10030302; | ||
152 | cpu->id_mmfr2 = 0x01222110; | ||
153 | - cpu->id_isar0 = 0x00140011; | ||
154 | - cpu->id_isar1 = 0x12002111; | ||
155 | - cpu->id_isar2 = 0x11231111; | ||
156 | - cpu->id_isar3 = 0x01102131; | ||
157 | - cpu->id_isar4 = 0x141; | ||
158 | + cpu->isar.id_isar0 = 0x00140011; | ||
159 | + cpu->isar.id_isar1 = 0x12002111; | ||
160 | + cpu->isar.id_isar2 = 0x11231111; | ||
161 | + cpu->isar.id_isar3 = 0x01102131; | ||
162 | + cpu->isar.id_isar4 = 0x141; | ||
163 | cpu->reset_auxcr = 7; | ||
164 | } | 35 | } |
165 | 36 | ||
166 | @@ -XXX,XX +XXX,XX @@ static void arm1136_initfn(Object *obj) | 37 | -void hvf_cpu_synchronize_post_reset(CPUState *cpu) |
167 | set_feature(&cpu->env, ARM_FEATURE_CACHE_BLOCK_OPS); | 38 | +static void hvf_cpu_synchronize_post_reset(CPUState *cpu) |
168 | cpu->midr = 0x4117b363; | 39 | { |
169 | cpu->reset_fpsid = 0x410120b4; | 40 | run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); |
170 | - cpu->mvfr0 = 0x11111111; | ||
171 | - cpu->mvfr1 = 0x00000000; | ||
172 | + cpu->isar.mvfr0 = 0x11111111; | ||
173 | + cpu->isar.mvfr1 = 0x00000000; | ||
174 | cpu->ctr = 0x1dd20d2; | ||
175 | cpu->reset_sctlr = 0x00050078; | ||
176 | cpu->id_pfr0 = 0x111; | ||
177 | @@ -XXX,XX +XXX,XX @@ static void arm1136_initfn(Object *obj) | ||
178 | cpu->id_mmfr0 = 0x01130003; | ||
179 | cpu->id_mmfr1 = 0x10030302; | ||
180 | cpu->id_mmfr2 = 0x01222110; | ||
181 | - cpu->id_isar0 = 0x00140011; | ||
182 | - cpu->id_isar1 = 0x12002111; | ||
183 | - cpu->id_isar2 = 0x11231111; | ||
184 | - cpu->id_isar3 = 0x01102131; | ||
185 | - cpu->id_isar4 = 0x141; | ||
186 | + cpu->isar.id_isar0 = 0x00140011; | ||
187 | + cpu->isar.id_isar1 = 0x12002111; | ||
188 | + cpu->isar.id_isar2 = 0x11231111; | ||
189 | + cpu->isar.id_isar3 = 0x01102131; | ||
190 | + cpu->isar.id_isar4 = 0x141; | ||
191 | cpu->reset_auxcr = 7; | ||
192 | } | 41 | } |
193 | 42 | @@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, | |
194 | @@ -XXX,XX +XXX,XX @@ static void arm1176_initfn(Object *obj) | 43 | cpu->vcpu_dirty = false; |
195 | set_feature(&cpu->env, ARM_FEATURE_EL3); | ||
196 | cpu->midr = 0x410fb767; | ||
197 | cpu->reset_fpsid = 0x410120b5; | ||
198 | - cpu->mvfr0 = 0x11111111; | ||
199 | - cpu->mvfr1 = 0x00000000; | ||
200 | + cpu->isar.mvfr0 = 0x11111111; | ||
201 | + cpu->isar.mvfr1 = 0x00000000; | ||
202 | cpu->ctr = 0x1dd20d2; | ||
203 | cpu->reset_sctlr = 0x00050078; | ||
204 | cpu->id_pfr0 = 0x111; | ||
205 | @@ -XXX,XX +XXX,XX @@ static void arm1176_initfn(Object *obj) | ||
206 | cpu->id_mmfr0 = 0x01130003; | ||
207 | cpu->id_mmfr1 = 0x10030302; | ||
208 | cpu->id_mmfr2 = 0x01222100; | ||
209 | - cpu->id_isar0 = 0x0140011; | ||
210 | - cpu->id_isar1 = 0x12002111; | ||
211 | - cpu->id_isar2 = 0x11231121; | ||
212 | - cpu->id_isar3 = 0x01102131; | ||
213 | - cpu->id_isar4 = 0x01141; | ||
214 | + cpu->isar.id_isar0 = 0x0140011; | ||
215 | + cpu->isar.id_isar1 = 0x12002111; | ||
216 | + cpu->isar.id_isar2 = 0x11231121; | ||
217 | + cpu->isar.id_isar3 = 0x01102131; | ||
218 | + cpu->isar.id_isar4 = 0x01141; | ||
219 | cpu->reset_auxcr = 7; | ||
220 | } | 44 | } |
221 | 45 | ||
222 | @@ -XXX,XX +XXX,XX @@ static void arm11mpcore_initfn(Object *obj) | 46 | -void hvf_cpu_synchronize_post_init(CPUState *cpu) |
223 | set_feature(&cpu->env, ARM_FEATURE_DUMMY_C15_REGS); | 47 | +static void hvf_cpu_synchronize_post_init(CPUState *cpu) |
224 | cpu->midr = 0x410fb022; | 48 | { |
225 | cpu->reset_fpsid = 0x410120b4; | 49 | run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); |
226 | - cpu->mvfr0 = 0x11111111; | ||
227 | - cpu->mvfr1 = 0x00000000; | ||
228 | + cpu->isar.mvfr0 = 0x11111111; | ||
229 | + cpu->isar.mvfr1 = 0x00000000; | ||
230 | cpu->ctr = 0x1d192992; /* 32K icache 32K dcache */ | ||
231 | cpu->id_pfr0 = 0x111; | ||
232 | cpu->id_pfr1 = 0x1; | ||
233 | @@ -XXX,XX +XXX,XX @@ static void arm11mpcore_initfn(Object *obj) | ||
234 | cpu->id_mmfr0 = 0x01100103; | ||
235 | cpu->id_mmfr1 = 0x10020302; | ||
236 | cpu->id_mmfr2 = 0x01222000; | ||
237 | - cpu->id_isar0 = 0x00100011; | ||
238 | - cpu->id_isar1 = 0x12002111; | ||
239 | - cpu->id_isar2 = 0x11221011; | ||
240 | - cpu->id_isar3 = 0x01102131; | ||
241 | - cpu->id_isar4 = 0x141; | ||
242 | + cpu->isar.id_isar0 = 0x00100011; | ||
243 | + cpu->isar.id_isar1 = 0x12002111; | ||
244 | + cpu->isar.id_isar2 = 0x11221011; | ||
245 | + cpu->isar.id_isar3 = 0x01102131; | ||
246 | + cpu->isar.id_isar4 = 0x141; | ||
247 | cpu->reset_auxcr = 1; | ||
248 | } | 50 | } |
249 | 51 | @@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, | |
250 | @@ -XXX,XX +XXX,XX @@ static void cortex_m3_initfn(Object *obj) | 52 | cpu->vcpu_dirty = true; |
251 | cpu->id_mmfr1 = 0x00000000; | ||
252 | cpu->id_mmfr2 = 0x00000000; | ||
253 | cpu->id_mmfr3 = 0x00000000; | ||
254 | - cpu->id_isar0 = 0x01141110; | ||
255 | - cpu->id_isar1 = 0x02111000; | ||
256 | - cpu->id_isar2 = 0x21112231; | ||
257 | - cpu->id_isar3 = 0x01111110; | ||
258 | - cpu->id_isar4 = 0x01310102; | ||
259 | - cpu->id_isar5 = 0x00000000; | ||
260 | - cpu->id_isar6 = 0x00000000; | ||
261 | + cpu->isar.id_isar0 = 0x01141110; | ||
262 | + cpu->isar.id_isar1 = 0x02111000; | ||
263 | + cpu->isar.id_isar2 = 0x21112231; | ||
264 | + cpu->isar.id_isar3 = 0x01111110; | ||
265 | + cpu->isar.id_isar4 = 0x01310102; | ||
266 | + cpu->isar.id_isar5 = 0x00000000; | ||
267 | + cpu->isar.id_isar6 = 0x00000000; | ||
268 | } | 53 | } |
269 | 54 | ||
270 | static void cortex_m4_initfn(Object *obj) | 55 | -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) |
271 | @@ -XXX,XX +XXX,XX @@ static void cortex_m4_initfn(Object *obj) | 56 | +static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) |
272 | cpu->id_mmfr1 = 0x00000000; | 57 | { |
273 | cpu->id_mmfr2 = 0x00000000; | 58 | run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); |
274 | cpu->id_mmfr3 = 0x00000000; | ||
275 | - cpu->id_isar0 = 0x01141110; | ||
276 | - cpu->id_isar1 = 0x02111000; | ||
277 | - cpu->id_isar2 = 0x21112231; | ||
278 | - cpu->id_isar3 = 0x01111110; | ||
279 | - cpu->id_isar4 = 0x01310102; | ||
280 | - cpu->id_isar5 = 0x00000000; | ||
281 | - cpu->id_isar6 = 0x00000000; | ||
282 | + cpu->isar.id_isar0 = 0x01141110; | ||
283 | + cpu->isar.id_isar1 = 0x02111000; | ||
284 | + cpu->isar.id_isar2 = 0x21112231; | ||
285 | + cpu->isar.id_isar3 = 0x01111110; | ||
286 | + cpu->isar.id_isar4 = 0x01310102; | ||
287 | + cpu->isar.id_isar5 = 0x00000000; | ||
288 | + cpu->isar.id_isar6 = 0x00000000; | ||
289 | } | 59 | } |
290 | |||
291 | static void cortex_m33_initfn(Object *obj) | ||
292 | @@ -XXX,XX +XXX,XX @@ static void cortex_m33_initfn(Object *obj) | ||
293 | cpu->id_mmfr1 = 0x00000000; | ||
294 | cpu->id_mmfr2 = 0x01000000; | ||
295 | cpu->id_mmfr3 = 0x00000000; | ||
296 | - cpu->id_isar0 = 0x01101110; | ||
297 | - cpu->id_isar1 = 0x02212000; | ||
298 | - cpu->id_isar2 = 0x20232232; | ||
299 | - cpu->id_isar3 = 0x01111131; | ||
300 | - cpu->id_isar4 = 0x01310132; | ||
301 | - cpu->id_isar5 = 0x00000000; | ||
302 | - cpu->id_isar6 = 0x00000000; | ||
303 | + cpu->isar.id_isar0 = 0x01101110; | ||
304 | + cpu->isar.id_isar1 = 0x02212000; | ||
305 | + cpu->isar.id_isar2 = 0x20232232; | ||
306 | + cpu->isar.id_isar3 = 0x01111131; | ||
307 | + cpu->isar.id_isar4 = 0x01310132; | ||
308 | + cpu->isar.id_isar5 = 0x00000000; | ||
309 | + cpu->isar.id_isar6 = 0x00000000; | ||
310 | cpu->clidr = 0x00000000; | ||
311 | cpu->ctr = 0x8000c000; | ||
312 | } | ||
313 | @@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj) | ||
314 | cpu->id_mmfr1 = 0x00000000; | ||
315 | cpu->id_mmfr2 = 0x01200000; | ||
316 | cpu->id_mmfr3 = 0x0211; | ||
317 | - cpu->id_isar0 = 0x02101111; | ||
318 | - cpu->id_isar1 = 0x13112111; | ||
319 | - cpu->id_isar2 = 0x21232141; | ||
320 | - cpu->id_isar3 = 0x01112131; | ||
321 | - cpu->id_isar4 = 0x0010142; | ||
322 | - cpu->id_isar5 = 0x0; | ||
323 | - cpu->id_isar6 = 0x0; | ||
324 | + cpu->isar.id_isar0 = 0x02101111; | ||
325 | + cpu->isar.id_isar1 = 0x13112111; | ||
326 | + cpu->isar.id_isar2 = 0x21232141; | ||
327 | + cpu->isar.id_isar3 = 0x01112131; | ||
328 | + cpu->isar.id_isar4 = 0x0010142; | ||
329 | + cpu->isar.id_isar5 = 0x0; | ||
330 | + cpu->isar.id_isar6 = 0x0; | ||
331 | cpu->mp_is_up = true; | ||
332 | cpu->pmsav7_dregion = 16; | ||
333 | define_arm_cp_regs(cpu, cortexr5_cp_reginfo); | ||
334 | @@ -XXX,XX +XXX,XX @@ static void cortex_a8_initfn(Object *obj) | ||
335 | set_feature(&cpu->env, ARM_FEATURE_EL3); | ||
336 | cpu->midr = 0x410fc080; | ||
337 | cpu->reset_fpsid = 0x410330c0; | ||
338 | - cpu->mvfr0 = 0x11110222; | ||
339 | - cpu->mvfr1 = 0x00011111; | ||
340 | + cpu->isar.mvfr0 = 0x11110222; | ||
341 | + cpu->isar.mvfr1 = 0x00011111; | ||
342 | cpu->ctr = 0x82048004; | ||
343 | cpu->reset_sctlr = 0x00c50078; | ||
344 | cpu->id_pfr0 = 0x1031; | ||
345 | @@ -XXX,XX +XXX,XX @@ static void cortex_a8_initfn(Object *obj) | ||
346 | cpu->id_mmfr1 = 0x20000000; | ||
347 | cpu->id_mmfr2 = 0x01202000; | ||
348 | cpu->id_mmfr3 = 0x11; | ||
349 | - cpu->id_isar0 = 0x00101111; | ||
350 | - cpu->id_isar1 = 0x12112111; | ||
351 | - cpu->id_isar2 = 0x21232031; | ||
352 | - cpu->id_isar3 = 0x11112131; | ||
353 | - cpu->id_isar4 = 0x00111142; | ||
354 | + cpu->isar.id_isar0 = 0x00101111; | ||
355 | + cpu->isar.id_isar1 = 0x12112111; | ||
356 | + cpu->isar.id_isar2 = 0x21232031; | ||
357 | + cpu->isar.id_isar3 = 0x11112131; | ||
358 | + cpu->isar.id_isar4 = 0x00111142; | ||
359 | cpu->dbgdidr = 0x15141000; | ||
360 | cpu->clidr = (1 << 27) | (2 << 24) | 3; | ||
361 | cpu->ccsidr[0] = 0xe007e01a; /* 16k L1 dcache. */ | ||
362 | @@ -XXX,XX +XXX,XX @@ static void cortex_a9_initfn(Object *obj) | ||
363 | set_feature(&cpu->env, ARM_FEATURE_CBAR); | ||
364 | cpu->midr = 0x410fc090; | ||
365 | cpu->reset_fpsid = 0x41033090; | ||
366 | - cpu->mvfr0 = 0x11110222; | ||
367 | - cpu->mvfr1 = 0x01111111; | ||
368 | + cpu->isar.mvfr0 = 0x11110222; | ||
369 | + cpu->isar.mvfr1 = 0x01111111; | ||
370 | cpu->ctr = 0x80038003; | ||
371 | cpu->reset_sctlr = 0x00c50078; | ||
372 | cpu->id_pfr0 = 0x1031; | ||
373 | @@ -XXX,XX +XXX,XX @@ static void cortex_a9_initfn(Object *obj) | ||
374 | cpu->id_mmfr1 = 0x20000000; | ||
375 | cpu->id_mmfr2 = 0x01230000; | ||
376 | cpu->id_mmfr3 = 0x00002111; | ||
377 | - cpu->id_isar0 = 0x00101111; | ||
378 | - cpu->id_isar1 = 0x13112111; | ||
379 | - cpu->id_isar2 = 0x21232041; | ||
380 | - cpu->id_isar3 = 0x11112131; | ||
381 | - cpu->id_isar4 = 0x00111142; | ||
382 | + cpu->isar.id_isar0 = 0x00101111; | ||
383 | + cpu->isar.id_isar1 = 0x13112111; | ||
384 | + cpu->isar.id_isar2 = 0x21232041; | ||
385 | + cpu->isar.id_isar3 = 0x11112131; | ||
386 | + cpu->isar.id_isar4 = 0x00111142; | ||
387 | cpu->dbgdidr = 0x35141000; | ||
388 | cpu->clidr = (1 << 27) | (1 << 24) | 3; | ||
389 | cpu->ccsidr[0] = 0xe00fe019; /* 16k L1 dcache. */ | ||
390 | @@ -XXX,XX +XXX,XX @@ static void cortex_a7_initfn(Object *obj) | ||
391 | cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A7; | ||
392 | cpu->midr = 0x410fc075; | ||
393 | cpu->reset_fpsid = 0x41023075; | ||
394 | - cpu->mvfr0 = 0x10110222; | ||
395 | - cpu->mvfr1 = 0x11111111; | ||
396 | + cpu->isar.mvfr0 = 0x10110222; | ||
397 | + cpu->isar.mvfr1 = 0x11111111; | ||
398 | cpu->ctr = 0x84448003; | ||
399 | cpu->reset_sctlr = 0x00c50078; | ||
400 | cpu->id_pfr0 = 0x00001131; | ||
401 | @@ -XXX,XX +XXX,XX @@ static void cortex_a7_initfn(Object *obj) | ||
402 | /* a7_mpcore_r0p5_trm, page 4-4 gives 0x01101110; but | ||
403 | * table 4-41 gives 0x02101110, which includes the arm div insns. | ||
404 | */ | ||
405 | - cpu->id_isar0 = 0x02101110; | ||
406 | - cpu->id_isar1 = 0x13112111; | ||
407 | - cpu->id_isar2 = 0x21232041; | ||
408 | - cpu->id_isar3 = 0x11112131; | ||
409 | - cpu->id_isar4 = 0x10011142; | ||
410 | + cpu->isar.id_isar0 = 0x02101110; | ||
411 | + cpu->isar.id_isar1 = 0x13112111; | ||
412 | + cpu->isar.id_isar2 = 0x21232041; | ||
413 | + cpu->isar.id_isar3 = 0x11112131; | ||
414 | + cpu->isar.id_isar4 = 0x10011142; | ||
415 | cpu->dbgdidr = 0x3515f005; | ||
416 | cpu->clidr = 0x0a200023; | ||
417 | cpu->ccsidr[0] = 0x701fe00a; /* 32K L1 dcache */ | ||
418 | @@ -XXX,XX +XXX,XX @@ static void cortex_a15_initfn(Object *obj) | ||
419 | cpu->kvm_target = QEMU_KVM_ARM_TARGET_CORTEX_A15; | ||
420 | cpu->midr = 0x412fc0f1; | ||
421 | cpu->reset_fpsid = 0x410430f0; | ||
422 | - cpu->mvfr0 = 0x10110222; | ||
423 | - cpu->mvfr1 = 0x11111111; | ||
424 | + cpu->isar.mvfr0 = 0x10110222; | ||
425 | + cpu->isar.mvfr1 = 0x11111111; | ||
426 | cpu->ctr = 0x8444c004; | ||
427 | cpu->reset_sctlr = 0x00c50078; | ||
428 | cpu->id_pfr0 = 0x00001131; | ||
429 | @@ -XXX,XX +XXX,XX @@ static void cortex_a15_initfn(Object *obj) | ||
430 | cpu->id_mmfr1 = 0x20000000; | ||
431 | cpu->id_mmfr2 = 0x01240000; | ||
432 | cpu->id_mmfr3 = 0x02102211; | ||
433 | - cpu->id_isar0 = 0x02101110; | ||
434 | - cpu->id_isar1 = 0x13112111; | ||
435 | - cpu->id_isar2 = 0x21232041; | ||
436 | - cpu->id_isar3 = 0x11112131; | ||
437 | - cpu->id_isar4 = 0x10011142; | ||
438 | + cpu->isar.id_isar0 = 0x02101110; | ||
439 | + cpu->isar.id_isar1 = 0x13112111; | ||
440 | + cpu->isar.id_isar2 = 0x21232041; | ||
441 | + cpu->isar.id_isar3 = 0x11112131; | ||
442 | + cpu->isar.id_isar4 = 0x10011142; | ||
443 | cpu->dbgdidr = 0x3515f021; | ||
444 | cpu->clidr = 0x0a200023; | ||
445 | cpu->ccsidr[0] = 0x701fe00a; /* 32K L1 dcache */ | ||
446 | diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c | ||
447 | index XXXXXXX..XXXXXXX 100644 | ||
448 | --- a/target/arm/cpu64.c | ||
449 | +++ b/target/arm/cpu64.c | ||
450 | @@ -XXX,XX +XXX,XX @@ static void aarch64_a57_initfn(Object *obj) | ||
451 | cpu->midr = 0x411fd070; | ||
452 | cpu->revidr = 0x00000000; | ||
453 | cpu->reset_fpsid = 0x41034070; | ||
454 | - cpu->mvfr0 = 0x10110222; | ||
455 | - cpu->mvfr1 = 0x12111111; | ||
456 | - cpu->mvfr2 = 0x00000043; | ||
457 | + cpu->isar.mvfr0 = 0x10110222; | ||
458 | + cpu->isar.mvfr1 = 0x12111111; | ||
459 | + cpu->isar.mvfr2 = 0x00000043; | ||
460 | cpu->ctr = 0x8444c004; | ||
461 | cpu->reset_sctlr = 0x00c50838; | ||
462 | cpu->id_pfr0 = 0x00000131; | ||
463 | @@ -XXX,XX +XXX,XX @@ static void aarch64_a57_initfn(Object *obj) | ||
464 | cpu->id_mmfr1 = 0x40000000; | ||
465 | cpu->id_mmfr2 = 0x01260000; | ||
466 | cpu->id_mmfr3 = 0x02102211; | ||
467 | - cpu->id_isar0 = 0x02101110; | ||
468 | - cpu->id_isar1 = 0x13112111; | ||
469 | - cpu->id_isar2 = 0x21232042; | ||
470 | - cpu->id_isar3 = 0x01112131; | ||
471 | - cpu->id_isar4 = 0x00011142; | ||
472 | - cpu->id_isar5 = 0x00011121; | ||
473 | - cpu->id_isar6 = 0; | ||
474 | - cpu->id_aa64pfr0 = 0x00002222; | ||
475 | + cpu->isar.id_isar0 = 0x02101110; | ||
476 | + cpu->isar.id_isar1 = 0x13112111; | ||
477 | + cpu->isar.id_isar2 = 0x21232042; | ||
478 | + cpu->isar.id_isar3 = 0x01112131; | ||
479 | + cpu->isar.id_isar4 = 0x00011142; | ||
480 | + cpu->isar.id_isar5 = 0x00011121; | ||
481 | + cpu->isar.id_isar6 = 0; | ||
482 | + cpu->isar.id_aa64pfr0 = 0x00002222; | ||
483 | cpu->id_aa64dfr0 = 0x10305106; | ||
484 | cpu->pmceid0 = 0x00000000; | ||
485 | cpu->pmceid1 = 0x00000000; | ||
486 | - cpu->id_aa64isar0 = 0x00011120; | ||
487 | + cpu->isar.id_aa64isar0 = 0x00011120; | ||
488 | cpu->id_aa64mmfr0 = 0x00001124; | ||
489 | cpu->dbgdidr = 0x3516d000; | ||
490 | cpu->clidr = 0x0a200023; | ||
491 | @@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj) | ||
492 | cpu->midr = 0x410fd034; | ||
493 | cpu->revidr = 0x00000000; | ||
494 | cpu->reset_fpsid = 0x41034070; | ||
495 | - cpu->mvfr0 = 0x10110222; | ||
496 | - cpu->mvfr1 = 0x12111111; | ||
497 | - cpu->mvfr2 = 0x00000043; | ||
498 | + cpu->isar.mvfr0 = 0x10110222; | ||
499 | + cpu->isar.mvfr1 = 0x12111111; | ||
500 | + cpu->isar.mvfr2 = 0x00000043; | ||
501 | cpu->ctr = 0x84448004; /* L1Ip = VIPT */ | ||
502 | cpu->reset_sctlr = 0x00c50838; | ||
503 | cpu->id_pfr0 = 0x00000131; | ||
504 | @@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj) | ||
505 | cpu->id_mmfr1 = 0x40000000; | ||
506 | cpu->id_mmfr2 = 0x01260000; | ||
507 | cpu->id_mmfr3 = 0x02102211; | ||
508 | - cpu->id_isar0 = 0x02101110; | ||
509 | - cpu->id_isar1 = 0x13112111; | ||
510 | - cpu->id_isar2 = 0x21232042; | ||
511 | - cpu->id_isar3 = 0x01112131; | ||
512 | - cpu->id_isar4 = 0x00011142; | ||
513 | - cpu->id_isar5 = 0x00011121; | ||
514 | - cpu->id_isar6 = 0; | ||
515 | - cpu->id_aa64pfr0 = 0x00002222; | ||
516 | + cpu->isar.id_isar0 = 0x02101110; | ||
517 | + cpu->isar.id_isar1 = 0x13112111; | ||
518 | + cpu->isar.id_isar2 = 0x21232042; | ||
519 | + cpu->isar.id_isar3 = 0x01112131; | ||
520 | + cpu->isar.id_isar4 = 0x00011142; | ||
521 | + cpu->isar.id_isar5 = 0x00011121; | ||
522 | + cpu->isar.id_isar6 = 0; | ||
523 | + cpu->isar.id_aa64pfr0 = 0x00002222; | ||
524 | cpu->id_aa64dfr0 = 0x10305106; | ||
525 | - cpu->id_aa64isar0 = 0x00011120; | ||
526 | + cpu->isar.id_aa64isar0 = 0x00011120; | ||
527 | cpu->id_aa64mmfr0 = 0x00001122; /* 40 bit physical addr */ | ||
528 | cpu->dbgdidr = 0x3516d000; | ||
529 | cpu->clidr = 0x0a200023; | ||
530 | @@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj) | ||
531 | cpu->midr = 0x410fd083; | ||
532 | cpu->revidr = 0x00000000; | ||
533 | cpu->reset_fpsid = 0x41034080; | ||
534 | - cpu->mvfr0 = 0x10110222; | ||
535 | - cpu->mvfr1 = 0x12111111; | ||
536 | - cpu->mvfr2 = 0x00000043; | ||
537 | + cpu->isar.mvfr0 = 0x10110222; | ||
538 | + cpu->isar.mvfr1 = 0x12111111; | ||
539 | + cpu->isar.mvfr2 = 0x00000043; | ||
540 | cpu->ctr = 0x8444c004; | ||
541 | cpu->reset_sctlr = 0x00c50838; | ||
542 | cpu->id_pfr0 = 0x00000131; | ||
543 | @@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj) | ||
544 | cpu->id_mmfr1 = 0x40000000; | ||
545 | cpu->id_mmfr2 = 0x01260000; | ||
546 | cpu->id_mmfr3 = 0x02102211; | ||
547 | - cpu->id_isar0 = 0x02101110; | ||
548 | - cpu->id_isar1 = 0x13112111; | ||
549 | - cpu->id_isar2 = 0x21232042; | ||
550 | - cpu->id_isar3 = 0x01112131; | ||
551 | - cpu->id_isar4 = 0x00011142; | ||
552 | - cpu->id_isar5 = 0x00011121; | ||
553 | - cpu->id_aa64pfr0 = 0x00002222; | ||
554 | + cpu->isar.id_isar0 = 0x02101110; | ||
555 | + cpu->isar.id_isar1 = 0x13112111; | ||
556 | + cpu->isar.id_isar2 = 0x21232042; | ||
557 | + cpu->isar.id_isar3 = 0x01112131; | ||
558 | + cpu->isar.id_isar4 = 0x00011142; | ||
559 | + cpu->isar.id_isar5 = 0x00011121; | ||
560 | + cpu->isar.id_aa64pfr0 = 0x00002222; | ||
561 | cpu->id_aa64dfr0 = 0x10305106; | ||
562 | cpu->pmceid0 = 0x00000000; | ||
563 | cpu->pmceid1 = 0x00000000; | ||
564 | - cpu->id_aa64isar0 = 0x00011120; | ||
565 | + cpu->isar.id_aa64isar0 = 0x00011120; | ||
566 | cpu->id_aa64mmfr0 = 0x00001124; | ||
567 | cpu->dbgdidr = 0x3516d000; | ||
568 | cpu->clidr = 0x0a200023; | ||
569 | diff --git a/target/arm/helper.c b/target/arm/helper.c | ||
570 | index XXXXXXX..XXXXXXX 100644 | ||
571 | --- a/target/arm/helper.c | ||
572 | +++ b/target/arm/helper.c | ||
573 | @@ -XXX,XX +XXX,XX @@ static uint64_t id_pfr1_read(CPUARMState *env, const ARMCPRegInfo *ri) | ||
574 | static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri) | ||
575 | { | ||
576 | ARMCPU *cpu = arm_env_get_cpu(env); | ||
577 | - uint64_t pfr0 = cpu->id_aa64pfr0; | ||
578 | + uint64_t pfr0 = cpu->isar.id_aa64pfr0; | ||
579 | |||
580 | if (env->gicv3state) { | ||
581 | pfr0 |= 1 << 24; | ||
582 | @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) | ||
583 | { .name = "ID_ISAR0", .state = ARM_CP_STATE_BOTH, | ||
584 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 0, | ||
585 | .access = PL1_R, .type = ARM_CP_CONST, | ||
586 | - .resetvalue = cpu->id_isar0 }, | ||
587 | + .resetvalue = cpu->isar.id_isar0 }, | ||
588 | { .name = "ID_ISAR1", .state = ARM_CP_STATE_BOTH, | ||
589 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 1, | ||
590 | .access = PL1_R, .type = ARM_CP_CONST, | ||
591 | - .resetvalue = cpu->id_isar1 }, | ||
592 | + .resetvalue = cpu->isar.id_isar1 }, | ||
593 | { .name = "ID_ISAR2", .state = ARM_CP_STATE_BOTH, | ||
594 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 2, | ||
595 | .access = PL1_R, .type = ARM_CP_CONST, | ||
596 | - .resetvalue = cpu->id_isar2 }, | ||
597 | + .resetvalue = cpu->isar.id_isar2 }, | ||
598 | { .name = "ID_ISAR3", .state = ARM_CP_STATE_BOTH, | ||
599 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 3, | ||
600 | .access = PL1_R, .type = ARM_CP_CONST, | ||
601 | - .resetvalue = cpu->id_isar3 }, | ||
602 | + .resetvalue = cpu->isar.id_isar3 }, | ||
603 | { .name = "ID_ISAR4", .state = ARM_CP_STATE_BOTH, | ||
604 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 4, | ||
605 | .access = PL1_R, .type = ARM_CP_CONST, | ||
606 | - .resetvalue = cpu->id_isar4 }, | ||
607 | + .resetvalue = cpu->isar.id_isar4 }, | ||
608 | { .name = "ID_ISAR5", .state = ARM_CP_STATE_BOTH, | ||
609 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 5, | ||
610 | .access = PL1_R, .type = ARM_CP_CONST, | ||
611 | - .resetvalue = cpu->id_isar5 }, | ||
612 | + .resetvalue = cpu->isar.id_isar5 }, | ||
613 | { .name = "ID_MMFR4", .state = ARM_CP_STATE_BOTH, | ||
614 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 6, | ||
615 | .access = PL1_R, .type = ARM_CP_CONST, | ||
616 | @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) | ||
617 | { .name = "ID_ISAR6", .state = ARM_CP_STATE_BOTH, | ||
618 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 7, | ||
619 | .access = PL1_R, .type = ARM_CP_CONST, | ||
620 | - .resetvalue = cpu->id_isar6 }, | ||
621 | + .resetvalue = cpu->isar.id_isar6 }, | ||
622 | REGINFO_SENTINEL | ||
623 | }; | ||
624 | define_arm_cp_regs(cpu, v6_idregs); | ||
625 | @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) | ||
626 | { .name = "ID_AA64PFR1_EL1", .state = ARM_CP_STATE_AA64, | ||
627 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 1, | ||
628 | .access = PL1_R, .type = ARM_CP_CONST, | ||
629 | - .resetvalue = cpu->id_aa64pfr1}, | ||
630 | + .resetvalue = cpu->isar.id_aa64pfr1}, | ||
631 | { .name = "ID_AA64PFR2_EL1_RESERVED", .state = ARM_CP_STATE_AA64, | ||
632 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 2, | ||
633 | .access = PL1_R, .type = ARM_CP_CONST, | ||
634 | @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) | ||
635 | { .name = "ID_AA64ISAR0_EL1", .state = ARM_CP_STATE_AA64, | ||
636 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 0, | ||
637 | .access = PL1_R, .type = ARM_CP_CONST, | ||
638 | - .resetvalue = cpu->id_aa64isar0 }, | ||
639 | + .resetvalue = cpu->isar.id_aa64isar0 }, | ||
640 | { .name = "ID_AA64ISAR1_EL1", .state = ARM_CP_STATE_AA64, | ||
641 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 1, | ||
642 | .access = PL1_R, .type = ARM_CP_CONST, | ||
643 | - .resetvalue = cpu->id_aa64isar1 }, | ||
644 | + .resetvalue = cpu->isar.id_aa64isar1 }, | ||
645 | { .name = "ID_AA64ISAR2_EL1_RESERVED", .state = ARM_CP_STATE_AA64, | ||
646 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 2, | ||
647 | .access = PL1_R, .type = ARM_CP_CONST, | ||
648 | @@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu) | ||
649 | { .name = "MVFR0_EL1", .state = ARM_CP_STATE_AA64, | ||
650 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 0, | ||
651 | .access = PL1_R, .type = ARM_CP_CONST, | ||
652 | - .resetvalue = cpu->mvfr0 }, | ||
653 | + .resetvalue = cpu->isar.mvfr0 }, | ||
654 | { .name = "MVFR1_EL1", .state = ARM_CP_STATE_AA64, | ||
655 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 1, | ||
656 | .access = PL1_R, .type = ARM_CP_CONST, | ||
657 | - .resetvalue = cpu->mvfr1 }, | ||
658 | + .resetvalue = cpu->isar.mvfr1 }, | ||
659 | { .name = "MVFR2_EL1", .state = ARM_CP_STATE_AA64, | ||
660 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 2, | ||
661 | .access = PL1_R, .type = ARM_CP_CONST, | ||
662 | - .resetvalue = cpu->mvfr2 }, | ||
663 | + .resetvalue = cpu->isar.mvfr2 }, | ||
664 | { .name = "MVFR3_EL1_RESERVED", .state = ARM_CP_STATE_AA64, | ||
665 | .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 3, | ||
666 | .access = PL1_R, .type = ARM_CP_CONST, | ||
667 | -- | 60 | -- |
668 | 2.19.1 | 61 | 2.20.1 |
669 | 62 | ||
670 | 63 | diff view generated by jsdifflib |
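
The ARMISARegisters change above is what lets later patches in this area test CPU features by reading architected ID-register fields instead of parallel ARM_FEATURE_* flags. Below is a minimal sketch of such an accessor; it assumes the FIELD()/FIELD_EX32() register-field helpers and an ID_ISAR5 RDM field definition as used elsewhere in target/arm, and is illustrative rather than a quote from the patch:

    /*
     * Illustrative sketch: a translator-visible feature test that reads an
     * architected field out of the shared "isar" substructure, rather than
     * consulting a separate ARM_FEATURE_* flag.
     */
    static inline bool isar_feature_aa32_rdm(const ARMISARegisters *id)
    {
        /* ID_ISAR5.RDM != 0 => the v8.1 rounding-doubling multiply insns */
        return FIELD_EX32(id->id_isar5, ID_ISAR5, RDM) != 0;
    }

Translation code can then ask dc_isar_feature(...) during translation, or cpu_isar_feature(...) elsewhere, without keeping a duplicate feature bit in sync with the ID registers.
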
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | 2 | ||
3 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 3 | We can move the declaration of hvf_vcpu_exec() into our internal |
4 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | 4 | hvf header, obsoleting the need for hvf-accel-ops.h. |
5 | Message-id: 20181011205206.3552-6-richard.henderson@linaro.org | 5 | |
6 | [PMM: drop change to now-deleted cpu_mode_names array] | 6 | Signed-off-by: Alexander Graf <agraf@csgraf.de> |
7 | Reviewed-by: Sergio Lopez <slp@redhat.com> | ||
8 | Message-id: 20210519202253.76782-11-agraf@csgraf.de | ||
7 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
8 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
9 | --- | 11 | --- |
10 | target/arm/translate.c | 4 ++-- | 12 | accel/hvf/hvf-accel-ops.h | 17 ----------------- |
11 | 1 file changed, 2 insertions(+), 2 deletions(-) | 13 | include/sysemu/hvf_int.h | 1 + |
14 | accel/hvf/hvf-accel-ops.c | 2 -- | ||
15 | target/i386/hvf/hvf.c | 2 -- | ||
16 | 4 files changed, 1 insertion(+), 21 deletions(-) | ||
17 | delete mode 100644 accel/hvf/hvf-accel-ops.h | ||
12 | 18 | ||
13 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 19 | diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h |
20 | deleted file mode 100644 | ||
21 | index XXXXXXX..XXXXXXX | ||
22 | --- a/accel/hvf/hvf-accel-ops.h | ||
23 | +++ /dev/null | ||
24 | @@ -XXX,XX +XXX,XX @@ | ||
25 | -/* | ||
26 | - * Accelerator CPUS Interface | ||
27 | - * | ||
28 | - * Copyright 2020 SUSE LLC | ||
29 | - * | ||
30 | - * This work is licensed under the terms of the GNU GPL, version 2 or later. | ||
31 | - * See the COPYING file in the top-level directory. | ||
32 | - */ | ||
33 | - | ||
34 | -#ifndef HVF_CPUS_H | ||
35 | -#define HVF_CPUS_H | ||
36 | - | ||
37 | -#include "sysemu/cpus.h" | ||
38 | - | ||
39 | -int hvf_vcpu_exec(CPUState *); | ||
40 | - | ||
41 | -#endif /* HVF_CPUS_H */ | ||
42 | diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h | ||
14 | index XXXXXXX..XXXXXXX 100644 | 43 | index XXXXXXX..XXXXXXX 100644 |
15 | --- a/target/arm/translate.c | 44 | --- a/include/sysemu/hvf_int.h |
16 | +++ b/target/arm/translate.c | 45 | +++ b/include/sysemu/hvf_int.h |
17 | @@ -XXX,XX +XXX,XX @@ static TCGv_i64 cpu_F0d, cpu_F1d; | 46 | @@ -XXX,XX +XXX,XX @@ extern HVFState *hvf_state; |
18 | 47 | void assert_hvf_ok(hv_return_t ret); | |
19 | #include "exec/gen-icount.h" | 48 | int hvf_arch_init_vcpu(CPUState *cpu); |
20 | 49 | void hvf_arch_vcpu_destroy(CPUState *cpu); | |
21 | -static const char *regnames[] = | 50 | +int hvf_vcpu_exec(CPUState *); |
22 | +static const char * const regnames[] = | 51 | hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t); |
23 | { "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", | 52 | int hvf_put_registers(CPUState *); |
24 | "r8", "r9", "r10", "r11", "r12", "r13", "r14", "pc" }; | 53 | int hvf_get_registers(CPUState *); |
25 | 54 | diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c | |
26 | @@ -XXX,XX +XXX,XX @@ static struct { | 55 | index XXXXXXX..XXXXXXX 100644 |
27 | int nregs; | 56 | --- a/accel/hvf/hvf-accel-ops.c |
28 | int interleave; | 57 | +++ b/accel/hvf/hvf-accel-ops.c |
29 | int spacing; | 58 | @@ -XXX,XX +XXX,XX @@ |
30 | -} neon_ls_element_type[11] = { | 59 | #include "sysemu/runstate.h" |
31 | +} const neon_ls_element_type[11] = { | 60 | #include "qemu/guest-random.h" |
32 | {4, 4, 1}, | 61 | |
33 | {4, 4, 2}, | 62 | -#include "hvf-accel-ops.h" |
34 | {4, 1, 1}, | 63 | - |
64 | HVFState *hvf_state; | ||
65 | |||
66 | /* Memory slots */ | ||
67 | diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c | ||
68 | index XXXXXXX..XXXXXXX 100644 | ||
69 | --- a/target/i386/hvf/hvf.c | ||
70 | +++ b/target/i386/hvf/hvf.c | ||
71 | @@ -XXX,XX +XXX,XX @@ | ||
72 | #include "qemu/accel.h" | ||
73 | #include "target/i386/cpu.h" | ||
74 | |||
75 | -#include "hvf-accel-ops.h" | ||
76 | - | ||
77 | void vmx_update_tpr(CPUState *cpu) | ||
78 | { | ||
79 | /* TODO: need integrate APIC handling */ | ||
35 | -- | 80 | -- |
36 | 2.19.1 | 81 | 2.20.1 |
37 | 82 | ||
38 | 83 | diff view generated by jsdifflib |
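
For context on why hvf_vcpu_exec() keeps a prototype in sysemu/hvf_int.h even though hvf-accel-ops.h goes away: the function is implemented per target (target/i386/hvf/hvf.c) but driven from the common vcpu thread loop in accel/hvf/hvf-accel-ops.c. The loop below is a rough sketch of that caller, not the patch itself; helpers such as cpu_can_run(), qemu_wait_io_event() and cpu_handle_guest_debug() are assumed from QEMU's other accelerator thread functions, and the real function also does thread registration and vcpu setup/teardown:

    /* Sketch of the common hvf vcpu thread body that needs the prototype
     * from "sysemu/hvf_int.h"; setup and teardown paths are omitted. */
    static void *hvf_cpu_thread_fn(void *arg)
    {
        CPUState *cpu = arg;
        int r;

        do {
            if (cpu_can_run(cpu)) {
                r = hvf_vcpu_exec(cpu);     /* per-target implementation */
                if (r == EXCP_DEBUG) {
                    cpu_handle_guest_debug(cpu);
                }
            }
            qemu_wait_io_event(cpu);
        } while (!cpu->unplug || cpu_can_run(cpu));

        return NULL;
    }
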
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | 2 | ||
3 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | 3 | We will need more than a single field for hvf going forward. To keep |
4 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 4 | the global vcpu struct uncluttered, let's allocate a special hvf vcpu |
5 | Message-id: 20181016223115.24100-9-richard.henderson@linaro.org | 5 | struct, similar to how hax does it. |
6 | |||
7 | Signed-off-by: Alexander Graf <agraf@csgraf.de> | ||
8 | Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> | ||
9 | Tested-by: Roman Bolshakov <r.bolshakov@yadro.com> | ||
10 | Reviewed-by: Alex Bennée <alex.bennee@linaro.org> | ||
11 | Reviewed-by: Sergio Lopez <slp@redhat.com> | ||
12 | Message-id: 20210519202253.76782-12-agraf@csgraf.de | ||
6 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 13 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> |
7 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 14 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
8 | --- | 15 | --- |
9 | target/arm/cpu.h | 17 +++++++++++++++- | 16 | include/hw/core/cpu.h | 3 +- |
10 | linux-user/elfload.c | 6 +----- | 17 | include/sysemu/hvf_int.h | 4 + |
11 | target/arm/cpu64.c | 16 ++++++++------- | 18 | target/i386/hvf/vmx.h | 24 +++-- |
12 | target/arm/helper.c | 2 +- | 19 | accel/hvf/hvf-accel-ops.c | 8 +- |
13 | target/arm/translate-a64.c | 40 +++++++++++++++++++------------------- | 20 | target/i386/hvf/hvf.c | 104 +++++++++--------- |
14 | target/arm/translate.c | 6 +++--- | 21 | target/i386/hvf/x86.c | 28 ++--- |
15 | 6 files changed, 50 insertions(+), 37 deletions(-) | 22 | target/i386/hvf/x86_descr.c | 26 ++--- |
23 | target/i386/hvf/x86_emu.c | 62 +++++------ | ||
24 | target/i386/hvf/x86_mmu.c | 4 +- | ||
25 | target/i386/hvf/x86_task.c | 12 +-- | ||
26 | target/i386/hvf/x86hvf.c | 210 ++++++++++++++++++------------------ | ||
27 | 11 files changed, 248 insertions(+), 237 deletions(-) | ||
16 | 28 | ||
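
Before the diff of the per-vcpu hvf state change that follows, here is its shape in isolation: the raw Hypervisor.framework vcpu handle moves out of the target-independent CPUState into a heap-allocated per-vcpu struct, mirroring the existing hax_vcpu pointer. The struct below matches the patch; the hvf_vcpu_fd() accessor is a hypothetical convenience added here only to highlight the new indirection:

    /* Per-vcpu hvf state now hangs off CPUState, so future fields (for
     * example for the aarch64 port) need not touch CPUState itself. */
    struct hvf_vcpu_state {
        int fd;                 /* Hypervisor.framework vcpu handle */
    };

    /* Hypothetical helper, for illustration only: code that used to read
     * cpu->hvf_fd now goes through cpu->hvf->fd. */
    static inline int hvf_vcpu_fd(CPUState *cpu)
    {
        return cpu->hvf->fd;
    }
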
17 | diff --git a/target/arm/cpu.h b/target/arm/cpu.h | 29 | diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h |
18 | index XXXXXXX..XXXXXXX 100644 | 30 | index XXXXXXX..XXXXXXX 100644 |
19 | --- a/target/arm/cpu.h | 31 | --- a/include/hw/core/cpu.h |
20 | +++ b/target/arm/cpu.h | 32 | +++ b/include/hw/core/cpu.h |
21 | @@ -XXX,XX +XXX,XX @@ enum arm_features { | 33 | @@ -XXX,XX +XXX,XX @@ struct KVMState; |
22 | ARM_FEATURE_PMU, /* has PMU support */ | 34 | struct kvm_run; |
23 | ARM_FEATURE_VBAR, /* has cp15 VBAR */ | 35 | |
24 | ARM_FEATURE_M_SECURITY, /* M profile Security Extension */ | 36 | struct hax_vcpu_state; |
25 | - ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */ | 37 | +struct hvf_vcpu_state; |
26 | ARM_FEATURE_M_MAIN, /* M profile Main Extension */ | 38 | |
39 | #define TB_JMP_CACHE_BITS 12 | ||
40 | #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS) | ||
41 | @@ -XXX,XX +XXX,XX @@ struct CPUState { | ||
42 | |||
43 | struct hax_vcpu_state *hax_vcpu; | ||
44 | |||
45 | - int hvf_fd; | ||
46 | + struct hvf_vcpu_state *hvf; | ||
47 | |||
48 | /* track IOMMUs whose translations we've cached in the TCG TLB */ | ||
49 | GArray *iommu_notifiers; | ||
50 | diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h | ||
51 | index XXXXXXX..XXXXXXX 100644 | ||
52 | --- a/include/sysemu/hvf_int.h | ||
53 | +++ b/include/sysemu/hvf_int.h | ||
54 | @@ -XXX,XX +XXX,XX @@ struct HVFState { | ||
27 | }; | 55 | }; |
28 | 56 | extern HVFState *hvf_state; | |
29 | @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_dp(const ARMISARegisters *id) | 57 | |
30 | return FIELD_EX32(id->id_isar6, ID_ISAR6, DP) != 0; | 58 | +struct hvf_vcpu_state { |
31 | } | 59 | + int fd; |
32 | 60 | +}; | |
33 | +static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id) | ||
34 | +{ | ||
35 | + /* | ||
36 | + * This is a placeholder for use by VCMA until the rest of | ||
37 | + * the ARMv8.2-FP16 extension is implemented for aa32 mode. | ||
38 | + * At which point we can properly set and check MVFR1.FPHP. | ||
39 | + */ | ||
40 | + return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1; | ||
41 | +} | ||
42 | + | 61 | + |
43 | /* | 62 | void assert_hvf_ok(hv_return_t ret); |
44 | * 64-bit feature tests via id registers. | 63 | int hvf_arch_init_vcpu(CPUState *cpu); |
45 | */ | 64 | void hvf_arch_vcpu_destroy(CPUState *cpu); |
46 | @@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_fcma(const ARMISARegisters *id) | 65 | diff --git a/target/i386/hvf/vmx.h b/target/i386/hvf/vmx.h |
47 | return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FCMA) != 0; | 66 | index XXXXXXX..XXXXXXX 100644 |
48 | } | 67 | --- a/target/i386/hvf/vmx.h |
49 | 68 | +++ b/target/i386/hvf/vmx.h | |
50 | +static inline bool isar_feature_aa64_fp16(const ARMISARegisters *id) | 69 | @@ -XXX,XX +XXX,XX @@ |
51 | +{ | 70 | #include "vmcs.h" |
52 | + /* We always set the AdvSIMD and FP fields identically wrt FP16. */ | 71 | #include "cpu.h" |
53 | + return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1; | 72 | #include "x86.h" |
54 | +} | 73 | +#include "sysemu/hvf.h" |
74 | +#include "sysemu/hvf_int.h" | ||
75 | |||
76 | #include "exec/address-spaces.h" | ||
77 | |||
78 | @@ -XXX,XX +XXX,XX @@ static inline void macvm_set_rip(CPUState *cpu, uint64_t rip) | ||
79 | uint64_t val; | ||
80 | |||
81 | /* BUG, should take considering overlap.. */ | ||
82 | - wreg(cpu->hvf_fd, HV_X86_RIP, rip); | ||
83 | + wreg(cpu->hvf->fd, HV_X86_RIP, rip); | ||
84 | env->eip = rip; | ||
85 | |||
86 | /* after moving forward in rip, we need to clean INTERRUPTABILITY */ | ||
87 | - val = rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY); | ||
88 | + val = rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY); | ||
89 | if (val & (VMCS_INTERRUPTIBILITY_STI_BLOCKING | | ||
90 | VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) { | ||
91 | env->hflags &= ~HF_INHIBIT_IRQ_MASK; | ||
92 | - wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, | ||
93 | + wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, | ||
94 | val & ~(VMCS_INTERRUPTIBILITY_STI_BLOCKING | | ||
95 | VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)); | ||
96 | } | ||
97 | @@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_blocking(CPUState *cpu) | ||
98 | CPUX86State *env = &x86_cpu->env; | ||
99 | |||
100 | env->hflags2 &= ~HF2_NMI_MASK; | ||
101 | - uint32_t gi = (uint32_t) rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY); | ||
102 | + uint32_t gi = (uint32_t) rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY); | ||
103 | gi &= ~VMCS_INTERRUPTIBILITY_NMI_BLOCKING; | ||
104 | - wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi); | ||
105 | + wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi); | ||
106 | } | ||
107 | |||
108 | static inline void vmx_set_nmi_blocking(CPUState *cpu) | ||
109 | @@ -XXX,XX +XXX,XX @@ static inline void vmx_set_nmi_blocking(CPUState *cpu) | ||
110 | CPUX86State *env = &x86_cpu->env; | ||
111 | |||
112 | env->hflags2 |= HF2_NMI_MASK; | ||
113 | - uint32_t gi = (uint32_t)rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY); | ||
114 | + uint32_t gi = (uint32_t)rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY); | ||
115 | gi |= VMCS_INTERRUPTIBILITY_NMI_BLOCKING; | ||
116 | - wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi); | ||
117 | + wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi); | ||
118 | } | ||
119 | |||
120 | static inline void vmx_set_nmi_window_exiting(CPUState *cpu) | ||
121 | { | ||
122 | uint64_t val; | ||
123 | - val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS); | ||
124 | - wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val | | ||
125 | + val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS); | ||
126 | + wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val | | ||
127 | VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING); | ||
128 | |||
129 | } | ||
130 | @@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_window_exiting(CPUState *cpu) | ||
131 | { | ||
132 | |||
133 | uint64_t val; | ||
134 | - val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS); | ||
135 | - wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val & | ||
136 | + val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS); | ||
137 | + wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val & | ||
138 | ~VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING); | ||
139 | } | ||
140 | |||
141 | diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c | ||
142 | index XXXXXXX..XXXXXXX 100644 | ||
143 | --- a/accel/hvf/hvf-accel-ops.c | ||
144 | +++ b/accel/hvf/hvf-accel-ops.c | ||
145 | @@ -XXX,XX +XXX,XX @@ type_init(hvf_type_init); | ||
146 | |||
147 | static void hvf_vcpu_destroy(CPUState *cpu) | ||
148 | { | ||
149 | - hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd); | ||
150 | + hv_return_t ret = hv_vcpu_destroy(cpu->hvf->fd); | ||
151 | assert_hvf_ok(ret); | ||
152 | |||
153 | hvf_arch_vcpu_destroy(cpu); | ||
154 | + g_free(cpu->hvf); | ||
155 | + cpu->hvf = NULL; | ||
156 | } | ||
157 | |||
158 | static int hvf_init_vcpu(CPUState *cpu) | ||
159 | { | ||
160 | int r; | ||
161 | |||
162 | + cpu->hvf = g_malloc0(sizeof(*cpu->hvf)); | ||
55 | + | 163 | + |
56 | static inline bool isar_feature_aa64_sve(const ARMISARegisters *id) | 164 | /* init cpu signals */ |
57 | { | 165 | sigset_t set; |
58 | return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, SVE) != 0; | 166 | struct sigaction sigact; |
59 | diff --git a/linux-user/elfload.c b/linux-user/elfload.c | 167 | @@ -XXX,XX +XXX,XX @@ static int hvf_init_vcpu(CPUState *cpu) |
168 | pthread_sigmask(SIG_BLOCK, NULL, &set); | ||
169 | sigdelset(&set, SIG_IPI); | ||
170 | |||
171 | - r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT); | ||
172 | + r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf->fd, HV_VCPU_DEFAULT); | ||
173 | cpu->vcpu_dirty = 1; | ||
174 | assert_hvf_ok(r); | ||
175 | |||
176 | diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c | ||
60 | index XXXXXXX..XXXXXXX 100644 | 177 | index XXXXXXX..XXXXXXX 100644 |
61 | --- a/linux-user/elfload.c | 178 | --- a/target/i386/hvf/hvf.c |
62 | +++ b/linux-user/elfload.c | 179 | +++ b/target/i386/hvf/hvf.c |
63 | @@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void) | 180 | @@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu) |
64 | hwcaps |= ARM_HWCAP_A64_ASIMD; | 181 | int tpr = cpu_get_apic_tpr(x86_cpu->apic_state) << 4; |
65 | 182 | int irr = apic_get_highest_priority_irr(x86_cpu->apic_state); | |
66 | /* probe for the extra features */ | 183 | |
67 | -#define GET_FEATURE(feat, hwcap) \ | 184 | - wreg(cpu->hvf_fd, HV_X86_TPR, tpr); |
68 | - do { if (arm_feature(&cpu->env, feat)) { hwcaps |= hwcap; } } while (0) | 185 | + wreg(cpu->hvf->fd, HV_X86_TPR, tpr); |
69 | #define GET_FEATURE_ID(feat, hwcap) \ | 186 | if (irr == -1) { |
70 | do { if (cpu_isar_feature(feat, cpu)) { hwcaps |= hwcap; } } while (0) | 187 | - wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0); |
71 | 188 | + wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0); | |
72 | @@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void) | 189 | } else { |
73 | GET_FEATURE_ID(aa64_sha3, ARM_HWCAP_A64_SHA3); | 190 | - wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 : |
74 | GET_FEATURE_ID(aa64_sm3, ARM_HWCAP_A64_SM3); | 191 | + wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 : |
75 | GET_FEATURE_ID(aa64_sm4, ARM_HWCAP_A64_SM4); | 192 | irr >> 4); |
76 | - GET_FEATURE(ARM_FEATURE_V8_FP16, | 193 | } |
77 | - ARM_HWCAP_A64_FPHP | ARM_HWCAP_A64_ASIMDHP); | 194 | } |
78 | + GET_FEATURE_ID(aa64_fp16, ARM_HWCAP_A64_FPHP | ARM_HWCAP_A64_ASIMDHP); | 195 | @@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu) |
79 | GET_FEATURE_ID(aa64_atomics, ARM_HWCAP_A64_ATOMICS); | 196 | static void update_apic_tpr(CPUState *cpu) |
80 | GET_FEATURE_ID(aa64_rdm, ARM_HWCAP_A64_ASIMDRDM); | 197 | { |
81 | GET_FEATURE_ID(aa64_dp, ARM_HWCAP_A64_ASIMDDP); | 198 | X86CPU *x86_cpu = X86_CPU(cpu); |
82 | GET_FEATURE_ID(aa64_fcma, ARM_HWCAP_A64_FCMA); | 199 | - int tpr = rreg(cpu->hvf_fd, HV_X86_TPR) >> 4; |
83 | GET_FEATURE_ID(aa64_sve, ARM_HWCAP_A64_SVE); | 200 | + int tpr = rreg(cpu->hvf->fd, HV_X86_TPR) >> 4; |
84 | 201 | cpu_set_apic_tpr(x86_cpu->apic_state, tpr); | |
85 | -#undef GET_FEATURE | 202 | } |
86 | #undef GET_FEATURE_ID | 203 | |
87 | 204 | @@ -XXX,XX +XXX,XX @@ int hvf_arch_init_vcpu(CPUState *cpu) | |
88 | return hwcaps; | 205 | } |
89 | diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c | 206 | |
90 | index XXXXXXX..XXXXXXX 100644 | 207 | /* set VMCS control fields */ |
91 | --- a/target/arm/cpu64.c | 208 | - wvmcs(cpu->hvf_fd, VMCS_PIN_BASED_CTLS, |
92 | +++ b/target/arm/cpu64.c | 209 | + wvmcs(cpu->hvf->fd, VMCS_PIN_BASED_CTLS, |
93 | @@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj) | 210 | cap2ctrl(hvf_state->hvf_caps->vmx_cap_pinbased, |
94 | 211 | VMCS_PIN_BASED_CTLS_EXTINT | | |
95 | t = cpu->isar.id_aa64pfr0; | 212 | VMCS_PIN_BASED_CTLS_NMI | |
96 | t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1); | 213 | VMCS_PIN_BASED_CTLS_VNMI)); |
97 | + t = FIELD_DP64(t, ID_AA64PFR0, FP, 1); | 214 | - wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, |
98 | + t = FIELD_DP64(t, ID_AA64PFR0, ADVSIMD, 1); | 215 | + wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, |
99 | cpu->isar.id_aa64pfr0 = t; | 216 | cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased, |
100 | 217 | VMCS_PRI_PROC_BASED_CTLS_HLT | | |
101 | /* Replicate the same data to the 32-bit id registers. */ | 218 | VMCS_PRI_PROC_BASED_CTLS_MWAIT | |
102 | @@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj) | 219 | VMCS_PRI_PROC_BASED_CTLS_TSC_OFFSET | |
103 | u = FIELD_DP32(u, ID_ISAR6, DP, 1); | 220 | VMCS_PRI_PROC_BASED_CTLS_TPR_SHADOW) | |
104 | cpu->isar.id_isar6 = u; | 221 | VMCS_PRI_PROC_BASED_CTLS_SEC_CONTROL); |
105 | 222 | - wvmcs(cpu->hvf_fd, VMCS_SEC_PROC_BASED_CTLS, | |
106 | -#ifdef CONFIG_USER_ONLY | 223 | + wvmcs(cpu->hvf->fd, VMCS_SEC_PROC_BASED_CTLS, |
107 | - /* We don't set these in system emulation mode for the moment, | 224 | cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased2, |
108 | - * since we don't correctly set the ID registers to advertise them, | 225 | VMCS_PRI_PROC_BASED2_CTLS_APIC_ACCESSES)); |
109 | - * and in some cases they're only available in AArch64 and not AArch32, | 226 | |
110 | - * whereas the architecture requires them to be present in both if | 227 | - wvmcs(cpu->hvf_fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry, |
111 | - * present in either. | 228 | + wvmcs(cpu->hvf->fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry, |
112 | + /* | 229 | 0)); |
113 | + * FIXME: We do not yet support ARMv8.2-fp16 for AArch32 yet, | 230 | - wvmcs(cpu->hvf_fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */ |
114 | + * so do not set MVFR1.FPHP. Strictly speaking this is not legal, | 231 | + wvmcs(cpu->hvf->fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */ |
115 | + * but it is also not legal to enable SVE without support for FP16, | 232 | |
116 | + * and enabling SVE in system mode is more useful in the short term. | 233 | - wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0); |
117 | */ | 234 | + wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0); |
118 | - set_feature(&cpu->env, ARM_FEATURE_V8_FP16); | 235 | |
119 | + | 236 | x86cpu = X86_CPU(cpu); |
120 | +#ifdef CONFIG_USER_ONLY | 237 | x86cpu->env.xsave_buf = qemu_memalign(4096, 4096); |
121 | /* For usermode -cpu max we can use a larger and more efficient DCZ | 238 | |
122 | * blocksize since we don't have to follow what the hardware does. | 239 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_STAR, 1); |
123 | */ | 240 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_LSTAR, 1); |
124 | diff --git a/target/arm/helper.c b/target/arm/helper.c | 241 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_CSTAR, 1); |
125 | index XXXXXXX..XXXXXXX 100644 | 242 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FMASK, 1); |
126 | --- a/target/arm/helper.c | 243 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FSBASE, 1); |
127 | +++ b/target/arm/helper.c | 244 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_GSBASE, 1); |
128 | @@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val) | 245 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_KERNELGSBASE, 1); |
129 | uint32_t changed; | 246 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_TSC_AUX, 1); |
130 | 247 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_TSC, 1); | |
131 | /* When ARMv8.2-FP16 is not supported, FZ16 is RES0. */ | 248 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_CS, 1); |
132 | - if (!arm_feature(env, ARM_FEATURE_V8_FP16)) { | 249 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_EIP, 1); |
133 | + if (!cpu_isar_feature(aa64_fp16, arm_env_get_cpu(env))) { | 250 | - hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_ESP, 1); |
134 | val &= ~FPCR_FZ16; | 251 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_STAR, 1); |
135 | } | 252 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_LSTAR, 1); |
136 | 253 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_CSTAR, 1); | |
137 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | 254 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FMASK, 1); |
138 | index XXXXXXX..XXXXXXX 100644 | 255 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FSBASE, 1); |
139 | --- a/target/arm/translate-a64.c | 256 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_GSBASE, 1); |
140 | +++ b/target/arm/translate-a64.c | 257 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_KERNELGSBASE, 1); |
141 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_compare(DisasContext *s, uint32_t insn) | 258 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_TSC_AUX, 1); |
142 | break; | 259 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_TSC, 1); |
143 | case 3: | 260 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_CS, 1); |
144 | size = MO_16; | 261 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_EIP, 1); |
145 | - if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | 262 | + hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_ESP, 1); |
146 | + if (dc_isar_feature(aa64_fp16, s)) { | 263 | |
264 | return 0; | ||
265 | } | ||
266 | @@ -XXX,XX +XXX,XX @@ static void hvf_store_events(CPUState *cpu, uint32_t ins_len, uint64_t idtvec_in | ||
267 | } | ||
268 | if (idtvec_info & VMCS_IDT_VEC_ERRCODE_VALID) { | ||
269 | env->has_error_code = true; | ||
270 | - env->error_code = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_ERROR); | ||
271 | + env->error_code = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_ERROR); | ||
272 | } | ||
273 | } | ||
274 | - if ((rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) & | ||
275 | + if ((rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) & | ||
276 | VMCS_INTERRUPTIBILITY_NMI_BLOCKING)) { | ||
277 | env->hflags2 |= HF2_NMI_MASK; | ||
278 | } else { | ||
279 | env->hflags2 &= ~HF2_NMI_MASK; | ||
280 | } | ||
281 | - if (rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) & | ||
282 | + if (rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) & | ||
283 | (VMCS_INTERRUPTIBILITY_STI_BLOCKING | | ||
284 | VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) { | ||
285 | env->hflags |= HF_INHIBIT_IRQ_MASK; | ||
286 | @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) | ||
287 | return EXCP_HLT; | ||
288 | } | ||
289 | |||
290 | - hv_return_t r = hv_vcpu_run(cpu->hvf_fd); | ||
291 | + hv_return_t r = hv_vcpu_run(cpu->hvf->fd); | ||
292 | assert_hvf_ok(r); | ||
293 | |||
294 | /* handle VMEXIT */ | ||
295 | - uint64_t exit_reason = rvmcs(cpu->hvf_fd, VMCS_EXIT_REASON); | ||
296 | - uint64_t exit_qual = rvmcs(cpu->hvf_fd, VMCS_EXIT_QUALIFICATION); | ||
297 | - uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf_fd, | ||
298 | + uint64_t exit_reason = rvmcs(cpu->hvf->fd, VMCS_EXIT_REASON); | ||
299 | + uint64_t exit_qual = rvmcs(cpu->hvf->fd, VMCS_EXIT_QUALIFICATION); | ||
300 | + uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf->fd, | ||
301 | VMCS_EXIT_INSTRUCTION_LENGTH); | ||
302 | |||
303 | - uint64_t idtvec_info = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO); | ||
304 | + uint64_t idtvec_info = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO); | ||
305 | |||
306 | hvf_store_events(cpu, ins_len, idtvec_info); | ||
307 | - rip = rreg(cpu->hvf_fd, HV_X86_RIP); | ||
308 | - env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS); | ||
309 | + rip = rreg(cpu->hvf->fd, HV_X86_RIP); | ||
310 | + env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS); | ||
311 | |||
312 | qemu_mutex_lock_iothread(); | ||
313 | |||
314 | @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) | ||
315 | case EXIT_REASON_EPT_FAULT: | ||
316 | { | ||
317 | hvf_slot *slot; | ||
318 | - uint64_t gpa = rvmcs(cpu->hvf_fd, VMCS_GUEST_PHYSICAL_ADDRESS); | ||
319 | + uint64_t gpa = rvmcs(cpu->hvf->fd, VMCS_GUEST_PHYSICAL_ADDRESS); | ||
320 | |||
321 | if (((idtvec_info & VMCS_IDT_VEC_VALID) == 0) && | ||
322 | ((exit_qual & EXIT_QUAL_NMIUDTI) != 0)) { | ||
323 | @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) | ||
324 | store_regs(cpu); | ||
325 | break; | ||
326 | } else if (!string && !in) { | ||
327 | - RAX(env) = rreg(cpu->hvf_fd, HV_X86_RAX); | ||
328 | + RAX(env) = rreg(cpu->hvf->fd, HV_X86_RAX); | ||
329 | hvf_handle_io(env, port, &RAX(env), 1, size, 1); | ||
330 | macvm_set_rip(cpu, rip + ins_len); | ||
331 | break; | ||
332 | @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) | ||
147 | break; | 333 | break; |
148 | } | 334 | } |
149 | /* fallthru */ | 335 | case EXIT_REASON_CPUID: { |
150 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn) | 336 | - uint32_t rax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX); |
151 | break; | 337 | - uint32_t rbx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RBX); |
152 | case 3: | 338 | - uint32_t rcx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX); |
153 | size = MO_16; | 339 | - uint32_t rdx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX); |
154 | - if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | 340 | + uint32_t rax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX); |
155 | + if (dc_isar_feature(aa64_fp16, s)) { | 341 | + uint32_t rbx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RBX); |
342 | + uint32_t rcx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX); | ||
343 | + uint32_t rdx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX); | ||
344 | |||
345 | if (rax == 1) { | ||
346 | /* CPUID1.ecx.OSXSAVE needs to know CR4 */ | ||
347 | - env->cr[4] = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4); | ||
348 | + env->cr[4] = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4); | ||
349 | } | ||
350 | hvf_cpu_x86_cpuid(env, rax, rcx, &rax, &rbx, &rcx, &rdx); | ||
351 | |||
352 | - wreg(cpu->hvf_fd, HV_X86_RAX, rax); | ||
353 | - wreg(cpu->hvf_fd, HV_X86_RBX, rbx); | ||
354 | - wreg(cpu->hvf_fd, HV_X86_RCX, rcx); | ||
355 | - wreg(cpu->hvf_fd, HV_X86_RDX, rdx); | ||
356 | + wreg(cpu->hvf->fd, HV_X86_RAX, rax); | ||
357 | + wreg(cpu->hvf->fd, HV_X86_RBX, rbx); | ||
358 | + wreg(cpu->hvf->fd, HV_X86_RCX, rcx); | ||
359 | + wreg(cpu->hvf->fd, HV_X86_RDX, rdx); | ||
360 | |||
361 | macvm_set_rip(cpu, rip + ins_len); | ||
362 | break; | ||
363 | @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) | ||
364 | case EXIT_REASON_XSETBV: { | ||
365 | X86CPU *x86_cpu = X86_CPU(cpu); | ||
366 | CPUX86State *env = &x86_cpu->env; | ||
367 | - uint32_t eax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX); | ||
368 | - uint32_t ecx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX); | ||
369 | - uint32_t edx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX); | ||
370 | + uint32_t eax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX); | ||
371 | + uint32_t ecx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX); | ||
372 | + uint32_t edx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX); | ||
373 | |||
374 | if (ecx) { | ||
375 | macvm_set_rip(cpu, rip + ins_len); | ||
376 | break; | ||
377 | } | ||
378 | env->xcr0 = ((uint64_t)edx << 32) | eax; | ||
379 | - wreg(cpu->hvf_fd, HV_X86_XCR0, env->xcr0 | 1); | ||
380 | + wreg(cpu->hvf->fd, HV_X86_XCR0, env->xcr0 | 1); | ||
381 | macvm_set_rip(cpu, rip + ins_len); | ||
156 | break; | 382 | break; |
157 | } | 383 | } |
158 | /* fallthru */ | 384 | @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) |
159 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn) | 385 | |
160 | break; | 386 | switch (cr) { |
161 | case 3: | 387 | case 0x0: { |
162 | sz = MO_16; | 388 | - macvm_set_cr0(cpu->hvf_fd, RRX(env, reg)); |
163 | - if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | 389 | + macvm_set_cr0(cpu->hvf->fd, RRX(env, reg)); |
164 | + if (dc_isar_feature(aa64_fp16, s)) { | 390 | break; |
391 | } | ||
392 | case 4: { | ||
393 | - macvm_set_cr4(cpu->hvf_fd, RRX(env, reg)); | ||
394 | + macvm_set_cr4(cpu->hvf->fd, RRX(env, reg)); | ||
395 | break; | ||
396 | } | ||
397 | case 8: { | ||
398 | @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) | ||
165 | break; | 399 | break; |
166 | } | 400 | } |
167 | /* fallthru */ | 401 | case EXIT_REASON_TASK_SWITCH: { |
168 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn) | 402 | - uint64_t vinfo = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO); |
169 | handle_fp_1src_double(s, opcode, rd, rn); | 403 | + uint64_t vinfo = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO); |
170 | break; | 404 | x68_segment_selector sel = {.sel = exit_qual & 0xffff}; |
171 | case 3: | 405 | vmx_handle_task_switch(cpu, sel, (exit_qual >> 30) & 0x3, |
172 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | 406 | vinfo & VMCS_INTR_VALID, vinfo & VECTORING_INFO_VECTOR_MASK, vinfo |
173 | + if (!dc_isar_feature(aa64_fp16, s)) { | 407 | @@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu) |
174 | unallocated_encoding(s); | ||
175 | return; | ||
176 | } | ||
177 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_2src(DisasContext *s, uint32_t insn) | ||
178 | handle_fp_2src_double(s, opcode, rd, rn, rm); | ||
179 | break; | ||
180 | case 3: | ||
181 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | ||
182 | + if (!dc_isar_feature(aa64_fp16, s)) { | ||
183 | unallocated_encoding(s); | ||
184 | return; | ||
185 | } | ||
186 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_3src(DisasContext *s, uint32_t insn) | ||
187 | handle_fp_3src_double(s, o0, o1, rd, rn, rm, ra); | ||
188 | break; | ||
189 | case 3: | ||
190 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | ||
191 | + if (!dc_isar_feature(aa64_fp16, s)) { | ||
192 | unallocated_encoding(s); | ||
193 | return; | ||
194 | } | ||
195 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_imm(DisasContext *s, uint32_t insn) | ||
196 | break; | ||
197 | case 3: | ||
198 | sz = MO_16; | ||
199 | - if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | ||
200 | + if (dc_isar_feature(aa64_fp16, s)) { | ||
201 | break; | 408 | break; |
202 | } | 409 | } |
203 | /* fallthru */ | 410 | case EXIT_REASON_RDPMC: |
204 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_fixed_conv(DisasContext *s, uint32_t insn) | 411 | - wreg(cpu->hvf_fd, HV_X86_RAX, 0); |
205 | case 1: /* float64 */ | 412 | - wreg(cpu->hvf_fd, HV_X86_RDX, 0); |
413 | + wreg(cpu->hvf->fd, HV_X86_RAX, 0); | ||
414 | + wreg(cpu->hvf->fd, HV_X86_RDX, 0); | ||
415 | macvm_set_rip(cpu, rip + ins_len); | ||
416 | break; | ||
417 | case VMX_REASON_VMCALL: | ||
418 | diff --git a/target/i386/hvf/x86.c b/target/i386/hvf/x86.c | ||
419 | index XXXXXXX..XXXXXXX 100644 | ||
420 | --- a/target/i386/hvf/x86.c | ||
421 | +++ b/target/i386/hvf/x86.c | ||
422 | @@ -XXX,XX +XXX,XX @@ bool x86_read_segment_descriptor(struct CPUState *cpu, | ||
423 | } | ||
424 | |||
425 | if (GDT_SEL == sel.ti) { | ||
426 | - base = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE); | ||
427 | - limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT); | ||
428 | + base = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE); | ||
429 | + limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT); | ||
430 | } else { | ||
431 | - base = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE); | ||
432 | - limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT); | ||
433 | + base = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE); | ||
434 | + limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT); | ||
435 | } | ||
436 | |||
437 | if (sel.index * 8 >= limit) { | ||
438 | @@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu, | ||
439 | uint32_t limit; | ||
440 | |||
441 | if (GDT_SEL == sel.ti) { | ||
442 | - base = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE); | ||
443 | - limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT); | ||
444 | + base = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE); | ||
445 | + limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT); | ||
446 | } else { | ||
447 | - base = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE); | ||
448 | - limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT); | ||
449 | + base = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE); | ||
450 | + limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT); | ||
451 | } | ||
452 | |||
453 | if (sel.index * 8 >= limit) { | ||
454 | @@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu, | ||
455 | bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc, | ||
456 | int gate) | ||
457 | { | ||
458 | - target_ulong base = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_BASE); | ||
459 | - uint32_t limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_LIMIT); | ||
460 | + target_ulong base = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_BASE); | ||
461 | + uint32_t limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_LIMIT); | ||
462 | |||
463 | memset(idt_desc, 0, sizeof(*idt_desc)); | ||
464 | if (gate * 8 >= limit) { | ||
465 | @@ -XXX,XX +XXX,XX @@ bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc, | ||
466 | |||
467 | bool x86_is_protected(struct CPUState *cpu) | ||
468 | { | ||
469 | - uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0); | ||
470 | + uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0); | ||
471 | return cr0 & CR0_PE; | ||
472 | } | ||
473 | |||
474 | @@ -XXX,XX +XXX,XX @@ bool x86_is_v8086(struct CPUState *cpu) | ||
475 | |||
476 | bool x86_is_long_mode(struct CPUState *cpu) | ||
477 | { | ||
478 | - return rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA; | ||
479 | + return rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA; | ||
480 | } | ||
481 | |||
482 | bool x86_is_long64_mode(struct CPUState *cpu) | ||
483 | @@ -XXX,XX +XXX,XX @@ bool x86_is_long64_mode(struct CPUState *cpu) | ||
484 | |||
485 | bool x86_is_paging_mode(struct CPUState *cpu) | ||
486 | { | ||
487 | - uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0); | ||
488 | + uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0); | ||
489 | return cr0 & CR0_PG; | ||
490 | } | ||
491 | |||
492 | bool x86_is_pae_enabled(struct CPUState *cpu) | ||
493 | { | ||
494 | - uint64_t cr4 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4); | ||
495 | + uint64_t cr4 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4); | ||
496 | return cr4 & CR4_PAE; | ||
497 | } | ||
498 | |||
499 | diff --git a/target/i386/hvf/x86_descr.c b/target/i386/hvf/x86_descr.c | ||
500 | index XXXXXXX..XXXXXXX 100644 | ||
501 | --- a/target/i386/hvf/x86_descr.c | ||
502 | +++ b/target/i386/hvf/x86_descr.c | ||
503 | @@ -XXX,XX +XXX,XX @@ static const struct vmx_segment_field { | ||
504 | |||
505 | uint32_t vmx_read_segment_limit(CPUState *cpu, X86Seg seg) | ||
506 | { | ||
507 | - return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit); | ||
508 | + return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit); | ||
509 | } | ||
510 | |||
511 | uint32_t vmx_read_segment_ar(CPUState *cpu, X86Seg seg) | ||
512 | { | ||
513 | - return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes); | ||
514 | + return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes); | ||
515 | } | ||
516 | |||
517 | uint64_t vmx_read_segment_base(CPUState *cpu, X86Seg seg) | ||
518 | { | ||
519 | - return rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base); | ||
520 | + return rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base); | ||
521 | } | ||
522 | |||
523 | x68_segment_selector vmx_read_segment_selector(CPUState *cpu, X86Seg seg) | ||
524 | { | ||
525 | x68_segment_selector sel; | ||
526 | - sel.sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector); | ||
527 | + sel.sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector); | ||
528 | return sel; | ||
529 | } | ||
530 | |||
531 | void vmx_write_segment_selector(struct CPUState *cpu, x68_segment_selector selector, X86Seg seg) | ||
532 | { | ||
533 | - wvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector, selector.sel); | ||
534 | + wvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector, selector.sel); | ||
535 | } | ||
536 | |||
537 | void vmx_read_segment_descriptor(struct CPUState *cpu, struct vmx_segment *desc, X86Seg seg) | ||
538 | { | ||
539 | - desc->sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector); | ||
540 | - desc->base = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base); | ||
541 | - desc->limit = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit); | ||
542 | - desc->ar = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes); | ||
543 | + desc->sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector); | ||
544 | + desc->base = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base); | ||
545 | + desc->limit = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit); | ||
546 | + desc->ar = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes); | ||
547 | } | ||
548 | |||
549 | void vmx_write_segment_descriptor(CPUState *cpu, struct vmx_segment *desc, X86Seg seg) | ||
550 | { | ||
551 | const struct vmx_segment_field *sf = &vmx_segment_fields[seg]; | ||
552 | |||
553 | - wvmcs(cpu->hvf_fd, sf->base, desc->base); | ||
554 | - wvmcs(cpu->hvf_fd, sf->limit, desc->limit); | ||
555 | - wvmcs(cpu->hvf_fd, sf->selector, desc->sel); | ||
556 | - wvmcs(cpu->hvf_fd, sf->ar_bytes, desc->ar); | ||
557 | + wvmcs(cpu->hvf->fd, sf->base, desc->base); | ||
558 | + wvmcs(cpu->hvf->fd, sf->limit, desc->limit); | ||
559 | + wvmcs(cpu->hvf->fd, sf->selector, desc->sel); | ||
560 | + wvmcs(cpu->hvf->fd, sf->ar_bytes, desc->ar); | ||
561 | } | ||
562 | |||
563 | void x86_segment_descriptor_to_vmx(struct CPUState *cpu, x68_segment_selector selector, struct x86_segment_descriptor *desc, struct vmx_segment *vmx_desc) | ||
564 | diff --git a/target/i386/hvf/x86_emu.c b/target/i386/hvf/x86_emu.c | ||
565 | index XXXXXXX..XXXXXXX 100644 | ||
566 | --- a/target/i386/hvf/x86_emu.c | ||
567 | +++ b/target/i386/hvf/x86_emu.c | ||
568 | @@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu) | ||
569 | |||
570 | switch (msr) { | ||
571 | case MSR_IA32_TSC: | ||
572 | - val = rdtscp() + rvmcs(cpu->hvf_fd, VMCS_TSC_OFFSET); | ||
573 | + val = rdtscp() + rvmcs(cpu->hvf->fd, VMCS_TSC_OFFSET); | ||
206 | break; | 574 | break; |
207 | case 3: /* float16 */ | 575 | case MSR_IA32_APICBASE: |
208 | - if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | 576 | val = cpu_get_apic_base(X86_CPU(cpu)->apic_state); |
209 | + if (dc_isar_feature(aa64_fp16, s)) { | 577 | @@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu) |
210 | break; | 578 | val = x86_cpu->ucode_rev; |
211 | } | 579 | break; |
212 | /* fallthru */ | 580 | case MSR_EFER: |
213 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn) | 581 | - val = rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER); |
214 | break; | 582 | + val = rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER); |
215 | case 0x6: /* 16-bit float, 32-bit int */ | 583 | break; |
216 | case 0xe: /* 16-bit float, 64-bit int */ | 584 | case MSR_FSBASE: |
217 | - if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | 585 | - val = rvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE); |
218 | + if (dc_isar_feature(aa64_fp16, s)) { | 586 | + val = rvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE); |
219 | break; | 587 | break; |
220 | } | 588 | case MSR_GSBASE: |
221 | /* fallthru */ | 589 | - val = rvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE); |
222 | @@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn) | 590 | + val = rvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE); |
223 | case 1: /* float64 */ | 591 | break; |
224 | break; | 592 | case MSR_KERNELGSBASE: |
225 | case 3: /* float16 */ | 593 | - val = rvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE); |
226 | - if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | 594 | + val = rvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE); |
227 | + if (dc_isar_feature(aa64_fp16, s)) { | 595 | break; |
228 | break; | 596 | case MSR_STAR: |
229 | } | 597 | abort(); |
230 | /* fallthru */ | 598 | @@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu) |
231 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_across_lanes(DisasContext *s, uint32_t insn) | 599 | cpu_set_apic_base(X86_CPU(cpu)->apic_state, data); |
232 | */ | 600 | break; |
233 | is_min = extract32(size, 1, 1); | 601 | case MSR_FSBASE: |
234 | is_fp = true; | 602 | - wvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE, data); |
235 | - if (!is_u && arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | 603 | + wvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE, data); |
236 | + if (!is_u && dc_isar_feature(aa64_fp16, s)) { | 604 | break; |
237 | size = 1; | 605 | case MSR_GSBASE: |
238 | } else if (!is_u || !is_q || extract32(size, 0, 1)) { | 606 | - wvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE, data); |
239 | unallocated_encoding(s); | 607 | + wvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE, data); |
240 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn) | 608 | break; |
241 | 609 | case MSR_KERNELGSBASE: | |
242 | if (o2 != 0 || ((cmode == 0xf) && is_neg && !is_q)) { | 610 | - wvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE, data); |
243 | /* Check for FMOV (vector, immediate) - half-precision */ | 611 | + wvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE, data); |
244 | - if (!(arm_dc_feature(s, ARM_FEATURE_V8_FP16) && o2 && cmode == 0xf)) { | 612 | break; |
245 | + if (!(dc_isar_feature(aa64_fp16, s) && o2 && cmode == 0xf)) { | 613 | case MSR_STAR: |
246 | unallocated_encoding(s); | 614 | abort(); |
247 | return; | 615 | @@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu) |
248 | } | 616 | break; |
249 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_pairwise(DisasContext *s, uint32_t insn) | 617 | case MSR_EFER: |
250 | case 0x2f: /* FMINP */ | 618 | /*printf("new efer %llx\n", EFER(cpu));*/ |
251 | /* FP op, size[0] is 32 or 64 bit*/ | 619 | - wvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER, data); |
252 | if (!u) { | 620 | + wvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER, data); |
253 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | 621 | if (data & MSR_EFER_NXE) { |
254 | + if (!dc_isar_feature(aa64_fp16, s)) { | 622 | - hv_vcpu_invalidate_tlb(cpu->hvf_fd); |
255 | unallocated_encoding(s); | 623 | + hv_vcpu_invalidate_tlb(cpu->hvf->fd); |
256 | return; | ||
257 | } else { | ||
258 | @@ -XXX,XX +XXX,XX @@ static void handle_simd_shift_intfp_conv(DisasContext *s, bool is_scalar, | ||
259 | size = MO_32; | ||
260 | } else if (immh & 2) { | ||
261 | size = MO_16; | ||
262 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | ||
263 | + if (!dc_isar_feature(aa64_fp16, s)) { | ||
264 | unallocated_encoding(s); | ||
265 | return; | ||
266 | } | ||
267 | @@ -XXX,XX +XXX,XX @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar, | ||
268 | size = MO_32; | ||
269 | } else if (immh & 0x2) { | ||
270 | size = MO_16; | ||
271 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | ||
272 | + if (!dc_isar_feature(aa64_fp16, s)) { | ||
273 | unallocated_encoding(s); | ||
274 | return; | ||
275 | } | ||
276 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_same_fp16(DisasContext *s, | ||
277 | return; | ||
278 | } | ||
279 | |||
280 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | ||
281 | + if (!dc_isar_feature(aa64_fp16, s)) { | ||
282 | unallocated_encoding(s); | ||
283 | } | ||
284 | |||
285 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_fp16(DisasContext *s, uint32_t insn) | ||
286 | TCGv_ptr fpst; | ||
287 | bool pairwise = false; | ||
288 | |||
289 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | ||
290 | + if (!dc_isar_feature(aa64_fp16, s)) { | ||
291 | unallocated_encoding(s); | ||
292 | return; | ||
293 | } | ||
294 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn) | ||
295 | case 0x1c: /* FCADD, #90 */ | ||
296 | case 0x1e: /* FCADD, #270 */ | ||
297 | if (size == 0 | ||
298 | - || (size == 1 && !arm_dc_feature(s, ARM_FEATURE_V8_FP16)) | ||
299 | + || (size == 1 && !dc_isar_feature(aa64_fp16, s)) | ||
300 | || (size == 3 && !is_q)) { | ||
301 | unallocated_encoding(s); | ||
302 | return; | ||
303 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn) | ||
304 | bool need_fpst = true; | ||
305 | int rmode; | ||
306 | |||
307 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | ||
308 | + if (!dc_isar_feature(aa64_fp16, s)) { | ||
309 | unallocated_encoding(s); | ||
310 | return; | ||
311 | } | ||
312 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn) | ||
313 | } | 624 | } |
314 | break; | 625 | break; |
315 | } | 626 | case MSR_MTRRphysBase(0): |
316 | - if (is_fp16 && !arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | 627 | @@ -XXX,XX +XXX,XX @@ void load_regs(struct CPUState *cpu) |
317 | + if (is_fp16 && !dc_isar_feature(aa64_fp16, s)) { | 628 | CPUX86State *env = &x86_cpu->env; |
318 | unallocated_encoding(s); | 629 | |
630 | int i = 0; | ||
631 | - RRX(env, R_EAX) = rreg(cpu->hvf_fd, HV_X86_RAX); | ||
632 | - RRX(env, R_EBX) = rreg(cpu->hvf_fd, HV_X86_RBX); | ||
633 | - RRX(env, R_ECX) = rreg(cpu->hvf_fd, HV_X86_RCX); | ||
634 | - RRX(env, R_EDX) = rreg(cpu->hvf_fd, HV_X86_RDX); | ||
635 | - RRX(env, R_ESI) = rreg(cpu->hvf_fd, HV_X86_RSI); | ||
636 | - RRX(env, R_EDI) = rreg(cpu->hvf_fd, HV_X86_RDI); | ||
637 | - RRX(env, R_ESP) = rreg(cpu->hvf_fd, HV_X86_RSP); | ||
638 | - RRX(env, R_EBP) = rreg(cpu->hvf_fd, HV_X86_RBP); | ||
639 | + RRX(env, R_EAX) = rreg(cpu->hvf->fd, HV_X86_RAX); | ||
640 | + RRX(env, R_EBX) = rreg(cpu->hvf->fd, HV_X86_RBX); | ||
641 | + RRX(env, R_ECX) = rreg(cpu->hvf->fd, HV_X86_RCX); | ||
642 | + RRX(env, R_EDX) = rreg(cpu->hvf->fd, HV_X86_RDX); | ||
643 | + RRX(env, R_ESI) = rreg(cpu->hvf->fd, HV_X86_RSI); | ||
644 | + RRX(env, R_EDI) = rreg(cpu->hvf->fd, HV_X86_RDI); | ||
645 | + RRX(env, R_ESP) = rreg(cpu->hvf->fd, HV_X86_RSP); | ||
646 | + RRX(env, R_EBP) = rreg(cpu->hvf->fd, HV_X86_RBP); | ||
647 | for (i = 8; i < 16; i++) { | ||
648 | - RRX(env, i) = rreg(cpu->hvf_fd, HV_X86_RAX + i); | ||
649 | + RRX(env, i) = rreg(cpu->hvf->fd, HV_X86_RAX + i); | ||
650 | } | ||
651 | |||
652 | - env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS); | ||
653 | + env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS); | ||
654 | rflags_to_lflags(env); | ||
655 | - env->eip = rreg(cpu->hvf_fd, HV_X86_RIP); | ||
656 | + env->eip = rreg(cpu->hvf->fd, HV_X86_RIP); | ||
657 | } | ||
658 | |||
659 | void store_regs(struct CPUState *cpu) | ||
660 | @@ -XXX,XX +XXX,XX @@ void store_regs(struct CPUState *cpu) | ||
661 | CPUX86State *env = &x86_cpu->env; | ||
662 | |||
663 | int i = 0; | ||
664 | - wreg(cpu->hvf_fd, HV_X86_RAX, RAX(env)); | ||
665 | - wreg(cpu->hvf_fd, HV_X86_RBX, RBX(env)); | ||
666 | - wreg(cpu->hvf_fd, HV_X86_RCX, RCX(env)); | ||
667 | - wreg(cpu->hvf_fd, HV_X86_RDX, RDX(env)); | ||
668 | - wreg(cpu->hvf_fd, HV_X86_RSI, RSI(env)); | ||
669 | - wreg(cpu->hvf_fd, HV_X86_RDI, RDI(env)); | ||
670 | - wreg(cpu->hvf_fd, HV_X86_RBP, RBP(env)); | ||
671 | - wreg(cpu->hvf_fd, HV_X86_RSP, RSP(env)); | ||
672 | + wreg(cpu->hvf->fd, HV_X86_RAX, RAX(env)); | ||
673 | + wreg(cpu->hvf->fd, HV_X86_RBX, RBX(env)); | ||
674 | + wreg(cpu->hvf->fd, HV_X86_RCX, RCX(env)); | ||
675 | + wreg(cpu->hvf->fd, HV_X86_RDX, RDX(env)); | ||
676 | + wreg(cpu->hvf->fd, HV_X86_RSI, RSI(env)); | ||
677 | + wreg(cpu->hvf->fd, HV_X86_RDI, RDI(env)); | ||
678 | + wreg(cpu->hvf->fd, HV_X86_RBP, RBP(env)); | ||
679 | + wreg(cpu->hvf->fd, HV_X86_RSP, RSP(env)); | ||
680 | for (i = 8; i < 16; i++) { | ||
681 | - wreg(cpu->hvf_fd, HV_X86_RAX + i, RRX(env, i)); | ||
682 | + wreg(cpu->hvf->fd, HV_X86_RAX + i, RRX(env, i)); | ||
683 | } | ||
684 | |||
685 | lflags_to_rflags(env); | ||
686 | - wreg(cpu->hvf_fd, HV_X86_RFLAGS, env->eflags); | ||
687 | + wreg(cpu->hvf->fd, HV_X86_RFLAGS, env->eflags); | ||
688 | macvm_set_rip(cpu, env->eip); | ||
689 | } | ||
690 | |||
691 | diff --git a/target/i386/hvf/x86_mmu.c b/target/i386/hvf/x86_mmu.c | ||
692 | index XXXXXXX..XXXXXXX 100644 | ||
693 | --- a/target/i386/hvf/x86_mmu.c | ||
694 | +++ b/target/i386/hvf/x86_mmu.c | ||
695 | @@ -XXX,XX +XXX,XX @@ static bool test_pt_entry(struct CPUState *cpu, struct gpt_translation *pt, | ||
696 | pt->err_code |= MMU_PAGE_PT; | ||
697 | } | ||
698 | |||
699 | - uint32_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0); | ||
700 | + uint32_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0); | ||
701 | /* check protection */ | ||
702 | if (cr0 & CR0_WP) { | ||
703 | if (pt->write_access && !pte_write_access(pte)) { | ||
704 | @@ -XXX,XX +XXX,XX @@ static bool walk_gpt(struct CPUState *cpu, target_ulong addr, int err_code, | ||
705 | { | ||
706 | int top_level, level; | ||
707 | bool is_large = false; | ||
708 | - target_ulong cr3 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR3); | ||
709 | + target_ulong cr3 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR3); | ||
710 | uint64_t page_mask = pae ? PAE_PTE_PAGE_MASK : LEGACY_PTE_PAGE_MASK; | ||
711 | |||
712 | memset(pt, 0, sizeof(*pt)); | ||
713 | diff --git a/target/i386/hvf/x86_task.c b/target/i386/hvf/x86_task.c | ||
714 | index XXXXXXX..XXXXXXX 100644 | ||
715 | --- a/target/i386/hvf/x86_task.c | ||
716 | +++ b/target/i386/hvf/x86_task.c | ||
717 | @@ -XXX,XX +XXX,XX @@ static void load_state_from_tss32(CPUState *cpu, struct x86_tss_segment32 *tss) | ||
718 | X86CPU *x86_cpu = X86_CPU(cpu); | ||
719 | CPUX86State *env = &x86_cpu->env; | ||
720 | |||
721 | - wvmcs(cpu->hvf_fd, VMCS_GUEST_CR3, tss->cr3); | ||
722 | + wvmcs(cpu->hvf->fd, VMCS_GUEST_CR3, tss->cr3); | ||
723 | |||
724 | env->eip = tss->eip; | ||
725 | env->eflags = tss->eflags | 2; | ||
726 | @@ -XXX,XX +XXX,XX @@ static int task_switch_32(CPUState *cpu, x68_segment_selector tss_sel, x68_segme | ||
727 | |||
728 | void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int reason, bool gate_valid, uint8_t gate, uint64_t gate_type) | ||
729 | { | ||
730 | - uint64_t rip = rreg(cpu->hvf_fd, HV_X86_RIP); | ||
731 | + uint64_t rip = rreg(cpu->hvf->fd, HV_X86_RIP); | ||
732 | if (!gate_valid || (gate_type != VMCS_INTR_T_HWEXCEPTION && | ||
733 | gate_type != VMCS_INTR_T_HWINTR && | ||
734 | gate_type != VMCS_INTR_T_NMI)) { | ||
735 | - int ins_len = rvmcs(cpu->hvf_fd, VMCS_EXIT_INSTRUCTION_LENGTH); | ||
736 | + int ins_len = rvmcs(cpu->hvf->fd, VMCS_EXIT_INSTRUCTION_LENGTH); | ||
737 | macvm_set_rip(cpu, rip + ins_len); | ||
319 | return; | 738 | return; |
320 | } | 739 | } |
321 | diff --git a/target/arm/translate.c b/target/arm/translate.c | 740 | @@ -XXX,XX +XXX,XX @@ void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int rea |
741 | //ret = task_switch_16(cpu, tss_sel, old_tss_sel, old_tss_base, &next_tss_desc); | ||
742 | VM_PANIC("task_switch_16"); | ||
743 | |||
744 | - macvm_set_cr0(cpu->hvf_fd, rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0) | CR0_TS); | ||
745 | + macvm_set_cr0(cpu->hvf->fd, rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0) | CR0_TS); | ||
746 | x86_segment_descriptor_to_vmx(cpu, tss_sel, &next_tss_desc, &vmx_seg); | ||
747 | vmx_write_segment_descriptor(cpu, &vmx_seg, R_TR); | ||
748 | |||
749 | store_regs(cpu); | ||
750 | |||
751 | - hv_vcpu_invalidate_tlb(cpu->hvf_fd); | ||
752 | - hv_vcpu_flush(cpu->hvf_fd); | ||
753 | + hv_vcpu_invalidate_tlb(cpu->hvf->fd); | ||
754 | + hv_vcpu_flush(cpu->hvf->fd); | ||
755 | } | ||
756 | diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c | ||
322 | index XXXXXXX..XXXXXXX 100644 | 757 | index XXXXXXX..XXXXXXX 100644 |
323 | --- a/target/arm/translate.c | 758 | --- a/target/i386/hvf/x86hvf.c |
324 | +++ b/target/arm/translate.c | 759 | +++ b/target/i386/hvf/x86hvf.c |
325 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn) | 760 | @@ -XXX,XX +XXX,XX @@ void hvf_put_xsave(CPUState *cpu_state) |
326 | int size = extract32(insn, 20, 1); | 761 | |
327 | data = extract32(insn, 23, 2); /* rot */ | 762 | x86_cpu_xsave_all_areas(X86_CPU(cpu_state), xsave); |
328 | if (!dc_isar_feature(aa32_vcma, s) | 763 | |
329 | - || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) { | 764 | - if (hv_vcpu_write_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) { |
330 | + || (!size && !dc_isar_feature(aa32_fp16_arith, s))) { | 765 | + if (hv_vcpu_write_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) { |
331 | return 1; | 766 | abort(); |
767 | } | ||
768 | } | ||
769 | @@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state) | ||
770 | CPUX86State *env = &X86_CPU(cpu_state)->env; | ||
771 | struct vmx_segment seg; | ||
772 | |||
773 | - wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit); | ||
774 | - wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE, env->idt.base); | ||
775 | + wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit); | ||
776 | + wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE, env->idt.base); | ||
777 | |||
778 | - wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit); | ||
779 | - wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE, env->gdt.base); | ||
780 | + wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit); | ||
781 | + wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE, env->gdt.base); | ||
782 | |||
783 | - /* wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR2, env->cr[2]); */ | ||
784 | - wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3, env->cr[3]); | ||
785 | + /* wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR2, env->cr[2]); */ | ||
786 | + wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3, env->cr[3]); | ||
787 | vmx_update_tpr(cpu_state); | ||
788 | - wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER, env->efer); | ||
789 | + wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER, env->efer); | ||
790 | |||
791 | - macvm_set_cr4(cpu_state->hvf_fd, env->cr[4]); | ||
792 | - macvm_set_cr0(cpu_state->hvf_fd, env->cr[0]); | ||
793 | + macvm_set_cr4(cpu_state->hvf->fd, env->cr[4]); | ||
794 | + macvm_set_cr0(cpu_state->hvf->fd, env->cr[0]); | ||
795 | |||
796 | hvf_set_segment(cpu_state, &seg, &env->segs[R_CS], false); | ||
797 | vmx_write_segment_descriptor(cpu_state, &seg, R_CS); | ||
798 | @@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state) | ||
799 | hvf_set_segment(cpu_state, &seg, &env->ldt, false); | ||
800 | vmx_write_segment_descriptor(cpu_state, &seg, R_LDTR); | ||
801 | |||
802 | - hv_vcpu_flush(cpu_state->hvf_fd); | ||
803 | + hv_vcpu_flush(cpu_state->hvf->fd); | ||
804 | } | ||
805 | |||
806 | void hvf_put_msrs(CPUState *cpu_state) | ||
807 | { | ||
808 | CPUX86State *env = &X86_CPU(cpu_state)->env; | ||
809 | |||
810 | - hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS, | ||
811 | + hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS, | ||
812 | env->sysenter_cs); | ||
813 | - hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP, | ||
814 | + hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP, | ||
815 | env->sysenter_esp); | ||
816 | - hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP, | ||
817 | + hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP, | ||
818 | env->sysenter_eip); | ||
819 | |||
820 | - hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_STAR, env->star); | ||
821 | + hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_STAR, env->star); | ||
822 | |||
823 | #ifdef TARGET_X86_64 | ||
824 | - hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_CSTAR, env->cstar); | ||
825 | - hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, env->kernelgsbase); | ||
826 | - hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FMASK, env->fmask); | ||
827 | - hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_LSTAR, env->lstar); | ||
828 | + hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_CSTAR, env->cstar); | ||
829 | + hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, env->kernelgsbase); | ||
830 | + hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FMASK, env->fmask); | ||
831 | + hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_LSTAR, env->lstar); | ||
832 | #endif | ||
833 | |||
834 | - hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_GSBASE, env->segs[R_GS].base); | ||
835 | - hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FSBASE, env->segs[R_FS].base); | ||
836 | + hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_GSBASE, env->segs[R_GS].base); | ||
837 | + hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FSBASE, env->segs[R_FS].base); | ||
838 | } | ||
839 | |||
840 | |||
841 | @@ -XXX,XX +XXX,XX @@ void hvf_get_xsave(CPUState *cpu_state) | ||
842 | |||
843 | xsave = X86_CPU(cpu_state)->env.xsave_buf; | ||
844 | |||
845 | - if (hv_vcpu_read_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) { | ||
846 | + if (hv_vcpu_read_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) { | ||
847 | abort(); | ||
848 | } | ||
849 | |||
850 | @@ -XXX,XX +XXX,XX @@ void hvf_get_segments(CPUState *cpu_state) | ||
851 | vmx_read_segment_descriptor(cpu_state, &seg, R_LDTR); | ||
852 | hvf_get_segment(&env->ldt, &seg); | ||
853 | |||
854 | - env->idt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT); | ||
855 | - env->idt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE); | ||
856 | - env->gdt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT); | ||
857 | - env->gdt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE); | ||
858 | + env->idt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT); | ||
859 | + env->idt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE); | ||
860 | + env->gdt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT); | ||
861 | + env->gdt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE); | ||
862 | |||
863 | - env->cr[0] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR0); | ||
864 | + env->cr[0] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR0); | ||
865 | env->cr[2] = 0; | ||
866 | - env->cr[3] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3); | ||
867 | - env->cr[4] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR4); | ||
868 | + env->cr[3] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3); | ||
869 | + env->cr[4] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR4); | ||
870 | |||
871 | - env->efer = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER); | ||
872 | + env->efer = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER); | ||
873 | } | ||
874 | |||
875 | void hvf_get_msrs(CPUState *cpu_state) | ||
876 | @@ -XXX,XX +XXX,XX @@ void hvf_get_msrs(CPUState *cpu_state) | ||
877 | CPUX86State *env = &X86_CPU(cpu_state)->env; | ||
878 | uint64_t tmp; | ||
879 | |||
880 | - hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS, &tmp); | ||
881 | + hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS, &tmp); | ||
882 | env->sysenter_cs = tmp; | ||
883 | |||
884 | - hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP, &tmp); | ||
885 | + hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP, &tmp); | ||
886 | env->sysenter_esp = tmp; | ||
887 | |||
888 | - hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP, &tmp); | ||
889 | + hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP, &tmp); | ||
890 | env->sysenter_eip = tmp; | ||
891 | |||
892 | - hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_STAR, &env->star); | ||
893 | + hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_STAR, &env->star); | ||
894 | |||
895 | #ifdef TARGET_X86_64 | ||
896 | - hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_CSTAR, &env->cstar); | ||
897 | - hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, &env->kernelgsbase); | ||
898 | - hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_FMASK, &env->fmask); | ||
899 | - hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_LSTAR, &env->lstar); | ||
900 | + hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_CSTAR, &env->cstar); | ||
901 | + hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, &env->kernelgsbase); | ||
902 | + hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_FMASK, &env->fmask); | ||
903 | + hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_LSTAR, &env->lstar); | ||
904 | #endif | ||
905 | |||
906 | - hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_APICBASE, &tmp); | ||
907 | + hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_APICBASE, &tmp); | ||
908 | |||
909 | - env->tsc = rdtscp() + rvmcs(cpu_state->hvf_fd, VMCS_TSC_OFFSET); | ||
910 | + env->tsc = rdtscp() + rvmcs(cpu_state->hvf->fd, VMCS_TSC_OFFSET); | ||
911 | } | ||
912 | |||
913 | int hvf_put_registers(CPUState *cpu_state) | ||
914 | @@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state) | ||
915 | X86CPU *x86cpu = X86_CPU(cpu_state); | ||
916 | CPUX86State *env = &x86cpu->env; | ||
917 | |||
918 | - wreg(cpu_state->hvf_fd, HV_X86_RAX, env->regs[R_EAX]); | ||
919 | - wreg(cpu_state->hvf_fd, HV_X86_RBX, env->regs[R_EBX]); | ||
920 | - wreg(cpu_state->hvf_fd, HV_X86_RCX, env->regs[R_ECX]); | ||
921 | - wreg(cpu_state->hvf_fd, HV_X86_RDX, env->regs[R_EDX]); | ||
922 | - wreg(cpu_state->hvf_fd, HV_X86_RBP, env->regs[R_EBP]); | ||
923 | - wreg(cpu_state->hvf_fd, HV_X86_RSP, env->regs[R_ESP]); | ||
924 | - wreg(cpu_state->hvf_fd, HV_X86_RSI, env->regs[R_ESI]); | ||
925 | - wreg(cpu_state->hvf_fd, HV_X86_RDI, env->regs[R_EDI]); | ||
926 | - wreg(cpu_state->hvf_fd, HV_X86_R8, env->regs[8]); | ||
927 | - wreg(cpu_state->hvf_fd, HV_X86_R9, env->regs[9]); | ||
928 | - wreg(cpu_state->hvf_fd, HV_X86_R10, env->regs[10]); | ||
929 | - wreg(cpu_state->hvf_fd, HV_X86_R11, env->regs[11]); | ||
930 | - wreg(cpu_state->hvf_fd, HV_X86_R12, env->regs[12]); | ||
931 | - wreg(cpu_state->hvf_fd, HV_X86_R13, env->regs[13]); | ||
932 | - wreg(cpu_state->hvf_fd, HV_X86_R14, env->regs[14]); | ||
933 | - wreg(cpu_state->hvf_fd, HV_X86_R15, env->regs[15]); | ||
934 | - wreg(cpu_state->hvf_fd, HV_X86_RFLAGS, env->eflags); | ||
935 | - wreg(cpu_state->hvf_fd, HV_X86_RIP, env->eip); | ||
936 | + wreg(cpu_state->hvf->fd, HV_X86_RAX, env->regs[R_EAX]); | ||
937 | + wreg(cpu_state->hvf->fd, HV_X86_RBX, env->regs[R_EBX]); | ||
938 | + wreg(cpu_state->hvf->fd, HV_X86_RCX, env->regs[R_ECX]); | ||
939 | + wreg(cpu_state->hvf->fd, HV_X86_RDX, env->regs[R_EDX]); | ||
940 | + wreg(cpu_state->hvf->fd, HV_X86_RBP, env->regs[R_EBP]); | ||
941 | + wreg(cpu_state->hvf->fd, HV_X86_RSP, env->regs[R_ESP]); | ||
942 | + wreg(cpu_state->hvf->fd, HV_X86_RSI, env->regs[R_ESI]); | ||
943 | + wreg(cpu_state->hvf->fd, HV_X86_RDI, env->regs[R_EDI]); | ||
944 | + wreg(cpu_state->hvf->fd, HV_X86_R8, env->regs[8]); | ||
945 | + wreg(cpu_state->hvf->fd, HV_X86_R9, env->regs[9]); | ||
946 | + wreg(cpu_state->hvf->fd, HV_X86_R10, env->regs[10]); | ||
947 | + wreg(cpu_state->hvf->fd, HV_X86_R11, env->regs[11]); | ||
948 | + wreg(cpu_state->hvf->fd, HV_X86_R12, env->regs[12]); | ||
949 | + wreg(cpu_state->hvf->fd, HV_X86_R13, env->regs[13]); | ||
950 | + wreg(cpu_state->hvf->fd, HV_X86_R14, env->regs[14]); | ||
951 | + wreg(cpu_state->hvf->fd, HV_X86_R15, env->regs[15]); | ||
952 | + wreg(cpu_state->hvf->fd, HV_X86_RFLAGS, env->eflags); | ||
953 | + wreg(cpu_state->hvf->fd, HV_X86_RIP, env->eip); | ||
954 | |||
955 | - wreg(cpu_state->hvf_fd, HV_X86_XCR0, env->xcr0); | ||
956 | + wreg(cpu_state->hvf->fd, HV_X86_XCR0, env->xcr0); | ||
957 | |||
958 | hvf_put_xsave(cpu_state); | ||
959 | |||
960 | @@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state) | ||
961 | |||
962 | hvf_put_msrs(cpu_state); | ||
963 | |||
964 | - wreg(cpu_state->hvf_fd, HV_X86_DR0, env->dr[0]); | ||
965 | - wreg(cpu_state->hvf_fd, HV_X86_DR1, env->dr[1]); | ||
966 | - wreg(cpu_state->hvf_fd, HV_X86_DR2, env->dr[2]); | ||
967 | - wreg(cpu_state->hvf_fd, HV_X86_DR3, env->dr[3]); | ||
968 | - wreg(cpu_state->hvf_fd, HV_X86_DR4, env->dr[4]); | ||
969 | - wreg(cpu_state->hvf_fd, HV_X86_DR5, env->dr[5]); | ||
970 | - wreg(cpu_state->hvf_fd, HV_X86_DR6, env->dr[6]); | ||
971 | - wreg(cpu_state->hvf_fd, HV_X86_DR7, env->dr[7]); | ||
972 | + wreg(cpu_state->hvf->fd, HV_X86_DR0, env->dr[0]); | ||
973 | + wreg(cpu_state->hvf->fd, HV_X86_DR1, env->dr[1]); | ||
974 | + wreg(cpu_state->hvf->fd, HV_X86_DR2, env->dr[2]); | ||
975 | + wreg(cpu_state->hvf->fd, HV_X86_DR3, env->dr[3]); | ||
976 | + wreg(cpu_state->hvf->fd, HV_X86_DR4, env->dr[4]); | ||
977 | + wreg(cpu_state->hvf->fd, HV_X86_DR5, env->dr[5]); | ||
978 | + wreg(cpu_state->hvf->fd, HV_X86_DR6, env->dr[6]); | ||
979 | + wreg(cpu_state->hvf->fd, HV_X86_DR7, env->dr[7]); | ||
980 | |||
981 | return 0; | ||
982 | } | ||
983 | @@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state) | ||
984 | X86CPU *x86cpu = X86_CPU(cpu_state); | ||
985 | CPUX86State *env = &x86cpu->env; | ||
986 | |||
987 | - env->regs[R_EAX] = rreg(cpu_state->hvf_fd, HV_X86_RAX); | ||
988 | - env->regs[R_EBX] = rreg(cpu_state->hvf_fd, HV_X86_RBX); | ||
989 | - env->regs[R_ECX] = rreg(cpu_state->hvf_fd, HV_X86_RCX); | ||
990 | - env->regs[R_EDX] = rreg(cpu_state->hvf_fd, HV_X86_RDX); | ||
991 | - env->regs[R_EBP] = rreg(cpu_state->hvf_fd, HV_X86_RBP); | ||
992 | - env->regs[R_ESP] = rreg(cpu_state->hvf_fd, HV_X86_RSP); | ||
993 | - env->regs[R_ESI] = rreg(cpu_state->hvf_fd, HV_X86_RSI); | ||
994 | - env->regs[R_EDI] = rreg(cpu_state->hvf_fd, HV_X86_RDI); | ||
995 | - env->regs[8] = rreg(cpu_state->hvf_fd, HV_X86_R8); | ||
996 | - env->regs[9] = rreg(cpu_state->hvf_fd, HV_X86_R9); | ||
997 | - env->regs[10] = rreg(cpu_state->hvf_fd, HV_X86_R10); | ||
998 | - env->regs[11] = rreg(cpu_state->hvf_fd, HV_X86_R11); | ||
999 | - env->regs[12] = rreg(cpu_state->hvf_fd, HV_X86_R12); | ||
1000 | - env->regs[13] = rreg(cpu_state->hvf_fd, HV_X86_R13); | ||
1001 | - env->regs[14] = rreg(cpu_state->hvf_fd, HV_X86_R14); | ||
1002 | - env->regs[15] = rreg(cpu_state->hvf_fd, HV_X86_R15); | ||
1003 | + env->regs[R_EAX] = rreg(cpu_state->hvf->fd, HV_X86_RAX); | ||
1004 | + env->regs[R_EBX] = rreg(cpu_state->hvf->fd, HV_X86_RBX); | ||
1005 | + env->regs[R_ECX] = rreg(cpu_state->hvf->fd, HV_X86_RCX); | ||
1006 | + env->regs[R_EDX] = rreg(cpu_state->hvf->fd, HV_X86_RDX); | ||
1007 | + env->regs[R_EBP] = rreg(cpu_state->hvf->fd, HV_X86_RBP); | ||
1008 | + env->regs[R_ESP] = rreg(cpu_state->hvf->fd, HV_X86_RSP); | ||
1009 | + env->regs[R_ESI] = rreg(cpu_state->hvf->fd, HV_X86_RSI); | ||
1010 | + env->regs[R_EDI] = rreg(cpu_state->hvf->fd, HV_X86_RDI); | ||
1011 | + env->regs[8] = rreg(cpu_state->hvf->fd, HV_X86_R8); | ||
1012 | + env->regs[9] = rreg(cpu_state->hvf->fd, HV_X86_R9); | ||
1013 | + env->regs[10] = rreg(cpu_state->hvf->fd, HV_X86_R10); | ||
1014 | + env->regs[11] = rreg(cpu_state->hvf->fd, HV_X86_R11); | ||
1015 | + env->regs[12] = rreg(cpu_state->hvf->fd, HV_X86_R12); | ||
1016 | + env->regs[13] = rreg(cpu_state->hvf->fd, HV_X86_R13); | ||
1017 | + env->regs[14] = rreg(cpu_state->hvf->fd, HV_X86_R14); | ||
1018 | + env->regs[15] = rreg(cpu_state->hvf->fd, HV_X86_R15); | ||
1019 | |||
1020 | - env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS); | ||
1021 | - env->eip = rreg(cpu_state->hvf_fd, HV_X86_RIP); | ||
1022 | + env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS); | ||
1023 | + env->eip = rreg(cpu_state->hvf->fd, HV_X86_RIP); | ||
1024 | |||
1025 | hvf_get_xsave(cpu_state); | ||
1026 | - env->xcr0 = rreg(cpu_state->hvf_fd, HV_X86_XCR0); | ||
1027 | + env->xcr0 = rreg(cpu_state->hvf->fd, HV_X86_XCR0); | ||
1028 | |||
1029 | hvf_get_segments(cpu_state); | ||
1030 | hvf_get_msrs(cpu_state); | ||
1031 | |||
1032 | - env->dr[0] = rreg(cpu_state->hvf_fd, HV_X86_DR0); | ||
1033 | - env->dr[1] = rreg(cpu_state->hvf_fd, HV_X86_DR1); | ||
1034 | - env->dr[2] = rreg(cpu_state->hvf_fd, HV_X86_DR2); | ||
1035 | - env->dr[3] = rreg(cpu_state->hvf_fd, HV_X86_DR3); | ||
1036 | - env->dr[4] = rreg(cpu_state->hvf_fd, HV_X86_DR4); | ||
1037 | - env->dr[5] = rreg(cpu_state->hvf_fd, HV_X86_DR5); | ||
1038 | - env->dr[6] = rreg(cpu_state->hvf_fd, HV_X86_DR6); | ||
1039 | - env->dr[7] = rreg(cpu_state->hvf_fd, HV_X86_DR7); | ||
1040 | + env->dr[0] = rreg(cpu_state->hvf->fd, HV_X86_DR0); | ||
1041 | + env->dr[1] = rreg(cpu_state->hvf->fd, HV_X86_DR1); | ||
1042 | + env->dr[2] = rreg(cpu_state->hvf->fd, HV_X86_DR2); | ||
1043 | + env->dr[3] = rreg(cpu_state->hvf->fd, HV_X86_DR3); | ||
1044 | + env->dr[4] = rreg(cpu_state->hvf->fd, HV_X86_DR4); | ||
1045 | + env->dr[5] = rreg(cpu_state->hvf->fd, HV_X86_DR5); | ||
1046 | + env->dr[6] = rreg(cpu_state->hvf->fd, HV_X86_DR6); | ||
1047 | + env->dr[7] = rreg(cpu_state->hvf->fd, HV_X86_DR7); | ||
1048 | |||
1049 | x86_update_hflags(env); | ||
1050 | return 0; | ||
1051 | @@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state) | ||
1052 | static void vmx_set_int_window_exiting(CPUState *cpu) | ||
1053 | { | ||
1054 | uint64_t val; | ||
1055 | - val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS); | ||
1056 | - wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val | | ||
1057 | + val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS); | ||
1058 | + wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val | | ||
1059 | VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING); | ||
1060 | } | ||
1061 | |||
1062 | void vmx_clear_int_window_exiting(CPUState *cpu) | ||
1063 | { | ||
1064 | uint64_t val; | ||
1065 | - val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS); | ||
1066 | - wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val & | ||
1067 | + val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS); | ||
1068 | + wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val & | ||
1069 | ~VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING); | ||
1070 | } | ||
1071 | |||
1072 | @@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state) | ||
1073 | uint64_t info = 0; | ||
1074 | if (have_event) { | ||
1075 | info = vector | intr_type | VMCS_INTR_VALID; | ||
1076 | - uint64_t reason = rvmcs(cpu_state->hvf_fd, VMCS_EXIT_REASON); | ||
1077 | + uint64_t reason = rvmcs(cpu_state->hvf->fd, VMCS_EXIT_REASON); | ||
1078 | if (env->nmi_injected && reason != EXIT_REASON_TASK_SWITCH) { | ||
1079 | vmx_clear_nmi_blocking(cpu_state); | ||
332 | } | 1080 | } |
333 | fn_gvec_ptr = size ? gen_helper_gvec_fcmlas : gen_helper_gvec_fcmlah; | 1081 | @@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state) |
334 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn) | 1082 | info &= ~(1 << 12); /* clear undefined bit */ |
335 | int size = extract32(insn, 20, 1); | 1083 | if (intr_type == VMCS_INTR_T_SWINTR || |
336 | data = extract32(insn, 24, 1); /* rot */ | 1084 | intr_type == VMCS_INTR_T_SWEXCEPTION) { |
337 | if (!dc_isar_feature(aa32_vcma, s) | 1085 | - wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INST_LENGTH, env->ins_len); |
338 | - || (!size && !arm_dc_feature(s, ARM_FEATURE_V8_FP16))) { | 1086 | + wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INST_LENGTH, env->ins_len); |
339 | + || (!size && !dc_isar_feature(aa32_fp16_arith, s))) { | 1087 | } |
340 | return 1; | 1088 | |
1089 | if (env->has_error_code) { | ||
1090 | - wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_EXCEPTION_ERROR, | ||
1091 | + wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_EXCEPTION_ERROR, | ||
1092 | env->error_code); | ||
1093 | /* Indicate that VMCS_ENTRY_EXCEPTION_ERROR is valid */ | ||
1094 | info |= VMCS_INTR_DEL_ERRCODE; | ||
1095 | } | ||
1096 | /*printf("reinject %lx err %d\n", info, err);*/ | ||
1097 | - wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info); | ||
1098 | + wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info); | ||
1099 | }; | ||
1100 | } | ||
1101 | |||
1102 | @@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state) | ||
1103 | if (!(env->hflags2 & HF2_NMI_MASK) && !(info & VMCS_INTR_VALID)) { | ||
1104 | cpu_state->interrupt_request &= ~CPU_INTERRUPT_NMI; | ||
1105 | info = VMCS_INTR_VALID | VMCS_INTR_T_NMI | EXCP02_NMI; | ||
1106 | - wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info); | ||
1107 | + wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info); | ||
1108 | } else { | ||
1109 | vmx_set_nmi_window_exiting(cpu_state); | ||
341 | } | 1110 | } |
342 | fn_gvec_ptr = size ? gen_helper_gvec_fcadds : gen_helper_gvec_fcaddh; | 1111 | @@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state) |
343 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn) | 1112 | int line = cpu_get_pic_interrupt(&x86cpu->env); |
344 | return 1; | 1113 | cpu_state->interrupt_request &= ~CPU_INTERRUPT_HARD; |
1114 | if (line >= 0) { | ||
1115 | - wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, line | | ||
1116 | + wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, line | | ||
1117 | VMCS_INTR_VALID | VMCS_INTR_T_HWINTR); | ||
345 | } | 1118 | } |
346 | if (size == 0) { | 1119 | } |
347 | - if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) { | 1120 | @@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state) |
348 | + if (!dc_isar_feature(aa32_fp16_arith, s)) { | 1121 | X86CPU *cpu = X86_CPU(cpu_state); |
349 | return 1; | 1122 | CPUX86State *env = &cpu->env; |
350 | } | 1123 | |
351 | /* For fp16, rm is just Vm, and index is M. */ | 1124 | - env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS); |
1125 | + env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS); | ||
1126 | |||
1127 | if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) { | ||
1128 | cpu_synchronize_state(cpu_state); | ||
352 | -- | 1129 | -- |
353 | 2.19.1 | 1130 | 2.20.1 |
354 | 1131 | ||
355 | 1132 |
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | From: Alexander Graf <agraf@csgraf.de> |
---|---|---|---|
2 | 2 | ||
3 | Move shi_op and sli_op expanders from translate-a64.c. | 3 | The hooks we have that call us after reset, init and loadvm really all |
4 | just want to say "The reference of all register state is in the QEMU | ||
5 | vcpu struct, please push it". | ||
4 | 6 | ||
5 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | 7 | We already have a working pushing mechanism, though, called cpu->vcpu_dirty,
6 | Message-id: 20181011205206.3552-15-richard.henderson@linaro.org | 8 | so we can just reuse that for all of the above, syncing state properly the |
7 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | 9 | next time we actually execute a vCPU. |
10 | |||
11 | This fixes PSCI resets on ARM, as they modify CPU state even after the | ||
12 | post init call has completed, but before we execute the vCPU again. | ||
13 | |||
14 | To also make the scheme work for x86, we have to make sure we don't | ||
15 | move stale eflags into our env when the vcpu state is dirty. | ||
16 | |||
17 | Signed-off-by: Alexander Graf <agraf@csgraf.de> | ||
18 | Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> | ||
19 | Tested-by: Roman Bolshakov <r.bolshakov@yadro.com> | ||
20 | Reviewed-by: Sergio Lopez <slp@redhat.com> | ||
21 | Message-id: 20210519202253.76782-13-agraf@csgraf.de | ||
8 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 22 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
9 | --- | 23 | --- |
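[Editor's note: a minimal sketch of the dirty-flag scheme the commit message describes, based on the hunks below. It assumes QEMU's run_on_cpu() API and the CPUState vcpu_dirty field as used in this series; the flush call site in the vCPU run loop is not part of this excerpt and is only indicated in a comment.]

    /* One helper replaces the separate post-reset, post-init and pre-loadvm callbacks. */
    static void do_hvf_cpu_synchronize_set_dirty(CPUState *cpu, run_on_cpu_data arg)
    {
        /* QEMU state is the reference; push it to HVF before the next vCPU entry. */
        cpu->vcpu_dirty = true;
    }

    static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
    {
        run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
    }

    /*
     * The accelerator's run loop is then assumed to flush the dirty state
     * before entering the guest, roughly:
     *
     *     if (cpu->vcpu_dirty) {
     *         hvf_put_registers(cpu);
     *         cpu->vcpu_dirty = false;
     *     }
     */
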
10 | target/arm/translate.h | 2 + | 24 | accel/hvf/hvf-accel-ops.c | 27 +++++++-------------------- |
11 | target/arm/translate-a64.c | 152 +---------------------- | 25 | target/i386/hvf/x86hvf.c | 5 ++++- |
12 | target/arm/translate.c | 244 ++++++++++++++++++++++++++----------- | 26 | 2 files changed, 11 insertions(+), 21 deletions(-) |
13 | 3 files changed, 179 insertions(+), 219 deletions(-) | ||
14 | 27 | ||
15 | diff --git a/target/arm/translate.h b/target/arm/translate.h | 28 | diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c |
16 | index XXXXXXX..XXXXXXX 100644 | 29 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/target/arm/translate.h | 30 | --- a/accel/hvf/hvf-accel-ops.c |
18 | +++ b/target/arm/translate.h | 31 | +++ b/accel/hvf/hvf-accel-ops.c |
19 | @@ -XXX,XX +XXX,XX @@ extern const GVecGen3 bit_op; | 32 | @@ -XXX,XX +XXX,XX @@ static void hvf_cpu_synchronize_state(CPUState *cpu) |
20 | extern const GVecGen3 bif_op; | ||
21 | extern const GVecGen2i ssra_op[4]; | ||
22 | extern const GVecGen2i usra_op[4]; | ||
23 | +extern const GVecGen2i sri_op[4]; | ||
24 | +extern const GVecGen2i sli_op[4]; | ||
25 | |||
26 | /* | ||
27 | * Forward to the isar_feature_* tests given a DisasContext pointer. | ||
28 | diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c | ||
29 | index XXXXXXX..XXXXXXX 100644 | ||
30 | --- a/target/arm/translate-a64.c | ||
31 | +++ b/target/arm/translate-a64.c | ||
32 | @@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) | ||
33 | } | 33 | } |
34 | } | 34 | } |
35 | 35 | ||
36 | -static void gen_shr8_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | 36 | -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu, |
37 | -{ | 37 | - run_on_cpu_data arg) |
38 | - uint64_t mask = dup_const(MO_8, 0xff >> shift); | 38 | +static void do_hvf_cpu_synchronize_set_dirty(CPUState *cpu, |
39 | - TCGv_i64 t = tcg_temp_new_i64(); | 39 | + run_on_cpu_data arg) |
40 | - | 40 | { |
41 | - tcg_gen_shri_i64(t, a, shift); | 41 | - hvf_put_registers(cpu); |
42 | - tcg_gen_andi_i64(t, t, mask); | 42 | - cpu->vcpu_dirty = false; |
43 | - tcg_gen_andi_i64(d, d, ~mask); | 43 | + /* QEMU state is the reference, push it to HVF now and on next entry */ |
44 | - tcg_gen_or_i64(d, d, t); | 44 | + cpu->vcpu_dirty = true; |
45 | - tcg_temp_free_i64(t); | 45 | } |
46 | |||
47 | static void hvf_cpu_synchronize_post_reset(CPUState *cpu) | ||
48 | { | ||
49 | - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL); | ||
46 | -} | 50 | -} |
47 | - | 51 | - |
48 | -static void gen_shr16_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | 52 | -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu, |
53 | - run_on_cpu_data arg) | ||
49 | -{ | 54 | -{ |
50 | - uint64_t mask = dup_const(MO_16, 0xffff >> shift); | 55 | - hvf_put_registers(cpu); |
51 | - TCGv_i64 t = tcg_temp_new_i64(); | 56 | - cpu->vcpu_dirty = false; |
52 | - | 57 | + run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL); |
53 | - tcg_gen_shri_i64(t, a, shift); | 58 | } |
54 | - tcg_gen_andi_i64(t, t, mask); | 59 | |
55 | - tcg_gen_andi_i64(d, d, ~mask); | 60 | static void hvf_cpu_synchronize_post_init(CPUState *cpu) |
56 | - tcg_gen_or_i64(d, d, t); | 61 | { |
57 | - tcg_temp_free_i64(t); | 62 | - run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL); |
58 | -} | 63 | -} |
59 | - | 64 | - |
60 | -static void gen_shr32_ins_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) | 65 | -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu, |
66 | - run_on_cpu_data arg) | ||
61 | -{ | 67 | -{ |
62 | - tcg_gen_shri_i32(a, a, shift); | 68 | - cpu->vcpu_dirty = true; |
63 | - tcg_gen_deposit_i32(d, d, a, 0, 32 - shift); | 69 | + run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL); |
64 | -} | 70 | } |
65 | - | 71 | |
66 | -static void gen_shr64_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | 72 | static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu) |
67 | -{ | ||
68 | - tcg_gen_shri_i64(a, a, shift); | ||
69 | - tcg_gen_deposit_i64(d, d, a, 0, 64 - shift); | ||
70 | -} | ||
71 | - | ||
72 | -static void gen_shr_ins_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) | ||
73 | -{ | ||
74 | - uint64_t mask = (2ull << ((8 << vece) - 1)) - 1; | ||
75 | - TCGv_vec t = tcg_temp_new_vec_matching(d); | ||
76 | - TCGv_vec m = tcg_temp_new_vec_matching(d); | ||
77 | - | ||
78 | - tcg_gen_dupi_vec(vece, m, mask ^ (mask >> sh)); | ||
79 | - tcg_gen_shri_vec(vece, t, a, sh); | ||
80 | - tcg_gen_and_vec(vece, d, d, m); | ||
81 | - tcg_gen_or_vec(vece, d, d, t); | ||
82 | - | ||
83 | - tcg_temp_free_vec(t); | ||
84 | - tcg_temp_free_vec(m); | ||
85 | -} | ||
86 | - | ||
87 | /* SSHR[RA]/USHR[RA] - Vector shift right (optional rounding/accumulate) */ | ||
88 | static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, | ||
89 | int immh, int immb, int opcode, int rn, int rd) | ||
90 | { | 73 | { |
91 | - static const GVecGen2i sri_op[4] = { | 74 | - run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL); |
92 | - { .fni8 = gen_shr8_ins_i64, | 75 | + run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL); |
93 | - .fniv = gen_shr_ins_vec, | ||
94 | - .load_dest = true, | ||
95 | - .opc = INDEX_op_shri_vec, | ||
96 | - .vece = MO_8 }, | ||
97 | - { .fni8 = gen_shr16_ins_i64, | ||
98 | - .fniv = gen_shr_ins_vec, | ||
99 | - .load_dest = true, | ||
100 | - .opc = INDEX_op_shri_vec, | ||
101 | - .vece = MO_16 }, | ||
102 | - { .fni4 = gen_shr32_ins_i32, | ||
103 | - .fniv = gen_shr_ins_vec, | ||
104 | - .load_dest = true, | ||
105 | - .opc = INDEX_op_shri_vec, | ||
106 | - .vece = MO_32 }, | ||
107 | - { .fni8 = gen_shr64_ins_i64, | ||
108 | - .fniv = gen_shr_ins_vec, | ||
109 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
110 | - .load_dest = true, | ||
111 | - .opc = INDEX_op_shri_vec, | ||
112 | - .vece = MO_64 }, | ||
113 | - }; | ||
114 | - | ||
115 | int size = 32 - clz32(immh) - 1; | ||
116 | int immhb = immh << 3 | immb; | ||
117 | int shift = 2 * (8 << size) - immhb; | ||
118 | @@ -XXX,XX +XXX,XX @@ static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u, | ||
119 | clear_vec_high(s, is_q, rd); | ||
120 | } | 76 | } |
121 | 77 | ||
122 | -static void gen_shl8_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | 78 | static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on) |
123 | -{ | 79 | diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c |
124 | - uint64_t mask = dup_const(MO_8, 0xff << shift); | ||
125 | - TCGv_i64 t = tcg_temp_new_i64(); | ||
126 | - | ||
127 | - tcg_gen_shli_i64(t, a, shift); | ||
128 | - tcg_gen_andi_i64(t, t, mask); | ||
129 | - tcg_gen_andi_i64(d, d, ~mask); | ||
130 | - tcg_gen_or_i64(d, d, t); | ||
131 | - tcg_temp_free_i64(t); | ||
132 | -} | ||
133 | - | ||
134 | -static void gen_shl16_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
135 | -{ | ||
136 | - uint64_t mask = dup_const(MO_16, 0xffff << shift); | ||
137 | - TCGv_i64 t = tcg_temp_new_i64(); | ||
138 | - | ||
139 | - tcg_gen_shli_i64(t, a, shift); | ||
140 | - tcg_gen_andi_i64(t, t, mask); | ||
141 | - tcg_gen_andi_i64(d, d, ~mask); | ||
142 | - tcg_gen_or_i64(d, d, t); | ||
143 | - tcg_temp_free_i64(t); | ||
144 | -} | ||
145 | - | ||
146 | -static void gen_shl32_ins_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) | ||
147 | -{ | ||
148 | - tcg_gen_deposit_i32(d, d, a, shift, 32 - shift); | ||
149 | -} | ||
150 | - | ||
151 | -static void gen_shl64_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
152 | -{ | ||
153 | - tcg_gen_deposit_i64(d, d, a, shift, 64 - shift); | ||
154 | -} | ||
155 | - | ||
156 | -static void gen_shl_ins_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) | ||
157 | -{ | ||
158 | - uint64_t mask = (1ull << sh) - 1; | ||
159 | - TCGv_vec t = tcg_temp_new_vec_matching(d); | ||
160 | - TCGv_vec m = tcg_temp_new_vec_matching(d); | ||
161 | - | ||
162 | - tcg_gen_dupi_vec(vece, m, mask); | ||
163 | - tcg_gen_shli_vec(vece, t, a, sh); | ||
164 | - tcg_gen_and_vec(vece, d, d, m); | ||
165 | - tcg_gen_or_vec(vece, d, d, t); | ||
166 | - | ||
167 | - tcg_temp_free_vec(t); | ||
168 | - tcg_temp_free_vec(m); | ||
169 | -} | ||
170 | - | ||
171 | /* SHL/SLI - Vector shift left */ | ||
172 | static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert, | ||
173 | int immh, int immb, int opcode, int rn, int rd) | ||
174 | { | ||
175 | - static const GVecGen2i shi_op[4] = { | ||
176 | - { .fni8 = gen_shl8_ins_i64, | ||
177 | - .fniv = gen_shl_ins_vec, | ||
178 | - .opc = INDEX_op_shli_vec, | ||
179 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
180 | - .load_dest = true, | ||
181 | - .vece = MO_8 }, | ||
182 | - { .fni8 = gen_shl16_ins_i64, | ||
183 | - .fniv = gen_shl_ins_vec, | ||
184 | - .opc = INDEX_op_shli_vec, | ||
185 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
186 | - .load_dest = true, | ||
187 | - .vece = MO_16 }, | ||
188 | - { .fni4 = gen_shl32_ins_i32, | ||
189 | - .fniv = gen_shl_ins_vec, | ||
190 | - .opc = INDEX_op_shli_vec, | ||
191 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
192 | - .load_dest = true, | ||
193 | - .vece = MO_32 }, | ||
194 | - { .fni8 = gen_shl64_ins_i64, | ||
195 | - .fniv = gen_shl_ins_vec, | ||
196 | - .opc = INDEX_op_shli_vec, | ||
197 | - .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
198 | - .load_dest = true, | ||
199 | - .vece = MO_64 }, | ||
200 | - }; | ||
201 | int size = 32 - clz32(immh) - 1; | ||
202 | int immhb = immh << 3 | immb; | ||
203 | int shift = immhb - (8 << size); | ||
204 | @@ -XXX,XX +XXX,XX @@ static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert, | ||
205 | } | ||
206 | |||
207 | if (insert) { | ||
208 | - gen_gvec_op2i(s, is_q, rd, rn, shift, &shi_op[size]); | ||
209 | + gen_gvec_op2i(s, is_q, rd, rn, shift, &sli_op[size]); | ||
210 | } else { | ||
211 | gen_gvec_fn2i(s, is_q, rd, rn, shift, tcg_gen_gvec_shli, size); | ||
212 | } | ||
213 | diff --git a/target/arm/translate.c b/target/arm/translate.c | ||
214 | index XXXXXXX..XXXXXXX 100644 | 80 | index XXXXXXX..XXXXXXX 100644 |
215 | --- a/target/arm/translate.c | 81 | --- a/target/i386/hvf/x86hvf.c |
216 | +++ b/target/arm/translate.c | 82 | +++ b/target/i386/hvf/x86hvf.c |
217 | @@ -XXX,XX +XXX,XX @@ const GVecGen2i usra_op[4] = { | 83 | @@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state) |
218 | .vece = MO_64, }, | 84 | X86CPU *cpu = X86_CPU(cpu_state); |
219 | }; | 85 | CPUX86State *env = &cpu->env; |
220 | 86 | ||
221 | +static void gen_shr8_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | 87 | - env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS); |
222 | +{ | 88 | + if (!cpu_state->vcpu_dirty) { |
223 | + uint64_t mask = dup_const(MO_8, 0xff >> shift); | 89 | + /* light weight sync for CPU_INTERRUPT_HARD and IF_MASK */ |
224 | + TCGv_i64 t = tcg_temp_new_i64(); | 90 | + env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS); |
225 | + | ||
226 | + tcg_gen_shri_i64(t, a, shift); | ||
227 | + tcg_gen_andi_i64(t, t, mask); | ||
228 | + tcg_gen_andi_i64(d, d, ~mask); | ||
229 | + tcg_gen_or_i64(d, d, t); | ||
230 | + tcg_temp_free_i64(t); | ||
231 | +} | ||
232 | + | ||
233 | +static void gen_shr16_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
234 | +{ | ||
235 | + uint64_t mask = dup_const(MO_16, 0xffff >> shift); | ||
236 | + TCGv_i64 t = tcg_temp_new_i64(); | ||
237 | + | ||
238 | + tcg_gen_shri_i64(t, a, shift); | ||
239 | + tcg_gen_andi_i64(t, t, mask); | ||
240 | + tcg_gen_andi_i64(d, d, ~mask); | ||
241 | + tcg_gen_or_i64(d, d, t); | ||
242 | + tcg_temp_free_i64(t); | ||
243 | +} | ||
244 | + | ||
245 | +static void gen_shr32_ins_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) | ||
246 | +{ | ||
247 | + tcg_gen_shri_i32(a, a, shift); | ||
248 | + tcg_gen_deposit_i32(d, d, a, 0, 32 - shift); | ||
249 | +} | ||
250 | + | ||
251 | +static void gen_shr64_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
252 | +{ | ||
253 | + tcg_gen_shri_i64(a, a, shift); | ||
254 | + tcg_gen_deposit_i64(d, d, a, 0, 64 - shift); | ||
255 | +} | ||
256 | + | ||
257 | +static void gen_shr_ins_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) | ||
258 | +{ | ||
259 | + if (sh == 0) { | ||
260 | + tcg_gen_mov_vec(d, a); | ||
261 | + } else { | ||
262 | + TCGv_vec t = tcg_temp_new_vec_matching(d); | ||
263 | + TCGv_vec m = tcg_temp_new_vec_matching(d); | ||
264 | + | ||
265 | + tcg_gen_dupi_vec(vece, m, MAKE_64BIT_MASK((8 << vece) - sh, sh)); | ||
266 | + tcg_gen_shri_vec(vece, t, a, sh); | ||
267 | + tcg_gen_and_vec(vece, d, d, m); | ||
268 | + tcg_gen_or_vec(vece, d, d, t); | ||
269 | + | ||
270 | + tcg_temp_free_vec(t); | ||
271 | + tcg_temp_free_vec(m); | ||
272 | + } | 91 | + } |
273 | +} | 92 | |
274 | + | 93 | if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) { |
275 | +const GVecGen2i sri_op[4] = { | 94 | cpu_synchronize_state(cpu_state); |
276 | + { .fni8 = gen_shr8_ins_i64, | ||
277 | + .fniv = gen_shr_ins_vec, | ||
278 | + .load_dest = true, | ||
279 | + .opc = INDEX_op_shri_vec, | ||
280 | + .vece = MO_8 }, | ||
281 | + { .fni8 = gen_shr16_ins_i64, | ||
282 | + .fniv = gen_shr_ins_vec, | ||
283 | + .load_dest = true, | ||
284 | + .opc = INDEX_op_shri_vec, | ||
285 | + .vece = MO_16 }, | ||
286 | + { .fni4 = gen_shr32_ins_i32, | ||
287 | + .fniv = gen_shr_ins_vec, | ||
288 | + .load_dest = true, | ||
289 | + .opc = INDEX_op_shri_vec, | ||
290 | + .vece = MO_32 }, | ||
291 | + { .fni8 = gen_shr64_ins_i64, | ||
292 | + .fniv = gen_shr_ins_vec, | ||
293 | + .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
294 | + .load_dest = true, | ||
295 | + .opc = INDEX_op_shri_vec, | ||
296 | + .vece = MO_64 }, | ||
297 | +}; | ||
298 | + | ||
299 | +static void gen_shl8_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
300 | +{ | ||
301 | + uint64_t mask = dup_const(MO_8, 0xff << shift); | ||
302 | + TCGv_i64 t = tcg_temp_new_i64(); | ||
303 | + | ||
304 | + tcg_gen_shli_i64(t, a, shift); | ||
305 | + tcg_gen_andi_i64(t, t, mask); | ||
306 | + tcg_gen_andi_i64(d, d, ~mask); | ||
307 | + tcg_gen_or_i64(d, d, t); | ||
308 | + tcg_temp_free_i64(t); | ||
309 | +} | ||
310 | + | ||
311 | +static void gen_shl16_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
312 | +{ | ||
313 | + uint64_t mask = dup_const(MO_16, 0xffff << shift); | ||
314 | + TCGv_i64 t = tcg_temp_new_i64(); | ||
315 | + | ||
316 | + tcg_gen_shli_i64(t, a, shift); | ||
317 | + tcg_gen_andi_i64(t, t, mask); | ||
318 | + tcg_gen_andi_i64(d, d, ~mask); | ||
319 | + tcg_gen_or_i64(d, d, t); | ||
320 | + tcg_temp_free_i64(t); | ||
321 | +} | ||
322 | + | ||
323 | +static void gen_shl32_ins_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift) | ||
324 | +{ | ||
325 | + tcg_gen_deposit_i32(d, d, a, shift, 32 - shift); | ||
326 | +} | ||
327 | + | ||
328 | +static void gen_shl64_ins_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift) | ||
329 | +{ | ||
330 | + tcg_gen_deposit_i64(d, d, a, shift, 64 - shift); | ||
331 | +} | ||
332 | + | ||
333 | +static void gen_shl_ins_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh) | ||
334 | +{ | ||
335 | + if (sh == 0) { | ||
336 | + tcg_gen_mov_vec(d, a); | ||
337 | + } else { | ||
338 | + TCGv_vec t = tcg_temp_new_vec_matching(d); | ||
339 | + TCGv_vec m = tcg_temp_new_vec_matching(d); | ||
340 | + | ||
341 | + tcg_gen_dupi_vec(vece, m, MAKE_64BIT_MASK(0, sh)); | ||
342 | + tcg_gen_shli_vec(vece, t, a, sh); | ||
343 | + tcg_gen_and_vec(vece, d, d, m); | ||
344 | + tcg_gen_or_vec(vece, d, d, t); | ||
345 | + | ||
346 | + tcg_temp_free_vec(t); | ||
347 | + tcg_temp_free_vec(m); | ||
348 | + } | ||
349 | +} | ||
350 | + | ||
351 | +const GVecGen2i sli_op[4] = { | ||
352 | + { .fni8 = gen_shl8_ins_i64, | ||
353 | + .fniv = gen_shl_ins_vec, | ||
354 | + .load_dest = true, | ||
355 | + .opc = INDEX_op_shli_vec, | ||
356 | + .vece = MO_8 }, | ||
357 | + { .fni8 = gen_shl16_ins_i64, | ||
358 | + .fniv = gen_shl_ins_vec, | ||
359 | + .load_dest = true, | ||
360 | + .opc = INDEX_op_shli_vec, | ||
361 | + .vece = MO_16 }, | ||
362 | + { .fni4 = gen_shl32_ins_i32, | ||
363 | + .fniv = gen_shl_ins_vec, | ||
364 | + .load_dest = true, | ||
365 | + .opc = INDEX_op_shli_vec, | ||
366 | + .vece = MO_32 }, | ||
367 | + { .fni8 = gen_shl64_ins_i64, | ||
368 | + .fniv = gen_shl_ins_vec, | ||
369 | + .prefer_i64 = TCG_TARGET_REG_BITS == 64, | ||
370 | + .load_dest = true, | ||
371 | + .opc = INDEX_op_shli_vec, | ||
372 | + .vece = MO_64 }, | ||
373 | +}; | ||
374 | + | ||
375 | /* Translate a NEON data processing instruction. Return nonzero if the | ||
376 | instruction is invalid. | ||
377 | We process data in a mixture of 32-bit and 64-bit chunks. | ||
378 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
379 | int pairwise; | ||
380 | int u; | ||
381 | int vec_size; | ||
382 | - uint32_t imm, mask; | ||
383 | + uint32_t imm; | ||
384 | TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5; | ||
385 | TCGv_ptr ptr1, ptr2, ptr3; | ||
386 | TCGv_i64 tmp64; | ||
387 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
388 | } | ||
389 | return 0; | ||
390 | |||
391 | + case 4: /* VSRI */ | ||
392 | + if (!u) { | ||
393 | + return 1; | ||
394 | + } | ||
395 | + /* Right shift comes here negative. */ | ||
396 | + shift = -shift; | ||
397 | + /* Shift out of range leaves destination unchanged. */ | ||
398 | + if (shift < 8 << size) { | ||
399 | + tcg_gen_gvec_2i(rd_ofs, rm_ofs, vec_size, vec_size, | ||
400 | + shift, &sri_op[size]); | ||
401 | + } | ||
402 | + return 0; | ||
403 | + | ||
404 | case 5: /* VSHL, VSLI */ | ||
405 | - if (!u) { /* VSHL */ | ||
406 | + if (u) { /* VSLI */ | ||
407 | + /* Shift out of range leaves destination unchanged. */ | ||
408 | + if (shift < 8 << size) { | ||
409 | + tcg_gen_gvec_2i(rd_ofs, rm_ofs, vec_size, | ||
410 | + vec_size, shift, &sli_op[size]); | ||
411 | + } | ||
412 | + } else { /* VSHL */ | ||
413 | /* Shifts larger than the element size are | ||
414 | * architecturally valid and results in zero. | ||
415 | */ | ||
416 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
417 | tcg_gen_gvec_shli(size, rd_ofs, rm_ofs, shift, | ||
418 | vec_size, vec_size); | ||
419 | } | ||
420 | - return 0; | ||
421 | } | ||
422 | - break; | ||
423 | + return 0; | ||
424 | } | ||
425 | |||
426 | if (size == 3) { | ||
427 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
428 | else | ||
429 | gen_helper_neon_rshl_s64(cpu_V0, cpu_V0, cpu_V1); | ||
430 | break; | ||
431 | - case 4: /* VSRI */ | ||
432 | - case 5: /* VSHL, VSLI */ | ||
433 | - gen_helper_neon_shl_u64(cpu_V0, cpu_V0, cpu_V1); | ||
434 | - break; | ||
435 | case 6: /* VQSHLU */ | ||
436 | gen_helper_neon_qshlu_s64(cpu_V0, cpu_env, | ||
437 | cpu_V0, cpu_V1); | ||
438 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
439 | /* Accumulate. */ | ||
440 | neon_load_reg64(cpu_V1, rd + pass); | ||
441 | tcg_gen_add_i64(cpu_V0, cpu_V0, cpu_V1); | ||
442 | - } else if (op == 4 || (op == 5 && u)) { | ||
443 | - /* Insert */ | ||
444 | - neon_load_reg64(cpu_V1, rd + pass); | ||
445 | - uint64_t mask; | ||
446 | - if (shift < -63 || shift > 63) { | ||
447 | - mask = 0; | ||
448 | - } else { | ||
449 | - if (op == 4) { | ||
450 | - mask = 0xffffffffffffffffull >> -shift; | ||
451 | - } else { | ||
452 | - mask = 0xffffffffffffffffull << shift; | ||
453 | - } | ||
454 | - } | ||
455 | - tcg_gen_andi_i64(cpu_V1, cpu_V1, ~mask); | ||
456 | - tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1); | ||
457 | } | ||
458 | neon_store_reg64(cpu_V0, rd + pass); | ||
459 | } else { /* size < 3 */ | ||
460 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
461 | case 3: /* VRSRA */ | ||
462 | GEN_NEON_INTEGER_OP(rshl); | ||
463 | break; | ||
464 | - case 4: /* VSRI */ | ||
465 | - case 5: /* VSHL, VSLI */ | ||
466 | - switch (size) { | ||
467 | - case 0: gen_helper_neon_shl_u8(tmp, tmp, tmp2); break; | ||
468 | - case 1: gen_helper_neon_shl_u16(tmp, tmp, tmp2); break; | ||
469 | - case 2: gen_helper_neon_shl_u32(tmp, tmp, tmp2); break; | ||
470 | - default: abort(); | ||
471 | - } | ||
472 | - break; | ||
473 | case 6: /* VQSHLU */ | ||
474 | switch (size) { | ||
475 | case 0: | ||
476 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | ||
477 | tmp2 = neon_load_reg(rd, pass); | ||
478 | gen_neon_add(size, tmp, tmp2); | ||
479 | tcg_temp_free_i32(tmp2); | ||
480 | - } else if (op == 4 || (op == 5 && u)) { | ||
481 | - /* Insert */ | ||
482 | - switch (size) { | ||
483 | - case 0: | ||
484 | - if (op == 4) | ||
485 | - mask = 0xff >> -shift; | ||
486 | - else | ||
487 | - mask = (uint8_t)(0xff << shift); | ||
488 | - mask |= mask << 8; | ||
489 | - mask |= mask << 16; | ||
490 | - break; | ||
491 | - case 1: | ||
492 | - if (op == 4) | ||
493 | - mask = 0xffff >> -shift; | ||
494 | - else | ||
495 | - mask = (uint16_t)(0xffff << shift); | ||
496 | - mask |= mask << 16; | ||
497 | - break; | ||
498 | - case 2: | ||
499 | - if (shift < -31 || shift > 31) { | ||
500 | - mask = 0; | ||
501 | - } else { | ||
502 | - if (op == 4) | ||
503 | - mask = 0xffffffffu >> -shift; | ||
504 | - else | ||
505 | - mask = 0xffffffffu << shift; | ||
506 | - } | ||
507 | - break; | ||
508 | - default: | ||
509 | - abort(); | ||
510 | - } | ||
511 | - tmp2 = neon_load_reg(rd, pass); | ||
512 | - tcg_gen_andi_i32(tmp, tmp, mask); | ||
513 | - tcg_gen_andi_i32(tmp2, tmp2, ~mask); | ||
514 | - tcg_gen_or_i32(tmp, tmp, tmp2); | ||
515 | - tcg_temp_free_i32(tmp2); | ||
516 | } | ||
517 | neon_store_reg(rd, pass, tmp); | ||
518 | } | ||
519 | -- | 95 | -- |
520 | 2.19.1 | 96 | 2.20.1 |
521 | 97 | ||
522 | 98 | diff view generated by jsdifflib |
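A note on the VSRI/VSLI conversion above: the per-lane masking in gen_shr8_ins_i64() and friends is easy to misread. The standalone C sketch below (illustrative only; dup8(), sri8() and the test values are invented here, not QEMU code) models the same shift-right-and-insert operation for 8-bit lanes using plain integer arithmetic.

/*
 * Scalar model of SRI (shift right and insert) on eight byte lanes,
 * following the same mask construction as gen_shr8_ins_i64() above.
 */
#include <stdint.h>
#include <stdio.h>

/* Replicate an 8-bit pattern across all byte lanes, cf. dup_const(MO_8, x). */
static uint64_t dup8(uint8_t x)
{
    return x * 0x0101010101010101ull;
}

/* Insert (a >> shift) into d, per byte lane, for 0 < shift < 8. */
static uint64_t sri8(uint64_t d, uint64_t a, int shift)
{
    uint64_t mask = dup8(0xff >> shift);  /* bits the shifted value may occupy */
    uint64_t t = (a >> shift) & mask;     /* per-lane shift, cross-lane bits cleared */
    return (d & ~mask) | t;               /* keep each destination lane's top bits */
}

int main(void)
{
    uint64_t d = 0xffffffffffffffffull;
    uint64_t a = 0x8040201008040201ull;
    printf("VSRI.8 #4: %016llx\n", (unsigned long long)sri8(d, a, 4));
    return 0;
}

The 16-bit and left-shift (SLI) variants in the patch use the same trick, with dup_const(MO_16, ...) and a low-bit mask respectively.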
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | Coverity notes that we don't check for dup2() failing. Add some |
---|---|---|---|
2 | assertions so that if it does ever happen we get some indication. | ||
3 | (This is similar to how we handle other "don't expect this syscall to | ||
4 | fail" checks in this test code.) | ||
2 | 5 | ||
3 | Since QEMU does not implement ASIDs, changes to the ASID must flush the | 6 | Fixes: Coverity CID 1432346 |
4 | tlb. However, if the ASID does not change, there is no reason to flush. | 7 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
8 | Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> | ||
9 | Message-id: 20210525134458.6675-2-peter.maydell@linaro.org | ||
10 | --- | ||
11 | tests/qtest/bios-tables-test.c | 8 ++++++-- | ||
12 | 1 file changed, 6 insertions(+), 2 deletions(-) | ||
5 | 13 | ||
6 | In testing a boot of the Ubuntu installer to the first menu, this reduces | 14 | diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c |
7 | the number of flushes by 30%, or nearly 600k instances. | ||
8 | |||
9 | Reviewed-by: Aaron Lindsay <aaron@os.amperecomputing.com> | ||
10 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | ||
11 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
12 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | ||
13 | Message-id: 20181019015617.22583-3-richard.henderson@linaro.org | ||
14 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | ||
15 | --- | ||
16 | target/arm/helper.c | 8 +++----- | ||
17 | 1 file changed, 3 insertions(+), 5 deletions(-) | ||
18 | |||
19 | diff --git a/target/arm/helper.c b/target/arm/helper.c | ||
20 | index XXXXXXX..XXXXXXX 100644 | 15 | index XXXXXXX..XXXXXXX 100644 |
21 | --- a/target/arm/helper.c | 16 | --- a/tests/qtest/bios-tables-test.c |
22 | +++ b/target/arm/helper.c | 17 | +++ b/tests/qtest/bios-tables-test.c |
23 | @@ -XXX,XX +XXX,XX @@ static void vmsa_tcr_el1_write(CPUARMState *env, const ARMCPRegInfo *ri, | 18 | @@ -XXX,XX +XXX,XX @@ static void test_acpi_asl(test_data *data) |
24 | static void vmsa_ttbr_write(CPUARMState *env, const ARMCPRegInfo *ri, | 19 | exp_sdt->asl_file, sdt->asl_file); |
25 | uint64_t value) | 20 | int out = dup(STDOUT_FILENO); |
26 | { | 21 | int ret G_GNUC_UNUSED; |
27 | - /* 64 bit accesses to the TTBRs can change the ASID and so we | 22 | + int dupret; |
28 | - * must flush the TLB. | 23 | |
29 | - */ | 24 | - dup2(STDERR_FILENO, STDOUT_FILENO); |
30 | - if (cpreg_field_is_64bit(ri)) { | 25 | + g_assert(out >= 0); |
31 | + /* If the ASID changes (with a 64-bit write), we must flush the TLB. */ | 26 | + dupret = dup2(STDERR_FILENO, STDOUT_FILENO); |
32 | + if (cpreg_field_is_64bit(ri) && | 27 | + g_assert(dupret >= 0); |
33 | + extract64(raw_read(env, ri) ^ value, 48, 16) != 0) { | 28 | ret = system(diff) ; |
34 | ARMCPU *cpu = arm_env_get_cpu(env); | 29 | - dup2(out, STDOUT_FILENO); |
35 | - | 30 | + dupret = dup2(out, STDOUT_FILENO); |
36 | tlb_flush(CPU(cpu)); | 31 | + g_assert(dupret >= 0); |
37 | } | 32 | close(out); |
38 | raw_write(env, ri, value); | 33 | g_free(diff); |
34 | } | ||
39 | -- | 35 | -- |
40 | 2.19.1 | 36 | 2.20.1 |
41 | 37 | ||
42 | 38 | diff view generated by jsdifflib |
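For readers skimming the TLB-flush patch above, here is a standalone sketch of the ASID comparison it adds (illustrative only; the helper names and the example TTBR values are invented for this sketch). The ASID occupies bits [63:48] of the 64-bit TTBR value, so XOR-ing the old and new values and extracting that field is non-zero exactly when the ASID changes.

/* Plain-C model of the "did the ASID change?" test. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Extract 'len' bits starting at bit 'start', cf. QEMU's extract64(). */
static uint64_t extract_bits(uint64_t value, int start, int len)
{
    return (value >> start) & (~0ull >> (64 - len));
}

/* A 64-bit TTBR write only needs a TLB flush if the ASID field differs. */
static bool ttbr_write_needs_flush(uint64_t old_val, uint64_t new_val)
{
    return extract_bits(old_val ^ new_val, 48, 16) != 0;
}

int main(void)
{
    uint64_t old_ttbr = 0x0001000080000000ull;  /* ASID 1 */
    printf("base changed, same ASID: %d\n",
           ttbr_write_needs_flush(old_ttbr, 0x0001000090000000ull));
    printf("ASID changed:            %d\n",
           ttbr_write_needs_flush(old_ttbr, 0x0002000080000000ull));
    return 0;
}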
1 | From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com> | 1 | The e1000e_send_verify() test calls qemu_recv() but doesn't |
---|---|---|---|
2 | check that the call succeeded, which annoys Coverity. Add | ||
3 | an explicit test check for the length of the data. | ||
2 | 4 | ||
3 | Announce 64-bit addressing support. | 5 | (This is a test check, not a "we assume this syscall always
6 | succeeds", so we use g_assert_cmpint() rather than g_assert().) | ||
4 | 7 | ||
5 | Reviewed-by: Alistair Francis <alistair.francis@wdc.com> | 8 | Fixes: Coverity CID 1432324 |
6 | Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> | ||
7 | Message-id: 20181017213932.19973-3-edgar.iglesias@gmail.com | ||
8 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
10 | Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> | ||
11 | Message-id: 20210525134458.6675-3-peter.maydell@linaro.org | ||
10 | --- | 12 | --- |
11 | hw/net/cadence_gem.c | 3 ++- | 13 | tests/qtest/e1000e-test.c | 3 ++- |
12 | 1 file changed, 2 insertions(+), 1 deletion(-) | 14 | 1 file changed, 2 insertions(+), 1 deletion(-) |
13 | 15 | ||
14 | diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c | 16 | diff --git a/tests/qtest/e1000e-test.c b/tests/qtest/e1000e-test.c |
15 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
16 | --- a/hw/net/cadence_gem.c | 18 | --- a/tests/qtest/e1000e-test.c |
17 | +++ b/hw/net/cadence_gem.c | 19 | +++ b/tests/qtest/e1000e-test.c |
18 | @@ -XXX,XX +XXX,XX @@ | 20 | @@ -XXX,XX +XXX,XX @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a |
19 | #define GEM_DESCONF4 (0x0000028C/4) | 21 | /* Check data sent to the backend */ |
20 | #define GEM_DESCONF5 (0x00000290/4) | 22 | ret = qemu_recv(test_sockets[0], &recv_len, sizeof(recv_len), 0); |
21 | #define GEM_DESCONF6 (0x00000294/4) | 23 | g_assert_cmpint(ret, == , sizeof(recv_len)); |
22 | +#define GEM_DESCONF6_64B_MASK (1U << 23) | 24 | - qemu_recv(test_sockets[0], buffer, 64, 0); |
23 | #define GEM_DESCONF7 (0x00000298/4) | 25 | + ret = qemu_recv(test_sockets[0], buffer, 64, 0); |
24 | 26 | + g_assert_cmpint(ret, >=, 5); | |
25 | #define GEM_INT_Q1_STATUS (0x00000400 / 4) | 27 | g_assert_cmpstr(buffer, == , "TEST"); |
26 | @@ -XXX,XX +XXX,XX @@ static void gem_reset(DeviceState *d) | 28 | |
27 | s->regs[GEM_DESCONF] = 0x02500111; | 29 | /* Free test data buffer */ |
28 | s->regs[GEM_DESCONF2] = 0x2ab13fff; | ||
29 | s->regs[GEM_DESCONF5] = 0x002f2045; | ||
30 | - s->regs[GEM_DESCONF6] = 0x0; | ||
31 | + s->regs[GEM_DESCONF6] = GEM_DESCONF6_64B_MASK; | ||
32 | |||
33 | if (s->num_priority_queues > 1) { | ||
34 | queues_mask = MAKE_64BIT_MASK(1, s->num_priority_queues - 1); | ||
35 | -- | 30 | -- |
36 | 2.19.1 | 31 | 2.20.1 |
37 | 32 | ||
38 | 33 | diff view generated by jsdifflib |
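The cadence_gem change above is small, but the pattern is a common one: the device model advertises an optional capability through a read-only bit in a configuration register, and guest software probes that bit before relying on the feature. A minimal sketch of the idea follows; the fake_gem structure and helpers are invented here, and only the register offset and bit position are taken from the patch.

/* Capability bit advertised in a read-only config register (sketch only). */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define DESCONF6           0x0294        /* byte offset of the register */
#define DESCONF6_64B_MASK  (1u << 23)    /* "64-bit DMA capable" bit    */

struct fake_gem {
    uint32_t regs[0x300 / 4];
};

static void fake_gem_reset(struct fake_gem *s)
{
    /* The reset value now announces 64-bit descriptor/address support. */
    s->regs[DESCONF6 / 4] = DESCONF6_64B_MASK;
}

/* What a guest driver would conceptually do when probing the device. */
static bool fake_gem_supports_64bit(const struct fake_gem *s)
{
    return (s->regs[DESCONF6 / 4] & DESCONF6_64B_MASK) != 0;
}

int main(void)
{
    struct fake_gem gem = { 0 };
    fake_gem_reset(&gem);
    printf("64-bit addressing: %s\n", fake_gem_supports_64bit(&gem) ? "yes" : "no");
    return 0;
}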
1 | For traps of FP/SIMD instructions to AArch32 Hyp mode, the syndrome | 1 | Coverity notices that the checks against mkstemp() failing in |
---|---|---|---|
2 | provided in HSR has more information than is reported to AArch64. | 2 | create_qcow2_with_mbr() are wrong: mkstemp returns -1 on failure but |
3 | Specifically, there are extra fields TA and coproc which indicate | 3 | the check is just "g_assert(fd)". Fix to use "g_assert(fd >= 0)", |
4 | whether the trapped instruction was FP or SIMD. Add this extra | 4 | matching the correct check in create_test_img(). |
5 | information to the syndromes we construct, and mask it out when | ||
6 | taking the exception to AArch64. | ||
7 | 5 | ||
6 | Fixes: Coverity CID 1432274 | ||
8 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 7 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
9 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | 8 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> |
10 | Message-id: 20181012144235.19646-11-peter.maydell@linaro.org | 9 | Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> |
10 | Message-id: 20210525134458.6675-4-peter.maydell@linaro.org | ||
11 | --- | 11 | --- |
12 | target/arm/internals.h | 14 +++++++++++++- | 12 | tests/qtest/hd-geo-test.c | 4 ++-- |
13 | target/arm/helper.c | 9 +++++++++ | 13 | 1 file changed, 2 insertions(+), 2 deletions(-) |
14 | target/arm/translate.c | 8 ++++---- | ||
15 | 3 files changed, 26 insertions(+), 5 deletions(-) | ||
16 | 14 | ||
17 | diff --git a/target/arm/internals.h b/target/arm/internals.h | 15 | diff --git a/tests/qtest/hd-geo-test.c b/tests/qtest/hd-geo-test.c |
18 | index XXXXXXX..XXXXXXX 100644 | 16 | index XXXXXXX..XXXXXXX 100644 |
19 | --- a/target/arm/internals.h | 17 | --- a/tests/qtest/hd-geo-test.c |
20 | +++ b/target/arm/internals.h | 18 | +++ b/tests/qtest/hd-geo-test.c |
21 | @@ -XXX,XX +XXX,XX @@ static inline uint32_t syn_get_ec(uint32_t syn) | 19 | @@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors) |
22 | * few cases the value in HSR for exceptions taken to AArch32 Hyp | ||
23 | * mode differs slightly, and we fix this up when populating HSR in | ||
24 | * arm_cpu_do_interrupt_aarch32_hyp(). | ||
25 | + * The exception is FP/SIMD access traps -- these report extra information | ||
26 | + * when taking an exception to AArch32. For those we include the extra coproc | ||
27 | + * and TA fields, and mask them out when taking the exception to AArch64. | ||
28 | */ | ||
29 | static inline uint32_t syn_uncategorized(void) | ||
30 | { | ||
31 | @@ -XXX,XX +XXX,XX @@ static inline uint32_t syn_cp15_rrt_trap(int cv, int cond, int opc1, int crm, | ||
32 | |||
33 | static inline uint32_t syn_fp_access_trap(int cv, int cond, bool is_16bit) | ||
34 | { | ||
35 | + /* AArch32 FP trap or any AArch64 FP/SIMD trap: TA == 0 coproc == 0xa */ | ||
36 | return (EC_ADVSIMDFPACCESSTRAP << ARM_EL_EC_SHIFT) | ||
37 | | (is_16bit ? 0 : ARM_EL_IL) | ||
38 | - | (cv << 24) | (cond << 20); | ||
39 | + | (cv << 24) | (cond << 20) | 0xa; | ||
40 | +} | ||
41 | + | ||
42 | +static inline uint32_t syn_simd_access_trap(int cv, int cond, bool is_16bit) | ||
43 | +{ | ||
44 | + /* AArch32 SIMD trap: TA == 1 coproc == 0 */ | ||
45 | + return (EC_ADVSIMDFPACCESSTRAP << ARM_EL_EC_SHIFT) | ||
46 | + | (is_16bit ? 0 : ARM_EL_IL) | ||
47 | + | (cv << 24) | (cond << 20) | (1 << 5); | ||
48 | } | ||
49 | |||
50 | static inline uint32_t syn_sve_access_trap(void) | ||
51 | diff --git a/target/arm/helper.c b/target/arm/helper.c | ||
52 | index XXXXXXX..XXXXXXX 100644 | ||
53 | --- a/target/arm/helper.c | ||
54 | +++ b/target/arm/helper.c | ||
55 | @@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs) | ||
56 | case EXCP_HVC: | ||
57 | case EXCP_HYP_TRAP: | ||
58 | case EXCP_SMC: | ||
59 | + if (syn_get_ec(env->exception.syndrome) == EC_ADVSIMDFPACCESSTRAP) { | ||
60 | + /* | ||
61 | + * QEMU internal FP/SIMD syndromes from AArch32 include the | ||
62 | + * TA and coproc fields which are only exposed if the exception | ||
63 | + * is taken to AArch32 Hyp mode. Mask them out to get a valid | ||
64 | + * AArch64 format syndrome. | ||
65 | + */ | ||
66 | + env->exception.syndrome &= ~MAKE_64BIT_MASK(0, 20); | ||
67 | + } | ||
68 | env->cp15.esr_el[new_el] = env->exception.syndrome; | ||
69 | break; | ||
70 | case EXCP_IRQ: | ||
71 | diff --git a/target/arm/translate.c b/target/arm/translate.c | ||
72 | index XXXXXXX..XXXXXXX 100644 | ||
73 | --- a/target/arm/translate.c | ||
74 | +++ b/target/arm/translate.c | ||
75 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_ls_insn(DisasContext *s, uint32_t insn) | ||
76 | */ | ||
77 | if (s->fp_excp_el) { | ||
78 | gen_exception_insn(s, 4, EXCP_UDEF, | ||
79 | - syn_fp_access_trap(1, 0xe, false), s->fp_excp_el); | ||
80 | + syn_simd_access_trap(1, 0xe, false), s->fp_excp_el); | ||
81 | return 0; | ||
82 | } | 20 | } |
83 | 21 | ||
84 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn) | 22 | fd = mkstemp(raw_path); |
85 | */ | 23 | - g_assert(fd); |
86 | if (s->fp_excp_el) { | 24 | + g_assert(fd >= 0); |
87 | gen_exception_insn(s, 4, EXCP_UDEF, | 25 | close(fd); |
88 | - syn_fp_access_trap(1, 0xe, false), s->fp_excp_el); | 26 | |
89 | + syn_simd_access_trap(1, 0xe, false), s->fp_excp_el); | 27 | fd = open(raw_path, O_WRONLY); |
90 | return 0; | 28 | @@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors) |
91 | } | 29 | close(fd); |
92 | 30 | ||
93 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_3same_ext(DisasContext *s, uint32_t insn) | 31 | fd = mkstemp(qcow2_path); |
94 | 32 | - g_assert(fd); | |
95 | if (s->fp_excp_el) { | 33 | + g_assert(fd >= 0); |
96 | gen_exception_insn(s, 4, EXCP_UDEF, | 34 | close(fd); |
97 | - syn_fp_access_trap(1, 0xe, false), s->fp_excp_el); | 35 | |
98 | + syn_simd_access_trap(1, 0xe, false), s->fp_excp_el); | 36 | qemu_img_path = getenv("QTEST_QEMU_IMG"); |
99 | return 0; | ||
100 | } | ||
101 | if (!s->vfp_enabled) { | ||
102 | @@ -XXX,XX +XXX,XX @@ static int disas_neon_insn_2reg_scalar_ext(DisasContext *s, uint32_t insn) | ||
103 | |||
104 | if (s->fp_excp_el) { | ||
105 | gen_exception_insn(s, 4, EXCP_UDEF, | ||
106 | - syn_fp_access_trap(1, 0xe, false), s->fp_excp_el); | ||
107 | + syn_simd_access_trap(1, 0xe, false), s->fp_excp_el); | ||
108 | return 0; | ||
109 | } | ||
110 | if (!s->vfp_enabled) { | ||
111 | -- | 37 | -- |
112 | 2.19.1 | 38 | 2.20.1 |
113 | 39 | ||
114 | 40 | diff view generated by jsdifflib |
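For the syndrome patch above it can help to see the bit layout written out. The sketch below mirrors the two helpers added to internals.h and shows, in plain C, how the AArch32-only TA and coproc fields land in the low bits and are then masked away when the exception is taken to AArch64. The constants reproduce QEMU's definitions as far as I know them; treat the exact values as illustrative.

/* Sketch of the FP/SIMD access trap syndrome layout (not the QEMU source). */
#include <stdint.h>
#include <stdio.h>

#define EC_ADVSIMDFPACCESSTRAP  0x07
#define ARM_EL_EC_SHIFT         26
#define ARM_EL_IL               (1u << 25)

/* AArch32 VFP trap: TA == 0, coproc == 0xa */
static uint32_t syn_fp_access_trap(int cv, int cond, int is_16bit)
{
    return (EC_ADVSIMDFPACCESSTRAP << ARM_EL_EC_SHIFT)
           | (is_16bit ? 0 : ARM_EL_IL)
           | (cv << 24) | (cond << 20) | 0xa;
}

/* AArch32 Neon (SIMD) trap: TA == 1, coproc == 0 */
static uint32_t syn_simd_access_trap(int cv, int cond, int is_16bit)
{
    return (EC_ADVSIMDFPACCESSTRAP << ARM_EL_EC_SHIFT)
           | (is_16bit ? 0 : ARM_EL_IL)
           | (cv << 24) | (cond << 20) | (1u << 5);
}

int main(void)
{
    uint32_t syn = syn_simd_access_trap(1, 0xe, 0);
    printf("AArch32 Hyp SIMD trap: 0x%08x\n", syn);
    printf("AArch32 Hyp VFP trap:  0x%08x\n", syn_fp_access_trap(1, 0xe, 0));
    /* Taking the exception to AArch64 drops the AArch32-only low 20 bits. */
    printf("AArch64 view:          0x%08x\n", syn & ~((1u << 20) - 1));
    return 0;
}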
1 | From: Richard Henderson <richard.henderson@linaro.org> | 1 | Coverity points out that we calculate a 64-bit value using 32-bit |
---|---|---|---|
2 | arithmetic; add the cast to force the multiply to be done as 64-bits. | ||
3 | (The overflow will never happen with the current test data.) | ||
2 | 4 | ||
3 | The EL3 version of this register does not include an ASID, | 5 | Fixes: Coverity CID 1432320 |
4 | and so the tlb_flush performed by vmsa_ttbr_write is not needed. | ||
5 | |||
6 | Reviewed-by: Aaron Lindsay <aaron@os.amperecomputing.com> | ||
7 | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> | ||
8 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
9 | Message-id: 20181019015617.22583-2-richard.henderson@linaro.org | ||
10 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 6 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
7 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | ||
8 | Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> | ||
9 | Message-id: 20210525134458.6675-5-peter.maydell@linaro.org | ||
11 | --- | 10 | --- |
12 | target/arm/helper.c | 2 +- | 11 | tests/qtest/pflash-cfi02-test.c | 2 +- |
13 | 1 file changed, 1 insertion(+), 1 deletion(-) | 12 | 1 file changed, 1 insertion(+), 1 deletion(-) |
14 | 13 | ||
15 | diff --git a/target/arm/helper.c b/target/arm/helper.c | 14 | diff --git a/tests/qtest/pflash-cfi02-test.c b/tests/qtest/pflash-cfi02-test.c |
16 | index XXXXXXX..XXXXXXX 100644 | 15 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/target/arm/helper.c | 16 | --- a/tests/qtest/pflash-cfi02-test.c |
18 | +++ b/target/arm/helper.c | 17 | +++ b/tests/qtest/pflash-cfi02-test.c |
19 | @@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el3_cp_reginfo[] = { | 18 | @@ -XXX,XX +XXX,XX @@ static void test_geometry(const void *opaque) |
20 | .fieldoffset = offsetof(CPUARMState, cp15.mvbar) }, | 19 | |
21 | { .name = "TTBR0_EL3", .state = ARM_CP_STATE_AA64, | 20 | for (int region = 0; region < nb_erase_regions; ++region) { |
22 | .opc0 = 3, .opc1 = 6, .crn = 2, .crm = 0, .opc2 = 0, | 21 | for (uint32_t i = 0; i < c->nb_blocs[region]; ++i) { |
23 | - .access = PL3_RW, .writefn = vmsa_ttbr_write, .resetvalue = 0, | 22 | - uint64_t byte_addr = i * c->sector_len[region]; |
24 | + .access = PL3_RW, .resetvalue = 0, | 23 | + uint64_t byte_addr = (uint64_t)i * c->sector_len[region]; |
25 | .fieldoffset = offsetof(CPUARMState, cp15.ttbr0_el[3]) }, | 24 | g_assert_cmphex(flash_read(c, byte_addr), ==, bank_mask(c)); |
26 | { .name = "TCR_EL3", .state = ARM_CP_STATE_AA64, | 25 | } |
27 | .opc0 = 3, .opc1 = 6, .crn = 2, .crm = 0, .opc2 = 2, | 26 | } |
28 | -- | 27 | -- |
29 | 2.19.1 | 28 | 2.20.1 |
30 | 29 | ||
31 | 30 | diff view generated by jsdifflib |
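The pflash-cfi02 test fix above is a textbook instance of a C integer-promotion pitfall, so it is worth spelling out. In the standalone sketch below (operand values invented), the "wrong" multiply is performed in 32 bits and wraps before the result is widened to 64 bits; the cast on the "right" line forces a 64-bit multiply, which is exactly what the patch does.

/* 32-bit multiply overflow before assignment to a 64-bit variable. */
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint32_t i = 0x10000;           /* 65536 sectors     */
    uint32_t sector_len = 0x10000;  /* 64 KiB per sector */

    uint64_t wrong = i * sector_len;            /* multiplied in 32 bits, wraps to 0 */
    uint64_t right = (uint64_t)i * sector_len;  /* multiplied in 64 bits             */

    printf("wrong: 0x%" PRIx64 "\n", wrong);    /* 0x0         */
    printf("right: 0x%" PRIx64 "\n", right);    /* 0x100000000 */
    return 0;
}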
1 | The A/I/F bits in ISR_EL1 should track the virtual interrupt | 1 | Coverity points out that in tpm_test_swtpm_migration_test() we |
---|---|---|---|
2 | status, not the physical interrupt status, if the associated | 2 | assume that src_tpm_addr and dst_tpm_addr are non-NULL (we |
3 | HCR_EL2.AMO/IMO/FMO bit is set. Implement this, rather than | 3 | pass them to tpm_util_migration_start_qemu() which will |
4 | always showing the physical interrupt status. | 4 | unconditionally dereference them) but then later explicitly |
5 | check them for NULL. Remove the pointless checks. | ||
5 | 6 | ||
6 | We don't currently implement anything to do with external | 7 | Fixes: Coverity CID 1432367, 1432359 |
7 | aborts, so this applies only to the I and F bits (though it | ||
8 | ought to be possible for the outer guest to present a virtual | ||
9 | external abort to the inner guest, even if QEMU doesn't | ||
10 | emulate physical external aborts, so there is missing | ||
11 | functionality in this area). | ||
12 | 8 | ||
13 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | 9 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
14 | Reviewed-by: Richard Henderson <richard.henderson@linaro.org> | 10 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> |
15 | Message-id: 20181012144235.19646-6-peter.maydell@linaro.org | 11 | Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> |
12 | Message-id: 20210525134458.6675-6-peter.maydell@linaro.org | ||
16 | --- | 13 | --- |
17 | target/arm/helper.c | 22 ++++++++++++++++++---- | 14 | tests/qtest/tpm-tests.c | 12 ++++-------- |
18 | 1 file changed, 18 insertions(+), 4 deletions(-) | 15 | 1 file changed, 4 insertions(+), 8 deletions(-) |
19 | 16 | ||
20 | diff --git a/target/arm/helper.c b/target/arm/helper.c | 17 | diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c |
21 | index XXXXXXX..XXXXXXX 100644 | 18 | index XXXXXXX..XXXXXXX 100644 |
22 | --- a/target/arm/helper.c | 19 | --- a/tests/qtest/tpm-tests.c |
23 | +++ b/target/arm/helper.c | 20 | +++ b/tests/qtest/tpm-tests.c |
24 | @@ -XXX,XX +XXX,XX @@ static uint64_t isr_read(CPUARMState *env, const ARMCPRegInfo *ri) | 21 | @@ -XXX,XX +XXX,XX @@ void tpm_test_swtpm_migration_test(const char *src_tpm_path, |
25 | CPUState *cs = ENV_GET_CPU(env); | 22 | qtest_quit(src_qemu); |
26 | uint64_t ret = 0; | 23 | |
27 | 24 | tpm_util_swtpm_kill(dst_tpm_pid); | |
28 | - if (cs->interrupt_request & CPU_INTERRUPT_HARD) { | 25 | - if (dst_tpm_addr) { |
29 | - ret |= CPSR_I; | 26 | - g_unlink(dst_tpm_addr->u.q_unix.path); |
30 | + if (arm_hcr_el2_imo(env)) { | 27 | - qapi_free_SocketAddress(dst_tpm_addr); |
31 | + if (cs->interrupt_request & CPU_INTERRUPT_VIRQ) { | 28 | - } |
32 | + ret |= CPSR_I; | 29 | + g_unlink(dst_tpm_addr->u.q_unix.path); |
33 | + } | 30 | + qapi_free_SocketAddress(dst_tpm_addr); |
34 | + } else { | 31 | |
35 | + if (cs->interrupt_request & CPU_INTERRUPT_HARD) { | 32 | tpm_util_swtpm_kill(src_tpm_pid); |
36 | + ret |= CPSR_I; | 33 | - if (src_tpm_addr) { |
37 | + } | 34 | - g_unlink(src_tpm_addr->u.q_unix.path); |
38 | } | 35 | - qapi_free_SocketAddress(src_tpm_addr); |
39 | - if (cs->interrupt_request & CPU_INTERRUPT_FIQ) { | 36 | - } |
40 | - ret |= CPSR_F; | 37 | + g_unlink(src_tpm_addr->u.q_unix.path); |
41 | + | 38 | + qapi_free_SocketAddress(src_tpm_addr); |
42 | + if (arm_hcr_el2_fmo(env)) { | ||
43 | + if (cs->interrupt_request & CPU_INTERRUPT_VFIQ) { | ||
44 | + ret |= CPSR_F; | ||
45 | + } | ||
46 | + } else { | ||
47 | + if (cs->interrupt_request & CPU_INTERRUPT_FIQ) { | ||
48 | + ret |= CPSR_F; | ||
49 | + } | ||
50 | } | ||
51 | + | ||
52 | /* External aborts are not possible in QEMU so A bit is always clear */ | ||
53 | return ret; | ||
54 | } | 39 | } |
55 | -- | 40 | -- |
56 | 2.19.1 | 41 | 2.20.1 |
57 | 42 | ||
58 | 43 | diff view generated by jsdifflib |
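To restate the ISR_EL1 patch above without the CPUState plumbing: when the HCR_EL2 bit that routes an interrupt class to EL2 is set (IMO for IRQ, FMO for FIQ), the corresponding ISR bit should report the virtual pending state, otherwise the physical one. A reduced sketch with plain booleans follows; the signature is invented for the sketch, and only the CPSR bit positions and the selection logic come from the patch.

/* Reduced model of the ISR_EL1 I/F bit selection. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define CPSR_I (1u << 7)
#define CPSR_F (1u << 6)

static uint32_t isr_bits(bool imo, bool virq_pending, bool irq_pending,
                         bool fmo, bool vfiq_pending, bool fiq_pending)
{
    uint32_t ret = 0;

    /* With IMO/FMO set, report the virtual interrupt, not the physical one. */
    if (imo ? virq_pending : irq_pending) {
        ret |= CPSR_I;
    }
    if (fmo ? vfiq_pending : fiq_pending) {
        ret |= CPSR_F;
    }
    return ret;
}

int main(void)
{
    /* IMO set, physical IRQ pending but no virtual IRQ: the I bit reads 0. */
    printf("ISR = 0x%x\n", isr_bits(true, false, true, false, false, false));
    return 0;
}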
1 | From: Stewart Hildebrand <Stewart.Hildebrand@dornerworks.com> | 1 | Coverity complains that we don't check for failures from dup() |
---|---|---|---|
2 | and mkstemp(); add asserts that these syscalls succeeded. | ||
2 | 3 | ||
3 | "The Image must be placed text_offset bytes from a 2MB aligned base | 4 | Fixes: Coverity CID 1432516, 1432574 |
4 | address anywhere in usable system RAM and called there." | 5 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> |
6 | Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> | ||
7 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> | ||
8 | Message-id: 20210525134458.6675-7-peter.maydell@linaro.org | ||
9 | --- | ||
10 | tests/unit/test-vmstate.c | 5 ++++- | ||
11 | 1 file changed, 4 insertions(+), 1 deletion(-) | ||
5 | 12 | ||
6 | For the virt board, we write our startup bootloader at the very | 13 | diff --git a/tests/unit/test-vmstate.c b/tests/unit/test-vmstate.c |
7 | bottom of RAM, so that bit can't be used for the image. To avoid | ||
8 | overlap in case the image requests to be loaded at an offset | ||
9 | smaller than our bootloader, we increment the load offset to the | ||
10 | next 2MB. | ||
11 | |||
12 | This fixes a boot failure for Xen AArch64. | ||
13 | |||
14 | Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com> | ||
15 | Tested-by: Andre Przywara <andre.przywara@arm.com> | ||
16 | Message-id: b8a89518794b4436af0c151ed10de4fa@dornerworks.com | ||
17 | [PMM: Rephrased a comment a bit] | ||
18 | Reviewed-by: Peter Maydell <peter.maydell@linaro.org> | ||
19 | Signed-off-by: Peter Maydell <peter.maydell@linaro.org> | ||
20 | --- | ||
21 | hw/arm/boot.c | 18 ++++++++++++++++++ | ||
22 | 1 file changed, 18 insertions(+) | ||
23 | |||
24 | diff --git a/hw/arm/boot.c b/hw/arm/boot.c | ||
25 | index XXXXXXX..XXXXXXX 100644 | 14 | index XXXXXXX..XXXXXXX 100644 |
26 | --- a/hw/arm/boot.c | 15 | --- a/tests/unit/test-vmstate.c |
27 | +++ b/hw/arm/boot.c | 16 | +++ b/tests/unit/test-vmstate.c |
28 | @@ -XXX,XX +XXX,XX @@ | 17 | @@ -XXX,XX +XXX,XX @@ static int temp_fd; |
29 | #include "qemu/config-file.h" | 18 | /* Duplicate temp_fd and seek to the beginning of the file */ |
30 | #include "qemu/option.h" | 19 | static QEMUFile *open_test_file(bool write) |
31 | #include "exec/address-spaces.h" | ||
32 | +#include "qemu/units.h" | ||
33 | |||
34 | /* Kernel boot protocol is specified in the kernel docs | ||
35 | * Documentation/arm/Booting and Documentation/arm64/booting.txt | ||
36 | @@ -XXX,XX +XXX,XX @@ | ||
37 | #define ARM64_TEXT_OFFSET_OFFSET 8 | ||
38 | #define ARM64_MAGIC_OFFSET 56 | ||
39 | |||
40 | +#define BOOTLOADER_MAX_SIZE (4 * KiB) | ||
41 | + | ||
42 | AddressSpace *arm_boot_address_space(ARMCPU *cpu, | ||
43 | const struct arm_boot_info *info) | ||
44 | { | 20 | { |
45 | @@ -XXX,XX +XXX,XX @@ static void write_bootloader(const char *name, hwaddr addr, | 21 | - int fd = dup(temp_fd); |
46 | code[i] = tswap32(insn); | 22 | + int fd; |
47 | } | 23 | QIOChannel *ioc; |
48 | 24 | QEMUFile *f; | |
49 | + assert((len * sizeof(uint32_t)) < BOOTLOADER_MAX_SIZE); | 25 | |
50 | + | 26 | + fd = dup(temp_fd); |
51 | rom_add_blob_fixed_as(name, code, len * sizeof(uint32_t), addr, as); | 27 | + g_assert(fd >= 0); |
52 | 28 | lseek(fd, 0, SEEK_SET); | |
53 | g_free(code); | 29 | if (write) { |
54 | @@ -XXX,XX +XXX,XX @@ static uint64_t load_aarch64_image(const char *filename, hwaddr mem_base, | 30 | g_assert_cmpint(ftruncate(fd, 0), ==, 0); |
55 | memcpy(&hdrvals, buffer + ARM64_TEXT_OFFSET_OFFSET, sizeof(hdrvals)); | 31 | @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv) |
56 | if (hdrvals[1] != 0) { | 32 | g_autofree char *temp_file = g_strdup_printf("%s/vmst.test.XXXXXX", |
57 | kernel_load_offset = le64_to_cpu(hdrvals[0]); | 33 | g_get_tmp_dir()); |
58 | + | 34 | temp_fd = mkstemp(temp_file); |
59 | + /* | 35 | + g_assert(temp_fd >= 0); |
60 | + * We write our startup "bootloader" at the very bottom of RAM, | 36 | |
61 | + * so that bit can't be used for the image. Luckily the Image | 37 | module_call_init(MODULE_INIT_QOM); |
62 | + * format specification is that the image requests only an offset | ||
63 | + * from a 2MB boundary, not an absolute load address. So if the | ||
64 | + * image requests an offset that might mean it overlaps with the | ||
65 | + * bootloader, we can just load it starting at 2MB+offset rather | ||
66 | + * than 0MB + offset. | ||
67 | + */ | ||
68 | + if (kernel_load_offset < BOOTLOADER_MAX_SIZE) { | ||
69 | + kernel_load_offset += 2 * MiB; | ||
70 | + } | ||
71 | } | ||
72 | } | ||
73 | 38 | ||
74 | -- | 39 | -- |
75 | 2.19.1 | 40 | 2.20.1 |
76 | 41 | ||
77 | 42 | diff view generated by jsdifflib |
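The hw/arm/boot change above boils down to a small piece of address arithmetic, sketched standalone below (the constants follow the patch; the example offsets are invented). Because the arm64 Image header's text_offset is relative to any 2 MiB-aligned base, a requested offset that would collide with QEMU's bootloader at the bottom of RAM can simply be applied from the next 2 MiB boundary instead.

/* Load-offset adjustment to keep the kernel clear of the boot stub. */
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

#define KIB                  1024ull
#define MIB                  (1024ull * KIB)
#define BOOTLOADER_MAX_SIZE  (4 * KIB)

static uint64_t adjust_load_offset(uint64_t kernel_load_offset)
{
    if (kernel_load_offset < BOOTLOADER_MAX_SIZE) {
        kernel_load_offset += 2 * MIB;
    }
    return kernel_load_offset;
}

int main(void)
{
    /* A kernel asking for text_offset 0 is moved up to the 2 MiB boundary. */
    printf("0x0     -> 0x%" PRIx64 "\n", adjust_load_offset(0x0));
    /* An offset already clear of the bootloader is left alone. */
    printf("0x80000 -> 0x%" PRIx64 "\n", adjust_load_offset(0x80000));
    return 0;
}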