Series comparison

-[PULL 00/45] target-arm queue
+[PULL 00/24] target-arm queue
-The following changes since commit a97978bcc2d1f650c7d411428806e5b03082b8c7:
+The following changes since commit 5a67d7735d4162630769ef495cf813244fc850df:
-  Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210603' into staging (2021-06-03 10:00:35 +0100)
+  Merge remote-tracking branch 'remotes/berrange-gitlab/tags/tls-deps-pull-request' into staging (2021-07-02 08:22:39 +0100)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210603
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210702
-for you to fetch changes up to 1c861885894d840235954060050d240259f5340b:
+for you to fetch changes up to 04ea4d3cfd0a21b248ece8eb7a9436a3d9898dd8:
-  tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed (2021-06-03 16:43:27 +0100)
+  target/arm: Implement MVE shifts by register (2021-07-02 11:48:38 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * Some not-yet-enabled preliminaries for M-profile MVE support
+ * more MVE instructions
- * Consistently use "Cortex-Axx", not "Cortex Axx" in docs, comments
+ * hw/gpio/gpio_pwr: use shutdown function for reboot
- * docs: Fix installation of man pages with Sphinx 4.x
+ * target/arm: Check NaN mode before silencing NaN
- * Mark LDS{MIN,MAX} as signed operations
+ * tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
- * Fix missing syndrome value for DAIF and PAC check exceptions
+ * hw/arm: Add basic power management to raspi.
- * Implement BFloat16 extensions
+ * docs/system/arm: Add quanta-gbs-bmc, quanta-q7l1-bmc
  * Refactoring of hvf accelerator code in preparation for aarch64 support
  * Fix some coverity nits in test code
 ----------------------------------------------------------------
-Alexander Graf (12):
+Joe Komlodi (1):
-      hvf: Move assert_hvf_ok() into common directory
+      target/arm: Check NaN mode before silencing NaN
       hvf: Move vcpu thread functions into common directory
       hvf: Move cpu functions into common directory
       hvf: Move hvf internal definitions into common header
       hvf: Make hvf_set_phys_mem() static
       hvf: Remove use of hv_uvaddr_t and hv_gpaddr_t
       hvf: Split out common code on vcpu init and destroy
       hvf: Use cpu_synchronize_state()
       hvf: Make synchronize functions static
       hvf: Remove hvf-accel-ops.h
       hvf: Introduce hvf vcpu struct
       hvf: Simplify post reset/init/loadvm hooks
-Damien Goutte-Gattat (1):
+Maxim Uvarov (1):
-      docs: Fix installation of man pages with Sphinx 4.x
+      hw/gpio/gpio_pwr: use shutdown function for reboot
-Jamie Iles (4):
+Nolan Leake (1):
-      target/arm: fix missing exception class
+      hw/arm: Add basic power management to raspi.
       target/arm: fold do_raise_exception into raise_exception
       target/arm: use raise_exception_ra for MTE check failure
       target/arm: use raise_exception_ra for stack limit exception
-Peter Maydell (15):
+Patrick Venture (2):
-      target/arm: Add isar feature check functions for MVE
+      docs/system/arm: Add quanta-q7l1-bmc reference
-      target/arm: Update feature checks for insns which are "MVE or FP"
+      docs/system/arm: Add quanta-gbs-bmc reference
       target/arm: Move fpsp/fpdp isar check into callers of do_vfp_2op_sp/dp
       target/arm: Add MVE check to VMOV_reg_sp and VMOV_reg_dp
       target/arm: Fix return values in fp_sysreg_checks()
       target/arm: Implement M-profile VPR register
       target/arm: Make FPSCR.LTPSIZE writable for MVE
       target/arm: Allow board models to specify initial NS VTOR
       arm: Consistently use "Cortex-Axx", not "Cortex Axx"
       tests/qtest/bios-tables-test: Check for dup2() failure
       tests/qtest/e1000e-test: Check qemu_recv() succeeded
       tests/qtest/hd-geo-test: Fix checks on mkstemp() return value
       tests/qtest/pflash-cfi02-test: Avoid potential integer overflow
       tests/qtest/tpm-tests: Remove unnecessary NULL checks
       tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed
-Richard Henderson (13):
+Peter Maydell (18):
-      target/arm: Mark LDS{MIN,MAX} as signed operations
+      target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation
-      target/arm: Add isar_feature_{aa32, aa64, aa64_sve}_bf16
+      target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
-      target/arm: Unify unallocated path in disas_fp_1src
+      target/arm: Make asimd_imm_const() public
-      target/arm: Implement scalar float32 to bfloat16 conversion
+      target/arm: Use asimd_imm_const for A64 decode
-      target/arm: Implement vector float32 to bfloat16 conversion
+      target/arm: Use dup_const() instead of bitfield_replicate()
-      softfpu: Add float_round_to_odd_inf
+      target/arm: Implement MVE logical immediate insns
-      target/arm: Implement bfloat16 dot product (vector)
+      target/arm: Implement MVE vector shift left by immediate insns
-      target/arm: Implement bfloat16 dot product (indexed)
+      target/arm: Implement MVE vector shift right by immediate insns
-      target/arm: Implement bfloat16 matrix multiply accumulate
+      target/arm: Implement MVE VSHLL
-      target/arm: Implement bfloat widening fma (vector)
+      target/arm: Implement MVE VSRI, VSLI
-      target/arm: Implement bfloat widening fma (indexed)
+      target/arm: Implement MVE VSHRN, VRSHRN
-      linux-user/aarch64: Enable hwcap bits for bfloat16
+      target/arm: Implement MVE saturating narrowing shifts
-      target/arm: Enable BFloat16 extensions
+      target/arm: Implement MVE VSHLC
       target/arm: Implement MVE VADDLV
       target/arm: Implement MVE long shifts by immediate
       target/arm: Implement MVE long shifts by register
       target/arm: Implement MVE shifts by immediate
       target/arm: Implement MVE shifts by register
- docs/conf.py                    |   1 +
+Philippe Mathieu-Daudé (1):
- docs/system/arm/aspeed.rst      |   4 +-
+      tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
  docs/system/arm/nuvoton.rst     |   6 +-
  docs/system/arm/sabrelite.rst   |   2 +-
  include/fpu/softfloat-types.h   |   4 +-
  include/hw/arm/allwinner-h3.h   |   2 +-
  include/hw/arm/armv7m.h         |   2 +
  include/hw/core/cpu.h           |   3 +-
  include/sysemu/hvf_int.h        |  58 +++++
  target/arm/cpu.h                |  48 +++-
  target/arm/helper-sve.h         |   4 +
  target/arm/helper.h             |  15 ++
  target/i386/hvf/hvf-accel-ops.h |  23 --
  target/i386/hvf/hvf-i386.h      |  33 +--
  target/i386/hvf/vmx.h           |  24 +-
  target/i386/hvf/x86hvf.h        |   2 -
  target/arm/neon-dp.decode       |   1 +
  target/arm/neon-shared.decode   |  11 +
  target/arm/sve.decode           |  19 +-
  target/arm/vfp.decode           |   2 +
  accel/hvf/hvf-accel-ops.c       | 471 ++++++++++++++++++++++++++++++++++++++++
  accel/hvf/hvf-all.c             |  47 ++++
  hw/arm/armv7m.c                 |   7 +
  hw/arm/aspeed.c                 |   6 +-
  hw/arm/mcimx6ul-evk.c           |   2 +-
  hw/arm/mcimx7d-sabre.c          |   2 +-
  hw/arm/npcm7xx_boards.c         |   4 +-
  hw/arm/sabrelite.c              |   2 +-
  hw/misc/npcm7xx_clk.c           |   2 +-
  linux-user/elfload.c            |   2 +
  target/arm/cpu.c                |  13 ++
  target/arm/cpu64.c              |   3 +
  target/arm/cpu_tcg.c            |   1 +
  target/arm/m_helper.c           |   5 +-
  target/arm/machine.c            |  20 ++
  target/arm/mte_helper.c         |  12 +-
  target/arm/op_helper.c          |  32 ++-
  target/arm/sve_helper.c         |   2 +
  target/arm/translate-a64.c      | 155 +++++++++++--
  target/arm/translate-neon.c     |  91 ++++++++
  target/arm/translate-sve.c      | 112 ++++++++++
  target/arm/translate-vfp.c      | 164 ++++++++++----
  target/arm/vec_helper.c         | 140 +++++++++++-
  target/arm/vfp_helper.c         |  21 +-
  target/i386/hvf/hvf-accel-ops.c | 146 -------------
  target/i386/hvf/hvf.c           | 464 +++++----------------------------------
  target/i386/hvf/x86.c           |  28 +--
  target/i386/hvf/x86_descr.c     |  26 +--
  target/i386/hvf/x86_emu.c       |  62 +++---
  target/i386/hvf/x86_mmu.c       |   4 +-
  target/i386/hvf/x86_task.c      |  12 +-
  target/i386/hvf/x86hvf.c        | 222 +++++++++----------
  tests/qtest/bios-tables-test.c  |   8 +-
  tests/qtest/e1000e-test.c       |   3 +-
  tests/qtest/hd-geo-test.c       |   4 +-
  tests/qtest/pflash-cfi02-test.c |   2 +-
  tests/qtest/tpm-tests.c         |  12 +-
  tests/unit/test-vmstate.c       |   5 +-
  fpu/softfloat-parts.c.inc       |   6 +-
  MAINTAINERS                     |   8 +
  accel/hvf/meson.build           |   7 +
  accel/meson.build               |   1 +
  target/i386/hvf/meson.build     |   1 -
 files changed, 1666 insertions(+), 935 deletions(-)
  create mode 100644 include/sysemu/hvf_int.h
  delete mode 100644 target/i386/hvf/hvf-accel-ops.h
  create mode 100644 accel/hvf/hvf-accel-ops.c
  create mode 100644 accel/hvf/hvf-all.c
  delete mode 100644 target/i386/hvf/hvf-accel-ops.c
  create mode 100644 accel/hvf/meson.build
+ docs/system/arm/aspeed.rst             |   1 +
+ docs/system/arm/nuvoton.rst            |   5 +-
+ include/hw/arm/bcm2835_peripherals.h   |   3 +-
+ include/hw/misc/bcm2835_powermgt.h     |  29 ++
+ target/arm/helper-mve.h                | 108 +++++++
+ target/arm/translate.h                 |  41 +++
+ target/arm/mve.decode                  | 177 ++++++++++-
+ target/arm/t32.decode                  |  71 ++++-
+ hw/arm/bcm2835_peripherals.c           |  13 +-
+ hw/gpio/gpio_pwr.c                     |   2 +-
+ hw/misc/bcm2835_powermgt.c             | 160 ++++++++++
+ target/arm/helper-a64.c                |  12 +-
+ target/arm/mve_helper.c                | 524 +++++++++++++++++++++++++++++++--
+ target/arm/translate-a64.c             |  86 +-----
+ target/arm/translate-mve.c             | 261 +++++++++++++++-
+ target/arm/translate-neon.c            |  81 -----
+ target/arm/translate.c                 | 327 +++++++++++++++++++-
+ target/arm/vfp_helper.c                |  24 +-
+ hw/misc/meson.build                    |   1 +
+ tests/acceptance/boot_linux_console.py |  43 +++
+files changed, 1760 insertions(+), 209 deletions(-)
+ create mode 100644 include/hw/misc/bcm2835_powermgt.h
+ create mode 100644 hw/misc/bcm2835_powermgt.c

-[PULL 10/45] docs: Fix installation of man pages with Sphinx 4.x
+[PULL 01/24] docs/system/arm: Add quanta-q7l1-bmc reference
-From: Damien Goutte-Gattat <dgouttegattat@incenp.org>
+From: Patrick Venture <venture@google.com>
-The 4.x branch of Sphinx introduces a breaking change, as generated man
+Adds a line-item reference to the supported quanta-q71l-bmc aspeed
-pages are now written to subdirectories corresponding to the manual
+entry.
 section they belong to. This results in `make install` erroring out when
 attempting to install the man pages, because they are not where it
 expects to find them.
-This patch restores the behavior of Sphinx 3.x regarding man pages.
+Signed-off-by: Patrick Venture <venture@google.com>
+Reviewed-by: Cédric Le Goater <clg@kaod.org>
-Resolves: https://gitlab.com/qemu-project/qemu/-/issues/256
+Message-id: 20210615192848.1065297-2-venture@google.com
 Signed-off-by: Damien Goutte-Gattat <dgouttegattat@incenp.org>
 Message-id: 20210503161422.15028-1-dgouttegattat@incenp.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- docs/conf.py | 1 +
+ docs/system/arm/aspeed.rst | 1 +
 file changed, 1 insertion(+)
-diff --git a/docs/conf.py b/docs/conf.py
+diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
 index XXXXXXX..XXXXXXX 100644
---- a/docs/conf.py
+--- a/docs/system/arm/aspeed.rst
-+++ b/docs/conf.py
++++ b/docs/system/arm/aspeed.rst
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ etc.
-      ['Stefan Hajnoczi <stefanha@redhat.com>',
+ AST2400 SoC based machines :
-       'Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>'], 1),
- ]
+ - ``palmetto-bmc``         OpenPOWER Palmetto POWER8 BMC
-+man_make_section_directory = False
++- ``quanta-q71l-bmc``      OpenBMC Quanta BMC
- # -- Options for Texinfo output -------------------------------------------
+ AST2500 SoC based machines :
 --
 .20.1

-[PULL 09/45] arm: Consistently use "Cortex-Axx", not "Cortex Axx"
+[PULL 02/24] docs/system/arm: Add quanta-gbs-bmc reference
-The official punctuation for Arm CPU names uses a hyphen, like
+From: Patrick Venture <venture@google.com>
 "Cortex-A9". We mostly follow this, but in a few places usage
 without the hyphen has crept in. Fix those so we consistently
 use the same way of writing the CPU name.
-This commit was created with:
+Add line item reference to quanta-gbs-bmc machine.
   git grep -z -l 'Cortex ' | xargs -0 sed -i 's/Cortex /Cortex-/'
+Signed-off-by: Patrick Venture <venture@google.com>
+Reviewed-by: Cédric Le Goater <clg@kaod.org>
+Message-id: 20210615192848.1065297-3-venture@google.com
+[PMM: fixed underline Sphinx warning]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Message-id: 20210527095152.10968-1-peter.maydell@linaro.org
 ---
- docs/system/arm/aspeed.rst    | 4 ++--
+ docs/system/arm/nuvoton.rst | 5 +++--
- docs/system/arm/nuvoton.rst   | 6 +++---
+file changed, 3 insertions(+), 2 deletions(-)
  docs/system/arm/sabrelite.rst | 2 +-
  include/hw/arm/allwinner-h3.h | 2 +-
  hw/arm/aspeed.c               | 6 +++---
  hw/arm/mcimx6ul-evk.c         | 2 +-
  hw/arm/mcimx7d-sabre.c        | 2 +-
  hw/arm/npcm7xx_boards.c       | 4 ++--
  hw/arm/sabrelite.c            | 2 +-
  hw/misc/npcm7xx_clk.c         | 2 +-
 files changed, 16 insertions(+), 16 deletions(-)
-diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
-index XXXXXXX..XXXXXXX 100644
---- a/docs/system/arm/aspeed.rst
-+++ b/docs/system/arm/aspeed.rst
-@@ -XXX,XX +XXX,XX @@ The QEMU Aspeed machines model BMCs of various OpenPOWER systems and
- Aspeed evaluation boards. They are based on different releases of the
- Aspeed SoC : the AST2400 integrating an ARM926EJ-S CPU (400MHz), the
- AST2500 with an ARM1176JZS CPU (800MHz) and more recently the AST2600
--with dual cores ARM Cortex A7 CPUs (1.2GHz).
-+with dual cores ARM Cortex-A7 CPUs (1.2GHz).
- The SoC comes with RAM, Gigabit ethernet, USB, SD/MMC, USB, SPI, I2C,
- etc.
-@@ -XXX,XX +XXX,XX @@ AST2500 SoC based machines :
- AST2600 SoC based machines :
--- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex A7)
-+- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex-A7)
- - ``tacoma-bmc``           OpenPOWER Witherspoon POWER9 AST2600 BMC
- Supported devices
 diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/system/arm/nuvoton.rst
 +++ b/docs/system/arm/nuvoton.rst
-@@ -XXX,XX +XXX,XX @@ Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
+@@ -XXX,XX +XXX,XX @@
 -Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
 -=====================================================
 +Nuvoton iBMC boards (``*-bmc``, ``npcm750-evb``, ``quanta-gsj``)
 +================================================================
  The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
  designed to be used as Baseboard Management Controllers (BMCs) in various
--servers. They all feature one or two ARM Cortex A9 CPU cores, as well as an
+@@ -XXX,XX +XXX,XX @@ segment. The following machines are based on this chip :
-+servers. They all feature one or two ARM Cortex-A9 CPU cores, as well as an
+ The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
  assortment of peripherals targeted for either Enterprise or Data Center /
  Hyperscale applications. The former is a superset of the latter, so NPCM750 has
  all the peripherals of NPCM730 and more.
  .. _Nuvoton iBMC: https://www.nuvoton.com/products/cloud-computing/ibmc/
 -The NPCM750 SoC has two Cortex A9 cores and is targeted for the Enterprise
 +The NPCM750 SoC has two Cortex-A9 cores and is targeted for the Enterprise
  segment. The following machines are based on this chip :
  - ``npcm750-evb``       Nuvoton NPCM750 Evaluation board
 -The NPCM730 SoC has two Cortex A9 cores and is targeted for Data Center and
 +The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
  Hyperscale applications. The following machines are based on this chip :
++- ``quanta-gbs-bmc``    Quanta GBS server BMC
  - ``quanta-gsj``        Quanta GSJ server BMC
-diff --git a/docs/system/arm/sabrelite.rst b/docs/system/arm/sabrelite.rst
-index XXXXXXX..XXXXXXX 100644
+ There are also two more SoCs, NPCM710 and NPCM705, which are single-core
 --- a/docs/system/arm/sabrelite.rst
 +++ b/docs/system/arm/sabrelite.rst
@@ -XXX,XX +XXX,XX @@ Supported devices
  The SABRE Lite machine supports the following devices:
 - * Up to 4 Cortex A9 cores
 + * Up to 4 Cortex-A9 cores
   * Generic Interrupt Controller
   * 1 Clock Controller Module
   * 1 System Reset Controller
 diff --git a/include/hw/arm/allwinner-h3.h b/include/hw/arm/allwinner-h3.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/arm/allwinner-h3.h
 +++ b/include/hw/arm/allwinner-h3.h
@@ -XXX,XX +XXX,XX @@
   */
  /*
 - * The Allwinner H3 is a System on Chip containing four ARM Cortex A7
 + * The Allwinner H3 is a System on Chip containing four ARM Cortex-A7
   * processor cores. Features and specifications include DDR2/DDR3 memory,
   * SD/MMC storage cards, 10/100/1000Mbit Ethernet, USB 2.0, HDMI and
   * various I/O modules.
 diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/aspeed.c
 +++ b/hw/arm/aspeed.c
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_ast2600_evb_class_init(ObjectClass *oc, void *data)
      MachineClass *mc = MACHINE_CLASS(oc);
      AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 -    mc->desc       = "Aspeed AST2600 EVB (Cortex A7)";
 +    mc->desc       = "Aspeed AST2600 EVB (Cortex-A7)";
      amc->soc_name  = "ast2600-a1";
      amc->hw_strap1 = AST2600_EVB_HW_STRAP1;
      amc->hw_strap2 = AST2600_EVB_HW_STRAP2;
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_tacoma_class_init(ObjectClass *oc, void *data)
      MachineClass *mc = MACHINE_CLASS(oc);
      AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 -    mc->desc       = "OpenPOWER Tacoma BMC (Cortex A7)";
 +    mc->desc       = "OpenPOWER Tacoma BMC (Cortex-A7)";
      amc->soc_name  = "ast2600-a1";
      amc->hw_strap1 = TACOMA_BMC_HW_STRAP1;
      amc->hw_strap2 = TACOMA_BMC_HW_STRAP2;
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_rainier_class_init(ObjectClass *oc, void *data)
      MachineClass *mc = MACHINE_CLASS(oc);
      AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 -    mc->desc       = "IBM Rainier BMC (Cortex A7)";
 +    mc->desc       = "IBM Rainier BMC (Cortex-A7)";
      amc->soc_name  = "ast2600-a1";
      amc->hw_strap1 = RAINIER_BMC_HW_STRAP1;
      amc->hw_strap2 = RAINIER_BMC_HW_STRAP2;
 diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/mcimx6ul-evk.c
 +++ b/hw/arm/mcimx6ul-evk.c
@@ -XXX,XX +XXX,XX @@ static void mcimx6ul_evk_init(MachineState *machine)
  static void mcimx6ul_evk_machine_init(MachineClass *mc)
  {
 -    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex A7)";
 +    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex-A7)";
      mc->init = mcimx6ul_evk_init;
      mc->max_cpus = FSL_IMX6UL_NUM_CPUS;
      mc->default_ram_id = "mcimx6ul-evk.ram";
 diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/mcimx7d-sabre.c
 +++ b/hw/arm/mcimx7d-sabre.c
@@ -XXX,XX +XXX,XX @@ static void mcimx7d_sabre_init(MachineState *machine)
  static void mcimx7d_sabre_machine_init(MachineClass *mc)
  {
 -    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex A7)";
 +    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex-A7)";
      mc->init = mcimx7d_sabre_init;
      mc->max_cpus = FSL_IMX7_NUM_CPUS;
      mc->default_ram_id = "mcimx7d-sabre.ram";
 diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/npcm7xx_boards.c
 +++ b/hw/arm/npcm7xx_boards.c
@@ -XXX,XX +XXX,XX @@ static void npcm750_evb_machine_class_init(ObjectClass *oc, void *data)
      npcm7xx_set_soc_type(nmc, TYPE_NPCM750);
 -    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex A9)";
 +    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex-A9)";
      mc->init = npcm750_evb_init;
      mc->default_ram_size = 512 * MiB;
  };
@@ -XXX,XX +XXX,XX @@ static void gsj_machine_class_init(ObjectClass *oc, void *data)
      npcm7xx_set_soc_type(nmc, TYPE_NPCM730);
 -    mc->desc = "Quanta GSJ (Cortex A9)";
 +    mc->desc = "Quanta GSJ (Cortex-A9)";
      mc->init = quanta_gsj_init;
      mc->default_ram_size = 512 * MiB;
  };
 diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/sabrelite.c
 +++ b/hw/arm/sabrelite.c
@@ -XXX,XX +XXX,XX @@ static void sabrelite_init(MachineState *machine)
  static void sabrelite_machine_init(MachineClass *mc)
  {
 -    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex A9)";
 +    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex-A9)";
      mc->init = sabrelite_init;
      mc->max_cpus = FSL_IMX6_NUM_CPUS;
      mc->ignore_memory_transaction_failures = true;
 diff --git a/hw/misc/npcm7xx_clk.c b/hw/misc/npcm7xx_clk.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/npcm7xx_clk.c
 +++ b/hw/misc/npcm7xx_clk.c
@@ -XXX,XX +XXX,XX @@
  #define NPCM7XX_CLOCK_REF_HZ            (25000000)
  /* Register Field Definitions */
 -#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex A9 Cores */
 +#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex-A9 Cores */
  #define PLLCON_LOKI     BIT(31)
  #define PLLCON_LOKS     BIT(30)
 --
 .20.1

-[PULL 28/45] hvf: Move assert_hvf_ok() into common directory
+[PULL 03/24] hw/arm: Add basic power management to raspi.
-From: Alexander Graf <agraf@csgraf.de>
+From: Nolan Leake <nolan@sigbus.net>
-Until now, Hypervisor.framework has only been available on x86_64 systems.
+This is just enough to make reboot and poweroff work. Works for
-With Apple Silicon shipping now, it extends its reach to aarch64. To
+Linux, U-Boot, and Arm Trusted Firmware. Not tested, but should
-prepare for support for multiple architectures, let's start moving common
+work for Plan 9 and bare-metal/hobby OSes, since they seem to generally
-code out into its own accel directory.
+do what Linux does for reset.
-This patch moves assert_hvf_ok() and introduces generic build infrastructure.
+The watchdog timer functionality is not yet implemented.
-Signed-off-by: Alexander Graf <agraf@csgraf.de>
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/64
-Reviewed-by: Sergio Lopez <slp@redhat.com>
+Signed-off-by: Nolan Leake <nolan@sigbus.net>
-Message-id: 20210519202253.76782-2-agraf@csgraf.de
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20210625210209.1870217-1-nolan@sigbus.net
 [PMM: tweaked commit title; fixed region size to 0x200;
  moved header file to include/]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
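Note: the reboot/poweroff handshake this patch models can be sketched outside QEMU roughly as follows. This is a minimal standalone sketch, not the device code itself. The register offsets and constants are copied from the patch; `struct pm_model`, `enum pm_action`, and `pm_write()` are hypothetical names used only for illustration, standing in for the device state and for the QEMU shutdown and reset requests. The WDOG register and all logging are left out.

```c
#include <stdint.h>

/* Constants as defined in hw/misc/bcm2835_powermgt.c in this patch. */
#define PASSWORD        0x5a000000u
#define PASSWORD_MASK   0xff000000u
#define R_RSTC          0x1c
#define V_RSTC_RESET    0x20
#define R_RSTS          0x20
#define V_RSTS_POWEROFF 0x555

/* Hypothetical result type standing in for the QEMU runstate requests. */
enum pm_action { PM_NONE, PM_RESET, PM_POWEROFF, PM_REJECTED };

/* Hypothetical reduced device state: only the two registers used here. */
struct pm_model {
    uint32_t rstc;
    uint32_t rsts;
};

/* Model one 32-bit guest write and report what the device would request. */
static enum pm_action pm_write(struct pm_model *s, uint32_t offset,
                               uint32_t value)
{
    /* Writes whose top byte is not the password are ignored. */
    if ((value & PASSWORD_MASK) != PASSWORD) {
        return PM_REJECTED;
    }
    value &= ~PASSWORD_MASK;

    switch (offset) {
    case R_RSTS:
        s->rsts = value;
        return PM_NONE;
    case R_RSTC:
        s->rstc = value;
        if (value & V_RSTC_RESET) {
            /* RSTS chooses between poweroff and a plain reset. */
            return (s->rsts & 0xfff) == V_RSTS_POWEROFF ? PM_POWEROFF
                                                        : PM_RESET;
        }
        return PM_NONE;
    default:
        return PM_NONE;
    }
}
```

In this model a guest powers off by writing `PASSWORD | V_RSTS_POWEROFF` to `R_RSTS` and then `PASSWORD | V_RSTC_RESET` to `R_RSTC`; the same `R_RSTC` write with any other value in the low 12 bits of RSTS is treated as a reboot.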
- include/sysemu/hvf_int.h | 18 +++++++++++++++
+ include/hw/arm/bcm2835_peripherals.h |   3 +-
- accel/hvf/hvf-all.c      | 47 ++++++++++++++++++++++++++++++++++++++++
+ include/hw/misc/bcm2835_powermgt.h   |  29 +++++
- target/i386/hvf/hvf.c    | 33 +---------------------------
+ hw/arm/bcm2835_peripherals.c         |  13 ++-
- MAINTAINERS              |  8 +++++++
+ hw/misc/bcm2835_powermgt.c           | 160 +++++++++++++++++++++++++++
- accel/hvf/meson.build    |  6 +++++
+ hw/misc/meson.build                  |   1 +
- accel/meson.build        |  1 +
+files changed, 204 insertions(+), 2 deletions(-)
-files changed, 81 insertions(+), 32 deletions(-)
+ create mode 100644 include/hw/misc/bcm2835_powermgt.h
- create mode 100644 include/sysemu/hvf_int.h
+ create mode 100644 hw/misc/bcm2835_powermgt.c
- create mode 100644 accel/hvf/hvf-all.c
- create mode 100644 accel/hvf/meson.build
+diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h
+index XXXXXXX..XXXXXXX 100644
-diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
+--- a/include/hw/arm/bcm2835_peripherals.h
 +++ b/include/hw/arm/bcm2835_peripherals.h
@@ -XXX,XX +XXX,XX @@
  #include "hw/misc/bcm2835_mphi.h"
  #include "hw/misc/bcm2835_thermal.h"
  #include "hw/misc/bcm2835_cprman.h"
 +#include "hw/misc/bcm2835_powermgt.h"
  #include "hw/sd/sdhci.h"
  #include "hw/sd/bcm2835_sdhost.h"
  #include "hw/gpio/bcm2835_gpio.h"
@@ -XXX,XX +XXX,XX @@ struct BCM2835PeripheralState {
      BCM2835MphiState mphi;
      UnimplementedDeviceState txp;
      UnimplementedDeviceState armtmr;
 -    UnimplementedDeviceState powermgt;
 +    BCM2835PowerMgtState powermgt;
      BCM2835CprmanState cprman;
      PL011State uart0;
      BCM2835AuxState aux;
 diff --git a/include/hw/misc/bcm2835_powermgt.h b/include/hw/misc/bcm2835_powermgt.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/include/sysemu/hvf_int.h
++++ b/include/hw/misc/bcm2835_powermgt.h
 @@ -XXX,XX +XXX,XX @@
 +/*
-+ * QEMU Hypervisor.framework (HVF) support
++ * BCM2835 Power Management emulation
 + *
 + * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
 + * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
-+ *
 + */
 +
-+/* header to be included in HVF-specific code */
++#ifndef BCM2835_POWERMGT_H
-+
++#define BCM2835_POWERMGT_H
-+#ifndef HVF_INT_H
++
-+#define HVF_INT_H
++#include "hw/sysbus.h"
-+
++#include "qom/object.h"
-+#include <Hypervisor/hv.h>
++
-+
++#define TYPE_BCM2835_POWERMGT "bcm2835-powermgt"
-+void assert_hvf_ok(hv_return_t ret);
++OBJECT_DECLARE_SIMPLE_TYPE(BCM2835PowerMgtState, BCM2835_POWERMGT)
 +
 +struct BCM2835PowerMgtState {
 +    SysBusDevice busdev;
 +    MemoryRegion iomem;
 +
 +    uint32_t rstc;
 +    uint32_t rsts;
 +    uint32_t wdog;
 +};
 +
 +#endif
-diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
+diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/bcm2835_peripherals.c
 +++ b/hw/arm/bcm2835_peripherals.c
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_init(Object *obj)
      object_property_add_const_link(OBJECT(&s->dwc2), "dma-mr",
                                     OBJECT(&s->gpu_bus_mr));
 +
 +    /* Power Management */
 +    object_initialize_child(obj, "powermgt", &s->powermgt,
 +                            TYPE_BCM2835_POWERMGT);
  }
  static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
          qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
                                 INTERRUPT_USB));
 +    /* Power Management */
 +    if (!sysbus_realize(SYS_BUS_DEVICE(&s->powermgt), errp)) {
 +        return;
 +    }
 +
 +    memory_region_add_subregion(&s->peri_mr, PM_OFFSET,
 +                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->powermgt), 0));
 +
      create_unimp(s, &s->txp, "bcm2835-txp", TXP_OFFSET, 0x1000);
      create_unimp(s, &s->armtmr, "bcm2835-sp804", ARMCTRL_TIMER0_1_OFFSET, 0x40);
 -    create_unimp(s, &s->powermgt, "bcm2835-powermgt", PM_OFFSET, 0x114);
      create_unimp(s, &s->i2s, "bcm2835-i2s", I2S_OFFSET, 0x100);
      create_unimp(s, &s->smi, "bcm2835-smi", SMI_OFFSET, 0x100);
      create_unimp(s, &s->spi[0], "bcm2835-spi0", SPI0_OFFSET, 0x20);
 diff --git a/hw/misc/bcm2835_powermgt.c b/hw/misc/bcm2835_powermgt.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/accel/hvf/hvf-all.c
++++ b/hw/misc/bcm2835_powermgt.c
 @@ -XXX,XX +XXX,XX @@
 +/*
-+ * QEMU Hypervisor.framework support
++ * BCM2835 Power Management emulation
 + *
-+ * This work is licensed under the terms of the GNU GPL, version 2.  See
++ * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
-+ * the COPYING file in the top-level directory.
++ * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
 + *
-+ * Contributions after 2012-01-13 are licensed under the terms of the
++ * This work is licensed under the terms of the GNU GPL, version 2 or later.
-+ * GNU GPL, version 2 or (at your option) any later version.
++ * See the COPYING file in the top-level directory.
 + */
 +
 +#include "qemu/osdep.h"
-+#include "qemu-common.h"
++#include "qemu/log.h"
-+#include "qemu/error-report.h"
++#include "qemu/module.h"
-+#include "sysemu/hvf.h"
++#include "hw/misc/bcm2835_powermgt.h"
-+#include "sysemu/hvf_int.h"
++#include "migration/vmstate.h"
-+
++#include "sysemu/runstate.h"
-+void assert_hvf_ok(hv_return_t ret)
++
-+{
++#define PASSWORD 0x5a000000
-+    if (ret == HV_SUCCESS) {
++#define PASSWORD_MASK 0xff000000
 +
 +#define R_RSTC 0x1c
 +#define V_RSTC_RESET 0x20
 +#define R_RSTS 0x20
 +#define V_RSTS_POWEROFF 0x555 /* Linux uses partition 63 to indicate halt. */
 +#define R_WDOG 0x24
 +
 +static uint64_t bcm2835_powermgt_read(void *opaque, hwaddr offset,
 +                                      unsigned size)
 +{
 +    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
 +    uint32_t res = 0;
 +
 +    switch (offset) {
 +    case R_RSTC:
 +        res = s->rstc;
 +        break;
 +    case R_RSTS:
 +        res = s->rsts;
 +        break;
 +    case R_WDOG:
 +        res = s->wdog;
 +        break;
 +
 +    default:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "bcm2835_powermgt_read: Unknown offset 0x%08"HWADDR_PRIx
 +                      "\n", offset);
 +        res = 0;
 +        break;
 +    }
 +
 +    return res;
 +}
 +
 +static void bcm2835_powermgt_write(void *opaque, hwaddr offset,
 +                                   uint64_t value, unsigned size)
 +{
 +    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
 +
 +    if ((value & PASSWORD_MASK) != PASSWORD) {
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "bcm2835_powermgt_write: Bad password 0x%"PRIx64
 +                      " at offset 0x%08"HWADDR_PRIx"\n",
 +                      value, offset);
 +        return;
 +    }
 +
-+    switch (ret) {
++    value = value & ~PASSWORD_MASK;
-+    case HV_ERROR:
++
-+        error_report("Error: HV_ERROR");
++    switch (offset) {
-+        break;
++    case R_RSTC:
-+    case HV_BUSY:
++        s->rstc = value;
-+        error_report("Error: HV_BUSY");
++        if (value & V_RSTC_RESET) {
-+        break;
++            if ((s->rsts & 0xfff) == V_RSTS_POWEROFF) {
-+    case HV_BAD_ARGUMENT:
++                qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
-+        error_report("Error: HV_BAD_ARGUMENT");
++            } else {
-+        break;
++                qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
-+    case HV_NO_RESOURCES:
++            }
-+        error_report("Error: HV_NO_RESOURCES");
++        }
 +        break;
-+    case HV_NO_DEVICE:
++    case R_RSTS:
-+        error_report("Error: HV_NO_DEVICE");
++        qemu_log_mask(LOG_UNIMP,
-+        break;
++                      "bcm2835_powermgt_write: RSTS\n");
-+    case HV_UNSUPPORTED:
++        s->rsts = value;
-+        error_report("Error: HV_UNSUPPORTED");
++        break;
-+        break;
++    case R_WDOG:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "bcm2835_powermgt_write: WDOG\n");
 +        s->wdog = value;
 +        break;
 +
 +    default:
-+        error_report("Unknown Error");
++        qemu_log_mask(LOG_UNIMP,
-+    }
++                      "bcm2835_powermgt_write: Unknown offset 0x%08"HWADDR_PRIx
-+
++                      "\n", offset);
-+    abort();
++        break;
-+}
++    }
-diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
++}
 +
 +static const MemoryRegionOps bcm2835_powermgt_ops = {
 +    .read = bcm2835_powermgt_read,
 +    .write = bcm2835_powermgt_write,
 +    .endianness = DEVICE_NATIVE_ENDIAN,
 +    .impl.min_access_size = 4,
 +    .impl.max_access_size = 4,
 +};
 +
 +static const VMStateDescription vmstate_bcm2835_powermgt = {
 +    .name = TYPE_BCM2835_POWERMGT,
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32(rstc, BCM2835PowerMgtState),
 +        VMSTATE_UINT32(rsts, BCM2835PowerMgtState),
 +        VMSTATE_UINT32(wdog, BCM2835PowerMgtState),
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
 +
 +static void bcm2835_powermgt_init(Object *obj)
 +{
 +    BCM2835PowerMgtState *s = BCM2835_POWERMGT(obj);
 +
 +    memory_region_init_io(&s->iomem, obj, &bcm2835_powermgt_ops, s,
 +                          TYPE_BCM2835_POWERMGT, 0x200);
 +    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem);
 +}
 +
 +static void bcm2835_powermgt_reset(DeviceState *dev)
 +{
 +    BCM2835PowerMgtState *s = BCM2835_POWERMGT(dev);
 +
 +    /* https://elinux.org/BCM2835_registers#PM */
 +    s->rstc = 0x00000102;
 +    s->rsts = 0x00001000;
 +    s->wdog = 0x00000000;
 +}
 +
 +static void bcm2835_powermgt_class_init(ObjectClass *klass, void *data)
 +{
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +
 +    dc->reset = bcm2835_powermgt_reset;
 +    dc->vmsd = &vmstate_bcm2835_powermgt;
 +}
 +
 +static TypeInfo bcm2835_powermgt_info = {
 +    .name          = TYPE_BCM2835_POWERMGT,
 +    .parent        = TYPE_SYS_BUS_DEVICE,
 +    .instance_size = sizeof(BCM2835PowerMgtState),
 +    .class_init    = bcm2835_powermgt_class_init,
 +    .instance_init = bcm2835_powermgt_init,
 +};
 +
 +static void bcm2835_powermgt_register_types(void)
 +{
 +    type_register_static(&bcm2835_powermgt_info);
 +}
 +
 +type_init(bcm2835_powermgt_register_types)
 diff --git a/hw/misc/meson.build b/hw/misc/meson.build
 index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/hvf.c
+--- a/hw/misc/meson.build
-+++ b/target/i386/hvf/hvf.c
++++ b/hw/misc/meson.build
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_RASPI', if_true: files(
- #include "qemu/error-report.h"
+   'bcm2835_rng.c',
+   'bcm2835_thermal.c',
- #include "sysemu/hvf.h"
+   'bcm2835_cprman.c',
-+#include "sysemu/hvf_int.h"
++  'bcm2835_powermgt.c',
- #include "sysemu/runstate.h"
+ ))
- #include "hvf-i386.h"
+ softmmu_ss.add(when: 'CONFIG_SLAVIO', if_true: files('slavio_misc.c'))
- #include "vmcs.h"
+ softmmu_ss.add(when: 'CONFIG_ZYNQ', if_true: files('zynq_slcr.c', 'zynq-xadc.c'))
@@ -XXX,XX +XXX,XX @@
  HVFState *hvf_state;
 -static void assert_hvf_ok(hv_return_t ret)
 -{
 -    if (ret == HV_SUCCESS) {
 -        return;
 -    }
 -
 -    switch (ret) {
 -    case HV_ERROR:
 -        error_report("Error: HV_ERROR");
 -        break;
 -    case HV_BUSY:
 -        error_report("Error: HV_BUSY");
 -        break;
 -    case HV_BAD_ARGUMENT:
 -        error_report("Error: HV_BAD_ARGUMENT");
 -        break;
 -    case HV_NO_RESOURCES:
 -        error_report("Error: HV_NO_RESOURCES");
 -        break;
 -    case HV_NO_DEVICE:
 -        error_report("Error: HV_NO_DEVICE");
 -        break;
 -    case HV_UNSUPPORTED:
 -        error_report("Error: HV_UNSUPPORTED");
 -        break;
 -    default:
 -        error_report("Unknown Error");
 -    }
 -
 -    abort();
 -}
 -
  /* Memory slots */
  hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
  {
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ M: Roman Bolshakov <r.bolshakov@yadro.com>
  W: https://wiki.qemu.org/Features/HVF
  S: Maintained
  F: target/i386/hvf/
 +
 +HVF
 +M: Cameron Esfahani <dirty@apple.com>
 +M: Roman Bolshakov <r.bolshakov@yadro.com>
 +W: https://wiki.qemu.org/Features/HVF
 +S: Maintained
 +F: accel/hvf/
  F: include/sysemu/hvf.h
 +F: include/sysemu/hvf_int.h
  WHPX CPUs
  M: Sunil Muthuswamy <sunilmut@microsoft.com>
 diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/accel/hvf/meson.build
@@ -XXX,XX +XXX,XX @@
 +hvf_ss = ss.source_set()
 +hvf_ss.add(files(
 +  'hvf-all.c',
 +))
 +
 +specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
 diff --git a/accel/meson.build b/accel/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/meson.build
 +++ b/accel/meson.build
@@ -XXX,XX +XXX,XX @@ specific_ss.add(files('accel-common.c'))
  softmmu_ss.add(files('accel-softmmu.c'))
  user_ss.add(files('accel-user.c'))
 +subdir('hvf')
  subdir('qtest')
  subdir('kvm')
  subdir('tcg')
 --
 .20.1
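
A minimal sketch of the decision that bcm2835_powermgt_write() makes for a write to RSTC, reduced to a pure function so it can be read in isolation. The four constant values are assumptions (the hunks above use the names but do not show their definitions), and PmAction and pm_classify_rstc_write() are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values: the hunks above reference these names without
 * showing their definitions.
 */
#define PASSWORD          0x5a000000u
#define PASSWORD_MASK     0xff000000u
#define V_RSTC_RESET      0x20u
#define V_RSTS_POWEROFF   0x555u

/* Hypothetical result type describing what a write would trigger. */
typedef enum {
    PM_IGNORED,     /* bad password: the write is dropped */
    PM_NONE,        /* value latched, no reset requested */
    PM_RESET,       /* guest asked for a reboot */
    PM_SHUTDOWN,    /* guest asked for a power-off */
} PmAction;

/* Classify a write to RSTC, given the current contents of RSTS. */
static PmAction pm_classify_rstc_write(uint32_t rsts, uint32_t value)
{
    /* Every write must carry the password in its top byte. */
    if ((value & PASSWORD_MASK) != PASSWORD) {
        return PM_IGNORED;
    }
    value &= ~PASSWORD_MASK;

    if (!(value & V_RSTC_RESET)) {
        return PM_NONE;
    }
    /* The low 12 bits of RSTS select between halt and reboot. */
    return (rsts & 0xfff) == V_RSTS_POWEROFF ? PM_SHUTDOWN : PM_RESET;
}
```

In the device model the two non-trivial outcomes map onto qemu_system_shutdown_request() and qemu_system_reset_request(), as the hunk above shows.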

-[PULL 39/45] hvf: Simplify post reset/init/loadvm hooks
+[PULL 04/24] tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
-From: Alexander Graf <agraf@csgraf.de>
+From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-The hooks we have that call us after reset, init and loadvm really all
+Add a test that boots and quickly shuts down a raspi2 machine,
-just want to say "The reference of all register state is in the QEMU
+to test the power management model:
 vcpu struct, please push it".
-We already have a working pushing mechanism though called cpu->vcpu_dirty,
+   (1/1) tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_raspi2_initrd:
-so we can just reuse that for all of the above, syncing state properly the
+  console: [    0.000000] Booting Linux on physical CPU 0xf00
-next time we actually execute a vCPU.
+  console: [    0.000000] Linux version 4.14.98-v7+ (dom@dom-XPS-13-9370) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1200 SMP Tue Feb 12 20:27:48 GMT 2019
   console: [    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
   console: [    0.000000] CPU: div instructions available: patching division code
   console: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
   console: [    0.000000] OF: fdt: Machine model: Raspberry Pi 2 Model B
   ...
   console: Boot successful.
   console: cat /proc/cpuinfo
   console: / # cat /proc/cpuinfo
   ...
   console: processor      : 3
   console: model name     : ARMv7 Processor rev 5 (v7l)
   console: BogoMIPS       : 125.00
   console: Features       : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
   console: CPU implementer        : 0x41
   console: CPU architecture: 7
   console: CPU variant    : 0x0
   console: CPU part       : 0xc07
   console: CPU revision   : 5
   console: Hardware       : BCM2835
   console: Revision       : 0000
   console: Serial         : 0000000000000000
   console: cat /proc/iomem
   console: / # cat /proc/iomem
   console: 00000000-3bffffff : System RAM
   console: 00008000-00afffff : Kernel code
   console: 00c00000-00d468ef : Kernel data
   console: 3f006000-3f006fff : dwc_otg
   console: 3f007000-3f007eff : /soc/dma@7e007000
   console: 3f00b880-3f00b8bf : /soc/mailbox@7e00b880
   console: 3f100000-3f100027 : /soc/watchdog@7e100000
   console: 3f101000-3f102fff : /soc/cprman@7e101000
   console: 3f200000-3f2000b3 : /soc/gpio@7e200000
   PASS (24.59 s)
   RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
   JOB TIME   : 25.02 s
-This fixes PSCI resets on ARM, as they modify CPU state even after the
+Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-post init call has completed, but before we execute the vCPU again.
+Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
+Message-id: 20210531113837.1689775-1-f4bug@amsat.org
 To also make the scheme work for x86, we have to make sure we don't
 move stale eflags into our env when the vcpu state is dirty.
 Signed-off-by: Alexander Graf <agraf@csgraf.de>
 Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
 Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
 Reviewed-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20210519202253.76782-13-agraf@csgraf.de
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- accel/hvf/hvf-accel-ops.c | 27 +++++++--------------------
+ tests/acceptance/boot_linux_console.py | 43 ++++++++++++++++++++++++++
- target/i386/hvf/x86hvf.c  |  5 ++++-
+file changed, 43 insertions(+)
 files changed, 11 insertions(+), 21 deletions(-)
-diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
+diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
 index XXXXXXX..XXXXXXX 100644
---- a/accel/hvf/hvf-accel-ops.c
+--- a/tests/acceptance/boot_linux_console.py
-+++ b/accel/hvf/hvf-accel-ops.c
++++ b/tests/acceptance/boot_linux_console.py
-@@ -XXX,XX +XXX,XX @@ static void hvf_cpu_synchronize_state(CPUState *cpu)
+@@ -XXX,XX +XXX,XX @@
-     }
+ from avocado import skip
- }
+ from avocado import skipUnless
+ from avocado_qemu import Test
--static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
++from avocado_qemu import exec_command
--                                              run_on_cpu_data arg)
+ from avocado_qemu import exec_command_and_wait_for_pattern
-+static void do_hvf_cpu_synchronize_set_dirty(CPUState *cpu,
+ from avocado_qemu import interrupt_interactive_console_until_pattern
-+                                             run_on_cpu_data arg)
+ from avocado_qemu import wait_for_console_pattern
- {
+@@ -XXX,XX +XXX,XX @@ def test_arm_raspi2_uart0(self):
--    hvf_put_registers(cpu);
+         """
--    cpu->vcpu_dirty = false;
+         self.do_test_arm_raspi2(0)
-+    /* QEMU state is the reference, push it to HVF now and on next entry */
-+    cpu->vcpu_dirty = true;
++    def test_arm_raspi2_initrd(self):
- }
++        """
++        :avocado: tags=arch:arm
- static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
++        :avocado: tags=machine:raspi2
- {
++        """
--    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
++        deb_url = ('http://archive.raspberrypi.org/debian/'
--}
++                   'pool/main/r/raspberrypi-firmware/'
--
++                   'raspberrypi-kernel_1.20190215-1_armhf.deb')
--static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
++        deb_hash = 'cd284220b32128c5084037553db3c482426f3972'
--                                             run_on_cpu_data arg)
++        deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
--{
++        kernel_path = self.extract_from_deb(deb_path, '/boot/kernel7.img')
--    hvf_put_registers(cpu);
++        dtb_path = self.extract_from_deb(deb_path, '/boot/bcm2709-rpi-2-b.dtb')
--    cpu->vcpu_dirty = false;
++
-+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
++        initrd_url = ('https://github.com/groeck/linux-build-test/raw/'
- }
++                      '2eb0a73b5d5a28df3170c546ddaaa9757e1e0848/rootfs/'
++                      'arm/rootfs-armv7a.cpio.gz')
- static void hvf_cpu_synchronize_post_init(CPUState *cpu)
++        initrd_hash = '604b2e45cdf35045846b8bbfbf2129b1891bdc9c'
- {
++        initrd_path_gz = self.fetch_asset(initrd_url, asset_hash=initrd_hash)
--    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
++        initrd_path = os.path.join(self.workdir, 'rootfs.cpio')
--}
++        archive.gzip_uncompress(initrd_path_gz, initrd_path)
--
++
--static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
++        self.vm.set_console()
--                                              run_on_cpu_data arg)
++        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
--{
++                               'earlycon=pl011,0x3f201000 console=ttyAMA0 '
--    cpu->vcpu_dirty = true;
++                               'panic=-1 noreboot ' +
-+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
++                               'dwc_otg.fiq_fsm_enable=0')
- }
++        self.vm.add_args('-kernel', kernel_path,
++                         '-dtb', dtb_path,
- static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
++                         '-initrd', initrd_path,
- {
++                         '-append', kernel_command_line,
--    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
++                         '-no-reboot')
-+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
++        self.vm.launch()
- }
++        self.wait_for_console_pattern('Boot successful.')
++
- static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
++        exec_command_and_wait_for_pattern(self, 'cat /proc/cpuinfo',
-diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
++                                                'BCM2835')
-index XXXXXXX..XXXXXXX 100644
++        exec_command_and_wait_for_pattern(self, 'cat /proc/iomem',
---- a/target/i386/hvf/x86hvf.c
++                                                '/soc/cprman@7e101000')
-+++ b/target/i386/hvf/x86hvf.c
++        exec_command(self, 'halt')
-@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
++        # Wait for VM to shut down gracefully
-     X86CPU *cpu = X86_CPU(cpu_state);
++        self.vm.wait()
-     CPUX86State *env = &cpu->env;
++
+     def test_arm_exynos4210_initrd(self):
--    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
+         """
-+    if (!cpu_state->vcpu_dirty) {
+         :avocado: tags=arch:arm
 +        /* light weight sync for CPU_INTERRUPT_HARD and IF_MASK */
 +        env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
 +    }
      if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
          cpu_synchronize_state(cpu_state);
 --
 .20.1

-[PULL 07/45] target/arm: Make FPSCR.LTPSIZE writable for MVE
+[PULL 05/24] target/arm: Check NaN mode before silencing NaN
-The M-profile FPSCR has an LTPSIZE field, but if MVE is not
+From: Joe Komlodi <joe.komlodi@xilinx.com>
 implemented it is read-only and always reads as 4; this is how QEMU
 currently handles it.
-Make the field writable when MVE is implemented.
+If the CPU is running in default NaN mode (FPCR.DN == 1) and we execute
 FRSQRTE, FRECPE, or FRECPX with a signaling NaN, parts_silence_nan_frac() will
 assert due to fpst->default_nan_mode being set.
-We can safely add the field to the MVE migration struct because
+To avoid this, we check to see what NaN mode we're running in before we call
-currently no CPUs enable MVE and so the migration struct is never
+floatxx_silence_nan().
 used.
+Signed-off-by: Joe Komlodi <joe.komlodi@xilinx.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 1624662174-175828-2-git-send-email-joe.komlodi@xilinx.com
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210520152840.24453-8-peter.maydell@linaro.org
 ---
- target/arm/cpu.h        | 3 ++-
+ target/arm/helper-a64.c | 12 +++++++++---
- target/arm/machine.c    | 1 +
+ target/arm/vfp_helper.c | 24 ++++++++++++++++++------
- target/arm/vfp_helper.c | 9 ++++++---
+files changed, 27 insertions(+), 9 deletions(-)
 files changed, 9 insertions(+), 4 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/helper-a64.c
-+++ b/target/arm/cpu.h
++++ b/target/arm/helper-a64.c
-@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(frecpx_f16)(uint32_t a, void *fpstp)
-         uint32_t fpdscr[M_REG_NUM_BANKS];
+         float16 nan = a;
-         uint32_t cpacr[M_REG_NUM_BANKS];
+         if (float16_is_signaling_nan(a, fpst)) {
-         uint32_t nsacr;
+             float_raise(float_flag_invalid, fpst);
--        int ltpsize;
+-            nan = float16_silence_nan(a, fpst);
-+        uint32_t ltpsize;
++            if (!fpst->default_nan_mode) {
-         uint32_t vpr;
++                nan = float16_silence_nan(a, fpst);
-     } v7m;
++            }
+         }
-@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
+         if (fpst->default_nan_mode) {
+             nan = float16_default_nan(fpst);
- #define FPCR_LTPSIZE_SHIFT 16   /* LTPSIZE, M-profile only */
+@@ -XXX,XX +XXX,XX @@ float32 HELPER(frecpx_f32)(float32 a, void *fpstp)
- #define FPCR_LTPSIZE_MASK (7 << FPCR_LTPSIZE_SHIFT)
+         float32 nan = a;
-+#define FPCR_LTPSIZE_LENGTH 3
+         if (float32_is_signaling_nan(a, fpst)) {
+             float_raise(float_flag_invalid, fpst);
- #define FPCR_NZCV_MASK (FPCR_N | FPCR_Z | FPCR_C | FPCR_V)
+-            nan = float32_silence_nan(a, fpst);
- #define FPCR_NZCVQC_MASK (FPCR_NZCV_MASK | FPCR_QC)
++            if (!fpst->default_nan_mode) {
-diff --git a/target/arm/machine.c b/target/arm/machine.c
++                nan = float32_silence_nan(a, fpst);
-index XXXXXXX..XXXXXXX 100644
++            }
---- a/target/arm/machine.c
+         }
-+++ b/target/arm/machine.c
+         if (fpst->default_nan_mode) {
-@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_mve = {
+             nan = float32_default_nan(fpst);
-     .needed = mve_needed,
+@@ -XXX,XX +XXX,XX @@ float64 HELPER(frecpx_f64)(float64 a, void *fpstp)
-     .fields = (VMStateField[]) {
+         float64 nan = a;
-         VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
+         if (float64_is_signaling_nan(a, fpst)) {
-+        VMSTATE_UINT32(env.v7m.ltpsize, ARMCPU),
+             float_raise(float_flag_invalid, fpst);
-         VMSTATE_END_OF_LIST()
+-            nan = float64_silence_nan(a, fpst);
-     },
++            if (!fpst->default_nan_mode) {
- };
++                nan = float64_silence_nan(a, fpst);
 +            }
          }
          if (fpst->default_nan_mode) {
              nan = float64_default_nan(fpst);
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
-@@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpscr(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, void *fpstp)
+         float16 nan = f16;
- void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
+         if (float16_is_signaling_nan(f16, fpst)) {
- {
+             float_raise(float_flag_invalid, fpst);
-+    ARMCPU *cpu = env_archcpu(env);
+-            nan = float16_silence_nan(f16, fpst);
-+
++            if (!fpst->default_nan_mode) {
-     /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
++                nan = float16_silence_nan(f16, fpst);
--    if (!cpu_isar_feature(any_fp16, env_archcpu(env))) {
++            }
-+    if (!cpu_isar_feature(any_fp16, cpu)) {
+         }
-         val &= ~FPCR_FZ16;
+         if (fpst->default_nan_mode) {
-     }
+             nan =  float16_default_nan(fpst);
+@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, void *fpstp)
-@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
+         float32 nan = f32;
-          * because in v7A no-short-vector-support cores still had to
+         if (float32_is_signaling_nan(f32, fpst)) {
-          * allow Stride/Len to be written with the only effect that
+             float_raise(float_flag_invalid, fpst);
-          * some insns are required to UNDEF if the guest sets them.
+-            nan = float32_silence_nan(f32, fpst);
--         *
++            if (!fpst->default_nan_mode) {
--         * TODO: if M-profile MVE implemented, set LTPSIZE.
++                nan = float32_silence_nan(f32, fpst);
-          */
++            }
-         env->vfp.vec_len = extract32(val, 16, 3);
+         }
-         env->vfp.vec_stride = extract32(val, 20, 2);
+         if (fpst->default_nan_mode) {
-+    } else if (cpu_isar_feature(aa32_mve, cpu)) {
+             nan =  float32_default_nan(fpst);
-+        env->v7m.ltpsize = extract32(val, FPCR_LTPSIZE_SHIFT,
+@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, void *fpstp)
-+                                     FPCR_LTPSIZE_LENGTH);
+         float64 nan = f64;
-     }
+         if (float64_is_signaling_nan(f64, fpst)) {
+             float_raise(float_flag_invalid, fpst);
-     if (arm_feature(env, ARM_FEATURE_NEON)) {
+-            nan = float64_silence_nan(f64, fpst);
 +            if (!fpst->default_nan_mode) {
 +                nan = float64_silence_nan(f64, fpst);
 +            }
          }
          if (fpst->default_nan_mode) {
              nan =  float64_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, void *fpstp)
          float16 nan = f16;
          if (float16_is_signaling_nan(f16, s)) {
              float_raise(float_flag_invalid, s);
 -            nan = float16_silence_nan(f16, s);
 +            if (!s->default_nan_mode) {
 +                nan = float16_silence_nan(f16, fpstp);
 +            }
          }
          if (s->default_nan_mode) {
              nan =  float16_default_nan(s);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, void *fpstp)
          float32 nan = f32;
          if (float32_is_signaling_nan(f32, s)) {
              float_raise(float_flag_invalid, s);
 -            nan = float32_silence_nan(f32, s);
 +            if (!s->default_nan_mode) {
 +                nan = float32_silence_nan(f32, fpstp);
 +            }
          }
          if (s->default_nan_mode) {
              nan =  float32_default_nan(s);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, void *fpstp)
          float64 nan = f64;
          if (float64_is_signaling_nan(f64, s)) {
              float_raise(float_flag_invalid, s);
 -            nan = float64_silence_nan(f64, s);
 +            if (!s->default_nan_mode) {
 +                nan = float64_silence_nan(f64, fpstp);
 +            }
          }
          if (s->default_nan_mode) {
              nan =  float64_default_nan(s);
 --
 .20.1
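
The control flow that this patch gives each helper can be sketched on raw float32 bit patterns, without softfloat. This is an illustration only: the f32_* names are hypothetical, the bit masks follow the IEEE 754 binary32 layout, and the default NaN value is assumed to be the Arm one (positive, quiet, zero payload).

```c
#include <stdbool.h>
#include <stdint.h>

#define F32_EXP_MASK    0x7f800000u
#define F32_FRAC_MASK   0x007fffffu
#define F32_QUIET_BIT   0x00400000u
#define F32_DEFAULT_NAN 0x7fc00000u   /* assumed Arm default NaN */

static bool f32_is_nan(uint32_t a)
{
    return (a & F32_EXP_MASK) == F32_EXP_MASK && (a & F32_FRAC_MASK) != 0;
}

static bool f32_is_signaling_nan(uint32_t a)
{
    return f32_is_nan(a) && !(a & F32_QUIET_BIT);
}

/*
 * Hypothetical stand-in for the NaN path of the patched helpers.
 * 'a' is assumed to be a NaN; *invalid reports whether the Invalid
 * Operation flag would be raised.
 */
static uint32_t f32_nan_result(uint32_t a, bool default_nan_mode,
                               bool *invalid)
{
    uint32_t nan = a;

    *invalid = false;
    if (f32_is_signaling_nan(a)) {
        *invalid = true;
        /* Only silence when the result is not about to be replaced. */
        if (!default_nan_mode) {
            nan = a | F32_QUIET_BIT;
        }
    }
    if (default_nan_mode) {
        nan = F32_DEFAULT_NAN;
    }
    return nan;
}
```

The flag is still raised in default NaN mode; only the silencing call, whose result would be discarded anyway, is skipped.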

-[PULL 15/45] target/arm: use raise_exception_ra for stack limit exception
+[PULL 06/24] hw/gpio/gpio_pwr: use shutdown function for reboot
-From: Jamie Iles <jamie@nuviainc.com>
+From: Maxim Uvarov <maxim.uvarov@linaro.org>
-The sequence cpu_restore_state() + raise_exception() is equivalent to
+QEMU has two types of functions: shutdown and reboot. The shutdown
-raise_exception_ra(), so use that instead.  (In this case we never
+function has to be used for machine shutdown. Otherwise we cause
-cared about the syndrome value, because M-profile doesn't use the
+a reset with a bogus "cause" value, when we intended a shutdown.
 syndrome; the old code was just written unnecessarily awkwardly.)
-Cc: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
 Cc: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Jamie Iles <jamie@nuviainc.com>
 [PMM: Retain edited version of comment; rewrite commit message]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20210625111842.3790-3-maxim.uvarov@linaro.org
+[PMM: tweaked commit message]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/m_helper.c  | 5 +----
+ hw/gpio/gpio_pwr.c | 2 +-
- target/arm/op_helper.c | 9 +++------
+file changed, 1 insertion(+), 1 deletion(-)
 files changed, 4 insertions(+), 10 deletions(-)
-diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
+diff --git a/hw/gpio/gpio_pwr.c b/hw/gpio/gpio_pwr.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/m_helper.c
+--- a/hw/gpio/gpio_pwr.c
-+++ b/target/arm/m_helper.c
++++ b/hw/gpio/gpio_pwr.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
+@@ -XXX,XX +XXX,XX @@ static void gpio_pwr_reset(void *opaque, int n, int level)
-             limit = is_psp ? env->v7m.psplim[false] : env->v7m.msplim[false];
+ static void gpio_pwr_shutdown(void *opaque, int n, int level)
+ {
-             if (val < limit) {
+     if (level) {
--                CPUState *cs = env_cpu(env);
+-        qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
--
++        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
 -                cpu_restore_state(cs, GETPC(), true);
 -                raise_exception(env, EXCP_STKOF, 0, 1);
 +                raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
              }
              if (is_psp) {
 diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/op_helper.c
 +++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v8m_stackcheck)(CPUARMState *env, uint32_t newvalue)
       * raising an exception if the limit is breached.
       */
      if (newvalue < v7m_sp_limit(env)) {
 -        CPUState *cs = env_cpu(env);
 -
          /*
           * Stack limit exceptions are a rare case, so rather than syncing
 -         * PC/condbits before the call, we use cpu_restore_state() to
 -         * get them right before raising the exception.
 +         * PC/condbits before the call, we use raise_exception_ra() so
 +         * that cpu_restore_state() will sort them out.
           */
 -        cpu_restore_state(cs, GETPC(), true);
 -        raise_exception(env, EXCP_STKOF, 0, 1);
 +        raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
      }
  }
 --
 .20.1

-[PULL 03/45] target/arm: Move fpsp/fpdp isar check into callers of do_vfp_2op_sp/dp
+[PULL 07/24] target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation
-The do_vfp_2op_sp() and do_vfp_2op_dp() functions currently check
+In do_ldst(), the calculation of the offset needs to be based on the
-whether floating point is supported via the aa32_fpdp_v2 and
+size of the memory access, not the size of the elements in the
-aa32_fpsp_v2 isar checks.  For v8.1M MVE support, the VMOV_reg trans
+vector.  This meant we were getting it wrong for the widening and
-functions (but not any of the others) need to update this to also
+narrowing variants of the various VLDR and VSTR insns.
 allow the insn if MVE is implemented.  Move the check out of the do_
 function and into its callsites (which are all implemented via the
 DO_VFP_2OP macro), so we have a place to change the check for the
 VMOV insns.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210520152840.24453-4-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-2-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.c | 37 +++++++++++++++++++------------------
+ target/arm/translate-mve.c | 17 +++++++++--------
-file changed, 19 insertions(+), 18 deletions(-)
+file changed, 9 insertions(+), 8 deletions(-)
-diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c
+--- a/target/arm/translate-mve.c
-+++ b/target/arm/translate-vfp.c
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+@@ -XXX,XX +XXX,XX @@ static bool mve_skip_first_beat(DisasContext *s)
      int veclen = s->vec_len;
      TCGv_i32 f0, fd;
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 -        return false;
 -    }
 +    /* Note that the caller must check the aa32_fpsp_v2 feature. */
      if (!dc_isar_feature(aa32_fpshvec, s) &&
          (veclen != 0 || s->vec_stride != 0)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
       */
      TCGv_i32 f0;
 +    /* Note that the caller must check the aa32_fp16_arith feature */
 +
      if (!dc_isar_feature(aa32_fp16_arith, s)) {
          return false;
      }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
-     int veclen = s->vec_len;
-     TCGv_i64 f0, fd;
--    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
--        return false;
--    }
-+    /* Note that the caller must check the aa32_fpdp_v2 feature. */
-     /* UNDEF accesses to D16-D31 if they don't exist */
-     if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
-     return true;
  }
--#define DO_VFP_2OP(INSN, PREC, FN)                              \
+-static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
-+#define DO_VFP_2OP(INSN, PREC, FN, CHECK)                       \
++static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn,
-     static bool trans_##INSN##_##PREC(DisasContext *s,          \
++                    unsigned msize)
-                                       arg_##INSN##_##PREC *a)   \
+ {
      TCGv_i32 addr;
      uint32_t offset;
@@ -XXX,XX +XXX,XX @@ static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
          return true;
      }
 -    offset = a->imm << a->size;
 +    offset = a->imm << msize;
      if (!a->a) {
          offset = -offset;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
          { gen_helper_mve_vstrw, gen_helper_mve_vldrw },
          { NULL, NULL }
      };
 -    return do_ldst(s, a, ldstfns[a->size][a->l]);
 +    return do_ldst(s, a, ldstfns[a->size][a->l], a->size);
  }
 -#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST)                  \
 +#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST, MSIZE)           \
      static bool trans_##OP(DisasContext *s, arg_VLDR_VSTR *a)   \
      {                                                           \
-+        if (!dc_isar_feature(CHECK, s)) {                       \
+         static MVEGenLdStFn * const ldstfns[2][2] = {           \
-+            return false;                                       \
+             { gen_helper_mve_##ST, gen_helper_mve_##SLD },      \
-+        }                                                       \
+             { NULL, gen_helper_mve_##ULD },                     \
-         return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
+         };                                                      \
 -        return do_ldst(s, a, ldstfns[a->u][a->l]);              \
 +        return do_ldst(s, a, ldstfns[a->u][a->l], MSIZE);       \
      }
--DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
+-DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
--DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
+-DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
-+DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32, aa32_fpsp_v2)
+-DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
-+DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64, aa32_fpdp_v2)
++DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h, MO_8)
++DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w, MO_8)
--DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh)
++DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w, MO_16)
--DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
--DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
+ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
 +DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith)
 +DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2)
 +DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd, aa32_fpdp_v2)
 -DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh)
 -DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
 -DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
 +DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh, aa32_fp16_arith)
 +DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs, aa32_fpsp_v2)
 +DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd, aa32_fpdp_v2)
  static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
  {
@@ -XXX,XX +XXX,XX @@ static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
      gen_helper_vfp_sqrtd(vd, vm, cpu_env);
  }
 -DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
 -DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
 -DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
 +DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp, aa32_fp16_arith)
 +DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp, aa32_fpsp_v2)
 +DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp, aa32_fpdp_v2)
  static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
  {
 --
 .20.1

-[PULL 02/45] target/arm: Update feature checks for insns which are "MVE or FP"
+[PULL 08/24] target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
-Some v8M instructions are present if either the floating point
+The initial implementation of the MVE VRMLALDAVH and VRMLSLDAVH
-extension or MVE is implemented.  Update our implementation of them
+insns had some bugs:
-to check for MVE as well as for FP.
+ * the 32x32 multiply of elements was being done as 32x32->32,
    not 32x32->64
  * we were incorrectly maintaining the accumulator in its full
    72-bit form across all 4 beats of the insn; in the pseudocode
    it is squashed back into the 64 bits of the RdaHi:RdaLo
    registers after each beat
-This is all the insns which use CheckDecodeFaults(ExtType_MveOrFp) or
+In particular, fixing the second of these allows us to recast
-CheckDecodeFaults(ExtType_MveOrDpFp) in their pseudocode, which are
+the implementation to avoid 128-bit arithmetic entirely.
 essentially the loads and stores, moves and sysreg accesses, except
 for VMOV_reg_sp and VMOV_reg_dp, which we handle in subsequent
 patches because they need a refactor to provide a place to put the
 new MVE check.
+Since the element size here is always 4, we can also drop the
+parameterization of ESIZE to make the code a little more readable.
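
The per-element rounding step in the reworked macro, mul = (mul >> 8) + ((mul >> 7) & 1), can be checked in isolation. The sketch below is not part of the patch: round_shift8() and round_shift8_ref() are hypothetical names, and it assumes that >> on a negative int64_t is an arithmetic shift. It compares the new form against the more familiar "add 1 << 7, then shift right by 8" form, which needs headroom for the addition.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: drop the low 8 bits
 * of a 64-bit product, rounding to nearest by adding back bit 7.
 * No intermediate sum is formed, so there is nothing to overflow.
 * Assumes >> on a negative value is an arithmetic shift.
 */
static int64_t round_shift8(int64_t mul)
{
    return (mul >> 8) + ((mul >> 7) & 1);
}

/*
 * Reference form: add half of the output LSB, then shift.
 * Only valid while mul + 128 does not overflow int64_t.
 */
static int64_t round_shift8_ref(int64_t mul)
{
    assert(mul <= INT64_MAX - 128);
    return (mul + 128) >> 8;
}
```

Writing mul as 256 * q + r with 0 <= r < 256, both forms give q + (r >= 128), so they agree wherever the reference form does not overflow.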
+Suggested-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210520152840.24453-3-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-3-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.c | 48 +++++++++++++++++++++++---------------
+ target/arm/mve_helper.c | 38 +++++++++++++++++++++-----------------
- 1 file changed, 29 insertions(+), 19 deletions(-)
+ 1 file changed, 21 insertions(+), 17 deletions(-)
-diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c
+--- a/target/arm/mve_helper.c
-+++ b/target/arm/translate-vfp.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
+@@ -XXX,XX +XXX,XX @@
-     /* VMOV scalar to general purpose register */
+  */
-     TCGv_i32 tmp;
+ #include "qemu/osdep.h"
--    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+-#include "qemu/int128.h"
--    if (a->size == MO_32
+ #include "cpu.h"
--        ? !dc_isar_feature(aa32_fpsp_v2, s)
+ #include "internals.h"
--        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+ #include "vec_internal.h"
--        return false;
+@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
-+    /*
+ DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
-+     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
-+     * all sizes, whether the CPU has fp or not.
+ /*
-+     */
+- * Rounding multiply add long dual accumulate high: we must keep
-+    if (!dc_isar_feature(aa32_mve, s)) {
+- * a 72-bit internal accumulator value and return the top 64 bits.
-+        if (a->size == MO_32
++ * Rounding multiply add long dual accumulate high. In the pseudocode
-+            ? !dc_isar_feature(aa32_fpsp_v2, s)
++ * this is implemented with a 72-bit internal accumulator value of which
-+            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
++ * the top 64 bits are returned. We optimize this to avoid having to
-+            return false;
++ * use 128-bit arithmetic -- we can do this because the 72-bit accumulator
-+        }
++ * is squashed back into 64-bits after each beat.
   */
 -#define DO_LDAVH(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC, TO128)         \
 +#define DO_LDAVH(OP, TYPE, LTYPE, XCHG, SUB)                            \
      uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
                                      void *vm, uint64_t a)               \
      {                                                                   \
          uint16_t mask = mve_element_mask(env);                          \
          unsigned e;                                                     \
          TYPE *n = vn, *m = vm;                                          \
 -        Int128 acc = int128_lshift(TO128(a), 8);                        \
 -        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
              if (mask & 1) {                                             \
 +                LTYPE mul;                                              \
                  if (e & 1) {                                            \
 -                    acc = ODDACC(acc, TO128(n[H##ESIZE(e - 1 * XCHG)] * \
 -                                            m[H##ESIZE(e)]));           \
 +                    mul = (LTYPE)n[H4(e - 1 * XCHG)] * m[H4(e)];        \
 +                    if (SUB) {                                          \
 +                        mul = -mul;                                     \
 +                    }                                                   \
                  } else {                                                \
 -                    acc = EVENACC(acc, TO128(n[H##ESIZE(e + 1 * XCHG)] * \
 -                                             m[H##ESIZE(e)]));          \
 +                    mul = (LTYPE)n[H4(e + 1 * XCHG)] * m[H4(e)];        \
                  }                                                       \
 -                acc = int128_add(acc, int128_make64(1 << 7));           \
 +                mul = (mul >> 8) + ((mul >> 7) & 1);                    \
 +                a += mul;                                               \
              }                                                           \
          }                                                               \
          mve_advance_vpt(env);                                           \
 -        return int128_getlo(int128_rshift(acc, 8));                     \
 +        return a;                                                       \
      }
-     /* UNDEF accesses to D16-D31 if they don't exist */
+-DO_LDAVH(vrmlaldavhsw, 4, int32_t, false, int128_add, int128_add, int128_makes64)
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
+-DO_LDAVH(vrmlaldavhxsw, 4, int32_t, true, int128_add, int128_add, int128_makes64)
-     /* VMOV general purpose register to scalar */
++DO_LDAVH(vrmlaldavhsw, int32_t, int64_t, false, false)
-     TCGv_i32 tmp;
++DO_LDAVH(vrmlaldavhxsw, int32_t, int64_t, true, false)
--    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+-DO_LDAVH(vrmlaldavhuw, 4, uint32_t, false, int128_add, int128_add, int128_make64)
--    if (a->size == MO_32
++DO_LDAVH(vrmlaldavhuw, uint32_t, uint64_t, false, false)
--        ? !dc_isar_feature(aa32_fpsp_v2, s)
--        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+-DO_LDAVH(vrmlsldavhsw, 4, int32_t, false, int128_add, int128_sub, int128_makes64)
--        return false;
+-DO_LDAVH(vrmlsldavhxsw, 4, int32_t, true, int128_add, int128_sub, int128_makes64)
-+    /*
++DO_LDAVH(vrmlsldavhsw, int32_t, int64_t, false, true)
-+     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
++DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
-+     * all sizes, whether the CPU has fp or not.
-+     */
+ /* Vector add across vector */
-+    if (!dc_isar_feature(aa32_mve, s)) {
+ #define DO_VADDV(OP, ESIZE, TYPE)                               \
 +        if (a->size == MO_32
 +            ? !dc_isar_feature(aa32_fpsp_v2, s)
 +            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +            return false;
 +        }
      }
      /* UNDEF accesses to D16-D31 if they don't exist */
@@ -XXX,XX +XXX,XX @@ typedef enum FPSysRegCheckResult {
  static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
  {
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return FPSysRegCheckFailed;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
  {
      TCGv_i32 tmp;
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
  {
      TCGv_i32 tmp;
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
       * floating point register.  Note that this does not require support
       * for double precision arithmetic.
       */
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      uint32_t offset;
      TCGv_i32 addr, tmp;
 -    if (!dc_isar_feature(aa32_fp16_arith, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      uint32_t offset;
      TCGv_i32 addr, tmp;
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
      TCGv_i64 tmp;
      /* Note that this does not require support for double arithmetic.  */
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
      TCGv_i32 addr, tmp;
      int i, n;
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
      int i, n;
      /* Note that this does not require support for double arithmetic.  */
 -    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
          return false;
      }
 --
 .20.1

-[PULL 22/45] target/arm: Implement bfloat16 dot product (indexed)
+[PULL 09/24] target/arm: Make asimd_imm_const() public
-From: Richard Henderson <richard.henderson@linaro.org>
+The function asimd_imm_const() in translate-neon.c is an
 implementation of the pseudocode AdvSIMDExpandImm(), which we will
 also want for MVE.  Move the implementation to translate.c, with a
 prototype in translate.h.
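
For readers unfamiliar with the encoding, two of the cases handled by asimd_imm_const() can be sketched stand-alone. This is an illustration only: expand_shifted32() and expand_byte_mask() are hypothetical names, the op inversion (VMVN/VBIC) is left out, and only cmode 0..7 and the cmode == 14, op == 1 case are covered.

```c
#include <assert.h>
#include <stdint.h>

/*
 * cmode 0..7: the 8-bit immediate is shifted left by 0, 8, 16 or 24
 * bits within a 32-bit element, and that element is replicated into
 * both halves of the 64-bit result. (Sketch: the op inversion that
 * the real function applies afterwards is omitted here.)
 */
static uint64_t expand_shifted32(uint32_t imm8, int cmode)
{
    assert(imm8 <= 0xff && cmode >= 0 && cmode <= 7);
    uint32_t imm32 = imm8 << ((cmode >> 1) * 8);
    return ((uint64_t)imm32 << 32) | imm32;
}

/*
 * cmode == 14, op == 1: each bit of the immediate selects whether the
 * corresponding byte of the 64-bit result is 0x00 or 0xff.
 */
static uint64_t expand_byte_mask(uint32_t imm8)
{
    uint64_t imm64 = 0;

    assert(imm8 <= 0xff);
    for (int n = 0; n < 8; n++) {
        if (imm8 & (1u << n)) {
            imm64 |= 0xffULL << (n * 8);
        }
    }
    return imm64;
}
```

For example, expand_shifted32(0xab, 2) gives 0x0000ab000000ab00 and expand_byte_mask(0x81) gives 0xff000000000000ff.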
-This is BFDOT for both AArch64 AdvSIMD and SVE,
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-and VDOT.BF16 for AArch32 NEON.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-4-peter.maydell@linaro.org
 ---
  target/arm/translate.h      | 16 ++++++++++
  target/arm/translate-neon.c | 63 -------------------------------------
  target/arm/translate.c      | 57 +++++++++++++++++++++++++++++++++
  3 files changed, 73 insertions(+), 63 deletions(-)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/translate.h b/target/arm/translate.h
 Message-id: 20210525225817.400336-8-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.h           |  2 ++
  target/arm/neon-shared.decode |  2 ++
  target/arm/sve.decode         |  3 +++
  target/arm/translate-a64.c    | 41 +++++++++++++++++++++++++++--------
  target/arm/translate-neon.c   |  9 ++++++++
  target/arm/translate-sve.c    | 12 ++++++++++
  target/arm/vec_helper.c       | 20 +++++++++++++++++
  7 files changed, 80 insertions(+), 9 deletions(-)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
+--- a/target/arm/translate.h
-+++ b/target/arm/helper.h
++++ b/target/arm/translate.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
+@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
+     return opc | s->be_data;
- DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
+ }
-                    void, ptr, ptr, ptr, ptr, i32)
-+DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
++/**
-+                   void, ptr, ptr, ptr, ptr, i32)
++ * asimd_imm_const: Expand an encoded SIMD constant value
++ *
- #ifdef TARGET_AARCH64
++ * Expand a SIMD constant value. This is essentially the pseudocode
- #include "helper-a64.h"
++ * AdvSIMDExpandImm, except that we also perform the boolean NOT needed for
-diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
++ * VMVN and VBIC (when cmode < 14 && op == 1).
-index XXXXXXX..XXXXXXX 100644
++ *
---- a/target/arm/neon-shared.decode
++ * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
-+++ b/target/arm/neon-shared.decode
++ * callers must catch this.
-@@ -XXX,XX +XXX,XX @@ VUSDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
++ *
-                vn=%vn_dp vd=%vd_dp
++ * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
- VSUDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 1 vm:4 \
++ * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
-                vn=%vn_dp vd=%vd_dp
++ * we produce an immediate constant value of 0 in these cases.
-+VDOT_b16_scal  1111 1110 0 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
++ */
-+               vn=%vn_dp vd=%vd_dp
++uint64_t asimd_imm_const(uint32_t imm, int cmode, int op);
  %vfml_scalar_q0_rm 0:3 5:1
  %vfml_scalar_q1_index 5:1 3:1
 diff --git a/target/arm/sve.decode b/target/arm/sve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/sve.decode
 +++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
  FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
  FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
  FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
 +
-+### SVE2 floating-point bfloat16 dot-product (indexed)
+ #endif /* TARGET_ARM_TRANSLATE_H */
 +BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
              return;
          }
          break;
 -    case 0x0f: /* SUDOT, USDOT */
 -        if (is_scalar || (size & 1) || !dc_isar_feature(aa64_i8mm, s)) {
 +    case 0x0f:
 +        switch (size) {
 +        case 0: /* SUDOT */
 +        case 2: /* USDOT */
 +            if (is_scalar || !dc_isar_feature(aa64_i8mm, s)) {
 +                unallocated_encoding(s);
 +                return;
 +            }
 +            break;
 +        case 1: /* BFDOT */
 +            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
 +                unallocated_encoding(s);
 +                return;
 +            }
 +            break;
 +        default:
              unallocated_encoding(s);
              return;
          }
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                           u ? gen_helper_gvec_udot_idx_b
                           : gen_helper_gvec_sdot_idx_b);
          return;
 -    case 0x0f: /* SUDOT, USDOT */
 -        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 -                         extract32(insn, 23, 1)
 -                         ? gen_helper_gvec_usdot_idx_b
 -                         : gen_helper_gvec_sudot_idx_b);
 -        return;
 -
 +    case 0x0f:
 +        switch (extract32(insn, 22, 2)) {
 +        case 0: /* SUDOT */
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 +                             gen_helper_gvec_sudot_idx_b);
 +            return;
 +        case 1: /* BFDOT */
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 +                             gen_helper_gvec_bfdot_idx);
 +            return;
 +        case 2: /* USDOT */
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
 +                             gen_helper_gvec_usdot_idx_b);
 +            return;
 +        }
 +        g_assert_not_reached();
      case 0x11: /* FCMLA #0 */
      case 0x13: /* FCMLA #90 */
      case 0x15: /* FCMLA #180 */
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a)
+@@ -XXX,XX +XXX,XX @@ DO_FP_2SH(VCVT_UH, gen_helper_gvec_vcvt_uh)
-                         gen_helper_gvec_sudot_idx_b);
+ DO_FP_2SH(VCVT_HS, gen_helper_gvec_vcvt_hs)
  DO_FP_2SH(VCVT_HU, gen_helper_gvec_vcvt_hu)
 -static uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
 -{
 -    /*
 -     * Expand the encoded constant.
 -     * Note that cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 is UNPREDICTABLE.
 -     * We choose to not special-case this and will behave as if a
 -     * valid constant encoding of 0 had been given.
 -     * cmode = 15 op = 1 must UNDEF; we assume decode has handled that.
 -     */
 -    switch (cmode) {
 -    case 0: case 1:
 -        /* no-op */
 -        break;
 -    case 2: case 3:
 -        imm <<= 8;
 -        break;
 -    case 4: case 5:
 -        imm <<= 16;
 -        break;
 -    case 6: case 7:
 -        imm <<= 24;
 -        break;
 -    case 8: case 9:
 -        imm |= imm << 16;
 -        break;
 -    case 10: case 11:
 -        imm = (imm << 8) | (imm << 24);
 -        break;
 -    case 12:
 -        imm = (imm << 8) | 0xff;
 -        break;
 -    case 13:
 -        imm = (imm << 16) | 0xffff;
 -        break;
 -    case 14:
 -        if (op) {
 -            /*
 -             * This is the only case where the top and bottom 32 bits
 -             * of the encoded constant differ.
 -             */
 -            uint64_t imm64 = 0;
 -            int n;
 -
 -            for (n = 0; n < 8; n++) {
 -                if (imm & (1 << n)) {
 -                    imm64 |= (0xffULL << (n * 8));
 -                }
 -            }
 -            return imm64;
 -        }
 -        imm |= (imm << 8) | (imm << 16) | (imm << 24);
 -        break;
 -    case 15:
 -        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
 -            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
 -        break;
 -    }
 -    if (op) {
 -        imm = ~imm;
 -    }
 -    return dup_const(MO_32, imm);
 -}
 -
  static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
                          GVecGen2iFn *fn)
  {
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ void arm_translate_init(void)
      a64_translate_init();
  }
-+static bool trans_VDOT_b16_scal(DisasContext *s, arg_VDOT_b16_scal *a)
++uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
 +{
-+    if (!dc_isar_feature(aa32_bf16, s)) {
++    /* Expand the encoded constant as per AdvSIMDExpandImm pseudocode */
-+        return false;
++    switch (cmode) {
 +    case 0: case 1:
 +        /* no-op */
 +        break;
 +    case 2: case 3:
 +        imm <<= 8;
 +        break;
 +    case 4: case 5:
 +        imm <<= 16;
 +        break;
 +    case 6: case 7:
 +        imm <<= 24;
 +        break;
 +    case 8: case 9:
 +        imm |= imm << 16;
 +        break;
 +    case 10: case 11:
 +        imm = (imm << 8) | (imm << 24);
 +        break;
 +    case 12:
 +        imm = (imm << 8) | 0xff;
 +        break;
 +    case 13:
 +        imm = (imm << 16) | 0xffff;
 +        break;
 +    case 14:
 +        if (op) {
 +            /*
 +             * This is the only case where the top and bottom 32 bits
 +             * of the encoded constant differ.
 +             */
 +            uint64_t imm64 = 0;
 +            int n;
 +
 +            for (n = 0; n < 8; n++) {
 +                if (imm & (1 << n)) {
 +                    imm64 |= (0xffULL << (n * 8));
 +                }
 +            }
 +            return imm64;
 +        }
 +        imm |= (imm << 8) | (imm << 16) | (imm << 24);
 +        break;
 +    case 15:
 +        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
 +            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
 +        break;
 +    }
-+    return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index,
++    if (op) {
-+                        gen_helper_gvec_bfdot_idx);
++        imm = ~imm;
 +    }
 +    return dup_const(MO_32, imm);
 +}
 +
- static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
+ /* Generate a label used for skipping this instruction */
  void arm_gen_condlabel(DisasContext *s)
  {
-     int opr_sz;
-diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-sve.c
-+++ b/target/arm/translate-sve.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
-     }
-     return true;
- }
-+
-+static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
-+{
-+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
-+        return false;
-+    }
-+    if (sve_access_check(s)) {
-+        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot_idx,
-+                          a->rd, a->rn, a->rm, a->ra, a->index);
-+    }
-+    return true;
-+}
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
-+++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
-     }
-     clear_tail(d, opr_sz, simd_maxsz(desc));
- }
-+
-+void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
-+                            void *va, uint32_t desc)
-+{
-+    intptr_t i, j, opr_sz = simd_oprsz(desc);
-+    intptr_t index = simd_data(desc);
-+    intptr_t elements = opr_sz / 4;
-+    intptr_t eltspersegment = MIN(16 / 4, elements);
-+    float32 *d = vd, *a = va;
-+    uint32_t *n = vn, *m = vm;
-+
-+    for (i = 0; i < elements; i += eltspersegment) {
-+        uint32_t m_idx = m[i + H4(index)];
-+
-+        for (j = i; j < i + eltspersegment; j++) {
-+            d[j] = bfdotadd(a[j], n[j], m_idx);
-+        }
-+    }
-+    clear_tail(d, opr_sz, simd_maxsz(desc));
-+}
 --
 .20.1

-[PULL 11/45] target/arm: Mark LDS{MIN,MAX} as signed operations
+[PULL 10/24] target/arm: Use asimd_imm_const for A64 decode
-From: Richard Henderson <richard.henderson@linaro.org>
+The A64 AdvSIMD modified-immediate grouping uses almost the same
 constant encoding that A32 Neon does; reuse asimd_imm_const() (to
 which we add the AArch64-specific case for cmode 15 op 1) instead of
 reimplementing it all.
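
The AArch64-only case added here (cmode 15, op 1) expands the 8-bit immediate into a double-precision constant. A stand-alone sketch of the same bit manipulation follows; expand_fp64_imm8() and bits_to_double() are hypothetical names, and the conversion helper assumes the host double is IEEE 754 binary64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Expand imm8 = a:b:cdefgh into a 64-bit pattern:
 * sign = a, exponent = NOT(b) followed by eight copies of b,
 * fraction = cdefgh followed by 48 zero bits.
 */
static uint64_t expand_fp64_imm8(uint32_t imm8)
{
    assert(imm8 <= 0xff);

    uint64_t imm64 = (uint64_t)(imm8 & 0x3f) << 48;
    if (imm8 & 0x80) {
        imm64 |= 0x8000000000000000ULL;
    }
    if (imm8 & 0x40) {
        imm64 |= 0x3fc0000000000000ULL;
    } else {
        imm64 |= 0x4000000000000000ULL;
    }
    return imm64;
}

/* Reinterpret the bit pattern as a double (assumes IEEE 754 binary64). */
static double bits_to_double(uint64_t bits)
{
    double d;

    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

With this sketch, imm8 values 0x70, 0x00 and 0xf0 expand to 1.0, 2.0 and -1.0 respectively.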
-The operands to tcg_gen_atomic_fetch_s{min,max}_i64 must
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-be signed, so that the inputs are properly extended.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Zero extend the result afterward, as needed.
+Message-id: 20210628135835.6690-5-peter.maydell@linaro.org
 ---
  target/arm/translate.h     |  3 +-
  target/arm/translate-a64.c | 86 ++++----------------------------------
  target/arm/translate.c     | 17 +++++++-
  3 files changed, 24 insertions(+), 82 deletions(-)
-Resolves: https://gitlab.com/qemu-project/qemu/-/issues/364
+diff --git a/target/arm/translate.h b/target/arm/translate.h
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+index XXXXXXX..XXXXXXX 100644
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+--- a/target/arm/translate.h
-Message-id: 20210602020720.47679-1-richard.henderson@linaro.org
++++ b/target/arm/translate.h
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
----
+  * VMVN and VBIC (when cmode < 14 && op == 1).
- target/arm/translate-a64.c | 13 ++++++++++---
+  *
- 1 file changed, 10 insertions(+), 3 deletions(-)
+  * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
+- * callers must catch this.
 + * callers must catch this; we return the 64-bit constant value defined
 + * for AArch64.
   *
   * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
   * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
-     int o3_opc = extract32(insn, 12, 4);
+ {
-     bool r = extract32(insn, 22, 1);
+     int rd = extract32(insn, 0, 5);
-     bool a = extract32(insn, 23, 1);
+     int cmode = extract32(insn, 12, 4);
--    TCGv_i64 tcg_rs, clean_addr;
+-    int cmode_3_1 = extract32(cmode, 1, 3);
-+    TCGv_i64 tcg_rs, tcg_rt, clean_addr;
+-    int cmode_0 = extract32(cmode, 0, 1);
-     AtomicThreeOpFn *fn = NULL;
+     int o2 = extract32(insn, 11, 1);
-+    MemOp mop = s->be_data | size | MO_ALIGN;
+     uint64_t abcdefgh = extract32(insn, 5, 5) | (extract32(insn, 16, 3) << 5);
+     bool is_neg = extract32(insn, 29, 1);
-     if (is_vector || !dc_isar_feature(aa64_atomics, s)) {
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
-         unallocated_encoding(s);
+         return;
-@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
+     }
 -    /* See AdvSIMDExpandImm() in ARM ARM */
 -    switch (cmode_3_1) {
 -    case 0: /* Replicate(Zeros(24):imm8, 2) */
 -    case 1: /* Replicate(Zeros(16):imm8:Zeros(8), 2) */
 -    case 2: /* Replicate(Zeros(8):imm8:Zeros(16), 2) */
 -    case 3: /* Replicate(imm8:Zeros(24), 2) */
 -    {
 -        int shift = cmode_3_1 * 8;
 -        imm = bitfield_replicate(abcdefgh << shift, 32);
 -        break;
 -    }
 -    case 4: /* Replicate(Zeros(8):imm8, 4) */
 -    case 5: /* Replicate(imm8:Zeros(8), 4) */
 -    {
 -        int shift = (cmode_3_1 & 0x1) * 8;
 -        imm = bitfield_replicate(abcdefgh << shift, 16);
 -        break;
 -    }
 -    case 6:
 -        if (cmode_0) {
 -            /* Replicate(Zeros(8):imm8:Ones(16), 2) */
 -            imm = (abcdefgh << 16) | 0xffff;
 -        } else {
 -            /* Replicate(Zeros(16):imm8:Ones(8), 2) */
 -            imm = (abcdefgh << 8) | 0xff;
 -        }
 -        imm = bitfield_replicate(imm, 32);
 -        break;
 -    case 7:
 -        if (!cmode_0 && !is_neg) {
 -            imm = bitfield_replicate(abcdefgh, 8);
 -        } else if (!cmode_0 && is_neg) {
 -            int i;
 -            imm = 0;
 -            for (i = 0; i < 8; i++) {
 -                if ((abcdefgh) & (1 << i)) {
 -                    imm |= 0xffULL << (i * 8);
 -                }
 -            }
 -        } else if (cmode_0) {
 -            if (is_neg) {
 -                imm = (abcdefgh & 0x3f) << 48;
 -                if (abcdefgh & 0x80) {
 -                    imm |= 0x8000000000000000ULL;
 -                }
 -                if (abcdefgh & 0x40) {
 -                    imm |= 0x3fc0000000000000ULL;
 -                } else {
 -                    imm |= 0x4000000000000000ULL;
 -                }
 -            } else {
 -                if (o2) {
 -                    /* FMOV (vector, immediate) - half-precision */
 -                    imm = vfp_expand_imm(MO_16, abcdefgh);
 -                    /* now duplicate across the lanes */
 -                    imm = bitfield_replicate(imm, 16);
 -                } else {
 -                    imm = (abcdefgh & 0x3f) << 19;
 -                    if (abcdefgh & 0x80) {
 -                        imm |= 0x80000000;
 -                    }
 -                    if (abcdefgh & 0x40) {
 -                        imm |= 0x3e000000;
 -                    } else {
 -                        imm |= 0x40000000;
 -                    }
 -                    imm |= (imm << 32);
 -                }
 -            }
 -        }
 -        break;
 -    default:
 -        g_assert_not_reached();
 -    }
 -
 -    if (cmode_3_1 != 7 && is_neg) {
 -        imm = ~imm;
 +    if (cmode == 15 && o2 && !is_neg) {
 +        /* FMOV (vector, immediate) - half-precision */
 +        imm = vfp_expand_imm(MO_16, abcdefgh);
 +        /* now duplicate across the lanes */
 +        imm = bitfield_replicate(imm, 16);
 +    } else {
 +        imm = asimd_imm_const(abcdefgh, cmode, is_neg);
      }
      if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) {
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
      case 14:
          if (op) {
              /*
 -             * This is the only case where the top and bottom 32 bits
 -             * of the encoded constant differ.
 +             * This and cmode == 15 op == 1 are the only cases where
 +             * the top and bottom 32 bits of the encoded constant differ.
               */
              uint64_t imm64 = 0;
              int n;
@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
          imm |= (imm << 8) | (imm << 16) | (imm << 24);
          break;
-     case 004: /* LDSMAX */
+     case 15:
-         fn = tcg_gen_atomic_fetch_smax_i64;
++        if (op) {
-+        mop |= MO_SIGN;
++            /* Reserved encoding for AArch32; valid for AArch64 */
 +            uint64_t imm64 = (uint64_t)(imm & 0x3f) << 48;
 +            if (imm & 0x80) {
 +                imm64 |= 0x8000000000000000ULL;
 +            }
 +            if (imm & 0x40) {
 +                imm64 |= 0x3fc0000000000000ULL;
 +            } else {
 +                imm64 |= 0x4000000000000000ULL;
 +            }
 +            return imm64;
 +        }
          imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
              | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
          break;
-     case 005: /* LDSMIN */
-         fn = tcg_gen_atomic_fetch_smin_i64;
-+        mop |= MO_SIGN;
-         break;
-     case 006: /* LDUMAX */
-         fn = tcg_gen_atomic_fetch_umax_i64;
-@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
-     }
-     tcg_rs = read_cpu_reg(s, rs, true);
-+    tcg_rt = cpu_reg(s, rt);
-     if (o3_opc == 1) { /* LDCLR */
-         tcg_gen_not_i64(tcg_rs, tcg_rs);
-@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
-     /* The tcg atomic primitives are all full barriers.  Therefore we
-      * can ignore the Acquire and Release bits of this instruction.
-      */
--    fn(cpu_reg(s, rt), clean_addr, tcg_rs, get_mem_index(s),
--       s->be_data | size | MO_ALIGN);
-+    fn(tcg_rt, clean_addr, tcg_rs, get_mem_index(s), mop);
-+
-+    if ((mop & MO_SIGN) && size != MO_64) {
-+        tcg_gen_ext32u_i64(tcg_rt, tcg_rt);
-+    }
- }
- /*
 --
 .20.1
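
For readers following the cmode == 15, op == 1 case in the hunk above, here is a minimal standalone sketch of the same bit manipulation outside QEMU. The name expand_fp64_imm8 is hypothetical and not part of the patch; the body only mirrors the arithmetic shown in the diff.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU function): expand an 8-bit
 * "abcdefgh" immediate into a 64-bit double-precision bit pattern,
 * mirroring the cmode == 15, op == 1 arithmetic in the hunk above.
 */
static uint64_t expand_fp64_imm8(uint32_t imm)
{
    /* Bits c..h land in bits [53:48]: low exponent bits and top fraction bits. */
    uint64_t imm64 = (uint64_t)(imm & 0x3f) << 48;

    /* Bit a is the sign bit. */
    if (imm & 0x80) {
        imm64 |= 0x8000000000000000ULL;
    }
    /* Bit b selects the upper exponent bits: [61:54] set, or only bit 62. */
    if (imm & 0x40) {
        imm64 |= 0x3fc0000000000000ULL;
    } else {
        imm64 |= 0x4000000000000000ULL;
    }
    return imm64;
}
```

With this sketch, imm = 0x70 gives 0x3ff0000000000000 (the IEEE double 1.0), and imm = 0x00 gives 0x4000000000000000 (2.0). The top and bottom 32 bits differ, which is what the updated comment in the hunk points out.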

-[PULL 17/45] target/arm: Unify unallocated path in disas_fp_1src
+[PULL 11/24] target/arm: Use dup_const() instead of bitfield_replicate()
-From: Richard Henderson <richard.henderson@linaro.org>
+Use dup_const() instead of bitfield_replicate() in
 disas_simd_mod_imm().
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+(We can't replace the other use of bitfield_replicate() in this file,
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+in logic_imm_decode_wmask(), because that location needs to handle 2
-Message-id: 20210525225817.400336-3-richard.henderson@linaro.org
+and 4 bit elements, which dup_const() cannot.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-6-peter.maydell@linaro.org
 ---
- target/arm/translate-a64.c | 15 ++++++---------
+ target/arm/translate-a64.c | 2 +-
-file changed, 6 insertions(+), 9 deletions(-)
+file changed, 1 insertion(+), 1 deletion(-)
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
-     int rd = extract32(insn, 0, 5);
+         /* FMOV (vector, immediate) - half-precision */
+         imm = vfp_expand_imm(MO_16, abcdefgh);
-     if (mos) {
+         /* now duplicate across the lanes */
--        unallocated_encoding(s);
+-        imm = bitfield_replicate(imm, 16);
--        return;
++        imm = dup_const(MO_16, imm);
-+        goto do_unallocated;
+     } else {
-     }
+         imm = asimd_imm_const(abcdefgh, cmode, is_neg);
      switch (opcode) {
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
          /* FCVT between half, single and double precision */
          int dtype = extract32(opcode, 0, 2);
          if (type == 2 || dtype == type) {
 -            unallocated_encoding(s);
 -            return;
 +            goto do_unallocated;
          }
          if (!fp_access_check(s)) {
              return;
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
      case 0x10 ... 0x13: /* FRINT{32,64}{X,Z} */
          if (type > 1 || !dc_isar_feature(aa64_frint, s)) {
 -            unallocated_encoding(s);
 -            return;
 +            goto do_unallocated;
          }
          /* fall through */
      case 0x0 ... 0x3:
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
              break;
          case 3:
              if (!dc_isar_feature(aa64_fp16, s)) {
 -                unallocated_encoding(s);
 -                return;
 +                goto do_unallocated;
              }
              if (!fp_access_check(s)) {
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
              handle_fp_1src_half(s, opcode, rd, rn);
              break;
          default:
 -            unallocated_encoding(s);
 +            goto do_unallocated;
          }
          break;
      default:
 +    do_unallocated:
          unallocated_encoding(s);
          break;
      }
 --
 .20.1
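
The commit message for the dup_const() change notes that 2 and 4 bit elements still need bitfield_replicate(). The sketch below illustrates the difference under stated assumptions: replicate_bits and dup16 are hypothetical names, and neither is claimed to be how QEMU implements bitfield_replicate() or dup_const(). They only show "replicate an element of arbitrary power-of-two width" versus "replicate a 16-bit element".

```c
#include <stdint.h>

/*
 * Hypothetical sketch: replicate the low `e` bits of `mask` across
 * all 64 bits. Assumes `e` is a power of two between 1 and 64 and
 * that `mask` has no bits set above bit e-1.
 */
static uint64_t replicate_bits(uint64_t mask, unsigned e)
{
    if (e == 0) {
        return mask; /* guard against an endless loop on bad input */
    }
    while (e < 64) {
        mask |= mask << e;
        e *= 2;
    }
    return mask;
}

/*
 * Hypothetical sketch of the fixed 16-bit case: one multiply copies
 * the value into each of the four 16-bit lanes.
 */
static uint64_t dup16(uint16_t v)
{
    return (uint64_t)v * 0x0001000100010001ULL;
}
```

For a half-precision 1.0 (0x3c00) both routes give 0x3c003c003c003c00. Only the generic loop can handle a 2-bit element such as replicate_bits(0x1, 2), which gives 0x5555555555555555.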

-[PULL 24/45] target/arm: Implement bfloat widening fma (vector)
+[PULL 12/24] target/arm: Implement MVE logical immediate insns
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE logical-immediate insns (VMOV, VMVN,
 VORR and VBIC). These have essentially the same encoding
 as their Neon equivalents, and we implement the decode
 in the same way.
-This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-and VFMA{B,T}.BF16 for AArch32 NEON.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-7-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  4 +++
  target/arm/mve.decode      | 17 +++++++++++++
  target/arm/mve_helper.c    | 24 ++++++++++++++++++
  target/arm/translate-mve.c | 50 ++++++++++++++++++++++++++++++++++++++
 files changed, 95 insertions(+)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210525225817.400336-10-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.h           |  3 +++
  target/arm/neon-shared.decode |  3 +++
  target/arm/sve.decode         |  3 +++
  target/arm/translate-a64.c    | 13 +++++++++----
  target/arm/translate-neon.c   |  9 +++++++++
  target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
  target/arm/vec_helper.c       | 16 ++++++++++++++++
 files changed, 73 insertions(+), 4 deletions(-)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
- DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
-                    void, ptr, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
 +                   void, ptr, ptr, ptr, ptr, ptr, i32)
 +
- #ifdef TARGET_AARCH64
++DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
- #include "helper-a64.h"
++DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
- #include "helper-sve.h"
++DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
-diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-shared.decode
+--- a/target/arm/mve.decode
-+++ b/target/arm/neon-shared.decode
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
+@@ -XXX,XX +XXX,XX @@
- VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
+ # VQDMULL has size in bit 28: 0 for 16 bit, 1 for 32 bit
-                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ %size_28 28:1 !function=plus_1
-+VFMA_b16       1111 110 0 0.11 .... .... 1000 . q:1 . 1 .... \
++# 1imm format immediate
-+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
++%imm_28_16_0 28:1 16:3 0:4
 +
- VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
+ &vldr_vstr rn qd imm p a w size l u
-                vn=%vn_dp vd=%vd_dp size=1
+ &1op qd qm size
- VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
+ &2op qd qm qn size
-diff --git a/target/arm/sve.decode b/target/arm/sve.decode
+ &2scalar qd qn rm size
 +&1imm qd imm cmode op
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
  @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
  @2op_sz28 .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn \
       size=%size_28
 +@1imm .... .... .... .... .... cmode:4 .. op:1 . .... &1imm qd=%qd imm=%imm_28_16_0
  # The _rev suffix indicates that Vn and Vm are reversed. This is
  # the case for shifts. In the Arm ARM these insns are documented
@@ -XXX,XX +XXX,XX @@ VADDV            111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rd
  # Predicate operations
  %mask_22_13      22:1 13:3
  VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 +
 +# Logical immediate operations (1 reg and modified-immediate)
 +
 +# The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
 +# not in a way we can conveniently represent in decodetree without
 +# a lot of repetition:
 +# VORR: op=0, (cmode & 1) && cmode < 12
 +# VBIC: op=1, (cmode & 1) && cmode < 12
 +# VMOV: everything else
 +# So we have a single decode line and check the cmode/op in the
 +# trans function.
 +Vimm_1r 111 . 1111 1 . 00 0 ... ... 0 .... 0 1 . 1 .... @1imm
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/sve.decode
+--- a/target/arm/mve_helper.c
-+++ b/target/arm/sve.decode
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
+@@ -XXX,XX +XXX,XX @@ DO_1OP(vnegw, 4, int32_t, DO_NEG)
- FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
+ DO_1OP(vfnegh, 8, uint64_t, DO_FNEGH)
- FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
+ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
-+BFMLALB_zzzw    01100100 11 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
++/*
-+BFMLALT_zzzw    01100100 11 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
++ * 1 operand immediates: Vda is destination and possibly also one source.
 + * All these insns work at 64-bit widths.
 + */
 +#define DO_1OP_IMM(OP, FN)                                              \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vda, uint64_t imm)    \
 +    {                                                                   \
 +        uint64_t *da = vda;                                             \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        for (e = 0; e < 16 / 8; e++, mask >>= 8) {                      \
 +            mergemask(&da[H8(e)], FN(da[H8(e)], imm), mask);            \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
- ### SVE2 floating-point bfloat16 dot-product
++#define DO_MOVI(N, I) (I)
- BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
++#define DO_ANDI(N, I) ((N) & (I))
++#define DO_ORRI(N, I) ((N) | (I))
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
++
 +DO_1OP_IMM(vmovi, DO_MOVI)
 +DO_1OP_IMM(vandi, DO_ANDI)
 +DO_1OP_IMM(vorri, DO_ORRI)
 +
  #define DO_2OP(OP, ESIZE, TYPE, FN)                                     \
      void HELPER(glue(mve_, OP))(CPUARMState *env,                       \
                                  void *vd, void *vn, void *vm)           \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/translate-mve.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
-         }
+ typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
-         feature = dc_isar_feature(aa64_bf16, s);
+ typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
-         break;
+ typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
--    case 0x1f: /* BFDOT */
++typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
-+    case 0x1f:
-         switch (size) {
+ /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
--        case 1:
+ static inline long mve_qreg_offset(unsigned reg)
-+        case 1: /* BFDOT */
+@@ -XXX,XX +XXX,XX @@ static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
-+        case 3: /* BFMLAL{B,T} */
+     mve_update_eci(s);
              feature = dc_isar_feature(aa64_bf16, s);
              break;
          default:
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
      case 0xd: /* BFMMLA */
          gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
          return;
 -    case 0xf: /* BFDOT */
 +    case 0xf:
          switch (size) {
 -        case 1:
 +        case 1: /* BFDOT */
              gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
              break;
 +        case 3: /* BFMLAL{B,T} */
 +            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, false, is_q,
 +                              gen_helper_gvec_bfmlal);
 +            break;
          default:
              g_assert_not_reached();
          }
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
      return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
                          gen_helper_gvec_bfmmla);
  }
 +
 +static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
 +{
 +    if (!dc_isar_feature(aa32_bf16, s)) {
 +        return false;
 +    }
 +    return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
 +                             gen_helper_gvec_bfmlal);
 +}
 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-sve.c
 +++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
      }
      return true;
  }
 +
-+static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
++static bool do_1imm(DisasContext *s, arg_1imm *a, MVEGenOneOpImmFn *fn)
 +{
-+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
++    TCGv_ptr qd;
 +    uint64_t imm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd) ||
 +        !fn) {
 +        return false;
 +    }
-+    if (sve_access_check(s)) {
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
++        return true;
-+        unsigned vsz = vec_full_reg_size(s);
++    }
 +
-+        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
++    imm = asimd_imm_const(a->imm, a->cmode, a->op);
-+                           vec_full_reg_offset(s, a->rn),
++
-+                           vec_full_reg_offset(s, a->rm),
++    qd = mve_qreg_ptr(a->qd);
-+                           vec_full_reg_offset(s, a->ra),
++    fn(cpu_env, qd, tcg_constant_i64(imm));
-+                           status, vsz, vsz, sel,
++    tcg_temp_free_ptr(qd);
-+                           gen_helper_gvec_bfmlal);
++    mve_update_eci(s);
 +        tcg_temp_free_ptr(status);
 +    }
 +    return true;
 +}
 +
-+static bool trans_BFMLALB_zzzw(DisasContext *s, arg_rrrr_esz *a)
++static bool trans_Vimm_1r(DisasContext *s, arg_1imm *a)
 +{
-+    return do_BFMLAL_zzzw(s, a, false);
++    /* Handle decode of cmode/op here between VORR/VBIC/VMOV */
-+}
++    MVEGenOneOpImmFn *fn;
 +
-+static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
++    if ((a->cmode & 1) && a->cmode < 12) {
-+{
++        if (a->op) {
-+    return do_BFMLAL_zzzw(s, a, true);
++            /*
-+}
++             * For op=1, the immediate will be inverted by asimd_imm_const(),
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
++             * so the VBIC becomes a logical AND operation.
-index XXXXXXX..XXXXXXX 100644
++             */
---- a/target/arm/vec_helper.c
++            fn = gen_helper_mve_vandi;
-+++ b/target/arm/vec_helper.c
++        } else {
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
++            fn = gen_helper_mve_vorri;
-     }
++        }
-     clear_tail(d, opr_sz, simd_maxsz(desc));
++    } else {
- }
++        /* There is one unallocated cmode/op combination in this space */
-+
++        if (a->cmode == 15 && a->op == 1) {
-+void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
++            return false;
-+                         void *stat, uint32_t desc)
++        }
-+{
++        /* asimd_imm_const() sorts out VMVNI vs VMOVI for us */
-+    intptr_t i, opr_sz = simd_oprsz(desc);
++        fn = gen_helper_mve_vmovi;
 +    intptr_t sel = simd_data(desc);
 +    float32 *d = vd, *a = va;
 +    bfloat16 *n = vn, *m = vm;
 +
 +    for (i = 0; i < opr_sz / 4; ++i) {
 +        float32 nn = n[H2(i * 2 + sel)] << 16;
 +        float32 mm = m[H2(i * 2 + sel)] << 16;
 +        d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], 0, stat);
 +    }
-+    clear_tail(d, opr_sz, simd_maxsz(desc));
++    return do_1imm(s, a, fn);
 +}
 --
 .20.1
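
The cmode/op rules described in the mve.decode comment and applied in trans_Vimm_1r() can be sketched as a small standalone classifier. This is an illustration only: LogicImmOp, classify_1imm and apply_1imm are hypothetical names, and apply_1imm assumes the immediate has already been expanded (and, for op=1, inverted) the way the patch says asimd_imm_const() does.

```c
#include <stdint.h>

/* Hypothetical labels for the operations selected by cmode/op. */
typedef enum {
    OP_VORR,
    OP_VBIC,
    OP_VMOV,        /* also covers VMVN: the immediate is pre-inverted */
    OP_UNALLOCATED
} LogicImmOp;

/* Hypothetical classifier following the rules in the decode comment. */
static LogicImmOp classify_1imm(unsigned cmode, unsigned op)
{
    if ((cmode & 1) && cmode < 12) {
        return op ? OP_VBIC : OP_VORR;
    }
    if (cmode == 15 && op == 1) {
        return OP_UNALLOCATED; /* the one unallocated combination */
    }
    return OP_VMOV;
}

/*
 * Hypothetical per-64-bit operation. `imm` is the expanded constant;
 * for VBIC it is assumed to be inverted already, so a plain AND
 * clears the requested bits.
 */
static uint64_t apply_1imm(LogicImmOp kind, uint64_t old, uint64_t imm)
{
    switch (kind) {
    case OP_VORR:
        return old | imm;
    case OP_VBIC:
        return old & imm;
    case OP_VMOV:
        return imm;
    default:
        return old; /* unallocated: leave the value untouched here */
    }
}
```

This mirrors why the patch needs only three helpers (vmovi, vandi, vorri): VBIC reduces to an AND once the constant is inverted, and VMVN reduces to a move of the inverted constant.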

-[PULL 25/45] target/arm: Implement bfloat widening fma (indexed)
+[PULL 13/24] target/arm: Implement MVE vector shift left by immediate insns
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE shift-vector-left-by-immediate insns VSHL, VQSHL
+and VQSHLU.
-This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
-and VFMA{B,T}.BF16 for AArch32 NEON.
+The size-and-immediate encoding here is the same as Neon, and we
+handle it the same way neon-dp.decode does.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210525225817.400336-11-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-8-peter.maydell@linaro.org
 ---
- target/arm/helper.h           |  2 ++
+ target/arm/helper-mve.h    | 16 +++++++++++
- target/arm/neon-shared.decode |  2 ++
+ target/arm/mve.decode      | 23 +++++++++++++++
- target/arm/sve.decode         |  2 ++
+ target/arm/mve_helper.c    | 57 ++++++++++++++++++++++++++++++++++++++
- target/arm/translate-a64.c    | 15 ++++++++++++++-
+ target/arm/translate-mve.c | 51 ++++++++++++++++++++++++++++++++++
- target/arm/translate-neon.c   | 10 ++++++++++
+files changed, 147 insertions(+)
- target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
- target/arm/vec_helper.c       | 22 ++++++++++++++++++++++
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-files changed, 82 insertions(+), 1 deletion(-)
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper-mve.h
-diff --git a/target/arm/helper.h b/target/arm/helper.h
++++ b/target/arm/helper-mve.h
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
---- a/target/arm/helper.h
+ DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
-+++ b/target/arm/helper.h
+ DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
++
- DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
++DEF_HELPER_FLAGS_4(mve_vshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-                    void, ptr, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
++DEF_HELPER_FLAGS_4(mve_vshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+                   void, ptr, ptr, ptr, ptr, ptr, i32)
++
++DEF_HELPER_FLAGS_4(mve_vqshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #ifdef TARGET_AARCH64
++DEF_HELPER_FLAGS_4(mve_vqshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #include "helper-a64.h"
++DEF_HELPER_FLAGS_4(mve_vqshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
++
-index XXXXXXX..XXXXXXX 100644
++DEF_HELPER_FLAGS_4(mve_vqshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
---- a/target/arm/neon-shared.decode
++DEF_HELPER_FLAGS_4(mve_vqshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+++ b/target/arm/neon-shared.decode
++DEF_HELPER_FLAGS_4(mve_vqshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
++
-                rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
++DEF_HELPER_FLAGS_4(mve_vqshlui_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
++DEF_HELPER_FLAGS_4(mve_vqshlui_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-                index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
++DEF_HELPER_FLAGS_4(mve_vqshlui_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+VFMA_b16_scal  1111 1110 0.11 .... .... 1000 . q:1 . 1 . vm:3 \
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-+               index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp
+index XXXXXXX..XXXXXXX 100644
-diff --git a/target/arm/sve.decode b/target/arm/sve.decode
+--- a/target/arm/mve.decode
-index XXXXXXX..XXXXXXX 100644
++++ b/target/arm/mve.decode
---- a/target/arm/sve.decode
+@@ -XXX,XX +XXX,XX @@
-+++ b/target/arm/sve.decode
+ &2op qd qm qn size
-@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
+ &2scalar qd qn rm size
- FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
+ &1imm qd imm cmode op
- FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
++&2shift qd qm shift size
- FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
-+BFMLALB_zzxw    01100100 11 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
+ @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
-+BFMLALT_zzxw    01100100 11 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
+ # Note that both Rn and Qd are 3 bits only (no D bit)
+@@ -XXX,XX +XXX,XX @@
- ### SVE2 floating-point bfloat16 dot-product (indexed)
+ @2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
- BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
+ @2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
-index XXXXXXX..XXXXXXX 100644
++@2_shl_b .... .... .. 001 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
---- a/target/arm/translate-a64.c
++@2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
-+++ b/target/arm/translate-a64.c
++@2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
++
-                 unallocated_encoding(s);
+ # Vector loads and stores
-                 return;
-             }
+ # Widening loads and narrowing stores:
-+            size = MO_32;
+@@ -XXX,XX +XXX,XX @@ VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
-             break;
+ # So we have a single decode line and check the cmode/op in the
-         case 1: /* BFDOT */
+ # trans function.
-             if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
+ Vimm_1r 111 . 1111 1 . 00 0 ... ... 0 .... 0 1 . 1 .... @1imm
-                 unallocated_encoding(s);
++
-                 return;
++# Shifts by immediate
-             }
++
-+            size = MO_32;
++VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
-+            break;
++VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
-+        case 3: /* BFMLAL{B,T} */
++VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
-+            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
++
-+                unallocated_encoding(s);
++VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
-+                return;
++VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
-+            }
++VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
-+            /* can't set is_fp without other incorrect size checks */
++
-+            size = MO_16;
++VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
-             break;
++VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
-         default:
++VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
-             unallocated_encoding(s);
++
-             return;
++VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_b
-         }
++VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_h
--        size = MO_32;
++VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_w
-         break;
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-     case 0x11: /* FCMLA #0 */
+index XXXXXXX..XXXXXXX 100644
-     case 0x13: /* FCMLA #90 */
+--- a/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
++++ b/target/arm/mve_helper.c
-             gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
-                              gen_helper_gvec_usdot_idx_b);
+     WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, true, satp)
-             return;
+ #define DO_UQRSHL_OP(N, M, satp) \
-+        case 3: /* BFMLAL{B,T} */
+     WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, true, satp)
-+            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
++#define DO_SUQSHL_OP(N, M, satp) \
-+                              gen_helper_gvec_bfmlal_idx);
++    WRAP_QRSHL_HELPER(do_suqrshl_bhs, N, M, false, satp)
-+            return;
-         }
+ DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
-         g_assert_not_reached();
+ DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
-     case 0x11: /* FCMLA #0 */
+@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvsw, 4, uint32_t)
-diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
+ DO_VADDV(vaddvub, 1, uint8_t)
-index XXXXXXX..XXXXXXX 100644
+ DO_VADDV(vaddvuh, 2, uint16_t)
---- a/target/arm/translate-neon.c
+ DO_VADDV(vaddvuw, 4, uint32_t)
-+++ b/target/arm/translate-neon.c
++
-@@ -XXX,XX +XXX,XX @@ static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
++/* Shifts by immediate */
-     return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
++#define DO_2SHIFT(OP, ESIZE, TYPE, FN)                          \
-                              gen_helper_gvec_bfmlal);
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
 +        TYPE *d = vd, *m = vm;                                  \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            mergemask(&d[H##ESIZE(e)],                          \
 +                      FN(m[H##ESIZE(e)], shift), mask);         \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +    }
 +
 +#define DO_2SHIFT_SAT(OP, ESIZE, TYPE, FN)                      \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
 +        TYPE *d = vd, *m = vm;                                  \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        bool qc = false;                                        \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            bool sat = false;                                   \
 +            mergemask(&d[H##ESIZE(e)],                          \
 +                      FN(m[H##ESIZE(e)], shift, &sat), mask);   \
 +            qc |= sat & mask & 1;                               \
 +        }                                                       \
 +        if (qc) {                                               \
 +            env->vfp.qc[0] = qc;                                \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +    }
 +
 +/* provide unsigned 2-op shift helpers for all sizes */
 +#define DO_2SHIFT_U(OP, FN)                     \
 +    DO_2SHIFT(OP##b, 1, uint8_t, FN)            \
 +    DO_2SHIFT(OP##h, 2, uint16_t, FN)           \
 +    DO_2SHIFT(OP##w, 4, uint32_t, FN)
 +
 +#define DO_2SHIFT_SAT_U(OP, FN)                 \
 +    DO_2SHIFT_SAT(OP##b, 1, uint8_t, FN)        \
 +    DO_2SHIFT_SAT(OP##h, 2, uint16_t, FN)       \
 +    DO_2SHIFT_SAT(OP##w, 4, uint32_t, FN)
 +#define DO_2SHIFT_SAT_S(OP, FN)                 \
 +    DO_2SHIFT_SAT(OP##b, 1, int8_t, FN)         \
 +    DO_2SHIFT_SAT(OP##h, 2, int16_t, FN)        \
 +    DO_2SHIFT_SAT(OP##w, 4, int32_t, FN)
 +
 +DO_2SHIFT_U(vshli_u, DO_VSHLU)
 +DO_2SHIFT_SAT_U(vqshli_u, DO_UQSHL_OP)
 +DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
 +DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
  typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1imm *a)
      }
      return do_1imm(s, a, fn);
  }
 +
-+static bool trans_VFMA_b16_scal(DisasContext *s, arg_VFMA_b16_scal *a)
++static bool do_2shift(DisasContext *s, arg_2shift *a, MVEGenTwoOpShiftFn fn,
 +                      bool negateshift)
 +{
-+    if (!dc_isar_feature(aa32_bf16, s)) {
++    TCGv_ptr qd, qm;
 +    int shift = a->shift;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd | a->qm) ||
 +        !fn) {
 +        return false;
 +    }
-+    return do_neon_ddda_fpst(s, 6, a->vd, a->vn, a->vm,
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+                             (a->index << 1) | a->q, FPST_STD,
++        return true;
-+                             gen_helper_gvec_bfmlal_idx);
++    }
-+}
++
-diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
++    /*
-index XXXXXXX..XXXXXXX 100644
++     * When we handle a right shift insn using a left-shift helper
---- a/target/arm/translate-sve.c
++     * which permits a negative shift count to indicate a right-shift,
-+++ b/target/arm/translate-sve.c
++     * we must negate the shift count.
-@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
++     */
- {
++    if (negateshift) {
-     return do_BFMLAL_zzzw(s, a, true);
++        shift = -shift;
- }
++    }
 +
-+static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
++    qd = mve_qreg_ptr(a->qd);
-+{
++    qm = mve_qreg_ptr(a->qm);
-+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
++    fn(cpu_env, qd, qm, tcg_constant_i32(shift));
-+        return false;
++    tcg_temp_free_ptr(qd);
-+    }
++    tcg_temp_free_ptr(qm);
-+    if (sve_access_check(s)) {
++    mve_update_eci(s);
 +        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
 +        unsigned vsz = vec_full_reg_size(s);
 +
 +        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
 +                           vec_full_reg_offset(s, a->rn),
 +                           vec_full_reg_offset(s, a->rm),
 +                           vec_full_reg_offset(s, a->ra),
 +                           status, vsz, vsz, (a->index << 1) | sel,
 +                           gen_helper_gvec_bfmlal_idx);
 +        tcg_temp_free_ptr(status);
 +    }
 +    return true;
 +}
 +
-+static bool trans_BFMLALB_zzxw(DisasContext *s, arg_rrxr_esz *a)
++#define DO_2SHIFT(INSN, FN, NEGATESHIFT)                         \
-+{
++    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
-+    return do_BFMLAL_zzxw(s, a, false);
++    {                                                           \
-+}
++        static MVEGenTwoOpShiftFn * const fns[] = {             \
-+
++            gen_helper_mve_##FN##b,                             \
-+static bool trans_BFMLALT_zzxw(DisasContext *s, arg_rrxr_esz *a)
++            gen_helper_mve_##FN##h,                             \
-+{
++            gen_helper_mve_##FN##w,                             \
-+    return do_BFMLAL_zzxw(s, a, true);
++            NULL,                                               \
-+}
++        };                                                      \
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
++        return do_2shift(s, a, fns[a->size], NEGATESHIFT);      \
-index XXXXXXX..XXXXXXX 100644
++    }
---- a/target/arm/vec_helper.c
++
-+++ b/target/arm/vec_helper.c
++DO_2SHIFT(VSHLI, vshli_u, false)
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
++DO_2SHIFT(VQSHLI_S, vqshli_s, false)
-     }
++DO_2SHIFT(VQSHLI_U, vqshli_u, false)
-     clear_tail(d, opr_sz, simd_maxsz(desc));
++DO_2SHIFT(VQSHLUI, vqshlui_s, false)
  }
 +
 +void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm,
 +                             void *va, void *stat, uint32_t desc)
 +{
 +    intptr_t i, j, opr_sz = simd_oprsz(desc);
 +    intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1);
 +    intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 1, 3);
 +    intptr_t elements = opr_sz / 4;
 +    intptr_t eltspersegment = MIN(16 / 4, elements);
 +    float32 *d = vd, *a = va;
 +    bfloat16 *n = vn, *m = vm;
 +
 +    for (i = 0; i < elements; i += eltspersegment) {
 +        float32 m_idx = m[H2(2 * i + index)] << 16;
 +
 +        for (j = i; j < i + eltspersegment; j++) {
 +            float32 n_j = n[H2(2 * j + sel)] << 16;
 +            d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], 0, stat);
 +        }
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
 .20.1

-[PULL 30/45] hvf: Move cpu functions into common directory
+[PULL 14/24] target/arm: Implement MVE vector shift right by immediate insns
-From: Alexander Graf <agraf@csgraf.de>
+Implement the MVE vector shift right by immediate insns VSHRI and
 VRSHRI.  As with Neon, we implement these by using helper functions
 which perform left shifts but allow negative shift counts to indicate
 right shifts.
-Until now, Hypervisor.framework has only been available on x86_64 systems.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-With Apple Silicon shipping now, it extends its reach to aarch64. To
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-prepare for support for multiple architectures, let's start moving common
+Message-id: 20210628135835.6690-9-peter.maydell@linaro.org
-code out into its own accel directory.
+---
  target/arm/helper-mve.h     | 12 ++++++++++++
  target/arm/translate.h      | 20 ++++++++++++++++++++
  target/arm/mve.decode       | 28 ++++++++++++++++++++++++++++
  target/arm/mve_helper.c     |  7 +++++++
  target/arm/translate-mve.c  |  5 +++++
  target/arm/translate-neon.c | 18 ------------------
 files changed, 72 insertions(+), 18 deletions(-)
-This patch moves CPU and memory operations over. While at it, make sure
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 the code is consumable on non-i386 systems.
 Signed-off-by: Alexander Graf <agraf@csgraf.de>
 Reviewed-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20210519202253.76782-4-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  include/sysemu/hvf_int.h   |   4 +
  target/i386/hvf/hvf-i386.h |   2 -
  target/i386/hvf/x86hvf.h   |   2 -
  accel/hvf/hvf-accel-ops.c  | 308 ++++++++++++++++++++++++++++++++++++-
  target/i386/hvf/hvf.c      | 302 ------------------------------------
 files changed, 311 insertions(+), 307 deletions(-)
 diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/sysemu/hvf_int.h
+--- a/target/arm/helper-mve.h
-+++ b/include/sysemu/hvf_int.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
+ DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
- #include <Hypervisor/hv.h>
+ DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
-+void hvf_set_phys_mem(MemoryRegionSection *, bool);
++DEF_HELPER_FLAGS_4(mve_vshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- void assert_hvf_ok(hv_return_t ret);
++DEF_HELPER_FLAGS_4(mve_vshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
++DEF_HELPER_FLAGS_4(mve_vshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+int hvf_put_registers(CPUState *);
++
-+int hvf_get_registers(CPUState *);
+ DEF_HELPER_FLAGS_4(mve_vshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #endif
+ DEF_HELPER_FLAGS_4(mve_vshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vqshlui_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vqshlui_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vqshlui_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vrshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/hvf-i386.h
+--- a/target/arm/translate.h
-+++ b/target/i386/hvf/hvf-i386.h
++++ b/target/arm/translate.h
-@@ -XXX,XX +XXX,XX @@ struct HVFState {
+@@ -XXX,XX +XXX,XX @@ static inline int times_2_plus_1(DisasContext *s, int x)
- };
+     return x * 2 + 1;
- extern HVFState *hvf_state;
+ }
--void hvf_set_phys_mem(MemoryRegionSection *, bool);
++static inline int rsub_64(DisasContext *s, int x)
  void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
 -hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
  #ifdef NEED_CPU_H
  /* Functions exported to host specific mode */
 diff --git a/target/i386/hvf/x86hvf.h b/target/i386/hvf/x86hvf.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/x86hvf.h
 +++ b/target/i386/hvf/x86hvf.h
@@ -XXX,XX +XXX,XX @@
  #include "x86_descr.h"
  int hvf_process_events(CPUState *);
 -int hvf_put_registers(CPUState *);
 -int hvf_get_registers(CPUState *);
  bool hvf_inject_interrupts(CPUState *);
  void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
                       SegmentCache *qseg, bool is_tr);
 diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
 index XXXXXXX..XXXXXXX 100644
 --- a/accel/hvf/hvf-accel-ops.c
 +++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/osdep.h"
  #include "qemu/error-report.h"
  #include "qemu/main-loop.h"
 +#include "exec/address-spaces.h"
 +#include "exec/exec-all.h"
 +#include "sysemu/cpus.h"
  #include "sysemu/hvf.h"
 +#include "sysemu/hvf_int.h"
  #include "sysemu/runstate.h"
 -#include "target/i386/cpu.h"
  #include "qemu/guest-random.h"
  #include "hvf-accel-ops.h"
 +HVFState *hvf_state;
 +
 +/* Memory slots */
 +
 +hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
 +{
-+    hvf_slot *slot;
++    return 64 - x;
 +    int x;
 +    for (x = 0; x < hvf_state->num_slots; ++x) {
 +        slot = &hvf_state->slots[x];
 +        if (slot->size && start < (slot->start + slot->size) &&
 +            (start + size) > slot->start) {
 +            return slot;
 +        }
 +    }
 +    return NULL;
 +}
 +
-+struct mac_slot {
++static inline int rsub_32(DisasContext *s, int x)
 +    int present;
 +    uint64_t size;
 +    uint64_t gpa_start;
 +    uint64_t gva;
 +};
 +
 +struct mac_slot mac_slots[32];
 +
 +static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
 +{
-+    struct mac_slot *macslot;
++    return 32 - x;
 +    hv_return_t ret;
 +
 +    macslot = &mac_slots[slot->slot_id];
 +
 +    if (macslot->present) {
 +        if (macslot->size != slot->size) {
 +            macslot->present = 0;
 +            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
 +            assert_hvf_ok(ret);
 +        }
 +    }
 +
 +    if (!slot->size) {
 +        return 0;
 +    }
 +
 +    macslot->present = 1;
 +    macslot->gpa_start = slot->start;
 +    macslot->size = slot->size;
 +    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
 +    assert_hvf_ok(ret);
 +    return 0;
 +}
 +
-+void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
++static inline int rsub_16(DisasContext *s, int x)
 +{
-+    hvf_slot *mem;
++    return 16 - x;
 +    MemoryRegion *area = section->mr;
 +    bool writeable = !area->readonly && !area->rom_device;
 +    hv_memory_flags_t flags;
 +
 +    if (!memory_region_is_ram(area)) {
 +        if (writeable) {
 +            return;
 +        } else if (!memory_region_is_romd(area)) {
 +            /*
 +             * If the memory device is not in romd_mode, then we actually want
 +             * to remove the hvf memory slot so all accesses will trap.
 +             */
 +             add = false;
 +        }
 +    }
 +
 +    mem = hvf_find_overlap_slot(
 +            section->offset_within_address_space,
 +            int128_get64(section->size));
 +
 +    if (mem && add) {
 +        if (mem->size == int128_get64(section->size) &&
 +            mem->start == section->offset_within_address_space &&
 +            mem->mem == (memory_region_get_ram_ptr(area) +
 +            section->offset_within_region)) {
 +            return; /* Same region was attempted to register, go away. */
 +        }
 +    }
 +
 +    /* Region needs to be reset. set the size to 0 and remap it. */
 +    if (mem) {
 +        mem->size = 0;
 +        if (do_hvf_set_memory(mem, 0)) {
 +            error_report("Failed to reset overlapping slot");
 +            abort();
 +        }
 +    }
 +
 +    if (!add) {
 +        return;
 +    }
 +
 +    if (area->readonly ||
 +        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
 +        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
 +    } else {
 +        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
 +    }
 +
 +    /* Now make a new slot. */
 +    int x;
 +
 +    for (x = 0; x < hvf_state->num_slots; ++x) {
 +        mem = &hvf_state->slots[x];
 +        if (!mem->size) {
 +            break;
 +        }
 +    }
 +
 +    if (x == hvf_state->num_slots) {
 +        error_report("No free slots");
 +        abort();
 +    }
 +
 +    mem->size = int128_get64(section->size);
 +    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
 +    mem->start = section->offset_within_address_space;
 +    mem->region = area;
 +
 +    if (do_hvf_set_memory(mem, flags)) {
 +        error_report("Error registering new memory slot");
 +        abort();
 +    }
 +}
 +
-+static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
++static inline int rsub_8(DisasContext *s, int x)
 +{
-+    if (!cpu->vcpu_dirty) {
++    return 8 - x;
 +        hvf_get_registers(cpu);
 +        cpu->vcpu_dirty = true;
 +    }
 +}
 +
-+void hvf_cpu_synchronize_state(CPUState *cpu)
+ static inline int arm_dc_feature(DisasContext *dc, int feature)
-+{
+ {
-+    if (!cpu->vcpu_dirty) {
+     return (dc->features & (1ULL << feature)) != 0;
-+        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-+    }
+index XXXXXXX..XXXXXXX 100644
-+}
+--- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  @2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
  @2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
 +# Right shifts are encoded as N - shift, where N is the element size in bits.
 +%rshift_i5  16:5 !function=rsub_32
 +%rshift_i4  16:4 !function=rsub_16
 +%rshift_i3  16:3 !function=rsub_8
 +
-+static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
++@2_shr_b .... .... .. 001 ... .... .... .... .... &2shift qd=%qd qm=%qm \
-+                                              run_on_cpu_data arg)
++         size=0 shift=%rshift_i3
-+{
++@2_shr_h .... .... .. 01 .... .... .... .... .... &2shift qd=%qd qm=%qm \
-+    hvf_put_registers(cpu);
++         size=1 shift=%rshift_i4
-+    cpu->vcpu_dirty = false;
++@2_shr_w .... .... .. 1 ..... .... .... .... .... &2shift qd=%qd qm=%qm \
-+}
++         size=2 shift=%rshift_i5
 +
-+void hvf_cpu_synchronize_post_reset(CPUState *cpu)
+ # Vector loads and stores
-+{
-+    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+ # Widening loads and narrowing stores:
-+}
+@@ -XXX,XX +XXX,XX @@ VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
  VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_b
  VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_h
  VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_w
 +
-+static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
++VSHRI_S           111 0 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_b
-+                                             run_on_cpu_data arg)
++VSHRI_S           111 0 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_h
-+{
++VSHRI_S           111 0 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_w
 +    hvf_put_registers(cpu);
 +    cpu->vcpu_dirty = false;
 +}
 +
-+void hvf_cpu_synchronize_post_init(CPUState *cpu)
++VSHRI_U           111 1 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_b
-+{
++VSHRI_U           111 1 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_h
-+    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
++VSHRI_U           111 1 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_w
 +}
 +
-+static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
++VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
-+                                              run_on_cpu_data arg)
++VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
-+{
++VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 +    cpu->vcpu_dirty = true;
 +}
 +
-+void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
++VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
-+{
++VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
-+    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
++VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
-+}
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 +
 +static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
 +{
 +    hvf_slot *slot;
 +
 +    slot = hvf_find_overlap_slot(
 +            section->offset_within_address_space,
 +            int128_get64(section->size));
 +
 +    /* protect region against writes; begin tracking it */
 +    if (on) {
 +        slot->flags |= HVF_SLOT_LOG;
 +        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
 +                      HV_MEMORY_READ);
 +    /* stop tracking region*/
 +    } else {
 +        slot->flags &= ~HVF_SLOT_LOG;
 +        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
 +                      HV_MEMORY_READ | HV_MEMORY_WRITE);
 +    }
 +}
 +
 +static void hvf_log_start(MemoryListener *listener,
 +                          MemoryRegionSection *section, int old, int new)
 +{
 +    if (old != 0) {
 +        return;
 +    }
 +
 +    hvf_set_dirty_tracking(section, 1);
 +}
 +
 +static void hvf_log_stop(MemoryListener *listener,
 +                         MemoryRegionSection *section, int old, int new)
 +{
 +    if (new != 0) {
 +        return;
 +    }
 +
 +    hvf_set_dirty_tracking(section, 0);
 +}
 +
 +static void hvf_log_sync(MemoryListener *listener,
 +                         MemoryRegionSection *section)
 +{
 +    /*
 +     * sync of dirty pages is handled elsewhere; just make sure we keep
 +     * tracking the region.
 +     */
 +    hvf_set_dirty_tracking(section, 1);
 +}
 +
 +static void hvf_region_add(MemoryListener *listener,
 +                           MemoryRegionSection *section)
 +{
 +    hvf_set_phys_mem(section, true);
 +}
 +
 +static void hvf_region_del(MemoryListener *listener,
 +                           MemoryRegionSection *section)
 +{
 +    hvf_set_phys_mem(section, false);
 +}
 +
 +static MemoryListener hvf_memory_listener = {
 +    .priority = 10,
 +    .region_add = hvf_region_add,
 +    .region_del = hvf_region_del,
 +    .log_start = hvf_log_start,
 +    .log_stop = hvf_log_stop,
 +    .log_sync = hvf_log_sync,
 +};
 +
 +static void dummy_signal(int sig)
 +{
 +}
 +
 +bool hvf_allowed;
 +
 +static int hvf_accel_init(MachineState *ms)
 +{
 +    int x;
 +    hv_return_t ret;
 +    HVFState *s;
 +
 +    ret = hv_vm_create(HV_VM_DEFAULT);
 +    assert_hvf_ok(ret);
 +
 +    s = g_new0(HVFState, 1);
 +
 +    s->num_slots = 32;
 +    for (x = 0; x < s->num_slots; ++x) {
 +        s->slots[x].size = 0;
 +        s->slots[x].slot_id = x;
 +    }
 +
 +    hvf_state = s;
 +    memory_listener_register(&hvf_memory_listener, &address_space_memory);
 +    return 0;
 +}
 +
 +static void hvf_accel_class_init(ObjectClass *oc, void *data)
 +{
 +    AccelClass *ac = ACCEL_CLASS(oc);
 +    ac->name = "HVF";
 +    ac->init_machine = hvf_accel_init;
 +    ac->allowed = &hvf_allowed;
 +}
 +
 +static const TypeInfo hvf_accel_type = {
 +    .name = TYPE_HVF_ACCEL,
 +    .parent = TYPE_ACCEL,
 +    .class_init = hvf_accel_class_init,
 +};
 +
 +static void hvf_type_init(void)
 +{
 +    type_register_static(&hvf_accel_type);
 +}
 +
 +type_init(hvf_type_init);
 +
  /*
   * The HVF-specific vCPU thread function. This one should only run when the host
   * CPU supports the VMX "unrestricted guest" feature.
 diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/hvf.c
+--- a/target/arm/mve_helper.c
-+++ b/target/i386/hvf/hvf.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvuw, 4, uint32_t)
+     DO_2SHIFT(OP##b, 1, uint8_t, FN)            \
- #include "hvf-accel-ops.h"
+     DO_2SHIFT(OP##h, 2, uint16_t, FN)           \
+     DO_2SHIFT(OP##w, 4, uint32_t, FN)
--HVFState *hvf_state;
++#define DO_2SHIFT_S(OP, FN)                     \
--
++    DO_2SHIFT(OP##b, 1, int8_t, FN)             \
--/* Memory slots */
++    DO_2SHIFT(OP##h, 2, int16_t, FN)            \
--hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
++    DO_2SHIFT(OP##w, 4, int32_t, FN)
  #define DO_2SHIFT_SAT_U(OP, FN)                 \
      DO_2SHIFT_SAT(OP##b, 1, uint8_t, FN)        \
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvuw, 4, uint32_t)
      DO_2SHIFT_SAT(OP##w, 4, int32_t, FN)
  DO_2SHIFT_U(vshli_u, DO_VSHLU)
 +DO_2SHIFT_S(vshli_s, DO_VSHLS)
  DO_2SHIFT_SAT_U(vqshli_u, DO_UQSHL_OP)
  DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
  DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
 +DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
 +DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHLI, vshli_u, false)
  DO_2SHIFT(VQSHLI_S, vqshli_s, false)
  DO_2SHIFT(VQSHLI_U, vqshli_u, false)
  DO_2SHIFT(VQSHLUI, vqshlui_s, false)
 +/* These right shifts use a left-shift helper with negated shift count */
 +DO_2SHIFT(VSHRI_S, vshli_s, true)
 +DO_2SHIFT(VSHRI_U, vshli_u, true)
 +DO_2SHIFT(VRSHRI_S, vrshli_s, true)
 +DO_2SHIFT(VRSHRI_U, vrshli_u, true)
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static inline int plus1(DisasContext *s, int x)
      return x + 1;
  }
 -static inline int rsub_64(DisasContext *s, int x)
 -{
--    hvf_slot *slot;
+-    return 64 - x;
 -    int x;
 -    for (x = 0; x < hvf_state->num_slots; ++x) {
 -        slot = &hvf_state->slots[x];
 -        if (slot->size && start < (slot->start + slot->size) &&
 -            (start + size) > slot->start) {
 -            return slot;
 -        }
 -    }
 -    return NULL;
 -}
 -
--struct mac_slot {
+-static inline int rsub_32(DisasContext *s, int x)
 -    int present;
 -    uint64_t size;
 -    uint64_t gpa_start;
 -    uint64_t gva;
 -};
 -
 -struct mac_slot mac_slots[32];
 -
 -static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
 -{
--    struct mac_slot *macslot;
+-    return 32 - x;
--    hv_return_t ret;
+-}
--
+-static inline int rsub_16(DisasContext *s, int x)
--    macslot = &mac_slots[slot->slot_id];
+-{
--
+-    return 16 - x;
--    if (macslot->present) {
+-}
--        if (macslot->size != slot->size) {
+-static inline int rsub_8(DisasContext *s, int x)
--            macslot->present = 0;
+-{
--            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
+-    return 8 - x;
 -            assert_hvf_ok(ret);
 -        }
 -    }
 -
 -    if (!slot->size) {
 -        return 0;
 -    }
 -
 -    macslot->present = 1;
 -    macslot->gpa_start = slot->start;
 -    macslot->size = slot->size;
 -    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
 -    assert_hvf_ok(ret);
 -    return 0;
 -}
 -
--void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
+ static inline int neon_3same_fp_size(DisasContext *s, int x)
 -{
 -    hvf_slot *mem;
 -    MemoryRegion *area = section->mr;
 -    bool writeable = !area->readonly && !area->rom_device;
 -    hv_memory_flags_t flags;
 -
 -    if (!memory_region_is_ram(area)) {
 -        if (writeable) {
 -            return;
 -        } else if (!memory_region_is_romd(area)) {
 -            /*
 -             * If the memory device is not in romd_mode, then we actually want
 -             * to remove the hvf memory slot so all accesses will trap.
 -             */
 -             add = false;
 -        }
 -    }
 -
 -    mem = hvf_find_overlap_slot(
 -            section->offset_within_address_space,
 -            int128_get64(section->size));
 -
 -    if (mem && add) {
 -        if (mem->size == int128_get64(section->size) &&
 -            mem->start == section->offset_within_address_space &&
 -            mem->mem == (memory_region_get_ram_ptr(area) +
 -            section->offset_within_region)) {
 -            return; /* Same region was attempted to register, go away. */
 -        }
 -    }
 -
 -    /* Region needs to be reset. set the size to 0 and remap it. */
 -    if (mem) {
 -        mem->size = 0;
 -        if (do_hvf_set_memory(mem, 0)) {
 -            error_report("Failed to reset overlapping slot");
 -            abort();
 -        }
 -    }
 -
 -    if (!add) {
 -        return;
 -    }
 -
 -    if (area->readonly ||
 -        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
 -        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
 -    } else {
 -        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
 -    }
 -
 -    /* Now make a new slot. */
 -    int x;
 -
 -    for (x = 0; x < hvf_state->num_slots; ++x) {
 -        mem = &hvf_state->slots[x];
 -        if (!mem->size) {
 -            break;
 -        }
 -    }
 -
 -    if (x == hvf_state->num_slots) {
 -        error_report("No free slots");
 -        abort();
 -    }
 -
 -    mem->size = int128_get64(section->size);
 -    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
 -    mem->start = section->offset_within_address_space;
 -    mem->region = area;
 -
 -    if (do_hvf_set_memory(mem, flags)) {
 -        error_report("Error registering new memory slot");
 -        abort();
 -    }
 -}
 -
  void vmx_update_tpr(CPUState *cpu)
  {
-     /* TODO: need integrate APIC handling */
+     /* Convert 0==fp32, 1==fp16 into a MO_* value */
@@ -XXX,XX +XXX,XX @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
      }
  }
 -static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
 -{
 -    if (!cpu->vcpu_dirty) {
 -        hvf_get_registers(cpu);
 -        cpu->vcpu_dirty = true;
 -    }
 -}
 -
 -void hvf_cpu_synchronize_state(CPUState *cpu)
 -{
 -    if (!cpu->vcpu_dirty) {
 -        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
 -    }
 -}
 -
 -static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
 -                                              run_on_cpu_data arg)
 -{
 -    hvf_put_registers(cpu);
 -    cpu->vcpu_dirty = false;
 -}
 -
 -void hvf_cpu_synchronize_post_reset(CPUState *cpu)
 -{
 -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
 -}
 -
 -static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
 -                                             run_on_cpu_data arg)
 -{
 -    hvf_put_registers(cpu);
 -    cpu->vcpu_dirty = false;
 -}
 -
 -void hvf_cpu_synchronize_post_init(CPUState *cpu)
 -{
 -    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
 -}
 -
 -static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
 -                                              run_on_cpu_data arg)
 -{
 -    cpu->vcpu_dirty = true;
 -}
 -
 -void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
 -{
 -    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
 -}
 -
  static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
  {
      int read, write;
@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
      return false;
  }
 -static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
 -{
 -    hvf_slot *slot;
 -
 -    slot = hvf_find_overlap_slot(
 -            section->offset_within_address_space,
 -            int128_get64(section->size));
 -
 -    /* protect region against writes; begin tracking it */
 -    if (on) {
 -        slot->flags |= HVF_SLOT_LOG;
 -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
 -                      HV_MEMORY_READ);
 -    /* stop tracking region*/
 -    } else {
 -        slot->flags &= ~HVF_SLOT_LOG;
 -        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
 -                      HV_MEMORY_READ | HV_MEMORY_WRITE);
 -    }
 -}
 -
 -static void hvf_log_start(MemoryListener *listener,
 -                          MemoryRegionSection *section, int old, int new)
 -{
 -    if (old != 0) {
 -        return;
 -    }
 -
 -    hvf_set_dirty_tracking(section, 1);
 -}
 -
 -static void hvf_log_stop(MemoryListener *listener,
 -                         MemoryRegionSection *section, int old, int new)
 -{
 -    if (new != 0) {
 -        return;
 -    }
 -
 -    hvf_set_dirty_tracking(section, 0);
 -}
 -
 -static void hvf_log_sync(MemoryListener *listener,
 -                         MemoryRegionSection *section)
 -{
 -    /*
 -     * sync of dirty pages is handled elsewhere; just make sure we keep
 -     * tracking the region.
 -     */
 -    hvf_set_dirty_tracking(section, 1);
 -}
 -
 -static void hvf_region_add(MemoryListener *listener,
 -                           MemoryRegionSection *section)
 -{
 -    hvf_set_phys_mem(section, true);
 -}
 -
 -static void hvf_region_del(MemoryListener *listener,
 -                           MemoryRegionSection *section)
 -{
 -    hvf_set_phys_mem(section, false);
 -}
 -
 -static MemoryListener hvf_memory_listener = {
 -    .priority = 10,
 -    .region_add = hvf_region_add,
 -    .region_del = hvf_region_del,
 -    .log_start = hvf_log_start,
 -    .log_stop = hvf_log_stop,
 -    .log_sync = hvf_log_sync,
 -};
 -
  void hvf_vcpu_destroy(CPUState *cpu)
  {
      X86CPU *x86_cpu = X86_CPU(cpu);
@@ -XXX,XX +XXX,XX @@ void hvf_vcpu_destroy(CPUState *cpu)
      assert_hvf_ok(ret);
  }
 -static void dummy_signal(int sig)
 -{
 -}
 -
  static void init_tsc_freq(CPUX86State *env)
  {
      size_t length;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
      return ret;
  }
 -
 -bool hvf_allowed;
 -
 -static int hvf_accel_init(MachineState *ms)
 -{
 -    int x;
 -    hv_return_t ret;
 -    HVFState *s;
 -
 -    ret = hv_vm_create(HV_VM_DEFAULT);
 -    assert_hvf_ok(ret);
 -
 -    s = g_new0(HVFState, 1);
 -
 -    s->num_slots = 32;
 -    for (x = 0; x < s->num_slots; ++x) {
 -        s->slots[x].size = 0;
 -        s->slots[x].slot_id = x;
 -    }
 -
 -    hvf_state = s;
 -    memory_listener_register(&hvf_memory_listener, &address_space_memory);
 -    return 0;
 -}
 -
 -static void hvf_accel_class_init(ObjectClass *oc, void *data)
 -{
 -    AccelClass *ac = ACCEL_CLASS(oc);
 -    ac->name = "HVF";
 -    ac->init_machine = hvf_accel_init;
 -    ac->allowed = &hvf_allowed;
 -}
 -
 -static const TypeInfo hvf_accel_type = {
 -    .name = TYPE_HVF_ACCEL,
 -    .parent = TYPE_ACCEL,
 -    .class_init = hvf_accel_class_init,
 -};
 -
 -static void hvf_type_init(void)
 -{
 -    type_register_static(&hvf_accel_type);
 -}
 -
 -type_init(hvf_type_init);
 --
 .20.1

-[PULL 16/45] target/arm: Add isar_feature_{aa32, aa64, aa64_sve}_bf16
+[PULL 15/24] target/arm: Implement MVE VSHLL
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE VSHLL (vector shift left long) insn.  This has two
 encodings: the T1 encoding is the usual shift-by-immediate format,
 and the T2 encoding is a special case where the shift count is always
 equal to the element size.
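As a rough illustration of why the T2 case is well defined, a widening shift on byte elements can be sketched as below. This is a minimal sketch under stated assumptions, not the QEMU helper: the function name vshll_u8 is hypothetical, and it assumes that the "b" and "t" helper variants read the bottom (even-numbered) and top (odd-numbered) narrow elements of a 128-bit vector respectively.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical widening shift: read 8 of the 16 byte elements (bottom or
 * top half of each 16-bit lane), widen each to 16 bits, then shift left.
 * Because the result is twice as wide as the input, a shift equal to the
 * input element size (8 here) still keeps every input bit. */
static void vshll_u8(uint16_t dst[8], const uint8_t src[16],
                     int top, unsigned shift)
{
    for (size_t i = 0; i < 8; i++) {
        uint8_t in = src[2 * i + (top ? 1 : 0)];
        dst[i] = (uint16_t)((uint32_t)in << shift);
    }
}
```

With shift set to 8, an input byte of 0xff becomes 0xff00, so no input bits are lost even at the maximum count.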
-Note that the SVE BFLOAT16 support does not require SVE2,
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-it is an independent extension.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-10-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  9 +++++++
  target/arm/mve.decode      | 53 +++++++++++++++++++++++++++++++++++---
  target/arm/mve_helper.c    | 32 +++++++++++++++++++++++
  target/arm/translate-mve.c | 15 +++++++++++
 files changed, 105 insertions(+), 4 deletions(-)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210525225817.400336-2-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/cpu.h | 15 +++++++++++++++
 file changed, 15 insertions(+)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/cpu.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_predinv(const ARMISARegisters *id)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     return FIELD_EX32(id->id_isar6, ID_ISAR6, SPECRES) != 0;
+ DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- }
+ DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+static inline bool isar_feature_aa32_bf16(const ARMISARegisters *id)
++
 +DEF_HELPER_FLAGS_4(mve_vshllbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshllbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshllbub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshllbuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshlltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshlltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshlltub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshlltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  @2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
  @2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
 +@2_shll_b .... .... ... 01 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
 +@2_shll_h .... .... ... 1  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
 +# VSHLL encoding T2 where shift == esize
 +@2_shll_esize_b .... .... .... 00 .. .... .... .... .... &2shift \
 +                qd=%qd qm=%qm size=0 shift=8
 +@2_shll_esize_h .... .... .... 01 .. .... .... .... .... &2shift \
 +                qd=%qd qm=%qm size=1 shift=16
 +
  # Right shifts are encoded as N - shift, where N is the element size in bits.
  %rshift_i5  16:5 !function=rsub_32
  %rshift_i4  16:4 !function=rsub_16
@@ -XXX,XX +XXX,XX @@ VADD             1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
  VSUB             1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
  VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
 -VMULH_S          111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 -VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 +# The VSHLL T2 encoding is not a @2op pattern, but is here because it
 +# overlaps what would be size=0b11 VMULH/VRMULH
 +{
-+    return FIELD_EX32(id->id_isar6, ID_ISAR6, BF16) != 0;
++  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
 +  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
 -VRMULH_S         111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 -VRMULH_U         111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 +  VMULH_S        111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 +}
 +
- static inline bool isar_feature_aa32_i8mm(const ARMISARegisters *id)
- {
-     return FIELD_EX32(id->id_isar6, ID_ISAR6, I8MM) != 0;
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
-     return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >= 2;
- }
-+static inline bool isar_feature_aa64_bf16(const ARMISARegisters *id)
 +{
-+    return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, BF16) != 0;
++  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
 +  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +
 +  VMULH_U        111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 +}
 +
- static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
- {
-     /* We always set the AdvSIMD and FP fields identically.  */
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve2_bitperm(const ARMISARegisters *id)
-     return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BITPERM) != 0;
- }
-+static inline bool isar_feature_aa64_sve_bf16(const ARMISARegisters *id)
 +{
-+    return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BFLOAT16) != 0;
++  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
 +  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +
 +  VRMULH_S       111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 +}
 +
- static inline bool isar_feature_aa64_sve2_sha3(const ARMISARegisters *id)
++{
- {
++  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
-     return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, SHA3) != 0;
++  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +
 +  VRMULH_U       111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 +}
  VMAX_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
  VMAX_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
@@ -XXX,XX +XXX,XX @@ VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
  VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
  VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
  VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 +
 +# VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
 +VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
 +VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
 +
 +VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
 +VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
 +
 +VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
 +VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
 +
 +VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
 +VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
  DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
  DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
  DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 +
 +/*
 + * Long shifts taking half-sized inputs from top or bottom of the input
 + * vector and producing a double-width result. ESIZE, TYPE are for
 + * the input, and LESIZE, LTYPE for the output.
 + * Unlike the normal shift helpers, we do not handle negative shift counts,
 + * because the long shift is strictly left-only.
 + */
 +#define DO_VSHLL(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE)                   \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,             \
 +                                void *vm, uint32_t shift)               \
 +    {                                                                   \
 +        LTYPE *d = vd;                                                  \
 +        TYPE *m = vm;                                                   \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned le;                                                    \
 +        assert(shift <= 16);                                            \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
 +            LTYPE r = (LTYPE)m[H##ESIZE(le * 2 + TOP)] << shift;        \
 +            mergemask(&d[H##LESIZE(le)], r, mask);                      \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +#define DO_VSHLL_ALL(OP, TOP)                                \
 +    DO_VSHLL(OP##sb, TOP, 1, int8_t, 2, int16_t)             \
 +    DO_VSHLL(OP##ub, TOP, 1, uint8_t, 2, uint16_t)           \
 +    DO_VSHLL(OP##sh, TOP, 2, int16_t, 4, int32_t)            \
 +    DO_VSHLL(OP##uh, TOP, 2, uint16_t, 4, uint32_t)          \
 +
 +DO_VSHLL_ALL(vshllb, false)
 +DO_VSHLL_ALL(vshllt, true)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHRI_S, vshli_s, true)
  DO_2SHIFT(VSHRI_U, vshli_u, true)
  DO_2SHIFT(VRSHRI_S, vrshli_s, true)
  DO_2SHIFT(VRSHRI_U, vrshli_u, true)
 +
 +#define DO_VSHLL(INSN, FN)                                      \
 +    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
 +    {                                                           \
 +        static MVEGenTwoOpShiftFn * const fns[] = {             \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +        };                                                      \
 +        return do_2shift(s, a, fns[a->size], false);            \
 +    }
 +
 +DO_VSHLL(VSHLL_BS, vshllbs)
 +DO_VSHLL(VSHLL_BU, vshllbu)
 +DO_VSHLL(VSHLL_TS, vshllts)
 +DO_VSHLL(VSHLL_TU, vshlltu)
 --
 .20.1
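
Note on the VSHLL patch above: the widening shift can be hard to read through the helper macro, so here is a minimal standalone sketch of what one signed-byte variant computes. It assumes plain little-endian arrays, ignores predication (mve_element_mask/mergemask) and the H1/H2 host-endian macros, and assumes d and m do not overlap. The name vshll_s8_sketch is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of VSHLLB.S8 / VSHLLT.S8 on unpredicated data:
 * take the bottom (top == 0) or top (top == 1) byte of each 16-bit
 * lane of the input, sign-extend it, and shift it left.
 */
static void vshll_s8_sketch(int16_t d[8], const int8_t m[16],
                            int top, unsigned shift)
{
    assert(top == 0 || top == 1);
    /* T1 encodes 1..7 for bytes; T2 is the shift == esize case (8). */
    assert(shift >= 1 && shift <= 8);
    for (unsigned le = 0; le < 8; le++) {
        /*
         * Widen first, then shift. Multiplying by a power of two avoids
         * left-shifting a negative value, which plain ISO C leaves
         * undefined; the product always fits in 16 bits here.
         */
        d[le] = (int16_t)((int32_t)m[le * 2 + top] * (1 << shift));
    }
}
```

With m = {1, -1, 2, -2, ...}, top == 0 picks the even-numbered bytes (1, 2, ...) and top == 1 picks the odd-numbered ones (-1, -2, ...), which is the same element selection as m[H1(le * 2 + TOP)] in the helper.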

-[PULL 06/45] target/arm: Implement M-profile VPR register
+[PULL 16/24] target/arm: Implement MVE VSRI, VSLI
-If MVE is implemented for an M-profile CPU then it has a VPR
+Implement the MVE VSRI and VSLI insns, which perform a
-register, which tracks predication information.
+shift-and-insert operation.
 Implement the read and write handling of this register, and
 the migration of its state.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210520152840.24453-7-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-11-peter.maydell@linaro.org
 ---
- target/arm/cpu.h           |  6 ++++++
+ target/arm/helper-mve.h    |  8 ++++++++
- target/arm/machine.c       | 19 +++++++++++++++++++
+ target/arm/mve.decode      |  9 ++++++++
- target/arm/translate-vfp.c | 38 ++++++++++++++++++++++++++++++++++++++
+ target/arm/mve_helper.c    | 42 ++++++++++++++++++++++++++++++++++++++
-files changed, 63 insertions(+)
+ target/arm/translate-mve.c |  3 +++
 files changed, 62 insertions(+)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/cpu.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vshlltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         uint32_t cpacr[M_REG_NUM_BANKS];
+ DEF_HELPER_FLAGS_4(mve_vshlltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         uint32_t nsacr;
+ DEF_HELPER_FLAGS_4(mve_vshlltub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         int ltpsize;
+ DEF_HELPER_FLAGS_4(mve_vshlltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        uint32_t vpr;
++
-     } v7m;
++DEF_HELPER_FLAGS_4(mve_vsrib, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vsrih, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     /* Information associated with an exception about to be taken:
++DEF_HELPER_FLAGS_4(mve_vsriw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ FIELD(V7M_FPCCR, ASPEN, 31, 1)
++
-      R_V7M_FPCCR_UFRDY_MASK |                   \
++DEF_HELPER_FLAGS_4(mve_vslib, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-      R_V7M_FPCCR_ASPEN_MASK)
++DEF_HELPER_FLAGS_4(mve_vslih, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vsliw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+/* v7M VPR bits */
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-+FIELD(V7M_VPR, P0, 0, 16)
+index XXXXXXX..XXXXXXX 100644
-+FIELD(V7M_VPR, MASK01, 16, 4)
+--- a/target/arm/mve.decode
-+FIELD(V7M_VPR, MASK23, 20, 4)
++++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
  VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
  VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
 +
 +# Shift-and-insert
 +VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_b
 +VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_h
 +VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_w
 +
 +VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
 +VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
 +VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
  DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
  DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 +/* Shift-and-insert; we always work with 64 bits at a time */
 +#define DO_2SHIFT_INSERT(OP, ESIZE, SHIFTFN, MASKFN)                    \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,             \
 +                                void *vm, uint32_t shift)               \
 +    {                                                                   \
 +        uint64_t *d = vd, *m = vm;                                      \
 +        uint16_t mask;                                                  \
 +        uint64_t shiftmask;                                             \
 +        unsigned e;                                                     \
 +        if (shift == 0 || shift == ESIZE * 8) {                         \
 +            /*                                                          \
 +             * Only VSLI can shift by 0; only VSRI can shift by <dt>.   \
 +             * The generic logic would give the right answer for 0 but  \
 +             * fails for <dt>.                                          \
 +             */                                                         \
 +            goto done;                                                  \
 +        }                                                               \
 +        assert(shift < ESIZE * 8);                                      \
 +        mask = mve_element_mask(env);                                   \
 +        /* ESIZE / 2 gives the MO_* value if ESIZE is in [1,2,4] */     \
 +        shiftmask = dup_const(ESIZE / 2, MASKFN(ESIZE * 8, shift));     \
 +        for (e = 0; e < 16 / 8; e++, mask >>= 8) {                      \
 +            uint64_t r = (SHIFTFN(m[H8(e)], shift) & shiftmask) |       \
 +                (d[H8(e)] & ~shiftmask);                                \
 +            mergemask(&d[H8(e)], r, mask);                              \
 +        }                                                               \
 +done:                                                                   \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
 +#define DO_SHL(N, SHIFT) ((N) << (SHIFT))
 +#define DO_SHR(N, SHIFT) ((N) >> (SHIFT))
 +#define SHL_MASK(EBITS, SHIFT) MAKE_64BIT_MASK((SHIFT), (EBITS) - (SHIFT))
 +#define SHR_MASK(EBITS, SHIFT) MAKE_64BIT_MASK(0, (EBITS) - (SHIFT))
 +
 +DO_2SHIFT_INSERT(vsrib, 1, DO_SHR, SHR_MASK)
 +DO_2SHIFT_INSERT(vsrih, 2, DO_SHR, SHR_MASK)
 +DO_2SHIFT_INSERT(vsriw, 4, DO_SHR, SHR_MASK)
 +DO_2SHIFT_INSERT(vslib, 1, DO_SHL, SHL_MASK)
 +DO_2SHIFT_INSERT(vslih, 2, DO_SHL, SHL_MASK)
 +DO_2SHIFT_INSERT(vsliw, 4, DO_SHL, SHL_MASK)
 +
  /*
-  * System register ID fields.
+  * Long shifts taking half-sized inputs from top or bottom of the input
-  */
+  * vector and producing a double-width result. ESIZE, TYPE are for
-diff --git a/target/arm/machine.c b/target/arm/machine.c
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/machine.c
+--- a/target/arm/translate-mve.c
-+++ b/target/arm/machine.c
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_fp = {
+@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHRI_U, vshli_u, true)
-     }
+ DO_2SHIFT(VRSHRI_S, vrshli_s, true)
- };
+ DO_2SHIFT(VRSHRI_U, vrshli_u, true)
-+static bool mve_needed(void *opaque)
++DO_2SHIFT(VSRI, vsri, false)
-+{
++DO_2SHIFT(VSLI, vsli, false)
 +    ARMCPU *cpu = opaque;
 +
-+    return cpu_isar_feature(aa32_mve, cpu);
+ #define DO_VSHLL(INSN, FN)                                      \
-+}
+     static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
-+
+     {                                                           \
 +static const VMStateDescription vmstate_m_mve = {
 +    .name = "cpu/m/mve",
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .needed = mve_needed,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
 +        VMSTATE_END_OF_LIST()
 +    },
 +};
 +
  static const VMStateDescription vmstate_m = {
      .name = "cpu/m",
      .version_id = 4,
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m = {
          &vmstate_m_other_sp,
          &vmstate_m_v8m,
          &vmstate_m_fp,
 +        &vmstate_m_mve,
          NULL
      }
  };
 diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c
 +++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
              return FPSysRegCheckFailed;
          }
          break;
 +    case ARM_VFP_VPR:
 +    case ARM_VFP_P0:
 +        if (!dc_isar_feature(aa32_mve, s)) {
 +            return FPSysRegCheckFailed;
 +        }
 +        break;
      default:
          return FPSysRegCheckFailed;
      }
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
          tcg_temp_free_i32(sfpa);
          break;
      }
 +    case ARM_VFP_VPR:
 +        /* Behaves as NOP if not privileged */
 +        if (IS_USER(s)) {
 +            break;
 +        }
 +        tmp = loadfn(s, opaque);
 +        store_cpu_field(tmp, v7m.vpr);
 +        break;
 +    case ARM_VFP_P0:
 +    {
 +        TCGv_i32 vpr;
 +        tmp = loadfn(s, opaque);
 +        vpr = load_cpu_field(v7m.vpr);
 +        tcg_gen_deposit_i32(vpr, vpr, tmp,
 +                            R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
 +        store_cpu_field(vpr, v7m.vpr);
 +        tcg_temp_free_i32(tmp);
 +        break;
 +    }
      default:
          g_assert_not_reached();
      }
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
          tcg_temp_free_i32(fpscr);
          break;
      }
 +    case ARM_VFP_VPR:
 +        /* Behaves as NOP if not privileged */
 +        if (IS_USER(s)) {
 +            break;
 +        }
 +        tmp = load_cpu_field(v7m.vpr);
 +        storefn(s, opaque, tmp);
 +        break;
 +    case ARM_VFP_P0:
 +        tmp = load_cpu_field(v7m.vpr);
 +        tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
 +        storefn(s, opaque, tmp);
 +        break;
      default:
          g_assert_not_reached();
      }
 --
 .20.1
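
Note on the VSRI/VSLI patch above: the helper works on 64 bits at a time with a replicated mask, which hides the per-element behaviour. The following is a minimal per-element sketch for 8-bit lanes, assuming no predication. The function names are hypothetical. For a VSLI shift of 0 the sketch follows the generic mask logic (the result is the source element), whereas the helper as posted takes the early exit for both shift == 0 and shift == esize.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: shift m left and insert, keeping d's low bits. */
static uint8_t vsli8_sketch(uint8_t d, uint8_t m, unsigned shift)
{
    assert(shift <= 7);
    /* Bits [shift, 8) come from the shifted source. */
    uint8_t mask = (uint8_t)(0xffu << shift);
    return (uint8_t)(((m << shift) & mask) | (d & ~mask));
}

/* Hypothetical sketch: shift m right and insert, keeping d's high bits. */
static uint8_t vsri8_sketch(uint8_t d, uint8_t m, unsigned shift)
{
    assert(shift >= 1 && shift <= 8);
    if (shift == 8) {
        /* Every source bit is shifted out, so d is left unchanged. */
        return d;
    }
    /* Bits [0, 8 - shift) come from the shifted source. */
    uint8_t mask = (uint8_t)(0xffu >> shift);
    return (uint8_t)(((m >> shift) & mask) | (d & ~mask));
}
```

The two masks correspond to SHL_MASK and SHR_MASK in the patch, evaluated for a single 8-bit element rather than duplicated across a 64-bit word with dup_const().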

-[PULL 04/45] target/arm: Add MVE check to VMOV_reg_sp and VMOV_reg_dp
+[PULL 17/24] target/arm: Implement MVE VSHRN, VRSHRN
-Split out the handling of VMOV_reg_sp and VMOV_reg_dp so that we can
+Implement the MVE shift-right-and-narrow insns VSHRN and VRSHRN.
-permit the insns if either FP or MVE are present.
 do_urshr() is borrowed from sve_helper.c.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210520152840.24453-5-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-12-peter.maydell@linaro.org
 ---
- target/arm/translate-vfp.c | 15 +++++++++++++--
+ target/arm/helper-mve.h    | 10 ++++++++++
-file changed, 13 insertions(+), 2 deletions(-)
+ target/arm/mve.decode      | 11 +++++++++++
  target/arm/mve_helper.c    | 40 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c | 15 ++++++++++++++
 files changed, 76 insertions(+)
-diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/translate-vfp.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vsriw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
+ DEF_HELPER_FLAGS_4(mve_vslib, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     }
+ DEF_HELPER_FLAGS_4(mve_vslih, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vsliw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32, aa32_fpsp_v2)
++
--DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64, aa32_fpdp_v2)
++DEF_HELPER_FLAGS_4(mve_vshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+#define DO_VFP_VMOV(INSN, PREC, FN)                             \
++DEF_HELPER_FLAGS_4(mve_vshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+    static bool trans_##INSN##_##PREC(DisasContext *s,          \
++DEF_HELPER_FLAGS_4(mve_vshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+                                      arg_##INSN##_##PREC *a)   \
++DEF_HELPER_FLAGS_4(mve_vshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vrshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_w
  VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
  VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
  VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
 +
 +# Narrowing shifts (which only support b and h sizes)
 +VSHRNB            111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
 +VSHRNB            111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
 +VSHRNT            111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
 +VSHRNT            111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
 +
 +VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
 +VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
 +VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
 +VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_INSERT(vsliw, 4, DO_SHL, SHL_MASK)
  DO_VSHLL_ALL(vshllb, false)
  DO_VSHLL_ALL(vshllt, true)
 +
 +/*
 + * Narrowing right shifts, taking a double sized input, shifting it
 + * and putting the result in either the top or bottom half of the output.
 + * ESIZE, TYPE are the output, and LESIZE, LTYPE the input.
 + */
 +#define DO_VSHRN(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)       \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
-+        if (!dc_isar_feature(aa32_fp##PREC##_v2, s) &&          \
++        LTYPE *m = vm;                                          \
-+            !dc_isar_feature(aa32_mve, s)) {                    \
++        TYPE *d = vd;                                           \
-+            return false;                                       \
++        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned le;                                            \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
 +            TYPE r = FN(m[H##LESIZE(le)], shift);               \
 +            mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
 +        }                                                       \
-+        return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
++        mve_advance_vpt(env);                                   \
 +    }
 +
-+DO_VFP_VMOV(VMOV_reg, sp, tcg_gen_mov_i32)
++#define DO_VSHRN_ALL(OP, FN)                                    \
-+DO_VFP_VMOV(VMOV_reg, dp, tcg_gen_mov_i64)
++    DO_VSHRN(OP##bb, false, 1, uint8_t, 2, uint16_t, FN)        \
++    DO_VSHRN(OP##bh, false, 2, uint16_t, 4, uint32_t, FN)       \
- DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith)
++    DO_VSHRN(OP##tb, true, 1, uint8_t, 2, uint16_t, FN)         \
- DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2)
++    DO_VSHRN(OP##th, true, 2, uint16_t, 4, uint32_t, FN)
 +
 +static inline uint64_t do_urshr(uint64_t x, unsigned sh)
 +{
 +    if (likely(sh < 64)) {
 +        return (x >> sh) + ((x >> (sh - 1)) & 1);
 +    } else if (sh == 64) {
 +        return x >> 63;
 +    } else {
 +        return 0;
 +    }
 +}
 +
 +DO_VSHRN_ALL(vshrn, DO_SHR)
 +DO_VSHRN_ALL(vrshrn, do_urshr)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_VSHLL(VSHLL_BS, vshllbs)
  DO_VSHLL(VSHLL_BU, vshllbu)
  DO_VSHLL(VSHLL_TS, vshllts)
  DO_VSHLL(VSHLL_TU, vshlltu)
 +
 +#define DO_2SHIFT_N(INSN, FN)                                   \
 +    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
 +    {                                                           \
 +        static MVEGenTwoOpShiftFn * const fns[] = {             \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +        };                                                      \
 +        return do_2shift(s, a, fns[a->size], false);            \
 +    }
 +
 +DO_2SHIFT_N(VSHRNB, vshrnb)
 +DO_2SHIFT_N(VSHRNT, vshrnt)
 +DO_2SHIFT_N(VRSHRNB, vrshrnb)
 +DO_2SHIFT_N(VRSHRNT, vrshrnt)
 --
 .20.1
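
Note on the VSHRN/VRSHRN patch above: the rounding in do_urshr() adds back the last bit shifted out, and the narrowing step then simply truncates. Below is a minimal standalone sketch of one 16-bit to 8-bit lane, assuming no predication. The names urshr_sketch and vrshrn16_lane_sketch are hypothetical; the first mirrors do_urshr() from the patch with an added precondition that the shift is at least 1, which is an assumption based on the right-shift encodings here never producing 0.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned rounding shift right, mirroring do_urshr() in the patch. */
static uint64_t urshr_sketch(uint64_t x, unsigned sh)
{
    assert(sh >= 1);   /* sh - 1 below must not wrap around */
    if (sh < 64) {
        /* Plain shift plus the most significant bit that was dropped. */
        return (x >> sh) + ((x >> (sh - 1)) & 1);
    } else if (sh == 64) {
        return x >> 63;
    }
    return 0;
}

/* Hypothetical sketch of one VRSHRN lane: 16-bit input, 8-bit output. */
static uint8_t vrshrn16_lane_sketch(uint16_t m, unsigned shift)
{
    assert(shift >= 1 && shift <= 8);
    /* Narrowing is a plain truncation; there is no saturation here. */
    return (uint8_t)urshr_sketch(m, shift);
}
```

For example, 0xFFFF shifted right by 8 with rounding gives 256, which truncates to 0 in the 8-bit result. The saturating forms (VQSHRN, VQRSHRN and friends) in the following patch exist to clamp that case instead of wrapping.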

-[PULL 01/45] target/arm: Add isar feature check functions for MVE
+[PULL 18/24] target/arm: Implement MVE saturating narrowing shifts
-Add the isar feature check functions we will need for v8.1M MVE:
+Implement the MVE saturating shift-right-and-narrow insns
- * a check for MVE present: this corresponds to the pseudocode's
+VQSHRN, VQSHRUN, VQRSHRN and VQRSHRUN.
-   CheckDecodeFaults(ExtType_Mve)
- * a check for the optional floating-point part of MVE: this
+do_srshr() is borrowed from sve_helper.c.
    corresponds to CheckDecodeFaults(ExtType_MveFp)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210520152840.24453-2-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-13-peter.maydell@linaro.org
 ---
- target/arm/cpu.h | 22 ++++++++++++++++++++++
+ target/arm/helper-mve.h    |  30 +++++++++++
-file changed, 22 insertions(+)
+ target/arm/mve.decode      |  28 ++++++++++
+ target/arm/mve_helper.c    | 104 +++++++++++++++++++++++++++++++++++++
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+ target/arm/translate-mve.c |  12 +++++
-index XXXXXXX..XXXXXXX 100644
+files changed, 174 insertions(+)
---- a/target/arm/cpu.h
-+++ b/target/arm/cpu.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper-mve.h
 +++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vrshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vrshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vrshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqrshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqrshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqrshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
  VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
  VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
  VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
 +
 +VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
 +VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
 +VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
 +VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
 +VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
 +VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
 +VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
 +VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
 +
 +VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
 +VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
 +VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
 +VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
 +
 +VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
 +VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
 +VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
 +VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
 +VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
 +VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
 +VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
 +VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
 +
 +VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
 +VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
 +VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
 +VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint64_t do_urshr(uint64_t x, unsigned sh)
      }
  }
-+static inline bool isar_feature_aa32_mve(const ARMISARegisters *id)
++static inline int64_t do_srshr(int64_t x, unsigned sh)
 +{
-+    /*
++    if (likely(sh < 64)) {
-+     * Return true if MVE is supported (either integer or floating point).
++        return (x >> sh) + ((x >> (sh - 1)) & 1);
-+     * We must check for M-profile as the MVFR1 field means something
++    } else {
-+     * else for A-profile.
++        /* Rounding the sign bit always produces 0. */
-+     */
++        return 0;
-+    return isar_feature_aa32_mprofile(id) &&
++    }
 +        FIELD_EX32(id->mvfr1, MVFR1, MVE) > 0;
 +}
 +
-+static inline bool isar_feature_aa32_mve_fp(const ARMISARegisters *id)
+ DO_VSHRN_ALL(vshrn, DO_SHR)
  DO_VSHRN_ALL(vrshrn, do_urshr)
 +
 +static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
 +                                 bool *satp)
 +{
-+    /*
++    if (val > max) {
-+     * Return true if MVE is supported (either integer or floating point).
++        *satp = true;
-+     * We must check for M-profile as the MVFR1 field means something
++        return max;
-+     * else for A-profile.
++    } else if (val < min) {
-+     */
++        *satp = true;
-+    return isar_feature_aa32_mprofile(id) &&
++        return min;
-+        FIELD_EX32(id->mvfr1, MVFR1, MVE) >= 2;
++    } else {
 +        return val;
 +    }
 +}
 +
- static inline bool isar_feature_aa32_vfp_simd(const ARMISARegisters *id)
++/* Saturating narrowing right shifts */
- {
++#define DO_VSHRN_SAT(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)   \
-     /*
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
 +        LTYPE *m = vm;                                          \
 +        TYPE *d = vd;                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        bool qc = false;                                        \
 +        unsigned le;                                            \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
 +            bool sat = false;                                   \
 +            TYPE r = FN(m[H##LESIZE(le)], shift, &sat);         \
 +            mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
 +            qc |= sat && (mask & 1 << (TOP * ESIZE));           \
 +        }                                                       \
 +        if (qc) {                                               \
 +            env->vfp.qc[0] = qc;                                \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +    }
 +
 +#define DO_VSHRN_SAT_UB(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 1, uint8_t, 2, uint16_t, FN)       \
 +    DO_VSHRN_SAT(TOP, true, 1, uint8_t, 2, uint16_t, FN)
 +
 +#define DO_VSHRN_SAT_UH(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 2, uint16_t, 4, uint32_t, FN)      \
 +    DO_VSHRN_SAT(TOP, true, 2, uint16_t, 4, uint32_t, FN)
 +
 +#define DO_VSHRN_SAT_SB(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 1, int8_t, 2, int16_t, FN)         \
 +    DO_VSHRN_SAT(TOP, true, 1, int8_t, 2, int16_t, FN)
 +
 +#define DO_VSHRN_SAT_SH(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 2, int16_t, 4, int32_t, FN)        \
 +    DO_VSHRN_SAT(TOP, true, 2, int16_t, 4, int32_t, FN)
 +
 +#define DO_SHRN_SB(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), INT8_MIN, INT8_MAX, SATP)
 +#define DO_SHRN_UB(N, M, SATP)                                  \
 +    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT8_MAX, SATP)
 +#define DO_SHRUN_B(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), 0, UINT8_MAX, SATP)
 +
 +#define DO_SHRN_SH(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), INT16_MIN, INT16_MAX, SATP)
 +#define DO_SHRN_UH(N, M, SATP)                                  \
 +    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT16_MAX, SATP)
 +#define DO_SHRUN_H(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), 0, UINT16_MAX, SATP)
 +
 +#define DO_RSHRN_SB(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), INT8_MIN, INT8_MAX, SATP)
 +#define DO_RSHRN_UB(N, M, SATP)                                 \
 +    do_sat_bhs(do_urshr(N, M), 0, UINT8_MAX, SATP)
 +#define DO_RSHRUN_B(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), 0, UINT8_MAX, SATP)
 +
 +#define DO_RSHRN_SH(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), INT16_MIN, INT16_MAX, SATP)
 +#define DO_RSHRN_UH(N, M, SATP)                                 \
 +    do_sat_bhs(do_urshr(N, M), 0, UINT16_MAX, SATP)
 +#define DO_RSHRUN_H(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), 0, UINT16_MAX, SATP)
 +
 +DO_VSHRN_SAT_SB(vqshrnb_sb, vqshrnt_sb, DO_SHRN_SB)
 +DO_VSHRN_SAT_SH(vqshrnb_sh, vqshrnt_sh, DO_SHRN_SH)
 +DO_VSHRN_SAT_UB(vqshrnb_ub, vqshrnt_ub, DO_SHRN_UB)
 +DO_VSHRN_SAT_UH(vqshrnb_uh, vqshrnt_uh, DO_SHRN_UH)
 +DO_VSHRN_SAT_SB(vqshrunbb, vqshruntb, DO_SHRUN_B)
 +DO_VSHRN_SAT_SH(vqshrunbh, vqshrunth, DO_SHRUN_H)
 +
 +DO_VSHRN_SAT_SB(vqrshrnb_sb, vqrshrnt_sb, DO_RSHRN_SB)
 +DO_VSHRN_SAT_SH(vqrshrnb_sh, vqrshrnt_sh, DO_RSHRN_SH)
 +DO_VSHRN_SAT_UB(vqrshrnb_ub, vqrshrnt_ub, DO_RSHRN_UB)
 +DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
 +DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
 +DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_N(VSHRNB, vshrnb)
  DO_2SHIFT_N(VSHRNT, vshrnt)
  DO_2SHIFT_N(VRSHRNB, vrshrnb)
  DO_2SHIFT_N(VRSHRNT, vrshrnt)
 +DO_2SHIFT_N(VQSHRNB_S, vqshrnb_s)
 +DO_2SHIFT_N(VQSHRNT_S, vqshrnt_s)
 +DO_2SHIFT_N(VQSHRNB_U, vqshrnb_u)
 +DO_2SHIFT_N(VQSHRNT_U, vqshrnt_u)
 +DO_2SHIFT_N(VQSHRUNB, vqshrunb)
 +DO_2SHIFT_N(VQSHRUNT, vqshrunt)
 +DO_2SHIFT_N(VQRSHRNB_S, vqrshrnb_s)
 +DO_2SHIFT_N(VQRSHRNT_S, vqrshrnt_s)
 +DO_2SHIFT_N(VQRSHRNB_U, vqrshrnb_u)
 +DO_2SHIFT_N(VQRSHRNT_U, vqrshrnt_u)
 +DO_2SHIFT_N(VQRSHRUNB, vqrshrunb)
 +DO_2SHIFT_N(VQRSHRUNT, vqrshrunt)
 --
 .20.1
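
Reviewer note on the patch above: the saturating narrowing helpers compose a shift (optionally rounding) with a clamp to the destination range, and record whether clamping happened. A minimal single-element sketch of that composition follows. It is not the QEMU implementation: the `*_sketch` names are hypothetical, predication and the top/bottom lane selection are left out, and it assumes `>>` on a negative signed value is an arithmetic shift, as QEMU itself does.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Rounding arithmetic shift right, mirroring do_srshr() from the patch
 * for the in-range case only. Assumes 1 <= sh <= 63.
 */
static int64_t srshr_sketch(int64_t x, unsigned sh)
{
    return (x >> sh) + ((x >> (sh - 1)) & 1);
}

/* Clamp val to [min, max]; set *satp only when clamping occurred. */
static int32_t sat_sketch(int64_t val, int64_t min, int64_t max, bool *satp)
{
    if (val > max) {
        *satp = true;
        return max;
    } else if (val < min) {
        *satp = true;
        return min;
    }
    return val;
}

/* VQRSHRN-style step: signed 16-bit input, rounded, saturated to int8_t. */
static int8_t qrshrn_s16_sketch(int16_t n, unsigned shift, bool *satp)
{
    return (int8_t)sat_sketch(srshr_sketch(n, shift),
                              INT8_MIN, INT8_MAX, satp);
}

/* VQSHRUN-style step: signed 16-bit input, unrounded, saturated to uint8_t. */
static uint8_t qshrun_s16_sketch(int16_t n, unsigned shift, bool *satp)
{
    return (uint8_t)sat_sketch((int64_t)n >> shift, 0, UINT8_MAX, satp);
}
```

As in the patch, the saturation flag is sticky: the caller clears it before each element, and a clamped element only sets QC if its predicate bit is also set.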

-[PULL 05/45] target/arm: Fix return values in fp_sysreg_checks()
+Deleted patch
-The fp_sysreg_checks() function is supposed to be returning an
-FPSysRegCheckResult, which is an enum with three possible values.
-However, three places in the function "return false" (a hangover from
-a previous iteration of the design where the function just returned a
-bool).  Make these return FPSysRegCheckFailed instead (for no
-functional change, since both false and FPSysRegCheckFailed are
-zero).
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210520152840.24453-6-peter.maydell@linaro.org
----
- target/arm/translate-vfp.c | 6 +++---
-file changed, 3 insertions(+), 3 deletions(-)
-diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c
-+++ b/target/arm/translate-vfp.c
-@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
-         break;
-     case ARM_VFP_FPSCR_NZCVQC:
-         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
--            return false;
-+            return FPSysRegCheckFailed;
-         }
-         break;
-     case ARM_VFP_FPCXT_S:
-     case ARM_VFP_FPCXT_NS:
-         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
--            return false;
-+            return FPSysRegCheckFailed;
-         }
-         if (!s->v8m_secure) {
--            return false;
-+            return FPSysRegCheckFailed;
-         }
-         break;
-     default:
---
-.20.1

-[PULL 08/45] target/arm: Allow board models to specify initial NS VTOR
+[PULL 19/24] target/arm: Implement MVE VSHLC
-Currently we allow board models to specify the initial value of the
+Implement the MVE VSHLC insn, which performs a shift left of the
-Secure VTOR register, using an init-svtor property on the TYPE_ARMV7M
+entire vector with carry in bits provided from a general purpose
-object which is plumbed through to the CPU.  Allow board models to
+register and carry out bits written back to that register.
 also specify the initial value of the Non-secure VTOR via a similar
 init-nsvtor property.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210520152840.24453-10-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-14-peter.maydell@linaro.org
 ---
- include/hw/arm/armv7m.h |  2 ++
+ target/arm/helper-mve.h    |  2 ++
- target/arm/cpu.h        |  2 ++
+ target/arm/mve.decode      |  2 ++
- hw/arm/armv7m.c         |  7 +++++++
+ target/arm/mve_helper.c    | 38 ++++++++++++++++++++++++++++++++++++++
- target/arm/cpu.c        | 10 ++++++++++
+ target/arm/translate-mve.c | 30 ++++++++++++++++++++++++++++++
-files changed, 21 insertions(+)
+files changed, 72 insertions(+)
-diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/armv7m.h
+--- a/target/arm/helper-mve.h
-+++ b/include/hw/arm/armv7m.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(ARMv7MState, ARMV7M)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-  *   devices will be automatically layered on top of this view.)
+ DEF_HELPER_FLAGS_4(mve_vqrshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-  * + Property "idau": IDAU interface (forwarded to CPU object)
+ DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-  * + Property "init-svtor": secure VTOR reset value (forwarded to CPU object)
+ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+ * + Property "init-nsvtor": non-secure VTOR reset value (forwarded to CPU object)
++
-  * + Property "vfp": enable VFP (forwarded to CPU object)
++DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
-  * + Property "dsp": enable DSP (forwarded to CPU object)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
   * + Property "enable-bitband": expose bitbanded IO
@@ -XXX,XX +XXX,XX @@ struct ARMv7MState {
      MemoryRegion *board_memory;
      Object *idau;
      uint32_t init_svtor;
 +    uint32_t init_nsvtor;
      bool enable_bitband;
      bool start_powered_off;
      bool vfp;
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/mve.decode
-+++ b/target/arm/cpu.h
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
+@@ -XXX,XX +XXX,XX @@ VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
+ VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
-     /* For v8M, initial value of the Secure VTOR */
+ VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
-     uint32_t init_svtor;
+ VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
-+    /* For v8M, initial value of the Non-secure VTOR */
++
-+    uint32_t init_nsvtor;
++VSHLC             111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
      /* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or
       * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type.
 diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/armv7m.c
+--- a/target/arm/mve_helper.c
-+++ b/hw/arm/armv7m.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ DO_VSHRN_SAT_UB(vqrshrnb_ub, vqrshrnt_ub, DO_RSHRN_UB)
-             return;
+ DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
-         }
+ DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
-     }
+ DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
-+    if (object_property_find(OBJECT(s->cpu), "init-nsvtor")) {
++
-+        if (!object_property_set_uint(OBJECT(s->cpu), "init-nsvtor",
++uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
-+                                      s->init_nsvtor, errp)) {
++                           uint32_t shift)
-+            return;
++{
 +    uint32_t *d = vd;
 +    uint16_t mask = mve_element_mask(env);
 +    unsigned e;
 +    uint32_t r;
 +
 +    /*
 +     * For each 32-bit element, we shift it left, bringing in the
 +     * low 'shift' bits of rdm at the bottom. Bits shifted out at
 +     * the top become the new rdm, if the predicate mask permits.
 +     * The final rdm value is returned to update the register.
 +     * shift == 0 here means "shift by 32 bits".
 +     */
 +    if (shift == 0) {
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {
 +            r = rdm;
 +            if (mask & 1) {
 +                rdm = d[H4(e)];
 +            }
 +            mergemask(&d[H4(e)], r, mask);
 +        }
 +    } else {
 +        uint32_t shiftmask = MAKE_64BIT_MASK(0, shift);
 +
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {
 +            r = (d[H4(e)] << shift) | (rdm & shiftmask);
 +            if (mask & 1) {
 +                rdm = d[H4(e)] >> (32 - shift);
 +            }
 +            mergemask(&d[H4(e)], r, mask);
 +        }
 +    }
-     if (object_property_find(OBJECT(s->cpu), "start-powered-off")) {
++    mve_advance_vpt(env);
-         if (!object_property_set_bool(OBJECT(s->cpu), "start-powered-off",
++    return rdm;
-                                       s->start_powered_off, errp)) {
++}
-@@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = {
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
                       MemoryRegion *),
      DEFINE_PROP_LINK("idau", ARMv7MState, idau, TYPE_IDAU_INTERFACE, Object *),
      DEFINE_PROP_UINT32("init-svtor", ARMv7MState, init_svtor, 0),
 +    DEFINE_PROP_UINT32("init-nsvtor", ARMv7MState, init_nsvtor, 0),
      DEFINE_PROP_BOOL("enable-bitband", ARMv7MState, enable_bitband, false),
      DEFINE_PROP_BOOL("start-powered-off", ARMv7MState, start_powered_off,
                       false),
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/target/arm/translate-mve.c
-+++ b/target/arm/cpu.c
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
+@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_N(VQRSHRNB_U, vqrshrnb_u)
-         env->regs[14] = 0xffffffff;
+ DO_2SHIFT_N(VQRSHRNT_U, vqrshrnt_u)
+ DO_2SHIFT_N(VQRSHRUNB, vqrshrunb)
-         env->v7m.vecbase[M_REG_S] = cpu->init_svtor & 0xffffff80;
+ DO_2SHIFT_N(VQRSHRUNT, vqrshrunt)
-+        env->v7m.vecbase[M_REG_NS] = cpu->init_nsvtor & 0xffffff80;
++
++static bool trans_VSHLC(DisasContext *s, arg_VSHLC *a)
-         /* Load the initial SP and PC from offset 0 and 4 in the vector table */
++{
-         vecbase = env->v7m.vecbase[env->v7m.secure];
++    /*
-@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
++     * Whole Vector Left Shift with Carry. The carry is taken
-                                        &cpu->init_svtor,
++     * from a general purpose register and written back there.
-                                        OBJ_PROP_FLAG_READWRITE);
++     * An imm of 0 means "shift by 32".
-     }
++     */
-+    if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
++    TCGv_ptr qd;
-+        /*
++    TCGv_i32 rdm;
-+         * Initial value of the NS VTOR (for cores without the Security
++
-+         * extension, this is the only VTOR)
++    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd)) {
-+         */
++        return false;
 +        object_property_add_uint32_ptr(obj, "init-nsvtor",
 +                                       &cpu->init_nsvtor,
 +                                       OBJ_PROP_FLAG_READWRITE);
 +    }
++    if (a->rdm == 13 || a->rdm == 15) {
-     qdev_property_add_static(DEVICE(obj), &arm_cpu_cfgend_property);
++        /* CONSTRAINED UNPREDICTABLE: we UNDEF */
++        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    rdm = load_reg(s, a->rdm);
 +    gen_helper_mve_vshlc(rdm, cpu_env, qd, rdm, tcg_constant_i32(a->imm));
 +    store_reg(s, a->rdm, rdm);
 +    tcg_temp_free_ptr(qd);
 +    mve_update_eci(s);
 +    return true;
 +}
 --
 .20.1
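
Reviewer note on the patch above: the whole-vector shift with carry can be pictured as a chain through the four 32-bit lanes, where each lane takes its low bits from the running carry and passes on its own high bits. The sketch below illustrates that chain under simplifying assumptions: `vshlc_sketch` is a hypothetical name, predication (`mergemask`, the `mask & 1` test) and host-endian lane indexing (`H4`) are omitted, and the shift count is passed as 1..32 directly rather than using the encoding where 0 means 32.

```c
#include <stdint.h>

/*
 * Shift a 128-bit vector held as four 32-bit lanes (lane 0 lowest) left
 * by 'shift' bits, 1 <= shift <= 32. The low 'shift' bits of rdm are
 * shifted in at the bottom; the bits shifted out of the top lane are
 * returned as the new rdm value.
 */
static uint32_t vshlc_sketch(uint32_t d[4], uint32_t rdm, unsigned shift)
{
    for (int e = 0; e < 4; e++) {
        uint32_t old = d[e];

        if (shift == 32) {
            /* Whole-lane move: avoids an undefined 32-bit shift by 32. */
            d[e] = rdm;
            rdm = old;
        } else {
            uint32_t lowmask = (1u << shift) - 1;

            d[e] = (old << shift) | (rdm & lowmask);
            rdm = old >> (32 - shift);
        }
    }
    return rdm;
}
```

The separate shift-by-32 branch exists for the same reason as the `shift == 0` branch in the helper: a C shift of a 32-bit value by 32 is undefined, so the lane is moved whole instead.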

-[PULL 12/45] target/arm: fix missing exception class
+Deleted patch
-From: Jamie Iles <jamie@nuviainc.com>
-The DAIF and PAC checks used raise_exception_ra to raise an exception
-and unwind CPU state but raise_exception_ra is currently designed for
-handling data aborts as the syndrome is partially precomputed and
-encoded in the TB and then merged in merge_syn_data_abort when handling
-the data abort.  Using raise_exception_ra for DAIF and PAC checks
-results in an empty syndrome being retrieved from data[2] in
-restore_state_to_opc and setting ESR to 0.  This manifested as:
-  kvm [571]: Unknown exception class: esr: 0x000000 –
-  Unknown/Uncategorized
-when launching a KVM guest when the host qemu used a CPU supporting
-EL2+pointer authentication and enabling pointer authentication in the
-guest.
-Rework raise_exception_ra such that the state is restored before raising
-the exception so that the exception is not clobbered by
-restore_state_to_opc.
-Fixes: 0d43e1a2d29a ("target/arm: Add PAuth helpers")
-Cc: Richard Henderson <richard.henderson@linaro.org>
-Cc: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Jamie Iles <jamie@nuviainc.com>
-[PMM: added comment]
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/op_helper.c | 11 +++++++++--
-file changed, 9 insertions(+), 2 deletions(-)
-diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/op_helper.c
-+++ b/target/arm/op_helper.c
-@@ -XXX,XX +XXX,XX @@ void raise_exception(CPUARMState *env, uint32_t excp,
- void raise_exception_ra(CPUARMState *env, uint32_t excp, uint32_t syndrome,
-                         uint32_t target_el, uintptr_t ra)
- {
--    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
--    cpu_loop_exit_restore(cs, ra);
-+    CPUState *cs = env_cpu(env);
-+
-+    /*
-+     * restore_state_to_opc() will set env->exception.syndrome, so
-+     * we must restore CPU state here before setting the syndrome
-+     * the caller passed us, and cannot use cpu_loop_exit_restore().
-+     */
-+    cpu_restore_state(cs, ra, true);
-+    raise_exception(env, excp, syndrome, target_el);
- }
- uint64_t HELPER(neon_tbl)(CPUARMState *env, uint32_t desc,
---
-.20.1

-[PULL 13/45] target/arm: fold do_raise_exception into raise_exception
+Deleted patch
-From: Jamie Iles <jamie@nuviainc.com>
-Now that there are no other users of do_raise_exception, fold it into
-raise_exception.
-Cc: Richard Henderson <richard.henderson@linaro.org>
-Cc: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Jamie Iles <jamie@nuviainc.com>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/op_helper.c | 12 ++----------
-file changed, 2 insertions(+), 10 deletions(-)
-diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/op_helper.c
-+++ b/target/arm/op_helper.c
-@@ -XXX,XX +XXX,XX @@
- #define SIGNBIT (uint32_t)0x80000000
- #define SIGNBIT64 ((uint64_t)1 << 63)
--static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
--                                    uint32_t syndrome, uint32_t target_el)
-+void raise_exception(CPUARMState *env, uint32_t excp,
-+                     uint32_t syndrome, uint32_t target_el)
- {
-     CPUState *cs = env_cpu(env);
-@@ -XXX,XX +XXX,XX @@ static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
-     cs->exception_index = excp;
-     env->exception.syndrome = syndrome;
-     env->exception.target_el = target_el;
--
--    return cs;
--}
--
--void raise_exception(CPUARMState *env, uint32_t excp,
--                     uint32_t syndrome, uint32_t target_el)
--{
--    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
-     cpu_loop_exit(cs);
- }
---
-.20.1

-[PULL 14/45] target/arm: use raise_exception_ra for MTE check failure
+Deleted patch
-From: Jamie Iles <jamie@nuviainc.com>
-Now that raise_exception_ra restores the state before raising the
-exception we can use restore_exception_ra to perform the state restore +
-exception raising without clobbering the syndrome.
-Cc: Richard Henderson <richard.henderson@linaro.org>
-Cc: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Jamie Iles <jamie@nuviainc.com>
-[PMM: Keep the one line of the comment that is still relevant]
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/mte_helper.c | 12 +++---------
-file changed, 3 insertions(+), 9 deletions(-)
-diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mte_helper.c
-+++ b/target/arm/mte_helper.c
-@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
-     switch (tcf) {
-     case 1:
--        /*
--         * Tag check fail causes a synchronous exception.
--         *
--         * In restore_state_to_opc, we set the exception syndrome
--         * for the load or store operation.  Unwind first so we
--         * may overwrite that with the syndrome for the tag check.
--         */
--        cpu_restore_state(env_cpu(env), ra, true);
-+        /* Tag check fail causes a synchronous exception. */
-         env->exception.vaddress = dirty_ptr;
-         is_write = FIELD_EX32(desc, MTEDESC, WRITE);
-         syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
-                                     is_write, 0x11);
--        raise_exception(env, EXCP_DATA_ABORT, syn, exception_target_el(env));
-+        raise_exception_ra(env, EXCP_DATA_ABORT, syn,
-+                           exception_target_el(env), ra);
-         /* noreturn, but fall through to the assert anyway */
-     case 0:
---
-.20.1

-[PULL 23/45] target/arm: Implement bfloat16 matrix multiply accumulate
+[PULL 20/24] target/arm: Implement MVE VADDLV
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE VADDLV insn; this is similar to VADDV, except
 that it accumulates 32-bit elements into a 64-bit accumulator
 stored in a pair of general-purpose registers.
-This is BFMMLA for both AArch64 AdvSIMD and SVE,
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-and VMMLA.BF16 for AArch32 NEON.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-15-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  3 ++
  target/arm/mve.decode      |  6 +++-
  target/arm/mve_helper.c    | 19 ++++++++++++
  target/arm/translate-mve.c | 63 ++++++++++++++++++++++++++++++++++++++
 files changed, 90 insertions(+), 1 deletion(-)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210525225817.400336-9-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.h           |  3 +++
  target/arm/neon-shared.decode |  2 ++
  target/arm/sve.decode         |  6 +++--
  target/arm/translate-a64.c    | 10 +++++++++
  target/arm/translate-neon.c   |  9 ++++++++
  target/arm/translate-sve.c    | 12 ++++++++++
  target/arm/vec_helper.c       | 42 ++++++++++++++++++++++++++++++++++-
 files changed, 81 insertions(+), 3 deletions(-)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
- DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
+ DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
-                    void, ptr, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
-+DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
++DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
-+                   void, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
 +
- #ifdef TARGET_AARCH64
+ DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
- #include "helper-a64.h"
+ DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
- #include "helper-sve.h"
+ DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
-diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-shared.decode
+--- a/target/arm/mve.decode
-+++ b/target/arm/neon-shared.decode
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ VUMMLA         1111 1100 0.10 .... .... 1100 .1.1 .... \
+@@ -XXX,XX +XXX,XX @@ VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
-                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
- VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
-                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ # Vector add across vector
-+VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
+-VADDV            111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
-+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
++{
++  VADDV          111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
- VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
++  VADDLV         111 u:1 1110 1 ... 1001 ... 0 1111 00 a:1 0 qm:3 0 \
-                vn=%vn_dp vd=%vd_dp size=1
++                 rdahi=%rdahi rdalo=%rdalo
-diff --git a/target/arm/sve.decode b/target/arm/sve.decode
++}
  # Predicate operations
  %mask_22_13      22:1 13:3
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/sve.decode
+--- a/target/arm/mve_helper.c
-+++ b/target/arm/sve.decode
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz  01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5  ra=%reg_movprfx
+@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvub, 1, uint8_t)
- USDOT_zzzz      01000100 .. 0 ..... 011 110 ..... .....  @rda_rn_rm
+ DO_VADDV(vaddvuh, 2, uint16_t)
+ DO_VADDV(vaddvuw, 4, uint32_t)
- ### SVE2 floating point matrix multiply accumulate
--
++#define DO_VADDLV(OP, TYPE, LTYPE)                              \
--FMMLA           01100100 .. 1 ..... 111001 ..... .....  @rda_rn_rm
++    uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
 +                                    uint64_t ra)                \
 +    {                                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        TYPE *m = vm;                                           \
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {              \
 +            if (mask & 1) {                                     \
 +                ra += (LTYPE)m[H4(e)];                          \
 +            }                                                   \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +        return ra;                                              \
 +    }                                                           \
 +
 +DO_VADDLV(vaddlv_s, int32_t, int64_t)
 +DO_VADDLV(vaddlv_u, uint32_t, uint64_t)
 +
  /* Shifts by immediate */
  #define DO_2SHIFT(OP, ESIZE, TYPE, FN)                          \
      void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
      return true;
  }
 +static bool trans_VADDLV(DisasContext *s, arg_VADDLV *a)
 +{
-+  BFMMLA        01100100 01 1 ..... 111 001 ..... .....  @rda_rn_rm_e0
++    /*
-+  FMMLA         01100100 .. 1 ..... 111 001 ..... .....  @rda_rn_rm
++     * Vector Add Long Across Vector: accumulate the 32-bit
-+}
++     * elements of the vector into a 64-bit result stored in
++     * a pair of general-purpose registers.
- ### SVE2 Memory Gather Load Group
++     * No need to check Qm's bank: it is only 3 bits in decode.
++     */
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
++    TCGv_ptr qm;
-index XXXXXXX..XXXXXXX 100644
++    TCGv_i64 rda;
---- a/target/arm/translate-a64.c
++    TCGv_i32 rdalo, rdahi;
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          feature = dc_isar_feature(aa64_fcma, s);
          break;
 +    case 0x1d: /* BFMMLA */
 +        if (size != MO_16 || !is_q) {
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        feature = dc_isar_feature(aa64_bf16, s);
 +        break;
      case 0x1f: /* BFDOT */
          switch (size) {
          case 1:
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          return;
 +    case 0xd: /* BFMMLA */
 +        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
 +        return;
      case 0xf: /* BFDOT */
          switch (size) {
          case 1:
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VUSMMLA(DisasContext *s, arg_VUSMMLA *a)
      return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
                          gen_helper_gvec_usmmla_b);
  }
 +
-+static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
++    if (!dc_isar_feature(aa32_mve, s)) {
 +{
 +    if (!dc_isar_feature(aa32_bf16, s)) {
 +        return false;
 +    }
-+    return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
++    /*
-+                        gen_helper_gvec_bfmmla);
++     * rdahi == 13 is UNPREDICTABLE; rdahi == 15 is a related
-+}
++     * encoding; rdalo always has bit 0 clear so cannot be 13 or 15.
-diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
++     */
-index XXXXXXX..XXXXXXX 100644
++    if (a->rdahi == 13 || a->rdahi == 15) {
 --- a/target/arm/translate-sve.c
 +++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
      }
      return true;
  }
 +
 +static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
 +{
 +    if (!dc_isar_feature(aa64_sve_bf16, s)) {
 +        return false;
 +    }
-+    if (sve_access_check(s)) {
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-+        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfmmla,
++        return true;
 +                          a->rd, a->rn, a->rm, a->ra, 0);
 +    }
++
++    /*
++     * This insn is subject to beat-wise execution. Partial execution
++     * of an A=0 (no-accumulate) insn which does not execute the first
++     * beat must start with the current value of RdaHi:RdaLo, not zero.
++     */
++    if (a->a || mve_skip_first_beat(s)) {
++        /* Accumulate input from RdaHi:RdaLo */
++        rda = tcg_temp_new_i64();
++        rdalo = load_reg(s, a->rdalo);
++        rdahi = load_reg(s, a->rdahi);
++        tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
++        tcg_temp_free_i32(rdalo);
++        tcg_temp_free_i32(rdahi);
++    } else {
++        /* Accumulate starting at zero */
++        rda = tcg_const_i64(0);
++    }
++
++    qm = mve_qreg_ptr(a->qm);
++    if (a->u) {
++        gen_helper_mve_vaddlv_u(rda, cpu_env, qm, rda);
++    } else {
++        gen_helper_mve_vaddlv_s(rda, cpu_env, qm, rda);
++    }
++    tcg_temp_free_ptr(qm);
++
++    rdalo = tcg_temp_new_i32();
++    rdahi = tcg_temp_new_i32();
++    tcg_gen_extrl_i64_i32(rdalo, rda);
++    tcg_gen_extrh_i64_i32(rdahi, rda);
++    store_reg(s, a->rdalo, rdalo);
++    store_reg(s, a->rdahi, rdahi);
++    tcg_temp_free_i64(rda);
++    mve_update_eci(s);
 +    return true;
 +}
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
-+++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
-          * Process the entire segment at once, writing back the
-          * results only after we've consumed all of the inputs.
-          *
--         * Key to indicies by column:
-+         * Key to indices by column:
-          *          i   j                  i             j
-          */
-         sum0 = a[H4(0 + 0)];
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
-     }
-     clear_tail(d, opr_sz, simd_maxsz(desc));
- }
 +
-+void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
+ static bool do_1imm(DisasContext *s, arg_1imm *a, MVEGenOneOpImmFn *fn)
-+{
+ {
-+    intptr_t s, opr_sz = simd_oprsz(desc);
+     TCGv_ptr qd;
 +    float32 *d = vd, *a = va;
 +    uint32_t *n = vn, *m = vm;
 +
 +    for (s = 0; s < opr_sz / 4; s += 4) {
 +        float32 sum00, sum01, sum10, sum11;
 +
 +        /*
 +         * Process the entire segment at once, writing back the
 +         * results only after we've consumed all of the inputs.
 +         *
 +         * Key to indices by column:
 +         *               i   j           i   k             j   k
 +         */
 +        sum00 = a[s + H4(0 + 0)];
 +        sum00 = bfdotadd(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)]);
 +        sum00 = bfdotadd(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)]);
 +
 +        sum01 = a[s + H4(0 + 1)];
 +        sum01 = bfdotadd(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)]);
 +        sum01 = bfdotadd(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)]);
 +
 +        sum10 = a[s + H4(2 + 0)];
 +        sum10 = bfdotadd(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)]);
 +        sum10 = bfdotadd(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)]);
 +
 +        sum11 = a[s + H4(2 + 1)];
 +        sum11 = bfdotadd(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)]);
 +        sum11 = bfdotadd(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)]);
 +
 +        d[s + H4(0 + 0)] = sum00;
 +        d[s + H4(0 + 1)] = sum01;
 +        d[s + H4(2 + 0)] = sum10;
 +        d[s + H4(2 + 1)] = sum11;
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
 .20.1

-[PULL 19/45] target/arm: Implement vector float32 to bfloat16 conversion
+[PULL 21/24] target/arm: Implement MVE long shifts by immediate
-From: Richard Henderson <richard.henderson@linaro.org>
+The MVE extension to v8.1M includes some new shift instructions which
+sit entirely within the non-coprocessor part of the encoding space
-This is BFCVT{N,T} for both AArch64 AdvSIMD and SVE,
+and which operate only on general-purpose registers.  They take up
-and VCVT.BF16.F32 for AArch32 NEON.
+the space which was previously UNPREDICTABLE MOVS and ORRS encodings
+with Rm == 13 or 15.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Implement the long shifts by immediate, which perform shifts on a
-Message-id: 20210525225817.400336-5-richard.henderson@linaro.org
+pair of general-purpose registers treated as a 64-bit quantity, with
 an immediate shift count between 1 and 32.
 Awkwardly, because the MOVS and ORRS trans functions do not UNDEF for
 the Rm==13,15 case, we need to explicitly emit code to UNDEF for the
 cases where v8.1M now requires that.  (Trying to change MOVS and ORRS
 is too difficult, because the functions that generate the code are
 shared between a dozen different kinds of arithmetic or logical
 instruction for all A32, T16 and T32 encodings, and for some insns
 and some encodings Rm==13,15 are valid.)
 We make the helper functions we need for UQSHLL and SQSHLL take
 a 32-bit value which the helper casts to int8_t because we'll need
 these helpers also for the shift-by-register insns, where the shift
 count might be < 0 or > 32.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-16-peter.maydell@linaro.org
 ---
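As an illustration of the long shift by immediate described in the commit
message, here is a minimal standalone C sketch. It is not the QEMU code.
RegPair, WideShift, long_shift_imm, lsll, lsrl and asrl are hypothetical
names invented for this example. The sketch assumes that an encoded
immediate of 0 stands for a shift count of 32, as the "a->shim == 0"
handling in the patch suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the RdaHi:RdaLo general-purpose register pair. */
typedef struct {
    uint32_t lo;   /* RdaLo: low 32 bits of the 64-bit quantity */
    uint32_t hi;   /* RdaHi: high 32 bits of the 64-bit quantity */
} RegPair;

/* A 64-bit shift by a count that is always in the range 1..32 here. */
typedef uint64_t WideShift(uint64_t value, unsigned shift);

uint64_t lsll(uint64_t v, unsigned sh)
{
    return v << sh;
}

uint64_t lsrl(uint64_t v, unsigned sh)
{
    return v >> sh;
}

uint64_t asrl(uint64_t v, unsigned sh)
{
    /* Portable arithmetic shift: fill the vacated top bits with the sign. */
    uint64_t fill = (v >> 63) ? ~(~0ULL >> sh) : 0;
    return (v >> sh) | fill;
}

RegPair long_shift_imm(RegPair r, unsigned imm5, WideShift *fn)
{
    assert(imm5 < 32);                          /* 5-bit immediate field */
    unsigned shift = imm5 == 0 ? 32 : imm5;     /* assumed: 0 encodes 32 */

    /* Concatenate the pair, shift as one 64-bit value, split again. */
    uint64_t rda = ((uint64_t)r.hi << 32) | r.lo;
    rda = fn(rda, shift);
    r.lo = (uint32_t)rda;
    r.hi = (uint32_t)(rda >> 32);
    return r;
}
```

Passing the shift operation as a function pointer mirrors the way the patch
hands a WideShiftImmFn to a single shared translate function, so each
instruction only has to choose which shift to apply.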
- target/arm/helper-sve.h     |  4 ++++
+ target/arm/helper-mve.h |  3 ++
- target/arm/helper.h         |  1 +
+ target/arm/translate.h  |  1 +
- target/arm/neon-dp.decode   |  1 +
+ target/arm/t32.decode   | 28 +++++++++++++
- target/arm/sve.decode       |  2 ++
+ target/arm/mve_helper.c | 10 +++++
- target/arm/sve_helper.c     |  2 ++
+ target/arm/translate.c  | 90 +++++++++++++++++++++++++++++++++++++++++
- target/arm/translate-a64.c  | 17 ++++++++++++++
+files changed, 132 insertions(+)
- target/arm/translate-neon.c | 45 +++++++++++++++++++++++++++++++++++++
- target/arm/translate-sve.c  | 16 +++++++++++++
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
- target/arm/vfp_helper.c     |  7 ++++++
+index XXXXXXX..XXXXXXX 100644
-files changed, 95 insertions(+)
+--- a/target/arm/helper-mve.h
++++ b/target/arm/helper-mve.h
-diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-index XXXXXXX..XXXXXXX 100644
+ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
---- a/target/arm/helper-sve.h
-+++ b/target/arm/helper-sve.h
+ DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve_fcvt_hd, TCG_CALL_NO_RWG,
++
-                    void, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_sqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
- DEF_HELPER_FLAGS_5(sve_fcvt_sd, TCG_CALL_NO_RWG,
++DEF_HELPER_FLAGS_3(mve_uqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
-                    void, ptr, ptr, ptr, ptr, i32)
+diff --git a/target/arm/translate.h b/target/arm/translate.h
-+DEF_HELPER_FLAGS_5(sve_bfcvt, TCG_CALL_NO_RWG,
+index XXXXXXX..XXXXXXX 100644
-+                   void, ptr, ptr, ptr, ptr, i32)
+--- a/target/arm/translate.h
++++ b/target/arm/translate.h
- DEF_HELPER_FLAGS_5(sve_fcvtzs_hh, TCG_CALL_NO_RWG,
+@@ -XXX,XX +XXX,XX @@ typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
-                    void, ptr, ptr, ptr, ptr, i32)
+ typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_fcvtnt_sh, TCG_CALL_NO_RWG,
+ typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-                    void, ptr, ptr, ptr, ptr, i32)
+ typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
- DEF_HELPER_FLAGS_5(sve2_fcvtnt_ds, TCG_CALL_NO_RWG,
++typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
-                    void, ptr, ptr, ptr, ptr, i32)
-+DEF_HELPER_FLAGS_5(sve_bfcvtnt, TCG_CALL_NO_RWG,
+ /**
-+                   void, ptr, ptr, ptr, ptr, i32)
+  * arm_tbflags_from_tb:
+diff --git a/target/arm/t32.decode b/target/arm/t32.decode
- DEF_HELPER_FLAGS_5(sve2_fcvtlt_hs, TCG_CALL_NO_RWG,
+index XXXXXXX..XXXXXXX 100644
-                    void, ptr, ptr, ptr, ptr, i32)
+--- a/target/arm/t32.decode
-diff --git a/target/arm/helper.h b/target/arm/helper.h
++++ b/target/arm/t32.decode
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@
---- a/target/arm/helper.h
+ &mcr             !extern cp opc1 crn crm opc2 rt
-+++ b/target/arm/helper.h
+ &mcrr            !extern cp opc1 crm rt rt2
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
- DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
++&mve_shl_ri      rdalo rdahi shim
- DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
++
- DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
++# rdahi: bits [3:1] from insn, bit 0 is 1
-+DEF_HELPER_FLAGS_2(bfcvt_pair, TCG_CALL_NO_RWG, i32, i64, ptr)
++# rdalo: bits [3:1] from insn, bit 0 is 0
++%rdahi_9 9:3 !function=times_2_plus_1
- DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
++%rdalo_17 17:3 !function=times_2
- DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
++
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+ # Data-processing (register)
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+ %imm5_12_6       12:3 6:2
-+++ b/target/arm/neon-dp.decode
+@@ -XXX,XX +XXX,XX @@
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+ @S_xrr_shi       ....... .... .   rn:4 .... .... .. shty:2 rm:4 \
-     VRINTZ       1111 001 11 . 11 .. 10 .... 0 1011 . . 0 .... @2misc
+                  &s_rrr_shi shim=%imm5_12_6 s=1 rd=0
-     VCVT_F16_F32 1111 001 11 . 11 .. 10 .... 0 1100 0 . 0 .... @2misc_q0
++@mve_shl_ri      ....... .... . ... . . ... ... . .. .. .... \
-+    VCVT_B16_F32 1111 001 11 . 11 .. 10 .... 0 1100 1 . 0 .... @2misc_q0
++                 &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
++
-     VRINTM       1111 001 11 . 11 .. 10 .... 0 1101 . . 0 .... @2misc
+ {
+   TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
-diff --git a/target/arm/sve.decode b/target/arm/sve.decode
+   AND_rrri       1110101 0000 . .... 0 ... .... .... ....     @s_rrr_shi
-index XXXXXXX..XXXXXXX 100644
+ }
---- a/target/arm/sve.decode
+ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
-+++ b/target/arm/sve.decode
+ {
-@@ -XXX,XX +XXX,XX @@ FNMLS_zpzzz     01100101 .. 1 ..... 111 ... ..... .....         @rdn_pg_rm_ra
++  # The v8.1M MVE shift insns overlap in encoding with MOVS/ORRS
- # SVE floating-point convert precision
++  # and are distinguished by having Rm==13 or 15. Those are UNPREDICTABLE
- FCVT_sh         01100101 10 0010 00 101 ... ..... .....         @rd_pg_rn_e0
++  # cases for MOVS/ORRS. We decode the MVE cases first, ensuring that
- FCVT_hs         01100101 10 0010 01 101 ... ..... .....         @rd_pg_rn_e0
++  # they explicitly call unallocated_encoding() for cases that must UNDEF
-+BFCVT           01100101 10 0010 10 101 ... ..... .....         @rd_pg_rn_e0
++  # (eg "using a new shift insn on a v8.1M CPU without MVE"), and letting
- FCVT_dh         01100101 11 0010 00 101 ... ..... .....         @rd_pg_rn_e0
++  # the rest fall through (where ORR_rrri and MOV_rxri will end up
- FCVT_hd         01100101 11 0010 01 101 ... ..... .....         @rd_pg_rn_e0
++  # handling them as r13 and r15 accesses with the same semantics as A32).
- FCVT_ds         01100101 11 0010 10 101 ... ..... .....         @rd_pg_rn_e0
++  [
-@@ -XXX,XX +XXX,XX @@ RAX1            01000101 00 1 ..... 11110 1 ..... .....  @rd_rn_rm_e0
++    LSLL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
- FCVTXNT_ds      01100100 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
++    LSRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
- FCVTX_ds        01100101 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
++    ASRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
- FCVTNT_sh       01100100 10 0010 00 101 ... ..... .....  @rd_pg_rn_e0
++
-+BFCVTNT         01100100 10 0010 10 101 ... ..... .....  @rd_pg_rn_e0
++    UQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
- FCVTLT_hs       01100100 10 0010 01 101 ... ..... .....  @rd_pg_rn_e0
++    URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
- FCVTNT_ds       01100100 11 0010 10 101 ... ..... .....  @rd_pg_rn_e0
++    SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
- FCVTLT_sd       01100100 11 0010 11 101 ... ..... .....  @rd_pg_rn_e0
++    SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
-diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
++  ]
-index XXXXXXX..XXXXXXX 100644
++
---- a/target/arm/sve_helper.c
+   MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
-+++ b/target/arm/sve_helper.c
+   ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t vfp_float64_to_uint64_rtz(float64 f, float_status *s)
+ }
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
- DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, sve_f32_to_f16)
+index XXXXXXX..XXXXXXX 100644
- DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, sve_f16_to_f32)
+--- a/target/arm/mve_helper.c
-+DO_ZPZ_FP(sve_bfcvt,   uint32_t, H1_4, float32_to_bfloat16)
++++ b/target/arm/mve_helper.c
- DO_ZPZ_FP(sve_fcvt_dh, uint64_t,     , sve_f64_to_f16)
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
- DO_ZPZ_FP(sve_fcvt_hd, uint64_t,     , sve_f16_to_f64)
+     mve_advance_vpt(env);
- DO_ZPZ_FP(sve_fcvt_ds, uint64_t,     , float64_to_float32)
+     return rdm;
-@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
+ }
-     } while (i != 0);                                                         \
++
- }
++uint64_t HELPER(mve_sqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
++{
-+DO_FCVTNT(sve_bfcvtnt,    uint32_t, uint16_t, H1_4, H1_2, float32_to_bfloat16)
++    return do_sqrshl_d(n, (int8_t)shift, false, &env->QF);
- DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16)
++}
- DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t,     , H1_4, float64_to_float32)
++
++uint64_t HELPER(mve_uqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
++{
-index XXXXXXX..XXXXXXX 100644
++    return do_uqrshl_d(n, (int8_t)shift, false, &env->QF);
---- a/target/arm/translate-a64.c
++}
-+++ b/target/arm/translate-a64.c
+diff --git a/target/arm/translate.c b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
+index XXXXXXX..XXXXXXX 100644
-                 tcg_temp_free_i32(ahp);
+--- a/target/arm/translate.c
-             }
++++ b/target/arm/translate.c
-             break;
+@@ -XXX,XX +XXX,XX @@ static bool trans_MOVT(DisasContext *s, arg_MOVW *a)
 +        case 0x36: /* BFCVTN, BFCVTN2 */
 +            {
 +                TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
 +                gen_helper_bfcvt_pair(tcg_res[pass], tcg_op, fpst);
 +                tcg_temp_free_ptr(fpst);
 +            }
 +            break;
          case 0x56:  /* FCVTXN, FCVTXN2 */
              /* 64 bit to 32 bit float conversion
               * with von Neumann rounding (round to odd)
@@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
              }
              handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
              return;
 +        case 0x36: /* BFCVTN, BFCVTN2 */
 +            if (!dc_isar_feature(aa64_bf16, s) || size != 2) {
 +                unallocated_encoding(s);
 +                return;
 +            }
 +            if (!fp_access_check(s)) {
 +                return;
 +            }
 +            handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
 +            return;
          case 0x17: /* FCVTL, FCVTL2 */
              if (!fp_access_check(s)) {
                  return;
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      return true;
  }
-+static bool trans_VCVT_B16_F32(DisasContext *s, arg_2misc *a)
++/*
-+{
++ * v8.1M MVE wide-shifts
-+    TCGv_ptr fpst;
++ */
-+    TCGv_i64 tmp;
++static bool do_mve_shl_ri(DisasContext *s, arg_mve_shl_ri *a,
-+    TCGv_i32 dst0, dst1;
++                          WideShiftImmFn *fn)
-+
++{
-+    if (!dc_isar_feature(aa32_bf16, s)) {
++    TCGv_i64 rda;
 +    TCGv_i32 rdalo, rdahi;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
 +        return false;
 +    }
-+
++    if (a->rdahi == 15) {
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++        /* These are a different encoding (SQSHL/SRSHR/UQSHL/URSHR) */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vm) & 0x10)) {
 +        return false;
 +    }
-+
++    if (!dc_isar_feature(aa32_mve, s) ||
-+    if ((a->vm & 1) || (a->size != 1)) {
++        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
-+        return false;
++        a->rdahi == 13) {
-+    }
++        /* RdaHi == 13 is UNPREDICTABLE; we choose to UNDEF */
-+
++        unallocated_encoding(s);
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
-+    fpst = fpstatus_ptr(FPST_STD);
++    if (a->shim == 0) {
-+    tmp = tcg_temp_new_i64();
++        a->shim = 32;
-+    dst0 = tcg_temp_new_i32();
++    }
-+    dst1 = tcg_temp_new_i32();
++
-+
++    rda = tcg_temp_new_i64();
-+    read_neon_element64(tmp, a->vm, 0, MO_64);
++    rdalo = load_reg(s, a->rdalo);
-+    gen_helper_bfcvt_pair(dst0, tmp, fpst);
++    rdahi = load_reg(s, a->rdahi);
-+
++    tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
-+    read_neon_element64(tmp, a->vm, 1, MO_64);
++
-+    gen_helper_bfcvt_pair(dst1, tmp, fpst);
++    fn(rda, rda, a->shim);
 +
-+    write_neon_element32(dst0, a->vd, 0, MO_32);
++    tcg_gen_extrl_i64_i32(rdalo, rda);
-+    write_neon_element32(dst1, a->vd, 1, MO_32);
++    tcg_gen_extrh_i64_i32(rdahi, rda);
-+
++    store_reg(s, a->rdalo, rdalo);
-+    tcg_temp_free_i64(tmp);
++    store_reg(s, a->rdahi, rdahi);
-+    tcg_temp_free_i32(dst0);
++    tcg_temp_free_i64(rda);
-+    tcg_temp_free_i32(dst1);
++
 +    tcg_temp_free_ptr(fpst);
 +    return true;
 +}
 +
- static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
++static bool trans_ASRL_ri(DisasContext *s, arg_mve_shl_ri *a)
- {
++{
-     TCGv_ptr fpst;
++    return do_mve_shl_ri(s, a, tcg_gen_sari_i64);
-diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
++}
-index XXXXXXX..XXXXXXX 100644
++
---- a/target/arm/translate-sve.c
++static bool trans_LSLL_ri(DisasContext *s, arg_mve_shl_ri *a)
-+++ b/target/arm/translate-sve.c
++{
-@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_hs(DisasContext *s, arg_rpr_esz *a)
++    return do_mve_shl_ri(s, a, tcg_gen_shli_i64);
-     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_hs);
++}
- }
++
++static bool trans_LSRL_ri(DisasContext *s, arg_mve_shl_ri *a)
-+static bool trans_BFCVT(DisasContext *s, arg_rpr_esz *a)
++{
-+{
++    return do_mve_shl_ri(s, a, tcg_gen_shri_i64);
-+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
++}
-+        return false;
++
-+    }
++static void gen_mve_sqshll(TCGv_i64 r, TCGv_i64 n, int64_t shift)
-+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvt);
++{
-+}
++    gen_helper_mve_sqshll(r, cpu_env, n, tcg_constant_i32(shift));
-+
++}
- static bool trans_FCVT_dh(DisasContext *s, arg_rpr_esz *a)
++
- {
++static bool trans_SQSHLL_ri(DisasContext *s, arg_mve_shl_ri *a)
-     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_dh);
++{
-@@ -XXX,XX +XXX,XX @@ static bool trans_FCVTNT_sh(DisasContext *s, arg_rpr_esz *a)
++    return do_mve_shl_ri(s, a, gen_mve_sqshll);
-     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtnt_sh);
++}
- }
++
++static void gen_mve_uqshll(TCGv_i64 r, TCGv_i64 n, int64_t shift)
-+static bool trans_BFCVTNT(DisasContext *s, arg_rpr_esz *a)
++{
-+{
++    gen_helper_mve_uqshll(r, cpu_env, n, tcg_constant_i32(shift));
-+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
++}
-+        return false;
++
-+    }
++static bool trans_UQSHLL_ri(DisasContext *s, arg_mve_shl_ri *a)
-+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvtnt);
++{
-+}
++    return do_mve_shl_ri(s, a, gen_mve_uqshll);
-+
++}
- static bool trans_FCVTNT_ds(DisasContext *s, arg_rpr_esz *a)
++
- {
++static bool trans_SRSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
-     if (!dc_isar_feature(aa64_sve2, s)) {
++{
-diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
++    return do_mve_shl_ri(s, a, gen_srshr64_i64);
-index XXXXXXX..XXXXXXX 100644
++}
---- a/target/arm/vfp_helper.c
++
-+++ b/target/arm/vfp_helper.c
++static bool trans_URSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(bfcvt)(float32 x, void *status)
++{
-     return float32_to_bfloat16(x, status);
++    return do_mve_shl_ri(s, a, gen_urshr64_i64);
  }
 +uint32_t HELPER(bfcvt_pair)(uint64_t pair, void *status)
 +{
 +    bfloat16 lo = float32_to_bfloat16(extract64(pair, 0, 32), status);
 +    bfloat16 hi = float32_to_bfloat16(extract64(pair, 32, 32), status);
 +    return deposit32(lo, 16, 16, hi);
 +}
 +
  /*
-  * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
+  * Multiply and multiply accumulate
-  * must always round-to-nearest; the AArch64 ones honour the FPSCR
+  */
 --
 .20.1

-[PULL 18/45] target/arm: Implement scalar float32 to bfloat16 conversion
+[PULL 22/24] target/arm: Implement MVE long shifts by register
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE long shifts by register, which perform shifts on a
+pair of general-purpose registers treated as a 64-bit quantity, with
-This is the 64-bit BFCVT and the 32-bit VCVT{B,T}.BF16.F32.
+the shift count in another general-purpose register, which might be
+either positive or negative.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Like the long-shifts-by-immediate, these encodings sit in the space
-Message-id: 20210525225817.400336-4-richard.henderson@linaro.org
+that was previously the UNPREDICTABLE MOVS/ORRS with Rm==13,15.
 Because LSLL_rr and ASRL_rr overlap with both MOV_rxri/ORR_rrri and
 also with CSEL (as one of the previously-UNPREDICTABLE Rm==13 cases),
 we have to move the CSEL pattern into the same decodetree group.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-17-peter.maydell@linaro.org
 ---
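To make the "positive or negative" shift count concrete, here is a minimal
standalone C sketch of a non-saturating, non-rounding unsigned 64-bit shift
by a register value. It is not QEMU's do_uqrshl_d(), and shift_by_reg_u64 is
a hypothetical name. The sketch assumes that only the bottom byte of the
count register is used and that it is interpreted as a signed value, which
is what the (int8_t) cast in the helpers suggests.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift n left for a positive count and right for a negative count.
 * The count is the signed bottom byte of rm (assumption, see above).
 */
uint64_t shift_by_reg_u64(uint64_t n, uint32_t rm)
{
    /* Sign-extend the bottom byte without relying on a narrowing cast. */
    int shift = (int)(rm & 0xff);
    if (shift >= 128) {
        shift -= 256;
    }

    if (shift >= 64 || shift <= -64) {
        /* Every bit is shifted out in either direction. */
        return 0;
    }
    if (shift >= 0) {
        return n << shift;
    }
    return n >> -shift;
}
```

For example, a count register holding 0xFC is read as -4, so the value is
shifted right by four bits, and any bits of the register above the bottom
byte are ignored.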
- target/arm/helper.h        |  1 +
+ target/arm/helper-mve.h |  6 +++
- target/arm/vfp.decode      |  2 ++
+ target/arm/translate.h  |  1 +
- target/arm/translate-a64.c | 19 +++++++++++++++++++
+ target/arm/t32.decode   | 16 +++++--
- target/arm/translate-vfp.c | 24 ++++++++++++++++++++++++
+ target/arm/mve_helper.c | 93 +++++++++++++++++++++++++++++++++++++++++
- target/arm/vfp_helper.c    |  5 +++++
+ target/arm/translate.c  | 69 ++++++++++++++++++++++++++++++
-files changed, 51 insertions(+)
+files changed, 182 insertions(+), 3 deletions(-)
-diff --git a/target/arm/helper.h b/target/arm/helper.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
+ DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
- DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
-+DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
++DEF_HELPER_FLAGS_3(mve_sshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
++DEF_HELPER_FLAGS_3(mve_ushll, TCG_CALL_NO_RWG, i64, env, i64, i32)
- DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
+ DEF_HELPER_FLAGS_3(mve_sqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
- DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
+ DEF_HELPER_FLAGS_3(mve_uqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
-diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
++DEF_HELPER_FLAGS_3(mve_sqrshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
-index XXXXXXX..XXXXXXX 100644
++DEF_HELPER_FLAGS_3(mve_uqrshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
---- a/target/arm/vfp.decode
++DEF_HELPER_FLAGS_3(mve_sqrshrl48, TCG_CALL_NO_RWG, i64, env, i64, i32)
-+++ b/target/arm/vfp.decode
++DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
-@@ -XXX,XX +XXX,XX @@ VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
+diff --git a/target/arm/translate.h b/target/arm/translate.h
+index XXXXXXX..XXXXXXX 100644
- # VCVTB and VCVTT to f16: Vd format is always vd_sp;
+--- a/target/arm/translate.h
- # Vm format depends on size bit
++++ b/target/arm/translate.h
-+VCVT_b16_f32 ---- 1110 1.11 0011 .... 1001 t:1 1.0 .... \
+@@ -XXX,XX +XXX,XX @@ typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
-+             vd=%vd_sp vm=%vm_sp
+ typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
- VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
+ typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
-              vd=%vd_sp vm=%vm_sp
+ typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
- VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
++typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
-index XXXXXXX..XXXXXXX 100644
+ /**
---- a/target/arm/translate-a64.c
+  * arm_tbflags_from_tb:
-+++ b/target/arm/translate-a64.c
+diff --git a/target/arm/t32.decode b/target/arm/t32.decode
-@@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn)
+index XXXXXXX..XXXXXXX 100644
-     case 0x3: /* FSQRT */
+--- a/target/arm/t32.decode
-         gen_helper_vfp_sqrts(tcg_res, tcg_op, cpu_env);
++++ b/target/arm/t32.decode
-         goto done;
+@@ -XXX,XX +XXX,XX @@
-+    case 0x6: /* BFCVT */
+ &mcrr            !extern cp opc1 crm rt rt2
-+        gen_fpst = gen_helper_bfcvt;
-+        break;
+ &mve_shl_ri      rdalo rdahi shim
-     case 0x8: /* FRINTN */
++&mve_shl_rr      rdalo rdahi rm
-     case 0x9: /* FRINTP */
-     case 0xa: /* FRINTM */
+ # rdahi: bits [3:1] from insn, bit 0 is 1
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
+ # rdalo: bits [3:1] from insn, bit 0 is 0
-         }
+@@ -XXX,XX +XXX,XX @@
-         break;
+ @mve_shl_ri      ....... .... . ... . . ... ... . .. .. .... \
-+    case 0x6:
+                  &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
-+        switch (type) {
++@mve_shl_rr      ....... .... . ... . rm:4  ... . .. .. .... \
-+        case 1: /* BFCVT */
++                 &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
-+            if (!dc_isar_feature(aa64_bf16, s)) {
-+                goto do_unallocated;
+ {
-+            }
+   TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
-+            if (!fp_access_check(s)) {
+@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
-+                return;
+     URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
-+            }
+     SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
-+            handle_fp_1src_single(s, opcode, rd, rn);
+     SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
-+            break;
++
-+        default:
++    LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
-+            goto do_unallocated;
++    ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
-+        }
++    UQRSHLL64_rr 1110101 0010 1 ... 1 ....  ... 1  0000 1101  @mve_shl_rr
-+        break;
++    SQRSHRL64_rr 1110101 0010 1 ... 1 ....  ... 1  0010 1101  @mve_shl_rr
-+
++    UQRSHLL48_rr 1110101 0010 1 ... 1 ....  ... 1  1000 1101  @mve_shl_rr
-     default:
++    SQRSHRL48_rr 1110101 0010 1 ... 1 ....  ... 1  1010 1101  @mve_shl_rr
-     do_unallocated:
+   ]
-         unallocated_encoding(s);
-diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
+   MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
-index XXXXXXX..XXXXXXX 100644
+   ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi
---- a/target/arm/translate-vfp.c
++
-+++ b/target/arm/translate-vfp.c
++  # v8.1M CSEL and friends
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
++  CSEL           1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
-     return true;
+ }
- }
+ {
+   MVN_rxri       1110101 0011 . 1111 0 ... .... .... ....     @s_rxr_shi
-+static bool trans_VCVT_b16_f32(DisasContext *s, arg_VCVT_b16_f32 *a)
+@@ -XXX,XX +XXX,XX @@ SBC_rrri         1110101 1011 . .... 0 ... .... .... ....     @s_rrr_shi
-+{
+ }
-+    TCGv_ptr fpst;
+ RSB_rrri         1110101 1110 . .... 0 ... .... .... ....     @s_rrr_shi
-+    TCGv_i32 tmp;
-+
+-# v8.1M CSEL and friends
-+    if (!dc_isar_feature(aa32_bf16, s)) {
+-CSEL             1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
 -
  # Data-processing (register-shifted register)
  MOV_rxrr         1111 1010 0 shty:2 s:1 rm:4 1111 rd:4 0000 rs:4 \
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
      return rdm;
  }
 +uint64_t HELPER(mve_sshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_sqrshl_d(n, -(int8_t)shift, false, NULL);
 +}
 +
 +uint64_t HELPER(mve_ushll)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_uqrshl_d(n, (int8_t)shift, false, NULL);
 +}
 +
  uint64_t HELPER(mve_sqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
  {
      return do_sqrshl_d(n, (int8_t)shift, false, &env->QF);
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
  {
      return do_uqrshl_d(n, (int8_t)shift, false, &env->QF);
  }
 +
 +uint64_t HELPER(mve_sqrshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_sqrshl_d(n, -(int8_t)shift, true, &env->QF);
 +}
 +
 +uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_uqrshl_d(n, (int8_t)shift, true, &env->QF);
 +}
 +
 +/* Operate on 64-bit values, but saturate at 48 bits */
 +static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
 +                                    bool round, uint32_t *sat)
 +{
 +    if (shift <= -48) {
 +        /* Rounding the sign bit always produces 0. */
 +        if (round) {
 +            return 0;
 +        }
 +        return src >> 63;
 +    } else if (shift < 0) {
 +        if (round) {
 +            src >>= -shift - 1;
 +            return (src >> 1) + (src & 1);
 +        }
 +        return src >> -shift;
 +    } else if (shift < 48) {
 +        int64_t val = src << shift;
 +        int64_t extval = sextract64(val, 0, 48);
 +        if (!sat || val == extval) {
 +            return extval;
 +        }
 +    } else if (!sat || src == 0) {
 +        return 0;
 +    }
 +
 +    *sat = 1;
 +    return (1ULL << 47) - (src >= 0);
 +}
 +
 +/* Operate on 64-bit values, but saturate at 48 bits */
 +static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
 +                                     bool round, uint32_t *sat)
 +{
 +    uint64_t val, extval;
 +
 +    if (shift <= -(48 + round)) {
 +        return 0;
 +    } else if (shift < 0) {
 +        if (round) {
 +            val = src >> (-shift - 1);
 +            val = (val >> 1) + (val & 1);
 +        } else {
 +            val = src >> -shift;
 +        }
 +        extval = extract64(val, 0, 48);
 +        if (!sat || val == extval) {
 +            return extval;
 +        }
 +    } else if (shift < 48) {
 +        uint64_t val = src << shift;
 +        uint64_t extval = extract64(val, 0, 48);
 +        if (!sat || val == extval) {
 +            return extval;
 +        }
 +    } else if (!sat || src == 0) {
 +        return 0;
 +    }
 +
 +    *sat = 1;
 +    return MAKE_64BIT_MASK(0, 48);
 +}
 +
 +uint64_t HELPER(mve_sqrshrl48)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_sqrshl48_d(n, -(int8_t)shift, true, &env->QF);
 +}
 +
 +uint64_t HELPER(mve_uqrshll48)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_uqrshl48_d(n, (int8_t)shift, true, &env->QF);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_URSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
      return do_mve_shl_ri(s, a, gen_urshr64_i64);
  }
 +static bool do_mve_shl_rr(DisasContext *s, arg_mve_shl_rr *a, WideShiftFn *fn)
 +{
 +    TCGv_i64 rda;
 +    TCGv_i32 rdalo, rdahi;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
 +        return false;
 +    }
-+
++    if (a->rdahi == 15) {
-+    if (!vfp_access_check(s)) {
++        /* These are a different encoding (SQSHL/SRSHR/UQSHL/URSHR) */
 +        return false;
 +    }
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
 +        a->rdahi == 13 || a->rm == 13 || a->rm == 15 ||
 +        a->rm == a->rdahi || a->rm == a->rdalo) {
 +        /* These rdahi/rdalo/rm cases are UNPREDICTABLE; we choose to UNDEF */
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
-+    fpst = fpstatus_ptr(FPST_FPCR);
++    rda = tcg_temp_new_i64();
-+    tmp = tcg_temp_new_i32();
++    rdalo = load_reg(s, a->rdalo);
-+
++    rdahi = load_reg(s, a->rdahi);
-+    vfp_load_reg32(tmp, a->vm);
++    tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
-+    gen_helper_bfcvt(tmp, tmp, fpst);
++
-+    tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
++    /* The helper takes care of the sign-extension of the low 8 bits of Rm */
-+    tcg_temp_free_ptr(fpst);
++    fn(rda, cpu_env, rda, cpu_R[a->rm]);
-+    tcg_temp_free_i32(tmp);
++
 +    tcg_gen_extrl_i64_i32(rdalo, rda);
 +    tcg_gen_extrh_i64_i32(rdahi, rda);
 +    store_reg(s, a->rdalo, rdalo);
 +    store_reg(s, a->rdahi, rdahi);
 +    tcg_temp_free_i64(rda);
 +
 +    return true;
 +}
 +
- static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
++static bool trans_LSLL_rr(DisasContext *s, arg_mve_shl_rr *a)
- {
++{
-     TCGv_ptr fpst;
++    return do_mve_shl_rr(s, a, gen_helper_mve_ushll);
-diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
++}
-index XXXXXXX..XXXXXXX 100644
++
---- a/target/arm/vfp_helper.c
++static bool trans_ASRL_rr(DisasContext *s, arg_mve_shl_rr *a)
-+++ b/target/arm/vfp_helper.c
++{
-@@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
++    return do_mve_shl_rr(s, a, gen_helper_mve_sshrl);
-     return float64_to_float32(x, &env->vfp.fp_status);
++}
- }
++
++static bool trans_UQRSHLL64_rr(DisasContext *s, arg_mve_shl_rr *a)
-+uint32_t HELPER(bfcvt)(float32 x, void *status)
++{
-+{
++    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll);
-+    return float32_to_bfloat16(x, status);
++}
 +
 +static bool trans_SQRSHRL64_rr(DisasContext *s, arg_mve_shl_rr *a)
 +{
 +    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl);
 +}
 +
 +static bool trans_UQRSHLL48_rr(DisasContext *s, arg_mve_shl_rr *a)
 +{
 +    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll48);
 +}
 +
 +static bool trans_SQRSHRL48_rr(DisasContext *s, arg_mve_shl_rr *a)
 +{
 +    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl48);
 +}
 +
  /*
-  * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
+  * Multiply and multiply accumulate
-  * must always round-to-nearest; the AArch64 ones honour the FPSCR
+  */
 --
 .20.1
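
The 48-bit helpers above decide whether to saturate by comparing the shifted value with its own sign-extension from bit 47. A minimal sketch of that check, assuming a hypothetical `sext48()` in place of QEMU's `sextract64(v, 0, 48)` (the names `sext48` and `fits_signed48` are illustrative and not part of the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for sextract64(v, 0, 48): keep the low 48 bits
 * and sign-extend from bit 47. Written without right-shifting a negative
 * value so the result does not depend on implementation-defined behaviour.
 */
static int64_t sext48(int64_t v)
{
    uint64_t low = (uint64_t)v & 0xFFFFFFFFFFFFull;

    if (low & (1ull << 47)) {
        return (int64_t)low - ((int64_t)1 << 48);
    }
    return (int64_t)low;
}

/* True when v is representable as a signed 48-bit integer. */
static bool fits_signed48(int64_t v)
{
    return sext48(v) == v;
}
```

Under this model the representable range is -2^47 to 2^47 - 1; anything outside it is the case where the helpers set the saturation flag.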

-[PULL 20/45] softfpu: Add float_round_to_odd_inf
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-For Arm BFDOT and BFMMLA, we need a version of round-to-odd
-that overflows to infinity, instead of the max normal number.
-Cc: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210525225817.400336-6-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- include/fpu/softfloat-types.h | 4 +++-
- fpu/softfloat-parts.c.inc     | 6 ++++--
-files changed, 7 insertions(+), 3 deletions(-)
-diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/fpu/softfloat-types.h
-+++ b/include/fpu/softfloat-types.h
-@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
-     float_round_up           = 2,
-     float_round_to_zero      = 3,
-     float_round_ties_away    = 4,
--    /* Not an IEEE rounding mode: round to the closest odd mantissa value */
-+    /* Not an IEEE rounding mode: round to closest odd, overflow to max */
-     float_round_to_odd       = 5,
-+    /* Not an IEEE rounding mode: round to closest odd, overflow to inf */
-+    float_round_to_odd_inf   = 6,
- } FloatRoundMode;
- /*
-diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-parts.c.inc
-+++ b/fpu/softfloat-parts.c.inc
-@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
-         g_assert_not_reached();
-     }
-+    overflow_norm = false;
-     switch (s->float_rounding_mode) {
-     case float_round_nearest_even:
--        overflow_norm = false;
-         inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
-         break;
-     case float_round_ties_away:
--        overflow_norm = false;
-         inc = frac_lsbm1;
-         break;
-     case float_round_to_zero:
-@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
-         break;
-     case float_round_to_odd:
-         overflow_norm = true;
-+        /* fall through */
-+    case float_round_to_odd_inf:
-         inc = p->frac_lo & frac_lsb ? 0 : round_mask;
-         break;
-     default:
-@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
-                        ? frac_lsbm1 : 0);
-                 break;
-             case float_round_to_odd:
-+            case float_round_to_odd_inf:
-                 inc = p->frac_lo & frac_lsb ? 0 : round_mask;
-                 break;
-             default:
---
-.20.1

-[PULL 34/45] hvf: Split out common code on vcpu init and destroy
+[PULL 23/24] target/arm: Implement MVE shifts by immediate
-From: Alexander Graf <agraf@csgraf.de>
+Implement the MVE shifts by immediate, which perform shifts
+on a single general-purpose register.
-Until now, Hypervisor.framework has only been available on x86_64 systems.
-With Apple Silicon shipping now, it extends its reach to aarch64. To
+These patterns overlap with the long-shift-by-immediates,
-prepare for support for multiple architectures, let's start moving common
+so we have to rearrange the grouping a little here.
-code out into its own accel directory.
 This patch splits the vcpu init and destroy functions into a generic and
 an architecture specific portion. This also allows us to move the generic
 functions into the generic hvf code, removing exported functions.
 Signed-off-by: Alexander Graf <agraf@csgraf.de>
 Reviewed-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20210519202253.76782-8-agraf@csgraf.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-18-peter.maydell@linaro.org
 ---
- accel/hvf/hvf-accel-ops.h |  2 --
+ target/arm/helper-mve.h |  3 ++
- include/sysemu/hvf_int.h  |  2 ++
+ target/arm/translate.h  |  1 +
- accel/hvf/hvf-accel-ops.c | 30 ++++++++++++++++++++++++++++++
+ target/arm/t32.decode   | 31 ++++++++++++++-----
- target/i386/hvf/hvf.c     | 23 ++---------------------
+ target/arm/mve_helper.c | 10 ++++++
-files changed, 34 insertions(+), 23 deletions(-)
+ target/arm/translate.c  | 68 +++++++++++++++++++++++++++++++++++++++--
+files changed, 104 insertions(+), 9 deletions(-)
-diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
-index XXXXXXX..XXXXXXX 100644
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
---- a/accel/hvf/hvf-accel-ops.h
+index XXXXXXX..XXXXXXX 100644
-+++ b/accel/hvf/hvf-accel-ops.h
+--- a/target/arm/helper-mve.h
 +++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_sqrshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
  DEF_HELPER_FLAGS_3(mve_uqrshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
  DEF_HELPER_FLAGS_3(mve_sqrshrl48, TCG_CALL_NO_RWG, i64, env, i64, i32)
  DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_uqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_sqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
  typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
  typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
 +typedef void ShiftImmFn(TCGv_i32, TCGv_i32, int32_t shift);
  /**
   * arm_tbflags_from_tb:
 diff --git a/target/arm/t32.decode b/target/arm/t32.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/t32.decode
 +++ b/target/arm/t32.decode
 @@ -XXX,XX +XXX,XX @@
- #include "sysemu/cpus.h"
+ &mve_shl_ri      rdalo rdahi shim
+ &mve_shl_rr      rdalo rdahi rm
--int hvf_init_vcpu(CPUState *);
++&mve_sh_ri       rda shim
- int hvf_vcpu_exec(CPUState *);
- void hvf_cpu_synchronize_state(CPUState *);
+ # rdahi: bits [3:1] from insn, bit 0 is 1
- void hvf_cpu_synchronize_post_reset(CPUState *);
+ # rdalo: bits [3:1] from insn, bit 0 is 0
- void hvf_cpu_synchronize_post_init(CPUState *);
+@@ -XXX,XX +XXX,XX @@
- void hvf_cpu_synchronize_pre_loadvm(CPUState *);
+                  &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
--void hvf_vcpu_destroy(CPUState *);
+ @mve_shl_rr      ....... .... . ... . rm:4  ... . .. .. .... \
+                  &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
- #endif /* HVF_CPUS_H */
++@mve_sh_ri       ....... .... . rda:4 . ... ... . .. .. .... \
-diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
++                 &mve_sh_ri shim=%imm5_12_6
-index XXXXXXX..XXXXXXX 100644
---- a/include/sysemu/hvf_int.h
+ {
-+++ b/include/sysemu/hvf_int.h
+   TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
-@@ -XXX,XX +XXX,XX @@ struct HVFState {
+@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
- extern HVFState *hvf_state;
+   # the rest fall through (where ORR_rrri and MOV_rxri will end up
+   # handling them as r13 and r15 accesses with the same semantics as A32).
- void assert_hvf_ok(hv_return_t ret);
+   [
-+int hvf_arch_init_vcpu(CPUState *cpu);
+-    LSLL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
-+void hvf_arch_vcpu_destroy(CPUState *cpu);
+-    LSRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
- hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
+-    ASRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
- int hvf_put_registers(CPUState *);
++    {
- int hvf_get_registers(CPUState *);
++      UQSHL_ri   1110101 0010 1 ....  0 ...  1111 .. 00 1111  @mve_sh_ri
-diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
++      LSLL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
-index XXXXXXX..XXXXXXX 100644
++      UQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
---- a/accel/hvf/hvf-accel-ops.c
++    }
-+++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@ static void hvf_type_init(void)
+-    UQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
+-    URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
- type_init(hvf_type_init);
+-    SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
+-    SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
-+static void hvf_vcpu_destroy(CPUState *cpu)
++    {
-+{
++      URSHR_ri   1110101 0010 1 ....  0 ...  1111 .. 01 1111  @mve_sh_ri
-+    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
++      LSRL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
-+    assert_hvf_ok(ret);
++      URSHRL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
-+
++    }
-+    hvf_arch_vcpu_destroy(cpu);
++
-+}
++    {
-+
++      SRSHR_ri   1110101 0010 1 ....  0 ...  1111 .. 10 1111  @mve_sh_ri
-+static int hvf_init_vcpu(CPUState *cpu)
++      ASRL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
-+{
++      SRSHRL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
-+    int r;
++    }
 +
-+    /* init cpu signals */
++    {
-+    sigset_t set;
++      SQSHL_ri   1110101 0010 1 ....  0 ...  1111 .. 11 1111  @mve_sh_ri
-+    struct sigaction sigact;
++      SQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
-+
++    }
-+    memset(&sigact, 0, sizeof(sigact));
-+    sigact.sa_handler = dummy_signal;
+     LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
-+    sigaction(SIG_IPI, &sigact, NULL);
+     ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
-+
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-+    pthread_sigmask(SIG_BLOCK, NULL, &set);
+index XXXXXXX..XXXXXXX 100644
-+    sigdelset(&set, SIG_IPI);
+--- a/target/arm/mve_helper.c
-+
++++ b/target/arm/mve_helper.c
-+    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
+@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqrshll48)(CPUARMState *env, uint64_t n, uint32_t shift)
-+    cpu->vcpu_dirty = 1;
+ {
-+    assert_hvf_ok(r);
+     return do_uqrshl48_d(n, (int8_t)shift, true, &env->QF);
-+
+ }
-+    return hvf_arch_init_vcpu(cpu);
++
 +uint32_t HELPER(mve_uqshl)(CPUARMState *env, uint32_t n, uint32_t shift)
 +{
 +    return do_uqrshl_bhs(n, (int8_t)shift, 32, false, &env->QF);
 +}
 +
 +uint32_t HELPER(mve_sqshl)(CPUARMState *env, uint32_t n, uint32_t shift)
 +{
 +    return do_sqrshl_bhs(n, (int8_t)shift, 32, false, &env->QF);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_srshr16_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh)
  static void gen_srshr32_i32(TCGv_i32 d, TCGv_i32 a, int32_t sh)
  {
 -    TCGv_i32 t = tcg_temp_new_i32();
 +    TCGv_i32 t;
 +    /* Handle shift by the input size for the benefit of trans_SRSHR_ri */
 +    if (sh == 32) {
 +        tcg_gen_movi_i32(d, 0);
 +        return;
 +    }
 +    t = tcg_temp_new_i32();
      tcg_gen_extract_i32(t, a, sh - 1, 1);
      tcg_gen_sari_i32(d, a, sh);
      tcg_gen_add_i32(d, d, t);
@@ -XXX,XX +XXX,XX @@ static void gen_urshr16_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh)
  static void gen_urshr32_i32(TCGv_i32 d, TCGv_i32 a, int32_t sh)
  {
 -    TCGv_i32 t = tcg_temp_new_i32();
 +    TCGv_i32 t;
 +    /* Handle shift by the input size for the benefit of trans_URSHR_ri */
 +    if (sh == 32) {
 +        tcg_gen_extract_i32(d, a, sh - 1, 1);
 +        return;
 +    }
 +    t = tcg_temp_new_i32();
      tcg_gen_extract_i32(t, a, sh - 1, 1);
      tcg_gen_shri_i32(d, a, sh);
      tcg_gen_add_i32(d, d, t);
@@ -XXX,XX +XXX,XX @@ static bool trans_SQRSHRL48_rr(DisasContext *s, arg_mve_shl_rr *a)
      return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl48);
  }
 +static bool do_mve_sh_ri(DisasContext *s, arg_mve_sh_ri *a, ShiftImmFn *fn)
 +{
 +    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
 +        return false;
 +    }
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
 +        a->rda == 13 || a->rda == 15) {
 +        /* These rda cases are UNPREDICTABLE; we choose to UNDEF */
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
 +    if (a->shim == 0) {
 +        a->shim = 32;
 +    }
 +    fn(cpu_R[a->rda], cpu_R[a->rda], a->shim);
 +
 +    return true;
 +}
 +
 +static bool trans_URSHR_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_urshr32_i32);
 +}
 +
 +static bool trans_SRSHR_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_srshr32_i32);
 +}
 +
 +static void gen_mve_sqshl(TCGv_i32 r, TCGv_i32 n, int32_t shift)
 +{
 +    gen_helper_mve_sqshl(r, cpu_env, n, tcg_constant_i32(shift));
 +}
 +
 +static bool trans_SQSHL_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_mve_sqshl);
 +}
 +
 +static void gen_mve_uqshl(TCGv_i32 r, TCGv_i32 n, int32_t shift)
 +{
 +    gen_helper_mve_uqshl(r, cpu_env, n, tcg_constant_i32(shift));
 +}
 +
 +static bool trans_UQSHL_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_mve_uqshl);
 +}
 +
  /*
-  * The HVF-specific vCPU thread function. This one should only run when the host
+  * Multiply and multiply accumulate
-  * CPU supports the VMX "unrestricted guest" feature.
+  */
 diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/i386/hvf/hvf.c
 +++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
      return false;
  }
 -void hvf_vcpu_destroy(CPUState *cpu)
 +void hvf_arch_vcpu_destroy(CPUState *cpu)
  {
      X86CPU *x86_cpu = X86_CPU(cpu);
      CPUX86State *env = &x86_cpu->env;
 -    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
      g_free(env->hvf_mmio_buf);
 -    assert_hvf_ok(ret);
  }
  static void init_tsc_freq(CPUX86State *env)
@@ -XXX,XX +XXX,XX @@ static inline bool apic_bus_freq_is_known(CPUX86State *env)
      return env->apic_bus_freq != 0;
  }
 -int hvf_init_vcpu(CPUState *cpu)
 +int hvf_arch_init_vcpu(CPUState *cpu)
  {
 -
      X86CPU *x86cpu = X86_CPU(cpu);
      CPUX86State *env = &x86cpu->env;
 -    int r;
 -
 -    /* init cpu signals */
 -    sigset_t set;
 -    struct sigaction sigact;
 -
 -    memset(&sigact, 0, sizeof(sigact));
 -    sigact.sa_handler = dummy_signal;
 -    sigaction(SIG_IPI, &sigact, NULL);
 -
 -    pthread_sigmask(SIG_BLOCK, NULL, &set);
 -    sigdelset(&set, SIG_IPI);
      init_emu();
      init_decoder();
@@ -XXX,XX +XXX,XX @@ int hvf_init_vcpu(CPUState *cpu)
          }
      }
 -    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
 -    cpu->vcpu_dirty = 1;
 -    assert_hvf_ok(r);
 -
      if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
          &hvf_state->hvf_caps->vmx_cap_pinbased)) {
          abort();
 --
 .20.1
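
In the shifts-by-immediate patch, an encoded shift of 0 is treated as a shift by 32, and the 32-bit rounding-shift generators gain a special case for that count. A minimal reference model of the arithmetic, assuming hypothetical names (`decode_shim`, `urshr32_ref`, `srshr32_ref`) that do not exist in QEMU, and widening to 64 bits instead of special-casing 32:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: map the 5-bit immediate field to a count in 1..32. */
static unsigned decode_shim(unsigned imm5)
{
    return imm5 == 0 ? 32 : imm5;
}

/* Unsigned rounding shift right: add half an LSB, then shift. */
static uint32_t urshr32_ref(uint32_t a, unsigned sh)
{
    assert(sh >= 1 && sh <= 32);
    return (uint32_t)(((uint64_t)a + (1ull << (sh - 1))) >> sh);
}

/*
 * Signed rounding shift right. Assumes >> on a negative int64_t is an
 * arithmetic shift, which holds on the usual compilers but is
 * implementation-defined in ISO C.
 */
static int32_t srshr32_ref(int32_t a, unsigned sh)
{
    assert(sh >= 1 && sh <= 32);
    return (int32_t)(((int64_t)a + (1ll << (sh - 1))) >> sh);
}
```

With a count of 32 the unsigned model returns bit 31 of the input and the signed model returns 0, matching the two `sh == 32` branches in the diff.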

-[PULL 21/45] target/arm: Implement bfloat16 dot product (vector)
+[PULL 24/24] target/arm: Implement MVE shifts by register
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE shifts by register, which perform
 shifts on a single general-purpose register.
-This is BFDOT for both AArch64 AdvSIMD and SVE,
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-and VDOT.BF16 for AArch32 NEON.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-19-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h |  2 ++
  target/arm/translate.h  |  1 +
  target/arm/t32.decode   | 18 ++++++++++++++----
  target/arm/mve_helper.c | 10 ++++++++++
  target/arm/translate.c  | 30 ++++++++++++++++++++++++++++++
 files changed, 57 insertions(+), 4 deletions(-)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Message-id: 20210525225817.400336-7-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.h           |  3 +++
  target/arm/neon-shared.decode |  2 ++
  target/arm/sve.decode         |  3 +++
  target/arm/translate-a64.c    | 20 ++++++++++++++++++
  target/arm/translate-neon.c   |  9 ++++++++
  target/arm/translate-sve.c    | 12 +++++++++++
  target/arm/vec_helper.c       | 40 +++++++++++++++++++++++++++++++++++
 files changed, 89 insertions(+)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_ummla_b, TCG_CALL_NO_RWG,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
- DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
-                    void, ptr, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_uqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
+ DEF_HELPER_FLAGS_3(mve_sqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
-+DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
++DEF_HELPER_FLAGS_3(mve_uqrshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
-+                   void, ptr, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_3(mve_sqrshr, TCG_CALL_NO_RWG, i32, env, i32, i32)
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
  typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
  typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
  typedef void ShiftImmFn(TCGv_i32, TCGv_i32, int32_t shift);
 +typedef void ShiftFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
  /**
   * arm_tbflags_from_tb:
 diff --git a/target/arm/t32.decode b/target/arm/t32.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/t32.decode
 +++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@
  &mve_shl_ri      rdalo rdahi shim
  &mve_shl_rr      rdalo rdahi rm
  &mve_sh_ri       rda shim
 +&mve_sh_rr       rda rm
  # rdahi: bits [3:1] from insn, bit 0 is 1
  # rdalo: bits [3:1] from insn, bit 0 is 0
@@ -XXX,XX +XXX,XX @@
                   &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
  @mve_sh_ri       ....... .... . rda:4 . ... ... . .. .. .... \
                   &mve_sh_ri shim=%imm5_12_6
 +@mve_sh_rr       ....... .... . rda:4 rm:4 .... .... .... &mve_sh_rr
  {
    TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
        SQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
      }
 -    LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
 -    ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
 -    UQRSHLL64_rr 1110101 0010 1 ... 1 ....  ... 1  0000 1101  @mve_shl_rr
 -    SQRSHRL64_rr 1110101 0010 1 ... 1 ....  ... 1  0010 1101  @mve_shl_rr
 +    {
 +      UQRSHL_rr    1110101 0010 1 ....  ....  1111 0000 1101  @mve_sh_rr
 +      LSLL_rr      1110101 0010 1 ... 0 .... ... 1 0000 1101  @mve_shl_rr
 +      UQRSHLL64_rr 1110101 0010 1 ... 1 .... ... 1 0000 1101  @mve_shl_rr
 +    }
 +
- #ifdef TARGET_AARCH64
++    {
- #include "helper-a64.h"
++      SQRSHR_rr    1110101 0010 1 ....  ....  1111 0010 1101  @mve_sh_rr
- #include "helper-sve.h"
++      ASRL_rr      1110101 0010 1 ... 0 .... ... 1 0010 1101  @mve_shl_rr
-diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
++      SQRSHRL64_rr 1110101 0010 1 ... 1 .... ... 1 0010 1101  @mve_shl_rr
 +    }
 +
      UQRSHLL48_rr 1110101 0010 1 ... 1 ....  ... 1  1000 1101  @mve_shl_rr
      SQRSHRL48_rr 1110101 0010 1 ... 1 ....  ... 1  1010 1101  @mve_shl_rr
    ]
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-shared.decode
+--- a/target/arm/mve_helper.c
-+++ b/target/arm/neon-shared.decode
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ VUDOT          1111 110 00 . 10 .... .... 1101 . q:1 . 1 .... \
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_sqshl)(CPUARMState *env, uint32_t n, uint32_t shift)
-                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ {
- VUSDOT         1111 110 01 . 10 .... .... 1101 . q:1 . 0 .... \
+     return do_sqrshl_bhs(n, (int8_t)shift, 32, false, &env->QF);
-                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+ }
-+VDOT_b16       1111 110 00 . 00 .... .... 1101 . q:1 . 0 .... \
++
-+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
++uint32_t HELPER(mve_uqrshl)(CPUARMState *env, uint32_t n, uint32_t shift)
++{
- # VFM[AS]L
++    return do_uqrshl_bhs(n, (int8_t)shift, 32, true, &env->QF);
- VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
++}
-diff --git a/target/arm/sve.decode b/target/arm/sve.decode
++
 +uint32_t HELPER(mve_sqrshr)(CPUARMState *env, uint32_t n, uint32_t shift)
 +{
 +    return do_sqrshl_bhs(n, -(int8_t)shift, 32, true, &env->QF);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/sve.decode
+--- a/target/arm/translate.c
-+++ b/target/arm/sve.decode
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
+@@ -XXX,XX +XXX,XX @@ static bool trans_UQSHL_ri(DisasContext *s, arg_mve_sh_ri *a)
- FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
+     return do_mve_sh_ri(s, a, gen_mve_uqshl);
  FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
 +### SVE2 floating-point bfloat16 dot-product
 +BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
 +
  ### SVE2 floating-point multiply-add long (indexed)
  FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
  FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          feature = dc_isar_feature(aa64_fcma, s);
          break;
 +    case 0x1f: /* BFDOT */
 +        switch (size) {
 +        case 1:
 +            feature = dc_isar_feature(aa64_bf16, s);
 +            break;
 +        default:
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        break;
      default:
          unallocated_encoding(s);
          return;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
          }
          return;
 +    case 0xf: /* BFDOT */
 +        switch (size) {
 +        case 1:
 +            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
 +            break;
 +        default:
 +            g_assert_not_reached();
 +        }
 +        return;
 +
      default:
          g_assert_not_reached();
      }
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a)
                          gen_helper_gvec_usdot_b);
  }
-+static bool trans_VDOT_b16(DisasContext *s, arg_VDOT_b16 *a)
++static bool do_mve_sh_rr(DisasContext *s, arg_mve_sh_rr *a, ShiftFn *fn)
 +{
-+    if (!dc_isar_feature(aa32_bf16, s)) {
++    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
 +        return false;
 +    }
-+    return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0,
++    if (!dc_isar_feature(aa32_mve, s) ||
-+                        gen_helper_gvec_bfdot);
++        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
 +        a->rda == 13 || a->rda == 15 || a->rm == 13 || a->rm == 15 ||
 +        a->rm == a->rda) {
 +        /* These rda/rm cases are UNPREDICTABLE; we choose to UNDEF */
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
 +    /* The helper takes care of the sign-extension of the low 8 bits of Rm */
 +    fn(cpu_R[a->rda], cpu_env, cpu_R[a->rda], cpu_R[a->rm]);
 +    return true;
 +}
 +
- static bool trans_VFML(DisasContext *s, arg_VFML *a)
++static bool trans_SQRSHR_rr(DisasContext *s, arg_mve_sh_rr *a)
  {
      int opr_sz;
 diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-sve.c
 +++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_UMMLA(DisasContext *s, arg_rrrr_esz *a)
  {
      return do_i8mm_zzzz_ool(s, a, gen_helper_gvec_ummla_b, 0);
  }
 +
 +static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
 +{
-+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
++    return do_mve_sh_rr(s, a, gen_helper_mve_sqrshr);
 +        return false;
 +    }
 +    if (sve_access_check(s)) {
 +        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot,
 +                          a->rd, a->rn, a->rm, a->ra, 0);
 +    }
 +    return true;
 +}
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
  DO_MMLA_B(gvec_smmla_b, do_smmla_b)
  DO_MMLA_B(gvec_ummla_b, do_ummla_b)
  DO_MMLA_B(gvec_usmmla_b, do_usmmla_b)
 +
 +/*
 + * BFloat16 Dot Product
 + */
 +
 +static float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
 +{
 +    /* FPCR is ignored for BFDOT and BFMMLA. */
 +    float_status bf_status = {
 +        .tininess_before_rounding = float_tininess_before_rounding,
 +        .float_rounding_mode = float_round_to_odd_inf,
 +        .flush_to_zero = true,
 +        .flush_inputs_to_zero = true,
 +        .default_nan_mode = true,
 +    };
 +    float32 t1, t2;
 +
 +    /*
 +     * Extract each BFloat16 from the element pair, and shift
 +     * them such that they become float32.
 +     */
 +    t1 = float32_mul(e1 << 16, e2 << 16, &bf_status);
 +    t2 = float32_mul(e1 & 0xffff0000u, e2 & 0xffff0000u, &bf_status);
 +    t1 = float32_add(t1, t2, &bf_status);
 +    t1 = float32_add(sum, t1, &bf_status);
 +
 +    return t1;
 +}
 +
-+void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
++static bool trans_UQRSHL_rr(DisasContext *s, arg_mve_sh_rr *a)
 +{
-+    intptr_t i, opr_sz = simd_oprsz(desc);
++    return do_mve_sh_rr(s, a, gen_helper_mve_uqrshl);
-+    float32 *d = vd, *a = va;
++}
 +    uint32_t *n = vn, *m = vm;
 +
-+    for (i = 0; i < opr_sz / 4; ++i) {
+ /*
-+        d[i] = bfdotadd(a[i], n[i], m[i]);
+  * Multiply and multiply accumulate
-+    }
+  */
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
 .20.1
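
Note on the bfdotadd() hunk above: the comment about extracting each BFloat16 and shifting it "such that they become float32" relies on a BFloat16 value being the top 16 bits of an IEEE single. A minimal sketch of that widening follows. It assumes the host `float` is IEEE binary32, and the helper names (`bits_to_float`, `bf16_pair_lo`, `bf16_pair_hi`) are hypothetical and not part of the patch. It uses host floating point instead of QEMU's softfloat, so it does not model the rounding mode or flush-to-zero settings in `bf_status`.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (host float assumed to be binary32). */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Low BFloat16 of the pair: move bits [15:0] up into bits [31:16]. */
static float bf16_pair_lo(uint32_t pair)
{
    return bits_to_float(pair << 16);
}

/* High BFloat16 of the pair: keep bits [31:16], zero the low half. */
static float bf16_pair_hi(uint32_t pair)
{
    return bits_to_float(pair & 0xffff0000u);
}
```

For example, the pair 0x40003f80 holds the BFloat16 patterns 0x3f80 (low) and 0x4000 (high), which widen to 1.0f and 2.0f. The widening is exact because the 16 mantissa bits that a float32 has beyond BFloat16 are simply filled with zeros.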

-[PULL 26/45] linux-user/aarch64: Enable hwcap bits for bfloat16
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210525225817.400336-12-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- linux-user/elfload.c | 2 ++
-file changed, 2 insertions(+)
-diff --git a/linux-user/elfload.c b/linux-user/elfload.c
-index XXXXXXX..XXXXXXX 100644
---- a/linux-user/elfload.c
-+++ b/linux-user/elfload.c
-@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap2(void)
-     GET_FEATURE_ID(aa64_sve_i8mm, ARM_HWCAP2_A64_SVEI8MM);
-     GET_FEATURE_ID(aa64_sve_f32mm, ARM_HWCAP2_A64_SVEF32MM);
-     GET_FEATURE_ID(aa64_sve_f64mm, ARM_HWCAP2_A64_SVEF64MM);
-+    GET_FEATURE_ID(aa64_sve_bf16, ARM_HWCAP2_A64_SVEBF16);
-     GET_FEATURE_ID(aa64_i8mm, ARM_HWCAP2_A64_I8MM);
-+    GET_FEATURE_ID(aa64_bf16, ARM_HWCAP2_A64_BF16);
-     GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
-     GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
-     GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
---
-.20.1

-[PULL 27/45] target/arm: Enable BFloat16 extensions
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Disable BF16 again for !have_neon and !have_vfp during realize.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20210525225817.400336-13-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/cpu.c     | 3 +++
- target/arm/cpu64.c   | 3 +++
- target/arm/cpu_tcg.c | 1 +
-files changed, 7 insertions(+)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
-+++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
-         u = cpu->isar.id_isar6;
-         u = FIELD_DP32(u, ID_ISAR6, JSCVT, 0);
-+        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
-         cpu->isar.id_isar6 = u;
-         u = cpu->isar.mvfr0;
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
-         t = cpu->isar.id_aa64isar1;
-         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 0);
-+        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 0);
-         t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 0);
-         cpu->isar.id_aa64isar1 = t;
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
-         u = cpu->isar.id_isar6;
-         u = FIELD_DP32(u, ID_ISAR6, DP, 0);
-         u = FIELD_DP32(u, ID_ISAR6, FHM, 0);
-+        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
-         u = FIELD_DP32(u, ID_ISAR6, I8MM, 0);
-         cpu->isar.id_isar6 = u;
-diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu64.c
-+++ b/target/arm/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
-         t = FIELD_DP64(t, ID_AA64ISAR1, SB, 1);
-         t = FIELD_DP64(t, ID_AA64ISAR1, SPECRES, 1);
-+        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 1);
-         t = FIELD_DP64(t, ID_AA64ISAR1, FRINTTS, 1);
-         t = FIELD_DP64(t, ID_AA64ISAR1, LRCPC, 2); /* ARMv8.4-RCPC */
-         t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 1);
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-         t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1);
-         t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2);  /* PMULL */
-         t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1);
-+        t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 1);
-         t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1);
-         t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1);
-         t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1);
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-         u = FIELD_DP32(u, ID_ISAR6, FHM, 1);
-         u = FIELD_DP32(u, ID_ISAR6, SB, 1);
-         u = FIELD_DP32(u, ID_ISAR6, SPECRES, 1);
-+        u = FIELD_DP32(u, ID_ISAR6, BF16, 1);
-         u = FIELD_DP32(u, ID_ISAR6, I8MM, 1);
-         cpu->isar.id_isar6 = u;
-diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu_tcg.c
-+++ b/target/arm/cpu_tcg.c
-@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
-         t = FIELD_DP32(t, ID_ISAR6, FHM, 1);
-         t = FIELD_DP32(t, ID_ISAR6, SB, 1);
-         t = FIELD_DP32(t, ID_ISAR6, SPECRES, 1);
-+        t = FIELD_DP32(t, ID_ISAR6, BF16, 1);
-         t = FIELD_DP32(t, ID_ISAR6, I8MM, 1);
-         cpu->isar.id_isar6 = t;
---
-.20.1

-[PULL 29/45] hvf: Move vcpu thread functions into common directory
+Deleted patch
-From: Alexander Graf <agraf@csgraf.de>
-Until now, Hypervisor.framework has only been available on x86_64 systems.
-With Apple Silicon shipping now, it extends its reach to aarch64. To
-prepare for support for multiple architectures, let's start moving common
-code out into its own accel directory.
-This patch moves the vCPU thread loop over.
-Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Sergio Lopez <slp@redhat.com>
-Message-id: 20210519202253.76782-3-agraf@csgraf.de
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- {target/i386 => accel}/hvf/hvf-accel-ops.h | 0
- {target/i386 => accel}/hvf/hvf-accel-ops.c | 0
- target/i386/hvf/x86hvf.c                   | 2 +-
- accel/hvf/meson.build                      | 1 +
- target/i386/hvf/meson.build                | 1 -
-files changed, 2 insertions(+), 2 deletions(-)
- rename {target/i386 => accel}/hvf/hvf-accel-ops.h (100%)
- rename {target/i386 => accel}/hvf/hvf-accel-ops.c (100%)
-diff --git a/target/i386/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
-similarity index 100%
-rename from target/i386/hvf/hvf-accel-ops.h
-rename to accel/hvf/hvf-accel-ops.h
-diff --git a/target/i386/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
-similarity index 100%
-rename from target/i386/hvf/hvf-accel-ops.c
-rename to accel/hvf/hvf-accel-ops.c
-diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/x86hvf.c
-+++ b/target/i386/hvf/x86hvf.c
-@@ -XXX,XX +XXX,XX @@
- #include <Hypervisor/hv.h>
- #include <Hypervisor/hv_vmx.h>
--#include "hvf-accel-ops.h"
-+#include "accel/hvf/hvf-accel-ops.h"
- void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
-                      SegmentCache *qseg, bool is_tr)
-diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
-index XXXXXXX..XXXXXXX 100644
---- a/accel/hvf/meson.build
-+++ b/accel/hvf/meson.build
-@@ -XXX,XX +XXX,XX @@
- hvf_ss = ss.source_set()
- hvf_ss.add(files(
-   'hvf-all.c',
-+  'hvf-accel-ops.c',
- ))
- specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
-diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/meson.build
-+++ b/target/i386/hvf/meson.build
-@@ -XXX,XX +XXX,XX @@
- i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
-   'hvf.c',
--  'hvf-accel-ops.c',
-   'x86.c',
-   'x86_cpuid.c',
-   'x86_decode.c',
---
-.20.1

-[PULL 31/45] hvf: Move hvf internal definitions into common header
+Deleted patch
-From: Alexander Graf <agraf@csgraf.de>
-Until now, Hypervisor.framework has only been available on x86_64 systems.
-With Apple Silicon shipping now, it extends its reach to aarch64. To
-prepare for support for multiple architectures, let's start moving common
-code out into its own accel directory.
-This patch moves a few internal struct and constant defines over.
-Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Sergio Lopez <slp@redhat.com>
-Message-id: 20210519202253.76782-5-agraf@csgraf.de
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- include/sysemu/hvf_int.h   | 30 ++++++++++++++++++++++++++++++
- target/i386/hvf/hvf-i386.h | 31 +------------------------------
-files changed, 31 insertions(+), 30 deletions(-)
-diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/sysemu/hvf_int.h
-+++ b/include/sysemu/hvf_int.h
-@@ -XXX,XX +XXX,XX @@
- #include <Hypervisor/hv.h>
-+/* hvf_slot flags */
-+#define HVF_SLOT_LOG (1 << 0)
-+
-+typedef struct hvf_slot {
-+    uint64_t start;
-+    uint64_t size;
-+    uint8_t *mem;
-+    int slot_id;
-+    uint32_t flags;
-+    MemoryRegion *region;
-+} hvf_slot;
-+
-+typedef struct hvf_vcpu_caps {
-+    uint64_t vmx_cap_pinbased;
-+    uint64_t vmx_cap_procbased;
-+    uint64_t vmx_cap_procbased2;
-+    uint64_t vmx_cap_entry;
-+    uint64_t vmx_cap_exit;
-+    uint64_t vmx_cap_preemption_timer;
-+} hvf_vcpu_caps;
-+
-+struct HVFState {
-+    AccelState parent;
-+    hvf_slot slots[32];
-+    int num_slots;
-+
-+    hvf_vcpu_caps *hvf_caps;
-+};
-+extern HVFState *hvf_state;
-+
- void hvf_set_phys_mem(MemoryRegionSection *, bool);
- void assert_hvf_ok(hv_return_t ret);
- hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
-diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/hvf-i386.h
-+++ b/target/i386/hvf/hvf-i386.h
-@@ -XXX,XX +XXX,XX @@
- #include "qemu/accel.h"
- #include "sysemu/hvf.h"
-+#include "sysemu/hvf_int.h"
- #include "cpu.h"
- #include "x86.h"
--/* hvf_slot flags */
--#define HVF_SLOT_LOG (1 << 0)
--
--typedef struct hvf_slot {
--    uint64_t start;
--    uint64_t size;
--    uint8_t *mem;
--    int slot_id;
--    uint32_t flags;
--    MemoryRegion *region;
--} hvf_slot;
--
--typedef struct hvf_vcpu_caps {
--    uint64_t vmx_cap_pinbased;
--    uint64_t vmx_cap_procbased;
--    uint64_t vmx_cap_procbased2;
--    uint64_t vmx_cap_entry;
--    uint64_t vmx_cap_exit;
--    uint64_t vmx_cap_preemption_timer;
--} hvf_vcpu_caps;
--
--struct HVFState {
--    AccelState parent;
--    hvf_slot slots[32];
--    int num_slots;
--
--    hvf_vcpu_caps *hvf_caps;
--};
--extern HVFState *hvf_state;
--
- void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
- #ifdef NEED_CPU_H
---
-.20.1

-[PULL 32/45] hvf: Make hvf_set_phys_mem() static
+Deleted patch
-From: Alexander Graf <agraf@csgraf.de>
-The hvf_set_phys_mem() function is only called within the same file.
-Make it static.
-Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Sergio Lopez <slp@redhat.com>
-Message-id: 20210519202253.76782-6-agraf@csgraf.de
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- include/sysemu/hvf_int.h  | 1 -
- accel/hvf/hvf-accel-ops.c | 2 +-
-files changed, 1 insertion(+), 2 deletions(-)
-diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/sysemu/hvf_int.h
-+++ b/include/sysemu/hvf_int.h
-@@ -XXX,XX +XXX,XX @@ struct HVFState {
- };
- extern HVFState *hvf_state;
--void hvf_set_phys_mem(MemoryRegionSection *, bool);
- void assert_hvf_ok(hv_return_t ret);
- hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
- int hvf_put_registers(CPUState *);
-diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
-index XXXXXXX..XXXXXXX 100644
---- a/accel/hvf/hvf-accel-ops.c
-+++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@ static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
-     return 0;
- }
--void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
-+static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
- {
-     hvf_slot *mem;
-     MemoryRegion *area = section->mr;
---
-.20.1

-[PULL 33/45] hvf: Remove use of hv_uvaddr_t and hv_gpaddr_t
+Deleted patch
-From: Alexander Graf <agraf@csgraf.de>
-The ARM version of Hypervisor.framework no longer defines these two
-types, so let's just revert to standard ones.
-Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Sergio Lopez <slp@redhat.com>
-Message-id: 20210519202253.76782-7-agraf@csgraf.de
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- accel/hvf/hvf-accel-ops.c | 6 +++---
-file changed, 3 insertions(+), 3 deletions(-)
-diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
-index XXXXXXX..XXXXXXX 100644
---- a/accel/hvf/hvf-accel-ops.c
-+++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@ static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
-     macslot->present = 1;
-     macslot->gpa_start = slot->start;
-     macslot->size = slot->size;
--    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
-+    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
-     assert_hvf_ok(ret);
-     return 0;
- }
-@@ -XXX,XX +XXX,XX @@ static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
-     /* protect region against writes; begin tracking it */
-     if (on) {
-         slot->flags |= HVF_SLOT_LOG;
--        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
-+        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
-                       HV_MEMORY_READ);
-     /* stop tracking region*/
-     } else {
-         slot->flags &= ~HVF_SLOT_LOG;
--        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
-+        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
-                       HV_MEMORY_READ | HV_MEMORY_WRITE);
-     }
- }
---
-.20.1

-[PULL 35/45] hvf: Use cpu_synchronize_state()
+Deleted patch
-From: Alexander Graf <agraf@csgraf.de>
-There is no reason to call the hvf specific hvf_cpu_synchronize_state()
-when we can just use the generic cpu_synchronize_state() instead. This
-allows us to have less dependency on internal function definitions and
-allows us to make hvf_cpu_synchronize_state() static.
-Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Sergio Lopez <slp@redhat.com>
-Message-id: 20210519202253.76782-9-agraf@csgraf.de
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- accel/hvf/hvf-accel-ops.h | 1 -
- accel/hvf/hvf-accel-ops.c | 2 +-
- target/i386/hvf/x86hvf.c  | 9 ++++-----
-files changed, 5 insertions(+), 7 deletions(-)
-diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
-index XXXXXXX..XXXXXXX 100644
---- a/accel/hvf/hvf-accel-ops.h
-+++ b/accel/hvf/hvf-accel-ops.h
-@@ -XXX,XX +XXX,XX @@
- #include "sysemu/cpus.h"
- int hvf_vcpu_exec(CPUState *);
--void hvf_cpu_synchronize_state(CPUState *);
- void hvf_cpu_synchronize_post_reset(CPUState *);
- void hvf_cpu_synchronize_post_init(CPUState *);
- void hvf_cpu_synchronize_pre_loadvm(CPUState *);
-diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
-index XXXXXXX..XXXXXXX 100644
---- a/accel/hvf/hvf-accel-ops.c
-+++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
-     }
- }
--void hvf_cpu_synchronize_state(CPUState *cpu)
-+static void hvf_cpu_synchronize_state(CPUState *cpu)
- {
-     if (!cpu->vcpu_dirty) {
-         run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
-diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/x86hvf.c
-+++ b/target/i386/hvf/x86hvf.c
-@@ -XXX,XX +XXX,XX @@
- #include "cpu.h"
- #include "x86_descr.h"
- #include "x86_decode.h"
-+#include "sysemu/hw_accel.h"
- #include "hw/i386/apic_internal.h"
- #include <Hypervisor/hv.h>
- #include <Hypervisor/hv_vmx.h>
--#include "accel/hvf/hvf-accel-ops.h"
--
- void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
-                      SegmentCache *qseg, bool is_tr)
- {
-@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
-     env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
-     if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
--        hvf_cpu_synchronize_state(cpu_state);
-+        cpu_synchronize_state(cpu_state);
-         do_cpu_init(cpu);
-     }
-@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
-         cpu_state->halted = 0;
-     }
-     if (cpu_state->interrupt_request & CPU_INTERRUPT_SIPI) {
--        hvf_cpu_synchronize_state(cpu_state);
-+        cpu_synchronize_state(cpu_state);
-         do_cpu_sipi(cpu);
-     }
-     if (cpu_state->interrupt_request & CPU_INTERRUPT_TPR) {
-         cpu_state->interrupt_request &= ~CPU_INTERRUPT_TPR;
--        hvf_cpu_synchronize_state(cpu_state);
-+        cpu_synchronize_state(cpu_state);
-         apic_handle_tpr_access_report(cpu->apic_state, env->eip,
-                                       env->tpr_access_type);
-     }
---
-.20.1

-[PULL 36/45] hvf: Make synchronize functions static
+Deleted patch
-From: Alexander Graf <agraf@csgraf.de>
-The hvf accel synchronize functions are only used as input for local
-callback functions, so we can make them static.
-Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Sergio Lopez <slp@redhat.com>
-Message-id: 20210519202253.76782-10-agraf@csgraf.de
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- accel/hvf/hvf-accel-ops.h | 3 ---
- accel/hvf/hvf-accel-ops.c | 6 +++---
-files changed, 3 insertions(+), 6 deletions(-)
-diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
-index XXXXXXX..XXXXXXX 100644
---- a/accel/hvf/hvf-accel-ops.h
-+++ b/accel/hvf/hvf-accel-ops.h
-@@ -XXX,XX +XXX,XX @@
- #include "sysemu/cpus.h"
- int hvf_vcpu_exec(CPUState *);
--void hvf_cpu_synchronize_post_reset(CPUState *);
--void hvf_cpu_synchronize_post_init(CPUState *);
--void hvf_cpu_synchronize_pre_loadvm(CPUState *);
- #endif /* HVF_CPUS_H */
-diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
-index XXXXXXX..XXXXXXX 100644
---- a/accel/hvf/hvf-accel-ops.c
-+++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
-     cpu->vcpu_dirty = false;
- }
--void hvf_cpu_synchronize_post_reset(CPUState *cpu)
-+static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
- {
-     run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
- }
-@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
-     cpu->vcpu_dirty = false;
- }
--void hvf_cpu_synchronize_post_init(CPUState *cpu)
-+static void hvf_cpu_synchronize_post_init(CPUState *cpu)
- {
-     run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
- }
-@@ -XXX,XX +XXX,XX @@ static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
-     cpu->vcpu_dirty = true;
- }
--void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
-+static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
- {
-     run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
- }
---
-.20.1

-[PULL 37/45] hvf: Remove hvf-accel-ops.h
+Deleted patch
-From: Alexander Graf <agraf@csgraf.de>
-We can move the definition of hvf_vcpu_exec() into our internal
-hvf header, obsoleting the need for hvf-accel-ops.h.
-Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Sergio Lopez <slp@redhat.com>
-Message-id: 20210519202253.76782-11-agraf@csgraf.de
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- accel/hvf/hvf-accel-ops.h | 17 -----------------
- include/sysemu/hvf_int.h  |  1 +
- accel/hvf/hvf-accel-ops.c |  2 --
- target/i386/hvf/hvf.c     |  2 --
-files changed, 1 insertion(+), 21 deletions(-)
- delete mode 100644 accel/hvf/hvf-accel-ops.h
-diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
-deleted file mode 100644
-index XXXXXXX..XXXXXXX
---- a/accel/hvf/hvf-accel-ops.h
-+++ /dev/null
-@@ -XXX,XX +XXX,XX @@
--/*
-- * Accelerator CPUS Interface
-- *
-- * Copyright 2020 SUSE LLC
-- *
-- * This work is licensed under the terms of the GNU GPL, version 2 or later.
-- * See the COPYING file in the top-level directory.
-- */
--
--#ifndef HVF_CPUS_H
--#define HVF_CPUS_H
--
--#include "sysemu/cpus.h"
--
--int hvf_vcpu_exec(CPUState *);
--
--#endif /* HVF_CPUS_H */
-diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/sysemu/hvf_int.h
-+++ b/include/sysemu/hvf_int.h
-@@ -XXX,XX +XXX,XX @@ extern HVFState *hvf_state;
- void assert_hvf_ok(hv_return_t ret);
- int hvf_arch_init_vcpu(CPUState *cpu);
- void hvf_arch_vcpu_destroy(CPUState *cpu);
-+int hvf_vcpu_exec(CPUState *);
- hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
- int hvf_put_registers(CPUState *);
- int hvf_get_registers(CPUState *);
-diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
-index XXXXXXX..XXXXXXX 100644
---- a/accel/hvf/hvf-accel-ops.c
-+++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@
- #include "sysemu/runstate.h"
- #include "qemu/guest-random.h"
--#include "hvf-accel-ops.h"
--
- HVFState *hvf_state;
- /* Memory slots */
-diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/hvf.c
-+++ b/target/i386/hvf/hvf.c
-@@ -XXX,XX +XXX,XX @@
- #include "qemu/accel.h"
- #include "target/i386/cpu.h"
--#include "hvf-accel-ops.h"
--
- void vmx_update_tpr(CPUState *cpu)
- {
-     /* TODO: need integrate APIC handling */
---
-.20.1

-[PULL 38/45] hvf: Introduce hvf vcpu struct
+Deleted patch
-From: Alexander Graf <agraf@csgraf.de>
-We will need more than a single field for hvf going forward. To keep
-the global vcpu struct uncluttered, let's allocate a special hvf vcpu
-struct, similar to how hax does it.
-Signed-off-by: Alexander Graf <agraf@csgraf.de>
-Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
-Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Sergio Lopez <slp@redhat.com>
-Message-id: 20210519202253.76782-12-agraf@csgraf.de
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- include/hw/core/cpu.h       |   3 +-
- include/sysemu/hvf_int.h    |   4 +
- target/i386/hvf/vmx.h       |  24 +++--
- accel/hvf/hvf-accel-ops.c   |   8 +-
- target/i386/hvf/hvf.c       | 104 +++++++++---------
- target/i386/hvf/x86.c       |  28 ++---
- target/i386/hvf/x86_descr.c |  26 ++---
- target/i386/hvf/x86_emu.c   |  62 +++++------
- target/i386/hvf/x86_mmu.c   |   4 +-
- target/i386/hvf/x86_task.c  |  12 +--
- target/i386/hvf/x86hvf.c    | 210 ++++++++++++++++++------------------
-files changed, 248 insertions(+), 237 deletions(-)
-diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/hw/core/cpu.h
-+++ b/include/hw/core/cpu.h
-@@ -XXX,XX +XXX,XX @@ struct KVMState;
- struct kvm_run;
- struct hax_vcpu_state;
-+struct hvf_vcpu_state;
- #define TB_JMP_CACHE_BITS 12
- #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
-@@ -XXX,XX +XXX,XX @@ struct CPUState {
-     struct hax_vcpu_state *hax_vcpu;
--    int hvf_fd;
-+    struct hvf_vcpu_state *hvf;
-     /* track IOMMUs whose translations we've cached in the TCG TLB */
-     GArray *iommu_notifiers;
-diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/sysemu/hvf_int.h
-+++ b/include/sysemu/hvf_int.h
-@@ -XXX,XX +XXX,XX @@ struct HVFState {
- };
- extern HVFState *hvf_state;
-+struct hvf_vcpu_state {
-+    int fd;
-+};
-+
- void assert_hvf_ok(hv_return_t ret);
- int hvf_arch_init_vcpu(CPUState *cpu);
- void hvf_arch_vcpu_destroy(CPUState *cpu);
-diff --git a/target/i386/hvf/vmx.h b/target/i386/hvf/vmx.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/vmx.h
-+++ b/target/i386/hvf/vmx.h
-@@ -XXX,XX +XXX,XX @@
- #include "vmcs.h"
- #include "cpu.h"
- #include "x86.h"
-+#include "sysemu/hvf.h"
-+#include "sysemu/hvf_int.h"
- #include "exec/address-spaces.h"
-@@ -XXX,XX +XXX,XX @@ static inline void macvm_set_rip(CPUState *cpu, uint64_t rip)
-     uint64_t val;
-     /* BUG, should take considering overlap.. */
--    wreg(cpu->hvf_fd, HV_X86_RIP, rip);
-+    wreg(cpu->hvf->fd, HV_X86_RIP, rip);
-     env->eip = rip;
-     /* after moving forward in rip, we need to clean INTERRUPTABILITY */
--   val = rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
-+   val = rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
-    if (val & (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
-                VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
-         env->hflags &= ~HF_INHIBIT_IRQ_MASK;
--        wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY,
-+        wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY,
-                val & ~(VMCS_INTERRUPTIBILITY_STI_BLOCKING |
-                VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING));
-    }
-@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_blocking(CPUState *cpu)
-     CPUX86State *env = &x86_cpu->env;
-     env->hflags2 &= ~HF2_NMI_MASK;
--    uint32_t gi = (uint32_t) rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
-+    uint32_t gi = (uint32_t) rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
-     gi &= ~VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
--    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
-+    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
- }
- static inline void vmx_set_nmi_blocking(CPUState *cpu)
-@@ -XXX,XX +XXX,XX @@ static inline void vmx_set_nmi_blocking(CPUState *cpu)
-     CPUX86State *env = &x86_cpu->env;
-     env->hflags2 |= HF2_NMI_MASK;
--    uint32_t gi = (uint32_t)rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
-+    uint32_t gi = (uint32_t)rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
-     gi |= VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
--    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
-+    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
- }
- static inline void vmx_set_nmi_window_exiting(CPUState *cpu)
- {
-     uint64_t val;
--    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
--    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
-+    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
-+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
-           VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
- }
-@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_window_exiting(CPUState *cpu)
- {
-     uint64_t val;
--    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
--    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
-+    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
-+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
-           ~VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
- }
-diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
-index XXXXXXX..XXXXXXX 100644
---- a/accel/hvf/hvf-accel-ops.c
-+++ b/accel/hvf/hvf-accel-ops.c
-@@ -XXX,XX +XXX,XX @@ type_init(hvf_type_init);
- static void hvf_vcpu_destroy(CPUState *cpu)
- {
--    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
-+    hv_return_t ret = hv_vcpu_destroy(cpu->hvf->fd);
-     assert_hvf_ok(ret);
-     hvf_arch_vcpu_destroy(cpu);
-+    g_free(cpu->hvf);
-+    cpu->hvf = NULL;
- }
- static int hvf_init_vcpu(CPUState *cpu)
- {
-     int r;
-+    cpu->hvf = g_malloc0(sizeof(*cpu->hvf));
-+
-     /* init cpu signals */
-     sigset_t set;
-     struct sigaction sigact;
-@@ -XXX,XX +XXX,XX @@ static int hvf_init_vcpu(CPUState *cpu)
-     pthread_sigmask(SIG_BLOCK, NULL, &set);
-     sigdelset(&set, SIG_IPI);
--    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
-+    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf->fd, HV_VCPU_DEFAULT);
-     cpu->vcpu_dirty = 1;
-     assert_hvf_ok(r);
-diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/hvf.c
-+++ b/target/i386/hvf/hvf.c
-@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
-     int tpr = cpu_get_apic_tpr(x86_cpu->apic_state) << 4;
-     int irr = apic_get_highest_priority_irr(x86_cpu->apic_state);
--    wreg(cpu->hvf_fd, HV_X86_TPR, tpr);
-+    wreg(cpu->hvf->fd, HV_X86_TPR, tpr);
-     if (irr == -1) {
--        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
-+        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
-     } else {
--        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
-+        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
-               irr >> 4);
-     }
- }
-@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
- static void update_apic_tpr(CPUState *cpu)
- {
-     X86CPU *x86_cpu = X86_CPU(cpu);
--    int tpr = rreg(cpu->hvf_fd, HV_X86_TPR) >> 4;
-+    int tpr = rreg(cpu->hvf->fd, HV_X86_TPR) >> 4;
-     cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
- }
-@@ -XXX,XX +XXX,XX @@ int hvf_arch_init_vcpu(CPUState *cpu)
-     }
-     /* set VMCS control fields */
--    wvmcs(cpu->hvf_fd, VMCS_PIN_BASED_CTLS,
-+    wvmcs(cpu->hvf->fd, VMCS_PIN_BASED_CTLS,
-           cap2ctrl(hvf_state->hvf_caps->vmx_cap_pinbased,
-           VMCS_PIN_BASED_CTLS_EXTINT |
-           VMCS_PIN_BASED_CTLS_NMI |
-           VMCS_PIN_BASED_CTLS_VNMI));
--    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS,
-+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS,
-           cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased,
-           VMCS_PRI_PROC_BASED_CTLS_HLT |
-           VMCS_PRI_PROC_BASED_CTLS_MWAIT |
-           VMCS_PRI_PROC_BASED_CTLS_TSC_OFFSET |
-           VMCS_PRI_PROC_BASED_CTLS_TPR_SHADOW) |
-           VMCS_PRI_PROC_BASED_CTLS_SEC_CONTROL);
--    wvmcs(cpu->hvf_fd, VMCS_SEC_PROC_BASED_CTLS,
-+    wvmcs(cpu->hvf->fd, VMCS_SEC_PROC_BASED_CTLS,
-           cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased2,
-                    VMCS_PRI_PROC_BASED2_CTLS_APIC_ACCESSES));
--    wvmcs(cpu->hvf_fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
-+    wvmcs(cpu->hvf->fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
-));
--    wvmcs(cpu->hvf_fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
-+    wvmcs(cpu->hvf->fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
--    wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
-+    wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
-     x86cpu = X86_CPU(cpu);
-     x86cpu->env.xsave_buf = qemu_memalign(4096, 4096);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_STAR, 1);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_LSTAR, 1);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_CSTAR, 1);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FMASK, 1);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FSBASE, 1);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_GSBASE, 1);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_KERNELGSBASE, 1);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_TSC_AUX, 1);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_TSC, 1);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_CS, 1);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_EIP, 1);
--    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_ESP, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_STAR, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_LSTAR, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_CSTAR, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FMASK, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FSBASE, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_GSBASE, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_KERNELGSBASE, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_TSC_AUX, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_TSC, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_CS, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_EIP, 1);
-+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_ESP, 1);
-     return 0;
- }
-@@ -XXX,XX +XXX,XX @@ static void hvf_store_events(CPUState *cpu, uint32_t ins_len, uint64_t idtvec_in
-         }
-         if (idtvec_info & VMCS_IDT_VEC_ERRCODE_VALID) {
-             env->has_error_code = true;
--            env->error_code = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_ERROR);
-+            env->error_code = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_ERROR);
-         }
-     }
--    if ((rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
-+    if ((rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
-         VMCS_INTERRUPTIBILITY_NMI_BLOCKING)) {
-         env->hflags2 |= HF2_NMI_MASK;
-     } else {
-         env->hflags2 &= ~HF2_NMI_MASK;
-     }
--    if (rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
-+    if (rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
-          (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
-          VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
-         env->hflags |= HF_INHIBIT_IRQ_MASK;
-@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
-             return EXCP_HLT;
-         }
--        hv_return_t r  = hv_vcpu_run(cpu->hvf_fd);
-+        hv_return_t r  = hv_vcpu_run(cpu->hvf->fd);
-         assert_hvf_ok(r);
-         /* handle VMEXIT */
--        uint64_t exit_reason = rvmcs(cpu->hvf_fd, VMCS_EXIT_REASON);
--        uint64_t exit_qual = rvmcs(cpu->hvf_fd, VMCS_EXIT_QUALIFICATION);
--        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf_fd,
-+        uint64_t exit_reason = rvmcs(cpu->hvf->fd, VMCS_EXIT_REASON);
-+        uint64_t exit_qual = rvmcs(cpu->hvf->fd, VMCS_EXIT_QUALIFICATION);
-+        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf->fd,
-                                            VMCS_EXIT_INSTRUCTION_LENGTH);
--        uint64_t idtvec_info = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
-+        uint64_t idtvec_info = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
-         hvf_store_events(cpu, ins_len, idtvec_info);
--        rip = rreg(cpu->hvf_fd, HV_X86_RIP);
--        env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
-+        rip = rreg(cpu->hvf->fd, HV_X86_RIP);
-+        env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
-         qemu_mutex_lock_iothread();
-@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
-         case EXIT_REASON_EPT_FAULT:
-         {
-             hvf_slot *slot;
--            uint64_t gpa = rvmcs(cpu->hvf_fd, VMCS_GUEST_PHYSICAL_ADDRESS);
-+            uint64_t gpa = rvmcs(cpu->hvf->fd, VMCS_GUEST_PHYSICAL_ADDRESS);
-             if (((idtvec_info & VMCS_IDT_VEC_VALID) == 0) &&
-                 ((exit_qual & EXIT_QUAL_NMIUDTI) != 0)) {
-@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
-                 store_regs(cpu);
-                 break;
-             } else if (!string && !in) {
--                RAX(env) = rreg(cpu->hvf_fd, HV_X86_RAX);
-+                RAX(env) = rreg(cpu->hvf->fd, HV_X86_RAX);
-                 hvf_handle_io(env, port, &RAX(env), 1, size, 1);
-                 macvm_set_rip(cpu, rip + ins_len);
-                 break;
-@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
-             break;
-         }
-         case EXIT_REASON_CPUID: {
--            uint32_t rax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
--            uint32_t rbx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RBX);
--            uint32_t rcx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
--            uint32_t rdx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
-+            uint32_t rax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
-+            uint32_t rbx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RBX);
-+            uint32_t rcx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
-+            uint32_t rdx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
-             if (rax == 1) {
-                 /* CPUID1.ecx.OSXSAVE needs to know CR4 */
--                env->cr[4] = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
-+                env->cr[4] = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
-             }
-             hvf_cpu_x86_cpuid(env, rax, rcx, &rax, &rbx, &rcx, &rdx);
--            wreg(cpu->hvf_fd, HV_X86_RAX, rax);
--            wreg(cpu->hvf_fd, HV_X86_RBX, rbx);
--            wreg(cpu->hvf_fd, HV_X86_RCX, rcx);
--            wreg(cpu->hvf_fd, HV_X86_RDX, rdx);
-+            wreg(cpu->hvf->fd, HV_X86_RAX, rax);
-+            wreg(cpu->hvf->fd, HV_X86_RBX, rbx);
-+            wreg(cpu->hvf->fd, HV_X86_RCX, rcx);
-+            wreg(cpu->hvf->fd, HV_X86_RDX, rdx);
-             macvm_set_rip(cpu, rip + ins_len);
-             break;
-@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
-         case EXIT_REASON_XSETBV: {
-             X86CPU *x86_cpu = X86_CPU(cpu);
-             CPUX86State *env = &x86_cpu->env;
--            uint32_t eax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
--            uint32_t ecx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
--            uint32_t edx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
-+            uint32_t eax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
-+            uint32_t ecx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
-+            uint32_t edx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
-             if (ecx) {
-                 macvm_set_rip(cpu, rip + ins_len);
-                 break;
-             }
-             env->xcr0 = ((uint64_t)edx << 32) | eax;
--            wreg(cpu->hvf_fd, HV_X86_XCR0, env->xcr0 | 1);
-+            wreg(cpu->hvf->fd, HV_X86_XCR0, env->xcr0 | 1);
-             macvm_set_rip(cpu, rip + ins_len);
-             break;
-         }
-@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
-             switch (cr) {
-             case 0x0: {
--                macvm_set_cr0(cpu->hvf_fd, RRX(env, reg));
-+                macvm_set_cr0(cpu->hvf->fd, RRX(env, reg));
-                 break;
-             }
-             case 4: {
--                macvm_set_cr4(cpu->hvf_fd, RRX(env, reg));
-+                macvm_set_cr4(cpu->hvf->fd, RRX(env, reg));
-                 break;
-             }
-             case 8: {
-@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
-             break;
-         }
-         case EXIT_REASON_TASK_SWITCH: {
--            uint64_t vinfo = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
-+            uint64_t vinfo = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
-             x68_segment_selector sel = {.sel = exit_qual & 0xffff};
-             vmx_handle_task_switch(cpu, sel, (exit_qual >> 30) & 0x3,
-              vinfo & VMCS_INTR_VALID, vinfo & VECTORING_INFO_VECTOR_MASK, vinfo
-@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
-             break;
-         }
-         case EXIT_REASON_RDPMC:
--            wreg(cpu->hvf_fd, HV_X86_RAX, 0);
--            wreg(cpu->hvf_fd, HV_X86_RDX, 0);
-+            wreg(cpu->hvf->fd, HV_X86_RAX, 0);
-+            wreg(cpu->hvf->fd, HV_X86_RDX, 0);
-             macvm_set_rip(cpu, rip + ins_len);
-             break;
-         case VMX_REASON_VMCALL:
-diff --git a/target/i386/hvf/x86.c b/target/i386/hvf/x86.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/x86.c
-+++ b/target/i386/hvf/x86.c
-@@ -XXX,XX +XXX,XX @@ bool x86_read_segment_descriptor(struct CPUState *cpu,
-     }
-     if (GDT_SEL == sel.ti) {
--        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
--        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
-+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
-+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
-     } else {
--        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
--        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
-+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
-+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
-     }
-     if (sel.index * 8 >= limit) {
-@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
-     uint32_t limit;
-     if (GDT_SEL == sel.ti) {
--        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
--        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
-+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
-+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
-     } else {
--        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
--        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
-+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
-+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
-     }
-     if (sel.index * 8 >= limit) {
-@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
- bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
-                         int gate)
- {
--    target_ulong base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_BASE);
--    uint32_t limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
-+    target_ulong base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_BASE);
-+    uint32_t limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
-     memset(idt_desc, 0, sizeof(*idt_desc));
-     if (gate * 8 >= limit) {
-@@ -XXX,XX +XXX,XX @@ bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
- bool x86_is_protected(struct CPUState *cpu)
- {
--    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
-+    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
-     return cr0 & CR0_PE;
- }
-@@ -XXX,XX +XXX,XX @@ bool x86_is_v8086(struct CPUState *cpu)
- bool x86_is_long_mode(struct CPUState *cpu)
- {
--    return rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
-+    return rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
- }
- bool x86_is_long64_mode(struct CPUState *cpu)
-@@ -XXX,XX +XXX,XX @@ bool x86_is_long64_mode(struct CPUState *cpu)
- bool x86_is_paging_mode(struct CPUState *cpu)
- {
--    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
-+    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
-     return cr0 & CR0_PG;
- }
- bool x86_is_pae_enabled(struct CPUState *cpu)
- {
--    uint64_t cr4 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
-+    uint64_t cr4 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
-     return cr4 & CR4_PAE;
- }
-diff --git a/target/i386/hvf/x86_descr.c b/target/i386/hvf/x86_descr.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/x86_descr.c
-+++ b/target/i386/hvf/x86_descr.c
-@@ -XXX,XX +XXX,XX @@ static const struct vmx_segment_field {
- uint32_t vmx_read_segment_limit(CPUState *cpu, X86Seg seg)
- {
--    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
-+    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
- }
- uint32_t vmx_read_segment_ar(CPUState *cpu, X86Seg seg)
- {
--    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
-+    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
- }
- uint64_t vmx_read_segment_base(CPUState *cpu, X86Seg seg)
- {
--    return rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
-+    return rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
- }
- x68_segment_selector vmx_read_segment_selector(CPUState *cpu, X86Seg seg)
- {
-     x68_segment_selector sel;
--    sel.sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
-+    sel.sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
-     return sel;
- }
- void vmx_write_segment_selector(struct CPUState *cpu, x68_segment_selector selector, X86Seg seg)
- {
--    wvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector, selector.sel);
-+    wvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector, selector.sel);
- }
- void vmx_read_segment_descriptor(struct CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
- {
--    desc->sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
--    desc->base = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
--    desc->limit = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
--    desc->ar = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
-+    desc->sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
-+    desc->base = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
-+    desc->limit = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
-+    desc->ar = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
- }
- void vmx_write_segment_descriptor(CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
- {
-     const struct vmx_segment_field *sf = &vmx_segment_fields[seg];
--    wvmcs(cpu->hvf_fd, sf->base, desc->base);
--    wvmcs(cpu->hvf_fd, sf->limit, desc->limit);
--    wvmcs(cpu->hvf_fd, sf->selector, desc->sel);
--    wvmcs(cpu->hvf_fd, sf->ar_bytes, desc->ar);
-+    wvmcs(cpu->hvf->fd, sf->base, desc->base);
-+    wvmcs(cpu->hvf->fd, sf->limit, desc->limit);
-+    wvmcs(cpu->hvf->fd, sf->selector, desc->sel);
-+    wvmcs(cpu->hvf->fd, sf->ar_bytes, desc->ar);
- }
- void x86_segment_descriptor_to_vmx(struct CPUState *cpu, x68_segment_selector selector, struct x86_segment_descriptor *desc, struct vmx_segment *vmx_desc)
-diff --git a/target/i386/hvf/x86_emu.c b/target/i386/hvf/x86_emu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/x86_emu.c
-+++ b/target/i386/hvf/x86_emu.c
-@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
-     switch (msr) {
-     case MSR_IA32_TSC:
--        val = rdtscp() + rvmcs(cpu->hvf_fd, VMCS_TSC_OFFSET);
-+        val = rdtscp() + rvmcs(cpu->hvf->fd, VMCS_TSC_OFFSET);
-         break;
-     case MSR_IA32_APICBASE:
-         val = cpu_get_apic_base(X86_CPU(cpu)->apic_state);
-@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
-         val = x86_cpu->ucode_rev;
-         break;
-     case MSR_EFER:
--        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER);
-+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER);
-         break;
-     case MSR_FSBASE:
--        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE);
-+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE);
-         break;
-     case MSR_GSBASE:
--        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE);
-+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE);
-         break;
-     case MSR_KERNELGSBASE:
--        val = rvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE);
-+        val = rvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE);
-         break;
-     case MSR_STAR:
-         abort();
-@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
-         cpu_set_apic_base(X86_CPU(cpu)->apic_state, data);
-         break;
-     case MSR_FSBASE:
--        wvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE, data);
-+        wvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE, data);
-         break;
-     case MSR_GSBASE:
--        wvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE, data);
-+        wvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE, data);
-         break;
-     case MSR_KERNELGSBASE:
--        wvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE, data);
-+        wvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE, data);
-         break;
-     case MSR_STAR:
-         abort();
-@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
-         break;
-     case MSR_EFER:
-         /*printf("new efer %llx\n", EFER(cpu));*/
--        wvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER, data);
-+        wvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER, data);
-         if (data & MSR_EFER_NXE) {
--            hv_vcpu_invalidate_tlb(cpu->hvf_fd);
-+            hv_vcpu_invalidate_tlb(cpu->hvf->fd);
-         }
-         break;
-     case MSR_MTRRphysBase(0):
-@@ -XXX,XX +XXX,XX @@ void load_regs(struct CPUState *cpu)
-     CPUX86State *env = &x86_cpu->env;
-     int i = 0;
--    RRX(env, R_EAX) = rreg(cpu->hvf_fd, HV_X86_RAX);
--    RRX(env, R_EBX) = rreg(cpu->hvf_fd, HV_X86_RBX);
--    RRX(env, R_ECX) = rreg(cpu->hvf_fd, HV_X86_RCX);
--    RRX(env, R_EDX) = rreg(cpu->hvf_fd, HV_X86_RDX);
--    RRX(env, R_ESI) = rreg(cpu->hvf_fd, HV_X86_RSI);
--    RRX(env, R_EDI) = rreg(cpu->hvf_fd, HV_X86_RDI);
--    RRX(env, R_ESP) = rreg(cpu->hvf_fd, HV_X86_RSP);
--    RRX(env, R_EBP) = rreg(cpu->hvf_fd, HV_X86_RBP);
-+    RRX(env, R_EAX) = rreg(cpu->hvf->fd, HV_X86_RAX);
-+    RRX(env, R_EBX) = rreg(cpu->hvf->fd, HV_X86_RBX);
-+    RRX(env, R_ECX) = rreg(cpu->hvf->fd, HV_X86_RCX);
-+    RRX(env, R_EDX) = rreg(cpu->hvf->fd, HV_X86_RDX);
-+    RRX(env, R_ESI) = rreg(cpu->hvf->fd, HV_X86_RSI);
-+    RRX(env, R_EDI) = rreg(cpu->hvf->fd, HV_X86_RDI);
-+    RRX(env, R_ESP) = rreg(cpu->hvf->fd, HV_X86_RSP);
-+    RRX(env, R_EBP) = rreg(cpu->hvf->fd, HV_X86_RBP);
-     for (i = 8; i < 16; i++) {
--        RRX(env, i) = rreg(cpu->hvf_fd, HV_X86_RAX + i);
-+        RRX(env, i) = rreg(cpu->hvf->fd, HV_X86_RAX + i);
-     }
--    env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
-+    env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
-     rflags_to_lflags(env);
--    env->eip = rreg(cpu->hvf_fd, HV_X86_RIP);
-+    env->eip = rreg(cpu->hvf->fd, HV_X86_RIP);
- }
- void store_regs(struct CPUState *cpu)
-@@ -XXX,XX +XXX,XX @@ void store_regs(struct CPUState *cpu)
-     CPUX86State *env = &x86_cpu->env;
-     int i = 0;
--    wreg(cpu->hvf_fd, HV_X86_RAX, RAX(env));
--    wreg(cpu->hvf_fd, HV_X86_RBX, RBX(env));
--    wreg(cpu->hvf_fd, HV_X86_RCX, RCX(env));
--    wreg(cpu->hvf_fd, HV_X86_RDX, RDX(env));
--    wreg(cpu->hvf_fd, HV_X86_RSI, RSI(env));
--    wreg(cpu->hvf_fd, HV_X86_RDI, RDI(env));
--    wreg(cpu->hvf_fd, HV_X86_RBP, RBP(env));
--    wreg(cpu->hvf_fd, HV_X86_RSP, RSP(env));
-+    wreg(cpu->hvf->fd, HV_X86_RAX, RAX(env));
-+    wreg(cpu->hvf->fd, HV_X86_RBX, RBX(env));
-+    wreg(cpu->hvf->fd, HV_X86_RCX, RCX(env));
-+    wreg(cpu->hvf->fd, HV_X86_RDX, RDX(env));
-+    wreg(cpu->hvf->fd, HV_X86_RSI, RSI(env));
-+    wreg(cpu->hvf->fd, HV_X86_RDI, RDI(env));
-+    wreg(cpu->hvf->fd, HV_X86_RBP, RBP(env));
-+    wreg(cpu->hvf->fd, HV_X86_RSP, RSP(env));
-     for (i = 8; i < 16; i++) {
--        wreg(cpu->hvf_fd, HV_X86_RAX + i, RRX(env, i));
-+        wreg(cpu->hvf->fd, HV_X86_RAX + i, RRX(env, i));
-     }
-     lflags_to_rflags(env);
--    wreg(cpu->hvf_fd, HV_X86_RFLAGS, env->eflags);
-+    wreg(cpu->hvf->fd, HV_X86_RFLAGS, env->eflags);
-     macvm_set_rip(cpu, env->eip);
- }
-diff --git a/target/i386/hvf/x86_mmu.c b/target/i386/hvf/x86_mmu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/x86_mmu.c
-+++ b/target/i386/hvf/x86_mmu.c
-@@ -XXX,XX +XXX,XX @@ static bool test_pt_entry(struct CPUState *cpu, struct gpt_translation *pt,
-         pt->err_code |= MMU_PAGE_PT;
-     }
--    uint32_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
-+    uint32_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
-     /* check protection */
-     if (cr0 & CR0_WP) {
-         if (pt->write_access && !pte_write_access(pte)) {
-@@ -XXX,XX +XXX,XX @@ static bool walk_gpt(struct CPUState *cpu, target_ulong addr, int err_code,
- {
-     int top_level, level;
-     bool is_large = false;
--    target_ulong cr3 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR3);
-+    target_ulong cr3 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR3);
-     uint64_t page_mask = pae ? PAE_PTE_PAGE_MASK : LEGACY_PTE_PAGE_MASK;
-     memset(pt, 0, sizeof(*pt));
-diff --git a/target/i386/hvf/x86_task.c b/target/i386/hvf/x86_task.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/x86_task.c
-+++ b/target/i386/hvf/x86_task.c
-@@ -XXX,XX +XXX,XX @@ static void load_state_from_tss32(CPUState *cpu, struct x86_tss_segment32 *tss)
-     X86CPU *x86_cpu = X86_CPU(cpu);
-     CPUX86State *env = &x86_cpu->env;
--    wvmcs(cpu->hvf_fd, VMCS_GUEST_CR3, tss->cr3);
-+    wvmcs(cpu->hvf->fd, VMCS_GUEST_CR3, tss->cr3);
-     env->eip = tss->eip;
-     env->eflags = tss->eflags | 2;
-@@ -XXX,XX +XXX,XX @@ static int task_switch_32(CPUState *cpu, x68_segment_selector tss_sel, x68_segme
- void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int reason, bool gate_valid, uint8_t gate, uint64_t gate_type)
- {
--    uint64_t rip = rreg(cpu->hvf_fd, HV_X86_RIP);
-+    uint64_t rip = rreg(cpu->hvf->fd, HV_X86_RIP);
-     if (!gate_valid || (gate_type != VMCS_INTR_T_HWEXCEPTION &&
-                         gate_type != VMCS_INTR_T_HWINTR &&
-                         gate_type != VMCS_INTR_T_NMI)) {
--        int ins_len = rvmcs(cpu->hvf_fd, VMCS_EXIT_INSTRUCTION_LENGTH);
-+        int ins_len = rvmcs(cpu->hvf->fd, VMCS_EXIT_INSTRUCTION_LENGTH);
-         macvm_set_rip(cpu, rip + ins_len);
-         return;
-     }
-@@ -XXX,XX +XXX,XX @@ void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int rea
-         //ret = task_switch_16(cpu, tss_sel, old_tss_sel, old_tss_base, &next_tss_desc);
-         VM_PANIC("task_switch_16");
--    macvm_set_cr0(cpu->hvf_fd, rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0) | CR0_TS);
-+    macvm_set_cr0(cpu->hvf->fd, rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0) | CR0_TS);
-     x86_segment_descriptor_to_vmx(cpu, tss_sel, &next_tss_desc, &vmx_seg);
-     vmx_write_segment_descriptor(cpu, &vmx_seg, R_TR);
-     store_regs(cpu);
--    hv_vcpu_invalidate_tlb(cpu->hvf_fd);
--    hv_vcpu_flush(cpu->hvf_fd);
-+    hv_vcpu_invalidate_tlb(cpu->hvf->fd);
-+    hv_vcpu_flush(cpu->hvf->fd);
- }
-diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/hvf/x86hvf.c
-+++ b/target/i386/hvf/x86hvf.c
-@@ -XXX,XX +XXX,XX @@ void hvf_put_xsave(CPUState *cpu_state)
-     x86_cpu_xsave_all_areas(X86_CPU(cpu_state), xsave);
--    if (hv_vcpu_write_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
-+    if (hv_vcpu_write_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
-         abort();
-     }
- }
-@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
-     CPUX86State *env = &X86_CPU(cpu_state)->env;
-     struct vmx_segment seg;
--    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
--    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
-+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
-+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
--    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
--    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
-+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
-+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
--    /* wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR2, env->cr[2]); */
--    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3, env->cr[3]);
-+    /* wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR2, env->cr[2]); */
-+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3, env->cr[3]);
-     vmx_update_tpr(cpu_state);
--    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER, env->efer);
-+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER, env->efer);
--    macvm_set_cr4(cpu_state->hvf_fd, env->cr[4]);
--    macvm_set_cr0(cpu_state->hvf_fd, env->cr[0]);
-+    macvm_set_cr4(cpu_state->hvf->fd, env->cr[4]);
-+    macvm_set_cr0(cpu_state->hvf->fd, env->cr[0]);
-     hvf_set_segment(cpu_state, &seg, &env->segs[R_CS], false);
-     vmx_write_segment_descriptor(cpu_state, &seg, R_CS);
-@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
-     hvf_set_segment(cpu_state, &seg, &env->ldt, false);
-     vmx_write_segment_descriptor(cpu_state, &seg, R_LDTR);
--    hv_vcpu_flush(cpu_state->hvf_fd);
-+    hv_vcpu_flush(cpu_state->hvf->fd);
- }
- void hvf_put_msrs(CPUState *cpu_state)
- {
-     CPUX86State *env = &X86_CPU(cpu_state)->env;
--    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS,
-+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS,
-                       env->sysenter_cs);
--    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP,
-+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP,
-                       env->sysenter_esp);
--    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP,
-+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP,
-                       env->sysenter_eip);
--    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_STAR, env->star);
-+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_STAR, env->star);
- #ifdef TARGET_X86_64
--    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_CSTAR, env->cstar);
--    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, env->kernelgsbase);
--    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FMASK, env->fmask);
--    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_LSTAR, env->lstar);
-+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_CSTAR, env->cstar);
-+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, env->kernelgsbase);
-+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FMASK, env->fmask);
-+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_LSTAR, env->lstar);
- #endif
--    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_GSBASE, env->segs[R_GS].base);
--    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FSBASE, env->segs[R_FS].base);
-+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_GSBASE, env->segs[R_GS].base);
-+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FSBASE, env->segs[R_FS].base);
- }
-@@ -XXX,XX +XXX,XX @@ void hvf_get_xsave(CPUState *cpu_state)
-     xsave = X86_CPU(cpu_state)->env.xsave_buf;
--    if (hv_vcpu_read_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
-+    if (hv_vcpu_read_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
-         abort();
-     }
-@@ -XXX,XX +XXX,XX @@ void hvf_get_segments(CPUState *cpu_state)
-     vmx_read_segment_descriptor(cpu_state, &seg, R_LDTR);
-     hvf_get_segment(&env->ldt, &seg);
--    env->idt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
--    env->idt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE);
--    env->gdt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
--    env->gdt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE);
-+    env->idt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
-+    env->idt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE);
-+    env->gdt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
-+    env->gdt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE);
--    env->cr[0] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR0);
-+    env->cr[0] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR0);
-     env->cr[2] = 0;
--    env->cr[3] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3);
--    env->cr[4] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR4);
-+    env->cr[3] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3);
-+    env->cr[4] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR4);
--    env->efer = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER);
-+    env->efer = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER);
- }
- void hvf_get_msrs(CPUState *cpu_state)
-@@ -XXX,XX +XXX,XX @@ void hvf_get_msrs(CPUState *cpu_state)
-     CPUX86State *env = &X86_CPU(cpu_state)->env;
-     uint64_t tmp;
--    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS, &tmp);
-+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS, &tmp);
-     env->sysenter_cs = tmp;
--    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP, &tmp);
-+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP, &tmp);
-     env->sysenter_esp = tmp;
--    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP, &tmp);
-+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP, &tmp);
-     env->sysenter_eip = tmp;
--    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_STAR, &env->star);
-+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_STAR, &env->star);
- #ifdef TARGET_X86_64
--    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_CSTAR, &env->cstar);
--    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, &env->kernelgsbase);
--    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_FMASK, &env->fmask);
--    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_LSTAR, &env->lstar);
-+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_CSTAR, &env->cstar);
-+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, &env->kernelgsbase);
-+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_FMASK, &env->fmask);
-+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_LSTAR, &env->lstar);
- #endif
--    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_APICBASE, &tmp);
-+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_APICBASE, &tmp);
--    env->tsc = rdtscp() + rvmcs(cpu_state->hvf_fd, VMCS_TSC_OFFSET);
-+    env->tsc = rdtscp() + rvmcs(cpu_state->hvf->fd, VMCS_TSC_OFFSET);
- }
- int hvf_put_registers(CPUState *cpu_state)
-@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
-     X86CPU *x86cpu = X86_CPU(cpu_state);
-     CPUX86State *env = &x86cpu->env;
--    wreg(cpu_state->hvf_fd, HV_X86_RAX, env->regs[R_EAX]);
--    wreg(cpu_state->hvf_fd, HV_X86_RBX, env->regs[R_EBX]);
--    wreg(cpu_state->hvf_fd, HV_X86_RCX, env->regs[R_ECX]);
--    wreg(cpu_state->hvf_fd, HV_X86_RDX, env->regs[R_EDX]);
--    wreg(cpu_state->hvf_fd, HV_X86_RBP, env->regs[R_EBP]);
--    wreg(cpu_state->hvf_fd, HV_X86_RSP, env->regs[R_ESP]);
--    wreg(cpu_state->hvf_fd, HV_X86_RSI, env->regs[R_ESI]);
--    wreg(cpu_state->hvf_fd, HV_X86_RDI, env->regs[R_EDI]);
--    wreg(cpu_state->hvf_fd, HV_X86_R8, env->regs[8]);
--    wreg(cpu_state->hvf_fd, HV_X86_R9, env->regs[9]);
--    wreg(cpu_state->hvf_fd, HV_X86_R10, env->regs[10]);
--    wreg(cpu_state->hvf_fd, HV_X86_R11, env->regs[11]);
--    wreg(cpu_state->hvf_fd, HV_X86_R12, env->regs[12]);
--    wreg(cpu_state->hvf_fd, HV_X86_R13, env->regs[13]);
--    wreg(cpu_state->hvf_fd, HV_X86_R14, env->regs[14]);
--    wreg(cpu_state->hvf_fd, HV_X86_R15, env->regs[15]);
--    wreg(cpu_state->hvf_fd, HV_X86_RFLAGS, env->eflags);
--    wreg(cpu_state->hvf_fd, HV_X86_RIP, env->eip);
-+    wreg(cpu_state->hvf->fd, HV_X86_RAX, env->regs[R_EAX]);
-+    wreg(cpu_state->hvf->fd, HV_X86_RBX, env->regs[R_EBX]);
-+    wreg(cpu_state->hvf->fd, HV_X86_RCX, env->regs[R_ECX]);
-+    wreg(cpu_state->hvf->fd, HV_X86_RDX, env->regs[R_EDX]);
-+    wreg(cpu_state->hvf->fd, HV_X86_RBP, env->regs[R_EBP]);
-+    wreg(cpu_state->hvf->fd, HV_X86_RSP, env->regs[R_ESP]);
-+    wreg(cpu_state->hvf->fd, HV_X86_RSI, env->regs[R_ESI]);
-+    wreg(cpu_state->hvf->fd, HV_X86_RDI, env->regs[R_EDI]);
-+    wreg(cpu_state->hvf->fd, HV_X86_R8, env->regs[8]);
-+    wreg(cpu_state->hvf->fd, HV_X86_R9, env->regs[9]);
-+    wreg(cpu_state->hvf->fd, HV_X86_R10, env->regs[10]);
-+    wreg(cpu_state->hvf->fd, HV_X86_R11, env->regs[11]);
-+    wreg(cpu_state->hvf->fd, HV_X86_R12, env->regs[12]);
-+    wreg(cpu_state->hvf->fd, HV_X86_R13, env->regs[13]);
-+    wreg(cpu_state->hvf->fd, HV_X86_R14, env->regs[14]);
-+    wreg(cpu_state->hvf->fd, HV_X86_R15, env->regs[15]);
-+    wreg(cpu_state->hvf->fd, HV_X86_RFLAGS, env->eflags);
-+    wreg(cpu_state->hvf->fd, HV_X86_RIP, env->eip);
--    wreg(cpu_state->hvf_fd, HV_X86_XCR0, env->xcr0);
-+    wreg(cpu_state->hvf->fd, HV_X86_XCR0, env->xcr0);
-     hvf_put_xsave(cpu_state);
-@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
-     hvf_put_msrs(cpu_state);
--    wreg(cpu_state->hvf_fd, HV_X86_DR0, env->dr[0]);
--    wreg(cpu_state->hvf_fd, HV_X86_DR1, env->dr[1]);
--    wreg(cpu_state->hvf_fd, HV_X86_DR2, env->dr[2]);
--    wreg(cpu_state->hvf_fd, HV_X86_DR3, env->dr[3]);
--    wreg(cpu_state->hvf_fd, HV_X86_DR4, env->dr[4]);
--    wreg(cpu_state->hvf_fd, HV_X86_DR5, env->dr[5]);
--    wreg(cpu_state->hvf_fd, HV_X86_DR6, env->dr[6]);
--    wreg(cpu_state->hvf_fd, HV_X86_DR7, env->dr[7]);
-+    wreg(cpu_state->hvf->fd, HV_X86_DR0, env->dr[0]);
-+    wreg(cpu_state->hvf->fd, HV_X86_DR1, env->dr[1]);
-+    wreg(cpu_state->hvf->fd, HV_X86_DR2, env->dr[2]);
-+    wreg(cpu_state->hvf->fd, HV_X86_DR3, env->dr[3]);
-+    wreg(cpu_state->hvf->fd, HV_X86_DR4, env->dr[4]);
-+    wreg(cpu_state->hvf->fd, HV_X86_DR5, env->dr[5]);
-+    wreg(cpu_state->hvf->fd, HV_X86_DR6, env->dr[6]);
-+    wreg(cpu_state->hvf->fd, HV_X86_DR7, env->dr[7]);
-     return 0;
- }
-@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
-     X86CPU *x86cpu = X86_CPU(cpu_state);
-     CPUX86State *env = &x86cpu->env;
--    env->regs[R_EAX] = rreg(cpu_state->hvf_fd, HV_X86_RAX);
--    env->regs[R_EBX] = rreg(cpu_state->hvf_fd, HV_X86_RBX);
--    env->regs[R_ECX] = rreg(cpu_state->hvf_fd, HV_X86_RCX);
--    env->regs[R_EDX] = rreg(cpu_state->hvf_fd, HV_X86_RDX);
--    env->regs[R_EBP] = rreg(cpu_state->hvf_fd, HV_X86_RBP);
--    env->regs[R_ESP] = rreg(cpu_state->hvf_fd, HV_X86_RSP);
--    env->regs[R_ESI] = rreg(cpu_state->hvf_fd, HV_X86_RSI);
--    env->regs[R_EDI] = rreg(cpu_state->hvf_fd, HV_X86_RDI);
--    env->regs[8] = rreg(cpu_state->hvf_fd, HV_X86_R8);
--    env->regs[9] = rreg(cpu_state->hvf_fd, HV_X86_R9);
--    env->regs[10] = rreg(cpu_state->hvf_fd, HV_X86_R10);
--    env->regs[11] = rreg(cpu_state->hvf_fd, HV_X86_R11);
--    env->regs[12] = rreg(cpu_state->hvf_fd, HV_X86_R12);
--    env->regs[13] = rreg(cpu_state->hvf_fd, HV_X86_R13);
--    env->regs[14] = rreg(cpu_state->hvf_fd, HV_X86_R14);
--    env->regs[15] = rreg(cpu_state->hvf_fd, HV_X86_R15);
-+    env->regs[R_EAX] = rreg(cpu_state->hvf->fd, HV_X86_RAX);
-+    env->regs[R_EBX] = rreg(cpu_state->hvf->fd, HV_X86_RBX);
-+    env->regs[R_ECX] = rreg(cpu_state->hvf->fd, HV_X86_RCX);
-+    env->regs[R_EDX] = rreg(cpu_state->hvf->fd, HV_X86_RDX);
-+    env->regs[R_EBP] = rreg(cpu_state->hvf->fd, HV_X86_RBP);
-+    env->regs[R_ESP] = rreg(cpu_state->hvf->fd, HV_X86_RSP);
-+    env->regs[R_ESI] = rreg(cpu_state->hvf->fd, HV_X86_RSI);
-+    env->regs[R_EDI] = rreg(cpu_state->hvf->fd, HV_X86_RDI);
-+    env->regs[8] = rreg(cpu_state->hvf->fd, HV_X86_R8);
-+    env->regs[9] = rreg(cpu_state->hvf->fd, HV_X86_R9);
-+    env->regs[10] = rreg(cpu_state->hvf->fd, HV_X86_R10);
-+    env->regs[11] = rreg(cpu_state->hvf->fd, HV_X86_R11);
-+    env->regs[12] = rreg(cpu_state->hvf->fd, HV_X86_R12);
-+    env->regs[13] = rreg(cpu_state->hvf->fd, HV_X86_R13);
-+    env->regs[14] = rreg(cpu_state->hvf->fd, HV_X86_R14);
-+    env->regs[15] = rreg(cpu_state->hvf->fd, HV_X86_R15);
--    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
--    env->eip = rreg(cpu_state->hvf_fd, HV_X86_RIP);
-+    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
-+    env->eip = rreg(cpu_state->hvf->fd, HV_X86_RIP);
-     hvf_get_xsave(cpu_state);
--    env->xcr0 = rreg(cpu_state->hvf_fd, HV_X86_XCR0);
-+    env->xcr0 = rreg(cpu_state->hvf->fd, HV_X86_XCR0);
-     hvf_get_segments(cpu_state);
-     hvf_get_msrs(cpu_state);
--    env->dr[0] = rreg(cpu_state->hvf_fd, HV_X86_DR0);
--    env->dr[1] = rreg(cpu_state->hvf_fd, HV_X86_DR1);
--    env->dr[2] = rreg(cpu_state->hvf_fd, HV_X86_DR2);
--    env->dr[3] = rreg(cpu_state->hvf_fd, HV_X86_DR3);
--    env->dr[4] = rreg(cpu_state->hvf_fd, HV_X86_DR4);
--    env->dr[5] = rreg(cpu_state->hvf_fd, HV_X86_DR5);
--    env->dr[6] = rreg(cpu_state->hvf_fd, HV_X86_DR6);
--    env->dr[7] = rreg(cpu_state->hvf_fd, HV_X86_DR7);
-+    env->dr[0] = rreg(cpu_state->hvf->fd, HV_X86_DR0);
-+    env->dr[1] = rreg(cpu_state->hvf->fd, HV_X86_DR1);
-+    env->dr[2] = rreg(cpu_state->hvf->fd, HV_X86_DR2);
-+    env->dr[3] = rreg(cpu_state->hvf->fd, HV_X86_DR3);
-+    env->dr[4] = rreg(cpu_state->hvf->fd, HV_X86_DR4);
-+    env->dr[5] = rreg(cpu_state->hvf->fd, HV_X86_DR5);
-+    env->dr[6] = rreg(cpu_state->hvf->fd, HV_X86_DR6);
-+    env->dr[7] = rreg(cpu_state->hvf->fd, HV_X86_DR7);
-     x86_update_hflags(env);
-     return 0;
-@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
- static void vmx_set_int_window_exiting(CPUState *cpu)
- {
-      uint64_t val;
--     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
--     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
-+     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
-+     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
-              VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
- }
- void vmx_clear_int_window_exiting(CPUState *cpu)
- {
-      uint64_t val;
--     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
--     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
-+     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
-+     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
-              ~VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
- }
-@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
-     uint64_t info = 0;
-     if (have_event) {
-         info = vector | intr_type | VMCS_INTR_VALID;
--        uint64_t reason = rvmcs(cpu_state->hvf_fd, VMCS_EXIT_REASON);
-+        uint64_t reason = rvmcs(cpu_state->hvf->fd, VMCS_EXIT_REASON);
-         if (env->nmi_injected && reason != EXIT_REASON_TASK_SWITCH) {
-             vmx_clear_nmi_blocking(cpu_state);
-         }
-@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
-             info &= ~(1 << 12); /* clear undefined bit */
-             if (intr_type == VMCS_INTR_T_SWINTR ||
-                 intr_type == VMCS_INTR_T_SWEXCEPTION) {
--                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
-+                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
-             }
-             if (env->has_error_code) {
--                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_EXCEPTION_ERROR,
-+                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_EXCEPTION_ERROR,
-                       env->error_code);
-                 /* Indicate that VMCS_ENTRY_EXCEPTION_ERROR is valid */
-                 info |= VMCS_INTR_DEL_ERRCODE;
-             }
-             /*printf("reinject  %lx err %d\n", info, err);*/
--            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
-+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
-         };
-     }
-@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
-         if (!(env->hflags2 & HF2_NMI_MASK) && !(info & VMCS_INTR_VALID)) {
-             cpu_state->interrupt_request &= ~CPU_INTERRUPT_NMI;
-             info = VMCS_INTR_VALID | VMCS_INTR_T_NMI | EXCP02_NMI;
--            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
-+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
-         } else {
-             vmx_set_nmi_window_exiting(cpu_state);
-         }
-@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
-         int line = cpu_get_pic_interrupt(&x86cpu->env);
-         cpu_state->interrupt_request &= ~CPU_INTERRUPT_HARD;
-         if (line >= 0) {
--            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, line |
-+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, line |
-                   VMCS_INTR_VALID | VMCS_INTR_T_HWINTR);
-         }
-     }
-@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
-     X86CPU *cpu = X86_CPU(cpu_state);
-     CPUX86State *env = &cpu->env;
--    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
-+    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
-     if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
-         cpu_synchronize_state(cpu_state);
---
-2.20.1

-[PULL 40/45] tests/qtest/bios-tables-test: Check for dup2() failure
+Deleted patch
-Coverity notes that we don't check for dup2() failing.  Add some
-assertions so that if it does ever happen we get some indication.
-(This is similar to how we handle other "don't expect this syscall to
-fail" checks in this test code.)
-Fixes: Coverity CID 1432346
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
-Message-id: 20210525134458.6675-2-peter.maydell@linaro.org
----
- tests/qtest/bios-tables-test.c | 8 ++++++--
- 1 file changed, 6 insertions(+), 2 deletions(-)
-diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/bios-tables-test.c
-+++ b/tests/qtest/bios-tables-test.c
-@@ -XXX,XX +XXX,XX @@ static void test_acpi_asl(test_data *data)
-                                                  exp_sdt->asl_file, sdt->asl_file);
-                     int out = dup(STDOUT_FILENO);
-                     int ret G_GNUC_UNUSED;
-+                    int dupret;
--                    dup2(STDERR_FILENO, STDOUT_FILENO);
-+                    g_assert(out >= 0);
-+                    dupret = dup2(STDERR_FILENO, STDOUT_FILENO);
-+                    g_assert(dupret >= 0);
-                     ret = system(diff) ;
--                    dup2(out, STDOUT_FILENO);
-+                    dupret = dup2(out, STDOUT_FILENO);
-+                    g_assert(dupret >= 0);
-                     close(out);
-                     g_free(diff);
-                 }
---
-2.20.1

-[PULL 41/45] tests/qtest/e1000e-test: Check qemu_recv() succeeded
+Deleted patch
-The e1000e_send_verify() test calls qemu_recv() but doesn't
-check that the call succeeded, which annoys Coverity. Add
-an explicit test check for the length of the data.
-(This is a test check, not a "we assume this syscall always
-succeeds", so we use g_assert_cmpint() rather than g_assert().)
-Fixes: Coverity CID 1432324
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
-Message-id: 20210525134458.6675-3-peter.maydell@linaro.org
----
- tests/qtest/e1000e-test.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-diff --git a/tests/qtest/e1000e-test.c b/tests/qtest/e1000e-test.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/e1000e-test.c
-+++ b/tests/qtest/e1000e-test.c
-@@ -XXX,XX +XXX,XX @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a
-     /* Check data sent to the backend */
-     ret = qemu_recv(test_sockets[0], &recv_len, sizeof(recv_len), 0);
-     g_assert_cmpint(ret, == , sizeof(recv_len));
--    qemu_recv(test_sockets[0], buffer, 64, 0);
-+    ret = qemu_recv(test_sockets[0], buffer, 64, 0);
-+    g_assert_cmpint(ret, >=, 5);
-     g_assert_cmpstr(buffer, == , "TEST");
-     /* Free test data buffer */
---
-2.20.1

-[PULL 42/45] tests/qtest/hd-geo-test: Fix checks on mkstemp() return value
+Deleted patch
-Coverity notices that the checks against mkstemp() failing in
-create_qcow2_with_mbr() are wrong: mkstemp returns -1 on failure but
-the check is just "g_assert(fd)".  Fix to use "g_assert(fd >= 0)",
-matching the correct check in create_test_img().
-Fixes: Coverity CID 1432274
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
-Message-id: 20210525134458.6675-4-peter.maydell@linaro.org
----
- tests/qtest/hd-geo-test.c | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-diff --git a/tests/qtest/hd-geo-test.c b/tests/qtest/hd-geo-test.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/hd-geo-test.c
-+++ b/tests/qtest/hd-geo-test.c
-@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
-     }
-     fd = mkstemp(raw_path);
--    g_assert(fd);
-+    g_assert(fd >= 0);
-     close(fd);
-     fd = open(raw_path, O_WRONLY);
-@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
-     close(fd);
-     fd = mkstemp(qcow2_path);
--    g_assert(fd);
-+    g_assert(fd >= 0);
-     close(fd);
-     qemu_img_path = getenv("QTEST_QEMU_IMG");
---
-2.20.1
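
As a standalone illustration of the hd-geo-test fix above (not part of the patch): mkstemp() reports failure by returning -1, which is non-zero and so passes a bare truthiness check. The two helper names below are hypothetical and exist only to contrast the old and new conditions.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the old check, g_assert(fd):
 * any non-zero value, including -1, counts as success.
 */
static bool old_check_passes(int fd)
{
    return fd != 0;
}

/*
 * Hypothetical helper mirroring the fixed check, g_assert(fd >= 0):
 * only non-negative descriptors count as success.
 */
static bool new_check_passes(int fd)
{
    return fd >= 0;
}
```

Note that the old form also rejects descriptor 0, which is a valid return value, so it is wrong in both directions.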

-[PULL 43/45] tests/qtest/pflash-cfi02-test: Avoid potential integer overflow
+Deleted patch
-Coverity points out that we calculate a 64-bit value using 32-bit
-arithmetic; add the cast to force the multiply to be done as 64-bits.
-(The overflow will never happen with the current test data.)
-Fixes: Coverity CID 1432320
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
-Message-id: 20210525134458.6675-5-peter.maydell@linaro.org
----
- tests/qtest/pflash-cfi02-test.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/tests/qtest/pflash-cfi02-test.c b/tests/qtest/pflash-cfi02-test.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/pflash-cfi02-test.c
-+++ b/tests/qtest/pflash-cfi02-test.c
-@@ -XXX,XX +XXX,XX @@ static void test_geometry(const void *opaque)
-     for (int region = 0; region < nb_erase_regions; ++region) {
-         for (uint32_t i = 0; i < c->nb_blocs[region]; ++i) {
--            uint64_t byte_addr = i * c->sector_len[region];
-+            uint64_t byte_addr = (uint64_t)i * c->sector_len[region];
-             g_assert_cmphex(flash_read(c, byte_addr), ==, bank_mask(c));
-         }
-     }
---
-2.20.1
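
A minimal sketch of the issue the pflash-cfi02-test change addresses (not part of the patch). It assumes both operands are 32-bit unsigned values and that int is 32 bits wide; the function names are hypothetical. Without the cast the product is computed in 32 bits and wraps before it is widened.

```c
#include <stdint.h>

/* Multiply done in 32 bits, then widened: the high bits are already lost. */
static uint64_t byte_addr_32bit(uint32_t i, uint32_t sector_len)
{
    return i * sector_len;
}

/* Cast one operand first so the whole multiply is done in 64 bits. */
static uint64_t byte_addr_64bit(uint32_t i, uint32_t sector_len)
{
    return (uint64_t)i * sector_len;
}
```

With i = 0x10000 and sector_len = 0x10000 the first form wraps to 0, while the second yields 0x100000000.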

-[PULL 44/45] tests/qtest/tpm-tests: Remove unnecessary NULL checks
+Deleted patch
-Coverity points out that in tpm_test_swtpm_migration_test() we
-assume that src_tpm_addr and dst_tpm_addr are non-NULL (we
-pass them to tpm_util_migration_start_qemu() which will
-unconditionally dereference them) but then later explicitly
-check them for NULL. Remove the pointless checks.
-Fixes: Coverity CID 1432367, 1432359
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
-Message-id: 20210525134458.6675-6-peter.maydell@linaro.org
----
- tests/qtest/tpm-tests.c | 12 ++++--------
- 1 file changed, 4 insertions(+), 8 deletions(-)
-diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/tpm-tests.c
-+++ b/tests/qtest/tpm-tests.c
-@@ -XXX,XX +XXX,XX @@ void tpm_test_swtpm_migration_test(const char *src_tpm_path,
-     qtest_quit(src_qemu);
-     tpm_util_swtpm_kill(dst_tpm_pid);
--    if (dst_tpm_addr) {
--        g_unlink(dst_tpm_addr->u.q_unix.path);
--        qapi_free_SocketAddress(dst_tpm_addr);
--    }
-+    g_unlink(dst_tpm_addr->u.q_unix.path);
-+    qapi_free_SocketAddress(dst_tpm_addr);
-     tpm_util_swtpm_kill(src_tpm_pid);
--    if (src_tpm_addr) {
--        g_unlink(src_tpm_addr->u.q_unix.path);
--        qapi_free_SocketAddress(src_tpm_addr);
--    }
-+    g_unlink(src_tpm_addr->u.q_unix.path);
-+    qapi_free_SocketAddress(src_tpm_addr);
- }
---
-2.20.1

-[PULL 45/45] tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed
+Deleted patch
-Coverity complains that we don't check for failures from dup()
-and mkstemp(); add asserts that these syscalls succeeded.
-Fixes: Coverity CID 1432516, 1432574
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20210525134458.6675-7-peter.maydell@linaro.org
----
- tests/unit/test-vmstate.c | 5 ++++-
- 1 file changed, 4 insertions(+), 1 deletion(-)
-diff --git a/tests/unit/test-vmstate.c b/tests/unit/test-vmstate.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/unit/test-vmstate.c
-+++ b/tests/unit/test-vmstate.c
-@@ -XXX,XX +XXX,XX @@ static int temp_fd;
- /* Duplicate temp_fd and seek to the beginning of the file */
- static QEMUFile *open_test_file(bool write)
- {
--    int fd = dup(temp_fd);
-+    int fd;
-     QIOChannel *ioc;
-     QEMUFile *f;
-+    fd = dup(temp_fd);
-+    g_assert(fd >= 0);
-     lseek(fd, 0, SEEK_SET);
-     if (write) {
-         g_assert_cmpint(ftruncate(fd, 0), ==, 0);
-@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
-     g_autofree char *temp_file = g_strdup_printf("%s/vmst.test.XXXXXX",
-                                                  g_get_tmp_dir());
-     temp_fd = mkstemp(temp_file);
-+    g_assert(temp_fd >= 0);
-     module_call_init(MODULE_INIT_QOM);
---
-2.20.1

The following changes since commit a97978bcc2d1f650c7d411428806e5b03082b8c7:

Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.1-20210603' into staging (2021-06-03 10:00:35 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210603

for you to fetch changes up to 1c861885894d840235954060050d240259f5340b:

tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed (2021-06-03 16:43:27 +0100)

----------------------------------------------------------------
target-arm queue:
 * Some not-yet-enabled preliminaries for M-profile MVE support
 * Consistently use "Cortex-Axx", not "Cortex Axx" in docs, comments
 * docs: Fix installation of man pages with Sphinx 4.x
 * Mark LDS{MIN,MAX} as signed operations
 * Fix missing syndrome value for DAIF and PAC check exceptions
 * Implement BFloat16 extensions
 * Refactoring of hvf accelerator code in preparation for aarch64 support
 * Fix some coverity nits in test code

----------------------------------------------------------------
Alexander Graf (12):
      hvf: Move assert_hvf_ok() into common directory
      hvf: Move vcpu thread functions into common directory
      hvf: Move cpu functions into common directory
      hvf: Move hvf internal definitions into common header
      hvf: Make hvf_set_phys_mem() static
      hvf: Remove use of hv_uvaddr_t and hv_gpaddr_t
      hvf: Split out common code on vcpu init and destroy
      hvf: Use cpu_synchronize_state()
      hvf: Make synchronize functions static
      hvf: Remove hvf-accel-ops.h
      hvf: Introduce hvf vcpu struct
      hvf: Simplify post reset/init/loadvm hooks

Damien Goutte-Gattat (1):
      docs: Fix installation of man pages with Sphinx 4.x

Jamie Iles (4):
      target/arm: fix missing exception class
      target/arm: fold do_raise_exception into raise_exception
      target/arm: use raise_exception_ra for MTE check failure
      target/arm: use raise_exception_ra for stack limit exception

Peter Maydell (15):
      target/arm: Add isar feature check functions for MVE
      target/arm: Update feature checks for insns which are "MVE or FP"
      target/arm: Move fpsp/fpdp isar check into callers of do_vfp_2op_sp/dp
      target/arm: Add MVE check to VMOV_reg_sp and VMOV_reg_dp
      target/arm: Fix return values in fp_sysreg_checks()
      target/arm: Implement M-profile VPR register
      target/arm: Make FPSCR.LTPSIZE writable for MVE
      target/arm: Allow board models to specify initial NS VTOR
      arm: Consistently use "Cortex-Axx", not "Cortex Axx"
      tests/qtest/bios-tables-test: Check for dup2() failure
      tests/qtest/e1000e-test: Check qemu_recv() succeeded
      tests/qtest/hd-geo-test: Fix checks on mkstemp() return value
      tests/qtest/pflash-cfi02-test: Avoid potential integer overflow
      tests/qtest/tpm-tests: Remove unnecessary NULL checks
      tests/unit/test-vmstate: Assert that dup() and mkstemp() succeed

Richard Henderson (13):
      target/arm: Mark LDS{MIN,MAX} as signed operations
      target/arm: Add isar_feature_{aa32, aa64, aa64_sve}_bf16
      target/arm: Unify unallocated path in disas_fp_1src
      target/arm: Implement scalar float32 to bfloat16 conversion
      target/arm: Implement vector float32 to bfloat16 conversion
      softfpu: Add float_round_to_odd_inf
      target/arm: Implement bfloat16 dot product (vector)
      target/arm: Implement bfloat16 dot product (indexed)
      target/arm: Implement bfloat16 matrix multiply accumulate
      target/arm: Implement bfloat widening fma (vector)
      target/arm: Implement bfloat widening fma (indexed)
      linux-user/aarch64: Enable hwcap bits for bfloat16
      target/arm: Enable BFloat16 extensions

Add the isar feature check functions we will need for v8.1M MVE:
 * a check for MVE present: this corresponds to the pseudocode's
   CheckDecodeFaults(ExtType_Mve)
 * a check for the optional floating-point part of MVE: this
   corresponds to CheckDecodeFaults(ExtType_MveFp)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-2-peter.maydell@linaro.org
---
 target/arm/cpu.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
     }
 }
 
+static inline bool isar_feature_aa32_mve(const ARMISARegisters *id)
+{
+    /*
+     * Return true if MVE is supported (either integer or floating point).
+     * We must check for M-profile as the MVFR1 field means something
+     * else for A-profile.
+     */
+    return isar_feature_aa32_mprofile(id) &&
+        FIELD_EX32(id->mvfr1, MVFR1, MVE) > 0;
+}
+
+static inline bool isar_feature_aa32_mve_fp(const ARMISARegisters *id)
+{
+    /*
+     * Return true if the floating-point part of MVE is supported.
+     * We must check for M-profile as the MVFR1 field means something
+     * else for A-profile.
+     */
+    return isar_feature_aa32_mprofile(id) &&
+        FIELD_EX32(id->mvfr1, MVFR1, MVE) >= 2;
+}
+
 static inline bool isar_feature_aa32_vfp_simd(const ARMISARegisters *id)
 {
     /*
-- 
2.20.1
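
A standalone sketch of the two checks added above, for readers without the QEMU tree to hand. It assumes the M-profile MVFR1.MVE field sits at bits [11:8] (the patch itself only uses FIELD_EX32 and does not show the field position), and it omits the M-profile test that the real functions perform first. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the M-profile MVFR1.MVE field occupies bits [11:8]. */
#define MVFR1_MVE_SHIFT 8
#define MVFR1_MVE_MASK  0xfu

/* Extract the 4-bit MVE field from a raw MVFR1 value. */
static unsigned mvfr1_mve_field(uint32_t mvfr1)
{
    return (mvfr1 >> MVFR1_MVE_SHIFT) & MVFR1_MVE_MASK;
}

/* Any non-zero field value means at least integer MVE is present. */
static bool has_mve(uint32_t mvfr1)
{
    return mvfr1_mve_field(mvfr1) > 0;
}

/* A field value of 2 or more additionally means MVE floating point. */
static bool has_mve_fp(uint32_t mvfr1)
{
    return mvfr1_mve_field(mvfr1) >= 2;
}
```

A field value of 1 therefore passes the first check but not the second, which is the integer-only configuration.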

Some v8M instructions are present if either the floating point
extension or MVE is implemented.  Update our implementation of them
to check for MVE as well as for FP.

This is all the insns which use CheckDecodeFaults(ExtType_MveOrFp) or
CheckDecodeFaults(ExtType_MveOrDpFp) in their pseudocode, which are
essentially the loads and stores, moves and sysreg accesses, except
for VMOV_reg_sp and VMOV_reg_dp, which we handle in subsequent
patches because they need a refactor to provide a place to put the
new MVE check.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-3-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 48 +++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
     /* VMOV scalar to general purpose register */
     TCGv_i32 tmp;
 
-    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-    if (a->size == MO_32
-        ? !dc_isar_feature(aa32_fpsp_v2, s)
-        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-        return false;
+    /*
+     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
+     * all sizes, whether the CPU has fp or not.
+     */
+    if (!dc_isar_feature(aa32_mve, s)) {
+        if (a->size == MO_32
+            ? !dc_isar_feature(aa32_fpsp_v2, s)
+            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+            return false;
+        }
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
     /* VMOV general purpose register to scalar */
     TCGv_i32 tmp;
 
-    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
-    if (a->size == MO_32
-        ? !dc_isar_feature(aa32_fpsp_v2, s)
-        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
-        return false;
+    /*
+     * SIZE == MO_32 is a VFP instruction; otherwise NEON. MVE has
+     * all sizes, whether the CPU has fp or not.
+     */
+    if (!dc_isar_feature(aa32_mve, s)) {
+        if (a->size == MO_32
+            ? !dc_isar_feature(aa32_fpsp_v2, s)
+            : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
+            return false;
+        }
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
@@ -XXX,XX +XXX,XX @@ typedef enum FPSysRegCheckResult {
 
 static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
 {
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return FPSysRegCheckFailed;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
 {
     TCGv_i32 tmp;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
 {
     TCGv_i32 tmp;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
      * floating point register.  Note that this does not require support
      * for double precision arithmetic.
      */
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     uint32_t offset;
     TCGv_i32 addr, tmp;
 
-    if (!dc_isar_feature(aa32_fp16_arith, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     uint32_t offset;
     TCGv_i32 addr, tmp;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
     TCGv_i64 tmp;
 
     /* Note that this does not require support for double arithmetic.  */
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
     TCGv_i32 addr, tmp;
     int i, n;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
     int i, n;
 
     /* Note that this does not require support for double arithmetic.  */
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
+    if (!dc_isar_feature(aa32_fpsp_v2, s) && !dc_isar_feature(aa32_mve, s)) {
         return false;
     }
 
-- 
2.20.1

The do_vfp_2op_sp() and do_vfp_2op_dp() functions currently check
whether floating point is supported via the aa32_fpdp_v2 and
aa32_fpsp_v2 isar checks.  For v8.1M MVE support, the VMOV_reg trans
functions (but not any of the others) need to update this to also
allow the insn if MVE is implemented.  Move the check out of the do_
function and into its callsites (which are all implemented via the
DO_VFP_2OP macro), so we have a place to change the check for the
VMOV insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-4-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     int veclen = s->vec_len;
     TCGv_i32 f0, fd;
 
-    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
-        return false;
-    }
+    /* Note that the caller must check the aa32_fpsp_v2 feature. */
 
     if (!dc_isar_feature(aa32_fpshvec, s) &&
         (veclen != 0 || s->vec_stride != 0)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      */
     TCGv_i32 f0;
 
+    /* Note that the caller must check the aa32_fp16_arith feature */
+
     if (!dc_isar_feature(aa32_fp16_arith, s)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     int veclen = s->vec_len;
     TCGv_i64 f0, fd;
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
-        return false;
-    }
+    /* Note that the caller must check the aa32_fpdp_v2 feature. */
 
     /* UNDEF accesses to D16-D31 if they don't exist */
     if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     return true;
 }
 
-#define DO_VFP_2OP(INSN, PREC, FN)                              \
+#define DO_VFP_2OP(INSN, PREC, FN, CHECK)                       \
     static bool trans_##INSN##_##PREC(DisasContext *s,          \
                                       arg_##INSN##_##PREC *a)   \
     {                                                           \
+        if (!dc_isar_feature(CHECK, s)) {                       \
+            return false;                                       \
+        }                                                       \
         return do_vfp_2op_##PREC(s, FN, a->vd, a->vm);          \
     }
 
-DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32)
-DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64)
+DO_VFP_2OP(VMOV_reg, sp, tcg_gen_mov_i32, aa32_fpsp_v2)
+DO_VFP_2OP(VMOV_reg, dp, tcg_gen_mov_i64, aa32_fpdp_v2)
 
-DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh)
-DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss)
-DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd)
+DO_VFP_2OP(VABS, hp, gen_helper_vfp_absh, aa32_fp16_arith)
+DO_VFP_2OP(VABS, sp, gen_helper_vfp_abss, aa32_fpsp_v2)
+DO_VFP_2OP(VABS, dp, gen_helper_vfp_absd, aa32_fpdp_v2)
 
-DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh)
-DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs)
-DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd)
+DO_VFP_2OP(VNEG, hp, gen_helper_vfp_negh, aa32_fp16_arith)
+DO_VFP_2OP(VNEG, sp, gen_helper_vfp_negs, aa32_fpsp_v2)
+DO_VFP_2OP(VNEG, dp, gen_helper_vfp_negd, aa32_fpdp_v2)
 
 static void gen_VSQRT_hp(TCGv_i32 vd, TCGv_i32 vm)
 {
@@ -XXX,XX +XXX,XX @@ static void gen_VSQRT_dp(TCGv_i64 vd, TCGv_i64 vm)
     gen_helper_vfp_sqrtd(vd, vm, cpu_env);
 }
 
-DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp)
-DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp)
-DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp)
+DO_VFP_2OP(VSQRT, hp, gen_VSQRT_hp, aa32_fp16_arith)
+DO_VFP_2OP(VSQRT, sp, gen_VSQRT_sp, aa32_fpsp_v2)
+DO_VFP_2OP(VSQRT, dp, gen_VSQRT_dp, aa32_fpdp_v2)
 
 static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
 {
-- 
2.20.1
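
The shape of this refactor, reduced to a self-contained sketch: the shared worker no longer tests any feature, and each macro-generated wrapper names the feature it needs. Everything below (Ctx, feature_sp, feature_dp, do_2op, DO_2OP) is a hypothetical stand-in, not QEMU code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the translator's view of CPU features. */
typedef struct Ctx {
    bool has_sp;
    bool has_dp;
} Ctx;

/* Hypothetical feature predicates, one per CHECK name. */
static bool feature_sp(const Ctx *c) { return c->has_sp; }
static bool feature_dp(const Ctx *c) { return c->has_dp; }

/* Shared worker: no feature check of its own, callers must do it. */
static bool do_2op(const Ctx *c, int vd, int vm)
{
    (void)c;
    return vd >= 0 && vm >= 0;
}

/* Each generated function tests its own CHECK before calling the worker. */
#define DO_2OP(NAME, CHECK)                                 \
    static bool trans_##NAME(const Ctx *c, int vd, int vm)  \
    {                                                       \
        if (!feature_##CHECK(c)) {                          \
            return false;                                   \
        }                                                   \
        return do_2op(c, vd, vm);                           \
    }

DO_2OP(VABS_sp, sp)
DO_2OP(VABS_dp, dp)
```

Because the check is now a macro argument, a single instruction can later be given a different or broader condition without touching the worker.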

Split out the handling of VMOV_reg_sp and VMOV_reg_dp so that we can
permit the insns if either FP or MVE is present.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-5-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

The fp_sysreg_checks() function is supposed to be returning an
FPSysRegCheckResult, which is an enum with three possible values.
However, three places in the function "return false" (a hangover from
a previous iteration of the design where the function just returned a
bool).  Make these return FPSysRegCheckFailed instead (for no
functional change, since both false and FPSysRegCheckFailed are
zero).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-6-peter.maydell@linaro.org
---
 target/arm/translate-vfp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
         break;
     case ARM_VFP_FPSCR_NZCVQC:
         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-            return false;
+            return FPSysRegCheckFailed;
         }
         break;
     case ARM_VFP_FPCXT_S:
     case ARM_VFP_FPCXT_NS:
         if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-            return false;
+            return FPSysRegCheckFailed;
         }
         if (!s->v8m_secure) {
-            return false;
+            return FPSysRegCheckFailed;
         }
         break;
     default:
-- 
2.20.1

If MVE is implemented for an M-profile CPU then it has a VPR
register, which tracks predication information.

Implement the read and write handling of this register, and
the migration of its state.
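
The diff below also handles P0, which is a view of VPR bits [15:0]:
a write to P0 must leave the MASK01 and MASK23 fields untouched. The
following is a plain-C sketch of that deposit/extract behaviour, not
the TCG code itself; the *_sketch helpers and vpr_write_p0() /
vpr_read_p0() are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

#define VPR_P0_SHIFT  0   /* mirrors FIELD(V7M_VPR, P0, 0, 16) */
#define VPR_P0_LENGTH 16

/* Replace 'length' bits of 'value' starting at bit 'start' with 'field'. */
static uint32_t deposit32_sketch(uint32_t value, int start, int length,
                                 uint32_t field)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    uint32_t mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Return 'length' bits of 'value' starting at bit 'start'. */
static uint32_t extract32_sketch(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Write to P0: only VPR[15:0] change, the mask fields are preserved. */
uint32_t vpr_write_p0(uint32_t vpr, uint32_t new_p0)
{
    return deposit32_sketch(vpr, VPR_P0_SHIFT, VPR_P0_LENGTH, new_p0);
}

/* Read of P0: VPR[15:0], zero-extended. */
uint32_t vpr_read_p0(uint32_t vpr)
{
    return extract32_sketch(vpr, VPR_P0_SHIFT, VPR_P0_LENGTH);
}
```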

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-7-peter.maydell@linaro.org
---
 target/arm/cpu.h           |  6 ++++++
 target/arm/machine.c       | 19 +++++++++++++++++++
 target/arm/translate-vfp.c | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
         uint32_t cpacr[M_REG_NUM_BANKS];
         uint32_t nsacr;
         int ltpsize;
+        uint32_t vpr;
     } v7m;
 
     /* Information associated with an exception about to be taken:
@@ -XXX,XX +XXX,XX @@ FIELD(V7M_FPCCR, ASPEN, 31, 1)
      R_V7M_FPCCR_UFRDY_MASK |                   \
      R_V7M_FPCCR_ASPEN_MASK)
 
+/* v7M VPR bits */
+FIELD(V7M_VPR, P0, 0, 16)
+FIELD(V7M_VPR, MASK01, 16, 4)
+FIELD(V7M_VPR, MASK23, 20, 4)
+
 /*
  * System register ID fields.
  */
diff --git a/target/arm/machine.c b/target/arm/machine.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_fp = {
     }
 };
 
+static bool mve_needed(void *opaque)
+{
+    ARMCPU *cpu = opaque;
+
+    return cpu_isar_feature(aa32_mve, cpu);
+}
+
+static const VMStateDescription vmstate_m_mve = {
+    .name = "cpu/m/mve",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = mve_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_m = {
     .name = "cpu/m",
     .version_id = 4,
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m = {
         &vmstate_m_other_sp,
         &vmstate_m_v8m,
         &vmstate_m_fp,
+        &vmstate_m_mve,
         NULL
     }
 };
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static FPSysRegCheckResult fp_sysreg_checks(DisasContext *s, int regno)
             return FPSysRegCheckFailed;
         }
         break;
+    case ARM_VFP_VPR:
+    case ARM_VFP_P0:
+        if (!dc_isar_feature(aa32_mve, s)) {
+            return FPSysRegCheckFailed;
+        }
+        break;
     default:
         return FPSysRegCheckFailed;
     }
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_write(DisasContext *s, int regno,
         tcg_temp_free_i32(sfpa);
         break;
     }
+    case ARM_VFP_VPR:
+        /* Behaves as NOP if not privileged */
+        if (IS_USER(s)) {
+            break;
+        }
+        tmp = loadfn(s, opaque);
+        store_cpu_field(tmp, v7m.vpr);
+        break;
+    case ARM_VFP_P0:
+    {
+        TCGv_i32 vpr;
+        tmp = loadfn(s, opaque);
+        vpr = load_cpu_field(v7m.vpr);
+        tcg_gen_deposit_i32(vpr, vpr, tmp,
+                            R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
+        store_cpu_field(vpr, v7m.vpr);
+        tcg_temp_free_i32(tmp);
+        break;
+    }
     default:
         g_assert_not_reached();
     }
@@ -XXX,XX +XXX,XX @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int regno,
         tcg_temp_free_i32(fpscr);
         break;
     }
+    case ARM_VFP_VPR:
+        /* Behaves as NOP if not privileged */
+        if (IS_USER(s)) {
+            break;
+        }
+        tmp = load_cpu_field(v7m.vpr);
+        storefn(s, opaque, tmp);
+        break;
+    case ARM_VFP_P0:
+        tmp = load_cpu_field(v7m.vpr);
+        tcg_gen_extract_i32(tmp, tmp, R_V7M_VPR_P0_SHIFT, R_V7M_VPR_P0_LENGTH);
+        storefn(s, opaque, tmp);
+        break;
     default:
         g_assert_not_reached();
     }
-- 
2.20.1

The M-profile FPSCR has an LTPSIZE field, but if MVE is not
implemented it is read-only and always reads as 4; this is how QEMU
currently handles it.

Make the field writable when MVE is implemented.

We can safely add the field to the MVE migration struct because
currently no CPUs enable MVE and so the migration struct is never
used.
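
A simplified sketch of the resulting write behaviour, assuming the
field sits at FPSCR bits [18:16] as the FPCR_LTPSIZE_* defines in the
diff indicate. The function name and the have_mve flag are hypothetical
and stand in for the cpu_isar_feature(aa32_mve, cpu) check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPCR_LTPSIZE_SHIFT  16
#define FPCR_LTPSIZE_LENGTH 3

/*
 * Return the LTPSIZE value after an FPSCR write of 'val'.
 * Without MVE the field is read-only and keeps its current value
 * (4 after reset); with MVE it is taken from bits [18:16] of 'val'.
 */
uint32_t ltpsize_after_fpscr_write(uint32_t current, uint32_t val,
                                   bool have_mve)
{
    assert(current <= 7);   /* LTPSIZE is a 3-bit field */
    if (!have_mve) {
        return current;
    }
    return (val >> FPCR_LTPSIZE_SHIFT) & ((1U << FPCR_LTPSIZE_LENGTH) - 1);
}
```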

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-8-peter.maydell@linaro.org
---
 target/arm/cpu.h        | 3 ++-
 target/arm/machine.c    | 1 +
 target/arm/vfp_helper.c | 9 ++++++---
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
         uint32_t fpdscr[M_REG_NUM_BANKS];
         uint32_t cpacr[M_REG_NUM_BANKS];
         uint32_t nsacr;
-        int ltpsize;
+        uint32_t ltpsize;
         uint32_t vpr;
     } v7m;
 
@@ -XXX,XX +XXX,XX @@ void vfp_set_fpscr(CPUARMState *env, uint32_t val);
 
 #define FPCR_LTPSIZE_SHIFT 16   /* LTPSIZE, M-profile only */
 #define FPCR_LTPSIZE_MASK (7 << FPCR_LTPSIZE_SHIFT)
+#define FPCR_LTPSIZE_LENGTH 3
 
 #define FPCR_NZCV_MASK (FPCR_N | FPCR_Z | FPCR_C | FPCR_V)
 #define FPCR_NZCVQC_MASK (FPCR_NZCV_MASK | FPCR_QC)
diff --git a/target/arm/machine.c b/target/arm/machine.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_m_mve = {
     .needed = mve_needed,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(env.v7m.vpr, ARMCPU),
+        VMSTATE_UINT32(env.v7m.ltpsize, ARMCPU),
         VMSTATE_END_OF_LIST()
     },
 };
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpscr(CPUARMState *env)
 
 void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
 {
+    ARMCPU *cpu = env_archcpu(env);
+
     /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
-    if (!cpu_isar_feature(any_fp16, env_archcpu(env))) {
+    if (!cpu_isar_feature(any_fp16, cpu)) {
         val &= ~FPCR_FZ16;
     }
 
@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
          * because in v7A no-short-vector-support cores still had to
          * allow Stride/Len to be written with the only effect that
          * some insns are required to UNDEF if the guest sets them.
-         *
-         * TODO: if M-profile MVE implemented, set LTPSIZE.
          */
         env->vfp.vec_len = extract32(val, 16, 3);
         env->vfp.vec_stride = extract32(val, 20, 2);
+    } else if (cpu_isar_feature(aa32_mve, cpu)) {
+        env->v7m.ltpsize = extract32(val, FPCR_LTPSIZE_SHIFT,
+                                     FPCR_LTPSIZE_LENGTH);
     }
 
     if (arm_feature(env, ARM_FEATURE_NEON)) {
-- 
2.20.1

Currently we allow board models to specify the initial value of the
Secure VTOR register, using an init-svtor property on the TYPE_ARMV7M
object which is plumbed through to the CPU.  Allow board models to
also specify the initial value of the Non-secure VTOR via a similar
init-nsvtor property.
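
For illustration, the reset path in the diff below masks both
properties with 0xffffff80 before storing them in the banked vector
base. A standalone sketch of that step follows; the SKETCH_REG_* bank
indices and reset_vecbase() are assumptions for this example, not the
QEMU definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bank indices for this sketch only. */
enum { SKETCH_REG_NS = 0, SKETCH_REG_S = 1, SKETCH_NUM_BANKS = 2 };

/* Same mask as the reset code in the diff: the low 7 bits are dropped. */
#define VTOR_RESET_MASK 0xffffff80u

/* Load both banked vector bases from the board-supplied properties. */
void reset_vecbase(uint32_t vecbase[SKETCH_NUM_BANKS],
                   uint32_t init_svtor, uint32_t init_nsvtor)
{
    assert(vecbase != NULL);
    vecbase[SKETCH_REG_S] = init_svtor & VTOR_RESET_MASK;
    vecbase[SKETCH_REG_NS] = init_nsvtor & VTOR_RESET_MASK;
}
```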

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210520152840.24453-10-peter.maydell@linaro.org
---
 include/hw/arm/armv7m.h |  2 ++
 target/arm/cpu.h        |  2 ++
 hw/arm/armv7m.c         |  7 +++++++
 target/arm/cpu.c        | 10 ++++++++++
 4 files changed, 21 insertions(+)

diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/armv7m.h
+++ b/include/hw/arm/armv7m.h
@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(ARMv7MState, ARMV7M)
  *   devices will be automatically layered on top of this view.)
  * + Property "idau": IDAU interface (forwarded to CPU object)
  * + Property "init-svtor": secure VTOR reset value (forwarded to CPU object)
+ * + Property "init-nsvtor": non-secure VTOR reset value (forwarded to CPU object)
  * + Property "vfp": enable VFP (forwarded to CPU object)
  * + Property "dsp": enable DSP (forwarded to CPU object)
  * + Property "enable-bitband": expose bitbanded IO
@@ -XXX,XX +XXX,XX @@ struct ARMv7MState {
     MemoryRegion *board_memory;
     Object *idau;
     uint32_t init_svtor;
+    uint32_t init_nsvtor;
     bool enable_bitband;
     bool start_powered_off;
     bool vfp;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
 
     /* For v8M, initial value of the Secure VTOR */
     uint32_t init_svtor;
+    /* For v8M, initial value of the Non-secure VTOR */
+    uint32_t init_nsvtor;
 
     /* [QEMU_]KVM_ARM_TARGET_* constant for this CPU, or
      * QEMU_KVM_ARM_TARGET_NONE if the kernel doesn't support this CPU type.
diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/armv7m.c
+++ b/hw/arm/armv7m.c
@@ -XXX,XX +XXX,XX @@ static void armv7m_realize(DeviceState *dev, Error **errp)
             return;
         }
     }
+    if (object_property_find(OBJECT(s->cpu), "init-nsvtor")) {
+        if (!object_property_set_uint(OBJECT(s->cpu), "init-nsvtor",
+                                      s->init_nsvtor, errp)) {
+            return;
+        }
+    }
     if (object_property_find(OBJECT(s->cpu), "start-powered-off")) {
         if (!object_property_set_bool(OBJECT(s->cpu), "start-powered-off",
                                       s->start_powered_off, errp)) {
@@ -XXX,XX +XXX,XX @@ static Property armv7m_properties[] = {
                      MemoryRegion *),
     DEFINE_PROP_LINK("idau", ARMv7MState, idau, TYPE_IDAU_INTERFACE, Object *),
     DEFINE_PROP_UINT32("init-svtor", ARMv7MState, init_svtor, 0),
+    DEFINE_PROP_UINT32("init-nsvtor", ARMv7MState, init_nsvtor, 0),
     DEFINE_PROP_BOOL("enable-bitband", ARMv7MState, enable_bitband, false),
     DEFINE_PROP_BOOL("start-powered-off", ARMv7MState, start_powered_off,
                      false),
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
         env->regs[14] = 0xffffffff;
 
         env->v7m.vecbase[M_REG_S] = cpu->init_svtor & 0xffffff80;
+        env->v7m.vecbase[M_REG_NS] = cpu->init_nsvtor & 0xffffff80;
 
         /* Load the initial SP and PC from offset 0 and 4 in the vector table */
         vecbase = env->v7m.vecbase[env->v7m.secure];
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
                                        &cpu->init_svtor,
                                        OBJ_PROP_FLAG_READWRITE);
     }
+    if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
+        /*
+         * Initial value of the NS VTOR (for cores without the Security
+         * extension, this is the only VTOR)
+         */
+        object_property_add_uint32_ptr(obj, "init-nsvtor",
+                                       &cpu->init_nsvtor,
+                                       OBJ_PROP_FLAG_READWRITE);
+    }
 
     qdev_property_add_static(DEVICE(obj), &arm_cpu_cfgend_property);
 
-- 
2.20.1

The official punctuation for Arm CPU names uses a hyphen, like
"Cortex-A9". We mostly follow this, but in a few places usage
without the hyphen has crept in. Fix those so we consistently
use the same way of writing the CPU name.

This commit was created with:
  git grep -z -l 'Cortex ' | xargs -0 sed -i 's/Cortex /Cortex-/'

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20210527095152.10968-1-peter.maydell@linaro.org
---
 docs/system/arm/aspeed.rst    | 4 ++--
 docs/system/arm/nuvoton.rst   | 6 +++---
 docs/system/arm/sabrelite.rst | 2 +-
 include/hw/arm/allwinner-h3.h | 2 +-
 hw/arm/aspeed.c               | 6 +++---
 hw/arm/mcimx6ul-evk.c         | 2 +-
 hw/arm/mcimx7d-sabre.c        | 2 +-
 hw/arm/npcm7xx_boards.c       | 4 ++--
 hw/arm/sabrelite.c            | 2 +-
 hw/misc/npcm7xx_clk.c         | 2 +-
 10 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/aspeed.rst
+++ b/docs/system/arm/aspeed.rst
@@ -XXX,XX +XXX,XX @@ The QEMU Aspeed machines model BMCs of various OpenPOWER systems and
 Aspeed evaluation boards. They are based on different releases of the
 Aspeed SoC : the AST2400 integrating an ARM926EJ-S CPU (400MHz), the
 AST2500 with an ARM1176JZS CPU (800MHz) and more recently the AST2600
-with dual cores ARM Cortex A7 CPUs (1.2GHz).
+with dual cores ARM Cortex-A7 CPUs (1.2GHz).
 
 The SoC comes with RAM, Gigabit ethernet, USB, SD/MMC, USB, SPI, I2C,
 etc.
@@ -XXX,XX +XXX,XX @@ AST2500 SoC based machines :
 
 AST2600 SoC based machines :
 
-- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex A7)
+- ``ast2600-evb``          Aspeed AST2600 Evaluation board (Cortex-A7)
 - ``tacoma-bmc``           OpenPOWER Witherspoon POWER9 AST2600 BMC
 
 Supported devices
diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/nuvoton.rst
+++ b/docs/system/arm/nuvoton.rst
@@ -XXX,XX +XXX,XX @@ Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
 
 The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
 designed to be used as Baseboard Management Controllers (BMCs) in various
-servers. They all feature one or two ARM Cortex A9 CPU cores, as well as an
+servers. They all feature one or two ARM Cortex-A9 CPU cores, as well as an
 assortment of peripherals targeted for either Enterprise or Data Center /
 Hyperscale applications. The former is a superset of the latter, so NPCM750 has
 all the peripherals of NPCM730 and more.
 
 .. _Nuvoton iBMC: https://www.nuvoton.com/products/cloud-computing/ibmc/
 
-The NPCM750 SoC has two Cortex A9 cores and is targeted for the Enterprise
+The NPCM750 SoC has two Cortex-A9 cores and is targeted for the Enterprise
 segment. The following machines are based on this chip :
 
 - ``npcm750-evb``       Nuvoton NPCM750 Evaluation board
 
-The NPCM730 SoC has two Cortex A9 cores and is targeted for Data Center and
+The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
 Hyperscale applications. The following machines are based on this chip :
 
 - ``quanta-gsj``        Quanta GSJ server BMC
diff --git a/docs/system/arm/sabrelite.rst b/docs/system/arm/sabrelite.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/sabrelite.rst
+++ b/docs/system/arm/sabrelite.rst
@@ -XXX,XX +XXX,XX @@ Supported devices
 
 The SABRE Lite machine supports the following devices:
 
- * Up to 4 Cortex A9 cores
+ * Up to 4 Cortex-A9 cores
  * Generic Interrupt Controller
  * 1 Clock Controller Module
  * 1 System Reset Controller
diff --git a/include/hw/arm/allwinner-h3.h b/include/hw/arm/allwinner-h3.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/allwinner-h3.h
+++ b/include/hw/arm/allwinner-h3.h
@@ -XXX,XX +XXX,XX @@
  */
 
 /*
- * The Allwinner H3 is a System on Chip containing four ARM Cortex A7
+ * The Allwinner H3 is a System on Chip containing four ARM Cortex-A7
  * processor cores. Features and specifications include DDR2/DDR3 memory,
  * SD/MMC storage cards, 10/100/1000Mbit Ethernet, USB 2.0, HDMI and
  * various I/O modules.
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_ast2600_evb_class_init(ObjectClass *oc, void *data)
     MachineClass *mc = MACHINE_CLASS(oc);
     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 
-    mc->desc       = "Aspeed AST2600 EVB (Cortex A7)";
+    mc->desc       = "Aspeed AST2600 EVB (Cortex-A7)";
     amc->soc_name  = "ast2600-a1";
     amc->hw_strap1 = AST2600_EVB_HW_STRAP1;
     amc->hw_strap2 = AST2600_EVB_HW_STRAP2;
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_tacoma_class_init(ObjectClass *oc, void *data)
     MachineClass *mc = MACHINE_CLASS(oc);
     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 
-    mc->desc       = "OpenPOWER Tacoma BMC (Cortex A7)";
+    mc->desc       = "OpenPOWER Tacoma BMC (Cortex-A7)";
     amc->soc_name  = "ast2600-a1";
     amc->hw_strap1 = TACOMA_BMC_HW_STRAP1;
     amc->hw_strap2 = TACOMA_BMC_HW_STRAP2;
@@ -XXX,XX +XXX,XX @@ static void aspeed_machine_rainier_class_init(ObjectClass *oc, void *data)
     MachineClass *mc = MACHINE_CLASS(oc);
     AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
 
-    mc->desc       = "IBM Rainier BMC (Cortex A7)";
+    mc->desc       = "IBM Rainier BMC (Cortex-A7)";
     amc->soc_name  = "ast2600-a1";
     amc->hw_strap1 = RAINIER_BMC_HW_STRAP1;
     amc->hw_strap2 = RAINIER_BMC_HW_STRAP2;
diff --git a/hw/arm/mcimx6ul-evk.c b/hw/arm/mcimx6ul-evk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mcimx6ul-evk.c
+++ b/hw/arm/mcimx6ul-evk.c
@@ -XXX,XX +XXX,XX @@ static void mcimx6ul_evk_init(MachineState *machine)
 
 static void mcimx6ul_evk_machine_init(MachineClass *mc)
 {
-    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex A7)";
+    mc->desc = "Freescale i.MX6UL Evaluation Kit (Cortex-A7)";
     mc->init = mcimx6ul_evk_init;
     mc->max_cpus = FSL_IMX6UL_NUM_CPUS;
     mc->default_ram_id = "mcimx6ul-evk.ram";
diff --git a/hw/arm/mcimx7d-sabre.c b/hw/arm/mcimx7d-sabre.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mcimx7d-sabre.c
+++ b/hw/arm/mcimx7d-sabre.c
@@ -XXX,XX +XXX,XX @@ static void mcimx7d_sabre_init(MachineState *machine)
 
 static void mcimx7d_sabre_machine_init(MachineClass *mc)
 {
-    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex A7)";
+    mc->desc = "Freescale i.MX7 DUAL SABRE (Cortex-A7)";
     mc->init = mcimx7d_sabre_init;
     mc->max_cpus = FSL_IMX7_NUM_CPUS;
     mc->default_ram_id = "mcimx7d-sabre.ram";
diff --git a/hw/arm/npcm7xx_boards.c b/hw/arm/npcm7xx_boards.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/npcm7xx_boards.c
+++ b/hw/arm/npcm7xx_boards.c
@@ -XXX,XX +XXX,XX @@ static void npcm750_evb_machine_class_init(ObjectClass *oc, void *data)
 
     npcm7xx_set_soc_type(nmc, TYPE_NPCM750);
 
-    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex A9)";
+    mc->desc = "Nuvoton NPCM750 Evaluation Board (Cortex-A9)";
     mc->init = npcm750_evb_init;
     mc->default_ram_size = 512 * MiB;
 };
@@ -XXX,XX +XXX,XX @@ static void gsj_machine_class_init(ObjectClass *oc, void *data)
 
     npcm7xx_set_soc_type(nmc, TYPE_NPCM730);
 
-    mc->desc = "Quanta GSJ (Cortex A9)";
+    mc->desc = "Quanta GSJ (Cortex-A9)";
     mc->init = quanta_gsj_init;
     mc->default_ram_size = 512 * MiB;
 };
diff --git a/hw/arm/sabrelite.c b/hw/arm/sabrelite.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/sabrelite.c
+++ b/hw/arm/sabrelite.c
@@ -XXX,XX +XXX,XX @@ static void sabrelite_init(MachineState *machine)
 
 static void sabrelite_machine_init(MachineClass *mc)
 {
-    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex A9)";
+    mc->desc = "Freescale i.MX6 Quad SABRE Lite Board (Cortex-A9)";
     mc->init = sabrelite_init;
     mc->max_cpus = FSL_IMX6_NUM_CPUS;
     mc->ignore_memory_transaction_failures = true;
diff --git a/hw/misc/npcm7xx_clk.c b/hw/misc/npcm7xx_clk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/npcm7xx_clk.c
+++ b/hw/misc/npcm7xx_clk.c
@@ -XXX,XX +XXX,XX @@
 #define NPCM7XX_CLOCK_REF_HZ            (25000000)
 
 /* Register Field Definitions */
-#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex A9 Cores */
+#define NPCM7XX_CLK_WDRCR_CA9C  BIT(0) /* Cortex-A9 Cores */
 
 #define PLLCON_LOKI     BIT(31)
 #define PLLCON_LOKS     BIT(30)
-- 
2.20.1

From: Damien Goutte-Gattat <dgouttegattat@incenp.org>

The 4.x branch of Sphinx introduces a breaking change, as generated man
pages are now written to subdirectories corresponding to the manual
section they belong to. This results in `make install` erroring out when
attempting to install the man pages, because they are not where it
expects to find them.

This patch restores the behavior of Sphinx 3.x regarding man pages.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/256
Signed-off-by: Damien Goutte-Gattat <dgouttegattat@incenp.org>
Message-id: 20210503161422.15028-1-dgouttegattat@incenp.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/conf.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/conf.py b/docs/conf.py
index XXXXXXX..XXXXXXX 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -XXX,XX +XXX,XX @@
      ['Stefan Hajnoczi <stefanha@redhat.com>',
       'Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>'], 1),
 ]
+man_make_section_directory = False
 
 # -- Options for Texinfo output -------------------------------------------
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The operands to tcg_gen_atomic_fetch_s{min,max}_i64 must
be signed, so that the inputs are properly extended.
Zero extend the result afterward, as needed.
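
A plain-C sketch of the problem, using a 32-bit operand held in a
64-bit register. It models the comparison only and is not the TCG
primitive; fetch_smin32() and its sign_extend flag are hypothetical
names. With zero-extended inputs, 0xffffffff (-1 as a 32-bit signed
value) compares greater than 1, so the wrong minimum is stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of a 32-bit fetch-and-signed-min carried out with 64-bit
 * arithmetic. Stores the minimum to *mem and returns the old memory
 * value zero-extended to 64 bits.
 */
uint64_t fetch_smin32(uint32_t *mem, uint32_t operand, bool sign_extend)
{
    assert(mem != NULL);
    uint32_t old = *mem;
    /* The int32_t casts reinterpret the bits as two's complement. */
    int64_t a = sign_extend ? (int64_t)(int32_t)old : (int64_t)old;
    int64_t b = sign_extend ? (int64_t)(int32_t)operand : (int64_t)operand;

    *mem = (uint32_t)(a < b ? a : b);
    return (uint64_t)old;   /* result is zero-extended afterward */
}
```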

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/364
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20210602020720.47679-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     int o3_opc = extract32(insn, 12, 4);
     bool r = extract32(insn, 22, 1);
     bool a = extract32(insn, 23, 1);
-    TCGv_i64 tcg_rs, clean_addr;
+    TCGv_i64 tcg_rs, tcg_rt, clean_addr;
     AtomicThreeOpFn *fn = NULL;
+    MemOp mop = s->be_data | size | MO_ALIGN;
 
     if (is_vector || !dc_isar_feature(aa64_atomics, s)) {
         unallocated_encoding(s);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
         break;
     case 004: /* LDSMAX */
         fn = tcg_gen_atomic_fetch_smax_i64;
+        mop |= MO_SIGN;
         break;
     case 005: /* LDSMIN */
         fn = tcg_gen_atomic_fetch_smin_i64;
+        mop |= MO_SIGN;
         break;
     case 006: /* LDUMAX */
         fn = tcg_gen_atomic_fetch_umax_i64;
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     }
 
     tcg_rs = read_cpu_reg(s, rs, true);
+    tcg_rt = cpu_reg(s, rt);
 
     if (o3_opc == 1) { /* LDCLR */
         tcg_gen_not_i64(tcg_rs, tcg_rs);
@@ -XXX,XX +XXX,XX @@ static void disas_ldst_atomic(DisasContext *s, uint32_t insn,
     /* The tcg atomic primitives are all full barriers.  Therefore we
      * can ignore the Acquire and Release bits of this instruction.
      */
-    fn(cpu_reg(s, rt), clean_addr, tcg_rs, get_mem_index(s),
-       s->be_data | size | MO_ALIGN);
+    fn(tcg_rt, clean_addr, tcg_rs, get_mem_index(s), mop);
+
+    if ((mop & MO_SIGN) && size != MO_64) {
+        tcg_gen_ext32u_i64(tcg_rt, tcg_rt);
+    }
 }
 
 /*
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

The DAIF and PAC checks use raise_exception_ra() to raise an exception
and unwind CPU state. However, raise_exception_ra() is currently
designed for data aborts, where the syndrome is partially precomputed,
encoded in the TB, and later merged in merge_syn_data_abort() when the
abort is handled. Using it for the DAIF and PAC checks means that
restore_state_to_opc() retrieves an empty syndrome from data[2] and
sets ESR to 0. This manifested as:

kvm [571]: Unknown exception class: esr: 0x000000 –
  Unknown/Uncategorized

when launching a KVM guest with pointer authentication enabled, where
the host QEMU used a CPU supporting EL2 and pointer authentication.

Rework raise_exception_ra such that the state is restored before raising
the exception, so that the syndrome passed by the caller is not
clobbered by restore_state_to_opc.
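
The ordering issue can be shown with a toy model. This is a sketch
under stated assumptions, not QEMU code: SketchEnv, its fields, and the
three functions are hypothetical stand-ins for env->exception.syndrome,
the value precomputed in the TB, and restore_state_to_opc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t syndrome;     /* models env->exception.syndrome */
    uint32_t tb_syndrome;  /* models the value precomputed in the TB */
} SketchEnv;

/* Models the unwind step: it overwrites the syndrome with the TB value. */
static void sketch_restore_state(SketchEnv *env)
{
    assert(env != NULL);
    env->syndrome = env->tb_syndrome;
}

/* Old ordering: record the syndrome, then unwind. The value is lost. */
uint32_t raise_old_order(SketchEnv *env, uint32_t syndrome)
{
    env->syndrome = syndrome;
    sketch_restore_state(env);
    return env->syndrome;
}

/* New ordering: unwind first, then record the caller's syndrome. */
uint32_t raise_new_order(SketchEnv *env, uint32_t syndrome)
{
    sketch_restore_state(env);
    env->syndrome = syndrome;
    return env->syndrome;
}
```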

Fixes: 0d43e1a2d29a ("target/arm: Add PAuth helpers")
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[PMM: added comment]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/op_helper.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ void raise_exception(CPUARMState *env, uint32_t excp,
 void raise_exception_ra(CPUARMState *env, uint32_t excp, uint32_t syndrome,
                         uint32_t target_el, uintptr_t ra)
 {
-    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
-    cpu_loop_exit_restore(cs, ra);
+    CPUState *cs = env_cpu(env);
+
+    /*
+     * restore_state_to_opc() will set env->exception.syndrome, so
+     * we must restore CPU state here before setting the syndrome
+     * the caller passed us, and cannot use cpu_loop_exit_restore().
+     */
+    cpu_restore_state(cs, ra, true);
+    raise_exception(env, excp, syndrome, target_el);
 }
 
 uint64_t HELPER(neon_tbl)(CPUARMState *env, uint32_t desc,
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

Now that there are no other users of do_raise_exception, fold it into
raise_exception.

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/op_helper.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@
 #define SIGNBIT (uint32_t)0x80000000
 #define SIGNBIT64 ((uint64_t)1 << 63)
 
-static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
-                                    uint32_t syndrome, uint32_t target_el)
+void raise_exception(CPUARMState *env, uint32_t excp,
+                     uint32_t syndrome, uint32_t target_el)
 {
     CPUState *cs = env_cpu(env);
 
@@ -XXX,XX +XXX,XX @@ static CPUState *do_raise_exception(CPUARMState *env, uint32_t excp,
     cs->exception_index = excp;
     env->exception.syndrome = syndrome;
     env->exception.target_el = target_el;
-
-    return cs;
-}
-
-void raise_exception(CPUARMState *env, uint32_t excp,
-                     uint32_t syndrome, uint32_t target_el)
-{
-    CPUState *cs = do_raise_exception(env, excp, syndrome, target_el);
     cpu_loop_exit(cs);
 }
 
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

Now that raise_exception_ra restores the state before raising the
exception, we can use it to perform the state restore and exception
raising without clobbering the syndrome.

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[PMM: Keep the one line of the comment that is still relevant]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/mte_helper.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
 
     switch (tcf) {
     case 1:
-        /*
-         * Tag check fail causes a synchronous exception.
-         *
-         * In restore_state_to_opc, we set the exception syndrome
-         * for the load or store operation.  Unwind first so we
-         * may overwrite that with the syndrome for the tag check.
-         */
-        cpu_restore_state(env_cpu(env), ra, true);
+        /* Tag check fail causes a synchronous exception. */
         env->exception.vaddress = dirty_ptr;
 
         is_write = FIELD_EX32(desc, MTEDESC, WRITE);
         syn = syn_data_abort_no_iss(arm_current_el(env) != 0, 0, 0, 0, 0,
                                     is_write, 0x11);
-        raise_exception(env, EXCP_DATA_ABORT, syn, exception_target_el(env));
+        raise_exception_ra(env, EXCP_DATA_ABORT, syn,
+                           exception_target_el(env), ra);
         /* noreturn, but fall through to the assert anyway */
 
     case 0:
-- 
2.20.1

From: Jamie Iles <jamie@nuviainc.com>

The sequence cpu_restore_state() + raise_exception() is equivalent to
raise_exception_ra(), so use that instead.  (In this case we never
cared about the syndrome value, because M-profile doesn't use the
syndrome; the old code was just written unnecessarily awkwardly.)

Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[PMM: Retain edited version of comment; rewrite commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/m_helper.c  | 5 +----
 target/arm/op_helper.c | 9 +++------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
             limit = is_psp ? env->v7m.psplim[false] : env->v7m.msplim[false];
 
             if (val < limit) {
-                CPUState *cs = env_cpu(env);
-
-                cpu_restore_state(cs, GETPC(), true);
-                raise_exception(env, EXCP_STKOF, 0, 1);
+                raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
             }
 
             if (is_psp) {
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v8m_stackcheck)(CPUARMState *env, uint32_t newvalue)
      * raising an exception if the limit is breached.
      */
     if (newvalue < v7m_sp_limit(env)) {
-        CPUState *cs = env_cpu(env);
-
         /*
          * Stack limit exceptions are a rare case, so rather than syncing
-         * PC/condbits before the call, we use cpu_restore_state() to
-         * get them right before raising the exception.
+         * PC/condbits before the call, we use raise_exception_ra() so
+         * that cpu_restore_state() will sort them out.
          */
-        cpu_restore_state(cs, GETPC(), true);
-        raise_exception(env, EXCP_STKOF, 0, 1);
+        raise_exception_ra(env, EXCP_STKOF, 0, 1, GETPC());
     }
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Note that the SVE BFLOAT16 support does not require SVE2;
it is an independent extension.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
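Reviewer note (not part of the commit): the three new predicates all
reduce to "is a 4-bit ID register field non-zero". A minimal standalone
sketch of that test follows. The sketch_* names are hypothetical, and the
bit positions (ID_AA64ISAR1.BF16 at [47:44], ID_AA64ZFR0.BF16 at [23:20],
ID_ISAR6.BF16 at [23:20]) are my reading of the Arm ARM, not taken from
this patch, so please check them against the register definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for FIELD_EX64: LENGTH bits starting at START. */
static inline uint64_t sketch_field64(uint64_t reg, unsigned start,
                                      unsigned length)
{
    return (reg >> start) & ((UINT64_C(1) << length) - 1);
}

/* Assumed field positions; each field is 4 bits wide. */
#define SKETCH_ID_AA64ISAR1_BF16_SHIFT 44
#define SKETCH_ID_AA64ZFR0_BF16_SHIFT  20
#define SKETCH_ID_ISAR6_BF16_SHIFT     20

/* AArch64 AdvSIMD/FP BFloat16: any non-zero field value means present. */
static inline bool sketch_aa64_bf16(uint64_t id_aa64isar1)
{
    return sketch_field64(id_aa64isar1,
                          SKETCH_ID_AA64ISAR1_BF16_SHIFT, 4) != 0;
}

/* SVE BFloat16: a separate field, independent of the SVE2 fields. */
static inline bool sketch_aa64_sve_bf16(uint64_t id_aa64zfr0)
{
    return sketch_field64(id_aa64zfr0,
                          SKETCH_ID_AA64ZFR0_BF16_SHIFT, 4) != 0;
}

/* AArch32 BFloat16: same test on the 32-bit ID_ISAR6 value. */
static inline bool sketch_aa32_bf16(uint32_t id_isar6)
{
    return sketch_field64(id_isar6, SKETCH_ID_ISAR6_BF16_SHIFT, 4) != 0;
}
```

Testing for "!= 0" rather than "== 1" keeps the predicate true if a
later architecture revision defines a larger value for the same field.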
 target/arm/cpu.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_predinv(const ARMISARegisters *id)
     return FIELD_EX32(id->id_isar6, ID_ISAR6, SPECRES) != 0;
 }
 
+static inline bool isar_feature_aa32_bf16(const ARMISARegisters *id)
+{
+    return FIELD_EX32(id->id_isar6, ID_ISAR6, BF16) != 0;
+}
+
 static inline bool isar_feature_aa32_i8mm(const ARMISARegisters *id)
 {
     return FIELD_EX32(id->id_isar6, ID_ISAR6, I8MM) != 0;
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_dcpodp(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, DPB) >= 2;
 }
 
+static inline bool isar_feature_aa64_bf16(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, BF16) != 0;
+}
+
 static inline bool isar_feature_aa64_fp_simd(const ARMISARegisters *id)
 {
     /* We always set the AdvSIMD and FP fields identically.  */
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_sve2_bitperm(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BITPERM) != 0;
 }
 
+static inline bool isar_feature_aa64_sve_bf16(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, BFLOAT16) != 0;
+}
+
 static inline bool isar_feature_aa64_sve2_sha3(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64zfr0, ID_AA64ZFR0, SHA3) != 0;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
     int rd = extract32(insn, 0, 5);
 
     if (mos) {
-        unallocated_encoding(s);
-        return;
+        goto do_unallocated;
     }
 
     switch (opcode) {
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
         /* FCVT between half, single and double precision */
         int dtype = extract32(opcode, 0, 2);
         if (type == 2 || dtype == type) {
-            unallocated_encoding(s);
-            return;
+            goto do_unallocated;
         }
         if (!fp_access_check(s)) {
             return;
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
 
     case 0x10 ... 0x13: /* FRINT{32,64}{X,Z} */
         if (type > 1 || !dc_isar_feature(aa64_frint, s)) {
-            unallocated_encoding(s);
-            return;
+            goto do_unallocated;
         }
         /* fall through */
     case 0x0 ... 0x3:
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
             break;
         case 3:
             if (!dc_isar_feature(aa64_fp16, s)) {
-                unallocated_encoding(s);
-                return;
+                goto do_unallocated;
             }
 
             if (!fp_access_check(s)) {
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
             handle_fp_1src_half(s, opcode, rd, rn);
             break;
         default:
-            unallocated_encoding(s);
+            goto do_unallocated;
         }
         break;
 
     default:
+    do_unallocated:
         unallocated_encoding(s);
         break;
     }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is the AArch64 BFCVT and the AArch32 VCVT{B,T}.BF16.F32.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
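Reviewer note (not part of the commit): for readers unfamiliar with the
format, a bfloat16 is the top 16 bits of a float32, so the conversion is
a 16-bit narrowing with rounding. The sketch below is a simplified
standalone model and not what float32_to_bfloat16() does internally: it
assumes round-to-nearest-even only, quiets NaNs by setting the top
fraction bit, and ignores flush-to-zero, default-NaN mode and exception
flags, all of which the real helper takes from the float_status. The
function name is hypothetical.

```c
#include <stdint.h>

/*
 * Narrow raw float32 bits to raw bfloat16 bits using
 * round-to-nearest-even (assumed; the real rounding mode is dynamic).
 */
static uint16_t sketch_f32_to_bf16_rne(uint32_t f32_bits)
{
    uint32_t lsb;

    if ((f32_bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: keep sign and upper payload bits, force the quiet bit. */
        return (uint16_t)((f32_bits >> 16) | 0x0040u);
    }

    /*
     * Add just under half of the discarded range, plus one more when
     * the kept lsb is odd, so that exact ties round to even.
     */
    lsb = (f32_bits >> 16) & 1u;
    return (uint16_t)((f32_bits + 0x7fffu + lsb) >> 16);
}
```

The helper in this patch returns uint32_t rather than a 16-bit type; the
AArch32 translator then stores only the low 16 bits with
tcg_gen_st16_i32().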
 target/arm/helper.h        |  1 +
 target/arm/vfp.decode      |  2 ++
 target/arm/translate-a64.c | 19 +++++++++++++++++++
 target/arm/translate-vfp.c | 24 ++++++++++++++++++++++++
 target/arm/vfp_helper.c    |  5 +++++
 5 files changed, 51 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
 
 DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
 DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
+DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
 
 DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
 DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
diff --git a/target/arm/vfp.decode b/target/arm/vfp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp.decode
+++ b/target/arm/vfp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_f64_f16 ---- 1110 1.11 0010 .... 1011 t:1 1.0 .... \
 
 # VCVTB and VCVTT to f16: Vd format is always vd_sp;
 # Vm format depends on size bit
+VCVT_b16_f32 ---- 1110 1.11 0011 .... 1001 t:1 1.0 .... \
+             vd=%vd_sp vm=%vm_sp
 VCVT_f16_f32 ---- 1110 1.11 0011 .... 1010 t:1 1.0 .... \
              vd=%vd_sp vm=%vm_sp
 VCVT_f16_f64 ---- 1110 1.11 0011 .... 1011 t:1 1.0 .... \
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_single(DisasContext *s, int opcode, int rd, int rn)
     case 0x3: /* FSQRT */
         gen_helper_vfp_sqrts(tcg_res, tcg_op, cpu_env);
         goto done;
+    case 0x6: /* BFCVT */
+        gen_fpst = gen_helper_bfcvt;
+        break;
     case 0x8: /* FRINTN */
     case 0x9: /* FRINTP */
     case 0xa: /* FRINTM */
@@ -XXX,XX +XXX,XX @@ static void disas_fp_1src(DisasContext *s, uint32_t insn)
         }
         break;
 
+    case 0x6:
+        switch (type) {
+        case 1: /* BFCVT */
+            if (!dc_isar_feature(aa64_bf16, s)) {
+                goto do_unallocated;
+            }
+            if (!fp_access_check(s)) {
+                return;
+            }
+            handle_fp_1src_single(s, opcode, rd, rn);
+            break;
+        default:
+            goto do_unallocated;
+        }
+        break;
+
     default:
     do_unallocated:
         unallocated_encoding(s);
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     return true;
 }
 
+static bool trans_VCVT_b16_f32(DisasContext *s, arg_VCVT_b16_f32 *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = fpstatus_ptr(FPST_FPCR);
+    tmp = tcg_temp_new_i32();
+
+    vfp_load_reg32(tmp, a->vm);
+    gen_helper_bfcvt(tmp, tmp, fpst);
+    tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
+
 static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
 {
     TCGv_ptr fpst;
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
     return float64_to_float32(x, &env->vfp.fp_status);
 }
 
+uint32_t HELPER(bfcvt)(float32 x, void *status)
+{
+    return float32_to_bfloat16(x, status);
+}
+
 /*
  * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
  * must always round-to-nearest; the AArch64 ones honour the FPSCR
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFCVTN{,2} for AArch64 AdvSIMD, BFCVT and BFCVTNT for SVE,
and VCVT.BF16.F32 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
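Reviewer note (not part of the commit): the bfcvt_pair helper takes two
float32 lanes packed in one 64-bit value and returns two bfloat16 lanes
packed in 32 bits. The standalone sketch below shows only that lane
plumbing. All sketch_* names are hypothetical; the extract/deposit
functions are simplified stand-ins for QEMU's extract64()/deposit32(),
and the per-lane conversion is passed in as a callback, with plain
truncation used as a placeholder rather than the real rounding.

```c
#include <stdint.h>

/* Stand-in for extract64(): LENGTH bits of VALUE starting at START. */
static uint64_t sketch_extract64(uint64_t value, unsigned start,
                                 unsigned length)
{
    return (value >> start) & (~UINT64_C(0) >> (64 - length));
}

/* Stand-in for deposit32(): overwrite LENGTH bits at START with FIELD. */
static uint32_t sketch_deposit32(uint32_t value, unsigned start,
                                 unsigned length, uint32_t field)
{
    uint32_t mask = (~UINT32_C(0) >> (32 - length)) << start;

    return (value & ~mask) | ((field << start) & mask);
}

typedef uint16_t (*sketch_cvt_fn)(uint32_t f32_bits);

/* Placeholder conversion: truncation, NOT the rounding QEMU performs. */
static uint16_t sketch_truncate(uint32_t f32_bits)
{
    return (uint16_t)(f32_bits >> 16);
}

/* Convert the low and high float32 lanes and pack them as lo | hi << 16. */
static uint32_t sketch_bfcvt_pair(uint64_t pair, sketch_cvt_fn cvt)
{
    uint16_t lo = cvt((uint32_t)sketch_extract64(pair, 0, 32));
    uint16_t hi = cvt((uint32_t)sketch_extract64(pair, 32, 32));

    return sketch_deposit32(lo, 16, 16, hi);
}
```

For example, a pair holding 1.0f in the low lane and 2.0f in the high
lane packs to bfloat16 1.0 in bits [15:0] and bfloat16 2.0 in bits
[31:16].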
 target/arm/helper-sve.h     |  4 ++++
 target/arm/helper.h         |  1 +
 target/arm/neon-dp.decode   |  1 +
 target/arm/sve.decode       |  2 ++
 target/arm/sve_helper.c     |  2 ++
 target/arm/translate-a64.c  | 17 ++++++++++++++
 target/arm/translate-neon.c | 45 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c  | 16 +++++++++++++
 target/arm/vfp_helper.c     |  7 ++++++
 9 files changed, 95 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve_fcvt_hd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_fcvt_sd, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bfcvt, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(sve_fcvtzs_hh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sve2_fcvtnt_sh, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve2_fcvtnt_ds, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_bfcvtnt, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(sve2_fcvtlt_hs, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_cmped, void, f64, f64, env)
 DEF_HELPER_2(vfp_fcvtds, f64, f32, env)
 DEF_HELPER_2(vfp_fcvtsd, f32, f64, env)
 DEF_HELPER_FLAGS_2(bfcvt, TCG_CALL_NO_RWG, i32, f32, ptr)
+DEF_HELPER_FLAGS_2(bfcvt_pair, TCG_CALL_NO_RWG, i32, i64, ptr)
 
 DEF_HELPER_2(vfp_uitoh, f16, i32, ptr)
 DEF_HELPER_2(vfp_uitos, f32, i32, ptr)
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     VRINTZ       1111 001 11 . 11 .. 10 .... 0 1011 . . 0 .... @2misc
 
     VCVT_F16_F32 1111 001 11 . 11 .. 10 .... 0 1100 0 . 0 .... @2misc_q0
+    VCVT_B16_F32 1111 001 11 . 11 .. 10 .... 0 1100 1 . 0 .... @2misc_q0
 
     VRINTM       1111 001 11 . 11 .. 10 .... 0 1101 . . 0 .... @2misc
 
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FNMLS_zpzzz     01100101 .. 1 ..... 111 ... ..... .....         @rdn_pg_rm_ra
 # SVE floating-point convert precision
 FCVT_sh         01100101 10 0010 00 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_hs         01100101 10 0010 01 101 ... ..... .....         @rd_pg_rn_e0
+BFCVT           01100101 10 0010 10 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_dh         01100101 11 0010 00 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_hd         01100101 11 0010 01 101 ... ..... .....         @rd_pg_rn_e0
 FCVT_ds         01100101 11 0010 10 101 ... ..... .....         @rd_pg_rn_e0
@@ -XXX,XX +XXX,XX @@ RAX1            01000101 00 1 ..... 11110 1 ..... .....  @rd_rn_rm_e0
 FCVTXNT_ds      01100100 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTX_ds        01100101 00 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTNT_sh       01100100 10 0010 00 101 ... ..... .....  @rd_pg_rn_e0
+BFCVTNT         01100100 10 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTLT_hs       01100100 10 0010 01 101 ... ..... .....  @rd_pg_rn_e0
 FCVTNT_ds       01100100 11 0010 10 101 ... ..... .....  @rd_pg_rn_e0
 FCVTLT_sd       01100100 11 0010 11 101 ... ..... .....  @rd_pg_rn_e0
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint64_t vfp_float64_to_uint64_rtz(float64 f, float_status *s)
 
 DO_ZPZ_FP(sve_fcvt_sh, uint32_t, H1_4, sve_f32_to_f16)
 DO_ZPZ_FP(sve_fcvt_hs, uint32_t, H1_4, sve_f16_to_f32)
+DO_ZPZ_FP(sve_bfcvt,   uint32_t, H1_4, float32_to_bfloat16)
 DO_ZPZ_FP(sve_fcvt_dh, uint64_t,     , sve_f64_to_f16)
 DO_ZPZ_FP(sve_fcvt_hd, uint64_t,     , sve_f16_to_f64)
 DO_ZPZ_FP(sve_fcvt_ds, uint64_t,     , float64_to_float32)
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vg, void *status, uint32_t desc)  \
     } while (i != 0);                                                         \
 }
 
+DO_FCVTNT(sve_bfcvtnt,    uint32_t, uint16_t, H1_4, H1_2, float32_to_bfloat16)
 DO_FCVTNT(sve2_fcvtnt_sh, uint32_t, uint16_t, H1_4, H1_2, sve_f32_to_f16)
 DO_FCVTNT(sve2_fcvtnt_ds, uint64_t, uint32_t,     , H1_4, float64_to_float32)
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
                 tcg_temp_free_i32(ahp);
             }
             break;
+        case 0x36: /* BFCVTN, BFCVTN2 */
+            {
+                TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
+                gen_helper_bfcvt_pair(tcg_res[pass], tcg_op, fpst);
+                tcg_temp_free_ptr(fpst);
+            }
+            break;
         case 0x56:  /* FCVTXN, FCVTXN2 */
             /* 64 bit to 32 bit float conversion
              * with von Neumann rounding (round to odd)
@@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
             }
             handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
             return;
+        case 0x36: /* BFCVTN, BFCVTN2 */
+            if (!dc_isar_feature(aa64_bf16, s) || size != 2) {
+                unallocated_encoding(s);
+                return;
+            }
+            if (!fp_access_check(s)) {
+                return;
+            }
+            handle_2misc_narrow(s, false, opcode, 0, is_q, size - 1, rn, rd);
+            return;
         case 0x17: /* FCVTL, FCVTL2 */
             if (!fp_access_check(s)) {
                 return;
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
     return true;
 }
 
+static bool trans_VCVT_B16_F32(DisasContext *s, arg_2misc *a)
+{
+    TCGv_ptr fpst;
+    TCGv_i64 tmp;
+    TCGv_i32 dst0, dst1;
+
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vm & 1) || (a->size != 1)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    fpst = fpstatus_ptr(FPST_STD);
+    tmp = tcg_temp_new_i64();
+    dst0 = tcg_temp_new_i32();
+    dst1 = tcg_temp_new_i32();
+
+    read_neon_element64(tmp, a->vm, 0, MO_64);
+    gen_helper_bfcvt_pair(dst0, tmp, fpst);
+
+    read_neon_element64(tmp, a->vm, 1, MO_64);
+    gen_helper_bfcvt_pair(dst1, tmp, fpst);
+
+    write_neon_element32(dst0, a->vd, 0, MO_32);
+    write_neon_element32(dst1, a->vd, 1, MO_32);
+
+    tcg_temp_free_i64(tmp);
+    tcg_temp_free_i32(dst0);
+    tcg_temp_free_i32(dst1);
+    tcg_temp_free_ptr(fpst);
+    return true;
+}
+
 static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
 {
     TCGv_ptr fpst;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVT_hs(DisasContext *s, arg_rpr_esz *a)
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_hs);
 }
 
+static bool trans_BFCVT(DisasContext *s, arg_rpr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvt);
+}
+
 static bool trans_FCVT_dh(DisasContext *s, arg_rpr_esz *a)
 {
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_fcvt_dh);
@@ -XXX,XX +XXX,XX @@ static bool trans_FCVTNT_sh(DisasContext *s, arg_rpr_esz *a)
     return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve2_fcvtnt_sh);
 }
 
+static bool trans_BFCVTNT(DisasContext *s, arg_rpr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    return do_zpz_ptr(s, a->rd, a->rn, a->pg, false, gen_helper_sve_bfcvtnt);
+}
+
 static bool trans_FCVTNT_ds(DisasContext *s, arg_rpr_esz *a)
 {
     if (!dc_isar_feature(aa64_sve2, s)) {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(bfcvt)(float32 x, void *status)
     return float32_to_bfloat16(x, status);
 }
 
+uint32_t HELPER(bfcvt_pair)(uint64_t pair, void *status)
+{
+    bfloat16 lo = float32_to_bfloat16(extract64(pair, 0, 32), status);
+    bfloat16 hi = float32_to_bfloat16(extract64(pair, 32, 32), status);
+    return deposit32(lo, 16, 16, hi);
+}
+
 /*
  * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
  * must always round-to-nearest; the AArch64 ones honour the FPSCR
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

For Arm BFDOT and BFMMLA, we need a version of round-to-odd
that overflows to infinity, instead of the max normal number.

Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
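Reviewer note (not part of the commit): a standalone sketch of the two
pieces this patch touches. The first function mirrors the increment
selection used in partsN(uncanon) for round-to-odd: if the kept lsb is
already 1 the value is truncated, otherwise adding round_mask carries
into the lsb exactly when any discarded bit is set. The second shows the
only behavioural difference between the two modes, which is the result
chosen on overflow. Names prefixed sketch_ are hypothetical, the float32
bit patterns are illustrative, and the first function assumes
0 < shift < 64 with no carry out of bit 63.

```c
#include <stdbool.h>
#include <stdint.h>

/* Drop the low SHIFT bits of FRAC using round-to-odd. */
static uint64_t sketch_round_to_odd(uint64_t frac, unsigned shift)
{
    uint64_t frac_lsb = UINT64_C(1) << shift;   /* lowest kept bit */
    uint64_t round_mask = frac_lsb - 1;         /* all discarded bits */
    uint64_t inc = (frac & frac_lsb) ? 0 : round_mask;

    return (frac + inc) >> shift;
}

enum sketch_round_mode {
    SKETCH_ROUND_TO_ODD,      /* overflow saturates to max normal */
    SKETCH_ROUND_TO_ODD_INF,  /* overflow produces infinity */
};

/* Raw float32 bits produced when a result overflows in each mode. */
static uint32_t sketch_f32_overflow(bool negative,
                                    enum sketch_round_mode mode)
{
    uint32_t magnitude = (mode == SKETCH_ROUND_TO_ODD)
                         ? 0x7f7fffffu   /* largest finite float32 */
                         : 0x7f800000u;  /* infinity */

    return ((uint32_t)negative << 31) | magnitude;
}
```

Exact inputs are returned unchanged (8 with shift 2 gives 2), while any
inexact input ends up with an odd lsb (9, 11 and 13 with shift 2 all
give 3).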
 include/fpu/softfloat-types.h | 4 +++-
 fpu/softfloat-parts.c.inc     | 6 ++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
     float_round_up           = 2,
     float_round_to_zero      = 3,
     float_round_ties_away    = 4,
-    /* Not an IEEE rounding mode: round to the closest odd mantissa value */
+    /* Not an IEEE rounding mode: round to closest odd, overflow to max */
     float_round_to_odd       = 5,
+    /* Not an IEEE rounding mode: round to closest odd, overflow to inf */
+    float_round_to_odd_inf   = 6,
 } FloatRoundMode;
 
 /*
diff --git a/fpu/softfloat-parts.c.inc b/fpu/softfloat-parts.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-parts.c.inc
+++ b/fpu/softfloat-parts.c.inc
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
         g_assert_not_reached();
     }
 
+    overflow_norm = false;
     switch (s->float_rounding_mode) {
     case float_round_nearest_even:
-        overflow_norm = false;
         inc = ((p->frac_lo & roundeven_mask) != frac_lsbm1 ? frac_lsbm1 : 0);
         break;
     case float_round_ties_away:
-        overflow_norm = false;
         inc = frac_lsbm1;
         break;
     case float_round_to_zero:
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
         break;
     case float_round_to_odd:
         overflow_norm = true;
+        /* fall through */
+    case float_round_to_odd_inf:
         inc = p->frac_lo & frac_lsb ? 0 : round_mask;
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void partsN(uncanon)(FloatPartsN *p, float_status *s,
                        ? frac_lsbm1 : 0);
                 break;
             case float_round_to_odd:
+            case float_round_to_odd_inf:
                 inc = p->frac_lo & frac_lsb ? 0 : round_mask;
                 break;
             default:
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFDOT for both AArch64 AdvSIMD and SVE,
and VDOT.BF16 for AArch32 NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
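Reviewer note (not part of the commit): a standalone model of the data
flow in bfdotadd() and gvec_bfdot(), for readers who want to experiment
outside QEMU. Each 32-bit source element holds two bfloat16 values; each
is widened to float32 by placing it in the top 16 bits. The sketch uses
host float arithmetic, so it assumes the host's default rounding and
denormal handling rather than the round-to-odd, flush-to-zero,
default-NaN status the helper sets up. Results can therefore differ from
QEMU for inexact or denormal cases. All sketch_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret raw IEEE single-precision bits as a host float. */
static float sketch_bits_to_f32(uint32_t bits)
{
    float f;

    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* sum + (lo(e1) * lo(e2) + hi(e1) * hi(e2)), in host arithmetic. */
static float sketch_bfdotadd(float sum, uint32_t e1, uint32_t e2)
{
    /* Low bfloat16 of each pair: shift into the float32 position. */
    float lo = sketch_bits_to_f32(e1 << 16) * sketch_bits_to_f32(e2 << 16);
    /* High bfloat16 of each pair: already in position, mask the rest. */
    float hi = sketch_bits_to_f32(e1 & 0xffff0000u) *
               sketch_bits_to_f32(e2 & 0xffff0000u);

    return sum + (lo + hi);
}

/* Element-wise driver over ELEMENTS 32-bit lanes. */
static void sketch_bfdot(float *d, const float *a, const uint32_t *n,
                         const uint32_t *m, size_t elements)
{
    for (size_t i = 0; i < elements; i++) {
        d[i] = sketch_bfdotadd(a[i], n[i], m[i]);
    }
}
```

With n = (1.0, 2.0), m = (3.0, 4.0) and an accumulator of 0.5, all of
which are exactly representable, the lane result is
0.5 + 1.0 * 3.0 + 2.0 * 4.0 = 11.5.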
 target/arm/helper.h           |  3 +++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  3 +++
 target/arm/translate-a64.c    | 20 ++++++++++++++++++
 target/arm/translate-neon.c   |  9 ++++++++
 target/arm/translate-sve.c    | 12 +++++++++++
 target/arm/vec_helper.c       | 40 +++++++++++++++++++++++++++++++++++
 7 files changed, 89 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_ummla_b, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUDOT          1111 110 00 . 10 .... .... 1101 . q:1 . 1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
 VUSDOT         1111 110 01 . 10 .... .... 1101 . q:1 . 0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+VDOT_b16       1111 110 00 . 00 .... .... 1101 . q:1 . 0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
 # VFM[AS]L
 VFML           1111 110 0 s:1 . 10 .... .... 1000 . 0 . 1 .... \
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
 FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
 FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
 
+### SVE2 floating-point bfloat16 dot-product
+BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
+
 ### SVE2 floating-point multiply-add long (indexed)
 FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
 FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         feature = dc_isar_feature(aa64_fcma, s);
         break;
+    case 0x1f: /* BFDOT */
+        switch (size) {
+        case 1:
+            feature = dc_isar_feature(aa64_bf16, s);
+            break;
+        default:
+            unallocated_encoding(s);
+            return;
+        }
+        break;
     default:
         unallocated_encoding(s);
         return;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         return;
 
+    case 0xf: /* BFDOT */
+        switch (size) {
+        case 1:
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+        return;
+
     default:
         g_assert_not_reached();
     }
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VUSDOT(DisasContext *s, arg_VUSDOT *a)
                         gen_helper_gvec_usdot_b);
 }
 
+static bool trans_VDOT_b16(DisasContext *s, arg_VDOT_b16 *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda(s, a->q * 7, a->vd, a->vn, a->vm, 0,
+                        gen_helper_gvec_bfdot);
+}
+
 static bool trans_VFML(DisasContext *s, arg_VFML *a)
 {
     int opr_sz;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_UMMLA(DisasContext *s, arg_rrrr_esz *a)
 {
     return do_i8mm_zzzz_ool(s, a, gen_helper_gvec_ummla_b, 0);
 }
+
+static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot,
+                          a->rd, a->rn, a->rm, a->ra, 0);
+    }
+    return true;
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
 DO_MMLA_B(gvec_smmla_b, do_smmla_b)
 DO_MMLA_B(gvec_ummla_b, do_ummla_b)
 DO_MMLA_B(gvec_usmmla_b, do_usmmla_b)
+
+/*
+ * BFloat16 Dot Product
+ */
+
+static float32 bfdotadd(float32 sum, uint32_t e1, uint32_t e2)
+{
+    /* FPCR is ignored for BFDOT and BFMMLA. */
+    float_status bf_status = {
+        .tininess_before_rounding = float_tininess_before_rounding,
+        .float_rounding_mode = float_round_to_odd_inf,
+        .flush_to_zero = true,
+        .flush_inputs_to_zero = true,
+        .default_nan_mode = true,
+    };
+    float32 t1, t2;
+
+    /*
+     * Extract each BFloat16 from the element pair, and shift
+     * them such that they become float32.
+     */
+    t1 = float32_mul(e1 << 16, e2 << 16, &bf_status);
+    t2 = float32_mul(e1 & 0xffff0000u, e2 & 0xffff0000u, &bf_status);
+    t1 = float32_add(t1, t2, &bf_status);
+    t1 = float32_add(sum, t1, &bf_status);
+
+    return t1;
+}
+
+void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    float32 *d = vd, *a = va;
+    uint32_t *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        d[i] = bfdotadd(a[i], n[i], m[i]);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is the indexed (by-element) form of BFDOT for both AArch64
AdvSIMD and SVE, and the by-scalar form of VDOT.BF16 for AArch32 NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
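Reviewer note (not part of the commit): the indexed helper walks the
vector in 128-bit segments and, within each segment, reuses one selected
32-bit element of the second operand for every lane. The standalone
sketch below shows only that selection step, writing the chosen element
to each output lane instead of feeding it to the dot product. It assumes
a little-endian host (so the H4() adjustment is omitted), that ELEMENTS
is 2 or a multiple of 4, and that INDEX is smaller than the segment
size. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * For each segment of up to four 32-bit lanes, copy lane INDEX of that
 * segment of M into every lane of the same segment of OUT.
 */
static void sketch_select_indexed(uint32_t *out, const uint32_t *m,
                                  size_t elements, size_t index)
{
    /* 16 bytes per segment, 4 bytes per lane; short vectors use one. */
    size_t per_segment = elements < 4 ? elements : 4;

    for (size_t i = 0; i < elements; i += per_segment) {
        uint32_t m_idx = m[i + index];

        for (size_t j = i; j < i + per_segment; j++) {
            out[j] = m_idx;
        }
    }
}
```

For an eight-lane input and index 2, lanes 0 to 3 all see element 2 and
lanes 4 to 7 all see element 6, which is the per-segment behaviour SVE
needs for vectors longer than 128 bits.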
 target/arm/helper.h           |  2 ++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  3 +++
 target/arm/translate-a64.c    | 41 +++++++++++++++++++++++++++--------
 target/arm/translate-neon.c   |  9 ++++++++
 target/arm/translate-sve.c    | 12 ++++++++++
 target/arm/vec_helper.c       | 20 +++++++++++++++++
 7 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_usmmla_b, TCG_CALL_NO_RWG,
 
 DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
 
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUSDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
                vn=%vn_dp vd=%vd_dp
 VSUDOT_scalar  1111 1110 1 . 00 .... .... 1101 . q:1 index:1 1 vm:4 \
                vn=%vn_dp vd=%vd_dp
+VDOT_b16_scal  1111 1110 0 . 00 .... .... 1101 . q:1 index:1 0 vm:4 \
+               vn=%vn_dp vd=%vd_dp
 
 %vfml_scalar_q0_rm 0:3 5:1
 %vfml_scalar_q1_index 5:1 3:1
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
 FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
 FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
 FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
+
+### SVE2 floating-point bfloat16 dot-product (indexed)
+BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             return;
         }
         break;
-    case 0x0f: /* SUDOT, USDOT */
-        if (is_scalar || (size & 1) || !dc_isar_feature(aa64_i8mm, s)) {
+    case 0x0f:
+        switch (size) {
+        case 0: /* SUDOT */
+        case 2: /* USDOT */
+            if (is_scalar || !dc_isar_feature(aa64_i8mm, s)) {
+                unallocated_encoding(s);
+                return;
+            }
+            break;
+        case 1: /* BFDOT */
+            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
+                unallocated_encoding(s);
+                return;
+            }
+            break;
+        default:
             unallocated_encoding(s);
             return;
         }
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                          u ? gen_helper_gvec_udot_idx_b
                          : gen_helper_gvec_sdot_idx_b);
         return;
-    case 0x0f: /* SUDOT, USDOT */
-        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
-                         extract32(insn, 23, 1)
-                         ? gen_helper_gvec_usdot_idx_b
-                         : gen_helper_gvec_sudot_idx_b);
-        return;
-
+    case 0x0f:
+        switch (extract32(insn, 22, 2)) {
+        case 0: /* SUDOT */
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+                             gen_helper_gvec_sudot_idx_b);
+            return;
+        case 1: /* BFDOT */
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+                             gen_helper_gvec_bfdot_idx);
+            return;
+        case 2: /* USDOT */
+            gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
+                             gen_helper_gvec_usdot_idx_b);
+            return;
+        }
+        g_assert_not_reached();
     case 0x11: /* FCMLA #0 */
     case 0x13: /* FCMLA #90 */
     case 0x15: /* FCMLA #180 */
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSUDOT_scalar(DisasContext *s, arg_VSUDOT_scalar *a)
                         gen_helper_gvec_sudot_idx_b);
 }
 
+static bool trans_VDOT_b16_scal(DisasContext *s, arg_VDOT_b16_scal *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda(s, a->q * 6, a->vd, a->vn, a->vm, a->index,
+                        gen_helper_gvec_bfdot_idx);
+}
+
 static bool trans_VFML_scalar(DisasContext *s, arg_VFML_scalar *a)
 {
     int opr_sz;
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzzz(DisasContext *s, arg_rrrr_esz *a)
     }
     return true;
 }
+
+static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfdot_idx,
+                          a->rd, a->rn, a->rm, a->ra, a->index);
+    }
+    return true;
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
+                            void *va, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc);
+    intptr_t index = simd_data(desc);
+    intptr_t elements = opr_sz / 4;
+    intptr_t eltspersegment = MIN(16 / 4, elements);
+    float32 *d = vd, *a = va;
+    uint32_t *n = vn, *m = vm;
+
+    for (i = 0; i < elements; i += eltspersegment) {
+        uint32_t m_idx = m[i + H4(index)];
+
+        for (j = i; j < i + eltspersegment; j++) {
+            d[j] = bfdotadd(a[j], n[j], m_idx);
+        }
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFMMLA for both AArch64 AdvSIMD and SVE,
and VMMLA.BF16 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
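Reviewer note (not part of the commit): the index arithmetic in the new gvec_bfmmla helper may be easier to follow as a small matrix model. The sketch below is a simplification under stated assumptions, not the helper itself. It treats each 128-bit segment as d = a + n * transpose(m), with n and m holding 2x4 bfloat16 values packed two per uint32_t. It uses host float arithmetic, so it does not reproduce the rounding or denormal handling of the real bfdotadd(), and it ignores the H4() host-endian adjustment. All names prefixed model_ are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen a bfloat16 bit pattern: bf16 is the upper 16 bits of a float32. */
static float bf16_to_float(uint16_t bits16)
{
    uint32_t bits32 = (uint32_t)bits16 << 16;
    float f;

    memcpy(&f, &bits32, sizeof(f));
    return f;
}

/*
 * Hypothetical stand-in for bfdotadd(): each uint32_t carries a pair of
 * bf16 values. Host float arithmetic is an assumption of this sketch.
 */
static float model_bfdotadd(float sum, uint32_t e1, uint32_t e2)
{
    float lo = bf16_to_float((uint16_t)e1) * bf16_to_float((uint16_t)e2);
    float hi = bf16_to_float((uint16_t)(e1 >> 16)) *
               bf16_to_float((uint16_t)(e2 >> 16));

    return sum + (lo + hi);
}

/* One 128-bit segment: rows i of n against rows j of m, 2x2 result. */
static void model_bfmmla_segment(float d[4], const float a[4],
                                 const uint32_t n[4], const uint32_t m[4])
{
    float r[4];

    assert(d && a && n && m);
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            float sum = a[2 * i + j];

            sum = model_bfdotadd(sum, n[2 * i + 0], m[2 * j + 0]);
            sum = model_bfdotadd(sum, n[2 * i + 1], m[2 * j + 1]);
            r[2 * i + j] = sum;
        }
    }
    /* Write back only after all inputs are consumed, since d may alias a. */
    memcpy(d, r, sizeof(r));
}
```

With i = 0 and j = 1 the loop reads n[0], n[1] and m[2], m[3], which corresponds to the sum01 lines in the helper; the temporary array plays the role of the sum00..sum11 locals.
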
 target/arm/helper.h           |  3 +++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  6 +++--
 target/arm/translate-a64.c    | 10 +++++++++
 target/arm/translate-neon.c   |  9 ++++++++
 target/arm/translate-sve.c    | 12 ++++++++++
 target/arm/vec_helper.c       | 42 ++++++++++++++++++++++++++++++++++-
 7 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUMMLA         1111 1100 0.10 .... .... 1100 .1.1 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
 VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
+VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
 VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
                vn=%vn_dp vd=%vd_dp size=1
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ SQRDCMLAH_zzzz  01000100 esz:2 0 rm:5 0011 rot:2 rn:5 rd:5  ra=%reg_movprfx
 USDOT_zzzz      01000100 .. 0 ..... 011 110 ..... .....  @rda_rn_rm
 
 ### SVE2 floating point matrix multiply accumulate
-
-FMMLA           01100100 .. 1 ..... 111001 ..... .....  @rda_rn_rm
+{
+  BFMMLA        01100100 01 1 ..... 111 001 ..... .....  @rda_rn_rm_e0
+  FMMLA         01100100 .. 1 ..... 111 001 ..... .....  @rda_rn_rm
+}
 
 ### SVE2 Memory Gather Load Group
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         feature = dc_isar_feature(aa64_fcma, s);
         break;
+    case 0x1d: /* BFMMLA */
+        if (size != MO_16 || !is_q) {
+            unallocated_encoding(s);
+            return;
+        }
+        feature = dc_isar_feature(aa64_bf16, s);
+        break;
     case 0x1f: /* BFDOT */
         switch (size) {
         case 1:
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         return;
 
+    case 0xd: /* BFMMLA */
+        gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
+        return;
     case 0xf: /* BFDOT */
         switch (size) {
         case 1:
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VUSMMLA(DisasContext *s, arg_VUSMMLA *a)
     return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
                         gen_helper_gvec_usmmla_b);
 }
+
+static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
+                        gen_helper_gvec_bfmmla);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFDOT_zzxz(DisasContext *s, arg_rrxr_esz *a)
     }
     return true;
 }
+
+static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        gen_gvec_ool_zzzz(s, gen_helper_gvec_bfmmla,
+                          a->rd, a->rn, a->rm, a->ra, 0);
+    }
+    return true;
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ static void do_mmla_b(void *vd, void *vn, void *vm, void *va, uint32_t desc,
          * Process the entire segment at once, writing back the
          * results only after we've consumed all of the inputs.
          *
-         * Key to indicies by column:
+         * Key to indices by column:
          *          i   j                  i             j
          */
         sum0 = a[H4(0 + 0)];
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfdot_idx)(void *vd, void *vn, void *vm,
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
+{
+    intptr_t s, opr_sz = simd_oprsz(desc);
+    float32 *d = vd, *a = va;
+    uint32_t *n = vn, *m = vm;
+
+    for (s = 0; s < opr_sz / 4; s += 4) {
+        float32 sum00, sum01, sum10, sum11;
+
+        /*
+         * Process the entire segment at once, writing back the
+         * results only after we've consumed all of the inputs.
+         *
+         * Key to indices by column:
+         *               i   j           i   k             j   k
+         */
+        sum00 = a[s + H4(0 + 0)];
+        sum00 = bfdotadd(sum00, n[s + H4(0 + 0)], m[s + H4(0 + 0)]);
+        sum00 = bfdotadd(sum00, n[s + H4(0 + 1)], m[s + H4(0 + 1)]);
+
+        sum01 = a[s + H4(0 + 1)];
+        sum01 = bfdotadd(sum01, n[s + H4(0 + 0)], m[s + H4(2 + 0)]);
+        sum01 = bfdotadd(sum01, n[s + H4(0 + 1)], m[s + H4(2 + 1)]);
+
+        sum10 = a[s + H4(2 + 0)];
+        sum10 = bfdotadd(sum10, n[s + H4(2 + 0)], m[s + H4(0 + 0)]);
+        sum10 = bfdotadd(sum10, n[s + H4(2 + 1)], m[s + H4(0 + 1)]);
+
+        sum11 = a[s + H4(2 + 1)];
+        sum11 = bfdotadd(sum11, n[s + H4(2 + 0)], m[s + H4(2 + 0)]);
+        sum11 = bfdotadd(sum11, n[s + H4(2 + 1)], m[s + H4(2 + 1)]);
+
+        d[s + H4(0 + 0)] = sum00;
+        d[s + H4(0 + 1)] = sum01;
+        d[s + H4(2 + 0)] = sum10;
+        d[s + H4(2 + 1)] = sum11;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
and VFMA{B,T}.BF16 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
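Reviewer note (not part of the commit): a minimal model of what the B and T variants select may help when reading gvec_bfmlal. The sketch below assumes the same layout as the helper, where each float32 result lane i is paired with bf16 elements 2*i (bottom, sel = 0) and 2*i + 1 (top, sel = 1). It uses an unfused host multiply and add rather than float32_muladd(), and it ignores the H2()/H4() host-endian adjustment. The names model_bfmlal and bf16_to_float are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen a bfloat16 bit pattern: bf16 is the upper 16 bits of a float32. */
static float bf16_to_float(uint16_t bits16)
{
    uint32_t bits32 = (uint32_t)bits16 << 16;
    float f;

    memcpy(&f, &bits32, sizeof(f));
    return f;
}

/*
 * d[i] = a[i] + n[2*i + sel] * m[2*i + sel] for `lanes` float32 lanes.
 * sel = 0 models BFMLALB (even elements), sel = 1 models BFMLALT (odd).
 * Unfused arithmetic is an assumption of this sketch.
 */
static void model_bfmlal(float *d, const float *a,
                         const uint16_t *n, const uint16_t *m,
                         size_t lanes, int sel)
{
    assert(sel == 0 || sel == 1);
    for (size_t i = 0; i < lanes; i++) {
        float nn = bf16_to_float(n[2 * i + sel]);
        float mm = bf16_to_float(m[2 * i + sel]);

        d[i] = a[i] + nn * mm;
    }
}
```

In the patch, the AdvSIMD and NEON paths pass is_q (or a->q) as the helper data, which suggests the Q bit doubles as the bottom/top selector there, while the SVE path passes sel explicitly.
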
 target/arm/helper.h           |  3 +++
 target/arm/neon-shared.decode |  3 +++
 target/arm/sve.decode         |  3 +++
 target/arm/translate-a64.c    | 13 +++++++++----
 target/arm/translate-neon.c   |  9 +++++++++
 target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
 target/arm/vec_helper.c       | 16 ++++++++++++++++
 7 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfdot_idx, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VUSMMLA        1111 1100 1.10 .... .... 1100 .1.0 .... \
 VMMLA_b16      1111 1100 0.00 .... .... 1100 .1.0 .... \
                vm=%vm_dp vn=%vn_dp vd=%vd_dp
 
+VFMA_b16       1111 110 0 0.11 .... .... 1000 . q:1 . 1 .... \
+               vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
 VCMLA_scalar   1111 1110 0 . rot:2 .... .... 1000 . q:1 index:1 0 vm:4 \
                vn=%vn_dp vd=%vd_dp size=1
 VCMLA_scalar   1111 1110 1 . rot:2 .... .... 1000 . q:1 . 0 .... \
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALT_zzzw     01100100 10 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
 FMLSLB_zzzw     01100100 10 1 ..... 10 1 00 0 ..... .....  @rda_rn_rm_e0
 FMLSLT_zzzw     01100100 10 1 ..... 10 1 00 1 ..... .....  @rda_rn_rm_e0
 
+BFMLALB_zzzw    01100100 11 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
+BFMLALT_zzzw    01100100 11 1 ..... 10 0 00 1 ..... .....  @rda_rn_rm_e0
+
 ### SVE2 floating-point bfloat16 dot-product
 BFDOT_zzzz      01100100 01 1 ..... 10 0 00 0 ..... .....  @rda_rn_rm_e0
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
         }
         feature = dc_isar_feature(aa64_bf16, s);
         break;
-    case 0x1f: /* BFDOT */
+    case 0x1f:
         switch (size) {
-        case 1:
+        case 1: /* BFDOT */
+        case 3: /* BFMLAL{B,T} */
             feature = dc_isar_feature(aa64_bf16, s);
             break;
         default:
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_same_extra(DisasContext *s, uint32_t insn)
     case 0xd: /* BFMMLA */
         gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfmmla);
         return;
-    case 0xf: /* BFDOT */
+    case 0xf:
         switch (size) {
-        case 1:
+        case 1: /* BFDOT */
             gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, 0, gen_helper_gvec_bfdot);
             break;
+        case 3: /* BFMLAL{B,T} */
+            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, false, is_q,
+                              gen_helper_gvec_bfmlal);
+            break;
         default:
             g_assert_not_reached();
         }
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMMLA_b16(DisasContext *s, arg_VMMLA_b16 *a)
     return do_neon_ddda(s, 7, a->vd, a->vn, a->vm, 0,
                         gen_helper_gvec_bfmmla);
 }
+
+static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
+                             gen_helper_gvec_bfmlal);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMMLA(DisasContext *s, arg_rrrr_esz *a)
     }
     return true;
 }
+
+static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
+        unsigned vsz = vec_full_reg_size(s);
+
+        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vec_full_reg_offset(s, a->ra),
+                           status, vsz, vsz, sel,
+                           gen_helper_gvec_bfmlal);
+        tcg_temp_free_ptr(status);
+    }
+    return true;
+}
+
+static bool trans_BFMLALB_zzzw(DisasContext *s, arg_rrrr_esz *a)
+{
+    return do_BFMLAL_zzzw(s, a, false);
+}
+
+static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
+{
+    return do_BFMLAL_zzzw(s, a, true);
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmmla)(void *vd, void *vn, void *vm, void *va, uint32_t desc)
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
+                         void *stat, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    intptr_t sel = simd_data(desc);
+    float32 *d = vd, *a = va;
+    bfloat16 *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 4; ++i) {
+        float32 nn = n[H2(i * 2 + sel)] << 16;
+        float32 mm = m[H2(i * 2 + sel)] << 16;
+        d[H4(i)] = float32_muladd(nn, mm, a[H4(i)], 0, stat);
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is the indexed form of BFMLAL{B,T} for both AArch64 AdvSIMD and SVE,
and of VFMA{B,T}.BF16 for AArch32 NEON.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
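Reviewer note (not part of the commit): the translators pack two values into the helper data as (index << 1) | sel, and gvec_bfmlal_idx unpacks them with extract32(). The sketch below models that packing and the per-segment indexing under simplifying assumptions: plain integers instead of the simd desc word, unfused host float arithmetic instead of float32_muladd(), and no H2()/H4() adjustment. All model_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen a bfloat16 bit pattern: bf16 is the upper 16 bits of a float32. */
static float bf16_to_float(uint16_t bits16)
{
    uint32_t bits32 = (uint32_t)bits16 << 16;
    float f;

    memcpy(&f, &bits32, sizeof(f));
    return f;
}

/* Bit 0 carries the bottom/top selector, bits 1..3 the element index. */
static uint32_t model_pack_idx_data(unsigned index, unsigned sel)
{
    assert(index < 8 && sel < 2);
    return (index << 1) | sel;
}

/*
 * For each 128-bit segment (4 float32 lanes, 8 bf16 elements), multiply
 * the selected bf16 of every n pair by one indexed bf16 taken from the
 * same segment of m, then accumulate into a.
 */
static void model_bfmlal_idx(float *d, const float *a,
                             const uint16_t *n, const uint16_t *m,
                             size_t lanes, uint32_t data)
{
    unsigned sel = data & 1;
    unsigned index = (data >> 1) & 7;
    size_t per_segment = lanes < 4 ? lanes : 4;

    for (size_t i = 0; i < lanes; i += per_segment) {
        float m_idx = bf16_to_float(m[2 * i + index]);

        for (size_t j = i; j < i + per_segment; j++) {
            d[j] = a[j] + bf16_to_float(n[2 * j + sel]) * m_idx;
        }
    }
}
```

Note that m must provide a full 8-element segment even when fewer than 4 lanes are processed, because index can reach 7.
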
 target/arm/helper.h           |  2 ++
 target/arm/neon-shared.decode |  2 ++
 target/arm/sve.decode         |  2 ++
 target/arm/translate-a64.c    | 15 ++++++++++++++-
 target/arm/translate-neon.c   | 10 ++++++++++
 target/arm/translate-sve.c    | 30 ++++++++++++++++++++++++++++++
 target/arm/vec_helper.c       | 22 ++++++++++++++++++++++
 7 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(gvec_bfmmla, TCG_CALL_NO_RWG,
 
 DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
+                   void, ptr, ptr, ptr, ptr, ptr, i32)
 
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
diff --git a/target/arm/neon-shared.decode b/target/arm/neon-shared.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-shared.decode
+++ b/target/arm/neon-shared.decode
@@ -XXX,XX +XXX,XX @@ VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 0 . 1 index:1 ... \
                rm=%vfml_scalar_q0_rm vn=%vn_sp vd=%vd_dp q=0
 VFML_scalar    1111 1110 0 . 0 s:1 .... .... 1000 . 1 . 1 . rm:3 \
                index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp q=1
+VFMA_b16_scal  1111 1110 0.11 .... .... 1000 . q:1 . 1 . vm:3 \
+               index=%vfml_scalar_q1_index vn=%vn_dp vd=%vd_dp
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -XXX,XX +XXX,XX @@ FMLALB_zzxw     01100100 10 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
 FMLALT_zzxw     01100100 10 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
 FMLSLB_zzxw     01100100 10 1 ..... 0110.0 ..... .....     @rrxr_3a esz=2
 FMLSLT_zzxw     01100100 10 1 ..... 0110.1 ..... .....     @rrxr_3a esz=2
+BFMLALB_zzxw    01100100 11 1 ..... 0100.0 ..... .....     @rrxr_3a esz=2
+BFMLALT_zzxw    01100100 11 1 ..... 0100.1 ..... .....     @rrxr_3a esz=2
 
 ### SVE2 floating-point bfloat16 dot-product (indexed)
 BFDOT_zzxz      01100100 01 1 ..... 010000 ..... .....     @rrxr_2 esz=2
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
                 unallocated_encoding(s);
                 return;
             }
+            size = MO_32;
             break;
         case 1: /* BFDOT */
             if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
                 unallocated_encoding(s);
                 return;
             }
+            size = MO_32;
+            break;
+        case 3: /* BFMLAL{B,T} */
+            if (is_scalar || !dc_isar_feature(aa64_bf16, s)) {
+                unallocated_encoding(s);
+                return;
+            }
+            /* can't set is_fp without other incorrect size checks */
+            size = MO_16;
             break;
         default:
             unallocated_encoding(s);
             return;
         }
-        size = MO_32;
         break;
     case 0x11: /* FCMLA #0 */
     case 0x13: /* FCMLA #90 */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
             gen_gvec_op4_ool(s, is_q, rd, rn, rm, rd, index,
                              gen_helper_gvec_usdot_idx_b);
             return;
+        case 3: /* BFMLAL{B,T} */
+            gen_gvec_op4_fpst(s, 1, rd, rn, rm, rd, 0, (index << 1) | is_q,
+                              gen_helper_gvec_bfmlal_idx);
+            return;
         }
         g_assert_not_reached();
     case 0x11: /* FCMLA #0 */
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VFMA_b16(DisasContext *s, arg_VFMA_b16 *a)
     return do_neon_ddda_fpst(s, 7, a->vd, a->vn, a->vm, a->q, FPST_STD,
                              gen_helper_gvec_bfmlal);
 }
+
+static bool trans_VFMA_b16_scal(DisasContext *s, arg_VFMA_b16_scal *a)
+{
+    if (!dc_isar_feature(aa32_bf16, s)) {
+        return false;
+    }
+    return do_neon_ddda_fpst(s, 6, a->vd, a->vn, a->vm,
+                             (a->index << 1) | a->q, FPST_STD,
+                             gen_helper_gvec_bfmlal_idx);
+}
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BFMLALT_zzzw(DisasContext *s, arg_rrrr_esz *a)
 {
     return do_BFMLAL_zzzw(s, a, true);
 }
+
+static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz *a, bool sel)
+{
+    if (!dc_isar_feature(aa64_sve_bf16, s)) {
+        return false;
+    }
+    if (sve_access_check(s)) {
+        TCGv_ptr status = fpstatus_ptr(FPST_FPCR);
+        unsigned vsz = vec_full_reg_size(s);
+
+        tcg_gen_gvec_4_ptr(vec_full_reg_offset(s, a->rd),
+                           vec_full_reg_offset(s, a->rn),
+                           vec_full_reg_offset(s, a->rm),
+                           vec_full_reg_offset(s, a->ra),
+                           status, vsz, vsz, (a->index << 1) | sel,
+                           gen_helper_gvec_bfmlal_idx);
+        tcg_temp_free_ptr(status);
+    }
+    return true;
+}
+
+static bool trans_BFMLALB_zzxw(DisasContext *s, arg_rrxr_esz *a)
+{
+    return do_BFMLAL_zzxw(s, a, false);
+}
+
+static bool trans_BFMLALT_zzxw(DisasContext *s, arg_rrxr_esz *a)
+{
+    return do_BFMLAL_zzxw(s, a, true);
+}
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_bfmlal)(void *vd, void *vn, void *vm, void *va,
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+void HELPER(gvec_bfmlal_idx)(void *vd, void *vn, void *vm,
+                             void *va, void *stat, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc);
+    intptr_t sel = extract32(desc, SIMD_DATA_SHIFT, 1);
+    intptr_t index = extract32(desc, SIMD_DATA_SHIFT + 1, 3);
+    intptr_t elements = opr_sz / 4;
+    intptr_t eltspersegment = MIN(16 / 4, elements);
+    float32 *d = vd, *a = va;
+    bfloat16 *n = vn, *m = vm;
+
+    for (i = 0; i < elements; i += eltspersegment) {
+        float32 m_idx = m[H2(2 * i + index)] << 16;
+
+        for (j = i; j < i + eltspersegment; j++) {
+            float32 n_j = n[H2(2 * j + sel)] << 16;
+            d[H4(j)] = float32_muladd(n_j, m_idx, a[H4(j)], 0, stat);
+        }
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
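Reviewer note (not part of the commit): with these two hwcaps exposed, a linux-user guest can detect BF16 support from AT_HWCAP2. The sketch below shows only the bit test. The bit positions (12 for SVEBF16, 14 for BF16) are my recollection of the Linux arm64 uapi hwcap.h values and should be checked against the guest's headers; the MODEL_ and model_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, mirroring HWCAP2_SVEBF16 and HWCAP2_BF16. */
enum {
    MODEL_HWCAP2_SVEBF16_BIT = 12,
    MODEL_HWCAP2_BF16_BIT = 14,
};

/* Return 1 if the given bit is set in an AT_HWCAP2 word, else 0. */
static int model_hwcap_bit_set(unsigned long hwcap2, unsigned bit)
{
    assert(bit < 32);
    return (int)((hwcap2 >> bit) & 1UL);
}

static int model_has_bf16(unsigned long hwcap2)
{
    return model_hwcap_bit_set(hwcap2, MODEL_HWCAP2_BF16_BIT);
}

static int model_has_sve_bf16(unsigned long hwcap2)
{
    return model_hwcap_bit_set(hwcap2, MODEL_HWCAP2_SVEBF16_BIT);
}
```

In a real aarch64 guest program the argument would come from getauxval(AT_HWCAP2) in <sys/auxv.h>, and the HWCAP2_* macros from <asm/hwcap.h> would replace the assumed constants.
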
 linux-user/elfload.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap2(void)
     GET_FEATURE_ID(aa64_sve_i8mm, ARM_HWCAP2_A64_SVEI8MM);
     GET_FEATURE_ID(aa64_sve_f32mm, ARM_HWCAP2_A64_SVEF32MM);
     GET_FEATURE_ID(aa64_sve_f64mm, ARM_HWCAP2_A64_SVEF64MM);
+    GET_FEATURE_ID(aa64_sve_bf16, ARM_HWCAP2_A64_SVEBF16);
     GET_FEATURE_ID(aa64_i8mm, ARM_HWCAP2_A64_I8MM);
+    GET_FEATURE_ID(aa64_bf16, ARM_HWCAP2_A64_BF16);
     GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
     GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
     GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Enable BF16 for the max CPU types, and disable it again
for !have_neon and !have_vfp during realize.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210525225817.400336-13-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
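Reviewer note (not part of the commit): the FIELD_DP32()/FIELD_DP64() calls in this patch deposit a value into one ID register field. The sketch below models the 32-bit case for ID_ISAR6.BF16, assuming the field occupies bits [23:20] as I understand the Arm architecture's register layout. The model_ names and the MODEL_ constants are hypothetical and are not QEMU's macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the BF16 field in ID_ISAR6. */
enum {
    MODEL_ID_ISAR6_BF16_SHIFT = 20,
    MODEL_ID_ISAR6_BF16_LENGTH = 4,
};

/* Replace the `length`-bit field at bit `shift` of `reg` with `val`. */
static uint32_t model_deposit32(uint32_t reg, unsigned shift,
                                unsigned length, uint32_t val)
{
    uint32_t mask;

    assert(length >= 1 && length <= 32 && shift <= 32 - length);
    mask = (length == 32 ? UINT32_MAX : ((UINT32_C(1) << length) - 1));
    mask <<= shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Nonzero field value means the feature is advertised. */
static int model_has_bf16(uint32_t id_isar6)
{
    return ((id_isar6 >> MODEL_ID_ISAR6_BF16_SHIFT) & 0xf) != 0;
}

/* What realize does when NEON or VFP is disabled: zero the field. */
static uint32_t model_clear_bf16(uint32_t id_isar6)
{
    return model_deposit32(id_isar6, MODEL_ID_ISAR6_BF16_SHIFT,
                           MODEL_ID_ISAR6_BF16_LENGTH, 0);
}
```

The max CPU init functions correspond to depositing 1 into the field, and the realize path to model_clear_bf16(); neighbouring fields are left untouched in both cases.
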
 target/arm/cpu.c     | 3 +++
 target/arm/cpu64.c   | 3 +++
 target/arm/cpu_tcg.c | 1 +
 3 files changed, 7 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 
         u = cpu->isar.id_isar6;
         u = FIELD_DP32(u, ID_ISAR6, JSCVT, 0);
+        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
         cpu->isar.id_isar6 = u;
 
         u = cpu->isar.mvfr0;
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 
         t = cpu->isar.id_aa64isar1;
         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 0);
+        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 0);
         t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 0);
         cpu->isar.id_aa64isar1 = t;
 
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         u = cpu->isar.id_isar6;
         u = FIELD_DP32(u, ID_ISAR6, DP, 0);
         u = FIELD_DP32(u, ID_ISAR6, FHM, 0);
+        u = FIELD_DP32(u, ID_ISAR6, BF16, 0);
         u = FIELD_DP32(u, ID_ISAR6, I8MM, 0);
         cpu->isar.id_isar6 = u;
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, SB, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, SPECRES, 1);
+        t = FIELD_DP64(t, ID_AA64ISAR1, BF16, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, FRINTTS, 1);
         t = FIELD_DP64(t, ID_AA64ISAR1, LRCPC, 2); /* ARMv8.4-RCPC */
         t = FIELD_DP64(t, ID_AA64ISAR1, I8MM, 1);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         t = FIELD_DP64(t, ID_AA64ZFR0, SVEVER, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, AES, 2);  /* PMULL */
         t = FIELD_DP64(t, ID_AA64ZFR0, BITPERM, 1);
+        t = FIELD_DP64(t, ID_AA64ZFR0, BFLOAT16, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, SHA3, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, SM4, 1);
         t = FIELD_DP64(t, ID_AA64ZFR0, I8MM, 1);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = FIELD_DP32(u, ID_ISAR6, FHM, 1);
         u = FIELD_DP32(u, ID_ISAR6, SB, 1);
         u = FIELD_DP32(u, ID_ISAR6, SPECRES, 1);
+        u = FIELD_DP32(u, ID_ISAR6, BF16, 1);
         u = FIELD_DP32(u, ID_ISAR6, I8MM, 1);
         cpu->isar.id_isar6 = u;
 
diff --git a/target/arm/cpu_tcg.c b/target/arm/cpu_tcg.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu_tcg.c
+++ b/target/arm/cpu_tcg.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
         t = FIELD_DP32(t, ID_ISAR6, FHM, 1);
         t = FIELD_DP32(t, ID_ISAR6, SB, 1);
         t = FIELD_DP32(t, ID_ISAR6, SPECRES, 1);
+        t = FIELD_DP32(t, ID_ISAR6, BF16, 1);
         t = FIELD_DP32(t, ID_ISAR6, I8MM, 1);
         cpu->isar.id_isar6 = t;
 
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

Until now, Hypervisor.framework has only been available on x86_64 systems.
With Apple Silicon shipping now, it extends its reach to aarch64. To
prepare for support for multiple architectures, let's start moving common
code out into its own accel directory.

This patch moves assert_hvf_ok() and introduces generic build infrastructure.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-2-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h | 18 +++++++++++++++
 accel/hvf/hvf-all.c      | 47 ++++++++++++++++++++++++++++++++++++++++
 target/i386/hvf/hvf.c    | 33 +---------------------------
 MAINTAINERS              |  8 +++++++
 accel/hvf/meson.build    |  6 +++++
 accel/meson.build        |  1 +
 6 files changed, 81 insertions(+), 32 deletions(-)
 create mode 100644 include/sysemu/hvf_int.h
 create mode 100644 accel/hvf/hvf-all.c
 create mode 100644 accel/hvf/meson.build

diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * QEMU Hypervisor.framework (HVF) support
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/* header to be included in HVF-specific code */
+
+#ifndef HVF_INT_H
+#define HVF_INT_H
+
+#include <Hypervisor/hv.h>
+
+void assert_hvf_ok(hv_return_t ret);
+
+#endif
diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/accel/hvf/hvf-all.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * QEMU Hypervisor.framework support
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
+
+void assert_hvf_ok(hv_return_t ret)
+{
+    if (ret == HV_SUCCESS) {
+        return;
+    }
+
+    switch (ret) {
+    case HV_ERROR:
+        error_report("Error: HV_ERROR");
+        break;
+    case HV_BUSY:
+        error_report("Error: HV_BUSY");
+        break;
+    case HV_BAD_ARGUMENT:
+        error_report("Error: HV_BAD_ARGUMENT");
+        break;
+    case HV_NO_RESOURCES:
+        error_report("Error: HV_NO_RESOURCES");
+        break;
+    case HV_NO_DEVICE:
+        error_report("Error: HV_NO_DEVICE");
+        break;
+    case HV_UNSUPPORTED:
+        error_report("Error: HV_UNSUPPORTED");
+        break;
+    default:
+        error_report("Unknown Error");
+    }
+
+    abort();
+}
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/error-report.h"
 
 #include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 #include "sysemu/runstate.h"
 #include "hvf-i386.h"
 #include "vmcs.h"
@@ -XXX,XX +XXX,XX @@
 
 HVFState *hvf_state;
 
-static void assert_hvf_ok(hv_return_t ret)
-{
-    if (ret == HV_SUCCESS) {
-        return;
-    }
-
-    switch (ret) {
-    case HV_ERROR:
-        error_report("Error: HV_ERROR");
-        break;
-    case HV_BUSY:
-        error_report("Error: HV_BUSY");
-        break;
-    case HV_BAD_ARGUMENT:
-        error_report("Error: HV_BAD_ARGUMENT");
-        break;
-    case HV_NO_RESOURCES:
-        error_report("Error: HV_NO_RESOURCES");
-        break;
-    case HV_NO_DEVICE:
-        error_report("Error: HV_NO_DEVICE");
-        break;
-    case HV_UNSUPPORTED:
-        error_report("Error: HV_UNSUPPORTED");
-        break;
-    default:
-        error_report("Unknown Error");
-    }
-
-    abort();
-}
-
 /* Memory slots */
 hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
 {
diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ M: Roman Bolshakov <r.bolshakov@yadro.com>
 W: https://wiki.qemu.org/Features/HVF
 S: Maintained
 F: target/i386/hvf/
+
+HVF
+M: Cameron Esfahani <dirty@apple.com>
+M: Roman Bolshakov <r.bolshakov@yadro.com>
+W: https://wiki.qemu.org/Features/HVF
+S: Maintained
+F: accel/hvf/
 F: include/sysemu/hvf.h
+F: include/sysemu/hvf_int.h
 
 WHPX CPUs
 M: Sunil Muthuswamy <sunilmut@microsoft.com>
diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/accel/hvf/meson.build
@@ -XXX,XX +XXX,XX @@
+hvf_ss = ss.source_set()
+hvf_ss.add(files(
+  'hvf-all.c',
+))
+
+specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
diff --git a/accel/meson.build b/accel/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/accel/meson.build
+++ b/accel/meson.build
@@ -XXX,XX +XXX,XX @@ specific_ss.add(files('accel-common.c'))
 softmmu_ss.add(files('accel-softmmu.c'))
 user_ss.add(files('accel-user.c'))
 
+subdir('hvf')
 subdir('qtest')
 subdir('kvm')
 subdir('tcg')
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch moves the vCPU thread loop over to the common accel/hvf
directory.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-3-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 {target/i386 => accel}/hvf/hvf-accel-ops.h | 0
 {target/i386 => accel}/hvf/hvf-accel-ops.c | 0
 target/i386/hvf/x86hvf.c                   | 2 +-
 accel/hvf/meson.build                      | 1 +
 target/i386/hvf/meson.build                | 1 -
 5 files changed, 2 insertions(+), 2 deletions(-)
 rename {target/i386 => accel}/hvf/hvf-accel-ops.h (100%)
 rename {target/i386 => accel}/hvf/hvf-accel-ops.c (100%)

diff --git a/target/i386/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
similarity index 100%
rename from target/i386/hvf/hvf-accel-ops.h
rename to accel/hvf/hvf-accel-ops.h
diff --git a/target/i386/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
similarity index 100%
rename from target/i386/hvf/hvf-accel-ops.c
rename to accel/hvf/hvf-accel-ops.c
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@
 #include <Hypervisor/hv.h>
 #include <Hypervisor/hv_vmx.h>
 
-#include "hvf-accel-ops.h"
+#include "accel/hvf/hvf-accel-ops.h"
 
 void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
                      SegmentCache *qseg, bool is_tr)
diff --git a/accel/hvf/meson.build b/accel/hvf/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/meson.build
+++ b/accel/hvf/meson.build
@@ -XXX,XX +XXX,XX @@
 hvf_ss = ss.source_set()
 hvf_ss.add(files(
   'hvf-all.c',
+  'hvf-accel-ops.c',
 ))
 
 specific_ss.add_all(when: 'CONFIG_HVF', if_true: hvf_ss)
diff --git a/target/i386/hvf/meson.build b/target/i386/hvf/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/meson.build
+++ b/target/i386/hvf/meson.build
@@ -XXX,XX +XXX,XX @@
 i386_softmmu_ss.add(when: [hvf, 'CONFIG_HVF'], if_true: files(
   'hvf.c',
-  'hvf-accel-ops.c',
   'x86.c',
   'x86_cpuid.c',
   'x86_decode.c',
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch moves CPU and memory operations over to
accel/hvf/hvf-accel-ops.c. While at it, make sure the code is consumable
on non-i386 systems.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-4-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h   |   4 +
 target/i386/hvf/hvf-i386.h |   2 -
 target/i386/hvf/x86hvf.h   |   2 -
 accel/hvf/hvf-accel-ops.c  | 308 ++++++++++++++++++++++++++++++++++++-
 target/i386/hvf/hvf.c      | 302 ------------------------------------
 5 files changed, 311 insertions(+), 307 deletions(-)
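
Note on the slot lookup moved by this patch: hvf_find_overlap_slot()
treats each slot as the half-open range [start, start + size) and
skips slots whose size is 0. The sketch below isolates that test so it
can be tried outside QEMU. demo_slot and demo_find_overlap_slot() are
hypothetical stand-ins written for illustration, not QEMU symbols, and
the slot array is passed in explicitly rather than read from hvf_state.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, trimmed-down stand-in for hvf_slot: only the fields
 * that the overlap test reads.
 */
typedef struct demo_slot {
    uint64_t start;   /* guest-physical start address */
    uint64_t size;    /* length in bytes; 0 marks an unused slot */
} demo_slot;

/*
 * Return the first used slot that intersects [start, start + size),
 * or NULL if no used slot does.
 */
demo_slot *demo_find_overlap_slot(demo_slot *slots, size_t num_slots,
                                  uint64_t start, uint64_t size)
{
    for (size_t i = 0; i < num_slots; i++) {
        demo_slot *slot = &slots[i];

        if (slot->size &&
            start < slot->start + slot->size &&
            start + size > slot->start) {
            return slot;
        }
    }
    return NULL;
}
```

Because both ranges are half-open, a request that ends exactly where a
slot begins (or begins exactly where a slot ends) does not count as an
overlap.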

diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@
 
 #include <Hypervisor/hv.h>
 
+void hvf_set_phys_mem(MemoryRegionSection *, bool);
 void assert_hvf_ok(hv_return_t ret);
+hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
+int hvf_put_registers(CPUState *);
+int hvf_get_registers(CPUState *);
 
 #endif
diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf-i386.h
+++ b/target/i386/hvf/hvf-i386.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
 };
 extern HVFState *hvf_state;
 
-void hvf_set_phys_mem(MemoryRegionSection *, bool);
 void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
-hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
 
 #ifdef NEED_CPU_H
 /* Functions exported to host specific mode */
diff --git a/target/i386/hvf/x86hvf.h b/target/i386/hvf/x86hvf.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.h
+++ b/target/i386/hvf/x86hvf.h
@@ -XXX,XX +XXX,XX @@
 #include "x86_descr.h"
 
 int hvf_process_events(CPUState *);
-int hvf_put_registers(CPUState *);
-int hvf_get_registers(CPUState *);
 bool hvf_inject_interrupts(CPUState *);
 void hvf_set_segment(struct CPUState *cpu, struct vmx_segment *vmx_seg,
                      SegmentCache *qseg, bool is_tr);
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
+#include "exec/address-spaces.h"
+#include "exec/exec-all.h"
+#include "sysemu/cpus.h"
 #include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 #include "sysemu/runstate.h"
-#include "target/i386/cpu.h"
 #include "qemu/guest-random.h"
 
 #include "hvf-accel-ops.h"
 
+HVFState *hvf_state;
+
+/* Memory slots */
+
+hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
+{
+    hvf_slot *slot;
+    int x;
+    for (x = 0; x < hvf_state->num_slots; ++x) {
+        slot = &hvf_state->slots[x];
+        if (slot->size && start < (slot->start + slot->size) &&
+            (start + size) > slot->start) {
+            return slot;
+        }
+    }
+    return NULL;
+}
+
+struct mac_slot {
+    int present;
+    uint64_t size;
+    uint64_t gpa_start;
+    uint64_t gva;
+};
+
+struct mac_slot mac_slots[32];
+
+static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
+{
+    struct mac_slot *macslot;
+    hv_return_t ret;
+
+    macslot = &mac_slots[slot->slot_id];
+
+    if (macslot->present) {
+        if (macslot->size != slot->size) {
+            macslot->present = 0;
+            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
+            assert_hvf_ok(ret);
+        }
+    }
+
+    if (!slot->size) {
+        return 0;
+    }
+
+    macslot->present = 1;
+    macslot->gpa_start = slot->start;
+    macslot->size = slot->size;
+    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
+    assert_hvf_ok(ret);
+    return 0;
+}
+
+void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
+{
+    hvf_slot *mem;
+    MemoryRegion *area = section->mr;
+    bool writeable = !area->readonly && !area->rom_device;
+    hv_memory_flags_t flags;
+
+    if (!memory_region_is_ram(area)) {
+        if (writeable) {
+            return;
+        } else if (!memory_region_is_romd(area)) {
+            /*
+             * If the memory device is not in romd_mode, then we actually want
+             * to remove the hvf memory slot so all accesses will trap.
+             */
+             add = false;
+        }
+    }
+
+    mem = hvf_find_overlap_slot(
+            section->offset_within_address_space,
+            int128_get64(section->size));
+
+    if (mem && add) {
+        if (mem->size == int128_get64(section->size) &&
+            mem->start == section->offset_within_address_space &&
+            mem->mem == (memory_region_get_ram_ptr(area) +
+            section->offset_within_region)) {
+            return; /* Same region was attempted to register, go away. */
+        }
+    }
+
+    /* Region needs to be reset. set the size to 0 and remap it. */
+    if (mem) {
+        mem->size = 0;
+        if (do_hvf_set_memory(mem, 0)) {
+            error_report("Failed to reset overlapping slot");
+            abort();
+        }
+    }
+
+    if (!add) {
+        return;
+    }
+
+    if (area->readonly ||
+        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
+        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
+    } else {
+        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
+    }
+
+    /* Now make a new slot. */
+    int x;
+
+    for (x = 0; x < hvf_state->num_slots; ++x) {
+        mem = &hvf_state->slots[x];
+        if (!mem->size) {
+            break;
+        }
+    }
+
+    if (x == hvf_state->num_slots) {
+        error_report("No free slots");
+        abort();
+    }
+
+    mem->size = int128_get64(section->size);
+    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
+    mem->start = section->offset_within_address_space;
+    mem->region = area;
+
+    if (do_hvf_set_memory(mem, flags)) {
+        error_report("Error registering new memory slot");
+        abort();
+    }
+}
+
+static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
+{
+    if (!cpu->vcpu_dirty) {
+        hvf_get_registers(cpu);
+        cpu->vcpu_dirty = true;
+    }
+}
+
+void hvf_cpu_synchronize_state(CPUState *cpu)
+{
+    if (!cpu->vcpu_dirty) {
+        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
+    }
+}
+
+static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
+                                              run_on_cpu_data arg)
+{
+    hvf_put_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+void hvf_cpu_synchronize_post_reset(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
+}
+
+static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
+                                             run_on_cpu_data arg)
+{
+    hvf_put_registers(cpu);
+    cpu->vcpu_dirty = false;
+}
+
+void hvf_cpu_synchronize_post_init(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
+}
+
+static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
+                                              run_on_cpu_data arg)
+{
+    cpu->vcpu_dirty = true;
+}
+
+void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
+{
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+}
+
+static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
+{
+    hvf_slot *slot;
+
+    slot = hvf_find_overlap_slot(
+            section->offset_within_address_space,
+            int128_get64(section->size));
+
+    /* protect region against writes; begin tracking it */
+    if (on) {
+        slot->flags |= HVF_SLOT_LOG;
+        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+                      HV_MEMORY_READ);
+    /* stop tracking region*/
+    } else {
+        slot->flags &= ~HVF_SLOT_LOG;
+        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+                      HV_MEMORY_READ | HV_MEMORY_WRITE);
+    }
+}
+
+static void hvf_log_start(MemoryListener *listener,
+                          MemoryRegionSection *section, int old, int new)
+{
+    if (old != 0) {
+        return;
+    }
+
+    hvf_set_dirty_tracking(section, 1);
+}
+
+static void hvf_log_stop(MemoryListener *listener,
+                         MemoryRegionSection *section, int old, int new)
+{
+    if (new != 0) {
+        return;
+    }
+
+    hvf_set_dirty_tracking(section, 0);
+}
+
+static void hvf_log_sync(MemoryListener *listener,
+                         MemoryRegionSection *section)
+{
+    /*
+     * sync of dirty pages is handled elsewhere; just make sure we keep
+     * tracking the region.
+     */
+    hvf_set_dirty_tracking(section, 1);
+}
+
+static void hvf_region_add(MemoryListener *listener,
+                           MemoryRegionSection *section)
+{
+    hvf_set_phys_mem(section, true);
+}
+
+static void hvf_region_del(MemoryListener *listener,
+                           MemoryRegionSection *section)
+{
+    hvf_set_phys_mem(section, false);
+}
+
+static MemoryListener hvf_memory_listener = {
+    .priority = 10,
+    .region_add = hvf_region_add,
+    .region_del = hvf_region_del,
+    .log_start = hvf_log_start,
+    .log_stop = hvf_log_stop,
+    .log_sync = hvf_log_sync,
+};
+
+static void dummy_signal(int sig)
+{
+}
+
+bool hvf_allowed;
+
+static int hvf_accel_init(MachineState *ms)
+{
+    int x;
+    hv_return_t ret;
+    HVFState *s;
+
+    ret = hv_vm_create(HV_VM_DEFAULT);
+    assert_hvf_ok(ret);
+
+    s = g_new0(HVFState, 1);
+
+    s->num_slots = 32;
+    for (x = 0; x < s->num_slots; ++x) {
+        s->slots[x].size = 0;
+        s->slots[x].slot_id = x;
+    }
+
+    hvf_state = s;
+    memory_listener_register(&hvf_memory_listener, &address_space_memory);
+    return 0;
+}
+
+static void hvf_accel_class_init(ObjectClass *oc, void *data)
+{
+    AccelClass *ac = ACCEL_CLASS(oc);
+    ac->name = "HVF";
+    ac->init_machine = hvf_accel_init;
+    ac->allowed = &hvf_allowed;
+}
+
+static const TypeInfo hvf_accel_type = {
+    .name = TYPE_HVF_ACCEL,
+    .parent = TYPE_ACCEL,
+    .class_init = hvf_accel_class_init,
+};
+
+static void hvf_type_init(void)
+{
+    type_register_static(&hvf_accel_type);
+}
+
+type_init(hvf_type_init);
+
 /*
  * The HVF-specific vCPU thread function. This one should only run when the host
  * CPU supports the VMX "unrestricted guest" feature.
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
 
 #include "hvf-accel-ops.h"
 
-HVFState *hvf_state;
-
-/* Memory slots */
-hvf_slot *hvf_find_overlap_slot(uint64_t start, uint64_t size)
-{
-    hvf_slot *slot;
-    int x;
-    for (x = 0; x < hvf_state->num_slots; ++x) {
-        slot = &hvf_state->slots[x];
-        if (slot->size && start < (slot->start + slot->size) &&
-            (start + size) > slot->start) {
-            return slot;
-        }
-    }
-    return NULL;
-}
-
-struct mac_slot {
-    int present;
-    uint64_t size;
-    uint64_t gpa_start;
-    uint64_t gva;
-};
-
-struct mac_slot mac_slots[32];
-
-static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
-{
-    struct mac_slot *macslot;
-    hv_return_t ret;
-
-    macslot = &mac_slots[slot->slot_id];
-
-    if (macslot->present) {
-        if (macslot->size != slot->size) {
-            macslot->present = 0;
-            ret = hv_vm_unmap(macslot->gpa_start, macslot->size);
-            assert_hvf_ok(ret);
-        }
-    }
-
-    if (!slot->size) {
-        return 0;
-    }
-
-    macslot->present = 1;
-    macslot->gpa_start = slot->start;
-    macslot->size = slot->size;
-    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
-    assert_hvf_ok(ret);
-    return 0;
-}
-
-void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
-{
-    hvf_slot *mem;
-    MemoryRegion *area = section->mr;
-    bool writeable = !area->readonly && !area->rom_device;
-    hv_memory_flags_t flags;
-
-    if (!memory_region_is_ram(area)) {
-        if (writeable) {
-            return;
-        } else if (!memory_region_is_romd(area)) {
-            /*
-             * If the memory device is not in romd_mode, then we actually want
-             * to remove the hvf memory slot so all accesses will trap.
-             */
-             add = false;
-        }
-    }
-
-    mem = hvf_find_overlap_slot(
-            section->offset_within_address_space,
-            int128_get64(section->size));
-
-    if (mem && add) {
-        if (mem->size == int128_get64(section->size) &&
-            mem->start == section->offset_within_address_space &&
-            mem->mem == (memory_region_get_ram_ptr(area) +
-            section->offset_within_region)) {
-            return; /* Same region was attempted to register, go away. */
-        }
-    }
-
-    /* Region needs to be reset. set the size to 0 and remap it. */
-    if (mem) {
-        mem->size = 0;
-        if (do_hvf_set_memory(mem, 0)) {
-            error_report("Failed to reset overlapping slot");
-            abort();
-        }
-    }
-
-    if (!add) {
-        return;
-    }
-
-    if (area->readonly ||
-        (!memory_region_is_ram(area) && memory_region_is_romd(area))) {
-        flags = HV_MEMORY_READ | HV_MEMORY_EXEC;
-    } else {
-        flags = HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC;
-    }
-
-    /* Now make a new slot. */
-    int x;
-
-    for (x = 0; x < hvf_state->num_slots; ++x) {
-        mem = &hvf_state->slots[x];
-        if (!mem->size) {
-            break;
-        }
-    }
-
-    if (x == hvf_state->num_slots) {
-        error_report("No free slots");
-        abort();
-    }
-
-    mem->size = int128_get64(section->size);
-    mem->mem = memory_region_get_ram_ptr(area) + section->offset_within_region;
-    mem->start = section->offset_within_address_space;
-    mem->region = area;
-
-    if (do_hvf_set_memory(mem, flags)) {
-        error_report("Error registering new memory slot");
-        abort();
-    }
-}
-
 void vmx_update_tpr(CPUState *cpu)
 {
     /* TODO: need integrate APIC handling */
@@ -XXX,XX +XXX,XX @@ void hvf_handle_io(CPUArchState *env, uint16_t port, void *buffer,
     }
 }
 
-static void do_hvf_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
-{
-    if (!cpu->vcpu_dirty) {
-        hvf_get_registers(cpu);
-        cpu->vcpu_dirty = true;
-    }
-}
-
-void hvf_cpu_synchronize_state(CPUState *cpu)
-{
-    if (!cpu->vcpu_dirty) {
-        run_on_cpu(cpu, do_hvf_cpu_synchronize_state, RUN_ON_CPU_NULL);
-    }
-}
-
-static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
-                                              run_on_cpu_data arg)
-{
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
-}
-
-void hvf_cpu_synchronize_post_reset(CPUState *cpu)
-{
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
-                                             run_on_cpu_data arg)
-{
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
-}
-
-void hvf_cpu_synchronize_post_init(CPUState *cpu)
-{
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
-                                              run_on_cpu_data arg)
-{
-    cpu->vcpu_dirty = true;
-}
-
-void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
-{
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
-}
-
 static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
 {
     int read, write;
@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
     return false;
 }
 
-static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
-{
-    hvf_slot *slot;
-
-    slot = hvf_find_overlap_slot(
-            section->offset_within_address_space,
-            int128_get64(section->size));
-
-    /* protect region against writes; begin tracking it */
-    if (on) {
-        slot->flags |= HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
-                      HV_MEMORY_READ);
-    /* stop tracking region*/
-    } else {
-        slot->flags &= ~HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
-                      HV_MEMORY_READ | HV_MEMORY_WRITE);
-    }
-}
-
-static void hvf_log_start(MemoryListener *listener,
-                          MemoryRegionSection *section, int old, int new)
-{
-    if (old != 0) {
-        return;
-    }
-
-    hvf_set_dirty_tracking(section, 1);
-}
-
-static void hvf_log_stop(MemoryListener *listener,
-                         MemoryRegionSection *section, int old, int new)
-{
-    if (new != 0) {
-        return;
-    }
-
-    hvf_set_dirty_tracking(section, 0);
-}
-
-static void hvf_log_sync(MemoryListener *listener,
-                         MemoryRegionSection *section)
-{
-    /*
-     * sync of dirty pages is handled elsewhere; just make sure we keep
-     * tracking the region.
-     */
-    hvf_set_dirty_tracking(section, 1);
-}
-
-static void hvf_region_add(MemoryListener *listener,
-                           MemoryRegionSection *section)
-{
-    hvf_set_phys_mem(section, true);
-}
-
-static void hvf_region_del(MemoryListener *listener,
-                           MemoryRegionSection *section)
-{
-    hvf_set_phys_mem(section, false);
-}
-
-static MemoryListener hvf_memory_listener = {
-    .priority = 10,
-    .region_add = hvf_region_add,
-    .region_del = hvf_region_del,
-    .log_start = hvf_log_start,
-    .log_stop = hvf_log_stop,
-    .log_sync = hvf_log_sync,
-};
-
 void hvf_vcpu_destroy(CPUState *cpu)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
@@ -XXX,XX +XXX,XX @@ void hvf_vcpu_destroy(CPUState *cpu)
     assert_hvf_ok(ret);
 }
 
-static void dummy_signal(int sig)
-{
-}
-
 static void init_tsc_freq(CPUX86State *env)
 {
     size_t length;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
 
     return ret;
 }
-
-bool hvf_allowed;
-
-static int hvf_accel_init(MachineState *ms)
-{
-    int x;
-    hv_return_t ret;
-    HVFState *s;
-
-    ret = hv_vm_create(HV_VM_DEFAULT);
-    assert_hvf_ok(ret);
-
-    s = g_new0(HVFState, 1);
- 
-    s->num_slots = 32;
-    for (x = 0; x < s->num_slots; ++x) {
-        s->slots[x].size = 0;
-        s->slots[x].slot_id = x;
-    }
-  
-    hvf_state = s;
-    memory_listener_register(&hvf_memory_listener, &address_space_memory);
-    return 0;
-}
-
-static void hvf_accel_class_init(ObjectClass *oc, void *data)
-{
-    AccelClass *ac = ACCEL_CLASS(oc);
-    ac->name = "HVF";
-    ac->init_machine = hvf_accel_init;
-    ac->allowed = &hvf_allowed;
-}
-
-static const TypeInfo hvf_accel_type = {
-    .name = TYPE_HVF_ACCEL,
-    .parent = TYPE_ACCEL,
-    .class_init = hvf_accel_class_init,
-};
-
-static void hvf_type_init(void)
-{
-    type_register_static(&hvf_accel_type);
-}
-
-type_init(hvf_type_init);
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch moves a few internal struct and constant definitions over to
include/sysemu/hvf_int.h.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-5-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h   | 30 ++++++++++++++++++++++++++++++
 target/i386/hvf/hvf-i386.h | 31 +------------------------------
 2 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@
 
 #include <Hypervisor/hv.h>
 
+/* hvf_slot flags */
+#define HVF_SLOT_LOG (1 << 0)
+
+typedef struct hvf_slot {
+    uint64_t start;
+    uint64_t size;
+    uint8_t *mem;
+    int slot_id;
+    uint32_t flags;
+    MemoryRegion *region;
+} hvf_slot;
+
+typedef struct hvf_vcpu_caps {
+    uint64_t vmx_cap_pinbased;
+    uint64_t vmx_cap_procbased;
+    uint64_t vmx_cap_procbased2;
+    uint64_t vmx_cap_entry;
+    uint64_t vmx_cap_exit;
+    uint64_t vmx_cap_preemption_timer;
+} hvf_vcpu_caps;
+
+struct HVFState {
+    AccelState parent;
+    hvf_slot slots[32];
+    int num_slots;
+
+    hvf_vcpu_caps *hvf_caps;
+};
+extern HVFState *hvf_state;
+
 void hvf_set_phys_mem(MemoryRegionSection *, bool);
 void assert_hvf_ok(hv_return_t ret);
 hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
diff --git a/target/i386/hvf/hvf-i386.h b/target/i386/hvf/hvf-i386.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf-i386.h
+++ b/target/i386/hvf/hvf-i386.h
@@ -XXX,XX +XXX,XX @@
 
 #include "qemu/accel.h"
 #include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 #include "cpu.h"
 #include "x86.h"
 
-/* hvf_slot flags */
-#define HVF_SLOT_LOG (1 << 0)
-
-typedef struct hvf_slot {
-    uint64_t start;
-    uint64_t size;
-    uint8_t *mem;
-    int slot_id;
-    uint32_t flags;
-    MemoryRegion *region;
-} hvf_slot;
-
-typedef struct hvf_vcpu_caps {
-    uint64_t vmx_cap_pinbased;
-    uint64_t vmx_cap_procbased;
-    uint64_t vmx_cap_procbased2;
-    uint64_t vmx_cap_entry;
-    uint64_t vmx_cap_exit;
-    uint64_t vmx_cap_preemption_timer;
-} hvf_vcpu_caps;
-
-struct HVFState {
-    AccelState parent;
-    hvf_slot slots[32];
-    int num_slots;
-
-    hvf_vcpu_caps *hvf_caps;
-};
-extern HVFState *hvf_state;
-
 void hvf_handle_io(CPUArchState *, uint16_t, void *, int, int, int);
 
 #ifdef NEED_CPU_H
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

The hvf_set_phys_mem() function is only called within the same file.
Make it static.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-6-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/sysemu/hvf_int.h  | 1 -
 accel/hvf/hvf-accel-ops.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

From: Alexander Graf <agraf@csgraf.de>

The ARM version of Hypervisor.framework no longer defines these two
types, so let's just revert to standard ones.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-7-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static int do_hvf_set_memory(hvf_slot *slot, hv_memory_flags_t flags)
     macslot->present = 1;
     macslot->gpa_start = slot->start;
     macslot->size = slot->size;
-    ret = hv_vm_map((hv_uvaddr_t)slot->mem, slot->start, slot->size, flags);
+    ret = hv_vm_map(slot->mem, slot->start, slot->size, flags);
     assert_hvf_ok(ret);
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
     /* protect region against writes; begin tracking it */
     if (on) {
         slot->flags |= HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
                       HV_MEMORY_READ);
     /* stop tracking region*/
     } else {
         slot->flags &= ~HVF_SLOT_LOG;
-        hv_vm_protect((hv_gpaddr_t)slot->start, (size_t)slot->size,
+        hv_vm_protect((uintptr_t)slot->start, (size_t)slot->size,
                       HV_MEMORY_READ | HV_MEMORY_WRITE);
     }
 }
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

This patch splits the vcpu init and destroy functions into a generic and
an architecture specific portion. This also allows us to move the generic
functions into the generic hvf code, removing exported functions.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-8-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h |  2 --
 include/sysemu/hvf_int.h  |  2 ++
 accel/hvf/hvf-accel-ops.c | 30 ++++++++++++++++++++++++++++++
 target/i386/hvf/hvf.c     | 23 ++---------------------
 4 files changed, 34 insertions(+), 23 deletions(-)
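
Note on the split: the generic function owns the hypervisor vCPU
handle and then hands over to an architecture hook. The sketch below
mirrors only that call order. DemoCPU and every demo_* function are
hypothetical names made up for illustration; the boolean fields stand
in for the real hv_vcpu_create()/hv_vcpu_destroy() calls and for the
x86 setup work.

```c
#include <stdbool.h>

/* Hypothetical miniature of CPUState: just enough to show call order. */
typedef struct DemoCPU {
    bool vcpu_created;  /* stands in for a valid hypervisor vCPU handle */
    bool arch_ready;    /* stands in for per-architecture state */
    bool vcpu_dirty;
} DemoCPU;

/*
 * Architecture-specific hooks, playing the role of hvf_arch_init_vcpu()
 * and hvf_arch_vcpu_destroy() from the patch.
 */
int demo_arch_init_vcpu(DemoCPU *cpu)
{
    if (!cpu->vcpu_created) {
        return -1;      /* the generic part must run first */
    }
    cpu->arch_ready = true;
    return 0;
}

void demo_arch_vcpu_destroy(DemoCPU *cpu)
{
    cpu->arch_ready = false;
}

/* Generic part: create the vCPU, mark it dirty, then call the hook. */
int demo_init_vcpu(DemoCPU *cpu)
{
    cpu->vcpu_created = true;   /* stands in for hv_vcpu_create() */
    cpu->vcpu_dirty = true;
    return demo_arch_init_vcpu(cpu);
}

/* Generic part: destroy the vCPU, then let the architecture clean up. */
void demo_vcpu_destroy(DemoCPU *cpu)
{
    cpu->vcpu_created = false;  /* stands in for hv_vcpu_destroy() */
    demo_arch_vcpu_destroy(cpu);
}
```

With this layout only the two arch hooks need to be visible outside
the generic file, which is what lets the patch drop hvf_init_vcpu()
and hvf_vcpu_destroy() from the exported header.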

diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.h
+++ b/accel/hvf/hvf-accel-ops.h
@@ -XXX,XX +XXX,XX @@
 
 #include "sysemu/cpus.h"
 
-int hvf_init_vcpu(CPUState *);
 int hvf_vcpu_exec(CPUState *);
 void hvf_cpu_synchronize_state(CPUState *);
 void hvf_cpu_synchronize_post_reset(CPUState *);
 void hvf_cpu_synchronize_post_init(CPUState *);
 void hvf_cpu_synchronize_pre_loadvm(CPUState *);
-void hvf_vcpu_destroy(CPUState *);
 
 #endif /* HVF_CPUS_H */
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
 extern HVFState *hvf_state;
 
 void assert_hvf_ok(hv_return_t ret);
+int hvf_arch_init_vcpu(CPUState *cpu);
+void hvf_arch_vcpu_destroy(CPUState *cpu);
 hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
 int hvf_put_registers(CPUState *);
 int hvf_get_registers(CPUState *);
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static void hvf_type_init(void)
 
 type_init(hvf_type_init);
 
+static void hvf_vcpu_destroy(CPUState *cpu)
+{
+    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
+    assert_hvf_ok(ret);
+
+    hvf_arch_vcpu_destroy(cpu);
+}
+
+static int hvf_init_vcpu(CPUState *cpu)
+{
+    int r;
+
+    /* init cpu signals */
+    sigset_t set;
+    struct sigaction sigact;
+
+    memset(&sigact, 0, sizeof(sigact));
+    sigact.sa_handler = dummy_signal;
+    sigaction(SIG_IPI, &sigact, NULL);
+
+    pthread_sigmask(SIG_BLOCK, NULL, &set);
+    sigdelset(&set, SIG_IPI);
+
+    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
+    cpu->vcpu_dirty = 1;
+    assert_hvf_ok(r);
+
+    return hvf_arch_init_vcpu(cpu);
+}
+
 /*
  * The HVF-specific vCPU thread function. This one should only run when the host
  * CPU supports the VMX "unrestricted guest" feature.
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ static bool ept_emulation_fault(hvf_slot *slot, uint64_t gpa, uint64_t ept_qual)
     return false;
 }
 
-void hvf_vcpu_destroy(CPUState *cpu)
+void hvf_arch_vcpu_destroy(CPUState *cpu)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
     CPUX86State *env = &x86_cpu->env;
 
-    hv_return_t ret = hv_vcpu_destroy((hv_vcpuid_t)cpu->hvf_fd);
     g_free(env->hvf_mmio_buf);
-    assert_hvf_ok(ret);
 }
 
 static void init_tsc_freq(CPUX86State *env)
@@ -XXX,XX +XXX,XX @@ static inline bool apic_bus_freq_is_known(CPUX86State *env)
     return env->apic_bus_freq != 0;
 }
 
-int hvf_init_vcpu(CPUState *cpu)
+int hvf_arch_init_vcpu(CPUState *cpu)
 {
-
     X86CPU *x86cpu = X86_CPU(cpu);
     CPUX86State *env = &x86cpu->env;
-    int r;
-
-    /* init cpu signals */
-    sigset_t set;
-    struct sigaction sigact;
-
-    memset(&sigact, 0, sizeof(sigact));
-    sigact.sa_handler = dummy_signal;
-    sigaction(SIG_IPI, &sigact, NULL);
-
-    pthread_sigmask(SIG_BLOCK, NULL, &set);
-    sigdelset(&set, SIG_IPI);
 
     init_emu();
     init_decoder();
@@ -XXX,XX +XXX,XX @@ int hvf_init_vcpu(CPUState *cpu)
         }
     }
 
-    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
-    cpu->vcpu_dirty = 1;
-    assert_hvf_ok(r);
-
     if (hv_vmx_read_capability(HV_VMX_CAP_PINBASED,
         &hvf_state->hvf_caps->vmx_cap_pinbased)) {
         abort();
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

There is no reason to call the hvf specific hvf_cpu_synchronize_state()
when we can just use the generic cpu_synchronize_state() instead. This
allows us to have less dependency on internal function definitions and
allows us to make hvf_cpu_synchronize_state() static.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-9-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h | 1 -
 accel/hvf/hvf-accel-ops.c | 2 +-
 target/i386/hvf/x86hvf.c  | 9 ++++-----
 3 files changed, 5 insertions(+), 7 deletions(-)
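
Note on the call path: the sketch below illustrates why callers can
use a generic entry point while the accelerator keeps its own
implementation private. It assumes the generic function dispatches
through a per-accelerator hook table; that table, DemoCPU and all
demo_* names are hypothetical and written for illustration only. The
lazy fetch guarded by vcpu_dirty follows
do_hvf_cpu_synchronize_state() from earlier in this series.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoCPU {
    bool vcpu_dirty;    /* true: the in-memory register copy is current */
    int fetch_count;    /* how often registers were fetched */
} DemoCPU;

/* Hypothetical per-accelerator hook table. */
typedef struct DemoAccelOps {
    void (*synchronize_state)(DemoCPU *cpu);
} DemoAccelOps;

/* Accelerator-private implementation; it can stay static. */
static void demo_hvf_synchronize_state(DemoCPU *cpu)
{
    if (!cpu->vcpu_dirty) {
        cpu->fetch_count++;     /* stands in for hvf_get_registers() */
        cpu->vcpu_dirty = true;
    }
}

const DemoAccelOps demo_hvf_ops = {
    .synchronize_state = demo_hvf_synchronize_state,
};

/* Whichever accelerator is active registers its table here. */
const DemoAccelOps *demo_current_accel = &demo_hvf_ops;

/* Generic entry point used by all callers. */
void demo_cpu_synchronize_state(DemoCPU *cpu)
{
    if (demo_current_accel && demo_current_accel->synchronize_state) {
        demo_current_accel->synchronize_state(cpu);
    }
}
```

A second call on an already dirty vCPU does not fetch the registers
again, so callers can synchronize defensively at little cost.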

From: Alexander Graf <agraf@csgraf.de>

The hvf accel synchronize functions are only used as input for local
callback functions, so we can make them static.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-10-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h | 3 ---
 accel/hvf/hvf-accel-ops.c | 6 +++---
 2 files changed, 3 insertions(+), 6 deletions(-)

From: Alexander Graf <agraf@csgraf.de>

We can move the declaration of hvf_vcpu_exec() into our internal
hvf header, which makes hvf-accel-ops.h unnecessary.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-11-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/hvf/hvf-accel-ops.h | 17 -----------------
 include/sysemu/hvf_int.h  |  1 +
 accel/hvf/hvf-accel-ops.c |  2 --
 target/i386/hvf/hvf.c     |  2 --
 4 files changed, 1 insertion(+), 21 deletions(-)
 delete mode 100644 accel/hvf/hvf-accel-ops.h

diff --git a/accel/hvf/hvf-accel-ops.h b/accel/hvf/hvf-accel-ops.h
deleted file mode 100644
index XXXXXXX..XXXXXXX
--- a/accel/hvf/hvf-accel-ops.h
+++ /dev/null
@@ -XXX,XX +XXX,XX @@
-/*
- * Accelerator CPUS Interface
- *
- * Copyright 2020 SUSE LLC
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- */
-
-#ifndef HVF_CPUS_H
-#define HVF_CPUS_H
-
-#include "sysemu/cpus.h"
-
-int hvf_vcpu_exec(CPUState *);
-
-#endif /* HVF_CPUS_H */
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ extern HVFState *hvf_state;
 void assert_hvf_ok(hv_return_t ret);
 int hvf_arch_init_vcpu(CPUState *cpu);
 void hvf_arch_vcpu_destroy(CPUState *cpu);
+int hvf_vcpu_exec(CPUState *);
 hvf_slot *hvf_find_overlap_slot(uint64_t, uint64_t);
 int hvf_put_registers(CPUState *);
 int hvf_get_registers(CPUState *);
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/runstate.h"
 #include "qemu/guest-random.h"
 
-#include "hvf-accel-ops.h"
-
 HVFState *hvf_state;
 
 /* Memory slots */
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/accel.h"
 #include "target/i386/cpu.h"
 
-#include "hvf-accel-ops.h"
-
 void vmx_update_tpr(CPUState *cpu)
 {
     /* TODO: need integrate APIC handling */
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

We will need more than a single field for hvf going forward. To keep
the global vcpu struct uncluttered, let's allocate a special hvf vcpu
struct, similar to how hax does it.
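
As a rough illustration of the pattern (not the code in this patch), the
sketch below uses hypothetical demo_* names and plain calloc()/free()
instead of the glib allocators. The generic CPU struct holds only a
pointer, and the accelerator owns the struct behind it, so new fields can
be added later without touching the generic struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the accelerator-private per-vCPU state. */
struct demo_hvf_vcpu_state {
    int fd;                             /* vCPU handle; the only field so far */
};

/* Hypothetical stand-in for the generic CPU struct. */
struct demo_cpu {
    struct demo_hvf_vcpu_state *hvf;    /* generic code only stores the pointer */
};

/* Allocate the zeroed private state and record the vCPU handle. */
static int demo_vcpu_init(struct demo_cpu *cpu, int fd)
{
    cpu->hvf = calloc(1, sizeof(*cpu->hvf));
    if (!cpu->hvf) {
        return -1;
    }
    cpu->hvf->fd = fd;
    return 0;
}

/* Release the private state and clear the pointer to avoid reuse. */
static void demo_vcpu_destroy(struct demo_cpu *cpu)
{
    free(cpu->hvf);
    cpu->hvf = NULL;
}
```

Callers then reach the handle through cpu->hvf->fd, which is the access
pattern the mechanical hvf_fd replacements in this patch follow.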

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-12-agraf@csgraf.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/core/cpu.h       |   3 +-
 include/sysemu/hvf_int.h    |   4 +
 target/i386/hvf/vmx.h       |  24 +++--
 accel/hvf/hvf-accel-ops.c   |   8 +-
 target/i386/hvf/hvf.c       | 104 +++++++++---------
 target/i386/hvf/x86.c       |  28 ++---
 target/i386/hvf/x86_descr.c |  26 ++---
 target/i386/hvf/x86_emu.c   |  62 +++++------
 target/i386/hvf/x86_mmu.c   |   4 +-
 target/i386/hvf/x86_task.c  |  12 +--
 target/i386/hvf/x86hvf.c    | 210 ++++++++++++++++++------------------
 11 files changed, 248 insertions(+), 237 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -XXX,XX +XXX,XX @@ struct KVMState;
 struct kvm_run;
 
 struct hax_vcpu_state;
+struct hvf_vcpu_state;
 
 #define TB_JMP_CACHE_BITS 12
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
@@ -XXX,XX +XXX,XX @@ struct CPUState {
 
     struct hax_vcpu_state *hax_vcpu;
 
-    int hvf_fd;
+    struct hvf_vcpu_state *hvf;
 
     /* track IOMMUs whose translations we've cached in the TCG TLB */
     GArray *iommu_notifiers;
diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/hvf_int.h
+++ b/include/sysemu/hvf_int.h
@@ -XXX,XX +XXX,XX @@ struct HVFState {
 };
 extern HVFState *hvf_state;
 
+struct hvf_vcpu_state {
+    int fd;
+};
+
 void assert_hvf_ok(hv_return_t ret);
 int hvf_arch_init_vcpu(CPUState *cpu);
 void hvf_arch_vcpu_destroy(CPUState *cpu);
diff --git a/target/i386/hvf/vmx.h b/target/i386/hvf/vmx.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/vmx.h
+++ b/target/i386/hvf/vmx.h
@@ -XXX,XX +XXX,XX @@
 #include "vmcs.h"
 #include "cpu.h"
 #include "x86.h"
+#include "sysemu/hvf.h"
+#include "sysemu/hvf_int.h"
 
 #include "exec/address-spaces.h"
 
@@ -XXX,XX +XXX,XX @@ static inline void macvm_set_rip(CPUState *cpu, uint64_t rip)
     uint64_t val;
 
     /* BUG, should take considering overlap.. */
-    wreg(cpu->hvf_fd, HV_X86_RIP, rip);
+    wreg(cpu->hvf->fd, HV_X86_RIP, rip);
     env->eip = rip;
 
     /* after moving forward in rip, we need to clean INTERRUPTABILITY */
-   val = rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
+   val = rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
    if (val & (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
                VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
         env->hflags &= ~HF_INHIBIT_IRQ_MASK;
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY,
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY,
                val & ~(VMCS_INTERRUPTIBILITY_STI_BLOCKING |
                VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING));
    }
@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_blocking(CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     env->hflags2 &= ~HF2_NMI_MASK;
-    uint32_t gi = (uint32_t) rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
+    uint32_t gi = (uint32_t) rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
     gi &= ~VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
-    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
+    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
 }
 
 static inline void vmx_set_nmi_blocking(CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ static inline void vmx_set_nmi_blocking(CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     env->hflags2 |= HF2_NMI_MASK;
-    uint32_t gi = (uint32_t)rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY);
+    uint32_t gi = (uint32_t)rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY);
     gi |= VMCS_INTERRUPTIBILITY_NMI_BLOCKING;
-    wvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
+    wvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY, gi);
 }
 
 static inline void vmx_set_nmi_window_exiting(CPUState *cpu)
 {
     uint64_t val;
-    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
+    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
           VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
 
 }
@@ -XXX,XX +XXX,XX @@ static inline void vmx_clear_nmi_window_exiting(CPUState *cpu)
 {
 
     uint64_t val;
-    val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
+    val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
           ~VMCS_PRI_PROC_BASED_CTLS_NMI_WINDOW_EXITING);
 }
 
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ type_init(hvf_type_init);
 
 static void hvf_vcpu_destroy(CPUState *cpu)
 {
-    hv_return_t ret = hv_vcpu_destroy(cpu->hvf_fd);
+    hv_return_t ret = hv_vcpu_destroy(cpu->hvf->fd);
     assert_hvf_ok(ret);
 
     hvf_arch_vcpu_destroy(cpu);
+    g_free(cpu->hvf);
+    cpu->hvf = NULL;
 }
 
 static int hvf_init_vcpu(CPUState *cpu)
 {
     int r;
 
+    cpu->hvf = g_malloc0(sizeof(*cpu->hvf));
+
     /* init cpu signals */
     sigset_t set;
     struct sigaction sigact;
@@ -XXX,XX +XXX,XX @@ static int hvf_init_vcpu(CPUState *cpu)
     pthread_sigmask(SIG_BLOCK, NULL, &set);
     sigdelset(&set, SIG_IPI);
 
-    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf_fd, HV_VCPU_DEFAULT);
+    r = hv_vcpu_create((hv_vcpuid_t *)&cpu->hvf->fd, HV_VCPU_DEFAULT);
     cpu->vcpu_dirty = 1;
     assert_hvf_ok(r);
 
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
     int tpr = cpu_get_apic_tpr(x86_cpu->apic_state) << 4;
     int irr = apic_get_highest_priority_irr(x86_cpu->apic_state);
 
-    wreg(cpu->hvf_fd, HV_X86_TPR, tpr);
+    wreg(cpu->hvf->fd, HV_X86_TPR, tpr);
     if (irr == -1) {
-        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
+        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
     } else {
-        wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
+        wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, (irr > tpr) ? tpr >> 4 :
               irr >> 4);
     }
 }
@@ -XXX,XX +XXX,XX @@ void vmx_update_tpr(CPUState *cpu)
 static void update_apic_tpr(CPUState *cpu)
 {
     X86CPU *x86_cpu = X86_CPU(cpu);
-    int tpr = rreg(cpu->hvf_fd, HV_X86_TPR) >> 4;
+    int tpr = rreg(cpu->hvf->fd, HV_X86_TPR) >> 4;
     cpu_set_apic_tpr(x86_cpu->apic_state, tpr);
 }
 
@@ -XXX,XX +XXX,XX @@ int hvf_arch_init_vcpu(CPUState *cpu)
     }
 
     /* set VMCS control fields */
-    wvmcs(cpu->hvf_fd, VMCS_PIN_BASED_CTLS,
+    wvmcs(cpu->hvf->fd, VMCS_PIN_BASED_CTLS,
           cap2ctrl(hvf_state->hvf_caps->vmx_cap_pinbased,
           VMCS_PIN_BASED_CTLS_EXTINT |
           VMCS_PIN_BASED_CTLS_NMI |
           VMCS_PIN_BASED_CTLS_VNMI));
-    wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS,
+    wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS,
           cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased,
           VMCS_PRI_PROC_BASED_CTLS_HLT |
           VMCS_PRI_PROC_BASED_CTLS_MWAIT |
           VMCS_PRI_PROC_BASED_CTLS_TSC_OFFSET |
           VMCS_PRI_PROC_BASED_CTLS_TPR_SHADOW) |
           VMCS_PRI_PROC_BASED_CTLS_SEC_CONTROL);
-    wvmcs(cpu->hvf_fd, VMCS_SEC_PROC_BASED_CTLS,
+    wvmcs(cpu->hvf->fd, VMCS_SEC_PROC_BASED_CTLS,
           cap2ctrl(hvf_state->hvf_caps->vmx_cap_procbased2,
                    VMCS_PRI_PROC_BASED2_CTLS_APIC_ACCESSES));
 
-    wvmcs(cpu->hvf_fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
+    wvmcs(cpu->hvf->fd, VMCS_ENTRY_CTLS, cap2ctrl(hvf_state->hvf_caps->vmx_cap_entry,
           0));
-    wvmcs(cpu->hvf_fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
+    wvmcs(cpu->hvf->fd, VMCS_EXCEPTION_BITMAP, 0); /* Double fault */
 
-    wvmcs(cpu->hvf_fd, VMCS_TPR_THRESHOLD, 0);
+    wvmcs(cpu->hvf->fd, VMCS_TPR_THRESHOLD, 0);
 
     x86cpu = X86_CPU(cpu);
     x86cpu->env.xsave_buf = qemu_memalign(4096, 4096);
 
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_STAR, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_LSTAR, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_CSTAR, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FMASK, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_FSBASE, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_GSBASE, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_KERNELGSBASE, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_TSC_AUX, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_TSC, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_CS, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_EIP, 1);
-    hv_vcpu_enable_native_msr(cpu->hvf_fd, MSR_IA32_SYSENTER_ESP, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_STAR, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_LSTAR, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_CSTAR, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FMASK, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_FSBASE, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_GSBASE, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_KERNELGSBASE, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_TSC_AUX, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_TSC, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_CS, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_EIP, 1);
+    hv_vcpu_enable_native_msr(cpu->hvf->fd, MSR_IA32_SYSENTER_ESP, 1);
 
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ static void hvf_store_events(CPUState *cpu, uint32_t ins_len, uint64_t idtvec_in
         }
         if (idtvec_info & VMCS_IDT_VEC_ERRCODE_VALID) {
             env->has_error_code = true;
-            env->error_code = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_ERROR);
+            env->error_code = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_ERROR);
         }
     }
-    if ((rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
+    if ((rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
         VMCS_INTERRUPTIBILITY_NMI_BLOCKING)) {
         env->hflags2 |= HF2_NMI_MASK;
     } else {
         env->hflags2 &= ~HF2_NMI_MASK;
     }
-    if (rvmcs(cpu->hvf_fd, VMCS_GUEST_INTERRUPTIBILITY) &
+    if (rvmcs(cpu->hvf->fd, VMCS_GUEST_INTERRUPTIBILITY) &
          (VMCS_INTERRUPTIBILITY_STI_BLOCKING |
          VMCS_INTERRUPTIBILITY_MOVSS_BLOCKING)) {
         env->hflags |= HF_INHIBIT_IRQ_MASK;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             return EXCP_HLT;
         }
 
-        hv_return_t r  = hv_vcpu_run(cpu->hvf_fd);
+        hv_return_t r  = hv_vcpu_run(cpu->hvf->fd);
         assert_hvf_ok(r);
 
         /* handle VMEXIT */
-        uint64_t exit_reason = rvmcs(cpu->hvf_fd, VMCS_EXIT_REASON);
-        uint64_t exit_qual = rvmcs(cpu->hvf_fd, VMCS_EXIT_QUALIFICATION);
-        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf_fd,
+        uint64_t exit_reason = rvmcs(cpu->hvf->fd, VMCS_EXIT_REASON);
+        uint64_t exit_qual = rvmcs(cpu->hvf->fd, VMCS_EXIT_QUALIFICATION);
+        uint32_t ins_len = (uint32_t)rvmcs(cpu->hvf->fd,
                                            VMCS_EXIT_INSTRUCTION_LENGTH);
 
-        uint64_t idtvec_info = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
+        uint64_t idtvec_info = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
 
         hvf_store_events(cpu, ins_len, idtvec_info);
-        rip = rreg(cpu->hvf_fd, HV_X86_RIP);
-        env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
+        rip = rreg(cpu->hvf->fd, HV_X86_RIP);
+        env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
 
         qemu_mutex_lock_iothread();
 
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
         case EXIT_REASON_EPT_FAULT:
         {
             hvf_slot *slot;
-            uint64_t gpa = rvmcs(cpu->hvf_fd, VMCS_GUEST_PHYSICAL_ADDRESS);
+            uint64_t gpa = rvmcs(cpu->hvf->fd, VMCS_GUEST_PHYSICAL_ADDRESS);
 
             if (((idtvec_info & VMCS_IDT_VEC_VALID) == 0) &&
                 ((exit_qual & EXIT_QUAL_NMIUDTI) != 0)) {
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
                 store_regs(cpu);
                 break;
             } else if (!string && !in) {
-                RAX(env) = rreg(cpu->hvf_fd, HV_X86_RAX);
+                RAX(env) = rreg(cpu->hvf->fd, HV_X86_RAX);
                 hvf_handle_io(env, port, &RAX(env), 1, size, 1);
                 macvm_set_rip(cpu, rip + ins_len);
                 break;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             break;
         }
         case EXIT_REASON_CPUID: {
-            uint32_t rax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
-            uint32_t rbx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RBX);
-            uint32_t rcx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
-            uint32_t rdx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
+            uint32_t rax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
+            uint32_t rbx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RBX);
+            uint32_t rcx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
+            uint32_t rdx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
 
             if (rax == 1) {
                 /* CPUID1.ecx.OSXSAVE needs to know CR4 */
-                env->cr[4] = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
+                env->cr[4] = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
             }
             hvf_cpu_x86_cpuid(env, rax, rcx, &rax, &rbx, &rcx, &rdx);
 
-            wreg(cpu->hvf_fd, HV_X86_RAX, rax);
-            wreg(cpu->hvf_fd, HV_X86_RBX, rbx);
-            wreg(cpu->hvf_fd, HV_X86_RCX, rcx);
-            wreg(cpu->hvf_fd, HV_X86_RDX, rdx);
+            wreg(cpu->hvf->fd, HV_X86_RAX, rax);
+            wreg(cpu->hvf->fd, HV_X86_RBX, rbx);
+            wreg(cpu->hvf->fd, HV_X86_RCX, rcx);
+            wreg(cpu->hvf->fd, HV_X86_RDX, rdx);
 
             macvm_set_rip(cpu, rip + ins_len);
             break;
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
         case EXIT_REASON_XSETBV: {
             X86CPU *x86_cpu = X86_CPU(cpu);
             CPUX86State *env = &x86_cpu->env;
-            uint32_t eax = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RAX);
-            uint32_t ecx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RCX);
-            uint32_t edx = (uint32_t)rreg(cpu->hvf_fd, HV_X86_RDX);
+            uint32_t eax = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RAX);
+            uint32_t ecx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RCX);
+            uint32_t edx = (uint32_t)rreg(cpu->hvf->fd, HV_X86_RDX);
 
             if (ecx) {
                 macvm_set_rip(cpu, rip + ins_len);
                 break;
             }
             env->xcr0 = ((uint64_t)edx << 32) | eax;
-            wreg(cpu->hvf_fd, HV_X86_XCR0, env->xcr0 | 1);
+            wreg(cpu->hvf->fd, HV_X86_XCR0, env->xcr0 | 1);
             macvm_set_rip(cpu, rip + ins_len);
             break;
         }
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
 
             switch (cr) {
             case 0x0: {
-                macvm_set_cr0(cpu->hvf_fd, RRX(env, reg));
+                macvm_set_cr0(cpu->hvf->fd, RRX(env, reg));
                 break;
             }
             case 4: {
-                macvm_set_cr4(cpu->hvf_fd, RRX(env, reg));
+                macvm_set_cr4(cpu->hvf->fd, RRX(env, reg));
                 break;
             }
             case 8: {
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             break;
         }
         case EXIT_REASON_TASK_SWITCH: {
-            uint64_t vinfo = rvmcs(cpu->hvf_fd, VMCS_IDT_VECTORING_INFO);
+            uint64_t vinfo = rvmcs(cpu->hvf->fd, VMCS_IDT_VECTORING_INFO);
             x68_segment_selector sel = {.sel = exit_qual & 0xffff};
             vmx_handle_task_switch(cpu, sel, (exit_qual >> 30) & 0x3,
              vinfo & VMCS_INTR_VALID, vinfo & VECTORING_INFO_VECTOR_MASK, vinfo
@@ -XXX,XX +XXX,XX @@ int hvf_vcpu_exec(CPUState *cpu)
             break;
         }
         case EXIT_REASON_RDPMC:
-            wreg(cpu->hvf_fd, HV_X86_RAX, 0);
-            wreg(cpu->hvf_fd, HV_X86_RDX, 0);
+            wreg(cpu->hvf->fd, HV_X86_RAX, 0);
+            wreg(cpu->hvf->fd, HV_X86_RDX, 0);
             macvm_set_rip(cpu, rip + ins_len);
             break;
         case VMX_REASON_VMCALL:
diff --git a/target/i386/hvf/x86.c b/target/i386/hvf/x86.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86.c
+++ b/target/i386/hvf/x86.c
@@ -XXX,XX +XXX,XX @@ bool x86_read_segment_descriptor(struct CPUState *cpu,
     }
 
     if (GDT_SEL == sel.ti) {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
     } else {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
     }
 
     if (sel.index * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
     uint32_t limit;
     
     if (GDT_SEL == sel.ti) {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
     } else {
-        base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_BASE);
-        limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_LDTR_LIMIT);
+        base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_BASE);
+        limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_LDTR_LIMIT);
     }
     
     if (sel.index * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_write_segment_descriptor(struct CPUState *cpu,
 bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
                         int gate)
 {
-    target_ulong base  = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_BASE);
-    uint32_t limit = rvmcs(cpu->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
+    target_ulong base  = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_BASE);
+    uint32_t limit = rvmcs(cpu->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
 
     memset(idt_desc, 0, sizeof(*idt_desc));
     if (gate * 8 >= limit) {
@@ -XXX,XX +XXX,XX @@ bool x86_read_call_gate(struct CPUState *cpu, struct x86_call_gate *idt_desc,
 
 bool x86_is_protected(struct CPUState *cpu)
 {
-    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
+    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
     return cr0 & CR0_PE;
 }
 
@@ -XXX,XX +XXX,XX @@ bool x86_is_v8086(struct CPUState *cpu)
 
 bool x86_is_long_mode(struct CPUState *cpu)
 {
-    return rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
+    return rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER) & MSR_EFER_LMA;
 }
 
 bool x86_is_long64_mode(struct CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ bool x86_is_long64_mode(struct CPUState *cpu)
 
 bool x86_is_paging_mode(struct CPUState *cpu)
 {
-    uint64_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
+    uint64_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
     return cr0 & CR0_PG;
 }
 
 bool x86_is_pae_enabled(struct CPUState *cpu)
 {
-    uint64_t cr4 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR4);
+    uint64_t cr4 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR4);
     return cr4 & CR4_PAE;
 }
 
diff --git a/target/i386/hvf/x86_descr.c b/target/i386/hvf/x86_descr.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_descr.c
+++ b/target/i386/hvf/x86_descr.c
@@ -XXX,XX +XXX,XX @@ static const struct vmx_segment_field {
 
 uint32_t vmx_read_segment_limit(CPUState *cpu, X86Seg seg)
 {
-    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
+    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
 }
 
 uint32_t vmx_read_segment_ar(CPUState *cpu, X86Seg seg)
 {
-    return (uint32_t)rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
+    return (uint32_t)rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
 }
 
 uint64_t vmx_read_segment_base(CPUState *cpu, X86Seg seg)
 {
-    return rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
+    return rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
 }
 
 x68_segment_selector vmx_read_segment_selector(CPUState *cpu, X86Seg seg)
 {
     x68_segment_selector sel;
-    sel.sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
+    sel.sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
     return sel;
 }
 
 void vmx_write_segment_selector(struct CPUState *cpu, x68_segment_selector selector, X86Seg seg)
 {
-    wvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector, selector.sel);
+    wvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector, selector.sel);
 }
 
 void vmx_read_segment_descriptor(struct CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
 {
-    desc->sel = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].selector);
-    desc->base = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].base);
-    desc->limit = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].limit);
-    desc->ar = rvmcs(cpu->hvf_fd, vmx_segment_fields[seg].ar_bytes);
+    desc->sel = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].selector);
+    desc->base = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].base);
+    desc->limit = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].limit);
+    desc->ar = rvmcs(cpu->hvf->fd, vmx_segment_fields[seg].ar_bytes);
 }
 
 void vmx_write_segment_descriptor(CPUState *cpu, struct vmx_segment *desc, X86Seg seg)
 {
     const struct vmx_segment_field *sf = &vmx_segment_fields[seg];
 
-    wvmcs(cpu->hvf_fd, sf->base, desc->base);
-    wvmcs(cpu->hvf_fd, sf->limit, desc->limit);
-    wvmcs(cpu->hvf_fd, sf->selector, desc->sel);
-    wvmcs(cpu->hvf_fd, sf->ar_bytes, desc->ar);
+    wvmcs(cpu->hvf->fd, sf->base, desc->base);
+    wvmcs(cpu->hvf->fd, sf->limit, desc->limit);
+    wvmcs(cpu->hvf->fd, sf->selector, desc->sel);
+    wvmcs(cpu->hvf->fd, sf->ar_bytes, desc->ar);
 }
 
 void x86_segment_descriptor_to_vmx(struct CPUState *cpu, x68_segment_selector selector, struct x86_segment_descriptor *desc, struct vmx_segment *vmx_desc)
diff --git a/target/i386/hvf/x86_emu.c b/target/i386/hvf/x86_emu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_emu.c
+++ b/target/i386/hvf/x86_emu.c
@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
 
     switch (msr) {
     case MSR_IA32_TSC:
-        val = rdtscp() + rvmcs(cpu->hvf_fd, VMCS_TSC_OFFSET);
+        val = rdtscp() + rvmcs(cpu->hvf->fd, VMCS_TSC_OFFSET);
         break;
     case MSR_IA32_APICBASE:
         val = cpu_get_apic_base(X86_CPU(cpu)->apic_state);
@@ -XXX,XX +XXX,XX @@ void simulate_rdmsr(struct CPUState *cpu)
         val = x86_cpu->ucode_rev;
         break;
     case MSR_EFER:
-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER);
+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER);
         break;
     case MSR_FSBASE:
-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE);
+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE);
         break;
     case MSR_GSBASE:
-        val = rvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE);
+        val = rvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE);
         break;
     case MSR_KERNELGSBASE:
-        val = rvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE);
+        val = rvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE);
         break;
     case MSR_STAR:
         abort();
@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
         cpu_set_apic_base(X86_CPU(cpu)->apic_state, data);
         break;
     case MSR_FSBASE:
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_FS_BASE, data);
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_FS_BASE, data);
         break;
     case MSR_GSBASE:
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_GS_BASE, data);
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_GS_BASE, data);
         break;
     case MSR_KERNELGSBASE:
-        wvmcs(cpu->hvf_fd, VMCS_HOST_FS_BASE, data);
+        wvmcs(cpu->hvf->fd, VMCS_HOST_FS_BASE, data);
         break;
     case MSR_STAR:
         abort();
@@ -XXX,XX +XXX,XX @@ void simulate_wrmsr(struct CPUState *cpu)
         break;
     case MSR_EFER:
         /*printf("new efer %llx\n", EFER(cpu));*/
-        wvmcs(cpu->hvf_fd, VMCS_GUEST_IA32_EFER, data);
+        wvmcs(cpu->hvf->fd, VMCS_GUEST_IA32_EFER, data);
         if (data & MSR_EFER_NXE) {
-            hv_vcpu_invalidate_tlb(cpu->hvf_fd);
+            hv_vcpu_invalidate_tlb(cpu->hvf->fd);
         }
         break;
     case MSR_MTRRphysBase(0):
@@ -XXX,XX +XXX,XX @@ void load_regs(struct CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     int i = 0;
-    RRX(env, R_EAX) = rreg(cpu->hvf_fd, HV_X86_RAX);
-    RRX(env, R_EBX) = rreg(cpu->hvf_fd, HV_X86_RBX);
-    RRX(env, R_ECX) = rreg(cpu->hvf_fd, HV_X86_RCX);
-    RRX(env, R_EDX) = rreg(cpu->hvf_fd, HV_X86_RDX);
-    RRX(env, R_ESI) = rreg(cpu->hvf_fd, HV_X86_RSI);
-    RRX(env, R_EDI) = rreg(cpu->hvf_fd, HV_X86_RDI);
-    RRX(env, R_ESP) = rreg(cpu->hvf_fd, HV_X86_RSP);
-    RRX(env, R_EBP) = rreg(cpu->hvf_fd, HV_X86_RBP);
+    RRX(env, R_EAX) = rreg(cpu->hvf->fd, HV_X86_RAX);
+    RRX(env, R_EBX) = rreg(cpu->hvf->fd, HV_X86_RBX);
+    RRX(env, R_ECX) = rreg(cpu->hvf->fd, HV_X86_RCX);
+    RRX(env, R_EDX) = rreg(cpu->hvf->fd, HV_X86_RDX);
+    RRX(env, R_ESI) = rreg(cpu->hvf->fd, HV_X86_RSI);
+    RRX(env, R_EDI) = rreg(cpu->hvf->fd, HV_X86_RDI);
+    RRX(env, R_ESP) = rreg(cpu->hvf->fd, HV_X86_RSP);
+    RRX(env, R_EBP) = rreg(cpu->hvf->fd, HV_X86_RBP);
     for (i = 8; i < 16; i++) {
-        RRX(env, i) = rreg(cpu->hvf_fd, HV_X86_RAX + i);
+        RRX(env, i) = rreg(cpu->hvf->fd, HV_X86_RAX + i);
     }
 
-    env->eflags = rreg(cpu->hvf_fd, HV_X86_RFLAGS);
+    env->eflags = rreg(cpu->hvf->fd, HV_X86_RFLAGS);
     rflags_to_lflags(env);
-    env->eip = rreg(cpu->hvf_fd, HV_X86_RIP);
+    env->eip = rreg(cpu->hvf->fd, HV_X86_RIP);
 }
 
 void store_regs(struct CPUState *cpu)
@@ -XXX,XX +XXX,XX @@ void store_regs(struct CPUState *cpu)
     CPUX86State *env = &x86_cpu->env;
 
     int i = 0;
-    wreg(cpu->hvf_fd, HV_X86_RAX, RAX(env));
-    wreg(cpu->hvf_fd, HV_X86_RBX, RBX(env));
-    wreg(cpu->hvf_fd, HV_X86_RCX, RCX(env));
-    wreg(cpu->hvf_fd, HV_X86_RDX, RDX(env));
-    wreg(cpu->hvf_fd, HV_X86_RSI, RSI(env));
-    wreg(cpu->hvf_fd, HV_X86_RDI, RDI(env));
-    wreg(cpu->hvf_fd, HV_X86_RBP, RBP(env));
-    wreg(cpu->hvf_fd, HV_X86_RSP, RSP(env));
+    wreg(cpu->hvf->fd, HV_X86_RAX, RAX(env));
+    wreg(cpu->hvf->fd, HV_X86_RBX, RBX(env));
+    wreg(cpu->hvf->fd, HV_X86_RCX, RCX(env));
+    wreg(cpu->hvf->fd, HV_X86_RDX, RDX(env));
+    wreg(cpu->hvf->fd, HV_X86_RSI, RSI(env));
+    wreg(cpu->hvf->fd, HV_X86_RDI, RDI(env));
+    wreg(cpu->hvf->fd, HV_X86_RBP, RBP(env));
+    wreg(cpu->hvf->fd, HV_X86_RSP, RSP(env));
     for (i = 8; i < 16; i++) {
-        wreg(cpu->hvf_fd, HV_X86_RAX + i, RRX(env, i));
+        wreg(cpu->hvf->fd, HV_X86_RAX + i, RRX(env, i));
     }
 
     lflags_to_rflags(env);
-    wreg(cpu->hvf_fd, HV_X86_RFLAGS, env->eflags);
+    wreg(cpu->hvf->fd, HV_X86_RFLAGS, env->eflags);
     macvm_set_rip(cpu, env->eip);
 }
 
diff --git a/target/i386/hvf/x86_mmu.c b/target/i386/hvf/x86_mmu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_mmu.c
+++ b/target/i386/hvf/x86_mmu.c
@@ -XXX,XX +XXX,XX @@ static bool test_pt_entry(struct CPUState *cpu, struct gpt_translation *pt,
         pt->err_code |= MMU_PAGE_PT;
     }
 
-    uint32_t cr0 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0);
+    uint32_t cr0 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0);
     /* check protection */
     if (cr0 & CR0_WP) {
         if (pt->write_access && !pte_write_access(pte)) {
@@ -XXX,XX +XXX,XX @@ static bool walk_gpt(struct CPUState *cpu, target_ulong addr, int err_code,
 {
     int top_level, level;
     bool is_large = false;
-    target_ulong cr3 = rvmcs(cpu->hvf_fd, VMCS_GUEST_CR3);
+    target_ulong cr3 = rvmcs(cpu->hvf->fd, VMCS_GUEST_CR3);
     uint64_t page_mask = pae ? PAE_PTE_PAGE_MASK : LEGACY_PTE_PAGE_MASK;
     
     memset(pt, 0, sizeof(*pt));
diff --git a/target/i386/hvf/x86_task.c b/target/i386/hvf/x86_task.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86_task.c
+++ b/target/i386/hvf/x86_task.c
@@ -XXX,XX +XXX,XX @@ static void load_state_from_tss32(CPUState *cpu, struct x86_tss_segment32 *tss)
     X86CPU *x86_cpu = X86_CPU(cpu);
     CPUX86State *env = &x86_cpu->env;
 
-    wvmcs(cpu->hvf_fd, VMCS_GUEST_CR3, tss->cr3);
+    wvmcs(cpu->hvf->fd, VMCS_GUEST_CR3, tss->cr3);
 
     env->eip = tss->eip;
     env->eflags = tss->eflags | 2;
@@ -XXX,XX +XXX,XX @@ static int task_switch_32(CPUState *cpu, x68_segment_selector tss_sel, x68_segme
 
 void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int reason, bool gate_valid, uint8_t gate, uint64_t gate_type)
 {
-    uint64_t rip = rreg(cpu->hvf_fd, HV_X86_RIP);
+    uint64_t rip = rreg(cpu->hvf->fd, HV_X86_RIP);
     if (!gate_valid || (gate_type != VMCS_INTR_T_HWEXCEPTION &&
                         gate_type != VMCS_INTR_T_HWINTR &&
                         gate_type != VMCS_INTR_T_NMI)) {
-        int ins_len = rvmcs(cpu->hvf_fd, VMCS_EXIT_INSTRUCTION_LENGTH);
+        int ins_len = rvmcs(cpu->hvf->fd, VMCS_EXIT_INSTRUCTION_LENGTH);
         macvm_set_rip(cpu, rip + ins_len);
         return;
     }
@@ -XXX,XX +XXX,XX @@ void vmx_handle_task_switch(CPUState *cpu, x68_segment_selector tss_sel, int rea
         //ret = task_switch_16(cpu, tss_sel, old_tss_sel, old_tss_base, &next_tss_desc);
         VM_PANIC("task_switch_16");
 
-    macvm_set_cr0(cpu->hvf_fd, rvmcs(cpu->hvf_fd, VMCS_GUEST_CR0) | CR0_TS);
+    macvm_set_cr0(cpu->hvf->fd, rvmcs(cpu->hvf->fd, VMCS_GUEST_CR0) | CR0_TS);
     x86_segment_descriptor_to_vmx(cpu, tss_sel, &next_tss_desc, &vmx_seg);
     vmx_write_segment_descriptor(cpu, &vmx_seg, R_TR);
 
     store_regs(cpu);
 
-    hv_vcpu_invalidate_tlb(cpu->hvf_fd);
-    hv_vcpu_flush(cpu->hvf_fd);
+    hv_vcpu_invalidate_tlb(cpu->hvf->fd);
+    hv_vcpu_flush(cpu->hvf->fd);
 }
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@ void hvf_put_xsave(CPUState *cpu_state)
 
     x86_cpu_xsave_all_areas(X86_CPU(cpu_state), xsave);
 
-    if (hv_vcpu_write_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
+    if (hv_vcpu_write_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
         abort();
     }
 }
@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
     CPUX86State *env = &X86_CPU(cpu_state)->env;
     struct vmx_segment seg;
     
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT, env->idt.limit);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE, env->idt.base);
 
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT, env->gdt.limit);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE, env->gdt.base);
 
-    /* wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR2, env->cr[2]); */
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3, env->cr[3]);
+    /* wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR2, env->cr[2]); */
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3, env->cr[3]);
     vmx_update_tpr(cpu_state);
-    wvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER, env->efer);
+    wvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER, env->efer);
 
-    macvm_set_cr4(cpu_state->hvf_fd, env->cr[4]);
-    macvm_set_cr0(cpu_state->hvf_fd, env->cr[0]);
+    macvm_set_cr4(cpu_state->hvf->fd, env->cr[4]);
+    macvm_set_cr0(cpu_state->hvf->fd, env->cr[0]);
 
     hvf_set_segment(cpu_state, &seg, &env->segs[R_CS], false);
     vmx_write_segment_descriptor(cpu_state, &seg, R_CS);
@@ -XXX,XX +XXX,XX @@ void hvf_put_segments(CPUState *cpu_state)
     hvf_set_segment(cpu_state, &seg, &env->ldt, false);
     vmx_write_segment_descriptor(cpu_state, &seg, R_LDTR);
     
-    hv_vcpu_flush(cpu_state->hvf_fd);
+    hv_vcpu_flush(cpu_state->hvf->fd);
 }
     
 void hvf_put_msrs(CPUState *cpu_state)
 {
     CPUX86State *env = &X86_CPU(cpu_state)->env;
 
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS,
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS,
                       env->sysenter_cs);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP,
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP,
                       env->sysenter_esp);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP,
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP,
                       env->sysenter_eip);
 
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_STAR, env->star);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_STAR, env->star);
 
 #ifdef TARGET_X86_64
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_CSTAR, env->cstar);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, env->kernelgsbase);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FMASK, env->fmask);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_LSTAR, env->lstar);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_CSTAR, env->cstar);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, env->kernelgsbase);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FMASK, env->fmask);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_LSTAR, env->lstar);
 #endif
 
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_GSBASE, env->segs[R_GS].base);
-    hv_vcpu_write_msr(cpu_state->hvf_fd, MSR_FSBASE, env->segs[R_FS].base);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_GSBASE, env->segs[R_GS].base);
+    hv_vcpu_write_msr(cpu_state->hvf->fd, MSR_FSBASE, env->segs[R_FS].base);
 }
 
 
@@ -XXX,XX +XXX,XX @@ void hvf_get_xsave(CPUState *cpu_state)
 
     xsave = X86_CPU(cpu_state)->env.xsave_buf;
 
-    if (hv_vcpu_read_fpstate(cpu_state->hvf_fd, (void*)xsave, 4096)) {
+    if (hv_vcpu_read_fpstate(cpu_state->hvf->fd, (void*)xsave, 4096)) {
         abort();
     }
 
@@ -XXX,XX +XXX,XX @@ void hvf_get_segments(CPUState *cpu_state)
     vmx_read_segment_descriptor(cpu_state, &seg, R_LDTR);
     hvf_get_segment(&env->ldt, &seg);
 
-    env->idt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_LIMIT);
-    env->idt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IDTR_BASE);
-    env->gdt.limit = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_LIMIT);
-    env->gdt.base = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_GDTR_BASE);
+    env->idt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_LIMIT);
+    env->idt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IDTR_BASE);
+    env->gdt.limit = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_LIMIT);
+    env->gdt.base = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_GDTR_BASE);
 
-    env->cr[0] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR0);
+    env->cr[0] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR0);
     env->cr[2] = 0;
-    env->cr[3] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR3);
-    env->cr[4] = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_CR4);
+    env->cr[3] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR3);
+    env->cr[4] = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_CR4);
     
-    env->efer = rvmcs(cpu_state->hvf_fd, VMCS_GUEST_IA32_EFER);
+    env->efer = rvmcs(cpu_state->hvf->fd, VMCS_GUEST_IA32_EFER);
 }
 
 void hvf_get_msrs(CPUState *cpu_state)
@@ -XXX,XX +XXX,XX @@ void hvf_get_msrs(CPUState *cpu_state)
     CPUX86State *env = &X86_CPU(cpu_state)->env;
     uint64_t tmp;
     
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_CS, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_CS, &tmp);
     env->sysenter_cs = tmp;
     
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_ESP, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_ESP, &tmp);
     env->sysenter_esp = tmp;
 
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_SYSENTER_EIP, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_SYSENTER_EIP, &tmp);
     env->sysenter_eip = tmp;
 
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_STAR, &env->star);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_STAR, &env->star);
 
 #ifdef TARGET_X86_64
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_CSTAR, &env->cstar);
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_KERNELGSBASE, &env->kernelgsbase);
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_FMASK, &env->fmask);
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_LSTAR, &env->lstar);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_CSTAR, &env->cstar);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_KERNELGSBASE, &env->kernelgsbase);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_FMASK, &env->fmask);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_LSTAR, &env->lstar);
 #endif
 
-    hv_vcpu_read_msr(cpu_state->hvf_fd, MSR_IA32_APICBASE, &tmp);
+    hv_vcpu_read_msr(cpu_state->hvf->fd, MSR_IA32_APICBASE, &tmp);
     
-    env->tsc = rdtscp() + rvmcs(cpu_state->hvf_fd, VMCS_TSC_OFFSET);
+    env->tsc = rdtscp() + rvmcs(cpu_state->hvf->fd, VMCS_TSC_OFFSET);
 }
 
 int hvf_put_registers(CPUState *cpu_state)
@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
     X86CPU *x86cpu = X86_CPU(cpu_state);
     CPUX86State *env = &x86cpu->env;
 
-    wreg(cpu_state->hvf_fd, HV_X86_RAX, env->regs[R_EAX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RBX, env->regs[R_EBX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RCX, env->regs[R_ECX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RDX, env->regs[R_EDX]);
-    wreg(cpu_state->hvf_fd, HV_X86_RBP, env->regs[R_EBP]);
-    wreg(cpu_state->hvf_fd, HV_X86_RSP, env->regs[R_ESP]);
-    wreg(cpu_state->hvf_fd, HV_X86_RSI, env->regs[R_ESI]);
-    wreg(cpu_state->hvf_fd, HV_X86_RDI, env->regs[R_EDI]);
-    wreg(cpu_state->hvf_fd, HV_X86_R8, env->regs[8]);
-    wreg(cpu_state->hvf_fd, HV_X86_R9, env->regs[9]);
-    wreg(cpu_state->hvf_fd, HV_X86_R10, env->regs[10]);
-    wreg(cpu_state->hvf_fd, HV_X86_R11, env->regs[11]);
-    wreg(cpu_state->hvf_fd, HV_X86_R12, env->regs[12]);
-    wreg(cpu_state->hvf_fd, HV_X86_R13, env->regs[13]);
-    wreg(cpu_state->hvf_fd, HV_X86_R14, env->regs[14]);
-    wreg(cpu_state->hvf_fd, HV_X86_R15, env->regs[15]);
-    wreg(cpu_state->hvf_fd, HV_X86_RFLAGS, env->eflags);
-    wreg(cpu_state->hvf_fd, HV_X86_RIP, env->eip);
+    wreg(cpu_state->hvf->fd, HV_X86_RAX, env->regs[R_EAX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RBX, env->regs[R_EBX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RCX, env->regs[R_ECX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RDX, env->regs[R_EDX]);
+    wreg(cpu_state->hvf->fd, HV_X86_RBP, env->regs[R_EBP]);
+    wreg(cpu_state->hvf->fd, HV_X86_RSP, env->regs[R_ESP]);
+    wreg(cpu_state->hvf->fd, HV_X86_RSI, env->regs[R_ESI]);
+    wreg(cpu_state->hvf->fd, HV_X86_RDI, env->regs[R_EDI]);
+    wreg(cpu_state->hvf->fd, HV_X86_R8, env->regs[8]);
+    wreg(cpu_state->hvf->fd, HV_X86_R9, env->regs[9]);
+    wreg(cpu_state->hvf->fd, HV_X86_R10, env->regs[10]);
+    wreg(cpu_state->hvf->fd, HV_X86_R11, env->regs[11]);
+    wreg(cpu_state->hvf->fd, HV_X86_R12, env->regs[12]);
+    wreg(cpu_state->hvf->fd, HV_X86_R13, env->regs[13]);
+    wreg(cpu_state->hvf->fd, HV_X86_R14, env->regs[14]);
+    wreg(cpu_state->hvf->fd, HV_X86_R15, env->regs[15]);
+    wreg(cpu_state->hvf->fd, HV_X86_RFLAGS, env->eflags);
+    wreg(cpu_state->hvf->fd, HV_X86_RIP, env->eip);
    
-    wreg(cpu_state->hvf_fd, HV_X86_XCR0, env->xcr0);
+    wreg(cpu_state->hvf->fd, HV_X86_XCR0, env->xcr0);
     
     hvf_put_xsave(cpu_state);
     
@@ -XXX,XX +XXX,XX @@ int hvf_put_registers(CPUState *cpu_state)
     
     hvf_put_msrs(cpu_state);
     
-    wreg(cpu_state->hvf_fd, HV_X86_DR0, env->dr[0]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR1, env->dr[1]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR2, env->dr[2]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR3, env->dr[3]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR4, env->dr[4]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR5, env->dr[5]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR6, env->dr[6]);
-    wreg(cpu_state->hvf_fd, HV_X86_DR7, env->dr[7]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR0, env->dr[0]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR1, env->dr[1]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR2, env->dr[2]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR3, env->dr[3]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR4, env->dr[4]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR5, env->dr[5]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR6, env->dr[6]);
+    wreg(cpu_state->hvf->fd, HV_X86_DR7, env->dr[7]);
     
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
     X86CPU *x86cpu = X86_CPU(cpu_state);
     CPUX86State *env = &x86cpu->env;
 
-    env->regs[R_EAX] = rreg(cpu_state->hvf_fd, HV_X86_RAX);
-    env->regs[R_EBX] = rreg(cpu_state->hvf_fd, HV_X86_RBX);
-    env->regs[R_ECX] = rreg(cpu_state->hvf_fd, HV_X86_RCX);
-    env->regs[R_EDX] = rreg(cpu_state->hvf_fd, HV_X86_RDX);
-    env->regs[R_EBP] = rreg(cpu_state->hvf_fd, HV_X86_RBP);
-    env->regs[R_ESP] = rreg(cpu_state->hvf_fd, HV_X86_RSP);
-    env->regs[R_ESI] = rreg(cpu_state->hvf_fd, HV_X86_RSI);
-    env->regs[R_EDI] = rreg(cpu_state->hvf_fd, HV_X86_RDI);
-    env->regs[8] = rreg(cpu_state->hvf_fd, HV_X86_R8);
-    env->regs[9] = rreg(cpu_state->hvf_fd, HV_X86_R9);
-    env->regs[10] = rreg(cpu_state->hvf_fd, HV_X86_R10);
-    env->regs[11] = rreg(cpu_state->hvf_fd, HV_X86_R11);
-    env->regs[12] = rreg(cpu_state->hvf_fd, HV_X86_R12);
-    env->regs[13] = rreg(cpu_state->hvf_fd, HV_X86_R13);
-    env->regs[14] = rreg(cpu_state->hvf_fd, HV_X86_R14);
-    env->regs[15] = rreg(cpu_state->hvf_fd, HV_X86_R15);
+    env->regs[R_EAX] = rreg(cpu_state->hvf->fd, HV_X86_RAX);
+    env->regs[R_EBX] = rreg(cpu_state->hvf->fd, HV_X86_RBX);
+    env->regs[R_ECX] = rreg(cpu_state->hvf->fd, HV_X86_RCX);
+    env->regs[R_EDX] = rreg(cpu_state->hvf->fd, HV_X86_RDX);
+    env->regs[R_EBP] = rreg(cpu_state->hvf->fd, HV_X86_RBP);
+    env->regs[R_ESP] = rreg(cpu_state->hvf->fd, HV_X86_RSP);
+    env->regs[R_ESI] = rreg(cpu_state->hvf->fd, HV_X86_RSI);
+    env->regs[R_EDI] = rreg(cpu_state->hvf->fd, HV_X86_RDI);
+    env->regs[8] = rreg(cpu_state->hvf->fd, HV_X86_R8);
+    env->regs[9] = rreg(cpu_state->hvf->fd, HV_X86_R9);
+    env->regs[10] = rreg(cpu_state->hvf->fd, HV_X86_R10);
+    env->regs[11] = rreg(cpu_state->hvf->fd, HV_X86_R11);
+    env->regs[12] = rreg(cpu_state->hvf->fd, HV_X86_R12);
+    env->regs[13] = rreg(cpu_state->hvf->fd, HV_X86_R13);
+    env->regs[14] = rreg(cpu_state->hvf->fd, HV_X86_R14);
+    env->regs[15] = rreg(cpu_state->hvf->fd, HV_X86_R15);
     
-    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
-    env->eip = rreg(cpu_state->hvf_fd, HV_X86_RIP);
+    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
+    env->eip = rreg(cpu_state->hvf->fd, HV_X86_RIP);
    
     hvf_get_xsave(cpu_state);
-    env->xcr0 = rreg(cpu_state->hvf_fd, HV_X86_XCR0);
+    env->xcr0 = rreg(cpu_state->hvf->fd, HV_X86_XCR0);
     
     hvf_get_segments(cpu_state);
     hvf_get_msrs(cpu_state);
     
-    env->dr[0] = rreg(cpu_state->hvf_fd, HV_X86_DR0);
-    env->dr[1] = rreg(cpu_state->hvf_fd, HV_X86_DR1);
-    env->dr[2] = rreg(cpu_state->hvf_fd, HV_X86_DR2);
-    env->dr[3] = rreg(cpu_state->hvf_fd, HV_X86_DR3);
-    env->dr[4] = rreg(cpu_state->hvf_fd, HV_X86_DR4);
-    env->dr[5] = rreg(cpu_state->hvf_fd, HV_X86_DR5);
-    env->dr[6] = rreg(cpu_state->hvf_fd, HV_X86_DR6);
-    env->dr[7] = rreg(cpu_state->hvf_fd, HV_X86_DR7);
+    env->dr[0] = rreg(cpu_state->hvf->fd, HV_X86_DR0);
+    env->dr[1] = rreg(cpu_state->hvf->fd, HV_X86_DR1);
+    env->dr[2] = rreg(cpu_state->hvf->fd, HV_X86_DR2);
+    env->dr[3] = rreg(cpu_state->hvf->fd, HV_X86_DR3);
+    env->dr[4] = rreg(cpu_state->hvf->fd, HV_X86_DR4);
+    env->dr[5] = rreg(cpu_state->hvf->fd, HV_X86_DR5);
+    env->dr[6] = rreg(cpu_state->hvf->fd, HV_X86_DR6);
+    env->dr[7] = rreg(cpu_state->hvf->fd, HV_X86_DR7);
     
     x86_update_hflags(env);
     return 0;
@@ -XXX,XX +XXX,XX @@ int hvf_get_registers(CPUState *cpu_state)
 static void vmx_set_int_window_exiting(CPUState *cpu)
 {
      uint64_t val;
-     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val |
+     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val |
              VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
 }
 
 void vmx_clear_int_window_exiting(CPUState *cpu)
 {
      uint64_t val;
-     val = rvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS);
-     wvmcs(cpu->hvf_fd, VMCS_PRI_PROC_BASED_CTLS, val &
+     val = rvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS);
+     wvmcs(cpu->hvf->fd, VMCS_PRI_PROC_BASED_CTLS, val &
              ~VMCS_PRI_PROC_BASED_CTLS_INT_WINDOW_EXITING);
 }
 
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
     uint64_t info = 0;
     if (have_event) {
         info = vector | intr_type | VMCS_INTR_VALID;
-        uint64_t reason = rvmcs(cpu_state->hvf_fd, VMCS_EXIT_REASON);
+        uint64_t reason = rvmcs(cpu_state->hvf->fd, VMCS_EXIT_REASON);
         if (env->nmi_injected && reason != EXIT_REASON_TASK_SWITCH) {
             vmx_clear_nmi_blocking(cpu_state);
         }
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
             info &= ~(1 << 12); /* clear undefined bit */
             if (intr_type == VMCS_INTR_T_SWINTR ||
                 intr_type == VMCS_INTR_T_SWEXCEPTION) {
-                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
+                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INST_LENGTH, env->ins_len);
             }
             
             if (env->has_error_code) {
-                wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_EXCEPTION_ERROR,
+                wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_EXCEPTION_ERROR,
                       env->error_code);
                 /* Indicate that VMCS_ENTRY_EXCEPTION_ERROR is valid */
                 info |= VMCS_INTR_DEL_ERRCODE;
             }
             /*printf("reinject  %lx err %d\n", info, err);*/
-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
         };
     }
 
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
         if (!(env->hflags2 & HF2_NMI_MASK) && !(info & VMCS_INTR_VALID)) {
             cpu_state->interrupt_request &= ~CPU_INTERRUPT_NMI;
             info = VMCS_INTR_VALID | VMCS_INTR_T_NMI | EXCP02_NMI;
-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, info);
+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, info);
         } else {
             vmx_set_nmi_window_exiting(cpu_state);
         }
@@ -XXX,XX +XXX,XX @@ bool hvf_inject_interrupts(CPUState *cpu_state)
         int line = cpu_get_pic_interrupt(&x86cpu->env);
         cpu_state->interrupt_request &= ~CPU_INTERRUPT_HARD;
         if (line >= 0) {
-            wvmcs(cpu_state->hvf_fd, VMCS_ENTRY_INTR_INFO, line |
+            wvmcs(cpu_state->hvf->fd, VMCS_ENTRY_INTR_INFO, line |
                   VMCS_INTR_VALID | VMCS_INTR_T_HWINTR);
         }
     }
@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
     X86CPU *cpu = X86_CPU(cpu_state);
     CPUX86State *env = &cpu->env;
 
-    env->eflags = rreg(cpu_state->hvf_fd, HV_X86_RFLAGS);
+    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
 
     if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
         cpu_synchronize_state(cpu_state);
-- 
2.20.1

From: Alexander Graf <agraf@csgraf.de>

The hooks we have that call us after reset, init and loadvm really all
just want to say "The reference of all register state is in the QEMU
vcpu struct, please push it".

We already have a working pushing mechanism though called cpu->vcpu_dirty,
so we can just reuse that for all of the above, syncing state properly the
next time we actually execute a vCPU.

This fixes PSCI resets on ARM, as they modify CPU state even after the
post init call has completed, but before we execute the vCPU again.

To also make the scheme work for x86, we have to make sure we don't
move stale eflags into our env when the vcpu state is dirty.

Signed-off-by: Alexander Graf <agraf@csgraf.de>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-id: 20210519202253.76782-13-agraf@csgraf.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
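A minimal sketch of the deferred-sync scheme described above. ToyVcpu and the
toy_* functions are hypothetical stand-ins, not QEMU or Hypervisor.framework
names; the only idea taken from the patch is that the hooks merely set a dirty
flag, the event path skips the read-back while the flag is set, and the push
happens once on the next vCPU entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one vCPU; not QEMU's real CPUState. */
typedef struct {
    uint64_t env_eflags;  /* QEMU-side copy of the register */
    uint64_t hw_eflags;   /* value currently held by the hypervisor */
    bool vcpu_dirty;      /* true: the QEMU-side copy is the reference */
} ToyVcpu;

/* reset/init/loadvm hooks: only mark the state dirty, push nothing yet */
static void toy_set_dirty(ToyVcpu *cpu)
{
    cpu->vcpu_dirty = true;
}

/* event processing: read back from the hypervisor only when not dirty */
static void toy_process_events(ToyVcpu *cpu)
{
    if (!cpu->vcpu_dirty) {
        cpu->env_eflags = cpu->hw_eflags;
    }
}

/* vCPU entry: push pending state once, then clear the flag */
static void toy_enter_vcpu(ToyVcpu *cpu)
{
    if (cpu->vcpu_dirty) {
        cpu->hw_eflags = cpu->env_eflags;
        cpu->vcpu_dirty = false;
    }
}

static void demo_deferred_sync(void)
{
    ToyVcpu cpu = { .env_eflags = 0x2, .hw_eflags = 0x202,
                    .vcpu_dirty = false };

    toy_set_dirty(&cpu);        /* e.g. the post-init hook */
    cpu.env_eflags = 0x46;      /* state modified after the hook ran */
    toy_process_events(&cpu);   /* must not overwrite env_eflags */
    assert(cpu.env_eflags == 0x46);
    toy_enter_vcpu(&cpu);       /* single push on the next entry */
    assert(cpu.hw_eflags == 0x46 && !cpu.vcpu_dirty);
}
```

In this model a modification made after the hook but before the next entry
still reaches the hypervisor, which is the property the PSCI reset case relies
on.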
 accel/hvf/hvf-accel-ops.c | 27 +++++++--------------------
 target/i386/hvf/x86hvf.c  |  5 ++++-
 2 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -XXX,XX +XXX,XX @@ static void hvf_cpu_synchronize_state(CPUState *cpu)
     }
 }
 
-static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
-                                              run_on_cpu_data arg)
+static void do_hvf_cpu_synchronize_set_dirty(CPUState *cpu,
+                                             run_on_cpu_data arg)
 {
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
+    /* QEMU state is the reference, push it to HVF now and on next entry */
+    cpu->vcpu_dirty = true;
 }
 
 static void hvf_cpu_synchronize_post_reset(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_reset, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_post_init(CPUState *cpu,
-                                             run_on_cpu_data arg)
-{
-    hvf_put_registers(cpu);
-    cpu->vcpu_dirty = false;
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 }
 
 static void hvf_cpu_synchronize_post_init(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_post_init, RUN_ON_CPU_NULL);
-}
-
-static void do_hvf_cpu_synchronize_pre_loadvm(CPUState *cpu,
-                                              run_on_cpu_data arg)
-{
-    cpu->vcpu_dirty = true;
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 }
 
 static void hvf_cpu_synchronize_pre_loadvm(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_hvf_cpu_synchronize_pre_loadvm, RUN_ON_CPU_NULL);
+    run_on_cpu(cpu, do_hvf_cpu_synchronize_set_dirty, RUN_ON_CPU_NULL);
 }
 
 static void hvf_set_dirty_tracking(MemoryRegionSection *section, bool on)
diff --git a/target/i386/hvf/x86hvf.c b/target/i386/hvf/x86hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/hvf/x86hvf.c
+++ b/target/i386/hvf/x86hvf.c
@@ -XXX,XX +XXX,XX @@ int hvf_process_events(CPUState *cpu_state)
     X86CPU *cpu = X86_CPU(cpu_state);
     CPUX86State *env = &cpu->env;
 
-    env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
+    if (!cpu_state->vcpu_dirty) {
+        /* light weight sync for CPU_INTERRUPT_HARD and IF_MASK */
+        env->eflags = rreg(cpu_state->hvf->fd, HV_X86_RFLAGS);
+    }
 
     if (cpu_state->interrupt_request & CPU_INTERRUPT_INIT) {
         cpu_synchronize_state(cpu_state);
-- 
2.20.1

Coverity notes that we don't check for dup2() failing.  Add some
assertions so that if it does ever happen we get some indication.
(This is similar to how we handle other "don't expect this syscall to
fail" checks in this test code.)

Fixes: Coverity CID 1432346
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-2-peter.maydell@linaro.org
---
 tests/qtest/bios-tables-test.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -XXX,XX +XXX,XX @@ static void test_acpi_asl(test_data *data)
                                                  exp_sdt->asl_file, sdt->asl_file);
                     int out = dup(STDOUT_FILENO);
                     int ret G_GNUC_UNUSED;
+                    int dupret;
 
-                    dup2(STDERR_FILENO, STDOUT_FILENO);
+                    g_assert(out >= 0);
+                    dupret = dup2(STDERR_FILENO, STDOUT_FILENO);
+                    g_assert(dupret >= 0);
                     ret = system(diff) ;
-                    dup2(out, STDOUT_FILENO);
+                    dupret = dup2(out, STDOUT_FILENO);
+                    g_assert(dupret >= 0);
                     close(out);
                     g_free(diff);
                 }
-- 
2.20.1

The e1000e_send_verify() test calls qemu_recv() but doesn't
check that the call succeeded, which annoys Coverity. Add
an explicit test check for the length of the data.

(This is a test check, not a "we assume this syscall always
succeeds", so we use g_assert_cmpint() rather than g_assert().)

Fixes: Coverity CID 1432324
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-3-peter.maydell@linaro.org
---
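A rough illustration of why the length is checked before the string compare,
assuming a POSIX system. It uses plain socketpair()/send()/recv() rather than
the qtest helpers, and recv_test_payload() is a hypothetical name. The
threshold of 5 corresponds to sizeof("TEST"): four characters plus the NUL
terminator.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send "TEST" (with its NUL) over a local socket
 * pair and return what recv() reports, or -1 if anything fails.
 */
static ssize_t recv_test_payload(char *buf, size_t len)
{
    int sv[2];
    ssize_t ret;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }
    if (send(sv[0], "TEST", sizeof("TEST"), 0) != (ssize_t)sizeof("TEST")) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    ret = recv(sv[1], buf, len, 0);
    close(sv[0]);
    close(sv[1]);
    return ret;
}

static void demo_checked_recv(void)
{
    char buffer[64];
    ssize_t ret = recv_test_payload(buffer, sizeof(buffer));

    /* Only compare the buffer once we know enough bytes arrived. */
    assert(ret >= 5);
    assert(strcmp(buffer, "TEST") == 0);
}
```

Without the length check, a failed or short receive would leave the buffer
contents unspecified and the string comparison would be meaningless.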
 tests/qtest/e1000e-test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/e1000e-test.c b/tests/qtest/e1000e-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/e1000e-test.c
+++ b/tests/qtest/e1000e-test.c
@@ -XXX,XX +XXX,XX @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a
     /* Check data sent to the backend */
     ret = qemu_recv(test_sockets[0], &recv_len, sizeof(recv_len), 0);
     g_assert_cmpint(ret, == , sizeof(recv_len));
-    qemu_recv(test_sockets[0], buffer, 64, 0);
+    ret = qemu_recv(test_sockets[0], buffer, 64, 0);
+    g_assert_cmpint(ret, >=, 5);
     g_assert_cmpstr(buffer, == , "TEST");
 
     /* Free test data buffer */
-- 
2.20.1

Coverity notices that the checks against mkstemp() failing in
create_qcow2_with_mbr() are wrong: mkstemp returns -1 on failure but
the check is just "g_assert(fd)".  Fix to use "g_assert(fd >= 0)",
matching the correct check in create_test_img().

Fixes: Coverity CID 1432274
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-4-peter.maydell@linaro.org
---
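The difference between the two checks can be shown without touching the
filesystem. The helper names below are hypothetical; they only model the
truthiness test "g_assert(fd)" against the range test "g_assert(fd >= 0)".

```c
#include <assert.h>
#include <stdbool.h>

/* Models the old check: any non-zero value passes, including -1. */
static bool old_check_passes(int fd)
{
    return fd != 0;
}

/* Models the fixed check: only non-negative descriptors pass. */
static bool new_check_passes(int fd)
{
    return fd >= 0;
}

static void demo_fd_checks(void)
{
    assert(old_check_passes(-1));    /* the failure value slips through */
    assert(!new_check_passes(-1));   /* the fixed check catches it */
    assert(!old_check_passes(0));    /* fd 0 is valid but was rejected */
    assert(new_check_passes(0));
    assert(new_check_passes(3));
}
```

So the old form was wrong in both directions: it accepted the -1 failure value
and would have rejected descriptor 0, which is valid if stdin happens to be
closed.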
 tests/qtest/hd-geo-test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/hd-geo-test.c b/tests/qtest/hd-geo-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/hd-geo-test.c
+++ b/tests/qtest/hd-geo-test.c
@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
     }
 
     fd = mkstemp(raw_path);
-    g_assert(fd);
+    g_assert(fd >= 0);
     close(fd);
 
     fd = open(raw_path, O_WRONLY);
@@ -XXX,XX +XXX,XX @@ static char *create_qcow2_with_mbr(MBRpartitions mbr, uint64_t sectors)
     close(fd);
 
     fd = mkstemp(qcow2_path);
-    g_assert(fd);
+    g_assert(fd >= 0);
     close(fd);
 
     qemu_img_path = getenv("QTEST_QEMU_IMG");
-- 
2.20.1

Coverity points out that we calculate a 64-bit value using 32-bit
arithmetic; add the cast to force the multiply to be done as 64-bits.
(The overflow will never happen with the current test data.)

Fixes: Coverity CID 1432320
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-5-peter.maydell@linaro.org
---
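A small sketch of the arithmetic issue, assuming both operands are 32-bit
unsigned values on a platform where int is 32 bits (the field types in the
real test may differ). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply happens in 32 bits; the result is widened only afterwards. */
static uint64_t addr_narrow(uint32_t i, uint32_t sector_len)
{
    return i * sector_len;
}

/* One operand is widened first, so the multiply is done in 64 bits. */
static uint64_t addr_wide(uint32_t i, uint32_t sector_len)
{
    return (uint64_t)i * sector_len;
}

static void demo_multiply_width(void)
{
    /* Small values: both forms agree, as with the current test data. */
    assert(addr_narrow(16, 0x10000u) == addr_wide(16, 0x10000u));

    /* 0x10000 * 0x10000 = 2^32: wraps to 0 in 32-bit arithmetic. */
    assert(addr_narrow(0x10000u, 0x10000u) == 0);
    assert(addr_wide(0x10000u, 0x10000u) == UINT64_C(0x100000000));
}
```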
 tests/qtest/pflash-cfi02-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/pflash-cfi02-test.c b/tests/qtest/pflash-cfi02-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/pflash-cfi02-test.c
+++ b/tests/qtest/pflash-cfi02-test.c
@@ -XXX,XX +XXX,XX @@ static void test_geometry(const void *opaque)
 
     for (int region = 0; region < nb_erase_regions; ++region) {
         for (uint32_t i = 0; i < c->nb_blocs[region]; ++i) {
-            uint64_t byte_addr = i * c->sector_len[region];
+            uint64_t byte_addr = (uint64_t)i * c->sector_len[region];
             g_assert_cmphex(flash_read(c, byte_addr), ==, bank_mask(c));
         }
     }
-- 
2.20.1

Coverity points out that in tpm_test_swtpm_migration_test() we
assume that src_tpm_addr and dst_tpm_addr are non-NULL (we
pass them to tpm_util_migration_start_qemu() which will
unconditionally dereference them) but then later explicitly
check them for NULL. Remove the pointless checks.

Fixes: Coverity CID 1432367, 1432359

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Message-id: 20210525134458.6675-6-peter.maydell@linaro.org
---
 tests/qtest/tpm-tests.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/tpm-tests.c
+++ b/tests/qtest/tpm-tests.c
@@ -XXX,XX +XXX,XX @@ void tpm_test_swtpm_migration_test(const char *src_tpm_path,
     qtest_quit(src_qemu);
 
     tpm_util_swtpm_kill(dst_tpm_pid);
-    if (dst_tpm_addr) {
-        g_unlink(dst_tpm_addr->u.q_unix.path);
-        qapi_free_SocketAddress(dst_tpm_addr);
-    }
+    g_unlink(dst_tpm_addr->u.q_unix.path);
+    qapi_free_SocketAddress(dst_tpm_addr);
 
     tpm_util_swtpm_kill(src_tpm_pid);
-    if (src_tpm_addr) {
-        g_unlink(src_tpm_addr->u.q_unix.path);
-        qapi_free_SocketAddress(src_tpm_addr);
-    }
+    g_unlink(src_tpm_addr->u.q_unix.path);
+    qapi_free_SocketAddress(src_tpm_addr);
 }
-- 
2.20.1

Coverity complains that we don't check for failures from dup()
and mkstemp(); add asserts that these syscalls succeeded.

Fixes: Coverity CID 1432516, 1432574
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210525134458.6675-7-peter.maydell@linaro.org
---
 tests/unit/test-vmstate.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tests/unit/test-vmstate.c b/tests/unit/test-vmstate.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/unit/test-vmstate.c
+++ b/tests/unit/test-vmstate.c
@@ -XXX,XX +XXX,XX @@ static int temp_fd;
 /* Duplicate temp_fd and seek to the beginning of the file */
 static QEMUFile *open_test_file(bool write)
 {
-    int fd = dup(temp_fd);
+    int fd;
     QIOChannel *ioc;
     QEMUFile *f;
 
+    fd = dup(temp_fd);
+    g_assert(fd >= 0);
     lseek(fd, 0, SEEK_SET);
     if (write) {
         g_assert_cmpint(ftruncate(fd, 0), ==, 0);
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
     g_autofree char *temp_file = g_strdup_printf("%s/vmst.test.XXXXXX",
                                                  g_get_tmp_dir());
     temp_fd = mkstemp(temp_file);
+    g_assert(temp_fd >= 0);
 
     module_call_init(MODULE_INIT_QOM);
 
-- 
2.20.1

The following changes since commit 5a67d7735d4162630769ef495cf813244fc850df:

  Merge remote-tracking branch 'remotes/berrange-gitlab/tags/tls-deps-pull-request' into staging (2021-07-02 08:22:39 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210702

for you to fetch changes up to 04ea4d3cfd0a21b248ece8eb7a9436a3d9898dd8:

  target/arm: Implement MVE shifts by register (2021-07-02 11:48:38 +0100)

----------------------------------------------------------------
target-arm queue:
 * more MVE instructions
 * hw/gpio/gpio_pwr: use shutdown function for reboot
 * target/arm: Check NaN mode before silencing NaN
 * tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
 * hw/arm: Add basic power management to raspi.
 * docs/system/arm: Add quanta-gbs-bmc, quanta-q7l1-bmc

----------------------------------------------------------------
Joe Komlodi (1):
      target/arm: Check NaN mode before silencing NaN

Maxim Uvarov (1):
      hw/gpio/gpio_pwr: use shutdown function for reboot

Nolan Leake (1):
      hw/arm: Add basic power management to raspi.

Patrick Venture (2):
      docs/system/arm: Add quanta-q7l1-bmc reference
      docs/system/arm: Add quanta-gbs-bmc reference

Peter Maydell (18):
      target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation
      target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
      target/arm: Make asimd_imm_const() public
      target/arm: Use asimd_imm_const for A64 decode
      target/arm: Use dup_const() instead of bitfield_replicate()
      target/arm: Implement MVE logical immediate insns
      target/arm: Implement MVE vector shift left by immediate insns
      target/arm: Implement MVE vector shift right by immediate insns
      target/arm: Implement MVE VSHLL
      target/arm: Implement MVE VSRI, VSLI
      target/arm: Implement MVE VSHRN, VRSHRN
      target/arm: Implement MVE saturating narrowing shifts
      target/arm: Implement MVE VSHLC
      target/arm: Implement MVE VADDLV
      target/arm: Implement MVE long shifts by immediate
      target/arm: Implement MVE long shifts by register
      target/arm: Implement MVE shifts by immediate
      target/arm: Implement MVE shifts by register

Philippe Mathieu-Daudé (1):
      tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine

 docs/system/arm/aspeed.rst             |   1 +
 docs/system/arm/nuvoton.rst            |   5 +-
 include/hw/arm/bcm2835_peripherals.h   |   3 +-
 include/hw/misc/bcm2835_powermgt.h     |  29 ++
 target/arm/helper-mve.h                | 108 +++++++
 target/arm/translate.h                 |  41 +++
 target/arm/mve.decode                  | 177 ++++++++++-
 target/arm/t32.decode                  |  71 ++++-
 hw/arm/bcm2835_peripherals.c           |  13 +-
 hw/gpio/gpio_pwr.c                     |   2 +-
 hw/misc/bcm2835_powermgt.c             | 160 ++++++++++
 target/arm/helper-a64.c                |  12 +-
 target/arm/mve_helper.c                | 524 +++++++++++++++++++++++++++++++--
 target/arm/translate-a64.c             |  86 +-----
 target/arm/translate-mve.c             | 261 +++++++++++++++-
 target/arm/translate-neon.c            |  81 -----
 target/arm/translate.c                 | 327 +++++++++++++++++++-
 target/arm/vfp_helper.c                |  24 +-
 hw/misc/meson.build                    |   1 +
 tests/acceptance/boot_linux_console.py |  43 +++
 20 files changed, 1760 insertions(+), 209 deletions(-)
 create mode 100644 include/hw/misc/bcm2835_powermgt.h
 create mode 100644 hw/misc/bcm2835_powermgt.c

From: Patrick Venture <venture@google.com>

Add line item reference to quanta-gbs-bmc machine.

Signed-off-by: Patrick Venture <venture@google.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20210615192848.1065297-3-venture@google.com
[PMM: fixed underline Sphinx warning]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/nuvoton.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/nuvoton.rst
+++ b/docs/system/arm/nuvoton.rst
@@ -XXX,XX +XXX,XX @@
-Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
-=====================================================
+Nuvoton iBMC boards (``*-bmc``, ``npcm750-evb``, ``quanta-gsj``)
+================================================================
 
 The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
 designed to be used as Baseboard Management Controllers (BMCs) in various
@@ -XXX,XX +XXX,XX @@ segment. The following machines are based on this chip :
 The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
 Hyperscale applications. The following machines are based on this chip :
 
+- ``quanta-gbs-bmc``    Quanta GBS server BMC
 - ``quanta-gsj``        Quanta GSJ server BMC
 
 There are also two more SoCs, NPCM710 and NPCM705, which are single-core
-- 
2.20.1

From: Nolan Leake <nolan@sigbus.net>

This is just enough to make reboot and poweroff work. Works for
Linux, U-Boot, and Arm Trusted Firmware. Not tested, but should
work for Plan 9 and bare-metal/hobby OSes, since they seem to generally
do what Linux does for reset.

The watchdog timer functionality is not yet implemented.
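
The register protocol this patch emulates can be sketched outside QEMU as
follows. This is a minimal standalone sketch, not the device model itself:
the register offsets and magic values are copied from the patch, while
pm_action, pm_regs and pm_write() are hypothetical names used only for
illustration, and the QEMU shutdown/reset requests are replaced by a
return value.

```c
#include <stdint.h>

/* Offsets and values copied from the patch. */
#define PASSWORD        0x5a000000u
#define PASSWORD_MASK   0xff000000u
#define R_RSTC          0x1c
#define V_RSTC_RESET    0x20
#define R_RSTS          0x20
#define V_RSTS_POWEROFF 0x555

/* Hypothetical types, for illustration only. */
typedef enum { PM_NONE, PM_RESET, PM_POWEROFF } pm_action;

typedef struct {
    uint32_t rstc;
    uint32_t rsts;
} pm_regs;

/* Decode one 32-bit guest write and report what the machine should do. */
static pm_action pm_write(pm_regs *s, uint32_t offset, uint32_t value)
{
    /* Every write must carry the password in the top byte. */
    if ((value & PASSWORD_MASK) != PASSWORD) {
        return PM_NONE;             /* bad password: the write is dropped */
    }
    value &= ~PASSWORD_MASK;

    switch (offset) {
    case R_RSTC:
        s->rstc = value;
        if (value & V_RSTC_RESET) {
            /* RSTS decides whether the reset is really a power-off. */
            return (s->rsts & 0xfff) == V_RSTS_POWEROFF ? PM_POWEROFF
                                                        : PM_RESET;
        }
        break;
    case R_RSTS:
        s->rsts = value;
        break;
    }
    return PM_NONE;
}
```

With the reset values from the patch (rsts = 0x1000), a password-protected
write of V_RSTC_RESET to RSTC is a reboot; it only becomes a power-off if
the guest first stored V_RSTS_POWEROFF in RSTS.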

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/64
Signed-off-by: Nolan Leake <nolan@sigbus.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210625210209.1870217-1-nolan@sigbus.net
[PMM: tweaked commit title; fixed region size to 0x200;
 moved header file to include/]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/bcm2835_peripherals.h |   3 +-
 include/hw/misc/bcm2835_powermgt.h   |  29 +++++
 hw/arm/bcm2835_peripherals.c         |  13 ++-
 hw/misc/bcm2835_powermgt.c           | 160 +++++++++++++++++++++++++++
 hw/misc/meson.build                  |   1 +
 5 files changed, 204 insertions(+), 2 deletions(-)
 create mode 100644 include/hw/misc/bcm2835_powermgt.h
 create mode 100644 hw/misc/bcm2835_powermgt.c

diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/bcm2835_peripherals.h
+++ b/include/hw/arm/bcm2835_peripherals.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/misc/bcm2835_mphi.h"
 #include "hw/misc/bcm2835_thermal.h"
 #include "hw/misc/bcm2835_cprman.h"
+#include "hw/misc/bcm2835_powermgt.h"
 #include "hw/sd/sdhci.h"
 #include "hw/sd/bcm2835_sdhost.h"
 #include "hw/gpio/bcm2835_gpio.h"
@@ -XXX,XX +XXX,XX @@ struct BCM2835PeripheralState {
     BCM2835MphiState mphi;
     UnimplementedDeviceState txp;
     UnimplementedDeviceState armtmr;
-    UnimplementedDeviceState powermgt;
+    BCM2835PowerMgtState powermgt;
     BCM2835CprmanState cprman;
     PL011State uart0;
     BCM2835AuxState aux;
diff --git a/include/hw/misc/bcm2835_powermgt.h b/include/hw/misc/bcm2835_powermgt.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/misc/bcm2835_powermgt.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * BCM2835 Power Management emulation
+ *
+ * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
+ * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef BCM2835_POWERMGT_H
+#define BCM2835_POWERMGT_H
+
+#include "hw/sysbus.h"
+#include "qom/object.h"
+
+#define TYPE_BCM2835_POWERMGT "bcm2835-powermgt"
+OBJECT_DECLARE_SIMPLE_TYPE(BCM2835PowerMgtState, BCM2835_POWERMGT)
+
+struct BCM2835PowerMgtState {
+    SysBusDevice busdev;
+    MemoryRegion iomem;
+
+    uint32_t rstc;
+    uint32_t rsts;
+    uint32_t wdog;
+};
+
+#endif
diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_init(Object *obj)
 
     object_property_add_const_link(OBJECT(&s->dwc2), "dma-mr",
                                    OBJECT(&s->gpu_bus_mr));
+
+    /* Power Management */
+    object_initialize_child(obj, "powermgt", &s->powermgt,
+                            TYPE_BCM2835_POWERMGT);
 }
 
 static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
         qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
                                INTERRUPT_USB));
 
+    /* Power Management */
+    if (!sysbus_realize(SYS_BUS_DEVICE(&s->powermgt), errp)) {
+        return;
+    }
+
+    memory_region_add_subregion(&s->peri_mr, PM_OFFSET,
+                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->powermgt), 0));
+
     create_unimp(s, &s->txp, "bcm2835-txp", TXP_OFFSET, 0x1000);
     create_unimp(s, &s->armtmr, "bcm2835-sp804", ARMCTRL_TIMER0_1_OFFSET, 0x40);
-    create_unimp(s, &s->powermgt, "bcm2835-powermgt", PM_OFFSET, 0x114);
     create_unimp(s, &s->i2s, "bcm2835-i2s", I2S_OFFSET, 0x100);
     create_unimp(s, &s->smi, "bcm2835-smi", SMI_OFFSET, 0x100);
     create_unimp(s, &s->spi[0], "bcm2835-spi0", SPI0_OFFSET, 0x20);
diff --git a/hw/misc/bcm2835_powermgt.c b/hw/misc/bcm2835_powermgt.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/misc/bcm2835_powermgt.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * BCM2835 Power Management emulation
+ *
+ * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
+ * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "hw/misc/bcm2835_powermgt.h"
+#include "migration/vmstate.h"
+#include "sysemu/runstate.h"
+
+#define PASSWORD 0x5a000000
+#define PASSWORD_MASK 0xff000000
+
+#define R_RSTC 0x1c
+#define V_RSTC_RESET 0x20
+#define R_RSTS 0x20
+#define V_RSTS_POWEROFF 0x555 /* Linux uses partition 63 to indicate halt. */
+#define R_WDOG 0x24
+
+static uint64_t bcm2835_powermgt_read(void *opaque, hwaddr offset,
+                                      unsigned size)
+{
+    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
+    uint32_t res = 0;
+
+    switch (offset) {
+    case R_RSTC:
+        res = s->rstc;
+        break;
+    case R_RSTS:
+        res = s->rsts;
+        break;
+    case R_WDOG:
+        res = s->wdog;
+        break;
+
+    default:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_read: Unknown offset 0x%08"HWADDR_PRIx
+                      "\n", offset);
+        res = 0;
+        break;
+    }
+
+    return res;
+}
+
+static void bcm2835_powermgt_write(void *opaque, hwaddr offset,
+                                   uint64_t value, unsigned size)
+{
+    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
+
+    if ((value & PASSWORD_MASK) != PASSWORD) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "bcm2835_powermgt_write: Bad password 0x%"PRIx64
+                      " at offset 0x%08"HWADDR_PRIx"\n",
+                      value, offset);
+        return;
+    }
+
+    value = value & ~PASSWORD_MASK;
+
+    switch (offset) {
+    case R_RSTC:
+        s->rstc = value;
+        if (value & V_RSTC_RESET) {
+            if ((s->rsts & 0xfff) == V_RSTS_POWEROFF) {
+                qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+            } else {
+                qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+            }
+        }
+        break;
+    case R_RSTS:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_write: RSTS\n");
+        s->rsts = value;
+        break;
+    case R_WDOG:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_write: WDOG\n");
+        s->wdog = value;
+        break;
+
+    default:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_write: Unknown offset 0x%08"HWADDR_PRIx
+                      "\n", offset);
+        break;
+    }
+}
+
+static const MemoryRegionOps bcm2835_powermgt_ops = {
+    .read = bcm2835_powermgt_read,
+    .write = bcm2835_powermgt_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .impl.min_access_size = 4,
+    .impl.max_access_size = 4,
+};
+
+static const VMStateDescription vmstate_bcm2835_powermgt = {
+    .name = TYPE_BCM2835_POWERMGT,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(rstc, BCM2835PowerMgtState),
+        VMSTATE_UINT32(rsts, BCM2835PowerMgtState),
+        VMSTATE_UINT32(wdog, BCM2835PowerMgtState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void bcm2835_powermgt_init(Object *obj)
+{
+    BCM2835PowerMgtState *s = BCM2835_POWERMGT(obj);
+
+    memory_region_init_io(&s->iomem, obj, &bcm2835_powermgt_ops, s,
+                          TYPE_BCM2835_POWERMGT, 0x200);
+    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem);
+}
+
+static void bcm2835_powermgt_reset(DeviceState *dev)
+{
+    BCM2835PowerMgtState *s = BCM2835_POWERMGT(dev);
+
+    /* https://elinux.org/BCM2835_registers#PM */
+    s->rstc = 0x00000102;
+    s->rsts = 0x00001000;
+    s->wdog = 0x00000000;
+}
+
+static void bcm2835_powermgt_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->reset = bcm2835_powermgt_reset;
+    dc->vmsd = &vmstate_bcm2835_powermgt;
+}
+
+static const TypeInfo bcm2835_powermgt_info = {
+    .name          = TYPE_BCM2835_POWERMGT,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(BCM2835PowerMgtState),
+    .class_init    = bcm2835_powermgt_class_init,
+    .instance_init = bcm2835_powermgt_init,
+};
+
+static void bcm2835_powermgt_register_types(void)
+{
+    type_register_static(&bcm2835_powermgt_info);
+}
+
+type_init(bcm2835_powermgt_register_types)
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_RASPI', if_true: files(
   'bcm2835_rng.c',
   'bcm2835_thermal.c',
   'bcm2835_cprman.c',
+  'bcm2835_powermgt.c',
 ))
 softmmu_ss.add(when: 'CONFIG_SLAVIO', if_true: files('slavio_misc.c'))
 softmmu_ss.add(when: 'CONFIG_ZYNQ', if_true: files('zynq_slcr.c', 'zynq-xadc.c'))
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Add a test that boots and then quickly shuts down a raspi2 machine,
to test the power management model:

(1/1) tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_raspi2_initrd:
  console: [    0.000000] Booting Linux on physical CPU 0xf00
  console: [    0.000000] Linux version 4.14.98-v7+ (dom@dom-XPS-13-9370) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1200 SMP Tue Feb 12 20:27:48 GMT 2019
  console: [    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
  console: [    0.000000] CPU: div instructions available: patching division code
  console: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
  console: [    0.000000] OF: fdt: Machine model: Raspberry Pi 2 Model B
  ...
  console: Boot successful.
  console: cat /proc/cpuinfo
  console: / # cat /proc/cpuinfo
  ...
  console: processor      : 3
  console: model name     : ARMv7 Processor rev 5 (v7l)
  console: BogoMIPS       : 125.00
  console: Features       : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
  console: CPU implementer        : 0x41
  console: CPU architecture: 7
  console: CPU variant    : 0x0
  console: CPU part       : 0xc07
  console: CPU revision   : 5
  console: Hardware       : BCM2835
  console: Revision       : 0000
  console: Serial         : 0000000000000000
  console: cat /proc/iomem
  console: / # cat /proc/iomem
  console: 00000000-3bffffff : System RAM
  console: 00008000-00afffff : Kernel code
  console: 00c00000-00d468ef : Kernel data
  console: 3f006000-3f006fff : dwc_otg
  console: 3f007000-3f007eff : /soc/dma@7e007000
  console: 3f00b880-3f00b8bf : /soc/mailbox@7e00b880
  console: 3f100000-3f100027 : /soc/watchdog@7e100000
  console: 3f101000-3f102fff : /soc/cprman@7e101000
  console: 3f200000-3f2000b3 : /soc/gpio@7e200000
  PASS (24.59 s)
  RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 25.02 s

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Message-id: 20210531113837.1689775-1-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/acceptance/boot_linux_console.py | 43 ++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index XXXXXXX..XXXXXXX 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -XXX,XX +XXX,XX @@
 from avocado import skip
 from avocado import skipUnless
 from avocado_qemu import Test
+from avocado_qemu import exec_command
 from avocado_qemu import exec_command_and_wait_for_pattern
 from avocado_qemu import interrupt_interactive_console_until_pattern
 from avocado_qemu import wait_for_console_pattern
@@ -XXX,XX +XXX,XX @@ def test_arm_raspi2_uart0(self):
         """
         self.do_test_arm_raspi2(0)
 
+    def test_arm_raspi2_initrd(self):
+        """
+        :avocado: tags=arch:arm
+        :avocado: tags=machine:raspi2
+        """
+        deb_url = ('http://archive.raspberrypi.org/debian/'
+                   'pool/main/r/raspberrypi-firmware/'
+                   'raspberrypi-kernel_1.20190215-1_armhf.deb')
+        deb_hash = 'cd284220b32128c5084037553db3c482426f3972'
+        deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
+        kernel_path = self.extract_from_deb(deb_path, '/boot/kernel7.img')
+        dtb_path = self.extract_from_deb(deb_path, '/boot/bcm2709-rpi-2-b.dtb')
+
+        initrd_url = ('https://github.com/groeck/linux-build-test/raw/'
+                      '2eb0a73b5d5a28df3170c546ddaaa9757e1e0848/rootfs/'
+                      'arm/rootfs-armv7a.cpio.gz')
+        initrd_hash = '604b2e45cdf35045846b8bbfbf2129b1891bdc9c'
+        initrd_path_gz = self.fetch_asset(initrd_url, asset_hash=initrd_hash)
+        initrd_path = os.path.join(self.workdir, 'rootfs.cpio')
+        archive.gzip_uncompress(initrd_path_gz, initrd_path)
+
+        self.vm.set_console()
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'earlycon=pl011,0x3f201000 console=ttyAMA0 '
+                               'panic=-1 noreboot ' +
+                               'dwc_otg.fiq_fsm_enable=0')
+        self.vm.add_args('-kernel', kernel_path,
+                         '-dtb', dtb_path,
+                         '-initrd', initrd_path,
+                         '-append', kernel_command_line,
+                         '-no-reboot')
+        self.vm.launch()
+        self.wait_for_console_pattern('Boot successful.')
+
+        exec_command_and_wait_for_pattern(self, 'cat /proc/cpuinfo',
+                                                'BCM2835')
+        exec_command_and_wait_for_pattern(self, 'cat /proc/iomem',
+                                                '/soc/cprman@7e101000')
+        exec_command(self, 'halt')
+        # Wait for VM to shut down gracefully
+        self.vm.wait()
+
     def test_arm_exynos4210_initrd(self):
         """
         :avocado: tags=arch:arm
-- 
2.20.1

From: Joe Komlodi <joe.komlodi@xilinx.com>

If the CPU is running in default NaN mode (FPCR.DN == 1) and we execute
FRSQRTE, FRECPE, or FRECPX with a signaling NaN, parts_silence_nan_frac() will
assert due to fpst->default_nan_mode being set.

To avoid this, we check to see what NaN mode we're running in before we call
floatxx_silence_nan().
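
The resulting control flow for the NaN case can be sketched on raw
single-precision bit patterns. This is a standalone illustration, not the
softfloat code: the f32_* helpers are hypothetical names, the quiet bit is
assumed to be the top fraction bit, and 0x7fc00000 is assumed as the Arm
single-precision default NaN. The input is assumed to already be a NaN.

```c
#include <stdbool.h>
#include <stdint.h>

#define F32_DEFAULT_NAN 0x7fc00000u  /* assumed Arm default NaN (single) */
#define F32_QUIET_BIT   0x00400000u  /* top fraction bit: set means quiet */

static bool f32_is_any_nan(uint32_t a)
{
    return (a & 0x7fffffffu) > 0x7f800000u;
}

static bool f32_is_signaling_nan(uint32_t a)
{
    return f32_is_any_nan(a) && !(a & F32_QUIET_BIT);
}

/*
 * Mirror the patched helpers: raise Invalid for a signaling NaN, but
 * only silence it when we are not in default NaN mode, because in that
 * mode the value is about to be replaced by the default NaN anyway.
 */
static uint32_t f32_nan_result(uint32_t a, bool default_nan_mode,
                               bool *invalid)
{
    uint32_t nan = a;

    *invalid = false;
    if (f32_is_signaling_nan(a)) {
        *invalid = true;
        if (!default_nan_mode) {
            nan = a | F32_QUIET_BIT;    /* silence: set the quiet bit */
        }
    }
    if (default_nan_mode) {
        nan = F32_DEFAULT_NAN;
    }
    return nan;
}
```

The Invalid flag is raised in both modes; only the silencing step is
skipped when the default NaN will be returned.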

Signed-off-by: Joe Komlodi <joe.komlodi@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1624662174-175828-2-git-send-email-joe.komlodi@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-a64.c | 12 +++++++++---
 target/arm/vfp_helper.c | 24 ++++++++++++++++++------
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(frecpx_f16)(uint32_t a, void *fpstp)
         float16 nan = a;
         if (float16_is_signaling_nan(a, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float16_silence_nan(a, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float16_silence_nan(a, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan = float16_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(frecpx_f32)(float32 a, void *fpstp)
         float32 nan = a;
         if (float32_is_signaling_nan(a, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float32_silence_nan(a, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float32_silence_nan(a, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan = float32_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(frecpx_f64)(float64 a, void *fpstp)
         float64 nan = a;
         if (float64_is_signaling_nan(a, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float64_silence_nan(a, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float64_silence_nan(a, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan = float64_default_nan(fpst);
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, void *fpstp)
         float16 nan = f16;
         if (float16_is_signaling_nan(f16, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float16_silence_nan(f16, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float16_silence_nan(f16, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan =  float16_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, void *fpstp)
         float32 nan = f32;
         if (float32_is_signaling_nan(f32, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float32_silence_nan(f32, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float32_silence_nan(f32, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan =  float32_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, void *fpstp)
         float64 nan = f64;
         if (float64_is_signaling_nan(f64, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float64_silence_nan(f64, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float64_silence_nan(f64, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan =  float64_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, void *fpstp)
         float16 nan = f16;
         if (float16_is_signaling_nan(f16, s)) {
             float_raise(float_flag_invalid, s);
-            nan = float16_silence_nan(f16, s);
+            if (!s->default_nan_mode) {
+                nan = float16_silence_nan(f16, fpstp);
+            }
         }
         if (s->default_nan_mode) {
             nan =  float16_default_nan(s);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, void *fpstp)
         float32 nan = f32;
         if (float32_is_signaling_nan(f32, s)) {
             float_raise(float_flag_invalid, s);
-            nan = float32_silence_nan(f32, s);
+            if (!s->default_nan_mode) {
+                nan = float32_silence_nan(f32, fpstp);
+            }
         }
         if (s->default_nan_mode) {
             nan =  float32_default_nan(s);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, void *fpstp)
         float64 nan = f64;
         if (float64_is_signaling_nan(f64, s)) {
             float_raise(float_flag_invalid, s);
-            nan = float64_silence_nan(f64, s);
+            if (!s->default_nan_mode) {
+                nan = float64_silence_nan(f64, fpstp);
+            }
         }
         if (s->default_nan_mode) {
             nan =  float64_default_nan(s);
-- 
2.20.1

From: Maxim Uvarov <maxim.uvarov@linaro.org>

QEMU has two types of functions: shutdown and reboot. The shutdown
function has to be used for machine shutdown. Otherwise we cause
a reset with a bogus "cause" value, when we intended a shutdown.

Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210625111842.3790-3-maxim.uvarov@linaro.org
[PMM: tweaked commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/gpio/gpio_pwr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/gpio/gpio_pwr.c b/hw/gpio/gpio_pwr.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/gpio/gpio_pwr.c
+++ b/hw/gpio/gpio_pwr.c
@@ -XXX,XX +XXX,XX @@ static void gpio_pwr_reset(void *opaque, int n, int level)
 static void gpio_pwr_shutdown(void *opaque, int n, int level)
 {
     if (level) {
-        qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     }
 }
 
-- 
2.20.1

In do_ldst(), the calculation of the offset needs to be based on the
size of the memory access, not the size of the elements in the
vector.  This meant we were getting it wrong for the widening and
narrowing variants of the various VLDR and VSTR insns.
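
The difference is easiest to see with numbers. The following standalone
sketch reproduces only the offset arithmetic from do_ldst(); ldst_offset()
and the MSIZE_* constants are hypothetical stand-ins (for the helper logic
and for MO_8/MO_16/MO_32), and "add" plays the role of a->a.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for MO_8, MO_16, MO_32 (log2 of size in bytes). */
enum { MSIZE_8 = 0, MSIZE_16 = 1, MSIZE_32 = 2 };

/*
 * Scale the immediate by the memory access size, then negate it when
 * the instruction subtracts the offset.
 */
static uint32_t ldst_offset(uint32_t imm, unsigned msize, bool add)
{
    uint32_t offset = imm << msize;

    if (!add) {
        offset = -offset;   /* unsigned negation, as in the original */
    }
    return offset;
}
```

For a widening byte load into halfword elements with imm = 5, scaling by
the memory size (MSIZE_8) gives an offset of 5 bytes, whereas scaling by
the element size (MSIZE_16) would give 10, which is the error described
above.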

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-2-peter.maydell@linaro.org
---
 target/arm/translate-mve.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool mve_skip_first_beat(DisasContext *s)
     }
 }
 
-static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
+static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn,
+                    unsigned msize)
 {
     TCGv_i32 addr;
     uint32_t offset;
@@ -XXX,XX +XXX,XX @@ static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
         return true;
     }
 
-    offset = a->imm << a->size;
+    offset = a->imm << msize;
     if (!a->a) {
         offset = -offset;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
         { gen_helper_mve_vstrw, gen_helper_mve_vldrw },
         { NULL, NULL }
     };
-    return do_ldst(s, a, ldstfns[a->size][a->l]);
+    return do_ldst(s, a, ldstfns[a->size][a->l], a->size);
 }
 
-#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST)                  \
+#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST, MSIZE)           \
     static bool trans_##OP(DisasContext *s, arg_VLDR_VSTR *a)   \
     {                                                           \
         static MVEGenLdStFn * const ldstfns[2][2] = {           \
             { gen_helper_mve_##ST, gen_helper_mve_##SLD },      \
             { NULL, gen_helper_mve_##ULD },                     \
         };                                                      \
-        return do_ldst(s, a, ldstfns[a->u][a->l]);              \
+        return do_ldst(s, a, ldstfns[a->u][a->l], MSIZE);       \
     }
 
-DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
-DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
-DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
+DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h, MO_8)
+DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w, MO_8)
+DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w, MO_16)
 
 static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
 {
-- 
2.20.1

The initial implementation of the MVE VRMLALDAVH and VRMLSLDAVH
insns had some bugs:
 * the 32x32 multiply of elements was being done as 32x32->32,
   not 32x32->64
 * we were incorrectly maintaining the accumulator in its full
   72-bit form across all 4 beats of the insn; in the pseudocode
   it is squashed back into the 64 bits of the RdaHi:RdaLo
   registers after each beat

In particular, fixing the second of these allows us to recast
the implementation to avoid 128-bit arithmetic entirely.

Since the element size here is always 4, we can also drop the
parameterization of ESIZE to make the code a little more readable.
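
The per-element step of the reworked helper can be sketched in isolation
for the signed case. This is an illustration only: vrmlaldavh_step() is a
hypothetical name, predication and the exchange/subtract variants are left
out, and right shifts of negative values are assumed to be arithmetic (as
they are with the compilers QEMU supports).

```c
#include <stdint.h>

/*
 * One accumulation step: widen to 64 bits before multiplying, drop the
 * low 8 bits of the product with rounding, and add the result to the
 * 64-bit accumulator.
 */
static uint64_t vrmlaldavh_step(uint64_t acc, int32_t n, int32_t m)
{
    int64_t mul = (int64_t)n * m;           /* 32x32->64, not 32x32->32 */

    mul = (mul >> 8) + ((mul >> 7) & 1);    /* round to nearest, bit 7 */
    return acc + (uint64_t)mul;
}
```

Because the rounding shift is applied to each product before it is added,
the running value always fits in the 64 bits of RdaHi:RdaLo and no wider
intermediate is needed.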

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-3-peter.maydell@linaro.org
---
 target/arm/mve_helper.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/int128.h"
 #include "cpu.h"
 #include "internals.h"
 #include "vec_internal.h"
@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
 DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
 
 /*
- * Rounding multiply add long dual accumulate high: we must keep
- * a 72-bit internal accumulator value and return the top 64 bits.
+ * Rounding multiply add long dual accumulate high. In the pseudocode
+ * this is implemented with a 72-bit internal accumulator value of which
+ * the top 64 bits are returned. We optimize this to avoid having to
+ * use 128-bit arithmetic -- we can do this because the 72-bit accumulator
+ * is squashed back into 64-bits after each beat.
  */
-#define DO_LDAVH(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC, TO128)         \
+#define DO_LDAVH(OP, TYPE, LTYPE, XCHG, SUB)                            \
     uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
                                     void *vm, uint64_t a)               \
     {                                                                   \
         uint16_t mask = mve_element_mask(env);                          \
         unsigned e;                                                     \
         TYPE *n = vn, *m = vm;                                          \
-        Int128 acc = int128_lshift(TO128(a), 8);                        \
-        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
             if (mask & 1) {                                             \
+                LTYPE mul;                                              \
                 if (e & 1) {                                            \
-                    acc = ODDACC(acc, TO128(n[H##ESIZE(e - 1 * XCHG)] * \
-                                            m[H##ESIZE(e)]));           \
+                    mul = (LTYPE)n[H4(e - 1 * XCHG)] * m[H4(e)];        \
+                    if (SUB) {                                          \
+                        mul = -mul;                                     \
+                    }                                                   \
                 } else {                                                \
-                    acc = EVENACC(acc, TO128(n[H##ESIZE(e + 1 * XCHG)] * \
-                                             m[H##ESIZE(e)]));          \
+                    mul = (LTYPE)n[H4(e + 1 * XCHG)] * m[H4(e)];        \
                 }                                                       \
-                acc = int128_add(acc, int128_make64(1 << 7));           \
+                mul = (mul >> 8) + ((mul >> 7) & 1);                    \
+                a += mul;                                               \
             }                                                           \
         }                                                               \
         mve_advance_vpt(env);                                           \
-        return int128_getlo(int128_rshift(acc, 8));                     \
+        return a;                                                       \
     }
 
-DO_LDAVH(vrmlaldavhsw, 4, int32_t, false, int128_add, int128_add, int128_makes64)
-DO_LDAVH(vrmlaldavhxsw, 4, int32_t, true, int128_add, int128_add, int128_makes64)
+DO_LDAVH(vrmlaldavhsw, int32_t, int64_t, false, false)
+DO_LDAVH(vrmlaldavhxsw, int32_t, int64_t, true, false)
 
-DO_LDAVH(vrmlaldavhuw, 4, uint32_t, false, int128_add, int128_add, int128_make64)
+DO_LDAVH(vrmlaldavhuw, uint32_t, uint64_t, false, false)
 
-DO_LDAVH(vrmlsldavhsw, 4, int32_t, false, int128_add, int128_sub, int128_makes64)
-DO_LDAVH(vrmlsldavhxsw, 4, int32_t, true, int128_add, int128_sub, int128_makes64)
+DO_LDAVH(vrmlsldavhsw, int32_t, int64_t, false, true)
+DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
 
 /* Vector add across vector */
 #define DO_VADDV(OP, ESIZE, TYPE)                               \
-- 
2.20.1

The function asimd_imm_const() in translate-neon.c is an
implementation of the pseudocode AdvSIMDExpandImm(), which we will
also want for MVE.  Move the implementation to translate.c, with a
prototype in translate.h.
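
As an illustration of the one case where the two 32-bit halves of the
result can differ (cmode == 14 with op == 1), the per-bit byte expansion
can be isolated as a small standalone sketch. The name expand_byte_mask()
is hypothetical and exists only for this example; the patch itself keeps
the logic inline in asimd_imm_const().

```c
#include <stdint.h>

/*
 * Hypothetical standalone helper (not part of the patch): each bit n of
 * the 8-bit immediate selects whether byte n of the result is 0x00 or 0xff.
 */
static uint64_t expand_byte_mask(uint32_t imm8)
{
    uint64_t imm64 = 0;

    for (int n = 0; n < 8; n++) {
        if (imm8 & (1u << n)) {
            imm64 |= 0xffULL << (n * 8);
        }
    }
    return imm64;
}
```

For example, an immediate of 0x81 sets only the lowest and highest bytes.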

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-4-peter.maydell@linaro.org
---
 target/arm/translate.h      | 16 ++++++++++
 target/arm/translate-neon.c | 63 -------------------------------------
 target/arm/translate.c      | 57 +++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+), 63 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
     return opc | s->be_data;
 }
 
+/**
+ * asimd_imm_const: Expand an encoded SIMD constant value
+ *
+ * Expand a SIMD constant value. This is essentially the pseudocode
+ * AdvSIMDExpandImm, except that we also perform the boolean NOT needed for
+ * VMVN and VBIC (when cmode < 14 && op == 1).
+ *
+ * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
+ * callers must catch this.
+ *
+ * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
+ * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
+ * we produce an immediate constant value of 0 in these cases.
+ */
+uint64_t asimd_imm_const(uint32_t imm, int cmode, int op);
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ DO_FP_2SH(VCVT_UH, gen_helper_gvec_vcvt_uh)
 DO_FP_2SH(VCVT_HS, gen_helper_gvec_vcvt_hs)
 DO_FP_2SH(VCVT_HU, gen_helper_gvec_vcvt_hu)
 
-static uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
-{
-    /*
-     * Expand the encoded constant.
-     * Note that cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 is UNPREDICTABLE.
-     * We choose to not special-case this and will behave as if a
-     * valid constant encoding of 0 had been given.
-     * cmode = 15 op = 1 must UNDEF; we assume decode has handled that.
-     */
-    switch (cmode) {
-    case 0: case 1:
-        /* no-op */
-        break;
-    case 2: case 3:
-        imm <<= 8;
-        break;
-    case 4: case 5:
-        imm <<= 16;
-        break;
-    case 6: case 7:
-        imm <<= 24;
-        break;
-    case 8: case 9:
-        imm |= imm << 16;
-        break;
-    case 10: case 11:
-        imm = (imm << 8) | (imm << 24);
-        break;
-    case 12:
-        imm = (imm << 8) | 0xff;
-        break;
-    case 13:
-        imm = (imm << 16) | 0xffff;
-        break;
-    case 14:
-        if (op) {
-            /*
-             * This is the only case where the top and bottom 32 bits
-             * of the encoded constant differ.
-             */
-            uint64_t imm64 = 0;
-            int n;
-
-            for (n = 0; n < 8; n++) {
-                if (imm & (1 << n)) {
-                    imm64 |= (0xffULL << (n * 8));
-                }
-            }
-            return imm64;
-        }
-        imm |= (imm << 8) | (imm << 16) | (imm << 24);
-        break;
-    case 15:
-        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
-            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
-        break;
-    }
-    if (op) {
-        imm = ~imm;
-    }
-    return dup_const(MO_32, imm);
-}
-
 static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
                         GVecGen2iFn *fn)
 {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ void arm_translate_init(void)
     a64_translate_init();
 }
 
+uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
+{
+    /* Expand the encoded constant as per AdvSIMDExpandImm pseudocode */
+    switch (cmode) {
+    case 0: case 1:
+        /* no-op */
+        break;
+    case 2: case 3:
+        imm <<= 8;
+        break;
+    case 4: case 5:
+        imm <<= 16;
+        break;
+    case 6: case 7:
+        imm <<= 24;
+        break;
+    case 8: case 9:
+        imm |= imm << 16;
+        break;
+    case 10: case 11:
+        imm = (imm << 8) | (imm << 24);
+        break;
+    case 12:
+        imm = (imm << 8) | 0xff;
+        break;
+    case 13:
+        imm = (imm << 16) | 0xffff;
+        break;
+    case 14:
+        if (op) {
+            /*
+             * This is the only case where the top and bottom 32 bits
+             * of the encoded constant differ.
+             */
+            uint64_t imm64 = 0;
+            int n;
+
+            for (n = 0; n < 8; n++) {
+                if (imm & (1 << n)) {
+                    imm64 |= (0xffULL << (n * 8));
+                }
+            }
+            return imm64;
+        }
+        imm |= (imm << 8) | (imm << 16) | (imm << 24);
+        break;
+    case 15:
+        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
+            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
+        break;
+    }
+    if (op) {
+        imm = ~imm;
+    }
+    return dup_const(MO_32, imm);
+}
+
 /* Generate a label used for skipping this instruction */
 void arm_gen_condlabel(DisasContext *s)
 {
-- 
2.20.1

The A64 AdvSIMD modified-immediate grouping uses almost the same
constant encoding that A32 Neon does; reuse asimd_imm_const() (to
which we add the AArch64-specific case for cmode 15 op 1) instead of
reimplementing it all.
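
The AArch64-only cmode 15 op 1 case builds a double-precision constant
from the 8-bit immediate. A minimal sketch of the same bit arithmetic as
the hunk in translate.c, pulled out into a function with the hypothetical
name expand_fp64_imm8() so that it can be tried in isolation:

```c
#include <stdint.h>

/*
 * Hypothetical standalone helper (name is an assumption for this example).
 * Bit 7 of the immediate is the sign, bit 6 selects the exponent pattern,
 * and bits 5..0 land in bits 53..48 of the 64-bit result.
 */
static uint64_t expand_fp64_imm8(uint32_t imm8)
{
    uint64_t imm64 = (uint64_t)(imm8 & 0x3f) << 48;

    if (imm8 & 0x80) {
        imm64 |= 0x8000000000000000ULL;
    }
    if (imm8 & 0x40) {
        imm64 |= 0x3fc0000000000000ULL;
    } else {
        imm64 |= 0x4000000000000000ULL;
    }
    return imm64;
}
```

With this arithmetic an immediate of 0x70 yields the bit pattern of 1.0,
0x00 yields 2.0 and 0x80 yields -2.0.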

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-5-peter.maydell@linaro.org
---
 target/arm/translate.h     |  3 +-
 target/arm/translate-a64.c | 86 ++++----------------------------------
 target/arm/translate.c     | 17 +++++++-
 3 files changed, 24 insertions(+), 82 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
  * VMVN and VBIC (when cmode < 14 && op == 1).
  *
  * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
- * callers must catch this.
+ * callers must catch this; we return the 64-bit constant value defined
+ * for AArch64.
  *
  * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
  * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
 {
     int rd = extract32(insn, 0, 5);
     int cmode = extract32(insn, 12, 4);
-    int cmode_3_1 = extract32(cmode, 1, 3);
-    int cmode_0 = extract32(cmode, 0, 1);
     int o2 = extract32(insn, 11, 1);
     uint64_t abcdefgh = extract32(insn, 5, 5) | (extract32(insn, 16, 3) << 5);
     bool is_neg = extract32(insn, 29, 1);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
         return;
     }
 
-    /* See AdvSIMDExpandImm() in ARM ARM */
-    switch (cmode_3_1) {
-    case 0: /* Replicate(Zeros(24):imm8, 2) */
-    case 1: /* Replicate(Zeros(16):imm8:Zeros(8), 2) */
-    case 2: /* Replicate(Zeros(8):imm8:Zeros(16), 2) */
-    case 3: /* Replicate(imm8:Zeros(24), 2) */
-    {
-        int shift = cmode_3_1 * 8;
-        imm = bitfield_replicate(abcdefgh << shift, 32);
-        break;
-    }
-    case 4: /* Replicate(Zeros(8):imm8, 4) */
-    case 5: /* Replicate(imm8:Zeros(8), 4) */
-    {
-        int shift = (cmode_3_1 & 0x1) * 8;
-        imm = bitfield_replicate(abcdefgh << shift, 16);
-        break;
-    }
-    case 6:
-        if (cmode_0) {
-            /* Replicate(Zeros(8):imm8:Ones(16), 2) */
-            imm = (abcdefgh << 16) | 0xffff;
-        } else {
-            /* Replicate(Zeros(16):imm8:Ones(8), 2) */
-            imm = (abcdefgh << 8) | 0xff;
-        }
-        imm = bitfield_replicate(imm, 32);
-        break;
-    case 7:
-        if (!cmode_0 && !is_neg) {
-            imm = bitfield_replicate(abcdefgh, 8);
-        } else if (!cmode_0 && is_neg) {
-            int i;
-            imm = 0;
-            for (i = 0; i < 8; i++) {
-                if ((abcdefgh) & (1 << i)) {
-                    imm |= 0xffULL << (i * 8);
-                }
-            }
-        } else if (cmode_0) {
-            if (is_neg) {
-                imm = (abcdefgh & 0x3f) << 48;
-                if (abcdefgh & 0x80) {
-                    imm |= 0x8000000000000000ULL;
-                }
-                if (abcdefgh & 0x40) {
-                    imm |= 0x3fc0000000000000ULL;
-                } else {
-                    imm |= 0x4000000000000000ULL;
-                }
-            } else {
-                if (o2) {
-                    /* FMOV (vector, immediate) - half-precision */
-                    imm = vfp_expand_imm(MO_16, abcdefgh);
-                    /* now duplicate across the lanes */
-                    imm = bitfield_replicate(imm, 16);
-                } else {
-                    imm = (abcdefgh & 0x3f) << 19;
-                    if (abcdefgh & 0x80) {
-                        imm |= 0x80000000;
-                    }
-                    if (abcdefgh & 0x40) {
-                        imm |= 0x3e000000;
-                    } else {
-                        imm |= 0x40000000;
-                    }
-                    imm |= (imm << 32);
-                }
-            }
-        }
-        break;
-    default:
-        g_assert_not_reached();
-    }
-
-    if (cmode_3_1 != 7 && is_neg) {
-        imm = ~imm;
+    if (cmode == 15 && o2 && !is_neg) {
+        /* FMOV (vector, immediate) - half-precision */
+        imm = vfp_expand_imm(MO_16, abcdefgh);
+        /* now duplicate across the lanes */
+        imm = bitfield_replicate(imm, 16);
+    } else {
+        imm = asimd_imm_const(abcdefgh, cmode, is_neg);
     }
 
     if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
     case 14:
         if (op) {
             /*
-             * This is the only case where the top and bottom 32 bits
-             * of the encoded constant differ.
+             * This and cmode == 15 op == 1 are the only cases where
+             * the top and bottom 32 bits of the encoded constant differ.
              */
             uint64_t imm64 = 0;
             int n;
@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
         imm |= (imm << 8) | (imm << 16) | (imm << 24);
         break;
     case 15:
+        if (op) {
+            /* Reserved encoding for AArch32; valid for AArch64 */
+            uint64_t imm64 = (uint64_t)(imm & 0x3f) << 48;
+            if (imm & 0x80) {
+                imm64 |= 0x8000000000000000ULL;
+            }
+            if (imm & 0x40) {
+                imm64 |= 0x3fc0000000000000ULL;
+            } else {
+                imm64 |= 0x4000000000000000ULL;
+            }
+            return imm64;
+        }
         imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
             | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
         break;
-- 
2.20.1

Use dup_const() instead of bitfield_replicate() in
disas_simd_mod_imm().

(We can't replace the other use of bitfield_replicate() in this file,
in logic_imm_decode_wmask(), because that location needs to handle 2
and 4 bit elements, which dup_const() cannot.)
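
To make the distinction concrete, here is a rough sketch of the two
replication styles. Both function names are hypothetical and neither is
claimed to be the QEMU implementation: dup16() shows replication of a
fixed 16-bit element with one multiply, and replicate_bits() shows a
doubling loop that also works for 2-bit and 4-bit elements.

```c
#include <stdint.h>

/* Hypothetical: replicate a 16-bit value into all four 16-bit lanes. */
static uint64_t dup16(uint16_t v)
{
    return 0x0001000100010001ULL * v;
}

/*
 * Hypothetical: replicate the low 'e' bits of v across 64 bits.
 * Assumes e is a power of two in the range 2..32.
 */
static uint64_t replicate_bits(uint64_t v, unsigned e)
{
    v &= (1ULL << e) - 1;
    while (e < 64) {
        v |= v << e;
        e *= 2;
    }
    return v;
}
```

For a 16-bit element the two agree; only the loop form can handle an
element width such as 2.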

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-6-peter.maydell@linaro.org
---
 target/arm/translate-a64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
         /* FMOV (vector, immediate) - half-precision */
         imm = vfp_expand_imm(MO_16, abcdefgh);
         /* now duplicate across the lanes */
-        imm = bitfield_replicate(imm, 16);
+        imm = dup_const(MO_16, imm);
     } else {
         imm = asimd_imm_const(abcdefgh, cmode, is_neg);
     }
-- 
2.20.1

Implement the MVE logical-immediate insns (VMOV, VMVN,
VORR and VBIC). These have essentially the same encoding
as their Neon equivalents, and we implement the decode
in the same way.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-7-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  4 +++
 target/arm/mve.decode      | 17 +++++++++++++
 target/arm/mve_helper.c    | 24 ++++++++++++++++++
 target/arm/translate-mve.c | 50 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 95 insertions(+)

Implement the MVE shift-vector-left-by-immediate insns VSHL, VQSHL
and VQSHLU.

The size-and-immediate encoding here is the same as Neon, and we
handle it the same way neon-dp.decode does.
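
The @2_shl_b, @2_shl_h and @2_shl_w formats in this patch split one 6-bit
field by its leading bits: 001xxx is a byte shift, 01xxxx a halfword
shift and 1xxxxx a word shift. A sketch of the same split in C, using a
hypothetical decode_shl_imm6() and struct shl_imm that exist only for
this example (the real decode is generated from mve.decode):

```c
#include <stdint.h>

/* Hypothetical result type: size is 0/1/2 for b/h/w, or -1 if no format matches. */
struct shl_imm {
    int size;
    int shift;
};

/* Hypothetical helper: split the 6-bit size-and-immediate field. */
static struct shl_imm decode_shl_imm6(unsigned imm6)
{
    struct shl_imm r = { -1, 0 };

    if (imm6 & 0x20) {          /* 1xxxxx: 32-bit elements, 5-bit shift */
        r.size = 2;
        r.shift = imm6 & 0x1f;
    } else if (imm6 & 0x10) {   /* 01xxxx: 16-bit elements, 4-bit shift */
        r.size = 1;
        r.shift = imm6 & 0x0f;
    } else if (imm6 & 0x08) {   /* 001xxx: 8-bit elements, 3-bit shift */
        r.size = 0;
        r.shift = imm6 & 0x07;
    }
    return r;
}
```

A field of 000xxx matches none of the three shift formats, which the
sketch reports as size -1.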

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-8-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 16 +++++++++++
 target/arm/mve.decode      | 23 +++++++++++++++
 target/arm/mve_helper.c    | 57 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 51 ++++++++++++++++++++++++++++++++++
 4 files changed, 147 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
 DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
 DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
+
+DEF_HELPER_FLAGS_4(mve_vshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshlui_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshlui_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshlui_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
 &2op qd qm qn size
 &2scalar qd qn rm size
 &1imm qd imm cmode op
+&2shift qd qm shift size
 
 @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
 @2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 @2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 
+@2_shl_b .... .... .. 001 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
+@2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
+@2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
+
 # Vector loads and stores
 
 # Widening loads and narrowing stores:
@@ -XXX,XX +XXX,XX @@ VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 # So we have a single decode line and check the cmode/op in the
 # trans function.
 Vimm_1r 111 . 1111 1 . 00 0 ... ... 0 .... 0 1 . 1 .... @1imm
+
+# Shifts by immediate
+
+VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
+VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
+VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
+
+VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
+VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
+VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
+
+VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
+VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
+VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
+
+VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_b
+VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_h
+VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_w
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
     WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, true, satp)
 #define DO_UQRSHL_OP(N, M, satp) \
     WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, true, satp)
+#define DO_SUQSHL_OP(N, M, satp) \
+    WRAP_QRSHL_HELPER(do_suqrshl_bhs, N, M, false, satp)
 
 DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
 DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvsw, 4, uint32_t)
 DO_VADDV(vaddvub, 1, uint8_t)
 DO_VADDV(vaddvuh, 2, uint16_t)
 DO_VADDV(vaddvuw, 4, uint32_t)
+
+/* Shifts by immediate */
+#define DO_2SHIFT(OP, ESIZE, TYPE, FN)                          \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
+                                void *vm, uint32_t shift)       \
+    {                                                           \
+        TYPE *d = vd, *m = vm;                                  \
+        uint16_t mask = mve_element_mask(env);                  \
+        unsigned e;                                             \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
+            mergemask(&d[H##ESIZE(e)],                          \
+                      FN(m[H##ESIZE(e)], shift), mask);         \
+        }                                                       \
+        mve_advance_vpt(env);                                   \
+    }
+
+#define DO_2SHIFT_SAT(OP, ESIZE, TYPE, FN)                      \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
+                                void *vm, uint32_t shift)       \
+    {                                                           \
+        TYPE *d = vd, *m = vm;                                  \
+        uint16_t mask = mve_element_mask(env);                  \
+        unsigned e;                                             \
+        bool qc = false;                                        \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
+            bool sat = false;                                   \
+            mergemask(&d[H##ESIZE(e)],                          \
+                      FN(m[H##ESIZE(e)], shift, &sat), mask);   \
+            qc |= sat & mask & 1;                               \
+        }                                                       \
+        if (qc) {                                               \
+            env->vfp.qc[0] = qc;                                \
+        }                                                       \
+        mve_advance_vpt(env);                                   \
+    }
+
+/* provide unsigned 2-op shift helpers for all sizes */
+#define DO_2SHIFT_U(OP, FN)                     \
+    DO_2SHIFT(OP##b, 1, uint8_t, FN)            \
+    DO_2SHIFT(OP##h, 2, uint16_t, FN)           \
+    DO_2SHIFT(OP##w, 4, uint32_t, FN)
+
+#define DO_2SHIFT_SAT_U(OP, FN)                 \
+    DO_2SHIFT_SAT(OP##b, 1, uint8_t, FN)        \
+    DO_2SHIFT_SAT(OP##h, 2, uint16_t, FN)       \
+    DO_2SHIFT_SAT(OP##w, 4, uint32_t, FN)
+#define DO_2SHIFT_SAT_S(OP, FN)                 \
+    DO_2SHIFT_SAT(OP##b, 1, int8_t, FN)         \
+    DO_2SHIFT_SAT(OP##h, 2, int16_t, FN)        \
+    DO_2SHIFT_SAT(OP##w, 4, int32_t, FN)
+
+DO_2SHIFT_U(vshli_u, DO_VSHLU)
+DO_2SHIFT_SAT_U(vqshli_u, DO_UQSHL_OP)
+DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
+DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
 typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1imm *a)
     }
     return do_1imm(s, a, fn);
 }
+
+static bool do_2shift(DisasContext *s, arg_2shift *a, MVEGenTwoOpShiftFn fn,
+                      bool negateshift)
+{
+    TCGv_ptr qd, qm;
+    int shift = a->shift;
+
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !mve_check_qreg_bank(s, a->qd | a->qm) ||
+        !fn) {
+        return false;
+    }
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * When we handle a right shift insn using a left-shift helper
+     * which permits a negative shift count to indicate a right-shift,
+     * we must negate the shift count.
+     */
+    if (negateshift) {
+        shift = -shift;
+    }
+
+    qd = mve_qreg_ptr(a->qd);
+    qm = mve_qreg_ptr(a->qm);
+    fn(cpu_env, qd, qm, tcg_constant_i32(shift));
+    tcg_temp_free_ptr(qd);
+    tcg_temp_free_ptr(qm);
+    mve_update_eci(s);
+    return true;
+}
+
+#define DO_2SHIFT(INSN, FN, NEGATESHIFT)                         \
+    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
+    {                                                           \
+        static MVEGenTwoOpShiftFn * const fns[] = {             \
+            gen_helper_mve_##FN##b,                             \
+            gen_helper_mve_##FN##h,                             \
+            gen_helper_mve_##FN##w,                             \
+            NULL,                                               \
+        };                                                      \
+        return do_2shift(s, a, fns[a->size], NEGATESHIFT);      \
+    }
+
+DO_2SHIFT(VSHLI, vshli_u, false)
+DO_2SHIFT(VQSHLI_S, vqshli_s, false)
+DO_2SHIFT(VQSHLI_U, vqshli_u, false)
+DO_2SHIFT(VQSHLUI, vqshlui_s, false)
-- 
2.20.1

Implement the MVE vector shift right by immediate insns VSHRI and
VRSHRI.  As with Neon, we implement these by using helper functions
which perform left shifts but allow negative shift counts to indicate
right shifts.
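
The idea of one helper serving both directions can be sketched as below
for a single unsigned 32-bit element. shl_u32() is a hypothetical name
and this is not the helper the patch uses; it only shows the convention
that a negative count means a right shift, with out-of-range counts
producing zero rather than invoking undefined behaviour in C.

```c
#include <stdint.h>

/*
 * Hypothetical single-element helper: positive 'shift' shifts left,
 * negative 'shift' shifts right by the absolute value.
 */
static uint32_t shl_u32(uint32_t x, int shift)
{
    if (shift >= 32 || shift <= -32) {
        return 0;               /* all bits shifted out */
    }
    if (shift < 0) {
        return x >> -shift;
    }
    return x << shift;
}
```

A translate-time caller that wants a right shift by 4 would therefore
pass -4, which is what the negateshift argument of do_2shift() arranges.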

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-9-peter.maydell@linaro.org
---
 target/arm/helper-mve.h     | 12 ++++++++++++
 target/arm/translate.h      | 20 ++++++++++++++++++++
 target/arm/mve.decode       | 28 ++++++++++++++++++++++++++++
 target/arm/mve_helper.c     |  7 +++++++
 target/arm/translate-mve.c  |  5 +++++
 target/arm/translate-neon.c | 18 ------------------
 6 files changed, 72 insertions(+), 18 deletions(-)

Implement the MVE VSHLL (vector shift left long) insn.  This has two
encodings: the T1 encoding is the usual shift-by-immediate format,
and the T2 encoding is a special case where the shift count is always
equal to the element size.
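
A simplified sketch of the widening operation for unsigned byte inputs,
ignoring predication and host endianness (both of which the real helper
handles via mergemask() and the H macros). The function name vshll_u8()
and its signature are assumptions made for this example only.

```c
#include <stdint.h>

/*
 * Hypothetical simplified model: take the bottom (top == 0) or top
 * (top == 1) byte of each 16-bit lane of the input, widen it to 16 bits
 * and shift left. 'shift' is assumed to be in the range 0..8.
 */
static void vshll_u8(uint16_t d[8], const uint8_t m[16], int top,
                     unsigned shift)
{
    for (unsigned le = 0; le < 8; le++) {
        uint16_t in = m[le * 2 + (top ? 1 : 0)];

        d[le] = (uint16_t)(in << shift);
    }
}
```

With shift equal to 8 (the T2 encoding for byte elements) each input
byte ends up in the high byte of its output lane.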

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-10-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  9 +++++++
 target/arm/mve.decode      | 53 +++++++++++++++++++++++++++++++++++---
 target/arm/mve_helper.c    | 32 +++++++++++++++++++++++
 target/arm/translate-mve.c | 15 +++++++++++
 4 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vshllbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshllbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshllbub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshllbuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
 @2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
 @2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
 
+@2_shll_b .... .... ... 01 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
+@2_shll_h .... .... ... 1  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
+# VSHLL encoding T2 where shift == esize
+@2_shll_esize_b .... .... .... 00 .. .... .... .... .... &2shift \
+                qd=%qd qm=%qm size=0 shift=8
+@2_shll_esize_h .... .... .... 01 .. .... .... .... .... &2shift \
+                qd=%qd qm=%qm size=1 shift=16
+
 # Right shifts are encoded as N - shift, where N is the element size in bits.
 %rshift_i5  16:5 !function=rsub_32
 %rshift_i4  16:4 !function=rsub_16
@@ -XXX,XX +XXX,XX @@ VADD             1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
 VSUB             1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
 VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
 
-VMULH_S          111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
-VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+# The VSHLL T2 encoding is not a @2op pattern, but is here because it
+# overlaps what would be size=0b11 VMULH/VRMULH
+{
+  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
 
-VRMULH_S         111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
-VRMULH_U         111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+  VMULH_S        111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+}
+
+{
+  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
+
+  VMULH_U        111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+}
+
+{
+  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
+
+  VRMULH_S       111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+}
+
+{
+  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
+
+  VRMULH_U       111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+}
 
 VMAX_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
 VMAX_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
@@ -XXX,XX +XXX,XX @@ VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
 VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
 VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
+
+# VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
+VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
+
+VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
+
+VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
+
+VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
 DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
 DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
 DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
+
+/*
+ * Long shifts taking half-sized inputs from top or bottom of the input
+ * vector and producing a double-width result. ESIZE, TYPE are for
+ * the input, and LESIZE, LTYPE for the output.
+ * Unlike the normal shift helpers, we do not handle negative shift counts,
+ * because the long shift is strictly left-only.
+ */
+#define DO_VSHLL(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE)                   \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,             \
+                                void *vm, uint32_t shift)               \
+    {                                                                   \
+        LTYPE *d = vd;                                                  \
+        TYPE *m = vm;                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned le;                                                    \
+        assert(shift <= 16);                                            \
+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
+            LTYPE r = (LTYPE)m[H##ESIZE(le * 2 + TOP)] << shift;        \
+            mergemask(&d[H##LESIZE(le)], r, mask);                      \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+#define DO_VSHLL_ALL(OP, TOP)                                \
+    DO_VSHLL(OP##sb, TOP, 1, int8_t, 2, int16_t)             \
+    DO_VSHLL(OP##ub, TOP, 1, uint8_t, 2, uint16_t)           \
+    DO_VSHLL(OP##sh, TOP, 2, int16_t, 4, int32_t)            \
+    DO_VSHLL(OP##uh, TOP, 2, uint16_t, 4, uint32_t)          \
+
+DO_VSHLL_ALL(vshllb, false)
+DO_VSHLL_ALL(vshllt, true)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHRI_S, vshli_s, true)
 DO_2SHIFT(VSHRI_U, vshli_u, true)
 DO_2SHIFT(VRSHRI_S, vrshli_s, true)
 DO_2SHIFT(VRSHRI_U, vrshli_u, true)
+
+#define DO_VSHLL(INSN, FN)                                      \
+    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
+    {                                                           \
+        static MVEGenTwoOpShiftFn * const fns[] = {             \
+            gen_helper_mve_##FN##b,                             \
+            gen_helper_mve_##FN##h,                             \
+        };                                                      \
+        return do_2shift(s, a, fns[a->size], false);            \
+    }
+
+DO_VSHLL(VSHLL_BS, vshllbs)
+DO_VSHLL(VSHLL_BU, vshllbu)
+DO_VSHLL(VSHLL_TS, vshllts)
+DO_VSHLL(VSHLL_TU, vshlltu)
-- 
2.20.1

Implement the MVE VSRI and VSLI insns, which perform a
shift-and-insert operation.
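
As a rough illustration of what "shift-and-insert" means for a single
8-bit element, the sketch below ignores predication entirely. The names
sli8 and sri8 are hypothetical and are not the helpers added by this
patch; the shift ranges in the comments are assumed from the usual
definition of these operations.

```c
#include <stdint.h>

/*
 * Illustrative only: one 8-bit element, no predication.
 * sli8/sri8 are hypothetical names, not QEMU helpers.
 */

/* Shift left and insert, shift in 0..7: the low `shift` bits of d survive. */
static inline uint8_t sli8(uint8_t d, uint8_t m, unsigned shift)
{
    uint8_t keep = (uint8_t)((1u << shift) - 1);
    return (uint8_t)((m << shift) | (d & keep));
}

/* Shift right and insert, shift in 1..8: the high `shift` bits of d survive. */
static inline uint8_t sri8(uint8_t d, uint8_t m, unsigned shift)
{
    uint8_t keep = (uint8_t)(0xFFu << (8 - shift));
    return (uint8_t)((m >> shift) | (d & keep));
}
```

For example, sli8(0xFF, 0x01, 4) keeps the low nibble of the destination
and yields 0x1F, while sri8(0xFF, 0x80, 4) keeps the high nibble and
yields 0xF8.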

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-11-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  9 ++++++++
 target/arm/mve_helper.c    | 42 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  3 +++
 4 files changed, 62 insertions(+)

Implement the MVE shift-right-and-narrow insns VSHRN and VRSHRN.

do_urshr() is borrowed from sve_helper.c.
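
The difference between the plain and the rounding narrowing shift can
be sketched for one 16-bit input element as below. shrn16 and rshrn16
are hypothetical names used only for illustration; neither saturates,
so the result is simply truncated to 8 bits.

```c
#include <stdint.h>

/*
 * Illustrative only: narrow one 16-bit element to 8 bits, sh in 1..8.
 * shrn16/rshrn16 are hypothetical names, not QEMU helpers.
 */

/* Truncating shift right, then keep the low 8 bits. */
static inline uint8_t shrn16(uint16_t x, unsigned sh)
{
    return (uint8_t)(x >> sh);
}

/* Rounding shift right: add half of the discarded range before shifting. */
static inline uint8_t rshrn16(uint16_t x, unsigned sh)
{
    uint32_t rounded = ((uint32_t)x + (1u << (sh - 1))) >> sh;
    return (uint8_t)rounded;
}
```

With x = 0x01FF and sh = 1 the truncating form gives 0xFF, whereas the
rounding form rounds up to 0x100 and wraps to 0x00 when narrowed.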

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-12-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 10 ++++++++++
 target/arm/mve.decode      | 11 +++++++++++
 target/arm/mve_helper.c    | 40 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 15 ++++++++++++++
 4 files changed, 76 insertions(+)

Implement the MVE saturating shift-right-and-narrow insns
VQSHRN, VQSHRUN, VQRSHRN and VQRSHRUN.

do_srshr() is borrowed from sve_helper.c.
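
A minimal sketch of the saturation step for a single 16-bit to 8-bit
element follows, under the assumption that the compiler implements
right shift of a negative value as an arithmetic shift. The names
sat_range, sqshrn16, sqshrun16 and sqrshrn16 are hypothetical and only
mirror the shape of the macros in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clamp val to [min, max]; set *satp if clamping happened (never clears it). */
static inline int32_t sat_range(int64_t val, int64_t min, int64_t max,
                                bool *satp)
{
    if (val > max) {
        *satp = true;
        return (int32_t)max;
    } else if (val < min) {
        *satp = true;
        return (int32_t)min;
    }
    return (int32_t)val;
}

/* Signed input, signed result (VQSHRN-like), sh in 1..8. */
static inline int8_t sqshrn16(int16_t x, unsigned sh, bool *satp)
{
    return (int8_t)sat_range((int64_t)x >> sh, INT8_MIN, INT8_MAX, satp);
}

/* Signed input, unsigned result (VQSHRUN-like), sh in 1..8. */
static inline uint8_t sqshrun16(int16_t x, unsigned sh, bool *satp)
{
    return (uint8_t)sat_range((int64_t)x >> sh, 0, UINT8_MAX, satp);
}

/* Rounding variant of sqshrn16 (VQRSHRN-like), sh in 1..8. */
static inline int8_t sqrshrn16(int16_t x, unsigned sh, bool *satp)
{
    int64_t r = ((int64_t)x + ((int64_t)1 << (sh - 1))) >> sh;
    return (int8_t)sat_range(r, INT8_MIN, INT8_MAX, satp);
}
```

Note that rounding can push a value over the limit: 0x07F8 >> 4 is 127
and fits, but the rounded result is 128 and saturates to 127.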

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-13-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  30 +++++++++++
 target/arm/mve.decode      |  28 ++++++++++
 target/arm/mve_helper.c    | 104 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  12 +++++
 4 files changed, 174 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
 VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
 VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
 VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
+
+VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
+VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
+VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
+VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
+
+VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
+VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
+VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
+VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
+
+VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
+VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
+VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
+VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
+
+VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
+VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
+VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
+VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint64_t do_urshr(uint64_t x, unsigned sh)
     }
 }
 
+static inline int64_t do_srshr(int64_t x, unsigned sh)
+{
+    if (likely(sh < 64)) {
+        return (x >> sh) + ((x >> (sh - 1)) & 1);
+    } else {
+        /* Rounding the sign bit always produces 0. */
+        return 0;
+    }
+}
+
 DO_VSHRN_ALL(vshrn, DO_SHR)
 DO_VSHRN_ALL(vrshrn, do_urshr)
+
+static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
+                                 bool *satp)
+{
+    if (val > max) {
+        *satp = true;
+        return max;
+    } else if (val < min) {
+        *satp = true;
+        return min;
+    } else {
+        return val;
+    }
+}
+
+/* Saturating narrowing right shifts */
+#define DO_VSHRN_SAT(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)   \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
+                                void *vm, uint32_t shift)       \
+    {                                                           \
+        LTYPE *m = vm;                                          \
+        TYPE *d = vd;                                           \
+        uint16_t mask = mve_element_mask(env);                  \
+        bool qc = false;                                        \
+        unsigned le;                                            \
+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
+            bool sat = false;                                   \
+            TYPE r = FN(m[H##LESIZE(le)], shift, &sat);         \
+            mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
+            qc |= sat && (mask & 1 << (TOP * ESIZE));           \
+        }                                                       \
+        if (qc) {                                               \
+            env->vfp.qc[0] = qc;                                \
+        }                                                       \
+        mve_advance_vpt(env);                                   \
+    }
+
+#define DO_VSHRN_SAT_UB(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 1, uint8_t, 2, uint16_t, FN)       \
+    DO_VSHRN_SAT(TOP, true, 1, uint8_t, 2, uint16_t, FN)
+
+#define DO_VSHRN_SAT_UH(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 2, uint16_t, 4, uint32_t, FN)      \
+    DO_VSHRN_SAT(TOP, true, 2, uint16_t, 4, uint32_t, FN)
+
+#define DO_VSHRN_SAT_SB(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 1, int8_t, 2, int16_t, FN)         \
+    DO_VSHRN_SAT(TOP, true, 1, int8_t, 2, int16_t, FN)
+
+#define DO_VSHRN_SAT_SH(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 2, int16_t, 4, int32_t, FN)        \
+    DO_VSHRN_SAT(TOP, true, 2, int16_t, 4, int32_t, FN)
+
+#define DO_SHRN_SB(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), INT8_MIN, INT8_MAX, SATP)
+#define DO_SHRN_UB(N, M, SATP)                                  \
+    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT8_MAX, SATP)
+#define DO_SHRUN_B(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), 0, UINT8_MAX, SATP)
+
+#define DO_SHRN_SH(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), INT16_MIN, INT16_MAX, SATP)
+#define DO_SHRN_UH(N, M, SATP)                                  \
+    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT16_MAX, SATP)
+#define DO_SHRUN_H(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), 0, UINT16_MAX, SATP)
+
+#define DO_RSHRN_SB(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), INT8_MIN, INT8_MAX, SATP)
+#define DO_RSHRN_UB(N, M, SATP)                                 \
+    do_sat_bhs(do_urshr(N, M), 0, UINT8_MAX, SATP)
+#define DO_RSHRUN_B(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), 0, UINT8_MAX, SATP)
+
+#define DO_RSHRN_SH(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), INT16_MIN, INT16_MAX, SATP)
+#define DO_RSHRN_UH(N, M, SATP)                                 \
+    do_sat_bhs(do_urshr(N, M), 0, UINT16_MAX, SATP)
+#define DO_RSHRUN_H(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), 0, UINT16_MAX, SATP)
+
+DO_VSHRN_SAT_SB(vqshrnb_sb, vqshrnt_sb, DO_SHRN_SB)
+DO_VSHRN_SAT_SH(vqshrnb_sh, vqshrnt_sh, DO_SHRN_SH)
+DO_VSHRN_SAT_UB(vqshrnb_ub, vqshrnt_ub, DO_SHRN_UB)
+DO_VSHRN_SAT_UH(vqshrnb_uh, vqshrnt_uh, DO_SHRN_UH)
+DO_VSHRN_SAT_SB(vqshrunbb, vqshruntb, DO_SHRUN_B)
+DO_VSHRN_SAT_SH(vqshrunbh, vqshrunth, DO_SHRUN_H)
+
+DO_VSHRN_SAT_SB(vqrshrnb_sb, vqrshrnt_sb, DO_RSHRN_SB)
+DO_VSHRN_SAT_SH(vqrshrnb_sh, vqrshrnt_sh, DO_RSHRN_SH)
+DO_VSHRN_SAT_UB(vqrshrnb_ub, vqrshrnt_ub, DO_RSHRN_UB)
+DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
+DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
+DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_N(VSHRNB, vshrnb)
 DO_2SHIFT_N(VSHRNT, vshrnt)
 DO_2SHIFT_N(VRSHRNB, vrshrnb)
 DO_2SHIFT_N(VRSHRNT, vrshrnt)
+DO_2SHIFT_N(VQSHRNB_S, vqshrnb_s)
+DO_2SHIFT_N(VQSHRNT_S, vqshrnt_s)
+DO_2SHIFT_N(VQSHRNB_U, vqshrnb_u)
+DO_2SHIFT_N(VQSHRNT_U, vqshrnt_u)
+DO_2SHIFT_N(VQSHRUNB, vqshrunb)
+DO_2SHIFT_N(VQSHRUNT, vqshrunt)
+DO_2SHIFT_N(VQRSHRNB_S, vqrshrnb_s)
+DO_2SHIFT_N(VQRSHRNT_S, vqrshrnt_s)
+DO_2SHIFT_N(VQRSHRNB_U, vqrshrnb_u)
+DO_2SHIFT_N(VQRSHRNT_U, vqrshrnt_u)
+DO_2SHIFT_N(VQRSHRUNB, vqrshrunb)
+DO_2SHIFT_N(VQRSHRUNT, vqrshrunt)
-- 
2.20.1

Implement the MVE VSHLC insn, which performs a shift left of the
entire vector with carry in bits provided from a general purpose
register and carry out bits written back to that register.
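
The carry chain described above can be sketched as follows, treating
the vector as four 32-bit words with element 0 least significant and
ignoring predication. vshlc_sketch is a hypothetical name and this is
not the helper from the patch; the shift range 1..32 is an assumption
of the sketch.

```c
#include <stdint.h>

/*
 * Illustrative only: shift a 128-bit vector left by `shift` (1..32).
 * The low `shift` bits of rdm enter at the bottom of element 0, and the
 * bits shifted out of each element carry into the next one. The bits
 * shifted out of element 3 are returned as the new register value.
 */
static inline uint32_t vshlc_sketch(uint32_t d[4], uint32_t rdm,
                                    unsigned shift)
{
    uint32_t in_mask = (shift == 32) ? UINT32_MAX : ((1u << shift) - 1);

    for (int e = 0; e < 4; e++) {
        /* Widen to 64 bits so the carry-out lands in the top half. */
        uint64_t wide = ((uint64_t)d[e] << shift) | (rdm & in_mask);
        d[e] = (uint32_t)wide;
        rdm = (uint32_t)(wide >> 32);
    }
    return rdm;
}
```

For instance, shifting {0x80000001, 0, 0, 0xF0000000} left by 4 with a
carry-in register of 0x3 gives {0x00000013, 0x8, 0, 0} and a carry-out
of 0xF.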

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-14-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  2 ++
 target/arm/mve.decode      |  2 ++
 target/arm/mve_helper.c    | 38 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 30 ++++++++++++++++++++++++++++++
 4 files changed, 72 insertions(+)

Implement the MVE VADDLV insn; this is similar to VADDV, except
that it accumulates 32-bit elements into a 64-bit accumulator
stored in a pair of general-purpose registers.
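
As a sketch of the accumulation, assuming four 32-bit elements per
vector and no predication, the signed and unsigned forms differ only in
how each element is widened before being added. The names vaddlv_s32,
vaddlv_u32 and split64 are hypothetical.

```c
#include <stdint.h>

/* Sum four signed 32-bit elements into a 64-bit accumulator. */
static inline uint64_t vaddlv_s32(const int32_t m[4], uint64_t acc)
{
    for (int e = 0; e < 4; e++) {
        /* Sign-extend first; unsigned addition then wraps modulo 2^64. */
        acc += (uint64_t)(int64_t)m[e];
    }
    return acc;
}

/* Sum four unsigned 32-bit elements into a 64-bit accumulator. */
static inline uint64_t vaddlv_u32(const uint32_t m[4], uint64_t acc)
{
    for (int e = 0; e < 4; e++) {
        acc += m[e];
    }
    return acc;
}

/* Split the accumulator back into the two 32-bit register halves. */
static inline void split64(uint64_t v, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)v;
    *hi = (uint32_t)(v >> 32);
}
```

Summing four elements of 0xFFFFFFFF as unsigned values gives
0x3FFFFFFFC, which no longer fits in one 32-bit register and so needs
the register pair.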

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-15-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  3 ++
 target/arm/mve.decode      |  6 +++-
 target/arm/mve_helper.c    | 19 ++++++++++++
 target/arm/translate-mve.c | 63 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 90 insertions(+), 1 deletion(-)

The MVE extension to v8.1M includes some new shift instructions which
sit entirely within the non-coprocessor part of the encoding space
and which operate only on general-purpose registers.  They take up
the space which was previously UNPREDICTABLE MOVS and ORRS encodings
with Rm == 13 or 15.

Implement the long shifts by immediate, which perform shifts on a
pair of general-purpose registers treated as a 64-bit quantity, with
an immediate shift count between 1 and 32.
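
Treating a register pair as one 64-bit quantity amounts to a
concatenate, shift, split sequence. The sketch below shows this for a
logical left shift and an arithmetic right shift; lsll_imm and asrl_imm
are hypothetical names, and the arithmetic shift assumes a compiler
that sign-propagates on right shift of negative values.

```c
#include <stdint.h>

/* Logical shift left of hi:lo by an immediate in 1..32. */
static inline void lsll_imm(uint32_t *lo, uint32_t *hi, unsigned shift)
{
    uint64_t v = ((uint64_t)*hi << 32) | *lo;

    v <<= shift;
    *lo = (uint32_t)v;
    *hi = (uint32_t)(v >> 32);
}

/* Arithmetic shift right of hi:lo by an immediate in 1..32. */
static inline void asrl_imm(uint32_t *lo, uint32_t *hi, unsigned shift)
{
    int64_t v = (int64_t)(((uint64_t)*hi << 32) | *lo);

    v >>= shift;
    *lo = (uint32_t)(uint64_t)v;
    *hi = (uint32_t)((uint64_t)v >> 32);
}
```

A shift of exactly 32 is still well defined here because the operation
is carried out on the 64-bit value rather than on either half.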

Awkwardly, because the MOVS and ORRS trans functions do not UNDEF for
the Rm==13,15 case, we need to explicitly emit code to UNDEF for the
cases where v8.1M now requires that.  (Trying to change MOVS and ORRS
is too difficult, because the functions that generate the code are
shared between a dozen different kinds of arithmetic or logical
instruction for all A32, T16 and T32 encodings, and for some insns
and some encodings Rm==13,15 are valid.)

We make the helper functions we need for UQSHLL and SQSHLL take
a 32-bit value which the helper casts to int8_t because we'll need
these helpers also for the shift-by-register insns, where the shift
count might be < 0 or > 32.
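
To illustrate why the cast matters, here is a sketch of an unsigned
saturating 64-bit shift that takes the count as a 32-bit value and
looks only at its low 8 bits as a signed number. uqshll_sketch is a
hypothetical name, not the helper in this patch, and the conversion to
int8_t is assumed to wrap modulo 256 as it does on mainstream
compilers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only: unsigned saturating shift of a 64-bit value.
 * A positive count shifts left (saturating), a negative count shifts
 * right. *satp is set on saturation and never cleared.
 */
static inline uint64_t uqshll_sketch(uint64_t n, uint32_t shift_reg,
                                     bool *satp)
{
    int shift = (int8_t)shift_reg; /* low 8 bits, sign-extended */

    if (shift <= -64) {
        return 0;
    } else if (shift < 0) {
        return n >> -shift;
    } else if (shift < 64) {
        uint64_t r = n << shift;
        if ((r >> shift) == n) {
            return r; /* nothing was shifted out */
        }
    } else if (n == 0) {
        return 0;
    }
    *satp = true;
    return UINT64_MAX;
}
```

With this shape the same function serves an immediate count such as 4
and a register value such as 0xFFFFFFFC, which is treated as -4.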

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-16-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  3 ++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 28 +++++++++++++
 target/arm/mve_helper.c | 10 +++++
 target/arm/translate.c  | 90 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 132 insertions(+)

Implement the MVE long shifts by register, which perform shifts on a
pair of general-purpose registers treated as a 64-bit quantity, with
the shift count in another general-purpose register, which might be
either positive or negative.
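
The effect of a signed count on the two non-saturating forms can be
sketched as below: a negative count reverses the direction of the
shift. lsll_rr and asrl_rr are hypothetical names for illustration
only, and arithmetic right shift of negative values is assumed.

```c
#include <stdint.h>

/* Logical shift left by register: a negative count shifts right. */
static inline uint64_t lsll_rr(uint64_t n, uint32_t rm)
{
    int shift = (int8_t)rm; /* low 8 bits of Rm, sign-extended */

    if (shift <= -64 || shift >= 64) {
        return 0;
    }
    return shift < 0 ? n >> -shift : n << shift;
}

/* Arithmetic shift right by register: a negative count shifts left. */
static inline uint64_t asrl_rr(uint64_t n, uint32_t rm)
{
    int left = -(int8_t)rm; /* express the operation as a left shift */
    int64_t s = (int64_t)n;

    if (left <= -64) {
        return (uint64_t)(s >> 63); /* all copies of the sign bit */
    } else if (left >= 64) {
        return 0;
    }
    return left < 0 ? (uint64_t)(s >> -left) : n << left;
}
```

So asrl_rr with a count of 0xFC (that is, -4) shifts left by four,
while lsll_rr with the same count shifts right by four.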

Like the long-shifts-by-immediate, these encodings sit in the space
that was previously the UNPREDICTABLE MOVS/ORRS with Rm==13,15.
Because LSLL_rr and ASRL_rr overlap with both MOV_rxri/ORR_rrri and
also with CSEL (as one of the previously-UNPREDICTABLE Rm==13 cases),
we have to move the CSEL pattern into the same decodetree group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-17-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  6 +++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 16 +++++--
 target/arm/mve_helper.c | 93 +++++++++++++++++++++++++++++++++++++++++
 target/arm/translate.c  | 69 ++++++++++++++++++++++++++++++
 5 files changed, 182 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
 
+DEF_HELPER_FLAGS_3(mve_sshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_ushll, TCG_CALL_NO_RWG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(mve_sqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(mve_uqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_sqrshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_uqrshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_sqrshrl48, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
 typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
+typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
 
 /**
  * arm_tbflags_from_tb:
diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@
 &mcrr            !extern cp opc1 crm rt rt2
 
 &mve_shl_ri      rdalo rdahi shim
+&mve_shl_rr      rdalo rdahi rm
 
 # rdahi: bits [3:1] from insn, bit 0 is 1
 # rdalo: bits [3:1] from insn, bit 0 is 0
@@ -XXX,XX +XXX,XX @@
 
 @mve_shl_ri      ....... .... . ... . . ... ... . .. .. .... \
                  &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
+@mve_shl_rr      ....... .... . ... . rm:4  ... . .. .. .... \
+                 &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
 
 {
   TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
     URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
     SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
     SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
+
+    LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
+    ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
+    UQRSHLL64_rr 1110101 0010 1 ... 1 ....  ... 1  0000 1101  @mve_shl_rr
+    SQRSHRL64_rr 1110101 0010 1 ... 1 ....  ... 1  0010 1101  @mve_shl_rr
+    UQRSHLL48_rr 1110101 0010 1 ... 1 ....  ... 1  1000 1101  @mve_shl_rr
+    SQRSHRL48_rr 1110101 0010 1 ... 1 ....  ... 1  1010 1101  @mve_shl_rr
   ]
 
   MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
   ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi
+
+  # v8.1M CSEL and friends
+  CSEL           1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
 }
 {
   MVN_rxri       1110101 0011 . 1111 0 ... .... .... ....     @s_rxr_shi
@@ -XXX,XX +XXX,XX @@ SBC_rrri         1110101 1011 . .... 0 ... .... .... ....     @s_rrr_shi
 }
 RSB_rrri         1110101 1110 . .... 0 ... .... .... ....     @s_rrr_shi
 
-# v8.1M CSEL and friends
-CSEL             1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
-
 # Data-processing (register-shifted register)
 
 MOV_rxrr         1111 1010 0 shty:2 s:1 rm:4 1111 rd:4 0000 rs:4 \
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
     return rdm;
 }
 
+uint64_t HELPER(mve_sshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_sqrshl_d(n, -(int8_t)shift, false, NULL);
+}
+
+uint64_t HELPER(mve_ushll)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_uqrshl_d(n, (int8_t)shift, false, NULL);
+}
+
 uint64_t HELPER(mve_sqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
 {
     return do_sqrshl_d(n, (int8_t)shift, false, &env->QF);
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
 {
     return do_uqrshl_d(n, (int8_t)shift, false, &env->QF);
 }
+
+uint64_t HELPER(mve_sqrshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_sqrshl_d(n, -(int8_t)shift, true, &env->QF);
+}
+
+uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_uqrshl_d(n, (int8_t)shift, true, &env->QF);
+}
+
+/* Operate on 64-bit values, but saturate at 48 bits */
+static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
+                                    bool round, uint32_t *sat)
+{
+    if (shift <= -48) {
+        /* Rounding the sign bit always produces 0. */
+        if (round) {
+            return 0;
+        }
+        return src >> 63;
+    } else if (shift < 0) {
+        if (round) {
+            src >>= -shift - 1;
+            return (src >> 1) + (src & 1);
+        }
+        return src >> -shift;
+    } else if (shift < 48) {
+        /* Saturate unless shifting back recovers src (no bits were lost) */
+        int64_t extval = sextract64(src << shift, 0, 48);
+        if (!sat || src == (extval >> shift)) {
+            return extval;
+        }
+    } else if (!sat || src == 0) {
+        return 0;
+    }
+
+    *sat = 1;
+    return src >= 0 ? MAKE_64BIT_MASK(0, 47) : MAKE_64BIT_MASK(47, 17);
+}
+
+/* Operate on 64-bit values, but saturate at 48 bits */
+static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
+                                     bool round, uint32_t *sat)
+{
+    uint64_t val, extval;
+
+    if (shift <= -(48 + round)) {
+        return 0;
+    } else if (shift < 0) {
+        if (round) {
+            val = src >> (-shift - 1);
+            val = (val >> 1) + (val & 1);
+        } else {
+            val = src >> -shift;
+        }
+        extval = extract64(val, 0, 48);
+        if (!sat || val == extval) {
+            return extval;
+        }
+    } else if (shift < 48) {
+        /* Saturate unless shifting back recovers src (no bits were lost) */
+        extval = extract64(src << shift, 0, 48);
+        if (!sat || src == (extval >> shift)) {
+            return extval;
+        }
+    } else if (!sat || src == 0) {
+        return 0;
+    }
+
+    *sat = 1;
+    return MAKE_64BIT_MASK(0, 48);
+}
+
+uint64_t HELPER(mve_sqrshrl48)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_sqrshl48_d(n, -(int8_t)shift, true, &env->QF);
+}
+
+uint64_t HELPER(mve_uqrshll48)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_uqrshl48_d(n, (int8_t)shift, true, &env->QF);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_URSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
     return do_mve_shl_ri(s, a, gen_urshr64_i64);
 }
 
+static bool do_mve_shl_rr(DisasContext *s, arg_mve_shl_rr *a, WideShiftFn *fn)
+{
+    TCGv_i64 rda;
+    TCGv_i32 rdalo, rdahi;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
+        return false;
+    }
+    if (a->rdahi == 15) {
+        /* These are a different encoding (SQSHL/SRSHR/UQSHL/URSHR) */
+        return false;
+    }
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
+        a->rdahi == 13 || a->rm == 13 || a->rm == 15 ||
+        a->rm == a->rdahi || a->rm == a->rdalo) {
+        /* These rdahi/rdalo/rm cases are UNPREDICTABLE; we choose to UNDEF */
+        unallocated_encoding(s);
+        return true;
+    }
+
+    rda = tcg_temp_new_i64();
+    rdalo = load_reg(s, a->rdalo);
+    rdahi = load_reg(s, a->rdahi);
+    tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
+
+    /* The helper takes care of the sign-extension of the low 8 bits of Rm */
+    fn(rda, cpu_env, rda, cpu_R[a->rm]);
+
+    tcg_gen_extrl_i64_i32(rdalo, rda);
+    tcg_gen_extrh_i64_i32(rdahi, rda);
+    store_reg(s, a->rdalo, rdalo);
+    store_reg(s, a->rdahi, rdahi);
+    tcg_temp_free_i64(rda);
+
+    return true;
+}
+
+static bool trans_LSLL_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_ushll);
+}
+
+static bool trans_ASRL_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_sshrl);
+}
+
+static bool trans_UQRSHLL64_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll);
+}
+
+static bool trans_SQRSHRL64_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl);
+}
+
+static bool trans_UQRSHLL48_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll48);
+}
+
+static bool trans_SQRSHRL48_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl48);
+}
+
 /*
  * Multiply and multiply accumulate
  */
-- 
2.20.1

Implement the MVE shifts by immediate, which perform shifts
on a single general-purpose register.

These patterns overlap with the long-shift-by-immediates,
so we have to rearrange the grouping a little here.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-18-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  3 ++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 31 ++++++++++++++-----
 target/arm/mve_helper.c | 10 ++++++
 target/arm/translate.c  | 68 +++++++++++++++++++++++++++++++++++++++--
 5 files changed, 104 insertions(+), 9 deletions(-)

Implement the MVE shifts by register, which perform
shifts on a single general-purpose register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-19-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  2 ++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 18 ++++++++++++++----
 target/arm/mve_helper.c | 10 ++++++++++
 target/arm/translate.c  | 30 ++++++++++++++++++++++++++++++
 5 files changed, 57 insertions(+), 4 deletions(-)