Series comparison

-[PULL 00/37] target-arm queue
+[PULL 00/24] target-arm queue
-Nothing much exciting here, but it's 37 patches worth...
+The following changes since commit 5a67d7735d4162630769ef495cf813244fc850df:
-thanks
+  Merge remote-tracking branch 'remotes/berrange-gitlab/tags/tls-deps-pull-request' into staging (2021-07-02 08:22:39 +0100)
 -- PMM
 The following changes since commit e64a62df378a746c0b257105959613c9f8122e59:
   Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-040320-1' into staging (2020-03-05 12:13:51 +0000)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200305
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210702
-for you to fetch changes up to 597d61a3b1f94c53a3aaa77671697c0c5f797dbf:
+for you to fetch changes up to 04ea4d3cfd0a21b248ece8eb7a9436a3d9898dd8:
-  target/arm: Clean address for DC ZVA (2020-03-05 16:09:21 +0000)
+  target/arm: Implement MVE shifts by register (2021-07-02 11:48:38 +0100)
 ----------------------------------------------------------------
- * versal: Implement ADMA
+target-arm queue:
- * Implement (trivially) ARMv8.2-TTCNP
+ * more MVE instructions
- * hw/arm/smmu-common: a fix to smmu_find_smmu_pcibus
+ * hw/gpio/gpio_pwr: use shutdown function for reboot
- * Remove unnecessary endianness-handling on some boards
+ * target/arm: Check NaN mode before silencing NaN
- * Avoid minor memory leaks from timer_new in some devices
+ * tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
- * Honour more of the HCR_EL2 trap bits
+ * hw/arm: Add basic power management to raspi.
- * Complain rather than ignoring bad command line options for cubieboard
+ * docs/system/arm: Add quanta-gbs-bmc, quanta-q7l1-bmc
  * Honour TBI for DC ZVA and exception return
 ----------------------------------------------------------------
-Edgar E. Iglesias (2):
+Joe Komlodi (1):
-      hw/arm: versal: Add support for the LPD ADMAs
+      target/arm: Check NaN mode before silencing NaN
       hw/arm: versal: Generate xlnx-versal-virt zdma FDT nodes
-Eric Auger (1):
+Maxim Uvarov (1):
-      hw/arm/smmu-common: a fix to smmu_find_smmu_pcibus
+      hw/gpio/gpio_pwr: use shutdown function for reboot
-Niek Linnenbank (4):
+Nolan Leake (1):
-      hw/arm/cubieboard: use ARM Cortex-A8 as the default CPU in machine definition
+      hw/arm: Add basic power management to raspi.
       hw/arm/cubieboard: restrict allowed CPU type to ARM Cortex-A8
       hw/arm/cubieboard: restrict allowed RAM size to 512MiB and 1GiB
       hw/arm/cubieboard: report error when using unsupported -bios argument
-Pan Nengyuan (4):
+Patrick Venture (2):
-      hw/arm/pxa2xx: move timer_new from init() into realize() to avoid memleaks
+      docs/system/arm: Add quanta-q7l1-bmc reference
-      hw/arm/spitz: move timer_new from init() into realize() to avoid memleaks
+      docs/system/arm: Add quanta-gbs-bmc reference
       hw/arm/strongarm: move timer_new from init() into realize() to avoid memleaks
       hw/timer/cadence_ttc: move timer_new from init() into realize() to avoid memleaks
-Peter Maydell (1):
+Peter Maydell (18):
-      target/arm: Implement (trivially) ARMv8.2-TTCNP
+      target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation
       target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
       target/arm: Make asimd_imm_const() public
       target/arm: Use asimd_imm_const for A64 decode
       target/arm: Use dup_const() instead of bitfield_replicate()
       target/arm: Implement MVE logical immediate insns
       target/arm: Implement MVE vector shift left by immediate insns
       target/arm: Implement MVE vector shift right by immediate insns
       target/arm: Implement MVE VSHLL
       target/arm: Implement MVE VSRI, VSLI
       target/arm: Implement MVE VSHRN, VRSHRN
       target/arm: Implement MVE saturating narrowing shifts
       target/arm: Implement MVE VSHLC
       target/arm: Implement MVE VADDLV
       target/arm: Implement MVE long shifts by immediate
       target/arm: Implement MVE long shifts by register
       target/arm: Implement MVE shifts by immediate
       target/arm: Implement MVE shifts by register
-Philippe Mathieu-Daudé (6):
+Philippe Mathieu-Daudé (1):
-      hw/arm/smmu-common: Simplify smmu_find_smmu_pcibus() logic
+      tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
       hw/arm/gumstix: Simplify since the machines are little-endian only
       hw/arm/mainstone: Simplify since the machines are little-endian only
       hw/arm/omap_sx1: Simplify since the machines are little-endian only
       hw/arm/z2: Simplify since the machines are little-endian only
       hw/arm/musicpal: Simplify since the machines are little-endian only
-Richard Henderson (19):
+ docs/system/arm/aspeed.rst             |   1 +
-      target/arm: Improve masking of HCR/HCR2 RES0 bits
+ docs/system/arm/nuvoton.rst            |   5 +-
-      target/arm: Add HCR_EL2 bit definitions from ARMv8.6
+ include/hw/arm/bcm2835_peripherals.h   |   3 +-
-      target/arm: Disable has_el2 and has_el3 for user-only
+ include/hw/misc/bcm2835_powermgt.h     |  29 ++
-      target/arm: Remove EL2 and EL3 setup from user-only
+ target/arm/helper-mve.h                | 108 +++++++
-      target/arm: Improve masking in arm_hcr_el2_eff
+ target/arm/translate.h                 |  41 +++
-      target/arm: Honor the HCR_EL2.{TVM,TRVM} bits
+ target/arm/mve.decode                  | 177 ++++++++++-
-      target/arm: Honor the HCR_EL2.TSW bit
+ target/arm/t32.decode                  |  71 ++++-
-      target/arm: Honor the HCR_EL2.TACR bit
+ hw/arm/bcm2835_peripherals.c           |  13 +-
-      target/arm: Honor the HCR_EL2.TPCP bit
+ hw/gpio/gpio_pwr.c                     |   2 +-
-      target/arm: Honor the HCR_EL2.TPU bit
+ hw/misc/bcm2835_powermgt.c             | 160 ++++++++++
-      target/arm: Honor the HCR_EL2.TTLB bit
+ target/arm/helper-a64.c                |  12 +-
-      tests/tcg/aarch64: Add newline in pauth-1 printf
+ target/arm/mve_helper.c                | 524 +++++++++++++++++++++++++++++++--
-      target/arm: Replicate TBI/TBID bits for single range regimes
+ target/arm/translate-a64.c             |  86 +-----
-      target/arm: Optimize cpu_mmu_index
+ target/arm/translate-mve.c             | 261 +++++++++++++++-
-      target/arm: Introduce core_to_aa64_mmu_idx
+ target/arm/translate-neon.c            |  81 -----
-      target/arm: Apply TBI to ESR_ELx in helper_exception_return
+ target/arm/translate.c                 | 327 +++++++++++++++++++-
-      target/arm: Move helper_dc_zva to helper-a64.c
+ target/arm/vfp_helper.c                |  24 +-
-      target/arm: Use DEF_HELPER_FLAGS for helper_dc_zva
+ hw/misc/meson.build                    |   1 +
-      target/arm: Clean address for DC ZVA
+ tests/acceptance/boot_linux_console.py |  43 +++
 files changed, 1760 insertions(+), 209 deletions(-)
  create mode 100644 include/hw/misc/bcm2835_powermgt.h
  create mode 100644 hw/misc/bcm2835_powermgt.c
- include/hw/arm/xlnx-versal.h |   6 +
- target/arm/cpu.h             |  30 ++--
- target/arm/helper-a64.h      |   1 +
- target/arm/helper.h          |   1 -
- target/arm/internals.h       |   6 +
- hw/arm/cubieboard.c          |  29 +++-
- hw/arm/gumstix.c             |  16 +-
- hw/arm/mainstone.c           |   8 +-
- hw/arm/musicpal.c            |  10 --
- hw/arm/omap_sx1.c            |  11 +-
- hw/arm/pxa2xx.c              |  17 +-
- hw/arm/smmu-common.c         |  20 +--
- hw/arm/spitz.c               |   8 +-
- hw/arm/strongarm.c           |  18 ++-
- hw/arm/xlnx-versal-virt.c    |  28 ++++
- hw/arm/xlnx-versal.c         |  24 +++
- hw/arm/z2.c                  |   8 +-
- hw/timer/cadence_ttc.c       |  18 ++-
- target/arm/cpu.c             |  13 +-
- target/arm/cpu64.c           |   2 +
- target/arm/helper-a64.c      | 114 ++++++++++++-
- target/arm/helper.c          | 373 ++++++++++++++++++++++++++++++-------------
- target/arm/op_helper.c       |  93 -----------
- target/arm/translate-a64.c   |   4 +-
- tests/tcg/aarch64/pauth-1.c  |   2 +-
-files changed, 551 insertions(+), 309 deletions(-)

-[PULL 04/37] hw/arm/smmu-common: a fix to smmu_find_smmu_pcibus
+[PULL 01/24] docs/system/arm: Add quanta-q7l1-bmc reference
-From: Eric Auger <eric.auger@redhat.com>
+From: Patrick Venture <venture@google.com>
-Make sure a null SMMUPciBus is returned in case we were
+Adds a line-item reference to the supported quanta-q71l-bmc aspeed
-not able to identify a pci bus matching the @bus_num.
+entry.
-This matches the fix done on intel iommu in commit:
+Signed-off-by: Patrick Venture <venture@google.com>
-a2e1cd41ccfe796529abfd1b6aeb1dd4393762a2
+Reviewed-by: Cédric Le Goater <clg@kaod.org>
+Message-id: 20210615192848.1065297-2-venture@google.com
 Signed-off-by: Eric Auger <eric.auger@redhat.com>
 Reviewed-by: Peter Xu <peterx@redhat.com>
 Message-Id: <20200226172628.17449-1-eric.auger@redhat.com>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/smmu-common.c | 1 +
+ docs/system/arm/aspeed.rst | 1 +
 file changed, 1 insertion(+)
-diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
+diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/smmu-common.c
+--- a/docs/system/arm/aspeed.rst
-+++ b/hw/arm/smmu-common.c
++++ b/docs/system/arm/aspeed.rst
-@@ -XXX,XX +XXX,XX @@ SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num)
+@@ -XXX,XX +XXX,XX @@ etc.
-                 return smmu_pci_bus;
+ AST2400 SoC based machines :
-             }
-         }
+ - ``palmetto-bmc``         OpenPOWER Palmetto POWER8 BMC
-+        smmu_pci_bus = NULL;
++- ``quanta-q71l-bmc``      OpenBMC Quanta BMC
-     }
-     return smmu_pci_bus;
+ AST2500 SoC based machines :
- }
 --
 .20.1

-[PULL 36/37] target/arm: Use DEF_HELPER_FLAGS for helper_dc_zva
+[PULL 02/24] docs/system/arm: Add quanta-gbs-bmc reference
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Patrick Venture <venture@google.com>
-The function does not write registers, and only reads them by
+Add line item reference to quanta-gbs-bmc machine.
 implication via the exception path.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Patrick Venture <venture@google.com>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Cédric Le Goater <clg@kaod.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Message-id: 20210615192848.1065297-3-venture@google.com
-Message-id: 20200302175829.2183-7-richard.henderson@linaro.org
+[PMM: fixed underline Sphinx warning]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper-a64.h | 2 +-
+ docs/system/arm/nuvoton.rst | 5 +++--
-file changed, 1 insertion(+), 1 deletion(-)
+file changed, 3 insertions(+), 2 deletions(-)
-diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
+diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-a64.h
+--- a/docs/system/arm/nuvoton.rst
-+++ b/target/arm/helper-a64.h
++++ b/docs/system/arm/nuvoton.rst
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(advsimd_f16touinth, i32, f16, ptr)
+@@ -XXX,XX +XXX,XX @@
- DEF_HELPER_2(sqrt_f16, f16, f16, ptr)
+-Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
+-=====================================================
- DEF_HELPER_2(exception_return, void, env, i64)
++Nuvoton iBMC boards (``*-bmc``, ``npcm750-evb``, ``quanta-gsj``)
--DEF_HELPER_2(dc_zva, void, env, i64)
++================================================================
-+DEF_HELPER_FLAGS_2(dc_zva, TCG_CALL_NO_WG, void, env, i64)
+ The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
- DEF_HELPER_FLAGS_3(pacia, TCG_CALL_NO_WG, i64, env, i64, i64)
+ designed to be used as Baseboard Management Controllers (BMCs) in various
- DEF_HELPER_FLAGS_3(pacib, TCG_CALL_NO_WG, i64, env, i64, i64)
+@@ -XXX,XX +XXX,XX @@ segment. The following machines are based on this chip :
  The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
  Hyperscale applications. The following machines are based on this chip :
 +- ``quanta-gbs-bmc``    Quanta GBS server BMC
  - ``quanta-gsj``        Quanta GSJ server BMC
  There are also two more SoCs, NPCM710 and NPCM705, which are single-core
 --
 .20.1

-[PULL 32/37] target/arm: Optimize cpu_mmu_index
+[PULL 03/24] hw/arm: Add basic power management to raspi.
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Nolan Leake <nolan@sigbus.net>
-We now cache the core mmu_idx in env->hflags.  Rather than recompute
+This is just enough to make reboot and poweroff work. Works for
-from scratch, extract the field.  All of the uses of cpu_mmu_index
+Linux, U-Boot, and Arm Trusted Firmware. Not tested, but should
-within target/arm are within helpers, and env->hflags is always stable
+work for Plan 9 and bare-metal/hobby OSes, since they seem to generally
-within a translation block from whence helpers are called.
+do what Linux does for reset.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+The watchdog timer functionality is not yet implemented.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20200302175829.2183-3-richard.henderson@linaro.org
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/64
 Signed-off-by: Nolan Leake <nolan@sigbus.net>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20210625210209.1870217-1-nolan@sigbus.net
 [PMM: tweaked commit title; fixed region size to 0x200;
  moved header file to include/]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h    | 23 +++++++++++++----------
+ include/hw/arm/bcm2835_peripherals.h |   3 +-
- target/arm/helper.c |  5 -----
+ include/hw/misc/bcm2835_powermgt.h   |  29 +++++
-files changed, 13 insertions(+), 15 deletions(-)
+ hw/arm/bcm2835_peripherals.c         |  13 ++-
+ hw/misc/bcm2835_powermgt.c           | 160 +++++++++++++++++++++++++++
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+ hw/misc/meson.build                  |   1 +
 files changed, 204 insertions(+), 2 deletions(-)
  create mode 100644 include/hw/misc/bcm2835_powermgt.h
  create mode 100644 hw/misc/bcm2835_powermgt.c
 diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/include/hw/arm/bcm2835_peripherals.h
-+++ b/target/arm/cpu.h
++++ b/include/hw/arm/bcm2835_peripherals.h
-@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdxBit {
+@@ -XXX,XX +XXX,XX @@
+ #include "hw/misc/bcm2835_mphi.h"
- #define MMU_USER_IDX 0
+ #include "hw/misc/bcm2835_thermal.h"
+ #include "hw/misc/bcm2835_cprman.h"
--/**
++#include "hw/misc/bcm2835_powermgt.h"
-- * cpu_mmu_index:
+ #include "hw/sd/sdhci.h"
-- * @env: The cpu environment
+ #include "hw/sd/bcm2835_sdhost.h"
-- * @ifetch: True for code access, false for data access.
+ #include "hw/gpio/bcm2835_gpio.h"
-- *
+@@ -XXX,XX +XXX,XX @@ struct BCM2835PeripheralState {
-- * Return the core mmu index for the current translation regime.
+     BCM2835MphiState mphi;
-- * This function is used by generic TCG code paths.
+     UnimplementedDeviceState txp;
-- */
+     UnimplementedDeviceState armtmr;
--int cpu_mmu_index(CPUARMState *env, bool ifetch);
+-    UnimplementedDeviceState powermgt;
--
++    BCM2835PowerMgtState powermgt;
- /* Indexes used when registering address spaces with cpu_address_space_init */
+     BCM2835CprmanState cprman;
- typedef enum ARMASIdx {
+     PL011State uart0;
-     ARMASIdx_NS = 0,
+     BCM2835AuxState aux;
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, BTYPE, 10, 2)         /* Not cached. */
+diff --git a/include/hw/misc/bcm2835_powermgt.h b/include/hw/misc/bcm2835_powermgt.h
- FIELD(TBFLAG_A64, TBID, 12, 2)
+new file mode 100644
- FIELD(TBFLAG_A64, UNPRIV, 14, 1)
+index XXXXXXX..XXXXXXX
+--- /dev/null
-+/**
++++ b/include/hw/misc/bcm2835_powermgt.h
-+ * cpu_mmu_index:
+@@ -XXX,XX +XXX,XX @@
-+ * @env: The cpu environment
++/*
-+ * @ifetch: True for code access, false for data access.
++ * BCM2835 Power Management emulation
 + *
-+ * Return the core mmu index for the current translation regime.
++ * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
-+ * This function is used by generic TCG code paths.
++ * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
-+static inline int cpu_mmu_index(CPUARMState *env, bool ifetch)
++
-+{
++#ifndef BCM2835_POWERMGT_H
-+    return FIELD_EX32(env->hflags, TBFLAG_ANY, MMUIDX);
++#define BCM2835_POWERMGT_H
-+}
++
-+
++#include "hw/sysbus.h"
- static inline bool bswap_code(bool sctlr_b)
++#include "qom/object.h"
- {
++
- #ifdef CONFIG_USER_ONLY
++#define TYPE_BCM2835_POWERMGT "bcm2835-powermgt"
-diff --git a/target/arm/helper.c b/target/arm/helper.c
++OBJECT_DECLARE_SIMPLE_TYPE(BCM2835PowerMgtState, BCM2835_POWERMGT)
 +
 +struct BCM2835PowerMgtState {
 +    SysBusDevice busdev;
 +    MemoryRegion iomem;
 +
 +    uint32_t rstc;
 +    uint32_t rsts;
 +    uint32_t wdog;
 +};
 +
 +#endif
 diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/hw/arm/bcm2835_peripherals.c
-+++ b/target/arm/helper.c
++++ b/hw/arm/bcm2835_peripherals.c
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_init(Object *obj)
-     return arm_mmu_idx_el(env, arm_current_el(env));
      object_property_add_const_link(OBJECT(&s->dwc2), "dma-mr",
                                     OBJECT(&s->gpu_bus_mr));
 +
 +    /* Power Management */
 +    object_initialize_child(obj, "powermgt", &s->powermgt,
 +                            TYPE_BCM2835_POWERMGT);
  }
--int cpu_mmu_index(CPUARMState *env, bool ifetch)
+ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
--{
+@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
--    return arm_to_core_mmu_idx(arm_mmu_idx(env));
+         qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
--}
+                                INTERRUPT_USB));
--
- #ifndef CONFIG_USER_ONLY
++    /* Power Management */
- ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
++    if (!sysbus_realize(SYS_BUS_DEVICE(&s->powermgt), errp)) {
- {
++        return;
 +    }
 +
 +    memory_region_add_subregion(&s->peri_mr, PM_OFFSET,
 +                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->powermgt), 0));
 +
      create_unimp(s, &s->txp, "bcm2835-txp", TXP_OFFSET, 0x1000);
      create_unimp(s, &s->armtmr, "bcm2835-sp804", ARMCTRL_TIMER0_1_OFFSET, 0x40);
 -    create_unimp(s, &s->powermgt, "bcm2835-powermgt", PM_OFFSET, 0x114);
      create_unimp(s, &s->i2s, "bcm2835-i2s", I2S_OFFSET, 0x100);
      create_unimp(s, &s->smi, "bcm2835-smi", SMI_OFFSET, 0x100);
      create_unimp(s, &s->spi[0], "bcm2835-spi0", SPI0_OFFSET, 0x20);
 diff --git a/hw/misc/bcm2835_powermgt.c b/hw/misc/bcm2835_powermgt.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/misc/bcm2835_powermgt.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * BCM2835 Power Management emulation
 + *
 + * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
 + * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/log.h"
 +#include "qemu/module.h"
 +#include "hw/misc/bcm2835_powermgt.h"
 +#include "migration/vmstate.h"
 +#include "sysemu/runstate.h"
 +
 +#define PASSWORD 0x5a000000
 +#define PASSWORD_MASK 0xff000000
 +
 +#define R_RSTC 0x1c
 +#define V_RSTC_RESET 0x20
 +#define R_RSTS 0x20
 +#define V_RSTS_POWEROFF 0x555 /* Linux uses partition 63 to indicate halt. */
 +#define R_WDOG 0x24
 +
 +static uint64_t bcm2835_powermgt_read(void *opaque, hwaddr offset,
 +                                      unsigned size)
 +{
 +    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
 +    uint32_t res = 0;
 +
 +    switch (offset) {
 +    case R_RSTC:
 +        res = s->rstc;
 +        break;
 +    case R_RSTS:
 +        res = s->rsts;
 +        break;
 +    case R_WDOG:
 +        res = s->wdog;
 +        break;
 +
 +    default:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "bcm2835_powermgt_read: Unknown offset 0x%08"HWADDR_PRIx
 +                      "\n", offset);
 +        res = 0;
 +        break;
 +    }
 +
 +    return res;
 +}
 +
 +static void bcm2835_powermgt_write(void *opaque, hwaddr offset,
 +                                   uint64_t value, unsigned size)
 +{
 +    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
 +
 +    if ((value & PASSWORD_MASK) != PASSWORD) {
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "bcm2835_powermgt_write: Bad password 0x%"PRIx64
 +                      " at offset 0x%08"HWADDR_PRIx"\n",
 +                      value, offset);
 +        return;
 +    }
 +
 +    value = value & ~PASSWORD_MASK;
 +
 +    switch (offset) {
 +    case R_RSTC:
 +        s->rstc = value;
 +        if (value & V_RSTC_RESET) {
 +            if ((s->rsts & 0xfff) == V_RSTS_POWEROFF) {
 +                qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
 +            } else {
 +                qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
 +            }
 +        }
 +        break;
 +    case R_RSTS:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "bcm2835_powermgt_write: RSTS\n");
 +        s->rsts = value;
 +        break;
 +    case R_WDOG:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "bcm2835_powermgt_write: WDOG\n");
 +        s->wdog = value;
 +        break;
 +
 +    default:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "bcm2835_powermgt_write: Unknown offset 0x%08"HWADDR_PRIx
 +                      "\n", offset);
 +        break;
 +    }
 +}
 +
 +static const MemoryRegionOps bcm2835_powermgt_ops = {
 +    .read = bcm2835_powermgt_read,
 +    .write = bcm2835_powermgt_write,
 +    .endianness = DEVICE_NATIVE_ENDIAN,
 +    .impl.min_access_size = 4,
 +    .impl.max_access_size = 4,
 +};
 +
 +static const VMStateDescription vmstate_bcm2835_powermgt = {
 +    .name = TYPE_BCM2835_POWERMGT,
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32(rstc, BCM2835PowerMgtState),
 +        VMSTATE_UINT32(rsts, BCM2835PowerMgtState),
 +        VMSTATE_UINT32(wdog, BCM2835PowerMgtState),
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
 +
 +static void bcm2835_powermgt_init(Object *obj)
 +{
 +    BCM2835PowerMgtState *s = BCM2835_POWERMGT(obj);
 +
 +    memory_region_init_io(&s->iomem, obj, &bcm2835_powermgt_ops, s,
 +                          TYPE_BCM2835_POWERMGT, 0x200);
 +    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem);
 +}
 +
 +static void bcm2835_powermgt_reset(DeviceState *dev)
 +{
 +    BCM2835PowerMgtState *s = BCM2835_POWERMGT(dev);
 +
 +    /* https://elinux.org/BCM2835_registers#PM */
 +    s->rstc = 0x00000102;
 +    s->rsts = 0x00001000;
 +    s->wdog = 0x00000000;
 +}
 +
 +static void bcm2835_powermgt_class_init(ObjectClass *klass, void *data)
 +{
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +
 +    dc->reset = bcm2835_powermgt_reset;
 +    dc->vmsd = &vmstate_bcm2835_powermgt;
 +}
 +
 +static TypeInfo bcm2835_powermgt_info = {
 +    .name          = TYPE_BCM2835_POWERMGT,
 +    .parent        = TYPE_SYS_BUS_DEVICE,
 +    .instance_size = sizeof(BCM2835PowerMgtState),
 +    .class_init    = bcm2835_powermgt_class_init,
 +    .instance_init = bcm2835_powermgt_init,
 +};
 +
 +static void bcm2835_powermgt_register_types(void)
 +{
 +    type_register_static(&bcm2835_powermgt_info);
 +}
 +
 +type_init(bcm2835_powermgt_register_types)
 diff --git a/hw/misc/meson.build b/hw/misc/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/meson.build
 +++ b/hw/misc/meson.build
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_RASPI', if_true: files(
    'bcm2835_rng.c',
    'bcm2835_thermal.c',
    'bcm2835_cprman.c',
 +  'bcm2835_powermgt.c',
  ))
  softmmu_ss.add(when: 'CONFIG_SLAVIO', if_true: files('slavio_misc.c'))
  softmmu_ss.add(when: 'CONFIG_ZYNQ', if_true: files('zynq_slcr.c', 'zynq-xadc.c'))
 --
 .20.1
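
For readers following the register protocol in hw/misc/bcm2835_powermgt.c, the write path above can be sketched as a small Python model. This is only an illustration, not QEMU code: the class name `PowerMgtModel` and the string return values are hypothetical, while the constants and reset values are copied from the patch.

```python
# Hypothetical Python model of bcm2835_powermgt_write(); not part of QEMU.
PASSWORD = 0x5A000000
PASSWORD_MASK = 0xFF000000

R_RSTC = 0x1C
R_RSTS = 0x20
R_WDOG = 0x24
V_RSTC_RESET = 0x20
V_RSTS_POWEROFF = 0x555  # Linux uses partition 63 to indicate halt


class PowerMgtModel:
    def __init__(self):
        # Reset values as set in bcm2835_powermgt_reset() in the patch.
        self.rstc = 0x00000102
        self.rsts = 0x00001000
        self.wdog = 0x00000000

    def write(self, offset, value):
        """Return 'shutdown', 'reset', 'stored', 'ignored' or 'unknown'."""
        # Writes without the 0x5a password in the top byte are dropped.
        if (value & PASSWORD_MASK) != PASSWORD:
            return "ignored"
        # Strip the password byte before storing the value.
        value &= ~PASSWORD_MASK & 0xFFFFFFFF

        if offset == R_RSTC:
            self.rstc = value
            if value & V_RSTC_RESET:
                # RSTS decides whether the guest asked for halt or reboot.
                if (self.rsts & 0xFFF) == V_RSTS_POWEROFF:
                    return "shutdown"
                return "reset"
            return "stored"
        if offset == R_RSTS:
            self.rsts = value
            return "stored"
        if offset == R_WDOG:
            self.wdog = value
            return "stored"
        return "unknown"


# Halt sequence: mark the poweroff partition in RSTS, then request a reset.
pm = PowerMgtModel()
pm.write(R_RSTS, PASSWORD | V_RSTS_POWEROFF)
print(pm.write(R_RSTC, PASSWORD | V_RSTC_RESET))
```

In this model a reset request on a freshly reset device yields a reboot, because the low 12 bits of the RSTS reset value do not match the poweroff marker.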

-[PULL 06/37] hw/arm/gumstix: Simplify since the machines are little-endian only
+[PULL 04/24] tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
 From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-As the Connex and Verdex machines only boot in little-endian,
+Add a test that boots and quickly shuts down a raspi2 machine,
-we can simplify the code.
+to test the power management model:
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+   (1/1) tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_raspi2_initrd:
   console: [    0.000000] Booting Linux on physical CPU 0xf00
   console: [    0.000000] Linux version 4.14.98-v7+ (dom@dom-XPS-13-9370) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1200 SMP Tue Feb 12 20:27:48 GMT 2019
   console: [    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
   console: [    0.000000] CPU: div instructions available: patching division code
   console: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
   console: [    0.000000] OF: fdt: Machine model: Raspberry Pi 2 Model B
   ...
   console: Boot successful.
   console: cat /proc/cpuinfo
   console: / # cat /proc/cpuinfo
   ...
   console: processor      : 3
   console: model name     : ARMv7 Processor rev 5 (v7l)
   console: BogoMIPS       : 125.00
   console: Features       : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
   console: CPU implementer        : 0x41
   console: CPU architecture: 7
   console: CPU variant    : 0x0
   console: CPU part       : 0xc07
   console: CPU revision   : 5
   console: Hardware       : BCM2835
   console: Revision       : 0000
   console: Serial         : 0000000000000000
   console: cat /proc/iomem
   console: / # cat /proc/iomem
   console: 00000000-3bffffff : System RAM
   console: 00008000-00afffff : Kernel code
   console: 00c00000-00d468ef : Kernel data
   console: 3f006000-3f006fff : dwc_otg
   console: 3f007000-3f007eff : /soc/dma@7e007000
   console: 3f00b880-3f00b8bf : /soc/mailbox@7e00b880
   console: 3f100000-3f100027 : /soc/watchdog@7e100000
   console: 3f101000-3f102fff : /soc/cprman@7e101000
   console: 3f200000-3f2000b3 : /soc/gpio@7e200000
   PASS (24.59 s)
   RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
   JOB TIME   : 25.02 s
 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
 Message-id: 20210531113837.1689775-1-f4bug@amsat.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/gumstix.c | 16 ++--------------
+ tests/acceptance/boot_linux_console.py | 43 ++++++++++++++++++++++++++
-file changed, 2 insertions(+), 14 deletions(-)
+file changed, 43 insertions(+)
-diff --git a/hw/arm/gumstix.c b/hw/arm/gumstix.c
+diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/gumstix.c
+--- a/tests/acceptance/boot_linux_console.py
-+++ b/hw/arm/gumstix.c
++++ b/tests/acceptance/boot_linux_console.py
-@@ -XXX,XX +XXX,XX @@ static void connex_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@
- {
+ from avocado import skip
-     PXA2xxState *cpu;
+ from avocado import skipUnless
-     DriveInfo *dinfo;
+ from avocado_qemu import Test
--    int be;
++from avocado_qemu import exec_command
-     MemoryRegion *address_space_mem = get_system_memory();
+ from avocado_qemu import exec_command_and_wait_for_pattern
+ from avocado_qemu import interrupt_interactive_console_until_pattern
-     uint32_t connex_rom = 0x01000000;
+ from avocado_qemu import wait_for_console_pattern
-@@ -XXX,XX +XXX,XX @@ static void connex_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ def test_arm_raspi2_uart0(self):
-         exit(1);
+         """
-     }
+         self.do_test_arm_raspi2(0)
--#ifdef TARGET_WORDS_BIGENDIAN
++    def test_arm_raspi2_initrd(self):
--    be = 1;
++        """
--#else
++        :avocado: tags=arch:arm
--    be = 0;
++        :avocado: tags=machine:raspi2
--#endif
++        """
-     if (!pflash_cfi01_register(0x00000000, "connext.rom", connex_rom,
++        deb_url = ('http://archive.raspberrypi.org/debian/'
-                                dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
++                   'pool/main/r/raspberrypi-firmware/'
--                               sector_len, 2, 0, 0, 0, 0, be)) {
++                   'raspberrypi-kernel_1.20190215-1_armhf.deb')
-+                               sector_len, 2, 0, 0, 0, 0, 0)) {
++        deb_hash = 'cd284220b32128c5084037553db3c482426f3972'
-         error_report("Error registering flash memory");
++        deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
-         exit(1);
++        kernel_path = self.extract_from_deb(deb_path, '/boot/kernel7.img')
-     }
++        dtb_path = self.extract_from_deb(deb_path, '/boot/bcm2709-rpi-2-b.dtb')
-@@ -XXX,XX +XXX,XX @@ static void verdex_init(MachineState *machine)
++
- {
++        initrd_url = ('https://github.com/groeck/linux-build-test/raw/'
-     PXA2xxState *cpu;
++                      '2eb0a73b5d5a28df3170c546ddaaa9757e1e0848/rootfs/'
-     DriveInfo *dinfo;
++                      'arm/rootfs-armv7a.cpio.gz')
--    int be;
++        initrd_hash = '604b2e45cdf35045846b8bbfbf2129b1891bdc9c'
-     MemoryRegion *address_space_mem = get_system_memory();
++        initrd_path_gz = self.fetch_asset(initrd_url, asset_hash=initrd_hash)
++        initrd_path = os.path.join(self.workdir, 'rootfs.cpio')
-     uint32_t verdex_rom = 0x02000000;
++        archive.gzip_uncompress(initrd_path_gz, initrd_path)
-@@ -XXX,XX +XXX,XX @@ static void verdex_init(MachineState *machine)
++
-         exit(1);
++        self.vm.set_console()
-     }
++        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
++                               'earlycon=pl011,0x3f201000 console=ttyAMA0 '
--#ifdef TARGET_WORDS_BIGENDIAN
++                               'panic=-1 noreboot ' +
--    be = 1;
++                               'dwc_otg.fiq_fsm_enable=0')
--#else
++        self.vm.add_args('-kernel', kernel_path,
--    be = 0;
++                         '-dtb', dtb_path,
--#endif
++                         '-initrd', initrd_path,
-     if (!pflash_cfi01_register(0x00000000, "verdex.rom", verdex_rom,
++                         '-append', kernel_command_line,
-                                dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
++                         '-no-reboot')
--                               sector_len, 2, 0, 0, 0, 0, be)) {
++        self.vm.launch()
-+                               sector_len, 2, 0, 0, 0, 0, 0)) {
++        self.wait_for_console_pattern('Boot successful.')
-         error_report("Error registering flash memory");
++
-         exit(1);
++        exec_command_and_wait_for_pattern(self, 'cat /proc/cpuinfo',
-     }
++                                                'BCM2835')
 +        exec_command_and_wait_for_pattern(self, 'cat /proc/iomem',
 +                                                '/soc/cprman@7e101000')
 +        exec_command(self, 'halt')
 +        # Wait for VM to shut down gracefully
 +        self.vm.wait()
 +
      def test_arm_exynos4210_initrd(self):
          """
          :avocado: tags=arch:arm
 --
 .20.1

-[PULL 34/37] target/arm: Apply TBI to ESR_ELx in helper_exception_return
+[PULL 05/24] target/arm: Check NaN mode before silencing NaN
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Joe Komlodi <joe.komlodi@xilinx.com>
-We missed this case within AArch64.ExceptionReturn.
+If the CPU is running in default NaN mode (FPCR.DN == 1) and we execute
 FRSQRTE, FRECPE, or FRECPX with a signaling NaN, parts_silence_nan_frac() will
 assert due to fpst->default_nan_mode being set.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+To avoid this, we check to see what NaN mode we're running in before we call
 floatxx_silence_nan().
 Signed-off-by: Joe Komlodi <joe.komlodi@xilinx.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 1624662174-175828-2-git-send-email-joe.komlodi@xilinx.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20200302175829.2183-5-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper-a64.c | 23 ++++++++++++++++++++++-
+ target/arm/helper-a64.c | 12 +++++++++---
- 1 file changed, 22 insertions(+), 1 deletion(-)
+ target/arm/vfp_helper.c | 24 ++++++++++++++++++------
  2 files changed, 27 insertions(+), 9 deletions(-)
 diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper-a64.c
 +++ b/target/arm/helper-a64.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(frecpx_f16)(uint32_t a, void *fpstp)
-                       "AArch32 EL%d PC 0x%" PRIx32 "\n",
+         float16 nan = a;
-                       cur_el, new_el, env->regs[15]);
+         if (float16_is_signaling_nan(a, fpst)) {
-     } else {
+             float_raise(float_flag_invalid, fpst);
-+        int tbii;
+-            nan = float16_silence_nan(a, fpst);
-+
++            if (!fpst->default_nan_mode) {
-         env->aarch64 = 1;
++                nan = float16_silence_nan(a, fpst);
-         spsr &= aarch64_pstate_valid_mask(&env_archcpu(env)->isar);
++            }
          pstate_write(env, spsr);
@@ -XXX,XX +XXX,XX @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
              env->pstate &= ~PSTATE_SS;
          }
-         aarch64_restore_sp(env, new_el);
+         if (fpst->default_nan_mode) {
--        env->pc = new_pc;
+             nan = float16_default_nan(fpst);
-         helper_rebuild_hflags_a64(env, new_el);
+@@ -XXX,XX +XXX,XX @@ float32 HELPER(frecpx_f32)(float32 a, void *fpstp)
-+
+         float32 nan = a;
-+        /*
+         if (float32_is_signaling_nan(a, fpst)) {
-+         * Apply TBI to the exception return address.  We had to delay this
+             float_raise(float_flag_invalid, fpst);
-+         * until after we selected the new EL, so that we could select the
+-            nan = float32_silence_nan(a, fpst);
-+         * correct TBI+TBID bits.  This is made easier by waiting until after
++            if (!fpst->default_nan_mode) {
-+         * the hflags rebuild, since we can pull the composite TBII field
++                nan = float32_silence_nan(a, fpst);
 +         * from there.
 +         */
 +        tbii = FIELD_EX32(env->hflags, TBFLAG_A64, TBII);
 +        if ((tbii >> extract64(new_pc, 55, 1)) & 1) {
 +            /* TBI is enabled. */
 +            int core_mmu_idx = cpu_mmu_index(env, false);
 +            if (regime_has_2_ranges(core_to_aa64_mmu_idx(core_mmu_idx))) {
 +                new_pc = sextract64(new_pc, 0, 56);
 +            } else {
 +                new_pc = extract64(new_pc, 0, 56);
 +            }
-+        }
+         }
-+        env->pc = new_pc;
+         if (fpst->default_nan_mode) {
-+
+             nan = float32_default_nan(fpst);
-         qemu_log_mask(CPU_LOG_INT, "Exception return from AArch64 EL%d to "
+@@ -XXX,XX +XXX,XX @@ float64 HELPER(frecpx_f64)(float64 a, void *fpstp)
-                       "AArch64 EL%d PC 0x%" PRIx64 "\n",
+         float64 nan = a;
-                       cur_el, new_el, env->pc);
+         if (float64_is_signaling_nan(a, fpst)) {
              float_raise(float_flag_invalid, fpst);
 -            nan = float64_silence_nan(a, fpst);
 +            if (!fpst->default_nan_mode) {
 +                nan = float64_silence_nan(a, fpst);
 +            }
          }
          if (fpst->default_nan_mode) {
              nan = float64_default_nan(fpst);
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, void *fpstp)
          float16 nan = f16;
          if (float16_is_signaling_nan(f16, fpst)) {
              float_raise(float_flag_invalid, fpst);
 -            nan = float16_silence_nan(f16, fpst);
 +            if (!fpst->default_nan_mode) {
 +                nan = float16_silence_nan(f16, fpst);
 +            }
          }
          if (fpst->default_nan_mode) {
              nan =  float16_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, void *fpstp)
          float32 nan = f32;
          if (float32_is_signaling_nan(f32, fpst)) {
              float_raise(float_flag_invalid, fpst);
 -            nan = float32_silence_nan(f32, fpst);
 +            if (!fpst->default_nan_mode) {
 +                nan = float32_silence_nan(f32, fpst);
 +            }
          }
          if (fpst->default_nan_mode) {
              nan =  float32_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, void *fpstp)
          float64 nan = f64;
          if (float64_is_signaling_nan(f64, fpst)) {
              float_raise(float_flag_invalid, fpst);
 -            nan = float64_silence_nan(f64, fpst);
 +            if (!fpst->default_nan_mode) {
 +                nan = float64_silence_nan(f64, fpst);
 +            }
          }
          if (fpst->default_nan_mode) {
              nan =  float64_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, void *fpstp)
          float16 nan = f16;
          if (float16_is_signaling_nan(f16, s)) {
              float_raise(float_flag_invalid, s);
 -            nan = float16_silence_nan(f16, s);
 +            if (!s->default_nan_mode) {
 +                nan = float16_silence_nan(f16, fpstp);
 +            }
          }
          if (s->default_nan_mode) {
              nan =  float16_default_nan(s);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, void *fpstp)
          float32 nan = f32;
          if (float32_is_signaling_nan(f32, s)) {
              float_raise(float_flag_invalid, s);
 -            nan = float32_silence_nan(f32, s);
 +            if (!s->default_nan_mode) {
 +                nan = float32_silence_nan(f32, fpstp);
 +            }
          }
          if (s->default_nan_mode) {
              nan =  float32_default_nan(s);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, void *fpstp)
          float64 nan = f64;
          if (float64_is_signaling_nan(f64, s)) {
              float_raise(float_flag_invalid, s);
 -            nan = float64_silence_nan(f64, s);
 +            if (!s->default_nan_mode) {
 +                nan = float64_silence_nan(f64, fpstp);
 +            }
          }
          if (s->default_nan_mode) {
              nan =  float64_default_nan(s);
 --
 .20.1
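
For illustration, the NaN handling that the frecpx/recpe/rsqrte hunks above
converge on can be modelled outside QEMU.  This is only a sketch, not QEMU
code: it works on raw binary32 bit patterns instead of softfloat types, and
the names nan_result(), f32_is_any_nan(), f32_is_signaling_nan() and the
F32_* macros are invented for this example.  It assumes the Arm
single-precision default NaN encoding 0x7fc00000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IEEE 754 binary32: exponent all ones plus a nonzero fraction is a NaN. */
#define F32_EXP_MASK    0x7f800000u
#define F32_FRAC_MASK   0x007fffffu
#define F32_QUIET_BIT   0x00400000u  /* fraction MSB: set = quiet NaN */
#define F32_DEFAULT_NAN 0x7fc00000u  /* assumed Arm default NaN encoding */

static bool f32_is_any_nan(uint32_t a)
{
    return (a & F32_EXP_MASK) == F32_EXP_MASK && (a & F32_FRAC_MASK) != 0;
}

static bool f32_is_signaling_nan(uint32_t a)
{
    return f32_is_any_nan(a) && !(a & F32_QUIET_BIT);
}

/*
 * Hypothetical stand-in for the NaN branch of the helpers in the patch.
 * 'a' must be a NaN.  *invalid reports whether Invalid Operation would
 * be raised (i.e. the input was a signaling NaN).
 */
static uint32_t nan_result(uint32_t a, bool default_nan_mode, bool *invalid)
{
    uint32_t nan = a;

    assert(f32_is_any_nan(a));
    *invalid = false;
    if (f32_is_signaling_nan(a)) {
        *invalid = true;
        /* Only silence when the result is not about to be replaced. */
        if (!default_nan_mode) {
            nan = a | F32_QUIET_BIT;
        }
    }
    if (default_nan_mode) {
        nan = F32_DEFAULT_NAN;
    }
    return nan;
}
```

With the signaling input 0x7f800001 the sketch returns 0x7fc00001 when
default_nan_mode is false and 0x7fc00000 when it is true; *invalid is set
in both cases.  The silencing step is never reached in default NaN mode,
which is the situation the commit message says would otherwise assert.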

-[PULL 02/37] hw/arm: versal: Generate xlnx-versal-virt zdma FDT nodes
+[PULL 06/24] hw/gpio/gpio_pwr: use shutdown function for reboot
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+From: Maxim Uvarov <maxim.uvarov@linaro.org>
-Generate xlnx-versal-virt zdma FDT nodes.
+QEMU has 2 types of functions: shutdown and reboot. Shutdown
 function has to be used for machine shutdown. Otherwise we cause
 a reset with a bogus "cause" value, when we intended a shutdown.
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
-Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: KONRAD Frederic <frederic.konrad@adacore.com>
+Message-id: 20210625111842.3790-3-maxim.uvarov@linaro.org
-Reviewed-by: Luc Michel <luc.michel@greensocs.com>
+[PMM: tweaked commit message]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/xlnx-versal-virt.c | 28 ++++++++++++++++++++++++++++
+ hw/gpio/gpio_pwr.c | 2 +-
- 1 file changed, 28 insertions(+)
+ 1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
+diff --git a/hw/gpio/gpio_pwr.c b/hw/gpio/gpio_pwr.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/xlnx-versal-virt.c
+--- a/hw/gpio/gpio_pwr.c
-+++ b/hw/arm/xlnx-versal-virt.c
++++ b/hw/gpio/gpio_pwr.c
-@@ -XXX,XX +XXX,XX @@ static void fdt_add_gem_nodes(VersalVirt *s)
+@@ -XXX,XX +XXX,XX @@ static void gpio_pwr_reset(void *opaque, int n, int level)
  static void gpio_pwr_shutdown(void *opaque, int n, int level)
  {
      if (level) {
 -        qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
 +        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
      }
  }
-+static void fdt_add_zdma_nodes(VersalVirt *s)
-+{
-+    const char clocknames[] = "clk_main\0clk_apb";
-+    const char compat[] = "xlnx,zynqmp-dma-1.0";
-+    int i;
-+
-+    for (i = XLNX_VERSAL_NR_ADMAS - 1; i >= 0; i--) {
-+        uint64_t addr = MM_ADMA_CH0 + MM_ADMA_CH0_SIZE * i;
-+        char *name = g_strdup_printf("/dma@%" PRIx64, addr);
-+
-+        qemu_fdt_add_subnode(s->fdt, name);
-+
-+        qemu_fdt_setprop_cell(s->fdt, name, "xlnx,bus-width", 64);
-+        qemu_fdt_setprop_cells(s->fdt, name, "clocks",
-+                               s->phandle.clk_25Mhz, s->phandle.clk_25Mhz);
-+        qemu_fdt_setprop(s->fdt, name, "clock-names",
-+                         clocknames, sizeof(clocknames));
-+        qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
-+                               GIC_FDT_IRQ_TYPE_SPI, VERSAL_ADMA_IRQ_0 + i,
-+                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
-+        qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
-+                                     2, addr, 2, 0x1000);
-+        qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
-+        g_free(name);
-+    }
-+}
-+
- static void fdt_nop_memory_nodes(void *fdt, Error **errp)
- {
-     Error *err = NULL;
-@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
-     fdt_add_uart_nodes(s);
-     fdt_add_gic_nodes(s);
-     fdt_add_timer_nodes(s);
-+    fdt_add_zdma_nodes(s);
-     fdt_add_cpu_nodes(s, psci_conduit);
-     fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
-     fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
 --
 .20.1

-[PULL 31/37] target/arm: Replicate TBI/TBID bits for single range regimes
+[PULL 07/24] target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation
-From: Richard Henderson <richard.henderson@linaro.org>
+In do_ldst(), the calculation of the offset needs to be based on the
 size of the memory access, not the size of the elements in the
 vector.  This meant we were getting it wrong for the widening and
 narrowing variants of the various VLDR and VSTR insns.
-Replicate the single TBI bit from TCR_EL2 and TCR_EL3 so that
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-we can unconditionally use pointer bit 55 to index into our
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-composite TBI1:TBI0 field.
+Message-id: 20210628135835.6690-2-peter.maydell@linaro.org
 ---
  target/arm/translate-mve.c | 17 +++++++++--------
  1 file changed, 9 insertions(+), 8 deletions(-)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Message-id: 20200302175829.2183-2-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.c | 6 ++++--
  1 file changed, 4 insertions(+), 2 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/translate-mve.c
-+++ b/target/arm/helper.c
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
+@@ -XXX,XX +XXX,XX @@ static bool mve_skip_first_beat(DisasContext *s)
      } else if (mmu_idx == ARMMMUIdx_Stage2) {
          return 0; /* VTCR_EL2 */
      } else {
 -        return extract32(tcr, 20, 1);
 +        /* Replicate the single TBI bit so we always have 2 bits.  */
 +        return extract32(tcr, 20, 1) * 3;
      }
  }
-@@ -XXX,XX +XXX,XX @@ static int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx)
+-static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
-     } else if (mmu_idx == ARMMMUIdx_Stage2) {
++static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn,
-         return 0; /* VTCR_EL2 */
++                    unsigned msize)
-     } else {
+ {
--        return extract32(tcr, 29, 1);
+     TCGv_i32 addr;
-+        /* Replicate the single TBID bit so we always have 2 bits.  */
+     uint32_t offset;
-+        return extract32(tcr, 29, 1) * 3;
+@@ -XXX,XX +XXX,XX @@ static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
          return true;
      }
+-    offset = a->imm << a->size;
++    offset = a->imm << msize;
+     if (!a->a) {
+         offset = -offset;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
+         { gen_helper_mve_vstrw, gen_helper_mve_vldrw },
+         { NULL, NULL }
+     };
+-    return do_ldst(s, a, ldstfns[a->size][a->l]);
++    return do_ldst(s, a, ldstfns[a->size][a->l], a->size);
  }
+-#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST)                  \
++#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST, MSIZE)           \
+     static bool trans_##OP(DisasContext *s, arg_VLDR_VSTR *a)   \
+     {                                                           \
+         static MVEGenLdStFn * const ldstfns[2][2] = {           \
+             { gen_helper_mve_##ST, gen_helper_mve_##SLD },      \
+             { NULL, gen_helper_mve_##ULD },                     \
+         };                                                      \
+-        return do_ldst(s, a, ldstfns[a->u][a->l]);              \
++        return do_ldst(s, a, ldstfns[a->u][a->l], MSIZE);       \
+     }
+-DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
+-DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
+-DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
++DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h, MO_8)
++DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w, MO_8)
++DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w, MO_16)
+ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
+ {
 --
 .20.1
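
As a rough illustration of the offset fix in do_ldst(), the calculation can
be reduced to a standalone function.  This is a sketch under assumptions,
not the QEMU implementation: ldst_offset() and the SK_MO_* constants are
names made up here, with the constants standing for the log2 of the memory
access size in bytes in the same way the patch passes MO_8 and MO_16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical log2(access size in bytes) values for this sketch. */
enum { SK_MO_8 = 0, SK_MO_16 = 1, SK_MO_32 = 2 };

/*
 * Scale the immediate by the memory access size (not the vector
 * element size), then negate it when the "add" bit is clear.
 */
static uint32_t ldst_offset(uint32_t imm, unsigned msize, bool add)
{
    uint32_t offset;

    assert(msize <= SK_MO_32);
    offset = imm << msize;
    if (!add) {
        /* Unsigned negation wraps modulo 2^32, giving a negative offset. */
        offset = -offset;
    }
    return offset;
}
```

For a byte-sized access with imm = 3, ldst_offset(3, SK_MO_8, true) gives 3,
whereas shifting by a 16-bit element size (SK_MO_16) would give 6.  That is
the kind of mismatch the commit message describes for the widening and
narrowing forms, where the access size and the element size differ.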

-[PULL 35/37] target/arm: Move helper_dc_zva to helper-a64.c
+[PULL 08/24] target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
-From: Richard Henderson <richard.henderson@linaro.org>
+The initial implementation of the MVE VRMLALDAVH and VRMLSLDAVH
 insns had some bugs:
  * the 32x32 multiply of elements was being done as 32x32->32,
    not 32x32->64
  * we were incorrectly maintaining the accumulator in its full
    72-bit form across all 4 beats of the insn; in the pseudocode
    it is squashed back into the 64 bits of the RdaHi:RdaLo
    registers after each beat
-This is an aarch64-only function.  Move it out of the shared file.
+In particular, fixing the second of these allows us to recast
-This patch is code movement only.
+the implementation to avoid 128-bit arithmetic entirely.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Since the element size here is always 4, we can also drop the
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+parameterization of ESIZE to make the code a little more readable.
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20200302175829.2183-6-richard.henderson@linaro.org
+Suggested-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-3-peter.maydell@linaro.org
 ---
- target/arm/helper-a64.h |  1 +
+ target/arm/mve_helper.c | 38 +++++++++++++++++++++-----------------
- target/arm/helper.h     |  1 -
+ 1 file changed, 21 insertions(+), 17 deletions(-)
  target/arm/helper-a64.c | 91 ++++++++++++++++++++++++++++++++++++++++
  target/arm/op_helper.c  | 93 -----------------------------------------
  4 files changed, 92 insertions(+), 94 deletions(-)
-diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-a64.h
+--- a/target/arm/mve_helper.c
-+++ b/target/arm/helper-a64.h
++++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(advsimd_f16touinth, i32, f16, ptr)
  DEF_HELPER_2(sqrt_f16, f16, f16, ptr)
  DEF_HELPER_2(exception_return, void, env, i64)
 +DEF_HELPER_2(dc_zva, void, env, i64)
  DEF_HELPER_FLAGS_3(pacia, TCG_CALL_NO_WG, i64, env, i64, i64)
  DEF_HELPER_FLAGS_3(pacib, TCG_CALL_NO_WG, i64, env, i64, i64)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(crypto_sm4ekey, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
  DEF_HELPER_FLAGS_3(crc32, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
  DEF_HELPER_FLAGS_3(crc32c, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
 -DEF_HELPER_2(dc_zva, void, env, i64)
  DEF_HELPER_FLAGS_5(gvec_qrdmlah_s16, TCG_CALL_NO_RWG,
                     void, ptr, ptr, ptr, ptr, i32)
 diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper-a64.c
 +++ b/target/arm/helper-a64.c
 @@ -XXX,XX +XXX,XX @@
   */
  #include "qemu/osdep.h"
-+#include "qemu/units.h"
+-#include "qemu/int128.h"
  #include "cpu.h"
- #include "exec/gdbstub.h"
+ #include "internals.h"
- #include "exec/helper-proto.h"
+ #include "vec_internal.h"
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sqrt_f16)(uint32_t a, void *fpstp)
+@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
-     return float16_sqrt(a, s);
+ DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
- }
+ /*
-+void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
+- * Rounding multiply add long dual accumulate high: we must keep
-+{
+- * a 72-bit internal accumulator value and return the top 64 bits.
-+    /*
++ * Rounding multiply add long dual accumulate high. In the pseudocode
-+     * Implement DC ZVA, which zeroes a fixed-length block of memory.
++ * this is implemented with a 72-bit internal accumulator value of which
-+     * Note that we do not implement the (architecturally mandated)
++ * the top 64 bits are returned. We optimize this to avoid having to
-+     * alignment fault for attempts to use this on Device memory
++ * use 128-bit arithmetic -- we can do this because the 72-bit accumulator
-+     * (which matches the usual QEMU behaviour of not implementing either
++ * is squashed back into 64-bits after each beat.
 +     * alignment faults or any memory attribute handling).
 +     */
 +    ARMCPU *cpu = env_archcpu(env);
 +    uint64_t blocklen = 4 << cpu->dcz_blocksize;
 +    uint64_t vaddr = vaddr_in & ~(blocklen - 1);
 +
 +#ifndef CONFIG_USER_ONLY
 +    {
 +        /*
 +         * Slightly awkwardly, QEMU's TARGET_PAGE_SIZE may be less than
 +         * the block size so we might have to do more than one TLB lookup.
 +         * We know that in fact for any v8 CPU the page size is at least 4K
 +         * and the block size must be 2K or less, but TARGET_PAGE_SIZE is only
 +         * 1K as an artefact of legacy v5 subpage support being present in the
 +         * same QEMU executable. So in practice the hostaddr[] array has
 +         * two entries, given the current setting of TARGET_PAGE_BITS_MIN.
 +         */
 +        int maxidx = DIV_ROUND_UP(blocklen, TARGET_PAGE_SIZE);
 +        void *hostaddr[DIV_ROUND_UP(2 * KiB, 1 << TARGET_PAGE_BITS_MIN)];
 +        int try, i;
 +        unsigned mmu_idx = cpu_mmu_index(env, false);
 +        TCGMemOpIdx oi = make_memop_idx(MO_UB, mmu_idx);
 +
 +        assert(maxidx <= ARRAY_SIZE(hostaddr));
 +
 +        for (try = 0; try < 2; try++) {
 +
 +            for (i = 0; i < maxidx; i++) {
 +                hostaddr[i] = tlb_vaddr_to_host(env,
 +                                                vaddr + TARGET_PAGE_SIZE * i,
 +                                                1, mmu_idx);
 +                if (!hostaddr[i]) {
 +                    break;
 +                }
 +            }
 +            if (i == maxidx) {
 +                /*
 +                 * If it's all in the TLB it's fair game for just writing to;
 +                 * we know we don't need to update dirty status, etc.
 +                 */
 +                for (i = 0; i < maxidx - 1; i++) {
 +                    memset(hostaddr[i], 0, TARGET_PAGE_SIZE);
 +                }
 +                memset(hostaddr[i], 0, blocklen - (i * TARGET_PAGE_SIZE));
 +                return;
 +            }
 +            /*
 +             * OK, try a store and see if we can populate the tlb. This
 +             * might cause an exception if the memory isn't writable,
 +             * in which case we will longjmp out of here. We must for
 +             * this purpose use the actual register value passed to us
 +             * so that we get the fault address right.
 +             */
 +            helper_ret_stb_mmu(env, vaddr_in, 0, oi, GETPC());
 +            /* Now we can populate the other TLB entries, if any */
 +            for (i = 0; i < maxidx; i++) {
 +                uint64_t va = vaddr + TARGET_PAGE_SIZE * i;
 +                if (va != (vaddr_in & TARGET_PAGE_MASK)) {
 +                    helper_ret_stb_mmu(env, va, 0, oi, GETPC());
 +                }
 +            }
 +        }
 +
 +        /*
 +         * Slow path (probably attempt to do this to an I/O device or
 +         * similar, or clearing of a block of code we have translations
 +         * cached for). Just do a series of byte writes as the architecture
 +         * demands. It's not worth trying to use a cpu_physical_memory_map(),
 +         * memset(), unmap() sequence here because:
 +         *  + we'd need to account for the blocksize being larger than a page
 +         *  + the direct-RAM access case is almost always going to be dealt
 +         *    with in the fastpath code above, so there's no speed benefit
 +         *  + we would have to deal with the map returning NULL because the
 +         *    bounce buffer was in use
 +         */
 +        for (i = 0; i < blocklen; i++) {
 +            helper_ret_stb_mmu(env, vaddr + i, 0, oi, GETPC());
 +        }
 +    }
 +#else
 +    memset(g2h(vaddr), 0, blocklen);
 +#endif
 +}
 diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/op_helper.c
 +++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@
   * License along with this library; if not, see <http://www.gnu.org/licenses/>.
   */
- #include "qemu/osdep.h"
+-#define DO_LDAVH(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC, TO128)         \
--#include "qemu/units.h"
++#define DO_LDAVH(OP, TYPE, LTYPE, XCHG, SUB)                            \
- #include "qemu/log.h"
+     uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
- #include "qemu/main-loop.h"
+                                     void *vm, uint64_t a)               \
- #include "cpu.h"
+     {                                                                   \
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(ror_cc)(CPUARMState *env, uint32_t x, uint32_t i)
+         uint16_t mask = mve_element_mask(env);                          \
-         return ((uint32_t)x >> shift) | (x << (32 - shift));
+         unsigned e;                                                     \
          TYPE *n = vn, *m = vm;                                          \
 -        Int128 acc = int128_lshift(TO128(a), 8);                        \
 -        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
              if (mask & 1) {                                             \
 +                LTYPE mul;                                              \
                  if (e & 1) {                                            \
 -                    acc = ODDACC(acc, TO128(n[H##ESIZE(e - 1 * XCHG)] * \
 -                                            m[H##ESIZE(e)]));           \
 +                    mul = (LTYPE)n[H4(e - 1 * XCHG)] * m[H4(e)];        \
 +                    if (SUB) {                                          \
 +                        mul = -mul;                                     \
 +                    }                                                   \
                  } else {                                                \
 -                    acc = EVENACC(acc, TO128(n[H##ESIZE(e + 1 * XCHG)] * \
 -                                             m[H##ESIZE(e)]));          \
 +                    mul = (LTYPE)n[H4(e + 1 * XCHG)] * m[H4(e)];        \
                  }                                                       \
 -                acc = int128_add(acc, int128_make64(1 << 7));           \
 +                mul = (mul >> 8) + ((mul >> 7) & 1);                    \
 +                a += mul;                                               \
              }                                                           \
          }                                                               \
          mve_advance_vpt(env);                                           \
 -        return int128_getlo(int128_rshift(acc, 8));                     \
 +        return a;                                                       \
      }
- }
--
+-DO_LDAVH(vrmlaldavhsw, 4, int32_t, false, int128_add, int128_add, int128_makes64)
--void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
+-DO_LDAVH(vrmlaldavhxsw, 4, int32_t, true, int128_add, int128_add, int128_makes64)
--{
++DO_LDAVH(vrmlaldavhsw, int32_t, int64_t, false, false)
--    /*
++DO_LDAVH(vrmlaldavhxsw, int32_t, int64_t, true, false)
--     * Implement DC ZVA, which zeroes a fixed-length block of memory.
--     * Note that we do not implement the (architecturally mandated)
+-DO_LDAVH(vrmlaldavhuw, 4, uint32_t, false, int128_add, int128_add, int128_make64)
--     * alignment fault for attempts to use this on Device memory
++DO_LDAVH(vrmlaldavhuw, uint32_t, uint64_t, false, false)
--     * (which matches the usual QEMU behaviour of not implementing either
--     * alignment faults or any memory attribute handling).
+-DO_LDAVH(vrmlsldavhsw, 4, int32_t, false, int128_add, int128_sub, int128_makes64)
--     */
+-DO_LDAVH(vrmlsldavhxsw, 4, int32_t, true, int128_add, int128_sub, int128_makes64)
--
++DO_LDAVH(vrmlsldavhsw, int32_t, int64_t, false, true)
--    ARMCPU *cpu = env_archcpu(env);
++DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
--    uint64_t blocklen = 4 << cpu->dcz_blocksize;
--    uint64_t vaddr = vaddr_in & ~(blocklen - 1);
+ /* Vector add across vector */
--
+ #define DO_VADDV(OP, ESIZE, TYPE)                               \
 -#ifndef CONFIG_USER_ONLY
 -    {
 -        /*
 -         * Slightly awkwardly, QEMU's TARGET_PAGE_SIZE may be less than
 -         * the block size so we might have to do more than one TLB lookup.
 -         * We know that in fact for any v8 CPU the page size is at least 4K
 -         * and the block size must be 2K or less, but TARGET_PAGE_SIZE is only
 -         * 1K as an artefact of legacy v5 subpage support being present in the
 -         * same QEMU executable. So in practice the hostaddr[] array has
 -         * two entries, given the current setting of TARGET_PAGE_BITS_MIN.
 -         */
 -        int maxidx = DIV_ROUND_UP(blocklen, TARGET_PAGE_SIZE);
 -        void *hostaddr[DIV_ROUND_UP(2 * KiB, 1 << TARGET_PAGE_BITS_MIN)];
 -        int try, i;
 -        unsigned mmu_idx = cpu_mmu_index(env, false);
 -        TCGMemOpIdx oi = make_memop_idx(MO_UB, mmu_idx);
 -
 -        assert(maxidx <= ARRAY_SIZE(hostaddr));
 -
 -        for (try = 0; try < 2; try++) {
 -
 -            for (i = 0; i < maxidx; i++) {
 -                hostaddr[i] = tlb_vaddr_to_host(env,
 -                                                vaddr + TARGET_PAGE_SIZE * i,
 -                                                1, mmu_idx);
 -                if (!hostaddr[i]) {
 -                    break;
 -                }
 -            }
 -            if (i == maxidx) {
 -                /*
 -                 * If it's all in the TLB it's fair game for just writing to;
 -                 * we know we don't need to update dirty status, etc.
 -                 */
 -                for (i = 0; i < maxidx - 1; i++) {
 -                    memset(hostaddr[i], 0, TARGET_PAGE_SIZE);
 -                }
 -                memset(hostaddr[i], 0, blocklen - (i * TARGET_PAGE_SIZE));
 -                return;
 -            }
 -            /*
 -             * OK, try a store and see if we can populate the tlb. This
 -             * might cause an exception if the memory isn't writable,
 -             * in which case we will longjmp out of here. We must for
 -             * this purpose use the actual register value passed to us
 -             * so that we get the fault address right.
 -             */
 -            helper_ret_stb_mmu(env, vaddr_in, 0, oi, GETPC());
 -            /* Now we can populate the other TLB entries, if any */
 -            for (i = 0; i < maxidx; i++) {
 -                uint64_t va = vaddr + TARGET_PAGE_SIZE * i;
 -                if (va != (vaddr_in & TARGET_PAGE_MASK)) {
 -                    helper_ret_stb_mmu(env, va, 0, oi, GETPC());
 -                }
 -            }
 -        }
 -
 -        /*
 -         * Slow path (probably attempt to do this to an I/O device or
 -         * similar, or clearing of a block of code we have translations
 -         * cached for). Just do a series of byte writes as the architecture
 -         * demands. It's not worth trying to use a cpu_physical_memory_map(),
 -         * memset(), unmap() sequence here because:
 -         *  + we'd need to account for the blocksize being larger than a page
 -         *  + the direct-RAM access case is almost always going to be dealt
 -         *    with in the fastpath code above, so there's no speed benefit
 -         *  + we would have to deal with the map returning NULL because the
 -         *    bounce buffer was in use
 -         */
 -        for (i = 0; i < blocklen; i++) {
 -            helper_ret_stb_mmu(env, vaddr + i, 0, oi, GETPC());
 -        }
 -    }
 -#else
 -    memset(g2h(vaddr), 0, blocklen);
 -#endif
 -}
 --
 .20.1
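
The per-element arithmetic of the reworked DO_LDAVH macro can be sketched in
plain C for the signed, non-exchanging case.  This is a simplified model and
not the helper itself: vrmlaldavh_sketch() is a name invented here, and the
sketch leaves out predication (the mve_element_mask() handling), the XCHG
variants and the H4() host-endianness indexing.  It also assumes that right
shift of a negative int64_t is an arithmetic shift, which is
implementation-defined in ISO C, and that the accumulator does not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the signed rounding accumulate over the four
 * 32-bit elements of a 128-bit vector.  'sub' selects the VRMLSLDAVH
 * behaviour of negating the products of odd-numbered elements.
 */
static int64_t vrmlaldavh_sketch(const int32_t n[4], const int32_t m[4],
                                 int64_t a, bool sub)
{
    for (unsigned e = 0; e < 4; e++) {
        /* Widen before multiplying so the product is 32x32->64. */
        int64_t mul = (int64_t)n[e] * m[e];

        if (sub && (e & 1)) {
            mul = -mul;
        }
        /* Drop the low 8 bits, rounding on bit 7 of the product. */
        mul = (mul >> 8) + ((mul >> 7) & 1);
        a += mul;
    }
    return a;
}
```

Because each product is rounded and shifted before it is added, the
accumulator stays a 64-bit value throughout, which is why no 128-bit
arithmetic is needed in this formulation.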

-[PULL 24/37] target/arm: Honor the HCR_EL2.TPU bit
+[PULL 09/24] target/arm: Make asimd_imm_const() public
-From: Richard Henderson <richard.henderson@linaro.org>
+The function asimd_imm_const() in translate-neon.c is an
 implementation of the pseudocode AdvSIMDExpandImm(), which we will
 also want for MVE.  Move the implementation to translate.c, with a
 prototype in translate.h.
-This bit traps EL1 access to cache maintenance insns that operate
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-to the point of unification.  There are no longer any references to
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-plain aa64_cacheop_access, so remove it.
+Message-id: 20210628135835.6690-4-peter.maydell@linaro.org
 ---
  target/arm/translate.h      | 16 ++++++++++
  target/arm/translate-neon.c | 63 -------------------------------------
  target/arm/translate.c      | 57 +++++++++++++++++++++++++++++++++
 files changed, 73 insertions(+), 63 deletions(-)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/translate.h b/target/arm/translate.h
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200229012811.24129-11-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.c | 53 +++++++++++++++++++++++++++------------------
 file changed, 32 insertions(+), 21 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/translate.h
-+++ b/target/arm/helper.c
++++ b/target/arm/translate.h
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo uao_reginfo = {
+@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
-     .readfn = aa64_uao_read, .writefn = aa64_uao_write
+     return opc | s->be_data;
- };
+ }
--static CPAccessResult aa64_cacheop_access(CPUARMState *env,
++/**
--                                          const ARMCPRegInfo *ri,
++ * asimd_imm_const: Expand an encoded SIMD constant value
--                                          bool isread)
++ *
 + * Expand a SIMD constant value. This is essentially the pseudocode
 + * AdvSIMDExpandImm, except that we also perform the boolean NOT needed for
 + * VMVN and VBIC (when cmode < 14 && op == 1).
 + *
 + * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
 + * callers must catch this.
 + *
 + * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
 + * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
 + * we produce an immediate constant value of 0 in these cases.
 + */
 +uint64_t asimd_imm_const(uint32_t imm, int cmode, int op);
 +
  #endif /* TARGET_ARM_TRANSLATE_H */
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ DO_FP_2SH(VCVT_UH, gen_helper_gvec_vcvt_uh)
  DO_FP_2SH(VCVT_HS, gen_helper_gvec_vcvt_hs)
  DO_FP_2SH(VCVT_HU, gen_helper_gvec_vcvt_hu)
 -static uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
 -{
--    /* Cache invalidate/clean: NOP, but EL0 must UNDEF unless
+-    /*
--     * SCTLR_EL1.UCI is set.
+-     * Expand the encoded constant.
 -     * Note that cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 is UNPREDICTABLE.
 -     * We choose to not special-case this and will behave as if a
 -     * valid constant encoding of 0 had been given.
 -     * cmode = 15 op = 1 must UNDEF; we assume decode has handled that.
 -     */
--    if (arm_current_el(env) == 0 && !(arm_sctlr(env, 0) & SCTLR_UCI)) {
+-    switch (cmode) {
--        return CP_ACCESS_TRAP;
+-    case 0: case 1:
 -        /* no-op */
 -        break;
 -    case 2: case 3:
 -        imm <<= 8;
 -        break;
 -    case 4: case 5:
 -        imm <<= 16;
 -        break;
 -    case 6: case 7:
 -        imm <<= 24;
 -        break;
 -    case 8: case 9:
 -        imm |= imm << 16;
 -        break;
 -    case 10: case 11:
 -        imm = (imm << 8) | (imm << 24);
 -        break;
 -    case 12:
 -        imm = (imm << 8) | 0xff;
 -        break;
 -    case 13:
 -        imm = (imm << 16) | 0xffff;
 -        break;
 -    case 14:
 -        if (op) {
 -            /*
 -             * This is the only case where the top and bottom 32 bits
 -             * of the encoded constant differ.
 -             */
 -            uint64_t imm64 = 0;
 -            int n;
 -
 -            for (n = 0; n < 8; n++) {
 -                if (imm & (1 << n)) {
 -                    imm64 |= (0xffULL << (n * 8));
 -                }
 -            }
 -            return imm64;
 -        }
 -        imm |= (imm << 8) | (imm << 16) | (imm << 24);
 -        break;
 -    case 15:
 -        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
 -            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
 -        break;
 -    }
--    return CP_ACCESS_OK;
+-    if (op) {
 -        imm = ~imm;
 -    }
 -    return dup_const(MO_32, imm);
 -}
 -
- static CPAccessResult aa64_cacheop_poc_access(CPUARMState *env,
+ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
-                                               const ARMCPRegInfo *ri,
+                         GVecGen2iFn *fn)
-                                               bool isread)
+ {
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult aa64_cacheop_poc_access(CPUARMState *env,
+diff --git a/target/arm/translate.c b/target/arm/translate.c
-     return CP_ACCESS_OK;
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ void arm_translate_init(void)
      a64_translate_init();
  }
-+static CPAccessResult aa64_cacheop_pou_access(CPUARMState *env,
++uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
 +                                              const ARMCPRegInfo *ri,
 +                                              bool isread)
 +{
-+    /* Cache invalidate/clean to Point of Unification... */
++    /* Expand the encoded constant as per AdvSIMDExpandImm pseudocode */
-+    switch (arm_current_el(env)) {
++    switch (cmode) {
-+    case 0:
++    case 0: case 1:
-+        /* ... EL0 must UNDEF unless SCTLR_EL1.UCI is set.  */
++        /* no-op */
-+        if (!(arm_sctlr(env, 0) & SCTLR_UCI)) {
++        break;
-+            return CP_ACCESS_TRAP;
++    case 2: case 3:
 +        imm <<= 8;
 +        break;
 +    case 4: case 5:
 +        imm <<= 16;
 +        break;
 +    case 6: case 7:
 +        imm <<= 24;
 +        break;
 +    case 8: case 9:
 +        imm |= imm << 16;
 +        break;
 +    case 10: case 11:
 +        imm = (imm << 8) | (imm << 24);
 +        break;
 +    case 12:
 +        imm = (imm << 8) | 0xff;
 +        break;
 +    case 13:
 +        imm = (imm << 16) | 0xffff;
 +        break;
 +    case 14:
 +        if (op) {
 +            /*
 +             * This is the only case where the top and bottom 32 bits
 +             * of the encoded constant differ.
 +             */
 +            uint64_t imm64 = 0;
 +            int n;
 +
 +            for (n = 0; n < 8; n++) {
 +                if (imm & (1 << n)) {
 +                    imm64 |= (0xffULL << (n * 8));
 +                }
 +            }
 +            return imm64;
 +        }
-+        /* fall through */
++        imm |= (imm << 8) | (imm << 16) | (imm << 24);
-+    case 1:
++        break;
-+        /* ... EL1 must trap to EL2 if HCR_EL2.TPU is set.  */
++    case 15:
-+        if (arm_hcr_el2_eff(env) & HCR_TPU) {
++        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
-+            return CP_ACCESS_TRAP_EL2;
++            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
 +        }
 +        break;
 +    }
-+    return CP_ACCESS_OK;
++    if (op) {
 +        imm = ~imm;
 +    }
 +    return dup_const(MO_32, imm);
 +}
 +
- /* See: D4.7.2 TLB maintenance requirements and the TLB maintenance instructions
+ /* Generate a label used for skipping this instruction */
-  * Page D4-1736 (DDI0487A.b)
+ void arm_gen_condlabel(DisasContext *s)
-  */
+ {
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
      /* Cache ops: all NOPs since we don't emulate caches */
      { .name = "IC_IALLUIS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
 -      .access = PL1_W, .type = ARM_CP_NOP },
 +      .access = PL1_W, .type = ARM_CP_NOP,
 +      .accessfn = aa64_cacheop_pou_access },
      { .name = "IC_IALLU", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 0,
 -      .access = PL1_W, .type = ARM_CP_NOP },
 +      .access = PL1_W, .type = ARM_CP_NOP,
 +      .accessfn = aa64_cacheop_pou_access },
      { .name = "IC_IVAU", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 5, .opc2 = 1,
        .access = PL0_W, .type = ARM_CP_NOP,
 -      .accessfn = aa64_cacheop_access },
 +      .accessfn = aa64_cacheop_pou_access },
      { .name = "DC_IVAC", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
        .access = PL1_W, .accessfn = aa64_cacheop_poc_access,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
      { .name = "DC_CVAU", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 11, .opc2 = 1,
        .access = PL0_W, .type = ARM_CP_NOP,
 -      .accessfn = aa64_cacheop_access },
 +      .accessfn = aa64_cacheop_pou_access },
      { .name = "DC_CIVAC", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 14, .opc2 = 1,
        .access = PL0_W, .type = ARM_CP_NOP,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
        .writefn = tlbiipas2_is_write },
      /* 32 bit cache operations */
      { .name = "ICIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
 -      .type = ARM_CP_NOP, .access = PL1_W },
 +      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
      { .name = "BPIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 6,
        .type = ARM_CP_NOP, .access = PL1_W },
      { .name = "ICIALLU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 0,
 -      .type = ARM_CP_NOP, .access = PL1_W },
 +      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
      { .name = "ICIMVAU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 1,
 -      .type = ARM_CP_NOP, .access = PL1_W },
 +      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
      { .name = "BPIALL", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 6,
        .type = ARM_CP_NOP, .access = PL1_W },
      { .name = "BPIMVA", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 7,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
      { .name = "DCCSW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
        .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
      { .name = "DCCMVAU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 11, .opc2 = 1,
 -      .type = ARM_CP_NOP, .access = PL1_W },
 +      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
      { .name = "DCCIMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 1,
        .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_poc_access },
      { .name = "DCCISW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
 --
 .20.1

-[PULL 33/37] target/arm: Introduce core_to_aa64_mmu_idx
+[PULL 10/24] target/arm: Use asimd_imm_const for A64 decode
-From: Richard Henderson <richard.henderson@linaro.org>
+The A64 AdvSIMD modified-immediate grouping uses almost the same
 constant encoding that A32 Neon does; reuse asimd_imm_const() (to
 which we add the AArch64-specific case for cmode 15 op 1) instead of
 reimplementing it all.
-If by context we know that we're in AArch64 mode, we need not
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-test for M-profile when reconstructing the full ARMMMUIdx.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-5-peter.maydell@linaro.org
 ---
  target/arm/translate.h     |  3 +-
  target/arm/translate-a64.c | 86 ++++----------------------------------
  target/arm/translate.c     | 17 +++++++-
 files changed, 24 insertions(+), 82 deletions(-)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/translate.h b/target/arm/translate.h
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20200302175829.2183-4-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/internals.h     | 6 ++++++
  target/arm/translate-a64.c | 2 +-
 files changed, 7 insertions(+), 1 deletion(-)
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/internals.h
+--- a/target/arm/translate.h
-+++ b/target/arm/internals.h
++++ b/target/arm/translate.h
-@@ -XXX,XX +XXX,XX @@ static inline ARMMMUIdx core_to_arm_mmu_idx(CPUARMState *env, int mmu_idx)
+@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
-     }
+  * VMVN and VBIC (when cmode < 14 && op == 1).
- }
+  *
+  * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
-+static inline ARMMMUIdx core_to_aa64_mmu_idx(int mmu_idx)
+- * callers must catch this.
-+{
++ * callers must catch this; we return the 64-bit constant value defined
-+    /* AArch64 is always a-profile. */
++ * for AArch64.
-+    return mmu_idx | ARM_MMU_IDX_A;
+  *
-+}
+  * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
-+
+  * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
  int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx);
  /*
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
-     dc->condexec_mask = 0;
+ {
-     dc->condexec_cond = 0;
+     int rd = extract32(insn, 0, 5);
-     core_mmu_idx = FIELD_EX32(tb_flags, TBFLAG_ANY, MMUIDX);
+     int cmode = extract32(insn, 12, 4);
--    dc->mmu_idx = core_to_arm_mmu_idx(env, core_mmu_idx);
+-    int cmode_3_1 = extract32(cmode, 1, 3);
-+    dc->mmu_idx = core_to_aa64_mmu_idx(core_mmu_idx);
+-    int cmode_0 = extract32(cmode, 0, 1);
-     dc->tbii = FIELD_EX32(tb_flags, TBFLAG_A64, TBII);
+     int o2 = extract32(insn, 11, 1);
-     dc->tbid = FIELD_EX32(tb_flags, TBFLAG_A64, TBID);
+     uint64_t abcdefgh = extract32(insn, 5, 5) | (extract32(insn, 16, 3) << 5);
-     dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx);
+     bool is_neg = extract32(insn, 29, 1);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
          return;
      }
 -    /* See AdvSIMDExpandImm() in ARM ARM */
 -    switch (cmode_3_1) {
 -    case 0: /* Replicate(Zeros(24):imm8, 2) */
 -    case 1: /* Replicate(Zeros(16):imm8:Zeros(8), 2) */
 -    case 2: /* Replicate(Zeros(8):imm8:Zeros(16), 2) */
 -    case 3: /* Replicate(imm8:Zeros(24), 2) */
 -    {
 -        int shift = cmode_3_1 * 8;
 -        imm = bitfield_replicate(abcdefgh << shift, 32);
 -        break;
 -    }
 -    case 4: /* Replicate(Zeros(8):imm8, 4) */
 -    case 5: /* Replicate(imm8:Zeros(8), 4) */
 -    {
 -        int shift = (cmode_3_1 & 0x1) * 8;
 -        imm = bitfield_replicate(abcdefgh << shift, 16);
 -        break;
 -    }
 -    case 6:
 -        if (cmode_0) {
 -            /* Replicate(Zeros(8):imm8:Ones(16), 2) */
 -            imm = (abcdefgh << 16) | 0xffff;
 -        } else {
 -            /* Replicate(Zeros(16):imm8:Ones(8), 2) */
 -            imm = (abcdefgh << 8) | 0xff;
 -        }
 -        imm = bitfield_replicate(imm, 32);
 -        break;
 -    case 7:
 -        if (!cmode_0 && !is_neg) {
 -            imm = bitfield_replicate(abcdefgh, 8);
 -        } else if (!cmode_0 && is_neg) {
 -            int i;
 -            imm = 0;
 -            for (i = 0; i < 8; i++) {
 -                if ((abcdefgh) & (1 << i)) {
 -                    imm |= 0xffULL << (i * 8);
 -                }
 -            }
 -        } else if (cmode_0) {
 -            if (is_neg) {
 -                imm = (abcdefgh & 0x3f) << 48;
 -                if (abcdefgh & 0x80) {
 -                    imm |= 0x8000000000000000ULL;
 -                }
 -                if (abcdefgh & 0x40) {
 -                    imm |= 0x3fc0000000000000ULL;
 -                } else {
 -                    imm |= 0x4000000000000000ULL;
 -                }
 -            } else {
 -                if (o2) {
 -                    /* FMOV (vector, immediate) - half-precision */
 -                    imm = vfp_expand_imm(MO_16, abcdefgh);
 -                    /* now duplicate across the lanes */
 -                    imm = bitfield_replicate(imm, 16);
 -                } else {
 -                    imm = (abcdefgh & 0x3f) << 19;
 -                    if (abcdefgh & 0x80) {
 -                        imm |= 0x80000000;
 -                    }
 -                    if (abcdefgh & 0x40) {
 -                        imm |= 0x3e000000;
 -                    } else {
 -                        imm |= 0x40000000;
 -                    }
 -                    imm |= (imm << 32);
 -                }
 -            }
 -        }
 -        break;
 -    default:
 -        g_assert_not_reached();
 -    }
 -
 -    if (cmode_3_1 != 7 && is_neg) {
 -        imm = ~imm;
 +    if (cmode == 15 && o2 && !is_neg) {
 +        /* FMOV (vector, immediate) - half-precision */
 +        imm = vfp_expand_imm(MO_16, abcdefgh);
 +        /* now duplicate across the lanes */
 +        imm = bitfield_replicate(imm, 16);
 +    } else {
 +        imm = asimd_imm_const(abcdefgh, cmode, is_neg);
      }
      if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) {
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
      case 14:
          if (op) {
              /*
 -             * This is the only case where the top and bottom 32 bits
 -             * of the encoded constant differ.
 +             * This and cmode == 15 op == 1 are the only cases where
 +             * the top and bottom 32 bits of the encoded constant differ.
               */
              uint64_t imm64 = 0;
              int n;
@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
          imm |= (imm << 8) | (imm << 16) | (imm << 24);
          break;
      case 15:
 +        if (op) {
 +            /* Reserved encoding for AArch32; valid for AArch64 */
 +            uint64_t imm64 = (uint64_t)(imm & 0x3f) << 48;
 +            if (imm & 0x80) {
 +                imm64 |= 0x8000000000000000ULL;
 +            }
 +            if (imm & 0x40) {
 +                imm64 |= 0x3fc0000000000000ULL;
 +            } else {
 +                imm64 |= 0x4000000000000000ULL;
 +            }
 +            return imm64;
 +        }
          imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
              | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
          break;
 --
 .20.1
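
For readers who want to try the immediate expansion outside QEMU, here is
a minimal standalone sketch of the logic the two patches above end up with.
It follows the switch shown in the diff, including the cmode == 15 op == 1
case added for AArch64. The names expand_simd_imm() and dup32() are
illustrative assumptions, not QEMU identifiers; dup32() stands in for
dup_const(MO_32, ...), and the caller is still expected to reject
cmode == 15 op == 1 for AArch32.

```c
#include <stdint.h>

/* Hypothetical stand-in for dup_const(MO_32, x): copy x into both halves. */
static uint64_t dup32(uint32_t x)
{
    return ((uint64_t)x << 32) | x;
}

/* Sketch of the expansion; imm is the 8-bit "abcdefgh" field. */
uint64_t expand_simd_imm(uint32_t imm, int cmode, int op)
{
    uint64_t imm64 = 0;
    int n;

    switch (cmode) {
    case 0: case 1:
        break;                                  /* imm8 in byte 0 */
    case 2: case 3:
        imm <<= 8;
        break;
    case 4: case 5:
        imm <<= 16;
        break;
    case 6: case 7:
        imm <<= 24;
        break;
    case 8: case 9:
        imm |= imm << 16;                       /* 16-bit elements */
        break;
    case 10: case 11:
        imm = (imm << 8) | (imm << 24);
        break;
    case 12:
        imm = (imm << 8) | 0xff;                /* "ones" shifted in */
        break;
    case 13:
        imm = (imm << 16) | 0xffff;
        break;
    case 14:
        if (op) {
            /* Each bit of imm8 selects an all-ones byte of the result. */
            for (n = 0; n < 8; n++) {
                if (imm & (1u << n)) {
                    imm64 |= 0xffULL << (n * 8);
                }
            }
            return imm64;
        }
        imm |= (imm << 8) | (imm << 16) | (imm << 24);
        break;
    case 15:
        if (op) {
            /* Double-precision constant: AArch64 only. */
            imm64 = (uint64_t)(imm & 0x3f) << 48;
            if (imm & 0x80) {
                imm64 |= 0x8000000000000000ULL;
            }
            imm64 |= (imm & 0x40) ? 0x3fc0000000000000ULL
                                  : 0x4000000000000000ULL;
            return imm64;
        }
        /* Single-precision constant, replicated below. */
        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
            | ((imm & 0x40) ? (0x1fu << 25) : (1u << 30));
        break;
    }
    if (op) {
        imm = ~imm;                             /* VMVN/VBIC style inversion */
    }
    return dup32(imm);
}
```

With this sketch, imm8 = 0x70 and cmode = 15 gives the bit pattern of 1.0f
in both 32-bit halves for op = 0, and the bit pattern of the double 1.0 for
op = 1, which is a convenient sanity check for the floating-point cases.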

-[PULL 37/37] target/arm: Clean address for DC ZVA
+[PULL 11/24] target/arm: Use dup_const() instead of bitfield_replicate()
-From: Richard Henderson <richard.henderson@linaro.org>
+Use dup_const() instead of bitfield_replicate() in
 disas_simd_mod_imm().
-This data access was forgotten when we added support for cleaning
+(We can't replace the other use of bitfield_replicate() in this file,
-addresses of TBI information.
+in logic_imm_decode_wmask(), because that location needs to handle 2
 and 4 bit elements, which dup_const() cannot.)
-Fixes: 3a471103ac1823ba
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200302175829.2183-8-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-6-peter.maydell@linaro.org
 ---
  target/arm/translate-a64.c | 2 +-
 file changed, 1 insertion(+), 1 deletion(-)
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
-         return;
+         /* FMOV (vector, immediate) - half-precision */
-     case ARM_CP_DC_ZVA:
+         imm = vfp_expand_imm(MO_16, abcdefgh);
-         /* Writes clear the aligned block of memory which rt points into. */
+         /* now duplicate across the lanes */
--        tcg_rt = cpu_reg(s, rt);
+-        imm = bitfield_replicate(imm, 16);
-+        tcg_rt = clean_data_tbi(s, cpu_reg(s, rt));
++        imm = dup_const(MO_16, imm);
-         gen_helper_dc_zva(cpu_env, tcg_rt);
+     } else {
-         return;
+         imm = asimd_imm_const(abcdefgh, cmode, is_neg);
-     default:
+     }
 --
 .20.1
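
The commit message above notes that dup_const() cannot handle 2 and 4 bit
elements. The sketch below illustrates why that distinction can arise. It
assumes a doubling-loop replicator in the spirit of bitfield_replicate()
and a replicator that only knows whole-byte element sizes in the spirit of
dup_const(). Both functions and their names are hypothetical stand-ins
written for this note, not the QEMU implementations.

```c
#include <stdint.h>

/*
 * Replicate the low 'width' bits of 'pattern' across 64 bits.
 * Assumes width is a power of two (1..64) and pattern fits in width bits.
 */
uint64_t replicate_bits(uint64_t pattern, unsigned width)
{
    while (width < 64) {
        pattern |= pattern << width;    /* double the populated span */
        width *= 2;
    }
    return pattern;
}

/*
 * Replicate an element whose size is 1 << log2_bytes bytes.
 * Only 8, 16, 32 and 64 bit elements can be expressed this way.
 */
uint64_t dup_elem(unsigned log2_bytes, uint64_t value)
{
    switch (log2_bytes) {
    case 0:
        return 0x0101010101010101ULL * (uint8_t)value;
    case 1:
        return 0x0001000100010001ULL * (uint16_t)value;
    case 2:
        return 0x0000000100000001ULL * (uint32_t)value;
    default:
        return value;                   /* already 64 bits wide */
    }
}
```

For a 16-bit element the two agree: replicate_bits(0x3c00, 16) and
dup_elem(1, 0x3c00) both give 0x3c003c003c003c00. Only the loop form can
express something like replicate_bits(0x2, 2), which yields
0xaaaaaaaaaaaaaaaa.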

-[PULL 30/37] hw/arm/cubieboard: report error when using unsupported -bios argument
+[PULL 12/24] target/arm: Implement MVE logical immediate insns
-From: Niek Linnenbank <nieklinnenbank@gmail.com>
+Implement the MVE logical-immediate insns (VMOV, VMVN,
 VORR and VBIC). These have essentially the same encoding
 as their Neon equivalents, and we implement the decode
 in the same way.
-The Cubieboard machine does not support the -bios argument.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Report an error when -bios is used and exit immediately.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-7-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  4 +++
  target/arm/mve.decode      | 17 +++++++++++++
  target/arm/mve_helper.c    | 24 ++++++++++++++++++
  target/arm/translate-mve.c | 50 ++++++++++++++++++++++++++++++++++++++
 files changed, 95 insertions(+)
-Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Message-id: 20200227220149.6845-5-nieklinnenbank@gmail.com
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  hw/arm/cubieboard.c | 7 +++++++
 file changed, 7 insertions(+)
 diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/cubieboard.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/cubieboard.c
++++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
  DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
  DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
  DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
 +DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
 +DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
 @@ -XXX,XX +XXX,XX @@
- #include "exec/address-spaces.h"
+ # VQDMULL has size in bit 28: 0 for 16 bit, 1 for 32 bit
- #include "qapi/error.h"
+ %size_28 28:1 !function=plus_1
- #include "cpu.h"
-+#include "sysemu/sysemu.h"
++# 1imm format immediate
- #include "hw/sysbus.h"
++%imm_28_16_0 28:1 16:3 0:4
- #include "hw/boards.h"
++
- #include "hw/arm/allwinner-a10.h"
+ &vldr_vstr rn qd imm p a w size l u
-@@ -XXX,XX +XXX,XX @@ static void cubieboard_init(MachineState *machine)
+ &1op qd qm size
-     AwA10State *a10;
+ &2op qd qm qn size
-     Error *err = NULL;
+ &2scalar qd qn rm size
++&1imm qd imm cmode op
-+    /* BIOS is not supported by this board */
-+    if (bios_name) {
+ @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
-+        error_report("BIOS not supported for this machine");
+ # Note that both Rn and Qd are 3 bits only (no D bit)
-+        exit(1);
+@@ -XXX,XX +XXX,XX @@
  @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
  @2op_sz28 .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn \
       size=%size_28
 +@1imm .... .... .... .... .... cmode:4 .. op:1 . .... &1imm qd=%qd imm=%imm_28_16_0
  # The _rev suffix indicates that Vn and Vm are reversed. This is
  # the case for shifts. In the Arm ARM these insns are documented
@@ -XXX,XX +XXX,XX @@ VADDV            111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rd
  # Predicate operations
  %mask_22_13      22:1 13:3
  VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 +
 +# Logical immediate operations (1 reg and modified-immediate)
 +
 +# The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
 +# not in a way we can conveniently represent in decodetree without
 +# a lot of repetition:
 +# VORR: op=0, (cmode & 1) && cmode < 12
 +# VBIC: op=1, (cmode & 1) && cmode < 12
 +# VMOV: everything else
 +# So we have a single decode line and check the cmode/op in the
 +# trans function.
 +Vimm_1r 111 . 1111 1 . 00 0 ... ... 0 .... 0 1 . 1 .... @1imm
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_1OP(vnegw, 4, int32_t, DO_NEG)
  DO_1OP(vfnegh, 8, uint64_t, DO_FNEGH)
  DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
 +/*
 + * 1 operand immediates: Vda is destination and possibly also one source.
 + * All these insns work at 64-bit widths.
 + */
 +#define DO_1OP_IMM(OP, FN)                                              \
 +    void HELPER(mve_##OP)(CPUARMState *env, void *vda, uint64_t imm)    \
 +    {                                                                   \
 +        uint64_t *da = vda;                                             \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned e;                                                     \
 +        for (e = 0; e < 16 / 8; e++, mask >>= 8) {                      \
 +            mergemask(&da[H8(e)], FN(da[H8(e)], imm), mask);            \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
-     /* This board has fixed size RAM (512MiB or 1GiB) */
++#define DO_MOVI(N, I) (I)
-     if (machine->ram_size != 512 * MiB &&
++#define DO_ANDI(N, I) ((N) & (I))
-         machine->ram_size != 1 * GiB) {
++#define DO_ORRI(N, I) ((N) | (I))
 +
 +DO_1OP_IMM(vmovi, DO_MOVI)
 +DO_1OP_IMM(vandi, DO_ANDI)
 +DO_1OP_IMM(vorri, DO_ORRI)
 +
  #define DO_2OP(OP, ESIZE, TYPE, FN)                                     \
      void HELPER(glue(mve_, OP))(CPUARMState *env,                       \
                                  void *vd, void *vn, void *vm)           \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
  typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
      mve_update_eci(s);
      return true;
  }
 +
 +static bool do_1imm(DisasContext *s, arg_1imm *a, MVEGenOneOpImmFn *fn)
 +{
 +    TCGv_ptr qd;
 +    uint64_t imm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd) ||
 +        !fn) {
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    imm = asimd_imm_const(a->imm, a->cmode, a->op);
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    fn(cpu_env, qd, tcg_constant_i64(imm));
 +    tcg_temp_free_ptr(qd);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +static bool trans_Vimm_1r(DisasContext *s, arg_1imm *a)
 +{
 +    /* Handle decode of cmode/op here between VORR/VBIC/VMOV */
 +    MVEGenOneOpImmFn *fn;
 +
 +    if ((a->cmode & 1) && a->cmode < 12) {
 +        if (a->op) {
 +            /*
 +             * For op=1, the immediate will be inverted by asimd_imm_const(),
 +             * so the VBIC becomes a logical AND operation.
 +             */
 +            fn = gen_helper_mve_vandi;
 +        } else {
 +            fn = gen_helper_mve_vorri;
 +        }
 +    } else {
 +        /* There is one unallocated cmode/op combination in this space */
 +        if (a->cmode == 15 && a->op == 1) {
 +            return false;
 +        }
 +        /* asimd_imm_const() sorts out VMVNI vs VMOVI for us */
 +        fn = gen_helper_mve_vmovi;
 +    }
 +    return do_1imm(s, a, fn);
 +}
 --
 .20.1
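
The cmode/op rule described in the mve.decode comment and applied in
trans_Vimm_1r() can be written as a small standalone classifier. This is
only a sketch: the enum and the function name classify_logic_imm() are
hypothetical and exist purely to make the rule testable outside QEMU.

```c
/* Hypothetical result type for this sketch; not part of the patch. */
enum logic_imm_kind {
    KIND_VORR,
    KIND_VBIC,
    KIND_VMOV_OR_VMVN,  /* the immediate expansion handles the inversion */
    KIND_UNALLOCATED
};

enum logic_imm_kind classify_logic_imm(int cmode, int op)
{
    /* Odd cmode values below 12 are the read-modify-write forms. */
    if ((cmode & 1) && cmode < 12) {
        return op ? KIND_VBIC : KIND_VORR;
    }
    /* The single unallocated combination in this encoding space. */
    if (cmode == 15 && op == 1) {
        return KIND_UNALLOCATED;
    }
    return KIND_VMOV_OR_VMVN;
}
```

Under this rule cmode = 13 is still a VMOV/VMVN encoding even though it is
odd, because it fails the cmode < 12 test. Since the expanded immediate is
already inverted when op = 1, the VBIC case needs no separate helper: as in
the patch, it is a plain AND with that inverted value.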

-[PULL 25/37] target/arm: Honor the HCR_EL2.TTLB bit
+[PULL 13/24] target/arm: Implement MVE vector shift left by immediate insns
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE shift-vector-left-by-immediate insns VSHL, VQSHL
+and VQSHLU.
-This bit traps EL1 access to tlb maintenance insns.
+The size-and-immediate encoding here is the same as Neon, and we
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+handle it the same way neon-dp.decode does.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200229012811.24129-12-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-8-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 85 +++++++++++++++++++++++++++++----------------
+ target/arm/helper-mve.h    | 16 +++++++++++
-file changed, 55 insertions(+), 30 deletions(-)
+ target/arm/mve.decode      | 23 +++++++++++++++
+ target/arm/mve_helper.c    | 57 ++++++++++++++++++++++++++++++++++++++
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+ target/arm/translate-mve.c | 51 ++++++++++++++++++++++++++++++++++
-index XXXXXXX..XXXXXXX 100644
+files changed, 147 insertions(+)
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tacr(CPUARMState *env, const ARMCPRegInfo *ri,
+index XXXXXXX..XXXXXXX 100644
-     return CP_ACCESS_OK;
+--- a/target/arm/helper-mve.h
 +++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
  DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
  DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
  DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
 +
 +DEF_HELPER_FLAGS_4(mve_vshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vqshlui_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshlui_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vqshlui_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  &2op qd qm qn size
  &2scalar qd qn rm size
  &1imm qd imm cmode op
 +&2shift qd qm shift size
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
  @2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
  @2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 +@2_shl_b .... .... .. 001 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
 +@2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
 +@2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
 +
  # Vector loads and stores
  # Widening loads and narrowing stores:
@@ -XXX,XX +XXX,XX @@ VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
  # So we have a single decode line and check the cmode/op in the
  # trans function.
  Vimm_1r 111 . 1111 1 . 00 0 ... ... 0 .... 0 1 . 1 .... @1imm
 +
 +# Shifts by immediate
 +
 +VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
 +VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
 +VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
 +
 +VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
 +VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
 +VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
 +
 +VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
 +VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
 +VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
 +
 +VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_b
 +VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_h
 +VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_w
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
      WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, true, satp)
  #define DO_UQRSHL_OP(N, M, satp) \
      WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, true, satp)
 +#define DO_SUQSHL_OP(N, M, satp) \
 +    WRAP_QRSHL_HELPER(do_suqrshl_bhs, N, M, false, satp)
  DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
  DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvsw, 4, uint32_t)
  DO_VADDV(vaddvub, 1, uint8_t)
  DO_VADDV(vaddvuh, 2, uint16_t)
  DO_VADDV(vaddvuw, 4, uint32_t)
 +
 +/* Shifts by immediate */
 +#define DO_2SHIFT(OP, ESIZE, TYPE, FN)                          \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
 +        TYPE *d = vd, *m = vm;                                  \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            mergemask(&d[H##ESIZE(e)],                          \
 +                      FN(m[H##ESIZE(e)], shift), mask);         \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +    }
 +
 +#define DO_2SHIFT_SAT(OP, ESIZE, TYPE, FN)                      \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
 +        TYPE *d = vd, *m = vm;                                  \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        bool qc = false;                                        \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            bool sat = false;                                   \
 +            mergemask(&d[H##ESIZE(e)],                          \
 +                      FN(m[H##ESIZE(e)], shift, &sat), mask);   \
 +            qc |= sat & mask & 1;                               \
 +        }                                                       \
 +        if (qc) {                                               \
 +            env->vfp.qc[0] = qc;                                \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +    }
 +
 +/* provide unsigned 2-op shift helpers for all sizes */
 +#define DO_2SHIFT_U(OP, FN)                     \
 +    DO_2SHIFT(OP##b, 1, uint8_t, FN)            \
 +    DO_2SHIFT(OP##h, 2, uint16_t, FN)           \
 +    DO_2SHIFT(OP##w, 4, uint32_t, FN)
 +
 +#define DO_2SHIFT_SAT_U(OP, FN)                 \
 +    DO_2SHIFT_SAT(OP##b, 1, uint8_t, FN)        \
 +    DO_2SHIFT_SAT(OP##h, 2, uint16_t, FN)       \
 +    DO_2SHIFT_SAT(OP##w, 4, uint32_t, FN)
 +#define DO_2SHIFT_SAT_S(OP, FN)                 \
 +    DO_2SHIFT_SAT(OP##b, 1, int8_t, FN)         \
 +    DO_2SHIFT_SAT(OP##h, 2, int16_t, FN)        \
 +    DO_2SHIFT_SAT(OP##w, 4, int32_t, FN)
 +
 +DO_2SHIFT_U(vshli_u, DO_VSHLU)
 +DO_2SHIFT_SAT_U(vqshli_u, DO_UQSHL_OP)
 +DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
 +DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
  typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1imm *a)
      }
      return do_1imm(s, a, fn);
  }
++
-+/* Check for traps from EL1 due to HCR_EL2.TTLB. */
++static bool do_2shift(DisasContext *s, arg_2shift *a, MVEGenTwoOpShiftFn fn,
-+static CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri,
++                      bool negateshift)
 +                                  bool isread)
 +{
-+    if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_TTLB)) {
++    TCGv_ptr qd, qm;
-+        return CP_ACCESS_TRAP_EL2;
++    int shift = a->shift;
-+    }
++
-+    return CP_ACCESS_OK;
++    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd | a->qm) ||
 +        !fn) {
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    /*
 +     * When we handle a right shift insn using a left-shift helper
 +     * which permits a negative shift count to indicate a right-shift,
 +     * we must negate the shift count.
 +     */
 +    if (negateshift) {
 +        shift = -shift;
 +    }
 +
 +    qd = mve_qreg_ptr(a->qd);
 +    qm = mve_qreg_ptr(a->qm);
 +    fn(cpu_env, qd, qm, tcg_constant_i32(shift));
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_ptr(qm);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
- static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
++#define DO_2SHIFT(INSN, FN, NEGATESHIFT)                         \
- {
++    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
-     ARMCPU *cpu = env_archcpu(env);
++    {                                                           \
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
++        static MVEGenTwoOpShiftFn * const fns[] = {             \
-       .type = ARM_CP_NO_RAW, .access = PL1_R, .readfn = isr_read },
++            gen_helper_mve_##FN##b,                             \
-     /* 32 bit ITLB invalidates */
++            gen_helper_mve_##FN##h,                             \
-     { .name = "ITLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 0,
++            gen_helper_mve_##FN##w,                             \
--      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiall_write },
++            NULL,                                               \
-+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
++        };                                                      \
-+      .writefn = tlbiall_write },
++        return do_2shift(s, a, fns[a->size], NEGATESHIFT);      \
-     { .name = "ITLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 1,
++    }
--      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_write },
++
-+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
++DO_2SHIFT(VSHLI, vshli_u, false)
-+      .writefn = tlbimva_write },
++DO_2SHIFT(VQSHLI_S, vqshli_s, false)
-     { .name = "ITLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 2,
++DO_2SHIFT(VQSHLI_U, vqshli_u, false)
--      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiasid_write },
++DO_2SHIFT(VQSHLUI, vqshlui_s, false)
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbiasid_write },
      /* 32 bit DTLB invalidates */
      { .name = "DTLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 0,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiall_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbiall_write },
      { .name = "DTLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 1,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbimva_write },
      { .name = "DTLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 2,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiasid_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbiasid_write },
      /* 32 bit TLB invalidates */
      { .name = "TLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiall_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbiall_write },
      { .name = "TLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 1,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbimva_write },
      { .name = "TLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 2,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiasid_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbiasid_write },
      { .name = "TLBIMVAA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 3,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimvaa_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbimvaa_write },
      REGINFO_SENTINEL
  };
  static const ARMCPRegInfo v7mp_cp_reginfo[] = {
      /* 32 bit TLB invalidates, Inner Shareable */
      { .name = "TLBIALLIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiall_is_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbiall_is_write },
      { .name = "TLBIMVAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_is_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbimva_is_write },
      { .name = "TLBIASIDIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W,
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
        .writefn = tlbiasid_is_write },
      { .name = "TLBIMVAAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W,
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
        .writefn = tlbimvaa_is_write },
      REGINFO_SENTINEL
  };
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
      /* TLBI operations */
      { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vmalle1is_write },
      { .name = "TLBI_VAE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_ASIDE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vmalle1is_write },
      { .name = "TLBI_VAAE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_VALE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_VAALE1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vae1is_write },
      { .name = "TLBI_VMALLE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vmalle1_write },
      { .name = "TLBI_VAE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 1,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vae1_write },
      { .name = "TLBI_ASIDE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 2,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vmalle1_write },
      { .name = "TLBI_VAAE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 3,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vae1_write },
      { .name = "TLBI_VALE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vae1_write },
      { .name = "TLBI_VAALE1", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7,
 -      .access = PL1_W, .type = ARM_CP_NO_RAW,
 +      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
        .writefn = tlbi_aa64_vae1_write },
      { .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
  #endif
      /* TLB invalidate last level of translation table walk */
      { .name = "TLBIMVALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_is_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbimva_is_write },
      { .name = "TLBIMVAALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W,
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
        .writefn = tlbimvaa_is_write },
      { .name = "TLBIMVAL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbimva_write },
      { .name = "TLBIMVAAL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7,
 -      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimvaa_write },
 +      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
 +      .writefn = tlbimvaa_write },
      { .name = "TLBIMVALH", .cp = 15, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 5,
        .type = ARM_CP_NO_RAW, .access = PL2_W,
        .writefn = tlbimva_hyp_write },
 --
 .20.1

-[PULL 13/37] hw/arm/strongarm: move timer_new from init() into realize() to avoid memleaks
+[PULL 14/24] target/arm: Implement MVE vector shift right by immediate insns
-From: Pan Nengyuan <pannengyuan@huawei.com>
+Implement the MVE vector shift right by immediate insns VSHRI and
 VRSHRI.  As with Neon, we implement these by using helper functions
 which perform left shifts but allow negative shift counts to indicate
 right shifts.
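
A minimal standalone sketch of the approach described above, for signed
8-bit elements without rounding.  It is not the QEMU helper code:
decode_rshift8(), shl_signed8() and shr_imm_signed8() are hypothetical
names used only for illustration.  The sketch assumes that the
immediate field holds 8 minus the shift count (the rsub_8 convention)
and that the translator negates the count before calling the
left-shift helper.

```c
#include <stdint.h>

/* Hypothetical stand-in for rsub_8(): the raw field holds 8 - shift. */
static int decode_rshift8(int field)
{
    return 8 - field;
}

/*
 * Hypothetical left-shift helper for signed 8-bit elements.
 * A positive count shifts left; a negative count shifts right
 * arithmetically.  Counts of magnitude 8 or more are clamped.
 */
static int8_t shl_signed8(int8_t value, int shift)
{
    if (shift >= 8) {
        return 0;                       /* every bit shifted out */
    }
    if (shift >= 0) {
        return (int8_t)(uint8_t)((uint8_t)value << shift);
    }
    if (shift <= -8) {
        return value < 0 ? -1 : 0;      /* only sign bits remain */
    }
    /* Arithmetic right shift written without shifting a negative int. */
    if (value < 0) {
        return (int8_t)~(~value >> -shift);
    }
    return (int8_t)(value >> -shift);
}

/* Right shift by immediate, routed through the left-shift helper. */
static int8_t shr_imm_signed8(int8_t value, int field)
{
    int shift = decode_rshift8(field);  /* right-shift count, 1..8 */
    return shl_signed8(value, -shift);  /* negated, as with negateshift */
}
```

With this convention a field value of 7 means a right shift by 1, and a
field value of 0 means a right shift by 8, which leaves only copies of
the sign bit.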
-There are some memleaks when we call 'device_list_properties'. This patch moves timer_new from init into realize to fix it.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-9-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h     | 12 ++++++++++++
  target/arm/translate.h      | 20 ++++++++++++++++++++
  target/arm/mve.decode       | 28 ++++++++++++++++++++++++++++
  target/arm/mve_helper.c     |  7 +++++++
  target/arm/translate-mve.c  |  5 +++++
  target/arm/translate-neon.c | 18 ------------------
 files changed, 72 insertions(+), 18 deletions(-)
-Reported-by: Euler Robot <euler.robot@huawei.com>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
 Message-id: 20200227025055.14341-5-pannengyuan@huawei.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  hw/arm/strongarm.c | 18 ++++++++++++------
 file changed, 12 insertions(+), 6 deletions(-)
 diff --git a/hw/arm/strongarm.c b/hw/arm/strongarm.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/strongarm.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/strongarm.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void strongarm_rtc_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
-     s->last_rcnr = (uint32_t) mktimegm(&tm);
+ DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
-     s->last_hz = qemu_clock_get_ms(rtc_clock);
+ DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
--    s->rtc_alarm = timer_new_ms(rtc_clock, strongarm_rtc_alarm_tick, s);
++DEF_HELPER_FLAGS_4(mve_vshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--    s->rtc_hz = timer_new_ms(rtc_clock, strongarm_rtc_hz_tick, s);
++DEF_HELPER_FLAGS_4(mve_vshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--
++DEF_HELPER_FLAGS_4(mve_vshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     sysbus_init_irq(dev, &s->rtc_irq);
++
-     sysbus_init_irq(dev, &s->rtc_hz_irq);
+ DEF_HELPER_FLAGS_4(mve_vshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ static void strongarm_rtc_init(Object *obj)
+ DEF_HELPER_FLAGS_4(mve_vshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     sysbus_init_mmio(dev, &s->iomem);
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vqshlui_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vqshlui_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vqshlui_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vrshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline int times_2_plus_1(DisasContext *s, int x)
      return x * 2 + 1;
  }
-+static void strongarm_rtc_realize(DeviceState *dev, Error **errp)
++static inline int rsub_64(DisasContext *s, int x)
 +{
-+    StrongARMRTCState *s = STRONGARM_RTC(dev);
++    return 64 - x;
 +    s->rtc_alarm = timer_new_ms(rtc_clock, strongarm_rtc_alarm_tick, s);
 +    s->rtc_hz = timer_new_ms(rtc_clock, strongarm_rtc_hz_tick, s);
 +}
 +
- static int strongarm_rtc_pre_save(void *opaque)
++static inline int rsub_32(DisasContext *s, int x)
 +{
 +    return 32 - x;
 +}
 +
 +static inline int rsub_16(DisasContext *s, int x)
 +{
 +    return 16 - x;
 +}
 +
 +static inline int rsub_8(DisasContext *s, int x)
 +{
 +    return 8 - x;
 +}
 +
  static inline int arm_dc_feature(DisasContext *dc, int feature)
  {
-     StrongARMRTCState *s = opaque;
+     return (dc->features & (1ULL << feature)) != 0;
-@@ -XXX,XX +XXX,XX @@ static void strongarm_rtc_sysbus_class_init(ObjectClass *klass, void *data)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+index XXXXXXX..XXXXXXX 100644
-     dc->desc = "StrongARM RTC Controller";
+--- a/target/arm/mve.decode
-     dc->vmsd = &vmstate_strongarm_rtc_regs;
++++ b/target/arm/mve.decode
-+    dc->realize = strongarm_rtc_realize;
+@@ -XXX,XX +XXX,XX @@
  @2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
  @2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
 +# Right shifts are encoded as N - shift, where N is the element size in bits.
 +%rshift_i5  16:5 !function=rsub_32
 +%rshift_i4  16:4 !function=rsub_16
 +%rshift_i3  16:3 !function=rsub_8
 +
 +@2_shr_b .... .... .. 001 ... .... .... .... .... &2shift qd=%qd qm=%qm \
 +         size=0 shift=%rshift_i3
 +@2_shr_h .... .... .. 01 .... .... .... .... .... &2shift qd=%qd qm=%qm \
 +         size=1 shift=%rshift_i4
 +@2_shr_w .... .... .. 1 ..... .... .... .... .... &2shift qd=%qd qm=%qm \
 +         size=2 shift=%rshift_i5
 +
  # Vector loads and stores
  # Widening loads and narrowing stores:
@@ -XXX,XX +XXX,XX @@ VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
  VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_b
  VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_h
  VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_w
 +
 +VSHRI_S           111 0 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_b
 +VSHRI_S           111 0 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_h
 +VSHRI_S           111 0 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_w
 +
 +VSHRI_U           111 1 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_b
 +VSHRI_U           111 1 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_h
 +VSHRI_U           111 1 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_w
 +
 +VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
 +VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
 +VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 +
 +VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
 +VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
 +VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvuw, 4, uint32_t)
      DO_2SHIFT(OP##b, 1, uint8_t, FN)            \
      DO_2SHIFT(OP##h, 2, uint16_t, FN)           \
      DO_2SHIFT(OP##w, 4, uint32_t, FN)
 +#define DO_2SHIFT_S(OP, FN)                     \
 +    DO_2SHIFT(OP##b, 1, int8_t, FN)             \
 +    DO_2SHIFT(OP##h, 2, int16_t, FN)            \
 +    DO_2SHIFT(OP##w, 4, int32_t, FN)
  #define DO_2SHIFT_SAT_U(OP, FN)                 \
      DO_2SHIFT_SAT(OP##b, 1, uint8_t, FN)        \
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvuw, 4, uint32_t)
      DO_2SHIFT_SAT(OP##w, 4, int32_t, FN)
  DO_2SHIFT_U(vshli_u, DO_VSHLU)
 +DO_2SHIFT_S(vshli_s, DO_VSHLS)
  DO_2SHIFT_SAT_U(vqshli_u, DO_UQSHL_OP)
  DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
  DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
 +DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
 +DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHLI, vshli_u, false)
  DO_2SHIFT(VQSHLI_S, vqshli_s, false)
  DO_2SHIFT(VQSHLI_U, vqshli_u, false)
  DO_2SHIFT(VQSHLUI, vqshlui_s, false)
 +/* These right shifts use a left-shift helper with negated shift count */
 +DO_2SHIFT(VSHRI_S, vshli_s, true)
 +DO_2SHIFT(VSHRI_U, vshli_u, true)
 +DO_2SHIFT(VRSHRI_S, vrshli_s, true)
 +DO_2SHIFT(VRSHRI_U, vrshli_u, true)
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static inline int plus1(DisasContext *s, int x)
      return x + 1;
  }
- static const TypeInfo strongarm_rtc_sysbus_info = {
+-static inline int rsub_64(DisasContext *s, int x)
-@@ -XXX,XX +XXX,XX @@ static void strongarm_uart_init(Object *obj)
+-{
-                           "uart", 0x10000);
+-    return 64 - x;
-     sysbus_init_mmio(dev, &s->iomem);
+-}
      sysbus_init_irq(dev, &s->irq);
 -
--    s->rx_timeout_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, strongarm_uart_rx_to, s);
+-static inline int rsub_32(DisasContext *s, int x)
--    s->tx_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, strongarm_uart_tx, s);
+-{
- }
+-    return 32 - x;
+-}
- static void strongarm_uart_realize(DeviceState *dev, Error **errp)
+-static inline int rsub_16(DisasContext *s, int x)
 -{
 -    return 16 - x;
 -}
 -static inline int rsub_8(DisasContext *s, int x)
 -{
 -    return 8 - x;
 -}
 -
  static inline int neon_3same_fp_size(DisasContext *s, int x)
  {
-     StrongARMUARTState *s = STRONGARM_UART(dev);
+     /* Convert 0==fp32, 1==fp16 into a MO_* value */
 +    s->rx_timeout_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
 +                                       strongarm_uart_rx_to,
 +                                       s);
 +    s->tx_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, strongarm_uart_tx, s);
      qemu_chr_fe_set_handlers(&s->chr,
                               strongarm_uart_can_receive,
                               strongarm_uart_receive,
 --
 .20.1

-[PULL 29/37] hw/arm/cubieboard: restrict allowed RAM size to 512MiB and 1GiB
+[PULL 15/24] target/arm: Implement MVE VSHLL
-From: Niek Linnenbank <nieklinnenbank@gmail.com>
+Implement the MVE VSHLL (vector shift left long) insn.  This has two
 encodings: the T1 encoding is the usual shift-by-immediate format,
 and the T2 encoding is a special case where the shift count is always
 equal to the element size.
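
To make the operation concrete, here is a minimal sketch of an unsigned
widening shift left for byte elements.  It is an illustration only, not
the QEMU helper: vshll_u8() is a hypothetical name, predication and
host endianness are ignored, and the sketch assumes that the "bottom"
form reads the even-numbered source elements while the "top" form reads
the odd-numbered ones.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of an unsigned widening shift left on a
 * 128-bit vector: 16 byte inputs, 8 halfword results.
 * Assumption: top == false selects even-numbered source elements,
 * top == true selects odd-numbered ones.
 * shift == 8 models the T2 encoding, where the shift count equals
 * the source element size.
 */
static void vshll_u8(uint16_t dst[8], const uint8_t src[16],
                     unsigned shift, bool top)
{
    for (unsigned e = 0; e < 8; e++) {
        uint16_t wide = src[2 * e + (top ? 1 : 0)];  /* zero-extend */
        dst[e] = (uint16_t)(wide << shift);
    }
}
```

Because each result is twice as wide as its source element, a shift by
the full source element size loses no bits: 0xff shifted left by 8
becomes 0xff00.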
-The Cubieboard contains either 512MiB or 1GiB of onboard RAM [1].
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Prevent changing RAM to a different size which could break user programs.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-10-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  9 +++++++
  target/arm/mve.decode      | 53 +++++++++++++++++++++++++++++++++++---
  target/arm/mve_helper.c    | 32 +++++++++++++++++++++++
  target/arm/translate-mve.c | 15 +++++++++++
 files changed, 105 insertions(+), 4 deletions(-)
- [1] http://linux-sunxi.org/Cubieboard
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
 Message-id: 20200227220149.6845-4-nieklinnenbank@gmail.com
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  hw/arm/cubieboard.c | 8 ++++++++
 file changed, 8 insertions(+)
 diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/cubieboard.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/cubieboard.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void cubieboard_init(MachineState *machine)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     AwA10State *a10;
+ DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     Error *err = NULL;
+ DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+    /* This board has fixed size RAM (512MiB or 1GiB) */
++
-+    if (machine->ram_size != 512 * MiB &&
++DEF_HELPER_FLAGS_4(mve_vshllbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        machine->ram_size != 1 * GiB) {
++DEF_HELPER_FLAGS_4(mve_vshllbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        error_report("This machine can only be used with 512MiB or 1GiB RAM");
++DEF_HELPER_FLAGS_4(mve_vshllbub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        exit(1);
++DEF_HELPER_FLAGS_4(mve_vshllbuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshlltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshlltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshlltub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshlltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  @2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
  @2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
 +@2_shll_b .... .... ... 01 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
 +@2_shll_h .... .... ... 1  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
 +# VSHLL encoding T2 where shift == esize
 +@2_shll_esize_b .... .... .... 00 .. .... .... .... .... &2shift \
 +                qd=%qd qm=%qm size=0 shift=8
 +@2_shll_esize_h .... .... .... 01 .. .... .... .... .... &2shift \
 +                qd=%qd qm=%qm size=1 shift=16
 +
  # Right shifts are encoded as N - shift, where N is the element size in bits.
  %rshift_i5  16:5 !function=rsub_32
  %rshift_i4  16:4 !function=rsub_16
@@ -XXX,XX +XXX,XX @@ VADD             1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
  VSUB             1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
  VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
 -VMULH_S          111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 -VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 +# The VSHLL T2 encoding is not a @2op pattern, but is here because it
 +# overlaps what would be size=0b11 VMULH/VRMULH
 +{
 +  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
 +  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
 -VRMULH_S         111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 -VRMULH_U         111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 +  VMULH_S        111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 +}
 +
 +{
 +  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
 +  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +
 +  VMULH_U        111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 +}
 +
 +{
 +  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
 +  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +
 +  VRMULH_S       111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 +}
 +
 +{
 +  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
 +  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +
 +  VRMULH_U       111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 +}
  VMAX_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
  VMAX_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
@@ -XXX,XX +XXX,XX @@ VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
  VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
  VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
  VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 +
 +# VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
 +VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
 +VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
 +
 +VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
 +VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
 +
 +VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
 +VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
 +
 +VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
 +VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
  DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
  DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
  DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 +
 +/*
 + * Long shifts taking half-sized inputs from top or bottom of the input
 + * vector and producing a double-width result. ESIZE, TYPE are for
 + * the input, and LESIZE, LTYPE for the output.
 + * Unlike the normal shift helpers, we do not handle negative shift counts,
 + * because the long shift is strictly left-only.
 + */
 +#define DO_VSHLL(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE)                   \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,             \
 +                                void *vm, uint32_t shift)               \
 +    {                                                                   \
 +        LTYPE *d = vd;                                                  \
 +        TYPE *m = vm;                                                   \
 +        uint16_t mask = mve_element_mask(env);                          \
 +        unsigned le;                                                    \
 +        assert(shift <= 16);                                            \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
 +            LTYPE r = (LTYPE)m[H##ESIZE(le * 2 + TOP)] << shift;        \
 +            mergemask(&d[H##LESIZE(le)], r, mask);                      \
 +        }                                                               \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
-     /* Only allow Cortex-A8 for this board */
++#define DO_VSHLL_ALL(OP, TOP)                                \
-     if (strcmp(machine->cpu_type, ARM_CPU_TYPE_NAME("cortex-a8")) != 0) {
++    DO_VSHLL(OP##sb, TOP, 1, int8_t, 2, int16_t)             \
-         error_report("This board can only be used with cortex-a8 CPU");
++    DO_VSHLL(OP##ub, TOP, 1, uint8_t, 2, uint16_t)           \
-@@ -XXX,XX +XXX,XX @@ static void cubieboard_machine_init(MachineClass *mc)
++    DO_VSHLL(OP##sh, TOP, 2, int16_t, 4, int32_t)            \
- {
++    DO_VSHLL(OP##uh, TOP, 2, uint16_t, 4, uint32_t)          \
-     mc->desc = "cubietech cubieboard (Cortex-A8)";
++
-     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a8");
++DO_VSHLL_ALL(vshllb, false)
-+    mc->default_ram_size = 1 * GiB;
++DO_VSHLL_ALL(vshllt, true)
-     mc->init = cubieboard_init;
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-     mc->block_default_type = IF_IDE;
+index XXXXXXX..XXXXXXX 100644
-     mc->units_per_default_bus = 1;
+--- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHRI_S, vshli_s, true)
  DO_2SHIFT(VSHRI_U, vshli_u, true)
  DO_2SHIFT(VRSHRI_S, vrshli_s, true)
  DO_2SHIFT(VRSHRI_U, vrshli_u, true)
 +
 +#define DO_VSHLL(INSN, FN)                                      \
 +    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
 +    {                                                           \
 +        static MVEGenTwoOpShiftFn * const fns[] = {             \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +        };                                                      \
 +        return do_2shift(s, a, fns[a->size], false);            \
 +    }
 +
 +DO_VSHLL(VSHLL_BS, vshllbs)
 +DO_VSHLL(VSHLL_BU, vshllbu)
 +DO_VSHLL(VSHLL_TS, vshllts)
 +DO_VSHLL(VSHLL_TU, vshlltu)
 --
 .20.1
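
The long-shift helper comment above describes taking half-sized inputs from the top or bottom of the vector and producing double-width results. A minimal scalar sketch of that idea for signed byte inputs follows. The name vshll_s8_model is hypothetical and is not part of QEMU; the sketch ignores predication (mve_element_mask/mergemask) and the host-endianness H macros, and assumes a compiler where converting an out-of-range value to int16_t wraps modulo 2^16 (true of GCC and Clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of VSHLL on signed bytes (not QEMU code).
 * Reads the even-numbered (bottom) or odd-numbered (top) bytes of a
 * 16-byte vector and writes eight 16-bit results.
 */
static void vshll_s8_model(int16_t d[8], const int8_t m[16],
                           unsigned shift, bool top)
{
    /* shift == 8 is the "shift == esize" case for byte inputs. */
    assert(shift <= 8);
    for (unsigned le = 0; le < 8; le++) {
        /* Widen first, then shift, so no input bits are lost. */
        int16_t wide = m[le * 2 + (top ? 1 : 0)];
        /* Shift as unsigned to avoid left-shifting a negative value. */
        d[le] = (int16_t)((uint16_t)wide << shift);
    }
}
```

Widening before shifting is the point of the long form: a byte shifted left by 8 would be zero at its own width, but keeps all of its bits in a 16-bit result.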

-[PULL 28/37] hw/arm/cubieboard: restrict allowed CPU type to ARM Cortex-A8
+[PULL 16/24] target/arm: Implement MVE VSRI, VSLI
-From: Niek Linnenbank <nieklinnenbank@gmail.com>
+Implement the MVE VSRI and VSLI insns, which perform a
 shift-and-insert operation.
-The Cubieboard has an ARM Cortex-A8.  Instead of simply ignoring a
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-bogus -cpu option provided by the user, give them an error message so
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-they know their command line is wrong.
+Message-id: 20210628135835.6690-11-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  8 ++++++++
  target/arm/mve.decode      |  9 ++++++++
  target/arm/mve_helper.c    | 42 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  3 +++
 files changed, 62 insertions(+)
-Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Message-id: 20200227220149.6845-3-nieklinnenbank@gmail.com
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 [PMM: tweaked commit message]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  hw/arm/cubieboard.c | 10 +++++++++-
 file changed, 9 insertions(+), 1 deletion(-)
 diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/cubieboard.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/cubieboard.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static struct arm_boot_info cubieboard_binfo = {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vshlltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vshlltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- static void cubieboard_init(MachineState *machine)
+ DEF_HELPER_FLAGS_4(mve_vshlltub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- {
+ DEF_HELPER_FLAGS_4(mve_vshlltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--    AwA10State *a10 = AW_A10(object_new(TYPE_AW_A10));
++
-+    AwA10State *a10;
++DEF_HELPER_FLAGS_4(mve_vsrib, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     Error *err = NULL;
++DEF_HELPER_FLAGS_4(mve_vsrih, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vsriw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+    /* Only allow Cortex-A8 for this board */
++
-+    if (strcmp(machine->cpu_type, ARM_CPU_TYPE_NAME("cortex-a8")) != 0) {
++DEF_HELPER_FLAGS_4(mve_vslib, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        error_report("This board can only be used with cortex-a8 CPU");
++DEF_HELPER_FLAGS_4(mve_vslih, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        exit(1);
++DEF_HELPER_FLAGS_4(mve_vsliw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
  VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
  VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
 +
 +# Shift-and-insert
 +VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_b
 +VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_h
 +VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_w
 +
 +VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
 +VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
 +VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
  DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
  DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 +/* Shift-and-insert; we always work with 64 bits at a time */
 +#define DO_2SHIFT_INSERT(OP, ESIZE, SHIFTFN, MASKFN)                    \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,             \
 +                                void *vm, uint32_t shift)               \
 +    {                                                                   \
 +        uint64_t *d = vd, *m = vm;                                      \
 +        uint16_t mask;                                                  \
 +        uint64_t shiftmask;                                             \
 +        unsigned e;                                                     \
 +        if (shift == ESIZE * 8) {                                       \
 +            /*                                                          \
 +             * Only VSRI can shift by <dt>, which must leave the        \
 +             * destination unchanged. The generic logic gives the       \
 +             * right answer for 0 (VSLI only) but fails for <dt>.       \
 +             */                                                         \
 +            goto done;                                                  \
 +        }                                                               \
 +        assert(shift < ESIZE * 8);                                      \
 +        mask = mve_element_mask(env);                                   \
 +        /* ESIZE / 2 gives the MO_* value if ESIZE is in [1,2,4] */     \
 +        shiftmask = dup_const(ESIZE / 2, MASKFN(ESIZE * 8, shift));     \
 +        for (e = 0; e < 16 / 8; e++, mask >>= 8) {                      \
 +            uint64_t r = (SHIFTFN(m[H8(e)], shift) & shiftmask) |       \
 +                (d[H8(e)] & ~shiftmask);                                \
 +            mergemask(&d[H8(e)], r, mask);                              \
 +        }                                                               \
 +done:                                                                   \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
-+    a10 = AW_A10(object_new(TYPE_AW_A10));
++#define DO_SHL(N, SHIFT) ((N) << (SHIFT))
 +#define DO_SHR(N, SHIFT) ((N) >> (SHIFT))
 +#define SHL_MASK(EBITS, SHIFT) MAKE_64BIT_MASK((SHIFT), (EBITS) - (SHIFT))
 +#define SHR_MASK(EBITS, SHIFT) MAKE_64BIT_MASK(0, (EBITS) - (SHIFT))
 +
-     object_property_set_int(OBJECT(&a10->emac), 1, "phy-addr", &err);
++DO_2SHIFT_INSERT(vsrib, 1, DO_SHR, SHR_MASK)
-     if (err != NULL) {
++DO_2SHIFT_INSERT(vsrih, 2, DO_SHR, SHR_MASK)
-         error_reportf_err(err, "Couldn't set phy address: ");
++DO_2SHIFT_INSERT(vsriw, 4, DO_SHR, SHR_MASK)
 +DO_2SHIFT_INSERT(vslib, 1, DO_SHL, SHL_MASK)
 +DO_2SHIFT_INSERT(vslih, 2, DO_SHL, SHL_MASK)
 +DO_2SHIFT_INSERT(vsliw, 4, DO_SHL, SHL_MASK)
 +
  /*
   * Long shifts taking half-sized inputs from top or bottom of the input
   * vector and producing a double-width result. ESIZE, TYPE are for
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHRI_U, vshli_u, true)
  DO_2SHIFT(VRSHRI_S, vrshli_s, true)
  DO_2SHIFT(VRSHRI_U, vrshli_u, true)
 +DO_2SHIFT(VSRI, vsri, false)
 +DO_2SHIFT(VSLI, vsli, false)
 +
  #define DO_VSHLL(INSN, FN)                                      \
      static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
      {                                                           \
 --
 .20.1
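
The shift-and-insert operation in the patch above can be illustrated per element. This is a minimal sketch for 8-bit elements only, with hypothetical names (vsli8_model, vsri8_model) that do not exist in QEMU; it leaves out predication and the 64-bits-at-a-time trick built on dup_const. The masks correspond to SHL_MASK and SHR_MASK with EBITS = 8. In this sketch a VSRI shift by the element size returns the destination unchanged, and a VSLI shift by 0 goes through the generic path, which copies the source element.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of VSLI.8 on one element (not QEMU code). */
static uint8_t vsli8_model(uint8_t d, uint8_t m, unsigned shift)
{
    assert(shift < 8);
    /* Bits [shift, 8) come from the shifted source, the rest from d. */
    uint8_t mask = (uint8_t)(0xffu << shift);
    return (uint8_t)(((m << shift) & mask) | (d & ~mask));
}

/* Hypothetical model of VSRI.8 on one element (not QEMU code). */
static uint8_t vsri8_model(uint8_t d, uint8_t m, unsigned shift)
{
    assert(shift >= 1 && shift <= 8);
    if (shift == 8) {
        return d;               /* every source bit is shifted out */
    }
    /* Bits [0, 8 - shift) come from the shifted source. */
    uint8_t mask = (uint8_t)(0xffu >> shift);
    return (uint8_t)(((m >> shift) & mask) | (d & ~mask));
}
```

For example, inserting 0x01 shifted left by 4 into 0xff keeps the low nibble of the destination and gives 0x1f.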

-[PULL 14/37] hw/timer/cadence_ttc: move timer_new from init() into realize() to avoid memleaks
+[PULL 17/24] target/arm: Implement MVE VSHRN, VRSHRN
-From: Pan Nengyuan <pannengyuan@huawei.com>
+Implement the MVE shift-right-and-narrow insns VSHRN and VRSHRN.
-There are some memory leaks when we call 'device_list_properties'. This patch moves timer_new from init into realize to fix them.
+do_urshr() is borrowed from sve_helper.c.
-Reported-by: Euler Robot <euler.robot@huawei.com>
-Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
-Message-id: 20200227025055.14341-7-pannengyuan@huawei.com
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-12-peter.maydell@linaro.org
 ---
- hw/timer/cadence_ttc.c | 18 ++++++++++++------
+ target/arm/helper-mve.h    | 10 ++++++++++
-file changed, 12 insertions(+), 6 deletions(-)
+ target/arm/mve.decode      | 11 +++++++++++
  target/arm/mve_helper.c    | 40 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c | 15 ++++++++++++++
 files changed, 76 insertions(+)
-diff --git a/hw/timer/cadence_ttc.c b/hw/timer/cadence_ttc.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/cadence_ttc.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/timer/cadence_ttc.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void cadence_timer_init(uint32_t freq, CadenceTimerState *s)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vsriw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- static void cadence_ttc_init(Object *obj)
+ DEF_HELPER_FLAGS_4(mve_vslib, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- {
+ DEF_HELPER_FLAGS_4(mve_vslih, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     CadenceTTCState *s = CADENCE_TTC(obj);
+ DEF_HELPER_FLAGS_4(mve_vsliw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--    int i;
++
--
++DEF_HELPER_FLAGS_4(mve_vshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--    for (i = 0; i < 3; ++i) {
++DEF_HELPER_FLAGS_4(mve_vshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--        cadence_timer_init(133000000, &s->timer[i]);
++DEF_HELPER_FLAGS_4(mve_vshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--        sysbus_init_irq(SYS_BUS_DEVICE(obj), &s->timer[i].irq);
++DEF_HELPER_FLAGS_4(mve_vshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--    }
++
++DEF_HELPER_FLAGS_4(mve_vrshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     memory_region_init_io(&s->iomem, obj, &cadence_ttc_ops, s,
++DEF_HELPER_FLAGS_4(mve_vrshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-                           "timer", 0x1000);
++DEF_HELPER_FLAGS_4(mve_vrshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->iomem);
++DEF_HELPER_FLAGS_4(mve_vrshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- }
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
+index XXXXXXX..XXXXXXX 100644
-+static void cadence_ttc_realize(DeviceState *dev, Error **errp)
+--- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_w
  VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
  VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
  VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
 +
 +# Narrowing shifts (which only support b and h sizes)
 +VSHRNB            111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
 +VSHRNB            111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
 +VSHRNT            111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
 +VSHRNT            111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
 +
 +VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
 +VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
 +VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
 +VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_INSERT(vsliw, 4, DO_SHL, SHL_MASK)
  DO_VSHLL_ALL(vshllb, false)
  DO_VSHLL_ALL(vshllt, true)
 +
 +/*
 + * Narrowing right shifts, taking a double sized input, shifting it
 + * and putting the result in either the top or bottom half of the output.
 + * ESIZE, TYPE are the output, and LESIZE, LTYPE the input.
 + */
 +#define DO_VSHRN(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)       \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
 +        LTYPE *m = vm;                                          \
 +        TYPE *d = vd;                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned le;                                            \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
 +            TYPE r = FN(m[H##LESIZE(le)], shift);               \
 +            mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +    }
 +
 +#define DO_VSHRN_ALL(OP, FN)                                    \
 +    DO_VSHRN(OP##bb, false, 1, uint8_t, 2, uint16_t, FN)        \
 +    DO_VSHRN(OP##bh, false, 2, uint16_t, 4, uint32_t, FN)       \
 +    DO_VSHRN(OP##tb, true, 1, uint8_t, 2, uint16_t, FN)         \
 +    DO_VSHRN(OP##th, true, 2, uint16_t, 4, uint32_t, FN)
 +
 +static inline uint64_t do_urshr(uint64_t x, unsigned sh)
 +{
-+    CadenceTTCState *s = CADENCE_TTC(dev);
++    if (likely(sh < 64)) {
-+    int i;
++        return (x >> sh) + ((x >> (sh - 1)) & 1);
-+
++    } else if (sh == 64) {
-+    for (i = 0; i < 3; ++i) {
++        return x >> 63;
-+        cadence_timer_init(133000000, &s->timer[i]);
++    } else {
-+        sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->timer[i].irq);
++        return 0;
 +    }
 +}
 +
- static int cadence_timer_pre_save(void *opaque)
++DO_VSHRN_ALL(vshrn, DO_SHR)
- {
++DO_VSHRN_ALL(vrshrn, do_urshr)
-     cadence_timer_sync((CadenceTimerState *)opaque);
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static void cadence_ttc_class_init(ObjectClass *klass, void *data)
+index XXXXXXX..XXXXXXX 100644
-     DeviceClass *dc = DEVICE_CLASS(klass);
+--- a/target/arm/translate-mve.c
++++ b/target/arm/translate-mve.c
-     dc->vmsd = &vmstate_cadence_ttc;
+@@ -XXX,XX +XXX,XX @@ DO_VSHLL(VSHLL_BS, vshllbs)
-+    dc->realize = cadence_ttc_realize;
+ DO_VSHLL(VSHLL_BU, vshllbu)
- }
+ DO_VSHLL(VSHLL_TS, vshllts)
+ DO_VSHLL(VSHLL_TU, vshlltu)
- static const TypeInfo cadence_ttc_info = {
++
 +#define DO_2SHIFT_N(INSN, FN)                                   \
 +    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
 +    {                                                           \
 +        static MVEGenTwoOpShiftFn * const fns[] = {             \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +        };                                                      \
 +        return do_2shift(s, a, fns[a->size], false);            \
 +    }
 +
 +DO_2SHIFT_N(VSHRNB, vshrnb)
 +DO_2SHIFT_N(VSHRNT, vshrnt)
 +DO_2SHIFT_N(VRSHRNB, vrshrnb)
 +DO_2SHIFT_N(VRSHRNT, vrshrnt)
 --
 .20.1
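
The difference between the plain and rounding narrowing shifts in the patch above is easiest to see on a single element. The following is a minimal sketch for one 16-bit input narrowed to 8 bits. The names urshr_model, vshrn16_model and vrshrn16_model are hypothetical, and the shift range of 1..8 is an assumption based on the right-shift encoding comment in mve.decode. Predication and the top/bottom placement of the result are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding unsigned right shift; this sketch only covers 1 <= sh <= 63. */
static uint64_t urshr_model(uint64_t x, unsigned sh)
{
    assert(sh >= 1 && sh < 64);
    /* Adding back the last bit shifted out rounds to nearest, ties up. */
    return (x >> sh) + ((x >> (sh - 1)) & 1);
}

/* Hypothetical VSHRN model: shift, then keep the low 8 bits. */
static uint8_t vshrn16_model(uint16_t m, unsigned sh)
{
    assert(sh >= 1 && sh <= 8);
    return (uint8_t)(m >> sh);
}

/* Hypothetical VRSHRN model: rounding shift, then keep the low 8 bits. */
static uint8_t vrshrn16_model(uint16_t m, unsigned sh)
{
    assert(sh >= 1 && sh <= 8);
    return (uint8_t)urshr_model(m, sh);
}
```

Neither form saturates: rounding 0x01ff right by 1 gives 0x100, which truncates to 0x00 in the narrowed result.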

-[PULL 01/37] hw/arm: versal: Add support for the LPD ADMAs
+[PULL 18/24] target/arm: Implement MVE saturating narrowing shifts
-From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>
+Implement the MVE saturating shift-right-and-narrow insns
+VQSHRN, VQSHRUN, VQRSHRN and VQRSHRUN.
-Add support for the Versal LPD ADMAs.
+do_srshr() is borrowed from sve_helper.c.
-Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
 Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
 Reviewed-by: KONRAD Frederic <frederic.konrad@adacore.com>
 Reviewed-by: Luc Michel <luc.michel@greensocs.com>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-13-peter.maydell@linaro.org
 ---
- include/hw/arm/xlnx-versal.h |  6 ++++++
+ target/arm/helper-mve.h    |  30 +++++++++++
- hw/arm/xlnx-versal.c         | 24 ++++++++++++++++++++++++
+ target/arm/mve.decode      |  28 ++++++++++
-files changed, 30 insertions(+)
+ target/arm/mve_helper.c    | 104 +++++++++++++++++++++++++++++++++++++
+ target/arm/translate-mve.c |  12 +++++
-diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
+files changed, 174 insertions(+)
-index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/xlnx-versal.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-+++ b/include/hw/arm/xlnx-versal.h
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@
+--- a/target/arm/helper-mve.h
- #define XLNX_VERSAL_NR_ACPUS   2
++++ b/target/arm/helper-mve.h
- #define XLNX_VERSAL_NR_UARTS   2
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define XLNX_VERSAL_NR_GEMS    2
+ DEF_HELPER_FLAGS_4(mve_vrshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+#define XLNX_VERSAL_NR_ADMAS   8
+ DEF_HELPER_FLAGS_4(mve_vrshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define XLNX_VERSAL_NR_IRQS    192
+ DEF_HELPER_FLAGS_4(mve_vrshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++
- typedef struct Versal {
++DEF_HELPER_FLAGS_4(mve_vqshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
++DEF_HELPER_FLAGS_4(mve_vqshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         struct {
++DEF_HELPER_FLAGS_4(mve_vqshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-             SysBusDevice *uart[XLNX_VERSAL_NR_UARTS];
++DEF_HELPER_FLAGS_4(mve_vqshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-             SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
++
-+            SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
++DEF_HELPER_FLAGS_4(mve_vqshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         } iou;
++DEF_HELPER_FLAGS_4(mve_vqshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     } lpd;
++DEF_HELPER_FLAGS_4(mve_vqshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vqshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
++
- #define VERSAL_GEM0_WAKE_IRQ_0     57
++DEF_HELPER_FLAGS_4(mve_vqshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define VERSAL_GEM1_IRQ_0          58
++DEF_HELPER_FLAGS_4(mve_vqshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define VERSAL_GEM1_WAKE_IRQ_0     59
++DEF_HELPER_FLAGS_4(mve_vqshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+#define VERSAL_ADMA_IRQ_0          60
++DEF_HELPER_FLAGS_4(mve_vqshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++
- /* Architecturally reserved IRQs suitable for virtualization.  */
++DEF_HELPER_FLAGS_4(mve_vqrshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define VERSAL_RSVD_IRQ_FIRST 111
++DEF_HELPER_FLAGS_4(mve_vqrshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
++DEF_HELPER_FLAGS_4(mve_vqrshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define MM_GEM1                     0xff0d0000U
++DEF_HELPER_FLAGS_4(mve_vqrshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define MM_GEM1_SIZE                0x10000
++
++DEF_HELPER_FLAGS_4(mve_vqrshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+#define MM_ADMA_CH0                 0xffa80000U
++DEF_HELPER_FLAGS_4(mve_vqrshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+#define MM_ADMA_CH0_SIZE            0x10000
++DEF_HELPER_FLAGS_4(mve_vqrshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+
++DEF_HELPER_FLAGS_4(mve_vqrshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define MM_OCM                      0xfffc0000U
++
- #define MM_OCM_SIZE                 0x40000
++DEF_HELPER_FLAGS_4(mve_vqrshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vqrshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
++DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-index XXXXXXX..XXXXXXX 100644
++DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
---- a/hw/arm/xlnx-versal.c
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-+++ b/hw/arm/xlnx-versal.c
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@ static void versal_create_gems(Versal *s, qemu_irq *pic)
+--- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
  VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
  VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
  VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
 +
 +VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
 +VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
 +VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
 +VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
 +VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
 +VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
 +VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
 +VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
 +
 +VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
 +VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
 +VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
 +VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
 +
 +VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
 +VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
 +VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
 +VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
 +VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
 +VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
 +VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
 +VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
 +
 +VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
 +VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
 +VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
 +VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint64_t do_urshr(uint64_t x, unsigned sh)
      }
  }
-+static void versal_create_admas(Versal *s, qemu_irq *pic)
++static inline int64_t do_srshr(int64_t x, unsigned sh)
 +{
-+    int i;
++    if (likely(sh < 64)) {
-+
++        return (x >> sh) + ((x >> (sh - 1)) & 1);
-+    for (i = 0; i < ARRAY_SIZE(s->lpd.iou.adma); i++) {
++    } else {
-+        char *name = g_strdup_printf("adma%d", i);
++        /* Rounding the sign bit always produces 0. */
-+        DeviceState *dev;
++        return 0;
 +        MemoryRegion *mr;
 +
 +        dev = qdev_create(NULL, "xlnx.zdma");
 +        s->lpd.iou.adma[i] = SYS_BUS_DEVICE(dev);
 +        object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
 +        qdev_init_nofail(dev);
 +
 +        mr = sysbus_mmio_get_region(s->lpd.iou.adma[i], 0);
 +        memory_region_add_subregion(&s->mr_ps,
 +                                    MM_ADMA_CH0 + i * MM_ADMA_CH0_SIZE, mr);
 +
 +        sysbus_connect_irq(s->lpd.iou.adma[i], 0, pic[VERSAL_ADMA_IRQ_0 + i]);
 +        g_free(name);
 +    }
 +}
 +
- /* This takes the board allocated linear DDR memory and creates aliases
+ DO_VSHRN_ALL(vshrn, DO_SHR)
-  * for each split DDR range/aperture on the Versal address map.
+ DO_VSHRN_ALL(vrshrn, do_urshr)
-  */
++
-@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
++static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
-     versal_create_apu_gic(s, pic);
++                                 bool *satp)
-     versal_create_uarts(s, pic);
++{
-     versal_create_gems(s, pic);
++    if (val > max) {
-+    versal_create_admas(s, pic);
++        *satp = true;
-     versal_map_ddr(s);
++        return max;
-     versal_unimp(s);
++    } else if (val < min) {
++        *satp = true;
 +        return min;
 +    } else {
 +        return val;
 +    }
 +}
 +
 +/* Saturating narrowing right shifts */
 +#define DO_VSHRN_SAT(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)   \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
 +        LTYPE *m = vm;                                          \
 +        TYPE *d = vd;                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        bool qc = false;                                        \
 +        unsigned le;                                            \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
 +            bool sat = false;                                   \
 +            TYPE r = FN(m[H##LESIZE(le)], shift, &sat);         \
 +            mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
 +            qc |= sat && (mask & 1 << (TOP * ESIZE));           \
 +        }                                                       \
 +        if (qc) {                                               \
 +            env->vfp.qc[0] = qc;                                \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +    }
 +
 +#define DO_VSHRN_SAT_UB(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 1, uint8_t, 2, uint16_t, FN)       \
 +    DO_VSHRN_SAT(TOP, true, 1, uint8_t, 2, uint16_t, FN)
 +
 +#define DO_VSHRN_SAT_UH(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 2, uint16_t, 4, uint32_t, FN)      \
 +    DO_VSHRN_SAT(TOP, true, 2, uint16_t, 4, uint32_t, FN)
 +
 +#define DO_VSHRN_SAT_SB(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 1, int8_t, 2, int16_t, FN)         \
 +    DO_VSHRN_SAT(TOP, true, 1, int8_t, 2, int16_t, FN)
 +
 +#define DO_VSHRN_SAT_SH(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 2, int16_t, 4, int32_t, FN)        \
 +    DO_VSHRN_SAT(TOP, true, 2, int16_t, 4, int32_t, FN)
 +
 +#define DO_SHRN_SB(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), INT8_MIN, INT8_MAX, SATP)
 +#define DO_SHRN_UB(N, M, SATP)                                  \
 +    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT8_MAX, SATP)
 +#define DO_SHRUN_B(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), 0, UINT8_MAX, SATP)
 +
 +#define DO_SHRN_SH(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), INT16_MIN, INT16_MAX, SATP)
 +#define DO_SHRN_UH(N, M, SATP)                                  \
 +    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT16_MAX, SATP)
 +#define DO_SHRUN_H(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), 0, UINT16_MAX, SATP)
 +
 +#define DO_RSHRN_SB(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), INT8_MIN, INT8_MAX, SATP)
 +#define DO_RSHRN_UB(N, M, SATP)                                 \
 +    do_sat_bhs(do_urshr(N, M), 0, UINT8_MAX, SATP)
 +#define DO_RSHRUN_B(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), 0, UINT8_MAX, SATP)
 +
 +#define DO_RSHRN_SH(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), INT16_MIN, INT16_MAX, SATP)
 +#define DO_RSHRN_UH(N, M, SATP)                                 \
 +    do_sat_bhs(do_urshr(N, M), 0, UINT16_MAX, SATP)
 +#define DO_RSHRUN_H(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), 0, UINT16_MAX, SATP)
 +
 +DO_VSHRN_SAT_SB(vqshrnb_sb, vqshrnt_sb, DO_SHRN_SB)
 +DO_VSHRN_SAT_SH(vqshrnb_sh, vqshrnt_sh, DO_SHRN_SH)
 +DO_VSHRN_SAT_UB(vqshrnb_ub, vqshrnt_ub, DO_SHRN_UB)
 +DO_VSHRN_SAT_UH(vqshrnb_uh, vqshrnt_uh, DO_SHRN_UH)
 +DO_VSHRN_SAT_SB(vqshrunbb, vqshruntb, DO_SHRUN_B)
 +DO_VSHRN_SAT_SH(vqshrunbh, vqshrunth, DO_SHRUN_H)
 +
 +DO_VSHRN_SAT_SB(vqrshrnb_sb, vqrshrnt_sb, DO_RSHRN_SB)
 +DO_VSHRN_SAT_SH(vqrshrnb_sh, vqrshrnt_sh, DO_RSHRN_SH)
 +DO_VSHRN_SAT_UB(vqrshrnb_ub, vqrshrnt_ub, DO_RSHRN_UB)
 +DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
 +DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
 +DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_N(VSHRNB, vshrnb)
  DO_2SHIFT_N(VSHRNT, vshrnt)
  DO_2SHIFT_N(VRSHRNB, vrshrnb)
  DO_2SHIFT_N(VRSHRNT, vrshrnt)
 +DO_2SHIFT_N(VQSHRNB_S, vqshrnb_s)
 +DO_2SHIFT_N(VQSHRNT_S, vqshrnt_s)
 +DO_2SHIFT_N(VQSHRNB_U, vqshrnb_u)
 +DO_2SHIFT_N(VQSHRNT_U, vqshrnt_u)
 +DO_2SHIFT_N(VQSHRUNB, vqshrunb)
 +DO_2SHIFT_N(VQSHRUNT, vqshrunt)
 +DO_2SHIFT_N(VQRSHRNB_S, vqrshrnb_s)
 +DO_2SHIFT_N(VQRSHRNT_S, vqrshrnt_s)
 +DO_2SHIFT_N(VQRSHRNB_U, vqrshrnb_u)
 +DO_2SHIFT_N(VQRSHRNT_U, vqrshrnt_u)
 +DO_2SHIFT_N(VQRSHRUNB, vqrshrunb)
 +DO_2SHIFT_N(VQRSHRUNT, vqrshrunt)
 --
 .20.1
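
The per-element arithmetic behind the saturating, rounding narrowing shifts in the patch above can be sketched in isolation. This is a minimal standalone illustration, not the QEMU helper: the names srshr_sketch, sat_sketch, qrshrn_s16_sketch and qrshrn_s16_selfcheck are hypothetical, predication and the QC flag update are left out, and the code assumes a shift count of at least 1 and an arithmetic right shift on negative values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Signed rounding right shift, following the shape of do_srshr.
 * Assumes 1 <= sh and that >> on a negative value is an arithmetic
 * shift (implementation-defined in ISO C).
 */
int64_t srshr_sketch(int64_t x, unsigned sh)
{
    if (sh < 64) {
        /* Add back the last bit shifted out, i.e. round half up. */
        return (x >> sh) + ((x >> (sh - 1)) & 1);
    }
    /* Rounding the sign bit always produces 0. */
    return 0;
}

/* Clamp val to [min, max] and record whether clamping happened. */
int32_t sat_sketch(int64_t val, int64_t min, int64_t max, bool *satp)
{
    if (val > max) {
        *satp = true;
        return (int32_t)max;
    } else if (val < min) {
        *satp = true;
        return (int32_t)min;
    }
    return (int32_t)val;
}

/* Hypothetical per-element wrapper: 16-bit signed input, 8-bit result. */
int8_t qrshrn_s16_sketch(int16_t n, unsigned shift, bool *satp)
{
    return (int8_t)sat_sketch(srshr_sketch(n, shift),
                              INT8_MIN, INT8_MAX, satp);
}

/* Small worked cases; sat is sticky, like the QC flag it stands in for. */
void qrshrn_s16_selfcheck(void)
{
    bool sat = false;

    assert(qrshrn_s16_sketch(5, 1, &sat) == 3 && !sat);     /* 2.5 -> 3 */
    assert(qrshrn_s16_sketch(-5, 1, &sat) == -2 && !sat);   /* -2.5 -> -2 */
    assert(qrshrn_s16_sketch(1000, 2, &sat) == 127 && sat); /* 250 clamps */
}
```

Keeping the intermediate value in 64 bits means the rounding addition cannot overflow before the clamp is applied, which is presumably why the helpers in the patch also widen to int64_t first.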

-[PULL 20/37] target/arm: Honor the HCR_EL2.{TVM,TRVM} bits
+[PULL 19/24] target/arm: Implement MVE VSHLC
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE VSHLC insn, which performs a shift left of the
 entire vector with carry in bits provided from a general purpose
 register and carry out bits written back to that register.
-These bits trap EL1 access to various virtual memory controls.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-14-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  2 ++
  target/arm/mve.decode      |  2 ++
  target/arm/mve_helper.c    | 38 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c | 30 ++++++++++++++++++++++++++++++
 files changed, 72 insertions(+)
-Buglink: https://bugs.launchpad.net/bugs/1855072
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200229012811.24129-7-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.c | 82 ++++++++++++++++++++++++++++++---------------
 file changed, 55 insertions(+), 27 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tpm(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     return CP_ACCESS_OK;
+ DEF_HELPER_FLAGS_4(mve_vqrshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- }
+ DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+/* Check for traps from EL1 due to HCR_EL2.TVM and HCR_EL2.TRVM.  */
++
-+static CPAccessResult access_tvm_trvm(CPUARMState *env, const ARMCPRegInfo *ri,
++DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
-+                                      bool isread)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
  VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
  VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
  VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
 +
 +VSHLC             111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VSHRN_SAT_UB(vqrshrnb_ub, vqrshrnt_ub, DO_RSHRN_UB)
  DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
  DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
  DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
 +
 +uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
 +                           uint32_t shift)
 +{
-+    if (arm_current_el(env) == 1) {
++    uint32_t *d = vd;
-+        uint64_t trap = isread ? HCR_TRVM : HCR_TVM;
++    uint16_t mask = mve_element_mask(env);
-+        if (arm_hcr_el2_eff(env) & trap) {
++    unsigned e;
-+            return CP_ACCESS_TRAP_EL2;
++    uint32_t r;
 +
 +    /*
 +     * For each 32-bit element, we shift it left, bringing in the
 +     * low 'shift' bits of rdm at the bottom. Bits shifted out at
 +     * the top become the new rdm, if the predicate mask permits.
 +     * The final rdm value is returned to update the register.
 +     * shift == 0 here means "shift by 32 bits".
 +     */
 +    if (shift == 0) {
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {
 +            r = rdm;
 +            if (mask & 1) {
 +                rdm = d[H4(e)];
 +            }
 +            mergemask(&d[H4(e)], r, mask);
 +        }
 +    } else {
 +        uint32_t shiftmask = MAKE_64BIT_MASK(0, shift);
 +
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {
 +            r = (d[H4(e)] << shift) | (rdm & shiftmask);
 +            if (mask & 1) {
 +                rdm = d[H4(e)] >> (32 - shift);
 +            }
 +            mergemask(&d[H4(e)], r, mask);
 +        }
 +    }
-+    return CP_ACCESS_OK;
++    mve_advance_vpt(env);
 +    return rdm;
 +}
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-mve.c
++++ b/target/arm/translate-mve.c
+@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_N(VQRSHRNB_U, vqrshrnb_u)
+ DO_2SHIFT_N(VQRSHRNT_U, vqrshrnt_u)
+ DO_2SHIFT_N(VQRSHRUNB, vqrshrunb)
+ DO_2SHIFT_N(VQRSHRUNT, vqrshrunt)
 +
- static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
++static bool trans_VSHLC(DisasContext *s, arg_VSHLC *a)
- {
++{
-     ARMCPU *cpu = env_archcpu(env);
++    /*
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo cp_reginfo[] = {
++     * Whole Vector Left Shift with Carry. The carry is taken
-      */
++     * from a general purpose register and written back there.
-     { .name = "CONTEXTIDR_EL1", .state = ARM_CP_STATE_BOTH,
++     * An imm of 0 means "shift by 32".
-       .opc0 = 3, .opc1 = 0, .crn = 13, .crm = 0, .opc2 = 1,
++     */
--      .access = PL1_RW, .secure = ARM_CP_SECSTATE_NS,
++    TCGv_ptr qd;
-+      .access = PL1_RW, .accessfn = access_tvm_trvm,
++    TCGv_i32 rdm;
-+      .secure = ARM_CP_SECSTATE_NS,
++
-       .fieldoffset = offsetof(CPUARMState, cp15.contextidr_el[1]),
++    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd)) {
-       .resetvalue = 0, .writefn = contextidr_write, .raw_writefn = raw_write, },
++        return false;
-     { .name = "CONTEXTIDR_S", .state = ARM_CP_STATE_AA32,
++    }
-       .cp = 15, .opc1 = 0, .crn = 13, .crm = 0, .opc2 = 1,
++    if (a->rdm == 13 || a->rdm == 15) {
--      .access = PL1_RW, .secure = ARM_CP_SECSTATE_S,
++        /* CONSTRAINED UNPREDICTABLE: we UNDEF */
-+      .access = PL1_RW, .accessfn = access_tvm_trvm,
++        return false;
-+      .secure = ARM_CP_SECSTATE_S,
++    }
-       .fieldoffset = offsetof(CPUARMState, cp15.contextidr_s),
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
-       .resetvalue = 0, .writefn = contextidr_write, .raw_writefn = raw_write, },
++        return true;
-     REGINFO_SENTINEL
++    }
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo not_v8_cp_reginfo[] = {
++
-     /* MMU Domain access control / MPU write buffer control */
++    qd = mve_qreg_ptr(a->qd);
-     { .name = "DACR",
++    rdm = load_reg(s, a->rdm);
-       .cp = 15, .opc1 = CP_ANY, .crn = 3, .crm = CP_ANY, .opc2 = CP_ANY,
++    gen_helper_mve_vshlc(rdm, cpu_env, qd, rdm, tcg_constant_i32(a->imm));
--      .access = PL1_RW, .resetvalue = 0,
++    store_reg(s, a->rdm, rdm);
-+      .access = PL1_RW, .accessfn = access_tvm_trvm, .resetvalue = 0,
++    tcg_temp_free_ptr(qd);
-       .writefn = dacr_write, .raw_writefn = raw_write,
++    mve_update_eci(s);
-       .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.dacr_s),
++    return true;
-                              offsetoflow32(CPUARMState, cp15.dacr_ns) } },
++}
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
      { .name = "DMB", .cp = 15, .crn = 7, .crm = 10, .opc1 = 0, .opc2 = 5,
        .access = PL0_W, .type = ARM_CP_NOP },
      { .name = "IFAR", .cp = 15, .crn = 6, .crm = 0, .opc1 = 0, .opc2 = 2,
 -      .access = PL1_RW,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
        .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ifar_s),
                               offsetof(CPUARMState, cp15.ifar_ns) },
        .resetvalue = 0, },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
       */
      { .name = "AFSR0_EL1", .state = ARM_CP_STATE_BOTH,
        .opc0 = 3, .opc1 = 0, .crn = 5, .crm = 1, .opc2 = 0,
 -      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .type = ARM_CP_CONST, .resetvalue = 0 },
      { .name = "AFSR1_EL1", .state = ARM_CP_STATE_BOTH,
        .opc0 = 3, .opc1 = 0, .crn = 5, .crm = 1, .opc2 = 1,
 -      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .type = ARM_CP_CONST, .resetvalue = 0 },
      /* MAIR can just read-as-written because we don't implement caches
       * and so don't need to care about memory attributes.
       */
      { .name = "MAIR_EL1", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 0,
 -      .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.mair_el[1]),
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .fieldoffset = offsetof(CPUARMState, cp15.mair_el[1]),
        .resetvalue = 0 },
      { .name = "MAIR_EL3", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 6, .crn = 10, .crm = 2, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
        * handled in the field definitions.
        */
      { .name = "MAIR0", .state = ARM_CP_STATE_AA32,
 -      .cp = 15, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 0, .access = PL1_RW,
 +      .cp = 15, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 0,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
        .bank_fieldoffsets = { offsetof(CPUARMState, cp15.mair0_s),
                               offsetof(CPUARMState, cp15.mair0_ns) },
        .resetfn = arm_cp_reset_ignore },
      { .name = "MAIR1", .state = ARM_CP_STATE_AA32,
 -      .cp = 15, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 1, .access = PL1_RW,
 +      .cp = 15, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 1,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
        .bank_fieldoffsets = { offsetof(CPUARMState, cp15.mair1_s),
                               offsetof(CPUARMState, cp15.mair1_ns) },
        .resetfn = arm_cp_reset_ignore },
@@ -XXX,XX +XXX,XX @@ static void vttbr_write(CPUARMState *env, const ARMCPRegInfo *ri,
  static const ARMCPRegInfo vmsa_pmsa_cp_reginfo[] = {
      { .name = "DFSR", .cp = 15, .crn = 5, .crm = 0, .opc1 = 0, .opc2 = 0,
 -      .access = PL1_RW, .type = ARM_CP_ALIAS,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm, .type = ARM_CP_ALIAS,
        .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.dfsr_s),
                               offsetoflow32(CPUARMState, cp15.dfsr_ns) }, },
      { .name = "IFSR", .cp = 15, .crn = 5, .crm = 0, .opc1 = 0, .opc2 = 1,
 -      .access = PL1_RW, .resetvalue = 0,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm, .resetvalue = 0,
        .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.ifsr_s),
                               offsetoflow32(CPUARMState, cp15.ifsr_ns) } },
      { .name = "DFAR", .cp = 15, .opc1 = 0, .crn = 6, .crm = 0, .opc2 = 0,
 -      .access = PL1_RW, .resetvalue = 0,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm, .resetvalue = 0,
        .bank_fieldoffsets = { offsetof(CPUARMState, cp15.dfar_s),
                               offsetof(CPUARMState, cp15.dfar_ns) } },
      { .name = "FAR_EL1", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .crn = 6, .crm = 0, .opc1 = 0, .opc2 = 0,
 -      .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.far_el[1]),
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .fieldoffset = offsetof(CPUARMState, cp15.far_el[1]),
        .resetvalue = 0, },
      REGINFO_SENTINEL
  };
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_pmsa_cp_reginfo[] = {
  static const ARMCPRegInfo vmsa_cp_reginfo[] = {
      { .name = "ESR_EL1", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .crn = 5, .crm = 2, .opc1 = 0, .opc2 = 0,
 -      .access = PL1_RW,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
        .fieldoffset = offsetof(CPUARMState, cp15.esr_el[1]), .resetvalue = 0, },
      { .name = "TTBR0_EL1", .state = ARM_CP_STATE_BOTH,
        .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 0,
 -      .access = PL1_RW, .writefn = vmsa_ttbr_write, .resetvalue = 0,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .writefn = vmsa_ttbr_write, .resetvalue = 0,
        .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
                               offsetof(CPUARMState, cp15.ttbr0_ns) } },
      { .name = "TTBR1_EL1", .state = ARM_CP_STATE_BOTH,
        .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 1,
 -      .access = PL1_RW, .writefn = vmsa_ttbr_write, .resetvalue = 0,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .writefn = vmsa_ttbr_write, .resetvalue = 0,
        .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr1_s),
                               offsetof(CPUARMState, cp15.ttbr1_ns) } },
      { .name = "TCR_EL1", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .crn = 2, .crm = 0, .opc1 = 0, .opc2 = 2,
 -      .access = PL1_RW, .writefn = vmsa_tcr_el12_write,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .writefn = vmsa_tcr_el12_write,
        .resetfn = vmsa_ttbcr_reset, .raw_writefn = raw_write,
        .fieldoffset = offsetof(CPUARMState, cp15.tcr_el[1]) },
      { .name = "TTBCR", .cp = 15, .crn = 2, .crm = 0, .opc1 = 0, .opc2 = 2,
 -      .access = PL1_RW, .type = ARM_CP_ALIAS, .writefn = vmsa_ttbcr_write,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .type = ARM_CP_ALIAS, .writefn = vmsa_ttbcr_write,
        .raw_writefn = vmsa_ttbcr_raw_write,
        .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.tcr_el[3]),
                               offsetoflow32(CPUARMState, cp15.tcr_el[1])} },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
   */
  static const ARMCPRegInfo ttbcr2_reginfo = {
      .name = "TTBCR2", .cp = 15, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 3,
 -    .access = PL1_RW, .type = ARM_CP_ALIAS,
 +    .access = PL1_RW, .accessfn = access_tvm_trvm,
 +    .type = ARM_CP_ALIAS,
      .bank_fieldoffsets = { offsetofhigh32(CPUARMState, cp15.tcr_el[3]),
                             offsetofhigh32(CPUARMState, cp15.tcr_el[1]) },
  };
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lpae_cp_reginfo[] = {
      /* NOP AMAIR0/1 */
      { .name = "AMAIR0", .state = ARM_CP_STATE_BOTH,
        .opc0 = 3, .crn = 10, .crm = 3, .opc1 = 0, .opc2 = 0,
 -      .access = PL1_RW, .type = ARM_CP_CONST,
 -      .resetvalue = 0 },
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .type = ARM_CP_CONST, .resetvalue = 0 },
      /* AMAIR1 is mapped to AMAIR_EL1[63:32] */
      { .name = "AMAIR1", .cp = 15, .crn = 10, .crm = 3, .opc1 = 0, .opc2 = 1,
 -      .access = PL1_RW, .type = ARM_CP_CONST,
 -      .resetvalue = 0 },
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .type = ARM_CP_CONST, .resetvalue = 0 },
      { .name = "PAR", .cp = 15, .crm = 7, .opc1 = 0,
        .access = PL1_RW, .type = ARM_CP_64BIT, .resetvalue = 0,
        .bank_fieldoffsets = { offsetof(CPUARMState, cp15.par_s),
                               offsetof(CPUARMState, cp15.par_ns)} },
      { .name = "TTBR0", .cp = 15, .crm = 2, .opc1 = 0,
 -      .access = PL1_RW, .type = ARM_CP_64BIT | ARM_CP_ALIAS,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .type = ARM_CP_64BIT | ARM_CP_ALIAS,
        .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
                               offsetof(CPUARMState, cp15.ttbr0_ns) },
        .writefn = vmsa_ttbr_write, },
      { .name = "TTBR1", .cp = 15, .crm = 2, .opc1 = 1,
 -      .access = PL1_RW, .type = ARM_CP_64BIT | ARM_CP_ALIAS,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm,
 +      .type = ARM_CP_64BIT | ARM_CP_ALIAS,
        .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr1_s),
                               offsetof(CPUARMState, cp15.ttbr1_ns) },
        .writefn = vmsa_ttbr_write, },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
        .type = ARM_CP_NOP, .access = PL1_W },
      /* MMU Domain access control / MPU write buffer control */
      { .name = "DACR", .cp = 15, .opc1 = 0, .crn = 3, .crm = 0, .opc2 = 0,
 -      .access = PL1_RW, .resetvalue = 0,
 +      .access = PL1_RW, .accessfn = access_tvm_trvm, .resetvalue = 0,
        .writefn = dacr_write, .raw_writefn = raw_write,
        .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.dacr_s),
                               offsetoflow32(CPUARMState, cp15.dacr_ns) } },
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
          ARMCPRegInfo sctlr = {
              .name = "SCTLR", .state = ARM_CP_STATE_BOTH,
              .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 0,
 -            .access = PL1_RW,
 +            .access = PL1_RW, .accessfn = access_tvm_trvm,
              .bank_fieldoffsets = { offsetof(CPUARMState, cp15.sctlr_s),
                                     offsetof(CPUARMState, cp15.sctlr_ns) },
              .writefn = sctlr_write, .resetvalue = cpu->reset_sctlr,
 --
 .20.1
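
The carry chain described in the VSHLC patch above can be sketched without any of the QEMU plumbing. This is a simplified illustration under stated assumptions: vshlc_sketch and vshlc_selfcheck are hypothetical names, the vector is a plain array of four 32-bit elements with element 0 lowest, every element is treated as active (no predicate mask), and, as in the helper, a shift of 0 stands for "shift by 32 bits".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift a 128-bit vector left as four 32-bit elements. The low bits
 * of each element are filled from 'carry'; the bits shifted out of
 * the top of each element become the carry into the next one. The
 * final carry is returned. shift must be in the range 0..31, where
 * 0 means a shift by 32.
 */
uint32_t vshlc_sketch(uint32_t vec[4], uint32_t carry, unsigned shift)
{
    for (int e = 0; e < 4; e++) {
        uint32_t old = vec[e];

        if (shift == 0) {
            /* Whole element is replaced by the incoming carry. */
            vec[e] = carry;
            carry = old;
        } else {
            uint32_t low_mask = (UINT32_C(1) << shift) - 1;

            vec[e] = (old << shift) | (carry & low_mask);
            carry = old >> (32 - shift);
        }
    }
    return carry;
}

void vshlc_selfcheck(void)
{
    uint32_t v[4] = { 0x80000001, 0, 0, 0xF0000000 };
    uint32_t out = vshlc_sketch(v, 0x3, 4);

    assert(v[0] == 0x13 && v[1] == 0x8 && v[2] == 0 && v[3] == 0);
    assert(out == 0xF);
}
```

Handling the shift-by-32 case separately avoids a C shift by the full width of the type, which would be undefined behaviour; the helper in the patch makes the same split.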

-[PULL 12/37] hw/arm/spitz: move timer_new from init() into realize() to avoid memleaks
+[PULL 20/24] target/arm: Implement MVE VADDLV
-From: Pan Nengyuan <pannengyuan@huawei.com>
+Implement the MVE VADDLV insn; this is similar to VADDV, except
 that it accumulates 32-bit elements into a 64-bit accumulator
 stored in a pair of general-purpose registers.
-There are some memleaks when we call 'device_list_properties'. This patch moves timer_new from init into realize to fix it.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-15-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  3 ++
  target/arm/mve.decode      |  6 +++-
  target/arm/mve_helper.c    | 19 ++++++++++++
  target/arm/translate-mve.c | 63 ++++++++++++++++++++++++++++++++++++++
 files changed, 90 insertions(+), 1 deletion(-)
-Reported-by: Euler Robot <euler.robot@huawei.com>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
 Message-id: 20200227025055.14341-4-pannengyuan@huawei.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  hw/arm/spitz.c | 8 +++++++-
 file changed, 7 insertions(+), 1 deletion(-)
 diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/spitz.c
+--- a/target/arm/helper-mve.h
-+++ b/hw/arm/spitz.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void spitz_keyboard_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
+ DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
-     spitz_keyboard_pre_map(s);
+ DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
--    s->kbdtimer = timer_new_ns(QEMU_CLOCK_VIRTUAL, spitz_keyboard_tick, s);
++DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
-     qdev_init_gpio_in(dev, spitz_keyboard_strobe, SPITZ_KEY_STROBE_NUM);
++DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
-     qdev_init_gpio_out(dev, s->sense, SPITZ_KEY_SENSE_NUM);
++
  DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
  DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
  DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
  # Vector add across vector
 -VADDV            111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
 +{
 +  VADDV          111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
 +  VADDLV         111 u:1 1110 1 ... 1001 ... 0 1111 00 a:1 0 qm:3 0 \
 +                 rdahi=%rdahi rdalo=%rdalo
 +}
  # Predicate operations
  %mask_22_13      22:1 13:3
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvub, 1, uint8_t)
  DO_VADDV(vaddvuh, 2, uint16_t)
  DO_VADDV(vaddvuw, 4, uint32_t)
 +#define DO_VADDLV(OP, TYPE, LTYPE)                              \
 +    uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
 +                                    uint64_t ra)                \
 +    {                                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        TYPE *m = vm;                                           \
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {              \
 +            if (mask & 1) {                                     \
 +                ra += (LTYPE)m[H4(e)];                          \
 +            }                                                   \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +        return ra;                                              \
 +    }                                                           \
 +
 +DO_VADDLV(vaddlv_s, int32_t, int64_t)
 +DO_VADDLV(vaddlv_u, uint32_t, uint64_t)
 +
  /* Shifts by immediate */
  #define DO_2SHIFT(OP, ESIZE, TYPE, FN)                          \
      void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
      return true;
  }
-+static void spitz_keyboard_realize(DeviceState *dev, Error **errp)
++static bool trans_VADDLV(DisasContext *s, arg_VADDLV *a)
 +{
-+    SpitzKeyboardState *s = SPITZ_KEYBOARD(dev);
++    /*
-+    s->kbdtimer = timer_new_ns(QEMU_CLOCK_VIRTUAL, spitz_keyboard_tick, s);
++     * Vector Add Long Across Vector: accumulate the 32-bit
 +     * elements of the vector into a 64-bit result stored in
 +     * a pair of general-purpose registers.
 +     * No need to check Qm's bank: it is only 3 bits in decode.
 +     */
 +    TCGv_ptr qm;
 +    TCGv_i64 rda;
 +    TCGv_i32 rdalo, rdahi;
 +
 +    if (!dc_isar_feature(aa32_mve, s)) {
 +        return false;
 +    }
 +    /*
 +     * rdahi == 13 is UNPREDICTABLE; rdahi == 15 is a related
 +     * encoding; rdalo always has bit 0 clear so cannot be 13 or 15.
 +     */
 +    if (a->rdahi == 13 || a->rdahi == 15) {
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    /*
 +     * This insn is subject to beat-wise execution. Partial execution
 +     * of an A=0 (no-accumulate) insn which does not execute the first
 +     * beat must start with the current value of RdaHi:RdaLo, not zero.
 +     */
 +    if (a->a || mve_skip_first_beat(s)) {
 +        /* Accumulate input from RdaHi:RdaLo */
 +        rda = tcg_temp_new_i64();
 +        rdalo = load_reg(s, a->rdalo);
 +        rdahi = load_reg(s, a->rdahi);
 +        tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
 +        tcg_temp_free_i32(rdalo);
 +        tcg_temp_free_i32(rdahi);
 +    } else {
 +        /* Accumulate starting at zero */
 +        rda = tcg_const_i64(0);
 +    }
 +
 +    qm = mve_qreg_ptr(a->qm);
 +    if (a->u) {
 +        gen_helper_mve_vaddlv_u(rda, cpu_env, qm, rda);
 +    } else {
 +        gen_helper_mve_vaddlv_s(rda, cpu_env, qm, rda);
 +    }
 +    tcg_temp_free_ptr(qm);
 +
 +    rdalo = tcg_temp_new_i32();
 +    rdahi = tcg_temp_new_i32();
 +    tcg_gen_extrl_i64_i32(rdalo, rda);
 +    tcg_gen_extrh_i64_i32(rdahi, rda);
 +    store_reg(s, a->rdalo, rdalo);
 +    store_reg(s, a->rdahi, rdahi);
 +    tcg_temp_free_i64(rda);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
- /* LCD backlight controller */
+ static bool do_1imm(DisasContext *s, arg_1imm *a, MVEGenOneOpImmFn *fn)
+ {
- #define LCDTG_RESCTL    0x00
+     TCGv_ptr qd;
@@ -XXX,XX +XXX,XX @@ static void spitz_keyboard_class_init(ObjectClass *klass, void *data)
      DeviceClass *dc = DEVICE_CLASS(klass);
      dc->vmsd = &vmstate_spitz_kbd;
 +    dc->realize = spitz_keyboard_realize;
  }
  static const TypeInfo spitz_keyboard_info = {
 --
 .20.1
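
The accumulation performed by the VADDLV patch above can be sketched as plain C. This is a minimal illustration, not the QEMU helper: the function names are hypothetical, all four 32-bit elements are treated as active (no predicate mask), and the RdaHi:RdaLo register pair is modelled as two uint32_t values joined by a small helper.

```c
#include <assert.h>
#include <stdint.h>

/* Join a register pair into one 64-bit accumulator (hi:lo). */
uint64_t concat_hi_lo_sketch(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Signed variant: each element is sign-extended before it is added. */
uint64_t vaddlv_s_sketch(const int32_t m[4], uint64_t ra)
{
    for (int e = 0; e < 4; e++) {
        ra += (uint64_t)(int64_t)m[e];
    }
    return ra;
}

/* Unsigned variant: each element is zero-extended before it is added. */
uint64_t vaddlv_u_sketch(const uint32_t m[4], uint64_t ra)
{
    for (int e = 0; e < 4; e++) {
        ra += (uint64_t)m[e];
    }
    return ra;
}

void vaddlv_selfcheck(void)
{
    const int32_t s[4] = { -1, -1, -1, -1 };
    const uint32_t u[4] = { 0xFFFFFFFFu, 0xFFFFFFFFu,
                            0xFFFFFFFFu, 0xFFFFFFFFu };

    /* Same bit patterns, different results: -4 versus 4 * (2^32 - 1). */
    assert(vaddlv_s_sketch(s, 0) == UINT64_C(0xFFFFFFFFFFFFFFFC));
    assert(vaddlv_u_sketch(u, 0) == UINT64_C(0x3FFFFFFFC));
}
```

Passing 0 as the starting accumulator corresponds to the non-accumulating form of the insn; passing concat_hi_lo_sketch(rdalo, rdahi) corresponds to the accumulating form, which is also what the translator falls back to when the first beat has already been executed.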

-[PULL 03/37] target/arm: Implement (trivially) ARMv8.2-TTCNP
+[PULL 21/24] target/arm: Implement MVE long shifts by immediate
-The ARMv8.2-TTCNP extension allows an implementation to optimize by
+The MVE extension to v8.1M includes some new shift instructions which
-sharing TLB entries between multiple cores, provided that software
+sit entirely within the non-coprocessor part of the encoding space
-declares that it's ready to deal with this by setting a CnP bit in
+and which operate only on general-purpose registers.  They take up
-the TTBRn_ELx.  It is mandatory from ARMv8.2 onward.
+the space which was previously UNPREDICTABLE MOVS and ORRS encodings
+with Rm == 13 or 15.
-For QEMU's TLB implementation, sharing TLB entries between different
-cores would not really benefit us and would be a lot of work to
+Implement the long shifts by immediate, which perform shifts on a
-implement.  So we implement this extension in the "trivial" manner:
+pair of general-purpose registers treated as a 64-bit quantity, with
-we allow the guest to set and read back the CnP bit, but don't change
+an immediate shift count between 1 and 32.
-our behaviour (this is an architecturally valid implementation
-choice).
+Awkwardly, because the MOVS and ORRS trans functions do not UNDEF for
+the Rm==13,15 case, we need to explicitly emit code to UNDEF for the
-The only code path which looks at the TTBRn_ELx values for the
+cases where v8.1M now requires that.  (Trying to change MOVS and ORRS
-long-descriptor format where the CnP bit is defined is already doing
+is too difficult, because the functions that generate the code are
-enough masking to not get confused when the CnP bit at the bottom of
+shared between a dozen different kinds of arithmetic or logical
-the register is set, so we can simply add a comment noting why we're
+instruction for all A32, T16 and T32 encodings, and for some insns
-relying on that mask.
+and some encodings Rm==13,15 are valid.)
 We make the helper functions we need for UQSHLL and SQSHLL take
 a 32-bit value which the helper casts to int8_t because we'll need
 these helpers also for the shift-by-register insns, where the shift
 count might be < 0 or > 32.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200225193822.18874-1-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-16-peter.maydell@linaro.org
 ---
- target/arm/cpu.c    | 1 +
+ target/arm/helper-mve.h |  3 ++
- target/arm/cpu64.c  | 2 ++
+ target/arm/translate.h  |  1 +
- target/arm/helper.c | 4 ++++
+ target/arm/t32.decode   | 28 +++++++++++++
- 3 files changed, 7 insertions(+)
+ target/arm/mve_helper.c | 10 +++++
+ target/arm/translate.c  | 90 +++++++++++++++++++++++++++++++++++++++++
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+ 5 files changed, 132 insertions(+)
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-+++ b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
+--- a/target/arm/helper-mve.h
-             t = cpu->isar.id_mmfr4;
++++ b/target/arm/helper-mve.h
-             t = FIELD_DP32(t, ID_MMFR4, HPDS, 1); /* AA32HPD */
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-             t = FIELD_DP32(t, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
+ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+            t = FIELD_DP32(t, ID_MMFR4, CNP, 1); /* TTCNP */
-             cpu->isar.id_mmfr4 = t;
+ DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
-         }
++
- #endif
++DEF_HELPER_FLAGS_3(mve_sqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
-diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
++DEF_HELPER_FLAGS_3(mve_uqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
-index XXXXXXX..XXXXXXX 100644
+diff --git a/target/arm/translate.h b/target/arm/translate.h
---- a/target/arm/cpu64.c
+index XXXXXXX..XXXXXXX 100644
-+++ b/target/arm/cpu64.c
+--- a/target/arm/translate.h
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
++++ b/target/arm/translate.h
+@@ -XXX,XX +XXX,XX @@ typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
-         t = cpu->isar.id_aa64mmfr2;
+ typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
-         t = FIELD_DP64(t, ID_AA64MMFR2, UAO, 1);
+ typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-+        t = FIELD_DP64(t, ID_AA64MMFR2, CNP, 1); /* TTCNP */
+ typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
-         cpu->isar.id_aa64mmfr2 = t;
++typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
-         /* Replicate the same data to the 32-bit id registers.  */
+ /**
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
+  * arm_tbflags_from_tb:
-         u = cpu->isar.id_mmfr4;
+diff --git a/target/arm/t32.decode b/target/arm/t32.decode
-         u = FIELD_DP32(u, ID_MMFR4, HPDS, 1); /* AA32HPD */
+index XXXXXXX..XXXXXXX 100644
-         u = FIELD_DP32(u, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
+--- a/target/arm/t32.decode
-+        u = FIELD_DP32(t, ID_MMFR4, CNP, 1); /* TTCNP */
++++ b/target/arm/t32.decode
-         cpu->isar.id_mmfr4 = u;
+@@ -XXX,XX +XXX,XX @@
+ &mcr             !extern cp opc1 crn crm opc2 rt
-         u = cpu->isar.id_aa64dfr0;
+ &mcrr            !extern cp opc1 crm rt rt2
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
++&mve_shl_ri      rdalo rdahi shim
---- a/target/arm/helper.c
++
-+++ b/target/arm/helper.c
++# rdahi: bits [3:1] from insn, bit 0 is 1
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
++# rdalo: bits [3:1] from insn, bit 0 is 0
++%rdahi_9 9:3 !function=times_2_plus_1
-     /* Now we can extract the actual base address from the TTBR */
++%rdalo_17 17:3 !function=times_2
-     descaddr = extract64(ttbr, 0, 48);
++
-+    /*
+ # Data-processing (register)
-+     * We rely on this masking to clear the RES0 bits at the bottom of the TTBR
-+     * and also to mask out CnP (bit 0) which could validly be non-zero.
+ %imm5_12_6       12:3 6:2
-+     */
+@@ -XXX,XX +XXX,XX @@
-     descaddr &= ~indexmask;
+ @S_xrr_shi       ....... .... .   rn:4 .... .... .. shty:2 rm:4 \
+                  &s_rrr_shi shim=%imm5_12_6 s=1 rd=0
-     /* The address field in the descriptor goes up to bit 39 for ARMv7
 +@mve_shl_ri      ....... .... . ... . . ... ... . .. .. .... \
 +                 &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
 +
  {
    TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
    AND_rrri       1110101 0000 . .... 0 ... .... .... ....     @s_rrr_shi
  }
  BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
  {
 +  # The v8.1M MVE shift insns overlap in encoding with MOVS/ORRS
 +  # and are distinguished by having Rm==13 or 15. Those are UNPREDICTABLE
 +  # cases for MOVS/ORRS. We decode the MVE cases first, ensuring that
 +  # they explicitly call unallocated_encoding() for cases that must UNDEF
 +  # (eg "using a new shift insn on a v8.1M CPU without MVE"), and letting
 +  # the rest fall through (where ORR_rrri and MOV_rxri will end up
 +  # handling them as r13 and r15 accesses with the same semantics as A32).
 +  [
 +    LSLL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
 +    LSRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
 +    ASRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
 +
 +    UQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
 +    URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
 +    SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
 +    SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
 +  ]
 +
    MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
    ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi
  }
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
      mve_advance_vpt(env);
      return rdm;
  }
 +
 +uint64_t HELPER(mve_sqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_sqrshl_d(n, (int8_t)shift, false, &env->QF);
 +}
 +
 +uint64_t HELPER(mve_uqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_uqrshl_d(n, (int8_t)shift, false, &env->QF);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_MOVT(DisasContext *s, arg_MOVW *a)
      return true;
  }
 +/*
 + * v8.1M MVE wide-shifts
 + */
 +static bool do_mve_shl_ri(DisasContext *s, arg_mve_shl_ri *a,
 +                          WideShiftImmFn *fn)
 +{
 +    TCGv_i64 rda;
 +    TCGv_i32 rdalo, rdahi;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
 +        return false;
 +    }
 +    if (a->rdahi == 15) {
 +        /* These are a different encoding (SQSHL/SRSHR/UQSHL/URSHR) */
 +        return false;
 +    }
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
 +        a->rdahi == 13) {
 +        /* RdaHi == 13 is UNPREDICTABLE; we choose to UNDEF */
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
 +    if (a->shim == 0) {
 +        a->shim = 32;
 +    }
 +
 +    rda = tcg_temp_new_i64();
 +    rdalo = load_reg(s, a->rdalo);
 +    rdahi = load_reg(s, a->rdahi);
 +    tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
 +
 +    fn(rda, rda, a->shim);
 +
 +    tcg_gen_extrl_i64_i32(rdalo, rda);
 +    tcg_gen_extrh_i64_i32(rdahi, rda);
 +    store_reg(s, a->rdalo, rdalo);
 +    store_reg(s, a->rdahi, rdahi);
 +    tcg_temp_free_i64(rda);
 +
 +    return true;
 +}
 +
 +static bool trans_ASRL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, tcg_gen_sari_i64);
 +}
 +
 +static bool trans_LSLL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, tcg_gen_shli_i64);
 +}
 +
 +static bool trans_LSRL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, tcg_gen_shri_i64);
 +}
 +
 +static void gen_mve_sqshll(TCGv_i64 r, TCGv_i64 n, int64_t shift)
 +{
 +    gen_helper_mve_sqshll(r, cpu_env, n, tcg_constant_i32(shift));
 +}
 +
 +static bool trans_SQSHLL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, gen_mve_sqshll);
 +}
 +
 +static void gen_mve_uqshll(TCGv_i64 r, TCGv_i64 n, int64_t shift)
 +{
 +    gen_helper_mve_uqshll(r, cpu_env, n, tcg_constant_i32(shift));
 +}
 +
 +static bool trans_UQSHLL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, gen_mve_uqshll);
 +}
 +
 +static bool trans_SRSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, gen_srshr64_i64);
 +}
 +
 +static bool trans_URSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, gen_urshr64_i64);
 +}
 +
  /*
   * Multiply and multiply accumulate
   */
 --
 .20.1
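
As a reading aid for the patch above, the long shifts by immediate can be sketched in plain C. This is only a model of the described semantics, not QEMU code: `WideShiftOp`, `pair_concat` and `wide_shift_imm` are hypothetical names, and `times_2` / `times_2_plus_1` are assumed to behave as their names in the decode file suggest. The sketch covers only LSLL, LSRL and ASRL; the saturating and rounding variants are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed behaviour of the decodetree !function helpers named in the patch. */
static int times_2(int x) { return x * 2; }            /* RdaLo: even register */
static int times_2_plus_1(int x) { return x * 2 + 1; } /* RdaHi: odd register */

/* Hypothetical names, for illustration only. */
typedef enum { WIDE_LSLL, WIDE_LSRL, WIDE_ASRL } WideShiftOp;

/* Treat RdaHi:RdaLo as one 64-bit quantity. */
static uint64_t pair_concat(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/*
 * shim is the raw 5-bit immediate field; an encoded 0 means a shift of 32,
 * so the effective shift count is always in the range 1..32.
 */
static uint64_t wide_shift_imm(WideShiftOp op, uint64_t rda, unsigned shim)
{
    assert(shim < 32);
    unsigned shift = shim ? shim : 32;

    switch (op) {
    case WIDE_LSLL:
        return rda << shift;
    case WIDE_LSRL:
        return rda >> shift;
    case WIDE_ASRL:
        /* Arithmetic shift: refill the vacated top bits with the sign bit. */
        return (rda >> shift) | ((rda >> 63) ? ~(UINT64_MAX >> shift) : 0);
    }
    return rda;
}
```

The result would then be split back into the low and high halves and written to RdaLo and RdaHi, which is what the extract-low/extract-high pair does in `do_mve_shl_ri()`.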

-[PULL 05/37] hw/arm/smmu-common: Simplify smmu_find_smmu_pcibus() logic
+Deleted patch
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
-The smmu_find_smmu_pcibus() function was introduced (in commit
-cac994ef43b) in a code format that could return an incorrect
-pointer, which was then fixed by the previous commit.
-We could have avoided this by writing the if() statement
-differently. Do it now, in case this function is re-used.
-The code is easier to review (harder to miss bugs).
-Acked-by: Eric Auger <eric.auger@redhat.com>
-Reviewed-by: Peter Xu <peterx@redhat.com>
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/arm/smmu-common.c | 25 +++++++++++++------------
- 1 file changed, 13 insertions(+), 12 deletions(-)
-diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/smmu-common.c
-+++ b/hw/arm/smmu-common.c
-@@ -XXX,XX +XXX,XX @@ inline int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
- SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num)
- {
-     SMMUPciBus *smmu_pci_bus = s->smmu_pcibus_by_bus_num[bus_num];
-+    GHashTableIter iter;
--    if (!smmu_pci_bus) {
--        GHashTableIter iter;
--
--        g_hash_table_iter_init(&iter, s->smmu_pcibus_by_busptr);
--        while (g_hash_table_iter_next(&iter, NULL, (void **)&smmu_pci_bus)) {
--            if (pci_bus_num(smmu_pci_bus->bus) == bus_num) {
--                s->smmu_pcibus_by_bus_num[bus_num] = smmu_pci_bus;
--                return smmu_pci_bus;
--            }
--        }
--        smmu_pci_bus = NULL;
-+    if (smmu_pci_bus) {
-+        return smmu_pci_bus;
-     }
--    return smmu_pci_bus;
-+
-+    g_hash_table_iter_init(&iter, s->smmu_pcibus_by_busptr);
-+    while (g_hash_table_iter_next(&iter, NULL, (void **)&smmu_pci_bus)) {
-+        if (pci_bus_num(smmu_pci_bus->bus) == bus_num) {
-+            s->smmu_pcibus_by_bus_num[bus_num] = smmu_pci_bus;
-+            return smmu_pci_bus;
-+        }
-+    }
-+
-+    return NULL;
- }
- static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
---
-.20.1
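
The shape of the rewritten lookup in the patch above (return the cached entry early, scan otherwise, and return NULL explicitly at the end) can be sketched without the GLib hash table. Everything here is hypothetical and simplified: `Bus`, `State` and `find_bus` are illustrative names, and a plain array stands in for `smmu_pcibus_by_busptr`.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for SMMUPciBus and SMMUState. */
typedef struct {
    int bus_num;
} Bus;

typedef struct {
    Bus *by_bus_num[256]; /* cache indexed by bus number */
    Bus *all[8];          /* stand-in for the by-pointer hash table */
    size_t n;
} State;

static Bus *find_bus(State *s, unsigned char bus_num)
{
    Bus *bus = s->by_bus_num[bus_num];

    if (bus) {
        return bus; /* fast path: already cached */
    }

    for (size_t i = 0; i < s->n; i++) {
        if (s->all[i]->bus_num == bus_num) {
            s->by_bus_num[bus_num] = s->all[i];
            return s->all[i];
        }
    }

    /*
     * The loop variable is never returned after the loop, so a leftover
     * pointer from the last iteration cannot escape on a miss.
     */
    return NULL;
}
```

With this layout the "not found" result does not depend on remembering to reset a variable after the loop, which appears to be the class of mistake the commit message refers to.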

-[PULL 07/37] hw/arm/mainstone: Simplify since the machines are little-endian only
+Deleted patch
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
-We only build the little-endian softmmu configurations. Checking
-for big endian is pointless, remove the unused code.
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/arm/mainstone.c | 8 +-------
- 1 file changed, 1 insertion(+), 7 deletions(-)
-diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mainstone.c
-+++ b/hw/arm/mainstone.c
-@@ -XXX,XX +XXX,XX @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
-     DeviceState *mst_irq;
-     DriveInfo *dinfo;
-     int i;
--    int be;
-     MemoryRegion *rom = g_new(MemoryRegion, 1);
-     /* Setup CPU & memory */
-@@ -XXX,XX +XXX,XX @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
-     memory_region_set_readonly(rom, true);
-     memory_region_add_subregion(address_space_mem, 0, rom);
--#ifdef TARGET_WORDS_BIGENDIAN
--    be = 1;
--#else
--    be = 0;
--#endif
-     /* There are two 32MiB flash devices on the board */
-     for (i = 0; i < 2; i ++) {
-         dinfo = drive_get(IF_PFLASH, 0, i);
-@@ -XXX,XX +XXX,XX @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
-                                    i ? "mainstone.flash1" : "mainstone.flash0",
-                                    MAINSTONE_FLASH,
-                                    dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
--                                   sector_len, 4, 0, 0, 0, 0, be)) {
-+                                   sector_len, 4, 0, 0, 0, 0, 0)) {
-             error_report("Error registering flash memory");
-             exit(1);
-         }
---
-.20.1

-[PULL 08/37] hw/arm/omap_sx1: Simplify since the machines are little-endian only
+Deleted patch
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
-We only build the little-endian softmmu configurations. Checking
-for big endian is pointless, remove the unused code.
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/arm/omap_sx1.c | 11 ++---------
- 1 file changed, 2 insertions(+), 9 deletions(-)
-diff --git a/hw/arm/omap_sx1.c b/hw/arm/omap_sx1.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/omap_sx1.c
-+++ b/hw/arm/omap_sx1.c
-@@ -XXX,XX +XXX,XX @@ static void sx1_init(MachineState *machine, const int version)
-     DriveInfo *dinfo;
-     int fl_idx;
-     uint32_t flash_size = flash0_size;
--    int be;
-     if (machine->ram_size != mc->default_ram_size) {
-         char *sz = size_to_str(mc->default_ram_size);
-@@ -XXX,XX +XXX,XX @@ static void sx1_init(MachineState *machine, const int version)
-                                 OMAP_CS2_BASE, &cs[3]);
-     fl_idx = 0;
--#ifdef TARGET_WORDS_BIGENDIAN
--    be = 1;
--#else
--    be = 0;
--#endif
--
-     if ((dinfo = drive_get(IF_PFLASH, 0, fl_idx)) != NULL) {
-         if (!pflash_cfi01_register(OMAP_CS0_BASE,
-                                    "omap_sx1.flash0-1", flash_size,
-                                    blk_by_legacy_dinfo(dinfo),
--                                   sector_size, 4, 0, 0, 0, 0, be)) {
-+                                   sector_size, 4, 0, 0, 0, 0, 0)) {
-             fprintf(stderr, "qemu: Error registering flash memory %d.\n",
-                            fl_idx);
-         }
-@@ -XXX,XX +XXX,XX @@ static void sx1_init(MachineState *machine, const int version)
-         if (!pflash_cfi01_register(OMAP_CS1_BASE,
-                                    "omap_sx1.flash1-1", flash1_size,
-                                    blk_by_legacy_dinfo(dinfo),
--                                   sector_size, 4, 0, 0, 0, 0, be)) {
-+                                   sector_size, 4, 0, 0, 0, 0, 0)) {
-             fprintf(stderr, "qemu: Error registering flash memory %d.\n",
-                            fl_idx);
-         }
---
-.20.1

-[PULL 09/37] hw/arm/z2: Simplify since the machines are little-endian only
+Deleted patch
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
-We only build the little-endian softmmu configurations. Checking
-for big endian is pointless, remove the unused code.
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/arm/z2.c | 8 +-------
- 1 file changed, 1 insertion(+), 7 deletions(-)
-diff --git a/hw/arm/z2.c b/hw/arm/z2.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/z2.c
-+++ b/hw/arm/z2.c
-@@ -XXX,XX +XXX,XX @@ static void z2_init(MachineState *machine)
-     uint32_t sector_len = 0x10000;
-     PXA2xxState *mpu;
-     DriveInfo *dinfo;
--    int be;
-     void *z2_lcd;
-     I2CBus *bus;
-     DeviceState *wm;
-@@ -XXX,XX +XXX,XX @@ static void z2_init(MachineState *machine)
-     /* Setup CPU & memory */
-     mpu = pxa270_init(address_space_mem, z2_binfo.ram_size, machine->cpu_type);
--#ifdef TARGET_WORDS_BIGENDIAN
--    be = 1;
--#else
--    be = 0;
--#endif
-     dinfo = drive_get(IF_PFLASH, 0, 0);
-     if (!pflash_cfi01_register(Z2_FLASH_BASE, "z2.flash0", Z2_FLASH_SIZE,
-                                dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
--                               sector_len, 4, 0, 0, 0, 0, be)) {
-+                               sector_len, 4, 0, 0, 0, 0, 0)) {
-         error_report("Error registering flash memory");
-         exit(1);
-     }
---
-.20.1

-[PULL 10/37] hw/arm/musicpal: Simplify since the machines are little-endian only
+Deleted patch
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
-We only build the little-endian softmmu configurations. Checking
-for big endian is pointless, remove the unused code.
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/arm/musicpal.c | 10 ----------
- 1 file changed, 10 deletions(-)
-diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/musicpal.c
-+++ b/hw/arm/musicpal.c
-@@ -XXX,XX +XXX,XX @@ static void musicpal_init(MachineState *machine)
-          * 0xFF800000 (if there is 8 MB flash). So remap flash access if the
-          * image is smaller than 32 MB.
-          */
--#ifdef TARGET_WORDS_BIGENDIAN
--        pflash_cfi02_register(0x100000000ULL - MP_FLASH_SIZE_MAX,
--                              "musicpal.flash", flash_size,
--                              blk, 0x10000,
--                              MP_FLASH_SIZE_MAX / flash_size,
--                              2, 0x00BF, 0x236D, 0x0000, 0x0000,
--                              0x5555, 0x2AAA, 1);
--#else
-         pflash_cfi02_register(0x100000000ULL - MP_FLASH_SIZE_MAX,
-                               "musicpal.flash", flash_size,
-                               blk, 0x10000,
-                               MP_FLASH_SIZE_MAX / flash_size,
-                               2, 0x00BF, 0x236D, 0x0000, 0x0000,
-                               0x5555, 0x2AAA, 0);
--#endif
--
-     }
-     sysbus_create_simple(TYPE_MV88W8618_FLASHCFG, MP_FLASHCFG_BASE, NULL);
---
-.20.1

-[PULL 19/37] target/arm: Improve masking in arm_hcr_el2_eff
+[PULL 22/24] target/arm: Implement MVE long shifts by register
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE long shifts by register, which perform shifts on a
+pair of general-purpose registers treated as a 64-bit quantity, with
-Update the {TGE,E2H} == '11' masking to ARMv8.6.
+the shift count in another general-purpose register, which might be
-If EL2 is configured for aarch32, disable all of
+either positive or negative.
-the bits that are RES0 in aarch32 mode.
+Like the long-shifts-by-immediate, these encodings sit in the space
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+that was previously the UNPREDICTABLE MOVS/ORRS with Rm==13,15.
-Message-id: 20200229012811.24129-6-richard.henderson@linaro.org
+Because LSLL_rr and ASRL_rr overlap with both MOV_rxri/ORR_rrri and
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+also with CSEL (as one of the previously-UNPREDICTABLE Rm==13 cases),
 we have to move the CSEL pattern into the same decodetree group.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-17-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 31 +++++++++++++++++++++++++++----
+ target/arm/helper-mve.h |  6 +++
- 1 file changed, 27 insertions(+), 4 deletions(-)
+ target/arm/translate.h  |  1 +
+ target/arm/t32.decode   | 16 +++++--
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+ target/arm/mve_helper.c | 93 +++++++++++++++++++++++++++++++++++++++++
-index XXXXXXX..XXXXXXX 100644
+ target/arm/translate.c  | 69 ++++++++++++++++++++++++++++++
---- a/target/arm/helper.c
+ 5 files changed, 182 insertions(+), 3 deletions(-)
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-          * Since the v8.4 language applies to the entire register, and
+index XXXXXXX..XXXXXXX 100644
-          * appears to be backward compatible, use that.
+--- a/target/arm/helper-mve.h
-          */
++++ b/target/arm/helper-mve.h
--        ret = 0;
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
--    } else if (ret & HCR_TGE) {
--        /* These bits are up-to-date as of ARMv8.4.  */
+ DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_sshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +DEF_HELPER_FLAGS_3(mve_ushll, TCG_CALL_NO_RWG, i64, env, i64, i32)
  DEF_HELPER_FLAGS_3(mve_sqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
  DEF_HELPER_FLAGS_3(mve_uqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +DEF_HELPER_FLAGS_3(mve_sqrshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +DEF_HELPER_FLAGS_3(mve_uqrshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +DEF_HELPER_FLAGS_3(mve_sqrshrl48, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
  typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
 +typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
  /**
   * arm_tbflags_from_tb:
 diff --git a/target/arm/t32.decode b/target/arm/t32.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/t32.decode
 +++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@
  &mcrr            !extern cp opc1 crm rt rt2
  &mve_shl_ri      rdalo rdahi shim
 +&mve_shl_rr      rdalo rdahi rm
  # rdahi: bits [3:1] from insn, bit 0 is 1
  # rdalo: bits [3:1] from insn, bit 0 is 0
@@ -XXX,XX +XXX,XX @@
  @mve_shl_ri      ....... .... . ... . . ... ... . .. .. .... \
                   &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
 +@mve_shl_rr      ....... .... . ... . rm:4  ... . .. .. .... \
 +                 &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
  {
    TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
      URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
      SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
      SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
 +
 +    LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
 +    ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
 +    UQRSHLL64_rr 1110101 0010 1 ... 1 ....  ... 1  0000 1101  @mve_shl_rr
 +    SQRSHRL64_rr 1110101 0010 1 ... 1 ....  ... 1  0010 1101  @mve_shl_rr
 +    UQRSHLL48_rr 1110101 0010 1 ... 1 ....  ... 1  1000 1101  @mve_shl_rr
 +    SQRSHRL48_rr 1110101 0010 1 ... 1 ....  ... 1  1010 1101  @mve_shl_rr
    ]
    MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
    ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi
 +
 +  # v8.1M CSEL and friends
 +  CSEL           1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
  }
  {
    MVN_rxri       1110101 0011 . 1111 0 ... .... .... ....     @s_rxr_shi
@@ -XXX,XX +XXX,XX @@ SBC_rrri         1110101 1011 . .... 0 ... .... .... ....     @s_rrr_shi
  }
  RSB_rrri         1110101 1110 . .... 0 ... .... .... ....     @s_rrr_shi
 -# v8.1M CSEL and friends
 -CSEL             1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
 -
  # Data-processing (register-shifted register)
  MOV_rxrr         1111 1010 0 shty:2 s:1 rm:4 1111 rd:4 0000 rs:4 \
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
      return rdm;
  }
 +uint64_t HELPER(mve_sshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_sqrshl_d(n, -(int8_t)shift, false, NULL);
 +}
 +
 +uint64_t HELPER(mve_ushll)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_uqrshl_d(n, (int8_t)shift, false, NULL);
 +}
 +
  uint64_t HELPER(mve_sqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
  {
      return do_sqrshl_d(n, (int8_t)shift, false, &env->QF);
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
  {
      return do_uqrshl_d(n, (int8_t)shift, false, &env->QF);
  }
 +
 +uint64_t HELPER(mve_sqrshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_sqrshl_d(n, -(int8_t)shift, true, &env->QF);
 +}
 +
 +uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_uqrshl_d(n, (int8_t)shift, true, &env->QF);
 +}
 +
 +/* Operate on 64-bit values, but saturate at 48 bits */
 +static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
 +                                    bool round, uint32_t *sat)
 +{
 +    if (shift <= -48) {
 +        /* Rounding the sign bit always produces 0. */
 +        if (round) {
 +            return 0;
 +        }
 +        return src >> 63;
 +    } else if (shift < 0) {
 +        if (round) {
 +            src >>= -shift - 1;
 +            return (src >> 1) + (src & 1);
 +        }
 +        return src >> -shift;
 +    } else if (shift < 48) {
 +        int64_t val = src << shift;
 +        int64_t extval = sextract64(val, 0, 48);
 +        if (!sat || val == extval) {
 +            return extval;
 +        }
 +    } else if (!sat || src == 0) {
 +        return 0;
 +    }
 +
-+    /*
++    *sat = 1;
-+     * For a cpu that supports both aarch64 and aarch32, we can set bits
++    return (1ULL << 47) - (src >= 0);
-+     * in HCR_EL2 (e.g. via EL3) that are RES0 when we enter EL2 as aa32.
++}
-+     * Ignore all of the bits in HCR+HCR2 that are not valid for aarch32.
++
-+     */
++/* Operate on 64-bit values, but saturate at 48 bits */
-+    if (!arm_el_is_aa64(env, 2)) {
++static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
-+        uint64_t aa32_valid;
++                                     bool round, uint32_t *sat)
-+
++{
-+        /*
++    uint64_t val, extval;
-+         * These bits are up-to-date as of ARMv8.6.
++
-+         * For HCR, it's easiest to list just the 2 bits that are invalid.
++    if (shift <= -(48 + round)) {
-+         * For HCR2, list those that are valid.
++        return 0;
-+         */
++    } else if (shift < 0) {
-+        aa32_valid = MAKE_64BIT_MASK(0, 32) & ~(HCR_RW | HCR_TDZ);
++        if (round) {
-+        aa32_valid |= (HCR_CD | HCR_ID | HCR_TERR | HCR_TEA | HCR_MIOCNCE |
++            val = src >> (-shift - 1);
-+                       HCR_TID4 | HCR_TICAB | HCR_TOCU | HCR_TTLBIS);
++            val = (val >> 1) + (val & 1);
-+        ret &= aa32_valid;
++        } else {
-+    }
++            val = src >> -shift;
-+
++        }
-+    if (ret & HCR_TGE) {
++        extval = extract64(val, 0, 48);
-+        /* These bits are up-to-date as of ARMv8.6.  */
++        if (!sat || val == extval) {
-         if (ret & HCR_E2H) {
++            return extval;
-             ret &= ~(HCR_VM | HCR_FMO | HCR_IMO | HCR_AMO |
++        }
-                      HCR_BSU_MASK | HCR_DC | HCR_TWI | HCR_TWE |
++    } else if (shift < 48) {
-                      HCR_TID0 | HCR_TID2 | HCR_TPCP | HCR_TPU |
++        uint64_t val = src << shift;
--                     HCR_TDZ | HCR_CD | HCR_ID | HCR_MIOCNCE);
++        uint64_t extval = extract64(val, 0, 48);
-+                     HCR_TDZ | HCR_CD | HCR_ID | HCR_MIOCNCE |
++        if (!sat || val == extval) {
-+                     HCR_TID4 | HCR_TICAB | HCR_TOCU | HCR_ENSCXT |
++            return extval;
-+                     HCR_TTLBIS | HCR_TTLBOS | HCR_TID5);
++        }
-         } else {
++    } else if (!sat || src == 0) {
-             ret |= HCR_FMO | HCR_IMO | HCR_AMO;
++        return 0;
-         }
++    }
 +
 +    *sat = 1;
 +    return MAKE_64BIT_MASK(0, 48);
 +}
 +
 +uint64_t HELPER(mve_sqrshrl48)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_sqrshl48_d(n, -(int8_t)shift, true, &env->QF);
 +}
 +
 +uint64_t HELPER(mve_uqrshll48)(CPUARMState *env, uint64_t n, uint32_t shift)
 +{
 +    return do_uqrshl48_d(n, (int8_t)shift, true, &env->QF);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_URSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
      return do_mve_shl_ri(s, a, gen_urshr64_i64);
  }
 +static bool do_mve_shl_rr(DisasContext *s, arg_mve_shl_rr *a, WideShiftFn *fn)
 +{
 +    TCGv_i64 rda;
 +    TCGv_i32 rdalo, rdahi;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
 +        return false;
 +    }
 +    if (a->rdahi == 15) {
 +        /* These are a different encoding (SQSHL/SRSHR/UQSHL/URSHR) */
 +        return false;
 +    }
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
 +        a->rdahi == 13 || a->rm == 13 || a->rm == 15 ||
 +        a->rm == a->rdahi || a->rm == a->rdalo) {
 +        /* These rdahi/rdalo/rm cases are UNPREDICTABLE; we choose to UNDEF */
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
 +    rda = tcg_temp_new_i64();
 +    rdalo = load_reg(s, a->rdalo);
 +    rdahi = load_reg(s, a->rdahi);
 +    tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
 +
 +    /* The helper takes care of the sign-extension of the low 8 bits of Rm */
 +    fn(rda, cpu_env, rda, cpu_R[a->rm]);
 +
 +    tcg_gen_extrl_i64_i32(rdalo, rda);
 +    tcg_gen_extrh_i64_i32(rdahi, rda);
 +    store_reg(s, a->rdalo, rdalo);
 +    store_reg(s, a->rdahi, rdahi);
 +    tcg_temp_free_i64(rda);
 +
 +    return true;
 +}
 +
 +static bool trans_LSLL_rr(DisasContext *s, arg_mve_shl_rr *a)
 +{
 +    return do_mve_shl_rr(s, a, gen_helper_mve_ushll);
 +}
 +
 +static bool trans_ASRL_rr(DisasContext *s, arg_mve_shl_rr *a)
 +{
 +    return do_mve_shl_rr(s, a, gen_helper_mve_sshrl);
 +}
 +
 +static bool trans_UQRSHLL64_rr(DisasContext *s, arg_mve_shl_rr *a)
 +{
 +    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll);
 +}
 +
 +static bool trans_SQRSHRL64_rr(DisasContext *s, arg_mve_shl_rr *a)
 +{
 +    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl);
 +}
 +
 +static bool trans_UQRSHLL48_rr(DisasContext *s, arg_mve_shl_rr *a)
 +{
 +    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll48);
 +}
 +
 +static bool trans_SQRSHRL48_rr(DisasContext *s, arg_mve_shl_rr *a)
 +{
 +    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl48);
 +}
 +
  /*
   * Multiply and multiply accumulate
   */
 --
 .20.1
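
To make the shift-count convention in the patch above concrete, here is a small C model of the two non-saturating by-register forms. It is a sketch based on how the `mve_ushll` and `mve_sshrl` helpers are written in the diff (the count is the sign-extended low 8 bits of Rm, and the ASRL helper negates it), not code taken from QEMU; `sext8`, `lsll_by_reg` and `asrl_by_reg` are hypothetical names.

```c
#include <stdint.h>

/* Sign-extend the low 8 bits of Rm, giving a count in -128..127. */
static int sext8(uint32_t rm)
{
    int v = (int)(rm & 0xff);
    return v >= 0x80 ? v - 0x100 : v;
}

/* LSLL (register): a positive count shifts left, a negative count shifts right. */
static uint64_t lsll_by_reg(uint64_t n, uint32_t rm)
{
    int count = sext8(rm);

    if (count >= 64 || count <= -64) {
        return 0; /* everything is shifted out */
    }
    return count >= 0 ? n << count : n >> -count;
}

/* ASRL (register): a positive count shifts right, a negative count shifts left. */
static uint64_t asrl_by_reg(uint64_t n, uint32_t rm)
{
    int count = sext8(rm);
    uint64_t sign = (n >> 63) ? UINT64_MAX : 0;

    if (count >= 64) {
        return sign; /* only copies of the sign bit remain */
    }
    if (count <= -64) {
        return 0;
    }
    if (count < 0) {
        return n << -count;
    }
    if (count == 0) {
        return n;
    }
    /* Arithmetic right shift built from a logical shift plus sign fill. */
    return (n >> count) | (sign << (64 - count));
}
```

The saturating variants (UQRSHLL, SQRSHRL and their 48-bit forms) follow the same count convention but additionally round and set QF on overflow; they are not modelled here.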

-[PULL 11/37] hw/arm/pxa2xx: move timer_new from init() into realize() to avoid memleaks
+[PULL 23/24] target/arm: Implement MVE shifts by immediate
-From: Pan Nengyuan <pannengyuan@huawei.com>
+Implement the MVE shifts by immediate, which perform shifts
+on a single general-purpose register.
-There are some memleaks when we call 'device_list_properties'. This patch moves timer_new from init into realize to fix it.
+These patterns overlap with the long-shift-by-immediates,
-Reported-by: Euler Robot <euler.robot@huawei.com>
+so we have to rearrange the grouping a little here.
-Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
 Message-id: 20200227025055.14341-3-pannengyuan@huawei.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-18-peter.maydell@linaro.org
 ---
- hw/arm/pxa2xx.c | 17 +++++++++++------
+ target/arm/helper-mve.h |  3 ++
-file changed, 11 insertions(+), 6 deletions(-)
+ target/arm/translate.h  |  1 +
+ target/arm/t32.decode   | 31 ++++++++++++++-----
-diff --git a/hw/arm/pxa2xx.c b/hw/arm/pxa2xx.c
+ target/arm/mve_helper.c | 10 ++++++
-index XXXXXXX..XXXXXXX 100644
+ target/arm/translate.c  | 68 +++++++++++++++++++++++++++++++++++++++--
---- a/hw/arm/pxa2xx.c
+files changed, 104 insertions(+), 9 deletions(-)
-+++ b/hw/arm/pxa2xx.c
-@@ -XXX,XX +XXX,XX @@ static void pxa2xx_rtc_init(Object *obj)
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-     s->last_rtcpicr = 0;
+index XXXXXXX..XXXXXXX 100644
-     s->last_hz = s->last_sw = s->last_pi = qemu_clock_get_ms(rtc_clock);
+--- a/target/arm/helper-mve.h
++++ b/target/arm/helper-mve.h
-+    sysbus_init_irq(dev, &s->rtc_irq);
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_sqrshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
-+
+ DEF_HELPER_FLAGS_3(mve_uqrshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
-+    memory_region_init_io(&s->iomem, obj, &pxa2xx_rtc_ops, s,
+ DEF_HELPER_FLAGS_3(mve_sqrshrl48, TCG_CALL_NO_RWG, i64, env, i64, i32)
-+                          "pxa2xx-rtc", 0x10000);
+ DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
-+    sysbus_init_mmio(dev, &s->iomem);
++
-+}
++DEF_HELPER_FLAGS_3(mve_uqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
-+
++DEF_HELPER_FLAGS_3(mve_sqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
-+static void pxa2xx_rtc_realize(DeviceState *dev, Error **errp)
+diff --git a/target/arm/translate.h b/target/arm/translate.h
-+{
+index XXXXXXX..XXXXXXX 100644
-+    PXA2xxRTCState *s = PXA2XX_RTC(dev);
+--- a/target/arm/translate.h
-     s->rtc_hz    = timer_new_ms(rtc_clock, pxa2xx_rtc_hz_tick,    s);
++++ b/target/arm/translate.h
-     s->rtc_rdal1 = timer_new_ms(rtc_clock, pxa2xx_rtc_rdal1_tick, s);
+@@ -XXX,XX +XXX,XX @@ typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
-     s->rtc_rdal2 = timer_new_ms(rtc_clock, pxa2xx_rtc_rdal2_tick, s);
+ typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
-     s->rtc_swal1 = timer_new_ms(rtc_clock, pxa2xx_rtc_swal1_tick, s);
+ typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
-     s->rtc_swal2 = timer_new_ms(rtc_clock, pxa2xx_rtc_swal2_tick, s);
+ typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
-     s->rtc_pi    = timer_new_ms(rtc_clock, pxa2xx_rtc_pi_tick,    s);
++typedef void ShiftImmFn(TCGv_i32, TCGv_i32, int32_t shift);
--
--    sysbus_init_irq(dev, &s->rtc_irq);
+ /**
--
+  * arm_tbflags_from_tb:
--    memory_region_init_io(&s->iomem, obj, &pxa2xx_rtc_ops, s,
+diff --git a/target/arm/t32.decode b/target/arm/t32.decode
--                          "pxa2xx-rtc", 0x10000);
+index XXXXXXX..XXXXXXX 100644
--    sysbus_init_mmio(dev, &s->iomem);
+--- a/target/arm/t32.decode
 +++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@
  &mve_shl_ri      rdalo rdahi shim
  &mve_shl_rr      rdalo rdahi rm
 +&mve_sh_ri       rda shim
  # rdahi: bits [3:1] from insn, bit 0 is 1
  # rdalo: bits [3:1] from insn, bit 0 is 0
@@ -XXX,XX +XXX,XX @@
                   &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
  @mve_shl_rr      ....... .... . ... . rm:4  ... . .. .. .... \
                   &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
 +@mve_sh_ri       ....... .... . rda:4 . ... ... . .. .. .... \
 +                 &mve_sh_ri shim=%imm5_12_6
  {
    TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
    # the rest fall through (where ORR_rrri and MOV_rxri will end up
    # handling them as r13 and r15 accesses with the same semantics as A32).
    [
 -    LSLL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
 -    LSRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
 -    ASRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
 +    {
 +      UQSHL_ri   1110101 0010 1 ....  0 ...  1111 .. 00 1111  @mve_sh_ri
 +      LSLL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
 +      UQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
 +    }
 -    UQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
 -    URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
 -    SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
 -    SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
 +    {
 +      URSHR_ri   1110101 0010 1 ....  0 ...  1111 .. 01 1111  @mve_sh_ri
 +      LSRL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
 +      URSHRL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
 +    }
 +
 +    {
 +      SRSHR_ri   1110101 0010 1 ....  0 ...  1111 .. 10 1111  @mve_sh_ri
 +      ASRL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
 +      SRSHRL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
 +    }
 +
 +    {
 +      SQSHL_ri   1110101 0010 1 ....  0 ...  1111 .. 11 1111  @mve_sh_ri
 +      SQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
 +    }
      LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
      ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqrshll48)(CPUARMState *env, uint64_t n, uint32_t shift)
  {
      return do_uqrshl48_d(n, (int8_t)shift, true, &env->QF);
  }
++
- static int pxa2xx_rtc_pre_save(void *opaque)
++uint32_t HELPER(mve_uqshl)(CPUARMState *env, uint32_t n, uint32_t shift)
-@@ -XXX,XX +XXX,XX @@ static void pxa2xx_rtc_sysbus_class_init(ObjectClass *klass, void *data)
++{
++    return do_uqrshl_bhs(n, (int8_t)shift, 32, false, &env->QF);
-     dc->desc = "PXA2xx RTC Controller";
++}
-     dc->vmsd = &vmstate_pxa2xx_rtc_regs;
++
-+    dc->realize = pxa2xx_rtc_realize;
++uint32_t HELPER(mve_sqshl)(CPUARMState *env, uint32_t n, uint32_t shift)
 +{
 +    return do_sqrshl_bhs(n, (int8_t)shift, 32, false, &env->QF);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_srshr16_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh)
  static void gen_srshr32_i32(TCGv_i32 d, TCGv_i32 a, int32_t sh)
  {
 -    TCGv_i32 t = tcg_temp_new_i32();
 +    TCGv_i32 t;
 +    /* Handle shift by the input size for the benefit of trans_SRSHR_ri */
 +    if (sh == 32) {
 +        tcg_gen_movi_i32(d, 0);
 +        return;
 +    }
 +    t = tcg_temp_new_i32();
      tcg_gen_extract_i32(t, a, sh - 1, 1);
      tcg_gen_sari_i32(d, a, sh);
      tcg_gen_add_i32(d, d, t);
@@ -XXX,XX +XXX,XX @@ static void gen_urshr16_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh)
  static void gen_urshr32_i32(TCGv_i32 d, TCGv_i32 a, int32_t sh)
  {
 -    TCGv_i32 t = tcg_temp_new_i32();
 +    TCGv_i32 t;
 +    /* Handle shift by the input size for the benefit of trans_URSHR_ri */
 +    if (sh == 32) {
 +        tcg_gen_extract_i32(d, a, sh - 1, 1);
 +        return;
 +    }
 +    t = tcg_temp_new_i32();
      tcg_gen_extract_i32(t, a, sh - 1, 1);
      tcg_gen_shri_i32(d, a, sh);
      tcg_gen_add_i32(d, d, t);
@@ -XXX,XX +XXX,XX @@ static bool trans_SQRSHRL48_rr(DisasContext *s, arg_mve_shl_rr *a)
      return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl48);
  }
- static const TypeInfo pxa2xx_rtc_sysbus_info = {
++static bool do_mve_sh_ri(DisasContext *s, arg_mve_sh_ri *a, ShiftImmFn *fn)
 +{
 +    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
 +        return false;
 +    }
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
 +        a->rda == 13 || a->rda == 15) {
 +        /* These rda cases are UNPREDICTABLE; we choose to UNDEF */
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
 +    if (a->shim == 0) {
 +        a->shim = 32;
 +    }
 +    fn(cpu_R[a->rda], cpu_R[a->rda], a->shim);
 +
 +    return true;
 +}
 +
 +static bool trans_URSHR_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_urshr32_i32);
 +}
 +
 +static bool trans_SRSHR_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_srshr32_i32);
 +}
 +
 +static void gen_mve_sqshl(TCGv_i32 r, TCGv_i32 n, int32_t shift)
 +{
 +    gen_helper_mve_sqshl(r, cpu_env, n, tcg_constant_i32(shift));
 +}
 +
 +static bool trans_SQSHL_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_mve_sqshl);
 +}
 +
 +static void gen_mve_uqshl(TCGv_i32 r, TCGv_i32 n, int32_t shift)
 +{
 +    gen_helper_mve_uqshl(r, cpu_env, n, tcg_constant_i32(shift));
 +}
 +
 +static bool trans_UQSHL_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_mve_uqshl);
 +}
 +
  /*
   * Multiply and multiply accumulate
   */
 --
 .20.1
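
The shift-by-32 special cases added to gen_urshr32_i32() and gen_srshr32_i32() in the patch above can be checked against a widened reference computation. The sketch below is plain C rather than TCG, and the function names are illustrative only. It assumes the rounding shift is "add half of the discarded range, then shift", and that right-shifting a negative signed value is arithmetic (implementation-defined in ISO C, but true for the compilers QEMU targets).

```c
#include <stdint.h>

/* A shim field of 0 encodes a shift of 32, as in do_mve_sh_ri(). */
static unsigned decode_shim(unsigned shim)
{
    return shim == 0 ? 32 : shim;
}

/* Unsigned rounding shift right, valid for sh in 1..32. */
static uint32_t urshr32_ref(uint32_t a, unsigned sh)
{
    /* Widen to 64 bits so that sh == 32 needs no special case. */
    return (uint32_t)(((uint64_t)a + (UINT64_C(1) << (sh - 1))) >> sh);
}

/* Signed rounding shift right, valid for sh in 1..32. */
static int32_t srshr32_ref(int32_t a, unsigned sh)
{
    /* Assumes an arithmetic right shift for negative values. */
    return (int32_t)(((int64_t)a + (INT64_C(1) << (sh - 1))) >> sh);
}
```

With sh == 32 the unsigned form reduces to bit 31 of the input and the signed form is always 0, which matches the two early returns in the patch.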

-[PULL 15/37] target/arm: Improve masking of HCR/HCR2 RES0 bits
+[PULL 24/24] target/arm: Implement MVE shifts by register
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE shifts by register, which perform
 shifts on a single general-purpose register.
-Don't merely start with v8.0, handle v7VE as well.  Ensure that writes
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-from aarch32 mode do not change bits in the other half of the register.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Protect reads of aa64 id registers with ARM_FEATURE_AARCH64.
+Message-id: 20210628135835.6690-19-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h |  2 ++
  target/arm/translate.h  |  1 +
  target/arm/t32.decode   | 18 ++++++++++++++----
  target/arm/mve_helper.c | 10 ++++++++++
  target/arm/translate.c  | 30 ++++++++++++++++++++++++++++++
 files changed, 57 insertions(+), 4 deletions(-)
-Suggested-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200229012811.24129-2-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.c | 38 +++++++++++++++++++++++++-------------
 file changed, 25 insertions(+), 13 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el3_no_el2_v8_cp_reginfo[] = {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
-     REGINFO_SENTINEL
- };
+ DEF_HELPER_FLAGS_3(mve_uqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
+ DEF_HELPER_FLAGS_3(mve_sqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
--static void hcr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
++DEF_HELPER_FLAGS_3(mve_uqrshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
-+static void do_hcr_write(CPUARMState *env, uint64_t value, uint64_t valid_mask)
++DEF_HELPER_FLAGS_3(mve_sqrshr, TCG_CALL_NO_RWG, i32, env, i32, i32)
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
  typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
  typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
  typedef void ShiftImmFn(TCGv_i32, TCGv_i32, int32_t shift);
 +typedef void ShiftFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
  /**
   * arm_tbflags_from_tb:
 diff --git a/target/arm/t32.decode b/target/arm/t32.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/t32.decode
 +++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@
  &mve_shl_ri      rdalo rdahi shim
  &mve_shl_rr      rdalo rdahi rm
  &mve_sh_ri       rda shim
 +&mve_sh_rr       rda rm
  # rdahi: bits [3:1] from insn, bit 0 is 1
  # rdalo: bits [3:1] from insn, bit 0 is 0
@@ -XXX,XX +XXX,XX @@
                   &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
  @mve_sh_ri       ....... .... . rda:4 . ... ... . .. .. .... \
                   &mve_sh_ri shim=%imm5_12_6
 +@mve_sh_rr       ....... .... . rda:4 rm:4 .... .... .... &mve_sh_rr
  {
-     ARMCPU *cpu = env_archcpu(env);
+   TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
--    /* Begin with bits defined in base ARMv8.0.  */
+@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
--    uint64_t valid_mask = MAKE_64BIT_MASK(0, 34);
+       SQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
      }
 -    LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
 -    ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
 -    UQRSHLL64_rr 1110101 0010 1 ... 1 ....  ... 1  0000 1101  @mve_shl_rr
 -    SQRSHRL64_rr 1110101 0010 1 ... 1 ....  ... 1  0010 1101  @mve_shl_rr
 +    {
 +      UQRSHL_rr    1110101 0010 1 ....  ....  1111 0000 1101  @mve_sh_rr
 +      LSLL_rr      1110101 0010 1 ... 0 .... ... 1 0000 1101  @mve_shl_rr
 +      UQRSHLL64_rr 1110101 0010 1 ... 1 .... ... 1 0000 1101  @mve_shl_rr
 +    }
 +
-+    if (arm_feature(env, ARM_FEATURE_V8)) {
++    {
-+        valid_mask |= MAKE_64BIT_MASK(0, 34);  /* ARMv8.0 */
++      SQRSHR_rr    1110101 0010 1 ....  ....  1111 0010 1101  @mve_sh_rr
-+    } else {
++      ASRL_rr      1110101 0010 1 ... 0 .... ... 1 0010 1101  @mve_shl_rr
-+        valid_mask |= MAKE_64BIT_MASK(0, 28);  /* ARMv7VE */
++      SQRSHRL64_rr 1110101 0010 1 ... 1 .... ... 1 0010 1101  @mve_shl_rr
 +    }
-     if (arm_feature(env, ARM_FEATURE_EL3)) {
-         valid_mask &= ~HCR_HCD;
-@@ -XXX,XX +XXX,XX @@ static void hcr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
-          */
-         valid_mask &= ~HCR_TSC;
-     }
--    if (cpu_isar_feature(aa64_vh, cpu)) {
--        valid_mask |= HCR_E2H;
--    }
--    if (cpu_isar_feature(aa64_lor, cpu)) {
--        valid_mask |= HCR_TLOR;
--    }
--    if (cpu_isar_feature(aa64_pauth, cpu)) {
--        valid_mask |= HCR_API | HCR_APK;
 +
-+    if (arm_feature(env, ARM_FEATURE_AARCH64)) {
+     UQRSHLL48_rr 1110101 0010 1 ... 1 ....  ... 1  1000 1101  @mve_shl_rr
-+        if (cpu_isar_feature(aa64_vh, cpu)) {
+     SQRSHRL48_rr 1110101 0010 1 ... 1 ....  ... 1  1010 1101  @mve_shl_rr
-+            valid_mask |= HCR_E2H;
+   ]
-+        }
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-+        if (cpu_isar_feature(aa64_lor, cpu)) {
+index XXXXXXX..XXXXXXX 100644
-+            valid_mask |= HCR_TLOR;
+--- a/target/arm/mve_helper.c
-+        }
++++ b/target/arm/mve_helper.c
-+        if (cpu_isar_feature(aa64_pauth, cpu)) {
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_sqshl)(CPUARMState *env, uint32_t n, uint32_t shift)
-+            valid_mask |= HCR_API | HCR_APK;
+ {
-+        }
+     return do_sqrshl_bhs(n, (int8_t)shift, 32, false, &env->QF);
      }
      /* Clear RES0 bits.  */
@@ -XXX,XX +XXX,XX @@ static void hcr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
      arm_cpu_update_vfiq(cpu);
  }
++
-+static void hcr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
++uint32_t HELPER(mve_uqrshl)(CPUARMState *env, uint32_t n, uint32_t shift)
 +{
-+    do_hcr_write(env, value, 0);
++    return do_uqrshl_bhs(n, (int8_t)shift, 32, true, &env->QF);
 +}
 +
- static void hcr_writehigh(CPUARMState *env, const ARMCPRegInfo *ri,
++uint32_t HELPER(mve_sqrshr)(CPUARMState *env, uint32_t n, uint32_t shift)
-                           uint64_t value)
++{
- {
++    return do_sqrshl_bhs(n, -(int8_t)shift, 32, true, &env->QF);
-     /* Handle HCR2 write, i.e. write to high half of HCR_EL2 */
++}
-     value = deposit64(env->cp15.hcr_el2, 32, 32, value);
+diff --git a/target/arm/translate.c b/target/arm/translate.c
--    hcr_write(env, NULL, value);
+index XXXXXXX..XXXXXXX 100644
-+    do_hcr_write(env, value, MAKE_64BIT_MASK(0, 32));
+--- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_UQSHL_ri(DisasContext *s, arg_mve_sh_ri *a)
      return do_mve_sh_ri(s, a, gen_mve_uqshl);
  }
- static void hcr_writelow(CPUARMState *env, const ARMCPRegInfo *ri,
++static bool do_mve_sh_rr(DisasContext *s, arg_mve_sh_rr *a, ShiftFn *fn)
-@@ -XXX,XX +XXX,XX @@ static void hcr_writelow(CPUARMState *env, const ARMCPRegInfo *ri,
++{
- {
++    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-     /* Handle HCR write, i.e. write to low half of HCR_EL2 */
++        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
-     value = deposit64(env->cp15.hcr_el2, 0, 32, value);
++        return false;
--    hcr_write(env, NULL, value);
++    }
-+    do_hcr_write(env, value, MAKE_64BIT_MASK(32, 32));
++    if (!dc_isar_feature(aa32_mve, s) ||
- }
++        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
++        a->rda == 13 || a->rda == 15 || a->rm == 13 || a->rm == 15 ||
 +        a->rm == a->rda) {
 +        /* These rda/rm cases are UNPREDICTABLE; we choose to UNDEF */
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
 +    /* The helper takes care of the sign-extension of the low 8 bits of Rm */
 +    fn(cpu_R[a->rda], cpu_env, cpu_R[a->rda], cpu_R[a->rm]);
 +    return true;
 +}
 +
 +static bool trans_SQRSHR_rr(DisasContext *s, arg_mve_sh_rr *a)
 +{
 +    return do_mve_sh_rr(s, a, gen_helper_mve_sqrshr);
 +}
 +
 +static bool trans_UQRSHL_rr(DisasContext *s, arg_mve_sh_rr *a)
 +{
 +    return do_mve_sh_rr(s, a, gen_helper_mve_uqrshl);
 +}
 +
  /*
+  * Multiply and multiply accumulate
+  */
 --
 .20.1
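
The comment "The helper takes care of the sign-extension of the low 8 bits of Rm" can be illustrated in isolation. This is a minimal sketch, not the QEMU helper: shift_count() is a hypothetical function that only computes the signed count the two helpers above pass on, taking the left-shift count as-is for UQRSHL and negating it for SQRSHR. As in the patch, it relies on the (int8_t) conversion wrapping modulo 256, which is implementation-defined before C23.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical illustration: derive the signed shift count from Rm.
 * Only the low 8 bits of Rm are used, sign-extended. For the
 * right-shift instruction (SQRSHR) the count is negated, mirroring
 * -(int8_t)shift in HELPER(mve_sqrshr).
 */
static int shift_count(uint32_t rm, bool right_shift_insn)
{
    int count = (int8_t)rm;   /* promoted to int after sign-extension */
    return right_shift_insn ? -count : count;
}
```

Because the negation happens after promotion to int, a low byte of 0x80 becomes +128 for SQRSHR instead of overflowing an 8-bit value.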

-[PULL 16/37] target/arm: Add HCR_EL2 bit definitions from ARMv8.6
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200229012811.24129-3-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/cpu.h | 7 +++++++
-file changed, 7 insertions(+)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
-+++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
- #define HCR_TERR      (1ULL << 36)
- #define HCR_TEA       (1ULL << 37)
- #define HCR_MIOCNCE   (1ULL << 38)
-+/* RES0 bit 39 */
- #define HCR_APK       (1ULL << 40)
- #define HCR_API       (1ULL << 41)
- #define HCR_NV        (1ULL << 42)
-@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
- #define HCR_NV2       (1ULL << 45)
- #define HCR_FWB       (1ULL << 46)
- #define HCR_FIEN      (1ULL << 47)
-+/* RES0 bit 48 */
- #define HCR_TID4      (1ULL << 49)
- #define HCR_TICAB     (1ULL << 50)
-+#define HCR_AMVOFFEN  (1ULL << 51)
- #define HCR_TOCU      (1ULL << 52)
-+#define HCR_ENSCXT    (1ULL << 53)
- #define HCR_TTLBIS    (1ULL << 54)
- #define HCR_TTLBOS    (1ULL << 55)
- #define HCR_ATA       (1ULL << 56)
- #define HCR_DCT       (1ULL << 57)
-+#define HCR_TID5      (1ULL << 58)
-+#define HCR_TWEDEN    (1ULL << 59)
-+#define HCR_TWEDEL    MAKE_64BIT_MASK(60, 4)
- #define SCR_NS                (1U << 0)
- #define SCR_IRQ               (1U << 1)
---
-.20.1

-[PULL 17/37] target/arm: Disable has_el2 and has_el3 for user-only
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-In arm_cpu_reset, we configure many system registers so that user-only
-behaves as it should with a minimum of ifdefs.  However, we do not set
-all of the system registers as required for a cpu with EL2 and EL3.
-Disabling EL2 and EL3 means that we will not look at those registers,
-which means that we don't have to worry about configuring them.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200229012811.24129-4-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/cpu.c | 6 ++++--
-file changed, 4 insertions(+), 2 deletions(-)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
-+++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static Property arm_cpu_reset_hivecs_property =
- static Property arm_cpu_rvbar_property =
-             DEFINE_PROP_UINT64("rvbar", ARMCPU, rvbar, 0);
-+#ifndef CONFIG_USER_ONLY
- static Property arm_cpu_has_el2_property =
-             DEFINE_PROP_BOOL("has_el2", ARMCPU, has_el2, true);
- static Property arm_cpu_has_el3_property =
-             DEFINE_PROP_BOOL("has_el3", ARMCPU, has_el3, true);
-+#endif
- static Property arm_cpu_cfgend_property =
-             DEFINE_PROP_BOOL("cfgend", ARMCPU, cfgend, false);
-@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
-         qdev_property_add_static(DEVICE(obj), &arm_cpu_rvbar_property);
-     }
-+#ifndef CONFIG_USER_ONLY
-     if (arm_feature(&cpu->env, ARM_FEATURE_EL3)) {
-         /* Add the has_el3 state CPU property only if EL3 is allowed.  This will
-          * prevent "has_el3" from existing on CPUs which cannot support EL3.
-          */
-         qdev_property_add_static(DEVICE(obj), &arm_cpu_has_el3_property);
--#ifndef CONFIG_USER_ONLY
-         object_property_add_link(obj, "secure-memory",
-                                  TYPE_MEMORY_REGION,
-                                  (Object **)&cpu->secure_memory,
-                                  qdev_prop_allow_set_link_before_realize,
-                                  OBJ_PROP_LINK_STRONG,
-                                  &error_abort);
--#endif
-     }
-     if (arm_feature(&cpu->env, ARM_FEATURE_EL2)) {
-         qdev_property_add_static(DEVICE(obj), &arm_cpu_has_el2_property);
-     }
-+#endif
-     if (arm_feature(&cpu->env, ARM_FEATURE_PMU)) {
-         cpu->has_pmu = true;
---
-.20.1

-[PULL 18/37] target/arm: Remove EL2 and EL3 setup from user-only
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-We have disabled EL2 and EL3 for user-only, which means that these
-registers "don't exist" and should not be set.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200229012811.24129-5-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/cpu.c | 6 ------
-file changed, 6 deletions(-)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
-+++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
-         /* Enable all PAC keys.  */
-         env->cp15.sctlr_el[1] |= (SCTLR_EnIA | SCTLR_EnIB |
-                                   SCTLR_EnDA | SCTLR_EnDB);
--        /* Enable all PAC instructions */
--        env->cp15.hcr_el2 |= HCR_API;
--        env->cp15.scr_el3 |= SCR_API;
-         /* and to the FP/Neon instructions */
-         env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 20, 2, 3);
-         /* and to the SVE instructions */
-         env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
--        env->cp15.cptr_el[3] |= CPTR_EZ;
-         /* with maximum vector length */
-         env->vfp.zcr_el[1] = cpu_isar_feature(aa64_sve, cpu) ?
-                              cpu->sve_max_vq - 1 : 0;
--        env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
--        env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
-         /*
-          * Enable TBI0 and TBI1.  While the real kernel only enables TBI0,
-          * turning on both here will produce smaller code and otherwise
---
-.20.1

-[PULL 21/37] target/arm: Honor the HCR_EL2.TSW bit
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-These bits trap EL1 access to set/way cache maintenance insns.
-Buglink: https://bugs.launchpad.net/bugs/1863685
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200229012811.24129-8-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/helper.c | 22 ++++++++++++++++------
-file changed, 16 insertions(+), 6 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tvm_trvm(CPUARMState *env, const ARMCPRegInfo *ri,
-     return CP_ACCESS_OK;
- }
-+/* Check for traps from EL1 due to HCR_EL2.TSW.  */
-+static CPAccessResult access_tsw(CPUARMState *env, const ARMCPRegInfo *ri,
-+                                 bool isread)
-+{
-+    if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_TSW)) {
-+        return CP_ACCESS_TRAP_EL2;
-+    }
-+    return CP_ACCESS_OK;
-+}
-+
- static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
- {
-     ARMCPU *cpu = env_archcpu(env);
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
-       .access = PL1_W, .type = ARM_CP_NOP },
-     { .name = "DC_ISW", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 2,
--      .access = PL1_W, .type = ARM_CP_NOP },
-+      .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
-     { .name = "DC_CVAC", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 10, .opc2 = 1,
-       .access = PL0_W, .type = ARM_CP_NOP,
-       .accessfn = aa64_cacheop_access },
-     { .name = "DC_CSW", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
--      .access = PL1_W, .type = ARM_CP_NOP },
-+      .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
-     { .name = "DC_CVAU", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 11, .opc2 = 1,
-       .access = PL0_W, .type = ARM_CP_NOP,
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
-       .accessfn = aa64_cacheop_access },
-     { .name = "DC_CISW", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
--      .access = PL1_W, .type = ARM_CP_NOP },
-+      .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
-     /* TLBI operations */
-     { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
-     { .name = "DCIMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
-       .type = ARM_CP_NOP, .access = PL1_W },
-     { .name = "DCISW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 2,
--      .type = ARM_CP_NOP, .access = PL1_W },
-+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
-     { .name = "DCCMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 1,
-       .type = ARM_CP_NOP, .access = PL1_W },
-     { .name = "DCCSW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
--      .type = ARM_CP_NOP, .access = PL1_W },
-+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
-     { .name = "DCCMVAU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 11, .opc2 = 1,
-       .type = ARM_CP_NOP, .access = PL1_W },
-     { .name = "DCCIMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 1,
-       .type = ARM_CP_NOP, .access = PL1_W },
-     { .name = "DCCISW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
--      .type = ARM_CP_NOP, .access = PL1_W },
-+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
-     /* MMU Domain access control / MPU write buffer control */
-     { .name = "DACR", .cp = 15, .opc1 = 0, .crn = 3, .crm = 0, .opc2 = 0,
-       .access = PL1_RW, .accessfn = access_tvm_trvm, .resetvalue = 0,
---
-.20.1

-[PULL 22/37] target/arm: Honor the HCR_EL2.TACR bit
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-This bit traps EL1 access to the auxiliary control registers.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200229012811.24129-9-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/helper.c | 18 ++++++++++++++----
-file changed, 14 insertions(+), 4 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tsw(CPUARMState *env, const ARMCPRegInfo *ri,
-     return CP_ACCESS_OK;
- }
-+/* Check for traps from EL1 due to HCR_EL2.TACR.  */
-+static CPAccessResult access_tacr(CPUARMState *env, const ARMCPRegInfo *ri,
-+                                  bool isread)
-+{
-+    if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_TACR)) {
-+        return CP_ACCESS_TRAP_EL2;
-+    }
-+    return CP_ACCESS_OK;
-+}
-+
- static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
- {
-     ARMCPU *cpu = env_archcpu(env);
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo ats1cp_reginfo[] = {
- static const ARMCPRegInfo actlr2_hactlr2_reginfo[] = {
-     { .name = "ACTLR2", .state = ARM_CP_STATE_AA32,
-       .cp = 15, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 3,
--      .access = PL1_RW, .type = ARM_CP_CONST,
--      .resetvalue = 0 },
-+      .access = PL1_RW, .accessfn = access_tacr,
-+      .type = ARM_CP_CONST, .resetvalue = 0 },
-     { .name = "HACTLR2", .state = ARM_CP_STATE_AA32,
-       .cp = 15, .opc1 = 4, .crn = 1, .crm = 0, .opc2 = 3,
-       .access = PL2_RW, .type = ARM_CP_CONST,
-@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
-         ARMCPRegInfo auxcr_reginfo[] = {
-             { .name = "ACTLR_EL1", .state = ARM_CP_STATE_BOTH,
-               .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 1,
--              .access = PL1_RW, .type = ARM_CP_CONST,
--              .resetvalue = cpu->reset_auxcr },
-+              .access = PL1_RW, .accessfn = access_tacr,
-+              .type = ARM_CP_CONST, .resetvalue = cpu->reset_auxcr },
-             { .name = "ACTLR_EL2", .state = ARM_CP_STATE_BOTH,
-               .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 0, .opc2 = 1,
-               .access = PL2_RW, .type = ARM_CP_CONST,
---
-.20.1

-[PULL 23/37] target/arm: Honor the HCR_EL2.TPCP bit
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-This bit traps EL1 access to cache maintenance insns that operate
-to the point of coherency or persistence.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200229012811.24129-10-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/helper.c | 39 +++++++++++++++++++++++++++++++--------
-file changed, 31 insertions(+), 8 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult aa64_cacheop_access(CPUARMState *env,
-     return CP_ACCESS_OK;
- }
-+static CPAccessResult aa64_cacheop_poc_access(CPUARMState *env,
-+                                              const ARMCPRegInfo *ri,
-+                                              bool isread)
-+{
-+    /* Cache invalidate/clean to Point of Coherency or Persistence...  */
-+    switch (arm_current_el(env)) {
-+    case 0:
-+        /* ... EL0 must UNDEF unless SCTLR_EL1.UCI is set.  */
-+        if (!(arm_sctlr(env, 0) & SCTLR_UCI)) {
-+            return CP_ACCESS_TRAP;
-+        }
-+        /* fall through */
-+    case 1:
-+        /* ... EL1 must trap to EL2 if HCR_EL2.TPCP is set.  */
-+        if (arm_hcr_el2_eff(env) & HCR_TPCP) {
-+            return CP_ACCESS_TRAP_EL2;
-+        }
-+        break;
-+    }
-+    return CP_ACCESS_OK;
-+}
-+
- /* See: D4.7.2 TLB maintenance requirements and the TLB maintenance instructions
-  * Page D4-1736 (DDI0487A.b)
-  */
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
-       .accessfn = aa64_cacheop_access },
-     { .name = "DC_IVAC", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
--      .access = PL1_W, .type = ARM_CP_NOP },
-+      .access = PL1_W, .accessfn = aa64_cacheop_poc_access,
-+      .type = ARM_CP_NOP },
-     { .name = "DC_ISW", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 2,
-       .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
-     { .name = "DC_CVAC", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 10, .opc2 = 1,
-       .access = PL0_W, .type = ARM_CP_NOP,
--      .accessfn = aa64_cacheop_access },
-+      .accessfn = aa64_cacheop_poc_access },
-     { .name = "DC_CSW", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
-       .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
-     { .name = "DC_CIVAC", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 14, .opc2 = 1,
-       .access = PL0_W, .type = ARM_CP_NOP,
--      .accessfn = aa64_cacheop_access },
-+      .accessfn = aa64_cacheop_poc_access },
-     { .name = "DC_CISW", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
-       .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
-     { .name = "BPIMVA", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 7,
-       .type = ARM_CP_NOP, .access = PL1_W },
-     { .name = "DCIMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
--      .type = ARM_CP_NOP, .access = PL1_W },
-+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_poc_access },
-     { .name = "DCISW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 2,
-       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
-     { .name = "DCCMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 1,
--      .type = ARM_CP_NOP, .access = PL1_W },
-+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_poc_access },
-     { .name = "DCCSW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
-       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
-     { .name = "DCCMVAU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 11, .opc2 = 1,
-       .type = ARM_CP_NOP, .access = PL1_W },
-     { .name = "DCCIMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 1,
--      .type = ARM_CP_NOP, .access = PL1_W },
-+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_poc_access },
-     { .name = "DCCISW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
-       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
-     /* MMU Domain access control / MPU write buffer control */
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo dcpop_reg[] = {
-     { .name = "DC_CVAP", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 1,
-       .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
--      .accessfn = aa64_cacheop_access, .writefn = dccvap_writefn },
-+      .accessfn = aa64_cacheop_poc_access, .writefn = dccvap_writefn },
-     REGINFO_SENTINEL
- };
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo dcpodp_reg[] = {
-     { .name = "DC_CVADP", .state = ARM_CP_STATE_AA64,
-       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 1,
-       .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
--      .accessfn = aa64_cacheop_access, .writefn = dccvap_writefn },
-+      .accessfn = aa64_cacheop_poc_access, .writefn = dccvap_writefn },
-     REGINFO_SENTINEL
- };
- #endif /*CONFIG_USER_ONLY*/
---
-2.20.1

-[PULL 26/37] tests/tcg/aarch64: Add newline in pauth-1 printf
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Make the output just a bit prettier when running by hand.
-Cc: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20200229012811.24129-13-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- tests/tcg/aarch64/pauth-1.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/tests/tcg/aarch64/pauth-1.c b/tests/tcg/aarch64/pauth-1.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/tcg/aarch64/pauth-1.c
-+++ b/tests/tcg/aarch64/pauth-1.c
-@@ -XXX,XX +XXX,XX @@ int main()
-     }
-     perc = (float) count / (float) (TESTS * 2);
--    printf("Ptr Check: %0.2f%%", perc * 100.0);
-+    printf("Ptr Check: %0.2f%%\n", perc * 100.0);
-     assert(perc > 0.95);
-     return 0;
- }
---
-2.20.1

-[PULL 27/37] hw/arm/cubieboard: use ARM Cortex-A8 as the default CPU in machine definition
+Deleted patch
-From: Niek Linnenbank <nieklinnenbank@gmail.com>
-The Cubieboard is a singleboard computer with an Allwinner A10 System-on-Chip [1].
-As documented in the Allwinner A10 User Manual V1.5 [2], the SoC has an ARM
-Cortex-A8 processor. Currently the Cubieboard machine definition specifies the
-ARM Cortex-A9 in its description and as the default CPU.
-This patch corrects the Cubieboard machine definition to use the ARM Cortex-A8.
-The only user-visible effect is that our textual description of the
-machine was wrong, because hw/arm/allwinner-a10.c always creates a
-Cortex-A8 CPU regardless of the default value in the MachineClass struct.
- [1] http://docs.cubieboard.org/products/start#cubieboard1
- [2] https://linux-sunxi.org/File:Allwinner_A10_User_manual_V1.5.pdf
-Fixes: 8a863c8120994981a099
-Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
-Message-id: 20200227220149.6845-2-nieklinnenbank@gmail.com
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-[note in commit message that the bug didn't have much visible effect]
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/arm/cubieboard.c | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/cubieboard.c
-+++ b/hw/arm/cubieboard.c
-@@ -XXX,XX +XXX,XX @@ static void cubieboard_init(MachineState *machine)
- static void cubieboard_machine_init(MachineClass *mc)
- {
--    mc->desc = "cubietech cubieboard (Cortex-A9)";
--    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a9");
-+    mc->desc = "cubietech cubieboard (Cortex-A8)";
-+    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a8");
-     mc->init = cubieboard_init;
-     mc->block_default_type = IF_IDE;
-     mc->units_per_default_bus = 1;
---
-2.20.1

Nothing much exciting here, but it's 37 patches worth...

thanks
-- PMM

The following changes since commit e64a62df378a746c0b257105959613c9f8122e59:

Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-040320-1' into staging (2020-03-05 12:13:51 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200305

for you to fetch changes up to 597d61a3b1f94c53a3aaa77671697c0c5f797dbf:

target/arm: Clean address for DC ZVA (2020-03-05 16:09:21 +0000)

----------------------------------------------------------------
 * versal: Implement ADMA
 * Implement (trivially) ARMv8.2-TTCNP
 * hw/arm/smmu-common: a fix to smmu_find_smmu_pcibus
 * Remove unnecessary endianness-handling on some boards
 * Avoid minor memory leaks from timer_new in some devices
 * Honour more of the HCR_EL2 trap bits
 * Complain rather than ignoring bad command line options for cubieboard
 * Honour TBI for DC ZVA and exception return

----------------------------------------------------------------
Edgar E. Iglesias (2):
      hw/arm: versal: Add support for the LPD ADMAs
      hw/arm: versal: Generate xlnx-versal-virt zdma FDT nodes

Eric Auger (1):
      hw/arm/smmu-common: a fix to smmu_find_smmu_pcibus

Niek Linnenbank (4):
      hw/arm/cubieboard: use ARM Cortex-A8 as the default CPU in machine definition
      hw/arm/cubieboard: restrict allowed CPU type to ARM Cortex-A8
      hw/arm/cubieboard: restrict allowed RAM size to 512MiB and 1GiB
      hw/arm/cubieboard: report error when using unsupported -bios argument

Pan Nengyuan (4):
      hw/arm/pxa2xx: move timer_new from init() into realize() to avoid memleaks
      hw/arm/spitz: move timer_new from init() into realize() to avoid memleaks
      hw/arm/strongarm: move timer_new from init() into realize() to avoid memleaks
      hw/timer/cadence_ttc: move timer_new from init() into realize() to avoid memleaks

Peter Maydell (1):
      target/arm: Implement (trivially) ARMv8.2-TTCNP

Philippe Mathieu-Daudé (6):
      hw/arm/smmu-common: Simplify smmu_find_smmu_pcibus() logic
      hw/arm/gumstix: Simplify since the machines are little-endian only
      hw/arm/mainstone: Simplify since the machines are little-endian only
      hw/arm/omap_sx1: Simplify since the machines are little-endian only
      hw/arm/z2: Simplify since the machines are little-endian only
      hw/arm/musicpal: Simplify since the machines are little-endian only

Richard Henderson (19):
      target/arm: Improve masking of HCR/HCR2 RES0 bits
      target/arm: Add HCR_EL2 bit definitions from ARMv8.6
      target/arm: Disable has_el2 and has_el3 for user-only
      target/arm: Remove EL2 and EL3 setup from user-only
      target/arm: Improve masking in arm_hcr_el2_eff
      target/arm: Honor the HCR_EL2.{TVM,TRVM} bits
      target/arm: Honor the HCR_EL2.TSW bit
      target/arm: Honor the HCR_EL2.TACR bit
      target/arm: Honor the HCR_EL2.TPCP bit
      target/arm: Honor the HCR_EL2.TPU bit
      target/arm: Honor the HCR_EL2.TTLB bit
      tests/tcg/aarch64: Add newline in pauth-1 printf
      target/arm: Replicate TBI/TBID bits for single range regimes
      target/arm: Optimize cpu_mmu_index
      target/arm: Introduce core_to_aa64_mmu_idx
      target/arm: Apply TBI to ESR_ELx in helper_exception_return
      target/arm: Move helper_dc_zva to helper-a64.c
      target/arm: Use DEF_HELPER_FLAGS for helper_dc_zva
      target/arm: Clean address for DC ZVA

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Add support for the Versal LPD ADMAs.
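
As a rough illustration of the layout this patch sets up, the sketch below
recomputes the MMIO base and interrupt line for each ADMA channel from the
constants the patch adds. The helper names adma_base() and adma_irq() are
hypothetical and exist only for this example; the patch itself open-codes
the arithmetic inside versal_create_admas().

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the patch. */
#define XLNX_VERSAL_NR_ADMAS   8
#define VERSAL_ADMA_IRQ_0      60
#define MM_ADMA_CH0            0xffa80000U
#define MM_ADMA_CH0_SIZE       0x10000

/* Hypothetical helper: MMIO base address of ADMA channel i. */
static uint64_t adma_base(unsigned i)
{
    assert(i < XLNX_VERSAL_NR_ADMAS);
    return (uint64_t)MM_ADMA_CH0 + (uint64_t)i * MM_ADMA_CH0_SIZE;
}

/* Hypothetical helper: interrupt line of ADMA channel i. */
static unsigned adma_irq(unsigned i)
{
    assert(i < XLNX_VERSAL_NR_ADMAS);
    return VERSAL_ADMA_IRQ_0 + i;
}
```

With these constants the eight channels occupy consecutive 64KiB windows
starting at 0xffa80000 and use IRQs 60 through 67.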

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Reviewed-by: KONRAD Frederic <frederic.konrad@adacore.com>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/xlnx-versal.h |  6 ++++++
 hw/arm/xlnx-versal.c         | 24 ++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/xlnx-versal.h
+++ b/include/hw/arm/xlnx-versal.h
@@ -XXX,XX +XXX,XX @@
 #define XLNX_VERSAL_NR_ACPUS   2
 #define XLNX_VERSAL_NR_UARTS   2
 #define XLNX_VERSAL_NR_GEMS    2
+#define XLNX_VERSAL_NR_ADMAS   8
 #define XLNX_VERSAL_NR_IRQS    192
 
 typedef struct Versal {
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
         struct {
             SysBusDevice *uart[XLNX_VERSAL_NR_UARTS];
             SysBusDevice *gem[XLNX_VERSAL_NR_GEMS];
+            SysBusDevice *adma[XLNX_VERSAL_NR_ADMAS];
         } iou;
     } lpd;
 
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 #define VERSAL_GEM0_WAKE_IRQ_0     57
 #define VERSAL_GEM1_IRQ_0          58
 #define VERSAL_GEM1_WAKE_IRQ_0     59
+#define VERSAL_ADMA_IRQ_0          60
 
 /* Architecturally reserved IRQs suitable for virtualization.  */
 #define VERSAL_RSVD_IRQ_FIRST 111
@@ -XXX,XX +XXX,XX @@ typedef struct Versal {
 #define MM_GEM1                     0xff0d0000U
 #define MM_GEM1_SIZE                0x10000
 
+#define MM_ADMA_CH0                 0xffa80000U
+#define MM_ADMA_CH0_SIZE            0x10000
+
 #define MM_OCM                      0xfffc0000U
 #define MM_OCM_SIZE                 0x40000
 
diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -XXX,XX +XXX,XX @@ static void versal_create_gems(Versal *s, qemu_irq *pic)
     }
 }
 
+static void versal_create_admas(Versal *s, qemu_irq *pic)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(s->lpd.iou.adma); i++) {
+        char *name = g_strdup_printf("adma%d", i);
+        DeviceState *dev;
+        MemoryRegion *mr;
+
+        dev = qdev_create(NULL, "xlnx.zdma");
+        s->lpd.iou.adma[i] = SYS_BUS_DEVICE(dev);
+        object_property_add_child(OBJECT(s), name, OBJECT(dev), &error_fatal);
+        qdev_init_nofail(dev);
+
+        mr = sysbus_mmio_get_region(s->lpd.iou.adma[i], 0);
+        memory_region_add_subregion(&s->mr_ps,
+                                    MM_ADMA_CH0 + i * MM_ADMA_CH0_SIZE, mr);
+
+        sysbus_connect_irq(s->lpd.iou.adma[i], 0, pic[VERSAL_ADMA_IRQ_0 + i]);
+        g_free(name);
+    }
+}
+
 /* This takes the board allocated linear DDR memory and creates aliases
  * for each split DDR range/aperture on the Versal address map.
  */
@@ -XXX,XX +XXX,XX @@ static void versal_realize(DeviceState *dev, Error **errp)
     versal_create_apu_gic(s, pic);
     versal_create_uarts(s, pic);
     versal_create_gems(s, pic);
+    versal_create_admas(s, pic);
     versal_map_ddr(s);
     versal_unimp(s);
 
-- 
2.20.1

From: "Edgar E. Iglesias" <edgar.iglesias@xilinx.com>

Generate xlnx-versal-virt zdma FDT nodes.
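
One detail that may not be obvious from the diff: the "clock-names" property
is a list of strings, which the patch encodes as a single char array with an
embedded NUL and passes with sizeof() so that both names and both terminators
are included. A minimal standalone sketch of that encoding follows; the helper
count_packed_strings() is hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same initializer as in the patch: two names separated by an embedded NUL. */
static const char clocknames[] = "clk_main\0clk_apb";

/* 8 + 1 + 7 + 1 bytes: both names plus both NUL terminators. */
static_assert(sizeof(clocknames) == 17, "string list length");

/* Hypothetical helper: count the NUL-terminated strings packed into buf. */
static size_t count_packed_strings(const char *buf, size_t len)
{
    size_t n = 0;
    size_t off = 0;

    while (off < len) {
        off += strlen(buf + off) + 1;   /* skip one string and its NUL */
        n++;
    }
    return n;
}
```

Using strlen() instead of sizeof() here would stop at the first NUL and
describe only "clk_main".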

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Reviewed-by: KONRAD Frederic <frederic.konrad@adacore.com>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xlnx-versal-virt.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/hw/arm/xlnx-versal-virt.c b/hw/arm/xlnx-versal-virt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xlnx-versal-virt.c
+++ b/hw/arm/xlnx-versal-virt.c
@@ -XXX,XX +XXX,XX @@ static void fdt_add_gem_nodes(VersalVirt *s)
     }
 }
 
+static void fdt_add_zdma_nodes(VersalVirt *s)
+{
+    const char clocknames[] = "clk_main\0clk_apb";
+    const char compat[] = "xlnx,zynqmp-dma-1.0";
+    int i;
+
+    for (i = XLNX_VERSAL_NR_ADMAS - 1; i >= 0; i--) {
+        uint64_t addr = MM_ADMA_CH0 + MM_ADMA_CH0_SIZE * i;
+        char *name = g_strdup_printf("/dma@%" PRIx64, addr);
+
+        qemu_fdt_add_subnode(s->fdt, name);
+
+        qemu_fdt_setprop_cell(s->fdt, name, "xlnx,bus-width", 64);
+        qemu_fdt_setprop_cells(s->fdt, name, "clocks",
+                               s->phandle.clk_25Mhz, s->phandle.clk_25Mhz);
+        qemu_fdt_setprop(s->fdt, name, "clock-names",
+                         clocknames, sizeof(clocknames));
+        qemu_fdt_setprop_cells(s->fdt, name, "interrupts",
+                               GIC_FDT_IRQ_TYPE_SPI, VERSAL_ADMA_IRQ_0 + i,
+                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+        qemu_fdt_setprop_sized_cells(s->fdt, name, "reg",
+                                     2, addr, 2, 0x1000);
+        qemu_fdt_setprop(s->fdt, name, "compatible", compat, sizeof(compat));
+        g_free(name);
+    }
+}
+
 static void fdt_nop_memory_nodes(void *fdt, Error **errp)
 {
     Error *err = NULL;
@@ -XXX,XX +XXX,XX @@ static void versal_virt_init(MachineState *machine)
     fdt_add_uart_nodes(s);
     fdt_add_gic_nodes(s);
     fdt_add_timer_nodes(s);
+    fdt_add_zdma_nodes(s);
     fdt_add_cpu_nodes(s, psci_conduit);
     fdt_add_clk_node(s, "/clk125", 125000000, s->phandle.clk_125Mhz);
     fdt_add_clk_node(s, "/clk25", 25000000, s->phandle.clk_25Mhz);
-- 
2.20.1

The ARMv8.2-TTCNP extension allows an implementation to optimize by
sharing TLB entries between multiple cores, provided that software
declares that it's ready to deal with this by setting a CnP bit in
the TTBRn_ELx.  It is mandatory from ARMv8.2 onward.

For QEMU's TLB implementation, sharing TLB entries between different
cores would not really benefit us and would be a lot of work to
implement.  So we implement this extension in the "trivial" manner:
we allow the guest to set and read back the CnP bit, but don't change
our behaviour (this is an architecturally valid implementation
choice).

The only code path which looks at the TTBRn_ELx values for the
long-descriptor format where the CnP bit is defined is already doing
enough masking to not get confused when the CnP bit at the bottom of
the register is set, so we can simply add a comment noting why we're
relying on that mask.
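
To make the masking argument concrete, here is a small standalone sketch of
the two statements in question. It is not the QEMU code: extract64_sketch()
is a local stand-in for QEMU's extract64(), ttbr_to_table_base() is a
hypothetical wrapper, and the indexmask value used with it is an assumed
example rather than one derived from a real translation regime.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for extract64(): 'length' bits starting at bit 'start'. */
static uint64_t extract64_sketch(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

/*
 * Hypothetical wrapper around the two statements in get_phys_addr_lpae():
 * keep the low 48 bits of the TTBR, then clear the bits covered by
 * indexmask. Any indexmask that covers bit 0 also clears CnP.
 */
static uint64_t ttbr_to_table_base(uint64_t ttbr, uint64_t indexmask)
{
    uint64_t descaddr = extract64_sketch(ttbr, 0, 48);

    return descaddr & ~indexmask;
}
```

With an assumed indexmask of 0xfff, a TTBR value with CnP set and one with
CnP clear yield the same table base, which is the property the new comment
documents.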

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200225193822.18874-1-peter.maydell@linaro.org
---
 target/arm/cpu.c    | 1 +
 target/arm/cpu64.c  | 2 ++
 target/arm/helper.c | 4 ++++
 3 files changed, 7 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
             t = cpu->isar.id_mmfr4;
             t = FIELD_DP32(t, ID_MMFR4, HPDS, 1); /* AA32HPD */
             t = FIELD_DP32(t, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
+            t = FIELD_DP32(t, ID_MMFR4, CNP, 1); /* TTCNP */
             cpu->isar.id_mmfr4 = t;
         }
 #endif
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
 
         t = cpu->isar.id_aa64mmfr2;
         t = FIELD_DP64(t, ID_AA64MMFR2, UAO, 1);
+        t = FIELD_DP64(t, ID_AA64MMFR2, CNP, 1); /* TTCNP */
         cpu->isar.id_aa64mmfr2 = t;
 
         /* Replicate the same data to the 32-bit id registers.  */
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = cpu->isar.id_mmfr4;
         u = FIELD_DP32(u, ID_MMFR4, HPDS, 1); /* AA32HPD */
         u = FIELD_DP32(u, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
+        u = FIELD_DP32(u, ID_MMFR4, CNP, 1); /* TTCNP */
         cpu->isar.id_mmfr4 = u;
 
         u = cpu->isar.id_aa64dfr0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
 
     /* Now we can extract the actual base address from the TTBR */
     descaddr = extract64(ttbr, 0, 48);
+    /*
+     * We rely on this masking to clear the RES0 bits at the bottom of the TTBR
+     * and also to mask out CnP (bit 0) which could validly be non-zero.
+     */
     descaddr &= ~indexmask;
 
     /* The address field in the descriptor goes up to bit 39 for ARMv7
-- 
2.20.1

From: Eric Auger <eric.auger@redhat.com>

Make sure a null SMMUPciBus is returned in case we were
not able to identify a pci bus matching the @bus_num.

This matches the fix done on intel iommu in commit:
a2e1cd41ccfe796529abfd1b6aeb1dd4393762a2

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200226172628.17449-1-eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/smmu-common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -XXX,XX +XXX,XX @@ SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num)
                 return smmu_pci_bus;
             }
         }
+        smmu_pci_bus = NULL;
     }
     return smmu_pci_bus;
 }
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

The smmu_find_smmu_pcibus() function was introduced (in commit
cac994ef43b) in a code format that could return an incorrect
pointer, which was then fixed by the previous commit.
We could have avoided this by writing the if() statement
differently. Do it now, in case this function is re-used.
The code is easier to review (harder to miss bugs).
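
The pitfall can be shown with a simplified model that does not use GLib. In
the sketch below, iter_next() is a hypothetical iterator that, by assumption,
leaves its output pointer untouched once it runs out of elements; all names
are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

struct bus_sketch {
    int num;
};

/* Hypothetical iterator: stores the next element in *out, returns 0 at the end. */
static int iter_next(struct bus_sketch *items, size_t n, size_t *pos,
                     struct bus_sketch **out)
{
    assert(out != NULL);
    if (*pos >= n) {
        return 0;               /* *out keeps whatever was stored last */
    }
    *out = &items[(*pos)++];
    return 1;
}

/* Buggy shape: a miss falls out of the loop and returns the last element. */
static struct bus_sketch *find_buggy(struct bus_sketch *items, size_t n, int num)
{
    struct bus_sketch *found = NULL;
    size_t pos = 0;

    while (iter_next(items, n, &pos, &found)) {
        if (found->num == num) {
            return found;
        }
    }
    return found;
}

/* Restructured shape: the only way to return non-NULL is a real match. */
static struct bus_sketch *find_fixed(struct bus_sketch *items, size_t n, int num)
{
    struct bus_sketch *cur = NULL;
    size_t pos = 0;

    while (iter_next(items, n, &pos, &cur)) {
        if (cur->num == num) {
            return cur;
        }
    }
    return NULL;
}
```

Returning a literal NULL after the loop removes the dependency on what the
loop variable happens to hold when iteration ends.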

Acked-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/smmu-common.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -XXX,XX +XXX,XX @@ inline int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
 SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num)
 {
     SMMUPciBus *smmu_pci_bus = s->smmu_pcibus_by_bus_num[bus_num];
+    GHashTableIter iter;
 
-    if (!smmu_pci_bus) {
-        GHashTableIter iter;
-
-        g_hash_table_iter_init(&iter, s->smmu_pcibus_by_busptr);
-        while (g_hash_table_iter_next(&iter, NULL, (void **)&smmu_pci_bus)) {
-            if (pci_bus_num(smmu_pci_bus->bus) == bus_num) {
-                s->smmu_pcibus_by_bus_num[bus_num] = smmu_pci_bus;
-                return smmu_pci_bus;
-            }
-        }
-        smmu_pci_bus = NULL;
+    if (smmu_pci_bus) {
+        return smmu_pci_bus;
     }
-    return smmu_pci_bus;
+
+    g_hash_table_iter_init(&iter, s->smmu_pcibus_by_busptr);
+    while (g_hash_table_iter_next(&iter, NULL, (void **)&smmu_pci_bus)) {
+        if (pci_bus_num(smmu_pci_bus->bus) == bus_num) {
+            s->smmu_pcibus_by_bus_num[bus_num] = smmu_pci_bus;
+            return smmu_pci_bus;
+        }
+    }
+
+    return NULL;
 }
 
 static AddressSpace *smmu_find_add_as(PCIBus *bus, void *opaque, int devfn)
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

As the Connex and Verdex machines only boot in little-endian,
we can simplify the code.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/gumstix.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/hw/arm/gumstix.c b/hw/arm/gumstix.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/gumstix.c
+++ b/hw/arm/gumstix.c
@@ -XXX,XX +XXX,XX @@ static void connex_init(MachineState *machine)
 {
     PXA2xxState *cpu;
     DriveInfo *dinfo;
-    int be;
     MemoryRegion *address_space_mem = get_system_memory();
 
     uint32_t connex_rom = 0x01000000;
@@ -XXX,XX +XXX,XX @@ static void connex_init(MachineState *machine)
         exit(1);
     }
 
-#ifdef TARGET_WORDS_BIGENDIAN
-    be = 1;
-#else
-    be = 0;
-#endif
     if (!pflash_cfi01_register(0x00000000, "connext.rom", connex_rom,
                                dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-                               sector_len, 2, 0, 0, 0, 0, be)) {
+                               sector_len, 2, 0, 0, 0, 0, 0)) {
         error_report("Error registering flash memory");
         exit(1);
     }
@@ -XXX,XX +XXX,XX @@ static void verdex_init(MachineState *machine)
 {
     PXA2xxState *cpu;
     DriveInfo *dinfo;
-    int be;
     MemoryRegion *address_space_mem = get_system_memory();
 
     uint32_t verdex_rom = 0x02000000;
@@ -XXX,XX +XXX,XX @@ static void verdex_init(MachineState *machine)
         exit(1);
     }
 
-#ifdef TARGET_WORDS_BIGENDIAN
-    be = 1;
-#else
-    be = 0;
-#endif
     if (!pflash_cfi01_register(0x00000000, "verdex.rom", verdex_rom,
                                dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-                               sector_len, 2, 0, 0, 0, 0, be)) {
+                               sector_len, 2, 0, 0, 0, 0, 0)) {
         error_report("Error registering flash memory");
         exit(1);
     }
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

We only build the little-endian softmmu configurations. Checking
for big endian is pointless, remove the unused code.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/mainstone.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mainstone.c
+++ b/hw/arm/mainstone.c
@@ -XXX,XX +XXX,XX @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
     DeviceState *mst_irq;
     DriveInfo *dinfo;
     int i;
-    int be;
     MemoryRegion *rom = g_new(MemoryRegion, 1);
 
     /* Setup CPU & memory */
@@ -XXX,XX +XXX,XX @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
     memory_region_set_readonly(rom, true);
     memory_region_add_subregion(address_space_mem, 0, rom);
 
-#ifdef TARGET_WORDS_BIGENDIAN
-    be = 1;
-#else
-    be = 0;
-#endif
     /* There are two 32MiB flash devices on the board */
     for (i = 0; i < 2; i ++) {
         dinfo = drive_get(IF_PFLASH, 0, i);
@@ -XXX,XX +XXX,XX @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
                                    i ? "mainstone.flash1" : "mainstone.flash0",
                                    MAINSTONE_FLASH,
                                    dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-                                   sector_len, 4, 0, 0, 0, 0, be)) {
+                                   sector_len, 4, 0, 0, 0, 0, 0)) {
             error_report("Error registering flash memory");
             exit(1);
         }
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

We only build the little-endian softmmu configurations. Checking
for big endian is pointless, remove the unused code.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/omap_sx1.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/hw/arm/omap_sx1.c b/hw/arm/omap_sx1.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/omap_sx1.c
+++ b/hw/arm/omap_sx1.c
@@ -XXX,XX +XXX,XX @@ static void sx1_init(MachineState *machine, const int version)
     DriveInfo *dinfo;
     int fl_idx;
     uint32_t flash_size = flash0_size;
-    int be;
 
     if (machine->ram_size != mc->default_ram_size) {
         char *sz = size_to_str(mc->default_ram_size);
@@ -XXX,XX +XXX,XX @@ static void sx1_init(MachineState *machine, const int version)
                                 OMAP_CS2_BASE, &cs[3]);
 
     fl_idx = 0;
-#ifdef TARGET_WORDS_BIGENDIAN
-    be = 1;
-#else
-    be = 0;
-#endif
-
     if ((dinfo = drive_get(IF_PFLASH, 0, fl_idx)) != NULL) {
         if (!pflash_cfi01_register(OMAP_CS0_BASE,
                                    "omap_sx1.flash0-1", flash_size,
                                    blk_by_legacy_dinfo(dinfo),
-                                   sector_size, 4, 0, 0, 0, 0, be)) {
+                                   sector_size, 4, 0, 0, 0, 0, 0)) {
             fprintf(stderr, "qemu: Error registering flash memory %d.\n",
                            fl_idx);
         }
@@ -XXX,XX +XXX,XX @@ static void sx1_init(MachineState *machine, const int version)
         if (!pflash_cfi01_register(OMAP_CS1_BASE,
                                    "omap_sx1.flash1-1", flash1_size,
                                    blk_by_legacy_dinfo(dinfo),
-                                   sector_size, 4, 0, 0, 0, 0, be)) {
+                                   sector_size, 4, 0, 0, 0, 0, 0)) {
             fprintf(stderr, "qemu: Error registering flash memory %d.\n",
                            fl_idx);
         }
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

We only build the little-endian softmmu configurations. Checking
for big endian is pointless, remove the unused code.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/z2.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/hw/arm/z2.c b/hw/arm/z2.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/z2.c
+++ b/hw/arm/z2.c
@@ -XXX,XX +XXX,XX @@ static void z2_init(MachineState *machine)
     uint32_t sector_len = 0x10000;
     PXA2xxState *mpu;
     DriveInfo *dinfo;
-    int be;
     void *z2_lcd;
     I2CBus *bus;
     DeviceState *wm;
@@ -XXX,XX +XXX,XX @@ static void z2_init(MachineState *machine)
     /* Setup CPU & memory */
     mpu = pxa270_init(address_space_mem, z2_binfo.ram_size, machine->cpu_type);
 
-#ifdef TARGET_WORDS_BIGENDIAN
-    be = 1;
-#else
-    be = 0;
-#endif
     dinfo = drive_get(IF_PFLASH, 0, 0);
     if (!pflash_cfi01_register(Z2_FLASH_BASE, "z2.flash0", Z2_FLASH_SIZE,
                                dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-                               sector_len, 4, 0, 0, 0, 0, be)) {
+                               sector_len, 4, 0, 0, 0, 0, 0)) {
         error_report("Error registering flash memory");
         exit(1);
     }
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

We only build the little-endian softmmu configurations. Checking
for big endian is pointless, remove the unused code.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/musicpal.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/musicpal.c
+++ b/hw/arm/musicpal.c
@@ -XXX,XX +XXX,XX @@ static void musicpal_init(MachineState *machine)
          * 0xFF800000 (if there is 8 MB flash). So remap flash access if the
          * image is smaller than 32 MB.
          */
-#ifdef TARGET_WORDS_BIGENDIAN
-        pflash_cfi02_register(0x100000000ULL - MP_FLASH_SIZE_MAX,
-                              "musicpal.flash", flash_size,
-                              blk, 0x10000,
-                              MP_FLASH_SIZE_MAX / flash_size,
-                              2, 0x00BF, 0x236D, 0x0000, 0x0000,
-                              0x5555, 0x2AAA, 1);
-#else
         pflash_cfi02_register(0x100000000ULL - MP_FLASH_SIZE_MAX,
                               "musicpal.flash", flash_size,
                               blk, 0x10000,
                               MP_FLASH_SIZE_MAX / flash_size,
                               2, 0x00BF, 0x236D, 0x0000, 0x0000,
                               0x5555, 0x2AAA, 0);
-#endif
-
     }
     sysbus_create_simple(TYPE_MV88W8618_FLASHCFG, MP_FLASHCFG_BASE, NULL);
 
-- 
2.20.1

From: Pan Nengyuan <pannengyuan@huawei.com>

There are some memleaks when we call 'device_list_properties'. This patch moves timer_new from init into realize to fix them.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-id: 20200227025055.14341-3-pannengyuan@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/pxa2xx.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/arm/pxa2xx.c b/hw/arm/pxa2xx.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/pxa2xx.c
+++ b/hw/arm/pxa2xx.c
@@ -XXX,XX +XXX,XX @@ static void pxa2xx_rtc_init(Object *obj)
     s->last_rtcpicr = 0;
     s->last_hz = s->last_sw = s->last_pi = qemu_clock_get_ms(rtc_clock);
 
+    sysbus_init_irq(dev, &s->rtc_irq);
+
+    memory_region_init_io(&s->iomem, obj, &pxa2xx_rtc_ops, s,
+                          "pxa2xx-rtc", 0x10000);
+    sysbus_init_mmio(dev, &s->iomem);
+}
+
+static void pxa2xx_rtc_realize(DeviceState *dev, Error **errp)
+{
+    PXA2xxRTCState *s = PXA2XX_RTC(dev);
     s->rtc_hz    = timer_new_ms(rtc_clock, pxa2xx_rtc_hz_tick,    s);
     s->rtc_rdal1 = timer_new_ms(rtc_clock, pxa2xx_rtc_rdal1_tick, s);
     s->rtc_rdal2 = timer_new_ms(rtc_clock, pxa2xx_rtc_rdal2_tick, s);
     s->rtc_swal1 = timer_new_ms(rtc_clock, pxa2xx_rtc_swal1_tick, s);
     s->rtc_swal2 = timer_new_ms(rtc_clock, pxa2xx_rtc_swal2_tick, s);
     s->rtc_pi    = timer_new_ms(rtc_clock, pxa2xx_rtc_pi_tick,    s);
-
-    sysbus_init_irq(dev, &s->rtc_irq);
-
-    memory_region_init_io(&s->iomem, obj, &pxa2xx_rtc_ops, s,
-                          "pxa2xx-rtc", 0x10000);
-    sysbus_init_mmio(dev, &s->iomem);
 }
 
 static int pxa2xx_rtc_pre_save(void *opaque)
@@ -XXX,XX +XXX,XX @@ static void pxa2xx_rtc_sysbus_class_init(ObjectClass *klass, void *data)
 
     dc->desc = "PXA2xx RTC Controller";
     dc->vmsd = &vmstate_pxa2xx_rtc_regs;
+    dc->realize = pxa2xx_rtc_realize;
 }
 
 static const TypeInfo pxa2xx_rtc_sysbus_info = {
-- 
2.20.1
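
Note on the init/realize split used by this patch and the timer patches that follow: the commit messages say the timers leak when 'device_list_properties' is called. The sketch below is a toy model of that situation, not QEMU code. Every name in it (ToyDevice, toy_init_old, toy_introspect, and so on) is hypothetical, and the allocation is modelled with a counter instead of real heap memory. It assumes introspection creates and destroys an object without ever realizing it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are QEMU APIs. */
static int live_timers;   /* timers created and never released */

typedef struct ToyDevice {
    bool has_timer;
} ToyDevice;

static void toy_timer_new(ToyDevice *d)
{
    assert(!d->has_timer);
    d->has_timer = true;
    live_timers++;
}

/* Old layout: the instance init hook creates the timer. */
void toy_init_old(ToyDevice *d)
{
    toy_timer_new(d);
}

/* New layout: init creates nothing... */
void toy_init_new(ToyDevice *d)
{
    d->has_timer = false;
}

/* ...and the timer is created only when the device is realized. */
void toy_realize(ToyDevice *d)
{
    toy_timer_new(d);
}

/*
 * Assumed introspection path: the object is initialised, inspected and
 * dropped, and realize is never called. Returns how many timers were
 * left behind by this one round trip.
 */
int toy_introspect(void (*init)(ToyDevice *))
{
    int before = live_timers;
    ToyDevice d = { false };

    init(&d);
    /* The object goes away here without anything releasing its timer. */
    return live_timers - before;
}
```

In this model toy_introspect(toy_init_old) leaves one timer behind per call, while toy_introspect(toy_init_new) leaves none, which is the effect the patches aim for.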

From: Pan Nengyuan <pannengyuan@huawei.com>

There are some memleaks when we call 'device_list_properties'. This patch moves timer_new from init into realize to fix them.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-id: 20200227025055.14341-4-pannengyuan@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/spitz.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/spitz.c
+++ b/hw/arm/spitz.c
@@ -XXX,XX +XXX,XX @@ static void spitz_keyboard_init(Object *obj)
 
     spitz_keyboard_pre_map(s);
 
-    s->kbdtimer = timer_new_ns(QEMU_CLOCK_VIRTUAL, spitz_keyboard_tick, s);
     qdev_init_gpio_in(dev, spitz_keyboard_strobe, SPITZ_KEY_STROBE_NUM);
     qdev_init_gpio_out(dev, s->sense, SPITZ_KEY_SENSE_NUM);
 }
 
+static void spitz_keyboard_realize(DeviceState *dev, Error **errp)
+{
+    SpitzKeyboardState *s = SPITZ_KEYBOARD(dev);
+    s->kbdtimer = timer_new_ns(QEMU_CLOCK_VIRTUAL, spitz_keyboard_tick, s);
+}
+
 /* LCD backlight controller */
 
 #define LCDTG_RESCTL	0x00
@@ -XXX,XX +XXX,XX @@ static void spitz_keyboard_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     dc->vmsd = &vmstate_spitz_kbd;
+    dc->realize = spitz_keyboard_realize;
 }
 
 static const TypeInfo spitz_keyboard_info = {
-- 
2.20.1

From: Pan Nengyuan <pannengyuan@huawei.com>

There are some memleaks when we call 'device_list_properties'. This patch moves timer_new from init into realize to fix them.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-id: 20200227025055.14341-5-pannengyuan@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/strongarm.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/hw/arm/strongarm.c b/hw/arm/strongarm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/strongarm.c
+++ b/hw/arm/strongarm.c
@@ -XXX,XX +XXX,XX @@ static void strongarm_rtc_init(Object *obj)
     s->last_rcnr = (uint32_t) mktimegm(&tm);
     s->last_hz = qemu_clock_get_ms(rtc_clock);
 
-    s->rtc_alarm = timer_new_ms(rtc_clock, strongarm_rtc_alarm_tick, s);
-    s->rtc_hz = timer_new_ms(rtc_clock, strongarm_rtc_hz_tick, s);
-
     sysbus_init_irq(dev, &s->rtc_irq);
     sysbus_init_irq(dev, &s->rtc_hz_irq);
 
@@ -XXX,XX +XXX,XX @@ static void strongarm_rtc_init(Object *obj)
     sysbus_init_mmio(dev, &s->iomem);
 }
 
+static void strongarm_rtc_realize(DeviceState *dev, Error **errp)
+{
+    StrongARMRTCState *s = STRONGARM_RTC(dev);
+    s->rtc_alarm = timer_new_ms(rtc_clock, strongarm_rtc_alarm_tick, s);
+    s->rtc_hz = timer_new_ms(rtc_clock, strongarm_rtc_hz_tick, s);
+}
+
 static int strongarm_rtc_pre_save(void *opaque)
 {
     StrongARMRTCState *s = opaque;
@@ -XXX,XX +XXX,XX @@ static void strongarm_rtc_sysbus_class_init(ObjectClass *klass, void *data)
 
     dc->desc = "StrongARM RTC Controller";
     dc->vmsd = &vmstate_strongarm_rtc_regs;
+    dc->realize = strongarm_rtc_realize;
 }
 
 static const TypeInfo strongarm_rtc_sysbus_info = {
@@ -XXX,XX +XXX,XX @@ static void strongarm_uart_init(Object *obj)
                           "uart", 0x10000);
     sysbus_init_mmio(dev, &s->iomem);
     sysbus_init_irq(dev, &s->irq);
-
-    s->rx_timeout_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, strongarm_uart_rx_to, s);
-    s->tx_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, strongarm_uart_tx, s);
 }
 
 static void strongarm_uart_realize(DeviceState *dev, Error **errp)
 {
     StrongARMUARTState *s = STRONGARM_UART(dev);
 
+    s->rx_timeout_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+                                       strongarm_uart_rx_to,
+                                       s);
+    s->tx_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, strongarm_uart_tx, s);
     qemu_chr_fe_set_handlers(&s->chr,
                              strongarm_uart_can_receive,
                              strongarm_uart_receive,
-- 
2.20.1

From: Pan Nengyuan <pannengyuan@huawei.com>

There are some memleaks when we call 'device_list_properties'. This patch moves timer_new from init into realize to fix them.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20200227025055.14341-7-pannengyuan@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/timer/cadence_ttc.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/hw/timer/cadence_ttc.c b/hw/timer/cadence_ttc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/cadence_ttc.c
+++ b/hw/timer/cadence_ttc.c
@@ -XXX,XX +XXX,XX @@ static void cadence_timer_init(uint32_t freq, CadenceTimerState *s)
 static void cadence_ttc_init(Object *obj)
 {
     CadenceTTCState *s = CADENCE_TTC(obj);
-    int i;
-
-    for (i = 0; i < 3; ++i) {
-        cadence_timer_init(133000000, &s->timer[i]);
-        sysbus_init_irq(SYS_BUS_DEVICE(obj), &s->timer[i].irq);
-    }
 
     memory_region_init_io(&s->iomem, obj, &cadence_ttc_ops, s,
                           "timer", 0x1000);
     sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->iomem);
 }
 
+static void cadence_ttc_realize(DeviceState *dev, Error **errp)
+{
+    CadenceTTCState *s = CADENCE_TTC(dev);
+    int i;
+
+    for (i = 0; i < 3; ++i) {
+        cadence_timer_init(133000000, &s->timer[i]);
+        sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->timer[i].irq);
+    }
+}
+
 static int cadence_timer_pre_save(void *opaque)
 {
     cadence_timer_sync((CadenceTimerState *)opaque);
@@ -XXX,XX +XXX,XX @@ static void cadence_ttc_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     dc->vmsd = &vmstate_cadence_ttc;
+    dc->realize = cadence_ttc_realize;
 }
 
 static const TypeInfo cadence_ttc_info = {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Don't merely start with v8.0, handle v7VE as well.  Ensure that writes
from aarch32 mode do not change bits in the other half of the register.
Protect reads of aa64 id registers with ARM_FEATURE_AARCH64.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200229012811.24129-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el3_no_el2_v8_cp_reginfo[] = {
     REGINFO_SENTINEL
 };
 
-static void hcr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
+static void do_hcr_write(CPUARMState *env, uint64_t value, uint64_t valid_mask)
 {
     ARMCPU *cpu = env_archcpu(env);
-    /* Begin with bits defined in base ARMv8.0.  */
-    uint64_t valid_mask = MAKE_64BIT_MASK(0, 34);
+
+    if (arm_feature(env, ARM_FEATURE_V8)) {
+        valid_mask |= MAKE_64BIT_MASK(0, 34);  /* ARMv8.0 */
+    } else {
+        valid_mask |= MAKE_64BIT_MASK(0, 28);  /* ARMv7VE */
+    }
 
     if (arm_feature(env, ARM_FEATURE_EL3)) {
         valid_mask &= ~HCR_HCD;
@@ -XXX,XX +XXX,XX @@ static void hcr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
          */
         valid_mask &= ~HCR_TSC;
     }
-    if (cpu_isar_feature(aa64_vh, cpu)) {
-        valid_mask |= HCR_E2H;
-    }
-    if (cpu_isar_feature(aa64_lor, cpu)) {
-        valid_mask |= HCR_TLOR;
-    }
-    if (cpu_isar_feature(aa64_pauth, cpu)) {
-        valid_mask |= HCR_API | HCR_APK;
+
+    if (arm_feature(env, ARM_FEATURE_AARCH64)) {
+        if (cpu_isar_feature(aa64_vh, cpu)) {
+            valid_mask |= HCR_E2H;
+        }
+        if (cpu_isar_feature(aa64_lor, cpu)) {
+            valid_mask |= HCR_TLOR;
+        }
+        if (cpu_isar_feature(aa64_pauth, cpu)) {
+            valid_mask |= HCR_API | HCR_APK;
+        }
     }
 
     /* Clear RES0 bits.  */
@@ -XXX,XX +XXX,XX @@ static void hcr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
     arm_cpu_update_vfiq(cpu);
 }
 
+static void hcr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
+{
+    do_hcr_write(env, value, 0);
+}
+
 static void hcr_writehigh(CPUARMState *env, const ARMCPRegInfo *ri,
                           uint64_t value)
 {
     /* Handle HCR2 write, i.e. write to high half of HCR_EL2 */
     value = deposit64(env->cp15.hcr_el2, 32, 32, value);
-    hcr_write(env, NULL, value);
+    do_hcr_write(env, value, MAKE_64BIT_MASK(0, 32));
 }
 
 static void hcr_writelow(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ static void hcr_writelow(CPUARMState *env, const ARMCPRegInfo *ri,
 {
     /* Handle HCR write, i.e. write to low half of HCR_EL2 */
     value = deposit64(env->cp15.hcr_el2, 0, 32, value);
-    hcr_write(env, NULL, value);
+    do_hcr_write(env, value, MAKE_64BIT_MASK(32, 32));
 }
 
 /*
-- 
2.20.1
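
Note on the half-register writes in the patch above: hcr_writehigh and hcr_writelow now pass the mask of the *other* half as the initial valid_mask, so RES0 clearing cannot touch bits the 32-bit write did not address. A minimal standalone sketch of that idea follows. It is a simplified model, not the QEMU implementation: the toy_* names and MASK64 are hypothetical, feature_mask stands in for the per-CPU bits computed in do_hcr_write, and the EL3-dependent adjustments are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of 'length' bits starting at bit 'shift' (1 <= length <= 64). */
#define MASK64(shift, length) (((~0ULL) >> (64 - (length))) << (shift))

/* Replace bits [start, start+length) of 'value' with 'field'. */
static uint64_t toy_deposit64(uint64_t value, int start, int length,
                              uint64_t field)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    uint64_t mask = MASK64(start, length);
    return (value & ~mask) | ((field << start) & mask);
}

/* Keep only bits that are valid for the CPU or explicitly preserved. */
uint64_t toy_masked_write(uint64_t value, uint64_t feature_mask,
                          uint64_t preserve_mask)
{
    return value & (feature_mask | preserve_mask);
}

/* 32-bit write to the high half: the low half is preserved as-is. */
uint64_t toy_write_high(uint64_t old, uint32_t value32,
                        uint64_t feature_mask)
{
    uint64_t merged = toy_deposit64(old, 32, 32, value32);
    return toy_masked_write(merged, feature_mask, MASK64(0, 32));
}

/* 32-bit write to the low half: the high half is preserved as-is. */
uint64_t toy_write_low(uint64_t old, uint32_t value32,
                       uint64_t feature_mask)
{
    uint64_t merged = toy_deposit64(old, 0, 32, value32);
    return toy_masked_write(merged, feature_mask, MASK64(32, 32));
}
```

With feature_mask = MASK64(0, 28) and an old value that has bit 30 set, toy_write_high(old, 0, feature_mask) returns old unchanged, whereas masking without the preserve mask would drop bit 30.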

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200229012811.24129-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
 #define HCR_TERR      (1ULL << 36)
 #define HCR_TEA       (1ULL << 37)
 #define HCR_MIOCNCE   (1ULL << 38)
+/* RES0 bit 39 */
 #define HCR_APK       (1ULL << 40)
 #define HCR_API       (1ULL << 41)
 #define HCR_NV        (1ULL << 42)
@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
 #define HCR_NV2       (1ULL << 45)
 #define HCR_FWB       (1ULL << 46)
 #define HCR_FIEN      (1ULL << 47)
+/* RES0 bit 48 */
 #define HCR_TID4      (1ULL << 49)
 #define HCR_TICAB     (1ULL << 50)
+#define HCR_AMVOFFEN  (1ULL << 51)
 #define HCR_TOCU      (1ULL << 52)
+#define HCR_ENSCXT    (1ULL << 53)
 #define HCR_TTLBIS    (1ULL << 54)
 #define HCR_TTLBOS    (1ULL << 55)
 #define HCR_ATA       (1ULL << 56)
 #define HCR_DCT       (1ULL << 57)
+#define HCR_TID5      (1ULL << 58)
+#define HCR_TWEDEN    (1ULL << 59)
+#define HCR_TWEDEL    MAKE_64BIT_MASK(60, 4)
 
 #define SCR_NS                (1U << 0)
 #define SCR_IRQ               (1U << 1)
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

In arm_cpu_reset, we configure many system registers so that user-only
behaves as it should with a minimum of ifdefs.  However, we do not set
all of the system registers as required for a cpu with EL2 and EL3.

Disabling EL2 and EL3 means that we will not look at those registers,
which means that we don't have to worry about configuring them.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200229012811.24129-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static Property arm_cpu_reset_hivecs_property =
 static Property arm_cpu_rvbar_property =
             DEFINE_PROP_UINT64("rvbar", ARMCPU, rvbar, 0);
 
+#ifndef CONFIG_USER_ONLY
 static Property arm_cpu_has_el2_property =
             DEFINE_PROP_BOOL("has_el2", ARMCPU, has_el2, true);
 
 static Property arm_cpu_has_el3_property =
             DEFINE_PROP_BOOL("has_el3", ARMCPU, has_el3, true);
+#endif
 
 static Property arm_cpu_cfgend_property =
             DEFINE_PROP_BOOL("cfgend", ARMCPU, cfgend, false);
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
         qdev_property_add_static(DEVICE(obj), &arm_cpu_rvbar_property);
     }
 
+#ifndef CONFIG_USER_ONLY
     if (arm_feature(&cpu->env, ARM_FEATURE_EL3)) {
         /* Add the has_el3 state CPU property only if EL3 is allowed.  This will
          * prevent "has_el3" from existing on CPUs which cannot support EL3.
          */
         qdev_property_add_static(DEVICE(obj), &arm_cpu_has_el3_property);
 
-#ifndef CONFIG_USER_ONLY
         object_property_add_link(obj, "secure-memory",
                                  TYPE_MEMORY_REGION,
                                  (Object **)&cpu->secure_memory,
                                  qdev_prop_allow_set_link_before_realize,
                                  OBJ_PROP_LINK_STRONG,
                                  &error_abort);
-#endif
     }
 
     if (arm_feature(&cpu->env, ARM_FEATURE_EL2)) {
         qdev_property_add_static(DEVICE(obj), &arm_cpu_has_el2_property);
     }
+#endif
 
     if (arm_feature(&cpu->env, ARM_FEATURE_PMU)) {
         cpu->has_pmu = true;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We have disabled EL2 and EL3 for user-only, which means that these
registers "don't exist" and should not be set.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200229012811.24129-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
         /* Enable all PAC keys.  */
         env->cp15.sctlr_el[1] |= (SCTLR_EnIA | SCTLR_EnIB |
                                   SCTLR_EnDA | SCTLR_EnDB);
-        /* Enable all PAC instructions */
-        env->cp15.hcr_el2 |= HCR_API;
-        env->cp15.scr_el3 |= SCR_API;
         /* and to the FP/Neon instructions */
         env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 20, 2, 3);
         /* and to the SVE instructions */
         env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
-        env->cp15.cptr_el[3] |= CPTR_EZ;
         /* with maximum vector length */
         env->vfp.zcr_el[1] = cpu_isar_feature(aa64_sve, cpu) ?
                              cpu->sve_max_vq - 1 : 0;
-        env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
-        env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
         /*
          * Enable TBI0 and TBI1.  While the real kernel only enables TBI0,
          * turning on both here will produce smaller code and otherwise
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Update the {TGE,E2H} == '11' masking to ARMv8.6.
If EL2 is configured for aarch32, disable all of
the bits that are RES0 in aarch32 mode.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200229012811.24129-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
          * Since the v8.4 language applies to the entire register, and
          * appears to be backward compatible, use that.
          */
-        ret = 0;
-    } else if (ret & HCR_TGE) {
-        /* These bits are up-to-date as of ARMv8.4.  */
+        return 0;
+    }
+
+    /*
+     * For a cpu that supports both aarch64 and aarch32, we can set bits
+     * in HCR_EL2 (e.g. via EL3) that are RES0 when we enter EL2 as aa32.
+     * Ignore all of the bits in HCR+HCR2 that are not valid for aarch32.
+     */
+    if (!arm_el_is_aa64(env, 2)) {
+        uint64_t aa32_valid;
+
+        /*
+         * These bits are up-to-date as of ARMv8.6.
+         * For HCR, it's easiest to list just the 2 bits that are invalid.
+         * For HCR2, list those that are valid.
+         */
+        aa32_valid = MAKE_64BIT_MASK(0, 32) & ~(HCR_RW | HCR_TDZ);
+        aa32_valid |= (HCR_CD | HCR_ID | HCR_TERR | HCR_TEA | HCR_MIOCNCE |
+                       HCR_TID4 | HCR_TICAB | HCR_TOCU | HCR_TTLBIS);
+        ret &= aa32_valid;
+    }
+
+    if (ret & HCR_TGE) {
+        /* These bits are up-to-date as of ARMv8.6.  */
         if (ret & HCR_E2H) {
             ret &= ~(HCR_VM | HCR_FMO | HCR_IMO | HCR_AMO |
                      HCR_BSU_MASK | HCR_DC | HCR_TWI | HCR_TWE |
                      HCR_TID0 | HCR_TID2 | HCR_TPCP | HCR_TPU |
-                     HCR_TDZ | HCR_CD | HCR_ID | HCR_MIOCNCE);
+                     HCR_TDZ | HCR_CD | HCR_ID | HCR_MIOCNCE |
+                     HCR_TID4 | HCR_TICAB | HCR_TOCU | HCR_ENSCXT |
+                     HCR_TTLBIS | HCR_TTLBOS | HCR_TID5);
         } else {
             ret |= HCR_FMO | HCR_IMO | HCR_AMO;
         }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

These bits trap EL1 access to various virtual memory controls.

Buglink: https://bugs.launchpad.net/bugs/1855072
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200229012811.24129-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 82 ++++++++++++++++++++++++++++++---------------
 1 file changed, 55 insertions(+), 27 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tpm(CPUARMState *env, const ARMCPRegInfo *ri,
     return CP_ACCESS_OK;
 }
 
+/* Check for traps from EL1 due to HCR_EL2.TVM and HCR_EL2.TRVM.  */
+static CPAccessResult access_tvm_trvm(CPUARMState *env, const ARMCPRegInfo *ri,
+                                      bool isread)
+{
+    if (arm_current_el(env) == 1) {
+        uint64_t trap = isread ? HCR_TRVM : HCR_TVM;
+        if (arm_hcr_el2_eff(env) & trap) {
+            return CP_ACCESS_TRAP_EL2;
+        }
+    }
+    return CP_ACCESS_OK;
+}
+
 static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
 {
     ARMCPU *cpu = env_archcpu(env);
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo cp_reginfo[] = {
      */
     { .name = "CONTEXTIDR_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 13, .crm = 0, .opc2 = 1,
-      .access = PL1_RW, .secure = ARM_CP_SECSTATE_NS,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .secure = ARM_CP_SECSTATE_NS,
       .fieldoffset = offsetof(CPUARMState, cp15.contextidr_el[1]),
       .resetvalue = 0, .writefn = contextidr_write, .raw_writefn = raw_write, },
     { .name = "CONTEXTIDR_S", .state = ARM_CP_STATE_AA32,
       .cp = 15, .opc1 = 0, .crn = 13, .crm = 0, .opc2 = 1,
-      .access = PL1_RW, .secure = ARM_CP_SECSTATE_S,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .secure = ARM_CP_SECSTATE_S,
       .fieldoffset = offsetof(CPUARMState, cp15.contextidr_s),
       .resetvalue = 0, .writefn = contextidr_write, .raw_writefn = raw_write, },
     REGINFO_SENTINEL
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo not_v8_cp_reginfo[] = {
     /* MMU Domain access control / MPU write buffer control */
     { .name = "DACR",
       .cp = 15, .opc1 = CP_ANY, .crn = 3, .crm = CP_ANY, .opc2 = CP_ANY,
-      .access = PL1_RW, .resetvalue = 0,
+      .access = PL1_RW, .accessfn = access_tvm_trvm, .resetvalue = 0,
       .writefn = dacr_write, .raw_writefn = raw_write,
       .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.dacr_s),
                              offsetoflow32(CPUARMState, cp15.dacr_ns) } },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
     { .name = "DMB", .cp = 15, .crn = 7, .crm = 10, .opc1 = 0, .opc2 = 5,
       .access = PL0_W, .type = ARM_CP_NOP },
     { .name = "IFAR", .cp = 15, .crn = 6, .crm = 0, .opc1 = 0, .opc2 = 2,
-      .access = PL1_RW,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ifar_s),
                              offsetof(CPUARMState, cp15.ifar_ns) },
       .resetvalue = 0, },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
      */
     { .name = "AFSR0_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 5, .crm = 1, .opc2 = 0,
-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "AFSR1_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 5, .crm = 1, .opc2 = 1,
-      .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .type = ARM_CP_CONST, .resetvalue = 0 },
     /* MAIR can just read-as-written because we don't implement caches
      * and so don't need to care about memory attributes.
      */
     { .name = "MAIR_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 0,
-      .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.mair_el[1]),
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fieldoffset = offsetof(CPUARMState, cp15.mair_el[1]),
       .resetvalue = 0 },
     { .name = "MAIR_EL3", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 6, .crn = 10, .crm = 2, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
       * handled in the field definitions.
       */
     { .name = "MAIR0", .state = ARM_CP_STATE_AA32,
-      .cp = 15, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 0, .access = PL1_RW,
+      .cp = 15, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 0,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.mair0_s),
                              offsetof(CPUARMState, cp15.mair0_ns) },
       .resetfn = arm_cp_reset_ignore },
     { .name = "MAIR1", .state = ARM_CP_STATE_AA32,
-      .cp = 15, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 1, .access = PL1_RW,
+      .cp = 15, .opc1 = 0, .crn = 10, .crm = 2, .opc2 = 1,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.mair1_s),
                              offsetof(CPUARMState, cp15.mair1_ns) },
       .resetfn = arm_cp_reset_ignore },
@@ -XXX,XX +XXX,XX @@ static void vttbr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
 static const ARMCPRegInfo vmsa_pmsa_cp_reginfo[] = {
     { .name = "DFSR", .cp = 15, .crn = 5, .crm = 0, .opc1 = 0, .opc2 = 0,
-      .access = PL1_RW, .type = ARM_CP_ALIAS,
+      .access = PL1_RW, .accessfn = access_tvm_trvm, .type = ARM_CP_ALIAS,
       .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.dfsr_s),
                              offsetoflow32(CPUARMState, cp15.dfsr_ns) }, },
     { .name = "IFSR", .cp = 15, .crn = 5, .crm = 0, .opc1 = 0, .opc2 = 1,
-      .access = PL1_RW, .resetvalue = 0,
+      .access = PL1_RW, .accessfn = access_tvm_trvm, .resetvalue = 0,
       .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.ifsr_s),
                              offsetoflow32(CPUARMState, cp15.ifsr_ns) } },
     { .name = "DFAR", .cp = 15, .opc1 = 0, .crn = 6, .crm = 0, .opc2 = 0,
-      .access = PL1_RW, .resetvalue = 0,
+      .access = PL1_RW, .accessfn = access_tvm_trvm, .resetvalue = 0,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.dfar_s),
                              offsetof(CPUARMState, cp15.dfar_ns) } },
     { .name = "FAR_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .crn = 6, .crm = 0, .opc1 = 0, .opc2 = 0,
-      .access = PL1_RW, .fieldoffset = offsetof(CPUARMState, cp15.far_el[1]),
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .fieldoffset = offsetof(CPUARMState, cp15.far_el[1]),
       .resetvalue = 0, },
     REGINFO_SENTINEL
 };
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_pmsa_cp_reginfo[] = {
 static const ARMCPRegInfo vmsa_cp_reginfo[] = {
     { .name = "ESR_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .crn = 5, .crm = 2, .opc1 = 0, .opc2 = 0,
-      .access = PL1_RW,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
       .fieldoffset = offsetof(CPUARMState, cp15.esr_el[1]), .resetvalue = 0, },
     { .name = "TTBR0_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 0,
-      .access = PL1_RW, .writefn = vmsa_ttbr_write, .resetvalue = 0,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .writefn = vmsa_ttbr_write, .resetvalue = 0,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
                              offsetof(CPUARMState, cp15.ttbr0_ns) } },
     { .name = "TTBR1_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 1,
-      .access = PL1_RW, .writefn = vmsa_ttbr_write, .resetvalue = 0,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .writefn = vmsa_ttbr_write, .resetvalue = 0,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr1_s),
                              offsetof(CPUARMState, cp15.ttbr1_ns) } },
     { .name = "TCR_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .crn = 2, .crm = 0, .opc1 = 0, .opc2 = 2,
-      .access = PL1_RW, .writefn = vmsa_tcr_el12_write,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .writefn = vmsa_tcr_el12_write,
       .resetfn = vmsa_ttbcr_reset, .raw_writefn = raw_write,
       .fieldoffset = offsetof(CPUARMState, cp15.tcr_el[1]) },
     { .name = "TTBCR", .cp = 15, .crn = 2, .crm = 0, .opc1 = 0, .opc2 = 2,
-      .access = PL1_RW, .type = ARM_CP_ALIAS, .writefn = vmsa_ttbcr_write,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .type = ARM_CP_ALIAS, .writefn = vmsa_ttbcr_write,
       .raw_writefn = vmsa_ttbcr_raw_write,
       .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.tcr_el[3]),
                              offsetoflow32(CPUARMState, cp15.tcr_el[1])} },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
  */
 static const ARMCPRegInfo ttbcr2_reginfo = {
     .name = "TTBCR2", .cp = 15, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 3,
-    .access = PL1_RW, .type = ARM_CP_ALIAS,
+    .access = PL1_RW, .accessfn = access_tvm_trvm,
+    .type = ARM_CP_ALIAS,
     .bank_fieldoffsets = { offsetofhigh32(CPUARMState, cp15.tcr_el[3]),
                            offsetofhigh32(CPUARMState, cp15.tcr_el[1]) },
 };
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lpae_cp_reginfo[] = {
     /* NOP AMAIR0/1 */
     { .name = "AMAIR0", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .crn = 10, .crm = 3, .opc1 = 0, .opc2 = 0,
-      .access = PL1_RW, .type = ARM_CP_CONST,
-      .resetvalue = 0 },
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .type = ARM_CP_CONST, .resetvalue = 0 },
     /* AMAIR1 is mapped to AMAIR_EL1[63:32] */
     { .name = "AMAIR1", .cp = 15, .crn = 10, .crm = 3, .opc1 = 0, .opc2 = 1,
-      .access = PL1_RW, .type = ARM_CP_CONST,
-      .resetvalue = 0 },
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "PAR", .cp = 15, .crm = 7, .opc1 = 0,
       .access = PL1_RW, .type = ARM_CP_64BIT, .resetvalue = 0,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.par_s),
                              offsetof(CPUARMState, cp15.par_ns)} },
     { .name = "TTBR0", .cp = 15, .crm = 2, .opc1 = 0,
-      .access = PL1_RW, .type = ARM_CP_64BIT | ARM_CP_ALIAS,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .type = ARM_CP_64BIT | ARM_CP_ALIAS,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
                              offsetof(CPUARMState, cp15.ttbr0_ns) },
       .writefn = vmsa_ttbr_write, },
     { .name = "TTBR1", .cp = 15, .crm = 2, .opc1 = 1,
-      .access = PL1_RW, .type = ARM_CP_64BIT | ARM_CP_ALIAS,
+      .access = PL1_RW, .accessfn = access_tvm_trvm,
+      .type = ARM_CP_64BIT | ARM_CP_ALIAS,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr1_s),
                              offsetof(CPUARMState, cp15.ttbr1_ns) },
       .writefn = vmsa_ttbr_write, },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .type = ARM_CP_NOP, .access = PL1_W },
     /* MMU Domain access control / MPU write buffer control */
     { .name = "DACR", .cp = 15, .opc1 = 0, .crn = 3, .crm = 0, .opc2 = 0,
-      .access = PL1_RW, .resetvalue = 0,
+      .access = PL1_RW, .accessfn = access_tvm_trvm, .resetvalue = 0,
       .writefn = dacr_write, .raw_writefn = raw_write,
       .bank_fieldoffsets = { offsetoflow32(CPUARMState, cp15.dacr_s),
                              offsetoflow32(CPUARMState, cp15.dacr_ns) } },
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
         ARMCPRegInfo sctlr = {
             .name = "SCTLR", .state = ARM_CP_STATE_BOTH,
             .opc0 = 3, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 0,
-            .access = PL1_RW,
+            .access = PL1_RW, .accessfn = access_tvm_trvm,
             .bank_fieldoffsets = { offsetof(CPUARMState, cp15.sctlr_s),
                                    offsetof(CPUARMState, cp15.sctlr_ns) },
             .writefn = sctlr_write, .resetvalue = cpu->reset_sctlr,
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

These bits trap EL1 access to set/way cache maintenance insns.

Buglink: https://bugs.launchpad.net/bugs/1863685
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200229012811.24129-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

This bit traps EL1 access to the auxiliary control registers.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200229012811.24129-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

This bit traps EL1 access to cache maintenance insns that operate
to the point of coherency or persistence.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200229012811.24129-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)
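
As a reading aid for the switch in the hunk below, here is a minimal
standalone sketch of the same fall-through pattern. The AccessResult enum
and the boolean parameters are hypothetical stand-ins for QEMU's
CPAccessResult, SCTLR_EL1.UCI and the effective HCR_EL2.TPCP bit; it is not
the code from this patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU's CPAccessResult values. */
typedef enum {
    ACCESS_OK,
    ACCESS_TRAP,      /* UNDEF at the current exception level */
    ACCESS_TRAP_EL2   /* trap to the hypervisor */
} AccessResult;

/*
 * Model of a point-of-coherency cache op permission check.
 *   el:   current exception level (0..3)
 *   uci:  models SCTLR_EL1.UCI
 *   tpcp: models the effective HCR_EL2.TPCP bit
 */
static AccessResult cacheop_poc_check(int el, bool uci, bool tpcp)
{
    switch (el) {
    case 0:
        /* EL0 must UNDEF unless UCI is set. */
        if (!uci) {
            return ACCESS_TRAP;
        }
        /* fall through: an allowed EL0 access is still subject to TPCP */
    case 1:
        if (tpcp) {
            return ACCESS_TRAP_EL2;
        }
        break;
    }
    /* EL2 and EL3 are never affected by either check. */
    return ACCESS_OK;
}
```

The fall-through means the EL0 check takes priority: with UCI clear, an EL0
access gets the plain trap even when TPCP is set, and only an EL0 access
that UCI allows is then subject to the EL2 trap.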

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult aa64_cacheop_access(CPUARMState *env,
     return CP_ACCESS_OK;
 }
 
+static CPAccessResult aa64_cacheop_poc_access(CPUARMState *env,
+                                              const ARMCPRegInfo *ri,
+                                              bool isread)
+{
+    /* Cache invalidate/clean to Point of Coherency or Persistence...  */
+    switch (arm_current_el(env)) {
+    case 0:
+        /* ... EL0 must UNDEF unless SCTLR_EL1.UCI is set.  */
+        if (!(arm_sctlr(env, 0) & SCTLR_UCI)) {
+            return CP_ACCESS_TRAP;
+        }
+        /* fall through */
+    case 1:
+        /* ... EL1 must trap to EL2 if HCR_EL2.TPCP is set.  */
+        if (arm_hcr_el2_eff(env) & HCR_TPCP) {
+            return CP_ACCESS_TRAP_EL2;
+        }
+        break;
+    }
+    return CP_ACCESS_OK;
+}
+
 /* See: D4.7.2 TLB maintenance requirements and the TLB maintenance instructions
  * Page D4-1736 (DDI0487A.b)
  */
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .accessfn = aa64_cacheop_access },
     { .name = "DC_IVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
-      .access = PL1_W, .type = ARM_CP_NOP },
+      .access = PL1_W, .accessfn = aa64_cacheop_poc_access,
+      .type = ARM_CP_NOP },
     { .name = "DC_ISW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 2,
       .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
     { .name = "DC_CVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 10, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
-      .accessfn = aa64_cacheop_access },
+      .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CSW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
       .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "DC_CIVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 14, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
-      .accessfn = aa64_cacheop_access },
+      .accessfn = aa64_cacheop_poc_access },
     { .name = "DC_CISW", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
       .access = PL1_W, .accessfn = access_tsw, .type = ARM_CP_NOP },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "BPIMVA", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 7,
       .type = ARM_CP_NOP, .access = PL1_W },
     { .name = "DCIMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
-      .type = ARM_CP_NOP, .access = PL1_W },
+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_poc_access },
     { .name = "DCISW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 2,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
     { .name = "DCCMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 1,
-      .type = ARM_CP_NOP, .access = PL1_W },
+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_poc_access },
     { .name = "DCCSW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
     { .name = "DCCMVAU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 11, .opc2 = 1,
       .type = ARM_CP_NOP, .access = PL1_W },
     { .name = "DCCIMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 1,
-      .type = ARM_CP_NOP, .access = PL1_W },
+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_poc_access },
     { .name = "DCCISW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
     /* MMU Domain access control / MPU write buffer control */
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo dcpop_reg[] = {
     { .name = "DC_CVAP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 12, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
-      .accessfn = aa64_cacheop_access, .writefn = dccvap_writefn },
+      .accessfn = aa64_cacheop_poc_access, .writefn = dccvap_writefn },
     REGINFO_SENTINEL
 };
 
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo dcpodp_reg[] = {
     { .name = "DC_CVADP", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 13, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NO_RAW | ARM_CP_SUPPRESS_TB_END,
-      .accessfn = aa64_cacheop_access, .writefn = dccvap_writefn },
+      .accessfn = aa64_cacheop_poc_access, .writefn = dccvap_writefn },
     REGINFO_SENTINEL
 };
 #endif /*CONFIG_USER_ONLY*/
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This bit traps EL1 access to cache maintenance insns that operate
to the point of unification.  There are no longer any references to
plain aa64_cacheop_access, so remove it.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200229012811.24129-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 53 +++++++++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo uao_reginfo = {
     .readfn = aa64_uao_read, .writefn = aa64_uao_write
 };
 
-static CPAccessResult aa64_cacheop_access(CPUARMState *env,
-                                          const ARMCPRegInfo *ri,
-                                          bool isread)
-{
-    /* Cache invalidate/clean: NOP, but EL0 must UNDEF unless
-     * SCTLR_EL1.UCI is set.
-     */
-    if (arm_current_el(env) == 0 && !(arm_sctlr(env, 0) & SCTLR_UCI)) {
-        return CP_ACCESS_TRAP;
-    }
-    return CP_ACCESS_OK;
-}
-
 static CPAccessResult aa64_cacheop_poc_access(CPUARMState *env,
                                               const ARMCPRegInfo *ri,
                                               bool isread)
@@ -XXX,XX +XXX,XX @@ static CPAccessResult aa64_cacheop_poc_access(CPUARMState *env,
     return CP_ACCESS_OK;
 }
 
+static CPAccessResult aa64_cacheop_pou_access(CPUARMState *env,
+                                              const ARMCPRegInfo *ri,
+                                              bool isread)
+{
+    /* Cache invalidate/clean to Point of Unification... */
+    switch (arm_current_el(env)) {
+    case 0:
+        /* ... EL0 must UNDEF unless SCTLR_EL1.UCI is set.  */
+        if (!(arm_sctlr(env, 0) & SCTLR_UCI)) {
+            return CP_ACCESS_TRAP;
+        }
+        /* fall through */
+    case 1:
+        /* ... EL1 must trap to EL2 if HCR_EL2.TPU is set.  */
+        if (arm_hcr_el2_eff(env) & HCR_TPU) {
+            return CP_ACCESS_TRAP_EL2;
+        }
+        break;
+    }
+    return CP_ACCESS_OK;
+}
+
 /* See: D4.7.2 TLB maintenance requirements and the TLB maintenance instructions
  * Page D4-1736 (DDI0487A.b)
  */
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     /* Cache ops: all NOPs since we don't emulate caches */
     { .name = "IC_IALLUIS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
-      .access = PL1_W, .type = ARM_CP_NOP },
+      .access = PL1_W, .type = ARM_CP_NOP,
+      .accessfn = aa64_cacheop_pou_access },
     { .name = "IC_IALLU", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 0,
-      .access = PL1_W, .type = ARM_CP_NOP },
+      .access = PL1_W, .type = ARM_CP_NOP,
+      .accessfn = aa64_cacheop_pou_access },
     { .name = "IC_IVAU", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 5, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
-      .accessfn = aa64_cacheop_access },
+      .accessfn = aa64_cacheop_pou_access },
     { .name = "DC_IVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
       .access = PL1_W, .accessfn = aa64_cacheop_poc_access,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "DC_CVAU", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 11, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
-      .accessfn = aa64_cacheop_access },
+      .accessfn = aa64_cacheop_pou_access },
     { .name = "DC_CIVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 14, .opc2 = 1,
       .access = PL0_W, .type = ARM_CP_NOP,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .writefn = tlbiipas2_is_write },
     /* 32 bit cache operations */
     { .name = "ICIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
-      .type = ARM_CP_NOP, .access = PL1_W },
+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
     { .name = "BPIALLUIS", .cp = 15, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 6,
       .type = ARM_CP_NOP, .access = PL1_W },
     { .name = "ICIALLU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 0,
-      .type = ARM_CP_NOP, .access = PL1_W },
+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
     { .name = "ICIMVAU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 1,
-      .type = ARM_CP_NOP, .access = PL1_W },
+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
     { .name = "BPIALL", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 6,
       .type = ARM_CP_NOP, .access = PL1_W },
     { .name = "BPIMVA", .cp = 15, .opc1 = 0, .crn = 7, .crm = 5, .opc2 = 7,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "DCCSW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 10, .opc2 = 2,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = access_tsw },
     { .name = "DCCMVAU", .cp = 15, .opc1 = 0, .crn = 7, .crm = 11, .opc2 = 1,
-      .type = ARM_CP_NOP, .access = PL1_W },
+      .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_pou_access },
     { .name = "DCCIMVAC", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 1,
       .type = ARM_CP_NOP, .access = PL1_W, .accessfn = aa64_cacheop_poc_access },
     { .name = "DCCISW", .cp = 15, .opc1 = 0, .crn = 7, .crm = 14, .opc2 = 2,
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This bit traps EL1 access to TLB maintenance insns.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200229012811.24129-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 85 +++++++++++++++++++++++++++++----------------
 1 file changed, 55 insertions(+), 30 deletions(-)
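
Most of this patch attaches one access hook to many table entries. The
sketch below models that table-driven pattern in isolation. CpuModel,
RegModel and check_access are hypothetical names invented for this example,
and treating a NULL hook as "no extra check" is an assumption of the model,
not a statement about QEMU's dispatch code.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ACCESS_OK, ACCESS_TRAP_EL2 } AccessResult;

/* Hypothetical, much reduced CPU state. */
typedef struct {
    int el;          /* current exception level */
    bool hcr_ttlb;   /* models the effective HCR_EL2.TTLB bit */
} CpuModel;

typedef AccessResult (*AccessFn)(const CpuModel *cpu);

/* Hypothetical register description: a name plus an optional hook. */
typedef struct {
    const char *name;
    AccessFn accessfn;   /* NULL: no extra permission check in this model */
} RegModel;

/* Trap EL1 accesses to EL2 when TTLB is set. */
static AccessResult ttlb_check(const CpuModel *cpu)
{
    if (cpu->el == 1 && cpu->hcr_ttlb) {
        return ACCESS_TRAP_EL2;
    }
    return ACCESS_OK;
}

/* Run the per-register hook if one is present. */
static AccessResult check_access(const RegModel *reg, const CpuModel *cpu)
{
    return reg->accessfn ? reg->accessfn(cpu) : ACCESS_OK;
}

static const RegModel tlbiall   = { "TLBIALL",   ttlb_check };
static const RegModel tlbimvalh = { "TLBIMVALH", NULL };
```

Keeping the condition in one function and naming it from each entry is what
lets the patch cover dozens of registers without duplicating the test.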

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_tacr(CPUARMState *env, const ARMCPRegInfo *ri,
     return CP_ACCESS_OK;
 }
 
+/* Check for traps from EL1 due to HCR_EL2.TTLB. */
+static CPAccessResult access_ttlb(CPUARMState *env, const ARMCPRegInfo *ri,
+                                  bool isread)
+{
+    if (arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_TTLB)) {
+        return CP_ACCESS_TRAP_EL2;
+    }
+    return CP_ACCESS_OK;
+}
+
 static void dacr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
 {
     ARMCPU *cpu = env_archcpu(env);
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v7_cp_reginfo[] = {
       .type = ARM_CP_NO_RAW, .access = PL1_R, .readfn = isr_read },
     /* 32 bit ITLB invalidates */
     { .name = "ITLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 0,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiall_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbiall_write },
     { .name = "ITLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbimva_write },
     { .name = "ITLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 5, .opc2 = 2,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiasid_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbiasid_write },
     /* 32 bit DTLB invalidates */
     { .name = "DTLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 0,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiall_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbiall_write },
     { .name = "DTLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbimva_write },
     { .name = "DTLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 6, .opc2 = 2,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiasid_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbiasid_write },
     /* 32 bit TLB invalidates */
     { .name = "TLBIALL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiall_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbiall_write },
     { .name = "TLBIMVA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbimva_write },
     { .name = "TLBIASID", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 2,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiasid_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbiasid_write },
     { .name = "TLBIMVAA", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 3,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimvaa_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbimvaa_write },
     REGINFO_SENTINEL
 };
 
 static const ARMCPRegInfo v7mp_cp_reginfo[] = {
     /* 32 bit TLB invalidates, Inner Shareable */
     { .name = "TLBIALLIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbiall_is_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbiall_is_write },
     { .name = "TLBIMVAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_is_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbimva_is_write },
     { .name = "TLBIASIDIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2,
-      .type = ARM_CP_NO_RAW, .access = PL1_W,
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
       .writefn = tlbiasid_is_write },
     { .name = "TLBIMVAAIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3,
-      .type = ARM_CP_NO_RAW, .access = PL1_W,
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
       .writefn = tlbimvaa_is_write },
     REGINFO_SENTINEL
 };
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     /* TLBI operations */
     { .name = "TLBI_VMALLE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 0,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 1,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_ASIDE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 2,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vmalle1is_write },
     { .name = "TLBI_VAAE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 3,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VAALE1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1is_write },
     { .name = "TLBI_VMALLE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 0,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vmalle1_write },
     { .name = "TLBI_VAE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 1,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1_write },
     { .name = "TLBI_ASIDE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 2,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vmalle1_write },
     { .name = "TLBI_VAAE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 3,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1_write },
     { .name = "TLBI_VALE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1_write },
     { .name = "TLBI_VAALE1", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7,
-      .access = PL1_W, .type = ARM_CP_NO_RAW,
+      .access = PL1_W, .accessfn = access_ttlb, .type = ARM_CP_NO_RAW,
       .writefn = tlbi_aa64_vae1_write },
     { .name = "TLBI_IPAS2E1IS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 4, .crn = 8, .crm = 0, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
 #endif
     /* TLB invalidate last level of translation table walk */
     { .name = "TLBIMVALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 5,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_is_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbimva_is_write },
     { .name = "TLBIMVAALIS", .cp = 15, .opc1 = 0, .crn = 8, .crm = 3, .opc2 = 7,
-      .type = ARM_CP_NO_RAW, .access = PL1_W,
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
       .writefn = tlbimvaa_is_write },
     { .name = "TLBIMVAL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 5,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimva_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbimva_write },
     { .name = "TLBIMVAAL", .cp = 15, .opc1 = 0, .crn = 8, .crm = 7, .opc2 = 7,
-      .type = ARM_CP_NO_RAW, .access = PL1_W, .writefn = tlbimvaa_write },
+      .type = ARM_CP_NO_RAW, .access = PL1_W, .accessfn = access_ttlb,
+      .writefn = tlbimvaa_write },
     { .name = "TLBIMVALH", .cp = 15, .opc1 = 4, .crn = 8, .crm = 7, .opc2 = 5,
       .type = ARM_CP_NO_RAW, .access = PL2_W,
       .writefn = tlbimva_hyp_write },
-- 
2.20.1

From: Niek Linnenbank <nieklinnenbank@gmail.com>

The Cubieboard is a single-board computer with an Allwinner A10 System-on-Chip [1].
As documented in the Allwinner A10 User Manual V1.5 [2], the SoC has an ARM
Cortex-A8 processor. Currently the Cubieboard machine definition specifies the
ARM Cortex-A9 in its description and as the default CPU.

This patch corrects the Cubieboard machine definition to use the ARM Cortex-A8.

The only user-visible effect is that our textual description of the
machine was wrong, because hw/arm/allwinner-a10.c always creates a
Cortex-A8 CPU regardless of the default value in the MachineClass struct.

[1] http://docs.cubieboard.org/products/start#cubieboard1
[2] https://linux-sunxi.org/File:Allwinner_A10_User_manual_V1.5.pdf

Fixes: 8a863c8120994981a099
Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20200227220149.6845-2-nieklinnenbank@gmail.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[note in commit message that the bug didn't have much visible effect]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/cubieboard.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/cubieboard.c
+++ b/hw/arm/cubieboard.c
@@ -XXX,XX +XXX,XX @@ static void cubieboard_init(MachineState *machine)
 
 static void cubieboard_machine_init(MachineClass *mc)
 {
-    mc->desc = "cubietech cubieboard (Cortex-A9)";
-    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a9");
+    mc->desc = "cubietech cubieboard (Cortex-A8)";
+    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a8");
     mc->init = cubieboard_init;
     mc->block_default_type = IF_IDE;
     mc->units_per_default_bus = 1;
-- 
2.20.1

From: Niek Linnenbank <nieklinnenbank@gmail.com>

The Cubieboard has an ARM Cortex-A8.  Instead of simply ignoring a
bogus -cpu option provided by the user, give them an error message so
they know their command line is wrong.

Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20200227220149.6845-3-nieklinnenbank@gmail.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweaked commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/cubieboard.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
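
The check added below is an exact string comparison against the one
supported CPU type name. A standalone sketch of that comparison follows.
EX_CPU_TYPE_NAME and its "-arm-cpu" suffix are assumptions made for this
example and are not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a type-name macro: glue a fixed suffix onto
 * the CPU model name. The suffix value is an assumption of this sketch.
 */
#define EX_CPU_TYPE_NAME(model) model "-arm-cpu"

/* Return true only for the single CPU type this board model accepts. */
static bool cubieboard_cpu_type_ok(const char *cpu_type)
{
    return cpu_type != NULL &&
           strcmp(cpu_type, EX_CPU_TYPE_NAME("cortex-a8")) == 0;
}
```

Because the comparison is exact, any other model name is rejected, which is
the behaviour the commit message asks for in place of silently ignoring
the option.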

diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/cubieboard.c
+++ b/hw/arm/cubieboard.c
@@ -XXX,XX +XXX,XX @@ static struct arm_boot_info cubieboard_binfo = {
 
 static void cubieboard_init(MachineState *machine)
 {
-    AwA10State *a10 = AW_A10(object_new(TYPE_AW_A10));
+    AwA10State *a10;
     Error *err = NULL;
 
+    /* Only allow Cortex-A8 for this board */
+    if (strcmp(machine->cpu_type, ARM_CPU_TYPE_NAME("cortex-a8")) != 0) {
+        error_report("This board can only be used with cortex-a8 CPU");
+        exit(1);
+    }
+
+    a10 = AW_A10(object_new(TYPE_AW_A10));
+
     object_property_set_int(OBJECT(&a10->emac), 1, "phy-addr", &err);
     if (err != NULL) {
         error_reportf_err(err, "Couldn't set phy address: ");
-- 
2.20.1

From: Niek Linnenbank <nieklinnenbank@gmail.com>

The Cubieboard contains either 512MiB or 1GiB of onboard RAM [1].
Refuse any other RAM size, which could break user programs.

[1] http://linux-sunxi.org/Cubieboard

Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20200227220149.6845-4-nieklinnenbank@gmail.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/cubieboard.c | 8 ++++++++
 1 file changed, 8 insertions(+)
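
The size test in the hunk below reduces to a two-value whitelist. The sketch
shows it as a standalone predicate. The MiB and GiB macros here are local
definitions written for this example, not the ones from QEMU's headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local unit helpers for this sketch only. */
#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* Accept exactly the two RAM sizes the commit message lists. */
static bool cubieboard_ram_size_ok(uint64_t ram_size)
{
    return ram_size == 512 * MiB || ram_size == 1 * GiB;
}
```

A whitelist is stricter than a range check: 768MiB lies between the two
accepted sizes but is rejected.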

diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/cubieboard.c
+++ b/hw/arm/cubieboard.c
@@ -XXX,XX +XXX,XX @@ static void cubieboard_init(MachineState *machine)
     AwA10State *a10;
     Error *err = NULL;
 
+    /* This board has fixed size RAM (512MiB or 1GiB) */
+    if (machine->ram_size != 512 * MiB &&
+        machine->ram_size != 1 * GiB) {
+        error_report("This machine can only be used with 512MiB or 1GiB RAM");
+        exit(1);
+    }
+
     /* Only allow Cortex-A8 for this board */
     if (strcmp(machine->cpu_type, ARM_CPU_TYPE_NAME("cortex-a8")) != 0) {
         error_report("This board can only be used with cortex-a8 CPU");
@@ -XXX,XX +XXX,XX @@ static void cubieboard_machine_init(MachineClass *mc)
 {
     mc->desc = "cubietech cubieboard (Cortex-A8)";
     mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a8");
+    mc->default_ram_size = 1 * GiB;
     mc->init = cubieboard_init;
     mc->block_default_type = IF_IDE;
     mc->units_per_default_bus = 1;
-- 
2.20.1

From: Niek Linnenbank <nieklinnenbank@gmail.com>

The Cubieboard machine does not support the -bios argument.
Report an error when -bios is used and exit immediately.

Signed-off-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20200227220149.6845-5-nieklinnenbank@gmail.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/cubieboard.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/arm/cubieboard.c b/hw/arm/cubieboard.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/cubieboard.c
+++ b/hw/arm/cubieboard.c
@@ -XXX,XX +XXX,XX @@
 #include "exec/address-spaces.h"
 #include "qapi/error.h"
 #include "cpu.h"
+#include "sysemu/sysemu.h"
 #include "hw/sysbus.h"
 #include "hw/boards.h"
 #include "hw/arm/allwinner-a10.h"
@@ -XXX,XX +XXX,XX @@ static void cubieboard_init(MachineState *machine)
     AwA10State *a10;
     Error *err = NULL;
 
+    /* BIOS is not supported by this board */
+    if (bios_name) {
+        error_report("BIOS not supported for this machine");
+        exit(1);
+    }
+
     /* This board has fixed size RAM (512MiB or 1GiB) */
     if (machine->ram_size != 512 * MiB &&
         machine->ram_size != 1 * GiB) {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Replicate the single TBI bit from TCR_EL2 and TCR_EL3 so that
we can unconditionally use pointer bit 55 to index into our
composite TBI1:TBI0 field.
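
A small standalone sketch of why multiplying by 3 helps: a one-bit value
becomes a two-bit field with both positions equal, so selecting by pointer
bit 55 gives the same answer for either half of the address space. The
function names are hypothetical and exist only in this example.

```c
#include <stdint.h>

/* Replicate a single TBI bit into a two-bit TBI1:TBI0 style field. */
static unsigned replicate_tbi(unsigned tbi_bit)
{
    return (tbi_bit & 1) * 3;   /* 0 -> 0b00, 1 -> 0b11 */
}

/* Pick the TBI bit that applies to ptr, using address bit 55 as index. */
static unsigned tbi_for_pointer(unsigned tbi_field, uint64_t ptr)
{
    unsigned select = (unsigned)((ptr >> 55) & 1);
    return (tbi_field >> select) & 1;
}
```

With a two-range regime the field can be 0b01 or 0b10 and the lookup picks
the matching bit. With a replicated single bit the lookup result does not
depend on bit 55, so callers need no special case.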

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
     } else if (mmu_idx == ARMMMUIdx_Stage2) {
         return 0; /* VTCR_EL2 */
     } else {
-        return extract32(tcr, 20, 1);
+        /* Replicate the single TBI bit so we always have 2 bits.  */
+        return extract32(tcr, 20, 1) * 3;
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx)
     } else if (mmu_idx == ARMMMUIdx_Stage2) {
         return 0; /* VTCR_EL2 */
     } else {
-        return extract32(tcr, 29, 1);
+        /* Replicate the single TBID bit so we always have 2 bits.  */
+        return extract32(tcr, 29, 1) * 3;
     }
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We now cache the core mmu_idx in env->hflags.  Rather than recompute
it from scratch, extract the field.  All of the uses of cpu_mmu_index
within target/arm are within helpers, and env->hflags is always stable
within the translation block from which helpers are called.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200302175829.2183-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h    | 23 +++++++++++++----------
 target/arm/helper.c |  5 -----
 2 files changed, 13 insertions(+), 15 deletions(-)
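
Reading a cached index out of a flags word is a shift and a mask. The sketch
below shows that extraction on its own. The field position and width
(EXAMPLE_MMUIDX_SHIFT, EXAMPLE_MMUIDX_LENGTH) are hypothetical values chosen
for this example and do not describe the real TBFLAG_ANY layout.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 'length' bits of 'value', starting at bit 'start'. */
static inline uint32_t extract_field32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~UINT32_C(0) >> (32 - length));
}

/* Hypothetical field placement, used only by this sketch. */
#define EXAMPLE_MMUIDX_SHIFT  4
#define EXAMPLE_MMUIDX_LENGTH 4

/* Return the cached index instead of recomputing it from CPU state. */
static inline int example_mmu_index(uint32_t hflags)
{
    return (int)extract_field32(hflags, EXAMPLE_MMUIDX_SHIFT,
                                EXAMPLE_MMUIDX_LENGTH);
}
```

The lookup is only as fresh as the cached word, which is why the commit
message points out that env->hflags is stable wherever the helpers run.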

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdxBit {
 
 #define MMU_USER_IDX 0
 
-/**
- * cpu_mmu_index:
- * @env: The cpu environment
- * @ifetch: True for code access, false for data access.
- *
- * Return the core mmu index for the current translation regime.
- * This function is used by generic TCG code paths.
- */
-int cpu_mmu_index(CPUARMState *env, bool ifetch);
-
 /* Indexes used when registering address spaces with cpu_address_space_init */
 typedef enum ARMASIdx {
     ARMASIdx_NS = 0,
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, BTYPE, 10, 2)         /* Not cached. */
 FIELD(TBFLAG_A64, TBID, 12, 2)
 FIELD(TBFLAG_A64, UNPRIV, 14, 1)
 
+/**
+ * cpu_mmu_index:
+ * @env: The cpu environment
+ * @ifetch: True for code access, false for data access.
+ *
+ * Return the core mmu index for the current translation regime.
+ * This function is used by generic TCG code paths.
+ */
+static inline int cpu_mmu_index(CPUARMState *env, bool ifetch)
+{
+    return FIELD_EX32(env->hflags, TBFLAG_ANY, MMUIDX);
+}
+
 static inline bool bswap_code(bool sctlr_b)
 {
 #ifdef CONFIG_USER_ONLY
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env)
     return arm_mmu_idx_el(env, arm_current_el(env));
 }
 
-int cpu_mmu_index(CPUARMState *env, bool ifetch)
-{
-    return arm_to_core_mmu_idx(arm_mmu_idx(env));
-}
-
 #ifndef CONFIG_USER_ONLY
 ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
 {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

If by context we know that we're in AArch64 mode, we need not
test for M-profile when reconstructing the full ARMMMUIdx.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200302175829.2183-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
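Illustration only, not part of the commit: a minimal sketch of why the
M-profile test can be dropped. The names and the numeric tag values below
are assumptions made for this example; the real constants are defined in
target/arm/cpu.h. The general conversion has to pick a profile tag, while
the AArch64-only variant can apply the A-profile tag unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; assumed for this sketch. */
#define SKETCH_MMU_IDX_A            0x10  /* assumed A-profile type tag */
#define SKETCH_MMU_IDX_M            0x40  /* assumed M-profile type tag */
#define SKETCH_MMU_IDX_COREIDX_MASK 0x0f  /* assumed width of the core index */

/* General case: the profile must be tested before tagging the index. */
static inline int sketch_core_to_arm_mmu_idx(bool is_m_profile, int core_idx)
{
    assert((core_idx & ~SKETCH_MMU_IDX_COREIDX_MASK) == 0);
    return core_idx | (is_m_profile ? SKETCH_MMU_IDX_M : SKETCH_MMU_IDX_A);
}

/* AArch64-only case: the profile is known, so no test is needed. */
static inline int sketch_core_to_aa64_mmu_idx(int core_idx)
{
    assert((core_idx & ~SKETCH_MMU_IDX_COREIDX_MASK) == 0);
    return core_idx | SKETCH_MMU_IDX_A;
}
```

For any core index, the two helpers agree whenever the CPU is not
M-profile, which is always the case in AArch64 state.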
 target/arm/internals.h     | 6 ++++++
 target/arm/translate-a64.c | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline ARMMMUIdx core_to_arm_mmu_idx(CPUARMState *env, int mmu_idx)
     }
 }
 
+static inline ARMMMUIdx core_to_aa64_mmu_idx(int mmu_idx)
+{
+    /* AArch64 is always a-profile. */
+    return mmu_idx | ARM_MMU_IDX_A;
+}
+
 int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx);
 
 /*
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->condexec_mask = 0;
     dc->condexec_cond = 0;
     core_mmu_idx = FIELD_EX32(tb_flags, TBFLAG_ANY, MMUIDX);
-    dc->mmu_idx = core_to_arm_mmu_idx(env, core_mmu_idx);
+    dc->mmu_idx = core_to_aa64_mmu_idx(core_mmu_idx);
     dc->tbii = FIELD_EX32(tb_flags, TBFLAG_A64, TBII);
     dc->tbid = FIELD_EX32(tb_flags, TBFLAG_A64, TBID);
     dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We missed this case within AArch64.ExceptionReturn.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200302175829.2183-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
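Illustration only, not part of the commit: a standalone sketch of the
address cleaning this patch applies on exception return. The function name
and the two_ranges parameter are hypothetical stand-ins for the hflags
TBII field and regime_has_2_ranges(); the sketch assumes bit 0 of tbii
covers addresses with bit 55 clear and bit 1 covers addresses with bit 55
set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the PC with the tag byte removed if TBI applies to it.
 * tbii:       2-bit enable field, indexed by bit 55 of the address.
 * two_ranges: true if the translation regime has two VA ranges.
 */
static uint64_t sketch_clean_return_pc(uint64_t pc, unsigned tbii,
                                       bool two_ranges)
{
    unsigned bit55 = (pc >> 55) & 1;

    assert(tbii < 4);
    if (!((tbii >> bit55) & 1)) {
        return pc;                      /* TBI disabled for this half */
    }

    uint64_t low56 = pc & ((UINT64_C(1) << 56) - 1);
    if (two_ranges && bit55) {
        /* Sign-extend from bit 55: the top byte becomes all ones. */
        return low56 | (UINT64_C(0xff) << 56);
    }
    /* Single range, or lower half: the top byte becomes zero. */
    return low56;
}
```

With tbii == 2 and two ranges, 0x12ff800000001000 would be cleaned to
0xffff800000001000, while the same address under a single-range regime
would become 0x00ff800000001000.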
 target/arm/helper-a64.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -XXX,XX +XXX,XX @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
                       "AArch32 EL%d PC 0x%" PRIx32 "\n",
                       cur_el, new_el, env->regs[15]);
     } else {
+        int tbii;
+
         env->aarch64 = 1;
         spsr &= aarch64_pstate_valid_mask(&env_archcpu(env)->isar);
         pstate_write(env, spsr);
@@ -XXX,XX +XXX,XX @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
             env->pstate &= ~PSTATE_SS;
         }
         aarch64_restore_sp(env, new_el);
-        env->pc = new_pc;
         helper_rebuild_hflags_a64(env, new_el);
+
+        /*
+         * Apply TBI to the exception return address.  We had to delay this
+         * until after we selected the new EL, so that we could select the
+         * correct TBI+TBID bits.  This is made easier by waiting until after
+         * the hflags rebuild, since we can pull the composite TBII field
+         * from there.
+         */
+        tbii = FIELD_EX32(env->hflags, TBFLAG_A64, TBII);
+        if ((tbii >> extract64(new_pc, 55, 1)) & 1) {
+            /* TBI is enabled. */
+            int core_mmu_idx = cpu_mmu_index(env, false);
+            if (regime_has_2_ranges(core_to_aa64_mmu_idx(core_mmu_idx))) {
+                new_pc = sextract64(new_pc, 0, 56);
+            } else {
+                new_pc = extract64(new_pc, 0, 56);
+            }
+        }
+        env->pc = new_pc;
+
         qemu_log_mask(CPU_LOG_INT, "Exception return from AArch64 EL%d to "
                       "AArch64 EL%d PC 0x%" PRIx64 "\n",
                       cur_el, new_el, env->pc);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is an aarch64-only function.  Move it out of the shared file.
This patch is code movement only.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200302175829.2183-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
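Illustration only, not part of the commit: a small sketch of the block
arithmetic at the top of the moved helper. The struct and function names
are hypothetical, and page_size stands in for TARGET_PAGE_SIZE; the
sketch only shows how the block length, the aligned base address, and the
number of page-sized lookups are derived.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t base;      /* block-aligned start address */
    uint64_t blocklen;  /* bytes zeroed by one DC ZVA */
    unsigned npages;    /* page-sized lookups needed to cover the block */
} SketchZvaBlock;

static SketchZvaBlock sketch_zva_block(uint64_t vaddr_in,
                                       unsigned dcz_blocksize,
                                       uint64_t page_size)
{
    SketchZvaBlock b;

    /* The block size must be 2K or less: 4 << 9 == 2048. */
    assert(dcz_blocksize <= 9);
    /* page_size must be a non-zero power of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    b.blocklen = UINT64_C(4) << dcz_blocksize;
    b.base = vaddr_in & ~(b.blocklen - 1);
    b.npages = (unsigned)((b.blocklen + page_size - 1) / page_size);
    return b;
}
```

With a 2K block and a 1K page size the result has npages == 2, which
corresponds to the two-entry hostaddr[] case described in the helper's
comment.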
 target/arm/helper-a64.h |  1 +
 target/arm/helper.h     |  1 -
 target/arm/helper-a64.c | 91 ++++++++++++++++++++++++++++++++++++++++
 target/arm/op_helper.c  | 93 -----------------------------------------
 4 files changed, 92 insertions(+), 94 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(advsimd_f16touinth, i32, f16, ptr)
 DEF_HELPER_2(sqrt_f16, f16, f16, ptr)
 
 DEF_HELPER_2(exception_return, void, env, i64)
+DEF_HELPER_2(dc_zva, void, env, i64)
 
 DEF_HELPER_FLAGS_3(pacia, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacib, TCG_CALL_NO_WG, i64, env, i64, i64)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(crypto_sm4ekey, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
 
 DEF_HELPER_FLAGS_3(crc32, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
 DEF_HELPER_FLAGS_3(crc32c, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
-DEF_HELPER_2(dc_zva, void, env, i64)
 
 DEF_HELPER_FLAGS_5(gvec_qrdmlah_s16, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "cpu.h"
 #include "exec/gdbstub.h"
 #include "exec/helper-proto.h"
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sqrt_f16)(uint32_t a, void *fpstp)
     return float16_sqrt(a, s);
 }
 
+void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
+{
+    /*
+     * Implement DC ZVA, which zeroes a fixed-length block of memory.
+     * Note that we do not implement the (architecturally mandated)
+     * alignment fault for attempts to use this on Device memory
+     * (which matches the usual QEMU behaviour of not implementing either
+     * alignment faults or any memory attribute handling).
+     */
 
+    ARMCPU *cpu = env_archcpu(env);
+    uint64_t blocklen = 4 << cpu->dcz_blocksize;
+    uint64_t vaddr = vaddr_in & ~(blocklen - 1);
+
+#ifndef CONFIG_USER_ONLY
+    {
+        /*
+         * Slightly awkwardly, QEMU's TARGET_PAGE_SIZE may be less than
+         * the block size so we might have to do more than one TLB lookup.
+         * We know that in fact for any v8 CPU the page size is at least 4K
+         * and the block size must be 2K or less, but TARGET_PAGE_SIZE is only
+         * 1K as an artefact of legacy v5 subpage support being present in the
+         * same QEMU executable. So in practice the hostaddr[] array has
+         * two entries, given the current setting of TARGET_PAGE_BITS_MIN.
+         */
+        int maxidx = DIV_ROUND_UP(blocklen, TARGET_PAGE_SIZE);
+        void *hostaddr[DIV_ROUND_UP(2 * KiB, 1 << TARGET_PAGE_BITS_MIN)];
+        int try, i;
+        unsigned mmu_idx = cpu_mmu_index(env, false);
+        TCGMemOpIdx oi = make_memop_idx(MO_UB, mmu_idx);
+
+        assert(maxidx <= ARRAY_SIZE(hostaddr));
+
+        for (try = 0; try < 2; try++) {
+
+            for (i = 0; i < maxidx; i++) {
+                hostaddr[i] = tlb_vaddr_to_host(env,
+                                                vaddr + TARGET_PAGE_SIZE * i,
+                                                1, mmu_idx);
+                if (!hostaddr[i]) {
+                    break;
+                }
+            }
+            if (i == maxidx) {
+                /*
+                 * If it's all in the TLB it's fair game for just writing to;
+                 * we know we don't need to update dirty status, etc.
+                 */
+                for (i = 0; i < maxidx - 1; i++) {
+                    memset(hostaddr[i], 0, TARGET_PAGE_SIZE);
+                }
+                memset(hostaddr[i], 0, blocklen - (i * TARGET_PAGE_SIZE));
+                return;
+            }
+            /*
+             * OK, try a store and see if we can populate the tlb. This
+             * might cause an exception if the memory isn't writable,
+             * in which case we will longjmp out of here. We must for
+             * this purpose use the actual register value passed to us
+             * so that we get the fault address right.
+             */
+            helper_ret_stb_mmu(env, vaddr_in, 0, oi, GETPC());
+            /* Now we can populate the other TLB entries, if any */
+            for (i = 0; i < maxidx; i++) {
+                uint64_t va = vaddr + TARGET_PAGE_SIZE * i;
+                if (va != (vaddr_in & TARGET_PAGE_MASK)) {
+                    helper_ret_stb_mmu(env, va, 0, oi, GETPC());
+                }
+            }
+        }
+
+        /*
+         * Slow path (probably attempt to do this to an I/O device or
+         * similar, or clearing of a block of code we have translations
+         * cached for). Just do a series of byte writes as the architecture
+         * demands. It's not worth trying to use a cpu_physical_memory_map(),
+         * memset(), unmap() sequence here because:
+         *  + we'd need to account for the blocksize being larger than a page
+         *  + the direct-RAM access case is almost always going to be dealt
+         *    with in the fastpath code above, so there's no speed benefit
+         *  + we would have to deal with the map returning NULL because the
+         *    bounce buffer was in use
+         */
+        for (i = 0; i < blocklen; i++) {
+            helper_ret_stb_mmu(env, vaddr + i, 0, oi, GETPC());
+        }
+    }
+#else
+    memset(g2h(vaddr), 0, blocklen);
+#endif
+}
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
 #include "qemu/osdep.h"
-#include "qemu/units.h"
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
 #include "cpu.h"
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(ror_cc)(CPUARMState *env, uint32_t x, uint32_t i)
         return ((uint32_t)x >> shift) | (x << (32 - shift));
     }
 }
-
-void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
-{
-    /*
-     * Implement DC ZVA, which zeroes a fixed-length block of memory.
-     * Note that we do not implement the (architecturally mandated)
-     * alignment fault for attempts to use this on Device memory
-     * (which matches the usual QEMU behaviour of not implementing either
-     * alignment faults or any memory attribute handling).
-     */
-
-    ARMCPU *cpu = env_archcpu(env);
-    uint64_t blocklen = 4 << cpu->dcz_blocksize;
-    uint64_t vaddr = vaddr_in & ~(blocklen - 1);
-
-#ifndef CONFIG_USER_ONLY
-    {
-        /*
-         * Slightly awkwardly, QEMU's TARGET_PAGE_SIZE may be less than
-         * the block size so we might have to do more than one TLB lookup.
-         * We know that in fact for any v8 CPU the page size is at least 4K
-         * and the block size must be 2K or less, but TARGET_PAGE_SIZE is only
-         * 1K as an artefact of legacy v5 subpage support being present in the
-         * same QEMU executable. So in practice the hostaddr[] array has
-         * two entries, given the current setting of TARGET_PAGE_BITS_MIN.
-         */
-        int maxidx = DIV_ROUND_UP(blocklen, TARGET_PAGE_SIZE);
-        void *hostaddr[DIV_ROUND_UP(2 * KiB, 1 << TARGET_PAGE_BITS_MIN)];
-        int try, i;
-        unsigned mmu_idx = cpu_mmu_index(env, false);
-        TCGMemOpIdx oi = make_memop_idx(MO_UB, mmu_idx);
-
-        assert(maxidx <= ARRAY_SIZE(hostaddr));
-
-        for (try = 0; try < 2; try++) {
-
-            for (i = 0; i < maxidx; i++) {
-                hostaddr[i] = tlb_vaddr_to_host(env,
-                                                vaddr + TARGET_PAGE_SIZE * i,
-                                                1, mmu_idx);
-                if (!hostaddr[i]) {
-                    break;
-                }
-            }
-            if (i == maxidx) {
-                /*
-                 * If it's all in the TLB it's fair game for just writing to;
-                 * we know we don't need to update dirty status, etc.
-                 */
-                for (i = 0; i < maxidx - 1; i++) {
-                    memset(hostaddr[i], 0, TARGET_PAGE_SIZE);
-                }
-                memset(hostaddr[i], 0, blocklen - (i * TARGET_PAGE_SIZE));
-                return;
-            }
-            /*
-             * OK, try a store and see if we can populate the tlb. This
-             * might cause an exception if the memory isn't writable,
-             * in which case we will longjmp out of here. We must for
-             * this purpose use the actual register value passed to us
-             * so that we get the fault address right.
-             */
-            helper_ret_stb_mmu(env, vaddr_in, 0, oi, GETPC());
-            /* Now we can populate the other TLB entries, if any */
-            for (i = 0; i < maxidx; i++) {
-                uint64_t va = vaddr + TARGET_PAGE_SIZE * i;
-                if (va != (vaddr_in & TARGET_PAGE_MASK)) {
-                    helper_ret_stb_mmu(env, va, 0, oi, GETPC());
-                }
-            }
-        }
-
-        /*
-         * Slow path (probably attempt to do this to an I/O device or
-         * similar, or clearing of a block of code we have translations
-         * cached for). Just do a series of byte writes as the architecture
-         * demands. It's not worth trying to use a cpu_physical_memory_map(),
-         * memset(), unmap() sequence here because:
-         *  + we'd need to account for the blocksize being larger than a page
-         *  + the direct-RAM access case is almost always going to be dealt
-         *    with in the fastpath code above, so there's no speed benefit
-         *  + we would have to deal with the map returning NULL because the
-         *    bounce buffer was in use
-         */
-        for (i = 0; i < blocklen; i++) {
-            helper_ret_stb_mmu(env, vaddr + i, 0, oi, GETPC());
-        }
-    }
-#else
-    memset(g2h(vaddr), 0, blocklen);
-#endif
-}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The function does not write registers, and only reads them by
implication via the exception path.

From: Richard Henderson <richard.henderson@linaro.org>

This data access was forgotten when we added support for cleaning
addresses of TBI information.

Fixes: 3a471103ac1823ba
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200302175829.2183-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
         return;
     case ARM_CP_DC_ZVA:
         /* Writes clear the aligned block of memory which rt points into. */
-        tcg_rt = cpu_reg(s, rt);
+        tcg_rt = clean_data_tbi(s, cpu_reg(s, rt));
         gen_helper_dc_zva(cpu_env, tcg_rt);
         return;
     default:
-- 
2.20.1

The following changes since commit 5a67d7735d4162630769ef495cf813244fc850df:

  Merge remote-tracking branch 'remotes/berrange-gitlab/tags/tls-deps-pull-request' into staging (2021-07-02 08:22:39 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210702

for you to fetch changes up to 04ea4d3cfd0a21b248ece8eb7a9436a3d9898dd8:

  target/arm: Implement MVE shifts by register (2021-07-02 11:48:38 +0100)

----------------------------------------------------------------
target-arm queue:
 * more MVE instructions
 * hw/gpio/gpio_pwr: use shutdown function for reboot
 * target/arm: Check NaN mode before silencing NaN
 * tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
 * hw/arm: Add basic power management to raspi.
 * docs/system/arm: Add quanta-gbs-bmc, quanta-q7l1-bmc

----------------------------------------------------------------
Joe Komlodi (1):
      target/arm: Check NaN mode before silencing NaN

Maxim Uvarov (1):
      hw/gpio/gpio_pwr: use shutdown function for reboot

Nolan Leake (1):
      hw/arm: Add basic power management to raspi.

Patrick Venture (2):
      docs/system/arm: Add quanta-q7l1-bmc reference
      docs/system/arm: Add quanta-gbs-bmc reference

Peter Maydell (18):
      target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation
      target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
      target/arm: Make asimd_imm_const() public
      target/arm: Use asimd_imm_const for A64 decode
      target/arm: Use dup_const() instead of bitfield_replicate()
      target/arm: Implement MVE logical immediate insns
      target/arm: Implement MVE vector shift left by immediate insns
      target/arm: Implement MVE vector shift right by immediate insns
      target/arm: Implement MVE VSHLL
      target/arm: Implement MVE VSRI, VSLI
      target/arm: Implement MVE VSHRN, VRSHRN
      target/arm: Implement MVE saturating narrowing shifts
      target/arm: Implement MVE VSHLC
      target/arm: Implement MVE VADDLV
      target/arm: Implement MVE long shifts by immediate
      target/arm: Implement MVE long shifts by register
      target/arm: Implement MVE shifts by immediate
      target/arm: Implement MVE shifts by register

Philippe Mathieu-Daudé (1):
      tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine

 docs/system/arm/aspeed.rst             |   1 +
 docs/system/arm/nuvoton.rst            |   5 +-
 include/hw/arm/bcm2835_peripherals.h   |   3 +-
 include/hw/misc/bcm2835_powermgt.h     |  29 ++
 target/arm/helper-mve.h                | 108 +++++++
 target/arm/translate.h                 |  41 +++
 target/arm/mve.decode                  | 177 ++++++++++-
 target/arm/t32.decode                  |  71 ++++-
 hw/arm/bcm2835_peripherals.c           |  13 +-
 hw/gpio/gpio_pwr.c                     |   2 +-
 hw/misc/bcm2835_powermgt.c             | 160 ++++++++++
 target/arm/helper-a64.c                |  12 +-
 target/arm/mve_helper.c                | 524 +++++++++++++++++++++++++++++++--
 target/arm/translate-a64.c             |  86 +-----
 target/arm/translate-mve.c             | 261 +++++++++++++++-
 target/arm/translate-neon.c            |  81 -----
 target/arm/translate.c                 | 327 +++++++++++++++++++-
 target/arm/vfp_helper.c                |  24 +-
 hw/misc/meson.build                    |   1 +
 tests/acceptance/boot_linux_console.py |  43 +++
 20 files changed, 1760 insertions(+), 209 deletions(-)
 create mode 100644 include/hw/misc/bcm2835_powermgt.h
 create mode 100644 hw/misc/bcm2835_powermgt.c

From: Patrick Venture <venture@google.com>

Add a line-item reference to the quanta-gbs-bmc machine.

Signed-off-by: Patrick Venture <venture@google.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20210615192848.1065297-3-venture@google.com
[PMM: fixed underline Sphinx warning]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/nuvoton.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/nuvoton.rst
+++ b/docs/system/arm/nuvoton.rst
@@ -XXX,XX +XXX,XX @@
-Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
-=====================================================
+Nuvoton iBMC boards (``*-bmc``, ``npcm750-evb``, ``quanta-gsj``)
+================================================================
 
 The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
 designed to be used as Baseboard Management Controllers (BMCs) in various
@@ -XXX,XX +XXX,XX @@ segment. The following machines are based on this chip :
 The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
 Hyperscale applications. The following machines are based on this chip :
 
+- ``quanta-gbs-bmc``    Quanta GBS server BMC
 - ``quanta-gsj``        Quanta GSJ server BMC
 
 There are also two more SoCs, NPCM710 and NPCM705, which are single-core
-- 
2.20.1

From: Nolan Leake <nolan@sigbus.net>

This is just enough to make reboot and poweroff work. It works for
Linux, U-Boot, and Arm Trusted Firmware. It has not been tested with
Plan 9 or bare-metal/hobby OSes, but it should work for them too, since
they generally seem to do what Linux does for reset.

The watchdog timer functionality is not yet implemented.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/64
Signed-off-by: Nolan Leake <nolan@sigbus.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210625210209.1870217-1-nolan@sigbus.net
[PMM: tweaked commit title; fixed region size to 0x200;
 moved header file to include/]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
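Illustration only, not part of the commit: a standalone sketch of the
write path this device model implements. The register offsets and values
mirror the definitions in the patch; the SK_/Sketch names and the returned
action enum are hypothetical and replace the calls into QEMU's runstate
API, so that the decision logic can be read in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PASSWORD        0x5a000000u
#define SK_PASSWORD_MASK   0xff000000u
#define SK_R_RSTC          0x1c
#define SK_R_RSTS          0x20
#define SK_R_WDOG          0x24
#define SK_V_RSTC_RESET    0x20u
#define SK_V_RSTS_POWEROFF 0x555u

typedef enum {
    SK_IGNORED,   /* bad password or unknown register */
    SK_STORED,    /* value latched, no side effect */
    SK_RESET,     /* guest asked for a reboot */
    SK_POWEROFF   /* guest asked for a shutdown */
} SketchPmAction;

typedef struct {
    uint32_t rstc, rsts, wdog;
} SketchPmState;

static SketchPmAction sketch_pm_write(SketchPmState *s, uint32_t offset,
                                      uint32_t value)
{
    assert(s != NULL);

    /* Every write must carry the password in the top byte. */
    if ((value & SK_PASSWORD_MASK) != SK_PASSWORD) {
        return SK_IGNORED;
    }
    value &= ~SK_PASSWORD_MASK;

    switch (offset) {
    case SK_R_RSTC:
        s->rstc = value;
        if (value & SK_V_RSTC_RESET) {
            /* The halt marker previously stored in RSTS selects poweroff. */
            return (s->rsts & 0xfff) == SK_V_RSTS_POWEROFF
                   ? SK_POWEROFF : SK_RESET;
        }
        return SK_STORED;
    case SK_R_RSTS:
        s->rsts = value;
        return SK_STORED;
    case SK_R_WDOG:
        s->wdog = value;   /* stored only; the timer itself is not modelled */
        return SK_STORED;
    default:
        return SK_IGNORED;
    }
}
```

A guest that wants to power off would therefore first write the halt
marker to RSTS and then set the reset bit in RSTC; setting the reset bit
without the marker requests a reboot instead.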
 include/hw/arm/bcm2835_peripherals.h |   3 +-
 include/hw/misc/bcm2835_powermgt.h   |  29 +++++
 hw/arm/bcm2835_peripherals.c         |  13 ++-
 hw/misc/bcm2835_powermgt.c           | 160 +++++++++++++++++++++++++++
 hw/misc/meson.build                  |   1 +
 5 files changed, 204 insertions(+), 2 deletions(-)
 create mode 100644 include/hw/misc/bcm2835_powermgt.h
 create mode 100644 hw/misc/bcm2835_powermgt.c

diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/bcm2835_peripherals.h
+++ b/include/hw/arm/bcm2835_peripherals.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/misc/bcm2835_mphi.h"
 #include "hw/misc/bcm2835_thermal.h"
 #include "hw/misc/bcm2835_cprman.h"
+#include "hw/misc/bcm2835_powermgt.h"
 #include "hw/sd/sdhci.h"
 #include "hw/sd/bcm2835_sdhost.h"
 #include "hw/gpio/bcm2835_gpio.h"
@@ -XXX,XX +XXX,XX @@ struct BCM2835PeripheralState {
     BCM2835MphiState mphi;
     UnimplementedDeviceState txp;
     UnimplementedDeviceState armtmr;
-    UnimplementedDeviceState powermgt;
+    BCM2835PowerMgtState powermgt;
     BCM2835CprmanState cprman;
     PL011State uart0;
     BCM2835AuxState aux;
diff --git a/include/hw/misc/bcm2835_powermgt.h b/include/hw/misc/bcm2835_powermgt.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/misc/bcm2835_powermgt.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * BCM2835 Power Management emulation
+ *
+ * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
+ * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef BCM2835_POWERMGT_H
+#define BCM2835_POWERMGT_H
+
+#include "hw/sysbus.h"
+#include "qom/object.h"
+
+#define TYPE_BCM2835_POWERMGT "bcm2835-powermgt"
+OBJECT_DECLARE_SIMPLE_TYPE(BCM2835PowerMgtState, BCM2835_POWERMGT)
+
+struct BCM2835PowerMgtState {
+    SysBusDevice busdev;
+    MemoryRegion iomem;
+
+    uint32_t rstc;
+    uint32_t rsts;
+    uint32_t wdog;
+};
+
+#endif
diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_init(Object *obj)
 
     object_property_add_const_link(OBJECT(&s->dwc2), "dma-mr",
                                    OBJECT(&s->gpu_bus_mr));
+
+    /* Power Management */
+    object_initialize_child(obj, "powermgt", &s->powermgt,
+                            TYPE_BCM2835_POWERMGT);
 }
 
 static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
         qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
                                INTERRUPT_USB));
 
+    /* Power Management */
+    if (!sysbus_realize(SYS_BUS_DEVICE(&s->powermgt), errp)) {
+        return;
+    }
+
+    memory_region_add_subregion(&s->peri_mr, PM_OFFSET,
+                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->powermgt), 0));
+
     create_unimp(s, &s->txp, "bcm2835-txp", TXP_OFFSET, 0x1000);
     create_unimp(s, &s->armtmr, "bcm2835-sp804", ARMCTRL_TIMER0_1_OFFSET, 0x40);
-    create_unimp(s, &s->powermgt, "bcm2835-powermgt", PM_OFFSET, 0x114);
     create_unimp(s, &s->i2s, "bcm2835-i2s", I2S_OFFSET, 0x100);
     create_unimp(s, &s->smi, "bcm2835-smi", SMI_OFFSET, 0x100);
     create_unimp(s, &s->spi[0], "bcm2835-spi0", SPI0_OFFSET, 0x20);
diff --git a/hw/misc/bcm2835_powermgt.c b/hw/misc/bcm2835_powermgt.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/misc/bcm2835_powermgt.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * BCM2835 Power Management emulation
+ *
+ * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
+ * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "hw/misc/bcm2835_powermgt.h"
+#include "migration/vmstate.h"
+#include "sysemu/runstate.h"
+
+#define PASSWORD 0x5a000000
+#define PASSWORD_MASK 0xff000000
+
+#define R_RSTC 0x1c
+#define V_RSTC_RESET 0x20
+#define R_RSTS 0x20
+#define V_RSTS_POWEROFF 0x555 /* Linux uses partition 63 to indicate halt. */
+#define R_WDOG 0x24
+
+static uint64_t bcm2835_powermgt_read(void *opaque, hwaddr offset,
+                                      unsigned size)
+{
+    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
+    uint32_t res = 0;
+
+    switch (offset) {
+    case R_RSTC:
+        res = s->rstc;
+        break;
+    case R_RSTS:
+        res = s->rsts;
+        break;
+    case R_WDOG:
+        res = s->wdog;
+        break;
+
+    default:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_read: Unknown offset 0x%08"HWADDR_PRIx
+                      "\n", offset);
+        res = 0;
+        break;
+    }
+
+    return res;
+}
+
+static void bcm2835_powermgt_write(void *opaque, hwaddr offset,
+                                   uint64_t value, unsigned size)
+{
+    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
+
+    if ((value & PASSWORD_MASK) != PASSWORD) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "bcm2835_powermgt_write: Bad password 0x%"PRIx64
+                      " at offset 0x%08"HWADDR_PRIx"\n",
+                      value, offset);
+        return;
+    }
+
+    value = value & ~PASSWORD_MASK;
+
+    switch (offset) {
+    case R_RSTC:
+        s->rstc = value;
+        if (value & V_RSTC_RESET) {
+            if ((s->rsts & 0xfff) == V_RSTS_POWEROFF) {
+                qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+            } else {
+                qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+            }
+        }
+        break;
+    case R_RSTS:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_write: RSTS\n");
+        s->rsts = value;
+        break;
+    case R_WDOG:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_write: WDOG\n");
+        s->wdog = value;
+        break;
+
+    default:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_write: Unknown offset 0x%08"HWADDR_PRIx
+                      "\n", offset);
+        break;
+    }
+}
+
+static const MemoryRegionOps bcm2835_powermgt_ops = {
+    .read = bcm2835_powermgt_read,
+    .write = bcm2835_powermgt_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .impl.min_access_size = 4,
+    .impl.max_access_size = 4,
+};
+
+static const VMStateDescription vmstate_bcm2835_powermgt = {
+    .name = TYPE_BCM2835_POWERMGT,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(rstc, BCM2835PowerMgtState),
+        VMSTATE_UINT32(rsts, BCM2835PowerMgtState),
+        VMSTATE_UINT32(wdog, BCM2835PowerMgtState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void bcm2835_powermgt_init(Object *obj)
+{
+    BCM2835PowerMgtState *s = BCM2835_POWERMGT(obj);
+
+    memory_region_init_io(&s->iomem, obj, &bcm2835_powermgt_ops, s,
+                          TYPE_BCM2835_POWERMGT, 0x200);
+    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem);
+}
+
+static void bcm2835_powermgt_reset(DeviceState *dev)
+{
+    BCM2835PowerMgtState *s = BCM2835_POWERMGT(dev);
+
+    /* https://elinux.org/BCM2835_registers#PM */
+    s->rstc = 0x00000102;
+    s->rsts = 0x00001000;
+    s->wdog = 0x00000000;
+}
+
+static void bcm2835_powermgt_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->reset = bcm2835_powermgt_reset;
+    dc->vmsd = &vmstate_bcm2835_powermgt;
+}
+
+static TypeInfo bcm2835_powermgt_info = {
+    .name          = TYPE_BCM2835_POWERMGT,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(BCM2835PowerMgtState),
+    .class_init    = bcm2835_powermgt_class_init,
+    .instance_init = bcm2835_powermgt_init,
+};
+
+static void bcm2835_powermgt_register_types(void)
+{
+    type_register_static(&bcm2835_powermgt_info);
+}
+
+type_init(bcm2835_powermgt_register_types)
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_RASPI', if_true: files(
   'bcm2835_rng.c',
   'bcm2835_thermal.c',
   'bcm2835_cprman.c',
+  'bcm2835_powermgt.c',
 ))
 softmmu_ss.add(when: 'CONFIG_SLAVIO', if_true: files('slavio_misc.c'))
 softmmu_ss.add(when: 'CONFIG_ZYNQ', if_true: files('zynq_slcr.c', 'zynq-xadc.c'))
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Add a test that boots and then quickly shuts down a raspi2 machine,
to exercise the power management model:

(1/1) tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_raspi2_initrd:
  console: [    0.000000] Booting Linux on physical CPU 0xf00
  console: [    0.000000] Linux version 4.14.98-v7+ (dom@dom-XPS-13-9370) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1200 SMP Tue Feb 12 20:27:48 GMT 2019
  console: [    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
  console: [    0.000000] CPU: div instructions available: patching division code
  console: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
  console: [    0.000000] OF: fdt: Machine model: Raspberry Pi 2 Model B
  ...
  console: Boot successful.
  console: cat /proc/cpuinfo
  console: / # cat /proc/cpuinfo
  ...
  console: processor      : 3
  console: model name     : ARMv7 Processor rev 5 (v7l)
  console: BogoMIPS       : 125.00
  console: Features       : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
  console: CPU implementer        : 0x41
  console: CPU architecture: 7
  console: CPU variant    : 0x0
  console: CPU part       : 0xc07
  console: CPU revision   : 5
  console: Hardware       : BCM2835
  console: Revision       : 0000
  console: Serial         : 0000000000000000
  console: cat /proc/iomem
  console: / # cat /proc/iomem
  console: 00000000-3bffffff : System RAM
  console: 00008000-00afffff : Kernel code
  console: 00c00000-00d468ef : Kernel data
  console: 3f006000-3f006fff : dwc_otg
  console: 3f007000-3f007eff : /soc/dma@7e007000
  console: 3f00b880-3f00b8bf : /soc/mailbox@7e00b880
  console: 3f100000-3f100027 : /soc/watchdog@7e100000
  console: 3f101000-3f102fff : /soc/cprman@7e101000
  console: 3f200000-3f2000b3 : /soc/gpio@7e200000
  PASS (24.59 s)
  RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 25.02 s

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Message-id: 20210531113837.1689775-1-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/acceptance/boot_linux_console.py | 43 ++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index XXXXXXX..XXXXXXX 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -XXX,XX +XXX,XX @@
 from avocado import skip
 from avocado import skipUnless
 from avocado_qemu import Test
+from avocado_qemu import exec_command
 from avocado_qemu import exec_command_and_wait_for_pattern
 from avocado_qemu import interrupt_interactive_console_until_pattern
 from avocado_qemu import wait_for_console_pattern
@@ -XXX,XX +XXX,XX @@ def test_arm_raspi2_uart0(self):
         """
         self.do_test_arm_raspi2(0)
 
+    def test_arm_raspi2_initrd(self):
+        """
+        :avocado: tags=arch:arm
+        :avocado: tags=machine:raspi2
+        """
+        deb_url = ('http://archive.raspberrypi.org/debian/'
+                   'pool/main/r/raspberrypi-firmware/'
+                   'raspberrypi-kernel_1.20190215-1_armhf.deb')
+        deb_hash = 'cd284220b32128c5084037553db3c482426f3972'
+        deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
+        kernel_path = self.extract_from_deb(deb_path, '/boot/kernel7.img')
+        dtb_path = self.extract_from_deb(deb_path, '/boot/bcm2709-rpi-2-b.dtb')
+
+        initrd_url = ('https://github.com/groeck/linux-build-test/raw/'
+                      '2eb0a73b5d5a28df3170c546ddaaa9757e1e0848/rootfs/'
+                      'arm/rootfs-armv7a.cpio.gz')
+        initrd_hash = '604b2e45cdf35045846b8bbfbf2129b1891bdc9c'
+        initrd_path_gz = self.fetch_asset(initrd_url, asset_hash=initrd_hash)
+        initrd_path = os.path.join(self.workdir, 'rootfs.cpio')
+        archive.gzip_uncompress(initrd_path_gz, initrd_path)
+
+        self.vm.set_console()
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'earlycon=pl011,0x3f201000 console=ttyAMA0 '
+                               'panic=-1 noreboot ' +
+                               'dwc_otg.fiq_fsm_enable=0')
+        self.vm.add_args('-kernel', kernel_path,
+                         '-dtb', dtb_path,
+                         '-initrd', initrd_path,
+                         '-append', kernel_command_line,
+                         '-no-reboot')
+        self.vm.launch()
+        self.wait_for_console_pattern('Boot successful.')
+
+        exec_command_and_wait_for_pattern(self, 'cat /proc/cpuinfo',
+                                                'BCM2835')
+        exec_command_and_wait_for_pattern(self, 'cat /proc/iomem',
+                                                '/soc/cprman@7e101000')
+        exec_command(self, 'halt')
+        # Wait for VM to shut down gracefully
+        self.vm.wait()
+
     def test_arm_exynos4210_initrd(self):
         """
         :avocado: tags=arch:arm
-- 
2.20.1

From: Joe Komlodi <joe.komlodi@xilinx.com>

If the CPU is running in default NaN mode (FPCR.DN == 1) and we execute
FRSQRTE, FRECPE, or FRECPX with a signaling NaN, parts_silence_nan_frac() will
assert due to fpst->default_nan_mode being set.

To avoid this, we check to see what NaN mode we're running in before we call
floatxx_silence_nan().
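
The control flow of the fix can be sketched in isolation. The following is a
minimal standalone model, not QEMU code: fp_status_model, silence_nan_model()
and process_nan_model() are hypothetical names, the input is assumed to already
be a NaN, and the bit patterns assume the IEEE 754 binary32 layout with a
default NaN of 0x7fc00000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of float_status used here. */
typedef struct {
    bool default_nan_mode;   /* models FPCR.DN == 1 */
    bool invalid_raised;     /* models float_flag_invalid being raised */
} fp_status_model;

static bool is_nan32(uint32_t f)
{
    return (f & 0x7f800000u) == 0x7f800000u && (f & 0x007fffffu) != 0;
}

static bool is_signaling_nan32(uint32_t f)
{
    /* A binary32 NaN with the quiet bit (bit 22) clear is signaling. */
    return is_nan32(f) && !(f & 0x00400000u);
}

/* Models the assertion described above: no silencing in default NaN mode. */
static uint32_t silence_nan_model(uint32_t f, const fp_status_model *st)
{
    assert(!st->default_nan_mode);
    return f | 0x00400000u;
}

/* Mirrors the patched helpers: silence only when not in default NaN mode. */
static uint32_t process_nan_model(uint32_t f, fp_status_model *st)
{
    uint32_t nan = f;

    if (is_signaling_nan32(f)) {
        st->invalid_raised = true;
        if (!st->default_nan_mode) {
            nan = silence_nan_model(f, st);
        }
    }
    if (st->default_nan_mode) {
        nan = 0x7fc00000u;   /* assumed default NaN encoding */
    }
    return nan;
}
```

In this model, a signaling NaN in default NaN mode still raises the invalid
flag but goes straight to the default NaN, so the silencing function is never
reached with default_nan_mode set.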

Signed-off-by: Joe Komlodi <joe.komlodi@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1624662174-175828-2-git-send-email-joe.komlodi@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-a64.c | 12 +++++++++---
 target/arm/vfp_helper.c | 24 ++++++++++++++++++------
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(frecpx_f16)(uint32_t a, void *fpstp)
         float16 nan = a;
         if (float16_is_signaling_nan(a, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float16_silence_nan(a, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float16_silence_nan(a, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan = float16_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(frecpx_f32)(float32 a, void *fpstp)
         float32 nan = a;
         if (float32_is_signaling_nan(a, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float32_silence_nan(a, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float32_silence_nan(a, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan = float32_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(frecpx_f64)(float64 a, void *fpstp)
         float64 nan = a;
         if (float64_is_signaling_nan(a, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float64_silence_nan(a, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float64_silence_nan(a, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan = float64_default_nan(fpst);
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, void *fpstp)
         float16 nan = f16;
         if (float16_is_signaling_nan(f16, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float16_silence_nan(f16, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float16_silence_nan(f16, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan =  float16_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, void *fpstp)
         float32 nan = f32;
         if (float32_is_signaling_nan(f32, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float32_silence_nan(f32, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float32_silence_nan(f32, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan =  float32_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, void *fpstp)
         float64 nan = f64;
         if (float64_is_signaling_nan(f64, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float64_silence_nan(f64, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float64_silence_nan(f64, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan =  float64_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, void *fpstp)
         float16 nan = f16;
         if (float16_is_signaling_nan(f16, s)) {
             float_raise(float_flag_invalid, s);
-            nan = float16_silence_nan(f16, s);
+            if (!s->default_nan_mode) {
+                nan = float16_silence_nan(f16, s);
+            }
         }
         if (s->default_nan_mode) {
             nan =  float16_default_nan(s);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, void *fpstp)
         float32 nan = f32;
         if (float32_is_signaling_nan(f32, s)) {
             float_raise(float_flag_invalid, s);
-            nan = float32_silence_nan(f32, s);
+            if (!s->default_nan_mode) {
+                nan = float32_silence_nan(f32, s);
+            }
         }
         if (s->default_nan_mode) {
             nan =  float32_default_nan(s);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, void *fpstp)
         float64 nan = f64;
         if (float64_is_signaling_nan(f64, s)) {
             float_raise(float_flag_invalid, s);
-            nan = float64_silence_nan(f64, s);
+            if (!s->default_nan_mode) {
+                nan = float64_silence_nan(f64, s);
+            }
         }
         if (s->default_nan_mode) {
             nan =  float64_default_nan(s);
-- 
2.20.1

From: Maxim Uvarov <maxim.uvarov@linaro.org>

QEMU has two kinds of request functions: shutdown and reset (reboot).
The shutdown function must be used for a machine shutdown; otherwise we
cause a reset with a bogus "cause" value when we intended a shutdown.

Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210625111842.3790-3-maxim.uvarov@linaro.org
[PMM: tweaked commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/gpio/gpio_pwr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/gpio/gpio_pwr.c b/hw/gpio/gpio_pwr.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/gpio/gpio_pwr.c
+++ b/hw/gpio/gpio_pwr.c
@@ -XXX,XX +XXX,XX @@ static void gpio_pwr_reset(void *opaque, int n, int level)
 static void gpio_pwr_shutdown(void *opaque, int n, int level)
 {
     if (level) {
-        qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     }
 }
 
-- 
2.20.1

In do_ldst(), the calculation of the offset needs to be based on the
size of the memory access, not the size of the elements in the
vector.  This meant we were getting it wrong for the widening and
narrowing variants of the various VLDR and VSTR insns.
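
As an illustration of the difference, here is a minimal standalone sketch of
the offset calculation. It is not the QEMU function: ldst_offset_model() is a
hypothetical name, and msize is assumed to be the log2 of the memory access
size in bytes (0 for byte, 1 for halfword, 2 for word).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: byte offset for an immediate-offset load/store. */
static int32_t ldst_offset_model(uint32_t imm, unsigned msize, int add)
{
    assert(msize <= 2);   /* log2 of the memory access size in bytes */

    /* Scale by the size of the memory access, not the vector element. */
    int32_t offset = (int32_t)(imm << msize);

    return add ? offset : -offset;
}
```

For a widening byte load into word elements, the access size is one byte
(msize 0) even though the element size is a word, so an immediate of 4 should
give an offset of 4 bytes; scaling by the element size would give 16.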

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-2-peter.maydell@linaro.org
---
 target/arm/translate-mve.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool mve_skip_first_beat(DisasContext *s)
     }
 }
 
-static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
+static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn,
+                    unsigned msize)
 {
     TCGv_i32 addr;
     uint32_t offset;
@@ -XXX,XX +XXX,XX @@ static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
         return true;
     }
 
-    offset = a->imm << a->size;
+    offset = a->imm << msize;
     if (!a->a) {
         offset = -offset;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
         { gen_helper_mve_vstrw, gen_helper_mve_vldrw },
         { NULL, NULL }
     };
-    return do_ldst(s, a, ldstfns[a->size][a->l]);
+    return do_ldst(s, a, ldstfns[a->size][a->l], a->size);
 }
 
-#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST)                  \
+#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST, MSIZE)           \
     static bool trans_##OP(DisasContext *s, arg_VLDR_VSTR *a)   \
     {                                                           \
         static MVEGenLdStFn * const ldstfns[2][2] = {           \
             { gen_helper_mve_##ST, gen_helper_mve_##SLD },      \
             { NULL, gen_helper_mve_##ULD },                     \
         };                                                      \
-        return do_ldst(s, a, ldstfns[a->u][a->l]);              \
+        return do_ldst(s, a, ldstfns[a->u][a->l], MSIZE);       \
     }
 
-DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
-DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
-DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
+DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h, MO_8)
+DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w, MO_8)
+DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w, MO_16)
 
 static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
 {
-- 
2.20.1

The initial implementation of the MVE VRMLALDAVH and VRMLSLDAVH
insns had some bugs:
 * the 32x32 multiply of elements was being done as 32x32->32,
   not 32x32->64
 * we were incorrectly maintaining the accumulator in its full
   72-bit form across all 4 beats of the insn; in the pseudocode
   it is squashed back into the 64 bits of the RdaHi:RdaLo
   registers after each beat

In particular, fixing the second of these allows us to recast
the implementation to avoid 128-bit arithmetic entirely.

Since the element size here is always 4, we can also drop the
parameterization of ESIZE to make the code a little more readable.
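
The per-beat arithmetic can be sketched on its own as follows. This is a
simplified model under stated assumptions, not the helper itself: it covers
only the signed, non-exchanging, adding form, ignores predication, and assumes
that right-shifting a negative int64_t is an arithmetic shift (true for GCC
and Clang, implementation-defined in ISO C). The names round_shift8() and
accumulate_model() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Drop the low 8 bits of a product, rounding on bit 7. */
static int64_t round_shift8(int64_t product)
{
    return (product >> 8) + ((product >> 7) & 1);
}

/* Accumulate up to four rounded 32x32->64 products into a 64-bit value. */
static uint64_t accumulate_model(uint64_t acc, const int32_t *n,
                                 const int32_t *m, unsigned count)
{
    assert(count <= 4);   /* four 32-bit elements per 128-bit vector */

    for (unsigned e = 0; e < count; e++) {
        /* Widen before multiplying so the product is 32x32->64. */
        int64_t product = (int64_t)n[e] * m[e];

        /* Squash back into 64 bits after every element. */
        acc += (uint64_t)round_shift8(product);
    }
    return acc;
}
```

Because each product is rounded and shifted before it is added, the running
value never needs more than 64 bits, which is what removes the need for
128-bit arithmetic.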

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-3-peter.maydell@linaro.org
---
 target/arm/mve_helper.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/int128.h"
 #include "cpu.h"
 #include "internals.h"
 #include "vec_internal.h"
@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
 DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
 
 /*
- * Rounding multiply add long dual accumulate high: we must keep
- * a 72-bit internal accumulator value and return the top 64 bits.
+ * Rounding multiply add long dual accumulate high. In the pseudocode
+ * this is implemented with a 72-bit internal accumulator value of which
+ * the top 64 bits are returned. We optimize this to avoid having to
+ * use 128-bit arithmetic -- we can do this because the 72-bit accumulator
+ * is squashed back into 64 bits after each beat.
  */
-#define DO_LDAVH(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC, TO128)         \
+#define DO_LDAVH(OP, TYPE, LTYPE, XCHG, SUB)                            \
     uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
                                     void *vm, uint64_t a)               \
     {                                                                   \
         uint16_t mask = mve_element_mask(env);                          \
         unsigned e;                                                     \
         TYPE *n = vn, *m = vm;                                          \
-        Int128 acc = int128_lshift(TO128(a), 8);                        \
-        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
             if (mask & 1) {                                             \
+                LTYPE mul;                                              \
                 if (e & 1) {                                            \
-                    acc = ODDACC(acc, TO128(n[H##ESIZE(e - 1 * XCHG)] * \
-                                            m[H##ESIZE(e)]));           \
+                    mul = (LTYPE)n[H4(e - 1 * XCHG)] * m[H4(e)];        \
+                    if (SUB) {                                          \
+                        mul = -mul;                                     \
+                    }                                                   \
                 } else {                                                \
-                    acc = EVENACC(acc, TO128(n[H##ESIZE(e + 1 * XCHG)] * \
-                                             m[H##ESIZE(e)]));          \
+                    mul = (LTYPE)n[H4(e + 1 * XCHG)] * m[H4(e)];        \
                 }                                                       \
-                acc = int128_add(acc, int128_make64(1 << 7));           \
+                mul = (mul >> 8) + ((mul >> 7) & 1);                    \
+                a += mul;                                               \
             }                                                           \
         }                                                               \
         mve_advance_vpt(env);                                           \
-        return int128_getlo(int128_rshift(acc, 8));                     \
+        return a;                                                       \
     }
 
-DO_LDAVH(vrmlaldavhsw, 4, int32_t, false, int128_add, int128_add, int128_makes64)
-DO_LDAVH(vrmlaldavhxsw, 4, int32_t, true, int128_add, int128_add, int128_makes64)
+DO_LDAVH(vrmlaldavhsw, int32_t, int64_t, false, false)
+DO_LDAVH(vrmlaldavhxsw, int32_t, int64_t, true, false)
 
-DO_LDAVH(vrmlaldavhuw, 4, uint32_t, false, int128_add, int128_add, int128_make64)
+DO_LDAVH(vrmlaldavhuw, uint32_t, uint64_t, false, false)
 
-DO_LDAVH(vrmlsldavhsw, 4, int32_t, false, int128_add, int128_sub, int128_makes64)
-DO_LDAVH(vrmlsldavhxsw, 4, int32_t, true, int128_add, int128_sub, int128_makes64)
+DO_LDAVH(vrmlsldavhsw, int32_t, int64_t, false, true)
+DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
 
 /* Vector add across vector */
 #define DO_VADDV(OP, ESIZE, TYPE)                               \
-- 
2.20.1

The function asimd_imm_const() in translate-neon.c is an
implementation of the pseudocode AdvSIMDExpandImm(), which we will
also want for MVE.  Move the implementation to translate.c, with a
prototype in translate.h.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-4-peter.maydell@linaro.org
---
 target/arm/translate.h      | 16 ++++++++++
 target/arm/translate-neon.c | 63 -------------------------------------
 target/arm/translate.c      | 57 +++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+), 63 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
     return opc | s->be_data;
 }
 
+/**
+ * asimd_imm_const: Expand an encoded SIMD constant value
+ *
+ * Expand a SIMD constant value. This is essentially the pseudocode
+ * AdvSIMDExpandImm, except that we also perform the boolean NOT needed for
+ * VMVN and VBIC (when cmode < 14 && op == 1).
+ *
+ * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
+ * callers must catch this.
+ *
+ * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
+ * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
+ * we produce an immediate constant value of 0 in these cases.
+ */
+uint64_t asimd_imm_const(uint32_t imm, int cmode, int op);
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ DO_FP_2SH(VCVT_UH, gen_helper_gvec_vcvt_uh)
 DO_FP_2SH(VCVT_HS, gen_helper_gvec_vcvt_hs)
 DO_FP_2SH(VCVT_HU, gen_helper_gvec_vcvt_hu)
 
-static uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
-{
-    /*
-     * Expand the encoded constant.
-     * Note that cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 is UNPREDICTABLE.
-     * We choose to not special-case this and will behave as if a
-     * valid constant encoding of 0 had been given.
-     * cmode = 15 op = 1 must UNDEF; we assume decode has handled that.
-     */
-    switch (cmode) {
-    case 0: case 1:
-        /* no-op */
-        break;
-    case 2: case 3:
-        imm <<= 8;
-        break;
-    case 4: case 5:
-        imm <<= 16;
-        break;
-    case 6: case 7:
-        imm <<= 24;
-        break;
-    case 8: case 9:
-        imm |= imm << 16;
-        break;
-    case 10: case 11:
-        imm = (imm << 8) | (imm << 24);
-        break;
-    case 12:
-        imm = (imm << 8) | 0xff;
-        break;
-    case 13:
-        imm = (imm << 16) | 0xffff;
-        break;
-    case 14:
-        if (op) {
-            /*
-             * This is the only case where the top and bottom 32 bits
-             * of the encoded constant differ.
-             */
-            uint64_t imm64 = 0;
-            int n;
-
-            for (n = 0; n < 8; n++) {
-                if (imm & (1 << n)) {
-                    imm64 |= (0xffULL << (n * 8));
-                }
-            }
-            return imm64;
-        }
-        imm |= (imm << 8) | (imm << 16) | (imm << 24);
-        break;
-    case 15:
-        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
-            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
-        break;
-    }
-    if (op) {
-        imm = ~imm;
-    }
-    return dup_const(MO_32, imm);
-}
-
 static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
                         GVecGen2iFn *fn)
 {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ void arm_translate_init(void)
     a64_translate_init();
 }
 
+uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
+{
+    /* Expand the encoded constant as per AdvSIMDExpandImm pseudocode */
+    switch (cmode) {
+    case 0: case 1:
+        /* no-op */
+        break;
+    case 2: case 3:
+        imm <<= 8;
+        break;
+    case 4: case 5:
+        imm <<= 16;
+        break;
+    case 6: case 7:
+        imm <<= 24;
+        break;
+    case 8: case 9:
+        imm |= imm << 16;
+        break;
+    case 10: case 11:
+        imm = (imm << 8) | (imm << 24);
+        break;
+    case 12:
+        imm = (imm << 8) | 0xff;
+        break;
+    case 13:
+        imm = (imm << 16) | 0xffff;
+        break;
+    case 14:
+        if (op) {
+            /*
+             * This is the only case where the top and bottom 32 bits
+             * of the encoded constant differ.
+             */
+            uint64_t imm64 = 0;
+            int n;
+
+            for (n = 0; n < 8; n++) {
+                if (imm & (1 << n)) {
+                    imm64 |= (0xffULL << (n * 8));
+                }
+            }
+            return imm64;
+        }
+        imm |= (imm << 8) | (imm << 16) | (imm << 24);
+        break;
+    case 15:
+        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
+            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
+        break;
+    }
+    if (op) {
+        imm = ~imm;
+    }
+    return dup_const(MO_32, imm);
+}
+
 /* Generate a label used for skipping this instruction */
 void arm_gen_condlabel(DisasContext *s)
 {
-- 
2.20.1

The A64 AdvSIMD modified-immediate grouping uses almost the same
constant encoding that A32 Neon does; reuse asimd_imm_const() (to
which we add the AArch64-specific case for cmode 15 op 1) instead of
reimplementing it all.
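
For reference, the AArch64-specific cmode 15 op 1 case expands the 8-bit
immediate into a 64-bit floating-point constant. The sketch below restates
that expansion as a standalone function so it can be checked by hand;
expand_fp64_imm_model() is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the cmode == 15, op == 1 expansion (AArch64 only). */
static uint64_t expand_fp64_imm_model(uint32_t imm8)
{
    assert(imm8 <= 0xff);

    /* Low six bits become the top of the fraction and low exponent bits. */
    uint64_t imm64 = (uint64_t)(imm8 & 0x3f) << 48;

    if (imm8 & 0x80) {
        imm64 |= 0x8000000000000000ULL;   /* sign bit */
    }
    if (imm8 & 0x40) {
        imm64 |= 0x3fc0000000000000ULL;   /* exponent bits 61:54 set */
    } else {
        imm64 |= 0x4000000000000000ULL;   /* exponent bit 62 set */
    }
    return imm64;
}
```

With this model, an immediate of 0x70 yields 0x3ff0000000000000, the
double-precision encoding of 1.0, and 0x00 yields 0x4000000000000000, which
is 2.0.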

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-5-peter.maydell@linaro.org
---
 target/arm/translate.h     |  3 +-
 target/arm/translate-a64.c | 86 ++++----------------------------------
 target/arm/translate.c     | 17 +++++++-
 3 files changed, 24 insertions(+), 82 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
  * VMVN and VBIC (when cmode < 14 && op == 1).
  *
  * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
- * callers must catch this.
+ * callers must catch this; we return the 64-bit constant value defined
+ * for AArch64.
  *
  * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
  * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
 {
     int rd = extract32(insn, 0, 5);
     int cmode = extract32(insn, 12, 4);
-    int cmode_3_1 = extract32(cmode, 1, 3);
-    int cmode_0 = extract32(cmode, 0, 1);
     int o2 = extract32(insn, 11, 1);
     uint64_t abcdefgh = extract32(insn, 5, 5) | (extract32(insn, 16, 3) << 5);
     bool is_neg = extract32(insn, 29, 1);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
         return;
     }
 
-    /* See AdvSIMDExpandImm() in ARM ARM */
-    switch (cmode_3_1) {
-    case 0: /* Replicate(Zeros(24):imm8, 2) */
-    case 1: /* Replicate(Zeros(16):imm8:Zeros(8), 2) */
-    case 2: /* Replicate(Zeros(8):imm8:Zeros(16), 2) */
-    case 3: /* Replicate(imm8:Zeros(24), 2) */
-    {
-        int shift = cmode_3_1 * 8;
-        imm = bitfield_replicate(abcdefgh << shift, 32);
-        break;
-    }
-    case 4: /* Replicate(Zeros(8):imm8, 4) */
-    case 5: /* Replicate(imm8:Zeros(8), 4) */
-    {
-        int shift = (cmode_3_1 & 0x1) * 8;
-        imm = bitfield_replicate(abcdefgh << shift, 16);
-        break;
-    }
-    case 6:
-        if (cmode_0) {
-            /* Replicate(Zeros(8):imm8:Ones(16), 2) */
-            imm = (abcdefgh << 16) | 0xffff;
-        } else {
-            /* Replicate(Zeros(16):imm8:Ones(8), 2) */
-            imm = (abcdefgh << 8) | 0xff;
-        }
-        imm = bitfield_replicate(imm, 32);
-        break;
-    case 7:
-        if (!cmode_0 && !is_neg) {
-            imm = bitfield_replicate(abcdefgh, 8);
-        } else if (!cmode_0 && is_neg) {
-            int i;
-            imm = 0;
-            for (i = 0; i < 8; i++) {
-                if ((abcdefgh) & (1 << i)) {
-                    imm |= 0xffULL << (i * 8);
-                }
-            }
-        } else if (cmode_0) {
-            if (is_neg) {
-                imm = (abcdefgh & 0x3f) << 48;
-                if (abcdefgh & 0x80) {
-                    imm |= 0x8000000000000000ULL;
-                }
-                if (abcdefgh & 0x40) {
-                    imm |= 0x3fc0000000000000ULL;
-                } else {
-                    imm |= 0x4000000000000000ULL;
-                }
-            } else {
-                if (o2) {
-                    /* FMOV (vector, immediate) - half-precision */
-                    imm = vfp_expand_imm(MO_16, abcdefgh);
-                    /* now duplicate across the lanes */
-                    imm = bitfield_replicate(imm, 16);
-                } else {
-                    imm = (abcdefgh & 0x3f) << 19;
-                    if (abcdefgh & 0x80) {
-                        imm |= 0x80000000;
-                    }
-                    if (abcdefgh & 0x40) {
-                        imm |= 0x3e000000;
-                    } else {
-                        imm |= 0x40000000;
-                    }
-                    imm |= (imm << 32);
-                }
-            }
-        }
-        break;
-    default:
-        g_assert_not_reached();
-    }
-
-    if (cmode_3_1 != 7 && is_neg) {
-        imm = ~imm;
+    if (cmode == 15 && o2 && !is_neg) {
+        /* FMOV (vector, immediate) - half-precision */
+        imm = vfp_expand_imm(MO_16, abcdefgh);
+        /* now duplicate across the lanes */
+        imm = bitfield_replicate(imm, 16);
+    } else {
+        imm = asimd_imm_const(abcdefgh, cmode, is_neg);
     }
 
     if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
     case 14:
         if (op) {
             /*
-             * This is the only case where the top and bottom 32 bits
-             * of the encoded constant differ.
+             * This and cmode == 15 op == 1 are the only cases where
+             * the top and bottom 32 bits of the encoded constant differ.
              */
             uint64_t imm64 = 0;
             int n;
@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
         imm |= (imm << 8) | (imm << 16) | (imm << 24);
         break;
     case 15:
+        if (op) {
+            /* Reserved encoding for AArch32; valid for AArch64 */
+            uint64_t imm64 = (uint64_t)(imm & 0x3f) << 48;
+            if (imm & 0x80) {
+                imm64 |= 0x8000000000000000ULL;
+            }
+            if (imm & 0x40) {
+                imm64 |= 0x3fc0000000000000ULL;
+            } else {
+                imm64 |= 0x4000000000000000ULL;
+            }
+            return imm64;
+        }
         imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
             | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
         break;
-- 
2.20.1

Use dup_const() instead of bitfield_replicate() in
disas_simd_mod_imm().

(We can't replace the other use of bitfield_replicate() in this file,
in logic_imm_decode_wmask(), because that location needs to handle 2
and 4 bit elements, which dup_const() cannot.)
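
What the replication step computes can be shown with a small standalone
model. This is only an illustration of the operation for 8, 16 and 32 bit
elements; it is not the implementation of dup_const() or
bitfield_replicate(), and replicate_model() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: copy one element into every lane of a 64-bit value. */
static uint64_t replicate_model(uint64_t value, unsigned bits)
{
    assert(bits == 8 || bits == 16 || bits == 32);

    uint64_t mask = (1ULL << bits) - 1;
    uint64_t result = 0;

    for (unsigned pos = 0; pos < 64; pos += bits) {
        result |= (value & mask) << pos;
    }
    return result;
}
```

For example, replicating the half-precision encoding of 1.0 (0x3c00) across
16-bit lanes gives 0x3c003c003c003c00.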

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-6-peter.maydell@linaro.org
---
 target/arm/translate-a64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
         /* FMOV (vector, immediate) - half-precision */
         imm = vfp_expand_imm(MO_16, abcdefgh);
         /* now duplicate across the lanes */
-        imm = bitfield_replicate(imm, 16);
+        imm = dup_const(MO_16, imm);
     } else {
         imm = asimd_imm_const(abcdefgh, cmode, is_neg);
     }
-- 
2.20.1

Implement the MVE logical-immediate insns (VMOV, VMVN,
VORR and VBIC). These have essentially the same encoding
as their Neon equivalents, and we implement the decode
in the same way.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-7-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  4 +++
 target/arm/mve.decode      | 17 +++++++++++++
 target/arm/mve_helper.c    | 24 ++++++++++++++++++
 target/arm/translate-mve.c | 50 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 95 insertions(+)

Implement the MVE shift-vector-left-by-immediate insns VSHL, VQSHL
and VQSHLU.

The size-and-immediate encoding here is the same as Neon, and we
handle it the same way neon-dp.decode does.
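
As a rough illustration of that encoding, the sketch below decodes a combined
6-bit size-and-immediate field for the left-shift forms. The byte (001xxx)
and halfword (01xxxx) patterns follow the decode formats in this patch; the
word pattern (1xxxxx) is an assumption here, based on the Neon convention.
The names shl_imm_model and decode_shl_imm_model() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned size;    /* 0 = byte, 1 = halfword, 2 = word */
    unsigned shift;   /* left shift count */
} shl_imm_model;

/* Hypothetical decoder; returns false for patterns with no element size. */
static bool decode_shl_imm_model(uint32_t imm6, shl_imm_model *out)
{
    assert(imm6 < 64);

    if (imm6 & 0x20) {            /* 1xxxxx: word (assumed pattern) */
        out->size = 2;
        out->shift = imm6 & 0x1f;
    } else if (imm6 & 0x10) {     /* 01xxxx: halfword */
        out->size = 1;
        out->shift = imm6 & 0x0f;
    } else if (imm6 & 0x08) {     /* 001xxx: byte */
        out->size = 0;
        out->shift = imm6 & 0x07;
    } else {
        return false;             /* 000xxx: not a shift encoding */
    }
    return true;
}
```

The position of the highest set bit selects the element size, and the bits
below it give the shift count, so the valid shift range grows with the
element width.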

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-8-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 16 +++++++++++
 target/arm/mve.decode      | 23 +++++++++++++++
 target/arm/mve_helper.c    | 57 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 51 ++++++++++++++++++++++++++++++++++
 4 files changed, 147 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
 DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
 DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
+
+DEF_HELPER_FLAGS_4(mve_vshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshlui_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshlui_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshlui_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
 &2op qd qm qn size
 &2scalar qd qn rm size
 &1imm qd imm cmode op
+&2shift qd qm shift size
 
 @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
 @2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 @2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 
+@2_shl_b .... .... .. 001 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
+@2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
+@2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
+
 # Vector loads and stores
 
 # Widening loads and narrowing stores:
@@ -XXX,XX +XXX,XX @@ VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 # So we have a single decode line and check the cmode/op in the
 # trans function.
 Vimm_1r 111 . 1111 1 . 00 0 ... ... 0 .... 0 1 . 1 .... @1imm
+
+# Shifts by immediate
+
+VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
+VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
+VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
+
+VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
+VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
+VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
+
+VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
+VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
+VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
+
+VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_b
+VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_h
+VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_w
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
     WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, true, satp)
 #define DO_UQRSHL_OP(N, M, satp) \
     WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, true, satp)
+#define DO_SUQSHL_OP(N, M, satp) \
+    WRAP_QRSHL_HELPER(do_suqrshl_bhs, N, M, false, satp)
 
 DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
 DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvsw, 4, uint32_t)
 DO_VADDV(vaddvub, 1, uint8_t)
 DO_VADDV(vaddvuh, 2, uint16_t)
 DO_VADDV(vaddvuw, 4, uint32_t)
+
+/* Shifts by immediate */
+#define DO_2SHIFT(OP, ESIZE, TYPE, FN)                          \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
+                                void *vm, uint32_t shift)       \
+    {                                                           \
+        TYPE *d = vd, *m = vm;                                  \
+        uint16_t mask = mve_element_mask(env);                  \
+        unsigned e;                                             \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
+            mergemask(&d[H##ESIZE(e)],                          \
+                      FN(m[H##ESIZE(e)], shift), mask);         \
+        }                                                       \
+        mve_advance_vpt(env);                                   \
+    }
+
+#define DO_2SHIFT_SAT(OP, ESIZE, TYPE, FN)                      \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
+                                void *vm, uint32_t shift)       \
+    {                                                           \
+        TYPE *d = vd, *m = vm;                                  \
+        uint16_t mask = mve_element_mask(env);                  \
+        unsigned e;                                             \
+        bool qc = false;                                        \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
+            bool sat = false;                                   \
+            mergemask(&d[H##ESIZE(e)],                          \
+                      FN(m[H##ESIZE(e)], shift, &sat), mask);   \
+            qc |= sat & mask & 1;                               \
+        }                                                       \
+        if (qc) {                                               \
+            env->vfp.qc[0] = qc;                                \
+        }                                                       \
+        mve_advance_vpt(env);                                   \
+    }
+
+/* provide unsigned 2-op shift helpers for all sizes */
+#define DO_2SHIFT_U(OP, FN)                     \
+    DO_2SHIFT(OP##b, 1, uint8_t, FN)            \
+    DO_2SHIFT(OP##h, 2, uint16_t, FN)           \
+    DO_2SHIFT(OP##w, 4, uint32_t, FN)
+
+#define DO_2SHIFT_SAT_U(OP, FN)                 \
+    DO_2SHIFT_SAT(OP##b, 1, uint8_t, FN)        \
+    DO_2SHIFT_SAT(OP##h, 2, uint16_t, FN)       \
+    DO_2SHIFT_SAT(OP##w, 4, uint32_t, FN)
+#define DO_2SHIFT_SAT_S(OP, FN)                 \
+    DO_2SHIFT_SAT(OP##b, 1, int8_t, FN)         \
+    DO_2SHIFT_SAT(OP##h, 2, int16_t, FN)        \
+    DO_2SHIFT_SAT(OP##w, 4, int32_t, FN)
+
+DO_2SHIFT_U(vshli_u, DO_VSHLU)
+DO_2SHIFT_SAT_U(vqshli_u, DO_UQSHL_OP)
+DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
+DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
 typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1imm *a)
     }
     return do_1imm(s, a, fn);
 }
+
+static bool do_2shift(DisasContext *s, arg_2shift *a, MVEGenTwoOpShiftFn fn,
+                      bool negateshift)
+{
+    TCGv_ptr qd, qm;
+    int shift = a->shift;
+
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !mve_check_qreg_bank(s, a->qd | a->qm) ||
+        !fn) {
+        return false;
+    }
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * When we handle a right shift insn using a left-shift helper
+     * which permits a negative shift count to indicate a right-shift,
+     * we must negate the shift count.
+     */
+    if (negateshift) {
+        shift = -shift;
+    }
+
+    qd = mve_qreg_ptr(a->qd);
+    qm = mve_qreg_ptr(a->qm);
+    fn(cpu_env, qd, qm, tcg_constant_i32(shift));
+    tcg_temp_free_ptr(qd);
+    tcg_temp_free_ptr(qm);
+    mve_update_eci(s);
+    return true;
+}
+
+#define DO_2SHIFT(INSN, FN, NEGATESHIFT)                         \
+    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
+    {                                                           \
+        static MVEGenTwoOpShiftFn * const fns[] = {             \
+            gen_helper_mve_##FN##b,                             \
+            gen_helper_mve_##FN##h,                             \
+            gen_helper_mve_##FN##w,                             \
+            NULL,                                               \
+        };                                                      \
+        return do_2shift(s, a, fns[a->size], NEGATESHIFT);      \
+    }
+
+DO_2SHIFT(VSHLI, vshli_u, false)
+DO_2SHIFT(VQSHLI_S, vqshli_s, false)
+DO_2SHIFT(VQSHLI_U, vqshli_u, false)
+DO_2SHIFT(VQSHLUI, vqshlui_s, false)
-- 
2.20.1

Implement the MVE vector shift right by immediate insns VSHRI and
VRSHRI.  As with Neon, we implement these by using helper functions
which perform left shifts but allow negative shift counts to indicate
right shifts.
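
A minimal sketch of the idea, for a single unsigned 32-bit element
and ignoring predication. The function names are hypothetical and
this is not the helper code from the series; it only shows how a
right shift by immediate can be expressed as a left shift by a
negated count.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift left for shift >= 0, logical shift right for shift < 0.
 * Counts of 32 or more in either direction shift every bit out.
 */
static uint32_t shl_or_shr_u32(uint32_t x, int shift)
{
    assert(shift > -64 && shift < 64);
    if (shift >= 32 || shift <= -32) {
        return 0;
    }
    if (shift < 0) {
        return x >> -shift;
    }
    return x << shift;
}

/* Right shift by immediate, built on the left-shift helper. */
static uint32_t shr_imm_u32(uint32_t x, unsigned shift)
{
    assert(shift >= 1 && shift <= 32);
    return shl_or_shr_u32(x, -(int)shift);
}
```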

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-9-peter.maydell@linaro.org
---
 target/arm/helper-mve.h     | 12 ++++++++++++
 target/arm/translate.h      | 20 ++++++++++++++++++++
 target/arm/mve.decode       | 28 ++++++++++++++++++++++++++++
 target/arm/mve_helper.c     |  7 +++++++
 target/arm/translate-mve.c  |  5 +++++
 target/arm/translate-neon.c | 18 ------------------
 6 files changed, 72 insertions(+), 18 deletions(-)

Implement the MVE VSHLL (vector shift left long) insn.  This has two
encodings: the T1 encoding is the usual shift-by-immediate format,
and the T2 encoding is a special case where the shift count is always
equal to the element size.
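
For illustration only, the widening operation on 8-bit inputs could
be sketched as below. This assumes plain little-endian element order
and no predication, and vshll_u8() is a hypothetical name; the real
helper in this patch handles host endianness and the VPT mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Widen the bottom (even) or top (odd) 8-bit elements of m to
 * 16 bits and shift each left. shift == 8 corresponds to the T2
 * case where the count equals the element size.
 */
static void vshll_u8(uint16_t d[8], const uint8_t m[16],
                     unsigned shift, bool top)
{
    unsigned e;

    assert(shift <= 8);
    for (e = 0; e < 8; e++) {
        d[e] = (uint16_t)((uint16_t)m[2 * e + top] << shift);
    }
}
```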

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-10-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  9 +++++++
 target/arm/mve.decode      | 53 +++++++++++++++++++++++++++++++++++---
 target/arm/mve_helper.c    | 32 +++++++++++++++++++++++
 target/arm/translate-mve.c | 15 +++++++++++
 4 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vshllbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshllbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshllbub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshllbuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
 @2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
 @2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
 
+@2_shll_b .... .... ... 01 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
+@2_shll_h .... .... ... 1  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
+# VSHLL encoding T2 where shift == esize
+@2_shll_esize_b .... .... .... 00 .. .... .... .... .... &2shift \
+                qd=%qd qm=%qm size=0 shift=8
+@2_shll_esize_h .... .... .... 01 .. .... .... .... .... &2shift \
+                qd=%qd qm=%qm size=1 shift=16
+
 # Right shifts are encoded as N - shift, where N is the element size in bits.
 %rshift_i5  16:5 !function=rsub_32
 %rshift_i4  16:4 !function=rsub_16
@@ -XXX,XX +XXX,XX @@ VADD             1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
 VSUB             1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
 VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
 
-VMULH_S          111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
-VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+# The VSHLL T2 encoding is not a @2op pattern, but is here because it
+# overlaps what would be size=0b11 VMULH/VRMULH
+{
+  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
 
-VRMULH_S         111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
-VRMULH_U         111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+  VMULH_S        111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+}
+
+{
+  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
+
+  VMULH_U        111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+}
+
+{
+  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
+
+  VRMULH_S       111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+}
+
+{
+  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
+
+  VRMULH_U       111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+}
 
 VMAX_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
 VMAX_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
@@ -XXX,XX +XXX,XX @@ VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
 VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
 VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
+
+# VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
+VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
+
+VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
+
+VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
+
+VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
 DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
 DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
 DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
+
+/*
+ * Long shifts taking half-sized inputs from top or bottom of the input
+ * vector and producing a double-width result. ESIZE, TYPE are for
+ * the input, and LESIZE, LTYPE for the output.
+ * Unlike the normal shift helpers, we do not handle negative shift counts,
+ * because the long shift is strictly left-only.
+ */
+#define DO_VSHLL(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE)                   \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,             \
+                                void *vm, uint32_t shift)               \
+    {                                                                   \
+        LTYPE *d = vd;                                                  \
+        TYPE *m = vm;                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned le;                                                    \
+        assert(shift <= 16);                                            \
+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
+            LTYPE r = (LTYPE)m[H##ESIZE(le * 2 + TOP)] << shift;        \
+            mergemask(&d[H##LESIZE(le)], r, mask);                      \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+#define DO_VSHLL_ALL(OP, TOP)                                \
+    DO_VSHLL(OP##sb, TOP, 1, int8_t, 2, int16_t)             \
+    DO_VSHLL(OP##ub, TOP, 1, uint8_t, 2, uint16_t)           \
+    DO_VSHLL(OP##sh, TOP, 2, int16_t, 4, int32_t)            \
+    DO_VSHLL(OP##uh, TOP, 2, uint16_t, 4, uint32_t)          \
+
+DO_VSHLL_ALL(vshllb, false)
+DO_VSHLL_ALL(vshllt, true)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHRI_S, vshli_s, true)
 DO_2SHIFT(VSHRI_U, vshli_u, true)
 DO_2SHIFT(VRSHRI_S, vrshli_s, true)
 DO_2SHIFT(VRSHRI_U, vrshli_u, true)
+
+#define DO_VSHLL(INSN, FN)                                      \
+    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
+    {                                                           \
+        static MVEGenTwoOpShiftFn * const fns[] = {             \
+            gen_helper_mve_##FN##b,                             \
+            gen_helper_mve_##FN##h,                             \
+        };                                                      \
+        return do_2shift(s, a, fns[a->size], false);            \
+    }
+
+DO_VSHLL(VSHLL_BS, vshllbs)
+DO_VSHLL(VSHLL_BU, vshllbu)
+DO_VSHLL(VSHLL_TS, vshllts)
+DO_VSHLL(VSHLL_TU, vshlltu)
-- 
2.20.1

Implement the MVE VSRI and VSLI insns, which perform a
shift-and-insert operation.
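
As a sketch of what shift-and-insert means for one 32-bit element
(hypothetical function names, no predication): the source is shifted
and written into the destination, while the destination bits that the
shift does not cover are preserved.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left and insert: the low 'shift' bits of d are kept. */
static uint32_t sli32(uint32_t d, uint32_t m, unsigned shift)
{
    uint32_t inserted;

    assert(shift <= 31);
    inserted = (uint32_t)0xffffffffu << shift;
    return (d & ~inserted) | (m << shift);
}

/* Shift right and insert: the high 'shift' bits of d are kept. */
static uint32_t sri32(uint32_t d, uint32_t m, unsigned shift)
{
    uint32_t inserted;

    assert(shift >= 1 && shift <= 32);
    if (shift == 32) {
        return d;   /* nothing from m survives the shift */
    }
    inserted = (uint32_t)0xffffffffu >> shift;
    return (d & ~inserted) | (m >> shift);
}
```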

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-11-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  9 ++++++++
 target/arm/mve_helper.c    | 42 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  3 +++
 4 files changed, 62 insertions(+)

Implement the MVE shift-right-and-narrow insns VSHRN and VRSHRN.

do_urshr() is borrowed from sve_helper.c.
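
A simplified illustration of a rounding right shift followed by
narrowing, restricted to 1 <= sh < 64. This is a sketch with
hypothetical names (urshr, rshrn_u16), not a copy of do_urshr().
Note that the non-saturating narrow simply truncates.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned right shift that rounds to nearest: adds the last bit shifted out. */
static uint64_t urshr(uint64_t x, unsigned sh)
{
    assert(sh >= 1 && sh < 64);
    return (x >> sh) + ((x >> (sh - 1)) & 1);
}

/* Rounding shift right of a 16-bit element, narrowed to 8 bits. */
static uint8_t rshrn_u16(uint16_t x, unsigned sh)
{
    assert(sh >= 1 && sh <= 8);
    return (uint8_t)urshr(x, sh);
}
```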

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-12-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 10 ++++++++++
 target/arm/mve.decode      | 11 +++++++++++
 target/arm/mve_helper.c    | 40 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 15 ++++++++++++++
 4 files changed, 76 insertions(+)

Implement the MVE saturating shift-right-and-narrow insns
VQSHRN, VQSHRUN, VQRSHRN and VQRSHRUN.

do_srshr() is borrowed from sve_helper.c.
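
A small worked sketch of the saturating narrow for 16-bit to 8-bit
elements, under the assumption that right-shifting a negative value
is an arithmetic shift (as QEMU itself assumes). The names are
hypothetical; the sat flag is sticky, in the style of do_sat_bhs().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clamp v to [min, max], setting *sat if clamping was needed. */
static int32_t clamp_sat(int32_t v, int32_t min, int32_t max, bool *sat)
{
    if (v > max) {
        *sat = true;
        return max;
    }
    if (v < min) {
        *sat = true;
        return min;
    }
    return v;
}

/* Signed input, signed saturated result (VQSHRN-like, one element). */
static int8_t sqshrn_s16(int16_t x, unsigned sh, bool *sat)
{
    assert(sh >= 1 && sh <= 8);
    return (int8_t)clamp_sat(x >> sh, INT8_MIN, INT8_MAX, sat);
}

/* Signed input, unsigned saturated result (VQSHRUN-like, one element). */
static uint8_t sqshrun_s16(int16_t x, unsigned sh, bool *sat)
{
    assert(sh >= 1 && sh <= 8);
    return (uint8_t)clamp_sat(x >> sh, 0, UINT8_MAX, sat);
}
```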

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-13-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  30 +++++++++++
 target/arm/mve.decode      |  28 ++++++++++
 target/arm/mve_helper.c    | 104 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  12 +++++
 4 files changed, 174 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
 VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
 VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
 VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
+
+VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
+VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
+VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
+VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
+
+VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
+VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
+VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
+VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
+
+VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
+VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
+VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
+VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
+
+VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
+VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
+VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
+VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint64_t do_urshr(uint64_t x, unsigned sh)
     }
 }
 
+static inline int64_t do_srshr(int64_t x, unsigned sh)
+{
+    if (likely(sh < 64)) {
+        return (x >> sh) + ((x >> (sh - 1)) & 1);
+    } else {
+        /* Rounding the sign bit always produces 0. */
+        return 0;
+    }
+}
+
 DO_VSHRN_ALL(vshrn, DO_SHR)
 DO_VSHRN_ALL(vrshrn, do_urshr)
+
+static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
+                                 bool *satp)
+{
+    if (val > max) {
+        *satp = true;
+        return max;
+    } else if (val < min) {
+        *satp = true;
+        return min;
+    } else {
+        return val;
+    }
+}
+
+/* Saturating narrowing right shifts */
+#define DO_VSHRN_SAT(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)   \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
+                                void *vm, uint32_t shift)       \
+    {                                                           \
+        LTYPE *m = vm;                                          \
+        TYPE *d = vd;                                           \
+        uint16_t mask = mve_element_mask(env);                  \
+        bool qc = false;                                        \
+        unsigned le;                                            \
+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
+            bool sat = false;                                   \
+            TYPE r = FN(m[H##LESIZE(le)], shift, &sat);         \
+            mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
+            qc |= sat && (mask & 1 << (TOP * ESIZE));           \
+        }                                                       \
+        if (qc) {                                               \
+            env->vfp.qc[0] = qc;                                \
+        }                                                       \
+        mve_advance_vpt(env);                                   \
+    }
+
+#define DO_VSHRN_SAT_UB(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 1, uint8_t, 2, uint16_t, FN)       \
+    DO_VSHRN_SAT(TOP, true, 1, uint8_t, 2, uint16_t, FN)
+
+#define DO_VSHRN_SAT_UH(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 2, uint16_t, 4, uint32_t, FN)      \
+    DO_VSHRN_SAT(TOP, true, 2, uint16_t, 4, uint32_t, FN)
+
+#define DO_VSHRN_SAT_SB(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 1, int8_t, 2, int16_t, FN)         \
+    DO_VSHRN_SAT(TOP, true, 1, int8_t, 2, int16_t, FN)
+
+#define DO_VSHRN_SAT_SH(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 2, int16_t, 4, int32_t, FN)        \
+    DO_VSHRN_SAT(TOP, true, 2, int16_t, 4, int32_t, FN)
+
+#define DO_SHRN_SB(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), INT8_MIN, INT8_MAX, SATP)
+#define DO_SHRN_UB(N, M, SATP)                                  \
+    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT8_MAX, SATP)
+#define DO_SHRUN_B(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), 0, UINT8_MAX, SATP)
+
+#define DO_SHRN_SH(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), INT16_MIN, INT16_MAX, SATP)
+#define DO_SHRN_UH(N, M, SATP)                                  \
+    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT16_MAX, SATP)
+#define DO_SHRUN_H(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), 0, UINT16_MAX, SATP)
+
+#define DO_RSHRN_SB(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), INT8_MIN, INT8_MAX, SATP)
+#define DO_RSHRN_UB(N, M, SATP)                                 \
+    do_sat_bhs(do_urshr(N, M), 0, UINT8_MAX, SATP)
+#define DO_RSHRUN_B(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), 0, UINT8_MAX, SATP)
+
+#define DO_RSHRN_SH(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), INT16_MIN, INT16_MAX, SATP)
+#define DO_RSHRN_UH(N, M, SATP)                                 \
+    do_sat_bhs(do_urshr(N, M), 0, UINT16_MAX, SATP)
+#define DO_RSHRUN_H(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), 0, UINT16_MAX, SATP)
+
+DO_VSHRN_SAT_SB(vqshrnb_sb, vqshrnt_sb, DO_SHRN_SB)
+DO_VSHRN_SAT_SH(vqshrnb_sh, vqshrnt_sh, DO_SHRN_SH)
+DO_VSHRN_SAT_UB(vqshrnb_ub, vqshrnt_ub, DO_SHRN_UB)
+DO_VSHRN_SAT_UH(vqshrnb_uh, vqshrnt_uh, DO_SHRN_UH)
+DO_VSHRN_SAT_SB(vqshrunbb, vqshruntb, DO_SHRUN_B)
+DO_VSHRN_SAT_SH(vqshrunbh, vqshrunth, DO_SHRUN_H)
+
+DO_VSHRN_SAT_SB(vqrshrnb_sb, vqrshrnt_sb, DO_RSHRN_SB)
+DO_VSHRN_SAT_SH(vqrshrnb_sh, vqrshrnt_sh, DO_RSHRN_SH)
+DO_VSHRN_SAT_UB(vqrshrnb_ub, vqrshrnt_ub, DO_RSHRN_UB)
+DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
+DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
+DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_N(VSHRNB, vshrnb)
 DO_2SHIFT_N(VSHRNT, vshrnt)
 DO_2SHIFT_N(VRSHRNB, vrshrnb)
 DO_2SHIFT_N(VRSHRNT, vrshrnt)
+DO_2SHIFT_N(VQSHRNB_S, vqshrnb_s)
+DO_2SHIFT_N(VQSHRNT_S, vqshrnt_s)
+DO_2SHIFT_N(VQSHRNB_U, vqshrnb_u)
+DO_2SHIFT_N(VQSHRNT_U, vqshrnt_u)
+DO_2SHIFT_N(VQSHRUNB, vqshrunb)
+DO_2SHIFT_N(VQSHRUNT, vqshrunt)
+DO_2SHIFT_N(VQRSHRNB_S, vqrshrnb_s)
+DO_2SHIFT_N(VQRSHRNT_S, vqrshrnt_s)
+DO_2SHIFT_N(VQRSHRNB_U, vqrshrnb_u)
+DO_2SHIFT_N(VQRSHRNT_U, vqrshrnt_u)
+DO_2SHIFT_N(VQRSHRUNB, vqrshrunb)
+DO_2SHIFT_N(VQRSHRUNT, vqrshrunt)
-- 
2.20.1

Implement the MVE VSHLC insn, which performs a shift left of the
entire vector with carry-in bits provided from a general-purpose
register and carry-out bits written back to that register.
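
One way to picture this, as a sketch only: treat the vector as four
32-bit words with element 0 least significant (an assumption of this
example), shift each word left, feed the bits shifted out of one word
into the next, and return the bits shifted out of the last word.
Predication is ignored and vshlc() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift a 128-bit vector left by 1..32 bits. 'carry' supplies the
 * bits shifted in at the bottom; the return value holds the bits
 * shifted out at the top.
 */
static uint32_t vshlc(uint32_t d[4], uint32_t carry, unsigned shift)
{
    unsigned e;

    assert(shift >= 1 && shift <= 32);
    for (e = 0; e < 4; e++) {
        uint32_t old = d[e];

        if (shift == 32) {
            /* Whole-word case, avoiding an undefined 32-bit shift. */
            d[e] = carry;
            carry = old;
        } else {
            uint32_t low = ((uint32_t)1 << shift) - 1;

            d[e] = (old << shift) | (carry & low);
            carry = old >> (32 - shift);
        }
    }
    return carry;
}
```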

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-14-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  2 ++
 target/arm/mve.decode      |  2 ++
 target/arm/mve_helper.c    | 38 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 30 ++++++++++++++++++++++++++++++
 4 files changed, 72 insertions(+)

Implement the MVE VADDLV insn; this is similar to VADDV, except
that it accumulates 32-bit elements into a 64-bit accumulator
stored in a pair of general-purpose registers.
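
A hedged sketch of the accumulation for the signed case, ignoring
predication. The function names are hypothetical; the point is only
that each 32-bit element is sign-extended before being added to a
64-bit total, which is then split across two 32-bit registers.

```c
#include <assert.h>
#include <stdint.h>

/* Add four sign-extended 32-bit elements to a 64-bit accumulator. */
static uint64_t vaddlv_s32(const int32_t m[4], uint64_t acc)
{
    unsigned e;

    for (e = 0; e < 4; e++) {
        acc += (uint64_t)(int64_t)m[e];
    }
    return acc;
}

/* Split a 64-bit value into a low/high register pair. */
static void u64_to_pair(uint64_t v, uint32_t *lo, uint32_t *hi)
{
    assert(lo && hi);
    *lo = (uint32_t)v;
    *hi = (uint32_t)(v >> 32);
}
```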

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-15-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  3 ++
 target/arm/mve.decode      |  6 +++-
 target/arm/mve_helper.c    | 19 ++++++++++++
 target/arm/translate-mve.c | 63 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 90 insertions(+), 1 deletion(-)

The MVE extension to v8.1M includes some new shift instructions which
sit entirely within the non-coprocessor part of the encoding space
and which operate only on general-purpose registers.  They take up
the space which was previously UNPREDICTABLE MOVS and ORRS encodings
with Rm == 13 or 15.

Implement the long shifts by immediate, which perform shifts on a
pair of general-purpose registers treated as a 64-bit quantity, with
an immediate shift count between 1 and 32.
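
A minimal sketch of the register-pair handling, assuming the
non-saturating logical shifts only. The function names are
hypothetical and this is plain C rather than the TCG code in this
patch.

```c
#include <stdint.h>

/*
 * Hypothetical model of a long shift left by immediate.
 * The pair rdahi:rdalo is treated as one 64-bit value.
 * shift is assumed to be in the range 1..32.
 */
static void lsll_imm_model(uint32_t *rdalo, uint32_t *rdahi, unsigned shift)
{
    uint64_t rda = ((uint64_t)*rdahi << 32) | *rdalo;

    rda <<= shift;
    *rdalo = (uint32_t)rda;
    *rdahi = (uint32_t)(rda >> 32);
}

/* Hypothetical model of a long logical shift right by immediate */
static void lsrl_imm_model(uint32_t *rdalo, uint32_t *rdahi, unsigned shift)
{
    uint64_t rda = ((uint64_t)*rdahi << 32) | *rdalo;

    rda >>= shift;
    *rdalo = (uint32_t)rda;
    *rdahi = (uint32_t)(rda >> 32);
}
```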

Awkwardly, because the MOVS and ORRS trans functions do not UNDEF for
the Rm==13,15 case, we need to explicitly emit code to UNDEF for the
cases where v8.1M now requires that.  (Trying to change MOVS and ORRS
is too difficult, because the functions that generate the code are
shared between a dozen different kinds of arithmetic or logical
instruction for all A32, T16 and T32 encodings, and for some insns
and some encodings Rm==13,15 are valid.)

We make the helper functions we need for UQSHLL and SQSHLL take
a 32-bit value, which the helper casts to int8_t, because we'll also
need these helpers for the shift-by-register insns, where the shift
count might be < 0 or > 32.
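
The effect of the int8_t cast can be sketched as follows. This is an
illustration only: shift_count_from_reg is a hypothetical name, and it
spells out the sign-extension of the low 8 bits arithmetically rather
than relying on the cast itself.

```c
#include <stdint.h>

/*
 * Hypothetical model of how a 32-bit value becomes a signed shift count:
 * only the low 8 bits are used, interpreted as a two's complement number,
 * giving a count in the range -128..127.
 */
static int shift_count_from_reg(uint32_t value)
{
    int low8 = (int)(value & 0xff);

    return low8 >= 128 ? low8 - 256 : low8;
}
```

With this interpretation an immediate count of 1..32 passes through
unchanged, while a register value such as 0xff yields -1.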

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-16-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  3 ++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 28 +++++++++++++
 target/arm/mve_helper.c | 10 +++++
 target/arm/translate.c  | 90 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 132 insertions(+)

Implement the MVE long shifts by register, which perform shifts on a
pair of general-purpose registers treated as a 64-bit quantity, with
the shift count in another general-purpose register, which might be
either positive or negative.
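
A rough model of the signed shift count for the two non-saturating
insns is sketched below. The names are hypothetical, and the handling
of out-of-range counts is an assumption based on the helpers this
patch calls (do_uqrshl_d and do_sqrshl_d with no rounding and no
saturation flag), not a statement of the architectural pseudocode.

```c
#include <stdint.h>

/*
 * Hypothetical model of LSLL by register: a positive count shifts left,
 * a negative count shifts right (logical). Counts of 64 or more in
 * either direction produce 0.
 */
static uint64_t lsll_reg_model(uint64_t rda, int count)
{
    if (count <= -64 || count >= 64) {
        return 0;
    }
    if (count < 0) {
        return rda >> -count;
    }
    return rda << count;
}

/*
 * Hypothetical model of ASRL by register: a positive count shifts right
 * (arithmetic), a negative count shifts left.
 */
static int64_t asrl_reg_model(int64_t rda, int count)
{
    if (count >= 64) {
        /* Only copies of the sign bit remain */
        return rda < 0 ? -1 : 0;
    }
    if (count > 0) {
        /* Portable arithmetic shift right for negative values */
        return rda < 0 ? ~(~rda >> count) : rda >> count;
    }
    if (count > -64) {
        /* Shift left as unsigned to avoid signed overflow */
        return (int64_t)((uint64_t)rda << -count);
    }
    return 0;
}
```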

Like the long-shifts-by-immediate, these encodings sit in the space
that was previously the UNPREDICTABLE MOVS/ORRS with Rm==13,15.
Because LSLL_rr and ASRL_rr overlap both with MOV_rxri/ORR_rrri and
with CSEL (as one of the previously-UNPREDICTABLE Rm==13 cases),
we have to move the CSEL pattern into the same decodetree group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-17-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  6 +++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 16 +++++--
 target/arm/mve_helper.c | 93 +++++++++++++++++++++++++++++++++++++++++
 target/arm/translate.c  | 69 ++++++++++++++++++++++++++++++
 5 files changed, 182 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
 
+DEF_HELPER_FLAGS_3(mve_sshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_ushll, TCG_CALL_NO_RWG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(mve_sqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(mve_uqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_sqrshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_uqrshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_sqrshrl48, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
 typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
+typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
 
 /**
  * arm_tbflags_from_tb:
diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@
 &mcrr            !extern cp opc1 crm rt rt2
 
 &mve_shl_ri      rdalo rdahi shim
+&mve_shl_rr      rdalo rdahi rm
 
 # rdahi: bits [3:1] from insn, bit 0 is 1
 # rdalo: bits [3:1] from insn, bit 0 is 0
@@ -XXX,XX +XXX,XX @@
 
 @mve_shl_ri      ....... .... . ... . . ... ... . .. .. .... \
                  &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
+@mve_shl_rr      ....... .... . ... . rm:4  ... . .. .. .... \
+                 &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
 
 {
   TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
     URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
     SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
     SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
+
+    LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
+    ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
+    UQRSHLL64_rr 1110101 0010 1 ... 1 ....  ... 1  0000 1101  @mve_shl_rr
+    SQRSHRL64_rr 1110101 0010 1 ... 1 ....  ... 1  0010 1101  @mve_shl_rr
+    UQRSHLL48_rr 1110101 0010 1 ... 1 ....  ... 1  1000 1101  @mve_shl_rr
+    SQRSHRL48_rr 1110101 0010 1 ... 1 ....  ... 1  1010 1101  @mve_shl_rr
   ]
 
   MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
   ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi
+
+  # v8.1M CSEL and friends
+  CSEL           1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
 }
 {
   MVN_rxri       1110101 0011 . 1111 0 ... .... .... ....     @s_rxr_shi
@@ -XXX,XX +XXX,XX @@ SBC_rrri         1110101 1011 . .... 0 ... .... .... ....     @s_rrr_shi
 }
 RSB_rrri         1110101 1110 . .... 0 ... .... .... ....     @s_rrr_shi
 
-# v8.1M CSEL and friends
-CSEL             1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
-
 # Data-processing (register-shifted register)
 
 MOV_rxrr         1111 1010 0 shty:2 s:1 rm:4 1111 rd:4 0000 rs:4 \
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
     return rdm;
 }
 
+uint64_t HELPER(mve_sshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_sqrshl_d(n, -(int8_t)shift, false, NULL);
+}
+
+uint64_t HELPER(mve_ushll)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_uqrshl_d(n, (int8_t)shift, false, NULL);
+}
+
 uint64_t HELPER(mve_sqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
 {
     return do_sqrshl_d(n, (int8_t)shift, false, &env->QF);
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
 {
     return do_uqrshl_d(n, (int8_t)shift, false, &env->QF);
 }
+
+uint64_t HELPER(mve_sqrshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_sqrshl_d(n, -(int8_t)shift, true, &env->QF);
+}
+
+uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_uqrshl_d(n, (int8_t)shift, true, &env->QF);
+}
+
+/* Operate on 64-bit values, but saturate at 48 bits */
+static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
+                                    bool round, uint32_t *sat)
+{
+    if (shift <= -48) {
+        /* Rounding the sign bit always produces 0. */
+        if (round) {
+            return 0;
+        }
+        return src >> 63;
+    } else if (shift < 0) {
+        if (round) {
+            src >>= -shift - 1;
+            return (src >> 1) + (src & 1);
+        }
+        return src >> -shift;
+    } else if (shift < 48) {
+        int64_t val = src << shift;
+        int64_t extval = sextract64(val, 0, 48);
+        if (!sat || val == extval) {
+            return extval;
+        }
+    } else if (!sat || src == 0) {
+        return 0;
+    }
+
+    *sat = 1;
+    return (1ULL << 47) - (src >= 0);
+}
+
+/* Operate on 64-bit values, but saturate at 48 bits */
+static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
+                                     bool round, uint32_t *sat)
+{
+    uint64_t val, extval;
+
+    if (shift <= -(48 + round)) {
+        return 0;
+    } else if (shift < 0) {
+        if (round) {
+            val = src >> (-shift - 1);
+            val = (val >> 1) + (val & 1);
+        } else {
+            val = src >> -shift;
+        }
+        extval = extract64(val, 0, 48);
+        if (!sat || val == extval) {
+            return extval;
+        }
+    } else if (shift < 48) {
+        val = src << shift;
+        extval = extract64(val, 0, 48);
+        if (!sat || val == extval) {
+            return extval;
+        }
+    } else if (!sat || src == 0) {
+        return 0;
+    }
+
+    *sat = 1;
+    return MAKE_64BIT_MASK(0, 48);
+}
+
+uint64_t HELPER(mve_sqrshrl48)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_sqrshl48_d(n, -(int8_t)shift, true, &env->QF);
+}
+
+uint64_t HELPER(mve_uqrshll48)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_uqrshl48_d(n, (int8_t)shift, true, &env->QF);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_URSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
     return do_mve_shl_ri(s, a, gen_urshr64_i64);
 }
 
+static bool do_mve_shl_rr(DisasContext *s, arg_mve_shl_rr *a, WideShiftFn *fn)
+{
+    TCGv_i64 rda;
+    TCGv_i32 rdalo, rdahi;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
+        return false;
+    }
+    if (a->rdahi == 15) {
+        /* These are a different encoding (SQSHL/SRSHR/UQSHL/URSHR) */
+        return false;
+    }
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
+        a->rdahi == 13 || a->rm == 13 || a->rm == 15 ||
+        a->rm == a->rdahi || a->rm == a->rdalo) {
+        /* These rdahi/rdalo/rm cases are UNPREDICTABLE; we choose to UNDEF */
+        unallocated_encoding(s);
+        return true;
+    }
+
+    rda = tcg_temp_new_i64();
+    rdalo = load_reg(s, a->rdalo);
+    rdahi = load_reg(s, a->rdahi);
+    tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
+
+    /* The helper takes care of the sign-extension of the low 8 bits of Rm */
+    fn(rda, cpu_env, rda, cpu_R[a->rm]);
+
+    tcg_gen_extrl_i64_i32(rdalo, rda);
+    tcg_gen_extrh_i64_i32(rdahi, rda);
+    store_reg(s, a->rdalo, rdalo);
+    store_reg(s, a->rdahi, rdahi);
+    tcg_temp_free_i64(rda);
+
+    return true;
+}
+
+static bool trans_LSLL_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_ushll);
+}
+
+static bool trans_ASRL_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_sshrl);
+}
+
+static bool trans_UQRSHLL64_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll);
+}
+
+static bool trans_SQRSHRL64_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl);
+}
+
+static bool trans_UQRSHLL48_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll48);
+}
+
+static bool trans_SQRSHRL48_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl48);
+}
+
 /*
  * Multiply and multiply accumulate
  */
-- 
2.20.1

Implement the MVE shifts by immediate, which perform shifts
on a single general-purpose register.

These patterns overlap with the long-shift-by-immediates,
so we have to rearrange the grouping a little here.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-18-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  3 ++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 31 ++++++++++++++-----
 target/arm/mve_helper.c | 10 ++++++
 target/arm/translate.c  | 68 +++++++++++++++++++++++++++++++++++++++--
 5 files changed, 104 insertions(+), 9 deletions(-)

Implement the MVE shifts by register, which perform
shifts on a single general-purpose register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-19-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  2 ++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 18 ++++++++++++++----
 target/arm/mve_helper.c | 10 ++++++++++
 target/arm/translate.c  | 30 ++++++++++++++++++++++++++++++
 5 files changed, 57 insertions(+), 4 deletions(-)