Series comparison

-[PULL 00/41] target-arm queue
+[PULL 00/24] target-arm queue
-The following changes since commit 4c41341af76cfc85b5a6c0f87de4838672ab9f89:
+The following changes since commit 5a67d7735d4162630769ef495cf813244fc850df:
-  Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20201020' into staging (2020-10-20 11:20:36 +0100)
+  Merge remote-tracking branch 'remotes/berrange-gitlab/tags/tls-deps-pull-request' into staging (2021-07-02 08:22:39 +0100)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201020
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210702
-for you to fetch changes up to 6358890cb939192f6169fdf7664d903bf9b1d338:
+for you to fetch changes up to 04ea4d3cfd0a21b248ece8eb7a9436a3d9898dd8:
-  tests/tcg/aarch64: Add bti smoke tests (2020-10-20 16:12:02 +0100)
+  target/arm: Implement MVE shifts by register (2021-07-02 11:48:38 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * Fix AArch32 SMLAD incorrect setting of Q bit
+ * more MVE instructions
- * AArch32 VCVT fixed-point to float is always round-to-nearest
+ * hw/gpio/gpio_pwr: use shutdown function for reboot
- * strongarm: Fix 'time to transmit a char' unit comment
+ * target/arm: Check NaN mode before silencing NaN
- * Restrict APEI tables generation to the 'virt' machine
+ * tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
- * bcm2835: minor code cleanups
+ * hw/arm: Add basic power management to raspi.
- * correctly flush TLBs when TBI is enabled
+ * docs/system/arm: Add quanta-gbs-bmc, quanta-q71l-bmc
  * tests/qtest: Add npcm7xx timer test
  * loads-stores.rst: add footnote that clarifies GETPC usage
  * Fix reported EL for mte_check_fail
  * Ignore HCR_EL2.ATA when {E2H,TGE} != 11
  * microbit_i2c: Fix coredump when dump-vmstate
  * nseries: Fix loading kernel image on n8x0 machines
  * Implement v8.1M low-overhead-loops
  * linux-user: Support AArch64 BTI
 ----------------------------------------------------------------
-Emanuele Giuseppe Esposito (1):
+Joe Komlodi (1):
-      loads-stores.rst: add footnote that clarifies GETPC usage
+      target/arm: Check NaN mode before silencing NaN
-Havard Skinnemoen (1):
+Maxim Uvarov (1):
-      tests/qtest: Add npcm7xx timer test
+      hw/gpio/gpio_pwr: use shutdown function for reboot
-Peng Liang (1):
+Nolan Leake (1):
-      microbit_i2c: Fix coredump when dump-vmstate
+      hw/arm: Add basic power management to raspi.
-Peter Maydell (12):
+Patrick Venture (2):
-      target/arm: Fix SMLAD incorrect setting of Q bit
+      docs/system/arm: Add quanta-q71l-bmc reference
-      target/arm: AArch32 VCVT fixed-point to float is always round-to-nearest
+      docs/system/arm: Add quanta-gbs-bmc reference
       decodetree: Fix codegen for non-overlapping group inside overlapping group
       target/arm: Implement v8.1M NOCP handling
       target/arm: Implement v8.1M conditional-select insns
       target/arm: Make the t32 insn[25:23]=111 group non-overlapping
       target/arm: Don't allow BLX imm for M-profile
       target/arm: Implement v8.1M branch-future insns (as NOPs)
       target/arm: Implement v8.1M low-overhead-loop instructions
       target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile
       target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16
       target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
-Philippe Mathieu-Daudé (10):
+Peter Maydell (18):
-      hw/arm/strongarm: Fix 'time to transmit a char' unit comment
+      target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation
-      hw/arm: Restrict APEI tables generation to the 'virt' machine
+      target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
-      hw/timer/bcm2835: Introduce BCM2835_SYSTIMER_COUNT definition
+      target/arm: Make asimd_imm_const() public
-      hw/timer/bcm2835: Rename variable holding CTRL_STATUS register
+      target/arm: Use asimd_imm_const for A64 decode
-      hw/timer/bcm2835: Support the timer COMPARE registers
+      target/arm: Use dup_const() instead of bitfield_replicate()
-      hw/arm/bcm2835_peripherals: Correctly wire the SYS_timer IRQs
+      target/arm: Implement MVE logical immediate insns
-      hw/intc/bcm2835_ic: Trace GPU/CPU IRQ handlers
+      target/arm: Implement MVE vector shift left by immediate insns
-      hw/intc/bcm2836_control: Use IRQ definitions instead of magic numbers
+      target/arm: Implement MVE vector shift right by immediate insns
-      hw/arm/nseries: Fix loading kernel image on n8x0 machines
+      target/arm: Implement MVE VSHLL
-      linux-user/elfload: Avoid leaking interp_name using GLib memory API
+      target/arm: Implement MVE VSRI, VSLI
       target/arm: Implement MVE VSHRN, VRSHRN
       target/arm: Implement MVE saturating narrowing shifts
       target/arm: Implement MVE VSHLC
       target/arm: Implement MVE VADDLV
       target/arm: Implement MVE long shifts by immediate
       target/arm: Implement MVE long shifts by register
       target/arm: Implement MVE shifts by immediate
       target/arm: Implement MVE shifts by register
-Richard Henderson (16):
+Philippe Mathieu-Daudé (1):
-      accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
+      tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
       target/arm: Use tlb_flush_page_bits_by_mmuidx*
       target/arm: Remove redundant mmu_idx lookup
       target/arm: Fix reported EL for mte_check_fail
       target/arm: Ignore HCR_EL2.ATA when {E2H,TGE} != 11
       linux-user/aarch64: Reset btype for signals
       linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI
       include/elf: Add defines related to GNU property notes for AArch64
       linux-user/elfload: Fix coding style in load_elf_image
       linux-user/elfload: Adjust iteration over phdr
       linux-user/elfload: Move PT_INTERP detection to first loop
       linux-user/elfload: Use Error for load_elf_image
       linux-user/elfload: Use Error for load_elf_interp
       linux-user/elfload: Parse NT_GNU_PROPERTY_TYPE_0 notes
       linux-user/elfload: Parse GNU_PROPERTY_AARCH64_FEATURE_1_AND
       tests/tcg/aarch64: Add bti smoke tests
- docs/devel/loads-stores.rst             |   8 +-
+ docs/system/arm/aspeed.rst             |   1 +
- default-configs/devices/arm-softmmu.mak |   1 -
+ docs/system/arm/nuvoton.rst            |   5 +-
- include/elf.h                           |  22 ++
+ include/hw/arm/bcm2835_peripherals.h   |   3 +-
- include/exec/cpu-all.h                  |   2 +
+ include/hw/misc/bcm2835_powermgt.h     |  29 ++
- include/exec/exec-all.h                 |  36 ++
+ target/arm/helper-mve.h                | 108 +++++++
- include/hw/timer/bcm2835_systmr.h       |  17 +-
+ target/arm/translate.h                 |  41 +++
- linux-user/qemu.h                       |   4 +
+ target/arm/mve.decode                  | 177 ++++++++++-
- linux-user/syscall_defs.h               |   4 +
+ target/arm/t32.decode                  |  71 ++++-
- target/arm/cpu.h                        |  13 +
+ hw/arm/bcm2835_peripherals.c           |  13 +-
- target/arm/helper.h                     |  13 +
+ hw/gpio/gpio_pwr.c                     |   2 +-
- target/arm/internals.h                  |   9 +-
+ hw/misc/bcm2835_powermgt.c             | 160 ++++++++++
- target/arm/m-nocp.decode                |  10 +-
+ target/arm/helper-a64.c                |  12 +-
- target/arm/t32.decode                   |  50 ++-
+ target/arm/mve_helper.c                | 524 +++++++++++++++++++++++++++++++--
- accel/tcg/cputlb.c                      | 275 +++++++++++++++-
+ target/arm/translate-a64.c             |  86 +-----
- hw/arm/bcm2835_peripherals.c            |  13 +-
+ target/arm/translate-mve.c             | 261 +++++++++++++++-
- hw/arm/nseries.c                        |   1 +
+ target/arm/translate-neon.c            |  81 -----
- hw/arm/strongarm.c                      |   2 +-
+ target/arm/translate.c                 | 327 +++++++++++++++++++-
- hw/i2c/microbit_i2c.c                   |   1 +
+ target/arm/vfp_helper.c                |  24 +-
- hw/intc/bcm2835_ic.c                    |   4 +-
+ hw/misc/meson.build                    |   1 +
- hw/intc/bcm2836_control.c               |   8 +-
+ tests/acceptance/boot_linux_console.py |  43 +++
- hw/timer/bcm2835_systmr.c               |  57 ++--
+files changed, 1760 insertions(+), 209 deletions(-)
- linux-user/aarch64/signal.c             |  10 +-
+ create mode 100644 include/hw/misc/bcm2835_powermgt.h
- linux-user/elfload.c                    | 326 ++++++++++++++----
+ create mode 100644 hw/misc/bcm2835_powermgt.c
  linux-user/mmap.c                       |  16 +
  target/arm/cpu.c                        |  38 ++-
  target/arm/helper.c                     |  55 +++-
  target/arm/mte_helper.c                 |  13 +-
  target/arm/translate-a64.c              |   6 +-
  target/arm/translate.c                  | 239 +++++++++++++-
  target/arm/vfp_helper.c                 |  76 +++--
  tests/qtest/npcm7xx_timer-test.c        | 562 ++++++++++++++++++++++++++++++++
  tests/tcg/aarch64/bti-1.c               |  62 ++++
  tests/tcg/aarch64/bti-2.c               | 108 ++++++
  tests/tcg/aarch64/bti-crt.inc.c         |  51 +++
  hw/arm/Kconfig                          |   1 +
  hw/intc/trace-events                    |   4 +
  hw/timer/trace-events                   |   6 +-
  scripts/decodetree.py                   |   2 +-
  target/arm/translate-vfp.c.inc          |  41 ++-
  tests/qtest/meson.build                 |   1 +
  tests/tcg/aarch64/Makefile.target       |  10 +
  tests/tcg/configure.sh                  |   4 +
 files changed, 1973 insertions(+), 208 deletions(-)
  create mode 100644 tests/qtest/npcm7xx_timer-test.c
  create mode 100644 tests/tcg/aarch64/bti-1.c
  create mode 100644 tests/tcg/aarch64/bti-2.c
  create mode 100644 tests/tcg/aarch64/bti-crt.inc.c

-[PULL 18/41] microbit_i2c: Fix coredump when dump-vmstate
+[PULL 01/24] docs/system/arm: Add quanta-q71l-bmc reference
-From: Peng Liang <liangpeng10@huawei.com>
+From: Patrick Venture <venture@google.com>
-VMStateDescription.fields should end with VMSTATE_END_OF_LIST().
+Adds a line-item reference to the supported quanta-q71l-bmc aspeed
-However, microbit_i2c_vmstate doesn't follow it.  Let's change it.
+entry.
-Fixes: 9d68bf564e ("arm: Stub out NRF51 TWI magnetometer/accelerometer detection")
+Signed-off-by: Patrick Venture <venture@google.com>
-Reported-by: Euler Robot <euler.robot@huawei.com>
+Reviewed-by: Cédric Le Goater <clg@kaod.org>
-Signed-off-by: Peng Liang <liangpeng10@huawei.com>
+Message-id: 20210615192848.1065297-2-venture@google.com
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Message-id: 20201019093401.2993833-1-liangpeng10@huawei.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
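The VMSTATE_END_OF_LIST() fix described in the removed commit message is an instance of the sentinel-terminated array idiom. The following is a minimal standalone sketch of that idiom, assuming the consumer walks the array until it reaches a terminator entry; `Field`, `FIELD_END_OF_LIST`, `count_fields`, and `example_fields` are hypothetical names for illustration and are not QEMU's types.

```c
#include <stddef.h>

/* Hypothetical stand-in for a field descriptor; not QEMU's VMStateField. */
typedef struct {
    const char *name;
    size_t size;
} Field;

/* Hypothetical terminator entry, playing the role of VMSTATE_END_OF_LIST(). */
#define FIELD_END_OF_LIST { NULL, 0 }

/* Walk the array until the terminator; without one, this reads past the end. */
static size_t count_fields(const Field *f)
{
    size_t n = 0;
    while (f[n].name != NULL) {
        n++;
    }
    return n;
}

static const Field example_fields[] = {
    { "regs", sizeof(unsigned int) },
    { "read_idx", sizeof(unsigned int) },
    FIELD_END_OF_LIST
};
```

If the last entry were omitted, `count_fields` would keep reading beyond the array, which is undefined behaviour and one plausible way to end up with a crash like the one the commit title mentions.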
- hw/i2c/microbit_i2c.c | 1 +
+ docs/system/arm/aspeed.rst | 1 +
 file changed, 1 insertion(+)
-diff --git a/hw/i2c/microbit_i2c.c b/hw/i2c/microbit_i2c.c
+diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
 index XXXXXXX..XXXXXXX 100644
---- a/hw/i2c/microbit_i2c.c
+--- a/docs/system/arm/aspeed.rst
-+++ b/hw/i2c/microbit_i2c.c
++++ b/docs/system/arm/aspeed.rst
-@@ -XXX,XX +XXX,XX @@ static const VMStateDescription microbit_i2c_vmstate = {
+@@ -XXX,XX +XXX,XX @@ etc.
-     .fields = (VMStateField[]) {
+ AST2400 SoC based machines :
-         VMSTATE_UINT32_ARRAY(regs, MicrobitI2CState, MICROBIT_I2C_NREGS),
-         VMSTATE_UINT32(read_idx, MicrobitI2CState),
+ - ``palmetto-bmc``         OpenPOWER Palmetto POWER8 BMC
-+        VMSTATE_END_OF_LIST()
++- ``quanta-q71l-bmc``      OpenBMC Quanta BMC
-     },
- };
+ AST2500 SoC based machines :
 --
 .20.1

-[PULL 40/41] linux-user/elfload: Parse GNU_PROPERTY_AARCH64_FEATURE_1_AND
+[PULL 02/24] docs/system/arm: Add quanta-gbs-bmc reference
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Patrick Venture <venture@google.com>
-Use the new generic support for NT_GNU_PROPERTY_TYPE_0.
+Add line item reference to quanta-gbs-bmc machine.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Patrick Venture <venture@google.com>
-Message-id: 20201016184207.786698-12-richard.henderson@linaro.org
+Reviewed-by: Cédric Le Goater <clg@kaod.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20210615192848.1065297-3-venture@google.com
 [PMM: fixed underline Sphinx warning]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- linux-user/elfload.c | 48 ++++++++++++++++++++++++++++++++++++++++++--
+ docs/system/arm/nuvoton.rst | 5 +++--
-file changed, 46 insertions(+), 2 deletions(-)
+file changed, 3 insertions(+), 2 deletions(-)
-diff --git a/linux-user/elfload.c b/linux-user/elfload.c
+diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/elfload.c
+--- a/docs/system/arm/nuvoton.rst
-+++ b/linux-user/elfload.c
++++ b/docs/system/arm/nuvoton.rst
-@@ -XXX,XX +XXX,XX @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
+@@ -XXX,XX +XXX,XX @@
+-Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
- #include "elf.h"
+-=====================================================
++Nuvoton iBMC boards (``*-bmc``, ``npcm750-evb``, ``quanta-gsj``)
-+/* We must delay the following stanzas until after "elf.h". */
++================================================================
-+#if defined(TARGET_AARCH64)
-+
+ The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
-+static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz,
+ designed to be used as Baseboard Management Controllers (BMCs) in various
-+                                    const uint32_t *data,
+@@ -XXX,XX +XXX,XX @@ segment. The following machines are based on this chip :
-+                                    struct image_info *info,
+ The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
-+                                    Error **errp)
+ Hyperscale applications. The following machines are based on this chip :
-+{
-+    if (pr_type == GNU_PROPERTY_AARCH64_FEATURE_1_AND) {
++- ``quanta-gbs-bmc``    Quanta GBS server BMC
-+        if (pr_datasz != sizeof(uint32_t)) {
+ - ``quanta-gsj``        Quanta GSJ server BMC
-+            error_setg(errp, "Ill-formed GNU_PROPERTY_AARCH64_FEATURE_1_AND");
-+            return false;
+ There are also two more SoCs, NPCM710 and NPCM705, which are single-core
 +        }
 +        /* We will extract GNU_PROPERTY_AARCH64_FEATURE_1_BTI later. */
 +        info->note_flags = *data;
 +    }
 +    return true;
 +}
 +#define ARCH_USE_GNU_PROPERTY 1
 +
 +#else
 +
  static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz,
                                      const uint32_t *data,
                                      struct image_info *info,
@@ -XXX,XX +XXX,XX @@ static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz,
  }
  #define ARCH_USE_GNU_PROPERTY 0
 +#endif
 +
  struct exec
  {
      unsigned int a_info;   /* Use macros N_MAGIC, etc for access */
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
      struct elfhdr *ehdr = (struct elfhdr *)bprm_buf;
      struct elf_phdr *phdr;
      abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
 -    int i, retval;
 +    int i, retval, prot_exec;
      Error *err = NULL;
      /* First of all, some simple consistency checks */
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
      info->brk = 0;
      info->elf_flags = ehdr->e_flags;
 +    prot_exec = PROT_EXEC;
 +#ifdef TARGET_AARCH64
 +    /*
 +     * If the BTI feature is present, this indicates that the executable
 +     * pages of the startup binary should be mapped with PROT_BTI, so that
 +     * branch targets are enforced.
 +     *
 +     * The startup binary is either the interpreter or the static executable.
 +     * The interpreter is responsible for all pages of a dynamic executable.
 +     *
 +     * Elf notes are backward compatible to older cpus.
 +     * Do not enable BTI unless it is supported.
 +     */
 +    if ((info->note_flags & GNU_PROPERTY_AARCH64_FEATURE_1_BTI)
 +        && (pinterp_name == NULL || *pinterp_name == 0)
 +        && cpu_isar_feature(aa64_bti, ARM_CPU(thread_cpu))) {
 +        prot_exec |= TARGET_PROT_BTI;
 +    }
 +#endif
 +
      for (i = 0; i < ehdr->e_phnum; i++) {
          struct elf_phdr *eppnt = phdr + i;
          if (eppnt->p_type == PT_LOAD) {
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
                  elf_prot |= PROT_WRITE;
              }
              if (eppnt->p_flags & PF_X) {
 -                elf_prot |= PROT_EXEC;
 +                elf_prot |= prot_exec;
              }
              vaddr = load_bias + eppnt->p_vaddr;
 --
 .20.1
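The condition added to load_elf_image() in the hunk above can be read as a small predicate. This is a standalone sketch of that decision only, assuming a BTI note flag in bit 0 and the PROT_BTI value 0x10 used by the bti-2.c test; `startup_prot_exec`, `SKETCH_PROT_EXEC`, `SKETCH_PROT_BTI`, and `SKETCH_FEATURE_1_BTI` are hypothetical names, and the CPU feature check is reduced to a boolean parameter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants; the real values come from the target headers. */
#define SKETCH_PROT_EXEC     0x4
#define SKETCH_PROT_BTI      0x10
#define SKETCH_FEATURE_1_BTI (1u << 0)

/*
 * Return the protection bits to use for executable segments.
 * BTI is added only when all three conditions hold:
 *  - the ELF note advertises BTI,
 *  - there is no interpreter (static executable, or this is the interpreter),
 *  - the emulated CPU implements BTI.
 */
static int startup_prot_exec(uint32_t note_flags, const char *interp_name,
                             bool cpu_has_bti)
{
    int prot = SKETCH_PROT_EXEC;

    if ((note_flags & SKETCH_FEATURE_1_BTI)
        && (interp_name == NULL || *interp_name == 0)
        && cpu_has_bti) {
        prot |= SKETCH_PROT_BTI;
    }
    return prot;
}
```

A dynamic executable with a non-empty interpreter name keeps plain exec protection here, matching the comment that the interpreter is responsible for the pages of a dynamic executable.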

-[PULL 41/41] tests/tcg/aarch64: Add bti smoke tests
+[PULL 03/24] hw/arm: Add basic power management to raspi.
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Nolan Leake <nolan@sigbus.net>
-The note test requires gcc 10 for -mbranch-protection=standard.
+This is just enough to make reboot and poweroff work. Works for
-The mmap test uses PROT_BTI and does not require special compiler support.
+Linux, U-Boot, and Arm Trusted Firmware. Not tested, but should
+work for Plan 9 and bare-metal/hobby OSes, since they seem to generally
-Acked-by: Alex Bennée <alex.bennee@linaro.org>
+do what Linux does for reset.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+The watchdog timer functionality is not yet implemented.
-Message-id: 20201016184207.786698-13-richard.henderson@linaro.org
 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/64
 Signed-off-by: Nolan Leake <nolan@sigbus.net>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20210625210209.1870217-1-nolan@sigbus.net
 [PMM: tweaked commit title; fixed region size to 0x200;
  moved header file to include/]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
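The reboot and poweroff behaviour described in the commit message comes down to a password-gated register write. Below is a minimal standalone sketch of that decision, reusing the register offsets and values from the diff that follows; `PmState`, `PmAction`, and `pm_write` are hypothetical names for illustration, and the sketch returns an enum where the device model calls QEMU's shutdown and reset request functions.

```c
#include <stdint.h>

#define PASSWORD        0x5a000000u
#define PASSWORD_MASK   0xff000000u

#define R_RSTC          0x1c
#define V_RSTC_RESET    0x20
#define R_RSTS          0x20
#define V_RSTS_POWEROFF 0x555
#define R_WDOG          0x24

/* Hypothetical result type standing in for the QEMU runstate requests. */
typedef enum { PM_NONE, PM_RESET, PM_POWEROFF, PM_REJECTED } PmAction;

/* Hypothetical register file mirroring the three fields in the patch. */
typedef struct {
    uint32_t rstc;
    uint32_t rsts;
    uint32_t wdog;
} PmState;

static PmAction pm_write(PmState *s, uint32_t offset, uint32_t value)
{
    /* Writes without the 0x5a password in the top byte are ignored. */
    if ((value & PASSWORD_MASK) != PASSWORD) {
        return PM_REJECTED;
    }
    value &= ~PASSWORD_MASK;

    switch (offset) {
    case R_RSTC:
        s->rstc = value;
        if (value & V_RSTC_RESET) {
            /* A previously stored RSTS value selects halt versus reboot. */
            return (s->rsts & 0xfff) == V_RSTS_POWEROFF ? PM_POWEROFF
                                                        : PM_RESET;
        }
        return PM_NONE;
    case R_RSTS:
        s->rsts = value;
        return PM_NONE;
    case R_WDOG:
        s->wdog = value;
        return PM_NONE;
    default:
        return PM_NONE;
    }
}
```

In this model a guest that wants to power off first writes the poweroff marker to RSTS and then sets the reset bit in RSTC; setting the reset bit alone is treated as a reboot.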
- tests/tcg/aarch64/bti-1.c         |  62 +++++++++++++++++
+ include/hw/arm/bcm2835_peripherals.h |   3 +-
- tests/tcg/aarch64/bti-2.c         | 108 ++++++++++++++++++++++++++++++
+ include/hw/misc/bcm2835_powermgt.h   |  29 +++++
- tests/tcg/aarch64/bti-crt.inc.c   |  51 ++++++++++++++
+ hw/arm/bcm2835_peripherals.c         |  13 ++-
- tests/tcg/aarch64/Makefile.target |  10 +++
+ hw/misc/bcm2835_powermgt.c           | 160 +++++++++++++++++++++++++++
- tests/tcg/configure.sh            |   4 ++
+ hw/misc/meson.build                  |   1 +
-files changed, 235 insertions(+)
+files changed, 204 insertions(+), 2 deletions(-)
- create mode 100644 tests/tcg/aarch64/bti-1.c
+ create mode 100644 include/hw/misc/bcm2835_powermgt.h
- create mode 100644 tests/tcg/aarch64/bti-2.c
+ create mode 100644 hw/misc/bcm2835_powermgt.c
- create mode 100644 tests/tcg/aarch64/bti-crt.inc.c
+diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h
-diff --git a/tests/tcg/aarch64/bti-1.c b/tests/tcg/aarch64/bti-1.c
+index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/arm/bcm2835_peripherals.h
 +++ b/include/hw/arm/bcm2835_peripherals.h
@@ -XXX,XX +XXX,XX @@
  #include "hw/misc/bcm2835_mphi.h"
  #include "hw/misc/bcm2835_thermal.h"
  #include "hw/misc/bcm2835_cprman.h"
 +#include "hw/misc/bcm2835_powermgt.h"
  #include "hw/sd/sdhci.h"
  #include "hw/sd/bcm2835_sdhost.h"
  #include "hw/gpio/bcm2835_gpio.h"
@@ -XXX,XX +XXX,XX @@ struct BCM2835PeripheralState {
      BCM2835MphiState mphi;
      UnimplementedDeviceState txp;
      UnimplementedDeviceState armtmr;
 -    UnimplementedDeviceState powermgt;
 +    BCM2835PowerMgtState powermgt;
      BCM2835CprmanState cprman;
      PL011State uart0;
      BCM2835AuxState aux;
 diff --git a/include/hw/misc/bcm2835_powermgt.h b/include/hw/misc/bcm2835_powermgt.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/tests/tcg/aarch64/bti-1.c
++++ b/include/hw/misc/bcm2835_powermgt.h
 @@ -XXX,XX +XXX,XX @@
 +/*
-+ * Branch target identification, basic notskip cases.
++ * BCM2835 Power Management emulation
 + *
 + * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
 + * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
-+#include "bti-crt.inc.c"
++#ifndef BCM2835_POWERMGT_H
-+
++#define BCM2835_POWERMGT_H
-+static void skip2_sigill(int sig, siginfo_t *info, ucontext_t *uc)
++
-+{
++#include "hw/sysbus.h"
-+    uc->uc_mcontext.pc += 8;
++#include "qom/object.h"
-+    uc->uc_mcontext.pstate = 1;
++
-+}
++#define TYPE_BCM2835_POWERMGT "bcm2835-powermgt"
-+
++OBJECT_DECLARE_SIMPLE_TYPE(BCM2835PowerMgtState, BCM2835_POWERMGT)
-+#define NOP       "nop"
++
-+#define BTI_N     "hint #32"
++struct BCM2835PowerMgtState {
-+#define BTI_C     "hint #34"
++    SysBusDevice busdev;
-+#define BTI_J     "hint #36"
++    MemoryRegion iomem;
-+#define BTI_JC    "hint #38"
++
-+
++    uint32_t rstc;
-+#define BTYPE_1(DEST) \
++    uint32_t rsts;
-+    asm("mov %0,#1; adr x16, 1f; br x16; 1: " DEST "; mov %0,#0" \
++    uint32_t wdog;
-+        : "=r"(skipped) : : "x16")
++};
 +
-+#define BTYPE_2(DEST) \
++#endif
-+    asm("mov %0,#1; adr x16, 1f; blr x16; 1: " DEST "; mov %0,#0" \
+diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
-+        : "=r"(skipped) : : "x16", "x30")
+index XXXXXXX..XXXXXXX 100644
-+
+--- a/hw/arm/bcm2835_peripherals.c
-+#define BTYPE_3(DEST) \
++++ b/hw/arm/bcm2835_peripherals.c
-+    asm("mov %0,#1; adr x15, 1f; br x15; 1: " DEST "; mov %0,#0" \
+@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_init(Object *obj)
-+        : "=r"(skipped) : : "x15")
-+
+     object_property_add_const_link(OBJECT(&s->dwc2), "dma-mr",
-+#define TEST(WHICH, DEST, EXPECT) \
+                                    OBJECT(&s->gpu_bus_mr));
-+    do { WHICH(DEST); fail += skipped ^ EXPECT; } while (0)
++
-+
++    /* Power Management */
-+
++    object_initialize_child(obj, "powermgt", &s->powermgt,
-+int main()
++                            TYPE_BCM2835_POWERMGT);
-+{
+ }
-+    int fail = 0;
-+    int skipped;
+ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
-+
+@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
-+    /* Signal-like with SA_SIGINFO.  */
+         qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
-+    signal_info(SIGILL, skip2_sigill);
+                                INTERRUPT_USB));
-+
-+    TEST(BTYPE_1, NOP, 1);
++    /* Power Management */
-+    TEST(BTYPE_1, BTI_N, 1);
++    if (!sysbus_realize(SYS_BUS_DEVICE(&s->powermgt), errp)) {
-+    TEST(BTYPE_1, BTI_C, 0);
++        return;
-+    TEST(BTYPE_1, BTI_J, 0);
++    }
-+    TEST(BTYPE_1, BTI_JC, 0);
++
-+
++    memory_region_add_subregion(&s->peri_mr, PM_OFFSET,
-+    TEST(BTYPE_2, NOP, 1);
++                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->powermgt), 0));
-+    TEST(BTYPE_2, BTI_N, 1);
++
-+    TEST(BTYPE_2, BTI_C, 0);
+     create_unimp(s, &s->txp, "bcm2835-txp", TXP_OFFSET, 0x1000);
-+    TEST(BTYPE_2, BTI_J, 1);
+     create_unimp(s, &s->armtmr, "bcm2835-sp804", ARMCTRL_TIMER0_1_OFFSET, 0x40);
-+    TEST(BTYPE_2, BTI_JC, 0);
+-    create_unimp(s, &s->powermgt, "bcm2835-powermgt", PM_OFFSET, 0x114);
-+
+     create_unimp(s, &s->i2s, "bcm2835-i2s", I2S_OFFSET, 0x100);
-+    TEST(BTYPE_3, NOP, 1);
+     create_unimp(s, &s->smi, "bcm2835-smi", SMI_OFFSET, 0x100);
-+    TEST(BTYPE_3, BTI_N, 1);
+     create_unimp(s, &s->spi[0], "bcm2835-spi0", SPI0_OFFSET, 0x20);
-+    TEST(BTYPE_3, BTI_C, 1);
+diff --git a/hw/misc/bcm2835_powermgt.c b/hw/misc/bcm2835_powermgt.c
 +    TEST(BTYPE_3, BTI_J, 0);
 +    TEST(BTYPE_3, BTI_JC, 0);
 +
 +    return fail;
 +}
 diff --git a/tests/tcg/aarch64/bti-2.c b/tests/tcg/aarch64/bti-2.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/tests/tcg/aarch64/bti-2.c
++++ b/hw/misc/bcm2835_powermgt.c
 @@ -XXX,XX +XXX,XX @@
 +/*
-+ * Branch target identification, basic notskip cases.
++ * BCM2835 Power Management emulation
 + *
 + * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
 + * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
-+#include <stdio.h>
++#include "qemu/osdep.h"
-+#include <signal.h>
++#include "qemu/log.h"
-+#include <string.h>
++#include "qemu/module.h"
-+#include <unistd.h>
++#include "hw/misc/bcm2835_powermgt.h"
-+#include <sys/mman.h>
++#include "migration/vmstate.h"
-+
++#include "sysemu/runstate.h"
-+#ifndef PROT_BTI
++
-+#define PROT_BTI  0x10
++#define PASSWORD 0x5a000000
-+#endif
++#define PASSWORD_MASK 0xff000000
 +
-+static void skip2_sigill(int sig, siginfo_t *info, void *vuc)
++#define R_RSTC 0x1c
-+{
++#define V_RSTC_RESET 0x20
-+    ucontext_t *uc = vuc;
++#define R_RSTS 0x20
-+    uc->uc_mcontext.pc += 8;
++#define V_RSTS_POWEROFF 0x555 /* Linux uses partition 63 to indicate halt. */
-+    uc->uc_mcontext.pstate = 1;
++#define R_WDOG 0x24
-+}
++
-+
++static uint64_t bcm2835_powermgt_read(void *opaque, hwaddr offset,
-+#define NOP       "nop"
++                                      unsigned size)
-+#define BTI_N     "hint #32"
++{
-+#define BTI_C     "hint #34"
++    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
-+#define BTI_J     "hint #36"
++    uint32_t res = 0;
-+#define BTI_JC    "hint #38"
++
-+
++    switch (offset) {
-+#define BTYPE_1(DEST)    \
++    case R_RSTC:
-+    "mov x1, #1\n\t"     \
++        res = s->rstc;
-+    "adr x16, 1f\n\t"    \
++        break;
-+    "br x16\n"           \
++    case R_RSTS:
-+"1: " DEST "\n\t"        \
++        res = s->rsts;
-+    "mov x1, #0"
++        break;
-+
++    case R_WDOG:
-+#define BTYPE_2(DEST)    \
++        res = s->wdog;
-+    "mov x1, #1\n\t"     \
++        break;
-+    "adr x16, 1f\n\t"    \
++
-+    "blr x16\n"          \
++    default:
-+"1: " DEST "\n\t"        \
++        qemu_log_mask(LOG_UNIMP,
-+    "mov x1, #0"
++                      "bcm2835_powermgt_read: Unknown offset 0x%08"HWADDR_PRIx
-+
++                      "\n", offset);
-+#define BTYPE_3(DEST)    \
++        res = 0;
-+    "mov x1, #1\n\t"     \
++        break;
-+    "adr x15, 1f\n\t"    \
++    }
-+    "br x15\n"           \
++
-+"1: " DEST "\n\t"        \
++    return res;
-+    "mov x1, #0"
++}
 +
-+#define TEST(WHICH, DEST, EXPECT) \
++static void bcm2835_powermgt_write(void *opaque, hwaddr offset,
-+    WHICH(DEST) "\n"              \
++                                   uint64_t value, unsigned size)
-+    ".if " #EXPECT "\n\t"         \
++{
-+    "eor x1, x1," #EXPECT "\n"    \
++    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
-+    ".endif\n\t"                  \
++
-+    "add x0, x0, x1\n\t"
++    if ((value & PASSWORD_MASK) != PASSWORD) {
-+
++        qemu_log_mask(LOG_GUEST_ERROR,
-+extern char test_begin[], test_end[];
++                      "bcm2835_powermgt_write: Bad password 0x%"PRIx64
-+
++                      " at offset 0x%08"HWADDR_PRIx"\n",
-+asm("\n"
++                      value, offset);
-+"test_begin:\n\t"
++        return;
-+    BTI_C "\n\t"
++    }
-+    "mov x2, x30\n\t"
++
-+    "mov x0, #0\n\t"
++    value = value & ~PASSWORD_MASK;
 +
-+    TEST(BTYPE_1, NOP, 1)
++    switch (offset) {
-+    TEST(BTYPE_1, BTI_N, 1)
++    case R_RSTC:
-+    TEST(BTYPE_1, BTI_C, 0)
++        s->rstc = value;
-+    TEST(BTYPE_1, BTI_J, 0)
++        if (value & V_RSTC_RESET) {
-+    TEST(BTYPE_1, BTI_JC, 0)
++            if ((s->rsts & 0xfff) == V_RSTS_POWEROFF) {
-+
++                qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
-+    TEST(BTYPE_2, NOP, 1)
++            } else {
-+    TEST(BTYPE_2, BTI_N, 1)
++                qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
-+    TEST(BTYPE_2, BTI_C, 0)
++            }
-+    TEST(BTYPE_2, BTI_J, 1)
++        }
-+    TEST(BTYPE_2, BTI_JC, 0)
++        break;
-+
++    case R_RSTS:
-+    TEST(BTYPE_3, NOP, 1)
++        qemu_log_mask(LOG_UNIMP,
-+    TEST(BTYPE_3, BTI_N, 1)
++                      "bcm2835_powermgt_write: RSTS\n");
-+    TEST(BTYPE_3, BTI_C, 1)
++        s->rsts = value;
-+    TEST(BTYPE_3, BTI_J, 0)
++        break;
-+    TEST(BTYPE_3, BTI_JC, 0)
++    case R_WDOG:
-+
++        qemu_log_mask(LOG_UNIMP,
-+    "ret x2\n"
++                      "bcm2835_powermgt_write: WDOG\n");
-+"test_end:"
++        s->wdog = value;
-+);
++        break;
 +
-+int main()
++    default:
-+{
++        qemu_log_mask(LOG_UNIMP,
-+    struct sigaction sa;
++                      "bcm2835_powermgt_write: Unknown offset 0x%08"HWADDR_PRIx
-+
++                      "\n", offset);
-+    void *p = mmap(0, getpagesize(),
++        break;
-+                   PROT_EXEC | PROT_READ | PROT_WRITE | PROT_BTI,
++    }
-+                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
++}
-+    if (p == MAP_FAILED) {
++
-+        perror("mmap");
++static const MemoryRegionOps bcm2835_powermgt_ops = {
-+        return 1;
++    .read = bcm2835_powermgt_read,
-+    }
++    .write = bcm2835_powermgt_write,
-+
++    .endianness = DEVICE_NATIVE_ENDIAN,
-+    memset(&sa, 0, sizeof(sa));
++    .impl.min_access_size = 4,
-+    sa.sa_sigaction = skip2_sigill;
++    .impl.max_access_size = 4,
-+    sa.sa_flags = SA_SIGINFO;
++};
-+    if (sigaction(SIGILL, &sa, NULL) < 0) {
++
-+        perror("sigaction");
++static const VMStateDescription vmstate_bcm2835_powermgt = {
-+        return 1;
++    .name = TYPE_BCM2835_POWERMGT,
-+    }
++    .version_id = 1,
-+
++    .minimum_version_id = 1,
-+    memcpy(p, test_begin, test_end - test_begin);
++    .fields = (VMStateField[]) {
-+    return ((int (*)(void))p)();
++        VMSTATE_UINT32(rstc, BCM2835PowerMgtState),
-+}
++        VMSTATE_UINT32(rsts, BCM2835PowerMgtState),
-diff --git a/tests/tcg/aarch64/bti-crt.inc.c b/tests/tcg/aarch64/bti-crt.inc.c
++        VMSTATE_UINT32(wdog, BCM2835PowerMgtState),
-new file mode 100644
++        VMSTATE_END_OF_LIST()
-index XXXXXXX..XXXXXXX
++    }
---- /dev/null
++};
-+++ b/tests/tcg/aarch64/bti-crt.inc.c
++
-@@ -XXX,XX +XXX,XX @@
++static void bcm2835_powermgt_init(Object *obj)
-+/*
++{
-+ * Minimal user-environment for testing BTI.
++    BCM2835PowerMgtState *s = BCM2835_POWERMGT(obj);
-+ *
++
-+ * Normal libc is not (yet) built with BTI support enabled,
++    memory_region_init_io(&s->iomem, obj, &bcm2835_powermgt_ops, s,
-+ * and so could generate a BTI TRAP before ever reaching main.
++                          TYPE_BCM2835_POWERMGT, 0x200);
-+ */
++    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem);
-+
++}
-+#include <stdlib.h>
++
-+#include <signal.h>
++static void bcm2835_powermgt_reset(DeviceState *dev)
-+#include <ucontext.h>
++{
-+#include <asm/unistd.h>
++    BCM2835PowerMgtState *s = BCM2835_POWERMGT(dev);
 +
-+int main(void);
++    /* https://elinux.org/BCM2835_registers#PM */
-+
++    s->rstc = 0x00000102;
-+void _start(void)
++    s->rsts = 0x00001000;
-+{
++    s->wdog = 0x00000000;
-+    exit(main());
++}
-+}
++
-+
++static void bcm2835_powermgt_class_init(ObjectClass *klass, void *data)
-+void exit(int ret)
++{
-+{
++    DeviceClass *dc = DEVICE_CLASS(klass);
-+    register int x0 __asm__("x0") = ret;
++
-+    register int x8 __asm__("x8") = __NR_exit;
++    dc->reset = bcm2835_powermgt_reset;
-+
++    dc->vmsd = &vmstate_bcm2835_powermgt;
-+    asm volatile("svc #0" : : "r"(x0), "r"(x8));
++}
-+    __builtin_unreachable();
++
-+}
++static TypeInfo bcm2835_powermgt_info = {
-+
++    .name          = TYPE_BCM2835_POWERMGT,
-+/*
++    .parent        = TYPE_SYS_BUS_DEVICE,
-+ * Irritatingly, the user API struct sigaction does not match the
++    .instance_size = sizeof(BCM2835PowerMgtState),
-+ * kernel API struct sigaction.  So for simplicity, isolate the
++    .class_init    = bcm2835_powermgt_class_init,
-+ * kernel ABI here, and make this act like signal.
++    .instance_init = bcm2835_powermgt_init,
-+ */
++};
-+void signal_info(int sig, void (*fn)(int, siginfo_t *, ucontext_t *))
++
-+{
++static void bcm2835_powermgt_register_types(void)
-+    struct kernel_sigaction {
++{
-+        void (*handler)(int, siginfo_t *, ucontext_t *);
++    type_register_static(&bcm2835_powermgt_info);
-+        unsigned long flags;
++}
-+        unsigned long restorer;
++
-+        unsigned long mask;
++type_init(bcm2835_powermgt_register_types)
-+    } sa = { fn, SA_SIGINFO, 0, 0 };
+diff --git a/hw/misc/meson.build b/hw/misc/meson.build
 +
 +    register int x0 __asm__("x0") = sig;
 +    register void *x1 __asm__("x1") = &sa;
 +    register void *x2 __asm__("x2") = 0;
 +    register int x3 __asm__("x3") = sizeof(unsigned long);
 +    register int x8 __asm__("x8") = __NR_rt_sigaction;
 +
 +    asm volatile("svc #0"
 +                 : : "r"(x0), "r"(x1), "r"(x2), "r"(x3), "r"(x8) : "memory");
 +}
 diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
 index XXXXXXX..XXXXXXX 100644
---- a/tests/tcg/aarch64/Makefile.target
+--- a/hw/misc/meson.build
-+++ b/tests/tcg/aarch64/Makefile.target
++++ b/hw/misc/meson.build
-@@ -XXX,XX +XXX,XX @@ run-pauth-%: QEMU_OPTS += -cpu max
+@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_RASPI', if_true: files(
- run-plugin-pauth-%: QEMU_OPTS += -cpu max
+   'bcm2835_rng.c',
- endif
+   'bcm2835_thermal.c',
+   'bcm2835_cprman.c',
-+# BTI Tests
++  'bcm2835_powermgt.c',
-+# bti-1 tests the elf notes, so we require special compiler support.
+ ))
-+ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_ARMV8_BTI),)
+ softmmu_ss.add(when: 'CONFIG_SLAVIO', if_true: files('slavio_misc.c'))
-+AARCH64_TESTS += bti-1
+ softmmu_ss.add(when: 'CONFIG_ZYNQ', if_true: files('zynq_slcr.c', 'zynq-xadc.c'))
 +bti-1: CFLAGS += -mbranch-protection=standard
 +bti-1: LDFLAGS += -nostdlib
 +endif
 +# bti-2 tests PROT_BTI, so no special compiler support required.
 +AARCH64_TESTS += bti-2
 +
  # Semihosting smoke test for linux-user
  AARCH64_TESTS += semihosting
  run-semihosting: semihosting
 diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
 index XXXXXXX..XXXXXXX 100755
 --- a/tests/tcg/configure.sh
 +++ b/tests/tcg/configure.sh
@@ -XXX,XX +XXX,XX @@ for target in $target_list; do
                 -march=armv8.3-a -o $TMPE $TMPC; then
                  echo "CROSS_CC_HAS_ARMV8_3=y" >> $config_target_mak
              fi
 +            if do_compiler "$target_compiler" $target_compiler_cflags \
 +               -mbranch-protection=standard -o $TMPE $TMPC; then
 +                echo "CROSS_CC_HAS_ARMV8_BTI=y" >> $config_target_mak
 +            fi
          ;;
      esac
 --
 .20.1

-[PULL 03/41] hw/arm/strongarm: Fix 'time to transmit a char' unit comment
+[PULL 04/24] tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
 From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-The time to transmit a char is expressed in nanoseconds, not in ticks.
+Add a test that boots and quickly shuts down a raspi2 machine,
 to test the power management model:
    (1/1) tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_raspi2_initrd:
   console: [    0.000000] Booting Linux on physical CPU 0xf00
   console: [    0.000000] Linux version 4.14.98-v7+ (dom@dom-XPS-13-9370) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1200 SMP Tue Feb 12 20:27:48 GMT 2019
   console: [    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
   console: [    0.000000] CPU: div instructions available: patching division code
   console: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
   console: [    0.000000] OF: fdt: Machine model: Raspberry Pi 2 Model B
   ...
   console: Boot successful.
   console: cat /proc/cpuinfo
   console: / # cat /proc/cpuinfo
   ...
   console: processor      : 3
   console: model name     : ARMv7 Processor rev 5 (v7l)
   console: BogoMIPS       : 125.00
   console: Features       : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
   console: CPU implementer        : 0x41
   console: CPU architecture: 7
   console: CPU variant    : 0x0
   console: CPU part       : 0xc07
   console: CPU revision   : 5
   console: Hardware       : BCM2835
   console: Revision       : 0000
   console: Serial         : 0000000000000000
   console: cat /proc/iomem
   console: / # cat /proc/iomem
   console: 00000000-3bffffff : System RAM
   console: 00008000-00afffff : Kernel code
   console: 00c00000-00d468ef : Kernel data
   console: 3f006000-3f006fff : dwc_otg
   console: 3f007000-3f007eff : /soc/dma@7e007000
   console: 3f00b880-3f00b8bf : /soc/mailbox@7e00b880
   console: 3f100000-3f100027 : /soc/watchdog@7e100000
   console: 3f101000-3f102fff : /soc/cprman@7e101000
   console: 3f200000-3f2000b3 : /soc/gpio@7e200000
   PASS (24.59 s)
   RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
   JOB TIME   : 25.02 s
 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201014213601.205222-1-f4bug@amsat.org
+Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20210531113837.1689775-1-f4bug@amsat.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/strongarm.c | 2 +-
+ tests/acceptance/boot_linux_console.py | 43 ++++++++++++++++++++++++++
-file changed, 1 insertion(+), 1 deletion(-)
+file changed, 43 insertions(+)
-diff --git a/hw/arm/strongarm.c b/hw/arm/strongarm.c
+diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/strongarm.c
+--- a/tests/acceptance/boot_linux_console.py
-+++ b/hw/arm/strongarm.c
++++ b/tests/acceptance/boot_linux_console.py
-@@ -XXX,XX +XXX,XX @@ struct StrongARMUARTState {
+@@ -XXX,XX +XXX,XX @@
-     uint8_t rx_start;
+ from avocado import skip
-     uint8_t rx_len;
+ from avocado import skipUnless
+ from avocado_qemu import Test
--    uint64_t char_transmit_time; /* time to transmit a char in ticks*/
++from avocado_qemu import exec_command
-+    uint64_t char_transmit_time; /* time to transmit a char in nanoseconds */
+ from avocado_qemu import exec_command_and_wait_for_pattern
-     bool wait_break_end;
+ from avocado_qemu import interrupt_interactive_console_until_pattern
-     QEMUTimer *rx_timeout_timer;
+ from avocado_qemu import wait_for_console_pattern
-     QEMUTimer *tx_timer;
+@@ -XXX,XX +XXX,XX @@ def test_arm_raspi2_uart0(self):
          """
          self.do_test_arm_raspi2(0)
 +    def test_arm_raspi2_initrd(self):
 +        """
 +        :avocado: tags=arch:arm
 +        :avocado: tags=machine:raspi2
 +        """
 +        deb_url = ('http://archive.raspberrypi.org/debian/'
 +                   'pool/main/r/raspberrypi-firmware/'
 +                   'raspberrypi-kernel_1.20190215-1_armhf.deb')
 +        deb_hash = 'cd284220b32128c5084037553db3c482426f3972'
 +        deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
 +        kernel_path = self.extract_from_deb(deb_path, '/boot/kernel7.img')
 +        dtb_path = self.extract_from_deb(deb_path, '/boot/bcm2709-rpi-2-b.dtb')
 +
 +        initrd_url = ('https://github.com/groeck/linux-build-test/raw/'
 +                      '2eb0a73b5d5a28df3170c546ddaaa9757e1e0848/rootfs/'
 +                      'arm/rootfs-armv7a.cpio.gz')
 +        initrd_hash = '604b2e45cdf35045846b8bbfbf2129b1891bdc9c'
 +        initrd_path_gz = self.fetch_asset(initrd_url, asset_hash=initrd_hash)
 +        initrd_path = os.path.join(self.workdir, 'rootfs.cpio')
 +        archive.gzip_uncompress(initrd_path_gz, initrd_path)
 +
 +        self.vm.set_console()
 +        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
 +                               'earlycon=pl011,0x3f201000 console=ttyAMA0 '
 +                               'panic=-1 noreboot ' +
 +                               'dwc_otg.fiq_fsm_enable=0')
 +        self.vm.add_args('-kernel', kernel_path,
 +                         '-dtb', dtb_path,
 +                         '-initrd', initrd_path,
 +                         '-append', kernel_command_line,
 +                         '-no-reboot')
 +        self.vm.launch()
 +        self.wait_for_console_pattern('Boot successful.')
 +
 +        exec_command_and_wait_for_pattern(self, 'cat /proc/cpuinfo',
 +                                                'BCM2835')
 +        exec_command_and_wait_for_pattern(self, 'cat /proc/iomem',
 +                                                '/soc/cprman@7e101000')
 +        exec_command(self, 'halt')
 +        # Wait for VM to shut down gracefully
 +        self.vm.wait()
 +
      def test_arm_exynos4210_initrd(self):
          """
          :avocado: tags=arch:arm
 --
 .20.1

-[PULL 36/41] linux-user/elfload: Move PT_INTERP detection to first loop
+[PULL 05/24] target/arm: Check NaN mode before silencing NaN
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Joe Komlodi <joe.komlodi@xilinx.com>
-For BTI, we need to know if the executable is static or dynamic,
+If the CPU is running in default NaN mode (FPCR.DN == 1) and we execute
-which means looking for PT_INTERP earlier.
+FRSQRTE, FRECPE, or FRECPX with a signaling NaN, parts_silence_nan_frac() will
 assert due to fpst->default_nan_mode being set.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+To avoid this, we check to see what NaN mode we're running in before we call
-Message-id: 20201016184207.786698-8-richard.henderson@linaro.org
+floatxx_silence_nan().
 Signed-off-by: Joe Komlodi <joe.komlodi@xilinx.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 1624662174-175828-2-git-send-email-joe.komlodi@xilinx.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- linux-user/elfload.c | 60 +++++++++++++++++++++++---------------------
+ target/arm/helper-a64.c | 12 +++++++++---
-file changed, 31 insertions(+), 29 deletions(-)
+ target/arm/vfp_helper.c | 24 ++++++++++++++++++------
 files changed, 27 insertions(+), 9 deletions(-)
-diff --git a/linux-user/elfload.c b/linux-user/elfload.c
+diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/elfload.c
+--- a/target/arm/helper-a64.c
-+++ b/linux-user/elfload.c
++++ b/target/arm/helper-a64.c
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(frecpx_f16)(uint32_t a, void *fpstp)
+         float16 nan = a;
-     mmap_lock();
+         if (float16_is_signaling_nan(a, fpst)) {
+             float_raise(float_flag_invalid, fpst);
--    /* Find the maximum size of the image and allocate an appropriate
+-            nan = float16_silence_nan(a, fpst);
--       amount of memory to handle that.  */
++            if (!fpst->default_nan_mode) {
-+    /*
++                nan = float16_silence_nan(a, fpst);
 +     * Find the maximum size of the image and allocate an appropriate
 +     * amount of memory to handle that.  Locate the interpreter, if any.
 +     */
      loaddr = -1, hiaddr = 0;
      info->alignment = 0;
      for (i = 0; i < ehdr->e_phnum; ++i) {
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
              }
              ++info->nsegs;
              info->alignment |= eppnt->p_align;
 +        } else if (eppnt->p_type == PT_INTERP && pinterp_name) {
 +            g_autofree char *interp_name = NULL;
 +
 +            if (*pinterp_name) {
 +                errmsg = "Multiple PT_INTERP entries";
 +                goto exit_errmsg;
 +            }
-+            interp_name = g_malloc(eppnt->p_filesz);
+         }
-+            if (!interp_name) {
+         if (fpst->default_nan_mode) {
-+                goto exit_perror;
+             nan = float16_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(frecpx_f32)(float32 a, void *fpstp)
          float32 nan = a;
          if (float32_is_signaling_nan(a, fpst)) {
              float_raise(float_flag_invalid, fpst);
 -            nan = float32_silence_nan(a, fpst);
 +            if (!fpst->default_nan_mode) {
 +                nan = float32_silence_nan(a, fpst);
 +            }
-+
+         }
-+            if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) {
+         if (fpst->default_nan_mode) {
-+                memcpy(interp_name, bprm_buf + eppnt->p_offset,
+             nan = float32_default_nan(fpst);
-+                       eppnt->p_filesz);
+@@ -XXX,XX +XXX,XX @@ float64 HELPER(frecpx_f64)(float64 a, void *fpstp)
-+            } else {
+         float64 nan = a;
-+                retval = pread(image_fd, interp_name, eppnt->p_filesz,
+         if (float64_is_signaling_nan(a, fpst)) {
-+                               eppnt->p_offset);
+             float_raise(float_flag_invalid, fpst);
-+                if (retval != eppnt->p_filesz) {
+-            nan = float64_silence_nan(a, fpst);
-+                    goto exit_perror;
++            if (!fpst->default_nan_mode) {
-+                }
++                nan = float64_silence_nan(a, fpst);
 +            }
-+            if (interp_name[eppnt->p_filesz - 1] != 0) {
+         }
-+                errmsg = "Invalid PT_INTERP entry";
+         if (fpst->default_nan_mode) {
-+                goto exit_errmsg;
+             nan = float64_default_nan(fpst);
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vfp_helper.c
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, void *fpstp)
          float16 nan = f16;
          if (float16_is_signaling_nan(f16, fpst)) {
              float_raise(float_flag_invalid, fpst);
 -            nan = float16_silence_nan(f16, fpst);
 +            if (!fpst->default_nan_mode) {
 +                nan = float16_silence_nan(f16, fpst);
 +            }
-+            *pinterp_name = g_steal_pointer(&interp_name);
          }
-     }
+         if (fpst->default_nan_mode) {
+             nan =  float16_default_nan(fpst);
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
+@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, void *fpstp)
-             if (vaddr_em > info->brk) {
+         float32 nan = f32;
-                 info->brk = vaddr_em;
+         if (float32_is_signaling_nan(f32, fpst)) {
-             }
+             float_raise(float_flag_invalid, fpst);
--        } else if (eppnt->p_type == PT_INTERP && pinterp_name) {
+-            nan = float32_silence_nan(f32, fpst);
--            g_autofree char *interp_name = NULL;
++            if (!fpst->default_nan_mode) {
--
++                nan = float32_silence_nan(f32, fpst);
--            if (*pinterp_name) {
++            }
--                errmsg = "Multiple PT_INTERP entries";
+         }
--                goto exit_errmsg;
+         if (fpst->default_nan_mode) {
--            }
+             nan =  float32_default_nan(fpst);
--            interp_name = g_malloc(eppnt->p_filesz);
+@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, void *fpstp)
--            if (!interp_name) {
+         float64 nan = f64;
--                goto exit_perror;
+         if (float64_is_signaling_nan(f64, fpst)) {
--            }
+             float_raise(float_flag_invalid, fpst);
--
+-            nan = float64_silence_nan(f64, fpst);
--            if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) {
++            if (!fpst->default_nan_mode) {
--                memcpy(interp_name, bprm_buf + eppnt->p_offset,
++                nan = float64_silence_nan(f64, fpst);
--                       eppnt->p_filesz);
++            }
--            } else {
+         }
--                retval = pread(image_fd, interp_name, eppnt->p_filesz,
+         if (fpst->default_nan_mode) {
--                               eppnt->p_offset);
+             nan =  float64_default_nan(fpst);
--                if (retval != eppnt->p_filesz) {
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, void *fpstp)
--                    goto exit_perror;
+         float16 nan = f16;
--                }
+         if (float16_is_signaling_nan(f16, s)) {
--            }
+             float_raise(float_flag_invalid, s);
--            if (interp_name[eppnt->p_filesz - 1] != 0) {
+-            nan = float16_silence_nan(f16, s);
--                errmsg = "Invalid PT_INTERP entry";
++            if (!s->default_nan_mode) {
--                goto exit_errmsg;
++                nan = float16_silence_nan(f16, fpstp);
--            }
++            }
--            *pinterp_name = g_steal_pointer(&interp_name);
+         }
- #ifdef TARGET_MIPS
+         if (s->default_nan_mode) {
-         } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) {
+             nan =  float16_default_nan(s);
-             Mips_elf_abiflags_v0 abiflags;
+@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, void *fpstp)
          float32 nan = f32;
          if (float32_is_signaling_nan(f32, s)) {
              float_raise(float_flag_invalid, s);
 -            nan = float32_silence_nan(f32, s);
 +            if (!s->default_nan_mode) {
 +                nan = float32_silence_nan(f32, fpstp);
 +            }
          }
          if (s->default_nan_mode) {
              nan =  float32_default_nan(s);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, void *fpstp)
          float64 nan = f64;
          if (float64_is_signaling_nan(f64, s)) {
              float_raise(float_flag_invalid, s);
 -            nan = float64_silence_nan(f64, s);
 +            if (!s->default_nan_mode) {
 +                nan = float64_silence_nan(f64, fpstp);
 +            }
          }
          if (s->default_nan_mode) {
              nan =  float64_default_nan(s);
 --
 .20.1

-[PULL 37/41] linux-user/elfload: Use Error for load_elf_image
+[PULL 06/24] hw/gpio/gpio_pwr: use shutdown function for reboot
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Maxim Uvarov <maxim.uvarov@linaro.org>
-This is a bit clearer than open-coding some of this
+QEMU has two types of function: shutdown and reboot. The shutdown
-with a bare c string.
+function has to be used for machine shutdown. Otherwise we cause
 a reset with a bogus "cause" value, when we intended a shutdown.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20201016184207.786698-9-richard.henderson@linaro.org
+Message-id: 20210625111842.3790-3-maxim.uvarov@linaro.org
 [PMM: tweaked commit message]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- linux-user/elfload.c | 37 ++++++++++++++++++++-----------------
+ hw/gpio/gpio_pwr.c | 2 +-
-file changed, 20 insertions(+), 17 deletions(-)
+file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/linux-user/elfload.c b/linux-user/elfload.c
+diff --git a/hw/gpio/gpio_pwr.c b/hw/gpio/gpio_pwr.c
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/elfload.c
+--- a/hw/gpio/gpio_pwr.c
-+++ b/linux-user/elfload.c
++++ b/hw/gpio/gpio_pwr.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void gpio_pwr_reset(void *opaque, int n, int level)
- #include "qemu/guest-random.h"
+ static void gpio_pwr_shutdown(void *opaque, int n, int level)
- #include "qemu/units.h"
+ {
- #include "qemu/selfmap.h"
+     if (level) {
-+#include "qapi/error.h"
+-        qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
++        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
  #ifdef _ARCH_PPC64
  #undef ARCH_DLINFO
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
      struct elf_phdr *phdr;
      abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
      int i, retval;
 -    const char *errmsg;
 +    Error *err = NULL;
      /* First of all, some simple consistency checks */
 -    errmsg = "Invalid ELF image for this architecture";
      if (!elf_check_ident(ehdr)) {
 +        error_setg(&err, "Invalid ELF image for this architecture");
          goto exit_errmsg;
      }
-     bswap_ehdr(ehdr);
-     if (!elf_check_ehdr(ehdr)) {
-+        error_setg(&err, "Invalid ELF image for this architecture");
-         goto exit_errmsg;
-     }
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
-             g_autofree char *interp_name = NULL;
-             if (*pinterp_name) {
--                errmsg = "Multiple PT_INTERP entries";
-+                error_setg(&err, "Multiple PT_INTERP entries");
-                 goto exit_errmsg;
-             }
-+
-             interp_name = g_malloc(eppnt->p_filesz);
--            if (!interp_name) {
--                goto exit_perror;
--            }
-             if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) {
-                 memcpy(interp_name, bprm_buf + eppnt->p_offset,
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
-                 retval = pread(image_fd, interp_name, eppnt->p_filesz,
-                                eppnt->p_offset);
-                 if (retval != eppnt->p_filesz) {
--                    goto exit_perror;
-+                    goto exit_read;
-                 }
-             }
-             if (interp_name[eppnt->p_filesz - 1] != 0) {
--                errmsg = "Invalid PT_INTERP entry";
-+                error_setg(&err, "Invalid PT_INTERP entry");
-                 goto exit_errmsg;
-             }
-             *pinterp_name = g_steal_pointer(&interp_name);
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
-                             (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
-                             -1, 0);
-     if (load_addr == -1) {
--        goto exit_perror;
-+        goto exit_mmap;
-     }
-     load_bias = load_addr - loaddr;
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
-                                     image_fd, eppnt->p_offset - vaddr_po);
-                 if (error == -1) {
--                    goto exit_perror;
-+                    goto exit_mmap;
-                 }
-             }
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
-         } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) {
-             Mips_elf_abiflags_v0 abiflags;
-             if (eppnt->p_filesz < sizeof(Mips_elf_abiflags_v0)) {
--                errmsg = "Invalid PT_MIPS_ABIFLAGS entry";
-+                error_setg(&err, "Invalid PT_MIPS_ABIFLAGS entry");
-                 goto exit_errmsg;
-             }
-             if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) {
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
-                 retval = pread(image_fd, &abiflags, sizeof(Mips_elf_abiflags_v0),
-                                eppnt->p_offset);
-                 if (retval != sizeof(Mips_elf_abiflags_v0)) {
--                    goto exit_perror;
-+                    goto exit_read;
-                 }
-             }
-             bswap_mips_abiflags(&abiflags);
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
-  exit_read:
-     if (retval >= 0) {
--        errmsg = "Incomplete read of file header";
--        goto exit_errmsg;
-+        error_setg(&err, "Incomplete read of file header");
-+    } else {
-+        error_setg_errno(&err, errno, "Error reading file header");
-     }
-- exit_perror:
--    errmsg = strerror(errno);
-+    goto exit_errmsg;
-+ exit_mmap:
-+    error_setg_errno(&err, errno, "Error mapping file");
-+    goto exit_errmsg;
-  exit_errmsg:
--    fprintf(stderr, "%s: %s\n", image_name, errmsg);
-+    error_reportf_err(err, "%s: ", image_name);
-     exit(-1);
  }
 --
 .20.1

-[PULL 38/41] linux-user/elfload: Use Error for load_elf_interp
+[PULL 07/24] target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation
-From: Richard Henderson <richard.henderson@linaro.org>
+In do_ldst(), the calculation of the offset needs to be based on the
 size of the memory access, not the size of the elements in the
 vector.  This meant we were getting it wrong for the widening and
 narrowing variants of the various VLDR and VSTR insns.
-This is slightly clearer than just using strerror, though
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-the different forms produced by error_setg_file_open and
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-error_setg_errno isn't entirely convenient.
+Message-id: 20210628135835.6690-2-peter.maydell@linaro.org
 ---
  target/arm/translate-mve.c | 17 +++++++++--------
 file changed, 9 insertions(+), 8 deletions(-)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20201016184207.786698-10-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  linux-user/elfload.c | 15 ++++++++-------
 file changed, 8 insertions(+), 7 deletions(-)
 diff --git a/linux-user/elfload.c b/linux-user/elfload.c
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/elfload.c
+--- a/target/arm/translate-mve.c
-+++ b/linux-user/elfload.c
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static void load_elf_interp(const char *filename, struct image_info *info,
+@@ -XXX,XX +XXX,XX @@ static bool mve_skip_first_beat(DisasContext *s)
-                             char bprm_buf[BPRM_BUF_SIZE])
+     }
  }
 -static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
 +static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn,
 +                    unsigned msize)
  {
-     int fd, retval;
+     TCGv_i32 addr;
-+    Error *err = NULL;
+     uint32_t offset;
+@@ -XXX,XX +XXX,XX @@ static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
-     fd = open(path(filename), O_RDONLY);
+         return true;
      if (fd < 0) {
 -        goto exit_perror;
 +        error_setg_file_open(&err, errno, filename);
 +        error_report_err(err);
 +        exit(-1);
      }
-     retval = read(fd, bprm_buf, BPRM_BUF_SIZE);
+-    offset = a->imm << a->size;
-     if (retval < 0) {
++    offset = a->imm << msize;
--        goto exit_perror;
+     if (!a->a) {
-+        error_setg_errno(&err, errno, "Error reading file header");
+         offset = -offset;
 +        error_reportf_err(err, "%s: ", filename);
 +        exit(-1);
      }
-+
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
-     if (retval < BPRM_BUF_SIZE) {
+         { gen_helper_mve_vstrw, gen_helper_mve_vldrw },
-         memset(bprm_buf + retval, 0, BPRM_BUF_SIZE - retval);
+         { NULL, NULL }
      };
 -    return do_ldst(s, a, ldstfns[a->size][a->l]);
 +    return do_ldst(s, a, ldstfns[a->size][a->l], a->size);
  }
 -#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST)                  \
 +#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST, MSIZE)           \
      static bool trans_##OP(DisasContext *s, arg_VLDR_VSTR *a)   \
      {                                                           \
          static MVEGenLdStFn * const ldstfns[2][2] = {           \
              { gen_helper_mve_##ST, gen_helper_mve_##SLD },      \
              { NULL, gen_helper_mve_##ULD },                     \
          };                                                      \
 -        return do_ldst(s, a, ldstfns[a->u][a->l]);              \
 +        return do_ldst(s, a, ldstfns[a->u][a->l], MSIZE);       \
      }
-     load_elf_image(filename, fd, info, NULL, bprm_buf);
+-DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
--    return;
+-DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
--
+-DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
-- exit_perror:
++DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h, MO_8)
--    fprintf(stderr, "%s: %s\n", filename, strerror(errno));
++DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w, MO_8)
--    exit(-1);
++DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w, MO_16)
- }
+ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
- static int symfind(const void *s0, const void *s1)
+ {
 --
 .20.1

-[PULL 35/41] linux-user/elfload: Adjust iteration over phdr
+[PULL 08/24] target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
-From: Richard Henderson <richard.henderson@linaro.org>
+The initial implementation of the MVE VRMLALDAVH and VRMLSLDAVH
 insns had some bugs:
  * the 32x32 multiply of elements was being done as 32x32->32,
    not 32x32->64
  * we were incorrectly maintaining the accumulator in its full
    72-bit form across all 4 beats of the insn; in the pseudocode
    it is squashed back into the 64 bits of the RdaHi:RdaLo
    registers after each beat
-The second loop uses a loop induction variable, and the first
+In particular, fixing the second of these allows us to recast
-does not.  Transform the first to match the second, to simplify
+the implementation to avoid 128-bit arithmetic entirely.
 a following patch moving code between them.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Since the element size here is always 4, we can also drop the
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+parameterization of ESIZE to make the code a little more readable.
-Message-id: 20201016184207.786698-7-richard.henderson@linaro.org
 Suggested-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-3-peter.maydell@linaro.org
 ---
- linux-user/elfload.c | 9 +++++----
+ target/arm/mve_helper.c | 38 +++++++++++++++++++++-----------------
-file changed, 5 insertions(+), 4 deletions(-)
+file changed, 21 insertions(+), 17 deletions(-)
-diff --git a/linux-user/elfload.c b/linux-user/elfload.c
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/elfload.c
+--- a/target/arm/mve_helper.c
-+++ b/linux-user/elfload.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
+@@ -XXX,XX +XXX,XX @@
-     loaddr = -1, hiaddr = 0;
+  */
-     info->alignment = 0;
-     for (i = 0; i < ehdr->e_phnum; ++i) {
+ #include "qemu/osdep.h"
--        if (phdr[i].p_type == PT_LOAD) {
+-#include "qemu/int128.h"
--            abi_ulong a = phdr[i].p_vaddr - phdr[i].p_offset;
+ #include "cpu.h"
-+        struct elf_phdr *eppnt = phdr + i;
+ #include "internals.h"
-+        if (eppnt->p_type == PT_LOAD) {
+ #include "vec_internal.h"
-+            abi_ulong a = eppnt->p_vaddr - eppnt->p_offset;
+@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
-             if (a < loaddr) {
+ DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
-                 loaddr = a;
-             }
+ /*
--            a = phdr[i].p_vaddr + phdr[i].p_memsz;
+- * Rounding multiply add long dual accumulate high: we must keep
-+            a = eppnt->p_vaddr + eppnt->p_memsz;
+- * a 72-bit internal accumulator value and return the top 64 bits.
-             if (a > hiaddr) {
++ * Rounding multiply add long dual accumulate high. In the pseudocode
-                 hiaddr = a;
++ * this is implemented with a 72-bit internal accumulator value of which
-             }
++ * the top 64 bits are returned. We optimize this to avoid having to
-             ++info->nsegs;
++ * use 128-bit arithmetic -- we can do this because the 74-bit accumulator
--            info->alignment |= phdr[i].p_align;
++ * is squashed back into 64-bits after each beat.
-+            info->alignment |= eppnt->p_align;
+  */
-         }
+-#define DO_LDAVH(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC, TO128)         \
 +#define DO_LDAVH(OP, TYPE, LTYPE, XCHG, SUB)                            \
      uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
                                      void *vm, uint64_t a)               \
      {                                                                   \
          uint16_t mask = mve_element_mask(env);                          \
          unsigned e;                                                     \
          TYPE *n = vn, *m = vm;                                          \
 -        Int128 acc = int128_lshift(TO128(a), 8);                        \
 -        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
              if (mask & 1) {                                             \
 +                LTYPE mul;                                              \
                  if (e & 1) {                                            \
 -                    acc = ODDACC(acc, TO128(n[H##ESIZE(e - 1 * XCHG)] * \
 -                                            m[H##ESIZE(e)]));           \
 +                    mul = (LTYPE)n[H4(e - 1 * XCHG)] * m[H4(e)];        \
 +                    if (SUB) {                                          \
 +                        mul = -mul;                                     \
 +                    }                                                   \
                  } else {                                                \
 -                    acc = EVENACC(acc, TO128(n[H##ESIZE(e + 1 * XCHG)] * \
 -                                             m[H##ESIZE(e)]));          \
 +                    mul = (LTYPE)n[H4(e + 1 * XCHG)] * m[H4(e)];        \
                  }                                                       \
 -                acc = int128_add(acc, int128_make64(1 << 7));           \
 +                mul = (mul >> 8) + ((mul >> 7) & 1);                    \
 +                a += mul;                                               \
              }                                                           \
          }                                                               \
          mve_advance_vpt(env);                                           \
 -        return int128_getlo(int128_rshift(acc, 8));                     \
 +        return a;                                                       \
      }
+-DO_LDAVH(vrmlaldavhsw, 4, int32_t, false, int128_add, int128_add, int128_makes64)
+-DO_LDAVH(vrmlaldavhxsw, 4, int32_t, true, int128_add, int128_add, int128_makes64)
++DO_LDAVH(vrmlaldavhsw, int32_t, int64_t, false, false)
++DO_LDAVH(vrmlaldavhxsw, int32_t, int64_t, true, false)
+-DO_LDAVH(vrmlaldavhuw, 4, uint32_t, false, int128_add, int128_add, int128_make64)
++DO_LDAVH(vrmlaldavhuw, uint32_t, uint64_t, false, false)
+-DO_LDAVH(vrmlsldavhsw, 4, int32_t, false, int128_add, int128_sub, int128_makes64)
+-DO_LDAVH(vrmlsldavhxsw, 4, int32_t, true, int128_add, int128_sub, int128_makes64)
++DO_LDAVH(vrmlsldavhsw, int32_t, int64_t, false, true)
++DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
+ /* Vector add across vector */
+ #define DO_VADDV(OP, ESIZE, TYPE)                               \
 --
 .20.1
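
The per-beat rounding step in the new DO_LDAVH body, (mul >> 8) + ((mul >> 7) & 1), can be checked in isolation. The following is a minimal standalone sketch, not QEMU code: the names round_shift8() and round_shift8_ref() are hypothetical, and it assumes that '>>' on a negative int64_t is an arithmetic shift, as it is on the compilers QEMU supports. It compares the expression with the "add 1 << 7, then shift right by 8" form for a single product.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: drop the low 8 bits of a
 * product, rounding to nearest, the way the new macro body does.
 * Assumes '>>' on a negative int64_t is an arithmetic shift.
 */
static int64_t round_shift8(int64_t mul)
{
    /* Bit 7 is the most significant discarded bit: add it back in. */
    return (mul >> 8) + ((mul >> 7) & 1);
}

/* Reference form: add the rounding constant 1 << 7, then shift. */
static int64_t round_shift8_ref(int64_t mul)
{
    return (mul + (1 << 7)) >> 8;
}
```

The two forms agree for any input where mul + 128 does not overflow; the first form avoids that intermediate addition.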

-[PULL 01/41] target/arm: Fix SMLAD incorrect setting of Q bit
+[PULL 09/24] target/arm: Make asimd_imm_const() public
-The SMLAD instruction is supposed to:
+The function asimd_imm_const() in translate-neon.c is an
- * signed multiply Rn[15:0] * Rm[15:0]
+implementation of the pseudocode AdvSIMDExpandImm(), which we will
- * signed multiply Rn[31:16] * Rm[31:16]
+also want for MVE.  Move the implementation to translate.c, with a
- * perform a signed addition of the products and Ra
+prototype in translate.h.
  * set Rd to the low 32 bits of the theoretical
    infinite-precision result
  * set the Q flag if the sign-extension of Rd
    would differ from the infinite-precision result
    (ie on overflow)
 Our current implementation doesn't quite do this, though: it performs
 an addition of the products setting Q on overflow, and then it adds
 Ra, again possibly setting Q.  This sometimes incorrectly sets Q when
 the architecturally mandated only-check-for-overflow-once algorithm
 does not. For instance:
  r1 = 0x80008000; r2 = 0x80008000; r3 = 0xffffffff
  smlad r0, r1, r2, r3
 This is (-32768 * -32768) + (-32768 * -32768) - 1
 The products are both 0x4000_0000, so when added together as 32-bit
 signed numbers they overflow (and QEMU sets Q), but because the
 addition of Ra == -1 brings the total back down to 0x7fff_ffff
 there is no overflow for the complete operation and setting Q is
 incorrect.
 Fix this edge case by resorting to 64-bit arithmetic for the
 case where we need to add three values together.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201009144712.11187-1-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-4-peter.maydell@linaro.org
 ---
- target/arm/translate.c | 58 ++++++++++++++++++++++++++++++++++--------
+ target/arm/translate.h      | 16 ++++++++++
-file changed, 48 insertions(+), 10 deletions(-)
+ target/arm/translate-neon.c | 63 -------------------------------------
  target/arm/translate.c      | 57 +++++++++++++++++++++++++++++++++
 files changed, 73 insertions(+), 63 deletions(-)
+diff --git a/target/arm/translate.h b/target/arm/translate.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.h
++++ b/target/arm/translate.h
+@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
+     return opc | s->be_data;
+ }
++/**
++ * asimd_imm_const: Expand an encoded SIMD constant value
++ *
++ * Expand a SIMD constant value. This is essentially the pseudocode
++ * AdvSIMDExpandImm, except that we also perform the boolean NOT needed for
++ * VMVN and VBIC (when cmode < 14 && op == 1).
++ *
++ * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
++ * callers must catch this.
++ *
++ * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
++ * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
++ * we produce an immediate constant value of 0 in these cases.
++ */
++uint64_t asimd_imm_const(uint32_t imm, int cmode, int op);
++
+ #endif /* TARGET_ARM_TRANSLATE_H */
+diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.c
++++ b/target/arm/translate-neon.c
+@@ -XXX,XX +XXX,XX @@ DO_FP_2SH(VCVT_UH, gen_helper_gvec_vcvt_uh)
+ DO_FP_2SH(VCVT_HS, gen_helper_gvec_vcvt_hs)
+ DO_FP_2SH(VCVT_HU, gen_helper_gvec_vcvt_hu)
+-static uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
+-{
+-    /*
+-     * Expand the encoded constant.
+-     * Note that cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 is UNPREDICTABLE.
+-     * We choose to not special-case this and will behave as if a
+-     * valid constant encoding of 0 had been given.
+-     * cmode = 15 op = 1 must UNDEF; we assume decode has handled that.
+-     */
+-    switch (cmode) {
+-    case 0: case 1:
+-        /* no-op */
+-        break;
+-    case 2: case 3:
+-        imm <<= 8;
+-        break;
+-    case 4: case 5:
+-        imm <<= 16;
+-        break;
+-    case 6: case 7:
+-        imm <<= 24;
+-        break;
+-    case 8: case 9:
+-        imm |= imm << 16;
+-        break;
+-    case 10: case 11:
+-        imm = (imm << 8) | (imm << 24);
+-        break;
+-    case 12:
+-        imm = (imm << 8) | 0xff;
+-        break;
+-    case 13:
+-        imm = (imm << 16) | 0xffff;
+-        break;
+-    case 14:
+-        if (op) {
+-            /*
+-             * This is the only case where the top and bottom 32 bits
+-             * of the encoded constant differ.
+-             */
+-            uint64_t imm64 = 0;
+-            int n;
+-
+-            for (n = 0; n < 8; n++) {
+-                if (imm & (1 << n)) {
+-                    imm64 |= (0xffULL << (n * 8));
+-                }
+-            }
+-            return imm64;
+-        }
+-        imm |= (imm << 8) | (imm << 16) | (imm << 24);
+-        break;
+-    case 15:
+-        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
+-            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
+-        break;
+-    }
+-    if (op) {
+-        imm = ~imm;
+-    }
+-    return dup_const(MO_32, imm);
+-}
+-
+ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
+                         GVecGen2iFn *fn)
+ {
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static bool op_smlad(DisasContext *s, arg_rrrr *a, bool m_swap, bool sub)
+@@ -XXX,XX +XXX,XX @@ void arm_translate_init(void)
-     gen_smul_dual(t1, t2);
+     a64_translate_init();
+ }
-     if (sub) {
--        /* This subtraction cannot overflow. */
++uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
-+        /*
++{
-+         * This subtraction cannot overflow, so we can do a simple
++    /* Expand the encoded constant as per AdvSIMDExpandImm pseudocode */
-+         * 32-bit subtraction and then a possible 32-bit saturating
++    switch (cmode) {
-+         * addition of Ra.
++    case 0: case 1:
-+         */
++        /* no-op */
-         tcg_gen_sub_i32(t1, t1, t2);
++        break;
-+        tcg_temp_free_i32(t2);
++    case 2: case 3:
 +        imm <<= 8;
 +        break;
 +    case 4: case 5:
 +        imm <<= 16;
 +        break;
 +    case 6: case 7:
 +        imm <<= 24;
 +        break;
 +    case 8: case 9:
 +        imm |= imm << 16;
 +        break;
 +    case 10: case 11:
 +        imm = (imm << 8) | (imm << 24);
 +        break;
 +    case 12:
 +        imm = (imm << 8) | 0xff;
 +        break;
 +    case 13:
 +        imm = (imm << 16) | 0xffff;
 +        break;
 +    case 14:
 +        if (op) {
 +            /*
 +             * This is the only case where the top and bottom 32 bits
 +             * of the encoded constant differ.
 +             */
 +            uint64_t imm64 = 0;
 +            int n;
 +
-+        if (a->ra != 15) {
++            for (n = 0; n < 8; n++) {
-+            t2 = load_reg(s, a->ra);
++                if (imm & (1 << n)) {
-+            gen_helper_add_setq(t1, cpu_env, t1, t2);
++                    imm64 |= (0xffULL << (n * 8));
-+            tcg_temp_free_i32(t2);
++                }
 +            }
 +            return imm64;
 +        }
-+    } else if (a->ra == 15) {
++        imm |= (imm << 8) | (imm << 16) | (imm << 24);
-+        /* Single saturation-checking addition */
++        break;
-+        gen_helper_add_setq(t1, cpu_env, t1, t2);
++    case 15:
-+        tcg_temp_free_i32(t2);
++        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
-     } else {
++            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
-         /*
++        break;
--         * This addition cannot overflow 32 bits; however it may
++    }
--         * overflow considered as a signed operation, in which case
++    if (op) {
--         * we must set the Q flag.
++        imm = ~imm;
-+         * We need to add the products and Ra together and then
++    }
-+         * determine whether the final result overflowed. Doing
++    return dup_const(MO_32, imm);
-+         * this as two separate add-and-check-overflow steps incorrectly
++}
 +         * sets Q for cases like (-32768 * -32768) + (-32768 * -32768) + -1.
 +         * Do all the arithmetic at 64-bits and then check for overflow.
           */
 -        gen_helper_add_setq(t1, cpu_env, t1, t2);
 -    }
 -    tcg_temp_free_i32(t2);
 +        TCGv_i64 p64, q64;
 +        TCGv_i32 t3, qf, one;
 -    if (a->ra != 15) {
 -        t2 = load_reg(s, a->ra);
 -        gen_helper_add_setq(t1, cpu_env, t1, t2);
 +        p64 = tcg_temp_new_i64();
 +        q64 = tcg_temp_new_i64();
 +        tcg_gen_ext_i32_i64(p64, t1);
 +        tcg_gen_ext_i32_i64(q64, t2);
 +        tcg_gen_add_i64(p64, p64, q64);
 +        load_reg_var(s, t2, a->ra);
 +        tcg_gen_ext_i32_i64(q64, t2);
 +        tcg_gen_add_i64(p64, p64, q64);
 +        tcg_temp_free_i64(q64);
 +
-+        tcg_gen_extr_i64_i32(t1, t2, p64);
+ /* Generate a label used for skipping this instruction */
-+        tcg_temp_free_i64(p64);
+ void arm_gen_condlabel(DisasContext *s)
-+        /*
+ {
 +         * t1 is the low half of the result which goes into Rd.
 +         * We have overflow and must set Q if the high half (t2)
 +         * is different from the sign-extension of t1.
 +         */
 +        t3 = tcg_temp_new_i32();
 +        tcg_gen_sari_i32(t3, t1, 31);
 +        qf = load_cpu_field(QF);
 +        one = tcg_const_i32(1);
 +        tcg_gen_movcond_i32(TCG_COND_NE, qf, t2, t3, one, qf);
 +        store_cpu_field(qf, QF);
 +        tcg_temp_free_i32(one);
 +        tcg_temp_free_i32(t3);
          tcg_temp_free_i32(t2);
      }
      store_reg(s, a->rd, t1);
 --
 .20.1
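
The SMLAD edge case described in the commit message above can be reproduced with a small reference model. This is a hedged sketch, not the TCG code in the patch: smlad_model() is a hypothetical name, operand swapping is not modelled, and the narrowing casts assume the usual two's-complement behaviour. It does all the arithmetic at 64 bits and checks for overflow once.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reference model of SMLAD (no operand swap), not QEMU code.
 * Returns the low 32 bits of the sum and sets *q if the full-precision
 * sum does not fit in a signed 32-bit value. *q is sticky: this function
 * never clears it. Narrowing casts assume two's-complement wrapping.
 */
static uint32_t smlad_model(uint32_t rn, uint32_t rm, uint32_t ra, bool *q)
{
    /* Signed 16x16 products of the low and high halfwords. */
    int64_t p1 = (int64_t)(int16_t)(rn & 0xffff) * (int16_t)(rm & 0xffff);
    int64_t p2 = (int64_t)(int16_t)(rn >> 16) * (int16_t)(rm >> 16);
    /* One 64-bit sum of both products and Ra: this cannot overflow. */
    int64_t sum = p1 + p2 + (int32_t)ra;

    /* Overflow iff sign-extending the low 32 bits changes the value. */
    if (sum != (int64_t)(int32_t)(uint32_t)sum) {
        *q = true;
    }
    return (uint32_t)sum;
}
```

With r1 = r2 = 0x80008000 and r3 = 0xffffffff the model returns 0x7fffffff and leaves Q clear; with r3 = 0 the sum is 0x80000000 as a 64-bit value, which does not fit in a signed 32-bit result, so Q is set.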

-[PULL 28/41] target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16
+[PULL 10/24] target/arm: Use asimd_imm_const for A64 decode
-M-profile CPUs with half-precision floating point support should
+The A64 AdvSIMD modified-immediate grouping uses almost the same
-be able to write to FPSCR.FZ16, but an M-profile specific masking
+constant encoding that A32 Neon does; reuse asimd_imm_const() (to
-of the value at the top of vfp_set_fpscr() currently prevents that.
+which we add the AArch64-specific case for cmode 15 op 1) instead of
-This is not yet an active bug because we have no M-profile
+reimplementing it all.
 FP16 CPUs, but needs to be fixed before we can add any.
 The bits that the masking is effectively preventing from being
 set are the A-profile only short-vector Len and Stride fields,
 plus the Neon QC bit. Rearrange the order of the function so
 that those fields are handled earlier and only under a suitable
 guard; this allows us to drop the M-profile specific masking,
 making FZ16 writeable.
 This change also makes the QC bit correctly RAZ/WI for older
 no-Neon A-profile cores.
 This refactoring also paves the way for the low-overhead-branch
 LTPSIZE field, which uses some of the bits that are used for
 A-profile Stride and Len.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201019151301.2046-10-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-5-peter.maydell@linaro.org
 ---
- target/arm/vfp_helper.c | 47 ++++++++++++++++++++++++-----------------
+ target/arm/translate.h     |  3 +-
-file changed, 28 insertions(+), 19 deletions(-)
+ target/arm/translate-a64.c | 86 ++++----------------------------------
  target/arm/translate.c     | 17 +++++++-
 files changed, 24 insertions(+), 82 deletions(-)
-diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp_helper.c
+--- a/target/arm/translate.h
-+++ b/target/arm/vfp_helper.c
++++ b/target/arm/translate.h
-@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
+@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
-         val &= ~FPCR_FZ16;
+  * VMVN and VBIC (when cmode < 14 && op == 1).
   *
   * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
 - * callers must catch this.
 + * callers must catch this; we return the 64-bit constant value defined
 + * for AArch64.
   *
   * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
   * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
  {
      int rd = extract32(insn, 0, 5);
      int cmode = extract32(insn, 12, 4);
 -    int cmode_3_1 = extract32(cmode, 1, 3);
 -    int cmode_0 = extract32(cmode, 0, 1);
      int o2 = extract32(insn, 11, 1);
      uint64_t abcdefgh = extract32(insn, 5, 5) | (extract32(insn, 16, 3) << 5);
      bool is_neg = extract32(insn, 29, 1);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
          return;
      }
--    if (arm_feature(env, ARM_FEATURE_M)) {
+-    /* See AdvSIMDExpandImm() in ARM ARM */
-+    vfp_set_fpscr_to_host(env, val);
+-    switch (cmode_3_1) {
-+
+-    case 0: /* Replicate(Zeros(24):imm8, 2) */
-+    if (!arm_feature(env, ARM_FEATURE_M)) {
+-    case 1: /* Replicate(Zeros(16):imm8:Zeros(8), 2) */
-         /*
+-    case 2: /* Replicate(Zeros(8):imm8:Zeros(16), 2) */
--         * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
+-    case 3: /* Replicate(imm8:Zeros(24), 2) */
--         * and also for the trapped-exception-handling bits IxE.
+-    {
-+         * Short-vector length and stride; on M-profile these bits
+-        int shift = cmode_3_1 * 8;
-+         * are used for different purposes.
+-        imm = bitfield_replicate(abcdefgh << shift, 32);
-+         * We can't make this conditional be "if MVFR0.FPShVec != 0",
+-        break;
-+         * because in v7A no-short-vector-support cores still had to
+-    }
-+         * allow Stride/Len to be written with the only effect that
+-    case 4: /* Replicate(Zeros(8):imm8, 4) */
-+         * some insns are required to UNDEF if the guest sets them.
+-    case 5: /* Replicate(imm8:Zeros(8), 4) */
-+         *
+-    {
-+         * TODO: if M-profile MVE implemented, set LTPSIZE.
+-        int shift = (cmode_3_1 & 0x1) * 8;
-          */
+-        imm = bitfield_replicate(abcdefgh << shift, 16);
--        val &= 0xf7c0009f;
+-        break;
-+        env->vfp.vec_len = extract32(val, 16, 3);
+-    }
-+        env->vfp.vec_stride = extract32(val, 20, 2);
+-    case 6:
 -        if (cmode_0) {
 -            /* Replicate(Zeros(8):imm8:Ones(16), 2) */
 -            imm = (abcdefgh << 16) | 0xffff;
 -        } else {
 -            /* Replicate(Zeros(16):imm8:Ones(8), 2) */
 -            imm = (abcdefgh << 8) | 0xff;
 -        }
 -        imm = bitfield_replicate(imm, 32);
 -        break;
 -    case 7:
 -        if (!cmode_0 && !is_neg) {
 -            imm = bitfield_replicate(abcdefgh, 8);
 -        } else if (!cmode_0 && is_neg) {
 -            int i;
 -            imm = 0;
 -            for (i = 0; i < 8; i++) {
 -                if ((abcdefgh) & (1 << i)) {
 -                    imm |= 0xffULL << (i * 8);
 -                }
 -            }
 -        } else if (cmode_0) {
 -            if (is_neg) {
 -                imm = (abcdefgh & 0x3f) << 48;
 -                if (abcdefgh & 0x80) {
 -                    imm |= 0x8000000000000000ULL;
 -                }
 -                if (abcdefgh & 0x40) {
 -                    imm |= 0x3fc0000000000000ULL;
 -                } else {
 -                    imm |= 0x4000000000000000ULL;
 -                }
 -            } else {
 -                if (o2) {
 -                    /* FMOV (vector, immediate) - half-precision */
 -                    imm = vfp_expand_imm(MO_16, abcdefgh);
 -                    /* now duplicate across the lanes */
 -                    imm = bitfield_replicate(imm, 16);
 -                } else {
 -                    imm = (abcdefgh & 0x3f) << 19;
 -                    if (abcdefgh & 0x80) {
 -                        imm |= 0x80000000;
 -                    }
 -                    if (abcdefgh & 0x40) {
 -                        imm |= 0x3e000000;
 -                    } else {
 -                        imm |= 0x40000000;
 -                    }
 -                    imm |= (imm << 32);
 -                }
 -            }
 -        }
 -        break;
 -    default:
 -        g_assert_not_reached();
 -    }
 -
 -    if (cmode_3_1 != 7 && is_neg) {
 -        imm = ~imm;
 +    if (cmode == 15 && o2 && !is_neg) {
 +        /* FMOV (vector, immediate) - half-precision */
 +        imm = vfp_expand_imm(MO_16, abcdefgh);
 +        /* now duplicate across the lanes */
 +        imm = bitfield_replicate(imm, 16);
 +    } else {
 +        imm = asimd_imm_const(abcdefgh, cmode, is_neg);
      }
--    vfp_set_fpscr_to_host(env, val);
+     if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) {
-+    if (arm_feature(env, ARM_FEATURE_NEON)) {
+diff --git a/target/arm/translate.c b/target/arm/translate.c
-+        /*
+index XXXXXXX..XXXXXXX 100644
-+         * The bit we set within fpscr_q is arbitrary; the register as a
+--- a/target/arm/translate.c
-+         * whole being zero/non-zero is what counts.
++++ b/target/arm/translate.c
-+         * TODO: M-profile MVE also has a QC bit.
+@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
-+         */
+     case 14:
-+        env->vfp.qc[0] = val & FPCR_QC;
+         if (op) {
-+        env->vfp.qc[1] = 0;
+             /*
-+        env->vfp.qc[2] = 0;
+-             * This is the only case where the top and bottom 32 bits
-+        env->vfp.qc[3] = 0;
+-             * of the encoded constant differ.
-+    }
++             * This and cmode == 15 op == 1 are the only cases where
++             * the top and bottom 32 bits of the encoded constant differ.
-     /*
+              */
-      * We don't implement trapped exception handling, so the
+             uint64_t imm64 = 0;
-      * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
+             int n;
-      *
+@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
--     * If we exclude the exception flags, IOC|DZC|OFC|UFC|IXC|IDC
+         imm |= (imm << 8) | (imm << 16) | (imm << 24);
--     * (which are stored in fp_status), and the other RES0 bits
+         break;
--     * in between, then we clear all of the low 16 bits.
+     case 15:
-+     * The exception flags IOC|DZC|OFC|UFC|IXC|IDC are stored in
++        if (op) {
-+     * fp_status; QC, Len and Stride are stored separately earlier.
++            /* Reserved encoding for AArch32; valid for AArch64 */
-+     * Clear out all of those and the RES0 bits: only NZCV, AHP, DN,
++            uint64_t imm64 = (uint64_t)(imm & 0x3f) << 48;
-+     * FZ, RMode and FZ16 are kept in vfp.xregs[FPSCR].
++            if (imm & 0x80) {
-      */
++                imm64 |= 0x8000000000000000ULL;
-     env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
++            }
--    env->vfp.vec_len = (val >> 16) & 7;
++            if (imm & 0x40) {
--    env->vfp.vec_stride = (val >> 20) & 3;
++                imm64 |= 0x3fc0000000000000ULL;
--
++            } else {
--    /*
++                imm64 |= 0x4000000000000000ULL;
--     * The bit we set within fpscr_q is arbitrary; the register as a
++            }
--     * whole being zero/non-zero is what counts.
++            return imm64;
--     */
++        }
--    env->vfp.qc[0] = val & FPCR_QC;
+         imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
--    env->vfp.qc[1] = 0;
+             | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
--    env->vfp.qc[2] = 0;
+         break;
 -    env->vfp.qc[3] = 0;
  }
  void vfp_set_fpscr(CPUARMState *env, uint32_t val)
 --
 .20.1
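
The two encodings where the top and bottom 32 bits of the expanded constant differ (cmode == 14 op == 1, and the AArch64-only cmode == 15 op == 1) can be exercised on their own. The sketch below lifts the arithmetic from the patch into two standalone functions; the names expand_byte_mask() and expand_fp64() are hypothetical and are not QEMU functions.

```c
#include <stdint.h>

/*
 * cmode == 14, op == 1: each bit n of imm8 selects whether byte n
 * of the 64-bit result is 0xff or 0x00.
 */
static uint64_t expand_byte_mask(uint32_t imm8)
{
    uint64_t imm64 = 0;

    for (int n = 0; n < 8; n++) {
        if (imm8 & (1u << n)) {
            imm64 |= 0xffULL << (n * 8);
        }
    }
    return imm64;
}

/*
 * cmode == 15, op == 1 (valid for AArch64 only): expand imm8 into a
 * 64-bit double-precision bit pattern.
 */
static uint64_t expand_fp64(uint32_t imm8)
{
    /* Low six bits become the top of the exponent/fraction field. */
    uint64_t imm64 = (uint64_t)(imm8 & 0x3f) << 48;

    if (imm8 & 0x80) {
        imm64 |= 0x8000000000000000ULL;   /* sign bit */
    }
    if (imm8 & 0x40) {
        imm64 |= 0x3fc0000000000000ULL;   /* exponent 0b0_1111_1111_xx */
    } else {
        imm64 |= 0x4000000000000000ULL;   /* exponent 0b1_0000_0000_xx */
    }
    return imm64;
}
```

For example, imm8 = 0x70 expands to 0x3ff0000000000000, the bit pattern of the double 1.0, and imm8 = 0x00 expands to 0x4000000000000000, which is 2.0.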

-[PULL 31/41] linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI
+[PULL 11/24] target/arm: Use dup_const() instead of bitfield_replicate()
-From: Richard Henderson <richard.henderson@linaro.org>
+Use dup_const() instead of bitfield_replicate() in
 disas_simd_mod_imm().
-Transform the prot bit to a qemu internal page bit, and save
+(We can't replace the other use of bitfield_replicate() in this file,
-it in the page tables.
+in logic_imm_decode_wmask(), because that location needs to handle 2
 and 4 bit elements, which dup_const() cannot.)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201016184207.786698-3-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-6-peter.maydell@linaro.org
 ---
- include/exec/cpu-all.h     |  2 ++
+ target/arm/translate-a64.c | 2 +-
- linux-user/syscall_defs.h  |  4 ++++
+file changed, 1 insertion(+), 1 deletion(-)
  target/arm/cpu.h           |  5 +++++
  linux-user/mmap.c          | 16 ++++++++++++++++
  target/arm/translate-a64.c |  6 +++---
 files changed, 30 insertions(+), 3 deletions(-)
-diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/exec/cpu-all.h
-+++ b/include/exec/cpu-all.h
-@@ -XXX,XX +XXX,XX @@ extern intptr_t qemu_host_page_mask;
- /* FIXME: Code that sets/uses this is broken and needs to go away.  */
- #define PAGE_RESERVED  0x0020
- #endif
-+/* Target-specific bits that will be used via page_get_flags().  */
-+#define PAGE_TARGET_1  0x0080
- #if defined(CONFIG_USER_ONLY)
- void page_dump(FILE *f);
-diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
-index XXXXXXX..XXXXXXX 100644
---- a/linux-user/syscall_defs.h
-+++ b/linux-user/syscall_defs.h
-@@ -XXX,XX +XXX,XX @@ struct target_winsize {
- #define TARGET_PROT_SEM         0x08
- #endif
-+#ifdef TARGET_AARCH64
-+#define TARGET_PROT_BTI         0x10
-+#endif
-+
- /* Common */
- #define TARGET_MAP_SHARED    0x01        /* Share changes */
- #define TARGET_MAP_PRIVATE    0x02        /* Changes are private */
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
-+++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline MemTxAttrs *typecheck_memtxattrs(MemTxAttrs *x)
- #define arm_tlb_bti_gp(x) (typecheck_memtxattrs(x)->target_tlb_bit0)
- #define arm_tlb_mte_tagged(x) (typecheck_memtxattrs(x)->target_tlb_bit1)
-+/*
-+ * AArch64 usage of the PAGE_TARGET_* bits for linux-user.
-+ */
-+#define PAGE_BTI  PAGE_TARGET_1
-+
- /*
-  * Naming convention for isar_feature functions:
-  * Functions which test 32-bit ID registers should have _aa32_ in
-diff --git a/linux-user/mmap.c b/linux-user/mmap.c
-index XXXXXXX..XXXXXXX 100644
---- a/linux-user/mmap.c
-+++ b/linux-user/mmap.c
-@@ -XXX,XX +XXX,XX @@ static int validate_prot_to_pageflags(int *host_prot, int prot)
-     *host_prot = (prot & (PROT_READ | PROT_WRITE))
-                | (prot & PROT_EXEC ? PROT_READ : 0);
-+#ifdef TARGET_AARCH64
-+    /*
-+     * The PROT_BTI bit is only accepted if the cpu supports the feature.
-+     * Since this is the unusual case, don't bother checking unless
-+     * the bit has been requested.  If set and valid, record the bit
-+     * within QEMU's page_flags.
-+     */
-+    if (prot & TARGET_PROT_BTI) {
-+        ARMCPU *cpu = ARM_CPU(thread_cpu);
-+        if (cpu_isar_feature(aa64_bti, cpu)) {
-+            valid |= TARGET_PROT_BTI;
-+            page_flags |= PAGE_BTI;
-+        }
-+    }
-+#endif
-+
-     return prot & ~valid ? 0 : page_flags;
- }
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
-  */
+         /* FMOV (vector, immediate) - half-precision */
- static bool is_guarded_page(CPUARMState *env, DisasContext *s)
+         imm = vfp_expand_imm(MO_16, abcdefgh);
- {
+         /* now duplicate across the lanes */
--#ifdef CONFIG_USER_ONLY
+-        imm = bitfield_replicate(imm, 16);
--    return false;  /* FIXME */
++        imm = dup_const(MO_16, imm);
--#else
+     } else {
-     uint64_t addr = s->base.pc_first;
+         imm = asimd_imm_const(abcdefgh, cmode, is_neg);
-+#ifdef CONFIG_USER_ONLY
+     }
 +    return page_get_flags(addr) & PAGE_BTI;
 +#else
      int mmu_idx = arm_to_core_mmu_idx(s->mmu_idx);
      unsigned int index = tlb_index(env, mmu_idx, addr);
      CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
 --
 .20.1
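
For the 16-bit case used in disas_simd_mod_imm(), replicating an element across a 64-bit value amounts to a multiply by a lane pattern. The sketch below is an illustration under that assumption, not the QEMU implementation of dup_const(); the name dup16() is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for dup_const(MO_16, x), not QEMU code:
 * copy the low 16 bits of x into all four 16-bit lanes of the result.
 */
static uint64_t dup16(uint64_t x)
{
    /* Multiplying by 0x0001000100010001 places one copy in each lane. */
    return (x & 0xffff) * 0x0001000100010001ULL;
}
```

For instance, dup16(0x3c00) gives 0x3c003c003c003c00, four copies of the half-precision bit pattern for 1.0. This multiply trick only works for element sizes of a byte or larger, which is consistent with the note above that the 2-bit and 4-bit elements in logic_imm_decode_wmask() still need bitfield_replicate().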

-[PULL 39/41] linux-user/elfload: Parse NT_GNU_PROPERTY_TYPE_0 notes
+[PULL 12/24] target/arm: Implement MVE logical immediate insns
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE logical-immediate insns (VMOV, VMVN,
 VORR and VBIC). These have essentially the same encoding
 as their Neon equivalents, and we implement the decode
 in the same way.
-This is generic support, with the code disabled for all targets.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-7-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  4 +++
  target/arm/mve.decode      | 17 +++++++++++++
  target/arm/mve_helper.c    | 24 ++++++++++++++++++
  target/arm/translate-mve.c | 50 ++++++++++++++++++++++++++++++++++++++
 files changed, 95 insertions(+)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Message-id: 20201016184207.786698-11-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  linux-user/qemu.h    |   4 ++
  linux-user/elfload.c | 157 +++++++++++++++++++++++++++++++++++++++++++
 files changed, 161 insertions(+)
 diff --git a/linux-user/qemu.h b/linux-user/qemu.h
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/qemu.h
+--- a/target/arm/helper-mve.h
-+++ b/linux-user/qemu.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ struct image_info {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvsh, TCG_CALL_NO_WG, i32, env, ptr, i32)
-         abi_ulong       interpreter_loadmap_addr;
+ DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
-         abi_ulong       interpreter_pt_dynamic_addr;
+ DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
-         struct image_info *other_info;
+ DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 +
-+        /* For target-specific processing of NT_GNU_PROPERTY_TYPE_0. */
++DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
-+        uint32_t        note_flags;
++DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
 +DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
  # VQDMULL has size in bit 28: 0 for 16 bit, 1 for 32 bit
  %size_28 28:1 !function=plus_1
 +# 1imm format immediate
 +%imm_28_16_0 28:1 16:3 0:4
 +
- #ifdef TARGET_MIPS
+ &vldr_vstr rn qd imm p a w size l u
-         int             fp_abi;
+ &1op qd qm size
-         int             interp_fp_abi;
+ &2op qd qm qn size
-diff --git a/linux-user/elfload.c b/linux-user/elfload.c
+ &2scalar qd qn rm size
 +&1imm qd imm cmode op
  @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
  # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
  @2op_nosz .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn size=0
  @2op_sz28 .... .... .... .... .... .... .... .... &2op qd=%qd qm=%qm qn=%qn \
       size=%size_28
 +@1imm .... .... .... .... .... cmode:4 .. op:1 . .... &1imm qd=%qd imm=%imm_28_16_0
  # The _rev suffix indicates that Vn and Vm are reversed. This is
  # the case for shifts. In the Arm ARM these insns are documented
@@ -XXX,XX +XXX,XX @@ VADDV            111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rd
  # Predicate operations
  %mask_22_13      22:1 13:3
  VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 +
 +# Logical immediate operations (1 reg and modified-immediate)
 +
 +# The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
 +# not in a way we can conveniently represent in decodetree without
 +# a lot of repetition:
 +# VORR: op=0, (cmode & 1) && cmode < 12
 +# VBIC: op=1, (cmode & 1) && cmode < 12
 +# VMOV: everything else
 +# So we have a single decode line and check the cmode/op in the
 +# trans function.
 +Vimm_1r 111 . 1111 1 . 00 0 ... ... 0 .... 0 1 . 1 .... @1imm
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/elfload.c
+--- a/target/arm/mve_helper.c
-+++ b/linux-user/elfload.c
++++ b/target/arm/mve_helper.c
-@@ -XXX,XX +XXX,XX @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
+@@ -XXX,XX +XXX,XX @@ DO_1OP(vnegw, 4, int32_t, DO_NEG)
+ DO_1OP(vfnegh, 8, uint64_t, DO_FNEGH)
- #include "elf.h"
+ DO_1OP(vfnegs, 8, uint64_t, DO_FNEGS)
 +static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz,
 +                                    const uint32_t *data,
 +                                    struct image_info *info,
 +                                    Error **errp)
 +{
 +    g_assert_not_reached();
 +}
 +#define ARCH_USE_GNU_PROPERTY 0
 +
  struct exec
  {
      unsigned int a_info;   /* Use macros N_MAGIC, etc for access */
@@ -XXX,XX +XXX,XX @@ void probe_guest_base(const char *image_name, abi_ulong guest_loaddr,
                    "@ 0x%" PRIx64 "\n", (uint64_t)guest_base);
  }
 +enum {
 +    /* The string "GNU\0" as a magic number. */
 +    GNU0_MAGIC = const_le32('G' | 'N' << 8 | 'U' << 16),
 +    NOTE_DATA_SZ = 1 * KiB,
 +    NOTE_NAME_SZ = 4,
 +    ELF_GNU_PROPERTY_ALIGN = ELF_CLASS == ELFCLASS32 ? 4 : 8,
 +};
 +
 +/*
-+ * Process a single gnu_property entry.
++ * 1 operand immediates: Vda is destination and possibly also one source.
-+ * Return false for error.
++ * All these insns work at 64-bit widths.
 + */
-+static bool parse_elf_property(const uint32_t *data, int *off, int datasz,
++#define DO_1OP_IMM(OP, FN)                                              \
-+                               struct image_info *info, bool have_prev_type,
++    void HELPER(mve_##OP)(CPUARMState *env, void *vda, uint64_t imm)    \
-+                               uint32_t *prev_type, Error **errp)
++    {                                                                   \
-+{
++        uint64_t *da = vda;                                             \
-+    uint32_t pr_type, pr_datasz, step;
++        uint16_t mask = mve_element_mask(env);                          \
-+
++        unsigned e;                                                     \
-+    if (*off > datasz || !QEMU_IS_ALIGNED(*off, ELF_GNU_PROPERTY_ALIGN)) {
++        for (e = 0; e < 16 / 8; e++, mask >>= 8) {                      \
-+        goto error_data;
++            mergemask(&da[H8(e)], FN(da[H8(e)], imm), mask);            \
-+    }
++        }                                                               \
-+    datasz -= *off;
++        mve_advance_vpt(env);                                           \
 +    data += *off / sizeof(uint32_t);
 +
 +    if (datasz < 2 * sizeof(uint32_t)) {
 +        goto error_data;
 +    }
 +    pr_type = data[0];
 +    pr_datasz = data[1];
 +    data += 2;
 +    datasz -= 2 * sizeof(uint32_t);
 +    step = ROUND_UP(pr_datasz, ELF_GNU_PROPERTY_ALIGN);
 +    if (step > datasz) {
 +        goto error_data;
 +    }
 +
-+    /* Properties are supposed to be unique and sorted on pr_type. */
++#define DO_MOVI(N, I) (I)
-+    if (have_prev_type && pr_type <= *prev_type) {
++#define DO_ANDI(N, I) ((N) & (I))
-+        if (pr_type == *prev_type) {
++#define DO_ORRI(N, I) ((N) | (I))
-+            error_setg(errp, "Duplicate property in PT_GNU_PROPERTY");
++
-+        } else {
++DO_1OP_IMM(vmovi, DO_MOVI)
-+            error_setg(errp, "Unsorted property in PT_GNU_PROPERTY");
++DO_1OP_IMM(vandi, DO_ANDI)
-+        }
++DO_1OP_IMM(vorri, DO_ORRI)
 +
  #define DO_2OP(OP, ESIZE, TYPE, FN)                                     \
      void HELPER(glue(mve_, OP))(CPUARMState *env,                       \
                                  void *vd, void *vn, void *vm)           \
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
  typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
  /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
  static inline long mve_qreg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
      mve_update_eci(s);
      return true;
  }
 +
 +static bool do_1imm(DisasContext *s, arg_1imm *a, MVEGenOneOpImmFn *fn)
 +{
 +    TCGv_ptr qd;
 +    uint64_t imm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd) ||
 +        !fn) {
 +        return false;
 +    }
-+    *prev_type = pr_type;
++    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +
 +    if (!arch_parse_elf_property(pr_type, pr_datasz, data, info, errp)) {
 +        return false;
 +    }
 +
 +    *off += 2 * sizeof(uint32_t) + step;
 +    return true;
 +
 + error_data:
 +    error_setg(errp, "Ill-formed property in PT_GNU_PROPERTY");
 +    return false;
 +}
 +
 +/* Process NT_GNU_PROPERTY_TYPE_0. */
 +static bool parse_elf_properties(int image_fd,
 +                                 struct image_info *info,
 +                                 const struct elf_phdr *phdr,
 +                                 char bprm_buf[BPRM_BUF_SIZE],
 +                                 Error **errp)
 +{
 +    union {
 +        struct elf_note nhdr;
 +        uint32_t data[NOTE_DATA_SZ / sizeof(uint32_t)];
 +    } note;
 +
 +    int n, off, datasz;
 +    bool have_prev_type;
 +    uint32_t prev_type;
 +
 +    /* Unless the arch requires properties, ignore them. */
 +    if (!ARCH_USE_GNU_PROPERTY) {
 +        return true;
 +    }
 +
-+    /* If the properties are crazy large, that's too bad. */
++    imm = asimd_imm_const(a->imm, a->cmode, a->op);
 +    n = phdr->p_filesz;
 +    if (n > sizeof(note)) {
 +        error_setg(errp, "PT_GNU_PROPERTY too large");
 +        return false;
 +    }
 +    if (n < sizeof(note.nhdr)) {
 +        error_setg(errp, "PT_GNU_PROPERTY too small");
 +        return false;
 +    }
 +
-+    if (phdr->p_offset + n <= BPRM_BUF_SIZE) {
++    qd = mve_qreg_ptr(a->qd);
-+        memcpy(&note, bprm_buf + phdr->p_offset, n);
++    fn(cpu_env, qd, tcg_constant_i64(imm));
 +    tcg_temp_free_ptr(qd);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +static bool trans_Vimm_1r(DisasContext *s, arg_1imm *a)
 +{
 +    /* Handle decode of cmode/op here between VORR/VBIC/VMOV */
 +    MVEGenOneOpImmFn *fn;
 +
 +    if ((a->cmode & 1) && a->cmode < 12) {
 +        if (a->op) {
 +            /*
 +             * For op=1, the immediate will be inverted by asimd_imm_const(),
 +             * so the VBIC becomes a logical AND operation.
 +             */
 +            fn = gen_helper_mve_vandi;
 +        } else {
 +            fn = gen_helper_mve_vorri;
 +        }
 +    } else {
-+        ssize_t len = pread(image_fd, &note, n, phdr->p_offset);
++        /* There is one unallocated cmode/op combination in this space */
-+        if (len != n) {
++        if (a->cmode == 15 && a->op == 1) {
 +            error_setg_errno(errp, errno, "Error reading file header");
 +            return false;
 +        }
++        /* asimd_imm_const() sorts out VMVNI vs VMOVI for us */
++        fn = gen_helper_mve_vmovi;
 +    }
-+
++    return do_1imm(s, a, fn);
 +    /*
 +     * The contents of a valid PT_GNU_PROPERTY is a sequence
 +     * of uint32_t -- swap them all now.
 +     */
 +#ifdef BSWAP_NEEDED
 +    for (int i = 0; i < n / 4; i++) {
 +        bswap32s(note.data + i);
 +    }
 +#endif
 +
 +    /*
 +     * Note that nhdr is 3 words, and that the "name" described by namesz
 +     * immediately follows nhdr and is thus at the 4th word.  Further, all
 +     * of the inputs to the kernel's round_up are multiples of 4.
 +     */
 +    if (note.nhdr.n_type != NT_GNU_PROPERTY_TYPE_0 ||
 +        note.nhdr.n_namesz != NOTE_NAME_SZ ||
 +        note.data[3] != GNU0_MAGIC) {
 +        error_setg(errp, "Invalid note in PT_GNU_PROPERTY");
 +        return false;
 +    }
 +    off = sizeof(note.nhdr) + NOTE_NAME_SZ;
 +
 +    datasz = note.nhdr.n_descsz + off;
 +    if (datasz > n) {
 +        error_setg(errp, "Invalid note size in PT_GNU_PROPERTY");
 +        return false;
 +    }
 +
 +    have_prev_type = false;
 +    prev_type = 0;
 +    while (1) {
 +        if (off == datasz) {
 +            return true;  /* end, exit ok */
 +        }
 +        if (!parse_elf_property(note.data, &off, datasz, info,
 +                                have_prev_type, &prev_type, errp)) {
 +            return false;
 +        }
 +        have_prev_type = true;
 +    }
 +}
-+
- /* Load an ELF image into the address space.
-    IMAGE_NAME is the filename of the image, to use in error messages.
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
-                 goto exit_errmsg;
-             }
-             *pinterp_name = g_steal_pointer(&interp_name);
-+        } else if (eppnt->p_type == PT_GNU_PROPERTY) {
-+            if (!parse_elf_properties(image_fd, info, eppnt, bprm_buf, &err)) {
-+                goto exit_errmsg;
-+            }
-         }
-     }
 --
 .20.1
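
Note on the patch above: the mve.decode comment says the cmode/op bits select VORR, VBIC or VMOV, and that the choice is made in the trans function rather than in decodetree. The following is a minimal standalone sketch of that selection, not the QEMU code itself. The enum `ImmOp` and the function `classify_1imm` are hypothetical names introduced only for illustration.

```c
#include <assert.h>

/* Hypothetical classification result; not a QEMU type. */
typedef enum {
    IMM_VORR,
    IMM_VBIC,
    IMM_VMOV,          /* also covers VMVN, which differs only in the immediate */
    IMM_UNALLOCATED,
} ImmOp;

/*
 * Classify a 1-register modified-immediate insn from its 4-bit cmode
 * and 1-bit op fields, following the rules listed in the decode comment:
 *   VORR: op=0, (cmode & 1) && cmode < 12
 *   VBIC: op=1, (cmode & 1) && cmode < 12
 *   VMOV: everything else, except the one unallocated combination.
 */
static ImmOp classify_1imm(unsigned cmode, unsigned op)
{
    assert(cmode < 16 && op < 2);

    if ((cmode & 1) && cmode < 12) {
        return op ? IMM_VBIC : IMM_VORR;
    }
    if (cmode == 15 && op == 1) {
        return IMM_UNALLOCATED;
    }
    return IMM_VMOV;
}
```

For example, `classify_1imm(1, 1)` yields `IMM_VBIC`, while `classify_1imm(12, 1)` falls through to `IMM_VMOV` because cmode is not below 12.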

-[PULL 29/41] target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension
+[PULL 13/24] target/arm: Implement MVE vector shift left by immediate insns
-If the M-profile low-overhead-branch extension is implemented, FPSCR
+Implement the MVE shift-vector-left-by-immediate insns VSHL, VQSHL
-bits [18:16] are a new field LTPSIZE.  If MVE is not implemented
+and VQSHLU.
-(currently always true for us) then this field always reads as 4 and
-ignores writes.
+The size-and-immediate encoding here is the same as Neon, and we
+handle it the same way neon-dp.decode does.
 These bits used to be the vector-length field for the old
 short-vector extension, so we need to take care that they are not
 misinterpreted as setting vec_len. We do this with a rearrangement
 of the vfp_set_fpscr() code that deals with vec_len, vec_stride
 and also the QC bit; this obviates the need for the M-profile
 only masking step that we used to have at the start of the function.
 We provide a new field in CPUState for LTPSIZE, even though this
 will always be 4, in preparation for MVE, so we don't have to
 come back later and split it out of the vfp.xregs[FPSCR] value.
 (This state struct field will be saved and restored as part of
 the FPSCR value via the vmstate_fpscr in machine.c.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201019151301.2046-11-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-8-peter.maydell@linaro.org
 ---
- target/arm/cpu.h        | 1 +
+ target/arm/helper-mve.h    | 16 +++++++++++
- target/arm/cpu.c        | 9 +++++++++
+ target/arm/mve.decode      | 23 +++++++++++++++
- target/arm/vfp_helper.c | 6 ++++++
+ target/arm/mve_helper.c    | 57 ++++++++++++++++++++++++++++++++++++++
-files changed, 16 insertions(+)
+ target/arm/translate-mve.c | 51 ++++++++++++++++++++++++++++++++++
+files changed, 147 insertions(+)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-index XXXXXXX..XXXXXXX 100644
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
---- a/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
-+++ b/target/arm/cpu.h
+--- a/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
++++ b/target/arm/helper-mve.h
-         uint32_t fpdscr[M_REG_NUM_BANKS];
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
-         uint32_t cpacr[M_REG_NUM_BANKS];
+ DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
-         uint32_t nsacr;
+ DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
-+        int ltpsize;
+ DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
-     } v7m;
++
++DEF_HELPER_FLAGS_4(mve_vshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     /* Information associated with an exception about to be taken:
++DEF_HELPER_FLAGS_4(mve_vshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
++DEF_HELPER_FLAGS_4(mve_vshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-index XXXXXXX..XXXXXXX 100644
++
---- a/target/arm/cpu.c
++DEF_HELPER_FLAGS_4(mve_vqshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+++ b/target/arm/cpu.c
++DEF_HELPER_FLAGS_4(mve_vqshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
++DEF_HELPER_FLAGS_4(mve_vqshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-         uint8_t *rom;
++
-         uint32_t vecbase;
++DEF_HELPER_FLAGS_4(mve_vqshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vqshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        if (cpu_isar_feature(aa32_lob, cpu)) {
++DEF_HELPER_FLAGS_4(mve_vqshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+            /*
++
-+             * LTPSIZE is constant 4 if MVE not implemented, and resets
++DEF_HELPER_FLAGS_4(mve_vqshlui_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+             * to an UNKNOWN value if MVE is implemented. We choose to
++DEF_HELPER_FLAGS_4(mve_vqshlui_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+             * always reset to 4.
++DEF_HELPER_FLAGS_4(mve_vqshlui_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+             */
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-+            env->v7m.ltpsize = 4;
+index XXXXXXX..XXXXXXX 100644
-+        }
+--- a/target/arm/mve.decode
-+
++++ b/target/arm/mve.decode
-         if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
+@@ -XXX,XX +XXX,XX @@
-             env->v7m.secure = true;
+ &2op qd qm qn size
-         } else {
+ &2scalar qd qn rm size
-diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+ &1imm qd imm cmode op
-index XXXXXXX..XXXXXXX 100644
++&2shift qd qm shift size
---- a/target/arm/vfp_helper.c
-+++ b/target/arm/vfp_helper.c
+ @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
+ # Note that both Rn and Qd are 3 bits only (no D bit)
-             | (env->vfp.vec_len << 16)
+@@ -XXX,XX +XXX,XX @@
-             | (env->vfp.vec_stride << 20);
+ @2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
+ @2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 +@2_shl_b .... .... .. 001 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
 +@2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
 +@2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
 +
  # Vector loads and stores
  # Widening loads and narrowing stores:
@@ -XXX,XX +XXX,XX @@ VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
  # So we have a single decode line and check the cmode/op in the
  # trans function.
  Vimm_1r 111 . 1111 1 . 00 0 ... ... 0 .... 0 1 . 1 .... @1imm
 +
 +# Shifts by immediate
 +
 +VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
 +VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
 +VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
 +
 +VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
 +VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
 +VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
 +
 +VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
 +VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
 +VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
 +
 +VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_b
 +VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_h
 +VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_w
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
      WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, true, satp)
  #define DO_UQRSHL_OP(N, M, satp) \
      WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, true, satp)
 +#define DO_SUQSHL_OP(N, M, satp) \
 +    WRAP_QRSHL_HELPER(do_suqrshl_bhs, N, M, false, satp)
  DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
  DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvsw, 4, uint32_t)
  DO_VADDV(vaddvub, 1, uint8_t)
  DO_VADDV(vaddvuh, 2, uint16_t)
  DO_VADDV(vaddvuw, 4, uint32_t)
 +
 +/* Shifts by immediate */
 +#define DO_2SHIFT(OP, ESIZE, TYPE, FN)                          \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
 +        TYPE *d = vd, *m = vm;                                  \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            mergemask(&d[H##ESIZE(e)],                          \
 +                      FN(m[H##ESIZE(e)], shift), mask);         \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +    }
 +
 +#define DO_2SHIFT_SAT(OP, ESIZE, TYPE, FN)                      \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
 +        TYPE *d = vd, *m = vm;                                  \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        bool qc = false;                                        \
 +        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
 +            bool sat = false;                                   \
 +            mergemask(&d[H##ESIZE(e)],                          \
 +                      FN(m[H##ESIZE(e)], shift, &sat), mask);   \
 +            qc |= sat & mask & 1;                               \
 +        }                                                       \
 +        if (qc) {                                               \
 +            env->vfp.qc[0] = qc;                                \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +    }
 +
 +/* provide unsigned 2-op shift helpers for all sizes */
 +#define DO_2SHIFT_U(OP, FN)                     \
 +    DO_2SHIFT(OP##b, 1, uint8_t, FN)            \
 +    DO_2SHIFT(OP##h, 2, uint16_t, FN)           \
 +    DO_2SHIFT(OP##w, 4, uint32_t, FN)
 +
 +#define DO_2SHIFT_SAT_U(OP, FN)                 \
 +    DO_2SHIFT_SAT(OP##b, 1, uint8_t, FN)        \
 +    DO_2SHIFT_SAT(OP##h, 2, uint16_t, FN)       \
 +    DO_2SHIFT_SAT(OP##w, 4, uint32_t, FN)
 +#define DO_2SHIFT_SAT_S(OP, FN)                 \
 +    DO_2SHIFT_SAT(OP##b, 1, int8_t, FN)         \
 +    DO_2SHIFT_SAT(OP##h, 2, int16_t, FN)        \
 +    DO_2SHIFT_SAT(OP##w, 4, int32_t, FN)
 +
 +DO_2SHIFT_U(vshli_u, DO_VSHLU)
 +DO_2SHIFT_SAT_U(vqshli_u, DO_UQSHL_OP)
 +DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
 +DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 +typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
  typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1imm *a)
      }
      return do_1imm(s, a, fn);
  }
 +
 +static bool do_2shift(DisasContext *s, arg_2shift *a, MVEGenTwoOpShiftFn fn,
 +                      bool negateshift)
 +{
 +    TCGv_ptr qd, qm;
 +    int shift = a->shift;
 +
 +    if (!dc_isar_feature(aa32_mve, s) ||
 +        !mve_check_qreg_bank(s, a->qd | a->qm) ||
 +        !fn) {
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    /*
-+     * M-profile LTPSIZE overlaps A-profile Stride; whichever of the
++     * When we handle a right shift insn using a left-shift helper
-+     * two is not applicable to this CPU will always be zero.
++     * which permits a negative shift count to indicate a right-shift,
 +     * we must negate the shift count.
 +     */
-+    fpscr |= env->v7m.ltpsize << 16;
++    if (negateshift) {
-+
++        shift = -shift;
-     fpscr |= vfp_get_fpscr_from_host(env);
++    }
++
-     i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
++    qd = mve_qreg_ptr(a->qd);
 +    qm = mve_qreg_ptr(a->qm);
 +    fn(cpu_env, qd, qm, tcg_constant_i32(shift));
 +    tcg_temp_free_ptr(qd);
 +    tcg_temp_free_ptr(qm);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
 +#define DO_2SHIFT(INSN, FN, NEGATESHIFT)                         \
 +    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
 +    {                                                           \
 +        static MVEGenTwoOpShiftFn * const fns[] = {             \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +            gen_helper_mve_##FN##w,                             \
 +            NULL,                                               \
 +        };                                                      \
 +        return do_2shift(s, a, fns[a->size], NEGATESHIFT);      \
 +    }
 +
 +DO_2SHIFT(VSHLI, vshli_u, false)
 +DO_2SHIFT(VQSHLI_S, vqshli_s, false)
 +DO_2SHIFT(VQSHLI_U, vqshli_u, false)
 +DO_2SHIFT(VQSHLUI, vqshlui_s, false)
 --
 .20.1
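
Note on the patch above: the @2_shl_b, @2_shl_h and @2_shl_w formats split one 6-bit field (bits [21:16]) into an element size and a shift count, keyed on the leading bit pattern 001, 01 or 1. The sketch below shows the same split in plain C, assuming the field has already been extracted into `imm6`. The type `ShlImm` and the function `decode_shl_imm6` are hypothetical names, not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type; not a QEMU structure. */
typedef struct {
    int size;   /* 0 = byte, 1 = halfword, 2 = word */
    int shift;  /* left-shift count, 0 .. element_bits - 1 */
} ShlImm;

/*
 * Split a 6-bit size-and-immediate field into (size, shift).
 * The most significant set bit selects the element size and the
 * bits below it hold the shift count.
 * Returns false when none of the three formats matches (imm6 < 8).
 */
static bool decode_shl_imm6(unsigned imm6, ShlImm *out)
{
    assert(imm6 < 64);

    if (imm6 & 0x20) {            /* 1 sssss: 32-bit elements */
        out->size = 2;
        out->shift = imm6 & 0x1f;
    } else if (imm6 & 0x10) {     /* 01 ssss: 16-bit elements */
        out->size = 1;
        out->shift = imm6 & 0x0f;
    } else if (imm6 & 0x08) {     /* 001 sss: 8-bit elements */
        out->size = 0;
        out->shift = imm6 & 0x07;
    } else {
        return false;
    }
    return true;
}
```

With this sketch, a field value of 0x0b (binary 001011) decodes as byte elements shifted left by 3, and 0x20 decodes as word elements shifted left by 0.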

-[PULL 10/41] target/arm: Use tlb_flush_page_bits_by_mmuidx*
+[PULL 14/24] target/arm: Implement MVE vector shift right by immediate insns
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE vector shift right by immediate insns VSHRI and
 VRSHRI.  As with Neon, we implement these by using helper functions
 which perform left shifts but allow negative shift counts to indicate
 right shifts.
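
As an illustration of the approach described above, here is a minimal standalone sketch for unsigned 8-bit elements. It is not the QEMU helper: `shl_u8`, `rshift_from_field_u8` and `vshr_u8` are hypothetical names, and the behaviour for out-of-range counts (result 0) is an assumption of this sketch. The encoded field is taken to hold 8 minus the shift count, so the count is recovered by subtracting the field from 8.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift an unsigned byte left by 'shift'. A negative count means a
 * logical right shift by -shift. Counts of magnitude 8 or more give 0
 * (an assumption of this sketch).
 */
static uint8_t shl_u8(uint8_t x, int shift)
{
    assert(shift > -256 && shift < 256);

    if (shift >= 8 || shift <= -8) {
        return 0;
    }
    if (shift < 0) {
        return (uint8_t)(x >> -shift);
    }
    return (uint8_t)(x << shift);
}

/* Recover the right-shift count (1..8) from a 3-bit encoded field. */
static int rshift_from_field_u8(unsigned field3)
{
    return 8 - (int)(field3 & 7);
}

/* Right shift implemented as a left shift with a negated count. */
static uint8_t vshr_u8(uint8_t x, unsigned field3)
{
    int shift = rshift_from_field_u8(field3);
    return shl_u8(x, -shift);
}
```

Here `vshr_u8(0xf0, 4)` recovers a shift count of 4 and returns 0x0f, and a field value of 0 encodes the full-width shift by 8.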
-When TBI is enabled in a given regime, 56 bits of the address
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-are significant and we need to clear out any other matching
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-virtual addresses with differing tags.
+Message-id: 20210628135835.6690-9-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h     | 12 ++++++++++++
  target/arm/translate.h      | 20 ++++++++++++++++++++
  target/arm/mve.decode       | 28 ++++++++++++++++++++++++++++
  target/arm/mve_helper.c     |  7 +++++++
  target/arm/translate-mve.c  |  5 +++++
  target/arm/translate-neon.c | 18 ------------------
 files changed, 72 insertions(+), 18 deletions(-)
-The other uses of tlb_flush_page (without mmuidx) in this file
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 are only used by aarch32 mode.
 Fixes: 38d931687fa1
 Reported-by: Jordan Frank <jordanfrank@fb.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20201016210754.818257-3-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper.c | 46 ++++++++++++++++++++++++++++++++++++++-------
 file changed, 39 insertions(+), 7 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
- #endif
+ DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
+ DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
- static void switch_mode(CPUARMState *env, int mode);
-+static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx);
++DEF_HELPER_FLAGS_4(mve_vshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(mve_vshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- static int vfp_gdb_get_reg(CPUARMState *env, GByteArray *buf, int reg)
++DEF_HELPER_FLAGS_4(mve_vshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- {
++
-@@ -XXX,XX +XXX,XX @@ static int vae1_tlbmask(CPUARMState *env)
+ DEF_HELPER_FLAGS_4(mve_vshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     }
+ DEF_HELPER_FLAGS_4(mve_vshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vqshlui_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vqshlui_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vqshlui_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vrshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline int times_2_plus_1(DisasContext *s, int x)
      return x * 2 + 1;
  }
-+/* Return 56 if TBI is enabled, 64 otherwise. */
++static inline int rsub_64(DisasContext *s, int x)
 +static int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx,
 +                              uint64_t addr)
 +{
-+    uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
++    return 64 - x;
 +    int tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
 +    int select = extract64(addr, 55, 1);
 +
 +    return (tbi >> select) & 1 ? 56 : 64;
 +}
 +
-+static int vae1_tlbbits(CPUARMState *env, uint64_t addr)
++static inline int rsub_32(DisasContext *s, int x)
 +{
-+    ARMMMUIdx mmu_idx;
++    return 32 - x;
 +
 +    /* Only the regime of the mmu_idx below is significant. */
 +    if (arm_is_secure_below_el3(env)) {
 +        mmu_idx = ARMMMUIdx_SE10_0;
 +    } else if ((env->cp15.hcr_el2 & (HCR_E2H | HCR_TGE))
 +               == (HCR_E2H | HCR_TGE)) {
 +        mmu_idx = ARMMMUIdx_E20_0;
 +    } else {
 +        mmu_idx = ARMMMUIdx_E10_0;
 +    }
 +    return tlbbits_for_regime(env, mmu_idx, addr);
 +}
 +
- static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
++static inline int rsub_16(DisasContext *s, int x)
-                                       uint64_t value)
++{
 +    return 16 - x;
 +}
 +
 +static inline int rsub_8(DisasContext *s, int x)
 +{
 +    return 8 - x;
 +}
 +
  static inline int arm_dc_feature(DisasContext *dc, int feature)
  {
-@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+     return (dc->features & (1ULL << feature)) != 0;
-     CPUState *cs = env_cpu(env);
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-     int mask = vae1_tlbmask(env);
+index XXXXXXX..XXXXXXX 100644
-     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+--- a/target/arm/mve.decode
-+    int bits = vae1_tlbbits(env, pageaddr);
++++ b/target/arm/mve.decode
+@@ -XXX,XX +XXX,XX @@
--    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask);
+ @2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
-+    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits);
+ @2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
 +# Right shifts are encoded as N - shift, where N is the element size in bits.
 +%rshift_i5  16:5 !function=rsub_32
 +%rshift_i4  16:4 !function=rsub_16
 +%rshift_i3  16:3 !function=rsub_8
 +
 +@2_shr_b .... .... .. 001 ... .... .... .... .... &2shift qd=%qd qm=%qm \
 +         size=0 shift=%rshift_i3
 +@2_shr_h .... .... .. 01 .... .... .... .... .... &2shift qd=%qd qm=%qm \
 +         size=1 shift=%rshift_i4
 +@2_shr_w .... .... .. 1 ..... .... .... .... .... &2shift qd=%qd qm=%qm \
 +         size=2 shift=%rshift_i5
 +
  # Vector loads and stores
  # Widening loads and narrowing stores:
@@ -XXX,XX +XXX,XX @@ VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
  VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_b
  VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_h
  VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_w
 +
 +VSHRI_S           111 0 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_b
 +VSHRI_S           111 0 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_h
 +VSHRI_S           111 0 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_w
 +
 +VSHRI_U           111 1 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_b
 +VSHRI_U           111 1 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_h
 +VSHRI_U           111 1 1111 1 . ... ... ... 0 0000 0 1 . 1 ... 0 @2_shr_w
 +
 +VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
 +VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
 +VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 +
 +VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
 +VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
 +VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvuw, 4, uint32_t)
      DO_2SHIFT(OP##b, 1, uint8_t, FN)            \
      DO_2SHIFT(OP##h, 2, uint16_t, FN)           \
      DO_2SHIFT(OP##w, 4, uint32_t, FN)
 +#define DO_2SHIFT_S(OP, FN)                     \
 +    DO_2SHIFT(OP##b, 1, int8_t, FN)             \
 +    DO_2SHIFT(OP##h, 2, int16_t, FN)            \
 +    DO_2SHIFT(OP##w, 4, int32_t, FN)
  #define DO_2SHIFT_SAT_U(OP, FN)                 \
      DO_2SHIFT_SAT(OP##b, 1, uint8_t, FN)        \
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvuw, 4, uint32_t)
      DO_2SHIFT_SAT(OP##w, 4, int32_t, FN)
  DO_2SHIFT_U(vshli_u, DO_VSHLU)
 +DO_2SHIFT_S(vshli_s, DO_VSHLS)
  DO_2SHIFT_SAT_U(vqshli_u, DO_UQSHL_OP)
  DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
  DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
 +DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
 +DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHLI, vshli_u, false)
  DO_2SHIFT(VQSHLI_S, vqshli_s, false)
  DO_2SHIFT(VQSHLI_U, vqshli_u, false)
  DO_2SHIFT(VQSHLUI, vqshlui_s, false)
 +/* These right shifts use a left-shift helper with negated shift count */
 +DO_2SHIFT(VSHRI_S, vshli_s, true)
 +DO_2SHIFT(VSHRI_U, vshli_u, true)
 +DO_2SHIFT(VRSHRI_S, vrshli_s, true)
 +DO_2SHIFT(VRSHRI_U, vrshli_u, true)
 diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c
 +++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ static inline int plus1(DisasContext *s, int x)
      return x + 1;
  }
- static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri,
+-static inline int rsub_64(DisasContext *s, int x)
-@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri,
+-{
-     CPUState *cs = env_cpu(env);
+-    return 64 - x;
-     int mask = vae1_tlbmask(env);
+-}
-     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+-
-+    int bits = vae1_tlbbits(env, pageaddr);
+-static inline int rsub_32(DisasContext *s, int x)
+-{
-     if (tlb_force_broadcast(env)) {
+-    return 32 - x;
--        tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask);
+-}
-+        tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits);
+-static inline int rsub_16(DisasContext *s, int x)
-     } else {
+-{
--        tlb_flush_page_by_mmuidx(cs, pageaddr, mask);
+-    return 16 - x;
-+        tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits);
+-}
-     }
+-static inline int rsub_8(DisasContext *s, int x)
- }
+-{
+-    return 8 - x;
-@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+-}
 -
  static inline int neon_3same_fp_size(DisasContext *s, int x)
  {
-     CPUState *cs = env_cpu(env);
+     /* Convert 0==fp32, 1==fp16 into a MO_* value */
      uint64_t pageaddr = sextract64(value << 12, 0, 56);
 +    int bits = tlbbits_for_regime(env, ARMMMUIdx_E2, pageaddr);
 -    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
 -                                             ARMMMUIdxBit_E2);
 +    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr,
 +                                                  ARMMMUIdxBit_E2, bits);
  }
  static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
  {
      CPUState *cs = env_cpu(env);
      uint64_t pageaddr = sextract64(value << 12, 0, 56);
 +    int bits = tlbbits_for_regime(env, ARMMMUIdx_SE3, pageaddr);
 -    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
 -                                             ARMMMUIdxBit_SE3);
 +    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr,
 +                                                  ARMMMUIdxBit_SE3, bits);
  }
  static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri,
 --
 .20.1
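
The right-shift handling in the patch above combines two steps: the decoder turns the encoded field into a shift count with `rsub_N` (N minus the field), and the translator then passes the negated count to a left-shift helper. The sketch below models that for unsigned bytes. It is an illustration only, not QEMU code: the `*_model` names are hypothetical, and the helper's treatment of out-of-range counts is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the decode step: shift = 8 - field (3-bit field). */
static int rsub_8_model(int field)
{
    assert(field >= 0 && field < 8);
    return 8 - field;
}

/*
 * Hypothetical left-shift helper for unsigned bytes that also accepts
 * negative counts, which it treats as logical right shifts.
 * Assumption: counts of magnitude 8 or more produce 0.
 */
static uint8_t shl_u8_model(uint8_t value, int shift)
{
    if (shift <= -8 || shift >= 8) {
        return 0;
    }
    if (shift < 0) {
        return (uint8_t)(value >> -shift);
    }
    return (uint8_t)(value << shift);
}

/* Right shift by immediate: decode the field, then negate the count. */
static uint8_t vshr_u8_model(uint8_t value, int field)
{
    int shift = rsub_8_model(field);
    return shl_u8_model(value, -shift);
}
```

For example, an encoded field of 7 decodes to a shift of 1, so `vshr_u8_model(0x80, 7)` yields `0x40`.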

-[PULL 11/41] tests/qtest: Add npcm7xx timer test
+[PULL 15/24] target/arm: Implement MVE VSHLL
-From: Havard Skinnemoen <hskinnemoen@google.com>
+Implement the MVE VSHLL (vector shift left long) insn.  This has two
 encodings: the T1 encoding is the usual shift-by-immediate format,
 and the T2 encoding is a special case where the shift count is always
 equal to the element size.
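
As a rough illustration of the operation described above, the following sketch models the signed byte-to-halfword case: each output lane takes either the bottom or the top byte of the corresponding input pair, widens it, and shifts it left. It is a simplified model under stated assumptions, not the helper from this patch: the function name is hypothetical, predication and host-endian element ordering are ignored, and results are stored as raw 16-bit patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reference model of a long left shift, signed 8-bit inputs
 * to 16-bit results. top == false uses the even (bottom) bytes,
 * top == true uses the odd (top) bytes.
 */
static void vshll_s8_model(uint16_t d[8], const int8_t m[16],
                           unsigned shift, bool top)
{
    /* The T2 encoding fixes shift == 8 for byte inputs. */
    assert(shift <= 8);
    for (unsigned le = 0; le < 8; le++) {
        /* Sign-extend first, then shift the unsigned bit pattern. */
        uint16_t wide = (uint16_t)(int16_t)m[le * 2 + top];
        d[le] = (uint16_t)(wide << shift);
    }
}
```

With a shift of 8, an input byte of 1 in a bottom position produces `0x0100` in the matching output lane.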
-This test exercises the various modes of the npcm7xx timer. In
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-particular, it triggers the bug found by the fuzzer, as reported here:
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210628135835.6690-10-peter.maydell@linaro.org
 ---
  target/arm/helper-mve.h    |  9 +++++++
  target/arm/mve.decode      | 53 +++++++++++++++++++++++++++++++++++---
  target/arm/mve_helper.c    | 32 +++++++++++++++++++++++
  target/arm/translate-mve.c | 15 +++++++++++
 files changed, 105 insertions(+), 4 deletions(-)
-https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg02992.html
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+index XXXXXXX..XXXXXXX 100644
-It also found several other bugs, especially related to interrupt
+--- a/target/arm/helper-mve.h
-handling.
++++ b/target/arm/helper-mve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-The test exercises all the timers in all the timer modules, which
+ DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-expands to 180 test cases in total.
+ DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-Reviewed-by: Tyrone Ting <kfting@nuvoton.com>
++
-Signed-off-by: Havard Skinnemoen <hskinnemoen@google.com>
++DEF_HELPER_FLAGS_4(mve_vshllbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-Message-id: 20201008232154.94221-2-hskinnemoen@google.com
++DEF_HELPER_FLAGS_4(mve_vshllbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
++DEF_HELPER_FLAGS_4(mve_vshllbub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
----
++DEF_HELPER_FLAGS_4(mve_vshllbuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- tests/qtest/npcm7xx_timer-test.c | 562 +++++++++++++++++++++++++++++++
++DEF_HELPER_FLAGS_4(mve_vshlltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- tests/qtest/meson.build          |   1 +
++DEF_HELPER_FLAGS_4(mve_vshlltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-files changed, 563 insertions(+)
++DEF_HELPER_FLAGS_4(mve_vshlltub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- create mode 100644 tests/qtest/npcm7xx_timer-test.c
++DEF_HELPER_FLAGS_4(mve_vshlltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-diff --git a/tests/qtest/npcm7xx_timer-test.c b/tests/qtest/npcm7xx_timer-test.c
+index XXXXXXX..XXXXXXX 100644
-new file mode 100644
+--- a/target/arm/mve.decode
-index XXXXXXX..XXXXXXX
++++ b/target/arm/mve.decode
 --- /dev/null
 +++ b/tests/qtest/npcm7xx_timer-test.c
 @@ -XXX,XX +XXX,XX @@
-+/*
+ @2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
-+ * QTest testcase for the Nuvoton NPCM7xx Timer
+ @2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
-+ *
-+ * Copyright 2020 Google LLC
++@2_shll_b .... .... ... 01 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
-+ *
++@2_shll_h .... .... ... 1  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
-+ * This program is free software; you can redistribute it and/or modify it
++# VSHLL encoding T2 where shift == esize
-+ * under the terms of the GNU General Public License as published by the
++@2_shll_esize_b .... .... .... 00 .. .... .... .... .... &2shift \
-+ * Free Software Foundation; either version 2 of the License, or
++                qd=%qd qm=%qm size=0 shift=8
-+ * (at your option) any later version.
++@2_shll_esize_h .... .... .... 01 .. .... .... .... .... &2shift \
-+ *
++                qd=%qd qm=%qm size=1 shift=16
 + * This program is distributed in the hope that it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 + * for more details.
 + */
 +
-+#include "qemu/osdep.h"
+ # Right shifts are encoded as N - shift, where N is the element size in bits.
-+#include "qemu/timer.h"
+ %rshift_i5  16:5 !function=rsub_32
-+#include "libqtest-single.h"
+ %rshift_i4  16:4 !function=rsub_16
-+
+@@ -XXX,XX +XXX,XX @@ VADD             1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
-+#define TIM_REF_HZ      (25000000)
+ VSUB             1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
-+
+ VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
-+/* Bits in TCSRx */
-+#define CEN             BIT(30)
+-VMULH_S          111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
-+#define IE              BIT(29)
+-VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
-+#define MODE_ONESHOT    (0 << 27)
++# The VSHLL T2 encoding is not a @2op pattern, but is here because it
-+#define MODE_PERIODIC   (1 << 27)
++# overlaps what would be size=0b11 VMULH/VRMULH
 +#define CRST            BIT(26)
 +#define CACT            BIT(25)
 +#define PRESCALE(x)     (x)
 +
 +/* Registers shared between all timers in a module. */
 +#define TISR    0x18
 +#define WTCR    0x1c
 +# define WTCLK(x)       ((x) << 10)
 +
 +/* Power-on default; used to re-initialize timers before each test. */
 +#define TCSR_DEFAULT    PRESCALE(5)
 +
 +/* Register offsets for a timer within a timer block. */
 +typedef struct Timer {
 +    unsigned int tcsr_offset;
 +    unsigned int ticr_offset;
 +    unsigned int tdr_offset;
 +} Timer;
 +
 +/* A timer block containing 5 timers. */
 +typedef struct TimerBlock {
 +    int irq_base;
 +    uint64_t base_addr;
 +} TimerBlock;
 +
 +/* Testdata for testing a particular timer within a timer block. */
 +typedef struct TestData {
 +    const TimerBlock *tim;
 +    const Timer *timer;
 +} TestData;
 +
 +const TimerBlock timer_block[] = {
 +    {
 +        .irq_base   = 32,
 +        .base_addr  = 0xf0008000,
 +    },
 +    {
 +        .irq_base   = 37,
 +        .base_addr  = 0xf0009000,
 +    },
 +    {
 +        .irq_base   = 42,
 +        .base_addr  = 0xf000a000,
 +    },
 +};
 +
 +const Timer timer[] = {
 +    {
 +        .tcsr_offset    = 0x00,
 +        .ticr_offset    = 0x08,
 +        .tdr_offset     = 0x10,
 +    }, {
 +        .tcsr_offset    = 0x04,
 +        .ticr_offset    = 0x0c,
 +        .tdr_offset     = 0x14,
 +    }, {
 +        .tcsr_offset    = 0x20,
 +        .ticr_offset    = 0x28,
 +        .tdr_offset     = 0x30,
 +    }, {
 +        .tcsr_offset    = 0x24,
 +        .ticr_offset    = 0x2c,
 +        .tdr_offset     = 0x34,
 +    }, {
 +        .tcsr_offset    = 0x40,
 +        .ticr_offset    = 0x48,
 +        .tdr_offset     = 0x50,
 +    },
 +};
 +
 +/* Returns the index of the timer block. */
 +static int tim_index(const TimerBlock *tim)
 +{
-+    ptrdiff_t diff = tim - timer_block;
++  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
-+
++  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
-+    g_assert(diff >= 0 && diff < ARRAY_SIZE(timer_block));
-+
+-VRMULH_S         111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
-+    return diff;
+-VRMULH_U         111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 +  VMULH_S        111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 +}
 +
-+/* Returns the index of a timer within a timer block. */
-+static int timer_index(const Timer *t)
 +{
-+    ptrdiff_t diff = t - timer;
++  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
 +  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +
-+    g_assert(diff >= 0 && diff < ARRAY_SIZE(timer));
++  VMULH_U        111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
 +
 +    return diff;
 +}
 +
-+/* Returns the irq line for a given timer. */
-+static int tim_timer_irq(const TestData *td)
 +{
-+    return td->tim->irq_base + timer_index(td->timer);
++  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
 +  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +
 +  VRMULH_S       111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 +}
 +
-+/* Register read/write accessors. */
++{
 +  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
 +  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
 +
-+static void tim_write(const TestData *td,
++  VRMULH_U       111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
 +                      unsigned int offset, uint32_t value)
 +{
 +    writel(td->tim->base_addr + offset, value);
 +}
+ VMAX_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
+ VMAX_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
+@@ -XXX,XX +XXX,XX @@ VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
+ VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
+ VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
+ VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 +
-+static uint32_t tim_read(const TestData *td, unsigned int offset)
++# VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
-+{
++VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
-+    return readl(td->tim->base_addr + offset);
++VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
 +}
 +
-+static void tim_write_tcsr(const TestData *td, uint32_t value)
++VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
-+{
++VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
 +    tim_write(td, td->timer->tcsr_offset, value);
 +}
 +
-+static uint32_t tim_read_tcsr(const TestData *td)
++VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
-+{
++VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
 +    return tim_read(td, td->timer->tcsr_offset);
 +}
 +
-+static void tim_write_ticr(const TestData *td, uint32_t value)
++VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
-+{
++VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
-+    tim_write(td, td->timer->ticr_offset, value);
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-+}
+index XXXXXXX..XXXXXXX 100644
-+
+--- a/target/arm/mve_helper.c
-+static uint32_t tim_read_ticr(const TestData *td)
++++ b/target/arm/mve_helper.c
-+{
+@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
-+    return tim_read(td, td->timer->ticr_offset);
+ DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
-+}
+ DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
-+
+ DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 +static uint32_t tim_read_tdr(const TestData *td)
 +{
 +    return tim_read(td, td->timer->tdr_offset);
 +}
 +
 +/* Returns the number of nanoseconds to count the given number of cycles. */
 +static int64_t tim_calculate_step(uint32_t count, uint32_t prescale)
 +{
 +    return (1000000000LL / TIM_REF_HZ) * count * (prescale + 1);
 +}
 +
 +/* Returns a bitmask corresponding to the timer under test. */
 +static uint32_t tim_timer_bit(const TestData *td)
 +{
 +    return BIT(timer_index(td->timer));
 +}
 +
 +/* Resets all timers to power-on defaults. */
 +static void tim_reset(const TestData *td)
 +{
 +    int i, j;
 +
 +    /* Reset all the timers, in case a previous test left a timer running. */
 +    for (i = 0; i < ARRAY_SIZE(timer_block); i++) {
 +        for (j = 0; j < ARRAY_SIZE(timer); j++) {
 +            writel(timer_block[i].base_addr + timer[j].tcsr_offset,
 +                   CRST | TCSR_DEFAULT);
 +        }
 +        writel(timer_block[i].base_addr + TISR, -1);
 +    }
 +}
 +
 +/* Verifies the reset state of a timer. */
 +static void test_reset(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +
 +    tim_reset(td);
 +
 +    g_assert_cmphex(tim_read_tcsr(td), ==, TCSR_DEFAULT);
 +    g_assert_cmphex(tim_read_ticr(td), ==, 0);
 +    g_assert_cmphex(tim_read_tdr(td), ==, 0);
 +    g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +    g_assert_cmphex(tim_read(td, WTCR), ==, WTCLK(1));
 +}
 +
 +/* Verifies that CRST wins if both CEN and CRST are set. */
 +static void test_reset_overrides_enable(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +
 +    tim_reset(td);
 +
 +    /* CRST should force CEN to 0 */
 +    tim_write_tcsr(td, CEN | CRST | TCSR_DEFAULT);
 +
 +    g_assert_cmphex(tim_read_tcsr(td), ==, TCSR_DEFAULT);
 +    g_assert_cmphex(tim_read_tdr(td), ==, 0);
 +    g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +}
 +
 +/* Verifies the behavior when CEN is set and then cleared. */
 +static void test_oneshot_enable_then_disable(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +
 +    tim_reset(td);
 +
 +    /* Enable the timer with zero initial count, then disable it again. */
 +    tim_write_tcsr(td, CEN | TCSR_DEFAULT);
 +    tim_write_tcsr(td, TCSR_DEFAULT);
 +
 +    g_assert_cmphex(tim_read_tcsr(td), ==, TCSR_DEFAULT);
 +    g_assert_cmphex(tim_read_tdr(td), ==, 0);
 +    /* Timer interrupt flag should be set, but interrupts are not enabled. */
 +    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
 +    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +}
 +
 +/* Verifies that a one-shot timer fires when expected with prescaler 5. */
 +static void test_oneshot_ps5(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +    unsigned int count = 256;
 +    unsigned int ps = 5;
 +
 +    tim_reset(td);
 +
 +    tim_write_ticr(td, count);
 +    tim_write_tcsr(td, CEN | PRESCALE(ps));
 +    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count);
 +
 +    clock_step(tim_calculate_step(count, ps) - 1);
 +
 +    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
 +    g_assert_cmpuint(tim_read_tdr(td), <, count);
 +    g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +
 +    clock_step(1);
 +
 +    g_assert_cmphex(tim_read_tcsr(td), ==, PRESCALE(ps));
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count);
 +    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
 +    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +
 +    /* Clear the interrupt flag. */
 +    tim_write(td, TISR, tim_timer_bit(td));
 +    g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +
 +    /* Verify that this isn't a periodic timer. */
 +    clock_step(2 * tim_calculate_step(count, ps));
 +    g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +}
 +
 +/* Verifies that a one-shot timer fires when expected with prescaler 0. */
 +static void test_oneshot_ps0(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +    unsigned int count = 1;
 +    unsigned int ps = 0;
 +
 +    tim_reset(td);
 +
 +    tim_write_ticr(td, count);
 +    tim_write_tcsr(td, CEN | PRESCALE(ps));
 +    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count);
 +
 +    clock_step(tim_calculate_step(count, ps) - 1);
 +
 +    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
 +    g_assert_cmpuint(tim_read_tdr(td), <, count);
 +    g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +
 +    clock_step(1);
 +
 +    g_assert_cmphex(tim_read_tcsr(td), ==, PRESCALE(ps));
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count);
 +    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
 +    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +}
 +
 +/* Verifies that a one-shot timer fires when expected with highest prescaler. */
 +static void test_oneshot_ps255(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +    unsigned int count = (1U << 24) - 1;
 +    unsigned int ps = 255;
 +
 +    tim_reset(td);
 +
 +    tim_write_ticr(td, count);
 +    tim_write_tcsr(td, CEN | PRESCALE(ps));
 +    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count);
 +
 +    clock_step(tim_calculate_step(count, ps) - 1);
 +
 +    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
 +    g_assert_cmpuint(tim_read_tdr(td), <, count);
 +    g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +
 +    clock_step(1);
 +
 +    g_assert_cmphex(tim_read_tcsr(td), ==, PRESCALE(ps));
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count);
 +    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
 +    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +}
 +
 +/* Verifies that a oneshot timer fires an interrupt when expected. */
 +static void test_oneshot_interrupt(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +    unsigned int count = 256;
 +    unsigned int ps = 7;
 +
 +    tim_reset(td);
 +
 +    tim_write_ticr(td, count);
 +    tim_write_tcsr(td, IE | CEN | MODE_ONESHOT | PRESCALE(ps));
 +
 +    clock_step_next();
 +
 +    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
 +    g_assert_true(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +}
 +
 +/*
-+ * Verifies that the timer can be paused and later resumed, and it still fires
++ * Long shifts taking half-sized inputs from top or bottom of the input
-+ * at the right moment.
++ * vector and producing a double-width result. ESIZE, TYPE are for
 + * the input, and LESIZE, LTYPE for the output.
 + * Unlike the normal shift helpers, we do not handle negative shift counts,
 + * because the long shift is strictly left-only.
 + */
-+static void test_pause_resume(gconstpointer test_data)
++#define DO_VSHLL(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE)                   \
-+{
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,             \
-+    const TestData *td = test_data;
++                                void *vm, uint32_t shift)               \
-+    unsigned int count = 256;
++    {                                                                   \
-+    unsigned int ps = 1;
++        LTYPE *d = vd;                                                  \
-+
++        TYPE *m = vm;                                                   \
-+    tim_reset(td);
++        uint16_t mask = mve_element_mask(env);                          \
-+
++        unsigned le;                                                    \
-+    tim_write_ticr(td, count);
++        assert(shift <= 16);                                            \
-+    tim_write_tcsr(td, IE | CEN | MODE_ONESHOT | PRESCALE(ps));
++        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
-+
++            LTYPE r = (LTYPE)m[H##ESIZE(le * 2 + TOP)] << shift;        \
-+    /* Pause the timer halfway to expiration. */
++            mergemask(&d[H##LESIZE(le)], r, mask);                      \
-+    clock_step(tim_calculate_step(count / 2, ps));
++        }                                                               \
-+    tim_write_tcsr(td, IE | MODE_ONESHOT | PRESCALE(ps));
++        mve_advance_vpt(env);                                           \
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count / 2);
 +
 +    /* Counter should not advance during the following step. */
 +    clock_step(2 * tim_calculate_step(count, ps));
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count / 2);
 +    g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +
 +    /* Resume the timer and run _almost_ to expiration. */
 +    tim_write_tcsr(td, IE | CEN | MODE_ONESHOT | PRESCALE(ps));
 +    clock_step(tim_calculate_step(count / 2, ps) - 1);
 +    g_assert_cmpuint(tim_read_tdr(td), <, count);
 +    g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +
 +    /* Now, run the rest of the way and verify that the interrupt fires. */
 +    clock_step(1);
 +    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
 +    g_assert_true(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +}
 +
 +/* Verifies that the prescaler can be changed while the timer is running. */
 +static void test_prescaler_change(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +    unsigned int count = 256;
 +    unsigned int ps = 5;
 +
 +    tim_reset(td);
 +
 +    tim_write_ticr(td, count);
 +    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
 +
 +    /* Run a quarter of the way, and change the prescaler. */
 +    clock_step(tim_calculate_step(count / 4, ps));
 +    g_assert_cmpuint(tim_read_tdr(td), ==, 3 * count / 4);
 +    ps = 2;
 +    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
 +    /* The counter must not change. */
 +    g_assert_cmpuint(tim_read_tdr(td), ==, 3 * count / 4);
 +
 +    /* Run another quarter of the way, and change the prescaler again. */
 +    clock_step(tim_calculate_step(count / 4, ps));
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count / 2);
 +    ps = 8;
 +    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
 +    /* The counter must not change. */
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count / 2);
 +
 +    /* Run another quarter of the way, and change the prescaler again. */
 +    clock_step(tim_calculate_step(count / 4, ps));
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count / 4);
 +    ps = 0;
 +    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
 +    /* The counter must not change. */
 +    g_assert_cmpuint(tim_read_tdr(td), ==, count / 4);
 +
 +    /* Run almost to expiration, and verify the timer didn't fire yet. */
 +    clock_step(tim_calculate_step(count / 4, ps) - 1);
 +    g_assert_cmpuint(tim_read_tdr(td), <, count);
 +    g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +
 +    /* Now, run the rest of the way and verify that the timer fires. */
 +    clock_step(1);
 +    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
 +}
 +
 +/* Verifies that a periodic timer automatically restarts after expiration. */
 +static void test_periodic_no_interrupt(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +    unsigned int count = 2;
 +    unsigned int ps = 3;
 +    int i;
 +
 +    tim_reset(td);
 +
 +    tim_write_ticr(td, count);
 +    tim_write_tcsr(td, CEN | MODE_PERIODIC | PRESCALE(ps));
 +
 +    for (i = 0; i < 4; i++) {
 +        clock_step_next();
 +
 +        g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
 +        g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +
 +        tim_write(td, TISR, tim_timer_bit(td));
 +
 +        g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +        g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +    }
 +}
 +
 +/* Verifies that a periodic timer fires an interrupt every time it expires. */
 +static void test_periodic_interrupt(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +    unsigned int count = 65535;
 +    unsigned int ps = 2;
 +    int i;
 +
 +    tim_reset(td);
 +
 +    tim_write_ticr(td, count);
 +    tim_write_tcsr(td, CEN | IE | MODE_PERIODIC | PRESCALE(ps));
 +
 +    for (i = 0; i < 4; i++) {
 +        clock_step_next();
 +
 +        g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
 +        g_assert_true(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +
 +        tim_write(td, TISR, tim_timer_bit(td));
 +
 +        g_assert_cmphex(tim_read(td, TISR), ==, 0);
 +        g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
 +    }
 +}
 +
 +/*
 + * Verifies that the timer behaves correctly when disabled right before and
 + * exactly when it's supposed to expire.
 + */
 +static void test_disable_on_expiration(gconstpointer test_data)
 +{
 +    const TestData *td = test_data;
 +    unsigned int count = 8;
 +    unsigned int ps = 255;
 +
 +    tim_reset(td);
 +
 +    tim_write_ticr(td, count);
 +    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
 +
 +    clock_step(tim_calculate_step(count, ps) - 1);
 +
 +    tim_write_tcsr(td, MODE_ONESHOT | PRESCALE(ps));
 +    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
 +    clock_step(1);
 +    tim_write_tcsr(td, MODE_ONESHOT | PRESCALE(ps));
 +    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
 +}
 +
 +/*
 + * Constructs a name that includes the timer block, timer and testcase name,
 + * and adds the test to the test suite.
 + */
 +static void tim_add_test(const char *name, const TestData *td, GTestDataFunc fn)
 +{
 +    g_autofree char *full_name;
 +
 +    full_name = g_strdup_printf("npcm7xx_timer/tim[%d]/timer[%d]/%s",
 +                                tim_index(td->tim), timer_index(td->timer),
 +                                name);
 +    qtest_add_data_func(full_name, td, fn);
 +}
 +
 +/* Convenience macro for adding a test with a predictable function name. */
 +#define add_test(name, td) tim_add_test(#name, td, test_##name)
 +
 +int main(int argc, char **argv)
 +{
 +    TestData testdata[ARRAY_SIZE(timer_block) * ARRAY_SIZE(timer)];
 +    int ret;
 +    int i, j;
 +
 +    g_test_init(&argc, &argv, NULL);
 +    g_test_set_nonfatal_assertions();
 +
 +    for (i = 0; i < ARRAY_SIZE(timer_block); i++) {
 +        for (j = 0; j < ARRAY_SIZE(timer); j++) {
 +            TestData *td = &testdata[i * ARRAY_SIZE(timer) + j];
 +            td->tim = &timer_block[i];
 +            td->timer = &timer[j];
 +
 +            add_test(reset, td);
 +            add_test(reset_overrides_enable, td);
 +            add_test(oneshot_enable_then_disable, td);
 +            add_test(oneshot_ps5, td);
 +            add_test(oneshot_ps0, td);
 +            add_test(oneshot_ps255, td);
 +            add_test(oneshot_interrupt, td);
 +            add_test(pause_resume, td);
 +            add_test(prescaler_change, td);
 +            add_test(periodic_no_interrupt, td);
 +            add_test(periodic_interrupt, td);
 +            add_test(disable_on_expiration, td);
 +        }
 +    }
 +
-+    qtest_start("-machine npcm750-evb");
++#define DO_VSHLL_ALL(OP, TOP)                                \
-+    qtest_irq_intercept_in(global_qtest, "/machine/soc/a9mpcore/gic");
++    DO_VSHLL(OP##sb, TOP, 1, int8_t, 2, int16_t)             \
-+    ret = g_test_run();
++    DO_VSHLL(OP##ub, TOP, 1, uint8_t, 2, uint16_t)           \
-+    qtest_end();
++    DO_VSHLL(OP##sh, TOP, 2, int16_t, 4, int32_t)            \
 +    DO_VSHLL(OP##uh, TOP, 2, uint16_t, 4, uint32_t)          \
 +
-+    return ret;
++DO_VSHLL_ALL(vshllb, false)
-+}
++DO_VSHLL_ALL(vshllt, true)
-diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/meson.build
+--- a/target/arm/translate-mve.c
-+++ b/tests/qtest/meson.build
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ qtests_arm = \
+@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHRI_S, vshli_s, true)
-   ['arm-cpu-features',
+ DO_2SHIFT(VSHRI_U, vshli_u, true)
-    'microbit-test',
+ DO_2SHIFT(VRSHRI_S, vrshli_s, true)
-    'm25p80-test',
+ DO_2SHIFT(VRSHRI_U, vrshli_u, true)
-+   'npcm7xx_timer-test',
++
-    'test-arm-mptimer',
++#define DO_VSHLL(INSN, FN)                                      \
-    'boot-serial-test',
++    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
-    'hexloader-test']
++    {                                                           \
 +        static MVEGenTwoOpShiftFn * const fns[] = {             \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +        };                                                      \
 +        return do_2shift(s, a, fns[a->size], false);            \
 +    }
 +
 +DO_VSHLL(VSHLL_BS, vshllbs)
 +DO_VSHLL(VSHLL_BU, vshllbu)
 +DO_VSHLL(VSHLL_TS, vshllts)
 +DO_VSHLL(VSHLL_TU, vshlltu)
 --
 .20.1

-[PULL 32/41] include/elf: Add defines related to GNU property notes for AArch64
+[PULL 16/24] target/arm: Implement MVE VSRI, VSLI
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE VSRI and VSLI insns, which perform a
 shift-and-insert operation.
-These are all of the defines required to parse
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-GNU_PROPERTY_AARCH64_FEATURE_1_AND, copied from binutils.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Other missing defines related to other GNU program headers
+Message-id: 20210628135835.6690-11-peter.maydell@linaro.org
-and notes are elided for now.
+---
  target/arm/helper-mve.h    |  8 ++++++++
  target/arm/mve.decode      |  9 ++++++++
  target/arm/mve_helper.c    | 42 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c |  3 +++
 files changed, 62 insertions(+)
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20201016184207.786698-4-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  include/elf.h | 22 ++++++++++++++++++++++
 file changed, 22 insertions(+)
 diff --git a/include/elf.h b/include/elf.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/elf.h
+--- a/target/arm/helper-mve.h
-+++ b/include/elf.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ typedef int64_t  Elf64_Sxword;
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vshlltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define PT_NOTE    4
+ DEF_HELPER_FLAGS_4(mve_vshlltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define PT_SHLIB   5
+ DEF_HELPER_FLAGS_4(mve_vshlltub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define PT_PHDR    6
+ DEF_HELPER_FLAGS_4(mve_vshlltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +#define PT_LOOS    0x60000000
 +#define PT_HIOS    0x6fffffff
  #define PT_LOPROC  0x70000000
  #define PT_HIPROC  0x7fffffff
 +#define PT_GNU_PROPERTY   (PT_LOOS + 0x474e553)
 +
- #define PT_MIPS_REGINFO   0x70000000
++DEF_HELPER_FLAGS_4(mve_vsrib, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define PT_MIPS_RTPROC    0x70000001
++DEF_HELPER_FLAGS_4(mve_vsrih, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- #define PT_MIPS_OPTIONS   0x70000002
++DEF_HELPER_FLAGS_4(mve_vsriw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
@@ -XXX,XX +XXX,XX @@ typedef struct elf64_shdr {
  #define NT_ARM_SYSTEM_CALL      0x404   /* ARM system call number */
  #define NT_ARM_SVE      0x405           /* ARM Scalable Vector Extension regs */
 +/* Defined note types for GNU systems.  */
 +
-+#define NT_GNU_PROPERTY_TYPE_0  5       /* Program property */
++DEF_HELPER_FLAGS_4(mve_vslib, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vslih, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vsliw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
  VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
  VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
 +
-+/* Values used in GNU .note.gnu.property notes (NT_GNU_PROPERTY_TYPE_0).  */
++# Shift-and-insert
 +VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_b
 +VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_h
 +VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_w
 +
-+#define GNU_PROPERTY_STACK_SIZE                 1
++VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
-+#define GNU_PROPERTY_NO_COPY_ON_PROTECTED       2
++VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
 +VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
  DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
  DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 +/* Shift-and-insert; we always work with 64 bits at a time */
 +#define DO_2SHIFT_INSERT(OP, ESIZE, SHIFTFN, MASKFN)                    \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,             \
 +                                void *vm, uint32_t shift)               \
 +    {                                                                   \
 +        uint64_t *d = vd, *m = vm;                                      \
 +        uint16_t mask;                                                  \
 +        uint64_t shiftmask;                                             \
 +        unsigned e;                                                     \
 +        if (shift == 0 || shift == ESIZE * 8) {                         \
 +            /*                                                          \
 +             * Only VSLI can shift by 0; only VSRI can shift by <dt>.   \
 +             * The generic logic would give the right answer for 0 but  \
 +             * fails for <dt>.                                          \
 +             */                                                         \
 +            goto done;                                                  \
 +        }                                                               \
 +        assert(shift < ESIZE * 8);                                      \
 +        mask = mve_element_mask(env);                                   \
 +        /* ESIZE / 2 gives the MO_* value if ESIZE is in [1,2,4] */     \
 +        shiftmask = dup_const(ESIZE / 2, MASKFN(ESIZE * 8, shift));     \
 +        for (e = 0; e < 16 / 8; e++, mask >>= 8) {                      \
 +            uint64_t r = (SHIFTFN(m[H8(e)], shift) & shiftmask) |       \
 +                (d[H8(e)] & ~shiftmask);                                \
 +            mergemask(&d[H8(e)], r, mask);                              \
 +        }                                                               \
 +done:                                                                   \
 +        mve_advance_vpt(env);                                           \
 +    }
 +
-+#define GNU_PROPERTY_LOPROC                     0xc0000000
++#define DO_SHL(N, SHIFT) ((N) << (SHIFT))
-+#define GNU_PROPERTY_HIPROC                     0xdfffffff
++#define DO_SHR(N, SHIFT) ((N) >> (SHIFT))
-+#define GNU_PROPERTY_LOUSER                     0xe0000000
++#define SHL_MASK(EBITS, SHIFT) MAKE_64BIT_MASK((SHIFT), (EBITS) - (SHIFT))
-+#define GNU_PROPERTY_HIUSER                     0xffffffff
++#define SHR_MASK(EBITS, SHIFT) MAKE_64BIT_MASK(0, (EBITS) - (SHIFT))
 +
-+#define GNU_PROPERTY_AARCH64_FEATURE_1_AND      0xc0000000
++DO_2SHIFT_INSERT(vsrib, 1, DO_SHR, SHR_MASK)
-+#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI      (1u << 0)
++DO_2SHIFT_INSERT(vsrih, 2, DO_SHR, SHR_MASK)
-+#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC      (1u << 1)
++DO_2SHIFT_INSERT(vsriw, 4, DO_SHR, SHR_MASK)
 +DO_2SHIFT_INSERT(vslib, 1, DO_SHL, SHL_MASK)
 +DO_2SHIFT_INSERT(vslih, 2, DO_SHL, SHL_MASK)
 +DO_2SHIFT_INSERT(vsliw, 4, DO_SHL, SHL_MASK)
 +
  /*
-  * Physical entry point into the kernel.
+  * Long shifts taking half-sized inputs from top or bottom of the input
-  *
+  * vector and producing a double-width result. ESIZE, TYPE are for
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHRI_U, vshli_u, true)
  DO_2SHIFT(VRSHRI_S, vrshli_s, true)
  DO_2SHIFT(VRSHRI_U, vrshli_u, true)
 +DO_2SHIFT(VSRI, vsri, false)
 +DO_2SHIFT(VSLI, vsli, false)
 +
  #define DO_VSHLL(INSN, FN)                                      \
      static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
      {                                                           \
 --
 .20.1
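
As a rough illustration of the shift-and-insert operation that patch 16/24 implements, the sketch below applies the same mask idea as DO_2SHIFT_INSERT to a single 8-bit element. The names sri8 and sli8 are made up for this example and do not appear in QEMU. Predication (mergemask) and the 64-bits-at-a-time processing of the real helper are left out, and the accepted shift ranges are an assumption based on the comment in the helper.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: sri8/sli8 are hypothetical names, not QEMU functions. */

/* Shift right and insert: the top `shift` bits of d are preserved. */
static uint8_t sri8(uint8_t d, uint8_t m, unsigned shift)
{
    assert(shift >= 1 && shift <= 8);   /* assumed valid range for VSRI.8 */
    if (shift == 8) {
        return d;                       /* nothing from m survives the shift */
    }
    uint8_t mask = (uint8_t)(0xffu >> shift);   /* low (8 - shift) bits */
    return (uint8_t)(((m >> shift) & mask) | (d & ~mask));
}

/* Shift left and insert: the low `shift` bits of d are preserved. */
static uint8_t sli8(uint8_t d, uint8_t m, unsigned shift)
{
    assert(shift <= 7);                 /* assumed valid range for VSLI.8 */
    uint8_t mask = (uint8_t)(0xffu << shift);   /* bits [shift, 8) */
    /* With shift == 0 the mask covers every bit, so the result is m. */
    return (uint8_t)(((m << shift) & mask) | (d & ~mask));
}
```

For instance, with d = 0xAB and m = 0xFF, a shift of 4 gives 0xAF for sri8 and 0xFB for sli8.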

-[PULL 30/41] linux-user/aarch64: Reset btype for signals
+[PULL 17/24] target/arm: Implement MVE VSHRN, VRSHRN
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE shift-right-and-narrow insns VSHRN and VRSHRN.
-The kernel sets btype for the signal handler as if for a call.
+do_urshr() is borrowed from sve_helper.c.
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201016184207.786698-2-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-12-peter.maydell@linaro.org
 ---
- linux-user/aarch64/signal.c | 10 ++++++++--
+ target/arm/helper-mve.h    | 10 ++++++++++
-file changed, 8 insertions(+), 2 deletions(-)
+ target/arm/mve.decode      | 11 +++++++++++
  target/arm/mve_helper.c    | 40 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate-mve.c | 15 ++++++++++++++
 files changed, 76 insertions(+)
-diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/aarch64/signal.c
+--- a/target/arm/helper-mve.h
-+++ b/linux-user/aarch64/signal.c
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vsriw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-             + offsetof(struct target_rt_frame_record, tramp);
+ DEF_HELPER_FLAGS_4(mve_vslib, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     }
+ DEF_HELPER_FLAGS_4(mve_vslih, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-     env->xregs[0] = usig;
+ DEF_HELPER_FLAGS_4(mve_vsliw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 -    env->xregs[31] = frame_addr;
      env->xregs[29] = frame_addr + fr_ofs;
 -    env->pc = ka->_sa_handler;
      env->xregs[30] = return_addr;
 +    env->xregs[31] = frame_addr;
 +    env->pc = ka->_sa_handler;
 +
-+    /* Invoke the signal handler as if by indirect call.  */
++DEF_HELPER_FLAGS_4(mve_vshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+    if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
++DEF_HELPER_FLAGS_4(mve_vshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+        env->btype = 2;
++DEF_HELPER_FLAGS_4(mve_vshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
 +DEF_HELPER_FLAGS_4(mve_vrshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(mve_vrshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve.decode
 +++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VSRI              111 1 1111 1 . ... ... ... 0 0100 0 1 . 1 ... 0 @2_shr_w
  VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
  VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
  VSLI              111 1 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
 +
 +# Narrowing shifts (which only support b and h sizes)
 +VSHRNB            111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
 +VSHRNB            111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
 +VSHRNT            111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
 +VSHRNT            111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
 +
 +VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
 +VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
 +VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
 +VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_INSERT(vsliw, 4, DO_SHL, SHL_MASK)
  DO_VSHLL_ALL(vshllb, false)
  DO_VSHLL_ALL(vshllt, true)
 +
 +/*
 + * Narrowing right shifts, taking a double sized input, shifting it
 + * and putting the result in either the top or bottom half of the output.
 + * ESIZE, TYPE are the output, and LESIZE, LTYPE the input.
 + */
 +#define DO_VSHRN(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)       \
 +    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
 +        LTYPE *m = vm;                                          \
 +        TYPE *d = vd;                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned le;                                            \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
 +            TYPE r = FN(m[H##LESIZE(le)], shift);               \
 +            mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +    }
 +
-     if (info) {
++#define DO_VSHRN_ALL(OP, FN)                                    \
-         tswap_siginfo(&frame->info, info);
++    DO_VSHRN(OP##bb, false, 1, uint8_t, 2, uint16_t, FN)        \
-         env->xregs[1] = frame_addr + offsetof(struct target_rt_sigframe, info);
++    DO_VSHRN(OP##bh, false, 2, uint16_t, 4, uint32_t, FN)       \
 +    DO_VSHRN(OP##tb, true, 1, uint8_t, 2, uint16_t, FN)         \
 +    DO_VSHRN(OP##th, true, 2, uint16_t, 4, uint32_t, FN)
 +
 +static inline uint64_t do_urshr(uint64_t x, unsigned sh)
 +{
 +    if (likely(sh < 64)) {
 +        return (x >> sh) + ((x >> (sh - 1)) & 1);
 +    } else if (sh == 64) {
 +        return x >> 63;
 +    } else {
 +        return 0;
 +    }
 +}
 +
 +DO_VSHRN_ALL(vshrn, DO_SHR)
 +DO_VSHRN_ALL(vrshrn, do_urshr)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_VSHLL(VSHLL_BS, vshllbs)
  DO_VSHLL(VSHLL_BU, vshllbu)
  DO_VSHLL(VSHLL_TS, vshllts)
  DO_VSHLL(VSHLL_TU, vshlltu)
 +
 +#define DO_2SHIFT_N(INSN, FN)                                   \
 +    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
 +    {                                                           \
 +        static MVEGenTwoOpShiftFn * const fns[] = {             \
 +            gen_helper_mve_##FN##b,                             \
 +            gen_helper_mve_##FN##h,                             \
 +        };                                                      \
 +        return do_2shift(s, a, fns[a->size], false);            \
 +    }
 +
 +DO_2SHIFT_N(VSHRNB, vshrnb)
 +DO_2SHIFT_N(VSHRNT, vshrnt)
 +DO_2SHIFT_N(VRSHRNB, vrshrnb)
 +DO_2SHIFT_N(VRSHRNT, vrshrnt)
 --
 .20.1
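
The difference between the truncating and rounding narrowing shifts in patch 17/24 can be sketched for the 16-bit to 8-bit case as below. This is a simplified illustration, not the QEMU helper: the function names are invented, predication and host-endian element indexing (the H macros) are ignored, and the shift range 1..8 is an assumption for this element size.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: these names are hypothetical, not QEMU functions. */

/* Truncating shift: bits shifted out are simply discarded. */
static uint8_t shrn16(uint16_t x, unsigned shift)
{
    assert(shift >= 1 && shift <= 8);
    return (uint8_t)(x >> shift);
}

/* Rounding shift: add the last bit shifted out, as do_urshr() does. */
static uint8_t rshrn16(uint16_t x, unsigned shift)
{
    assert(shift >= 1 && shift <= 8);
    return (uint8_t)((x >> shift) + ((x >> (shift - 1)) & 1));
}

/*
 * Narrow eight 16-bit inputs into either the even (bottom, top == 0) or
 * odd (top == 1) 8-bit elements of d, leaving the other half untouched.
 */
static void rshrn_vec(uint8_t d[16], const uint16_t m[8],
                      unsigned shift, int top)
{
    for (unsigned le = 0; le < 8; le++) {
        d[le * 2 + (top ? 1 : 0)] = rshrn16(m[le], shift);
    }
}
```

Note that neither variant saturates: rshrn16(0xffff, 8) rounds up to 0x100 and then wraps to 0x00 when narrowed.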

-[PULL 09/41] accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
+[PULL 18/24] target/arm: Implement MVE saturating narrowing shifts
-From: Richard Henderson <richard.henderson@linaro.org>
+Implement the MVE saturating shift-right-and-narrow insns
+VQSHRN, VQSHRUN, VQRSHRN and VQRSHRUN.
-On ARM, the Top Byte Ignore feature means that only 56 bits of
-the address are significant in the virtual address.  We are
+do_srshr() is borrowed from sve_helper.c.
-required to give the entire 64-bit address to FAR_ELx on fault,
 which means that we do not "clean" the top byte early in TCG.
 This new interface allows us to flush all 256 possible aliases
 for a given page, currently missed by tlb_flush_page*.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20201016210754.818257-2-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210628135835.6690-13-peter.maydell@linaro.org
 ---
- include/exec/exec-all.h |  36 ++++++
+ target/arm/helper-mve.h    |  30 +++++++++++
- accel/tcg/cputlb.c      | 275 ++++++++++++++++++++++++++++++++++++++--
+ target/arm/mve.decode      |  28 ++++++++++
-files changed, 302 insertions(+), 9 deletions(-)
+ target/arm/mve_helper.c    | 104 +++++++++++++++++++++++++++++++++++++
+ target/arm/translate-mve.c |  12 +++++
-diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
+files changed, 174 insertions(+)
-index XXXXXXX..XXXXXXX 100644
---- a/include/exec/exec-all.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-+++ b/include/exec/exec-all.h
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@ void tlb_flush_by_mmuidx_all_cpus(CPUState *cpu, uint16_t idxmap);
+--- a/target/arm/helper-mve.h
-  * depend on when the guests translation ends the TB.
++++ b/target/arm/helper-mve.h
-  */
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu, uint16_t idxmap);
+ DEF_HELPER_FLAGS_4(mve_vrshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+
+ DEF_HELPER_FLAGS_4(mve_vrshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+/**
+ DEF_HELPER_FLAGS_4(mve_vrshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+ * tlb_flush_page_bits_by_mmuidx
++
-+ * @cpu: CPU whose TLB should be flushed
++DEF_HELPER_FLAGS_4(mve_vqshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+ * @addr: virtual address of page to be flushed
++DEF_HELPER_FLAGS_4(mve_vqshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+ * @idxmap: bitmap of mmu indexes to flush
++DEF_HELPER_FLAGS_4(mve_vqshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+ * @bits: number of significant bits in address
++DEF_HELPER_FLAGS_4(mve_vqshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+ *
++
-+ * Similar to tlb_flush_page_mask, but with a bitmap of indexes.
++DEF_HELPER_FLAGS_4(mve_vqshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+ */
++DEF_HELPER_FLAGS_4(mve_vqshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, target_ulong addr,
++DEF_HELPER_FLAGS_4(mve_vqshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+                                   uint16_t idxmap, unsigned bits);
++DEF_HELPER_FLAGS_4(mve_vqshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +
-+/* Similarly, with broadcast and syncing. */
++DEF_HELPER_FLAGS_4(mve_vqshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu, target_ulong addr,
++DEF_HELPER_FLAGS_4(mve_vqshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+                                            uint16_t idxmap, unsigned bits);
++DEF_HELPER_FLAGS_4(mve_vqshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+void tlb_flush_page_bits_by_mmuidx_all_cpus_synced
++DEF_HELPER_FLAGS_4(mve_vqshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+    (CPUState *cpu, target_ulong addr, uint16_t idxmap, unsigned bits);
++
-+
++DEF_HELPER_FLAGS_4(mve_vqrshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- /**
++DEF_HELPER_FLAGS_4(mve_vqrshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-  * tlb_set_page_with_attrs:
++DEF_HELPER_FLAGS_4(mve_vqrshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-  * @cpu: CPU to add this TLB entry for
++DEF_HELPER_FLAGS_4(mve_vqrshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-@@ -XXX,XX +XXX,XX @@ static inline void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
++
-                                                        uint16_t idxmap)
++DEF_HELPER_FLAGS_4(mve_vqrshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- {
++DEF_HELPER_FLAGS_4(mve_vqrshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- }
++DEF_HELPER_FLAGS_4(mve_vqrshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+static inline void tlb_flush_page_bits_by_mmuidx(CPUState *cpu,
++DEF_HELPER_FLAGS_4(mve_vqrshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+                                                 target_ulong addr,
++
-+                                                 uint16_t idxmap,
++DEF_HELPER_FLAGS_4(mve_vqrshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+                                                 unsigned bits)
++DEF_HELPER_FLAGS_4(mve_vqrshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+{
++DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+}
++DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+static inline void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu,
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
-+                                                          target_ulong addr,
+index XXXXXXX..XXXXXXX 100644
-+                                                          uint16_t idxmap,
+--- a/target/arm/mve.decode
-+                                                          unsigned bits)
++++ b/target/arm/mve.decode
-+{
+@@ -XXX,XX +XXX,XX @@ VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
-+}
+ VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
-+static inline void
+ VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
-+tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *cpu, target_ulong addr,
+ VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
-+                                              uint16_t idxmap, unsigned bits)
++
-+{
++VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
-+}
++VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
- #endif
++VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
- /**
++VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
-  * probe_access:
++VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
-diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
++VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
-index XXXXXXX..XXXXXXX 100644
++VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
---- a/accel/tcg/cputlb.c
++VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
-+++ b/accel/tcg/cputlb.c
++
-@@ -XXX,XX +XXX,XX @@ void tlb_flush_all_cpus_synced(CPUState *src_cpu)
++VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
-     tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, ALL_MMUIDX_BITS);
++VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
- }
++VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
++VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
-+static bool tlb_hit_page_mask_anyprot(CPUTLBEntry *tlb_entry,
++
-+                                      target_ulong page, target_ulong mask)
++VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
-+{
++VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
-+    page &= mask;
++VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
-+    mask &= TARGET_PAGE_MASK | TLB_INVALID_MASK;
++VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
-+
++VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
-+    return (page == (tlb_entry->addr_read & mask) ||
++VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
-+            page == (tlb_addr_write(tlb_entry) & mask) ||
++VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
-+            page == (tlb_entry->addr_code & mask));
++VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
-+}
++
-+
++VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
- static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry,
++VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
-                                         target_ulong page)
++VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
- {
++VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
--    return tlb_hit_page(tlb_entry->addr_read, page) ||
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
--           tlb_hit_page(tlb_addr_write(tlb_entry), page) ||
+index XXXXXXX..XXXXXXX 100644
--           tlb_hit_page(tlb_entry->addr_code, page);
+--- a/target/arm/mve_helper.c
-+    return tlb_hit_page_mask_anyprot(tlb_entry, page, -1);
++++ b/target/arm/mve_helper.c
- }
+@@ -XXX,XX +XXX,XX @@ static inline uint64_t do_urshr(uint64_t x, unsigned sh)
  /**
@@ -XXX,XX +XXX,XX @@ static inline bool tlb_entry_is_empty(const CPUTLBEntry *te)
  }
  /* Called with tlb_c.lock held */
 -static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry,
 -                                          target_ulong page)
 +static bool tlb_flush_entry_mask_locked(CPUTLBEntry *tlb_entry,
 +                                        target_ulong page,
 +                                        target_ulong mask)
  {
 -    if (tlb_hit_page_anyprot(tlb_entry, page)) {
 +    if (tlb_hit_page_mask_anyprot(tlb_entry, page, mask)) {
          memset(tlb_entry, -1, sizeof(*tlb_entry));
          return true;
      }
      return false;
  }
 +static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry,
 +                                          target_ulong page)
 +{
 +    return tlb_flush_entry_mask_locked(tlb_entry, page, -1);
 +}
 +
  /* Called with tlb_c.lock held */
 -static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
 -                                              target_ulong page)
 +static void tlb_flush_vtlb_page_mask_locked(CPUArchState *env, int mmu_idx,
 +                                            target_ulong page,
 +                                            target_ulong mask)
  {
      CPUTLBDesc *d = &env_tlb(env)->d[mmu_idx];
      int k;
      assert_cpu_is_self(env_cpu(env));
      for (k = 0; k < CPU_VTLB_SIZE; k++) {
 -        if (tlb_flush_entry_locked(&d->vtable[k], page)) {
 +        if (tlb_flush_entry_mask_locked(&d->vtable[k], page, mask)) {
              tlb_n_used_entries_dec(env, mmu_idx);
          }
      }
  }
-+static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
++static inline int64_t do_srshr(int64_t x, unsigned sh)
 +                                              target_ulong page)
 +{
-+    tlb_flush_vtlb_page_mask_locked(env, mmu_idx, page, -1);
++    if (likely(sh < 64)) {
-+}
++        return (x >> sh) + ((x >> (sh - 1)) & 1);
 +
  static void tlb_flush_page_locked(CPUArchState *env, int midx,
                                    target_ulong page)
  {
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page_all_cpus_synced(CPUState *src, target_ulong addr)
      tlb_flush_page_by_mmuidx_all_cpus_synced(src, addr, ALL_MMUIDX_BITS);
  }
 +static void tlb_flush_page_bits_locked(CPUArchState *env, int midx,
 +                                       target_ulong page, unsigned bits)
 +{
 +    CPUTLBDesc *d = &env_tlb(env)->d[midx];
 +    CPUTLBDescFast *f = &env_tlb(env)->f[midx];
 +    target_ulong mask = MAKE_64BIT_MASK(0, bits);
 +
 +    /*
 +     * If @bits is smaller than the tlb size, there may be multiple entries
 +     * within the TLB; otherwise all addresses that match under @mask hit
 +     * the same TLB entry.
 +     *
 +     * TODO: Perhaps allow bits to be a few bits less than the size.
 +     * For now, just flush the entire TLB.
 +     */
 +    if (mask < f->mask) {
 +        tlb_debug("forcing full flush midx %d ("
 +                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
 +                  midx, page, mask);
 +        tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
 +        return;
 +    }
 +
 +    /* Check if we need to flush due to large pages.  */
 +    if ((page & d->large_page_mask) == d->large_page_addr) {
 +        tlb_debug("forcing full flush midx %d ("
 +                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
 +                  midx, d->large_page_addr, d->large_page_mask);
 +        tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
 +        return;
 +    }
 +
 +    if (tlb_flush_entry_mask_locked(tlb_entry(env, midx, page), page, mask)) {
 +        tlb_n_used_entries_dec(env, midx);
 +    }
 +    tlb_flush_vtlb_page_mask_locked(env, midx, page, mask);
 +}
 +
 +typedef struct {
 +    target_ulong addr;
 +    uint16_t idxmap;
 +    uint16_t bits;
 +} TLBFlushPageBitsByMMUIdxData;
 +
 +static void
 +tlb_flush_page_bits_by_mmuidx_async_0(CPUState *cpu,
 +                                      TLBFlushPageBitsByMMUIdxData d)
 +{
 +    CPUArchState *env = cpu->env_ptr;
 +    int mmu_idx;
 +
 +    assert_cpu_is_self(cpu);
 +
 +    tlb_debug("page addr:" TARGET_FMT_lx "/%u mmu_map:0x%x\n",
 +              d.addr, d.bits, d.idxmap);
 +
 +    qemu_spin_lock(&env_tlb(env)->c.lock);
 +    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
 +        if ((d.idxmap >> mmu_idx) & 1) {
 +            tlb_flush_page_bits_locked(env, mmu_idx, d.addr, d.bits);
 +        }
 +    }
 +    qemu_spin_unlock(&env_tlb(env)->c.lock);
 +
 +    tb_flush_jmp_cache(cpu, d.addr);
 +}
 +
 +static bool encode_pbm_to_runon(run_on_cpu_data *out,
 +                                TLBFlushPageBitsByMMUIdxData d)
 +{
 +    /* We need 6 bits to hold @bits up to 63. */
 +    if (d.idxmap <= MAKE_64BIT_MASK(0, TARGET_PAGE_BITS - 6)) {
 +        *out = RUN_ON_CPU_TARGET_PTR(d.addr | (d.idxmap << 6) | d.bits);
 +        return true;
 +    }
 +    return false;
 +}
 +
 +static TLBFlushPageBitsByMMUIdxData
 +decode_runon_to_pbm(run_on_cpu_data data)
 +{
 +    target_ulong addr_map_bits = (target_ulong) data.target_ptr;
 +    return (TLBFlushPageBitsByMMUIdxData){
 +        .addr = addr_map_bits & TARGET_PAGE_MASK,
 +        .idxmap = (addr_map_bits & ~TARGET_PAGE_MASK) >> 6,
 +        .bits = addr_map_bits & 0x3f
 +    };
 +}
 +
 +static void tlb_flush_page_bits_by_mmuidx_async_1(CPUState *cpu,
 +                                                  run_on_cpu_data runon)
 +{
 +    tlb_flush_page_bits_by_mmuidx_async_0(cpu, decode_runon_to_pbm(runon));
 +}
 +
 +static void tlb_flush_page_bits_by_mmuidx_async_2(CPUState *cpu,
 +                                                  run_on_cpu_data data)
 +{
 +    TLBFlushPageBitsByMMUIdxData *d = data.host_ptr;
 +    tlb_flush_page_bits_by_mmuidx_async_0(cpu, *d);
 +    g_free(d);
 +}
 +
 +void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, target_ulong addr,
 +                                   uint16_t idxmap, unsigned bits)
 +{
 +    TLBFlushPageBitsByMMUIdxData d;
 +    run_on_cpu_data runon;
 +
 +    /* If all bits are significant, this devolves to tlb_flush_page. */
 +    if (bits >= TARGET_LONG_BITS) {
 +        tlb_flush_page_by_mmuidx(cpu, addr, idxmap);
 +        return;
 +    }
 +    /* If no page bits are significant, this devolves to tlb_flush. */
 +    if (bits < TARGET_PAGE_BITS) {
 +        tlb_flush_by_mmuidx(cpu, idxmap);
 +        return;
 +    }
 +
 +    /* This should already be page aligned */
 +    d.addr = addr & TARGET_PAGE_MASK;
 +    d.idxmap = idxmap;
 +    d.bits = bits;
 +
 +    if (qemu_cpu_is_self(cpu)) {
 +        tlb_flush_page_bits_by_mmuidx_async_0(cpu, d);
 +    } else if (encode_pbm_to_runon(&runon, d)) {
 +        async_run_on_cpu(cpu, tlb_flush_page_bits_by_mmuidx_async_1, runon);
 +    } else {
-+        TLBFlushPageBitsByMMUIdxData *p
++        /* Rounding the sign bit always produces 0. */
-+            = g_new(TLBFlushPageBitsByMMUIdxData, 1);
++        return 0;
 +
 +        /* Otherwise allocate a structure, freed by the worker.  */
 +        *p = d;
 +        async_run_on_cpu(cpu, tlb_flush_page_bits_by_mmuidx_async_2,
 +                         RUN_ON_CPU_HOST_PTR(p));
 +    }
 +}
 +
-+void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *src_cpu,
+ DO_VSHRN_ALL(vshrn, DO_SHR)
-+                                            target_ulong addr,
+ DO_VSHRN_ALL(vrshrn, do_urshr)
-+                                            uint16_t idxmap,
++
-+                                            unsigned bits)
++static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
 +                                 bool *satp)
 +{
-+    TLBFlushPageBitsByMMUIdxData d;
++    if (val > max) {
-+    run_on_cpu_data runon;
++        *satp = true;
-+
++        return max;
-+    /* If all bits are significant, this devolves to tlb_flush_page. */
++    } else if (val < min) {
-+    if (bits >= TARGET_LONG_BITS) {
++        *satp = true;
-+        tlb_flush_page_by_mmuidx_all_cpus(src_cpu, addr, idxmap);
++        return min;
 +        return;
 +    }
 +    /* If no page bits are significant, this devolves to tlb_flush. */
 +    if (bits < TARGET_PAGE_BITS) {
 +        tlb_flush_by_mmuidx_all_cpus(src_cpu, idxmap);
 +        return;
 +    }
 +
 +    /* This should already be page aligned */
 +    d.addr = addr & TARGET_PAGE_MASK;
 +    d.idxmap = idxmap;
 +    d.bits = bits;
 +
 +    if (encode_pbm_to_runon(&runon, d)) {
 +        flush_all_helper(src_cpu, tlb_flush_page_bits_by_mmuidx_async_1, runon);
 +    } else {
-+        CPUState *dst_cpu;
++        return val;
 +        TLBFlushPageBitsByMMUIdxData *p;
 +
 +        /* Allocate a separate data block for each destination cpu.  */
 +        CPU_FOREACH(dst_cpu) {
 +            if (dst_cpu != src_cpu) {
 +                p = g_new(TLBFlushPageBitsByMMUIdxData, 1);
 +                *p = d;
 +                async_run_on_cpu(dst_cpu,
 +                                 tlb_flush_page_bits_by_mmuidx_async_2,
 +                                 RUN_ON_CPU_HOST_PTR(p));
 +            }
 +        }
 +    }
 +
 +    tlb_flush_page_bits_by_mmuidx_async_0(src_cpu, d);
 +}
 +
 +void tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
 +                                                   target_ulong addr,
 +                                                   uint16_t idxmap,
 +                                                   unsigned bits)
 +{
 +    TLBFlushPageBitsByMMUIdxData d;
 +    run_on_cpu_data runon;
 +
 +    /* If all bits are significant, this devolves to tlb_flush_page. */
 +    if (bits >= TARGET_LONG_BITS) {
 +        tlb_flush_page_by_mmuidx_all_cpus_synced(src_cpu, addr, idxmap);
 +        return;
 +    }
 +    /* If no page bits are significant, this devolves to tlb_flush. */
 +    if (bits < TARGET_PAGE_BITS) {
 +        tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, idxmap);
 +        return;
 +    }
 +
 +    /* This should already be page aligned */
 +    d.addr = addr & TARGET_PAGE_MASK;
 +    d.idxmap = idxmap;
 +    d.bits = bits;
 +
 +    if (encode_pbm_to_runon(&runon, d)) {
 +        flush_all_helper(src_cpu, tlb_flush_page_bits_by_mmuidx_async_1, runon);
 +        async_safe_run_on_cpu(src_cpu, tlb_flush_page_bits_by_mmuidx_async_1,
 +                              runon);
 +    } else {
 +        CPUState *dst_cpu;
 +        TLBFlushPageBitsByMMUIdxData *p;
 +
 +        /* Allocate a separate data block for each destination cpu.  */
 +        CPU_FOREACH(dst_cpu) {
 +            if (dst_cpu != src_cpu) {
 +                p = g_new(TLBFlushPageBitsByMMUIdxData, 1);
 +                *p = d;
 +                async_run_on_cpu(dst_cpu, tlb_flush_page_bits_by_mmuidx_async_2,
 +                                 RUN_ON_CPU_HOST_PTR(p));
 +            }
 +        }
 +
 +        p = g_new(TLBFlushPageBitsByMMUIdxData, 1);
 +        *p = d;
 +        async_safe_run_on_cpu(src_cpu, tlb_flush_page_bits_by_mmuidx_async_2,
 +                              RUN_ON_CPU_HOST_PTR(p));
 +    }
 +}
 +
- /* update the TLBs so that writes to code in the virtual page 'addr'
++/* Saturating narrowing right shifts */
-    can be detected */
++#define DO_VSHRN_SAT(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)   \
- void tlb_protect_code(ram_addr_t ram_addr)
++    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
 +                                void *vm, uint32_t shift)       \
 +    {                                                           \
 +        LTYPE *m = vm;                                          \
 +        TYPE *d = vd;                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        bool qc = false;                                        \
 +        unsigned le;                                            \
 +        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
 +            bool sat = false;                                   \
 +            TYPE r = FN(m[H##LESIZE(le)], shift, &sat);         \
 +            mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
 +            qc |= sat && (mask & 1 << (TOP * ESIZE));           \
 +        }                                                       \
 +        if (qc) {                                               \
 +            env->vfp.qc[0] = qc;                                \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +    }
 +
 +#define DO_VSHRN_SAT_UB(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 1, uint8_t, 2, uint16_t, FN)       \
 +    DO_VSHRN_SAT(TOP, true, 1, uint8_t, 2, uint16_t, FN)
 +
 +#define DO_VSHRN_SAT_UH(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 2, uint16_t, 4, uint32_t, FN)      \
 +    DO_VSHRN_SAT(TOP, true, 2, uint16_t, 4, uint32_t, FN)
 +
 +#define DO_VSHRN_SAT_SB(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 1, int8_t, 2, int16_t, FN)         \
 +    DO_VSHRN_SAT(TOP, true, 1, int8_t, 2, int16_t, FN)
 +
 +#define DO_VSHRN_SAT_SH(BOP, TOP, FN)                           \
 +    DO_VSHRN_SAT(BOP, false, 2, int16_t, 4, int32_t, FN)        \
 +    DO_VSHRN_SAT(TOP, true, 2, int16_t, 4, int32_t, FN)
 +
 +#define DO_SHRN_SB(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), INT8_MIN, INT8_MAX, SATP)
 +#define DO_SHRN_UB(N, M, SATP)                                  \
 +    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT8_MAX, SATP)
 +#define DO_SHRUN_B(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), 0, UINT8_MAX, SATP)
 +
 +#define DO_SHRN_SH(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), INT16_MIN, INT16_MAX, SATP)
 +#define DO_SHRN_UH(N, M, SATP)                                  \
 +    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT16_MAX, SATP)
 +#define DO_SHRUN_H(N, M, SATP)                                  \
 +    do_sat_bhs((int64_t)(N) >> (M), 0, UINT16_MAX, SATP)
 +
 +#define DO_RSHRN_SB(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), INT8_MIN, INT8_MAX, SATP)
 +#define DO_RSHRN_UB(N, M, SATP)                                 \
 +    do_sat_bhs(do_urshr(N, M), 0, UINT8_MAX, SATP)
 +#define DO_RSHRUN_B(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), 0, UINT8_MAX, SATP)
 +
 +#define DO_RSHRN_SH(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), INT16_MIN, INT16_MAX, SATP)
 +#define DO_RSHRN_UH(N, M, SATP)                                 \
 +    do_sat_bhs(do_urshr(N, M), 0, UINT16_MAX, SATP)
 +#define DO_RSHRUN_H(N, M, SATP)                                 \
 +    do_sat_bhs(do_srshr(N, M), 0, UINT16_MAX, SATP)
 +
 +DO_VSHRN_SAT_SB(vqshrnb_sb, vqshrnt_sb, DO_SHRN_SB)
 +DO_VSHRN_SAT_SH(vqshrnb_sh, vqshrnt_sh, DO_SHRN_SH)
 +DO_VSHRN_SAT_UB(vqshrnb_ub, vqshrnt_ub, DO_SHRN_UB)
 +DO_VSHRN_SAT_UH(vqshrnb_uh, vqshrnt_uh, DO_SHRN_UH)
 +DO_VSHRN_SAT_SB(vqshrunbb, vqshruntb, DO_SHRUN_B)
 +DO_VSHRN_SAT_SH(vqshrunbh, vqshrunth, DO_SHRUN_H)
 +
 +DO_VSHRN_SAT_SB(vqrshrnb_sb, vqrshrnt_sb, DO_RSHRN_SB)
 +DO_VSHRN_SAT_SH(vqrshrnb_sh, vqrshrnt_sh, DO_RSHRN_SH)
 +DO_VSHRN_SAT_UB(vqrshrnb_ub, vqrshrnt_ub, DO_RSHRN_UB)
 +DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
 +DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
 +DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_N(VSHRNB, vshrnb)
  DO_2SHIFT_N(VSHRNT, vshrnt)
  DO_2SHIFT_N(VRSHRNB, vrshrnb)
  DO_2SHIFT_N(VRSHRNT, vrshrnt)
 +DO_2SHIFT_N(VQSHRNB_S, vqshrnb_s)
 +DO_2SHIFT_N(VQSHRNT_S, vqshrnt_s)
 +DO_2SHIFT_N(VQSHRNB_U, vqshrnb_u)
 +DO_2SHIFT_N(VQSHRNT_U, vqshrnt_u)
 +DO_2SHIFT_N(VQSHRUNB, vqshrunb)
 +DO_2SHIFT_N(VQSHRUNT, vqshrunt)
 +DO_2SHIFT_N(VQRSHRNB_S, vqrshrnb_s)
 +DO_2SHIFT_N(VQRSHRNT_S, vqrshrnt_s)
 +DO_2SHIFT_N(VQRSHRNB_U, vqrshrnb_u)
 +DO_2SHIFT_N(VQRSHRNT_U, vqrshrnt_u)
 +DO_2SHIFT_N(VQRSHRUNB, vqrshrunb)
 +DO_2SHIFT_N(VQRSHRUNT, vqrshrunt)
 --
 .20.1
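
The saturating narrowing shifts in the patch above (do_sat_bhs plus the
DO_SHRN_* macros) can be tried in isolation. The following is only a minimal
sketch for one element size, not the helper code itself: the names sat_clamp,
sat_shrn_s16_to_s8 and sat_shrun_s16_to_u8 are hypothetical, predication is
ignored, and it assumes the compiler implements >> on negative signed values
as an arithmetic shift (true of the usual compilers, but implementation-defined
in C).

```c
#include <stdbool.h>
#include <stdint.h>

/* Clamp val to [min, max]; set *satp if the value had to be clamped. */
static int64_t sat_clamp(int64_t val, int64_t min, int64_t max, bool *satp)
{
    if (val > max) {
        *satp = true;
        return max;
    } else if (val < min) {
        *satp = true;
        return min;
    }
    return val;
}

/* Signed 16-bit input, signed 8-bit result, truncating shift (1..8). */
static int8_t sat_shrn_s16_to_s8(int16_t n, unsigned shift, bool *satp)
{
    return (int8_t)sat_clamp((int64_t)n >> shift, INT8_MIN, INT8_MAX, satp);
}

/* Signed 16-bit input, unsigned 8-bit result ("unsigned narrow"). */
static uint8_t sat_shrun_s16_to_u8(int16_t n, unsigned shift, bool *satp)
{
    return (uint8_t)sat_clamp((int64_t)n >> shift, 0, UINT8_MAX, satp);
}
```

The caller owns the saturation flag and is expected to clear it before each
element, in the same way the helper macro keeps a per-element 'sat' and only
folds it into the sticky QC bit when the predicate mask allows.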

-[PULL 02/41] target/arm: AArch32 VCVT fixed-point to float is always round-to-nearest
+[PULL 19/24] target/arm: Implement MVE VSHLC
-For AArch32, unlike the VCVT of integer to float, which honours the
+Implement the MVE VSHLC insn, which performs a shift left of the
-rounding mode specified by the FPSCR, VCVT of fixed-point to float is
+entire vector with carry in bits provided from a general purpose
-always round-to-nearest. (AArch64 fixed-point-to-float conversions
+register and carry out bits written back to that register.
 always honour the FPCR rounding mode.)
 Implement this by providing _round_to_nearest versions of the
 relevant helpers which set the rounding mode temporarily when making
 the call to the underlying softfloat function.
 We only need to change the VFP VCVT instructions, because the
 standard-FPSCR value used by the Neon VCVT is always set to
 round-to-nearest, so we don't need to do the extra work of saving
 and restoring the rounding mode.
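
The save/override/restore pattern described above for the _round_to_nearest
helpers can be illustrated without softfloat. This is a minimal sketch under
stated assumptions: Status, RoundMode, scale_down and
scale_down_round_to_nearest are hypothetical stand-ins (an integer scaling
routine takes the place of the real conversion function), chosen only so the
example is self-contained.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a rounding-mode enum and a status struct. */
typedef enum { ROUND_NEAREST_EVEN, ROUND_TO_ZERO } RoundMode;
typedef struct { RoundMode rounding_mode; } Status;

/* Divide x by 2^shift (shift in 1..31), rounding per st->rounding_mode. */
static uint32_t scale_down(uint32_t x, unsigned shift, const Status *st)
{
    uint32_t q = x >> shift;
    uint32_t rem = x & ((1u << shift) - 1);
    uint32_t half = 1u << (shift - 1);

    if (st->rounding_mode == ROUND_NEAREST_EVEN) {
        /* Round up above the halfway point, or on a tie if q is odd. */
        if (rem > half || (rem == half && (q & 1))) {
            q++;
        }
    }
    return q;
}

/* Same operation, but always round-to-nearest regardless of *st. */
static uint32_t scale_down_round_to_nearest(uint32_t x, unsigned shift,
                                            Status *st)
{
    RoundMode oldmode = st->rounding_mode;
    uint32_t ret;

    st->rounding_mode = ROUND_NEAREST_EVEN;  /* temporary override */
    ret = scale_down(x, shift, st);
    st->rounding_mode = oldmode;             /* restore caller's mode */
    return ret;
}
```

The wrapper leaves the caller-visible rounding mode untouched, which is the
property the commit message relies on.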
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201013103532.13391-1-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-14-peter.maydell@linaro.org
 ---
- target/arm/helper.h            | 13 +++++++++++++
+ target/arm/helper-mve.h    |  2 ++
- target/arm/vfp_helper.c        | 23 ++++++++++++++++++++++-
+ target/arm/mve.decode      |  2 ++
- target/arm/translate-vfp.c.inc | 24 ++++++++++++------------
+ target/arm/mve_helper.c    | 38 ++++++++++++++++++++++++++++++++++++++
-files changed, 47 insertions(+), 13 deletions(-)
+ target/arm/translate-mve.c | 30 ++++++++++++++++++++++++++++++
 files changed, 72 insertions(+)
-diff --git a/target/arm/helper.h b/target/arm/helper.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/helper.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
+ DEF_HELPER_FLAGS_4(mve_vqrshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
- DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
+ DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 +DEF_HELPER_3(vfp_shtos_round_to_nearest, f32, i32, i32, ptr)
 +DEF_HELPER_3(vfp_sltos_round_to_nearest, f32, i32, i32, ptr)
 +DEF_HELPER_3(vfp_uhtos_round_to_nearest, f32, i32, i32, ptr)
 +DEF_HELPER_3(vfp_ultos_round_to_nearest, f32, i32, i32, ptr)
 +DEF_HELPER_3(vfp_shtod_round_to_nearest, f64, i64, i32, ptr)
 +DEF_HELPER_3(vfp_sltod_round_to_nearest, f64, i64, i32, ptr)
 +DEF_HELPER_3(vfp_uhtod_round_to_nearest, f64, i64, i32, ptr)
 +DEF_HELPER_3(vfp_ultod_round_to_nearest, f64, i64, i32, ptr)
 +DEF_HELPER_3(vfp_shtoh_round_to_nearest, f16, i32, i32, ptr)
 +DEF_HELPER_3(vfp_uhtoh_round_to_nearest, f16, i32, i32, ptr)
 +DEF_HELPER_3(vfp_sltoh_round_to_nearest, f16, i32, i32, ptr)
 +DEF_HELPER_3(vfp_ultoh_round_to_nearest, f16, i32, i32, ptr)
 +
- DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
++DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
+diff --git a/target/arm/mve.decode b/target/arm/mve.decode
  DEF_HELPER_FLAGS_3(vfp_fcvt_f16_to_f32, TCG_CALL_NO_RWG, f32, f16, ptr, i32)
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp_helper.c
+--- a/target/arm/mve.decode
-+++ b/target/arm/vfp_helper.c
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
-     return float64_to_float32(x, &env->vfp.fp_status);
+ VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
- }
+ VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
+ VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
--/* VFP3 fixed point conversion.  */
++
-+/*
++VSHLC             111 0 1110 1 . 1 imm:5 ... 0 1111 1100 rdm:4 qd=%qd
-+ * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-+ * must always round-to-nearest; the AArch64 ones honour the FPSCR
+index XXXXXXX..XXXXXXX 100644
-+ * rounding mode. (For AArch32 Neon the standard-FPSCR is set to
+--- a/target/arm/mve_helper.c
-+ * round-to-nearest so either helper will work.) AArch32 float-to-fix
++++ b/target/arm/mve_helper.c
-+ * must round-to-zero.
+@@ -XXX,XX +XXX,XX @@ DO_VSHRN_SAT_UB(vqrshrnb_ub, vqrshrnt_ub, DO_RSHRN_UB)
-+ */
+ DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
- #define VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype)            \
+ DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
- ftype HELPER(vfp_##name##to##p)(uint##isz##_t  x, uint32_t shift,      \
+ DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
-                                      void *fpstp) \
++
- { return itype##_to_##float##fsz##_scalbn(x, -shift, fpstp); }
++uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
++                           uint32_t shift)
-+#define VFP_CONV_FIX_FLOAT_ROUND(name, p, fsz, ftype, isz, itype)      \
++{
-+    ftype HELPER(vfp_##name##to##p##_round_to_nearest)(uint##isz##_t  x, \
++    uint32_t *d = vd;
-+                                                     uint32_t shift,   \
++    uint16_t mask = mve_element_mask(env);
-+                                                     void *fpstp)      \
++    unsigned e;
-+    {                                                                  \
++    uint32_t r;
-+        ftype ret;                                                     \
++
-+        float_status *fpst = fpstp;                                    \
++    /*
-+        FloatRoundMode oldmode = fpst->float_rounding_mode;            \
++     * For each 32-bit element, we shift it left, bringing in the
-+        fpst->float_rounding_mode = float_round_nearest_even;          \
++     * low 'shift' bits of rdm at the bottom. Bits shifted out at
-+        ret = itype##_to_##float##fsz##_scalbn(x, -shift, fpstp);      \
++     * the top become the new rdm, if the predicate mask permits.
-+        fpst->float_rounding_mode = oldmode;                           \
++     * The final rdm value is returned to update the register.
-+        return ret;                                                    \
++     * shift == 0 here means "shift by 32 bits".
 +     */
 +    if (shift == 0) {
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {
 +            r = rdm;
 +            if (mask & 1) {
 +                rdm = d[H4(e)];
 +            }
 +            mergemask(&d[H4(e)], r, mask);
 +        }
 +    } else {
 +        uint32_t shiftmask = MAKE_64BIT_MASK(0, shift);
 +
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {
 +            r = (d[H4(e)] << shift) | (rdm & shiftmask);
 +            if (mask & 1) {
 +                rdm = d[H4(e)] >> (32 - shift);
 +            }
 +            mergemask(&d[H4(e)], r, mask);
 +        }
 +    }
 +    mve_advance_vpt(env);
 +    return rdm;
 +}
 diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-mve.c
 +++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_N(VQRSHRNB_U, vqrshrnb_u)
  DO_2SHIFT_N(VQRSHRNT_U, vqrshrnt_u)
  DO_2SHIFT_N(VQRSHRUNB, vqrshrunb)
  DO_2SHIFT_N(VQRSHRUNT, vqrshrunt)
 +
 +static bool trans_VSHLC(DisasContext *s, arg_VSHLC *a)
 +{
 +    /*
 +     * Whole Vector Left Shift with Carry. The carry is taken
 +     * from a general purpose register and written back there.
 +     * An imm of 0 means "shift by 32".
 +     */
 +    TCGv_ptr qd;
 +    TCGv_i32 rdm;
 +
 +    if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd)) {
 +        return false;
 +    }
 +    if (a->rdm == 13 || a->rdm == 15) {
 +        /* CONSTRAINED UNPREDICTABLE: we UNDEF */
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
- #define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype, ROUND, suff) \
++    qd = mve_qreg_ptr(a->qd);
- uint##isz##_t HELPER(vfp_to##name##p##suff)(ftype x, uint32_t shift,      \
++    rdm = load_reg(s, a->rdm);
-                                             void *fpst)                   \
++    gen_helper_mve_vshlc(rdm, cpu_env, qd, rdm, tcg_constant_i32(a->imm));
-@@ -XXX,XX +XXX,XX @@ uint##isz##_t HELPER(vfp_to##name##p##suff)(ftype x, uint32_t shift,      \
++    store_reg(s, a->rdm, rdm);
++    tcg_temp_free_ptr(qd);
- #define VFP_CONV_FIX(name, p, fsz, ftype, isz, itype)            \
++    mve_update_eci(s);
- VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype)              \
++    return true;
-+VFP_CONV_FIX_FLOAT_ROUND(name, p, fsz, ftype, isz, itype)        \
++}
  VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype,        \
                           float_round_to_zero, _round_to_zero)    \
  VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype,        \
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
      /* Switch on op:U:sx bits */
      switch (a->opc) {
      case 0:
 -        gen_helper_vfp_shtoh(vd, vd, shift, fpst);
 +        gen_helper_vfp_shtoh_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 1:
 -        gen_helper_vfp_sltoh(vd, vd, shift, fpst);
 +        gen_helper_vfp_sltoh_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 2:
 -        gen_helper_vfp_uhtoh(vd, vd, shift, fpst);
 +        gen_helper_vfp_uhtoh_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 3:
 -        gen_helper_vfp_ultoh(vd, vd, shift, fpst);
 +        gen_helper_vfp_ultoh_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 4:
          gen_helper_vfp_toshh_round_to_zero(vd, vd, shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
      /* Switch on op:U:sx bits */
      switch (a->opc) {
      case 0:
 -        gen_helper_vfp_shtos(vd, vd, shift, fpst);
 +        gen_helper_vfp_shtos_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 1:
 -        gen_helper_vfp_sltos(vd, vd, shift, fpst);
 +        gen_helper_vfp_sltos_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 2:
 -        gen_helper_vfp_uhtos(vd, vd, shift, fpst);
 +        gen_helper_vfp_uhtos_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 3:
 -        gen_helper_vfp_ultos(vd, vd, shift, fpst);
 +        gen_helper_vfp_ultos_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 4:
          gen_helper_vfp_toshs_round_to_zero(vd, vd, shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
      /* Switch on op:U:sx bits */
      switch (a->opc) {
      case 0:
 -        gen_helper_vfp_shtod(vd, vd, shift, fpst);
 +        gen_helper_vfp_shtod_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 1:
 -        gen_helper_vfp_sltod(vd, vd, shift, fpst);
 +        gen_helper_vfp_sltod_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 2:
 -        gen_helper_vfp_uhtod(vd, vd, shift, fpst);
 +        gen_helper_vfp_uhtod_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 3:
 -        gen_helper_vfp_ultod(vd, vd, shift, fpst);
 +        gen_helper_vfp_ultod_round_to_nearest(vd, vd, shift, fpst);
          break;
      case 4:
          gen_helper_vfp_toshd_round_to_zero(vd, vd, shift, fpst);
 --
 .20.1
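
The VSHLC behaviour described in the patch above (whole-vector left shift with
carry in and carry out through a general purpose register) can be sketched as
plain C. This is an illustration only, under stated assumptions: vshlc_sketch
is a hypothetical name, all lanes are treated as active (no predication), the
host element order is used directly, and the shift count is passed as 1..32
rather than with the instruction's "0 means 32" encoding.

```c
#include <stdint.h>

/*
 * Shift four 32-bit elements left by 'shift' bits (1..32) as one chain.
 * The low 'shift' bits of 'carry' enter element 0; the bits shifted out
 * of each element feed the next one. Returns the final carry-out.
 */
static uint32_t vshlc_sketch(uint32_t d[4], uint32_t carry, unsigned shift)
{
    for (unsigned e = 0; e < 4; e++) {
        uint32_t old = d[e];

        if (shift == 32) {
            /* Whole-element move: avoids an undefined 32-bit shift. */
            d[e] = carry;
            carry = old;
        } else {
            uint32_t lowmask = (1u << shift) - 1;

            d[e] = (old << shift) | (carry & lowmask);
            carry = old >> (32 - shift);
        }
    }
    return carry;
}
```

Handling the 32-bit case separately mirrors the helper, which also splits on
shift == 0 so that no C shift by the full type width is ever evaluated.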

-[PULL 04/41] hw/arm: Restrict APEI tables generation to the 'virt' machine
+Deleted patch
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
-While APEI is a generic ACPI feature (usable by X86 and ARM64), only
-the 'virt' machine uses it, by enabling the RAS Virtualization. See
-commit 2afa8c8519 ("hw/arm/virt: Introduce a RAS machine option").
-Restrict the APEI tables generation code to the single user: the virt
-machine. If another machine wants to use it, it simply has to 'select
-ACPI_APEI' in its Kconfig.
-Fixes: aa16508f1d ("ACPI: Build related register address fields via hardware error fw_cfg blob")
-Acked-by: Michael S. Tsirkin <mst@redhat.com>
-Reviewed-by: Dongjiu Geng <gengdongjiu@huawei.com>
-Acked-by: Laszlo Ersek <lersek@redhat.com>
-Reviewed-by: Igor Mammedov <imammedo@redhat.com>
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20201008161414.2672569-1-philmd@redhat.com
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- default-configs/devices/arm-softmmu.mak | 1 -
- hw/arm/Kconfig                          | 1 +
-files changed, 1 insertion(+), 1 deletion(-)
-diff --git a/default-configs/devices/arm-softmmu.mak b/default-configs/devices/arm-softmmu.mak
-index XXXXXXX..XXXXXXX 100644
---- a/default-configs/devices/arm-softmmu.mak
-+++ b/default-configs/devices/arm-softmmu.mak
-@@ -XXX,XX +XXX,XX @@ CONFIG_FSL_IMX7=y
- CONFIG_FSL_IMX6UL=y
- CONFIG_SEMIHOSTING=y
- CONFIG_ALLWINNER_H3=y
--CONFIG_ACPI_APEI=y
-diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/Kconfig
-+++ b/hw/arm/Kconfig
-@@ -XXX,XX +XXX,XX @@ config ARM_VIRT
-     select ACPI_MEMORY_HOTPLUG
-     select ACPI_HW_REDUCED
-     select ACPI_NVDIMM
-+    select ACPI_APEI
- config CHEETAH
-     bool
---
-.20.1

-[PULL 05/41] hw/timer/bcm2835: Introduce BCM2835_SYSTIMER_COUNT definition
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Use the BCM2835_SYSTIMER_COUNT definition instead of the
-magic '4' value.
-Reviewed-by: Luc Michel <luc.michel@greensocs.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201010203709.3116542-2-f4bug@amsat.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- include/hw/timer/bcm2835_systmr.h | 4 +++-
- hw/timer/bcm2835_systmr.c         | 3 ++-
-files changed, 5 insertions(+), 2 deletions(-)
-diff --git a/include/hw/timer/bcm2835_systmr.h b/include/hw/timer/bcm2835_systmr.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/hw/timer/bcm2835_systmr.h
-+++ b/include/hw/timer/bcm2835_systmr.h
-@@ -XXX,XX +XXX,XX @@
- #define TYPE_BCM2835_SYSTIMER "bcm2835-sys-timer"
- OBJECT_DECLARE_SIMPLE_TYPE(BCM2835SystemTimerState, BCM2835_SYSTIMER)
-+#define BCM2835_SYSTIMER_COUNT 4
-+
- struct BCM2835SystemTimerState {
-     /*< private >*/
-     SysBusDevice parent_obj;
-@@ -XXX,XX +XXX,XX @@ struct BCM2835SystemTimerState {
-     struct {
-         uint32_t status;
--        uint32_t compare[4];
-+        uint32_t compare[BCM2835_SYSTIMER_COUNT];
-     } reg;
- };
-diff --git a/hw/timer/bcm2835_systmr.c b/hw/timer/bcm2835_systmr.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/bcm2835_systmr.c
-+++ b/hw/timer/bcm2835_systmr.c
-@@ -XXX,XX +XXX,XX @@ static const VMStateDescription bcm2835_systmr_vmstate = {
-     .minimum_version_id = 1,
-     .fields = (VMStateField[]) {
-         VMSTATE_UINT32(reg.status, BCM2835SystemTimerState),
--        VMSTATE_UINT32_ARRAY(reg.compare, BCM2835SystemTimerState, 4),
-+        VMSTATE_UINT32_ARRAY(reg.compare, BCM2835SystemTimerState,
-+                             BCM2835_SYSTIMER_COUNT),
-         VMSTATE_END_OF_LIST()
-     }
- };
---
-.20.1

-[PULL 06/41] hw/timer/bcm2835: Rename variable holding CTRL_STATUS register
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-The variable holding the CTRL_STATUS register is misnamed
-'status'. Rename it 'ctrl_status' to make it more obvious
-this register is also used to control the peripheral.
-Reviewed-by: Luc Michel <luc.michel@greensocs.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201010203709.3116542-3-f4bug@amsat.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- include/hw/timer/bcm2835_systmr.h | 2 +-
- hw/timer/bcm2835_systmr.c         | 8 ++++----
-files changed, 5 insertions(+), 5 deletions(-)
-diff --git a/include/hw/timer/bcm2835_systmr.h b/include/hw/timer/bcm2835_systmr.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/hw/timer/bcm2835_systmr.h
-+++ b/include/hw/timer/bcm2835_systmr.h
-@@ -XXX,XX +XXX,XX @@ struct BCM2835SystemTimerState {
-     qemu_irq irq;
-     struct {
--        uint32_t status;
-+        uint32_t ctrl_status;
-         uint32_t compare[BCM2835_SYSTIMER_COUNT];
-     } reg;
- };
-diff --git a/hw/timer/bcm2835_systmr.c b/hw/timer/bcm2835_systmr.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/bcm2835_systmr.c
-+++ b/hw/timer/bcm2835_systmr.c
-@@ -XXX,XX +XXX,XX @@ REG32(COMPARE3,     0x18)
- static void bcm2835_systmr_update_irq(BCM2835SystemTimerState *s)
- {
--    bool enable = !!s->reg.status;
-+    bool enable = !!s->reg.ctrl_status;
-     trace_bcm2835_systmr_irq(enable);
-     qemu_set_irq(s->irq, enable);
-@@ -XXX,XX +XXX,XX @@ static uint64_t bcm2835_systmr_read(void *opaque, hwaddr offset,
-     switch (offset) {
-     case A_CTRL_STATUS:
--        r = s->reg.status;
-+        r = s->reg.ctrl_status;
-         break;
-     case A_COMPARE0 ... A_COMPARE3:
-         r = s->reg.compare[(offset - A_COMPARE0) >> 2];
-@@ -XXX,XX +XXX,XX @@ static void bcm2835_systmr_write(void *opaque, hwaddr offset,
-     trace_bcm2835_systmr_write(offset, value);
-     switch (offset) {
-     case A_CTRL_STATUS:
--        s->reg.status &= ~value; /* Ack */
-+        s->reg.ctrl_status &= ~value; /* Ack */
-         bcm2835_systmr_update_irq(s);
-         break;
-     case A_COMPARE0 ... A_COMPARE3:
-@@ -XXX,XX +XXX,XX @@ static const VMStateDescription bcm2835_systmr_vmstate = {
-     .version_id = 1,
-     .minimum_version_id = 1,
-     .fields = (VMStateField[]) {
--        VMSTATE_UINT32(reg.status, BCM2835SystemTimerState),
-+        VMSTATE_UINT32(reg.ctrl_status, BCM2835SystemTimerState),
-         VMSTATE_UINT32_ARRAY(reg.compare, BCM2835SystemTimerState,
-                              BCM2835_SYSTIMER_COUNT),
-         VMSTATE_END_OF_LIST()
---
-.20.1

-[PULL 07/41] hw/timer/bcm2835: Support the timer COMPARE registers
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-This peripheral has 1 free-running timer and 4 compare registers.
-Only the free-running timer is implemented. Add support for the
-COMPARE registers (each register is wired to an IRQ).
-Reference: "BCM2835 ARM Peripherals" datasheet [*]
-            chapter 12 "System Timer":
-  The System Timer peripheral provides four 32-bit timer channels
-  and a single 64-bit free running counter. Each channel has an
-  output compare register, which is compared against the 32 least
-  significant bits of the free running counter values. When the
-  two values match, the system timer peripheral generates a signal
-  to indicate a match for the appropriate channel. The match signal
-  is then fed into the interrupt controller.
-This peripheral has been used since Linux 3.7, commit ee4af5696720
-("ARM: bcm2835: add system timer").
-[*] https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
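
The compare behaviour quoted from the datasheet (match against the 32 least
significant bits of the free-running counter) reduces to unsigned 32-bit
subtraction. A minimal sketch, assuming a microsecond counter and using the
hypothetical name compare_delay_us:

```c
#include <stdint.h>

/*
 * Microseconds until the low 32 bits of the free-running counter equal
 * 'compare'. Unsigned arithmetic wraps modulo 2^32, so a compare value
 * numerically below the current low word means "after the next wrap".
 */
static uint32_t compare_delay_us(uint32_t compare, uint64_t now_us)
{
    return compare - (uint32_t)now_us;
}
```

For example, with the counter's low word at 0xFFFFFFF0 and a compare value of
5, the match is 21 microseconds away, just after the low word wraps.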
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Luc Michel <luc@lmichel.fr>
-Message-id: 20201010203709.3116542-4-f4bug@amsat.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- include/hw/timer/bcm2835_systmr.h | 11 +++++--
- hw/timer/bcm2835_systmr.c         | 48 ++++++++++++++++++++-----------
- hw/timer/trace-events             |  6 ++--
-files changed, 44 insertions(+), 21 deletions(-)
-diff --git a/include/hw/timer/bcm2835_systmr.h b/include/hw/timer/bcm2835_systmr.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/hw/timer/bcm2835_systmr.h
-+++ b/include/hw/timer/bcm2835_systmr.h
-@@ -XXX,XX +XXX,XX @@
- #include "hw/sysbus.h"
- #include "hw/irq.h"
-+#include "qemu/timer.h"
- #include "qom/object.h"
- #define TYPE_BCM2835_SYSTIMER "bcm2835-sys-timer"
-@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(BCM2835SystemTimerState, BCM2835_SYSTIMER)
- #define BCM2835_SYSTIMER_COUNT 4
-+typedef struct {
-+    unsigned id;
-+    QEMUTimer timer;
-+    qemu_irq irq;
-+    BCM2835SystemTimerState *state;
-+} BCM2835SystemTimerCompare;
-+
- struct BCM2835SystemTimerState {
-     /*< private >*/
-     SysBusDevice parent_obj;
-     /*< public >*/
-     MemoryRegion iomem;
--    qemu_irq irq;
--
-     struct {
-         uint32_t ctrl_status;
-         uint32_t compare[BCM2835_SYSTIMER_COUNT];
-     } reg;
-+    BCM2835SystemTimerCompare tmr[BCM2835_SYSTIMER_COUNT];
- };
- #endif
-diff --git a/hw/timer/bcm2835_systmr.c b/hw/timer/bcm2835_systmr.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/bcm2835_systmr.c
-+++ b/hw/timer/bcm2835_systmr.c
-@@ -XXX,XX +XXX,XX @@ REG32(COMPARE1,     0x10)
- REG32(COMPARE2,     0x14)
- REG32(COMPARE3,     0x18)
--static void bcm2835_systmr_update_irq(BCM2835SystemTimerState *s)
-+static void bcm2835_systmr_timer_expire(void *opaque)
- {
--    bool enable = !!s->reg.ctrl_status;
-+    BCM2835SystemTimerCompare *tmr = opaque;
--    trace_bcm2835_systmr_irq(enable);
--    qemu_set_irq(s->irq, enable);
--}
--
--static void bcm2835_systmr_update_compare(BCM2835SystemTimerState *s,
--                                          unsigned timer_index)
--{
--    /* TODO for now, since neither Linux nor U-boot use these timers. */
--    qemu_log_mask(LOG_UNIMP, "COMPARE register %u not implemented\n",
--                  timer_index);
-+    trace_bcm2835_systmr_timer_expired(tmr->id);
-+    tmr->state->reg.ctrl_status |= 1 << tmr->id;
-+    qemu_set_irq(tmr->irq, 1);
- }
- static uint64_t bcm2835_systmr_read(void *opaque, hwaddr offset,
-@@ -XXX,XX +XXX,XX @@ static uint64_t bcm2835_systmr_read(void *opaque, hwaddr offset,
- }
- static void bcm2835_systmr_write(void *opaque, hwaddr offset,
--                                 uint64_t value, unsigned size)
-+                                 uint64_t value64, unsigned size)
- {
-     BCM2835SystemTimerState *s = BCM2835_SYSTIMER(opaque);
-+    int index;
-+    uint32_t value = value64;
-+    uint32_t triggers_delay_us;
-+    uint64_t now;
-     trace_bcm2835_systmr_write(offset, value);
-     switch (offset) {
-     case A_CTRL_STATUS:
-         s->reg.ctrl_status &= ~value; /* Ack */
--        bcm2835_systmr_update_irq(s);
-+        for (index = 0; index < ARRAY_SIZE(s->tmr); index++) {
-+            if (extract32(value, index, 1)) {
-+                trace_bcm2835_systmr_irq_ack(index);
-+                qemu_set_irq(s->tmr[index].irq, 0);
-+            }
-+        }
-         break;
-     case A_COMPARE0 ... A_COMPARE3:
--        s->reg.compare[(offset - A_COMPARE0) >> 2] = value;
--        bcm2835_systmr_update_compare(s, (offset - A_COMPARE0) >> 2);
-+        index = (offset - A_COMPARE0) >> 2;
-+        s->reg.compare[index] = value;
-+        now = qemu_clock_get_us(QEMU_CLOCK_VIRTUAL);
-+        /* Compare lower 32-bits of the free-running counter. */
-+        triggers_delay_us = value - now;
-+        trace_bcm2835_systmr_run(index, triggers_delay_us);
-+        timer_mod(&s->tmr[index].timer, now + triggers_delay_us);
-         break;
-     case A_COUNTER_LOW:
-     case A_COUNTER_HIGH:
-@@ -XXX,XX +XXX,XX @@ static void bcm2835_systmr_realize(DeviceState *dev, Error **errp)
-     memory_region_init_io(&s->iomem, OBJECT(dev), &bcm2835_systmr_ops,
-                           s, "bcm2835-sys-timer", 0x20);
-     sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
--    sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq);
-+
-+    for (size_t i = 0; i < ARRAY_SIZE(s->tmr); i++) {
-+        s->tmr[i].id = i;
-+        s->tmr[i].state = s;
-+        sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->tmr[i].irq);
-+        timer_init_us(&s->tmr[i].timer, QEMU_CLOCK_VIRTUAL,
-+                      bcm2835_systmr_timer_expire, &s->tmr[i]);
-+    }
- }
- static const VMStateDescription bcm2835_systmr_vmstate = {
-diff --git a/hw/timer/trace-events b/hw/timer/trace-events
-index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/trace-events
-+++ b/hw/timer/trace-events
-@@ -XXX,XX +XXX,XX @@ nrf51_timer_write(uint8_t timer_id, uint64_t addr, uint32_t value, unsigned size
- nrf51_timer_set_count(uint8_t timer_id, uint8_t counter_id, uint32_t value) "timer %u counter %u count 0x%" PRIx32
- # bcm2835_systmr.c
--bcm2835_systmr_irq(bool enable) "timer irq state %u"
-+bcm2835_systmr_timer_expired(unsigned id) "timer #%u expired"
-+bcm2835_systmr_irq_ack(unsigned id) "timer #%u acked"
- bcm2835_systmr_read(uint64_t offset, uint64_t data) "timer read: offset 0x%" PRIx64 " data 0x%" PRIx64
--bcm2835_systmr_write(uint64_t offset, uint64_t data) "timer write: offset 0x%" PRIx64 " data 0x%" PRIx64
-+bcm2835_systmr_write(uint64_t offset, uint32_t data) "timer write: offset 0x%" PRIx64 " data 0x%" PRIx32
-+bcm2835_systmr_run(unsigned id, uint64_t delay_us) "timer #%u expiring in %"PRIu64" us"
- # avr_timer16.c
- avr_timer16_read(uint8_t addr, uint8_t value) "timer16 read addr:%u value:%u"
---
-.20.1
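Note on the COMPARE write path in the patch above: the delay is computed as a 32-bit difference between the written value and the microsecond counter, so a compare value numerically below the current low word still produces an expiry in the future, after the low 32 bits wrap. A minimal sketch of that arithmetic, assuming hypothetical helper names (systmr_delay_us and systmr_expiry_us are illustrative and not part of the patch):

```c
#include <stdint.h>

/*
 * Hypothetical helper: microseconds until the low 32 bits of a
 * free-running microsecond counter next equal `compare`.
 * Unsigned subtraction wraps modulo 2^32, mirroring the
 * "triggers_delay_us = value - now" assignment in the patch.
 */
static uint32_t systmr_delay_us(uint32_t compare, uint64_t now_us)
{
    return compare - (uint32_t)now_us;
}

/*
 * Hypothetical helper: absolute expiry time in microseconds, i.e. the
 * kind of value the patch passes to timer_mod().
 */
static uint64_t systmr_expiry_us(uint32_t compare, uint64_t now_us)
{
    return now_us + systmr_delay_us(compare, now_us);
}
```

With this arithmetic the expiry always has `compare` as its low 32 bits, which is the "compare lower 32-bits of the free-running counter" behaviour described in the code comment.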

-[PULL 08/41] hw/arm/bcm2835_peripherals: Correctly wire the SYS_timer IRQs
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-The SYS_timer is not directly wired to the ARM core, but to the
-SoC (peripheral) interrupt controller.
-Fixes: 0e5bbd74064 ("hw/arm/bcm2835_peripherals: Use the SYS_timer")
-Reviewed-by: Luc Michel <luc.michel@greensocs.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201010203709.3116542-5-f4bug@amsat.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/arm/bcm2835_peripherals.c | 13 +++++++++++--
-file changed, 11 insertions(+), 2 deletions(-)
-diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/bcm2835_peripherals.c
-+++ b/hw/arm/bcm2835_peripherals.c
-@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
-     memory_region_add_subregion(&s->peri_mr, ST_OFFSET,
-                 sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->systmr), 0));
-     sysbus_connect_irq(SYS_BUS_DEVICE(&s->systmr), 0,
--        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_ARM_IRQ,
--                               INTERRUPT_ARM_TIMER));
-+        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
-+                               INTERRUPT_TIMER0));
-+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->systmr), 1,
-+        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
-+                               INTERRUPT_TIMER1));
-+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->systmr), 2,
-+        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
-+                               INTERRUPT_TIMER2));
-+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->systmr), 3,
-+        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
-+                               INTERRUPT_TIMER3));
-     /* UART0 */
-     qdev_prop_set_chr(DEVICE(&s->uart0), "chardev", serial_hd(0));
---
-.20.1

-[PULL 12/41] loads-stores.rst: add footnote that clarifies GETPC usage
+Deleted patch
-From: Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>
-Current documentation is not too clear on the GETPC usage.
-In particular, when used outside the top level helper function
-it causes unexpected behavior.
-Signed-off-by: Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>
-Message-id: 20201015095147.1691-1-e.emanuelegiuseppe@gmail.com
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- docs/devel/loads-stores.rst | 8 +++++++-
-file changed, 7 insertions(+), 1 deletion(-)
-diff --git a/docs/devel/loads-stores.rst b/docs/devel/loads-stores.rst
-index XXXXXXX..XXXXXXX 100644
---- a/docs/devel/loads-stores.rst
-+++ b/docs/devel/loads-stores.rst
-@@ -XXX,XX +XXX,XX @@ guest CPU state in case of a guest CPU exception.  This is passed
- to ``cpu_restore_state()``.  Therefore the value should either be 0,
- to indicate that the guest CPU state is already synchronized, or
- the result of ``GETPC()`` from the top level ``HELPER(foo)``
--function, which is a return address into the generated code.
-+function, which is a return address into the generated code [#gpc]_.
-+
-+.. [#gpc] Note that ``GETPC()`` should be used with great care: calling
-+          it in other functions that are *not* the top level
-+          ``HELPER(foo)`` will cause unexpected behavior. Instead, the
-+          value of ``GETPC()`` should be read from the helper and passed
-+          if needed to the functions that the helper calls.
- Function names follow the pattern:
---
-.20.1

-[PULL 13/41] hw/intc/bcm2835_ic: Trace GPU/CPU IRQ handlers
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Add trace events for GPU and CPU IRQs.
-Reviewed-by: Luc Michel <luc.michel@greensocs.com>
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201017180731.1165871-2-f4bug@amsat.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/intc/bcm2835_ic.c | 4 +++-
- hw/intc/trace-events | 4 ++++
-files changed, 7 insertions(+), 1 deletion(-)
-diff --git a/hw/intc/bcm2835_ic.c b/hw/intc/bcm2835_ic.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/bcm2835_ic.c
-+++ b/hw/intc/bcm2835_ic.c
-@@ -XXX,XX +XXX,XX @@
- #include "migration/vmstate.h"
- #include "qemu/log.h"
- #include "qemu/module.h"
-+#include "trace.h"
- #define GPU_IRQS 64
- #define ARM_IRQS 8
-@@ -XXX,XX +XXX,XX @@ static void bcm2835_ic_update(BCM2835ICState *s)
-     set = (s->gpu_irq_level & s->gpu_irq_enable)
-         || (s->arm_irq_level & s->arm_irq_enable);
-     qemu_set_irq(s->irq, set);
--
- }
- static void bcm2835_ic_set_gpu_irq(void *opaque, int irq, int level)
-@@ -XXX,XX +XXX,XX @@ static void bcm2835_ic_set_gpu_irq(void *opaque, int irq, int level)
-     BCM2835ICState *s = opaque;
-     assert(irq >= 0 && irq < 64);
-+    trace_bcm2835_ic_set_gpu_irq(irq, level);
-     s->gpu_irq_level = deposit64(s->gpu_irq_level, irq, 1, level != 0);
-     bcm2835_ic_update(s);
- }
-@@ -XXX,XX +XXX,XX @@ static void bcm2835_ic_set_arm_irq(void *opaque, int irq, int level)
-     BCM2835ICState *s = opaque;
-     assert(irq >= 0 && irq < 8);
-+    trace_bcm2835_ic_set_cpu_irq(irq, level);
-     s->arm_irq_level = deposit32(s->arm_irq_level, irq, 1, level != 0);
-     bcm2835_ic_update(s);
- }
-diff --git a/hw/intc/trace-events b/hw/intc/trace-events
-index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/trace-events
-+++ b/hw/intc/trace-events
-@@ -XXX,XX +XXX,XX @@ nvic_sysreg_write(uint64_t addr, uint32_t value, unsigned size) "NVIC sysreg wri
- heathrow_write(uint64_t addr, unsigned int n, uint64_t value) "0x%"PRIx64" %u: 0x%"PRIx64
- heathrow_read(uint64_t addr, unsigned int n, uint64_t value) "0x%"PRIx64" %u: 0x%"PRIx64
- heathrow_set_irq(int num, int level) "set_irq: num=0x%02x level=%d"
-+
-+# bcm2835_ic.c
-+bcm2835_ic_set_gpu_irq(int irq, int level) "GPU irq #%d level %d"
-+bcm2835_ic_set_cpu_irq(int irq, int level) "CPU irq #%d level %d"
---
-.20.1

-[PULL 14/41] hw/intc/bcm2836_control: Use IRQ definitions instead of magic numbers
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-The IRQ values are defined a few lines earlier; use them instead of
-the magic numbers.
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201017180731.1165871-3-f4bug@amsat.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/intc/bcm2836_control.c | 8 ++++----
-file changed, 4 insertions(+), 4 deletions(-)
-diff --git a/hw/intc/bcm2836_control.c b/hw/intc/bcm2836_control.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/bcm2836_control.c
-+++ b/hw/intc/bcm2836_control.c
-@@ -XXX,XX +XXX,XX @@ static void bcm2836_control_set_local_irq(void *opaque, int core, int local_irq,
- static void bcm2836_control_set_local_irq0(void *opaque, int core, int level)
- {
--    bcm2836_control_set_local_irq(opaque, core, 0, level);
-+    bcm2836_control_set_local_irq(opaque, core, IRQ_CNTPSIRQ, level);
- }
- static void bcm2836_control_set_local_irq1(void *opaque, int core, int level)
- {
--    bcm2836_control_set_local_irq(opaque, core, 1, level);
-+    bcm2836_control_set_local_irq(opaque, core, IRQ_CNTPNSIRQ, level);
- }
- static void bcm2836_control_set_local_irq2(void *opaque, int core, int level)
- {
--    bcm2836_control_set_local_irq(opaque, core, 2, level);
-+    bcm2836_control_set_local_irq(opaque, core, IRQ_CNTHPIRQ, level);
- }
- static void bcm2836_control_set_local_irq3(void *opaque, int core, int level)
- {
--    bcm2836_control_set_local_irq(opaque, core, 3, level);
-+    bcm2836_control_set_local_irq(opaque, core, IRQ_CNTVIRQ, level);
- }
- static void bcm2836_control_set_gpu_irq(void *opaque, int irq, int level)
---
-.20.1

-[PULL 15/41] target/arm: Remove redundant mmu_idx lookup
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-We already have the full ARMMMUIdx as computed from the
-function parameter.
-For the purpose of regime_has_2_ranges, we can ignore any
-difference between AccType_Normal and AccType_Unpriv, which
-would be the only difference between the passed mmu_idx
-and arm_mmu_idx_el.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
-Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
-Message-id: 20201008162155.161886-2-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/mte_helper.c | 3 +--
-file changed, 1 insertion(+), 2 deletions(-)
-diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mte_helper.c
-+++ b/target/arm/mte_helper.c
-@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
-     case 2:
-         /* Tag check fail causes asynchronous flag set.  */
--        mmu_idx = arm_mmu_idx_el(env, el);
--        if (regime_has_2_ranges(mmu_idx)) {
-+        if (regime_has_2_ranges(arm_mmu_idx)) {
-             select = extract64(dirty_ptr, 55, 1);
-         } else {
-             select = 0;
---
-.20.1

-[PULL 16/41] target/arm: Fix reported EL for mte_check_fail
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-The reporting in AArch64.TagCheckFail only depends on PSTATE.EL,
-and not the AccType of the operation.  There are two guest
-visible problems that affect LDTR and STTR because of this:
-(1) Selecting TCF0 vs TCF1 to decide on reporting,
-(2) Report "data abort same el" not "data abort lower el".
-Reported-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
-Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
-Message-id: 20201008162155.161886-3-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/mte_helper.c | 10 +++-------
-file changed, 3 insertions(+), 7 deletions(-)
-diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/mte_helper.c
-+++ b/target/arm/mte_helper.c
-@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
-     reg_el = regime_el(env, arm_mmu_idx);
-     sctlr = env->cp15.sctlr_el[reg_el];
--    switch (arm_mmu_idx) {
--    case ARMMMUIdx_E10_0:
--    case ARMMMUIdx_E20_0:
--        el = 0;
-+    el = arm_current_el(env);
-+    if (el == 0) {
-         tcf = extract64(sctlr, 38, 2);
--        break;
--    default:
--        el = reg_el;
-+    } else {
-         tcf = extract64(sctlr, 40, 2);
-     }
---
-.20.1
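Note on the patch above: after the fix, the choice between the two tag-check-fault fields depends only on the current exception level, not on the access type. A minimal sketch of that selection, assuming hypothetical names (extract_bits stands in for QEMU's extract64, and select_tcf is illustrative only); the bit positions 38 and 40 are taken from the diff:

```c
#include <stdint.h>

/* Stand-in for extract64(): `length` bits of `value` starting at `start`.
 * Valid for length in 1..64. */
static uint64_t extract_bits(uint64_t value, unsigned start, unsigned length)
{
    return (value >> start) & (~0ULL >> (64 - length));
}

/*
 * Hypothetical helper: pick the 2-bit field at bit 38 when running at
 * EL0 and the 2-bit field at bit 40 otherwise, as the patched
 * mte_check_fail() does with the result of arm_current_el().
 */
static unsigned select_tcf(uint64_t sctlr, int current_el)
{
    if (current_el == 0) {
        return (unsigned)extract_bits(sctlr, 38, 2);
    }
    return (unsigned)extract_bits(sctlr, 40, 2);
}
```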

-[PULL 17/41] target/arm: Ignore HCR_EL2.ATA when {E2H,TGE} != 11
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Unlike many other bits in HCR_EL2, the description for this
-bit does not contain the phrase "if ... this field behaves
-as 0 for all purposes other than", so do not squash the bit
-in arm_hcr_el2_eff.
-Instead, replicate the E2H+TGE test in the two places that
-require it.
-Reported-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
-Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
-Message-id: 20201008162155.161886-4-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/internals.h | 9 +++++----
- target/arm/helper.c    | 9 +++++----
-files changed, 10 insertions(+), 8 deletions(-)
-diff --git a/target/arm/internals.h b/target/arm/internals.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/internals.h
-+++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ static inline bool allocation_tag_access_enabled(CPUARMState *env, int el,
-         && !(env->cp15.scr_el3 & SCR_ATA)) {
-         return false;
-     }
--    if (el < 2
--        && arm_feature(env, ARM_FEATURE_EL2)
--        && !(arm_hcr_el2_eff(env) & HCR_ATA)) {
--        return false;
-+    if (el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
-+        uint64_t hcr = arm_hcr_el2_eff(env);
-+        if (!(hcr & HCR_ATA) && (!(hcr & HCR_E2H) || !(hcr & HCR_TGE))) {
-+            return false;
-+        }
-     }
-     sctlr &= (el == 0 ? SCTLR_ATA0 : SCTLR_ATA);
-     return sctlr != 0;
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_mte(CPUARMState *env, const ARMCPRegInfo *ri,
- {
-     int el = arm_current_el(env);
--    if (el < 2 &&
--        arm_feature(env, ARM_FEATURE_EL2) &&
--        !(arm_hcr_el2_eff(env) & HCR_ATA)) {
--        return CP_ACCESS_TRAP_EL2;
-+    if (el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
-+        uint64_t hcr = arm_hcr_el2_eff(env);
-+        if (!(hcr & HCR_ATA) && (!(hcr & HCR_E2H) || !(hcr & HCR_TGE))) {
-+            return CP_ACCESS_TRAP_EL2;
-+        }
-     }
-     if (el < 3 &&
-         arm_feature(env, ARM_FEATURE_EL3) &&
---
-.20.1
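Note on the patch above: the same condition is replicated in allocation_tag_access_enabled() and access_mte(). It can be read as "ATA is clear, and E2H and TGE are not both set". A minimal sketch of that predicate, assuming illustrative SK_-prefixed constants (the bit positions are believed to match QEMU's HCR_* definitions but are stated here as an assumption) and a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; assumed to mirror HCR_TGE, HCR_E2H, HCR_ATA. */
#define SK_HCR_TGE (1ULL << 27)
#define SK_HCR_E2H (1ULL << 34)
#define SK_HCR_ATA (1ULL << 56)

/*
 * Hypothetical helper: true when the HCR_EL2 value denies allocation
 * tag access from below EL2.  By De Morgan's law this is the same test
 * as "!(hcr & ATA) && (!(hcr & E2H) || !(hcr & TGE))" in the diff.
 */
static bool hcr_blocks_tag_access(uint64_t hcr)
{
    bool e2h_and_tge = (hcr & SK_HCR_E2H) && (hcr & SK_HCR_TGE);

    return !(hcr & SK_HCR_ATA) && !e2h_and_tge;
}
```

Keeping the test in one helper is only a presentation choice for this sketch; the patch itself open-codes the condition at both call sites.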

-[PULL 19/41] hw/arm/nseries: Fix loading kernel image on n8x0 machines
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Commit 7998beb9c2e removed the ram_size initialization in the
-arm_boot_info structure, however it is used by arm_load_kernel().
-Initialize the field to fix:
-  $ qemu-system-arm -M n800 -append 'console=ttyS1' \
-    -kernel meego-arm-n8x0-1.0.80.20100712.1431-vmlinuz-2.6.35~rc4-129.1-n8x0
-  qemu-system-arm: kernel 'meego-arm-n8x0-1.0.80.20100712.1431-vmlinuz-2.6.35~rc4-129.1-n8x0' is too large to fit in RAM (kernel size 1964608, RAM size 0)
-Noticed while running the test introduced in commit 050a82f0c5b
-("tests/acceptance: Add a test for the N800 and N810 arm machines").
-Fixes: 7998beb9c2e ("arm/nseries: use memdev for RAM")
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Thomas Huth <thuth@redhat.com>
-Message-id: 20201019095148.1602119-1-f4bug@amsat.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/arm/nseries.c | 1 +
-file changed, 1 insertion(+)
-diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/nseries.c
-+++ b/hw/arm/nseries.c
-@@ -XXX,XX +XXX,XX @@ static void n8x0_init(MachineState *machine,
-         g_free(sz);
-         exit(EXIT_FAILURE);
-     }
-+    binfo->ram_size = machine->ram_size;
-     memory_region_add_subregion(get_system_memory(), OMAP2_Q2_BASE,
-                                 machine->ram);
---
-.20.1

-[PULL 20/41] decodetree: Fix codegen for non-overlapping group inside overlapping group
+Deleted patch
-For nested groups like:
-  {
-    [
-      pattern 1
-      pattern 2
-    ]
-    pattern 3
-  }
-the intended behaviour is that patterns 1 and 2 must not
-overlap with each other; if the insn matches neither then
-we fall through to pattern 3 as the next thing in the
-outer overlapping group.
-Currently we generate incorrect code for this situation,
-because in the code path for a failed match inside the
-inner non-overlapping group we generate a "return" statement,
-which causes decode to stop entirely rather than continuing
-to the next thing in the outer group.
-Generate a "break" instead, so that decode flow behaves
-as required for this nested group case.
-Suggested-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20201019151301.2046-2-peter.maydell@linaro.org
----
- scripts/decodetree.py | 2 +-
-file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/scripts/decodetree.py b/scripts/decodetree.py
-index XXXXXXX..XXXXXXX 100644
---- a/scripts/decodetree.py
-+++ b/scripts/decodetree.py
-@@ -XXX,XX +XXX,XX @@ class Tree:
-             output(ind, '    /* ',
-                    str_match_bits(innerbits, innermask), ' */\n')
-             s.output_code(i + 4, extracted, innerbits, innermask)
--            output(ind, '    return false;\n')
-+            output(ind, '    break;\n')
-         output(ind, '}\n')
- # end Tree
---
-.20.1
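Note on the patch above: the difference between "return false" and "break" can be seen in a hand-written model of the generated decoder. This is a sketch only; the encodings, the MATCH_* names and decode_sketch are invented for illustration and are not decodetree output:

```c
#include <stdint.h>

enum { MATCH_NONE, MATCH_P1, MATCH_P2, MATCH_P3 };

/*
 * Hypothetical decoder for the nested group
 *   { [ pattern 1, pattern 2 ] pattern 3 }
 * Patterns 1 and 2 are told apart by the top nibble; pattern 1's
 * "trans function" additionally rejects insns with bit 0 set.
 */
static int decode_sketch(uint32_t insn)
{
    /* Inner non-overlapping group. */
    switch (insn >> 28) {
    case 0x1:
        if ((insn & 1) == 0) {
            return MATCH_P1;
        }
        /*
         * Failed match inside the inner group.  Before the fix the
         * generated code returned here, ending decode; with "break"
         * control continues to the next member of the outer group.
         */
        break;
    case 0x2:
        return MATCH_P2;
    }

    /* Pattern 3: next entry in the outer overlapping group. */
    if ((insn & 0x0f000000) == 0x0f000000) {
        return MATCH_P3;
    }
    return MATCH_NONE;
}
```

In this model an insn such as 0x1f000001 is rejected by pattern 1 and then matched by pattern 3, which is the fall-through behaviour the commit message describes.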

-[PULL 21/41] target/arm: Implement v8.1M NOCP handling
+[PULL 20/24] target/arm: Implement MVE VADDLV
-From v8.1M, disabled-coprocessor handling changes slightly:
+Implement the MVE VADDLV insn; this is similar to VADDV, except
- * coprocessors 8, 9, 14 and 15 are also governed by the
+that it accumulates 32-bit elements into a 64-bit accumulator
-   cp10 enable bit, like cp11
+stored in a pair of general-purpose registers.
  * an extra range of instruction patterns is considered
    to be inside the coprocessor space
 We previously marked these up with TODO comments; implement the
 correct behaviour.
 Unfortunately there is no ID register field which indicates this
 behaviour.  We could in theory test an unrelated ID register which
 indicates guaranteed-to-be-in-v8.1M behaviour like ID_ISAR0.CmpBranch
 >= 3 (low-overhead-loops), but it seems better to simply define a new
 ARM_FEATURE_V8_1M feature flag and use it for this and other
 new-in-v8.1M behaviour that isn't identifiable from the ID registers.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201019151301.2046-3-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-15-peter.maydell@linaro.org
 ---
- target/arm/cpu.h               |  1 +
+ target/arm/helper-mve.h    |  3 ++
- target/arm/m-nocp.decode       | 10 ++++++----
+ target/arm/mve.decode      |  6 +++-
- target/arm/translate-vfp.c.inc | 17 +++++++++++++++--
+ target/arm/mve_helper.c    | 19 ++++++++++++
-files changed, 22 insertions(+), 6 deletions(-)
+ target/arm/translate-mve.c | 63 ++++++++++++++++++++++++++++++++++++++
 files changed, 90 insertions(+), 1 deletion(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/helper-mve.h
-+++ b/target/arm/cpu.h
++++ b/target/arm/helper-mve.h
-@@ -XXX,XX +XXX,XX @@ enum arm_features {
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuh, TCG_CALL_NO_WG, i32, env, ptr, i32)
-     ARM_FEATURE_VBAR, /* has cp15 VBAR */
+ DEF_HELPER_FLAGS_3(mve_vaddvsw, TCG_CALL_NO_WG, i32, env, ptr, i32)
-     ARM_FEATURE_M_SECURITY, /* M profile Security Extension */
+ DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
-     ARM_FEATURE_M_MAIN, /* M profile Main Extension */
-+    ARM_FEATURE_V8_1M, /* M profile extras only in v8.1M and later */
++DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
- };
++DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
++
- static inline int arm_feature(CPUARMState *env, int feature)
+ DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
-diff --git a/target/arm/m-nocp.decode b/target/arm/m-nocp.decode
+ DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
  DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
 diff --git a/target/arm/mve.decode b/target/arm/mve.decode
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/m-nocp.decode
+--- a/target/arm/mve.decode
-+++ b/target/arm/m-nocp.decode
++++ b/target/arm/mve.decode
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
- # If the coprocessor is not present or disabled then we will generate
+ VQRDMULH_scalar  1111 1110 0 . .. ... 1 ... 0 1110 . 110 .... @2scalar
- # the NOCP exception; otherwise we let the insn through to the main decode.
+ # Vector add across vector
-+&nocp cp
+-VADDV            111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
 +{
 +  VADDV          111 u:1 1110 1111 size:2 01 ... 0 1111 0 0 a:1 0 qm:3 0 rda=%rdalo
 +  VADDLV         111 u:1 1110 1 ... 1001 ... 0 1111 00 a:1 0 qm:3 0 \
 +                 rdahi=%rdahi rdalo=%rdalo
 +}
  # Predicate operations
  %mask_22_13      22:1 13:3
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvub, 1, uint8_t)
  DO_VADDV(vaddvuh, 2, uint16_t)
  DO_VADDV(vaddvuw, 4, uint32_t)
 +#define DO_VADDLV(OP, TYPE, LTYPE)                              \
 +    uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
 +                                    uint64_t ra)                \
 +    {                                                           \
 +        uint16_t mask = mve_element_mask(env);                  \
 +        unsigned e;                                             \
 +        TYPE *m = vm;                                           \
 +        for (e = 0; e < 16 / 4; e++, mask >>= 4) {              \
 +            if (mask & 1) {                                     \
 +                ra += (LTYPE)m[H4(e)];                          \
 +            }                                                   \
 +        }                                                       \
 +        mve_advance_vpt(env);                                   \
 +        return ra;                                              \
 +    }                                                           \
 +
- {
++DO_VADDLV(vaddlv_s, int32_t, int64_t)
-   # Special cases which do not take an early NOCP: VLLDM and VLSTM
++DO_VADDLV(vaddlv_u, uint32_t, uint64_t)
-   VLLDM_VLSTM  1110 1100 001 l:1 rn:4 0000 1010 0000 0000
++
-   # TODO: VSCCLRM (new in v8.1M) is similar:
+ /* Shifts by immediate */
-   #VSCCLRM      1110 1100 1-01 1111 ---- 1011 ---- ---0
+ #define DO_2SHIFT(OP, ESIZE, TYPE, FN)                          \
+     void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
--  NOCP         111- 1110 ---- ---- ---- cp:4 ---- ----
+diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
 -  NOCP         111- 110- ---- ---- ---- cp:4 ---- ----
 -  # TODO: From v8.1M onwards we will also want this range to NOCP
 -  #NOCP_8_1     111- 1111 ---- ---- ---- ---- ---- ---- cp=10
 +  NOCP         111- 1110 ---- ---- ---- cp:4 ---- ---- &nocp
 +  NOCP         111- 110- ---- ---- ---- cp:4 ---- ---- &nocp
 +  # From v8.1M onwards this range will also NOCP:
 +  NOCP_8_1     111- 1111 ---- ---- ---- ---- ---- ---- &nocp cp=10
  }
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-vfp.c.inc
+--- a/target/arm/translate-mve.c
-+++ b/target/arm/translate-vfp.c.inc
++++ b/target/arm/translate-mve.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
+@@ -XXX,XX +XXX,XX @@ static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
      return true;
  }
--static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
++static bool trans_VADDLV(DisasContext *s, arg_VADDLV *a)
 +static bool trans_NOCP(DisasContext *s, arg_nocp *a)
  {
      /*
       * Handle M-profile early check for disabled coprocessor:
@@ -XXX,XX +XXX,XX @@ static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
      if (a->cp == 11) {
          a->cp = 10;
      }
 -    /* TODO: in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
 +    if (arm_dc_feature(s, ARM_FEATURE_V8_1M) &&
 +        (a->cp == 8 || a->cp == 9 || a->cp == 14 || a->cp == 15)) {
 +        /* in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
 +        a->cp = 10;
 +    }
      if (a->cp != 10) {
          gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
@@ -XXX,XX +XXX,XX @@ static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
      return false;
  }
 +static bool trans_NOCP_8_1(DisasContext *s, arg_nocp *a)
 +{
-+    /* This range needs a coprocessor check for v8.1M and later only */
++    /*
-+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
++     * Vector Add Long Across Vector: accumulate the 32-bit
 +     * elements of the vector into a 64-bit result stored in
 +     * a pair of general-purpose registers.
 +     * No need to check Qm's bank: it is only 3 bits in decode.
 +     */
 +    TCGv_ptr qm;
 +    TCGv_i64 rda;
 +    TCGv_i32 rdalo, rdahi;
 +
 +    if (!dc_isar_feature(aa32_mve, s)) {
 +        return false;
 +    }
-+    return trans_NOCP(s, a);
++    /*
 +     * rdahi == 13 is UNPREDICTABLE; rdahi == 15 is a related
 +     * encoding; rdalo always has bit 0 clear so cannot be 13 or 15.
 +     */
 +    if (a->rdahi == 13 || a->rdahi == 15) {
 +        return false;
 +    }
 +    if (!mve_eci_check(s) || !vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    /*
 +     * This insn is subject to beat-wise execution. Partial execution
 +     * of an A=0 (no-accumulate) insn which does not execute the first
 +     * beat must start with the current value of RdaHi:RdaLo, not zero.
 +     */
 +    if (a->a || mve_skip_first_beat(s)) {
 +        /* Accumulate input from RdaHi:RdaLo */
 +        rda = tcg_temp_new_i64();
 +        rdalo = load_reg(s, a->rdalo);
 +        rdahi = load_reg(s, a->rdahi);
 +        tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
 +        tcg_temp_free_i32(rdalo);
 +        tcg_temp_free_i32(rdahi);
 +    } else {
 +        /* Accumulate starting at zero */
 +        rda = tcg_const_i64(0);
 +    }
 +
 +    qm = mve_qreg_ptr(a->qm);
 +    if (a->u) {
 +        gen_helper_mve_vaddlv_u(rda, cpu_env, qm, rda);
 +    } else {
 +        gen_helper_mve_vaddlv_s(rda, cpu_env, qm, rda);
 +    }
 +    tcg_temp_free_ptr(qm);
 +
 +    rdalo = tcg_temp_new_i32();
 +    rdahi = tcg_temp_new_i32();
 +    tcg_gen_extrl_i64_i32(rdalo, rda);
 +    tcg_gen_extrh_i64_i32(rdahi, rda);
 +    store_reg(s, a->rdalo, rdalo);
 +    store_reg(s, a->rdahi, rdahi);
 +    tcg_temp_free_i64(rda);
 +    mve_update_eci(s);
 +    return true;
 +}
 +
- static bool trans_VINS(DisasContext *s, arg_VINS *a)
+ static bool do_1imm(DisasContext *s, arg_1imm *a, MVEGenOneOpImmFn *fn)
  {
-     TCGv_i32 rd, rm;
+     TCGv_ptr qd;
 --
 .20.1
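Note on the DO_VADDLV helper in the patch above: each of the four 32-bit elements is widened to 64 bits and added to the accumulator only if the lowest predicate bit for that element is set (the predicate mask has one bit per byte, hence the shift by 4 per element). A minimal sketch of that loop, assuming hypothetical function names and ignoring the host-endianness adjustment that H4() performs in QEMU:

```c
#include <stdint.h>

/*
 * Hypothetical model of the signed variant: sign-extend each enabled
 * 32-bit element and add it to the 64-bit accumulator.
 */
static uint64_t vaddlv_s_sketch(const int32_t m[4], uint16_t mask, uint64_t ra)
{
    for (unsigned e = 0; e < 4; e++, mask >>= 4) {
        if (mask & 1) {
            ra += (uint64_t)(int64_t)m[e];
        }
    }
    return ra;
}

/*
 * Hypothetical model of the unsigned variant: zero-extend each enabled
 * 32-bit element and add it to the 64-bit accumulator.
 */
static uint64_t vaddlv_u_sketch(const uint32_t m[4], uint16_t mask, uint64_t ra)
{
    for (unsigned e = 0; e < 4; e++, mask >>= 4) {
        if (mask & 1) {
            ra += (uint64_t)m[e];
        }
    }
    return ra;
}
```

Passing a non-zero `ra` corresponds to the accumulating form of the insn, where the translator first loads RdaHi:RdaLo into the 64-bit input.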

-[PULL 25/41] target/arm: Implement v8.1M branch-future insns (as NOPs)
+[PULL 21/24] target/arm: Implement MVE long shifts by immediate
-v8.1M implements a new 'branch future' feature, which is a
+The MVE extension to v8.1M includes some new shift instructions which
-set of instructions that request the CPU to perform a branch
+sit entirely within the non-coprocessor part of the encoding space
-"in the future", when it reaches a particular execution address.
+and which operate only on general-purpose registers.  They take up
-In hardware, the expected implementation is that the information
+the space which was previously UNPREDICTABLE MOVS and ORRS encodings
-about the branch location and destination is cached and then
+with Rm == 13 or 15.
-acted upon when execution reaches the specified address.
-However the architecture permits an implementation to discard
+Implement the long shifts by immediate, which perform shifts on a
-this cached information at any point, and so guest code must
+pair of general-purpose registers treated as a 64-bit quantity, with
-always include a normal branch insn at the branch point as
+an immediate shift count between 1 and 32.
-a fallback. In particular, an implementation is specifically
-permitted to treat all BF insns as NOPs (which is equivalent
+Awkwardly, because the MOVS and ORRS trans functions do not UNDEF for
-to discarding the cached information immediately).
+the Rm==13,15 case, we need to explicitly emit code to UNDEF for the
+cases where v8.1M now requires that.  (Trying to change MOVS and ORRS
-For QEMU, implementing this caching of branch information
+is too difficult, because the functions that generate the code are
-would be complicated and would not improve the speed of
+shared between a dozen different kinds of arithmetic or logical
-execution at all, so we make the IMPDEF choice to implement
+instruction for all A32, T16 and T32 encodings, and for some insns
-all BF insns as NOPs.
+and some encodings Rm==13,15 are valid.)
 We make the helper functions we need for UQSHLL and SQSHLL take
 a 32-bit value which the helper casts to int8_t because we'll need
 these helpers also for the shift-by-register insns, where the shift
 count might be < 0 or > 32.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20210628135835.6690-16-peter.maydell@linaro.org
 Message-id: 20201019151301.2046-7-peter.maydell@linaro.org
 ---
- target/arm/cpu.h       |  6 ++++++
+ target/arm/helper-mve.h |  3 ++
- target/arm/t32.decode  | 13 ++++++++++++-
+ target/arm/translate.h  |  1 +
- target/arm/translate.c | 20 ++++++++++++++++++++
+ target/arm/t32.decode   | 28 +++++++++++++
-files changed, 38 insertions(+), 1 deletion(-)
+ target/arm/mve_helper.c | 10 +++++
+ target/arm/translate.c  | 90 +++++++++++++++++++++++++++++++++++++++++
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+files changed, 132 insertions(+)
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
-+++ b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_arm_div(const ARMISARegisters *id)
+--- a/target/arm/helper-mve.h
-     return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) > 1;
++++ b/target/arm/helper-mve.h
- }
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
-+static inline bool isar_feature_aa32_lob(const ARMISARegisters *id)
-+{
+ DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
-+    /* (M-profile) low-overhead loops and branch future */
++
-+    return FIELD_EX32(id->id_isar0, ID_ISAR0, CMPBRANCH) >= 3;
++DEF_HELPER_FLAGS_3(mve_sqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
-+}
++DEF_HELPER_FLAGS_3(mve_uqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
-+
+diff --git a/target/arm/translate.h b/target/arm/translate.h
- static inline bool isar_feature_aa32_jazelle(const ARMISARegisters *id)
+index XXXXXXX..XXXXXXX 100644
- {
+--- a/target/arm/translate.h
-     return FIELD_EX32(id->id_isar1, ID_ISAR1, JAZELLE) != 0;
++++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
  typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
 +typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
  /**
   * arm_tbflags_from_tb:
 diff --git a/target/arm/t32.decode b/target/arm/t32.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/t32.decode
 +++ b/target/arm/t32.decode
-@@ -XXX,XX +XXX,XX @@ MRC              1110 1110 ... 1 .... .... .... ... 1 .... @mcr
+@@ -XXX,XX +XXX,XX @@
+ &mcr             !extern cp opc1 crn crm opc2 rt
- B                1111 0. .......... 10.1 ............         @branch24
+ &mcrr            !extern cp opc1 crm rt rt2
- BL               1111 0. .......... 11.1 ............         @branch24
--BLX_i            1111 0. .......... 11.0 ............         @branch24
++&mve_shl_ri      rdalo rdahi shim
-+{
++
-+  # BLX_i is non-M-profile only
++# rdahi: bits [3:1] from insn, bit 0 is 1
-+  BLX_i          1111 0. .......... 11.0 ............         @branch24
++# rdalo: bits [3:1] from insn, bit 0 is 0
-+  # M-profile only: loop and branch insns
++%rdahi_9 9:3 !function=times_2_plus_1
 +%rdalo_17 17:3 !function=times_2
 +
  # Data-processing (register)
  %imm5_12_6       12:3 6:2
@@ -XXX,XX +XXX,XX @@
  @S_xrr_shi       ....... .... .   rn:4 .... .... .. shty:2 rm:4 \
                   &s_rrr_shi shim=%imm5_12_6 s=1 rd=0
 +@mve_shl_ri      ....... .... . ... . . ... ... . .. .. .... \
 +                 &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
 +
  {
    TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
    AND_rrri       1110101 0000 . .... 0 ... .... .... ....     @s_rrr_shi
  }
  BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
  {
 +  # The v8.1M MVE shift insns overlap in encoding with MOVS/ORRS
 +  # and are distinguished by having Rm==13 or 15. Those are UNPREDICTABLE
 +  # cases for MOVS/ORRS. We decode the MVE cases first, ensuring that
 +  # they explicitly call unallocated_encoding() for cases that must UNDEF
 +  # (eg "using a new shift insn on a v8.1M CPU without MVE"), and letting
 +  # the rest fall through (where ORR_rrri and MOV_rxri will end up
 +  # handling them as r13 and r15 accesses with the same semantics as A32).
 +  [
-+    # All these BF insns have boff != 0b0000; we NOP them all
++    LSLL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
-+    BF           1111 0 boff:4  ------- 1100 - ---------- 1    # BFL
++    LSRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
-+    BF           1111 0 boff:4 0 ------ 1110 - ---------- 1    # BFCSEL
++    ASRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
-+    BF           1111 0 boff:4 10 ----- 1110 - ---------- 1    # BF
++
-+    BF           1111 0 boff:4 11 ----- 1110 0 0000000000 1    # BFX, BFLX
++    UQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
 +    URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
 +    SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
 +    SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
 +  ]
++
+   MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
+   ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi
+ }
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve_helper.c
++++ b/target/arm/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
+     mve_advance_vpt(env);
+     return rdm;
+ }
++
++uint64_t HELPER(mve_sqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
++{
++    return do_sqrshl_d(n, (int8_t)shift, false, &env->QF);
++}
++
++uint64_t HELPER(mve_uqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
++{
++    return do_uqrshl_d(n, (int8_t)shift, false, &env->QF);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_BLX_suffix(DisasContext *s, arg_BLX_suffix *a)
+@@ -XXX,XX +XXX,XX @@ static bool trans_MOVT(DisasContext *s, arg_MOVW *a)
      return true;
  }
-+static bool trans_BF(DisasContext *s, arg_BF *a)
++/*
-+{
++ * v8.1M MVE wide-shifts
-+    /*
++ */
-+     * M-profile branch future insns. The architecture permits an
++static bool do_mve_shl_ri(DisasContext *s, arg_mve_shl_ri *a,
-+     * implementation to implement these as NOPs (equivalent to
++                          WideShiftImmFn *fn)
-+     * discarding the LO_BRANCH_INFO cache immediately), and we
++{
-+     * take that IMPDEF option because for QEMU a "real" implementation
++    TCGv_i64 rda;
-+     * would be complicated and wouldn't execute any faster.
++    TCGv_i32 rdalo, rdahi;
-+     */
++
-+    if (!dc_isar_feature(aa32_lob, s)) {
++    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
 +        return false;
 +    }
-+    if (a->boff == 0) {
++    if (a->rdahi == 15) {
-+        /* SEE "Related encodings" (loop insns) */
++        /* These are a different encoding (SQSHL/SRSHR/UQSHL/URSHR) */
 +        return false;
 +    }
-+    /* Handle as NOP */
++    if (!dc_isar_feature(aa32_mve, s) ||
 +        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
 +        a->rdahi == 13) {
 +        /* RdaHi == 13 is UNPREDICTABLE; we choose to UNDEF */
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
 +    if (a->shim == 0) {
 +        a->shim = 32;
 +    }
 +
 +    rda = tcg_temp_new_i64();
 +    rdalo = load_reg(s, a->rdalo);
 +    rdahi = load_reg(s, a->rdahi);
 +    tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
 +
 +    fn(rda, rda, a->shim);
 +
 +    tcg_gen_extrl_i64_i32(rdalo, rda);
 +    tcg_gen_extrh_i64_i32(rdahi, rda);
 +    store_reg(s, a->rdalo, rdalo);
 +    store_reg(s, a->rdahi, rdahi);
 +    tcg_temp_free_i64(rda);
 +
 +    return true;
 +}
 +
- static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
++static bool trans_ASRL_ri(DisasContext *s, arg_mve_shl_ri *a)
- {
++{
-     TCGv_i32 addr, tmp;
++    return do_mve_shl_ri(s, a, tcg_gen_sari_i64);
 +}
 +
 +static bool trans_LSLL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, tcg_gen_shli_i64);
 +}
 +
 +static bool trans_LSRL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, tcg_gen_shri_i64);
 +}
 +
 +static void gen_mve_sqshll(TCGv_i64 r, TCGv_i64 n, int64_t shift)
 +{
 +    gen_helper_mve_sqshll(r, cpu_env, n, tcg_constant_i32(shift));
 +}
 +
 +static bool trans_SQSHLL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, gen_mve_sqshll);
 +}
 +
 +static void gen_mve_uqshll(TCGv_i64 r, TCGv_i64 n, int64_t shift)
 +{
 +    gen_helper_mve_uqshll(r, cpu_env, n, tcg_constant_i32(shift));
 +}
 +
 +static bool trans_UQSHLL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, gen_mve_uqshll);
 +}
 +
 +static bool trans_SRSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, gen_srshr64_i64);
 +}
 +
 +static bool trans_URSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
 +{
 +    return do_mve_shl_ri(s, a, gen_urshr64_i64);
 +}
 +
  /*
   * Multiply and multiply accumulate
   */
 --
 .20.1
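
As a rough illustration of what the long shifts by immediate in the patch above compute, here is a minimal standalone sketch in plain C. It is not QEMU code: `RegPair` and every function name are hypothetical, invented for this note. It assumes only what the commit message and `do_mve_shl_ri()` state: the two registers are concatenated into one 64-bit value, an encoded immediate of 0 means a shift by 32, and the result is split back into the low and high halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the RdaLo/RdaHi register pair (not a QEMU type). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} RegPair;

static uint64_t pair_to_u64(RegPair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

static RegPair u64_to_pair(uint64_t v)
{
    RegPair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

/* An encoded immediate of 0 stands for a shift by 32, as in the patch. */
static unsigned shift_count(unsigned shim)
{
    return shim == 0 ? 32 : shim;
}

static RegPair lsll_imm(RegPair p, unsigned shim)
{
    return u64_to_pair(pair_to_u64(p) << shift_count(shim));
}

static RegPair lsrl_imm(RegPair p, unsigned shim)
{
    return u64_to_pair(pair_to_u64(p) >> shift_count(shim));
}

static RegPair asrl_imm(RegPair p, unsigned shim)
{
    unsigned n = shift_count(shim);
    uint64_t v = pair_to_u64(p);
    uint64_t r = v >> n;

    /* Replicate the sign bit by hand; n is 1..32, so 64 - n is in range. */
    if (v >> 63) {
        r |= ~UINT64_C(0) << (64 - n);
    }
    return u64_to_pair(r);
}

/* Small self-check of the behaviour sketched above. */
static void long_shift_demo(void)
{
    RegPair a = { 0x80000000u, 0 };
    RegPair b = { 0, 0x80000000u };

    assert(lsll_imm(a, 1).lo == 0 && lsll_imm(a, 1).hi == 1);
    assert(lsrl_imm(b, 0).lo == 0x80000000u && lsrl_imm(b, 0).hi == 0);
    assert(asrl_imm(b, 4).hi == 0xF8000000u);
}
```

The sketch covers only the three non-saturating forms. The saturating and rounding variants go through helpers in the patch and are not modelled here.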

-[PULL 22/41] target/arm: Implement v8.1M conditional-select insns
+[PULL 22/24] target/arm: Implement MVE long shifts by register
-v8.1M brings four new insns to M-profile:
+Implement the MVE long shifts by register, which perform shifts on a
- * CSEL  : Rd = cond ? Rn : Rm
+pair of general-purpose registers treated as a 64-bit quantity, with
- * CSINC : Rd = cond ? Rn : Rm+1
+the shift count in another general-purpose register, which might be
- * CSINV : Rd = cond ? Rn : ~Rm
+either positive or negative.
- * CSNEG : Rd = cond ? Rn : -Rm
+Like the long-shifts-by-immediate, these encodings sit in the space
-Implement these.
+that was previously the UNPREDICTABLE MOVS/ORRS with Rm==13,15.
+Because LSLL_rr and ASRL_rr overlap with both MOV_rxri/ORR_rrri and
 also with CSEL (as one of the previously-UNPREDICTABLE Rm==13 cases),
 we have to move the CSEL pattern into the same decodetree group.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20210628135835.6690-17-peter.maydell@linaro.org
 Message-id: 20201019151301.2046-4-peter.maydell@linaro.org
 ---
- target/arm/t32.decode  |  3 +++
+ target/arm/helper-mve.h |  6 +++
- target/arm/translate.c | 60 ++++++++++++++++++++++++++++++++++++++++++
+ target/arm/translate.h  |  1 +
-files changed, 63 insertions(+)
+ target/arm/t32.decode   | 16 +++++--
+ target/arm/mve_helper.c | 93 +++++++++++++++++++++++++++++++++++++++++
  target/arm/translate.c  | 69 ++++++++++++++++++++++++++++++
 files changed, 182 insertions(+), 3 deletions(-)
 diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper-mve.h
 +++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_sshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +DEF_HELPER_FLAGS_3(mve_ushll, TCG_CALL_NO_RWG, i64, env, i64, i32)
  DEF_HELPER_FLAGS_3(mve_sqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
  DEF_HELPER_FLAGS_3(mve_uqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +DEF_HELPER_FLAGS_3(mve_sqrshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +DEF_HELPER_FLAGS_3(mve_uqrshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +DEF_HELPER_FLAGS_3(mve_sqrshrl48, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
  typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
  typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
 +typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
  /**
   * arm_tbflags_from_tb:
 diff --git a/target/arm/t32.decode b/target/arm/t32.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/t32.decode
 +++ b/target/arm/t32.decode
+@@ -XXX,XX +XXX,XX @@
+ &mcrr            !extern cp opc1 crm rt rt2
+ &mve_shl_ri      rdalo rdahi shim
++&mve_shl_rr      rdalo rdahi rm
+ # rdahi: bits [3:1] from insn, bit 0 is 1
+ # rdalo: bits [3:1] from insn, bit 0 is 0
+@@ -XXX,XX +XXX,XX @@
+ @mve_shl_ri      ....... .... . ... . . ... ... . .. .. .... \
+                  &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
++@mve_shl_rr      ....... .... . ... . rm:4  ... . .. .. .... \
++                 &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
+ {
+   TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
+@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
+     URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
+     SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
+     SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
++
++    LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
++    ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
++    UQRSHLL64_rr 1110101 0010 1 ... 1 ....  ... 1  0000 1101  @mve_shl_rr
++    SQRSHRL64_rr 1110101 0010 1 ... 1 ....  ... 1  0010 1101  @mve_shl_rr
++    UQRSHLL48_rr 1110101 0010 1 ... 1 ....  ... 1  1000 1101  @mve_shl_rr
++    SQRSHRL48_rr 1110101 0010 1 ... 1 ....  ... 1  1010 1101  @mve_shl_rr
+   ]
+   MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
+   ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi
++
++  # v8.1M CSEL and friends
++  CSEL           1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
+ }
+ {
+   MVN_rxri       1110101 0011 . 1111 0 ... .... .... ....     @s_rxr_shi
 @@ -XXX,XX +XXX,XX @@ SBC_rrri         1110101 1011 . .... 0 ... .... .... ....     @s_rrr_shi
  }
  RSB_rrri         1110101 1110 . .... 0 ... .... .... ....     @s_rrr_shi
-+# v8.1M CSEL and friends
+-# v8.1M CSEL and friends
-+CSEL             1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
+-CSEL             1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
-+
+-
  # Data-processing (register-shifted register)
  MOV_rxrr         1111 1010 0 shty:2 s:1 rm:4 1111 rd:4 0000 rs:4 \
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/mve_helper.c
++++ b/target/arm/mve_helper.c
+@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
+     return rdm;
+ }
++uint64_t HELPER(mve_sshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
++{
++    return do_sqrshl_d(n, -(int8_t)shift, false, NULL);
++}
++
++uint64_t HELPER(mve_ushll)(CPUARMState *env, uint64_t n, uint32_t shift)
++{
++    return do_uqrshl_d(n, (int8_t)shift, false, NULL);
++}
++
+ uint64_t HELPER(mve_sqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
+ {
+     return do_sqrshl_d(n, (int8_t)shift, false, &env->QF);
+@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
+ {
+     return do_uqrshl_d(n, (int8_t)shift, false, &env->QF);
+ }
++
++uint64_t HELPER(mve_sqrshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
++{
++    return do_sqrshl_d(n, -(int8_t)shift, true, &env->QF);
++}
++
++uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t n, uint32_t shift)
++{
++    return do_uqrshl_d(n, (int8_t)shift, true, &env->QF);
++}
++
++/* Operate on 64-bit values, but saturate at 48 bits */
++static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
++                                    bool round, uint32_t *sat)
++{
++    if (shift <= -48) {
++        /* Rounding the sign bit always produces 0. */
++        if (round) {
++            return 0;
++        }
++        return src >> 63;
++    } else if (shift < 0) {
++        if (round) {
++            src >>= -shift - 1;
++            return (src >> 1) + (src & 1);
++        }
++        return src >> -shift;
++    } else if (shift < 48) {
++        int64_t val = src << shift;
++        int64_t extval = sextract64(val, 0, 48);
++        if (!sat || val == extval) {
++            return extval;
++        }
++    } else if (!sat || src == 0) {
++        return 0;
++    }
++
++    *sat = 1;
++    return (1ULL << 47) - (src >= 0);
++}
++
++/* Operate on 64-bit values, but saturate at 48 bits */
++static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
++                                     bool round, uint32_t *sat)
++{
++    uint64_t val, extval;
++
++    if (shift <= -(48 + round)) {
++        return 0;
++    } else if (shift < 0) {
++        if (round) {
++            val = src >> (-shift - 1);
++            val = (val >> 1) + (val & 1);
++        } else {
++            val = src >> -shift;
++        }
++        extval = extract64(val, 0, 48);
++        if (!sat || val == extval) {
++            return extval;
++        }
++    } else if (shift < 48) {
++        uint64_t val = src << shift;
++        uint64_t extval = extract64(val, 0, 48);
++        if (!sat || val == extval) {
++            return extval;
++        }
++    } else if (!sat || src == 0) {
++        return 0;
++    }
++
++    *sat = 1;
++    return MAKE_64BIT_MASK(0, 48);
++}
++
++uint64_t HELPER(mve_sqrshrl48)(CPUARMState *env, uint64_t n, uint32_t shift)
++{
++    return do_sqrshl48_d(n, -(int8_t)shift, true, &env->QF);
++}
++
++uint64_t HELPER(mve_uqrshll48)(CPUARMState *env, uint64_t n, uint32_t shift)
++{
++    return do_uqrshl48_d(n, (int8_t)shift, true, &env->QF);
++}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_IT(DisasContext *s, arg_IT *a)
+@@ -XXX,XX +XXX,XX @@ static bool trans_URSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
-     return true;
+     return do_mve_shl_ri(s, a, gen_urshr64_i64);
  }
-+/* v8.1M CSEL/CSINC/CSNEG/CSINV */
++static bool do_mve_shl_rr(DisasContext *s, arg_mve_shl_rr *a, WideShiftFn *fn)
-+static bool trans_CSEL(DisasContext *s, arg_CSEL *a)
++{
-+{
++    TCGv_i64 rda;
-+    TCGv_i32 rn, rm, zero;
++    TCGv_i32 rdalo, rdahi;
 +    DisasCompare c;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
++        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
 +        return false;
 +    }
-+
++    if (a->rdahi == 15) {
-+    if (a->rm == 13) {
++        /* These are a different encoding (SQSHL/SRSHR/UQSHL/URSHR) */
 +        /* SEE "Related encodings" (MVE shifts) */
 +        return false;
 +    }
-+
++    if (!dc_isar_feature(aa32_mve, s) ||
-+    if (a->rd == 13 || a->rd == 15 || a->rn == 13 || a->fcond >= 14) {
++        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
-+        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
++        a->rdahi == 13 || a->rm == 13 || a->rm == 15 ||
-+        return false;
++        a->rm == a->rdahi || a->rm == a->rdalo) {
-+    }
++        /* These rdahi/rdalo/rm cases are UNPREDICTABLE; we choose to UNDEF */
-+
++        unallocated_encoding(s);
-+    /* In this insn input reg fields of 0b1111 mean "zero", not "PC" */
++        return true;
-+    if (a->rn == 15) {
++    }
-+        rn = tcg_const_i32(0);
++
-+    } else {
++    rda = tcg_temp_new_i64();
-+        rn = load_reg(s, a->rn);
++    rdalo = load_reg(s, a->rdalo);
-+    }
++    rdahi = load_reg(s, a->rdahi);
-+    if (a->rm == 15) {
++    tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
-+        rm = tcg_const_i32(0);
++
-+    } else {
++    /* The helper takes care of the sign-extension of the low 8 bits of Rm */
-+        rm = load_reg(s, a->rm);
++    fn(rda, cpu_env, rda, cpu_R[a->rm]);
-+    }
++
-+
++    tcg_gen_extrl_i64_i32(rdalo, rda);
-+    switch (a->op) {
++    tcg_gen_extrh_i64_i32(rdahi, rda);
-+    case 0: /* CSEL */
++    store_reg(s, a->rdalo, rdalo);
-+        break;
++    store_reg(s, a->rdahi, rdahi);
-+    case 1: /* CSINC */
++    tcg_temp_free_i64(rda);
 +        tcg_gen_addi_i32(rm, rm, 1);
 +        break;
 +    case 2: /* CSINV */
 +        tcg_gen_not_i32(rm, rm);
 +        break;
 +    case 3: /* CSNEG */
 +        tcg_gen_neg_i32(rm, rm);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +
 +    arm_test_cc(&c, a->fcond);
 +    zero = tcg_const_i32(0);
 +    tcg_gen_movcond_i32(c.cond, rn, c.value, zero, rn, rm);
 +    arm_free_cc(&c);
 +    tcg_temp_free_i32(zero);
 +
 +    store_reg(s, a->rd, rn);
 +    tcg_temp_free_i32(rm);
 +
 +    return true;
 +}
 +
++static bool trans_LSLL_rr(DisasContext *s, arg_mve_shl_rr *a)
++{
++    return do_mve_shl_rr(s, a, gen_helper_mve_ushll);
++}
++
++static bool trans_ASRL_rr(DisasContext *s, arg_mve_shl_rr *a)
++{
++    return do_mve_shl_rr(s, a, gen_helper_mve_sshrl);
++}
++
++static bool trans_UQRSHLL64_rr(DisasContext *s, arg_mve_shl_rr *a)
++{
++    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll);
++}
++
++static bool trans_SQRSHRL64_rr(DisasContext *s, arg_mve_shl_rr *a)
++{
++    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl);
++}
++
++static bool trans_UQRSHLL48_rr(DisasContext *s, arg_mve_shl_rr *a)
++{
++    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll48);
++}
++
++static bool trans_SQRSHRL48_rr(DisasContext *s, arg_mve_shl_rr *a)
++{
++    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl48);
++}
++
  /*
-  * Legacy decoder.
+  * Multiply and multiply accumulate
   */
 --
 .20.1
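
The shift-by-register helpers in the patch above take the count from the low 8 bits of Rm and treat it as signed, so a negative count reverses the direction of the shift. The following standalone C sketch illustrates that idea for a non-saturating logical left shift. It is not QEMU code: both function names are hypothetical, and returning 0 for counts of magnitude 64 or more is an assumption of this sketch, not a statement about the exact behaviour of `do_uqrshl_d()`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend the low 8 bits of the count register without relying on
 * an implementation-defined narrowing conversion.
 */
static int shift_from_rm(uint32_t rm)
{
    int s = (int)(rm & 0xff);
    return s >= 128 ? s - 256 : s;
}

/* Hypothetical model of a non-saturating logical long shift by register. */
static uint64_t lsll_by_reg(uint64_t n, uint32_t rm)
{
    int shift = shift_from_rm(rm);

    if (shift >= 64 || shift <= -64) {
        /* Assumption: every bit is shifted out of the 64-bit value. */
        return 0;
    }
    if (shift >= 0) {
        return n << shift;
    }
    /* A negative count shifts in the opposite direction. */
    return n >> -shift;
}

/* Small self-check of the behaviour sketched above. */
static void shift_by_reg_demo(void)
{
    assert(lsll_by_reg(1, 4) == 16);
    assert(lsll_by_reg(16, 0xFC) == 1);     /* 0xFC is -4 as a signed byte */
    assert(lsll_by_reg(1, 0x104) == 16);    /* only the low 8 bits count */
}
```

In the patch, the ASRL and SQRSHRL helpers negate the count before calling the shared left-shift routines, so a positive Rm shifts right for those instructions.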

-[PULL 23/41] target/arm: Make the t32 insn[25:23]=111 group non-overlapping
+Deleted patch
-The t32 decode has a group which represents a set of insns
-which overlap with B_cond_thumb because they have [25:23]=111
-(which is an invalid condition code field for the branch insn).
-This group is currently defined using the {} overlap-OK syntax,
-but it is almost entirely non-overlapping patterns. Switch
-it over to use a non-overlapping group.
-For this to be valid syntactically, CPS must move into the same
-overlapping-group as the hint insns (CPS vs hints was the
-only actual use of the overlap facility for the group).
-The non-overlapping subgroup for CLREX/DSB/DMB/ISB/SB is no longer
-necessary and so we can remove it (promoting those insns to
-be members of the parent group).
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20201019151301.2046-5-peter.maydell@linaro.org
----
- target/arm/t32.decode | 26 ++++++++++++--------------
-file changed, 12 insertions(+), 14 deletions(-)
-diff --git a/target/arm/t32.decode b/target/arm/t32.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/t32.decode
-+++ b/target/arm/t32.decode
-@@ -XXX,XX +XXX,XX @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
- {
-   # Group insn[25:23] = 111, which is cond=111x for the branch below,
-   # or unconditional, which would be illegal for the branch.
--  {
--    # Hints
-+  [
-+    # Hints, and CPS
-     {
-       YIELD      1111 0011 1010 1111 1000 0000 0000 0001
-       WFE        1111 0011 1010 1111 1000 0000 0000 0010
-@@ -XXX,XX +XXX,XX @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
-       # The canonical nop ends in 0000 0000, but the whole rest
-       # of the space is "reserved hint, behaves as nop".
-       NOP        1111 0011 1010 1111 1000 0000 ---- ----
-+
-+      # If imod == '00' && M == '0' then SEE "Hint instructions", above.
-+      CPS        1111 0011 1010 1111 1000 0 imod:2 M:1 A:1 I:1 F:1 mode:5 \
-+                 &cps
-     }
--    # If imod == '00' && M == '0' then SEE "Hint instructions", above.
--    CPS          1111 0011 1010 1111 1000 0 imod:2 M:1 A:1 I:1 F:1 mode:5 \
--                 &cps
--
-     # Miscellaneous control
--    [
--      CLREX      1111 0011 1011 1111 1000 1111 0010 1111
--      DSB        1111 0011 1011 1111 1000 1111 0100 ----
--      DMB        1111 0011 1011 1111 1000 1111 0101 ----
--      ISB        1111 0011 1011 1111 1000 1111 0110 ----
--      SB         1111 0011 1011 1111 1000 1111 0111 0000
--    ]
-+    CLREX        1111 0011 1011 1111 1000 1111 0010 1111
-+    DSB          1111 0011 1011 1111 1000 1111 0100 ----
-+    DMB          1111 0011 1011 1111 1000 1111 0101 ----
-+    ISB          1111 0011 1011 1111 1000 1111 0110 ----
-+    SB           1111 0011 1011 1111 1000 1111 0111 0000
-     # Note that the v7m insn overlaps both the normal and banked insn.
-     {
-@@ -XXX,XX +XXX,XX @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
-     HVC          1111 0111 1110 ....  1000 .... .... ....     \
-                  &i imm=%imm16_16_0
-     UDF          1111 0111 1111 ----  1010 ---- ---- ----
--  }
-+  ]
-   B_cond_thumb   1111 0. cond:4 ...... 10.0 ............      &ci imm=%imm21
- }
---
-.20.1

-[PULL 24/41] target/arm: Don't allow BLX imm for M-profile
+[PULL 23/24] target/arm: Implement MVE shifts by immediate
-The BLX immediate insn in the Thumb encoding always performs
+Implement the MVE shifts by immediate, which perform shifts
-a switch from Thumb to Arm state. This would be totally useless
+on a single general-purpose register.
-in M-profile which has no Arm decoder, and so the instruction
-does not exist at all there. Make the encoding UNDEF for M-profile.
+These patterns overlap with the long-shift-by-immediates,
+so we have to rearrange the grouping a little here.
-(This part of the encoding space is used for the branch-future
-and low-overhead-loop insns in v8.1M.)
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20210628135835.6690-18-peter.maydell@linaro.org
 Message-id: 20201019151301.2046-6-peter.maydell@linaro.org
 ---
- target/arm/translate.c | 8 ++++++++
+ target/arm/helper-mve.h |  3 ++
-file changed, 8 insertions(+)
+ target/arm/translate.h  |  1 +
+ target/arm/t32.decode   | 31 ++++++++++++++-----
  target/arm/mve_helper.c | 10 ++++++
  target/arm/translate.c  | 68 +++++++++++++++++++++++++++++++++++++++--
 files changed, 104 insertions(+), 9 deletions(-)
 diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper-mve.h
 +++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_sqrshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
  DEF_HELPER_FLAGS_3(mve_uqrshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
  DEF_HELPER_FLAGS_3(mve_sqrshrl48, TCG_CALL_NO_RWG, i64, env, i64, i32)
  DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
 +
 +DEF_HELPER_FLAGS_3(mve_uqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
 +DEF_HELPER_FLAGS_3(mve_sqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
 diff --git a/target/arm/translate.h b/target/arm/translate.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.h
 +++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
  typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
  typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
  typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
 +typedef void ShiftImmFn(TCGv_i32, TCGv_i32, int32_t shift);
  /**
   * arm_tbflags_from_tb:
 diff --git a/target/arm/t32.decode b/target/arm/t32.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/t32.decode
 +++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@
  &mve_shl_ri      rdalo rdahi shim
  &mve_shl_rr      rdalo rdahi rm
 +&mve_sh_ri       rda shim
  # rdahi: bits [3:1] from insn, bit 0 is 1
  # rdalo: bits [3:1] from insn, bit 0 is 0
@@ -XXX,XX +XXX,XX @@
                   &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
  @mve_shl_rr      ....... .... . ... . rm:4  ... . .. .. .... \
                   &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
 +@mve_sh_ri       ....... .... . rda:4 . ... ... . .. .. .... \
 +                 &mve_sh_ri shim=%imm5_12_6
  {
    TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
    # the rest fall through (where ORR_rrri and MOV_rxri will end up
    # handling them as r13 and r15 accesses with the same semantics as A32).
    [
 -    LSLL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
 -    LSRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
 -    ASRL_ri      1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
 +    {
 +      UQSHL_ri   1110101 0010 1 ....  0 ...  1111 .. 00 1111  @mve_sh_ri
 +      LSLL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 00 1111  @mve_shl_ri
 +      UQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
 +    }
 -    UQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 00 1111  @mve_shl_ri
 -    URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
 -    SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
 -    SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
 +    {
 +      URSHR_ri   1110101 0010 1 ....  0 ...  1111 .. 01 1111  @mve_sh_ri
 +      LSRL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 01 1111  @mve_shl_ri
 +      URSHRL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
 +    }
 +
 +    {
 +      SRSHR_ri   1110101 0010 1 ....  0 ...  1111 .. 10 1111  @mve_sh_ri
 +      ASRL_ri    1110101 0010 1 ... 0 0 ... ... 1 .. 10 1111  @mve_shl_ri
 +      SRSHRL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
 +    }
 +
 +    {
 +      SQSHL_ri   1110101 0010 1 ....  0 ...  1111 .. 11 1111  @mve_sh_ri
 +      SQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
 +    }
      LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
      ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
 diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqrshll48)(CPUARMState *env, uint64_t n, uint32_t shift)
  {
      return do_uqrshl48_d(n, (int8_t)shift, true, &env->QF);
  }
 +
 +uint32_t HELPER(mve_uqshl)(CPUARMState *env, uint32_t n, uint32_t shift)
 +{
 +    return do_uqrshl_bhs(n, (int8_t)shift, 32, false, &env->QF);
 +}
 +
 +uint32_t HELPER(mve_sqshl)(CPUARMState *env, uint32_t n, uint32_t shift)
 +{
 +    return do_sqrshl_bhs(n, (int8_t)shift, 32, false, &env->QF);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_BLX_i(DisasContext *s, arg_BLX_i *a)
+@@ -XXX,XX +XXX,XX @@ static void gen_srshr16_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh)
- {
-     TCGv_i32 tmp;
+ static void gen_srshr32_i32(TCGv_i32 d, TCGv_i32 a, int32_t sh)
+ {
-+    /*
+-    TCGv_i32 t = tcg_temp_new_i32();
-+     * BLX <imm> would be useless on M-profile; the encoding space
++    TCGv_i32 t;
-+     * is used for other insns from v8.1M onward, and UNDEFs before that.
-+     */
++    /* Handle shift by the input size for the benefit of trans_SRSHR_ri */
-+    if (arm_dc_feature(s, ARM_FEATURE_M)) {
++    if (sh == 32) {
 +        tcg_gen_movi_i32(d, 0);
 +        return;
 +    }
 +    t = tcg_temp_new_i32();
      tcg_gen_extract_i32(t, a, sh - 1, 1);
      tcg_gen_sari_i32(d, a, sh);
      tcg_gen_add_i32(d, d, t);
@@ -XXX,XX +XXX,XX @@ static void gen_urshr16_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh)
  static void gen_urshr32_i32(TCGv_i32 d, TCGv_i32 a, int32_t sh)
  {
 -    TCGv_i32 t = tcg_temp_new_i32();
 +    TCGv_i32 t;
 +    /* Handle shift by the input size for the benefit of trans_URSHR_ri */
 +    if (sh == 32) {
 +        tcg_gen_extract_i32(d, a, sh - 1, 1);
 +        return;
 +    }
 +    t = tcg_temp_new_i32();
      tcg_gen_extract_i32(t, a, sh - 1, 1);
      tcg_gen_shri_i32(d, a, sh);
      tcg_gen_add_i32(d, d, t);
@@ -XXX,XX +XXX,XX @@ static bool trans_SQRSHRL48_rr(DisasContext *s, arg_mve_shl_rr *a)
      return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl48);
  }
 +static bool do_mve_sh_ri(DisasContext *s, arg_mve_sh_ri *a, ShiftImmFn *fn)
 +{
 +    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
 +        return false;
 +    }
-+
++    if (!dc_isar_feature(aa32_mve, s) ||
-     /* For A32, ARM_FEATURE_V5 is checked near the start of the uncond block. */
++        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
-     if (s->thumb && (a->imm & 2)) {
++        a->rda == 13 || a->rda == 15) {
-         return false;
++        /* These rda cases are UNPREDICTABLE; we choose to UNDEF */
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
 +    if (a->shim == 0) {
 +        a->shim = 32;
 +    }
 +    fn(cpu_R[a->rda], cpu_R[a->rda], a->shim);
 +
 +    return true;
 +}
 +
 +static bool trans_URSHR_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_urshr32_i32);
 +}
 +
 +static bool trans_SRSHR_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_srshr32_i32);
 +}
 +
 +static void gen_mve_sqshl(TCGv_i32 r, TCGv_i32 n, int32_t shift)
 +{
 +    gen_helper_mve_sqshl(r, cpu_env, n, tcg_constant_i32(shift));
 +}
 +
 +static bool trans_SQSHL_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_mve_sqshl);
 +}
 +
 +static void gen_mve_uqshl(TCGv_i32 r, TCGv_i32 n, int32_t shift)
 +{
 +    gen_helper_mve_uqshl(r, cpu_env, n, tcg_constant_i32(shift));
 +}
 +
 +static bool trans_UQSHL_ri(DisasContext *s, arg_mve_sh_ri *a)
 +{
 +    return do_mve_sh_ri(s, a, gen_mve_uqshl);
 +}
 +
  /*
   * Multiply and multiply accumulate
   */
 --
 2.20.1

-[PULL 26/41] target/arm: Implement v8.1M low-overhead-loop instructions
+[PULL 24/24] target/arm: Implement MVE shifts by register
-v8.1M's "low-overhead-loop" extension has three instructions
+Implement the MVE shifts by register, which perform
-for looping:
+shifts on a single general-purpose register.
  * DLS (start of a do-loop)
  * WLS (start of a while-loop)
  * LE (end of a loop)
 The loop-start instructions are both simple operations to start a
 loop whose iteration count (if any) is in LR.  The loop-end
 instruction handles "decrement iteration count and jump back to loop
 start"; it also caches the information about the branch back to the
 start of the loop to improve performance of the branch on subsequent
 iterations.
 As with the branch-future instructions, the architecture permits an
 implementation to discard the LO_BRANCH_INFO cache at any time, and
 QEMU takes the IMPDEF option to never set it in the first place
 (equivalent to discarding it immediately), because for us a "real"
 implementation would be unnecessary complexity.
 (This implementation only provides the simple looping constructs; the
 vector extension MVE (Helium) adds some extra variants to handle
 looping across vectors.  We'll add those later when we implement
 MVE.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201019151301.2046-8-peter.maydell@linaro.org
+Message-id: 20210628135835.6690-19-peter.maydell@linaro.org
 ---
- target/arm/t32.decode  |  8 ++++
+ target/arm/helper-mve.h |  2 ++
- target/arm/translate.c | 93 +++++++++++++++++++++++++++++++++++++++++-
+ target/arm/translate.h  |  1 +
-files changed, 99 insertions(+), 2 deletions(-)
+ target/arm/t32.decode   | 18 ++++++++++++++----
  target/arm/mve_helper.c | 10 ++++++++++
  target/arm/translate.c  | 30 ++++++++++++++++++++++++++++++
 files changed, 57 insertions(+), 4 deletions(-)
+diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper-mve.h
++++ b/target/arm/helper-mve.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
+ DEF_HELPER_FLAGS_3(mve_uqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
+ DEF_HELPER_FLAGS_3(mve_sqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
++DEF_HELPER_FLAGS_3(mve_uqrshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
++DEF_HELPER_FLAGS_3(mve_sqrshr, TCG_CALL_NO_RWG, i32, env, i32, i32)
+diff --git a/target/arm/translate.h b/target/arm/translate.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.h
++++ b/target/arm/translate.h
+@@ -XXX,XX +XXX,XX @@ typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
+ typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
+ typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
+ typedef void ShiftImmFn(TCGv_i32, TCGv_i32, int32_t shift);
++typedef void ShiftFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
+ /**
+  * arm_tbflags_from_tb:
 diff --git a/target/arm/t32.decode b/target/arm/t32.decode
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/t32.decode
 +++ b/target/arm/t32.decode
-@@ -XXX,XX +XXX,XX @@ BL               1111 0. .......... 11.1 ............         @branch24
+@@ -XXX,XX +XXX,XX @@
-     BF           1111 0 boff:4 10 ----- 1110 - ---------- 1    # BF
+ &mve_shl_ri      rdalo rdahi shim
-     BF           1111 0 boff:4 11 ----- 1110 0 0000000000 1    # BFX, BFLX
+ &mve_shl_rr      rdalo rdahi rm
  &mve_sh_ri       rda shim
 +&mve_sh_rr       rda rm
  # rdahi: bits [3:1] from insn, bit 0 is 1
  # rdalo: bits [3:1] from insn, bit 0 is 0
@@ -XXX,XX +XXX,XX @@
                   &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
  @mve_sh_ri       ....... .... . rda:4 . ... ... . .. .. .... \
                   &mve_sh_ri shim=%imm5_12_6
 +@mve_sh_rr       ....... .... . rda:4 rm:4 .... .... .... &mve_sh_rr
  {
    TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
        SQSHLL_ri  1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
      }
 -    LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
 -    ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
 -    UQRSHLL64_rr 1110101 0010 1 ... 1 ....  ... 1  0000 1101  @mve_shl_rr
 -    SQRSHRL64_rr 1110101 0010 1 ... 1 ....  ... 1  0010 1101  @mve_shl_rr
 +    {
 +      UQRSHL_rr    1110101 0010 1 ....  ....  1111 0000 1101  @mve_sh_rr
 +      LSLL_rr      1110101 0010 1 ... 0 .... ... 1 0000 1101  @mve_shl_rr
 +      UQRSHLL64_rr 1110101 0010 1 ... 1 .... ... 1 0000 1101  @mve_shl_rr
 +    }
 +
 +    {
 +      SQRSHR_rr    1110101 0010 1 ....  ....  1111 0010 1101  @mve_sh_rr
 +      ASRL_rr      1110101 0010 1 ... 0 .... ... 1 0010 1101  @mve_shl_rr
 +      SQRSHRL64_rr 1110101 0010 1 ... 1 .... ... 1 0010 1101  @mve_shl_rr
 +    }
 +
      UQRSHLL48_rr 1110101 0010 1 ... 1 ....  ... 1  1000 1101  @mve_shl_rr
      SQRSHRL48_rr 1110101 0010 1 ... 1 ....  ... 1  1010 1101  @mve_shl_rr
    ]
-+  [
+diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
-+    # LE and WLS immediate
+index XXXXXXX..XXXXXXX 100644
-+    %lob_imm 1:10 11:1 !function=times_2
+--- a/target/arm/mve_helper.c
 +++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_sqshl)(CPUARMState *env, uint32_t n, uint32_t shift)
  {
      return do_sqrshl_bhs(n, (int8_t)shift, 32, false, &env->QF);
  }
 +
-+    DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001
++uint32_t HELPER(mve_uqrshl)(CPUARMState *env, uint32_t n, uint32_t shift)
-+    WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm
++{
-+    LE           1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
++    return do_uqrshl_bhs(n, (int8_t)shift, 32, true, &env->QF);
-+  ]
++}
- }
++
 +uint32_t HELPER(mve_sqrshr)(CPUARMState *env, uint32_t n, uint32_t shift)
 +{
 +    return do_sqrshl_bhs(n, -(int8_t)shift, 32, true, &env->QF);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
+@@ -XXX,XX +XXX,XX @@ static bool trans_UQSHL_ri(DisasContext *s, arg_mve_sh_ri *a)
-     s->base.is_jmp = DISAS_NORETURN;
+     return do_mve_sh_ri(s, a, gen_mve_uqshl);
  }
--static inline void gen_jmp (DisasContext *s, uint32_t dest)
++static bool do_mve_sh_rr(DisasContext *s, arg_mve_sh_rr *a, ShiftFn *fn)
 +/* Jump, specifying which TB number to use if we gen_goto_tb() */
 +static inline void gen_jmp_tb(DisasContext *s, uint32_t dest, int tbno)
  {
      if (unlikely(is_singlestepping(s))) {
          /* An indirect jump so that we still trigger the debug exception.  */
          gen_set_pc_im(s, dest);
          s->base.is_jmp = DISAS_JUMP;
      } else {
 -        gen_goto_tb(s, 0, dest);
 +        gen_goto_tb(s, tbno, dest);
      }
  }
 +static inline void gen_jmp(DisasContext *s, uint32_t dest)
 +{
-+    gen_jmp_tb(s, dest, 0);
++    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-+}
++        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
 +
  static inline void gen_mulxy(TCGv_i32 t0, TCGv_i32 t1, int x, int y)
  {
      if (x)
@@ -XXX,XX +XXX,XX @@ static bool trans_BF(DisasContext *s, arg_BF *a)
      return true;
  }
 +static bool trans_DLS(DisasContext *s, arg_DLS *a)
 +{
 +    /* M-profile low-overhead loop start */
 +    TCGv_i32 tmp;
 +
 +    if (!dc_isar_feature(aa32_lob, s)) {
 +        return false;
 +    }
-+    if (a->rn == 13 || a->rn == 15) {
++    if (!dc_isar_feature(aa32_mve, s) ||
-+        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
++        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
-+        return false;
++        a->rda == 13 || a->rda == 15 || a->rm == 13 || a->rm == 15 ||
 +        a->rm == a->rda) {
 +        /* These rda/rm cases are UNPREDICTABLE; we choose to UNDEF */
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
-+    /* Not a while loop, no tail predication: just set LR to the count */
++    /* The helper takes care of the sign-extension of the low 8 bits of Rm */
-+    tmp = load_reg(s, a->rn);
++    fn(cpu_R[a->rda], cpu_env, cpu_R[a->rda], cpu_R[a->rm]);
 +    store_reg(s, 14, tmp);
 +    return true;
 +}
 +
-+static bool trans_WLS(DisasContext *s, arg_WLS *a)
++static bool trans_SQRSHR_rr(DisasContext *s, arg_mve_sh_rr *a)
 +{
-+    /* M-profile low-overhead while-loop start */
++    return do_mve_sh_rr(s, a, gen_helper_mve_sqrshr);
 +    TCGv_i32 tmp;
 +    TCGLabel *nextlabel;
 +
 +    if (!dc_isar_feature(aa32_lob, s)) {
 +        return false;
 +    }
 +    if (a->rn == 13 || a->rn == 15) {
 +        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
 +        return false;
 +    }
 +    if (s->condexec_mask) {
 +        /*
 +         * WLS in an IT block is CONSTRAINED UNPREDICTABLE;
 +         * we choose to UNDEF, because otherwise our use of
 +         * gen_goto_tb(1) would clash with the use of TB exit 1
 +         * in the dc->condjmp condition-failed codepath in
 +         * arm_tr_tb_stop() and we'd get an assertion.
 +         */
 +        return false;
 +    }
 +    nextlabel = gen_new_label();
 +    tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_R[a->rn], 0, nextlabel);
 +    tmp = load_reg(s, a->rn);
 +    store_reg(s, 14, tmp);
 +    gen_jmp_tb(s, s->base.pc_next, 1);
 +
 +    gen_set_label(nextlabel);
 +    gen_jmp(s, read_pc(s) + a->imm);
 +    return true;
 +}
 +
-+static bool trans_LE(DisasContext *s, arg_LE *a)
++static bool trans_UQRSHL_rr(DisasContext *s, arg_mve_sh_rr *a)
 +{
-+    /*
++    return do_mve_sh_rr(s, a, gen_helper_mve_uqrshl);
 +     * M-profile low-overhead loop end. The architecture permits an
 +     * implementation to discard the LO_BRANCH_INFO cache at any time,
 +     * and we take the IMPDEF option to never set it in the first place
 +     * (equivalent to always discarding it immediately), because for QEMU
 +     * a "real" implementation would be complicated and wouldn't execute
 +     * any faster.
 +     */
 +    TCGv_i32 tmp;
 +
 +    if (!dc_isar_feature(aa32_lob, s)) {
 +        return false;
 +    }
 +
 +    if (!a->f) {
 +        /* Not loop-forever. If LR <= 1 this is the last loop: do nothing. */
 +        arm_gen_condlabel(s);
 +        tcg_gen_brcondi_i32(TCG_COND_LEU, cpu_R[14], 1, s->condlabel);
 +        /* Decrement LR */
 +        tmp = load_reg(s, 14);
 +        tcg_gen_addi_i32(tmp, tmp, -1);
 +        store_reg(s, 14, tmp);
 +    }
 +    /* Jump back to the loop start */
 +    gen_jmp(s, read_pc(s) - a->imm);
 +    return true;
 +}
 +
- static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
+ /*
- {
+  * Multiply and multiply accumulate
-     TCGv_i32 addr, tmp;
+  */
 --
 2.20.1

-[PULL 27/41] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile
+Deleted patch
-In arm_cpu_realizefn(), if the CPU has VFP or Neon disabled then we
-squash the ID register fields so that we don't advertise it to the
-guest.  This code was written for A-profile and needs some tweaks to
-work correctly on M-profile:
- * A-profile only fields should not be zeroed on M-profile:
-   - MVFR0.FPSHVEC,FPTRAP
-   - MVFR1.SIMDLS,SIMDINT,SIMDSP,SIMDHP
-   - MVFR2.SIMDMISC
- * M-profile only fields should be zeroed on M-profile:
-   - MVFR1.FP16
-In particular, because MVFR1.SIMDHP on A-profile is the same field as
-MVFR1.FP16 on M-profile this code was incorrectly disabling FP16
-support on an M-profile CPU (where has_neon is always false).  This
-isn't a visible bug yet because we don't have any M-profile CPUs with
-FP16 support, but the change is necessary before we introduce any.
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20201019151301.2046-9-peter.maydell@linaro.org
----
- target/arm/cpu.c | 29 ++++++++++++++++++-----------
-file changed, 18 insertions(+), 11 deletions(-)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
-+++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
-         u = cpu->isar.mvfr0;
-         u = FIELD_DP32(u, MVFR0, FPSP, 0);
-         u = FIELD_DP32(u, MVFR0, FPDP, 0);
--        u = FIELD_DP32(u, MVFR0, FPTRAP, 0);
-         u = FIELD_DP32(u, MVFR0, FPDIVIDE, 0);
-         u = FIELD_DP32(u, MVFR0, FPSQRT, 0);
--        u = FIELD_DP32(u, MVFR0, FPSHVEC, 0);
-         u = FIELD_DP32(u, MVFR0, FPROUND, 0);
-+        if (!arm_feature(env, ARM_FEATURE_M)) {
-+            u = FIELD_DP32(u, MVFR0, FPTRAP, 0);
-+            u = FIELD_DP32(u, MVFR0, FPSHVEC, 0);
-+        }
-         cpu->isar.mvfr0 = u;
-         u = cpu->isar.mvfr1;
-         u = FIELD_DP32(u, MVFR1, FPFTZ, 0);
-         u = FIELD_DP32(u, MVFR1, FPDNAN, 0);
-         u = FIELD_DP32(u, MVFR1, FPHP, 0);
-+        if (arm_feature(env, ARM_FEATURE_M)) {
-+            u = FIELD_DP32(u, MVFR1, FP16, 0);
-+        }
-         cpu->isar.mvfr1 = u;
-         u = cpu->isar.mvfr2;
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
-         u = FIELD_DP32(u, ID_ISAR6, FHM, 0);
-         cpu->isar.id_isar6 = u;
--        u = cpu->isar.mvfr1;
--        u = FIELD_DP32(u, MVFR1, SIMDLS, 0);
--        u = FIELD_DP32(u, MVFR1, SIMDINT, 0);
--        u = FIELD_DP32(u, MVFR1, SIMDSP, 0);
--        u = FIELD_DP32(u, MVFR1, SIMDHP, 0);
--        cpu->isar.mvfr1 = u;
-+        if (!arm_feature(env, ARM_FEATURE_M)) {
-+            u = cpu->isar.mvfr1;
-+            u = FIELD_DP32(u, MVFR1, SIMDLS, 0);
-+            u = FIELD_DP32(u, MVFR1, SIMDINT, 0);
-+            u = FIELD_DP32(u, MVFR1, SIMDSP, 0);
-+            u = FIELD_DP32(u, MVFR1, SIMDHP, 0);
-+            cpu->isar.mvfr1 = u;
--        u = cpu->isar.mvfr2;
--        u = FIELD_DP32(u, MVFR2, SIMDMISC, 0);
--        cpu->isar.mvfr2 = u;
-+            u = cpu->isar.mvfr2;
-+            u = FIELD_DP32(u, MVFR2, SIMDMISC, 0);
-+            cpu->isar.mvfr2 = u;
-+        }
-     }
-     if (!cpu->has_neon && !cpu->has_vfp) {
---
-2.20.1

-[PULL 33/41] linux-user/elfload: Avoid leaking interp_name using GLib memory API
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Fix an unlikely memory leak in load_elf_image().
-Fixes: bf858897b7 ("linux-user: Re-use load_elf_image for the main binary.")
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201016184207.786698-5-richard.henderson@linaro.org
-Message-Id: <20201003174944.1972444-1-f4bug@amsat.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- linux-user/elfload.c | 8 ++++----
-file changed, 4 insertions(+), 4 deletions(-)
-diff --git a/linux-user/elfload.c b/linux-user/elfload.c
-index XXXXXXX..XXXXXXX 100644
---- a/linux-user/elfload.c
-+++ b/linux-user/elfload.c
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
-                 info->brk = vaddr_em;
-             }
-         } else if (eppnt->p_type == PT_INTERP && pinterp_name) {
--            char *interp_name;
-+            g_autofree char *interp_name = NULL;
-             if (*pinterp_name) {
-                 errmsg = "Multiple PT_INTERP entries";
-                 goto exit_errmsg;
-             }
--            interp_name = malloc(eppnt->p_filesz);
-+            interp_name = g_malloc(eppnt->p_filesz);
-             if (!interp_name) {
-                 goto exit_perror;
-             }
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
-                 errmsg = "Invalid PT_INTERP entry";
-                 goto exit_errmsg;
-             }
--            *pinterp_name = interp_name;
-+            *pinterp_name = g_steal_pointer(&interp_name);
- #ifdef TARGET_MIPS
-         } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) {
-             Mips_elf_abiflags_v0 abiflags;
-@@ -XXX,XX +XXX,XX @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
-     if (elf_interpreter) {
-         info->load_bias = interp_info.load_bias;
-         info->entry = interp_info.entry;
--        free(elf_interpreter);
-+        g_free(elf_interpreter);
-     }
- #ifdef USE_ELF_CORE_DUMP
---
-2.20.1

-[PULL 34/41] linux-user/elfload: Fix coding style in load_elf_image
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Fixing this now will clarify following patches.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201016184207.786698-6-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- linux-user/elfload.c | 12 +++++++++---
-file changed, 9 insertions(+), 3 deletions(-)
-diff --git a/linux-user/elfload.c b/linux-user/elfload.c
-index XXXXXXX..XXXXXXX 100644
---- a/linux-user/elfload.c
-+++ b/linux-user/elfload.c
-@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
-             abi_ulong vaddr, vaddr_po, vaddr_ps, vaddr_ef, vaddr_em, vaddr_len;
-             int elf_prot = 0;
--            if (eppnt->p_flags & PF_R) elf_prot =  PROT_READ;
--            if (eppnt->p_flags & PF_W) elf_prot |= PROT_WRITE;
--            if (eppnt->p_flags & PF_X) elf_prot |= PROT_EXEC;
-+            if (eppnt->p_flags & PF_R) {
-+                elf_prot |= PROT_READ;
-+            }
-+            if (eppnt->p_flags & PF_W) {
-+                elf_prot |= PROT_WRITE;
-+            }
-+            if (eppnt->p_flags & PF_X) {
-+                elf_prot |= PROT_EXEC;
-+            }
-             vaddr = load_bias + eppnt->p_vaddr;
-             vaddr_po = TARGET_ELF_PAGEOFFSET(vaddr);
---
-2.20.1

The following changes since commit 4c41341af76cfc85b5a6c0f87de4838672ab9f89:

Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20201020' into staging (2020-10-20 11:20:36 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201020

for you to fetch changes up to 6358890cb939192f6169fdf7664d903bf9b1d338:

tests/tcg/aarch64: Add bti smoke tests (2020-10-20 16:12:02 +0100)

----------------------------------------------------------------
target-arm queue:
 * Fix AArch32 SMLAD incorrect setting of Q bit
 * AArch32 VCVT fixed-point to float is always round-to-nearest
 * strongarm: Fix 'time to transmit a char' unit comment
 * Restrict APEI tables generation to the 'virt' machine
 * bcm2835: minor code cleanups
 * correctly flush TLBs when TBI is enabled
 * tests/qtest: Add npcm7xx timer test
 * loads-stores.rst: add footnote that clarifies GETPC usage
 * Fix reported EL for mte_check_fail
 * Ignore HCR_EL2.ATA when {E2H,TGE} != 11
 * microbit_i2c: Fix coredump when dump-vmstate
 * nseries: Fix loading kernel image on n8x0 machines
 * Implement v8.1M low-overhead-loops
 * linux-user: Support AArch64 BTI

----------------------------------------------------------------
Emanuele Giuseppe Esposito (1):
      loads-stores.rst: add footnote that clarifies GETPC usage

Havard Skinnemoen (1):
      tests/qtest: Add npcm7xx timer test

Peng Liang (1):
      microbit_i2c: Fix coredump when dump-vmstate

Peter Maydell (12):
      target/arm: Fix SMLAD incorrect setting of Q bit
      target/arm: AArch32 VCVT fixed-point to float is always round-to-nearest
      decodetree: Fix codegen for non-overlapping group inside overlapping group
      target/arm: Implement v8.1M NOCP handling
      target/arm: Implement v8.1M conditional-select insns
      target/arm: Make the t32 insn[25:23]=111 group non-overlapping
      target/arm: Don't allow BLX imm for M-profile
      target/arm: Implement v8.1M branch-future insns (as NOPs)
      target/arm: Implement v8.1M low-overhead-loop instructions
      target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile
      target/arm: Allow M-profile CPUs with FP16 to set FPSCR.FP16
      target/arm: Implement FPSCR.LTPSIZE for M-profile LOB extension

Philippe Mathieu-Daudé (10):
      hw/arm/strongarm: Fix 'time to transmit a char' unit comment
      hw/arm: Restrict APEI tables generation to the 'virt' machine
      hw/timer/bcm2835: Introduce BCM2835_SYSTIMER_COUNT definition
      hw/timer/bcm2835: Rename variable holding CTRL_STATUS register
      hw/timer/bcm2835: Support the timer COMPARE registers
      hw/arm/bcm2835_peripherals: Correctly wire the SYS_timer IRQs
      hw/intc/bcm2835_ic: Trace GPU/CPU IRQ handlers
      hw/intc/bcm2836_control: Use IRQ definitions instead of magic numbers
      hw/arm/nseries: Fix loading kernel image on n8x0 machines
      linux-user/elfload: Avoid leaking interp_name using GLib memory API

Richard Henderson (16):
      accel/tcg: Add tlb_flush_page_bits_by_mmuidx*
      target/arm: Use tlb_flush_page_bits_by_mmuidx*
      target/arm: Remove redundant mmu_idx lookup
      target/arm: Fix reported EL for mte_check_fail
      target/arm: Ignore HCR_EL2.ATA when {E2H,TGE} != 11
      linux-user/aarch64: Reset btype for signals
      linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI
      include/elf: Add defines related to GNU property notes for AArch64
      linux-user/elfload: Fix coding style in load_elf_image
      linux-user/elfload: Adjust iteration over phdr
      linux-user/elfload: Move PT_INTERP detection to first loop
      linux-user/elfload: Use Error for load_elf_image
      linux-user/elfload: Use Error for load_elf_interp
      linux-user/elfload: Parse NT_GNU_PROPERTY_TYPE_0 notes
      linux-user/elfload: Parse GNU_PROPERTY_AARCH64_FEATURE_1_AND
      tests/tcg/aarch64: Add bti smoke tests

The SMLAD instruction is supposed to:
 * signed multiply Rn[15:0] * Rm[15:0]
 * signed multiply Rn[31:16] * Rm[31:16]
 * perform a signed addition of the products and Ra
 * set Rd to the low 32 bits of the theoretical
   infinite-precision result
 * set the Q flag if the sign-extension of Rd
   would differ from the infinite-precision result
   (i.e. on overflow)

Our current implementation doesn't quite do this, though: it performs
an addition of the products setting Q on overflow, and then it adds
Ra, again possibly setting Q.  This sometimes incorrectly sets Q when
the architecturally mandated only-check-for-overflow-once algorithm
does not. For instance:
 r1 = 0x80008000; r2 = 0x80008000; r3 = 0xffffffff
 smlad r0, r1, r2, r3
This is (-32768 * -32768) + (-32768 * -32768) - 1

The products are both 0x4000_0000, so when added together as 32-bit
signed numbers they overflow (and QEMU sets Q), but because the
addition of Ra == -1 brings the total back down to 0x7fff_ffff
there is no overflow for the complete operation and setting Q is
incorrect.

Fix this edge case by resorting to 64-bit arithmetic for the
case where we need to add three values together.
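
As a rough illustration of the difference (a standalone C sketch, not
the TCG code in this patch; smlad_two_checks and smlad_one_check are
hypothetical names used only here), the old two-step check and the
single 64-bit check can be modelled like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is representable as a signed 32-bit value. */
static bool fits_i32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/*
 * Wrap to 32 bits. The narrowing conversion to a signed type is
 * implementation-defined in ISO C; this sketch assumes the usual
 * two's-complement wrap that GCC and Clang provide.
 */
static int32_t wrap_i32(int64_t v)
{
    return (int32_t)(uint32_t)v;
}

/* Signed product of the 16-bit halves at bit offset 'shift' (0 or 16). */
static int32_t half_product(uint32_t rn, uint32_t rm, unsigned shift)
{
    int16_t a = (int16_t)(uint16_t)(rn >> shift);
    int16_t b = (int16_t)(uint16_t)(rm >> shift);
    return (int32_t)a * b;
}

/* Model of the old behaviour: check after each of the two additions. */
uint32_t smlad_two_checks(uint32_t rn, uint32_t rm, uint32_t ra, bool *q)
{
    int64_t sum = (int64_t)half_product(rn, rm, 0) + half_product(rn, rm, 16);
    if (!fits_i32(sum)) {
        *q = true;              /* Q set on the intermediate sum */
    }
    sum = (int64_t)wrap_i32(sum) + wrap_i32(ra);
    if (!fits_i32(sum)) {
        *q = true;              /* Q set again when Ra is added */
    }
    return (uint32_t)sum;
}

/* Model of the fixed behaviour: one 64-bit sum, one overflow check. */
uint32_t smlad_one_check(uint32_t rn, uint32_t rm, uint32_t ra, bool *q)
{
    int64_t sum = (int64_t)half_product(rn, rm, 0)
                  + half_product(rn, rm, 16) + wrap_i32(ra);
    if (!fits_i32(sum)) {
        *q = true;
    }
    return (uint32_t)sum;
}
```

With rn = rm = 0x80008000 and ra = 0xffffffff both functions return
0x7fffffff, but only the two-step model sets Q.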

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201009144712.11187-1-peter.maydell@linaro.org
---
 target/arm/translate.c | 58 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool op_smlad(DisasContext *s, arg_rrrr *a, bool m_swap, bool sub)
     gen_smul_dual(t1, t2);
 
     if (sub) {
-        /* This subtraction cannot overflow. */
+        /*
+         * This subtraction cannot overflow, so we can do a simple
+         * 32-bit subtraction and then a possible 32-bit saturating
+         * addition of Ra.
+         */
         tcg_gen_sub_i32(t1, t1, t2);
+        tcg_temp_free_i32(t2);
+
+        if (a->ra != 15) {
+            t2 = load_reg(s, a->ra);
+            gen_helper_add_setq(t1, cpu_env, t1, t2);
+            tcg_temp_free_i32(t2);
+        }
+    } else if (a->ra == 15) {
+        /* Single saturation-checking addition */
+        gen_helper_add_setq(t1, cpu_env, t1, t2);
+        tcg_temp_free_i32(t2);
     } else {
         /*
-         * This addition cannot overflow 32 bits; however it may
-         * overflow considered as a signed operation, in which case
-         * we must set the Q flag.
+         * We need to add the products and Ra together and then
+         * determine whether the final result overflowed. Doing
+         * this as two separate add-and-check-overflow steps incorrectly
+         * sets Q for cases like (-32768 * -32768) + (-32768 * -32768) + -1.
+         * Do all the arithmetic at 64-bits and then check for overflow.
          */
-        gen_helper_add_setq(t1, cpu_env, t1, t2);
-    }
-    tcg_temp_free_i32(t2);
+        TCGv_i64 p64, q64;
+        TCGv_i32 t3, qf, one;
 
-    if (a->ra != 15) {
-        t2 = load_reg(s, a->ra);
-        gen_helper_add_setq(t1, cpu_env, t1, t2);
+        p64 = tcg_temp_new_i64();
+        q64 = tcg_temp_new_i64();
+        tcg_gen_ext_i32_i64(p64, t1);
+        tcg_gen_ext_i32_i64(q64, t2);
+        tcg_gen_add_i64(p64, p64, q64);
+        load_reg_var(s, t2, a->ra);
+        tcg_gen_ext_i32_i64(q64, t2);
+        tcg_gen_add_i64(p64, p64, q64);
+        tcg_temp_free_i64(q64);
+
+        tcg_gen_extr_i64_i32(t1, t2, p64);
+        tcg_temp_free_i64(p64);
+        /*
+         * t1 is the low half of the result which goes into Rd.
+         * We have overflow and must set Q if the high half (t2)
+         * is different from the sign-extension of t1.
+         */
+        t3 = tcg_temp_new_i32();
+        tcg_gen_sari_i32(t3, t1, 31);
+        qf = load_cpu_field(QF);
+        one = tcg_const_i32(1);
+        tcg_gen_movcond_i32(TCG_COND_NE, qf, t2, t3, one, qf);
+        store_cpu_field(qf, QF);
+        tcg_temp_free_i32(one);
+        tcg_temp_free_i32(t3);
         tcg_temp_free_i32(t2);
     }
     store_reg(s, a->rd, t1);
-- 
2.20.1

For AArch32, unlike the VCVT of integer to float, which honours the
rounding mode specified by the FPSCR, VCVT of fixed-point to float is
always round-to-nearest. (AArch64 fixed-point-to-float conversions
always honour the FPCR rounding mode.)

Implement this by providing _round_to_nearest versions of the
relevant helpers which set the rounding mode temporarily when making
the call to the underlying softfloat function.

We only need to change the VFP VCVT instructions, because the
standard-FPSCR value used by the Neon VCVT is always set to
round-to-nearest, so we don't need to do the extra work of saving
and restoring the rounding mode.
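
The save/override/restore pattern used by the new helpers can be
sketched in isolation as below. This is a toy model under stated
assumptions: ToyStatus, ToyRoundMode and toy_scale are hypothetical
stand-ins (they are not softfloat or QEMU types), and the "conversion"
is just an integer division by 2^shift that honours the status
rounding mode.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a float_status-like structure. */
typedef enum { TOY_ROUND_NEAREST_EVEN, TOY_ROUND_TO_ZERO } ToyRoundMode;
typedef struct { ToyRoundMode mode; } ToyStatus;

/*
 * Toy "underlying conversion": x / 2^shift (shift in 0..31),
 * rounded according to st->mode.
 */
static int32_t toy_scale(int32_t x, unsigned shift, const ToyStatus *st)
{
    int64_t div = (int64_t)1 << shift;
    int64_t q = x / div;        /* C division truncates toward zero */
    int64_t r = x % div;        /* remainder has the sign of x */

    if (st->mode == TOY_ROUND_NEAREST_EVEN) {
        int64_t twice = r < 0 ? -2 * r : 2 * r;
        /* Round away from zero above the midpoint, or on a tie if q is odd. */
        if (twice > div || (twice == div && (q & 1))) {
            q += (x < 0) ? -1 : 1;
        }
    }
    return (int32_t)q;
}

/* Always-round-to-nearest wrapper: override the mode only for this call. */
int32_t toy_scale_round_to_nearest(int32_t x, unsigned shift, ToyStatus *st)
{
    ToyRoundMode oldmode = st->mode;
    int32_t ret;

    st->mode = TOY_ROUND_NEAREST_EVEN;
    ret = toy_scale(x, shift, st);
    st->mode = oldmode;         /* the caller's mode is left unchanged */
    return ret;
}
```

The point of the wrapper is that the caller's rounding mode is the
same before and after the call, whatever it was set to.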

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201013103532.13391-1-peter.maydell@linaro.org
---
 target/arm/helper.h            | 13 +++++++++++++
 target/arm/vfp_helper.c        | 23 ++++++++++++++++++++++-
 target/arm/translate-vfp.c.inc | 24 ++++++++++++------------
 3 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
 DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
 DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
 
+DEF_HELPER_3(vfp_shtos_round_to_nearest, f32, i32, i32, ptr)
+DEF_HELPER_3(vfp_sltos_round_to_nearest, f32, i32, i32, ptr)
+DEF_HELPER_3(vfp_uhtos_round_to_nearest, f32, i32, i32, ptr)
+DEF_HELPER_3(vfp_ultos_round_to_nearest, f32, i32, i32, ptr)
+DEF_HELPER_3(vfp_shtod_round_to_nearest, f64, i64, i32, ptr)
+DEF_HELPER_3(vfp_sltod_round_to_nearest, f64, i64, i32, ptr)
+DEF_HELPER_3(vfp_uhtod_round_to_nearest, f64, i64, i32, ptr)
+DEF_HELPER_3(vfp_ultod_round_to_nearest, f64, i64, i32, ptr)
+DEF_HELPER_3(vfp_shtoh_round_to_nearest, f16, i32, i32, ptr)
+DEF_HELPER_3(vfp_uhtoh_round_to_nearest, f16, i32, i32, ptr)
+DEF_HELPER_3(vfp_sltoh_round_to_nearest, f16, i32, i32, ptr)
+DEF_HELPER_3(vfp_ultoh_round_to_nearest, f16, i32, i32, ptr)
+
 DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
 
 DEF_HELPER_FLAGS_3(vfp_fcvt_f16_to_f32, TCG_CALL_NO_RWG, f32, f16, ptr, i32)
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
     return float64_to_float32(x, &env->vfp.fp_status);
 }
 
-/* VFP3 fixed point conversion.  */
+/*
+ * VFP3 fixed point conversion. The AArch32 versions of fix-to-float
+ * must always round-to-nearest; the AArch64 ones honour the FPSCR
+ * rounding mode. (For AArch32 Neon the standard-FPSCR is set to
+ * round-to-nearest so either helper will work.) AArch32 float-to-fix
+ * must round-to-zero.
+ */
 #define VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype)            \
 ftype HELPER(vfp_##name##to##p)(uint##isz##_t  x, uint32_t shift,      \
                                      void *fpstp) \
 { return itype##_to_##float##fsz##_scalbn(x, -shift, fpstp); }
 
+#define VFP_CONV_FIX_FLOAT_ROUND(name, p, fsz, ftype, isz, itype)      \
+    ftype HELPER(vfp_##name##to##p##_round_to_nearest)(uint##isz##_t  x, \
+                                                     uint32_t shift,   \
+                                                     void *fpstp)      \
+    {                                                                  \
+        ftype ret;                                                     \
+        float_status *fpst = fpstp;                                    \
+        FloatRoundMode oldmode = fpst->float_rounding_mode;            \
+        fpst->float_rounding_mode = float_round_nearest_even;          \
+        ret = itype##_to_##float##fsz##_scalbn(x, -shift, fpstp);      \
+        fpst->float_rounding_mode = oldmode;                           \
+        return ret;                                                    \
+    }
+
 #define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype, ROUND, suff) \
 uint##isz##_t HELPER(vfp_to##name##p##suff)(ftype x, uint32_t shift,      \
                                             void *fpst)                   \
@@ -XXX,XX +XXX,XX @@ uint##isz##_t HELPER(vfp_to##name##p##suff)(ftype x, uint32_t shift,      \
 
 #define VFP_CONV_FIX(name, p, fsz, ftype, isz, itype)            \
 VFP_CONV_FIX_FLOAT(name, p, fsz, ftype, isz, itype)              \
+VFP_CONV_FIX_FLOAT_ROUND(name, p, fsz, ftype, isz, itype)        \
 VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype,        \
                          float_round_to_zero, _round_to_zero)    \
 VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, ftype, isz, itype,        \
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
     /* Switch on op:U:sx bits */
     switch (a->opc) {
     case 0:
-        gen_helper_vfp_shtoh(vd, vd, shift, fpst);
+        gen_helper_vfp_shtoh_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 1:
-        gen_helper_vfp_sltoh(vd, vd, shift, fpst);
+        gen_helper_vfp_sltoh_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 2:
-        gen_helper_vfp_uhtoh(vd, vd, shift, fpst);
+        gen_helper_vfp_uhtoh_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 3:
-        gen_helper_vfp_ultoh(vd, vd, shift, fpst);
+        gen_helper_vfp_ultoh_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 4:
         gen_helper_vfp_toshh_round_to_zero(vd, vd, shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
     /* Switch on op:U:sx bits */
     switch (a->opc) {
     case 0:
-        gen_helper_vfp_shtos(vd, vd, shift, fpst);
+        gen_helper_vfp_shtos_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 1:
-        gen_helper_vfp_sltos(vd, vd, shift, fpst);
+        gen_helper_vfp_sltos_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 2:
-        gen_helper_vfp_uhtos(vd, vd, shift, fpst);
+        gen_helper_vfp_uhtos_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 3:
-        gen_helper_vfp_ultos(vd, vd, shift, fpst);
+        gen_helper_vfp_ultos_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 4:
         gen_helper_vfp_toshs_round_to_zero(vd, vd, shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     /* Switch on op:U:sx bits */
     switch (a->opc) {
     case 0:
-        gen_helper_vfp_shtod(vd, vd, shift, fpst);
+        gen_helper_vfp_shtod_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 1:
-        gen_helper_vfp_sltod(vd, vd, shift, fpst);
+        gen_helper_vfp_sltod_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 2:
-        gen_helper_vfp_uhtod(vd, vd, shift, fpst);
+        gen_helper_vfp_uhtod_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 3:
-        gen_helper_vfp_ultod(vd, vd, shift, fpst);
+        gen_helper_vfp_ultod_round_to_nearest(vd, vd, shift, fpst);
         break;
     case 4:
         gen_helper_vfp_toshd_round_to_zero(vd, vd, shift, fpst);
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

While APEI is a generic ACPI feature (usable by X86 and ARM64), only
the 'virt' machine uses it, by enabling the RAS Virtualization (see
commit 2afa8c8519: "hw/arm/virt: Introduce a RAS machine option").

Restrict the APEI tables generation code to the single user: the virt
machine. If another machine wants to use it, it simply has to 'select
ACPI_APEI' in its Kconfig.

Fixes: aa16508f1d ("ACPI: Build related register address fields via hardware error fw_cfg blob")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Dongjiu Geng <gengdongjiu@huawei.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201008161414.2672569-1-philmd@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 default-configs/devices/arm-softmmu.mak | 1 -
 hw/arm/Kconfig                          | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/default-configs/devices/arm-softmmu.mak b/default-configs/devices/arm-softmmu.mak
index XXXXXXX..XXXXXXX 100644
--- a/default-configs/devices/arm-softmmu.mak
+++ b/default-configs/devices/arm-softmmu.mak
@@ -XXX,XX +XXX,XX @@ CONFIG_FSL_IMX7=y
 CONFIG_FSL_IMX6UL=y
 CONFIG_SEMIHOSTING=y
 CONFIG_ALLWINNER_H3=y
-CONFIG_ACPI_APEI=y
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -XXX,XX +XXX,XX @@ config ARM_VIRT
     select ACPI_MEMORY_HOTPLUG
     select ACPI_HW_REDUCED
     select ACPI_NVDIMM
+    select ACPI_APEI
 
 config CHEETAH
     bool
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Use the BCM2835_SYSTIMER_COUNT definition instead of the
magic '4' value.

Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201010203709.3116542-2-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/timer/bcm2835_systmr.h | 4 +++-
 hw/timer/bcm2835_systmr.c         | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/hw/timer/bcm2835_systmr.h b/include/hw/timer/bcm2835_systmr.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/timer/bcm2835_systmr.h
+++ b/include/hw/timer/bcm2835_systmr.h
@@ -XXX,XX +XXX,XX @@
 #define TYPE_BCM2835_SYSTIMER "bcm2835-sys-timer"
 OBJECT_DECLARE_SIMPLE_TYPE(BCM2835SystemTimerState, BCM2835_SYSTIMER)
 
+#define BCM2835_SYSTIMER_COUNT 4
+
 struct BCM2835SystemTimerState {
     /*< private >*/
     SysBusDevice parent_obj;
@@ -XXX,XX +XXX,XX @@ struct BCM2835SystemTimerState {
 
     struct {
         uint32_t status;
-        uint32_t compare[4];
+        uint32_t compare[BCM2835_SYSTIMER_COUNT];
     } reg;
 };
 
diff --git a/hw/timer/bcm2835_systmr.c b/hw/timer/bcm2835_systmr.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/bcm2835_systmr.c
+++ b/hw/timer/bcm2835_systmr.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription bcm2835_systmr_vmstate = {
     .minimum_version_id = 1,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(reg.status, BCM2835SystemTimerState),
-        VMSTATE_UINT32_ARRAY(reg.compare, BCM2835SystemTimerState, 4),
+        VMSTATE_UINT32_ARRAY(reg.compare, BCM2835SystemTimerState,
+                             BCM2835_SYSTIMER_COUNT),
         VMSTATE_END_OF_LIST()
     }
 };
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

The variable holding the CTRL_STATUS register is misnamed
'status'. Rename it 'ctrl_status' to make it more obvious
this register is also used to control the peripheral.

Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201010203709.3116542-3-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/timer/bcm2835_systmr.h | 2 +-
 hw/timer/bcm2835_systmr.c         | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/hw/timer/bcm2835_systmr.h b/include/hw/timer/bcm2835_systmr.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/timer/bcm2835_systmr.h
+++ b/include/hw/timer/bcm2835_systmr.h
@@ -XXX,XX +XXX,XX @@ struct BCM2835SystemTimerState {
     qemu_irq irq;
 
     struct {
-        uint32_t status;
+        uint32_t ctrl_status;
         uint32_t compare[BCM2835_SYSTIMER_COUNT];
     } reg;
 };
diff --git a/hw/timer/bcm2835_systmr.c b/hw/timer/bcm2835_systmr.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/bcm2835_systmr.c
+++ b/hw/timer/bcm2835_systmr.c
@@ -XXX,XX +XXX,XX @@ REG32(COMPARE3,     0x18)
 
 static void bcm2835_systmr_update_irq(BCM2835SystemTimerState *s)
 {
-    bool enable = !!s->reg.status;
+    bool enable = !!s->reg.ctrl_status;
 
     trace_bcm2835_systmr_irq(enable);
     qemu_set_irq(s->irq, enable);
@@ -XXX,XX +XXX,XX @@ static uint64_t bcm2835_systmr_read(void *opaque, hwaddr offset,
 
     switch (offset) {
     case A_CTRL_STATUS:
-        r = s->reg.status;
+        r = s->reg.ctrl_status;
         break;
     case A_COMPARE0 ... A_COMPARE3:
         r = s->reg.compare[(offset - A_COMPARE0) >> 2];
@@ -XXX,XX +XXX,XX @@ static void bcm2835_systmr_write(void *opaque, hwaddr offset,
     trace_bcm2835_systmr_write(offset, value);
     switch (offset) {
     case A_CTRL_STATUS:
-        s->reg.status &= ~value; /* Ack */
+        s->reg.ctrl_status &= ~value; /* Ack */
         bcm2835_systmr_update_irq(s);
         break;
     case A_COMPARE0 ... A_COMPARE3:
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription bcm2835_systmr_vmstate = {
     .version_id = 1,
     .minimum_version_id = 1,
     .fields = (VMStateField[]) {
-        VMSTATE_UINT32(reg.status, BCM2835SystemTimerState),
+        VMSTATE_UINT32(reg.ctrl_status, BCM2835SystemTimerState),
         VMSTATE_UINT32_ARRAY(reg.compare, BCM2835SystemTimerState,
                              BCM2835_SYSTIMER_COUNT),
         VMSTATE_END_OF_LIST()
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

This peripheral has 1 free-running timer and 4 compare registers.

Only the free-running timer is implemented. Add support for the
COMPARE registers (each register is wired to an IRQ).

Reference: "BCM2835 ARM Peripherals" datasheet [*]
            chapter 12 "System Timer":

  The System Timer peripheral provides four 32-bit timer channels
  and a single 64-bit free running counter. Each channel has an
  output compare register, which is compared against the 32 least
  significant bits of the free running counter values. When the
  two values match, the system timer peripheral generates a signal
  to indicate a match for the appropriate channel. The match signal
  is then fed into the interrupt controller.
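
The compare rule quoted above can be sketched as follows. This is an
illustration only, not code from the patch: the helper name
systmr_delay_us is hypothetical, and it assumes the free-running counter
ticks once per microsecond. Because only the low 32 bits are compared,
the delay until the next match is an unsigned 32-bit difference that
wraps modulo 2^32.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): microseconds until the
 * low 32 bits of the free-running counter next equal 'compare'.
 * Unsigned 32-bit subtraction wraps modulo 2^32, so a compare value
 * numerically below the current low word means "after the low word
 * wraps around".
 */
uint32_t systmr_delay_us(uint64_t counter_us, uint32_t compare)
{
    /* Truncate the 64-bit counter to its 32 least significant bits. */
    return compare - (uint32_t)counter_us;
}
```

With this rule, a COMPARE value that is already behind the counter does
not fire immediately; it fires once the low word comes around again.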

This peripheral has been used since Linux 3.7, commit ee4af5696720
("ARM: bcm2835: add system timer").

[*] https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc@lmichel.fr>
Message-id: 20201010203709.3116542-4-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/timer/bcm2835_systmr.h | 11 +++++--
 hw/timer/bcm2835_systmr.c         | 48 ++++++++++++++++++++-----------
 hw/timer/trace-events             |  6 ++--
 3 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/include/hw/timer/bcm2835_systmr.h b/include/hw/timer/bcm2835_systmr.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/timer/bcm2835_systmr.h
+++ b/include/hw/timer/bcm2835_systmr.h
@@ -XXX,XX +XXX,XX @@
 
 #include "hw/sysbus.h"
 #include "hw/irq.h"
+#include "qemu/timer.h"
 #include "qom/object.h"
 
 #define TYPE_BCM2835_SYSTIMER "bcm2835-sys-timer"
@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(BCM2835SystemTimerState, BCM2835_SYSTIMER)
 
 #define BCM2835_SYSTIMER_COUNT 4
 
+typedef struct {
+    unsigned id;
+    QEMUTimer timer;
+    qemu_irq irq;
+    BCM2835SystemTimerState *state;
+} BCM2835SystemTimerCompare;
+
 struct BCM2835SystemTimerState {
     /*< private >*/
     SysBusDevice parent_obj;
 
     /*< public >*/
     MemoryRegion iomem;
-    qemu_irq irq;
-
     struct {
         uint32_t ctrl_status;
         uint32_t compare[BCM2835_SYSTIMER_COUNT];
     } reg;
+    BCM2835SystemTimerCompare tmr[BCM2835_SYSTIMER_COUNT];
 };
 
 #endif
diff --git a/hw/timer/bcm2835_systmr.c b/hw/timer/bcm2835_systmr.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/bcm2835_systmr.c
+++ b/hw/timer/bcm2835_systmr.c
@@ -XXX,XX +XXX,XX @@ REG32(COMPARE1,     0x10)
 REG32(COMPARE2,     0x14)
 REG32(COMPARE3,     0x18)
 
-static void bcm2835_systmr_update_irq(BCM2835SystemTimerState *s)
+static void bcm2835_systmr_timer_expire(void *opaque)
 {
-    bool enable = !!s->reg.ctrl_status;
+    BCM2835SystemTimerCompare *tmr = opaque;
 
-    trace_bcm2835_systmr_irq(enable);
-    qemu_set_irq(s->irq, enable);
-}
-
-static void bcm2835_systmr_update_compare(BCM2835SystemTimerState *s,
-                                          unsigned timer_index)
-{
-    /* TODO fow now, since neither Linux nor U-boot use these timers. */
-    qemu_log_mask(LOG_UNIMP, "COMPARE register %u not implemented\n",
-                  timer_index);
+    trace_bcm2835_systmr_timer_expired(tmr->id);
+    tmr->state->reg.ctrl_status |= 1 << tmr->id;
+    qemu_set_irq(tmr->irq, 1);
 }
 
 static uint64_t bcm2835_systmr_read(void *opaque, hwaddr offset,
@@ -XXX,XX +XXX,XX @@ static uint64_t bcm2835_systmr_read(void *opaque, hwaddr offset,
 }
 
 static void bcm2835_systmr_write(void *opaque, hwaddr offset,
-                                 uint64_t value, unsigned size)
+                                 uint64_t value64, unsigned size)
 {
     BCM2835SystemTimerState *s = BCM2835_SYSTIMER(opaque);
+    int index;
+    uint32_t value = value64;
+    uint32_t triggers_delay_us;
+    uint64_t now;
 
     trace_bcm2835_systmr_write(offset, value);
     switch (offset) {
     case A_CTRL_STATUS:
         s->reg.ctrl_status &= ~value; /* Ack */
-        bcm2835_systmr_update_irq(s);
+        for (index = 0; index < ARRAY_SIZE(s->tmr); index++) {
+            if (extract32(value, index, 1)) {
+                trace_bcm2835_systmr_irq_ack(index);
+                qemu_set_irq(s->tmr[index].irq, 0);
+            }
+        }
         break;
     case A_COMPARE0 ... A_COMPARE3:
-        s->reg.compare[(offset - A_COMPARE0) >> 2] = value;
-        bcm2835_systmr_update_compare(s, (offset - A_COMPARE0) >> 2);
+        index = (offset - A_COMPARE0) >> 2;
+        s->reg.compare[index] = value;
+        now = qemu_clock_get_us(QEMU_CLOCK_VIRTUAL);
+        /* Compare lower 32-bits of the free-running counter. */
+        triggers_delay_us = value - now;
+        trace_bcm2835_systmr_run(index, triggers_delay_us);
+        timer_mod(&s->tmr[index].timer, now + triggers_delay_us);
         break;
     case A_COUNTER_LOW:
     case A_COUNTER_HIGH:
@@ -XXX,XX +XXX,XX @@ static void bcm2835_systmr_realize(DeviceState *dev, Error **errp)
     memory_region_init_io(&s->iomem, OBJECT(dev), &bcm2835_systmr_ops,
                           s, "bcm2835-sys-timer", 0x20);
     sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
-    sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq);
+
+    for (size_t i = 0; i < ARRAY_SIZE(s->tmr); i++) {
+        s->tmr[i].id = i;
+        s->tmr[i].state = s;
+        sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->tmr[i].irq);
+        timer_init_us(&s->tmr[i].timer, QEMU_CLOCK_VIRTUAL,
+                      bcm2835_systmr_timer_expire, &s->tmr[i]);
+    }
 }
 
 static const VMStateDescription bcm2835_systmr_vmstate = {
diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -XXX,XX +XXX,XX @@ nrf51_timer_write(uint8_t timer_id, uint64_t addr, uint32_t value, unsigned size
 nrf51_timer_set_count(uint8_t timer_id, uint8_t counter_id, uint32_t value) "timer %u counter %u count 0x%" PRIx32
 
 # bcm2835_systmr.c
-bcm2835_systmr_irq(bool enable) "timer irq state %u"
+bcm2835_systmr_timer_expired(unsigned id) "timer #%u expired"
+bcm2835_systmr_irq_ack(unsigned id) "timer #%u acked"
 bcm2835_systmr_read(uint64_t offset, uint64_t data) "timer read: offset 0x%" PRIx64 " data 0x%" PRIx64
-bcm2835_systmr_write(uint64_t offset, uint64_t data) "timer write: offset 0x%" PRIx64 " data 0x%" PRIx64
+bcm2835_systmr_write(uint64_t offset, uint32_t data) "timer write: offset 0x%" PRIx64 " data 0x%" PRIx32
+bcm2835_systmr_run(unsigned id, uint64_t delay_us) "timer #%u expiring in %"PRIu64" us"
 
 # avr_timer16.c
 avr_timer16_read(uint8_t addr, uint8_t value) "timer16 read addr:%u value:%u"
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

The SYS_timer is not directly wired to the ARM core, but to the
SoC (peripheral) interrupt controller.

Fixes: 0e5bbd74064 ("hw/arm/bcm2835_peripherals: Use the SYS_timer")
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201010203709.3116542-5-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/bcm2835_peripherals.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
     memory_region_add_subregion(&s->peri_mr, ST_OFFSET,
                 sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->systmr), 0));
     sysbus_connect_irq(SYS_BUS_DEVICE(&s->systmr), 0,
-        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_ARM_IRQ,
-                               INTERRUPT_ARM_TIMER));
+        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
+                               INTERRUPT_TIMER0));
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->systmr), 1,
+        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
+                               INTERRUPT_TIMER1));
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->systmr), 2,
+        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
+                               INTERRUPT_TIMER2));
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->systmr), 3,
+        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
+                               INTERRUPT_TIMER3));
 
     /* UART0 */
     qdev_prop_set_chr(DEVICE(&s->uart0), "chardev", serial_hd(0));
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

On ARM, the Top Byte Ignore feature means that only 56 bits of
the virtual address are significant.  We are
required to give the entire 64-bit address to FAR_ELx on fault,
which means that we do not "clean" the top byte early in TCG.

This new interface allows us to flush all 256 possible aliases
for a given page, currently missed by tlb_flush_page*.
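
As a rough sketch of the matching rule (not the code added by this
patch): two addresses alias the same page when they agree on every
page-number bit below the number of significant bits. The helper names
low_bits_mask and same_page_under_bits are hypothetical, and the sketch
assumes 64-bit addresses and a page size below 2^64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask covering the low 'bits' bits of a 64-bit address. */
uint64_t low_bits_mask(unsigned bits)
{
    /* Shifting a 64-bit value by 64 is undefined, so special-case it. */
    return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/*
 * Hypothetical helper: true if 'a' and 'b' name the same page once
 * only the low 'bits' address bits are treated as significant.
 * 'page_bits' is log2 of the page size and must be less than 64.
 */
bool same_page_under_bits(uint64_t a, uint64_t b,
                          unsigned bits, unsigned page_bits)
{
    uint64_t page_mask = ~((UINT64_C(1) << page_bits) - 1);
    uint64_t mask = low_bits_mask(bits) & page_mask;

    return (a & mask) == (b & mask);
}
```

With bits = 56, the 256 possible top-byte values of an address all
compare equal, which is the set of aliases a flush has to cover.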

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201016210754.818257-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/exec/exec-all.h |  36 ++++++
 accel/tcg/cputlb.c      | 275 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 302 insertions(+), 9 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -XXX,XX +XXX,XX @@ void tlb_flush_by_mmuidx_all_cpus(CPUState *cpu, uint16_t idxmap);
  * depend on when the guests translation ends the TB.
  */
 void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu, uint16_t idxmap);
+
+/**
+ * tlb_flush_page_bits_by_mmuidx
+ * @cpu: CPU whose TLB should be flushed
+ * @addr: virtual address of page to be flushed
+ * @idxmap: bitmap of mmu indexes to flush
+ * @bits: number of significant bits in address
+ *
+ * Similar to tlb_flush_page_mask, but with a bitmap of indexes.
+ */
+void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, target_ulong addr,
+                                   uint16_t idxmap, unsigned bits);
+
+/* Similarly, with broadcast and syncing. */
+void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu, target_ulong addr,
+                                            uint16_t idxmap, unsigned bits);
+void tlb_flush_page_bits_by_mmuidx_all_cpus_synced
+    (CPUState *cpu, target_ulong addr, uint16_t idxmap, unsigned bits);
+
 /**
  * tlb_set_page_with_attrs:
  * @cpu: CPU to add this TLB entry for
@@ -XXX,XX +XXX,XX @@ static inline void tlb_flush_by_mmuidx_all_cpus_synced(CPUState *cpu,
                                                        uint16_t idxmap)
 {
 }
+static inline void tlb_flush_page_bits_by_mmuidx(CPUState *cpu,
+                                                 target_ulong addr,
+                                                 uint16_t idxmap,
+                                                 unsigned bits)
+{
+}
+static inline void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *cpu,
+                                                          target_ulong addr,
+                                                          uint16_t idxmap,
+                                                          unsigned bits)
+{
+}
+static inline void
+tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *cpu, target_ulong addr,
+                                              uint16_t idxmap, unsigned bits)
+{
+}
 #endif
 /**
  * probe_access:
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -XXX,XX +XXX,XX @@ void tlb_flush_all_cpus_synced(CPUState *src_cpu)
     tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, ALL_MMUIDX_BITS);
 }
 
+static bool tlb_hit_page_mask_anyprot(CPUTLBEntry *tlb_entry,
+                                      target_ulong page, target_ulong mask)
+{
+    page &= mask;
+    mask &= TARGET_PAGE_MASK | TLB_INVALID_MASK;
+
+    return (page == (tlb_entry->addr_read & mask) ||
+            page == (tlb_addr_write(tlb_entry) & mask) ||
+            page == (tlb_entry->addr_code & mask));
+}
+
 static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry,
                                         target_ulong page)
 {
-    return tlb_hit_page(tlb_entry->addr_read, page) ||
-           tlb_hit_page(tlb_addr_write(tlb_entry), page) ||
-           tlb_hit_page(tlb_entry->addr_code, page);
+    return tlb_hit_page_mask_anyprot(tlb_entry, page, -1);
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ static inline bool tlb_entry_is_empty(const CPUTLBEntry *te)
 }
 
 /* Called with tlb_c.lock held */
-static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry,
-                                          target_ulong page)
+static bool tlb_flush_entry_mask_locked(CPUTLBEntry *tlb_entry,
+                                        target_ulong page,
+                                        target_ulong mask)
 {
-    if (tlb_hit_page_anyprot(tlb_entry, page)) {
+    if (tlb_hit_page_mask_anyprot(tlb_entry, page, mask)) {
         memset(tlb_entry, -1, sizeof(*tlb_entry));
         return true;
     }
     return false;
 }
 
+static inline bool tlb_flush_entry_locked(CPUTLBEntry *tlb_entry,
+                                          target_ulong page)
+{
+    return tlb_flush_entry_mask_locked(tlb_entry, page, -1);
+}
+
 /* Called with tlb_c.lock held */
-static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
-                                              target_ulong page)
+static void tlb_flush_vtlb_page_mask_locked(CPUArchState *env, int mmu_idx,
+                                            target_ulong page,
+                                            target_ulong mask)
 {
     CPUTLBDesc *d = &env_tlb(env)->d[mmu_idx];
     int k;
 
     assert_cpu_is_self(env_cpu(env));
     for (k = 0; k < CPU_VTLB_SIZE; k++) {
-        if (tlb_flush_entry_locked(&d->vtable[k], page)) {
+        if (tlb_flush_entry_mask_locked(&d->vtable[k], page, mask)) {
             tlb_n_used_entries_dec(env, mmu_idx);
         }
     }
 }
 
+static inline void tlb_flush_vtlb_page_locked(CPUArchState *env, int mmu_idx,
+                                              target_ulong page)
+{
+    tlb_flush_vtlb_page_mask_locked(env, mmu_idx, page, -1);
+}
+
 static void tlb_flush_page_locked(CPUArchState *env, int midx,
                                   target_ulong page)
 {
@@ -XXX,XX +XXX,XX @@ void tlb_flush_page_all_cpus_synced(CPUState *src, target_ulong addr)
     tlb_flush_page_by_mmuidx_all_cpus_synced(src, addr, ALL_MMUIDX_BITS);
 }
 
+static void tlb_flush_page_bits_locked(CPUArchState *env, int midx,
+                                       target_ulong page, unsigned bits)
+{
+    CPUTLBDesc *d = &env_tlb(env)->d[midx];
+    CPUTLBDescFast *f = &env_tlb(env)->f[midx];
+    target_ulong mask = MAKE_64BIT_MASK(0, bits);
+
+    /*
+     * If @bits is smaller than the tlb size, there may be multiple entries
+     * within the TLB; otherwise all addresses that match under @mask hit
+     * the same TLB entry.
+     *
+     * TODO: Perhaps allow bits to be a few bits less than the size.
+     * For now, just flush the entire TLB.
+     */
+    if (mask < f->mask) {
+        tlb_debug("forcing full flush midx %d ("
+                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
+                  midx, page, mask);
+        tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
+        return;
+    }
+
+    /* Check if we need to flush due to large pages.  */
+    if ((page & d->large_page_mask) == d->large_page_addr) {
+        tlb_debug("forcing full flush midx %d ("
+                  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
+                  midx, d->large_page_addr, d->large_page_mask);
+        tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
+        return;
+    }
+
+    if (tlb_flush_entry_mask_locked(tlb_entry(env, midx, page), page, mask)) {
+        tlb_n_used_entries_dec(env, midx);
+    }
+    tlb_flush_vtlb_page_mask_locked(env, midx, page, mask);
+}
+
+typedef struct {
+    target_ulong addr;
+    uint16_t idxmap;
+    uint16_t bits;
+} TLBFlushPageBitsByMMUIdxData;
+
+static void
+tlb_flush_page_bits_by_mmuidx_async_0(CPUState *cpu,
+                                      TLBFlushPageBitsByMMUIdxData d)
+{
+    CPUArchState *env = cpu->env_ptr;
+    int mmu_idx;
+
+    assert_cpu_is_self(cpu);
+
+    tlb_debug("page addr:" TARGET_FMT_lx "/%u mmu_map:0x%x\n",
+              d.addr, d.bits, d.idxmap);
+
+    qemu_spin_lock(&env_tlb(env)->c.lock);
+    for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
+        if ((d.idxmap >> mmu_idx) & 1) {
+            tlb_flush_page_bits_locked(env, mmu_idx, d.addr, d.bits);
+        }
+    }
+    qemu_spin_unlock(&env_tlb(env)->c.lock);
+
+    tb_flush_jmp_cache(cpu, d.addr);
+}
+
+static bool encode_pbm_to_runon(run_on_cpu_data *out,
+                                TLBFlushPageBitsByMMUIdxData d)
+{
+    /* We need 6 bits to hold @bits up to 63. */
+    if (d.idxmap <= MAKE_64BIT_MASK(0, TARGET_PAGE_BITS - 6)) {
+        *out = RUN_ON_CPU_TARGET_PTR(d.addr | (d.idxmap << 6) | d.bits);
+        return true;
+    }
+    return false;
+}
+
+static TLBFlushPageBitsByMMUIdxData
+decode_runon_to_pbm(run_on_cpu_data data)
+{
+    target_ulong addr_map_bits = (target_ulong) data.target_ptr;
+    return (TLBFlushPageBitsByMMUIdxData){
+        .addr = addr_map_bits & TARGET_PAGE_MASK,
+        .idxmap = (addr_map_bits & ~TARGET_PAGE_MASK) >> 6,
+        .bits = addr_map_bits & 0x3f
+    };
+}
+
+static void tlb_flush_page_bits_by_mmuidx_async_1(CPUState *cpu,
+                                                  run_on_cpu_data runon)
+{
+    tlb_flush_page_bits_by_mmuidx_async_0(cpu, decode_runon_to_pbm(runon));
+}
+
+static void tlb_flush_page_bits_by_mmuidx_async_2(CPUState *cpu,
+                                                  run_on_cpu_data data)
+{
+    TLBFlushPageBitsByMMUIdxData *d = data.host_ptr;
+    tlb_flush_page_bits_by_mmuidx_async_0(cpu, *d);
+    g_free(d);
+}
+
+void tlb_flush_page_bits_by_mmuidx(CPUState *cpu, target_ulong addr,
+                                   uint16_t idxmap, unsigned bits)
+{
+    TLBFlushPageBitsByMMUIdxData d;
+    run_on_cpu_data runon;
+
+    /* If all bits are significant, this devolves to tlb_flush_page. */
+    if (bits >= TARGET_LONG_BITS) {
+        tlb_flush_page_by_mmuidx(cpu, addr, idxmap);
+        return;
+    }
+    /* If no page bits are significant, this devolves to tlb_flush. */
+    if (bits < TARGET_PAGE_BITS) {
+        tlb_flush_by_mmuidx(cpu, idxmap);
+        return;
+    }
+
+    /* This should already be page aligned */
+    d.addr = addr & TARGET_PAGE_MASK;
+    d.idxmap = idxmap;
+    d.bits = bits;
+
+    if (qemu_cpu_is_self(cpu)) {
+        tlb_flush_page_bits_by_mmuidx_async_0(cpu, d);
+    } else if (encode_pbm_to_runon(&runon, d)) {
+        async_run_on_cpu(cpu, tlb_flush_page_bits_by_mmuidx_async_1, runon);
+    } else {
+        TLBFlushPageBitsByMMUIdxData *p
+            = g_new(TLBFlushPageBitsByMMUIdxData, 1);
+
+        /* Otherwise allocate a structure, freed by the worker.  */
+        *p = d;
+        async_run_on_cpu(cpu, tlb_flush_page_bits_by_mmuidx_async_2,
+                         RUN_ON_CPU_HOST_PTR(p));
+    }
+}
+
+void tlb_flush_page_bits_by_mmuidx_all_cpus(CPUState *src_cpu,
+                                            target_ulong addr,
+                                            uint16_t idxmap,
+                                            unsigned bits)
+{
+    TLBFlushPageBitsByMMUIdxData d;
+    run_on_cpu_data runon;
+
+    /* If all bits are significant, this devolves to tlb_flush_page. */
+    if (bits >= TARGET_LONG_BITS) {
+        tlb_flush_page_by_mmuidx_all_cpus(src_cpu, addr, idxmap);
+        return;
+    }
+    /* If no page bits are significant, this devolves to tlb_flush. */
+    if (bits < TARGET_PAGE_BITS) {
+        tlb_flush_by_mmuidx_all_cpus(src_cpu, idxmap);
+        return;
+    }
+
+    /* This should already be page aligned */
+    d.addr = addr & TARGET_PAGE_MASK;
+    d.idxmap = idxmap;
+    d.bits = bits;
+
+    if (encode_pbm_to_runon(&runon, d)) {
+        flush_all_helper(src_cpu, tlb_flush_page_bits_by_mmuidx_async_1, runon);
+    } else {
+        CPUState *dst_cpu;
+        TLBFlushPageBitsByMMUIdxData *p;
+
+        /* Allocate a separate data block for each destination cpu.  */
+        CPU_FOREACH(dst_cpu) {
+            if (dst_cpu != src_cpu) {
+                p = g_new(TLBFlushPageBitsByMMUIdxData, 1);
+                *p = d;
+                async_run_on_cpu(dst_cpu,
+                                 tlb_flush_page_bits_by_mmuidx_async_2,
+                                 RUN_ON_CPU_HOST_PTR(p));
+            }
+        }
+    }
+
+    tlb_flush_page_bits_by_mmuidx_async_0(src_cpu, d);
+}
+
+void tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
+                                                   target_ulong addr,
+                                                   uint16_t idxmap,
+                                                   unsigned bits)
+{
+    TLBFlushPageBitsByMMUIdxData d;
+    run_on_cpu_data runon;
+
+    /* If all bits are significant, this devolves to tlb_flush_page. */
+    if (bits >= TARGET_LONG_BITS) {
+        tlb_flush_page_by_mmuidx_all_cpus_synced(src_cpu, addr, idxmap);
+        return;
+    }
+    /* If no page bits are significant, this devolves to tlb_flush. */
+    if (bits < TARGET_PAGE_BITS) {
+        tlb_flush_by_mmuidx_all_cpus_synced(src_cpu, idxmap);
+        return;
+    }
+
+    /* This should already be page aligned */
+    d.addr = addr & TARGET_PAGE_MASK;
+    d.idxmap = idxmap;
+    d.bits = bits;
+
+    if (encode_pbm_to_runon(&runon, d)) {
+        flush_all_helper(src_cpu, tlb_flush_page_bits_by_mmuidx_async_1, runon);
+        async_safe_run_on_cpu(src_cpu, tlb_flush_page_bits_by_mmuidx_async_1,
+                              runon);
+    } else {
+        CPUState *dst_cpu;
+        TLBFlushPageBitsByMMUIdxData *p;
+
+        /* Allocate a separate data block for each destination cpu.  */
+        CPU_FOREACH(dst_cpu) {
+            if (dst_cpu != src_cpu) {
+                p = g_new(TLBFlushPageBitsByMMUIdxData, 1);
+                *p = d;
+                async_run_on_cpu(dst_cpu, tlb_flush_page_bits_by_mmuidx_async_2,
+                                 RUN_ON_CPU_HOST_PTR(p));
+            }
+        }
+
+        p = g_new(TLBFlushPageBitsByMMUIdxData, 1);
+        *p = d;
+        async_safe_run_on_cpu(src_cpu, tlb_flush_page_bits_by_mmuidx_async_2,
+                              RUN_ON_CPU_HOST_PTR(p));
+    }
+}
+
 /* update the TLBs so that writes to code in the virtual page 'addr'
    can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

When TBI is enabled in a given regime, 56 bits of the address
are significant and we need to clear out any other matching
virtual addresses with differing tags.
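
As a rough illustration of the comparison this implies, here is a
standalone sketch (not the QEMU implementation; sig_bits(), same_va()
and tbi_example() are hypothetical names). It assumes the two-bit TBI
value is indexed by address bit 55, as in the diff below. Two addresses
that differ only in the top-byte tag refer to the same page once only
the low 56 bits are compared:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of significant VA bits for an address.
 * Bit 55 of the address selects which TBI bit applies. */
static unsigned sig_bits(unsigned tbi, uint64_t addr)
{
    unsigned select = (unsigned)((addr >> 55) & 1);

    return ((tbi >> select) & 1) ? 56 : 64;
}

/* Hypothetical helper: do two addresses agree in their low 'bits' bits? */
static bool same_va(uint64_t a, uint64_t b, unsigned bits)
{
    uint64_t mask = bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;

    return ((a ^ b) & mask) == 0;
}

/* Usage: a tagged and an untagged pointer to the same page. */
static void tbi_example(void)
{
    uint64_t plain  = UINT64_C(0x0000123456789000);
    uint64_t tagged = UINT64_C(0xab00123456789000);

    /* TBI enabled for the lower range: only 56 bits are compared. */
    assert(same_va(plain, tagged, sig_bits(1, plain)));
    /* TBI disabled: all 64 bits are compared, so the tag matters. */
    assert(!same_va(plain, tagged, sig_bits(0, plain)));
}
```

A flush keyed on the full 64-bit address would therefore miss the
tagged alias, which is the case the change below addresses.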

The other calls to tlb_flush_page (without mmuidx) in this file
are reached only in AArch32 mode.

Fixes: 38d931687fa1
Reported-by: Jordan Frank <jordanfrank@fb.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201016210754.818257-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 46 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
 #endif
 
 static void switch_mode(CPUARMState *env, int mode);
+static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx);
 
 static int vfp_gdb_get_reg(CPUARMState *env, GByteArray *buf, int reg)
 {
@@ -XXX,XX +XXX,XX @@ static int vae1_tlbmask(CPUARMState *env)
     }
 }
 
+/* Return 56 if TBI is enabled, 64 otherwise. */
+static int tlbbits_for_regime(CPUARMState *env, ARMMMUIdx mmu_idx,
+                              uint64_t addr)
+{
+    uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
+    int tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
+    int select = extract64(addr, 55, 1);
+
+    return (tbi >> select) & 1 ? 56 : 64;
+}
+
+static int vae1_tlbbits(CPUARMState *env, uint64_t addr)
+{
+    ARMMMUIdx mmu_idx;
+
+    /* Only the regime of the mmu_idx below is significant. */
+    if (arm_is_secure_below_el3(env)) {
+        mmu_idx = ARMMMUIdx_SE10_0;
+    } else if ((env->cp15.hcr_el2 & (HCR_E2H | HCR_TGE))
+               == (HCR_E2H | HCR_TGE)) {
+        mmu_idx = ARMMMUIdx_E20_0;
+    } else {
+        mmu_idx = ARMMMUIdx_E10_0;
+    }
+    return tlbbits_for_regime(env, mmu_idx, addr);
+}
+
 static void tlbi_aa64_vmalle1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                       uint64_t value)
 {
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae1is_write(CPUARMState *env, const ARMCPRegInfo *ri,
     CPUState *cs = env_cpu(env);
     int mask = vae1_tlbmask(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+    int bits = vae1_tlbbits(env, pageaddr);
 
-    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask);
+    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits);
 }
 
 static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae1_write(CPUARMState *env, const ARMCPRegInfo *ri,
     CPUState *cs = env_cpu(env);
     int mask = vae1_tlbmask(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+    int bits = vae1_tlbbits(env, pageaddr);
 
     if (tlb_force_broadcast(env)) {
-        tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr, mask);
+        tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits);
     } else {
-        tlb_flush_page_by_mmuidx(cs, pageaddr, mask);
+        tlb_flush_page_bits_by_mmuidx(cs, pageaddr, mask, bits);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 {
     CPUState *cs = env_cpu(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+    int bits = tlbbits_for_regime(env, ARMMMUIdx_E2, pageaddr);
 
-    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
-                                             ARMMMUIdxBit_E2);
+    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr,
+                                                  ARMMMUIdxBit_E2, bits);
 }
 
 static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 {
     CPUState *cs = env_cpu(env);
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
+    int bits = tlbbits_for_regime(env, ARMMMUIdx_SE3, pageaddr);
 
-    tlb_flush_page_by_mmuidx_all_cpus_synced(cs, pageaddr,
-                                             ARMMMUIdxBit_SE3);
+    tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr,
+                                                  ARMMMUIdxBit_SE3, bits);
 }
 
 static CPAccessResult aa64_zva_access(CPUARMState *env, const ARMCPRegInfo *ri,
-- 
2.20.1

From: Havard Skinnemoen <hskinnemoen@google.com>

This test exercises the various modes of the npcm7xx timer. In
particular, it triggers the bug found by the fuzzer, as reported here:

https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg02992.html

It also found several other bugs, especially related to interrupt
handling.

The test exercises all the timers in all the timer modules, which
expands to 180 test cases in total.
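
The arithmetic behind that total can be sketched as follows. The
counts are taken from the test file in this patch (3 timer blocks,
5 timers per block, 12 test functions per timer); the enum and
function names are hypothetical and used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Counts as they appear in the test file of this patch. */
enum {
    NUM_BLOCKS       = 3,   /* entries in timer_block[] */
    TIMERS_PER_BLOCK = 5,   /* entries in timer[] */
    TESTS_PER_TIMER  = 12,  /* add_test() calls per (block, timer) pair */
};

/* Flattened index of one (block, timer) pair, matching the
 * "i * ARRAY_SIZE(timer) + j" indexing used for testdata[]. */
static size_t testdata_index(size_t block, size_t timer)
{
    return block * TIMERS_PER_BLOCK + timer;
}

/* Total number of registered test cases. */
static size_t total_test_cases(void)
{
    return (size_t)NUM_BLOCKS * TIMERS_PER_BLOCK * TESTS_PER_TIMER;
}

/* Usage: the last pair has index 14 and the total is 180. */
static void count_example(void)
{
    assert(testdata_index(2, 4) == 14);
    assert(total_test_cases() == 180);
}
```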

Reviewed-by: Tyrone Ting <kfting@nuvoton.com>
Signed-off-by: Havard Skinnemoen <hskinnemoen@google.com>
Message-id: 20201008232154.94221-2-hskinnemoen@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/npcm7xx_timer-test.c | 562 +++++++++++++++++++++++++++++++
 tests/qtest/meson.build          |   1 +
 2 files changed, 563 insertions(+)
 create mode 100644 tests/qtest/npcm7xx_timer-test.c

diff --git a/tests/qtest/npcm7xx_timer-test.c b/tests/qtest/npcm7xx_timer-test.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qtest/npcm7xx_timer-test.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * QTest testcase for the Nuvoton NPCM7xx Timer
+ *
+ * Copyright 2020 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/timer.h"
+#include "libqtest-single.h"
+
+#define TIM_REF_HZ      (25000000)
+
+/* Bits in TCSRx */
+#define CEN             BIT(30)
+#define IE              BIT(29)
+#define MODE_ONESHOT    (0 << 27)
+#define MODE_PERIODIC   (1 << 27)
+#define CRST            BIT(26)
+#define CACT            BIT(25)
+#define PRESCALE(x)     (x)
+
+/* Registers shared between all timers in a module. */
+#define TISR    0x18
+#define WTCR    0x1c
+# define WTCLK(x)       ((x) << 10)
+
+/* Power-on default; used to re-initialize timers before each test. */
+#define TCSR_DEFAULT    PRESCALE(5)
+
+/* Register offsets for a timer within a timer block. */
+typedef struct Timer {
+    unsigned int tcsr_offset;
+    unsigned int ticr_offset;
+    unsigned int tdr_offset;
+} Timer;
+
+/* A timer block containing 5 timers. */
+typedef struct TimerBlock {
+    int irq_base;
+    uint64_t base_addr;
+} TimerBlock;
+
+/* Testdata for testing a particular timer within a timer block. */
+typedef struct TestData {
+    const TimerBlock *tim;
+    const Timer *timer;
+} TestData;
+
+const TimerBlock timer_block[] = {
+    {
+        .irq_base   = 32,
+        .base_addr  = 0xf0008000,
+    },
+    {
+        .irq_base   = 37,
+        .base_addr  = 0xf0009000,
+    },
+    {
+        .irq_base   = 42,
+        .base_addr  = 0xf000a000,
+    },
+};
+
+const Timer timer[] = {
+    {
+        .tcsr_offset    = 0x00,
+        .ticr_offset    = 0x08,
+        .tdr_offset     = 0x10,
+    }, {
+        .tcsr_offset    = 0x04,
+        .ticr_offset    = 0x0c,
+        .tdr_offset     = 0x14,
+    }, {
+        .tcsr_offset    = 0x20,
+        .ticr_offset    = 0x28,
+        .tdr_offset     = 0x30,
+    }, {
+        .tcsr_offset    = 0x24,
+        .ticr_offset    = 0x2c,
+        .tdr_offset     = 0x34,
+    }, {
+        .tcsr_offset    = 0x40,
+        .ticr_offset    = 0x48,
+        .tdr_offset     = 0x50,
+    },
+};
+
+/* Returns the index of the timer block. */
+static int tim_index(const TimerBlock *tim)
+{
+    ptrdiff_t diff = tim - timer_block;
+
+    g_assert(diff >= 0 && diff < ARRAY_SIZE(timer_block));
+
+    return diff;
+}
+
+/* Returns the index of a timer within a timer block. */
+static int timer_index(const Timer *t)
+{
+    ptrdiff_t diff = t - timer;
+
+    g_assert(diff >= 0 && diff < ARRAY_SIZE(timer));
+
+    return diff;
+}
+
+/* Returns the irq line for a given timer. */
+static int tim_timer_irq(const TestData *td)
+{
+    return td->tim->irq_base + timer_index(td->timer);
+}
+
+/* Register read/write accessors. */
+
+static void tim_write(const TestData *td,
+                      unsigned int offset, uint32_t value)
+{
+    writel(td->tim->base_addr + offset, value);
+}
+
+static uint32_t tim_read(const TestData *td, unsigned int offset)
+{
+    return readl(td->tim->base_addr + offset);
+}
+
+static void tim_write_tcsr(const TestData *td, uint32_t value)
+{
+    tim_write(td, td->timer->tcsr_offset, value);
+}
+
+static uint32_t tim_read_tcsr(const TestData *td)
+{
+    return tim_read(td, td->timer->tcsr_offset);
+}
+
+static void tim_write_ticr(const TestData *td, uint32_t value)
+{
+    tim_write(td, td->timer->ticr_offset, value);
+}
+
+static uint32_t tim_read_ticr(const TestData *td)
+{
+    return tim_read(td, td->timer->ticr_offset);
+}
+
+static uint32_t tim_read_tdr(const TestData *td)
+{
+    return tim_read(td, td->timer->tdr_offset);
+}
+
+/* Returns the number of nanoseconds to count the given number of cycles. */
+static int64_t tim_calculate_step(uint32_t count, uint32_t prescale)
+{
+    return (1000000000LL / TIM_REF_HZ) * count * (prescale + 1);
+}
+
+/* Returns a bitmask corresponding to the timer under test. */
+static uint32_t tim_timer_bit(const TestData *td)
+{
+    return BIT(timer_index(td->timer));
+}
+
+/* Resets all timers to power-on defaults. */
+static void tim_reset(const TestData *td)
+{
+    int i, j;
+
+    /* Reset all the timers, in case a previous test left a timer running. */
+    for (i = 0; i < ARRAY_SIZE(timer_block); i++) {
+        for (j = 0; j < ARRAY_SIZE(timer); j++) {
+            writel(timer_block[i].base_addr + timer[j].tcsr_offset,
+                   CRST | TCSR_DEFAULT);
+        }
+        writel(timer_block[i].base_addr + TISR, -1);
+    }
+}
+
+/* Verifies the reset state of a timer. */
+static void test_reset(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+
+    tim_reset(td);
+
+    g_assert_cmphex(tim_read_tcsr(td), ==, TCSR_DEFAULT);
+    g_assert_cmphex(tim_read_ticr(td), ==, 0);
+    g_assert_cmphex(tim_read_tdr(td), ==, 0);
+    g_assert_cmphex(tim_read(td, TISR), ==, 0);
+    g_assert_cmphex(tim_read(td, WTCR), ==, WTCLK(1));
+}
+
+/* Verifies that CRST wins if both CEN and CRST are set. */
+static void test_reset_overrides_enable(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+
+    tim_reset(td);
+
+    /* CRST should force CEN to 0 */
+    tim_write_tcsr(td, CEN | CRST | TCSR_DEFAULT);
+
+    g_assert_cmphex(tim_read_tcsr(td), ==, TCSR_DEFAULT);
+    g_assert_cmphex(tim_read_tdr(td), ==, 0);
+    g_assert_cmphex(tim_read(td, TISR), ==, 0);
+}
+
+/* Verifies the behavior when CEN is set and then cleared. */
+static void test_oneshot_enable_then_disable(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+
+    tim_reset(td);
+
+    /* Enable the timer with zero initial count, then disable it again. */
+    tim_write_tcsr(td, CEN | TCSR_DEFAULT);
+    tim_write_tcsr(td, TCSR_DEFAULT);
+
+    g_assert_cmphex(tim_read_tcsr(td), ==, TCSR_DEFAULT);
+    g_assert_cmphex(tim_read_tdr(td), ==, 0);
+    /* Timer interrupt flag should be set, but interrupts are not enabled. */
+    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
+    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+}
+
+/* Verifies that a one-shot timer fires when expected with prescaler 5. */
+static void test_oneshot_ps5(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    unsigned int count = 256;
+    unsigned int ps = 5;
+
+    tim_reset(td);
+
+    tim_write_ticr(td, count);
+    tim_write_tcsr(td, CEN | PRESCALE(ps));
+    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
+    g_assert_cmpuint(tim_read_tdr(td), ==, count);
+
+    clock_step(tim_calculate_step(count, ps) - 1);
+
+    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
+    g_assert_cmpuint(tim_read_tdr(td), <, count);
+    g_assert_cmphex(tim_read(td, TISR), ==, 0);
+
+    clock_step(1);
+
+    g_assert_cmphex(tim_read_tcsr(td), ==, PRESCALE(ps));
+    g_assert_cmpuint(tim_read_tdr(td), ==, count);
+    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
+    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+
+    /* Clear the interrupt flag. */
+    tim_write(td, TISR, tim_timer_bit(td));
+    g_assert_cmphex(tim_read(td, TISR), ==, 0);
+    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+
+    /* Verify that this isn't a periodic timer. */
+    clock_step(2 * tim_calculate_step(count, ps));
+    g_assert_cmphex(tim_read(td, TISR), ==, 0);
+    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+}
+
+/* Verifies that a one-shot timer fires when expected with prescaler 0. */
+static void test_oneshot_ps0(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    unsigned int count = 1;
+    unsigned int ps = 0;
+
+    tim_reset(td);
+
+    tim_write_ticr(td, count);
+    tim_write_tcsr(td, CEN | PRESCALE(ps));
+    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
+    g_assert_cmpuint(tim_read_tdr(td), ==, count);
+
+    clock_step(tim_calculate_step(count, ps) - 1);
+
+    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
+    g_assert_cmpuint(tim_read_tdr(td), <, count);
+    g_assert_cmphex(tim_read(td, TISR), ==, 0);
+
+    clock_step(1);
+
+    g_assert_cmphex(tim_read_tcsr(td), ==, PRESCALE(ps));
+    g_assert_cmpuint(tim_read_tdr(td), ==, count);
+    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
+    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+}
+
+/* Verifies that a one-shot timer fires when expected with highest prescaler. */
+static void test_oneshot_ps255(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    unsigned int count = (1U << 24) - 1;
+    unsigned int ps = 255;
+
+    tim_reset(td);
+
+    tim_write_ticr(td, count);
+    tim_write_tcsr(td, CEN | PRESCALE(ps));
+    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
+    g_assert_cmpuint(tim_read_tdr(td), ==, count);
+
+    clock_step(tim_calculate_step(count, ps) - 1);
+
+    g_assert_cmphex(tim_read_tcsr(td), ==, CEN | CACT | PRESCALE(ps));
+    g_assert_cmpuint(tim_read_tdr(td), <, count);
+    g_assert_cmphex(tim_read(td, TISR), ==, 0);
+
+    clock_step(1);
+
+    g_assert_cmphex(tim_read_tcsr(td), ==, PRESCALE(ps));
+    g_assert_cmpuint(tim_read_tdr(td), ==, count);
+    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
+    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+}
+
+/* Verifies that a one-shot timer fires an interrupt when expected. */
+static void test_oneshot_interrupt(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    unsigned int count = 256;
+    unsigned int ps = 7;
+
+    tim_reset(td);
+
+    tim_write_ticr(td, count);
+    tim_write_tcsr(td, IE | CEN | MODE_ONESHOT | PRESCALE(ps));
+
+    clock_step_next();
+
+    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
+    g_assert_true(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+}
+
+/*
+ * Verifies that the timer can be paused and later resumed, and it still fires
+ * at the right moment.
+ */
+static void test_pause_resume(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    unsigned int count = 256;
+    unsigned int ps = 1;
+
+    tim_reset(td);
+
+    tim_write_ticr(td, count);
+    tim_write_tcsr(td, IE | CEN | MODE_ONESHOT | PRESCALE(ps));
+
+    /* Pause the timer halfway to expiration. */
+    clock_step(tim_calculate_step(count / 2, ps));
+    tim_write_tcsr(td, IE | MODE_ONESHOT | PRESCALE(ps));
+    g_assert_cmpuint(tim_read_tdr(td), ==, count / 2);
+
+    /* Counter should not advance during the following step. */
+    clock_step(2 * tim_calculate_step(count, ps));
+    g_assert_cmpuint(tim_read_tdr(td), ==, count / 2);
+    g_assert_cmphex(tim_read(td, TISR), ==, 0);
+    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+
+    /* Resume the timer and run _almost_ to expiration. */
+    tim_write_tcsr(td, IE | CEN | MODE_ONESHOT | PRESCALE(ps));
+    clock_step(tim_calculate_step(count / 2, ps) - 1);
+    g_assert_cmpuint(tim_read_tdr(td), <, count);
+    g_assert_cmphex(tim_read(td, TISR), ==, 0);
+    g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+
+    /* Now, run the rest of the way and verify that the interrupt fires. */
+    clock_step(1);
+    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
+    g_assert_true(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+}
+
+/* Verifies that the prescaler can be changed while the timer is running. */
+static void test_prescaler_change(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    unsigned int count = 256;
+    unsigned int ps = 5;
+
+    tim_reset(td);
+
+    tim_write_ticr(td, count);
+    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
+
+    /* Run a quarter of the way, and change the prescaler. */
+    clock_step(tim_calculate_step(count / 4, ps));
+    g_assert_cmpuint(tim_read_tdr(td), ==, 3 * count / 4);
+    ps = 2;
+    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
+    /* The counter must not change. */
+    g_assert_cmpuint(tim_read_tdr(td), ==, 3 * count / 4);
+
+    /* Run another quarter of the way, and change the prescaler again. */
+    clock_step(tim_calculate_step(count / 4, ps));
+    g_assert_cmpuint(tim_read_tdr(td), ==, count / 2);
+    ps = 8;
+    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
+    /* The counter must not change. */
+    g_assert_cmpuint(tim_read_tdr(td), ==, count / 2);
+
+    /* Run another quarter of the way, and change the prescaler again. */
+    clock_step(tim_calculate_step(count / 4, ps));
+    g_assert_cmpuint(tim_read_tdr(td), ==, count / 4);
+    ps = 0;
+    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
+    /* The counter must not change. */
+    g_assert_cmpuint(tim_read_tdr(td), ==, count / 4);
+
+    /* Run almost to expiration, and verify the timer didn't fire yet. */
+    clock_step(tim_calculate_step(count / 4, ps) - 1);
+    g_assert_cmpuint(tim_read_tdr(td), <, count);
+    g_assert_cmphex(tim_read(td, TISR), ==, 0);
+
+    /* Now, run the rest of the way and verify that the timer fires. */
+    clock_step(1);
+    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
+}
+
+/* Verifies that a periodic timer automatically restarts after expiration. */
+static void test_periodic_no_interrupt(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    unsigned int count = 2;
+    unsigned int ps = 3;
+    int i;
+
+    tim_reset(td);
+
+    tim_write_ticr(td, count);
+    tim_write_tcsr(td, CEN | MODE_PERIODIC | PRESCALE(ps));
+
+    for (i = 0; i < 4; i++) {
+        clock_step_next();
+
+        g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
+        g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+
+        tim_write(td, TISR, tim_timer_bit(td));
+
+        g_assert_cmphex(tim_read(td, TISR), ==, 0);
+        g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+    }
+}
+
+/* Verifies that a periodic timer fires an interrupt every time it expires. */
+static void test_periodic_interrupt(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    unsigned int count = 65535;
+    unsigned int ps = 2;
+    int i;
+
+    tim_reset(td);
+
+    tim_write_ticr(td, count);
+    tim_write_tcsr(td, CEN | IE | MODE_PERIODIC | PRESCALE(ps));
+
+    for (i = 0; i < 4; i++) {
+        clock_step_next();
+
+        g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
+        g_assert_true(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+
+        tim_write(td, TISR, tim_timer_bit(td));
+
+        g_assert_cmphex(tim_read(td, TISR), ==, 0);
+        g_assert_false(qtest_get_irq(global_qtest, tim_timer_irq(td)));
+    }
+}
+
+/*
+ * Verifies that the timer behaves correctly when disabled right before and
+ * exactly when it's supposed to expire.
+ */
+static void test_disable_on_expiration(gconstpointer test_data)
+{
+    const TestData *td = test_data;
+    unsigned int count = 8;
+    unsigned int ps = 255;
+
+    tim_reset(td);
+
+    tim_write_ticr(td, count);
+    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
+
+    clock_step(tim_calculate_step(count, ps) - 1);
+
+    tim_write_tcsr(td, MODE_ONESHOT | PRESCALE(ps));
+    tim_write_tcsr(td, CEN | MODE_ONESHOT | PRESCALE(ps));
+    clock_step(1);
+    tim_write_tcsr(td, MODE_ONESHOT | PRESCALE(ps));
+    g_assert_cmphex(tim_read(td, TISR), ==, tim_timer_bit(td));
+}
+
+/*
+ * Constructs a name that includes the timer block, timer and testcase name,
+ * and adds the test to the test suite.
+ */
+static void tim_add_test(const char *name, const TestData *td, GTestDataFunc fn)
+{
+    g_autofree char *full_name;
+
+    full_name = g_strdup_printf("npcm7xx_timer/tim[%d]/timer[%d]/%s",
+                                tim_index(td->tim), timer_index(td->timer),
+                                name);
+    qtest_add_data_func(full_name, td, fn);
+}
+
+/* Convenience macro for adding a test with a predictable function name. */
+#define add_test(name, td) tim_add_test(#name, td, test_##name)
+
+int main(int argc, char **argv)
+{
+    TestData testdata[ARRAY_SIZE(timer_block) * ARRAY_SIZE(timer)];
+    int ret;
+    int i, j;
+
+    g_test_init(&argc, &argv, NULL);
+    g_test_set_nonfatal_assertions();
+
+    for (i = 0; i < ARRAY_SIZE(timer_block); i++) {
+        for (j = 0; j < ARRAY_SIZE(timer); j++) {
+            TestData *td = &testdata[i * ARRAY_SIZE(timer) + j];
+            td->tim = &timer_block[i];
+            td->timer = &timer[j];
+
+            add_test(reset, td);
+            add_test(reset_overrides_enable, td);
+            add_test(oneshot_enable_then_disable, td);
+            add_test(oneshot_ps5, td);
+            add_test(oneshot_ps0, td);
+            add_test(oneshot_ps255, td);
+            add_test(oneshot_interrupt, td);
+            add_test(pause_resume, td);
+            add_test(prescaler_change, td);
+            add_test(periodic_no_interrupt, td);
+            add_test(periodic_interrupt, td);
+            add_test(disable_on_expiration, td);
+        }
+    }
+
+    qtest_start("-machine npcm750-evb");
+    qtest_irq_intercept_in(global_qtest, "/machine/soc/a9mpcore/gic");
+    ret = g_test_run();
+    qtest_end();
+
+    return ret;
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -XXX,XX +XXX,XX @@ qtests_arm = \
   ['arm-cpu-features',
    'microbit-test',
    'm25p80-test',
+   'npcm7xx_timer-test',
    'test-arm-mptimer',
    'boot-serial-test',
    'hexloader-test']
-- 
2.20.1

From: Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>

The current documentation is not clear about GETPC() usage.
In particular, using it outside the top-level helper function
causes unexpected behavior.
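
The intended pattern can be sketched in isolation as below. This is
not QEMU code: SKETCH_GETPC() is a stand-in that assumes the GCC/Clang
return-address builtin, and helper_foo_sketch(), inner_access() and
getpc_example() are hypothetical names. The return address is read
once, in the function playing the role of the top-level helper, and
handed down as an argument:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for GETPC(); assumes the GCC/Clang return-address builtin. */
#define SKETCH_GETPC() ((uintptr_t)__builtin_return_address(0))

static uintptr_t recorded_ra;   /* what the inner function received */

/* Inner function: receives the return address as an argument and
 * never evaluates SKETCH_GETPC() itself. */
static void inner_access(uintptr_t ra)
{
    recorded_ra = ra;
}

/* Plays the role of the top-level HELPER(foo): the only place where
 * the return address is read. */
__attribute__((noinline)) static uintptr_t helper_foo_sketch(void)
{
    uintptr_t ra = SKETCH_GETPC();

    inner_access(ra);
    return ra;
}

/* Usage: the inner function sees exactly the value read by the helper. */
static void getpc_example(void)
{
    uintptr_t ra = helper_foo_sketch();

    assert(recorded_ra == ra);
}
```

Had inner_access() evaluated the macro itself, it would have obtained
an address inside helper_foo_sketch() rather than the helper's caller.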

Signed-off-by: Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>
Message-id: 20201015095147.1691-1-e.emanuelegiuseppe@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/devel/loads-stores.rst | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/devel/loads-stores.rst b/docs/devel/loads-stores.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/devel/loads-stores.rst
+++ b/docs/devel/loads-stores.rst
@@ -XXX,XX +XXX,XX @@ guest CPU state in case of a guest CPU exception.  This is passed
 to ``cpu_restore_state()``.  Therefore the value should either be 0,
 to indicate that the guest CPU state is already synchronized, or
 the result of ``GETPC()`` from the top level ``HELPER(foo)``
-function, which is a return address into the generated code.
+function, which is a return address into the generated code [#gpc]_.
+
+.. [#gpc] Note that ``GETPC()`` should be used with great care: calling
+          it in other functions that are *not* the top level
+          ``HELPER(foo)`` will cause unexpected behavior. Instead, the
+          value of ``GETPC()`` should be read from the helper and passed
+          if needed to the functions that the helper calls.
 
 Function names follow the pattern:
 
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Add trace events for GPU and CPU IRQs.

Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201017180731.1165871-2-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/intc/bcm2835_ic.c | 4 +++-
 hw/intc/trace-events | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/intc/bcm2835_ic.c b/hw/intc/bcm2835_ic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/bcm2835_ic.c
+++ b/hw/intc/bcm2835_ic.c
@@ -XXX,XX +XXX,XX @@
 #include "migration/vmstate.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
+#include "trace.h"
 
 #define GPU_IRQS 64
 #define ARM_IRQS 8
@@ -XXX,XX +XXX,XX @@ static void bcm2835_ic_update(BCM2835ICState *s)
     set = (s->gpu_irq_level & s->gpu_irq_enable)
         || (s->arm_irq_level & s->arm_irq_enable);
     qemu_set_irq(s->irq, set);
-
 }
 
 static void bcm2835_ic_set_gpu_irq(void *opaque, int irq, int level)
@@ -XXX,XX +XXX,XX @@ static void bcm2835_ic_set_gpu_irq(void *opaque, int irq, int level)
     BCM2835ICState *s = opaque;
 
     assert(irq >= 0 && irq < 64);
+    trace_bcm2835_ic_set_gpu_irq(irq, level);
     s->gpu_irq_level = deposit64(s->gpu_irq_level, irq, 1, level != 0);
     bcm2835_ic_update(s);
 }
@@ -XXX,XX +XXX,XX @@ static void bcm2835_ic_set_arm_irq(void *opaque, int irq, int level)
     BCM2835ICState *s = opaque;
 
     assert(irq >= 0 && irq < 8);
+    trace_bcm2835_ic_set_cpu_irq(irq, level);
     s->arm_irq_level = deposit32(s->arm_irq_level, irq, 1, level != 0);
     bcm2835_ic_update(s);
 }
diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -XXX,XX +XXX,XX @@ nvic_sysreg_write(uint64_t addr, uint32_t value, unsigned size) "NVIC sysreg wri
 heathrow_write(uint64_t addr, unsigned int n, uint64_t value) "0x%"PRIx64" %u: 0x%"PRIx64
 heathrow_read(uint64_t addr, unsigned int n, uint64_t value) "0x%"PRIx64" %u: 0x%"PRIx64
 heathrow_set_irq(int num, int level) "set_irq: num=0x%02x level=%d"
+
+# bcm2835_ic.c
+bcm2835_ic_set_gpu_irq(int irq, int level) "GPU irq #%d level %d"
+bcm2835_ic_set_cpu_irq(int irq, int level) "CPU irq #%d level %d"
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

The IRQ values are defined a few lines earlier; use them instead of
magic numbers.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201017180731.1165871-3-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/intc/bcm2836_control.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/intc/bcm2836_control.c b/hw/intc/bcm2836_control.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/bcm2836_control.c
+++ b/hw/intc/bcm2836_control.c
@@ -XXX,XX +XXX,XX @@ static void bcm2836_control_set_local_irq(void *opaque, int core, int local_irq,
 
 static void bcm2836_control_set_local_irq0(void *opaque, int core, int level)
 {
-    bcm2836_control_set_local_irq(opaque, core, 0, level);
+    bcm2836_control_set_local_irq(opaque, core, IRQ_CNTPSIRQ, level);
 }
 
 static void bcm2836_control_set_local_irq1(void *opaque, int core, int level)
 {
-    bcm2836_control_set_local_irq(opaque, core, 1, level);
+    bcm2836_control_set_local_irq(opaque, core, IRQ_CNTPNSIRQ, level);
 }
 
 static void bcm2836_control_set_local_irq2(void *opaque, int core, int level)
 {
-    bcm2836_control_set_local_irq(opaque, core, 2, level);
+    bcm2836_control_set_local_irq(opaque, core, IRQ_CNTHPIRQ, level);
 }
 
 static void bcm2836_control_set_local_irq3(void *opaque, int core, int level)
 {
-    bcm2836_control_set_local_irq(opaque, core, 3, level);
+    bcm2836_control_set_local_irq(opaque, core, IRQ_CNTVIRQ, level);
 }
 
 static void bcm2836_control_set_gpu_irq(void *opaque, int irq, int level)
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We already have the full ARMMMUIdx as computed from the
function parameter.

For the purpose of regime_has_2_ranges, we can ignore any
difference between AccType_Normal and AccType_Unpriv, which
would be the only difference between the passed mmu_idx
and arm_mmu_idx_el.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Message-id: 20201008162155.161886-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/mte_helper.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
 
     case 2:
         /* Tag check fail causes asynchronous flag set.  */
-        mmu_idx = arm_mmu_idx_el(env, el);
-        if (regime_has_2_ranges(mmu_idx)) {
+        if (regime_has_2_ranges(arm_mmu_idx)) {
             select = extract64(dirty_ptr, 55, 1);
         } else {
             select = 0;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The reporting in AArch64.TagCheckFail only depends on PSTATE.EL,
and not the AccType of the operation.  There are two guest
visible problems that affect LDTR and STTR because of this:

(1) Selecting TCF0 vs TCF1 to decide on reporting,
(2) Reporting "data abort same el" rather than "data abort lower el".
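
As a minimal sketch of point (1), assuming only the bit positions visible
in the diff below (TCF0 at bits [39:38], TCF at bits [41:40]), the
selection can be written as a function of the current EL alone.  The
helper name select_tcf() is hypothetical and extract64() here is a local
stand-in, not QEMU's own definition:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for a bitfield extract: 'length' bits starting at 'start'. */
static uint64_t extract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

/*
 * Hypothetical helper: pick the tag-check-fault mode from the current EL
 * only.  EL0 reads TCF0 (bits [39:38]); any other EL reads TCF
 * (bits [41:40]).  The access type of the operation plays no part.
 */
static unsigned select_tcf(uint64_t sctlr, int current_el)
{
    return (unsigned)extract64(sctlr, current_el == 0 ? 38 : 40, 2);
}
```

With this shape, an unprivileged LDTR executed at EL1 passes current_el
== 1 and so reads TCF, not TCF0.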

Reported-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Message-id: 20201008162155.161886-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/mte_helper.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -XXX,XX +XXX,XX @@ static void mte_check_fail(CPUARMState *env, uint32_t desc,
     reg_el = regime_el(env, arm_mmu_idx);
     sctlr = env->cp15.sctlr_el[reg_el];
 
-    switch (arm_mmu_idx) {
-    case ARMMMUIdx_E10_0:
-    case ARMMMUIdx_E20_0:
-        el = 0;
+    el = arm_current_el(env);
+    if (el == 0) {
         tcf = extract64(sctlr, 38, 2);
-        break;
-    default:
-        el = reg_el;
+    } else {
         tcf = extract64(sctlr, 40, 2);
     }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Unlike many other bits in HCR_EL2, the description for this
bit does not contain the phrase "if ... this field behaves
as 0 for all purposes other than", so do not squash the bit
in arm_hcr_el2_eff.

Instead, replicate the E2H+TGE test in the two places that
require it.
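
The replicated test can be sketched as a standalone predicate.  This is
an illustration only: el2_blocks_tag_access() is a hypothetical name, and
the HCR_* values are local definitions assumed to match the architectural
bit positions (TGE bit 27, E2H bit 34, ATA bit 56):

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the HCR_EL2 bit masks (assumed positions). */
#define HCR_TGE (1ULL << 27)
#define HCR_E2H (1ULL << 34)
#define HCR_ATA (1ULL << 56)

/*
 * Hypothetical predicate: does HCR_EL2 deny allocation tag access to
 * a lower EL?  ATA == 0 only takes effect when {E2H,TGE} != {1,1}.
 */
static bool el2_blocks_tag_access(uint64_t hcr)
{
    bool e2h_tge = (hcr & HCR_E2H) && (hcr & HCR_TGE);

    return !(hcr & HCR_ATA) && !e2h_tge;
}
```

The expression is the same as the one in the hunks below, with the
E2H/TGE condition rewritten via De Morgan for readability.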

Reported-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Message-id: 20201008162155.161886-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h | 9 +++++----
 target/arm/helper.c    | 9 +++++----
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline bool allocation_tag_access_enabled(CPUARMState *env, int el,
         && !(env->cp15.scr_el3 & SCR_ATA)) {
         return false;
     }
-    if (el < 2
-        && arm_feature(env, ARM_FEATURE_EL2)
-        && !(arm_hcr_el2_eff(env) & HCR_ATA)) {
-        return false;
+    if (el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
+        uint64_t hcr = arm_hcr_el2_eff(env);
+        if (!(hcr & HCR_ATA) && (!(hcr & HCR_E2H) || !(hcr & HCR_TGE))) {
+            return false;
+        }
     }
     sctlr &= (el == 0 ? SCTLR_ATA0 : SCTLR_ATA);
     return sctlr != 0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_mte(CPUARMState *env, const ARMCPRegInfo *ri,
 {
     int el = arm_current_el(env);
 
-    if (el < 2 &&
-        arm_feature(env, ARM_FEATURE_EL2) &&
-        !(arm_hcr_el2_eff(env) & HCR_ATA)) {
-        return CP_ACCESS_TRAP_EL2;
+    if (el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
+        uint64_t hcr = arm_hcr_el2_eff(env);
+        if (!(hcr & HCR_ATA) && (!(hcr & HCR_E2H) || !(hcr & HCR_TGE))) {
+            return CP_ACCESS_TRAP_EL2;
+        }
     }
     if (el < 3 &&
         arm_feature(env, ARM_FEATURE_EL3) &&
-- 
2.20.1

From: Peng Liang <liangpeng10@huawei.com>

VMStateDescription.fields should end with VMSTATE_END_OF_LIST().
However, microbit_i2c_vmstate does not; add the missing terminator.
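
To illustrate why the terminator matters, here is a generic sketch of a
sentinel-terminated field table.  It does not use QEMU's real
VMStateField layout; Field, FIELD_END_OF_LIST() and count_fields() are
hypothetical names.  A walker like this has no length to consult, so
without the sentinel it reads past the end of the array:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified field descriptor. */
typedef struct Field {
    const char *name;   /* NULL marks the end of the list */
    size_t size;
} Field;

#define FIELD_END_OF_LIST() { NULL, 0 }

static const Field example_fields[] = {
    { "regs", 4 },
    { "read_idx", 4 },
    FIELD_END_OF_LIST()   /* omit this and the walk below never stops safely */
};

/* Count entries by walking until the sentinel is found. */
static size_t count_fields(const Field *fields)
{
    size_t n = 0;

    assert(fields != NULL);
    while (fields[n].name != NULL) {
        n++;
    }
    return n;
}
```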

Fixes: 9d68bf564e ("arm: Stub out NRF51 TWI magnetometer/accelerometer detection")
Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201019093401.2993833-1-liangpeng10@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/i2c/microbit_i2c.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/i2c/microbit_i2c.c b/hw/i2c/microbit_i2c.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/i2c/microbit_i2c.c
+++ b/hw/i2c/microbit_i2c.c
@@ -XXX,XX +XXX,XX @@ static const VMStateDescription microbit_i2c_vmstate = {
     .fields = (VMStateField[]) {
         VMSTATE_UINT32_ARRAY(regs, MicrobitI2CState, MICROBIT_I2C_NREGS),
         VMSTATE_UINT32(read_idx, MicrobitI2CState),
+        VMSTATE_END_OF_LIST()
     },
 };
 
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Commit 7998beb9c2e removed the ram_size initialization in the
arm_boot_info structure, however it is used by arm_load_kernel().

Initialize the field to fix:

$ qemu-system-arm -M n800 -append 'console=ttyS1' \
    -kernel meego-arm-n8x0-1.0.80.20100712.1431-vmlinuz-2.6.35~rc4-129.1-n8x0
  qemu-system-arm: kernel 'meego-arm-n8x0-1.0.80.20100712.1431-vmlinuz-2.6.35~rc4-129.1-n8x0' is too large to fit in RAM (kernel size 1964608, RAM size 0)

Noticed while running the test introduced in commit 050a82f0c5b
("tests/acceptance: Add a test for the N800 and N810 arm machines").

Fixes: 7998beb9c2e ("arm/nseries: use memdev for RAM")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-id: 20201019095148.1602119-1-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/nseries.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/nseries.c
+++ b/hw/arm/nseries.c
@@ -XXX,XX +XXX,XX @@ static void n8x0_init(MachineState *machine,
         g_free(sz);
         exit(EXIT_FAILURE);
     }
+    binfo->ram_size = machine->ram_size;
 
     memory_region_add_subregion(get_system_memory(), OMAP2_Q2_BASE,
                                 machine->ram);
-- 
2.20.1

For nested groups like:

  {
    [
      pattern 1
      pattern 2
    ]
    pattern 3
  }

the intended behaviour is that patterns 1 and 2 must not
overlap with each other; if the insn matches neither then
we fall through to pattern 3 as the next thing in the
outer overlapping group.

Currently we generate incorrect code for this situation,
because in the code path for a failed match inside the
inner non-overlapping group we generate a "return" statement,
which causes decode to stop entirely rather than continuing
to the next thing in the outer group.

Generate a "break" instead, so that decode flow behaves
as required for this nested group case.
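
A much-simplified sketch of the generated control flow, not actual
decodetree output: the bit patterns and the function name decode() are
invented for illustration.  The inner switch stands for the
non-overlapping group and the code after it for the next member of the
outer group:

```c
#include <stdint.h>

/*
 * Returns the number of the pattern that matched, or 0 for no match.
 * Invented encodings: patterns 1 and 2 are told apart by the low nibble;
 * pattern 3 only requires bit 7 to be set.
 */
static int decode(uint32_t insn)
{
    /* Inner non-overlapping group [ pattern 1, pattern 2 ] */
    switch (insn & 0x0f) {
    case 0x1:
        return 1;
    case 0x2:
        return 2;
    default:
        /* A "return 0" here would stop decode; "break" moves on. */
        break;
    }

    /* Next member of the outer overlapping group: pattern 3 */
    if (insn & 0x80) {
        return 3;
    }
    return 0;
}
```

With "return" in the default arm, an insn such as 0x83 would be rejected
even though pattern 3 accepts it.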

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201019151301.2046-2-peter.maydell@linaro.org
---
 scripts/decodetree.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index XXXXXXX..XXXXXXX 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -XXX,XX +XXX,XX @@ class Tree:
             output(ind, '    /* ',
                    str_match_bits(innerbits, innermask), ' */\n')
             s.output_code(i + 4, extracted, innerbits, innermask)
-            output(ind, '    return false;\n')
+            output(ind, '    break;\n')
         output(ind, '}\n')
 # end Tree
 
-- 
2.20.1

From v8.1M, disabled-coprocessor handling changes slightly:
 * coprocessors 8, 9, 14 and 15 are also governed by the
   cp10 enable bit, like cp11
 * an extra range of instruction patterns is considered
   to be inside the coprocessor space

We previously marked these up with TODO comments; implement the
correct behaviour.

Unfortunately there is no ID register field which indicates this
behaviour.  We could in theory test an unrelated ID register which
indicates guaranteed-to-be-in-v8.1M behaviour like ID_ISAR0.CmpBranch
>= 3 (low-overhead-loops), but it seems better to simply define a new
ARM_FEATURE_V8_1M feature flag and use it for this and other
new-in-v8.1M behaviour that isn't identifiable from the ID registers.
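
The coprocessor-number remapping described in the first bullet can be
sketched as below.  nocp_effective_cp() is a hypothetical helper written
for this note; in the patch the same logic lives inline in trans_NOCP()
and tests ARM_FEATURE_V8_1M instead of taking a bool:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: map an encoded coprocessor number to the one
 * whose enable bit governs it.
 */
static int nocp_effective_cp(int cp, bool v8_1m)
{
    assert(cp >= 0 && cp <= 15);

    if (cp == 11) {
        /* cp11 has always shared the cp10 enable */
        return 10;
    }
    if (v8_1m && (cp == 8 || cp == 9 || cp == 14 || cp == 15)) {
        /* from v8.1M these are also governed by the cp10 enable */
        return 10;
    }
    return cp;
}
```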

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201019151301.2046-3-peter.maydell@linaro.org
---
 target/arm/cpu.h               |  1 +
 target/arm/m-nocp.decode       | 10 ++++++----
 target/arm/translate-vfp.c.inc | 17 +++++++++++++++--
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ enum arm_features {
     ARM_FEATURE_VBAR, /* has cp15 VBAR */
     ARM_FEATURE_M_SECURITY, /* M profile Security Extension */
     ARM_FEATURE_M_MAIN, /* M profile Main Extension */
+    ARM_FEATURE_V8_1M, /* M profile extras only in v8.1M and later */
 };
 
 static inline int arm_feature(CPUARMState *env, int feature)
diff --git a/target/arm/m-nocp.decode b/target/arm/m-nocp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m-nocp.decode
+++ b/target/arm/m-nocp.decode
@@ -XXX,XX +XXX,XX @@
 # If the coprocessor is not present or disabled then we will generate
 # the NOCP exception; otherwise we let the insn through to the main decode.
 
+&nocp cp
+
 {
   # Special cases which do not take an early NOCP: VLLDM and VLSTM
   VLLDM_VLSTM  1110 1100 001 l:1 rn:4 0000 1010 0000 0000
   # TODO: VSCCLRM (new in v8.1M) is similar:
   #VSCCLRM      1110 1100 1-01 1111 ---- 1011 ---- ---0
 
-  NOCP         111- 1110 ---- ---- ---- cp:4 ---- ----
-  NOCP         111- 110- ---- ---- ---- cp:4 ---- ----
-  # TODO: From v8.1M onwards we will also want this range to NOCP
-  #NOCP_8_1     111- 1111 ---- ---- ---- ---- ---- ---- cp=10
+  NOCP         111- 1110 ---- ---- ---- cp:4 ---- ---- &nocp
+  NOCP         111- 110- ---- ---- ---- cp:4 ---- ---- &nocp
+  # From v8.1M onwards this range will also NOCP:
+  NOCP_8_1     111- 1111 ---- ---- ---- ---- ---- ---- &nocp cp=10
 }
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
     return true;
 }
 
-static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
+static bool trans_NOCP(DisasContext *s, arg_nocp *a)
 {
     /*
      * Handle M-profile early check for disabled coprocessor:
@@ -XXX,XX +XXX,XX @@ static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
     if (a->cp == 11) {
         a->cp = 10;
     }
-    /* TODO: in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
+    if (arm_dc_feature(s, ARM_FEATURE_V8_1M) &&
+        (a->cp == 8 || a->cp == 9 || a->cp == 14 || a->cp == 15)) {
+        /* in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
+        a->cp = 10;
+    }
 
     if (a->cp != 10) {
         gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
@@ -XXX,XX +XXX,XX @@ static bool trans_NOCP(DisasContext *s, arg_NOCP *a)
     return false;
 }
 
+static bool trans_NOCP_8_1(DisasContext *s, arg_nocp *a)
+{
+    /* This range needs a coprocessor check for v8.1M and later only */
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        return false;
+    }
+    return trans_NOCP(s, a);
+}
+
 static bool trans_VINS(DisasContext *s, arg_VINS *a)
 {
     TCGv_i32 rd, rm;
-- 
2.20.1

v8.1M brings four new insns to M-profile:
 * CSEL  : Rd = cond ? Rn : Rm
 * CSINC : Rd = cond ? Rn : Rm+1
 * CSINV : Rd = cond ? Rn : ~Rm
 * CSNEG : Rd = cond ? Rn : -Rm

Implement these.
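
For reference, the semantics in the table above can be modelled in plain
C.  This is a sketch, not the TCG implementation: csel_family() is a
hypothetical name, and the op numbering (0..3 for CSEL, CSINC, CSINV,
CSNEG) follows the switch in trans_CSEL() below:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of CSEL/CSINC/CSINV/CSNEG on 32-bit register values. */
static uint32_t csel_family(int op, bool cond, uint32_t rn, uint32_t rm)
{
    switch (op) {
    case 0: /* CSEL */
        break;
    case 1: /* CSINC */
        rm += 1;
        break;
    case 2: /* CSINV */
        rm = ~rm;
        break;
    case 3: /* CSNEG */
        rm = 0u - rm;
        break;
    default:
        assert(0 && "op is a 2-bit field");
    }
    return cond ? rn : rm;
}
```

As in the translator, the transform is applied to Rm unconditionally and
the condition only selects which value is written back.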

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201019151301.2046-4-peter.maydell@linaro.org
---
 target/arm/t32.decode  |  3 +++
 target/arm/translate.c | 60 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@ SBC_rrri         1110101 1011 . .... 0 ... .... .... ....     @s_rrr_shi
 }
 RSB_rrri         1110101 1110 . .... 0 ... .... .... ....     @s_rrr_shi
 
+# v8.1M CSEL and friends
+CSEL             1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
+
 # Data-processing (register-shifted register)
 
 MOV_rxrr         1111 1010 0 shty:2 s:1 rm:4 1111 rd:4 0000 rs:4 \
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_IT(DisasContext *s, arg_IT *a)
     return true;
 }
 
+/* v8.1M CSEL/CSINC/CSNEG/CSINV */
+static bool trans_CSEL(DisasContext *s, arg_CSEL *a)
+{
+    TCGv_i32 rn, rm, zero;
+    DisasCompare c;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        return false;
+    }
+
+    if (a->rm == 13) {
+        /* SEE "Related encodings" (MVE shifts) */
+        return false;
+    }
+
+    if (a->rd == 13 || a->rd == 15 || a->rn == 13 || a->fcond >= 14) {
+        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        return false;
+    }
+
+    /* In this insn input reg fields of 0b1111 mean "zero", not "PC" */
+    if (a->rn == 15) {
+        rn = tcg_const_i32(0);
+    } else {
+        rn = load_reg(s, a->rn);
+    }
+    if (a->rm == 15) {
+        rm = tcg_const_i32(0);
+    } else {
+        rm = load_reg(s, a->rm);
+    }
+
+    switch (a->op) {
+    case 0: /* CSEL */
+        break;
+    case 1: /* CSINC */
+        tcg_gen_addi_i32(rm, rm, 1);
+        break;
+    case 2: /* CSINV */
+        tcg_gen_not_i32(rm, rm);
+        break;
+    case 3: /* CSNEG */
+        tcg_gen_neg_i32(rm, rm);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    arm_test_cc(&c, a->fcond);
+    zero = tcg_const_i32(0);
+    tcg_gen_movcond_i32(c.cond, rn, c.value, zero, rn, rm);
+    arm_free_cc(&c);
+    tcg_temp_free_i32(zero);
+
+    store_reg(s, a->rd, rn);
+    tcg_temp_free_i32(rm);
+
+    return true;
+}
+
 /*
  * Legacy decoder.
  */
-- 
2.20.1

The t32 decode has a group which represents a set of insns
which overlap with B_cond_thumb because they have [25:23]=111
(which is an invalid condition code field for the branch insn).
This group is currently defined using the {} overlap-OK syntax,
but it is almost entirely non-overlapping patterns. Switch
it over to use a non-overlapping group.

For this to be valid syntactically, CPS must move into the same
overlapping-group as the hint insns (CPS vs hints was the
only actual use of the overlap facility for the group).

The non-overlapping subgroup for CLREX/DSB/DMB/ISB/SB is no longer
necessary and so we can remove it (promoting those insns to
be members of the parent group).

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201019151301.2046-5-peter.maydell@linaro.org
---
 target/arm/t32.decode | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
 {
   # Group insn[25:23] = 111, which is cond=111x for the branch below,
   # or unconditional, which would be illegal for the branch.
-  {
-    # Hints
+  [
+    # Hints, and CPS
     {
       YIELD      1111 0011 1010 1111 1000 0000 0000 0001
       WFE        1111 0011 1010 1111 1000 0000 0000 0010
@@ -XXX,XX +XXX,XX @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
       # The canonical nop ends in 0000 0000, but the whole rest
       # of the space is "reserved hint, behaves as nop".
       NOP        1111 0011 1010 1111 1000 0000 ---- ----
+
+      # If imod == '00' && M == '0' then SEE "Hint instructions", above.
+      CPS        1111 0011 1010 1111 1000 0 imod:2 M:1 A:1 I:1 F:1 mode:5 \
+                 &cps
     }
 
-    # If imod == '00' && M == '0' then SEE "Hint instructions", above.
-    CPS          1111 0011 1010 1111 1000 0 imod:2 M:1 A:1 I:1 F:1 mode:5 \
-                 &cps
-
     # Miscellaneous control
-    [
-      CLREX      1111 0011 1011 1111 1000 1111 0010 1111
-      DSB        1111 0011 1011 1111 1000 1111 0100 ----
-      DMB        1111 0011 1011 1111 1000 1111 0101 ----
-      ISB        1111 0011 1011 1111 1000 1111 0110 ----
-      SB         1111 0011 1011 1111 1000 1111 0111 0000
-    ]
+    CLREX        1111 0011 1011 1111 1000 1111 0010 1111
+    DSB          1111 0011 1011 1111 1000 1111 0100 ----
+    DMB          1111 0011 1011 1111 1000 1111 0101 ----
+    ISB          1111 0011 1011 1111 1000 1111 0110 ----
+    SB           1111 0011 1011 1111 1000 1111 0111 0000
 
     # Note that the v7m insn overlaps both the normal and banked insn.
     {
@@ -XXX,XX +XXX,XX @@ CLZ              1111 1010 1011 ---- 1111 .... 1000 ....      @rdm
     HVC          1111 0111 1110 ....  1000 .... .... ....     \
                  &i imm=%imm16_16_0
     UDF          1111 0111 1111 ----  1010 ---- ---- ----
-  }
+  ]
   B_cond_thumb   1111 0. cond:4 ...... 10.0 ............      &ci imm=%imm21
 }
 
-- 
2.20.1

The BLX immediate insn in the Thumb encoding always performs
a switch from Thumb to Arm state. This would be totally useless
in M-profile which has no Arm decoder, and so the instruction
does not exist at all there. Make the encoding UNDEF for M-profile.

(This part of the encoding space is used for the branch-future
and low-overhead-loop insns in v8.1M.)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201019151301.2046-6-peter.maydell@linaro.org
---
 target/arm/translate.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BLX_i(DisasContext *s, arg_BLX_i *a)
 {
     TCGv_i32 tmp;
 
+    /*
+     * BLX <imm> would be useless on M-profile; the encoding space
+     * is used for other insns from v8.1M onward, and UNDEFs before that.
+     */
+    if (arm_dc_feature(s, ARM_FEATURE_M)) {
+        return false;
+    }
+
     /* For A32, ARM_FEATURE_V5 is checked near the start of the uncond block. */
     if (s->thumb && (a->imm & 2)) {
         return false;
-- 
2.20.1

v8.1M implements a new 'branch future' feature, which is a
set of instructions that request the CPU to perform a branch
"in the future", when it reaches a particular execution address.
In hardware, the expected implementation is that the information
about the branch location and destination is cached and then
acted upon when execution reaches the specified address.
However the architecture permits an implementation to discard
this cached information at any point, and so guest code must
always include a normal branch insn at the branch point as
a fallback. In particular, an implementation is specifically
permitted to treat all BF insns as NOPs (which is equivalent
to discarding the cached information immediately).

For QEMU, implementing this caching of branch information
would be complicated and would not improve the speed of
execution at all, so we make the IMPDEF choice to implement
all BF insns as NOPs.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201019151301.2046-7-peter.maydell@linaro.org
---
 target/arm/cpu.h       |  6 ++++++
 target/arm/t32.decode  | 13 ++++++++++++-
 target/arm/translate.c | 20 ++++++++++++++++++++
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_arm_div(const ARMISARegisters *id)
     return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) > 1;
 }
 
+static inline bool isar_feature_aa32_lob(const ARMISARegisters *id)
+{
+    /* (M-profile) low-overhead loops and branch future */
+    return FIELD_EX32(id->id_isar0, ID_ISAR0, CMPBRANCH) >= 3;
+}
+
 static inline bool isar_feature_aa32_jazelle(const ARMISARegisters *id)
 {
     return FIELD_EX32(id->id_isar1, ID_ISAR1, JAZELLE) != 0;
diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@ MRC              1110 1110 ... 1 .... .... .... ... 1 .... @mcr
 
 B                1111 0. .......... 10.1 ............         @branch24
 BL               1111 0. .......... 11.1 ............         @branch24
-BLX_i            1111 0. .......... 11.0 ............         @branch24
+{
+  # BLX_i is non-M-profile only
+  BLX_i          1111 0. .......... 11.0 ............         @branch24
+  # M-profile only: loop and branch insns
+  [
+    # All these BF insns have boff != 0b0000; we NOP them all
+    BF           1111 0 boff:4  ------- 1100 - ---------- 1    # BFL
+    BF           1111 0 boff:4 0 ------ 1110 - ---------- 1    # BFCSEL
+    BF           1111 0 boff:4 10 ----- 1110 - ---------- 1    # BF
+    BF           1111 0 boff:4 11 ----- 1110 0 0000000000 1    # BFX, BFLX
+  ]
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_BLX_suffix(DisasContext *s, arg_BLX_suffix *a)
     return true;
 }
 
+static bool trans_BF(DisasContext *s, arg_BF *a)
+{
+    /*
+     * M-profile branch future insns. The architecture permits an
+     * implementation to implement these as NOPs (equivalent to
+     * discarding the LO_BRANCH_INFO cache immediately), and we
+     * take that IMPDEF option because for QEMU a "real" implementation
+     * would be complicated and wouldn't execute any faster.
+     */
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+    if (a->boff == 0) {
+        /* SEE "Related encodings" (loop insns) */
+        return false;
+    }
+    /* Handle as NOP */
+    return true;
+}
+
 static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
 {
     TCGv_i32 addr, tmp;
-- 
2.20.1

v8.1M's "low-overhead-loop" extension has three instructions
for looping:
 * DLS (start of a do-loop)
 * WLS (start of a while-loop)
 * LE (end of a loop)

The loop-start instructions are both simple operations to start a
loop whose iteration count (if any) is in LR.  The loop-end
instruction handles "decrement iteration count and jump back to loop
start"; it also caches the information about the branch back to the
start of the loop to improve performance of the branch on subsequent
iterations.

As with the branch-future instructions, the architecture permits an
implementation to discard the LO_BRANCH_INFO cache at any time, and
QEMU takes the IMPDEF option to never set it in the first place
(equivalent to discarding it immediately), because for us a "real"
implementation would be unnecessary complexity.

(This implementation only provides the simple looping constructs; the
vector extension MVE (Helium) adds some extra variants to handle
looping across vectors.  We'll add those later when we implement
MVE.)
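
A small C model of the control flow may help when reading the translator
code below.  It is a sketch under the assumptions of this patch (no tail
predication, LO_BRANCH_INFO never cached); run_lob_loop() is a
hypothetical function that counts how many times the loop body would run:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model a loop opened by DLS (is_while == false) or WLS (is_while ==
 * true) and closed by a non-forever LE.  Returns the number of times
 * the body executes for an initial count taken from Rn.
 */
static uint32_t run_lob_loop(uint32_t count, bool is_while)
{
    uint32_t lr;
    uint32_t iterations = 0;

    if (is_while && count == 0) {
        /* WLS: zero iterations requested, branch past the loop */
        return 0;
    }
    lr = count;                 /* DLS/WLS: LR = Rn */

    for (;;) {
        iterations++;           /* loop body */
        if (lr <= 1) {
            break;              /* LE: last iteration, fall through */
        }
        lr--;                   /* LE: decrement LR and jump back */
    }
    return iterations;
}
```

Note that in this model a DLS loop with a count of zero still runs its
body once, because only WLS tests the count before entering.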

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201019151301.2046-8-peter.maydell@linaro.org
---
 target/arm/t32.decode  |  8 ++++
 target/arm/translate.c | 93 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@ BL               1111 0. .......... 11.1 ............         @branch24
     BF           1111 0 boff:4 10 ----- 1110 - ---------- 1    # BF
     BF           1111 0 boff:4 11 ----- 1110 0 0000000000 1    # BFX, BFLX
   ]
+  [
+    # LE and WLS immediate
+    %lob_imm 1:10 11:1 !function=times_2
+
+    DLS          1111 0 0000 100     rn:4 1110 0000 0000 0001
+    WLS          1111 0 0000 100     rn:4 1100 . .......... 1 imm=%lob_imm
+    LE           1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
+  ]
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
     s->base.is_jmp = DISAS_NORETURN;
 }
 
-static inline void gen_jmp (DisasContext *s, uint32_t dest)
+/* Jump, specifying which TB number to use if we gen_goto_tb() */
+static inline void gen_jmp_tb(DisasContext *s, uint32_t dest, int tbno)
 {
     if (unlikely(is_singlestepping(s))) {
         /* An indirect jump so that we still trigger the debug exception.  */
         gen_set_pc_im(s, dest);
         s->base.is_jmp = DISAS_JUMP;
     } else {
-        gen_goto_tb(s, 0, dest);
+        gen_goto_tb(s, tbno, dest);
     }
 }
 
+static inline void gen_jmp(DisasContext *s, uint32_t dest)
+{
+    gen_jmp_tb(s, dest, 0);
+}
+
 static inline void gen_mulxy(TCGv_i32 t0, TCGv_i32 t1, int x, int y)
 {
     if (x)
@@ -XXX,XX +XXX,XX @@ static bool trans_BF(DisasContext *s, arg_BF *a)
     return true;
 }
 
+static bool trans_DLS(DisasContext *s, arg_DLS *a)
+{
+    /* M-profile low-overhead loop start */
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+    if (a->rn == 13 || a->rn == 15) {
+        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        return false;
+    }
+
+    /* Not a while loop, no tail predication: just set LR to the count */
+    tmp = load_reg(s, a->rn);
+    store_reg(s, 14, tmp);
+    return true;
+}
+
+static bool trans_WLS(DisasContext *s, arg_WLS *a)
+{
+    /* M-profile low-overhead while-loop start */
+    TCGv_i32 tmp;
+    TCGLabel *nextlabel;
+
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+    if (a->rn == 13 || a->rn == 15) {
+        /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+        return false;
+    }
+    if (s->condexec_mask) {
+        /*
+         * WLS in an IT block is CONSTRAINED UNPREDICTABLE;
+         * we choose to UNDEF, because otherwise our use of
+         * gen_goto_tb(1) would clash with the use of TB exit 1
+         * in the dc->condjmp condition-failed codepath in
+         * arm_tr_tb_stop() and we'd get an assertion.
+         */
+        return false;
+    }
+    nextlabel = gen_new_label();
+    tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_R[a->rn], 0, nextlabel);
+    tmp = load_reg(s, a->rn);
+    store_reg(s, 14, tmp);
+    gen_jmp_tb(s, s->base.pc_next, 1);
+
+    gen_set_label(nextlabel);
+    gen_jmp(s, read_pc(s) + a->imm);
+    return true;
+}
+
+static bool trans_LE(DisasContext *s, arg_LE *a)
+{
+    /*
+     * M-profile low-overhead loop end. The architecture permits an
+     * implementation to discard the LO_BRANCH_INFO cache at any time,
+     * and we take the IMPDEF option to never set it in the first place
+     * (equivalent to always discarding it immediately), because for QEMU
+     * a "real" implementation would be complicated and wouldn't execute
+     * any faster.
+     */
+    TCGv_i32 tmp;
+
+    if (!dc_isar_feature(aa32_lob, s)) {
+        return false;
+    }
+
+    if (!a->f) {
+        /* Not loop-forever. If LR <= 1 this is the last loop: do nothing. */
+        arm_gen_condlabel(s);
+        tcg_gen_brcondi_i32(TCG_COND_LEU, cpu_R[14], 1, s->condlabel);
+        /* Decrement LR */
+        tmp = load_reg(s, 14);
+        tcg_gen_addi_i32(tmp, tmp, -1);
+        store_reg(s, 14, tmp);
+    }
+    /* Jump back to the loop start */
+    gen_jmp(s, read_pc(s) - a->imm);
+    return true;
+}
+
 static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
 {
     TCGv_i32 addr, tmp;
-- 
2.20.1

In arm_cpu_realizefn(), if the CPU has VFP or Neon disabled then we
squash the ID register fields so that we don't advertise it to the
guest.  This code was written for A-profile and needs some tweaks to
work correctly on M-profile:

* A-profile only fields should not be zeroed on M-profile:
   - MVFR0.FPSHVEC,FPTRAP
   - MVFR1.SIMDLS,SIMDINT,SIMDSP,SIMDHP
   - MVFR2.SIMDMISC
 * M-profile only fields should be zeroed on M-profile:
   - MVFR1.FP16

In particular, because MVFR1.SIMDHP on A-profile is the same field as
MVFR1.FP16 on M-profile this code was incorrectly disabling FP16
support on an M-profile CPU (where has_neon is always false).  This
isn't a visible bug yet because we don't have any M-profile CPUs with
FP16 support, but the change is necessary before we introduce any.
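
The field aliasing can be illustrated with a short sketch.  It assumes
the shared field occupies MVFR1 bits [23:20]; that position, the
deposit32() stand-in and the helper name squash_neon_hp_field() are
assumptions made for this example and do not appear in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of MVFR1.SIMDHP (A-profile) / MVFR1.FP16 (M-profile). */
#define MVFR1_HP_SHIFT 20
#define MVFR1_HP_LEN   4

/* Local stand-in for a bitfield deposit. */
static uint32_t deposit32(uint32_t value, int start, int length,
                          uint32_t field)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/*
 * Hypothetical helper for the "Neon disabled" case: only A-profile
 * treats these bits as SIMDHP, so only A-profile may zero them.
 */
static uint32_t squash_neon_hp_field(uint32_t mvfr1, bool is_m_profile)
{
    if (is_m_profile) {
        return mvfr1;   /* bits mean FP16 here; leave them alone */
    }
    return deposit32(mvfr1, MVFR1_HP_SHIFT, MVFR1_HP_LEN, 0);
}
```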

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201019151301.2046-9-peter.maydell@linaro.org
---
 target/arm/cpu.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         u = cpu->isar.mvfr0;
         u = FIELD_DP32(u, MVFR0, FPSP, 0);
         u = FIELD_DP32(u, MVFR0, FPDP, 0);
-        u = FIELD_DP32(u, MVFR0, FPTRAP, 0);
         u = FIELD_DP32(u, MVFR0, FPDIVIDE, 0);
         u = FIELD_DP32(u, MVFR0, FPSQRT, 0);
-        u = FIELD_DP32(u, MVFR0, FPSHVEC, 0);
         u = FIELD_DP32(u, MVFR0, FPROUND, 0);
+        if (!arm_feature(env, ARM_FEATURE_M)) {
+            u = FIELD_DP32(u, MVFR0, FPTRAP, 0);
+            u = FIELD_DP32(u, MVFR0, FPSHVEC, 0);
+        }
         cpu->isar.mvfr0 = u;
 
         u = cpu->isar.mvfr1;
         u = FIELD_DP32(u, MVFR1, FPFTZ, 0);
         u = FIELD_DP32(u, MVFR1, FPDNAN, 0);
         u = FIELD_DP32(u, MVFR1, FPHP, 0);
+        if (arm_feature(env, ARM_FEATURE_M)) {
+            u = FIELD_DP32(u, MVFR1, FP16, 0);
+        }
         cpu->isar.mvfr1 = u;
 
         u = cpu->isar.mvfr2;
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         u = FIELD_DP32(u, ID_ISAR6, FHM, 0);
         cpu->isar.id_isar6 = u;
 
-        u = cpu->isar.mvfr1;
-        u = FIELD_DP32(u, MVFR1, SIMDLS, 0);
-        u = FIELD_DP32(u, MVFR1, SIMDINT, 0);
-        u = FIELD_DP32(u, MVFR1, SIMDSP, 0);
-        u = FIELD_DP32(u, MVFR1, SIMDHP, 0);
-        cpu->isar.mvfr1 = u;
+        if (!arm_feature(env, ARM_FEATURE_M)) {
+            u = cpu->isar.mvfr1;
+            u = FIELD_DP32(u, MVFR1, SIMDLS, 0);
+            u = FIELD_DP32(u, MVFR1, SIMDINT, 0);
+            u = FIELD_DP32(u, MVFR1, SIMDSP, 0);
+            u = FIELD_DP32(u, MVFR1, SIMDHP, 0);
+            cpu->isar.mvfr1 = u;
 
-        u = cpu->isar.mvfr2;
-        u = FIELD_DP32(u, MVFR2, SIMDMISC, 0);
-        cpu->isar.mvfr2 = u;
+            u = cpu->isar.mvfr2;
+            u = FIELD_DP32(u, MVFR2, SIMDMISC, 0);
+            cpu->isar.mvfr2 = u;
+        }
     }
 
     if (!cpu->has_neon && !cpu->has_vfp) {
-- 
2.20.1

M-profile CPUs with half-precision floating point support should
be able to write to FPSCR.FZ16, but an M-profile specific masking
of the value at the top of vfp_set_fpscr() currently prevents that.
This is not yet an active bug because we have no M-profile
FP16 CPUs, but needs to be fixed before we can add any.

The bits that the masking is effectively preventing from being
set are the A-profile only short-vector Len and Stride fields,
plus the Neon QC bit. Rearrange the order of the function so
that those fields are handled earlier and only under a suitable
guard; this allows us to drop the M-profile specific masking,
making FZ16 writeable.

This change also makes the QC bit correctly RAZ/WI for older
no-Neon A-profile cores.

This refactoring also paves the way for the low-overhead-branch
LTPSIZE field, which uses some of the bits that are used for
A-profile Stride and Len.
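
As a rough illustration of the reordered logic, the standalone sketch
below splits an FPSCR write into the separately stored fields and the
bits kept in the register copy. It is not the QEMU helper: the names
split_fpscr_sketch, extract32_sketch and struct fpscr_fields are
hypothetical, the two bool parameters stand in for the arm_feature()
checks, and the QC position (bit 27) is assumed from the architectural
FPSCR layout. The earlier "no FP16 support" masking of FZ16 is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPSCR_QC_SKETCH    (1u << 27)    /* assumed QC bit position */
#define FPSCR_KEEP_SKETCH  0xf7c80000u   /* NZCV, AHP, DN, FZ, RMode, FZ16 */

struct fpscr_fields {
    uint32_t stored;      /* bits kept in the register copy */
    uint32_t vec_len;     /* A-profile Len, bits [18:16] */
    uint32_t vec_stride;  /* A-profile Stride, bits [21:20] */
    bool qc;              /* Neon cumulative saturation flag */
};

/* Extract 'length' bits (1..32) starting at bit 'start'. */
static uint32_t extract32_sketch(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

static struct fpscr_fields split_fpscr_sketch(uint32_t val, bool is_m_profile,
                                              bool has_neon)
{
    struct fpscr_fields f = { 0 };

    if (!is_m_profile) {
        /* Short-vector fields exist only on A-profile. */
        f.vec_len = extract32_sketch(val, 16, 3);
        f.vec_stride = extract32_sketch(val, 20, 2);
    }
    if (has_neon) {
        /* Without Neon, QC stays zero (RAZ/WI). */
        f.qc = (val & FPSCR_QC_SKETCH) != 0;
    }
    /* Everything stored elsewhere, plus RES0 bits, is dropped here. */
    f.stored = val & FPSCR_KEEP_SKETCH;
    return f;
}
```

With no M-profile mask applied up front, a write of FZ16 (bit 19)
survives in the stored value on M-profile, which is the point of the
rearrangement.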

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201019151301.2046-10-peter.maydell@linaro.org
---
 target/arm/vfp_helper.c | 47 ++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
         val &= ~FPCR_FZ16;
     }
 
-    if (arm_feature(env, ARM_FEATURE_M)) {
+    vfp_set_fpscr_to_host(env, val);
+
+    if (!arm_feature(env, ARM_FEATURE_M)) {
         /*
-         * M profile FPSCR is RES0 for the QC, STRIDE, FZ16, LEN bits
-         * and also for the trapped-exception-handling bits IxE.
+         * Short-vector length and stride; on M-profile these bits
+         * are used for different purposes.
+         * We can't make this conditional be "if MVFR0.FPShVec != 0",
+         * because in v7A no-short-vector-support cores still had to
+         * allow Stride/Len to be written with the only effect that
+         * some insns are required to UNDEF if the guest sets them.
+         *
+         * TODO: if M-profile MVE implemented, set LTPSIZE.
          */
-        val &= 0xf7c0009f;
+        env->vfp.vec_len = extract32(val, 16, 3);
+        env->vfp.vec_stride = extract32(val, 20, 2);
     }
 
-    vfp_set_fpscr_to_host(env, val);
+    if (arm_feature(env, ARM_FEATURE_NEON)) {
+        /*
+         * The bit we set within fpscr_q is arbitrary; the register as a
+         * whole being zero/non-zero is what counts.
+         * TODO: M-profile MVE also has a QC bit.
+         */
+        env->vfp.qc[0] = val & FPCR_QC;
+        env->vfp.qc[1] = 0;
+        env->vfp.qc[2] = 0;
+        env->vfp.qc[3] = 0;
+    }
 
     /*
      * We don't implement trapped exception handling, so the
      * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
      *
-     * If we exclude the exception flags, IOC|DZC|OFC|UFC|IXC|IDC
-     * (which are stored in fp_status), and the other RES0 bits
-     * in between, then we clear all of the low 16 bits.
+     * The exception flags IOC|DZC|OFC|UFC|IXC|IDC are stored in
+     * fp_status; QC, Len and Stride are stored separately earlier.
+     * Clear out all of those and the RES0 bits: only NZCV, AHP, DN,
+     * FZ, RMode and FZ16 are kept in vfp.xregs[FPSCR].
      */
     env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
-    env->vfp.vec_len = (val >> 16) & 7;
-    env->vfp.vec_stride = (val >> 20) & 3;
-
-    /*
-     * The bit we set within fpscr_q is arbitrary; the register as a
-     * whole being zero/non-zero is what counts.
-     */
-    env->vfp.qc[0] = val & FPCR_QC;
-    env->vfp.qc[1] = 0;
-    env->vfp.qc[2] = 0;
-    env->vfp.qc[3] = 0;
 }
 
 void vfp_set_fpscr(CPUARMState *env, uint32_t val)
-- 
2.20.1

If the M-profile low-overhead-branch extension is implemented, FPSCR
bits [18:16] are a new field LTPSIZE.  If MVE is not implemented
(currently always true for us) then this field always reads as 4 and
ignores writes.

These bits used to be the vector-length field for the old
short-vector extension, so we need to take care that they are not
misinterpreted as setting vec_len. We do this with a rearrangement
of the vfp_set_fpscr() code that deals with vec_len, vec_stride
and also the QC bit; this obviates the need for the M-profile
only masking step that we used to have at the start of the function.

We provide a new field in CPUARMState for LTPSIZE, even though this
will always be 4, in preparation for MVE, so we don't have to
come back later and split it out of the vfp.xregs[FPSCR] value.
(This state struct field will be saved and restored as part of
the FPSCR value via the vmstate_fpscr in machine.c.)
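
A minimal sketch of how the read side composes the value, assuming the
fields are held separately as described above. The function name
fpscr_read_sketch and its parameters are hypothetical; the real helper
also merges in the host float-status flags and the QC bit, which are
left out here.

```c
#include <stdint.h>

/*
 * Compose an FPSCR read from its separately stored pieces.
 * On A-profile ltpsize is 0; on M-profile vec_len and vec_stride
 * are 0, so the two users of bits [18:16] never collide.
 */
static uint32_t fpscr_read_sketch(uint32_t stored, uint32_t vec_len,
                                  uint32_t vec_stride, uint32_t ltpsize)
{
    uint32_t fpscr = stored;

    fpscr |= vec_len << 16;     /* A-profile Len, bits [18:16] */
    fpscr |= vec_stride << 20;  /* A-profile Stride, bits [21:20] */
    fpscr |= ltpsize << 16;     /* M-profile LTPSIZE, bits [18:16] */
    return fpscr;
}
```

For an M-profile CPU with the low-overhead-branch extension and no
MVE, ltpsize is 4, so bits [18:16] read back as 0b100 whatever the
guest last wrote.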

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201019151301.2046-11-peter.maydell@linaro.org
---
 target/arm/cpu.h        | 1 +
 target/arm/cpu.c        | 9 +++++++++
 target/arm/vfp_helper.c | 6 ++++++
 3 files changed, 16 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
         uint32_t fpdscr[M_REG_NUM_BANKS];
         uint32_t cpacr[M_REG_NUM_BANKS];
         uint32_t nsacr;
+        int ltpsize;
     } v7m;
 
     /* Information associated with an exception about to be taken:
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
         uint8_t *rom;
         uint32_t vecbase;
 
+        if (cpu_isar_feature(aa32_lob, cpu)) {
+            /*
+             * LTPSIZE is constant 4 if MVE not implemented, and resets
+             * to an UNKNOWN value if MVE is implemented. We choose to
+             * always reset to 4.
+             */
+            env->v7m.ltpsize = 4;
+        }
+
         if (arm_feature(env, ARM_FEATURE_M_SECURITY)) {
             env->v7m.secure = true;
         } else {
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
             | (env->vfp.vec_len << 16)
             | (env->vfp.vec_stride << 20);
 
+    /*
+     * M-profile LTPSIZE overlaps A-profile Len; whichever of the
+     * two is not applicable to this CPU will always be zero.
+     */
+    fpscr |= env->v7m.ltpsize << 16;
+
     fpscr |= vfp_get_fpscr_from_host(env);
 
     i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The kernel sets btype for the signal handler as if for a call.
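
For context, a simplified sketch of why the value matters, assuming the
usual BTI landing-pad rules for an instruction fetched from a guarded
page. The enum and function names are hypothetical, and the special
handling of PACIASP/PACIBSP and of non-BTI instructions is left out.

```c
#include <stdbool.h>

/* Hypothetical labels for the four BTI landing-pad forms. */
enum bti_kind_sketch { BTI_PLAIN, BTI_C, BTI_J, BTI_JC };

/*
 * Return true if a landing pad of the given kind accepts the current
 * btype. btype 0 means "not reached by an indirect branch", which is
 * always fine; btype 2 is what an indirect call (BLR) leaves behind.
 */
static bool bti_target_ok_sketch(enum bti_kind_sketch kind, int btype)
{
    if (btype == 0) {
        return true;
    }
    switch (kind) {
    case BTI_C:
        return btype != 3;   /* accepts call-style entry */
    case BTI_J:
        return btype != 2;   /* accepts jump-style entry only */
    case BTI_JC:
        return true;         /* accepts any entry */
    default:
        return false;        /* plain BTI accepts no indirect entry */
    }
}
```

Under these assumptions, a handler that begins with "bti c" is
accepted when entered with btype 2, while one that begins with
"bti j" would fault.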

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201016184207.786698-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/aarch64/signal.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -XXX,XX +XXX,XX @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
             + offsetof(struct target_rt_frame_record, tramp);
     }
     env->xregs[0] = usig;
-    env->xregs[31] = frame_addr;
     env->xregs[29] = frame_addr + fr_ofs;
-    env->pc = ka->_sa_handler;
     env->xregs[30] = return_addr;
+    env->xregs[31] = frame_addr;
+    env->pc = ka->_sa_handler;
+
+    /* Invoke the signal handler as if by indirect call.  */
+    if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
+        env->btype = 2;
+    }
+
     if (info) {
         tswap_siginfo(&frame->info, info);
         env->xregs[1] = frame_addr + offsetof(struct target_rt_sigframe, info);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Transform the PROT_BTI prot bit into a QEMU-internal page flag, and
save it in the page tables.
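
The translation can be sketched in isolation as below. This is an
approximation of the check added to validate_prot_to_pageflags(), not
a copy of it: the SK_ names and the function are hypothetical, the
cpu_has_bti parameter stands in for the cpu_isar_feature() test, and
the read/write/exec and PAGE_VALID values are assumptions. Only 0x10
and 0x0080 are taken from this patch.

```c
#include <stdbool.h>

#define SK_PROT_READ        0x1     /* assumed, matches Linux PROT_READ */
#define SK_PROT_WRITE       0x2     /* assumed */
#define SK_PROT_EXEC        0x4     /* assumed */
#define SK_TARGET_PROT_BTI  0x10    /* TARGET_PROT_BTI from this patch */
#define SK_PAGE_VALID       0x0008  /* assumed */
#define SK_PAGE_BTI         0x0080  /* PAGE_TARGET_1 from this patch */

/* Return the page flags for 'prot', or 0 if 'prot' is not acceptable. */
static int prot_to_page_flags_sketch(int prot, bool cpu_has_bti)
{
    int valid = SK_PROT_READ | SK_PROT_WRITE | SK_PROT_EXEC;
    int page_flags = (prot & valid) | SK_PAGE_VALID;

    /* The BTI bit is accepted only if the CPU implements the feature. */
    if ((prot & SK_TARGET_PROT_BTI) && cpu_has_bti) {
        valid |= SK_TARGET_PROT_BTI;
        page_flags |= SK_PAGE_BTI;
    }
    return (prot & ~valid) ? 0 : page_flags;
}
```

A request for PROT_BTI on a CPU without the feature leaves the bit
outside the valid set, so the whole request is rejected rather than
silently downgraded.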

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201016184207.786698-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/exec/cpu-all.h     |  2 ++
 linux-user/syscall_defs.h  |  4 ++++
 target/arm/cpu.h           |  5 +++++
 linux-user/mmap.c          | 16 ++++++++++++++++
 target/arm/translate-a64.c |  6 +++---
 5 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -XXX,XX +XXX,XX @@ extern intptr_t qemu_host_page_mask;
 /* FIXME: Code that sets/uses this is broken and needs to go away.  */
 #define PAGE_RESERVED  0x0020
 #endif
+/* Target-specific bits that will be used via page_get_flags().  */
+#define PAGE_TARGET_1  0x0080
 
 #if defined(CONFIG_USER_ONLY)
 void page_dump(FILE *f);
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -XXX,XX +XXX,XX @@ struct target_winsize {
 #define TARGET_PROT_SEM         0x08
 #endif
 
+#ifdef TARGET_AARCH64
+#define TARGET_PROT_BTI         0x10
+#endif
+
 /* Common */
 #define TARGET_MAP_SHARED	0x01		/* Share changes */
 #define TARGET_MAP_PRIVATE	0x02		/* Changes are private */
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline MemTxAttrs *typecheck_memtxattrs(MemTxAttrs *x)
 #define arm_tlb_bti_gp(x) (typecheck_memtxattrs(x)->target_tlb_bit0)
 #define arm_tlb_mte_tagged(x) (typecheck_memtxattrs(x)->target_tlb_bit1)
 
+/*
+ * AArch64 usage of the PAGE_TARGET_* bits for linux-user.
+ */
+#define PAGE_BTI  PAGE_TARGET_1
+
 /*
  * Naming convention for isar_feature functions:
  * Functions which test 32-bit ID registers should have _aa32_ in
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -XXX,XX +XXX,XX @@ static int validate_prot_to_pageflags(int *host_prot, int prot)
     *host_prot = (prot & (PROT_READ | PROT_WRITE))
                | (prot & PROT_EXEC ? PROT_READ : 0);
 
+#ifdef TARGET_AARCH64
+    /*
+     * The PROT_BTI bit is only accepted if the cpu supports the feature.
+     * Since this is the unusual case, don't bother checking unless
+     * the bit has been requested.  If set and valid, record the bit
+     * within QEMU's page_flags.
+     */
+    if (prot & TARGET_PROT_BTI) {
+        ARMCPU *cpu = ARM_CPU(thread_cpu);
+        if (cpu_isar_feature(aa64_bti, cpu)) {
+            valid |= TARGET_PROT_BTI;
+            page_flags |= PAGE_BTI;
+        }
+    }
+#endif
+
     return prot & ~valid ? 0 : page_flags;
 }
 
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn)
  */
 static bool is_guarded_page(CPUARMState *env, DisasContext *s)
 {
-#ifdef CONFIG_USER_ONLY
-    return false;  /* FIXME */
-#else
     uint64_t addr = s->base.pc_first;
+#ifdef CONFIG_USER_ONLY
+    return page_get_flags(addr) & PAGE_BTI;
+#else
     int mmu_idx = arm_to_core_mmu_idx(s->mmu_idx);
     unsigned int index = tlb_index(env, mmu_idx, addr);
     CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

These are all of the defines required to parse
GNU_PROPERTY_AARCH64_FEATURE_1_AND, copied from binutils.
Other missing defines related to other GNU program headers
and notes are elided for now.
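
A small sketch of how these defines are meant to be used, assuming a
single already byte-swapped property entry. The function name
property_requests_bti_sketch is hypothetical; the constants are the
ones added by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PT_LOOS                             0x60000000
#define PT_GNU_PROPERTY                     (PT_LOOS + 0x474e553)
#define GNU_PROPERTY_AARCH64_FEATURE_1_AND  0xc0000000
#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI  (1u << 0)
#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC  (1u << 1)

/*
 * Return true if the property is a well-sized FEATURE_1_AND entry
 * with the BTI bit set. 'data' points at the property payload.
 */
static bool property_requests_bti_sketch(uint32_t pr_type, uint32_t pr_datasz,
                                         const uint32_t *data)
{
    if (pr_type != GNU_PROPERTY_AARCH64_FEATURE_1_AND) {
        return false;
    }
    if (pr_datasz != sizeof(uint32_t)) {
        return false;
    }
    return (data[0] & GNU_PROPERTY_AARCH64_FEATURE_1_BTI) != 0;
}
```

Note that PT_GNU_PROPERTY works out to 0x6474e553, which lies inside
the PT_LOOS..PT_HIOS range added alongside it.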

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201016184207.786698-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/elf.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/elf.h b/include/elf.h
index XXXXXXX..XXXXXXX 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -XXX,XX +XXX,XX @@ typedef int64_t  Elf64_Sxword;
 #define PT_NOTE    4
 #define PT_SHLIB   5
 #define PT_PHDR    6
+#define PT_LOOS    0x60000000
+#define PT_HIOS    0x6fffffff
 #define PT_LOPROC  0x70000000
 #define PT_HIPROC  0x7fffffff
 
+#define PT_GNU_PROPERTY   (PT_LOOS + 0x474e553)
+
 #define PT_MIPS_REGINFO   0x70000000
 #define PT_MIPS_RTPROC    0x70000001
 #define PT_MIPS_OPTIONS   0x70000002
@@ -XXX,XX +XXX,XX @@ typedef struct elf64_shdr {
 #define NT_ARM_SYSTEM_CALL      0x404   /* ARM system call number */
 #define NT_ARM_SVE      0x405           /* ARM Scalable Vector Extension regs */
 
+/* Defined note types for GNU systems.  */
+
+#define NT_GNU_PROPERTY_TYPE_0  5       /* Program property */
+
+/* Values used in GNU .note.gnu.property notes (NT_GNU_PROPERTY_TYPE_0).  */
+
+#define GNU_PROPERTY_STACK_SIZE                 1
+#define GNU_PROPERTY_NO_COPY_ON_PROTECTED       2
+
+#define GNU_PROPERTY_LOPROC                     0xc0000000
+#define GNU_PROPERTY_HIPROC                     0xdfffffff
+#define GNU_PROPERTY_LOUSER                     0xe0000000
+#define GNU_PROPERTY_HIUSER                     0xffffffff
+
+#define GNU_PROPERTY_AARCH64_FEATURE_1_AND      0xc0000000
+#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI      (1u << 0)
+#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC      (1u << 1)
+
 /*
  * Physical entry point into the kernel.
  *
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Fix an unlikely memory leak in load_elf_image().
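
The ownership pattern used by the fix can be sketched without GLib as
follows. This is an illustration only: AUTOFREE_SKETCH and
steal_pointer_sketch are hypothetical stand-ins for g_autofree and
g_steal_pointer(), built on the GCC/Clang cleanup attribute, and
copy_name_sketch is a made-up function, not the loader code.

```c
#include <stdlib.h>
#include <string.h>

/* Counts buffers released by the automatic cleanup (for illustration). */
static int free_count_sketch;

/* Called with the address of the annotated variable at scope exit. */
static void autofree_cleanup_sketch(void *p)
{
    void **pp = (void **)p;

    if (*pp) {
        free_count_sketch++;
    }
    free(*pp);
}
#define AUTOFREE_SKETCH __attribute__((cleanup(autofree_cleanup_sketch)))

/* Hand ownership to the caller and disarm the automatic free. */
static void *steal_pointer_sketch(void *pp)
{
    void **ptr = (void **)pp;
    void *ref = *ptr;

    *ptr = NULL;
    return ref;
}

/* Return a heap copy of a NUL-terminated name, or NULL on error. */
static char *copy_name_sketch(const char *name, size_t len)
{
    AUTOFREE_SKETCH char *buf = NULL;

    if (len == 0) {
        return NULL;
    }
    buf = malloc(len);
    if (!buf) {
        return NULL;
    }
    memcpy(buf, name, len);
    if (buf[len - 1] != '\0') {
        return NULL;              /* buf is freed automatically here */
    }
    return steal_pointer_sketch(&buf);
}
```

Every early return releases the buffer without an explicit free, and
only the success path transfers it to the caller, which is how the
error paths in load_elf_image() stop leaking.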

Fixes: bf858897b7 ("linux-user: Re-use load_elf_image for the main binary.")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201016184207.786698-5-richard.henderson@linaro.org
Message-Id: <20201003174944.1972444-1-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/elfload.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
                 info->brk = vaddr_em;
             }
         } else if (eppnt->p_type == PT_INTERP && pinterp_name) {
-            char *interp_name;
+            g_autofree char *interp_name = NULL;
 
             if (*pinterp_name) {
                 errmsg = "Multiple PT_INTERP entries";
                 goto exit_errmsg;
             }
-            interp_name = malloc(eppnt->p_filesz);
+            interp_name = g_malloc(eppnt->p_filesz);
             if (!interp_name) {
                 goto exit_perror;
             }
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
                 errmsg = "Invalid PT_INTERP entry";
                 goto exit_errmsg;
             }
-            *pinterp_name = interp_name;
+            *pinterp_name = g_steal_pointer(&interp_name);
 #ifdef TARGET_MIPS
         } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) {
             Mips_elf_abiflags_v0 abiflags;
@@ -XXX,XX +XXX,XX @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
     if (elf_interpreter) {
         info->load_bias = interp_info.load_bias;
         info->entry = interp_info.entry;
-        free(elf_interpreter);
+        g_free(elf_interpreter);
     }
 
 #ifdef USE_ELF_CORE_DUMP
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Fix the coding style of the elf_prot computation now, so that the
following patches are clearer.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201016184207.786698-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/elfload.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
             abi_ulong vaddr, vaddr_po, vaddr_ps, vaddr_ef, vaddr_em, vaddr_len;
             int elf_prot = 0;
 
-            if (eppnt->p_flags & PF_R) elf_prot =  PROT_READ;
-            if (eppnt->p_flags & PF_W) elf_prot |= PROT_WRITE;
-            if (eppnt->p_flags & PF_X) elf_prot |= PROT_EXEC;
+            if (eppnt->p_flags & PF_R) {
+                elf_prot |= PROT_READ;
+            }
+            if (eppnt->p_flags & PF_W) {
+                elf_prot |= PROT_WRITE;
+            }
+            if (eppnt->p_flags & PF_X) {
+                elf_prot |= PROT_EXEC;
+            }
 
             vaddr = load_bias + eppnt->p_vaddr;
             vaddr_po = TARGET_ELF_PAGEOFFSET(vaddr);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The second loop uses a loop induction variable, and the first
does not.  Transform the first to match the second, to simplify
a following patch moving code between them.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201016184207.786698-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/elfload.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
     loaddr = -1, hiaddr = 0;
     info->alignment = 0;
     for (i = 0; i < ehdr->e_phnum; ++i) {
-        if (phdr[i].p_type == PT_LOAD) {
-            abi_ulong a = phdr[i].p_vaddr - phdr[i].p_offset;
+        struct elf_phdr *eppnt = phdr + i;
+        if (eppnt->p_type == PT_LOAD) {
+            abi_ulong a = eppnt->p_vaddr - eppnt->p_offset;
             if (a < loaddr) {
                 loaddr = a;
             }
-            a = phdr[i].p_vaddr + phdr[i].p_memsz;
+            a = eppnt->p_vaddr + eppnt->p_memsz;
             if (a > hiaddr) {
                 hiaddr = a;
             }
             ++info->nsegs;
-            info->alignment |= phdr[i].p_align;
+            info->alignment |= eppnt->p_align;
         }
     }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

For BTI, we need to know if the executable is static or dynamic,
which means looking for PT_INTERP earlier.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201016184207.786698-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/elfload.c | 60 +++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
 
     mmap_lock();
 
-    /* Find the maximum size of the image and allocate an appropriate
-       amount of memory to handle that.  */
+    /*
+     * Find the maximum size of the image and allocate an appropriate
+     * amount of memory to handle that.  Locate the interpreter, if any.
+     */
     loaddr = -1, hiaddr = 0;
     info->alignment = 0;
     for (i = 0; i < ehdr->e_phnum; ++i) {
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
             }
             ++info->nsegs;
             info->alignment |= eppnt->p_align;
+        } else if (eppnt->p_type == PT_INTERP && pinterp_name) {
+            g_autofree char *interp_name = NULL;
+
+            if (*pinterp_name) {
+                errmsg = "Multiple PT_INTERP entries";
+                goto exit_errmsg;
+            }
+            interp_name = g_malloc(eppnt->p_filesz);
+            if (!interp_name) {
+                goto exit_perror;
+            }
+
+            if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) {
+                memcpy(interp_name, bprm_buf + eppnt->p_offset,
+                       eppnt->p_filesz);
+            } else {
+                retval = pread(image_fd, interp_name, eppnt->p_filesz,
+                               eppnt->p_offset);
+                if (retval != eppnt->p_filesz) {
+                    goto exit_perror;
+                }
+            }
+            if (interp_name[eppnt->p_filesz - 1] != 0) {
+                errmsg = "Invalid PT_INTERP entry";
+                goto exit_errmsg;
+            }
+            *pinterp_name = g_steal_pointer(&interp_name);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
             if (vaddr_em > info->brk) {
                 info->brk = vaddr_em;
             }
-        } else if (eppnt->p_type == PT_INTERP && pinterp_name) {
-            g_autofree char *interp_name = NULL;
-
-            if (*pinterp_name) {
-                errmsg = "Multiple PT_INTERP entries";
-                goto exit_errmsg;
-            }
-            interp_name = g_malloc(eppnt->p_filesz);
-            if (!interp_name) {
-                goto exit_perror;
-            }
-
-            if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) {
-                memcpy(interp_name, bprm_buf + eppnt->p_offset,
-                       eppnt->p_filesz);
-            } else {
-                retval = pread(image_fd, interp_name, eppnt->p_filesz,
-                               eppnt->p_offset);
-                if (retval != eppnt->p_filesz) {
-                    goto exit_perror;
-                }
-            }
-            if (interp_name[eppnt->p_filesz - 1] != 0) {
-                errmsg = "Invalid PT_INTERP entry";
-                goto exit_errmsg;
-            }
-            *pinterp_name = g_steal_pointer(&interp_name);
 #ifdef TARGET_MIPS
         } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) {
             Mips_elf_abiflags_v0 abiflags;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Using the Error API is a bit clearer than open-coding the error
reporting with a bare C string.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201016184207.786698-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/elfload.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/guest-random.h"
 #include "qemu/units.h"
 #include "qemu/selfmap.h"
+#include "qapi/error.h"
 
 #ifdef _ARCH_PPC64
 #undef ARCH_DLINFO
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
     struct elf_phdr *phdr;
     abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
     int i, retval;
-    const char *errmsg;
+    Error *err = NULL;
 
     /* First of all, some simple consistency checks */
-    errmsg = "Invalid ELF image for this architecture";
     if (!elf_check_ident(ehdr)) {
+        error_setg(&err, "Invalid ELF image for this architecture");
         goto exit_errmsg;
     }
     bswap_ehdr(ehdr);
     if (!elf_check_ehdr(ehdr)) {
+        error_setg(&err, "Invalid ELF image for this architecture");
         goto exit_errmsg;
     }
 
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
             g_autofree char *interp_name = NULL;
 
             if (*pinterp_name) {
-                errmsg = "Multiple PT_INTERP entries";
+                error_setg(&err, "Multiple PT_INTERP entries");
                 goto exit_errmsg;
             }
+
             interp_name = g_malloc(eppnt->p_filesz);
-            if (!interp_name) {
-                goto exit_perror;
-            }
 
             if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) {
                 memcpy(interp_name, bprm_buf + eppnt->p_offset,
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
                 retval = pread(image_fd, interp_name, eppnt->p_filesz,
                                eppnt->p_offset);
                 if (retval != eppnt->p_filesz) {
-                    goto exit_perror;
+                    goto exit_read;
                 }
             }
             if (interp_name[eppnt->p_filesz - 1] != 0) {
-                errmsg = "Invalid PT_INTERP entry";
+                error_setg(&err, "Invalid PT_INTERP entry");
                 goto exit_errmsg;
             }
             *pinterp_name = g_steal_pointer(&interp_name);
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
                             (ehdr->e_type == ET_EXEC ? MAP_FIXED : 0),
                             -1, 0);
     if (load_addr == -1) {
-        goto exit_perror;
+        goto exit_mmap;
     }
     load_bias = load_addr - loaddr;
 
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
                                     image_fd, eppnt->p_offset - vaddr_po);
 
                 if (error == -1) {
-                    goto exit_perror;
+                    goto exit_mmap;
                 }
             }
 
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
         } else if (eppnt->p_type == PT_MIPS_ABIFLAGS) {
             Mips_elf_abiflags_v0 abiflags;
             if (eppnt->p_filesz < sizeof(Mips_elf_abiflags_v0)) {
-                errmsg = "Invalid PT_MIPS_ABIFLAGS entry";
+                error_setg(&err, "Invalid PT_MIPS_ABIFLAGS entry");
                 goto exit_errmsg;
             }
             if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) {
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
                 retval = pread(image_fd, &abiflags, sizeof(Mips_elf_abiflags_v0),
                                eppnt->p_offset);
                 if (retval != sizeof(Mips_elf_abiflags_v0)) {
-                    goto exit_perror;
+                    goto exit_read;
                 }
             }
             bswap_mips_abiflags(&abiflags);
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
 
  exit_read:
     if (retval >= 0) {
-        errmsg = "Incomplete read of file header";
-        goto exit_errmsg;
+        error_setg(&err, "Incomplete read of file header");
+    } else {
+        error_setg_errno(&err, errno, "Error reading file header");
     }
- exit_perror:
-    errmsg = strerror(errno);
+    goto exit_errmsg;
+ exit_mmap:
+    error_setg_errno(&err, errno, "Error mapping file");
+    goto exit_errmsg;
  exit_errmsg:
-    fprintf(stderr, "%s: %s\n", image_name, errmsg);
+    error_reportf_err(err, "%s: ", image_name);
     exit(-1);
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is slightly clearer than just using strerror, though
the different forms produced by error_setg_file_open and
error_setg_errno aren't entirely convenient.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201016184207.786698-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/elfload.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static void load_elf_interp(const char *filename, struct image_info *info,
                             char bprm_buf[BPRM_BUF_SIZE])
 {
     int fd, retval;
+    Error *err = NULL;
 
     fd = open(path(filename), O_RDONLY);
     if (fd < 0) {
-        goto exit_perror;
+        error_setg_file_open(&err, errno, filename);
+        error_report_err(err);
+        exit(-1);
     }
 
     retval = read(fd, bprm_buf, BPRM_BUF_SIZE);
     if (retval < 0) {
-        goto exit_perror;
+        error_setg_errno(&err, errno, "Error reading file header");
+        error_reportf_err(err, "%s: ", filename);
+        exit(-1);
     }
+
     if (retval < BPRM_BUF_SIZE) {
         memset(bprm_buf + retval, 0, BPRM_BUF_SIZE - retval);
     }
 
     load_elf_image(filename, fd, info, NULL, bprm_buf);
-    return;
-
- exit_perror:
-    fprintf(stderr, "%s: %s\n", filename, strerror(errno));
-    exit(-1);
 }
 
 static int symfind(const void *s0, const void *s1)
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is generic support, with the code disabled for all targets.
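
The per-property checks can be sketched as a single standalone walk
over the property data. This is an approximation under stated
assumptions, not the code added below: it assumes a 64-bit ELF class
(8-byte property alignment) and data that is already in host byte
order, it replaces the per-target callback with a simple counter, and
the names walk_properties_sketch and enum prop_status are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROP_ALIGN_SKETCH 8u   /* assumed: ELFCLASS64 alignment */

enum prop_status { PROP_OK, PROP_ILL_FORMED, PROP_DUPLICATE, PROP_UNSORTED };

/* Validate 'datasz' bytes of property entries and count them. */
static enum prop_status walk_properties_sketch(const uint32_t *data,
                                               uint32_t datasz,
                                               unsigned *count)
{
    uint32_t off = 0, prev_type = 0;
    bool have_prev = false;

    *count = 0;
    while (off < datasz) {
        const uint32_t *p = data + off / sizeof(uint32_t);
        uint32_t remaining = datasz - off;
        uint32_t pr_type, pr_datasz, step;

        /* Each entry starts aligned with a two-word header. */
        if (off % PROP_ALIGN_SKETCH != 0 ||
            remaining < 2 * sizeof(uint32_t)) {
            return PROP_ILL_FORMED;
        }
        pr_type = p[0];
        pr_datasz = p[1];
        remaining -= 2 * sizeof(uint32_t);

        /* Check before rounding so the round-up cannot overflow. */
        if (pr_datasz > remaining) {
            return PROP_ILL_FORMED;
        }
        step = (pr_datasz + PROP_ALIGN_SKETCH - 1) & ~(PROP_ALIGN_SKETCH - 1);
        if (step > remaining) {
            return PROP_ILL_FORMED;
        }

        /* Properties must be unique and sorted by type. */
        if (have_prev && pr_type <= prev_type) {
            return pr_type == prev_type ? PROP_DUPLICATE : PROP_UNSORTED;
        }
        prev_type = pr_type;
        have_prev = true;

        ++*count;
        off += 2 * sizeof(uint32_t) + step;
    }
    return PROP_OK;
}
```

A single AArch64 feature property, for instance, is the four words
{0xc0000000, 4, flags, padding}: an 8-byte header, 4 bytes of payload,
and 4 bytes of padding up to the alignment.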

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201016184207.786698-11-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/qemu.h    |   4 ++
 linux-user/elfload.c | 157 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 161 insertions(+)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -XXX,XX +XXX,XX @@ struct image_info {
         abi_ulong       interpreter_loadmap_addr;
         abi_ulong       interpreter_pt_dynamic_addr;
         struct image_info *other_info;
+
+        /* For target-specific processing of NT_GNU_PROPERTY_TYPE_0. */
+        uint32_t        note_flags;
+
 #ifdef TARGET_MIPS
         int             fp_abi;
         int             interp_fp_abi;
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
 
 #include "elf.h"
 
+static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz,
+                                    const uint32_t *data,
+                                    struct image_info *info,
+                                    Error **errp)
+{
+    g_assert_not_reached();
+}
+#define ARCH_USE_GNU_PROPERTY 0
+
 struct exec
 {
     unsigned int a_info;   /* Use macros N_MAGIC, etc for access */
@@ -XXX,XX +XXX,XX @@ void probe_guest_base(const char *image_name, abi_ulong guest_loaddr,
                   "@ 0x%" PRIx64 "\n", (uint64_t)guest_base);
 }
 
+enum {
+    /* The string "GNU\0" as a magic number. */
+    GNU0_MAGIC = const_le32('G' | 'N' << 8 | 'U' << 16),
+    NOTE_DATA_SZ = 1 * KiB,
+    NOTE_NAME_SZ = 4,
+    ELF_GNU_PROPERTY_ALIGN = ELF_CLASS == ELFCLASS32 ? 4 : 8,
+};
+
+/*
+ * Process a single gnu_property entry.
+ * Return false for error.
+ */
+static bool parse_elf_property(const uint32_t *data, int *off, int datasz,
+                               struct image_info *info, bool have_prev_type,
+                               uint32_t *prev_type, Error **errp)
+{
+    uint32_t pr_type, pr_datasz, step;
+
+    if (*off > datasz || !QEMU_IS_ALIGNED(*off, ELF_GNU_PROPERTY_ALIGN)) {
+        goto error_data;
+    }
+    datasz -= *off;
+    data += *off / sizeof(uint32_t);
+
+    if (datasz < 2 * sizeof(uint32_t)) {
+        goto error_data;
+    }
+    pr_type = data[0];
+    pr_datasz = data[1];
+    data += 2;
+    datasz -= 2 * sizeof(uint32_t);
+    step = ROUND_UP(pr_datasz, ELF_GNU_PROPERTY_ALIGN);
+    if (step > datasz) {
+        goto error_data;
+    }
+
+    /* Properties are supposed to be unique and sorted on pr_type. */
+    if (have_prev_type && pr_type <= *prev_type) {
+        if (pr_type == *prev_type) {
+            error_setg(errp, "Duplicate property in PT_GNU_PROPERTY");
+        } else {
+            error_setg(errp, "Unsorted property in PT_GNU_PROPERTY");
+        }
+        return false;
+    }
+    *prev_type = pr_type;
+
+    if (!arch_parse_elf_property(pr_type, pr_datasz, data, info, errp)) {
+        return false;
+    }
+
+    *off += 2 * sizeof(uint32_t) + step;
+    return true;
+
+ error_data:
+    error_setg(errp, "Ill-formed property in PT_GNU_PROPERTY");
+    return false;
+}
+
+/* Process NT_GNU_PROPERTY_TYPE_0. */
+static bool parse_elf_properties(int image_fd,
+                                 struct image_info *info,
+                                 const struct elf_phdr *phdr,
+                                 char bprm_buf[BPRM_BUF_SIZE],
+                                 Error **errp)
+{
+    union {
+        struct elf_note nhdr;
+        uint32_t data[NOTE_DATA_SZ / sizeof(uint32_t)];
+    } note;
+
+    int n, off, datasz;
+    bool have_prev_type;
+    uint32_t prev_type;
+
+    /* Unless the arch requires properties, ignore them. */
+    if (!ARCH_USE_GNU_PROPERTY) {
+        return true;
+    }
+
+    /* If the properties are crazy large, that's too bad. */
+    n = phdr->p_filesz;
+    if (n > sizeof(note)) {
+        error_setg(errp, "PT_GNU_PROPERTY too large");
+        return false;
+    }
+    if (n < sizeof(note.nhdr)) {
+        error_setg(errp, "PT_GNU_PROPERTY too small");
+        return false;
+    }
+
+    if (phdr->p_offset + n <= BPRM_BUF_SIZE) {
+        memcpy(&note, bprm_buf + phdr->p_offset, n);
+    } else {
+        ssize_t len = pread(image_fd, &note, n, phdr->p_offset);
+        if (len != n) {
+            error_setg_errno(errp, errno, "Error reading file header");
+            return false;
+        }
+    }
+
+    /*
+     * The contents of a valid PT_GNU_PROPERTY are a sequence
+     * of uint32_t -- swap them all now.
+     */
+#ifdef BSWAP_NEEDED
+    for (int i = 0; i < n / 4; i++) {
+        bswap32s(note.data + i);
+    }
+#endif
+
+    /*
+     * Note that nhdr is 3 words, and that the "name" described by namesz
+     * immediately follows nhdr and is thus at the 4th word.  Further, all
+     * of the inputs to the kernel's round_up are multiples of 4.
+     */
+    if (note.nhdr.n_type != NT_GNU_PROPERTY_TYPE_0 ||
+        note.nhdr.n_namesz != NOTE_NAME_SZ ||
+        note.data[3] != GNU0_MAGIC) {
+        error_setg(errp, "Invalid note in PT_GNU_PROPERTY");
+        return false;
+    }
+    off = sizeof(note.nhdr) + NOTE_NAME_SZ;
+
+    datasz = note.nhdr.n_descsz + off;
+    if (datasz > n) {
+        error_setg(errp, "Invalid note size in PT_GNU_PROPERTY");
+        return false;
+    }
+
+    have_prev_type = false;
+    prev_type = 0;
+    while (1) {
+        if (off == datasz) {
+            return true;  /* end, exit ok */
+        }
+        if (!parse_elf_property(note.data, &off, datasz, info,
+                                have_prev_type, &prev_type, errp)) {
+            return false;
+        }
+        have_prev_type = true;
+    }
+}
+
 /* Load an ELF image into the address space.
 
    IMAGE_NAME is the filename of the image, to use in error messages.
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
                 goto exit_errmsg;
             }
             *pinterp_name = g_steal_pointer(&interp_name);
+        } else if (eppnt->p_type == PT_GNU_PROPERTY) {
+            if (!parse_elf_properties(image_fd, info, eppnt, bprm_buf, &err)) {
+                goto exit_errmsg;
+            }
         }
     }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Use the new generic support for NT_GNU_PROPERTY_TYPE_0.
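
For illustration only, the walk that the generic parser performs over
the note payload can be sketched as below. This is a simplified,
self-contained model and not the QEMU code: walk_properties is a
hypothetical name, the buffer is assumed to have been byte-swapped to
host order already, and align is assumed to be 4 (ELFCLASS32) or
8 (ELFCLASS64).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): walk a sequence of
 * gnu_property entries stored as host-endian 32-bit words.
 * Each entry is { pr_type, pr_datasz, data[], padding up to 'align' }.
 * Returns the number of entries, or -1 if the data is ill-formed,
 * unsorted, or contains a duplicate pr_type.
 */
static int walk_properties(const uint32_t *words, size_t nbytes, size_t align)
{
    size_t off = 0;
    bool have_prev = false;
    uint32_t prev_type = 0;
    int count = 0;

    while (off != nbytes) {
        uint32_t pr_type, pr_datasz;
        uint64_t step;

        /* There must be room for the two header words. */
        if (nbytes - off < 2 * sizeof(uint32_t)) {
            return -1;
        }
        pr_type = words[off / sizeof(uint32_t)];
        pr_datasz = words[off / sizeof(uint32_t) + 1];

        /* Payload size rounded up to the property alignment. */
        step = ((uint64_t)pr_datasz + align - 1) / align * align;
        if (step > nbytes - off - 2 * sizeof(uint32_t)) {
            return -1;
        }

        /* Properties must be unique and sorted on pr_type. */
        if (have_prev && pr_type <= prev_type) {
            return -1;
        }
        prev_type = pr_type;
        have_prev = true;

        off += 2 * sizeof(uint32_t) + (size_t)step;
        count++;
    }
    return count;
}
```

The sketch collapses the distinct error messages into a single -1
return, and it leaves out the per-architecture hook that receives each
entry.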

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201016184207.786698-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/elfload.c | 48 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
 
 #include "elf.h"
 
+/* We must delay the following stanzas until after "elf.h". */
+#if defined(TARGET_AARCH64)
+
+static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz,
+                                    const uint32_t *data,
+                                    struct image_info *info,
+                                    Error **errp)
+{
+    if (pr_type == GNU_PROPERTY_AARCH64_FEATURE_1_AND) {
+        if (pr_datasz != sizeof(uint32_t)) {
+            error_setg(errp, "Ill-formed GNU_PROPERTY_AARCH64_FEATURE_1_AND");
+            return false;
+        }
+        /* We will extract GNU_PROPERTY_AARCH64_FEATURE_1_BTI later. */
+        info->note_flags = *data;
+    }
+    return true;
+}
+#define ARCH_USE_GNU_PROPERTY 1
+
+#else
+
 static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz,
                                     const uint32_t *data,
                                     struct image_info *info,
@@ -XXX,XX +XXX,XX @@ static bool arch_parse_elf_property(uint32_t pr_type, uint32_t pr_datasz,
 }
 #define ARCH_USE_GNU_PROPERTY 0
 
+#endif
+
 struct exec
 {
     unsigned int a_info;   /* Use macros N_MAGIC, etc for access */
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
     struct elfhdr *ehdr = (struct elfhdr *)bprm_buf;
     struct elf_phdr *phdr;
     abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
-    int i, retval;
+    int i, retval, prot_exec;
     Error *err = NULL;
 
     /* First of all, some simple consistency checks */
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
     info->brk = 0;
     info->elf_flags = ehdr->e_flags;
 
+    prot_exec = PROT_EXEC;
+#ifdef TARGET_AARCH64
+    /*
+     * If the BTI feature is present, this indicates that the executable
+     * pages of the startup binary should be mapped with PROT_BTI, so that
+     * branch targets are enforced.
+     *
+     * The startup binary is either the interpreter or the static executable.
+     * The interpreter is responsible for all pages of a dynamic executable.
+     *
+     * Elf notes are backward compatible to older cpus.
+     * Do not enable BTI unless it is supported.
+     */
+    if ((info->note_flags & GNU_PROPERTY_AARCH64_FEATURE_1_BTI)
+        && (pinterp_name == NULL || *pinterp_name == 0)
+        && cpu_isar_feature(aa64_bti, ARM_CPU(thread_cpu))) {
+        prot_exec |= TARGET_PROT_BTI;
+    }
+#endif
+
     for (i = 0; i < ehdr->e_phnum; i++) {
         struct elf_phdr *eppnt = phdr + i;
         if (eppnt->p_type == PT_LOAD) {
@@ -XXX,XX +XXX,XX @@ static void load_elf_image(const char *image_name, int image_fd,
                 elf_prot |= PROT_WRITE;
             }
             if (eppnt->p_flags & PF_X) {
-                elf_prot |= PROT_EXEC;
+                elf_prot |= prot_exec;
             }
 
             vaddr = load_bias + eppnt->p_vaddr;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The note test requires gcc 10 for -mbranch-protection=standard.
The mmap test uses PROT_BTI and does not require special compiler support.
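
As a reading aid, the expectation table that both tests encode can be
written as a small predicate. This is only a sketch: the enum and the
function name bti_expect_trap are illustrative and do not appear in
the tests, and btype follows the BTYPE_1/BTYPE_2/BTYPE_3 naming used
there (BR via x16, BLR, BR via another register).

```c
#include <stdbool.h>

/* Instruction found at the branch destination (illustrative names). */
enum landing { LAND_NOP, LAND_BTI_N, LAND_BTI_C, LAND_BTI_J, LAND_BTI_JC };

/*
 * Returns true if landing on 'dest' in a guarded page is expected to
 * trap, i.e. the "skipped" flag in the tests stays at 1.
 *   btype 0: not reached by an indirect branch (never traps)
 *   btype 1: BR via x16/x17
 *   btype 2: BLR
 *   btype 3: BR via any other register
 */
static bool bti_expect_trap(int btype, enum landing dest)
{
    if (btype == 0) {
        return false;
    }
    switch (dest) {
    case LAND_BTI_JC:
        return false;           /* accepts every indirect branch */
    case LAND_BTI_C:
        return btype == 3;      /* accepts btype 1 and 2 */
    case LAND_BTI_J:
        return btype == 2;      /* accepts btype 1 and 3 */
    default:
        return true;            /* NOP and plain BTI accept none */
    }
}
```

For example, bti_expect_trap(2, LAND_BTI_J) is true, which matches
TEST(BTYPE_2, BTI_J, 1) in both test files.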

Acked-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201016184207.786698-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/tcg/aarch64/bti-1.c         |  62 +++++++++++++++++
 tests/tcg/aarch64/bti-2.c         | 108 ++++++++++++++++++++++++++++++
 tests/tcg/aarch64/bti-crt.inc.c   |  51 ++++++++++++++
 tests/tcg/aarch64/Makefile.target |  10 +++
 tests/tcg/configure.sh            |   4 ++
 5 files changed, 235 insertions(+)
 create mode 100644 tests/tcg/aarch64/bti-1.c
 create mode 100644 tests/tcg/aarch64/bti-2.c
 create mode 100644 tests/tcg/aarch64/bti-crt.inc.c

diff --git a/tests/tcg/aarch64/bti-1.c b/tests/tcg/aarch64/bti-1.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/tcg/aarch64/bti-1.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Branch target identification, basic notskip cases.
+ */
+
+#include "bti-crt.inc.c"
+
+static void skip2_sigill(int sig, siginfo_t *info, ucontext_t *uc)
+{
+    uc->uc_mcontext.pc += 8;
+    uc->uc_mcontext.pstate = 1;
+}
+
+#define NOP       "nop"
+#define BTI_N     "hint #32"
+#define BTI_C     "hint #34"
+#define BTI_J     "hint #36"
+#define BTI_JC    "hint #38"
+
+#define BTYPE_1(DEST) \
+    asm("mov %0,#1; adr x16, 1f; br x16; 1: " DEST "; mov %0,#0" \
+        : "=r"(skipped) : : "x16")
+
+#define BTYPE_2(DEST) \
+    asm("mov %0,#1; adr x16, 1f; blr x16; 1: " DEST "; mov %0,#0" \
+        : "=r"(skipped) : : "x16", "x30")
+
+#define BTYPE_3(DEST) \
+    asm("mov %0,#1; adr x15, 1f; br x15; 1: " DEST "; mov %0,#0" \
+        : "=r"(skipped) : : "x15")
+
+#define TEST(WHICH, DEST, EXPECT) \
+    do { WHICH(DEST); fail += skipped ^ EXPECT; } while (0)
+
+
+int main()
+{
+    int fail = 0;
+    int skipped;
+
+    /* Signal-like with SA_SIGINFO.  */
+    signal_info(SIGILL, skip2_sigill);
+
+    TEST(BTYPE_1, NOP, 1);
+    TEST(BTYPE_1, BTI_N, 1);
+    TEST(BTYPE_1, BTI_C, 0);
+    TEST(BTYPE_1, BTI_J, 0);
+    TEST(BTYPE_1, BTI_JC, 0);
+
+    TEST(BTYPE_2, NOP, 1);
+    TEST(BTYPE_2, BTI_N, 1);
+    TEST(BTYPE_2, BTI_C, 0);
+    TEST(BTYPE_2, BTI_J, 1);
+    TEST(BTYPE_2, BTI_JC, 0);
+
+    TEST(BTYPE_3, NOP, 1);
+    TEST(BTYPE_3, BTI_N, 1);
+    TEST(BTYPE_3, BTI_C, 1);
+    TEST(BTYPE_3, BTI_J, 0);
+    TEST(BTYPE_3, BTI_JC, 0);
+
+    return fail;
+}
diff --git a/tests/tcg/aarch64/bti-2.c b/tests/tcg/aarch64/bti-2.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/tcg/aarch64/bti-2.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Branch target identification, basic notskip cases.
+ */
+
+#include <stdio.h>
+#include <signal.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+#ifndef PROT_BTI
+#define PROT_BTI  0x10
+#endif
+
+static void skip2_sigill(int sig, siginfo_t *info, void *vuc)
+{
+    ucontext_t *uc = vuc;
+    uc->uc_mcontext.pc += 8;
+    uc->uc_mcontext.pstate = 1;
+}
+
+#define NOP       "nop"
+#define BTI_N     "hint #32"
+#define BTI_C     "hint #34"
+#define BTI_J     "hint #36"
+#define BTI_JC    "hint #38"
+
+#define BTYPE_1(DEST)    \
+    "mov x1, #1\n\t"     \
+    "adr x16, 1f\n\t"    \
+    "br x16\n"           \
+"1: " DEST "\n\t"        \
+    "mov x1, #0"
+
+#define BTYPE_2(DEST)    \
+    "mov x1, #1\n\t"     \
+    "adr x16, 1f\n\t"    \
+    "blr x16\n"          \
+"1: " DEST "\n\t"        \
+    "mov x1, #0"
+
+#define BTYPE_3(DEST)    \
+    "mov x1, #1\n\t"     \
+    "adr x15, 1f\n\t"    \
+    "br x15\n"           \
+"1: " DEST "\n\t"        \
+    "mov x1, #0"
+
+#define TEST(WHICH, DEST, EXPECT) \
+    WHICH(DEST) "\n"              \
+    ".if " #EXPECT "\n\t"         \
+    "eor x1, x1," #EXPECT "\n"    \
+    ".endif\n\t"                  \
+    "add x0, x0, x1\n\t"
+
+extern char test_begin[], test_end[];
+
+asm("\n"
+"test_begin:\n\t"
+    BTI_C "\n\t"
+    "mov x2, x30\n\t"
+    "mov x0, #0\n\t"
+
+    TEST(BTYPE_1, NOP, 1)
+    TEST(BTYPE_1, BTI_N, 1)
+    TEST(BTYPE_1, BTI_C, 0)
+    TEST(BTYPE_1, BTI_J, 0)
+    TEST(BTYPE_1, BTI_JC, 0)
+
+    TEST(BTYPE_2, NOP, 1)
+    TEST(BTYPE_2, BTI_N, 1)
+    TEST(BTYPE_2, BTI_C, 0)
+    TEST(BTYPE_2, BTI_J, 1)
+    TEST(BTYPE_2, BTI_JC, 0)
+
+    TEST(BTYPE_3, NOP, 1)
+    TEST(BTYPE_3, BTI_N, 1)
+    TEST(BTYPE_3, BTI_C, 1)
+    TEST(BTYPE_3, BTI_J, 0)
+    TEST(BTYPE_3, BTI_JC, 0)
+
+    "ret x2\n"
+"test_end:"
+);
+
+int main()
+{
+    struct sigaction sa;
+
+    void *p = mmap(0, getpagesize(),
+                   PROT_EXEC | PROT_READ | PROT_WRITE | PROT_BTI,
+                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    if (p == MAP_FAILED) {
+        perror("mmap");
+        return 1;
+    }
+
+    memset(&sa, 0, sizeof(sa));
+    sa.sa_sigaction = skip2_sigill;
+    sa.sa_flags = SA_SIGINFO;
+    if (sigaction(SIGILL, &sa, NULL) < 0) {
+        perror("sigaction");
+        return 1;
+    }
+
+    memcpy(p, test_begin, test_end - test_begin);
+    return ((int (*)(void))p)();
+}
diff --git a/tests/tcg/aarch64/bti-crt.inc.c b/tests/tcg/aarch64/bti-crt.inc.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/tcg/aarch64/bti-crt.inc.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Minimal user-environment for testing BTI.
+ *
+ * Normal libc is not (yet) built with BTI support enabled,
+ * and so could generate a BTI TRAP before ever reaching main.
+ */
+
+#include <stdlib.h>
+#include <signal.h>
+#include <ucontext.h>
+#include <asm/unistd.h>
+
+int main(void);
+
+void _start(void)
+{
+    exit(main());
+}
+
+void exit(int ret)
+{
+    register int x0 __asm__("x0") = ret;
+    register int x8 __asm__("x8") = __NR_exit;
+
+    asm volatile("svc #0" : : "r"(x0), "r"(x8));
+    __builtin_unreachable();
+}
+
+/*
+ * Irritatingly, the user API struct sigaction does not match the
+ * kernel API struct sigaction.  So for simplicity, isolate the
+ * kernel ABI here, and make this act like signal.
+ */
+void signal_info(int sig, void (*fn)(int, siginfo_t *, ucontext_t *))
+{
+    struct kernel_sigaction {
+        void (*handler)(int, siginfo_t *, ucontext_t *);
+        unsigned long flags;
+        unsigned long restorer;
+        unsigned long mask;
+    } sa = { fn, SA_SIGINFO, 0, 0 };
+
+    register int x0 __asm__("x0") = sig;
+    register void *x1 __asm__("x1") = &sa;
+    register void *x2 __asm__("x2") = 0;
+    register int x3 __asm__("x3") = sizeof(unsigned long);
+    register int x8 __asm__("x8") = __NR_rt_sigaction;
+
+    asm volatile("svc #0"
+                 : : "r"(x0), "r"(x1), "r"(x2), "r"(x3), "r"(x8) : "memory");
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index XXXXXXX..XXXXXXX 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -XXX,XX +XXX,XX @@ run-pauth-%: QEMU_OPTS += -cpu max
 run-plugin-pauth-%: QEMU_OPTS += -cpu max
 endif
 
+# BTI Tests
+# bti-1 tests the elf notes, so we require special compiler support.
+ifneq ($(DOCKER_IMAGE)$(CROSS_CC_HAS_ARMV8_BTI),)
+AARCH64_TESTS += bti-1
+bti-1: CFLAGS += -mbranch-protection=standard
+bti-1: LDFLAGS += -nostdlib
+endif
+# bti-2 tests PROT_BTI, so no special compiler support required.
+AARCH64_TESTS += bti-2
+
 # Semihosting smoke test for linux-user
 AARCH64_TESTS += semihosting
 run-semihosting: semihosting
diff --git a/tests/tcg/configure.sh b/tests/tcg/configure.sh
index XXXXXXX..XXXXXXX 100755
--- a/tests/tcg/configure.sh
+++ b/tests/tcg/configure.sh
@@ -XXX,XX +XXX,XX @@ for target in $target_list; do
                -march=armv8.3-a -o $TMPE $TMPC; then
                 echo "CROSS_CC_HAS_ARMV8_3=y" >> $config_target_mak
             fi
+            if do_compiler "$target_compiler" $target_compiler_cflags \
+               -mbranch-protection=standard -o $TMPE $TMPC; then
+                echo "CROSS_CC_HAS_ARMV8_BTI=y" >> $config_target_mak
+            fi
         ;;
     esac
 
-- 
2.20.1

The following changes since commit 5a67d7735d4162630769ef495cf813244fc850df:

  Merge remote-tracking branch 'remotes/berrange-gitlab/tags/tls-deps-pull-request' into staging (2021-07-02 08:22:39 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210702

for you to fetch changes up to 04ea4d3cfd0a21b248ece8eb7a9436a3d9898dd8:

  target/arm: Implement MVE shifts by register (2021-07-02 11:48:38 +0100)

----------------------------------------------------------------
target-arm queue:
 * more MVE instructions
 * hw/gpio/gpio_pwr: use shutdown function for reboot
 * target/arm: Check NaN mode before silencing NaN
 * tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine
 * hw/arm: Add basic power management to raspi.
 * docs/system/arm: Add quanta-gbs-bmc, quanta-q7l1-bmc

----------------------------------------------------------------
Joe Komlodi (1):
      target/arm: Check NaN mode before silencing NaN

Maxim Uvarov (1):
      hw/gpio/gpio_pwr: use shutdown function for reboot

Nolan Leake (1):
      hw/arm: Add basic power management to raspi.

Patrick Venture (2):
      docs/system/arm: Add quanta-q7l1-bmc reference
      docs/system/arm: Add quanta-gbs-bmc reference

Peter Maydell (18):
      target/arm: Fix MVE widening/narrowing VLDR/VSTR offset calculation
      target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
      target/arm: Make asimd_imm_const() public
      target/arm: Use asimd_imm_const for A64 decode
      target/arm: Use dup_const() instead of bitfield_replicate()
      target/arm: Implement MVE logical immediate insns
      target/arm: Implement MVE vector shift left by immediate insns
      target/arm: Implement MVE vector shift right by immediate insns
      target/arm: Implement MVE VSHLL
      target/arm: Implement MVE VSRI, VSLI
      target/arm: Implement MVE VSHRN, VRSHRN
      target/arm: Implement MVE saturating narrowing shifts
      target/arm: Implement MVE VSHLC
      target/arm: Implement MVE VADDLV
      target/arm: Implement MVE long shifts by immediate
      target/arm: Implement MVE long shifts by register
      target/arm: Implement MVE shifts by immediate
      target/arm: Implement MVE shifts by register

Philippe Mathieu-Daudé (1):
      tests: Boot and halt a Linux guest on the Raspberry Pi 2 machine

 docs/system/arm/aspeed.rst             |   1 +
 docs/system/arm/nuvoton.rst            |   5 +-
 include/hw/arm/bcm2835_peripherals.h   |   3 +-
 include/hw/misc/bcm2835_powermgt.h     |  29 ++
 target/arm/helper-mve.h                | 108 +++++++
 target/arm/translate.h                 |  41 +++
 target/arm/mve.decode                  | 177 ++++++++++-
 target/arm/t32.decode                  |  71 ++++-
 hw/arm/bcm2835_peripherals.c           |  13 +-
 hw/gpio/gpio_pwr.c                     |   2 +-
 hw/misc/bcm2835_powermgt.c             | 160 ++++++++++
 target/arm/helper-a64.c                |  12 +-
 target/arm/mve_helper.c                | 524 +++++++++++++++++++++++++++++++--
 target/arm/translate-a64.c             |  86 +-----
 target/arm/translate-mve.c             | 261 +++++++++++++++-
 target/arm/translate-neon.c            |  81 -----
 target/arm/translate.c                 | 327 +++++++++++++++++++-
 target/arm/vfp_helper.c                |  24 +-
 hw/misc/meson.build                    |   1 +
 tests/acceptance/boot_linux_console.py |  43 +++
 20 files changed, 1760 insertions(+), 209 deletions(-)
 create mode 100644 include/hw/misc/bcm2835_powermgt.h
 create mode 100644 hw/misc/bcm2835_powermgt.c

From: Patrick Venture <venture@google.com>

Add a line item reference to the quanta-gbs-bmc machine.

Signed-off-by: Patrick Venture <venture@google.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20210615192848.1065297-3-venture@google.com
[PMM: fixed underline Sphinx warning]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/nuvoton.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/nuvoton.rst
+++ b/docs/system/arm/nuvoton.rst
@@ -XXX,XX +XXX,XX @@
-Nuvoton iBMC boards (``npcm750-evb``, ``quanta-gsj``)
-=====================================================
+Nuvoton iBMC boards (``*-bmc``, ``npcm750-evb``, ``quanta-gsj``)
+================================================================
 
 The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are
 designed to be used as Baseboard Management Controllers (BMCs) in various
@@ -XXX,XX +XXX,XX @@ segment. The following machines are based on this chip :
 The NPCM730 SoC has two Cortex-A9 cores and is targeted for Data Center and
 Hyperscale applications. The following machines are based on this chip :
 
+- ``quanta-gbs-bmc``    Quanta GBS server BMC
 - ``quanta-gsj``        Quanta GSJ server BMC
 
 There are also two more SoCs, NPCM710 and NPCM705, which are single-core
-- 
2.20.1

From: Nolan Leake <nolan@sigbus.net>

This is just enough to make reboot and poweroff work. It works for
Linux, U-Boot, and Arm Trusted Firmware. It has not been tested with
Plan 9 or bare-metal/hobby OSes, but it should work for them too, since
they generally seem to do what Linux does for reset.

The watchdog timer functionality is not yet implemented.
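
The register protocol the model implements can be summarised with a
small standalone sketch. It mirrors the constants used in this patch
(password 0x5a in the top byte, RSTC at 0x1c, RSTS at 0x20, WDOG at
0x24), but pm_write, struct pm_regs and enum pm_action are
illustrative names and not QEMU APIs.

```c
#include <stdint.h>

/* What a write asks the machine to do (illustrative). */
enum pm_action { PM_NONE, PM_RESET, PM_POWEROFF };

struct pm_regs {
    uint32_t rstc;
    uint32_t rsts;
    uint32_t wdog;
};

/*
 * Model of a 32-bit guest write to the power management block.
 * Writes without the 0x5a password in bits [31:24] are ignored.
 * Setting the reset bit (0x20) in RSTC requests a poweroff if RSTS
 * holds the halt value 0x555, and a reset otherwise.
 */
static enum pm_action pm_write(struct pm_regs *s, uint32_t offset,
                               uint32_t value)
{
    if ((value & 0xff000000u) != 0x5a000000u) {
        return PM_NONE;                 /* bad password */
    }
    value &= ~0xff000000u;

    switch (offset) {
    case 0x1c:                          /* RSTC */
        s->rstc = value;
        if (value & 0x20) {
            return (s->rsts & 0xfff) == 0x555 ? PM_POWEROFF : PM_RESET;
        }
        break;
    case 0x20:                          /* RSTS */
        s->rsts = value;
        break;
    case 0x24:                          /* WDOG: stored, no timer yet */
        s->wdog = value;
        break;
    default:
        break;
    }
    return PM_NONE;
}
```

Under this model a guest that wants to halt first writes 0x5a000555 to
RSTS and then 0x5a000020 to RSTC. Writing RSTC alone from the reset
state yields a reboot.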

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/64
Signed-off-by: Nolan Leake <nolan@sigbus.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210625210209.1870217-1-nolan@sigbus.net
[PMM: tweaked commit title; fixed region size to 0x200;
 moved header file to include/]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/bcm2835_peripherals.h |   3 +-
 include/hw/misc/bcm2835_powermgt.h   |  29 +++++
 hw/arm/bcm2835_peripherals.c         |  13 ++-
 hw/misc/bcm2835_powermgt.c           | 160 +++++++++++++++++++++++++++
 hw/misc/meson.build                  |   1 +
 5 files changed, 204 insertions(+), 2 deletions(-)
 create mode 100644 include/hw/misc/bcm2835_powermgt.h
 create mode 100644 hw/misc/bcm2835_powermgt.c

diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/bcm2835_peripherals.h
+++ b/include/hw/arm/bcm2835_peripherals.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/misc/bcm2835_mphi.h"
 #include "hw/misc/bcm2835_thermal.h"
 #include "hw/misc/bcm2835_cprman.h"
+#include "hw/misc/bcm2835_powermgt.h"
 #include "hw/sd/sdhci.h"
 #include "hw/sd/bcm2835_sdhost.h"
 #include "hw/gpio/bcm2835_gpio.h"
@@ -XXX,XX +XXX,XX @@ struct BCM2835PeripheralState {
     BCM2835MphiState mphi;
     UnimplementedDeviceState txp;
     UnimplementedDeviceState armtmr;
-    UnimplementedDeviceState powermgt;
+    BCM2835PowerMgtState powermgt;
     BCM2835CprmanState cprman;
     PL011State uart0;
     BCM2835AuxState aux;
diff --git a/include/hw/misc/bcm2835_powermgt.h b/include/hw/misc/bcm2835_powermgt.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/misc/bcm2835_powermgt.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * BCM2835 Power Management emulation
+ *
+ * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
+ * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef BCM2835_POWERMGT_H
+#define BCM2835_POWERMGT_H
+
+#include "hw/sysbus.h"
+#include "qom/object.h"
+
+#define TYPE_BCM2835_POWERMGT "bcm2835-powermgt"
+OBJECT_DECLARE_SIMPLE_TYPE(BCM2835PowerMgtState, BCM2835_POWERMGT)
+
+struct BCM2835PowerMgtState {
+    SysBusDevice busdev;
+    MemoryRegion iomem;
+
+    uint32_t rstc;
+    uint32_t rsts;
+    uint32_t wdog;
+};
+
+#endif
diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_init(Object *obj)
 
     object_property_add_const_link(OBJECT(&s->dwc2), "dma-mr",
                                    OBJECT(&s->gpu_bus_mr));
+
+    /* Power Management */
+    object_initialize_child(obj, "powermgt", &s->powermgt,
+                            TYPE_BCM2835_POWERMGT);
 }
 
 static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
         qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_GPU_IRQ,
                                INTERRUPT_USB));
 
+    /* Power Management */
+    if (!sysbus_realize(SYS_BUS_DEVICE(&s->powermgt), errp)) {
+        return;
+    }
+
+    memory_region_add_subregion(&s->peri_mr, PM_OFFSET,
+                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->powermgt), 0));
+
     create_unimp(s, &s->txp, "bcm2835-txp", TXP_OFFSET, 0x1000);
     create_unimp(s, &s->armtmr, "bcm2835-sp804", ARMCTRL_TIMER0_1_OFFSET, 0x40);
-    create_unimp(s, &s->powermgt, "bcm2835-powermgt", PM_OFFSET, 0x114);
     create_unimp(s, &s->i2s, "bcm2835-i2s", I2S_OFFSET, 0x100);
     create_unimp(s, &s->smi, "bcm2835-smi", SMI_OFFSET, 0x100);
     create_unimp(s, &s->spi[0], "bcm2835-spi0", SPI0_OFFSET, 0x20);
diff --git a/hw/misc/bcm2835_powermgt.c b/hw/misc/bcm2835_powermgt.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/misc/bcm2835_powermgt.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * BCM2835 Power Management emulation
+ *
+ * Copyright (C) 2017 Marcin Chojnacki <marcinch7@gmail.com>
+ * Copyright (C) 2021 Nolan Leake <nolan@sigbus.net>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "hw/misc/bcm2835_powermgt.h"
+#include "migration/vmstate.h"
+#include "sysemu/runstate.h"
+
+#define PASSWORD 0x5a000000
+#define PASSWORD_MASK 0xff000000
+
+#define R_RSTC 0x1c
+#define V_RSTC_RESET 0x20
+#define R_RSTS 0x20
+#define V_RSTS_POWEROFF 0x555 /* Linux uses partition 63 to indicate halt. */
+#define R_WDOG 0x24
+
+static uint64_t bcm2835_powermgt_read(void *opaque, hwaddr offset,
+                                      unsigned size)
+{
+    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
+    uint32_t res = 0;
+
+    switch (offset) {
+    case R_RSTC:
+        res = s->rstc;
+        break;
+    case R_RSTS:
+        res = s->rsts;
+        break;
+    case R_WDOG:
+        res = s->wdog;
+        break;
+
+    default:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_read: Unknown offset 0x%08"HWADDR_PRIx
+                      "\n", offset);
+        res = 0;
+        break;
+    }
+
+    return res;
+}
+
+static void bcm2835_powermgt_write(void *opaque, hwaddr offset,
+                                   uint64_t value, unsigned size)
+{
+    BCM2835PowerMgtState *s = (BCM2835PowerMgtState *)opaque;
+
+    if ((value & PASSWORD_MASK) != PASSWORD) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "bcm2835_powermgt_write: Bad password 0x%"PRIx64
+                      " at offset 0x%08"HWADDR_PRIx"\n",
+                      value, offset);
+        return;
+    }
+
+    value = value & ~PASSWORD_MASK;
+
+    switch (offset) {
+    case R_RSTC:
+        s->rstc = value;
+        if (value & V_RSTC_RESET) {
+            if ((s->rsts & 0xfff) == V_RSTS_POWEROFF) {
+                qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+            } else {
+                qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+            }
+        }
+        break;
+    case R_RSTS:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_write: RSTS\n");
+        s->rsts = value;
+        break;
+    case R_WDOG:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_write: WDOG\n");
+        s->wdog = value;
+        break;
+
+    default:
+        qemu_log_mask(LOG_UNIMP,
+                      "bcm2835_powermgt_write: Unknown offset 0x%08"HWADDR_PRIx
+                      "\n", offset);
+        break;
+    }
+}
+
+static const MemoryRegionOps bcm2835_powermgt_ops = {
+    .read = bcm2835_powermgt_read,
+    .write = bcm2835_powermgt_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .impl.min_access_size = 4,
+    .impl.max_access_size = 4,
+};
+
+static const VMStateDescription vmstate_bcm2835_powermgt = {
+    .name = TYPE_BCM2835_POWERMGT,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(rstc, BCM2835PowerMgtState),
+        VMSTATE_UINT32(rsts, BCM2835PowerMgtState),
+        VMSTATE_UINT32(wdog, BCM2835PowerMgtState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void bcm2835_powermgt_init(Object *obj)
+{
+    BCM2835PowerMgtState *s = BCM2835_POWERMGT(obj);
+
+    memory_region_init_io(&s->iomem, obj, &bcm2835_powermgt_ops, s,
+                          TYPE_BCM2835_POWERMGT, 0x200);
+    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem);
+}
+
+static void bcm2835_powermgt_reset(DeviceState *dev)
+{
+    BCM2835PowerMgtState *s = BCM2835_POWERMGT(dev);
+
+    /* https://elinux.org/BCM2835_registers#PM */
+    s->rstc = 0x00000102;
+    s->rsts = 0x00001000;
+    s->wdog = 0x00000000;
+}
+
+static void bcm2835_powermgt_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->reset = bcm2835_powermgt_reset;
+    dc->vmsd = &vmstate_bcm2835_powermgt;
+}
+
+static TypeInfo bcm2835_powermgt_info = {
+    .name          = TYPE_BCM2835_POWERMGT,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(BCM2835PowerMgtState),
+    .class_init    = bcm2835_powermgt_class_init,
+    .instance_init = bcm2835_powermgt_init,
+};
+
+static void bcm2835_powermgt_register_types(void)
+{
+    type_register_static(&bcm2835_powermgt_info);
+}
+
+type_init(bcm2835_powermgt_register_types)
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -XXX,XX +XXX,XX @@ softmmu_ss.add(when: 'CONFIG_RASPI', if_true: files(
   'bcm2835_rng.c',
   'bcm2835_thermal.c',
   'bcm2835_cprman.c',
+  'bcm2835_powermgt.c',
 ))
 softmmu_ss.add(when: 'CONFIG_SLAVIO', if_true: files('slavio_misc.c'))
 softmmu_ss.add(when: 'CONFIG_ZYNQ', if_true: files('zynq_slcr.c', 'zynq-xadc.c'))
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Add a test that boots and then quickly shuts down a raspi2 machine,
to exercise the power management model:

(1/1) tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_raspi2_initrd:
  console: [    0.000000] Booting Linux on physical CPU 0xf00
  console: [    0.000000] Linux version 4.14.98-v7+ (dom@dom-XPS-13-9370) (gcc version 4.9.3 (crosstool-NG crosstool-ng-1.22.0-88-g8460611)) #1200 SMP Tue Feb 12 20:27:48 GMT 2019
  console: [    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
  console: [    0.000000] CPU: div instructions available: patching division code
  console: [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
  console: [    0.000000] OF: fdt: Machine model: Raspberry Pi 2 Model B
  ...
  console: Boot successful.
  console: cat /proc/cpuinfo
  console: / # cat /proc/cpuinfo
  ...
  console: processor      : 3
  console: model name     : ARMv7 Processor rev 5 (v7l)
  console: BogoMIPS       : 125.00
  console: Features       : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
  console: CPU implementer        : 0x41
  console: CPU architecture: 7
  console: CPU variant    : 0x0
  console: CPU part       : 0xc07
  console: CPU revision   : 5
  console: Hardware       : BCM2835
  console: Revision       : 0000
  console: Serial         : 0000000000000000
  console: cat /proc/iomem
  console: / # cat /proc/iomem
  console: 00000000-3bffffff : System RAM
  console: 00008000-00afffff : Kernel code
  console: 00c00000-00d468ef : Kernel data
  console: 3f006000-3f006fff : dwc_otg
  console: 3f007000-3f007eff : /soc/dma@7e007000
  console: 3f00b880-3f00b8bf : /soc/mailbox@7e00b880
  console: 3f100000-3f100027 : /soc/watchdog@7e100000
  console: 3f101000-3f102fff : /soc/cprman@7e101000
  console: 3f200000-3f2000b3 : /soc/gpio@7e200000
  PASS (24.59 s)
  RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 25.02 s

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Message-id: 20210531113837.1689775-1-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/acceptance/boot_linux_console.py | 43 ++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index XXXXXXX..XXXXXXX 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -XXX,XX +XXX,XX @@
 from avocado import skip
 from avocado import skipUnless
 from avocado_qemu import Test
+from avocado_qemu import exec_command
 from avocado_qemu import exec_command_and_wait_for_pattern
 from avocado_qemu import interrupt_interactive_console_until_pattern
 from avocado_qemu import wait_for_console_pattern
@@ -XXX,XX +XXX,XX @@ def test_arm_raspi2_uart0(self):
         """
         self.do_test_arm_raspi2(0)
 
+    def test_arm_raspi2_initrd(self):
+        """
+        :avocado: tags=arch:arm
+        :avocado: tags=machine:raspi2
+        """
+        deb_url = ('http://archive.raspberrypi.org/debian/'
+                   'pool/main/r/raspberrypi-firmware/'
+                   'raspberrypi-kernel_1.20190215-1_armhf.deb')
+        deb_hash = 'cd284220b32128c5084037553db3c482426f3972'
+        deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
+        kernel_path = self.extract_from_deb(deb_path, '/boot/kernel7.img')
+        dtb_path = self.extract_from_deb(deb_path, '/boot/bcm2709-rpi-2-b.dtb')
+
+        initrd_url = ('https://github.com/groeck/linux-build-test/raw/'
+                      '2eb0a73b5d5a28df3170c546ddaaa9757e1e0848/rootfs/'
+                      'arm/rootfs-armv7a.cpio.gz')
+        initrd_hash = '604b2e45cdf35045846b8bbfbf2129b1891bdc9c'
+        initrd_path_gz = self.fetch_asset(initrd_url, asset_hash=initrd_hash)
+        initrd_path = os.path.join(self.workdir, 'rootfs.cpio')
+        archive.gzip_uncompress(initrd_path_gz, initrd_path)
+
+        self.vm.set_console()
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'earlycon=pl011,0x3f201000 console=ttyAMA0 '
+                               'panic=-1 noreboot ' +
+                               'dwc_otg.fiq_fsm_enable=0')
+        self.vm.add_args('-kernel', kernel_path,
+                         '-dtb', dtb_path,
+                         '-initrd', initrd_path,
+                         '-append', kernel_command_line,
+                         '-no-reboot')
+        self.vm.launch()
+        self.wait_for_console_pattern('Boot successful.')
+
+        exec_command_and_wait_for_pattern(self, 'cat /proc/cpuinfo',
+                                                'BCM2835')
+        exec_command_and_wait_for_pattern(self, 'cat /proc/iomem',
+                                                '/soc/cprman@7e101000')
+        exec_command(self, 'halt')
+        # Wait for VM to shut down gracefully
+        self.vm.wait()
+
     def test_arm_exynos4210_initrd(self):
         """
         :avocado: tags=arch:arm
-- 
2.20.1

From: Joe Komlodi <joe.komlodi@xilinx.com>

If the CPU is running in default NaN mode (FPCR.DN == 1) and we execute
FRSQRTE, FRECPE, or FRECPX with a signaling NaN, parts_silence_nan_frac() will
assert due to fpst->default_nan_mode being set.

To avoid this, we check to see what NaN mode we're running in before we call
floatxx_silence_nan().

Signed-off-by: Joe Komlodi <joe.komlodi@xilinx.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1624662174-175828-2-git-send-email-joe.komlodi@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-a64.c | 12 +++++++++---
 target/arm/vfp_helper.c | 24 ++++++++++++++++++------
 2 files changed, 27 insertions(+), 9 deletions(-)
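
The control flow that each of these helpers ends up with can be
sketched in isolation. This is a minimal model, not QEMU code: ToyStatus
and every toy_* function are hypothetical stand-ins for float_status and
the softfloat routines, single precision is handled as raw bits, and the
assertion in toy_silence_nan() only mimics the one the commit message
describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of float_status. */
typedef struct {
    bool default_nan_mode;   /* models FPCR.DN == 1 */
    bool invalid_raised;     /* models float_flag_invalid */
} ToyStatus;

static bool toy_is_any_nan(uint32_t f)
{
    return (f & 0x7fffffffu) > 0x7f800000u;
}

static bool toy_is_signaling_nan(uint32_t f)
{
    /* Arm convention: a NaN with fraction bit 22 clear is signaling. */
    return toy_is_any_nan(f) && !(f & 0x00400000u);
}

static uint32_t toy_silence_nan(uint32_t f, const ToyStatus *st)
{
    /* Mimics the assertion that fired before this fix. */
    assert(!st->default_nan_mode);
    return f | 0x00400000u;
}

/* NaN result selection as done by the patched helpers. */
static uint32_t toy_nan_result(uint32_t f, ToyStatus *st)
{
    uint32_t nan = f;

    assert(toy_is_any_nan(f));
    if (toy_is_signaling_nan(f)) {
        st->invalid_raised = true;
        if (!st->default_nan_mode) {
            nan = toy_silence_nan(f, st);
        }
    }
    if (st->default_nan_mode) {
        nan = 0x7fc00000u;   /* default NaN, single precision */
    }
    return nan;
}
```

In default NaN mode the silenced value would be overwritten anyway, so
skipping the call loses nothing; the invalid flag is still raised in
both modes.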

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(frecpx_f16)(uint32_t a, void *fpstp)
         float16 nan = a;
         if (float16_is_signaling_nan(a, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float16_silence_nan(a, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float16_silence_nan(a, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan = float16_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(frecpx_f32)(float32 a, void *fpstp)
         float32 nan = a;
         if (float32_is_signaling_nan(a, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float32_silence_nan(a, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float32_silence_nan(a, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan = float32_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(frecpx_f64)(float64 a, void *fpstp)
         float64 nan = a;
         if (float64_is_signaling_nan(a, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float64_silence_nan(a, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float64_silence_nan(a, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan = float64_default_nan(fpst);
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(recpe_f16)(uint32_t input, void *fpstp)
         float16 nan = f16;
         if (float16_is_signaling_nan(f16, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float16_silence_nan(f16, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float16_silence_nan(f16, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan =  float16_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(recpe_f32)(float32 input, void *fpstp)
         float32 nan = f32;
         if (float32_is_signaling_nan(f32, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float32_silence_nan(f32, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float32_silence_nan(f32, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan =  float32_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(recpe_f64)(float64 input, void *fpstp)
         float64 nan = f64;
         if (float64_is_signaling_nan(f64, fpst)) {
             float_raise(float_flag_invalid, fpst);
-            nan = float64_silence_nan(f64, fpst);
+            if (!fpst->default_nan_mode) {
+                nan = float64_silence_nan(f64, fpst);
+            }
         }
         if (fpst->default_nan_mode) {
             nan =  float64_default_nan(fpst);
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(rsqrte_f16)(uint32_t input, void *fpstp)
         float16 nan = f16;
         if (float16_is_signaling_nan(f16, s)) {
             float_raise(float_flag_invalid, s);
-            nan = float16_silence_nan(f16, s);
+            if (!s->default_nan_mode) {
+                nan = float16_silence_nan(f16, s);
+            }
         }
         if (s->default_nan_mode) {
             nan =  float16_default_nan(s);
@@ -XXX,XX +XXX,XX @@ float32 HELPER(rsqrte_f32)(float32 input, void *fpstp)
         float32 nan = f32;
         if (float32_is_signaling_nan(f32, s)) {
             float_raise(float_flag_invalid, s);
-            nan = float32_silence_nan(f32, s);
+            if (!s->default_nan_mode) {
+                nan = float32_silence_nan(f32, s);
+            }
         }
         if (s->default_nan_mode) {
             nan =  float32_default_nan(s);
@@ -XXX,XX +XXX,XX @@ float64 HELPER(rsqrte_f64)(float64 input, void *fpstp)
         float64 nan = f64;
         if (float64_is_signaling_nan(f64, s)) {
             float_raise(float_flag_invalid, s);
-            nan = float64_silence_nan(f64, s);
+            if (!s->default_nan_mode) {
+                nan = float64_silence_nan(f64, s);
+            }
         }
         if (s->default_nan_mode) {
             nan =  float64_default_nan(s);
-- 
2.20.1

From: Maxim Uvarov <maxim.uvarov@linaro.org>

QEMU has two types of functions: shutdown and reboot. The shutdown
function has to be used for machine shutdown; otherwise we cause
a reset with a bogus "cause" value when we intended a shutdown.

Signed-off-by: Maxim Uvarov <maxim.uvarov@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210625111842.3790-3-maxim.uvarov@linaro.org
[PMM: tweaked commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/gpio/gpio_pwr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/gpio/gpio_pwr.c b/hw/gpio/gpio_pwr.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/gpio/gpio_pwr.c
+++ b/hw/gpio/gpio_pwr.c
@@ -XXX,XX +XXX,XX @@ static void gpio_pwr_reset(void *opaque, int n, int level)
 static void gpio_pwr_shutdown(void *opaque, int n, int level)
 {
     if (level) {
-        qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     }
 }
 
-- 
2.20.1

In do_ldst(), the calculation of the offset needs to be based on the
size of the memory access, not the size of the elements in the
vector.  This meant we were getting it wrong for the widening and
narrowing variants of the various VLDR and VSTR insns.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-2-peter.maydell@linaro.org
---
 target/arm/translate-mve.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)
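
To make the scaling concrete, here is a small standalone sketch of the
offset computation. The TOY_MO_* constants and toy_ldst_offset() are
hypothetical names; the constants only imitate the log2-of-bytes meaning
of MO_8/MO_16/MO_32, and "add" plays the role of a->a.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* log2 of the memory access size in bytes (hypothetical constants). */
enum { TOY_MO_8 = 0, TOY_MO_16 = 1, TOY_MO_32 = 2 };

/*
 * Scale the 7-bit immediate by the memory access size, then negate
 * it when the instruction subtracts the offset.
 */
static uint32_t toy_ldst_offset(uint32_t imm7, unsigned msize, bool add)
{
    uint32_t offset;

    assert(imm7 < 128 && msize <= TOY_MO_32);
    offset = imm7 << msize;
    return add ? offset : -offset;
}
```

For a byte load widened into halfword elements with imm = 3, the access
size gives toy_ldst_offset(3, TOY_MO_8, true) == 3. Scaling by the
halfword element size instead would give 6, which is the kind of error
this patch removes.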

diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ static bool mve_skip_first_beat(DisasContext *s)
     }
 }
 
-static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
+static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn,
+                    unsigned msize)
 {
     TCGv_i32 addr;
     uint32_t offset;
@@ -XXX,XX +XXX,XX @@ static bool do_ldst(DisasContext *s, arg_VLDR_VSTR *a, MVEGenLdStFn *fn)
         return true;
     }
 
-    offset = a->imm << a->size;
+    offset = a->imm << msize;
     if (!a->a) {
         offset = -offset;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR(DisasContext *s, arg_VLDR_VSTR *a)
         { gen_helper_mve_vstrw, gen_helper_mve_vldrw },
         { NULL, NULL }
     };
-    return do_ldst(s, a, ldstfns[a->size][a->l]);
+    return do_ldst(s, a, ldstfns[a->size][a->l], a->size);
 }
 
-#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST)                  \
+#define DO_VLDST_WIDE_NARROW(OP, SLD, ULD, ST, MSIZE)           \
     static bool trans_##OP(DisasContext *s, arg_VLDR_VSTR *a)   \
     {                                                           \
         static MVEGenLdStFn * const ldstfns[2][2] = {           \
             { gen_helper_mve_##ST, gen_helper_mve_##SLD },      \
             { NULL, gen_helper_mve_##ULD },                     \
         };                                                      \
-        return do_ldst(s, a, ldstfns[a->u][a->l]);              \
+        return do_ldst(s, a, ldstfns[a->u][a->l], MSIZE);       \
     }
 
-DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h)
-DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w)
-DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w)
+DO_VLDST_WIDE_NARROW(VLDSTB_H, vldrb_sh, vldrb_uh, vstrb_h, MO_8)
+DO_VLDST_WIDE_NARROW(VLDSTB_W, vldrb_sw, vldrb_uw, vstrb_w, MO_8)
+DO_VLDST_WIDE_NARROW(VLDSTH_W, vldrh_sw, vldrh_uw, vstrh_w, MO_16)
 
 static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
 {
-- 
2.20.1

The initial implementation of the MVE VRMLALDAVH and VRMLSLDAVH
insns had some bugs:
 * the 32x32 multiply of elements was being done as 32x32->32,
   not 32x32->64
 * we were incorrectly maintaining the accumulator in its full
   72-bit form across all 4 beats of the insn; in the pseudocode
   it is squashed back into the 64 bits of the RdaHi:RdaLo
   registers after each beat

In particular, fixing the second of these allows us to recast
the implementation to avoid 128-bit arithmetic entirely.

Since the element size here is always 4, we can also drop the
parameterization of ESIZE to make the code a little more readable.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-3-peter.maydell@linaro.org
---
 target/arm/mve_helper.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)
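
The per-beat arithmetic can be tried out with a simplified model. This
sketch assumes the signed, non-exchanging forms only and leaves out
predication, the XCHG variants and the H4() host-endian indexing; the
toy_* names are hypothetical. Like QEMU itself, it assumes that >> on a
negative signed value is an arithmetic shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Drop the low 8 bits of the product, rounding on bit 7. */
static int64_t toy_round_shift8(int64_t mul)
{
    return (mul >> 8) + ((mul >> 7) & 1);
}

/*
 * Accumulate four 32x32->64 products into the 64-bit accumulator,
 * squashing each product to 64 bits before it is added. With sub set,
 * odd-numbered elements are subtracted instead of added.
 */
static uint64_t toy_vrmlaldavh(const int32_t n[4], const int32_t m[4],
                               uint64_t a, bool sub)
{
    assert(n && m);
    for (unsigned e = 0; e < 4; e++) {
        int64_t mul = (int64_t)n[e] * m[e];   /* widen before multiplying */

        if (sub && (e & 1)) {
            mul = -mul;
        }
        a += (uint64_t)toy_round_shift8(mul);
    }
    return a;
}
```

With n[0] = m[0] = 0x40000000 the product is 2^60, which contributes
2^52 to the accumulator; a 32x32->32 multiply would have truncated it to
zero.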

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/int128.h"
 #include "cpu.h"
 #include "internals.h"
 #include "vec_internal.h"
@@ -XXX,XX +XXX,XX @@ DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
 DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
 
 /*
- * Rounding multiply add long dual accumulate high: we must keep
- * a 72-bit internal accumulator value and return the top 64 bits.
+ * Rounding multiply add long dual accumulate high. In the pseudocode
+ * this is implemented with a 72-bit internal accumulator value of which
+ * the top 64 bits are returned. We optimize this to avoid having to
+ * use 128-bit arithmetic -- we can do this because the 72-bit accumulator
+ * is squashed back into 64-bits after each beat.
  */
-#define DO_LDAVH(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC, TO128)         \
+#define DO_LDAVH(OP, TYPE, LTYPE, XCHG, SUB)                            \
     uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
                                     void *vm, uint64_t a)               \
     {                                                                   \
         uint16_t mask = mve_element_mask(env);                          \
         unsigned e;                                                     \
         TYPE *n = vn, *m = vm;                                          \
-        Int128 acc = int128_lshift(TO128(a), 8);                        \
-        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
             if (mask & 1) {                                             \
+                LTYPE mul;                                              \
                 if (e & 1) {                                            \
-                    acc = ODDACC(acc, TO128(n[H##ESIZE(e - 1 * XCHG)] * \
-                                            m[H##ESIZE(e)]));           \
+                    mul = (LTYPE)n[H4(e - 1 * XCHG)] * m[H4(e)];        \
+                    if (SUB) {                                          \
+                        mul = -mul;                                     \
+                    }                                                   \
                 } else {                                                \
-                    acc = EVENACC(acc, TO128(n[H##ESIZE(e + 1 * XCHG)] * \
-                                             m[H##ESIZE(e)]));          \
+                    mul = (LTYPE)n[H4(e + 1 * XCHG)] * m[H4(e)];        \
                 }                                                       \
-                acc = int128_add(acc, int128_make64(1 << 7));           \
+                mul = (mul >> 8) + ((mul >> 7) & 1);                    \
+                a += mul;                                               \
             }                                                           \
         }                                                               \
         mve_advance_vpt(env);                                           \
-        return int128_getlo(int128_rshift(acc, 8));                     \
+        return a;                                                       \
     }
 
-DO_LDAVH(vrmlaldavhsw, 4, int32_t, false, int128_add, int128_add, int128_makes64)
-DO_LDAVH(vrmlaldavhxsw, 4, int32_t, true, int128_add, int128_add, int128_makes64)
+DO_LDAVH(vrmlaldavhsw, int32_t, int64_t, false, false)
+DO_LDAVH(vrmlaldavhxsw, int32_t, int64_t, true, false)
 
-DO_LDAVH(vrmlaldavhuw, 4, uint32_t, false, int128_add, int128_add, int128_make64)
+DO_LDAVH(vrmlaldavhuw, uint32_t, uint64_t, false, false)
 
-DO_LDAVH(vrmlsldavhsw, 4, int32_t, false, int128_add, int128_sub, int128_makes64)
-DO_LDAVH(vrmlsldavhxsw, 4, int32_t, true, int128_add, int128_sub, int128_makes64)
+DO_LDAVH(vrmlsldavhsw, int32_t, int64_t, false, true)
+DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
 
 /* Vector add across vector */
 #define DO_VADDV(OP, ESIZE, TYPE)                               \
-- 
2.20.1

The function asimd_imm_const() in translate-neon.c is an
implementation of the pseudocode AdvSIMDExpandImm(), which we will
also want for MVE.  Move the implementation to translate.c, with a
prototype in translate.h.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-4-peter.maydell@linaro.org
---
 target/arm/translate.h      | 16 ++++++++++
 target/arm/translate-neon.c | 63 -------------------------------------
 target/arm/translate.c      | 57 +++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+), 63 deletions(-)
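
Two of the expansion cases are easy to check by hand. The sketch below
is a reduced, standalone illustration rather than the function being
moved: toy_expand_byte_mask() models only cmode == 14 with op == 1, and
toy_dup32() models the final replication of a 32-bit value into both
halves of the result. Both names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* cmode == 14, op == 1: each bit of imm8 selects one all-ones byte. */
static uint64_t toy_expand_byte_mask(uint32_t imm8)
{
    uint64_t imm64 = 0;

    assert(imm8 <= 0xff);
    for (int n = 0; n < 8; n++) {
        if (imm8 & (1u << n)) {
            imm64 |= 0xffULL << (n * 8);
        }
    }
    return imm64;
}

/* Copy a 32-bit value into both halves of a 64-bit result. */
static uint64_t toy_dup32(uint32_t imm)
{
    return ((uint64_t)imm << 32) | imm;
}
```

For example, imm8 = 0x81 expands to 0xff000000000000ff, while the
cmode == 12 case with imm8 = 0x12 gives (0x12 << 8) | 0xff = 0x12ff,
replicated as 0x000012ff000012ff.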

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
     return opc | s->be_data;
 }
 
+/**
+ * asimd_imm_const: Expand an encoded SIMD constant value
+ *
+ * Expand a SIMD constant value. This is essentially the pseudocode
+ * AdvSIMDExpandImm, except that we also perform the boolean NOT needed for
+ * VMVN and VBIC (when cmode < 14 && op == 1).
+ *
+ * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
+ * callers must catch this.
+ *
+ * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
+ * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
+ * we produce an immediate constant value of 0 in these cases.
+ */
+uint64_t asimd_imm_const(uint32_t imm, int cmode, int op);
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/translate-neon.c b/target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@ DO_FP_2SH(VCVT_UH, gen_helper_gvec_vcvt_uh)
 DO_FP_2SH(VCVT_HS, gen_helper_gvec_vcvt_hs)
 DO_FP_2SH(VCVT_HU, gen_helper_gvec_vcvt_hu)
 
-static uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
-{
-    /*
-     * Expand the encoded constant.
-     * Note that cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 is UNPREDICTABLE.
-     * We choose to not special-case this and will behave as if a
-     * valid constant encoding of 0 had been given.
-     * cmode = 15 op = 1 must UNDEF; we assume decode has handled that.
-     */
-    switch (cmode) {
-    case 0: case 1:
-        /* no-op */
-        break;
-    case 2: case 3:
-        imm <<= 8;
-        break;
-    case 4: case 5:
-        imm <<= 16;
-        break;
-    case 6: case 7:
-        imm <<= 24;
-        break;
-    case 8: case 9:
-        imm |= imm << 16;
-        break;
-    case 10: case 11:
-        imm = (imm << 8) | (imm << 24);
-        break;
-    case 12:
-        imm = (imm << 8) | 0xff;
-        break;
-    case 13:
-        imm = (imm << 16) | 0xffff;
-        break;
-    case 14:
-        if (op) {
-            /*
-             * This is the only case where the top and bottom 32 bits
-             * of the encoded constant differ.
-             */
-            uint64_t imm64 = 0;
-            int n;
-
-            for (n = 0; n < 8; n++) {
-                if (imm & (1 << n)) {
-                    imm64 |= (0xffULL << (n * 8));
-                }
-            }
-            return imm64;
-        }
-        imm |= (imm << 8) | (imm << 16) | (imm << 24);
-        break;
-    case 15:
-        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
-            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
-        break;
-    }
-    if (op) {
-        imm = ~imm;
-    }
-    return dup_const(MO_32, imm);
-}
-
 static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
                         GVecGen2iFn *fn)
 {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ void arm_translate_init(void)
     a64_translate_init();
 }
 
+uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
+{
+    /* Expand the encoded constant as per AdvSIMDExpandImm pseudocode */
+    switch (cmode) {
+    case 0: case 1:
+        /* no-op */
+        break;
+    case 2: case 3:
+        imm <<= 8;
+        break;
+    case 4: case 5:
+        imm <<= 16;
+        break;
+    case 6: case 7:
+        imm <<= 24;
+        break;
+    case 8: case 9:
+        imm |= imm << 16;
+        break;
+    case 10: case 11:
+        imm = (imm << 8) | (imm << 24);
+        break;
+    case 12:
+        imm = (imm << 8) | 0xff;
+        break;
+    case 13:
+        imm = (imm << 16) | 0xffff;
+        break;
+    case 14:
+        if (op) {
+            /*
+             * This is the only case where the top and bottom 32 bits
+             * of the encoded constant differ.
+             */
+            uint64_t imm64 = 0;
+            int n;
+
+            for (n = 0; n < 8; n++) {
+                if (imm & (1 << n)) {
+                    imm64 |= (0xffULL << (n * 8));
+                }
+            }
+            return imm64;
+        }
+        imm |= (imm << 8) | (imm << 16) | (imm << 24);
+        break;
+    case 15:
+        imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
+            | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
+        break;
+    }
+    if (op) {
+        imm = ~imm;
+    }
+    return dup_const(MO_32, imm);
+}
+
 /* Generate a label used for skipping this instruction */
 void arm_gen_condlabel(DisasContext *s)
 {
-- 
2.20.1

The A64 AdvSIMD modified-immediate grouping uses almost the same
constant encoding that A32 Neon does; reuse asimd_imm_const() (to
which we add the AArch64-specific case for cmode 15 op 1) instead of
reimplementing it all.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-5-peter.maydell@linaro.org
---
 target/arm/translate.h     |  3 +-
 target/arm/translate-a64.c | 86 ++++----------------------------------
 target/arm/translate.c     | 17 +++++++-
 3 files changed, 24 insertions(+), 82 deletions(-)
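
The new cmode == 15, op == 1 case builds a double-precision constant
from the 8-bit immediate. A standalone sketch of just that expansion,
using the same masks as the hunk in translate.c below; the toy_* names
are hypothetical, and toy_bits_to_double() assumes the host uses
IEEE 754 binary64 for double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Expand imm8 (sign, inverted-exponent bit, 6 low bits) to 64 bits. */
static uint64_t toy_fp64_imm(uint32_t imm8)
{
    uint64_t imm64;

    assert(imm8 <= 0xff);
    imm64 = (uint64_t)(imm8 & 0x3f) << 48;
    if (imm8 & 0x80) {
        imm64 |= 0x8000000000000000ULL;   /* sign bit */
    }
    if (imm8 & 0x40) {
        imm64 |= 0x3fc0000000000000ULL;   /* exponent 0b011_1111_11xx */
    } else {
        imm64 |= 0x4000000000000000ULL;   /* exponent 0b100_0000_00xx */
    }
    return imm64;
}

/* Reinterpret the bit pattern as a host double. */
static double toy_bits_to_double(uint64_t bits)
{
    double d;

    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

As a check, imm8 = 0x70 expands to 0x3ff0000000000000, which is 1.0, and
imm8 = 0x00 expands to 0x4000000000000000, which is 2.0.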

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ static inline MemOp finalize_memop(DisasContext *s, MemOp opc)
  * VMVN and VBIC (when cmode < 14 && op == 1).
  *
  * The combination cmode == 15 op == 1 is a reserved encoding for AArch32;
- * callers must catch this.
+ * callers must catch this; we return the 64-bit constant value defined
+ * for AArch64.
  *
  * cmode = 2,3,4,5,6,7,10,11,12,13 imm=0 was UNPREDICTABLE in v7A but
  * is either not unpredictable or merely CONSTRAINED UNPREDICTABLE in v8A;
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
 {
     int rd = extract32(insn, 0, 5);
     int cmode = extract32(insn, 12, 4);
-    int cmode_3_1 = extract32(cmode, 1, 3);
-    int cmode_0 = extract32(cmode, 0, 1);
     int o2 = extract32(insn, 11, 1);
     uint64_t abcdefgh = extract32(insn, 5, 5) | (extract32(insn, 16, 3) << 5);
     bool is_neg = extract32(insn, 29, 1);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
         return;
     }
 
-    /* See AdvSIMDExpandImm() in ARM ARM */
-    switch (cmode_3_1) {
-    case 0: /* Replicate(Zeros(24):imm8, 2) */
-    case 1: /* Replicate(Zeros(16):imm8:Zeros(8), 2) */
-    case 2: /* Replicate(Zeros(8):imm8:Zeros(16), 2) */
-    case 3: /* Replicate(imm8:Zeros(24), 2) */
-    {
-        int shift = cmode_3_1 * 8;
-        imm = bitfield_replicate(abcdefgh << shift, 32);
-        break;
-    }
-    case 4: /* Replicate(Zeros(8):imm8, 4) */
-    case 5: /* Replicate(imm8:Zeros(8), 4) */
-    {
-        int shift = (cmode_3_1 & 0x1) * 8;
-        imm = bitfield_replicate(abcdefgh << shift, 16);
-        break;
-    }
-    case 6:
-        if (cmode_0) {
-            /* Replicate(Zeros(8):imm8:Ones(16), 2) */
-            imm = (abcdefgh << 16) | 0xffff;
-        } else {
-            /* Replicate(Zeros(16):imm8:Ones(8), 2) */
-            imm = (abcdefgh << 8) | 0xff;
-        }
-        imm = bitfield_replicate(imm, 32);
-        break;
-    case 7:
-        if (!cmode_0 && !is_neg) {
-            imm = bitfield_replicate(abcdefgh, 8);
-        } else if (!cmode_0 && is_neg) {
-            int i;
-            imm = 0;
-            for (i = 0; i < 8; i++) {
-                if ((abcdefgh) & (1 << i)) {
-                    imm |= 0xffULL << (i * 8);
-                }
-            }
-        } else if (cmode_0) {
-            if (is_neg) {
-                imm = (abcdefgh & 0x3f) << 48;
-                if (abcdefgh & 0x80) {
-                    imm |= 0x8000000000000000ULL;
-                }
-                if (abcdefgh & 0x40) {
-                    imm |= 0x3fc0000000000000ULL;
-                } else {
-                    imm |= 0x4000000000000000ULL;
-                }
-            } else {
-                if (o2) {
-                    /* FMOV (vector, immediate) - half-precision */
-                    imm = vfp_expand_imm(MO_16, abcdefgh);
-                    /* now duplicate across the lanes */
-                    imm = bitfield_replicate(imm, 16);
-                } else {
-                    imm = (abcdefgh & 0x3f) << 19;
-                    if (abcdefgh & 0x80) {
-                        imm |= 0x80000000;
-                    }
-                    if (abcdefgh & 0x40) {
-                        imm |= 0x3e000000;
-                    } else {
-                        imm |= 0x40000000;
-                    }
-                    imm |= (imm << 32);
-                }
-            }
-        }
-        break;
-    default:
-        g_assert_not_reached();
-    }
-
-    if (cmode_3_1 != 7 && is_neg) {
-        imm = ~imm;
+    if (cmode == 15 && o2 && !is_neg) {
+        /* FMOV (vector, immediate) - half-precision */
+        imm = vfp_expand_imm(MO_16, abcdefgh);
+        /* now duplicate across the lanes */
+        imm = bitfield_replicate(imm, 16);
+    } else {
+        imm = asimd_imm_const(abcdefgh, cmode, is_neg);
     }
 
     if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
     case 14:
         if (op) {
             /*
-             * This is the only case where the top and bottom 32 bits
-             * of the encoded constant differ.
+             * This and cmode == 15 op == 1 are the only cases where
+             * the top and bottom 32 bits of the encoded constant differ.
              */
             uint64_t imm64 = 0;
             int n;
@@ -XXX,XX +XXX,XX @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op)
         imm |= (imm << 8) | (imm << 16) | (imm << 24);
         break;
     case 15:
+        if (op) {
+            /* Reserved encoding for AArch32; valid for AArch64 */
+            uint64_t imm64 = (uint64_t)(imm & 0x3f) << 48;
+            if (imm & 0x80) {
+                imm64 |= 0x8000000000000000ULL;
+            }
+            if (imm & 0x40) {
+                imm64 |= 0x3fc0000000000000ULL;
+            } else {
+                imm64 |= 0x4000000000000000ULL;
+            }
+            return imm64;
+        }
         imm = ((imm & 0x80) << 24) | ((imm & 0x3f) << 19)
             | ((imm & 0x40) ? (0x1f << 25) : (1 << 30));
         break;
-- 
2.20.1

Use dup_const() instead of bitfield_replicate() in
disas_simd_mod_imm().

(We can't replace the other use of bitfield_replicate() in this file,
in logic_imm_decode_wmask(), because that location needs to handle 2
and 4 bit elements, which dup_const() cannot.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-6-peter.maydell@linaro.org
---
 target/arm/translate-a64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
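
The difference between the two replication styles can be sketched
without any QEMU headers. Both functions below are hypothetical
illustrations: toy_replicate() doubles the pattern until it fills 64
bits and so works for any power-of-two element width, including 2 and 4
bits, while toy_dup16() handles a single fixed width with one multiply.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Replicate the low esize_bits of value across 64 bits by repeated
 * doubling. The bits of value above esize_bits must be zero.
 */
static uint64_t toy_replicate(uint64_t value, unsigned esize_bits)
{
    assert(esize_bits != 0 && esize_bits < 64);
    assert((esize_bits & (esize_bits - 1)) == 0);
    while (esize_bits < 64) {
        value |= value << esize_bits;
        esize_bits *= 2;
    }
    return value;
}

/* Replicate one 16-bit element into all four 16-bit lanes. */
static uint64_t toy_dup16(uint16_t value)
{
    return 0x0001000100010001ULL * value;
}
```

For the half-precision constant 0x3c00 both give 0x3c003c003c003c00;
only the doubling form can also turn the 2-bit pattern 0b01 into
0x5555555555555555.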

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_simd_mod_imm(DisasContext *s, uint32_t insn)
         /* FMOV (vector, immediate) - half-precision */
         imm = vfp_expand_imm(MO_16, abcdefgh);
         /* now duplicate across the lanes */
-        imm = bitfield_replicate(imm, 16);
+        imm = dup_const(MO_16, imm);
     } else {
         imm = asimd_imm_const(abcdefgh, cmode, is_neg);
     }
-- 
2.20.1

Implement the MVE logical-immediate insns (VMOV, VMVN,
VORR and VBIC). These have essentially the same encoding
as their Neon equivalents, and we implement the decode
in the same way.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-7-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  4 +++
 target/arm/mve.decode      | 17 +++++++++++++
 target/arm/mve_helper.c    | 24 ++++++++++++++++++
 target/arm/translate-mve.c | 50 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 95 insertions(+)

Implement the MVE shift-vector-left-by-immediate insns VSHL, VQSHL
and VQSHLU.

The size-and-immediate encoding here is the same as Neon, and we
handle it the same way neon-dp.decode does.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-8-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 16 +++++++++++
 target/arm/mve.decode      | 23 +++++++++++++++
 target/arm/mve_helper.c    | 57 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 51 ++++++++++++++++++++++++++++++++++
 4 files changed, 147 insertions(+)
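
The three @2_shl_* formats in mve.decode below split one six-bit field
by the position of its leading 1 bit. A standalone sketch of that
split, assuming the field is passed in as a plain integer; ToyShift and
toy_decode_shl() are hypothetical names, and the real decode is
generated from the format lines rather than written like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int size;    /* 0 = byte, 1 = halfword, 2 = word */
    int shift;   /* left shift count, 0 .. element bits - 1 */
} ToyShift;

/*
 * Split the six-bit field: 1xxxxx is a word shift, 01xxxx a halfword
 * shift, 001xxx a byte shift. Anything else is not a shift encoding.
 */
static bool toy_decode_shl(uint32_t imm6, ToyShift *out)
{
    assert(imm6 < 64 && out);
    if (imm6 & 0x20) {
        out->size = 2;
        out->shift = imm6 & 0x1f;
    } else if (imm6 & 0x10) {
        out->size = 1;
        out->shift = imm6 & 0x0f;
    } else if (imm6 & 0x08) {
        out->size = 0;
        out->shift = imm6 & 0x07;
    } else {
        return false;
    }
    return true;
}
```

So the field value 0b001011 decodes as a byte shift by 3, and 0b100001
as a word shift by 1.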

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(mve_vaddvuw, TCG_CALL_NO_WG, i32, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
 DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
 DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
+
+DEF_HELPER_FLAGS_4(mve_vshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshlui_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshlui_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshlui_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
 &2op qd qm qn size
 &2scalar qd qn rm size
 &1imm qd imm cmode op
+&2shift qd qm shift size
 
 @vldr_vstr ....... . . . . l:1 rn:4 ... ...... imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -XXX,XX +XXX,XX @@
 @2scalar .... .... .. size:2 .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 @2scalar_nosz .... .... .... .... .... .... .... rm:4 &2scalar qd=%qd qn=%qn
 
+@2_shl_b .... .... .. 001 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
+@2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
+@2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
+
 # Vector loads and stores
 
 # Widening loads and narrowing stores:
@@ -XXX,XX +XXX,XX @@ VPST             1111 1110 0 . 11 000 1 ... 0 1111 0100 1101 mask=%mask_22_13
 # So we have a single decode line and check the cmode/op in the
 # trans function.
 Vimm_1r 111 . 1111 1 . 00 0 ... ... 0 .... 0 1 . 1 .... @1imm
+
+# Shifts by immediate
+
+VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_b
+VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_h
+VSHLI             111 0 1111 1 . ... ... ... 0 0101 0 1 . 1 ... 0 @2_shl_w
+
+VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
+VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
+VQSHLI_S          111 0 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
+
+VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_b
+VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_h
+VQSHLI_U          111 1 1111 1 . ... ... ... 0 0111 0 1 . 1 ... 0 @2_shl_w
+
+VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_b
+VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_h
+VQSHLUI           111 1 1111 1 . ... ... ... 0 0110 0 1 . 1 ... 0 @2_shl_w
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2OP_SAT(vqsubsw, 4, int32_t, DO_SQSUB_W)
     WRAP_QRSHL_HELPER(do_sqrshl_bhs, N, M, true, satp)
 #define DO_UQRSHL_OP(N, M, satp) \
     WRAP_QRSHL_HELPER(do_uqrshl_bhs, N, M, true, satp)
+#define DO_SUQSHL_OP(N, M, satp) \
+    WRAP_QRSHL_HELPER(do_suqrshl_bhs, N, M, false, satp)
 
 DO_2OP_SAT_S(vqshls, DO_SQSHL_OP)
 DO_2OP_SAT_U(vqshlu, DO_UQSHL_OP)
@@ -XXX,XX +XXX,XX @@ DO_VADDV(vaddvsw, 4, uint32_t)
 DO_VADDV(vaddvub, 1, uint8_t)
 DO_VADDV(vaddvuh, 2, uint16_t)
 DO_VADDV(vaddvuw, 4, uint32_t)
+
+/* Shifts by immediate */
+#define DO_2SHIFT(OP, ESIZE, TYPE, FN)                          \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
+                                void *vm, uint32_t shift)       \
+    {                                                           \
+        TYPE *d = vd, *m = vm;                                  \
+        uint16_t mask = mve_element_mask(env);                  \
+        unsigned e;                                             \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
+            mergemask(&d[H##ESIZE(e)],                          \
+                      FN(m[H##ESIZE(e)], shift), mask);         \
+        }                                                       \
+        mve_advance_vpt(env);                                   \
+    }
+
+#define DO_2SHIFT_SAT(OP, ESIZE, TYPE, FN)                      \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
+                                void *vm, uint32_t shift)       \
+    {                                                           \
+        TYPE *d = vd, *m = vm;                                  \
+        uint16_t mask = mve_element_mask(env);                  \
+        unsigned e;                                             \
+        bool qc = false;                                        \
+        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {      \
+            bool sat = false;                                   \
+            mergemask(&d[H##ESIZE(e)],                          \
+                      FN(m[H##ESIZE(e)], shift, &sat), mask);   \
+            qc |= sat & mask & 1;                               \
+        }                                                       \
+        if (qc) {                                               \
+            env->vfp.qc[0] = qc;                                \
+        }                                                       \
+        mve_advance_vpt(env);                                   \
+    }
+
+/* provide unsigned 2-op shift helpers for all sizes */
+#define DO_2SHIFT_U(OP, FN)                     \
+    DO_2SHIFT(OP##b, 1, uint8_t, FN)            \
+    DO_2SHIFT(OP##h, 2, uint16_t, FN)           \
+    DO_2SHIFT(OP##w, 4, uint32_t, FN)
+
+#define DO_2SHIFT_SAT_U(OP, FN)                 \
+    DO_2SHIFT_SAT(OP##b, 1, uint8_t, FN)        \
+    DO_2SHIFT_SAT(OP##h, 2, uint16_t, FN)       \
+    DO_2SHIFT_SAT(OP##w, 4, uint32_t, FN)
+#define DO_2SHIFT_SAT_S(OP, FN)                 \
+    DO_2SHIFT_SAT(OP##b, 1, int8_t, FN)         \
+    DO_2SHIFT_SAT(OP##h, 2, int16_t, FN)        \
+    DO_2SHIFT_SAT(OP##w, 4, int32_t, FN)
+
+DO_2SHIFT_U(vshli_u, DO_VSHLU)
+DO_2SHIFT_SAT_U(vqshli_u, DO_UQSHL_OP)
+DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
+DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ typedef void MVEGenLdStFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenOneOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenTwoOpScalarFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void MVEGenTwoOpShiftFn(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenDualAccOpFn(TCGv_i64, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i64);
 typedef void MVEGenVADDVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void MVEGenOneOpImmFn(TCGv_ptr, TCGv_ptr, TCGv_i64);
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1imm *a)
     }
     return do_1imm(s, a, fn);
 }
+
+static bool do_2shift(DisasContext *s, arg_2shift *a, MVEGenTwoOpShiftFn fn,
+                      bool negateshift)
+{
+    TCGv_ptr qd, qm;
+    int shift = a->shift;
+
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !mve_check_qreg_bank(s, a->qd | a->qm) ||
+        !fn) {
+        return false;
+    }
+    if (!mve_eci_check(s) || !vfp_access_check(s)) {
+        return true;
+    }
+
+    /*
+     * When we handle a right shift insn using a left-shift helper
+     * which permits a negative shift count to indicate a right-shift,
+     * we must negate the shift count.
+     */
+    if (negateshift) {
+        shift = -shift;
+    }
+
+    qd = mve_qreg_ptr(a->qd);
+    qm = mve_qreg_ptr(a->qm);
+    fn(cpu_env, qd, qm, tcg_constant_i32(shift));
+    tcg_temp_free_ptr(qd);
+    tcg_temp_free_ptr(qm);
+    mve_update_eci(s);
+    return true;
+}
+
+#define DO_2SHIFT(INSN, FN, NEGATESHIFT)                         \
+    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
+    {                                                           \
+        static MVEGenTwoOpShiftFn * const fns[] = {             \
+            gen_helper_mve_##FN##b,                             \
+            gen_helper_mve_##FN##h,                             \
+            gen_helper_mve_##FN##w,                             \
+            NULL,                                               \
+        };                                                      \
+        return do_2shift(s, a, fns[a->size], NEGATESHIFT);      \
+    }
+
+DO_2SHIFT(VSHLI, vshli_u, false)
+DO_2SHIFT(VQSHLI_S, vqshli_s, false)
+DO_2SHIFT(VQSHLI_U, vqshli_u, false)
+DO_2SHIFT(VQSHLUI, vqshlui_s, false)
-- 
2.20.1

Implement the MVE vector shift right by immediate insns VSHRI and
VRSHRI.  As with Neon, we implement these by using helper functions
which perform left shifts but allow negative shift counts to indicate
right shifts.
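
The negative-count convention can be sketched as below. This is an
illustration only, for one unsigned 32-bit element; the function names
are hypothetical and the real helpers also handle predication, signed
elements and rounding.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Left shift of one unsigned 32-bit element by a signed count.
 * A negative count requests a logical right shift by -shift bits.
 */
static uint32_t shl_signed_count_u32(uint32_t x, int shift)
{
    assert(shift >= -32 && shift <= 31);
    if (shift >= 0) {
        return x << shift;
    }
    if (shift == -32) {
        /* All bits are shifted out; avoid an out-of-range C shift. */
        return 0;
    }
    return x >> -shift;
}

/* Right shift by immediate: negate the count and reuse the helper. */
static uint32_t vshr_u32(uint32_t x, unsigned imm)
{
    assert(imm >= 1 && imm <= 32);
    return shl_signed_count_u32(x, -(int)imm);
}
```

Negating the count in the translator means one helper serves both the
left-shift and right-shift insns.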

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-9-peter.maydell@linaro.org
---
 target/arm/helper-mve.h     | 12 ++++++++++++
 target/arm/translate.h      | 20 ++++++++++++++++++++
 target/arm/mve.decode       | 28 ++++++++++++++++++++++++++++
 target/arm/mve_helper.c     |  7 +++++++
 target/arm/translate-mve.c  |  5 +++++
 target/arm/translate-neon.c | 18 ------------------
 6 files changed, 72 insertions(+), 18 deletions(-)

Implement the MVE VSHLL (vector shift left long) insn.  This has two
encodings: the T1 encoding is the usual shift-by-immediate format,
and the T2 encoding is a special case where the shift count is always
equal to the element size.
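
A rough sketch of the operation for unsigned byte inputs follows. It
assumes a little-endian element layout and leaves out predication; the
name vshll_u8() is hypothetical. With shift == 8 it models the T2 case,
where each input byte lands in the top half of its halfword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Widening left shift: 16 input bytes, 8 halfword results.
 * 'top' selects the odd-numbered input elements (true) or the
 * even-numbered ones (false). A signed variant would sign-extend
 * the input before shifting.
 */
static void vshll_u8(uint16_t out[8], const uint8_t in[16],
                     bool top, unsigned shift)
{
    assert(shift <= 8);
    for (unsigned le = 0; le < 8; le++) {
        /* Widen first so the shifted-out bits of the byte are kept. */
        out[le] = (uint16_t)((uint32_t)in[le * 2 + top] << shift);
    }
}
```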

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-10-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  9 +++++++
 target/arm/mve.decode      | 53 +++++++++++++++++++++++++++++++++++---
 target/arm/mve_helper.c    | 32 +++++++++++++++++++++++
 target/arm/translate-mve.c | 15 +++++++++++
 4 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vshllbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshllbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshllbub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshllbuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vshlltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@
 @2_shl_h .... .... .. 01  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
 @2_shl_w .... .... .. 1   shift:5 .... .... .... .... &2shift qd=%qd qm=%qm size=2
 
+@2_shll_b .... .... ... 01 shift:3 .... .... .... .... &2shift qd=%qd qm=%qm size=0
+@2_shll_h .... .... ... 1  shift:4 .... .... .... .... &2shift qd=%qd qm=%qm size=1
+# VSHLL encoding T2 where shift == esize
+@2_shll_esize_b .... .... .... 00 .. .... .... .... .... &2shift \
+                qd=%qd qm=%qm size=0 shift=8
+@2_shll_esize_h .... .... .... 01 .. .... .... .... .... &2shift \
+                qd=%qd qm=%qm size=1 shift=16
+
 # Right shifts are encoded as N - shift, where N is the element size in bits.
 %rshift_i5  16:5 !function=rsub_32
 %rshift_i4  16:4 !function=rsub_16
@@ -XXX,XX +XXX,XX @@ VADD             1110 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
 VSUB             1111 1111 0 . .. ... 0 ... 0 1000 . 1 . 0 ... 0 @2op
 VMUL             1110 1111 0 . .. ... 0 ... 0 1001 . 1 . 1 ... 0 @2op
 
-VMULH_S          111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
-VMULH_U          111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+# The VSHLL T2 encoding is not a @2op pattern, but is here because it
+# overlaps what would be size=0b11 VMULH/VRMULH
+{
+  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_BS       111 0 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
 
-VRMULH_S         111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
-VRMULH_U         111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+  VMULH_S        111 0 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+}
+
+{
+  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_BU       111 1 1110 0 . 11 .. 01 ... 0 1110 0 0 . 0 ... 1 @2_shll_esize_h
+
+  VMULH_U        111 1 1110 0 . .. ...1 ... 0 1110 . 0 . 0 ... 1 @2op
+}
+
+{
+  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_TS       111 0 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
+
+  VRMULH_S       111 0 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+}
+
+{
+  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_b
+  VSHLL_TU       111 1 1110 0 . 11 .. 01 ... 1 1110 0 0 . 0 ... 1 @2_shll_esize_h
+
+  VRMULH_U       111 1 1110 0 . .. ...1 ... 1 1110 . 0 . 0 ... 1 @2op
+}
 
 VMAX_S           111 0 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
 VMAX_U           111 1 1111 0 . .. ... 0 ... 0 0110 . 1 . 0 ... 0 @2op
@@ -XXX,XX +XXX,XX @@ VRSHRI_S          111 0 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_b
 VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
 VRSHRI_U          111 1 1111 1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
+
+# VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
+VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_BS          111 0 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
+
+VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_BU          111 1 1110 1 . 1 .. ... ... 0 1111 0 1 . 0 ... 0 @2_shll_h
+
+VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_TS          111 0 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
+
+VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_b
+VSHLL_TU          111 1 1110 1 . 1 .. ... ... 1 1111 0 1 . 0 ... 0 @2_shll_h
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
 DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
 DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
 DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
+
+/*
+ * Long shifts taking half-sized inputs from top or bottom of the input
+ * vector and producing a double-width result. ESIZE, TYPE are for
+ * the input, and LESIZE, LTYPE for the output.
+ * Unlike the normal shift helpers, we do not handle negative shift counts,
+ * because the long shift is strictly left-only.
+ */
+#define DO_VSHLL(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE)                   \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,             \
+                                void *vm, uint32_t shift)               \
+    {                                                                   \
+        LTYPE *d = vd;                                                  \
+        TYPE *m = vm;                                                   \
+        uint16_t mask = mve_element_mask(env);                          \
+        unsigned le;                                                    \
+        assert(shift <= 16);                                            \
+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) {         \
+            LTYPE r = (LTYPE)m[H##ESIZE(le * 2 + TOP)] << shift;        \
+            mergemask(&d[H##LESIZE(le)], r, mask);                      \
+        }                                                               \
+        mve_advance_vpt(env);                                           \
+    }
+
+#define DO_VSHLL_ALL(OP, TOP)                                \
+    DO_VSHLL(OP##sb, TOP, 1, int8_t, 2, int16_t)             \
+    DO_VSHLL(OP##ub, TOP, 1, uint8_t, 2, uint16_t)           \
+    DO_VSHLL(OP##sh, TOP, 2, int16_t, 4, int32_t)            \
+    DO_VSHLL(OP##uh, TOP, 2, uint16_t, 4, uint32_t)          \
+
+DO_VSHLL_ALL(vshllb, false)
+DO_VSHLL_ALL(vshllt, true)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT(VSHRI_S, vshli_s, true)
 DO_2SHIFT(VSHRI_U, vshli_u, true)
 DO_2SHIFT(VRSHRI_S, vrshli_s, true)
 DO_2SHIFT(VRSHRI_U, vrshli_u, true)
+
+#define DO_VSHLL(INSN, FN)                                      \
+    static bool trans_##INSN(DisasContext *s, arg_2shift *a)    \
+    {                                                           \
+        static MVEGenTwoOpShiftFn * const fns[] = {             \
+            gen_helper_mve_##FN##b,                             \
+            gen_helper_mve_##FN##h,                             \
+        };                                                      \
+        return do_2shift(s, a, fns[a->size], false);            \
+    }
+
+DO_VSHLL(VSHLL_BS, vshllbs)
+DO_VSHLL(VSHLL_BU, vshllbu)
+DO_VSHLL(VSHLL_TS, vshllts)
+DO_VSHLL(VSHLL_TU, vshlltu)
-- 
2.20.1

Implement the MVE VSRI and VSLI insns, which perform a
shift-and-insert operation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-11-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  8 ++++++++
 target/arm/mve.decode      |  9 ++++++++
 target/arm/mve_helper.c    | 42 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  3 +++
 4 files changed, 62 insertions(+)

Implement the MVE shift-right-and-narrow insns VSHRN and VRSHRN.

do_urshr() is borrowed from sve_helper.c.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-12-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    | 10 ++++++++++
 target/arm/mve.decode      | 11 +++++++++++
 target/arm/mve_helper.c    | 40 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 15 ++++++++++++++
 4 files changed, 76 insertions(+)

Implement the MVE saturating shift-right-and-narrow insns
VQSHRN, VQSHRUN, VQRSHRN and VQRSHRUN.

do_srshr() is borrowed from sve_helper.c.
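
For illustration, the combination of rounding shift and saturation can
be sketched for a single 16-bit to 8-bit element as below. The function
names are hypothetical, and the code assumes that >> on a negative
signed value is an arithmetic shift (implementation-defined in C, but
true of the compilers QEMU supports).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed rounding shift right: add back the last bit shifted out. */
static int64_t srshr(int64_t x, unsigned sh)
{
    assert(sh >= 1 && sh < 64);
    return (x >> sh) + ((x >> (sh - 1)) & 1);
}

/* Clamp val to [min, max]; set *satp if clamping happened. */
static int32_t sat(int64_t val, int64_t min, int64_t max, bool *satp)
{
    if (val > max) {
        *satp = true;
        return (int32_t)max;
    }
    if (val < min) {
        *satp = true;
        return (int32_t)min;
    }
    return (int32_t)val;
}

/* VQRSHRN-style element op: signed 16-bit in, signed 8-bit out. */
static int8_t qrshrn_s16(int16_t x, unsigned sh, bool *satp)
{
    return (int8_t)sat(srshr(x, sh), INT8_MIN, INT8_MAX, satp);
}

/* VQRSHRUN-style element op: signed 16-bit in, unsigned 8-bit out. */
static uint8_t qrshrun_s16(int16_t x, unsigned sh, bool *satp)
{
    return (uint8_t)sat(srshr(x, sh), 0, UINT8_MAX, satp);
}
```

The only difference between the signed and signed-to-unsigned forms is
the saturation range passed to the clamp.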

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-13-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  30 +++++++++++
 target/arm/mve.decode      |  28 ++++++++++
 target/arm/mve_helper.c    | 104 +++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c |  12 +++++
 4 files changed, 174 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vrshrnbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshrnbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshrntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshrnth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnb_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrnt_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrshrunbb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrunbh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshruntb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -XXX,XX +XXX,XX @@ VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_b
 VRSHRNB           111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 1 @2_shr_h
 VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_b
 VRSHRNT           111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 1 @2_shr_h
+
+VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNB_S         111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
+VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNT_S         111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
+VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNB_U         111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 0 @2_shr_h
+VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_b
+VQSHRNT_U         111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 0 @2_shr_h
+
+VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
+VQSHRUNB          111 0 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
+VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
+VQSHRUNT          111 0 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
+
+VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNB_S        111 0 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
+VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNT_S        111 0 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
+VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNB_U        111 1 1110 1 . ... ... ... 0 1111 0 1 . 0 ... 1 @2_shr_h
+VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_b
+VQRSHRNT_U        111 1 1110 1 . ... ... ... 1 1111 0 1 . 0 ... 1 @2_shr_h
+
+VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_b
+VQRSHRUNB         111 1 1110 1 . ... ... ... 0 1111 1 1 . 0 ... 0 @2_shr_h
+VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_b
+VQRSHRUNT         111 1 1110 1 . ... ... ... 1 1111 1 1 . 0 ... 0 @2_shr_h
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint64_t do_urshr(uint64_t x, unsigned sh)
     }
 }
 
+static inline int64_t do_srshr(int64_t x, unsigned sh)
+{
+    if (likely(sh < 64)) {
+        return (x >> sh) + ((x >> (sh - 1)) & 1);
+    } else {
+        /* Rounding the sign bit always produces 0. */
+        return 0;
+    }
+}
+
 DO_VSHRN_ALL(vshrn, DO_SHR)
 DO_VSHRN_ALL(vrshrn, do_urshr)
+
+static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
+                                 bool *satp)
+{
+    if (val > max) {
+        *satp = true;
+        return max;
+    } else if (val < min) {
+        *satp = true;
+        return min;
+    } else {
+        return val;
+    }
+}
+
+/* Saturating narrowing right shifts */
+#define DO_VSHRN_SAT(OP, TOP, ESIZE, TYPE, LESIZE, LTYPE, FN)   \
+    void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd,     \
+                                void *vm, uint32_t shift)       \
+    {                                                           \
+        LTYPE *m = vm;                                          \
+        TYPE *d = vd;                                           \
+        uint16_t mask = mve_element_mask(env);                  \
+        bool qc = false;                                        \
+        unsigned le;                                            \
+        for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
+            bool sat = false;                                   \
+            TYPE r = FN(m[H##LESIZE(le)], shift, &sat);         \
+            mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask);     \
+            qc |= sat && (mask & 1 << (TOP * ESIZE));           \
+        }                                                       \
+        if (qc) {                                               \
+            env->vfp.qc[0] = qc;                                \
+        }                                                       \
+        mve_advance_vpt(env);                                   \
+    }
+
+#define DO_VSHRN_SAT_UB(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 1, uint8_t, 2, uint16_t, FN)       \
+    DO_VSHRN_SAT(TOP, true, 1, uint8_t, 2, uint16_t, FN)
+
+#define DO_VSHRN_SAT_UH(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 2, uint16_t, 4, uint32_t, FN)      \
+    DO_VSHRN_SAT(TOP, true, 2, uint16_t, 4, uint32_t, FN)
+
+#define DO_VSHRN_SAT_SB(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 1, int8_t, 2, int16_t, FN)         \
+    DO_VSHRN_SAT(TOP, true, 1, int8_t, 2, int16_t, FN)
+
+#define DO_VSHRN_SAT_SH(BOP, TOP, FN)                           \
+    DO_VSHRN_SAT(BOP, false, 2, int16_t, 4, int32_t, FN)        \
+    DO_VSHRN_SAT(TOP, true, 2, int16_t, 4, int32_t, FN)
+
+#define DO_SHRN_SB(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), INT8_MIN, INT8_MAX, SATP)
+#define DO_SHRN_UB(N, M, SATP)                                  \
+    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT8_MAX, SATP)
+#define DO_SHRUN_B(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), 0, UINT8_MAX, SATP)
+
+#define DO_SHRN_SH(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), INT16_MIN, INT16_MAX, SATP)
+#define DO_SHRN_UH(N, M, SATP)                                  \
+    do_sat_bhs((uint64_t)(N) >> (M), 0, UINT16_MAX, SATP)
+#define DO_SHRUN_H(N, M, SATP)                                  \
+    do_sat_bhs((int64_t)(N) >> (M), 0, UINT16_MAX, SATP)
+
+#define DO_RSHRN_SB(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), INT8_MIN, INT8_MAX, SATP)
+#define DO_RSHRN_UB(N, M, SATP)                                 \
+    do_sat_bhs(do_urshr(N, M), 0, UINT8_MAX, SATP)
+#define DO_RSHRUN_B(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), 0, UINT8_MAX, SATP)
+
+#define DO_RSHRN_SH(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), INT16_MIN, INT16_MAX, SATP)
+#define DO_RSHRN_UH(N, M, SATP)                                 \
+    do_sat_bhs(do_urshr(N, M), 0, UINT16_MAX, SATP)
+#define DO_RSHRUN_H(N, M, SATP)                                 \
+    do_sat_bhs(do_srshr(N, M), 0, UINT16_MAX, SATP)
+
+DO_VSHRN_SAT_SB(vqshrnb_sb, vqshrnt_sb, DO_SHRN_SB)
+DO_VSHRN_SAT_SH(vqshrnb_sh, vqshrnt_sh, DO_SHRN_SH)
+DO_VSHRN_SAT_UB(vqshrnb_ub, vqshrnt_ub, DO_SHRN_UB)
+DO_VSHRN_SAT_UH(vqshrnb_uh, vqshrnt_uh, DO_SHRN_UH)
+DO_VSHRN_SAT_SB(vqshrunbb, vqshruntb, DO_SHRUN_B)
+DO_VSHRN_SAT_SH(vqshrunbh, vqshrunth, DO_SHRUN_H)
+
+DO_VSHRN_SAT_SB(vqrshrnb_sb, vqrshrnt_sb, DO_RSHRN_SB)
+DO_VSHRN_SAT_SH(vqrshrnb_sh, vqrshrnt_sh, DO_RSHRN_SH)
+DO_VSHRN_SAT_UB(vqrshrnb_ub, vqrshrnt_ub, DO_RSHRN_UB)
+DO_VSHRN_SAT_UH(vqrshrnb_uh, vqrshrnt_uh, DO_RSHRN_UH)
+DO_VSHRN_SAT_SB(vqrshrunbb, vqrshruntb, DO_RSHRUN_B)
+DO_VSHRN_SAT_SH(vqrshrunbh, vqrshrunth, DO_RSHRUN_H)
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -XXX,XX +XXX,XX @@ DO_2SHIFT_N(VSHRNB, vshrnb)
 DO_2SHIFT_N(VSHRNT, vshrnt)
 DO_2SHIFT_N(VRSHRNB, vrshrnb)
 DO_2SHIFT_N(VRSHRNT, vrshrnt)
+DO_2SHIFT_N(VQSHRNB_S, vqshrnb_s)
+DO_2SHIFT_N(VQSHRNT_S, vqshrnt_s)
+DO_2SHIFT_N(VQSHRNB_U, vqshrnb_u)
+DO_2SHIFT_N(VQSHRNT_U, vqshrnt_u)
+DO_2SHIFT_N(VQSHRUNB, vqshrunb)
+DO_2SHIFT_N(VQSHRUNT, vqshrunt)
+DO_2SHIFT_N(VQRSHRNB_S, vqrshrnb_s)
+DO_2SHIFT_N(VQRSHRNT_S, vqrshrnt_s)
+DO_2SHIFT_N(VQRSHRNB_U, vqrshrnb_u)
+DO_2SHIFT_N(VQRSHRNT_U, vqrshrnt_u)
+DO_2SHIFT_N(VQRSHRUNB, vqrshrunb)
+DO_2SHIFT_N(VQRSHRUNT, vqrshrunt)
-- 
2.20.1

Implement the MVE VSHLC insn, which performs a shift left of the
entire vector with carry in bits provided from a general purpose
register and carry out bits written back to that register.
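
As a sketch of the data flow (predication omitted, names hypothetical),
the vector can be treated as four 32-bit elements with element 0 least
significant, and the carry threaded through them in order:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift a 128-bit vector left by 'shift' bits (1..32).
 * The low 'shift' bits of 'carry' enter at the bottom of element 0;
 * the bits shifted out of the top of element 3 are returned, to be
 * written back to the general-purpose register.
 */
static uint32_t vshlc(uint32_t v[4], uint32_t carry, unsigned shift)
{
    assert(shift >= 1 && shift <= 32);
    for (unsigned e = 0; e < 4; e++) {
        uint32_t old = v[e];
        if (shift == 32) {
            /* Whole-element move; avoids an out-of-range C shift. */
            v[e] = carry;
            carry = old;
        } else {
            uint32_t inmask = (1u << shift) - 1;
            v[e] = (old << shift) | (carry & inmask);
            carry = old >> (32 - shift);
        }
    }
    return carry;
}
```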

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-14-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  2 ++
 target/arm/mve.decode      |  2 ++
 target/arm/mve_helper.c    | 38 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-mve.c | 30 ++++++++++++++++++++++++++++++
 4 files changed, 72 insertions(+)

Implement the MVE VADDLV insn; this is similar to VADDV, except
that it accumulates 32-bit elements into a 64-bit accumulator
stored in a pair of general-purpose registers.
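
A minimal sketch of the accumulation for signed elements, with
predication omitted; vaddlv_s32() is a hypothetical name and the
register pair is modelled as two 32-bit halves:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Add four signed 32-bit elements into a 64-bit accumulator whose
 * initial value is held as a (lo, hi) pair of 32-bit registers.
 * The caller splits the result back into the two registers.
 */
static uint64_t vaddlv_s32(const int32_t v[4],
                           uint32_t acc_lo, uint32_t acc_hi)
{
    assert(v);
    uint64_t acc = ((uint64_t)acc_hi << 32) | acc_lo;
    for (unsigned e = 0; e < 4; e++) {
        /* Sign-extend each element to 64 bits before adding. */
        acc += (uint64_t)(int64_t)v[e];
    }
    return acc;
}
```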

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-15-peter.maydell@linaro.org
---
 target/arm/helper-mve.h    |  3 ++
 target/arm/mve.decode      |  6 +++-
 target/arm/mve_helper.c    | 19 ++++++++++++
 target/arm/translate-mve.c | 63 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 90 insertions(+), 1 deletion(-)

The MVE extension to v8.1M includes some new shift instructions which
sit entirely within the non-coprocessor part of the encoding space
and which operate only on general-purpose registers.  They take up
the space which was previously UNPREDICTABLE MOVS and ORRS encodings
with Rm == 13 or 15.

Implement the long shifts by immediate, which perform shifts on a
pair of general-purpose registers treated as a 64-bit quantity, with
an immediate shift count between 1 and 32.
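
A minimal sketch of the register-pair handling, assuming a plain logical
left shift and a count already known to be in the range 1..32. The name
lsll_imm_sketch is hypothetical; the real translation is done with TCG
ops rather than C arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical model of a long shift by immediate: concatenate the
 * register pair into a 64-bit value, shift it, then split it again.
 * shift is assumed to be in the range 1..32.
 */
static void lsll_imm_sketch(uint32_t *lo, uint32_t *hi, unsigned shift)
{
    uint64_t rda = ((uint64_t)*hi << 32) | *lo;

    rda <<= shift;

    *lo = (uint32_t)rda;          /* low half back to the low register */
    *hi = (uint32_t)(rda >> 32);  /* high half back to the high register */
}
```

With lo = 0x80000001, hi = 0x1 and a shift of 4, the bit leaving the top
of the low register arrives in the high register, giving lo = 0x10 and
hi = 0x18.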

Awkwardly, because the MOVS and ORRS trans functions do not UNDEF for
the Rm==13,15 case, we need to explicitly emit code to UNDEF for the
cases where v8.1M now requires that.  (Trying to change MOVS and ORRS
is too difficult, because the functions that generate the code are
shared between a dozen different kinds of arithmetic or logical
instruction for all A32, T16 and T32 encodings, and for some insns
and some encodings Rm==13,15 are valid.)

We make the helper functions we need for UQSHLL and SQSHLL take
a 32-bit value which the helper casts to int8_t, because we'll also
need these helpers for the shift-by-register insns, where the shift
count might be < 0 or > 32.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-16-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  3 ++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 28 +++++++++++++
 target/arm/mve_helper.c | 10 +++++
 target/arm/translate.c  | 90 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 132 insertions(+)

Implement the MVE long shifts by register, which perform shifts on a
pair of general-purpose registers treated as a 64-bit quantity, with
the shift count in another general-purpose register, which might be
either positive or negative.
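
To illustrate the signed shift count, here is a minimal sketch of a
non-saturating, non-rounding logical shift. It assumes that only the low
8 bits of the count register are significant and that they are treated
as a signed value, which is what the (int8_t) casts in the helpers do.
The name lsll_reg_sketch is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical model of a long logical shift by register.
 * The count is the sign-extended low byte of rm: positive counts shift
 * left, negative counts shift right, and a magnitude of 64 or more
 * leaves nothing of the original value.
 */
static uint64_t lsll_reg_sketch(uint64_t rda, uint32_t rm)
{
    /* Portable sign-extension of the low 8 bits of rm. */
    int count = (int)(rm & 0xff);
    if (count >= 128) {
        count -= 256;
    }

    if (count <= -64 || count >= 64) {
        return 0;
    }
    if (count < 0) {
        return rda >> -count;
    }
    return rda << count;
}
```

For example, a count register holding 0xFC is read as -4, so the value
0x100 becomes 0x10, while a count register holding 0x104 behaves the
same as a count of 4.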

Like the long-shifts-by-immediate, these encodings sit in the space
that was previously the UNPREDICTABLE MOVS/ORRS with Rm==13,15.
Because LSLL_rr and ASRL_rr overlap both with MOV_rxri/ORR_rrri and
with CSEL (as one of the previously-UNPREDICTABLE Rm==13 cases),
we have to move the CSEL pattern into the same decodetree group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-17-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  6 +++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 16 +++++--
 target/arm/mve_helper.c | 93 +++++++++++++++++++++++++++++++++++++++++
 target/arm/translate.c  | 69 ++++++++++++++++++++++++++++++
 5 files changed, 182 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(mve_vqrshrunth, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(mve_vshlc, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
 
+DEF_HELPER_FLAGS_3(mve_sshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_ushll, TCG_CALL_NO_RWG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(mve_sqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(mve_uqshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_sqrshrl, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_uqrshll, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_sqrshrl48, TCG_CALL_NO_RWG, i64, env, i64, i32)
+DEF_HELPER_FLAGS_3(mve_uqrshll48, TCG_CALL_NO_RWG, i64, env, i64, i32)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
 typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void AtomicThreeOpFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGArg, MemOp);
 typedef void WideShiftImmFn(TCGv_i64, TCGv_i64, int64_t shift);
+typedef void WideShiftFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i32);
 
 /**
  * arm_tbflags_from_tb:
diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -XXX,XX +XXX,XX @@
 &mcrr            !extern cp opc1 crm rt rt2
 
 &mve_shl_ri      rdalo rdahi shim
+&mve_shl_rr      rdalo rdahi rm
 
 # rdahi: bits [3:1] from insn, bit 0 is 1
 # rdalo: bits [3:1] from insn, bit 0 is 0
@@ -XXX,XX +XXX,XX @@
 
 @mve_shl_ri      ....... .... . ... . . ... ... . .. .. .... \
                  &mve_shl_ri shim=%imm5_12_6 rdalo=%rdalo_17 rdahi=%rdahi_9
+@mve_shl_rr      ....... .... . ... . rm:4  ... . .. .. .... \
+                 &mve_shl_rr rdalo=%rdalo_17 rdahi=%rdahi_9
 
 {
   TST_xrri       1110101 0000 1 .... 0 ... 1111 .... ....     @S_xrr_shi
@@ -XXX,XX +XXX,XX @@ BIC_rrri         1110101 0001 . .... 0 ... .... .... ....     @s_rrr_shi
     URSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 01 1111  @mve_shl_ri
     SRSHRL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 10 1111  @mve_shl_ri
     SQSHLL_ri    1110101 0010 1 ... 1 0 ... ... 1 .. 11 1111  @mve_shl_ri
+
+    LSLL_rr      1110101 0010 1 ... 0 ....  ... 1  0000 1101  @mve_shl_rr
+    ASRL_rr      1110101 0010 1 ... 0 ....  ... 1  0010 1101  @mve_shl_rr
+    UQRSHLL64_rr 1110101 0010 1 ... 1 ....  ... 1  0000 1101  @mve_shl_rr
+    SQRSHRL64_rr 1110101 0010 1 ... 1 ....  ... 1  0010 1101  @mve_shl_rr
+    UQRSHLL48_rr 1110101 0010 1 ... 1 ....  ... 1  1000 1101  @mve_shl_rr
+    SQRSHRL48_rr 1110101 0010 1 ... 1 ....  ... 1  1010 1101  @mve_shl_rr
   ]
 
   MOV_rxri       1110101 0010 . 1111 0 ... .... .... ....     @s_rxr_shi
   ORR_rrri       1110101 0010 . .... 0 ... .... .... ....     @s_rrr_shi
+
+  # v8.1M CSEL and friends
+  CSEL           1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
 }
 {
   MVN_rxri       1110101 0011 . 1111 0 ... .... .... ....     @s_rxr_shi
@@ -XXX,XX +XXX,XX @@ SBC_rrri         1110101 1011 . .... 0 ... .... .... ....     @s_rrr_shi
 }
 RSB_rrri         1110101 1110 . .... 0 ... .... .... ....     @s_rrr_shi
 
-# v8.1M CSEL and friends
-CSEL             1110101 0010 1 rn:4 10 op:2 rd:4 fcond:4 rm:4
-
 # Data-processing (register-shifted register)
 
 MOV_rxrr         1111 1010 0 shty:2 s:1 rm:4 1111 rd:4 0000 rs:4 \
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(mve_vshlc)(CPUARMState *env, void *vd, uint32_t rdm,
     return rdm;
 }
 
+uint64_t HELPER(mve_sshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_sqrshl_d(n, -(int8_t)shift, false, NULL);
+}
+
+uint64_t HELPER(mve_ushll)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_uqrshl_d(n, (int8_t)shift, false, NULL);
+}
+
 uint64_t HELPER(mve_sqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
 {
     return do_sqrshl_d(n, (int8_t)shift, false, &env->QF);
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(mve_uqshll)(CPUARMState *env, uint64_t n, uint32_t shift)
 {
     return do_uqrshl_d(n, (int8_t)shift, false, &env->QF);
 }
+
+uint64_t HELPER(mve_sqrshrl)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_sqrshl_d(n, -(int8_t)shift, true, &env->QF);
+}
+
+uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_uqrshl_d(n, (int8_t)shift, true, &env->QF);
+}
+
+/* Operate on 64-bit values, but saturate at 48 bits */
+static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
+                                    bool round, uint32_t *sat)
+{
+    if (shift <= -48) {
+        /* Rounding the sign bit always produces 0. */
+        if (round) {
+            return 0;
+        }
+        return src >> 63;
+    } else if (shift < 0) {
+        if (round) {
+            src >>= -shift - 1;
+            return (src >> 1) + (src & 1);
+        }
+        return src >> -shift;
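+    } else if (shift < 48) {
+        int64_t extval = sextract64(src << shift, 0, 48);
+        /* Shift back so bits lost above bit 47 (or bit 63) are detected */
+        if (!sat || src == (extval >> shift)) {
+            return extval;
+        }
+    } else if (!sat || src == 0) {
+        return 0;
+    }
+
+    *sat = 1;
+    /* Saturated 48-bit result, sign-extended to 64 bits */
+    return src >= 0 ? MAKE_64BIT_MASK(0, 47) : MAKE_64BIT_MASK(47, 17);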
+}
+
+/* Operate on 64-bit values, but saturate at 48 bits */
+static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
+                                     bool round, uint32_t *sat)
+{
+    uint64_t val, extval;
+
+    if (shift <= -(48 + round)) {
+        return 0;
+    } else if (shift < 0) {
+        if (round) {
+            val = src >> (-shift - 1);
+            val = (val >> 1) + (val & 1);
+        } else {
+            val = src >> -shift;
+        }
+        extval = extract64(val, 0, 48);
+        if (!sat || val == extval) {
+            return extval;
+        }
+    } else if (shift < 48) {
+        extval = extract64(src << shift, 0, 48);
+        /* Shift back so bits lost above bit 47 (or bit 63) are detected */
+        if (!sat || src == (extval >> shift)) {
+            return extval;
+        }
+    } else if (!sat || src == 0) {
+        return 0;
+    }
+
+    *sat = 1;
+    return MAKE_64BIT_MASK(0, 48);
+}
+
+uint64_t HELPER(mve_sqrshrl48)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_sqrshl48_d(n, -(int8_t)shift, true, &env->QF);
+}
+
+uint64_t HELPER(mve_uqrshll48)(CPUARMState *env, uint64_t n, uint32_t shift)
+{
+    return do_uqrshl48_d(n, (int8_t)shift, true, &env->QF);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_URSHRL_ri(DisasContext *s, arg_mve_shl_ri *a)
     return do_mve_shl_ri(s, a, gen_urshr64_i64);
 }
 
+static bool do_mve_shl_rr(DisasContext *s, arg_mve_shl_rr *a, WideShiftFn *fn)
+{
+    TCGv_i64 rda;
+    TCGv_i32 rdalo, rdahi;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        /* Decode falls through to ORR/MOV UNPREDICTABLE handling */
+        return false;
+    }
+    if (a->rdahi == 15) {
+        /* These are a different encoding (SQSHL/SRSHR/UQSHL/URSHR) */
+        return false;
+    }
+    if (!dc_isar_feature(aa32_mve, s) ||
+        !arm_dc_feature(s, ARM_FEATURE_M_MAIN) ||
+        a->rdahi == 13 || a->rm == 13 || a->rm == 15 ||
+        a->rm == a->rdahi || a->rm == a->rdalo) {
+        /* These rdahi/rdalo/rm cases are UNPREDICTABLE; we choose to UNDEF */
+        unallocated_encoding(s);
+        return true;
+    }
+
+    rda = tcg_temp_new_i64();
+    rdalo = load_reg(s, a->rdalo);
+    rdahi = load_reg(s, a->rdahi);
+    tcg_gen_concat_i32_i64(rda, rdalo, rdahi);
+
+    /* The helper takes care of the sign-extension of the low 8 bits of Rm */
+    fn(rda, cpu_env, rda, cpu_R[a->rm]);
+
+    tcg_gen_extrl_i64_i32(rdalo, rda);
+    tcg_gen_extrh_i64_i32(rdahi, rda);
+    store_reg(s, a->rdalo, rdalo);
+    store_reg(s, a->rdahi, rdahi);
+    tcg_temp_free_i64(rda);
+
+    return true;
+}
+
+static bool trans_LSLL_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_ushll);
+}
+
+static bool trans_ASRL_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_sshrl);
+}
+
+static bool trans_UQRSHLL64_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll);
+}
+
+static bool trans_SQRSHRL64_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl);
+}
+
+static bool trans_UQRSHLL48_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_uqrshll48);
+}
+
+static bool trans_SQRSHRL48_rr(DisasContext *s, arg_mve_shl_rr *a)
+{
+    return do_mve_shl_rr(s, a, gen_helper_mve_sqrshrl48);
+}
+
 /*
  * Multiply and multiply accumulate
  */
-- 
2.20.1

Implement the MVE shifts by immediate, which perform shifts
on a single general-purpose register.

These patterns overlap with the long-shift-by-immediates,
so we have to rearrange the grouping a little here.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-18-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  3 ++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 31 ++++++++++++++-----
 target/arm/mve_helper.c | 10 ++++++
 target/arm/translate.c  | 68 +++++++++++++++++++++++++++++++++++++++--
 5 files changed, 104 insertions(+), 9 deletions(-)

Implement the MVE shifts by register, which perform
shifts on a single general-purpose register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-19-peter.maydell@linaro.org
---
 target/arm/helper-mve.h |  2 ++
 target/arm/translate.h  |  1 +
 target/arm/t32.decode   | 18 ++++++++++++++----
 target/arm/mve_helper.c | 10 ++++++++++
 target/arm/translate.c  | 30 ++++++++++++++++++++++++++++++
 5 files changed, 57 insertions(+), 4 deletions(-)