Series comparison

-[Qemu-devel] [PULL 00/21] target-arm queue
+[PULL 00/52] target-arm queue
-Arm queue -- mostly the first slice of my Musca patches.
+Big pullreq this week, though none of the new features are
 particularly earthshaking. Most of the bulk is from code cleanup
 patches from me or rth.
 thanks
 -- PMM
-The following changes since commit fc3dbb90f2eb069801bfb4cfe9cbc83cf9c5f4a9:
+The following changes since commit b651b80822fa8cb66ca30087ac7fbc75507ae5d2:
-  Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging (2019-02-21 13:09:33 +0000)
+  Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-5.0-pull-request' into staging (2020-02-20 17:35:42 +0000)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20190221
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200221
-for you to fetch changes up to 3733f80308d2a7f23f5e39b039e0547aba6c07f1:
+for you to fetch changes up to 270a679b3f950d7c4c600f324aab8bff292d0971:
-  hw/arm/armsse: Make 0x5... alias region work for per-CPU devices (2019-02-21 18:17:48 +0000)
+  target/arm: Add missing checks for fpsp_v2 (2020-02-21 12:54:25 +0000)
 ----------------------------------------------------------------
 target-arm queue:
- * Model the Arm "Musca" development boards: "musca-a" and "musca-b1"
+ * aspeed/scu: Implement chip ID register
- * Implement the ARMv8.3-JSConv extension
+ * hw/misc/iotkit-secctl: Fix writing to 'PPC Interrupt Clear' register
- * v8M MPU should use background region as default, not always
+ * mainstone: Make providing flash images non-mandatory
- * Stop unintentional sign extension in pmu_init
+ * z2: Make providing flash images non-mandatory
  * Fix failures to flush SVE high bits after AdvSIMD INS/ZIP/UZP/TRN/TBL/TBX/EXT
  * Minor performance improvement: spend less time recalculating hflags values
  * Code cleanup to isar_feature function tests
  * Implement ARMv8.1-PMU and ARMv8.4-PMU extensions
  * Bugfix: correct handling of PMCR_EL0.LC bit
  * Bugfix: correct definition of PMCRDP
  * Correctly implement ACTLR2, HACTLR2
  * allwinner: Wire up USB ports
  * Vectorize emulation of USHL, SSHL, PMUL*
  * xilinx_spips: Correct the number of dummy cycles for the FAST_READ_4 cmd
  * sh4: Fix PCI ISA IO memory subregion
  * Code cleanup to use more isar_feature tests and fewer ARM_FEATURE_* tests
 ----------------------------------------------------------------
-Aaron Lindsay OS (1):
+Francisco Iglesias (1):
-      target/arm: Stop unintentional sign extension in pmu_init
+      xilinx_spips: Correct the number of dummy cycles for the FAST_READ_4 cmd
-Peter Maydell (16):
+Guenter Roeck (6):
-      hw/arm/armsse: Fix memory leak in error-exit path
+      mainstone: Make providing flash images non-mandatory
-      target/arm: v8M MPU should use background region as default, not always
+      z2: Make providing flash images non-mandatory
-      hw/misc/tz-ppc: Support having unused ports in the middle of the range
+      hw: usb: hcd-ohci: Move OHCISysBusState and TYPE_SYSBUS_OHCI to include file
-      hw/timer/pl031: Allow use as an embedded-struct device
+      hcd-ehci: Introduce "companion-enable" sysbus property
-      hw/timer/pl031: Convert to using trace events
+      arm: allwinner: Wire up USB ports
-      hw/char/pl011: Allow use as an embedded-struct device
+      sh4: Fix PCI ISA IO memory subregion
       hw/char/pl011: Support all interrupt lines
       hw/char/pl011: Use '0x' prefix when logging hex numbers
       hw/arm/armsse: Document SRAM_ADDR_WIDTH property in header comment
       hw/arm/armsse: Allow boards to specify init-svtor
       hw/arm/musca.c: Implement models of the Musca-A and -B1 boards
       hw/arm/musca: Add PPCs
       hw/arm/musca: Add MPCs
       hw/arm/musca: Wire up PL031 RTC
       hw/arm/musca: Wire up PL011 UARTs
       hw/arm/armsse: Make 0x5... alias region work for per-CPU devices
-Richard Henderson (4):
+Joel Stanley (2):
-      target/arm: Restructure disas_fp_int_conv
+      aspeed/scu: Create separate write callbacks
-      target/arm: Split out vfp_helper.c
+      aspeed/scu: Implement chip ID register
       target/arm: Rearrange Floating-point data-processing (2 regs)
       target/arm: Implement ARMv8.3-JSConv
- hw/arm/Makefile.objs            |    1 +
+Peter Maydell (21):
- target/arm/Makefile.objs        |    2 +-
+      target/arm: Add _aa32_ to isar_feature functions testing 32-bit ID registers
- include/hw/arm/armsse.h         |    7 +-
+      target/arm: Check aa32_pan in take_aarch32_exception(), not aa64_pan
- include/hw/char/pl011.h         |   34 ++
+      target/arm: Add isar_feature_any_fp16 and document naming/usage conventions
- include/hw/misc/tz-ppc.h        |    8 +-
+      target/arm: Define and use any_predinv isar_feature test
- include/hw/timer/pl031.h        |   44 ++
+      target/arm: Factor out PMU register definitions
- target/arm/cpu.h                |   10 +
+      target/arm: Add and use FIELD definitions for ID_AA64DFR0_EL1
- target/arm/helper.h             |    3 +
+      target/arm: Use FIELD macros for clearing ID_DFR0 PERFMON field
- hw/arm/armsse.c                 |   44 +-
+      target/arm: Define an aa32_pmu_8_1 isar feature test function
- hw/arm/musca.c                  |  669 ++++++++++++++++++++++
+      target/arm: Add _aa64_ and _any_ versions of pmu_8_1 isar checks
- hw/char/pl011.c                 |   81 +--
+      target/arm: Stop assuming DBGDIDR always exists
- hw/misc/tz-ppc.c                |   32 ++
+      target/arm: Move DBGDIDR into ARMISARegisters
- hw/timer/pl031.c                |   80 ++-
+      target/arm: Read debug-related ID registers from KVM
- target/arm/cpu.c                |    1 +
+      target/arm: Implement ARMv8.1-PMU extension
- target/arm/cpu64.c              |    2 +
+      target/arm: Implement ARMv8.4-PMU extension
- target/arm/helper.c             | 1072 +----------------------------------
+      target/arm: Provide ARMv8.4-PMU in '-cpu max'
- target/arm/translate-a64.c      |  120 ++--
+      target/arm: Correct definition of PMCRDP
- target/arm/translate.c          |  237 ++++----
+      target/arm: Correct handling of PMCR_EL0.LC bit
- target/arm/vfp_helper.c         | 1176 +++++++++++++++++++++++++++++++++++++++
+      target/arm: Test correct register in aa32_pan and aa32_ats1e1 checks
- MAINTAINERS                     |    7 +
+      target/arm: Use isar_feature function for testing AA32HPD feature
- default-configs/arm-softmmu.mak |    1 +
+      target/arm: Use FIELD_EX32 for testing 32-bit fields
- hw/timer/trace-events           |    6 +
+      target/arm: Correctly implement ACTLR2, HACTLR2
 files changed, 2307 insertions(+), 1330 deletions(-)
  create mode 100644 include/hw/timer/pl031.h
  create mode 100644 hw/arm/musca.c
  create mode 100644 target/arm/vfp_helper.c
+Philippe Mathieu-Daudé (1):
+      hw/misc/iotkit-secctl: Fix writing to 'PPC Interrupt Clear' register
+Richard Henderson (21):
+      target/arm: Flush high bits of sve register after AdvSIMD EXT
+      target/arm: Flush high bits of sve register after AdvSIMD TBL/TBX
+      target/arm: Flush high bits of sve register after AdvSIMD ZIP/UZP/TRN
+      target/arm: Flush high bits of sve register after AdvSIMD INS
+      target/arm: Use bit 55 explicitly for pauth
+      target/arm: Fix select for aa64_va_parameters_both
+      target/arm: Remove ttbr1_valid check from get_phys_addr_lpae
+      target/arm: Split out aa64_va_parameter_tbi, aa64_va_parameter_tbid
+      target/arm: Vectorize USHL and SSHL
+      target/arm: Convert PMUL.8 to gvec
+      target/arm: Convert PMULL.64 to gvec
+      target/arm: Convert PMULL.8 to gvec
+      target/arm: Rename isar_feature_aa32_simd_r32
+      target/arm: Use isar_feature_aa32_simd_r32 more places
+      target/arm: Set MVFR0.FPSP for ARMv5 cpus
+      target/arm: Add isar_feature_aa32_simd_r16
+      target/arm: Rename isar_feature_aa32_fpdp_v2
+      target/arm: Add isar_feature_aa32_{fpsp_v2, fpsp_v3, fpdp_v3}
+      target/arm: Perform fpdp_v2 check first
+      target/arm: Replace ARM_FEATURE_VFP3 checks with fp{sp, dp}_v3
+      target/arm: Add missing checks for fpsp_v2
+ hw/usb/hcd-ohci.h              |  16 ++
+ include/hw/arm/allwinner-a10.h |   6 +
+ target/arm/cpu.h               | 173 ++++++++++++---
+ target/arm/helper-sve.h        |   2 +
+ target/arm/helper.h            |  21 +-
+ target/arm/internals.h         |  47 +++-
+ target/arm/translate.h         |   6 +
+ hw/arm/allwinner-a10.c         |  43 ++++
+ hw/arm/mainstone.c             |  11 +-
+ hw/arm/z2.c                    |   6 -
+ hw/intc/armv7m_nvic.c          |  30 +--
+ hw/misc/aspeed_scu.c           |  93 ++++++--
+ hw/misc/iotkit-secctl.c        |   2 +-
+ hw/sh4/sh_pci.c                |  11 +-
+ hw/ssi/xilinx_spips.c          |   2 +-
+ hw/usb/hcd-ehci-sysbus.c       |   2 +
+ hw/usb/hcd-ohci.c              |  15 --
+ linux-user/arm/signal.c        |   4 +-
+ linux-user/elfload.c           |   4 +-
+ target/arm/arch_dump.c         |  11 +-
+ target/arm/cpu.c               | 175 +++++++--------
+ target/arm/cpu64.c             |  58 +++--
+ target/arm/debug_helper.c      |   6 +-
+ target/arm/helper.c            | 472 +++++++++++++++++++++++------------------
+ target/arm/kvm32.c             |  25 +++
+ target/arm/kvm64.c             |  46 ++++
+ target/arm/m_helper.c          |  11 +-
+ target/arm/machine.c           |   3 +-
+ target/arm/neon_helper.c       | 117 ----------
+ target/arm/pauth_helper.c      |   3 +-
+ target/arm/translate-a64.c     |  92 ++++----
+ target/arm/translate-vfp.inc.c | 263 ++++++++++++++---------
+ target/arm/translate.c         | 356 ++++++++++++++++++++++++++-----
+ target/arm/vec_helper.c        | 211 ++++++++++++++++++
+ target/arm/vfp_helper.c        |   2 +-
+files changed, 1564 insertions(+), 781 deletions(-)

-New patch
+[PULL 01/52] aspeed/scu: Create separate write callbacks
+From: Joel Stanley <joel@jms.id.au>
+This splits the common write callback into separate ast2400 and ast2500
+implementations. This makes it clearer when implementing differing
+behaviour.
+Signed-off-by: Joel Stanley <joel@jms.id.au>
+Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
+Reviewed-by: Cédric Le Goater <clg@kaod.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Message-id: 20200121013302.43839-2-joel@jms.id.au
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/misc/aspeed_scu.c | 80 +++++++++++++++++++++++++++++++-------------
+file changed, 57 insertions(+), 23 deletions(-)
+diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/misc/aspeed_scu.c
++++ b/hw/misc/aspeed_scu.c
+@@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_scu_read(void *opaque, hwaddr offset, unsigned size)
+     return s->regs[reg];
+ }
+-static void aspeed_scu_write(void *opaque, hwaddr offset, uint64_t data,
+-                             unsigned size)
++static void aspeed_ast2400_scu_write(void *opaque, hwaddr offset,
++                                     uint64_t data, unsigned size)
++{
++    AspeedSCUState *s = ASPEED_SCU(opaque);
++    int reg = TO_REG(offset);
++
++    if (reg >= ASPEED_SCU_NR_REGS) {
++        qemu_log_mask(LOG_GUEST_ERROR,
++                      "%s: Out-of-bounds write at offset 0x%" HWADDR_PRIx "\n",
++                      __func__, offset);
++        return;
++    }
++
++    if (reg > PROT_KEY && reg < CPU2_BASE_SEG1 &&
++            !s->regs[PROT_KEY]) {
++        qemu_log_mask(LOG_GUEST_ERROR, "%s: SCU is locked!\n", __func__);
++    }
++
++    trace_aspeed_scu_write(offset, size, data);
++
++    switch (reg) {
++    case PROT_KEY:
++        s->regs[reg] = (data == ASPEED_SCU_PROT_KEY) ? 1 : 0;
++        return;
++    case SILICON_REV:
++    case FREQ_CNTR_EVAL:
++    case VGA_SCRATCH1 ... VGA_SCRATCH8:
++    case RNG_DATA:
++    case FREE_CNTR4:
++    case FREE_CNTR4_EXT:
++        qemu_log_mask(LOG_GUEST_ERROR,
++                      "%s: Write to read-only offset 0x%" HWADDR_PRIx "\n",
++                      __func__, offset);
++        return;
++    }
++
++    s->regs[reg] = data;
++}
++
++static void aspeed_ast2500_scu_write(void *opaque, hwaddr offset,
++                                     uint64_t data, unsigned size)
+ {
+     AspeedSCUState *s = ASPEED_SCU(opaque);
+     int reg = TO_REG(offset);
+@@ -XXX,XX +XXX,XX @@ static void aspeed_scu_write(void *opaque, hwaddr offset, uint64_t data,
+     case PROT_KEY:
+         s->regs[reg] = (data == ASPEED_SCU_PROT_KEY) ? 1 : 0;
+         return;
+-    case CLK_SEL:
+-        s->regs[reg] = data;
+-        break;
+     case HW_STRAP1:
+-        if (ASPEED_IS_AST2500(s->regs[SILICON_REV])) {
+-            s->regs[HW_STRAP1] |= data;
+-            return;
+-        }
+-        /* Jump to assignment below */
+-        break;
++        s->regs[HW_STRAP1] |= data;
++        return;
+     case SILICON_REV:
+-        if (ASPEED_IS_AST2500(s->regs[SILICON_REV])) {
+-            s->regs[HW_STRAP1] &= ~data;
+-        } else {
+-            qemu_log_mask(LOG_GUEST_ERROR,
+-                          "%s: Write to read-only offset 0x%" HWADDR_PRIx "\n",
+-                          __func__, offset);
+-        }
+-        /* Avoid assignment below, we've handled everything */
++        s->regs[HW_STRAP1] &= ~data;
+         return;
+     case FREQ_CNTR_EVAL:
+     case VGA_SCRATCH1 ... VGA_SCRATCH8:
+@@ -XXX,XX +XXX,XX @@ static void aspeed_scu_write(void *opaque, hwaddr offset, uint64_t data,
+     s->regs[reg] = data;
+ }
+-static const MemoryRegionOps aspeed_scu_ops = {
++static const MemoryRegionOps aspeed_ast2400_scu_ops = {
+     .read = aspeed_scu_read,
+-    .write = aspeed_scu_write,
++    .write = aspeed_ast2400_scu_write,
++    .endianness = DEVICE_LITTLE_ENDIAN,
++    .valid.min_access_size = 4,
++    .valid.max_access_size = 4,
++    .valid.unaligned = false,
++};
++
++static const MemoryRegionOps aspeed_ast2500_scu_ops = {
++    .read = aspeed_scu_read,
++    .write = aspeed_ast2500_scu_write,
+     .endianness = DEVICE_LITTLE_ENDIAN,
+     .valid.min_access_size = 4,
+     .valid.max_access_size = 4,
+@@ -XXX,XX +XXX,XX @@ static void aspeed_2400_scu_class_init(ObjectClass *klass, void *data)
+     asc->calc_hpll = aspeed_2400_scu_calc_hpll;
+     asc->apb_divider = 2;
+     asc->nr_regs = ASPEED_SCU_NR_REGS;
+-    asc->ops = &aspeed_scu_ops;
++    asc->ops = &aspeed_ast2400_scu_ops;
+ }
+ static const TypeInfo aspeed_2400_scu_info = {
+@@ -XXX,XX +XXX,XX @@ static void aspeed_2500_scu_class_init(ObjectClass *klass, void *data)
+     asc->calc_hpll = aspeed_2500_scu_calc_hpll;
+     asc->apb_divider = 4;
+     asc->nr_regs = ASPEED_SCU_NR_REGS;
+-    asc->ops = &aspeed_scu_ops;
++    asc->ops = &aspeed_ast2500_scu_ops;
+ }
+ static const TypeInfo aspeed_2500_scu_info = {
+--
+.20.1
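
The behavioural split made by this patch can be sketched in isolation. This is a minimal standalone model, not the QEMU code: `DemoScu`, the `REG_*` indices, `DEMO_PROT_KEY` and both `demo_*_write` functions are hypothetical names chosen for illustration, and logging, bounds checks and the remaining read-only registers are omitted. It only shows why separate callbacks read more clearly than one callback that tests the silicon revision: on the ast2500 model a write to HW_STRAP1 sets bits and a write to SILICON_REV clears bits in HW_STRAP1, while on the ast2400 model HW_STRAP1 is a plain register and SILICON_REV is read-only.

```c
#include <stdint.h>

/* Hypothetical register indices, for illustration only. */
enum { REG_PROT_KEY, REG_SILICON_REV, REG_HW_STRAP1, REG_COUNT };

/* Placeholder unlock value; an assumption, not taken from the patch. */
#define DEMO_PROT_KEY 0x12345678u

typedef struct {
    uint32_t regs[REG_COUNT];
} DemoScu;

/* ast2400-style write: HW_STRAP1 is stored directly, SILICON_REV is read-only. */
void demo_ast2400_write(DemoScu *s, int reg, uint32_t data)
{
    switch (reg) {
    case REG_PROT_KEY:
        s->regs[reg] = (data == DEMO_PROT_KEY) ? 1 : 0;
        return;
    case REG_SILICON_REV:
        /* Read-only: the write is dropped. */
        return;
    }
    s->regs[reg] = data;
}

/* ast2500-style write: HW_STRAP1 is write-to-set, SILICON_REV clears strap bits. */
void demo_ast2500_write(DemoScu *s, int reg, uint32_t data)
{
    switch (reg) {
    case REG_PROT_KEY:
        s->regs[reg] = (data == DEMO_PROT_KEY) ? 1 : 0;
        return;
    case REG_HW_STRAP1:
        s->regs[REG_HW_STRAP1] |= data;
        return;
    case REG_SILICON_REV:
        s->regs[REG_HW_STRAP1] &= ~data;
        return;
    }
    s->regs[reg] = data;
}
```

With this model, two successive HW_STRAP1 writes overwrite each other on the ast2400 variant but accumulate on the ast2500 variant, which is the difference the original single callback expressed through ASPEED_IS_AST2500() checks.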

-New patch
+[PULL 02/52] aspeed/scu: Implement chip ID register
+From: Joel Stanley <joel@jms.id.au>
+This returns a fixed but non-zero value for the chip id.
+Signed-off-by: Joel Stanley <joel@jms.id.au>
+Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
+Reviewed-by: Cédric Le Goater <clg@kaod.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Message-id: 20200121013302.43839-3-joel@jms.id.au
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/misc/aspeed_scu.c | 13 +++++++++++++
+file changed, 13 insertions(+)
+diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/misc/aspeed_scu.c
++++ b/hw/misc/aspeed_scu.c
+@@ -XXX,XX +XXX,XX @@
+ #define CPU2_BASE_SEG4       TO_REG(0x110)
+ #define CPU2_BASE_SEG5       TO_REG(0x114)
+ #define CPU2_CACHE_CTRL      TO_REG(0x118)
++#define CHIP_ID0             TO_REG(0x150)
++#define CHIP_ID1             TO_REG(0x154)
+ #define UART_HPLL_CLK        TO_REG(0x160)
+ #define PCIE_CTRL            TO_REG(0x180)
+ #define BMC_MMIO_CTRL        TO_REG(0x184)
+@@ -XXX,XX +XXX,XX @@
+ #define AST2600_HW_STRAP2_PROT    TO_REG(0x518)
+ #define AST2600_RNG_CTRL          TO_REG(0x524)
+ #define AST2600_RNG_DATA          TO_REG(0x540)
++#define AST2600_CHIP_ID0          TO_REG(0x5B0)
++#define AST2600_CHIP_ID1          TO_REG(0x5B4)
+ #define AST2600_CLK TO_REG(0x40)
+@@ -XXX,XX +XXX,XX @@ static const uint32_t ast2500_a1_resets[ASPEED_SCU_NR_REGS] = {
+      [CPU2_BASE_SEG1]  = 0x80000000U,
+      [CPU2_BASE_SEG4]  = 0x1E600000U,
+      [CPU2_BASE_SEG5]  = 0xC0000000U,
++     [CHIP_ID0]        = 0x1234ABCDU,
++     [CHIP_ID1]        = 0x88884444U,
+      [UART_HPLL_CLK]   = 0x00001903U,
+      [PCIE_CTRL]       = 0x0000007BU,
+      [BMC_DEV_ID]      = 0x00002402U
+@@ -XXX,XX +XXX,XX @@ static void aspeed_ast2500_scu_write(void *opaque, hwaddr offset,
+     case RNG_DATA:
+     case FREE_CNTR4:
+     case FREE_CNTR4_EXT:
++    case CHIP_ID0:
++    case CHIP_ID1:
+         qemu_log_mask(LOG_GUEST_ERROR,
+                       "%s: Write to read-only offset 0x%" HWADDR_PRIx "\n",
+                       __func__, offset);
+@@ -XXX,XX +XXX,XX @@ static void aspeed_ast2600_scu_write(void *opaque, hwaddr offset,
+     case AST2600_RNG_DATA:
+     case AST2600_SILICON_REV:
+     case AST2600_SILICON_REV2:
++    case AST2600_CHIP_ID0:
++    case AST2600_CHIP_ID1:
+         /* Add read only registers here */
+         qemu_log_mask(LOG_GUEST_ERROR,
+                       "%s: Write to read-only offset 0x%" HWADDR_PRIx "\n",
+@@ -XXX,XX +XXX,XX @@ static const uint32_t ast2600_a0_resets[ASPEED_AST2600_SCU_NR_REGS] = {
+     [AST2600_CLK_STOP_CTRL2]    = 0xFFF0FFF0,
+     [AST2600_SDRAM_HANDSHAKE]   = 0x00000040,  /* SoC completed DRAM init */
+     [AST2600_HPLL_PARAM]        = 0x1000405F,
++    [AST2600_CHIP_ID0]          = 0x1234ABCD,
++    [AST2600_CHIP_ID1]          = 0x88884444,
++
+ };
+ static void aspeed_ast2600_scu_reset(DeviceState *dev)
+--
+.20.1

-New patch
+[PULL 03/52] hw/misc/iotkit-secctl: Fix writing to 'PPC Interrupt Clear' register
+From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Fix warning reported by Clang static code analyzer:
+    CC      hw/misc/iotkit-secctl.o
+  hw/misc/iotkit-secctl.c:343:9: warning: Value stored to 'value' is never read
+          value &= 0x00f000f3;
+          ^        ~~~~~~~~~~
+Fixes: b3717c23e1c
+Reported-by: Clang Static Analyzer
+Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20200217132922.24607-1-f4bug@amsat.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/misc/iotkit-secctl.c | 2 +-
+file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/hw/misc/iotkit-secctl.c b/hw/misc/iotkit-secctl.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/misc/iotkit-secctl.c
++++ b/hw/misc/iotkit-secctl.c
+@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_write(void *opaque, hwaddr addr,
+         qemu_set_irq(s->sec_resp_cfg, s->secrespcfg);
+         break;
+     case A_SECPPCINTCLR:
+-        value &= 0x00f000f3;
++        s->secppcintstat &= ~(value & 0x00f000f3);
+         foreach_ppc(s, iotkit_secctl_ppc_update_irq_clear);
+         break;
+     case A_SECPPCINTEN:
+--
+.20.1
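
The fix turns a dead store into a write-one-to-clear update of the status register. A minimal standalone sketch of that update follows; `demo_w1c_write` and `DEMO_W1C_MASK` are hypothetical names, and only the mask value and the expression shape are taken from the patch.

```c
#include <stdint.h>

/* Writable bits of the clear register; value as used in the patch. */
#define DEMO_W1C_MASK 0x00f000f3u

/*
 * Return the new status after a write-one-to-clear access: every bit set
 * in 'value' (and permitted by the mask) is cleared in 'status'; all
 * other status bits are left untouched.
 */
uint32_t demo_w1c_write(uint32_t status, uint32_t value)
{
    return status & ~(value & DEMO_W1C_MASK);
}
```

Bits outside the mask are ignored, so writing 0x0000000c leaves a status of 0x00f000f3 unchanged, while writing 0x00000001 clears only bit 0.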

-New patch
+[PULL 04/52] mainstone: Make providing flash images non-mandatory
+From: Guenter Roeck <linux@roeck-us.net>
+Up to now, the mainstone machine only boots if two flash images are
+provided. This is not really necessary; the machine can boot from initrd
+or from SD without it. At the same time, having to provide dummy flash

+images is a nuisance and does not add any real value. Make it optional.
+Signed-off-by: Guenter Roeck <linux@roeck-us.net>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Message-id: 20200217210824.18513-1-linux@roeck-us.net
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/arm/mainstone.c | 11 +----------
+file changed, 1 insertion(+), 10 deletions(-)
+diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/arm/mainstone.c
++++ b/hw/arm/mainstone.c
+@@ -XXX,XX +XXX,XX @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
+     /* There are two 32MiB flash devices on the board */
+     for (i = 0; i < 2; i ++) {
+         dinfo = drive_get(IF_PFLASH, 0, i);
+-        if (!dinfo) {
+-            if (qtest_enabled()) {
+-                break;
+-            }
+-            error_report("Two flash images must be given with the "
+-                         "'pflash' parameter");
+-            exit(1);
+-        }
+-
+         if (!pflash_cfi01_register(mainstone_flash_base[i],
+                                    i ? "mainstone.flash1" : "mainstone.flash0",
+                                    MAINSTONE_FLASH,
+-                                   blk_by_legacy_dinfo(dinfo),
++                                   dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
+                                    sector_len, 4, 0, 0, 0, 0, be)) {
+             error_report("Error registering flash memory");
+             exit(1);
+--
+.20.1

-New patch
+[PULL 05/52] z2: Make providing flash images non-mandatory
+From: Guenter Roeck <linux@roeck-us.net>
+Up to now, the z2 machine only boots if a flash image is provided.
+This is not really necessary; the machine can boot from initrd or from
+SD without it. At the same time, having to provide dummy flash images
+is a nuisance and does not add any real value. Make it optional.
+Signed-off-by: Guenter Roeck <linux@roeck-us.net>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Message-id: 20200217210903.18602-1-linux@roeck-us.net
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/arm/z2.c | 6 ------
+file changed, 6 deletions(-)
+diff --git a/hw/arm/z2.c b/hw/arm/z2.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/arm/z2.c
++++ b/hw/arm/z2.c
+@@ -XXX,XX +XXX,XX @@ static void z2_init(MachineState *machine)
+     be = 0;
+ #endif
+     dinfo = drive_get(IF_PFLASH, 0, 0);
+-    if (!dinfo && !qtest_enabled()) {
+-        error_report("Flash image must be given with the "
+-                     "'pflash' parameter");
+-        exit(1);
+-    }
+-
+     if (!pflash_cfi01_register(Z2_FLASH_BASE, "z2.flash0", Z2_FLASH_SIZE,
+                                dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
+                                sector_len, 4, 0, 0, 0, 0, be)) {
+--
+.20.1

-New patch
+[PULL 06/52] target/arm: Flush high bits of sve register after AdvSIMD EXT
+From: Richard Henderson <richard.henderson@linaro.org>
+Writes to AdvSIMD registers flush the bits above 128.
+Buglink: https://bugs.launchpad.net/bugs/1863247
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214194643.23317-2-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/translate-a64.c | 1 +
+file changed, 1 insertion(+)
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a64.c
++++ b/target/arm/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_ext(DisasContext *s, uint32_t insn)
+     tcg_temp_free_i64(tcg_resl);
+     write_vec_element(s, tcg_resh, rd, 1, MO_64);
+     tcg_temp_free_i64(tcg_resh);
++    clear_vec_high(s, true, rd);
+ }
+ /* TBL/TBX
+--
+.20.1
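
The rule behind this patch and the three that follow it is that a write to a 128-bit AdvSIMD register zeroes whatever part of the wider SVE register lies above bit 127. A minimal standalone model of that rule, assuming a hypothetical 256-bit vector length; `DemoZReg`, `DEMO_ZREG_WORDS` and `demo_advsimd_write128` are illustrative names and are not QEMU's clear_vec_high():

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical vector length: 4 x 64 bits = 256 bits. */
#define DEMO_ZREG_WORDS 4

typedef struct {
    uint64_t d[DEMO_ZREG_WORDS];
} DemoZReg;

/*
 * Model a full 128-bit AdvSIMD write: store the two low 64-bit lanes,
 * then zero every lane above bit 127 so no stale SVE data survives.
 */
void demo_advsimd_write128(DemoZReg *z, uint64_t lo, uint64_t hi)
{
    z->d[0] = lo;
    z->d[1] = hi;
    memset(&z->d[2], 0, sizeof(z->d) - 2 * sizeof(z->d[0]));
}
```

Omitting the final clearing step is the class of bug being fixed here: the two low lanes would be correct, but the upper lanes would keep their previous contents.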

-New patch
+[PULL 07/52] target/arm: Flush high bits of sve register after AdvSIMD TBL/TBX
+From: Richard Henderson <richard.henderson@linaro.org>
+Writes to AdvSIMD registers flush the bits above 128.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214194643.23317-3-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/translate-a64.c | 1 +
+file changed, 1 insertion(+)
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a64.c
++++ b/target/arm/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_tb(DisasContext *s, uint32_t insn)
+     tcg_temp_free_i64(tcg_resl);
+     write_vec_element(s, tcg_resh, rd, 1, MO_64);
+     tcg_temp_free_i64(tcg_resh);
++    clear_vec_high(s, true, rd);
+ }
+ /* ZIP/UZP/TRN
+--
+.20.1

-New patch
+[PULL 08/52] target/arm: Flush high bits of sve register after AdvSIMD ZIP/UZP/TRN
+From: Richard Henderson <richard.henderson@linaro.org>
+Writes to AdvSIMD registers flush the bits above 128.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214194643.23317-4-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/translate-a64.c | 1 +
+file changed, 1 insertion(+)
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a64.c
++++ b/target/arm/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_zip_trn(DisasContext *s, uint32_t insn)
+     tcg_temp_free_i64(tcg_resl);
+     write_vec_element(s, tcg_resh, rd, 1, MO_64);
+     tcg_temp_free_i64(tcg_resh);
++    clear_vec_high(s, true, rd);
+ }
+ /*
+--
+.20.1

-[Qemu-devel] [PULL 04/21] target/arm: Restructure disas_fp_int_conv
+[PULL 09/52] target/arm: Flush high bits of sve register after AdvSIMD INS
 From: Richard Henderson <richard.henderson@linaro.org>
-For opcodes 0-5, move some if conditions into the structure
+Writes to AdvSIMD registers flush the bits above 128.
 of a switch statement.  For opcodes 6 & 7, decode everything
 at once with a second switch.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190215192302.27855-2-richard.henderson@linaro.org
+Message-id: 20200214194643.23317-5-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 94 ++++++++++++++++++++------------------
+ target/arm/translate-a64.c | 6 ++++++
-file changed, 49 insertions(+), 45 deletions(-)
+file changed, 6 insertions(+)
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void handle_simd_inse(DisasContext *s, int rd, int rn,
-     int type = extract32(insn, 22, 2);
+     write_vec_element(s, tmp, rd, dst_index, size);
-     bool sbit = extract32(insn, 29, 1);
-     bool sf = extract32(insn, 31, 1);
+     tcg_temp_free_i64(tmp);
 +    bool itof = false;
      if (sbit) {
 -        unallocated_encoding(s);
 -        return;
 +        goto do_unallocated;
      }
 -    if (opcode > 5) {
 -        /* FMOV */
 -        bool itof = opcode & 1;
 -
 -        if (rmode >= 2) {
 -            unallocated_encoding(s);
 -            return;
 -        }
 -
 -        switch (sf << 3 | type << 1 | rmode) {
 -        case 0x0: /* 32 bit */
 -        case 0xa: /* 64 bit */
 -        case 0xd: /* 64 bit to top half of quad */
 -            break;
 -        case 0x6: /* 16-bit float, 32-bit int */
 -        case 0xe: /* 16-bit float, 64-bit int */
 -            if (dc_isar_feature(aa64_fp16, s)) {
 -                break;
 -            }
 -            /* fallthru */
 -        default:
 -            /* all other sf/type/rmode combinations are invalid */
 -            unallocated_encoding(s);
 -            return;
 -        }
 -
 -        if (!fp_access_check(s)) {
 -            return;
 -        }
 -        handle_fmov(s, rd, rn, type, itof);
 -    } else {
 -        /* actual FP conversions */
 -        bool itof = extract32(opcode, 1, 1);
 -
 -        if (rmode != 0 && opcode > 1) {
 -            unallocated_encoding(s);
 -            return;
 +    switch (opcode) {
 +    case 2: /* SCVTF */
 +    case 3: /* UCVTF */
 +        itof = true;
 +        /* fallthru */
 +    case 4: /* FCVTAS */
 +    case 5: /* FCVTAU */
 +        if (rmode != 0) {
 +            goto do_unallocated;
          }
 +        /* fallthru */
 +    case 0: /* FCVT[NPMZ]S */
 +    case 1: /* FCVT[NPMZ]U */
          switch (type) {
          case 0: /* float32 */
          case 1: /* float64 */
              break;
          case 3: /* float16 */
 -            if (dc_isar_feature(aa64_fp16, s)) {
 -                break;
 +            if (!dc_isar_feature(aa64_fp16, s)) {
 +                goto do_unallocated;
              }
 -            /* fallthru */
 +            break;
          default:
 -            unallocated_encoding(s);
 -            return;
 +            goto do_unallocated;
          }
 -
          if (!fp_access_check(s)) {
              return;
          }
          handle_fpfpcvt(s, rd, rn, opcode, itof, rmode, 64, sf, type);
 +        break;
 +
-+    default:
++    /* INS is considered a 128-bit write for SVE. */
-+        switch (sf << 7 | type << 5 | rmode << 3 | opcode) {
++    clear_vec_high(s, true, rd);
-+        case 0b01100110: /* FMOV half <-> 32-bit int */
+ }
-+        case 0b01100111:
-+        case 0b11100110: /* FMOV half <-> 64-bit int */
-+        case 0b11100111:
+@@ -XXX,XX +XXX,XX @@ static void handle_simd_insg(DisasContext *s, int rd, int rn, int imm5)
-+            if (!dc_isar_feature(aa64_fp16, s)) {
-+                goto do_unallocated;
+     idx = extract32(imm5, 1 + size, 4 - size);
-+            }
+     write_vec_element(s, cpu_reg(s, rn), rd, idx, size);
 +            /* fallthru */
 +        case 0b00000110: /* FMOV 32-bit */
 +        case 0b00000111:
 +        case 0b10100110: /* FMOV 64-bit */
 +        case 0b10100111:
 +        case 0b11001110: /* FMOV top half of 128-bit */
 +        case 0b11001111:
 +            if (!fp_access_check(s)) {
 +                return;
 +            }
 +            itof = opcode & 1;
 +            handle_fmov(s, rd, rn, type, itof);
 +            break;
 +
-+        default:
++    /* INS is considered a 128-bit write for SVE. */
-+        do_unallocated:
++    clear_vec_high(s, true, rd);
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        break;
      }
  }
+ /*
 --
 .20.1

-New patch
+[PULL 10/52] target/arm: Use bit 55 explicitly for pauth
+From: Richard Henderson <richard.henderson@linaro.org>
+The pseudocode in aarch64/functions/pac/auth/Auth and
+aarch64/functions/pac/strip/Strip always uses bit 55 for
+extfield and does not consider whether the current regime has 2 ranges.
+Suggested-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200216194343.21331-2-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/pauth_helper.c | 3 ++-
+file changed, 2 insertions(+), 1 deletion(-)
+diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/pauth_helper.c
++++ b/target/arm/pauth_helper.c
+@@ -XXX,XX +XXX,XX @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
+ static uint64_t pauth_original_ptr(uint64_t ptr, ARMVAParameters param)
+ {
+-    uint64_t extfield = -param.select;
++    /* Note that bit 55 is used whether or not the regime has 2 ranges. */
++    uint64_t extfield = sextract64(ptr, 55, 1);
+     int bot_pac_bit = 64 - param.tsz;
+     int top_pac_bit = 64 - 8 * param.tbi;
+--
+.20.1
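
The new extfield computation replicates bit 55 of the pointer across all 64 bits. A minimal standalone sketch of the value it produces, written without QEMU's bitops helpers; `demo_extfield` is a hypothetical name, and the unsigned negation is one portable way to get the same result as a one-bit sign extraction:

```c
#include <stdint.h>

/*
 * Replicate bit 55 of 'ptr' into every bit of the result:
 * bit 55 clear -> 0, bit 55 set -> 0xffffffffffffffff.
 */
uint64_t demo_extfield(uint64_t ptr)
{
    uint64_t bit55 = (ptr >> 55) & 1;

    /* Unsigned negation is well defined: -0 == 0, -1 == all ones. */
    return -bit55;
}
```

Because the result depends only on the pointer itself, it is the same whether or not the translation regime has two address ranges, which is the point of the change.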

-New patch
+[PULL 11/52] target/arm: Fix select for aa64_va_parameters_both
+From: Richard Henderson <richard.henderson@linaro.org>
+Select should always be 0 for a regime with one range.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200216194343.21331-3-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/helper.c | 46 +++++++++++++++++++++++----------------------
+ 1 file changed, 24 insertions(+), 22 deletions(-)
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
+     bool tbi, tbid, epd, hpd, using16k, using64k;
+     int select, tsz;
+-    /*
+-     * Bit 55 is always between the two regions, and is canonical for
+-     * determining if address tagging is enabled.
+-     */
+-    select = extract64(va, 55, 1);
+-
+     if (!regime_has_2_ranges(mmu_idx)) {
++        select = 0;
+         tsz = extract32(tcr, 0, 6);
+         using64k = extract32(tcr, 14, 1);
+         using16k = extract32(tcr, 15, 1);
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
+             tbid = extract32(tcr, 29, 1);
+         }
+         epd = false;
+-    } else if (!select) {
+-        tsz = extract32(tcr, 0, 6);
+-        epd = extract32(tcr, 7, 1);
+-        using64k = extract32(tcr, 14, 1);
+-        using16k = extract32(tcr, 15, 1);
+-        tbi = extract64(tcr, 37, 1);
+-        hpd = extract64(tcr, 41, 1);
+-        tbid = extract64(tcr, 51, 1);
+     } else {
+-        int tg = extract32(tcr, 30, 2);
+-        using16k = tg == 1;
+-        using64k = tg == 3;
+-        tsz = extract32(tcr, 16, 6);
+-        epd = extract32(tcr, 23, 1);
+-        tbi = extract64(tcr, 38, 1);
+-        hpd = extract64(tcr, 42, 1);
+-        tbid = extract64(tcr, 52, 1);
++        /*
++         * Bit 55 is always between the two regions, and is canonical for
++         * determining if address tagging is enabled.
++         */
++        select = extract64(va, 55, 1);
++        if (!select) {
++            tsz = extract32(tcr, 0, 6);
++            epd = extract32(tcr, 7, 1);
++            using64k = extract32(tcr, 14, 1);
++            using16k = extract32(tcr, 15, 1);
++            tbi = extract64(tcr, 37, 1);
++            hpd = extract64(tcr, 41, 1);
++            tbid = extract64(tcr, 51, 1);
++        } else {
++            int tg = extract32(tcr, 30, 2);
++            using16k = tg == 1;
++            using64k = tg == 3;
++            tsz = extract32(tcr, 16, 6);
++            epd = extract32(tcr, 23, 1);
++            tbi = extract64(tcr, 38, 1);
++            hpd = extract64(tcr, 42, 1);
++            tbid = extract64(tcr, 52, 1);
++        }
+     }
+     tsz = MIN(tsz, 39);  /* TODO: ARMv8.4-TTST */
+     tsz = MAX(tsz, 16);  /* TODO: ARMv8.2-LVA  */
+--
+.20.1

-[Qemu-devel] [PULL 03/21] target/arm: Stop unintentional sign extension in pmu_init
+[PULL 12/52] target/arm: Remove ttbr1_valid check from get_phys_addr_lpae
-From: Aaron Lindsay OS <aaron@os.amperecomputing.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-This was introduced by
+Now that aa64_va_parameters_both sets select based on the number
-    commit bf8d09694ccc07487cd73d7562081fdaec3370c8
+of ranges in the regime, the ttbr1_valid check is redundant.
     target/arm: Don't clear supported PMU events when initializing PMCEID1
 and identified by Coverity (CID 1398645).
-Signed-off-by: Aaron Lindsay <aaron@os.amperecomputing.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reported-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200216194343.21331-4-richard.henderson@linaro.org
 Message-id: 20190219144621.450-1-aaron@os.amperecomputing.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 2 +-
+ target/arm/helper.c | 6 +-----
- 1 file changed, 1 insertion(+), 1 deletion(-)
+ 1 file changed, 1 insertion(+), 5 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu)
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
+     TCR *tcr = regime_tcr(env, mmu_idx);
-         if (cnt->supported(&cpu->env)) {
+     int ap, ns, xn, pxn;
-             supported_event_map[cnt->number] = i;
+     uint32_t el = regime_el(env, mmu_idx);
--            uint64_t event_mask = 1 << (cnt->number & 0x1f);
+-    bool ttbr1_valid;
-+            uint64_t event_mask = 1ULL << (cnt->number & 0x1f);
+     uint64_t descaddrmask;
-             if (cnt->number & 0x20) {
+     bool aarch64 = arm_el_is_aa64(env, el);
-                 cpu->pmceid1 |= event_mask;
+     bool guarded = false;
-             } else {
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
          param = aa64_va_parameters(env, address, mmu_idx,
                                     access_type != MMU_INST_FETCH);
          level = 0;
 -        ttbr1_valid = regime_has_2_ranges(mmu_idx);
          addrsize = 64 - 8 * param.tbi;
          inputsize = 64 - param.tsz;
      } else {
          param = aa32_va_parameters(env, address, mmu_idx);
          level = 1;
 -        /* There is no TTBR1 for EL2 */
 -        ttbr1_valid = (el != 2);
          addrsize = (mmu_idx == ARMMMUIdx_Stage2 ? 40 : 32);
          inputsize = addrsize - param.tsz;
      }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
      if (inputsize < addrsize) {
          target_ulong top_bits = sextract64(address, inputsize,
                                             addrsize - inputsize);
 -        if (-top_bits != param.select || (param.select && !ttbr1_valid)) {
 +        if (-top_bits != param.select) {
              /* The gap between the two regions is a Translation fault */
              fault_type = ARMFault_Translation;
              goto do_fault;
 --
 .20.1
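
The simplified gap check can be illustrated with a small standalone sketch. It assumes 0 < inputsize, mirrors the condition "-top_bits != param.select" from the hunk above, and uses hypothetical names (sketch_sextract64, sketch_in_va_gap) that do not exist in helper.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for a sign-extending bitfield extract. */
static int64_t sketch_sextract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (int64_t)(value << (64 - length - start)) >> (64 - length);
}

/*
 * Hypothetical helper: returns true when the address lies in the gap
 * between the lower and upper ranges, i.e. when the bits above
 * inputsize are not the sign extension expected for 'select'.
 * Assumes inputsize > 0 so that -top_bits cannot overflow.
 */
static bool sketch_in_va_gap(uint64_t address, int inputsize,
                             int addrsize, int select)
{
    if (inputsize >= addrsize) {
        return false;               /* no bits to check */
    }
    int64_t top_bits = sketch_sextract64(address, inputsize,
                                         addrsize - inputsize);
    /* All-zeros gives 0, all-ones gives 1, anything else mismatches. */
    return -top_bits != select;
}
```

Note that an all-ones upper address with select forced to 0 (a regime with one range) already fails this comparison, which is why the separate ttbr1_valid test is no longer needed.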

-New patch
+[PULL 13/52] target/arm: Split out aa64_va_parameter_tbi, aa64_va_parameter_tbid
+From: Richard Henderson <richard.henderson@linaro.org>
+For the purpose of rebuild_hflags_a64, we do not need to compute
+all of the va parameters, only tbi and tbid.  Moreover, we can compute
+them in a form that is more useful for storing in hflags.
+This eliminates the need for aa64_va_parameters_both, so fold that
+into aa64_va_parameters.  The remaining calls to aa64_va_parameters
+are in get_phys_addr_lpae and in pauth_helper.c.
+This reduces the total cpu consumption of aa64_va_parameters in a
+kernel boot plus a kvm guest kernel boot from 3% to 0.5%.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200216194343.21331-5-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/internals.h |  3 --
+ target/arm/helper.c    | 68 +++++++++++++++++++++++-------------------
+ 2 files changed, 37 insertions(+), 34 deletions(-)
+diff --git a/target/arm/internals.h b/target/arm/internals.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/internals.h
++++ b/target/arm/internals.h
+@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
+     unsigned tsz    : 8;
+     unsigned select : 1;
+     bool tbi        : 1;
+-    bool tbid       : 1;
+     bool epd        : 1;
+     bool hpd        : 1;
+     bool using16k   : 1;
+     bool using64k   : 1;
+ } ARMVAParameters;
+-ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
+-                                        ARMMMUIdx mmu_idx);
+ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
+                                    ARMMMUIdx mmu_idx, bool data);
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
+ }
+ #endif /* !CONFIG_USER_ONLY */
+-ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
+-                                        ARMMMUIdx mmu_idx)
++static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
++{
++    if (regime_has_2_ranges(mmu_idx)) {
++        return extract64(tcr, 37, 2);
++    } else if (mmu_idx == ARMMMUIdx_Stage2) {
++        return 0; /* VTCR_EL2 */
++    } else {
++        return extract32(tcr, 20, 1);
++    }
++}
++
++static int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx)
++{
++    if (regime_has_2_ranges(mmu_idx)) {
++        return extract64(tcr, 51, 2);
++    } else if (mmu_idx == ARMMMUIdx_Stage2) {
++        return 0; /* VTCR_EL2 */
++    } else {
++        return extract32(tcr, 29, 1);
++    }
++}
++
++ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
++                                   ARMMMUIdx mmu_idx, bool data)
+ {
+     uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
+-    bool tbi, tbid, epd, hpd, using16k, using64k;
+-    int select, tsz;
++    bool epd, hpd, using16k, using64k;
++    int select, tsz, tbi;
+     if (!regime_has_2_ranges(mmu_idx)) {
+         select = 0;
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
+         using16k = extract32(tcr, 15, 1);
+         if (mmu_idx == ARMMMUIdx_Stage2) {
+             /* VTCR_EL2 */
+-            tbi = tbid = hpd = false;
++            hpd = false;
+         } else {
+-            tbi = extract32(tcr, 20, 1);
+             hpd = extract32(tcr, 24, 1);
+-            tbid = extract32(tcr, 29, 1);
+         }
+         epd = false;
+     } else {
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
+             epd = extract32(tcr, 7, 1);
+             using64k = extract32(tcr, 14, 1);
+             using16k = extract32(tcr, 15, 1);
+-            tbi = extract64(tcr, 37, 1);
+             hpd = extract64(tcr, 41, 1);
+-            tbid = extract64(tcr, 51, 1);
+         } else {
+             int tg = extract32(tcr, 30, 2);
+             using16k = tg == 1;
+             using64k = tg == 3;
+             tsz = extract32(tcr, 16, 6);
+             epd = extract32(tcr, 23, 1);
+-            tbi = extract64(tcr, 38, 1);
+             hpd = extract64(tcr, 42, 1);
+-            tbid = extract64(tcr, 52, 1);
+         }
+     }
+     tsz = MIN(tsz, 39);  /* TODO: ARMv8.4-TTST */
+     tsz = MAX(tsz, 16);  /* TODO: ARMv8.2-LVA  */
++    /* Present TBI as a composite with TBID.  */
++    tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
++    if (!data) {
++        tbi &= ~aa64_va_parameter_tbid(tcr, mmu_idx);
++    }
++    tbi = (tbi >> select) & 1;
++
+     return (ARMVAParameters) {
+         .tsz = tsz,
+         .select = select,
+         .tbi = tbi,
+-        .tbid = tbid,
+         .epd = epd,
+         .hpd = hpd,
+         .using16k = using16k,
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
+     };
+ }
+-ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
+-                                   ARMMMUIdx mmu_idx, bool data)
+-{
+-    ARMVAParameters ret = aa64_va_parameters_both(env, va, mmu_idx);
+-
+-    /* Present TBI as a composite with TBID.  */
+-    ret.tbi &= (data || !ret.tbid);
+-    return ret;
+-}
+-
+ #ifndef CONFIG_USER_ONLY
+ static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
+                                           ARMMMUIdx mmu_idx)
+@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
+ {
+     uint32_t flags = rebuild_hflags_aprofile(env);
+     ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
+-    ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
++    uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
+     uint64_t sctlr;
+     int tbii, tbid;
+     flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
+     /* Get control bits for tagged addresses.  */
+-    if (regime_has_2_ranges(mmu_idx)) {
+-        ARMVAParameters p1 = aa64_va_parameters_both(env, -1, stage1);
+-        tbid = (p1.tbi << 1) | p0.tbi;
+-        tbii = tbid & ~((p1.tbid << 1) | p0.tbid);
+-    } else {
+-        tbid = p0.tbi;
+-        tbii = tbid & !p0.tbid;
+-    }
++    tbid = aa64_va_parameter_tbi(tcr, mmu_idx);
++    tbii = tbid & ~aa64_va_parameter_tbid(tcr, mmu_idx);
+     flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
+     flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
+--
+.20.1
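
The composite TBI computation for a regime with two ranges can be sketched as follows. The bit positions (TBI0/TBI1 at TCR bits 37/38, TBID0/TBID1 at bits 51/52) are taken from the hunks above; sketch_extract64() and sketch_effective_tbi() are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for an unsigned bitfield extract; length must be 1..64. */
static uint64_t sketch_extract64(uint64_t value, int start, int length)
{
    return (value >> start) & (~0ULL >> (64 - length));
}

/*
 * Hypothetical helper for the two-range case: fetch both TBI bits at
 * once, drop the ones disabled for instruction fetch by TBID, then
 * pick the bit for the selected range.
 */
static bool sketch_effective_tbi(uint64_t tcr, int select, bool data)
{
    int tbi = (int)sketch_extract64(tcr, 37, 2);

    if (!data) {
        tbi &= ~(int)sketch_extract64(tcr, 51, 2);
    }
    return (tbi >> select) & 1;
}
```

Keeping both range bits together until the final shift is what lets rebuild_hflags_a64 reuse the same two-bit values directly for the TBII and TBID flags.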

-[Qemu-devel] [PULL 09/21] hw/timer/pl031: Allow use as an embedded-struct device
+[PULL 14/52] target/arm: Add _aa32_ to isar_feature functions testing 32-bit ID registers
-Create a new include file for the pl031's device struct,
+Enforce a convention that an isar_feature function that tests a
-type macros, etc, so that it can be instantiated using
+32-bit ID register always has _aa32_ in its name, and one that
-the "embedded struct" coding style.
+tests a 64-bit ID register always has _aa64_ in its name.
 We already follow this except for three cases: thumb_div,
 arm_div and jazelle, which all need _aa32_ adding.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+(As noted in the comment, isar_feature_aa32_fp16_arith()
 is an exception in that it currently tests ID_AA64PFR0_EL1,
 but will switch to MVFR1 once we've properly implemented
 FP16 for AArch32.)
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200214175116.9164-2-peter.maydell@linaro.org
 ---
- include/hw/timer/pl031.h | 44 ++++++++++++++++++++++++++++++++++++++++
+ target/arm/cpu.h       | 13 ++++++++++---
- hw/timer/pl031.c         | 25 +----------------------
+ target/arm/internals.h |  2 +-
- MAINTAINERS              |  1 +
+ linux-user/elfload.c   |  4 ++--
- 3 files changed, 46 insertions(+), 24 deletions(-)
+ target/arm/cpu.c       |  6 ++++--
- create mode 100644 include/hw/timer/pl031.h
+ target/arm/helper.c    |  2 +-
  target/arm/translate.c |  6 +++---
  6 files changed, 21 insertions(+), 12 deletions(-)
-diff --git a/include/hw/timer/pl031.h b/include/hw/timer/pl031.h
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-new file mode 100644
+index XXXXXXX..XXXXXXX 100644
-index XXXXXXX..XXXXXXX
+--- a/target/arm/cpu.h
---- /dev/null
++++ b/target/arm/cpu.h
-+++ b/include/hw/timer/pl031.h
+@@ -XXX,XX +XXX,XX @@ static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno)
-@@ -XXX,XX +XXX,XX @@
+ /* Shared between translate-sve.c and sve_helper.c.  */
  extern const uint64_t pred_esz_masks[4];
 +/*
-+ * ARM AMBA PrimeCell PL031 RTC
++ * Naming convention for isar_feature functions:
-+ *
++ * Functions which test 32-bit ID registers should have _aa32_ in
-+ * Copyright (c) 2007 CodeSourcery
++ * their name. Functions which test 64-bit ID registers should have
-+ *
++ * _aa64_ in their name.
 + * This file is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 as
 + * published by the Free Software Foundation.
 + *
 + * Contributions after 2012-01-13 are licensed under the terms of the
 + * GNU GPL, version 2 or (at your option) any later version.
 + */
 +
-+#ifndef HW_TIMER_PL031
+ /*
-+#define HW_TIMER_PL031
+  * 32-bit feature tests via id registers.
-+
+  */
-+#include "hw/sysbus.h"
+-static inline bool isar_feature_thumb_div(const ARMISARegisters *id)
-+
++static inline bool isar_feature_aa32_thumb_div(const ARMISARegisters *id)
-+#define TYPE_PL031 "pl031"
+ {
-+#define PL031(obj) OBJECT_CHECK(PL031State, (obj), TYPE_PL031)
+     return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) != 0;
-+
+ }
-+typedef struct PL031State {
-+    SysBusDevice parent_obj;
+-static inline bool isar_feature_arm_div(const ARMISARegisters *id)
-+
++static inline bool isar_feature_aa32_arm_div(const ARMISARegisters *id)
-+    MemoryRegion iomem;
+ {
-+    QEMUTimer *timer;
+     return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) > 1;
-+    qemu_irq irq;
+ }
-+
-+    /*
+-static inline bool isar_feature_jazelle(const ARMISARegisters *id)
-+     * Needed to preserve the tick_count across migration, even if the
++static inline bool isar_feature_aa32_jazelle(const ARMISARegisters *id)
-+     * absolute value of the rtc_clock is different on the source and
+ {
-+     * destination.
+     return FIELD_EX32(id->id_isar1, ID_ISAR1, JAZELLE) != 0;
-+     */
+ }
-+    uint32_t tick_offset_vmstate;
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 +    uint32_t tick_offset;
 +
 +    uint32_t mr;
 +    uint32_t lr;
 +    uint32_t cr;
 +    uint32_t im;
 +    uint32_t is;
 +} PL031State;
 +
 +#endif
 diff --git a/hw/timer/pl031.c b/hw/timer/pl031.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/pl031.c
+--- a/target/arm/internals.h
-+++ b/hw/timer/pl031.c
++++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline uint32_t aarch32_cpsr_valid_mask(uint64_t features,
      if ((features >> ARM_FEATURE_THUMB2) & 1) {
          valid |= CPSR_IT;
      }
 -    if (isar_feature_jazelle(id)) {
 +    if (isar_feature_aa32_jazelle(id)) {
          valid |= CPSR_J;
      }
      if (isar_feature_aa32_pan(id)) {
 diff --git a/linux-user/elfload.c b/linux-user/elfload.c
 index XXXXXXX..XXXXXXX 100644
 --- a/linux-user/elfload.c
 +++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void)
      GET_FEATURE(ARM_FEATURE_VFP3, ARM_HWCAP_ARM_VFPv3);
      GET_FEATURE(ARM_FEATURE_V6K, ARM_HWCAP_ARM_TLS);
      GET_FEATURE(ARM_FEATURE_VFP4, ARM_HWCAP_ARM_VFPv4);
 -    GET_FEATURE_ID(arm_div, ARM_HWCAP_ARM_IDIVA);
 -    GET_FEATURE_ID(thumb_div, ARM_HWCAP_ARM_IDIVT);
 +    GET_FEATURE_ID(aa32_arm_div, ARM_HWCAP_ARM_IDIVA);
 +    GET_FEATURE_ID(aa32_thumb_div, ARM_HWCAP_ARM_IDIVT);
      /* All QEMU's VFPv3 CPUs have 32 registers, see VFP_DREG in translate.c.
       * Note that the ARM_HWCAP_ARM_VFPv3D16 bit is always the inverse of
       * ARM_HWCAP_ARM_VFPD32 (and so always clear for QEMU); it is unrelated
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
           * Presence of EL2 itself is ARM_FEATURE_EL2, and of the
           * Security Extensions is ARM_FEATURE_EL3.
           */
 -        assert(!tcg_enabled() || no_aa32 || cpu_isar_feature(arm_div, cpu));
 +        assert(!tcg_enabled() || no_aa32 ||
 +               cpu_isar_feature(aa32_arm_div, cpu));
          set_feature(env, ARM_FEATURE_LPAE);
          set_feature(env, ARM_FEATURE_V7);
      }
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
      if (arm_feature(env, ARM_FEATURE_V6)) {
          set_feature(env, ARM_FEATURE_V5);
          if (!arm_feature(env, ARM_FEATURE_M)) {
 -            assert(!tcg_enabled() || no_aa32 || cpu_isar_feature(jazelle, cpu));
 +            assert(!tcg_enabled() || no_aa32 ||
 +                   cpu_isar_feature(aa32_jazelle, cpu));
              set_feature(env, ARM_FEATURE_AUXCR);
          }
      }
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
      if (arm_feature(env, ARM_FEATURE_LPAE)) {
          define_arm_cp_regs(cpu, lpae_cp_reginfo);
      }
 -    if (cpu_isar_feature(jazelle, cpu)) {
 +    if (cpu_isar_feature(aa32_jazelle, cpu)) {
          define_arm_cp_regs(cpu, jazelle_regs);
      }
      /* Slightly awkwardly, the OMAP and StrongARM cores need all of
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
 @@ -XXX,XX +XXX,XX @@
-  */
+ #define ENABLE_ARCH_5     arm_dc_feature(s, ARM_FEATURE_V5)
+ /* currently all emulated v5 cores are also v5TE, so don't bother */
- #include "qemu/osdep.h"
+ #define ENABLE_ARCH_5TE   arm_dc_feature(s, ARM_FEATURE_V5)
-+#include "hw/timer/pl031.h"
+-#define ENABLE_ARCH_5J    dc_isar_feature(jazelle, s)
- #include "hw/sysbus.h"
++#define ENABLE_ARCH_5J    dc_isar_feature(aa32_jazelle, s)
- #include "qemu/timer.h"
+ #define ENABLE_ARCH_6     arm_dc_feature(s, ARM_FEATURE_V6)
- #include "sysemu/sysemu.h"
+ #define ENABLE_ARCH_6K    arm_dc_feature(s, ARM_FEATURE_V6K)
-@@ -XXX,XX +XXX,XX @@ do { printf("pl031: " fmt , ## __VA_ARGS__); } while (0)
+ #define ENABLE_ARCH_6T2   arm_dc_feature(s, ARM_FEATURE_THUMB2)
- #define RTC_MIS     0x18    /* Masked interrupt status register */
+@@ -XXX,XX +XXX,XX @@ static bool op_div(DisasContext *s, arg_rrr *a, bool u)
- #define RTC_ICR     0x1c    /* Interrupt clear register */
+     TCGv_i32 t1, t2;
--#define TYPE_PL031 "pl031"
+     if (s->thumb
--#define PL031(obj) OBJECT_CHECK(PL031State, (obj), TYPE_PL031)
+-        ? !dc_isar_feature(thumb_div, s)
--
+-        : !dc_isar_feature(arm_div, s)) {
--typedef struct PL031State {
++        ? !dc_isar_feature(aa32_thumb_div, s)
--    SysBusDevice parent_obj;
++        : !dc_isar_feature(aa32_arm_div, s)) {
--
+         return false;
--    MemoryRegion iomem;
+     }
--    QEMUTimer *timer;
 -    qemu_irq irq;
 -
 -    /* Needed to preserve the tick_count across migration, even if the
 -     * absolute value of the rtc_clock is different on the source and
 -     * destination.
 -     */
 -    uint32_t tick_offset_vmstate;
 -    uint32_t tick_offset;
 -
 -    uint32_t mr;
 -    uint32_t lr;
 -    uint32_t cr;
 -    uint32_t im;
 -    uint32_t is;
 -} PL031State;
 -
  static const unsigned char pl031_id[] = {
     0x31, 0x10, 0x14, 0x00,         /* Device ID        */
     0x0d, 0xf0, 0x05, 0xb1          /* Cell ID      */
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: hw/sd/pl181.c
  F: hw/ssi/pl022.c
  F: include/hw/ssi/pl022.h
  F: hw/timer/pl031.c
 +F: include/hw/timer/pl031.h
  F: include/hw/arm/primecell.h
  F: hw/timer/cmsdk-apb-timer.c
  F: include/hw/timer/cmsdk-apb-timer.h
 --
 .20.1

-[Qemu-devel] [PULL 02/21] target/arm: v8M MPU should use background region as default, not always
+[PULL 15/52] target/arm: Check aa32_pan in take_aarch32_exception(), not aa64_pan
-The "background region" for a v8M MPU is a default which will be used
+In take_aarch32_exception(), we know we are dealing with a CPU that
-(if enabled, and if the access is privileged) if the access does
+has AArch32, so the right isar_feature test is aa32_pan, not aa64_pan.
 not match any specific MPU region. We were incorrectly using it
 always (by putting the condition at the wrong nesting level). This
 meant that we would always return the default background permissions
 rather than the correct permissions for a specific region, and also
 that we would not return the right information in response to a
 TT instruction.
 Move the check for the background region to the same place in the
 logic as the equivalent v8M MPUCheck() pseudocode puts it.
 This in turn means we must adjust the condition we use to detect
 matches in multiple regions to avoid false-positives.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190214113408.10214-1-peter.maydell@linaro.org
+Message-id: 20200214175116.9164-3-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 8 +++++---
+ target/arm/helper.c | 2 +-
- 1 file changed, 5 insertions(+), 3 deletions(-)
+ 1 file changed, 1 insertion(+), 1 deletion(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
+@@ -XXX,XX +XXX,XX @@ static void take_aarch32_exception(CPUARMState *env, int new_mode,
-         hit = true;
+         env->elr_el[2] = env->regs[15];
      } else if (m_is_ppb_region(env, address)) {
          hit = true;
 -    } else if (pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
 -        hit = true;
      } else {
-+        if (pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
+         /* CPSR.PAN is normally preserved preserved unless...  */
-+            hit = true;
+-        if (cpu_isar_feature(aa64_pan, env_archcpu(env))) {
-+        }
++        if (cpu_isar_feature(aa32_pan, env_archcpu(env))) {
-+
+             switch (new_el) {
-         for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) {
+             case 3:
-             /* region search */
+                 if (!arm_is_secure_below_el3(env)) {
              /* Note that the base address is bits [31:5] from the register
@@ -XXX,XX +XXX,XX @@ static bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
                  *is_subpage = true;
              }
 -            if (hit) {
 +            if (matchregion != -1) {
                  /* Multiple regions match -- always a failure (unlike
                   * PMSAv7 where highest-numbered-region wins)
                   */
 --
 .20.1

-New patch
+[PULL 16/52] target/arm: Add isar_feature_any_fp16 and document naming/usage conventions
+Our current usage of the isar_feature feature tests almost always
+uses an _aa32_ test when the code path is known to be AArch32
+specific and an _aa64_ test when the code path is known to be
+AArch64 specific. There is just one exception: in the vfp_set_fpscr
+helper we check aa64_fp16 to determine whether the FZ16 bit in
+the FP(S)CR exists, but this code is also used for AArch32.
+There are other places in future where we're likely to want
+a general "does this feature exist for either AArch32 or
+AArch64" check (typically where architecturally the feature exists
+for both CPU states if it exists at all, but the CPU might be
+AArch32-only or AArch64-only, and so only have one set of ID
+registers).
+Introduce a new category of isar_feature_* functions:
+isar_feature_any_foo() should be tested when what we want to
+know is "does this feature exist for either AArch32 or AArch64",
+and always returns the logical OR of isar_feature_aa32_foo()
+and isar_feature_aa64_foo().
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200214175116.9164-4-peter.maydell@linaro.org
+---
+ target/arm/cpu.h        | 19 ++++++++++++++++++-
+ target/arm/vfp_helper.c |  2 +-
+ 2 files changed, 19 insertions(+), 2 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ extern const uint64_t pred_esz_masks[4];
+  * Naming convention for isar_feature functions:
+  * Functions which test 32-bit ID registers should have _aa32_ in
+  * their name. Functions which test 64-bit ID registers should have
+- * _aa64_ in their name.
++ * _aa64_ in their name. These must only be used in code where we
++ * know for certain that the CPU has AArch32 or AArch64 respectively
++ * or where the correct answer for a CPU which doesn't implement that
++ * CPU state is "false" (eg when generating A32 or A64 code, if adding
++ * system registers that are specific to that CPU state, for "should
++ * we let this system register bit be set" tests where the 32-bit
++ * flavour of the register doesn't have the bit, and so on).
++ * Functions which simply ask "does this feature exist at all" have
++ * _any_ in their name, and always return the logical OR of the _aa64_
++ * and the _aa32_ function.
+  */
+ /*
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_bti(const ARMISARegisters *id)
+     return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, BT) != 0;
+ }
++/*
++ * Feature tests for "does this exist in either 32-bit or 64-bit?"
++ */
++static inline bool isar_feature_any_fp16(const ARMISARegisters *id)
++{
++    return isar_feature_aa64_fp16(id) || isar_feature_aa32_fp16_arith(id);
++}
++
+ /*
+  * Forward to the above feature tests given an ARMCPU pointer.
+  */
+diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vfp_helper.c
++++ b/target/arm/vfp_helper.c
+@@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpscr(CPUARMState *env)
+ void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
+ {
+     /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
+-    if (!cpu_isar_feature(aa64_fp16, env_archcpu(env))) {
++    if (!cpu_isar_feature(any_fp16, env_archcpu(env))) {
+         val &= ~FPCR_FZ16;
+     }
+--
+.20.1
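
The _aa32_/_aa64_/_any_ convention can be illustrated with a reduced standalone sketch. The SketchIDRegs struct and its two fields are hypothetical simplifications of the real ID register block, and the sketch_ functions are not QEMU APIs; only the "any is the logical OR of the two" rule comes from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ID register block: one field per CPU state. */
typedef struct SketchIDRegs {
    uint32_t aa32_fp16_field;   /* stands in for a 32-bit ID register */
    uint64_t aa64_fp16_field;   /* stands in for a 64-bit ID register */
} SketchIDRegs;

static inline bool sketch_feature_aa32_fp16(const SketchIDRegs *id)
{
    return id->aa32_fp16_field != 0;
}

static inline bool sketch_feature_aa64_fp16(const SketchIDRegs *id)
{
    return id->aa64_fp16_field != 0;
}

/* "Does this feature exist at all": OR of the two state-specific tests. */
static inline bool sketch_feature_any_fp16(const SketchIDRegs *id)
{
    return sketch_feature_aa64_fp16(id) || sketch_feature_aa32_fp16(id);
}

/* FZ16 is bit 19 of the FPCR/FPSCR. */
#define SKETCH_FPCR_FZ16 (1u << 19)

/* Treat FZ16 as RES0 unless FP16 exists in at least one CPU state. */
static uint32_t sketch_mask_fpscr(const SketchIDRegs *id, uint32_t val)
{
    if (!sketch_feature_any_fp16(id)) {
        val &= ~SKETCH_FPCR_FZ16;
    }
    return val;
}
```

On a CPU that only populates the 32-bit field, the aa64 test alone would wrongly clear FZ16, while the any test keeps it writable.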

-New patch
+[PULL 17/52] target/arm: Define and use any_predinv isar_feature test
+Instead of open-coding "ARM_FEATURE_AARCH64 ? aa64_predinv: aa32_predinv",
+define and use an any_predinv isar_feature test function.
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200214175116.9164-5-peter.maydell@linaro.org
+---
+ target/arm/cpu.h    | 5 +++++
+ target/arm/helper.c | 9 +--------
+ 2 files changed, 6 insertions(+), 8 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_any_fp16(const ARMISARegisters *id)
+     return isar_feature_aa64_fp16(id) || isar_feature_aa32_fp16_arith(id);
+ }
++static inline bool isar_feature_any_predinv(const ARMISARegisters *id)
++{
++    return isar_feature_aa64_predinv(id) || isar_feature_aa32_predinv(id);
++}
++
+ /*
+  * Forward to the above feature tests given an ARMCPU pointer.
+  */
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+ #endif /*CONFIG_USER_ONLY*/
+ #endif
+-    /*
+-     * While all v8.0 cpus support aarch64, QEMU does have configurations
+-     * that do not set ID_AA64ISAR1, e.g. user-only qemu-arm -cpu max,
+-     * which will set ID_ISAR6.
+-     */
+-    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)
+-        ? cpu_isar_feature(aa64_predinv, cpu)
+-        : cpu_isar_feature(aa32_predinv, cpu)) {
++    if (cpu_isar_feature(any_predinv, cpu)) {
+         define_arm_cp_regs(cpu, predinv_reginfo);
+     }
+--
+.20.1

-[Qemu-devel] [PULL 13/21] hw/char/pl011: Use '0x' prefix when logging hex numbers
+[PULL 18/52] target/arm: Factor out PMU register definitions
-The pl011 logs when the guest makes a bad access. It prints
+Pull the code that defines the various PMU registers out
-the address offset in hex but confusingly omits the '0x'
+into its own function, matching the pattern we have
-prefix; add it.
+already for the debug registers.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Apart from one style fix to a multi-line comment, this
 is purely movement of code with no changes to it.
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200214175116.9164-6-peter.maydell@linaro.org
 ---
- hw/char/pl011.c | 4 ++--
+ target/arm/helper.c | 158 +++++++++++++++++++++++---------------------
- 1 file changed, 2 insertions(+), 2 deletions(-)
+ 1 file changed, 82 insertions(+), 76 deletions(-)
-diff --git a/hw/char/pl011.c b/hw/char/pl011.c
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/char/pl011.c
+--- a/target/arm/helper.c
-+++ b/hw/char/pl011.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static uint64_t pl011_read(void *opaque, hwaddr offset,
+@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
          break;
      default:
          qemu_log_mask(LOG_GUEST_ERROR,
 -                      "pl011_read: Bad offset %x\n", (int)offset);
 +                      "pl011_read: Bad offset 0x%x\n", (int)offset);
          r = 0;
          break;
      }
@@ -XXX,XX +XXX,XX @@ static void pl011_write(void *opaque, hwaddr offset,
          break;
      default:
          qemu_log_mask(LOG_GUEST_ERROR,
 -                      "pl011_write: Bad offset %x\n", (int)offset);
 +                      "pl011_write: Bad offset 0x%x\n", (int)offset);
      }
  }
++static void define_pmu_regs(ARMCPU *cpu)
++{
++    /*
++     * v7 performance monitor control register: same implementor
++     * field as main ID register, and we implement four counters in
++     * addition to the cycle count register.
++     */
++    unsigned int i, pmcrn = 4;
++    ARMCPRegInfo pmcr = {
++        .name = "PMCR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 0,
++        .access = PL0_RW,
++        .type = ARM_CP_IO | ARM_CP_ALIAS,
++        .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmcr),
++        .accessfn = pmreg_access, .writefn = pmcr_write,
++        .raw_writefn = raw_write,
++    };
++    ARMCPRegInfo pmcr64 = {
++        .name = "PMCR_EL0", .state = ARM_CP_STATE_AA64,
++        .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 0,
++        .access = PL0_RW, .accessfn = pmreg_access,
++        .type = ARM_CP_IO,
++        .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcr),
++        .resetvalue = (cpu->midr & 0xff000000) | (pmcrn << PMCRN_SHIFT),
++        .writefn = pmcr_write, .raw_writefn = raw_write,
++    };
++    define_one_arm_cp_reg(cpu, &pmcr);
++    define_one_arm_cp_reg(cpu, &pmcr64);
++    for (i = 0; i < pmcrn; i++) {
++        char *pmevcntr_name = g_strdup_printf("PMEVCNTR%d", i);
++        char *pmevcntr_el0_name = g_strdup_printf("PMEVCNTR%d_EL0", i);
++        char *pmevtyper_name = g_strdup_printf("PMEVTYPER%d", i);
++        char *pmevtyper_el0_name = g_strdup_printf("PMEVTYPER%d_EL0", i);
++        ARMCPRegInfo pmev_regs[] = {
++            { .name = pmevcntr_name, .cp = 15, .crn = 14,
++              .crm = 8 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
++              .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
++              .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
++              .accessfn = pmreg_access },
++            { .name = pmevcntr_el0_name, .state = ARM_CP_STATE_AA64,
++              .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 8 | (3 & (i >> 3)),
++              .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
++              .type = ARM_CP_IO,
++              .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
++              .raw_readfn = pmevcntr_rawread,
++              .raw_writefn = pmevcntr_rawwrite },
++            { .name = pmevtyper_name, .cp = 15, .crn = 14,
++              .crm = 12 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
++              .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
++              .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
++              .accessfn = pmreg_access },
++            { .name = pmevtyper_el0_name, .state = ARM_CP_STATE_AA64,
++              .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 12 | (3 & (i >> 3)),
++              .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
++              .type = ARM_CP_IO,
++              .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
++              .raw_writefn = pmevtyper_rawwrite },
++            REGINFO_SENTINEL
++        };
++        define_arm_cp_regs(cpu, pmev_regs);
++        g_free(pmevcntr_name);
++        g_free(pmevcntr_el0_name);
++        g_free(pmevtyper_name);
++        g_free(pmevtyper_el0_name);
++    }
++    if (FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) >= 4 &&
++            FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) != 0xf) {
++        ARMCPRegInfo v81_pmu_regs[] = {
++            { .name = "PMCEID2", .state = ARM_CP_STATE_AA32,
++              .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 4,
++              .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
++              .resetvalue = extract64(cpu->pmceid0, 32, 32) },
++            { .name = "PMCEID3", .state = ARM_CP_STATE_AA32,
++              .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 5,
++              .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
++              .resetvalue = extract64(cpu->pmceid1, 32, 32) },
++            REGINFO_SENTINEL
++        };
++        define_arm_cp_regs(cpu, v81_pmu_regs);
++    }
++}
++
+ /* We don't know until after realize whether there's a GICv3
+  * attached, and that is what registers the gicv3 sysregs.
+  * So we have to fill in the GIC fields in ID_PFR/ID_PFR1_EL1/ID_AA64PFR0_EL1
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+         define_arm_cp_regs(cpu, pmovsset_cp_reginfo);
+     }
+     if (arm_feature(env, ARM_FEATURE_V7)) {
+-        /* v7 performance monitor control register: same implementor
+-         * field as main ID register, and we implement four counters in
+-         * addition to the cycle count register.
+-         */
+-        unsigned int i, pmcrn = 4;
+-        ARMCPRegInfo pmcr = {
+-            .name = "PMCR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 0,
+-            .access = PL0_RW,
+-            .type = ARM_CP_IO | ARM_CP_ALIAS,
+-            .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmcr),
+-            .accessfn = pmreg_access, .writefn = pmcr_write,
+-            .raw_writefn = raw_write,
+-        };
+-        ARMCPRegInfo pmcr64 = {
+-            .name = "PMCR_EL0", .state = ARM_CP_STATE_AA64,
+-            .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 0,
+-            .access = PL0_RW, .accessfn = pmreg_access,
+-            .type = ARM_CP_IO,
+-            .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcr),
+-            .resetvalue = (cpu->midr & 0xff000000) | (pmcrn << PMCRN_SHIFT),
+-            .writefn = pmcr_write, .raw_writefn = raw_write,
+-        };
+-        define_one_arm_cp_reg(cpu, &pmcr);
+-        define_one_arm_cp_reg(cpu, &pmcr64);
+-        for (i = 0; i < pmcrn; i++) {
+-            char *pmevcntr_name = g_strdup_printf("PMEVCNTR%d", i);
+-            char *pmevcntr_el0_name = g_strdup_printf("PMEVCNTR%d_EL0", i);
+-            char *pmevtyper_name = g_strdup_printf("PMEVTYPER%d", i);
+-            char *pmevtyper_el0_name = g_strdup_printf("PMEVTYPER%d_EL0", i);
+-            ARMCPRegInfo pmev_regs[] = {
+-                { .name = pmevcntr_name, .cp = 15, .crn = 14,
+-                  .crm = 8 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
+-                  .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
+-                  .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
+-                  .accessfn = pmreg_access },
+-                { .name = pmevcntr_el0_name, .state = ARM_CP_STATE_AA64,
+-                  .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 8 | (3 & (i >> 3)),
+-                  .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
+-                  .type = ARM_CP_IO,
+-                  .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
+-                  .raw_readfn = pmevcntr_rawread,
+-                  .raw_writefn = pmevcntr_rawwrite },
+-                { .name = pmevtyper_name, .cp = 15, .crn = 14,
+-                  .crm = 12 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
+-                  .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
+-                  .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
+-                  .accessfn = pmreg_access },
+-                { .name = pmevtyper_el0_name, .state = ARM_CP_STATE_AA64,
+-                  .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 12 | (3 & (i >> 3)),
+-                  .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
+-                  .type = ARM_CP_IO,
+-                  .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
+-                  .raw_writefn = pmevtyper_rawwrite },
+-                REGINFO_SENTINEL
+-            };
+-            define_arm_cp_regs(cpu, pmev_regs);
+-            g_free(pmevcntr_name);
+-            g_free(pmevcntr_el0_name);
+-            g_free(pmevtyper_name);
+-            g_free(pmevtyper_el0_name);
+-        }
+         ARMCPRegInfo clidr = {
+             .name = "CLIDR", .state = ARM_CP_STATE_BOTH,
+             .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 1,
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+         define_one_arm_cp_reg(cpu, &clidr);
+         define_arm_cp_regs(cpu, v7_cp_reginfo);
+         define_debug_regs(cpu);
++        define_pmu_regs(cpu);
+     } else {
+         define_arm_cp_regs(cpu, not_v7_cp_reginfo);
+     }
+-    if (FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) >= 4 &&
+-            FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) != 0xf) {
+-        ARMCPRegInfo v81_pmu_regs[] = {
+-            { .name = "PMCEID2", .state = ARM_CP_STATE_AA32,
+-              .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 4,
+-              .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
+-              .resetvalue = extract64(cpu->pmceid0, 32, 32) },
+-            { .name = "PMCEID3", .state = ARM_CP_STATE_AA32,
+-              .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 5,
+-              .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
+-              .resetvalue = extract64(cpu->pmceid1, 32, 32) },
+-            REGINFO_SENTINEL
+-        };
+-        define_arm_cp_regs(cpu, v81_pmu_regs);
+-    }
+     if (arm_feature(env, ARM_FEATURE_V8)) {
+         /* AArch64 ID registers, which all have impdef reset values.
+          * Note that within the ID register ranges the unused slots
 --
 .20.1
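As a reading aid for the PMU event register definitions moved by the patch above, the following is a minimal sketch of how the loop index is split across the CRm and opc2 encoding fields. The expressions are copied from the patch; the struct and function names are hypothetical and are not part of QEMU.

```c
#include <assert.h>

/* Hypothetical helper type: only the two encoding fields that vary with n. */
struct pmev_enc {
    unsigned crm;
    unsigned opc2;
};

/* PMEVCNTR<n>: CRm is 0b10 followed by n[4:3], opc2 is n[2:0]. */
static struct pmev_enc pmevcntr_enc(unsigned n)
{
    struct pmev_enc e = { 8 | (3 & (n >> 3)), n & 7 };
    return e;
}

/* PMEVTYPER<n>: CRm is 0b11 followed by n[4:3], opc2 is n[2:0]. */
static struct pmev_enc pmevtyper_enc(unsigned n)
{
    struct pmev_enc e = { 12 | (3 & (n >> 3)), n & 7 };
    return e;
}

/* Spot checks for the four counters the patch defines (pmcrn = 4). */
static void pmev_enc_selfcheck(void)
{
    unsigned n;

    for (n = 0; n < 4; n++) {
        assert(pmevcntr_enc(n).crm == 8 && pmevcntr_enc(n).opc2 == n);
        assert(pmevtyper_enc(n).crm == 12 && pmevtyper_enc(n).opc2 == n);
    }
}
```

With only four counters the upper index bits are always zero, so CRm stays at 8 or 12; the shift and mask only matter if pmcrn is ever raised above 8.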

-New patch
+[PULL 19/52] target/arm: Add and use FIELD definitions for ID_AA64DFR0_EL1
+Add FIELD() definitions for the ID_AA64DFR0_EL1 register and use
+them where we currently have hard-coded bit values.
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200214175116.9164-7-peter.maydell@linaro.org
+---
+ target/arm/cpu.h    | 10 ++++++++++
+ target/arm/cpu.c    |  2 +-
+ target/arm/helper.c |  6 +++---
+files changed, 14 insertions(+), 4 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64MMFR2, BBM, 52, 4)
+ FIELD(ID_AA64MMFR2, EVT, 56, 4)
+ FIELD(ID_AA64MMFR2, E0PD, 60, 4)
++FIELD(ID_AA64DFR0, DEBUGVER, 0, 4)
++FIELD(ID_AA64DFR0, TRACEVER, 4, 4)
++FIELD(ID_AA64DFR0, PMUVER, 8, 4)
++FIELD(ID_AA64DFR0, BRPS, 12, 4)
++FIELD(ID_AA64DFR0, WRPS, 20, 4)
++FIELD(ID_AA64DFR0, CTX_CMPS, 28, 4)
++FIELD(ID_AA64DFR0, PMSVER, 32, 4)
++FIELD(ID_AA64DFR0, DOUBLELOCK, 36, 4)
++FIELD(ID_AA64DFR0, TRACEFILT, 40, 4)
++
+ FIELD(ID_DFR0, COPDBG, 0, 4)
+ FIELD(ID_DFR0, COPSDBG, 4, 4)
+ FIELD(ID_DFR0, MMAPDBG, 8, 4)
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
+                 cpu);
+ #endif
+     } else {
+-        cpu->id_aa64dfr0 &= ~0xf00;
++        cpu->id_aa64dfr0 = FIELD_DP64(cpu->id_aa64dfr0, ID_AA64DFR0, PMUVER, 0);
+         cpu->id_dfr0 &= ~(0xf << 24);
+         cpu->pmceid0 = 0;
+         cpu->pmceid1 = 0;
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
+      * check that if they both exist then they agree.
+      */
+     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
+-        assert(extract32(cpu->id_aa64dfr0, 12, 4) == brps);
+-        assert(extract32(cpu->id_aa64dfr0, 20, 4) == wrps);
+-        assert(extract32(cpu->id_aa64dfr0, 28, 4) == ctx_cmps);
++        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, BRPS) == brps);
++        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, WRPS) == wrps);
++        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, CTX_CMPS) == ctx_cmps);
+     }
+     define_one_arm_cp_reg(cpu, &dbgdidr);
+--
+.20.1
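To illustrate why the FIELD-based forms in the patch above are equivalent to the hard-coded masks they replace, here is a small sketch using simplified stand-in helpers. These helpers are assumptions written for this example and are not QEMU's real FIELD_EX64/FIELD_DP64 macros; the shift and length values are taken from the FIELD() definitions in the patch, and the register value is the Cortex-A57 ID_AA64DFR0 value that appears elsewhere in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for FIELD_EX64: extract 'len' bits starting at 'shift' (1 <= len <= 64). */
static inline uint64_t field_ex64(uint64_t reg, unsigned shift, unsigned len)
{
    return (reg >> shift) & (~0ULL >> (64 - len));
}

/* Stand-in for FIELD_DP64: return 'reg' with the field replaced by 'val'. */
static inline uint64_t field_dp64(uint64_t reg, unsigned shift, unsigned len,
                                  uint64_t val)
{
    uint64_t mask = (~0ULL >> (64 - len)) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Field positions from the FIELD(ID_AA64DFR0, ...) lines in the patch. */
enum {
    PMUVER_SHIFT = 8,    PMUVER_LEN = 4,
    BRPS_SHIFT = 12,     BRPS_LEN = 4,
    WRPS_SHIFT = 20,     WRPS_LEN = 4,
    CTX_CMPS_SHIFT = 28, CTX_CMPS_LEN = 4,
};

static void field_selfcheck(void)
{
    uint64_t dfr0 = 0x10305106; /* Cortex-A57 value used in this series */

    /* Same results as the old extract32(dfr0, 12/20/28, 4) calls. */
    assert(field_ex64(dfr0, BRPS_SHIFT, BRPS_LEN) == 5);
    assert(field_ex64(dfr0, WRPS_SHIFT, WRPS_LEN) == 3);
    assert(field_ex64(dfr0, CTX_CMPS_SHIFT, CTX_CMPS_LEN) == 1);

    /* Depositing 0 into PMUVER matches the old "&= ~0xf00". */
    assert(field_dp64(dfr0, PMUVER_SHIFT, PMUVER_LEN, 0) == (dfr0 & ~0xf00ULL));
}
```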

-New patch
+[PULL 20/52] target/arm: Use FIELD macros for clearing ID_DFR0 PERFMON field
+We already define FIELD macros for ID_DFR0, so use them in the
+one place where we're doing direct bit value manipulation.
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200214175116.9164-8-peter.maydell@linaro.org
+---
+ target/arm/cpu.c | 2 +-
+file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
+ #endif
+     } else {
+         cpu->id_aa64dfr0 = FIELD_DP64(cpu->id_aa64dfr0, ID_AA64DFR0, PMUVER, 0);
+-        cpu->id_dfr0 &= ~(0xf << 24);
++        cpu->id_dfr0 = FIELD_DP32(cpu->id_dfr0, ID_DFR0, PERFMON, 0);
+         cpu->pmceid0 = 0;
+         cpu->pmceid1 = 0;
+     }
+--
+.20.1

-New patch
+[PULL 21/52] target/arm: Define an aa32_pmu_8_1 isar feature test function
+Instead of open-coding a check on the ID_DFR0 PerfMon ID register
 field, create a standardly-named isar_feature for "does AArch32 have
 a v8.1 PMUv3" and use it.
 This entails moving the id_dfr0 field into the ARMISARegisters struct.
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20200214175116.9164-9-peter.maydell@linaro.org
 ---
  target/arm/cpu.h      |  9 ++++++++-
  hw/intc/armv7m_nvic.c |  2 +-
  target/arm/cpu.c      | 28 ++++++++++++++--------------
  target/arm/cpu64.c    |  6 +++---
  target/arm/helper.c   |  5 ++---
 files changed, 28 insertions(+), 22 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
          uint32_t mvfr0;
          uint32_t mvfr1;
          uint32_t mvfr2;
 +        uint32_t id_dfr0;
          uint64_t id_aa64isar0;
          uint64_t id_aa64isar1;
          uint64_t id_aa64pfr0;
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
      uint32_t reset_sctlr;
      uint32_t id_pfr0;
      uint32_t id_pfr1;
 -    uint32_t id_dfr0;
      uint64_t pmceid0;
      uint64_t pmceid1;
      uint32_t id_afr0;
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_ats1e1(const ARMISARegisters *id)
      return FIELD_EX64(id->mvfr0, ID_MMFR3, PAN) >= 2;
  }
 +static inline bool isar_feature_aa32_pmu_8_1(const ARMISARegisters *id)
 +{
 +    /* 0xf means "non-standard IMPDEF PMU" */
 +    return FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) >= 4 &&
 +        FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) != 0xf;
 +}
 +
  /*
   * 64-bit feature tests via id registers.
   */
 diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/armv7m_nvic.c
 +++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
      case 0xd44: /* PFR1.  */
          return cpu->id_pfr1;
      case 0xd48: /* DFR0.  */
 -        return cpu->id_dfr0;
 +        return cpu->isar.id_dfr0;
      case 0xd4c: /* AFR0.  */
          return cpu->id_afr0;
      case 0xd50: /* MMFR0.  */
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
  #endif
      } else {
          cpu->id_aa64dfr0 = FIELD_DP64(cpu->id_aa64dfr0, ID_AA64DFR0, PMUVER, 0);
 -        cpu->id_dfr0 = FIELD_DP32(cpu->id_dfr0, ID_DFR0, PERFMON, 0);
 +        cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, PERFMON, 0);
          cpu->pmceid0 = 0;
          cpu->pmceid1 = 0;
      }
@@ -XXX,XX +XXX,XX @@ static void arm1136_r2_initfn(Object *obj)
      cpu->reset_sctlr = 0x00050078;
      cpu->id_pfr0 = 0x111;
      cpu->id_pfr1 = 0x1;
 -    cpu->id_dfr0 = 0x2;
 +    cpu->isar.id_dfr0 = 0x2;
      cpu->id_afr0 = 0x3;
      cpu->id_mmfr0 = 0x01130003;
      cpu->id_mmfr1 = 0x10030302;
@@ -XXX,XX +XXX,XX @@ static void arm1136_initfn(Object *obj)
      cpu->reset_sctlr = 0x00050078;
      cpu->id_pfr0 = 0x111;
      cpu->id_pfr1 = 0x1;
 -    cpu->id_dfr0 = 0x2;
 +    cpu->isar.id_dfr0 = 0x2;
      cpu->id_afr0 = 0x3;
      cpu->id_mmfr0 = 0x01130003;
      cpu->id_mmfr1 = 0x10030302;
@@ -XXX,XX +XXX,XX @@ static void arm1176_initfn(Object *obj)
      cpu->reset_sctlr = 0x00050078;
      cpu->id_pfr0 = 0x111;
      cpu->id_pfr1 = 0x11;
 -    cpu->id_dfr0 = 0x33;
 +    cpu->isar.id_dfr0 = 0x33;
      cpu->id_afr0 = 0;
      cpu->id_mmfr0 = 0x01130003;
      cpu->id_mmfr1 = 0x10030302;
@@ -XXX,XX +XXX,XX @@ static void arm11mpcore_initfn(Object *obj)
      cpu->ctr = 0x1d192992; /* 32K icache 32K dcache */
      cpu->id_pfr0 = 0x111;
      cpu->id_pfr1 = 0x1;
 -    cpu->id_dfr0 = 0;
 +    cpu->isar.id_dfr0 = 0;
      cpu->id_afr0 = 0x2;
      cpu->id_mmfr0 = 0x01100103;
      cpu->id_mmfr1 = 0x10020302;
@@ -XXX,XX +XXX,XX @@ static void cortex_m3_initfn(Object *obj)
      cpu->pmsav7_dregion = 8;
      cpu->id_pfr0 = 0x00000030;
      cpu->id_pfr1 = 0x00000200;
 -    cpu->id_dfr0 = 0x00100000;
 +    cpu->isar.id_dfr0 = 0x00100000;
      cpu->id_afr0 = 0x00000000;
      cpu->id_mmfr0 = 0x00000030;
      cpu->id_mmfr1 = 0x00000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_m4_initfn(Object *obj)
      cpu->isar.mvfr2 = 0x00000000;
      cpu->id_pfr0 = 0x00000030;
      cpu->id_pfr1 = 0x00000200;
 -    cpu->id_dfr0 = 0x00100000;
 +    cpu->isar.id_dfr0 = 0x00100000;
      cpu->id_afr0 = 0x00000000;
      cpu->id_mmfr0 = 0x00000030;
      cpu->id_mmfr1 = 0x00000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_m7_initfn(Object *obj)
      cpu->isar.mvfr2 = 0x00000040;
      cpu->id_pfr0 = 0x00000030;
      cpu->id_pfr1 = 0x00000200;
 -    cpu->id_dfr0 = 0x00100000;
 +    cpu->isar.id_dfr0 = 0x00100000;
      cpu->id_afr0 = 0x00000000;
      cpu->id_mmfr0 = 0x00100030;
      cpu->id_mmfr1 = 0x00000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_m33_initfn(Object *obj)
      cpu->isar.mvfr2 = 0x00000040;
      cpu->id_pfr0 = 0x00000030;
      cpu->id_pfr1 = 0x00000210;
 -    cpu->id_dfr0 = 0x00200000;
 +    cpu->isar.id_dfr0 = 0x00200000;
      cpu->id_afr0 = 0x00000000;
      cpu->id_mmfr0 = 0x00101F40;
      cpu->id_mmfr1 = 0x00000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj)
      cpu->midr = 0x411fc153; /* r1p3 */
      cpu->id_pfr0 = 0x0131;
      cpu->id_pfr1 = 0x001;
 -    cpu->id_dfr0 = 0x010400;
 +    cpu->isar.id_dfr0 = 0x010400;
      cpu->id_afr0 = 0x0;
      cpu->id_mmfr0 = 0x0210030;
      cpu->id_mmfr1 = 0x00000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_a8_initfn(Object *obj)
      cpu->reset_sctlr = 0x00c50078;
      cpu->id_pfr0 = 0x1031;
      cpu->id_pfr1 = 0x11;
 -    cpu->id_dfr0 = 0x400;
 +    cpu->isar.id_dfr0 = 0x400;
      cpu->id_afr0 = 0;
      cpu->id_mmfr0 = 0x31100003;
      cpu->id_mmfr1 = 0x20000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_a9_initfn(Object *obj)
      cpu->reset_sctlr = 0x00c50078;
      cpu->id_pfr0 = 0x1031;
      cpu->id_pfr1 = 0x11;
 -    cpu->id_dfr0 = 0x000;
 +    cpu->isar.id_dfr0 = 0x000;
      cpu->id_afr0 = 0;
      cpu->id_mmfr0 = 0x00100103;
      cpu->id_mmfr1 = 0x20000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_a7_initfn(Object *obj)
      cpu->reset_sctlr = 0x00c50078;
      cpu->id_pfr0 = 0x00001131;
      cpu->id_pfr1 = 0x00011011;
 -    cpu->id_dfr0 = 0x02010555;
 +    cpu->isar.id_dfr0 = 0x02010555;
      cpu->id_afr0 = 0x00000000;
      cpu->id_mmfr0 = 0x10101105;
      cpu->id_mmfr1 = 0x40000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_a15_initfn(Object *obj)
      cpu->reset_sctlr = 0x00c50078;
      cpu->id_pfr0 = 0x00001131;
      cpu->id_pfr1 = 0x00011011;
 -    cpu->id_dfr0 = 0x02010555;
 +    cpu->isar.id_dfr0 = 0x02010555;
      cpu->id_afr0 = 0x00000000;
      cpu->id_mmfr0 = 0x10201105;
      cpu->id_mmfr1 = 0x20000000;
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a57_initfn(Object *obj)
      cpu->reset_sctlr = 0x00c50838;
      cpu->id_pfr0 = 0x00000131;
      cpu->id_pfr1 = 0x00011011;
 -    cpu->id_dfr0 = 0x03010066;
 +    cpu->isar.id_dfr0 = 0x03010066;
      cpu->id_afr0 = 0x00000000;
      cpu->id_mmfr0 = 0x10101105;
      cpu->id_mmfr1 = 0x40000000;
@@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj)
      cpu->reset_sctlr = 0x00c50838;
      cpu->id_pfr0 = 0x00000131;
      cpu->id_pfr1 = 0x00011011;
 -    cpu->id_dfr0 = 0x03010066;
 +    cpu->isar.id_dfr0 = 0x03010066;
      cpu->id_afr0 = 0x00000000;
      cpu->id_mmfr0 = 0x10101105;
      cpu->id_mmfr1 = 0x40000000;
@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
      cpu->reset_sctlr = 0x00c50838;
      cpu->id_pfr0 = 0x00000131;
      cpu->id_pfr1 = 0x00011011;
 -    cpu->id_dfr0 = 0x03010066;
 +    cpu->isar.id_dfr0 = 0x03010066;
      cpu->id_afr0 = 0x00000000;
      cpu->id_mmfr0 = 0x10201105;
      cpu->id_mmfr1 = 0x40000000;
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
          g_free(pmevtyper_name);
          g_free(pmevtyper_el0_name);
      }
 -    if (FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) >= 4 &&
 -            FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) != 0xf) {
 +    if (cpu_isar_feature(aa32_pmu_8_1, cpu)) {
          ARMCPRegInfo v81_pmu_regs[] = {
              { .name = "PMCEID2", .state = ARM_CP_STATE_AA32,
                .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 4,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
                .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 2,
                .access = PL1_R, .type = ARM_CP_CONST,
                .accessfn = access_aa32_tid3,
 -              .resetvalue = cpu->id_dfr0 },
 +              .resetvalue = cpu->isar.id_dfr0 },
              { .name = "ID_AFR0", .state = ARM_CP_STATE_BOTH,
                .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 3,
                .access = PL1_R, .type = ARM_CP_CONST,
 --
 .20.1

-New patch
+[PULL 22/52] target/arm: Add _aa64_ and _any_ versions of pmu_8_1 isar checks
+Add the 64-bit version of the "is this a v8.1 PMUv3?"
+ID register check function, and the _any_ version that
+checks for either AArch32 or AArch64 support. We'll use
+this in a later commit.
+We don't (yet) do any isar_feature checks on ID_AA64DFR1_EL1,
+but we move id_aa64dfr1 into the ARMISARegisters struct with
+id_aa64dfr0, for consistency.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200214175116.9164-10-peter.maydell@linaro.org
+---
+ target/arm/cpu.h    | 15 +++++++++++++--
+ target/arm/cpu.c    |  3 ++-
+ target/arm/cpu64.c  |  6 +++---
+ target/arm/helper.c | 12 +++++++-----
+files changed, 25 insertions(+), 11 deletions(-)
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
+         uint64_t id_aa64mmfr0;
+         uint64_t id_aa64mmfr1;
+         uint64_t id_aa64mmfr2;
++        uint64_t id_aa64dfr0;
++        uint64_t id_aa64dfr1;
+     } isar;
+     uint32_t midr;
+     uint32_t revidr;
+@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
+     uint32_t id_mmfr2;
+     uint32_t id_mmfr3;
+     uint32_t id_mmfr4;
+-    uint64_t id_aa64dfr0;
+-    uint64_t id_aa64dfr1;
+     uint64_t id_aa64afr0;
+     uint64_t id_aa64afr1;
+     uint32_t dbgdidr;
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_bti(const ARMISARegisters *id)
+     return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, BT) != 0;
+ }
++static inline bool isar_feature_aa64_pmu_8_1(const ARMISARegisters *id)
++{
++    return FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) >= 4 &&
++        FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) != 0xf;
++}
++
+ /*
+  * Feature tests for "does this exist in either 32-bit or 64-bit?"
+  */
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_any_predinv(const ARMISARegisters *id)
+     return isar_feature_aa64_predinv(id) || isar_feature_aa32_predinv(id);
+ }
++static inline bool isar_feature_any_pmu_8_1(const ARMISARegisters *id)
++{
++    return isar_feature_aa64_pmu_8_1(id) || isar_feature_aa32_pmu_8_1(id);
++}
++
+ /*
+  * Forward to the above feature tests given an ARMCPU pointer.
+  */
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
+                 cpu);
+ #endif
+     } else {
+-        cpu->id_aa64dfr0 = FIELD_DP64(cpu->id_aa64dfr0, ID_AA64DFR0, PMUVER, 0);
++        cpu->isar.id_aa64dfr0 =
++            FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, PMUVER, 0);
+         cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, PERFMON, 0);
+         cpu->pmceid0 = 0;
+         cpu->pmceid1 = 0;
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu64.c
++++ b/target/arm/cpu64.c
+@@ -XXX,XX +XXX,XX @@ static void aarch64_a57_initfn(Object *obj)
+     cpu->isar.id_isar5 = 0x00011121;
+     cpu->isar.id_isar6 = 0;
+     cpu->isar.id_aa64pfr0 = 0x00002222;
+-    cpu->id_aa64dfr0 = 0x10305106;
++    cpu->isar.id_aa64dfr0 = 0x10305106;
+     cpu->isar.id_aa64isar0 = 0x00011120;
+     cpu->isar.id_aa64mmfr0 = 0x00001124;
+     cpu->dbgdidr = 0x3516d000;
+@@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj)
+     cpu->isar.id_isar5 = 0x00011121;
+     cpu->isar.id_isar6 = 0;
+     cpu->isar.id_aa64pfr0 = 0x00002222;
+-    cpu->id_aa64dfr0 = 0x10305106;
++    cpu->isar.id_aa64dfr0 = 0x10305106;
+     cpu->isar.id_aa64isar0 = 0x00011120;
+     cpu->isar.id_aa64mmfr0 = 0x00001122; /* 40 bit physical addr */
+     cpu->dbgdidr = 0x3516d000;
+@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
+     cpu->isar.id_isar4 = 0x00011142;
+     cpu->isar.id_isar5 = 0x00011121;
+     cpu->isar.id_aa64pfr0 = 0x00002222;
+-    cpu->id_aa64dfr0 = 0x10305106;
++    cpu->isar.id_aa64dfr0 = 0x10305106;
+     cpu->isar.id_aa64isar0 = 0x00011120;
+     cpu->isar.id_aa64mmfr0 = 0x00001124;
+     cpu->dbgdidr = 0x3516d000;
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@
+ #include "hw/semihosting/semihost.h"
+ #include "sysemu/cpus.h"
+ #include "sysemu/kvm.h"
++#include "sysemu/tcg.h"
+ #include "qemu/range.h"
+ #include "qapi/qapi-commands-machine-target.h"
+ #include "qapi/error.h"
+@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
+      * check that if they both exist then they agree.
+      */
+     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
+-        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, BRPS) == brps);
+-        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, WRPS) == wrps);
+-        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, CTX_CMPS) == ctx_cmps);
++        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, BRPS) == brps);
++        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, WRPS) == wrps);
++        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, CTX_CMPS)
++               == ctx_cmps);
+     }
+     define_one_arm_cp_reg(cpu, &dbgdidr);
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
+               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 5, .opc2 = 0,
+               .access = PL1_R, .type = ARM_CP_CONST,
+               .accessfn = access_aa64_tid3,
+-              .resetvalue = cpu->id_aa64dfr0 },
++              .resetvalue = cpu->isar.id_aa64dfr0 },
+             { .name = "ID_AA64DFR1_EL1", .state = ARM_CP_STATE_AA64,
+               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 5, .opc2 = 1,
+               .access = PL1_R, .type = ARM_CP_CONST,
+               .accessfn = access_aa64_tid3,
+-              .resetvalue = cpu->id_aa64dfr1 },
++              .resetvalue = cpu->isar.id_aa64dfr1 },
+             { .name = "ID_AA64DFR2_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
+               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 5, .opc2 = 2,
+               .access = PL1_R, .type = ARM_CP_CONST,
+--
+.20.1
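The three pmu_8_1 checks introduced in the two patches above can be sketched outside QEMU as follows. The struct is a hypothetical cut-down stand-in for ARMISARegisters and the function names are shortened for the example; the field positions (PERFMON at bits [27:24] of ID_DFR0, PMUVER at bits [11:8] of ID_AA64DFR0) and the comparison logic follow the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the two ID registers the checks read. */
struct id_regs {
    uint32_t id_dfr0;
    uint64_t id_aa64dfr0;
};

/* A version field of 4 or more means v8.1 PMUv3; 0xf means non-standard IMPDEF PMU. */
static bool pmu_ver_is_8_1(unsigned ver)
{
    return ver >= 4 && ver != 0xf;
}

static bool aa32_pmu_8_1(const struct id_regs *id)
{
    return pmu_ver_is_8_1((id->id_dfr0 >> 24) & 0xf);
}

static bool aa64_pmu_8_1(const struct id_regs *id)
{
    return pmu_ver_is_8_1((unsigned)((id->id_aa64dfr0 >> 8) & 0xf));
}

/* "Does this exist in either 32-bit or 64-bit?" */
static bool any_pmu_8_1(const struct id_regs *id)
{
    return aa64_pmu_8_1(id) || aa32_pmu_8_1(id);
}

static void pmu_selfcheck(void)
{
    /* Cortex-A57 values from this series: PERFMON = 3, PMUVER = 1. */
    struct id_regs a57 = { 0x03010066, 0x10305106 };
    /* Made-up values for illustration only. */
    struct id_regs v81_aa32 = { 0x04000000, 0 };
    struct id_regs impdef = { 0x0f000000, 0xf00 };

    assert(!any_pmu_8_1(&a57));
    assert(aa32_pmu_8_1(&v81_aa32) && any_pmu_8_1(&v81_aa32));
    assert(!any_pmu_8_1(&impdef));
}
```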

-[Qemu-devel] [PULL 17/21] hw/arm/musca: Add PPCs
+[PULL 23/52] target/arm: Stop assuming DBGDIDR always exists
-Many of the devices on the Musca board live behind TrustZone
+The AArch32 DBGDIDR defines properties like the number of
-Peripheral Protection Controllers (PPCs); add models of the
+breakpoints, watchpoints and context-matching comparators.  On an
-PPCs, using a similar scheme to the MPS2 board models.
+AArch64 CPU, the register may not even exist if AArch32 is not
-This commit wires up the PPCs with "unimplemented device"
+supported at EL1.
-stubs behind them in the correct places in the address map.
 Currently we hard-code use of DBGDIDR to identify the number of
 breakpoints etc; this works for all our TCG CPUs, but will break if
 we ever add an AArch64-only CPU.  We also have an assert() that the
 AArch32 and AArch64 registers match, which currently works only by
 luck for KVM because we don't populate either of these ID registers
 from the KVM vCPU and so they are both zero.
 Clean this up so we have functions for finding the number
 of breakpoints, watchpoints and context comparators which look
 in the appropriate ID register.
 This allows us to drop the "check that AArch64 and AArch32 agree
 on the number of breakpoints etc" asserts:
  * we no longer look at the AArch32 versions unless that's the
    right place to be looking
  * it's valid to have a CPU (eg AArch64-only) where they don't match
  * we shouldn't have been asserting the validity of ID registers
    in a codepath used with KVM anyway
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214175116.9164-11-peter.maydell@linaro.org
 ---
- hw/arm/musca.c | 289 +++++++++++++++++++++++++++++++++++++++++++++++++
+ target/arm/cpu.h          |  7 +++++++
-file changed, 289 insertions(+)
+ target/arm/internals.h    | 42 +++++++++++++++++++++++++++++++++++++++
  target/arm/debug_helper.c |  6 +++---
  target/arm/helper.c       | 21 +++++---------------
 files changed, 57 insertions(+), 19 deletions(-)
-diff --git a/hw/arm/musca.c b/hw/arm/musca.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/musca.c
+--- a/target/arm/cpu.h
-+++ b/hw/arm/musca.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ FIELD(ID_DFR0, MPROFDBG, 20, 4)
- #include "hw/arm/armsse.h"
+ FIELD(ID_DFR0, PERFMON, 24, 4)
- #include "hw/boards.h"
+ FIELD(ID_DFR0, TRACEFILT, 28, 4)
- #include "hw/core/split-irq.h"
-+#include "hw/misc/tz-ppc.h"
++FIELD(DBGDIDR, SE_IMP, 12, 1)
-+#include "hw/misc/unimp.h"
++FIELD(DBGDIDR, NSUHD_IMP, 14, 1)
++FIELD(DBGDIDR, VERSION, 16, 4)
- #define MUSCA_NUMIRQ_MAX 96
++FIELD(DBGDIDR, CTX_CMPS, 20, 4)
-+#define MUSCA_PPC_MAX 3
++FIELD(DBGDIDR, BRPS, 24, 4)
++FIELD(DBGDIDR, WRPS, 28, 4)
- typedef enum MuscaType {
++
-     MUSCA_A,
+ FIELD(MVFR0, SIMDREG, 0, 4)
-@@ -XXX,XX +XXX,XX @@ typedef struct {
+ FIELD(MVFR0, FPSP, 4, 4)
+ FIELD(MVFR0, FPDP, 8, 4)
-     ARMSSE sse;
+diff --git a/target/arm/internals.h b/target/arm/internals.h
-     SplitIRQ cpu_irq_splitter[MUSCA_NUMIRQ_MAX];
+index XXXXXXX..XXXXXXX 100644
-+    SplitIRQ sec_resp_splitter;
+--- a/target/arm/internals.h
-+    TZPPC ppc[MUSCA_PPC_MAX];
++++ b/target/arm/internals.h
-+    MemoryRegion container;
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t arm_debug_exception_fsr(CPUARMState *env)
-+    UnimplementedDeviceState eflash[2];
+     }
-+    UnimplementedDeviceState qspi;
+ }
-+    UnimplementedDeviceState mpc[5];
-+    UnimplementedDeviceState mhu[2];
++/**
-+    UnimplementedDeviceState pwm[3];
++ * arm_num_brps: Return number of implemented breakpoints.
-+    UnimplementedDeviceState i2s;
++ * Note that the ID register BRPS field is "number of bps - 1",
-+    UnimplementedDeviceState uart[2];
++ * and we return the actual number of breakpoints.
 +    UnimplementedDeviceState i2c[2];
 +    UnimplementedDeviceState spi;
 +    UnimplementedDeviceState scc;
 +    UnimplementedDeviceState timer;
 +    UnimplementedDeviceState rtc;
 +    UnimplementedDeviceState pvt;
 +    UnimplementedDeviceState sdio;
 +    UnimplementedDeviceState gpio;
  } MuscaMachineState;
  #define TYPE_MUSCA_MACHINE "musca"
@@ -XXX,XX +XXX,XX @@ typedef struct {
   */
  #define SYSCLK_FRQ 40000000
 +/*
 + * Most of the devices in the Musca board sit behind Peripheral Protection
 + * Controllers. These data structures define the layout of which devices
 + * sit behind which PPCs.
 + * The devfn for each port is a function which creates, configures
 + * and initializes the device, returning the MemoryRegion which
 + * needs to be plugged into the downstream end of the PPC port.
 + */
-+typedef MemoryRegion *MakeDevFn(MuscaMachineState *mms, void *opaque,
++static inline int arm_num_brps(ARMCPU *cpu)
 +                                const char *name, hwaddr size);
 +
 +typedef struct PPCPortInfo {
 +    const char *name;
 +    MakeDevFn *devfn;
 +    void *opaque;
 +    hwaddr addr;
 +    hwaddr size;
 +} PPCPortInfo;
 +
 +typedef struct PPCInfo {
 +    const char *name;
 +    PPCPortInfo ports[TZ_NUM_PORTS];
 +} PPCInfo;
 +
 +static MemoryRegion *make_unimp_dev(MuscaMachineState *mms,
 +                                    void *opaque, const char *name, hwaddr size)
 +{
-+    /*
++    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
-+     * Initialize, configure and realize a TYPE_UNIMPLEMENTED_DEVICE,
++        return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, BRPS) + 1;
-+     * and return a pointer to its MemoryRegion.
++    } else {
-+     */
++        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, BRPS) + 1;
-+    UnimplementedDeviceState *uds = opaque;
++    }
 +
 +    sysbus_init_child_obj(OBJECT(mms), name, uds,
 +                          sizeof(UnimplementedDeviceState),
 +                          TYPE_UNIMPLEMENTED_DEVICE);
 +    qdev_prop_set_string(DEVICE(uds), "name", name);
 +    qdev_prop_set_uint64(DEVICE(uds), "size", size);
 +    object_property_set_bool(OBJECT(uds), true, "realized", &error_fatal);
 +    return sysbus_mmio_get_region(SYS_BUS_DEVICE(uds), 0);
 +}
 +
-+static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
++/**
-+                                       const char *name, hwaddr size)
++ * arm_num_wrps: Return number of implemented watchpoints.
 + * Note that the ID register WRPS field is "number of wps - 1",
 + * and we return the actual number of watchpoints.
 + */
 +static inline int arm_num_wrps(ARMCPU *cpu)
 +{
-+    /*
++    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
-+     * Create the container MemoryRegion for all the devices that live
++        return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, WRPS) + 1;
-+     * behind the Musca-A PPC's single port. These devices don't have a PPC
++    } else {
-+     * port each, but we use the PPCPortInfo struct as a convenient way
++        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, WRPS) + 1;
 +     * to describe them. Note that addresses here are relative to the base
 +     * address of the PPC port region: 0x40100000, and devices appear both
 +     * at the 0x4... NS region and the 0x5... S region.
 +     */
 +    int i;
 +    MemoryRegion *container = &mms->container;
 +
 +    const PPCPortInfo devices[] = {
 +        { "uart0", make_unimp_dev, &mms->uart[0], 0x1000, 0x1000 },
 +        { "uart1", make_unimp_dev, &mms->uart[1], 0x2000, 0x1000 },
 +        { "spi", make_unimp_dev, &mms->spi, 0x3000, 0x1000 },
 +        { "i2c0", make_unimp_dev, &mms->i2c[0], 0x4000, 0x1000 },
 +        { "i2c1", make_unimp_dev, &mms->i2c[1], 0x5000, 0x1000 },
 +        { "i2s", make_unimp_dev, &mms->i2s, 0x6000, 0x1000 },
 +        { "pwm0", make_unimp_dev, &mms->pwm[0], 0x7000, 0x1000 },
 +        { "rtc", make_unimp_dev, &mms->rtc, 0x8000, 0x1000 },
 +        { "qspi", make_unimp_dev, &mms->qspi, 0xa000, 0x1000 },
 +        { "timer", make_unimp_dev, &mms->timer, 0xb000, 0x1000 },
 +        { "scc", make_unimp_dev, &mms->scc, 0xc000, 0x1000 },
 +        { "pwm1", make_unimp_dev, &mms->pwm[1], 0xe000, 0x1000 },
 +        { "pwm2", make_unimp_dev, &mms->pwm[2], 0xf000, 0x1000 },
 +        { "gpio", make_unimp_dev, &mms->gpio, 0x10000, 0x1000 },
 +        { "mpc0", make_unimp_dev, &mms->mpc[0], 0x12000, 0x1000 },
 +        { "mpc1", make_unimp_dev, &mms->mpc[1], 0x13000, 0x1000 },
 +    };
 +
 +    memory_region_init(container, OBJECT(mms), "musca-device-container", size);
 +
 +    for (i = 0; i < ARRAY_SIZE(devices); i++) {
 +        const PPCPortInfo *pinfo = &devices[i];
 +        MemoryRegion *mr;
 +
 +        mr = pinfo->devfn(mms, pinfo->opaque, pinfo->name, pinfo->size);
 +        memory_region_add_subregion(container, pinfo->addr, mr);
 +    }
-+
-+    return &mms->container;
 +}
 +
- static void musca_init(MachineState *machine)
++/**
 + * arm_num_ctx_cmps: Return number of implemented context comparators.
 + * Note that the ID register CTX_CMPS field is "number of cmps - 1",
 + * and we return the actual number of comparators.
 + */
 +static inline int arm_num_ctx_cmps(ARMCPU *cpu)
 +{
 +    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
 +        return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, CTX_CMPS) + 1;
 +    } else {
 +        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, CTX_CMPS) + 1;
 +    }
 +}
 +
  /* Note make_memop_idx reserves 4 bits for mmu_idx, and MO_BSWAP is bit 3.
   * Thus a TCGMemOpIdx, without any MO_ALIGN bits, fits in 8 bits.
   */
 diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/debug_helper.c
 +++ b/target/arm/debug_helper.c
@@ -XXX,XX +XXX,XX @@ static bool linked_bp_matches(ARMCPU *cpu, int lbn)
  {
-     MuscaMachineState *mms = MUSCA_MACHINE(machine);
+     CPUARMState *env = &cpu->env;
-@@ -XXX,XX +XXX,XX @@ static void musca_init(MachineState *machine)
+     uint64_t bcr = env->cp15.dbgbcr[lbn];
-     MachineClass *mc = MACHINE_GET_CLASS(machine);
+-    int brps = extract32(cpu->dbgdidr, 24, 4);
-     MemoryRegion *system_memory = get_system_memory();
+-    int ctx_cmps = extract32(cpu->dbgdidr, 20, 4);
-     DeviceState *ssedev;
++    int brps = arm_num_brps(cpu);
-+    DeviceState *dev_splitter;
++    int ctx_cmps = arm_num_ctx_cmps(cpu);
-+    const PPCInfo *ppcs;
+     int bt;
-+    int num_ppcs;
+     uint32_t contextidr;
-     int i;
+     uint64_t hcr_el2;
+@@ -XXX,XX +XXX,XX @@ static bool linked_bp_matches(ARMCPU *cpu, int lbn)
-     assert(mmc->num_irqs <= MUSCA_NUMIRQ_MAX);
+      * case DBGWCR<n>_EL1.LBN must indicate that breakpoint).
-@@ -XXX,XX +XXX,XX @@ static void musca_init(MachineState *machine)
+      * We choose the former.
-                                                      "EXP_CPU1_IRQ", i));
+      */
 -    if (lbn > brps || lbn < (brps - ctx_cmps)) {
 +    if (lbn >= brps || lbn < (brps - ctx_cmps)) {
          return false;
      }
-+    /*
+diff --git a/target/arm/helper.c b/target/arm/helper.c
-+     * The sec_resp_cfg output from the SSE-200 must be split into multiple
+index XXXXXXX..XXXXXXX 100644
-+     * lines, one for each of the PPCs we create here.
+--- a/target/arm/helper.c
-+     */
++++ b/target/arm/helper.c
-+    object_initialize(&mms->sec_resp_splitter, sizeof(mms->sec_resp_splitter),
+@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
-+                      TYPE_SPLIT_IRQ);
+     };
-+    object_property_add_child(OBJECT(machine), "sec-resp-splitter",
-+                              OBJECT(&mms->sec_resp_splitter), &error_fatal);
+     /* Note that all these register fields hold "number of Xs minus 1". */
-+    object_property_set_int(OBJECT(&mms->sec_resp_splitter),
+-    brps = extract32(cpu->dbgdidr, 24, 4);
-+                            ARRAY_SIZE(mms->ppc), "num-lines", &error_fatal);
+-    wrps = extract32(cpu->dbgdidr, 28, 4);
-+    object_property_set_bool(OBJECT(&mms->sec_resp_splitter), true,
+-    ctx_cmps = extract32(cpu->dbgdidr, 20, 4);
-+                             "realized", &error_fatal);
++    brps = arm_num_brps(cpu);
-+    dev_splitter = DEVICE(&mms->sec_resp_splitter);
++    wrps = arm_num_wrps(cpu);
-+    qdev_connect_gpio_out_named(ssedev, "sec_resp_cfg", 0,
++    ctx_cmps = arm_num_ctx_cmps(cpu);
-+                                qdev_get_gpio_in(dev_splitter, 0));
-+
+     assert(ctx_cmps <= brps);
-+    /*
-+     * Most of the devices in the board are behind Peripheral Protection
+-    /* The DBGDIDR and ID_AA64DFR0_EL1 define various properties
-+     * Controllers. The required order for initializing things is:
+-     * of the debug registers such as number of breakpoints;
-+     *  + initialize the PPC
+-     * check that if they both exist then they agree.
-+     *  + initialize, configure and realize downstream devices
+-     */
-+     *  + connect downstream device MemoryRegions to the PPC
+-    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
-+     *  + realize the PPC
+-        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, BRPS) == brps);
-+     *  + map the PPC's MemoryRegions to the places in the address map
+-        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, WRPS) == wrps);
-+     *    where the downstream devices should appear
+-        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, CTX_CMPS)
-+     *  + wire up the PPC's control lines to the SSE object
+-               == ctx_cmps);
-+     *
+-    }
-+     * The PPC mapping differs for the -A and -B1 variants; the -A version
+-
-+     * is much simpler, using only a single port of a single PPC and putting
+     define_one_arm_cp_reg(cpu, &dbgdidr);
-+     * all the devices behind that.
+     define_arm_cp_regs(cpu, debug_cp_reginfo);
-+     */
-+    const PPCInfo a_ppcs[] = { {
+@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
-+            .name = "ahb_ppcexp0",
+         define_arm_cp_regs(cpu, debug_lpae_cp_reginfo);
-+            .ports = {
+     }
-+                { "musca-devices", make_musca_a_devs, 0, 0x40100000, 0x100000 },
-+            },
+-    for (i = 0; i < brps + 1; i++) {
-+        },
++    for (i = 0; i < brps; i++) {
-+    };
+         ARMCPRegInfo dbgregs[] = {
-+
+             { .name = "DBGBVR", .state = ARM_CP_STATE_BOTH,
-+    /*
+               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 4,
-+     * Devices listed with an 0x4.. address appear in both the NS 0x4.. region
+@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
-+     * and the 0x5.. S region. Devices listed with an 0x5.. address appear
+         define_arm_cp_regs(cpu, dbgregs);
-+     * only in the S region.
+     }
-+     */
-+    const PPCInfo b1_ppcs[] = { {
+-    for (i = 0; i < wrps + 1; i++) {
-+            .name = "apb_ppcexp0",
++    for (i = 0; i < wrps; i++) {
-+            .ports = {
+         ARMCPRegInfo dbgregs[] = {
-+                { "eflash0", make_unimp_dev, &mms->eflash[0],
+             { .name = "DBGWVR", .state = ARM_CP_STATE_BOTH,
-+                  0x52400000, 0x1000 },
+               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 6,
 +                { "eflash1", make_unimp_dev, &mms->eflash[1],
 +                  0x52500000, 0x1000 },
 +                { "qspi", make_unimp_dev, &mms->qspi, 0x42800000, 0x100000 },
 +                { "mpc0", make_unimp_dev, &mms->mpc[0], 0x52000000, 0x1000 },
 +                { "mpc1", make_unimp_dev, &mms->mpc[1], 0x52100000, 0x1000 },
 +                { "mpc2", make_unimp_dev, &mms->mpc[2], 0x52200000, 0x1000 },
 +                { "mpc3", make_unimp_dev, &mms->mpc[3], 0x52300000, 0x1000 },
 +                { "mhu0", make_unimp_dev, &mms->mhu[0], 0x42600000, 0x100000 },
 +                { "mhu1", make_unimp_dev, &mms->mhu[1], 0x42700000, 0x100000 },
 +                { }, /* port 9: unused */
 +                { }, /* port 10: unused */
 +                { }, /* port 11: unused */
 +                { }, /* port 12: unused */
 +                { }, /* port 13: unused */
 +                { "mpc4", make_unimp_dev, &mms->mpc[4], 0x52e00000, 0x1000 },
 +            },
 +        }, {
 +            .name = "apb_ppcexp1",
 +            .ports = {
 +                { "pwm0", make_unimp_dev, &mms->pwm[0], 0x40101000, 0x1000 },
 +                { "pwm1", make_unimp_dev, &mms->pwm[1], 0x40102000, 0x1000 },
 +                { "pwm2", make_unimp_dev, &mms->pwm[2], 0x40103000, 0x1000 },
 +                { "i2s", make_unimp_dev, &mms->i2s, 0x40104000, 0x1000 },
 +                { "uart0", make_unimp_dev, &mms->uart[0], 0x40105000, 0x1000 },
 +                { "uart1", make_unimp_dev, &mms->uart[1], 0x40106000, 0x1000 },
 +                { "i2c0", make_unimp_dev, &mms->i2c[0], 0x40108000, 0x1000 },
 +                { "i2c1", make_unimp_dev, &mms->i2c[1], 0x40109000, 0x1000 },
 +                { "spi", make_unimp_dev, &mms->spi, 0x4010a000, 0x1000 },
 +                { "scc", make_unimp_dev, &mms->scc, 0x5010b000, 0x1000 },
 +                { "timer", make_unimp_dev, &mms->timer, 0x4010c000, 0x1000 },
 +                { "rtc", make_unimp_dev, &mms->rtc, 0x4010d000, 0x1000 },
 +                { "pvt", make_unimp_dev, &mms->pvt, 0x4010e000, 0x1000 },
 +                { "sdio", make_unimp_dev, &mms->sdio, 0x4010f000, 0x1000 },
 +            },
 +        }, {
 +            .name = "ahb_ppcexp0",
 +            .ports = {
 +                { }, /* port 0: unused */
 +                { "gpio", make_unimp_dev, &mms->gpio, 0x41000000, 0x1000 },
 +            },
 +        },
 +    };
 +
 +    switch (mmc->type) {
 +    case MUSCA_A:
 +        ppcs = a_ppcs;
 +        num_ppcs = ARRAY_SIZE(a_ppcs);
 +        break;
 +    case MUSCA_B1:
 +        ppcs = b1_ppcs;
 +        num_ppcs = ARRAY_SIZE(b1_ppcs);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +    assert(num_ppcs <= MUSCA_PPC_MAX);
 +
 +    for (i = 0; i < num_ppcs; i++) {
 +        const PPCInfo *ppcinfo = &ppcs[i];
 +        TZPPC *ppc = &mms->ppc[i];
 +        DeviceState *ppcdev;
 +        int port;
 +        char *gpioname;
 +
 +        sysbus_init_child_obj(OBJECT(machine), ppcinfo->name, ppc,
 +                              sizeof(TZPPC), TYPE_TZ_PPC);
 +        ppcdev = DEVICE(ppc);
 +
 +        for (port = 0; port < TZ_NUM_PORTS; port++) {
 +            const PPCPortInfo *pinfo = &ppcinfo->ports[port];
 +            MemoryRegion *mr;
 +            char *portname;
 +
 +            if (!pinfo->devfn) {
 +                continue;
 +            }
 +
 +            mr = pinfo->devfn(mms, pinfo->opaque, pinfo->name, pinfo->size);
 +            portname = g_strdup_printf("port[%d]", port);
 +            object_property_set_link(OBJECT(ppc), OBJECT(mr),
 +                                     portname, &error_fatal);
 +            g_free(portname);
 +        }
 +
 +        object_property_set_bool(OBJECT(ppc), true, "realized", &error_fatal);
 +
 +        for (port = 0; port < TZ_NUM_PORTS; port++) {
 +            const PPCPortInfo *pinfo = &ppcinfo->ports[port];
 +
 +            if (!pinfo->devfn) {
 +                continue;
 +            }
 +            sysbus_mmio_map(SYS_BUS_DEVICE(ppc), port, pinfo->addr);
 +
 +            gpioname = g_strdup_printf("%s_nonsec", ppcinfo->name);
 +            qdev_connect_gpio_out_named(ssedev, gpioname, port,
 +                                        qdev_get_gpio_in_named(ppcdev,
 +                                                               "cfg_nonsec",
 +                                                               port));
 +            g_free(gpioname);
 +            gpioname = g_strdup_printf("%s_ap", ppcinfo->name);
 +            qdev_connect_gpio_out_named(ssedev, gpioname, port,
 +                                        qdev_get_gpio_in_named(ppcdev,
 +                                                               "cfg_ap", port));
 +            g_free(gpioname);
 +        }
 +
 +        gpioname = g_strdup_printf("%s_irq_enable", ppcinfo->name);
 +        qdev_connect_gpio_out_named(ssedev, gpioname, 0,
 +                                    qdev_get_gpio_in_named(ppcdev,
 +                                                           "irq_enable", 0));
 +        g_free(gpioname);
 +        gpioname = g_strdup_printf("%s_irq_clear", ppcinfo->name);
 +        qdev_connect_gpio_out_named(ssedev, gpioname, 0,
 +                                    qdev_get_gpio_in_named(ppcdev,
 +                                                           "irq_clear", 0));
 +        g_free(gpioname);
 +        gpioname = g_strdup_printf("%s_irq_status", ppcinfo->name);
 +        qdev_connect_gpio_out_named(ppcdev, "irq", 0,
 +                                    qdev_get_gpio_in_named(ssedev,
 +                                                           gpioname, 0));
 +        g_free(gpioname);
 +
 +        qdev_connect_gpio_out(dev_splitter, i,
 +                              qdev_get_gpio_in_named(ppcdev,
 +                                                     "cfg_sec_resp", 0));
 +    }
 +
      armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename, 0x2000000);
  }
 --
 .20.1
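
Editorial sketch (not part of the patch): the "number of Xs minus 1" encoding that the new arm_num_brps(), arm_num_wrps() and arm_num_ctx_cmps() helpers hide can be illustrated standalone. The field positions are taken from the FIELD(DBGDIDR, ...) definitions in the patch, and 0x3515f021 is the Cortex-A15 DBGDIDR value that appears later in this series. The names field32(), num_brps(), num_wrps(), num_ctx_cmps() and linked_bp_index_valid() are hypothetical stand-ins invented for this example; they are not QEMU functions, and the last one only mirrors the bounds check shown in linked_bp_matches().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a bitfield extractor: return 'length' bits
 * of 'value', starting at bit position 'start'.
 */
static inline uint32_t field32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0u >> (32 - length));
}

/* DBGDIDR.BRPS is bits [27:24] and holds "number of breakpoints - 1". */
static inline int num_brps(uint32_t dbgdidr)
{
    return (int)field32(dbgdidr, 24, 4) + 1;
}

/* DBGDIDR.WRPS is bits [31:28] and holds "number of watchpoints - 1". */
static inline int num_wrps(uint32_t dbgdidr)
{
    return (int)field32(dbgdidr, 28, 4) + 1;
}

/* DBGDIDR.CTX_CMPS is bits [23:20] and holds "number of comparators - 1". */
static inline int num_ctx_cmps(uint32_t dbgdidr)
{
    return (int)field32(dbgdidr, 20, 4) + 1;
}

/*
 * Mirror of the range check in linked_bp_matches(), written with actual
 * counts: a linked breakpoint index is accepted only if it falls in the
 * top 'ctx_cmps' entries of the 'brps' implemented breakpoints.
 */
static inline bool linked_bp_index_valid(uint32_t dbgdidr, int lbn)
{
    int brps = num_brps(dbgdidr);
    int ctx_cmps = num_ctx_cmps(dbgdidr);

    return lbn >= brps - ctx_cmps && lbn < brps;
}
```

With the Cortex-A15 value, the raw fields are BRPS = 5, WRPS = 3 and CTX_CMPS = 1, so the helpers report 6 breakpoints, 4 watchpoints and 2 context comparators, and only linked breakpoint indexes 4 and 5 pass the range check. Returning actual counts is why the loops in define_debug_regs() change from "i < brps + 1" to "i < brps".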

-[Qemu-devel] [PULL 10/21] hw/timer/pl031: Convert to using trace events
+[PULL 24/52] target/arm: Move DBGDIDR into ARMISARegisters
-Convert the debug printing in the PL031 device to use trace events,
+We're going to want to read the DBGDIDR register from KVM in
-and augment it to cover the interesting parts of device operation.
+a subsequent commit, which means it needs to be in the
 ARMISARegisters sub-struct. Move it.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214175116.9164-12-peter.maydell@linaro.org
 ---
- hw/timer/pl031.c      | 55 +++++++++++++++++++++++--------------------
+ target/arm/cpu.h       | 2 +-
- hw/timer/trace-events |  6 +++++
+ target/arm/internals.h | 6 +++---
- 2 files changed, 36 insertions(+), 25 deletions(-)
+ target/arm/cpu.c       | 8 ++++----
  target/arm/cpu64.c     | 6 +++---
  target/arm/helper.c    | 2 +-
  5 files changed, 12 insertions(+), 12 deletions(-)
-diff --git a/hw/timer/pl031.c b/hw/timer/pl031.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/pl031.c
+--- a/target/arm/cpu.h
-+++ b/hw/timer/pl031.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
- #include "sysemu/sysemu.h"
+         uint32_t mvfr1;
- #include "qemu/cutils.h"
+         uint32_t mvfr2;
- #include "qemu/log.h"
+         uint32_t id_dfr0;
--
++        uint32_t dbgdidr;
--//#define DEBUG_PL031
+         uint64_t id_aa64isar0;
--
+         uint64_t id_aa64isar1;
--#ifdef DEBUG_PL031
+         uint64_t id_aa64pfr0;
--#define DPRINTF(fmt, ...) \
+@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
--do { printf("pl031: " fmt , ## __VA_ARGS__); } while (0)
+     uint32_t id_mmfr4;
--#else
+     uint64_t id_aa64afr0;
--#define DPRINTF(fmt, ...) do {} while(0)
+     uint64_t id_aa64afr1;
--#endif
+-    uint32_t dbgdidr;
-+#include "trace.h"
+     uint32_t clidr;
+     uint64_t mp_affinity; /* MP ID without feature bits */
- #define RTC_DR      0x00    /* Data read register */
+     /* The elements of this array are the CCSIDR values for each cache,
- #define RTC_MR      0x04    /* Match register */
+diff --git a/target/arm/internals.h b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ static const unsigned char pl031_id[] = {
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/internals.h
- static void pl031_update(PL031State *s)
++++ b/target/arm/internals.h
- {
+@@ -XXX,XX +XXX,XX @@ static inline int arm_num_brps(ARMCPU *cpu)
--    qemu_set_irq(s->irq, s->is & s->im);
+     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
-+    uint32_t flags = s->is & s->im;
+         return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, BRPS) + 1;
-+
+     } else {
-+    trace_pl031_irq_state(flags);
+-        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, BRPS) + 1;
-+    qemu_set_irq(s->irq, flags);
++        return FIELD_EX32(cpu->isar.dbgdidr, DBGDIDR, BRPS) + 1;
      }
  }
- static void pl031_interrupt(void * opaque)
+@@ -XXX,XX +XXX,XX @@ static inline int arm_num_wrps(ARMCPU *cpu)
-@@ -XXX,XX +XXX,XX @@ static void pl031_interrupt(void * opaque)
+     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
-     PL031State *s = (PL031State *)opaque;
+         return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, WRPS) + 1;
+     } else {
-     s->is = 1;
+-        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, WRPS) + 1;
--    DPRINTF("Alarm raised\n");
++        return FIELD_EX32(cpu->isar.dbgdidr, DBGDIDR, WRPS) + 1;
-+    trace_pl031_alarm_raised();
+     }
      pl031_update(s);
  }
-@@ -XXX,XX +XXX,XX @@ static void pl031_set_alarm(PL031State *s)
+@@ -XXX,XX +XXX,XX @@ static inline int arm_num_ctx_cmps(ARMCPU *cpu)
-     /* The timer wraps around.  This subtraction also wraps in the same way,
+     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
-        and gives correct results when alarm < now_ticks.  */
+         return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, CTX_CMPS) + 1;
-     ticks = s->mr - pl031_get_count(s);
+     } else {
--    DPRINTF("Alarm set in %ud ticks\n", ticks);
+-        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, CTX_CMPS) + 1;
-+    trace_pl031_set_alarm(ticks);
++        return FIELD_EX32(cpu->isar.dbgdidr, DBGDIDR, CTX_CMPS) + 1;
      if (ticks == 0) {
          timer_del(s->timer);
          pl031_interrupt(s);
@@ -XXX,XX +XXX,XX @@ static uint64_t pl031_read(void *opaque, hwaddr offset,
                             unsigned size)
  {
      PL031State *s = (PL031State *)opaque;
 -
 -    if (offset >= 0xfe0  &&  offset < 0x1000)
 -        return pl031_id[(offset - 0xfe0) >> 2];
 +    uint64_t r;
      switch (offset) {
      case RTC_DR:
 -        return pl031_get_count(s);
 +        r = pl031_get_count(s);
 +        break;
      case RTC_MR:
 -        return s->mr;
 +        r = s->mr;
 +        break;
      case RTC_IMSC:
 -        return s->im;
 +        r = s->im;
 +        break;
      case RTC_RIS:
 -        return s->is;
 +        r = s->is;
 +        break;
      case RTC_LR:
 -        return s->lr;
 +        r = s->lr;
 +        break;
      case RTC_CR:
          /* RTC is permanently enabled.  */
 -        return 1;
 +        r = 1;
 +        break;
      case RTC_MIS:
 -        return s->is & s->im;
 +        r = s->is & s->im;
 +        break;
 +    case 0xfe0 ... 0xfff:
 +        r = pl031_id[(offset - 0xfe0) >> 2];
 +        break;
      case RTC_ICR:
          qemu_log_mask(LOG_GUEST_ERROR,
                        "pl031: read of write-only register at offset 0x%x\n",
                        (int)offset);
 +        r = 0;
          break;
      default:
          qemu_log_mask(LOG_GUEST_ERROR,
                        "pl031_read: Bad offset 0x%x\n", (int)offset);
 +        r = 0;
          break;
      }
--    return 0;
-+    trace_pl031_read(offset, r);
-+    return r;
  }
- static void pl031_write(void * opaque, hwaddr offset,
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void pl031_write(void * opaque, hwaddr offset,
  {
      PL031State *s = (PL031State *)opaque;
 +    trace_pl031_write(offset, value);
      switch (offset) {
      case RTC_LR:
@@ -XXX,XX +XXX,XX @@ static void pl031_write(void * opaque, hwaddr offset,
          break;
      case RTC_IMSC:
          s->im = value & 1;
 -        DPRINTF("Interrupt mask %d\n", s->im);
          pl031_update(s);
          break;
      case RTC_ICR:
@@ -XXX,XX +XXX,XX @@ static void pl031_write(void * opaque, hwaddr offset,
             cleared when bit 0 of the written value is set.  However the
             arm926e documentation (DDI0287B) states that the interrupt is
             cleared when any value is written.  */
 -        DPRINTF("Interrupt cleared");
          s->is = 0;
          pl031_update(s);
          break;
 diff --git a/hw/timer/trace-events b/hw/timer/trace-events
 index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/trace-events
+--- a/target/arm/cpu.c
-+++ b/hw/timer/trace-events
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ xlnx_zynqmp_rtc_gettime(int year, int month, int day, int hour, int min, int sec
+@@ -XXX,XX +XXX,XX @@ static void cortex_a8_initfn(Object *obj)
- nrf51_timer_read(uint64_t addr, uint32_t value, unsigned size) "read addr 0x%" PRIx64 " data 0x%" PRIx32 " size %u"
+     cpu->isar.id_isar2 = 0x21232031;
- nrf51_timer_write(uint64_t addr, uint32_t value, unsigned size) "write addr 0x%" PRIx64 " data 0x%" PRIx32 " size %u"
+     cpu->isar.id_isar3 = 0x11112131;
+     cpu->isar.id_isar4 = 0x00111142;
-+# hw/timer/pl031.c
+-    cpu->dbgdidr = 0x15141000;
-+pl031_irq_state(int level) "irq state %d"
++    cpu->isar.dbgdidr = 0x15141000;
-+pl031_read(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
+     cpu->clidr = (1 << 27) | (2 << 24) | 3;
-+pl031_write(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
+     cpu->ccsidr[0] = 0xe007e01a; /* 16k L1 dcache. */
-+pl031_alarm_raised(void) "alarm raised"
+     cpu->ccsidr[1] = 0x2007e01a; /* 16k L1 icache. */
-+pl031_set_alarm(uint32_t ticks) "alarm set for %u ticks"
+@@ -XXX,XX +XXX,XX @@ static void cortex_a9_initfn(Object *obj)
      cpu->isar.id_isar2 = 0x21232041;
      cpu->isar.id_isar3 = 0x11112131;
      cpu->isar.id_isar4 = 0x00111142;
 -    cpu->dbgdidr = 0x35141000;
 +    cpu->isar.dbgdidr = 0x35141000;
      cpu->clidr = (1 << 27) | (1 << 24) | 3;
      cpu->ccsidr[0] = 0xe00fe019; /* 16k L1 dcache. */
      cpu->ccsidr[1] = 0x200fe019; /* 16k L1 icache. */
@@ -XXX,XX +XXX,XX @@ static void cortex_a7_initfn(Object *obj)
      cpu->isar.id_isar2 = 0x21232041;
      cpu->isar.id_isar3 = 0x11112131;
      cpu->isar.id_isar4 = 0x10011142;
 -    cpu->dbgdidr = 0x3515f005;
 +    cpu->isar.dbgdidr = 0x3515f005;
      cpu->clidr = 0x0a200023;
      cpu->ccsidr[0] = 0x701fe00a; /* 32K L1 dcache */
      cpu->ccsidr[1] = 0x201fe00a; /* 32K L1 icache */
@@ -XXX,XX +XXX,XX @@ static void cortex_a15_initfn(Object *obj)
      cpu->isar.id_isar2 = 0x21232041;
      cpu->isar.id_isar3 = 0x11112131;
      cpu->isar.id_isar4 = 0x10011142;
 -    cpu->dbgdidr = 0x3515f021;
 +    cpu->isar.dbgdidr = 0x3515f021;
      cpu->clidr = 0x0a200023;
      cpu->ccsidr[0] = 0x701fe00a; /* 32K L1 dcache */
      cpu->ccsidr[1] = 0x201fe00a; /* 32K L1 icache */
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a57_initfn(Object *obj)
      cpu->isar.id_aa64dfr0 = 0x10305106;
      cpu->isar.id_aa64isar0 = 0x00011120;
      cpu->isar.id_aa64mmfr0 = 0x00001124;
 -    cpu->dbgdidr = 0x3516d000;
 +    cpu->isar.dbgdidr = 0x3516d000;
      cpu->clidr = 0x0a200023;
      cpu->ccsidr[0] = 0x701fe00a; /* 32KB L1 dcache */
      cpu->ccsidr[1] = 0x201fe012; /* 48KB L1 icache */
@@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj)
      cpu->isar.id_aa64dfr0 = 0x10305106;
      cpu->isar.id_aa64isar0 = 0x00011120;
      cpu->isar.id_aa64mmfr0 = 0x00001122; /* 40 bit physical addr */
 -    cpu->dbgdidr = 0x3516d000;
 +    cpu->isar.dbgdidr = 0x3516d000;
      cpu->clidr = 0x0a200023;
      cpu->ccsidr[0] = 0x700fe01a; /* 32KB L1 dcache */
      cpu->ccsidr[1] = 0x201fe00a; /* 32KB L1 icache */
@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
      cpu->isar.id_aa64dfr0 = 0x10305106;
      cpu->isar.id_aa64isar0 = 0x00011120;
      cpu->isar.id_aa64mmfr0 = 0x00001124;
 -    cpu->dbgdidr = 0x3516d000;
 +    cpu->isar.dbgdidr = 0x3516d000;
      cpu->clidr = 0x0a200023;
      cpu->ccsidr[0] = 0x701fe00a; /* 32KB L1 dcache */
      cpu->ccsidr[1] = 0x201fe012; /* 48KB L1 icache */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
      ARMCPRegInfo dbgdidr = {
          .name = "DBGDIDR", .cp = 14, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 0,
          .access = PL0_R, .accessfn = access_tda,
 -        .type = ARM_CP_CONST, .resetvalue = cpu->dbgdidr,
 +        .type = ARM_CP_CONST, .resetvalue = cpu->isar.dbgdidr,
      };
      /* Note that all these register fields hold "number of Xs minus 1". */
 --
 .20.1

-[Qemu-devel] [PULL 21/21] hw/arm/armsse: Make 0x5... alias region work for per-CPU devices
+[PULL 25/52] target/arm: Read debug-related ID registers from KVM
-The region 0x40010000 .. 0x4001ffff and its secure-only alias
+Now we have isar_feature test functions that look at fields in the
-at 0x50010000... are for per-CPU devices. We implement this by
+ID_AA64DFR0_EL1 and ID_DFR0 ID registers, add the code that reads
-giving each CPU its own container memory region, where the
+these register values from KVM so that the checks behave correctly
-per-CPU devices live. Unfortunately, the alias region which
+when we're using KVM.
 makes devices mapped at 0x4... addresses also appear at 0x5...
 is only implemented in the overall "all CPUs" container. The
 effect of this bug is that the CPU_IDENTITY register block appears
 only at 0x4001f000, but not at the 0x5001f000 alias where it should
 also appear. Guests (like very recent Arm Trusted Firmware-M)
 which try to access it at 0x5001f000 will crash.
-Fix this by moving the handling for this alias from the "all CPUs"
+No isar_feature function tests ID_AA64DFR1_EL1 or DBGDIDR yet, but we
-container to the per-CPU container. (We leave the aliases for
+add it to maintain the invariant that every field in the
-0x1... and 0x3... in the overall container, because there are
+ARMISARegisters struct is populated for a KVM CPU and can be relied
-no per-CPU devices there.)
+on.  This requirement isn't actually written down yet, so add a note
 to the relevant comment.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20190215180500.6906-1-peter.maydell@linaro.org
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Message-id: 20200214175116.9164-13-peter.maydell@linaro.org
 ---
- include/hw/arm/armsse.h |  2 +-
+ target/arm/cpu.h   |  5 +++++
- hw/arm/armsse.c         | 26 ++++++++++++++++----------
+ target/arm/kvm32.c |  8 ++++++++
- 2 files changed, 17 insertions(+), 11 deletions(-)
+ target/arm/kvm64.c | 36 ++++++++++++++++++++++++++++++++++++
  3 files changed, 49 insertions(+)
-diff --git a/include/hw/arm/armsse.h b/include/hw/arm/armsse.h
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/armsse.h
+--- a/target/arm/cpu.h
-+++ b/include/hw/arm/armsse.h
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ typedef struct ARMSSE {
+@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
-     MemoryRegion cpu_container[SSE_MAX_CPUS];
+      * prefix means a constant register.
-     MemoryRegion alias1;
+      * Some of these registers are split out into a substructure that
-     MemoryRegion alias2;
+      * is shared with the translators to control the ISA.
--    MemoryRegion alias3;
++     *
-+    MemoryRegion alias3[SSE_MAX_CPUS];
++     * Note that if you add an ID register to the ARMISARegisters struct
-     MemoryRegion sram[MAX_SRAM_BANKS];
++     * you need to also update the 32-bit and 64-bit versions of the
++     * kvm_arm_get_host_cpu_features() function to correctly populate the
-     qemu_irq *exp_irqs[SSE_MAX_CPUS];
++     * field by reading the value from the KVM vCPU.
-diff --git a/hw/arm/armsse.c b/hw/arm/armsse.c
+      */
      struct ARMISARegisters {
          uint32_t id_isar0;
 diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/armsse.c
+--- a/target/arm/kvm32.c
-+++ b/hw/arm/armsse.c
++++ b/target/arm/kvm32.c
-@@ -XXX,XX +XXX,XX @@ static bool irq_is_common[32] = {
+@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
-     /* 30, 31: reserved */
+         ahcf->isar.id_isar6 = 0;
  };
 -/* Create an alias region of @size bytes starting at @base
 +/*
 + * Create an alias region in @container of @size bytes starting at @base
   * which mirrors the memory starting at @orig.
   */
 -static void make_alias(ARMSSE *s, MemoryRegion *mr, const char *name,
 -                       hwaddr base, hwaddr size, hwaddr orig)
 +static void make_alias(ARMSSE *s, MemoryRegion *mr, MemoryRegion *container,
 +                       const char *name, hwaddr base, hwaddr size, hwaddr orig)
  {
 -    memory_region_init_alias(mr, NULL, name, &s->container, orig, size);
 +    memory_region_init_alias(mr, NULL, name, container, orig, size);
      /* The alias is even lower priority than unimplemented_device regions */
 -    memory_region_add_subregion_overlap(&s->container, base, mr, -1500);
 +    memory_region_add_subregion_overlap(container, base, mr, -1500);
  }
  static void irq_status_forwarder(void *opaque, int n, int level)
@@ -XXX,XX +XXX,XX @@ static void armsse_realize(DeviceState *dev, Error **errp)
      }
-     /* Set up the big aliases first */
++    err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_dfr0,
--    make_alias(s, &s->alias1, "alias 1", 0x10000000, 0x10000000, 0x00000000);
++                          ARM_CP15_REG32(0, 0, 1, 2));
--    make_alias(s, &s->alias2, "alias 2", 0x30000000, 0x10000000, 0x20000000);
++
-+    make_alias(s, &s->alias1, &s->container, "alias 1",
+     err |= read_sys_reg32(fdarray[2], &ahcf->isar.mvfr0,
-+               0x10000000, 0x10000000, 0x00000000);
+                           KVM_REG_ARM | KVM_REG_SIZE_U32 |
-+    make_alias(s, &s->alias2, &s->container,
+                           KVM_REG_ARM_VFP | KVM_REG_ARM_VFP_MVFR0);
-+               "alias 2", 0x30000000, 0x10000000, 0x20000000);
+@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
-     /* The 0x50000000..0x5fffffff region is not a pure alias: it has
+      * Fortunately there is not yet anything in there that affects migration.
       * a few extra devices that only appear there (generally the
       * control interfaces for the protection controllers).
       * We implement this by mapping those devices over the top of this
 -     * alias MR at a higher priority.
 +     * alias MR at a higher priority. Some of the devices in this range
 +     * are per-CPU, so we must put this alias in the per-cpu containers.
       */
--    make_alias(s, &s->alias3, "alias 3", 0x50000000, 0x10000000, 0x40000000);
--
++    /*
-+    for (i = 0; i < info->num_cpus; i++) {
++     * There is no way to read DBGDIDR, because currently 32-bit KVM
-+        make_alias(s, &s->alias3[i], &s->cpu_container[i],
++     * doesn't implement debug at all. Leave it at zero.
-+                   "alias 3", 0x50000000, 0x10000000, 0x40000000);
++     */
-+    }
++
+     kvm_arm_destroy_scratch_host_vcpu(fdarray);
-     /* Security controller */
-     object_property_set_bool(OBJECT(&s->secctl), true, "realized", &err);
+     if (err < 0) {
 diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm64.c
 +++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
      } else {
          err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64pfr1,
                                ARM64_SYS_REG(3, 0, 0, 4, 1));
 +        err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64dfr0,
 +                              ARM64_SYS_REG(3, 0, 0, 5, 0));
 +        err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64dfr1,
 +                              ARM64_SYS_REG(3, 0, 0, 5, 1));
          err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar0,
                                ARM64_SYS_REG(3, 0, 0, 6, 0));
          err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar1,
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
           * than skipping the reads and leaving 0, as we must avoid
           * considering the values in every case.
           */
 +        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_dfr0,
 +                              ARM64_SYS_REG(3, 0, 0, 1, 2));
          err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar0,
                                ARM64_SYS_REG(3, 0, 0, 2, 0));
          err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar1,
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
                                ARM64_SYS_REG(3, 0, 0, 3, 1));
          err |= read_sys_reg32(fdarray[2], &ahcf->isar.mvfr2,
                                ARM64_SYS_REG(3, 0, 0, 3, 2));
 +
 +        /*
 +         * DBGDIDR is a bit complicated because the kernel doesn't
 +         * provide an accessor for it in 64-bit mode, which is what this
 +         * scratch VM is in, and there's no architected "64-bit sysreg
 +         * which reads the same as the 32-bit register" the way there is
 +         * for other ID registers. Instead we synthesize a value from the
 +         * AArch64 ID_AA64DFR0, the same way the kernel code in
 +         * arch/arm64/kvm/sys_regs.c:trap_dbgidr() does.
 +         * We only do this if the CPU supports AArch32 at EL1.
 +         */
 +        if (FIELD_EX64(ahcf->isar.id_aa64pfr0, ID_AA64PFR0, EL1) >= 2) {
 +            int wrps = FIELD_EX64(ahcf->isar.id_aa64dfr0, ID_AA64DFR0, WRPS);
 +            int brps = FIELD_EX64(ahcf->isar.id_aa64dfr0, ID_AA64DFR0, BRPS);
 +            int ctx_cmps =
 +                FIELD_EX64(ahcf->isar.id_aa64dfr0, ID_AA64DFR0, CTX_CMPS);
 +            int version = 6; /* ARMv8 debug architecture */
 +            bool has_el3 =
 +                !!FIELD_EX64(ahcf->isar.id_aa64pfr0, ID_AA64PFR0, EL3);
 +            uint32_t dbgdidr = 0;
 +
 +            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, WRPS, wrps);
 +            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, BRPS, brps);
 +            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, CTX_CMPS, ctx_cmps);
 +            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, VERSION, version);
 +            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, NSUHD_IMP, has_el3);
 +            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, SE_IMP, has_el3);
 +            dbgdidr |= (1 << 15); /* RES1 bit */
 +            ahcf->isar.dbgdidr = dbgdidr;
 +        }
      }
      sve_supported = ioctl(fdarray[0], KVM_CHECK_EXTENSION, KVM_CAP_ARM_SVE) > 0;
 --
 .20.1
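
The DBGDIDR synthesis described in the kvm64.c comment above can be sketched
outside QEMU as plain bit manipulation. This is only an illustration: the
names extract_field() and synthesize_dbgdidr() are hypothetical, and the
field positions are written out by hand from the architectural register
layouts rather than taken from QEMU's FIELD() definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: extract `len` bits of `reg` starting at bit `shift`. */
static inline uint64_t extract_field(uint64_t reg, unsigned shift, unsigned len)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    return (reg >> shift) & ((UINT64_C(1) << len) - 1);
}

/*
 * Hypothetical sketch: build an AArch32 DBGDIDR value from the AArch64
 * ID_AA64PFR0 and ID_AA64DFR0 values. Returns 0 when AArch32 is not
 * supported at EL1, mirroring the "leave it at zero" behaviour of the patch.
 */
static uint32_t synthesize_dbgdidr(uint64_t id_aa64pfr0, uint64_t id_aa64dfr0)
{
    /* ID_AA64PFR0.EL1 is bits [7:4]; a value >= 2 means AArch32 at EL1 */
    if (extract_field(id_aa64pfr0, 4, 4) < 2) {
        return 0;
    }

    uint32_t brps = (uint32_t)extract_field(id_aa64dfr0, 12, 4);
    uint32_t wrps = (uint32_t)extract_field(id_aa64dfr0, 20, 4);
    uint32_t ctx_cmps = (uint32_t)extract_field(id_aa64dfr0, 28, 4);
    /* ID_AA64PFR0.EL3 is bits [15:12]; nonzero means EL3 is implemented */
    uint32_t has_el3 = extract_field(id_aa64pfr0, 12, 4) != 0;
    uint32_t version = 6; /* ARMv8 debug architecture */

    return (wrps << 28)        /* DBGDIDR.WRPS, bits [31:28] */
         | (brps << 24)        /* DBGDIDR.BRPS, bits [27:24] */
         | (ctx_cmps << 20)    /* DBGDIDR.CTX_CMPS, bits [23:20] */
         | (version << 16)     /* DBGDIDR.Version, bits [19:16] */
         | (UINT32_C(1) << 15) /* RES1 bit */
         | (has_el3 << 14)     /* DBGDIDR.nSUHD_imp */
         | (has_el3 << 12);    /* DBGDIDR.SE_imp */
}
```

A caller would pass the two 64-bit ID register values read from the scratch
vCPU and store the result wherever the 32-bit DBGDIDR value is kept.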

-New patch
+[PULL 26/52] target/arm: Implement ARMv8.1-PMU extension
+The ARMv8.1-PMU extension requires:
+ * the evtCount field in PMETYPER<n>_EL0 is 16 bits, not 10
+ * MDCR_EL2.HPMD allows event counting to be disabled at EL2
+ * two new required events, STALL_FRONTEND and STALL_BACKEND
+ * ID register bits in ID_AA64DFR0_EL1 and ID_DFR0
+We already implement the 16-bit evtCount field and the
+HPMD bit, so all that is missing is the two new events:
+  STALL_FRONTEND
+   "counts every cycle counted by the CPU_CYCLES event on which no
+    operation was issued because there are no operations available
+    to issue to this PE from the frontend"
+  STALL_BACKEND
+   "counts every cycle counted by the CPU_CYCLES event on which no
+    operation was issued because the backend is unable to accept
+    any available operations from the frontend"
+QEMU never stalls in this sense, so our implementation is trivial:
+always return a zero count.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200214175116.9164-14-peter.maydell@linaro.org
+---
+ target/arm/helper.c | 32 ++++++++++++++++++++++++++++++--
+file changed, 30 insertions(+), 2 deletions(-)
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static int64_t instructions_ns_per(uint64_t icount)
+ }
+ #endif
++static bool pmu_8_1_events_supported(CPUARMState *env)
++{
++    /* For events which are supported in any v8.1 PMU */
++    return cpu_isar_feature(any_pmu_8_1, env_archcpu(env));
++}
++
++static uint64_t zero_event_get_count(CPUARMState *env)
++{
++    /* For events which on QEMU never fire, so their count is always zero */
++    return 0;
++}
++
++static int64_t zero_event_ns_per(uint64_t cycles)
++{
++    /* An event which never fires can never overflow */
++    return -1;
++}
++
+ static const pm_event pm_events[] = {
+     { .number = 0x000, /* SW_INCR */
+       .supported = event_always_supported,
+@@ -XXX,XX +XXX,XX @@ static const pm_event pm_events[] = {
+       .supported = event_always_supported,
+       .get_count = cycles_get_count,
+       .ns_per_count = cycles_ns_per,
+-    }
++    },
+ #endif
++    { .number = 0x023, /* STALL_FRONTEND */
++      .supported = pmu_8_1_events_supported,
++      .get_count = zero_event_get_count,
++      .ns_per_count = zero_event_ns_per,
++    },
++    { .number = 0x024, /* STALL_BACKEND */
++      .supported = pmu_8_1_events_supported,
++      .get_count = zero_event_get_count,
++      .ns_per_count = zero_event_ns_per,
++    },
+ };
+ /*
+@@ -XXX,XX +XXX,XX @@ static const pm_event pm_events[] = {
+  * should first be updated to something sparse instead of the current
+  * supported_event_map[] array.
+  */
+-#define MAX_EVENT_ID 0x11
++#define MAX_EVENT_ID 0x24
+ #define UNSUPPORTED_EVENT UINT16_MAX
+ static uint16_t supported_event_map[MAX_EVENT_ID + 1];
+--
+.20.1
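
As a rough illustration of why MAX_EVENT_ID has to grow with the highest
event number, here is a simplified, standalone model of the event table and
its lookup map. Everything prefixed demo_ is hypothetical; the real pm_event
callbacks take a CPUARMState pointer, and the ">= 4" PMUVER threshold for
v8.1-PMU is an assumption based on the architectural encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_EVENT_ID 0x24
#define DEMO_UNSUPPORTED_EVENT UINT16_MAX

/* Simplified stand-in for QEMU's pm_event; callbacks take no CPU state. */
typedef struct {
    uint16_t number;
    bool (*supported)(unsigned pmuver);
    uint64_t (*get_count)(void);
    int64_t (*ns_per_count)(uint64_t count);
} demo_pm_event;

/* Assumption: PMUVER >= 4 means v8.1-PMU; 0xf is a non-standard IMPDEF PMU */
static bool demo_pmu_8_1_supported(unsigned pmuver)
{
    return pmuver >= 4 && pmuver != 0xf;
}

/* An event which never fires always reads as zero... */
static uint64_t demo_zero_get_count(void)
{
    return 0;
}

/* ...and can never overflow, signalled here by -1. */
static int64_t demo_zero_ns_per(uint64_t count)
{
    (void)count;
    return -1;
}

static const demo_pm_event demo_events[] = {
    { 0x023, demo_pmu_8_1_supported, demo_zero_get_count, demo_zero_ns_per },
    { 0x024, demo_pmu_8_1_supported, demo_zero_get_count, demo_zero_ns_per },
};

/* Dense map from event number to index in demo_events[] */
static uint16_t demo_event_map[DEMO_MAX_EVENT_ID + 1];

static void demo_init_event_map(unsigned pmuver)
{
    size_t i;

    for (i = 0; i <= DEMO_MAX_EVENT_ID; i++) {
        demo_event_map[i] = DEMO_UNSUPPORTED_EVENT;
    }
    for (i = 0; i < sizeof(demo_events) / sizeof(demo_events[0]); i++) {
        /* The map is dense, so every event number must fit inside it */
        assert(demo_events[i].number <= DEMO_MAX_EVENT_ID);
        if (demo_events[i].supported(pmuver)) {
            demo_event_map[demo_events[i].number] = (uint16_t)i;
        }
    }
}

static const demo_pm_event *demo_lookup_event(uint16_t number)
{
    if (number > DEMO_MAX_EVENT_ID ||
        demo_event_map[number] == DEMO_UNSUPPORTED_EVENT) {
        return NULL;
    }
    return &demo_events[demo_event_map[number]];
}
```

Because the map is indexed directly by event number, adding an event with a
larger number than any existing one means enlarging the array, which is what
the MAX_EVENT_ID bump in this patch does.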

-[Qemu-devel] [PULL 19/21] hw/arm/musca: Wire up PL031 RTC
+[PULL 27/52] target/arm: Implement ARMv8.4-PMU extension
-Wire up the PL031 RTC for the Musca board.
+The ARMv8.4-PMU extension adds:
  * one new required event, STALL
  * one new system register PMMIR_EL1
+(There are also some more L1-cache related events, but since
+we don't implement any cache we don't provide these, in the
+same way we don't provide the base-PMUv3 cache events.)
+The STALL event "counts every attributable cycle on which no
+attributable instruction or operation was sent for execution on this
+PE".  QEMU doesn't stall in this sense, so this is another
+always-reads-zero event.
+The PMMIR_EL1 register is a read-only register providing
+implementation-specific information about the PMU; currently it has
+only one field, SLOTS, which defines behaviour of the STALL_SLOT PMU
+event.  Since QEMU doesn't implement the STALL_SLOT event, we can
+validly make the register read zero.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214175116.9164-15-peter.maydell@linaro.org
 ---
- hw/arm/musca.c | 26 +++++++++++++++++++++++---
+ target/arm/cpu.h    | 18 ++++++++++++++++++
-file changed, 23 insertions(+), 3 deletions(-)
+ target/arm/helper.c | 22 +++++++++++++++++++++-
 files changed, 39 insertions(+), 1 deletion(-)
-diff --git a/hw/arm/musca.c b/hw/arm/musca.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/musca.c
+--- a/target/arm/cpu.h
-+++ b/hw/arm/musca.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_pmu_8_1(const ARMISARegisters *id)
- #include "hw/misc/tz-mpc.h"
+         FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) != 0xf;
- #include "hw/misc/tz-ppc.h"
+ }
- #include "hw/misc/unimp.h"
-+#include "hw/timer/pl031.h"
++static inline bool isar_feature_aa32_pmu_8_4(const ARMISARegisters *id)
  #define MUSCA_NUMIRQ_MAX 96
  #define MUSCA_PPC_MAX 3
@@ -XXX,XX +XXX,XX @@ typedef struct {
      UnimplementedDeviceState spi;
      UnimplementedDeviceState scc;
      UnimplementedDeviceState timer;
 -    UnimplementedDeviceState rtc;
 +    PL031State rtc;
      UnimplementedDeviceState pvt;
      UnimplementedDeviceState sdio;
      UnimplementedDeviceState gpio;
@@ -XXX,XX +XXX,XX @@ typedef struct {
   */
  #define SYSCLK_FRQ 40000000
 +static qemu_irq get_sse_irq_in(MuscaMachineState *mms, int irqno)
 +{
-+    /* Return a qemu_irq which will signal IRQ n to all CPUs in the SSE. */
++    /* 0xf means "non-standard IMPDEF PMU" */
-+    assert(irqno < MUSCA_NUMIRQ_MAX);
++    return FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) >= 5 &&
-+
++        FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) != 0xf;
 +    return qdev_get_gpio_in(DEVICE(&mms->cpu_irq_splitter[irqno]), 0);
 +}
 +
  /*
-  * Most of the devices in the Musca board sit behind Peripheral Protection
+  * 64-bit feature tests via id registers.
-  * Controllers. These data structures define the layout of which devices
+  */
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_mpc(MuscaMachineState *mms, void *opaque,
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_pmu_8_1(const ARMISARegisters *id)
-     return sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 0);
+         FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) != 0xf;
  }
-+static MemoryRegion *make_rtc(MuscaMachineState *mms, void *opaque,
++static inline bool isar_feature_aa64_pmu_8_4(const ARMISARegisters *id)
 +                              const char *name, hwaddr size)
 +{
-+    PL031State *rtc = opaque;
++    return FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) >= 5 &&
-+
++        FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) != 0xf;
 +    sysbus_init_child_obj(OBJECT(mms), name, rtc, sizeof(mms->rtc), TYPE_PL031);
 +    object_property_set_bool(OBJECT(rtc), true, "realized", &error_fatal);
 +    sysbus_connect_irq(SYS_BUS_DEVICE(rtc), 0, get_sse_irq_in(mms, 39));
 +    return sysbus_mmio_get_region(SYS_BUS_DEVICE(rtc), 0);
 +}
 +
- static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
+ /*
-                                        const char *name, hwaddr size)
+  * Feature tests for "does this exist in either 32-bit or 64-bit?"
   */
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_any_pmu_8_1(const ARMISARegisters *id)
      return isar_feature_aa64_pmu_8_1(id) || isar_feature_aa32_pmu_8_1(id);
  }
 +static inline bool isar_feature_any_pmu_8_4(const ARMISARegisters *id)
 +{
 +    return isar_feature_aa64_pmu_8_4(id) || isar_feature_aa32_pmu_8_4(id);
 +}
 +
  /*
   * Forward to the above feature tests given an ARMCPU pointer.
   */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool pmu_8_1_events_supported(CPUARMState *env)
      return cpu_isar_feature(any_pmu_8_1, env_archcpu(env));
  }
 +static bool pmu_8_4_events_supported(CPUARMState *env)
 +{
 +    /* For events which are supported in any v8.4 PMU */
 +    return cpu_isar_feature(any_pmu_8_4, env_archcpu(env));
 +}
 +
  static uint64_t zero_event_get_count(CPUARMState *env)
  {
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
+     /* For events which on QEMU never fire, so their count is always zero */
-         { "i2c1", make_unimp_dev, &mms->i2c[1], 0x5000, 0x1000 },
+@@ -XXX,XX +XXX,XX @@ static const pm_event pm_events[] = {
-         { "i2s", make_unimp_dev, &mms->i2s, 0x6000, 0x1000 },
+       .get_count = zero_event_get_count,
-         { "pwm0", make_unimp_dev, &mms->pwm[0], 0x7000, 0x1000 },
+       .ns_per_count = zero_event_ns_per,
--        { "rtc", make_unimp_dev, &mms->rtc, 0x8000, 0x1000 },
+     },
-+        { "rtc", make_rtc, &mms->rtc, 0x8000, 0x1000 },
++    { .number = 0x03c, /* STALL */
-         { "qspi", make_unimp_dev, &mms->qspi, 0xa000, 0x1000 },
++      .supported = pmu_8_4_events_supported,
-         { "timer", make_unimp_dev, &mms->timer, 0xb000, 0x1000 },
++      .get_count = zero_event_get_count,
-         { "scc", make_unimp_dev, &mms->scc, 0xc000, 0x1000 },
++      .ns_per_count = zero_event_ns_per,
-@@ -XXX,XX +XXX,XX @@ static void musca_init(MachineState *machine)
++    },
-                 { "spi", make_unimp_dev, &mms->spi, 0x4010a000, 0x1000 },
+ };
-                 { "scc", make_unimp_dev, &mms->scc, 0x5010b000, 0x1000 },
-                 { "timer", make_unimp_dev, &mms->timer, 0x4010c000, 0x1000 },
+ /*
--                { "rtc", make_unimp_dev, &mms->rtc, 0x4010d000, 0x1000 },
+@@ -XXX,XX +XXX,XX @@ static const pm_event pm_events[] = {
-+                { "rtc", make_rtc, &mms->rtc, 0x4010d000, 0x1000 },
+  * should first be updated to something sparse instead of the current
-                 { "pvt", make_unimp_dev, &mms->pvt, 0x4010e000, 0x1000 },
+  * supported_event_map[] array.
-                 { "sdio", make_unimp_dev, &mms->sdio, 0x4010f000, 0x1000 },
+  */
-             },
+-#define MAX_EVENT_ID 0x24
 +#define MAX_EVENT_ID 0x3c
  #define UNSUPPORTED_EVENT UINT16_MAX
  static uint16_t supported_event_map[MAX_EVENT_ID + 1];
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
          };
          define_arm_cp_regs(cpu, v81_pmu_regs);
      }
 +    if (cpu_isar_feature(any_pmu_8_4, cpu)) {
 +        static const ARMCPRegInfo v84_pmmir = {
 +            .name = "PMMIR_EL1", .state = ARM_CP_STATE_BOTH,
 +            .opc0 = 3, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 6,
 +            .access = PL1_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
 +            .resetvalue = 0
 +        };
 +        define_one_arm_cp_reg(cpu, &v84_pmmir);
 +    }
  }
  /* We don't know until after realize whether there's a GICv3
 --
 .20.1
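
The ID register tests added in this patch reduce to "version field at least
N, and not the IMPDEF value 0xf". A standalone sketch of that check is shown
below. The demo_ names are hypothetical, the field positions (PMUVER at bits
[11:8] of ID_AA64DFR0, PERFMON at bits [27:24] of ID_DFR0) are written out by
hand, and the threshold of 4 for v8.1-PMU is an assumption since only the
v8.4 threshold of 5 appears in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a 4-bit PMU version field is >= min and is not IMPDEF (0xf). */
static bool demo_pmu_version_at_least(unsigned field, unsigned min)
{
    assert(field <= 0xf);
    /* 0xf means "non-standard IMPDEF PMU", so it must not compare as newer */
    return field >= min && field != 0xf;
}

/* ID_AA64DFR0.PMUVER lives in bits [11:8] of the 64-bit register. */
static unsigned demo_aa64_pmuver(uint64_t id_aa64dfr0)
{
    return (unsigned)((id_aa64dfr0 >> 8) & 0xf);
}

/* ID_DFR0.PERFMON lives in bits [27:24] of the 32-bit register. */
static unsigned demo_aa32_perfmon(uint32_t id_dfr0)
{
    return (unsigned)((id_dfr0 >> 24) & 0xf);
}

/* "Does this exist in either 32-bit or 64-bit?" for v8.1-PMU (assumed >= 4) */
static bool demo_any_pmu_8_1(uint64_t id_aa64dfr0, uint32_t id_dfr0)
{
    return demo_pmu_version_at_least(demo_aa64_pmuver(id_aa64dfr0), 4) ||
           demo_pmu_version_at_least(demo_aa32_perfmon(id_dfr0), 4);
}

/* The same test for v8.4-PMU, which uses version value 5 */
static bool demo_any_pmu_8_4(uint64_t id_aa64dfr0, uint32_t id_dfr0)
{
    return demo_pmu_version_at_least(demo_aa64_pmuver(id_aa64dfr0), 5) ||
           demo_pmu_version_at_least(demo_aa32_perfmon(id_dfr0), 5);
}
```

Note that the 0xf exclusion has to be applied per register: an IMPDEF PMU
reported in one register does not satisfy the test, but a standard version
in the other register still can.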

-New patch
+[PULL 28/52] target/arm: Provide ARMv8.4-PMU in '-cpu max'
+Set the ID register bits to provide ARMv8.4-PMU (and implicitly
+also ARMv8.1-PMU) in the 'max' CPU.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200214175116.9164-16-peter.maydell@linaro.org
+---
+ target/arm/cpu64.c | 8 ++++++++
+file changed, 8 insertions(+)
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu64.c
++++ b/target/arm/cpu64.c
+@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
+         u = FIELD_DP32(u, ID_MMFR3, PAN, 2); /* ATS1E1 */
+         cpu->id_mmfr3 = u;
++        u = cpu->isar.id_aa64dfr0;
++        u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
++        cpu->isar.id_aa64dfr0 = u;
++
++        u = cpu->isar.id_dfr0;
++        u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
++        cpu->isar.id_dfr0 = u;
++
+         /*
+          * FIXME: We do not yet support ARMv8.2-fp16 for AArch32,
+          * so do not set MVFR1.FPHP.  Strictly speaking this is not legal,
+--
+.20.1

-New patch
+[PULL 29/52] target/arm: Correct definition of PMCRDP
+The PMCR_EL0.DP bit is bit 5, which is 0x20, not 0x10.  0x10 is 'X'.
+Correct our #define of PMCRDP and add the missing PMCRX.
+We do have the correct behaviour for handling the DP bit being
+set, so this fixes a guest-visible bug.
+Fixes: 033614c47de
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20200214175116.9164-17-peter.maydell@linaro.org
+---
+ target/arm/helper.c | 3 ++-
+file changed, 2 insertions(+), 1 deletion(-)
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
+ #define PMCRN_MASK  0xf800
+ #define PMCRN_SHIFT 11
+ #define PMCRLC  0x40
+-#define PMCRDP  0x10
++#define PMCRDP  0x20
++#define PMCRX   0x10
+ #define PMCRD   0x8
+ #define PMCRC   0x4
+ #define PMCRP   0x2
+--
+.20.1

-[Qemu-devel] [PULL 01/21] hw/arm/armsse: Fix memory leak in error-exit path
+[PULL 30/52] target/arm: Correct handling of PMCR_EL0.LC bit
-Coverity points out (CID 1398632, CID 1398650) that we
+The LC bit in the PMCR_EL0 register is supposed to be:
-leak a couple of allocated strings in the error-exit
+ * read/write
-code path for setting up the MHUs in the ARMSSE.
+ * RES1 on an AArch64-only implementation
-Fix this bug by moving the allocate-and-free of each
+ * an architecturally UNKNOWN value on reset
-string to be closer to the use, so we do the free before
+(and use of LC==0 by software is deprecated).
 doing the error-exit check.
-Fixes: f8574705f62b38a ("hw/arm/armsse: Add unimplemented-device stubs for MHUs")
+We were implementing it incorrectly as read-only always zero,
 though we do have all the code needed to test it and behave
 accordingly.
 Instead make it a read-write bit which resets to 1 always, which
 satisfies all the architectural requirements above.
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Message-id: 20200214175116.9164-18-peter.maydell@linaro.org
 Message-id: 20190215113707.24553-1-peter.maydell@linaro.org
 ---
- hw/arm/armsse.c | 10 ++++++----
+ target/arm/helper.c | 13 +++++++++----
-file changed, 6 insertions(+), 4 deletions(-)
+file changed, 9 insertions(+), 4 deletions(-)
-diff --git a/hw/arm/armsse.c b/hw/arm/armsse.c
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/armsse.c
+--- a/target/arm/helper.c
-+++ b/hw/arm/armsse.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static void armsse_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
+ #define PMCRC   0x4
-     if (info->has_mhus) {
+ #define PMCRP   0x2
-         for (i = 0; i < ARRAY_SIZE(s->mhu); i++) {
+ #define PMCRE   0x1
--            char *name = g_strdup_printf("MHU%d", i);
++/*
--            char *port = g_strdup_printf("port[%d]", i + 3);
++ * Mask of PMCR bits writeable by guest (not including WO bits like C, P,
-+            char *name;
++ * which can be written as 1 to trigger behaviour but which stay RAZ).
-+            char *port;
++ */
++#define PMCR_WRITEABLE_MASK (PMCRLC | PMCRDP | PMCRX | PMCRD | PMCRE)
-+            name = g_strdup_printf("MHU%d", i);
-             qdev_prop_set_string(DEVICE(&s->mhu[i]), "name", name);
+ #define PMXEVTYPER_P          0x80000000
-             qdev_prop_set_uint64(DEVICE(&s->mhu[i]), "size", 0x1000);
+ #define PMXEVTYPER_U          0x40000000
-             object_property_set_bool(OBJECT(&s->mhu[i]), true,
+@@ -XXX,XX +XXX,XX @@ static void pmcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
                                       "realized", &err);
 +            g_free(name);
              if (err) {
                  error_propagate(errp, err);
                  return;
              }
 +            port = g_strdup_printf("port[%d]", i + 3);
              mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->mhu[i]), 0);
              object_property_set_link(OBJECT(&s->apb_ppc0), OBJECT(mr),
                                       port, &err);
 +            g_free(port);
              if (err) {
                  error_propagate(errp, err);
                  return;
              }
 -            g_free(name);
 -            g_free(port);
          }
      }
+-    /* only the DP, X, D and E bits are writable */
+-    env->cp15.c9_pmcr &= ~0x39;
+-    env->cp15.c9_pmcr |= (value & 0x39);
++    env->cp15.c9_pmcr &= ~PMCR_WRITEABLE_MASK;
++    env->cp15.c9_pmcr |= (value & PMCR_WRITEABLE_MASK);
+     pmu_op_finish(env);
+ }
+@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
+         .access = PL0_RW, .accessfn = pmreg_access,
+         .type = ARM_CP_IO,
+         .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcr),
+-        .resetvalue = (cpu->midr & 0xff000000) | (pmcrn << PMCRN_SHIFT),
++        .resetvalue = (cpu->midr & 0xff000000) | (pmcrn << PMCRN_SHIFT) |
++                      PMCRLC,
+         .writefn = pmcr_write, .raw_writefn = raw_write,
+     };
+     define_one_arm_cp_reg(cpu, &pmcr);
 --
 .20.1
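
The effect of the writeable mask and the new reset value can be checked with
a small standalone model of the register. The DEMO_ and demo_ names are
hypothetical; the bit values are copied from the #defines in the patch, and
the sketch leaves out the pmu_op_start()/pmu_op_finish() calls and the
write-only C and P handling that the real pmcr_write() has.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PMCRN_SHIFT 11
#define DEMO_PMCRLC 0x40u
#define DEMO_PMCRDP 0x20u
#define DEMO_PMCRX  0x10u
#define DEMO_PMCRD  0x8u
#define DEMO_PMCRC  0x4u
#define DEMO_PMCRP  0x2u
#define DEMO_PMCRE  0x1u

/* Guest-writeable bits; C and P are write-only triggers which stay RAZ */
#define DEMO_PMCR_WRITEABLE_MASK \
    (DEMO_PMCRLC | DEMO_PMCRDP | DEMO_PMCRX | DEMO_PMCRD | DEMO_PMCRE)

/* Reset value: implementer byte from MIDR, counter count N, and LC set */
static uint32_t demo_pmcr_reset(uint32_t midr, unsigned pmcrn)
{
    assert(pmcrn < 32); /* PMCR.N is a 5-bit field */
    return (midr & 0xff000000u) | ((uint32_t)pmcrn << DEMO_PMCRN_SHIFT) |
           DEMO_PMCRLC;
}

/* Apply a guest write: only the bits in the writeable mask may change */
static uint32_t demo_pmcr_write(uint32_t old, uint32_t value)
{
    return (old & ~DEMO_PMCR_WRITEABLE_MASK) |
           (value & DEMO_PMCR_WRITEABLE_MASK);
}
```

With the previous hard-coded mask of 0x39, bit 6 (LC) was not part of the
writeable set, so a guest could never change it; with the mask above it
resets to 1 and can be cleared or set by a write.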

-[Qemu-devel] [PULL 14/21] hw/arm/armsse: Document SRAM_ADDR_WIDTH property in header comment
+[PULL 31/52] target/arm: Test correct register in aa32_pan and aa32_ats1e1 checks
-In commit 4b635cf7a95e501211 we added a QOM property to the ARMSSE
+The isar_feature_aa32_pan and isar_feature_aa32_ats1e1 functions
-object, but forgot to add it to the documentation comment in the
+are supposed to be testing fields in ID_MMFR3; but a cut-and-paste
-header. Correct the omission.
+error meant we were looking at MVFR0 instead.
-Fixes: 4b635cf7a95e501211 ("hw/arm/armsse: Make SRAM bank size configurable")
+Fix the functions to look at the right register; this requires
 us to move at least id_mmfr3 to the ARMISARegisters struct; we
 choose to move all the ID_MMFRn registers for consistency.
 Fixes: 3d6ad6bb466f
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214175116.9164-19-peter.maydell@linaro.org
 ---
- include/hw/arm/armsse.h | 2 ++
+ target/arm/cpu.h      |  14 +++---
-file changed, 2 insertions(+)
+ hw/intc/armv7m_nvic.c |   8 ++--
  target/arm/cpu.c      | 104 +++++++++++++++++++++---------------------
  target/arm/cpu64.c    |  28 ++++++------
  target/arm/helper.c   |  12 ++---
  target/arm/kvm32.c    |  17 +++++++
  target/arm/kvm64.c    |  10 ++++
 files changed, 110 insertions(+), 83 deletions(-)
-diff --git a/include/hw/arm/armsse.h b/include/hw/arm/armsse.h
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/armsse.h
+--- a/target/arm/cpu.h
-+++ b/include/hw/arm/armsse.h
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
-  *    being the same for both, to avoid having to have separate Property
+         uint32_t id_isar4;
-  *    lists for different variants. This restriction can be relaxed later
+         uint32_t id_isar5;
-  *    if necessary.)
+         uint32_t id_isar6;
-+ *  + QOM property "SRAM_ADDR_WIDTH" sets the number of bits used for the
++        uint32_t id_mmfr0;
-+ *    address of each SRAM bank (and thus the total amount of internal SRAM)
++        uint32_t id_mmfr1;
-  *  + Named GPIO inputs "EXP_IRQ" 0..n are the expansion interrupts for CPU 0,
++        uint32_t id_mmfr2;
-  *    which are wired to its NVIC lines 32 .. n+32
++        uint32_t id_mmfr3;
-  *  + Named GPIO inputs "EXP_CPU1_IRQ" 0..n are the expansion interrupts for
++        uint32_t id_mmfr4;
          uint32_t mvfr0;
          uint32_t mvfr1;
          uint32_t mvfr2;
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
      uint64_t pmceid0;
      uint64_t pmceid1;
      uint32_t id_afr0;
 -    uint32_t id_mmfr0;
 -    uint32_t id_mmfr1;
 -    uint32_t id_mmfr2;
 -    uint32_t id_mmfr3;
 -    uint32_t id_mmfr4;
      uint64_t id_aa64afr0;
      uint64_t id_aa64afr1;
      uint32_t clidr;
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_vminmaxnm(const ARMISARegisters *id)
  static inline bool isar_feature_aa32_pan(const ARMISARegisters *id)
  {
 -    return FIELD_EX64(id->mvfr0, ID_MMFR3, PAN) != 0;
 +    return FIELD_EX32(id->id_mmfr3, ID_MMFR3, PAN) != 0;
  }
  static inline bool isar_feature_aa32_ats1e1(const ARMISARegisters *id)
  {
 -    return FIELD_EX64(id->mvfr0, ID_MMFR3, PAN) >= 2;
 +    return FIELD_EX32(id->id_mmfr3, ID_MMFR3, PAN) >= 2;
  }
  static inline bool isar_feature_aa32_pmu_8_1(const ARMISARegisters *id)
 diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/armv7m_nvic.c
 +++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
      case 0xd4c: /* AFR0.  */
          return cpu->id_afr0;
      case 0xd50: /* MMFR0.  */
 -        return cpu->id_mmfr0;
 +        return cpu->isar.id_mmfr0;
      case 0xd54: /* MMFR1.  */
 -        return cpu->id_mmfr1;
 +        return cpu->isar.id_mmfr1;
      case 0xd58: /* MMFR2.  */
 -        return cpu->id_mmfr2;
 +        return cpu->isar.id_mmfr2;
      case 0xd5c: /* MMFR3.  */
 -        return cpu->id_mmfr3;
 +        return cpu->isar.id_mmfr3;
      case 0xd60: /* ISAR0.  */
          return cpu->isar.id_isar0;
      case 0xd64: /* ISAR1.  */
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm1136_r2_initfn(Object *obj)
      cpu->id_pfr1 = 0x1;
      cpu->isar.id_dfr0 = 0x2;
      cpu->id_afr0 = 0x3;
 -    cpu->id_mmfr0 = 0x01130003;
 -    cpu->id_mmfr1 = 0x10030302;
 -    cpu->id_mmfr2 = 0x01222110;
 +    cpu->isar.id_mmfr0 = 0x01130003;
 +    cpu->isar.id_mmfr1 = 0x10030302;
 +    cpu->isar.id_mmfr2 = 0x01222110;
      cpu->isar.id_isar0 = 0x00140011;
      cpu->isar.id_isar1 = 0x12002111;
      cpu->isar.id_isar2 = 0x11231111;
@@ -XXX,XX +XXX,XX @@ static void arm1136_initfn(Object *obj)
      cpu->id_pfr1 = 0x1;
      cpu->isar.id_dfr0 = 0x2;
      cpu->id_afr0 = 0x3;
 -    cpu->id_mmfr0 = 0x01130003;
 -    cpu->id_mmfr1 = 0x10030302;
 -    cpu->id_mmfr2 = 0x01222110;
 +    cpu->isar.id_mmfr0 = 0x01130003;
 +    cpu->isar.id_mmfr1 = 0x10030302;
 +    cpu->isar.id_mmfr2 = 0x01222110;
      cpu->isar.id_isar0 = 0x00140011;
      cpu->isar.id_isar1 = 0x12002111;
      cpu->isar.id_isar2 = 0x11231111;
@@ -XXX,XX +XXX,XX @@ static void arm1176_initfn(Object *obj)
      cpu->id_pfr1 = 0x11;
      cpu->isar.id_dfr0 = 0x33;
      cpu->id_afr0 = 0;
 -    cpu->id_mmfr0 = 0x01130003;
 -    cpu->id_mmfr1 = 0x10030302;
 -    cpu->id_mmfr2 = 0x01222100;
 +    cpu->isar.id_mmfr0 = 0x01130003;
 +    cpu->isar.id_mmfr1 = 0x10030302;
 +    cpu->isar.id_mmfr2 = 0x01222100;
      cpu->isar.id_isar0 = 0x0140011;
      cpu->isar.id_isar1 = 0x12002111;
      cpu->isar.id_isar2 = 0x11231121;
@@ -XXX,XX +XXX,XX @@ static void arm11mpcore_initfn(Object *obj)
      cpu->id_pfr1 = 0x1;
      cpu->isar.id_dfr0 = 0;
      cpu->id_afr0 = 0x2;
 -    cpu->id_mmfr0 = 0x01100103;
 -    cpu->id_mmfr1 = 0x10020302;
 -    cpu->id_mmfr2 = 0x01222000;
 +    cpu->isar.id_mmfr0 = 0x01100103;
 +    cpu->isar.id_mmfr1 = 0x10020302;
 +    cpu->isar.id_mmfr2 = 0x01222000;
      cpu->isar.id_isar0 = 0x00100011;
      cpu->isar.id_isar1 = 0x12002111;
      cpu->isar.id_isar2 = 0x11221011;
@@ -XXX,XX +XXX,XX @@ static void cortex_m3_initfn(Object *obj)
      cpu->id_pfr1 = 0x00000200;
      cpu->isar.id_dfr0 = 0x00100000;
      cpu->id_afr0 = 0x00000000;
 -    cpu->id_mmfr0 = 0x00000030;
 -    cpu->id_mmfr1 = 0x00000000;
 -    cpu->id_mmfr2 = 0x00000000;
 -    cpu->id_mmfr3 = 0x00000000;
 +    cpu->isar.id_mmfr0 = 0x00000030;
 +    cpu->isar.id_mmfr1 = 0x00000000;
 +    cpu->isar.id_mmfr2 = 0x00000000;
 +    cpu->isar.id_mmfr3 = 0x00000000;
      cpu->isar.id_isar0 = 0x01141110;
      cpu->isar.id_isar1 = 0x02111000;
      cpu->isar.id_isar2 = 0x21112231;
@@ -XXX,XX +XXX,XX @@ static void cortex_m4_initfn(Object *obj)
      cpu->id_pfr1 = 0x00000200;
      cpu->isar.id_dfr0 = 0x00100000;
      cpu->id_afr0 = 0x00000000;
 -    cpu->id_mmfr0 = 0x00000030;
 -    cpu->id_mmfr1 = 0x00000000;
 -    cpu->id_mmfr2 = 0x00000000;
 -    cpu->id_mmfr3 = 0x00000000;
 +    cpu->isar.id_mmfr0 = 0x00000030;
 +    cpu->isar.id_mmfr1 = 0x00000000;
 +    cpu->isar.id_mmfr2 = 0x00000000;
 +    cpu->isar.id_mmfr3 = 0x00000000;
      cpu->isar.id_isar0 = 0x01141110;
      cpu->isar.id_isar1 = 0x02111000;
      cpu->isar.id_isar2 = 0x21112231;
@@ -XXX,XX +XXX,XX @@ static void cortex_m7_initfn(Object *obj)
      cpu->id_pfr1 = 0x00000200;
      cpu->isar.id_dfr0 = 0x00100000;
      cpu->id_afr0 = 0x00000000;
 -    cpu->id_mmfr0 = 0x00100030;
 -    cpu->id_mmfr1 = 0x00000000;
 -    cpu->id_mmfr2 = 0x01000000;
 -    cpu->id_mmfr3 = 0x00000000;
 +    cpu->isar.id_mmfr0 = 0x00100030;
 +    cpu->isar.id_mmfr1 = 0x00000000;
 +    cpu->isar.id_mmfr2 = 0x01000000;
 +    cpu->isar.id_mmfr3 = 0x00000000;
      cpu->isar.id_isar0 = 0x01101110;
      cpu->isar.id_isar1 = 0x02112000;
      cpu->isar.id_isar2 = 0x20232231;
@@ -XXX,XX +XXX,XX @@ static void cortex_m33_initfn(Object *obj)
      cpu->id_pfr1 = 0x00000210;
      cpu->isar.id_dfr0 = 0x00200000;
      cpu->id_afr0 = 0x00000000;
 -    cpu->id_mmfr0 = 0x00101F40;
 -    cpu->id_mmfr1 = 0x00000000;
 -    cpu->id_mmfr2 = 0x01000000;
 -    cpu->id_mmfr3 = 0x00000000;
 +    cpu->isar.id_mmfr0 = 0x00101F40;
 +    cpu->isar.id_mmfr1 = 0x00000000;
 +    cpu->isar.id_mmfr2 = 0x01000000;
 +    cpu->isar.id_mmfr3 = 0x00000000;
      cpu->isar.id_isar0 = 0x01101110;
      cpu->isar.id_isar1 = 0x02212000;
      cpu->isar.id_isar2 = 0x20232232;
@@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj)
      cpu->id_pfr1 = 0x001;
      cpu->isar.id_dfr0 = 0x010400;
      cpu->id_afr0 = 0x0;
 -    cpu->id_mmfr0 = 0x0210030;
 -    cpu->id_mmfr1 = 0x00000000;
 -    cpu->id_mmfr2 = 0x01200000;
 -    cpu->id_mmfr3 = 0x0211;
 +    cpu->isar.id_mmfr0 = 0x0210030;
 +    cpu->isar.id_mmfr1 = 0x00000000;
 +    cpu->isar.id_mmfr2 = 0x01200000;
 +    cpu->isar.id_mmfr3 = 0x0211;
      cpu->isar.id_isar0 = 0x02101111;
      cpu->isar.id_isar1 = 0x13112111;
      cpu->isar.id_isar2 = 0x21232141;
@@ -XXX,XX +XXX,XX @@ static void cortex_a8_initfn(Object *obj)
      cpu->id_pfr1 = 0x11;
      cpu->isar.id_dfr0 = 0x400;
      cpu->id_afr0 = 0;
 -    cpu->id_mmfr0 = 0x31100003;
 -    cpu->id_mmfr1 = 0x20000000;
 -    cpu->id_mmfr2 = 0x01202000;
 -    cpu->id_mmfr3 = 0x11;
 +    cpu->isar.id_mmfr0 = 0x31100003;
 +    cpu->isar.id_mmfr1 = 0x20000000;
 +    cpu->isar.id_mmfr2 = 0x01202000;
 +    cpu->isar.id_mmfr3 = 0x11;
      cpu->isar.id_isar0 = 0x00101111;
      cpu->isar.id_isar1 = 0x12112111;
      cpu->isar.id_isar2 = 0x21232031;
@@ -XXX,XX +XXX,XX @@ static void cortex_a9_initfn(Object *obj)
      cpu->id_pfr1 = 0x11;
      cpu->isar.id_dfr0 = 0x000;
      cpu->id_afr0 = 0;
 -    cpu->id_mmfr0 = 0x00100103;
 -    cpu->id_mmfr1 = 0x20000000;
 -    cpu->id_mmfr2 = 0x01230000;
 -    cpu->id_mmfr3 = 0x00002111;
 +    cpu->isar.id_mmfr0 = 0x00100103;
 +    cpu->isar.id_mmfr1 = 0x20000000;
 +    cpu->isar.id_mmfr2 = 0x01230000;
 +    cpu->isar.id_mmfr3 = 0x00002111;
      cpu->isar.id_isar0 = 0x00101111;
      cpu->isar.id_isar1 = 0x13112111;
      cpu->isar.id_isar2 = 0x21232041;
@@ -XXX,XX +XXX,XX @@ static void cortex_a7_initfn(Object *obj)
      cpu->id_pfr1 = 0x00011011;
      cpu->isar.id_dfr0 = 0x02010555;
      cpu->id_afr0 = 0x00000000;
 -    cpu->id_mmfr0 = 0x10101105;
 -    cpu->id_mmfr1 = 0x40000000;
 -    cpu->id_mmfr2 = 0x01240000;
 -    cpu->id_mmfr3 = 0x02102211;
 +    cpu->isar.id_mmfr0 = 0x10101105;
 +    cpu->isar.id_mmfr1 = 0x40000000;
 +    cpu->isar.id_mmfr2 = 0x01240000;
 +    cpu->isar.id_mmfr3 = 0x02102211;
      /* a7_mpcore_r0p5_trm, page 4-4 gives 0x01101110; but
       * table 4-41 gives 0x02101110, which includes the arm div insns.
       */
@@ -XXX,XX +XXX,XX @@ static void cortex_a15_initfn(Object *obj)
      cpu->id_pfr1 = 0x00011011;
      cpu->isar.id_dfr0 = 0x02010555;
      cpu->id_afr0 = 0x00000000;
 -    cpu->id_mmfr0 = 0x10201105;
 -    cpu->id_mmfr1 = 0x20000000;
 -    cpu->id_mmfr2 = 0x01240000;
 -    cpu->id_mmfr3 = 0x02102211;
 +    cpu->isar.id_mmfr0 = 0x10201105;
 +    cpu->isar.id_mmfr1 = 0x20000000;
 +    cpu->isar.id_mmfr2 = 0x01240000;
 +    cpu->isar.id_mmfr3 = 0x02102211;
      cpu->isar.id_isar0 = 0x02101110;
      cpu->isar.id_isar1 = 0x13112111;
      cpu->isar.id_isar2 = 0x21232041;
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
              t = FIELD_DP32(t, MVFR2, FPMISC, 4);   /* FP MaxNum */
              cpu->isar.mvfr2 = t;
 -            t = cpu->id_mmfr3;
 +            t = cpu->isar.id_mmfr3;
              t = FIELD_DP32(t, ID_MMFR3, PAN, 2); /* ATS1E1 */
 -            cpu->id_mmfr3 = t;
 +            cpu->isar.id_mmfr3 = t;
 -            t = cpu->id_mmfr4;
 +            t = cpu->isar.id_mmfr4;
              t = FIELD_DP32(t, ID_MMFR4, HPDS, 1); /* AA32HPD */
 -            cpu->id_mmfr4 = t;
 +            cpu->isar.id_mmfr4 = t;
          }
  #endif
      }
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a57_initfn(Object *obj)
      cpu->id_pfr1 = 0x00011011;
      cpu->isar.id_dfr0 = 0x03010066;
      cpu->id_afr0 = 0x00000000;
 -    cpu->id_mmfr0 = 0x10101105;
 -    cpu->id_mmfr1 = 0x40000000;
 -    cpu->id_mmfr2 = 0x01260000;
 -    cpu->id_mmfr3 = 0x02102211;
 +    cpu->isar.id_mmfr0 = 0x10101105;
 +    cpu->isar.id_mmfr1 = 0x40000000;
 +    cpu->isar.id_mmfr2 = 0x01260000;
 +    cpu->isar.id_mmfr3 = 0x02102211;
      cpu->isar.id_isar0 = 0x02101110;
      cpu->isar.id_isar1 = 0x13112111;
      cpu->isar.id_isar2 = 0x21232042;
@@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj)
      cpu->id_pfr1 = 0x00011011;
      cpu->isar.id_dfr0 = 0x03010066;
      cpu->id_afr0 = 0x00000000;
 -    cpu->id_mmfr0 = 0x10101105;
 -    cpu->id_mmfr1 = 0x40000000;
 -    cpu->id_mmfr2 = 0x01260000;
 -    cpu->id_mmfr3 = 0x02102211;
 +    cpu->isar.id_mmfr0 = 0x10101105;
 +    cpu->isar.id_mmfr1 = 0x40000000;
 +    cpu->isar.id_mmfr2 = 0x01260000;
 +    cpu->isar.id_mmfr3 = 0x02102211;
      cpu->isar.id_isar0 = 0x02101110;
      cpu->isar.id_isar1 = 0x13112111;
      cpu->isar.id_isar2 = 0x21232042;
@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
      cpu->id_pfr1 = 0x00011011;
      cpu->isar.id_dfr0 = 0x03010066;
      cpu->id_afr0 = 0x00000000;
 -    cpu->id_mmfr0 = 0x10201105;
 -    cpu->id_mmfr1 = 0x40000000;
 -    cpu->id_mmfr2 = 0x01260000;
 -    cpu->id_mmfr3 = 0x02102211;
 +    cpu->isar.id_mmfr0 = 0x10201105;
 +    cpu->isar.id_mmfr1 = 0x40000000;
 +    cpu->isar.id_mmfr2 = 0x01260000;
 +    cpu->isar.id_mmfr3 = 0x02102211;
      cpu->isar.id_isar0 = 0x02101110;
      cpu->isar.id_isar1 = 0x13112111;
      cpu->isar.id_isar2 = 0x21232042;
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          u = FIELD_DP32(u, ID_ISAR6, SPECRES, 1);
          cpu->isar.id_isar6 = u;
 -        u = cpu->id_mmfr3;
 +        u = cpu->isar.id_mmfr3;
          u = FIELD_DP32(u, ID_MMFR3, PAN, 2); /* ATS1E1 */
 -        cpu->id_mmfr3 = u;
 +        cpu->isar.id_mmfr3 = u;
          u = cpu->isar.id_aa64dfr0;
          u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
                .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 4,
                .access = PL1_R, .type = ARM_CP_CONST,
                .accessfn = access_aa32_tid3,
 -              .resetvalue = cpu->id_mmfr0 },
 +              .resetvalue = cpu->isar.id_mmfr0 },
              { .name = "ID_MMFR1", .state = ARM_CP_STATE_BOTH,
                .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 5,
                .access = PL1_R, .type = ARM_CP_CONST,
                .accessfn = access_aa32_tid3,
 -              .resetvalue = cpu->id_mmfr1 },
 +              .resetvalue = cpu->isar.id_mmfr1 },
              { .name = "ID_MMFR2", .state = ARM_CP_STATE_BOTH,
                .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 6,
                .access = PL1_R, .type = ARM_CP_CONST,
                .accessfn = access_aa32_tid3,
 -              .resetvalue = cpu->id_mmfr2 },
 +              .resetvalue = cpu->isar.id_mmfr2 },
              { .name = "ID_MMFR3", .state = ARM_CP_STATE_BOTH,
                .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 7,
                .access = PL1_R, .type = ARM_CP_CONST,
                .accessfn = access_aa32_tid3,
 -              .resetvalue = cpu->id_mmfr3 },
 +              .resetvalue = cpu->isar.id_mmfr3 },
              { .name = "ID_ISAR0", .state = ARM_CP_STATE_BOTH,
                .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 0,
                .access = PL1_R, .type = ARM_CP_CONST,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
                .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 6,
                .access = PL1_R, .type = ARM_CP_CONST,
                .accessfn = access_aa32_tid3,
 -              .resetvalue = cpu->id_mmfr4 },
 +              .resetvalue = cpu->isar.id_mmfr4 },
              { .name = "ID_ISAR6", .state = ARM_CP_STATE_BOTH,
                .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 7,
                .access = PL1_R, .type = ARM_CP_CONST,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
          define_arm_cp_regs(cpu, vmsa_pmsa_cp_reginfo);
          define_arm_cp_regs(cpu, vmsa_cp_reginfo);
          /* TTCBR2 is introduced with ARMv8.2-A32HPD.  */
 -        if (FIELD_EX32(cpu->id_mmfr4, ID_MMFR4, HPDS) != 0) {
 +        if (FIELD_EX32(cpu->isar.id_mmfr4, ID_MMFR4, HPDS) != 0) {
              define_one_arm_cp_reg(cpu, &ttbcr2_reginfo);
          }
      }
 diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm32.c
 +++ b/target/arm/kvm32.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
       * Fortunately there is not yet anything in there that affects migration.
       */
 +    err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr0,
 +                          ARM_CP15_REG32(0, 0, 1, 4));
 +    err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr1,
 +                          ARM_CP15_REG32(0, 0, 1, 5));
 +    err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr2,
 +                          ARM_CP15_REG32(0, 0, 1, 6));
 +    err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr3,
 +                          ARM_CP15_REG32(0, 0, 1, 7));
 +    if (read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr4,
 +                       ARM_CP15_REG32(0, 0, 2, 6))) {
 +        /*
 +         * Older kernels don't support reading ID_MMFR4 (a register new
 +         * in v8); assume it's zero.
 +         */
 +        ahcf->isar.id_mmfr4 = 0;
 +    }
 +
      /*
       * There is no way to read DBGDIDR, because currently 32-bit KVM
       * doesn't implement debug at all. Leave it at zero.
 diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm64.c
 +++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
           */
          err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_dfr0,
                                ARM64_SYS_REG(3, 0, 0, 1, 2));
 +        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr0,
 +                              ARM64_SYS_REG(3, 0, 0, 1, 4));
 +        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr1,
 +                              ARM64_SYS_REG(3, 0, 0, 1, 5));
 +        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr2,
 +                              ARM64_SYS_REG(3, 0, 0, 1, 6));
 +        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr3,
 +                              ARM64_SYS_REG(3, 0, 0, 1, 7));
          err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar0,
                                ARM64_SYS_REG(3, 0, 0, 2, 0));
          err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar1,
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
                                ARM64_SYS_REG(3, 0, 0, 2, 4));
          err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar5,
                                ARM64_SYS_REG(3, 0, 0, 2, 5));
 +        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr4,
 +                              ARM64_SYS_REG(3, 0, 0, 2, 6));
          err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar6,
                                ARM64_SYS_REG(3, 0, 0, 2, 7));
 --
 .20.1
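
The kvm32.c hunk above accumulates read failures into err for the mandatory ID registers, but treats a failed ID_MMFR4 read as "value is zero" rather than as an error. A minimal standalone sketch of that fallback pattern follows. The reader callback type and the two stand-in readers are hypothetical and exist only for illustration; they are not the KVM or QEMU interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: returns 0 on success, nonzero if the register
 * cannot be read (for example because the kernel is too old). */
typedef int (*reg_reader_fn)(uint64_t reg_id, uint32_t *out);

/*
 * Read an optional ID register. A failed read is not an error: the
 * register is assumed to be zero, i.e. "no features advertised".
 */
static uint32_t read_optional_id_reg(reg_reader_fn reader, uint64_t reg_id)
{
    uint32_t val = 0;

    assert(reader != NULL);
    if (reader(reg_id, &val) != 0) {
        val = 0;    /* discard whatever the failed read left behind */
    }
    return val;
}

/* Stand-in for a kernel that does not expose the register. */
static int reader_unsupported(uint64_t reg_id, uint32_t *out)
{
    (void)reg_id;
    *out = 0xdeadbeef;  /* garbage the caller must not trust */
    return -1;
}

/* Stand-in for a kernel that does expose it (HPDS field set to 1). */
static int reader_supported(uint64_t reg_id, uint32_t *out)
{
    (void)reg_id;
    *out = 0x00010000;
    return 0;
}
```

With these stand-ins, read_optional_id_reg(reader_unsupported, 0) yields 0, while read_optional_id_reg(reader_supported, 0) yields the value the reader supplied.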

-[Qemu-devel] [PULL 20/21] hw/arm/musca: Wire up PL011 UARTs
+[PULL 32/52] target/arm: Use isar_feature function for testing AA32HPD feature
-Wire up the two PL011 UARTs in the Musca board.
+Now we have moved ID_MMFR4 into the ARMISARegisters struct, we
 can define and use an isar_feature for the presence of the
 ARMv8.2-AA32HPD feature, rather than open-coding the test.
 While we're here, correct a comment typo which missed an 'A'
 from the feature name.
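
As a rough illustration of what "define an isar_feature rather than open-coding the test" buys, the sketch below wraps the field extraction in one named predicate. It is self-contained and does not use QEMU's FIELD_EX32 macro: the IdRegs struct, the shift/mask constants and the function names are stand-ins invented here, and the HPDS position (bits [19:16] of ID_MMFR4) is taken from the Arm architecture register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for ARMISARegisters; only one register kept. */
typedef struct {
    uint32_t id_mmfr4;
} IdRegs;

/* ID_MMFR4.HPDS lives in bits [19:16]. */
#define ID_MMFR4_HPDS_SHIFT 16
#define ID_MMFR4_HPDS_WIDTH_MASK 0xfu

/* Extract the raw HPDS field value. */
static inline uint32_t id_mmfr4_hpds(const IdRegs *id)
{
    assert(id != NULL);
    return (id->id_mmfr4 >> ID_MMFR4_HPDS_SHIFT) & ID_MMFR4_HPDS_WIDTH_MASK;
}

/* Named feature test: callers ask "is AA32HPD present?" instead of
 * repeating the shift-and-mask at every use site. */
static inline bool feature_aa32_hpd(const IdRegs *id)
{
    return id_mmfr4_hpds(id) != 0;
}
```

A caller then writes if (feature_aa32_hpd(&regs)) { ... }, so the field layout is stated once and the call site reads as a feature query.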
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214175116.9164-20-peter.maydell@linaro.org
 ---
- hw/arm/musca.c | 34 +++++++++++++++++++++++++++++-----
+ target/arm/cpu.h    | 5 +++++
-file changed, 29 insertions(+), 5 deletions(-)
+ target/arm/helper.c | 4 ++--
 files changed, 7 insertions(+), 2 deletions(-)
-diff --git a/hw/arm/musca.c b/hw/arm/musca.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/musca.c
+--- a/target/arm/cpu.h
-+++ b/hw/arm/musca.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_pmu_8_4(const ARMISARegisters *id)
- #include "qemu/error-report.h"
+         FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) != 0xf;
  #include "qapi/error.h"
  #include "exec/address-spaces.h"
 +#include "sysemu/sysemu.h"
  #include "hw/arm/arm.h"
  #include "hw/arm/armsse.h"
  #include "hw/boards.h"
 +#include "hw/char/pl011.h"
  #include "hw/core/split-irq.h"
  #include "hw/misc/tz-mpc.h"
  #include "hw/misc/tz-ppc.h"
@@ -XXX,XX +XXX,XX @@ typedef struct {
      UnimplementedDeviceState mhu[2];
      UnimplementedDeviceState pwm[3];
      UnimplementedDeviceState i2s;
 -    UnimplementedDeviceState uart[2];
 +    PL011State uart[2];
      UnimplementedDeviceState i2c[2];
      UnimplementedDeviceState spi;
      UnimplementedDeviceState scc;
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_rtc(MuscaMachineState *mms, void *opaque,
      return sysbus_mmio_get_region(SYS_BUS_DEVICE(rtc), 0);
  }
-+static MemoryRegion *make_uart(MuscaMachineState *mms, void *opaque,
++static inline bool isar_feature_aa32_hpd(const ARMISARegisters *id)
 +                               const char *name, hwaddr size)
 +{
-+    PL011State *uart = opaque;
++    return FIELD_EX32(id->id_mmfr4, ID_MMFR4, HPDS) != 0;
 +    int i = uart - &mms->uart[0];
 +    int irqbase = 7 + i * 6;
 +    SysBusDevice *s;
 +
 +    sysbus_init_child_obj(OBJECT(mms), name, uart, sizeof(mms->uart[0]),
 +                          TYPE_PL011);
 +    qdev_prop_set_chr(DEVICE(uart), "chardev", serial_hd(i));
 +    object_property_set_bool(OBJECT(uart), true, "realized", &error_fatal);
 +    s = SYS_BUS_DEVICE(uart);
 +    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqbase + 5)); /* combined */
 +    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, irqbase + 0)); /* RX */
 +    sysbus_connect_irq(s, 2, get_sse_irq_in(mms, irqbase + 1)); /* TX */
 +    sysbus_connect_irq(s, 3, get_sse_irq_in(mms, irqbase + 2)); /* RT */
 +    sysbus_connect_irq(s, 4, get_sse_irq_in(mms, irqbase + 3)); /* MS */
 +    sysbus_connect_irq(s, 5, get_sse_irq_in(mms, irqbase + 4)); /* E */
 +    return sysbus_mmio_get_region(SYS_BUS_DEVICE(uart), 0);
 +}
 +
- static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
+ /*
-                                        const char *name, hwaddr size)
+  * 64-bit feature tests via id registers.
- {
+  */
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
+diff --git a/target/arm/helper.c b/target/arm/helper.c
-     MemoryRegion *container = &mms->container;
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
-     const PPCPortInfo devices[] = {
++++ b/target/arm/helper.c
--        { "uart0", make_unimp_dev, &mms->uart[0], 0x1000, 0x1000 },
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
--        { "uart1", make_unimp_dev, &mms->uart[1], 0x2000, 0x1000 },
+     } else {
-+        { "uart0", make_uart, &mms->uart[0], 0x1000, 0x1000 },
+         define_arm_cp_regs(cpu, vmsa_pmsa_cp_reginfo);
-+        { "uart1", make_uart, &mms->uart[1], 0x2000, 0x1000 },
+         define_arm_cp_regs(cpu, vmsa_cp_reginfo);
-         { "spi", make_unimp_dev, &mms->spi, 0x3000, 0x1000 },
+-        /* TTCBR2 is introduced with ARMv8.2-A32HPD.  */
-         { "i2c0", make_unimp_dev, &mms->i2c[0], 0x4000, 0x1000 },
+-        if (FIELD_EX32(cpu->isar.id_mmfr4, ID_MMFR4, HPDS) != 0) {
-         { "i2c1", make_unimp_dev, &mms->i2c[1], 0x5000, 0x1000 },
++        /* TTCBR2 is introduced with ARMv8.2-AA32HPD.  */
-@@ -XXX,XX +XXX,XX @@ static void musca_init(MachineState *machine)
++        if (cpu_isar_feature(aa32_hpd, cpu)) {
-                 { "pwm1", make_unimp_dev, &mms->pwm[1], 0x40102000, 0x1000 },
+             define_one_arm_cp_reg(cpu, &ttbcr2_reginfo);
-                 { "pwm2", make_unimp_dev, &mms->pwm[2], 0x40103000, 0x1000 },
+         }
-                 { "i2s", make_unimp_dev, &mms->i2s, 0x40104000, 0x1000 },
+     }
 -                { "uart0", make_unimp_dev, &mms->uart[0], 0x40105000, 0x1000 },
 -                { "uart1", make_unimp_dev, &mms->uart[1], 0x40106000, 0x1000 },
 +                { "uart0", make_uart, &mms->uart[0], 0x40105000, 0x1000 },
 +                { "uart1", make_uart, &mms->uart[1], 0x40106000, 0x1000 },
                  { "i2c0", make_unimp_dev, &mms->i2c[0], 0x40108000, 0x1000 },
                  { "i2c1", make_unimp_dev, &mms->i2c[1], 0x40109000, 0x1000 },
                  { "spi", make_unimp_dev, &mms->spi, 0x4010a000, 0x1000 },
 --
 .20.1

-[Qemu-devel] [PULL 18/21] hw/arm/musca: Add MPCs
+[PULL 33/52] target/arm: Use FIELD_EX32 for testing 32-bit fields
-The Musca board puts its SRAM and flash behind TrustZone
+Cut-and-paste errors mean we're using FIELD_EX64() to extract fields from
-Memory Protection Controllers (MPCs). Each MPC sits between
+some 32-bit ID register fields. Use FIELD_EX32() instead. (This makes
-the CPU and the RAM/flash, and also has a set of memory mapped
+no difference in behaviour, it's just more consistent.)
 control registers. Wire up the MPCs, and the memory behind them.
 For the moment we implement the flash as simple ROM, which
 cannot be reprogrammed by the guest.
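
The claim that swapping FIELD_EX64() for FIELD_EX32() makes no difference in behaviour can be checked with a small standalone sketch. It assumes the register is held in an unsigned 32-bit variable, so that widening it to 64 bits zero-extends. field_ex32 and field_ex64 are local stand-ins written for this example, not the QEMU macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract 'length' bits starting at bit 'start' from a 32-bit value. */
static inline uint32_t field_ex32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (UINT32_MAX >> (32 - length));
}

/* Same operation on a 64-bit value. */
static inline uint64_t field_ex64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (UINT64_MAX >> (64 - length));
}

/*
 * For a field that lies entirely within bits [31:0], both extractions
 * return the same number: the uint32_t argument is zero-extended when
 * passed to field_ex64, so the upper 32 bits contribute nothing.
 */
static inline bool field_reads_agree(uint32_t reg, int start, int length)
{
    return field_ex32(reg, start, length) == field_ex64(reg, start, length);
}
```

For example, with an MVFR0-style value of 0x10110222, extracting bits [11:8] gives 2 through either function.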
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214175116.9164-21-peter.maydell@linaro.org
 ---
- hw/arm/musca.c | 155 ++++++++++++++++++++++++++++++++++++++++++++++---
+ target/arm/cpu.h | 18 +++++++++---------
-file changed, 147 insertions(+), 8 deletions(-)
+file changed, 9 insertions(+), 9 deletions(-)
-diff --git a/hw/arm/musca.c b/hw/arm/musca.c
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/musca.c
+--- a/target/arm/cpu.h
-+++ b/hw/arm/musca.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
- #include "hw/arm/armsse.h"
+ static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
- #include "hw/boards.h"
+ {
- #include "hw/core/split-irq.h"
+     /* Return true if D16-D31 are implemented */
-+#include "hw/misc/tz-mpc.h"
+-    return FIELD_EX64(id->mvfr0, MVFR0, SIMDREG) >= 2;
- #include "hw/misc/tz-ppc.h"
++    return FIELD_EX32(id->mvfr0, MVFR0, SIMDREG) >= 2;
  #include "hw/misc/unimp.h"
  #define MUSCA_NUMIRQ_MAX 96
  #define MUSCA_PPC_MAX 3
 +#define MUSCA_MPC_MAX 5
 +
 +typedef struct MPCInfo MPCInfo;
  typedef enum MuscaType {
      MUSCA_A,
@@ -XXX,XX +XXX,XX @@ typedef struct {
      uint32_t init_svtor;
      int sram_addr_width;
      int num_irqs;
 +    const MPCInfo *mpc_info;
 +    int num_mpcs;
  } MuscaMachineClass;
  typedef struct {
      MachineState parent;
      ARMSSE sse;
 +    /* RAM and flash */
 +    MemoryRegion ram[MUSCA_MPC_MAX];
      SplitIRQ cpu_irq_splitter[MUSCA_NUMIRQ_MAX];
      SplitIRQ sec_resp_splitter;
      TZPPC ppc[MUSCA_PPC_MAX];
      MemoryRegion container;
      UnimplementedDeviceState eflash[2];
      UnimplementedDeviceState qspi;
 -    UnimplementedDeviceState mpc[5];
 +    TZMPC mpc[MUSCA_MPC_MAX];
      UnimplementedDeviceState mhu[2];
      UnimplementedDeviceState pwm[3];
      UnimplementedDeviceState i2s;
@@ -XXX,XX +XXX,XX @@ typedef struct {
      UnimplementedDeviceState pvt;
      UnimplementedDeviceState sdio;
      UnimplementedDeviceState gpio;
 +    UnimplementedDeviceState cryptoisland;
  } MuscaMachineState;
  #define TYPE_MUSCA_MACHINE "musca"
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_unimp_dev(MuscaMachineState *mms,
      return sysbus_mmio_get_region(SYS_BUS_DEVICE(uds), 0);
  }
-+typedef enum MPCInfoType {
+ static inline bool isar_feature_aa32_fpshvec(const ARMISARegisters *id)
 +    MPC_RAM,
 +    MPC_ROM,
 +    MPC_CRYPTOISLAND,
 +} MPCInfoType;
 +
 +struct MPCInfo {
 +    const char *name;
 +    hwaddr addr;
 +    hwaddr size;
 +    MPCInfoType type;
 +};
 +
 +/* Order of the MPCs here must match the order of the bits in SECMPCINTSTATUS */
 +static const MPCInfo a_mpc_info[] = { {
 +        .name = "qspi",
 +        .type = MPC_ROM,
 +        .addr = 0x00200000,
 +        .size = 0x00800000,
 +    }, {
 +        .name = "sram",
 +        .type = MPC_RAM,
 +        .addr = 0x00000000,
 +        .size = 0x00200000,
 +    }
 +};
 +
 +static const MPCInfo b1_mpc_info[] = { {
 +        .name = "qspi",
 +        .type = MPC_ROM,
 +        .addr = 0x00000000,
 +        .size = 0x02000000,
 +    }, {
 +        .name = "sram",
 +        .type = MPC_RAM,
 +        .addr = 0x0a400000,
 +        .size = 0x00080000,
 +    }, {
 +        .name = "eflash0",
 +        .type = MPC_ROM,
 +        .addr = 0x0a000000,
 +        .size = 0x00200000,
 +    }, {
 +        .name = "eflash1",
 +        .type = MPC_ROM,
 +        .addr = 0x0a200000,
 +        .size = 0x00200000,
 +    }, {
 +        .name = "cryptoisland",
 +        .type = MPC_CRYPTOISLAND,
 +        .addr = 0x0a000000,
 +        .size = 0x00200000,
 +    }
 +};
 +
 +static MemoryRegion *make_mpc(MuscaMachineState *mms, void *opaque,
 +                              const char *name, hwaddr size)
 +{
 +    /*
 +     * Create an MPC and the RAM or flash behind it.
 +     * MPC 0: eFlash 0
 +     * MPC 1: eFlash 1
 +     * MPC 2: SRAM
 +     * MPC 3: QSPI flash
 +     * MPC 4: CryptoIsland
 +     * For now we implement the flash regions as ROM (ie not programmable)
 +     * (with their control interface memory regions being unimplemented
 +     * stubs behind the PPCs).
 +     * The whole CryptoIsland region behind its MPC is an unimplemented stub.
 +     */
 +    MuscaMachineClass *mmc = MUSCA_MACHINE_GET_CLASS(mms);
 +    TZMPC *mpc = opaque;
 +    int i = mpc - &mms->mpc[0];
 +    MemoryRegion *downstream;
 +    MemoryRegion *upstream;
 +    UnimplementedDeviceState *uds;
 +    char *mpcname;
 +    const MPCInfo *mpcinfo = mmc->mpc_info;
 +
 +    mpcname = g_strdup_printf("%s-mpc", mpcinfo[i].name);
 +
 +    switch (mpcinfo[i].type) {
 +    case MPC_ROM:
 +        downstream = &mms->ram[i];
 +        memory_region_init_rom(downstream, NULL, mpcinfo[i].name,
 +                               mpcinfo[i].size, &error_fatal);
 +        break;
 +    case MPC_RAM:
 +        downstream = &mms->ram[i];
 +        memory_region_init_ram(downstream, NULL, mpcinfo[i].name,
 +                               mpcinfo[i].size, &error_fatal);
 +        break;
 +    case MPC_CRYPTOISLAND:
 +        /* We don't implement the CryptoIsland yet */
 +        uds = &mms->cryptoisland;
 +        sysbus_init_child_obj(OBJECT(mms), name, uds,
 +                              sizeof(UnimplementedDeviceState),
 +                              TYPE_UNIMPLEMENTED_DEVICE);
 +        qdev_prop_set_string(DEVICE(uds), "name", mpcinfo[i].name);
 +        qdev_prop_set_uint64(DEVICE(uds), "size", mpcinfo[i].size);
 +        object_property_set_bool(OBJECT(uds), true, "realized", &error_fatal);
 +        downstream = sysbus_mmio_get_region(SYS_BUS_DEVICE(uds), 0);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +
 +    sysbus_init_child_obj(OBJECT(mms), mpcname, mpc, sizeof(mms->mpc[0]),
 +                          TYPE_TZ_MPC);
 +    object_property_set_link(OBJECT(mpc), OBJECT(downstream),
 +                             "downstream", &error_fatal);
 +    object_property_set_bool(OBJECT(mpc), true, "realized", &error_fatal);
 +    /* Map the upstream end of the MPC into system memory */
 +    upstream = sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 1);
 +    memory_region_add_subregion(get_system_memory(), mpcinfo[i].addr, upstream);
 +    /* and connect its interrupt to the SSE-200 */
 +    qdev_connect_gpio_out_named(DEVICE(mpc), "irq", 0,
 +                                qdev_get_gpio_in_named(DEVICE(&mms->sse),
 +                                                       "mpcexp_status", i));
 +
 +    g_free(mpcname);
 +    /* Return the register interface MR for our caller to map behind the PPC */
 +    return sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 0);
 +}
 +
  static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
                                         const char *name, hwaddr size)
  {
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
+-    return FIELD_EX64(id->mvfr0, MVFR0, FPSHVEC) > 0;
-         { "pwm1", make_unimp_dev, &mms->pwm[1], 0xe000, 0x1000 },
++    return FIELD_EX32(id->mvfr0, MVFR0, FPSHVEC) > 0;
          { "pwm2", make_unimp_dev, &mms->pwm[2], 0xf000, 0x1000 },
          { "gpio", make_unimp_dev, &mms->gpio, 0x10000, 0x1000 },
 -        { "mpc0", make_unimp_dev, &mms->mpc[0], 0x12000, 0x1000 },
 -        { "mpc1", make_unimp_dev, &mms->mpc[1], 0x13000, 0x1000 },
 +        { "mpc0", make_mpc, &mms->mpc[0], 0x12000, 0x1000 },
 +        { "mpc1", make_mpc, &mms->mpc[1], 0x13000, 0x1000 },
      };
      memory_region_init(container, OBJECT(mms), "musca-device-container", size);
@@ -XXX,XX +XXX,XX @@ static void musca_init(MachineState *machine)
      int i;
      assert(mmc->num_irqs <= MUSCA_NUMIRQ_MAX);
 +    assert(mmc->num_mpcs <= MUSCA_MPC_MAX);
      if (strcmp(machine->cpu_type, mc->default_cpu_type) != 0) {
          error_report("This board can only be used with CPU %s",
@@ -XXX,XX +XXX,XX @@ static void musca_init(MachineState *machine)
                  { "eflash1", make_unimp_dev, &mms->eflash[1],
                    0x52500000, 0x1000 },
                  { "qspi", make_unimp_dev, &mms->qspi, 0x42800000, 0x100000 },
 -                { "mpc0", make_unimp_dev, &mms->mpc[0], 0x52000000, 0x1000 },
 -                { "mpc1", make_unimp_dev, &mms->mpc[1], 0x52100000, 0x1000 },
 -                { "mpc2", make_unimp_dev, &mms->mpc[2], 0x52200000, 0x1000 },
 -                { "mpc3", make_unimp_dev, &mms->mpc[3], 0x52300000, 0x1000 },
 +                { "mpc0", make_mpc, &mms->mpc[0], 0x52000000, 0x1000 },
 +                { "mpc1", make_mpc, &mms->mpc[1], 0x52100000, 0x1000 },
 +                { "mpc2", make_mpc, &mms->mpc[2], 0x52200000, 0x1000 },
 +                { "mpc3", make_mpc, &mms->mpc[3], 0x52300000, 0x1000 },
                  { "mhu0", make_unimp_dev, &mms->mhu[0], 0x42600000, 0x100000 },
                  { "mhu1", make_unimp_dev, &mms->mhu[1], 0x42700000, 0x100000 },
                  { }, /* port 9: unused */
@@ -XXX,XX +XXX,XX @@ static void musca_init(MachineState *machine)
                  { }, /* port 11: unused */
                  { }, /* port 12: unused */
                  { }, /* port 13: unused */
 -                { "mpc4", make_unimp_dev, &mms->mpc[4], 0x52e00000, 0x1000 },
 +                { "mpc4", make_mpc, &mms->mpc[4], 0x52e00000, 0x1000 },
              },
          }, {
              .name = "apb_ppcexp1",
@@ -XXX,XX +XXX,XX @@ static void musca_a_class_init(ObjectClass *oc, void *data)
      mmc->init_svtor = 0x10200000;
      mmc->sram_addr_width = 15;
      mmc->num_irqs = 64;
 +    mmc->mpc_info = a_mpc_info;
 +    mmc->num_mpcs = ARRAY_SIZE(a_mpc_info);
  }
- static void musca_b1_class_init(ObjectClass *oc, void *data)
+ static inline bool isar_feature_aa32_fpdp(const ARMISARegisters *id)
-@@ -XXX,XX +XXX,XX @@ static void musca_b1_class_init(ObjectClass *oc, void *data)
+ {
-     mmc->init_svtor = 0x10000000;
+     /* Return true if CPU supports double precision floating point */
-     mmc->sram_addr_width = 17;
+-    return FIELD_EX64(id->mvfr0, MVFR0, FPDP) > 0;
-     mmc->num_irqs = 96;
++    return FIELD_EX32(id->mvfr0, MVFR0, FPDP) > 0;
 +    mmc->mpc_info = b1_mpc_info;
 +    mmc->num_mpcs = ARRAY_SIZE(b1_mpc_info);
  }
- static const TypeInfo musca_info = {
+ /*
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fpdp(const ARMISARegisters *id)
   */
  static inline bool isar_feature_aa32_fp16_spconv(const ARMISARegisters *id)
  {
 -    return FIELD_EX64(id->mvfr1, MVFR1, FPHP) > 0;
 +    return FIELD_EX32(id->mvfr1, MVFR1, FPHP) > 0;
  }
  static inline bool isar_feature_aa32_fp16_dpconv(const ARMISARegisters *id)
  {
 -    return FIELD_EX64(id->mvfr1, MVFR1, FPHP) > 1;
 +    return FIELD_EX32(id->mvfr1, MVFR1, FPHP) > 1;
  }
  static inline bool isar_feature_aa32_vsel(const ARMISARegisters *id)
  {
 -    return FIELD_EX64(id->mvfr2, MVFR2, FPMISC) >= 1;
 +    return FIELD_EX32(id->mvfr2, MVFR2, FPMISC) >= 1;
  }
  static inline bool isar_feature_aa32_vcvt_dr(const ARMISARegisters *id)
  {
 -    return FIELD_EX64(id->mvfr2, MVFR2, FPMISC) >= 2;
 +    return FIELD_EX32(id->mvfr2, MVFR2, FPMISC) >= 2;
  }
  static inline bool isar_feature_aa32_vrint(const ARMISARegisters *id)
  {
 -    return FIELD_EX64(id->mvfr2, MVFR2, FPMISC) >= 3;
 +    return FIELD_EX32(id->mvfr2, MVFR2, FPMISC) >= 3;
  }
  static inline bool isar_feature_aa32_vminmaxnm(const ARMISARegisters *id)
  {
 -    return FIELD_EX64(id->mvfr2, MVFR2, FPMISC) >= 4;
 +    return FIELD_EX32(id->mvfr2, MVFR2, FPMISC) >= 4;
  }
  static inline bool isar_feature_aa32_pan(const ARMISARegisters *id)
 --
 .20.1

-[Qemu-devel] [PULL 08/21] hw/misc/tz-ppc: Support having unused ports in the middle of the range
+[PULL 34/52] target/arm: Correctly implement ACTLR2, HACTLR2
-The Peripheral Protection Controller's handling of unused ports
+The ACTLR2 and HACTLR2 AArch32 system registers didn't exist in ARMv7
-is that if there is nothing connected to the port's downstream
+or the original ARMv8.  They were later added as optional registers,
-then it does not create the sysbus MMIO region for the upstream
+whose presence is signaled by the ID_MMFR4.AC2 field.  From ARMv8.2
-end of the port. This results in odd behaviour when there is
+they are mandatory (ie ID_MMFR4.AC2 must be non-zero).
 an unused port in the middle of the range: since sysbus MMIO
 regions are implicitly consecutively allocated, any used ports
 above the unused ones end up with sysbus MMIO region numbers
 that don't match the port number.
-Avoid this numbering mismatch by creating dummy MMIO regions
+We implemented HACTLR2 in commit 0e0456ab8895a5e85, but we
-for the unused ports. This doesn't change anything for our
+incorrectly made it exist for all v8 CPUs, and we didn't implement
-existing boards, which don't have any gaps in the middle of
+ACTLR2 at all.
 the port ranges they use; but it will be needed for the Musca
 board.
+Sort this out by implementing both registers only when they are
+supposed to exist, and setting the ID_MMFR4 bit for -cpu max.
+Note that this removes HACTLR2 from our Cortex-A53, -A57 and -A72
+CPU models; this is correct, because those CPUs do not implement
+this register.
+Fixes: 0e0456ab8895a5e85
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214175116.9164-22-peter.maydell@linaro.org
 ---
- include/hw/misc/tz-ppc.h |  8 +++++++-
+ target/arm/cpu.h    |  5 +++++
- hw/misc/tz-ppc.c         | 32 ++++++++++++++++++++++++++++++++
+ target/arm/cpu.c    |  1 +
-files changed, 39 insertions(+), 1 deletion(-)
+ target/arm/cpu64.c  |  4 ++++
  target/arm/helper.c | 32 +++++++++++++++++++++++---------
 files changed, 33 insertions(+), 9 deletions(-)
-diff --git a/include/hw/misc/tz-ppc.h b/include/hw/misc/tz-ppc.h
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/misc/tz-ppc.h
+--- a/target/arm/cpu.h
-+++ b/include/hw/misc/tz-ppc.h
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_hpd(const ARMISARegisters *id)
-  *
+     return FIELD_EX32(id->id_mmfr4, ID_MMFR4, HPDS) != 0;
-  * QEMU interface:
+ }
-  * + sysbus MMIO regions 0..15: MemoryRegions defining the upstream end
-- *   of each of the 16 ports of the PPC
++static inline bool isar_feature_aa32_ac2(const ARMISARegisters *id)
 + *   of each of the 16 ports of the PPC. When a port is unused (i.e. no
 + *   downstream MemoryRegion is connected to it) at the end of the 0..15
 + *   range then no sysbus MMIO region is created for its upstream. When an
 + *   unused port lies in the middle of the range with other used ports at
 + *   higher port numbers, a dummy MMIO region is created to ensure that
 + *   port N's upstream is always sysbus MMIO region N. Dummy regions should
 + *   not be mapped, and will assert if any access is made to them.
   * + Property "port[0..15]": MemoryRegion defining the downstream device(s)
   *   for each of the 16 ports of the PPC
   * + Named GPIO inputs "cfg_nonsec[0..15]": set to 1 if the port should be
 diff --git a/hw/misc/tz-ppc.c b/hw/misc/tz-ppc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/tz-ppc.c
 +++ b/hw/misc/tz-ppc.c
@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps tz_ppc_ops = {
      .endianness = DEVICE_LITTLE_ENDIAN,
  };
 +static bool tz_ppc_dummy_accepts(void *opaque, hwaddr addr,
 +                                 unsigned size, bool is_write,
 +                                 MemTxAttrs attrs)
 +{
-+    /*
++    return FIELD_EX32(id->id_mmfr4, ID_MMFR4, AC2) != 0;
 +     * Board code should never map the upstream end of an unused port,
 +     * so we should never try to make a memory access to it.
 +     */
 +    g_assert_not_reached();
 +}
 +
-+static const MemoryRegionOps tz_ppc_dummy_ops = {
+ /*
-+    .valid.accepts = tz_ppc_dummy_accepts,
+  * 64-bit feature tests via id registers.
   */
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
              t = cpu->isar.id_mmfr4;
              t = FIELD_DP32(t, ID_MMFR4, HPDS, 1); /* AA32HPD */
 +            t = FIELD_DP32(t, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
              cpu->isar.id_mmfr4 = t;
          }
  #endif
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          u = FIELD_DP32(u, ID_MMFR3, PAN, 2); /* ATS1E1 */
          cpu->isar.id_mmfr3 = u;
 +        u = cpu->isar.id_mmfr4;
 +        u = FIELD_DP32(u, ID_MMFR4, AC2, 1); /* ACTLR2, HACTLR2 */
 +        cpu->isar.id_mmfr4 = u;
 +
          u = cpu->isar.id_aa64dfr0;
          u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
          cpu->isar.id_aa64dfr0 = u;
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo ats1cp_reginfo[] = {
  };
  #endif
 +/*
 + * ACTLR2 and HACTLR2 map to ACTLR_EL1[63:32] and
 + * ACTLR_EL2[63:32]. They exist only if the ID_MMFR4.AC2 field
 + * is non-zero, which is never for ARMv7, optionally in ARMv8
 + * and mandatorily for ARMv8.2 and up.
 + * ACTLR2 is banked for S and NS if EL3 is AArch32. Since QEMU's
 + * implementation is RAZ/WI we can ignore this detail, as we
 + * do for ACTLR.
 + */
 +static const ARMCPRegInfo actlr2_hactlr2_reginfo[] = {
 +    { .name = "ACTLR2", .state = ARM_CP_STATE_AA32,
 +      .cp = 15, .opc1 = 0, .crn = 1, .crm = 0, .opc2 = 3,
 +      .access = PL1_RW, .type = ARM_CP_CONST,
 +      .resetvalue = 0 },
 +    { .name = "HACTLR2", .state = ARM_CP_STATE_AA32,
 +      .cp = 15, .opc1 = 4, .crn = 1, .crm = 0, .opc2 = 3,
 +      .access = PL2_RW, .type = ARM_CP_CONST,
 +      .resetvalue = 0 },
 +    REGINFO_SENTINEL
 +};
 +
- static void tz_ppc_reset(DeviceState *dev)
+ void register_cp_regs_for_features(ARMCPU *cpu)
  {
-     TZPPC *s = TZ_PPC(dev);
+     /* Register all the coprocessor registers based on feature bits */
-@@ -XXX,XX +XXX,XX @@ static void tz_ppc_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
-     SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+             REGINFO_SENTINEL
-     TZPPC *s = TZ_PPC(dev);
+         };
-     int i;
+         define_arm_cp_regs(cpu, auxcr_reginfo);
-+    int max_port = 0;
+-        if (arm_feature(env, ARM_FEATURE_V8)) {
+-            /* HACTLR2 maps to ACTLR_EL2[63:32] and is not in ARMv7 */
-     /* We can't create the upstream end of the port until realize,
+-            ARMCPRegInfo hactlr2_reginfo = {
-      * as we don't know the size of the MR used as the downstream until then.
+-                .name = "HACTLR2", .state = ARM_CP_STATE_AA32,
-      */
+-                .cp = 15, .opc1 = 4, .crn = 1, .crm = 0, .opc2 = 3,
-     for (i = 0; i < TZ_NUM_PORTS; i++) {
+-                .access = PL2_RW, .type = ARM_CP_CONST,
-+        if (s->port[i].downstream) {
+-                .resetvalue = 0
-+            max_port = i;
+-            };
-+        }
+-            define_one_arm_cp_reg(cpu, &hactlr2_reginfo);
-+    }
++        if (cpu_isar_feature(aa32_ac2, cpu)) {
-+
++            define_arm_cp_regs(cpu, actlr2_hactlr2_reginfo);
 +    for (i = 0; i <= max_port; i++) {
          TZPPCPort *port = &s->port[i];
          char *name;
          uint64_t size;
          if (!port->downstream) {
 +            /*
 +             * Create dummy sysbus MMIO region so the sysbus region
 +             * numbering doesn't get out of sync with the port numbers.
 +             * The size is entirely arbitrary.
 +             */
 +            name = g_strdup_printf("tz-ppc-dummy-port[%d]", i);
 +            memory_region_init_io(&port->upstream, obj, &tz_ppc_dummy_ops,
 +                                  port, name, 0x10000);
 +            sysbus_init_mmio(sbd, &port->upstream);
 +            g_free(name);
              continue;
          }
+     }
 --
 .20.1

-New patch
+[PULL 35/52] hw: usb: hcd-ohci: Move OHCISysBusState and TYPE_SYSBUS_OHCI to include file
+From: Guenter Roeck <linux@roeck-us.net>
+We need to be able to use OHCISysBusState outside hcd-ohci.c, so move it
+to its include file.
+Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
+Signed-off-by: Guenter Roeck <linux@roeck-us.net>
+Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
+Message-id: 20200217204812.9857-2-linux@roeck-us.net
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/usb/hcd-ohci.h | 16 ++++++++++++++++
+ hw/usb/hcd-ohci.c | 15 ---------------
+files changed, 16 insertions(+), 15 deletions(-)
+diff --git a/hw/usb/hcd-ohci.h b/hw/usb/hcd-ohci.h
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/usb/hcd-ohci.h
++++ b/hw/usb/hcd-ohci.h
+@@ -XXX,XX +XXX,XX @@
+ #define HCD_OHCI_H
+ #include "sysemu/dma.h"
++#include "hw/usb.h"
+ /* Number of Downstream Ports on the root hub: */
+ #define OHCI_MAX_PORTS 15
+@@ -XXX,XX +XXX,XX @@ typedef struct OHCIState {
+     void (*ohci_die)(struct OHCIState *ohci);
+ } OHCIState;
++#define TYPE_SYSBUS_OHCI "sysbus-ohci"
++#define SYSBUS_OHCI(obj) OBJECT_CHECK(OHCISysBusState, (obj), TYPE_SYSBUS_OHCI)
++
++typedef struct {
++    /*< private >*/
++    SysBusDevice parent_obj;
++    /*< public >*/
++
++    OHCIState ohci;
++    char *masterbus;
++    uint32_t num_ports;
++    uint32_t firstport;
++    dma_addr_t dma_offset;
++} OHCISysBusState;
++
+ extern const VMStateDescription vmstate_ohci_state;
+ void usb_ohci_init(OHCIState *ohci, DeviceState *dev, uint32_t num_ports,
+diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/usb/hcd-ohci.c
++++ b/hw/usb/hcd-ohci.c
+@@ -XXX,XX +XXX,XX @@ void ohci_sysbus_die(struct OHCIState *ohci)
+     ohci_bus_stop(ohci);
+ }
+-#define TYPE_SYSBUS_OHCI "sysbus-ohci"
+-#define SYSBUS_OHCI(obj) OBJECT_CHECK(OHCISysBusState, (obj), TYPE_SYSBUS_OHCI)
+-
+-typedef struct {
+-    /*< private >*/
+-    SysBusDevice parent_obj;
+-    /*< public >*/
+-
+-    OHCIState ohci;
+-    char *masterbus;
+-    uint32_t num_ports;
+-    uint32_t firstport;
+-    dma_addr_t dma_offset;
+-} OHCISysBusState;
+-
+ static void ohci_realize_pxa(DeviceState *dev, Error **errp)
+ {
+     OHCISysBusState *s = SYSBUS_OHCI(dev);
+--
+.20.1

-[Qemu-devel] [PULL 15/21] hw/arm/armsse: Allow boards to specify init-svtor
+[PULL 36/52] hcd-ehci: Introduce "companion-enable" sysbus property
-The Musca boards have DAPLink firmware that sets the initial
+From: Guenter Roeck <linux@roeck-us.net>
 secure VTOR value (the location of the vector table) differently
 depending on the boot mode (from flash, from RAM, etc). Export
 the init-svtor as a QOM property of the ARMSSE object so that
 the board can change it.
+We'll use this property in a follow-up patch to instantiate an EHCI
+bus with companion support.
+Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
+Signed-off-by: Guenter Roeck <linux@roeck-us.net>
+Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
+Message-id: 20200217204812.9857-3-linux@roeck-us.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/arm/armsse.h | 3 +++
+ hw/usb/hcd-ehci-sysbus.c | 2 ++
- hw/arm/armsse.c         | 8 ++++----
+file changed, 2 insertions(+)
 files changed, 7 insertions(+), 4 deletions(-)
-diff --git a/include/hw/arm/armsse.h b/include/hw/arm/armsse.h
+diff --git a/hw/usb/hcd-ehci-sysbus.c b/hw/usb/hcd-ehci-sysbus.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/armsse.h
+--- a/hw/usb/hcd-ehci-sysbus.c
-+++ b/include/hw/arm/armsse.h
++++ b/hw/usb/hcd-ehci-sysbus.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static const VMStateDescription vmstate_ehci_sysbus = {
-  *    if necessary.)
-  *  + QOM property "SRAM_ADDR_WIDTH" sets the number of bits used for the
+ static Property ehci_sysbus_properties[] = {
-  *    address of each SRAM bank (and thus the total amount of internal SRAM)
+     DEFINE_PROP_UINT32("maxframes", EHCISysBusState, ehci.maxframes, 128),
-+ *  + QOM property "init-svtor" sets the initial value of the CPU SVTOR register
++    DEFINE_PROP_BOOL("companion-enable", EHCISysBusState, ehci.companion_enable,
-+ *    (where it expects to load the PC and SP from the vector table on reset)
++                     false),
-  *  + Named GPIO inputs "EXP_IRQ" 0..n are the expansion interrupts for CPU 0,
+     DEFINE_PROP_END_OF_LIST(),
   *    which are wired to its NVIC lines 32 .. n+32
   *  + Named GPIO inputs "EXP_CPU1_IRQ" 0..n are the expansion interrupts for
@@ -XXX,XX +XXX,XX @@ typedef struct ARMSSE {
      uint32_t exp_numirq;
      uint32_t mainclk_frq;
      uint32_t sram_addr_width;
 +    uint32_t init_svtor;
  } ARMSSE;
  typedef struct ARMSSEInfo ARMSSEInfo;
 diff --git a/hw/arm/armsse.c b/hw/arm/armsse.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/armsse.c
 +++ b/hw/arm/armsse.c
@@ -XXX,XX +XXX,XX @@ static void armsse_realize(DeviceState *dev, Error **errp)
           * the INITSVTOR* registers before powering up the CPUs in any case,
           * so the hardware's default value doesn't matter. QEMU doesn't emulate
           * the control processor, so instead we behave in the way that the
 -         * firmware does. All boards currently known about have firmware that
 -         * sets the INITSVTOR0 and INITSVTOR1 registers to 0x10000000, like the
 -         * IoTKit default. We can make this more configurable if necessary.
 +         * firmware does. The initial value is configurable by the board code
 +         * to match whatever its firmware does.
           */
 -        qdev_prop_set_uint32(cpudev, "init-svtor", 0x10000000);
 +        qdev_prop_set_uint32(cpudev, "init-svtor", s->init_svtor);
          /*
           * Start all CPUs except CPU0 powered down. In real hardware it is
           * a configurable property of the SSE-200 which CPUs start powered up
@@ -XXX,XX +XXX,XX @@ static Property armsse_properties[] = {
      DEFINE_PROP_UINT32("EXP_NUMIRQ", ARMSSE, exp_numirq, 64),
      DEFINE_PROP_UINT32("MAINCLK", ARMSSE, mainclk_frq, 0),
      DEFINE_PROP_UINT32("SRAM_ADDR_WIDTH", ARMSSE, sram_addr_width, 15),
 +    DEFINE_PROP_UINT32("init-svtor", ARMSSE, init_svtor, 0x10000000),
      DEFINE_PROP_END_OF_LIST()
  };
 --
 .20.1

-[Qemu-devel] [PULL 12/21] hw/char/pl011: Support all interrupt lines
+[PULL 37/52] arm: allwinner: Wire up USB ports
-The PL011 UART has six interrupt lines:
+From: Guenter Roeck <linux@roeck-us.net>
  * RX (receive data)
  * TX (transmit data)
  * RT (receive timeout)
  * MS (modem status)
  * E (errors)
  * combined (logical OR of all the above)
-So far we have only emulated the combined interrupt line;
+Instantiate EHCI and OHCI controllers on Allwinner A10. OHCI ports are
-add support for the others, so that boards that wire them
+modeled as companions of the respective EHCI ports.
 up to different interrupt controller inputs can do so.
+With this patch applied, USB controllers are discovered and instantiated
+when booting the cubieboard machine with a recent Linux kernel.
+ehci-platform 1c14000.usb: EHCI Host Controller
+ehci-platform 1c14000.usb: new USB bus registered, assigned bus number 1
+ehci-platform 1c14000.usb: irq 26, io mem 0x01c14000
+ehci-platform 1c14000.usb: USB 2.0 started, EHCI 1.00
+ehci-platform 1c1c000.usb: EHCI Host Controller
+ehci-platform 1c1c000.usb: new USB bus registered, assigned bus number 2
+ehci-platform 1c1c000.usb: irq 31, io mem 0x01c1c000
+ehci-platform 1c1c000.usb: USB 2.0 started, EHCI 1.00
+ohci-platform 1c14400.usb: Generic Platform OHCI controller
+ohci-platform 1c14400.usb: new USB bus registered, assigned bus number 3
+ohci-platform 1c14400.usb: irq 27, io mem 0x01c14400
+ohci-platform 1c1c400.usb: Generic Platform OHCI controller
+ohci-platform 1c1c400.usb: new USB bus registered, assigned bus number 4
+ohci-platform 1c1c400.usb: irq 32, io mem 0x01c1c400
+usb 2-1: new high-speed USB device number 2 using ehci-platform
+usb-storage 2-1:1.0: USB Mass Storage device detected
+scsi host1: usb-storage 2-1:1.0
+usb 3-1: new full-speed USB device number 2 using ohci-platform
+input: QEMU QEMU USB Mouse as /devices/platform/soc/1c14400.usb/usb3/3-1/3-1:1.0/0003:0627:0001.0001/input/input0
+Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
+Signed-off-by: Guenter Roeck <linux@roeck-us.net>
+Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
+Message-id: 20200217204812.9857-4-linux@roeck-us.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/char/pl011.h |  2 +-
+ include/hw/arm/allwinner-a10.h |  6 +++++
- hw/char/pl011.c         | 46 +++++++++++++++++++++++++++++++++++++++--
+ hw/arm/allwinner-a10.c         | 43 ++++++++++++++++++++++++++++++++++
-files changed, 45 insertions(+), 3 deletions(-)
+files changed, 49 insertions(+)
-diff --git a/include/hw/char/pl011.h b/include/hw/char/pl011.h
+diff --git a/include/hw/arm/allwinner-a10.h b/include/hw/arm/allwinner-a10.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/char/pl011.h
+--- a/include/hw/arm/allwinner-a10.h
-+++ b/include/hw/char/pl011.h
++++ b/include/hw/arm/allwinner-a10.h
-@@ -XXX,XX +XXX,XX @@ typedef struct PL011State {
+@@ -XXX,XX +XXX,XX @@
-     int read_count;
+ #include "hw/intc/allwinner-a10-pic.h"
-     int read_trigger;
+ #include "hw/net/allwinner_emac.h"
-     CharBackend chr;
+ #include "hw/ide/ahci.h"
--    qemu_irq irq;
++#include "hw/usb/hcd-ohci.h"
-+    qemu_irq irq[6];
++#include "hw/usb/hcd-ehci.h"
-     const unsigned char *id;
- } PL011State;
+ #include "target/arm/cpu.h"
-diff --git a/hw/char/pl011.c b/hw/char/pl011.c
  #define AW_A10_SDRAM_BASE       0x40000000
 +#define AW_A10_NUM_USB          2
 +
  #define TYPE_AW_A10 "allwinner-a10"
  #define AW_A10(obj) OBJECT_CHECK(AwA10State, (obj), TYPE_AW_A10)
@@ -XXX,XX +XXX,XX @@ typedef struct AwA10State {
      AwEmacState emac;
      AllwinnerAHCIState sata;
      MemoryRegion sram_a;
 +    EHCISysBusState ehci[AW_A10_NUM_USB];
 +    OHCISysBusState ohci[AW_A10_NUM_USB];
  } AwA10State;
  #endif
 diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/char/pl011.c
+--- a/hw/arm/allwinner-a10.c
-+++ b/hw/char/pl011.c
++++ b/hw/arm/allwinner-a10.c
 @@ -XXX,XX +XXX,XX @@
-  * This code is licensed under the GPL.
+ #include "hw/arm/allwinner-a10.h"
-  */
+ #include "hw/misc/unimp.h"
+ #include "sysemu/sysemu.h"
-+/*
++#include "hw/boards.h"
-+ * QEMU interface:
++#include "hw/usb/hcd-ohci.h"
-+ *  + sysbus MMIO region 0: device registers
-+ *  + sysbus IRQ 0: UARTINTR (combined interrupt line)
+ #define AW_A10_PIC_REG_BASE     0x01c20400
-+ *  + sysbus IRQ 1: UARTRXINTR (receive FIFO interrupt line)
+ #define AW_A10_PIT_REG_BASE     0x01c20c00
-+ *  + sysbus IRQ 2: UARTTXINTR (transmit FIFO interrupt line)
+ #define AW_A10_UART0_REG_BASE   0x01c28000
-+ *  + sysbus IRQ 3: UARTRTINTR (receive timeout interrupt line)
+ #define AW_A10_EMAC_BASE        0x01c0b000
-+ *  + sysbus IRQ 4: UARTMSINTR (modem status interrupt line)
++#define AW_A10_EHCI_BASE        0x01c14000
-+ *  + sysbus IRQ 5: UARTEINTR (error interrupt line)
++#define AW_A10_OHCI_BASE        0x01c14400
-+ */
+ #define AW_A10_SATA_BASE        0x01c18000
  static void aw_a10_init(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void aw_a10_init(Object *obj)
      sysbus_init_child_obj(obj, "sata", &s->sata, sizeof(s->sata),
                            TYPE_ALLWINNER_AHCI);
 +
- #include "qemu/osdep.h"
++    if (machine_usb(current_machine)) {
- #include "hw/char/pl011.h"
++        int i;
  #include "hw/sysbus.h"
@@ -XXX,XX +XXX,XX @@
  #define PL011_FLAG_TXFF 0x20
  #define PL011_FLAG_RXFE 0x10
 +/* Interrupt status bits in UARTRIS, UARTMIS, UARTIMSC */
 +#define INT_OE (1 << 10)
 +#define INT_BE (1 << 9)
 +#define INT_PE (1 << 8)
 +#define INT_FE (1 << 7)
 +#define INT_RT (1 << 6)
 +#define INT_TX (1 << 5)
 +#define INT_RX (1 << 4)
 +#define INT_DSR (1 << 3)
 +#define INT_DCD (1 << 2)
 +#define INT_CTS (1 << 1)
 +#define INT_RI (1 << 0)
 +#define INT_E (INT_OE | INT_BE | INT_PE | INT_FE)
 +#define INT_MS (INT_RI | INT_DSR | INT_DCD | INT_CTS)
 +
- static const unsigned char pl011_id_arm[8] =
++        for (i = 0; i < AW_A10_NUM_USB; i++) {
-   { 0x11, 0x10, 0x14, 0x00, 0x0d, 0xf0, 0x05, 0xb1 };
++            sysbus_init_child_obj(obj, "ehci[*]", OBJECT(&s->ehci[i]),
- static const unsigned char pl011_id_luminary[8] =
++                                  sizeof(s->ehci[i]), TYPE_PLATFORM_EHCI);
-   { 0x11, 0x00, 0x18, 0x01, 0x0d, 0xf0, 0x05, 0xb1 };
++            sysbus_init_child_obj(obj, "ohci[*]", OBJECT(&s->ohci[i]),
++                                  sizeof(s->ohci[i]), TYPE_SYSBUS_OHCI);
-+/* Which bits in the interrupt status matter for each outbound IRQ line ? */
++        }
 +static const uint32_t irqmask[] = {
 +    INT_E | INT_MS | INT_RT | INT_TX | INT_RX, /* combined IRQ */
 +    INT_RX,
 +    INT_TX,
 +    INT_RT,
 +    INT_MS,
 +    INT_E,
 +};
 +
  static void pl011_update(PL011State *s)
  {
      uint32_t flags;
 +    int i;
      flags = s->int_level & s->int_enabled;
      trace_pl011_irq_state(flags != 0);
 -    qemu_set_irq(s->irq, flags != 0);
 +    for (i = 0; i < ARRAY_SIZE(s->irq); i++) {
 +        qemu_set_irq(s->irq[i], (flags & irqmask[i]) != 0);
 +    }
  }
- static uint64_t pl011_read(void *opaque, hwaddr offset,
+ static void aw_a10_realize(DeviceState *dev, Error **errp)
-@@ -XXX,XX +XXX,XX @@ static void pl011_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static void aw_a10_realize(DeviceState *dev, Error **errp)
- {
+     serial_mm_init(get_system_memory(), AW_A10_UART0_REG_BASE, 2,
-     SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+                    qdev_get_gpio_in(dev, 1),
-     PL011State *s = PL011(obj);
+                    115200, serial_hd(0), DEVICE_NATIVE_ENDIAN);
-+    int i;
++
++    if (machine_usb(current_machine)) {
-     memory_region_init_io(&s->iomem, OBJECT(s), &pl011_ops, s, "pl011", 0x1000);
++        int i;
-     sysbus_init_mmio(sbd, &s->iomem);
++
--    sysbus_init_irq(sbd, &s->irq);
++        for (i = 0; i < AW_A10_NUM_USB; i++) {
-+    for (i = 0; i < ARRAY_SIZE(s->irq); i++) {
++            char bus[16];
-+        sysbus_init_irq(sbd, &s->irq[i]);
++
 +            sprintf(bus, "usb-bus.%d", i);
 +
 +            object_property_set_bool(OBJECT(&s->ehci[i]), true,
 +                                     "companion-enable", &error_fatal);
 +            object_property_set_bool(OBJECT(&s->ehci[i]), true, "realized",
 +                                     &error_fatal);
 +            sysbus_mmio_map(SYS_BUS_DEVICE(&s->ehci[i]), 0,
 +                            AW_A10_EHCI_BASE + i * 0x8000);
 +            sysbus_connect_irq(SYS_BUS_DEVICE(&s->ehci[i]), 0,
 +                               qdev_get_gpio_in(dev, 39 + i));
 +
 +            object_property_set_str(OBJECT(&s->ohci[i]), bus, "masterbus",
 +                                    &error_fatal);
 +            object_property_set_bool(OBJECT(&s->ohci[i]), true, "realized",
 +                                     &error_fatal);
 +            sysbus_mmio_map(SYS_BUS_DEVICE(&s->ohci[i]), 0,
 +                            AW_A10_OHCI_BASE + i * 0x8000);
 +            sysbus_connect_irq(SYS_BUS_DEVICE(&s->ohci[i]), 0,
 +                               qdev_get_gpio_in(dev, 64 + i));
 +        }
 +    }
+ }
-     s->read_trigger = 1;
-     s->ifl = 0x12;
+ static void aw_a10_class_init(ObjectClass *oc, void *data)
 --
 .20.1

-[Qemu-devel] [PULL 16/21] hw/arm/musca.c: Implement models of the Musca-A and -B1 boards
+[PULL 38/52] target/arm: Vectorize USHL and SSHL
-The Musca-A and Musca-B1 development boards are based on the
+From: Richard Henderson <richard.henderson@linaro.org>
 SSE-200 subsystem for embedded. Implement an initial skeleton
 model of these boards, which are similar but not identical.
-This commit creates the board model with the SSE and the IRQ
+These instructions shift left or right depending on the sign
-splitters to wire IRQs up to its two CPUs. As yet there
+of the input, and 7 bits are significant to the shift.  This
-are no devices and no memory: these will be added later.
+requires several masks and selects in addition to the actual
 shifts to form the complete answer.
+That said, the operation is still a small improvement even for
+two 64-bit elements -- 13 vector operations instead of 2 * 7
+integer operations.
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200216214232.4230-2-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- hw/arm/Makefile.objs            |   1 +
+ target/arm/helper.h        |  11 +-
- hw/arm/musca.c                  | 197 ++++++++++++++++++++++++++++++++
+ target/arm/translate.h     |   6 +
- MAINTAINERS                     |   6 +
+ target/arm/neon_helper.c   |  33 ----
- default-configs/arm-softmmu.mak |   1 +
+ target/arm/translate-a64.c |  18 +--
-files changed, 205 insertions(+)
+ target/arm/translate.c     | 299 +++++++++++++++++++++++++++++++++++--
- create mode 100644 hw/arm/musca.c
+ target/arm/vec_helper.c    |  88 +++++++++++
 files changed, 389 insertions(+), 66 deletions(-)
-diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
+diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/Makefile.objs
+--- a/target/arm/helper.h
-+++ b/hw/arm/Makefile.objs
++++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(neon_abd_s16, i32, i32, i32)
- obj-$(CONFIG_MPS2) += mps2.o
+ DEF_HELPER_2(neon_abd_u32, i32, i32, i32)
- obj-$(CONFIG_MPS2) += mps2-tz.o
+ DEF_HELPER_2(neon_abd_s32, i32, i32, i32)
- obj-$(CONFIG_MSF2) += msf2-soc.o msf2-som.o
-+obj-$(CONFIG_MUSCA) += musca.o
+-DEF_HELPER_2(neon_shl_u8, i32, i32, i32)
- obj-$(CONFIG_ARMSSE) += armsse.o
+-DEF_HELPER_2(neon_shl_s8, i32, i32, i32)
- obj-$(CONFIG_FSL_IMX7) += fsl-imx7.o mcimx7d-sabre.o
+ DEF_HELPER_2(neon_shl_u16, i32, i32, i32)
- obj-$(CONFIG_ARM_SMMUV3) += smmu-common.o smmuv3.o
+ DEF_HELPER_2(neon_shl_s16, i32, i32, i32)
-diff --git a/hw/arm/musca.c b/hw/arm/musca.c
+-DEF_HELPER_2(neon_shl_u32, i32, i32, i32)
-new file mode 100644
+-DEF_HELPER_2(neon_shl_s32, i32, i32, i32)
-index XXXXXXX..XXXXXXX
+-DEF_HELPER_2(neon_shl_u64, i64, i64, i64)
---- /dev/null
+-DEF_HELPER_2(neon_shl_s64, i64, i64, i64)
-+++ b/hw/arm/musca.c
+ DEF_HELPER_2(neon_rshl_u8, i32, i32, i32)
-@@ -XXX,XX +XXX,XX @@
+ DEF_HELPER_2(neon_rshl_s8, i32, i32, i32)
-+/*
+ DEF_HELPER_2(neon_rshl_u16, i32, i32, i32)
-+ * Arm Musca-B1 test chip board emulation
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_2(frint64_s, TCG_CALL_NO_RWG, f32, f32, ptr)
-+ *
+ DEF_HELPER_FLAGS_2(frint32_d, TCG_CALL_NO_RWG, f64, f64, ptr)
-+ * Copyright (c) 2019 Linaro Limited
+ DEF_HELPER_FLAGS_2(frint64_d, TCG_CALL_NO_RWG, f64, f64, ptr)
-+ * Written by Peter Maydell
-+ *
++DEF_HELPER_FLAGS_4(gvec_sshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+ *  This program is free software; you can redistribute it and/or modify
++DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+ *  it under the terms of the GNU General Public License version 2 or
++DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+ *  (at your option) any later version.
++DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-+ */
++
-+
+ #ifdef TARGET_AARCH64
-+/*
+ #include "helper-a64.h"
-+ * The Musca boards are a reference implementation of a system using
+ #include "helper-sve.h"
-+ * the SSE-200 subsystem for embedded:
+diff --git a/target/arm/translate.h b/target/arm/translate.h
-+ * https://developer.arm.com/products/system-design/development-boards/iot-test-chips-and-boards/musca-a-test-chip-board
+index XXXXXXX..XXXXXXX 100644
-+ * https://developer.arm.com/products/system-design/development-boards/iot-test-chips-and-boards/musca-b-test-chip-board
+--- a/target/arm/translate.h
-+ * We model the A and B1 variants of this board, as described in the TRMs:
++++ b/target/arm/translate.h
-+ * http://infocenter.arm.com/help/topic/com.arm.doc.101107_0000_00_en/index.html
+@@ -XXX,XX +XXX,XX @@ uint64_t vfp_expand_imm(int size, uint8_t imm8);
-+ * http://infocenter.arm.com/help/topic/com.arm.doc.101312_0000_00_en/index.html
+ extern const GVecGen3 mla_op[4];
-+ */
+ extern const GVecGen3 mls_op[4];
-+
+ extern const GVecGen3 cmtst_op[4];
-+#include "qemu/osdep.h"
++extern const GVecGen3 sshl_op[4];
-+#include "qemu/error-report.h"
++extern const GVecGen3 ushl_op[4];
-+#include "qapi/error.h"
+ extern const GVecGen2i ssra_op[4];
-+#include "exec/address-spaces.h"
+ extern const GVecGen2i usra_op[4];
-+#include "hw/arm/arm.h"
+ extern const GVecGen2i sri_op[4];
-+#include "hw/arm/armsse.h"
+@@ -XXX,XX +XXX,XX @@ extern const GVecGen4 sqadd_op[4];
-+#include "hw/boards.h"
+ extern const GVecGen4 uqsub_op[4];
-+#include "hw/core/split-irq.h"
+ extern const GVecGen4 sqsub_op[4];
-+
+ void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
-+#define MUSCA_NUMIRQ_MAX 96
++void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
-+
++void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
-+typedef enum MuscaType {
++void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
-+    MUSCA_A,
++void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
-+    MUSCA_B1,
-+} MuscaType;
+ /*
-+
+  * Forward to the isar_feature_* tests given a DisasContext pointer.
-+typedef struct {
+diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
-+    MachineClass parent;
+index XXXXXXX..XXXXXXX 100644
-+    MuscaType type;
+--- a/target/arm/neon_helper.c
-+    uint32_t init_svtor;
++++ b/target/arm/neon_helper.c
-+    int sram_addr_width;
+@@ -XXX,XX +XXX,XX @@ NEON_VOP(abd_u32, neon_u32, 1)
-+    int num_irqs;
+     } else { \
-+} MuscaMachineClass;
+         dest = src1 << tmp; \
-+
+     }} while (0)
-+typedef struct {
+-NEON_VOP(shl_u8, neon_u8, 4)
-+    MachineState parent;
+ NEON_VOP(shl_u16, neon_u16, 2)
-+
+-NEON_VOP(shl_u32, neon_u32, 1)
-+    ARMSSE sse;
+ #undef NEON_FN
-+    SplitIRQ cpu_irq_splitter[MUSCA_NUMIRQ_MAX];
-+} MuscaMachineState;
+-uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
-+
+-{
-+#define TYPE_MUSCA_MACHINE "musca"
+-    int8_t shift = (int8_t)shiftop;
-+#define TYPE_MUSCA_A_MACHINE MACHINE_TYPE_NAME("musca-a")
+-    if (shift >= 64 || shift <= -64) {
-+#define TYPE_MUSCA_B1_MACHINE MACHINE_TYPE_NAME("musca-b1")
+-        val = 0;
-+
+-    } else if (shift < 0) {
-+#define MUSCA_MACHINE(obj) \
+-        val >>= -shift;
-+    OBJECT_CHECK(MuscaMachineState, obj, TYPE_MUSCA_MACHINE)
+-    } else {
-+#define MUSCA_MACHINE_GET_CLASS(obj) \
+-        val <<= shift;
-+    OBJECT_GET_CLASS(MuscaMachineClass, obj, TYPE_MUSCA_MACHINE)
+-    }
-+#define MUSCA_MACHINE_CLASS(klass) \
+-    return val;
-+    OBJECT_CLASS_CHECK(MuscaMachineClass, klass, TYPE_MUSCA_MACHINE)
+-}
-+
+-
-+/*
+ #define NEON_FN(dest, src1, src2) do { \
-+ * Main SYSCLK frequency in Hz
+     int8_t tmp; \
-+ * TODO this should really be different for the two cores, but we
+     tmp = (int8_t)src2; \
-+ * don't model that in our SSE-200 model yet.
+@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
-+ */
+     } else { \
-+#define SYSCLK_FRQ 40000000
+         dest = src1 << tmp; \
-+
+     }} while (0)
-+static void musca_init(MachineState *machine)
+-NEON_VOP(shl_s8, neon_s8, 4)
-+{
+ NEON_VOP(shl_s16, neon_s16, 2)
-+    MuscaMachineState *mms = MUSCA_MACHINE(machine);
+-NEON_VOP(shl_s32, neon_s32, 1)
-+    MuscaMachineClass *mmc = MUSCA_MACHINE_GET_CLASS(mms);
+ #undef NEON_FN
-+    MachineClass *mc = MACHINE_GET_CLASS(machine);
-+    MemoryRegion *system_memory = get_system_memory();
+-uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop)
-+    DeviceState *ssedev;
+-{
-+    int i;
+-    int8_t shift = (int8_t)shiftop;
-+
+-    int64_t val = valop;
-+    assert(mmc->num_irqs <= MUSCA_NUMIRQ_MAX);
+-    if (shift >= 64) {
-+
+-        val = 0;
-+    if (strcmp(machine->cpu_type, mc->default_cpu_type) != 0) {
+-    } else if (shift <= -64) {
-+        error_report("This board can only be used with CPU %s",
+-        val >>= 63;
-+                     mc->default_cpu_type);
+-    } else if (shift < 0) {
-+        exit(1);
+-        val >>= -shift;
-+    }
+-    } else {
-+
+-        val <<= shift;
-+    sysbus_init_child_obj(OBJECT(machine), "sse-200", &mms->sse,
+-    }
-+                          sizeof(mms->sse), TYPE_SSE200);
+-    return val;
-+    ssedev = DEVICE(&mms->sse);
+-}
-+    object_property_set_link(OBJECT(&mms->sse), OBJECT(system_memory),
+-
-+                             "memory", &error_fatal);
+ #define NEON_FN(dest, src1, src2) do { \
-+    qdev_prop_set_uint32(ssedev, "EXP_NUMIRQ", mmc->num_irqs);
+     int8_t tmp; \
-+    qdev_prop_set_uint32(ssedev, "init-svtor", mmc->init_svtor);
+     tmp = (int8_t)src2; \
-+    qdev_prop_set_uint32(ssedev, "SRAM_ADDR_WIDTH", mmc->sram_addr_width);
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
-+    qdev_prop_set_uint32(ssedev, "MAINCLK", SYSCLK_FRQ);
+index XXXXXXX..XXXXXXX 100644
-+    object_property_set_bool(OBJECT(&mms->sse), true, "realized",
+--- a/target/arm/translate-a64.c
-+                             &error_fatal);
++++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
          break;
      case 0x8: /* SSHL, USHL */
          if (u) {
 -            gen_helper_neon_shl_u64(tcg_rd, tcg_rn, tcg_rm);
 +            gen_ushl_i64(tcg_rd, tcg_rn, tcg_rm);
          } else {
 -            gen_helper_neon_shl_s64(tcg_rd, tcg_rn, tcg_rm);
 +            gen_sshl_i64(tcg_rd, tcg_rn, tcg_rm);
          }
          break;
      case 0x9: /* SQSHL, UQSHL */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                         is_q ? 16 : 8, vec_full_reg_size(s),
                         (u ? uqsub_op : sqsub_op) + size);
          return;
 +    case 0x08: /* SSHL, USHL */
 +        gen_gvec_op3(s, is_q, rd, rn, rm,
 +                     u ? &ushl_op[size] : &sshl_op[size]);
 +        return;
      case 0x0c: /* SMAX, UMAX */
          if (u) {
              gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                  genfn = fns[size][u];
                  break;
              }
 -            case 0x8: /* SSHL, USHL */
 -            {
 -                static NeonGenTwoOpFn * const fns[3][2] = {
 -                    { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 },
 -                    { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 },
 -                    { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 },
 -                };
 -                genfn = fns[size][u];
 -                break;
 -            }
              case 0x9: /* SQSHL, UQSHL */
              {
                  static NeonGenTwoOpEnvFn * const fns[3][2] = {
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_shift_narrow(int size, TCGv_i32 var, TCGv_i32 shift,
          if (u) {
              switch (size) {
              case 1: gen_helper_neon_shl_u16(var, var, shift); break;
 -            case 2: gen_helper_neon_shl_u32(var, var, shift); break;
 +            case 2: gen_ushl_i32(var, var, shift); break;
              default: abort();
              }
          } else {
              switch (size) {
              case 1: gen_helper_neon_shl_s16(var, var, shift); break;
 -            case 2: gen_helper_neon_shl_s32(var, var, shift); break;
 +            case 2: gen_sshl_i32(var, var, shift); break;
              default: abort();
              }
          }
@@ -XXX,XX +XXX,XX @@ const GVecGen3 cmtst_op[4] = {
        .vece = MO_64 },
  };
 +void gen_ushl_i32(TCGv_i32 dst, TCGv_i32 src, TCGv_i32 shift)
 +{
 +    TCGv_i32 lval = tcg_temp_new_i32();
 +    TCGv_i32 rval = tcg_temp_new_i32();
 +    TCGv_i32 lsh = tcg_temp_new_i32();
 +    TCGv_i32 rsh = tcg_temp_new_i32();
 +    TCGv_i32 zero = tcg_const_i32(0);
 +    TCGv_i32 max = tcg_const_i32(32);
 +
 +    /*
-+     * We need to create splitters to feed the IRQ inputs
++     * Rely on the TCG guarantee that out of range shifts produce
-+     * for each CPU in the SSE-200 from each device in the board.
++     * unspecified results, not undefined behaviour (i.e. no trap).
 +     * Discard out-of-range results after the fact.
 +     */
-+    for (i = 0; i < mmc->num_irqs; i++) {
++    tcg_gen_ext8s_i32(lsh, shift);
-+        char *name = g_strdup_printf("musca-irq-splitter%d", i);
++    tcg_gen_neg_i32(rsh, lsh);
-+        SplitIRQ *splitter = &mms->cpu_irq_splitter[i];
++    tcg_gen_shl_i32(lval, src, lsh);
-+
++    tcg_gen_shr_i32(rval, src, rsh);
-+        object_initialize_child(OBJECT(machine), name,
++    tcg_gen_movcond_i32(TCG_COND_LTU, dst, lsh, max, lval, zero);
-+                                splitter, sizeof(*splitter),
++    tcg_gen_movcond_i32(TCG_COND_LTU, dst, rsh, max, rval, dst);
-+                                TYPE_SPLIT_IRQ, &error_fatal, NULL);
++
-+        g_free(name);
++    tcg_temp_free_i32(lval);
-+
++    tcg_temp_free_i32(rval);
-+        object_property_set_int(OBJECT(splitter), 2, "num-lines",
++    tcg_temp_free_i32(lsh);
-+                                &error_fatal);
++    tcg_temp_free_i32(rsh);
-+        object_property_set_bool(OBJECT(splitter), true, "realized",
++    tcg_temp_free_i32(zero);
-+                                 &error_fatal);
++    tcg_temp_free_i32(max);
-+        qdev_connect_gpio_out(DEVICE(splitter), 0,
++}
-+                              qdev_get_gpio_in_named(ssedev, "EXP_IRQ", i));
++
-+        qdev_connect_gpio_out(DEVICE(splitter), 1,
++void gen_ushl_i64(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 shift)
-+                              qdev_get_gpio_in_named(ssedev,
++{
-+                                                     "EXP_CPU1_IRQ", i));
++    TCGv_i64 lval = tcg_temp_new_i64();
-+    }
++    TCGv_i64 rval = tcg_temp_new_i64();
-+
++    TCGv_i64 lsh = tcg_temp_new_i64();
-+    armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename, 0x2000000);
++    TCGv_i64 rsh = tcg_temp_new_i64();
-+}
++    TCGv_i64 zero = tcg_const_i64(0);
-+
++    TCGv_i64 max = tcg_const_i64(64);
-+static void musca_class_init(ObjectClass *oc, void *data)
++
 +{
 +    MachineClass *mc = MACHINE_CLASS(oc);
 +
 +    mc->default_cpus = 2;
 +    mc->min_cpus = mc->default_cpus;
 +    mc->max_cpus = mc->default_cpus;
 +    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
 +    mc->init = musca_init;
 +}
 +
 +static void musca_a_class_init(ObjectClass *oc, void *data)
 +{
 +    MachineClass *mc = MACHINE_CLASS(oc);
 +    MuscaMachineClass *mmc = MUSCA_MACHINE_CLASS(oc);
 +
 +    mc->desc = "ARM Musca-A board (dual Cortex-M33)";
 +    mmc->type = MUSCA_A;
 +    mmc->init_svtor = 0x10200000;
 +    mmc->sram_addr_width = 15;
 +    mmc->num_irqs = 64;
 +}
 +
 +static void musca_b1_class_init(ObjectClass *oc, void *data)
 +{
 +    MachineClass *mc = MACHINE_CLASS(oc);
 +    MuscaMachineClass *mmc = MUSCA_MACHINE_CLASS(oc);
 +
 +    mc->desc = "ARM Musca-B1 board (dual Cortex-M33)";
 +    mmc->type = MUSCA_B1;
 +    /*
-+     * This matches the DAPlink firmware which boots from QSPI. There
++     * Rely on the TCG guarantee that out of range shifts produce
-+     * is also a firmware blob which boots from the eFlash, which
++     * unspecified results, not undefined behaviour (i.e. no trap).
-+     * uses init_svtor = 0x1A000000. QEMU doesn't currently support that,
++     * Discard out-of-range results after the fact.
 +     * though we could in theory expose a machine property on the command
 +     * line to allow the user to request eFlash boot.
 +     */
-+    mmc->init_svtor = 0x10000000;
++    tcg_gen_ext8s_i64(lsh, shift);
-+    mmc->sram_addr_width = 17;
++    tcg_gen_neg_i64(rsh, lsh);
-+    mmc->num_irqs = 96;
++    tcg_gen_shl_i64(lval, src, lsh);
-+}
++    tcg_gen_shr_i64(rval, src, rsh);
-+
++    tcg_gen_movcond_i64(TCG_COND_LTU, dst, lsh, max, lval, zero);
-+static const TypeInfo musca_info = {
++    tcg_gen_movcond_i64(TCG_COND_LTU, dst, rsh, max, rval, dst);
-+    .name = TYPE_MUSCA_MACHINE,
++
-+    .parent = TYPE_MACHINE,
++    tcg_temp_free_i64(lval);
-+    .abstract = true,
++    tcg_temp_free_i64(rval);
-+    .instance_size = sizeof(MuscaMachineState),
++    tcg_temp_free_i64(lsh);
-+    .class_size = sizeof(MuscaMachineClass),
++    tcg_temp_free_i64(rsh);
-+    .class_init = musca_class_init,
++    tcg_temp_free_i64(zero);
 +    tcg_temp_free_i64(max);
 +}
 +
 +static void gen_ushl_vec(unsigned vece, TCGv_vec dst,
 +                         TCGv_vec src, TCGv_vec shift)
 +{
 +    TCGv_vec lval = tcg_temp_new_vec_matching(dst);
 +    TCGv_vec rval = tcg_temp_new_vec_matching(dst);
 +    TCGv_vec lsh = tcg_temp_new_vec_matching(dst);
 +    TCGv_vec rsh = tcg_temp_new_vec_matching(dst);
 +    TCGv_vec msk, max;
 +
 +    tcg_gen_neg_vec(vece, rsh, shift);
 +    if (vece == MO_8) {
 +        tcg_gen_mov_vec(lsh, shift);
 +    } else {
 +        msk = tcg_temp_new_vec_matching(dst);
 +        tcg_gen_dupi_vec(vece, msk, 0xff);
 +        tcg_gen_and_vec(vece, lsh, shift, msk);
 +        tcg_gen_and_vec(vece, rsh, rsh, msk);
 +        tcg_temp_free_vec(msk);
 +    }
 +
 +    /*
 +     * Rely on the TCG guarantee that out of range shifts produce
 +     * unspecified results, not undefined behaviour (i.e. no trap).
 +     * Discard out-of-range results after the fact.
 +     */
 +    tcg_gen_shlv_vec(vece, lval, src, lsh);
 +    tcg_gen_shrv_vec(vece, rval, src, rsh);
 +
 +    max = tcg_temp_new_vec_matching(dst);
 +    tcg_gen_dupi_vec(vece, max, 8 << vece);
 +
 +    /*
 +     * The choices of LT (signed) and GEU (unsigned) are biased toward
 +     * the instructions of the x86_64 host.  For MO_8, the whole byte
 +     * is significant so we must use an unsigned compare; otherwise we
 +     * have already masked to a byte and so a signed compare works.
 +     * Other tcg hosts have a full set of comparisons and do not care.
 +     */
 +    if (vece == MO_8) {
 +        tcg_gen_cmp_vec(TCG_COND_GEU, vece, lsh, lsh, max);
 +        tcg_gen_cmp_vec(TCG_COND_GEU, vece, rsh, rsh, max);
 +        tcg_gen_andc_vec(vece, lval, lval, lsh);
 +        tcg_gen_andc_vec(vece, rval, rval, rsh);
 +    } else {
 +        tcg_gen_cmp_vec(TCG_COND_LT, vece, lsh, lsh, max);
 +        tcg_gen_cmp_vec(TCG_COND_LT, vece, rsh, rsh, max);
 +        tcg_gen_and_vec(vece, lval, lval, lsh);
 +        tcg_gen_and_vec(vece, rval, rval, rsh);
 +    }
 +    tcg_gen_or_vec(vece, dst, lval, rval);
 +
 +    tcg_temp_free_vec(max);
 +    tcg_temp_free_vec(lval);
 +    tcg_temp_free_vec(rval);
 +    tcg_temp_free_vec(lsh);
 +    tcg_temp_free_vec(rsh);
 +}
 +
 +static const TCGOpcode ushl_list[] = {
 +    INDEX_op_neg_vec, INDEX_op_shlv_vec,
 +    INDEX_op_shrv_vec, INDEX_op_cmp_vec, 0
 +};
 +
-+static const TypeInfo musca_a_info = {
++const GVecGen3 ushl_op[4] = {
-+    .name = TYPE_MUSCA_A_MACHINE,
++    { .fniv = gen_ushl_vec,
-+    .parent = TYPE_MUSCA_MACHINE,
++      .fno = gen_helper_gvec_ushl_b,
-+    .class_init = musca_a_class_init,
++      .opt_opc = ushl_list,
 +      .vece = MO_8 },
 +    { .fniv = gen_ushl_vec,
 +      .fno = gen_helper_gvec_ushl_h,
 +      .opt_opc = ushl_list,
 +      .vece = MO_16 },
 +    { .fni4 = gen_ushl_i32,
 +      .fniv = gen_ushl_vec,
 +      .opt_opc = ushl_list,
 +      .vece = MO_32 },
 +    { .fni8 = gen_ushl_i64,
 +      .fniv = gen_ushl_vec,
 +      .opt_opc = ushl_list,
 +      .vece = MO_64 },
 +};
 +
-+static const TypeInfo musca_b1_info = {
++void gen_sshl_i32(TCGv_i32 dst, TCGv_i32 src, TCGv_i32 shift)
-+    .name = TYPE_MUSCA_B1_MACHINE,
++{
-+    .parent = TYPE_MUSCA_MACHINE,
++    TCGv_i32 lval = tcg_temp_new_i32();
-+    .class_init = musca_b1_class_init,
++    TCGv_i32 rval = tcg_temp_new_i32();
 +    TCGv_i32 lsh = tcg_temp_new_i32();
 +    TCGv_i32 rsh = tcg_temp_new_i32();
 +    TCGv_i32 zero = tcg_const_i32(0);
 +    TCGv_i32 max = tcg_const_i32(31);
 +
 +    /*
 +     * Rely on the TCG guarantee that out of range shifts produce
 +     * unspecified results, not undefined behaviour (i.e. no trap).
 +     * Discard out-of-range results after the fact.
 +     */
 +    tcg_gen_ext8s_i32(lsh, shift);
 +    tcg_gen_neg_i32(rsh, lsh);
 +    tcg_gen_shl_i32(lval, src, lsh);
 +    tcg_gen_umin_i32(rsh, rsh, max);
 +    tcg_gen_sar_i32(rval, src, rsh);
 +    tcg_gen_movcond_i32(TCG_COND_LEU, lval, lsh, max, lval, zero);
 +    tcg_gen_movcond_i32(TCG_COND_LT, dst, lsh, zero, rval, lval);
 +
 +    tcg_temp_free_i32(lval);
 +    tcg_temp_free_i32(rval);
 +    tcg_temp_free_i32(lsh);
 +    tcg_temp_free_i32(rsh);
 +    tcg_temp_free_i32(zero);
 +    tcg_temp_free_i32(max);
 +}
 +
 +void gen_sshl_i64(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 shift)
 +{
 +    TCGv_i64 lval = tcg_temp_new_i64();
 +    TCGv_i64 rval = tcg_temp_new_i64();
 +    TCGv_i64 lsh = tcg_temp_new_i64();
 +    TCGv_i64 rsh = tcg_temp_new_i64();
 +    TCGv_i64 zero = tcg_const_i64(0);
 +    TCGv_i64 max = tcg_const_i64(63);
 +
 +    /*
 +     * Rely on the TCG guarantee that out of range shifts produce
 +     * unspecified results, not undefined behaviour (i.e. no trap).
 +     * Discard out-of-range results after the fact.
 +     */
 +    tcg_gen_ext8s_i64(lsh, shift);
 +    tcg_gen_neg_i64(rsh, lsh);
 +    tcg_gen_shl_i64(lval, src, lsh);
 +    tcg_gen_umin_i64(rsh, rsh, max);
 +    tcg_gen_sar_i64(rval, src, rsh);
 +    tcg_gen_movcond_i64(TCG_COND_LEU, lval, lsh, max, lval, zero);
 +    tcg_gen_movcond_i64(TCG_COND_LT, dst, lsh, zero, rval, lval);
 +
 +    tcg_temp_free_i64(lval);
 +    tcg_temp_free_i64(rval);
 +    tcg_temp_free_i64(lsh);
 +    tcg_temp_free_i64(rsh);
 +    tcg_temp_free_i64(zero);
 +    tcg_temp_free_i64(max);
 +}
 +
 +static void gen_sshl_vec(unsigned vece, TCGv_vec dst,
 +                         TCGv_vec src, TCGv_vec shift)
 +{
 +    TCGv_vec lval = tcg_temp_new_vec_matching(dst);
 +    TCGv_vec rval = tcg_temp_new_vec_matching(dst);
 +    TCGv_vec lsh = tcg_temp_new_vec_matching(dst);
 +    TCGv_vec rsh = tcg_temp_new_vec_matching(dst);
 +    TCGv_vec tmp = tcg_temp_new_vec_matching(dst);
 +
 +    /*
 +     * Rely on the TCG guarantee that out of range shifts produce
 +     * unspecified results, not undefined behaviour (i.e. no trap).
 +     * Discard out-of-range results after the fact.
 +     */
 +    tcg_gen_neg_vec(vece, rsh, shift);
 +    if (vece == MO_8) {
 +        tcg_gen_mov_vec(lsh, shift);
 +    } else {
 +        tcg_gen_dupi_vec(vece, tmp, 0xff);
 +        tcg_gen_and_vec(vece, lsh, shift, tmp);
 +        tcg_gen_and_vec(vece, rsh, rsh, tmp);
 +    }
 +
 +    /* Bound rsh so out of bound right shift gets -1.  */
 +    tcg_gen_dupi_vec(vece, tmp, (8 << vece) - 1);
 +    tcg_gen_umin_vec(vece, rsh, rsh, tmp);
 +    tcg_gen_cmp_vec(TCG_COND_GT, vece, tmp, lsh, tmp);
 +
 +    tcg_gen_shlv_vec(vece, lval, src, lsh);
 +    tcg_gen_sarv_vec(vece, rval, src, rsh);
 +
 +    /* Select in-bound left shift.  */
 +    tcg_gen_andc_vec(vece, lval, lval, tmp);
 +
 +    /* Select between left and right shift.  */
 +    if (vece == MO_8) {
 +        tcg_gen_dupi_vec(vece, tmp, 0);
 +        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, dst, lsh, tmp, rval, lval);
 +    } else {
 +        tcg_gen_dupi_vec(vece, tmp, 0x80);
 +        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, dst, lsh, tmp, lval, rval);
 +    }
 +
 +    tcg_temp_free_vec(lval);
 +    tcg_temp_free_vec(rval);
 +    tcg_temp_free_vec(lsh);
 +    tcg_temp_free_vec(rsh);
 +    tcg_temp_free_vec(tmp);
 +}
 +
 +static const TCGOpcode sshl_list[] = {
 +    INDEX_op_neg_vec, INDEX_op_umin_vec, INDEX_op_shlv_vec,
 +    INDEX_op_sarv_vec, INDEX_op_cmp_vec, INDEX_op_cmpsel_vec, 0
 +};
 +
-+static void musca_machine_init(void)
++const GVecGen3 sshl_op[4] = {
-+{
++    { .fniv = gen_sshl_vec,
-+    type_register_static(&musca_info);
++      .fno = gen_helper_gvec_sshl_b,
-+    type_register_static(&musca_a_info);
++      .opt_opc = sshl_list,
-+    type_register_static(&musca_b1_info);
++      .vece = MO_8 },
-+}
++    { .fniv = gen_sshl_vec,
-+
++      .fno = gen_helper_gvec_sshl_h,
-+type_init(musca_machine_init);
++      .opt_opc = sshl_list,
-diff --git a/MAINTAINERS b/MAINTAINERS
++      .vece = MO_16 },
 +    { .fni4 = gen_sshl_i32,
 +      .fniv = gen_sshl_vec,
 +      .opt_opc = sshl_list,
 +      .vece = MO_32 },
 +    { .fni8 = gen_sshl_i64,
 +      .fniv = gen_sshl_vec,
 +      .opt_opc = sshl_list,
 +      .vece = MO_64 },
 +};
 +
  static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
                            TCGv_vec a, TCGv_vec b)
  {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                    vec_size, vec_size);
              }
              return 0;
 +
 +        case NEON_3R_VSHL:
 +            /* Note the operation is vshl vd,vm,vn */
 +            tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
 +                           u ? &ushl_op[size] : &sshl_op[size]);
 +            return 0;
          }
          if (size == 3) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  neon_load_reg64(cpu_V0, rn + pass);
                  neon_load_reg64(cpu_V1, rm + pass);
                  switch (op) {
 -                case NEON_3R_VSHL:
 -                    if (u) {
 -                        gen_helper_neon_shl_u64(cpu_V0, cpu_V1, cpu_V0);
 -                    } else {
 -                        gen_helper_neon_shl_s64(cpu_V0, cpu_V1, cpu_V0);
 -                    }
 -                    break;
                  case NEON_3R_VQSHL:
                      if (u) {
                          gen_helper_neon_qshl_u64(cpu_V0, cpu_env,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
          }
          pairwise = 0;
          switch (op) {
 -        case NEON_3R_VSHL:
          case NEON_3R_VQSHL:
          case NEON_3R_VRSHL:
          case NEON_3R_VQRSHL:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
          case NEON_3R_VHSUB:
              GEN_NEON_INTEGER_OP(hsub);
              break;
 -        case NEON_3R_VSHL:
 -            GEN_NEON_INTEGER_OP(shl);
 -            break;
          case NEON_3R_VQSHL:
              GEN_NEON_INTEGER_OP_ENV(qshl);
              break;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              }
                          } else {
                              if (input_unsigned) {
 -                                gen_helper_neon_shl_u64(cpu_V0, in, tmp64);
 +                                gen_ushl_i64(cpu_V0, in, tmp64);
                              } else {
 -                                gen_helper_neon_shl_s64(cpu_V0, in, tmp64);
 +                                gen_sshl_i64(cpu_V0, in, tmp64);
                              }
                          }
                          tmp = tcg_temp_new_i32();
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/MAINTAINERS
+--- a/target/arm/vec_helper.c
-+++ b/MAINTAINERS
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ F: include/hw/misc/iotkit-sysinfo.h
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
- F: hw/misc/armsse-cpuid.c
+     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status, desc,
- F: include/hw/misc/armsse-cpuid.h
+                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
+ }
-+Musca
++
-+M: Peter Maydell <peter.maydell@linaro.org>
++void HELPER(gvec_sshl_b)(void *vd, void *vn, void *vm, uint32_t desc)
-+L: qemu-arm@nongnu.org
++{
-+S: Maintained
++    intptr_t i, opr_sz = simd_oprsz(desc);
-+F: hw/arm/musca.c
++    int8_t *d = vd, *n = vn, *m = vm;
 +
- Musicpal
++    for (i = 0; i < opr_sz; ++i) {
- M: Jan Kiszka <jan.kiszka@web.de>
++        int8_t mm = m[i];
- M: Peter Maydell <peter.maydell@linaro.org>
++        int8_t nn = n[i];
-diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
++        int8_t res = 0;
-index XXXXXXX..XXXXXXX 100644
++        if (mm >= 0) {
---- a/default-configs/arm-softmmu.mak
++            if (mm < 8) {
-+++ b/default-configs/arm-softmmu.mak
++                res = nn << mm;
-@@ -XXX,XX +XXX,XX @@ CONFIG_TUSB6010=y
++            }
- CONFIG_IMX=y
++        } else {
- CONFIG_MAINSTONE=y
++            res = nn >> (mm > -8 ? -mm : 7);
- CONFIG_MPS2=y
++        }
-+CONFIG_MUSCA=y
++        d[i] = res;
- CONFIG_NSERIES=y
++    }
- CONFIG_RASPI=y
++    clear_tail(d, opr_sz, simd_maxsz(desc));
- CONFIG_REALVIEW=y
++}
 +
 +void HELPER(gvec_sshl_h)(void *vd, void *vn, void *vm, uint32_t desc)
 +{
 +    intptr_t i, opr_sz = simd_oprsz(desc);
 +    int16_t *d = vd, *n = vn, *m = vm;
 +
 +    for (i = 0; i < opr_sz / 2; ++i) {
 +        int8_t mm = m[i];   /* only 8 bits of shift are significant */
 +        int16_t nn = n[i];
 +        int16_t res = 0;
 +        if (mm >= 0) {
 +            if (mm < 16) {
 +                res = nn << mm;
 +            }
 +        } else {
 +            res = nn >> (mm > -16 ? -mm : 15);
 +        }
 +        d[i] = res;
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 +
 +void HELPER(gvec_ushl_b)(void *vd, void *vn, void *vm, uint32_t desc)
 +{
 +    intptr_t i, opr_sz = simd_oprsz(desc);
 +    uint8_t *d = vd, *n = vn, *m = vm;
 +
 +    for (i = 0; i < opr_sz; ++i) {
 +        int8_t mm = m[i];
 +        uint8_t nn = n[i];
 +        uint8_t res = 0;
 +        if (mm >= 0) {
 +            if (mm < 8) {
 +                res = nn << mm;
 +            }
 +        } else {
 +            if (mm > -8) {
 +                res = nn >> -mm;
 +            }
 +        }
 +        d[i] = res;
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 +
 +void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc)
 +{
 +    intptr_t i, opr_sz = simd_oprsz(desc);
 +    uint16_t *d = vd, *n = vn, *m = vm;
 +
 +    for (i = 0; i < opr_sz / 2; ++i) {
 +        int8_t mm = m[i];   /* only 8 bits of shift are significant */
 +        uint16_t nn = n[i];
 +        uint16_t res = 0;
 +        if (mm >= 0) {
 +            if (mm < 16) {
 +                res = nn << mm;
 +            }
 +        } else {
 +            if (mm > -16) {
 +                res = nn >> -mm;
 +            }
 +        }
 +        d[i] = res;
 +    }
 +    clear_tail(d, opr_sz, simd_maxsz(desc));
 +}
 --
 .20.1
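
The "compute both shifts, then discard the out-of-range one" pattern used by gen_ushl_i32 and gen_sshl_i32 above can be modelled in plain C. The sketch below is illustrative only and is not part of the patch: the names ushl32_ref, sshl32_ref, ushl32_select, sshl32_select and check_shift_models are hypothetical. The reference functions follow the shape of the removed neon_shl_u64/neon_shl_s64 helpers, narrowed to 32 bits. Because C, unlike TCG, makes out-of-range shifts undefined, the model masks the shift count to obtain some defined but meaningless value before discarding it. It also assumes the usual two's-complement narrowing and arithmetic right shift of signed values, as the helpers themselves do.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: only the low byte of the shift operand is significant,
 * interpreted as signed; negative counts shift right. */
static uint32_t ushl32_ref(uint32_t val, uint32_t shiftop)
{
    int8_t shift = (int8_t)shiftop;
    if (shift >= 32 || shift <= -32) {
        return 0;
    }
    return shift < 0 ? val >> -shift : val << shift;
}

static int32_t sshl32_ref(int32_t val, uint32_t shiftop)
{
    int8_t shift = (int8_t)shiftop;
    if (shift >= 32) {
        return 0;
    } else if (shift <= -32) {
        return val >> 31;               /* all copies of the sign bit */
    } else if (shift < 0) {
        return val >> -shift;
    }
    return (int32_t)((uint32_t)val << shift);
}

/* Model of the gen_ushl_i32 op sequence: ext8s, neg, shl, shr, 2x movcond. */
static uint32_t ushl32_select(uint32_t val, uint32_t shiftop)
{
    uint32_t lsh = (uint32_t)(int32_t)(int8_t)shiftop;   /* ext8s */
    uint32_t rsh = 0u - lsh;                             /* neg */
    uint32_t lval = val << (lsh & 31);  /* masked: stands in for "unspecified" */
    uint32_t rval = val >> (rsh & 31);
    uint32_t dst = lsh < 32 ? lval : 0; /* movcond LTU lsh, max */
    return rsh < 32 ? rval : dst;       /* movcond LTU rsh, max */
}

/* Model of the gen_sshl_i32 op sequence: the right shift count is
 * clamped with umin so that large negative counts yield the sign fill. */
static int32_t sshl32_select(int32_t val, uint32_t shiftop)
{
    int32_t lsh = (int8_t)shiftop;                       /* ext8s */
    uint32_t rsh = 0u - (uint32_t)lsh;                   /* neg */
    uint32_t lval = (uint32_t)val << ((uint32_t)lsh & 31);
    int32_t rval;

    rsh = rsh < 31 ? rsh : 31;                           /* umin */
    rval = val >> rsh;                                   /* sar */
    lval = (uint32_t)lsh <= 31 ? lval : 0;               /* movcond LEU */
    return lsh < 0 ? rval : (int32_t)lval;               /* movcond LT */
}

/* Compare model and reference for every possible shift byte. */
static void check_shift_models(void)
{
    static const uint32_t vals[] = {
        0, 1, 0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    for (unsigned v = 0; v < sizeof(vals) / sizeof(vals[0]); v++) {
        for (uint32_t s = 0; s < 256; s++) {
            assert(ushl32_select(vals[v], s) == ushl32_ref(vals[v], s));
            assert(sshl32_select((int32_t)vals[v], s) ==
                   sshl32_ref((int32_t)vals[v], s));
        }
    }
}
```

Calling check_shift_models() from a test program exercises all 256 shift-byte values for a few inputs. The select form avoids branches, which is what lets the TCG version be emitted inline instead of calling a helper.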

-New patch
+[PULL 39/52] target/arm: Convert PMUL.8 to gvec
+From: Richard Henderson <richard.henderson@linaro.org>
+The gvec form will be needed for implementing SVE2.
+Extend the implementation to operate on uint64_t instead of uint32_t.
+Use a counted inner loop instead of terminating when op1 goes to zero,
+looking toward the required implementation for ARMv8.4-DIT.
+Tested-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200216214232.4230-3-richard.henderson@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/helper.h        |  3 ++-
+ target/arm/neon_helper.c   | 22 ----------------------
+ target/arm/translate-a64.c | 10 +++-------
+ target/arm/translate.c     | 11 ++++-------
+ target/arm/vec_helper.c    | 30 ++++++++++++++++++++++++++++++
+files changed, 39 insertions(+), 37 deletions(-)
+diff --git a/target/arm/helper.h b/target/arm/helper.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.h
++++ b/target/arm/helper.h
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(neon_sub_u8, i32, i32, i32)
+ DEF_HELPER_2(neon_sub_u16, i32, i32, i32)
+ DEF_HELPER_2(neon_mul_u8, i32, i32, i32)
+ DEF_HELPER_2(neon_mul_u16, i32, i32, i32)
+-DEF_HELPER_2(neon_mul_p8, i32, i32, i32)
+ DEF_HELPER_2(neon_mull_p8, i64, i32, i32)
+ DEF_HELPER_2(neon_tst_u8, i32, i32, i32)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+ DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++DEF_HELPER_FLAGS_4(gvec_pmul_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
++
+ #ifdef TARGET_AARCH64
+ #include "helper-a64.h"
+ #include "helper-sve.h"
+diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/neon_helper.c
++++ b/target/arm/neon_helper.c
+@@ -XXX,XX +XXX,XX @@ NEON_VOP(mul_u16, neon_u16, 2)
+ /* Polynomial multiplication is like integer multiplication except the
+    partial products are XORed, not added.  */
+-uint32_t HELPER(neon_mul_p8)(uint32_t op1, uint32_t op2)
+-{
+-    uint32_t mask;
+-    uint32_t result;
+-    result = 0;
+-    while (op1) {
+-        mask = 0;
+-        if (op1 & 1)
+-            mask |= 0xff;
+-        if (op1 & (1 << 8))
+-            mask |= (0xff << 8);
+-        if (op1 & (1 << 16))
+-            mask |= (0xff << 16);
+-        if (op1 & (1 << 24))
+-            mask |= (0xff << 24);
+-        result ^= op2 & mask;
+-        op1 = (op1 >> 1) & 0x7f7f7f7f;
+-        op2 = (op2 << 1) & 0xfefefefe;
+-    }
+-    return result;
+-}
+-
+ uint64_t HELPER(neon_mull_p8)(uint32_t op1, uint32_t op2)
+ {
+     uint64_t result = 0;
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a64.c
++++ b/target/arm/translate-a64.c
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
+     case 0x13: /* MUL, PMUL */
+         if (!u) { /* MUL */
+             gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_mul, size);
+-            return;
++        } else {  /* PMUL */
++            gen_gvec_op3_ool(s, is_q, rd, rn, rm, 0, gen_helper_gvec_pmul_b);
+         }
+-        break;
++        return;
+     case 0x12: /* MLA, MLS */
+         if (u) {
+             gen_gvec_op3(s, is_q, rd, rn, rm, &mls_op[size]);
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
+                 genfn = fns[size][u];
+                 break;
+             }
+-            case 0x13: /* MUL, PMUL */
+-                assert(u); /* PMUL */
+-                assert(size == 0);
+-                genfn = gen_helper_neon_mul_p8;
+-                break;
+             case 0x16: /* SQDMULH, SQRDMULH */
+             {
+                 static NeonGenTwoOpEnvFn * const fns[2][2] = {
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+         case NEON_3R_VMUL: /* VMUL */
+             if (u) {
+-                /* Polynomial case allows only P8 and is handled below.  */
++                /* Polynomial case allows only P8.  */
+                 if (size != 0) {
+                     return 1;
+                 }
++                tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
++                                   0, gen_helper_gvec_pmul_b);
+             } else {
+                 tcg_gen_gvec_mul(size, rd_ofs, rn_ofs, rm_ofs,
+                                  vec_size, vec_size);
+-                return 0;
+             }
+-            break;
++            return 0;
+         case NEON_3R_VML: /* VMLA, VMLS */
+             tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, vec_size, vec_size,
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
+             tmp2 = neon_load_reg(rd, pass);
+             gen_neon_add(size, tmp, tmp2);
+             break;
+-        case NEON_3R_VMUL:
+-            /* VMUL.P8; other cases already eliminated.  */
+-            gen_helper_neon_mul_p8(tmp, tmp, tmp2);
+-            break;
+         case NEON_3R_VPMAX:
+             GEN_NEON_INTEGER_OP(pmax);
+             break;
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/vec_helper.c
++++ b/target/arm/vec_helper.c
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc)
+     }
+     clear_tail(d, opr_sz, simd_maxsz(desc));
+ }
++
++/*
++ * 8x8->8 polynomial multiply.
++ *
++ * Polynomial multiplication is like integer multiplication except the
++ * partial products are XORed, not added.
++ *
++ * TODO: expose this as a generic vector operation, as it is a common
++ * crypto building block.
++ */
++void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc)
++{
++    intptr_t i, j, opr_sz = simd_oprsz(desc);
++    uint64_t *d = vd, *n = vn, *m = vm;
++
++    for (i = 0; i < opr_sz / 8; ++i) {
++        uint64_t nn = n[i];
++        uint64_t mm = m[i];
++        uint64_t rr = 0;
++
++        for (j = 0; j < 8; ++j) {
++            uint64_t mask = (nn & 0x0101010101010101ull) * 0xff;
++            rr ^= mm & mask;
++            mm = (mm << 1) & 0xfefefefefefefefeull;
++            nn >>= 1;
++        }
++        d[i] = rr;
++    }
++    clear_tail(d, opr_sz, simd_maxsz(desc));
++}
+--
+.20.1
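
The counted inner loop of gvec_pmul_b above processes eight byte lanes per uint64_t. A minimal standalone sketch of that loop, checked against a per-byte carry-less multiply, might look as follows. The names pmul8_ref, pmul8x8 and check_pmul are hypothetical and exist only for this illustration; the loop body is copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane reference: 8x8->8 polynomial (carry-less) multiply.
 * Partial products are XORed, and bits above bit 7 are dropped. */
static uint8_t pmul8_ref(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    for (int j = 0; j < 8; j++) {
        if (a & (1u << j)) {
            r ^= (uint8_t)(b << j);
        }
    }
    return r;
}

/* Eight lanes at once, following the inner loop of gvec_pmul_b. */
static uint64_t pmul8x8(uint64_t nn, uint64_t mm)
{
    uint64_t rr = 0;
    for (int j = 0; j < 8; j++) {
        /* Bit 0 of each lane, multiplied by 0xff, becomes a 0x00 or
         * 0xff byte mask; no carry can cross into the next lane. */
        uint64_t mask = (nn & 0x0101010101010101ull) * 0xff;
        rr ^= mm & mask;
        /* Shift every lane left by one, dropping the bit that would
         * otherwise leak into the neighbouring lane. */
        mm = (mm << 1) & 0xfefefefefefefefeull;
        nn >>= 1;
    }
    return rr;
}

/* Compare each lane of the packed result with the per-byte reference. */
static void check_pmul(uint64_t n, uint64_t m)
{
    uint64_t r = pmul8x8(n, m);
    for (int k = 0; k < 8; k++) {
        uint8_t a = (uint8_t)(n >> (8 * k));
        uint8_t b = (uint8_t)(m >> (8 * k));
        assert((uint8_t)(r >> (8 * k)) == pmul8_ref(a, b));
    }
}
```

The loop always runs eight iterations regardless of the operand values, unlike the removed neon_mul_p8, which stopped once op1 reached zero. That fixed trip count is the property the commit message points to when it mentions ARMv8.4-DIT.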

-[Qemu-devel] [PULL 07/21] target/arm: Implement ARMv8.3-JSConv
+[PULL 40/52] target/arm: Convert PMULL.64 to gvec
 From: Richard Henderson <richard.henderson@linaro.org>
+The gvec form will be needed for implementing SVE2.
+Tested-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190215192302.27855-5-richard.henderson@linaro.org
+Message-id: 20200216214232.4230-4-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 [PMM: fixed a couple of comment typos]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h           | 10 +++++
+ target/arm/helper.h        |  4 +---
- target/arm/helper.h        |  3 ++
+ target/arm/neon_helper.c   | 30 ------------------------------
- target/arm/cpu.c           |  1 +
+ target/arm/translate-a64.c | 28 +++-------------------------
- target/arm/cpu64.c         |  2 +
+ target/arm/translate.c     | 16 ++--------------
- target/arm/translate-a64.c | 26 +++++++++++
+ target/arm/vec_helper.c    | 33 +++++++++++++++++++++++++++++++++
- target/arm/translate.c     | 10 +++++
+files changed, 39 insertions(+), 72 deletions(-)
  target/arm/vfp_helper.c    | 88 ++++++++++++++++++++++++++++++++++++++
 files changed, 140 insertions(+)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
-+++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_vcma(const ARMISARegisters *id)
-     return FIELD_EX32(id->id_isar5, ID_ISAR5, VCMA) != 0;
- }
-+static inline bool isar_feature_aa32_jscvt(const ARMISARegisters *id)
-+{
-+    return FIELD_EX32(id->id_isar6, ID_ISAR6, JSCVT) != 0;
-+}
-+
- static inline bool isar_feature_aa32_dp(const ARMISARegisters *id)
- {
-     return FIELD_EX32(id->id_isar6, ID_ISAR6, DP) != 0;
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_dp(const ARMISARegisters *id)
-     return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, DP) != 0;
- }
-+static inline bool isar_feature_aa64_jscvt(const ARMISARegisters *id)
-+{
-+    return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, JSCVT) != 0;
-+}
-+
- static inline bool isar_feature_aa64_fcma(const ARMISARegisters *id)
- {
-     return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FCMA) != 0;
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_2(rintd_exact, TCG_CALL_NO_RWG, f64, f64, ptr)
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(crc32, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
- DEF_HELPER_FLAGS_2(rints, TCG_CALL_NO_RWG, f32, f32, ptr)
+ DEF_HELPER_FLAGS_3(crc32c, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
- DEF_HELPER_FLAGS_2(rintd, TCG_CALL_NO_RWG, f64, f64, ptr)
+ DEF_HELPER_2(dc_zva, void, env, i64)
-+DEF_HELPER_FLAGS_2(vjcvt, TCG_CALL_NO_RWG, i32, f64, env)
+-DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
-+DEF_HELPER_FLAGS_2(fjcvtzs, TCG_CALL_NO_RWG, i64, f64, ptr)
+-DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
-+
+-
- /* neon_helper.c */
+ DEF_HELPER_FLAGS_5(gvec_qrdmlah_s16, TCG_CALL_NO_RWG,
- DEF_HELPER_FLAGS_3(neon_qadd_u8, TCG_CALL_NO_RWG, i32, env, i32, i32)
+                    void, ptr, ptr, ptr, ptr, i32)
- DEF_HELPER_FLAGS_3(neon_qadd_s8, TCG_CALL_NO_RWG, i32, env, i32, i32)
+ DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s16, TCG_CALL_NO_RWG,
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(gvec_pmul_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(gvec_pmull_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
  #ifdef TARGET_AARCH64
  #include "helper-a64.h"
 diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/target/arm/neon_helper.c
-+++ b/target/arm/cpu.c
++++ b/target/arm/neon_helper.c
-@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ void HELPER(neon_zip16)(void *vd, void *vm)
-             cpu->isar.id_isar5 = t;
+     rm[0] = m0;
+     rd[0] = d0;
-             t = cpu->isar.id_isar6;
+ }
-+            t = FIELD_DP32(t, ID_ISAR6, JSCVT, 1);
+-
-             t = FIELD_DP32(t, ID_ISAR6, DP, 1);
+-/* Helper function for 64 bit polynomial multiply case:
-             cpu->isar.id_isar6 = t;
+- * perform PolynomialMult(op1, op2) and return either the top or
+- * bottom half of the 128 bit result.
-diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
+- */
-index XXXXXXX..XXXXXXX 100644
+-uint64_t HELPER(neon_pmull_64_lo)(uint64_t op1, uint64_t op2)
---- a/target/arm/cpu64.c
+-{
-+++ b/target/arm/cpu64.c
+-    int bitnum;
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
+-    uint64_t res = 0;
-         cpu->isar.id_aa64isar0 = t;
+-
+-    for (bitnum = 0; bitnum < 64; bitnum++) {
-         t = cpu->isar.id_aa64isar1;
+-        if (op1 & (1ULL << bitnum)) {
-+        t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);
+-            res ^= op2 << bitnum;
-         t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
+-        }
-         t = FIELD_DP64(t, ID_AA64ISAR1, APA, 1); /* PAuth, architected only */
+-    }
-         t = FIELD_DP64(t, ID_AA64ISAR1, API, 0);
+-    return res;
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
+-}
-         cpu->isar.id_isar5 = u;
+-uint64_t HELPER(neon_pmull_64_hi)(uint64_t op1, uint64_t op2)
+-{
-         u = cpu->isar.id_isar6;
+-    int bitnum;
-+        u = FIELD_DP32(u, ID_ISAR6, JSCVT, 1);
+-    uint64_t res = 0;
-         u = FIELD_DP32(u, ID_ISAR6, DP, 1);
+-
-         cpu->isar.id_isar6 = u;
+-    /* bit 0 of op1 can't influence the high 64 bits at all */
+-    for (bitnum = 1; bitnum < 64; bitnum++) {
 -        if (op1 & (1ULL << bitnum)) {
 -            res ^= op2 >> (64 - bitnum);
 -        }
 -    }
 -    return res;
 -}
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
+@@ -XXX,XX +XXX,XX @@ static void handle_3rd_narrowing(DisasContext *s, int is_q, int is_u, int size,
-     }
+     clear_vec_high(s, is_q, rd);
  }
-+static void handle_fjcvtzs(DisasContext *s, int rd, int rn)
+-static void handle_pmull_64(DisasContext *s, int is_q, int rd, int rn, int rm)
-+{
+-{
-+    TCGv_i64 t = read_fp_dreg(s, rn);
+-    /* PMULL of 64 x 64 -> 128 is an odd special case because it
-+    TCGv_ptr fpstatus = get_fpstatus_ptr(false);
+-     * is the only three-reg-diff instruction which produces a
-+
+-     * 128-bit wide result from a single operation. However since
-+    gen_helper_fjcvtzs(t, t, fpstatus);
+-     * it's possible to calculate the two halves more or less
-+
+-     * separately we just use two helper calls.
-+    tcg_temp_free_ptr(fpstatus);
+-     */
-+
+-    TCGv_i64 tcg_op1 = tcg_temp_new_i64();
-+    tcg_gen_ext32u_i64(cpu_reg(s, rd), t);
+-    TCGv_i64 tcg_op2 = tcg_temp_new_i64();
-+    tcg_gen_extrh_i64_i32(cpu_ZF, t);
+-    TCGv_i64 tcg_res = tcg_temp_new_i64();
-+    tcg_gen_movi_i32(cpu_CF, 0);
+-
-+    tcg_gen_movi_i32(cpu_NF, 0);
+-    read_vec_element(s, tcg_op1, rn, is_q, MO_64);
-+    tcg_gen_movi_i32(cpu_VF, 0);
+-    read_vec_element(s, tcg_op2, rm, is_q, MO_64);
-+
+-    gen_helper_neon_pmull_64_lo(tcg_res, tcg_op1, tcg_op2);
-+    tcg_temp_free_i64(t);
+-    write_vec_element(s, tcg_res, rd, 0, MO_64);
-+}
+-    gen_helper_neon_pmull_64_hi(tcg_res, tcg_op1, tcg_op2);
-+
+-    write_vec_element(s, tcg_res, rd, 1, MO_64);
- /* Floating point <-> integer conversions
+-
-  *   31   30  29 28       24 23  22  21 20   19 18 16 15         10 9  5 4  0
+-    tcg_temp_free_i64(tcg_op1);
-  * +----+---+---+-----------+------+---+-------+-----+-------------+----+----+
+-    tcg_temp_free_i64(tcg_op2);
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
+-    tcg_temp_free_i64(tcg_res);
-             handle_fmov(s, rd, rn, type, itof);
+-}
-             break;
+-
+ /* AdvSIMD three different
-+        case 0b00111110: /* FJCVTZS */
+  *   31  30  29 28       24 23  22  21 20  16 15    12 11 10 9    5 4    0
-+            if (!dc_isar_feature(aa64_jscvt, s)) {
+  * +---+---+---+-----------+------+---+------+--------+-----+------+------+
-+                goto do_unallocated;
+@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
-+            } else if (fp_access_check(s)) {
+             if (!fp_access_check(s)) {
-+                handle_fjcvtzs(s, rd, rn);
+                 return;
-+            }
+             }
-+            break;
+-            handle_pmull_64(s, is_q, rd, rn, rm);
-+
++            /* The Q field specifies lo/hi half input for this insn.  */
-         default:
++            gen_gvec_op3_ool(s, true, rd, rn, rm, is_q,
-         do_unallocated:
++                             gen_helper_gvec_pmull_q);
-             unallocated_encoding(s);
+             return;
          }
          goto is_widening;
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     rm_is_dp = false;
+                  * outside the loop below as it only performs a single pass.
-                     break;
+                  */
+                 if (op == 14 && size == 2) {
-+                case 0x13: /* vjcvt */
+-                    TCGv_i64 tcg_rn, tcg_rm, tcg_rd;
-+                    if (!dp || !dc_isar_feature(aa32_jscvt, s)) {
+-
-+                        return 1;
+                     if (!dc_isar_feature(aa32_pmull, s)) {
-+                    }
+                         return 1;
-+                    rd_is_dp = false;
+                     }
-+                    break;
+-                    tcg_rn = tcg_temp_new_i64();
-+
+-                    tcg_rm = tcg_temp_new_i64();
-                 default:
+-                    tcg_rd = tcg_temp_new_i64();
-                     return 1;
+-                    neon_load_reg64(tcg_rn, rn);
 -                    neon_load_reg64(tcg_rm, rm);
 -                    gen_helper_neon_pmull_64_lo(tcg_rd, tcg_rn, tcg_rm);
 -                    neon_store_reg64(tcg_rd, rd);
 -                    gen_helper_neon_pmull_64_hi(tcg_rd, tcg_rn, tcg_rm);
 -                    neon_store_reg64(tcg_rd, rd + 1);
 -                    tcg_temp_free_i64(tcg_rn);
 -                    tcg_temp_free_i64(tcg_rm);
 -                    tcg_temp_free_i64(tcg_rd);
 +                    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
 +                                       0, gen_helper_gvec_pmull_q);
                      return 0;
                  }
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
-                     case 17: /* fsito */
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
                          gen_vfp_sito(dp, 0);
                          break;
 +                    case 19: /* vjcvt */
 +                        gen_helper_vjcvt(cpu_F0s, cpu_F0d, cpu_env);
 +                        break;
                      case 20: /* fshto */
                          gen_vfp_shto(dp, 16 - rm, 0);
                          break;
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vfp_helper.c
+--- a/target/arm/vec_helper.c
-+++ b/target/arm/vfp_helper.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ int arm_rmode_to_sf(int rmode)
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc)
      }
-     return rmode;
+     clear_tail(d, opr_sz, simd_maxsz(desc));
  }
 +
 +/*
-+ * Implement float64 to int32_t conversion without saturation;
++ * 64x64->128 polynomial multiply.
-+ * the result is supplied modulo 2^32.
++ * Because the lanes are not accessed in strict columns,
 + * this probably cannot be turned into a generic helper.
 + */
-+uint64_t HELPER(fjcvtzs)(float64 value, void *vstatus)
++void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
 +{
-+    float_status *status = vstatus;
++    intptr_t i, j, opr_sz = simd_oprsz(desc);
-+    uint32_t exp, sign;
++    intptr_t hi = simd_data(desc);
-+    uint64_t frac;
++    uint64_t *d = vd, *n = vn, *m = vm;
 +    uint32_t inexact = 1; /* !Z */
 +
-+    sign = extract64(value, 63, 1);
++    for (i = 0; i < opr_sz / 8; i += 2) {
-+    exp = extract64(value, 52, 11);
++        uint64_t nn = n[i + hi];
-+    frac = extract64(value, 0, 52);
++        uint64_t mm = m[i + hi];
 +        uint64_t rhi = 0;
 +        uint64_t rlo = 0;
 +
-+    if (exp == 0) {
++        /* Bit 0 can only influence the low 64-bit result.  */
-+        /* While not inexact for IEEE FP, -0.0 is inexact for JavaScript.  */
++        if (nn & 1) {
-+        inexact = sign;
++            rlo = mm;
 +        if (frac != 0) {
 +            if (status->flush_inputs_to_zero) {
 +                float_raise(float_flag_input_denormal, status);
 +            } else {
 +                float_raise(float_flag_inexact, status);
 +                inexact = 1;
 +            }
 +        }
 +        frac = 0;
 +    } else if (exp == 0x7ff) {
 +        /* This operation raises Invalid for both NaN and overflow (Inf).  */
 +        float_raise(float_flag_invalid, status);
 +        frac = 0;
 +    } else {
 +        int true_exp = exp - 1023;
 +        int shift = true_exp - 52;
 +
 +        /* Restore implicit bit.  */
 +        frac |= 1ull << 52;
 +
 +        /* Shift the fraction into place.  */
 +        if (shift >= 0) {
 +            /* The number is so large we must shift the fraction left.  */
 +            if (shift >= 64) {
 +                /* The fraction is shifted out entirely.  */
 +                frac = 0;
 +            } else {
 +                frac <<= shift;
 +            }
 +        } else if (shift > -64) {
 +            /* Normal case -- shift right and notice if bits shift out.  */
 +            inexact = (frac << (64 + shift)) != 0;
 +            frac >>= -shift;
 +        } else {
 +            /* The fraction is shifted out entirely.  */
 +            frac = 0;
 +        }
 +
-+        /* Notice overflow or inexact exceptions.  */
++        for (j = 1; j < 64; ++j) {
-+        if (true_exp > 31 || frac > (sign ? 0x80000000ull : 0x7fffffff)) {
++            uint64_t mask = -((nn >> j) & 1);
-+            /* Overflow, for which this operation raises invalid.  */
++            rlo ^= (mm << j) & mask;
-+            float_raise(float_flag_invalid, status);
++            rhi ^= (mm >> (64 - j)) & mask;
 +            inexact = 1;
 +        } else if (inexact) {
 +            float_raise(float_flag_inexact, status);
 +        }
-+
++        d[i] = rlo;
-+        /* Honor the sign.  */
++        d[i + 1] = rhi;
 +        if (sign) {
 +            frac = -frac;
 +        }
 +    }
-+
++    clear_tail(d, opr_sz, simd_maxsz(desc));
 +    /* Pack the result and the env->ZF representation of Z together.  */
 +    return deposit64(frac, 32, 32, inexact);
 +}
 +
 +uint32_t HELPER(vjcvt)(float64 value, CPUARMState *env)
 +{
 +    uint64_t pair = HELPER(fjcvtzs)(value, &env->vfp.fp_status);
 +    uint32_t result = pair;
 +    uint32_t z = (pair >> 32) == 0;
 +
 +    /* Store Z, clear NCV, in FPSCR.NZCV.  */
 +    env->vfp.xregs[ARM_VFP_FPSCR]
 +        = (env->vfp.xregs[ARM_VFP_FPSCR] & ~CPSR_NZCV) | (z * CPSR_Z);
 +
 +    return result;
 +}
 --
 .20.1

-[Qemu-devel] [PULL 06/21] target/arm: Rearrange Floating-point data-processing (2 regs)
+[PULL 41/52] target/arm: Convert PMULL.8 to gvec
 From: Richard Henderson <richard.henderson@linaro.org>
-There are lots of special cases within these insns.  Split the
+We still need two different helpers, since NEON and SVE2 get the
-major argument decode/loading/saving into no_output (compares),
+inputs from different locations within the source vector.  However,
-rd_is_dp, and rm_is_dp.
+we can convert both to the same internal form for computation.
-We still need to special case argument load for compare (rd as
+The sve2 helper is not used yet, but adding it with this patch
-input, rm as zero) and vcvt fixed (rd as input+output), but lots
+helps illustrate why the neon changes are helpful.
-of special cases do disappear.
+Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Now that we have a full switch at the beginning, hoist the ISA
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 checks from the code generation.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190215192302.27855-4-richard.henderson@linaro.org
+Message-id: 20200216214232.4230-5-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
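Note for readers of this comparison: the commit message above says NEON and
SVE2 read their inputs from different locations within the source vector but
can share one internal form for the computation. The sketch below illustrates
that idea under stated assumptions and is not the code from the patch. It
assumes the NEON-style form reads a contiguous low or high half of the
source, and the SVE2-style form reads even or odd byte elements. The names
pmull8, pmull8_contig and pmull8_interleaved are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 8x8 -> 16 polynomial multiply: like integer multiplication,
 * except the partial products are XORed, not added.
 */
static uint16_t pmull8(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    int j;

    for (j = 0; j < 8; ++j) {
        if ((a >> j) & 1) {
            r ^= (uint16_t)((uint16_t)b << j);
        }
    }
    return r;
}

/*
 * Assumed NEON-style input selection: the eight source bytes come from
 * the contiguous low (hi = 0) or high (hi = 1) half of a 16-byte vector.
 */
static void pmull8_contig(uint16_t d[8], const uint8_t n[16],
                          const uint8_t m[16], int hi)
{
    int i;

    assert(hi == 0 || hi == 1);
    for (i = 0; i < 8; ++i) {
        d[i] = pmull8(n[8 * hi + i], m[8 * hi + i]);
    }
}

/*
 * Assumed SVE2-style input selection: the eight source bytes are the
 * even (odd = 0) or odd (odd = 1) byte elements of a 16-byte vector.
 */
static void pmull8_interleaved(uint16_t d[8], const uint8_t n[16],
                               const uint8_t m[16], int odd)
{
    int i;

    assert(odd == 0 || odd == 1);
    for (i = 0; i < 8; ++i) {
        d[i] = pmull8(n[2 * i + odd], m[2 * i + odd]);
    }
}
```

Both wrappers differ only in how they gather the source bytes; the per-lane
arithmetic in pmull8 is shared, which is the point the commit message makes
about converting both to the same internal form.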
- target/arm/translate.c | 227 ++++++++++++++++++++---------------------
+ target/arm/helper-sve.h    |  2 ++
-file changed, 111 insertions(+), 116 deletions(-)
+ target/arm/helper.h        |  3 +-
+ target/arm/neon_helper.c   | 32 --------------------
  target/arm/translate-a64.c | 27 +++++++++++------
  target/arm/translate.c     | 26 ++++++++---------
  target/arm/vec_helper.c    | 60 ++++++++++++++++++++++++++++++++++++++
 files changed, 95 insertions(+), 55 deletions(-)
 diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper-sve.h
 +++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd, TCG_CALL_NO_WG,
                     void, env, ptr, ptr, ptr, tl, i32)
  DEF_HELPER_FLAGS_6(sve_stdd_be_zd, TCG_CALL_NO_WG,
                     void, env, ptr, ptr, ptr, tl, i32)
 +
 +DEF_HELPER_FLAGS_4(sve2_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 diff --git a/target/arm/helper.h b/target/arm/helper.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.h
 +++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(neon_sub_u8, i32, i32, i32)
  DEF_HELPER_2(neon_sub_u16, i32, i32, i32)
  DEF_HELPER_2(neon_mul_u8, i32, i32, i32)
  DEF_HELPER_2(neon_mul_u16, i32, i32, i32)
 -DEF_HELPER_2(neon_mull_p8, i64, i32, i32)
  DEF_HELPER_2(neon_tst_u8, i32, i32, i32)
  DEF_HELPER_2(neon_tst_u16, i32, i32, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(gvec_pmul_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
  DEF_HELPER_FLAGS_4(gvec_pmull_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +DEF_HELPER_FLAGS_4(neon_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 +
  #ifdef TARGET_AARCH64
  #include "helper-a64.h"
  #include "helper-sve.h"
 diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/neon_helper.c
 +++ b/target/arm/neon_helper.c
@@ -XXX,XX +XXX,XX @@ NEON_VOP(mul_u8, neon_u8, 4)
  NEON_VOP(mul_u16, neon_u16, 2)
  #undef NEON_FN
 -/* Polynomial multiplication is like integer multiplication except the
 -   partial products are XORed, not added.  */
 -uint64_t HELPER(neon_mull_p8)(uint32_t op1, uint32_t op2)
 -{
 -    uint64_t result = 0;
 -    uint64_t mask;
 -    uint64_t op2ex = op2;
 -    op2ex = (op2ex & 0xff) |
 -        ((op2ex & 0xff00) << 8) |
 -        ((op2ex & 0xff0000) << 16) |
 -        ((op2ex & 0xff000000) << 24);
 -    while (op1) {
 -        mask = 0;
 -        if (op1 & 1) {
 -            mask |= 0xffff;
 -        }
 -        if (op1 & (1 << 8)) {
 -            mask |= (0xffffU << 16);
 -        }
 -        if (op1 & (1 << 16)) {
 -            mask |= (0xffffULL << 32);
 -        }
 -        if (op1 & (1 << 24)) {
 -            mask |= (0xffffULL << 48);
 -        }
 -        result ^= op2ex & mask;
 -        op1 = (op1 >> 1) & 0x7f7f7f7f;
 -        op2ex <<= 1;
 -    }
 -    return result;
 -}
 -
  #define NEON_FN(dest, src1, src2) dest = (src1 & src2) ? -1 : 0
  NEON_VOP(tst_u8, neon_u8, 4)
  NEON_VOP(tst_u16, neon_u16, 2)
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size,
                  gen_helper_neon_addl_saturate_s32(tcg_passres, cpu_env,
                                                    tcg_passres, tcg_passres);
                  break;
 -            case 14: /* PMULL */
 -                assert(size == 0);
 -                gen_helper_neon_mull_p8(tcg_passres, tcg_op1, tcg_op2);
 -                break;
              default:
                  g_assert_not_reached();
              }
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
          handle_3rd_narrowing(s, is_q, is_u, size, opcode, rd, rn, rm);
          break;
      case 14: /* PMULL, PMULL2 */
 -        if (is_u || size == 1 || size == 2) {
 +        if (is_u) {
              unallocated_encoding(s);
              return;
          }
 -        if (size == 3) {
 +        switch (size) {
 +        case 0: /* PMULL.P8 */
 +            if (!fp_access_check(s)) {
 +                return;
 +            }
 +            /* The Q field specifies lo/hi half input for this insn.  */
 +            gen_gvec_op3_ool(s, true, rd, rn, rm, is_q,
 +                             gen_helper_neon_pmull_h);
 +            break;
 +
 +        case 3: /* PMULL.P64 */
              if (!dc_isar_feature(aa64_pmull, s)) {
                  unallocated_encoding(s);
                  return;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
              /* The Q field specifies lo/hi half input for this insn.  */
              gen_gvec_op3_ool(s, true, rd, rn, rm, is_q,
                               gen_helper_gvec_pmull_q);
 -            return;
 +            break;
 +
 +        default:
 +            unallocated_encoding(s);
 +            break;
          }
 -        goto is_widening;
 +        return;
      case 9: /* SQDMLAL, SQDMLAL2 */
      case 11: /* SQDMLSL, SQDMLSL2 */
      case 13: /* SQDMULL, SQDMULL2 */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
              unallocated_encoding(s);
              return;
          }
 -    is_widening:
          if (!fp_access_check(s)) {
              return;
          }
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
              }
          } else {
              /* data processing */
 +            bool rd_is_dp = dp;
 +            bool rm_is_dp = dp;
 +            bool no_output = false;
 +
              /* The opcode is in bits 23, 21, 20 and 6.  */
              op = ((insn >> 20) & 8) | ((insn >> 19) & 6) | ((insn >> 6) & 1);
 -            if (dp) {
 -                if (op == 15) {
 -                    /* rn is opcode */
 -                    rn = ((insn >> 15) & 0x1e) | ((insn >> 7) & 1);
 -                } else {
 -                    /* rn is register number */
 -                    VFP_DREG_N(rn, insn);
 -                }
 +            rn = VFP_SREG_N(insn);
 -                if (op == 15 && (rn == 15 || ((rn & 0x1c) == 0x18) ||
 -                                 ((rn & 0x1e) == 0x6))) {
 -                    /* Integer or single/half precision destination.  */
 -                    rd = VFP_SREG_D(insn);
 -                } else {
 -                    VFP_DREG_D(rd, insn);
 -                }
 -                if (op == 15 &&
 -                    (((rn & 0x1c) == 0x10) || ((rn & 0x14) == 0x14) ||
 -                     ((rn & 0x1e) == 0x4))) {
 -                    /* VCVT from int or half precision is always from S reg
 -                     * regardless of dp bit. VCVT with immediate frac_bits
 -                     * has same format as SREG_M.
 +            if (op == 15) {
 +                /* rn is opcode, encoded as per VFP_SREG_N. */
 +                switch (rn) {
 +                case 0x00: /* vmov */
 +                case 0x01: /* vabs */
 +                case 0x02: /* vneg */
 +                case 0x03: /* vsqrt */
 +                    break;
 +
 +                case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
 +                case 0x05: /* vcvtt.f64.f16, vcvtt.f32.f16 */
 +                    /*
 +                     * VCVTB, VCVTT: only present with the halfprec extension
 +                     * UNPREDICTABLE if bit 8 is set prior to ARMv8
 +                     * (we choose to UNDEF)
                       */
 -                    rm = VFP_SREG_M(insn);
 -                } else {
 -                    VFP_DREG_M(rm, insn);
 +                    if ((dp && !arm_dc_feature(s, ARM_FEATURE_V8)) ||
 +                        !arm_dc_feature(s, ARM_FEATURE_VFP_FP16)) {
 +                        return 1;
 +                    }
 +                    rm_is_dp = false;
 +                    break;
 +                case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
 +                case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
 +                    if ((dp && !arm_dc_feature(s, ARM_FEATURE_V8)) ||
 +                        !arm_dc_feature(s, ARM_FEATURE_VFP_FP16)) {
 +                        return 1;
 +                    }
 +                    rd_is_dp = false;
 +                    break;
 +
 +                case 0x08: case 0x0a: /* vcmp, vcmpz */
 +                case 0x09: case 0x0b: /* vcmpe, vcmpez */
 +                    no_output = true;
 +                    break;
 +
 +                case 0x0c: /* vrintr */
 +                case 0x0d: /* vrintz */
 +                case 0x0e: /* vrintx */
 +                    break;
 +
 +                case 0x0f: /* vcvt double<->single */
 +                    rd_is_dp = !dp;
 +                    break;
 +
 +                case 0x10: /* vcvt.fxx.u32 */
 +                case 0x11: /* vcvt.fxx.s32 */
 +                    rm_is_dp = false;
 +                    break;
 +                case 0x18: /* vcvtr.u32.fxx */
 +                case 0x19: /* vcvtz.u32.fxx */
 +                case 0x1a: /* vcvtr.s32.fxx */
 +                case 0x1b: /* vcvtz.s32.fxx */
 +                    rd_is_dp = false;
 +                    break;
 +
 +                case 0x14: /* vcvt fp <-> fixed */
 +                case 0x15:
 +                case 0x16:
 +                case 0x17:
 +                case 0x1c:
 +                case 0x1d:
 +                case 0x1e:
 +                case 0x1f:
 +                    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 +                        return 1;
 +                    }
 +                    /* Immediate frac_bits has same format as SREG_M.  */
 +                    rm_is_dp = false;
 +                    break;
 +
 +                default:
 +                    return 1;
                  }
 +            } else if (dp) {
 +                /* rn is register number */
 +                VFP_DREG_N(rn, insn);
 +            }
 +
 +            if (rd_is_dp) {
 +                VFP_DREG_D(rd, insn);
 +            } else {
 +                rd = VFP_SREG_D(insn);
 +            }
 +            if (rm_is_dp) {
 +                VFP_DREG_M(rm, insn);
              } else {
 -                rn = VFP_SREG_N(insn);
 -                if (op == 15 && rn == 15) {
 -                    /* Double precision destination.  */
 -                    VFP_DREG_D(rd, insn);
 -                } else {
 -                    rd = VFP_SREG_D(insn);
 -                }
 -                /* NB that we implicitly rely on the encoding for the frac_bits
 -                 * in VCVT of fixed to float being the same as that of an SREG_M
 -                 */
                  rm = VFP_SREG_M(insn);
              }
              veclen = s->vec_len;
 -            if (op == 15 && rn > 3)
 +            if (op == 15 && rn > 3) {
                  veclen = 0;
 +            }
              /* Shut up compiler warnings.  */
              delta_m = 0;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
              /* Load the initial operands.  */
              if (op == 15) {
                  switch (rn) {
 -                case 16:
 -                case 17:
 -                    /* Integer source */
 -                    gen_mov_F0_vreg(0, rm);
 -                    break;
 -                case 8:
 -                case 9:
 -                    /* Compare */
 +                case 0x08: case 0x09: /* Compare */
                      gen_mov_F0_vreg(dp, rd);
                      gen_mov_F1_vreg(dp, rm);
                      break;
 -                case 10:
 -                case 11:
 -                    /* Compare with zero */
 +                case 0x0a: case 0x0b: /* Compare with zero */
                      gen_mov_F0_vreg(dp, rd);
                      gen_vfp_F1_ld0(dp);
                      break;
 -                case 20:
 -                case 21:
 -                case 22:
 -                case 23:
 -                case 28:
 -                case 29:
 -                case 30:
 -                case 31:
 +                case 0x14: /* vcvt fp <-> fixed */
 +                case 0x15:
 +                case 0x16:
 +                case 0x17:
 +                case 0x1c:
 +                case 0x1d:
 +                case 0x1e:
 +                case 0x1f:
                      /* Source and destination the same.  */
                      gen_mov_F0_vreg(dp, rd);
                      break;
 -                case 4:
 -                case 5:
 -                case 6:
 -                case 7:
 -                    /* VCVTB, VCVTT: only present with the halfprec extension
 -                     * UNPREDICTABLE if bit 8 is set prior to ARMv8
 -                     * (we choose to UNDEF)
 -                     */
 -                    if ((dp && !arm_dc_feature(s, ARM_FEATURE_V8)) ||
 -                        !arm_dc_feature(s, ARM_FEATURE_VFP_FP16)) {
 -                        return 1;
 -                    }
 -                    if (!extract32(rn, 1, 1)) {
 -                        /* Half precision source.  */
 -                        gen_mov_F0_vreg(0, rm);
 -                        break;
 -                    }
 -                    /* Otherwise fall through */
                  default:
                      /* One source operand.  */
 -                    gen_mov_F0_vreg(dp, rm);
 +                    gen_mov_F0_vreg(rm_is_dp, rm);
                      break;
                  }
              } else {
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                          break;
                      }
                      case 15: /* single<->double conversion */
 -                        if (dp)
 +                        if (dp) {
                              gen_helper_vfp_fcvtsd(cpu_F0s, cpu_F0d, cpu_env);
 -                        else
 +                        } else {
                              gen_helper_vfp_fcvtds(cpu_F0d, cpu_F0s, cpu_env);
 +                        }
                          break;
                      case 16: /* fuito */
                          gen_vfp_uito(dp, 0);
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                          gen_vfp_sito(dp, 0);
                          break;
                      case 20: /* fshto */
 -                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                            return 1;
 -                        }
                          gen_vfp_shto(dp, 16 - rm, 0);
                          break;
                      case 21: /* fslto */
 -                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                            return 1;
 -                        }
                          gen_vfp_slto(dp, 32 - rm, 0);
                          break;
                      case 22: /* fuhto */
 -                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                            return 1;
 -                        }
                          gen_vfp_uhto(dp, 16 - rm, 0);
                          break;
                      case 23: /* fulto */
 -                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                            return 1;
 -                        }
                          gen_vfp_ulto(dp, 32 - rm, 0);
                          break;
                      case 24: /* ftoui */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                          gen_vfp_tosiz(dp, 0);
                          break;
                      case 28: /* ftosh */
 -                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                            return 1;
 -                        }
                          gen_vfp_tosh(dp, 16 - rm, 0);
                          break;
                      case 29: /* ftosl */
 -                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                            return 1;
 -                        }
                          gen_vfp_tosl(dp, 32 - rm, 0);
                          break;
                      case 30: /* ftouh */
 -                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                            return 1;
 -                        }
                          gen_vfp_touh(dp, 16 - rm, 0);
                          break;
                      case 31: /* ftoul */
 -                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
 -                            return 1;
 -                        }
                          gen_vfp_toul(dp, 32 - rm, 0);
                          break;
                      default: /* undefined */
 -                        return 1;
 +                        g_assert_not_reached();
                      }
                      break;
                  default: /* undefined */
                      return 1;
                  }
--                /* Write back the result.  */
+-                /* Handle VMULL.P64 (Polynomial 64x64 to 128 bit multiply)
--                if (op == 15 && (rn >= 8 && rn <= 11)) {
+-                 * outside the loop below as it only performs a single pass.
--                    /* Comparison, do nothing.  */
+-                 */
--                } else if (op == 15 && dp && ((rn & 0x1c) == 0x18 ||
+-                if (op == 14 && size == 2) {
--                                              (rn & 0x1e) == 0x6)) {
+-                    if (!dc_isar_feature(aa32_pmull, s)) {
--                    /* VCVT double to int: always integer result.
+-                        return 1;
--                     * VCVT double to half precision is always a single
++                /* Handle polynomial VMULL in a single pass.  */
--                     * precision result.
++                if (op == 14) {
--                     */
++                    if (size == 0) {
--                    gen_mov_vreg_F0(0, rd);
++                        /* VMULL.P8 */
--                } else if (op == 15 && rn == 15) {
++                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
--                    /* conversion */
++                                           0, gen_helper_neon_pmull_h);
--                    gen_mov_vreg_F0(!dp, rd);
++                    } else {
--                } else {
++                        /* VMULL.P64 */
--                    gen_mov_vreg_F0(dp, rd);
++                        if (!dc_isar_feature(aa32_pmull, s)) {
-+                /* Write back the result, if any.  */
++                            return 1;
-+                if (!no_output) {
++                        }
-+                    gen_mov_vreg_F0(rd_is_dp, rd);
++                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
 +                                           0, gen_helper_gvec_pmull_q);
                      }
 -                    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
 -                                       0, gen_helper_gvec_pmull_q);
                      return 0;
                  }
-                 /* break out of the loop if we have finished  */
+@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
--                if (veclen == 0)
+                         /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
-+                if (veclen == 0) {
+                         gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-                     break;
+                         break;
-+                }
+-                    case 14: /* Polynomial VMULL */
+-                        gen_helper_neon_mull_p8(cpu_V0, tmp, tmp2);
-                 if (op == 15 && delta_m == 0) {
+-                        tcg_temp_free_i32(tmp2);
-                     /* single source one-many */
+-                        tcg_temp_free_i32(tmp);
 -                        break;
                      default: /* 15 is RESERVED: caught earlier  */
                          abort();
                      }
 diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/vec_helper.c
 +++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
      }
      clear_tail(d, opr_sz, simd_maxsz(desc));
  }
 +
 +/*
 + * 8x8->16 polynomial multiply.
 + *
 + * The byte inputs are expanded to (or extracted from) half-words.
 + * Note that neon and sve2 get the inputs from different positions.
 + * This allows 4 bytes to be processed in parallel with uint64_t.
 + */
 +
 +static uint64_t expand_byte_to_half(uint64_t x)
 +{
 +    return  (x & 0x000000ff)
 +         | ((x & 0x0000ff00) << 8)
 +         | ((x & 0x00ff0000) << 16)
 +         | ((x & 0xff000000) << 24);
 +}
 +
 +static uint64_t pmull_h(uint64_t op1, uint64_t op2)
 +{
 +    uint64_t result = 0;
 +    int i;
 +
 +    for (i = 0; i < 8; ++i) {
 +        uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff;
 +        result ^= op2 & mask;
 +        op1 >>= 1;
 +        op2 <<= 1;
 +    }
 +    return result;
 +}
 +
 +void HELPER(neon_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc)
 +{
 +    int hi = simd_data(desc);
 +    uint64_t *d = vd, *n = vn, *m = vm;
 +    uint64_t nn = n[hi], mm = m[hi];
 +
 +    d[0] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm));
 +    nn >>= 32;
 +    mm >>= 32;
 +    d[1] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm));
 +
 +    clear_tail(d, 16, simd_maxsz(desc));
 +}
 +
 +#ifdef TARGET_AARCH64
 +void HELPER(sve2_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc)
 +{
 +    int shift = simd_data(desc) * 8;
 +    intptr_t i, opr_sz = simd_oprsz(desc);
 +    uint64_t *d = vd, *n = vn, *m = vm;
 +
 +    for (i = 0; i < opr_sz / 8; ++i) {
 +        uint64_t nn = (n[i] >> shift) & 0x00ff00ff00ff00ffull;
 +        uint64_t mm = (m[i] >> shift) & 0x00ff00ff00ff00ffull;
 +
 +        d[i] = pmull_h(nn, mm);
 +    }
 +}
 +#endif
 --
 .20.1
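
Reviewer's note on the 8x8->16 polynomial multiply above: the following is a minimal sketch, not part of the patch, that compares the packed pmull_h() approach against a plain bit-at-a-time carry-less multiply. The expand_byte_to_half() and pmull_h() bodies are copied from the hunk; clmul8_ref() and pmull_h_matches_reference() are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

/* Reference (hypothetical helper): carry-less multiply of two bytes,
 * handling one bit of the first operand per iteration. */
static uint16_t clmul8_ref(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (a & (1u << i)) {
            r ^= (uint16_t)((uint16_t)b << i);
        }
    }
    return r;
}

/* Copied from the patch: spread four bytes into four 16-bit lanes. */
static uint64_t expand_byte_to_half(uint64_t x)
{
    return  (x & 0x000000ff)
         | ((x & 0x0000ff00) << 8)
         | ((x & 0x00ff0000) << 16)
         | ((x & 0xff000000) << 24);
}

/* Copied from the patch: four 8x8->16 carry-less multiplies at once.
 * Each lane's low bit is widened to a 16-bit mask by the * 0xffff. */
static uint64_t pmull_h(uint64_t op1, uint64_t op2)
{
    uint64_t result = 0;
    int i;

    for (i = 0; i < 8; ++i) {
        uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff;
        result ^= op2 & mask;
        op1 >>= 1;
        op2 <<= 1;
    }
    return result;
}

/* Hypothetical checker: returns 1 if all four lanes agree with the
 * reference for the bytes packed in n and m. */
static int pmull_h_matches_reference(uint32_t n, uint32_t m)
{
    uint64_t packed = pmull_h(expand_byte_to_half(n), expand_byte_to_half(m));

    for (int lane = 0; lane < 4; lane++) {
        uint8_t a = (uint8_t)(n >> (8 * lane));
        uint8_t b = (uint8_t)(m >> (8 * lane));
        uint16_t got = (uint16_t)(packed >> (16 * lane));

        if (got != clmul8_ref(a, b)) {
            return 0;
        }
    }
    return 1;
}
```

The lanes cannot interfere because each expanded input occupies only the low 8 bits of its 16-bit lane, so seven left shifts of op2 stay inside the lane and seven right shifts of op1 never bring a neighbouring lane's bit down to bit 0.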

-New patch
+[PULL 42/52] xilinx_spips: Correct the number of dummy cycles for the FAST_READ_4 cmd
+From: Francisco Iglesias <francisco.iglesias@xilinx.com>
+Correct the number of dummy cycles required by the FAST_READ_4 command (to
+be eight, one dummy byte).
+Fixes: ef06ca3946 ("xilinx_spips: Add support for RX discard and RX drain")
+Suggested-by: Cédric Le Goater <clg@kaod.org>
+Signed-off-by: Francisco Iglesias <frasse.iglesias@gmail.com>
+Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
+Message-id: 20200218113350.6090-1-frasse.iglesias@gmail.com
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/ssi/xilinx_spips.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/ssi/xilinx_spips.c
++++ b/hw/ssi/xilinx_spips.c
+@@ -XXX,XX +XXX,XX @@ static int xilinx_spips_num_dummies(XilinxQSPIPS *qs, uint8_t command)
+     case FAST_READ:
+     case DOR:
+     case QOR:
++    case FAST_READ_4:
+     case DOR_4:
+     case QOR_4:
+         return 1;
+     case DIOR:
+-    case FAST_READ_4:
+     case DIOR_4:
+         return 2;
+     case QIOR:
+--
+.20.1
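
Reviewer's note: the grouping after this fix can be sketched as below. This is an illustration only, assuming one dummy byte costs eight clock cycles when clocked out on a single data line, as the commit message implies. The enum and both function names are hypothetical; the real opcode values in hw/ssi/xilinx_spips.c are not reproduced here, and the fallback return of 0 is an assumption.

```c
#include <stdint.h>

/* Hypothetical command identifiers, standing in for the opcode
 * constants used by the real device model. */
enum flash_cmd {
    CMD_FAST_READ,
    CMD_FAST_READ_4,
    CMD_DOR,
    CMD_DOR_4,
    CMD_QOR,
    CMD_QOR_4,
    CMD_DIOR,
    CMD_DIOR_4,
};

/* Dummy bytes per command, mirroring the case grouping after the fix:
 * FAST_READ_4 now sits with the one-byte commands. */
static int num_dummy_bytes(enum flash_cmd cmd)
{
    switch (cmd) {
    case CMD_FAST_READ:
    case CMD_DOR:
    case CMD_QOR:
    case CMD_FAST_READ_4:
    case CMD_DOR_4:
    case CMD_QOR_4:
        return 1;
    case CMD_DIOR:
    case CMD_DIOR_4:
        return 2;
    }
    return 0; /* assumption: commands not listed here need no dummies */
}

/* Assumption: dummy bytes are shifted on one data line, 8 cycles each. */
static int dummy_cycles_single_line(int dummy_bytes)
{
    return dummy_bytes * 8;
}
```

With this mapping FAST_READ_4 yields one dummy byte, which is eight cycles under the single-line assumption; before the fix it was grouped with DIOR and yielded two.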

-New patch
+[PULL 43/52] sh4: Fix PCI ISA IO memory subregion
+From: Guenter Roeck <linux@roeck-us.net>
+Booting the r2d machine from flash fails because flash is not discovered.
+Looking at the flattened memory tree, we see the following.
+FlatView #1
+ AS "memory", root: system
+ AS "cpu-memory-0", root: system
+ AS "sh_pci_host", root: bus master container
+ Root memory region: system
+  0000000000000000-000000000000ffff (prio 0, i/o): io
+  0000000000010000-0000000000ffffff (prio 0, i/o): r2d.flash @0000000000010000
+The overlapping memory region is sh_pci.isa, i.e. the ISA I/O region bridge.
+This region is initially assigned to address 0xfe240000, but overwritten
+with a write into the PCIIOBR register. This write is expected to adjust
+the PCI memory window, but not to change the region's base address.
+Peter Maydell provided the following detailed explanation.
+"Section 22.3.7 and in particular figure 22.3 (of "SSH7751R user's manual:
+hardware") are clear about how this is supposed to work: there is a window
+at 0xfe240000 in the system register space for PCI I/O space. When the CPU
+makes an access into that area, the PCI controller calculates the PCI
+address to use by combining bits 0..17 of the system address with the
+bits 31..18 value that the guest has put into the PCIIOBR. That is, writing
+to the PCIIOBR changes which section of the IO address space is visible in
+the 0xfe240000 window. Instead what QEMU's implementation does is move the
+window to whatever value the guest writes to the PCIIOBR register -- so if
+the guest writes 0 we put the window at 0 in system address space."
+Fix the problem by calling memory_region_set_alias_offset() instead of
+removing and re-adding the PCI ISA subregion on writes into PCIIOBR.
+At the same time, in sh_pci_device_realize(), don't set iobr since
+it is overwritten later anyway. Instead, pass the base address to
+memory_region_add_subregion() directly.
+Many thanks to Peter Maydell for the detailed problem analysis, and for
+providing suggestions on how to fix the problem.
+Signed-off-by: Guenter Roeck <linux@roeck-us.net>
+Message-id: 20200218201050.15273-1-linux@roeck-us.net
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/sh4/sh_pci.c | 11 +++--------
+ 1 file changed, 3 insertions(+), 8 deletions(-)
+diff --git a/hw/sh4/sh_pci.c b/hw/sh4/sh_pci.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/sh4/sh_pci.c
++++ b/hw/sh4/sh_pci.c
+@@ -XXX,XX +XXX,XX @@ static void sh_pci_reg_write (void *p, hwaddr addr, uint64_t val,
+         pcic->mbr = val & 0xff000001;
+         break;
+     case 0x1c8:
+-        if ((val & 0xfffc0000) != (pcic->iobr & 0xfffc0000)) {
+-            memory_region_del_subregion(get_system_memory(), &pcic->isa);
+-            pcic->iobr = val & 0xfffc0001;
+-            memory_region_add_subregion(get_system_memory(),
+-                                        pcic->iobr & 0xfffc0000, &pcic->isa);
+-        }
++        pcic->iobr = val & 0xfffc0001;
++        memory_region_set_alias_offset(&pcic->isa, val & 0xfffc0000);
+         break;
+     case 0x220:
+         pci_data_write(phb->bus, pcic->par, val, 4);
+@@ -XXX,XX +XXX,XX @@ static void sh_pci_device_realize(DeviceState *dev, Error **errp)
+                              get_system_io(), 0, 0x40000);
+     sysbus_init_mmio(sbd, &s->memconfig_p4);
+     sysbus_init_mmio(sbd, &s->memconfig_a7);
+-    s->iobr = 0xfe240000;
+-    memory_region_add_subregion(get_system_memory(), s->iobr, &s->isa);
++    memory_region_add_subregion(get_system_memory(), 0xfe240000, &s->isa);
+     s->dev = pci_create_simple(phb->bus, PCI_DEVFN(0, 0), "sh_pci_host");
+ }
+--
+.20.1
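
Reviewer's note: the address translation described in the quoted explanation can be sketched as follows. This is a standalone illustration of the arithmetic, not QEMU code; pci_io_addr() and the two macro names are hypothetical. The window base 0xfe240000, the window size 0x40000 and the 0xfffc0000 mask are taken from the patch.

```c
#include <stdint.h>

#define PCI_IO_WINDOW_BASE 0xfe240000u /* fixed window in system space */
#define PCI_IO_WINDOW_SIZE 0x00040000u /* 256 KiB, i.e. address bits 0..17 */

/* Combine bits 31..18 of PCIIOBR with bits 17..0 of the system address
 * to form the PCI I/O address seen through the window. */
static uint32_t pci_io_addr(uint32_t iobr, uint32_t sys_addr)
{
    return (iobr & 0xfffc0000u) | (sys_addr & (PCI_IO_WINDOW_SIZE - 1));
}
```

For example, a CPU access to 0xfe240123 with PCIIOBR written as 0 reaches PCI I/O address 0x123, and the window itself stays at 0xfe240000. That is the behaviour memory_region_set_alias_offset() gives, whereas the old code moved the window to system address 0.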

-New patch
+[PULL 44/52] target/arm: Rename isar_feature_aa32_simd_r32
+From: Richard Henderson <richard.henderson@linaro.org>
 The old name, isar_feature_aa32_fp_d32, does not reflect
 the MVFR0 field name, SIMDReg.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Message-id: 20200214181547.21408-3-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 [PMM: wrapped one long line]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/cpu.h               |  2 +-
  target/arm/translate-vfp.inc.c | 53 +++++++++++++++++-----------------
  2 files changed, 28 insertions(+), 27 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
      return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
  }
 -static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
 +static inline bool isar_feature_aa32_simd_r32(const ARMISARegisters *id)
  {
      /* Return true if D16-D31 are implemented */
      return FIELD_EX32(id->mvfr0, MVFR0, SIMDREG) >= 2;
 diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.inc.c
 +++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
 +    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
          ((a->vm | a->vn | a->vd) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
 +    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
          ((a->vm | a->vn | a->vd) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
 +    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
          ((a->vm | a->vd) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
 +    if (dp && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
      uint32_t offset;
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vn & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
      uint32_t offset;
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vn & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vn & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
       */
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
      TCGv_i64 tmp;
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd + n) > 16) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd + n) > 16) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
      TCGv_ptr fpst;
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vn | vm) & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vn | vm) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
      TCGv_i64 f0, fd;
      /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vm) & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_dp *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vn | a->vm) & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
      vd = a->vd;
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (vd & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (vd & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && ((a->vd | a->vm) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd  & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd  & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm  & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm  & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && ((a->vd | a->vm) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && ((a->vd | a->vm) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && ((a->vd | a->vm) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      TCGv_i32 vm;
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      TCGv_i32 vd;
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
      TCGv_ptr fpst;
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
      }
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
      TCGv_ptr fpst;
      /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
          return false;
      }
 --
 .20.1
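
Reviewer's note: every hunk in this patch guards the same condition, so a compact sketch may help. It assumes that SIMDReg occupies MVFR0 bits [3:0] (to be checked against the Arm ARM) and uses the ">= 2" threshold from the cpu.h hunk. All function names here are hypothetical stand-ins for the FIELD_EX32() and dc_isar_feature() machinery.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: MVFR0.SIMDReg is the low four bits of the register. */
static unsigned mvfr0_simdreg(uint32_t mvfr0)
{
    return mvfr0 & 0xf;
}

/* Mirrors isar_feature_aa32_simd_r32(): true if D16-D31 exist. */
static bool has_simd_r32(uint32_t mvfr0)
{
    return mvfr0_simdreg(mvfr0) >= 2;
}

/* A D register index has bit 4 set exactly when it names D16-D31.
 * The access must UNDEF if any operand does so on a 16-register CPU. */
static bool dreg_access_undef(uint32_t mvfr0,
                              unsigned vd, unsigned vn, unsigned vm)
{
    return !has_simd_r32(mvfr0) && ((vd | vn | vm) & 0x10) != 0;
}
```

OR-ing the operand indices before masking with 0x10 is what lets the trans_* functions test all registers with one comparison.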

-New patch
+[PULL 45/52] target/arm: Use isar_feature_aa32_simd_r32 more places
+From: Richard Henderson <richard.henderson@linaro.org>
+Many uses of ARM_FEATURE_VFP3 are testing for the number of simd
+registers implemented.  Use the proper test vs MVFR0.SIMDReg.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214181547.21408-4-richard.henderson@linaro.org
+[PMM: fix typo in commit message]
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/cpu.c       |  9 ++++-----
+ target/arm/helper.c    | 13 ++++++-------
+ target/arm/translate.c |  2 +-
+ 3 files changed, 11 insertions(+), 13 deletions(-)
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_dump_state(CPUState *cs, FILE *f, int flags)
+     if (flags & CPU_DUMP_FPU) {
+         int numvfpregs = 0;
+-        if (arm_feature(env, ARM_FEATURE_VFP)) {
+-            numvfpregs += 16;
+-        }
+-        if (arm_feature(env, ARM_FEATURE_VFP3)) {
+-            numvfpregs += 16;
++        if (cpu_isar_feature(aa32_simd_r32, cpu)) {
++            numvfpregs = 32;
++        } else if (arm_feature(env, ARM_FEATURE_VFP)) {
++            numvfpregs = 16;
+         }
+         for (i = 0; i < numvfpregs; i++) {
+             uint64_t v = *aa32_vfp_dreg(env, i);
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@ static void switch_mode(CPUARMState *env, int mode);
+ static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
+ {
+-    int nregs;
++    ARMCPU *cpu = env_archcpu(env);
++    int nregs = cpu_isar_feature(aa32_simd_r32, cpu) ? 32 : 16;
+     /* VFP data registers are always little-endian.  */
+-    nregs = arm_feature(env, ARM_FEATURE_VFP3) ? 32 : 16;
+     if (reg < nregs) {
+         stq_le_p(buf, *aa32_vfp_dreg(env, reg));
+         return 8;
+@@ -XXX,XX +XXX,XX @@ static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
+ static int vfp_gdb_set_reg(CPUARMState *env, uint8_t *buf, int reg)
+ {
+-    int nregs;
++    ARMCPU *cpu = env_archcpu(env);
++    int nregs = cpu_isar_feature(aa32_simd_r32, cpu) ? 32 : 16;
+-    nregs = arm_feature(env, ARM_FEATURE_VFP3) ? 32 : 16;
+     if (reg < nregs) {
+         *aa32_vfp_dreg(env, reg) = ldq_le_p(buf);
+         return 8;
+@@ -XXX,XX +XXX,XX @@ static void cpacr_write(CPUARMState *env, const ARMCPRegInfo *ri,
+             /* VFPv3 and upwards with NEON implement 32 double precision
+              * registers (D0-D31).
+              */
+-            if (!arm_feature(env, ARM_FEATURE_NEON) ||
+-                    !arm_feature(env, ARM_FEATURE_VFP3)) {
++            if (!cpu_isar_feature(aa32_simd_r32, env_archcpu(env))) {
+                 /* D32DIS [30] is RAO/WI if D16-31 are not implemented. */
+                 value |= (1 << 30);
+             }
+@@ -XXX,XX +XXX,XX @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
+     } else if (arm_feature(env, ARM_FEATURE_NEON)) {
+         gdb_register_coprocessor(cs, vfp_gdb_get_reg, vfp_gdb_set_reg,
+, "arm-neon.xml", 0);
+-    } else if (arm_feature(env, ARM_FEATURE_VFP3)) {
++    } else if (cpu_isar_feature(aa32_simd_r32, cpu)) {
+         gdb_register_coprocessor(cs, vfp_gdb_get_reg, vfp_gdb_set_reg,
+, "arm-vfp3.xml", 0);
+     } else if (arm_feature(env, ARM_FEATURE_VFP)) {
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
+ #define VFP_SREG(insn, bigbit, smallbit) \
+   ((VFP_REG_SHR(insn, bigbit - 1) & 0x1e) | (((insn) >> (smallbit)) & 1))
+ #define VFP_DREG(reg, insn, bigbit, smallbit) do { \
+-    if (arm_dc_feature(s, ARM_FEATURE_VFP3)) { \
++    if (dc_isar_feature(aa32_simd_r32, s)) { \
+         reg = (((insn) >> (bigbit)) & 0x0f) \
+               | (((insn) >> ((smallbit) - 4)) & 0x10); \
+     } else { \
+--
+.20.1
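
Reviewer's note: the 32-register branch of VFP_DREG shown in the last hunk can be written as a plain function. This sketch covers only that branch; the branch for CPUs without D16-D31 is not visible in this hunk and is not reproduced. The function name is hypothetical, and the bit positions 12 and 22 in the example below are illustrative values rather than something taken from the patch.

```c
#include <stdint.h>

/* Decode a D register number when D16-D31 exist: four bits starting at
 * 'bigbit' give bits 3..0, and the single bit at 'smallbit' gives bit 4.
 * Shifting by (smallbit - 4) moves that bit straight into position 4. */
static unsigned vfp_dreg_r32(uint32_t insn, unsigned bigbit, unsigned smallbit)
{
    return ((insn >> bigbit) & 0x0f) | ((insn >> (smallbit - 4)) & 0x10);
}
```

With bigbit = 12 and smallbit = 22, an instruction word holding 0x5 in bits 15..12 and a 1 in bit 22 decodes to register 21 (0x15).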

-New patch
+[PULL 46/52] target/arm: Set MVFR0.FPSP for ARMv5 cpus
+From: Richard Henderson <richard.henderson@linaro.org>
+We are going to convert FEATURE tests to ISAR tests,
+so FPSP needs to be set for these cpus, like we have
+already for FPDP.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214181547.21408-5-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/cpu.c | 10 ++++++----
+ 1 file changed, 6 insertions(+), 4 deletions(-)
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static void arm926_initfn(Object *obj)
+      */
+     cpu->isar.id_isar1 = FIELD_DP32(cpu->isar.id_isar1, ID_ISAR1, JAZELLE, 1);
+     /*
+-     * Similarly, we need to set MVFR0 fields to enable double precision
+-     * and short vector support even though ARMv5 doesn't have this register.
++     * Similarly, we need to set MVFR0 fields to enable vfp and short vector
++     * support even though ARMv5 doesn't have this register.
+      */
+     cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSHVEC, 1);
++    cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSP, 1);
+     cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPDP, 1);
+ }
+@@ -XXX,XX +XXX,XX @@ static void arm1026_initfn(Object *obj)
+      */
+     cpu->isar.id_isar1 = FIELD_DP32(cpu->isar.id_isar1, ID_ISAR1, JAZELLE, 1);
+     /*
+-     * Similarly, we need to set MVFR0 fields to enable double precision
+-     * and short vector support even though ARMv5 doesn't have this register.
++     * Similarly, we need to set MVFR0 fields to enable vfp and short vector
++     * support even though ARMv5 doesn't have this register.
+      */
+     cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSHVEC, 1);
++    cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSP, 1);
+     cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPDP, 1);
+     {
+--
+.20.1
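
Reviewer's note: for readers unfamiliar with FIELD_DP32(), the three deposits in each initfn can be sketched with a hand-written helper. The field positions used here (FPSP at bits [7:4], FPDP at [11:8], FPSHVEC at [27:24]) are my assumption based on the Arm architecture manual and should be checked against the FIELD() definitions in target/arm/cpu.h. deposit_field() and armv5_vfp_mvfr0() are hypothetical names.

```c
#include <stdint.h>

/* Write 'field' into bits [start + length - 1 : start] of 'value'. */
static uint32_t deposit_field(uint32_t value, unsigned start,
                              unsigned length, uint32_t field)
{
    uint32_t ones = (length >= 32) ? 0xffffffffu : ((1u << length) - 1);
    uint32_t mask = ones << start;

    return (value & ~mask) | ((field << start) & mask);
}

/* Assumed MVFR0 layout: each field is four bits wide. */
#define MVFR0_FPSP_SHIFT     4
#define MVFR0_FPDP_SHIFT     8
#define MVFR0_FPSHVEC_SHIFT 24

/* Mirror the three deposits done for the ARMv5 CPUs after this patch. */
static uint32_t armv5_vfp_mvfr0(uint32_t mvfr0)
{
    mvfr0 = deposit_field(mvfr0, MVFR0_FPSHVEC_SHIFT, 4, 1);
    mvfr0 = deposit_field(mvfr0, MVFR0_FPSP_SHIFT, 4, 1);
    mvfr0 = deposit_field(mvfr0, MVFR0_FPDP_SHIFT, 4, 1);
    return mvfr0;
}
```

Under the assumed layout, starting from zero gives 0x01000110; the patch adds the middle deposit, so that ISAR-based single-precision tests succeed on these CPUs just as the double-precision ones already did.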

-[Qemu-devel] [PULL 11/21] hw/char/pl011: Allow use as an embedded-struct device
+[PULL 47/52] target/arm: Add isar_feature_aa32_simd_r16
-Create a new include file for the pl011's device struct,
+From: Richard Henderson <richard.henderson@linaro.org>
-type macros, etc, so that it can be instantiated using
-the "embedded struct" coding style.
+Use this in the places that were checking ARM_FEATURE_VFP, and
+are obviously testing for the existence of the register set
 as opposed to testing for some particular instruction extension.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200214181547.21408-6-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- include/hw/char/pl011.h | 34 ++++++++++++++++++++++++++++++++++
+ target/arm/cpu.h        |  6 ++++++
- hw/char/pl011.c         | 31 ++-----------------------------
+ hw/intc/armv7m_nvic.c   | 20 ++++++++++----------
-files changed, 36 insertions(+), 29 deletions(-)
+ linux-user/arm/signal.c |  4 ++--
+ target/arm/arch_dump.c  | 11 ++++++-----
-diff --git a/include/hw/char/pl011.h b/include/hw/char/pl011.h
+ target/arm/cpu.c        |  8 ++++----
-index XXXXXXX..XXXXXXX 100644
+ target/arm/helper.c     |  4 ++--
---- a/include/hw/char/pl011.h
+ target/arm/m_helper.c   | 11 ++++++-----
-+++ b/include/hw/char/pl011.h
+ target/arm/machine.c    |  3 +--
  8 files changed, 37 insertions(+), 30 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
      return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
  }
 +static inline bool isar_feature_aa32_simd_r16(const ARMISARegisters *id)
 +{
 +    /* Return true if D0-D15 are implemented */
 +    return FIELD_EX32(id->mvfr0, MVFR0, SIMDREG) > 0;
 +}
 +
  static inline bool isar_feature_aa32_simd_r32(const ARMISARegisters *id)
  {
      /* Return true if D16-D31 are implemented */
 diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/armv7m_nvic.c
 +++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
      case 0xd84: /* CSSELR */
          return cpu->env.v7m.csselr[attrs.secure];
      case 0xd88: /* CPACR */
 -        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +        if (!cpu_isar_feature(aa32_simd_r16, cpu)) {
              return 0;
          }
          return cpu->env.v7m.cpacr[attrs.secure];
      case 0xd8c: /* NSACR */
 -        if (!attrs.secure || !arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +        if (!attrs.secure || !cpu_isar_feature(aa32_simd_r16, cpu)) {
              return 0;
          }
          return cpu->env.v7m.nsacr;
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
          }
          return cpu->env.v7m.sfar;
      case 0xf34: /* FPCCR */
 -        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +        if (!cpu_isar_feature(aa32_simd_r16, cpu)) {
              return 0;
          }
          if (attrs.secure) {
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
              return value;
          }
      case 0xf38: /* FPCAR */
 -        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +        if (!cpu_isar_feature(aa32_simd_r16, cpu)) {
              return 0;
          }
          return cpu->env.v7m.fpcar[attrs.secure];
      case 0xf3c: /* FPDSCR */
 -        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +        if (!cpu_isar_feature(aa32_simd_r16, cpu)) {
              return 0;
          }
          return cpu->env.v7m.fpdscr[attrs.secure];
@@ -XXX,XX +XXX,XX @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
          }
          break;
      case 0xd88: /* CPACR */
 -        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
              /* We implement only the Floating Point extension's CP10/CP11 */
              cpu->env.v7m.cpacr[attrs.secure] = value & (0xf << 20);
          }
          break;
      case 0xd8c: /* NSACR */
 -        if (attrs.secure && arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +        if (attrs.secure && cpu_isar_feature(aa32_simd_r16, cpu)) {
              /* We implement only the Floating Point extension's CP10/CP11 */
              cpu->env.v7m.nsacr = value & (3 << 10);
          }
@@ -XXX,XX +XXX,XX @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
          break;
      }
      case 0xf34: /* FPCCR */
 -        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
              /* Not all bits here are banked. */
              uint32_t fpccr_s;
@@ -XXX,XX +XXX,XX @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
          }
          break;
      case 0xf38: /* FPCAR */
 -        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
              value &= ~7;
              cpu->env.v7m.fpcar[attrs.secure] = value;
          }
          break;
      case 0xf3c: /* FPDSCR */
 -        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
              value &= 0x07c00000;
              cpu->env.v7m.fpdscr[attrs.secure] = value;
          }
 diff --git a/linux-user/arm/signal.c b/linux-user/arm/signal.c
 index XXXXXXX..XXXXXXX 100644
 --- a/linux-user/arm/signal.c
 +++ b/linux-user/arm/signal.c
@@ -XXX,XX +XXX,XX @@ static void setup_sigframe_v2(struct target_ucontext_v2 *uc,
      setup_sigcontext(&uc->tuc_mcontext, env, set->sig[0]);
      /* Save coprocessor signal frame.  */
      regspace = uc->tuc_regspace;
 -    if (arm_feature(env, ARM_FEATURE_VFP)) {
 +    if (cpu_isar_feature(aa32_simd_r16, env_archcpu(env))) {
          regspace = setup_sigframe_v2_vfp(regspace, env);
      }
      if (arm_feature(env, ARM_FEATURE_IWMMXT)) {
@@ -XXX,XX +XXX,XX @@ static int do_sigframe_return_v2(CPUARMState *env,
      /* Restore coprocessor signal frame */
      regspace = uc->tuc_regspace;
 -    if (arm_feature(env, ARM_FEATURE_VFP)) {
 +    if (cpu_isar_feature(aa32_simd_r16, env_archcpu(env))) {
          regspace = restore_sigframe_v2_vfp(env, regspace);
          if (!regspace) {
              return 1;
 diff --git a/target/arm/arch_dump.c b/target/arm/arch_dump.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/arch_dump.c
 +++ b/target/arm/arch_dump.c
@@ -XXX,XX +XXX,XX @@ int arm_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
                               int cpuid, void *opaque)
  {
      struct arm_note note;
 -    CPUARMState *env = &ARM_CPU(cs)->env;
 +    ARMCPU *cpu = ARM_CPU(cs);
 +    CPUARMState *env = &cpu->env;
      DumpState *s = opaque;
 -    int ret, i, fpvalid = !!arm_feature(env, ARM_FEATURE_VFP);
 +    int ret, i;
 +    bool fpvalid = cpu_isar_feature(aa32_simd_r16, cpu);
      arm_note_init(&note, s, "CORE", 5, NT_PRSTATUS, sizeof(note.prstatus));
@@ -XXX,XX +XXX,XX @@ int cpu_get_dump_info(ArchDumpInfo *info,
  ssize_t cpu_get_note_size(int class, int machine, int nr_cpus)
  {
      ARMCPU *cpu = ARM_CPU(first_cpu);
 -    CPUARMState *env = &cpu->env;
      size_t note_size;
      if (class == ELFCLASS64) {
@@ -XXX,XX +XXX,XX @@ ssize_t cpu_get_note_size(int class, int machine, int nr_cpus)
          note_size += AARCH64_PRFPREG_NOTE_SIZE;
  #ifdef TARGET_AARCH64
          if (cpu_isar_feature(aa64_sve, cpu)) {
 -            note_size += AARCH64_SVE_NOTE_SIZE(env);
 +            note_size += AARCH64_SVE_NOTE_SIZE(&cpu->env);
          }
  #endif
      } else {
          note_size = ARM_PRSTATUS_NOTE_SIZE;
 -        if (arm_feature(env, ARM_FEATURE_VFP)) {
 +        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
              note_size += ARM_VFP_NOTE_SIZE;
          }
      }
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
              env->v7m.ccr[M_REG_S] |= R_V7M_CCR_UNALIGN_TRP_MASK;
          }
 -        if (arm_feature(env, ARM_FEATURE_VFP)) {
 +        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
              env->v7m.fpccr[M_REG_NS] = R_V7M_FPCCR_ASPEN_MASK;
              env->v7m.fpccr[M_REG_S] = R_V7M_FPCCR_ASPEN_MASK |
                  R_V7M_FPCCR_LSPEN_MASK | R_V7M_FPCCR_S_MASK;
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_dump_state(CPUState *cs, FILE *f, int flags)
          int numvfpregs = 0;
          if (cpu_isar_feature(aa32_simd_r32, cpu)) {
              numvfpregs = 32;
 -        } else if (arm_feature(env, ARM_FEATURE_VFP)) {
 +        } else if (cpu_isar_feature(aa32_simd_r16, cpu)) {
              numvfpregs = 16;
          }
          for (i = 0; i < numvfpregs; i++) {
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
       * KVM does not currently allow us to lie to the guest about its
       * ID/feature registers, so the guest always sees what the host has.
       */
 -    if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
 +    if (cpu_isar_feature(aa32_simd_r16, cpu)) {
          cpu->has_vfp = true;
          if (!kvm_enabled()) {
              qdev_property_add_static(DEVICE(obj), &arm_cpu_has_vfp_property);
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
       * We rely on no XScale CPU having VFP so we can use the same bits in the
       * TB flags field for VECSTRIDE and XSCALE_CPAR.
       */
 -    assert(!(arm_feature(env, ARM_FEATURE_VFP) &&
 +    assert(!(cpu_isar_feature(aa32_simd_r16, cpu) &&
               arm_feature(env, ARM_FEATURE_XSCALE)));
      if (arm_feature(env, ARM_FEATURE_V7) &&
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void cpacr_write(CPUARMState *env, const ARMCPRegInfo *ri,
           * ASEDIS [31] and D32DIS [30] are both UNK/SBZP without VFP.
           * TRCDIS [28] is RAZ/WI since we do not implement a trace macrocell.
           */
 -        if (arm_feature(env, ARM_FEATURE_VFP)) {
 +        if (cpu_isar_feature(aa32_simd_r16, env_archcpu(env))) {
              /* VFP coprocessor: cp10 & cp11 [23:20] */
              mask |= (1 << 31) | (1 << 30) | (0xf << 20);
@@ -XXX,XX +XXX,XX @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
      } else if (cpu_isar_feature(aa32_simd_r32, cpu)) {
          gdb_register_coprocessor(cs, vfp_gdb_get_reg, vfp_gdb_set_reg,
 , "arm-vfp3.xml", 0);
 -    } else if (arm_feature(env, ARM_FEATURE_VFP)) {
 +    } else if (cpu_isar_feature(aa32_simd_r16, cpu)) {
          gdb_register_coprocessor(cs, vfp_gdb_get_reg, vfp_gdb_set_reg,
 , "arm-vfp.xml", 0);
      }
 diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/m_helper.c
 +++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t v7m_integrity_sig(CPUARMState *env, uint32_t lr)
       */
      uint32_t sig = 0xfefa125a;
 -    if (!arm_feature(env, ARM_FEATURE_VFP) || (lr & R_V7M_EXCRET_FTYPE_MASK)) {
 +    if (!cpu_isar_feature(aa32_simd_r16, env_archcpu(env))
 +        || (lr & R_V7M_EXCRET_FTYPE_MASK)) {
          sig |= 1;
      }
      return sig;
@@ -XXX,XX +XXX,XX @@ static void v7m_exception_taken(ARMCPU *cpu, uint32_t lr, bool dotailchain,
      if (dotailchain) {
          /* Sanitize LR FType and PREFIX bits */
 -        if (!arm_feature(env, ARM_FEATURE_VFP)) {
 +        if (!cpu_isar_feature(aa32_simd_r16, cpu)) {
              lr |= R_V7M_EXCRET_FTYPE_MASK;
          }
          lr = deposit32(lr, 24, 8, 0xff);
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
      ftype = excret & R_V7M_EXCRET_FTYPE_MASK;
 -    if (!arm_feature(env, ARM_FEATURE_VFP) && !ftype) {
 +    if (!ftype && !cpu_isar_feature(aa32_simd_r16, cpu)) {
          qemu_log_mask(LOG_GUEST_ERROR, "M profile: zero FTYPE in exception "
                        "exit PC value 0x%" PRIx32 " is UNPREDICTABLE "
                        "if FPU not present\n",
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
               * SFPA is RAZ/WI from NS. FPCA is RO if NSACR.CP10 == 0,
               * RES0 if the FPU is not present, and is stored in the S bank
               */
 -            if (arm_feature(env, ARM_FEATURE_VFP) &&
 +            if (cpu_isar_feature(aa32_simd_r16, env_archcpu(env)) &&
                  extract32(env->v7m.nsacr, 10, 1)) {
                  env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_FPCA_MASK;
                  env->v7m.control[M_REG_S] |= val & R_V7M_CONTROL_FPCA_MASK;
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
              env->v7m.control[env->v7m.secure] &= ~R_V7M_CONTROL_NPRIV_MASK;
              env->v7m.control[env->v7m.secure] |= val & R_V7M_CONTROL_NPRIV_MASK;
          }
 -        if (arm_feature(env, ARM_FEATURE_VFP)) {
 +        if (cpu_isar_feature(aa32_simd_r16, env_archcpu(env))) {
              /*
               * SFPA is RAZ/WI from NS or if no FPU.
               * FPCA is RO if NSACR.CP10 == 0, RES0 if the FPU is not present.
 diff --git a/target/arm/machine.c b/target/arm/machine.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/machine.c
 +++ b/target/arm/machine.c
 @@ -XXX,XX +XXX,XX @@
- #ifndef HW_PL011_H
+ static bool vfp_needed(void *opaque)
- #define HW_PL011_H
+ {
+     ARMCPU *cpu = opaque;
-+#include "hw/sysbus.h"
+-    CPUARMState *env = &cpu->env;
-+#include "chardev/char-fe.h"
-+
+-    return arm_feature(env, ARM_FEATURE_VFP);
-+#define TYPE_PL011 "pl011"
++    return cpu_isar_feature(aa32_simd_r16, cpu);
 +#define PL011(obj) OBJECT_CHECK(PL011State, (obj), TYPE_PL011)
 +
 +/* This shares the same struct (and cast macro) as the base pl011 device */
 +#define TYPE_PL011_LUMINARY "pl011_luminary"
 +
 +typedef struct PL011State {
 +    SysBusDevice parent_obj;
 +
 +    MemoryRegion iomem;
 +    uint32_t readbuff;
 +    uint32_t flags;
 +    uint32_t lcr;
 +    uint32_t rsr;
 +    uint32_t cr;
 +    uint32_t dmacr;
 +    uint32_t int_enabled;
 +    uint32_t int_level;
 +    uint32_t read_fifo[16];
 +    uint32_t ilpr;
 +    uint32_t ibrd;
 +    uint32_t fbrd;
 +    uint32_t ifl;
 +    int read_pos;
 +    int read_count;
 +    int read_trigger;
 +    CharBackend chr;
 +    qemu_irq irq;
 +    const unsigned char *id;
 +} PL011State;
 +
  static inline DeviceState *pl011_create(hwaddr addr,
                                          qemu_irq irq,
                                          Chardev *chr)
 diff --git a/hw/char/pl011.c b/hw/char/pl011.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/char/pl011.c
 +++ b/hw/char/pl011.c
@@ -XXX,XX +XXX,XX @@
   */
  #include "qemu/osdep.h"
 +#include "hw/char/pl011.h"
  #include "hw/sysbus.h"
  #include "chardev/char-fe.h"
  #include "qemu/log.h"
  #include "trace.h"
 -#define TYPE_PL011 "pl011"
 -#define PL011(obj) OBJECT_CHECK(PL011State, (obj), TYPE_PL011)
 -
 -typedef struct PL011State {
 -    SysBusDevice parent_obj;
 -
 -    MemoryRegion iomem;
 -    uint32_t readbuff;
 -    uint32_t flags;
 -    uint32_t lcr;
 -    uint32_t rsr;
 -    uint32_t cr;
 -    uint32_t dmacr;
 -    uint32_t int_enabled;
 -    uint32_t int_level;
 -    uint32_t read_fifo[16];
 -    uint32_t ilpr;
 -    uint32_t ibrd;
 -    uint32_t fbrd;
 -    uint32_t ifl;
 -    int read_pos;
 -    int read_count;
 -    int read_trigger;
 -    CharBackend chr;
 -    qemu_irq irq;
 -    const unsigned char *id;
 -} PL011State;
 -
  #define PL011_INT_TX 0x20
  #define PL011_INT_RX 0x10
@@ -XXX,XX +XXX,XX @@ static void pl011_luminary_init(Object *obj)
  }
- static const TypeInfo pl011_luminary_info = {
+ static int get_fpscr(QEMUFile *f, void *opaque, size_t size,
 -    .name          = "pl011_luminary",
 +    .name          = TYPE_PL011_LUMINARY,
      .parent        = TYPE_PL011,
      .instance_init = pl011_luminary_init,
  };
 --
 .20.1

-New patch
+[PULL 48/52] target/arm: Rename isar_feature_aa32_fpdp_v2
+From: Richard Henderson <richard.henderson@linaro.org>
 The old name, isar_feature_aa32_fpdp, does not reflect
 that the test includes VFPv2.  We will introduce further
 feature tests for VFPv3.
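
For readers following along, here is a minimal sketch of why the "> 0"
test amounts to "VFPv2 or later" and what a VFPv3 test would look like.
It assumes the architectural MVFR0 layout (FPSP in bits [7:4], FPDP in
bits [11:8]) and the usual encoding where 1 means VFPv2 and 2 means
VFPv3 or later. The helper names (id_field4, sketch_fpdp_v2,
sketch_fpdp_v3) are hypothetical and are not the names used in cpu.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed MVFR0 field positions: FPSP is bits [7:4], FPDP is bits [11:8]. */
#define MVFR0_FPSP_SHIFT 4
#define MVFR0_FPDP_SHIFT 8

/* Hypothetical helper: extract one 4-bit ID register field. */
static inline uint32_t id_field4(uint32_t reg, unsigned shift)
{
    assert(shift <= 28);
    return (reg >> shift) & 0xf;
}

/* True for any double-precision support, i.e. VFPv2 or later. */
static inline bool sketch_fpdp_v2(uint32_t mvfr0)
{
    return id_field4(mvfr0, MVFR0_FPDP_SHIFT) > 0;
}

/* True only when the field reports VFPv3 or later. */
static inline bool sketch_fpdp_v3(uint32_t mvfr0)
{
    return id_field4(mvfr0, MVFR0_FPDP_SHIFT) >= 2;
}
```

With an MVFR0 value of 0x00000100 the first predicate is true and the
second is false, which is the distinction the old name hid.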
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Message-id: 20200214181547.21408-7-richard.henderson@linaro.org
 [PMM: fixed grammar in commit message]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/cpu.h               |  4 ++--
  target/arm/translate-vfp.inc.c | 40 +++++++++++++++++-----------------
  2 files changed, 22 insertions(+), 22 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fpshvec(const ARMISARegisters *id)
      return FIELD_EX32(id->mvfr0, MVFR0, FPSHVEC) > 0;
  }
 -static inline bool isar_feature_aa32_fpdp(const ARMISARegisters *id)
 +static inline bool isar_feature_aa32_fpdp_v2(const ARMISARegisters *id)
  {
 -    /* Return true if CPU supports double precision floating point */
 +    /* Return true if CPU supports double precision floating point, VFPv2 */
      return FIELD_EX32(id->mvfr0, MVFR0, FPDP) > 0;
  }
 diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.inc.c
 +++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          return false;
      }
 -    if (dp && !dc_isar_feature(aa32_fpdp, s)) {
 +    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
          return false;
      }
 -    if (dp && !dc_isar_feature(aa32_fpdp, s)) {
 +    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          return false;
      }
 -    if (dp && !dc_isar_feature(aa32_fpdp, s)) {
 +    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          return false;
      }
 -    if (dp && !dc_isar_feature(aa32_fpdp, s)) {
 +    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp, s)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 --
 .20.1

-[Qemu-devel] [PULL 05/21] target/arm: Split out vfp_helper.c
+[PULL 49/52] target/arm: Add isar_feature_aa32_{fpsp_v2, fpsp_v3, fpdp_v3}
 From: Richard Henderson <richard.henderson@linaro.org>
-Move all of the fp helpers out of helper.c into a new file.
+We will shortly use these to test for VFPv2 and VFPv3
-This is code movement only.  Since helper.c has no copyright
+in different situations.
 header, take the one from cpu.h for the new file.
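
As an illustration of "different situations", the sketch below gates a
few instruction classes on the VFP version reported by MVFR0. It is
only a sketch under stated assumptions: the enum, the helper names and
the sketch_insn_allowed() function are hypothetical and do not appear
in the tree. It assumes the architectural field layout (FPSP in bits
[7:4], FPDP in bits [11:8]) and that VMOV (immediate) first appeared in
VFPv3 while basic arithmetic is available from VFPv2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical instruction classes, for illustration only. */
enum sketch_insn {
    SKETCH_VADD_SP,      /* basic single-precision arithmetic: VFPv2 */
    SKETCH_VMOV_IMM_SP,  /* single-precision VMOV immediate: VFPv3 */
    SKETCH_VADD_DP,      /* basic double-precision arithmetic: VFPv2 */
    SKETCH_VMOV_IMM_DP,  /* double-precision VMOV immediate: VFPv3 */
};

/* Assumed MVFR0 layout: FPSP in bits [7:4], FPDP in bits [11:8]. */
static inline unsigned sketch_mvfr0_fpsp(uint32_t mvfr0)
{
    return (mvfr0 >> 4) & 0xf;
}

static inline unsigned sketch_mvfr0_fpdp(uint32_t mvfr0)
{
    return (mvfr0 >> 8) & 0xf;
}

/* Return true if the instruction class should decode on this CPU. */
static bool sketch_insn_allowed(enum sketch_insn insn, uint32_t mvfr0)
{
    switch (insn) {
    case SKETCH_VADD_SP:
        return sketch_mvfr0_fpsp(mvfr0) > 0;   /* VFPv2 or later */
    case SKETCH_VMOV_IMM_SP:
        return sketch_mvfr0_fpsp(mvfr0) >= 2;  /* VFPv3 or later */
    case SKETCH_VADD_DP:
        return sketch_mvfr0_fpdp(mvfr0) > 0;   /* VFPv2 or later */
    case SKETCH_VMOV_IMM_DP:
        return sketch_mvfr0_fpdp(mvfr0) >= 2;  /* VFPv3 or later */
    }
    return false;
}
```

Keeping the single- and double-precision checks separate matters for
CPUs that implement only single precision: they can report VFPv3 in
FPSP while FPDP stays zero.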
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20190215192302.27855-3-richard.henderson@linaro.org
+Message-id: 20200214181547.21408-8-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/Makefile.objs |    2 +-
+ target/arm/cpu.h | 18 ++++++++++++++++++
- target/arm/helper.c      | 1062 -------------------------------------
+ 1 file changed, 18 insertions(+)
  target/arm/vfp_helper.c  | 1088 ++++++++++++++++++++++++++++++++++++++
 files changed, 1089 insertions(+), 1063 deletions(-)
  create mode 100644 target/arm/vfp_helper.c
-diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/Makefile.objs
+--- a/target/arm/cpu.h
-+++ b/target/arm/Makefile.objs
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ obj-$(call land,$(CONFIG_KVM),$(call lnot,$(TARGET_AARCH64))) += kvm32.o
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fpshvec(const ARMISARegisters *id)
- obj-$(call land,$(CONFIG_KVM),$(TARGET_AARCH64)) += kvm64.o
+     return FIELD_EX32(id->mvfr0, MVFR0, FPSHVEC) > 0;
  obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
  obj-y += translate.o op_helper.o helper.o cpu.o
 -obj-y += neon_helper.o iwmmxt_helper.o vec_helper.o
 +obj-y += neon_helper.o iwmmxt_helper.o vec_helper.o vfp_helper.o
  obj-y += gdbstub.o
  obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
  obj-$(TARGET_AARCH64) += pauth_helper.o
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
      return (a & mask) | (b & ~mask);
  }
--/* VFP support.  We follow the convention used for VFP instructions:
++static inline bool isar_feature_aa32_fpsp_v2(const ARMISARegisters *id)
 -   Single precision routines have a "s" suffix, double precision a
 -   "d" suffix.  */
 -
 -/* Convert host exception flags to vfp form.  */
 -static inline int vfp_exceptbits_from_host(int host_bits)
 -{
 -    int target_bits = 0;
 -
 -    if (host_bits & float_flag_invalid)
 -        target_bits |= 1;
 -    if (host_bits & float_flag_divbyzero)
 -        target_bits |= 2;
 -    if (host_bits & float_flag_overflow)
 -        target_bits |= 4;
 -    if (host_bits & (float_flag_underflow | float_flag_output_denormal))
 -        target_bits |= 8;
 -    if (host_bits & float_flag_inexact)
 -        target_bits |= 0x10;
 -    if (host_bits & float_flag_input_denormal)
 -        target_bits |= 0x80;
 -    return target_bits;
 -}
 -
 -uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
 -{
 -    uint32_t i, fpscr;
 -
 -    fpscr = env->vfp.xregs[ARM_VFP_FPSCR]
 -            | (env->vfp.vec_len << 16)
 -            | (env->vfp.vec_stride << 20);
 -
 -    i = get_float_exception_flags(&env->vfp.fp_status);
 -    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
 -    /* FZ16 does not generate an input denormal exception.  */
 -    i |= (get_float_exception_flags(&env->vfp.fp_status_f16)
 -          & ~float_flag_input_denormal);
 -    fpscr |= vfp_exceptbits_from_host(i);
 -
 -    i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
 -    fpscr |= i ? FPCR_QC : 0;
 -
 -    return fpscr;
 -}
 -
 -uint32_t vfp_get_fpscr(CPUARMState *env)
 -{
 -    return HELPER(vfp_get_fpscr)(env);
 -}
 -
 -/* Convert vfp exception flags to target form.  */
 -static inline int vfp_exceptbits_to_host(int target_bits)
 -{
 -    int host_bits = 0;
 -
 -    if (target_bits & 1)
 -        host_bits |= float_flag_invalid;
 -    if (target_bits & 2)
 -        host_bits |= float_flag_divbyzero;
 -    if (target_bits & 4)
 -        host_bits |= float_flag_overflow;
 -    if (target_bits & 8)
 -        host_bits |= float_flag_underflow;
 -    if (target_bits & 0x10)
 -        host_bits |= float_flag_inexact;
 -    if (target_bits & 0x80)
 -        host_bits |= float_flag_input_denormal;
 -    return host_bits;
 -}
 -
 -void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
 -{
 -    int i;
 -    uint32_t changed = env->vfp.xregs[ARM_VFP_FPSCR];
 -
 -    /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
 -    if (!cpu_isar_feature(aa64_fp16, arm_env_get_cpu(env))) {
 -        val &= ~FPCR_FZ16;
 -    }
 -
 -    /*
 -     * We don't implement trapped exception handling, so the
 -     * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
 -     *
 -     * If we exclude the exception flags, IOC|DZC|OFC|UFC|IXC|IDC
 -     * (which are stored in fp_status), and the other RES0 bits
 -     * in between, then we clear all of the low 16 bits.
 -     */
 -    env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
 -    env->vfp.vec_len = (val >> 16) & 7;
 -    env->vfp.vec_stride = (val >> 20) & 3;
 -
 -    /*
 -     * The bit we set within fpscr_q is arbitrary; the register as a
 -     * whole being zero/non-zero is what counts.
 -     */
 -    env->vfp.qc[0] = val & FPCR_QC;
 -    env->vfp.qc[1] = 0;
 -    env->vfp.qc[2] = 0;
 -    env->vfp.qc[3] = 0;
 -
 -    changed ^= val;
 -    if (changed & (3 << 22)) {
 -        i = (val >> 22) & 3;
 -        switch (i) {
 -        case FPROUNDING_TIEEVEN:
 -            i = float_round_nearest_even;
 -            break;
 -        case FPROUNDING_POSINF:
 -            i = float_round_up;
 -            break;
 -        case FPROUNDING_NEGINF:
 -            i = float_round_down;
 -            break;
 -        case FPROUNDING_ZERO:
 -            i = float_round_to_zero;
 -            break;
 -        }
 -        set_float_rounding_mode(i, &env->vfp.fp_status);
 -        set_float_rounding_mode(i, &env->vfp.fp_status_f16);
 -    }
 -    if (changed & FPCR_FZ16) {
 -        bool ftz_enabled = val & FPCR_FZ16;
 -        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
 -    }
 -    if (changed & FPCR_FZ) {
 -        bool ftz_enabled = val & FPCR_FZ;
 -        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status);
 -        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status);
 -    }
 -    if (changed & FPCR_DN) {
 -        bool dnan_enabled = val & FPCR_DN;
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status);
 -        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16);
 -    }
 -
 -    /* The exception flags are ORed together when we read fpscr so we
 -     * only need to preserve the current state in one of our
 -     * float_status values.
 -     */
 -    i = vfp_exceptbits_to_host(val);
 -    set_float_exception_flags(i, &env->vfp.fp_status);
 -    set_float_exception_flags(0, &env->vfp.fp_status_f16);
 -    set_float_exception_flags(0, &env->vfp.standard_fp_status);
 -}
 -
 -void vfp_set_fpscr(CPUARMState *env, uint32_t val)
 -{
 -    HELPER(vfp_set_fpscr)(env, val);
 -}
 -
 -#define VFP_HELPER(name, p) HELPER(glue(glue(vfp_,name),p))
 -
 -#define VFP_BINOP(name) \
 -float32 VFP_HELPER(name, s)(float32 a, float32 b, void *fpstp) \
 -{ \
 -    float_status *fpst = fpstp; \
 -    return float32_ ## name(a, b, fpst); \
 -} \
 -float64 VFP_HELPER(name, d)(float64 a, float64 b, void *fpstp) \
 -{ \
 -    float_status *fpst = fpstp; \
 -    return float64_ ## name(a, b, fpst); \
 -}
 -VFP_BINOP(add)
 -VFP_BINOP(sub)
 -VFP_BINOP(mul)
 -VFP_BINOP(div)
 -VFP_BINOP(min)
 -VFP_BINOP(max)
 -VFP_BINOP(minnum)
 -VFP_BINOP(maxnum)
 -#undef VFP_BINOP
 -
 -float32 VFP_HELPER(neg, s)(float32 a)
 -{
 -    return float32_chs(a);
 -}
 -
 -float64 VFP_HELPER(neg, d)(float64 a)
 -{
 -    return float64_chs(a);
 -}
 -
 -float32 VFP_HELPER(abs, s)(float32 a)
 -{
 -    return float32_abs(a);
 -}
 -
 -float64 VFP_HELPER(abs, d)(float64 a)
 -{
 -    return float64_abs(a);
 -}
 -
 -float32 VFP_HELPER(sqrt, s)(float32 a, CPUARMState *env)
 -{
 -    return float32_sqrt(a, &env->vfp.fp_status);
 -}
 -
 -float64 VFP_HELPER(sqrt, d)(float64 a, CPUARMState *env)
 -{
 -    return float64_sqrt(a, &env->vfp.fp_status);
 -}
 -
 -static void softfloat_to_vfp_compare(CPUARMState *env, int cmp)
 -{
 -    uint32_t flags;
 -    switch (cmp) {
 -    case float_relation_equal:
 -        flags = 0x6;
 -        break;
 -    case float_relation_less:
 -        flags = 0x8;
 -        break;
 -    case float_relation_greater:
 -        flags = 0x2;
 -        break;
 -    case float_relation_unordered:
 -        flags = 0x3;
 -        break;
 -    default:
 -        g_assert_not_reached();
 -    }
 -    env->vfp.xregs[ARM_VFP_FPSCR] =
 -        deposit32(env->vfp.xregs[ARM_VFP_FPSCR], 28, 4, flags);
 -}
 -
 -/* XXX: check quiet/signaling case */
 -#define DO_VFP_cmp(p, type) \
 -void VFP_HELPER(cmp, p)(type a, type b, CPUARMState *env)  \
 -{ \
 -    softfloat_to_vfp_compare(env, \
 -        type ## _compare_quiet(a, b, &env->vfp.fp_status)); \
 -} \
 -void VFP_HELPER(cmpe, p)(type a, type b, CPUARMState *env) \
 -{ \
 -    softfloat_to_vfp_compare(env, \
 -        type ## _compare(a, b, &env->vfp.fp_status)); \
 -}
 -DO_VFP_cmp(s, float32)
 -DO_VFP_cmp(d, float64)
 -#undef DO_VFP_cmp
 -
 -/* Integer to float and float to integer conversions */
 -
 -#define CONV_ITOF(name, ftype, fsz, sign)                           \
 -ftype HELPER(name)(uint32_t x, void *fpstp)                         \
 -{                                                                   \
 -    float_status *fpst = fpstp;                                     \
 -    return sign##int32_to_##float##fsz((sign##int32_t)x, fpst);     \
 -}
 -
 -#define CONV_FTOI(name, ftype, fsz, sign, round)                \
 -sign##int32_t HELPER(name)(ftype x, void *fpstp)                \
 -{                                                               \
 -    float_status *fpst = fpstp;                                 \
 -    if (float##fsz##_is_any_nan(x)) {                           \
 -        float_raise(float_flag_invalid, fpst);                  \
 -        return 0;                                               \
 -    }                                                           \
 -    return float##fsz##_to_##sign##int32##round(x, fpst);       \
 -}
 -
 -#define FLOAT_CONVS(name, p, ftype, fsz, sign)            \
 -    CONV_ITOF(vfp_##name##to##p, ftype, fsz, sign)        \
 -    CONV_FTOI(vfp_to##name##p, ftype, fsz, sign, )        \
 -    CONV_FTOI(vfp_to##name##z##p, ftype, fsz, sign, _round_to_zero)
 -
 -FLOAT_CONVS(si, h, uint32_t, 16, )
 -FLOAT_CONVS(si, s, float32, 32, )
 -FLOAT_CONVS(si, d, float64, 64, )
 -FLOAT_CONVS(ui, h, uint32_t, 16, u)
 -FLOAT_CONVS(ui, s, float32, 32, u)
 -FLOAT_CONVS(ui, d, float64, 64, u)
 -
 -#undef CONV_ITOF
 -#undef CONV_FTOI
 -#undef FLOAT_CONVS
 -
 -/* floating point conversion */
 -float64 VFP_HELPER(fcvtd, s)(float32 x, CPUARMState *env)
 -{
 -    return float32_to_float64(x, &env->vfp.fp_status);
 -}
 -
 -float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
 -{
 -    return float64_to_float32(x, &env->vfp.fp_status);
 -}
 -
 -/* VFP3 fixed point conversion.  */
 -#define VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype) \
 -float##fsz HELPER(vfp_##name##to##p)(uint##isz##_t  x, uint32_t shift, \
 -                                     void *fpstp) \
 -{ return itype##_to_##float##fsz##_scalbn(x, -shift, fpstp); }
 -
 -#define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype, ROUND, suff)   \
 -uint##isz##_t HELPER(vfp_to##name##p##suff)(float##fsz x, uint32_t shift, \
 -                                            void *fpst)                   \
 -{                                                                         \
 -    if (unlikely(float##fsz##_is_any_nan(x))) {                           \
 -        float_raise(float_flag_invalid, fpst);                            \
 -        return 0;                                                         \
 -    }                                                                     \
 -    return float##fsz##_to_##itype##_scalbn(x, ROUND, shift, fpst);       \
 -}
 -
 -#define VFP_CONV_FIX(name, p, fsz, isz, itype)                   \
 -VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
 -VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
 -                         float_round_to_zero, _round_to_zero)    \
 -VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
 -                         get_float_rounding_mode(fpst), )
 -
 -#define VFP_CONV_FIX_A64(name, p, fsz, isz, itype)               \
 -VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
 -VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
 -                         get_float_rounding_mode(fpst), )
 -
 -VFP_CONV_FIX(sh, d, 64, 64, int16)
 -VFP_CONV_FIX(sl, d, 64, 64, int32)
 -VFP_CONV_FIX_A64(sq, d, 64, 64, int64)
 -VFP_CONV_FIX(uh, d, 64, 64, uint16)
 -VFP_CONV_FIX(ul, d, 64, 64, uint32)
 -VFP_CONV_FIX_A64(uq, d, 64, 64, uint64)
 -VFP_CONV_FIX(sh, s, 32, 32, int16)
 -VFP_CONV_FIX(sl, s, 32, 32, int32)
 -VFP_CONV_FIX_A64(sq, s, 32, 64, int64)
 -VFP_CONV_FIX(uh, s, 32, 32, uint16)
 -VFP_CONV_FIX(ul, s, 32, 32, uint32)
 -VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
 -
 -#undef VFP_CONV_FIX
 -#undef VFP_CONV_FIX_FLOAT
 -#undef VFP_CONV_FLOAT_FIX_ROUND
 -#undef VFP_CONV_FIX_A64
 -
 -uint32_t HELPER(vfp_sltoh)(uint32_t x, uint32_t shift, void *fpst)
 -{
 -    return int32_to_float16_scalbn(x, -shift, fpst);
 -}
 -
 -uint32_t HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
 -{
 -    return uint32_to_float16_scalbn(x, -shift, fpst);
 -}
 -
 -uint32_t HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
 -{
 -    return int64_to_float16_scalbn(x, -shift, fpst);
 -}
 -
 -uint32_t HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
 -{
 -    return uint64_to_float16_scalbn(x, -shift, fpst);
 -}
 -
 -uint32_t HELPER(vfp_toshh)(uint32_t x, uint32_t shift, void *fpst)
 -{
 -    if (unlikely(float16_is_any_nan(x))) {
 -        float_raise(float_flag_invalid, fpst);
 -        return 0;
 -    }
 -    return float16_to_int16_scalbn(x, get_float_rounding_mode(fpst),
 -                                   shift, fpst);
 -}
 -
 -uint32_t HELPER(vfp_touhh)(uint32_t x, uint32_t shift, void *fpst)
 -{
 -    if (unlikely(float16_is_any_nan(x))) {
 -        float_raise(float_flag_invalid, fpst);
 -        return 0;
 -    }
 -    return float16_to_uint16_scalbn(x, get_float_rounding_mode(fpst),
 -                                    shift, fpst);
 -}
 -
 -uint32_t HELPER(vfp_toslh)(uint32_t x, uint32_t shift, void *fpst)
 -{
 -    if (unlikely(float16_is_any_nan(x))) {
 -        float_raise(float_flag_invalid, fpst);
 -        return 0;
 -    }
 -    return float16_to_int32_scalbn(x, get_float_rounding_mode(fpst),
 -                                   shift, fpst);
 -}
 -
 -uint32_t HELPER(vfp_toulh)(uint32_t x, uint32_t shift, void *fpst)
 -{
 -    if (unlikely(float16_is_any_nan(x))) {
 -        float_raise(float_flag_invalid, fpst);
 -        return 0;
 -    }
 -    return float16_to_uint32_scalbn(x, get_float_rounding_mode(fpst),
 -                                    shift, fpst);
 -}
 -
 -uint64_t HELPER(vfp_tosqh)(uint32_t x, uint32_t shift, void *fpst)
 -{
 -    if (unlikely(float16_is_any_nan(x))) {
 -        float_raise(float_flag_invalid, fpst);
 -        return 0;
 -    }
 -    return float16_to_int64_scalbn(x, get_float_rounding_mode(fpst),
 -                                   shift, fpst);
 -}
 -
 -uint64_t HELPER(vfp_touqh)(uint32_t x, uint32_t shift, void *fpst)
 -{
 -    if (unlikely(float16_is_any_nan(x))) {
 -        float_raise(float_flag_invalid, fpst);
 -        return 0;
 -    }
 -    return float16_to_uint64_scalbn(x, get_float_rounding_mode(fpst),
 -                                    shift, fpst);
 -}
 -
 -/* Set the current fp rounding mode and return the old one.
 - * The argument is a softfloat float_round_ value.
 - */
 -uint32_t HELPER(set_rmode)(uint32_t rmode, void *fpstp)
 -{
 -    float_status *fp_status = fpstp;
 -
 -    uint32_t prev_rmode = get_float_rounding_mode(fp_status);
 -    set_float_rounding_mode(rmode, fp_status);
 -
 -    return prev_rmode;
 -}
 -
 -/* Set the current fp rounding mode in the standard fp status and return
 - * the old one. This is for NEON instructions that need to change the
 - * rounding mode but wish to use the standard FPSCR values for everything
 - * else. Always set the rounding mode back to the correct value after
 - * modifying it.
 - * The argument is a softfloat float_round_ value.
 - */
 -uint32_t HELPER(set_neon_rmode)(uint32_t rmode, CPUARMState *env)
 -{
 -    float_status *fp_status = &env->vfp.standard_fp_status;
 -
 -    uint32_t prev_rmode = get_float_rounding_mode(fp_status);
 -    set_float_rounding_mode(rmode, fp_status);
 -
 -    return prev_rmode;
 -}
 -
 -/* Half precision conversions.  */
 -float32 HELPER(vfp_fcvt_f16_to_f32)(uint32_t a, void *fpstp, uint32_t ahp_mode)
 -{
 -    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
 -     * it would affect flushing input denormals.
 -     */
 -    float_status *fpst = fpstp;
 -    flag save = get_flush_inputs_to_zero(fpst);
 -    set_flush_inputs_to_zero(false, fpst);
 -    float32 r = float16_to_float32(a, !ahp_mode, fpst);
 -    set_flush_inputs_to_zero(save, fpst);
 -    return r;
 -}
 -
 -uint32_t HELPER(vfp_fcvt_f32_to_f16)(float32 a, void *fpstp, uint32_t ahp_mode)
 -{
 -    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
 -     * it would affect flushing output denormals.
 -     */
 -    float_status *fpst = fpstp;
 -    flag save = get_flush_to_zero(fpst);
 -    set_flush_to_zero(false, fpst);
 -    float16 r = float32_to_float16(a, !ahp_mode, fpst);
 -    set_flush_to_zero(save, fpst);
 -    return r;
 -}
 -
 -float64 HELPER(vfp_fcvt_f16_to_f64)(uint32_t a, void *fpstp, uint32_t ahp_mode)
 -{
 -    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
 -     * it would affect flushing input denormals.
 -     */
 -    float_status *fpst = fpstp;
 -    flag save = get_flush_inputs_to_zero(fpst);
 -    set_flush_inputs_to_zero(false, fpst);
 -    float64 r = float16_to_float64(a, !ahp_mode, fpst);
 -    set_flush_inputs_to_zero(save, fpst);
 -    return r;
 -}
 -
 -uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
 -{
 -    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
 -     * it would affect flushing output denormals.
 -     */
 -    float_status *fpst = fpstp;
 -    flag save = get_flush_to_zero(fpst);
 -    set_flush_to_zero(false, fpst);
 -    float16 r = float64_to_float16(a, !ahp_mode, fpst);
 -    set_flush_to_zero(save, fpst);
 -    return r;
 -}
 -
 -#define float32_two make_float32(0x40000000)
 -#define float32_three make_float32(0x40400000)
 -#define float32_one_point_five make_float32(0x3fc00000)
 -
 -float32 HELPER(recps_f32)(float32 a, float32 b, CPUARMState *env)
 -{
 -    float_status *s = &env->vfp.standard_fp_status;
 -    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
 -        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
 -        if (!(float32_is_zero(a) || float32_is_zero(b))) {
 -            float_raise(float_flag_input_denormal, s);
 -        }
 -        return float32_two;
 -    }
 -    return float32_sub(float32_two, float32_mul(a, b, s), s);
 -}
 -
 -float32 HELPER(rsqrts_f32)(float32 a, float32 b, CPUARMState *env)
 -{
 -    float_status *s = &env->vfp.standard_fp_status;
 -    float32 product;
 -    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
 -        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
 -        if (!(float32_is_zero(a) || float32_is_zero(b))) {
 -            float_raise(float_flag_input_denormal, s);
 -        }
 -        return float32_one_point_five;
 -    }
 -    product = float32_mul(a, b, s);
 -    return float32_div(float32_sub(float32_three, product, s), float32_two, s);
 -}
 -
 -/* NEON helpers.  */
 -
 -/* Constants 256 and 512 are used in some helpers; we avoid relying on
 - * int->float conversions at run-time.  */
 -#define float64_256 make_float64(0x4070000000000000LL)
 -#define float64_512 make_float64(0x4080000000000000LL)
 -#define float16_maxnorm make_float16(0x7bff)
 -#define float32_maxnorm make_float32(0x7f7fffff)
 -#define float64_maxnorm make_float64(0x7fefffffffffffffLL)
 -
 -/* Reciprocal functions
 - *
 - * The algorithm that must be used to calculate the estimate
 - * is specified by the ARM ARM, see FPRecipEstimate()/RecipEstimate
 - */
 -
 -/* See RecipEstimate()
 - *
 - * input is a 9 bit fixed point number
 - * input range 256 .. 511 for a number from 0.5 <= x < 1.0.
 - * result range 256 .. 511 for a number from 1.0 to 511/256.
 - */
 -
 -static int recip_estimate(int input)
 -{
 -    int a, b, r;
 -    assert(256 <= input && input < 512);
 -    a = (input * 2) + 1;
 -    b = (1 << 19) / a;
 -    r = (b + 1) >> 1;
 -    assert(256 <= r && r < 512);
 -    return r;
 -}
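As a standalone illustration of the RecipEstimate() arithmetic described in the comment above, here is a minimal sketch that compiles outside QEMU. The name `recip_estimate_sketch` is hypothetical and not part of this patch; it only mirrors the integer steps of `recip_estimate()` so that the ends of the 256..511 range can be checked by hand.

```c
#include <assert.h>

/*
 * Hypothetical standalone copy of the integer steps in recip_estimate().
 * input is a 9-bit fixed-point value in [256, 511], i.e. 0.5 <= x < 1.0.
 */
int recip_estimate_sketch(int input)
{
    int a, b, r;

    assert(256 <= input && input < 512);
    a = (input * 2) + 1;   /* input + 0.5, expressed in half-units */
    b = (1 << 19) / a;     /* truncated reciprocal, one extra low bit */
    r = (b + 1) >> 1;      /* round to nearest and drop the extra bit */
    assert(256 <= r && r < 512);
    return r;
}
```

Under these assumptions, `recip_estimate_sketch(256)` gives 511 and `recip_estimate_sketch(511)` gives 256, which matches the result range stated in the comment.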
 -
 -/*
 - * Common wrapper to call recip_estimate
 - *
 - * The parameters are exponent and 64 bit fraction (without implicit
 - * bit) where the binary point is nominally at bit 52. Returns a
 - * float64 which can then be rounded to the appropriate size by the
 - * callee.
 - */
 -
 -static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
 -{
 -    uint32_t scaled, estimate;
 -    uint64_t result_frac;
 -    int result_exp;
 -
 -    /* Handle sub-normals */
 -    if (*exp == 0) {
 -        if (extract64(frac, 51, 1) == 0) {
 -            *exp = -1;
 -            frac <<= 2;
 -        } else {
 -            frac <<= 1;
 -        }
 -    }
 -
 -    /* scaled = UInt('1':fraction<51:44>) */
 -    scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
 -    estimate = recip_estimate(scaled);
 -
 -    result_exp = exp_off - *exp;
 -    result_frac = deposit64(0, 44, 8, estimate);
 -    if (result_exp == 0) {
 -        result_frac = deposit64(result_frac >> 1, 51, 1, 1);
 -    } else if (result_exp == -1) {
 -        result_frac = deposit64(result_frac >> 2, 50, 2, 1);
 -        result_exp = 0;
 -    }
 -
 -    *exp = result_exp;
 -
 -    return result_frac;
 -}
 -
 -static bool round_to_inf(float_status *fpst, bool sign_bit)
 -{
 -    switch (fpst->float_rounding_mode) {
 -    case float_round_nearest_even: /* Round to Nearest */
 -        return true;
 -    case float_round_up: /* Round to +Inf */
 -        return !sign_bit;
 -    case float_round_down: /* Round to -Inf */
 -        return sign_bit;
 -    case float_round_to_zero: /* Round to Zero */
 -        return false;
 -    }
 -
 -    g_assert_not_reached();
 -}
 -
 -uint32_t HELPER(recpe_f16)(uint32_t input, void *fpstp)
 -{
 -    float_status *fpst = fpstp;
 -    float16 f16 = float16_squash_input_denormal(input, fpst);
 -    uint32_t f16_val = float16_val(f16);
 -    uint32_t f16_sign = float16_is_neg(f16);
 -    int f16_exp = extract32(f16_val, 10, 5);
 -    uint32_t f16_frac = extract32(f16_val, 0, 10);
 -    uint64_t f64_frac;
 -
 -    if (float16_is_any_nan(f16)) {
 -        float16 nan = f16;
 -        if (float16_is_signaling_nan(f16, fpst)) {
 -            float_raise(float_flag_invalid, fpst);
 -            nan = float16_silence_nan(f16, fpst);
 -        }
 -        if (fpst->default_nan_mode) {
 -            nan =  float16_default_nan(fpst);
 -        }
 -        return nan;
 -    } else if (float16_is_infinity(f16)) {
 -        return float16_set_sign(float16_zero, float16_is_neg(f16));
 -    } else if (float16_is_zero(f16)) {
 -        float_raise(float_flag_divbyzero, fpst);
 -        return float16_set_sign(float16_infinity, float16_is_neg(f16));
 -    } else if (float16_abs(f16) < (1 << 8)) {
 -        /* Abs(value) < 2.0^-16 */
 -        float_raise(float_flag_overflow | float_flag_inexact, fpst);
 -        if (round_to_inf(fpst, f16_sign)) {
 -            return float16_set_sign(float16_infinity, f16_sign);
 -        } else {
 -            return float16_set_sign(float16_maxnorm, f16_sign);
 -        }
 -    } else if (f16_exp >= 29 && fpst->flush_to_zero) {
 -        float_raise(float_flag_underflow, fpst);
 -        return float16_set_sign(float16_zero, float16_is_neg(f16));
 -    }
 -
 -    f64_frac = call_recip_estimate(&f16_exp, 29,
 -                                   ((uint64_t) f16_frac) << (52 - 10));
 -
 -    /* result = sign : result_exp<4:0> : fraction<51:42> */
 -    f16_val = deposit32(0, 15, 1, f16_sign);
 -    f16_val = deposit32(f16_val, 10, 5, f16_exp);
 -    f16_val = deposit32(f16_val, 0, 10, extract64(f64_frac, 52 - 10, 10));
 -    return make_float16(f16_val);
 -}
 -
 -float32 HELPER(recpe_f32)(float32 input, void *fpstp)
 -{
 -    float_status *fpst = fpstp;
 -    float32 f32 = float32_squash_input_denormal(input, fpst);
 -    uint32_t f32_val = float32_val(f32);
 -    bool f32_sign = float32_is_neg(f32);
 -    int f32_exp = extract32(f32_val, 23, 8);
 -    uint32_t f32_frac = extract32(f32_val, 0, 23);
 -    uint64_t f64_frac;
 -
 -    if (float32_is_any_nan(f32)) {
 -        float32 nan = f32;
 -        if (float32_is_signaling_nan(f32, fpst)) {
 -            float_raise(float_flag_invalid, fpst);
 -            nan = float32_silence_nan(f32, fpst);
 -        }
 -        if (fpst->default_nan_mode) {
 -            nan =  float32_default_nan(fpst);
 -        }
 -        return nan;
 -    } else if (float32_is_infinity(f32)) {
 -        return float32_set_sign(float32_zero, float32_is_neg(f32));
 -    } else if (float32_is_zero(f32)) {
 -        float_raise(float_flag_divbyzero, fpst);
 -        return float32_set_sign(float32_infinity, float32_is_neg(f32));
 -    } else if (float32_abs(f32) < (1ULL << 21)) {
 -        /* Abs(value) < 2.0^-128 */
 -        float_raise(float_flag_overflow | float_flag_inexact, fpst);
 -        if (round_to_inf(fpst, f32_sign)) {
 -            return float32_set_sign(float32_infinity, f32_sign);
 -        } else {
 -            return float32_set_sign(float32_maxnorm, f32_sign);
 -        }
 -    } else if (f32_exp >= 253 && fpst->flush_to_zero) {
 -        float_raise(float_flag_underflow, fpst);
 -        return float32_set_sign(float32_zero, float32_is_neg(f32));
 -    }
 -
 -    f64_frac = call_recip_estimate(&f32_exp, 253,
 -                                   ((uint64_t) f32_frac) << (52 - 23));
 -
 -    /* result = sign : result_exp<7:0> : fraction<51:29> */
 -    f32_val = deposit32(0, 31, 1, f32_sign);
 -    f32_val = deposit32(f32_val, 23, 8, f32_exp);
 -    f32_val = deposit32(f32_val, 0, 23, extract64(f64_frac, 52 - 23, 23));
 -    return make_float32(f32_val);
 -}
 -
 -float64 HELPER(recpe_f64)(float64 input, void *fpstp)
 -{
 -    float_status *fpst = fpstp;
 -    float64 f64 = float64_squash_input_denormal(input, fpst);
 -    uint64_t f64_val = float64_val(f64);
 -    bool f64_sign = float64_is_neg(f64);
 -    int f64_exp = extract64(f64_val, 52, 11);
 -    uint64_t f64_frac = extract64(f64_val, 0, 52);
 -
 -    /* Deal with any special cases */
 -    if (float64_is_any_nan(f64)) {
 -        float64 nan = f64;
 -        if (float64_is_signaling_nan(f64, fpst)) {
 -            float_raise(float_flag_invalid, fpst);
 -            nan = float64_silence_nan(f64, fpst);
 -        }
 -        if (fpst->default_nan_mode) {
 -            nan =  float64_default_nan(fpst);
 -        }
 -        return nan;
 -    } else if (float64_is_infinity(f64)) {
 -        return float64_set_sign(float64_zero, float64_is_neg(f64));
 -    } else if (float64_is_zero(f64)) {
 -        float_raise(float_flag_divbyzero, fpst);
 -        return float64_set_sign(float64_infinity, float64_is_neg(f64));
 -    } else if ((f64_val & ~(1ULL << 63)) < (1ULL << 50)) {
 -        /* Abs(value) < 2.0^-1024 */
 -        float_raise(float_flag_overflow | float_flag_inexact, fpst);
 -        if (round_to_inf(fpst, f64_sign)) {
 -            return float64_set_sign(float64_infinity, f64_sign);
 -        } else {
 -            return float64_set_sign(float64_maxnorm, f64_sign);
 -        }
 -    } else if (f64_exp >= 2045 && fpst->flush_to_zero) {
 -        float_raise(float_flag_underflow, fpst);
 -        return float64_set_sign(float64_zero, float64_is_neg(f64));
 -    }
 -
 -    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac);
 -
 -    /* result = sign : result_exp<10:0> : fraction<51:0>; */
 -    f64_val = deposit64(0, 63, 1, f64_sign);
 -    f64_val = deposit64(f64_val, 52, 11, f64_exp);
 -    f64_val = deposit64(f64_val, 0, 52, f64_frac);
 -    return make_float64(f64_val);
 -}
 -
 -/* The algorithm that must be used to calculate the estimate
 - * is specified by the ARM ARM.
 - */
 -
 -static int do_recip_sqrt_estimate(int a)
 -{
 -    int b, estimate;
 -
 -    assert(128 <= a && a < 512);
 -    if (a < 256) {
 -        a = a * 2 + 1;
 -    } else {
 -        a = (a >> 1) << 1;
 -        a = (a + 1) * 2;
 -    }
 -    b = 512;
 -    while (a * (b + 1) * (b + 1) < (1 << 28)) {
 -        b += 1;
 -    }
 -    estimate = (b + 1) / 2;
 -    assert(256 <= estimate && estimate < 512);
 -
 -    return estimate;
 -}
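The search loop above can be exercised in isolation. The following is a minimal sketch, assuming only standard C; the name `recip_sqrt_estimate_sketch` is hypothetical and not part of this patch. Inputs 128..255 correspond to the odd-exponent case (operand scaled into 0.25 <= x < 0.5) and 256..511 to the even-exponent case (0.5 <= x < 1.0), as the caller's comments indicate.

```c
#include <assert.h>

/*
 * Hypothetical standalone copy of the steps in do_recip_sqrt_estimate().
 * a is a 9-bit scaled operand in [128, 511].
 */
int recip_sqrt_estimate_sketch(int a)
{
    int b, estimate;

    assert(128 <= a && a < 512);
    if (a < 256) {
        a = a * 2 + 1;          /* keep all bits, add a half-unit */
    } else {
        a = (a >> 1) << 1;      /* clear the lowest bit */
        a = (a + 1) * 2;
    }
    b = 512;
    /* Find the largest b such that a * b * b stays below 2^28. */
    while (a * (b + 1) * (b + 1) < (1 << 28)) {
        b += 1;
    }
    estimate = (b + 1) / 2;     /* round to nearest 9-bit value */
    assert(256 <= estimate && estimate < 512);
    return estimate;
}
```

With this sketch the smallest input, 128, maps to 511 and the largest, 511, maps to 256. The products stay within a 32-bit `int` because the loop stops as soon as the value reaches 2^28.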
 -
 -
 -static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
 -{
 -    int estimate;
 -    uint32_t scaled;
 -
 -    if (*exp == 0) {
 -        while (extract64(frac, 51, 1) == 0) {
 -            frac = frac << 1;
 -            *exp -= 1;
 -        }
 -        frac = extract64(frac, 0, 51) << 1;
 -    }
 -
 -    if (*exp & 1) {
 -        /* scaled = UInt('01':fraction<51:45>) */
 -        scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
 -    } else {
 -        /* scaled = UInt('1':fraction<51:44>) */
 -        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
 -    }
 -    estimate = do_recip_sqrt_estimate(scaled);
 -
 -    *exp = (exp_off - *exp) / 2;
 -    return extract64(estimate, 0, 8) << 44;
 -}
 -
 -uint32_t HELPER(rsqrte_f16)(uint32_t input, void *fpstp)
 -{
 -    float_status *s = fpstp;
 -    float16 f16 = float16_squash_input_denormal(input, s);
 -    uint16_t val = float16_val(f16);
 -    bool f16_sign = float16_is_neg(f16);
 -    int f16_exp = extract32(val, 10, 5);
 -    uint16_t f16_frac = extract32(val, 0, 10);
 -    uint64_t f64_frac;
 -
 -    if (float16_is_any_nan(f16)) {
 -        float16 nan = f16;
 -        if (float16_is_signaling_nan(f16, s)) {
 -            float_raise(float_flag_invalid, s);
 -            nan = float16_silence_nan(f16, s);
 -        }
 -        if (s->default_nan_mode) {
 -            nan =  float16_default_nan(s);
 -        }
 -        return nan;
 -    } else if (float16_is_zero(f16)) {
 -        float_raise(float_flag_divbyzero, s);
 -        return float16_set_sign(float16_infinity, f16_sign);
 -    } else if (f16_sign) {
 -        float_raise(float_flag_invalid, s);
 -        return float16_default_nan(s);
 -    } else if (float16_is_infinity(f16)) {
 -        return float16_zero;
 -    }
 -
 -    /* Scale and normalize to a double-precision value between 0.25 and 1.0,
 -     * preserving the parity of the exponent.  */
 -
 -    f64_frac = ((uint64_t) f16_frac) << (52 - 10);
 -
 -    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac);
 -
 -    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(2) */
 -    val = deposit32(0, 15, 1, f16_sign);
 -    val = deposit32(val, 10, 5, f16_exp);
 -    val = deposit32(val, 2, 8, extract64(f64_frac, 52 - 8, 8));
 -    return make_float16(val);
 -}
 -
 -float32 HELPER(rsqrte_f32)(float32 input, void *fpstp)
 -{
 -    float_status *s = fpstp;
 -    float32 f32 = float32_squash_input_denormal(input, s);
 -    uint32_t val = float32_val(f32);
 -    uint32_t f32_sign = float32_is_neg(f32);
 -    int f32_exp = extract32(val, 23, 8);
 -    uint32_t f32_frac = extract32(val, 0, 23);
 -    uint64_t f64_frac;
 -
 -    if (float32_is_any_nan(f32)) {
 -        float32 nan = f32;
 -        if (float32_is_signaling_nan(f32, s)) {
 -            float_raise(float_flag_invalid, s);
 -            nan = float32_silence_nan(f32, s);
 -        }
 -        if (s->default_nan_mode) {
 -            nan =  float32_default_nan(s);
 -        }
 -        return nan;
 -    } else if (float32_is_zero(f32)) {
 -        float_raise(float_flag_divbyzero, s);
 -        return float32_set_sign(float32_infinity, float32_is_neg(f32));
 -    } else if (float32_is_neg(f32)) {
 -        float_raise(float_flag_invalid, s);
 -        return float32_default_nan(s);
 -    } else if (float32_is_infinity(f32)) {
 -        return float32_zero;
 -    }
 -
 -    /* Scale and normalize to a double-precision value between 0.25 and 1.0,
 -     * preserving the parity of the exponent.  */
 -
 -    f64_frac = ((uint64_t) f32_frac) << 29;
 -
 -    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac);
 -
 -    /* result = sign : result_exp<7:0> : estimate<7:0> : Zeros(15) */
 -    val = deposit32(0, 31, 1, f32_sign);
 -    val = deposit32(val, 23, 8, f32_exp);
 -    val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
 -    return make_float32(val);
 -}
 -
 -float64 HELPER(rsqrte_f64)(float64 input, void *fpstp)
 -{
 -    float_status *s = fpstp;
 -    float64 f64 = float64_squash_input_denormal(input, s);
 -    uint64_t val = float64_val(f64);
 -    bool f64_sign = float64_is_neg(f64);
 -    int f64_exp = extract64(val, 52, 11);
 -    uint64_t f64_frac = extract64(val, 0, 52);
 -
 -    if (float64_is_any_nan(f64)) {
 -        float64 nan = f64;
 -        if (float64_is_signaling_nan(f64, s)) {
 -            float_raise(float_flag_invalid, s);
 -            nan = float64_silence_nan(f64, s);
 -        }
 -        if (s->default_nan_mode) {
 -            nan =  float64_default_nan(s);
 -        }
 -        return nan;
 -    } else if (float64_is_zero(f64)) {
 -        float_raise(float_flag_divbyzero, s);
 -        return float64_set_sign(float64_infinity, float64_is_neg(f64));
 -    } else if (float64_is_neg(f64)) {
 -        float_raise(float_flag_invalid, s);
 -        return float64_default_nan(s);
 -    } else if (float64_is_infinity(f64)) {
 -        return float64_zero;
 -    }
 -
 -    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac);
 -
 -    /* result = sign : result_exp<10:0> : estimate<7:0> : Zeros(44) */
 -    val = deposit64(0, 63, 1, f64_sign);
 -    val = deposit64(val, 52, 11, f64_exp);
 -    val = deposit64(val, 44, 8, extract64(f64_frac, 52 - 8, 8));
 -    return make_float64(val);
 -}
 -
 -uint32_t HELPER(recpe_u32)(uint32_t a, void *fpstp)
 -{
 -    /* float_status *s = fpstp; */
 -    int input, estimate;
 -
 -    if ((a & 0x80000000) == 0) {
 -        return 0xffffffff;
 -    }
 -
 -    input = extract32(a, 23, 9);
 -    estimate = recip_estimate(input);
 -
 -    return deposit32(0, (32 - 9), 9, estimate);
 -}
 -
 -uint32_t HELPER(rsqrte_u32)(uint32_t a, void *fpstp)
 -{
 -    int estimate;
 -
 -    if ((a & 0xc0000000) == 0) {
 -        return 0xffffffff;
 -    }
 -
 -    estimate = do_recip_sqrt_estimate(extract32(a, 23, 9));
 -
 -    return deposit32(0, 23, 9, estimate);
 -}
 -
 -/* VFPv4 fused multiply-accumulate */
 -float32 VFP_HELPER(muladd, s)(float32 a, float32 b, float32 c, void *fpstp)
 -{
 -    float_status *fpst = fpstp;
 -    return float32_muladd(a, b, c, 0, fpst);
 -}
 -
 -float64 VFP_HELPER(muladd, d)(float64 a, float64 b, float64 c, void *fpstp)
 -{
 -    float_status *fpst = fpstp;
 -    return float64_muladd(a, b, c, 0, fpst);
 -}
 -
 -/* ARMv8 round to integral */
 -float32 HELPER(rints_exact)(float32 x, void *fp_status)
 -{
 -    return float32_round_to_int(x, fp_status);
 -}
 -
 -float64 HELPER(rintd_exact)(float64 x, void *fp_status)
 -{
 -    return float64_round_to_int(x, fp_status);
 -}
 -
 -float32 HELPER(rints)(float32 x, void *fp_status)
 -{
 -    int old_flags = get_float_exception_flags(fp_status), new_flags;
 -    float32 ret;
 -
 -    ret = float32_round_to_int(x, fp_status);
 -
 -    /* Suppress any inexact exceptions the conversion produced */
 -    if (!(old_flags & float_flag_inexact)) {
 -        new_flags = get_float_exception_flags(fp_status);
 -        set_float_exception_flags(new_flags & ~float_flag_inexact, fp_status);
 -    }
 -
 -    return ret;
 -}
 -
 -float64 HELPER(rintd)(float64 x, void *fp_status)
 -{
 -    int old_flags = get_float_exception_flags(fp_status), new_flags;
 -    float64 ret;
 -
 -    ret = float64_round_to_int(x, fp_status);
 -
 -    /* Suppress any inexact exceptions the conversion produced */
 -    if (!(old_flags & float_flag_inexact)) {
 -        new_flags = get_float_exception_flags(fp_status);
 -        set_float_exception_flags(new_flags & ~float_flag_inexact, fp_status);
 -    }
 -
 -    return ret;
 -}
 -
 -/* Convert ARM rounding mode to softfloat */
 -int arm_rmode_to_sf(int rmode)
 -{
 -    switch (rmode) {
 -    case FPROUNDING_TIEAWAY:
 -        rmode = float_round_ties_away;
 -        break;
 -    case FPROUNDING_ODD:
 -        /* FIXME: add support for ODD */
 -        qemu_log_mask(LOG_UNIMP, "arm: unimplemented rounding mode: %d\n",
 -                      rmode);
 -        /* fall through for now */
 -    case FPROUNDING_TIEEVEN:
 -    default:
 -        rmode = float_round_nearest_even;
 -        break;
 -    case FPROUNDING_POSINF:
 -        rmode = float_round_up;
 -        break;
 -    case FPROUNDING_NEGINF:
 -        rmode = float_round_down;
 -        break;
 -    case FPROUNDING_ZERO:
 -        rmode = float_round_to_zero;
 -        break;
 -    }
 -    return rmode;
 -}
 -
  /* CRC helpers.
   * The upper bytes of val (above the number specified by 'bytes') must have
   * been zeroed out by the caller.
 diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * ARM VFP floating-point operations
 + *
 + *  Copyright (c) 2003 Fabrice Bellard
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2.1 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/log.h"
 +#include "cpu.h"
 +#include "exec/helper-proto.h"
 +#include "fpu/softfloat.h"
 +#include "internals.h"
 +
 +
 +/* VFP support.  We follow the convention used for VFP instructions:
 +   Single precision routines have an "s" suffix, double precision a
 +   "d" suffix.  */
 +
 +/* Convert host exception flags to vfp form.  */
 +static inline int vfp_exceptbits_from_host(int host_bits)
 +{
-+    int target_bits = 0;
++    /* Return true if CPU supports single precision floating point, VFPv2 */
-+
++    return FIELD_EX32(id->mvfr0, MVFR0, FPSP) > 0;
 +    if (host_bits & float_flag_invalid)
 +        target_bits |= 1;
 +    if (host_bits & float_flag_divbyzero)
 +        target_bits |= 2;
 +    if (host_bits & float_flag_overflow)
 +        target_bits |= 4;
 +    if (host_bits & (float_flag_underflow | float_flag_output_denormal))
 +        target_bits |= 8;
 +    if (host_bits & float_flag_inexact)
 +        target_bits |= 0x10;
 +    if (host_bits & float_flag_input_denormal)
 +        target_bits |= 0x80;
 +    return target_bits;
 +}
 +
-+uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
++static inline bool isar_feature_aa32_fpsp_v3(const ARMISARegisters *id)
 +{
-+    uint32_t i, fpscr;
++    /* Return true if CPU supports single precision floating point, VFPv3 */
-+
++    return FIELD_EX32(id->mvfr0, MVFR0, FPSP) >= 2;
 +    fpscr = env->vfp.xregs[ARM_VFP_FPSCR]
 +            | (env->vfp.vec_len << 16)
 +            | (env->vfp.vec_stride << 20);
 +
 +    i = get_float_exception_flags(&env->vfp.fp_status);
 +    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
 +    /* FZ16 does not generate an input denormal exception.  */
 +    i |= (get_float_exception_flags(&env->vfp.fp_status_f16)
 +          & ~float_flag_input_denormal);
 +    fpscr |= vfp_exceptbits_from_host(i);
 +
 +    i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
 +    fpscr |= i ? FPCR_QC : 0;
 +
 +    return fpscr;
 +}
 +
-+uint32_t vfp_get_fpscr(CPUARMState *env)
+ static inline bool isar_feature_aa32_fpdp_v2(const ARMISARegisters *id)
  {
      /* Return true if CPU supports double precision floating point, VFPv2 */
      return FIELD_EX32(id->mvfr0, MVFR0, FPDP) > 0;
  }
 +static inline bool isar_feature_aa32_fpdp_v3(const ARMISARegisters *id)
 +{
-+    return HELPER(vfp_get_fpscr)(env);
++    /* Return true if CPU supports double precision floating point, VFPv3 */
 +    return FIELD_EX32(id->mvfr0, MVFR0, FPDP) >= 2;
 +}
 +
-+/* Convert vfp exception flags to target form.  */
+ /*
-+static inline int vfp_exceptbits_to_host(int target_bits)
+  * We always set the FP and SIMD FP16 fields to indicate identical
-+{
+  * levels of support (assuming SIMD is implemented at all), so
 +    int host_bits = 0;
 +
 +    if (target_bits & 1)
 +        host_bits |= float_flag_invalid;
 +    if (target_bits & 2)
 +        host_bits |= float_flag_divbyzero;
 +    if (target_bits & 4)
 +        host_bits |= float_flag_overflow;
 +    if (target_bits & 8)
 +        host_bits |= float_flag_underflow;
 +    if (target_bits & 0x10)
 +        host_bits |= float_flag_inexact;
 +    if (target_bits & 0x80)
 +        host_bits |= float_flag_input_denormal;
 +    return host_bits;
 +}
 +
 +void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
 +{
 +    int i;
 +    uint32_t changed = env->vfp.xregs[ARM_VFP_FPSCR];
 +
 +    /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
 +    if (!cpu_isar_feature(aa64_fp16, arm_env_get_cpu(env))) {
 +        val &= ~FPCR_FZ16;
 +    }
 +
 +    /*
 +     * We don't implement trapped exception handling, so the
 +     * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
 +     *
 +     * If we exclude the exception flags, IOC|DZC|OFC|UFC|IXC|IDC
 +     * (which are stored in fp_status), and the other RES0 bits
 +     * in between, then we clear all of the low 16 bits.
 +     */
 +    env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
 +    env->vfp.vec_len = (val >> 16) & 7;
 +    env->vfp.vec_stride = (val >> 20) & 3;
 +
 +    /*
 +     * The bit we set within fpscr_q is arbitrary; the register as a
 +     * whole being zero/non-zero is what counts.
 +     */
 +    env->vfp.qc[0] = val & FPCR_QC;
 +    env->vfp.qc[1] = 0;
 +    env->vfp.qc[2] = 0;
 +    env->vfp.qc[3] = 0;
 +
 +    changed ^= val;
 +    if (changed & (3 << 22)) {
 +        i = (val >> 22) & 3;
 +        switch (i) {
 +        case FPROUNDING_TIEEVEN:
 +            i = float_round_nearest_even;
 +            break;
 +        case FPROUNDING_POSINF:
 +            i = float_round_up;
 +            break;
 +        case FPROUNDING_NEGINF:
 +            i = float_round_down;
 +            break;
 +        case FPROUNDING_ZERO:
 +            i = float_round_to_zero;
 +            break;
 +        }
 +        set_float_rounding_mode(i, &env->vfp.fp_status);
 +        set_float_rounding_mode(i, &env->vfp.fp_status_f16);
 +    }
 +    if (changed & FPCR_FZ16) {
 +        bool ftz_enabled = val & FPCR_FZ16;
 +        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
 +        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
 +    }
 +    if (changed & FPCR_FZ) {
 +        bool ftz_enabled = val & FPCR_FZ;
 +        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status);
 +        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status);
 +    }
 +    if (changed & FPCR_DN) {
 +        bool dnan_enabled = val & FPCR_DN;
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status);
 +        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16);
 +    }
 +
 +    /* The exception flags are ORed together when we read fpscr so we
 +     * only need to preserve the current state in one of our
 +     * float_status values.
 +     */
 +    i = vfp_exceptbits_to_host(val);
 +    set_float_exception_flags(i, &env->vfp.fp_status);
 +    set_float_exception_flags(0, &env->vfp.fp_status_f16);
 +    set_float_exception_flags(0, &env->vfp.standard_fp_status);
 +}
 +
 +void vfp_set_fpscr(CPUARMState *env, uint32_t val)
 +{
 +    HELPER(vfp_set_fpscr)(env, val);
 +}
 +
 +#define VFP_HELPER(name, p) HELPER(glue(glue(vfp_,name),p))
 +
 +#define VFP_BINOP(name) \
 +float32 VFP_HELPER(name, s)(float32 a, float32 b, void *fpstp) \
 +{ \
 +    float_status *fpst = fpstp; \
 +    return float32_ ## name(a, b, fpst); \
 +} \
 +float64 VFP_HELPER(name, d)(float64 a, float64 b, void *fpstp) \
 +{ \
 +    float_status *fpst = fpstp; \
 +    return float64_ ## name(a, b, fpst); \
 +}
 +VFP_BINOP(add)
 +VFP_BINOP(sub)
 +VFP_BINOP(mul)
 +VFP_BINOP(div)
 +VFP_BINOP(min)
 +VFP_BINOP(max)
 +VFP_BINOP(minnum)
 +VFP_BINOP(maxnum)
 +#undef VFP_BINOP
 +
 +float32 VFP_HELPER(neg, s)(float32 a)
 +{
 +    return float32_chs(a);
 +}
 +
 +float64 VFP_HELPER(neg, d)(float64 a)
 +{
 +    return float64_chs(a);
 +}
 +
 +float32 VFP_HELPER(abs, s)(float32 a)
 +{
 +    return float32_abs(a);
 +}
 +
 +float64 VFP_HELPER(abs, d)(float64 a)
 +{
 +    return float64_abs(a);
 +}
 +
 +float32 VFP_HELPER(sqrt, s)(float32 a, CPUARMState *env)
 +{
 +    return float32_sqrt(a, &env->vfp.fp_status);
 +}
 +
 +float64 VFP_HELPER(sqrt, d)(float64 a, CPUARMState *env)
 +{
 +    return float64_sqrt(a, &env->vfp.fp_status);
 +}
 +
 +static void softfloat_to_vfp_compare(CPUARMState *env, int cmp)
 +{
 +    uint32_t flags;
 +    switch (cmp) {
 +    case float_relation_equal:
 +        flags = 0x6;
 +        break;
 +    case float_relation_less:
 +        flags = 0x8;
 +        break;
 +    case float_relation_greater:
 +        flags = 0x2;
 +        break;
 +    case float_relation_unordered:
 +        flags = 0x3;
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +    env->vfp.xregs[ARM_VFP_FPSCR] =
 +        deposit32(env->vfp.xregs[ARM_VFP_FPSCR], 28, 4, flags);
 +}
 +
 +/* XXX: check quiet/signaling case */
 +#define DO_VFP_cmp(p, type) \
 +void VFP_HELPER(cmp, p)(type a, type b, CPUARMState *env)  \
 +{ \
 +    softfloat_to_vfp_compare(env, \
 +        type ## _compare_quiet(a, b, &env->vfp.fp_status)); \
 +} \
 +void VFP_HELPER(cmpe, p)(type a, type b, CPUARMState *env) \
 +{ \
 +    softfloat_to_vfp_compare(env, \
 +        type ## _compare(a, b, &env->vfp.fp_status)); \
 +}
 +DO_VFP_cmp(s, float32)
 +DO_VFP_cmp(d, float64)
 +#undef DO_VFP_cmp
 +
 +/* Integer to float and float to integer conversions */
 +
 +#define CONV_ITOF(name, ftype, fsz, sign)                           \
 +ftype HELPER(name)(uint32_t x, void *fpstp)                         \
 +{                                                                   \
 +    float_status *fpst = fpstp;                                     \
 +    return sign##int32_to_##float##fsz((sign##int32_t)x, fpst);     \
 +}
 +
 +#define CONV_FTOI(name, ftype, fsz, sign, round)                \
 +sign##int32_t HELPER(name)(ftype x, void *fpstp)                \
 +{                                                               \
 +    float_status *fpst = fpstp;                                 \
 +    if (float##fsz##_is_any_nan(x)) {                           \
 +        float_raise(float_flag_invalid, fpst);                  \
 +        return 0;                                               \
 +    }                                                           \
 +    return float##fsz##_to_##sign##int32##round(x, fpst);       \
 +}
 +
 +#define FLOAT_CONVS(name, p, ftype, fsz, sign)            \
 +    CONV_ITOF(vfp_##name##to##p, ftype, fsz, sign)        \
 +    CONV_FTOI(vfp_to##name##p, ftype, fsz, sign, )        \
 +    CONV_FTOI(vfp_to##name##z##p, ftype, fsz, sign, _round_to_zero)
 +
 +FLOAT_CONVS(si, h, uint32_t, 16, )
 +FLOAT_CONVS(si, s, float32, 32, )
 +FLOAT_CONVS(si, d, float64, 64, )
 +FLOAT_CONVS(ui, h, uint32_t, 16, u)
 +FLOAT_CONVS(ui, s, float32, 32, u)
 +FLOAT_CONVS(ui, d, float64, 64, u)
 +
 +#undef CONV_ITOF
 +#undef CONV_FTOI
 +#undef FLOAT_CONVS
 +
 +/* floating point conversion */
 +float64 VFP_HELPER(fcvtd, s)(float32 x, CPUARMState *env)
 +{
 +    return float32_to_float64(x, &env->vfp.fp_status);
 +}
 +
 +float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
 +{
 +    return float64_to_float32(x, &env->vfp.fp_status);
 +}
 +
 +/* VFP3 fixed point conversion.  */
 +#define VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype) \
 +float##fsz HELPER(vfp_##name##to##p)(uint##isz##_t  x, uint32_t shift, \
 +                                     void *fpstp) \
 +{ return itype##_to_##float##fsz##_scalbn(x, -shift, fpstp); }
 +
 +#define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype, ROUND, suff)   \
 +uint##isz##_t HELPER(vfp_to##name##p##suff)(float##fsz x, uint32_t shift, \
 +                                            void *fpst)                   \
 +{                                                                         \
 +    if (unlikely(float##fsz##_is_any_nan(x))) {                           \
 +        float_raise(float_flag_invalid, fpst);                            \
 +        return 0;                                                         \
 +    }                                                                     \
 +    return float##fsz##_to_##itype##_scalbn(x, ROUND, shift, fpst);       \
 +}
 +
 +#define VFP_CONV_FIX(name, p, fsz, isz, itype)                   \
 +VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
 +VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
 +                         float_round_to_zero, _round_to_zero)    \
 +VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
 +                         get_float_rounding_mode(fpst), )
 +
 +#define VFP_CONV_FIX_A64(name, p, fsz, isz, itype)               \
 +VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
 +VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
 +                         get_float_rounding_mode(fpst), )
 +
 +VFP_CONV_FIX(sh, d, 64, 64, int16)
 +VFP_CONV_FIX(sl, d, 64, 64, int32)
 +VFP_CONV_FIX_A64(sq, d, 64, 64, int64)
 +VFP_CONV_FIX(uh, d, 64, 64, uint16)
 +VFP_CONV_FIX(ul, d, 64, 64, uint32)
 +VFP_CONV_FIX_A64(uq, d, 64, 64, uint64)
 +VFP_CONV_FIX(sh, s, 32, 32, int16)
 +VFP_CONV_FIX(sl, s, 32, 32, int32)
 +VFP_CONV_FIX_A64(sq, s, 32, 64, int64)
 +VFP_CONV_FIX(uh, s, 32, 32, uint16)
 +VFP_CONV_FIX(ul, s, 32, 32, uint32)
 +VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
 +
 +#undef VFP_CONV_FIX
 +#undef VFP_CONV_FIX_FLOAT
 +#undef VFP_CONV_FLOAT_FIX_ROUND
 +#undef VFP_CONV_FIX_A64
 +
 +uint32_t HELPER(vfp_sltoh)(uint32_t x, uint32_t shift, void *fpst)
 +{
 +    return int32_to_float16_scalbn(x, -shift, fpst);
 +}
 +
 +uint32_t HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
 +{
 +    return uint32_to_float16_scalbn(x, -shift, fpst);
 +}
 +
 +uint32_t HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
 +{
 +    return int64_to_float16_scalbn(x, -shift, fpst);
 +}
 +
 +uint32_t HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
 +{
 +    return uint64_to_float16_scalbn(x, -shift, fpst);
 +}
 +
 +uint32_t HELPER(vfp_toshh)(uint32_t x, uint32_t shift, void *fpst)
 +{
 +    if (unlikely(float16_is_any_nan(x))) {
 +        float_raise(float_flag_invalid, fpst);
 +        return 0;
 +    }
 +    return float16_to_int16_scalbn(x, get_float_rounding_mode(fpst),
 +                                   shift, fpst);
 +}
 +
 +uint32_t HELPER(vfp_touhh)(uint32_t x, uint32_t shift, void *fpst)
 +{
 +    if (unlikely(float16_is_any_nan(x))) {
 +        float_raise(float_flag_invalid, fpst);
 +        return 0;
 +    }
 +    return float16_to_uint16_scalbn(x, get_float_rounding_mode(fpst),
 +                                    shift, fpst);
 +}
 +
 +uint32_t HELPER(vfp_toslh)(uint32_t x, uint32_t shift, void *fpst)
 +{
 +    if (unlikely(float16_is_any_nan(x))) {
 +        float_raise(float_flag_invalid, fpst);
 +        return 0;
 +    }
 +    return float16_to_int32_scalbn(x, get_float_rounding_mode(fpst),
 +                                   shift, fpst);
 +}
 +
 +uint32_t HELPER(vfp_toulh)(uint32_t x, uint32_t shift, void *fpst)
 +{
 +    if (unlikely(float16_is_any_nan(x))) {
 +        float_raise(float_flag_invalid, fpst);
 +        return 0;
 +    }
 +    return float16_to_uint32_scalbn(x, get_float_rounding_mode(fpst),
 +                                    shift, fpst);
 +}
 +
 +uint64_t HELPER(vfp_tosqh)(uint32_t x, uint32_t shift, void *fpst)
 +{
 +    if (unlikely(float16_is_any_nan(x))) {
 +        float_raise(float_flag_invalid, fpst);
 +        return 0;
 +    }
 +    return float16_to_int64_scalbn(x, get_float_rounding_mode(fpst),
 +                                   shift, fpst);
 +}
 +
 +uint64_t HELPER(vfp_touqh)(uint32_t x, uint32_t shift, void *fpst)
 +{
 +    if (unlikely(float16_is_any_nan(x))) {
 +        float_raise(float_flag_invalid, fpst);
 +        return 0;
 +    }
 +    return float16_to_uint64_scalbn(x, get_float_rounding_mode(fpst),
 +                                    shift, fpst);
 +}
 +
 +/* Set the current fp rounding mode and return the old one.
 + * The argument is a softfloat float_round_ value.
 + */
 +uint32_t HELPER(set_rmode)(uint32_t rmode, void *fpstp)
 +{
 +    float_status *fp_status = fpstp;
 +
 +    uint32_t prev_rmode = get_float_rounding_mode(fp_status);
 +    set_float_rounding_mode(rmode, fp_status);
 +
 +    return prev_rmode;
 +}
 +
 +/* Set the current fp rounding mode in the standard fp status and return
 + * the old one. This is for NEON instructions that need to change the
 + * rounding mode but wish to use the standard FPSCR values for everything
 + * else. Always set the rounding mode back to the correct value after
 + * modifying it.
 + * The argument is a softfloat float_round_ value.
 + */
 +uint32_t HELPER(set_neon_rmode)(uint32_t rmode, CPUARMState *env)
 +{
 +    float_status *fp_status = &env->vfp.standard_fp_status;
 +
 +    uint32_t prev_rmode = get_float_rounding_mode(fp_status);
 +    set_float_rounding_mode(rmode, fp_status);
 +
 +    return prev_rmode;
 +}
 +
 +/* Half precision conversions.  */
 +float32 HELPER(vfp_fcvt_f16_to_f32)(uint32_t a, void *fpstp, uint32_t ahp_mode)
 +{
 +    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
 +     * it would affect flushing input denormals.
 +     */
 +    float_status *fpst = fpstp;
 +    flag save = get_flush_inputs_to_zero(fpst);
 +    set_flush_inputs_to_zero(false, fpst);
 +    float32 r = float16_to_float32(a, !ahp_mode, fpst);
 +    set_flush_inputs_to_zero(save, fpst);
 +    return r;
 +}
 +
 +uint32_t HELPER(vfp_fcvt_f32_to_f16)(float32 a, void *fpstp, uint32_t ahp_mode)
 +{
 +    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
 +     * it would affect flushing output denormals.
 +     */
 +    float_status *fpst = fpstp;
 +    flag save = get_flush_to_zero(fpst);
 +    set_flush_to_zero(false, fpst);
 +    float16 r = float32_to_float16(a, !ahp_mode, fpst);
 +    set_flush_to_zero(save, fpst);
 +    return r;
 +}
 +
 +float64 HELPER(vfp_fcvt_f16_to_f64)(uint32_t a, void *fpstp, uint32_t ahp_mode)
 +{
 +    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
 +     * it would affect flushing input denormals.
 +     */
 +    float_status *fpst = fpstp;
 +    flag save = get_flush_inputs_to_zero(fpst);
 +    set_flush_inputs_to_zero(false, fpst);
 +    float64 r = float16_to_float64(a, !ahp_mode, fpst);
 +    set_flush_inputs_to_zero(save, fpst);
 +    return r;
 +}
 +
 +uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
 +{
 +    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
 +     * it would affect flushing output denormals.
 +     */
 +    float_status *fpst = fpstp;
 +    flag save = get_flush_to_zero(fpst);
 +    set_flush_to_zero(false, fpst);
 +    float16 r = float64_to_float16(a, !ahp_mode, fpst);
 +    set_flush_to_zero(save, fpst);
 +    return r;
 +}
 +
 +#define float32_two make_float32(0x40000000)
 +#define float32_three make_float32(0x40400000)
 +#define float32_one_point_five make_float32(0x3fc00000)
 +
 +float32 HELPER(recps_f32)(float32 a, float32 b, CPUARMState *env)
 +{
 +    float_status *s = &env->vfp.standard_fp_status;
 +    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
 +        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
 +        if (!(float32_is_zero(a) || float32_is_zero(b))) {
 +            float_raise(float_flag_input_denormal, s);
 +        }
 +        return float32_two;
 +    }
 +    return float32_sub(float32_two, float32_mul(a, b, s), s);
 +}
 +
 +float32 HELPER(rsqrts_f32)(float32 a, float32 b, CPUARMState *env)
 +{
 +    float_status *s = &env->vfp.standard_fp_status;
 +    float32 product;
 +    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
 +        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
 +        if (!(float32_is_zero(a) || float32_is_zero(b))) {
 +            float_raise(float_flag_input_denormal, s);
 +        }
 +        return float32_one_point_five;
 +    }
 +    product = float32_mul(a, b, s);
 +    return float32_div(float32_sub(float32_three, product, s), float32_two, s);
 +}
 +
 +/* NEON helpers.  */
 +
 +/* Constants 256 and 512 are used in some helpers; we avoid relying on
 + * int->float conversions at run-time.  */
 +#define float64_256 make_float64(0x4070000000000000LL)
 +#define float64_512 make_float64(0x4080000000000000LL)
 +#define float16_maxnorm make_float16(0x7bff)
 +#define float32_maxnorm make_float32(0x7f7fffff)
 +#define float64_maxnorm make_float64(0x7fefffffffffffffLL)
 +
 +/* Reciprocal functions
 + *
 + * The algorithm that must be used to calculate the estimate
 + * is specified by the ARM ARM, see FPRecipEstimate()/RecipEstimate
 + */
 +
 +/* See RecipEstimate()
 + *
 + * input is a 9 bit fixed point number
 + * input range 256 .. 511 for a number from 0.5 <= x < 1.0.
 + * result range 256 .. 511 for a number from 1.0 to 511/256.
 + */
 +
 +static int recip_estimate(int input)
 +{
 +    int a, b, r;
 +    assert(256 <= input && input < 512);
 +    a = (input * 2) + 1;
 +    b = (1 << 19) / a;
 +    r = (b + 1) >> 1;
 +    assert(256 <= r && r < 512);
 +    return r;
 +}
 +
 +/*
 + * Common wrapper to call recip_estimate
 + *
 + * The parameters are exponent and 64 bit fraction (without implicit
 + * bit) where the binary point is nominally at bit 52. Returns the
 + * result fraction, which the caller can then truncate to the
 + * appropriate size; *exp is updated to the result exponent.
 + */
 +
 +static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
 +{
 +    uint32_t scaled, estimate;
 +    uint64_t result_frac;
 +    int result_exp;
 +
 +    /* Handle sub-normals */
 +    if (*exp == 0) {
 +        if (extract64(frac, 51, 1) == 0) {
 +            *exp = -1;
 +            frac <<= 2;
 +        } else {
 +            frac <<= 1;
 +        }
 +    }
 +
 +    /* scaled = UInt('1':fraction<51:44>) */
 +    scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
 +    estimate = recip_estimate(scaled);
 +
 +    result_exp = exp_off - *exp;
 +    result_frac = deposit64(0, 44, 8, estimate);
 +    if (result_exp == 0) {
 +        result_frac = deposit64(result_frac >> 1, 51, 1, 1);
 +    } else if (result_exp == -1) {
 +        result_frac = deposit64(result_frac >> 2, 50, 2, 1);
 +        result_exp = 0;
 +    }
 +
 +    *exp = result_exp;
 +
 +    return result_frac;
 +}
 +
 +static bool round_to_inf(float_status *fpst, bool sign_bit)
 +{
 +    switch (fpst->float_rounding_mode) {
 +    case float_round_nearest_even: /* Round to Nearest */
 +        return true;
 +    case float_round_up: /* Round to +Inf */
 +        return !sign_bit;
 +    case float_round_down: /* Round to -Inf */
 +        return sign_bit;
 +    case float_round_to_zero: /* Round to Zero */
 +        return false;
 +    }
 +
 +    g_assert_not_reached();
 +}
 +
 +uint32_t HELPER(recpe_f16)(uint32_t input, void *fpstp)
 +{
 +    float_status *fpst = fpstp;
 +    float16 f16 = float16_squash_input_denormal(input, fpst);
 +    uint32_t f16_val = float16_val(f16);
 +    uint32_t f16_sign = float16_is_neg(f16);
 +    int f16_exp = extract32(f16_val, 10, 5);
 +    uint32_t f16_frac = extract32(f16_val, 0, 10);
 +    uint64_t f64_frac;
 +
 +    if (float16_is_any_nan(f16)) {
 +        float16 nan = f16;
 +        if (float16_is_signaling_nan(f16, fpst)) {
 +            float_raise(float_flag_invalid, fpst);
 +            nan = float16_silence_nan(f16, fpst);
 +        }
 +        if (fpst->default_nan_mode) {
 +            nan =  float16_default_nan(fpst);
 +        }
 +        return nan;
 +    } else if (float16_is_infinity(f16)) {
 +        return float16_set_sign(float16_zero, float16_is_neg(f16));
 +    } else if (float16_is_zero(f16)) {
 +        float_raise(float_flag_divbyzero, fpst);
 +        return float16_set_sign(float16_infinity, float16_is_neg(f16));
 +    } else if (float16_abs(f16) < (1 << 8)) {
 +        /* Abs(value) < 2.0^-16 */
 +        float_raise(float_flag_overflow | float_flag_inexact, fpst);
 +        if (round_to_inf(fpst, f16_sign)) {
 +            return float16_set_sign(float16_infinity, f16_sign);
 +        } else {
 +            return float16_set_sign(float16_maxnorm, f16_sign);
 +        }
 +    } else if (f16_exp >= 29 && fpst->flush_to_zero) {
 +        float_raise(float_flag_underflow, fpst);
 +        return float16_set_sign(float16_zero, float16_is_neg(f16));
 +    }
 +
 +    f64_frac = call_recip_estimate(&f16_exp, 29,
 +                                   ((uint64_t) f16_frac) << (52 - 10));
 +
 +    /* result = sign : result_exp<4:0> : fraction<51:42> */
 +    f16_val = deposit32(0, 15, 1, f16_sign);
 +    f16_val = deposit32(f16_val, 10, 5, f16_exp);
 +    f16_val = deposit32(f16_val, 0, 10, extract64(f64_frac, 52 - 10, 10));
 +    return make_float16(f16_val);
 +}
 +
 +float32 HELPER(recpe_f32)(float32 input, void *fpstp)
 +{
 +    float_status *fpst = fpstp;
 +    float32 f32 = float32_squash_input_denormal(input, fpst);
 +    uint32_t f32_val = float32_val(f32);
 +    bool f32_sign = float32_is_neg(f32);
 +    int f32_exp = extract32(f32_val, 23, 8);
 +    uint32_t f32_frac = extract32(f32_val, 0, 23);
 +    uint64_t f64_frac;
 +
 +    if (float32_is_any_nan(f32)) {
 +        float32 nan = f32;
 +        if (float32_is_signaling_nan(f32, fpst)) {
 +            float_raise(float_flag_invalid, fpst);
 +            nan = float32_silence_nan(f32, fpst);
 +        }
 +        if (fpst->default_nan_mode) {
 +            nan =  float32_default_nan(fpst);
 +        }
 +        return nan;
 +    } else if (float32_is_infinity(f32)) {
 +        return float32_set_sign(float32_zero, float32_is_neg(f32));
 +    } else if (float32_is_zero(f32)) {
 +        float_raise(float_flag_divbyzero, fpst);
 +        return float32_set_sign(float32_infinity, float32_is_neg(f32));
 +    } else if (float32_abs(f32) < (1ULL << 21)) {
 +        /* Abs(value) < 2.0^-128 */
 +        float_raise(float_flag_overflow | float_flag_inexact, fpst);
 +        if (round_to_inf(fpst, f32_sign)) {
 +            return float32_set_sign(float32_infinity, f32_sign);
 +        } else {
 +            return float32_set_sign(float32_maxnorm, f32_sign);
 +        }
 +    } else if (f32_exp >= 253 && fpst->flush_to_zero) {
 +        float_raise(float_flag_underflow, fpst);
 +        return float32_set_sign(float32_zero, float32_is_neg(f32));
 +    }
 +
 +    f64_frac = call_recip_estimate(&f32_exp, 253,
 +                                   ((uint64_t) f32_frac) << (52 - 23));
 +
 +    /* result = sign : result_exp<7:0> : fraction<51:29> */
 +    f32_val = deposit32(0, 31, 1, f32_sign);
 +    f32_val = deposit32(f32_val, 23, 8, f32_exp);
 +    f32_val = deposit32(f32_val, 0, 23, extract64(f64_frac, 52 - 23, 23));
 +    return make_float32(f32_val);
 +}
 +
 +float64 HELPER(recpe_f64)(float64 input, void *fpstp)
 +{
 +    float_status *fpst = fpstp;
 +    float64 f64 = float64_squash_input_denormal(input, fpst);
 +    uint64_t f64_val = float64_val(f64);
 +    bool f64_sign = float64_is_neg(f64);
 +    int f64_exp = extract64(f64_val, 52, 11);
 +    uint64_t f64_frac = extract64(f64_val, 0, 52);
 +
 +    /* Deal with any special cases */
 +    if (float64_is_any_nan(f64)) {
 +        float64 nan = f64;
 +        if (float64_is_signaling_nan(f64, fpst)) {
 +            float_raise(float_flag_invalid, fpst);
 +            nan = float64_silence_nan(f64, fpst);
 +        }
 +        if (fpst->default_nan_mode) {
 +            nan =  float64_default_nan(fpst);
 +        }
 +        return nan;
 +    } else if (float64_is_infinity(f64)) {
 +        return float64_set_sign(float64_zero, float64_is_neg(f64));
 +    } else if (float64_is_zero(f64)) {
 +        float_raise(float_flag_divbyzero, fpst);
 +        return float64_set_sign(float64_infinity, float64_is_neg(f64));
 +    } else if ((f64_val & ~(1ULL << 63)) < (1ULL << 50)) {
 +        /* Abs(value) < 2.0^-1024 */
 +        float_raise(float_flag_overflow | float_flag_inexact, fpst);
 +        if (round_to_inf(fpst, f64_sign)) {
 +            return float64_set_sign(float64_infinity, f64_sign);
 +        } else {
 +            return float64_set_sign(float64_maxnorm, f64_sign);
 +        }
 +    } else if (f64_exp >= 2045 && fpst->flush_to_zero) {
 +        float_raise(float_flag_underflow, fpst);
 +        return float64_set_sign(float64_zero, float64_is_neg(f64));
 +    }
 +
 +    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac);
 +
 +    /* result = sign : result_exp<10:0> : fraction<51:0>; */
 +    f64_val = deposit64(0, 63, 1, f64_sign);
 +    f64_val = deposit64(f64_val, 52, 11, f64_exp);
 +    f64_val = deposit64(f64_val, 0, 52, f64_frac);
 +    return make_float64(f64_val);
 +}
 +
 +/* The algorithm that must be used to calculate the estimate
 + * is specified by the ARM ARM.
 + */
 +
 +static int do_recip_sqrt_estimate(int a)
 +{
 +    int b, estimate;
 +
 +    assert(128 <= a && a < 512);
 +    if (a < 256) {
 +        a = a * 2 + 1;
 +    } else {
 +        a = (a >> 1) << 1;
 +        a = (a + 1) * 2;
 +    }
 +    b = 512;
 +    while (a * (b + 1) * (b + 1) < (1 << 28)) {
 +        b += 1;
 +    }
 +    estimate = (b + 1) / 2;
 +    assert(256 <= estimate && estimate < 512);
 +
 +    return estimate;
 +}
 +
 +
 +static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
 +{
 +    int estimate;
 +    uint32_t scaled;
 +
 +    if (*exp == 0) {
 +        while (extract64(frac, 51, 1) == 0) {
 +            frac = frac << 1;
 +            *exp -= 1;
 +        }
 +        frac = extract64(frac, 0, 51) << 1;
 +    }
 +
 +    if (*exp & 1) {
 +        /* scaled = UInt('01':fraction<51:45>) */
 +        scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
 +    } else {
 +        /* scaled = UInt('1':fraction<51:44>) */
 +        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
 +    }
 +    estimate = do_recip_sqrt_estimate(scaled);
 +
 +    *exp = (exp_off - *exp) / 2;
 +    return extract64(estimate, 0, 8) << 44;
 +}
 +
 +uint32_t HELPER(rsqrte_f16)(uint32_t input, void *fpstp)
 +{
 +    float_status *s = fpstp;
 +    float16 f16 = float16_squash_input_denormal(input, s);
 +    uint16_t val = float16_val(f16);
 +    bool f16_sign = float16_is_neg(f16);
 +    int f16_exp = extract32(val, 10, 5);
 +    uint16_t f16_frac = extract32(val, 0, 10);
 +    uint64_t f64_frac;
 +
 +    if (float16_is_any_nan(f16)) {
 +        float16 nan = f16;
 +        if (float16_is_signaling_nan(f16, s)) {
 +            float_raise(float_flag_invalid, s);
 +            nan = float16_silence_nan(f16, s);
 +        }
 +        if (s->default_nan_mode) {
 +            nan =  float16_default_nan(s);
 +        }
 +        return nan;
 +    } else if (float16_is_zero(f16)) {
 +        float_raise(float_flag_divbyzero, s);
 +        return float16_set_sign(float16_infinity, f16_sign);
 +    } else if (f16_sign) {
 +        float_raise(float_flag_invalid, s);
 +        return float16_default_nan(s);
 +    } else if (float16_is_infinity(f16)) {
 +        return float16_zero;
 +    }
 +
 +    /* Scale and normalize to a double-precision value between 0.25 and 1.0,
 +     * preserving the parity of the exponent.  */
 +
 +    f64_frac = ((uint64_t) f16_frac) << (52 - 10);
 +
 +    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac);
 +
 +    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(2) */
 +    val = deposit32(0, 15, 1, f16_sign);
 +    val = deposit32(val, 10, 5, f16_exp);
 +    val = deposit32(val, 2, 8, extract64(f64_frac, 52 - 8, 8));
 +    return make_float16(val);
 +}
 +
 +float32 HELPER(rsqrte_f32)(float32 input, void *fpstp)
 +{
 +    float_status *s = fpstp;
 +    float32 f32 = float32_squash_input_denormal(input, s);
 +    uint32_t val = float32_val(f32);
 +    uint32_t f32_sign = float32_is_neg(f32);
 +    int f32_exp = extract32(val, 23, 8);
 +    uint32_t f32_frac = extract32(val, 0, 23);
 +    uint64_t f64_frac;
 +
 +    if (float32_is_any_nan(f32)) {
 +        float32 nan = f32;
 +        if (float32_is_signaling_nan(f32, s)) {
 +            float_raise(float_flag_invalid, s);
 +            nan = float32_silence_nan(f32, s);
 +        }
 +        if (s->default_nan_mode) {
 +            nan =  float32_default_nan(s);
 +        }
 +        return nan;
 +    } else if (float32_is_zero(f32)) {
 +        float_raise(float_flag_divbyzero, s);
 +        return float32_set_sign(float32_infinity, float32_is_neg(f32));
 +    } else if (float32_is_neg(f32)) {
 +        float_raise(float_flag_invalid, s);
 +        return float32_default_nan(s);
 +    } else if (float32_is_infinity(f32)) {
 +        return float32_zero;
 +    }
 +
 +    /* Scale and normalize to a double-precision value between 0.25 and 1.0,
 +     * preserving the parity of the exponent.  */
 +
 +    f64_frac = ((uint64_t) f32_frac) << 29;
 +
 +    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac);
 +
 +    /* result = sign : result_exp<7:0> : estimate<7:0> : Zeros(15) */
 +    val = deposit32(0, 31, 1, f32_sign);
 +    val = deposit32(val, 23, 8, f32_exp);
 +    val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
 +    return make_float32(val);
 +}
 +
 +float64 HELPER(rsqrte_f64)(float64 input, void *fpstp)
 +{
 +    float_status *s = fpstp;
 +    float64 f64 = float64_squash_input_denormal(input, s);
 +    uint64_t val = float64_val(f64);
 +    bool f64_sign = float64_is_neg(f64);
 +    int f64_exp = extract64(val, 52, 11);
 +    uint64_t f64_frac = extract64(val, 0, 52);
 +
 +    if (float64_is_any_nan(f64)) {
 +        float64 nan = f64;
 +        if (float64_is_signaling_nan(f64, s)) {
 +            float_raise(float_flag_invalid, s);
 +            nan = float64_silence_nan(f64, s);
 +        }
 +        if (s->default_nan_mode) {
 +            nan =  float64_default_nan(s);
 +        }
 +        return nan;
 +    } else if (float64_is_zero(f64)) {
 +        float_raise(float_flag_divbyzero, s);
 +        return float64_set_sign(float64_infinity, float64_is_neg(f64));
 +    } else if (float64_is_neg(f64)) {
 +        float_raise(float_flag_invalid, s);
 +        return float64_default_nan(s);
 +    } else if (float64_is_infinity(f64)) {
 +        return float64_zero;
 +    }
 +
 +    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac);
 +
 +    /* result = sign : result_exp<10:0> : estimate<7:0> : Zeros(44) */
 +    val = deposit64(0, 63, 1, f64_sign);
 +    val = deposit64(val, 52, 11, f64_exp);
 +    val = deposit64(val, 44, 8, extract64(f64_frac, 52 - 8, 8));
 +    return make_float64(val);
 +}
 +
 +uint32_t HELPER(recpe_u32)(uint32_t a, void *fpstp)
 +{
 +    /* float_status *s = fpstp; */
 +    int input, estimate;
 +
 +    if ((a & 0x80000000) == 0) {
 +        return 0xffffffff;
 +    }
 +
 +    input = extract32(a, 23, 9);
 +    estimate = recip_estimate(input);
 +
 +    return deposit32(0, (32 - 9), 9, estimate);
 +}
 +
 +uint32_t HELPER(rsqrte_u32)(uint32_t a, void *fpstp)
 +{
 +    int estimate;
 +
 +    if ((a & 0xc0000000) == 0) {
 +        return 0xffffffff;
 +    }
 +
 +    estimate = do_recip_sqrt_estimate(extract32(a, 23, 9));
 +
 +    return deposit32(0, 23, 9, estimate);
 +}
 +
 +/* VFPv4 fused multiply-accumulate */
 +float32 VFP_HELPER(muladd, s)(float32 a, float32 b, float32 c, void *fpstp)
 +{
 +    float_status *fpst = fpstp;
 +    return float32_muladd(a, b, c, 0, fpst);
 +}
 +
 +float64 VFP_HELPER(muladd, d)(float64 a, float64 b, float64 c, void *fpstp)
 +{
 +    float_status *fpst = fpstp;
 +    return float64_muladd(a, b, c, 0, fpst);
 +}
 +
 +/* ARMv8 round to integral */
 +float32 HELPER(rints_exact)(float32 x, void *fp_status)
 +{
 +    return float32_round_to_int(x, fp_status);
 +}
 +
 +float64 HELPER(rintd_exact)(float64 x, void *fp_status)
 +{
 +    return float64_round_to_int(x, fp_status);
 +}
 +
 +float32 HELPER(rints)(float32 x, void *fp_status)
 +{
 +    int old_flags = get_float_exception_flags(fp_status), new_flags;
 +    float32 ret;
 +
 +    ret = float32_round_to_int(x, fp_status);
 +
 +    /* Suppress any inexact exceptions the conversion produced */
 +    if (!(old_flags & float_flag_inexact)) {
 +        new_flags = get_float_exception_flags(fp_status);
 +        set_float_exception_flags(new_flags & ~float_flag_inexact, fp_status);
 +    }
 +
 +    return ret;
 +}
 +
 +float64 HELPER(rintd)(float64 x, void *fp_status)
 +{
 +    int old_flags = get_float_exception_flags(fp_status), new_flags;
 +    float64 ret;
 +
 +    ret = float64_round_to_int(x, fp_status);
 +
 +    new_flags = get_float_exception_flags(fp_status);
 +
 +    /* Suppress any inexact exceptions the conversion produced */
 +    if (!(old_flags & float_flag_inexact)) {
 +        new_flags = get_float_exception_flags(fp_status);
 +        set_float_exception_flags(new_flags & ~float_flag_inexact, fp_status);
 +    }
 +
 +    return ret;
 +}
 +
 +/* Convert ARM rounding mode to softfloat */
 +int arm_rmode_to_sf(int rmode)
 +{
 +    switch (rmode) {
 +    case FPROUNDING_TIEAWAY:
 +        rmode = float_round_ties_away;
 +        break;
 +    case FPROUNDING_ODD:
 +        /* FIXME: add support for ODD */
 +        qemu_log_mask(LOG_UNIMP, "arm: unimplemented rounding mode: %d\n",
 +                      rmode);
 +        /* fall through for now */
 +    case FPROUNDING_TIEEVEN:
 +    default:
 +        rmode = float_round_nearest_even;
 +        break;
 +    case FPROUNDING_POSINF:
 +        rmode = float_round_up;
 +        break;
 +    case FPROUNDING_NEGINF:
 +        rmode = float_round_down;
 +        break;
 +    case FPROUNDING_ZERO:
 +        rmode = float_round_to_zero;
 +        break;
 +    }
 +    return rmode;
 +}
 --
 .20.1
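
Reader's note on recip_estimate() in the patch above: the integer core of
the estimate is easy to check in isolation. The sketch below is an
illustrative standalone copy, not part of the series; the comments on the
individual steps are one reading of the arithmetic, not wording from the
ARM ARM.

```c
#include <assert.h>

/*
 * Illustrative standalone copy of the integer core of recip_estimate().
 * The input is a 9-bit value in 256..511 standing for 0.5 <= x < 1.0;
 * the result is a 9-bit value in 256..511 standing for r / 256.
 */
static int recip_estimate(int input)
{
    int a, b, r;

    assert(256 <= input && input < 512);
    a = (input * 2) + 1;   /* midpoint of the input interval, in 1/1024 units */
    b = (1 << 19) / a;     /* fixed-point reciprocal, in 1/512 units */
    r = (b + 1) >> 1;      /* round to nearest, in 1/256 units */
    assert(256 <= r && r < 512);
    return r;
}
```

With this copy, recip_estimate(256) is 511 (the input 0.5 maps to roughly
511/256) and recip_estimate(511) is 256 (an input just under 1.0 maps to
1.0), which matches the ranges stated in the comment above the function.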

-New patch
+[PULL 50/52] target/arm: Perform fpdp_v2 check first
+From: Richard Henderson <richard.henderson@linaro.org>
 Shuffle the order of the checks so that we test the ISA
 before we test anything else, such as the register arguments.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200214181547.21408-9-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
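Reader's note, not part of the commit: a minimal sketch of the check
ordering this patch establishes. The struct, the enum, and
check_dp_insn() are hypothetical stand-ins for the dc_isar_feature()
tests; the real trans_* functions simply return false in both rejection
cases, so the enum exists only to make the ordering visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ID-register feature tests. */
struct cpu_features {
    bool fpdp_v2;   /* double-precision VFP is implemented */
    bool simd_r32;  /* registers D16-D31 exist */
};

/* Hypothetical result codes; the real code returns a plain bool. */
enum decode_result { DECODE_OK, UNDEF_NO_ISA, UNDEF_BAD_REG };

static enum decode_result check_dp_insn(const struct cpu_features *f,
                                        int vd, int vm)
{
    /* Register numbers are 5-bit fields. */
    assert(vd >= 0 && vd < 32 && vm >= 0 && vm < 32);

    /* Test the ISA first ... */
    if (!f->fpdp_v2) {
        return UNDEF_NO_ISA;
    }

    /* ... and only then UNDEF accesses to D16-D31 if they don't exist. */
    if (!f->simd_r32 && ((vd | vm) & 0x10)) {
        return UNDEF_BAD_REG;
    }

    return DECODE_OK;
}
```

On a CPU without double precision, an encoding that names D17 is rejected
by the ISA test before the register numbers are looked at.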
  target/arm/translate-vfp.inc.c | 144 ++++++++++++++++-----------------
 file changed, 72 insertions(+), 72 deletions(-)
 diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.inc.c
 +++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          return false;
      }
 -    /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
 -        ((a->vm | a->vn | a->vd) & 0x10)) {
 +    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vm | a->vn | a->vd) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
          return false;
      }
 -    /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
 -        ((a->vm | a->vn | a->vd) & 0x10)) {
 +    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vm | a->vn | a->vd) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          return false;
      }
 -    /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
 -        ((a->vm | a->vd) & 0x10)) {
 +    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vm | a->vd) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          return false;
      }
 -    /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (dp && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
 +    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (dp && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
      TCGv_i64 f0, f1, fd;
      TCGv_ptr fpst;
 -    /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vn | vm) & 0x10)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vn | vm) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
      int veclen = s->vec_len;
      TCGv_i64 f0, fd;
 -    /* UNDEF accesses to D16-D31 if they don't exist */
 -    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_dp *a)
          return false;
      }
 -    /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_simd_r32, s) &&
 -        ((a->vd | a->vn | a->vm) & 0x10)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
      vd = a->vd;
 -    /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_simd_r32, s) && (vd & 0x10)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (vd & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
  {
      TCGv_i64 vd, vm;
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +        return false;
 +    }
 +
      /* Vm/M bits must be zero for the Z variant */
      if (a->z && a->vm != 0) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 -        return false;
 -    }
 -
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
      TCGv_i32 tmp;
      TCGv_i64 vd;
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +        return false;
 +    }
 +
      if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 -        return false;
 -    }
 -
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
      TCGv_i32 tmp;
      TCGv_i64 vm;
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +        return false;
 +    }
 +
      if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 -        return false;
 -    }
 -
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
      TCGv_ptr fpst;
      TCGv_i64 tmp;
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +        return false;
 +    }
 +
      if (!dc_isar_feature(aa32_vrint, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 -        return false;
 -    }
 -
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
      TCGv_i64 tmp;
      TCGv_i32 tcg_rmode;
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +        return false;
 +    }
 +
      if (!dc_isar_feature(aa32_vrint, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 -        return false;
 -    }
 -
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
      TCGv_ptr fpst;
      TCGv_i64 tmp;
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +        return false;
 +    }
 +
      if (!dc_isar_feature(aa32_vrint, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 -        return false;
 -    }
 -
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      TCGv_i64 vd;
      TCGv_i32 vm;
 -    /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      TCGv_i64 vm;
      TCGv_i32 vd;
 -    /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
      TCGv_i64 vd;
      TCGv_ptr fpst;
 -    /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      TCGv_i32 vd;
      TCGv_i64 vm;
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +        return false;
 +    }
 +
      if (!dc_isar_feature(aa32_jscvt, s)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 -        return false;
 -    }
 -
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
      TCGv_ptr fpst;
      int frac_bits;
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +        return false;
 +    }
 +
      if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 -        return false;
 -    }
 -
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
      TCGv_i64 vm;
      TCGv_ptr fpst;
 -    /* UNDEF accesses to D16-D31 if they don't exist. */
 -    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
 +    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
          return false;
      }
 -    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
          return false;
      }
 --
 2.20.1

-New patch
+[PULL 51/52] target/arm: Replace ARM_FEATURE_VFP3 checks with fp{sp, dp}_v3
+From: Richard Henderson <richard.henderson@linaro.org>
+Sort this check to the start of a trans_* function.
+Merge this with any existing test for fpdp_v2.
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20200214181547.21408-10-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ target/arm/translate-vfp.inc.c | 24 ++++++++----------------
+1 file changed, 8 insertions(+), 16 deletions(-)
+diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-vfp.inc.c
++++ b/target/arm/translate-vfp.inc.c
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
+          * VFPv2 allows access to FPSID from userspace; VFPv3 restricts
+          * all ID registers to privileged access only.
+          */
+-        if (IS_USER(s) && arm_dc_feature(s, ARM_FEATURE_VFP3)) {
++        if (IS_USER(s) && dc_isar_feature(aa32_fpsp_v3, s)) {
+             return false;
+         }
+         ignore_vfp_enabled = true;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
+     case ARM_VFP_FPINST:
+     case ARM_VFP_FPINST2:
+         /* Not present in VFPv3 */
+-        if (IS_USER(s) || arm_dc_feature(s, ARM_FEATURE_VFP3)) {
++        if (IS_USER(s) || dc_isar_feature(aa32_fpsp_v3, s)) {
+             return false;
+         }
+         break;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
+     vd = a->vd;
+-    if (!dc_isar_feature(aa32_fpshvec, s) &&
+-        (veclen != 0 || s->vec_stride != 0)) {
++    if (!dc_isar_feature(aa32_fpsp_v3, s)) {
+         return false;
+     }
+-    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
++    if (!dc_isar_feature(aa32_fpshvec, s) &&
++        (veclen != 0 || s->vec_stride != 0)) {
+         return false;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
+     vd = a->vd;
+-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
++    if (!dc_isar_feature(aa32_fpdp_v3, s)) {
+         return false;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
+         return false;
+     }
+-    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+-        return false;
+-    }
+-
+     if (!vfp_access_check(s)) {
+         return true;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
+     TCGv_ptr fpst;
+     int frac_bits;
+-    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
++    if (!dc_isar_feature(aa32_fpsp_v3, s)) {
+         return false;
+     }
+@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
+     TCGv_ptr fpst;
+     int frac_bits;
+-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+-        return false;
+-    }
+-
+-    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
++    if (!dc_isar_feature(aa32_fpdp_v3, s)) {
+         return false;
+     }
+--
+2.20.1
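
A note on why the fpdp_v2 test can be folded into the fpdp_v3 one in the patch above. This is only a sketch: it assumes each feature test compares a 4-bit MVFR0 field against a threshold (nonzero for v2, at least 2 for v3), with FPSP in bits [7:4] and FPDP in bits [11:8]. The helper names are hypothetical and are not QEMU's isar_feature functions. Under those assumptions, any CPU that passes the v3 test also passes the v2 test, so the separate v2 check is redundant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract a 4-bit ID register field starting at bit `shift`. */
static uint32_t field4(uint32_t reg, unsigned shift)
{
    assert(shift <= 28);
    return (reg >> shift) & 0xf;
}

/*
 * Hypothetical stand-ins for the fp{sp,dp}_v{2,3} feature tests.
 * Assumed layout: MVFR0.FPSP in bits [7:4], MVFR0.FPDP in bits [11:8].
 */
static bool has_fpsp_v2(uint32_t mvfr0) { return field4(mvfr0, 4) > 0; }
static bool has_fpsp_v3(uint32_t mvfr0) { return field4(mvfr0, 4) >= 2; }
static bool has_fpdp_v2(uint32_t mvfr0) { return field4(mvfr0, 8) > 0; }
static bool has_fpdp_v3(uint32_t mvfr0) { return field4(mvfr0, 8) >= 2; }
```

With these definitions, a field value of 1 satisfies only the v2 test, while a value of 2 or more satisfies both.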

-New patch
+[PULL 52/52] target/arm: Add missing checks for fpsp_v2
+From: Richard Henderson <richard.henderson@linaro.org>
 We will eventually remove the early ARM_FEATURE_VFP test,
 so add a proper test for each trans_* that does not already
 have another ISA test.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20200214181547.21408-11-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/translate-vfp.inc.c | 78 ++++++++++++++++++++++++++++++----
 1 file changed, 69 insertions(+), 9 deletions(-)
 diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.inc.c
 +++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
      int pass;
      uint32_t offset;
 +    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == 2
 +        ? !dc_isar_feature(aa32_fpsp_v2, s)
 +        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
      /* UNDEF accesses to D16-D31 if they don't exist */
      if (!dc_isar_feature(aa32_simd_r32, s) && (a->vn & 0x10)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
      pass = extract32(offset, 2, 1);
      offset = extract32(offset, 0, 2) * 8;
 -    if (a->size != 2 && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
 -        return false;
 -    }
 -
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
      int pass;
      uint32_t offset;
 +    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == 2
 +        ? !dc_isar_feature(aa32_fpsp_v2, s)
 +        : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
      /* UNDEF accesses to D16-D31 if they don't exist */
      if (!dc_isar_feature(aa32_simd_r32, s) && (a->vn & 0x10)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
      pass = extract32(offset, 2, 1);
      offset = extract32(offset, 0, 2) * 8;
 -    if (a->size != 2 && !arm_dc_feature(s, ARM_FEATURE_NEON)) {
 -        return false;
 -    }
 -
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMSR_VMRS(DisasContext *s, arg_VMSR_VMRS *a)
      TCGv_i32 tmp;
      bool ignore_vfp_enabled = false;
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      if (arm_dc_feature(s, ARM_FEATURE_M)) {
          /*
           * The only M-profile VFP vmrs/vmsr sysreg is FPSCR.
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
  {
      TCGv_i32 tmp;
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
  {
      TCGv_i32 tmp;
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      /*
       * VMOV between two general-purpose registers and two single precision
       * floating point registers
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
      /*
       * VMOV between two general-purpose registers and one double precision
 -     * floating point register
 +     * floating point register.  Note that this does not require support
 +     * for double precision arithmetic.
       */
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
      /* UNDEF accesses to D16-D31 if they don't exist */
      if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      uint32_t offset;
      TCGv_i32 addr, tmp;
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
      TCGv_i32 addr;
      TCGv_i64 tmp;
 +    /* Note that this does not require support for double arithmetic.  */
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      /* UNDEF accesses to D16-D31 if they don't exist */
      if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
      TCGv_i32 addr, tmp;
      int i, n;
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      n = a->imm;
      if (n == 0 || (a->vd + n) > 32) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
      TCGv_i64 tmp;
      int i, n;
 +    /* Note that this does not require support for double arithmetic.  */
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      n = a->imm >> 1;
      if (n == 0 || (a->vd + n) > 32 || n > 16) {
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
      TCGv_i32 f0, f1, fd;
      TCGv_ptr fpst;
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      if (!dc_isar_feature(aa32_fpshvec, s) &&
          (veclen != 0 || s->vec_stride != 0)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      int veclen = s->vec_len;
      TCGv_i32 f0, fd;
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      if (!dc_isar_feature(aa32_fpshvec, s) &&
          (veclen != 0 || s->vec_stride != 0)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
  {
      TCGv_i32 vd, vm;
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      /* Vm/M bits must be zero for the Z variant */
      if (a->z && a->vm != 0) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
      TCGv_i32 vm;
      TCGv_ptr fpst;
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      if (!vfp_access_check(s)) {
          return true;
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
      TCGv_i32 vm;
      TCGv_ptr fpst;
 +    if (!dc_isar_feature(aa32_fpsp_v2, s)) {
 +        return false;
 +    }
 +
      if (!vfp_access_check(s)) {
          return true;
      }
 --
 2.20.1
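
The size-dependent check added to trans_VMOV_to_gp and trans_VMOV_from_gp can be read in isolation as follows. This is a minimal sketch with a hypothetical feature struct and function name. It only mirrors the shape of the conditional in the patch, where the 32-bit form is gated on single-precision VFP and the narrower forms on Neon.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature flags for an emulated CPU. */
struct cpu_features {
    bool fpsp_v2;   /* single-precision VFP present */
    bool neon;      /* Advanced SIMD present */
};

/*
 * Return true if a VMOV between a general-purpose register and a
 * scalar should UNDEF.  size == 2 (32-bit element) is the VFP form;
 * the 8-bit and 16-bit forms (size 0 and 1) belong to Neon.
 */
static bool vmov_scalar_undef(const struct cpu_features *f, int size)
{
    assert(f != NULL && size >= 0 && size <= 2);
    return size == 2 ? !f->fpsp_v2 : !f->neon;
}
```

Using one conditional expression keeps both feature requirements at the top of the function, rather than testing Neon later and leaving the VFP case to an earlier global check.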

Arm queue -- mostly the first slice of my Musca patches.

thanks
-- PMM

The following changes since commit fc3dbb90f2eb069801bfb4cfe9cbc83cf9c5f4a9:

Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging (2019-02-21 13:09:33 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20190221

for you to fetch changes up to 3733f80308d2a7f23f5e39b039e0547aba6c07f1:

hw/arm/armsse: Make 0x5... alias region work for per-CPU devices (2019-02-21 18:17:48 +0000)

----------------------------------------------------------------
target-arm queue:
 * Model the Arm "Musca" development boards: "musca-a" and "musca-b1"
 * Implement the ARMv8.3-JSConv extension
 * v8M MPU should use background region as default, not always
 * Stop unintentional sign extension in pmu_init

----------------------------------------------------------------
Aaron Lindsay OS (1):
      target/arm: Stop unintentional sign extension in pmu_init

Peter Maydell (16):
      hw/arm/armsse: Fix memory leak in error-exit path
      target/arm: v8M MPU should use background region as default, not always
      hw/misc/tz-ppc: Support having unused ports in the middle of the range
      hw/timer/pl031: Allow use as an embedded-struct device
      hw/timer/pl031: Convert to using trace events
      hw/char/pl011: Allow use as an embedded-struct device
      hw/char/pl011: Support all interrupt lines
      hw/char/pl011: Use '0x' prefix when logging hex numbers
      hw/arm/armsse: Document SRAM_ADDR_WIDTH property in header comment
      hw/arm/armsse: Allow boards to specify init-svtor
      hw/arm/musca.c: Implement models of the Musca-A and -B1 boards
      hw/arm/musca: Add PPCs
      hw/arm/musca: Add MPCs
      hw/arm/musca: Wire up PL031 RTC
      hw/arm/musca: Wire up PL011 UARTs
      hw/arm/armsse: Make 0x5... alias region work for per-CPU devices

Richard Henderson (4):
      target/arm: Restructure disas_fp_int_conv
      target/arm: Split out vfp_helper.c
      target/arm: Rearrange Floating-point data-processing (2 regs)
      target/arm: Implement ARMv8.3-JSConv

Coverity points out (CID 1398632, CID 1398650) that we
leak a couple of allocated strings in the error-exit
code path for setting up the MHUs in the ARMSSE.
Fix this bug by moving the allocate-and-free of each
string to be closer to the use, so we do the free before
doing the error-exit check.

Fixes: f8574705f62b38a ("hw/arm/armsse: Add unimplemented-device stubs for MHUs")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190215113707.24553-1-peter.maydell@linaro.org
---
 hw/arm/armsse.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/arm/armsse.c b/hw/arm/armsse.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/armsse.c
+++ b/hw/arm/armsse.c
@@ -XXX,XX +XXX,XX @@ static void armsse_realize(DeviceState *dev, Error **errp)
 
     if (info->has_mhus) {
         for (i = 0; i < ARRAY_SIZE(s->mhu); i++) {
-            char *name = g_strdup_printf("MHU%d", i);
-            char *port = g_strdup_printf("port[%d]", i + 3);
+            char *name;
+            char *port;
 
+            name = g_strdup_printf("MHU%d", i);
             qdev_prop_set_string(DEVICE(&s->mhu[i]), "name", name);
             qdev_prop_set_uint64(DEVICE(&s->mhu[i]), "size", 0x1000);
             object_property_set_bool(OBJECT(&s->mhu[i]), true,
                                      "realized", &err);
+            g_free(name);
             if (err) {
                 error_propagate(errp, err);
                 return;
             }
+            port = g_strdup_printf("port[%d]", i + 3);
             mr = sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->mhu[i]), 0);
             object_property_set_link(OBJECT(&s->apb_ppc0), OBJECT(mr),
                                      port, &err);
+            g_free(port);
             if (err) {
                 error_propagate(errp, err);
                 return;
             }
-            g_free(name);
-            g_free(port);
         }
     }
 
-- 
2.20.1
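
The leak fix above follows a general pattern: free a temporary string immediately after its last use, so that an early return on error has nothing left to release. The sketch below illustrates that pattern with plain libc instead of GLib. make_name(), free_name(), configure() and the live_allocs counter are hypothetical helpers for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static int live_allocs;   /* strings allocated and not yet freed */

/* Hypothetical stand-in for g_strdup_printf("%s%d", prefix, i). */
static char *make_name(const char *prefix, int i)
{
    int len = snprintf(NULL, 0, "%s%d", prefix, i);
    char *s = malloc((size_t)len + 1);

    if (!s) {
        abort();
    }
    snprintf(s, (size_t)len + 1, "%s%d", prefix, i);
    live_allocs++;
    return s;
}

static void free_name(char *s)
{
    assert(live_allocs > 0);
    free(s);
    live_allocs--;
}

/* Hypothetical operation that fails for device index fail_at. */
static bool configure(const char *name, int i, int fail_at)
{
    (void)name;
    return i != fail_at;
}

/* Free each string right after its last use, before the error check. */
static bool setup_devices(int count, int fail_at)
{
    for (int i = 0; i < count; i++) {
        char *name = make_name("MHU", i);
        bool ok = configure(name, i, fail_at);

        free_name(name);
        if (!ok) {
            return false;   /* nothing left to leak on this path */
        }
    }
    return true;
}
```

Calling setup_devices(2, 1) takes the error exit on the second iteration, and live_allocs is back at zero afterwards.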

The "background region" for a v8M MPU is a default which will be used
(if enabled, and if the access is privileged) if the access does
not match any specific MPU region. We were incorrectly using it
always (by putting the condition at the wrong nesting level). This
meant that we would always return the default background permissions
rather than the correct permissions for a specific region, and also
that we would not return the right information in response to a
TT instruction.

Move the check for the background region to the same place in the
logic as the equivalent v8M MPUCheck() pseudocode puts it.
This in turn means we must adjust the condition we use to detect
matches in multiple regions to avoid false positives.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190214113408.10214-1-peter.maydell@linaro.org
---
 target/arm/helper.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
         hit = true;
     } else if (m_is_ppb_region(env, address)) {
         hit = true;
-    } else if (pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
-        hit = true;
     } else {
+        if (pmsav7_use_background_region(cpu, mmu_idx, is_user)) {
+            hit = true;
+        }
+
         for (n = (int)cpu->pmsav7_dregion - 1; n >= 0; n--) {
             /* region search */
             /* Note that the base address is bits [31:5] from the register
@@ -XXX,XX +XXX,XX @@ static bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
                 *is_subpage = true;
             }
 
-            if (hit) {
+            if (matchregion != -1) {
                 /* Multiple regions match -- always a failure (unlike
                  * PMSAv7 where highest-numbered-region wins)
                  */
-- 
2.20.1
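
The control flow after this change can be summarised with a simplified model. This is a sketch only: struct region, the result enum and mpu_lookup() are hypothetical, and it omits the PPB and MPU-disabled cases, subpage handling and permissions. It shows the two points from the commit message. The background region only marks the access as a hit without short-circuiting the region search, and a multiple match is detected by testing matchregion rather than hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct region {
    uint32_t base;    /* first address covered */
    uint32_t limit;   /* last address covered (inclusive) */
    bool enabled;
};

enum lookup_result {
    LOOKUP_FAULT,        /* no match, or more than one region matched */
    LOOKUP_BACKGROUND,   /* fall back to default background permissions */
    LOOKUP_REGION        /* exactly one region matched */
};

static enum lookup_result mpu_lookup(const struct region *r, int nregions,
                                     uint32_t addr, bool background_ok,
                                     int *matchregion)
{
    bool hit = background_ok;   /* background is a default, not a match */

    assert(matchregion != NULL);
    *matchregion = -1;
    for (int n = nregions - 1; n >= 0; n--) {
        if (!r[n].enabled || addr < r[n].base || addr > r[n].limit) {
            continue;
        }
        if (*matchregion != -1) {
            /* a second region matched: always a fault in this model */
            return LOOKUP_FAULT;
        }
        *matchregion = n;
        hit = true;
    }
    if (!hit) {
        return LOOKUP_FAULT;
    }
    return *matchregion == -1 ? LOOKUP_BACKGROUND : LOOKUP_REGION;
}
```

Had the multiple-match test used hit, every region match would be reported as a multiple match whenever the background region is usable, which is the false positive the commit message mentions.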

From: Aaron Lindsay OS <aaron@os.amperecomputing.com>

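The expression 1 << (cnt->number & 0x1f) is evaluated as a 32-bit int,
so when it selects bit 31 the result is sign-extended on being widened
to the uint64_t event_mask. This was introduced by
    commit bf8d09694ccc07487cd73d7562081fdaec3370c8
    target/arm: Don't clear supported PMU events when initializing PMCEID1
and identified by Coverity (CID 1398645).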

Signed-off-by: Aaron Lindsay <aaron@os.amperecomputing.com>
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190219144621.450-1-aaron@os.amperecomputing.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu)
 
         if (cnt->supported(&cpu->env)) {
             supported_event_map[cnt->number] = i;
-            uint64_t event_mask = 1 << (cnt->number & 0x1f);
+            uint64_t event_mask = 1ULL << (cnt->number & 0x1f);
             if (cnt->number & 0x20) {
                 cpu->pmceid1 |= event_mask;
             } else {
-- 
2.20.1
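
The effect of the one-character fix can be reproduced outside QEMU. The sketch below uses hypothetical function names. Because shifting a signed 1 into bit 31 is undefined behaviour in ISO C, the old expression is modelled with an unsigned shift followed by a cast to int32_t. That cast is implementation-defined, and the example assumes the usual two's-complement wrap, as on GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the old expression: the shift happens in 32 bits and the
 * result is treated as a signed int before being widened.
 */
static uint64_t event_mask_narrow(unsigned number)
{
    int32_t narrow = (int32_t)(UINT32_C(1) << (number & 0x1f));

    return (uint64_t)narrow;   /* sign-extends when bit 31 is set */
}

/* The fixed expression: shift a 64-bit unsigned one. */
static uint64_t event_mask_wide(unsigned number)
{
    return 1ULL << (number & 0x1f);
}
```

For number 31 the narrow version yields 0xFFFFFFFF80000000 on such compilers, while the wide version yields 0x80000000. For smaller shift counts the two agree, which is why the bug only shows up for one event per ID register.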

From: Richard Henderson <richard.henderson@linaro.org>

For opcodes 0-5, move some if conditions into the structure
of a switch statement.  For opcodes 6 & 7, decode everything
at once with a second switch.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190215192302.27855-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 94 ++++++++++++++++++++------------------
 1 file changed, 49 insertions(+), 45 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
     int type = extract32(insn, 22, 2);
     bool sbit = extract32(insn, 29, 1);
     bool sf = extract32(insn, 31, 1);
+    bool itof = false;
 
     if (sbit) {
-        unallocated_encoding(s);
-        return;
+        goto do_unallocated;
     }
 
-    if (opcode > 5) {
-        /* FMOV */
-        bool itof = opcode & 1;
-
-        if (rmode >= 2) {
-            unallocated_encoding(s);
-            return;
-        }
-
-        switch (sf << 3 | type << 1 | rmode) {
-        case 0x0: /* 32 bit */
-        case 0xa: /* 64 bit */
-        case 0xd: /* 64 bit to top half of quad */
-            break;
-        case 0x6: /* 16-bit float, 32-bit int */
-        case 0xe: /* 16-bit float, 64-bit int */
-            if (dc_isar_feature(aa64_fp16, s)) {
-                break;
-            }
-            /* fallthru */
-        default:
-            /* all other sf/type/rmode combinations are invalid */
-            unallocated_encoding(s);
-            return;
-        }
-
-        if (!fp_access_check(s)) {
-            return;
-        }
-        handle_fmov(s, rd, rn, type, itof);
-    } else {
-        /* actual FP conversions */
-        bool itof = extract32(opcode, 1, 1);
-
-        if (rmode != 0 && opcode > 1) {
-            unallocated_encoding(s);
-            return;
+    switch (opcode) {
+    case 2: /* SCVTF */
+    case 3: /* UCVTF */
+        itof = true;
+        /* fallthru */
+    case 4: /* FCVTAS */
+    case 5: /* FCVTAU */
+        if (rmode != 0) {
+            goto do_unallocated;
         }
+        /* fallthru */
+    case 0: /* FCVT[NPMZ]S */
+    case 1: /* FCVT[NPMZ]U */
         switch (type) {
         case 0: /* float32 */
         case 1: /* float64 */
             break;
         case 3: /* float16 */
-            if (dc_isar_feature(aa64_fp16, s)) {
-                break;
+            if (!dc_isar_feature(aa64_fp16, s)) {
+                goto do_unallocated;
             }
-            /* fallthru */
+            break;
         default:
-            unallocated_encoding(s);
-            return;
+            goto do_unallocated;
         }
-
         if (!fp_access_check(s)) {
             return;
         }
         handle_fpfpcvt(s, rd, rn, opcode, itof, rmode, 64, sf, type);
+        break;
+
+    default:
+        switch (sf << 7 | type << 5 | rmode << 3 | opcode) {
+        case 0b01100110: /* FMOV half <-> 32-bit int */
+        case 0b01100111:
+        case 0b11100110: /* FMOV half <-> 64-bit int */
+        case 0b11100111:
+            if (!dc_isar_feature(aa64_fp16, s)) {
+                goto do_unallocated;
+            }
+            /* fallthru */
+        case 0b00000110: /* FMOV 32-bit */
+        case 0b00000111:
+        case 0b10100110: /* FMOV 64-bit */
+        case 0b10100111:
+        case 0b11001110: /* FMOV top half of 128-bit */
+        case 0b11001111:
+            if (!fp_access_check(s)) {
+                return;
+            }
+            itof = opcode & 1;
+            handle_fmov(s, rd, rn, type, itof);
+            break;
+
+        default:
+        do_unallocated:
+            unallocated_encoding(s);
+            return;
+        }
+        break;
     }
 }
 
-- 
2.20.1
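
The second switch in this patch decodes several instruction fields at once by packing them into a single key. Below is a stand-alone sketch of that technique for the FMOV cases only. bits() and fmov_is_allocated() are hypothetical names. The type and sf bit positions are the ones visible in the hunk context, and the opcode and rmode positions (bits [18:16] and [20:19]) are assumptions of this example. Hex constants are used instead of 0b literals so the code does not depend on a compiler extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract `length` bits of `value` starting at bit `start`. */
static uint32_t bits(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length < 32 && start + length <= 32);
    return (value >> start) & ((1u << length) - 1);
}

/*
 * Classify an FMOV between general and FP registers from the packed
 * key sf:type:rmode:opcode (1 + 2 + 2 + 3 bits).  Returns false for
 * combinations that should be treated as unallocated.
 */
static bool fmov_is_allocated(uint32_t insn, bool have_fp16)
{
    uint32_t opcode = bits(insn, 16, 3);
    uint32_t rmode = bits(insn, 19, 2);
    uint32_t type = bits(insn, 22, 2);
    uint32_t sf = bits(insn, 31, 1);

    switch (sf << 7 | type << 5 | rmode << 3 | opcode) {
    case 0x66: case 0x67:   /* half <-> 32-bit int */
    case 0xe6: case 0xe7:   /* half <-> 64-bit int */
        return have_fp16;
    case 0x06: case 0x07:   /* 32-bit */
    case 0xa6: case 0xa7:   /* 64-bit */
    case 0xce: case 0xcf:   /* top half of 128-bit */
        return true;
    default:
        return false;
    }
}
```

Listing every valid key explicitly makes the default case catch all other field combinations, so no separate range checks on rmode or type are needed.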

From: Richard Henderson <richard.henderson@linaro.org>

Move all of the fp helpers out of helper.c into a new file.
This is code movement only.  Since helper.c has no copyright
header, take the one from cpu.h for the new file.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190215192302.27855-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/Makefile.objs |    2 +-
 target/arm/helper.c      | 1062 -------------------------------------
 target/arm/vfp_helper.c  | 1088 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 1089 insertions(+), 1063 deletions(-)
 create mode 100644 target/arm/vfp_helper.c

diff --git a/target/arm/Makefile.objs b/target/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/Makefile.objs
+++ b/target/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ obj-$(call land,$(CONFIG_KVM),$(call lnot,$(TARGET_AARCH64))) += kvm32.o
 obj-$(call land,$(CONFIG_KVM),$(TARGET_AARCH64)) += kvm64.o
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
 obj-y += translate.o op_helper.o helper.o cpu.o
-obj-y += neon_helper.o iwmmxt_helper.o vec_helper.o
+obj-y += neon_helper.o iwmmxt_helper.o vec_helper.o vfp_helper.o
 obj-y += gdbstub.o
 obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
 obj-$(TARGET_AARCH64) += pauth_helper.o
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
     return (a & mask) | (b & ~mask);
 }
 
-/* VFP support.  We follow the convention used for VFP instructions:
-   Single precision routines have a "s" suffix, double precision a
-   "d" suffix.  */
-
-/* Convert host exception flags to vfp form.  */
-static inline int vfp_exceptbits_from_host(int host_bits)
-{
-    int target_bits = 0;
-
-    if (host_bits & float_flag_invalid)
-        target_bits |= 1;
-    if (host_bits & float_flag_divbyzero)
-        target_bits |= 2;
-    if (host_bits & float_flag_overflow)
-        target_bits |= 4;
-    if (host_bits & (float_flag_underflow | float_flag_output_denormal))
-        target_bits |= 8;
-    if (host_bits & float_flag_inexact)
-        target_bits |= 0x10;
-    if (host_bits & float_flag_input_denormal)
-        target_bits |= 0x80;
-    return target_bits;
-}
-
-uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
-{
-    uint32_t i, fpscr;
-
-    fpscr = env->vfp.xregs[ARM_VFP_FPSCR]
-            | (env->vfp.vec_len << 16)
-            | (env->vfp.vec_stride << 20);
-
-    i = get_float_exception_flags(&env->vfp.fp_status);
-    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
-    /* FZ16 does not generate an input denormal exception.  */
-    i |= (get_float_exception_flags(&env->vfp.fp_status_f16)
-          & ~float_flag_input_denormal);
-    fpscr |= vfp_exceptbits_from_host(i);
-
-    i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
-    fpscr |= i ? FPCR_QC : 0;
-
-    return fpscr;
-}
-
-uint32_t vfp_get_fpscr(CPUARMState *env)
-{
-    return HELPER(vfp_get_fpscr)(env);
-}
-
-/* Convert vfp exception flags to target form.  */
-static inline int vfp_exceptbits_to_host(int target_bits)
-{
-    int host_bits = 0;
-
-    if (target_bits & 1)
-        host_bits |= float_flag_invalid;
-    if (target_bits & 2)
-        host_bits |= float_flag_divbyzero;
-    if (target_bits & 4)
-        host_bits |= float_flag_overflow;
-    if (target_bits & 8)
-        host_bits |= float_flag_underflow;
-    if (target_bits & 0x10)
-        host_bits |= float_flag_inexact;
-    if (target_bits & 0x80)
-        host_bits |= float_flag_input_denormal;
-    return host_bits;
-}
-
-void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
-{
-    int i;
-    uint32_t changed = env->vfp.xregs[ARM_VFP_FPSCR];
-
-    /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
-    if (!cpu_isar_feature(aa64_fp16, arm_env_get_cpu(env))) {
-        val &= ~FPCR_FZ16;
-    }
-
-    /*
-     * We don't implement trapped exception handling, so the
-     * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
-     *
-     * If we exclude the exception flags, IOC|DZC|OFC|UFC|IXC|IDC
-     * (which are stored in fp_status), and the other RES0 bits
-     * in between, then we clear all of the low 16 bits.
-     */
-    env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
-    env->vfp.vec_len = (val >> 16) & 7;
-    env->vfp.vec_stride = (val >> 20) & 3;
-
-    /*
-     * The bit we set within fpscr_q is arbitrary; the register as a
-     * whole being zero/non-zero is what counts.
-     */
-    env->vfp.qc[0] = val & FPCR_QC;
-    env->vfp.qc[1] = 0;
-    env->vfp.qc[2] = 0;
-    env->vfp.qc[3] = 0;
-
-    changed ^= val;
-    if (changed & (3 << 22)) {
-        i = (val >> 22) & 3;
-        switch (i) {
-        case FPROUNDING_TIEEVEN:
-            i = float_round_nearest_even;
-            break;
-        case FPROUNDING_POSINF:
-            i = float_round_up;
-            break;
-        case FPROUNDING_NEGINF:
-            i = float_round_down;
-            break;
-        case FPROUNDING_ZERO:
-            i = float_round_to_zero;
-            break;
-        }
-        set_float_rounding_mode(i, &env->vfp.fp_status);
-        set_float_rounding_mode(i, &env->vfp.fp_status_f16);
-    }
-    if (changed & FPCR_FZ16) {
-        bool ftz_enabled = val & FPCR_FZ16;
-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
-    }
-    if (changed & FPCR_FZ) {
-        bool ftz_enabled = val & FPCR_FZ;
-        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status);
-        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status);
-    }
-    if (changed & FPCR_DN) {
-        bool dnan_enabled = val & FPCR_DN;
-        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status);
-        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16);
-    }
-
-    /* The exception flags are ORed together when we read fpscr so we
-     * only need to preserve the current state in one of our
-     * float_status values.
-     */
-    i = vfp_exceptbits_to_host(val);
-    set_float_exception_flags(i, &env->vfp.fp_status);
-    set_float_exception_flags(0, &env->vfp.fp_status_f16);
-    set_float_exception_flags(0, &env->vfp.standard_fp_status);
-}
-
-void vfp_set_fpscr(CPUARMState *env, uint32_t val)
-{
-    HELPER(vfp_set_fpscr)(env, val);
-}
-
-#define VFP_HELPER(name, p) HELPER(glue(glue(vfp_,name),p))
-
-#define VFP_BINOP(name) \
-float32 VFP_HELPER(name, s)(float32 a, float32 b, void *fpstp) \
-{ \
-    float_status *fpst = fpstp; \
-    return float32_ ## name(a, b, fpst); \
-} \
-float64 VFP_HELPER(name, d)(float64 a, float64 b, void *fpstp) \
-{ \
-    float_status *fpst = fpstp; \
-    return float64_ ## name(a, b, fpst); \
-}
-VFP_BINOP(add)
-VFP_BINOP(sub)
-VFP_BINOP(mul)
-VFP_BINOP(div)
-VFP_BINOP(min)
-VFP_BINOP(max)
-VFP_BINOP(minnum)
-VFP_BINOP(maxnum)
-#undef VFP_BINOP
-
-float32 VFP_HELPER(neg, s)(float32 a)
-{
-    return float32_chs(a);
-}
-
-float64 VFP_HELPER(neg, d)(float64 a)
-{
-    return float64_chs(a);
-}
-
-float32 VFP_HELPER(abs, s)(float32 a)
-{
-    return float32_abs(a);
-}
-
-float64 VFP_HELPER(abs, d)(float64 a)
-{
-    return float64_abs(a);
-}
-
-float32 VFP_HELPER(sqrt, s)(float32 a, CPUARMState *env)
-{
-    return float32_sqrt(a, &env->vfp.fp_status);
-}
-
-float64 VFP_HELPER(sqrt, d)(float64 a, CPUARMState *env)
-{
-    return float64_sqrt(a, &env->vfp.fp_status);
-}
-
-static void softfloat_to_vfp_compare(CPUARMState *env, int cmp)
-{
-    uint32_t flags;
-    switch (cmp) {
-    case float_relation_equal:
-        flags = 0x6;
-        break;
-    case float_relation_less:
-        flags = 0x8;
-        break;
-    case float_relation_greater:
-        flags = 0x2;
-        break;
-    case float_relation_unordered:
-        flags = 0x3;
-        break;
-    default:
-        g_assert_not_reached();
-    }
-    env->vfp.xregs[ARM_VFP_FPSCR] =
-        deposit32(env->vfp.xregs[ARM_VFP_FPSCR], 28, 4, flags);
-}
-
-/* XXX: check quiet/signaling case */
-#define DO_VFP_cmp(p, type) \
-void VFP_HELPER(cmp, p)(type a, type b, CPUARMState *env)  \
-{ \
-    softfloat_to_vfp_compare(env, \
-        type ## _compare_quiet(a, b, &env->vfp.fp_status)); \
-} \
-void VFP_HELPER(cmpe, p)(type a, type b, CPUARMState *env) \
-{ \
-    softfloat_to_vfp_compare(env, \
-        type ## _compare(a, b, &env->vfp.fp_status)); \
-}
-DO_VFP_cmp(s, float32)
-DO_VFP_cmp(d, float64)
-#undef DO_VFP_cmp
-
-/* Integer to float and float to integer conversions */
-
-#define CONV_ITOF(name, ftype, fsz, sign)                           \
-ftype HELPER(name)(uint32_t x, void *fpstp)                         \
-{                                                                   \
-    float_status *fpst = fpstp;                                     \
-    return sign##int32_to_##float##fsz((sign##int32_t)x, fpst);     \
-}
-
-#define CONV_FTOI(name, ftype, fsz, sign, round)                \
-sign##int32_t HELPER(name)(ftype x, void *fpstp)                \
-{                                                               \
-    float_status *fpst = fpstp;                                 \
-    if (float##fsz##_is_any_nan(x)) {                           \
-        float_raise(float_flag_invalid, fpst);                  \
-        return 0;                                               \
-    }                                                           \
-    return float##fsz##_to_##sign##int32##round(x, fpst);       \
-}
-
-#define FLOAT_CONVS(name, p, ftype, fsz, sign)            \
-    CONV_ITOF(vfp_##name##to##p, ftype, fsz, sign)        \
-    CONV_FTOI(vfp_to##name##p, ftype, fsz, sign, )        \
-    CONV_FTOI(vfp_to##name##z##p, ftype, fsz, sign, _round_to_zero)
-
-FLOAT_CONVS(si, h, uint32_t, 16, )
-FLOAT_CONVS(si, s, float32, 32, )
-FLOAT_CONVS(si, d, float64, 64, )
-FLOAT_CONVS(ui, h, uint32_t, 16, u)
-FLOAT_CONVS(ui, s, float32, 32, u)
-FLOAT_CONVS(ui, d, float64, 64, u)
-
-#undef CONV_ITOF
-#undef CONV_FTOI
-#undef FLOAT_CONVS
-
-/* floating point conversion */
-float64 VFP_HELPER(fcvtd, s)(float32 x, CPUARMState *env)
-{
-    return float32_to_float64(x, &env->vfp.fp_status);
-}
-
-float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
-{
-    return float64_to_float32(x, &env->vfp.fp_status);
-}
-
-/* VFP3 fixed point conversion.  */
-#define VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype) \
-float##fsz HELPER(vfp_##name##to##p)(uint##isz##_t  x, uint32_t shift, \
-                                     void *fpstp) \
-{ return itype##_to_##float##fsz##_scalbn(x, -shift, fpstp); }
-
-#define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype, ROUND, suff)   \
-uint##isz##_t HELPER(vfp_to##name##p##suff)(float##fsz x, uint32_t shift, \
-                                            void *fpst)                   \
-{                                                                         \
-    if (unlikely(float##fsz##_is_any_nan(x))) {                           \
-        float_raise(float_flag_invalid, fpst);                            \
-        return 0;                                                         \
-    }                                                                     \
-    return float##fsz##_to_##itype##_scalbn(x, ROUND, shift, fpst);       \
-}
-
-#define VFP_CONV_FIX(name, p, fsz, isz, itype)                   \
-VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
-VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
-                         float_round_to_zero, _round_to_zero)    \
-VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
-                         get_float_rounding_mode(fpst), )
-
-#define VFP_CONV_FIX_A64(name, p, fsz, isz, itype)               \
-VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
-VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
-                         get_float_rounding_mode(fpst), )
-
-VFP_CONV_FIX(sh, d, 64, 64, int16)
-VFP_CONV_FIX(sl, d, 64, 64, int32)
-VFP_CONV_FIX_A64(sq, d, 64, 64, int64)
-VFP_CONV_FIX(uh, d, 64, 64, uint16)
-VFP_CONV_FIX(ul, d, 64, 64, uint32)
-VFP_CONV_FIX_A64(uq, d, 64, 64, uint64)
-VFP_CONV_FIX(sh, s, 32, 32, int16)
-VFP_CONV_FIX(sl, s, 32, 32, int32)
-VFP_CONV_FIX_A64(sq, s, 32, 64, int64)
-VFP_CONV_FIX(uh, s, 32, 32, uint16)
-VFP_CONV_FIX(ul, s, 32, 32, uint32)
-VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
-
-#undef VFP_CONV_FIX
-#undef VFP_CONV_FIX_FLOAT
-#undef VFP_CONV_FLOAT_FIX_ROUND
-#undef VFP_CONV_FIX_A64
-
-uint32_t HELPER(vfp_sltoh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    return int32_to_float16_scalbn(x, -shift, fpst);
-}
-
-uint32_t HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    return uint32_to_float16_scalbn(x, -shift, fpst);
-}
-
-uint32_t HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
-{
-    return int64_to_float16_scalbn(x, -shift, fpst);
-}
-
-uint32_t HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
-{
-    return uint64_to_float16_scalbn(x, -shift, fpst);
-}
-
-uint32_t HELPER(vfp_toshh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_int16_scalbn(x, get_float_rounding_mode(fpst),
-                                   shift, fpst);
-}
-
-uint32_t HELPER(vfp_touhh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_uint16_scalbn(x, get_float_rounding_mode(fpst),
-                                    shift, fpst);
-}
-
-uint32_t HELPER(vfp_toslh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_int32_scalbn(x, get_float_rounding_mode(fpst),
-                                   shift, fpst);
-}
-
-uint32_t HELPER(vfp_toulh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_uint32_scalbn(x, get_float_rounding_mode(fpst),
-                                    shift, fpst);
-}
-
-uint64_t HELPER(vfp_tosqh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_int64_scalbn(x, get_float_rounding_mode(fpst),
-                                   shift, fpst);
-}
-
-uint64_t HELPER(vfp_touqh)(uint32_t x, uint32_t shift, void *fpst)
-{
-    if (unlikely(float16_is_any_nan(x))) {
-        float_raise(float_flag_invalid, fpst);
-        return 0;
-    }
-    return float16_to_uint64_scalbn(x, get_float_rounding_mode(fpst),
-                                    shift, fpst);
-}
-
-/* Set the current fp rounding mode and return the old one.
- * The argument is a softfloat float_round_ value.
- */
-uint32_t HELPER(set_rmode)(uint32_t rmode, void *fpstp)
-{
-    float_status *fp_status = fpstp;
-
-    uint32_t prev_rmode = get_float_rounding_mode(fp_status);
-    set_float_rounding_mode(rmode, fp_status);
-
-    return prev_rmode;
-}
-
-/* Set the current fp rounding mode in the standard fp status and return
- * the old one. This is for NEON instructions that need to change the
- * rounding mode but wish to use the standard FPSCR values for everything
- * else. Always set the rounding mode back to the correct value after
- * modifying it.
- * The argument is a softfloat float_round_ value.
- */
-uint32_t HELPER(set_neon_rmode)(uint32_t rmode, CPUARMState *env)
-{
-    float_status *fp_status = &env->vfp.standard_fp_status;
-
-    uint32_t prev_rmode = get_float_rounding_mode(fp_status);
-    set_float_rounding_mode(rmode, fp_status);
-
-    return prev_rmode;
-}
-
-/* Half precision conversions.  */
-float32 HELPER(vfp_fcvt_f16_to_f32)(uint32_t a, void *fpstp, uint32_t ahp_mode)
-{
-    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
-     * it would affect flushing input denormals.
-     */
-    float_status *fpst = fpstp;
-    flag save = get_flush_inputs_to_zero(fpst);
-    set_flush_inputs_to_zero(false, fpst);
-    float32 r = float16_to_float32(a, !ahp_mode, fpst);
-    set_flush_inputs_to_zero(save, fpst);
-    return r;
-}
-
-uint32_t HELPER(vfp_fcvt_f32_to_f16)(float32 a, void *fpstp, uint32_t ahp_mode)
-{
-    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
-     * it would affect flushing output denormals.
-     */
-    float_status *fpst = fpstp;
-    flag save = get_flush_to_zero(fpst);
-    set_flush_to_zero(false, fpst);
-    float16 r = float32_to_float16(a, !ahp_mode, fpst);
-    set_flush_to_zero(save, fpst);
-    return r;
-}
-
-float64 HELPER(vfp_fcvt_f16_to_f64)(uint32_t a, void *fpstp, uint32_t ahp_mode)
-{
-    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
-     * it would affect flushing input denormals.
-     */
-    float_status *fpst = fpstp;
-    flag save = get_flush_inputs_to_zero(fpst);
-    set_flush_inputs_to_zero(false, fpst);
-    float64 r = float16_to_float64(a, !ahp_mode, fpst);
-    set_flush_inputs_to_zero(save, fpst);
-    return r;
-}
-
-uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
-{
-    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
-     * it would affect flushing output denormals.
-     */
-    float_status *fpst = fpstp;
-    flag save = get_flush_to_zero(fpst);
-    set_flush_to_zero(false, fpst);
-    float16 r = float64_to_float16(a, !ahp_mode, fpst);
-    set_flush_to_zero(save, fpst);
-    return r;
-}
-
-#define float32_two make_float32(0x40000000)
-#define float32_three make_float32(0x40400000)
-#define float32_one_point_five make_float32(0x3fc00000)
-
-float32 HELPER(recps_f32)(float32 a, float32 b, CPUARMState *env)
-{
-    float_status *s = &env->vfp.standard_fp_status;
-    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
-        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
-        if (!(float32_is_zero(a) || float32_is_zero(b))) {
-            float_raise(float_flag_input_denormal, s);
-        }
-        return float32_two;
-    }
-    return float32_sub(float32_two, float32_mul(a, b, s), s);
-}
-
-float32 HELPER(rsqrts_f32)(float32 a, float32 b, CPUARMState *env)
-{
-    float_status *s = &env->vfp.standard_fp_status;
-    float32 product;
-    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
-        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
-        if (!(float32_is_zero(a) || float32_is_zero(b))) {
-            float_raise(float_flag_input_denormal, s);
-        }
-        return float32_one_point_five;
-    }
-    product = float32_mul(a, b, s);
-    return float32_div(float32_sub(float32_three, product, s), float32_two, s);
-}
-
-/* NEON helpers.  */
-
-/* Constants 256 and 512 are used in some helpers; we avoid relying on
- * int->float conversions at run-time.  */
-#define float64_256 make_float64(0x4070000000000000LL)
-#define float64_512 make_float64(0x4080000000000000LL)
-#define float16_maxnorm make_float16(0x7bff)
-#define float32_maxnorm make_float32(0x7f7fffff)
-#define float64_maxnorm make_float64(0x7fefffffffffffffLL)
-
-/* Reciprocal functions
- *
- * The algorithm that must be used to calculate the estimate
- * is specified by the ARM ARM, see FPRecipEstimate()/RecipEstimate
- */
-
-/* See RecipEstimate()
- *
- * input is a 9 bit fixed point number
- * input range 256 .. 511 for a number from 0.5 <= x < 1.0.
- * result range 256 .. 511 for a number from 1.0 to 511/256.
- */
-
-static int recip_estimate(int input)
-{
-    int a, b, r;
-    assert(256 <= input && input < 512);
-    a = (input * 2) + 1;
-    b = (1 << 19) / a;
-    r = (b + 1) >> 1;
-    assert(256 <= r && r < 512);
-    return r;
-}
-
-/*
- * Common wrapper to call recip_estimate
- *
- * The parameters are exponent and 64 bit fraction (without implicit
- * bit) where the binary point is nominally at bit 52. Returns a
- * float64 which can then be rounded to the appropriate size by the
- * callee.
- */
-
-static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
-{
-    uint32_t scaled, estimate;
-    uint64_t result_frac;
-    int result_exp;
-
-    /* Handle sub-normals */
-    if (*exp == 0) {
-        if (extract64(frac, 51, 1) == 0) {
-            *exp = -1;
-            frac <<= 2;
-        } else {
-            frac <<= 1;
-        }
-    }
-
-    /* scaled = UInt('1':fraction<51:44>) */
-    scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
-    estimate = recip_estimate(scaled);
-
-    result_exp = exp_off - *exp;
-    result_frac = deposit64(0, 44, 8, estimate);
-    if (result_exp == 0) {
-        result_frac = deposit64(result_frac >> 1, 51, 1, 1);
-    } else if (result_exp == -1) {
-        result_frac = deposit64(result_frac >> 2, 50, 2, 1);
-        result_exp = 0;
-    }
-
-    *exp = result_exp;
-
-    return result_frac;
-}
-
-static bool round_to_inf(float_status *fpst, bool sign_bit)
-{
-    switch (fpst->float_rounding_mode) {
-    case float_round_nearest_even: /* Round to Nearest */
-        return true;
-    case float_round_up: /* Round to +Inf */
-        return !sign_bit;
-    case float_round_down: /* Round to -Inf */
-        return sign_bit;
-    case float_round_to_zero: /* Round to Zero */
-        return false;
-    }
-
-    g_assert_not_reached();
-}
-
-uint32_t HELPER(recpe_f16)(uint32_t input, void *fpstp)
-{
-    float_status *fpst = fpstp;
-    float16 f16 = float16_squash_input_denormal(input, fpst);
-    uint32_t f16_val = float16_val(f16);
-    uint32_t f16_sign = float16_is_neg(f16);
-    int f16_exp = extract32(f16_val, 10, 5);
-    uint32_t f16_frac = extract32(f16_val, 0, 10);
-    uint64_t f64_frac;
-
-    if (float16_is_any_nan(f16)) {
-        float16 nan = f16;
-        if (float16_is_signaling_nan(f16, fpst)) {
-            float_raise(float_flag_invalid, fpst);
-            nan = float16_silence_nan(f16, fpst);
-        }
-        if (fpst->default_nan_mode) {
-            nan =  float16_default_nan(fpst);
-        }
-        return nan;
-    } else if (float16_is_infinity(f16)) {
-        return float16_set_sign(float16_zero, float16_is_neg(f16));
-    } else if (float16_is_zero(f16)) {
-        float_raise(float_flag_divbyzero, fpst);
-        return float16_set_sign(float16_infinity, float16_is_neg(f16));
-    } else if (float16_abs(f16) < (1 << 8)) {
-        /* Abs(value) < 2.0^-16 */
-        float_raise(float_flag_overflow | float_flag_inexact, fpst);
-        if (round_to_inf(fpst, f16_sign)) {
-            return float16_set_sign(float16_infinity, f16_sign);
-        } else {
-            return float16_set_sign(float16_maxnorm, f16_sign);
-        }
-    } else if (f16_exp >= 29 && fpst->flush_to_zero) {
-        float_raise(float_flag_underflow, fpst);
-        return float16_set_sign(float16_zero, float16_is_neg(f16));
-    }
-
-    f64_frac = call_recip_estimate(&f16_exp, 29,
-                                   ((uint64_t) f16_frac) << (52 - 10));
-
-    /* result = sign : result_exp<4:0> : fraction<51:42> */
-    f16_val = deposit32(0, 15, 1, f16_sign);
-    f16_val = deposit32(f16_val, 10, 5, f16_exp);
-    f16_val = deposit32(f16_val, 0, 10, extract64(f64_frac, 52 - 10, 10));
-    return make_float16(f16_val);
-}
-
-float32 HELPER(recpe_f32)(float32 input, void *fpstp)
-{
-    float_status *fpst = fpstp;
-    float32 f32 = float32_squash_input_denormal(input, fpst);
-    uint32_t f32_val = float32_val(f32);
-    bool f32_sign = float32_is_neg(f32);
-    int f32_exp = extract32(f32_val, 23, 8);
-    uint32_t f32_frac = extract32(f32_val, 0, 23);
-    uint64_t f64_frac;
-
-    if (float32_is_any_nan(f32)) {
-        float32 nan = f32;
-        if (float32_is_signaling_nan(f32, fpst)) {
-            float_raise(float_flag_invalid, fpst);
-            nan = float32_silence_nan(f32, fpst);
-        }
-        if (fpst->default_nan_mode) {
-            nan =  float32_default_nan(fpst);
-        }
-        return nan;
-    } else if (float32_is_infinity(f32)) {
-        return float32_set_sign(float32_zero, float32_is_neg(f32));
-    } else if (float32_is_zero(f32)) {
-        float_raise(float_flag_divbyzero, fpst);
-        return float32_set_sign(float32_infinity, float32_is_neg(f32));
-    } else if (float32_abs(f32) < (1ULL << 21)) {
-        /* Abs(value) < 2.0^-128 */
-        float_raise(float_flag_overflow | float_flag_inexact, fpst);
-        if (round_to_inf(fpst, f32_sign)) {
-            return float32_set_sign(float32_infinity, f32_sign);
-        } else {
-            return float32_set_sign(float32_maxnorm, f32_sign);
-        }
-    } else if (f32_exp >= 253 && fpst->flush_to_zero) {
-        float_raise(float_flag_underflow, fpst);
-        return float32_set_sign(float32_zero, float32_is_neg(f32));
-    }
-
-    f64_frac = call_recip_estimate(&f32_exp, 253,
-                                   ((uint64_t) f32_frac) << (52 - 23));
-
-    /* result = sign : result_exp<7:0> : fraction<51:29> */
-    f32_val = deposit32(0, 31, 1, f32_sign);
-    f32_val = deposit32(f32_val, 23, 8, f32_exp);
-    f32_val = deposit32(f32_val, 0, 23, extract64(f64_frac, 52 - 23, 23));
-    return make_float32(f32_val);
-}
-
-float64 HELPER(recpe_f64)(float64 input, void *fpstp)
-{
-    float_status *fpst = fpstp;
-    float64 f64 = float64_squash_input_denormal(input, fpst);
-    uint64_t f64_val = float64_val(f64);
-    bool f64_sign = float64_is_neg(f64);
-    int f64_exp = extract64(f64_val, 52, 11);
-    uint64_t f64_frac = extract64(f64_val, 0, 52);
-
-    /* Deal with any special cases */
-    if (float64_is_any_nan(f64)) {
-        float64 nan = f64;
-        if (float64_is_signaling_nan(f64, fpst)) {
-            float_raise(float_flag_invalid, fpst);
-            nan = float64_silence_nan(f64, fpst);
-        }
-        if (fpst->default_nan_mode) {
-            nan =  float64_default_nan(fpst);
-        }
-        return nan;
-    } else if (float64_is_infinity(f64)) {
-        return float64_set_sign(float64_zero, float64_is_neg(f64));
-    } else if (float64_is_zero(f64)) {
-        float_raise(float_flag_divbyzero, fpst);
-        return float64_set_sign(float64_infinity, float64_is_neg(f64));
-    } else if ((f64_val & ~(1ULL << 63)) < (1ULL << 50)) {
-        /* Abs(value) < 2.0^-1024 */
-        float_raise(float_flag_overflow | float_flag_inexact, fpst);
-        if (round_to_inf(fpst, f64_sign)) {
-            return float64_set_sign(float64_infinity, f64_sign);
-        } else {
-            return float64_set_sign(float64_maxnorm, f64_sign);
-        }
-    } else if (f64_exp >= 2045 && fpst->flush_to_zero) {
-        float_raise(float_flag_underflow, fpst);
-        return float64_set_sign(float64_zero, float64_is_neg(f64));
-    }
-
-    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac);
-
-    /* result = sign : result_exp<10:0> : fraction<51:0>; */
-    f64_val = deposit64(0, 63, 1, f64_sign);
-    f64_val = deposit64(f64_val, 52, 11, f64_exp);
-    f64_val = deposit64(f64_val, 0, 52, f64_frac);
-    return make_float64(f64_val);
-}
-
-/* The algorithm that must be used to calculate the estimate
- * is specified by the ARM ARM.
- */
-
-static int do_recip_sqrt_estimate(int a)
-{
-    int b, estimate;
-
-    assert(128 <= a && a < 512);
-    if (a < 256) {
-        a = a * 2 + 1;
-    } else {
-        a = (a >> 1) << 1;
-        a = (a + 1) * 2;
-    }
-    b = 512;
-    while (a * (b + 1) * (b + 1) < (1 << 28)) {
-        b += 1;
-    }
-    estimate = (b + 1) / 2;
-    assert(256 <= estimate && estimate < 512);
-
-    return estimate;
-}
-
-
-static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
-{
-    int estimate;
-    uint32_t scaled;
-
-    if (*exp == 0) {
-        while (extract64(frac, 51, 1) == 0) {
-            frac = frac << 1;
-            *exp -= 1;
-        }
-        frac = extract64(frac, 0, 51) << 1;
-    }
-
-    if (*exp & 1) {
-        /* scaled = UInt('01':fraction<51:45>) */
-        scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
-    } else {
-        /* scaled = UInt('1':fraction<51:44>) */
-        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
-    }
-    estimate = do_recip_sqrt_estimate(scaled);
-
-    *exp = (exp_off - *exp) / 2;
-    return extract64(estimate, 0, 8) << 44;
-}
-
-uint32_t HELPER(rsqrte_f16)(uint32_t input, void *fpstp)
-{
-    float_status *s = fpstp;
-    float16 f16 = float16_squash_input_denormal(input, s);
-    uint16_t val = float16_val(f16);
-    bool f16_sign = float16_is_neg(f16);
-    int f16_exp = extract32(val, 10, 5);
-    uint16_t f16_frac = extract32(val, 0, 10);
-    uint64_t f64_frac;
-
-    if (float16_is_any_nan(f16)) {
-        float16 nan = f16;
-        if (float16_is_signaling_nan(f16, s)) {
-            float_raise(float_flag_invalid, s);
-            nan = float16_silence_nan(f16, s);
-        }
-        if (s->default_nan_mode) {
-            nan =  float16_default_nan(s);
-        }
-        return nan;
-    } else if (float16_is_zero(f16)) {
-        float_raise(float_flag_divbyzero, s);
-        return float16_set_sign(float16_infinity, f16_sign);
-    } else if (f16_sign) {
-        float_raise(float_flag_invalid, s);
-        return float16_default_nan(s);
-    } else if (float16_is_infinity(f16)) {
-        return float16_zero;
-    }
-
-    /* Scale and normalize to a double-precision value between 0.25 and 1.0,
-     * preserving the parity of the exponent.  */
-
-    f64_frac = ((uint64_t) f16_frac) << (52 - 10);
-
-    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac);
-
-    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(2) */
-    val = deposit32(0, 15, 1, f16_sign);
-    val = deposit32(val, 10, 5, f16_exp);
-    val = deposit32(val, 2, 8, extract64(f64_frac, 52 - 8, 8));
-    return make_float16(val);
-}
-
-float32 HELPER(rsqrte_f32)(float32 input, void *fpstp)
-{
-    float_status *s = fpstp;
-    float32 f32 = float32_squash_input_denormal(input, s);
-    uint32_t val = float32_val(f32);
-    uint32_t f32_sign = float32_is_neg(f32);
-    int f32_exp = extract32(val, 23, 8);
-    uint32_t f32_frac = extract32(val, 0, 23);
-    uint64_t f64_frac;
-
-    if (float32_is_any_nan(f32)) {
-        float32 nan = f32;
-        if (float32_is_signaling_nan(f32, s)) {
-            float_raise(float_flag_invalid, s);
-            nan = float32_silence_nan(f32, s);
-        }
-        if (s->default_nan_mode) {
-            nan =  float32_default_nan(s);
-        }
-        return nan;
-    } else if (float32_is_zero(f32)) {
-        float_raise(float_flag_divbyzero, s);
-        return float32_set_sign(float32_infinity, float32_is_neg(f32));
-    } else if (float32_is_neg(f32)) {
-        float_raise(float_flag_invalid, s);
-        return float32_default_nan(s);
-    } else if (float32_is_infinity(f32)) {
-        return float32_zero;
-    }
-
-    /* Scale and normalize to a double-precision value between 0.25 and 1.0,
-     * preserving the parity of the exponent.  */
-
-    f64_frac = ((uint64_t) f32_frac) << 29;
-
-    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac);
-
-    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(15) */
-    val = deposit32(0, 31, 1, f32_sign);
-    val = deposit32(val, 23, 8, f32_exp);
-    val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
-    return make_float32(val);
-}
-
-float64 HELPER(rsqrte_f64)(float64 input, void *fpstp)
-{
-    float_status *s = fpstp;
-    float64 f64 = float64_squash_input_denormal(input, s);
-    uint64_t val = float64_val(f64);
-    bool f64_sign = float64_is_neg(f64);
-    int f64_exp = extract64(val, 52, 11);
-    uint64_t f64_frac = extract64(val, 0, 52);
-
-    if (float64_is_any_nan(f64)) {
-        float64 nan = f64;
-        if (float64_is_signaling_nan(f64, s)) {
-            float_raise(float_flag_invalid, s);
-            nan = float64_silence_nan(f64, s);
-        }
-        if (s->default_nan_mode) {
-            nan =  float64_default_nan(s);
-        }
-        return nan;
-    } else if (float64_is_zero(f64)) {
-        float_raise(float_flag_divbyzero, s);
-        return float64_set_sign(float64_infinity, float64_is_neg(f64));
-    } else if (float64_is_neg(f64)) {
-        float_raise(float_flag_invalid, s);
-        return float64_default_nan(s);
-    } else if (float64_is_infinity(f64)) {
-        return float64_zero;
-    }
-
-    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac);
-
-    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(44) */
-    val = deposit64(0, 61, 1, f64_sign);
-    val = deposit64(val, 52, 11, f64_exp);
-    val = deposit64(val, 44, 8, extract64(f64_frac, 52 - 8, 8));
-    return make_float64(val);
-}
-
-uint32_t HELPER(recpe_u32)(uint32_t a, void *fpstp)
-{
-    /* float_status *s = fpstp; */
-    int input, estimate;
-
-    if ((a & 0x80000000) == 0) {
-        return 0xffffffff;
-    }
-
-    input = extract32(a, 23, 9);
-    estimate = recip_estimate(input);
-
-    return deposit32(0, (32 - 9), 9, estimate);
-}
-
-uint32_t HELPER(rsqrte_u32)(uint32_t a, void *fpstp)
-{
-    int estimate;
-
-    if ((a & 0xc0000000) == 0) {
-        return 0xffffffff;
-    }
-
-    estimate = do_recip_sqrt_estimate(extract32(a, 23, 9));
-
-    return deposit32(0, 23, 9, estimate);
-}
-
-/* VFPv4 fused multiply-accumulate */
-float32 VFP_HELPER(muladd, s)(float32 a, float32 b, float32 c, void *fpstp)
-{
-    float_status *fpst = fpstp;
-    return float32_muladd(a, b, c, 0, fpst);
-}
-
-float64 VFP_HELPER(muladd, d)(float64 a, float64 b, float64 c, void *fpstp)
-{
-    float_status *fpst = fpstp;
-    return float64_muladd(a, b, c, 0, fpst);
-}
-
-/* ARMv8 round to integral */
-float32 HELPER(rints_exact)(float32 x, void *fp_status)
-{
-    return float32_round_to_int(x, fp_status);
-}
-
-float64 HELPER(rintd_exact)(float64 x, void *fp_status)
-{
-    return float64_round_to_int(x, fp_status);
-}
-
-float32 HELPER(rints)(float32 x, void *fp_status)
-{
-    int old_flags = get_float_exception_flags(fp_status), new_flags;
-    float32 ret;
-
-    ret = float32_round_to_int(x, fp_status);
-
-    /* Suppress any inexact exceptions the conversion produced */
-    if (!(old_flags & float_flag_inexact)) {
-        new_flags = get_float_exception_flags(fp_status);
-        set_float_exception_flags(new_flags & ~float_flag_inexact, fp_status);
-    }
-
-    return ret;
-}
-
-float64 HELPER(rintd)(float64 x, void *fp_status)
-{
-    int old_flags = get_float_exception_flags(fp_status), new_flags;
-    float64 ret;
-
-    ret = float64_round_to_int(x, fp_status);
-
-    new_flags = get_float_exception_flags(fp_status);
-
-    /* Suppress any inexact exceptions the conversion produced */
-    if (!(old_flags & float_flag_inexact)) {
-        new_flags = get_float_exception_flags(fp_status);
-        set_float_exception_flags(new_flags & ~float_flag_inexact, fp_status);
-    }
-
-    return ret;
-}
-
-/* Convert ARM rounding mode to softfloat */
-int arm_rmode_to_sf(int rmode)
-{
-    switch (rmode) {
-    case FPROUNDING_TIEAWAY:
-        rmode = float_round_ties_away;
-        break;
-    case FPROUNDING_ODD:
-        /* FIXME: add support for TIEAWAY and ODD */
-        qemu_log_mask(LOG_UNIMP, "arm: unimplemented rounding mode: %d\n",
-                      rmode);
-        /* fall through for now */
-    case FPROUNDING_TIEEVEN:
-    default:
-        rmode = float_round_nearest_even;
-        break;
-    case FPROUNDING_POSINF:
-        rmode = float_round_up;
-        break;
-    case FPROUNDING_NEGINF:
-        rmode = float_round_down;
-        break;
-    case FPROUNDING_ZERO:
-        rmode = float_round_to_zero;
-        break;
-    }
-    return rmode;
-}
-
 /* CRC helpers.
  * The upper bytes of val (above the number specified by 'bytes') must have
  * been zeroed out by the caller.
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM VFP floating-point operations
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "cpu.h"
+#include "exec/helper-proto.h"
+#include "fpu/softfloat.h"
+#include "internals.h"
+
+
+/* VFP support.  We follow the convention used for VFP instructions:
+   Single precision routines have a "s" suffix, double precision a
+   "d" suffix.  */
+
+/* Convert host exception flags to vfp form.  */
+static inline int vfp_exceptbits_from_host(int host_bits)
+{
+    int target_bits = 0;
+
+    if (host_bits & float_flag_invalid)
+        target_bits |= 1;
+    if (host_bits & float_flag_divbyzero)
+        target_bits |= 2;
+    if (host_bits & float_flag_overflow)
+        target_bits |= 4;
+    if (host_bits & (float_flag_underflow | float_flag_output_denormal))
+        target_bits |= 8;
+    if (host_bits & float_flag_inexact)
+        target_bits |= 0x10;
+    if (host_bits & float_flag_input_denormal)
+        target_bits |= 0x80;
+    return target_bits;
+}
+
+uint32_t HELPER(vfp_get_fpscr)(CPUARMState *env)
+{
+    uint32_t i, fpscr;
+
+    fpscr = env->vfp.xregs[ARM_VFP_FPSCR]
+            | (env->vfp.vec_len << 16)
+            | (env->vfp.vec_stride << 20);
+
+    i = get_float_exception_flags(&env->vfp.fp_status);
+    i |= get_float_exception_flags(&env->vfp.standard_fp_status);
+    /* FZ16 does not generate an input denormal exception.  */
+    i |= (get_float_exception_flags(&env->vfp.fp_status_f16)
+          & ~float_flag_input_denormal);
+    fpscr |= vfp_exceptbits_from_host(i);
+
+    i = env->vfp.qc[0] | env->vfp.qc[1] | env->vfp.qc[2] | env->vfp.qc[3];
+    fpscr |= i ? FPCR_QC : 0;
+
+    return fpscr;
+}
+
+uint32_t vfp_get_fpscr(CPUARMState *env)
+{
+    return HELPER(vfp_get_fpscr)(env);
+}
+
+/* Convert vfp exception flags to host form.  */
+static inline int vfp_exceptbits_to_host(int target_bits)
+{
+    int host_bits = 0;
+
+    if (target_bits & 1)
+        host_bits |= float_flag_invalid;
+    if (target_bits & 2)
+        host_bits |= float_flag_divbyzero;
+    if (target_bits & 4)
+        host_bits |= float_flag_overflow;
+    if (target_bits & 8)
+        host_bits |= float_flag_underflow;
+    if (target_bits & 0x10)
+        host_bits |= float_flag_inexact;
+    if (target_bits & 0x80)
+        host_bits |= float_flag_input_denormal;
+    return host_bits;
+}
+
+void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
+{
+    int i;
+    uint32_t changed = env->vfp.xregs[ARM_VFP_FPSCR];
+
+    /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
+    if (!cpu_isar_feature(aa64_fp16, arm_env_get_cpu(env))) {
+        val &= ~FPCR_FZ16;
+    }
+
+    /*
+     * We don't implement trapped exception handling, so the
+     * trap enable bits, IDE|IXE|UFE|OFE|DZE|IOE are all RAZ/WI (not RES0!)
+     *
+     * If we exclude the exception flags, IOC|DZC|OFC|UFC|IXC|IDC
+     * (which are stored in fp_status), and the other RES0 bits
+     * in between, then we clear all of the low 16 bits.
+     */
+    env->vfp.xregs[ARM_VFP_FPSCR] = val & 0xf7c80000;
+    env->vfp.vec_len = (val >> 16) & 7;
+    env->vfp.vec_stride = (val >> 20) & 3;
+
+    /*
+     * The bit we set within vfp.qc[0] is arbitrary; the array as a
+     * whole being zero/non-zero is what counts.
+     */
+    env->vfp.qc[0] = val & FPCR_QC;
+    env->vfp.qc[1] = 0;
+    env->vfp.qc[2] = 0;
+    env->vfp.qc[3] = 0;
+
+    changed ^= val;
+    if (changed & (3 << 22)) {
+        i = (val >> 22) & 3;
+        switch (i) {
+        case FPROUNDING_TIEEVEN:
+            i = float_round_nearest_even;
+            break;
+        case FPROUNDING_POSINF:
+            i = float_round_up;
+            break;
+        case FPROUNDING_NEGINF:
+            i = float_round_down;
+            break;
+        case FPROUNDING_ZERO:
+            i = float_round_to_zero;
+            break;
+        }
+        set_float_rounding_mode(i, &env->vfp.fp_status);
+        set_float_rounding_mode(i, &env->vfp.fp_status_f16);
+    }
+    if (changed & FPCR_FZ16) {
+        bool ftz_enabled = val & FPCR_FZ16;
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status_f16);
+    }
+    if (changed & FPCR_FZ) {
+        bool ftz_enabled = val & FPCR_FZ;
+        set_flush_to_zero(ftz_enabled, &env->vfp.fp_status);
+        set_flush_inputs_to_zero(ftz_enabled, &env->vfp.fp_status);
+    }
+    if (changed & FPCR_DN) {
+        bool dnan_enabled = val & FPCR_DN;
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status);
+        set_default_nan_mode(dnan_enabled, &env->vfp.fp_status_f16);
+    }
+
+    /* The exception flags are ORed together when we read fpscr so we
+     * only need to preserve the current state in one of our
+     * float_status values.
+     */
+    i = vfp_exceptbits_to_host(val);
+    set_float_exception_flags(i, &env->vfp.fp_status);
+    set_float_exception_flags(0, &env->vfp.fp_status_f16);
+    set_float_exception_flags(0, &env->vfp.standard_fp_status);
+}
+
+void vfp_set_fpscr(CPUARMState *env, uint32_t val)
+{
+    HELPER(vfp_set_fpscr)(env, val);
+}
+
+#define VFP_HELPER(name, p) HELPER(glue(glue(vfp_,name),p))
+
+#define VFP_BINOP(name) \
+float32 VFP_HELPER(name, s)(float32 a, float32 b, void *fpstp) \
+{ \
+    float_status *fpst = fpstp; \
+    return float32_ ## name(a, b, fpst); \
+} \
+float64 VFP_HELPER(name, d)(float64 a, float64 b, void *fpstp) \
+{ \
+    float_status *fpst = fpstp; \
+    return float64_ ## name(a, b, fpst); \
+}
+VFP_BINOP(add)
+VFP_BINOP(sub)
+VFP_BINOP(mul)
+VFP_BINOP(div)
+VFP_BINOP(min)
+VFP_BINOP(max)
+VFP_BINOP(minnum)
+VFP_BINOP(maxnum)
+#undef VFP_BINOP
+
+float32 VFP_HELPER(neg, s)(float32 a)
+{
+    return float32_chs(a);
+}
+
+float64 VFP_HELPER(neg, d)(float64 a)
+{
+    return float64_chs(a);
+}
+
+float32 VFP_HELPER(abs, s)(float32 a)
+{
+    return float32_abs(a);
+}
+
+float64 VFP_HELPER(abs, d)(float64 a)
+{
+    return float64_abs(a);
+}
+
+float32 VFP_HELPER(sqrt, s)(float32 a, CPUARMState *env)
+{
+    return float32_sqrt(a, &env->vfp.fp_status);
+}
+
+float64 VFP_HELPER(sqrt, d)(float64 a, CPUARMState *env)
+{
+    return float64_sqrt(a, &env->vfp.fp_status);
+}
+
+static void softfloat_to_vfp_compare(CPUARMState *env, int cmp)
+{
+    uint32_t flags;
+    switch (cmp) {
+    case float_relation_equal:
+        flags = 0x6;
+        break;
+    case float_relation_less:
+        flags = 0x8;
+        break;
+    case float_relation_greater:
+        flags = 0x2;
+        break;
+    case float_relation_unordered:
+        flags = 0x3;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    env->vfp.xregs[ARM_VFP_FPSCR] =
+        deposit32(env->vfp.xregs[ARM_VFP_FPSCR], 28, 4, flags);
+}
+
+/* XXX: check quiet/signaling case */
+#define DO_VFP_cmp(p, type) \
+void VFP_HELPER(cmp, p)(type a, type b, CPUARMState *env)  \
+{ \
+    softfloat_to_vfp_compare(env, \
+        type ## _compare_quiet(a, b, &env->vfp.fp_status)); \
+} \
+void VFP_HELPER(cmpe, p)(type a, type b, CPUARMState *env) \
+{ \
+    softfloat_to_vfp_compare(env, \
+        type ## _compare(a, b, &env->vfp.fp_status)); \
+}
+DO_VFP_cmp(s, float32)
+DO_VFP_cmp(d, float64)
+#undef DO_VFP_cmp
+
+/* Integer to float and float to integer conversions */
+
+#define CONV_ITOF(name, ftype, fsz, sign)                           \
+ftype HELPER(name)(uint32_t x, void *fpstp)                         \
+{                                                                   \
+    float_status *fpst = fpstp;                                     \
+    return sign##int32_to_##float##fsz((sign##int32_t)x, fpst);     \
+}
+
+#define CONV_FTOI(name, ftype, fsz, sign, round)                \
+sign##int32_t HELPER(name)(ftype x, void *fpstp)                \
+{                                                               \
+    float_status *fpst = fpstp;                                 \
+    if (float##fsz##_is_any_nan(x)) {                           \
+        float_raise(float_flag_invalid, fpst);                  \
+        return 0;                                               \
+    }                                                           \
+    return float##fsz##_to_##sign##int32##round(x, fpst);       \
+}
+
+#define FLOAT_CONVS(name, p, ftype, fsz, sign)            \
+    CONV_ITOF(vfp_##name##to##p, ftype, fsz, sign)        \
+    CONV_FTOI(vfp_to##name##p, ftype, fsz, sign, )        \
+    CONV_FTOI(vfp_to##name##z##p, ftype, fsz, sign, _round_to_zero)
+
+FLOAT_CONVS(si, h, uint32_t, 16, )
+FLOAT_CONVS(si, s, float32, 32, )
+FLOAT_CONVS(si, d, float64, 64, )
+FLOAT_CONVS(ui, h, uint32_t, 16, u)
+FLOAT_CONVS(ui, s, float32, 32, u)
+FLOAT_CONVS(ui, d, float64, 64, u)
+
+#undef CONV_ITOF
+#undef CONV_FTOI
+#undef FLOAT_CONVS
+
+/* floating point conversion */
+float64 VFP_HELPER(fcvtd, s)(float32 x, CPUARMState *env)
+{
+    return float32_to_float64(x, &env->vfp.fp_status);
+}
+
+float32 VFP_HELPER(fcvts, d)(float64 x, CPUARMState *env)
+{
+    return float64_to_float32(x, &env->vfp.fp_status);
+}
+
+/* VFP3 fixed point conversion.  */
+#define VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype) \
+float##fsz HELPER(vfp_##name##to##p)(uint##isz##_t  x, uint32_t shift, \
+                                     void *fpstp) \
+{ return itype##_to_##float##fsz##_scalbn(x, -shift, fpstp); }
+
+#define VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype, ROUND, suff)   \
+uint##isz##_t HELPER(vfp_to##name##p##suff)(float##fsz x, uint32_t shift, \
+                                            void *fpst)                   \
+{                                                                         \
+    if (unlikely(float##fsz##_is_any_nan(x))) {                           \
+        float_raise(float_flag_invalid, fpst);                            \
+        return 0;                                                         \
+    }                                                                     \
+    return float##fsz##_to_##itype##_scalbn(x, ROUND, shift, fpst);       \
+}
+
+#define VFP_CONV_FIX(name, p, fsz, isz, itype)                   \
+VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
+VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
+                         float_round_to_zero, _round_to_zero)    \
+VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
+                         get_float_rounding_mode(fpst), )
+
+#define VFP_CONV_FIX_A64(name, p, fsz, isz, itype)               \
+VFP_CONV_FIX_FLOAT(name, p, fsz, isz, itype)                     \
+VFP_CONV_FLOAT_FIX_ROUND(name, p, fsz, isz, itype,               \
+                         get_float_rounding_mode(fpst), )
+
+VFP_CONV_FIX(sh, d, 64, 64, int16)
+VFP_CONV_FIX(sl, d, 64, 64, int32)
+VFP_CONV_FIX_A64(sq, d, 64, 64, int64)
+VFP_CONV_FIX(uh, d, 64, 64, uint16)
+VFP_CONV_FIX(ul, d, 64, 64, uint32)
+VFP_CONV_FIX_A64(uq, d, 64, 64, uint64)
+VFP_CONV_FIX(sh, s, 32, 32, int16)
+VFP_CONV_FIX(sl, s, 32, 32, int32)
+VFP_CONV_FIX_A64(sq, s, 32, 64, int64)
+VFP_CONV_FIX(uh, s, 32, 32, uint16)
+VFP_CONV_FIX(ul, s, 32, 32, uint32)
+VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
+
+#undef VFP_CONV_FIX
+#undef VFP_CONV_FIX_FLOAT
+#undef VFP_CONV_FLOAT_FIX_ROUND
+#undef VFP_CONV_FIX_A64
+
+uint32_t HELPER(vfp_sltoh)(uint32_t x, uint32_t shift, void *fpst)
+{
+    return int32_to_float16_scalbn(x, -shift, fpst);
+}
+
+uint32_t HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
+{
+    return uint32_to_float16_scalbn(x, -shift, fpst);
+}
+
+uint32_t HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
+{
+    return int64_to_float16_scalbn(x, -shift, fpst);
+}
+
+uint32_t HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
+{
+    return uint64_to_float16_scalbn(x, -shift, fpst);
+}
+
+uint32_t HELPER(vfp_toshh)(uint32_t x, uint32_t shift, void *fpst)
+{
+    if (unlikely(float16_is_any_nan(x))) {
+        float_raise(float_flag_invalid, fpst);
+        return 0;
+    }
+    return float16_to_int16_scalbn(x, get_float_rounding_mode(fpst),
+                                   shift, fpst);
+}
+
+uint32_t HELPER(vfp_touhh)(uint32_t x, uint32_t shift, void *fpst)
+{
+    if (unlikely(float16_is_any_nan(x))) {
+        float_raise(float_flag_invalid, fpst);
+        return 0;
+    }
+    return float16_to_uint16_scalbn(x, get_float_rounding_mode(fpst),
+                                    shift, fpst);
+}
+
+uint32_t HELPER(vfp_toslh)(uint32_t x, uint32_t shift, void *fpst)
+{
+    if (unlikely(float16_is_any_nan(x))) {
+        float_raise(float_flag_invalid, fpst);
+        return 0;
+    }
+    return float16_to_int32_scalbn(x, get_float_rounding_mode(fpst),
+                                   shift, fpst);
+}
+
+uint32_t HELPER(vfp_toulh)(uint32_t x, uint32_t shift, void *fpst)
+{
+    if (unlikely(float16_is_any_nan(x))) {
+        float_raise(float_flag_invalid, fpst);
+        return 0;
+    }
+    return float16_to_uint32_scalbn(x, get_float_rounding_mode(fpst),
+                                    shift, fpst);
+}
+
+uint64_t HELPER(vfp_tosqh)(uint32_t x, uint32_t shift, void *fpst)
+{
+    if (unlikely(float16_is_any_nan(x))) {
+        float_raise(float_flag_invalid, fpst);
+        return 0;
+    }
+    return float16_to_int64_scalbn(x, get_float_rounding_mode(fpst),
+                                   shift, fpst);
+}
+
+uint64_t HELPER(vfp_touqh)(uint32_t x, uint32_t shift, void *fpst)
+{
+    if (unlikely(float16_is_any_nan(x))) {
+        float_raise(float_flag_invalid, fpst);
+        return 0;
+    }
+    return float16_to_uint64_scalbn(x, get_float_rounding_mode(fpst),
+                                    shift, fpst);
+}
+
+/* Set the current fp rounding mode and return the old one.
+ * The argument is a softfloat float_round_ value.
+ */
+uint32_t HELPER(set_rmode)(uint32_t rmode, void *fpstp)
+{
+    float_status *fp_status = fpstp;
+
+    uint32_t prev_rmode = get_float_rounding_mode(fp_status);
+    set_float_rounding_mode(rmode, fp_status);
+
+    return prev_rmode;
+}
+
+/* Set the current fp rounding mode in the standard fp status and return
+ * the old one. This is for NEON instructions that need to change the
+ * rounding mode but wish to use the standard FPSCR values for everything
+ * else. Always set the rounding mode back to the correct value after
+ * modifying it.
+ * The argument is a softfloat float_round_ value.
+ */
+uint32_t HELPER(set_neon_rmode)(uint32_t rmode, CPUARMState *env)
+{
+    float_status *fp_status = &env->vfp.standard_fp_status;
+
+    uint32_t prev_rmode = get_float_rounding_mode(fp_status);
+    set_float_rounding_mode(rmode, fp_status);
+
+    return prev_rmode;
+}
+
+/* Half precision conversions.  */
+float32 HELPER(vfp_fcvt_f16_to_f32)(uint32_t a, void *fpstp, uint32_t ahp_mode)
+{
+    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
+     * it would affect flushing input denormals.
+     */
+    float_status *fpst = fpstp;
+    flag save = get_flush_inputs_to_zero(fpst);
+    set_flush_inputs_to_zero(false, fpst);
+    float32 r = float16_to_float32(a, !ahp_mode, fpst);
+    set_flush_inputs_to_zero(save, fpst);
+    return r;
+}
+
+uint32_t HELPER(vfp_fcvt_f32_to_f16)(float32 a, void *fpstp, uint32_t ahp_mode)
+{
+    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
+     * it would affect flushing output denormals.
+     */
+    float_status *fpst = fpstp;
+    flag save = get_flush_to_zero(fpst);
+    set_flush_to_zero(false, fpst);
+    float16 r = float32_to_float16(a, !ahp_mode, fpst);
+    set_flush_to_zero(save, fpst);
+    return r;
+}
+
+float64 HELPER(vfp_fcvt_f16_to_f64)(uint32_t a, void *fpstp, uint32_t ahp_mode)
+{
+    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
+     * it would affect flushing input denormals.
+     */
+    float_status *fpst = fpstp;
+    flag save = get_flush_inputs_to_zero(fpst);
+    set_flush_inputs_to_zero(false, fpst);
+    float64 r = float16_to_float64(a, !ahp_mode, fpst);
+    set_flush_inputs_to_zero(save, fpst);
+    return r;
+}
+
+uint32_t HELPER(vfp_fcvt_f64_to_f16)(float64 a, void *fpstp, uint32_t ahp_mode)
+{
+    /* Squash FZ16 to 0 for the duration of conversion.  In this case,
+     * it would affect flushing output denormals.
+     */
+    float_status *fpst = fpstp;
+    flag save = get_flush_to_zero(fpst);
+    set_flush_to_zero(false, fpst);
+    float16 r = float64_to_float16(a, !ahp_mode, fpst);
+    set_flush_to_zero(save, fpst);
+    return r;
+}
+
+#define float32_two make_float32(0x40000000)
+#define float32_three make_float32(0x40400000)
+#define float32_one_point_five make_float32(0x3fc00000)
+
+float32 HELPER(recps_f32)(float32 a, float32 b, CPUARMState *env)
+{
+    float_status *s = &env->vfp.standard_fp_status;
+    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
+        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
+        if (!(float32_is_zero(a) || float32_is_zero(b))) {
+            float_raise(float_flag_input_denormal, s);
+        }
+        return float32_two;
+    }
+    return float32_sub(float32_two, float32_mul(a, b, s), s);
+}
+
+float32 HELPER(rsqrts_f32)(float32 a, float32 b, CPUARMState *env)
+{
+    float_status *s = &env->vfp.standard_fp_status;
+    float32 product;
+    if ((float32_is_infinity(a) && float32_is_zero_or_denormal(b)) ||
+        (float32_is_infinity(b) && float32_is_zero_or_denormal(a))) {
+        if (!(float32_is_zero(a) || float32_is_zero(b))) {
+            float_raise(float_flag_input_denormal, s);
+        }
+        return float32_one_point_five;
+    }
+    product = float32_mul(a, b, s);
+    return float32_div(float32_sub(float32_three, product, s), float32_two, s);
+}
+
+/* NEON helpers.  */
+
+/* Constants 256 and 512 are used in some helpers; we avoid relying on
+ * int->float conversions at run-time.  */
+#define float64_256 make_float64(0x4070000000000000LL)
+#define float64_512 make_float64(0x4080000000000000LL)
+#define float16_maxnorm make_float16(0x7bff)
+#define float32_maxnorm make_float32(0x7f7fffff)
+#define float64_maxnorm make_float64(0x7fefffffffffffffLL)
+
+/* Reciprocal functions
+ *
+ * The algorithm that must be used to calculate the estimate
+ * is specified by the ARM ARM, see FPRecipEstimate()/RecipEstimate
+ */
+
+/* See RecipEstimate()
+ *
+ * input is a 9 bit fixed point number
+ * input range 256 .. 511 for a number from 0.5 <= x < 1.0.
+ * result range 256 .. 511 for a number from 1.0 to 511/256.
+ */
+
+static int recip_estimate(int input)
+{
+    int a, b, r;
+    assert(256 <= input && input < 512);
+    a = (input * 2) + 1;
+    b = (1 << 19) / a;
+    r = (b + 1) >> 1;
+    assert(256 <= r && r < 512);
+    return r;
+}
+
+/*
+ * Common wrapper to call recip_estimate
+ *
+ * The parameters are exponent and 64 bit fraction (without implicit
+ * bit) where the binary point is nominally at bit 52. Returns a
+ * float64 which can then be rounded to the appropriate size by the
+ * caller.
+ */
+
+static uint64_t call_recip_estimate(int *exp, int exp_off, uint64_t frac)
+{
+    uint32_t scaled, estimate;
+    uint64_t result_frac;
+    int result_exp;
+
+    /* Handle sub-normals */
+    if (*exp == 0) {
+        if (extract64(frac, 51, 1) == 0) {
+            *exp = -1;
+            frac <<= 2;
+        } else {
+            frac <<= 1;
+        }
+    }
+
+    /* scaled = UInt('1':fraction<51:44>) */
+    scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+    estimate = recip_estimate(scaled);
+
+    result_exp = exp_off - *exp;
+    result_frac = deposit64(0, 44, 8, estimate);
+    if (result_exp == 0) {
+        result_frac = deposit64(result_frac >> 1, 51, 1, 1);
+    } else if (result_exp == -1) {
+        result_frac = deposit64(result_frac >> 2, 50, 2, 1);
+        result_exp = 0;
+    }
+
+    *exp = result_exp;
+
+    return result_frac;
+}
+
+static bool round_to_inf(float_status *fpst, bool sign_bit)
+{
+    switch (fpst->float_rounding_mode) {
+    case float_round_nearest_even: /* Round to Nearest */
+        return true;
+    case float_round_up: /* Round to +Inf */
+        return !sign_bit;
+    case float_round_down: /* Round to -Inf */
+        return sign_bit;
+    case float_round_to_zero: /* Round to Zero */
+        return false;
+    }
+
+    g_assert_not_reached();
+}
+
+uint32_t HELPER(recpe_f16)(uint32_t input, void *fpstp)
+{
+    float_status *fpst = fpstp;
+    float16 f16 = float16_squash_input_denormal(input, fpst);
+    uint32_t f16_val = float16_val(f16);
+    uint32_t f16_sign = float16_is_neg(f16);
+    int f16_exp = extract32(f16_val, 10, 5);
+    uint32_t f16_frac = extract32(f16_val, 0, 10);
+    uint64_t f64_frac;
+
+    if (float16_is_any_nan(f16)) {
+        float16 nan = f16;
+        if (float16_is_signaling_nan(f16, fpst)) {
+            float_raise(float_flag_invalid, fpst);
+            nan = float16_silence_nan(f16, fpst);
+        }
+        if (fpst->default_nan_mode) {
+            nan =  float16_default_nan(fpst);
+        }
+        return nan;
+    } else if (float16_is_infinity(f16)) {
+        return float16_set_sign(float16_zero, float16_is_neg(f16));
+    } else if (float16_is_zero(f16)) {
+        float_raise(float_flag_divbyzero, fpst);
+        return float16_set_sign(float16_infinity, float16_is_neg(f16));
+    } else if (float16_abs(f16) < (1 << 8)) {
+        /* Abs(value) < 2.0^-16 */
+        float_raise(float_flag_overflow | float_flag_inexact, fpst);
+        if (round_to_inf(fpst, f16_sign)) {
+            return float16_set_sign(float16_infinity, f16_sign);
+        } else {
+            return float16_set_sign(float16_maxnorm, f16_sign);
+        }
+    } else if (f16_exp >= 29 && fpst->flush_to_zero) {
+        float_raise(float_flag_underflow, fpst);
+        return float16_set_sign(float16_zero, float16_is_neg(f16));
+    }
+
+    f64_frac = call_recip_estimate(&f16_exp, 29,
+                                   ((uint64_t) f16_frac) << (52 - 10));
+
+    /* result = sign : result_exp<4:0> : fraction<51:42> */
+    f16_val = deposit32(0, 15, 1, f16_sign);
+    f16_val = deposit32(f16_val, 10, 5, f16_exp);
+    f16_val = deposit32(f16_val, 0, 10, extract64(f64_frac, 52 - 10, 10));
+    return make_float16(f16_val);
+}
+
+float32 HELPER(recpe_f32)(float32 input, void *fpstp)
+{
+    float_status *fpst = fpstp;
+    float32 f32 = float32_squash_input_denormal(input, fpst);
+    uint32_t f32_val = float32_val(f32);
+    bool f32_sign = float32_is_neg(f32);
+    int f32_exp = extract32(f32_val, 23, 8);
+    uint32_t f32_frac = extract32(f32_val, 0, 23);
+    uint64_t f64_frac;
+
+    if (float32_is_any_nan(f32)) {
+        float32 nan = f32;
+        if (float32_is_signaling_nan(f32, fpst)) {
+            float_raise(float_flag_invalid, fpst);
+            nan = float32_silence_nan(f32, fpst);
+        }
+        if (fpst->default_nan_mode) {
+            nan =  float32_default_nan(fpst);
+        }
+        return nan;
+    } else if (float32_is_infinity(f32)) {
+        return float32_set_sign(float32_zero, float32_is_neg(f32));
+    } else if (float32_is_zero(f32)) {
+        float_raise(float_flag_divbyzero, fpst);
+        return float32_set_sign(float32_infinity, float32_is_neg(f32));
+    } else if (float32_abs(f32) < (1ULL << 21)) {
+        /* Abs(value) < 2.0^-128 */
+        float_raise(float_flag_overflow | float_flag_inexact, fpst);
+        if (round_to_inf(fpst, f32_sign)) {
+            return float32_set_sign(float32_infinity, f32_sign);
+        } else {
+            return float32_set_sign(float32_maxnorm, f32_sign);
+        }
+    } else if (f32_exp >= 253 && fpst->flush_to_zero) {
+        float_raise(float_flag_underflow, fpst);
+        return float32_set_sign(float32_zero, float32_is_neg(f32));
+    }
+
+    f64_frac = call_recip_estimate(&f32_exp, 253,
+                                   ((uint64_t) f32_frac) << (52 - 23));
+
+    /* result = sign : result_exp<7:0> : fraction<51:29> */
+    f32_val = deposit32(0, 31, 1, f32_sign);
+    f32_val = deposit32(f32_val, 23, 8, f32_exp);
+    f32_val = deposit32(f32_val, 0, 23, extract64(f64_frac, 52 - 23, 23));
+    return make_float32(f32_val);
+}
+
+float64 HELPER(recpe_f64)(float64 input, void *fpstp)
+{
+    float_status *fpst = fpstp;
+    float64 f64 = float64_squash_input_denormal(input, fpst);
+    uint64_t f64_val = float64_val(f64);
+    bool f64_sign = float64_is_neg(f64);
+    int f64_exp = extract64(f64_val, 52, 11);
+    uint64_t f64_frac = extract64(f64_val, 0, 52);
+
+    /* Deal with any special cases */
+    if (float64_is_any_nan(f64)) {
+        float64 nan = f64;
+        if (float64_is_signaling_nan(f64, fpst)) {
+            float_raise(float_flag_invalid, fpst);
+            nan = float64_silence_nan(f64, fpst);
+        }
+        if (fpst->default_nan_mode) {
+            nan =  float64_default_nan(fpst);
+        }
+        return nan;
+    } else if (float64_is_infinity(f64)) {
+        return float64_set_sign(float64_zero, float64_is_neg(f64));
+    } else if (float64_is_zero(f64)) {
+        float_raise(float_flag_divbyzero, fpst);
+        return float64_set_sign(float64_infinity, float64_is_neg(f64));
+    } else if ((f64_val & ~(1ULL << 63)) < (1ULL << 50)) {
+        /* Abs(value) < 2.0^-1024 */
+        float_raise(float_flag_overflow | float_flag_inexact, fpst);
+        if (round_to_inf(fpst, f64_sign)) {
+            return float64_set_sign(float64_infinity, f64_sign);
+        } else {
+            return float64_set_sign(float64_maxnorm, f64_sign);
+        }
+    } else if (f64_exp >= 2045 && fpst->flush_to_zero) {
+        float_raise(float_flag_underflow, fpst);
+        return float64_set_sign(float64_zero, float64_is_neg(f64));
+    }
+
+    f64_frac = call_recip_estimate(&f64_exp, 2045, f64_frac);
+
+    /* result = sign : result_exp<10:0> : fraction<51:0>; */
+    f64_val = deposit64(0, 63, 1, f64_sign);
+    f64_val = deposit64(f64_val, 52, 11, f64_exp);
+    f64_val = deposit64(f64_val, 0, 52, f64_frac);
+    return make_float64(f64_val);
+}
+
+/* The algorithm that must be used to calculate the estimate
+ * is specified by the ARM ARM.
+ */
+
+static int do_recip_sqrt_estimate(int a)
+{
+    int b, estimate;
+
+    assert(128 <= a && a < 512);
+    if (a < 256) {
+        a = a * 2 + 1;
+    } else {
+        a = (a >> 1) << 1;
+        a = (a + 1) * 2;
+    }
+    b = 512;
+    while (a * (b + 1) * (b + 1) < (1 << 28)) {
+        b += 1;
+    }
+    estimate = (b + 1) / 2;
+    assert(256 <= estimate && estimate < 512);
+
+    return estimate;
+}
+
+
+static uint64_t recip_sqrt_estimate(int *exp , int exp_off, uint64_t frac)
+{
+    int estimate;
+    uint32_t scaled;
+
+    if (*exp == 0) {
+        while (extract64(frac, 51, 1) == 0) {
+            frac = frac << 1;
+            *exp -= 1;
+        }
+        frac = extract64(frac, 0, 51) << 1;
+    }
+
+    if (*exp & 1) {
+        /* scaled = UInt('01':fraction<51:45>) */
+        scaled = deposit32(1 << 7, 0, 7, extract64(frac, 45, 7));
+    } else {
+        /* scaled = UInt('1':fraction<51:44>) */
+        scaled = deposit32(1 << 8, 0, 8, extract64(frac, 44, 8));
+    }
+    estimate = do_recip_sqrt_estimate(scaled);
+
+    *exp = (exp_off - *exp) / 2;
+    return extract64(estimate, 0, 8) << 44;
+}
+
+uint32_t HELPER(rsqrte_f16)(uint32_t input, void *fpstp)
+{
+    float_status *s = fpstp;
+    float16 f16 = float16_squash_input_denormal(input, s);
+    uint16_t val = float16_val(f16);
+    bool f16_sign = float16_is_neg(f16);
+    int f16_exp = extract32(val, 10, 5);
+    uint16_t f16_frac = extract32(val, 0, 10);
+    uint64_t f64_frac;
+
+    if (float16_is_any_nan(f16)) {
+        float16 nan = f16;
+        if (float16_is_signaling_nan(f16, s)) {
+            float_raise(float_flag_invalid, s);
+            nan = float16_silence_nan(f16, s);
+        }
+        if (s->default_nan_mode) {
+            nan =  float16_default_nan(s);
+        }
+        return nan;
+    } else if (float16_is_zero(f16)) {
+        float_raise(float_flag_divbyzero, s);
+        return float16_set_sign(float16_infinity, f16_sign);
+    } else if (f16_sign) {
+        float_raise(float_flag_invalid, s);
+        return float16_default_nan(s);
+    } else if (float16_is_infinity(f16)) {
+        return float16_zero;
+    }
+
+    /* Scale and normalize to a double-precision value between 0.25 and 1.0,
+     * preserving the parity of the exponent.  */
+
+    f64_frac = ((uint64_t) f16_frac) << (52 - 10);
+
+    f64_frac = recip_sqrt_estimate(&f16_exp, 44, f64_frac);
+
+    /* result = sign : result_exp<4:0> : estimate<7:0> : Zeros(2) */
+    val = deposit32(0, 15, 1, f16_sign);
+    val = deposit32(val, 10, 5, f16_exp);
+    val = deposit32(val, 2, 8, extract64(f64_frac, 52 - 8, 8));
+    return make_float16(val);
+}
+
+float32 HELPER(rsqrte_f32)(float32 input, void *fpstp)
+{
+    float_status *s = fpstp;
+    float32 f32 = float32_squash_input_denormal(input, s);
+    uint32_t val = float32_val(f32);
+    uint32_t f32_sign = float32_is_neg(f32);
+    int f32_exp = extract32(val, 23, 8);
+    uint32_t f32_frac = extract32(val, 0, 23);
+    uint64_t f64_frac;
+
+    if (float32_is_any_nan(f32)) {
+        float32 nan = f32;
+        if (float32_is_signaling_nan(f32, s)) {
+            float_raise(float_flag_invalid, s);
+            nan = float32_silence_nan(f32, s);
+        }
+        if (s->default_nan_mode) {
+            nan =  float32_default_nan(s);
+        }
+        return nan;
+    } else if (float32_is_zero(f32)) {
+        float_raise(float_flag_divbyzero, s);
+        return float32_set_sign(float32_infinity, float32_is_neg(f32));
+    } else if (float32_is_neg(f32)) {
+        float_raise(float_flag_invalid, s);
+        return float32_default_nan(s);
+    } else if (float32_is_infinity(f32)) {
+        return float32_zero;
+    }
+
+    /* Scale and normalize to a double-precision value between 0.25 and 1.0,
+     * preserving the parity of the exponent.  */
+
+    f64_frac = ((uint64_t) f32_frac) << 29;
+
+    f64_frac = recip_sqrt_estimate(&f32_exp, 380, f64_frac);
+
+    /* result = sign : result_exp<7:0> : estimate<7:0> : Zeros(15) */
+    val = deposit32(0, 31, 1, f32_sign);
+    val = deposit32(val, 23, 8, f32_exp);
+    val = deposit32(val, 15, 8, extract64(f64_frac, 52 - 8, 8));
+    return make_float32(val);
+}
+
+float64 HELPER(rsqrte_f64)(float64 input, void *fpstp)
+{
+    float_status *s = fpstp;
+    float64 f64 = float64_squash_input_denormal(input, s);
+    uint64_t val = float64_val(f64);
+    bool f64_sign = float64_is_neg(f64);
+    int f64_exp = extract64(val, 52, 11);
+    uint64_t f64_frac = extract64(val, 0, 52);
+
+    if (float64_is_any_nan(f64)) {
+        float64 nan = f64;
+        if (float64_is_signaling_nan(f64, s)) {
+            float_raise(float_flag_invalid, s);
+            nan = float64_silence_nan(f64, s);
+        }
+        if (s->default_nan_mode) {
+            nan =  float64_default_nan(s);
+        }
+        return nan;
+    } else if (float64_is_zero(f64)) {
+        float_raise(float_flag_divbyzero, s);
+        return float64_set_sign(float64_infinity, float64_is_neg(f64));
+    } else if (float64_is_neg(f64)) {
+        float_raise(float_flag_invalid, s);
+        return float64_default_nan(s);
+    } else if (float64_is_infinity(f64)) {
+        return float64_zero;
+    }
+
+    f64_frac = recip_sqrt_estimate(&f64_exp, 3068, f64_frac);
+
+    /* result = sign : result_exp<10:0> : estimate<7:0> : Zeros(44) */
+    val = deposit64(0, 63, 1, f64_sign);
+    val = deposit64(val, 52, 11, f64_exp);
+    val = deposit64(val, 44, 8, extract64(f64_frac, 52 - 8, 8));
+    return make_float64(val);
+}
+
+uint32_t HELPER(recpe_u32)(uint32_t a, void *fpstp)
+{
+    /* float_status *s = fpstp; */
+    int input, estimate;
+
+    if ((a & 0x80000000) == 0) {
+        return 0xffffffff;
+    }
+
+    input = extract32(a, 23, 9);
+    estimate = recip_estimate(input);
+
+    return deposit32(0, (32 - 9), 9, estimate);
+}
+
+uint32_t HELPER(rsqrte_u32)(uint32_t a, void *fpstp)
+{
+    int estimate;
+
+    if ((a & 0xc0000000) == 0) {
+        return 0xffffffff;
+    }
+
+    estimate = do_recip_sqrt_estimate(extract32(a, 23, 9));
+
+    return deposit32(0, 23, 9, estimate);
+}
+
+/* VFPv4 fused multiply-accumulate */
+float32 VFP_HELPER(muladd, s)(float32 a, float32 b, float32 c, void *fpstp)
+{
+    float_status *fpst = fpstp;
+    return float32_muladd(a, b, c, 0, fpst);
+}
+
+float64 VFP_HELPER(muladd, d)(float64 a, float64 b, float64 c, void *fpstp)
+{
+    float_status *fpst = fpstp;
+    return float64_muladd(a, b, c, 0, fpst);
+}
+
+/* ARMv8 round to integral */
+float32 HELPER(rints_exact)(float32 x, void *fp_status)
+{
+    return float32_round_to_int(x, fp_status);
+}
+
+float64 HELPER(rintd_exact)(float64 x, void *fp_status)
+{
+    return float64_round_to_int(x, fp_status);
+}
+
+float32 HELPER(rints)(float32 x, void *fp_status)
+{
+    int old_flags = get_float_exception_flags(fp_status), new_flags;
+    float32 ret;
+
+    ret = float32_round_to_int(x, fp_status);
+
+    /* Suppress any inexact exceptions the conversion produced */
+    if (!(old_flags & float_flag_inexact)) {
+        new_flags = get_float_exception_flags(fp_status);
+        set_float_exception_flags(new_flags & ~float_flag_inexact, fp_status);
+    }
+
+    return ret;
+}
+
+float64 HELPER(rintd)(float64 x, void *fp_status)
+{
+    int old_flags = get_float_exception_flags(fp_status), new_flags;
+    float64 ret;
+
+    ret = float64_round_to_int(x, fp_status);
+
+    /* Suppress any inexact exceptions the conversion produced */
+    if (!(old_flags & float_flag_inexact)) {
+        new_flags = get_float_exception_flags(fp_status);
+        set_float_exception_flags(new_flags & ~float_flag_inexact, fp_status);
+    }
+
+    return ret;
+}
+
+/* Convert ARM rounding mode to softfloat */
+int arm_rmode_to_sf(int rmode)
+{
+    switch (rmode) {
+    case FPROUNDING_TIEAWAY:
+        rmode = float_round_ties_away;
+        break;
+    case FPROUNDING_ODD:
+        /* FIXME: add support for ODD */
+        qemu_log_mask(LOG_UNIMP, "arm: unimplemented rounding mode: %d\n",
+                      rmode);
+        /* fall through for now */
+    case FPROUNDING_TIEEVEN:
+    default:
+        rmode = float_round_nearest_even;
+        break;
+    case FPROUNDING_POSINF:
+        rmode = float_round_up;
+        break;
+    case FPROUNDING_NEGINF:
+        rmode = float_round_down;
+        break;
+    case FPROUNDING_ZERO:
+        rmode = float_round_to_zero;
+        break;
+    }
+    return rmode;
+}
-- 
2.20.1
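
As a quick sanity check on the estimate helper in the patch above, its
core loop can be lifted into a standalone file. This is only a sketch:
rsqrt_estimate_9bit and rsqrt_estimate_range_ok are hypothetical names,
the first being an unchanged copy of do_recip_sqrt_estimate, and the
128..511 input range is taken from that function's first assertion.

```c
#include <assert.h>

/*
 * Hypothetical standalone copy of do_recip_sqrt_estimate() from the
 * patch above; the logic is unchanged, only the name differs.
 */
static int rsqrt_estimate_9bit(int a)
{
    int b, estimate;

    assert(128 <= a && a < 512);
    if (a < 256) {
        a = a * 2 + 1;
    } else {
        a = (a >> 1) << 1;
        a = (a + 1) * 2;
    }
    /* Find the largest b such that a * b * b stays below 2^28. */
    b = 512;
    while (a * (b + 1) * (b + 1) < (1 << 28)) {
        b += 1;
    }
    estimate = (b + 1) / 2;
    return estimate;
}

/* Returns 1 if every valid input yields an estimate in [256, 512). */
static int rsqrt_estimate_range_ok(void)
{
    for (int a = 128; a < 512; a++) {
        int e = rsqrt_estimate_9bit(a);
        if (e < 256 || e >= 512) {
            return 0;
        }
    }
    return 1;
}
```

The smallest input gives the largest estimate and the largest input the
smallest one, which is what one would expect of a reciprocal square root.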

From: Richard Henderson <richard.henderson@linaro.org>

There are lots of special cases within these insns.  Split the
major argument decode/loading/saving into no_output (compares),
rd_is_dp, and rm_is_dp.

We still need to special-case the argument load for compares (rd as
input, rm as zero) and for vcvt fixed (rd as both input and output),
but many of the other special cases disappear.

Now that we have a full switch at the beginning, hoist the ISA
checks from the code generation.
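
The decode split described above can be modelled as a standalone
function. This is a rough sketch rather than the translator code:
VfpOperandInfo and vfp_decode_op15 are hypothetical names, the ISA
feature checks are left out, and only the op == 15 case is covered.
The opcode-to-flag mapping follows the switch in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the three flags named in the commit message. */
typedef struct {
    bool rd_is_dp;   /* destination is a double-precision register */
    bool rm_is_dp;   /* source is a double-precision register */
    bool no_output;  /* compare: nothing is written back */
} VfpOperandInfo;

/*
 * Model of the op == 15 decode: 'rn' is the extension opcode and 'dp'
 * the instruction's double-precision bit. Returns false for an
 * unallocated opcode. ISA feature checks are deliberately omitted.
 */
static bool vfp_decode_op15(int rn, bool dp, VfpOperandInfo *info)
{
    assert(info != NULL);
    info->rd_is_dp = dp;
    info->rm_is_dp = dp;
    info->no_output = false;

    switch (rn) {
    case 0x00: case 0x01: case 0x02: case 0x03: /* vmov, vabs, vneg, vsqrt */
    case 0x0c: case 0x0d: case 0x0e:            /* vrintr, vrintz, vrintx */
        return true;
    case 0x04: case 0x05:                       /* vcvtb/vcvtt from f16 */
    case 0x10: case 0x11:                       /* vcvt from u32/s32 */
        info->rm_is_dp = false;
        return true;
    case 0x06: case 0x07:                       /* vcvtb/vcvtt to f16 */
    case 0x18: case 0x19: case 0x1a: case 0x1b: /* vcvt to u32/s32 */
        info->rd_is_dp = false;
        return true;
    case 0x08: case 0x09: case 0x0a: case 0x0b: /* compares */
        info->no_output = true;
        return true;
    case 0x0f:                                  /* vcvt double<->single */
        info->rd_is_dp = !dp;
        return true;
    case 0x14: case 0x15: case 0x16: case 0x17: /* vcvt fp <-> fixed */
    case 0x1c: case 0x1d: case 0x1e: case 0x1f:
        info->rm_is_dp = false;                 /* rm field holds frac_bits */
        return true;
    default:
        return false;
    }
}
```

With the flags computed once up front, the later load and write-back
steps only need to consult rm_is_dp, rd_is_dp and no_output instead of
re-testing opcode bit patterns.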

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190215192302.27855-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 227 ++++++++++++++++++++---------------------
 1 file changed, 111 insertions(+), 116 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             }
         } else {
             /* data processing */
+            bool rd_is_dp = dp;
+            bool rm_is_dp = dp;
+            bool no_output = false;
+
             /* The opcode is in bits 23, 21, 20 and 6.  */
             op = ((insn >> 20) & 8) | ((insn >> 19) & 6) | ((insn >> 6) & 1);
-            if (dp) {
-                if (op == 15) {
-                    /* rn is opcode */
-                    rn = ((insn >> 15) & 0x1e) | ((insn >> 7) & 1);
-                } else {
-                    /* rn is register number */
-                    VFP_DREG_N(rn, insn);
-                }
+            rn = VFP_SREG_N(insn);
 
-                if (op == 15 && (rn == 15 || ((rn & 0x1c) == 0x18) ||
-                                 ((rn & 0x1e) == 0x6))) {
-                    /* Integer or single/half precision destination.  */
-                    rd = VFP_SREG_D(insn);
-                } else {
-                    VFP_DREG_D(rd, insn);
-                }
-                if (op == 15 &&
-                    (((rn & 0x1c) == 0x10) || ((rn & 0x14) == 0x14) ||
-                     ((rn & 0x1e) == 0x4))) {
-                    /* VCVT from int or half precision is always from S reg
-                     * regardless of dp bit. VCVT with immediate frac_bits
-                     * has same format as SREG_M.
+            if (op == 15) {
+                /* rn is opcode, encoded as per VFP_SREG_N. */
+                switch (rn) {
+                case 0x00: /* vmov */
+                case 0x01: /* vabs */
+                case 0x02: /* vneg */
+                case 0x03: /* vsqrt */
+                    break;
+
+                case 0x04: /* vcvtb.f64.f16, vcvtb.f32.f16 */
+                case 0x05: /* vcvtt.f64.f16, vcvtt.f32.f16 */
+                    /*
+                     * VCVTB, VCVTT: only present with the halfprec extension
+                     * UNPREDICTABLE if bit 8 is set prior to ARMv8
+                     * (we choose to UNDEF)
                      */
-                    rm = VFP_SREG_M(insn);
-                } else {
-                    VFP_DREG_M(rm, insn);
+                    if ((dp && !arm_dc_feature(s, ARM_FEATURE_V8)) ||
+                        !arm_dc_feature(s, ARM_FEATURE_VFP_FP16)) {
+                        return 1;
+                    }
+                    rm_is_dp = false;
+                    break;
+                case 0x06: /* vcvtb.f16.f32, vcvtb.f16.f64 */
+                case 0x07: /* vcvtt.f16.f32, vcvtt.f16.f64 */
+                    if ((dp && !arm_dc_feature(s, ARM_FEATURE_V8)) ||
+                        !arm_dc_feature(s, ARM_FEATURE_VFP_FP16)) {
+                        return 1;
+                    }
+                    rd_is_dp = false;
+                    break;
+
+                case 0x08: case 0x0a: /* vcmp, vcmpz */
+                case 0x09: case 0x0b: /* vcmpe, vcmpez */
+                    no_output = true;
+                    break;
+
+                case 0x0c: /* vrintr */
+                case 0x0d: /* vrintz */
+                case 0x0e: /* vrintx */
+                    break;
+
+                case 0x0f: /* vcvt double<->single */
+                    rd_is_dp = !dp;
+                    break;
+
+                case 0x10: /* vcvt.fxx.u32 */
+                case 0x11: /* vcvt.fxx.s32 */
+                    rm_is_dp = false;
+                    break;
+                case 0x18: /* vcvtr.u32.fxx */
+                case 0x19: /* vcvtz.u32.fxx */
+                case 0x1a: /* vcvtr.s32.fxx */
+                case 0x1b: /* vcvtz.s32.fxx */
+                    rd_is_dp = false;
+                    break;
+
+                case 0x14: /* vcvt fp <-> fixed */
+                case 0x15:
+                case 0x16:
+                case 0x17:
+                case 0x1c:
+                case 0x1d:
+                case 0x1e:
+                case 0x1f:
+                    if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
+                        return 1;
+                    }
+                    /* Immediate frac_bits has same format as SREG_M.  */
+                    rm_is_dp = false;
+                    break;
+
+                default:
+                    return 1;
                 }
+            } else if (dp) {
+                /* rn is register number */
+                VFP_DREG_N(rn, insn);
+            }
+
+            if (rd_is_dp) {
+                VFP_DREG_D(rd, insn);
+            } else {
+                rd = VFP_SREG_D(insn);
+            }
+            if (rm_is_dp) {
+                VFP_DREG_M(rm, insn);
             } else {
-                rn = VFP_SREG_N(insn);
-                if (op == 15 && rn == 15) {
-                    /* Double precision destination.  */
-                    VFP_DREG_D(rd, insn);
-                } else {
-                    rd = VFP_SREG_D(insn);
-                }
-                /* NB that we implicitly rely on the encoding for the frac_bits
-                 * in VCVT of fixed to float being the same as that of an SREG_M
-                 */
                 rm = VFP_SREG_M(insn);
             }
 
             veclen = s->vec_len;
-            if (op == 15 && rn > 3)
+            if (op == 15 && rn > 3) {
                 veclen = 0;
+            }
 
             /* Shut up compiler warnings.  */
             delta_m = 0;
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
             /* Load the initial operands.  */
             if (op == 15) {
                 switch (rn) {
-                case 16:
-                case 17:
-                    /* Integer source */
-                    gen_mov_F0_vreg(0, rm);
-                    break;
-                case 8:
-                case 9:
-                    /* Compare */
+                case 0x08: case 0x09: /* Compare */
                     gen_mov_F0_vreg(dp, rd);
                     gen_mov_F1_vreg(dp, rm);
                     break;
-                case 10:
-                case 11:
-                    /* Compare with zero */
+                case 0x0a: case 0x0b: /* Compare with zero */
                     gen_mov_F0_vreg(dp, rd);
                     gen_vfp_F1_ld0(dp);
                     break;
-                case 20:
-                case 21:
-                case 22:
-                case 23:
-                case 28:
-                case 29:
-                case 30:
-                case 31:
+                case 0x14: /* vcvt fp <-> fixed */
+                case 0x15:
+                case 0x16:
+                case 0x17:
+                case 0x1c:
+                case 0x1d:
+                case 0x1e:
+                case 0x1f:
                     /* Source and destination the same.  */
                     gen_mov_F0_vreg(dp, rd);
                     break;
-                case 4:
-                case 5:
-                case 6:
-                case 7:
-                    /* VCVTB, VCVTT: only present with the halfprec extension
-                     * UNPREDICTABLE if bit 8 is set prior to ARMv8
-                     * (we choose to UNDEF)
-                     */
-                    if ((dp && !arm_dc_feature(s, ARM_FEATURE_V8)) ||
-                        !arm_dc_feature(s, ARM_FEATURE_VFP_FP16)) {
-                        return 1;
-                    }
-                    if (!extract32(rn, 1, 1)) {
-                        /* Half precision source.  */
-                        gen_mov_F0_vreg(0, rm);
-                        break;
-                    }
-                    /* Otherwise fall through */
                 default:
                     /* One source operand.  */
-                    gen_mov_F0_vreg(dp, rm);
+                    gen_mov_F0_vreg(rm_is_dp, rm);
                     break;
                 }
             } else {
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                         break;
                     }
                     case 15: /* single<->double conversion */
-                        if (dp)
+                        if (dp) {
                             gen_helper_vfp_fcvtsd(cpu_F0s, cpu_F0d, cpu_env);
-                        else
+                        } else {
                             gen_helper_vfp_fcvtds(cpu_F0d, cpu_F0s, cpu_env);
+                        }
                         break;
                     case 16: /* fuito */
                         gen_vfp_uito(dp, 0);
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                         gen_vfp_sito(dp, 0);
                         break;
                     case 20: /* fshto */
-                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                            return 1;
-                        }
                         gen_vfp_shto(dp, 16 - rm, 0);
                         break;
                     case 21: /* fslto */
-                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                            return 1;
-                        }
                         gen_vfp_slto(dp, 32 - rm, 0);
                         break;
                     case 22: /* fuhto */
-                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                            return 1;
-                        }
                         gen_vfp_uhto(dp, 16 - rm, 0);
                         break;
                     case 23: /* fulto */
-                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                            return 1;
-                        }
                         gen_vfp_ulto(dp, 32 - rm, 0);
                         break;
                     case 24: /* ftoui */
@@ -XXX,XX +XXX,XX @@ static int disas_vfp_insn(DisasContext *s, uint32_t insn)
                         gen_vfp_tosiz(dp, 0);
                         break;
                     case 28: /* ftosh */
-                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                            return 1;
-                        }
                         gen_vfp_tosh(dp, 16 - rm, 0);
                         break;
                     case 29: /* ftosl */
-                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                            return 1;
-                        }
                         gen_vfp_tosl(dp, 32 - rm, 0);
                         break;
                     case 30: /* ftouh */
-                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                            return 1;
-                        }
                         gen_vfp_touh(dp, 16 - rm, 0);
                         break;
                     case 31: /* ftoul */
-                        if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
-                            return 1;
-                        }
                         gen_vfp_toul(dp, 32 - rm, 0);
                         break;
                     default: /* undefined */
-                        return 1;
+                        g_assert_not_reached();
                     }
                     break;
                 default: /* undefined */
                     return 1;
                 }
 
-                /* Write back the result.  */
-                if (op == 15 && (rn >= 8 && rn <= 11)) {
-                    /* Comparison, do nothing.  */
-                } else if (op == 15 && dp && ((rn & 0x1c) == 0x18 ||
-                                              (rn & 0x1e) == 0x6)) {
-                    /* VCVT double to int: always integer result.
-                     * VCVT double to half precision is always a single
-                     * precision result.
-                     */
-                    gen_mov_vreg_F0(0, rd);
-                } else if (op == 15 && rn == 15) {
-                    /* conversion */
-                    gen_mov_vreg_F0(!dp, rd);
-                } else {
-                    gen_mov_vreg_F0(dp, rd);
+                /* Write back the result, if any.  */
+                if (!no_output) {
+                    gen_mov_vreg_F0(rd_is_dp, rd);
                 }
 
                 /* break out of the loop if we have finished  */
-                if (veclen == 0)
+                if (veclen == 0) {
                     break;
+                }
 
                 if (op == 15 && delta_m == 0) {
                     /* single source one-many */
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190215192302.27855-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: fixed a couple of comment typos]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h           | 10 +++++
 target/arm/helper.h        |  3 ++
 target/arm/cpu.c           |  1 +
 target/arm/cpu64.c         |  2 +
 target/arm/translate-a64.c | 26 +++++++++++
 target/arm/translate.c     | 10 +++++
 target/arm/vfp_helper.c    | 88 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 140 insertions(+)

The Peripheral Protection Controller handles an unused port by not
creating the sysbus MMIO region for the port's upstream end when
nothing is connected to its downstream. This results in odd
behaviour when there is an unused port in the middle of the range:
since sysbus MMIO regions are implicitly allocated consecutively,
any used ports above the unused one end up with sysbus MMIO region
numbers that don't match the port number.

Avoid this numbering mismatch by creating dummy MMIO regions
for the unused ports. This doesn't change anything for our
existing boards, which don't have any gaps in the middle of
the port ranges they use; but it will be needed for the Musca
board.
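
The numbering problem can be illustrated with a toy model of
consecutive region allocation. This is a sketch under assumptions, not
the device code: assign_regions and its parameters are hypothetical,
and a region is represented only by the index it would receive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PORTS 16

/*
 * Toy model: regions are numbered in the order they are registered.
 * region_of_port[i] receives the region index for port i, or -1 if no
 * region was created. 'with_dummies' selects the behaviour the commit
 * message describes as the fix; max_port starts at 0 as in the patch.
 */
static void assign_regions(const bool used[NUM_PORTS], bool with_dummies,
                           int region_of_port[NUM_PORTS])
{
    int next_region = 0;
    int max_port = 0;

    assert(used != NULL && region_of_port != NULL);
    for (int i = 0; i < NUM_PORTS; i++) {
        region_of_port[i] = -1;
        if (used[i]) {
            max_port = i;
        }
    }
    /* Ports above the highest used one never get a region. */
    for (int i = 0; i <= max_port; i++) {
        if (used[i] || with_dummies) {
            region_of_port[i] = next_region++;
        }
    }
}
```

With ports 0 and 2 in use, the old behaviour gives port 2 region
number 1, while the dummy-region behaviour gives it region number 2.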

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/hw/misc/tz-ppc.h |  8 +++++++-
 hw/misc/tz-ppc.c         | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/include/hw/misc/tz-ppc.h b/include/hw/misc/tz-ppc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/tz-ppc.h
+++ b/include/hw/misc/tz-ppc.h
@@ -XXX,XX +XXX,XX @@
  *
  * QEMU interface:
  * + sysbus MMIO regions 0..15: MemoryRegions defining the upstream end
- *   of each of the 16 ports of the PPC
+ *   of each of the 16 ports of the PPC. When a port is unused (i.e. no
+ *   downstream MemoryRegion is connected to it) at the end of the 0..15
+ *   range then no sysbus MMIO region is created for its upstream. When an
+ *   unused port lies in the middle of the range with other used ports at
+ *   higher port numbers, a dummy MMIO region is created to ensure that
+ *   port N's upstream is always sysbus MMIO region N. Dummy regions should
+ *   not be mapped, and will assert if any access is made to them.
  * + Property "port[0..15]": MemoryRegion defining the downstream device(s)
  *   for each of the 16 ports of the PPC
  * + Named GPIO inputs "cfg_nonsec[0..15]": set to 1 if the port should be
diff --git a/hw/misc/tz-ppc.c b/hw/misc/tz-ppc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/tz-ppc.c
+++ b/hw/misc/tz-ppc.c
@@ -XXX,XX +XXX,XX @@ static const MemoryRegionOps tz_ppc_ops = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
+static bool tz_ppc_dummy_accepts(void *opaque, hwaddr addr,
+                                 unsigned size, bool is_write,
+                                 MemTxAttrs attrs)
+{
+    /*
+     * Board code should never map the upstream end of an unused port,
+     * so we should never try to make a memory access to it.
+     */
+    g_assert_not_reached();
+}
+
+static const MemoryRegionOps tz_ppc_dummy_ops = {
+    .valid.accepts = tz_ppc_dummy_accepts,
+};
+
 static void tz_ppc_reset(DeviceState *dev)
 {
     TZPPC *s = TZ_PPC(dev);
@@ -XXX,XX +XXX,XX @@ static void tz_ppc_realize(DeviceState *dev, Error **errp)
     SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
     TZPPC *s = TZ_PPC(dev);
     int i;
+    int max_port = 0;
 
     /* We can't create the upstream end of the port until realize,
      * as we don't know the size of the MR used as the downstream until then.
      */
     for (i = 0; i < TZ_NUM_PORTS; i++) {
+        if (s->port[i].downstream) {
+            max_port = i;
+        }
+    }
+
+    for (i = 0; i <= max_port; i++) {
         TZPPCPort *port = &s->port[i];
         char *name;
         uint64_t size;
 
         if (!port->downstream) {
+            /*
+             * Create dummy sysbus MMIO region so the sysbus region
+             * numbering doesn't get out of sync with the port numbers.
+             * The size is entirely arbitrary.
+             */
+            name = g_strdup_printf("tz-ppc-dummy-port[%d]", i);
+            memory_region_init_io(&port->upstream, obj, &tz_ppc_dummy_ops,
+                                  port, name, 0x10000);
+            sysbus_init_mmio(sbd, &port->upstream);
+            g_free(name);
             continue;
         }
 
-- 
2.20.1

Create a new include file for the pl031's device struct,
type macros, etc, so that it can be instantiated using
the "embedded struct" coding style.
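
For readers unfamiliar with the term, the sketch below shows why the
struct definition has to be visible in a header for this style to work.
It is a generic C illustration only: ToyRtc, ToyBoard and
toy_board_init are hypothetical and do not use QEMU's object model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a device state struct that a header would export. */
typedef struct ToyRtc {
    uint32_t mr;
    uint32_t lr;
    uint32_t im;
    uint32_t is;
} ToyRtc;

/*
 * Hypothetical container device. Embedding ToyRtc by value only
 * compiles because the full struct definition is visible here, which
 * is why the definition cannot stay private to the device's .c file.
 */
typedef struct ToyBoard {
    ToyRtc rtc;      /* embedded, not a pointer */
    uint32_t id;
} ToyBoard;

/* Initialise the embedded device in place; no separate allocation. */
static void toy_board_init(ToyBoard *b, uint32_t id)
{
    assert(b != NULL);
    memset(b, 0, sizeof(*b));
    b->id = id;
    b->rtc.im = 1;
}
```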

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/hw/timer/pl031.h | 44 ++++++++++++++++++++++++++++++++++++++++
 hw/timer/pl031.c         | 25 +----------------------
 MAINTAINERS              |  1 +
 3 files changed, 46 insertions(+), 24 deletions(-)
 create mode 100644 include/hw/timer/pl031.h

diff --git a/include/hw/timer/pl031.h b/include/hw/timer/pl031.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/timer/pl031.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM AMBA PrimeCell PL031 RTC
+ *
+ * Copyright (c) 2007 CodeSourcery
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#ifndef HW_TIMER_PL031
+#define HW_TIMER_PL031
+
+#include "hw/sysbus.h"
+
+#define TYPE_PL031 "pl031"
+#define PL031(obj) OBJECT_CHECK(PL031State, (obj), TYPE_PL031)
+
+typedef struct PL031State {
+    SysBusDevice parent_obj;
+
+    MemoryRegion iomem;
+    QEMUTimer *timer;
+    qemu_irq irq;
+
+    /*
+     * Needed to preserve the tick_count across migration, even if the
+     * absolute value of the rtc_clock is different on the source and
+     * destination.
+     */
+    uint32_t tick_offset_vmstate;
+    uint32_t tick_offset;
+
+    uint32_t mr;
+    uint32_t lr;
+    uint32_t cr;
+    uint32_t im;
+    uint32_t is;
+} PL031State;
+
+#endif
diff --git a/hw/timer/pl031.c b/hw/timer/pl031.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/pl031.c
+++ b/hw/timer/pl031.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/timer/pl031.h"
 #include "hw/sysbus.h"
 #include "qemu/timer.h"
 #include "sysemu/sysemu.h"
@@ -XXX,XX +XXX,XX @@ do { printf("pl031: " fmt , ## __VA_ARGS__); } while (0)
 #define RTC_MIS     0x18    /* Masked interrupt status register */
 #define RTC_ICR     0x1c    /* Interrupt clear register */
 
-#define TYPE_PL031 "pl031"
-#define PL031(obj) OBJECT_CHECK(PL031State, (obj), TYPE_PL031)
-
-typedef struct PL031State {
-    SysBusDevice parent_obj;
-
-    MemoryRegion iomem;
-    QEMUTimer *timer;
-    qemu_irq irq;
-
-    /* Needed to preserve the tick_count across migration, even if the
-     * absolute value of the rtc_clock is different on the source and
-     * destination.
-     */
-    uint32_t tick_offset_vmstate;
-    uint32_t tick_offset;
-
-    uint32_t mr;
-    uint32_t lr;
-    uint32_t cr;
-    uint32_t im;
-    uint32_t is;
-} PL031State;
-
 static const unsigned char pl031_id[] = {
     0x31, 0x10, 0x14, 0x00,         /* Device ID        */
     0x0d, 0xf0, 0x05, 0xb1          /* Cell ID      */
diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: hw/sd/pl181.c
 F: hw/ssi/pl022.c
 F: include/hw/ssi/pl022.h
 F: hw/timer/pl031.c
+F: include/hw/timer/pl031.h
 F: include/hw/arm/primecell.h
 F: hw/timer/cmsdk-apb-timer.c
 F: include/hw/timer/cmsdk-apb-timer.h
-- 
2.20.1

Convert the debug printing in the PL031 device to use trace events,
and augment it to cover the interesting parts of device operation.
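
The general difference between the two approaches can be sketched in
plain C. Everything here is hypothetical (toy_trace_enabled,
toy_trace_set_alarm, toy_trace_demo) and is not what the trace
infrastructure generates. It only contrasts a debug macro that is
compiled out, so its format string and arguments are never checked,
with a typed hook that is always compiled and switched at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Old style: with the macro disabled, the arguments vanish entirely. */
#define TOY_DPRINTF(fmt, ...) do {} while (0)

/* New style (toy version): a typed function gated by a runtime flag. */
static bool toy_trace_enabled;
static int toy_trace_count;

static void toy_trace_set_alarm(uint32_t ticks)
{
    if (toy_trace_enabled) {
        toy_trace_count++;
        fprintf(stderr, "toy_set_alarm: alarm set for %u ticks\n",
                (unsigned)ticks);
    }
}

/* Exercise the hook with tracing off and then on; returns lines emitted. */
static int toy_trace_demo(void)
{
    TOY_DPRINTF("never compiled, never checked: %d\n", "not an int");

    toy_trace_enabled = false;
    toy_trace_set_alarm(5);          /* silent */
    assert(toy_trace_count == 0);

    toy_trace_enabled = true;
    toy_trace_set_alarm(5);          /* emits one line */
    return toy_trace_count;
}
```

Because the typed hook is always compiled, a mismatch between the
format and its argument would be caught at build time rather than only
when someone enables the debug define.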

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 hw/timer/pl031.c      | 55 +++++++++++++++++++++++--------------------
 hw/timer/trace-events |  6 +++++
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/hw/timer/pl031.c b/hw/timer/pl031.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/pl031.c
+++ b/hw/timer/pl031.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/sysemu.h"
 #include "qemu/cutils.h"
 #include "qemu/log.h"
-
-//#define DEBUG_PL031
-
-#ifdef DEBUG_PL031
-#define DPRINTF(fmt, ...) \
-do { printf("pl031: " fmt , ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) do {} while(0)
-#endif
+#include "trace.h"
 
 #define RTC_DR      0x00    /* Data read register */
 #define RTC_MR      0x04    /* Match register */
@@ -XXX,XX +XXX,XX @@ static const unsigned char pl031_id[] = {
 
 static void pl031_update(PL031State *s)
 {
-    qemu_set_irq(s->irq, s->is & s->im);
+    uint32_t flags = s->is & s->im;
+
+    trace_pl031_irq_state(flags);
+    qemu_set_irq(s->irq, flags);
 }
 
 static void pl031_interrupt(void * opaque)
@@ -XXX,XX +XXX,XX @@ static void pl031_interrupt(void * opaque)
     PL031State *s = (PL031State *)opaque;
 
     s->is = 1;
-    DPRINTF("Alarm raised\n");
+    trace_pl031_alarm_raised();
     pl031_update(s);
 }
 
@@ -XXX,XX +XXX,XX @@ static void pl031_set_alarm(PL031State *s)
     /* The timer wraps around.  This subtraction also wraps in the same way,
        and gives correct results when alarm < now_ticks.  */
     ticks = s->mr - pl031_get_count(s);
-    DPRINTF("Alarm set in %ud ticks\n", ticks);
+    trace_pl031_set_alarm(ticks);
     if (ticks == 0) {
         timer_del(s->timer);
         pl031_interrupt(s);
@@ -XXX,XX +XXX,XX @@ static uint64_t pl031_read(void *opaque, hwaddr offset,
                            unsigned size)
 {
     PL031State *s = (PL031State *)opaque;
-
-    if (offset >= 0xfe0  &&  offset < 0x1000)
-        return pl031_id[(offset - 0xfe0) >> 2];
+    uint64_t r;
 
     switch (offset) {
     case RTC_DR:
-        return pl031_get_count(s);
+        r = pl031_get_count(s);
+        break;
     case RTC_MR:
-        return s->mr;
+        r = s->mr;
+        break;
     case RTC_IMSC:
-        return s->im;
+        r = s->im;
+        break;
     case RTC_RIS:
-        return s->is;
+        r = s->is;
+        break;
     case RTC_LR:
-        return s->lr;
+        r = s->lr;
+        break;
     case RTC_CR:
         /* RTC is permanently enabled.  */
-        return 1;
+        r = 1;
+        break;
     case RTC_MIS:
-        return s->is & s->im;
+        r = s->is & s->im;
+        break;
+    case 0xfe0 ... 0xfff:
+        r = pl031_id[(offset - 0xfe0) >> 2];
+        break;
     case RTC_ICR:
         qemu_log_mask(LOG_GUEST_ERROR,
                       "pl031: read of write-only register at offset 0x%x\n",
                       (int)offset);
+        r = 0;
         break;
     default:
         qemu_log_mask(LOG_GUEST_ERROR,
                       "pl031_read: Bad offset 0x%x\n", (int)offset);
+        r = 0;
         break;
     }
 
-    return 0;
+    trace_pl031_read(offset, r);
+    return r;
 }
 
 static void pl031_write(void * opaque, hwaddr offset,
@@ -XXX,XX +XXX,XX @@ static void pl031_write(void * opaque, hwaddr offset,
 {
     PL031State *s = (PL031State *)opaque;
 
+    trace_pl031_write(offset, value);
 
     switch (offset) {
     case RTC_LR:
@@ -XXX,XX +XXX,XX @@ static void pl031_write(void * opaque, hwaddr offset,
         break;
     case RTC_IMSC:
         s->im = value & 1;
-        DPRINTF("Interrupt mask %d\n", s->im);
         pl031_update(s);
         break;
     case RTC_ICR:
@@ -XXX,XX +XXX,XX @@ static void pl031_write(void * opaque, hwaddr offset,
            cleared when bit 0 of the written value is set.  However the
            arm926e documentation (DDI0287B) states that the interrupt is
            cleared when any value is written.  */
-        DPRINTF("Interrupt cleared");
         s->is = 0;
         pl031_update(s);
         break;
diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -XXX,XX +XXX,XX @@ xlnx_zynqmp_rtc_gettime(int year, int month, int day, int hour, int min, int sec
 nrf51_timer_read(uint64_t addr, uint32_t value, unsigned size) "read addr 0x%" PRIx64 " data 0x%" PRIx32 " size %u"
 nrf51_timer_write(uint64_t addr, uint32_t value, unsigned size) "write addr 0x%" PRIx64 " data 0x%" PRIx32 " size %u"
 
+# hw/timer/pl031.c
+pl031_irq_state(int level) "irq state %d"
+pl031_read(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
+pl031_write(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
+pl031_alarm_raised(void) "alarm raised"
+pl031_set_alarm(uint32_t ticks) "alarm set for %u ticks"
-- 
2.20.1

Create a new include file for the pl011's device struct,
type macros, etc, so that it can be instantiated using
the "embedded struct" coding style.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/hw/char/pl011.h | 34 ++++++++++++++++++++++++++++++++++
 hw/char/pl011.c         | 31 ++-----------------------------
 2 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/include/hw/char/pl011.h b/include/hw/char/pl011.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/char/pl011.h
+++ b/include/hw/char/pl011.h
@@ -XXX,XX +XXX,XX @@
 #ifndef HW_PL011_H
 #define HW_PL011_H
 
+#include "hw/sysbus.h"
+#include "chardev/char-fe.h"
+
+#define TYPE_PL011 "pl011"
+#define PL011(obj) OBJECT_CHECK(PL011State, (obj), TYPE_PL011)
+
+/* This shares the same struct (and cast macro) as the base pl011 device */
+#define TYPE_PL011_LUMINARY "pl011_luminary"
+
+typedef struct PL011State {
+    SysBusDevice parent_obj;
+
+    MemoryRegion iomem;
+    uint32_t readbuff;
+    uint32_t flags;
+    uint32_t lcr;
+    uint32_t rsr;
+    uint32_t cr;
+    uint32_t dmacr;
+    uint32_t int_enabled;
+    uint32_t int_level;
+    uint32_t read_fifo[16];
+    uint32_t ilpr;
+    uint32_t ibrd;
+    uint32_t fbrd;
+    uint32_t ifl;
+    int read_pos;
+    int read_count;
+    int read_trigger;
+    CharBackend chr;
+    qemu_irq irq;
+    const unsigned char *id;
+} PL011State;
+
 static inline DeviceState *pl011_create(hwaddr addr,
                                         qemu_irq irq,
                                         Chardev *chr)
diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/char/pl011.h"
 #include "hw/sysbus.h"
 #include "chardev/char-fe.h"
 #include "qemu/log.h"
 #include "trace.h"
 
-#define TYPE_PL011 "pl011"
-#define PL011(obj) OBJECT_CHECK(PL011State, (obj), TYPE_PL011)
-
-typedef struct PL011State {
-    SysBusDevice parent_obj;
-
-    MemoryRegion iomem;
-    uint32_t readbuff;
-    uint32_t flags;
-    uint32_t lcr;
-    uint32_t rsr;
-    uint32_t cr;
-    uint32_t dmacr;
-    uint32_t int_enabled;
-    uint32_t int_level;
-    uint32_t read_fifo[16];
-    uint32_t ilpr;
-    uint32_t ibrd;
-    uint32_t fbrd;
-    uint32_t ifl;
-    int read_pos;
-    int read_count;
-    int read_trigger;
-    CharBackend chr;
-    qemu_irq irq;
-    const unsigned char *id;
-} PL011State;
-
 #define PL011_INT_TX 0x20
 #define PL011_INT_RX 0x10
 
@@ -XXX,XX +XXX,XX @@ static void pl011_luminary_init(Object *obj)
 }
 
 static const TypeInfo pl011_luminary_info = {
-    .name          = "pl011_luminary",
+    .name          = TYPE_PL011_LUMINARY,
     .parent        = TYPE_PL011,
     .instance_init = pl011_luminary_init,
 };
-- 
2.20.1

The PL011 UART has six interrupt lines:
 * RX (receive data)
 * TX (transmit data)
 * RT (receive timeout)
 * MS (modem status)
 * E (errors)
 * combined (logical OR of all the above)

So far we have only emulated the combined interrupt line;
add support for the others, so that boards that wire them
up to different interrupt controller inputs can do so.
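
As a rough standalone illustration of the masking scheme (a sketch, not
the device code itself), each outbound line can be computed from the
masked interrupt status. The bit definitions and mask table mirror the
ones in the diff; pl011_line_level() and PL011_NUM_IRQS are hypothetical
names introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Interrupt status bits in UARTRIS, UARTMIS, UARTIMSC (as in the patch) */
#define INT_OE  (1u << 10)
#define INT_BE  (1u << 9)
#define INT_PE  (1u << 8)
#define INT_FE  (1u << 7)
#define INT_RT  (1u << 6)
#define INT_TX  (1u << 5)
#define INT_RX  (1u << 4)
#define INT_DSR (1u << 3)
#define INT_DCD (1u << 2)
#define INT_CTS (1u << 1)
#define INT_RI  (1u << 0)
#define INT_E   (INT_OE | INT_BE | INT_PE | INT_FE)
#define INT_MS  (INT_RI | INT_DSR | INT_DCD | INT_CTS)

/* Hypothetical name for the number of outbound lines */
enum { PL011_NUM_IRQS = 6 };

/* Which status bits feed each outbound line; index 0 is the combined line */
static const uint32_t irqmask[PL011_NUM_IRQS] = {
    INT_E | INT_MS | INT_RT | INT_TX | INT_RX,
    INT_RX,
    INT_TX,
    INT_RT,
    INT_MS,
    INT_E,
};

/*
 * Hypothetical helper: level of outbound line 'idx', given the raw
 * interrupt status and the enable mask.
 */
static bool pl011_line_level(uint32_t int_level, uint32_t int_enabled,
                             size_t idx)
{
    assert(idx < PL011_NUM_IRQS);
    return ((int_level & int_enabled) & irqmask[idx]) != 0;
}
```

With only INT_RX pending and enabled, lines 0 (combined) and 1 (RX) are
high and the other four stay low.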

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/hw/char/pl011.h |  2 +-
 hw/char/pl011.c         | 46 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/include/hw/char/pl011.h b/include/hw/char/pl011.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/char/pl011.h
+++ b/include/hw/char/pl011.h
@@ -XXX,XX +XXX,XX @@ typedef struct PL011State {
     int read_count;
     int read_trigger;
     CharBackend chr;
-    qemu_irq irq;
+    qemu_irq irq[6];
     const unsigned char *id;
 } PL011State;
 
diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -XXX,XX +XXX,XX @@
  * This code is licensed under the GPL.
  */
 
+/*
+ * QEMU interface:
+ *  + sysbus MMIO region 0: device registers
+ *  + sysbus IRQ 0: UARTINTR (combined interrupt line)
+ *  + sysbus IRQ 1: UARTRXINTR (receive FIFO interrupt line)
+ *  + sysbus IRQ 2: UARTTXINTR (transmit FIFO interrupt line)
+ *  + sysbus IRQ 3: UARTRTINTR (receive timeout interrupt line)
+ *  + sysbus IRQ 4: UARTMSINTR (modem status interrupt line)
+ *  + sysbus IRQ 5: UARTEINTR (error interrupt line)
+ */
+
 #include "qemu/osdep.h"
 #include "hw/char/pl011.h"
 #include "hw/sysbus.h"
@@ -XXX,XX +XXX,XX @@
 #define PL011_FLAG_TXFF 0x20
 #define PL011_FLAG_RXFE 0x10
 
+/* Interrupt status bits in UARTRIS, UARTMIS, UARTIMSC */
+#define INT_OE (1 << 10)
+#define INT_BE (1 << 9)
+#define INT_PE (1 << 8)
+#define INT_FE (1 << 7)
+#define INT_RT (1 << 6)
+#define INT_TX (1 << 5)
+#define INT_RX (1 << 4)
+#define INT_DSR (1 << 3)
+#define INT_DCD (1 << 2)
+#define INT_CTS (1 << 1)
+#define INT_RI (1 << 0)
+#define INT_E (INT_OE | INT_BE | INT_PE | INT_FE)
+#define INT_MS (INT_RI | INT_DSR | INT_DCD | INT_CTS)
+
 static const unsigned char pl011_id_arm[8] =
   { 0x11, 0x10, 0x14, 0x00, 0x0d, 0xf0, 0x05, 0xb1 };
 static const unsigned char pl011_id_luminary[8] =
   { 0x11, 0x00, 0x18, 0x01, 0x0d, 0xf0, 0x05, 0xb1 };
 
+/* Which bits in the interrupt status matter for each outbound IRQ line? */
+static const uint32_t irqmask[] = {
+    INT_E | INT_MS | INT_RT | INT_TX | INT_RX, /* combined IRQ */
+    INT_RX,
+    INT_TX,
+    INT_RT,
+    INT_MS,
+    INT_E,
+};
+
 static void pl011_update(PL011State *s)
 {
     uint32_t flags;
+    int i;
 
     flags = s->int_level & s->int_enabled;
     trace_pl011_irq_state(flags != 0);
-    qemu_set_irq(s->irq, flags != 0);
+    for (i = 0; i < ARRAY_SIZE(s->irq); i++) {
+        qemu_set_irq(s->irq[i], (flags & irqmask[i]) != 0);
+    }
 }
 
 static uint64_t pl011_read(void *opaque, hwaddr offset,
@@ -XXX,XX +XXX,XX @@ static void pl011_init(Object *obj)
 {
     SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
     PL011State *s = PL011(obj);
+    int i;
 
     memory_region_init_io(&s->iomem, OBJECT(s), &pl011_ops, s, "pl011", 0x1000);
     sysbus_init_mmio(sbd, &s->iomem);
-    sysbus_init_irq(sbd, &s->irq);
+    for (i = 0; i < ARRAY_SIZE(s->irq); i++) {
+        sysbus_init_irq(sbd, &s->irq[i]);
+    }
 
     s->read_trigger = 1;
     s->ifl = 0x12;
-- 
2.20.1

The pl011 logs when the guest makes a bad access. It prints
the address offset in hex but confusingly omits the '0x'
prefix; add it.

diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -XXX,XX +XXX,XX @@ static uint64_t pl011_read(void *opaque, hwaddr offset,
         break;
     default:
         qemu_log_mask(LOG_GUEST_ERROR,
-                      "pl011_read: Bad offset %x\n", (int)offset);
+                      "pl011_read: Bad offset 0x%x\n", (int)offset);
         r = 0;
         break;
     }
@@ -XXX,XX +XXX,XX @@ static void pl011_write(void *opaque, hwaddr offset,
         break;
     default:
         qemu_log_mask(LOG_GUEST_ERROR,
-                      "pl011_write: Bad offset %x\n", (int)offset);
+                      "pl011_write: Bad offset 0x%x\n", (int)offset);
     }
 }
 
-- 
2.20.1

In commit 4b635cf7a95e501211 we added a QOM property to the ARMSSE
object, but forgot to add it to the documentation comment in the
header. Correct the omission.

Fixes: 4b635cf7a95e501211 ("hw/arm/armsse: Make SRAM bank size configurable")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/hw/arm/armsse.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/arm/armsse.h b/include/hw/arm/armsse.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/armsse.h
+++ b/include/hw/arm/armsse.h
@@ -XXX,XX +XXX,XX @@
  *    being the same for both, to avoid having to have separate Property
  *    lists for different variants. This restriction can be relaxed later
  *    if necessary.)
+ *  + QOM property "SRAM_ADDR_WIDTH" sets the number of bits used for the
+ *    address of each SRAM bank (and thus the total amount of internal SRAM)
  *  + Named GPIO inputs "EXP_IRQ" 0..n are the expansion interrupts for CPU 0,
  *    which are wired to its NVIC lines 32 .. n+32
  *  + Named GPIO inputs "EXP_CPU1_IRQ" 0..n are the expansion interrupts for
-- 
2.20.1

The Musca boards have DAPLink firmware that sets the initial
secure VTOR value (the location of the vector table) differently
depending on the boot mode (from flash, from RAM, etc). Export
init-svtor as a QOM property of the ARMSSE object so that
the board can change it.
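
To show why the initial VTOR value matters, here is a minimal standalone
sketch of the M-profile reset fetch: the initial SP comes from the word
at the vector table base and the initial PC from the word after it. The
ResetState type, ldl_le() and m_profile_reset() are hypothetical names
for this example and are not QEMU APIs; the memory is modelled as a
plain little-endian byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two values fetched on reset */
typedef struct ResetState {
    uint32_t sp;
    uint32_t pc;
} ResetState;

/* Read a little-endian 32-bit word from a byte buffer */
static uint32_t ldl_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * 'mem' models guest memory starting at guest address 'mem_base'.
 * Returns the raw vector table entries 0 (SP) and 1 (reset vector).
 * Handling of the Thumb bit in the reset vector is not modelled.
 */
static ResetState m_profile_reset(const uint8_t *mem, size_t mem_size,
                                  uint32_t mem_base, uint32_t init_svtor)
{
    size_t off = init_svtor - mem_base;

    assert(init_svtor >= mem_base && mem_size >= 8 && off <= mem_size - 8);

    ResetState rs = {
        .sp = ldl_le(mem + off),
        .pc = ldl_le(mem + off + 4),
    };
    return rs;
}
```

If the board's init-svtor does not match where the firmware image places
its vector table, the CPU starts with a bogus SP and PC, which is why the
value has to be settable per board.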

diff --git a/include/hw/arm/armsse.h b/include/hw/arm/armsse.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/armsse.h
+++ b/include/hw/arm/armsse.h
@@ -XXX,XX +XXX,XX @@
  *    if necessary.)
  *  + QOM property "SRAM_ADDR_WIDTH" sets the number of bits used for the
  *    address of each SRAM bank (and thus the total amount of internal SRAM)
+ *  + QOM property "init-svtor" sets the initial value of the CPU SVTOR register
+ *    (where it expects to load the PC and SP from the vector table on reset)
  *  + Named GPIO inputs "EXP_IRQ" 0..n are the expansion interrupts for CPU 0,
  *    which are wired to its NVIC lines 32 .. n+32
  *  + Named GPIO inputs "EXP_CPU1_IRQ" 0..n are the expansion interrupts for
@@ -XXX,XX +XXX,XX @@ typedef struct ARMSSE {
     uint32_t exp_numirq;
     uint32_t mainclk_frq;
     uint32_t sram_addr_width;
+    uint32_t init_svtor;
 } ARMSSE;
 
 typedef struct ARMSSEInfo ARMSSEInfo;
diff --git a/hw/arm/armsse.c b/hw/arm/armsse.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/armsse.c
+++ b/hw/arm/armsse.c
@@ -XXX,XX +XXX,XX @@ static void armsse_realize(DeviceState *dev, Error **errp)
          * the INITSVTOR* registers before powering up the CPUs in any case,
          * so the hardware's default value doesn't matter. QEMU doesn't emulate
          * the control processor, so instead we behave in the way that the
-         * firmware does. All boards currently known about have firmware that
-         * sets the INITSVTOR0 and INITSVTOR1 registers to 0x10000000, like the
-         * IoTKit default. We can make this more configurable if necessary.
+         * firmware does. The initial value is configurable by the board code
+         * to match whatever its firmware does.
          */
-        qdev_prop_set_uint32(cpudev, "init-svtor", 0x10000000);
+        qdev_prop_set_uint32(cpudev, "init-svtor", s->init_svtor);
         /*
          * Start all CPUs except CPU0 powered down. In real hardware it is
          * a configurable property of the SSE-200 which CPUs start powered up
@@ -XXX,XX +XXX,XX @@ static Property armsse_properties[] = {
     DEFINE_PROP_UINT32("EXP_NUMIRQ", ARMSSE, exp_numirq, 64),
     DEFINE_PROP_UINT32("MAINCLK", ARMSSE, mainclk_frq, 0),
     DEFINE_PROP_UINT32("SRAM_ADDR_WIDTH", ARMSSE, sram_addr_width, 15),
+    DEFINE_PROP_UINT32("init-svtor", ARMSSE, init_svtor, 0x10000000),
     DEFINE_PROP_END_OF_LIST()
 };
 
-- 
2.20.1

The Musca-A and Musca-B1 development boards are based on the
SSE-200 subsystem for embedded. Implement an initial skeleton
model of these boards, which are similar but not identical.

This commit creates the board model with the SSE and the IRQ
splitters to wire IRQs up to its two CPUs. As yet there
are no devices and no memory: these will be added later.
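
The splitter arrangement can be pictured with a small standalone sketch:
one board-level interrupt input fans out to the same expansion IRQ number
on both CPUs. The Cpu and Splitter types and the two functions below are
hypothetical stand-ins for this example, not the QEMU SplitIRQ device.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM_CPUS = 2, NUM_IRQS = 96 };

/* Hypothetical CPU model: just the state of its expansion IRQ inputs */
typedef struct Cpu {
    bool exp_irq[NUM_IRQS];
} Cpu;

/* Hypothetical splitter: one input, one output per CPU */
typedef struct Splitter {
    bool *out[NUM_CPUS];
} Splitter;

/* Wire splitter i's outputs to expansion IRQ i of every CPU */
static void splitters_connect(Splitter *sp, Cpu *cpus, int num_irqs)
{
    assert(num_irqs <= NUM_IRQS);
    for (int i = 0; i < num_irqs; i++) {
        for (int c = 0; c < NUM_CPUS; c++) {
            sp[i].out[c] = &cpus[c].exp_irq[i];
        }
    }
}

/* Drive the splitter input: every output follows the same level */
static void splitter_set(Splitter *s, bool level)
{
    for (int c = 0; c < NUM_CPUS; c++) {
        *s->out[c] = level;
    }
}
```

A device added later only needs to raise one splitter input for both
CPUs to see the interrupt.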

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 hw/arm/Makefile.objs            |   1 +
 hw/arm/musca.c                  | 197 ++++++++++++++++++++++++++++++++
 MAINTAINERS                     |   6 +
 default-configs/arm-softmmu.mak |   1 +
 4 files changed, 205 insertions(+)
 create mode 100644 hw/arm/musca.c

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
 obj-$(CONFIG_MPS2) += mps2.o
 obj-$(CONFIG_MPS2) += mps2-tz.o
 obj-$(CONFIG_MSF2) += msf2-soc.o msf2-som.o
+obj-$(CONFIG_MUSCA) += musca.o
 obj-$(CONFIG_ARMSSE) += armsse.o
 obj-$(CONFIG_FSL_IMX7) += fsl-imx7.o mcimx7d-sabre.o
 obj-$(CONFIG_ARM_SMMUV3) += smmu-common.o smmuv3.o
diff --git a/hw/arm/musca.c b/hw/arm/musca.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/arm/musca.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Arm Musca-A and Musca-B1 test chip board emulation
+ *
+ * Copyright (c) 2019 Linaro Limited
+ * Written by Peter Maydell
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 or
+ *  (at your option) any later version.
+ */
+
+/*
+ * The Musca boards are a reference implementation of a system using
+ * the SSE-200 subsystem for embedded:
+ * https://developer.arm.com/products/system-design/development-boards/iot-test-chips-and-boards/musca-a-test-chip-board
+ * https://developer.arm.com/products/system-design/development-boards/iot-test-chips-and-boards/musca-b-test-chip-board
+ * We model the A and B1 variants of this board, as described in the TRMs:
+ * http://infocenter.arm.com/help/topic/com.arm.doc.101107_0000_00_en/index.html
+ * http://infocenter.arm.com/help/topic/com.arm.doc.101312_0000_00_en/index.html
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "exec/address-spaces.h"
+#include "hw/arm/arm.h"
+#include "hw/arm/armsse.h"
+#include "hw/boards.h"
+#include "hw/core/split-irq.h"
+
+#define MUSCA_NUMIRQ_MAX 96
+
+typedef enum MuscaType {
+    MUSCA_A,
+    MUSCA_B1,
+} MuscaType;
+
+typedef struct {
+    MachineClass parent;
+    MuscaType type;
+    uint32_t init_svtor;
+    int sram_addr_width;
+    int num_irqs;
+} MuscaMachineClass;
+
+typedef struct {
+    MachineState parent;
+
+    ARMSSE sse;
+    SplitIRQ cpu_irq_splitter[MUSCA_NUMIRQ_MAX];
+} MuscaMachineState;
+
+#define TYPE_MUSCA_MACHINE "musca"
+#define TYPE_MUSCA_A_MACHINE MACHINE_TYPE_NAME("musca-a")
+#define TYPE_MUSCA_B1_MACHINE MACHINE_TYPE_NAME("musca-b1")
+
+#define MUSCA_MACHINE(obj) \
+    OBJECT_CHECK(MuscaMachineState, obj, TYPE_MUSCA_MACHINE)
+#define MUSCA_MACHINE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(MuscaMachineClass, obj, TYPE_MUSCA_MACHINE)
+#define MUSCA_MACHINE_CLASS(klass) \
+    OBJECT_CLASS_CHECK(MuscaMachineClass, klass, TYPE_MUSCA_MACHINE)
+
+/*
+ * Main SYSCLK frequency in Hz
+ * TODO this should really be different for the two cores, but we
+ * don't model that in our SSE-200 model yet.
+ */
+#define SYSCLK_FRQ 40000000
+
+static void musca_init(MachineState *machine)
+{
+    MuscaMachineState *mms = MUSCA_MACHINE(machine);
+    MuscaMachineClass *mmc = MUSCA_MACHINE_GET_CLASS(mms);
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+    MemoryRegion *system_memory = get_system_memory();
+    DeviceState *ssedev;
+    int i;
+
+    assert(mmc->num_irqs <= MUSCA_NUMIRQ_MAX);
+
+    if (strcmp(machine->cpu_type, mc->default_cpu_type) != 0) {
+        error_report("This board can only be used with CPU %s",
+                     mc->default_cpu_type);
+        exit(1);
+    }
+
+    sysbus_init_child_obj(OBJECT(machine), "sse-200", &mms->sse,
+                          sizeof(mms->sse), TYPE_SSE200);
+    ssedev = DEVICE(&mms->sse);
+    object_property_set_link(OBJECT(&mms->sse), OBJECT(system_memory),
+                             "memory", &error_fatal);
+    qdev_prop_set_uint32(ssedev, "EXP_NUMIRQ", mmc->num_irqs);
+    qdev_prop_set_uint32(ssedev, "init-svtor", mmc->init_svtor);
+    qdev_prop_set_uint32(ssedev, "SRAM_ADDR_WIDTH", mmc->sram_addr_width);
+    qdev_prop_set_uint32(ssedev, "MAINCLK", SYSCLK_FRQ);
+    object_property_set_bool(OBJECT(&mms->sse), true, "realized",
+                             &error_fatal);
+
+    /*
+     * We need to create splitters to feed the IRQ inputs
+     * for each CPU in the SSE-200 from each device in the board.
+     */
+    for (i = 0; i < mmc->num_irqs; i++) {
+        char *name = g_strdup_printf("musca-irq-splitter%d", i);
+        SplitIRQ *splitter = &mms->cpu_irq_splitter[i];
+
+        object_initialize_child(OBJECT(machine), name,
+                                splitter, sizeof(*splitter),
+                                TYPE_SPLIT_IRQ, &error_fatal, NULL);
+        g_free(name);
+
+        object_property_set_int(OBJECT(splitter), 2, "num-lines",
+                                &error_fatal);
+        object_property_set_bool(OBJECT(splitter), true, "realized",
+                                 &error_fatal);
+        qdev_connect_gpio_out(DEVICE(splitter), 0,
+                              qdev_get_gpio_in_named(ssedev, "EXP_IRQ", i));
+        qdev_connect_gpio_out(DEVICE(splitter), 1,
+                              qdev_get_gpio_in_named(ssedev,
+                                                     "EXP_CPU1_IRQ", i));
+    }
+
+    armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename, 0x2000000);
+}
+
+static void musca_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    mc->default_cpus = 2;
+    mc->min_cpus = mc->default_cpus;
+    mc->max_cpus = mc->default_cpus;
+    mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-m33");
+    mc->init = musca_init;
+}
+
+static void musca_a_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+    MuscaMachineClass *mmc = MUSCA_MACHINE_CLASS(oc);
+
+    mc->desc = "ARM Musca-A board (dual Cortex-M33)";
+    mmc->type = MUSCA_A;
+    mmc->init_svtor = 0x10200000;
+    mmc->sram_addr_width = 15;
+    mmc->num_irqs = 64;
+}
+
+static void musca_b1_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+    MuscaMachineClass *mmc = MUSCA_MACHINE_CLASS(oc);
+
+    mc->desc = "ARM Musca-B1 board (dual Cortex-M33)";
+    mmc->type = MUSCA_B1;
+    /*
+     * This matches the DAPLink firmware which boots from QSPI. There
+     * is also a firmware blob which boots from the eFlash, which
+     * uses init_svtor = 0x1A000000. QEMU doesn't currently support that,
+     * though we could in theory expose a machine property on the command
+     * line to allow the user to request eFlash boot.
+     */
+    mmc->init_svtor = 0x10000000;
+    mmc->sram_addr_width = 17;
+    mmc->num_irqs = 96;
+}
+
+static const TypeInfo musca_info = {
+    .name = TYPE_MUSCA_MACHINE,
+    .parent = TYPE_MACHINE,
+    .abstract = true,
+    .instance_size = sizeof(MuscaMachineState),
+    .class_size = sizeof(MuscaMachineClass),
+    .class_init = musca_class_init,
+};
+
+static const TypeInfo musca_a_info = {
+    .name = TYPE_MUSCA_A_MACHINE,
+    .parent = TYPE_MUSCA_MACHINE,
+    .class_init = musca_a_class_init,
+};
+
+static const TypeInfo musca_b1_info = {
+    .name = TYPE_MUSCA_B1_MACHINE,
+    .parent = TYPE_MUSCA_MACHINE,
+    .class_init = musca_b1_class_init,
+};
+
+static void musca_machine_init(void)
+{
+    type_register_static(&musca_info);
+    type_register_static(&musca_a_info);
+    type_register_static(&musca_b1_info);
+}
+
+type_init(musca_machine_init);
diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: include/hw/misc/iotkit-sysinfo.h
 F: hw/misc/armsse-cpuid.c
 F: include/hw/misc/armsse-cpuid.h
 
+Musca
+M: Peter Maydell <peter.maydell@linaro.org>
+L: qemu-arm@nongnu.org
+S: Maintained
+F: hw/arm/musca.c
+
 Musicpal
 M: Jan Kiszka <jan.kiszka@web.de>
 M: Peter Maydell <peter.maydell@linaro.org>
diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index XXXXXXX..XXXXXXX 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -XXX,XX +XXX,XX @@ CONFIG_TUSB6010=y
 CONFIG_IMX=y
 CONFIG_MAINSTONE=y
 CONFIG_MPS2=y
+CONFIG_MUSCA=y
 CONFIG_NSERIES=y
 CONFIG_RASPI=y
 CONFIG_REALVIEW=y
-- 
2.20.1

Many of the devices on the Musca board live behind TrustZone
Peripheral Protection Controllers (PPCs); add models of the
PPCs, using a similar scheme to the MPS2 board models.
This commit wires up the PPCs with "unimplemented device"
stubs behind them in the correct places in the address map.
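
The table-driven layout can be sketched in isolation as follows: each PPC
has a fixed-size array of port descriptions, and a port with no
device-creation function is treated as unused and skipped. This is a
simplified standalone model, assuming 16 ports per PPC; PortInfo,
make_stub(), demo_ports and populate_ports() are hypothetical names, and
port 2 is left empty here purely to show the skipping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_PORTS = 16 };    /* assumed number of ports per PPC */

/* Hypothetical stand-in for a device-creation callback */
typedef uint64_t MakeDevFn(const char *name, uint64_t size);

typedef struct PortInfo {
    const char *name;
    MakeDevFn *devfn;       /* NULL marks an unused port */
    uint64_t addr;
    uint64_t size;
} PortInfo;

/* Stub "device": does nothing except report the size it was given */
static uint64_t make_stub(const char *name, uint64_t size)
{
    (void)name;
    return size;
}

/* Entries not listed are zero-initialized, so their devfn is NULL */
static const PortInfo demo_ports[NUM_PORTS] = {
    [0] = { "pwm0", make_stub, 0x40101000, 0x1000 },
    [1] = { "pwm1", make_stub, 0x40102000, 0x1000 },
    [3] = { "i2s",  make_stub, 0x40104000, 0x1000 },
};

/*
 * Walk the table, create the device behind each populated port and
 * record the port numbers used. Returns how many ports were populated.
 */
static size_t populate_ports(const PortInfo *ports, int *used,
                             size_t max_used)
{
    size_t n = 0;

    for (int port = 0; port < NUM_PORTS; port++) {
        const PortInfo *p = &ports[port];

        if (!p->devfn) {
            continue;
        }
        assert(n < max_used);
        p->devfn(p->name, p->size);
        used[n++] = port;
    }
    return n;
}
```

Keeping the port number implicit in the array index means the table
reads like the hardware's port assignment, with gaps shown as empty
entries.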

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 hw/arm/musca.c | 289 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 289 insertions(+)

diff --git a/hw/arm/musca.c b/hw/arm/musca.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/musca.c
+++ b/hw/arm/musca.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/arm/armsse.h"
 #include "hw/boards.h"
 #include "hw/core/split-irq.h"
+#include "hw/misc/tz-ppc.h"
+#include "hw/misc/unimp.h"
 
 #define MUSCA_NUMIRQ_MAX 96
+#define MUSCA_PPC_MAX 3
 
 typedef enum MuscaType {
     MUSCA_A,
@@ -XXX,XX +XXX,XX @@ typedef struct {
 
     ARMSSE sse;
     SplitIRQ cpu_irq_splitter[MUSCA_NUMIRQ_MAX];
+    SplitIRQ sec_resp_splitter;
+    TZPPC ppc[MUSCA_PPC_MAX];
+    MemoryRegion container;
+    UnimplementedDeviceState eflash[2];
+    UnimplementedDeviceState qspi;
+    UnimplementedDeviceState mpc[5];
+    UnimplementedDeviceState mhu[2];
+    UnimplementedDeviceState pwm[3];
+    UnimplementedDeviceState i2s;
+    UnimplementedDeviceState uart[2];
+    UnimplementedDeviceState i2c[2];
+    UnimplementedDeviceState spi;
+    UnimplementedDeviceState scc;
+    UnimplementedDeviceState timer;
+    UnimplementedDeviceState rtc;
+    UnimplementedDeviceState pvt;
+    UnimplementedDeviceState sdio;
+    UnimplementedDeviceState gpio;
 } MuscaMachineState;
 
 #define TYPE_MUSCA_MACHINE "musca"
@@ -XXX,XX +XXX,XX @@ typedef struct {
  */
 #define SYSCLK_FRQ 40000000
 
+/*
+ * Most of the devices in the Musca board sit behind Peripheral Protection
+ * Controllers. These data structures define the layout of which devices
+ * sit behind which PPCs.
+ * The devfn for each port is a function which creates, configures
+ * and initializes the device, returning the MemoryRegion which
+ * needs to be plugged into the downstream end of the PPC port.
+ */
+typedef MemoryRegion *MakeDevFn(MuscaMachineState *mms, void *opaque,
+                                const char *name, hwaddr size);
+
+typedef struct PPCPortInfo {
+    const char *name;
+    MakeDevFn *devfn;
+    void *opaque;
+    hwaddr addr;
+    hwaddr size;
+} PPCPortInfo;
+
+typedef struct PPCInfo {
+    const char *name;
+    PPCPortInfo ports[TZ_NUM_PORTS];
+} PPCInfo;
+
+static MemoryRegion *make_unimp_dev(MuscaMachineState *mms,
+                                    void *opaque, const char *name, hwaddr size)
+{
+    /*
+     * Initialize, configure and realize a TYPE_UNIMPLEMENTED_DEVICE,
+     * and return a pointer to its MemoryRegion.
+     */
+    UnimplementedDeviceState *uds = opaque;
+
+    sysbus_init_child_obj(OBJECT(mms), name, uds,
+                          sizeof(UnimplementedDeviceState),
+                          TYPE_UNIMPLEMENTED_DEVICE);
+    qdev_prop_set_string(DEVICE(uds), "name", name);
+    qdev_prop_set_uint64(DEVICE(uds), "size", size);
+    object_property_set_bool(OBJECT(uds), true, "realized", &error_fatal);
+    return sysbus_mmio_get_region(SYS_BUS_DEVICE(uds), 0);
+}
+
+static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
+                                       const char *name, hwaddr size)
+{
+    /*
+     * Create the container MemoryRegion for all the devices that live
+     * behind the Musca-A PPC's single port. These devices don't have a PPC
+     * port each, but we use the PPCPortInfo struct as a convenient way
+     * to describe them. Note that addresses here are relative to the base
+     * address of the PPC port region: 0x40100000, and devices appear both
+     * at the 0x4... NS region and the 0x5... S region.
+     */
+    int i;
+    MemoryRegion *container = &mms->container;
+
+    const PPCPortInfo devices[] = {
+        { "uart0", make_unimp_dev, &mms->uart[0], 0x1000, 0x1000 },
+        { "uart1", make_unimp_dev, &mms->uart[1], 0x2000, 0x1000 },
+        { "spi", make_unimp_dev, &mms->spi, 0x3000, 0x1000 },
+        { "i2c0", make_unimp_dev, &mms->i2c[0], 0x4000, 0x1000 },
+        { "i2c1", make_unimp_dev, &mms->i2c[1], 0x5000, 0x1000 },
+        { "i2s", make_unimp_dev, &mms->i2s, 0x6000, 0x1000 },
+        { "pwm0", make_unimp_dev, &mms->pwm[0], 0x7000, 0x1000 },
+        { "rtc", make_unimp_dev, &mms->rtc, 0x8000, 0x1000 },
+        { "qspi", make_unimp_dev, &mms->qspi, 0xa000, 0x1000 },
+        { "timer", make_unimp_dev, &mms->timer, 0xb000, 0x1000 },
+        { "scc", make_unimp_dev, &mms->scc, 0xc000, 0x1000 },
+        { "pwm1", make_unimp_dev, &mms->pwm[1], 0xe000, 0x1000 },
+        { "pwm2", make_unimp_dev, &mms->pwm[2], 0xf000, 0x1000 },
+        { "gpio", make_unimp_dev, &mms->gpio, 0x10000, 0x1000 },
+        { "mpc0", make_unimp_dev, &mms->mpc[0], 0x12000, 0x1000 },
+        { "mpc1", make_unimp_dev, &mms->mpc[1], 0x13000, 0x1000 },
+    };
+
+    memory_region_init(container, OBJECT(mms), "musca-device-container", size);
+
+    for (i = 0; i < ARRAY_SIZE(devices); i++) {
+        const PPCPortInfo *pinfo = &devices[i];
+        MemoryRegion *mr;
+
+        mr = pinfo->devfn(mms, pinfo->opaque, pinfo->name, pinfo->size);
+        memory_region_add_subregion(container, pinfo->addr, mr);
+    }
+
+    return &mms->container;
+}
+
 static void musca_init(MachineState *machine)
 {
     MuscaMachineState *mms = MUSCA_MACHINE(machine);
@@ -XXX,XX +XXX,XX @@ static void musca_init(MachineState *machine)
     MachineClass *mc = MACHINE_GET_CLASS(machine);
     MemoryRegion *system_memory = get_system_memory();
     DeviceState *ssedev;
+    DeviceState *dev_splitter;
+    const PPCInfo *ppcs;
+    int num_ppcs;
     int i;
 
     assert(mmc->num_irqs <= MUSCA_NUMIRQ_MAX);
@@ -XXX,XX +XXX,XX @@ static void musca_init(MachineState *machine)
                                                      "EXP_CPU1_IRQ", i));
     }
 
+    /*
+     * The sec_resp_cfg output from the SSE-200 must be split into multiple
+     * lines, one for each of the PPCs we create here.
+     */
+    object_initialize(&mms->sec_resp_splitter, sizeof(mms->sec_resp_splitter),
+                      TYPE_SPLIT_IRQ);
+    object_property_add_child(OBJECT(machine), "sec-resp-splitter",
+                              OBJECT(&mms->sec_resp_splitter), &error_fatal);
+    object_property_set_int(OBJECT(&mms->sec_resp_splitter),
+                            ARRAY_SIZE(mms->ppc), "num-lines", &error_fatal);
+    object_property_set_bool(OBJECT(&mms->sec_resp_splitter), true,
+                             "realized", &error_fatal);
+    dev_splitter = DEVICE(&mms->sec_resp_splitter);
+    qdev_connect_gpio_out_named(ssedev, "sec_resp_cfg", 0,
+                                qdev_get_gpio_in(dev_splitter, 0));
+
+    /*
+     * Most of the devices in the board are behind Peripheral Protection
+     * Controllers. The required order for initializing things is:
+     *  + initialize the PPC
+     *  + initialize, configure and realize downstream devices
+     *  + connect downstream device MemoryRegions to the PPC
+     *  + realize the PPC
+     *  + map the PPC's MemoryRegions to the places in the address map
+     *    where the downstream devices should appear
+     *  + wire up the PPC's control lines to the SSE object
+     *
+     * The PPC mapping differs for the -A and -B1 variants; the -A version
+     * is much simpler, using only a single port of a single PPC and putting
+     * all the devices behind that.
+     */
+    const PPCInfo a_ppcs[] = { {
+            .name = "ahb_ppcexp0",
+            .ports = {
+                { "musca-devices", make_musca_a_devs, 0, 0x40100000, 0x100000 },
+            },
+        },
+    };
+
+    /*
+     * Devices listed with an 0x4.. address appear in both the NS 0x4.. region
+     * and the 0x5.. S region. Devices listed with an 0x5.. address appear
+     * only in the S region.
+     */
+    const PPCInfo b1_ppcs[] = { {
+            .name = "apb_ppcexp0",
+            .ports = {
+                { "eflash0", make_unimp_dev, &mms->eflash[0],
+                  0x52400000, 0x1000 },
+                { "eflash1", make_unimp_dev, &mms->eflash[1],
+                  0x52500000, 0x1000 },
+                { "qspi", make_unimp_dev, &mms->qspi, 0x42800000, 0x100000 },
+                { "mpc0", make_unimp_dev, &mms->mpc[0], 0x52000000, 0x1000 },
+                { "mpc1", make_unimp_dev, &mms->mpc[1], 0x52100000, 0x1000 },
+                { "mpc2", make_unimp_dev, &mms->mpc[2], 0x52200000, 0x1000 },
+                { "mpc3", make_unimp_dev, &mms->mpc[3], 0x52300000, 0x1000 },
+                { "mhu0", make_unimp_dev, &mms->mhu[0], 0x42600000, 0x100000 },
+                { "mhu1", make_unimp_dev, &mms->mhu[1], 0x42700000, 0x100000 },
+                { }, /* port 9: unused */
+                { }, /* port 10: unused */
+                { }, /* port 11: unused */
+                { }, /* port 12: unused */
+                { }, /* port 13: unused */
+                { "mpc4", make_unimp_dev, &mms->mpc[4], 0x52e00000, 0x1000 },
+            },
+        }, {
+            .name = "apb_ppcexp1",
+            .ports = {
+                { "pwm0", make_unimp_dev, &mms->pwm[0], 0x40101000, 0x1000 },
+                { "pwm1", make_unimp_dev, &mms->pwm[1], 0x40102000, 0x1000 },
+                { "pwm2", make_unimp_dev, &mms->pwm[2], 0x40103000, 0x1000 },
+                { "i2s", make_unimp_dev, &mms->i2s, 0x40104000, 0x1000 },
+                { "uart0", make_unimp_dev, &mms->uart[0], 0x40105000, 0x1000 },
+                { "uart1", make_unimp_dev, &mms->uart[1], 0x40106000, 0x1000 },
+                { "i2c0", make_unimp_dev, &mms->i2c[0], 0x40108000, 0x1000 },
+                { "i2c1", make_unimp_dev, &mms->i2c[1], 0x40109000, 0x1000 },
+                { "spi", make_unimp_dev, &mms->spi, 0x4010a000, 0x1000 },
+                { "scc", make_unimp_dev, &mms->scc, 0x5010b000, 0x1000 },
+                { "timer", make_unimp_dev, &mms->timer, 0x4010c000, 0x1000 },
+                { "rtc", make_unimp_dev, &mms->rtc, 0x4010d000, 0x1000 },
+                { "pvt", make_unimp_dev, &mms->pvt, 0x4010e000, 0x1000 },
+                { "sdio", make_unimp_dev, &mms->sdio, 0x4010f000, 0x1000 },
+            },
+        }, {
+            .name = "ahb_ppcexp0",
+            .ports = {
+                { }, /* port 0: unused */
+                { "gpio", make_unimp_dev, &mms->gpio, 0x41000000, 0x1000 },
+            },
+        },
+    };
+
+    switch (mmc->type) {
+    case MUSCA_A:
+        ppcs = a_ppcs;
+        num_ppcs = ARRAY_SIZE(a_ppcs);
+        break;
+    case MUSCA_B1:
+        ppcs = b1_ppcs;
+        num_ppcs = ARRAY_SIZE(b1_ppcs);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+    assert(num_ppcs <= MUSCA_PPC_MAX);
+
+    for (i = 0; i < num_ppcs; i++) {
+        const PPCInfo *ppcinfo = &ppcs[i];
+        TZPPC *ppc = &mms->ppc[i];
+        DeviceState *ppcdev;
+        int port;
+        char *gpioname;
+
+        sysbus_init_child_obj(OBJECT(machine), ppcinfo->name, ppc,
+                              sizeof(TZPPC), TYPE_TZ_PPC);
+        ppcdev = DEVICE(ppc);
+
+        for (port = 0; port < TZ_NUM_PORTS; port++) {
+            const PPCPortInfo *pinfo = &ppcinfo->ports[port];
+            MemoryRegion *mr;
+            char *portname;
+
+            if (!pinfo->devfn) {
+                continue;
+            }
+
+            mr = pinfo->devfn(mms, pinfo->opaque, pinfo->name, pinfo->size);
+            portname = g_strdup_printf("port[%d]", port);
+            object_property_set_link(OBJECT(ppc), OBJECT(mr),
+                                     portname, &error_fatal);
+            g_free(portname);
+        }
+
+        object_property_set_bool(OBJECT(ppc), true, "realized", &error_fatal);
+
+        for (port = 0; port < TZ_NUM_PORTS; port++) {
+            const PPCPortInfo *pinfo = &ppcinfo->ports[port];
+
+            if (!pinfo->devfn) {
+                continue;
+            }
+            sysbus_mmio_map(SYS_BUS_DEVICE(ppc), port, pinfo->addr);
+
+            gpioname = g_strdup_printf("%s_nonsec", ppcinfo->name);
+            qdev_connect_gpio_out_named(ssedev, gpioname, port,
+                                        qdev_get_gpio_in_named(ppcdev,
+                                                               "cfg_nonsec",
+                                                               port));
+            g_free(gpioname);
+            gpioname = g_strdup_printf("%s_ap", ppcinfo->name);
+            qdev_connect_gpio_out_named(ssedev, gpioname, port,
+                                        qdev_get_gpio_in_named(ppcdev,
+                                                               "cfg_ap", port));
+            g_free(gpioname);
+        }
+
+        gpioname = g_strdup_printf("%s_irq_enable", ppcinfo->name);
+        qdev_connect_gpio_out_named(ssedev, gpioname, 0,
+                                    qdev_get_gpio_in_named(ppcdev,
+                                                           "irq_enable", 0));
+        g_free(gpioname);
+        gpioname = g_strdup_printf("%s_irq_clear", ppcinfo->name);
+        qdev_connect_gpio_out_named(ssedev, gpioname, 0,
+                                    qdev_get_gpio_in_named(ppcdev,
+                                                           "irq_clear", 0));
+        g_free(gpioname);
+        gpioname = g_strdup_printf("%s_irq_status", ppcinfo->name);
+        qdev_connect_gpio_out_named(ppcdev, "irq", 0,
+                                    qdev_get_gpio_in_named(ssedev,
+                                                           gpioname, 0));
+        g_free(gpioname);
+
+        qdev_connect_gpio_out(dev_splitter, i,
+                              qdev_get_gpio_in_named(ppcdev,
+                                                     "cfg_sec_resp", 0));
+    }
+
     armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename, 0x2000000);
 }
 
-- 
2.20.1

The Musca board puts its SRAM and flash behind TrustZone
Memory Protection Controllers (MPCs). Each MPC sits between
the CPU and the RAM/flash, and also has a set of memory mapped
control registers. Wire up the MPCs, and the memory behind them.
For the moment we implement the flash as simple ROM, which
cannot be reprogrammed by the guest.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 hw/arm/musca.c | 155 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 147 insertions(+), 8 deletions(-)

Wire up the PL031 RTC for the Musca board.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 hw/arm/musca.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/hw/arm/musca.c b/hw/arm/musca.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/musca.c
+++ b/hw/arm/musca.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/misc/tz-mpc.h"
 #include "hw/misc/tz-ppc.h"
 #include "hw/misc/unimp.h"
+#include "hw/timer/pl031.h"
 
 #define MUSCA_NUMIRQ_MAX 96
 #define MUSCA_PPC_MAX 3
@@ -XXX,XX +XXX,XX @@ typedef struct {
     UnimplementedDeviceState spi;
     UnimplementedDeviceState scc;
     UnimplementedDeviceState timer;
-    UnimplementedDeviceState rtc;
+    PL031State rtc;
     UnimplementedDeviceState pvt;
     UnimplementedDeviceState sdio;
     UnimplementedDeviceState gpio;
@@ -XXX,XX +XXX,XX @@ typedef struct {
  */
 #define SYSCLK_FRQ 40000000
 
+static qemu_irq get_sse_irq_in(MuscaMachineState *mms, int irqno)
+{
+    /* Return a qemu_irq which will signal IRQ n to all CPUs in the SSE. */
+    assert(irqno < MUSCA_NUMIRQ_MAX);
+
+    return qdev_get_gpio_in(DEVICE(&mms->cpu_irq_splitter[irqno]), 0);
+}
+
 /*
  * Most of the devices in the Musca board sit behind Peripheral Protection
  * Controllers. These data structures define the layout of which devices
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_mpc(MuscaMachineState *mms, void *opaque,
     return sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 0);
 }
 
+static MemoryRegion *make_rtc(MuscaMachineState *mms, void *opaque,
+                              const char *name, hwaddr size)
+{
+    PL031State *rtc = opaque;
+
+    sysbus_init_child_obj(OBJECT(mms), name, rtc, sizeof(mms->rtc), TYPE_PL031);
+    object_property_set_bool(OBJECT(rtc), true, "realized", &error_fatal);
+    sysbus_connect_irq(SYS_BUS_DEVICE(rtc), 0, get_sse_irq_in(mms, 39));
+    return sysbus_mmio_get_region(SYS_BUS_DEVICE(rtc), 0);
+}
+
 static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
                                        const char *name, hwaddr size)
 {
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
         { "i2c1", make_unimp_dev, &mms->i2c[1], 0x5000, 0x1000 },
         { "i2s", make_unimp_dev, &mms->i2s, 0x6000, 0x1000 },
         { "pwm0", make_unimp_dev, &mms->pwm[0], 0x7000, 0x1000 },
-        { "rtc", make_unimp_dev, &mms->rtc, 0x8000, 0x1000 },
+        { "rtc", make_rtc, &mms->rtc, 0x8000, 0x1000 },
         { "qspi", make_unimp_dev, &mms->qspi, 0xa000, 0x1000 },
         { "timer", make_unimp_dev, &mms->timer, 0xb000, 0x1000 },
         { "scc", make_unimp_dev, &mms->scc, 0xc000, 0x1000 },
@@ -XXX,XX +XXX,XX @@ static void musca_init(MachineState *machine)
                 { "spi", make_unimp_dev, &mms->spi, 0x4010a000, 0x1000 },
                 { "scc", make_unimp_dev, &mms->scc, 0x5010b000, 0x1000 },
                 { "timer", make_unimp_dev, &mms->timer, 0x4010c000, 0x1000 },
-                { "rtc", make_unimp_dev, &mms->rtc, 0x4010d000, 0x1000 },
+                { "rtc", make_rtc, &mms->rtc, 0x4010d000, 0x1000 },
                 { "pvt", make_unimp_dev, &mms->pvt, 0x4010e000, 0x1000 },
                 { "sdio", make_unimp_dev, &mms->sdio, 0x4010f000, 0x1000 },
             },
-- 
2.20.1

Wire up the two PL011 UARTs in the Musca board.
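
For review purposes, the interrupt numbering that make_uart() uses
in the diff below can be checked with a small standalone sketch.
The enum and the helper name uart_sse_irq() are hypothetical and are
not part of the patch; only the "7 + index * 6" base and the line
ordering are taken from it.

```c
#include <assert.h>

/*
 * Per-UART interrupt lines in the order the patch assigns them to
 * SSE IRQ numbers: RX, TX, RT, MS, E, then the combined interrupt.
 */
enum uart_line {
    UART_RX,
    UART_TX,
    UART_RT,
    UART_MS,
    UART_E,
    UART_COMBINED,
};

/* Hypothetical helper: SSE IRQ number for one line of UART 'index'. */
static int uart_sse_irq(int index, enum uart_line line)
{
    assert(index >= 0 && index < 2);
    return 7 + index * 6 + (int)line;
}
```

With this numbering UART0 occupies IRQs 7..12 and UART1 occupies
IRQs 13..18, with the combined interrupt last in each group.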

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 hw/arm/musca.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/hw/arm/musca.c b/hw/arm/musca.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/musca.c
+++ b/hw/arm/musca.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "exec/address-spaces.h"
+#include "sysemu/sysemu.h"
 #include "hw/arm/arm.h"
 #include "hw/arm/armsse.h"
 #include "hw/boards.h"
+#include "hw/char/pl011.h"
 #include "hw/core/split-irq.h"
 #include "hw/misc/tz-mpc.h"
 #include "hw/misc/tz-ppc.h"
@@ -XXX,XX +XXX,XX @@ typedef struct {
     UnimplementedDeviceState mhu[2];
     UnimplementedDeviceState pwm[3];
     UnimplementedDeviceState i2s;
-    UnimplementedDeviceState uart[2];
+    PL011State uart[2];
     UnimplementedDeviceState i2c[2];
     UnimplementedDeviceState spi;
     UnimplementedDeviceState scc;
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_rtc(MuscaMachineState *mms, void *opaque,
     return sysbus_mmio_get_region(SYS_BUS_DEVICE(rtc), 0);
 }
 
+static MemoryRegion *make_uart(MuscaMachineState *mms, void *opaque,
+                               const char *name, hwaddr size)
+{
+    PL011State *uart = opaque;
+    int i = uart - &mms->uart[0];
+    int irqbase = 7 + i * 6;
+    SysBusDevice *s;
+
+    sysbus_init_child_obj(OBJECT(mms), name, uart, sizeof(mms->uart[0]),
+                          TYPE_PL011);
+    qdev_prop_set_chr(DEVICE(uart), "chardev", serial_hd(i));
+    object_property_set_bool(OBJECT(uart), true, "realized", &error_fatal);
+    s = SYS_BUS_DEVICE(uart);
+    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqbase + 5)); /* combined */
+    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, irqbase + 0)); /* RX */
+    sysbus_connect_irq(s, 2, get_sse_irq_in(mms, irqbase + 1)); /* TX */
+    sysbus_connect_irq(s, 3, get_sse_irq_in(mms, irqbase + 2)); /* RT */
+    sysbus_connect_irq(s, 4, get_sse_irq_in(mms, irqbase + 3)); /* MS */
+    sysbus_connect_irq(s, 5, get_sse_irq_in(mms, irqbase + 4)); /* E */
+    return sysbus_mmio_get_region(SYS_BUS_DEVICE(uart), 0);
+}
+
 static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
                                        const char *name, hwaddr size)
 {
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_musca_a_devs(MuscaMachineState *mms, void *opaque,
     MemoryRegion *container = &mms->container;
 
     const PPCPortInfo devices[] = {
-        { "uart0", make_unimp_dev, &mms->uart[0], 0x1000, 0x1000 },
-        { "uart1", make_unimp_dev, &mms->uart[1], 0x2000, 0x1000 },
+        { "uart0", make_uart, &mms->uart[0], 0x1000, 0x1000 },
+        { "uart1", make_uart, &mms->uart[1], 0x2000, 0x1000 },
         { "spi", make_unimp_dev, &mms->spi, 0x3000, 0x1000 },
         { "i2c0", make_unimp_dev, &mms->i2c[0], 0x4000, 0x1000 },
         { "i2c1", make_unimp_dev, &mms->i2c[1], 0x5000, 0x1000 },
@@ -XXX,XX +XXX,XX @@ static void musca_init(MachineState *machine)
                 { "pwm1", make_unimp_dev, &mms->pwm[1], 0x40102000, 0x1000 },
                 { "pwm2", make_unimp_dev, &mms->pwm[2], 0x40103000, 0x1000 },
                 { "i2s", make_unimp_dev, &mms->i2s, 0x40104000, 0x1000 },
-                { "uart0", make_unimp_dev, &mms->uart[0], 0x40105000, 0x1000 },
-                { "uart1", make_unimp_dev, &mms->uart[1], 0x40106000, 0x1000 },
+                { "uart0", make_uart, &mms->uart[0], 0x40105000, 0x1000 },
+                { "uart1", make_uart, &mms->uart[1], 0x40106000, 0x1000 },
                 { "i2c0", make_unimp_dev, &mms->i2c[0], 0x40108000, 0x1000 },
                 { "i2c1", make_unimp_dev, &mms->i2c[1], 0x40109000, 0x1000 },
                 { "spi", make_unimp_dev, &mms->spi, 0x4010a000, 0x1000 },
-- 
2.20.1

The region 0x40010000 .. 0x4001ffff and its secure-only alias
at 0x50010000... are for per-CPU devices. We implement this by
giving each CPU its own container memory region, where the
per-CPU devices live. Unfortunately, the alias region which
makes devices mapped at 0x4... addresses also appear at 0x5...
is only implemented in the overall "all CPUs" container. The
effect of this bug is that the CPU_IDENTITY register block appears
only at 0x4001f000, but not at the 0x5001f000 alias where it should
also appear. Guests (like very recent Arm Trusted Firmware-M)
which try to access it at 0x5001f000 will crash.

Fix this by moving the handling for this alias from the "all CPUs"
container to the per-CPU container. (We leave the aliases for
0x1... and 0x3... in the overall container, because there are
no per-CPU devices there.)
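
As a rough illustration of why the alias has to live in the per-CPU
container, here is a toy model of the lookup. It is not QEMU's memory
API: the function names and the boolean "where is the alias" flags
are invented for the sketch, and it assumes that an alias only
resolves to devices visible in the container it was created in.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CPU_IDENTITY_BASE 0x4001f000u
#define TOY_ALIAS_BASE        0x50000000u
#define TOY_ALIAS_TARGET      0x40000000u
#define TOY_ALIAS_SIZE        0x10000000u

/*
 * Toy "all CPUs" container: it holds no per-CPU devices, so an alias
 * placed here can never reach CPU_IDENTITY.
 */
static bool toy_shared_lookup(uint32_t addr, bool alias_in_shared)
{
    if (alias_in_shared && addr - TOY_ALIAS_BASE < TOY_ALIAS_SIZE) {
        return toy_shared_lookup(addr - TOY_ALIAS_BASE + TOY_ALIAS_TARGET,
                                 false);
    }
    return false;
}

/*
 * Toy per-CPU container: CPU_IDENTITY lives here, and anything not
 * found falls through to the shared container.
 */
static bool toy_percpu_lookup(uint32_t addr, bool alias_in_percpu)
{
    if (addr - TOY_CPU_IDENTITY_BASE < 0x1000u) {
        return true;
    }
    if (alias_in_percpu && addr - TOY_ALIAS_BASE < TOY_ALIAS_SIZE) {
        return toy_percpu_lookup(addr - TOY_ALIAS_BASE + TOY_ALIAS_TARGET,
                                 false);
    }
    return toy_shared_lookup(addr, !alias_in_percpu);
}
```

In this model toy_percpu_lookup(0x5001f000, false) misses (the old
behaviour), while toy_percpu_lookup(0x5001f000, true) finds the
register block (the behaviour after this patch).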

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20190215180500.6906-1-peter.maydell@linaro.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---
 include/hw/arm/armsse.h |  2 +-
 hw/arm/armsse.c         | 26 ++++++++++++++++----------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/include/hw/arm/armsse.h b/include/hw/arm/armsse.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/armsse.h
+++ b/include/hw/arm/armsse.h
@@ -XXX,XX +XXX,XX @@ typedef struct ARMSSE {
     MemoryRegion cpu_container[SSE_MAX_CPUS];
     MemoryRegion alias1;
     MemoryRegion alias2;
-    MemoryRegion alias3;
+    MemoryRegion alias3[SSE_MAX_CPUS];
     MemoryRegion sram[MAX_SRAM_BANKS];
 
     qemu_irq *exp_irqs[SSE_MAX_CPUS];
diff --git a/hw/arm/armsse.c b/hw/arm/armsse.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/armsse.c
+++ b/hw/arm/armsse.c
@@ -XXX,XX +XXX,XX @@ static bool irq_is_common[32] = {
     /* 30, 31: reserved */
 };
 
-/* Create an alias region of @size bytes starting at @base
+/*
+ * Create an alias region in @container of @size bytes starting at @base
  * which mirrors the memory starting at @orig.
  */
-static void make_alias(ARMSSE *s, MemoryRegion *mr, const char *name,
-                       hwaddr base, hwaddr size, hwaddr orig)
+static void make_alias(ARMSSE *s, MemoryRegion *mr, MemoryRegion *container,
+                       const char *name, hwaddr base, hwaddr size, hwaddr orig)
 {
-    memory_region_init_alias(mr, NULL, name, &s->container, orig, size);
+    memory_region_init_alias(mr, NULL, name, container, orig, size);
     /* The alias is even lower priority than unimplemented_device regions */
-    memory_region_add_subregion_overlap(&s->container, base, mr, -1500);
+    memory_region_add_subregion_overlap(container, base, mr, -1500);
 }
 
 static void irq_status_forwarder(void *opaque, int n, int level)
@@ -XXX,XX +XXX,XX @@ static void armsse_realize(DeviceState *dev, Error **errp)
     }
 
     /* Set up the big aliases first */
-    make_alias(s, &s->alias1, "alias 1", 0x10000000, 0x10000000, 0x00000000);
-    make_alias(s, &s->alias2, "alias 2", 0x30000000, 0x10000000, 0x20000000);
+    make_alias(s, &s->alias1, &s->container, "alias 1",
+               0x10000000, 0x10000000, 0x00000000);
+    make_alias(s, &s->alias2, &s->container,
+               "alias 2", 0x30000000, 0x10000000, 0x20000000);
     /* The 0x50000000..0x5fffffff region is not a pure alias: it has
      * a few extra devices that only appear there (generally the
      * control interfaces for the protection controllers).
      * We implement this by mapping those devices over the top of this
-     * alias MR at a higher priority.
+     * alias MR at a higher priority. Some of the devices in this range
+     * are per-CPU, so we must put this alias in the per-cpu containers.
      */
-    make_alias(s, &s->alias3, "alias 3", 0x50000000, 0x10000000, 0x40000000);
-
+    for (i = 0; i < info->num_cpus; i++) {
+        make_alias(s, &s->alias3[i], &s->cpu_container[i],
+                   "alias 3", 0x50000000, 0x10000000, 0x40000000);
+    }
 
     /* Security controller */
     object_property_set_bool(OBJECT(&s->secctl), true, "realized", &err);
-- 
2.20.1

Big pullreq this week, though none of the new features are
particularly earthshaking. Most of the bulk is from code cleanup
patches from me or rth.

thanks
-- PMM

The following changes since commit b651b80822fa8cb66ca30087ac7fbc75507ae5d2:

Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-5.0-pull-request' into staging (2020-02-20 17:35:42 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200221

for you to fetch changes up to 270a679b3f950d7c4c600f324aab8bff292d0971:

target/arm: Add missing checks for fpsp_v2 (2020-02-21 12:54:25 +0000)

----------------------------------------------------------------
target-arm queue:
 * aspeed/scu: Implement chip ID register
 * hw/misc/iotkit-secctl: Fix writing to 'PPC Interrupt Clear' register
 * mainstone: Make providing flash images non-mandatory
 * z2: Make providing flash images non-mandatory
 * Fix failures to flush SVE high bits after AdvSIMD INS/ZIP/UZP/TRN/TBL/TBX/EXT
 * Minor performance improvement: spend less time recalculating hflags values
 * Code cleanup to isar_feature function tests
 * Implement ARMv8.1-PMU and ARMv8.4-PMU extensions
 * Bugfix: correct handling of PMCR_EL0.LC bit
 * Bugfix: correct definition of PMCRDP
 * Correctly implement ACTLR2, HACTLR2
 * allwinner: Wire up USB ports
 * Vectorize emulation of USHL, SSHL, PMUL*
 * xilinx_spips: Correct the number of dummy cycles for the FAST_READ_4 cmd
 * sh4: Fix PCI ISA IO memory subregion
 * Code cleanup to use more isar_feature tests and fewer ARM_FEATURE_* tests

----------------------------------------------------------------
Francisco Iglesias (1):
      xilinx_spips: Correct the number of dummy cycles for the FAST_READ_4 cmd

Guenter Roeck (6):
      mainstone: Make providing flash images non-mandatory
      z2: Make providing flash images non-mandatory
      hw: usb: hcd-ohci: Move OHCISysBusState and TYPE_SYSBUS_OHCI to include file
      hcd-ehci: Introduce "companion-enable" sysbus property
      arm: allwinner: Wire up USB ports
      sh4: Fix PCI ISA IO memory subregion

Joel Stanley (2):
      aspeed/scu: Create separate write callbacks
      aspeed/scu: Implement chip ID register

Peter Maydell (21):
      target/arm: Add _aa32_ to isar_feature functions testing 32-bit ID registers
      target/arm: Check aa32_pan in take_aarch32_exception(), not aa64_pan
      target/arm: Add isar_feature_any_fp16 and document naming/usage conventions
      target/arm: Define and use any_predinv isar_feature test
      target/arm: Factor out PMU register definitions
      target/arm: Add and use FIELD definitions for ID_AA64DFR0_EL1
      target/arm: Use FIELD macros for clearing ID_DFR0 PERFMON field
      target/arm: Define an aa32_pmu_8_1 isar feature test function
      target/arm: Add _aa64_ and _any_ versions of pmu_8_1 isar checks
      target/arm: Stop assuming DBGDIDR always exists
      target/arm: Move DBGDIDR into ARMISARegisters
      target/arm: Read debug-related ID registers from KVM
      target/arm: Implement ARMv8.1-PMU extension
      target/arm: Implement ARMv8.4-PMU extension
      target/arm: Provide ARMv8.4-PMU in '-cpu max'
      target/arm: Correct definition of PMCRDP
      target/arm: Correct handling of PMCR_EL0.LC bit
      target/arm: Test correct register in aa32_pan and aa32_ats1e1 checks
      target/arm: Use isar_feature function for testing AA32HPD feature
      target/arm: Use FIELD_EX32 for testing 32-bit fields
      target/arm: Correctly implement ACTLR2, HACTLR2

Philippe Mathieu-Daudé (1):
      hw/misc/iotkit-secctl: Fix writing to 'PPC Interrupt Clear' register

Richard Henderson (21):
      target/arm: Flush high bits of sve register after AdvSIMD EXT
      target/arm: Flush high bits of sve register after AdvSIMD TBL/TBX
      target/arm: Flush high bits of sve register after AdvSIMD ZIP/UZP/TRN
      target/arm: Flush high bits of sve register after AdvSIMD INS
      target/arm: Use bit 55 explicitly for pauth
      target/arm: Fix select for aa64_va_parameters_both
      target/arm: Remove ttbr1_valid check from get_phys_addr_lpae
      target/arm: Split out aa64_va_parameter_tbi, aa64_va_parameter_tbid
      target/arm: Vectorize USHL and SSHL
      target/arm: Convert PMUL.8 to gvec
      target/arm: Convert PMULL.64 to gvec
      target/arm: Convert PMULL.8 to gvec
      target/arm: Rename isar_feature_aa32_simd_r32
      target/arm: Use isar_feature_aa32_simd_r32 more places
      target/arm: Set MVFR0.FPSP for ARMv5 cpus
      target/arm: Add isar_feature_aa32_simd_r16
      target/arm: Rename isar_feature_aa32_fpdp_v2
      target/arm: Add isar_feature_aa32_{fpsp_v2, fpsp_v3, fpdp_v3}
      target/arm: Perform fpdp_v2 check first
      target/arm: Replace ARM_FEATURE_VFP3 checks with fp{sp, dp}_v3
      target/arm: Add missing checks for fpsp_v2

From: Joel Stanley <joel@jms.id.au>

This splits the common write callback into separate ast2400 and ast2500
implementations. This makes it clearer when implementing differing
behaviour.
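
A stripped-down sketch of the resulting shape, for readers who do not
have the file open: each SoC variant gets its own write function and
the device picks one through a function pointer. The ToyScu type, the
two-register layout and the toy_* names are made up for illustration;
only the HW_STRAP1 and SILICON_REV behaviour mirrors the diff below.

```c
#include <assert.h>
#include <stdint.h>

enum { REG_SILICON_REV, REG_HW_STRAP1, REG_COUNT }; /* toy indices */

typedef struct ToyScu ToyScu;
typedef void (*toy_write_fn)(ToyScu *s, int reg, uint32_t data);

struct ToyScu {
    uint32_t regs[REG_COUNT];
    toy_write_fn write;   /* selected once per SoC variant */
};

/* AST2400 flavour: SILICON_REV is read-only, HW_STRAP1 is plain RW. */
static void toy_ast2400_write(ToyScu *s, int reg, uint32_t data)
{
    assert(reg >= 0 && reg < REG_COUNT);
    if (reg == REG_SILICON_REV) {
        return;
    }
    s->regs[reg] = data;
}

/*
 * AST2500 flavour: writes to HW_STRAP1 set bits, and writes to the
 * SILICON_REV offset clear bits in HW_STRAP1.
 */
static void toy_ast2500_write(ToyScu *s, int reg, uint32_t data)
{
    assert(reg >= 0 && reg < REG_COUNT);
    switch (reg) {
    case REG_HW_STRAP1:
        s->regs[REG_HW_STRAP1] |= data;
        return;
    case REG_SILICON_REV:
        s->regs[REG_HW_STRAP1] &= ~data;
        return;
    default:
        s->regs[reg] = data;
    }
}
```

Keeping the variants separate removes the per-register
ASPEED_IS_AST2500() tests from the common path, at the cost of some
duplicated boilerplate.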

Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200121013302.43839-2-joel@jms.id.au
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/aspeed_scu.c | 80 +++++++++++++++++++++++++++++++-------------
 1 file changed, 57 insertions(+), 23 deletions(-)

diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/aspeed_scu.c
+++ b/hw/misc/aspeed_scu.c
@@ -XXX,XX +XXX,XX @@ static uint64_t aspeed_scu_read(void *opaque, hwaddr offset, unsigned size)
     return s->regs[reg];
 }
 
-static void aspeed_scu_write(void *opaque, hwaddr offset, uint64_t data,
-                             unsigned size)
+static void aspeed_ast2400_scu_write(void *opaque, hwaddr offset,
+                                     uint64_t data, unsigned size)
+{
+    AspeedSCUState *s = ASPEED_SCU(opaque);
+    int reg = TO_REG(offset);
+
+    if (reg >= ASPEED_SCU_NR_REGS) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: Out-of-bounds write at offset 0x%" HWADDR_PRIx "\n",
+                      __func__, offset);
+        return;
+    }
+
+    if (reg > PROT_KEY && reg < CPU2_BASE_SEG1 &&
+            !s->regs[PROT_KEY]) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: SCU is locked!\n", __func__);
+    }
+
+    trace_aspeed_scu_write(offset, size, data);
+
+    switch (reg) {
+    case PROT_KEY:
+        s->regs[reg] = (data == ASPEED_SCU_PROT_KEY) ? 1 : 0;
+        return;
+    case SILICON_REV:
+    case FREQ_CNTR_EVAL:
+    case VGA_SCRATCH1 ... VGA_SCRATCH8:
+    case RNG_DATA:
+    case FREE_CNTR4:
+    case FREE_CNTR4_EXT:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: Write to read-only offset 0x%" HWADDR_PRIx "\n",
+                      __func__, offset);
+        return;
+    }
+
+    s->regs[reg] = data;
+}
+
+static void aspeed_ast2500_scu_write(void *opaque, hwaddr offset,
+                                     uint64_t data, unsigned size)
 {
     AspeedSCUState *s = ASPEED_SCU(opaque);
     int reg = TO_REG(offset);
@@ -XXX,XX +XXX,XX @@ static void aspeed_scu_write(void *opaque, hwaddr offset, uint64_t data,
     case PROT_KEY:
         s->regs[reg] = (data == ASPEED_SCU_PROT_KEY) ? 1 : 0;
         return;
-    case CLK_SEL:
-        s->regs[reg] = data;
-        break;
     case HW_STRAP1:
-        if (ASPEED_IS_AST2500(s->regs[SILICON_REV])) {
-            s->regs[HW_STRAP1] |= data;
-            return;
-        }
-        /* Jump to assignment below */
-        break;
+        s->regs[HW_STRAP1] |= data;
+        return;
     case SILICON_REV:
-        if (ASPEED_IS_AST2500(s->regs[SILICON_REV])) {
-            s->regs[HW_STRAP1] &= ~data;
-        } else {
-            qemu_log_mask(LOG_GUEST_ERROR,
-                          "%s: Write to read-only offset 0x%" HWADDR_PRIx "\n",
-                          __func__, offset);
-        }
-        /* Avoid assignment below, we've handled everything */
+        s->regs[HW_STRAP1] &= ~data;
         return;
     case FREQ_CNTR_EVAL:
     case VGA_SCRATCH1 ... VGA_SCRATCH8:
@@ -XXX,XX +XXX,XX @@ static void aspeed_scu_write(void *opaque, hwaddr offset, uint64_t data,
     s->regs[reg] = data;
 }
 
-static const MemoryRegionOps aspeed_scu_ops = {
+static const MemoryRegionOps aspeed_ast2400_scu_ops = {
     .read = aspeed_scu_read,
-    .write = aspeed_scu_write,
+    .write = aspeed_ast2400_scu_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid.min_access_size = 4,
+    .valid.max_access_size = 4,
+    .valid.unaligned = false,
+};
+
+static const MemoryRegionOps aspeed_ast2500_scu_ops = {
+    .read = aspeed_scu_read,
+    .write = aspeed_ast2500_scu_write,
     .endianness = DEVICE_LITTLE_ENDIAN,
     .valid.min_access_size = 4,
     .valid.max_access_size = 4,
@@ -XXX,XX +XXX,XX @@ static void aspeed_2400_scu_class_init(ObjectClass *klass, void *data)
     asc->calc_hpll = aspeed_2400_scu_calc_hpll;
     asc->apb_divider = 2;
     asc->nr_regs = ASPEED_SCU_NR_REGS;
-    asc->ops = &aspeed_scu_ops;
+    asc->ops = &aspeed_ast2400_scu_ops;
 }
 
 static const TypeInfo aspeed_2400_scu_info = {
@@ -XXX,XX +XXX,XX @@ static void aspeed_2500_scu_class_init(ObjectClass *klass, void *data)
     asc->calc_hpll = aspeed_2500_scu_calc_hpll;
     asc->apb_divider = 4;
     asc->nr_regs = ASPEED_SCU_NR_REGS;
-    asc->ops = &aspeed_scu_ops;
+    asc->ops = &aspeed_ast2500_scu_ops;
 }
 
 static const TypeInfo aspeed_2500_scu_info = {
-- 
2.20.1

From: Joel Stanley <joel@jms.id.au>

This returns a fixed but non-zero value for the chip id.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200121013302.43839-3-joel@jms.id.au
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/aspeed_scu.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/aspeed_scu.c
+++ b/hw/misc/aspeed_scu.c
@@ -XXX,XX +XXX,XX @@
 #define CPU2_BASE_SEG4       TO_REG(0x110)
 #define CPU2_BASE_SEG5       TO_REG(0x114)
 #define CPU2_CACHE_CTRL      TO_REG(0x118)
+#define CHIP_ID0             TO_REG(0x150)
+#define CHIP_ID1             TO_REG(0x154)
 #define UART_HPLL_CLK        TO_REG(0x160)
 #define PCIE_CTRL            TO_REG(0x180)
 #define BMC_MMIO_CTRL        TO_REG(0x184)
@@ -XXX,XX +XXX,XX @@
 #define AST2600_HW_STRAP2_PROT    TO_REG(0x518)
 #define AST2600_RNG_CTRL          TO_REG(0x524)
 #define AST2600_RNG_DATA          TO_REG(0x540)
+#define AST2600_CHIP_ID0          TO_REG(0x5B0)
+#define AST2600_CHIP_ID1          TO_REG(0x5B4)
 
 #define AST2600_CLK TO_REG(0x40)
 
@@ -XXX,XX +XXX,XX @@ static const uint32_t ast2500_a1_resets[ASPEED_SCU_NR_REGS] = {
      [CPU2_BASE_SEG1]  = 0x80000000U,
      [CPU2_BASE_SEG4]  = 0x1E600000U,
      [CPU2_BASE_SEG5]  = 0xC0000000U,
+     [CHIP_ID0]        = 0x1234ABCDU,
+     [CHIP_ID1]        = 0x88884444U,
      [UART_HPLL_CLK]   = 0x00001903U,
      [PCIE_CTRL]       = 0x0000007BU,
      [BMC_DEV_ID]      = 0x00002402U
@@ -XXX,XX +XXX,XX @@ static void aspeed_ast2500_scu_write(void *opaque, hwaddr offset,
     case RNG_DATA:
     case FREE_CNTR4:
     case FREE_CNTR4_EXT:
+    case CHIP_ID0:
+    case CHIP_ID1:
         qemu_log_mask(LOG_GUEST_ERROR,
                       "%s: Write to read-only offset 0x%" HWADDR_PRIx "\n",
                       __func__, offset);
@@ -XXX,XX +XXX,XX @@ static void aspeed_ast2600_scu_write(void *opaque, hwaddr offset,
     case AST2600_RNG_DATA:
     case AST2600_SILICON_REV:
     case AST2600_SILICON_REV2:
+    case AST2600_CHIP_ID0:
+    case AST2600_CHIP_ID1:
         /* Add read only registers here */
         qemu_log_mask(LOG_GUEST_ERROR,
                       "%s: Write to read-only offset 0x%" HWADDR_PRIx "\n",
@@ -XXX,XX +XXX,XX @@ static const uint32_t ast2600_a0_resets[ASPEED_AST2600_SCU_NR_REGS] = {
     [AST2600_CLK_STOP_CTRL2]    = 0xFFF0FFF0,
     [AST2600_SDRAM_HANDSHAKE]   = 0x00000040,  /* SoC completed DRAM init */
     [AST2600_HPLL_PARAM]        = 0x1000405F,
+    [AST2600_CHIP_ID0]          = 0x1234ABCD,
+    [AST2600_CHIP_ID1]          = 0x88884444,
+
 };
 
 static void aspeed_ast2600_scu_reset(DeviceState *dev)
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Fix warning reported by Clang static code analyzer:

CC      hw/misc/iotkit-secctl.o
  hw/misc/iotkit-secctl.c:343:9: warning: Value stored to 'value' is never read
          value &= 0x00f000f3;
          ^        ~~~~~~~~~~
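
The fix in the diff below turns the dead store into a
write-one-to-clear update of the interrupt status. A minimal
standalone sketch of that operation follows; the helper name and the
TOY_INTCLR_MASK macro are hypothetical, and only the mask value
0x00f000f3 comes from the code being patched.

```c
#include <stdint.h>

#define TOY_INTCLR_MASK 0x00f000f3u

/*
 * Write-one-to-clear: every bit set in 'value' (and permitted by the
 * mask) clears the corresponding bit of 'status'; zero bits and bits
 * outside the mask leave 'status' untouched.
 */
static uint32_t toy_write_one_to_clear(uint32_t status, uint32_t value)
{
    return status & ~(value & TOY_INTCLR_MASK);
}
```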

Fixes: b3717c23e1c
Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200217132922.24607-1-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/iotkit-secctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/misc/iotkit-secctl.c b/hw/misc/iotkit-secctl.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/iotkit-secctl.c
+++ b/hw/misc/iotkit-secctl.c
@@ -XXX,XX +XXX,XX @@ static MemTxResult iotkit_secctl_s_write(void *opaque, hwaddr addr,
         qemu_set_irq(s->sec_resp_cfg, s->secrespcfg);
         break;
     case A_SECPPCINTCLR:
-        value &= 0x00f000f3;
+        s->secppcintstat &= ~(value & 0x00f000f3);
         foreach_ppc(s, iotkit_secctl_ppc_update_irq_clear);
         break;
     case A_SECPPCINTEN:
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

Up to now, the mainstone machine only boots if two flash images are
provided. This is not really necessary; the machine can boot from initrd
or from SD without it. At the same time, having to provide dummy flash
images is a nuisance and does not add any real value. Make it optional.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200217210824.18513-1-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/mainstone.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mainstone.c
+++ b/hw/arm/mainstone.c
@@ -XXX,XX +XXX,XX @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
     /* There are two 32MiB flash devices on the board */
     for (i = 0; i < 2; i ++) {
         dinfo = drive_get(IF_PFLASH, 0, i);
-        if (!dinfo) {
-            if (qtest_enabled()) {
-                break;
-            }
-            error_report("Two flash images must be given with the "
-                         "'pflash' parameter");
-            exit(1);
-        }
-
         if (!pflash_cfi01_register(mainstone_flash_base[i],
                                    i ? "mainstone.flash1" : "mainstone.flash0",
                                    MAINSTONE_FLASH,
-                                   blk_by_legacy_dinfo(dinfo),
+                                   dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
                                    sector_len, 4, 0, 0, 0, 0, be)) {
             error_report("Error registering flash memory");
             exit(1);
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

Up to now, the z2 machine only boots if a flash image is provided.
This is not really necessary; the machine can boot from initrd or from
SD without it. At the same time, having to provide dummy flash images
is a nuisance and does not add any real value. Make it optional.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200217210903.18602-1-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/z2.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/hw/arm/z2.c b/hw/arm/z2.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/z2.c
+++ b/hw/arm/z2.c
@@ -XXX,XX +XXX,XX @@ static void z2_init(MachineState *machine)
     be = 0;
 #endif
     dinfo = drive_get(IF_PFLASH, 0, 0);
-    if (!dinfo && !qtest_enabled()) {
-        error_report("Flash image must be given with the "
-                     "'pflash' parameter");
-        exit(1);
-    }
-
     if (!pflash_cfi01_register(Z2_FLASH_BASE, "z2.flash0", Z2_FLASH_SIZE,
                                dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
                                sector_len, 4, 0, 0, 0, 0, be)) {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Writes to AdvSIMD registers flush the bits above 128.
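
As a rough illustration of the rule above, the sketch below models one
vector register as a flat byte array and zeroes everything above bit 127
after an element insert. ModelVReg, model_ins_byte and the 256-byte size
are assumptions made for this example; they are not QEMU's types or
helpers (the patch itself uses clear_vec_high()).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one vector register as 256 bytes (2048 bits). */
#define MODEL_VREG_BYTES 256
#define ADVSIMD_BYTES    16

typedef struct {
    uint8_t b[MODEL_VREG_BYTES];
} ModelVReg;

/*
 * Insert one byte element into the low 128 bits, then zero every byte
 * above bit 127, mirroring "INS is considered a 128-bit write for SVE".
 */
static void model_ins_byte(ModelVReg *rd, unsigned idx, uint8_t val)
{
    rd->b[idx % ADVSIMD_BYTES] = val;
    memset(rd->b + ADVSIMD_BYTES, 0, MODEL_VREG_BYTES - ADVSIMD_BYTES);
}
```

Other bytes in the low 128 bits keep their previous values; only the
high part is flushed.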

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214194643.23317-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_simd_inse(DisasContext *s, int rd, int rn,
     write_vec_element(s, tmp, rd, dst_index, size);
 
     tcg_temp_free_i64(tmp);
+
+    /* INS is considered a 128-bit write for SVE. */
+    clear_vec_high(s, true, rd);
 }
 
 
@@ -XXX,XX +XXX,XX @@ static void handle_simd_insg(DisasContext *s, int rd, int rn, int imm5)
 
     idx = extract32(imm5, 1 + size, 4 - size);
     write_vec_element(s, cpu_reg(s, rn), rd, idx, size);
+
+    /* INS is considered a 128-bit write for SVE. */
+    clear_vec_high(s, true, rd);
 }
 
 /*
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The pseudocode in aarch64/functions/pac/auth/Auth and
aarch64/functions/pac/strip/Strip always uses bit 55 for
extfield and does not consider whether the current regime has 2 ranges.
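
A minimal sketch of what "uses bit 55 for extfield" means: the single
bit is replicated across all 64 bits. model_sext_bit and model_extfield
are stand-in names for this example; the patch itself calls
sextract64(ptr, 55, 1).

```c
#include <stdint.h>

/* Stand-in for a 1-bit signed extract: all ones if the bit is set. */
static uint64_t model_sext_bit(uint64_t value, unsigned bit)
{
    return ((value >> bit) & 1) ? UINT64_MAX : 0;
}

/* extfield depends only on bit 55 of the pointer, not on the regime. */
static uint64_t model_extfield(uint64_t ptr)
{
    return model_sext_bit(ptr, 55);
}
```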

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200216194343.21331-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/pauth_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/pauth_helper.c
+++ b/target/arm/pauth_helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
 
 static uint64_t pauth_original_ptr(uint64_t ptr, ARMVAParameters param)
 {
-    uint64_t extfield = -param.select;
+    /* Note that bit 55 is used whether or not the regime has 2 ranges. */
+    uint64_t extfield = sextract64(ptr, 55, 1);
     int bot_pac_bit = 64 - param.tsz;
     int top_pac_bit = 64 - 8 * param.tbi;
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Select should always be 0 for a regime with one range.
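
The intended behaviour can be sketched as follows. model_va_select and
its has_two_ranges parameter are hypothetical names standing in for the
regime_has_2_ranges(mmu_idx) test used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * With a single range there is nothing to select, so the result is 0.
 * With two ranges, bit 55 of the virtual address picks the range.
 */
static int model_va_select(bool has_two_ranges, uint64_t va)
{
    if (!has_two_ranges) {
        return 0;
    }
    return (int)((va >> 55) & 1);
}
```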

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200216194343.21331-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 46 +++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
     bool tbi, tbid, epd, hpd, using16k, using64k;
     int select, tsz;
 
-    /*
-     * Bit 55 is always between the two regions, and is canonical for
-     * determining if address tagging is enabled.
-     */
-    select = extract64(va, 55, 1);
-
     if (!regime_has_2_ranges(mmu_idx)) {
+        select = 0;
         tsz = extract32(tcr, 0, 6);
         using64k = extract32(tcr, 14, 1);
         using16k = extract32(tcr, 15, 1);
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
             tbid = extract32(tcr, 29, 1);
         }
         epd = false;
-    } else if (!select) {
-        tsz = extract32(tcr, 0, 6);
-        epd = extract32(tcr, 7, 1);
-        using64k = extract32(tcr, 14, 1);
-        using16k = extract32(tcr, 15, 1);
-        tbi = extract64(tcr, 37, 1);
-        hpd = extract64(tcr, 41, 1);
-        tbid = extract64(tcr, 51, 1);
     } else {
-        int tg = extract32(tcr, 30, 2);
-        using16k = tg == 1;
-        using64k = tg == 3;
-        tsz = extract32(tcr, 16, 6);
-        epd = extract32(tcr, 23, 1);
-        tbi = extract64(tcr, 38, 1);
-        hpd = extract64(tcr, 42, 1);
-        tbid = extract64(tcr, 52, 1);
+        /*
+         * Bit 55 is always between the two regions, and is canonical for
+         * determining if address tagging is enabled.
+         */
+        select = extract64(va, 55, 1);
+        if (!select) {
+            tsz = extract32(tcr, 0, 6);
+            epd = extract32(tcr, 7, 1);
+            using64k = extract32(tcr, 14, 1);
+            using16k = extract32(tcr, 15, 1);
+            tbi = extract64(tcr, 37, 1);
+            hpd = extract64(tcr, 41, 1);
+            tbid = extract64(tcr, 51, 1);
+        } else {
+            int tg = extract32(tcr, 30, 2);
+            using16k = tg == 1;
+            using64k = tg == 3;
+            tsz = extract32(tcr, 16, 6);
+            epd = extract32(tcr, 23, 1);
+            tbi = extract64(tcr, 38, 1);
+            hpd = extract64(tcr, 42, 1);
+            tbid = extract64(tcr, 52, 1);
+        }
     }
     tsz = MIN(tsz, 39);  /* TODO: ARMv8.4-TTST */
     tsz = MAX(tsz, 16);  /* TODO: ARMv8.2-LVA  */
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Now that aa64_va_parameters_both sets select based on the number
of ranges in the regime, the ttbr1_valid check is redundant.
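
To see why the extra check is no longer needed, here is a small model of
the gap test. The function names are made up for this sketch, and it
assumes that right-shifting a negative int64_t is an arithmetic shift,
which is true of the usual compilers but not guaranteed by the C
standard.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extract len bits starting at start (len >= 1, start + len <= 64). */
static int64_t model_sextract64(uint64_t value, unsigned start, unsigned len)
{
    return (int64_t)(value << (64 - len - start)) >> (64 - len);
}

/*
 * The bits between inputsize and addrsize must be all zeros when
 * select is 0 and all ones when select is 1; anything else lies in
 * the gap between the two regions.
 */
static bool model_in_translation_gap(uint64_t address, unsigned inputsize,
                                     unsigned addrsize, unsigned select)
{
    if (inputsize >= addrsize) {
        return false;
    }
    uint64_t top_bits = (uint64_t)model_sextract64(address, inputsize,
                                                   addrsize - inputsize);
    return (uint64_t)0 - top_bits != select;
}
```

For a single-range regime select is always 0, so an address with the
top bits set already fails this comparison without any separate
ttbr1_valid test.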

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200216194343.21331-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
     TCR *tcr = regime_tcr(env, mmu_idx);
     int ap, ns, xn, pxn;
     uint32_t el = regime_el(env, mmu_idx);
-    bool ttbr1_valid;
     uint64_t descaddrmask;
     bool aarch64 = arm_el_is_aa64(env, el);
     bool guarded = false;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
         param = aa64_va_parameters(env, address, mmu_idx,
                                    access_type != MMU_INST_FETCH);
         level = 0;
-        ttbr1_valid = regime_has_2_ranges(mmu_idx);
         addrsize = 64 - 8 * param.tbi;
         inputsize = 64 - param.tsz;
     } else {
         param = aa32_va_parameters(env, address, mmu_idx);
         level = 1;
-        /* There is no TTBR1 for EL2 */
-        ttbr1_valid = (el != 2);
         addrsize = (mmu_idx == ARMMMUIdx_Stage2 ? 40 : 32);
         inputsize = addrsize - param.tsz;
     }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, target_ulong address,
     if (inputsize < addrsize) {
         target_ulong top_bits = sextract64(address, inputsize,
                                            addrsize - inputsize);
-        if (-top_bits != param.select || (param.select && !ttbr1_valid)) {
+        if (-top_bits != param.select) {
             /* The gap between the two regions is a Translation fault */
             fault_type = ARMFault_Translation;
             goto do_fault;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

For the purpose of rebuild_hflags_a64, we do not need to compute
all of the va parameters, only tbi.  Moreover, we can compute them
in a form that is more useful for storing in hflags.

This eliminates the need for aa64_va_parameters_both, so fold that
into aa64_va_parameters.  The remaining calls to aa64_va_parameters
are in get_phys_addr_lpae and in pauth_helper.c.

This reduces the total CPU consumption of aa64_va_parameters in a
kernel boot plus a KVM guest kernel boot from 3% to 0.5%.
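
A minimal sketch of the two-bit form for a regime with two ranges,
using the TCR bit positions that appear in the diff (TBI0/TBI1 at bits
37/38, TBID0/TBID1 at bits 51/52). model_extract and model_tbi are
names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract len bits (len < 32) starting at bit start. */
static unsigned model_extract(uint64_t value, unsigned start, unsigned len)
{
    return (unsigned)((value >> start) & ((1ULL << len) - 1));
}

/*
 * Bit 0 of each two-bit value describes the lower range and bit 1 the
 * upper range. For instruction fetches (data == false) TBID masks out
 * TBI; select then picks the bit for the range being accessed.
 */
static bool model_tbi(uint64_t tcr, unsigned select, bool data)
{
    unsigned tbi = model_extract(tcr, 37, 2);

    if (!data) {
        tbi &= ~model_extract(tcr, 51, 2);
    }
    return (tbi >> select) & 1;
}
```

Keeping both ranges in one two-bit value is what lets the hflags code
store the result directly instead of computing the parameters twice.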

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200216194343.21331-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h |  3 --
 target/arm/helper.c    | 68 +++++++++++++++++++++++-------------------
 2 files changed, 37 insertions(+), 34 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
     unsigned tsz    : 8;
     unsigned select : 1;
     bool tbi        : 1;
-    bool tbid       : 1;
     bool epd        : 1;
     bool hpd        : 1;
     bool using16k   : 1;
     bool using64k   : 1;
 } ARMVAParameters;
 
-ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
-                                        ARMMMUIdx mmu_idx);
 ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
                                    ARMMMUIdx mmu_idx, bool data);
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
 }
 #endif /* !CONFIG_USER_ONLY */
 
-ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
-                                        ARMMMUIdx mmu_idx)
+static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
+{
+    if (regime_has_2_ranges(mmu_idx)) {
+        return extract64(tcr, 37, 2);
+    } else if (mmu_idx == ARMMMUIdx_Stage2) {
+        return 0; /* VTCR_EL2 */
+    } else {
+        return extract32(tcr, 20, 1);
+    }
+}
+
+static int aa64_va_parameter_tbid(uint64_t tcr, ARMMMUIdx mmu_idx)
+{
+    if (regime_has_2_ranges(mmu_idx)) {
+        return extract64(tcr, 51, 2);
+    } else if (mmu_idx == ARMMMUIdx_Stage2) {
+        return 0; /* VTCR_EL2 */
+    } else {
+        return extract32(tcr, 29, 1);
+    }
+}
+
+ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
+                                   ARMMMUIdx mmu_idx, bool data)
 {
     uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
-    bool tbi, tbid, epd, hpd, using16k, using64k;
-    int select, tsz;
+    bool epd, hpd, using16k, using64k;
+    int select, tsz, tbi;
 
     if (!regime_has_2_ranges(mmu_idx)) {
         select = 0;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
         using16k = extract32(tcr, 15, 1);
         if (mmu_idx == ARMMMUIdx_Stage2) {
             /* VTCR_EL2 */
-            tbi = tbid = hpd = false;
+            hpd = false;
         } else {
-            tbi = extract32(tcr, 20, 1);
             hpd = extract32(tcr, 24, 1);
-            tbid = extract32(tcr, 29, 1);
         }
         epd = false;
     } else {
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
             epd = extract32(tcr, 7, 1);
             using64k = extract32(tcr, 14, 1);
             using16k = extract32(tcr, 15, 1);
-            tbi = extract64(tcr, 37, 1);
             hpd = extract64(tcr, 41, 1);
-            tbid = extract64(tcr, 51, 1);
         } else {
             int tg = extract32(tcr, 30, 2);
             using16k = tg == 1;
             using64k = tg == 3;
             tsz = extract32(tcr, 16, 6);
             epd = extract32(tcr, 23, 1);
-            tbi = extract64(tcr, 38, 1);
             hpd = extract64(tcr, 42, 1);
-            tbid = extract64(tcr, 52, 1);
         }
     }
     tsz = MIN(tsz, 39);  /* TODO: ARMv8.4-TTST */
     tsz = MAX(tsz, 16);  /* TODO: ARMv8.2-LVA  */
 
+    /* Present TBI as a composite with TBID.  */
+    tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
+    if (!data) {
+        tbi &= ~aa64_va_parameter_tbid(tcr, mmu_idx);
+    }
+    tbi = (tbi >> select) & 1;
+
     return (ARMVAParameters) {
         .tsz = tsz,
         .select = select,
         .tbi = tbi,
-        .tbid = tbid,
         .epd = epd,
         .hpd = hpd,
         .using16k = using16k,
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters_both(CPUARMState *env, uint64_t va,
     };
 }
 
-ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
-                                   ARMMMUIdx mmu_idx, bool data)
-{
-    ARMVAParameters ret = aa64_va_parameters_both(env, va, mmu_idx);
-
-    /* Present TBI as a composite with TBID.  */
-    ret.tbi &= (data || !ret.tbid);
-    return ret;
-}
-
 #ifndef CONFIG_USER_ONLY
 static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
                                           ARMMMUIdx mmu_idx)
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
 {
     uint32_t flags = rebuild_hflags_aprofile(env);
     ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
-    ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
+    uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
     uint64_t sctlr;
     int tbii, tbid;
 
     flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
 
     /* Get control bits for tagged addresses.  */
-    if (regime_has_2_ranges(mmu_idx)) {
-        ARMVAParameters p1 = aa64_va_parameters_both(env, -1, stage1);
-        tbid = (p1.tbi << 1) | p0.tbi;
-        tbii = tbid & ~((p1.tbid << 1) | p0.tbid);
-    } else {
-        tbid = p0.tbi;
-        tbii = tbid & !p0.tbid;
-    }
+    tbid = aa64_va_parameter_tbi(tcr, mmu_idx);
+    tbii = tbid & ~aa64_va_parameter_tbid(tcr, mmu_idx);
 
     flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
     flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
-- 
2.20.1

Enforce a convention that an isar_feature function that tests a
32-bit ID register always has _aa32_ in its name, and one that
tests a 64-bit ID register always has _aa64_ in its name.
We already follow this except for three cases: thumb_div,
arm_div and jazelle, which all need _aa32_ adding.

(As noted in the comment, isar_feature_aa32_fp16_arith()
is an exception in that it currently tests ID_AA64PFR0_EL1,
but will switch to MVFR1 once we've properly implemented
FP16 for AArch32.)
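
For illustration, the two divide tests can be modelled as below. The
Model* names are hypothetical; the != 0 and > 1 comparisons come from
the diff, and the field position (ID_ISAR0 bits [27:24]) is taken from
the Arm architecture's register description rather than from this
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down stand-in for ARMISARegisters. */
typedef struct {
    uint32_t id_isar0;
} ModelISARegisters;

/* ID_ISAR0.Divide occupies bits [27:24]. */
static unsigned model_isar0_divide(const ModelISARegisters *id)
{
    return (id->id_isar0 >> 24) & 0xf;
}

/* Both tests read a 32-bit ID register, hence _aa32_ in the name. */
static bool model_feature_aa32_thumb_div(const ModelISARegisters *id)
{
    return model_isar0_divide(id) != 0;
}

static bool model_feature_aa32_arm_div(const ModelISARegisters *id)
{
    return model_isar0_divide(id) > 1;
}
```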

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200214175116.9164-2-peter.maydell@linaro.org
---
 target/arm/cpu.h       | 13 ++++++++++---
 target/arm/internals.h |  2 +-
 linux-user/elfload.c   |  4 ++--
 target/arm/cpu.c       |  6 ++++--
 target/arm/helper.c    |  2 +-
 target/arm/translate.c |  6 +++---
 6 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno)
 /* Shared between translate-sve.c and sve_helper.c.  */
 extern const uint64_t pred_esz_masks[4];
 
+/*
+ * Naming convention for isar_feature functions:
+ * Functions which test 32-bit ID registers should have _aa32_ in
+ * their name. Functions which test 64-bit ID registers should have
+ * _aa64_ in their name.
+ */
+
 /*
  * 32-bit feature tests via id registers.
  */
-static inline bool isar_feature_thumb_div(const ARMISARegisters *id)
+static inline bool isar_feature_aa32_thumb_div(const ARMISARegisters *id)
 {
     return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) != 0;
 }
 
-static inline bool isar_feature_arm_div(const ARMISARegisters *id)
+static inline bool isar_feature_aa32_arm_div(const ARMISARegisters *id)
 {
     return FIELD_EX32(id->id_isar0, ID_ISAR0, DIVIDE) > 1;
 }
 
-static inline bool isar_feature_jazelle(const ARMISARegisters *id)
+static inline bool isar_feature_aa32_jazelle(const ARMISARegisters *id)
 {
     return FIELD_EX32(id->id_isar1, ID_ISAR1, JAZELLE) != 0;
 }
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline uint32_t aarch32_cpsr_valid_mask(uint64_t features,
     if ((features >> ARM_FEATURE_THUMB2) & 1) {
         valid |= CPSR_IT;
     }
-    if (isar_feature_jazelle(id)) {
+    if (isar_feature_aa32_jazelle(id)) {
         valid |= CPSR_J;
     }
     if (isar_feature_aa32_pan(id)) {
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap(void)
     GET_FEATURE(ARM_FEATURE_VFP3, ARM_HWCAP_ARM_VFPv3);
     GET_FEATURE(ARM_FEATURE_V6K, ARM_HWCAP_ARM_TLS);
     GET_FEATURE(ARM_FEATURE_VFP4, ARM_HWCAP_ARM_VFPv4);
-    GET_FEATURE_ID(arm_div, ARM_HWCAP_ARM_IDIVA);
-    GET_FEATURE_ID(thumb_div, ARM_HWCAP_ARM_IDIVT);
+    GET_FEATURE_ID(aa32_arm_div, ARM_HWCAP_ARM_IDIVA);
+    GET_FEATURE_ID(aa32_thumb_div, ARM_HWCAP_ARM_IDIVT);
     /* All QEMU's VFPv3 CPUs have 32 registers, see VFP_DREG in translate.c.
      * Note that the ARM_HWCAP_ARM_VFPv3D16 bit is always the inverse of
      * ARM_HWCAP_ARM_VFPD32 (and so always clear for QEMU); it is unrelated
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
          * Presence of EL2 itself is ARM_FEATURE_EL2, and of the
          * Security Extensions is ARM_FEATURE_EL3.
          */
-        assert(!tcg_enabled() || no_aa32 || cpu_isar_feature(arm_div, cpu));
+        assert(!tcg_enabled() || no_aa32 ||
+               cpu_isar_feature(aa32_arm_div, cpu));
         set_feature(env, ARM_FEATURE_LPAE);
         set_feature(env, ARM_FEATURE_V7);
     }
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
     if (arm_feature(env, ARM_FEATURE_V6)) {
         set_feature(env, ARM_FEATURE_V5);
         if (!arm_feature(env, ARM_FEATURE_M)) {
-            assert(!tcg_enabled() || no_aa32 || cpu_isar_feature(jazelle, cpu));
+            assert(!tcg_enabled() || no_aa32 ||
+                   cpu_isar_feature(aa32_jazelle, cpu));
             set_feature(env, ARM_FEATURE_AUXCR);
         }
     }
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
     if (arm_feature(env, ARM_FEATURE_LPAE)) {
         define_arm_cp_regs(cpu, lpae_cp_reginfo);
     }
-    if (cpu_isar_feature(jazelle, cpu)) {
+    if (cpu_isar_feature(aa32_jazelle, cpu)) {
         define_arm_cp_regs(cpu, jazelle_regs);
     }
     /* Slightly awkwardly, the OMAP and StrongARM cores need all of
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@
 #define ENABLE_ARCH_5     arm_dc_feature(s, ARM_FEATURE_V5)
 /* currently all emulated v5 cores are also v5TE, so don't bother */
 #define ENABLE_ARCH_5TE   arm_dc_feature(s, ARM_FEATURE_V5)
-#define ENABLE_ARCH_5J    dc_isar_feature(jazelle, s)
+#define ENABLE_ARCH_5J    dc_isar_feature(aa32_jazelle, s)
 #define ENABLE_ARCH_6     arm_dc_feature(s, ARM_FEATURE_V6)
 #define ENABLE_ARCH_6K    arm_dc_feature(s, ARM_FEATURE_V6K)
 #define ENABLE_ARCH_6T2   arm_dc_feature(s, ARM_FEATURE_THUMB2)
@@ -XXX,XX +XXX,XX @@ static bool op_div(DisasContext *s, arg_rrr *a, bool u)
     TCGv_i32 t1, t2;
 
     if (s->thumb
-        ? !dc_isar_feature(thumb_div, s)
-        : !dc_isar_feature(arm_div, s)) {
+        ? !dc_isar_feature(aa32_thumb_div, s)
+        : !dc_isar_feature(aa32_arm_div, s)) {
         return false;
     }
 
-- 
2.20.1

Our current usage of the isar_feature feature tests almost always
uses an _aa32_ test when the code path is known to be AArch32
specific and an _aa64_ test when the code path is known to be
AArch64 specific. There is just one exception: in the vfp_set_fpscr
helper we check aa64_fp16 to determine whether the FZ16 bit in
the FP(S)CR exists, but this code is also used for AArch32.
There are other places in future where we're likely to want
a general "does this feature exist for either AArch32 or
AArch64" check (typically where architecturally the feature exists
for both CPU states if it exists at all, but the CPU might be
AArch32-only or AArch64-only, and so only have one set of ID
registers).

Introduce a new category of isar_feature_* functions:
isar_feature_any_foo() should be tested when what we want to
know is "does this feature exist for either AArch32 or AArch64",
and always returns the logical OR of isar_feature_aa32_foo()
and isar_feature_aa64_foo().
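
The shape of such a function can be sketched as follows. The "foo"
feature, the ModelIDRegs struct and the choice of the low four bits as
the feature field are all placeholders for this example and do not
correspond to real ID registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ID register pair for an imaginary feature "foo". */
typedef struct {
    uint32_t id32_foo;  /* 32-bit ID register, zero on AArch64-only CPUs */
    uint64_t id64_foo;  /* 64-bit ID register, zero on AArch32-only CPUs */
} ModelIDRegs;

static bool model_feature_aa32_foo(const ModelIDRegs *id)
{
    return (id->id32_foo & 0xf) != 0;
}

static bool model_feature_aa64_foo(const ModelIDRegs *id)
{
    return (id->id64_foo & 0xf) != 0;
}

/* "Does foo exist at all": true if either CPU state reports it. */
static bool model_feature_any_foo(const ModelIDRegs *id)
{
    return model_feature_aa64_foo(id) || model_feature_aa32_foo(id);
}
```

On an AArch32-only CPU the _aa64_ test alone would report false, which
is the situation the _any_ variant is meant to cover.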

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200214175116.9164-4-peter.maydell@linaro.org
---
 target/arm/cpu.h        | 19 ++++++++++++++++++-
 target/arm/vfp_helper.c |  2 +-
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ extern const uint64_t pred_esz_masks[4];
  * Naming convention for isar_feature functions:
  * Functions which test 32-bit ID registers should have _aa32_ in
  * their name. Functions which test 64-bit ID registers should have
- * _aa64_ in their name.
+ * _aa64_ in their name. These must only be used in code where we
+ * know for certain that the CPU has AArch32 or AArch64 respectively
+ * or where the correct answer for a CPU which doesn't implement that
+ * CPU state is "false" (eg when generating A32 or A64 code, if adding
+ * system registers that are specific to that CPU state, for "should
+ * we let this system register bit be set" tests where the 32-bit
+ * flavour of the register doesn't have the bit, and so on).
+ * Functions which simply ask "does this feature exist at all" have
+ * _any_ in their name, and always return the logical OR of the _aa64_
+ * and the _aa32_ function.
  */
 
 /*
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_bti(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, BT) != 0;
 }
 
+/*
+ * Feature tests for "does this exist in either 32-bit or 64-bit?"
+ */
+static inline bool isar_feature_any_fp16(const ARMISARegisters *id)
+{
+    return isar_feature_aa64_fp16(id) || isar_feature_aa32_fp16_arith(id);
+}
+
 /*
  * Forward to the above feature tests given an ARMCPU pointer.
  */
diff --git a/target/arm/vfp_helper.c b/target/arm/vfp_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vfp_helper.c
+++ b/target/arm/vfp_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t vfp_get_fpscr(CPUARMState *env)
 void HELPER(vfp_set_fpscr)(CPUARMState *env, uint32_t val)
 {
     /* When ARMv8.2-FP16 is not supported, FZ16 is RES0.  */
-    if (!cpu_isar_feature(aa64_fp16, env_archcpu(env))) {
+    if (!cpu_isar_feature(any_fp16, env_archcpu(env))) {
         val &= ~FPCR_FZ16;
     }
 
-- 
2.20.1

Instead of open-coding "ARM_FEATURE_AARCH64 ? aa64_predinv : aa32_predinv",
define and use an any_predinv isar_feature test function.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200214175116.9164-5-peter.maydell@linaro.org
---
 target/arm/cpu.h    | 5 +++++
 target/arm/helper.c | 9 +--------
 2 files changed, 6 insertions(+), 8 deletions(-)

Pull the code that defines the various PMU registers out
into its own function, matching the pattern we have
already for the debug registers.

Apart from one style fix to a multi-line comment, this
is purely movement of code with no changes to it.
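
As a reading aid for the moved loop, the sketch below reproduces how
the crm and opc2 values of the per-counter registers are derived from
the counter index. ModelPmevEncoding and model_pmev_encoding are names
invented here; the expressions are the ones in the diff, with a base
crm of 8 for PMEVCNTR<n> and 12 for PMEVTYPER<n>.

```c
/* Hypothetical holder for the two encoding fields that vary with i. */
typedef struct {
    unsigned crm;
    unsigned opc2;
} ModelPmevEncoding;

/*
 * The low three bits of the counter index go into opc2 and the next
 * two bits are ORed into crm on top of the register family's base.
 */
static ModelPmevEncoding model_pmev_encoding(unsigned base_crm, unsigned i)
{
    ModelPmevEncoding e = {
        .crm = base_crm | (3 & (i >> 3)),
        .opc2 = i & 7,
    };
    return e;
}
```

With pmcrn = 4 only indexes 0 to 3 are registered, so crm stays at the
base value; the upper bits matter only if more counters are
implemented.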

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200214175116.9164-6-peter.maydell@linaro.org
---
 target/arm/helper.c | 158 +++++++++++++++++++++++---------------------
 1 file changed, 82 insertions(+), 76 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
     }
 }
 
+static void define_pmu_regs(ARMCPU *cpu)
+{
+    /*
+     * v7 performance monitor control register: same implementor
+     * field as main ID register, and we implement four counters in
+     * addition to the cycle count register.
+     */
+    unsigned int i, pmcrn = 4;
+    ARMCPRegInfo pmcr = {
+        .name = "PMCR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 0,
+        .access = PL0_RW,
+        .type = ARM_CP_IO | ARM_CP_ALIAS,
+        .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmcr),
+        .accessfn = pmreg_access, .writefn = pmcr_write,
+        .raw_writefn = raw_write,
+    };
+    ARMCPRegInfo pmcr64 = {
+        .name = "PMCR_EL0", .state = ARM_CP_STATE_AA64,
+        .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 0,
+        .access = PL0_RW, .accessfn = pmreg_access,
+        .type = ARM_CP_IO,
+        .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcr),
+        .resetvalue = (cpu->midr & 0xff000000) | (pmcrn << PMCRN_SHIFT),
+        .writefn = pmcr_write, .raw_writefn = raw_write,
+    };
+    define_one_arm_cp_reg(cpu, &pmcr);
+    define_one_arm_cp_reg(cpu, &pmcr64);
+    for (i = 0; i < pmcrn; i++) {
+        char *pmevcntr_name = g_strdup_printf("PMEVCNTR%d", i);
+        char *pmevcntr_el0_name = g_strdup_printf("PMEVCNTR%d_EL0", i);
+        char *pmevtyper_name = g_strdup_printf("PMEVTYPER%d", i);
+        char *pmevtyper_el0_name = g_strdup_printf("PMEVTYPER%d_EL0", i);
+        ARMCPRegInfo pmev_regs[] = {
+            { .name = pmevcntr_name, .cp = 15, .crn = 14,
+              .crm = 8 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
+              .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
+              .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
+              .accessfn = pmreg_access },
+            { .name = pmevcntr_el0_name, .state = ARM_CP_STATE_AA64,
+              .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 8 | (3 & (i >> 3)),
+              .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
+              .type = ARM_CP_IO,
+              .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
+              .raw_readfn = pmevcntr_rawread,
+              .raw_writefn = pmevcntr_rawwrite },
+            { .name = pmevtyper_name, .cp = 15, .crn = 14,
+              .crm = 12 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
+              .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
+              .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
+              .accessfn = pmreg_access },
+            { .name = pmevtyper_el0_name, .state = ARM_CP_STATE_AA64,
+              .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 12 | (3 & (i >> 3)),
+              .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
+              .type = ARM_CP_IO,
+              .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
+              .raw_writefn = pmevtyper_rawwrite },
+            REGINFO_SENTINEL
+        };
+        define_arm_cp_regs(cpu, pmev_regs);
+        g_free(pmevcntr_name);
+        g_free(pmevcntr_el0_name);
+        g_free(pmevtyper_name);
+        g_free(pmevtyper_el0_name);
+    }
+    if (FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) >= 4 &&
+            FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) != 0xf) {
+        ARMCPRegInfo v81_pmu_regs[] = {
+            { .name = "PMCEID2", .state = ARM_CP_STATE_AA32,
+              .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 4,
+              .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
+              .resetvalue = extract64(cpu->pmceid0, 32, 32) },
+            { .name = "PMCEID3", .state = ARM_CP_STATE_AA32,
+              .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 5,
+              .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
+              .resetvalue = extract64(cpu->pmceid1, 32, 32) },
+            REGINFO_SENTINEL
+        };
+        define_arm_cp_regs(cpu, v81_pmu_regs);
+    }
+}
+
 /* We don't know until after realize whether there's a GICv3
  * attached, and that is what registers the gicv3 sysregs.
  * So we have to fill in the GIC fields in ID_PFR/ID_PFR1_EL1/ID_AA64PFR0_EL1
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
         define_arm_cp_regs(cpu, pmovsset_cp_reginfo);
     }
     if (arm_feature(env, ARM_FEATURE_V7)) {
-        /* v7 performance monitor control register: same implementor
-         * field as main ID register, and we implement four counters in
-         * addition to the cycle count register.
-         */
-        unsigned int i, pmcrn = 4;
-        ARMCPRegInfo pmcr = {
-            .name = "PMCR", .cp = 15, .crn = 9, .crm = 12, .opc1 = 0, .opc2 = 0,
-            .access = PL0_RW,
-            .type = ARM_CP_IO | ARM_CP_ALIAS,
-            .fieldoffset = offsetoflow32(CPUARMState, cp15.c9_pmcr),
-            .accessfn = pmreg_access, .writefn = pmcr_write,
-            .raw_writefn = raw_write,
-        };
-        ARMCPRegInfo pmcr64 = {
-            .name = "PMCR_EL0", .state = ARM_CP_STATE_AA64,
-            .opc0 = 3, .opc1 = 3, .crn = 9, .crm = 12, .opc2 = 0,
-            .access = PL0_RW, .accessfn = pmreg_access,
-            .type = ARM_CP_IO,
-            .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcr),
-            .resetvalue = (cpu->midr & 0xff000000) | (pmcrn << PMCRN_SHIFT),
-            .writefn = pmcr_write, .raw_writefn = raw_write,
-        };
-        define_one_arm_cp_reg(cpu, &pmcr);
-        define_one_arm_cp_reg(cpu, &pmcr64);
-        for (i = 0; i < pmcrn; i++) {
-            char *pmevcntr_name = g_strdup_printf("PMEVCNTR%d", i);
-            char *pmevcntr_el0_name = g_strdup_printf("PMEVCNTR%d_EL0", i);
-            char *pmevtyper_name = g_strdup_printf("PMEVTYPER%d", i);
-            char *pmevtyper_el0_name = g_strdup_printf("PMEVTYPER%d_EL0", i);
-            ARMCPRegInfo pmev_regs[] = {
-                { .name = pmevcntr_name, .cp = 15, .crn = 14,
-                  .crm = 8 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
-                  .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
-                  .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
-                  .accessfn = pmreg_access },
-                { .name = pmevcntr_el0_name, .state = ARM_CP_STATE_AA64,
-                  .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 8 | (3 & (i >> 3)),
-                  .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
-                  .type = ARM_CP_IO,
-                  .readfn = pmevcntr_readfn, .writefn = pmevcntr_writefn,
-                  .raw_readfn = pmevcntr_rawread,
-                  .raw_writefn = pmevcntr_rawwrite },
-                { .name = pmevtyper_name, .cp = 15, .crn = 14,
-                  .crm = 12 | (3 & (i >> 3)), .opc1 = 0, .opc2 = i & 7,
-                  .access = PL0_RW, .type = ARM_CP_IO | ARM_CP_ALIAS,
-                  .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
-                  .accessfn = pmreg_access },
-                { .name = pmevtyper_el0_name, .state = ARM_CP_STATE_AA64,
-                  .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 12 | (3 & (i >> 3)),
-                  .opc2 = i & 7, .access = PL0_RW, .accessfn = pmreg_access,
-                  .type = ARM_CP_IO,
-                  .readfn = pmevtyper_readfn, .writefn = pmevtyper_writefn,
-                  .raw_writefn = pmevtyper_rawwrite },
-                REGINFO_SENTINEL
-            };
-            define_arm_cp_regs(cpu, pmev_regs);
-            g_free(pmevcntr_name);
-            g_free(pmevcntr_el0_name);
-            g_free(pmevtyper_name);
-            g_free(pmevtyper_el0_name);
-        }
         ARMCPRegInfo clidr = {
             .name = "CLIDR", .state = ARM_CP_STATE_BOTH,
             .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 1, .opc2 = 1,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
         define_one_arm_cp_reg(cpu, &clidr);
         define_arm_cp_regs(cpu, v7_cp_reginfo);
         define_debug_regs(cpu);
+        define_pmu_regs(cpu);
     } else {
         define_arm_cp_regs(cpu, not_v7_cp_reginfo);
     }
-    if (FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) >= 4 &&
-            FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) != 0xf) {
-        ARMCPRegInfo v81_pmu_regs[] = {
-            { .name = "PMCEID2", .state = ARM_CP_STATE_AA32,
-              .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 4,
-              .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
-              .resetvalue = extract64(cpu->pmceid0, 32, 32) },
-            { .name = "PMCEID3", .state = ARM_CP_STATE_AA32,
-              .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 5,
-              .access = PL0_R, .accessfn = pmreg_access, .type = ARM_CP_CONST,
-              .resetvalue = extract64(cpu->pmceid1, 32, 32) },
-            REGINFO_SENTINEL
-        };
-        define_arm_cp_regs(cpu, v81_pmu_regs);
-    }
     if (arm_feature(env, ARM_FEATURE_V8)) {
         /* AArch64 ID registers, which all have impdef reset values.
          * Note that within the ID register ranges the unused slots
-- 
2.20.1

Add FIELD() definitions for the ID_AA64DFR0_EL1 register and use them
where we currently have hard-coded bit values.
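
As a rough illustration of why the named fields are equivalent to the
old constants, the sketch below uses simplified stand-ins for the
field accessors. The ex64()/dp64() helpers and the PMUVER_* enum are
assumptions made for this example, not the real hw/registerfields.h
macros. It shows that clearing PMUVER (bits [11:8]) by name gives the
same result as the old hard-coded ~0xf00 mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FIELD_EX64(): extract a bitfield. */
static inline uint64_t ex64(uint64_t reg, unsigned shift, unsigned len)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    return (reg >> shift) & ((UINT64_C(1) << len) - 1);
}

/* Hypothetical stand-in for FIELD_DP64(): deposit a value into a bitfield. */
static inline uint64_t dp64(uint64_t reg, unsigned shift, unsigned len,
                            uint64_t val)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    uint64_t mask = ((UINT64_C(1) << len) - 1) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Position taken from FIELD(ID_AA64DFR0, PMUVER, 8, 4). */
enum { PMUVER_SHIFT = 8, PMUVER_LEN = 4 };

/* Read the PMUVER field of an ID_AA64DFR0_EL1 value. */
uint64_t get_pmuver(uint64_t id_aa64dfr0)
{
    return ex64(id_aa64dfr0, PMUVER_SHIFT, PMUVER_LEN);
}

/* Clear PMUVER, as the "no PMU" path in arm_cpu_realizefn() does. */
uint64_t clear_pmuver(uint64_t id_aa64dfr0)
{
    return dp64(id_aa64dfr0, PMUVER_SHIFT, PMUVER_LEN, 0);
}
```

With the Cortex-A57 value 0x10305106 used elsewhere in this series,
get_pmuver() yields 1 and clear_pmuver() yields 0x10305006.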

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200214175116.9164-7-peter.maydell@linaro.org
---
 target/arm/cpu.h    | 10 ++++++++++
 target/arm/cpu.c    |  2 +-
 target/arm/helper.c |  6 +++---
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(ID_AA64MMFR2, BBM, 52, 4)
 FIELD(ID_AA64MMFR2, EVT, 56, 4)
 FIELD(ID_AA64MMFR2, E0PD, 60, 4)
 
+FIELD(ID_AA64DFR0, DEBUGVER, 0, 4)
+FIELD(ID_AA64DFR0, TRACEVER, 4, 4)
+FIELD(ID_AA64DFR0, PMUVER, 8, 4)
+FIELD(ID_AA64DFR0, BRPS, 12, 4)
+FIELD(ID_AA64DFR0, WRPS, 20, 4)
+FIELD(ID_AA64DFR0, CTX_CMPS, 28, 4)
+FIELD(ID_AA64DFR0, PMSVER, 32, 4)
+FIELD(ID_AA64DFR0, DOUBLELOCK, 36, 4)
+FIELD(ID_AA64DFR0, TRACEFILT, 40, 4)
+
 FIELD(ID_DFR0, COPDBG, 0, 4)
 FIELD(ID_DFR0, COPSDBG, 4, 4)
 FIELD(ID_DFR0, MMAPDBG, 8, 4)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
                 cpu);
 #endif
     } else {
-        cpu->id_aa64dfr0 &= ~0xf00;
+        cpu->id_aa64dfr0 = FIELD_DP64(cpu->id_aa64dfr0, ID_AA64DFR0, PMUVER, 0);
         cpu->id_dfr0 &= ~(0xf << 24);
         cpu->pmceid0 = 0;
         cpu->pmceid1 = 0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
      * check that if they both exist then they agree.
      */
     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
-        assert(extract32(cpu->id_aa64dfr0, 12, 4) == brps);
-        assert(extract32(cpu->id_aa64dfr0, 20, 4) == wrps);
-        assert(extract32(cpu->id_aa64dfr0, 28, 4) == ctx_cmps);
+        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, BRPS) == brps);
+        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, WRPS) == wrps);
+        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, CTX_CMPS) == ctx_cmps);
     }
 
     define_one_arm_cp_reg(cpu, &dbgdidr);
-- 
2.20.1

Instead of open-coding a check on the ID_DFR0 PerfMon ID register
field, create a standardly-named isar_feature for "does AArch32 have
a v8.1 PMUv3" and use it.

This entails moving the id_dfr0 field into the ARMISARegisters struct.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200214175116.9164-9-peter.maydell@linaro.org
---
 target/arm/cpu.h      |  9 ++++++++-
 hw/intc/armv7m_nvic.c |  2 +-
 target/arm/cpu.c      | 28 ++++++++++++++--------------
 target/arm/cpu64.c    |  6 +++---
 target/arm/helper.c   |  5 ++---
 5 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
         uint32_t mvfr0;
         uint32_t mvfr1;
         uint32_t mvfr2;
+        uint32_t id_dfr0;
         uint64_t id_aa64isar0;
         uint64_t id_aa64isar1;
         uint64_t id_aa64pfr0;
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
     uint32_t reset_sctlr;
     uint32_t id_pfr0;
     uint32_t id_pfr1;
-    uint32_t id_dfr0;
     uint64_t pmceid0;
     uint64_t pmceid1;
     uint32_t id_afr0;
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_ats1e1(const ARMISARegisters *id)
     return FIELD_EX64(id->mvfr0, ID_MMFR3, PAN) >= 2;
 }
 
+static inline bool isar_feature_aa32_pmu_8_1(const ARMISARegisters *id)
+{
+    /* 0xf means "non-standard IMPDEF PMU" */
+    return FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) >= 4 &&
+        FIELD_EX32(id->id_dfr0, ID_DFR0, PERFMON) != 0xf;
+}
+
 /*
  * 64-bit feature tests via id registers.
  */
diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
     case 0xd44: /* PFR1.  */
         return cpu->id_pfr1;
     case 0xd48: /* DFR0.  */
-        return cpu->id_dfr0;
+        return cpu->isar.id_dfr0;
     case 0xd4c: /* AFR0.  */
         return cpu->id_afr0;
     case 0xd50: /* MMFR0.  */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 #endif
     } else {
         cpu->id_aa64dfr0 = FIELD_DP64(cpu->id_aa64dfr0, ID_AA64DFR0, PMUVER, 0);
-        cpu->id_dfr0 = FIELD_DP32(cpu->id_dfr0, ID_DFR0, PERFMON, 0);
+        cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, PERFMON, 0);
         cpu->pmceid0 = 0;
         cpu->pmceid1 = 0;
     }
@@ -XXX,XX +XXX,XX @@ static void arm1136_r2_initfn(Object *obj)
     cpu->reset_sctlr = 0x00050078;
     cpu->id_pfr0 = 0x111;
     cpu->id_pfr1 = 0x1;
-    cpu->id_dfr0 = 0x2;
+    cpu->isar.id_dfr0 = 0x2;
     cpu->id_afr0 = 0x3;
     cpu->id_mmfr0 = 0x01130003;
     cpu->id_mmfr1 = 0x10030302;
@@ -XXX,XX +XXX,XX @@ static void arm1136_initfn(Object *obj)
     cpu->reset_sctlr = 0x00050078;
     cpu->id_pfr0 = 0x111;
     cpu->id_pfr1 = 0x1;
-    cpu->id_dfr0 = 0x2;
+    cpu->isar.id_dfr0 = 0x2;
     cpu->id_afr0 = 0x3;
     cpu->id_mmfr0 = 0x01130003;
     cpu->id_mmfr1 = 0x10030302;
@@ -XXX,XX +XXX,XX @@ static void arm1176_initfn(Object *obj)
     cpu->reset_sctlr = 0x00050078;
     cpu->id_pfr0 = 0x111;
     cpu->id_pfr1 = 0x11;
-    cpu->id_dfr0 = 0x33;
+    cpu->isar.id_dfr0 = 0x33;
     cpu->id_afr0 = 0;
     cpu->id_mmfr0 = 0x01130003;
     cpu->id_mmfr1 = 0x10030302;
@@ -XXX,XX +XXX,XX @@ static void arm11mpcore_initfn(Object *obj)
     cpu->ctr = 0x1d192992; /* 32K icache 32K dcache */
     cpu->id_pfr0 = 0x111;
     cpu->id_pfr1 = 0x1;
-    cpu->id_dfr0 = 0;
+    cpu->isar.id_dfr0 = 0;
     cpu->id_afr0 = 0x2;
     cpu->id_mmfr0 = 0x01100103;
     cpu->id_mmfr1 = 0x10020302;
@@ -XXX,XX +XXX,XX @@ static void cortex_m3_initfn(Object *obj)
     cpu->pmsav7_dregion = 8;
     cpu->id_pfr0 = 0x00000030;
     cpu->id_pfr1 = 0x00000200;
-    cpu->id_dfr0 = 0x00100000;
+    cpu->isar.id_dfr0 = 0x00100000;
     cpu->id_afr0 = 0x00000000;
     cpu->id_mmfr0 = 0x00000030;
     cpu->id_mmfr1 = 0x00000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_m4_initfn(Object *obj)
     cpu->isar.mvfr2 = 0x00000000;
     cpu->id_pfr0 = 0x00000030;
     cpu->id_pfr1 = 0x00000200;
-    cpu->id_dfr0 = 0x00100000;
+    cpu->isar.id_dfr0 = 0x00100000;
     cpu->id_afr0 = 0x00000000;
     cpu->id_mmfr0 = 0x00000030;
     cpu->id_mmfr1 = 0x00000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_m7_initfn(Object *obj)
     cpu->isar.mvfr2 = 0x00000040;
     cpu->id_pfr0 = 0x00000030;
     cpu->id_pfr1 = 0x00000200;
-    cpu->id_dfr0 = 0x00100000;
+    cpu->isar.id_dfr0 = 0x00100000;
     cpu->id_afr0 = 0x00000000;
     cpu->id_mmfr0 = 0x00100030;
     cpu->id_mmfr1 = 0x00000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_m33_initfn(Object *obj)
     cpu->isar.mvfr2 = 0x00000040;
     cpu->id_pfr0 = 0x00000030;
     cpu->id_pfr1 = 0x00000210;
-    cpu->id_dfr0 = 0x00200000;
+    cpu->isar.id_dfr0 = 0x00200000;
     cpu->id_afr0 = 0x00000000;
     cpu->id_mmfr0 = 0x00101F40;
     cpu->id_mmfr1 = 0x00000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj)
     cpu->midr = 0x411fc153; /* r1p3 */
     cpu->id_pfr0 = 0x0131;
     cpu->id_pfr1 = 0x001;
-    cpu->id_dfr0 = 0x010400;
+    cpu->isar.id_dfr0 = 0x010400;
     cpu->id_afr0 = 0x0;
     cpu->id_mmfr0 = 0x0210030;
     cpu->id_mmfr1 = 0x00000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_a8_initfn(Object *obj)
     cpu->reset_sctlr = 0x00c50078;
     cpu->id_pfr0 = 0x1031;
     cpu->id_pfr1 = 0x11;
-    cpu->id_dfr0 = 0x400;
+    cpu->isar.id_dfr0 = 0x400;
     cpu->id_afr0 = 0;
     cpu->id_mmfr0 = 0x31100003;
     cpu->id_mmfr1 = 0x20000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_a9_initfn(Object *obj)
     cpu->reset_sctlr = 0x00c50078;
     cpu->id_pfr0 = 0x1031;
     cpu->id_pfr1 = 0x11;
-    cpu->id_dfr0 = 0x000;
+    cpu->isar.id_dfr0 = 0x000;
     cpu->id_afr0 = 0;
     cpu->id_mmfr0 = 0x00100103;
     cpu->id_mmfr1 = 0x20000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_a7_initfn(Object *obj)
     cpu->reset_sctlr = 0x00c50078;
     cpu->id_pfr0 = 0x00001131;
     cpu->id_pfr1 = 0x00011011;
-    cpu->id_dfr0 = 0x02010555;
+    cpu->isar.id_dfr0 = 0x02010555;
     cpu->id_afr0 = 0x00000000;
     cpu->id_mmfr0 = 0x10101105;
     cpu->id_mmfr1 = 0x40000000;
@@ -XXX,XX +XXX,XX @@ static void cortex_a15_initfn(Object *obj)
     cpu->reset_sctlr = 0x00c50078;
     cpu->id_pfr0 = 0x00001131;
     cpu->id_pfr1 = 0x00011011;
-    cpu->id_dfr0 = 0x02010555;
+    cpu->isar.id_dfr0 = 0x02010555;
     cpu->id_afr0 = 0x00000000;
     cpu->id_mmfr0 = 0x10201105;
     cpu->id_mmfr1 = 0x20000000;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a57_initfn(Object *obj)
     cpu->reset_sctlr = 0x00c50838;
     cpu->id_pfr0 = 0x00000131;
     cpu->id_pfr1 = 0x00011011;
-    cpu->id_dfr0 = 0x03010066;
+    cpu->isar.id_dfr0 = 0x03010066;
     cpu->id_afr0 = 0x00000000;
     cpu->id_mmfr0 = 0x10101105;
     cpu->id_mmfr1 = 0x40000000;
@@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj)
     cpu->reset_sctlr = 0x00c50838;
     cpu->id_pfr0 = 0x00000131;
     cpu->id_pfr1 = 0x00011011;
-    cpu->id_dfr0 = 0x03010066;
+    cpu->isar.id_dfr0 = 0x03010066;
     cpu->id_afr0 = 0x00000000;
     cpu->id_mmfr0 = 0x10101105;
     cpu->id_mmfr1 = 0x40000000;
@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
     cpu->reset_sctlr = 0x00c50838;
     cpu->id_pfr0 = 0x00000131;
     cpu->id_pfr1 = 0x00011011;
-    cpu->id_dfr0 = 0x03010066;
+    cpu->isar.id_dfr0 = 0x03010066;
     cpu->id_afr0 = 0x00000000;
     cpu->id_mmfr0 = 0x10201105;
     cpu->id_mmfr1 = 0x40000000;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
         g_free(pmevtyper_name);
         g_free(pmevtyper_el0_name);
     }
-    if (FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) >= 4 &&
-            FIELD_EX32(cpu->id_dfr0, ID_DFR0, PERFMON) != 0xf) {
+    if (cpu_isar_feature(aa32_pmu_8_1, cpu)) {
         ARMCPRegInfo v81_pmu_regs[] = {
             { .name = "PMCEID2", .state = ARM_CP_STATE_AA32,
               .cp = 15, .opc1 = 0, .crn = 9, .crm = 14, .opc2 = 4,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 2,
               .access = PL1_R, .type = ARM_CP_CONST,
               .accessfn = access_aa32_tid3,
-              .resetvalue = cpu->id_dfr0 },
+              .resetvalue = cpu->isar.id_dfr0 },
             { .name = "ID_AFR0", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 3,
               .access = PL1_R, .type = ARM_CP_CONST,
-- 
2.20.1

Add the 64-bit version of the "is this a v8.1 PMUv3?"
ID register check function, and the _any_ version that
checks for either AArch32 or AArch64 support. We'll use
this in a later commit.
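
For reference, here is a minimal standalone sketch of how the three
checks relate. The struct id_regs type, the field() helper and the
unprefixed function names are assumptions made for this example; the
real code uses ARMISARegisters and the FIELD_EX32()/FIELD_EX64()
macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced copy of the ID register struct, for illustration. */
struct id_regs {
    uint32_t id_dfr0;
    uint64_t id_aa64dfr0;
};

/* Extract a len-bit field starting at bit 'shift'. */
static inline uint64_t field(uint64_t reg, unsigned shift, unsigned len)
{
    assert(len > 0 && len < 64 && shift + len <= 64);
    return (reg >> shift) & ((UINT64_C(1) << len) - 1);
}

/* AArch32: ID_DFR0.PerfMon is bits [27:24]; 0xf means IMPDEF PMU. */
bool aa32_pmu_8_1(const struct id_regs *id)
{
    uint64_t perfmon = field(id->id_dfr0, 24, 4);
    return perfmon >= 4 && perfmon != 0xf;
}

/* AArch64: ID_AA64DFR0_EL1.PMUVer is bits [11:8]; 0xf means IMPDEF PMU. */
bool aa64_pmu_8_1(const struct id_regs *id)
{
    uint64_t pmuver = field(id->id_aa64dfr0, 8, 4);
    return pmuver >= 4 && pmuver != 0xf;
}

/* "any": the feature exists in at least one execution state. */
bool any_pmu_8_1(const struct id_regs *id)
{
    return aa64_pmu_8_1(id) || aa32_pmu_8_1(id);
}
```

With the Cortex-A57 values from this series (id_dfr0 = 0x03010066,
id_aa64dfr0 = 0x10305106) all three checks are false, since both
fields report a PMU version below 4.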

We don't (yet) do any isar_feature checks on ID_AA64DFR1_EL1,
but we move id_aa64dfr1 into the ARMISARegisters struct with
id_aa64dfr0, for consistency.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200214175116.9164-10-peter.maydell@linaro.org
---
 target/arm/cpu.h    | 15 +++++++++++++--
 target/arm/cpu.c    |  3 ++-
 target/arm/cpu64.c  |  6 +++---
 target/arm/helper.c | 12 +++++++-----
 4 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
         uint64_t id_aa64mmfr0;
         uint64_t id_aa64mmfr1;
         uint64_t id_aa64mmfr2;
+        uint64_t id_aa64dfr0;
+        uint64_t id_aa64dfr1;
     } isar;
     uint32_t midr;
     uint32_t revidr;
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
     uint32_t id_mmfr2;
     uint32_t id_mmfr3;
     uint32_t id_mmfr4;
-    uint64_t id_aa64dfr0;
-    uint64_t id_aa64dfr1;
     uint64_t id_aa64afr0;
     uint64_t id_aa64afr1;
     uint32_t dbgdidr;
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_bti(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, BT) != 0;
 }
 
+static inline bool isar_feature_aa64_pmu_8_1(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) >= 4 &&
+        FIELD_EX64(id->id_aa64dfr0, ID_AA64DFR0, PMUVER) != 0xf;
+}
+
 /*
  * Feature tests for "does this exist in either 32-bit or 64-bit?"
  */
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_any_predinv(const ARMISARegisters *id)
     return isar_feature_aa64_predinv(id) || isar_feature_aa32_predinv(id);
 }
 
+static inline bool isar_feature_any_pmu_8_1(const ARMISARegisters *id)
+{
+    return isar_feature_aa64_pmu_8_1(id) || isar_feature_aa32_pmu_8_1(id);
+}
+
 /*
  * Forward to the above feature tests given an ARMCPU pointer.
  */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
                 cpu);
 #endif
     } else {
-        cpu->id_aa64dfr0 = FIELD_DP64(cpu->id_aa64dfr0, ID_AA64DFR0, PMUVER, 0);
+        cpu->isar.id_aa64dfr0 =
+            FIELD_DP64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, PMUVER, 0);
         cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, PERFMON, 0);
         cpu->pmceid0 = 0;
         cpu->pmceid1 = 0;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a57_initfn(Object *obj)
     cpu->isar.id_isar5 = 0x00011121;
     cpu->isar.id_isar6 = 0;
     cpu->isar.id_aa64pfr0 = 0x00002222;
-    cpu->id_aa64dfr0 = 0x10305106;
+    cpu->isar.id_aa64dfr0 = 0x10305106;
     cpu->isar.id_aa64isar0 = 0x00011120;
     cpu->isar.id_aa64mmfr0 = 0x00001124;
     cpu->dbgdidr = 0x3516d000;
@@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj)
     cpu->isar.id_isar5 = 0x00011121;
     cpu->isar.id_isar6 = 0;
     cpu->isar.id_aa64pfr0 = 0x00002222;
-    cpu->id_aa64dfr0 = 0x10305106;
+    cpu->isar.id_aa64dfr0 = 0x10305106;
     cpu->isar.id_aa64isar0 = 0x00011120;
     cpu->isar.id_aa64mmfr0 = 0x00001122; /* 40 bit physical addr */
     cpu->dbgdidr = 0x3516d000;
@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
     cpu->isar.id_isar4 = 0x00011142;
     cpu->isar.id_isar5 = 0x00011121;
     cpu->isar.id_aa64pfr0 = 0x00002222;
-    cpu->id_aa64dfr0 = 0x10305106;
+    cpu->isar.id_aa64dfr0 = 0x10305106;
     cpu->isar.id_aa64isar0 = 0x00011120;
     cpu->isar.id_aa64mmfr0 = 0x00001124;
     cpu->dbgdidr = 0x3516d000;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/semihosting/semihost.h"
 #include "sysemu/cpus.h"
 #include "sysemu/kvm.h"
+#include "sysemu/tcg.h"
 #include "qemu/range.h"
 #include "qapi/qapi-commands-machine-target.h"
 #include "qapi/error.h"
@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
      * check that if they both exist then they agree.
      */
     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
-        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, BRPS) == brps);
-        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, WRPS) == wrps);
-        assert(FIELD_EX64(cpu->id_aa64dfr0, ID_AA64DFR0, CTX_CMPS) == ctx_cmps);
+        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, BRPS) == brps);
+        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, WRPS) == wrps);
+        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, CTX_CMPS)
+               == ctx_cmps);
     }
 
     define_one_arm_cp_reg(cpu, &dbgdidr);
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 5, .opc2 = 0,
               .access = PL1_R, .type = ARM_CP_CONST,
               .accessfn = access_aa64_tid3,
-              .resetvalue = cpu->id_aa64dfr0 },
+              .resetvalue = cpu->isar.id_aa64dfr0 },
             { .name = "ID_AA64DFR1_EL1", .state = ARM_CP_STATE_AA64,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 5, .opc2 = 1,
               .access = PL1_R, .type = ARM_CP_CONST,
               .accessfn = access_aa64_tid3,
-              .resetvalue = cpu->id_aa64dfr1 },
+              .resetvalue = cpu->isar.id_aa64dfr1 },
             { .name = "ID_AA64DFR2_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 5, .opc2 = 2,
               .access = PL1_R, .type = ARM_CP_CONST,
-- 
2.20.1

The AArch32 DBGDIDR defines properties like the number of
breakpoints, watchpoints and context-matching comparators.  On an
AArch64 CPU, the register may not even exist if AArch32 is not
supported at EL1.

Currently we hard-code use of DBGDIDR to identify the number of
breakpoints etc; this works for all our TCG CPUs, but will break if
we ever add an AArch64-only CPU.  We also have an assert() that the
AArch32 and AArch64 registers match, which currently works only by
luck for KVM because we don't populate either of these ID registers
from the KVM vCPU and so they are both zero.

Clean this up so we have functions for finding the number
of breakpoints, watchpoints and context comparators which look
in the appropriate ID register.

This allows us to drop the "check that AArch64 and AArch32 agree
on the number of breakpoints etc" asserts:
 * we no longer look at the AArch32 versions unless that's the
   right place to be looking
 * it's valid to have a CPU (eg AArch64-only) where they don't match
 * we shouldn't have been asserting the validity of ID registers
   in a codepath used with KVM anyway
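
A minimal standalone sketch of the new lookup, assuming a reduced
struct debug_ids in place of ARMCPU and plain shifts in place of the
FIELD_EX32()/FIELD_EX64() macros (both are assumptions made for this
example). Note that the helpers return the actual count, not the
"count minus 1" stored in the register, so callers compare with
">= brps" rather than "> brps".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced CPU description, for illustration only. */
struct debug_ids {
    bool aarch64;          /* stands in for ARM_FEATURE_AARCH64 */
    uint64_t id_aa64dfr0;  /* consulted when aarch64 is true */
    uint32_t dbgdidr;      /* consulted otherwise */
};

/* Each ID field holds "number of Xs minus 1"; return the real number. */
int num_brps(const struct debug_ids *c)
{
    return c->aarch64 ? (int)((c->id_aa64dfr0 >> 12) & 0xf) + 1
                      : (int)((c->dbgdidr >> 24) & 0xf) + 1;
}

int num_wrps(const struct debug_ids *c)
{
    return c->aarch64 ? (int)((c->id_aa64dfr0 >> 20) & 0xf) + 1
                      : (int)((c->dbgdidr >> 28) & 0xf) + 1;
}

int num_ctx_cmps(const struct debug_ids *c)
{
    return c->aarch64 ? (int)((c->id_aa64dfr0 >> 28) & 0xf) + 1
                      : (int)((c->dbgdidr >> 20) & 0xf) + 1;
}

/*
 * Range check corresponding to the one in linked_bp_matches(): only the
 * highest-numbered ctx_cmps breakpoints are context-aware.
 */
bool lbn_is_context_aware(const struct debug_ids *c, int lbn)
{
    int brps = num_brps(c);
    int ctx_cmps = num_ctx_cmps(c);

    assert(ctx_cmps <= brps);
    return lbn < brps && lbn >= brps - ctx_cmps;
}
```

For the Cortex-A57 values in this series (id_aa64dfr0 = 0x10305106,
dbgdidr = 0x3516d000) both registers describe 6 breakpoints,
4 watchpoints and 2 context comparators, so only breakpoints 4 and 5
are valid link targets.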

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214175116.9164-11-peter.maydell@linaro.org
---
 target/arm/cpu.h          |  7 +++++++
 target/arm/internals.h    | 42 +++++++++++++++++++++++++++++++++++++++
 target/arm/debug_helper.c |  6 +++---
 target/arm/helper.c       | 21 +++++---------------
 4 files changed, 57 insertions(+), 19 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(ID_DFR0, MPROFDBG, 20, 4)
 FIELD(ID_DFR0, PERFMON, 24, 4)
 FIELD(ID_DFR0, TRACEFILT, 28, 4)
 
+FIELD(DBGDIDR, SE_IMP, 12, 1)
+FIELD(DBGDIDR, NSUHD_IMP, 14, 1)
+FIELD(DBGDIDR, VERSION, 16, 4)
+FIELD(DBGDIDR, CTX_CMPS, 20, 4)
+FIELD(DBGDIDR, BRPS, 24, 4)
+FIELD(DBGDIDR, WRPS, 28, 4)
+
 FIELD(MVFR0, SIMDREG, 0, 4)
 FIELD(MVFR0, FPSP, 4, 4)
 FIELD(MVFR0, FPDP, 8, 4)
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline uint32_t arm_debug_exception_fsr(CPUARMState *env)
     }
 }
 
+/**
+ * arm_num_brps: Return number of implemented breakpoints.
+ * Note that the ID register BRPS field is "number of bps - 1",
+ * and we return the actual number of breakpoints.
+ */
+static inline int arm_num_brps(ARMCPU *cpu)
+{
+    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
+        return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, BRPS) + 1;
+    } else {
+        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, BRPS) + 1;
+    }
+}
+
+/**
+ * arm_num_wrps: Return number of implemented watchpoints.
+ * Note that the ID register WRPS field is "number of wps - 1",
+ * and we return the actual number of watchpoints.
+ */
+static inline int arm_num_wrps(ARMCPU *cpu)
+{
+    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
+        return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, WRPS) + 1;
+    } else {
+        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, WRPS) + 1;
+    }
+}
+
+/**
+ * arm_num_ctx_cmps: Return number of implemented context comparators.
+ * Note that the ID register CTX_CMPS field is "number of cmps - 1",
+ * and we return the actual number of comparators.
+ */
+static inline int arm_num_ctx_cmps(ARMCPU *cpu)
+{
+    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
+        return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, CTX_CMPS) + 1;
+    } else {
+        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, CTX_CMPS) + 1;
+    }
+}
+
 /* Note make_memop_idx reserves 4 bits for mmu_idx, and MO_BSWAP is bit 3.
  * Thus a TCGMemOpIdx, without any MO_ALIGN bits, fits in 8 bits.
  */
diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/debug_helper.c
+++ b/target/arm/debug_helper.c
@@ -XXX,XX +XXX,XX @@ static bool linked_bp_matches(ARMCPU *cpu, int lbn)
 {
     CPUARMState *env = &cpu->env;
     uint64_t bcr = env->cp15.dbgbcr[lbn];
-    int brps = extract32(cpu->dbgdidr, 24, 4);
-    int ctx_cmps = extract32(cpu->dbgdidr, 20, 4);
+    int brps = arm_num_brps(cpu);
+    int ctx_cmps = arm_num_ctx_cmps(cpu);
     int bt;
     uint32_t contextidr;
     uint64_t hcr_el2;
@@ -XXX,XX +XXX,XX @@ static bool linked_bp_matches(ARMCPU *cpu, int lbn)
      * case DBGWCR<n>_EL1.LBN must indicate that breakpoint).
      * We choose the former.
      */
-    if (lbn > brps || lbn < (brps - ctx_cmps)) {
+    if (lbn >= brps || lbn < (brps - ctx_cmps)) {
         return false;
     }
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
     };
 
     /* Note that all these register fields hold "number of Xs minus 1". */
-    brps = extract32(cpu->dbgdidr, 24, 4);
-    wrps = extract32(cpu->dbgdidr, 28, 4);
-    ctx_cmps = extract32(cpu->dbgdidr, 20, 4);
+    brps = arm_num_brps(cpu);
+    wrps = arm_num_wrps(cpu);
+    ctx_cmps = arm_num_ctx_cmps(cpu);
 
     assert(ctx_cmps <= brps);
 
-    /* The DBGDIDR and ID_AA64DFR0_EL1 define various properties
-     * of the debug registers such as number of breakpoints;
-     * check that if they both exist then they agree.
-     */
-    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
-        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, BRPS) == brps);
-        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, WRPS) == wrps);
-        assert(FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, CTX_CMPS)
-               == ctx_cmps);
-    }
-
     define_one_arm_cp_reg(cpu, &dbgdidr);
     define_arm_cp_regs(cpu, debug_cp_reginfo);
 
@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
         define_arm_cp_regs(cpu, debug_lpae_cp_reginfo);
     }
 
-    for (i = 0; i < brps + 1; i++) {
+    for (i = 0; i < brps; i++) {
         ARMCPRegInfo dbgregs[] = {
             { .name = "DBGBVR", .state = ARM_CP_STATE_BOTH,
               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 4,
@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
         define_arm_cp_regs(cpu, dbgregs);
     }
 
-    for (i = 0; i < wrps + 1; i++) {
+    for (i = 0; i < wrps; i++) {
         ARMCPRegInfo dbgregs[] = {
             { .name = "DBGWVR", .state = ARM_CP_STATE_BOTH,
               .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 6,
-- 
2.20.1

We're going to want to read the DBGDIDR register from KVM in
a subsequent commit, which means it needs to be in the
ARMISARegisters sub-struct. Move it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214175116.9164-12-peter.maydell@linaro.org
---
 target/arm/cpu.h       | 2 +-
 target/arm/internals.h | 6 +++---
 target/arm/cpu.c       | 8 ++++----
 target/arm/cpu64.c     | 6 +++---
 target/arm/helper.c    | 2 +-
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
         uint32_t mvfr1;
         uint32_t mvfr2;
         uint32_t id_dfr0;
+        uint32_t dbgdidr;
         uint64_t id_aa64isar0;
         uint64_t id_aa64isar1;
         uint64_t id_aa64pfr0;
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
     uint32_t id_mmfr4;
     uint64_t id_aa64afr0;
     uint64_t id_aa64afr1;
-    uint32_t dbgdidr;
     uint32_t clidr;
     uint64_t mp_affinity; /* MP ID without feature bits */
     /* The elements of this array are the CCSIDR values for each cache,
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline int arm_num_brps(ARMCPU *cpu)
     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
         return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, BRPS) + 1;
     } else {
-        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, BRPS) + 1;
+        return FIELD_EX32(cpu->isar.dbgdidr, DBGDIDR, BRPS) + 1;
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static inline int arm_num_wrps(ARMCPU *cpu)
     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
         return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, WRPS) + 1;
     } else {
-        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, WRPS) + 1;
+        return FIELD_EX32(cpu->isar.dbgdidr, DBGDIDR, WRPS) + 1;
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static inline int arm_num_ctx_cmps(ARMCPU *cpu)
     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
         return FIELD_EX64(cpu->isar.id_aa64dfr0, ID_AA64DFR0, CTX_CMPS) + 1;
     } else {
-        return FIELD_EX32(cpu->dbgdidr, DBGDIDR, CTX_CMPS) + 1;
+        return FIELD_EX32(cpu->isar.dbgdidr, DBGDIDR, CTX_CMPS) + 1;
     }
 }
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void cortex_a8_initfn(Object *obj)
     cpu->isar.id_isar2 = 0x21232031;
     cpu->isar.id_isar3 = 0x11112131;
     cpu->isar.id_isar4 = 0x00111142;
-    cpu->dbgdidr = 0x15141000;
+    cpu->isar.dbgdidr = 0x15141000;
     cpu->clidr = (1 << 27) | (2 << 24) | 3;
     cpu->ccsidr[0] = 0xe007e01a; /* 16k L1 dcache. */
     cpu->ccsidr[1] = 0x2007e01a; /* 16k L1 icache. */
@@ -XXX,XX +XXX,XX @@ static void cortex_a9_initfn(Object *obj)
     cpu->isar.id_isar2 = 0x21232041;
     cpu->isar.id_isar3 = 0x11112131;
     cpu->isar.id_isar4 = 0x00111142;
-    cpu->dbgdidr = 0x35141000;
+    cpu->isar.dbgdidr = 0x35141000;
     cpu->clidr = (1 << 27) | (1 << 24) | 3;
     cpu->ccsidr[0] = 0xe00fe019; /* 16k L1 dcache. */
     cpu->ccsidr[1] = 0x200fe019; /* 16k L1 icache. */
@@ -XXX,XX +XXX,XX @@ static void cortex_a7_initfn(Object *obj)
     cpu->isar.id_isar2 = 0x21232041;
     cpu->isar.id_isar3 = 0x11112131;
     cpu->isar.id_isar4 = 0x10011142;
-    cpu->dbgdidr = 0x3515f005;
+    cpu->isar.dbgdidr = 0x3515f005;
     cpu->clidr = 0x0a200023;
     cpu->ccsidr[0] = 0x701fe00a; /* 32K L1 dcache */
     cpu->ccsidr[1] = 0x201fe00a; /* 32K L1 icache */
@@ -XXX,XX +XXX,XX @@ static void cortex_a15_initfn(Object *obj)
     cpu->isar.id_isar2 = 0x21232041;
     cpu->isar.id_isar3 = 0x11112131;
     cpu->isar.id_isar4 = 0x10011142;
-    cpu->dbgdidr = 0x3515f021;
+    cpu->isar.dbgdidr = 0x3515f021;
     cpu->clidr = 0x0a200023;
     cpu->ccsidr[0] = 0x701fe00a; /* 32K L1 dcache */
     cpu->ccsidr[1] = 0x201fe00a; /* 32K L1 icache */
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a57_initfn(Object *obj)
     cpu->isar.id_aa64dfr0 = 0x10305106;
     cpu->isar.id_aa64isar0 = 0x00011120;
     cpu->isar.id_aa64mmfr0 = 0x00001124;
-    cpu->dbgdidr = 0x3516d000;
+    cpu->isar.dbgdidr = 0x3516d000;
     cpu->clidr = 0x0a200023;
     cpu->ccsidr[0] = 0x701fe00a; /* 32KB L1 dcache */
     cpu->ccsidr[1] = 0x201fe012; /* 48KB L1 icache */
@@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj)
     cpu->isar.id_aa64dfr0 = 0x10305106;
     cpu->isar.id_aa64isar0 = 0x00011120;
     cpu->isar.id_aa64mmfr0 = 0x00001122; /* 40 bit physical addr */
-    cpu->dbgdidr = 0x3516d000;
+    cpu->isar.dbgdidr = 0x3516d000;
     cpu->clidr = 0x0a200023;
     cpu->ccsidr[0] = 0x700fe01a; /* 32KB L1 dcache */
     cpu->ccsidr[1] = 0x201fe00a; /* 32KB L1 icache */
@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
     cpu->isar.id_aa64dfr0 = 0x10305106;
     cpu->isar.id_aa64isar0 = 0x00011120;
     cpu->isar.id_aa64mmfr0 = 0x00001124;
-    cpu->dbgdidr = 0x3516d000;
+    cpu->isar.dbgdidr = 0x3516d000;
     cpu->clidr = 0x0a200023;
     cpu->ccsidr[0] = 0x701fe00a; /* 32KB L1 dcache */
     cpu->ccsidr[1] = 0x201fe012; /* 48KB L1 icache */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void define_debug_regs(ARMCPU *cpu)
     ARMCPRegInfo dbgdidr = {
         .name = "DBGDIDR", .cp = 14, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 0,
         .access = PL0_R, .accessfn = access_tda,
-        .type = ARM_CP_CONST, .resetvalue = cpu->dbgdidr,
+        .type = ARM_CP_CONST, .resetvalue = cpu->isar.dbgdidr,
     };
 
     /* Note that all these register fields hold "number of Xs minus 1". */
-- 
2.20.1

Now that we have isar_feature test functions that look at fields in the
ID_AA64DFR0_EL1 and ID_DFR0 ID registers, add the code that reads
these register values from KVM so that the checks behave correctly
when we're using KVM.

No isar_feature function tests ID_AA64DFR1_EL1 or DBGDIDR yet, but we
add them to maintain the invariant that every field in the
ARMISARegisters struct is populated for a KVM CPU and can be relied
on.  This requirement isn't actually written down yet, so add a note
to the relevant comment.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214175116.9164-13-peter.maydell@linaro.org
---
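Illustrative note (not part of the commit): the kvm64.c hunk below
synthesizes DBGDIDR from ID_AA64DFR0. The standalone sketch here
repeats that bit arithmetic without QEMU's FIELD_EX64()/FIELD_DP32()
macros. The helper names (demo_bits64, demo_synth_dbgdidr) are
hypothetical, and the field positions are written out by hand from the
architectural register layouts, so treat them as assumptions to check
against the Arm ARM rather than as the QEMU definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract 'len' bits starting at bit 'shift' from a 64-bit value. */
static uint64_t demo_bits64(uint64_t v, unsigned shift, unsigned len)
{
    return (v >> shift) & ((UINT64_C(1) << len) - 1);
}

/*
 * Hypothetical helper: build an AArch32 DBGDIDR value from the AArch64
 * ID_AA64DFR0 value, following the steps in the kvm64.c hunk.
 * Assumed layouts:
 *   ID_AA64DFR0: BRPS [15:12], WRPS [23:20], CTX_CMPS [31:28]
 *   DBGDIDR:     WRPS [31:28], BRPS [27:24], CTX_CMPS [23:20],
 *                VERSION [19:16], RES1 [15], NSUHD_IMP [14], SE_IMP [12]
 */
static uint32_t demo_synth_dbgdidr(uint64_t id_aa64dfr0, bool has_el3)
{
    uint32_t wrps = (uint32_t)demo_bits64(id_aa64dfr0, 20, 4);
    uint32_t brps = (uint32_t)demo_bits64(id_aa64dfr0, 12, 4);
    uint32_t ctx_cmps = (uint32_t)demo_bits64(id_aa64dfr0, 28, 4);
    uint32_t version = 6; /* ARMv8 debug architecture */
    uint32_t dbgdidr = 0;

    dbgdidr |= wrps << 28;
    dbgdidr |= brps << 24;
    dbgdidr |= ctx_cmps << 20;
    dbgdidr |= version << 16;
    dbgdidr |= UINT32_C(1) << 15;          /* RES1 bit */
    dbgdidr |= (uint32_t)has_el3 << 14;    /* NSUHD_IMP */
    dbgdidr |= (uint32_t)has_el3 << 12;    /* SE_IMP */
    return dbgdidr;
}
```

Under those assumptions, demo_synth_dbgdidr(0x10305106, true) gives
0x3516d000, which agrees with the id_aa64dfr0 and dbgdidr constants the
series uses for the Cortex-A57, A53 and A72 models.
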
 target/arm/cpu.h   |  5 +++++
 target/arm/kvm32.c |  8 ++++++++
 target/arm/kvm64.c | 36 ++++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
      * prefix means a constant register.
      * Some of these registers are split out into a substructure that
      * is shared with the translators to control the ISA.
+     *
+     * Note that if you add an ID register to the ARMISARegisters struct
+     * you need to also update the 32-bit and 64-bit versions of the
+     * kvm_arm_get_host_cpu_features() function to correctly populate the
+     * field by reading the value from the KVM vCPU.
      */
     struct ARMISARegisters {
         uint32_t id_isar0;
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
         ahcf->isar.id_isar6 = 0;
     }
 
+    err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_dfr0,
+                          ARM_CP15_REG32(0, 0, 1, 2));
+
     err |= read_sys_reg32(fdarray[2], &ahcf->isar.mvfr0,
                           KVM_REG_ARM | KVM_REG_SIZE_U32 |
                           KVM_REG_ARM_VFP | KVM_REG_ARM_VFP_MVFR0);
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
      * Fortunately there is not yet anything in there that affects migration.
      */
 
+    /*
+     * There is no way to read DBGDIDR, because currently 32-bit KVM
+     * doesn't implement debug at all. Leave it at zero.
+     */
+
     kvm_arm_destroy_scratch_host_vcpu(fdarray);
 
     if (err < 0) {
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
     } else {
         err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64pfr1,
                               ARM64_SYS_REG(3, 0, 0, 4, 1));
+        err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64dfr0,
+                              ARM64_SYS_REG(3, 0, 0, 5, 0));
+        err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64dfr1,
+                              ARM64_SYS_REG(3, 0, 0, 5, 1));
         err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar0,
                               ARM64_SYS_REG(3, 0, 0, 6, 0));
         err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar1,
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
          * than skipping the reads and leaving 0, as we must avoid
          * considering the values in every case.
          */
+        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_dfr0,
+                              ARM64_SYS_REG(3, 0, 0, 1, 2));
         err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar0,
                               ARM64_SYS_REG(3, 0, 0, 2, 0));
         err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar1,
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
                               ARM64_SYS_REG(3, 0, 0, 3, 1));
         err |= read_sys_reg32(fdarray[2], &ahcf->isar.mvfr2,
                               ARM64_SYS_REG(3, 0, 0, 3, 2));
+
+        /*
+         * DBGDIDR is a bit complicated because the kernel doesn't
+         * provide an accessor for it in 64-bit mode, which is what this
+         * scratch VM is in, and there's no architected "64-bit sysreg
+         * which reads the same as the 32-bit register" the way there is
+         * for other ID registers. Instead we synthesize a value from the
+         * AArch64 ID_AA64DFR0, the same way the kernel code in
+         * arch/arm64/kvm/sys_regs.c:trap_dbgidr() does.
+         * We only do this if the CPU supports AArch32 at EL1.
+         */
+        if (FIELD_EX32(ahcf->isar.id_aa64pfr0, ID_AA64PFR0, EL1) >= 2) {
+            int wrps = FIELD_EX64(ahcf->isar.id_aa64dfr0, ID_AA64DFR0, WRPS);
+            int brps = FIELD_EX64(ahcf->isar.id_aa64dfr0, ID_AA64DFR0, BRPS);
+            int ctx_cmps =
+                FIELD_EX64(ahcf->isar.id_aa64dfr0, ID_AA64DFR0, CTX_CMPS);
+            int version = 6; /* ARMv8 debug architecture */
+            bool has_el3 =
+                !!FIELD_EX32(ahcf->isar.id_aa64pfr0, ID_AA64PFR0, EL3);
+            uint32_t dbgdidr = 0;
+
+            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, WRPS, wrps);
+            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, BRPS, brps);
+            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, CTX_CMPS, ctx_cmps);
+            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, VERSION, version);
+            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, NSUHD_IMP, has_el3);
+            dbgdidr = FIELD_DP32(dbgdidr, DBGDIDR, SE_IMP, has_el3);
+            dbgdidr |= (1 << 15); /* RES1 bit */
+            ahcf->isar.dbgdidr = dbgdidr;
+        }
     }
 
     sve_supported = ioctl(fdarray[0], KVM_CHECK_EXTENSION, KVM_CAP_ARM_SVE) > 0;
-- 
2.20.1

The ARMv8.1-PMU extension requires:
 * the evtCount field in PMETYPER<n>_EL0 is 16 bits, not 10
 * MDCR_EL2.HPMD allows event counting to be disabled at EL2
 * two new required events, STALL_FRONTEND and STALL_BACKEND
 * ID register bits in ID_AA64DFR0_EL1 and ID_DFR0

We already implement the 16-bit evtCount field and the
HPMD bit, so all that is missing is the two new events:
  STALL_FRONTEND
   "counts every cycle counted by the CPU_CYCLES event on which no
    operation was issued because there are no operations available
    to issue to this PE from the frontend"
  STALL_BACKEND
   "counts every cycle counted by the CPU_CYCLES event on which no
    operation was issued because the backend is unable to accept
    any available operations from the frontend"

QEMU never stalls in this sense, so our implementation is trivial:
always return a zero count.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200214175116.9164-14-peter.maydell@linaro.org
---
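Illustrative note (not part of the commit): a minimal standalone sketch
of the "event which never fires" idea described above. The demo_*
names are hypothetical and the table lookup is a simple linear search;
the real code uses pm_events[] together with supported_event_map[], so
this is a simplification under stated assumptions, not the QEMU
implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a PMU event description. */
typedef struct {
    uint16_t number;
    bool (*supported)(bool has_pmu_8_1);
    uint64_t (*get_count)(void);
    int64_t (*ns_per_count)(uint64_t count);
} demo_pm_event;

/* Events that exist in any v8.1 PMU. */
static bool demo_pmu_8_1_supported(bool has_pmu_8_1)
{
    return has_pmu_8_1;
}

/* An event that never fires always reports a zero count... */
static uint64_t demo_zero_count(void)
{
    return 0;
}

/* ...and so can never overflow; -1 stands for "no overflow expected". */
static int64_t demo_zero_ns_per(uint64_t count)
{
    (void)count;
    return -1;
}

static const demo_pm_event demo_events[] = {
    { 0x023, demo_pmu_8_1_supported, demo_zero_count, demo_zero_ns_per },
    { 0x024, demo_pmu_8_1_supported, demo_zero_count, demo_zero_ns_per },
};

/* Return the event if it is known and supported, else NULL. */
static const demo_pm_event *demo_find_event(uint16_t number, bool has_pmu_8_1)
{
    size_t n = sizeof(demo_events) / sizeof(demo_events[0]);

    for (size_t i = 0; i < n; i++) {
        if (demo_events[i].number == number &&
            demo_events[i].supported(has_pmu_8_1)) {
            return &demo_events[i];
        }
    }
    return NULL;
}
```

A guest that programs event 0x023 or 0x024 on a CPU with the v8.1 PMU
sees a counter that stays at zero; on a CPU without it the lookup
fails and the event is treated as unsupported.
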
 target/arm/helper.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static int64_t instructions_ns_per(uint64_t icount)
 }
 #endif
 
+static bool pmu_8_1_events_supported(CPUARMState *env)
+{
+    /* For events which are supported in any v8.1 PMU */
+    return cpu_isar_feature(any_pmu_8_1, env_archcpu(env));
+}
+
+static uint64_t zero_event_get_count(CPUARMState *env)
+{
+    /* For events which on QEMU never fire, so their count is always zero */
+    return 0;
+}
+
+static int64_t zero_event_ns_per(uint64_t cycles)
+{
+    /* An event which never fires can never overflow */
+    return -1;
+}
+
 static const pm_event pm_events[] = {
     { .number = 0x000, /* SW_INCR */
       .supported = event_always_supported,
@@ -XXX,XX +XXX,XX @@ static const pm_event pm_events[] = {
       .supported = event_always_supported,
       .get_count = cycles_get_count,
       .ns_per_count = cycles_ns_per,
-    }
+    },
 #endif
+    { .number = 0x023, /* STALL_FRONTEND */
+      .supported = pmu_8_1_events_supported,
+      .get_count = zero_event_get_count,
+      .ns_per_count = zero_event_ns_per,
+    },
+    { .number = 0x024, /* STALL_BACKEND */
+      .supported = pmu_8_1_events_supported,
+      .get_count = zero_event_get_count,
+      .ns_per_count = zero_event_ns_per,
+    },
 };
 
 /*
@@ -XXX,XX +XXX,XX @@ static const pm_event pm_events[] = {
  * should first be updated to something sparse instead of the current
  * supported_event_map[] array.
  */
-#define MAX_EVENT_ID 0x11
+#define MAX_EVENT_ID 0x24
 #define UNSUPPORTED_EVENT UINT16_MAX
 static uint16_t supported_event_map[MAX_EVENT_ID + 1];
 
-- 
2.20.1

The ARMv8.4-PMU extension adds:
 * one new required event, STALL
 * one new system register PMMIR_EL1

(There are also some more L1-cache related events, but since
we don't implement any cache we don't provide these, in the
same way we don't provide the base-PMUv3 cache events.)

The STALL event "counts every attributable cycle on which no
attributable instruction or operation was sent for execution on this
PE".  QEMU doesn't stall in this sense, so this is another
always-reads-zero event.

The PMMIR_EL1 register is a read-only register providing
implementation-specific information about the PMU; currently it has
only one field, SLOTS, which defines behaviour of the STALL_SLOT PMU
event.  Since QEMU doesn't implement the STALL_SLOT event, we can
validly make the register read zero.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200214175116.9164-15-peter.maydell@linaro.org
---
 target/arm/cpu.h    | 18 ++++++++++++++++++
 target/arm/helper.c | 22 +++++++++++++++++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

Set the ID register bits to provide ARMv8.4-PMU (and implicitly
also ARMv8.1-PMU) in the 'max' CPU.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200214175116.9164-16-peter.maydell@linaro.org
---
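Illustrative note (not part of the commit): why setting the version
field to 5 "implicitly" also provides ARMv8.1-PMU. The sketch below
assumes PMUVER occupies bits [11:8] of ID_AA64DFR0, that 4 encodes
v8.1-PMU and 5 encodes v8.4-PMU, and that 0xf means an implementation
defined PMU; feature checks are then threshold comparisons. All demo_*
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PMUVER_SHIFT 8
#define DEMO_PMUVER_MASK  (UINT64_C(0xf) << DEMO_PMUVER_SHIFT)

/* Deposit a new PMUVER value, leaving all other fields untouched. */
static uint64_t demo_set_pmuver(uint64_t dfr0, unsigned ver)
{
    return (dfr0 & ~DEMO_PMUVER_MASK) |
           ((uint64_t)(ver & 0xf) << DEMO_PMUVER_SHIFT);
}

static unsigned demo_get_pmuver(uint64_t dfr0)
{
    return (unsigned)((dfr0 & DEMO_PMUVER_MASK) >> DEMO_PMUVER_SHIFT);
}

/* 0xf is excluded: it does not mean "newer than everything". */
static bool demo_has_pmu_8_1(uint64_t dfr0)
{
    unsigned v = demo_get_pmuver(dfr0);
    return v >= 4 && v != 0xf;
}

static bool demo_has_pmu_8_4(uint64_t dfr0)
{
    unsigned v = demo_get_pmuver(dfr0);
    return v >= 5 && v != 0xf;
}
```

Because the checks are ">=" comparisons, a value of 5 satisfies both
the v8.1 and the v8.4 test, while a value of 4 satisfies only the
first.
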
 target/arm/cpu64.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = FIELD_DP32(u, ID_MMFR3, PAN, 2); /* ATS1E1 */
         cpu->id_mmfr3 = u;
 
+        u = cpu->isar.id_aa64dfr0;
+        u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
+        cpu->isar.id_aa64dfr0 = u;
+
+        u = cpu->isar.id_dfr0;
+        u = FIELD_DP32(u, ID_DFR0, PERFMON, 5); /* v8.4-PMU */
+        cpu->isar.id_dfr0 = u;
+
         /*
          * FIXME: We do not yet support ARMv8.2-fp16 for AArch32 yet,
          * so do not set MVFR1.FPHP.  Strictly speaking this is not legal,
-- 
2.20.1

The LC bit in the PMCR_EL0 register is supposed to be:
 * read/write
 * RES1 on an AArch64-only implementation
 * an architecturally UNKNOWN value on reset
(and use of LC==0 by software is deprecated).

We were implementing it incorrectly as read-only always zero,
though we do have all the code needed to test it and behave
accordingly.

Instead make it a read-write bit which resets to 1 always, which
satisfies all the architectural requirements above.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20200214175116.9164-18-peter.maydell@linaro.org
---
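Illustrative note (not part of the commit): the effect of adding LC to
the writeable mask, as a standalone sketch. The bit values mirror the
PMCR* defines in helper.c (E=0x1, D=0x8, X=0x10, DP=0x20, LC=0x40);
the demo_* names and the helper function are hypothetical.

```c
#include <stdint.h>

#define DEMO_PMCRLC 0x40
#define DEMO_PMCRDP 0x20
#define DEMO_PMCRX  0x10
#define DEMO_PMCRD  0x08
#define DEMO_PMCRE  0x01

/* Old behaviour: only DP, X, D and E were writable (0x39). */
#define DEMO_OLD_WRITEABLE_MASK \
    (DEMO_PMCRDP | DEMO_PMCRX | DEMO_PMCRD | DEMO_PMCRE)
/* New behaviour: LC is writable as well (0x79). */
#define DEMO_PMCR_WRITEABLE_MASK \
    (DEMO_PMCRLC | DEMO_OLD_WRITEABLE_MASK)

/* Replace only the bits selected by 'mask'; keep the rest of 'pmcr'. */
static uint32_t demo_pmcr_write(uint32_t pmcr, uint32_t value, uint32_t mask)
{
    return (pmcr & ~mask) | (value & mask);
}
```

With the old mask a guest write could never change LC, so the bit kept
whatever value it had at reset; with the new mask the guest can clear
or set it, and the read-only fields above bit 6 are still preserved.
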
 target/arm/helper.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v6_cp_reginfo[] = {
 #define PMCRC   0x4
 #define PMCRP   0x2
 #define PMCRE   0x1
+/*
+ * Mask of PMCR bits writeable by guest (not including WO bits like C, P,
+ * which can be written as 1 to trigger behaviour but which stay RAZ).
+ */
+#define PMCR_WRITEABLE_MASK (PMCRLC | PMCRDP | PMCRX | PMCRD | PMCRE)
 
 #define PMXEVTYPER_P          0x80000000
 #define PMXEVTYPER_U          0x40000000
@@ -XXX,XX +XXX,XX @@ static void pmcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
         }
     }
 
-    /* only the DP, X, D and E bits are writable */
-    env->cp15.c9_pmcr &= ~0x39;
-    env->cp15.c9_pmcr |= (value & 0x39);
+    env->cp15.c9_pmcr &= ~PMCR_WRITEABLE_MASK;
+    env->cp15.c9_pmcr |= (value & PMCR_WRITEABLE_MASK);
 
     pmu_op_finish(env);
 }
@@ -XXX,XX +XXX,XX @@ static void define_pmu_regs(ARMCPU *cpu)
         .access = PL0_RW, .accessfn = pmreg_access,
         .type = ARM_CP_IO,
         .fieldoffset = offsetof(CPUARMState, cp15.c9_pmcr),
-        .resetvalue = (cpu->midr & 0xff000000) | (pmcrn << PMCRN_SHIFT),
+        .resetvalue = (cpu->midr & 0xff000000) | (pmcrn << PMCRN_SHIFT) |
+                      PMCRLC,
         .writefn = pmcr_write, .raw_writefn = raw_write,
     };
     define_one_arm_cp_reg(cpu, &pmcr);
-- 
2.20.1

The isar_feature_aa32_pan and isar_feature_aa32_ats1e1 functions
are supposed to be testing fields in ID_MMFR3; but a cut-and-paste
error meant we were looking at MVFR0 instead.

Fix the functions to look at the right register; this requires
us to move at least id_mmfr3 to the ARMISARegisters struct; we
choose to move all the ID_MMFRn registers for consistency.

Fixes: 3d6ad6bb466f
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214175116.9164-19-peter.maydell@linaro.org
---
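Illustrative note (not part of the commit): how reading the wrong
register can produce a wrong answer. The sketch assumes the PAN field
is bits [19:16] of ID_MMFR3 and uses a two-field stand-in struct; the
demo_* names are hypothetical, and demo_aa32_pan_buggy() reproduces
the cut-and-paste mistake on purpose.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for the two registers involved. */
typedef struct {
    uint32_t id_mmfr3;
    uint32_t mvfr0;
} demo_isar;

/* Extract the 4-bit field at bits [19:16] (ID_MMFR3.PAN, assumed). */
static uint32_t demo_pan_field(uint32_t reg)
{
    return (reg >> 16) & 0xf;
}

/* Fixed version: tests ID_MMFR3. */
static bool demo_aa32_pan(const demo_isar *id)
{
    return demo_pan_field(id->id_mmfr3) != 0;
}

static bool demo_aa32_ats1e1(const demo_isar *id)
{
    return demo_pan_field(id->id_mmfr3) >= 2;
}

/* Buggy version: applies the ID_MMFR3 field layout to MVFR0. */
static bool demo_aa32_pan_buggy(const demo_isar *id)
{
    return demo_pan_field(id->mvfr0) != 0;
}
```

For example, with id_mmfr3 = 0x02102211 (PAN field zero) and an mvfr0
value whose bits [19:16] are nonzero, such as 0x10110222, the buggy
test reports PAN as present although ID_MMFR3 says it is absent.
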
 target/arm/cpu.h      |  14 +++---
 hw/intc/armv7m_nvic.c |   8 ++--
 target/arm/cpu.c      | 104 +++++++++++++++++++++---------------------
 target/arm/cpu64.c    |  28 ++++++------
 target/arm/helper.c   |  12 ++---
 target/arm/kvm32.c    |  17 +++++++
 target/arm/kvm64.c    |  10 ++++
 7 files changed, 110 insertions(+), 83 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
         uint32_t id_isar4;
         uint32_t id_isar5;
         uint32_t id_isar6;
+        uint32_t id_mmfr0;
+        uint32_t id_mmfr1;
+        uint32_t id_mmfr2;
+        uint32_t id_mmfr3;
+        uint32_t id_mmfr4;
         uint32_t mvfr0;
         uint32_t mvfr1;
         uint32_t mvfr2;
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
     uint64_t pmceid0;
     uint64_t pmceid1;
     uint32_t id_afr0;
-    uint32_t id_mmfr0;
-    uint32_t id_mmfr1;
-    uint32_t id_mmfr2;
-    uint32_t id_mmfr3;
-    uint32_t id_mmfr4;
     uint64_t id_aa64afr0;
     uint64_t id_aa64afr1;
     uint32_t clidr;
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_vminmaxnm(const ARMISARegisters *id)
 
 static inline bool isar_feature_aa32_pan(const ARMISARegisters *id)
 {
-    return FIELD_EX64(id->mvfr0, ID_MMFR3, PAN) != 0;
+    return FIELD_EX32(id->id_mmfr3, ID_MMFR3, PAN) != 0;
 }
 
 static inline bool isar_feature_aa32_ats1e1(const ARMISARegisters *id)
 {
-    return FIELD_EX64(id->mvfr0, ID_MMFR3, PAN) >= 2;
+    return FIELD_EX32(id->id_mmfr3, ID_MMFR3, PAN) >= 2;
 }
 
 static inline bool isar_feature_aa32_pmu_8_1(const ARMISARegisters *id)
diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
     case 0xd4c: /* AFR0.  */
         return cpu->id_afr0;
     case 0xd50: /* MMFR0.  */
-        return cpu->id_mmfr0;
+        return cpu->isar.id_mmfr0;
     case 0xd54: /* MMFR1.  */
-        return cpu->id_mmfr1;
+        return cpu->isar.id_mmfr1;
     case 0xd58: /* MMFR2.  */
-        return cpu->id_mmfr2;
+        return cpu->isar.id_mmfr2;
     case 0xd5c: /* MMFR3.  */
-        return cpu->id_mmfr3;
+        return cpu->isar.id_mmfr3;
     case 0xd60: /* ISAR0.  */
         return cpu->isar.id_isar0;
     case 0xd64: /* ISAR1.  */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm1136_r2_initfn(Object *obj)
     cpu->id_pfr1 = 0x1;
     cpu->isar.id_dfr0 = 0x2;
     cpu->id_afr0 = 0x3;
-    cpu->id_mmfr0 = 0x01130003;
-    cpu->id_mmfr1 = 0x10030302;
-    cpu->id_mmfr2 = 0x01222110;
+    cpu->isar.id_mmfr0 = 0x01130003;
+    cpu->isar.id_mmfr1 = 0x10030302;
+    cpu->isar.id_mmfr2 = 0x01222110;
     cpu->isar.id_isar0 = 0x00140011;
     cpu->isar.id_isar1 = 0x12002111;
     cpu->isar.id_isar2 = 0x11231111;
@@ -XXX,XX +XXX,XX @@ static void arm1136_initfn(Object *obj)
     cpu->id_pfr1 = 0x1;
     cpu->isar.id_dfr0 = 0x2;
     cpu->id_afr0 = 0x3;
-    cpu->id_mmfr0 = 0x01130003;
-    cpu->id_mmfr1 = 0x10030302;
-    cpu->id_mmfr2 = 0x01222110;
+    cpu->isar.id_mmfr0 = 0x01130003;
+    cpu->isar.id_mmfr1 = 0x10030302;
+    cpu->isar.id_mmfr2 = 0x01222110;
     cpu->isar.id_isar0 = 0x00140011;
     cpu->isar.id_isar1 = 0x12002111;
     cpu->isar.id_isar2 = 0x11231111;
@@ -XXX,XX +XXX,XX @@ static void arm1176_initfn(Object *obj)
     cpu->id_pfr1 = 0x11;
     cpu->isar.id_dfr0 = 0x33;
     cpu->id_afr0 = 0;
-    cpu->id_mmfr0 = 0x01130003;
-    cpu->id_mmfr1 = 0x10030302;
-    cpu->id_mmfr2 = 0x01222100;
+    cpu->isar.id_mmfr0 = 0x01130003;
+    cpu->isar.id_mmfr1 = 0x10030302;
+    cpu->isar.id_mmfr2 = 0x01222100;
     cpu->isar.id_isar0 = 0x0140011;
     cpu->isar.id_isar1 = 0x12002111;
     cpu->isar.id_isar2 = 0x11231121;
@@ -XXX,XX +XXX,XX @@ static void arm11mpcore_initfn(Object *obj)
     cpu->id_pfr1 = 0x1;
     cpu->isar.id_dfr0 = 0;
     cpu->id_afr0 = 0x2;
-    cpu->id_mmfr0 = 0x01100103;
-    cpu->id_mmfr1 = 0x10020302;
-    cpu->id_mmfr2 = 0x01222000;
+    cpu->isar.id_mmfr0 = 0x01100103;
+    cpu->isar.id_mmfr1 = 0x10020302;
+    cpu->isar.id_mmfr2 = 0x01222000;
     cpu->isar.id_isar0 = 0x00100011;
     cpu->isar.id_isar1 = 0x12002111;
     cpu->isar.id_isar2 = 0x11221011;
@@ -XXX,XX +XXX,XX @@ static void cortex_m3_initfn(Object *obj)
     cpu->id_pfr1 = 0x00000200;
     cpu->isar.id_dfr0 = 0x00100000;
     cpu->id_afr0 = 0x00000000;
-    cpu->id_mmfr0 = 0x00000030;
-    cpu->id_mmfr1 = 0x00000000;
-    cpu->id_mmfr2 = 0x00000000;
-    cpu->id_mmfr3 = 0x00000000;
+    cpu->isar.id_mmfr0 = 0x00000030;
+    cpu->isar.id_mmfr1 = 0x00000000;
+    cpu->isar.id_mmfr2 = 0x00000000;
+    cpu->isar.id_mmfr3 = 0x00000000;
     cpu->isar.id_isar0 = 0x01141110;
     cpu->isar.id_isar1 = 0x02111000;
     cpu->isar.id_isar2 = 0x21112231;
@@ -XXX,XX +XXX,XX @@ static void cortex_m4_initfn(Object *obj)
     cpu->id_pfr1 = 0x00000200;
     cpu->isar.id_dfr0 = 0x00100000;
     cpu->id_afr0 = 0x00000000;
-    cpu->id_mmfr0 = 0x00000030;
-    cpu->id_mmfr1 = 0x00000000;
-    cpu->id_mmfr2 = 0x00000000;
-    cpu->id_mmfr3 = 0x00000000;
+    cpu->isar.id_mmfr0 = 0x00000030;
+    cpu->isar.id_mmfr1 = 0x00000000;
+    cpu->isar.id_mmfr2 = 0x00000000;
+    cpu->isar.id_mmfr3 = 0x00000000;
     cpu->isar.id_isar0 = 0x01141110;
     cpu->isar.id_isar1 = 0x02111000;
     cpu->isar.id_isar2 = 0x21112231;
@@ -XXX,XX +XXX,XX @@ static void cortex_m7_initfn(Object *obj)
     cpu->id_pfr1 = 0x00000200;
     cpu->isar.id_dfr0 = 0x00100000;
     cpu->id_afr0 = 0x00000000;
-    cpu->id_mmfr0 = 0x00100030;
-    cpu->id_mmfr1 = 0x00000000;
-    cpu->id_mmfr2 = 0x01000000;
-    cpu->id_mmfr3 = 0x00000000;
+    cpu->isar.id_mmfr0 = 0x00100030;
+    cpu->isar.id_mmfr1 = 0x00000000;
+    cpu->isar.id_mmfr2 = 0x01000000;
+    cpu->isar.id_mmfr3 = 0x00000000;
     cpu->isar.id_isar0 = 0x01101110;
     cpu->isar.id_isar1 = 0x02112000;
     cpu->isar.id_isar2 = 0x20232231;
@@ -XXX,XX +XXX,XX @@ static void cortex_m33_initfn(Object *obj)
     cpu->id_pfr1 = 0x00000210;
     cpu->isar.id_dfr0 = 0x00200000;
     cpu->id_afr0 = 0x00000000;
-    cpu->id_mmfr0 = 0x00101F40;
-    cpu->id_mmfr1 = 0x00000000;
-    cpu->id_mmfr2 = 0x01000000;
-    cpu->id_mmfr3 = 0x00000000;
+    cpu->isar.id_mmfr0 = 0x00101F40;
+    cpu->isar.id_mmfr1 = 0x00000000;
+    cpu->isar.id_mmfr2 = 0x01000000;
+    cpu->isar.id_mmfr3 = 0x00000000;
     cpu->isar.id_isar0 = 0x01101110;
     cpu->isar.id_isar1 = 0x02212000;
     cpu->isar.id_isar2 = 0x20232232;
@@ -XXX,XX +XXX,XX @@ static void cortex_r5_initfn(Object *obj)
     cpu->id_pfr1 = 0x001;
     cpu->isar.id_dfr0 = 0x010400;
     cpu->id_afr0 = 0x0;
-    cpu->id_mmfr0 = 0x0210030;
-    cpu->id_mmfr1 = 0x00000000;
-    cpu->id_mmfr2 = 0x01200000;
-    cpu->id_mmfr3 = 0x0211;
+    cpu->isar.id_mmfr0 = 0x0210030;
+    cpu->isar.id_mmfr1 = 0x00000000;
+    cpu->isar.id_mmfr2 = 0x01200000;
+    cpu->isar.id_mmfr3 = 0x0211;
     cpu->isar.id_isar0 = 0x02101111;
     cpu->isar.id_isar1 = 0x13112111;
     cpu->isar.id_isar2 = 0x21232141;
@@ -XXX,XX +XXX,XX @@ static void cortex_a8_initfn(Object *obj)
     cpu->id_pfr1 = 0x11;
     cpu->isar.id_dfr0 = 0x400;
     cpu->id_afr0 = 0;
-    cpu->id_mmfr0 = 0x31100003;
-    cpu->id_mmfr1 = 0x20000000;
-    cpu->id_mmfr2 = 0x01202000;
-    cpu->id_mmfr3 = 0x11;
+    cpu->isar.id_mmfr0 = 0x31100003;
+    cpu->isar.id_mmfr1 = 0x20000000;
+    cpu->isar.id_mmfr2 = 0x01202000;
+    cpu->isar.id_mmfr3 = 0x11;
     cpu->isar.id_isar0 = 0x00101111;
     cpu->isar.id_isar1 = 0x12112111;
     cpu->isar.id_isar2 = 0x21232031;
@@ -XXX,XX +XXX,XX @@ static void cortex_a9_initfn(Object *obj)
     cpu->id_pfr1 = 0x11;
     cpu->isar.id_dfr0 = 0x000;
     cpu->id_afr0 = 0;
-    cpu->id_mmfr0 = 0x00100103;
-    cpu->id_mmfr1 = 0x20000000;
-    cpu->id_mmfr2 = 0x01230000;
-    cpu->id_mmfr3 = 0x00002111;
+    cpu->isar.id_mmfr0 = 0x00100103;
+    cpu->isar.id_mmfr1 = 0x20000000;
+    cpu->isar.id_mmfr2 = 0x01230000;
+    cpu->isar.id_mmfr3 = 0x00002111;
     cpu->isar.id_isar0 = 0x00101111;
     cpu->isar.id_isar1 = 0x13112111;
     cpu->isar.id_isar2 = 0x21232041;
@@ -XXX,XX +XXX,XX @@ static void cortex_a7_initfn(Object *obj)
     cpu->id_pfr1 = 0x00011011;
     cpu->isar.id_dfr0 = 0x02010555;
     cpu->id_afr0 = 0x00000000;
-    cpu->id_mmfr0 = 0x10101105;
-    cpu->id_mmfr1 = 0x40000000;
-    cpu->id_mmfr2 = 0x01240000;
-    cpu->id_mmfr3 = 0x02102211;
+    cpu->isar.id_mmfr0 = 0x10101105;
+    cpu->isar.id_mmfr1 = 0x40000000;
+    cpu->isar.id_mmfr2 = 0x01240000;
+    cpu->isar.id_mmfr3 = 0x02102211;
     /* a7_mpcore_r0p5_trm, page 4-4 gives 0x01101110; but
      * table 4-41 gives 0x02101110, which includes the arm div insns.
      */
@@ -XXX,XX +XXX,XX @@ static void cortex_a15_initfn(Object *obj)
     cpu->id_pfr1 = 0x00011011;
     cpu->isar.id_dfr0 = 0x02010555;
     cpu->id_afr0 = 0x00000000;
-    cpu->id_mmfr0 = 0x10201105;
-    cpu->id_mmfr1 = 0x20000000;
-    cpu->id_mmfr2 = 0x01240000;
-    cpu->id_mmfr3 = 0x02102211;
+    cpu->isar.id_mmfr0 = 0x10201105;
+    cpu->isar.id_mmfr1 = 0x20000000;
+    cpu->isar.id_mmfr2 = 0x01240000;
+    cpu->isar.id_mmfr3 = 0x02102211;
     cpu->isar.id_isar0 = 0x02101110;
     cpu->isar.id_isar1 = 0x13112111;
     cpu->isar.id_isar2 = 0x21232041;
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
             t = FIELD_DP32(t, MVFR2, FPMISC, 4);   /* FP MaxNum */
             cpu->isar.mvfr2 = t;
 
-            t = cpu->id_mmfr3;
+            t = cpu->isar.id_mmfr3;
             t = FIELD_DP32(t, ID_MMFR3, PAN, 2); /* ATS1E1 */
-            cpu->id_mmfr3 = t;
+            cpu->isar.id_mmfr3 = t;
 
-            t = cpu->id_mmfr4;
+            t = cpu->isar.id_mmfr4;
             t = FIELD_DP32(t, ID_MMFR4, HPDS, 1); /* AA32HPD */
-            cpu->id_mmfr4 = t;
+            cpu->isar.id_mmfr4 = t;
         }
 #endif
     }
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a57_initfn(Object *obj)
     cpu->id_pfr1 = 0x00011011;
     cpu->isar.id_dfr0 = 0x03010066;
     cpu->id_afr0 = 0x00000000;
-    cpu->id_mmfr0 = 0x10101105;
-    cpu->id_mmfr1 = 0x40000000;
-    cpu->id_mmfr2 = 0x01260000;
-    cpu->id_mmfr3 = 0x02102211;
+    cpu->isar.id_mmfr0 = 0x10101105;
+    cpu->isar.id_mmfr1 = 0x40000000;
+    cpu->isar.id_mmfr2 = 0x01260000;
+    cpu->isar.id_mmfr3 = 0x02102211;
     cpu->isar.id_isar0 = 0x02101110;
     cpu->isar.id_isar1 = 0x13112111;
     cpu->isar.id_isar2 = 0x21232042;
@@ -XXX,XX +XXX,XX @@ static void aarch64_a53_initfn(Object *obj)
     cpu->id_pfr1 = 0x00011011;
     cpu->isar.id_dfr0 = 0x03010066;
     cpu->id_afr0 = 0x00000000;
-    cpu->id_mmfr0 = 0x10101105;
-    cpu->id_mmfr1 = 0x40000000;
-    cpu->id_mmfr2 = 0x01260000;
-    cpu->id_mmfr3 = 0x02102211;
+    cpu->isar.id_mmfr0 = 0x10101105;
+    cpu->isar.id_mmfr1 = 0x40000000;
+    cpu->isar.id_mmfr2 = 0x01260000;
+    cpu->isar.id_mmfr3 = 0x02102211;
     cpu->isar.id_isar0 = 0x02101110;
     cpu->isar.id_isar1 = 0x13112111;
     cpu->isar.id_isar2 = 0x21232042;
@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
     cpu->id_pfr1 = 0x00011011;
     cpu->isar.id_dfr0 = 0x03010066;
     cpu->id_afr0 = 0x00000000;
-    cpu->id_mmfr0 = 0x10201105;
-    cpu->id_mmfr1 = 0x40000000;
-    cpu->id_mmfr2 = 0x01260000;
-    cpu->id_mmfr3 = 0x02102211;
+    cpu->isar.id_mmfr0 = 0x10201105;
+    cpu->isar.id_mmfr1 = 0x40000000;
+    cpu->isar.id_mmfr2 = 0x01260000;
+    cpu->isar.id_mmfr3 = 0x02102211;
     cpu->isar.id_isar0 = 0x02101110;
     cpu->isar.id_isar1 = 0x13112111;
     cpu->isar.id_isar2 = 0x21232042;
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         u = FIELD_DP32(u, ID_ISAR6, SPECRES, 1);
         cpu->isar.id_isar6 = u;
 
-        u = cpu->id_mmfr3;
+        u = cpu->isar.id_mmfr3;
         u = FIELD_DP32(u, ID_MMFR3, PAN, 2); /* ATS1E1 */
-        cpu->id_mmfr3 = u;
+        cpu->isar.id_mmfr3 = u;
 
         u = cpu->isar.id_aa64dfr0;
         u = FIELD_DP64(u, ID_AA64DFR0, PMUVER, 5); /* v8.4-PMU */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 4,
               .access = PL1_R, .type = ARM_CP_CONST,
               .accessfn = access_aa32_tid3,
-              .resetvalue = cpu->id_mmfr0 },
+              .resetvalue = cpu->isar.id_mmfr0 },
             { .name = "ID_MMFR1", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 5,
               .access = PL1_R, .type = ARM_CP_CONST,
               .accessfn = access_aa32_tid3,
-              .resetvalue = cpu->id_mmfr1 },
+              .resetvalue = cpu->isar.id_mmfr1 },
             { .name = "ID_MMFR2", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 6,
               .access = PL1_R, .type = ARM_CP_CONST,
               .accessfn = access_aa32_tid3,
-              .resetvalue = cpu->id_mmfr2 },
+              .resetvalue = cpu->isar.id_mmfr2 },
             { .name = "ID_MMFR3", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 1, .opc2 = 7,
               .access = PL1_R, .type = ARM_CP_CONST,
               .accessfn = access_aa32_tid3,
-              .resetvalue = cpu->id_mmfr3 },
+              .resetvalue = cpu->isar.id_mmfr3 },
             { .name = "ID_ISAR0", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 0,
               .access = PL1_R, .type = ARM_CP_CONST,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 6,
               .access = PL1_R, .type = ARM_CP_CONST,
               .accessfn = access_aa32_tid3,
-              .resetvalue = cpu->id_mmfr4 },
+              .resetvalue = cpu->isar.id_mmfr4 },
             { .name = "ID_ISAR6", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 2, .opc2 = 7,
               .access = PL1_R, .type = ARM_CP_CONST,
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
         define_arm_cp_regs(cpu, vmsa_pmsa_cp_reginfo);
         define_arm_cp_regs(cpu, vmsa_cp_reginfo);
         /* TTCBR2 is introduced with ARMv8.2-A32HPD.  */
-        if (FIELD_EX32(cpu->id_mmfr4, ID_MMFR4, HPDS) != 0) {
+        if (FIELD_EX32(cpu->isar.id_mmfr4, ID_MMFR4, HPDS) != 0) {
             define_one_arm_cp_reg(cpu, &ttbcr2_reginfo);
         }
     }
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
      * Fortunately there is not yet anything in there that affects migration.
      */
 
+    err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr0,
+                          ARM_CP15_REG32(0, 0, 1, 4));
+    err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr1,
+                          ARM_CP15_REG32(0, 0, 1, 5));
+    err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr2,
+                          ARM_CP15_REG32(0, 0, 1, 6));
+    err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr3,
+                          ARM_CP15_REG32(0, 0, 1, 7));
+    if (read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr4,
+                       ARM_CP15_REG32(0, 0, 2, 6))) {
+        /*
+         * Older kernels don't support reading ID_MMFR4 (a new in v8
+         * register); assume it's zero.
+         */
+        ahcf->isar.id_mmfr4 = 0;
+    }
+
     /*
      * There is no way to read DBGDIDR, because currently 32-bit KVM
      * doesn't implement debug at all. Leave it at zero.
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
          */
         err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_dfr0,
                               ARM64_SYS_REG(3, 0, 0, 1, 2));
+        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr0,
+                              ARM64_SYS_REG(3, 0, 0, 1, 4));
+        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr1,
+                              ARM64_SYS_REG(3, 0, 0, 1, 5));
+        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr2,
+                              ARM64_SYS_REG(3, 0, 0, 1, 6));
+        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr3,
+                              ARM64_SYS_REG(3, 0, 0, 1, 7));
         err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar0,
                               ARM64_SYS_REG(3, 0, 0, 2, 0));
         err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar1,
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
                               ARM64_SYS_REG(3, 0, 0, 2, 4));
         err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar5,
                               ARM64_SYS_REG(3, 0, 0, 2, 5));
+        err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_mmfr4,
+                              ARM64_SYS_REG(3, 0, 0, 2, 6));
         err |= read_sys_reg32(fdarray[2], &ahcf->isar.id_isar6,
                               ARM64_SYS_REG(3, 0, 0, 2, 7));
 
-- 
2.20.1

Now that we have moved ID_MMFR4 into the ARMISARegisters struct, we
can define and use an isar_feature for the presence of the
ARMv8.2-AA32HPD feature, rather than open-coding the test.

While we're here, correct a comment typo which missed an 'A'
from the feature name.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214175116.9164-20-peter.maydell@linaro.org
---
 target/arm/cpu.h    | 5 +++++
 target/arm/helper.c | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

Cut-and-paste errors mean we're using FIELD_EX64() to extract fields from
some 32-bit ID registers. Use FIELD_EX32() instead. (This makes
no difference in behaviour, it's just more consistent.)
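
As a rough illustration of why the change is behaviour-neutral, here is a
standalone sketch. The sketch_extract32()/sketch_extract64() helpers are
simplified stand-ins written for this note, not QEMU's own definitions, and
the FPDP position (bits [11:8] of MVFR0) is an assumption taken from the Arm
architecture's description of that register.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for illustration only; not QEMU's own helpers. */
uint32_t sketch_extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

uint64_t sketch_extract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Assumed field position: MVFR0.FPDP occupies bits [11:8]. */
enum { SKETCH_FPDP_SHIFT = 8, SKETCH_FPDP_LENGTH = 4 };

/* Returns nonzero when both extraction widths agree for this value. */
int sketch_fpdp_extracts_agree(uint32_t mvfr0)
{
    /* The 32-bit value is zero-extended when passed to the 64-bit helper,
     * so the upper 32 bits can never contribute to the extracted field. */
    return sketch_extract64(mvfr0, SKETCH_FPDP_SHIFT, SKETCH_FPDP_LENGTH)
        == sketch_extract32(mvfr0, SKETCH_FPDP_SHIFT, SKETCH_FPDP_LENGTH);
}
```

Because the source value is only 32 bits wide, the two extractions can only
differ in the type of the temporary, which is why the patch is described as
a consistency fix.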

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214175116.9164-21-peter.maydell@linaro.org
---
 target/arm/cpu.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
 static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
 {
     /* Return true if D16-D31 are implemented */
-    return FIELD_EX64(id->mvfr0, MVFR0, SIMDREG) >= 2;
+    return FIELD_EX32(id->mvfr0, MVFR0, SIMDREG) >= 2;
 }
 
 static inline bool isar_feature_aa32_fpshvec(const ARMISARegisters *id)
 {
-    return FIELD_EX64(id->mvfr0, MVFR0, FPSHVEC) > 0;
+    return FIELD_EX32(id->mvfr0, MVFR0, FPSHVEC) > 0;
 }
 
 static inline bool isar_feature_aa32_fpdp(const ARMISARegisters *id)
 {
     /* Return true if CPU supports double precision floating point */
-    return FIELD_EX64(id->mvfr0, MVFR0, FPDP) > 0;
+    return FIELD_EX32(id->mvfr0, MVFR0, FPDP) > 0;
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fpdp(const ARMISARegisters *id)
  */
 static inline bool isar_feature_aa32_fp16_spconv(const ARMISARegisters *id)
 {
-    return FIELD_EX64(id->mvfr1, MVFR1, FPHP) > 0;
+    return FIELD_EX32(id->mvfr1, MVFR1, FPHP) > 0;
 }
 
 static inline bool isar_feature_aa32_fp16_dpconv(const ARMISARegisters *id)
 {
-    return FIELD_EX64(id->mvfr1, MVFR1, FPHP) > 1;
+    return FIELD_EX32(id->mvfr1, MVFR1, FPHP) > 1;
 }
 
 static inline bool isar_feature_aa32_vsel(const ARMISARegisters *id)
 {
-    return FIELD_EX64(id->mvfr2, MVFR2, FPMISC) >= 1;
+    return FIELD_EX32(id->mvfr2, MVFR2, FPMISC) >= 1;
 }
 
 static inline bool isar_feature_aa32_vcvt_dr(const ARMISARegisters *id)
 {
-    return FIELD_EX64(id->mvfr2, MVFR2, FPMISC) >= 2;
+    return FIELD_EX32(id->mvfr2, MVFR2, FPMISC) >= 2;
 }
 
 static inline bool isar_feature_aa32_vrint(const ARMISARegisters *id)
 {
-    return FIELD_EX64(id->mvfr2, MVFR2, FPMISC) >= 3;
+    return FIELD_EX32(id->mvfr2, MVFR2, FPMISC) >= 3;
 }
 
 static inline bool isar_feature_aa32_vminmaxnm(const ARMISARegisters *id)
 {
-    return FIELD_EX64(id->mvfr2, MVFR2, FPMISC) >= 4;
+    return FIELD_EX32(id->mvfr2, MVFR2, FPMISC) >= 4;
 }
 
 static inline bool isar_feature_aa32_pan(const ARMISARegisters *id)
-- 
2.20.1

The ACTLR2 and HACTLR2 AArch32 system registers didn't exist in ARMv7
or the original ARMv8.  They were later added as optional registers,
whose presence is signaled by the ID_MMFR4.AC2 field.  From ARMv8.2
they are mandatory (i.e. ID_MMFR4.AC2 must be non-zero).

We implemented HACTLR2 in commit 0e0456ab8895a5e85, but we
incorrectly made it exist for all v8 CPUs, and we didn't implement
ACTLR2 at all.

Sort this out by implementing both registers only when they are
supposed to exist, and setting the ID_MMFR4 bit for -cpu max.

Note that this removes HACTLR2 from our Cortex-A53, -A57 and -A72
CPU models; this is correct, because those CPUs do not implement
this register.
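
The presence test described above might look roughly like the following
standalone sketch. The function and macro names are hypothetical (they are
not the names this patch adds), and the AC2 field position (bits [7:4] of
ID_MMFR4) is an assumption taken from the Arm architecture manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position: ID_MMFR4.AC2 is bits [7:4]. */
#define SKETCH_ID_MMFR4_AC2_SHIFT 4
#define SKETCH_ID_MMFR4_AC2_MASK  0xfu

/* Hypothetical helper: true if ACTLR2 and HACTLR2 should be implemented. */
bool sketch_has_ac2(uint32_t id_mmfr4)
{
    return ((id_mmfr4 >> SKETCH_ID_MMFR4_AC2_SHIFT)
            & SKETCH_ID_MMFR4_AC2_MASK) != 0;
}

/* Usage: the register values are made up, not read from any real CPU. */
void sketch_ac2_examples(void)
{
    assert(!sketch_has_ac2(0x00000000u)); /* field zero: registers absent */
    assert(sketch_has_ac2(0x00000010u));  /* AC2 == 1: registers present */
    assert(!sketch_has_ac2(0x0000000fu)); /* neighbouring field is ignored */
}
```

Register definitions guarded by a test of this shape are only installed for
CPUs whose ID register advertises the feature.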

Fixes: 0e0456ab8895a5e85
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214175116.9164-22-peter.maydell@linaro.org
---
 target/arm/cpu.h    |  5 +++++
 target/arm/cpu.c    |  1 +
 target/arm/cpu64.c  |  4 ++++
 target/arm/helper.c | 32 +++++++++++++++++++++++---------
 4 files changed, 33 insertions(+), 9 deletions(-)

From: Guenter Roeck <linux@roeck-us.net>

We need to be able to use OHCISysBusState outside hcd-ohci.c, so move it
to its include file.

Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20200217204812.9857-2-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/usb/hcd-ohci.h | 16 ++++++++++++++++
 hw/usb/hcd-ohci.c | 15 ---------------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/hw/usb/hcd-ohci.h b/hw/usb/hcd-ohci.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/usb/hcd-ohci.h
+++ b/hw/usb/hcd-ohci.h
@@ -XXX,XX +XXX,XX @@
 #define HCD_OHCI_H
 
 #include "sysemu/dma.h"
+#include "hw/usb.h"
 
 /* Number of Downstream Ports on the root hub: */
 #define OHCI_MAX_PORTS 15
@@ -XXX,XX +XXX,XX @@ typedef struct OHCIState {
     void (*ohci_die)(struct OHCIState *ohci);
 } OHCIState;
 
+#define TYPE_SYSBUS_OHCI "sysbus-ohci"
+#define SYSBUS_OHCI(obj) OBJECT_CHECK(OHCISysBusState, (obj), TYPE_SYSBUS_OHCI)
+
+typedef struct {
+    /*< private >*/
+    SysBusDevice parent_obj;
+    /*< public >*/
+
+    OHCIState ohci;
+    char *masterbus;
+    uint32_t num_ports;
+    uint32_t firstport;
+    dma_addr_t dma_offset;
+} OHCISysBusState;
+
 extern const VMStateDescription vmstate_ohci_state;
 
 void usb_ohci_init(OHCIState *ohci, DeviceState *dev, uint32_t num_ports,
diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -XXX,XX +XXX,XX @@ void ohci_sysbus_die(struct OHCIState *ohci)
     ohci_bus_stop(ohci);
 }
 
-#define TYPE_SYSBUS_OHCI "sysbus-ohci"
-#define SYSBUS_OHCI(obj) OBJECT_CHECK(OHCISysBusState, (obj), TYPE_SYSBUS_OHCI)
-
-typedef struct {
-    /*< private >*/
-    SysBusDevice parent_obj;
-    /*< public >*/
-
-    OHCIState ohci;
-    char *masterbus;
-    uint32_t num_ports;
-    uint32_t firstport;
-    dma_addr_t dma_offset;
-} OHCISysBusState;
-
 static void ohci_realize_pxa(DeviceState *dev, Error **errp)
 {
     OHCISysBusState *s = SYSBUS_OHCI(dev);
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

Instantiate EHCI and OHCI controllers on Allwinner A10. OHCI ports are
modeled as companions of the respective EHCI ports.

With this patch applied, USB controllers are discovered and instantiated
when booting the cubieboard machine with a recent Linux kernel.

ehci-platform 1c14000.usb: EHCI Host Controller
ehci-platform 1c14000.usb: new USB bus registered, assigned bus number 1
ehci-platform 1c14000.usb: irq 26, io mem 0x01c14000
ehci-platform 1c14000.usb: USB 2.0 started, EHCI 1.00
ehci-platform 1c1c000.usb: EHCI Host Controller
ehci-platform 1c1c000.usb: new USB bus registered, assigned bus number 2
ehci-platform 1c1c000.usb: irq 31, io mem 0x01c1c000
ehci-platform 1c1c000.usb: USB 2.0 started, EHCI 1.00
ohci-platform 1c14400.usb: Generic Platform OHCI controller
ohci-platform 1c14400.usb: new USB bus registered, assigned bus number 3
ohci-platform 1c14400.usb: irq 27, io mem 0x01c14400
ohci-platform 1c1c400.usb: Generic Platform OHCI controller
ohci-platform 1c1c400.usb: new USB bus registered, assigned bus number 4
ohci-platform 1c1c400.usb: irq 32, io mem 0x01c1c400
usb 2-1: new high-speed USB device number 2 using ehci-platform
usb-storage 2-1:1.0: USB Mass Storage device detected
scsi host1: usb-storage 2-1:1.0
usb 3-1: new full-speed USB device number 2 using ohci-platform
input: QEMU QEMU USB Mouse as /devices/platform/soc/1c14400.usb/usb3/3-1/3-1:1.0/0003:0627:0001.0001/input/input0
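
The MMIO addresses in the log above follow from the base addresses and the
per-controller stride used in the patch. A small standalone sketch of that
arithmetic (the SKETCH_USB_STRIDE name and the helper functions are invented
for this note; the numeric values are the ones in the diff):

```c
#include <assert.h>
#include <stdint.h>

#define AW_A10_EHCI_BASE   0x01c14000u
#define AW_A10_OHCI_BASE   0x01c14400u
#define SKETCH_USB_STRIDE  0x8000u  /* hypothetical name for "i * 0x8000" */

/* MMIO base of the i-th EHCI controller. */
uint32_t sketch_ehci_addr(unsigned i)
{
    return AW_A10_EHCI_BASE + i * SKETCH_USB_STRIDE;
}

/* MMIO base of the i-th OHCI companion controller. */
uint32_t sketch_ohci_addr(unsigned i)
{
    return AW_A10_OHCI_BASE + i * SKETCH_USB_STRIDE;
}

/* Cross-check against the device names printed by the kernel. */
void sketch_check_against_log(void)
{
    assert(sketch_ehci_addr(0) == 0x01c14000u); /* 1c14000.usb */
    assert(sketch_ehci_addr(1) == 0x01c1c000u); /* 1c1c000.usb */
    assert(sketch_ohci_addr(0) == 0x01c14400u); /* 1c14400.usb */
    assert(sketch_ohci_addr(1) == 0x01c1c400u); /* 1c1c400.usb */
}
```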

Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-id: 20200217204812.9857-4-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/allwinner-a10.h |  6 +++++
 hw/arm/allwinner-a10.c         | 43 ++++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/include/hw/arm/allwinner-a10.h b/include/hw/arm/allwinner-a10.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/allwinner-a10.h
+++ b/include/hw/arm/allwinner-a10.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/intc/allwinner-a10-pic.h"
 #include "hw/net/allwinner_emac.h"
 #include "hw/ide/ahci.h"
+#include "hw/usb/hcd-ohci.h"
+#include "hw/usb/hcd-ehci.h"
 
 #include "target/arm/cpu.h"
 
 
 #define AW_A10_SDRAM_BASE       0x40000000
 
+#define AW_A10_NUM_USB          2
+
 #define TYPE_AW_A10 "allwinner-a10"
 #define AW_A10(obj) OBJECT_CHECK(AwA10State, (obj), TYPE_AW_A10)
 
@@ -XXX,XX +XXX,XX @@ typedef struct AwA10State {
     AwEmacState emac;
     AllwinnerAHCIState sata;
     MemoryRegion sram_a;
+    EHCISysBusState ehci[AW_A10_NUM_USB];
+    OHCISysBusState ohci[AW_A10_NUM_USB];
 } AwA10State;
 
 #endif
diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/allwinner-a10.c
+++ b/hw/arm/allwinner-a10.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/arm/allwinner-a10.h"
 #include "hw/misc/unimp.h"
 #include "sysemu/sysemu.h"
+#include "hw/boards.h"
+#include "hw/usb/hcd-ohci.h"
 
 #define AW_A10_PIC_REG_BASE     0x01c20400
 #define AW_A10_PIT_REG_BASE     0x01c20c00
 #define AW_A10_UART0_REG_BASE   0x01c28000
 #define AW_A10_EMAC_BASE        0x01c0b000
+#define AW_A10_EHCI_BASE        0x01c14000
+#define AW_A10_OHCI_BASE        0x01c14400
 #define AW_A10_SATA_BASE        0x01c18000
 
 static void aw_a10_init(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void aw_a10_init(Object *obj)
 
     sysbus_init_child_obj(obj, "sata", &s->sata, sizeof(s->sata),
                           TYPE_ALLWINNER_AHCI);
+
+    if (machine_usb(current_machine)) {
+        int i;
+
+        for (i = 0; i < AW_A10_NUM_USB; i++) {
+            sysbus_init_child_obj(obj, "ehci[*]", OBJECT(&s->ehci[i]),
+                                  sizeof(s->ehci[i]), TYPE_PLATFORM_EHCI);
+            sysbus_init_child_obj(obj, "ohci[*]", OBJECT(&s->ohci[i]),
+                                  sizeof(s->ohci[i]), TYPE_SYSBUS_OHCI);
+        }
+    }
 }
 
 static void aw_a10_realize(DeviceState *dev, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void aw_a10_realize(DeviceState *dev, Error **errp)
     serial_mm_init(get_system_memory(), AW_A10_UART0_REG_BASE, 2,
                    qdev_get_gpio_in(dev, 1),
                    115200, serial_hd(0), DEVICE_NATIVE_ENDIAN);
+
+    if (machine_usb(current_machine)) {
+        int i;
+
+        for (i = 0; i < AW_A10_NUM_USB; i++) {
+            char bus[16];
+
+            sprintf(bus, "usb-bus.%d", i);
+
+            object_property_set_bool(OBJECT(&s->ehci[i]), true,
+                                     "companion-enable", &error_fatal);
+            object_property_set_bool(OBJECT(&s->ehci[i]), true, "realized",
+                                     &error_fatal);
+            sysbus_mmio_map(SYS_BUS_DEVICE(&s->ehci[i]), 0,
+                            AW_A10_EHCI_BASE + i * 0x8000);
+            sysbus_connect_irq(SYS_BUS_DEVICE(&s->ehci[i]), 0,
+                               qdev_get_gpio_in(dev, 39 + i));
+
+            object_property_set_str(OBJECT(&s->ohci[i]), bus, "masterbus",
+                                    &error_fatal);
+            object_property_set_bool(OBJECT(&s->ohci[i]), true, "realized",
+                                     &error_fatal);
+            sysbus_mmio_map(SYS_BUS_DEVICE(&s->ohci[i]), 0,
+                            AW_A10_OHCI_BASE + i * 0x8000);
+            sysbus_connect_irq(SYS_BUS_DEVICE(&s->ohci[i]), 0,
+                               qdev_get_gpio_in(dev, 64 + i));
+        }
+    }
 }
 
 static void aw_a10_class_init(ObjectClass *oc, void *data)
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift.  This
requires several masks and selects in addition to the actual
shifts to form the complete answer.

That said, the operation is still a small improvement even for
two 64-bit elements -- 13 vector operations instead of 2 * 7
integer operations.
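
For reference, the per-element behaviour being vectorized can be sketched in
plain C. This mirrors the 64-bit helpers that this patch removes from
neon_helper.c; the sketch_ names are invented here, and the code assumes the
compiler truncates on conversion to int8_t and implements >> on negative
signed values as an arithmetic shift (as QEMU itself assumes).

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned form: an out-of-range shift in either direction yields 0. */
uint64_t sketch_ushl64(uint64_t val, uint64_t shiftop)
{
    int8_t shift = (int8_t)shiftop; /* low byte only, read as signed */

    if (shift >= 64 || shift <= -64) {
        return 0;
    }
    return shift < 0 ? val >> -shift : val << shift;
}

/* Signed form: a large right shift saturates to the sign (0 or -1). */
int64_t sketch_sshl64(int64_t val, uint64_t shiftop)
{
    int8_t shift = (int8_t)shiftop;

    if (shift >= 64) {
        return 0;
    }
    if (shift <= -64) {
        return val < 0 ? -1 : 0;
    }
    if (shift < 0) {
        return val >> -shift;       /* assumed arithmetic shift */
    }
    return (int64_t)((uint64_t)val << shift);
}

/* Usage: negative counts are passed as their two's-complement low byte. */
void sketch_shl_examples(void)
{
    assert(sketch_ushl64(1, 4) == 16);
    assert(sketch_ushl64(16, 0xfc) == 1);   /* 0xfc is -4: shift right */
    assert(sketch_sshl64(-16, 0xfe) == -4); /* 0xfe is -2 */
    assert(sketch_sshl64(-1, 0x80) == -1);  /* -128: saturated right shift */
}
```

The vector expansion has to reproduce each of these branches with masks and
selects, which is where the extra operations come from.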

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200216214232.4230-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  11 +-
 target/arm/translate.h     |   6 +
 target/arm/neon_helper.c   |  33 ----
 target/arm/translate-a64.c |  18 +--
 target/arm/translate.c     | 299 +++++++++++++++++++++++++++++++++++--
 target/arm/vec_helper.c    |  88 +++++++++++
 6 files changed, 389 insertions(+), 66 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(neon_abd_s16, i32, i32, i32)
 DEF_HELPER_2(neon_abd_u32, i32, i32, i32)
 DEF_HELPER_2(neon_abd_s32, i32, i32, i32)
 
-DEF_HELPER_2(neon_shl_u8, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_shl_u16, i32, i32, i32)
 DEF_HELPER_2(neon_shl_s16, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u64, i64, i64, i64)
-DEF_HELPER_2(neon_shl_s64, i64, i64, i64)
 DEF_HELPER_2(neon_rshl_u8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_u16, i32, i32, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_2(frint64_s, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(frint32_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 DEF_HELPER_FLAGS_2(frint64_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 
+DEF_HELPER_FLAGS_4(gvec_sshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ uint64_t vfp_expand_imm(int size, uint8_t imm8);
 extern const GVecGen3 mla_op[4];
 extern const GVecGen3 mls_op[4];
 extern const GVecGen3 cmtst_op[4];
+extern const GVecGen3 sshl_op[4];
+extern const GVecGen3 ushl_op[4];
 extern const GVecGen2i ssra_op[4];
 extern const GVecGen2i usra_op[4];
 extern const GVecGen2i sri_op[4];
@@ -XXX,XX +XXX,XX @@ extern const GVecGen4 sqadd_op[4];
 extern const GVecGen4 uqsub_op[4];
 extern const GVecGen4 sqsub_op[4];
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 /*
  * Forward to the isar_feature_* tests given a DisasContext pointer.
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -XXX,XX +XXX,XX @@ NEON_VOP(abd_u32, neon_u32, 1)
     } else { \
         dest = src1 << tmp; \
     }} while (0)
-NEON_VOP(shl_u8, neon_u8, 4)
 NEON_VOP(shl_u16, neon_u16, 2)
-NEON_VOP(shl_u32, neon_u32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
-{
-    int8_t shift = (int8_t)shiftop;
-    if (shift >= 64 || shift <= -64) {
-        val = 0;
-    } else if (shift < 0) {
-        val >>= -shift;
-    } else {
-        val <<= shift;
-    }
-    return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
@@ -XXX,XX +XXX,XX @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
     } else { \
         dest = src1 << tmp; \
     }} while (0)
-NEON_VOP(shl_s8, neon_s8, 4)
 NEON_VOP(shl_s16, neon_s16, 2)
-NEON_VOP(shl_s32, neon_s32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop)
-{
-    int8_t shift = (int8_t)shiftop;
-    int64_t val = valop;
-    if (shift >= 64) {
-        val = 0;
-    } else if (shift <= -64) {
-        val >>= 63;
-    } else if (shift < 0) {
-        val >>= -shift;
-    } else {
-        val <<= shift;
-    }
-    return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
     int8_t tmp; \
     tmp = (int8_t)src2; \
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
         break;
     case 0x8: /* SSHL, USHL */
         if (u) {
-            gen_helper_neon_shl_u64(tcg_rd, tcg_rn, tcg_rm);
+            gen_ushl_i64(tcg_rd, tcg_rn, tcg_rm);
         } else {
-            gen_helper_neon_shl_s64(tcg_rd, tcg_rn, tcg_rm);
+            gen_sshl_i64(tcg_rd, tcg_rn, tcg_rm);
         }
         break;
     case 0x9: /* SQSHL, UQSHL */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                        is_q ? 16 : 8, vec_full_reg_size(s),
                        (u ? uqsub_op : sqsub_op) + size);
         return;
+    case 0x08: /* SSHL, USHL */
+        gen_gvec_op3(s, is_q, rd, rn, rm,
+                     u ? &ushl_op[size] : &sshl_op[size]);
+        return;
     case 0x0c: /* SMAX, UMAX */
         if (u) {
             gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genfn = fns[size][u];
                 break;
             }
-            case 0x8: /* SSHL, USHL */
-            {
-                static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 },
-                    { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 },
-                    { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 },
-                };
-                genfn = fns[size][u];
-                break;
-            }
             case 0x9: /* SQSHL, UQSHL */
             {
                 static NeonGenTwoOpEnvFn * const fns[3][2] = {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_shift_narrow(int size, TCGv_i32 var, TCGv_i32 shift,
         if (u) {
             switch (size) {
             case 1: gen_helper_neon_shl_u16(var, var, shift); break;
-            case 2: gen_helper_neon_shl_u32(var, var, shift); break;
+            case 2: gen_ushl_i32(var, var, shift); break;
             default: abort();
             }
         } else {
             switch (size) {
             case 1: gen_helper_neon_shl_s16(var, var, shift); break;
-            case 2: gen_helper_neon_shl_s32(var, var, shift); break;
+            case 2: gen_sshl_i32(var, var, shift); break;
             default: abort();
             }
         }
@@ -XXX,XX +XXX,XX @@ const GVecGen3 cmtst_op[4] = {
       .vece = MO_64 },
 };
 
+void gen_ushl_i32(TCGv_i32 dst, TCGv_i32 src, TCGv_i32 shift)
+{
+    TCGv_i32 lval = tcg_temp_new_i32();
+    TCGv_i32 rval = tcg_temp_new_i32();
+    TCGv_i32 lsh = tcg_temp_new_i32();
+    TCGv_i32 rsh = tcg_temp_new_i32();
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 max = tcg_const_i32(32);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i32(lsh, shift);
+    tcg_gen_neg_i32(rsh, lsh);
+    tcg_gen_shl_i32(lval, src, lsh);
+    tcg_gen_shr_i32(rval, src, rsh);
+    tcg_gen_movcond_i32(TCG_COND_LTU, dst, lsh, max, lval, zero);
+    tcg_gen_movcond_i32(TCG_COND_LTU, dst, rsh, max, rval, dst);
+
+    tcg_temp_free_i32(lval);
+    tcg_temp_free_i32(rval);
+    tcg_temp_free_i32(lsh);
+    tcg_temp_free_i32(rsh);
+    tcg_temp_free_i32(zero);
+    tcg_temp_free_i32(max);
+}
+
+void gen_ushl_i64(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 shift)
+{
+    TCGv_i64 lval = tcg_temp_new_i64();
+    TCGv_i64 rval = tcg_temp_new_i64();
+    TCGv_i64 lsh = tcg_temp_new_i64();
+    TCGv_i64 rsh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 max = tcg_const_i64(64);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i64(lsh, shift);
+    tcg_gen_neg_i64(rsh, lsh);
+    tcg_gen_shl_i64(lval, src, lsh);
+    tcg_gen_shr_i64(rval, src, rsh);
+    tcg_gen_movcond_i64(TCG_COND_LTU, dst, lsh, max, lval, zero);
+    tcg_gen_movcond_i64(TCG_COND_LTU, dst, rsh, max, rval, dst);
+
+    tcg_temp_free_i64(lval);
+    tcg_temp_free_i64(rval);
+    tcg_temp_free_i64(lsh);
+    tcg_temp_free_i64(rsh);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(max);
+}
+
+static void gen_ushl_vec(unsigned vece, TCGv_vec dst,
+                         TCGv_vec src, TCGv_vec shift)
+{
+    TCGv_vec lval = tcg_temp_new_vec_matching(dst);
+    TCGv_vec rval = tcg_temp_new_vec_matching(dst);
+    TCGv_vec lsh = tcg_temp_new_vec_matching(dst);
+    TCGv_vec rsh = tcg_temp_new_vec_matching(dst);
+    TCGv_vec msk, max;
+
+    tcg_gen_neg_vec(vece, rsh, shift);
+    if (vece == MO_8) {
+        tcg_gen_mov_vec(lsh, shift);
+    } else {
+        msk = tcg_temp_new_vec_matching(dst);
+        tcg_gen_dupi_vec(vece, msk, 0xff);
+        tcg_gen_and_vec(vece, lsh, shift, msk);
+        tcg_gen_and_vec(vece, rsh, rsh, msk);
+        tcg_temp_free_vec(msk);
+    }
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_shlv_vec(vece, lval, src, lsh);
+    tcg_gen_shrv_vec(vece, rval, src, rsh);
+
+    max = tcg_temp_new_vec_matching(dst);
+    tcg_gen_dupi_vec(vece, max, 8 << vece);
+
+    /*
+     * The choice of LT (signed) and GEU (unsigned) are biased toward
+     * the instructions of the x86_64 host.  For MO_8, the whole byte
+     * is significant so we must use an unsigned compare; otherwise we
+     * have already masked to a byte and so a signed compare works.
+     * Other tcg hosts have a full set of comparisons and do not care.
+     */
+    if (vece == MO_8) {
+        tcg_gen_cmp_vec(TCG_COND_GEU, vece, lsh, lsh, max);
+        tcg_gen_cmp_vec(TCG_COND_GEU, vece, rsh, rsh, max);
+        tcg_gen_andc_vec(vece, lval, lval, lsh);
+        tcg_gen_andc_vec(vece, rval, rval, rsh);
+    } else {
+        tcg_gen_cmp_vec(TCG_COND_LT, vece, lsh, lsh, max);
+        tcg_gen_cmp_vec(TCG_COND_LT, vece, rsh, rsh, max);
+        tcg_gen_and_vec(vece, lval, lval, lsh);
+        tcg_gen_and_vec(vece, rval, rval, rsh);
+    }
+    tcg_gen_or_vec(vece, dst, lval, rval);
+
+    tcg_temp_free_vec(max);
+    tcg_temp_free_vec(lval);
+    tcg_temp_free_vec(rval);
+    tcg_temp_free_vec(lsh);
+    tcg_temp_free_vec(rsh);
+}
+
+static const TCGOpcode ushl_list[] = {
+    INDEX_op_neg_vec, INDEX_op_shlv_vec,
+    INDEX_op_shrv_vec, INDEX_op_cmp_vec, 0
+};
+
+const GVecGen3 ushl_op[4] = {
+    { .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_b,
+      .opt_opc = ushl_list,
+      .vece = MO_8 },
+    { .fniv = gen_ushl_vec,
+      .fno = gen_helper_gvec_ushl_h,
+      .opt_opc = ushl_list,
+      .vece = MO_16 },
+    { .fni4 = gen_ushl_i32,
+      .fniv = gen_ushl_vec,
+      .opt_opc = ushl_list,
+      .vece = MO_32 },
+    { .fni8 = gen_ushl_i64,
+      .fniv = gen_ushl_vec,
+      .opt_opc = ushl_list,
+      .vece = MO_64 },
+};
+
+void gen_sshl_i32(TCGv_i32 dst, TCGv_i32 src, TCGv_i32 shift)
+{
+    TCGv_i32 lval = tcg_temp_new_i32();
+    TCGv_i32 rval = tcg_temp_new_i32();
+    TCGv_i32 lsh = tcg_temp_new_i32();
+    TCGv_i32 rsh = tcg_temp_new_i32();
+    TCGv_i32 zero = tcg_const_i32(0);
+    TCGv_i32 max = tcg_const_i32(31);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i32(lsh, shift);
+    tcg_gen_neg_i32(rsh, lsh);
+    tcg_gen_shl_i32(lval, src, lsh);
+    tcg_gen_umin_i32(rsh, rsh, max);
+    tcg_gen_sar_i32(rval, src, rsh);
+    tcg_gen_movcond_i32(TCG_COND_LEU, lval, lsh, max, lval, zero);
+    tcg_gen_movcond_i32(TCG_COND_LT, dst, lsh, zero, rval, lval);
+
+    tcg_temp_free_i32(lval);
+    tcg_temp_free_i32(rval);
+    tcg_temp_free_i32(lsh);
+    tcg_temp_free_i32(rsh);
+    tcg_temp_free_i32(zero);
+    tcg_temp_free_i32(max);
+}
+
+void gen_sshl_i64(TCGv_i64 dst, TCGv_i64 src, TCGv_i64 shift)
+{
+    TCGv_i64 lval = tcg_temp_new_i64();
+    TCGv_i64 rval = tcg_temp_new_i64();
+    TCGv_i64 lsh = tcg_temp_new_i64();
+    TCGv_i64 rsh = tcg_temp_new_i64();
+    TCGv_i64 zero = tcg_const_i64(0);
+    TCGv_i64 max = tcg_const_i64(63);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_ext8s_i64(lsh, shift);
+    tcg_gen_neg_i64(rsh, lsh);
+    tcg_gen_shl_i64(lval, src, lsh);
+    tcg_gen_umin_i64(rsh, rsh, max);
+    tcg_gen_sar_i64(rval, src, rsh);
+    tcg_gen_movcond_i64(TCG_COND_LEU, lval, lsh, max, lval, zero);
+    tcg_gen_movcond_i64(TCG_COND_LT, dst, lsh, zero, rval, lval);
+
+    tcg_temp_free_i64(lval);
+    tcg_temp_free_i64(rval);
+    tcg_temp_free_i64(lsh);
+    tcg_temp_free_i64(rsh);
+    tcg_temp_free_i64(zero);
+    tcg_temp_free_i64(max);
+}
+
+static void gen_sshl_vec(unsigned vece, TCGv_vec dst,
+                         TCGv_vec src, TCGv_vec shift)
+{
+    TCGv_vec lval = tcg_temp_new_vec_matching(dst);
+    TCGv_vec rval = tcg_temp_new_vec_matching(dst);
+    TCGv_vec lsh = tcg_temp_new_vec_matching(dst);
+    TCGv_vec rsh = tcg_temp_new_vec_matching(dst);
+    TCGv_vec tmp = tcg_temp_new_vec_matching(dst);
+
+    /*
+     * Rely on the TCG guarantee that out of range shifts produce
+     * unspecified results, not undefined behaviour (i.e. no trap).
+     * Discard out-of-range results after the fact.
+     */
+    tcg_gen_neg_vec(vece, rsh, shift);
+    if (vece == MO_8) {
+        tcg_gen_mov_vec(lsh, shift);
+    } else {
+        tcg_gen_dupi_vec(vece, tmp, 0xff);
+        tcg_gen_and_vec(vece, lsh, shift, tmp);
+        tcg_gen_and_vec(vece, rsh, rsh, tmp);
+    }
+
+    /* Bound rsh so out of bound right shift gets -1.  */
+    tcg_gen_dupi_vec(vece, tmp, (8 << vece) - 1);
+    tcg_gen_umin_vec(vece, rsh, rsh, tmp);
+    tcg_gen_cmp_vec(TCG_COND_GT, vece, tmp, lsh, tmp);
+
+    tcg_gen_shlv_vec(vece, lval, src, lsh);
+    tcg_gen_sarv_vec(vece, rval, src, rsh);
+
+    /* Select in-bound left shift.  */
+    tcg_gen_andc_vec(vece, lval, lval, tmp);
+
+    /* Select between left and right shift.  */
+    if (vece == MO_8) {
+        tcg_gen_dupi_vec(vece, tmp, 0);
+        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, dst, lsh, tmp, rval, lval);
+    } else {
+        tcg_gen_dupi_vec(vece, tmp, 0x80);
+        tcg_gen_cmpsel_vec(TCG_COND_LT, vece, dst, lsh, tmp, lval, rval);
+    }
+
+    tcg_temp_free_vec(lval);
+    tcg_temp_free_vec(rval);
+    tcg_temp_free_vec(lsh);
+    tcg_temp_free_vec(rsh);
+    tcg_temp_free_vec(tmp);
+}
+
+static const TCGOpcode sshl_list[] = {
+    INDEX_op_neg_vec, INDEX_op_umin_vec, INDEX_op_shlv_vec,
+    INDEX_op_sarv_vec, INDEX_op_cmp_vec, INDEX_op_cmpsel_vec, 0
+};
+
+const GVecGen3 sshl_op[4] = {
+    { .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_b,
+      .opt_opc = sshl_list,
+      .vece = MO_8 },
+    { .fniv = gen_sshl_vec,
+      .fno = gen_helper_gvec_sshl_h,
+      .opt_opc = sshl_list,
+      .vece = MO_16 },
+    { .fni4 = gen_sshl_i32,
+      .fniv = gen_sshl_vec,
+      .opt_opc = sshl_list,
+      .vece = MO_32 },
+    { .fni8 = gen_sshl_i64,
+      .fniv = gen_sshl_vec,
+      .opt_opc = sshl_list,
+      .vece = MO_64 },
+};
+
 static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
                           TCGv_vec a, TCGv_vec b)
 {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                   vec_size, vec_size);
             }
             return 0;
+
+        case NEON_3R_VSHL:
+            /* Note the operation is vshl vd,vm,vn */
+            tcg_gen_gvec_3(rd_ofs, rm_ofs, rn_ofs, vec_size, vec_size,
+                           u ? &ushl_op[size] : &sshl_op[size]);
+            return 0;
         }
 
         if (size == 3) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 neon_load_reg64(cpu_V0, rn + pass);
                 neon_load_reg64(cpu_V1, rm + pass);
                 switch (op) {
-                case NEON_3R_VSHL:
-                    if (u) {
-                        gen_helper_neon_shl_u64(cpu_V0, cpu_V1, cpu_V0);
-                    } else {
-                        gen_helper_neon_shl_s64(cpu_V0, cpu_V1, cpu_V0);
-                    }
-                    break;
                 case NEON_3R_VQSHL:
                     if (u) {
                         gen_helper_neon_qshl_u64(cpu_V0, cpu_env,
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         }
         pairwise = 0;
         switch (op) {
-        case NEON_3R_VSHL:
         case NEON_3R_VQSHL:
         case NEON_3R_VRSHL:
         case NEON_3R_VQRSHL:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         case NEON_3R_VHSUB:
             GEN_NEON_INTEGER_OP(hsub);
             break;
-        case NEON_3R_VSHL:
-            GEN_NEON_INTEGER_OP(shl);
-            break;
         case NEON_3R_VQSHL:
             GEN_NEON_INTEGER_OP_ENV(qshl);
             break;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             }
                         } else {
                             if (input_unsigned) {
-                                gen_helper_neon_shl_u64(cpu_V0, in, tmp64);
+                                gen_ushl_i64(cpu_V0, in, tmp64);
                             } else {
-                                gen_helper_neon_shl_s64(cpu_V0, in, tmp64);
+                                gen_sshl_i64(cpu_V0, in, tmp64);
                             }
                         }
                         tmp = tcg_temp_new_i32();
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_fmlal_idx_a64)(void *vd, void *vn, void *vm,
     do_fmlal_idx(vd, vn, vm, &env->vfp.fp_status, desc,
                  get_flush_inputs_to_zero(&env->vfp.fp_status_f16));
 }
+
+void HELPER(gvec_sshl_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int8_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        int8_t mm = m[i];
+        int8_t nn = n[i];
+        int8_t res = 0;
+        if (mm >= 0) {
+            if (mm < 8) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -8 ? -mm : 7);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_sshl_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    int16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        int16_t nn = n[i];
+        int16_t res = 0;
+        if (mm >= 0) {
+            if (mm < 16) {
+                res = nn << mm;
+            }
+        } else {
+            res = nn >> (mm > -16 ? -mm : 15);
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_b)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint8_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz; ++i) {
+        int8_t mm = m[i];
+        uint8_t nn = n[i];
+        uint8_t res = 0;
+        if (mm >= 0) {
+            if (mm < 8) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -8) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint16_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 2; ++i) {
+        int8_t mm = m[i];   /* only 8 bits of shift are significant */
+        uint16_t nn = n[i];
+        uint16_t res = 0;
+        if (mm >= 0) {
+            if (mm < 16) {
+                res = nn << mm;
+            }
+        } else {
+            if (mm > -16) {
+                res = nn >> -mm;
+            }
+        }
+        d[i] = res;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The gvec form will be needed for implementing SVE2.

Extend the implementation to operate on uint64_t instead of uint32_t.
Use a counted inner loop instead of terminating when op1 goes to zero,
looking toward the required implementation for ARMv8.4-DIT.
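
As a rough illustration of the counted-loop approach described above (a
sketch only, not the code from this patch), eight byte-wide polynomial
multiplies can be packed into one uint64_t. The function name pmul8x8 is
hypothetical. The loop always runs eight iterations, so its trip count does
not depend on the operand values:

```c
#include <stdint.h>

/*
 * Illustrative sketch: eight independent 8x8->8 polynomial (carry-less)
 * multiplies, one per byte lane of a uint64_t.  The name is hypothetical.
 */
static uint64_t pmul8x8(uint64_t n, uint64_t m)
{
    uint64_t r = 0;
    int j;

    /* Counted loop: always 8 iterations, whatever the value of n. */
    for (j = 0; j < 8; ++j) {
        /* 0xff in every byte lane whose current low bit of n is set. */
        uint64_t mask = (n & 0x0101010101010101ull) * 0xff;

        /* Partial products are XORed, not added. */
        r ^= m & mask;

        /* Shift each lane of m left by one, dropping cross-lane carries. */
        m = (m << 1) & 0xfefefefefefefefeull;
        n >>= 1;
    }
    return r;
}
```

For example, with lane 0 holding 0x03 * 0x03 and lane 1 holding 0x02 * 0xff,
pmul8x8(0x0203, 0xff03) yields 0xfe05.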

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200216214232.4230-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  3 ++-
 target/arm/neon_helper.c   | 22 ----------------------
 target/arm/translate-a64.c | 10 +++-------
 target/arm/translate.c     | 11 ++++-------
 target/arm/vec_helper.c    | 30 ++++++++++++++++++++++++++++++
 5 files changed, 39 insertions(+), 37 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

The gvec form will be needed for implementing SVE2.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200216214232.4230-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  4 +---
 target/arm/neon_helper.c   | 30 ------------------------------
 target/arm/translate-a64.c | 28 +++-------------------------
 target/arm/translate.c     | 16 ++--------------
 target/arm/vec_helper.c    | 33 +++++++++++++++++++++++++++++++++
 5 files changed, 39 insertions(+), 72 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_3(crc32, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
 DEF_HELPER_FLAGS_3(crc32c, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
 DEF_HELPER_2(dc_zva, void, env, i64)
 
-DEF_HELPER_FLAGS_2(neon_pmull_64_lo, TCG_CALL_NO_RWG_SE, i64, i64, i64)
-DEF_HELPER_FLAGS_2(neon_pmull_64_hi, TCG_CALL_NO_RWG_SE, i64, i64, i64)
-
 DEF_HELPER_FLAGS_5(gvec_qrdmlah_s16, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_qrdmlsh_s16, TCG_CALL_NO_RWG,
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(gvec_pmul_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_pmull_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(neon_zip16)(void *vd, void *vm)
     rm[0] = m0;
     rd[0] = d0;
 }
-
-/* Helper function for 64 bit polynomial multiply case:
- * perform PolynomialMult(op1, op2) and return either the top or
- * bottom half of the 128 bit result.
- */
-uint64_t HELPER(neon_pmull_64_lo)(uint64_t op1, uint64_t op2)
-{
-    int bitnum;
-    uint64_t res = 0;
-
-    for (bitnum = 0; bitnum < 64; bitnum++) {
-        if (op1 & (1ULL << bitnum)) {
-            res ^= op2 << bitnum;
-        }
-    }
-    return res;
-}
-uint64_t HELPER(neon_pmull_64_hi)(uint64_t op1, uint64_t op2)
-{
-    int bitnum;
-    uint64_t res = 0;
-
-    /* bit 0 of op1 can't influence the high 64 bits at all */
-    for (bitnum = 1; bitnum < 64; bitnum++) {
-        if (op1 & (1ULL << bitnum)) {
-            res ^= op2 >> (64 - bitnum);
-        }
-    }
-    return res;
-}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_3rd_narrowing(DisasContext *s, int is_q, int is_u, int size,
     clear_vec_high(s, is_q, rd);
 }
 
-static void handle_pmull_64(DisasContext *s, int is_q, int rd, int rn, int rm)
-{
-    /* PMULL of 64 x 64 -> 128 is an odd special case because it
-     * is the only three-reg-diff instruction which produces a
-     * 128-bit wide result from a single operation. However since
-     * it's possible to calculate the two halves more or less
-     * separately we just use two helper calls.
-     */
-    TCGv_i64 tcg_op1 = tcg_temp_new_i64();
-    TCGv_i64 tcg_op2 = tcg_temp_new_i64();
-    TCGv_i64 tcg_res = tcg_temp_new_i64();
-
-    read_vec_element(s, tcg_op1, rn, is_q, MO_64);
-    read_vec_element(s, tcg_op2, rm, is_q, MO_64);
-    gen_helper_neon_pmull_64_lo(tcg_res, tcg_op1, tcg_op2);
-    write_vec_element(s, tcg_res, rd, 0, MO_64);
-    gen_helper_neon_pmull_64_hi(tcg_res, tcg_op1, tcg_op2);
-    write_vec_element(s, tcg_res, rd, 1, MO_64);
-
-    tcg_temp_free_i64(tcg_op1);
-    tcg_temp_free_i64(tcg_op2);
-    tcg_temp_free_i64(tcg_res);
-}
-
 /* AdvSIMD three different
  *   31  30  29 28       24 23  22  21 20  16 15    12 11 10 9    5 4    0
  * +---+---+---+-----------+------+---+------+--------+-----+------+------+
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
             if (!fp_access_check(s)) {
                 return;
             }
-            handle_pmull_64(s, is_q, rd, rn, rm);
+            /* The Q field specifies lo/hi half input for this insn.  */
+            gen_gvec_op3_ool(s, true, rd, rn, rm, is_q,
+                             gen_helper_gvec_pmull_q);
             return;
         }
         goto is_widening;
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  * outside the loop below as it only performs a single pass.
                  */
                 if (op == 14 && size == 2) {
-                    TCGv_i64 tcg_rn, tcg_rm, tcg_rd;
-
                     if (!dc_isar_feature(aa32_pmull, s)) {
                         return 1;
                     }
-                    tcg_rn = tcg_temp_new_i64();
-                    tcg_rm = tcg_temp_new_i64();
-                    tcg_rd = tcg_temp_new_i64();
-                    neon_load_reg64(tcg_rn, rn);
-                    neon_load_reg64(tcg_rm, rm);
-                    gen_helper_neon_pmull_64_lo(tcg_rd, tcg_rn, tcg_rm);
-                    neon_store_reg64(tcg_rd, rd);
-                    gen_helper_neon_pmull_64_hi(tcg_rd, tcg_rn, tcg_rm);
-                    neon_store_reg64(tcg_rd, rd + 1);
-                    tcg_temp_free_i64(tcg_rn);
-                    tcg_temp_free_i64(tcg_rm);
-                    tcg_temp_free_i64(tcg_rd);
+                    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
+                                       0, gen_helper_gvec_pmull_q);
                     return 0;
                 }
 
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc)
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+/*
+ * 64x64->128 polynomial multiply.
+ * Because the lanes are not accessed in strict columns,
+ * this probably cannot be turned into a generic helper.
+ */
+void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, j, opr_sz = simd_oprsz(desc);
+    intptr_t hi = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 8; i += 2) {
+        uint64_t nn = n[i + hi];
+        uint64_t mm = m[i + hi];
+        uint64_t rhi = 0;
+        uint64_t rlo = 0;
+
+        /* Bit 0 can only influence the low 64-bit result.  */
+        if (nn & 1) {
+            rlo = mm;
+        }
+
+        for (j = 1; j < 64; ++j) {
+            uint64_t mask = -((nn >> j) & 1);
+            rlo ^= (mm << j) & mask;
+            rhi ^= (mm >> (64 - j)) & mask;
+        }
+        d[i] = rlo;
+        d[i + 1] = rhi;
+    }
+    clear_tail(d, opr_sz, simd_maxsz(desc));
+}
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We still need two different helpers, since NEON and SVE2 get the
inputs from different locations within the source vector.  However,
we can convert both to the same internal form for computation.

The sve2 helper is not used yet, but adding it with this patch
helps illustrate why the neon changes are helpful.
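
One way to sanity-check the shared internal form is to compare it against a
plain scalar 8x8->16 multiply. The sketch below repeats the two static
functions this patch adds to vec_helper.c and pairs them with a reference
routine; pmull8_ref is a hypothetical name used only for illustration and
is not part of the patch:

```c
#include <stdint.h>

/* Hypothetical scalar reference: one 8x8->16 carry-less multiply. */
static uint16_t pmull8_ref(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    int i;

    for (i = 0; i < 8; ++i) {
        if (a & (1u << i)) {
            r ^= (uint16_t)((uint16_t)b << i);
        }
    }
    return r;
}

/* Spread the four low bytes of x into the low byte of each 16-bit lane. */
static uint64_t expand_byte_to_half(uint64_t x)
{
    return  (x & 0x000000ff)
         | ((x & 0x0000ff00) << 8)
         | ((x & 0x00ff0000) << 16)
         | ((x & 0xff000000) << 24);
}

/* Four 8x8->16 multiplies at once; inputs are in expanded form. */
static uint64_t pmull_h(uint64_t op1, uint64_t op2)
{
    uint64_t result = 0;
    int i;

    for (i = 0; i < 8; ++i) {
        /* 0xffff in every 16-bit lane whose current low bit is set. */
        uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff;
        result ^= op2 & mask;
        op1 >>= 1;
        op2 <<= 1;
    }
    return result;
}
```

Each 16-bit lane of pmull_h(expand_byte_to_half(n), expand_byte_to_half(m))
should equal pmull8_ref() applied to the corresponding bytes of n and m.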

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200216214232.4230-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-sve.h    |  2 ++
 target/arm/helper.h        |  3 +-
 target/arm/neon_helper.c   | 32 --------------------
 target/arm/translate-a64.c | 27 +++++++++++------
 target/arm/translate.c     | 26 ++++++++---------
 target/arm/vec_helper.c    | 60 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 95 insertions(+), 55 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_6(sve_stdd_le_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_stdd_be_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve2_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_2(neon_sub_u8, i32, i32, i32)
 DEF_HELPER_2(neon_sub_u16, i32, i32, i32)
 DEF_HELPER_2(neon_mul_u8, i32, i32, i32)
 DEF_HELPER_2(neon_mul_u16, i32, i32, i32)
-DEF_HELPER_2(neon_mull_p8, i64, i32, i32)
 
 DEF_HELPER_2(neon_tst_u8, i32, i32, i32)
 DEF_HELPER_2(neon_tst_u16, i32, i32, i32)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_pmul_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_pmull_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(neon_pmull_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -XXX,XX +XXX,XX @@ NEON_VOP(mul_u8, neon_u8, 4)
 NEON_VOP(mul_u16, neon_u16, 2)
 #undef NEON_FN
 
-/* Polynomial multiplication is like integer multiplication except the
-   partial products are XORed, not added.  */
-uint64_t HELPER(neon_mull_p8)(uint32_t op1, uint32_t op2)
-{
-    uint64_t result = 0;
-    uint64_t mask;
-    uint64_t op2ex = op2;
-    op2ex = (op2ex & 0xff) |
-        ((op2ex & 0xff00) << 8) |
-        ((op2ex & 0xff0000) << 16) |
-        ((op2ex & 0xff000000) << 24);
-    while (op1) {
-        mask = 0;
-        if (op1 & 1) {
-            mask |= 0xffff;
-        }
-        if (op1 & (1 << 8)) {
-            mask |= (0xffffU << 16);
-        }
-        if (op1 & (1 << 16)) {
-            mask |= (0xffffULL << 32);
-        }
-        if (op1 & (1 << 24)) {
-            mask |= (0xffffULL << 48);
-        }
-        result ^= op2ex & mask;
-        op1 = (op1 >> 1) & 0x7f7f7f7f;
-        op2ex <<= 1;
-    }
-    return result;
-}
-
 #define NEON_FN(dest, src1, src2) dest = (src1 & src2) ? -1 : 0
 NEON_VOP(tst_u8, neon_u8, 4)
 NEON_VOP(tst_u16, neon_u16, 2)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size,
                 gen_helper_neon_addl_saturate_s32(tcg_passres, cpu_env,
                                                   tcg_passres, tcg_passres);
                 break;
-            case 14: /* PMULL */
-                assert(size == 0);
-                gen_helper_neon_mull_p8(tcg_passres, tcg_op1, tcg_op2);
-                break;
             default:
                 g_assert_not_reached();
             }
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
         handle_3rd_narrowing(s, is_q, is_u, size, opcode, rd, rn, rm);
         break;
     case 14: /* PMULL, PMULL2 */
-        if (is_u || size == 1 || size == 2) {
+        if (is_u) {
             unallocated_encoding(s);
             return;
         }
-        if (size == 3) {
+        switch (size) {
+        case 0: /* PMULL.P8 */
+            if (!fp_access_check(s)) {
+                return;
+            }
+            /* The Q field specifies lo/hi half input for this insn.  */
+            gen_gvec_op3_ool(s, true, rd, rn, rm, is_q,
+                             gen_helper_neon_pmull_h);
+            break;
+
+        case 3: /* PMULL.P64 */
             if (!dc_isar_feature(aa64_pmull, s)) {
                 unallocated_encoding(s);
                 return;
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
             /* The Q field specifies lo/hi half input for this insn.  */
             gen_gvec_op3_ool(s, true, rd, rn, rm, is_q,
                              gen_helper_gvec_pmull_q);
-            return;
+            break;
+
+        default:
+            unallocated_encoding(s);
+            break;
         }
-        goto is_widening;
+        return;
     case 9: /* SQDMLAL, SQDMLAL2 */
     case 11: /* SQDMLSL, SQDMLSL2 */
     case 13: /* SQDMULL, SQDMULL2 */
@@ -XXX,XX +XXX,XX @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
             unallocated_encoding(s);
             return;
         }
-    is_widening:
         if (!fp_access_check(s)) {
             return;
         }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     return 1;
                 }
 
-                /* Handle VMULL.P64 (Polynomial 64x64 to 128 bit multiply)
-                 * outside the loop below as it only performs a single pass.
-                 */
-                if (op == 14 && size == 2) {
-                    if (!dc_isar_feature(aa32_pmull, s)) {
-                        return 1;
+                /* Handle polynomial VMULL in a single pass.  */
+                if (op == 14) {
+                    if (size == 0) {
+                        /* VMULL.P8 */
+                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
+                                           0, gen_helper_neon_pmull_h);
+                    } else {
+                        /* VMULL.P64 */
+                        if (!dc_isar_feature(aa32_pmull, s)) {
+                            return 1;
+                        }
+                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
+                                           0, gen_helper_gvec_pmull_q);
                     }
-                    tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
-                                       0, gen_helper_gvec_pmull_q);
                     return 0;
                 }
 
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
                         gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
                         break;
-                    case 14: /* Polynomial VMULL */
-                        gen_helper_neon_mull_p8(cpu_V0, tmp, tmp2);
-                        tcg_temp_free_i32(tmp2);
-                        tcg_temp_free_i32(tmp);
-                        break;
                     default: /* 15 is RESERVED: caught earlier  */
                         abort();
                     }
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
     }
     clear_tail(d, opr_sz, simd_maxsz(desc));
 }
+
+/*
+ * 8x8->16 polynomial multiply.
+ *
+ * The byte inputs are expanded to (or extracted from) half-words.
+ * Note that neon and sve2 get the inputs from different positions.
+ * This allows 4 bytes to be processed in parallel with uint64_t.
+ */
+
+static uint64_t expand_byte_to_half(uint64_t x)
+{
+    return  (x & 0x000000ff)
+         | ((x & 0x0000ff00) << 8)
+         | ((x & 0x00ff0000) << 16)
+         | ((x & 0xff000000) << 24);
+}
+
+static uint64_t pmull_h(uint64_t op1, uint64_t op2)
+{
+    uint64_t result = 0;
+    int i;
+
+    for (i = 0; i < 8; ++i) {
+        uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff;
+        result ^= op2 & mask;
+        op1 >>= 1;
+        op2 <<= 1;
+    }
+    return result;
+}
+
+void HELPER(neon_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    int hi = simd_data(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+    uint64_t nn = n[hi], mm = m[hi];
+
+    d[0] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm));
+    nn >>= 32;
+    mm >>= 32;
+    d[1] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm));
+
+    clear_tail(d, 16, simd_maxsz(desc));
+}
+
+#ifdef TARGET_AARCH64
+void HELPER(sve2_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    int shift = simd_data(desc) * 8;
+    intptr_t i, opr_sz = simd_oprsz(desc);
+    uint64_t *d = vd, *n = vn, *m = vm;
+
+    for (i = 0; i < opr_sz / 8; ++i) {
+        uint64_t nn = (n[i] >> shift) & 0x00ff00ff00ff00ffull;
+        uint64_t mm = (m[i] >> shift) & 0x00ff00ff00ff00ffull;
+
+        d[i] = pmull_h(nn, mm);
+    }
+}
+#endif
-- 
2.20.1

From: Francisco Iglesias <francisco.iglesias@xilinx.com>

Correct the number of dummy cycles required by the FAST_READ_4 command:
it needs eight dummy cycles, i.e. one dummy byte.

Fixes: ef06ca3946 ("xilinx_spips: Add support for RX discard and RX drain")
Suggested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 20200218113350.6090-1-frasse.iglesias@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/ssi/xilinx_spips.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/ssi/xilinx_spips.c
+++ b/hw/ssi/xilinx_spips.c
@@ -XXX,XX +XXX,XX @@ static int xilinx_spips_num_dummies(XilinxQSPIPS *qs, uint8_t command)
     case FAST_READ:
     case DOR:
     case QOR:
+    case FAST_READ_4:
     case DOR_4:
     case QOR_4:
         return 1;
     case DIOR:
-    case FAST_READ_4:
     case DIOR_4:
         return 2;
     case QIOR:
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

Booting the r2d machine from flash fails because flash is not discovered.
Looking at the flattened memory tree, we see the following.

FlatView #1
 AS "memory", root: system
 AS "cpu-memory-0", root: system
 AS "sh_pci_host", root: bus master container
 Root memory region: system
  0000000000000000-000000000000ffff (prio 0, i/o): io
  0000000000010000-0000000000ffffff (prio 0, i/o): r2d.flash @0000000000010000

The overlapping memory region is sh_pci.isa, i.e. the ISA I/O region bridge.
This region is initially assigned to address 0xfe240000, but overwritten
with a write into the PCIIOBR register. This write is expected to adjust
the PCI memory window, but not to change the region's base address.

Peter Maydell provided the following detailed explanation.

"Section 22.3.7 and in particular figure 22.3 (of "SSH7751R user's manual:
hardware") are clear about how this is supposed to work: there is a window
at 0xfe240000 in the system register space for PCI I/O space. When the CPU
makes an access into that area, the PCI controller calculates the PCI
address to use by combining bits 0..17 of the system address with the
bits 31..18 value that the guest has put into the PCIIOBR. That is, writing
to the PCIIOBR changes which section of the IO address space is visible in
the 0xfe240000 window. Instead what QEMU's implementation does is move the
window to whatever value the guest writes to the PCIIOBR register -- so if
the guest writes 0 we put the window at 0 in system address space."
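
The address calculation described in the quote can be sketched as follows.
This is an illustration under the assumptions stated in the quote (fixed
window at 0xfe240000, bits 0..17 from the system address, bits 31..18 from
PCIIOBR); the macro and function names are hypothetical and do not appear
in QEMU:

```c
#include <stdint.h>

#define PCI_IO_WINDOW_BASE 0xfe240000u  /* fixed window in system space */
#define PCIIOBR_ADDR_MASK  0xfffc0000u  /* PCIIOBR bits 31..18 */
#define WINDOW_OFFSET_MASK 0x0003ffffu  /* system address bits 17..0 */

/*
 * Combine the offset within the fixed window with the upper address
 * bits the guest programmed into PCIIOBR.  The window itself stays put.
 */
static uint32_t pci_io_address(uint32_t sys_addr, uint32_t pciiobr)
{
    return (pciiobr & PCIIOBR_ADDR_MASK) | (sys_addr & WINDOW_OFFSET_MASK);
}
```

With PCIIOBR set to 0, an access at PCI_IO_WINDOW_BASE + 0x10 maps to PCI
I/O address 0x10; the window remains at 0xfe240000 in system space.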

Fix the problem by calling memory_region_set_alias_offset() instead of
removing and re-adding the PCI ISA subregion on writes into PCIIOBR.
At the same time, in sh_pci_device_realize(), don't set iobr since
it is overwritten later anyway. Instead, pass the base address to
memory_region_add_subregion() directly.

Many thanks to Peter Maydell for the detailed problem analysis, and for
providing suggestions on how to fix the problem.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Message-id: 20200218201050.15273-1-linux@roeck-us.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/sh4/sh_pci.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/sh4/sh_pci.c b/hw/sh4/sh_pci.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/sh4/sh_pci.c
+++ b/hw/sh4/sh_pci.c
@@ -XXX,XX +XXX,XX @@ static void sh_pci_reg_write (void *p, hwaddr addr, uint64_t val,
         pcic->mbr = val & 0xff000001;
         break;
     case 0x1c8:
-        if ((val & 0xfffc0000) != (pcic->iobr & 0xfffc0000)) {
-            memory_region_del_subregion(get_system_memory(), &pcic->isa);
-            pcic->iobr = val & 0xfffc0001;
-            memory_region_add_subregion(get_system_memory(),
-                                        pcic->iobr & 0xfffc0000, &pcic->isa);
-        }
+        pcic->iobr = val & 0xfffc0001;
+        memory_region_set_alias_offset(&pcic->isa, val & 0xfffc0000);
         break;
     case 0x220:
         pci_data_write(phb->bus, pcic->par, val, 4);
@@ -XXX,XX +XXX,XX @@ static void sh_pci_device_realize(DeviceState *dev, Error **errp)
                              get_system_io(), 0, 0x40000);
     sysbus_init_mmio(sbd, &s->memconfig_p4);
     sysbus_init_mmio(sbd, &s->memconfig_a7);
-    s->iobr = 0xfe240000;
-    memory_region_add_subregion(get_system_memory(), s->iobr, &s->isa);
+    memory_region_add_subregion(get_system_memory(), 0xfe240000, &s->isa);
 
     s->dev = pci_create_simple(phb->bus, PCI_DEVFN(0, 0), "sh_pci_host");
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The old name, isar_feature_aa32_fp_d32, does not reflect
the MVFR0 field name, SIMDReg.
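
For readers unfamiliar with the field, the check behind the renamed
function can be sketched in standalone C. This assumes SIMDReg occupies
MVFR0 bits [3:0]; the helper names below are hypothetical stand-ins for
FIELD_EX32() and the dc_isar_feature() tests, not QEMU code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for FIELD_EX32(mvfr0, MVFR0, SIMDREG). */
static uint32_t mvfr0_simdreg(uint32_t mvfr0)
{
    return mvfr0 & 0xf;     /* assumed layout: SIMDReg is bits [3:0] */
}

/* True if D16-D31 are implemented (same test as the patch: >= 2). */
static bool has_d16_d31(uint32_t mvfr0)
{
    return mvfr0_simdreg(mvfr0) >= 2;
}

/*
 * Mirrors the "UNDEF accesses to D16-D31" checks: bit 4 of a
 * double-precision register number selects D16-D31.
 */
static bool dreg_access_ok(uint32_t mvfr0, unsigned vd, unsigned vn,
                           unsigned vm)
{
    return has_d16_d31(mvfr0) || !((vd | vn | vm) & 0x10);
}
```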

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200214181547.21408-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: wrapped one long line]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h               |  2 +-
 target/arm/translate-vfp.inc.c | 53 +++++++++++++++++-----------------
 2 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
 }
 
-static inline bool isar_feature_aa32_fp_d32(const ARMISARegisters *id)
+static inline bool isar_feature_aa32_simd_r32(const ARMISARegisters *id)
 {
     /* Return true if D16-D31 are implemented */
     return FIELD_EX32(id->mvfr0, MVFR0, SIMDREG) >= 2;
diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
         ((a->vm | a->vn | a->vd) & 0x10)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
         ((a->vm | a->vn | a->vd) & 0x10)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_fp_d32, s) &&
+    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
         ((a->vm | a->vd) & 0x10)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+    if (dp && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
     uint32_t offset;
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vn & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
     uint32_t offset;
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vn & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vn & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vn & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
      */
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
     TCGv_i64 tmp;
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd + n) > 16) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd + n) > 16) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     TCGv_ptr fpst;
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vn | vm) & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vn | vm) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     TCGv_i64 f0, fd;
 
     /* UNDEF accesses to D16-D31 if they don't exist */
-    if (!dc_isar_feature(aa32_fp_d32, s) && ((vd | vm) & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_dp *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vn | a->vm) & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     vd = a->vd;
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (vd & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (vd & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && ((a->vd | a->vm) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd  & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd  & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm  & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm  & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && ((a->vd | a->vm) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && ((a->vd | a->vm) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && ((a->vd | a->vm) & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && ((a->vd | a->vm) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
     TCGv_i32 vm;
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
     TCGv_i32 vd;
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
     TCGv_ptr fpst;
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     }
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vd & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
     TCGv_ptr fpst;
 
     /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_fp_d32, s) && (a->vm & 0x10)) {
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
         return false;
     }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Many uses of ARM_FEATURE_VFP3 are testing for the number of simd
registers implemented.  Use the proper test vs MVFR0.SIMDReg.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214181547.21408-4-richard.henderson@linaro.org
[PMM: fix typo in commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c       |  9 ++++-----
 target/arm/helper.c    | 13 ++++++-------
 target/arm/translate.c |  2 +-
 3 files changed, 11 insertions(+), 13 deletions(-)
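
Note for readers: the test described above can be sketched in isolation as
below. This is a minimal illustration, not the code in the tree. It assumes
the architectural MVFR0 layout (SIMDReg in bits [3:0], where 0 means no
registers, 1 means D0-D15 and 2 means D0-D31). The names field_ex32(),
mvfr0_simdreg() and num_dregs() are hypothetical stand-ins for QEMU's
FIELD_EX32() and isar_feature_*() helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FIELD_EX32(): extract 'len' bits at 'shift'. */
static inline uint32_t field_ex32(uint32_t reg, unsigned shift, unsigned len)
{
    assert(len > 0 && len < 32 && shift + len <= 32);
    return (reg >> shift) & ((UINT32_C(1) << len) - 1);
}

/* MVFR0.SIMDReg, assumed to live in bits [3:0]. */
static inline unsigned mvfr0_simdreg(uint32_t mvfr0)
{
    return field_ex32(mvfr0, 0, 4);
}

/* Number of 64-bit D registers implied by MVFR0.SIMDReg. */
int num_dregs(uint32_t mvfr0)
{
    unsigned simdreg = mvfr0_simdreg(mvfr0);

    if (simdreg >= 2) {
        return 32;      /* D0-D31 implemented */
    }
    if (simdreg >= 1) {
        return 16;      /* only D0-D15 implemented */
    }
    return 0;           /* no floating-point register file */
}
```

The point of keying on the ID register field is that the register count is
decoupled from which VFP instruction set version the CPU implements.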

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 
     if (flags & CPU_DUMP_FPU) {
         int numvfpregs = 0;
-        if (arm_feature(env, ARM_FEATURE_VFP)) {
-            numvfpregs += 16;
-        }
-        if (arm_feature(env, ARM_FEATURE_VFP3)) {
-            numvfpregs += 16;
+        if (cpu_isar_feature(aa32_simd_r32, cpu)) {
+            numvfpregs = 32;
+        } else if (arm_feature(env, ARM_FEATURE_VFP)) {
+            numvfpregs = 16;
         }
         for (i = 0; i < numvfpregs; i++) {
             uint64_t v = *aa32_vfp_dreg(env, i);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void switch_mode(CPUARMState *env, int mode);
 
 static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
 {
-    int nregs;
+    ARMCPU *cpu = env_archcpu(env);
+    int nregs = cpu_isar_feature(aa32_simd_r32, cpu) ? 32 : 16;
 
     /* VFP data registers are always little-endian.  */
-    nregs = arm_feature(env, ARM_FEATURE_VFP3) ? 32 : 16;
     if (reg < nregs) {
         stq_le_p(buf, *aa32_vfp_dreg(env, reg));
         return 8;
@@ -XXX,XX +XXX,XX @@ static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
 
 static int vfp_gdb_set_reg(CPUARMState *env, uint8_t *buf, int reg)
 {
-    int nregs;
+    ARMCPU *cpu = env_archcpu(env);
+    int nregs = cpu_isar_feature(aa32_simd_r32, cpu) ? 32 : 16;
 
-    nregs = arm_feature(env, ARM_FEATURE_VFP3) ? 32 : 16;
     if (reg < nregs) {
         *aa32_vfp_dreg(env, reg) = ldq_le_p(buf);
         return 8;
@@ -XXX,XX +XXX,XX @@ static void cpacr_write(CPUARMState *env, const ARMCPRegInfo *ri,
             /* VFPv3 and upwards with NEON implement 32 double precision
              * registers (D0-D31).
              */
-            if (!arm_feature(env, ARM_FEATURE_NEON) ||
-                    !arm_feature(env, ARM_FEATURE_VFP3)) {
+            if (!cpu_isar_feature(aa32_simd_r32, env_archcpu(env))) {
                 /* D32DIS [30] is RAO/WI if D16-31 are not implemented. */
                 value |= (1 << 30);
             }
@@ -XXX,XX +XXX,XX @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
     } else if (arm_feature(env, ARM_FEATURE_NEON)) {
         gdb_register_coprocessor(cs, vfp_gdb_get_reg, vfp_gdb_set_reg,
                                  51, "arm-neon.xml", 0);
-    } else if (arm_feature(env, ARM_FEATURE_VFP3)) {
+    } else if (cpu_isar_feature(aa32_simd_r32, cpu)) {
         gdb_register_coprocessor(cs, vfp_gdb_get_reg, vfp_gdb_set_reg,
                                  35, "arm-vfp3.xml", 0);
     } else if (arm_feature(env, ARM_FEATURE_VFP)) {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
 #define VFP_SREG(insn, bigbit, smallbit) \
   ((VFP_REG_SHR(insn, bigbit - 1) & 0x1e) | (((insn) >> (smallbit)) & 1))
 #define VFP_DREG(reg, insn, bigbit, smallbit) do { \
-    if (arm_dc_feature(s, ARM_FEATURE_VFP3)) { \
+    if (dc_isar_feature(aa32_simd_r32, s)) { \
         reg = (((insn) >> (bigbit)) & 0x0f) \
               | (((insn) >> ((smallbit) - 4)) & 0x10); \
     } else { \
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We are going to convert FEATURE tests to ISAR tests,
so FPSP needs to be set for these cpus, like we have
already for FPDP.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214181547.21408-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
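
Note for readers: a standalone sketch of the field deposits this patch relies
on. It assumes the architectural MVFR0 layout (FPSP in bits [7:4], FPDP in
bits [11:8], FPShVec in bits [27:24]). field_dp32() and armv5_vfp_mvfr0() are
hypothetical names used only for illustration; the real code uses FIELD_DP32()
on cpu->isar.mvfr0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FIELD_DP32(): write 'val' into a bit field. */
static inline uint32_t field_dp32(uint32_t reg, unsigned shift, unsigned len,
                                  uint32_t val)
{
    uint32_t mask;

    assert(len > 0 && len < 32 && shift + len <= 32);
    mask = ((UINT32_C(1) << len) - 1) << shift;
    return (reg & ~mask) | ((val << shift) & mask);
}

/*
 * Synthesize the MVFR0 fields for an ARMv5 core with VFP. The hardware has
 * no such register, so the value exists only for feature tests to consult.
 */
uint32_t armv5_vfp_mvfr0(uint32_t mvfr0)
{
    mvfr0 = field_dp32(mvfr0, 24, 4, 1);    /* FPShVec: short vectors */
    mvfr0 = field_dp32(mvfr0, 4, 4, 1);     /* FPSP: single precision */
    mvfr0 = field_dp32(mvfr0, 8, 4, 1);     /* FPDP: double precision */
    return mvfr0;
}
```

Without the FPSP deposit, a later single-precision ISAR test would report
"not present" on these CPUs even though ARM_FEATURE_VFP is set.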

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm926_initfn(Object *obj)
      */
     cpu->isar.id_isar1 = FIELD_DP32(cpu->isar.id_isar1, ID_ISAR1, JAZELLE, 1);
     /*
-     * Similarly, we need to set MVFR0 fields to enable double precision
-     * and short vector support even though ARMv5 doesn't have this register.
+     * Similarly, we need to set MVFR0 fields to enable vfp and short vector
+     * support even though ARMv5 doesn't have this register.
      */
     cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSHVEC, 1);
+    cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSP, 1);
     cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPDP, 1);
 }
 
@@ -XXX,XX +XXX,XX @@ static void arm1026_initfn(Object *obj)
      */
     cpu->isar.id_isar1 = FIELD_DP32(cpu->isar.id_isar1, ID_ISAR1, JAZELLE, 1);
     /*
-     * Similarly, we need to set MVFR0 fields to enable double precision
-     * and short vector support even though ARMv5 doesn't have this register.
+     * Similarly, we need to set MVFR0 fields to enable vfp and short vector
+     * support even though ARMv5 doesn't have this register.
      */
     cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSHVEC, 1);
+    cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPSP, 1);
     cpu->isar.mvfr0 = FIELD_DP32(cpu->isar.mvfr0, MVFR0, FPDP, 1);
 
     {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Use this in the places that were checking ARM_FEATURE_VFP, and
are obviously testing for the existence of the register set
as opposed to testing for some particular instruction extension.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214181547.21408-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h        |  6 ++++++
 hw/intc/armv7m_nvic.c   | 20 ++++++++++----------
 linux-user/arm/signal.c |  4 ++--
 target/arm/arch_dump.c  | 11 ++++++-----
 target/arm/cpu.c        |  8 ++++----
 target/arm/helper.c     |  4 ++--
 target/arm/m_helper.c   | 11 ++++++-----
 target/arm/machine.c    |  3 +--
 8 files changed, 37 insertions(+), 30 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, FP) == 1;
 }
 
+static inline bool isar_feature_aa32_simd_r16(const ARMISARegisters *id)
+{
+    /* Return true if D0-D15 are implemented */
+    return FIELD_EX32(id->mvfr0, MVFR0, SIMDREG) > 0;
+}
+
 static inline bool isar_feature_aa32_simd_r32(const ARMISARegisters *id)
 {
     /* Return true if D16-D31 are implemented */
diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
     case 0xd84: /* CSSELR */
         return cpu->env.v7m.csselr[attrs.secure];
     case 0xd88: /* CPACR */
-        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+        if (!cpu_isar_feature(aa32_simd_r16, cpu)) {
             return 0;
         }
         return cpu->env.v7m.cpacr[attrs.secure];
     case 0xd8c: /* NSACR */
-        if (!attrs.secure || !arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+        if (!attrs.secure || !cpu_isar_feature(aa32_simd_r16, cpu)) {
             return 0;
         }
         return cpu->env.v7m.nsacr;
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
         }
         return cpu->env.v7m.sfar;
     case 0xf34: /* FPCCR */
-        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+        if (!cpu_isar_feature(aa32_simd_r16, cpu)) {
             return 0;
         }
         if (attrs.secure) {
@@ -XXX,XX +XXX,XX @@ static uint32_t nvic_readl(NVICState *s, uint32_t offset, MemTxAttrs attrs)
             return value;
         }
     case 0xf38: /* FPCAR */
-        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+        if (!cpu_isar_feature(aa32_simd_r16, cpu)) {
             return 0;
         }
         return cpu->env.v7m.fpcar[attrs.secure];
     case 0xf3c: /* FPDSCR */
-        if (!arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+        if (!cpu_isar_feature(aa32_simd_r16, cpu)) {
             return 0;
         }
         return cpu->env.v7m.fpdscr[attrs.secure];
@@ -XXX,XX +XXX,XX @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
         }
         break;
     case 0xd88: /* CPACR */
-        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
             /* We implement only the Floating Point extension's CP10/CP11 */
             cpu->env.v7m.cpacr[attrs.secure] = value & (0xf << 20);
         }
         break;
     case 0xd8c: /* NSACR */
-        if (attrs.secure && arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+        if (attrs.secure && cpu_isar_feature(aa32_simd_r16, cpu)) {
             /* We implement only the Floating Point extension's CP10/CP11 */
             cpu->env.v7m.nsacr = value & (3 << 10);
         }
@@ -XXX,XX +XXX,XX @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
         break;
     }
     case 0xf34: /* FPCCR */
-        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
             /* Not all bits here are banked. */
             uint32_t fpccr_s;
 
@@ -XXX,XX +XXX,XX @@ static void nvic_writel(NVICState *s, uint32_t offset, uint32_t value,
         }
         break;
     case 0xf38: /* FPCAR */
-        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
             value &= ~7;
             cpu->env.v7m.fpcar[attrs.secure] = value;
         }
         break;
     case 0xf3c: /* FPDSCR */
-        if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
             value &= 0x07c00000;
             cpu->env.v7m.fpdscr[attrs.secure] = value;
         }
diff --git a/linux-user/arm/signal.c b/linux-user/arm/signal.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/arm/signal.c
+++ b/linux-user/arm/signal.c
@@ -XXX,XX +XXX,XX @@ static void setup_sigframe_v2(struct target_ucontext_v2 *uc,
     setup_sigcontext(&uc->tuc_mcontext, env, set->sig[0]);
     /* Save coprocessor signal frame.  */
     regspace = uc->tuc_regspace;
-    if (arm_feature(env, ARM_FEATURE_VFP)) {
+    if (cpu_isar_feature(aa32_simd_r16, env_archcpu(env))) {
         regspace = setup_sigframe_v2_vfp(regspace, env);
     }
     if (arm_feature(env, ARM_FEATURE_IWMMXT)) {
@@ -XXX,XX +XXX,XX @@ static int do_sigframe_return_v2(CPUARMState *env,
 
     /* Restore coprocessor signal frame */
     regspace = uc->tuc_regspace;
-    if (arm_feature(env, ARM_FEATURE_VFP)) {
+    if (cpu_isar_feature(aa32_simd_r16, env_archcpu(env))) {
         regspace = restore_sigframe_v2_vfp(env, regspace);
         if (!regspace) {
             return 1;
diff --git a/target/arm/arch_dump.c b/target/arm/arch_dump.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/arch_dump.c
+++ b/target/arm/arch_dump.c
@@ -XXX,XX +XXX,XX @@ int arm_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
                              int cpuid, void *opaque)
 {
     struct arm_note note;
-    CPUARMState *env = &ARM_CPU(cs)->env;
+    ARMCPU *cpu = ARM_CPU(cs);
+    CPUARMState *env = &cpu->env;
     DumpState *s = opaque;
-    int ret, i, fpvalid = !!arm_feature(env, ARM_FEATURE_VFP);
+    int ret, i;
+    bool fpvalid = cpu_isar_feature(aa32_simd_r16, cpu);
 
     arm_note_init(&note, s, "CORE", 5, NT_PRSTATUS, sizeof(note.prstatus));
 
@@ -XXX,XX +XXX,XX @@ int cpu_get_dump_info(ArchDumpInfo *info,
 ssize_t cpu_get_note_size(int class, int machine, int nr_cpus)
 {
     ARMCPU *cpu = ARM_CPU(first_cpu);
-    CPUARMState *env = &cpu->env;
     size_t note_size;
 
     if (class == ELFCLASS64) {
@@ -XXX,XX +XXX,XX @@ ssize_t cpu_get_note_size(int class, int machine, int nr_cpus)
         note_size += AARCH64_PRFPREG_NOTE_SIZE;
 #ifdef TARGET_AARCH64
         if (cpu_isar_feature(aa64_sve, cpu)) {
-            note_size += AARCH64_SVE_NOTE_SIZE(env);
+            note_size += AARCH64_SVE_NOTE_SIZE(&cpu->env);
         }
 #endif
     } else {
         note_size = ARM_PRSTATUS_NOTE_SIZE;
-        if (arm_feature(env, ARM_FEATURE_VFP)) {
+        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
             note_size += ARM_VFP_NOTE_SIZE;
         }
     }
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
             env->v7m.ccr[M_REG_S] |= R_V7M_CCR_UNALIGN_TRP_MASK;
         }
 
-        if (arm_feature(env, ARM_FEATURE_VFP)) {
+        if (cpu_isar_feature(aa32_simd_r16, cpu)) {
             env->v7m.fpccr[M_REG_NS] = R_V7M_FPCCR_ASPEN_MASK;
             env->v7m.fpccr[M_REG_S] = R_V7M_FPCCR_ASPEN_MASK |
                 R_V7M_FPCCR_LSPEN_MASK | R_V7M_FPCCR_S_MASK;
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_dump_state(CPUState *cs, FILE *f, int flags)
         int numvfpregs = 0;
         if (cpu_isar_feature(aa32_simd_r32, cpu)) {
             numvfpregs = 32;
-        } else if (arm_feature(env, ARM_FEATURE_VFP)) {
+        } else if (cpu_isar_feature(aa32_simd_r16, cpu)) {
             numvfpregs = 16;
         }
         for (i = 0; i < numvfpregs; i++) {
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
      * KVM does not currently allow us to lie to the guest about its
      * ID/feature registers, so the guest always sees what the host has.
      */
-    if (arm_feature(&cpu->env, ARM_FEATURE_VFP)) {
+    if (cpu_isar_feature(aa32_simd_r16, cpu)) {
         cpu->has_vfp = true;
         if (!kvm_enabled()) {
             qdev_property_add_static(DEVICE(obj), &arm_cpu_has_vfp_property);
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
      * We rely on no XScale CPU having VFP so we can use the same bits in the
      * TB flags field for VECSTRIDE and XSCALE_CPAR.
      */
-    assert(!(arm_feature(env, ARM_FEATURE_VFP) &&
+    assert(!(cpu_isar_feature(aa32_simd_r16, cpu) &&
              arm_feature(env, ARM_FEATURE_XSCALE)));
 
     if (arm_feature(env, ARM_FEATURE_V7) &&
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void cpacr_write(CPUARMState *env, const ARMCPRegInfo *ri,
          * ASEDIS [31] and D32DIS [30] are both UNK/SBZP without VFP.
          * TRCDIS [28] is RAZ/WI since we do not implement a trace macrocell.
          */
-        if (arm_feature(env, ARM_FEATURE_VFP)) {
+        if (cpu_isar_feature(aa32_simd_r16, env_archcpu(env))) {
             /* VFP coprocessor: cp10 & cp11 [23:20] */
             mask |= (1 << 31) | (1 << 30) | (0xf << 20);
 
@@ -XXX,XX +XXX,XX @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
     } else if (cpu_isar_feature(aa32_simd_r32, cpu)) {
         gdb_register_coprocessor(cs, vfp_gdb_get_reg, vfp_gdb_set_reg,
                                  35, "arm-vfp3.xml", 0);
-    } else if (arm_feature(env, ARM_FEATURE_VFP)) {
+    } else if (cpu_isar_feature(aa32_simd_r16, cpu)) {
         gdb_register_coprocessor(cs, vfp_gdb_get_reg, vfp_gdb_set_reg,
                                  19, "arm-vfp.xml", 0);
     }
diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t v7m_integrity_sig(CPUARMState *env, uint32_t lr)
      */
     uint32_t sig = 0xfefa125a;
 
-    if (!arm_feature(env, ARM_FEATURE_VFP) || (lr & R_V7M_EXCRET_FTYPE_MASK)) {
+    if (!cpu_isar_feature(aa32_simd_r16, env_archcpu(env))
+        || (lr & R_V7M_EXCRET_FTYPE_MASK)) {
         sig |= 1;
     }
     return sig;
@@ -XXX,XX +XXX,XX @@ static void v7m_exception_taken(ARMCPU *cpu, uint32_t lr, bool dotailchain,
 
     if (dotailchain) {
         /* Sanitize LR FType and PREFIX bits */
-        if (!arm_feature(env, ARM_FEATURE_VFP)) {
+        if (!cpu_isar_feature(aa32_simd_r16, cpu)) {
             lr |= R_V7M_EXCRET_FTYPE_MASK;
         }
         lr = deposit32(lr, 24, 8, 0xff);
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
 
     ftype = excret & R_V7M_EXCRET_FTYPE_MASK;
 
-    if (!arm_feature(env, ARM_FEATURE_VFP) && !ftype) {
+    if (!ftype && !cpu_isar_feature(aa32_simd_r16, cpu)) {
         qemu_log_mask(LOG_GUEST_ERROR, "M profile: zero FTYPE in exception "
                       "exit PC value 0x%" PRIx32 " is UNPREDICTABLE "
                       "if FPU not present\n",
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
              * SFPA is RAZ/WI from NS. FPCA is RO if NSACR.CP10 == 0,
              * RES0 if the FPU is not present, and is stored in the S bank
              */
-            if (arm_feature(env, ARM_FEATURE_VFP) &&
+            if (cpu_isar_feature(aa32_simd_r16, env_archcpu(env)) &&
                 extract32(env->v7m.nsacr, 10, 1)) {
                 env->v7m.control[M_REG_S] &= ~R_V7M_CONTROL_FPCA_MASK;
                 env->v7m.control[M_REG_S] |= val & R_V7M_CONTROL_FPCA_MASK;
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_msr)(CPUARMState *env, uint32_t maskreg, uint32_t val)
             env->v7m.control[env->v7m.secure] &= ~R_V7M_CONTROL_NPRIV_MASK;
             env->v7m.control[env->v7m.secure] |= val & R_V7M_CONTROL_NPRIV_MASK;
         }
-        if (arm_feature(env, ARM_FEATURE_VFP)) {
+        if (cpu_isar_feature(aa32_simd_r16, env_archcpu(env))) {
             /*
              * SFPA is RAZ/WI from NS or if no FPU.
              * FPCA is RO if NSACR.CP10 == 0, RES0 if the FPU is not present.
diff --git a/target/arm/machine.c b/target/arm/machine.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@
 static bool vfp_needed(void *opaque)
 {
     ARMCPU *cpu = opaque;
-    CPUARMState *env = &cpu->env;
 
-    return arm_feature(env, ARM_FEATURE_VFP);
+    return cpu_isar_feature(aa32_simd_r16, cpu);
 }
 
 static int get_fpscr(QEMUFile *f, void *opaque, size_t size,
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The old name, isar_feature_aa32_fpdp, does not reflect
that the test includes VFPv2.  We will introduce further
feature tests for VFPv3.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200214181547.21408-7-richard.henderson@linaro.org
[PMM: fixed grammar in commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h               |  4 ++--
 target/arm/translate-vfp.inc.c | 40 +++++++++++++++++-----------------
 2 files changed, 22 insertions(+), 22 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

We will shortly use these to test for VFPv2 and VFPv3
in different situations.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214181547.21408-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

From: Richard Henderson <richard.henderson@linaro.org>

Shuffle the order of the checks so that we test the ISA
before we test anything else, such as the register arguments.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214181547.21408-9-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-vfp.inc.c | 144 ++++++++++++++++-----------------
 1 file changed, 72 insertions(+), 72 deletions(-)
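
Note for readers: the resulting check order can be sketched outside the
translator as below. This is an illustration under assumptions, not the
translate-vfp.inc.c code: it assumes the double-precision test reads
MVFR0.FPDP (bits [11:8]) as nonzero and the D16-D31 test reads MVFR0.SIMDReg
(bits [3:0]) as at least 2. dp_insn_is_defined() and its helpers are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ISA test: double precision present if MVFR0.FPDP is nonzero. */
static inline bool has_fpdp_v2(uint32_t mvfr0)
{
    return ((mvfr0 >> 8) & 0xf) > 0;
}

/* Assumed register-file test: D16-D31 present if MVFR0.SIMDReg >= 2. */
static inline bool has_simd_r32(uint32_t mvfr0)
{
    return (mvfr0 & 0xf) >= 2;
}

/* Return false where the real translator would UNDEF the instruction. */
bool dp_insn_is_defined(uint32_t mvfr0, int vd, int vm)
{
    assert(vd >= 0 && vd < 32 && vm >= 0 && vm < 32);

    /* 1. ISA check first: does the instruction exist on this CPU at all? */
    if (!has_fpdp_v2(mvfr0)) {
        return false;
    }

    /* 2. Only then inspect the operands: UNDEF D16-D31 if they don't exist. */
    if (!has_simd_r32(mvfr0) && ((vd | vm) & 0x10)) {
        return false;
    }

    return true;
}
```

Both failures give the same result here, so the reordering in this sketch is
about consistent structure rather than a behaviour change.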

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         return false;
     }
 
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
-        ((a->vm | a->vn | a->vd) & 0x10)) {
+    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vm | a->vn | a->vd) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMINMAXNM(DisasContext *s, arg_VMINMAXNM *a)
         return false;
     }
 
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
-        ((a->vm | a->vn | a->vd) & 0x10)) {
+    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vm | a->vn | a->vd) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         return false;
     }
 
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
-        ((a->vm | a->vd) & 0x10)) {
+    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vm | a->vd) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         return false;
     }
 
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (dp && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
+    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (dp && !dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (dp && !dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     TCGv_i64 f0, f1, fd;
     TCGv_ptr fpst;
 
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vn | vm) & 0x10)) {
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vn | vm) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     int veclen = s->vec_len;
     TCGv_i64 f0, fd;
 
-    /* UNDEF accesses to D16-D31 if they don't exist */
-    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist */
+    if (!dc_isar_feature(aa32_simd_r32, s) && ((vd | vm) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_dp *a)
         return false;
     }
 
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) &&
-        ((a->vd | a->vn | a->vm) & 0x10)) {
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 
     vd = a->vd;
 
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) && (vd & 0x10)) {
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (vd & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
 {
     TCGv_i64 vd, vm;
 
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+        return false;
+    }
+
     /* Vm/M bits must be zero for the Z variant */
     if (a->z && a->vm != 0) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     TCGv_i32 tmp;
     TCGv_i64 vd;
 
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+        return false;
+    }
+
     if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
     TCGv_i32 tmp;
     TCGv_i64 vm;
 
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+        return false;
+    }
+
     if (!dc_isar_feature(aa32_fp16_dpconv, s)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
     TCGv_ptr fpst;
     TCGv_i64 tmp;
 
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+        return false;
+    }
+
     if (!dc_isar_feature(aa32_vrint, s)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
     TCGv_i64 tmp;
     TCGv_i32 tcg_rmode;
 
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+        return false;
+    }
+
     if (!dc_isar_feature(aa32_vrint, s)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
     TCGv_ptr fpst;
     TCGv_i64 tmp;
 
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+        return false;
+    }
+
     if (!dc_isar_feature(aa32_vrint, s)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
     TCGv_i64 vd;
     TCGv_i32 vm;
 
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
     TCGv_i64 vm;
     TCGv_i32 vd;
 
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
     TCGv_i64 vd;
     TCGv_ptr fpst;
 
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vd & 0x10)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
     TCGv_i32 vd;
     TCGv_i64 vm;
 
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+        return false;
+    }
+
     if (!dc_isar_feature(aa32_jscvt, s)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     TCGv_ptr fpst;
     int frac_bits;
 
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+        return false;
+    }
+
     if (!arm_dc_feature(s, ARM_FEATURE_VFP3)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
-        return false;
-    }
-
     if (!vfp_access_check(s)) {
         return true;
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
     TCGv_i64 vm;
     TCGv_ptr fpst;
 
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
+    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
         return false;
     }
 
-    if (!dc_isar_feature(aa32_fpdp_v2, s)) {
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) && (a->vm & 0x10)) {
         return false;
     }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Move this check to the start of each trans_* function.
Merge it with any existing test for fpdp_v2.
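
As a rough illustration of the ordering these hunks converge on, here is a
minimal standalone sketch. It is not QEMU code: SketchCtx, its fields,
sketch_access_check() and sketch_trans_dp() are hypothetical stand-ins for
DisasContext, the dc_isar_feature() tests and vfp_access_check(). It assumes
the usual decodetree convention that returning false means "no match, treat
as UNDEF" and returning true means "handled".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the translator state; not QEMU's DisasContext. */
typedef struct {
    bool has_fpdp_v2;      /* models dc_isar_feature(aa32_fpdp_v2, s) */
    bool has_simd_r32;     /* models dc_isar_feature(aa32_simd_r32, s) */
    bool fp_enabled;       /* models whether FP access is permitted */
    bool raised_exception; /* set when the access check fails */
    bool emitted;          /* set when the operation would be emitted */
} SketchCtx;

/* Models vfp_access_check(): on failure an exception has been generated. */
static bool sketch_access_check(SketchCtx *s)
{
    if (!s->fp_enabled) {
        s->raised_exception = true;
        return false;
    }
    return true;
}

/* Models a double-precision trans_* function with the checks in order. */
static bool sketch_trans_dp(SketchCtx *s, int vd)
{
    /* 1. ISA feature test first: without it the encoding does not exist. */
    if (!s->has_fpdp_v2) {
        return false;
    }

    /* 2. UNDEF accesses to D16-D31 if they don't exist. */
    if (!s->has_simd_r32 && (vd & 0x10)) {
        return false;
    }

    /* 3. Access check last: the insn is valid, but FP may be disabled. */
    if (!sketch_access_check(s)) {
        return true;
    }

    s->emitted = true;
    return true;
}
```

In this sketch a missing feature or an out-of-range register yields false
before the access check runs, so no exception is raised for an encoding that
does not exist on the modelled CPU.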

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214181547.21408-10-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-vfp.inc.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

We will eventually remove the early ARM_FEATURE_VFP test,
so add a proper test for each trans_* that does not already
have another ISA test.
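
To show why each function needs its own test once the early one goes away,
here is a small standalone sketch. It is not QEMU code: GateCtx,
gate_dispatch() and the two gate_trans_* functions are hypothetical names,
and the single fpsp_v2 flag is an assumed stand-in for whichever ISA feature
the real function depends on.

```c
#include <stdbool.h>

/* Hypothetical CPU feature state; not a QEMU structure. */
typedef struct {
    bool fpsp_v2; /* assumed stand-in for the relevant ISA feature */
} GateCtx;

/* A trans_*-style function that relies on the caller's early test. */
static bool gate_trans_unchecked(const GateCtx *s)
{
    (void)s;
    return true; /* always accepts the instruction */
}

/* The same function with its own ISA test added. */
static bool gate_trans_checked(const GateCtx *s)
{
    if (!s->fpsp_v2) {
        return false; /* UNDEF when the feature is absent */
    }
    return true;
}

/*
 * Models the decoder entry point. early_gate == true corresponds to
 * keeping the early feature test; false corresponds to removing it.
 */
static bool gate_dispatch(const GateCtx *s, bool early_gate,
                          bool (*fn)(const GateCtx *))
{
    if (early_gate && !s->fpsp_v2) {
        return false;
    }
    return fn(s);
}
```

With the early gate in place both variants reject the instruction on a CPU
without the feature. Once the gate is removed, only the variant with its own
test still rejects it, which is the gap the per-function checks close.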

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20200214181547.21408-11-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-vfp.inc.c | 78 ++++++++++++++++++++++++++++++----
 1 file changed, 69 insertions(+), 9 deletions(-)