Series comparison

-[PULL 00/51] target-arm queue
+[PULL 00/26] target-arm queue
-Probably the last arm pullreq before softfreeze...
+Small pile of bug fixes for rc1. I've included my patches to get
 our docs building with Sphinx 3, just for convenience...
-The following changes since commit 58560ad254fbda71d4daa6622d71683190070ee2:
+-- PMM
-  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191024' into staging (2019-10-24 16:22:58 +0100)
+The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:
   Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20191024
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102
-for you to fetch changes up to a01a4a3e85ae8f6fe21adbedc80f7013faabdcf4:
+for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:
-  hw/arm/highbank: Use AddressSpace when using write_secondary_boot() (2019-10-24 17:16:30 +0100)
+  tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)
 ----------------------------------------------------------------
 target-arm queue:
- * raspi boards: some cleanup
+ * target/arm: Fix Neon emulation bugs on big-endian hosts
- * raspi: implement the bcm2835 system timer device
+ * target/arm: fix handling of HCR.FB
- * raspi: implement a dummy thermal sensor
+ * target/arm: fix LORID_EL1 access check
- * KVM: support providing SVE to the guest
+ * disas/capstone: Fix monitor disassembly of >32 bytes
- * misc devices: switch to ptimer transaction API
+ * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
- * cache TB flag state to improve performance of cpu_get_tb_cpu_state
+ * hw/arm/boot: fix SVE for EL3 direct kernel boot
- * aspeed: Add an AST2600 eval board
+ * hw/display/omap_lcdc: Fix potential NULL pointer dereference
  * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
  * target/arm: Get correct MMU index for other-security-state
  * configure: Test that gio libs from pkg-config work
  * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
  * docs: Fix building with Sphinx 3
  * tests/qtest/npcm7xx_rng-test: Disable randomness tests
 ----------------------------------------------------------------
-Andrew Jones (9):
+AlexChen (2):
-      target/arm/monitor: Introduce qmp_query_cpu_model_expansion
+      hw/display/omap_lcdc: Fix potential NULL pointer dereference
-      tests: arm: Introduce cpu feature tests
+      hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
       target/arm: Allow SVE to be disabled via a CPU property
       target/arm/cpu64: max cpu: Introduce sve<N> properties
       target/arm/kvm64: Add kvm_arch_get/put_sve
       target/arm/kvm64: max cpu: Enable SVE when available
       target/arm/kvm: scratch vcpu: Preserve input kvm_vcpu_init features
       target/arm/cpu64: max cpu: Support sve properties with KVM
       target/arm/kvm: host cpu: Add support for sve<N> properties
-Cédric Le Goater (2):
+Peter Maydell (9):
-      hw/gpio: Fix property accessors of the AST2600 GPIO 1.8V model
+      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
-      aspeed: Add an AST2600 eval board
+      target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
       disas/capstone: Fix monitor disassembly of >32 bytes
       target/arm: Get correct MMU index for other-security-state
       configure: Test that gio libs from pkg-config work
       hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
       scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
       qemu-option-trace.rst.inc: Don't use option:: markup
       tests/qtest/npcm7xx_rng-test: Disable randomness tests
-Peter Maydell (8):
+Philippe Mathieu-Daudé (1):
-      hw/net/fsl_etsec/etsec.c: Switch to transaction-based ptimer API
+      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
       hw/timer/xilinx_timer.c: Switch to transaction-based ptimer API
       hw/dma/xilinx_axidma.c: Switch to transaction-based ptimer API
       hw/timer/slavio_timer: Remove useless check for NULL t->timer
       hw/timer/slavio_timer.c: Switch to transaction-based ptimer API
       hw/timer/grlib_gptimer.c: Switch to transaction-based ptimer API
       hw/m68k/mcf5206.c: Switch to transaction-based ptimer API
       hw/watchdog/milkymist-sysctl.c: Switch to transaction-based ptimer API
-Philippe Mathieu-Daudé (8):
+Richard Henderson (11):
-      hw/misc/bcm2835_thermal: Add a dummy BCM2835 thermal sensor
+      target/arm: Introduce neon_full_reg_offset
-      hw/arm/bcm2835_peripherals: Use the thermal sensor block
+      target/arm: Move neon_element_offset to translate.c
-      hw/timer/bcm2835: Add the BCM2835 SYS_timer
+      target/arm: Use neon_element_offset in neon_load/store_reg
-      hw/arm/bcm2835_peripherals: Use the SYS_timer
+      target/arm: Use neon_element_offset in vfp_reg_offset
-      hw/arm/bcm2836: Make the SoC code modular
+      target/arm: Add read/write_neon_element32
-      hw/arm/bcm2836: Rename cpus[] as cpu[].core
+      target/arm: Expand read/write_neon_element32 to all MemOp
-      hw/arm/raspi: Use AddressSpace when using arm_boot::write_secondary_boot
+      target/arm: Rename neon_load_reg32 to vfp_load_reg32
-      hw/arm/highbank: Use AddressSpace when using write_secondary_boot()
+      target/arm: Add read/write_neon_element64
       target/arm: Rename neon_load_reg64 to vfp_load_reg64
       target/arm: Simplify do_long_3d and do_2scalar_long
       target/arm: Improve do_prewiden_3d
-Richard Henderson (24):
+Rémi Denis-Courmont (3):
-      target/arm: Split out rebuild_hflags_common
+      target/arm: fix handling of HCR.FB
-      target/arm: Split out rebuild_hflags_a64
+      target/arm: fix LORID_EL1 access check
-      target/arm: Split out rebuild_hflags_common_32
+      hw/arm/boot: fix SVE for EL3 direct kernel boot
       target/arm: Split arm_cpu_data_is_big_endian
       target/arm: Split out rebuild_hflags_m32
       target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state
       target/arm: Split out rebuild_hflags_a32
       target/arm: Split out rebuild_hflags_aprofile
       target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in cpu_get_tb_cpu_state
       target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state
       target/arm: Hoist computation of TBFLAG_A32.VFPEN
       target/arm: Add arm_rebuild_hflags
       target/arm: Split out arm_mmu_idx_el
       target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state
       target/arm: Add HELPER(rebuild_hflags_{a32, a64, m32})
       target/arm: Rebuild hflags at EL changes
       target/arm: Rebuild hflags at MSR writes
       target/arm: Rebuild hflags at CPSR writes
       target/arm: Rebuild hflags at Xscale SCTLR writes
       target/arm: Rebuild hflags for M-profile
       target/arm: Rebuild hflags for M-profile NVIC
       linux-user/aarch64: Rebuild hflags for TARGET_WORDS_BIGENDIAN
       linux-user/arm: Rebuild hflags for TARGET_WORDS_BIGENDIAN
       target/arm: Rely on hflags correct in cpu_get_tb_cpu_state
- hw/misc/Makefile.objs                |   1 +
+ docs/qemu-option-trace.rst.inc     |   6 +-
- hw/timer/Makefile.objs               |   1 +
+ configure                          |  10 +-
- tests/Makefile.include               |   5 +-
+ include/hw/intc/arm_gicv3_common.h |   1 -
- qapi/machine-target.json             |   6 +-
+ disas/capstone.c                   |   2 +-
- hw/net/fsl_etsec/etsec.h             |   1 -
+ hw/arm/boot.c                      |   3 +
- include/hw/arm/aspeed.h              |   1 +
+ hw/arm/smmuv3.c                    |   3 +-
- include/hw/arm/bcm2835_peripherals.h |   5 +-
+ hw/display/exynos4210_fimd.c       |   4 +-
- include/hw/arm/bcm2836.h             |   4 +-
+ hw/display/omap_lcdc.c             |  10 +-
- include/hw/arm/raspi_platform.h      |   1 +
+ hw/intc/arm_gicv3_cpuif.c          |   5 +-
- include/hw/misc/bcm2835_thermal.h    |  27 ++
+ target/arm/helper.c                |  24 +-
- include/hw/timer/bcm2835_systmr.h    |  33 +++
+ target/arm/m_helper.c              |   3 +-
- include/qemu/bitops.h                |   1 +
+ target/arm/translate.c             | 153 +++++++++---
- target/arm/cpu.h                     | 105 +++++--
+ target/arm/vec_helper.c            |  12 +-
- target/arm/helper.h                  |   4 +
+ tests/qtest/npcm7xx_rng-test.c     |  14 +-
- target/arm/internals.h               |   9 +
+ scripts/kernel-doc                 |  18 +-
- target/arm/kvm_arm.h                 |  39 +++
+ target/arm/translate-neon.c.inc    | 472 ++++++++++++++++++++-----------------
- hw/arm/aspeed.c                      |  23 ++
+ target/arm/translate-vfp.c.inc     | 341 +++++++++++----------------
- hw/arm/bcm2835_peripherals.c         |  30 +-
+ 17 files changed, 588 insertions(+), 493 deletions(-)
  hw/arm/bcm2836.c                     |  44 +--
  hw/arm/highbank.c                    |   3 +-
  hw/arm/raspi.c                       |  14 +-
  hw/dma/xilinx_axidma.c               |   9 +-
  hw/gpio/aspeed_gpio.c                |   8 +-
  hw/intc/armv7m_nvic.c                |  22 +-
  hw/m68k/mcf5206.c                    |  15 +-
  hw/misc/bcm2835_thermal.c            | 135 +++++++++
  hw/net/fsl_etsec/etsec.c             |   9 +-
  hw/timer/bcm2835_systmr.c            | 163 +++++++++++
  hw/timer/grlib_gptimer.c             |  28 +-
  hw/timer/milkymist-sysctl.c          |  25 +-
  hw/timer/slavio_timer.c              |  32 ++-
  hw/timer/xilinx_timer.c              |  13 +-
  linux-user/aarch64/cpu_loop.c        |   1 +
  linux-user/arm/cpu_loop.c            |   1 +
  linux-user/syscall.c                 |   1 +
  target/arm/cpu.c                     |  26 +-
  target/arm/cpu64.c                   | 364 +++++++++++++++++++++--
  target/arm/helper-a64.c              |   3 +
  target/arm/helper.c                  | 403 +++++++++++++++++---------
  target/arm/kvm.c                     |  25 +-
  target/arm/kvm32.c                   |   6 +-
  target/arm/kvm64.c                   | 325 ++++++++++++++++++---
  target/arm/m_helper.c                |   6 +
  target/arm/machine.c                 |   1 +
  target/arm/monitor.c                 | 158 ++++++++++
  target/arm/op_helper.c               |   4 +
  target/arm/translate-a64.c           |  13 +-
  target/arm/translate.c               |  33 ++-
  tests/arm-cpu-features.c             | 540 +++++++++++++++++++++++++++++++++++
  docs/arm-cpu-features.rst            | 317 ++++++++++++++++++++
  hw/timer/trace-events                |   5 +
  51 files changed, 2725 insertions(+), 323 deletions(-)
  create mode 100644 include/hw/misc/bcm2835_thermal.h
  create mode 100644 include/hw/timer/bcm2835_systmr.h
  create mode 100644 hw/misc/bcm2835_thermal.c
  create mode 100644 hw/timer/bcm2835_systmr.c
  create mode 100644 tests/arm-cpu-features.c
  create mode 100644 docs/arm-cpu-features.rst

-[PULL 01/51] hw/gpio: Fix property accessors of the AST2600 GPIO 1.8V model
+Deleted patch
-From: Cédric Le Goater <clg@kaod.org>
-The property names of AST2600 GPIO 1.8V model are one character bigger
-than the names of the other ASPEED GPIO model. Increase the string
-buffer size by one and be more strict on the expected pattern of the
-property name.
-This fixes the QOM test of the ast2600-evb machine under :
-  Apple LLVM version 10.0.0 (clang-1000.10.44.4)
-  Target: x86_64-apple-darwin17.7.0
-  Thread model: posix
-  InstalledDir: /Library/Developer/CommandLineTools/usr/bin
-Cc: Rashmica Gupta <rashmica.g@gmail.com>
-Fixes: 36d737ee82b2 ("hw/gpio: Add in AST2600 specific implementation")
-Signed-off-by: Cédric Le Goater <clg@kaod.org>
-Message-id: 20191023130455.1347-2-clg@kaod.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/gpio/aspeed_gpio.c | 8 ++++----
- 1 file changed, 4 insertions(+), 4 deletions(-)
-diff --git a/hw/gpio/aspeed_gpio.c b/hw/gpio/aspeed_gpio.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/gpio/aspeed_gpio.c
-+++ b/hw/gpio/aspeed_gpio.c
-@@ -XXX,XX +XXX,XX @@ static void aspeed_gpio_get_pin(Object *obj, Visitor *v, const char *name,
- {
-     int pin = 0xfff;
-     bool level = true;
--    char group[3];
-+    char group[4];
-     AspeedGPIOState *s = ASPEED_GPIO(obj);
-     int set_idx, group_idx = 0;
-     if (sscanf(name, "gpio%2[A-Z]%1d", group, &pin) != 2) {
-         /* 1.8V gpio */
--        if (sscanf(name, "gpio%3s%1d", group, &pin) != 2) {
-+        if (sscanf(name, "gpio%3[18A-E]%1d", group, &pin) != 2) {
-             error_setg(errp, "%s: error reading %s", __func__, name);
-             return;
-         }
-@@ -XXX,XX +XXX,XX @@ static void aspeed_gpio_set_pin(Object *obj, Visitor *v, const char *name,
-     Error *local_err = NULL;
-     bool level;
-     int pin = 0xfff;
--    char group[3];
-+    char group[4];
-     AspeedGPIOState *s = ASPEED_GPIO(obj);
-     int set_idx, group_idx = 0;
-@@ -XXX,XX +XXX,XX @@ static void aspeed_gpio_set_pin(Object *obj, Visitor *v, const char *name,
-     }
-     if (sscanf(name, "gpio%2[A-Z]%1d", group, &pin) != 2) {
-         /* 1.8V gpio */
--        if (sscanf(name, "gpio%3s%1d", group, &pin) != 2) {
-+        if (sscanf(name, "gpio%3[18A-E]%1d", group, &pin) != 2) {
-             error_setg(errp, "%s: error reading %s", __func__, name);
-             return;
-         }
---
-2.20.1
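
The buffer-size fix above rests on a property of sscanf(): a "%3[...]" conversion can store up to three characters plus a terminating NUL, so the destination needs four bytes. The sketch below is a standalone illustration, not QEMU code. The helper name parse_gpio_name() is hypothetical; only the two format strings are taken from the patch. Ranges such as "A-Z" inside a scanset are implementation-defined in ISO C, so this assumes a libc (glibc, for example) that treats them as character ranges.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of QEMU): parse a "gpio<group><pin>"
 * property name with the same two format strings the patch uses.
 * Returns 0 on success, -1 if the name matches neither pattern.
 */
static int parse_gpio_name(const char *name, char group[4], int *pin)
{
    assert(name != NULL && group != NULL && pin != NULL);

    /* Regular banks: one or two capital letters, then a single digit. */
    if (sscanf(name, "gpio%2[A-Z]%1d", group, pin) == 2) {
        return 0;
    }
    /*
     * 1.8V banks: up to three characters drawn from '1', '8', 'A'..'E'.
     * Three characters plus the NUL terminator need a 4-byte buffer.
     */
    if (sscanf(name, "gpio%3[18A-E]%1d", group, pin) == 2) {
        return 0;
    }
    return -1;
}
```

With the old "%3s" pattern a name such as "gpiofoo1" would have been accepted; the scanset rejects it because 'f' is not in the allowed set.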

-[PULL 02/51] aspeed: Add an AST2600 eval board
+Deleted patch
-From: Cédric Le Goater <clg@kaod.org>
-Signed-off-by: Cédric Le Goater <clg@kaod.org>
-Reviewed-by: Joel Stanley <joel@jms.id.au>
-Message-id: 20191023130455.1347-3-clg@kaod.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- include/hw/arm/aspeed.h |  1 +
- hw/arm/aspeed.c         | 23 +++++++++++++++++++++++
- 2 files changed, 24 insertions(+)
-diff --git a/include/hw/arm/aspeed.h b/include/hw/arm/aspeed.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/aspeed.h
-+++ b/include/hw/arm/aspeed.h
-@@ -XXX,XX +XXX,XX @@ typedef struct AspeedBoardConfig {
-     const char *desc;
-     const char *soc_name;
-     uint32_t hw_strap1;
-+    uint32_t hw_strap2;
-     const char *fmc_model;
-     const char *spi_model;
-     uint32_t num_cs;
-diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/aspeed.c
-+++ b/hw/arm/aspeed.c
-@@ -XXX,XX +XXX,XX @@ struct AspeedBoardState {
- /* Witherspoon hardware value: 0xF10AD216 (but use romulus definition) */
- #define WITHERSPOON_BMC_HW_STRAP1 ROMULUS_BMC_HW_STRAP1
-+/* AST2600 evb hardware value */
-+#define AST2600_EVB_HW_STRAP1 0x000000C0
-+#define AST2600_EVB_HW_STRAP2 0x00000003
-+
- /*
-  * The max ram region is for firmwares that scan the address space
-  * with load/store to guess how much RAM the SoC has.
-@@ -XXX,XX +XXX,XX @@ static void aspeed_board_init(MachineState *machine,
-                              &error_abort);
-     object_property_set_int(OBJECT(&bmc->soc), cfg->hw_strap1, "hw-strap1",
-                             &error_abort);
-+    object_property_set_int(OBJECT(&bmc->soc), cfg->hw_strap2, "hw-strap2",
-+                            &error_abort);
-     object_property_set_int(OBJECT(&bmc->soc), cfg->num_cs, "num-cs",
-                             &error_abort);
-     object_property_set_int(OBJECT(&bmc->soc), machine->smp.cpus, "num-cpus",
-@@ -XXX,XX +XXX,XX @@ static void ast2500_evb_i2c_init(AspeedBoardState *bmc)
-     i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 11), "ds1338", 0x32);
- }
-+static void ast2600_evb_i2c_init(AspeedBoardState *bmc)
-+{
-+    /* Start with some devices on our I2C busses */
-+    ast2500_evb_i2c_init(bmc);
-+}
-+
- static void romulus_bmc_i2c_init(AspeedBoardState *bmc)
- {
-     AspeedSoCState *soc = &bmc->soc;
-@@ -XXX,XX +XXX,XX @@ static const AspeedBoardConfig aspeed_boards[] = {
-         .num_cs    = 2,
-         .i2c_init  = witherspoon_bmc_i2c_init,
-         .ram       = 512 * MiB,
-+    }, {
-+        .name      = MACHINE_TYPE_NAME("ast2600-evb"),
-+        .desc      = "Aspeed AST2600 EVB (Cortex A7)",
-+        .soc_name  = "ast2600-a0",
-+        .hw_strap1 = AST2600_EVB_HW_STRAP1,
-+        .hw_strap2 = AST2600_EVB_HW_STRAP2,
-+        .fmc_model = "w25q512jv",
-+        .spi_model = "mx66u51235f",
-+        .num_cs    = 1,
-+        .i2c_init  = ast2600_evb_i2c_init,
-+        .ram       = 1 * GiB,
-     },
- };
---
-2.20.1

-[PULL 03/51] target/arm: Split out rebuild_hflags_common
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Create a function to compute the values of the TBFLAG_ANY bits
-that will be cached.  For now, the env->hflags variable is not
-used, and the results are fed back to cpu_get_tb_cpu_state.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-2-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/cpu.h    | 29 ++++++++++++++++++-----------
- target/arm/helper.c | 26 +++++++++++++++++++-------
- 2 files changed, 37 insertions(+), 18 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
-+++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
-     uint32_t pstate;
-     uint32_t aarch64; /* 1 if CPU is in aarch64 state; inverse of PSTATE.nRW */
-+    /* Cached TBFLAGS state.  See below for which bits are included.  */
-+    uint32_t hflags;
-+
-     /* Frequently accessed CPSR bits are stored separately for efficiency.
-        This contains all the other bits.  Use cpsr_{read,write} to access
-        the whole CPSR.  */
-@@ -XXX,XX +XXX,XX @@ typedef ARMCPU ArchCPU;
- #include "exec/cpu-all.h"
--/* Bit usage in the TB flags field: bit 31 indicates whether we are
-+/*
-+ * Bit usage in the TB flags field: bit 31 indicates whether we are
-  * in 32 or 64 bit mode. The meaning of the other bits depends on that.
-  * We put flags which are shared between 32 and 64 bit mode at the top
-  * of the word, and flags which apply to only one mode at the bottom.
-+ *
-+ * Unless otherwise noted, these bits are cached in env->hflags.
-  */
- FIELD(TBFLAG_ANY, AARCH64_STATE, 31, 1)
- FIELD(TBFLAG_ANY, MMUIDX, 28, 3)
- FIELD(TBFLAG_ANY, SS_ACTIVE, 27, 1)
--FIELD(TBFLAG_ANY, PSTATE_SS, 26, 1)
-+FIELD(TBFLAG_ANY, PSTATE_SS, 26, 1)     /* Not cached. */
- /* Target EL if we take a floating-point-disabled exception */
- FIELD(TBFLAG_ANY, FPEXC_EL, 24, 2)
- FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
- FIELD(TBFLAG_ANY, DEBUG_TARGET_EL, 21, 2)
- /* Bit usage when in AArch32 state: */
--FIELD(TBFLAG_A32, THUMB, 0, 1)
--FIELD(TBFLAG_A32, VECLEN, 1, 3)
--FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)
-+FIELD(TBFLAG_A32, THUMB, 0, 1)          /* Not cached. */
-+FIELD(TBFLAG_A32, VECLEN, 1, 3)         /* Not cached. */
-+FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)      /* Not cached. */
- /*
-  * We store the bottom two bits of the CPAR as TB flags and handle
-  * checks on the other bits at runtime. This shares the same bits as
-  * VECSTRIDE, which is OK as no XScale CPU has VFP.
-+ * Not cached, because VECLEN+VECSTRIDE are not cached.
-  */
- FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
- /*
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
-  * the same thing as the current security state of the processor!
-  */
- FIELD(TBFLAG_A32, NS, 6, 1)
--FIELD(TBFLAG_A32, VFPEN, 7, 1)
--FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
-+FIELD(TBFLAG_A32, VFPEN, 7, 1)          /* Not cached. */
-+FIELD(TBFLAG_A32, CONDEXEC, 8, 8)       /* Not cached. */
- FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
- /* For M profile only, set if FPCCR.LSPACT is set */
--FIELD(TBFLAG_A32, LSPACT, 18, 1)
-+FIELD(TBFLAG_A32, LSPACT, 18, 1)        /* Not cached. */
- /* For M profile only, set if we must create a new FP context */
--FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1)
-+FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1) /* Not cached. */
- /* For M profile only, set if FPCCR.S does not match current security state */
--FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1)
-+FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1) /* Not cached. */
- /* For M profile only, Handler (ie not Thread) mode */
- FIELD(TBFLAG_A32, HANDLER, 21, 1)
- /* For M profile only, whether we should generate stack-limit checks */
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, SVEEXC_EL, 2, 2)
- FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
- FIELD(TBFLAG_A64, PAUTH_ACTIVE, 8, 1)
- FIELD(TBFLAG_A64, BT, 9, 1)
--FIELD(TBFLAG_A64, BTYPE, 10, 2)
-+FIELD(TBFLAG_A64, BTYPE, 10, 2)         /* Not cached. */
- FIELD(TBFLAG_A64, TBID, 12, 2)
- static inline bool bswap_code(bool sctlr_b)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
- }
- #endif
-+static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
-+                                      ARMMMUIdx mmu_idx, uint32_t flags)
-+{
-+    flags = FIELD_DP32(flags, TBFLAG_ANY, FPEXC_EL, fp_el);
-+    flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX,
-+                       arm_to_core_mmu_idx(mmu_idx));
-+
-+    if (arm_cpu_data_is_big_endian(env)) {
-+        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
-+    }
-+    if (arm_singlestep_active(env)) {
-+        flags = FIELD_DP32(flags, TBFLAG_ANY, SS_ACTIVE, 1);
-+    }
-+    return flags;
-+}
-+
- void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-                           target_ulong *cs_base, uint32_t *pflags)
- {
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-         }
-     }
--    flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX, arm_to_core_mmu_idx(mmu_idx));
-+    flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
-     /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
-      * states defined in the ARM ARM for software singlestep:
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-      *     0            x       Inactive (the TB flag for SS is always 0)
-      *     1            0       Active-pending
-      *     1            1       Active-not-pending
-+     * SS_ACTIVE is set in hflags; PSTATE_SS is computed every TB.
-      */
--    if (arm_singlestep_active(env)) {
--        flags = FIELD_DP32(flags, TBFLAG_ANY, SS_ACTIVE, 1);
-+    if (FIELD_EX32(flags, TBFLAG_ANY, SS_ACTIVE)) {
-         if (is_a64(env)) {
-             if (env->pstate & PSTATE_SS) {
-                 flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-             }
-         }
-     }
--    if (arm_cpu_data_is_big_endian(env)) {
--        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
--    }
--    flags = FIELD_DP32(flags, TBFLAG_ANY, FPEXC_EL, fp_el);
-     if (arm_v7m_is_handler_mode(env)) {
-         flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
---
-2.20.1
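
The split this patch prepares for can be sketched in isolation: bits that change rarely are packed into a cached word when their inputs change, and the per-TB lookup adds only the bits marked "Not cached". The code below is a toy model under stated assumptions, not the QEMU implementation. ToyCPU, field_dp32(), field_ex32(), toy_rebuild_hflags() and toy_get_tb_flags() are hypothetical names; only the bit positions are copied from the TBFLAG_ANY FIELD() lines in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for FIELD_DP32: deposit val into word[shift +: len]. */
static uint32_t field_dp32(uint32_t word, unsigned shift, unsigned len,
                           uint32_t val)
{
    assert(len > 0 && len < 32 && shift + len <= 32);
    uint32_t mask = ((1u << len) - 1) << shift;
    return (word & ~mask) | ((val << shift) & mask);
}

/* Toy stand-in for FIELD_EX32: extract word[shift +: len]. */
static uint32_t field_ex32(uint32_t word, unsigned shift, unsigned len)
{
    assert(len > 0 && len < 32 && shift + len <= 32);
    return (word >> shift) & ((1u << len) - 1);
}

/* Hypothetical CPU state holding only the inputs used below. */
typedef struct ToyCPU {
    uint32_t hflags;   /* cached bits, rebuilt only when inputs change */
    uint32_t fp_el;
    uint32_t mmu_idx;
    bool big_endian;
    bool ss_active;
    bool pstate_ss;    /* changes often, so it is never cached */
} ToyCPU;

/* Recompute the cached bits (MMUIDX, SS_ACTIVE, FPEXC_EL, BE_DATA). */
static void toy_rebuild_hflags(ToyCPU *cpu)
{
    uint32_t flags = 0;

    flags = field_dp32(flags, 24, 2, cpu->fp_el);      /* FPEXC_EL */
    flags = field_dp32(flags, 28, 3, cpu->mmu_idx);    /* MMUIDX */
    if (cpu->big_endian) {
        flags = field_dp32(flags, 23, 1, 1);           /* BE_DATA */
    }
    if (cpu->ss_active) {
        flags = field_dp32(flags, 27, 1, 1);           /* SS_ACTIVE */
    }
    cpu->hflags = flags;
}

/* Per-TB lookup: start from the cache, add the uncached PSTATE_SS bit. */
static uint32_t toy_get_tb_flags(const ToyCPU *cpu)
{
    uint32_t flags = cpu->hflags;

    if (field_ex32(flags, 27, 1) && cpu->pstate_ss) {
        flags = field_dp32(flags, 26, 1, 1);           /* PSTATE_SS */
    }
    return flags;
}
```

In this model the caller is responsible for invoking toy_rebuild_hflags() whenever an input changes, which is the role the later "Rebuild hflags at ..." patches in the old series appear to play.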

-[PULL 04/51] target/arm: Split out rebuild_hflags_a64
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Create a function to compute the values of the TBFLAG_A64 bits
-that will be cached.  For now, the env->hflags variable is not
-used, and the results are fed back to cpu_get_tb_cpu_state.
-Note that not all BTI related flags are cached, so we have to
-test the BTI feature twice -- once for those bits moved out to
-rebuild_hflags_a64 and once for those bits that remain in
-cpu_get_tb_cpu_state.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-3-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/helper.c | 131 +++++++++++++++++++++++---------------------
- 1 file changed, 69 insertions(+), 62 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
-     return flags;
- }
-+static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
-+                                   ARMMMUIdx mmu_idx)
-+{
-+    ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
-+    ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
-+    uint32_t flags = 0;
-+    uint64_t sctlr;
-+    int tbii, tbid;
-+
-+    flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
-+
-+    /* FIXME: ARMv8.1-VHE S2 translation regime.  */
-+    if (regime_el(env, stage1) < 2) {
-+        ARMVAParameters p1 = aa64_va_parameters_both(env, -1, stage1);
-+        tbid = (p1.tbi << 1) | p0.tbi;
-+        tbii = tbid & ~((p1.tbid << 1) | p0.tbid);
-+    } else {
-+        tbid = p0.tbi;
-+        tbii = tbid & !p0.tbid;
-+    }
-+
-+    flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
-+    flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
-+
-+    if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
-+        int sve_el = sve_exception_el(env, el);
-+        uint32_t zcr_len;
-+
-+        /*
-+         * If SVE is disabled, but FP is enabled,
-+         * then the effective len is 0.
-+         */
-+        if (sve_el != 0 && fp_el == 0) {
-+            zcr_len = 0;
-+        } else {
-+            zcr_len = sve_zcr_len_for_el(env, el);
-+        }
-+        flags = FIELD_DP32(flags, TBFLAG_A64, SVEEXC_EL, sve_el);
-+        flags = FIELD_DP32(flags, TBFLAG_A64, ZCR_LEN, zcr_len);
-+    }
-+
-+    sctlr = arm_sctlr(env, el);
-+
-+    if (cpu_isar_feature(aa64_pauth, env_archcpu(env))) {
-+        /*
-+         * In order to save space in flags, we record only whether
-+         * pauth is "inactive", meaning all insns are implemented as
-+         * a nop, or "active" when some action must be performed.
-+         * The decision of which action to take is left to a helper.
-+         */
-+        if (sctlr & (SCTLR_EnIA | SCTLR_EnIB | SCTLR_EnDA | SCTLR_EnDB)) {
-+            flags = FIELD_DP32(flags, TBFLAG_A64, PAUTH_ACTIVE, 1);
-+        }
-+    }
-+
-+    if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
-+        /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
-+        if (sctlr & (el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
-+            flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
-+        }
-+    }
-+
-+    return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
-+}
-+
- void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-                           target_ulong *cs_base, uint32_t *pflags)
- {
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-     uint32_t flags = 0;
-     if (is_a64(env)) {
--        ARMCPU *cpu = env_archcpu(env);
--        uint64_t sctlr;
--
-         *pc = env->pc;
--        flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
--
--        /* Get control bits for tagged addresses.  */
--        {
--            ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
--            ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
--            int tbii, tbid;
--
--            /* FIXME: ARMv8.1-VHE S2 translation regime.  */
--            if (regime_el(env, stage1) < 2) {
--                ARMVAParameters p1 = aa64_va_parameters_both(env, -1, stage1);
--                tbid = (p1.tbi << 1) | p0.tbi;
--                tbii = tbid & ~((p1.tbid << 1) | p0.tbid);
--            } else {
--                tbid = p0.tbi;
--                tbii = tbid & !p0.tbid;
--            }
--
--            flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
--            flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
--        }
--
--        if (cpu_isar_feature(aa64_sve, cpu)) {
--            int sve_el = sve_exception_el(env, current_el);
--            uint32_t zcr_len;
--
--            /* If SVE is disabled, but FP is enabled,
--             * then the effective len is 0.
--             */
--            if (sve_el != 0 && fp_el == 0) {
--                zcr_len = 0;
--            } else {
--                zcr_len = sve_zcr_len_for_el(env, current_el);
--            }
--            flags = FIELD_DP32(flags, TBFLAG_A64, SVEEXC_EL, sve_el);
--            flags = FIELD_DP32(flags, TBFLAG_A64, ZCR_LEN, zcr_len);
--        }
--
--        sctlr = arm_sctlr(env, current_el);
--
--        if (cpu_isar_feature(aa64_pauth, cpu)) {
--            /*
--             * In order to save space in flags, we record only whether
--             * pauth is "inactive", meaning all insns are implemented as
--             * a nop, or "active" when some action must be performed.
--             * The decision of which action to take is left to a helper.
--             */
--            if (sctlr & (SCTLR_EnIA | SCTLR_EnIB | SCTLR_EnDA | SCTLR_EnDB)) {
--                flags = FIELD_DP32(flags, TBFLAG_A64, PAUTH_ACTIVE, 1);
--            }
--        }
--
--        if (cpu_isar_feature(aa64_bti, cpu)) {
--            /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
--            if (sctlr & (current_el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
--                flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
--            }
-+        flags = rebuild_hflags_a64(env, current_el, fp_el, mmu_idx);
-+        if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
-             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
-         }
-     } else {
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-             flags = FIELD_DP32(flags, TBFLAG_A32,
-                                XSCALE_CPAR, env->cp15.c15_cpar);
-         }
--    }
--    flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
-+        flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
-+    }
-     /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
-      * states defined in the ARM ARM for software singlestep:
---
-2.20.1

-[PULL 43/51] target/arm/kvm: host cpu: Add support for sve<N> properties
+[PULL 01/26] target/arm: Introduce neon_full_reg_offset
-From: Andrew Jones <drjones@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Allow cpu 'host' to enable SVE when it's available, unless the
+This function makes it clear that we're talking about the whole
-user chooses to disable it with the added 'sve=off' cpu property.
+register, and not the 32-bit piece at index 0.  This fixes a bug
-Also give the user the ability to select vector lengths with the
+when running on a big-endian host.
 sve<N> properties. We don't adopt 'max' cpu's other sve property,
 sve-max-vq, because that property is difficult to use with KVM.
 That property assumes all vector lengths in the range from 1 up
 to and including the specified maximum length are supported, but
 there may be optional lengths not supported by the host in that
 range. With KVM one must be more specific when enabling vector
 lengths.
-Signed-off-by: Andrew Jones <drjones@redhat.com>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
+Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
 Message-id: 20191024121808.9612-10-drjones@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h          |  2 ++
+ target/arm/translate.c          |  8 ++++++
- target/arm/cpu.c          |  3 +++
+ target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
- target/arm/cpu64.c        | 33 +++++++++++++++++----------------
+ target/arm/translate-vfp.c.inc  |  2 +-
- target/arm/kvm64.c        | 14 +++++++++++++-
+files changed, 31 insertions(+), 23 deletions(-)
  tests/arm-cpu-features.c  | 23 +++++++++++------------
  docs/arm-cpu-features.rst | 19 ++++++++++++-------
 files changed, 58 insertions(+), 36 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/translate.c
-+++ b/target/arm/cpu.h
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ int aarch64_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
+@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
- void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq);
+     unallocated_encoding(s);
  void aarch64_sve_change_el(CPUARMState *env, int old_el,
                             int new_el, bool el0_a64);
 +void aarch64_add_sve_properties(Object *obj);
  #else
  static inline void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq) { }
  static inline void aarch64_sve_change_el(CPUARMState *env, int o,
                                           int n, bool a)
  { }
 +static inline void aarch64_add_sve_properties(Object *obj) { }
  #endif
  #if !defined(CONFIG_TCG)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
      ARMCPU *cpu = ARM_CPU(obj);
      kvm_arm_set_cpu_features_from_host(cpu);
 +    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
 +        aarch64_add_sve_properties(obj);
 +    }
      arm_cpu_post_init(obj);
  }
-diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
++/*
-index XXXXXXX..XXXXXXX 100644
++ * Return the offset of a "full" NEON Dreg.
---- a/target/arm/cpu64.c
++ */
-+++ b/target/arm/cpu64.c
++static long neon_full_reg_offset(unsigned reg)
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
      cpu->isar.id_aa64pfr0 = t;
  }
 +void aarch64_add_sve_properties(Object *obj)
 +{
-+    uint32_t vq;
++    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
 +
 +    object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
 +                        cpu_arm_set_sve, NULL, NULL, &error_fatal);
 +
 +    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
 +        char name[8];
 +        sprintf(name, "sve%d", vq * 128);
 +        object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
 +                            cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
 +    }
 +}
 +
- /* -cpu max: if KVM is enabled, like -cpu host (best possible with this host);
+ static inline long vfp_reg_offset(bool dp, unsigned reg)
   * otherwise, a CPU with as many features enabled as our emulation supports.
   * The version of '-cpu max' for qemu-system-arm is defined in cpu.c;
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
  static void aarch64_max_initfn(Object *obj)
  {
-     ARMCPU *cpu = ARM_CPU(obj);
+     if (dp) {
--    uint32_t vq;
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
--    uint64_t t;
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.c.inc
-     if (kvm_enabled()) {
++++ b/target/arm/translate-neon.c.inc
-         kvm_arm_set_cpu_features_from_host(cpu);
+@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
--        if (kvm_arm_sve_supported(CPU(cpu))) {
+         ofs ^= 8 - element_size;
--            t = cpu->isar.id_aa64pfr0;
+     }
 -            t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1);
 -            cpu->isar.id_aa64pfr0 = t;
 -        }
      } else {
 +        uint64_t t;
          uint32_t u;
          aarch64_a57_initfn(obj);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
  #endif
+-    return neon_reg_offset(reg, 0) + ofs;
++    return neon_full_reg_offset(reg) + ofs;
+ }
+ static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
+              * We cannot write 16 bytes at once because the
+              * destination is unaligned.
+              */
+-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
++            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
+                                 8, 8, tmp);
+-            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
+-                             neon_reg_offset(vd, 0), 8, 8);
++            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
++                             neon_full_reg_offset(vd), 8, 8);
+         } else {
+-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
++            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
+                                  vec_size, vec_size, tmp);
+         }
+         tcg_gen_addi_i32(addr, addr, 1 << size);
+@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
+ static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
+ {
+     int vec_size = a->q ? 16 : 8;
+-    int rd_ofs = neon_reg_offset(a->vd, 0);
+-    int rn_ofs = neon_reg_offset(a->vn, 0);
+-    int rm_ofs = neon_reg_offset(a->vm, 0);
++    int rd_ofs = neon_full_reg_offset(a->vd);
++    int rn_ofs = neon_full_reg_offset(a->vn);
++    int rm_ofs = neon_full_reg_offset(a->vm);
+     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+         return false;
+@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
+ {
+     /* Handle a 2-reg-shift insn which can be vectorized. */
+     int vec_size = a->q ? 16 : 8;
+-    int rd_ofs = neon_reg_offset(a->vd, 0);
+-    int rm_ofs = neon_reg_offset(a->vm, 0);
++    int rd_ofs = neon_full_reg_offset(a->vd);
++    int rm_ofs = neon_full_reg_offset(a->vm);
+     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+         return false;
+@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
+ {
+     /* FP operations in 2-reg-and-shift group */
+     int vec_size = a->q ? 16 : 8;
+-    int rd_ofs = neon_reg_offset(a->vd, 0);
+-    int rm_ofs = neon_reg_offset(a->vm, 0);
++    int rd_ofs = neon_full_reg_offset(a->vd);
++    int rm_ofs = neon_full_reg_offset(a->vm);
+     TCGv_ptr fpst;
+     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
+         return true;
      }
--    object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
+-    reg_ofs = neon_reg_offset(a->vd, 0);
--                        cpu_arm_set_sve, NULL, NULL, &error_fatal);
++    reg_ofs = neon_full_reg_offset(a->vd);
-+    aarch64_add_sve_properties(obj);
+     vec_size = a->q ? 16 : 8;
-     object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
+     imm = asimd_imm_const(a->imm, a->cmode, a->op);
-                         cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
--
+@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
--    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
+         return true;
--        char name[8];
+     }
--        sprintf(name, "sve%d", vq * 128);
--        object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
+-    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
--                            cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
+-                       neon_reg_offset(a->vn, 0),
--    }
+-                       neon_reg_offset(a->vm, 0),
 +    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
 +                       neon_full_reg_offset(a->vn),
 +                       neon_full_reg_offset(a->vm),
                         16, 16, 0, fn_gvec);
      return true;
  }
+@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
- struct ARMCPUInfo {
+ {
-diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
+     /* Two registers and a scalar, using gvec */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rn_ofs = neon_reg_offset(a->vn, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rn_ofs = neon_full_reg_offset(a->vn);
      int rm_ofs;
      int idx;
      TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
      /* a->vm is M:Vm, which encodes both register and index */
      idx = extract32(a->vm, a->size + 2, 2);
      a->vm = extract32(a->vm, 0, a->size + 2);
 -    rm_ofs = neon_reg_offset(a->vm, 0);
 +    rm_ofs = neon_full_reg_offset(a->vm);
      fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
      tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
          return true;
      }
 -    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
 +    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                           neon_element_offset(a->vm, a->index, a->size),
                           a->q ? 16 : 8, a->q ? 16 : 8);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
  static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
  {
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm64.c
+--- a/target/arm/translate-vfp.c.inc
-+++ b/target/arm/kvm64.c
++++ b/target/arm/translate-vfp.c.inc
-@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
+@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
       * and then query that CPU for the relevant ID registers.
       */
      int fdarray[3];
 +    bool sve_supported;
      uint64_t features = 0;
 +    uint64_t t;
      int err;
      /* Old kernels may not know about the PREFERRED_TARGET ioctl: however
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
                                ARM64_SYS_REG(3, 0, 0, 3, 2));
      }
-+    sve_supported = ioctl(fdarray[0], KVM_CHECK_EXTENSION, KVM_CAP_ARM_SVE) > 0;
+     tmp = load_reg(s, a->rt);
-+
+-    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
-     kvm_arm_destroy_scratch_host_vcpu(fdarray);
++    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
+                          vec_size, vec_size, tmp);
-     if (err < 0) {
+     tcg_temp_free_i32(tmp);
          return false;
      }
 -   /* We can assume any KVM supporting CPU is at least a v8
 +    /* Add feature bits that can't appear until after VCPU init. */
 +    if (sve_supported) {
 +        t = ahcf->isar.id_aa64pfr0;
 +        t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1);
 +        ahcf->isar.id_aa64pfr0 = t;
 +    }
 +
 +    /*
 +     * We can assume any KVM supporting CPU is at least a v8
       * with VFPv4+Neon; this in turn implies most of the other
       * feature bits.
       */
 diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/arm-cpu-features.c
 +++ b/tests/arm-cpu-features.c
@@ -XXX,XX +XXX,XX @@ static void sve_tests_sve_off_kvm(const void *data)
  {
      QTestState *qts;
 -    qts = qtest_init(MACHINE "-accel kvm -cpu max,sve=off");
 +    qts = qtest_init(MACHINE "-accel kvm -cpu host,sve=off");
      /*
       * We don't know if this host supports SVE so we don't
@@ -XXX,XX +XXX,XX @@ static void sve_tests_sve_off_kvm(const void *data)
       * and that using sve<N>=off to explicitly disable vector
       * lengths is OK too.
       */
 -    assert_sve_vls(qts, "max", 0, NULL);
 -    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
 +    assert_sve_vls(qts, "host", 0, NULL);
 +    assert_sve_vls(qts, "host", 0, "{ 'sve128': false }");
      qtest_quit(qts);
  }
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
              "We cannot guarantee the CPU type 'cortex-a15' works "
              "with KVM on this host", NULL);
 -        assert_has_feature(qts, "max", "sve");
 -        resp = do_query_no_props(qts, "max");
 +        assert_has_feature(qts, "host", "sve");
 +        resp = do_query_no_props(qts, "host");
          kvm_supports_sve = resp_get_feature(resp, "sve");
          vls = resp_get_sve_vls(resp);
          qobject_unref(resp);
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
              sprintf(max_name, "sve%d", max_vq * 128);
              /* Enabling a supported length is of course fine. */
 -            assert_sve_vls(qts, "max", vls, "{ %s: true }", max_name);
 +            assert_sve_vls(qts, "host", vls, "{ %s: true }", max_name);
              /* Get the next supported length smaller than max-vq. */
              vq = 64 - __builtin_clzll(vls & ~BIT_ULL(max_vq - 1));
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
                   * We have at least one length smaller than max-vq,
                   * so we can disable max-vq.
                   */
 -                assert_sve_vls(qts, "max", (vls & ~BIT_ULL(max_vq - 1)),
 +                assert_sve_vls(qts, "host", (vls & ~BIT_ULL(max_vq - 1)),
                                 "{ %s: false }", max_name);
                  /*
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
                   */
                  sprintf(name, "sve%d", vq * 128);
                  error = g_strdup_printf("cannot disable %s", name);
 -                assert_error(qts, "max", error,
 +                assert_error(qts, "host", error,
                               "{ %s: true, %s: false }",
                               max_name, name);
                  g_free(error);
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
              vq = __builtin_ffsll(vls);
              sprintf(name, "sve%d", vq * 128);
              error = g_strdup_printf("cannot disable %s", name);
 -            assert_error(qts, "max", error, "{ %s: false }", name);
 +            assert_error(qts, "host", error, "{ %s: false }", name);
              g_free(error);
              /* Get an unsupported length. */
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
              if (vq <= SVE_MAX_VQ) {
                  sprintf(name, "sve%d", vq * 128);
                  error = g_strdup_printf("cannot enable %s", name);
 -                assert_error(qts, "max", error, "{ %s: true }", name);
 +                assert_error(qts, "host", error, "{ %s: true }", name);
                  g_free(error);
              }
          } else {
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
      } else {
          assert_has_not_feature(qts, "host", "aarch64");
          assert_has_not_feature(qts, "host", "pmu");
 -
 -        assert_has_not_feature(qts, "max", "sve");
 +        assert_has_not_feature(qts, "host", "sve");
      }
      qtest_quit(qts);
 diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/arm-cpu-features.rst
 +++ b/docs/arm-cpu-features.rst
@@ -XXX,XX +XXX,XX @@ SVE CPU Property Examples
       $ qemu-system-aarch64 -M virt -cpu max
 -  3) Only enable the 128-bit vector length::
 +  3) When KVM is enabled, implicitly enable all host CPU supported vector
 +     lengths with the `host` CPU type::
 +
 +     $ qemu-system-aarch64 -M virt,accel=kvm -cpu host
 +
 +  4) Only enable the 128-bit vector length::
       $ qemu-system-aarch64 -M virt -cpu max,sve128=on
 -  4) Disable the 512-bit vector length and all larger vector lengths,
 +  5) Disable the 512-bit vector length and all larger vector lengths,
       since 512 is a power-of-two.  This results in all the smaller,
       uninitialized lengths (128, 256, and 384) defaulting to enabled::
       $ qemu-system-aarch64 -M virt -cpu max,sve512=off
 -  5) Enable the 128-bit, 256-bit, and 512-bit vector lengths::
 +  6) Enable the 128-bit, 256-bit, and 512-bit vector lengths::
       $ qemu-system-aarch64 -M virt -cpu max,sve128=on,sve256=on,sve512=on
 -  6) The same as (5), but since the 128-bit and 256-bit vector
 +  7) The same as (6), but since the 128-bit and 256-bit vector
       lengths are required for the 512-bit vector length to be enabled,
       then allow them to be auto-enabled::
       $ qemu-system-aarch64 -M virt -cpu max,sve512=on
 -  7) Do the same as (6), but by first disabling SVE and then re-enabling it::
 +  8) Do the same as (7), but by first disabling SVE and then re-enabling it::
       $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve512=on,sve=on
 -  8) Force errors regarding the last vector length::
 +  9) Force errors regarding the last vector length::
       $ qemu-system-aarch64 -M virt -cpu max,sve128=off
       $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve128=off,sve=on
@@ -XXX,XX +XXX,XX @@ The examples in "SVE CPU Property Examples" exhibit many ways to select
  vector lengths which developers may find useful in order to avoid overly
  verbose command lines.  However, the recommended way to select vector
  lengths is to explicitly enable each desired length.  Therefore only
 -example's (1), (3), and (5) exhibit recommended uses of the properties.
 +examples (1), (4), and (6) exhibit recommended uses of the properties.
 --
 .20.1

-[PULL 09/51] target/arm: Split out rebuild_hflags_a32
+[PULL 02/26] target/arm: Move neon_element_offset to translate.c
 From: Richard Henderson <richard.henderson@linaro.org>
-Currently a trivial wrapper for rebuild_hflags_common_32.
+This will shortly have users outside of translate-neon.c.inc.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-8-richard.henderson@linaro.org
+Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 8 +++++++-
+ target/arm/translate.c          | 20 ++++++++++++++++++++
-file changed, 7 insertions(+), 1 deletion(-)
+ target/arm/translate-neon.c.inc | 19 -------------------
 files changed, 20 insertions(+), 19 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/translate.c
-+++ b/target/arm/helper.c
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
+@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
-     return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
+     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
  }
-+static uint32_t rebuild_hflags_a32(CPUARMState *env, int fp_el,
++/*
-+                                   ARMMMUIdx mmu_idx)
++ * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
 + * where 0 is the least significant end of the register.
 + */
 +static long neon_element_offset(int reg, int element, MemOp size)
 +{
-+    return rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
++    int element_size = 1 << size;
 +    int ofs = element * element_size;
 +#ifdef HOST_WORDS_BIGENDIAN
 +    /*
 +     * Calculate the offset assuming fully little-endian,
 +     * then XOR to account for the order of the 8-byte units.
 +     */
 +    if (element_size < 8) {
 +        ofs ^= 8 - element_size;
 +    }
 +#endif
 +    return neon_full_reg_offset(reg) + ofs;
 +}
 +
- static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
+ static inline long vfp_reg_offset(bool dp, unsigned reg)
                                     ARMMMUIdx mmu_idx)
  {
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
+     if (dp) {
-                 flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
-             }
+index XXXXXXX..XXXXXXX 100644
-         } else {
+--- a/target/arm/translate-neon.c.inc
--            flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
++++ b/target/arm/translate-neon.c.inc
-+            flags = rebuild_hflags_a32(env, fp_el, mmu_idx);
+@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
-         }
+ #include "decode-neon-ls.c.inc"
+ #include "decode-neon-shared.c.inc"
-         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
 -/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
 - * where 0 is the least significant end of the register.
 - */
 -static inline long
 -neon_element_offset(int reg, int element, MemOp size)
 -{
 -    int element_size = 1 << size;
 -    int ofs = element * element_size;
 -#ifdef HOST_WORDS_BIGENDIAN
 -    /* Calculate the offset assuming fully little-endian,
 -     * then XOR to account for the order of the 8-byte units.
 -     */
 -    if (element_size < 8) {
 -        ofs ^= 8 - element_size;
 -    }
 -#endif
 -    return neon_full_reg_offset(reg) + ofs;
 -}
 -
  static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
  {
      long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
 --
 .20.1
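
The XOR in neon_element_offset() above can be checked in isolation. The following is a minimal sketch, not QEMU code: element_offset(), store_unit(), load_element() and offsets_are_consistent() are hypothetical names, the host byte order is passed in as a flag instead of being taken from HOST_WORDS_BIGENDIAN, and the register is modeled as a plain 16-byte array rather than a field of CPUARMState.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the arithmetic in neon_element_offset():
 * byte offset of element `element` of size (1 << size_log2) bytes,
 * relative to the start of the register, where element 0 is the
 * least significant piece.
 */
static size_t element_offset(unsigned element, unsigned size_log2,
                             bool host_big_endian)
{
    size_t element_size = (size_t)1 << size_log2;
    size_t ofs = element * element_size;

    assert(size_log2 <= 3);
    if (host_big_endian && element_size < 8) {
        /* Elements inside each 8-byte unit are mirrored on such a host. */
        ofs ^= 8 - element_size;
    }
    return ofs;
}

/* Store a 64-bit value into buf using the byte order of the modeled host. */
static void store_unit(uint8_t *buf, uint64_t value, bool host_big_endian)
{
    for (unsigned i = 0; i < 8; i++) {
        unsigned pos = host_big_endian ? 7 - i : i;
        buf[pos] = (uint8_t)(value >> (8 * i));
    }
}

/* Load an element of (1 << size_log2) bytes at ofs, in modeled host order. */
static uint64_t load_element(const uint8_t *buf, size_t ofs,
                             unsigned size_log2, bool host_big_endian)
{
    size_t n = (size_t)1 << size_log2;
    uint64_t value = 0;

    for (size_t i = 0; i < n; i++) {
        size_t pos = host_big_endian ? n - 1 - i : i;
        value |= (uint64_t)buf[ofs + pos] << (8 * i);
    }
    return value;
}

/* Check that every element of a two-unit (128-bit) register is found. */
static bool offsets_are_consistent(bool host_big_endian)
{
    const uint64_t units[2] = { 0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL };
    uint8_t reg[16];

    store_unit(reg, units[0], host_big_endian);
    store_unit(reg + 8, units[1], host_big_endian);

    for (unsigned size_log2 = 0; size_log2 <= 3; size_log2++) {
        unsigned bits = 8u << size_log2;
        unsigned count = 16u >> size_log2;

        for (unsigned e = 0; e < count; e++) {
            unsigned bit_pos = e * bits;
            uint64_t expect = units[bit_pos / 64] >> (bit_pos % 64);
            size_t ofs = element_offset(e, size_log2, host_big_endian);

            if (bits < 64) {
                expect &= (1ULL << bits) - 1;
            }
            if (load_element(reg, ofs, size_log2, host_big_endian) != expect) {
                return false;
            }
        }
    }
    return true;
}
```

In this model, 32-bit element 0 sits at byte offset 4 on a big-endian host, which is why code that wants the start of the whole register cannot use the offset of "the 32-bit piece at index 0".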

-[PULL 06/51] target/arm: Split arm_cpu_data_is_big_endian
+[PULL 03/26] target/arm: Use neon_element_offset in neon_load/store_reg
 From: Richard Henderson <richard.henderson@linaro.org>
-Set TBFLAG_ANY.BE_DATA in rebuild_hflags_common_32 and
+These are the only users of neon_reg_offset, so remove that.
 rebuild_hflags_a64 instead of rebuild_hflags_common, where we do
 not need to re-test is_a64() nor re-compute the various inputs.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-5-richard.henderson@linaro.org
+Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h    | 49 +++++++++++++++++++++++++++------------------
+ target/arm/translate.c | 14 ++------------
- target/arm/helper.c | 16 +++++++++++----
+file changed, 2 insertions(+), 12 deletions(-)
 files changed, 42 insertions(+), 23 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/translate.c
-+++ b/target/arm/cpu.h
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static inline uint64_t arm_sctlr(CPUARMState *env, int el)
+@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
      }
  }
-+static inline bool arm_cpu_data_is_big_endian_a32(CPUARMState *env,
+-/* Return the offset of a 32-bit piece of a NEON register.
-+                                                  bool sctlr_b)
+-   zero is the least significant end of the register.  */
-+{
+-static inline long
-+#ifdef CONFIG_USER_ONLY
+-neon_reg_offset (int reg, int n)
-+    /*
+-{
-+     * In system mode, BE32 is modelled in line with the
+-    int sreg;
-+     * architecture (as word-invariant big-endianness), where loads
+-    sreg = reg * 2 + n;
-+     * and stores are done little endian but from addresses which
+-    return vfp_reg_offset(0, sreg);
-+     * are adjusted by XORing with the appropriate constant. So the
+-}
-+     * endianness to use for the raw data access is not affected by
+-
-+     * SCTLR.B.
+ static TCGv_i32 neon_load_reg(int reg, int pass)
 +     * In user mode, however, we model BE32 as byte-invariant
 +     * big-endianness (because user-only code cannot tell the
 +     * difference), and so we need to use a data access endianness
 +     * that depends on SCTLR.B.
 +     */
 +    if (sctlr_b) {
 +        return true;
 +    }
 +#endif
 +    /* In 32bit endianness is determined by looking at CPSR's E bit */
 +    return env->uncached_cpsr & CPSR_E;
 +}
 +
 +static inline bool arm_cpu_data_is_big_endian_a64(int el, uint64_t sctlr)
 +{
 +    return sctlr & (el ? SCTLR_EE : SCTLR_E0E);
 +}
  /* Return true if the processor is in big-endian mode. */
  static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
  {
--    /* In 32bit endianness is determined by looking at CPSR's E bit */
+     TCGv_i32 tmp = tcg_temp_new_i32();
-     if (!is_a64(env)) {
+-    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
--        return
++    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
--#ifdef CONFIG_USER_ONLY
+     return tmp;
 -            /* In system mode, BE32 is modelled in line with the
 -             * architecture (as word-invariant big-endianness), where loads
 -             * and stores are done little endian but from addresses which
 -             * are adjusted by XORing with the appropriate constant. So the
 -             * endianness to use for the raw data access is not affected by
 -             * SCTLR.B.
 -             * In user mode, however, we model BE32 as byte-invariant
 -             * big-endianness (because user-only code cannot tell the
 -             * difference), and so we need to use a data access endianness
 -             * that depends on SCTLR.B.
 -             */
 -            arm_sctlr_b(env) ||
 -#endif
 -                ((env->uncached_cpsr & CPSR_E) ? 1 : 0);
 +        return arm_cpu_data_is_big_endian_a32(env, arm_sctlr_b(env));
      } else {
          int cur_el = arm_current_el(env);
          uint64_t sctlr = arm_sctlr(env, cur_el);
 -
 -        return (sctlr & (cur_el ? SCTLR_EE : SCTLR_E0E)) != 0;
 +        return arm_cpu_data_is_big_endian_a64(cur_el, sctlr);
      }
  }
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+ static void neon_store_reg(int reg, int pass, TCGv_i32 var)
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
      flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX,
                         arm_to_core_mmu_idx(mmu_idx));
 -    if (arm_cpu_data_is_big_endian(env)) {
 -        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
 -    }
      if (arm_singlestep_active(env)) {
          flags = FIELD_DP32(flags, TBFLAG_ANY, SS_ACTIVE, 1);
      }
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
  static uint32_t rebuild_hflags_common_32(CPUARMState *env, int fp_el,
                                           ARMMMUIdx mmu_idx, uint32_t flags)
  {
--    flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, arm_sctlr_b(env));
+-    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
-+    bool sctlr_b = arm_sctlr_b(env);
++    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
-+
+     tcg_temp_free_i32(var);
-+    if (sctlr_b) {
+ }
-+        flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, 1);
 +    }
 +    if (arm_cpu_data_is_big_endian_a32(env, sctlr_b)) {
 +        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
 +    }
      flags = FIELD_DP32(flags, TBFLAG_A32, NS, !access_secure_reg(env));
      return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
      sctlr = arm_sctlr(env, el);
 +    if (arm_cpu_data_is_big_endian_a64(el, sctlr)) {
 +        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
 +    }
 +
      if (cpu_isar_feature(aa64_pauth, env_archcpu(env))) {
          /*
           * In order to save space in flags, we record only whether
 --
 .20.1

-[PULL 19/51] target/arm: Rebuild hflags at MSR writes
+[PULL 04/26] target/arm: Use neon_element_offset in vfp_reg_offset
 From: Richard Henderson <richard.henderson@linaro.org>
-Continue setting, but not relying upon, env->hflags.
+This seems a bit more readable than using offsetof CPU_DoubleU.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-18-richard.henderson@linaro.org
+Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 13 +++++++++++--
+ target/arm/translate.c | 13 ++++---------
- target/arm/translate.c     | 28 +++++++++++++++++++++++-----
+file changed, 4 insertions(+), 9 deletions(-)
 files changed, 34 insertions(+), 7 deletions(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
-+++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
-     if ((tb_cflags(s->base.tb) & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
-         /* I/O operations must end the TB here (whether read or write) */
-         s->base.is_jmp = DISAS_UPDATE;
--    } else if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
--        /* We default to ending the TB on a coprocessor register write,
-+    }
-+    if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
-+        /*
-+         * A write to any coprocessor register that ends a TB
-+         * must rebuild the hflags for the next TB.
-+         */
-+        TCGv_i32 tcg_el = tcg_const_i32(s->current_el);
-+        gen_helper_rebuild_hflags_a64(cpu_env, tcg_el);
-+        tcg_temp_free_i32(tcg_el);
-+        /*
-+         * We default to ending the TB on a coprocessor register write,
-          * but allow this to be suppressed by the register definition
-          * (usually only necessary to work around guest bugs).
-          */
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
-     ri = get_arm_cp_reginfo(s->cp_regs,
+     return neon_full_reg_offset(reg) + ofs;
-             ENCODE_CP_REG(cpnum, is64, s->ns, crn, crm, opc1, opc2));
+ }
-     if (ri) {
-+        bool need_exit_tb;
+-static inline long vfp_reg_offset(bool dp, unsigned reg)
-+
++/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
-         /* Check access permissions */
++static long vfp_reg_offset(bool dp, unsigned reg)
-         if (!cp_access_ok(s->current_el, ri, isread)) {
+ {
-             return 1;
+     if (dp) {
-@@ -XXX,XX +XXX,XX @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
+-        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
-             }
++        return neon_element_offset(reg, 0, MO_64);
-         }
+     } else {
+-        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
--        if ((tb_cflags(s->base.tb) & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
+-        if (reg & 1) {
--            /* I/O operations must end the TB here (whether read or write) */
+-            ofs += offsetof(CPU_DoubleU, l.upper);
--            gen_lookup_tb(s);
+-        } else {
--        } else if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
+-            ofs += offsetof(CPU_DoubleU, l.lower);
--            /* We default to ending the TB on a coprocessor register write,
+-        }
-+        /* I/O operations must end the TB here (whether read or write) */
+-        return ofs;
-+        need_exit_tb = ((tb_cflags(s->base.tb) & CF_USE_ICOUNT) &&
++        return neon_element_offset(reg >> 1, reg & 1, MO_32);
-+                        (ri->type & ARM_CP_IO));
+     }
-+
+ }
 +        if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
 +            /*
 +             * A write to any coprocessor register that ends a TB
 +             * must rebuild the hflags for the next TB.
 +             */
 +            TCGv_i32 tcg_el = tcg_const_i32(s->current_el);
 +            if (arm_dc_feature(s, ARM_FEATURE_M)) {
 +                gen_helper_rebuild_hflags_m32(cpu_env, tcg_el);
 +            } else {
 +                gen_helper_rebuild_hflags_a32(cpu_env, tcg_el);
 +            }
 +            tcg_temp_free_i32(tcg_el);
 +            /*
 +             * We default to ending the TB on a coprocessor register write,
               * but allow this to be suppressed by the register definition
               * (usually only necessary to work around guest bugs).
               */
 +            need_exit_tb = true;
 +        }
 +        if (need_exit_tb) {
              gen_lookup_tb(s);
          }
 --
 .20.1
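
The rewritten vfp_reg_offset() above expresses both cases through neon_element_offset(): a D register is element 0 at 64 bits, and an S register is 32-bit element (reg & 1) of D register (reg >> 1). A minimal sketch of that arithmetic follows, assuming a simplified register file in which D registers are 8 bytes and laid out back to back (the real CPUARMState layout is not that simple). All model_* names are hypothetical, and the big-endian adjustment is an assumption about how the element offset accounts for host byte order; it is not shown in this hunk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each D register is 8 bytes, stored back to back. */
enum { MODEL_DREG_BYTES = 8 };

/*
 * Byte offset of element `element` (of 1 << log2_size bytes) within
 * D register `reg` of the simplified register file.
 */
static size_t model_element_offset(unsigned reg, unsigned element,
                                   unsigned log2_size, bool host_big_endian)
{
    size_t element_size = (size_t)1 << log2_size;
    size_t ofs = element * element_size;

    /*
     * Assumption: each 8-byte unit is kept in host byte order, so on a
     * big-endian host a sub-8-byte element sits at the mirrored offset.
     */
    if (host_big_endian && element_size < MODEL_DREG_BYTES) {
        ofs ^= MODEL_DREG_BYTES - element_size;
    }
    return (size_t)reg * MODEL_DREG_BYTES + ofs;
}

/*
 * Same shape as the rewritten vfp_reg_offset(): dp selects a D register,
 * otherwise `reg` names an S register.
 */
static size_t model_vfp_reg_offset(bool dp, unsigned reg, bool host_big_endian)
{
    if (dp) {
        return model_element_offset(reg, 0, 3, host_big_endian);
    }
    return model_element_offset(reg >> 1, reg & 1, 2, host_big_endian);
}
```

Under these assumptions S3 lands at byte 12 on a little-endian host and at byte 8 on a big-endian one, which corresponds to the l.upper/l.lower selection that the removed code performed by hand.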

-[PULL 46/51] hw/timer/bcm2835: Add the BCM2835 SYS_timer
+[PULL 05/26] target/arm: Add read/write_neon_element32
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+From: Richard Henderson <richard.henderson@linaro.org>
-Add the 64-bit free running timer. Do not model the COMPARE register
+Model these off the aa64 read/write_vec_element functions.
-(no IRQ generated).
+Use them within translate-neon.c.inc.  The new functions do
-This timer is used by Linux kernel and recently U-Boot:
+not allocate or free temps, so this rearranges the calling
-https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clocksource/bcm2835_timer.c?h=v3.7
+code a bit.
 https://github.com/u-boot/u-boot/blob/v2019.07/include/configs/rpi.h#L19
-Datasheet used:
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
+Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Message-id: 20191019234715.25750-4-f4bug@amsat.org
 [PMM: squashed in switch to using memset in reset]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/timer/Makefile.objs            |   1 +
+ target/arm/translate.c          |  26 ++++
- include/hw/timer/bcm2835_systmr.h |  33 ++++++
+ target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
- hw/timer/bcm2835_systmr.c         | 163 ++++++++++++++++++++++++++++++
+files changed, 183 insertions(+), 99 deletions(-)
  hw/timer/trace-events             |   5 +
 files changed, 202 insertions(+)
  create mode 100644 include/hw/timer/bcm2835_systmr.h
  create mode 100644 hw/timer/bcm2835_systmr.c
-diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
+diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/Makefile.objs
+--- a/target/arm/translate.c
-+++ b/hw/timer/Makefile.objs
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
+@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
- common-obj-$(CONFIG_CMSDK_APB_TIMER) += cmsdk-apb-timer.o
+     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
- common-obj-$(CONFIG_CMSDK_APB_DUALTIMER) += cmsdk-apb-dualtimer.o
+ }
- common-obj-$(CONFIG_MSF2) += mss-timer.o
-+common-obj-$(CONFIG_RASPI) += bcm2835_systmr.o
++static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
 diff --git a/include/hw/timer/bcm2835_systmr.h b/include/hw/timer/bcm2835_systmr.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/timer/bcm2835_systmr.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * BCM2835 SYS timer emulation
 + *
 + * Copyright (c) 2019 Philippe Mathieu-Daudé <f4bug@amsat.org>
 + *
 + * SPDX-License-Identifier: GPL-2.0-or-later
 + */
 +
 +#ifndef BCM2835_SYSTIMER_H
 +#define BCM2835_SYSTIMER_H
 +
 +#include "hw/sysbus.h"
 +#include "hw/irq.h"
 +
 +#define TYPE_BCM2835_SYSTIMER "bcm2835-sys-timer"
 +#define BCM2835_SYSTIMER(obj) \
 +    OBJECT_CHECK(BCM2835SystemTimerState, (obj), TYPE_BCM2835_SYSTIMER)
 +
 +typedef struct {
 +    /*< private >*/
 +    SysBusDevice parent_obj;
 +
 +    /*< public >*/
 +    MemoryRegion iomem;
 +    qemu_irq irq;
 +
 +    struct {
 +        uint32_t status;
 +        uint32_t compare[4];
 +    } reg;
 +} BCM2835SystemTimerState;
 +
 +#endif
 diff --git a/hw/timer/bcm2835_systmr.c b/hw/timer/bcm2835_systmr.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/timer/bcm2835_systmr.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * BCM2835 SYS timer emulation
 + *
 + * Copyright (C) 2019 Philippe Mathieu-Daudé <f4bug@amsat.org>
 + *
 + * SPDX-License-Identifier: GPL-2.0-or-later
 + *
 + * Datasheet: BCM2835 ARM Peripherals (C6357-M-1398)
 + * https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
 + *
 + * Only the free running 64-bit counter is implemented.
 + * The 4 COMPARE registers and the interrupt are not implemented.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/log.h"
 +#include "qemu/timer.h"
 +#include "hw/timer/bcm2835_systmr.h"
 +#include "hw/registerfields.h"
 +#include "migration/vmstate.h"
 +#include "trace.h"
 +
 +REG32(CTRL_STATUS,  0x00)
 +REG32(COUNTER_LOW,  0x04)
 +REG32(COUNTER_HIGH, 0x08)
 +REG32(COMPARE0,     0x0c)
 +REG32(COMPARE1,     0x10)
 +REG32(COMPARE2,     0x14)
 +REG32(COMPARE3,     0x18)
 +
 +static void bcm2835_systmr_update_irq(BCM2835SystemTimerState *s)
 +{
-+    bool enable = !!s->reg.status;
++    long off = neon_element_offset(reg, ele, size);
 +
-+    trace_bcm2835_systmr_irq(enable);
++    switch (size) {
-+    qemu_set_irq(s->irq, enable);
++    case MO_32:
-+}
++        tcg_gen_ld_i32(dest, cpu_env, off);
 +
 +static void bcm2835_systmr_update_compare(BCM2835SystemTimerState *s,
 +                                          unsigned timer_index)
 +{
 +    /* TODO for now, since neither Linux nor U-boot use these timers. */
 +    qemu_log_mask(LOG_UNIMP, "COMPARE register %u not implemented\n",
 +                  timer_index);
 +}
 +
 +static uint64_t bcm2835_systmr_read(void *opaque, hwaddr offset,
 +                                    unsigned size)
 +{
 +    BCM2835SystemTimerState *s = BCM2835_SYSTIMER(opaque);
 +    uint64_t r = 0;
 +
 +    switch (offset) {
 +    case A_CTRL_STATUS:
 +        r = s->reg.status;
 +        break;
 +    case A_COMPARE0 ... A_COMPARE3:
 +        r = s->reg.compare[(offset - A_COMPARE0) >> 2];
 +        break;
 +    case A_COUNTER_LOW:
 +    case A_COUNTER_HIGH:
 +        /* Free running counter at 1MHz */
 +        r = qemu_clock_get_us(QEMU_CLOCK_VIRTUAL);
 +        r >>= 8 * (offset - A_COUNTER_LOW);
 +        r &= UINT32_MAX;
 +        break;
 +    default:
-+        qemu_log_mask(LOG_GUEST_ERROR, "%s: bad offset 0x%" HWADDR_PRIx "\n",
++        g_assert_not_reached();
 +                      __func__, offset);
 +        break;
 +    }
-+    trace_bcm2835_systmr_read(offset, r);
-+
-+    return r;
 +}
 +
-+static void bcm2835_systmr_write(void *opaque, hwaddr offset,
++static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
 +                                 uint64_t value, unsigned size)
 +{
-+    BCM2835SystemTimerState *s = BCM2835_SYSTIMER(opaque);
++    long off = neon_element_offset(reg, ele, size);
 +
-+    trace_bcm2835_systmr_write(offset, value);
++    switch (size) {
-+    switch (offset) {
++    case MO_32:
-+    case A_CTRL_STATUS:
++        tcg_gen_st_i32(src, cpu_env, off);
 +        s->reg.status &= ~value; /* Ack */
 +        bcm2835_systmr_update_irq(s);
 +        break;
 +    case A_COMPARE0 ... A_COMPARE3:
 +        s->reg.compare[(offset - A_COMPARE0) >> 2] = value;
 +        bcm2835_systmr_update_compare(s, (offset - A_COMPARE0) >> 2);
 +        break;
 +    case A_COUNTER_LOW:
 +    case A_COUNTER_HIGH:
 +        qemu_log_mask(LOG_GUEST_ERROR, "%s: read-only ofs 0x%" HWADDR_PRIx "\n",
 +                      __func__, offset);
 +        break;
 +    default:
-+        qemu_log_mask(LOG_GUEST_ERROR, "%s: bad offset 0x%" HWADDR_PRIx "\n",
++        g_assert_not_reached();
 +                      __func__, offset);
 +        break;
 +    }
 +}
 +
-+static const MemoryRegionOps bcm2835_systmr_ops = {
+ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
-+    .read = bcm2835_systmr_read,
+ {
-+    .write = bcm2835_systmr_write,
+     TCGv_ptr ret = tcg_temp_new_ptr();
-+    .endianness = DEVICE_LITTLE_ENDIAN,
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 +    .impl = {
 +        .min_access_size = 4,
 +        .max_access_size = 4,
 +    },
 +};
 +
 +static void bcm2835_systmr_reset(DeviceState *dev)
 +{
 +    BCM2835SystemTimerState *s = BCM2835_SYSTIMER(dev);
 +
 +    memset(&s->reg, 0, sizeof(s->reg));
 +}
 +
 +static void bcm2835_systmr_realize(DeviceState *dev, Error **errp)
 +{
 +    BCM2835SystemTimerState *s = BCM2835_SYSTIMER(dev);
 +
 +    memory_region_init_io(&s->iomem, OBJECT(dev), &bcm2835_systmr_ops,
 +                          s, "bcm2835-sys-timer", 0x20);
 +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
 +    sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq);
 +}
 +
 +static const VMStateDescription bcm2835_systmr_vmstate = {
 +    .name = "bcm2835_sys_timer",
 +    .version_id = 1,
 +    .minimum_version_id = 1,
 +    .fields = (VMStateField[]) {
 +        VMSTATE_UINT32(reg.status, BCM2835SystemTimerState),
 +        VMSTATE_UINT32_ARRAY(reg.compare, BCM2835SystemTimerState, 4),
 +        VMSTATE_END_OF_LIST()
 +    }
 +};
 +
 +static void bcm2835_systmr_class_init(ObjectClass *klass, void *data)
 +{
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +
 +    dc->realize = bcm2835_systmr_realize;
 +    dc->reset = bcm2835_systmr_reset;
 +    dc->vmsd = &bcm2835_systmr_vmstate;
 +}
 +
 +static const TypeInfo bcm2835_systmr_info = {
 +    .name = TYPE_BCM2835_SYSTIMER,
 +    .parent = TYPE_SYS_BUS_DEVICE,
 +    .instance_size = sizeof(BCM2835SystemTimerState),
 +    .class_init = bcm2835_systmr_class_init,
 +};
 +
 +static void bcm2835_systmr_register_types(void)
 +{
 +    type_register_static(&bcm2835_systmr_info);
 +}
 +
 +type_init(bcm2835_systmr_register_types);
 diff --git a/hw/timer/trace-events b/hw/timer/trace-events
 index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/trace-events
+--- a/target/arm/translate-neon.c.inc
-+++ b/hw/timer/trace-events
++++ b/target/arm/translate-neon.c.inc
-@@ -XXX,XX +XXX,XX @@ pl031_read(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
+@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
- pl031_write(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
+      * early. Since Q is 0 there are always just two passes, so instead
- pl031_alarm_raised(void) "alarm raised"
+      * of a complicated loop over each pass we just unroll.
- pl031_set_alarm(uint32_t ticks) "alarm set for %u ticks"
+      */
-+
+-    tmp = neon_load_reg(a->vn, 0);
-+# bcm2835_systmr.c
+-    tmp2 = neon_load_reg(a->vn, 1);
-+bcm2835_systmr_irq(bool enable) "timer irq state %u"
++    tmp = tcg_temp_new_i32();
-+bcm2835_systmr_read(uint64_t offset, uint64_t data) "timer read: offset 0x%" PRIx64 " data 0x%" PRIx64
++    tmp2 = tcg_temp_new_i32();
-+bcm2835_systmr_write(uint64_t offset, uint64_t data) "timer write: offset 0x%" PRIx64 " data 0x%" PRIx64
++    tmp3 = tcg_temp_new_i32();
 +
 +    read_neon_element32(tmp, a->vn, 0, MO_32);
 +    read_neon_element32(tmp2, a->vn, 1, MO_32);
      fn(tmp, tmp, tmp2);
 -    tcg_temp_free_i32(tmp2);
 -    tmp3 = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(tmp3, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      fn(tmp3, tmp3, tmp2);
 -    tcg_temp_free_i32(tmp2);
 -    neon_store_reg(a->vd, 0, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
       * 2-reg-and-shift operations, size < 3 case, where the
       * helper needs to be passed cpu_env.
       */
 -    TCGv_i32 constimm;
 +    TCGv_i32 constimm, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
       * by immediate using the variable shift operations.
       */
      constimm = tcg_const_i32(dup_const(a->size, a->shift));
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, cpu_env, tmp, constimm);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(constimm);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
      constimm = tcg_const_i64(-a->shift);
      rm1 = tcg_temp_new_i64();
      rm2 = tcg_temp_new_i64();
 +    rd = tcg_temp_new_i32();
      /* Load both inputs first to avoid potential overwrite if rm == rd */
      neon_load_reg64(rm1, a->vm);
      neon_load_reg64(rm2, a->vm + 1);
      shiftfn(rm1, rm1, constimm);
 -    rd = tcg_temp_new_i32();
      narrowfn(rd, cpu_env, rm1);
 -    neon_store_reg(a->vd, 0, rd);
 +    write_neon_element32(rd, a->vd, 0, MO_32);
      shiftfn(rm2, rm2, constimm);
 -    rd = tcg_temp_new_i32();
      narrowfn(rd, cpu_env, rm2);
 -    neon_store_reg(a->vd, 1, rd);
 +    write_neon_element32(rd, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i64(rm1);
      tcg_temp_free_i64(rm2);
      tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      constimm = tcg_const_i32(imm);
      /* Load all inputs first to avoid potential overwrite */
 -    rm1 = neon_load_reg(a->vm, 0);
 -    rm2 = neon_load_reg(a->vm, 1);
 -    rm3 = neon_load_reg(a->vm + 1, 0);
 -    rm4 = neon_load_reg(a->vm + 1, 1);
 +    rm1 = tcg_temp_new_i32();
 +    rm2 = tcg_temp_new_i32();
 +    rm3 = tcg_temp_new_i32();
 +    rm4 = tcg_temp_new_i32();
 +    read_neon_element32(rm1, a->vm, 0, MO_32);
 +    read_neon_element32(rm2, a->vm, 1, MO_32);
 +    read_neon_element32(rm3, a->vm, 2, MO_32);
 +    read_neon_element32(rm4, a->vm, 3, MO_32);
      rtmp = tcg_temp_new_i64();
      shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      tcg_temp_free_i32(rm2);
      narrowfn(rm1, cpu_env, rtmp);
 -    neon_store_reg(a->vd, 0, rm1);
 +    write_neon_element32(rm1, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(rm1);
      shiftfn(rm3, rm3, constimm);
      shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      narrowfn(rm3, cpu_env, rtmp);
      tcg_temp_free_i64(rtmp);
 -    neon_store_reg(a->vd, 1, rm3);
 +    write_neon_element32(rm3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rm3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          widen_mask = dup_const(a->size + 1, widen_mask);
      }
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      tmp = tcg_temp_new_i64();
      widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn0_64, a->vn);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 0);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn1_64, a->vn + 1);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 1);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      neon_store_reg64(rn0_64, a->vd);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      narrowfn(rd1, rn_64);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rn_64);
      tcg_temp_free_i64(rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i64();
      rd1 = tcg_temp_new_i64();
 -    rn = neon_load_reg(a->vn, 0);
 -    rm = neon_load_reg(a->vm, 0);
 +    rn = tcg_temp_new_i32();
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      opfn(rd0, rn, rm);
 -    tcg_temp_free_i32(rn);
 -    tcg_temp_free_i32(rm);
 -    rn = neon_load_reg(a->vn, 1);
 -    rm = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      opfn(rd1, rn, rm);
      tcg_temp_free_i32(rn);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
  static inline TCGv_i32 neon_get_scalar(int size, int reg)
  {
 -    TCGv_i32 tmp;
 -    if (size == 1) {
 -        tmp = neon_load_reg(reg & 7, reg >> 4);
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +    if (size == MO_16) {
 +        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
          if (reg & 8) {
              gen_neon_dup_high16(tmp);
          } else {
              gen_neon_dup_low16(tmp);
          }
      } else {
 -        tmp = neon_load_reg(reg & 15, reg >> 4);
 +        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
      }
      return tmp;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
       * perform an accumulation operation of that result into the
       * destination.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
 +        read_neon_element32(tmp, a->vn, pass, MO_32);
          opfn(tmp, tmp, scalar);
          if (accfn) {
 -            TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +            TCGv_i32 rd = tcg_temp_new_i32();
 +            read_neon_element32(rd, a->vd, pass, MO_32);
              accfn(tmp, rd, tmp);
              tcg_temp_free_i32(rd);
          }
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(scalar);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
       * performs a kind of fused op-then-accumulate using a helper
       * function that takes all of rd, rn and the scalar at once.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, rn, rd;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    rn = tcg_temp_new_i32();
 +    rd = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 rn = neon_load_reg(a->vn, pass);
 -        TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +        read_neon_element32(rn, a->vn, pass, MO_32);
 +        read_neon_element32(rd, a->vd, pass, MO_32);
          opfn(rd, cpu_env, rn, scalar, rd);
 -        tcg_temp_free_i32(rn);
 -        neon_store_reg(a->vd, pass, rd);
 +        write_neon_element32(rd, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i32(scalar);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      scalar = neon_get_scalar(a->size, a->vm);
      /* Load all inputs before writing any outputs, in case of overlap */
 -    rn = neon_load_reg(a->vn, 0);
 +    rn = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
      rn0_64 = tcg_temp_new_i64();
      opfn(rn0_64, rn, scalar);
 -    tcg_temp_free_i32(rn);
 -    rn = neon_load_reg(a->vn, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
      rn1_64 = tcg_temp_new_i64();
      opfn(rn1_64, rn, scalar);
      tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
          return false;
      }
      n <<= 3;
 +    tmp = tcg_temp_new_i32();
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 0);
 +        read_neon_element32(tmp, a->vd, 0, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp2 = neon_load_reg(a->vm, 0);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 0, MO_32);
      ptr1 = vfp_reg_ptr(true, a->vn);
      tmp4 = tcg_const_i32(n);
      gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
 -    tcg_temp_free_i32(tmp);
 +
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 1);
 +        read_neon_element32(tmp, a->vd, 1, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp3 = neon_load_reg(a->vm, 1);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 1, MO_32);
      gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(tmp4);
      tcg_temp_free_ptr(ptr1);
 -    neon_store_reg(a->vd, 0, tmp2);
 -    neon_store_reg(a->vd, 1, tmp3);
 -    tcg_temp_free_i32(tmp);
 +
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
  static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
  {
      int pass, half;
 +    TCGv_i32 tmp[2];
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
          return true;
      }
 -    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        TCGv_i32 tmp[2];
 +    tmp[0] = tcg_temp_new_i32();
 +    tmp[1] = tcg_temp_new_i32();
 +    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
          for (half = 0; half < 2; half++) {
 -            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
 +            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
              switch (a->size) {
              case 0:
                  tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                  g_assert_not_reached();
              }
          }
 -        neon_store_reg(a->vd, pass * 2, tmp[1]);
 -        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
 +        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
 +        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
      }
 +
 +    tcg_temp_free_i32(tmp[0]);
 +    tcg_temp_free_i32(tmp[1]);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          rm0_64 = tcg_temp_new_i64();
          rm1_64 = tcg_temp_new_i64();
          rd_64 = tcg_temp_new_i64();
 -        tmp = neon_load_reg(a->vm, pass * 2);
 +
 +        tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
          widenfn(rm0_64, tmp);
 -        tcg_temp_free_i32(tmp);
 -        tmp = neon_load_reg(a->vm, pass * 2 + 1);
 +        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
          widenfn(rm1_64, tmp);
          tcg_temp_free_i32(tmp);
 +
          opfn(rd_64, rm0_64, rm1_64);
          tcg_temp_free_i64(rm0_64);
          tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      narrowfn(rd0, cpu_env, rm);
      neon_load_reg64(rm, a->vm + 1);
      narrowfn(rd1, cpu_env, rm);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rm);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      }
      rd = tcg_temp_new_i64();
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
 -    tmp = neon_load_reg(a->vm, 0);
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
      tcg_gen_shli_i32(tmp2, tmp2, 16);
      tcg_gen_or_i32(tmp2, tmp2, tmp);
 -    tcg_temp_free_i32(tmp);
 -    tmp = neon_load_reg(a->vm, 2);
 +    read_neon_element32(tmp, a->vm, 2, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp3 = neon_load_reg(a->vm, 3);
 -    neon_store_reg(a->vd, 0, tmp2);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 3, MO_32);
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(tmp2);
      gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
      tcg_gen_shli_i32(tmp3, tmp3, 16);
      tcg_gen_or_i32(tmp3, tmp3, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
      tmp3 = tcg_temp_new_i32();
 -    tmp = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      tcg_gen_ext16u_i32(tmp3, tmp);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 0, tmp3);
 +    write_neon_element32(tmp3, a->vd, 0, MO_32);
      tcg_gen_shri_i32(tmp, tmp, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
 -    neon_store_reg(a->vd, 1, tmp);
 -    tmp3 = tcg_temp_new_i32();
 +    write_neon_element32(tmp, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp);
      tcg_gen_ext16u_i32(tmp3, tmp2);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 2, tmp3);
 +    write_neon_element32(tmp3, a->vd, 2, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_gen_shri_i32(tmp2, tmp2, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
 -    neon_store_reg(a->vd, 3, tmp2);
 +    write_neon_element32(tmp2, a->vd, 3, MO_32);
 +    tcg_temp_free_i32(tmp2);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
  static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
  {
 +    TCGv_i32 tmp;
      int pass;
      /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
          return true;
      }
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, tmp);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
          return true;
      }
 -    if (a->size == 2) {
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    if (a->size == MO_32) {
          for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass + 1);
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass + 1, tmp);
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
          }
      } else {
          for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass);
 -            if (a->size == 0) {
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass, MO_32);
 +            if (a->size == MO_8) {
                  gen_neon_trn_u8(tmp, tmp2);
              } else {
                  gen_neon_trn_u16(tmp, tmp2);
              }
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass, tmp);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass, MO_32);
          }
      }
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
      return true;
  }
 --
 .20.1
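
The calling-code change in the patch above follows one pattern: the caller owns a single temporary, the read helper fills it, and the write helper stores it without freeing it. The sketch below is a host-side analogy in plain C, not TCG code. The ModelRegFile type and all model_* functions are hypothetical, and the register file is assumed to be a flat array of 8-byte D registers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flat register file: 32 D registers of 8 bytes each. */
typedef struct {
    uint8_t bytes[32 * 8];
} ModelRegFile;

/* Fill a caller-supplied destination; nothing is allocated here. */
static void model_read_element32(uint32_t *dest, const ModelRegFile *rf,
                                 int reg, int ele)
{
    memcpy(dest, &rf->bytes[reg * 8 + ele * 4], sizeof(*dest));
}

/* Store a value; the caller keeps ownership of its temporary. */
static void model_write_element32(uint32_t src, ModelRegFile *rf,
                                  int reg, int ele)
{
    memcpy(&rf->bytes[reg * 8 + ele * 4], &src, sizeof(src));
}

typedef uint32_t ModelOneOpFn(uint32_t);

/*
 * Analogue of the do_2misc() loop: one temporary is reused for every
 * 32-bit pass instead of being created and released per pass.
 */
static void model_2misc(ModelRegFile *rf, int vd, int vm, int q,
                        ModelOneOpFn *fn)
{
    uint32_t tmp;

    for (int pass = 0; pass < (q ? 4 : 2); pass++) {
        model_read_element32(&tmp, rf, vm, pass);
        tmp = fn(tmp);
        model_write_element32(tmp, rf, vd, pass);
    }
}

/* Example operation: bitwise NOT of one 32-bit element. */
static uint32_t model_not32(uint32_t x)
{
    return ~x;
}
```

In this model, element indexes 2 and 3 of a register fall into the next D register, which is why do_2shift_narrow_32() can read elements 2 and 3 of a->vm where the old code read elements 0 and 1 of a->vm + 1.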

-[PULL 22/51] target/arm: Rebuild hflags for M-profile
+[PULL 06/26] target/arm: Expand read/write_neon_element32 to all MemOp
 From: Richard Henderson <richard.henderson@linaro.org>
-Continue setting, but not relying upon, env->hflags.
+We can then use this to improve VMOV (scalar to gp) and
+VMOV (gp to scalar) so that we simply perform the memory
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+operation that we wanted, rather than inserting or
 extracting from a 32-bit quantity.
 These were the last uses of neon_load/store_reg, so remove them.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-21-richard.henderson@linaro.org
+Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
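Note on the VMOV change: the commit message says the scalar moves now "simply perform the memory operation that we wanted" instead of inserting into or extracting from a 32-bit quantity. The following is a minimal host-side sketch of why the two shapes are equivalent, not the TCG code itself. It assumes a little-endian byte layout for one D register; `DReg`, `extend`, `to_gp_old`, `to_gp_new`, `from_gp_old`, `from_gp_new` and `vmov_paths_agree` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 64-bit D register, little-endian bytes. */
typedef struct { uint8_t bytes[8]; } DReg;

/* Zero- or sign-extend the low `bits` bits of v to 32 bits. */
uint32_t extend(uint32_t v, int bits, int is_unsigned)
{
    if (bits < 32) {
        v &= (1u << bits) - 1;
        if (!is_unsigned) {
            uint32_t sign = 1u << (bits - 1);
            v = (v ^ sign) - sign;
        }
    }
    return v;
}

uint32_t load_word(const DReg *d, int pass)
{
    const uint8_t *p = &d->bytes[pass * 4];
    return p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24);
}

void store_word(DReg *d, int pass, uint32_t w)
{
    for (int i = 0; i < 4; i++) {
        d->bytes[pass * 4 + i] = (uint8_t)(w >> (8 * i));
    }
}

/* Old shape: fetch the containing 32-bit word, then shift and extend. */
uint32_t to_gp_old(const DReg *d, int index, int size, int is_unsigned)
{
    uint32_t offset = (uint32_t)index << size;  /* byte offset in register */
    int pass = (offset >> 2) & 1;
    int shift = (offset & 3) * 8;
    return extend(load_word(d, pass) >> shift, 8 << size, is_unsigned);
}

/* New shape: load exactly the element, extending as part of the load. */
uint32_t to_gp_new(const DReg *d, int index, int size, int is_unsigned)
{
    int nbytes = 1 << size;
    uint32_t v = 0;
    assert(size >= 0 && size <= 2 && index >= 0 && index < (8 >> size));
    for (int i = 0; i < nbytes; i++) {
        v |= (uint32_t)d->bytes[index * nbytes + i] << (8 * i);
    }
    return extend(v, 8 * nbytes, is_unsigned);
}

/* Old shape: read-modify-write of the containing word (a deposit). */
void from_gp_old(DReg *d, int index, int size, uint32_t val)
{
    uint32_t offset = (uint32_t)index << size;
    int pass = (offset >> 2) & 1;
    int shift = (offset & 3) * 8;
    if (size < 2) {
        uint32_t mask = ((1u << (8 << size)) - 1) << shift;
        val = (load_word(d, pass) & ~mask) | ((val << shift) & mask);
    }
    store_word(d, pass, val);
}

/* New shape: store only the bytes that make up the element. */
void from_gp_new(DReg *d, int index, int size, uint32_t val)
{
    int nbytes = 1 << size;
    for (int i = 0; i < nbytes; i++) {
        d->bytes[index * nbytes + i] = (uint8_t)(val >> (8 * i));
    }
}

/* Returns 1 if both shapes agree for every size, index and signedness. */
int vmov_paths_agree(void)
{
    const DReg init = {{0x80, 0x7f, 0xfe, 0x01, 0xff, 0x00, 0x12, 0x9a}};
    for (int size = 0; size <= 2; size++) {
        for (int index = 0; index < (8 >> size); index++) {
            for (int u = 0; u <= 1; u++) {
                if (to_gp_old(&init, index, size, u) !=
                    to_gp_new(&init, index, size, u)) {
                    return 0;
                }
            }
            DReg a = init, b = init;
            from_gp_old(&a, index, size, 0xdeadbeefu);
            from_gp_new(&b, index, size, 0xdeadbeefu);
            if (memcmp(a.bytes, b.bytes, sizeof a.bytes) != 0) {
                return 0;
            }
        }
    }
    return 1;
}
```

Calling `vmov_paths_agree()` from a small harness exercises every element size and index. The sketch deliberately ignores big-endian hosts, which the real `neon_element_offset` handles separately.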
- target/arm/m_helper.c  | 6 ++++++
+ target/arm/translate.c         | 50 +++++++++++++-----------
- target/arm/translate.c | 5 ++++-
+ target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
-files changed, 10 insertions(+), 1 deletion(-)
+files changed, 37 insertions(+), 84 deletions(-)
 diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/m_helper.c
 +++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_bxns)(CPUARMState *env, uint32_t dest)
      switch_v7m_security_state(env, dest & 1);
      env->thumb = 1;
      env->regs[15] = dest & ~1;
 +    arm_rebuild_hflags(env);
  }
  void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest)
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest)
      switch_v7m_security_state(env, 0);
      env->thumb = 1;
      env->regs[15] = dest;
 +    arm_rebuild_hflags(env);
  }
  static uint32_t *get_v7m_sp_ptr(CPUARMState *env, bool secure, bool threadmode,
@@ -XXX,XX +XXX,XX @@ static void v7m_exception_taken(ARMCPU *cpu, uint32_t lr, bool dotailchain,
      env->regs[14] = lr;
      env->regs[15] = addr & 0xfffffffe;
      env->thumb = addr & 1;
 +    arm_rebuild_hflags(env);
  }
  static void v7m_update_fpccr(CPUARMState *env, uint32_t frameptr,
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
      /* Otherwise, we have a successful exception exit. */
      arm_clear_exclusive(env);
 +    arm_rebuild_hflags(env);
      qemu_log_mask(CPU_LOG_INT, "...successful exception return\n");
  }
@@ -XXX,XX +XXX,XX @@ static bool do_v7m_function_return(ARMCPU *cpu)
      xpsr_write(env, 0, XPSR_IT);
      env->thumb = newpc & 1;
      env->regs[15] = newpc & ~1;
 +    arm_rebuild_hflags(env);
      qemu_log_mask(CPU_LOG_INT, "...function return successful\n");
      return true;
@@ -XXX,XX +XXX,XX @@ static bool v7m_handle_execute_nsc(ARMCPU *cpu)
      switch_v7m_security_state(env, true);
      xpsr_write(env, 0, XPSR_IT);
      env->regs[15] += 4;
 +    arm_rebuild_hflags(env);
      return true;
  gen_invep:
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_MRS_v7m(DisasContext *s, arg_MRS_v7m *a)
+@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
- static bool trans_MSR_v7m(DisasContext *s, arg_MSR_v7m *a)
+  * where 0 is the least significant end of the register.
- {
+  */
--    TCGv_i32 addr, reg;
+-static long neon_element_offset(int reg, int element, MemOp size)
-+    TCGv_i32 addr, reg, el;
++static long neon_element_offset(int reg, int element, MemOp memop)
+ {
-     if (!arm_dc_feature(s, ARM_FEATURE_M)) {
+-    int element_size = 1 << size;
-         return false;
++    int element_size = 1 << (memop & MO_SIZE);
-@@ -XXX,XX +XXX,XX @@ static bool trans_MSR_v7m(DisasContext *s, arg_MSR_v7m *a)
+     int ofs = element * element_size;
-     gen_helper_v7m_msr(cpu_env, addr, reg);
+ #ifdef HOST_WORDS_BIGENDIAN
-     tcg_temp_free_i32(addr);
+     /*
-     tcg_temp_free_i32(reg);
+@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
-+    el = tcg_const_i32(s->current_el);
+     }
-+    gen_helper_rebuild_hflags_m32(cpu_env, el);
+ }
-+    tcg_temp_free_i32(el);
-     gen_lookup_tb(s);
+-static TCGv_i32 neon_load_reg(int reg, int pass)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
 -    return tmp;
 -}
 -
 -static void neon_store_reg(int reg, int pass, TCGv_i32 var)
 -{
 -    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
 -    tcg_temp_free_i32(var);
 -}
 -
  static inline void neon_load_reg64(TCGv_i64 var, int reg)
  {
      tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
      tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
  }
 -static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
 +static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
  {
 -    long off = neon_element_offset(reg, ele, size);
 +    long off = neon_element_offset(reg, ele, memop);
 -    switch (size) {
 -    case MO_32:
 +    switch (memop) {
 +    case MO_SB:
 +        tcg_gen_ld8s_i32(dest, cpu_env, off);
 +        break;
 +    case MO_UB:
 +        tcg_gen_ld8u_i32(dest, cpu_env, off);
 +        break;
 +    case MO_SW:
 +        tcg_gen_ld16s_i32(dest, cpu_env, off);
 +        break;
 +    case MO_UW:
 +        tcg_gen_ld16u_i32(dest, cpu_env, off);
 +        break;
 +    case MO_UL:
 +    case MO_SL:
          tcg_gen_ld_i32(dest, cpu_env, off);
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
      }
  }
 -static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
 +static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
  {
 -    long off = neon_element_offset(reg, ele, size);
 +    long off = neon_element_offset(reg, ele, memop);
 -    switch (size) {
 +    switch (memop) {
 +    case MO_8:
 +        tcg_gen_st8_i32(src, cpu_env, off);
 +        break;
 +    case MO_16:
 +        tcg_gen_st16_i32(src, cpu_env, off);
 +        break;
      case MO_32:
          tcg_gen_st_i32(src, cpu_env, off);
          break;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
  {
      /* VMOV scalar to general purpose register */
      TCGv_i32 tmp;
 -    int pass;
 -    uint32_t offset;
 -    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == 2
 +    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == MO_32
          ? !dc_isar_feature(aa32_fpsp_v2, s)
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
 -    tmp = neon_load_reg(a->vn, pass);
 -    switch (a->size) {
 -    case 0:
 -        if (offset) {
 -            tcg_gen_shri_i32(tmp, tmp, offset);
 -        }
 -        if (a->u) {
 -            gen_uxtb(tmp);
 -        } else {
 -            gen_sxtb(tmp);
 -        }
 -        break;
 -    case 1:
 -        if (a->u) {
 -            if (offset) {
 -                tcg_gen_shri_i32(tmp, tmp, 16);
 -            } else {
 -                gen_uxth(tmp);
 -            }
 -        } else {
 -            if (offset) {
 -                tcg_gen_sari_i32(tmp, tmp, 16);
 -            } else {
 -                gen_sxth(tmp);
 -            }
 -        }
 -        break;
 -    case 2:
 -        break;
 -    }
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
      store_reg(s, a->rt, tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
  static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
  {
      /* VMOV general purpose register to scalar */
 -    TCGv_i32 tmp, tmp2;
 -    int pass;
 -    uint32_t offset;
 +    TCGv_i32 tmp;
 -    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == 2
 +    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == MO_32
          ? !dc_isar_feature(aa32_fpsp_v2, s)
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
      tmp = load_reg(s, a->rt);
 -    switch (a->size) {
 -    case 0:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 1:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 2:
 -        break;
 -    }
 -    neon_store_reg(a->vn, pass, tmp);
 +    write_neon_element32(tmp, a->vn, a->index, a->size);
 +    tcg_temp_free_i32(tmp);
      return true;
  }
 --
 .20.1
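
Note on the MemOp parameter in the patch above: once `read_neon_element32` accepts a full MemOp, the sign flag travels in the same value as the size, which is why `neon_element_offset` now masks with `MO_SIZE` before computing the element width. The sketch below illustrates that masking step only. The `SK_` constants and `sk_` functions are hypothetical, their numeric values are assumptions chosen for the example rather than taken from the QEMU headers, and only the little-endian host layout is modelled.

```c
#include <assert.h>

/* Illustrative constants; the numeric values are assumptions of this sketch. */
enum {
    SK_MO_8 = 0,
    SK_MO_16 = 1,
    SK_MO_32 = 2,
    SK_MO_SIZE = 3,  /* mask selecting the log2(size) bits */
    SK_MO_SIGN = 4   /* flag: sign-extend on load */
};

/* Mirrors the shape of `a->size | (a->u ? 0 : MO_SIGN)` in the patch. */
int sk_make_memop(int size, int is_unsigned)
{
    return size | (is_unsigned ? 0 : SK_MO_SIGN);
}

/* Element width in bytes; the sign flag must be masked off first. */
int sk_element_size(int memop)
{
    return 1 << (memop & SK_MO_SIZE);
}

/* Byte offset of element `ele` within the register, little-endian host. */
long sk_element_offset(int ele, int memop)
{
    assert(ele >= 0);
    return (long)ele * sk_element_size(memop);
}
```

Without the mask, a signed 16-bit access would be treated as a much wider element and the computed offset would land outside the intended halfword.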

-[PULL 10/51] target/arm: Split out rebuild_hflags_aprofile
+[PULL 07/26] target/arm: Rename neon_load_reg32 to vfp_load_reg32
 From: Richard Henderson <richard.henderson@linaro.org>
-Create a function to compute the values of the TBFLAG_ANY bits
+The only uses of this function are for loading VFP
-that will be cached, and are used by A-profile.
+single-precision values, and have nothing to do with NEON.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-9-richard.henderson@linaro.org
+Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 20 ++++++++++++--------
+ target/arm/translate.c         |   4 +-
-file changed, 12 insertions(+), 8 deletions(-)
+ target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
 files changed, 94 insertions(+), 94 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/translate.c
-+++ b/target/arm/helper.c
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
+@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
-     return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
+     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
  }
-+static uint32_t rebuild_hflags_aprofile(CPUARMState *env)
+-static inline void neon_load_reg32(TCGv_i32 var, int reg)
-+{
++static inline void vfp_load_reg32(TCGv_i32 var, int reg)
 +    int flags = 0;
 +
 +    flags = FIELD_DP32(flags, TBFLAG_ANY, DEBUG_TARGET_EL,
 +                       arm_debug_target_el(env));
 +    return flags;
 +}
 +
  static uint32_t rebuild_hflags_a32(CPUARMState *env, int fp_el,
                                     ARMMMUIdx mmu_idx)
  {
--    return rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
+     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
 +    uint32_t flags = rebuild_hflags_aprofile(env);
 +    return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
  }
- static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
+-static inline void neon_store_reg32(TCGv_i32 var, int reg)
-                                    ARMMMUIdx mmu_idx)
++static inline void vfp_store_reg32(TCGv_i32 var, int reg)
  {
-+    uint32_t flags = rebuild_hflags_aprofile(env);
+     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
-     ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
+ }
-     ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
--    uint32_t flags = 0;
+index XXXXXXX..XXXXXXX 100644
-     uint64_t sctlr;
+--- a/target/arm/translate-vfp.c.inc
-     int tbii, tbid;
++++ b/target/arm/translate-vfp.c.inc
+@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
+         frn = tcg_temp_new_i32();
-         }
+         frm = tcg_temp_new_i32();
-     }
+         dest = tcg_temp_new_i32();
+-        neon_load_reg32(frn, rn);
--    if (!arm_feature(env, ARM_FEATURE_M)) {
+-        neon_load_reg32(frm, rm);
--        int target_el = arm_debug_target_el(env);
++        vfp_load_reg32(frn, rn);
--
++        vfp_load_reg32(frm, rm);
--        flags = FIELD_DP32(flags, TBFLAG_ANY, DEBUG_TARGET_EL, target_el);
+         switch (a->cc) {
--    }
+         case 0: /* eq: Z */
--
+             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
-     *pflags = flags;
+@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
-     *cs_base = 0;
+         if (sz == 1) {
              tcg_gen_andi_i32(dest, dest, 0xffff);
          }
 -        neon_store_reg32(dest, rd);
 +        vfp_store_reg32(dest, rd);
          tcg_temp_free_i32(frn);
          tcg_temp_free_i32(frm);
          tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i32 tcg_res;
          tcg_op = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_op, rm);
 +        vfp_load_reg32(tcg_op, rm);
          if (sz == 1) {
              gen_helper_rinth(tcg_res, tcg_op, fpst);
          } else {
              gen_helper_rints(tcg_res, tcg_op, fpst);
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_op);
          tcg_temp_free_i32(tcg_res);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
              gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
          }
          tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
 -        neon_store_reg32(tcg_tmp, rd);
 +        vfp_store_reg32(tcg_tmp, rd);
          tcg_temp_free_i32(tcg_tmp);
          tcg_temp_free_i64(tcg_res);
          tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          TCGv_i32 tcg_single, tcg_res;
          tcg_single = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_single, rm);
 +        vfp_load_reg32(tcg_single, rm);
          if (sz == 1) {
              if (is_signed) {
                  gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                  gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
              }
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_res);
          tcg_temp_free_i32(tcg_single);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
      if (a->l) {
          /* VFP to general purpose register */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vn);
 +        vfp_load_reg32(tmp, a->vn);
          tcg_gen_andi_i32(tmp, tmp, 0xffff);
          store_reg(s, a->rt, tmp);
      } else {
          /* general purpose register to VFP */
          tmp = load_reg(s, a->rt);
          tcg_gen_andi_i32(tmp, tmp, 0xffff);
 -        neon_store_reg32(tmp, a->vn);
 +        vfp_store_reg32(tmp, a->vn);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
      if (a->l) {
          /* VFP to general purpose register */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vn);
 +        vfp_load_reg32(tmp, a->vn);
          if (a->rt == 15) {
              /* Set the 4 flag bits in the CPSR.  */
              gen_set_nzcv(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
      } else {
          /* general purpose register to VFP */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vn);
 +        vfp_store_reg32(tmp, a->vn);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
      if (a->op) {
          /* fpreg to gpreg */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm);
 +        vfp_load_reg32(tmp, a->vm);
          store_reg(s, a->rt, tmp);
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm + 1);
 +        vfp_load_reg32(tmp, a->vm + 1);
          store_reg(s, a->rt2, tmp);
      } else {
          /* gpreg to fpreg */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vm);
 +        vfp_store_reg32(tmp, a->vm);
          tcg_temp_free_i32(tmp);
          tmp = load_reg(s, a->rt2);
 -        neon_store_reg32(tmp, a->vm + 1);
 +        vfp_store_reg32(tmp, a->vm + 1);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
      if (a->op) {
          /* fpreg to gpreg */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm * 2);
 +        vfp_load_reg32(tmp, a->vm * 2);
          store_reg(s, a->rt, tmp);
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm * 2 + 1);
 +        vfp_load_reg32(tmp, a->vm * 2 + 1);
          store_reg(s, a->rt2, tmp);
      } else {
          /* gpreg to fpreg */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vm * 2);
 +        vfp_store_reg32(tmp, a->vm * 2);
          tcg_temp_free_i32(tmp);
          tmp = load_reg(s, a->rt2);
 -        neon_store_reg32(tmp, a->vm * 2 + 1);
 +        vfp_store_reg32(tmp, a->vm * 2 + 1);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      tmp = tcg_temp_new_i32();
      if (a->l) {
          gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg32(tmp, a->vd);
 +        vfp_store_reg32(tmp, a->vd);
      } else {
 -        neon_load_reg32(tmp, a->vd);
 +        vfp_load_reg32(tmp, a->vd);
          gen_aa32_st16(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      tmp = tcg_temp_new_i32();
      if (a->l) {
          gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg32(tmp, a->vd);
 +        vfp_store_reg32(tmp, a->vd);
      } else {
 -        neon_load_reg32(tmp, a->vd);
 +        vfp_load_reg32(tmp, a->vd);
          gen_aa32_st32(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
          if (a->l) {
              /* load */
              gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
 -            neon_store_reg32(tmp, a->vd + i);
 +            vfp_store_reg32(tmp, a->vd + i);
          } else {
              /* store */
 -            neon_load_reg32(tmp, a->vd + i);
 +            vfp_load_reg32(tmp, a->vd + i);
              gen_aa32_st32(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
      fd = tcg_temp_new_i32();
      fpst = fpstatus_ptr(FPST_FPCR);
 -    neon_load_reg32(f0, vn);
 -    neon_load_reg32(f1, vm);
 +    vfp_load_reg32(f0, vn);
 +    vfp_load_reg32(f1, vm);
      for (;;) {
          if (reads_vd) {
 -            neon_load_reg32(fd, vd);
 +            vfp_load_reg32(fd, vd);
          }
          fn(fd, f0, f1, fpst);
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
          veclen--;
          vd = vfp_advance_sreg(vd, delta_d);
          vn = vfp_advance_sreg(vn, delta_d);
 -        neon_load_reg32(f0, vn);
 +        vfp_load_reg32(f0, vn);
          if (delta_m) {
              vm = vfp_advance_sreg(vm, delta_m);
 -            neon_load_reg32(f1, vm);
 +            vfp_load_reg32(f1, vm);
          }
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
      fd = tcg_temp_new_i32();
      fpst = fpstatus_ptr(FPST_FPCR_F16);
 -    neon_load_reg32(f0, vn);
 -    neon_load_reg32(f1, vm);
 +    vfp_load_reg32(f0, vn);
 +    vfp_load_reg32(f1, vm);
      if (reads_vd) {
 -        neon_load_reg32(fd, vd);
 +        vfp_load_reg32(fd, vd);
      }
      fn(fd, f0, f1, fpst);
 -    neon_store_reg32(fd, vd);
 +    vfp_store_reg32(fd, vd);
      tcg_temp_free_i32(f0);
      tcg_temp_free_i32(f1);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      f0 = tcg_temp_new_i32();
      fd = tcg_temp_new_i32();
 -    neon_load_reg32(f0, vm);
 +    vfp_load_reg32(f0, vm);
      for (;;) {
          fn(fd, f0);
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
              /* single source one-many */
              while (veclen--) {
                  vd = vfp_advance_sreg(vd, delta_d);
 -                neon_store_reg32(fd, vd);
 +                vfp_store_reg32(fd, vd);
              }
              break;
          }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
          veclen--;
          vd = vfp_advance_sreg(vd, delta_d);
          vm = vfp_advance_sreg(vm, delta_m);
 -        neon_load_reg32(f0, vm);
 +        vfp_load_reg32(f0, vm);
      }
      tcg_temp_free_i32(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      }
      f0 = tcg_temp_new_i32();
 -    neon_load_reg32(f0, vm);
 +    vfp_load_reg32(f0, vm);
      fn(f0, f0);
 -    neon_store_reg32(f0, vd);
 +    vfp_store_reg32(f0, vd);
      tcg_temp_free_i32(f0);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vn, a->vn);
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vn, a->vn);
 +    vfp_load_reg32(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negh(vn, vn);
      }
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negh(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vn, a->vn);
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vn, a->vn);
 +    vfp_load_reg32(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negs(vn, vn);
      }
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negs(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
      }
      fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
 -    neon_store_reg32(fd, a->vd);
 +    vfp_store_reg32(fd, a->vd);
      tcg_temp_free_i32(fd);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
      fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
      for (;;) {
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i32(vm, 0);
      } else {
 -        neon_load_reg32(vm, a->vm);
 +        vfp_load_reg32(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i32(vm, 0);
      } else {
 -        neon_load_reg32(vm, a->vm);
 +        vfp_load_reg32(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
      /* The T bit tells us if we want the low or high 16 bits of Vm */
      tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_i32(ahp_mode);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
      ahp_mode = get_ahp_flag();
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
      tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
      tcg_temp_free_i32(ahp_mode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_rinth(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rinth(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rints(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_rinth_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
      neon_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vm = tcg_temp_new_i64();
      neon_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      if (a->s) {
          /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f16 */
          gen_helper_vfp_uitoh(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f32 */
          gen_helper_vfp_uitos(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vd = tcg_temp_new_i32();
      neon_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_i32(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      if (a->s) {
          if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
              gen_helper_vfp_touih(vm, vm, fpst);
          }
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR);
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      if (a->s) {
          if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
              gen_helper_vfp_touis(vm, vm, fpst);
          }
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
              gen_helper_vfp_touid(vd, vm, fpst);
          }
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
      /* Insert low half of Vm into high half of Vd */
      rm = tcg_temp_new_i32();
      rd = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 -    neon_load_reg32(rd, a->vd);
 +    vfp_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rd, a->vd);
      tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
 -    neon_store_reg32(rd, a->vd);
 +    vfp_store_reg32(rd, a->vd);
      tcg_temp_free_i32(rm);
      tcg_temp_free_i32(rd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
      /* Set Vd to high half of Vm */
      rm = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rm, a->vm);
      tcg_gen_shri_i32(rm, rm, 16);
 -    neon_store_reg32(rm, a->vd);
 +    vfp_store_reg32(rm, a->vd);
      tcg_temp_free_i32(rm);
      return true;
  }
 --
 .20.1
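
For readers following the trans_VINS and trans_VMOVX hunks above, here is a minimal plain-C sketch of what the generated TCG ops compute on 32-bit values. It is not QEMU code: deposit32_model, vins_model and vmovx_model are hypothetical names introduced for illustration, and the model assumes that tcg_gen_deposit_i32(rd, rd, rm, 16, 16) copies the low 16 bits of rm into bits [31:16] of rd.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Plain-C model of a 32-bit deposit: copy the low `len` bits of `field`
 * into `base` starting at bit `pos`. Hypothetical helper, not a TCG API.
 */
static uint32_t deposit32_model(uint32_t base, uint32_t field,
                                unsigned pos, unsigned len)
{
    uint32_t mask;

    /* Keep the shift amounts in range for a 32-bit value. */
    assert(len > 0 && len < 32 && pos + len <= 32);
    mask = ((UINT32_C(1) << len) - 1) << pos;
    return (base & ~mask) | ((field << pos) & mask);
}

/* VINS: insert the low half of Vm into the high half of Vd. */
static uint32_t vins_model(uint32_t vd, uint32_t vm)
{
    return deposit32_model(vd, vm, 16, 16);
}

/* VMOVX: set Vd to the high half of Vm, zero-extended. */
static uint32_t vmovx_model(uint32_t vm)
{
    return vm >> 16;
}
```

With vd = 0x11112222 and vm = 0x33334444, vins_model() keeps the low half of vd and takes the low half of vm as the new high half, while vmovx_model(vm) yields the old high half of vm.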

-[PULL 44/51] hw/misc/bcm2835_thermal: Add a dummy BCM2835 thermal sensor
+[PULL 08/26] target/arm: Add read/write_neon_element64
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+From: Richard Henderson <richard.henderson@linaro.org>
-We will soon implement the SYS_timer. This timer is used by Linux
+Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.
-in the thermal subsystem, so once available, the subsystem will be
-enabled and poll the temperature sensors. We need to provide the
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-minimum required to keep Linux booting.
+Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Add a dummy thermal sensor returning ~25°C based on:
 https://github.com/raspberrypi/linux/blob/rpi-5.3.y/drivers/thermal/broadcom/bcm2835_thermal.c
 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Message-id: 20191019234715.25750-2-f4bug@amsat.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/misc/Makefile.objs             |   1 +
+ target/arm/translate.c          | 26 +++++++++
- include/hw/misc/bcm2835_thermal.h |  27 ++++++
+ target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
- hw/misc/bcm2835_thermal.c         | 135 ++++++++++++++++++++++++++++++
+files changed, 73 insertions(+), 47 deletions(-)
-files changed, 163 insertions(+)
- create mode 100644 include/hw/misc/bcm2835_thermal.h
+diff --git a/target/arm/translate.c b/target/arm/translate.c
  create mode 100644 hw/misc/bcm2835_thermal.c
 diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/Makefile.objs
+--- a/target/arm/translate.c
-+++ b/hw/misc/Makefile.objs
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ common-obj-$(CONFIG_OMAP) += omap_tap.o
+@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
- common-obj-$(CONFIG_RASPI) += bcm2835_mbox.o
+     }
- common-obj-$(CONFIG_RASPI) += bcm2835_property.o
+ }
- common-obj-$(CONFIG_RASPI) += bcm2835_rng.o
-+common-obj-$(CONFIG_RASPI) += bcm2835_thermal.o
++static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
- common-obj-$(CONFIG_SLAVIO) += slavio_misc.o
++{
- common-obj-$(CONFIG_ZYNQ) += zynq_slcr.o
++    long off = neon_element_offset(reg, ele, memop);
  common-obj-$(CONFIG_ZYNQ) += zynq-xadc.o
 diff --git a/include/hw/misc/bcm2835_thermal.h b/include/hw/misc/bcm2835_thermal.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/misc/bcm2835_thermal.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * BCM2835 dummy thermal sensor
 + *
 + * Copyright (C) 2019 Philippe Mathieu-Daudé <f4bug@amsat.org>
 + *
 + * SPDX-License-Identifier: GPL-2.0-or-later
 + */
 +
-+#ifndef HW_MISC_BCM2835_THERMAL_H
++    switch (memop) {
-+#define HW_MISC_BCM2835_THERMAL_H
++    case MO_Q:
-+
++        tcg_gen_ld_i64(dest, cpu_env, off);
 +#include "hw/sysbus.h"
 +
 +#define TYPE_BCM2835_THERMAL "bcm2835-thermal"
 +
 +#define BCM2835_THERMAL(obj) \
 +    OBJECT_CHECK(Bcm2835ThermalState, (obj), TYPE_BCM2835_THERMAL)
 +
 +typedef struct {
 +    /*< private >*/
 +    SysBusDevice parent_obj;
 +    /*< public >*/
 +    MemoryRegion iomem;
 +    uint32_t ctl;
 +} Bcm2835ThermalState;
 +
 +#endif
 diff --git a/hw/misc/bcm2835_thermal.c b/hw/misc/bcm2835_thermal.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/misc/bcm2835_thermal.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * BCM2835 dummy thermal sensor
 + *
 + * Copyright (C) 2019 Philippe Mathieu-Daudé <f4bug@amsat.org>
 + *
 + * SPDX-License-Identifier: GPL-2.0-or-later
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/log.h"
 +#include "qapi/error.h"
 +#include "hw/misc/bcm2835_thermal.h"
 +#include "hw/registerfields.h"
 +#include "migration/vmstate.h"
 +
 +REG32(CTL, 0)
 +FIELD(CTL, POWER_DOWN, 0, 1)
 +FIELD(CTL, RESET, 1, 1)
 +FIELD(CTL, BANDGAP_CTRL, 2, 3)
 +FIELD(CTL, INTERRUPT_ENABLE, 5, 1)
 +FIELD(CTL, DIRECT, 6, 1)
 +FIELD(CTL, INTERRUPT_CLEAR, 7, 1)
 +FIELD(CTL, HOLD, 8, 10)
 +FIELD(CTL, RESET_DELAY, 18, 8)
 +FIELD(CTL, REGULATOR_ENABLE, 26, 1)
 +
 +REG32(STAT, 4)
 +FIELD(STAT, DATA, 0, 10)
 +FIELD(STAT, VALID, 10, 1)
 +FIELD(STAT, INTERRUPT, 11, 1)
 +
 +#define THERMAL_OFFSET_C 412
 +#define THERMAL_COEFF  (-0.538f)
 +
 +static uint16_t bcm2835_thermal_temp2adc(int temp_C)
 +{
 +    return (temp_C - THERMAL_OFFSET_C) / THERMAL_COEFF;
 +}
 +
 +static uint64_t bcm2835_thermal_read(void *opaque, hwaddr addr, unsigned size)
 +{
 +    Bcm2835ThermalState *s = BCM2835_THERMAL(opaque);
 +    uint32_t val = 0;
 +
 +    switch (addr) {
 +    case A_CTL:
 +        val = s->ctl;
 +        break;
 +    case A_STAT:
 +        /* Temperature is constantly 25°C. */
 +        val = FIELD_DP32(bcm2835_thermal_temp2adc(25), STAT, VALID, true);
 +        break;
 +    default:
-+        /* MemoryRegionOps are aligned, so this can not happen. */
-+        g_assert_not_reached();
-+    }
-+    return val;
-+}
-+
-+static void bcm2835_thermal_write(void *opaque, hwaddr addr,
-+                                  uint64_t value, unsigned size)
-+{
-+    Bcm2835ThermalState *s = BCM2835_THERMAL(opaque);
-+
-+    switch (addr) {
-+    case A_CTL:
-+        s->ctl = value;
-+        break;
-+    case A_STAT:
-+        qemu_log_mask(LOG_GUEST_ERROR, "%s: write 0x%" PRIx64
-+                                       " to 0x%" HWADDR_PRIx "\n",
-+                       __func__, value, addr);
-+        break;
-+    default:
-+        /* MemoryRegionOps are aligned, so this can not happen. */
 +        g_assert_not_reached();
 +    }
 +}
 +
-+static const MemoryRegionOps bcm2835_thermal_ops = {
+ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
-+    .read = bcm2835_thermal_read,
+ {
-+    .write = bcm2835_thermal_write,
+     long off = neon_element_offset(reg, ele, memop);
-+    .impl.max_access_size = 4,
+@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
-+    .valid.min_access_size = 4,
+     }
-+    .endianness = DEVICE_NATIVE_ENDIAN,
+ }
-+};
 +static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
 +{
 +    long off = neon_element_offset(reg, ele, memop);
 +
-+static void bcm2835_thermal_reset(DeviceState *dev)
++    switch (memop) {
-+{
++    case MO_64:
-+    Bcm2835ThermalState *s = BCM2835_THERMAL(dev);
++        tcg_gen_st_i64(src, cpu_env, off);
-+
++        break;
-+    s->ctl = 0;
++    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
-+static void bcm2835_thermal_realize(DeviceState *dev, Error **errp)
+ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
-+{
+ {
-+    Bcm2835ThermalState *s = BCM2835_THERMAL(dev);
+     TCGv_ptr ret = tcg_temp_new_ptr();
-+
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
-+    memory_region_init_io(&s->iomem, OBJECT(s), &bcm2835_thermal_ops,
+index XXXXXXX..XXXXXXX 100644
-+                          s, TYPE_BCM2835_THERMAL, 8);
+--- a/target/arm/translate-neon.c.inc
-+    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem);
++++ b/target/arm/translate-neon.c.inc
-+}
+@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
-+
+     for (pass = 0; pass < a->q + 1; pass++) {
-+static const VMStateDescription bcm2835_thermal_vmstate = {
+         TCGv_i64 tmp = tcg_temp_new_i64();
-+    .name = "bcm2835_thermal",
-+    .version_id = 1,
+-        neon_load_reg64(tmp, a->vm + pass);
-+    .minimum_version_id = 1,
++        read_neon_element64(tmp, a->vm, pass, MO_64);
-+    .fields = (VMStateField[]) {
+         fn(tmp, cpu_env, tmp, constimm);
-+        VMSTATE_UINT32(ctl, Bcm2835ThermalState),
+-        neon_store_reg64(tmp, a->vd + pass);
-+        VMSTATE_END_OF_LIST()
++        write_neon_element64(tmp, a->vd, pass, MO_64);
-+    }
+         tcg_temp_free_i64(tmp);
-+};
+     }
-+
+     tcg_temp_free_i64(constimm);
-+static void bcm2835_thermal_class_init(ObjectClass *klass, void *data)
+@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
-+{
+     rd = tcg_temp_new_i32();
-+    DeviceClass *dc = DEVICE_CLASS(klass);
-+
+     /* Load both inputs first to avoid potential overwrite if rm == rd */
-+    dc->realize = bcm2835_thermal_realize;
+-    neon_load_reg64(rm1, a->vm);
-+    dc->reset = bcm2835_thermal_reset;
+-    neon_load_reg64(rm2, a->vm + 1);
-+    dc->vmsd = &bcm2835_thermal_vmstate;
++    read_neon_element64(rm1, a->vm, 0, MO_64);
-+}
++    read_neon_element64(rm2, a->vm, 1, MO_64);
-+
-+static const TypeInfo bcm2835_thermal_info = {
+     shiftfn(rm1, rm1, constimm);
-+    .name = TYPE_BCM2835_THERMAL,
+     narrowfn(rd, cpu_env, rm1);
-+    .parent = TYPE_SYS_BUS_DEVICE,
+@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
-+    .instance_size = sizeof(Bcm2835ThermalState),
+         tcg_gen_shli_i64(tmp, tmp, a->shift);
-+    .class_init = bcm2835_thermal_class_init,
+         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
-+};
+     }
-+
+-    neon_store_reg64(tmp, a->vd);
-+static void bcm2835_thermal_register_types(void)
++    write_neon_element64(tmp, a->vd, 0, MO_64);
-+{
-+    type_register_static(&bcm2835_thermal_info);
+     widenfn(tmp, rm1);
-+}
+     tcg_temp_free_i32(rm1);
-+
+@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
-+type_init(bcm2835_thermal_register_types)
+         tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
      }
 -    neon_store_reg64(tmp, a->vd + 1);
 +    write_neon_element64(tmp, a->vd, 1, MO_64);
      tcg_temp_free_i64(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm_64 = tcg_temp_new_i64();
      if (src1_wide) {
 -        neon_load_reg64(rn0_64, a->vn);
 +        read_neon_element64(rn0_64, a->vn, 0, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
       * avoid incorrect results if a narrow input overlaps with the result.
       */
      if (src1_wide) {
 -        neon_load_reg64(rn1_64, a->vn + 1);
 +        read_neon_element64(rn1_64, a->vn, 1, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm = tcg_temp_new_i32();
      read_neon_element32(rm, a->vm, 1, MO_32);
 -    neon_store_reg64(rn0_64, a->vd);
 +    write_neon_element64(rn0_64, a->vd, 0, MO_64);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
 -    neon_store_reg64(rn1_64, a->vd + 1);
 +    write_neon_element64(rn1_64, a->vd, 1, MO_64);
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rn_64, a->vn);
 -    neon_load_reg64(rm_64, a->vm);
 +    read_neon_element64(rn_64, a->vn, 0, MO_64);
 +    read_neon_element64(rm_64, a->vm, 0, MO_64);
      opfn(rn_64, rn_64, rm_64);
      narrowfn(rd0, rn_64);
 -    neon_load_reg64(rn_64, a->vn + 1);
 -    neon_load_reg64(rm_64, a->vm + 1);
 +    read_neon_element64(rn_64, a->vn, 1, MO_64);
 +    read_neon_element64(rm_64, a->vm, 1, MO_64);
      opfn(rn_64, rn_64, rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      /* Don't store results until after all loads: they might overlap */
      if (accfn) {
          tmp = tcg_temp_new_i64();
 -        neon_load_reg64(tmp, a->vd);
 +        read_neon_element64(tmp, a->vd, 0, MO_64);
          accfn(tmp, tmp, rd0);
 -        neon_store_reg64(tmp, a->vd);
 -        neon_load_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 0, MO_64);
 +        read_neon_element64(tmp, a->vd, 1, MO_64);
          accfn(tmp, tmp, rd1);
 -        neon_store_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 1, MO_64);
          tcg_temp_free_i64(tmp);
      } else {
 -        neon_store_reg64(rd0, a->vd);
 -        neon_store_reg64(rd1, a->vd + 1);
 +        write_neon_element64(rd0, a->vd, 0, MO_64);
 +        write_neon_element64(rd1, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      if (accfn) {
          TCGv_i64 t64 = tcg_temp_new_i64();
 -        neon_load_reg64(t64, a->vd);
 +        read_neon_element64(t64, a->vd, 0, MO_64);
          accfn(t64, t64, rn0_64);
 -        neon_store_reg64(t64, a->vd);
 -        neon_load_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 0, MO_64);
 +        read_neon_element64(t64, a->vd, 1, MO_64);
          accfn(t64, t64, rn1_64);
 -        neon_store_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 1, MO_64);
          tcg_temp_free_i64(t64);
      } else {
 -        neon_store_reg64(rn0_64, a->vd);
 -        neon_store_reg64(rn1_64, a->vd + 1);
 +        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 +        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          right = tcg_temp_new_i64();
          dest = tcg_temp_new_i64();
 -        neon_load_reg64(right, a->vn);
 -        neon_load_reg64(left, a->vm);
 +        read_neon_element64(right, a->vn, 0, MO_64);
 +        read_neon_element64(left, a->vm, 0, MO_64);
          tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
 -        neon_store_reg64(dest, a->vd);
 +        write_neon_element64(dest, a->vd, 0, MO_64);
          tcg_temp_free_i64(left);
          tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          destright = tcg_temp_new_i64();
          if (a->imm < 8) {
 -            neon_load_reg64(right, a->vn);
 -            neon_load_reg64(middle, a->vn + 1);
 +            read_neon_element64(right, a->vn, 0, MO_64);
 +            read_neon_element64(middle, a->vn, 1, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
 -            neon_load_reg64(left, a->vm);
 +            read_neon_element64(left, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
          } else {
 -            neon_load_reg64(right, a->vn + 1);
 -            neon_load_reg64(middle, a->vm);
 +            read_neon_element64(right, a->vn, 1, MO_64);
 +            read_neon_element64(middle, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
 -            neon_load_reg64(left, a->vm + 1);
 +            read_neon_element64(left, a->vm, 1, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
          }
 -        neon_store_reg64(destright, a->vd);
 -        neon_store_reg64(destleft, a->vd + 1);
 +        write_neon_element64(destright, a->vd, 0, MO_64);
 +        write_neon_element64(destleft, a->vd, 1, MO_64);
          tcg_temp_free_i64(destright);
          tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          if (accfn) {
              TCGv_i64 tmp64 = tcg_temp_new_i64();
 -            neon_load_reg64(tmp64, a->vd + pass);
 +            read_neon_element64(tmp64, a->vd, pass, MO_64);
              accfn(rd_64, tmp64, rd_64);
              tcg_temp_free_i64(tmp64);
          }
 -        neon_store_reg64(rd_64, a->vd + pass);
 +        write_neon_element64(rd_64, a->vd, pass, MO_64);
          tcg_temp_free_i64(rd_64);
      }
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rm, a->vm);
 +    read_neon_element64(rm, a->vm, 0, MO_64);
      narrowfn(rd0, cpu_env, rm);
 -    neon_load_reg64(rm, a->vm + 1);
 +    read_neon_element64(rm, a->vm, 1, MO_64);
      narrowfn(rd1, cpu_env, rm);
      write_neon_element32(rd0, a->vd, 0, MO_32);
      write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd);
 +    write_neon_element64(rd, a->vd, 0, MO_64);
      widenfn(rd, rm1);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd + 1);
 +    write_neon_element64(rd, a->vd, 1, MO_64);
      tcg_temp_free_i64(rd);
      tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
      rm = tcg_temp_new_i64();
      rd = tcg_temp_new_i64();
      for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        neon_load_reg64(rm, a->vm + pass);
 -        neon_load_reg64(rd, a->vd + pass);
 -        neon_store_reg64(rm, a->vd + pass);
 -        neon_store_reg64(rd, a->vm + pass);
 +        read_neon_element64(rm, a->vm, pass, MO_64);
 +        read_neon_element64(rd, a->vd, pass, MO_64);
 +        write_neon_element64(rm, a->vd, pass, MO_64);
 +        write_neon_element64(rd, a->vm, pass, MO_64);
      }
      tcg_temp_free_i64(rm);
      tcg_temp_free_i64(rd);
 --
 .20.1
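
The conversion in the patch above replaces "D register number plus pass" addressing with "(register, element)" addressing. The sketch below models why the two forms can name the same storage. It is a simplified standalone model, not QEMU's neon_element_offset(): the layout (two 8-byte D halves per Q register, with a hypothetical padded stride of 32 bytes) and all function names are assumptions made for illustration, and the equivalence is only claimed for even register numbers, as would be the case for Q-register operands.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: Q register n starts at n * QREG_STRIDE bytes. */
enum { QREG_STRIDE = 32, DREG_BYTES = 8 };

/* Old style: address a D register directly by its number. */
static size_t dreg_offset_model(int dreg)
{
    return (size_t)(dreg >> 1) * QREG_STRIDE + (size_t)(dreg & 1) * DREG_BYTES;
}

/* New style: address 64-bit element `ele`, counted from register `reg`. */
static size_t element64_offset_model(int reg, int ele)
{
    /* Only the two halves of one Q register are modelled here. */
    assert(ele == 0 || ele == 1);
    return dreg_offset_model(reg) + (size_t)ele * DREG_BYTES;
}
```

In this model, element64_offset_model(vm, pass) matches dreg_offset_model(vm + pass) whenever vm is even. For an odd vm with pass == 1 the two differ, because the element form steps within the same padded register instead of moving on to the next one.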

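The removed lines in the comparison above come from the older bcm2835_thermal patch, whose STAT read returns a fixed 25°C reading. As a worked example of that arithmetic, the sketch below repeats the conversion in standalone C. The two constants are copied from the patch; STAT_VALID_BIT, temp2adc_model and stat_value_model are hypothetical names, and the VALID flag is assumed to be bit 10 as declared by FIELD(STAT, VALID, 10, 1).

```c
#include <assert.h>
#include <stdint.h>

#define THERMAL_OFFSET_C 412
#define THERMAL_COEFF  (-0.538f)

/* Assumed position of the VALID flag: bit 10 of STAT. */
#define STAT_VALID_BIT (UINT32_C(1) << 10)

/* Convert a temperature in degrees Celsius to a raw ADC code. */
static uint16_t temp2adc_model(int temp_C)
{
    /* Above the offset the quotient would be negative; reject it here. */
    assert(temp_C <= THERMAL_OFFSET_C);
    return (uint16_t)((temp_C - THERMAL_OFFSET_C) / THERMAL_COEFF);
}

/* Build the STAT value: ADC code in the low bits plus the VALID flag. */
static uint32_t stat_value_model(int temp_C)
{
    return temp2adc_model(temp_C) | STAT_VALID_BIT;
}
```

For 25°C this gives (25 - 412) / -0.538, roughly 719.3, which truncates to an ADC code of 719 (0x2CF). Setting bit 10 on top of that gives 0x6CF.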
-[PULL 21/51] target/arm: Rebuild hflags at Xscale SCTLR writes
+[PULL 09/26] target/arm: Rename neon_load_reg64 to vfp_load_reg64
 From: Richard Henderson <richard.henderson@linaro.org>
-Continue setting, but not relying upon, env->hflags.
+The only uses of this function are for loading VFP
 double-precision values, and nothing to do with NEON.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-20-richard.henderson@linaro.org
+Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 10 ++++++++++
+ target/arm/translate.c         |  8 ++--
-file changed, 10 insertions(+)
+ target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
+files changed, 46 insertions(+), 46 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/translate.c
-+++ b/target/arm/helper.c
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static void sctlr_write(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
-     /* ??? Lots of these bits are not implemented.  */
+     }
      /* This may enable/disable the MMU, so do a TLB flush.  */
      tlb_flush(CPU(cpu));
 +
 +    if (ri->type & ARM_CP_SUPPRESS_TB_END) {
 +        /*
 +         * Normally we would always end the TB on an SCTLR write; see the
 +         * comment in ARMCPRegInfo sctlr initialization below for why Xscale
 +         * is special.  Setting ARM_CP_SUPPRESS_TB_END also stops the rebuild
 +         * of hflags from the translator, so do it here.
 +         */
 +        arm_rebuild_hflags(env);
 +    }
  }
- static CPAccessResult fpexc32_access(CPUARMState *env, const ARMCPRegInfo *ri,
+-static inline void neon_load_reg64(TCGv_i64 var, int reg)
 +static inline void vfp_load_reg64(TCGv_i64 var, int reg)
  {
 -    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
 +    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
  }
 -static inline void neon_store_reg64(TCGv_i64 var, int reg)
 +static inline void vfp_store_reg64(TCGv_i64 var, int reg)
  {
 -    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
 +    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
  }
  static inline void vfp_load_reg32(TCGv_i32 var, int reg)
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          tcg_gen_ext_i32_i64(nf, cpu_NF);
          tcg_gen_ext_i32_i64(vf, cpu_VF);
 -        neon_load_reg64(frn, rn);
 -        neon_load_reg64(frm, rm);
 +        vfp_load_reg64(frn, rn);
 +        vfp_load_reg64(frm, rm);
          switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
              tcg_temp_free_i64(tmp);
              break;
          }
 -        neon_store_reg64(dest, rd);
 +        vfp_store_reg64(dest, rd);
          tcg_temp_free_i64(frn);
          tcg_temp_free_i64(frm);
          tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i64 tcg_res;
          tcg_op = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
 -        neon_load_reg64(tcg_op, rm);
 +        vfp_load_reg64(tcg_op, rm);
          gen_helper_rintd(tcg_res, tcg_op, fpst);
 -        neon_store_reg64(tcg_res, rd);
 +        vfp_store_reg64(tcg_res, rd);
          tcg_temp_free_i64(tcg_op);
          tcg_temp_free_i64(tcg_res);
      } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
          tcg_double = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
          tcg_tmp = tcg_temp_new_i32();
 -        neon_load_reg64(tcg_double, rm);
 +        vfp_load_reg64(tcg_double, rm);
          if (is_signed) {
              gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
          } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
      tmp = tcg_temp_new_i64();
      if (a->l) {
          gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg64(tmp, a->vd);
 +        vfp_store_reg64(tmp, a->vd);
      } else {
 -        neon_load_reg64(tmp, a->vd);
 +        vfp_load_reg64(tmp, a->vd);
          gen_aa32_st64(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i64(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
          if (a->l) {
              /* load */
              gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
 -            neon_store_reg64(tmp, a->vd + i);
 +            vfp_store_reg64(tmp, a->vd + i);
          } else {
              /* store */
 -            neon_load_reg64(tmp, a->vd + i);
 +            vfp_load_reg64(tmp, a->vd + i);
              gen_aa32_st64(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
      fd = tcg_temp_new_i64();
      fpst = fpstatus_ptr(FPST_FPCR);
 -    neon_load_reg64(f0, vn);
 -    neon_load_reg64(f1, vm);
 +    vfp_load_reg64(f0, vn);
 +    vfp_load_reg64(f1, vm);
      for (;;) {
          if (reads_vd) {
 -            neon_load_reg64(fd, vd);
 +            vfp_load_reg64(fd, vd);
          }
          fn(fd, f0, f1, fpst);
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
          veclen--;
          vd = vfp_advance_dreg(vd, delta_d);
          vn = vfp_advance_dreg(vn, delta_d);
 -        neon_load_reg64(f0, vn);
 +        vfp_load_reg64(f0, vn);
          if (delta_m) {
              vm = vfp_advance_dreg(vm, delta_m);
 -            neon_load_reg64(f1, vm);
 +            vfp_load_reg64(f1, vm);
          }
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
      f0 = tcg_temp_new_i64();
      fd = tcg_temp_new_i64();
 -    neon_load_reg64(f0, vm);
 +    vfp_load_reg64(f0, vm);
      for (;;) {
          fn(fd, f0);
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
              /* single source one-many */
              while (veclen--) {
                  vd = vfp_advance_dreg(vd, delta_d);
 -                neon_store_reg64(fd, vd);
 +                vfp_store_reg64(fd, vd);
              }
              break;
          }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
          veclen--;
          vd = vfp_advance_dreg(vd, delta_d);
          vd = vfp_advance_dreg(vm, delta_m);
 -        neon_load_reg64(f0, vm);
 +        vfp_load_reg64(f0, vm);
      }
      tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vn, a->vn);
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vn, a->vn);
 +    vfp_load_reg64(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negd(vn, vn);
      }
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negd(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
      fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
      for (;;) {
 -        neon_store_reg64(fd, vd);
 +        vfp_store_reg64(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
      vd = tcg_temp_new_i64();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i64(vm, 0);
      } else {
 -        neon_load_reg64(vm, a->vm);
 +        vfp_load_reg64(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
      tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
      vd = tcg_temp_new_i64();
      gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(ahp_mode);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
      tmp = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
      tcg_temp_free_i64(vm);
      tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rintd(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
      }
      tmp = tcg_temp_new_i64();
 -    neon_load_reg64(tmp, a->vm);
 +    vfp_load_reg64(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rintd_exact(tmp, tmp, fpst);
 -    neon_store_reg64(tmp, a->vd);
 +    vfp_store_reg64(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i64(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vd = tcg_temp_new_i64();
      vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i64();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
          /* u32 -> f64 */
          gen_helper_vfp_uitod(vd, vm, fpst);
      }
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_i64(vd);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i32();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
      vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i64();
 -    neon_load_reg64(vd, a->vd);
 +    vfp_load_reg64(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg64(vd, a->vd);
 +    vfp_store_reg64(vd, a->vd);
      tcg_temp_free_i64(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR);
      vm = tcg_temp_new_i64();
      vd = tcg_temp_new_i32();
 -    neon_load_reg64(vm, a->vm);
 +    vfp_load_reg64(vm, a->vm);
      if (a->s) {
          if (a->rz) {
 --
 .20.1

-[PULL 24/51] linux-user/aarch64: Rebuild hflags for TARGET_WORDS_BIGENDIAN
+[PULL 10/26] target/arm: Simplify do_long_3d and do_2scalar_long
 From: Richard Henderson <richard.henderson@linaro.org>
-Continue setting, but not relying upon, env->hflags.
+In both cases, we can sink the write-back and perform
 the accumulate into the normal destination temps.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-23-richard.henderson@linaro.org
+Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- linux-user/aarch64/cpu_loop.c | 1 +
+ target/arm/translate-neon.c.inc | 23 +++++++++--------------
-file changed, 1 insertion(+)
+file changed, 9 insertions(+), 14 deletions(-)
-diff --git a/linux-user/aarch64/cpu_loop.c b/linux-user/aarch64/cpu_loop.c
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/linux-user/aarch64/cpu_loop.c
+--- a/target/arm/translate-neon.c.inc
-+++ b/linux-user/aarch64/cpu_loop.c
++++ b/target/arm/translate-neon.c.inc
-@@ -XXX,XX +XXX,XX @@ void target_cpu_copy_regs(CPUArchState *env, struct target_pt_regs *regs)
+@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
-     for (i = 1; i < 4; ++i) {
+     if (accfn) {
-         env->cp15.sctlr_el[i] |= SCTLR_EE;
+         tmp = tcg_temp_new_i64();
          read_neon_element64(tmp, a->vd, 0, MO_64);
 -        accfn(tmp, tmp, rd0);
 -        write_neon_element64(tmp, a->vd, 0, MO_64);
 +        accfn(rd0, tmp, rd0);
          read_neon_element64(tmp, a->vd, 1, MO_64);
 -        accfn(tmp, tmp, rd1);
 -        write_neon_element64(tmp, a->vd, 1, MO_64);
 +        accfn(rd1, tmp, rd1);
          tcg_temp_free_i64(tmp);
 -    } else {
 -        write_neon_element64(rd0, a->vd, 0, MO_64);
 -        write_neon_element64(rd1, a->vd, 1, MO_64);
      }
-+    arm_rebuild_hflags(env);
- #endif
++    write_neon_element64(rd0, a->vd, 0, MO_64);
++    write_neon_element64(rd1, a->vd, 1, MO_64);
-     if (cpu_isar_feature(aa64_pauth, cpu)) {
+     tcg_temp_free_i64(rd0);
      tcg_temp_free_i64(rd1);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      if (accfn) {
          TCGv_i64 t64 = tcg_temp_new_i64();
          read_neon_element64(t64, a->vd, 0, MO_64);
 -        accfn(t64, t64, rn0_64);
 -        write_neon_element64(t64, a->vd, 0, MO_64);
 +        accfn(rn0_64, t64, rn0_64);
          read_neon_element64(t64, a->vd, 1, MO_64);
 -        accfn(t64, t64, rn1_64);
 -        write_neon_element64(t64, a->vd, 1, MO_64);
 +        accfn(rn1_64, t64, rn1_64);
          tcg_temp_free_i64(t64);
 -    } else {
 -        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 -        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
 +
 +    write_neon_element64(rn0_64, a->vd, 0, MO_64);
 +    write_neon_element64(rn1_64, a->vd, 1, MO_64);
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
      return true;
 --
 .20.1

-[PULL 13/51] target/arm: Hoist computation of TBFLAG_A32.VFPEN
+[PULL 11/26] target/arm: Improve do_prewiden_3d
 From: Richard Henderson <richard.henderson@linaro.org>
-There are 3 conditions that each enable this flag.  M-profile always
+We can use proper widening loads to extend 32-bit inputs,
-enables; A-profile with EL1 as AA64 always enables.  Both of these
+and skip the "widenfn" step.
 conditions can easily be cached.  The final condition relies on the
 FPEXC register which we are not prepared to cache.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-12-richard.henderson@linaro.org
+Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.h    |  2 +-
+ target/arm/translate.c          |  6 +++
- target/arm/helper.c | 14 ++++++++++----
+ target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
-files changed, 11 insertions(+), 5 deletions(-)
+files changed, 43 insertions(+), 29 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
+--- a/target/arm/translate.c
-+++ b/target/arm/cpu.h
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
+@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
-  * the same thing as the current security state of the processor!
+     long off = neon_element_offset(reg, ele, memop);
-  */
- FIELD(TBFLAG_A32, NS, 6, 1)
+     switch (memop) {
--FIELD(TBFLAG_A32, VFPEN, 7, 1)          /* Not cached. */
++    case MO_SL:
-+FIELD(TBFLAG_A32, VFPEN, 7, 1)          /* Partially cached, minus FPEXC. */
++        tcg_gen_ld32s_i64(dest, cpu_env, off);
- FIELD(TBFLAG_A32, CONDEXEC, 8, 8)       /* Not cached. */
++        break;
- FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
++    case MO_UL:
- /* For M profile only, set if FPCCR.LSPACT is set */
++        tcg_gen_ld32u_i64(dest, cpu_env, off);
-diff --git a/target/arm/helper.c b/target/arm/helper.c
++        break;
      case MO_Q:
          tcg_gen_ld_i64(dest, cpu_env, off);
          break;
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/translate-neon.c.inc
-+++ b/target/arm/helper.c
++++ b/target/arm/translate-neon.c.inc
-@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
+@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
  static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
                             NeonGenWidenFn *widenfn,
                             NeonGenTwo64OpFn *opfn,
 -                           bool src1_wide)
 +                           int src1_mop, int src2_mop)
  {
-     uint32_t flags = 0;
+     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
+     TCGv_i64 rn0_64, rn1_64, rm_64;
-+    /* v8M always enables the fpu.  */
+-    TCGv_i32 rm;
-+    flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
-+
+     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-     if (arm_v7m_is_handler_mode(env)) {
+         return false;
-         flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
+@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
          return false;
      }
-@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_a32(CPUARMState *env, int fp_el,
-                                    ARMMMUIdx mmu_idx)
+-    if (!widenfn || !opfn) {
- {
++    if (!opfn) {
-     uint32_t flags = rebuild_hflags_aprofile(env);
+         /* size == 3 case, which is an entirely different insn group */
-+
+         return false;
-+    if (arm_el_is_aa64(env, 1)) {
+     }
-+        flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
 -    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
 +    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rn1_64 = tcg_temp_new_i64();
      rm_64 = tcg_temp_new_i64();
 -    if (src1_wide) {
 -        read_neon_element64(rn0_64, a->vn, 0, MO_64);
 +    if (src1_mop >= 0) {
 +        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = tcg_temp_new_i32();
 -    read_neon_element32(rm, a->vm, 0, MO_32);
 +    if (src2_mop >= 0) {
 +        read_neon_element64(rm_64, a->vm, 0, src2_mop);
 +    } else {
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, 0, MO_32);
 +        widenfn(rm_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
-     return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 -    widenfn(rm_64, rm);
 -    tcg_temp_free_i32(rm);
      opfn(rn0_64, rn0_64, rm_64);
      /*
       * Load second pass inputs before storing the first pass result, to
       * avoid incorrect results if a narrow input overlaps with the result.
       */
 -    if (src1_wide) {
 -        read_neon_element64(rn1_64, a->vn, 1, MO_64);
 +    if (src1_mop >= 0) {
 +        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = tcg_temp_new_i32();
 -    read_neon_element32(rm, a->vm, 1, MO_32);
 +    if (src2_mop >= 0) {
 +        read_neon_element64(rm_64, a->vm, 1, src2_mop);
 +    } else {
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, 1, MO_32);
 +        widenfn(rm_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
      write_neon_element64(rn0_64, a->vd, 0, MO_64);
 -    widenfn(rm_64, rm);
 -    tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
      write_neon_element64(rn1_64, a->vd, 1, MO_64);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      return true;
  }
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
+-#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
-                 flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE,
++#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
-                                    env->vfp.vec_stride);
+     static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
-             }
+     {                                                                   \
-+            if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)) {
+         static NeonGenWidenFn * const widenfn[] = {                     \
-+                flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
+             gen_helper_neon_widen_##S##8,                               \
-+            }
+             gen_helper_neon_widen_##S##16,                              \
-         }
+-            tcg_gen_##EXT##_i32_i64,                                    \
+-            NULL,                                                       \
-         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
++            NULL, NULL,                                                 \
-         flags = FIELD_DP32(flags, TBFLAG_A32, CONDEXEC, env->condexec_bits);
+         };                                                              \
--        if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)
+         static NeonGenTwo64OpFn * const addfn[] = {                     \
--            || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
+             gen_helper_neon_##OP##l_u16,                                \
--            flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
+@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
--        }
+             tcg_gen_##OP##_i64,                                         \
-         pstate_for_ss = env->uncached_cpsr;
+             NULL,                                                       \
          };                                                              \
 -        return do_prewiden_3d(s, a, widenfn[a->size],                   \
 -                              addfn[a->size], SRC1WIDE);                \
 +        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
 +        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
 +                              SRC1WIDE ? MO_Q : narrow_mop,             \
 +                              narrow_mop);                              \
      }
+-DO_PREWIDEN(VADDL_S, s, ext, add, false)
+-DO_PREWIDEN(VADDL_U, u, extu, add, false)
+-DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
+-DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
+-DO_PREWIDEN(VADDW_S, s, ext, add, true)
+-DO_PREWIDEN(VADDW_U, u, extu, add, true)
+-DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+-DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
++DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
++DO_PREWIDEN(VADDL_U, u, add, false, 0)
++DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
++DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
++DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
++DO_PREWIDEN(VADDW_U, u, add, true, 0)
++DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
++DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
+ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
+                          NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
 --
 .20.1
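
Reviewer note on 11/26: the difference between the sign-extending and zero-extending 32-bit loads that replace the widenfn step can be shown in plain C. This is a minimal sketch, not QEMU code. `demo_mop` and `demo_load_widen32` are hypothetical stand-ins for the MO_SL/MO_UL cases, and the host-endian element offset handling done by neon_element_offset() is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the unsigned and signed 32-bit MemOps. */
enum demo_mop { DEMO_MO_UL, DEMO_MO_SL };

/* Load one 32-bit element from a register image and widen it to 64 bits. */
static uint64_t demo_load_widen32(const void *reg, int ele, enum demo_mop mop)
{
    uint32_t raw;

    assert(ele >= 0);
    memcpy(&raw, (const uint8_t *)reg + 4 * ele, sizeof(raw));
    if (mop == DEMO_MO_SL) {
        /* Sign-extend: the top bit of the element fills the upper half. */
        return (uint64_t)(int64_t)(int32_t)raw;
    }
    /* Zero-extend: the upper half is always zero. */
    return (uint64_t)raw;
}
```

With an element of 0x80000000 the signed form yields 0xffffffff80000000 and the unsigned form yields 0x0000000080000000, which is the distinction the SIGN macro argument selects.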

-[PULL 28/51] hw/timer/xilinx_timer.c: Switch to transaction-based ptimer API
+[PULL 12/26] target/arm: Fix float16 pairwise Neon ops on big-endian hosts
-Switch the xilinx_timer code away from bottom-half based ptimers to
+In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
-the new transaction-based ptimer API.  This just requires adding
+meant we were using the H4() address swizzler macro rather than the
-begin/commit calls around the various places that modify the ptimer
+H2() which is required for 2-byte data.  This had no effect on
-state, and using the new ptimer_init() function to create the timer.
+little-endian hosts but meant we put the result data into the
 destination Dreg in the wrong order on big-endian hosts.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
+Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
 Message-id: 20191017132122.4402-3-peter.maydell@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/timer/xilinx_timer.c | 13 ++++++++-----
+ target/arm/vec_helper.c | 8 ++++----
-file changed, 8 insertions(+), 5 deletions(-)
+file changed, 4 insertions(+), 4 deletions(-)
-diff --git a/hw/timer/xilinx_timer.c b/hw/timer/xilinx_timer.c
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/xilinx_timer.c
+--- a/target/arm/vec_helper.c
-+++ b/hw/timer/xilinx_timer.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
- #include "hw/ptimer.h"
+         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
- #include "hw/qdev-properties.h"
+         r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
- #include "qemu/log.h"
+                                                                         \
--#include "qemu/main-loop.h"
+-        d[H4(0)] = r0;                                                  \
- #include "qemu/module.h"
+-        d[H4(1)] = r1;                                                  \
+-        d[H4(2)] = r2;                                                  \
- #define D(x)
+-        d[H4(3)] = r3;                                                  \
-@@ -XXX,XX +XXX,XX @@
++        d[H2(0)] = r0;                                                  \
++        d[H2(1)] = r1;                                                  \
- struct xlx_timer
++        d[H2(2)] = r2;                                                  \
- {
++        d[H2(3)] = r3;                                                  \
 -    QEMUBH *bh;
      ptimer_state *ptimer;
      void *parent;
      int nr; /* for debug.  */
@@ -XXX,XX +XXX,XX @@ timer_read(void *opaque, hwaddr addr, unsigned int size)
      return r;
  }
 +/* Must be called inside ptimer transaction block */
  static void timer_enable(struct xlx_timer *xt)
  {
      uint64_t count;
@@ -XXX,XX +XXX,XX @@ timer_write(void *opaque, hwaddr addr,
                  value &= ~TCSR_TINT;
              xt->regs[addr] = value & 0x7ff;
 -            if (value & TCSR_ENT)
 +            if (value & TCSR_ENT) {
 +                ptimer_transaction_begin(xt->ptimer);
                  timer_enable(xt);
 +                ptimer_transaction_commit(xt->ptimer);
 +            }
              break;
          default:
@@ -XXX,XX +XXX,XX @@ static void xilinx_timer_realize(DeviceState *dev, Error **errp)
          xt->parent = t;
          xt->nr = i;
 -        xt->bh = qemu_bh_new(timer_hit, xt);
 -        xt->ptimer = ptimer_init_with_bh(xt->bh, PTIMER_POLICY_DEFAULT);
 +        xt->ptimer = ptimer_init(timer_hit, xt, PTIMER_POLICY_DEFAULT);
 +        ptimer_transaction_begin(xt->ptimer);
          ptimer_set_freq(xt->ptimer, t->freq_hz);
 +        ptimer_transaction_commit(xt->ptimer);
      }
-     memory_region_init_io(&t->mmio, OBJECT(t), &timer_ops, t, "xlnx.xps-timer",
+ DO_NEON_PAIRWISE(neon_padd, add)
 --
 .20.1
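
Reviewer note on 12/26: the effect of using H4() where H2() is needed can be reproduced on any host by simulating a Dreg stored in big-endian byte order. This is a sketch under stated assumptions. The `DEMO_BE_*` macros are hypothetical copies of what the swizzle macros expand to on a big-endian host (on little-endian hosts they are the identity), and the `demo_*` helpers are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copies of the swizzle macros as seen on a big-endian host. */
#define DEMO_BE_H2(x) ((x) ^ 3)   /* index into 16-bit elements */
#define DEMO_BE_H4(x) ((x) ^ 1)   /* index into 32-bit elements */

/* Store a 16-bit value into host slot 'slot' of a big-endian Dreg image. */
static void demo_store16_be(uint8_t *reg, int slot, uint16_t v)
{
    assert(slot >= 0 && slot < 4);
    reg[slot * 2] = (uint8_t)(v >> 8);
    reg[slot * 2 + 1] = (uint8_t)(v & 0xff);
}

/* Read the whole 64-bit register value back from its big-endian image. */
static uint64_t demo_reg_value_be(const uint8_t *reg)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | reg[i];
    }
    return v;
}

/* Correct: 16-bit results are placed using the 2-byte swizzle. */
static uint64_t demo_pack_h2(const uint16_t r[4])
{
    uint8_t reg[8] = { 0 };

    for (int i = 0; i < 4; i++) {
        demo_store16_be(reg, DEMO_BE_H2(i), r[i]);
    }
    return demo_reg_value_be(reg);
}

/* Buggy: the 4-byte swizzle is applied to 16-bit element indexes. */
static uint64_t demo_pack_h4(const uint16_t r[4])
{
    uint8_t reg[8] = { 0 };

    for (int i = 0; i < 4; i++) {
        demo_store16_be(reg, DEMO_BE_H4(i), r[i]);
    }
    return demo_reg_value_be(reg);
}
```

For results {0x1111, 0x2222, 0x3333, 0x4444} the architectural register value is 0x4444333322221111. The H2 variant produces that value, while the H4 variant produces 0x2222111144443333, i.e. the same data in the wrong order.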

-[PULL 32/51] hw/timer/grlib_gptimer.c: Switch to transaction-based ptimer API
+[PULL 13/26] target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
-Switch the grlib_gptimer code away from bottom-half based ptimers to
+The helper functions for performing the udot/sdot operations against
-the new transaction-based ptimer API.  This just requires adding
+a scalar were not using an address-swizzling macro when converting
-begin/commit calls around the various places that modify the ptimer
+the index of the scalar element into a pointer into the vm array.
-state, and using the new ptimer_init() function to create the timer.
+This had no effect on little-endian hosts but meant we generated
 incorrect results on big-endian hosts.
 For these insns, the index selects a group of four 8-bit values,
 so each indexed entity is 32 bits wide and H4() is therefore what we want.
 (For Neon the only possible input indexes are 0 and 1.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20191021134357.14266-3-peter.maydell@linaro.org
+Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/timer/grlib_gptimer.c | 28 ++++++++++++++++++++++++----
+ target/arm/vec_helper.c | 4 ++--
-file changed, 24 insertions(+), 4 deletions(-)
+file changed, 2 insertions(+), 2 deletions(-)
-diff --git a/hw/timer/grlib_gptimer.c b/hw/timer/grlib_gptimer.c
+diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/grlib_gptimer.c
+--- a/target/arm/vec_helper.c
-+++ b/hw/timer/grlib_gptimer.c
++++ b/target/arm/vec_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
- #include "hw/irq.h"
+     intptr_t index = simd_data(desc);
- #include "hw/ptimer.h"
+     uint32_t *d = vd;
- #include "hw/qdev-properties.h"
+     int8_t *n = vn;
--#include "qemu/main-loop.h"
+-    int8_t *m_indexed = (int8_t *)vm + index * 4;
- #include "qemu/module.h"
++    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
- #include "trace.h"
+     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
-@@ -XXX,XX +XXX,XX @@ typedef struct GPTimer     GPTimer;
+      * Otherwise opr_sz is a multiple of 16.
- typedef struct GPTimerUnit GPTimerUnit;
+@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+     intptr_t index = simd_data(desc);
- struct GPTimer {
+     uint32_t *d = vd;
--    QEMUBH *bh;
+     uint8_t *n = vn;
-     struct ptimer_state *ptimer;
+-    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
++    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
-     qemu_irq     irq;
-@@ -XXX,XX +XXX,XX @@ struct GPTimerUnit {
+     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
-     uint32_t config;
+      * Otherwise opr_sz is a multiple of 16.
  };
 +static void grlib_gptimer_tx_begin(GPTimer *timer)
 +{
 +    ptimer_transaction_begin(timer->ptimer);
 +}
 +
 +static void grlib_gptimer_tx_commit(GPTimer *timer)
 +{
 +    ptimer_transaction_commit(timer->ptimer);
 +}
 +
 +/* Must be called within grlib_gptimer_tx_begin/commit block */
  static void grlib_gptimer_enable(GPTimer *timer)
  {
      assert(timer != NULL);
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_enable(GPTimer *timer)
      ptimer_run(timer->ptimer, 1);
  }
 +/* Must be called within grlib_gptimer_tx_begin/commit block */
  static void grlib_gptimer_restart(GPTimer *timer)
  {
      assert(timer != NULL);
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_set_scaler(GPTimerUnit *unit, uint32_t scaler)
      trace_grlib_gptimer_set_scaler(scaler, value);
      for (i = 0; i < unit->nr_timers; i++) {
 +        ptimer_transaction_begin(unit->timers[i].ptimer);
          ptimer_set_freq(unit->timers[i].ptimer, value);
 +        ptimer_transaction_commit(unit->timers[i].ptimer);
      }
  }
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_write(void *opaque, hwaddr addr,
          switch (timer_addr) {
          case COUNTER_OFFSET:
              trace_grlib_gptimer_writel(id, addr, value);
 +            grlib_gptimer_tx_begin(&unit->timers[id]);
              unit->timers[id].counter = value;
              grlib_gptimer_enable(&unit->timers[id]);
 +            grlib_gptimer_tx_commit(&unit->timers[id]);
              return;
          case COUNTER_RELOAD_OFFSET:
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_write(void *opaque, hwaddr addr,
              /* gptimer_restart calls gptimer_enable, so if "enable" and "load"
                 bits are present, we just have to call restart. */
 +            grlib_gptimer_tx_begin(&unit->timers[id]);
              if (value & GPTIMER_LOAD) {
                  grlib_gptimer_restart(&unit->timers[id]);
              } else if (value & GPTIMER_ENABLE) {
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_write(void *opaque, hwaddr addr,
              value &= ~(GPTIMER_LOAD & GPTIMER_DEBUG_HALT);
              unit->timers[id].config = value;
 +            grlib_gptimer_tx_commit(&unit->timers[id]);
              return;
          default:
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_reset(DeviceState *d)
          timer->counter = 0;
          timer->reload = 0;
          timer->config = 0;
 +        ptimer_transaction_begin(timer->ptimer);
          ptimer_stop(timer->ptimer);
          ptimer_set_count(timer->ptimer, 0);
          ptimer_set_freq(timer->ptimer, unit->freq_hz);
 +        ptimer_transaction_commit(timer->ptimer);
      }
  }
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_realize(DeviceState *dev, Error **errp)
          GPTimer *timer = &unit->timers[i];
          timer->unit   = unit;
 -        timer->bh     = qemu_bh_new(grlib_gptimer_hit, timer);
 -        timer->ptimer = ptimer_init_with_bh(timer->bh, PTIMER_POLICY_DEFAULT);
 +        timer->ptimer = ptimer_init(grlib_gptimer_hit, timer,
 +                                    PTIMER_POLICY_DEFAULT);
          timer->id     = i;
          /* One IRQ line for each timer */
          sysbus_init_irq(sbd, &timer->irq);
 +        ptimer_transaction_begin(timer->ptimer);
          ptimer_set_freq(timer->ptimer, unit->freq_hz);
 +        ptimer_transaction_commit(timer->ptimer);
      }
      memory_region_init_io(&unit->iomem, OBJECT(unit), &grlib_gptimer_ops,
 --
 .20.1
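
Reviewer note on 13/26: the missing swizzle on the scalar index can be illustrated with a simulated big-endian Dreg image. This is a sketch, not the helper itself. `DEMO_BE_H4` is a hypothetical copy of what H4() expands to on a big-endian host, and the `demo_*` functions are made up for the example. Group 0 is architecturally the low 32 bits of the register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the 4-byte swizzle as seen on a big-endian host. */
#define DEMO_BE_H4(x) ((x) ^ 1)

/* Write a 64-bit register value into an 8-byte big-endian image. */
static void demo_store64_be(uint8_t *reg, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        reg[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Read four bytes at p as a big-endian 32-bit value. */
static uint32_t demo_load32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Buggy: the byte offset is computed from the raw index. */
static uint32_t demo_group_unswizzled(uint64_t dreg, int index)
{
    uint8_t reg[8];

    assert(index == 0 || index == 1);
    demo_store64_be(reg, dreg);
    return demo_load32_be(reg + index * 4);
}

/* Fixed: the index is swizzled before it is scaled to a byte offset. */
static uint32_t demo_group_swizzled(uint64_t dreg, int index)
{
    uint8_t reg[8];

    assert(index == 0 || index == 1);
    demo_store64_be(reg, dreg);
    return demo_load32_be(reg + DEMO_BE_H4(index) * 4);
}
```

For a register value of 0x1122334455667788, index 0 should select 0x55667788. The swizzled lookup returns that group, whereas the unswizzled lookup returns 0x11223344 on the simulated big-endian layout.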

-[PULL 07/51] target/arm: Split out rebuild_hflags_m32
+[PULL 14/26] target/arm: fix handling of HCR.FB
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-Create a function to compute the values of the TBFLAG_A32 bits
+HCR should be applied when NS is set, not when it is cleared.
 that will be cached, and are used by M-profile.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20191023150057.25731-6-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 45 ++++++++++++++++++++++++++++++---------------
+ target/arm/helper.c | 5 ++---
-file changed, 30 insertions(+), 15 deletions(-)
+file changed, 2 insertions(+), 3 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_common_32(CPUARMState *env, int fp_el,
+@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
-     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
  /*
   * Non-IS variants of TLB operations are upgraded to
 - * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
 + * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
   * force broadcast of these operations.
   */
  static bool tlb_force_broadcast(CPUARMState *env)
  {
 -    return (env->cp15.hcr_el2 & HCR_FB) &&
 -        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
 +    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
  }
-+static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
+ static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
 +                                   ARMMMUIdx mmu_idx)
 +{
 +    uint32_t flags = 0;
 +
 +    if (arm_v7m_is_handler_mode(env)) {
 +        flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
 +    }
 +
 +    /*
 +     * v8M always applies stack limit checks unless CCR.STKOFHFNMIGN
 +     * is suppressing them because the requested execution priority
 +     * is less than 0.
 +     */
 +    if (arm_feature(env, ARM_FEATURE_V8) &&
 +        !((mmu_idx & ARM_MMU_IDX_M_NEGPRI) &&
 +          (env->v7m.ccr[env->v7m.secure] & R_V7M_CCR_STKOFHFNMIGN_MASK))) {
 +        flags = FIELD_DP32(flags, TBFLAG_A32, STACKCHECK, 1);
 +    }
 +
 +    return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 +}
 +
  static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
                                     ARMMMUIdx mmu_idx)
  {
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
          }
      } else {
          *pc = env->regs[15];
 -        flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
 +
 +        if (arm_feature(env, ARM_FEATURE_M)) {
 +            flags = rebuild_hflags_m32(env, fp_el, mmu_idx);
 +        } else {
 +            flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
 +        }
 +
          flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
          flags = FIELD_DP32(flags, TBFLAG_A32, VECLEN, env->vfp.vec_len);
          flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE, env->vfp.vec_stride);
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
          }
      }
 -    if (arm_v7m_is_handler_mode(env)) {
 -        flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
 -    }
 -
 -    /* v8M always applies stack limit checks unless CCR.STKOFHFNMIGN is
 -     * suppressing them because the requested execution priority is less than 0.
 -     */
 -    if (arm_feature(env, ARM_FEATURE_V8) &&
 -        arm_feature(env, ARM_FEATURE_M) &&
 -        !((mmu_idx  & ARM_MMU_IDX_M_NEGPRI) &&
 -          (env->v7m.ccr[env->v7m.secure] & R_V7M_CCR_STKOFHFNMIGN_MASK))) {
 -        flags = FIELD_DP32(flags, TBFLAG_A32, STACKCHECK, 1);
 -    }
 -
      if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
          FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S) != env->v7m.secure) {
          flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
 --
 .20.1
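
Reviewer note on 14/26: the inverted condition is easy to see in a small truth-table model. This is a simplified sketch and not the real CPU state. `struct demo_cpu` and the `demo_*` functions are hypothetical, and the model assumes that the effective HCR_EL2 value is zero whenever the CPU is in Secure state below EL3 (no S-EL2), ignoring everything else arm_hcr_el2_eff() takes into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HCR_FB (1ULL << 9)   /* HCR_EL2.FB is bit 9 */

/* Hypothetical, heavily reduced CPU state for the example. */
struct demo_cpu {
    int el;                  /* current exception level */
    bool secure_below_el3;   /* true when in Secure state below EL3 */
    uint64_t hcr_el2;        /* raw HCR_EL2 value */
};

/* Assumption: HCR_EL2 has no effect in Secure state in this model. */
static uint64_t demo_hcr_el2_eff(const struct demo_cpu *c)
{
    return c->secure_below_el3 ? 0 : c->hcr_el2;
}

/* Old check: only fired in Secure state, the opposite of the intent. */
static bool demo_force_broadcast_old(const struct demo_cpu *c)
{
    return (c->hcr_el2 & DEMO_HCR_FB) && c->el == 1 && c->secure_below_el3;
}

/* New check: fires at EL1 whenever FB is effectively set. */
static bool demo_force_broadcast_new(const struct demo_cpu *c)
{
    assert(c->el >= 0 && c->el <= 3);
    return c->el == 1 && (demo_hcr_el2_eff(c) & DEMO_HCR_FB);
}
```

In this model a Non-secure EL1 CPU with FB set is upgraded to broadcast only by the new check, and a Secure EL1 CPU with FB set is upgraded only by the old one.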

-[PULL 05/51] target/arm: Split out rebuild_hflags_common_32
+[PULL 15/26] target/arm: fix LORID_EL1 access check
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-Create a function to compute the values of the TBFLAG_A32 bits
+Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
-that will be cached, and are used by all profiles.
+future HCR_EL2.TLOR when S-EL2 is enabled.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20191023150057.25731-4-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 16 +++++++++++-----
+ target/arm/helper.c | 19 +++++--------------
-file changed, 11 insertions(+), 5 deletions(-)
+file changed, 5 insertions(+), 14 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
+@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
-     return flags;
+ #endif
  /* Shared logic between LORID and the rest of the LOR* registers.
 - * Secure state has already been delt with.
 + * Secure state exclusion has already been dealt with.
   */
 -static CPAccessResult access_lor_ns(CPUARMState *env)
 +static CPAccessResult access_lor_ns(CPUARMState *env,
 +                                    const ARMCPRegInfo *ri, bool isread)
  {
      int el = arm_current_el(env);
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
      return CP_ACCESS_OK;
  }
-+static uint32_t rebuild_hflags_common_32(CPUARMState *env, int fp_el,
+-static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
-+                                         ARMMMUIdx mmu_idx, uint32_t flags)
+-                                   bool isread)
-+{
+-{
-+    flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, arm_sctlr_b(env));
+-    if (arm_is_secure_below_el3(env)) {
-+    flags = FIELD_DP32(flags, TBFLAG_A32, NS, !access_secure_reg(env));
+-        /* Access ok in secure mode.  */
-+
+-        return CP_ACCESS_OK;
-+    return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
+-    }
-+}
+-    return access_lor_ns(env);
-+
+-}
- static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
+-
-                                    ARMMMUIdx mmu_idx)
+ static CPAccessResult access_lor_other(CPUARMState *env,
                                         const ARMCPRegInfo *ri, bool isread)
  {
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
+@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
-     ARMMMUIdx mmu_idx = arm_mmu_idx(env);
+         /* Access denied in secure mode.  */
-     int current_el = arm_current_el(env);
+         return CP_ACCESS_TRAP;
      int fp_el = fp_exception_el(env, current_el);
 -    uint32_t flags = 0;
 +    uint32_t flags;
      if (is_a64(env)) {
          *pc = env->pc;
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
          }
      } else {
          *pc = env->regs[15];
 +        flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
          flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
          flags = FIELD_DP32(flags, TBFLAG_A32, VECLEN, env->vfp.vec_len);
          flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE, env->vfp.vec_stride);
          flags = FIELD_DP32(flags, TBFLAG_A32, CONDEXEC, env->condexec_bits);
 -        flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, arm_sctlr_b(env));
 -        flags = FIELD_DP32(flags, TBFLAG_A32, NS, !access_secure_reg(env));
          if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)
              || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
              flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
              flags = FIELD_DP32(flags, TBFLAG_A32,
                                 XSCALE_CPAR, env->cp15.c15_cpar);
          }
 -
 -        flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
      }
+-    return access_lor_ns(env);
-     /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
++    return access_lor_ns(env, ri, isread);
  }
  /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
        .type = ARM_CP_CONST, .resetvalue = 0 },
      { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
 -      .access = PL1_R, .accessfn = access_lorid,
 +      .access = PL1_R, .accessfn = access_lor_ns,
        .type = ARM_CP_CONST, .resetvalue = 0 },
      REGINFO_SENTINEL
  };
 --
 .20.1
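
The effect of the LORID_EL1 change above can be sketched with a small
standalone model. This is not QEMU code: the ToyCpu type, the AccessResult
enum and the toy_* functions are hypothetical, and the exact trap conditions
in toy_access_lor_ns() are an assumption based on the commit message (the
hunk does not show the body of access_lor_ns). It only illustrates that the
old accessor returned early in Secure state, while the new one always runs
the shared trap checks.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for QEMU's result codes and CPU state. */
typedef enum {
    ACCESS_OK,
    ACCESS_TRAP_EL2,
    ACCESS_TRAP_EL3,
} AccessResult;

typedef struct {
    int el;         /* current exception level, 0..3 */
    bool secure;    /* true when running in Secure state */
    bool hcr_tlor;  /* models the effective HCR_EL2.TLOR bit (assumption) */
    bool scr_tlor;  /* models SCR_EL3.TLOR */
} ToyCpu;

/* Shared trap checks: only the trap bits and the current EL are consulted. */
static AccessResult toy_access_lor_ns(const ToyCpu *cpu)
{
    if (cpu->el < 2 && cpu->hcr_tlor) {
        return ACCESS_TRAP_EL2;
    }
    if (cpu->el < 3 && cpu->scr_tlor) {
        return ACCESS_TRAP_EL3;
    }
    return ACCESS_OK;
}

/* Old behaviour: Secure state below EL3 bypassed the trap checks. */
static AccessResult toy_access_lorid_old(const ToyCpu *cpu)
{
    if (cpu->secure && cpu->el < 3) {
        return ACCESS_OK;
    }
    return toy_access_lor_ns(cpu);
}

/* New behaviour: LORID_EL1 uses the shared checks directly. */
static AccessResult toy_access_lorid_new(const ToyCpu *cpu)
{
    return toy_access_lor_ns(cpu);
}
```

With a Secure EL1 state and the SCR_EL3.TLOR model bit set, the old
function reports an allowed access while the new one reports a trap to EL3.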

-[PULL 08/51] target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Hoist the computation of some TBFLAG_A32 bits that only apply to
-M-profile under a single test for ARM_FEATURE_M.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-7-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/helper.c | 49 +++++++++++++++++++++------------------------
- 1 file changed, 23 insertions(+), 26 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-         if (arm_feature(env, ARM_FEATURE_M)) {
-             flags = rebuild_hflags_m32(env, fp_el, mmu_idx);
-+
-+            if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
-+                FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S)
-+                != env->v7m.secure) {
-+                flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
-+            }
-+
-+            if ((env->v7m.fpccr[env->v7m.secure] & R_V7M_FPCCR_ASPEN_MASK) &&
-+                (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) ||
-+                 (env->v7m.secure &&
-+                  !(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)))) {
-+                /*
-+                 * ASPEN is set, but FPCA/SFPA indicate that there is no
-+                 * active FP context; we must create a new FP context before
-+                 * executing any FP insn.
-+                 */
-+                flags = FIELD_DP32(flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED, 1);
-+            }
-+
-+            bool is_secure = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
-+            if (env->v7m.fpccr[is_secure] & R_V7M_FPCCR_LSPACT_MASK) {
-+                flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
-+            }
-         } else {
-             flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
-         }
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-         }
-     }
--    if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
--        FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S) != env->v7m.secure) {
--        flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
--    }
--
--    if (arm_feature(env, ARM_FEATURE_M) &&
--        (env->v7m.fpccr[env->v7m.secure] & R_V7M_FPCCR_ASPEN_MASK) &&
--        (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) ||
--         (env->v7m.secure &&
--          !(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)))) {
--        /*
--         * ASPEN is set, but FPCA/SFPA indicate that there is no active
--         * FP context; we must create a new FP context before executing
--         * any FP insn.
--         */
--        flags = FIELD_DP32(flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED, 1);
--    }
--
--    if (arm_feature(env, ARM_FEATURE_M)) {
--        bool is_secure = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
--
--        if (env->v7m.fpccr[is_secure] & R_V7M_FPCCR_LSPACT_MASK) {
--            flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
--        }
--    }
--
-     if (!arm_feature(env, ARM_FEATURE_M)) {
-         int target_el = arm_debug_target_el(env);
---
-.20.1

-[PULL 11/51] target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in cpu_get_tb_cpu_state
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-We do not need to compute any of these values for M-profile.
-Further, XSCALE_CPAR overlaps VECSTRIDE so obviously the two
-sets must be mutually exclusive.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-10-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/helper.c | 21 ++++++++++++++-------
- 1 file changed, 14 insertions(+), 7 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-             }
-         } else {
-             flags = rebuild_hflags_a32(env, fp_el, mmu_idx);
-+
-+            /*
-+             * Note that XSCALE_CPAR shares bits with VECSTRIDE.
-+             * Note that VECLEN+VECSTRIDE are RES0 for M-profile.
-+             */
-+            if (arm_feature(env, ARM_FEATURE_XSCALE)) {
-+                flags = FIELD_DP32(flags, TBFLAG_A32,
-+                                   XSCALE_CPAR, env->cp15.c15_cpar);
-+            } else {
-+                flags = FIELD_DP32(flags, TBFLAG_A32, VECLEN,
-+                                   env->vfp.vec_len);
-+                flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE,
-+                                   env->vfp.vec_stride);
-+            }
-         }
-         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
--        flags = FIELD_DP32(flags, TBFLAG_A32, VECLEN, env->vfp.vec_len);
--        flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE, env->vfp.vec_stride);
-         flags = FIELD_DP32(flags, TBFLAG_A32, CONDEXEC, env->condexec_bits);
-         if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)
-             || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
-             flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
-         }
--        /* Note that XSCALE_CPAR shares bits with VECSTRIDE */
--        if (arm_feature(env, ARM_FEATURE_XSCALE)) {
--            flags = FIELD_DP32(flags, TBFLAG_A32,
--                               XSCALE_CPAR, env->cp15.c15_cpar);
--        }
-     }
-     /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
---
-.20.1

-[PULL 12/51] target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Hoist the variable load for PSTATE into the existing test vs is_a64.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-11-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/helper.c | 20 ++++++++------------
- 1 file changed, 8 insertions(+), 12 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-     ARMMMUIdx mmu_idx = arm_mmu_idx(env);
-     int current_el = arm_current_el(env);
-     int fp_el = fp_exception_el(env, current_el);
--    uint32_t flags;
-+    uint32_t flags, pstate_for_ss;
-     if (is_a64(env)) {
-         *pc = env->pc;
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-         if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
-             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
-         }
-+        pstate_for_ss = env->pstate;
-     } else {
-         *pc = env->regs[15];
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-             || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
-             flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
-         }
-+        pstate_for_ss = env->uncached_cpsr;
-     }
--    /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
-+    /*
-+     * The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
-      * states defined in the ARM ARM for software singlestep:
-      *  SS_ACTIVE   PSTATE.SS   State
-      *     0            x       Inactive (the TB flag for SS is always 0)
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-      *     1            1       Active-not-pending
-      * SS_ACTIVE is set in hflags; PSTATE_SS is computed every TB.
-      */
--    if (FIELD_EX32(flags, TBFLAG_ANY, SS_ACTIVE)) {
--        if (is_a64(env)) {
--            if (env->pstate & PSTATE_SS) {
--                flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
--            }
--        } else {
--            if (env->uncached_cpsr & PSTATE_SS) {
--                flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
--            }
--        }
-+    if (FIELD_EX32(flags, TBFLAG_ANY, SS_ACTIVE) &&
-+        (pstate_for_ss & PSTATE_SS)) {
-+        flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
-     }
-     *pflags = flags;
---
-.20.1

-[PULL 14/51] target/arm: Add arm_rebuild_hflags
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-This function assumes nothing about the current state of the cpu,
-and writes the computed value to env->hflags.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-13-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/cpu.h    |  6 ++++++
- target/arm/helper.c | 30 ++++++++++++++++++++++--------
- 2 files changed, 28 insertions(+), 8 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
-+++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ void arm_register_pre_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
- void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook, void
-         *opaque);
-+/**
-+ * arm_rebuild_hflags:
-+ * Rebuild the cached TBFLAGS for arbitrary changed processor state.
-+ */
-+void arm_rebuild_hflags(CPUARMState *env);
-+
- /**
-  * aa32_vfp_dreg:
-  * Return a pointer to the Dn register within env in 32-bit mode.
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
-     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
- }
-+static uint32_t rebuild_hflags_internal(CPUARMState *env)
-+{
-+    int el = arm_current_el(env);
-+    int fp_el = fp_exception_el(env, el);
-+    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
-+
-+    if (is_a64(env)) {
-+        return rebuild_hflags_a64(env, el, fp_el, mmu_idx);
-+    } else if (arm_feature(env, ARM_FEATURE_M)) {
-+        return rebuild_hflags_m32(env, fp_el, mmu_idx);
-+    } else {
-+        return rebuild_hflags_a32(env, fp_el, mmu_idx);
-+    }
-+}
-+
-+void arm_rebuild_hflags(CPUARMState *env)
-+{
-+    env->hflags = rebuild_hflags_internal(env);
-+}
-+
- void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-                           target_ulong *cs_base, uint32_t *pflags)
- {
--    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
--    int current_el = arm_current_el(env);
--    int fp_el = fp_exception_el(env, current_el);
-     uint32_t flags, pstate_for_ss;
-+    flags = rebuild_hflags_internal(env);
-+
-     if (is_a64(env)) {
-         *pc = env->pc;
--        flags = rebuild_hflags_a64(env, current_el, fp_el, mmu_idx);
-         if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
-             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
-         }
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-         *pc = env->regs[15];
-         if (arm_feature(env, ARM_FEATURE_M)) {
--            flags = rebuild_hflags_m32(env, fp_el, mmu_idx);
--
-             if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
-                 FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S)
-                 != env->v7m.secure) {
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-                 flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
-             }
-         } else {
--            flags = rebuild_hflags_a32(env, fp_el, mmu_idx);
--
-             /*
-              * Note that XSCALE_CPAR shares bits with VECSTRIDE.
-              * Note that VECLEN+VECSTRIDE are RES0 for M-profile.
---
-.20.1

-[PULL 15/51] target/arm: Split out arm_mmu_idx_el
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Avoid calling arm_current_el() twice.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-14-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/internals.h |  9 +++++++++
- target/arm/helper.c    | 12 +++++++-----
- 2 files changed, 16 insertions(+), 5 deletions(-)
-diff --git a/target/arm/internals.h b/target/arm/internals.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/internals.h
-+++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ void arm_cpu_update_virq(ARMCPU *cpu);
-  */
- void arm_cpu_update_vfiq(ARMCPU *cpu);
-+/**
-+ * arm_mmu_idx_el:
-+ * @env: The cpu environment
-+ * @el: The EL to use.
-+ *
-+ * Return the full ARMMMUIdx for the translation regime for EL.
-+ */
-+ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el);
-+
- /**
-  * arm_mmu_idx:
-  * @env: The cpu environment
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
- }
- #endif
--ARMMMUIdx arm_mmu_idx(CPUARMState *env)
-+ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
- {
--    int el;
--
-     if (arm_feature(env, ARM_FEATURE_M)) {
-         return arm_v7m_mmu_idx_for_secstate(env, env->v7m.secure);
-     }
--    el = arm_current_el(env);
-     if (el < 2 && arm_is_secure_below_el3(env)) {
-         return ARMMMUIdx_S1SE0 + el;
-     } else {
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env)
-     }
- }
-+ARMMMUIdx arm_mmu_idx(CPUARMState *env)
-+{
-+    return arm_mmu_idx_el(env, arm_current_el(env));
-+}
-+
- int cpu_mmu_index(CPUARMState *env, bool ifetch)
- {
-     return arm_to_core_mmu_idx(arm_mmu_idx(env));
-@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_internal(CPUARMState *env)
- {
-     int el = arm_current_el(env);
-     int fp_el = fp_exception_el(env, el);
--    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
-+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
-     if (is_a64(env)) {
-         return rebuild_hflags_a64(env, el, fp_el, mmu_idx);
---
-.20.1

-[PULL 16/51] target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state
+[PULL 16/26] disas/capstone: Fix monitor disassembly of >32 bytes
-From: Richard Henderson <richard.henderson@linaro.org>
+If we're using the capstone disassembler, disassembly of a run of
 instructions more than 32 bytes long disassembles the wrong data for
 instructions beyond the 32 byte mark:
-By performing this store early, we avoid having to save and restore
+(qemu) xp /16x 0x100
-the register holding the address around any function calls.
+0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
 0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
 0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
 0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
 (qemu) xp /16i 0x100
 0x00000100: 00000005 andeq r0, r0, r5
 0x00000104: 54410001 strbpl r0, [r1], #-1
 0x00000108: 00000001 andeq r0, r0, r1
 0x0000010c: 00001000 andeq r1, r0, r0
 0x00000110: 00000000 andeq r0, r0, r0
 0x00000114: 00000004 andeq r0, r0, r4
 0x00000118: 54410002 strbpl r0, [r1], #-2
 0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
 0x00000120: 54410001 strbpl r0, [r1], #-1
 0x00000124: 00000001 andeq r0, r0, r1
 0x00000128: 00001000 andeq r1, r0, r0
 0x0000012c: 00000000 andeq r0, r0, r0
 0x00000130: 00000004 andeq r0, r0, r4
 0x00000134: 54410002 strbpl r0, [r1], #-2
 0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
 0x0000013c: 00000000 andeq r0, r0, r0
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Here the disassembly of 0x120..0x13f is using the data that is in
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+0x104..0x123.
-Message-id: 20191023150057.25731-15-richard.henderson@linaro.org
 This is caused by passing the wrong value to the read_memory_func().
 The intention is that at this point in the loop the 'cap_buf' buffer
 already contains 'csize' bytes of data for the instruction at guest
 addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
 extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
 time through the loop 'csize' happens to be zero, so the initial read
 of 32 bytes into cap_buf is correct and as long as the disassembly
 never needs to read more data we return the correct information.
 Use the correct guest address in the call to read_memory_func().
 Cc: qemu-stable@nongnu.org
 Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
 ---
- target/arm/helper.c | 2 +-
+ disas/capstone.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+diff --git a/disas/capstone.c b/disas/capstone.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/disas/capstone.c
-+++ b/target/arm/helper.c
++++ b/disas/capstone.c
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
+@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
- {
-     uint32_t flags, pstate_for_ss;
+         /* Make certain that we can make progress.  */
+         assert(tsize != 0);
-+    *cs_base = 0;
+-        info->read_memory_func(pc, cap_buf + csize, tsize, info);
-     flags = rebuild_hflags_internal(env);
++        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
+         csize += tsize;
-     if (is_a64(env)) {
-@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
+         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
      }
      *pflags = flags;
 -    *cs_base = 0;
  }
  #ifdef TARGET_AARCH64
 --
 .20.1
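
The read_memory_func() fix above can be illustrated with a standalone
sketch. None of this is QEMU or capstone code: toy_guest_mem,
toy_read_memory() and toy_refill() are hypothetical names, and the guest
memory is simply an array in which byte i holds the value i. The sketch
shows that when the buffer already holds 'csize' bytes for the instruction
at 'pc', the extra 'tsize' bytes must come from guest address 'pc + csize';
reading from 'pc' again appends a stale copy of the start of the buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest memory; after init, byte i holds the value i. */
static uint8_t toy_guest_mem[256];

static void toy_init_guest_mem(void)
{
    for (size_t i = 0; i < sizeof(toy_guest_mem); i++) {
        toy_guest_mem[i] = (uint8_t)i;
    }
}

/* Stand-in for info->read_memory_func(): copy len bytes from guest addr. */
static void toy_read_memory(uint64_t addr, uint8_t *dst, size_t len)
{
    memcpy(dst, &toy_guest_mem[addr], len);
}

/*
 * Append tsize bytes to a buffer that already holds csize bytes of data
 * for the instruction at guest address pc.
 * fixed == true  reads from pc + csize (behaviour after the patch).
 * fixed == false reads from pc again   (behaviour before the patch).
 */
static void toy_refill(uint64_t pc, uint8_t *buf, size_t csize,
                       size_t tsize, bool fixed)
{
    uint64_t src = fixed ? pc + csize : pc;

    toy_read_memory(src, buf + csize, tsize);
}
```

When csize is zero both variants read the same address, which matches the
commit message's observation that the first pass through the loop was
always correct.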

-[PULL 17/51] target/arm: Add HELPER(rebuild_hflags_{a32, a64, m32})
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-These functions are given the mode and el state of the cpu
-and write the computed value to env->hflags.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-16-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/helper.h |  4 ++++
- target/arm/helper.c | 24 ++++++++++++++++++++++++
- 2 files changed, 28 insertions(+)
-diff --git a/target/arm/helper.h b/target/arm/helper.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
-+++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(msr_banked, void, env, i32, i32, i32)
- DEF_HELPER_2(get_user_reg, i32, env, i32)
- DEF_HELPER_3(set_user_reg, void, env, i32, i32)
-+DEF_HELPER_FLAGS_2(rebuild_hflags_m32, TCG_CALL_NO_RWG, void, env, int)
-+DEF_HELPER_FLAGS_2(rebuild_hflags_a32, TCG_CALL_NO_RWG, void, env, int)
-+DEF_HELPER_FLAGS_2(rebuild_hflags_a64, TCG_CALL_NO_RWG, void, env, int)
-+
- DEF_HELPER_1(vfp_get_fpscr, i32, env)
- DEF_HELPER_2(vfp_set_fpscr, void, env, i32)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ void arm_rebuild_hflags(CPUARMState *env)
-     env->hflags = rebuild_hflags_internal(env);
- }
-+void HELPER(rebuild_hflags_m32)(CPUARMState *env, int el)
-+{
-+    int fp_el = fp_exception_el(env, el);
-+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
-+
-+    env->hflags = rebuild_hflags_m32(env, fp_el, mmu_idx);
-+}
-+
-+void HELPER(rebuild_hflags_a32)(CPUARMState *env, int el)
-+{
-+    int fp_el = fp_exception_el(env, el);
-+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
-+
-+    env->hflags = rebuild_hflags_a32(env, fp_el, mmu_idx);
-+}
-+
-+void HELPER(rebuild_hflags_a64)(CPUARMState *env, int el)
-+{
-+    int fp_el = fp_exception_el(env, el);
-+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
-+
-+    env->hflags = rebuild_hflags_a64(env, el, fp_el, mmu_idx);
-+}
-+
- void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-                           target_ulong *cs_base, uint32_t *pflags)
- {
---
-.20.1

-[PULL 18/51] target/arm: Rebuild hflags at EL changes
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Begin setting, but not relying upon, env->hflags.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-17-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- linux-user/syscall.c    | 1 +
- target/arm/cpu.c        | 1 +
- target/arm/helper-a64.c | 3 +++
- target/arm/helper.c     | 2 ++
- target/arm/machine.c    | 1 +
- target/arm/op_helper.c  | 1 +
- 6 files changed, 9 insertions(+)
-diff --git a/linux-user/syscall.c b/linux-user/syscall.c
-index XXXXXXX..XXXXXXX 100644
---- a/linux-user/syscall.c
-+++ b/linux-user/syscall.c
-@@ -XXX,XX +XXX,XX @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
-                     aarch64_sve_narrow_vq(env, vq);
-                 }
-                 env->vfp.zcr_el[1] = vq - 1;
-+                arm_rebuild_hflags(env);
-                 ret = vq * 16;
-             }
-             return ret;
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
-+++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
-     hw_breakpoint_update_all(cpu);
-     hw_watchpoint_update_all(cpu);
-+    arm_rebuild_hflags(env);
- }
- bool arm_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
-diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-a64.c
-+++ b/target/arm/helper-a64.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
-         } else {
-             env->regs[15] = new_pc & ~0x3;
-         }
-+        helper_rebuild_hflags_a32(env, new_el);
-         qemu_log_mask(CPU_LOG_INT, "Exception return from AArch64 EL%d to "
-                       "AArch32 EL%d PC 0x%" PRIx32 "\n",
-                       cur_el, new_el, env->regs[15]);
-@@ -XXX,XX +XXX,XX @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
-         }
-         aarch64_restore_sp(env, new_el);
-         env->pc = new_pc;
-+        helper_rebuild_hflags_a64(env, new_el);
-         qemu_log_mask(CPU_LOG_INT, "Exception return from AArch64 EL%d to "
-                       "AArch64 EL%d PC 0x%" PRIx64 "\n",
-                       cur_el, new_el, env->pc);
-     }
-+
-     /*
-      * Note that cur_el can never be 0.  If new_el is 0, then
-      * el0_a64 is return_to_aa64, else el0_a64 is ignored.
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static void take_aarch32_exception(CPUARMState *env, int new_mode,
-         env->regs[14] = env->regs[15] + offset;
-     }
-     env->regs[15] = newpc;
-+    arm_rebuild_hflags(env);
- }
- static void arm_cpu_do_interrupt_aarch32_hyp(CPUState *cs)
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
-     pstate_write(env, PSTATE_DAIF | new_mode);
-     env->aarch64 = 1;
-     aarch64_restore_sp(env, new_el);
-+    helper_rebuild_hflags_a64(env, new_el);
-     env->pc = addr;
-diff --git a/target/arm/machine.c b/target/arm/machine.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/machine.c
-+++ b/target/arm/machine.c
-@@ -XXX,XX +XXX,XX @@ static int cpu_post_load(void *opaque, int version_id)
-     if (!kvm_enabled()) {
-         pmu_op_finish(&cpu->env);
-     }
-+    arm_rebuild_hflags(&cpu->env);
-     return 0;
- }
-diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/op_helper.c
-+++ b/target/arm/op_helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(cpsr_write_eret)(CPUARMState *env, uint32_t val)
-      * state. Do the masking now.
-      */
-     env->regs[15] &= (env->thumb ? ~1 : ~3);
-+    arm_rebuild_hflags(env);
-     qemu_mutex_lock_iothread();
-     arm_call_el_change_hook(env_archcpu(env));
---
-.20.1

-[PULL 35/51] target/arm/monitor: Introduce qmp_query_cpu_model_expansion
+[PULL 17/26] hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
-From: Andrew Jones <drjones@redhat.com>
+From: Philippe Mathieu-Daudé <philmd@redhat.com>
-Add support for the query-cpu-model-expansion QMP command to Arm. We
+Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
-do this selectively, only exposing CPU properties which represent
+This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):
 optional CPU features which the user may want to enable/disable.
 Additionally we restrict the list of queryable cpu models to 'max',
 'host', or the current type when KVM is in use. And, finally, we only
 implement expansion type 'full', as Arm does not yet have a "base"
 CPU type. More details and example queries are described in a new
 document (docs/arm-cpu-features.rst).
-Note, certainly more features may be added to the list of advertised
+  CID 1432363 (#1 of 1): Unintentional integer overflow:
 features, e.g. 'vfp' and 'neon'. The only requirement is that we can
 detect invalid configurations and emit failures at QMP query time.
 For 'vfp' and 'neon' this will require some refactoring to share a
 validation function between the QMP query and the CPU realize
 functions.
-Signed-off-by: Andrew Jones <drjones@redhat.com>
+  overflow_before_widen:
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+    Potentially overflowing expression 1 << scale with type int
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
+    (32 bits, signed) is evaluated using 32-bit arithmetic, and
-Reviewed-by: Beata Michalska <beata.michalska@linaro.org>
+    then used in a context that expects an expression of type
-Message-id: 20191024121808.9612-2-drjones@redhat.com
+    hwaddr (64 bits, unsigned).
 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Acked-by: Eric Auger <eric.auger@redhat.com>
 Message-id: 20201030144617.1535064-1-philmd@redhat.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- qapi/machine-target.json  |   6 +-
+ hw/arm/smmuv3.c | 3 ++-
- target/arm/monitor.c      | 146 ++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 2 insertions(+), 1 deletion(-)
  docs/arm-cpu-features.rst | 137 +++++++++++++++++++++++++++++++++++
  3 files changed, 286 insertions(+), 3 deletions(-)
  create mode 100644 docs/arm-cpu-features.rst
-diff --git a/qapi/machine-target.json b/qapi/machine-target.json
+diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
 index XXXXXXX..XXXXXXX 100644
---- a/qapi/machine-target.json
+--- a/hw/arm/smmuv3.c
-+++ b/qapi/machine-target.json
++++ b/hw/arm/smmuv3.c
@@ -XXX,XX +XXX,XX @@
  ##
  { 'struct': 'CpuModelExpansionInfo',
    'data': { 'model': 'CpuModelInfo' },
 -  'if': 'defined(TARGET_S390X) || defined(TARGET_I386)' }
 +  'if': 'defined(TARGET_S390X) || defined(TARGET_I386) || defined(TARGET_ARM)' }
  ##
  # @query-cpu-model-expansion:
@@ -XXX,XX +XXX,XX @@
  #   query-cpu-model-expansion while using these is not advised.
  #
  # Some architectures may not support all expansion types. s390x supports
 -# "full" and "static".
 +# "full" and "static". Arm only supports "full".
  #
  # Returns: a CpuModelExpansionInfo. Returns an error if expanding CPU models is
  #          not supported, if the model cannot be expanded, if the model contains
@@ -XXX,XX +XXX,XX @@
    'data': { 'type': 'CpuModelExpansionType',
              'model': 'CpuModelInfo' },
    'returns': 'CpuModelExpansionInfo',
 -  'if': 'defined(TARGET_S390X) || defined(TARGET_I386)' }
 +  'if': 'defined(TARGET_S390X) || defined(TARGET_I386) || defined(TARGET_ARM)' }
  ##
  # @CpuDefinitionInfo:
 diff --git a/target/arm/monitor.c b/target/arm/monitor.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/monitor.c
 +++ b/target/arm/monitor.c
 @@ -XXX,XX +XXX,XX @@
   */
  #include "qemu/osdep.h"
-+#include "hw/boards.h"
++#include "qemu/bitops.h"
- #include "kvm_arm.h"
+ #include "hw/irq.h"
-+#include "qapi/error.h"
+ #include "hw/sysbus.h"
-+#include "qapi/visitor.h"
+ #include "migration/vmstate.h"
-+#include "qapi/qobject-input-visitor.h"
+@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
-+#include "qapi/qapi-commands-machine-target.h"
+         scale = CMD_SCALE(cmd);
- #include "qapi/qapi-commands-misc-target.h"
+         num = CMD_NUM(cmd);
-+#include "qapi/qmp/qerror.h"
+         ttl = CMD_TTL(cmd);
-+#include "qapi/qmp/qdict.h"
+-        num_pages = (num + 1) * (1 << (scale));
-+#include "qom/qom-qobject.h"
++        num_pages = (num + 1) * BIT_ULL(scale);
+     }
- static GICCapability *gic_cap_new(int version)
- {
+     if (type == SMMU_CMD_TLBI_NH_VA) {
@@ -XXX,XX +XXX,XX @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
      return head;
  }
 +
 +/*
 + * These are cpu model features we want to advertise. The order here
 + * matters as this is the order in which qmp_query_cpu_model_expansion
 + * will attempt to set them. If there are dependencies between features,
 + * then the order that considers those dependencies must be used.
 + */
 +static const char *cpu_model_advertised_features[] = {
 +    "aarch64", "pmu",
 +    NULL
 +};
 +
 +CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
 +                                                     CpuModelInfo *model,
 +                                                     Error **errp)
 +{
 +    CpuModelExpansionInfo *expansion_info;
 +    const QDict *qdict_in = NULL;
 +    QDict *qdict_out;
 +    ObjectClass *oc;
 +    Object *obj;
 +    const char *name;
 +    int i;
 +
 +    if (type != CPU_MODEL_EXPANSION_TYPE_FULL) {
 +        error_setg(errp, "The requested expansion type is not supported");
 +        return NULL;
 +    }
 +
 +    if (!kvm_enabled() && !strcmp(model->name, "host")) {
 +        error_setg(errp, "The CPU type '%s' requires KVM", model->name);
 +        return NULL;
 +    }
 +
 +    oc = cpu_class_by_name(TYPE_ARM_CPU, model->name);
 +    if (!oc) {
 +        error_setg(errp, "The CPU type '%s' is not a recognized ARM CPU type",
 +                   model->name);
 +        return NULL;
 +    }
 +
 +    if (kvm_enabled()) {
 +        const char *cpu_type = current_machine->cpu_type;
 +        int len = strlen(cpu_type) - strlen(ARM_CPU_TYPE_SUFFIX);
 +        bool supported = false;
 +
 +        if (!strcmp(model->name, "host") || !strcmp(model->name, "max")) {
 +            /* These are kvmarm's recommended cpu types */
 +            supported = true;
 +        } else if (strlen(model->name) == len &&
 +                   !strncmp(model->name, cpu_type, len)) {
 +            /* KVM is enabled and we're using this type, so it works. */
 +            supported = true;
 +        }
 +        if (!supported) {
 +            error_setg(errp, "We cannot guarantee the CPU type '%s' works "
 +                             "with KVM on this host", model->name);
 +            return NULL;
 +        }
 +    }
 +
 +    if (model->props) {
 +        qdict_in = qobject_to(QDict, model->props);
 +        if (!qdict_in) {
 +            error_setg(errp, QERR_INVALID_PARAMETER_TYPE, "props", "dict");
 +            return NULL;
 +        }
 +    }
 +
 +    obj = object_new(object_class_get_name(oc));
 +
 +    if (qdict_in) {
 +        Visitor *visitor;
 +        Error *err = NULL;
 +
 +        visitor = qobject_input_visitor_new(model->props);
 +        visit_start_struct(visitor, NULL, NULL, 0, &err);
 +        if (err) {
 +            visit_free(visitor);
 +            object_unref(obj);
 +            error_propagate(errp, err);
 +            return NULL;
 +        }
 +
 +        i = 0;
 +        while ((name = cpu_model_advertised_features[i++]) != NULL) {
 +            if (qdict_get(qdict_in, name)) {
 +                object_property_set(obj, visitor, name, &err);
 +                if (err) {
 +                    break;
 +                }
 +            }
 +        }
 +
 +        if (!err) {
 +            visit_check_struct(visitor, &err);
 +        }
 +        visit_end_struct(visitor, NULL);
 +        visit_free(visitor);
 +        if (err) {
 +            object_unref(obj);
 +            error_propagate(errp, err);
 +            return NULL;
 +        }
 +    }
 +
 +    expansion_info = g_new0(CpuModelExpansionInfo, 1);
 +    expansion_info->model = g_malloc0(sizeof(*expansion_info->model));
 +    expansion_info->model->name = g_strdup(model->name);
 +
 +    qdict_out = qdict_new();
 +
 +    i = 0;
 +    while ((name = cpu_model_advertised_features[i++]) != NULL) {
 +        ObjectProperty *prop = object_property_find(obj, name, NULL);
 +        if (prop) {
 +            Error *err = NULL;
 +            QObject *value;
 +
 +            assert(prop->get);
 +            value = object_property_get_qobject(obj, name, &err);
 +            assert(!err);
 +
 +            qdict_put_obj(qdict_out, name, value);
 +        }
 +    }
 +
 +    if (!qdict_size(qdict_out)) {
 +        qobject_unref(qdict_out);
 +    } else {
 +        expansion_info->model->props = QOBJECT(qdict_out);
 +        expansion_info->model->has_props = true;
 +    }
 +
 +    object_unref(obj);
 +
 +    return expansion_info;
 +}
 diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/docs/arm-cpu-features.rst
@@ -XXX,XX +XXX,XX @@
 +================
 +ARM CPU Features
 +================
 +
 +Examples of probing and using ARM CPU features
 +
 +Introduction
 +============
 +
 +CPU features are optional features that a CPU of a supporting type may
 +choose to implement or not.  In QEMU, optional CPU features have
 +corresponding boolean CPU properties that, when enabled, indicate
 +that the feature is implemented, and, conversely, when disabled,
 +indicate that it is not implemented. An example of an ARM CPU feature
 +is the Performance Monitoring Unit (PMU).  CPU types such as the
 +Cortex-A15 and the Cortex-A57, which respectively implement ARM
 +architecture reference manuals ARMv7-A and ARMv8-A, may both optionally
 +implement PMUs.  For example, if a user wants to use a Cortex-A15 without
 +a PMU, then the `-cpu` parameter should contain `pmu=off` on the QEMU
 +command line, i.e. `-cpu cortex-a15,pmu=off`.
 +
 +As not all CPU types support all optional CPU features, whether or
 +not a CPU property exists depends on the CPU type.  For example, CPUs
 +that implement the ARMv8-A architecture reference manual may optionally
 +support the AArch32 CPU feature, which may be enabled by disabling the
 +`aarch64` CPU property.  A CPU type such as the Cortex-A15, which does
 +not implement ARMv8-A, will not have the `aarch64` CPU property.
 +
 +QEMU's support may be limited for some CPU features, only partially
 +supporting the feature or only supporting the feature under certain
 +configurations.  For example, the `aarch64` CPU feature, which, when
 +disabled, enables the optional AArch32 CPU feature, is only supported
 +when using the KVM accelerator and when running on a host CPU type that
 +supports the feature.
 +
 +CPU Feature Probing
 +===================
 +
 +Determining which CPU features are available and functional for a given
 +CPU type is possible with the `query-cpu-model-expansion` QMP command.
 +Below are some examples where `scripts/qmp/qmp-shell` (see the top comment
 +block in the script for usage) is used to issue the QMP commands.
 +
 +(1) Determine which CPU features are available for the `max` CPU type
 +    (Note, we started QEMU with qemu-system-aarch64, so `max` is
 +     implementing the ARMv8-A reference manual in this case)::
 +
 +      (QEMU) query-cpu-model-expansion type=full model={"name":"max"}
 +      { "return": {
 +        "model": { "name": "max", "props": {
 +        "pmu": true, "aarch64": true
 +      }}}}
 +
 +We see that the `max` CPU type has the `pmu` and `aarch64` CPU features.
 +We also see that the CPU features are enabled, as they are all `true`.
 +
 +(2) Let's try to disable the PMU::
 +
 +      (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"pmu":false}}
 +      { "return": {
 +        "model": { "name": "max", "props": {
 +        "pmu": false, "aarch64": true
 +      }}}}
 +
 +We see it worked, as `pmu` is now `false`.
 +
 +(3) Let's try to disable `aarch64`, which enables the AArch32 CPU feature::
 +
 +      (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"aarch64":false}}
 +      {"error": {
 +       "class": "GenericError", "desc":
 +       "'aarch64' feature cannot be disabled unless KVM is enabled and 32-bit EL1 is supported"
 +      }}
 +
 +It looks like this feature is limited to a configuration we do not
 +currently have.
 +
 +(4) Let's try probing CPU features for the Cortex-A15 CPU type::
 +
 +      (QEMU) query-cpu-model-expansion type=full model={"name":"cortex-a15"}
 +      {"return": {"model": {"name": "cortex-a15", "props": {"pmu": true}}}}
 +
 +Only the `pmu` CPU feature is available.
 +
 +A note about CPU feature dependencies
 +-------------------------------------
 +
 +It's possible for features to have dependencies on other features. I.e.
 +it may be possible to change one feature at a time without error, but
 +when attempting to change all features at once an error could occur
 +depending on the order they are processed.  It's also possible changing
 +all at once doesn't generate an error, because a feature's dependencies
 +are satisfied with other features, but the same feature cannot be changed
 +independently without error.  For these reasons callers should always
 +attempt to make their desired changes all at once in order to ensure the
 +collection is valid.
 +
 +A note about CPU models and KVM
 +-------------------------------
 +
 +Named CPU models generally do not work with KVM.  There are a few cases
 +that do work, e.g. using the named CPU model `cortex-a57` with KVM on a
 +seattle host, but mostly if KVM is enabled the `host` CPU type must be
 +used.  This means the guest is provided all the same CPU features as the
 +host CPU type has.  And, for this reason, the `host` CPU type should
 +enable all CPU features that the host has by default.  Indeed it's even
 +a bit strange to allow disabling CPU features that the host has when using
 +the `host` CPU type, but in the absence of CPU models it's the best we can
 +do if we want to launch guests without all the host's CPU features enabled.
 +
 +Enabling KVM also affects the `query-cpu-model-expansion` QMP command.  The
 +effect is not limited to specific features, as pointed out in example
 +(3) of "CPU Feature Probing", but extends to which CPU types may be expanded.
 +When KVM is enabled, only the `max`, `host`, and current CPU type may be
 +expanded.  This restriction is necessary as it's not possible to know all
 +CPU types that may work with KVM, but it does impose a small risk of users
 +experiencing unexpected errors.  For example on a seattle, as mentioned
 +above, the `cortex-a57` CPU type is also valid when KVM is enabled.
 +Therefore a user could use the `host` CPU type for the current type, but
 +then attempt to query `cortex-a57`, however that query will fail with our
 +restrictions.  This shouldn't be an issue though as management layers and
 +users have been preferring the `host` CPU type for use with KVM for quite
 +some time.  Additionally, if the KVM-enabled QEMU instance running on a
 +seattle host is using the `cortex-a57` CPU type, then querying `cortex-a57`
 +will work.
 +
 +Using CPU Features
 +==================
 +
 +After determining which CPU features are available and supported for a
 +given CPU type, then they may be selectively enabled or disabled on the
 +QEMU command line with that CPU type::
 +
 +  $ qemu-system-aarch64 -M virt -cpu max,pmu=off
 +
 +The example above disables the PMU for the `max` CPU type.
 +
 --
 .20.1
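
The smmuv3.c hunk in the comparison above replaces `1 << scale` with
`BIT_ULL(scale)`. A minimal sketch of why the width of the shift matters
follows. It is not QEMU code: `BIT_ULL` is redefined locally as a stand-in
for the macro in "qemu/bitops.h", the function names are hypothetical, and
the narrow variant uses unsigned 32-bit arithmetic so that the wraparound is
well defined (the pre-patch expression used signed int, where the overflow
is undefined behaviour).

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for QEMU's BIT_ULL(); an assumption for this sketch. */
#define BIT_ULL(nr) (1ULL << (nr))

/* 32-bit arithmetic: the product wraps before it is widened to 64 bits. */
uint64_t num_pages_narrow(uint32_t num, uint32_t scale)
{
    uint32_t narrow = (num + 1u) * (1u << scale);
    return narrow;
}

/* 64-bit arithmetic: BIT_ULL() promotes the whole product to 64 bits. */
uint64_t num_pages_wide(uint32_t num, uint32_t scale)
{
    return (num + 1) * BIT_ULL(scale);
}

/* Usage: both agree for small inputs, but only the wide form survives
 * the largest 5-bit values of num and scale. */
void check_num_pages(void)
{
    assert(num_pages_narrow(0, 3) == num_pages_wide(0, 3));
    assert(num_pages_wide(31, 31) == (1ULL << 36));
    assert(num_pages_narrow(31, 31) == 0);
}
```

With num = 31 and scale = 31 the true result is 2^36, which does not fit in
32 bits, so the narrow form silently yields 0.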

-[PULL 20/51] target/arm: Rebuild hflags at CPSR writes
+[PULL 18/26] hw/arm/boot: fix SVE for EL3 direct kernel boot
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-Continue setting, but not relying upon, env->hflags.
+When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
 that SVE will not trap to EL3.
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-19-richard.henderson@linaro.org
+Message-id: 20201030151541.11976-1-remi@remlab.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/op_helper.c | 3 +++
+ hw/arm/boot.c | 3 +++
 file changed, 3 insertions(+)
-diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
+diff --git a/hw/arm/boot.c b/hw/arm/boot.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/op_helper.c
+--- a/hw/arm/boot.c
-+++ b/target/arm/op_helper.c
++++ b/hw/arm/boot.c
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(usat16)(CPUARMState *env, uint32_t x, uint32_t shift)
+@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
- void HELPER(setend)(CPUARMState *env)
+                     if (cpu_isar_feature(aa64_mte, cpu)) {
- {
+                         env->cp15.scr_el3 |= SCR_ATA;
-     env->uncached_cpsr ^= CPSR_E;
+                     }
-+    arm_rebuild_hflags(env);
++                    if (cpu_isar_feature(aa64_sve, cpu)) {
- }
++                        env->cp15.cptr_el[3] |= CPTR_EZ;
++                    }
- /* Function checks whether WFx (WFI/WFE) instructions are set up to be trapped.
+                     /* AArch64 kernels never boot in secure mode */
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(cpsr_read)(CPUARMState *env)
+                     assert(!info->secure_boot);
- void HELPER(cpsr_write)(CPUARMState *env, uint32_t val, uint32_t mask)
+                     /* This hook is only supported for AArch32 currently:
  {
      cpsr_write(env, val, mask, CPSRWriteByInstr);
 +    /* TODO: Not all cpsr bits are relevant to hflags.  */
 +    arm_rebuild_hflags(env);
  }
  /* Write the CPSR for a 32-bit exception return */
 --
 .20.1

-[PULL 23/51] target/arm: Rebuild hflags for M-profile NVIC
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Continue setting, but not relying upon, env->hflags.
-Suggested-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-22-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/intc/armv7m_nvic.c | 22 +++++++++++++---------
-file changed, 13 insertions(+), 9 deletions(-)
-diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/intc/armv7m_nvic.c
-+++ b/hw/intc/armv7m_nvic.c
-@@ -XXX,XX +XXX,XX @@ static MemTxResult nvic_sysreg_write(void *opaque, hwaddr addr,
-             }
-         }
-         nvic_irq_update(s);
--        return MEMTX_OK;
-+        goto exit_ok;
-     case 0x200 ... 0x23f: /* NVIC Set pend */
-         /* the special logic in armv7m_nvic_set_pending()
-          * is not needed since IRQs are never escalated
-@@ -XXX,XX +XXX,XX @@ static MemTxResult nvic_sysreg_write(void *opaque, hwaddr addr,
-             }
-         }
-         nvic_irq_update(s);
--        return MEMTX_OK;
-+        goto exit_ok;
-     case 0x300 ... 0x33f: /* NVIC Active */
--        return MEMTX_OK; /* R/O */
-+        goto exit_ok; /* R/O */
-     case 0x400 ... 0x5ef: /* NVIC Priority */
-         startvec = (offset - 0x400) + NVIC_FIRST_IRQ; /* vector # */
-@@ -XXX,XX +XXX,XX @@ static MemTxResult nvic_sysreg_write(void *opaque, hwaddr addr,
-             }
-         }
-         nvic_irq_update(s);
--        return MEMTX_OK;
-+        goto exit_ok;
-     case 0xd18 ... 0xd1b: /* System Handler Priority (SHPR1) */
-         if (!arm_feature(&s->cpu->env, ARM_FEATURE_M_MAIN)) {
--            return MEMTX_OK;
-+            goto exit_ok;
-         }
-         /* fall through */
-     case 0xd1c ... 0xd23: /* System Handler Priority (SHPR2, SHPR3) */
-@@ -XXX,XX +XXX,XX @@ static MemTxResult nvic_sysreg_write(void *opaque, hwaddr addr,
-             set_prio(s, hdlidx, sbank, newprio);
-         }
-         nvic_irq_update(s);
--        return MEMTX_OK;
-+        goto exit_ok;
-     case 0xd28 ... 0xd2b: /* Configurable Fault Status (CFSR) */
-         if (!arm_feature(&s->cpu->env, ARM_FEATURE_M_MAIN)) {
--            return MEMTX_OK;
-+            goto exit_ok;
-         }
-         /* All bits are W1C, so construct 32 bit value with 0s in
-          * the parts not written by the access size
-@@ -XXX,XX +XXX,XX @@ static MemTxResult nvic_sysreg_write(void *opaque, hwaddr addr,
-              */
-             s->cpu->env.v7m.cfsr[M_REG_NS] &= ~(value & R_V7M_CFSR_BFSR_MASK);
-         }
--        return MEMTX_OK;
-+        goto exit_ok;
-     }
-     if (size == 4) {
-         nvic_writel(s, offset, value, attrs);
--        return MEMTX_OK;
-+        goto exit_ok;
-     }
-     qemu_log_mask(LOG_GUEST_ERROR,
-                   "NVIC: Bad write of size %d at offset 0x%x\n", size, offset);
-     /* This is UNPREDICTABLE; treat as RAZ/WI */
-+
-+ exit_ok:
-+    /* Ensure any changes made are reflected in the cached hflags.  */
-+    arm_rebuild_hflags(&s->cpu->env);
-     return MEMTX_OK;
- }
---
-.20.1

-[PULL 25/51] linux-user/arm: Rebuild hflags for TARGET_WORDS_BIGENDIAN
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Continue setting, but not relying upon, env->hflags.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-24-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- linux-user/arm/cpu_loop.c | 1 +
-file changed, 1 insertion(+)
-diff --git a/linux-user/arm/cpu_loop.c b/linux-user/arm/cpu_loop.c
-index XXXXXXX..XXXXXXX 100644
---- a/linux-user/arm/cpu_loop.c
-+++ b/linux-user/arm/cpu_loop.c
-@@ -XXX,XX +XXX,XX @@ void target_cpu_copy_regs(CPUArchState *env, struct target_pt_regs *regs)
-     } else {
-         env->cp15.sctlr_el[1] |= SCTLR_B;
-     }
-+    arm_rebuild_hflags(env);
- #endif
-     ts->stack_base = info->start_stack;
---
-.20.1

-[PULL 26/51] target/arm: Rely on hflags correct in cpu_get_tb_cpu_state
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-This is the payoff.
-From perf record -g data of ubuntu 18 boot and shutdown:
-BEFORE:
--   23.02%     2.82%  qemu-system-aar  [.] helper_lookup_tb_ptr
-   - 20.22% helper_lookup_tb_ptr
-      + 10.05% tb_htable_lookup
-      - 9.13% cpu_get_tb_cpu_state
-.20% aa64_va_parameters_both
-.55% fp_exception_el
--   11.66%     4.74%  qemu-system-aar  [.] cpu_get_tb_cpu_state
-   - 6.96% cpu_get_tb_cpu_state
-.63% aa64_va_parameters_both
-.60% fp_exception_el
-.53% sve_exception_el
-AFTER:
--   16.40%     3.40%  qemu-system-aar  [.] helper_lookup_tb_ptr
-   - 13.03% helper_lookup_tb_ptr
-      + 11.19% tb_htable_lookup
-.55% cpu_get_tb_cpu_state
-.98%     0.71%  qemu-system-aar  [.] cpu_get_tb_cpu_state
-.87%     0.24%  qemu-system-aar  [.] rebuild_hflags_a64
-Before, helper_lookup_tb_ptr is the second hottest function in the
-application, consuming almost a quarter of the runtime.  Within the
-entire execution, cpu_get_tb_cpu_state consumes about 12%.
-After, helper_lookup_tb_ptr has dropped to the fourth hottest function,
-with consumption dropping to a sixth of the runtime.  Within the
-entire execution, cpu_get_tb_cpu_state has dropped below 1%, and the
-supporting function to rebuild hflags also consumes about 1%.
-Assertions are retained for --enable-debug-tcg.
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191023150057.25731-25-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/helper.c | 9 ++++++---
-file changed, 6 insertions(+), 3 deletions(-)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(rebuild_hflags_a64)(CPUARMState *env, int el)
- void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
-                           target_ulong *cs_base, uint32_t *pflags)
- {
--    uint32_t flags, pstate_for_ss;
-+    uint32_t flags = env->hflags;
-+    uint32_t pstate_for_ss;
-     *cs_base = 0;
--    flags = rebuild_hflags_internal(env);
-+#ifdef CONFIG_DEBUG_TCG
-+    assert(flags == rebuild_hflags_internal(env));
-+#endif
--    if (is_a64(env)) {
-+    if (FIELD_EX32(flags, TBFLAG_ANY, AARCH64_STATE)) {
-         *pc = env->pc;
-         if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
-             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
---
-.20.1
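
The pattern described in the commit message above (read a cached flags word
on the hot path, rebuild it whenever the underlying state changes, and
cross-check it against a full recomputation only in debug builds) can be
sketched as follows. This is an illustration under assumptions, not the
target/arm implementation: the struct, the field layout, the `TOY_DEBUG`
macro and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU state: two inputs and one cached, derived flags word. */
struct toy_env {
    uint32_t mode;
    uint32_t features;
    uint32_t hflags;    /* cached result of toy_compute_flags() */
};

/* The slow path: derive the flags from scratch. */
uint32_t toy_compute_flags(const struct toy_env *env)
{
    return (env->mode << 8) | (env->features & 0xff);
}

/* Call after every write that can change an input of the flags. */
void toy_rebuild_hflags(struct toy_env *env)
{
    env->hflags = toy_compute_flags(env);
}

/* A state write that keeps the cache coherent. */
void toy_set_mode(struct toy_env *env, uint32_t mode)
{
    env->mode = mode;
    toy_rebuild_hflags(env);
}

/* The hot path: trust the cache, verify it only in debug builds. */
uint32_t toy_get_flags(const struct toy_env *env)
{
#ifdef TOY_DEBUG
    assert(env->hflags == toy_compute_flags(env));
#endif
    return env->hflags;
}
```

The design trades a rebuild at each (rare) state write for a cheap load at
each (frequent) lookup; the debug assertion catches any write path that
forgot to rebuild.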

-[PULL 27/51] hw/net/fsl_etsec/etsec.c: Switch to transaction-based ptimer API
+Deleted patch
-Switch the fsl_etsec code away from bottom-half based ptimers to
-the new transaction-based ptimer API.  This just requires adding
-begin/commit calls around the various places that modify the ptimer
-state, and using the new ptimer_init() function to create the timer.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
-Message-id: 20191017132122.4402-2-peter.maydell@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/net/fsl_etsec/etsec.h | 1 -
- hw/net/fsl_etsec/etsec.c | 9 +++++----
-files changed, 5 insertions(+), 5 deletions(-)
-diff --git a/hw/net/fsl_etsec/etsec.h b/hw/net/fsl_etsec/etsec.h
-index XXXXXXX..XXXXXXX 100644
---- a/hw/net/fsl_etsec/etsec.h
-+++ b/hw/net/fsl_etsec/etsec.h
-@@ -XXX,XX +XXX,XX @@ typedef struct eTSEC {
-     uint16_t phy_control;
-     /* Polling */
--    QEMUBH *bh;
-     struct ptimer_state *ptimer;
-     /* Whether we should flush the rx queue when buffer becomes available. */
-diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/net/fsl_etsec/etsec.c
-+++ b/hw/net/fsl_etsec/etsec.c
-@@ -XXX,XX +XXX,XX @@
- #include "etsec.h"
- #include "registers.h"
- #include "qemu/log.h"
--#include "qemu/main-loop.h"
- #include "qemu/module.h"
- /* #define HEX_DUMP */
-@@ -XXX,XX +XXX,XX @@ static void write_dmactrl(eTSEC          *etsec,
-     if (!(value & DMACTRL_WOP)) {
-         /* Start polling */
-+        ptimer_transaction_begin(etsec->ptimer);
-         ptimer_stop(etsec->ptimer);
-         ptimer_set_count(etsec->ptimer, 1);
-         ptimer_run(etsec->ptimer, 1);
-+        ptimer_transaction_commit(etsec->ptimer);
-     }
- }
-@@ -XXX,XX +XXX,XX @@ static void etsec_realize(DeviceState *dev, Error **errp)
-                               object_get_typename(OBJECT(dev)), dev->id, etsec);
-     qemu_format_nic_info_str(qemu_get_queue(etsec->nic), etsec->conf.macaddr.a);
--
--    etsec->bh     = qemu_bh_new(etsec_timer_hit, etsec);
--    etsec->ptimer = ptimer_init_with_bh(etsec->bh, PTIMER_POLICY_DEFAULT);
-+    etsec->ptimer = ptimer_init(etsec_timer_hit, etsec, PTIMER_POLICY_DEFAULT);
-+    ptimer_transaction_begin(etsec->ptimer);
-     ptimer_set_freq(etsec->ptimer, 100);
-+    ptimer_transaction_commit(etsec->ptimer);
- }
- static void etsec_instance_init(Object *obj)
---
-.20.1

-[PULL 37/51] target/arm: Allow SVE to be disabled via a CPU property
+[PULL 19/26] hw/display/omap_lcdc: Fix potential NULL pointer dereference
-From: Andrew Jones <drjones@redhat.com>
+From: AlexChen <alex.chen@huawei.com>
-Since 97a28b0eeac14 ("target/arm: Allow VFP and Neon to be disabled via
+In omap_update_display(), the pointer omap_lcd is dereferenced before
-a CPU property") we can disable the 'max' cpu model's VFP and neon
+being checked for validity, which may lead to a NULL pointer dereference.
-features, but there's no way to disable SVE. Add the 'sve=on|off'
+So move the assignment to surface after checking that the omap_lcd is valid
-property to give it that flexibility. We also rename
+and move surface_bits_per_pixel(surface) to after the surface assignment.
 cpu_max_get/set_sve_vq to cpu_max_get/set_sve_max_vq in order for them
 to follow the typical *_get/set_<property-name> pattern.
-Signed-off-by: Andrew Jones <drjones@redhat.com>
+Reported-by: Euler Robot <euler.robot@huawei.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Signed-off-by: AlexChen <alex.chen@huawei.com>
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
+Message-id: 5F9CDB8A.9000001@huawei.com
-Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Beata Michalska <beata.michalska@linaro.org>
 Message-id: 20191024121808.9612-4-drjones@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.c         |  3 ++-
+ hw/display/omap_lcdc.c | 10 +++++++---
- target/arm/cpu64.c       | 52 ++++++++++++++++++++++++++++++++++------
+file changed, 7 insertions(+), 3 deletions(-)
  target/arm/monitor.c     |  2 +-
  tests/arm-cpu-features.c |  1 +
 files changed, 49 insertions(+), 9 deletions(-)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/hw/display/omap_lcdc.c
-+++ b/target/arm/cpu.c
++++ b/hw/display/omap_lcdc.c
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
+@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
-         env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
+ static void omap_update_display(void *opaque)
          env->cp15.cptr_el[3] |= CPTR_EZ;
          /* with maximum vector length */
 -        env->vfp.zcr_el[1] = cpu->sve_max_vq - 1;
 +        env->vfp.zcr_el[1] = cpu_isar_feature(aa64_sve, cpu) ?
 +                             cpu->sve_max_vq - 1 : 0;
          env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
          env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
          /*
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
      define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
  }
 -static void cpu_max_get_sve_vq(Object *obj, Visitor *v, const char *name,
 -                               void *opaque, Error **errp)
 +static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
 +                                   void *opaque, Error **errp)
  {
-     ARMCPU *cpu = ARM_CPU(obj);
+     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
--    visit_type_uint32(v, name, &cpu->sve_max_vq, errp);
+-    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
-+    uint32_t value;
++    DisplaySurface *surface;
-+
+     draw_line_func draw_line;
-+    /* All vector lengths are disabled when SVE is off. */
+     int size, height, first, last;
-+    if (!cpu_isar_feature(aa64_sve, cpu)) {
+     int width, linesize, step, bpp, frame_offset;
-+        value = 0;
+     hwaddr frame_base;
-+    } else {
-+        value = cpu->sve_max_vq;
+-    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
-+    }
+-        !surface_bits_per_pixel(surface)) {
-+    visit_type_uint32(v, name, &value, errp);
++    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
  }
 -static void cpu_max_set_sve_vq(Object *obj, Visitor *v, const char *name,
 -                               void *opaque, Error **errp)
 +static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
 +                                   void *opaque, Error **errp)
  {
      ARMCPU *cpu = ARM_CPU(obj);
      Error *err = NULL;
@@ -XXX,XX +XXX,XX @@ static void cpu_max_set_sve_vq(Object *obj, Visitor *v, const char *name,
      error_propagate(errp, err);
  }
 +static void cpu_arm_get_sve(Object *obj, Visitor *v, const char *name,
 +                            void *opaque, Error **errp)
 +{
 +    ARMCPU *cpu = ARM_CPU(obj);
 +    bool value = cpu_isar_feature(aa64_sve, cpu);
 +
 +    visit_type_bool(v, name, &value, errp);
 +}
 +
 +static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
 +                            void *opaque, Error **errp)
 +{
 +    ARMCPU *cpu = ARM_CPU(obj);
 +    Error *err = NULL;
 +    bool value;
 +    uint64_t t;
 +
 +    visit_type_bool(v, name, &value, &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +
-+    t = cpu->isar.id_aa64pfr0;
++    surface = qemu_console_surface(omap_lcd->con);
-+    t = FIELD_DP64(t, ID_AA64PFR0, SVE, value);
++    if (!surface_bits_per_pixel(surface)) {
-+    cpu->isar.id_aa64pfr0 = t;
+         return;
 +}
 +
  /* -cpu max: if KVM is enabled, like -cpu host (best possible with this host);
   * otherwise, a CPU with as many features enabled as our emulation supports.
   * The version of '-cpu max' for qemu-system-arm is defined in cpu.c;
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
  #endif
          cpu->sve_max_vq = ARM_MAX_VQ;
 -        object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_vq,
 -                            cpu_max_set_sve_vq, NULL, NULL, &error_fatal);
 +        object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
 +                            cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
 +        object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
 +                            cpu_arm_set_sve, NULL, NULL, &error_fatal);
      }
- }
-diff --git a/target/arm/monitor.c b/target/arm/monitor.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/monitor.c
-+++ b/target/arm/monitor.c
-@@ -XXX,XX +XXX,XX @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
-  * then the order that considers those dependencies must be used.
-  */
- static const char *cpu_model_advertised_features[] = {
--    "aarch64", "pmu",
-+    "aarch64", "pmu", "sve",
-     NULL
- };
-diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/arm-cpu-features.c
-+++ b/tests/arm-cpu-features.c
-@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion(const void *data)
-     if (g_str_equal(qtest_get_arch(), "aarch64")) {
-         assert_has_feature(qts, "max", "aarch64");
-+        assert_has_feature(qts, "max", "sve");
-         assert_has_feature(qts, "cortex-a57", "pmu");
-         assert_has_feature(qts, "cortex-a57", "aarch64");
 --
 .20.1
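
The fix above follows a common pattern: validate a pointer first, and only
then read through it to initialise dependent locals. A minimal sketch of that
ordering follows. The types and function names are hypothetical and much
simpler than the omap_lcdc code, and the extra NULL check on the surface is
an addition of this sketch, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the display surface and panel state. */
struct toy_surface {
    int bits_per_pixel;
};

struct toy_panel {
    struct toy_surface *surface;
    int enable;
};

/* Returns true when an update should proceed. */
bool panel_can_update(const struct toy_panel *p)
{
    const struct toy_surface *s;    /* declared, but not yet initialised */

    if (!p || !p->enable) {
        return false;               /* p is never dereferenced when NULL */
    }

    s = p->surface;                 /* safe: p was validated above */
    if (!s || !s->bits_per_pixel) {
        return false;
    }
    return true;
}

/* Usage: a NULL panel is rejected instead of crashing. */
void panel_demo(void)
{
    struct toy_panel on = { .surface = &(struct toy_surface){ 16 }, .enable = 1 };

    assert(!panel_can_update(NULL));
    assert(panel_can_update(&on));
}
```

Initialising `s` in its declaration, as the pre-patch code did with
`surface`, would read `p->surface` before the NULL test ever runs.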

-[PULL 38/51] target/arm/cpu64: max cpu: Introduce sve<N> properties
+[PULL 20/26] hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
-From: Andrew Jones <drjones@redhat.com>
+From: AlexChen <alex.chen@huawei.com>
-Introduce cpu properties to give fine control over SVE vector lengths.
+In exynos4210_fimd_update(), the pointer s is dereferenced before
-We introduce a property for each valid length up to the current
+being checked for validity, which may lead to a NULL pointer dereference.
-maximum supported, which is 2048-bits. The properties are named, e.g.
+So move the assignment to global_width after checking that the s is valid.
 sve128, sve256, sve384, sve512, ..., where the number is the number of
 bits. See the updates to docs/arm-cpu-features.rst for a description
 of the semantics and for example uses.
-Note, as sve-max-vq is still present and we'd like to be able to
+Reported-by: Euler Robot <euler.robot@huawei.com>
-support qmp_query_cpu_model_expansion with guests launched with e.g.
+Signed-off-by: Alex Chen <alex.chen@huawei.com>
--cpu max,sve-max-vq=8 on their command lines, then we do allow
+Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-sve-max-vq and sve<N> properties to be provided at the same time, but
+Message-id: 5F9F8D88.9030102@huawei.com
 this is not recommended, and is why sve-max-vq is not mentioned in the
 document.  If sve-max-vq is provided then it enables all lengths smaller
 than and including the max and disables all lengths larger. It also has
 the side-effect that no larger lengths may be enabled and that the max
 itself cannot be disabled. Smaller non-power-of-two lengths may,
 however, be disabled, e.g. -cpu max,sve-max-vq=4,sve384=off provides a
 guest the vector lengths 128, 256, and 512 bits.
 This patch has been co-authored with Richard Henderson, who reworked
 the target/arm/cpu64.c changes in order to push all the validation and
 auto-enabling/disabling steps into the finalizer, resulting in a nice
 LOC reduction.
 Signed-off-by: Andrew Jones <drjones@redhat.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Eric Auger <eric.auger@redhat.com>
 Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
 Reviewed-by: Beata Michalska <beata.michalska@linaro.org>
 Message-id: 20191024121808.9612-5-drjones@redhat.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
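 Reviewer note (not part of the patch): the sve<N> / sve-max-vq rules
 described in the commit message can be sketched with plain 32-bit masks
 instead of QEMU bitmaps. This is a simplified model under stated
 assumptions, not the code in target/arm/cpu64.c: SVE itself is taken to
 be enabled, errors are collapsed to a -1 return value, and the names
 sve_finalize_sketch, VQ_BIT, LOW_MASK, pow2floor_u32 and highest_vq are
 hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VQ 16u                        /* mirrors ARM_MAX_VQ in the patch */
#define VQ_BIT(vq)  (1u << ((vq) - 1))    /* bit for a length of vq quadwords */
#define LOW_MASK(n) ((1u << (n)) - 1u)    /* bits for vq = 1..n */

/* Largest power of two that is <= v (v must be non-zero). */
static uint32_t pow2floor_u32(uint32_t v)
{
    uint32_t p = 1;
    while (p <= v / 2) {
        p <<= 1;
    }
    return p;
}

/* Highest vq whose bit is set in map, or 0 if map is empty. */
static uint32_t highest_vq(uint32_t map)
{
    for (uint32_t vq = MAX_VQ; vq >= 1; vq--) {
        if (map & VQ_BIT(vq)) {
            return vq;
        }
    }
    return 0;
}

/*
 * map:  lengths explicitly switched on with sve<N>=on
 * init: lengths mentioned at all (on or off)
 * sve_max_vq: value of sve-max-vq, or 0 if it was not given
 * Returns the final mask of enabled lengths, or -1 for an invalid request.
 */
int32_t sve_finalize_sketch(uint32_t map, uint32_t init, uint32_t sve_max_vq)
{
    uint32_t vq, max_vq = 0;

    if (map != 0) {
        /* Some length was enabled: everything unmentioned defaults to off. */
        max_vq = highest_vq(map);
        if (sve_max_vq && max_vq > sve_max_vq) {
            return -1;                    /* larger than sve-max-vq */
        }
        /* Auto-enable required powers of two unless explicitly disabled. */
        for (vq = pow2floor_u32(max_vq); vq >= 1; vq >>= 1) {
            if (!(init & VQ_BIT(vq))) {
                map |= VQ_BIT(vq);
            }
        }
    } else if (sve_max_vq == 0) {
        if (init & VQ_BIT(1)) {
            return -1;                    /* sve128=off would leave nothing */
        }
        /* A disabled power of two also disables every larger length. */
        for (vq = 2; vq <= MAX_VQ; vq <<= 1) {
            if (init & VQ_BIT(vq)) {
                break;
            }
        }
        max_vq = vq <= MAX_VQ ? vq - 1 : MAX_VQ;
        map = ~init & LOW_MASK(max_vq);
        max_vq = highest_vq(map);
    }

    if (sve_max_vq != 0) {
        max_vq = sve_max_vq;
        if (!(map & VQ_BIT(max_vq)) && (init & VQ_BIT(max_vq))) {
            return -1;                    /* the maximum itself was disabled */
        }
        /* Enable everything up to the maximum that was not mentioned. */
        map |= ~init & LOW_MASK(max_vq);
    }

    assert(max_vq != 0);
    map &= LOW_MASK(max_vq);

    /* Every power of two up to the maximum must end up enabled. */
    for (vq = pow2floor_u32(max_vq); vq >= 1; vq >>= 1) {
        if (!(map & VQ_BIT(vq))) {
            return -1;
        }
    }
    return (int32_t)map;
}
```

 In this model, sve_finalize_sketch(0, 0x4, 4) stands for
 -cpu max,sve-max-vq=4,sve384=off and yields 0xb, i.e. the 128-, 256-
 and 512-bit lengths named in the commit message.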
- include/qemu/bitops.h     |   1 +
+ hw/display/exynos4210_fimd.c | 4 +++-
- target/arm/cpu.h          |  19 ++++
+file changed, 3 insertions(+), 1 deletion(-)
  target/arm/cpu.c          |  19 ++++
  target/arm/cpu64.c        | 192 ++++++++++++++++++++++++++++++++++++-
  target/arm/helper.c       |  10 +-
  target/arm/monitor.c      |  12 +++
  tests/arm-cpu-features.c  | 194 ++++++++++++++++++++++++++++++++++++++
  docs/arm-cpu-features.rst | 168 +++++++++++++++++++++++++++++++--
 files changed, 606 insertions(+), 9 deletions(-)
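 Reviewer note (not part of the patch): the target/arm/helper.c change
 clamps the ZCR length field to the nearest enabled vector length at or
 below the requested one. A minimal sketch of that lookup follows, again
 using a plain 32-bit mask in place of cpu->sve_vq_map. The names
 vq_map_next_smaller and zcr_valid_len are hypothetical stand-ins for
 arm_cpu_vq_map_next_smaller() and sve_zcr_get_valid_len(), and the
 sketch assumes the 128-bit length (bit 0) is always enabled, which the
 finalizer guarantees whenever SVE is on.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VQ 16u   /* mirrors ARM_MAX_VQ in the patch */

/*
 * Return the largest enabled vq strictly smaller than 'vq', or 0 if
 * there is none.  vq may be MAX_VQ + 1 so that callers can ask for the
 * largest enabled length overall.
 */
uint32_t vq_map_next_smaller(uint32_t map, uint32_t vq)
{
    assert(vq && vq <= MAX_VQ + 1);

    for (uint32_t cand = vq - 1; cand >= 1; cand--) {
        if (map & (1u << (cand - 1))) {
            return cand;
        }
    }
    return 0;
}

/*
 * Map a requested ZCR LEN value (vq - 1, 4 bits) to the LEN of the
 * largest enabled length that does not exceed it.
 */
uint32_t zcr_valid_len(uint32_t map, uint32_t start_len)
{
    uint32_t start_vq = (start_len & 0xf) + 1;

    /* Asking for "smaller than start_vq + 1" includes start_vq itself. */
    return vq_map_next_smaller(map, start_vq + 1) - 1;
}
```

 With the mask 0xb (128, 256 and 512 bits enabled), a request for
 LEN = 2 (384 bits) falls back to LEN = 1 (256 bits), and a request for
 LEN = 0xf (2048 bits) falls back to LEN = 3 (512 bits).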
-diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
+diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/qemu/bitops.h
+--- a/hw/display/exynos4210_fimd.c
-+++ b/include/qemu/bitops.h
++++ b/hw/display/exynos4210_fimd.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
- #define BITS_PER_LONG           (sizeof (unsigned long) * BITS_PER_BYTE)
+     bool blend = false;
+     uint8_t *host_fb_addr;
- #define BIT(nr)                 (1UL << (nr))
+     bool is_dirty = false;
-+#define BIT_ULL(nr)             (1ULL << (nr))
+-    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
- #define BIT_MASK(nr)            (1UL << ((nr) % BITS_PER_LONG))
++    int global_width;
- #define BIT_WORD(nr)            ((nr) / BITS_PER_LONG)
- #define BITS_TO_LONGS(nr)       DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
+     if (!s || !s->console || !s->enabled ||
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+         surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct {
  #ifdef TARGET_AARCH64
  # define ARM_MAX_VQ    16
 +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp);
 +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq);
  #else
  # define ARM_MAX_VQ    1
 +static inline void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) { }
 +static inline uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
 +{ return 0; }
  #endif
  typedef struct ARMVectorReg {
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
      /* Used to set the maximum vector length the cpu will support.  */
      uint32_t sve_max_vq;
 +
 +    /*
 +     * In sve_vq_map each set bit is a supported vector length of
 +     * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector
 +     * length in quadwords.
 +     *
 +     * While processing properties during initialization, corresponding
 +     * sve_vq_init bits are set for bits in sve_vq_map that have been
 +     * set by properties.
 +     */
 +    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ);
 +    DECLARE_BITMAP(sve_vq_init, ARM_MAX_VQ);
  };
  void arm_cpu_post_init(Object *obj);
@@ -XXX,XX +XXX,XX @@ static inline int arm_feature(CPUARMState *env, int feature)
      return (env->features & (1ULL << feature)) != 0;
  }
 +void arm_cpu_finalize_features(ARMCPU *cpu, Error **errp);
 +
  #if !defined(CONFIG_USER_ONLY)
  /* Return true if exception levels below EL3 are in secure state,
   * or would be following an exception return to that level.
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_finalizefn(Object *obj)
  #endif
  }
 +void arm_cpu_finalize_features(ARMCPU *cpu, Error **errp)
 +{
 +    Error *local_err = NULL;
 +
 +    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
 +        arm_cpu_sve_finalize(cpu, &local_err);
 +        if (local_err != NULL) {
 +            error_propagate(errp, local_err);
 +            return;
 +        }
 +    }
 +}
 +
  static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
  {
      CPUState *cs = CPU(dev);
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
          return;
      }
-+    arm_cpu_finalize_features(cpu, &local_err);
-+    if (local_err != NULL) {
-+        error_propagate(errp, local_err);
-+        return;
-+    }
 +
-     if (arm_feature(env, ARM_FEATURE_AARCH64) &&
++    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
-         cpu->has_vfp != cpu->has_neon) {
+     exynos4210_update_resolution(s);
-         /*
+     surface = qemu_console_surface(s->console);
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
      define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
  }
 +void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
 +{
 +    /*
 +     * If any vector lengths are explicitly enabled with sve<N> properties,
 +     * then all other lengths are implicitly disabled.  If sve-max-vq is
 +     * specified then it is the same as explicitly enabling all lengths
 +     * up to and including the specified maximum, which means all larger
 +     * lengths will be implicitly disabled.  If no sve<N> properties
 +     * are enabled and sve-max-vq is not specified, then all lengths not
 +     * explicitly disabled will be enabled.  Additionally, all power-of-two
 +     * vector lengths less than the maximum enabled length will be
 +     * automatically enabled and all vector lengths larger than the largest
 +     * disabled power-of-two vector length will be automatically disabled.
 +     * Errors are generated if the user provided input that interferes with
 +     * any of the above.  Finally, if SVE is not disabled, then at least one
 +     * vector length must be enabled.
 +     */
 +    DECLARE_BITMAP(tmp, ARM_MAX_VQ);
 +    uint32_t vq, max_vq = 0;
 +
 +    /*
 +     * Process explicit sve<N> properties.
 +     * From the properties, sve_vq_map<N> implies sve_vq_init<N>.
 +     * Check first for any sve<N> enabled.
 +     */
 +    if (!bitmap_empty(cpu->sve_vq_map, ARM_MAX_VQ)) {
 +        max_vq = find_last_bit(cpu->sve_vq_map, ARM_MAX_VQ) + 1;
 +
 +        if (cpu->sve_max_vq && max_vq > cpu->sve_max_vq) {
 +            error_setg(errp, "cannot enable sve%d", max_vq * 128);
 +            error_append_hint(errp, "sve%d is larger than the maximum vector "
 +                              "length, sve-max-vq=%d (%d bits)\n",
 +                              max_vq * 128, cpu->sve_max_vq,
 +                              cpu->sve_max_vq * 128);
 +            return;
 +        }
 +
 +        /* Propagate enabled bits down through required powers-of-two. */
 +        for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
 +            if (!test_bit(vq - 1, cpu->sve_vq_init)) {
 +                set_bit(vq - 1, cpu->sve_vq_map);
 +            }
 +        }
 +    } else if (cpu->sve_max_vq == 0) {
 +        /*
 +         * No explicit bits enabled, and no implicit bits from sve-max-vq.
 +         */
 +        if (!cpu_isar_feature(aa64_sve, cpu)) {
 +            /* SVE is disabled and so are all vector lengths.  Good. */
 +            return;
 +        }
 +
 +        /* Disabling a power-of-two disables all larger lengths. */
 +        if (test_bit(0, cpu->sve_vq_init)) {
 +            error_setg(errp, "cannot disable sve128");
 +            error_append_hint(errp, "Disabling sve128 results in all vector "
 +                              "lengths being disabled.\n");
 +            error_append_hint(errp, "With SVE enabled, at least one vector "
 +                              "length must be enabled.\n");
 +            return;
 +        }
 +        for (vq = 2; vq <= ARM_MAX_VQ; vq <<= 1) {
 +            if (test_bit(vq - 1, cpu->sve_vq_init)) {
 +                break;
 +            }
 +        }
 +        max_vq = vq <= ARM_MAX_VQ ? vq - 1 : ARM_MAX_VQ;
 +
 +        bitmap_complement(cpu->sve_vq_map, cpu->sve_vq_init, max_vq);
 +        max_vq = find_last_bit(cpu->sve_vq_map, max_vq) + 1;
 +    }
 +
 +    /*
 +     * Process the sve-max-vq property.
 +     * Note that we know from the above that no bit above
 +     * sve-max-vq is currently set.
 +     */
 +    if (cpu->sve_max_vq != 0) {
 +        max_vq = cpu->sve_max_vq;
 +
 +        if (!test_bit(max_vq - 1, cpu->sve_vq_map) &&
 +            test_bit(max_vq - 1, cpu->sve_vq_init)) {
 +            error_setg(errp, "cannot disable sve%d", max_vq * 128);
 +            error_append_hint(errp, "The maximum vector length must be "
 +                              "enabled, sve-max-vq=%d (%d bits)\n",
 +                              max_vq, max_vq * 128);
 +            return;
 +        }
 +
 +        /* Set all bits not explicitly set within sve-max-vq. */
 +        bitmap_complement(tmp, cpu->sve_vq_init, max_vq);
 +        bitmap_or(cpu->sve_vq_map, cpu->sve_vq_map, tmp, max_vq);
 +    }
 +
 +    /*
 +     * We should know what max-vq is now.  Also, as we're done
 +     * manipulating sve-vq-map, we ensure any bits above max-vq
 +     * are clear, just in case anybody looks.
 +     */
 +    assert(max_vq != 0);
 +    bitmap_clear(cpu->sve_vq_map, max_vq, ARM_MAX_VQ - max_vq);
 +
 +    /* Ensure all required powers-of-two are enabled. */
 +    for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
 +        if (!test_bit(vq - 1, cpu->sve_vq_map)) {
 +            error_setg(errp, "cannot disable sve%d", vq * 128);
 +            error_append_hint(errp, "sve%d is required as it "
 +                              "is a power-of-two length smaller than "
 +                              "the maximum, sve%d\n",
 +                              vq * 128, max_vq * 128);
 +            return;
 +        }
 +    }
 +
 +    /*
 +     * Now that we validated all our vector lengths, the only question
 +     * left to answer is if we even want SVE at all.
 +     */
 +    if (!cpu_isar_feature(aa64_sve, cpu)) {
 +        error_setg(errp, "cannot enable sve%d", max_vq * 128);
 +        error_append_hint(errp, "SVE must be enabled to enable vector "
 +                          "lengths.\n");
 +        error_append_hint(errp, "Add sve=on to the CPU property list.\n");
 +        return;
 +    }
 +
 +    /* From now on sve_max_vq is the actual maximum supported length. */
 +    cpu->sve_max_vq = max_vq;
 +}
 +
 +uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
 +{
 +    uint32_t bitnum;
 +
 +    /*
 +     * We allow vq == ARM_MAX_VQ + 1 to be input because the caller may want
 +     * to find the maximum vq enabled, which may be ARM_MAX_VQ, but this
 +     * function always returns the next smaller than the input.
 +     */
 +    assert(vq && vq <= ARM_MAX_VQ + 1);
 +
 +    bitnum = find_last_bit(cpu->sve_vq_map, vq - 1);
 +    return bitnum == vq - 1 ? 0 : bitnum + 1;
 +}
 +
  static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
                                     void *opaque, Error **errp)
  {
@@ -XXX,XX +XXX,XX @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
      error_propagate(errp, err);
  }
 +static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
 +                               void *opaque, Error **errp)
 +{
 +    ARMCPU *cpu = ARM_CPU(obj);
 +    uint32_t vq = atoi(&name[3]) / 128;
 +    bool value;
 +
 +    /* All vector lengths are disabled when SVE is off. */
 +    if (!cpu_isar_feature(aa64_sve, cpu)) {
 +        value = false;
 +    } else {
 +        value = test_bit(vq - 1, cpu->sve_vq_map);
 +    }
 +    visit_type_bool(v, name, &value, errp);
 +}
 +
 +static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
 +                               void *opaque, Error **errp)
 +{
 +    ARMCPU *cpu = ARM_CPU(obj);
 +    uint32_t vq = atoi(&name[3]) / 128;
 +    Error *err = NULL;
 +    bool value;
 +
 +    visit_type_bool(v, name, &value, &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
 +    }
 +
 +    if (value) {
 +        set_bit(vq - 1, cpu->sve_vq_map);
 +    } else {
 +        clear_bit(vq - 1, cpu->sve_vq_map);
 +    }
 +    set_bit(vq - 1, cpu->sve_vq_init);
 +}
 +
  static void cpu_arm_get_sve(Object *obj, Visitor *v, const char *name,
                              void *opaque, Error **errp)
  {
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
  static void aarch64_max_initfn(Object *obj)
  {
      ARMCPU *cpu = ARM_CPU(obj);
 +    uint32_t vq;
      if (kvm_enabled()) {
          kvm_arm_set_cpu_features_from_host(cpu);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          cpu->dcz_blocksize = 7; /*  512 bytes */
  #endif
 -        cpu->sve_max_vq = ARM_MAX_VQ;
          object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
                              cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
          object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
                              cpu_arm_set_sve, NULL, NULL, &error_fatal);
 +
 +        for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
 +            char name[8];
 +            sprintf(name, "sve%d", vq * 128);
 +            object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
 +                                cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
 +        }
      }
  }
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
      return 0;
  }
 +static uint32_t sve_zcr_get_valid_len(ARMCPU *cpu, uint32_t start_len)
 +{
 +    uint32_t start_vq = (start_len & 0xf) + 1;
 +
 +    return arm_cpu_vq_map_next_smaller(cpu, start_vq + 1) - 1;
 +}
 +
  /*
   * Given that SVE is enabled, return the vector length for EL.
   */
@@ -XXX,XX +XXX,XX @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
      if (arm_feature(env, ARM_FEATURE_EL3)) {
          zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
      }
 -    return zcr_len;
 +
 +    return sve_zcr_get_valid_len(cpu, zcr_len);
  }
  static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 diff --git a/target/arm/monitor.c b/target/arm/monitor.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/monitor.c
 +++ b/target/arm/monitor.c
@@ -XXX,XX +XXX,XX @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
      return head;
  }
 +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
 +
  /*
   * These are cpu model features we want to advertise. The order here
   * matters as this is the order in which qmp_query_cpu_model_expansion
@@ -XXX,XX +XXX,XX @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
   */
  static const char *cpu_model_advertised_features[] = {
      "aarch64", "pmu", "sve",
 +    "sve128", "sve256", "sve384", "sve512",
 +    "sve640", "sve768", "sve896", "sve1024", "sve1152", "sve1280",
 +    "sve1408", "sve1536", "sve1664", "sve1792", "sve1920", "sve2048",
      NULL
  };
@@ -XXX,XX +XXX,XX @@ CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
          if (!err) {
              visit_check_struct(visitor, &err);
          }
 +        if (!err) {
 +            arm_cpu_finalize_features(ARM_CPU(obj), &err);
 +        }
          visit_end_struct(visitor, NULL);
          visit_free(visitor);
          if (err) {
@@ -XXX,XX +XXX,XX @@ CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
              error_propagate(errp, err);
              return NULL;
          }
 +    } else {
 +        Error *err = NULL;
 +        arm_cpu_finalize_features(ARM_CPU(obj), &err);
 +        assert(err == NULL);
      }
      expansion_info = g_new0(CpuModelExpansionInfo, 1);
 diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/arm-cpu-features.c
 +++ b/tests/arm-cpu-features.c
@@ -XXX,XX +XXX,XX @@
   * See the COPYING file in the top-level directory.
   */
  #include "qemu/osdep.h"
 +#include "qemu/bitops.h"
  #include "libqtest.h"
  #include "qapi/qmp/qdict.h"
  #include "qapi/qmp/qjson.h"
 +/*
 + * We expect the SVE max-vq to be 16. Also it must be <= 64
 + * for our test code, otherwise 'vls' can't just be a uint64_t.
 + */
 +#define SVE_MAX_VQ 16
 +
  #define MACHINE    "-machine virt,gic-version=max "
  #define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \
                       "'arguments': { 'type': 'full', "
@@ -XXX,XX +XXX,XX @@ static void assert_bad_props(QTestState *qts, const char *cpu_type)
      qobject_unref(resp);
  }
 +static uint64_t resp_get_sve_vls(QDict *resp)
 +{
 +    QDict *props;
 +    const QDictEntry *e;
 +    uint64_t vls = 0;
 +    int n = 0;
 +
 +    g_assert(resp);
 +    g_assert(resp_has_props(resp));
 +
 +    props = resp_get_props(resp);
 +
 +    for (e = qdict_first(props); e; e = qdict_next(props, e)) {
 +        if (strlen(e->key) > 3 && !strncmp(e->key, "sve", 3) &&
 +            g_ascii_isdigit(e->key[3])) {
 +            char *endptr;
 +            int bits;
 +
 +            bits = g_ascii_strtoll(&e->key[3], &endptr, 10);
 +            if (!bits || *endptr != '\0') {
 +                continue;
 +            }
 +
 +            if (qdict_get_bool(props, e->key)) {
 +                vls |= BIT_ULL((bits / 128) - 1);
 +            }
 +            ++n;
 +        }
 +    }
 +
 +    g_assert(n == SVE_MAX_VQ);
 +
 +    return vls;
 +}
 +
 +#define assert_sve_vls(qts, cpu_type, expected_vls, fmt, ...)          \
 +({                                                                     \
 +    QDict *_resp = do_query(qts, cpu_type, fmt, ##__VA_ARGS__);        \
 +    g_assert(_resp);                                                   \
 +    g_assert(resp_has_props(_resp));                                   \
 +    g_assert(resp_get_sve_vls(_resp) == expected_vls);                 \
 +    qobject_unref(_resp);                                              \
 +})
 +
 +static void sve_tests_default(QTestState *qts, const char *cpu_type)
 +{
 +    /*
 +     * With no sve-max-vq or sve<N> properties on the command line
 +     * the default is to have all vector lengths enabled. This also
 +     * tests that 'sve' is 'on' by default.
 +     */
 +    assert_sve_vls(qts, cpu_type, BIT_ULL(SVE_MAX_VQ) - 1, NULL);
 +
 +    /* With SVE off, all vector lengths should also be off. */
 +    assert_sve_vls(qts, cpu_type, 0, "{ 'sve': false }");
 +
 +    /* With SVE on, we must have at least one vector length enabled. */
 +    assert_error(qts, cpu_type, "cannot disable sve128", "{ 'sve128': false }");
 +
 +    /* Basic enable/disable tests. */
 +    assert_sve_vls(qts, cpu_type, 0x7, "{ 'sve384': true }");
 +    assert_sve_vls(qts, cpu_type, ((BIT_ULL(SVE_MAX_VQ) - 1) & ~BIT_ULL(2)),
 +                   "{ 'sve384': false }");
 +
 +    /*
 +     * ---------------------------------------------------------------------
 +     *               power-of-two(vq)   all-power-            can      can
 +     *                                  of-two(< vq)        enable   disable
 +     * ---------------------------------------------------------------------
 +     * vq < max_vq      no                MUST*              yes      yes
 +     * vq < max_vq      yes               MUST*              yes      no
 +     * ---------------------------------------------------------------------
 +     * vq == max_vq     n/a               MUST*              yes**    yes**
 +     * ---------------------------------------------------------------------
 +     * vq > max_vq      n/a               no                 no       yes
 +     * vq > max_vq      n/a               yes                yes      yes
 +     * ---------------------------------------------------------------------
 +     *
 +     * [*] "MUST" means this requirement must already be satisfied,
 +     *     otherwise 'max_vq' couldn't itself be enabled.
 +     *
 +     * [**] Not testable with the QMP interface, only with the command line.
 +     */
 +
 +    /* max_vq := 8 */
 +    assert_sve_vls(qts, cpu_type, 0x8b, "{ 'sve1024': true }");
 +
 +    /* max_vq := 8, vq < max_vq, !power-of-two(vq) */
 +    assert_sve_vls(qts, cpu_type, 0x8f,
 +                   "{ 'sve1024': true, 'sve384': true }");
 +    assert_sve_vls(qts, cpu_type, 0x8b,
 +                   "{ 'sve1024': true, 'sve384': false }");
 +
 +    /* max_vq := 8, vq < max_vq, power-of-two(vq) */
 +    assert_sve_vls(qts, cpu_type, 0x8b,
 +                   "{ 'sve1024': true, 'sve256': true }");
 +    assert_error(qts, cpu_type, "cannot disable sve256",
 +                 "{ 'sve1024': true, 'sve256': false }");
 +
 +    /* max_vq := 3, vq > max_vq, !all-power-of-two(< vq) */
 +    assert_error(qts, cpu_type, "cannot disable sve512",
 +                 "{ 'sve384': true, 'sve512': false, 'sve640': true }");
 +
 +    /*
 +     * We can disable power-of-two vector lengths when all larger lengths
 +     * are also disabled. We only need to disable the power-of-two length,
 +     * as all non-enabled larger lengths will then be auto-disabled.
 +     */
 +    assert_sve_vls(qts, cpu_type, 0x7, "{ 'sve512': false }");
 +
 +    /* max_vq := 3, vq > max_vq, all-power-of-two(< vq) */
 +    assert_sve_vls(qts, cpu_type, 0x1f,
 +                   "{ 'sve384': true, 'sve512': true, 'sve640': true }");
 +    assert_sve_vls(qts, cpu_type, 0xf,
 +                   "{ 'sve384': true, 'sve512': true, 'sve640': false }");
 +}
 +
 +static void sve_tests_sve_max_vq_8(const void *data)
 +{
 +    QTestState *qts;
 +
 +    qts = qtest_init(MACHINE "-cpu max,sve-max-vq=8");
 +
 +    assert_sve_vls(qts, "max", BIT_ULL(8) - 1, NULL);
 +
 +    /*
 +     * Disabling the max-vq set by sve-max-vq is not allowed, but
 +     * of course enabling it is OK.
 +     */
 +    assert_error(qts, "max", "cannot disable sve1024", "{ 'sve1024': false }");
 +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1024': true }");
 +
 +    /*
 +     * Enabling anything larger than max-vq set by sve-max-vq is not
 +     * allowed, but of course disabling everything larger is OK.
 +     */
 +    assert_error(qts, "max", "cannot enable sve1152", "{ 'sve1152': true }");
 +    assert_sve_vls(qts, "max", 0xff, "{ 'sve1152': false }");
 +
 +    /*
 +     * We can enable/disable non power-of-two lengths smaller than the
 +     * max-vq set by sve-max-vq, but, while we can enable power-of-two
 +     * lengths, we can't disable them.
 +     */
 +    assert_sve_vls(qts, "max", 0xff, "{ 'sve384': true }");
 +    assert_sve_vls(qts, "max", 0xfb, "{ 'sve384': false }");
 +    assert_sve_vls(qts, "max", 0xff, "{ 'sve256': true }");
 +    assert_error(qts, "max", "cannot disable sve256", "{ 'sve256': false }");
 +
 +    qtest_quit(qts);
 +}
 +
 +static void sve_tests_sve_off(const void *data)
 +{
 +    QTestState *qts;
 +
 +    qts = qtest_init(MACHINE "-cpu max,sve=off");
 +
 +    /* SVE is off, so the map should be empty. */
 +    assert_sve_vls(qts, "max", 0, NULL);
 +
 +    /* The map stays empty even if we turn lengths off. */
 +    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
 +
 +    /* It's an error to enable lengths when SVE is off. */
 +    assert_error(qts, "max", "cannot enable sve128", "{ 'sve128': true }");
 +
 +    /* With SVE re-enabled we should get all vector lengths enabled. */
 +    assert_sve_vls(qts, "max", BIT_ULL(SVE_MAX_VQ) - 1, "{ 'sve': true }");
 +
 +    /* Or enable SVE with just specific vector lengths. */
 +    assert_sve_vls(qts, "max", 0x3,
 +                   "{ 'sve': true, 'sve128': true, 'sve256': true }");
 +
 +    qtest_quit(qts);
 +}
 +
  static void test_query_cpu_model_expansion(const void *data)
  {
      QTestState *qts;
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion(const void *data)
      if (g_str_equal(qtest_get_arch(), "aarch64")) {
          assert_has_feature(qts, "max", "aarch64");
          assert_has_feature(qts, "max", "sve");
 +        assert_has_feature(qts, "max", "sve128");
          assert_has_feature(qts, "cortex-a57", "pmu");
          assert_has_feature(qts, "cortex-a57", "aarch64");
 +        sve_tests_default(qts, "max");
 +
          /* Test that features that depend on KVM generate errors without. */
          assert_error(qts, "max",
                       "'aarch64' feature cannot be disabled "
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
      qtest_add_data_func("/arm/query-cpu-model-expansion",
                          NULL, test_query_cpu_model_expansion);
 +    if (g_str_equal(qtest_get_arch(), "aarch64")) {
 +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-max-vq-8",
 +                            NULL, sve_tests_sve_max_vq_8);
 +        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-off",
 +                            NULL, sve_tests_sve_off);
 +    }
 +
      if (kvm_available) {
          qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
                              NULL, test_query_cpu_model_expansion_kvm);
 diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/arm-cpu-features.rst
 +++ b/docs/arm-cpu-features.rst
@@ -XXX,XX +XXX,XX @@ block in the script for usage) is used to issue the QMP commands.
        (QEMU) query-cpu-model-expansion type=full model={"name":"max"}
        { "return": {
          "model": { "name": "max", "props": {
 -        "pmu": true, "aarch64": true
 +        "sve1664": true, "pmu": true, "sve1792": true, "sve1920": true,
 +        "sve128": true, "aarch64": true, "sve1024": true, "sve": true,
 +        "sve640": true, "sve768": true, "sve1408": true, "sve256": true,
 +        "sve1152": true, "sve512": true, "sve384": true, "sve1536": true,
 +        "sve896": true, "sve1280": true, "sve2048": true
        }}}}
 -We see that the `max` CPU type has the `pmu` and `aarch64` CPU features.
 -We also see that the CPU features are enabled, as they are all `true`.
 +We see that the `max` CPU type has the `pmu`, `aarch64`, `sve`, and many
 +`sve<N>` CPU features.  We also see that all the CPU features are
 +enabled, as they are all `true`.  (The `sve<N>` CPU features are all
 +optional SVE vector lengths (see "SVE CPU Properties").  While with TCG
 +all SVE vector lengths can be supported, when KVM is in use it's more
 +likely that only a few lengths will be supported, if SVE is supported at
 +all.)
  (2) Let's try to disable the PMU::
        (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"pmu":false}}
        { "return": {
          "model": { "name": "max", "props": {
 -        "pmu": false, "aarch64": true
 +        "sve1664": true, "pmu": false, "sve1792": true, "sve1920": true,
 +        "sve128": true, "aarch64": true, "sve1024": true, "sve": true,
 +        "sve640": true, "sve768": true, "sve1408": true, "sve256": true,
 +        "sve1152": true, "sve512": true, "sve384": true, "sve1536": true,
 +        "sve896": true, "sve1280": true, "sve2048": true
        }}}}
  We see it worked, as `pmu` is now `false`.
@@ -XXX,XX +XXX,XX @@ We see it worked, as `pmu` is now `false`.
  It looks like this feature is limited to a configuration we do not
  currently have.
 -(4) Let's try probing CPU features for the Cortex-A15 CPU type::
 +(4) Let's disable `sve` and see what happens to all the optional SVE
 +    vector lengths::
 +
 +      (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"sve":false}}
 +      { "return": {
 +        "model": { "name": "max", "props": {
 +        "sve1664": false, "pmu": true, "sve1792": false, "sve1920": false,
 +        "sve128": false, "aarch64": true, "sve1024": false, "sve": false,
 +        "sve640": false, "sve768": false, "sve1408": false, "sve256": false,
 +        "sve1152": false, "sve512": false, "sve384": false, "sve1536": false,
 +        "sve896": false, "sve1280": false, "sve2048": false
 +      }}}}
 +
 +As expected they are now all `false`.
 +
 +(5) Let's try probing CPU features for the Cortex-A15 CPU type::
        (QEMU) query-cpu-model-expansion type=full model={"name":"cortex-a15"}
        {"return": {"model": {"name": "cortex-a15", "props": {"pmu": true}}}}
@@ -XXX,XX +XXX,XX @@ After determining which CPU features are available and supported for a
  given CPU type, then they may be selectively enabled or disabled on the
  QEMU command line with that CPU type::
 -  $ qemu-system-aarch64 -M virt -cpu max,pmu=off
 +  $ qemu-system-aarch64 -M virt -cpu max,pmu=off,sve=on,sve128=on,sve256=on
 -The example above disables the PMU for the `max` CPU type.
 +The example above disables the PMU and enables the first two SVE vector
 +lengths for the `max` CPU type.  Note, the `sve=on` isn't actually
 +necessary, because, as we observed above with our probe of the `max` CPU
 +type, `sve` is already on by default.  Also, based on our probe of
 +defaults, it would seem we need to disable many SVE vector lengths, rather
 +than only enabling the two we want.  This isn't the case, because, as
 +disabling many SVE vector lengths would be quite verbose, the `sve<N>` CPU
 +properties have special semantics (see "SVE CPU Property Parsing
 +Semantics").
 +
 +SVE CPU Properties
 +==================
 +
 +There are two types of SVE CPU properties: `sve` and `sve<N>`.  The first
 +is used to enable or disable the entire SVE feature, just as the `pmu`
 +CPU property completely enables or disables the PMU.  The second type
 +is used to enable or disable specific vector lengths, where `N` is the
 +number of bits of the length.  The `sve<N>` CPU properties have special
 +dependencies and constraints, see "SVE CPU Property Dependencies and
 +Constraints" below.  Additionally, as we want all supported vector lengths
 +to be enabled by default, then, in order to avoid overly verbose command
 +lines (command lines full of `sve<N>=off`, for all `N` not wanted), we
 +provide the parsing semantics listed in "SVE CPU Property Parsing
 +Semantics".
 +
 +SVE CPU Property Dependencies and Constraints
 +---------------------------------------------
 +
 +  1) At least one vector length must be enabled when `sve` is enabled.
 +
 +  2) If a vector length `N` is enabled, then all power-of-two vector
 +     lengths smaller than `N` must also be enabled.  E.g. if `sve512`
 +     is enabled, then the 128-bit and 256-bit vector lengths must also
 +     be enabled.
 +
 +SVE CPU Property Parsing Semantics
 +----------------------------------
 +
 +  1) If SVE is disabled (`sve=off`), then which SVE vector lengths
 +     are enabled or disabled is irrelevant to the guest, as the entire
 +     SVE feature is disabled and that disables all vector lengths for
 +     the guest.  However QEMU will still track any `sve<N>` CPU
 +     properties provided by the user.  If later an `sve=on` is provided,
 +     then the guest will get only the enabled lengths.  If no `sve=on`
 +     is provided and there are explicitly enabled vector lengths, then
 +     an error is generated.
 +
 +  2) If SVE is enabled (`sve=on`), but no `sve<N>` CPU properties are
 +     provided, then all supported vector lengths are enabled, including
 +     the non-power-of-two lengths.
 +
 +  3) If SVE is enabled, then an error is generated when attempting to
 +     disable the last enabled vector length (see constraint (1) of "SVE
 +     CPU Property Dependencies and Constraints").
 +
 +  4) If one or more vector lengths have been explicitly enabled and
 +     at least one of the dependency lengths of the maximum enabled length
 +     has been explicitly disabled, then an error is generated (see
 +     constraint (2) of "SVE CPU Property Dependencies and Constraints").
 +
 +  5) If one or more `sve<N>` CPU properties are set `off`, but no `sve<N>`
 +     CPU properties are set `on`, then the specified vector lengths are
 +     disabled but the default for any unspecified lengths remains enabled.
 +     Disabling a power-of-two vector length also disables all vector
 +     lengths larger than the power-of-two length (see constraint (2) of
 +     "SVE CPU Property Dependencies and Constraints").
 +
 +  6) If one or more `sve<N>` CPU properties are set to `on`, then they
 +     are enabled and all unspecified lengths default to disabled, except
 +     for the required lengths per constraint (2) of "SVE CPU Property
 +     Dependencies and Constraints", which will even be auto-enabled if
 +     they were not explicitly enabled.
 +
 +  7) If SVE was disabled (`sve=off`), allowing all vector lengths to be
 +     explicitly disabled (i.e. avoiding the error specified in (3) of
 +     "SVE CPU Property Parsing Semantics"), then if later an `sve=on` is
 +     provided an error will be generated.  To avoid this error, one must
 +     enable at least one vector length prior to enabling SVE.
 +
 +SVE CPU Property Examples
 +-------------------------
 +
 +  1) Disable SVE::
 +
 +     $ qemu-system-aarch64 -M virt -cpu max,sve=off
 +
 +  2) Implicitly enable all vector lengths for the `max` CPU type::
 +
 +     $ qemu-system-aarch64 -M virt -cpu max
 +
 +  3) Only enable the 128-bit vector length::
 +
 +     $ qemu-system-aarch64 -M virt -cpu max,sve128=on
 +
 +  4) Disable the 512-bit vector length and all larger vector lengths,
 +     since 512 is a power-of-two.  This results in all the smaller,
 +     uninitialized lengths (128, 256, and 384) defaulting to enabled::
 +
 +     $ qemu-system-aarch64 -M virt -cpu max,sve512=off
 +
 +  5) Enable the 128-bit, 256-bit, and 512-bit vector lengths::
 +
 +     $ qemu-system-aarch64 -M virt -cpu max,sve128=on,sve256=on,sve512=on
 +
 +  6) The same as (5), but since the 128-bit and 256-bit vector
 +     lengths are required for the 512-bit vector length to be enabled,
 +     then allow them to be auto-enabled::
 +
 +     $ qemu-system-aarch64 -M virt -cpu max,sve512=on
 +
 +  7) Do the same as (6), but by first disabling SVE and then re-enabling it::
 +
 +     $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve512=on,sve=on
 +
 +  8) Force errors regarding the last vector length::
 +
 +     $ qemu-system-aarch64 -M virt -cpu max,sve128=off
 +     $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve128=off,sve=on
 +
 +SVE CPU Property Recommendations
 +--------------------------------
 +
 +The examples in "SVE CPU Property Examples" exhibit many ways to select
 +vector lengths which developers may find useful in order to avoid overly
 +verbose command lines.  However, the recommended way to select vector
 +lengths is to explicitly enable each desired length.  Therefore only
 +examples (1), (3), and (5) exhibit recommended uses of the properties.
 --
 .20.1
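
Reviewer's sketch for the documentation above: constraint (2) of "SVE CPU Property Dependencies and Constraints" and parsing rule (6) can be modelled with a small bitmap, where bit `vq - 1` stands for the `vq * 128`-bit length. This is a simplified illustration for the non-KVM case only, not the code in `arm_cpu_sve_finalize()`; `SKETCH_MAX_VQ`, `vq_map` and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit for this sketch: 16 quadwords, i.e. 2048 bits. */
#define SKETCH_MAX_VQ 16

/* Bit (vq - 1) set means the (vq * 128)-bit vector length is enabled. */
typedef uint32_t vq_map;

static vq_map vq_bit(unsigned vq)
{
    assert(vq >= 1 && vq <= SKETCH_MAX_VQ);
    return (vq_map)1 << (vq - 1);
}

/* Largest enabled length in quadwords, or 0 if nothing is enabled. */
static unsigned max_enabled_vq(vq_map map)
{
    unsigned vq;

    for (vq = SKETCH_MAX_VQ; vq >= 1; vq--) {
        if (map & vq_bit(vq)) {
            return vq;
        }
    }
    return 0;
}

/* Constraint (2): all power-of-two lengths below the maximum are enabled. */
static bool pow2_constraint_holds(vq_map map)
{
    unsigned max_vq = max_enabled_vq(map);
    unsigned vq;

    for (vq = 1; vq < max_vq; vq <<= 1) {
        if (!(map & vq_bit(vq))) {
            return false;
        }
    }
    return true;
}

/*
 * Rule (6): auto-enable the required power-of-two lengths, but only those
 * the user did not set explicitly (explicitly_set covers both on and off).
 */
static vq_map auto_enable_pow2(vq_map enabled, vq_map explicitly_set)
{
    unsigned max_vq = max_enabled_vq(enabled);
    unsigned vq;

    for (vq = 1; vq < max_vq; vq <<= 1) {
        if (!(explicitly_set & vq_bit(vq))) {
            enabled |= vq_bit(vq);
        }
    }
    return enabled;
}
```

With only `sve512=on` given, `auto_enable_pow2(vq_bit(4), vq_bit(4))` yields the 128-, 256- and 512-bit lengths, as in example (6). If `sve256=off` was also given, the 256-bit length stays disabled and `pow2_constraint_holds()` fails, which corresponds to the error case in rule (4).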

-[PULL 29/51] hw/dma/xilinx_axidma.c: Switch to transaction-based ptimer API
+[PULL 21/26] target/arm: Get correct MMU index for other-security-state
-Switch the xilinx_axidma code away from bottom-half based ptimers to
+In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
-the new transaction-based ptimer API.  This just requires adding
+armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
-begin/commit calls around the various places that modify the ptimer
+This is incorrect when the security state being queried is not the
-state, and using the new ptimer_init() function to create the timer.
+current one, because arm_current_el() uses the current security state
 to determine which of the banked CONTROL.nPRIV bits to look at.
 The effect was that if (for instance) Secure state was in privileged
 mode but Non-Secure was not then we would return the wrong MMU index.
 The only places where we are using this function in a way that could
 trigger this bug are for the stack loads during a v8M function-return
 and for the instruction fetch of a v8M SG insn.
 Fix the bug by expanding out the M-profile version of the
 arm_current_el() logic inline so it can use the passed in secstate
 rather than env->v7m.secure.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
 Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Message-id: 20191017132122.4402-4-peter.maydell@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/dma/xilinx_axidma.c | 9 +++++----
+ target/arm/m_helper.c | 3 ++-
-file changed, 5 insertions(+), 4 deletions(-)
+file changed, 2 insertions(+), 1 deletion(-)
-diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
+diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/dma/xilinx_axidma.c
+--- a/target/arm/m_helper.c
-+++ b/hw/dma/xilinx_axidma.c
++++ b/target/arm/m_helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
- #include "hw/ptimer.h"
+ /* Return the MMU index for a v7M CPU in the specified security state */
- #include "hw/qdev-properties.h"
+ ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
- #include "qemu/log.h"
+ {
--#include "qemu/main-loop.h"
+-    bool priv = arm_current_el(env) != 0;
- #include "qemu/module.h"
++    bool priv = arm_v7m_is_handler_mode(env) ||
++        !(env->v7m.control[secstate] & 1);
- #include "hw/stream.h"
-@@ -XXX,XX +XXX,XX @@ enum {
+     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
  };
  struct Stream {
 -    QEMUBH *bh;
      ptimer_state *ptimer;
      qemu_irq irq;
@@ -XXX,XX +XXX,XX @@ static void stream_complete(struct Stream *s)
      unsigned int comp_delay;
      /* Start the delayed timer.  */
 +    ptimer_transaction_begin(s->ptimer);
      comp_delay = s->regs[R_DMACR] >> 24;
      if (comp_delay) {
          ptimer_stop(s->ptimer);
@@ -XXX,XX +XXX,XX @@ static void stream_complete(struct Stream *s)
          s->regs[R_DMASR] |= DMASR_IOC_IRQ;
          stream_reload_complete_cnt(s);
      }
 +    ptimer_transaction_commit(s->ptimer);
  }
- static void stream_process_mem2s(struct Stream *s, StreamSlave *tx_data_dev,
-@@ -XXX,XX +XXX,XX @@ static void xilinx_axidma_realize(DeviceState *dev, Error **errp)
-         struct Stream *st = &s->streams[i];
-         st->nr = i;
--        st->bh = qemu_bh_new(timer_hit, st);
--        st->ptimer = ptimer_init_with_bh(st->bh, PTIMER_POLICY_DEFAULT);
-+        st->ptimer = ptimer_init(timer_hit, st, PTIMER_POLICY_DEFAULT);
-+        ptimer_transaction_begin(st->ptimer);
-         ptimer_set_freq(st->ptimer, s->freqhz);
-+        ptimer_transaction_commit(st->ptimer);
-     }
-     return;
 --
 .20.1
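
Reviewer's sketch for patch 21/26: the bug described in the commit message can be reproduced with a toy model of the banked CONTROL register. `ToyMState` and the two helper functions are hypothetical stand-ins, not QEMU types or functions; the only detail taken from the patch is that bit 0 of the banked CONTROL value (nPRIV) and handler mode together decide the privilege level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CONTROL.nPRIV is bit 0: when set, thread mode is unprivileged. */
#define CONTROL_NPRIV 1u

/* Toy stand-in for the relevant CPU state; not QEMU's CPUARMState. */
typedef struct {
    bool secure;          /* current security state */
    bool handler_mode;    /* handler mode is always privileged */
    uint32_t control[2];  /* banked CONTROL: [0] Non-secure, [1] Secure */
} ToyMState;

/* Old behaviour: always consults the bank of the *current* state. */
static bool priv_buggy(const ToyMState *s, bool secstate)
{
    assert(s != NULL);
    (void)secstate;  /* the requested state is ignored, which is the bug */
    return s->handler_mode || !(s->control[s->secure] & CONTROL_NPRIV);
}

/* Fixed behaviour: consults the bank of the state being queried. */
static bool priv_fixed(const ToyMState *s, bool secstate)
{
    assert(s != NULL);
    return s->handler_mode || !(s->control[secstate] & CONTROL_NPRIV);
}
```

With the CPU currently in Secure privileged thread mode and Non-secure CONTROL.nPRIV set, a query for the Non-secure state returns "privileged" from `priv_buggy()` and "unprivileged" from `priv_fixed()`; only the latter matches the scenario in the commit message.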

-[PULL 30/51] hw/timer/slavio_timer: Remove useless check for NULL t->timer
+Deleted patch
-In the slavio timer device, the ptimer TimerContext::timer is
-always created by slavio_timer_init(), so there's no need to
-check it for NULL; remove the single unneeded NULL check.
-This will be useful to avoid compiler/Coverity errors when
-a subsequent change adds a use of t->timer before the location
-we currently do the NULL check.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20191021134357.14266-2-peter.maydell@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/timer/slavio_timer.c | 12 +++++-------
-file changed, 5 insertions(+), 7 deletions(-)
-diff --git a/hw/timer/slavio_timer.c b/hw/timer/slavio_timer.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/slavio_timer.c
-+++ b/hw/timer/slavio_timer.c
-@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
-             // set limit, reset counter
-             qemu_irq_lower(t->irq);
-             t->limit = val & TIMER_MAX_COUNT32;
--            if (t->timer) {
--                if (t->limit == 0) { /* free-run */
--                    ptimer_set_limit(t->timer,
--                                     LIMIT_TO_PERIODS(TIMER_MAX_COUNT32), 1);
--                } else {
--                    ptimer_set_limit(t->timer, LIMIT_TO_PERIODS(t->limit), 1);
--                }
-+            if (t->limit == 0) { /* free-run */
-+                ptimer_set_limit(t->timer,
-+                                 LIMIT_TO_PERIODS(TIMER_MAX_COUNT32), 1);
-+            } else {
-+                ptimer_set_limit(t->timer, LIMIT_TO_PERIODS(t->limit), 1);
-             }
-         }
-         break;
---
-.20.1

-[PULL 34/51] hw/watchdog/milkymist-sysctl.c: Switch to transaction-based ptimer API
+[PULL 22/26] configure: Test that gio libs from pkg-config work
-Switch the milkymist-sysctl code away from bottom-half based
+On some hosts (eg Ubuntu Bionic) pkg-config returns a set of
-ptimers to the new transaction-based ptimer API.  This just requires
+libraries for gio-2.0 which don't actually work when compiling
-adding begin/commit calls around the various places that modify the
+statically. (Specifically, the returned library string includes
-ptimer state, and using the new ptimer_init() function to create the
+-lmount, but not -lblkid which -lmount depends upon, so linking
-timer.
+fails due to missing symbols.)
 Check that the libraries work, and don't enable gio if they don't,
 in the same way we do for gnutls.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20191021141040.11007-1-peter.maydell@linaro.org
+Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
 ---
- hw/timer/milkymist-sysctl.c | 25 ++++++++++++++++++-------
+ configure | 10 +++++++++-
-file changed, 18 insertions(+), 7 deletions(-)
+file changed, 9 insertions(+), 1 deletion(-)
-diff --git a/hw/timer/milkymist-sysctl.c b/hw/timer/milkymist-sysctl.c
+diff --git a/configure b/configure
-index XXXXXXX..XXXXXXX 100644
+index XXXXXXX..XXXXXXX 100755
---- a/hw/timer/milkymist-sysctl.c
+--- a/configure
-+++ b/hw/timer/milkymist-sysctl.c
++++ b/configure
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
- #include "hw/ptimer.h"
+ fi
- #include "hw/qdev-properties.h"
- #include "qemu/error-report.h"
+ if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
--#include "qemu/main-loop.h"
+-    gio=yes
- #include "qemu/module.h"
+     gio_cflags=$($pkg_config --cflags gio-2.0)
+     gio_libs=$($pkg_config --libs gio-2.0)
- enum {
+     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
-@@ -XXX,XX +XXX,XX @@ struct MilkymistSysctlState {
+     if [ ! -x "$gdbus_codegen" ]; then
+         gdbus_codegen=
-     MemoryRegion regs_region;
+     fi
++    # Check that the libraries actually work -- Ubuntu 18.04 ships
--    QEMUBH *bh0;
++    # with pkg-config --static --libs data for gio-2.0 that is missing
--    QEMUBH *bh1;
++    # -lblkid and will give a link error.
-     ptimer_state *ptimer0;
++    write_c_skeleton
-     ptimer_state *ptimer1;
++    if compile_prog "" "$gio_libs" ; then
++        gio=yes
-@@ -XXX,XX +XXX,XX @@ static void sysctl_write(void *opaque, hwaddr addr, uint64_t value,
++    else
-         s->regs[addr] = value;
++        gio=no
-         break;
++    fi
-     case R_TIMER0_COMPARE:
+ else
-+        ptimer_transaction_begin(s->ptimer0);
+     gio=no
-         ptimer_set_limit(s->ptimer0, value, 0);
+ fi
          s->regs[addr] = value;
 +        ptimer_transaction_commit(s->ptimer0);
          break;
      case R_TIMER1_COMPARE:
 +        ptimer_transaction_begin(s->ptimer1);
          ptimer_set_limit(s->ptimer1, value, 0);
          s->regs[addr] = value;
 +        ptimer_transaction_commit(s->ptimer1);
          break;
      case R_TIMER0_CONTROL:
 +        ptimer_transaction_begin(s->ptimer0);
          s->regs[addr] = value;
          if (s->regs[R_TIMER0_CONTROL] & CTRL_ENABLE) {
              trace_milkymist_sysctl_start_timer0();
@@ -XXX,XX +XXX,XX @@ static void sysctl_write(void *opaque, hwaddr addr, uint64_t value,
              trace_milkymist_sysctl_stop_timer0();
              ptimer_stop(s->ptimer0);
          }
 +        ptimer_transaction_commit(s->ptimer0);
          break;
      case R_TIMER1_CONTROL:
 +        ptimer_transaction_begin(s->ptimer1);
          s->regs[addr] = value;
          if (s->regs[R_TIMER1_CONTROL] & CTRL_ENABLE) {
              trace_milkymist_sysctl_start_timer1();
@@ -XXX,XX +XXX,XX @@ static void sysctl_write(void *opaque, hwaddr addr, uint64_t value,
              trace_milkymist_sysctl_stop_timer1();
              ptimer_stop(s->ptimer1);
          }
 +        ptimer_transaction_commit(s->ptimer1);
          break;
      case R_ICAP:
          sysctl_icap_write(s, value);
@@ -XXX,XX +XXX,XX @@ static void milkymist_sysctl_reset(DeviceState *d)
          s->regs[i] = 0;
      }
 +    ptimer_transaction_begin(s->ptimer0);
      ptimer_stop(s->ptimer0);
 +    ptimer_transaction_commit(s->ptimer0);
 +    ptimer_transaction_begin(s->ptimer1);
      ptimer_stop(s->ptimer1);
 +    ptimer_transaction_commit(s->ptimer1);
      /* defaults */
      s->regs[R_ICAP] = ICAP_READY;
@@ -XXX,XX +XXX,XX @@ static void milkymist_sysctl_realize(DeviceState *dev, Error **errp)
  {
      MilkymistSysctlState *s = MILKYMIST_SYSCTL(dev);
 -    s->bh0 = qemu_bh_new(timer0_hit, s);
 -    s->bh1 = qemu_bh_new(timer1_hit, s);
 -    s->ptimer0 = ptimer_init_with_bh(s->bh0, PTIMER_POLICY_DEFAULT);
 -    s->ptimer1 = ptimer_init_with_bh(s->bh1, PTIMER_POLICY_DEFAULT);
 +    s->ptimer0 = ptimer_init(timer0_hit, s, PTIMER_POLICY_DEFAULT);
 +    s->ptimer1 = ptimer_init(timer1_hit, s, PTIMER_POLICY_DEFAULT);
 +    ptimer_transaction_begin(s->ptimer0);
      ptimer_set_freq(s->ptimer0, s->freq_hz);
 +    ptimer_transaction_commit(s->ptimer0);
 +    ptimer_transaction_begin(s->ptimer1);
      ptimer_set_freq(s->ptimer1, s->freq_hz);
 +    ptimer_transaction_commit(s->ptimer1);
  }
  static const VMStateDescription vmstate_milkymist_sysctl = {
 --
 .20.1

-[PULL 51/51] hw/arm/highbank: Use AddressSpace when using write_secondary_boot()
+[PULL 23/26] hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
 into the GICv3CPUState struct's maintenance_irq field.  This will
 only work if the board happens to have already wired up the CPU
 maintenance IRQ before the GIC was realized.  Unfortunately this is
 not the case for the 'virt' board, and so the value that gets copied
 is NULL (since a qemu_irq is really a pointer to an IRQState struct
 under the hood).  The effect is that the CPU interface code never
 actually raises the maintenance interrupt line.
-write_secondary_boot() is used in SMP configurations where the
+Instead, since the GICv3CPUState has a pointer to the CPUState, make
-CPU address space might not be the main System Bus.
+the dereference at the point where we want to raise the interrupt, to
-The rom_add_blob_fixed_as() function allows us to specify an
+avoid an implicit requirement on board code to wire things up in a
-address space. Use it to write each boot blob in the corresponding
+particular order.
 CPU address space.
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reported-by: Jose Martins <josemartins90@gmail.com>
 Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
 Message-id: 20191019234715.25750-15-f4bug@amsat.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
+Reviewed-by: Luc Michel <luc@lmichel.fr>
 ---
- hw/arm/highbank.c | 3 ++-
+ include/hw/intc/arm_gicv3_common.h | 1 -
-file changed, 2 insertions(+), 1 deletion(-)
+ hw/intc/arm_gicv3_cpuif.c          | 5 ++---
 files changed, 2 insertions(+), 4 deletions(-)
-diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
+diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/highbank.c
+--- a/include/hw/intc/arm_gicv3_common.h
-+++ b/hw/arm/highbank.c
++++ b/include/hw/intc/arm_gicv3_common.h
-@@ -XXX,XX +XXX,XX @@ static void hb_write_secondary(ARMCPU *cpu, const struct arm_boot_info *info)
+@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
-     for (n = 0; n < ARRAY_SIZE(smpboot); n++) {
+     qemu_irq parent_fiq;
-         smpboot[n] = tswap32(smpboot[n]);
+     qemu_irq parent_virq;
-     }
+     qemu_irq parent_vfiq;
--    rom_add_blob_fixed("smpboot", smpboot, sizeof(smpboot), SMP_BOOT_ADDR);
+-    qemu_irq maintenance_irq;
-+    rom_add_blob_fixed_as("smpboot", smpboot, sizeof(smpboot), SMP_BOOT_ADDR,
-+                          arm_boot_address_space(cpu, info));
+     /* Redistributor */
      uint32_t level;                  /* Current IRQ level */
 diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/intc/arm_gicv3_cpuif.c
 +++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
      int irqlevel = 0;
      int fiqlevel = 0;
      int maintlevel = 0;
 +    ARMCPU *cpu = ARM_CPU(cs->cpu);
      idx = hppvi_index(cs);
      trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
      qemu_set_irq(cs->parent_vfiq, fiqlevel);
      qemu_set_irq(cs->parent_virq, irqlevel);
 -    qemu_set_irq(cs->maintenance_irq, maintlevel);
 +    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
  }
- static void hb_reset_secondary(ARMCPU *cpu, const struct arm_boot_info *info)
+ static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
              && cpu->gic_num_lrs) {
              int j;
 -            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
 -
              cs->num_list_regs = cpu->gic_num_lrs;
              cs->vpribits = cpu->gic_vpribits;
              cs->vprebits = cpu->gic_vprebits;
 --
 .20.1
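
Reviewer's sketch for patch 23/26: the ordering problem is the usual hazard of caching a pointer before its source has been filled in. The toy model below uses hypothetical types and names (`ToyIrq`, `ToyCpu`, `ToyCpuIf`) rather than the GICv3 structures, and assumes, as the commit message implies, that setting a NULL line is silently ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyIrq { int level; } ToyIrq;  /* stand-in for an IRQ line */

typedef struct ToyCpu {
    ToyIrq *maintenance_interrupt;  /* wired up later by board code */
} ToyCpu;

typedef struct ToyCpuIf {
    ToyCpu *cpu;
    ToyIrq *cached_irq;  /* snapshot taken at init time (old approach) */
} ToyCpuIf;

/* Setting a NULL line does nothing, so a stale NULL goes unnoticed. */
static void toy_set_irq(ToyIrq *irq, int level)
{
    if (irq) {
        irq->level = level;
    }
}

static void toy_init_cpuif(ToyCpuIf *cs, ToyCpu *cpu)
{
    assert(cs != NULL && cpu != NULL);
    cs->cpu = cpu;
    cs->cached_irq = cpu->maintenance_interrupt;  /* may still be NULL */
}

/* Old approach: use the pointer copied at init time. */
static void raise_cached(ToyCpuIf *cs, int level)
{
    toy_set_irq(cs->cached_irq, level);
}

/* New approach: dereference through the CPU at the point of use. */
static void raise_deref(ToyCpuIf *cs, int level)
{
    toy_set_irq(cs->cpu->maintenance_interrupt, level);
}
```

If the interface is initialised before the board wires the line, `raise_cached()` never changes the line level, while `raise_deref()` works regardless of wiring order.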

-[PULL 42/51] target/arm/cpu64: max cpu: Support sve properties with KVM
+[PULL 24/26] scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
-From: Andrew Jones <drjones@redhat.com>
+The kerneldoc script currently emits Sphinx markup for a macro with
 arguments that uses the c:function directive. This is correct for
 Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
 documentation of macros with arguments and c:function is not picky
 about the syntax of what it is passed. However, in Sphinx 3 the
 c:macro directive was enhanced to support macros with arguments,
 and c:function was made more picky about what syntax it accepted.
-Extend the SVE vq map initialization and validation with KVM's
+When kerneldoc is told that it needs to produce output for Sphinx 3
-supported vector lengths when KVM is enabled. In order to determine
+or later, make it emit c:function only for functions and c:macro
-and select supported lengths we add two new KVM functions for getting
+for macros with arguments. We assume that anything with a return
-and setting the KVM_REG_ARM64_SVE_VLS pseudo-register.
+type is a function and anything without is a macro.
-This patch has been co-authored with Richard Henderson, who reworked
+This fixes the Sphinx error:
 the target/arm/cpu64.c changes in order to push all the validation and
 auto-enabling/disabling steps into the finalizer, resulting in a nice
 LOC reduction.
-Signed-off-by: Andrew Jones <drjones@redhat.com>
+/home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
+If declarator-id with parameters (e.g., 'void f(int arg)'):
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+  Invalid C declaration: Expected identifier in nested name. [error at 25]
-Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
+    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
-Message-id: 20191024121808.9612-9-drjones@redhat.com
+    -------------------------^
 If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
   Error in declarator or parameters
   Invalid C declaration: Expecting "(" in parameters. [error at 39]
     DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
     ---------------------------------------^
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
+Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
+Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
 ---
- target/arm/kvm_arm.h      |  12 +++
+ scripts/kernel-doc | 18 +++++++++++++++++-
- target/arm/cpu64.c        | 176 ++++++++++++++++++++++++++++----------
+file changed, 17 insertions(+), 1 deletion(-)
  target/arm/kvm64.c        | 100 +++++++++++++++++++++-
  tests/arm-cpu-features.c  | 106 ++++++++++++++++++++++-
  docs/arm-cpu-features.rst |  45 +++++++---
 files changed, 381 insertions(+), 58 deletions(-)
-diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
+diff --git a/scripts/kernel-doc b/scripts/kernel-doc
-index XXXXXXX..XXXXXXX 100644
+index XXXXXXX..XXXXXXX 100755
---- a/target/arm/kvm_arm.h
+--- a/scripts/kernel-doc
-+++ b/target/arm/kvm_arm.h
++++ b/scripts/kernel-doc
-@@ -XXX,XX +XXX,XX @@ typedef struct ARMHostCPUFeatures {
+@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
-  */
+     output_highlight_rst($args{'purpose'});
- bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf);
+     $start = "\n\n**Syntax**\n\n  ``";
+     } else {
-+/**
+-    print ".. c:function:: ";
-+ * kvm_arm_sve_get_vls:
++        if ((split(/\./, $sphinx_version))[0] >= 3) {
-+ * @cs: CPUState
++            # Sphinx 3 and later distinguish macros and functions and
-+ * @map: bitmap to fill in
++            # complain if you use c:function with something that's not
-+ *
++            # syntactically valid as a function declaration.
-+ * Get all the SVE vector lengths supported by the KVM host, setting
++            # We assume that anything with a return type is a function
-+ * the bits corresponding to their length in quadwords minus one
++            # and anything without is a macro.
-+ * (vq - 1) in @map up to ARM_MAX_VQ.
++            if ($args{'functiontype'} ne "") {
-+ */
++                print ".. c:function:: ";
-+void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map);
++            } else {
-+
++                print ".. c:macro:: ";
  /**
   * kvm_arm_set_cpu_features_from_host:
   * @cpu: ARMCPU to set the features for
@@ -XXX,XX +XXX,XX @@ static inline int kvm_arm_vgic_probe(void)
  static inline void kvm_arm_pmu_set_irq(CPUState *cs, int irq) {}
  static inline void kvm_arm_pmu_init(CPUState *cs) {}
 +static inline void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map) {}
  #endif
  static inline const char *gic_class_name(void)
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
       * any of the above.  Finally, if SVE is not disabled, then at least one
       * vector length must be enabled.
       */
 +    DECLARE_BITMAP(kvm_supported, ARM_MAX_VQ);
      DECLARE_BITMAP(tmp, ARM_MAX_VQ);
      uint32_t vq, max_vq = 0;
 +    /* Collect the set of vector lengths supported by KVM. */
 +    bitmap_zero(kvm_supported, ARM_MAX_VQ);
 +    if (kvm_enabled() && kvm_arm_sve_supported(CPU(cpu))) {
 +        kvm_arm_sve_get_vls(CPU(cpu), kvm_supported);
 +    } else if (kvm_enabled()) {
 +        assert(!cpu_isar_feature(aa64_sve, cpu));
 +    }
 +
      /*
       * Process explicit sve<N> properties.
       * From the properties, sve_vq_map<N> implies sve_vq_init<N>.
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
              return;
          }
 -        /* Propagate enabled bits down through required powers-of-two. */
 -        for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
 -            if (!test_bit(vq - 1, cpu->sve_vq_init)) {
 -                set_bit(vq - 1, cpu->sve_vq_map);
 +        if (kvm_enabled()) {
 +            /*
 +             * For KVM we have to automatically enable all supported uninitialized
 +             * lengths, even when the smaller lengths are not all powers-of-two.
 +             */
 +            bitmap_andnot(tmp, kvm_supported, cpu->sve_vq_init, max_vq);
 +            bitmap_or(cpu->sve_vq_map, cpu->sve_vq_map, tmp, max_vq);
 +        } else {
 +            /* Propagate enabled bits down through required powers-of-two. */
 +            for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
 +                if (!test_bit(vq - 1, cpu->sve_vq_init)) {
 +                    set_bit(vq - 1, cpu->sve_vq_map);
 +                }
              }
          }
      } else if (cpu->sve_max_vq == 0) {
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
              return;
          }
 -        /* Disabling a power-of-two disables all larger lengths. */
 -        if (test_bit(0, cpu->sve_vq_init)) {
 -            error_setg(errp, "cannot disable sve128");
 -            error_append_hint(errp, "Disabling sve128 results in all vector "
 -                              "lengths being disabled.\n");
 -            error_append_hint(errp, "With SVE enabled, at least one vector "
 -                              "length must be enabled.\n");
 -            return;
 -        }
 -        for (vq = 2; vq <= ARM_MAX_VQ; vq <<= 1) {
 -            if (test_bit(vq - 1, cpu->sve_vq_init)) {
 -                break;
 +        if (kvm_enabled()) {
 +            /* Disabling a supported length disables all larger lengths. */
 +            for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
 +                if (test_bit(vq - 1, cpu->sve_vq_init) &&
 +                    test_bit(vq - 1, kvm_supported)) {
 +                    break;
 +                }
              }
 +            max_vq = vq <= ARM_MAX_VQ ? vq - 1 : ARM_MAX_VQ;
 +            bitmap_andnot(cpu->sve_vq_map, kvm_supported,
 +                          cpu->sve_vq_init, max_vq);
 +            if (max_vq == 0 || bitmap_empty(cpu->sve_vq_map, max_vq)) {
 +                error_setg(errp, "cannot disable sve%d", vq * 128);
 +                error_append_hint(errp, "Disabling sve%d results in all "
 +                                  "vector lengths being disabled.\n",
 +                                  vq * 128);
 +                error_append_hint(errp, "With SVE enabled, at least one "
 +                                  "vector length must be enabled.\n");
 +                return;
 +            }
 +        } else {
-+            /* Disabling a power-of-two disables all larger lengths. */
++            # Older Sphinx don't support documenting macros that take
-+            if (test_bit(0, cpu->sve_vq_init)) {
++            # arguments with c:macro, and don't complain about the use
-+                error_setg(errp, "cannot disable sve128");
++            # of c:function for this.
-+                error_append_hint(errp, "Disabling sve128 results in all "
++            print ".. c:function:: ";
 +                                  "vector lengths being disabled.\n");
 +                error_append_hint(errp, "With SVE enabled, at least one "
 +                                  "vector length must be enabled.\n");
 +                return;
 +            }
 +            for (vq = 2; vq <= ARM_MAX_VQ; vq <<= 1) {
 +                if (test_bit(vq - 1, cpu->sve_vq_init)) {
 +                    break;
 +                }
 +            }
 +            max_vq = vq <= ARM_MAX_VQ ? vq - 1 : ARM_MAX_VQ;
 +            bitmap_complement(cpu->sve_vq_map, cpu->sve_vq_init, max_vq);
          }
 -        max_vq = vq <= ARM_MAX_VQ ? vq - 1 : ARM_MAX_VQ;
 -        bitmap_complement(cpu->sve_vq_map, cpu->sve_vq_init, max_vq);
          max_vq = find_last_bit(cpu->sve_vq_map, max_vq) + 1;
      }
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      assert(max_vq != 0);
      bitmap_clear(cpu->sve_vq_map, max_vq, ARM_MAX_VQ - max_vq);
 -    /* Ensure all required powers-of-two are enabled. */
 -    for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
 -        if (!test_bit(vq - 1, cpu->sve_vq_map)) {
 -            error_setg(errp, "cannot disable sve%d", vq * 128);
 -            error_append_hint(errp, "sve%d is required as it "
 -                              "is a power-of-two length smaller than "
 -                              "the maximum, sve%d\n",
 -                              vq * 128, max_vq * 128);
 +    if (kvm_enabled()) {
 +        /* Ensure the set of lengths matches what KVM supports. */
 +        bitmap_xor(tmp, cpu->sve_vq_map, kvm_supported, max_vq);
 +        if (!bitmap_empty(tmp, max_vq)) {
 +            vq = find_last_bit(tmp, max_vq) + 1;
 +            if (test_bit(vq - 1, cpu->sve_vq_map)) {
 +                if (cpu->sve_max_vq) {
 +                    error_setg(errp, "cannot set sve-max-vq=%d",
 +                               cpu->sve_max_vq);
 +                    error_append_hint(errp, "This KVM host does not support "
 +                                      "the vector length %d-bits.\n",
 +                                      vq * 128);
 +                    error_append_hint(errp, "It may not be possible to use "
 +                                      "sve-max-vq with this KVM host. Try "
 +                                      "using only sve<N> properties.\n");
 +                } else {
 +                    error_setg(errp, "cannot enable sve%d", vq * 128);
 +                    error_append_hint(errp, "This KVM host does not support "
 +                                      "the vector length %d-bits.\n",
 +                                      vq * 128);
 +                }
 +            } else {
 +                error_setg(errp, "cannot disable sve%d", vq * 128);
 +                error_append_hint(errp, "The KVM host requires all "
 +                                  "supported vector lengths smaller "
 +                                  "than %d bits to also be enabled.\n",
 +                                  max_vq * 128);
 +            }
              return;
          }
 +    } else {
 +        /* Ensure all required powers-of-two are enabled. */
 +        for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
 +            if (!test_bit(vq - 1, cpu->sve_vq_map)) {
 +                error_setg(errp, "cannot disable sve%d", vq * 128);
 +                error_append_hint(errp, "sve%d is required as it "
 +                                  "is a power-of-two length smaller than "
 +                                  "the maximum, sve%d\n",
 +                                  vq * 128, max_vq * 128);
 +                return;
 +            }
 +        }
      }
+     if ($args{'functiontype'} ne "") {
-     /*
+     $start .= $args{'functiontype'} . " " . $args{'function'} . " (";
@@ -XXX,XX +XXX,XX @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
  {
      ARMCPU *cpu = ARM_CPU(obj);
      Error *err = NULL;
 +    uint32_t max_vq;
 -    visit_type_uint32(v, name, &cpu->sve_max_vq, &err);
 -
 -    if (!err && (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ)) {
 -        error_setg(&err, "unsupported SVE vector length");
 -        error_append_hint(&err, "Valid sve-max-vq in range [1-%d]\n",
 -                          ARM_MAX_VQ);
 +    visit_type_uint32(v, name, &max_vq, &err);
 +    if (err) {
 +        error_propagate(errp, err);
 +        return;
      }
 -    error_propagate(errp, err);
 +
 +    if (kvm_enabled() && !kvm_arm_sve_supported(CPU(cpu))) {
 +        error_setg(errp, "cannot set sve-max-vq");
 +        error_append_hint(errp, "SVE not supported by KVM on this host\n");
 +        return;
 +    }
 +
 +    if (max_vq == 0 || max_vq > ARM_MAX_VQ) {
 +        error_setg(errp, "unsupported SVE vector length");
 +        error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n",
 +                          ARM_MAX_VQ);
 +        return;
 +    }
 +
 +    cpu->sve_max_vq = max_vq;
  }
  static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
          return;
      }
 +    if (value && kvm_enabled() && !kvm_arm_sve_supported(CPU(cpu))) {
 +        error_setg(errp, "cannot enable %s", name);
 +        error_append_hint(errp, "SVE not supported by KVM on this host\n");
 +        return;
 +    }
 +
      if (value) {
          set_bit(vq - 1, cpu->sve_vq_map);
      } else {
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
          cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
          cpu->dcz_blocksize = 7; /*  512 bytes */
  #endif
 -
 -        object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
 -                            cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
 -
 -        for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
 -            char name[8];
 -            sprintf(name, "sve%d", vq * 128);
 -            object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
 -                                cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
 -        }
      }
      object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
                          cpu_arm_set_sve, NULL, NULL, &error_fatal);
 +    object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
 +                        cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
 +
 +    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
 +        char name[8];
 +        sprintf(name, "sve%d", vq * 128);
 +        object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
 +                            cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
 +    }
  }
  struct ARMCPUInfo {
 diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm64.c
 +++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_sve_supported(CPUState *cpu)
      return kvm_check_extension(s, KVM_CAP_ARM_SVE);
  }
 +QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
 +
 +void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
 +{
 +    /* Only call this function if kvm_arm_sve_supported() returns true. */
 +    static uint64_t vls[KVM_ARM64_SVE_VLS_WORDS];
 +    static bool probed;
 +    uint32_t vq = 0;
 +    int i, j;
 +
 +    bitmap_clear(map, 0, ARM_MAX_VQ);
 +
 +    /*
 +     * KVM ensures all host CPUs support the same set of vector lengths.
 +     * So we only need to create the scratch VCPUs once and then cache
 +     * the results.
 +     */
 +    if (!probed) {
 +        struct kvm_vcpu_init init = {
 +            .target = -1,
 +            .features[0] = (1 << KVM_ARM_VCPU_SVE),
 +        };
 +        struct kvm_one_reg reg = {
 +            .id = KVM_REG_ARM64_SVE_VLS,
 +            .addr = (uint64_t)&vls[0],
 +        };
 +        int fdarray[3], ret;
 +
 +        probed = true;
 +
 +        if (!kvm_arm_create_scratch_host_vcpu(NULL, fdarray, &init)) {
 +            error_report("failed to create scratch VCPU with SVE enabled");
 +            abort();
 +        }
 +        ret = ioctl(fdarray[2], KVM_GET_ONE_REG, &reg);
 +        kvm_arm_destroy_scratch_host_vcpu(fdarray);
 +        if (ret) {
 +            error_report("failed to get KVM_REG_ARM64_SVE_VLS: %s",
 +                         strerror(errno));
 +            abort();
 +        }
 +
 +        for (i = KVM_ARM64_SVE_VLS_WORDS - 1; i >= 0; --i) {
 +            if (vls[i]) {
 +                vq = 64 - clz64(vls[i]) + i * 64;
 +                break;
 +            }
 +        }
 +        if (vq > ARM_MAX_VQ) {
 +            warn_report("KVM supports vector lengths larger than "
 +                        "QEMU can enable");
 +        }
 +    }
 +
 +    for (i = 0; i < KVM_ARM64_SVE_VLS_WORDS; ++i) {
 +        if (!vls[i]) {
 +            continue;
 +        }
 +        for (j = 1; j <= 64; ++j) {
 +            vq = j + i * 64;
 +            if (vq > ARM_MAX_VQ) {
 +                return;
 +            }
 +            if (vls[i] & (1UL << (j - 1))) {
 +                set_bit(vq - 1, map);
 +            }
 +        }
 +    }
 +}
 +
 +static int kvm_arm_sve_set_vls(CPUState *cs)
 +{
 +    uint64_t vls[KVM_ARM64_SVE_VLS_WORDS] = {0};
 +    struct kvm_one_reg reg = {
 +        .id = KVM_REG_ARM64_SVE_VLS,
 +        .addr = (uint64_t)&vls[0],
 +    };
 +    ARMCPU *cpu = ARM_CPU(cs);
 +    uint32_t vq;
 +    int i, j;
 +
 +    assert(cpu->sve_max_vq <= KVM_ARM64_SVE_VQ_MAX);
 +
 +    for (vq = 1; vq <= cpu->sve_max_vq; ++vq) {
 +        if (test_bit(vq - 1, cpu->sve_vq_map)) {
 +            i = (vq - 1) / 64;
 +            j = (vq - 1) % 64;
 +            vls[i] |= 1UL << j;
 +        }
 +    }
 +
 +    return kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
 +}
 +
  #define ARM_CPU_ID_MPIDR       3, 0, 0, 0, 5
  int kvm_arch_init_vcpu(CPUState *cs)
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
      if (cpu->kvm_target == QEMU_KVM_ARM_TARGET_NONE ||
          !object_dynamic_cast(OBJECT(cpu), TYPE_AARCH64_CPU)) {
 -        fprintf(stderr, "KVM is not supported for this guest CPU type\n");
 +        error_report("KVM is not supported for this guest CPU type");
          return -EINVAL;
      }
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
      }
      if (cpu_isar_feature(aa64_sve, cpu)) {
 +        ret = kvm_arm_sve_set_vls(cs);
 +        if (ret) {
 +            return ret;
 +        }
          ret = kvm_arm_vcpu_finalize(cs, KVM_ARM_VCPU_SVE);
          if (ret) {
              return ret;
 diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/arm-cpu-features.c
 +++ b/tests/arm-cpu-features.c
@@ -XXX,XX +XXX,XX @@ static QDict *resp_get_props(QDict *resp)
      return qdict;
  }
 +static bool resp_get_feature(QDict *resp, const char *feature)
 +{
 +    QDict *props;
 +
 +    g_assert(resp);
 +    g_assert(resp_has_props(resp));
 +    props = resp_get_props(resp);
 +    g_assert(qdict_get(props, feature));
 +    return qdict_get_bool(props, feature);
 +}
 +
  #define assert_has_feature(qts, cpu_type, feature)                     \
  ({                                                                     \
      QDict *_resp = do_query_no_props(qts, cpu_type);                   \
@@ -XXX,XX +XXX,XX @@ static void sve_tests_sve_off(const void *data)
      qtest_quit(qts);
  }
 +static void sve_tests_sve_off_kvm(const void *data)
 +{
 +    QTestState *qts;
 +
 +    qts = qtest_init(MACHINE "-accel kvm -cpu max,sve=off");
 +
 +    /*
 +     * We don't know if this host supports SVE so we don't
 +     * attempt to test enabling anything. We only test that
 +     * everything is disabled (as it should be with sve=off)
 +     * and that using sve<N>=off to explicitly disable vector
 +     * lengths is OK too.
 +     */
 +    assert_sve_vls(qts, "max", 0, NULL);
 +    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
 +
 +    qtest_quit(qts);
 +}
 +
  static void test_query_cpu_model_expansion(const void *data)
  {
      QTestState *qts;
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
      qts = qtest_init(MACHINE "-accel kvm -cpu host");
      if (g_str_equal(qtest_get_arch(), "aarch64")) {
 +        bool kvm_supports_sve;
 +        char max_name[8], name[8];
 +        uint32_t max_vq, vq;
 +        uint64_t vls;
 +        QDict *resp;
 +        char *error;
 +
          assert_has_feature(qts, "host", "aarch64");
          assert_has_feature(qts, "host", "pmu");
 -        assert_has_feature(qts, "max", "sve");
 -
          assert_error(qts, "cortex-a15",
              "We cannot guarantee the CPU type 'cortex-a15' works "
              "with KVM on this host", NULL);
 +
 +        assert_has_feature(qts, "max", "sve");
 +        resp = do_query_no_props(qts, "max");
 +        kvm_supports_sve = resp_get_feature(resp, "sve");
 +        vls = resp_get_sve_vls(resp);
 +        qobject_unref(resp);
 +
 +        if (kvm_supports_sve) {
 +            g_assert(vls != 0);
 +            max_vq = 64 - __builtin_clzll(vls);
 +            sprintf(max_name, "sve%d", max_vq * 128);
 +
 +            /* Enabling a supported length is of course fine. */
 +            assert_sve_vls(qts, "max", vls, "{ %s: true }", max_name);
 +
 +            /* Get the next supported length smaller than max-vq. */
 +            vq = 64 - __builtin_clzll(vls & ~BIT_ULL(max_vq - 1));
 +            if (vq) {
 +                /*
 +                 * We have at least one length smaller than max-vq,
 +                 * so we can disable max-vq.
 +                 */
 +                assert_sve_vls(qts, "max", (vls & ~BIT_ULL(max_vq - 1)),
 +                               "{ %s: false }", max_name);
 +
 +                /*
 +                 * Smaller, supported vector lengths cannot be disabled
 +                 * unless all larger, supported vector lengths are also
 +                 * disabled.
 +                 */
 +                sprintf(name, "sve%d", vq * 128);
 +                error = g_strdup_printf("cannot disable %s", name);
 +                assert_error(qts, "max", error,
 +                             "{ %s: true, %s: false }",
 +                             max_name, name);
 +                g_free(error);
 +            }
 +
 +            /*
 +             * The smallest, supported vector length is required, because
 +             * we need at least one vector length enabled.
 +             */
 +            vq = __builtin_ffsll(vls);
 +            sprintf(name, "sve%d", vq * 128);
 +            error = g_strdup_printf("cannot disable %s", name);
 +            assert_error(qts, "max", error, "{ %s: false }", name);
 +            g_free(error);
 +
 +            /* Get an unsupported length. */
 +            for (vq = 1; vq <= max_vq; ++vq) {
 +                if (!(vls & BIT_ULL(vq - 1))) {
 +                    break;
 +                }
 +            }
 +            if (vq <= SVE_MAX_VQ) {
 +                sprintf(name, "sve%d", vq * 128);
 +                error = g_strdup_printf("cannot enable %s", name);
 +                assert_error(qts, "max", error, "{ %s: true }", name);
 +                g_free(error);
 +            }
 +        } else {
 +            g_assert(vls == 0);
 +        }
      } else {
          assert_has_not_feature(qts, "host", "aarch64");
          assert_has_not_feature(qts, "host", "pmu");
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
      if (kvm_available) {
          qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
                              NULL, test_query_cpu_model_expansion_kvm);
 +        if (g_str_equal(qtest_get_arch(), "aarch64")) {
 +            qtest_add_data_func("/arm/kvm/query-cpu-model-expansion/sve-off",
 +                                NULL, sve_tests_sve_off_kvm);
 +        }
      }
      return g_test_run();
 diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/arm-cpu-features.rst
 +++ b/docs/arm-cpu-features.rst
@@ -XXX,XX +XXX,XX @@ SVE CPU Property Dependencies and Constraints
    1) At least one vector length must be enabled when `sve` is enabled.
 -  2) If a vector length `N` is enabled, then all power-of-two vector
 -     lengths smaller than `N` must also be enabled.  E.g. if `sve512`
 -     is enabled, then the 128-bit and 256-bit vector lengths must also
 -     be enabled.
 +  2) If a vector length `N` is enabled, then, when KVM is enabled, all
 +     smaller, host-supported vector lengths must also be enabled.  If
 +     KVM is not enabled, then only the smaller, power-of-two vector
 +     lengths must be enabled.  E.g. with KVM, if the host supports all
 +     vector lengths up to 512 bits (128, 256, 384, 512), then if `sve512`
 +     is enabled, the 128-bit, 256-bit, and 384-bit vector lengths must
 +     also be enabled.  Without KVM, the 384-bit vector length would not
 +     be required.
 +
 +  3) If KVM is enabled, then only vector lengths that the host CPU type
 +     supports may be enabled.  If SVE is not supported by the host, then
 +     no `sve*` properties may be enabled.
  SVE CPU Property Parsing Semantics
  ----------------------------------
@@ -XXX,XX +XXX,XX @@ SVE CPU Property Parsing Semantics
       an error is generated.
    2) If SVE is enabled (`sve=on`), but no `sve<N>` CPU properties are
 -     provided, then all supported vector lengths are enabled, including
 -     the non-power-of-two lengths.
 +     provided, then all supported vector lengths are enabled.  When KVM
 +     is not in use, this includes the non-power-of-two lengths; when KVM
 +     is in use, it means all vector lengths supported by the host
 +     processor.
    3) If SVE is enabled, then an error is generated when attempting to
       disable the last enabled vector length (see constraint (1) of "SVE
@@ -XXX,XX +XXX,XX @@ SVE CPU Property Parsing Semantics
       has been explicitly disabled, then an error is generated (see
       constraint (2) of "SVE CPU Property Dependencies and Constraints").
 -  5) If one or more `sve<N>` CPU properties are set `off`, but no `sve<N>`,
 +  5) When KVM is enabled, if the host does not support SVE, then an error
 +     is generated when attempting to enable any `sve*` properties (see
 +     constraint (3) of "SVE CPU Property Dependencies and Constraints").
 +
 +  6) When KVM is enabled, if the host does support SVE, then an error is
 +     generated when attempting to enable any vector lengths not supported
 +     by the host (see constraint (3) of "SVE CPU Property Dependencies and
 +     Constraints").
 +
 +  7) If one or more `sve<N>` CPU properties are set `off`, but no `sve<N>`
       CPU properties are set `on`, then the specified vector lengths are
       disabled but the default for any unspecified lengths remains enabled.
 -     Disabling a power-of-two vector length also disables all vector
 -     lengths larger than the power-of-two length (see constraint (2) of
 -     "SVE CPU Property Dependencies and Constraints").
 +     When KVM is not enabled, disabling a power-of-two vector length also
 +     disables all vector lengths larger than the power-of-two length.
 +     When KVM is enabled, then disabling any supported vector length also
 +     disables all larger vector lengths (see constraint (2) of "SVE CPU
 +     Property Dependencies and Constraints").
 -  6) If one or more `sve<N>` CPU properties are set to `on`, then they
 +  8) If one or more `sve<N>` CPU properties are set to `on`, then they
       are enabled and all unspecified lengths default to disabled, except
       for the required lengths per constraint (2) of "SVE CPU Property
       Dependencies and Constraints", which will even be auto-enabled if
       they were not explicitly enabled.
 -  7) If SVE was disabled (`sve=off`), allowing all vector lengths to be
 +  9) If SVE was disabled (`sve=off`), allowing all vector lengths to be
       explicitly disabled (i.e. avoiding the error specified in (3) of
       "SVE CPU Property Parsing Semantics"), then if later an `sve=on` is
       provided an error will be generated.  To avoid this error, one must
 --
 .20.1

-[PULL 33/51] hw/m68k/mcf5206.c: Switch to transaction-based ptimer API
+[PULL 25/26] qemu-option-trace.rst.inc: Don't use option:: markup
-Switch the mcf5206 code away from bottom-half based ptimers to
+Sphinx 3.2 is pickier than earlier versions about the option:: markup,
-the new transaction-based ptimer API.  This just requires adding
+and complains about our usage in qemu-option-trace.rst:
-begin/commit calls around the various places that modify the ptimer
-state, and using the new ptimer_init() function to create the timer.
+../../docs/qemu-option-trace.rst.inc:4:Malformed option description
   '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
   "/opt args" or "+opt args"
 In this file, we're really trying to document the different parts of
 the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
 have already introduced with an option:: markup.  So it's not right
 to use option:: here anyway.  Switch to a different markup
 (definition lists) which gives about the same formatted output.
 (Unlike option::, this markup doesn't produce index entries; but
 at the moment we don't do anything much with indexes anyway, and
 in any case I think it doesn't make much sense to have individual
 index entries for the sub-parts of the --trace option.)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Thomas Huth <thuth@redhat.com>
+Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
-Message-id: 20191021140600.10725-1-peter.maydell@linaro.org
+Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
 Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
 ---
- hw/m68k/mcf5206.c | 15 +++++++++------
+ docs/qemu-option-trace.rst.inc | 6 +++---
- 1 file changed, 9 insertions(+), 6 deletions(-)
+ 1 file changed, 3 insertions(+), 3 deletions(-)
-diff --git a/hw/m68k/mcf5206.c b/hw/m68k/mcf5206.c
+diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
 index XXXXXXX..XXXXXXX 100644
---- a/hw/m68k/mcf5206.c
+--- a/docs/qemu-option-trace.rst.inc
-+++ b/hw/m68k/mcf5206.c
++++ b/docs/qemu-option-trace.rst.inc
 @@ -XXX,XX +XXX,XX @@
- #include "qemu/osdep.h"
+ Specify tracing options.
- #include "qemu/error-report.h"
--#include "qemu/main-loop.h"
+-.. option:: [enable=]PATTERN
- #include "cpu.h"
++``[enable=]PATTERN``
- #include "hw/hw.h"
- #include "hw/irq.h"
+   Immediately enable events matching *PATTERN*
-@@ -XXX,XX +XXX,XX @@ static void m5206_timer_recalibrate(m5206_timer_state *s)
+   (either event name or a globbing pattern).  This option is only
-     int prescale;
+@@ -XXX,XX +XXX,XX @@ Specify tracing options.
-     int mode;
+   Use :option:`-trace help` to print a list of names of trace points.
-+    ptimer_transaction_begin(s->timer);
-     ptimer_stop(s->timer);
+-.. option:: events=FILE
++``events=FILE``
--    if ((s->tmr & TMR_RST) == 0)
--        return;
+   Immediately enable events listed in *FILE*.
-+    if ((s->tmr & TMR_RST) == 0) {
+   The file must contain one event name (as listed in the ``trace-events-all``
-+        goto exit;
+@@ -XXX,XX +XXX,XX @@ Specify tracing options.
-+    }
+   available if QEMU has been compiled with the ``simple``, ``log`` or
+   ``ftrace`` tracing backend.
-     prescale = (s->tmr >> 8) + 1;
-     mode = (s->tmr >> 1) & 3;
+-.. option:: file=FILE
-@@ -XXX,XX +XXX,XX @@ static void m5206_timer_recalibrate(m5206_timer_state *s)
++``file=FILE``
-     ptimer_set_limit(s->timer, s->trr, 0);
+   Log output traces to *FILE*.
-     ptimer_run(s->timer, 0);
+   This option is only available if QEMU has been compiled with
 +exit:
 +    ptimer_transaction_commit(s->timer);
  }
  static void m5206_timer_trigger(void *opaque)
@@ -XXX,XX +XXX,XX @@ static void m5206_timer_write(m5206_timer_state *s, uint32_t addr, uint32_t val)
          s->tcr = val;
          break;
      case 0xc:
 +        ptimer_transaction_begin(s->timer);
          ptimer_set_count(s->timer, val);
 +        ptimer_transaction_commit(s->timer);
          break;
      case 0x11:
          s->ter &= ~val;
@@ -XXX,XX +XXX,XX @@ static void m5206_timer_write(m5206_timer_state *s, uint32_t addr, uint32_t val)
  static m5206_timer_state *m5206_timer_init(qemu_irq irq)
  {
      m5206_timer_state *s;
 -    QEMUBH *bh;
      s = g_new0(m5206_timer_state, 1);
 -    bh = qemu_bh_new(m5206_timer_trigger, s);
 -    s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT);
 +    s->timer = ptimer_init(m5206_timer_trigger, s, PTIMER_POLICY_DEFAULT);
      s->irq = irq;
      m5206_timer_reset(s);
      return s;
 --
 .20.1

-[PULL 31/51] hw/timer/slavio_timer.c: Switch to transaction-based ptimer API
+[PULL 26/26] tests/qtest/npcm7xx_rng-test: Disable randomness tests
-Switch the slavio_timer code away from bottom-half based ptimers to
+The randomness tests in the NPCM7xx RNG test fail intermittently
-the new transaction-based ptimer API.  This just requires adding
+but fairly frequently. On my machine running the test in a loop:
-begin/commit calls around the various places that modify the ptimer
+ while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done
-state, and using the new ptimer_init() function to create the timer.
 will fail in less than a minute with an error like:
 ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
 assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)
 (Failures have been observed on all 4 of the randomness tests,
 not just first_byte_runs.)
 It's not clear why these tests are failing like this, but intermittent
 failures make CI and merge testing awkward, so disable running them
 unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
 running the test suite, until we work out the cause.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
-Message-id: 20191021134357.14266-4-peter.maydell@linaro.org
+Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/timer/slavio_timer.c | 20 ++++++++++++++++----
+ tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
- 1 file changed, 16 insertions(+), 4 deletions(-)
+ 1 file changed, 10 insertions(+), 4 deletions(-)
-diff --git a/hw/timer/slavio_timer.c b/hw/timer/slavio_timer.c
+diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/slavio_timer.c
+--- a/tests/qtest/npcm7xx_rng-test.c
-+++ b/hw/timer/slavio_timer.c
++++ b/tests/qtest/npcm7xx_rng-test.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
- #include "hw/sysbus.h"
- #include "migration/vmstate.h"
+     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
- #include "trace.h"
+     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
--#include "qemu/main-loop.h"
+-    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
- #include "qemu/module.h"
+-    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+-    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
- /*
+-    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
-@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
++    /*
-     saddr = addr >> 2;
++     * These tests fail intermittently; only run them on explicit
-     switch (saddr) {
++     * request until we figure out why.
-     case TIMER_LIMIT:
++     */
-+        ptimer_transaction_begin(t->timer);
++    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
-         if (slavio_timer_is_user(tc)) {
++        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
-             uint64_t count;
++        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
++        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
-@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
++        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
-                 ptimer_set_limit(t->timer, LIMIT_TO_PERIODS(t->limit), 1);
++    }
-             }
-         }
+     qtest_start("-machine npcm750-evb");
-+        ptimer_transaction_commit(t->timer);
+     ret = g_test_run();
          break;
      case TIMER_COUNTER:
          if (slavio_timer_is_user(tc)) {
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
              t->reached = 0;
              count = ((uint64_t)t->counthigh) << 32 | t->count;
              trace_slavio_timer_mem_writel_limit(timer_index, count);
 +            ptimer_transaction_begin(t->timer);
              ptimer_set_count(t->timer, LIMIT_TO_PERIODS(t->limit - count));
 +            ptimer_transaction_commit(t->timer);
          } else {
              trace_slavio_timer_mem_writel_counter_invalid();
          }
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
      case TIMER_COUNTER_NORST:
          // set limit without resetting counter
          t->limit = val & TIMER_MAX_COUNT32;
 +        ptimer_transaction_begin(t->timer);
          if (t->limit == 0) { /* free-run */
              ptimer_set_limit(t->timer, LIMIT_TO_PERIODS(TIMER_MAX_COUNT32), 0);
          } else {
              ptimer_set_limit(t->timer, LIMIT_TO_PERIODS(t->limit), 0);
          }
 +        ptimer_transaction_commit(t->timer);
          break;
      case TIMER_STATUS:
 +        ptimer_transaction_begin(t->timer);
          if (slavio_timer_is_user(tc)) {
              // start/stop user counter
              if (val & 1) {
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
              }
          }
          t->run = val & 1;
 +        ptimer_transaction_commit(t->timer);
          break;
      case TIMER_MODE:
          if (timer_index == 0) {
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
                  unsigned int processor = 1 << i;
                  CPUTimerState *curr_timer = &s->cputimer[i + 1];
 +                ptimer_transaction_begin(curr_timer->timer);
                  // check for a change in timer mode for this processor
                  if ((val & processor) != (s->cputimer_mode & processor)) {
                      if (val & processor) { // counter -> user timer
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
                          trace_slavio_timer_mem_writel_mode_counter(timer_index);
                      }
                  }
 +                ptimer_transaction_commit(curr_timer->timer);
              }
          } else {
              trace_slavio_timer_mem_writel_mode_invalid();
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_reset(DeviceState *d)
          curr_timer->count = 0;
          curr_timer->reached = 0;
          if (i <= s->num_cpus) {
 +            ptimer_transaction_begin(curr_timer->timer);
              ptimer_set_limit(curr_timer->timer,
                               LIMIT_TO_PERIODS(TIMER_MAX_COUNT32), 1);
              ptimer_run(curr_timer->timer, 0);
              curr_timer->run = 1;
 +            ptimer_transaction_commit(curr_timer->timer);
          }
      }
      s->cputimer_mode = 0;
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_init(Object *obj)
  {
      SLAVIO_TIMERState *s = SLAVIO_TIMER(obj);
      SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 -    QEMUBH *bh;
      unsigned int i;
      TimerContext *tc;
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_init(Object *obj)
          tc->s = s;
          tc->timer_index = i;
 -        bh = qemu_bh_new(slavio_timer_irq, tc);
 -        s->cputimer[i].timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT);
 +        s->cputimer[i].timer = ptimer_init(slavio_timer_irq, tc,
 +                                           PTIMER_POLICY_DEFAULT);
 +        ptimer_transaction_begin(s->cputimer[i].timer);
          ptimer_set_period(s->cputimer[i].timer, TIMER_PERIOD);
 +        ptimer_transaction_commit(s->cputimer[i].timer);
          size = i == 0 ? SYS_TIMER_SIZE : CPU_TIMER_SIZE;
          snprintf(timer_name, sizeof(timer_name), "timer-%i", i);
 --
 .20.1

-[PULL 36/51] tests: arm: Introduce cpu feature tests
+Deleted patch
-From: Andrew Jones <drjones@redhat.com>
-Now that Arm CPUs have advertised features, let's add tests to ensure
-we maintain their expected availability with and without KVM.
-Signed-off-by: Andrew Jones <drjones@redhat.com>
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
-Message-id: 20191024121808.9612-3-drjones@redhat.com
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- tests/Makefile.include   |   5 +-
- tests/arm-cpu-features.c | 240 +++++++++++++++++++++++++++++++++++++++
- 2 files changed, 244 insertions(+), 1 deletion(-)
- create mode 100644 tests/arm-cpu-features.c
-diff --git a/tests/Makefile.include b/tests/Makefile.include
-index XXXXXXX..XXXXXXX 100644
---- a/tests/Makefile.include
-+++ b/tests/Makefile.include
-@@ -XXX,XX +XXX,XX @@ check-qtest-sparc64-$(CONFIG_ISA_TESTDEV) = tests/endianness-test$(EXESUF)
- check-qtest-sparc64-y += tests/prom-env-test$(EXESUF)
- check-qtest-sparc64-y += tests/boot-serial-test$(EXESUF)
-+check-qtest-arm-y += tests/arm-cpu-features$(EXESUF)
- check-qtest-arm-y += tests/microbit-test$(EXESUF)
- check-qtest-arm-y += tests/m25p80-test$(EXESUF)
- check-qtest-arm-y += tests/test-arm-mptimer$(EXESUF)
-@@ -XXX,XX +XXX,XX @@ check-qtest-arm-y += tests/boot-serial-test$(EXESUF)
- check-qtest-arm-y += tests/hexloader-test$(EXESUF)
- check-qtest-arm-$(CONFIG_PFLASH_CFI02) += tests/pflash-cfi02-test$(EXESUF)
--check-qtest-aarch64-y = tests/numa-test$(EXESUF)
-+check-qtest-aarch64-y += tests/arm-cpu-features$(EXESUF)
-+check-qtest-aarch64-y += tests/numa-test$(EXESUF)
- check-qtest-aarch64-y += tests/boot-serial-test$(EXESUF)
- check-qtest-aarch64-y += tests/migration-test$(EXESUF)
- # TODO: once aarch64 TCG is fixed on ARM 32 bit host, make test unconditional
-@@ -XXX,XX +XXX,XX @@ tests/test-qapi-util$(EXESUF): tests/test-qapi-util.o $(test-util-obj-y)
- tests/numa-test$(EXESUF): tests/numa-test.o
- tests/vmgenid-test$(EXESUF): tests/vmgenid-test.o tests/boot-sector.o tests/acpi-utils.o
- tests/cdrom-test$(EXESUF): tests/cdrom-test.o tests/boot-sector.o $(libqos-obj-y)
-+tests/arm-cpu-features$(EXESUF): tests/arm-cpu-features.o
- tests/migration/stress$(EXESUF): tests/migration/stress.o
-     $(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< ,"LINK","$(TARGET_DIR)$@")
-diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
-new file mode 100644
-index XXXXXXX..XXXXXXX
---- /dev/null
-+++ b/tests/arm-cpu-features.c
-@@ -XXX,XX +XXX,XX @@
-+/*
-+ * Arm CPU feature test cases
-+ *
-+ * Copyright (c) 2019 Red Hat Inc.
-+ * Authors:
-+ *  Andrew Jones <drjones@redhat.com>
-+ *
-+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
-+ * See the COPYING file in the top-level directory.
-+ */
-+#include "qemu/osdep.h"
-+#include "libqtest.h"
-+#include "qapi/qmp/qdict.h"
-+#include "qapi/qmp/qjson.h"
-+
-+#define MACHINE    "-machine virt,gic-version=max "
-+#define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \
-+                     "'arguments': { 'type': 'full', "
-+#define QUERY_TAIL "}}"
-+
-+static QDict *do_query_no_props(QTestState *qts, const char *cpu_type)
-+{
-+    return qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s }"
-+                          QUERY_TAIL, cpu_type);
-+}
-+
-+static QDict *do_query(QTestState *qts, const char *cpu_type,
-+                       const char *fmt, ...)
-+{
-+    QDict *resp;
-+
-+    if (fmt) {
-+        QDict *args;
-+        va_list ap;
-+
-+        va_start(ap, fmt);
-+        args = qdict_from_vjsonf_nofail(fmt, ap);
-+        va_end(ap);
-+
-+        resp = qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s, "
-+                                                    "'props': %p }"
-+                              QUERY_TAIL, cpu_type, args);
-+    } else {
-+        resp = do_query_no_props(qts, cpu_type);
-+    }
-+
-+    return resp;
-+}
-+
-+static const char *resp_get_error(QDict *resp)
-+{
-+    QDict *qdict;
-+
-+    g_assert(resp);
-+
-+    qdict = qdict_get_qdict(resp, "error");
-+    if (qdict) {
-+        return qdict_get_str(qdict, "desc");
-+    }
-+    return NULL;
-+}
-+
-+#define assert_error(qts, cpu_type, expected_error, fmt, ...)          \
-+({                                                                     \
-+    QDict *_resp;                                                      \
-+    const char *_error;                                                \
-+                                                                       \
-+    _resp = do_query(qts, cpu_type, fmt, ##__VA_ARGS__);               \
-+    g_assert(_resp);                                                   \
-+    _error = resp_get_error(_resp);                                    \
-+    g_assert(_error);                                                  \
-+    g_assert(g_str_equal(_error, expected_error));                     \
-+    qobject_unref(_resp);                                              \
-+})
-+
-+static bool resp_has_props(QDict *resp)
-+{
-+    QDict *qdict;
-+
-+    g_assert(resp);
-+
-+    if (!qdict_haskey(resp, "return")) {
-+        return false;
-+    }
-+    qdict = qdict_get_qdict(resp, "return");
-+
-+    if (!qdict_haskey(qdict, "model")) {
-+        return false;
-+    }
-+    qdict = qdict_get_qdict(qdict, "model");
-+
-+    return qdict_haskey(qdict, "props");
-+}
-+
-+static QDict *resp_get_props(QDict *resp)
-+{
-+    QDict *qdict;
-+
-+    g_assert(resp);
-+    g_assert(resp_has_props(resp));
-+
-+    qdict = qdict_get_qdict(resp, "return");
-+    qdict = qdict_get_qdict(qdict, "model");
-+    qdict = qdict_get_qdict(qdict, "props");
-+    return qdict;
-+}
-+
-+#define assert_has_feature(qts, cpu_type, feature)                     \
-+({                                                                     \
-+    QDict *_resp = do_query_no_props(qts, cpu_type);                   \
-+    g_assert(_resp);                                                   \
-+    g_assert(resp_has_props(_resp));                                   \
-+    g_assert(qdict_get(resp_get_props(_resp), feature));               \
-+    qobject_unref(_resp);                                              \
-+})
-+
-+#define assert_has_not_feature(qts, cpu_type, feature)                 \
-+({                                                                     \
-+    QDict *_resp = do_query_no_props(qts, cpu_type);                   \
-+    g_assert(_resp);                                                   \
-+    g_assert(!resp_has_props(_resp) ||                                 \
-+             !qdict_get(resp_get_props(_resp), feature));              \
-+    qobject_unref(_resp);                                              \
-+})
-+
-+static void assert_type_full(QTestState *qts)
-+{
-+    const char *error;
-+    QDict *resp;
-+
-+    resp = qtest_qmp(qts, "{ 'execute': 'query-cpu-model-expansion', "
-+                            "'arguments': { 'type': 'static', "
-+                                           "'model': { 'name': 'foo' }}}");
-+    g_assert(resp);
-+    error = resp_get_error(resp);
-+    g_assert(error);
-+    g_assert(g_str_equal(error,
-+                         "The requested expansion type is not supported"));
-+    qobject_unref(resp);
-+}
-+
-+static void assert_bad_props(QTestState *qts, const char *cpu_type)
-+{
-+    const char *error;
-+    QDict *resp;
-+
-+    resp = qtest_qmp(qts, "{ 'execute': 'query-cpu-model-expansion', "
-+                            "'arguments': { 'type': 'full', "
-+                                           "'model': { 'name': %s, "
-+                                                      "'props': false }}}",
-+                     cpu_type);
-+    g_assert(resp);
-+    error = resp_get_error(resp);
-+    g_assert(error);
-+    g_assert(g_str_equal(error,
-+                         "Invalid parameter type for 'props', expected: dict"));
-+    qobject_unref(resp);
-+}
-+
-+static void test_query_cpu_model_expansion(const void *data)
-+{
-+    QTestState *qts;
-+
-+    qts = qtest_init(MACHINE "-cpu max");
-+
-+    /* Test common query-cpu-model-expansion input validation */
-+    assert_type_full(qts);
-+    assert_bad_props(qts, "max");
-+    assert_error(qts, "foo", "The CPU type 'foo' is not a recognized "
-+                 "ARM CPU type", NULL);
-+    assert_error(qts, "max", "Parameter 'not-a-prop' is unexpected",
-+                 "{ 'not-a-prop': false }");
-+    assert_error(qts, "host", "The CPU type 'host' requires KVM", NULL);
-+
-+    /* Test expected feature presence/absence for some cpu types */
-+    assert_has_feature(qts, "max", "pmu");
-+    assert_has_feature(qts, "cortex-a15", "pmu");
-+    assert_has_not_feature(qts, "cortex-a15", "aarch64");
-+
-+    if (g_str_equal(qtest_get_arch(), "aarch64")) {
-+        assert_has_feature(qts, "max", "aarch64");
-+        assert_has_feature(qts, "cortex-a57", "pmu");
-+        assert_has_feature(qts, "cortex-a57", "aarch64");
-+
-+        /* Test that features that depend on KVM generate errors without. */
-+        assert_error(qts, "max",
-+                     "'aarch64' feature cannot be disabled "
-+                     "unless KVM is enabled and 32-bit EL1 "
-+                     "is supported",
-+                     "{ 'aarch64': false }");
-+    }
-+
-+    qtest_quit(qts);
-+}
-+
-+static void test_query_cpu_model_expansion_kvm(const void *data)
-+{
-+    QTestState *qts;
-+
-+    qts = qtest_init(MACHINE "-accel kvm -cpu host");
-+
-+    if (g_str_equal(qtest_get_arch(), "aarch64")) {
-+        assert_has_feature(qts, "host", "aarch64");
-+        assert_has_feature(qts, "host", "pmu");
-+
-+        assert_error(qts, "cortex-a15",
-+            "We cannot guarantee the CPU type 'cortex-a15' works "
-+            "with KVM on this host", NULL);
-+    } else {
-+        assert_has_not_feature(qts, "host", "aarch64");
-+        assert_has_not_feature(qts, "host", "pmu");
-+    }
-+
-+    qtest_quit(qts);
-+}
-+
-+int main(int argc, char **argv)
-+{
-+    bool kvm_available = false;
-+
-+    if (!access("/dev/kvm",  R_OK | W_OK)) {
-+#if defined(HOST_AARCH64)
-+        kvm_available = g_str_equal(qtest_get_arch(), "aarch64");
-+#elif defined(HOST_ARM)
-+        kvm_available = g_str_equal(qtest_get_arch(), "arm");
-+#endif
-+    }
-+
-+    g_test_init(&argc, &argv, NULL);
-+
-+    qtest_add_data_func("/arm/query-cpu-model-expansion",
-+                        NULL, test_query_cpu_model_expansion);
-+
-+    if (kvm_available) {
-+        qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
-+                            NULL, test_query_cpu_model_expansion_kvm);
-+    }
-+
-+    return g_test_run();
-+}
---
-.20.1
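
For readers unfamiliar with the QMP command that tests/arm-cpu-features.c exercises, the following is a minimal sketch of the request text that the QUERY_HEAD and QUERY_TAIL macros assemble. It is not part of the patch: build_query() is a hypothetical helper, and it emits strict double-quoted JSON by hand, whereas the test relies on qtest_qmp() to interpolate and quote the arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a QEMU or libqtest function): format a
 * query-cpu-model-expansion request of type 'full' for cpu_type.
 * props_json, when non-NULL, must already be a JSON object such as
 * "{ \"aarch64\": false }". Returns 0 on success, -1 if buf is too small.
 */
static int build_query(char *buf, size_t len, const char *cpu_type,
                       const char *props_json)
{
    int n;

    assert(buf && cpu_type);

    if (props_json) {
        n = snprintf(buf, len,
                     "{ \"execute\": \"query-cpu-model-expansion\", "
                     "\"arguments\": { \"type\": \"full\", "
                     "\"model\": { \"name\": \"%s\", \"props\": %s } } }",
                     cpu_type, props_json);
    } else {
        n = snprintf(buf, len,
                     "{ \"execute\": \"query-cpu-model-expansion\", "
                     "\"arguments\": { \"type\": \"full\", "
                     "\"model\": { \"name\": \"%s\" } } }",
                     cpu_type);
    }

    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

The two branches mirror do_query() and do_query_no_props() in the test: the 'props' member is sent only when the caller supplies properties.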

-[PULL 39/51] target/arm/kvm64: Add kvm_arch_get/put_sve
+Deleted patch
-From: Andrew Jones <drjones@redhat.com>
-These are the SVE equivalents to kvm_arch_get/put_fpsimd. Note, the
-swabbing is different than it is for fpsimd because the vector format
-is a little-endian stream of words.
-Signed-off-by: Andrew Jones <drjones@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
-Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
-Message-id: 20191024121808.9612-6-drjones@redhat.com
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/kvm64.c | 185 ++++++++++++++++++++++++++++++++++++++-------
-file changed, 156 insertions(+), 29 deletions(-)
-diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm64.c
-+++ b/target/arm/kvm64.c
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_destroy_vcpu(CPUState *cs)
- bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx)
- {
-     /* Return true if the regidx is a register we should synchronize
--     * via the cpreg_tuples array (ie is not a core reg we sync by
--     * hand in kvm_arch_get/put_registers())
-+     * via the cpreg_tuples array (ie is not a core or sve reg that
-+     * we sync by hand in kvm_arch_get/put_registers())
-      */
-     switch (regidx & KVM_REG_ARM_COPROC_MASK) {
-     case KVM_REG_ARM_CORE:
-+    case KVM_REG_ARM64_SVE:
-         return false;
-     default:
-         return true;
-@@ -XXX,XX +XXX,XX @@ int kvm_arm_cpreg_level(uint64_t regidx)
- static int kvm_arch_put_fpsimd(CPUState *cs)
- {
--    ARMCPU *cpu = ARM_CPU(cs);
--    CPUARMState *env = &cpu->env;
-+    CPUARMState *env = &ARM_CPU(cs)->env;
-     struct kvm_one_reg reg;
--    uint32_t fpr;
-     int i, ret;
-     for (i = 0; i < 32; i++) {
-@@ -XXX,XX +XXX,XX @@ static int kvm_arch_put_fpsimd(CPUState *cs)
-         }
-     }
--    reg.addr = (uintptr_t)(&fpr);
--    fpr = vfp_get_fpsr(env);
--    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
--    ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
--    if (ret) {
--        return ret;
-+    return 0;
-+}
-+
-+/*
-+ * SVE registers are encoded in KVM's memory in an endianness-invariant format.
-+ * The byte at offset i from the start of the in-memory representation contains
-+ * the bits [(7 + 8 * i) : (8 * i)] of the register value. As this means the
-+ * lowest offsets are stored in the lowest memory addresses, then that nearly
-+ * matches QEMU's representation, which is to use an array of host-endian
-+ * uint64_t's, where the lower offsets are at the lower indices. To complete
-+ * the translation we just need to byte swap the uint64_t's on big-endian hosts.
-+ */
-+static uint64_t *sve_bswap64(uint64_t *dst, uint64_t *src, int nr)
-+{
-+#ifdef HOST_WORDS_BIGENDIAN
-+    int i;
-+
-+    for (i = 0; i < nr; ++i) {
-+        dst[i] = bswap64(src[i]);
-     }
--    reg.addr = (uintptr_t)(&fpr);
--    fpr = vfp_get_fpcr(env);
--    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
-+    return dst;
-+#else
-+    return src;
-+#endif
-+}
-+
-+/*
-+ * KVM SVE registers come in slices where ZREGs have a slice size of 2048 bits
-+ * and PREGS and the FFR have a slice size of 256 bits. However we simply hard
-+ * code the slice index to zero for now as it's unlikely we'll need more than
-+ * one slice for quite some time.
-+ */
-+static int kvm_arch_put_sve(CPUState *cs)
-+{
-+    ARMCPU *cpu = ARM_CPU(cs);
-+    CPUARMState *env = &cpu->env;
-+    uint64_t tmp[ARM_MAX_VQ * 2];
-+    uint64_t *r;
-+    struct kvm_one_reg reg;
-+    int n, ret;
-+
-+    for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; ++n) {
-+        r = sve_bswap64(tmp, &env->vfp.zregs[n].d[0], cpu->sve_max_vq * 2);
-+        reg.addr = (uintptr_t)r;
-+        reg.id = KVM_REG_ARM64_SVE_ZREG(n, 0);
-+        ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
-+        if (ret) {
-+            return ret;
-+        }
-+    }
-+
-+    for (n = 0; n < KVM_ARM64_SVE_NUM_PREGS; ++n) {
-+        r = sve_bswap64(tmp, r = &env->vfp.pregs[n].p[0],
-+                        DIV_ROUND_UP(cpu->sve_max_vq * 2, 8));
-+        reg.addr = (uintptr_t)r;
-+        reg.id = KVM_REG_ARM64_SVE_PREG(n, 0);
-+        ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
-+        if (ret) {
-+            return ret;
-+        }
-+    }
-+
-+    r = sve_bswap64(tmp, &env->vfp.pregs[FFR_PRED_NUM].p[0],
-+                    DIV_ROUND_UP(cpu->sve_max_vq * 2, 8));
-+    reg.addr = (uintptr_t)r;
-+    reg.id = KVM_REG_ARM64_SVE_FFR(0);
-     ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
-     if (ret) {
-         return ret;
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_put_registers(CPUState *cs, int level)
- {
-     struct kvm_one_reg reg;
-     uint64_t val;
-+    uint32_t fpr;
-     int i, ret;
-     unsigned int el;
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_put_registers(CPUState *cs, int level)
-         }
-     }
--    ret = kvm_arch_put_fpsimd(cs);
-+    if (cpu_isar_feature(aa64_sve, cpu)) {
-+        ret = kvm_arch_put_sve(cs);
-+    } else {
-+        ret = kvm_arch_put_fpsimd(cs);
-+    }
-+    if (ret) {
-+        return ret;
-+    }
-+
-+    reg.addr = (uintptr_t)(&fpr);
-+    fpr = vfp_get_fpsr(env);
-+    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
-+    ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
-+    if (ret) {
-+        return ret;
-+    }
-+
-+    reg.addr = (uintptr_t)(&fpr);
-+    fpr = vfp_get_fpcr(env);
-+    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
-+    ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
-     if (ret) {
-         return ret;
-     }
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_put_registers(CPUState *cs, int level)
- static int kvm_arch_get_fpsimd(CPUState *cs)
- {
--    ARMCPU *cpu = ARM_CPU(cs);
--    CPUARMState *env = &cpu->env;
-+    CPUARMState *env = &ARM_CPU(cs)->env;
-     struct kvm_one_reg reg;
--    uint32_t fpr;
-     int i, ret;
-     for (i = 0; i < 32; i++) {
-@@ -XXX,XX +XXX,XX @@ static int kvm_arch_get_fpsimd(CPUState *cs)
-         }
-     }
--    reg.addr = (uintptr_t)(&fpr);
--    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
--    ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
--    if (ret) {
--        return ret;
--    }
--    vfp_set_fpsr(env, fpr);
-+    return 0;
-+}
--    reg.addr = (uintptr_t)(&fpr);
--    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
-+/*
-+ * KVM SVE registers come in slices where ZREGs have a slice size of 2048 bits
-+ * and PREGS and the FFR have a slice size of 256 bits. However we simply hard
-+ * code the slice index to zero for now as it's unlikely we'll need more than
-+ * one slice for quite some time.
-+ */
-+static int kvm_arch_get_sve(CPUState *cs)
-+{
-+    ARMCPU *cpu = ARM_CPU(cs);
-+    CPUARMState *env = &cpu->env;
-+    struct kvm_one_reg reg;
-+    uint64_t *r;
-+    int n, ret;
-+
-+    for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; ++n) {
-+        r = &env->vfp.zregs[n].d[0];
-+        reg.addr = (uintptr_t)r;
-+        reg.id = KVM_REG_ARM64_SVE_ZREG(n, 0);
-+        ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
-+        if (ret) {
-+            return ret;
-+        }
-+        sve_bswap64(r, r, cpu->sve_max_vq * 2);
-+    }
-+
-+    for (n = 0; n < KVM_ARM64_SVE_NUM_PREGS; ++n) {
-+        r = &env->vfp.pregs[n].p[0];
-+        reg.addr = (uintptr_t)r;
-+        reg.id = KVM_REG_ARM64_SVE_PREG(n, 0);
-+        ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
-+        if (ret) {
-+            return ret;
-+        }
-+        sve_bswap64(r, r, DIV_ROUND_UP(cpu->sve_max_vq * 2, 8));
-+    }
-+
-+    r = &env->vfp.pregs[FFR_PRED_NUM].p[0];
-+    reg.addr = (uintptr_t)r;
-+    reg.id = KVM_REG_ARM64_SVE_FFR(0);
-     ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
-     if (ret) {
-         return ret;
-     }
--    vfp_set_fpcr(env, fpr);
-+    sve_bswap64(r, r, DIV_ROUND_UP(cpu->sve_max_vq * 2, 8));
-     return 0;
- }
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_get_registers(CPUState *cs)
-     struct kvm_one_reg reg;
-     uint64_t val;
-     unsigned int el;
-+    uint32_t fpr;
-     int i, ret;
-     ARMCPU *cpu = ARM_CPU(cs);
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_get_registers(CPUState *cs)
-         env->spsr = env->banked_spsr[i];
-     }
--    ret = kvm_arch_get_fpsimd(cs);
-+    if (cpu_isar_feature(aa64_sve, cpu)) {
-+        ret = kvm_arch_get_sve(cs);
-+    } else {
-+        ret = kvm_arch_get_fpsimd(cs);
-+    }
-     if (ret) {
-         return ret;
-     }
-+    reg.addr = (uintptr_t)(&fpr);
-+    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
-+    ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
-+    if (ret) {
-+        return ret;
-+    }
-+    vfp_set_fpsr(env, fpr);
-+
-+    reg.addr = (uintptr_t)(&fpr);
-+    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
-+    ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
-+    if (ret) {
-+        return ret;
-+    }
-+    vfp_set_fpcr(env, fpr);
-+
-     ret = kvm_get_vcpu_events(cpu);
-     if (ret) {
-         return ret;
---
-.20.1
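
The comment above sve_bswap64() in this patch describes the KVM SVE layout as an endianness-invariant byte stream in which byte i holds bits [8*i+7 : 8*i] of the register. A minimal standalone sketch of that conversion follows. It is an illustration under stated assumptions, not the patch's code: swap64(), host_is_big_endian() and words_to_stream() are hypothetical names, and the endianness is probed at run time rather than with HOST_WORDS_BIGENDIAN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable 64-bit byte swap (stand-in for QEMU's bswap64()). */
static uint64_t swap64(uint64_t v)
{
    v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
        ((v >> 8) & 0x00ff00ff00ff00ffULL);
    v = ((v & 0x0000ffff0000ffffULL) << 16) |
        ((v >> 16) & 0x0000ffff0000ffffULL);
    return (v << 32) | (v >> 32);
}

/* Returns 1 on a big-endian host, 0 on a little-endian host. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/*
 * Hypothetical helper: serialise nr host-endian 64-bit words into a
 * byte stream where byte i holds bits [8*i+7 : 8*i] of the value.
 * Only big-endian hosts need to swap each word before storing it.
 */
static void words_to_stream(uint8_t *stream, const uint64_t *words, size_t nr)
{
    size_t i;

    assert(stream && words);

    for (i = 0; i < nr; i++) {
        uint64_t w = host_is_big_endian() ? swap64(words[i]) : words[i];

        memcpy(stream + 8 * i, &w, sizeof(w));
    }
}
```

On a little-endian host the loop degenerates to a plain copy, which is why the patch's helper can return the source pointer unchanged in that case and only uses the temporary buffer on big-endian hosts.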

-[PULL 40/51] target/arm/kvm64: max cpu: Enable SVE when available
+Deleted patch
-From: Andrew Jones <drjones@redhat.com>
-Enable SVE in the KVM guest when the 'max' cpu type is configured
-and KVM supports it. KVM SVE requires use of the new finalize
-vcpu ioctl, so we add that now too. For starters SVE can only be
-turned on or off, getting all vector lengths the host CPU supports
-when on. We'll add the other SVE CPU properties in later patches.
-Signed-off-by: Andrew Jones <drjones@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
-Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
-Reviewed-by: Beata Michalska <beata.michalska@linaro.org>
-Message-id: 20191024121808.9612-7-drjones@redhat.com
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/kvm_arm.h     | 27 +++++++++++++++++++++++++++
- target/arm/cpu64.c       | 17 ++++++++++++++---
- target/arm/kvm.c         |  5 +++++
- target/arm/kvm64.c       | 20 +++++++++++++++++++-
- tests/arm-cpu-features.c |  4 ++++
-files changed, 69 insertions(+), 4 deletions(-)
-diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm_arm.h
-+++ b/target/arm/kvm_arm.h
-@@ -XXX,XX +XXX,XX @@
-  */
- int kvm_arm_vcpu_init(CPUState *cs);
-+/**
-+ * kvm_arm_vcpu_finalize
-+ * @cs: CPUState
-+ * @feature: int
-+ *
-+ * Finalizes the configuration of the specified VCPU feature by
-+ * invoking the KVM_ARM_VCPU_FINALIZE ioctl. Features requiring
-+ * this are documented in the "KVM_ARM_VCPU_FINALIZE" section of
-+ * KVM's API documentation.
-+ *
-+ * Returns: 0 if success else < 0 error code
-+ */
-+int kvm_arm_vcpu_finalize(CPUState *cs, int feature);
-+
- /**
-  * kvm_arm_register_device:
-  * @mr: memory region for this device
-@@ -XXX,XX +XXX,XX @@ bool kvm_arm_aarch32_supported(CPUState *cs);
-  */
- bool kvm_arm_pmu_supported(CPUState *cs);
-+/**
-+ * bool kvm_arm_sve_supported:
-+ * @cs: CPUState
-+ *
-+ * Returns true if the KVM VCPU can enable SVE and false otherwise.
-+ */
-+bool kvm_arm_sve_supported(CPUState *cs);
-+
- /**
-  * kvm_arm_get_max_vm_ipa_size - Returns the number of bits in the
-  * IPA address space supported by KVM
-@@ -XXX,XX +XXX,XX @@ static inline bool kvm_arm_pmu_supported(CPUState *cs)
-     return false;
- }
-+static inline bool kvm_arm_sve_supported(CPUState *cs)
-+{
-+    return false;
-+}
-+
- static inline int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
- {
-     return -ENOENT;
-diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu64.c
-+++ b/target/arm/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
-         return;
-     }
-+    if (value && kvm_enabled() && !kvm_arm_sve_supported(CPU(cpu))) {
-+        error_setg(errp, "'sve' feature not supported by KVM on this host");
-+        return;
-+    }
-+
-     t = cpu->isar.id_aa64pfr0;
-     t = FIELD_DP64(t, ID_AA64PFR0, SVE, value);
-     cpu->isar.id_aa64pfr0 = t;
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
- {
-     ARMCPU *cpu = ARM_CPU(obj);
-     uint32_t vq;
-+    uint64_t t;
-     if (kvm_enabled()) {
-         kvm_arm_set_cpu_features_from_host(cpu);
-+        if (kvm_arm_sve_supported(CPU(cpu))) {
-+            t = cpu->isar.id_aa64pfr0;
-+            t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1);
-+            cpu->isar.id_aa64pfr0 = t;
-+        }
-     } else {
--        uint64_t t;
-         uint32_t u;
-         aarch64_a57_initfn(obj);
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-         object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
-                             cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
--        object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
--                            cpu_arm_set_sve, NULL, NULL, &error_fatal);
-         for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
-             char name[8];
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-                                 cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
-         }
-     }
-+
-+    object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
-+                        cpu_arm_set_sve, NULL, NULL, &error_fatal);
- }
- struct ARMCPUInfo {
-diff --git a/target/arm/kvm.c b/target/arm/kvm.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm.c
-+++ b/target/arm/kvm.c
-@@ -XXX,XX +XXX,XX @@ int kvm_arm_vcpu_init(CPUState *cs)
-     return kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_INIT, &init);
- }
-+int kvm_arm_vcpu_finalize(CPUState *cs, int feature)
-+{
-+    return kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_FINALIZE, &feature);
-+}
-+
- void kvm_arm_init_serror_injection(CPUState *cs)
- {
-     cap_has_inject_serror_esr = kvm_check_extension(cs->kvm_state,
-diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm64.c
-+++ b/target/arm/kvm64.c
-@@ -XXX,XX +XXX,XX @@ bool kvm_arm_aarch32_supported(CPUState *cpu)
-     return kvm_check_extension(s, KVM_CAP_ARM_EL1_32BIT);
- }
-+bool kvm_arm_sve_supported(CPUState *cpu)
-+{
-+    KVMState *s = KVM_STATE(current_machine->accelerator);
-+
-+    return kvm_check_extension(s, KVM_CAP_ARM_SVE);
-+}
-+
- #define ARM_CPU_ID_MPIDR       3, 0, 0, 0, 5
- int kvm_arch_init_vcpu(CPUState *cs)
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
-         cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_EL1_32BIT;
-     }
-     if (!kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PMU_V3)) {
--            cpu->has_pmu = false;
-+        cpu->has_pmu = false;
-     }
-     if (cpu->has_pmu) {
-         cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_PMU_V3;
-     } else {
-         unset_feature(&env->features, ARM_FEATURE_PMU);
-     }
-+    if (cpu_isar_feature(aa64_sve, cpu)) {
-+        assert(kvm_arm_sve_supported(cs));
-+        cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_SVE;
-+    }
-     /* Do KVM_ARM_VCPU_INIT ioctl */
-     ret = kvm_arm_vcpu_init(cs);
-@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
-         return ret;
-     }
-+    if (cpu_isar_feature(aa64_sve, cpu)) {
-+        ret = kvm_arm_vcpu_finalize(cs, KVM_ARM_VCPU_SVE);
-+        if (ret) {
-+            return ret;
-+        }
-+    }
-+
-     /*
-      * When KVM is in use, PSCI is emulated in-kernel and not by qemu.
-      * Currently KVM has its own idea about MPIDR assignment, so we
-diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/arm-cpu-features.c
-+++ b/tests/arm-cpu-features.c
-@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
-         assert_has_feature(qts, "host", "aarch64");
-         assert_has_feature(qts, "host", "pmu");
-+        assert_has_feature(qts, "max", "sve");
-+
-         assert_error(qts, "cortex-a15",
-             "We cannot guarantee the CPU type 'cortex-a15' works "
-             "with KVM on this host", NULL);
-     } else {
-         assert_has_not_feature(qts, "host", "aarch64");
-         assert_has_not_feature(qts, "host", "pmu");
-+
-+        assert_has_not_feature(qts, "max", "sve");
-     }
-     qtest_quit(qts);
---
-.20.1
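
The patch above advertises SVE to the guest by depositing a value into the SVE field of ID_AA64PFR0 with FIELD_DP64(). A minimal sketch of that read-modify-write follows. It assumes the SVE field occupies bits [35:32] of the register; deposit_field(), set_sve() and has_sve() are hypothetical names used only for illustration and are not QEMU APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the SVE field within ID_AA64PFR0. */
#define PFR0_SVE_SHIFT  32
#define PFR0_SVE_LENGTH 4

/*
 * Hypothetical equivalent of a field deposit: replace the length-bit
 * field starting at bit shift with value, leaving other bits untouched.
 */
static uint64_t deposit_field(uint64_t reg, unsigned shift, unsigned length,
                              uint64_t value)
{
    uint64_t mask;

    assert(length > 0 && length < 64 && shift + length <= 64);

    mask = ((UINT64_C(1) << length) - 1) << shift;
    return (reg & ~mask) | ((value << shift) & mask);
}

/* Set the SVE field to 1 (present) or 0 (absent). */
static uint64_t set_sve(uint64_t pfr0, int enable)
{
    return deposit_field(pfr0, PFR0_SVE_SHIFT, PFR0_SVE_LENGTH,
                         enable ? 1 : 0);
}

/* A non-zero SVE field means the feature is advertised. */
static int has_sve(uint64_t pfr0)
{
    return ((pfr0 >> PFR0_SVE_SHIFT) & ((1u << PFR0_SVE_LENGTH) - 1)) != 0;
}
```

Because the feature test reads the same field that the property setter writes, toggling the 'sve' property changes the answer of later feature checks such as the one guarding the KVM_ARM_VCPU_SVE init flag in this patch.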

-[PULL 41/51] target/arm/kvm: scratch vcpu: Preserve input kvm_vcpu_init features
+Deleted patch
-From: Andrew Jones <drjones@redhat.com>
-kvm_arm_create_scratch_host_vcpu() takes a struct kvm_vcpu_init
-parameter. Rather than just using it as an output parameter to
-pass back the preferred target, use it also as an input parameter,
-allowing a caller to pass a selected target if they wish and to
-also pass cpu features. If the caller doesn't want to select a
-target they can pass -1 for the target which indicates they want
-to use the preferred target and have it passed back like before.
-Signed-off-by: Andrew Jones <drjones@redhat.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Eric Auger <eric.auger@redhat.com>
-Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
-Reviewed-by: Beata Michalska <beata.michalska@linaro.org>
-Message-id: 20191024121808.9612-8-drjones@redhat.com
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/kvm.c   | 20 +++++++++++++++-----
- target/arm/kvm32.c |  6 +++++-
- target/arm/kvm64.c |  6 +++++-
-files changed, 25 insertions(+), 7 deletions(-)
-diff --git a/target/arm/kvm.c b/target/arm/kvm.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm.c
-+++ b/target/arm/kvm.c
-@@ -XXX,XX +XXX,XX @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
-                                       int *fdarray,
-                                       struct kvm_vcpu_init *init)
- {
--    int ret, kvmfd = -1, vmfd = -1, cpufd = -1;
-+    int ret = 0, kvmfd = -1, vmfd = -1, cpufd = -1;
-     kvmfd = qemu_open("/dev/kvm", O_RDWR);
-     if (kvmfd < 0) {
-@@ -XXX,XX +XXX,XX @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
-         goto finish;
-     }
--    ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, init);
-+    if (init->target == -1) {
-+        struct kvm_vcpu_init preferred;
-+
-+        ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, &preferred);
-+        if (!ret) {
-+            init->target = preferred.target;
-+        }
-+    }
-     if (ret >= 0) {
-         ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
-         if (ret < 0) {
-@@ -XXX,XX +XXX,XX @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
-          * creating one kind of guest CPU which is its preferred
-          * CPU type.
-          */
-+        struct kvm_vcpu_init try;
-+
-         while (*cpus_to_try != QEMU_KVM_ARM_TARGET_NONE) {
--            init->target = *cpus_to_try++;
--            memset(init->features, 0, sizeof(init->features));
--            ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
-+            try.target = *cpus_to_try++;
-+            memcpy(try.features, init->features, sizeof(init->features));
-+            ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, &try);
-             if (ret >= 0) {
-                 break;
-             }
-@@ -XXX,XX +XXX,XX @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
-         if (ret < 0) {
-             goto err;
-         }
-+        init->target = try.target;
-     } else {
-         /* Treat a NULL cpus_to_try argument the same as an empty
-          * list, which means we will fail the call since this must
-diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm32.c
-+++ b/target/arm/kvm32.c
-@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
-         QEMU_KVM_ARM_TARGET_CORTEX_A15,
-         QEMU_KVM_ARM_TARGET_NONE
-     };
--    struct kvm_vcpu_init init;
-+    /*
-+     * target = -1 informs kvm_arm_create_scratch_host_vcpu()
-+     * to use the preferred target
-+     */
-+    struct kvm_vcpu_init init = { .target = -1, };
-     if (!kvm_arm_create_scratch_host_vcpu(cpus_to_try, fdarray, &init)) {
-         return false;
-diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/kvm64.c
-+++ b/target/arm/kvm64.c
-@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
-         KVM_ARM_TARGET_CORTEX_A57,
-         QEMU_KVM_ARM_TARGET_NONE
-     };
--    struct kvm_vcpu_init init;
-+    /*
-+     * target = -1 informs kvm_arm_create_scratch_host_vcpu()
-+     * to use the preferred target
-+     */
-+    struct kvm_vcpu_init init = { .target = -1, };
-     if (!kvm_arm_create_scratch_host_vcpu(cpus_to_try, fdarray, &init)) {
-         return false;
---
-.20.1
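
The commit message above turns the init argument into an in/out parameter: a target of -1 asks for the preferred target to be filled in, while any other value and the feature bits are passed through as given. A minimal sketch of that calling convention follows. It does not talk to KVM: struct vcpu_init is a hypothetical stand-in for struct kvm_vcpu_init, and preferred_target() is a placeholder for the KVM_ARM_PREFERRED_TARGET ioctl that returns an arbitrary value.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel with the same bit pattern as (uint32_t)-1. */
#define TARGET_UNSET UINT32_MAX

/* Hypothetical stand-in for struct kvm_vcpu_init. */
struct vcpu_init {
    uint32_t target;
    uint32_t features[7];
};

/* Placeholder for querying the host's preferred target (arbitrary value). */
static uint32_t preferred_target(void)
{
    return 5;
}

/*
 * Resolve the target in place. The caller's feature bits are never
 * cleared, so they reach the subsequent vcpu init unchanged.
 */
static int resolve_init(struct vcpu_init *init)
{
    assert(init);

    if (init->target == TARGET_UNSET) {
        init->target = preferred_target();
    }
    return 0;
}
```

A caller that only wants the old behaviour initialises the struct with `{ .target = TARGET_UNSET }`, which matches the `{ .target = -1, }` initialisers added to kvm32.c and kvm64.c in this patch.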

-[PULL 45/51] hw/arm/bcm2835_peripherals: Use the thermal sensor block
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Map the thermal sensor in the BCM2835 block.
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20191019234715.25750-3-f4bug@amsat.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- include/hw/arm/bcm2835_peripherals.h |  2 ++
- include/hw/arm/raspi_platform.h      |  1 +
- hw/arm/bcm2835_peripherals.c         | 13 +++++++++++++
-files changed, 16 insertions(+)
-diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/bcm2835_peripherals.h
-+++ b/include/hw/arm/bcm2835_peripherals.h
-@@ -XXX,XX +XXX,XX @@
- #include "hw/misc/bcm2835_property.h"
- #include "hw/misc/bcm2835_rng.h"
- #include "hw/misc/bcm2835_mbox.h"
-+#include "hw/misc/bcm2835_thermal.h"
- #include "hw/sd/sdhci.h"
- #include "hw/sd/bcm2835_sdhost.h"
- #include "hw/gpio/bcm2835_gpio.h"
-@@ -XXX,XX +XXX,XX @@ typedef struct BCM2835PeripheralState {
-     SDHCIState sdhci;
-     BCM2835SDHostState sdhost;
-     BCM2835GpioState gpio;
-+    Bcm2835ThermalState thermal;
-     UnimplementedDeviceState i2s;
-     UnimplementedDeviceState spi[1];
-     UnimplementedDeviceState i2c[3];
-diff --git a/include/hw/arm/raspi_platform.h b/include/hw/arm/raspi_platform.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/raspi_platform.h
-+++ b/include/hw/arm/raspi_platform.h
-@@ -XXX,XX +XXX,XX @@
- #define SPI0_OFFSET             0x204000
- #define BSC0_OFFSET             0x205000 /* BSC0 I2C/TWI */
- #define OTP_OFFSET              0x20f000
-+#define THERMAL_OFFSET          0x212000
- #define BSC_SL_OFFSET           0x214000 /* SPI slave */
- #define AUX_OFFSET              0x215000 /* AUX: UART1/SPI1/SPI2 */
- #define EMMC1_OFFSET            0x300000
-diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/bcm2835_peripherals.c
-+++ b/hw/arm/bcm2835_peripherals.c
-@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_init(Object *obj)
-     object_property_add_const_link(OBJECT(&s->dma), "dma-mr",
-                                    OBJECT(&s->gpu_bus_mr), &error_abort);
-+    /* Thermal */
-+    sysbus_init_child_obj(obj, "thermal", &s->thermal, sizeof(s->thermal),
-+                          TYPE_BCM2835_THERMAL);
-+
-     /* GPIO */
-     sysbus_init_child_obj(obj, "gpio", &s->gpio, sizeof(s->gpio),
-                           TYPE_BCM2835_GPIO);
-@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
-                                                   INTERRUPT_DMA0 + n));
-     }
-+    /* THERMAL */
-+    object_property_set_bool(OBJECT(&s->thermal), true, "realized", &err);
-+    if (err) {
-+        error_propagate(errp, err);
-+        return;
-+    }
-+    memory_region_add_subregion(&s->peri_mr, THERMAL_OFFSET,
-+                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->thermal), 0));
-+
-     /* GPIO */
-     object_property_set_bool(OBJECT(&s->gpio), true, "realized", &err);
-     if (err) {
---
-.20.1

-[PULL 47/51] hw/arm/bcm2835_peripherals: Use the SYS_timer
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Connect the recently added SYS_timer.
-Now U-Boot does not hang anymore polling a free running counter
-stuck at 0.
-This timer is also used by the Linux kernel thermal subsystem.
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20191019234715.25750-5-f4bug@amsat.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- include/hw/arm/bcm2835_peripherals.h |  3 ++-
- hw/arm/bcm2835_peripherals.c         | 17 ++++++++++++++++-
- 2 files changed, 18 insertions(+), 2 deletions(-)
-diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/bcm2835_peripherals.h
-+++ b/include/hw/arm/bcm2835_peripherals.h
-@@ -XXX,XX +XXX,XX @@
- #include "hw/sd/sdhci.h"
- #include "hw/sd/bcm2835_sdhost.h"
- #include "hw/gpio/bcm2835_gpio.h"
-+#include "hw/timer/bcm2835_systmr.h"
- #include "hw/misc/unimp.h"
- #define TYPE_BCM2835_PERIPHERALS "bcm2835-peripherals"
-@@ -XXX,XX +XXX,XX @@ typedef struct BCM2835PeripheralState {
-     MemoryRegion ram_alias[4];
-     qemu_irq irq, fiq;
--    UnimplementedDeviceState systmr;
-+    BCM2835SystemTimerState systmr;
-     UnimplementedDeviceState armtmr;
-     UnimplementedDeviceState cprman;
-     UnimplementedDeviceState a2w;
-diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/bcm2835_peripherals.c
-+++ b/hw/arm/bcm2835_peripherals.c
-@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_init(Object *obj)
-     /* Interrupt Controller */
-     sysbus_init_child_obj(obj, "ic", &s->ic, sizeof(s->ic), TYPE_BCM2835_IC);
-+    /* SYS Timer */
-+    sysbus_init_child_obj(obj, "systimer", &s->systmr, sizeof(s->systmr),
-+                          TYPE_BCM2835_SYSTIMER);
-+
-     /* UART0 */
-     sysbus_init_child_obj(obj, "uart0", &s->uart0, sizeof(s->uart0),
-                           TYPE_PL011);
-@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
-                 sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->ic), 0));
-     sysbus_pass_irq(SYS_BUS_DEVICE(s), SYS_BUS_DEVICE(&s->ic));
-+    /* Sys Timer */
-+    object_property_set_bool(OBJECT(&s->systmr), true, "realized", &err);
-+    if (err) {
-+        error_propagate(errp, err);
-+        return;
-+    }
-+    memory_region_add_subregion(&s->peri_mr, ST_OFFSET,
-+                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->systmr), 0));
-+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->systmr), 0,
-+        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_ARM_IRQ,
-+                               INTERRUPT_ARM_TIMER));
-+
-     /* UART0 */
-     qdev_prop_set_chr(DEVICE(&s->uart0), "chardev", serial_hd(0));
-     object_property_set_bool(OBJECT(&s->uart0), true, "realized", &err);
-@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
-     }
-     create_unimp(s, &s->armtmr, "bcm2835-sp804", ARMCTRL_TIMER0_1_OFFSET, 0x40);
--    create_unimp(s, &s->systmr, "bcm2835-systimer", ST_OFFSET, 0x20);
-     create_unimp(s, &s->cprman, "bcm2835-cprman", CPRMAN_OFFSET, 0x1000);
-     create_unimp(s, &s->a2w, "bcm2835-a2w", A2W_OFFSET, 0x1000);
-     create_unimp(s, &s->i2s, "bcm2835-i2s", I2S_OFFSET, 0x100);
---
-2.20.1

-[PULL 48/51] hw/arm/bcm2836: Make the SoC code modular
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-This file creates the BCM2836/BCM2837 blocks.
-The biggest differences from the BCM2838 we are going to add are
-the base addresses of the interrupt controller and the peripherals.
-Add these addresses in the BCM283XInfo structure to make this
-block more modular. Remove the MCORE_OFFSET offset as it is
-not useful and rather confusing.
-Reviewed-by: Esteban Bosse <estebanbosse@gmail.com>
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
-Message-id: 20191019234715.25750-6-f4bug@amsat.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/arm/bcm2836.c | 18 +++++++++---------
- 1 file changed, 9 insertions(+), 9 deletions(-)
-diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/bcm2836.c
-+++ b/hw/arm/bcm2836.c
-@@ -XXX,XX +XXX,XX @@
- #include "hw/arm/raspi_platform.h"
- #include "hw/sysbus.h"
--/* Peripheral base address seen by the CPU */
--#define BCM2836_PERI_BASE       0x3F000000
--
--/* "QA7" (Pi2) interrupt controller and mailboxes etc. */
--#define BCM2836_CONTROL_BASE    0x40000000
--
- struct BCM283XInfo {
-     const char *name;
-     const char *cpu_type;
-+    hwaddr peri_base; /* Peripheral base address seen by the CPU */
-+    hwaddr ctrl_base; /* Interrupt controller and mailboxes etc. */
-     int clusterid;
- };
-@@ -XXX,XX +XXX,XX @@ static const BCM283XInfo bcm283x_socs[] = {
-     {
-         .name = TYPE_BCM2836,
-         .cpu_type = ARM_CPU_TYPE_NAME("cortex-a7"),
-+        .peri_base = 0x3f000000,
-+        .ctrl_base = 0x40000000,
-         .clusterid = 0xf,
-     },
- #ifdef TARGET_AARCH64
-     {
-         .name = TYPE_BCM2837,
-         .cpu_type = ARM_CPU_TYPE_NAME("cortex-a53"),
-+        .peri_base = 0x3f000000,
-+        .ctrl_base = 0x40000000,
-         .clusterid = 0x0,
-     },
- #endif
-@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
-     }
-     sysbus_mmio_map_overlap(SYS_BUS_DEVICE(&s->peripherals), 0,
--                            BCM2836_PERI_BASE, 1);
-+                            info->peri_base, 1);
-     /* bcm2836 interrupt controller (and mailboxes, etc.) */
-     object_property_set_bool(OBJECT(&s->control), true, "realized", &err);
-@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
-         return;
-     }
--    sysbus_mmio_map(SYS_BUS_DEVICE(&s->control), 0, BCM2836_CONTROL_BASE);
-+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->control), 0, info->ctrl_base);
-     sysbus_connect_irq(SYS_BUS_DEVICE(&s->peripherals), 0,
-         qdev_get_gpio_in_named(DEVICE(&s->control), "gpu-irq", 0));
-@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
-         /* set periphbase/CBAR value for CPU-local registers */
-         object_property_set_int(OBJECT(&s->cpus[n]),
--                                BCM2836_PERI_BASE + MSYNC_OFFSET,
-+                                info->peri_base,
-                                 "reset-cbar", &err);
-         if (err) {
-             error_propagate(errp, err);
---
-2.20.1
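
The table-driven approach described in the "Make the SoC code modular" patch can be sketched outside QEMU as follows. This is a minimal illustration, not the QEMU code: hwaddr_t, struct soc_info, soc_lookup() and soc_mp_affinity() are hypothetical names, and the SoC name strings are assumed values for TYPE_BCM2836 and TYPE_BCM2837. The base addresses and cluster IDs are the ones shown in the diff.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t hwaddr_t;      /* stand-in for QEMU's hwaddr */

/* Per-SoC constants kept in one table instead of file-level #defines. */
struct soc_info {
    const char *name;
    hwaddr_t peri_base;         /* peripheral base address seen by the CPU */
    hwaddr_t ctrl_base;         /* interrupt controller, mailboxes, etc. */
    int clusterid;
};

static const struct soc_info socs[] = {
    { "bcm2836", 0x3f000000, 0x40000000, 0xf },
    { "bcm2837", 0x3f000000, 0x40000000, 0x0 },
};

/* Return the table entry for a SoC name, or NULL if it is unknown. */
static const struct soc_info *soc_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(socs) / sizeof(socs[0]); i++) {
        if (strcmp(socs[i].name, name) == 0) {
            return &socs[i];
        }
    }
    return NULL;
}

/* Same expression as the mp_affinity assignment in bcm2836_realize(). */
static uint32_t soc_mp_affinity(const struct soc_info *info, unsigned n)
{
    return ((uint32_t)info->clusterid << 8) | n;
}
```

With this layout, adding a SoC whose peripherals live at a different base address would only need one more table entry, which appears to be the motivation given in the commit message.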

-[PULL 49/51] hw/arm/bcm2836: Rename cpus[] as cpu[].core
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-As we are going to add more core-specific fields, add a 'cpu'
-structure and move the ARMCPU field there as 'core'.
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
-Message-id: 20191019234715.25750-7-f4bug@amsat.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- include/hw/arm/bcm2836.h |  4 +++-
- hw/arm/bcm2836.c         | 26 ++++++++++++++------------
- 2 files changed, 17 insertions(+), 13 deletions(-)
-diff --git a/include/hw/arm/bcm2836.h b/include/hw/arm/bcm2836.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/hw/arm/bcm2836.h
-+++ b/include/hw/arm/bcm2836.h
-@@ -XXX,XX +XXX,XX @@ typedef struct BCM283XState {
-     char *cpu_type;
-     uint32_t enabled_cpus;
--    ARMCPU cpus[BCM283X_NCPUS];
-+    struct {
-+        ARMCPU core;
-+    } cpu[BCM283X_NCPUS];
-     BCM2836ControlState control;
-     BCM2835PeripheralState peripherals;
- } BCM283XState;
-diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/bcm2836.c
-+++ b/hw/arm/bcm2836.c
-@@ -XXX,XX +XXX,XX @@ static void bcm2836_init(Object *obj)
-     int n;
-     for (n = 0; n < BCM283X_NCPUS; n++) {
--        object_initialize_child(obj, "cpu[*]", &s->cpus[n], sizeof(s->cpus[n]),
--                                info->cpu_type, &error_abort, NULL);
-+        object_initialize_child(obj, "cpu[*]", &s->cpu[n].core,
-+                                sizeof(s->cpu[n].core), info->cpu_type,
-+                                &error_abort, NULL);
-     }
-     sysbus_init_child_obj(obj, "control", &s->control, sizeof(s->control),
-@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
-     for (n = 0; n < BCM283X_NCPUS; n++) {
-         /* TODO: this should be converted to a property of ARM_CPU */
--        s->cpus[n].mp_affinity = (info->clusterid << 8) | n;
-+        s->cpu[n].core.mp_affinity = (info->clusterid << 8) | n;
-         /* set periphbase/CBAR value for CPU-local registers */
--        object_property_set_int(OBJECT(&s->cpus[n]),
-+        object_property_set_int(OBJECT(&s->cpu[n].core),
-                                 info->peri_base,
-                                 "reset-cbar", &err);
-         if (err) {
-@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
-         }
-         /* start powered off if not enabled */
--        object_property_set_bool(OBJECT(&s->cpus[n]), n >= s->enabled_cpus,
-+        object_property_set_bool(OBJECT(&s->cpu[n].core), n >= s->enabled_cpus,
-                                  "start-powered-off", &err);
-         if (err) {
-             error_propagate(errp, err);
-             return;
-         }
--        object_property_set_bool(OBJECT(&s->cpus[n]), true, "realized", &err);
-+        object_property_set_bool(OBJECT(&s->cpu[n].core), true,
-+                                 "realized", &err);
-         if (err) {
-             error_propagate(errp, err);
-             return;
-@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
-         /* Connect irq/fiq outputs from the interrupt controller. */
-         qdev_connect_gpio_out_named(DEVICE(&s->control), "irq", n,
--                qdev_get_gpio_in(DEVICE(&s->cpus[n]), ARM_CPU_IRQ));
-+                qdev_get_gpio_in(DEVICE(&s->cpu[n].core), ARM_CPU_IRQ));
-         qdev_connect_gpio_out_named(DEVICE(&s->control), "fiq", n,
--                qdev_get_gpio_in(DEVICE(&s->cpus[n]), ARM_CPU_FIQ));
-+                qdev_get_gpio_in(DEVICE(&s->cpu[n].core), ARM_CPU_FIQ));
-         /* Connect timers from the CPU to the interrupt controller */
--        qdev_connect_gpio_out(DEVICE(&s->cpus[n]), GTIMER_PHYS,
-+        qdev_connect_gpio_out(DEVICE(&s->cpu[n].core), GTIMER_PHYS,
-                 qdev_get_gpio_in_named(DEVICE(&s->control), "cntpnsirq", n));
--        qdev_connect_gpio_out(DEVICE(&s->cpus[n]), GTIMER_VIRT,
-+        qdev_connect_gpio_out(DEVICE(&s->cpu[n].core), GTIMER_VIRT,
-                 qdev_get_gpio_in_named(DEVICE(&s->control), "cntvirq", n));
--        qdev_connect_gpio_out(DEVICE(&s->cpus[n]), GTIMER_HYP,
-+        qdev_connect_gpio_out(DEVICE(&s->cpu[n].core), GTIMER_HYP,
-                 qdev_get_gpio_in_named(DEVICE(&s->control), "cnthpirq", n));
--        qdev_connect_gpio_out(DEVICE(&s->cpus[n]), GTIMER_SEC,
-+        qdev_connect_gpio_out(DEVICE(&s->cpu[n].core), GTIMER_SEC,
-                 qdev_get_gpio_in_named(DEVICE(&s->control), "cntpsirq", n));
-     }
- }
---
-2.20.1

-[PULL 50/51] hw/arm/raspi: Use AddressSpace when using arm_boot::write_secondary_boot
+Deleted patch
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-write_secondary_boot() is used in SMP configurations where the
-CPU address space might not be the main System Bus.
-The rom_add_blob_fixed_as() function allows us to specify an
-address space. Use it to write each boot blob in the corresponding
-CPU address space.
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
-Message-id: 20191019234715.25750-11-f4bug@amsat.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- hw/arm/raspi.c | 14 ++++++++------
- 1 file changed, 8 insertions(+), 6 deletions(-)
-diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/raspi.c
-+++ b/hw/arm/raspi.c
-@@ -XXX,XX +XXX,XX @@ static void write_smpboot(ARMCPU *cpu, const struct arm_boot_info *info)
-     QEMU_BUILD_BUG_ON((BOARDSETUP_ADDR & 0xf) != 0
-                       || (BOARDSETUP_ADDR >> 4) >= 0x100);
--    rom_add_blob_fixed("raspi_smpboot", smpboot, sizeof(smpboot),
--                       info->smp_loader_start);
-+    rom_add_blob_fixed_as("raspi_smpboot", smpboot, sizeof(smpboot),
-+                          info->smp_loader_start,
-+                          arm_boot_address_space(cpu, info));
- }
- static void write_smpboot64(ARMCPU *cpu, const struct arm_boot_info *info)
- {
-+    AddressSpace *as = arm_boot_address_space(cpu, info);
-     /* Unlike the AArch32 version we don't need to call the board setup hook.
-      * The mechanism for doing the spin-table is also entirely different.
-      * We must have four 64-bit fields at absolute addresses
-@@ -XXX,XX +XXX,XX @@ static void write_smpboot64(ARMCPU *cpu, const struct arm_boot_info *info)
-        0, 0, 0, 0
-     };
--    rom_add_blob_fixed("raspi_smpboot", smpboot, sizeof(smpboot),
--                       info->smp_loader_start);
--    rom_add_blob_fixed("raspi_spintables", spintables, sizeof(spintables),
--                       SPINTABLE_ADDR);
-+    rom_add_blob_fixed_as("raspi_smpboot", smpboot, sizeof(smpboot),
-+                          info->smp_loader_start, as);
-+    rom_add_blob_fixed_as("raspi_spintables", spintables, sizeof(spintables),
-+                          SPINTABLE_ADDR, as);
- }
- static void write_board_setup(ARMCPU *cpu, const struct arm_boot_info *info)
---
-2.20.1

Probably the last arm pullreq before softfreeze...

The following changes since commit 58560ad254fbda71d4daa6622d71683190070ee2:

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191024' into staging (2019-10-24 16:22:58 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20191024

for you to fetch changes up to a01a4a3e85ae8f6fe21adbedc80f7013faabdcf4:

hw/arm/highbank: Use AddressSpace when using write_secondary_boot() (2019-10-24 17:16:30 +0100)

----------------------------------------------------------------
target-arm queue:
 * raspi boards: some cleanup
 * raspi: implement the bcm2835 system timer device
 * raspi: implement a dummy thermal sensor
 * KVM: support providing SVE to the guest
 * misc devices: switch to ptimer transaction API
 * cache TB flag state to improve performance of cpu_get_tb_cpu_state
 * aspeed: Add an AST2600 eval board

----------------------------------------------------------------
Andrew Jones (9):
      target/arm/monitor: Introduce qmp_query_cpu_model_expansion
      tests: arm: Introduce cpu feature tests
      target/arm: Allow SVE to be disabled via a CPU property
      target/arm/cpu64: max cpu: Introduce sve<N> properties
      target/arm/kvm64: Add kvm_arch_get/put_sve
      target/arm/kvm64: max cpu: Enable SVE when available
      target/arm/kvm: scratch vcpu: Preserve input kvm_vcpu_init features
      target/arm/cpu64: max cpu: Support sve properties with KVM
      target/arm/kvm: host cpu: Add support for sve<N> properties

Cédric Le Goater (2):
      hw/gpio: Fix property accessors of the AST2600 GPIO 1.8V model
      aspeed: Add an AST2600 eval board

Peter Maydell (8):
      hw/net/fsl_etsec/etsec.c: Switch to transaction-based ptimer API
      hw/timer/xilinx_timer.c: Switch to transaction-based ptimer API
      hw/dma/xilinx_axidma.c: Switch to transaction-based ptimer API
      hw/timer/slavio_timer: Remove useless check for NULL t->timer
      hw/timer/slavio_timer.c: Switch to transaction-based ptimer API
      hw/timer/grlib_gptimer.c: Switch to transaction-based ptimer API
      hw/m68k/mcf5206.c: Switch to transaction-based ptimer API
      hw/watchdog/milkymist-sysctl.c: Switch to transaction-based ptimer API

Philippe Mathieu-Daudé (8):
      hw/misc/bcm2835_thermal: Add a dummy BCM2835 thermal sensor
      hw/arm/bcm2835_peripherals: Use the thermal sensor block
      hw/timer/bcm2835: Add the BCM2835 SYS_timer
      hw/arm/bcm2835_peripherals: Use the SYS_timer
      hw/arm/bcm2836: Make the SoC code modular
      hw/arm/bcm2836: Rename cpus[] as cpu[].core
      hw/arm/raspi: Use AddressSpace when using arm_boot::write_secondary_boot
      hw/arm/highbank: Use AddressSpace when using write_secondary_boot()

Richard Henderson (24):
      target/arm: Split out rebuild_hflags_common
      target/arm: Split out rebuild_hflags_a64
      target/arm: Split out rebuild_hflags_common_32
      target/arm: Split arm_cpu_data_is_big_endian
      target/arm: Split out rebuild_hflags_m32
      target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state
      target/arm: Split out rebuild_hflags_a32
      target/arm: Split out rebuild_hflags_aprofile
      target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in cpu_get_tb_cpu_state
      target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state
      target/arm: Hoist computation of TBFLAG_A32.VFPEN
      target/arm: Add arm_rebuild_hflags
      target/arm: Split out arm_mmu_idx_el
      target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state
      target/arm: Add HELPER(rebuild_hflags_{a32, a64, m32})
      target/arm: Rebuild hflags at EL changes
      target/arm: Rebuild hflags at MSR writes
      target/arm: Rebuild hflags at CPSR writes
      target/arm: Rebuild hflags at Xscale SCTLR writes
      target/arm: Rebuild hflags for M-profile
      target/arm: Rebuild hflags for M-profile NVIC
      linux-user/aarch64: Rebuild hflags for TARGET_WORDS_BIGENDIAN
      linux-user/arm: Rebuild hflags for TARGET_WORDS_BIGENDIAN
      target/arm: Rely on hflags correct in cpu_get_tb_cpu_state

From: Cédric Le Goater <clg@kaod.org>

The property names of the AST2600 GPIO 1.8V model are one character longer
than the names of the other ASPEED GPIO models. Increase the string
buffer size by one and be more strict on the expected pattern of the
property name.

This fixes the QOM test of the ast2600-evb machine under:

Apple LLVM version 10.0.0 (clang-1000.10.44.4)
  Target: x86_64-apple-darwin17.7.0
  Thread model: posix
  InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Cc: Rashmica Gupta <rashmica.g@gmail.com>
Fixes: 36d737ee82b2 ("hw/gpio: Add in AST2600 specific implementation")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20191023130455.1347-2-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/gpio/aspeed_gpio.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/gpio/aspeed_gpio.c b/hw/gpio/aspeed_gpio.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/gpio/aspeed_gpio.c
+++ b/hw/gpio/aspeed_gpio.c
@@ -XXX,XX +XXX,XX @@ static void aspeed_gpio_get_pin(Object *obj, Visitor *v, const char *name,
 {
     int pin = 0xfff;
     bool level = true;
-    char group[3];
+    char group[4];
     AspeedGPIOState *s = ASPEED_GPIO(obj);
     int set_idx, group_idx = 0;
 
     if (sscanf(name, "gpio%2[A-Z]%1d", group, &pin) != 2) {
         /* 1.8V gpio */
-        if (sscanf(name, "gpio%3s%1d", group, &pin) != 2) {
+        if (sscanf(name, "gpio%3[18A-E]%1d", group, &pin) != 2) {
             error_setg(errp, "%s: error reading %s", __func__, name);
             return;
         }
@@ -XXX,XX +XXX,XX @@ static void aspeed_gpio_set_pin(Object *obj, Visitor *v, const char *name,
     Error *local_err = NULL;
     bool level;
     int pin = 0xfff;
-    char group[3];
+    char group[4];
     AspeedGPIOState *s = ASPEED_GPIO(obj);
     int set_idx, group_idx = 0;
 
@@ -XXX,XX +XXX,XX @@ static void aspeed_gpio_set_pin(Object *obj, Visitor *v, const char *name,
     }
     if (sscanf(name, "gpio%2[A-Z]%1d", group, &pin) != 2) {
         /* 1.8V gpio */
-        if (sscanf(name, "gpio%3s%1d", group, &pin) != 2) {
+        if (sscanf(name, "gpio%3[18A-E]%1d", group, &pin) != 2) {
             error_setg(errp, "%s: error reading %s", __func__, name);
             return;
         }
-- 
2.20.1
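
The effect of the aspeed_gpio fix above can be tried in isolation with the sketch below. parse_gpio_name() is a hypothetical helper, not a QEMU function; it only reuses the two sscanf() patterns from the patch. The group buffer holds up to three characters plus the terminating NUL, which is why the patch grows it from 3 to 4 bytes. Note that the A-E range inside a scanset is implementation-defined in ISO C, although common C libraries such as glibc treat it as a range.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: split a property name such as "gpioA0" or
 * "gpio18B3" into a group string and a single-digit pin number.
 * group needs 4 bytes: at most 3 characters plus the NUL.
 */
static bool parse_gpio_name(const char *name, char group[4], int *pin)
{
    /* Regular GPIO: one or two upper-case letters, then one digit. */
    if (sscanf(name, "gpio%2[A-Z]%1d", group, pin) == 2) {
        return true;
    }
    /* 1.8V GPIO: up to three characters from '1', '8' and 'A'..'E'. */
    if (sscanf(name, "gpio%3[18A-E]%1d", group, pin) == 2) {
        return true;
    }
    return false;
}
```

The old "%3s" conversion could store three characters and a NUL into a three-byte buffer, and it accepted any non-whitespace characters; the scanset limits the accepted characters as well as the length.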

From: Cédric Le Goater <clg@kaod.org>

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Message-id: 20191023130455.1347-3-clg@kaod.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/aspeed.h |  1 +
 hw/arm/aspeed.c         | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/hw/arm/aspeed.h b/include/hw/arm/aspeed.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/aspeed.h
+++ b/include/hw/arm/aspeed.h
@@ -XXX,XX +XXX,XX @@ typedef struct AspeedBoardConfig {
     const char *desc;
     const char *soc_name;
     uint32_t hw_strap1;
+    uint32_t hw_strap2;
     const char *fmc_model;
     const char *spi_model;
     uint32_t num_cs;
diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -XXX,XX +XXX,XX @@ struct AspeedBoardState {
 /* Witherspoon hardware value: 0xF10AD216 (but use romulus definition) */
 #define WITHERSPOON_BMC_HW_STRAP1 ROMULUS_BMC_HW_STRAP1
 
+/* AST2600 evb hardware value */
+#define AST2600_EVB_HW_STRAP1 0x000000C0
+#define AST2600_EVB_HW_STRAP2 0x00000003
+
 /*
  * The max ram region is for firmwares that scan the address space
  * with load/store to guess how much RAM the SoC has.
@@ -XXX,XX +XXX,XX @@ static void aspeed_board_init(MachineState *machine,
                              &error_abort);
     object_property_set_int(OBJECT(&bmc->soc), cfg->hw_strap1, "hw-strap1",
                             &error_abort);
+    object_property_set_int(OBJECT(&bmc->soc), cfg->hw_strap2, "hw-strap2",
+                            &error_abort);
     object_property_set_int(OBJECT(&bmc->soc), cfg->num_cs, "num-cs",
                             &error_abort);
     object_property_set_int(OBJECT(&bmc->soc), machine->smp.cpus, "num-cpus",
@@ -XXX,XX +XXX,XX @@ static void ast2500_evb_i2c_init(AspeedBoardState *bmc)
     i2c_create_slave(aspeed_i2c_get_bus(DEVICE(&soc->i2c), 11), "ds1338", 0x32);
 }
 
+static void ast2600_evb_i2c_init(AspeedBoardState *bmc)
+{
+    /* Start with some devices on our I2C busses */
+    ast2500_evb_i2c_init(bmc);
+}
+
 static void romulus_bmc_i2c_init(AspeedBoardState *bmc)
 {
     AspeedSoCState *soc = &bmc->soc;
@@ -XXX,XX +XXX,XX @@ static const AspeedBoardConfig aspeed_boards[] = {
         .num_cs    = 2,
         .i2c_init  = witherspoon_bmc_i2c_init,
         .ram       = 512 * MiB,
+    }, {
+        .name      = MACHINE_TYPE_NAME("ast2600-evb"),
+        .desc      = "Aspeed AST2600 EVB (Cortex A7)",
+        .soc_name  = "ast2600-a0",
+        .hw_strap1 = AST2600_EVB_HW_STRAP1,
+        .hw_strap2 = AST2600_EVB_HW_STRAP2,
+        .fmc_model = "w25q512jv",
+        .spi_model = "mx66u51235f",
+        .num_cs    = 1,
+        .i2c_init  = ast2600_evb_i2c_init,
+        .ram       = 1 * GiB,
     },
 };
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Create a function to compute the values of the TBFLAG_ANY bits
that will be cached.  For now, the env->hflags variable is not
used, and the results are fed back to cpu_get_tb_cpu_state.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h    | 29 ++++++++++++++++++-----------
 target/arm/helper.c | 26 +++++++++++++++++++-------
 2 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUARMState {
     uint32_t pstate;
     uint32_t aarch64; /* 1 if CPU is in aarch64 state; inverse of PSTATE.nRW */
 
+    /* Cached TBFLAGS state.  See below for which bits are included.  */
+    uint32_t hflags;
+
     /* Frequently accessed CPSR bits are stored separately for efficiency.
        This contains all the other bits.  Use cpsr_{read,write} to access
        the whole CPSR.  */
@@ -XXX,XX +XXX,XX @@ typedef ARMCPU ArchCPU;
 
 #include "exec/cpu-all.h"
 
-/* Bit usage in the TB flags field: bit 31 indicates whether we are
+/*
+ * Bit usage in the TB flags field: bit 31 indicates whether we are
  * in 32 or 64 bit mode. The meaning of the other bits depends on that.
  * We put flags which are shared between 32 and 64 bit mode at the top
  * of the word, and flags which apply to only one mode at the bottom.
+ *
+ * Unless otherwise noted, these bits are cached in env->hflags.
  */
 FIELD(TBFLAG_ANY, AARCH64_STATE, 31, 1)
 FIELD(TBFLAG_ANY, MMUIDX, 28, 3)
 FIELD(TBFLAG_ANY, SS_ACTIVE, 27, 1)
-FIELD(TBFLAG_ANY, PSTATE_SS, 26, 1)
+FIELD(TBFLAG_ANY, PSTATE_SS, 26, 1)     /* Not cached. */
 /* Target EL if we take a floating-point-disabled exception */
 FIELD(TBFLAG_ANY, FPEXC_EL, 24, 2)
 FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_ANY, BE_DATA, 23, 1)
 FIELD(TBFLAG_ANY, DEBUG_TARGET_EL, 21, 2)
 
 /* Bit usage when in AArch32 state: */
-FIELD(TBFLAG_A32, THUMB, 0, 1)
-FIELD(TBFLAG_A32, VECLEN, 1, 3)
-FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)
+FIELD(TBFLAG_A32, THUMB, 0, 1)          /* Not cached. */
+FIELD(TBFLAG_A32, VECLEN, 1, 3)         /* Not cached. */
+FIELD(TBFLAG_A32, VECSTRIDE, 4, 2)      /* Not cached. */
 /*
  * We store the bottom two bits of the CPAR as TB flags and handle
  * checks on the other bits at runtime. This shares the same bits as
  * VECSTRIDE, which is OK as no XScale CPU has VFP.
+ * Not cached, because VECLEN+VECSTRIDE are not cached.
  */
 FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
 /*
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
  * the same thing as the current security state of the processor!
  */
 FIELD(TBFLAG_A32, NS, 6, 1)
-FIELD(TBFLAG_A32, VFPEN, 7, 1)
-FIELD(TBFLAG_A32, CONDEXEC, 8, 8)
+FIELD(TBFLAG_A32, VFPEN, 7, 1)          /* Not cached. */
+FIELD(TBFLAG_A32, CONDEXEC, 8, 8)       /* Not cached. */
 FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
 /* For M profile only, set if FPCCR.LSPACT is set */
-FIELD(TBFLAG_A32, LSPACT, 18, 1)
+FIELD(TBFLAG_A32, LSPACT, 18, 1)        /* Not cached. */
 /* For M profile only, set if we must create a new FP context */
-FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1)
+FIELD(TBFLAG_A32, NEW_FP_CTXT_NEEDED, 19, 1) /* Not cached. */
 /* For M profile only, set if FPCCR.S does not match current security state */
-FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1)
+FIELD(TBFLAG_A32, FPCCR_S_WRONG, 20, 1) /* Not cached. */
 /* For M profile only, Handler (ie not Thread) mode */
 FIELD(TBFLAG_A32, HANDLER, 21, 1)
 /* For M profile only, whether we should generate stack-limit checks */
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A64, SVEEXC_EL, 2, 2)
 FIELD(TBFLAG_A64, ZCR_LEN, 4, 4)
 FIELD(TBFLAG_A64, PAUTH_ACTIVE, 8, 1)
 FIELD(TBFLAG_A64, BT, 9, 1)
-FIELD(TBFLAG_A64, BTYPE, 10, 2)
+FIELD(TBFLAG_A64, BTYPE, 10, 2)         /* Not cached. */
 FIELD(TBFLAG_A64, TBID, 12, 2)
 
 static inline bool bswap_code(bool sctlr_b)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
 }
 #endif
 
+static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
+                                      ARMMMUIdx mmu_idx, uint32_t flags)
+{
+    flags = FIELD_DP32(flags, TBFLAG_ANY, FPEXC_EL, fp_el);
+    flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX,
+                       arm_to_core_mmu_idx(mmu_idx));
+
+    if (arm_cpu_data_is_big_endian(env)) {
+        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
+    }
+    if (arm_singlestep_active(env)) {
+        flags = FIELD_DP32(flags, TBFLAG_ANY, SS_ACTIVE, 1);
+    }
+    return flags;
+}
+
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                           target_ulong *cs_base, uint32_t *pflags)
 {
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         }
     }
 
-    flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX, arm_to_core_mmu_idx(mmu_idx));
+    flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 
     /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
      * states defined in the ARM ARM for software singlestep:
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
      *     0            x       Inactive (the TB flag for SS is always 0)
      *     1            0       Active-pending
      *     1            1       Active-not-pending
+     * SS_ACTIVE is set in hflags; PSTATE_SS is computed every TB.
      */
-    if (arm_singlestep_active(env)) {
-        flags = FIELD_DP32(flags, TBFLAG_ANY, SS_ACTIVE, 1);
+    if (FIELD_EX32(flags, TBFLAG_ANY, SS_ACTIVE)) {
         if (is_a64(env)) {
             if (env->pstate & PSTATE_SS) {
                 flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             }
         }
     }
-    if (arm_cpu_data_is_big_endian(env)) {
-        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
-    }
-    flags = FIELD_DP32(flags, TBFLAG_ANY, FPEXC_EL, fp_el);
 
     if (arm_v7m_is_handler_mode(env)) {
         flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
-- 
2.20.1
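
For readers unfamiliar with the FIELD_DP32/FIELD_EX32 idiom used by rebuild_hflags_common(), the following standalone sketch shows the same deposit/extract pattern. field_dp32(), field_ex32() and rebuild_common_sketch() are hypothetical stand-ins written for this illustration, not QEMU's macros; the bit positions are copied from the TBFLAG_ANY declarations in the patch, and the CPU state is reduced to plain function arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the TBFLAG_ANY FIELD() declarations. */
enum {
    MMUIDX_SHIFT = 28,    MMUIDX_LEN = 3,
    SS_ACTIVE_SHIFT = 27, SS_ACTIVE_LEN = 1,
    FPEXC_EL_SHIFT = 24,  FPEXC_EL_LEN = 2,
    BE_DATA_SHIFT = 23,   BE_DATA_LEN = 1,
};

/* Deposit val into word[shift, shift+len); assumes 1 <= len < 32. */
static uint32_t field_dp32(uint32_t word, unsigned shift, unsigned len,
                           uint32_t val)
{
    uint32_t mask = ((1u << len) - 1u) << shift;
    return (word & ~mask) | ((val << shift) & mask);
}

/* Extract word[shift, shift+len); assumes 1 <= len < 32. */
static uint32_t field_ex32(uint32_t word, unsigned shift, unsigned len)
{
    return (word >> shift) & ((1u << len) - 1u);
}

/* Mirrors the order of operations in rebuild_hflags_common(). */
static uint32_t rebuild_common_sketch(uint32_t flags, unsigned fp_el,
                                      unsigned mmu_idx, bool big_endian,
                                      bool ss_active)
{
    flags = field_dp32(flags, FPEXC_EL_SHIFT, FPEXC_EL_LEN, fp_el);
    flags = field_dp32(flags, MMUIDX_SHIFT, MMUIDX_LEN, mmu_idx);
    if (big_endian) {
        flags = field_dp32(flags, BE_DATA_SHIFT, BE_DATA_LEN, 1);
    }
    if (ss_active) {
        flags = field_dp32(flags, SS_ACTIVE_SHIFT, SS_ACTIVE_LEN, 1);
    }
    return flags;
}
```

Because SS_ACTIVE is deposited here, a caller can later test it with field_ex32() instead of recomputing it, which matches how the patch changes cpu_get_tb_cpu_state() to use FIELD_EX32 on the flags word.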

From: Richard Henderson <richard.henderson@linaro.org>

Create a function to compute the values of the TBFLAG_A64 bits
that will be cached.  For now, the env->hflags variable is not
used, and the results are fed back to cpu_get_tb_cpu_state.

Note that not all BTI related flags are cached, so we have to
test the BTI feature twice -- once for those bits moved out to
rebuild_hflags_a64 and once for those bits that remain in
cpu_get_tb_cpu_state.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 131 +++++++++++++++++++++++---------------------
 1 file changed, 69 insertions(+), 62 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
     return flags;
 }
 
+static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
+                                   ARMMMUIdx mmu_idx)
+{
+    ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
+    ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
+    uint32_t flags = 0;
+    uint64_t sctlr;
+    int tbii, tbid;
+
+    flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
+
+    /* FIXME: ARMv8.1-VHE S2 translation regime.  */
+    if (regime_el(env, stage1) < 2) {
+        ARMVAParameters p1 = aa64_va_parameters_both(env, -1, stage1);
+        tbid = (p1.tbi << 1) | p0.tbi;
+        tbii = tbid & ~((p1.tbid << 1) | p0.tbid);
+    } else {
+        tbid = p0.tbi;
+        tbii = tbid & !p0.tbid;
+    }
+
+    flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
+    flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
+
+    if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
+        int sve_el = sve_exception_el(env, el);
+        uint32_t zcr_len;
+
+        /*
+         * If SVE is disabled, but FP is enabled,
+         * then the effective len is 0.
+         */
+        if (sve_el != 0 && fp_el == 0) {
+            zcr_len = 0;
+        } else {
+            zcr_len = sve_zcr_len_for_el(env, el);
+        }
+        flags = FIELD_DP32(flags, TBFLAG_A64, SVEEXC_EL, sve_el);
+        flags = FIELD_DP32(flags, TBFLAG_A64, ZCR_LEN, zcr_len);
+    }
+
+    sctlr = arm_sctlr(env, el);
+
+    if (cpu_isar_feature(aa64_pauth, env_archcpu(env))) {
+        /*
+         * In order to save space in flags, we record only whether
+         * pauth is "inactive", meaning all insns are implemented as
+         * a nop, or "active" when some action must be performed.
+         * The decision of which action to take is left to a helper.
+         */
+        if (sctlr & (SCTLR_EnIA | SCTLR_EnIB | SCTLR_EnDA | SCTLR_EnDB)) {
+            flags = FIELD_DP32(flags, TBFLAG_A64, PAUTH_ACTIVE, 1);
+        }
+    }
+
+    if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
+        /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
+        if (sctlr & (el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
+            flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
+        }
+    }
+
+    return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
+}
+
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                           target_ulong *cs_base, uint32_t *pflags)
 {
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
     uint32_t flags = 0;
 
     if (is_a64(env)) {
-        ARMCPU *cpu = env_archcpu(env);
-        uint64_t sctlr;
-
         *pc = env->pc;
-        flags = FIELD_DP32(flags, TBFLAG_ANY, AARCH64_STATE, 1);
-
-        /* Get control bits for tagged addresses.  */
-        {
-            ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
-            ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
-            int tbii, tbid;
-
-            /* FIXME: ARMv8.1-VHE S2 translation regime.  */
-            if (regime_el(env, stage1) < 2) {
-                ARMVAParameters p1 = aa64_va_parameters_both(env, -1, stage1);
-                tbid = (p1.tbi << 1) | p0.tbi;
-                tbii = tbid & ~((p1.tbid << 1) | p0.tbid);
-            } else {
-                tbid = p0.tbi;
-                tbii = tbid & !p0.tbid;
-            }
-
-            flags = FIELD_DP32(flags, TBFLAG_A64, TBII, tbii);
-            flags = FIELD_DP32(flags, TBFLAG_A64, TBID, tbid);
-        }
-
-        if (cpu_isar_feature(aa64_sve, cpu)) {
-            int sve_el = sve_exception_el(env, current_el);
-            uint32_t zcr_len;
-
-            /* If SVE is disabled, but FP is enabled,
-             * then the effective len is 0.
-             */
-            if (sve_el != 0 && fp_el == 0) {
-                zcr_len = 0;
-            } else {
-                zcr_len = sve_zcr_len_for_el(env, current_el);
-            }
-            flags = FIELD_DP32(flags, TBFLAG_A64, SVEEXC_EL, sve_el);
-            flags = FIELD_DP32(flags, TBFLAG_A64, ZCR_LEN, zcr_len);
-        }
-
-        sctlr = arm_sctlr(env, current_el);
-
-        if (cpu_isar_feature(aa64_pauth, cpu)) {
-            /*
-             * In order to save space in flags, we record only whether
-             * pauth is "inactive", meaning all insns are implemented as
-             * a nop, or "active" when some action must be performed.
-             * The decision of which action to take is left to a helper.
-             */
-            if (sctlr & (SCTLR_EnIA | SCTLR_EnIB | SCTLR_EnDA | SCTLR_EnDB)) {
-                flags = FIELD_DP32(flags, TBFLAG_A64, PAUTH_ACTIVE, 1);
-            }
-        }
-
-        if (cpu_isar_feature(aa64_bti, cpu)) {
-            /* Note that SCTLR_EL[23].BT == SCTLR_BT1.  */
-            if (sctlr & (current_el == 0 ? SCTLR_BT0 : SCTLR_BT1)) {
-                flags = FIELD_DP32(flags, TBFLAG_A64, BT, 1);
-            }
+        flags = rebuild_hflags_a64(env, current_el, fp_el, mmu_idx);
+        if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
         }
     } else {
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             flags = FIELD_DP32(flags, TBFLAG_A32,
                                XSCALE_CPAR, env->cp15.c15_cpar);
         }
-    }
 
-    flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
+        flags = rebuild_hflags_common(env, fp_el, mmu_idx, flags);
+    }
 
     /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
      * states defined in the ARM ARM for software singlestep:
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Create a function to compute the values of the TBFLAG_A32 bits
that will be cached, and are used by all profiles.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Set TBFLAG_ANY.BE_DATA in rebuild_hflags_common_32 and
rebuild_hflags_a64 instead of rebuild_hflags_common, where we do
not need to re-test is_a64() nor re-compute the various inputs.
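
As a rough illustration of the split (a minimal sketch, not the code in
this patch): each helper takes inputs the caller has already computed,
so neither has to test is_a64() again. The sk_ names, the user_only
parameter standing in for CONFIG_USER_ONLY, and the bit positions are
assumptions of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed for this sketch only. */
#define SK_CPSR_E     (1u << 9)
#define SK_SCTLR_E0E  (1u << 24)
#define SK_SCTLR_EE   (1u << 25)

/*
 * AArch32: the caller passes the SCTLR.B value it already fetched.
 * user_only is a hypothetical stand-in for CONFIG_USER_ONLY, where
 * SCTLR.B alone forces big-endian data accesses.
 */
static bool sk_data_is_be_a32(uint32_t cpsr, bool sctlr_b, bool user_only)
{
    if (user_only && sctlr_b) {
        return true;
    }
    return (cpsr & SK_CPSR_E) != 0;
}

/* AArch64: EL0 is governed by E0E, every other EL by EE. */
static bool sk_data_is_be_a64(int el, uint64_t sctlr)
{
    return (sctlr & (el ? SK_SCTLR_EE : SK_SCTLR_E0E)) != 0;
}
```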

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h    | 49 +++++++++++++++++++++++++++------------------
 target/arm/helper.c | 16 +++++++++++----
 2 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline uint64_t arm_sctlr(CPUARMState *env, int el)
     }
 }
 
+static inline bool arm_cpu_data_is_big_endian_a32(CPUARMState *env,
+                                                  bool sctlr_b)
+{
+#ifdef CONFIG_USER_ONLY
+    /*
+     * In system mode, BE32 is modelled in line with the
+     * architecture (as word-invariant big-endianness), where loads
+     * and stores are done little endian but from addresses which
+     * are adjusted by XORing with the appropriate constant. So the
+     * endianness to use for the raw data access is not affected by
+     * SCTLR.B.
+     * In user mode, however, we model BE32 as byte-invariant
+     * big-endianness (because user-only code cannot tell the
+     * difference), and so we need to use a data access endianness
+     * that depends on SCTLR.B.
+     */
+    if (sctlr_b) {
+        return true;
+    }
+#endif
+    /* In 32bit endianness is determined by looking at CPSR's E bit */
+    return env->uncached_cpsr & CPSR_E;
+}
+
+static inline bool arm_cpu_data_is_big_endian_a64(int el, uint64_t sctlr)
+{
+    return sctlr & (el ? SCTLR_EE : SCTLR_E0E);
+}
 
 /* Return true if the processor is in big-endian mode. */
 static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
 {
-    /* In 32bit endianness is determined by looking at CPSR's E bit */
     if (!is_a64(env)) {
-        return
-#ifdef CONFIG_USER_ONLY
-            /* In system mode, BE32 is modelled in line with the
-             * architecture (as word-invariant big-endianness), where loads
-             * and stores are done little endian but from addresses which
-             * are adjusted by XORing with the appropriate constant. So the
-             * endianness to use for the raw data access is not affected by
-             * SCTLR.B.
-             * In user mode, however, we model BE32 as byte-invariant
-             * big-endianness (because user-only code cannot tell the
-             * difference), and so we need to use a data access endianness
-             * that depends on SCTLR.B.
-             */
-            arm_sctlr_b(env) ||
-#endif
-                ((env->uncached_cpsr & CPSR_E) ? 1 : 0);
+        return arm_cpu_data_is_big_endian_a32(env, arm_sctlr_b(env));
     } else {
         int cur_el = arm_current_el(env);
         uint64_t sctlr = arm_sctlr(env, cur_el);
-
-        return (sctlr & (cur_el ? SCTLR_EE : SCTLR_E0E)) != 0;
+        return arm_cpu_data_is_big_endian_a64(cur_el, sctlr);
     }
 }
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
     flags = FIELD_DP32(flags, TBFLAG_ANY, MMUIDX,
                        arm_to_core_mmu_idx(mmu_idx));
 
-    if (arm_cpu_data_is_big_endian(env)) {
-        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
-    }
     if (arm_singlestep_active(env)) {
         flags = FIELD_DP32(flags, TBFLAG_ANY, SS_ACTIVE, 1);
     }
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_common(CPUARMState *env, int fp_el,
 static uint32_t rebuild_hflags_common_32(CPUARMState *env, int fp_el,
                                          ARMMMUIdx mmu_idx, uint32_t flags)
 {
-    flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, arm_sctlr_b(env));
+    bool sctlr_b = arm_sctlr_b(env);
+
+    if (sctlr_b) {
+        flags = FIELD_DP32(flags, TBFLAG_A32, SCTLR_B, 1);
+    }
+    if (arm_cpu_data_is_big_endian_a32(env, sctlr_b)) {
+        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
+    }
     flags = FIELD_DP32(flags, TBFLAG_A32, NS, !access_secure_reg(env));
 
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
 
     sctlr = arm_sctlr(env, el);
 
+    if (arm_cpu_data_is_big_endian_a64(el, sctlr)) {
+        flags = FIELD_DP32(flags, TBFLAG_ANY, BE_DATA, 1);
+    }
+
     if (cpu_isar_feature(aa64_pauth, env_archcpu(env))) {
         /*
          * In order to save space in flags, we record only whether
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Create a function to compute the values of the TBFLAG_A32 bits
that will be cached, and are used by M-profile.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_common_32(CPUARMState *env, int fp_el,
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
 
+static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
+                                   ARMMMUIdx mmu_idx)
+{
+    uint32_t flags = 0;
+
+    if (arm_v7m_is_handler_mode(env)) {
+        flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
+    }
+
+    /*
+     * v8M always applies stack limit checks unless CCR.STKOFHFNMIGN
+     * is suppressing them because the requested execution priority
+     * is less than 0.
+     */
+    if (arm_feature(env, ARM_FEATURE_V8) &&
+        !((mmu_idx & ARM_MMU_IDX_M_NEGPRI) &&
+          (env->v7m.ccr[env->v7m.secure] & R_V7M_CCR_STKOFHFNMIGN_MASK))) {
+        flags = FIELD_DP32(flags, TBFLAG_A32, STACKCHECK, 1);
+    }
+
+    return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
+}
+
 static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
                                    ARMMMUIdx mmu_idx)
 {
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         }
     } else {
         *pc = env->regs[15];
-        flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
+
+        if (arm_feature(env, ARM_FEATURE_M)) {
+            flags = rebuild_hflags_m32(env, fp_el, mmu_idx);
+        } else {
+            flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
+        }
+
         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
         flags = FIELD_DP32(flags, TBFLAG_A32, VECLEN, env->vfp.vec_len);
         flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE, env->vfp.vec_stride);
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         }
     }
 
-    if (arm_v7m_is_handler_mode(env)) {
-        flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
-    }
-
-    /* v8M always applies stack limit checks unless CCR.STKOFHFNMIGN is
-     * suppressing them because the requested execution priority is less than 0.
-     */
-    if (arm_feature(env, ARM_FEATURE_V8) &&
-        arm_feature(env, ARM_FEATURE_M) &&
-        !((mmu_idx  & ARM_MMU_IDX_M_NEGPRI) &&
-          (env->v7m.ccr[env->v7m.secure] & R_V7M_CCR_STKOFHFNMIGN_MASK))) {
-        flags = FIELD_DP32(flags, TBFLAG_A32, STACKCHECK, 1);
-    }
-
     if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
         FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S) != env->v7m.secure) {
         flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Hoist the computation of some TBFLAG_A32 bits that only apply to
M-profile under a single test for ARM_FEATURE_M.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 49 +++++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 26 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
 
         if (arm_feature(env, ARM_FEATURE_M)) {
             flags = rebuild_hflags_m32(env, fp_el, mmu_idx);
+
+            if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
+                FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S)
+                != env->v7m.secure) {
+                flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
+            }
+
+            if ((env->v7m.fpccr[env->v7m.secure] & R_V7M_FPCCR_ASPEN_MASK) &&
+                (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) ||
+                 (env->v7m.secure &&
+                  !(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)))) {
+                /*
+                 * ASPEN is set, but FPCA/SFPA indicate that there is no
+                 * active FP context; we must create a new FP context before
+                 * executing any FP insn.
+                 */
+                flags = FIELD_DP32(flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED, 1);
+            }
+
+            bool is_secure = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
+            if (env->v7m.fpccr[is_secure] & R_V7M_FPCCR_LSPACT_MASK) {
+                flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
+            }
         } else {
             flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
         }
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         }
     }
 
-    if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
-        FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S) != env->v7m.secure) {
-        flags = FIELD_DP32(flags, TBFLAG_A32, FPCCR_S_WRONG, 1);
-    }
-
-    if (arm_feature(env, ARM_FEATURE_M) &&
-        (env->v7m.fpccr[env->v7m.secure] & R_V7M_FPCCR_ASPEN_MASK) &&
-        (!(env->v7m.control[M_REG_S] & R_V7M_CONTROL_FPCA_MASK) ||
-         (env->v7m.secure &&
-          !(env->v7m.control[M_REG_S] & R_V7M_CONTROL_SFPA_MASK)))) {
-        /*
-         * ASPEN is set, but FPCA/SFPA indicate that there is no active
-         * FP context; we must create a new FP context before executing
-         * any FP insn.
-         */
-        flags = FIELD_DP32(flags, TBFLAG_A32, NEW_FP_CTXT_NEEDED, 1);
-    }
-
-    if (arm_feature(env, ARM_FEATURE_M)) {
-        bool is_secure = env->v7m.fpccr[M_REG_S] & R_V7M_FPCCR_S_MASK;
-
-        if (env->v7m.fpccr[is_secure] & R_V7M_FPCCR_LSPACT_MASK) {
-            flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
-        }
-    }
-
     if (!arm_feature(env, ARM_FEATURE_M)) {
         int target_el = arm_debug_target_el(env);
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Currently a trivial wrapper for rebuild_hflags_common_32.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
     return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
+static uint32_t rebuild_hflags_a32(CPUARMState *env, int fp_el,
+                                   ARMMMUIdx mmu_idx)
+{
+    return rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
+}
+
 static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
                                    ARMMMUIdx mmu_idx)
 {
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                 flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
             }
         } else {
-            flags = rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
+            flags = rebuild_hflags_a32(env, fp_el, mmu_idx);
         }
 
         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Create a function to compute the values of the TBFLAG_ANY bits
that will be cached, and are used by A-profile.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
     return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
+static uint32_t rebuild_hflags_aprofile(CPUARMState *env)
+{
+    int flags = 0;
+
+    flags = FIELD_DP32(flags, TBFLAG_ANY, DEBUG_TARGET_EL,
+                       arm_debug_target_el(env));
+    return flags;
+}
+
 static uint32_t rebuild_hflags_a32(CPUARMState *env, int fp_el,
                                    ARMMMUIdx mmu_idx)
 {
-    return rebuild_hflags_common_32(env, fp_el, mmu_idx, 0);
+    uint32_t flags = rebuild_hflags_aprofile(env);
+    return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
 static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
                                    ARMMMUIdx mmu_idx)
 {
+    uint32_t flags = rebuild_hflags_aprofile(env);
     ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
     ARMVAParameters p0 = aa64_va_parameters_both(env, 0, stage1);
-    uint32_t flags = 0;
     uint64_t sctlr;
     int tbii, tbid;
 
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         }
     }
 
-    if (!arm_feature(env, ARM_FEATURE_M)) {
-        int target_el = arm_debug_target_el(env);
-
-        flags = FIELD_DP32(flags, TBFLAG_ANY, DEBUG_TARGET_EL, target_el);
-    }
-
     *pflags = flags;
     *cs_base = 0;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We do not need to compute any of these values for M-profile.
Further, XSCALE_CPAR overlaps VECSTRIDE so obviously the two
sets must be mutually exclusive.
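
To see why the overlap matters, here is a small sketch under stated
assumptions: XSCALE_CPAR is taken to occupy bits [5:4], and VECSTRIDE
is assumed to sit at the same position because the text above says the
two overlap. The sk_ helpers are hypothetical and only mimic a
deposit/extract pair.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field positions assumed for this sketch (shift 4, length 2 for both). */
#define SK_XSCALE_CPAR_SHIFT 4
#define SK_XSCALE_CPAR_LEN   2
#define SK_VECSTRIDE_SHIFT   4
#define SK_VECSTRIDE_LEN     2

/* Write val into the len-bit field at shift (len must be below 32). */
static uint32_t sk_deposit32(uint32_t word, int shift, int len, uint32_t val)
{
    uint32_t mask = ((1u << len) - 1u) << shift;
    return (word & ~mask) | ((val << shift) & mask);
}

/* Read the len-bit field at shift (len must be below 32). */
static uint32_t sk_extract32(uint32_t word, int shift, int len)
{
    return (word >> shift) & ((1u << len) - 1u);
}

/* Only one of the two overlapping fields is ever written. */
static uint32_t sk_pack_a32(bool is_xscale, uint32_t cpar, uint32_t stride)
{
    uint32_t flags = 0;

    if (is_xscale) {
        flags = sk_deposit32(flags, SK_XSCALE_CPAR_SHIFT,
                             SK_XSCALE_CPAR_LEN, cpar);
    } else {
        flags = sk_deposit32(flags, SK_VECSTRIDE_SHIFT,
                             SK_VECSTRIDE_LEN, stride);
    }
    return flags;
}
```

Depositing both fields unconditionally would leave only the later
write visible, which is why the sketch selects one branch.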

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

Hoist the variable load for PSTATE into the existing test vs is_a64.
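
A minimal sketch of the resulting shape, assuming PSTATE.SS is bit 21
and using hypothetical sk_ names: the source register is picked once
in the branch that already exists, and a single test then decides the
flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position assumed for this sketch only. */
#define SK_PSTATE_SS (1u << 21)

/*
 * Select the register once, depending on the execution state, then
 * test the SS bit in one place instead of once per branch.
 */
static bool sk_pstate_ss_flag(bool aa64, uint32_t pstate,
                              uint32_t uncached_cpsr, bool ss_active)
{
    uint32_t pstate_for_ss = aa64 ? pstate : uncached_cpsr;

    return ss_active && (pstate_for_ss & SK_PSTATE_SS) != 0;
}
```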

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
     ARMMMUIdx mmu_idx = arm_mmu_idx(env);
     int current_el = arm_current_el(env);
     int fp_el = fp_exception_el(env, current_el);
-    uint32_t flags;
+    uint32_t flags, pstate_for_ss;
 
     if (is_a64(env)) {
         *pc = env->pc;
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
         }
+        pstate_for_ss = env->pstate;
     } else {
         *pc = env->regs[15];
 
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
             || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
             flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
         }
+        pstate_for_ss = env->uncached_cpsr;
     }
 
-    /* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
+    /*
+     * The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
      * states defined in the ARM ARM for software singlestep:
      *  SS_ACTIVE   PSTATE.SS   State
      *     0            x       Inactive (the TB flag for SS is always 0)
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
      *     1            1       Active-not-pending
      * SS_ACTIVE is set in hflags; PSTATE_SS is computed every TB.
      */
-    if (FIELD_EX32(flags, TBFLAG_ANY, SS_ACTIVE)) {
-        if (is_a64(env)) {
-            if (env->pstate & PSTATE_SS) {
-                flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
-            }
-        } else {
-            if (env->uncached_cpsr & PSTATE_SS) {
-                flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
-            }
-        }
+    if (FIELD_EX32(flags, TBFLAG_ANY, SS_ACTIVE) &&
+        (pstate_for_ss & PSTATE_SS)) {
+        flags = FIELD_DP32(flags, TBFLAG_ANY, PSTATE_SS, 1);
     }
 
     *pflags = flags;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

There are 3 conditions that each enable this flag.  M-profile always
enables; A-profile with EL1 as AA64 always enables.  Both of these
conditions can easily be cached.  The final condition relies on the
FPEXC register which we are not prepared to cache.
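
The split can be pictured roughly as follows. This is a sketch with
hypothetical sk_ names, not the code in this patch; the FPEXC enable
bit is taken to be bit 30, matching the (1 << 30) test in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_FPEXC_EN (1u << 30)

/* The part that may be cached: it does not depend on FPEXC. */
static bool sk_vfpen_cached(bool is_m_profile, bool el1_is_aa64)
{
    return is_m_profile || el1_is_aa64;
}

/* The part evaluated at lookup time: OR in the live FPEXC enable bit. */
static bool sk_vfpen_final(bool cached, uint32_t fpexc)
{
    return cached || (fpexc & SK_FPEXC_EN) != 0;
}
```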

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h    |  2 +-
 target/arm/helper.c | 14 ++++++++++----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, XSCALE_CPAR, 4, 2)
  * the same thing as the current security state of the processor!
  */
 FIELD(TBFLAG_A32, NS, 6, 1)
-FIELD(TBFLAG_A32, VFPEN, 7, 1)          /* Not cached. */
+FIELD(TBFLAG_A32, VFPEN, 7, 1)          /* Partially cached, minus FPEXC. */
 FIELD(TBFLAG_A32, CONDEXEC, 8, 8)       /* Not cached. */
 FIELD(TBFLAG_A32, SCTLR_B, 16, 1)
 /* For M profile only, set if FPCCR.LSPACT is set */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_m32(CPUARMState *env, int fp_el,
 {
     uint32_t flags = 0;
 
+    /* v8M always enables the fpu.  */
+    flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
+
     if (arm_v7m_is_handler_mode(env)) {
         flags = FIELD_DP32(flags, TBFLAG_A32, HANDLER, 1);
     }
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_a32(CPUARMState *env, int fp_el,
                                    ARMMMUIdx mmu_idx)
 {
     uint32_t flags = rebuild_hflags_aprofile(env);
+
+    if (arm_el_is_aa64(env, 1)) {
+        flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
+    }
     return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                 flags = FIELD_DP32(flags, TBFLAG_A32, VECSTRIDE,
                                    env->vfp.vec_stride);
             }
+            if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)) {
+                flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
+            }
         }
 
         flags = FIELD_DP32(flags, TBFLAG_A32, THUMB, env->thumb);
         flags = FIELD_DP32(flags, TBFLAG_A32, CONDEXEC, env->condexec_bits);
-        if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)
-            || arm_el_is_aa64(env, 1) || arm_feature(env, ARM_FEATURE_M)) {
-            flags = FIELD_DP32(flags, TBFLAG_A32, VFPEN, 1);
-        }
         pstate_for_ss = env->uncached_cpsr;
     }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This function assumes nothing about the current state of the cpu,
and writes the computed value to env->hflags.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h    |  6 ++++++
 target/arm/helper.c | 30 ++++++++++++++++++++++--------
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ void arm_register_pre_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
 void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook, void
         *opaque);
 
+/**
+ * arm_rebuild_hflags:
+ * Rebuild the cached TBFLAGS for arbitrary changed processor state.
+ */
+void arm_rebuild_hflags(CPUARMState *env);
+
 /**
  * aa32_vfp_dreg:
  * Return a pointer to the Dn register within env in 32-bit mode.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_a64(CPUARMState *env, int el, int fp_el,
     return rebuild_hflags_common(env, fp_el, mmu_idx, flags);
 }
 
+static uint32_t rebuild_hflags_internal(CPUARMState *env)
+{
+    int el = arm_current_el(env);
+    int fp_el = fp_exception_el(env, el);
+    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
+
+    if (is_a64(env)) {
+        return rebuild_hflags_a64(env, el, fp_el, mmu_idx);
+    } else if (arm_feature(env, ARM_FEATURE_M)) {
+        return rebuild_hflags_m32(env, fp_el, mmu_idx);
+    } else {
+        return rebuild_hflags_a32(env, fp_el, mmu_idx);
+    }
+}
+
+void arm_rebuild_hflags(CPUARMState *env)
+{
+    env->hflags = rebuild_hflags_internal(env);
+}
+
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                           target_ulong *cs_base, uint32_t *pflags)
 {
-    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
-    int current_el = arm_current_el(env);
-    int fp_el = fp_exception_el(env, current_el);
     uint32_t flags, pstate_for_ss;
 
+    flags = rebuild_hflags_internal(env);
+
     if (is_a64(env)) {
         *pc = env->pc;
-        flags = rebuild_hflags_a64(env, current_el, fp_el, mmu_idx);
         if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
         }
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         *pc = env->regs[15];
 
         if (arm_feature(env, ARM_FEATURE_M)) {
-            flags = rebuild_hflags_m32(env, fp_el, mmu_idx);
-
             if (arm_feature(env, ARM_FEATURE_M_SECURITY) &&
                 FIELD_EX32(env->v7m.fpccr[M_REG_S], V7M_FPCCR, S)
                 != env->v7m.secure) {
@@ -XXX,XX +XXX,XX @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                 flags = FIELD_DP32(flags, TBFLAG_A32, LSPACT, 1);
             }
         } else {
-            flags = rebuild_hflags_a32(env, fp_el, mmu_idx);
-
             /*
              * Note that XSCALE_CPAR shares bits with VECSTRIDE.
              * Note that VECLEN+VECSTRIDE are RES0 for M-profile.
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Avoid calling arm_current_el() twice.
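
The pattern, sketched with toy types: a variant that takes the EL as a
parameter, plus a thin wrapper for callers that do not have it to
hand. Everything prefixed sk_ is hypothetical, and the index
arithmetic is a placeholder rather than the real MMU index mapping.

```c
/* Toy environment; only the current EL matters for this sketch. */
typedef struct {
    int el;
} SkEnv;

static int sk_current_el_calls;

/* Stand-in for the EL lookup; counts how often it is invoked. */
static int sk_current_el(const SkEnv *env)
{
    sk_current_el_calls++;
    return env->el;
}

/* Variant that is handed the EL, so it never looks it up itself. */
static int sk_mmu_idx_el(const SkEnv *env, int el)
{
    (void)env;
    return 10 + el; /* placeholder mapping */
}

/* Thin wrapper kept for callers that only have env. */
static int sk_mmu_idx(const SkEnv *env)
{
    return sk_mmu_idx_el(env, sk_current_el(env));
}

/* A caller that already knows the EL reuses it for both values. */
static int sk_rebuild(const SkEnv *env)
{
    int el = sk_current_el(env);
    int mmu_idx = sk_mmu_idx_el(env, el);

    return (el << 8) | mmu_idx;
}

/* Report how many EL lookups one rebuild performs. */
static int sk_calls_during_rebuild(int el)
{
    SkEnv env = { el };

    sk_current_el_calls = 0;
    (void)sk_rebuild(&env);
    return sk_current_el_calls;
}
```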

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h |  9 +++++++++
 target/arm/helper.c    | 12 +++++++-----
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ void arm_cpu_update_virq(ARMCPU *cpu);
  */
 void arm_cpu_update_vfiq(ARMCPU *cpu);
 
+/**
+ * arm_mmu_idx_el:
+ * @env: The cpu environment
+ * @el: The EL to use.
+ *
+ * Return the full ARMMMUIdx for the translation regime for EL.
+ */
+ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el);
+
 /**
  * arm_mmu_idx:
  * @env: The cpu environment
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 }
 #endif
 
-ARMMMUIdx arm_mmu_idx(CPUARMState *env)
+ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
 {
-    int el;
-
     if (arm_feature(env, ARM_FEATURE_M)) {
         return arm_v7m_mmu_idx_for_secstate(env, env->v7m.secure);
     }
 
-    el = arm_current_el(env);
     if (el < 2 && arm_is_secure_below_el3(env)) {
         return ARMMMUIdx_S1SE0 + el;
     } else {
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx(CPUARMState *env)
     }
 }
 
+ARMMMUIdx arm_mmu_idx(CPUARMState *env)
+{
+    return arm_mmu_idx_el(env, arm_current_el(env));
+}
+
 int cpu_mmu_index(CPUARMState *env, bool ifetch)
 {
     return arm_to_core_mmu_idx(arm_mmu_idx(env));
@@ -XXX,XX +XXX,XX @@ static uint32_t rebuild_hflags_internal(CPUARMState *env)
 {
     int el = arm_current_el(env);
     int fp_el = fp_exception_el(env, el);
-    ARMMMUIdx mmu_idx = arm_mmu_idx(env);
+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
 
     if (is_a64(env)) {
         return rebuild_hflags_a64(env, el, fp_el, mmu_idx);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

By performing this store early, we avoid having to save and restore
the register holding the address around any function calls.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-15-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

From: Richard Henderson <richard.henderson@linaro.org>

These functions are given the mode and EL state of the cpu
and write the computed value to env->hflags.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-16-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h |  4 ++++
 target/arm/helper.c | 24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_4(msr_banked, void, env, i32, i32, i32)
 DEF_HELPER_2(get_user_reg, i32, env, i32)
 DEF_HELPER_3(set_user_reg, void, env, i32, i32)
 
+DEF_HELPER_FLAGS_2(rebuild_hflags_m32, TCG_CALL_NO_RWG, void, env, int)
+DEF_HELPER_FLAGS_2(rebuild_hflags_a32, TCG_CALL_NO_RWG, void, env, int)
+DEF_HELPER_FLAGS_2(rebuild_hflags_a64, TCG_CALL_NO_RWG, void, env, int)
+
 DEF_HELPER_1(vfp_get_fpscr, i32, env)
 DEF_HELPER_2(vfp_set_fpscr, void, env, i32)
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void arm_rebuild_hflags(CPUARMState *env)
     env->hflags = rebuild_hflags_internal(env);
 }
 
+void HELPER(rebuild_hflags_m32)(CPUARMState *env, int el)
+{
+    int fp_el = fp_exception_el(env, el);
+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
+
+    env->hflags = rebuild_hflags_m32(env, fp_el, mmu_idx);
+}
+
+void HELPER(rebuild_hflags_a32)(CPUARMState *env, int el)
+{
+    int fp_el = fp_exception_el(env, el);
+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
+
+    env->hflags = rebuild_hflags_a32(env, fp_el, mmu_idx);
+}
+
+void HELPER(rebuild_hflags_a64)(CPUARMState *env, int el)
+{
+    int fp_el = fp_exception_el(env, el);
+    ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, el);
+
+    env->hflags = rebuild_hflags_a64(env, el, fp_el, mmu_idx);
+}
+
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                           target_ulong *cs_base, uint32_t *pflags)
 {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Begin setting, but not relying upon, env->hflags.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/syscall.c    | 1 +
 target/arm/cpu.c        | 1 +
 target/arm/helper-a64.c | 3 +++
 target/arm/helper.c     | 2 ++
 target/arm/machine.c    | 1 +
 target/arm/op_helper.c  | 1 +
 6 files changed, 9 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -XXX,XX +XXX,XX @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
                     aarch64_sve_narrow_vq(env, vq);
                 }
                 env->vfp.zcr_el[1] = vq - 1;
+                arm_rebuild_hflags(env);
                 ret = vq * 16;
             }
             return ret;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
 
     hw_breakpoint_update_all(cpu);
     hw_watchpoint_update_all(cpu);
+    arm_rebuild_hflags(env);
 }
 
 bool arm_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -XXX,XX +XXX,XX @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
         } else {
             env->regs[15] = new_pc & ~0x3;
         }
+        helper_rebuild_hflags_a32(env, new_el);
         qemu_log_mask(CPU_LOG_INT, "Exception return from AArch64 EL%d to "
                       "AArch32 EL%d PC 0x%" PRIx32 "\n",
                       cur_el, new_el, env->regs[15]);
@@ -XXX,XX +XXX,XX @@ void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
         }
         aarch64_restore_sp(env, new_el);
         env->pc = new_pc;
+        helper_rebuild_hflags_a64(env, new_el);
         qemu_log_mask(CPU_LOG_INT, "Exception return from AArch64 EL%d to "
                       "AArch64 EL%d PC 0x%" PRIx64 "\n",
                       cur_el, new_el, env->pc);
     }
+
     /*
      * Note that cur_el can never be 0.  If new_el is 0, then
      * el0_a64 is return_to_aa64, else el0_a64 is ignored.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void take_aarch32_exception(CPUARMState *env, int new_mode,
         env->regs[14] = env->regs[15] + offset;
     }
     env->regs[15] = newpc;
+    arm_rebuild_hflags(env);
 }
 
 static void arm_cpu_do_interrupt_aarch32_hyp(CPUState *cs)
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
     pstate_write(env, PSTATE_DAIF | new_mode);
     env->aarch64 = 1;
     aarch64_restore_sp(env, new_el);
+    helper_rebuild_hflags_a64(env, new_el);
 
     env->pc = addr;
 
diff --git a/target/arm/machine.c b/target/arm/machine.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -XXX,XX +XXX,XX @@ static int cpu_post_load(void *opaque, int version_id)
     if (!kvm_enabled()) {
         pmu_op_finish(&cpu->env);
     }
+    arm_rebuild_hflags(&cpu->env);
 
     return 0;
 }
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(cpsr_write_eret)(CPUARMState *env, uint32_t val)
      * state. Do the masking now.
      */
     env->regs[15] &= (env->thumb ? ~1 : ~3);
+    arm_rebuild_hflags(env);
 
     qemu_mutex_lock_iothread();
     arm_call_el_change_hook(env_archcpu(env));
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Continue setting, but not relying upon, env->hflags.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-18-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 13 +++++++++++--
 target/arm/translate.c     | 28 +++++++++++++++++++++++-----
 2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
     if ((tb_cflags(s->base.tb) & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
         /* I/O operations must end the TB here (whether read or write) */
         s->base.is_jmp = DISAS_UPDATE;
-    } else if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
-        /* We default to ending the TB on a coprocessor register write,
+    }
+    if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
+        /*
+         * A write to any coprocessor register that ends a TB
+         * must rebuild the hflags for the next TB.
+         */
+        TCGv_i32 tcg_el = tcg_const_i32(s->current_el);
+        gen_helper_rebuild_hflags_a64(cpu_env, tcg_el);
+        tcg_temp_free_i32(tcg_el);
+        /*
+         * We default to ending the TB on a coprocessor register write,
          * but allow this to be suppressed by the register definition
          * (usually only necessary to work around guest bugs).
          */
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
     ri = get_arm_cp_reginfo(s->cp_regs,
             ENCODE_CP_REG(cpnum, is64, s->ns, crn, crm, opc1, opc2));
     if (ri) {
+        bool need_exit_tb;
+
         /* Check access permissions */
         if (!cp_access_ok(s->current_el, ri, isread)) {
             return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_coproc_insn(DisasContext *s, uint32_t insn)
             }
         }
 
-        if ((tb_cflags(s->base.tb) & CF_USE_ICOUNT) && (ri->type & ARM_CP_IO)) {
-            /* I/O operations must end the TB here (whether read or write) */
-            gen_lookup_tb(s);
-        } else if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
-            /* We default to ending the TB on a coprocessor register write,
+        /* I/O operations must end the TB here (whether read or write) */
+        need_exit_tb = ((tb_cflags(s->base.tb) & CF_USE_ICOUNT) &&
+                        (ri->type & ARM_CP_IO));
+
+        if (!isread && !(ri->type & ARM_CP_SUPPRESS_TB_END)) {
+            /*
+             * A write to any coprocessor register that ends a TB
+             * must rebuild the hflags for the next TB.
+             */
+            TCGv_i32 tcg_el = tcg_const_i32(s->current_el);
+            if (arm_dc_feature(s, ARM_FEATURE_M)) {
+                gen_helper_rebuild_hflags_m32(cpu_env, tcg_el);
+            } else {
+                gen_helper_rebuild_hflags_a32(cpu_env, tcg_el);
+            }
+            tcg_temp_free_i32(tcg_el);
+            /*
+             * We default to ending the TB on a coprocessor register write,
              * but allow this to be suppressed by the register definition
              * (usually only necessary to work around guest bugs).
              */
+            need_exit_tb = true;
+        }
+        if (need_exit_tb) {
             gen_lookup_tb(s);
         }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Continue setting, but not relying upon, env->hflags.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-19-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/op_helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(usat16)(CPUARMState *env, uint32_t x, uint32_t shift)
 void HELPER(setend)(CPUARMState *env)
 {
     env->uncached_cpsr ^= CPSR_E;
+    arm_rebuild_hflags(env);
 }
 
 /* Function checks whether WFx (WFI/WFE) instructions are set up to be trapped.
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(cpsr_read)(CPUARMState *env)
 void HELPER(cpsr_write)(CPUARMState *env, uint32_t val, uint32_t mask)
 {
     cpsr_write(env, val, mask, CPSRWriteByInstr);
+    /* TODO: Not all cpsr bits are relevant to hflags.  */
+    arm_rebuild_hflags(env);
 }
 
 /* Write the CPSR for a 32-bit exception return */
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Continue setting, but not relying upon, env->hflags.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-20-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void sctlr_write(CPUARMState *env, const ARMCPRegInfo *ri,
     /* ??? Lots of these bits are not implemented.  */
     /* This may enable/disable the MMU, so do a TLB flush.  */
     tlb_flush(CPU(cpu));
+
+    if (ri->type & ARM_CP_SUPPRESS_TB_END) {
+        /*
+         * Normally we would always end the TB on an SCTLR write; see the
+         * comment in ARMCPRegInfo sctlr initialization below for why Xscale
+         * is special.  Setting ARM_CP_SUPPRESS_TB_END also stops the rebuild
+         * of hflags from the translator, so do it here.
+         */
+        arm_rebuild_hflags(env);
+    }
 }
 
 static CPAccessResult fpexc32_access(CPUARMState *env, const ARMCPRegInfo *ri,
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Continue setting, but not relying upon, env->hflags.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-21-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/m_helper.c  | 6 ++++++
 target/arm/translate.c | 5 ++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_bxns)(CPUARMState *env, uint32_t dest)
     switch_v7m_security_state(env, dest & 1);
     env->thumb = 1;
     env->regs[15] = dest & ~1;
+    arm_rebuild_hflags(env);
 }
 
 void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest)
@@ -XXX,XX +XXX,XX @@ void HELPER(v7m_blxns)(CPUARMState *env, uint32_t dest)
     switch_v7m_security_state(env, 0);
     env->thumb = 1;
     env->regs[15] = dest;
+    arm_rebuild_hflags(env);
 }
 
 static uint32_t *get_v7m_sp_ptr(CPUARMState *env, bool secure, bool threadmode,
@@ -XXX,XX +XXX,XX @@ static void v7m_exception_taken(ARMCPU *cpu, uint32_t lr, bool dotailchain,
     env->regs[14] = lr;
     env->regs[15] = addr & 0xfffffffe;
     env->thumb = addr & 1;
+    arm_rebuild_hflags(env);
 }
 
 static void v7m_update_fpccr(CPUARMState *env, uint32_t frameptr,
@@ -XXX,XX +XXX,XX @@ static void do_v7m_exception_exit(ARMCPU *cpu)
 
     /* Otherwise, we have a successful exception exit. */
     arm_clear_exclusive(env);
+    arm_rebuild_hflags(env);
     qemu_log_mask(CPU_LOG_INT, "...successful exception return\n");
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_v7m_function_return(ARMCPU *cpu)
     xpsr_write(env, 0, XPSR_IT);
     env->thumb = newpc & 1;
     env->regs[15] = newpc & ~1;
+    arm_rebuild_hflags(env);
 
     qemu_log_mask(CPU_LOG_INT, "...function return successful\n");
     return true;
@@ -XXX,XX +XXX,XX @@ static bool v7m_handle_execute_nsc(ARMCPU *cpu)
     switch_v7m_security_state(env, true);
     xpsr_write(env, 0, XPSR_IT);
     env->regs[15] += 4;
+    arm_rebuild_hflags(env);
     return true;
 
 gen_invep:
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static bool trans_MRS_v7m(DisasContext *s, arg_MRS_v7m *a)
 
 static bool trans_MSR_v7m(DisasContext *s, arg_MSR_v7m *a)
 {
-    TCGv_i32 addr, reg;
+    TCGv_i32 addr, reg, el;
 
     if (!arm_dc_feature(s, ARM_FEATURE_M)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_MSR_v7m(DisasContext *s, arg_MSR_v7m *a)
     gen_helper_v7m_msr(cpu_env, addr, reg);
     tcg_temp_free_i32(addr);
     tcg_temp_free_i32(reg);
+    el = tcg_const_i32(s->current_el);
+    gen_helper_rebuild_hflags_m32(cpu_env, el);
+    tcg_temp_free_i32(el);
     gen_lookup_tb(s);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Continue setting, but not relying upon, env->hflags.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-22-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/intc/armv7m_nvic.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/armv7m_nvic.c
+++ b/hw/intc/armv7m_nvic.c
@@ -XXX,XX +XXX,XX @@ static MemTxResult nvic_sysreg_write(void *opaque, hwaddr addr,
             }
         }
         nvic_irq_update(s);
-        return MEMTX_OK;
+        goto exit_ok;
     case 0x200 ... 0x23f: /* NVIC Set pend */
         /* the special logic in armv7m_nvic_set_pending()
          * is not needed since IRQs are never escalated
@@ -XXX,XX +XXX,XX @@ static MemTxResult nvic_sysreg_write(void *opaque, hwaddr addr,
             }
         }
         nvic_irq_update(s);
-        return MEMTX_OK;
+        goto exit_ok;
     case 0x300 ... 0x33f: /* NVIC Active */
-        return MEMTX_OK; /* R/O */
+        goto exit_ok; /* R/O */
     case 0x400 ... 0x5ef: /* NVIC Priority */
         startvec = (offset - 0x400) + NVIC_FIRST_IRQ; /* vector # */
 
@@ -XXX,XX +XXX,XX @@ static MemTxResult nvic_sysreg_write(void *opaque, hwaddr addr,
             }
         }
         nvic_irq_update(s);
-        return MEMTX_OK;
+        goto exit_ok;
     case 0xd18 ... 0xd1b: /* System Handler Priority (SHPR1) */
         if (!arm_feature(&s->cpu->env, ARM_FEATURE_M_MAIN)) {
-            return MEMTX_OK;
+            goto exit_ok;
         }
         /* fall through */
     case 0xd1c ... 0xd23: /* System Handler Priority (SHPR2, SHPR3) */
@@ -XXX,XX +XXX,XX @@ static MemTxResult nvic_sysreg_write(void *opaque, hwaddr addr,
             set_prio(s, hdlidx, sbank, newprio);
         }
         nvic_irq_update(s);
-        return MEMTX_OK;
+        goto exit_ok;
     case 0xd28 ... 0xd2b: /* Configurable Fault Status (CFSR) */
         if (!arm_feature(&s->cpu->env, ARM_FEATURE_M_MAIN)) {
-            return MEMTX_OK;
+            goto exit_ok;
         }
         /* All bits are W1C, so construct 32 bit value with 0s in
          * the parts not written by the access size
@@ -XXX,XX +XXX,XX @@ static MemTxResult nvic_sysreg_write(void *opaque, hwaddr addr,
              */
             s->cpu->env.v7m.cfsr[M_REG_NS] &= ~(value & R_V7M_CFSR_BFSR_MASK);
         }
-        return MEMTX_OK;
+        goto exit_ok;
     }
     if (size == 4) {
         nvic_writel(s, offset, value, attrs);
-        return MEMTX_OK;
+        goto exit_ok;
     }
     qemu_log_mask(LOG_GUEST_ERROR,
                   "NVIC: Bad write of size %d at offset 0x%x\n", size, offset);
     /* This is UNPREDICTABLE; treat as RAZ/WI */
+
+ exit_ok:
+    /* Ensure any changes made are reflected in the cached hflags.  */
+    arm_rebuild_hflags(&s->cpu->env);
     return MEMTX_OK;
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This is the payoff.

From perf record -g data of an Ubuntu 18 boot and shutdown:

BEFORE:

-   23.02%     2.82%  qemu-system-aar  [.] helper_lookup_tb_ptr
   - 20.22% helper_lookup_tb_ptr
      + 10.05% tb_htable_lookup
      - 9.13% cpu_get_tb_cpu_state
           3.20% aa64_va_parameters_both
           0.55% fp_exception_el

-   11.66%     4.74%  qemu-system-aar  [.] cpu_get_tb_cpu_state
   - 6.96% cpu_get_tb_cpu_state
        3.63% aa64_va_parameters_both
        0.60% fp_exception_el
        0.53% sve_exception_el

AFTER:

-   16.40%     3.40%  qemu-system-aar  [.] helper_lookup_tb_ptr
   - 13.03% helper_lookup_tb_ptr
      + 11.19% tb_htable_lookup
        0.55% cpu_get_tb_cpu_state

0.98%     0.71%  qemu-system-aar  [.] cpu_get_tb_cpu_state

0.87%     0.24%  qemu-system-aar  [.] rebuild_hflags_a64

Before, helper_lookup_tb_ptr is the second hottest function in the
application, consuming almost a quarter of the runtime.  Within the
entire execution, cpu_get_tb_cpu_state consumes about 12%.

After, helper_lookup_tb_ptr has dropped to the fourth hottest function,
with consumption dropping to a sixth of the runtime.  Within the
entire execution, cpu_get_tb_cpu_state has dropped below 1%, and the
supporting function to rebuild hflags also consumes about 1%.

Assertions are retained for --enable-debug-tcg.
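
The pattern can be sketched in isolation roughly as follows. This is a
toy model under stated assumptions, not the QEMU implementation:
ToyState, toy_rebuild_flags(), toy_set_el(), toy_get_flags() and the
TOY_DEBUG macro are all hypothetical names standing in for the real
state, rebuild function and CONFIG_DEBUG_TCG check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced CPU state with a cached, derived flag word. */
typedef struct ToyState {
    uint32_t mode;
    uint32_t el;
    uint32_t cached_flags;
} ToyState;

/* The (comparatively) expensive computation we want off the hot path. */
static uint32_t toy_rebuild_flags(const ToyState *s)
{
    return (s->mode << 2) | (s->el & 3u);
}

/* Every writer of an input must refresh the cache. */
void toy_set_el(ToyState *s, uint32_t el)
{
    s->el = el;
    s->cached_flags = toy_rebuild_flags(s);
}

/* Hot path: read the cache; recompute only to cross-check in debug builds. */
uint32_t toy_get_flags(const ToyState *s)
{
    uint32_t flags = s->cached_flags;
#ifdef TOY_DEBUG
    assert(flags == toy_rebuild_flags(s));
#endif
    return flags;
}
```

In a build with TOY_DEBUG defined, a writer that forgets to refresh the
cache is caught by the assertion on the next read; in other builds the
read is a plain field load.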

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191023150057.25731-25-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(rebuild_hflags_a64)(CPUARMState *env, int el)
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                           target_ulong *cs_base, uint32_t *pflags)
 {
-    uint32_t flags, pstate_for_ss;
+    uint32_t flags = env->hflags;
+    uint32_t pstate_for_ss;
 
     *cs_base = 0;
-    flags = rebuild_hflags_internal(env);
+#ifdef CONFIG_DEBUG_TCG
+    assert(flags == rebuild_hflags_internal(env));
+#endif
 
-    if (is_a64(env)) {
+    if (FIELD_EX32(flags, TBFLAG_ANY, AARCH64_STATE)) {
         *pc = env->pc;
         if (cpu_isar_feature(aa64_bti, env_archcpu(env))) {
             flags = FIELD_DP32(flags, TBFLAG_A64, BTYPE, env->btype);
-- 
2.20.1

Switch the fsl_etsec code away from bottom-half based ptimers to
the new transaction-based ptimer API.  This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
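
For readers unfamiliar with the begin/commit discipline, the following
standalone sketch models it with a toy timer. It deliberately does not
use the real ptimer functions: ToyTimer and every toy_timer_*() name
are hypothetical, and the assertions are this example's own way of
showing that state changes are expected to happen only inside a
transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy timer; not QEMU's ptimer_state. */
typedef struct ToyTimer {
    bool in_txn;       /* true between begin and commit */
    bool enabled;
    uint64_t count;
    unsigned commits;  /* number of completed transactions */
} ToyTimer;

void toy_timer_begin(ToyTimer *t)
{
    assert(!t->in_txn);        /* transactions do not nest in this toy */
    t->in_txn = true;
}

void toy_timer_set_count(ToyTimer *t, uint64_t count)
{
    assert(t->in_txn);         /* modifications only inside a transaction */
    t->count = count;
}

void toy_timer_run(ToyTimer *t)
{
    assert(t->in_txn);
    t->enabled = true;
}

void toy_timer_commit(ToyTimer *t)
{
    assert(t->in_txn);
    t->in_txn = false;
    t->commits++;              /* consequences of the changes apply here */
}

/* A device-side caller brackets all of its changes in one transaction. */
void toy_start_polling(ToyTimer *t)
{
    toy_timer_begin(t);
    toy_timer_set_count(t, 1);
    toy_timer_run(t);
    toy_timer_commit(t);
}
```

Grouping the changes this way means the timer sees one consistent
update at commit time rather than a series of intermediate states.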

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20191017132122.4402-2-peter.maydell@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/net/fsl_etsec/etsec.h | 1 -
 hw/net/fsl_etsec/etsec.c | 9 +++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/net/fsl_etsec/etsec.h b/hw/net/fsl_etsec/etsec.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/fsl_etsec/etsec.h
+++ b/hw/net/fsl_etsec/etsec.h
@@ -XXX,XX +XXX,XX @@ typedef struct eTSEC {
     uint16_t phy_control;
 
     /* Polling */
-    QEMUBH *bh;
     struct ptimer_state *ptimer;
 
     /* Whether we should flush the rx queue when buffer becomes available. */
diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/fsl_etsec/etsec.c
+++ b/hw/net/fsl_etsec/etsec.c
@@ -XXX,XX +XXX,XX @@
 #include "etsec.h"
 #include "registers.h"
 #include "qemu/log.h"
-#include "qemu/main-loop.h"
 #include "qemu/module.h"
 
 /* #define HEX_DUMP */
@@ -XXX,XX +XXX,XX @@ static void write_dmactrl(eTSEC          *etsec,
 
     if (!(value & DMACTRL_WOP)) {
         /* Start polling */
+        ptimer_transaction_begin(etsec->ptimer);
         ptimer_stop(etsec->ptimer);
         ptimer_set_count(etsec->ptimer, 1);
         ptimer_run(etsec->ptimer, 1);
+        ptimer_transaction_commit(etsec->ptimer);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void etsec_realize(DeviceState *dev, Error **errp)
                               object_get_typename(OBJECT(dev)), dev->id, etsec);
     qemu_format_nic_info_str(qemu_get_queue(etsec->nic), etsec->conf.macaddr.a);
 
-
-    etsec->bh     = qemu_bh_new(etsec_timer_hit, etsec);
-    etsec->ptimer = ptimer_init_with_bh(etsec->bh, PTIMER_POLICY_DEFAULT);
+    etsec->ptimer = ptimer_init(etsec_timer_hit, etsec, PTIMER_POLICY_DEFAULT);
+    ptimer_transaction_begin(etsec->ptimer);
     ptimer_set_freq(etsec->ptimer, 100);
+    ptimer_transaction_commit(etsec->ptimer);
 }
 
 static void etsec_instance_init(Object *obj)
-- 
2.20.1

Switch the xilinx_timer code away from bottom-half based ptimers to
the new transaction-based ptimer API.  This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20191017132122.4402-3-peter.maydell@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/timer/xilinx_timer.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/timer/xilinx_timer.c b/hw/timer/xilinx_timer.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/xilinx_timer.c
+++ b/hw/timer/xilinx_timer.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/ptimer.h"
 #include "hw/qdev-properties.h"
 #include "qemu/log.h"
-#include "qemu/main-loop.h"
 #include "qemu/module.h"
 
 #define D(x)
@@ -XXX,XX +XXX,XX @@
 
 struct xlx_timer
 {
-    QEMUBH *bh;
     ptimer_state *ptimer;
     void *parent;
     int nr; /* for debug.  */
@@ -XXX,XX +XXX,XX @@ timer_read(void *opaque, hwaddr addr, unsigned int size)
     return r;
 }
 
+/* Must be called inside ptimer transaction block */
 static void timer_enable(struct xlx_timer *xt)
 {
     uint64_t count;
@@ -XXX,XX +XXX,XX @@ timer_write(void *opaque, hwaddr addr,
                 value &= ~TCSR_TINT;
 
             xt->regs[addr] = value & 0x7ff;
-            if (value & TCSR_ENT)
+            if (value & TCSR_ENT) {
+                ptimer_transaction_begin(xt->ptimer);
                 timer_enable(xt);
+                ptimer_transaction_commit(xt->ptimer);
+            }
             break;
  
         default:
@@ -XXX,XX +XXX,XX @@ static void xilinx_timer_realize(DeviceState *dev, Error **errp)
 
         xt->parent = t;
         xt->nr = i;
-        xt->bh = qemu_bh_new(timer_hit, xt);
-        xt->ptimer = ptimer_init_with_bh(xt->bh, PTIMER_POLICY_DEFAULT);
+        xt->ptimer = ptimer_init(timer_hit, xt, PTIMER_POLICY_DEFAULT);
+        ptimer_transaction_begin(xt->ptimer);
         ptimer_set_freq(xt->ptimer, t->freq_hz);
+        ptimer_transaction_commit(xt->ptimer);
     }
 
     memory_region_init_io(&t->mmio, OBJECT(t), &timer_ops, t, "xlnx.xps-timer",
-- 
2.20.1

Switch the xilinx_axidma code away from bottom-half based ptimers to
the new transaction-based ptimer API.  This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20191017132122.4402-4-peter.maydell@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/dma/xilinx_axidma.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/ptimer.h"
 #include "hw/qdev-properties.h"
 #include "qemu/log.h"
-#include "qemu/main-loop.h"
 #include "qemu/module.h"
 
 #include "hw/stream.h"
@@ -XXX,XX +XXX,XX @@ enum {
 };
 
 struct Stream {
-    QEMUBH *bh;
     ptimer_state *ptimer;
     qemu_irq irq;
 
@@ -XXX,XX +XXX,XX @@ static void stream_complete(struct Stream *s)
     unsigned int comp_delay;
 
     /* Start the delayed timer.  */
+    ptimer_transaction_begin(s->ptimer);
     comp_delay = s->regs[R_DMACR] >> 24;
     if (comp_delay) {
         ptimer_stop(s->ptimer);
@@ -XXX,XX +XXX,XX @@ static void stream_complete(struct Stream *s)
         s->regs[R_DMASR] |= DMASR_IOC_IRQ;
         stream_reload_complete_cnt(s);
     }
+    ptimer_transaction_commit(s->ptimer);
 }
 
 static void stream_process_mem2s(struct Stream *s, StreamSlave *tx_data_dev,
@@ -XXX,XX +XXX,XX @@ static void xilinx_axidma_realize(DeviceState *dev, Error **errp)
         struct Stream *st = &s->streams[i];
 
         st->nr = i;
-        st->bh = qemu_bh_new(timer_hit, st);
-        st->ptimer = ptimer_init_with_bh(st->bh, PTIMER_POLICY_DEFAULT);
+        st->ptimer = ptimer_init(timer_hit, st, PTIMER_POLICY_DEFAULT);
+        ptimer_transaction_begin(st->ptimer);
         ptimer_set_freq(st->ptimer, s->freqhz);
+        ptimer_transaction_commit(st->ptimer);
     }
     return;
 
-- 
2.20.1

In the slavio timer device, the ptimer TimerContext::timer is
always created by slavio_timer_init(), so there's no need to
check it for NULL; remove the single unneeded NULL check.

This will be useful to avoid compiler/Coverity errors when
a subsequent change adds a use of t->timer before the location
we currently do the NULL check.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191021134357.14266-2-peter.maydell@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/timer/slavio_timer.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/timer/slavio_timer.c b/hw/timer/slavio_timer.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/slavio_timer.c
+++ b/hw/timer/slavio_timer.c
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
             // set limit, reset counter
             qemu_irq_lower(t->irq);
             t->limit = val & TIMER_MAX_COUNT32;
-            if (t->timer) {
-                if (t->limit == 0) { /* free-run */
-                    ptimer_set_limit(t->timer,
-                                     LIMIT_TO_PERIODS(TIMER_MAX_COUNT32), 1);
-                } else {
-                    ptimer_set_limit(t->timer, LIMIT_TO_PERIODS(t->limit), 1);
-                }
+            if (t->limit == 0) { /* free-run */
+                ptimer_set_limit(t->timer,
+                                 LIMIT_TO_PERIODS(TIMER_MAX_COUNT32), 1);
+            } else {
+                ptimer_set_limit(t->timer, LIMIT_TO_PERIODS(t->limit), 1);
             }
         }
         break;
-- 
2.20.1

Switch the slavio_timer code away from bottom-half based ptimers to
the new transaction-based ptimer API.  This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20191021134357.14266-4-peter.maydell@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/timer/slavio_timer.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/hw/timer/slavio_timer.c b/hw/timer/slavio_timer.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/slavio_timer.c
+++ b/hw/timer/slavio_timer.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/sysbus.h"
 #include "migration/vmstate.h"
 #include "trace.h"
-#include "qemu/main-loop.h"
 #include "qemu/module.h"
 
 /*
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
     saddr = addr >> 2;
     switch (saddr) {
     case TIMER_LIMIT:
+        ptimer_transaction_begin(t->timer);
         if (slavio_timer_is_user(tc)) {
             uint64_t count;
 
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
                 ptimer_set_limit(t->timer, LIMIT_TO_PERIODS(t->limit), 1);
             }
         }
+        ptimer_transaction_commit(t->timer);
         break;
     case TIMER_COUNTER:
         if (slavio_timer_is_user(tc)) {
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
             t->reached = 0;
             count = ((uint64_t)t->counthigh) << 32 | t->count;
             trace_slavio_timer_mem_writel_limit(timer_index, count);
+            ptimer_transaction_begin(t->timer);
             ptimer_set_count(t->timer, LIMIT_TO_PERIODS(t->limit - count));
+            ptimer_transaction_commit(t->timer);
         } else {
             trace_slavio_timer_mem_writel_counter_invalid();
         }
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
     case TIMER_COUNTER_NORST:
         // set limit without resetting counter
         t->limit = val & TIMER_MAX_COUNT32;
+        ptimer_transaction_begin(t->timer);
         if (t->limit == 0) { /* free-run */
             ptimer_set_limit(t->timer, LIMIT_TO_PERIODS(TIMER_MAX_COUNT32), 0);
         } else {
             ptimer_set_limit(t->timer, LIMIT_TO_PERIODS(t->limit), 0);
         }
+        ptimer_transaction_commit(t->timer);
         break;
     case TIMER_STATUS:
+        ptimer_transaction_begin(t->timer);
         if (slavio_timer_is_user(tc)) {
             // start/stop user counter
             if (val & 1) {
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
             }
         }
         t->run = val & 1;
+        ptimer_transaction_commit(t->timer);
         break;
     case TIMER_MODE:
         if (timer_index == 0) {
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
                 unsigned int processor = 1 << i;
                 CPUTimerState *curr_timer = &s->cputimer[i + 1];
 
+                ptimer_transaction_begin(curr_timer->timer);
                 // check for a change in timer mode for this processor
                 if ((val & processor) != (s->cputimer_mode & processor)) {
                     if (val & processor) { // counter -> user timer
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_mem_writel(void *opaque, hwaddr addr,
                         trace_slavio_timer_mem_writel_mode_counter(timer_index);
                     }
                 }
+                ptimer_transaction_commit(curr_timer->timer);
             }
         } else {
             trace_slavio_timer_mem_writel_mode_invalid();
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_reset(DeviceState *d)
         curr_timer->count = 0;
         curr_timer->reached = 0;
         if (i <= s->num_cpus) {
+            ptimer_transaction_begin(curr_timer->timer);
             ptimer_set_limit(curr_timer->timer,
                              LIMIT_TO_PERIODS(TIMER_MAX_COUNT32), 1);
             ptimer_run(curr_timer->timer, 0);
             curr_timer->run = 1;
+            ptimer_transaction_commit(curr_timer->timer);
         }
     }
     s->cputimer_mode = 0;
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_init(Object *obj)
 {
     SLAVIO_TIMERState *s = SLAVIO_TIMER(obj);
     SysBusDevice *dev = SYS_BUS_DEVICE(obj);
-    QEMUBH *bh;
     unsigned int i;
     TimerContext *tc;
 
@@ -XXX,XX +XXX,XX @@ static void slavio_timer_init(Object *obj)
         tc->s = s;
         tc->timer_index = i;
 
-        bh = qemu_bh_new(slavio_timer_irq, tc);
-        s->cputimer[i].timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT);
+        s->cputimer[i].timer = ptimer_init(slavio_timer_irq, tc,
+                                           PTIMER_POLICY_DEFAULT);
+        ptimer_transaction_begin(s->cputimer[i].timer);
         ptimer_set_period(s->cputimer[i].timer, TIMER_PERIOD);
+        ptimer_transaction_commit(s->cputimer[i].timer);
 
         size = i == 0 ? SYS_TIMER_SIZE : CPU_TIMER_SIZE;
         snprintf(timer_name, sizeof(timer_name), "timer-%i", i);
-- 
2.20.1

Switch the grlib_gptimer code away from bottom-half based ptimers to
the new transaction-based ptimer API.  This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.
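
As a rough illustration of the begin/commit discipline described above,
here is a minimal, self-contained sketch. It does not use the real QEMU
ptimer API: MockPtimer and every mock_* function are hypothetical
stand-ins, and the assertions only model the rule that state changes
must happen inside a transaction, including on early-exit paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a ptimer; not QEMU's ptimer_state. */
typedef struct MockPtimer {
    bool in_transaction;   /* true between begin and commit */
    bool running;
    uint64_t limit;
    uint64_t count;
    unsigned commits;      /* number of completed transactions */
} MockPtimer;

static void mock_transaction_begin(MockPtimer *t)
{
    assert(!t->in_transaction);    /* transactions do not nest here */
    t->in_transaction = true;
}

static void mock_transaction_commit(MockPtimer *t)
{
    assert(t->in_transaction);
    t->in_transaction = false;
    t->commits++;
}

/* Every state-modifying call checks that a transaction is open. */
static void mock_set_limit(MockPtimer *t, uint64_t limit, bool reload)
{
    assert(t->in_transaction);
    t->limit = limit;
    if (reload) {
        t->count = limit;
    }
}

static void mock_run(MockPtimer *t)
{
    assert(t->in_transaction);
    t->running = true;
}

static void mock_stop(MockPtimer *t)
{
    assert(t->in_transaction);
    t->running = false;
}

/*
 * Device-side helper: an early exit is routed through a single
 * commit so the transaction is always closed.
 */
static void mock_recalibrate(MockPtimer *t, bool enabled, uint64_t limit)
{
    mock_transaction_begin(t);
    mock_stop(t);
    if (!enabled) {
        goto exit;
    }
    mock_set_limit(t, limit, true);
    mock_run(t);
exit:
    mock_transaction_commit(t);
}
```

Calling mock_recalibrate() with enabled set to false still performs
exactly one commit, which is the property the converted devices need
to preserve when a function can return early.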

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20191021134357.14266-3-peter.maydell@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/timer/grlib_gptimer.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/hw/timer/grlib_gptimer.c b/hw/timer/grlib_gptimer.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/grlib_gptimer.c
+++ b/hw/timer/grlib_gptimer.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/irq.h"
 #include "hw/ptimer.h"
 #include "hw/qdev-properties.h"
-#include "qemu/main-loop.h"
 #include "qemu/module.h"
 
 #include "trace.h"
@@ -XXX,XX +XXX,XX @@ typedef struct GPTimer     GPTimer;
 typedef struct GPTimerUnit GPTimerUnit;
 
 struct GPTimer {
-    QEMUBH *bh;
     struct ptimer_state *ptimer;
 
     qemu_irq     irq;
@@ -XXX,XX +XXX,XX @@ struct GPTimerUnit {
     uint32_t config;
 };
 
+static void grlib_gptimer_tx_begin(GPTimer *timer)
+{
+    ptimer_transaction_begin(timer->ptimer);
+}
+
+static void grlib_gptimer_tx_commit(GPTimer *timer)
+{
+    ptimer_transaction_commit(timer->ptimer);
+}
+
+/* Must be called within grlib_gptimer_tx_begin/commit block */
 static void grlib_gptimer_enable(GPTimer *timer)
 {
     assert(timer != NULL);
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_enable(GPTimer *timer)
     ptimer_run(timer->ptimer, 1);
 }
 
+/* Must be called within grlib_gptimer_tx_begin/commit block */
 static void grlib_gptimer_restart(GPTimer *timer)
 {
     assert(timer != NULL);
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_set_scaler(GPTimerUnit *unit, uint32_t scaler)
     trace_grlib_gptimer_set_scaler(scaler, value);
 
     for (i = 0; i < unit->nr_timers; i++) {
+        ptimer_transaction_begin(unit->timers[i].ptimer);
         ptimer_set_freq(unit->timers[i].ptimer, value);
+        ptimer_transaction_commit(unit->timers[i].ptimer);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_write(void *opaque, hwaddr addr,
         switch (timer_addr) {
         case COUNTER_OFFSET:
             trace_grlib_gptimer_writel(id, addr, value);
+            grlib_gptimer_tx_begin(&unit->timers[id]);
             unit->timers[id].counter = value;
             grlib_gptimer_enable(&unit->timers[id]);
+            grlib_gptimer_tx_commit(&unit->timers[id]);
             return;
 
         case COUNTER_RELOAD_OFFSET:
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_write(void *opaque, hwaddr addr,
             /* gptimer_restart calls gptimer_enable, so if "enable" and "load"
                bits are present, we just have to call restart. */
 
+            grlib_gptimer_tx_begin(&unit->timers[id]);
             if (value & GPTIMER_LOAD) {
                 grlib_gptimer_restart(&unit->timers[id]);
             } else if (value & GPTIMER_ENABLE) {
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_write(void *opaque, hwaddr addr,
             value &= ~(GPTIMER_LOAD & GPTIMER_DEBUG_HALT);
 
             unit->timers[id].config = value;
+            grlib_gptimer_tx_commit(&unit->timers[id]);
             return;
 
         default:
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_reset(DeviceState *d)
         timer->counter = 0;
         timer->reload = 0;
         timer->config = 0;
+        ptimer_transaction_begin(timer->ptimer);
         ptimer_stop(timer->ptimer);
         ptimer_set_count(timer->ptimer, 0);
         ptimer_set_freq(timer->ptimer, unit->freq_hz);
+        ptimer_transaction_commit(timer->ptimer);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void grlib_gptimer_realize(DeviceState *dev, Error **errp)
         GPTimer *timer = &unit->timers[i];
 
         timer->unit   = unit;
-        timer->bh     = qemu_bh_new(grlib_gptimer_hit, timer);
-        timer->ptimer = ptimer_init_with_bh(timer->bh, PTIMER_POLICY_DEFAULT);
+        timer->ptimer = ptimer_init(grlib_gptimer_hit, timer,
+                                    PTIMER_POLICY_DEFAULT);
         timer->id     = i;
 
         /* One IRQ line for each timer */
         sysbus_init_irq(sbd, &timer->irq);
 
+        ptimer_transaction_begin(timer->ptimer);
         ptimer_set_freq(timer->ptimer, unit->freq_hz);
+        ptimer_transaction_commit(timer->ptimer);
     }
 
     memory_region_init_io(&unit->iomem, OBJECT(unit), &grlib_gptimer_ops,
-- 
2.20.1

Switch the mcf5206 code away from bottom-half based ptimers to
the new transaction-based ptimer API.  This just requires adding
begin/commit calls around the various places that modify the ptimer
state, and using the new ptimer_init() function to create the timer.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20191021140600.10725-1-peter.maydell@linaro.org
---
 hw/m68k/mcf5206.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/m68k/mcf5206.c b/hw/m68k/mcf5206.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/m68k/mcf5206.c
+++ b/hw/m68k/mcf5206.c
@@ -XXX,XX +XXX,XX @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
-#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "hw/hw.h"
 #include "hw/irq.h"
@@ -XXX,XX +XXX,XX @@ static void m5206_timer_recalibrate(m5206_timer_state *s)
     int prescale;
     int mode;
 
+    ptimer_transaction_begin(s->timer);
     ptimer_stop(s->timer);
 
-    if ((s->tmr & TMR_RST) == 0)
-        return;
+    if ((s->tmr & TMR_RST) == 0) {
+        goto exit;
+    }
 
     prescale = (s->tmr >> 8) + 1;
     mode = (s->tmr >> 1) & 3;
@@ -XXX,XX +XXX,XX @@ static void m5206_timer_recalibrate(m5206_timer_state *s)
     ptimer_set_limit(s->timer, s->trr, 0);
 
     ptimer_run(s->timer, 0);
+exit:
+    ptimer_transaction_commit(s->timer);
 }
 
 static void m5206_timer_trigger(void *opaque)
@@ -XXX,XX +XXX,XX @@ static void m5206_timer_write(m5206_timer_state *s, uint32_t addr, uint32_t val)
         s->tcr = val;
         break;
     case 0xc:
+        ptimer_transaction_begin(s->timer);
         ptimer_set_count(s->timer, val);
+        ptimer_transaction_commit(s->timer);
         break;
     case 0x11:
         s->ter &= ~val;
@@ -XXX,XX +XXX,XX @@ static void m5206_timer_write(m5206_timer_state *s, uint32_t addr, uint32_t val)
 static m5206_timer_state *m5206_timer_init(qemu_irq irq)
 {
     m5206_timer_state *s;
-    QEMUBH *bh;
 
     s = g_new0(m5206_timer_state, 1);
-    bh = qemu_bh_new(m5206_timer_trigger, s);
-    s->timer = ptimer_init_with_bh(bh, PTIMER_POLICY_DEFAULT);
+    s->timer = ptimer_init(m5206_timer_trigger, s, PTIMER_POLICY_DEFAULT);
     s->irq = irq;
     m5206_timer_reset(s);
     return s;
-- 
2.20.1

Switch the milkymist-sysctl code away from bottom-half based
ptimers to the new transaction-based ptimer API.  This just requires
adding begin/commit calls around the various places that modify the
ptimer state, and using the new ptimer_init() function to create the
timer.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20191021141040.11007-1-peter.maydell@linaro.org
---
 hw/timer/milkymist-sysctl.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/hw/timer/milkymist-sysctl.c b/hw/timer/milkymist-sysctl.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/milkymist-sysctl.c
+++ b/hw/timer/milkymist-sysctl.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/ptimer.h"
 #include "hw/qdev-properties.h"
 #include "qemu/error-report.h"
-#include "qemu/main-loop.h"
 #include "qemu/module.h"
 
 enum {
@@ -XXX,XX +XXX,XX @@ struct MilkymistSysctlState {
 
     MemoryRegion regs_region;
 
-    QEMUBH *bh0;
-    QEMUBH *bh1;
     ptimer_state *ptimer0;
     ptimer_state *ptimer1;
 
@@ -XXX,XX +XXX,XX @@ static void sysctl_write(void *opaque, hwaddr addr, uint64_t value,
         s->regs[addr] = value;
         break;
     case R_TIMER0_COMPARE:
+        ptimer_transaction_begin(s->ptimer0);
         ptimer_set_limit(s->ptimer0, value, 0);
         s->regs[addr] = value;
+        ptimer_transaction_commit(s->ptimer0);
         break;
     case R_TIMER1_COMPARE:
+        ptimer_transaction_begin(s->ptimer1);
         ptimer_set_limit(s->ptimer1, value, 0);
         s->regs[addr] = value;
+        ptimer_transaction_commit(s->ptimer1);
         break;
     case R_TIMER0_CONTROL:
+        ptimer_transaction_begin(s->ptimer0);
         s->regs[addr] = value;
         if (s->regs[R_TIMER0_CONTROL] & CTRL_ENABLE) {
             trace_milkymist_sysctl_start_timer0();
@@ -XXX,XX +XXX,XX @@ static void sysctl_write(void *opaque, hwaddr addr, uint64_t value,
             trace_milkymist_sysctl_stop_timer0();
             ptimer_stop(s->ptimer0);
         }
+        ptimer_transaction_commit(s->ptimer0);
         break;
     case R_TIMER1_CONTROL:
+        ptimer_transaction_begin(s->ptimer1);
         s->regs[addr] = value;
         if (s->regs[R_TIMER1_CONTROL] & CTRL_ENABLE) {
             trace_milkymist_sysctl_start_timer1();
@@ -XXX,XX +XXX,XX @@ static void sysctl_write(void *opaque, hwaddr addr, uint64_t value,
             trace_milkymist_sysctl_stop_timer1();
             ptimer_stop(s->ptimer1);
         }
+        ptimer_transaction_commit(s->ptimer1);
         break;
     case R_ICAP:
         sysctl_icap_write(s, value);
@@ -XXX,XX +XXX,XX @@ static void milkymist_sysctl_reset(DeviceState *d)
         s->regs[i] = 0;
     }
 
+    ptimer_transaction_begin(s->ptimer0);
     ptimer_stop(s->ptimer0);
+    ptimer_transaction_commit(s->ptimer0);
+    ptimer_transaction_begin(s->ptimer1);
     ptimer_stop(s->ptimer1);
+    ptimer_transaction_commit(s->ptimer1);
 
     /* defaults */
     s->regs[R_ICAP] = ICAP_READY;
@@ -XXX,XX +XXX,XX @@ static void milkymist_sysctl_realize(DeviceState *dev, Error **errp)
 {
     MilkymistSysctlState *s = MILKYMIST_SYSCTL(dev);
 
-    s->bh0 = qemu_bh_new(timer0_hit, s);
-    s->bh1 = qemu_bh_new(timer1_hit, s);
-    s->ptimer0 = ptimer_init_with_bh(s->bh0, PTIMER_POLICY_DEFAULT);
-    s->ptimer1 = ptimer_init_with_bh(s->bh1, PTIMER_POLICY_DEFAULT);
+    s->ptimer0 = ptimer_init(timer0_hit, s, PTIMER_POLICY_DEFAULT);
+    s->ptimer1 = ptimer_init(timer1_hit, s, PTIMER_POLICY_DEFAULT);
 
+    ptimer_transaction_begin(s->ptimer0);
     ptimer_set_freq(s->ptimer0, s->freq_hz);
+    ptimer_transaction_commit(s->ptimer0);
+    ptimer_transaction_begin(s->ptimer1);
     ptimer_set_freq(s->ptimer1, s->freq_hz);
+    ptimer_transaction_commit(s->ptimer1);
 }
 
 static const VMStateDescription vmstate_milkymist_sysctl = {
-- 
2.20.1

From: Andrew Jones <drjones@redhat.com>

Add support for the query-cpu-model-expansion QMP command to Arm. We
do this selectively, only exposing CPU properties which represent
optional CPU features which the user may want to enable/disable.
Additionally we restrict the list of queryable cpu models to 'max',
'host', or the current type when KVM is in use. And, finally, we only
implement expansion type 'full', as Arm does not yet have a "base"
CPU type. More details and example queries are described in a new
document (docs/arm-cpu-features.rst).
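
The KVM restriction on queryable models can be sketched in isolation
as follows. This is only an illustration of the name check, not the
code added by this patch: sketch_model_allowed_with_kvm() is a
hypothetical helper, and the "-arm-cpu" suffix is an assumption about
the value of ARM_CPU_TYPE_SUFFIX.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed suffix appended to a model name to form the QOM type name. */
#define SKETCH_CPU_TYPE_SUFFIX "-arm-cpu"

/*
 * Return true if model_name may be expanded while KVM is in use:
 * either it is 'host' or 'max', or it matches the machine's current
 * CPU type once the type suffix is stripped.
 */
static bool sketch_model_allowed_with_kvm(const char *model_name,
                                          const char *machine_cpu_type)
{
    size_t type_len = strlen(machine_cpu_type);
    size_t suffix_len = strlen(SKETCH_CPU_TYPE_SUFFIX);
    size_t len;

    if (!strcmp(model_name, "host") || !strcmp(model_name, "max")) {
        return true;
    }
    if (type_len < suffix_len) {
        return false;   /* type name too short to carry the suffix */
    }
    len = type_len - suffix_len;

    /* Require an exact match, not just a shared prefix. */
    return strlen(model_name) == len &&
           !strncmp(model_name, machine_cpu_type, len);
}
```

The length comparison matters: without it a shorter name that is
merely a prefix of the current type would be accepted.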

Note, certainly more features may be added to the list of advertised
features, e.g. 'vfp' and 'neon'. The only requirement is that we can
detect invalid configurations and emit failures at QMP query time.
For 'vfp' and 'neon' this will require some refactoring to share a
validation function between the QMP query and the CPU realize
functions.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Beata Michalska <beata.michalska@linaro.org>
Message-id: 20191024121808.9612-2-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 qapi/machine-target.json  |   6 +-
 target/arm/monitor.c      | 146 ++++++++++++++++++++++++++++++++++++++
 docs/arm-cpu-features.rst | 137 +++++++++++++++++++++++++++++++++++
 3 files changed, 286 insertions(+), 3 deletions(-)
 create mode 100644 docs/arm-cpu-features.rst

diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -XXX,XX +XXX,XX @@
 ##
 { 'struct': 'CpuModelExpansionInfo',
   'data': { 'model': 'CpuModelInfo' },
-  'if': 'defined(TARGET_S390X) || defined(TARGET_I386)' }
+  'if': 'defined(TARGET_S390X) || defined(TARGET_I386) || defined(TARGET_ARM)' }
 
 ##
 # @query-cpu-model-expansion:
@@ -XXX,XX +XXX,XX @@
 #   query-cpu-model-expansion while using these is not advised.
 #
 # Some architectures may not support all expansion types. s390x supports
-# "full" and "static".
+# "full" and "static". Arm only supports "full".
 #
 # Returns: a CpuModelExpansionInfo. Returns an error if expanding CPU models is
 #          not supported, if the model cannot be expanded, if the model contains
@@ -XXX,XX +XXX,XX @@
   'data': { 'type': 'CpuModelExpansionType',
             'model': 'CpuModelInfo' },
   'returns': 'CpuModelExpansionInfo',
-  'if': 'defined(TARGET_S390X) || defined(TARGET_I386)' }
+  'if': 'defined(TARGET_S390X) || defined(TARGET_I386) || defined(TARGET_ARM)' }
 
 ##
 # @CpuDefinitionInfo:
diff --git a/target/arm/monitor.c b/target/arm/monitor.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/monitor.c
+++ b/target/arm/monitor.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/boards.h"
 #include "kvm_arm.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi/qapi-commands-machine-target.h"
 #include "qapi/qapi-commands-misc-target.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi/qmp/qdict.h"
+#include "qom/qom-qobject.h"
 
 static GICCapability *gic_cap_new(int version)
 {
@@ -XXX,XX +XXX,XX @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
 
     return head;
 }
+
+/*
+ * These are cpu model features we want to advertise. The order here
+ * matters as this is the order in which qmp_query_cpu_model_expansion
+ * will attempt to set them. If there are dependencies between features,
+ * then the order that considers those dependencies must be used.
+ */
+static const char *cpu_model_advertised_features[] = {
+    "aarch64", "pmu",
+    NULL
+};
+
+CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
+                                                     CpuModelInfo *model,
+                                                     Error **errp)
+{
+    CpuModelExpansionInfo *expansion_info;
+    const QDict *qdict_in = NULL;
+    QDict *qdict_out;
+    ObjectClass *oc;
+    Object *obj;
+    const char *name;
+    int i;
+
+    if (type != CPU_MODEL_EXPANSION_TYPE_FULL) {
+        error_setg(errp, "The requested expansion type is not supported");
+        return NULL;
+    }
+
+    if (!kvm_enabled() && !strcmp(model->name, "host")) {
+        error_setg(errp, "The CPU type '%s' requires KVM", model->name);
+        return NULL;
+    }
+
+    oc = cpu_class_by_name(TYPE_ARM_CPU, model->name);
+    if (!oc) {
+        error_setg(errp, "The CPU type '%s' is not a recognized ARM CPU type",
+                   model->name);
+        return NULL;
+    }
+
+    if (kvm_enabled()) {
+        const char *cpu_type = current_machine->cpu_type;
+        int len = strlen(cpu_type) - strlen(ARM_CPU_TYPE_SUFFIX);
+        bool supported = false;
+
+        if (!strcmp(model->name, "host") || !strcmp(model->name, "max")) {
+            /* These are kvmarm's recommended cpu types */
+            supported = true;
+        } else if (strlen(model->name) == len &&
+                   !strncmp(model->name, cpu_type, len)) {
+            /* KVM is enabled and we're using this type, so it works. */
+            supported = true;
+        }
+        if (!supported) {
+            error_setg(errp, "We cannot guarantee the CPU type '%s' works "
+                             "with KVM on this host", model->name);
+            return NULL;
+        }
+    }
+
+    if (model->props) {
+        qdict_in = qobject_to(QDict, model->props);
+        if (!qdict_in) {
+            error_setg(errp, QERR_INVALID_PARAMETER_TYPE, "props", "dict");
+            return NULL;
+        }
+    }
+
+    obj = object_new(object_class_get_name(oc));
+
+    if (qdict_in) {
+        Visitor *visitor;
+        Error *err = NULL;
+
+        visitor = qobject_input_visitor_new(model->props);
+        visit_start_struct(visitor, NULL, NULL, 0, &err);
+        if (err) {
+            visit_free(visitor);
+            object_unref(obj);
+            error_propagate(errp, err);
+            return NULL;
+        }
+
+        i = 0;
+        while ((name = cpu_model_advertised_features[i++]) != NULL) {
+            if (qdict_get(qdict_in, name)) {
+                object_property_set(obj, visitor, name, &err);
+                if (err) {
+                    break;
+                }
+            }
+        }
+
+        if (!err) {
+            visit_check_struct(visitor, &err);
+        }
+        visit_end_struct(visitor, NULL);
+        visit_free(visitor);
+        if (err) {
+            object_unref(obj);
+            error_propagate(errp, err);
+            return NULL;
+        }
+    }
+
+    expansion_info = g_new0(CpuModelExpansionInfo, 1);
+    expansion_info->model = g_malloc0(sizeof(*expansion_info->model));
+    expansion_info->model->name = g_strdup(model->name);
+
+    qdict_out = qdict_new();
+
+    i = 0;
+    while ((name = cpu_model_advertised_features[i++]) != NULL) {
+        ObjectProperty *prop = object_property_find(obj, name, NULL);
+        if (prop) {
+            Error *err = NULL;
+            QObject *value;
+
+            assert(prop->get);
+            value = object_property_get_qobject(obj, name, &err);
+            assert(!err);
+
+            qdict_put_obj(qdict_out, name, value);
+        }
+    }
+
+    if (!qdict_size(qdict_out)) {
+        qobject_unref(qdict_out);
+    } else {
+        expansion_info->model->props = QOBJECT(qdict_out);
+        expansion_info->model->has_props = true;
+    }
+
+    object_unref(obj);
+
+    return expansion_info;
+}
diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/docs/arm-cpu-features.rst
@@ -XXX,XX +XXX,XX @@
+================
+ARM CPU Features
+================
+
+Examples of probing and using ARM CPU features
+
+Introduction
+============
+
+CPU features are optional features that a CPU of supporting type may
+choose to implement or not.  In QEMU, optional CPU features have
+corresponding boolean CPU properties that, when enabled, indicate
+that the feature is implemented, and, conversely, when disabled,
+indicate that it is not implemented. An example of an ARM CPU feature
+is the Performance Monitoring Unit (PMU).  CPU types such as the
+Cortex-A15 and the Cortex-A57, which respectively implement ARM
+architecture reference manuals ARMv7-A and ARMv8-A, may both optionally
+implement PMUs.  For example, if a user wants to use a Cortex-A15 without
+a PMU, then the `-cpu` parameter should contain `pmu=off` on the QEMU
+command line, i.e. `-cpu cortex-a15,pmu=off`.
+
+Because not all CPU types support all optional CPU features, whether or
+not a CPU property exists depends on the CPU type.  For example, CPUs
+that implement the ARMv8-A architecture reference manual may optionally
+support the AArch32 CPU feature, which may be enabled by disabling the
+`aarch64` CPU property.  A CPU type such as the Cortex-A15, which does
+not implement ARMv8-A, will not have the `aarch64` CPU property.
+
+QEMU's support may be limited for some CPU features, only partially
+supporting the feature or only supporting the feature under certain
+configurations.  For example, the `aarch64` CPU feature, which, when
+disabled, enables the optional AArch32 CPU feature, is only supported
+when using the KVM accelerator and when running on a host CPU type that
+supports the feature.
+
+CPU Feature Probing
+===================
+
+Determining which CPU features are available and functional for a given
+CPU type is possible with the `query-cpu-model-expansion` QMP command.
+Below are some examples where `scripts/qmp/qmp-shell` (see the top comment
+block in the script for usage) is used to issue the QMP commands.
+
+(1) Determine which CPU features are available for the `max` CPU type
+    (Note, we started QEMU with qemu-system-aarch64, so `max` is
+     implementing the ARMv8-A reference manual in this case)::
+
+      (QEMU) query-cpu-model-expansion type=full model={"name":"max"}
+      { "return": {
+        "model": { "name": "max", "props": {
+        "pmu": true, "aarch64": true
+      }}}}
+
+We see that the `max` CPU type has the `pmu` and `aarch64` CPU features.
+We also see that the CPU features are enabled, as they are all `true`.
+
+(2) Let's try to disable the PMU::
+
+      (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"pmu":false}}
+      { "return": {
+        "model": { "name": "max", "props": {
+        "pmu": false, "aarch64": true
+      }}}}
+
+We see it worked, as `pmu` is now `false`.
+
+(3) Let's try to disable `aarch64`, which enables the AArch32 CPU feature::
+
+      (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"aarch64":false}}
+      {"error": {
+       "class": "GenericError", "desc":
+       "'aarch64' feature cannot be disabled unless KVM is enabled and 32-bit EL1 is supported"
+      }}
+
+It looks like this feature is limited to a configuration we do not
+currently have.
+
+(4) Let's try probing CPU features for the Cortex-A15 CPU type::
+
+      (QEMU) query-cpu-model-expansion type=full model={"name":"cortex-a15"}
+      {"return": {"model": {"name": "cortex-a15", "props": {"pmu": true}}}}
+
+Only the `pmu` CPU feature is available.
+
+A note about CPU feature dependencies
+-------------------------------------
+
+It's possible for features to have dependencies on other features. I.e.
+it may be possible to change one feature at a time without error, but
+when attempting to change all features at once an error could occur
+depending on the order they are processed.  It's also possible changing
+all at once doesn't generate an error, because a feature's dependencies
+are satisfied with other features, but the same feature cannot be changed
+independently without error.  For these reasons callers should always
+attempt to make their desired changes all at once in order to ensure the
+collection is valid.
+
+A note about CPU models and KVM
+-------------------------------
+
+Named CPU models generally do not work with KVM.  There are a few cases
+that do work, e.g. using the named CPU model `cortex-a57` with KVM on a
+seattle host, but mostly if KVM is enabled the `host` CPU type must be
+used.  This means the guest is provided all the same CPU features as the
+host CPU type has.  And, for this reason, the `host` CPU type should
+enable all CPU features that the host has by default.  Indeed it's even
+a bit strange to allow disabling CPU features that the host has when using
+the `host` CPU type, but in the absence of CPU models it's the best we can
+do if we want to launch guests without all the host's CPU features enabled.
+
+Enabling KVM also affects the `query-cpu-model-expansion` QMP command.  The
+effect is not limited to specific features, as pointed out in example
+(3) of "CPU Feature Probing", but also to which CPU types may be expanded.
+When KVM is enabled, only the `max`, `host`, and current CPU type may be
+expanded.  This restriction is necessary as it's not possible to know all
+CPU types that may work with KVM, but it does impose a small risk of users
+experiencing unexpected errors.  For example on a seattle, as mentioned
+above, the `cortex-a57` CPU type is also valid when KVM is enabled.
+Therefore a user could use the `host` CPU type for the current type, but
+then attempt to query `cortex-a57`; however, that query will fail with our
+restrictions.  This shouldn't be an issue though as management layers and
+users have been preferring the `host` CPU type for use with KVM for quite
+some time.  Additionally, if the KVM-enabled QEMU instance running on a
+seattle host is using the `cortex-a57` CPU type, then querying `cortex-a57`
+will work.
+
+Using CPU Features
+==================
+
+After determining which CPU features are available and supported for a
+given CPU type, they may be selectively enabled or disabled on the
+QEMU command line with that CPU type::
+
+  $ qemu-system-aarch64 -M virt -cpu max,pmu=off
+
+The example above disables the PMU for the `max` CPU type.
+
-- 
2.20.1

From: Andrew Jones <drjones@redhat.com>

Now that Arm CPUs have advertised features, let's add tests to ensure
we maintain their expected availability with and without KVM.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20191024121808.9612-3-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/Makefile.include   |   5 +-
 tests/arm-cpu-features.c | 240 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 244 insertions(+), 1 deletion(-)
 create mode 100644 tests/arm-cpu-features.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index XXXXXXX..XXXXXXX 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -XXX,XX +XXX,XX @@ check-qtest-sparc64-$(CONFIG_ISA_TESTDEV) = tests/endianness-test$(EXESUF)
 check-qtest-sparc64-y += tests/prom-env-test$(EXESUF)
 check-qtest-sparc64-y += tests/boot-serial-test$(EXESUF)
 
+check-qtest-arm-y += tests/arm-cpu-features$(EXESUF)
 check-qtest-arm-y += tests/microbit-test$(EXESUF)
 check-qtest-arm-y += tests/m25p80-test$(EXESUF)
 check-qtest-arm-y += tests/test-arm-mptimer$(EXESUF)
@@ -XXX,XX +XXX,XX @@ check-qtest-arm-y += tests/boot-serial-test$(EXESUF)
 check-qtest-arm-y += tests/hexloader-test$(EXESUF)
 check-qtest-arm-$(CONFIG_PFLASH_CFI02) += tests/pflash-cfi02-test$(EXESUF)
 
-check-qtest-aarch64-y = tests/numa-test$(EXESUF)
+check-qtest-aarch64-y += tests/arm-cpu-features$(EXESUF)
+check-qtest-aarch64-y += tests/numa-test$(EXESUF)
 check-qtest-aarch64-y += tests/boot-serial-test$(EXESUF)
 check-qtest-aarch64-y += tests/migration-test$(EXESUF)
 # TODO: once aarch64 TCG is fixed on ARM 32 bit host, make test unconditional
@@ -XXX,XX +XXX,XX @@ tests/test-qapi-util$(EXESUF): tests/test-qapi-util.o $(test-util-obj-y)
 tests/numa-test$(EXESUF): tests/numa-test.o
 tests/vmgenid-test$(EXESUF): tests/vmgenid-test.o tests/boot-sector.o tests/acpi-utils.o
 tests/cdrom-test$(EXESUF): tests/cdrom-test.o tests/boot-sector.o $(libqos-obj-y)
+tests/arm-cpu-features$(EXESUF): tests/arm-cpu-features.o
 
 tests/migration/stress$(EXESUF): tests/migration/stress.o
 	$(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< ,"LINK","$(TARGET_DIR)$@")
diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/arm-cpu-features.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Arm CPU feature test cases
+ *
+ * Copyright (c) 2019 Red Hat Inc.
+ * Authors:
+ *  Andrew Jones <drjones@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qjson.h"
+
+#define MACHINE    "-machine virt,gic-version=max "
+#define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \
+                     "'arguments': { 'type': 'full', "
+#define QUERY_TAIL "}}"
+
+static QDict *do_query_no_props(QTestState *qts, const char *cpu_type)
+{
+    return qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s }"
+                          QUERY_TAIL, cpu_type);
+}
+
+static QDict *do_query(QTestState *qts, const char *cpu_type,
+                       const char *fmt, ...)
+{
+    QDict *resp;
+
+    if (fmt) {
+        QDict *args;
+        va_list ap;
+
+        va_start(ap, fmt);
+        args = qdict_from_vjsonf_nofail(fmt, ap);
+        va_end(ap);
+
+        resp = qtest_qmp(qts, QUERY_HEAD "'model': { 'name': %s, "
+                                                    "'props': %p }"
+                              QUERY_TAIL, cpu_type, args);
+    } else {
+        resp = do_query_no_props(qts, cpu_type);
+    }
+
+    return resp;
+}
+
+static const char *resp_get_error(QDict *resp)
+{
+    QDict *qdict;
+
+    g_assert(resp);
+
+    qdict = qdict_get_qdict(resp, "error");
+    if (qdict) {
+        return qdict_get_str(qdict, "desc");
+    }
+    return NULL;
+}
+
+#define assert_error(qts, cpu_type, expected_error, fmt, ...)          \
+({                                                                     \
+    QDict *_resp;                                                      \
+    const char *_error;                                                \
+                                                                       \
+    _resp = do_query(qts, cpu_type, fmt, ##__VA_ARGS__);               \
+    g_assert(_resp);                                                   \
+    _error = resp_get_error(_resp);                                    \
+    g_assert(_error);                                                  \
+    g_assert(g_str_equal(_error, expected_error));                     \
+    qobject_unref(_resp);                                              \
+})
+
+static bool resp_has_props(QDict *resp)
+{
+    QDict *qdict;
+
+    g_assert(resp);
+
+    if (!qdict_haskey(resp, "return")) {
+        return false;
+    }
+    qdict = qdict_get_qdict(resp, "return");
+
+    if (!qdict_haskey(qdict, "model")) {
+        return false;
+    }
+    qdict = qdict_get_qdict(qdict, "model");
+
+    return qdict_haskey(qdict, "props");
+}
+
+static QDict *resp_get_props(QDict *resp)
+{
+    QDict *qdict;
+
+    g_assert(resp);
+    g_assert(resp_has_props(resp));
+
+    qdict = qdict_get_qdict(resp, "return");
+    qdict = qdict_get_qdict(qdict, "model");
+    qdict = qdict_get_qdict(qdict, "props");
+    return qdict;
+}
+
+#define assert_has_feature(qts, cpu_type, feature)                     \
+({                                                                     \
+    QDict *_resp = do_query_no_props(qts, cpu_type);                   \
+    g_assert(_resp);                                                   \
+    g_assert(resp_has_props(_resp));                                   \
+    g_assert(qdict_get(resp_get_props(_resp), feature));               \
+    qobject_unref(_resp);                                              \
+})
+
+#define assert_has_not_feature(qts, cpu_type, feature)                 \
+({                                                                     \
+    QDict *_resp = do_query_no_props(qts, cpu_type);                   \
+    g_assert(_resp);                                                   \
+    g_assert(!resp_has_props(_resp) ||                                 \
+             !qdict_get(resp_get_props(_resp), feature));              \
+    qobject_unref(_resp);                                              \
+})
+
+static void assert_type_full(QTestState *qts)
+{
+    const char *error;
+    QDict *resp;
+
+    resp = qtest_qmp(qts, "{ 'execute': 'query-cpu-model-expansion', "
+                            "'arguments': { 'type': 'static', "
+                                           "'model': { 'name': 'foo' }}}");
+    g_assert(resp);
+    error = resp_get_error(resp);
+    g_assert(error);
+    g_assert(g_str_equal(error,
+                         "The requested expansion type is not supported"));
+    qobject_unref(resp);
+}
+
+static void assert_bad_props(QTestState *qts, const char *cpu_type)
+{
+    const char *error;
+    QDict *resp;
+
+    resp = qtest_qmp(qts, "{ 'execute': 'query-cpu-model-expansion', "
+                            "'arguments': { 'type': 'full', "
+                                           "'model': { 'name': %s, "
+                                                      "'props': false }}}",
+                     cpu_type);
+    g_assert(resp);
+    error = resp_get_error(resp);
+    g_assert(error);
+    g_assert(g_str_equal(error,
+                         "Invalid parameter type for 'props', expected: dict"));
+    qobject_unref(resp);
+}
+
+static void test_query_cpu_model_expansion(const void *data)
+{
+    QTestState *qts;
+
+    qts = qtest_init(MACHINE "-cpu max");
+
+    /* Test common query-cpu-model-expansion input validation */
+    assert_type_full(qts);
+    assert_bad_props(qts, "max");
+    assert_error(qts, "foo", "The CPU type 'foo' is not a recognized "
+                 "ARM CPU type", NULL);
+    assert_error(qts, "max", "Parameter 'not-a-prop' is unexpected",
+                 "{ 'not-a-prop': false }");
+    assert_error(qts, "host", "The CPU type 'host' requires KVM", NULL);
+
+    /* Test expected feature presence/absence for some cpu types */
+    assert_has_feature(qts, "max", "pmu");
+    assert_has_feature(qts, "cortex-a15", "pmu");
+    assert_has_not_feature(qts, "cortex-a15", "aarch64");
+
+    if (g_str_equal(qtest_get_arch(), "aarch64")) {
+        assert_has_feature(qts, "max", "aarch64");
+        assert_has_feature(qts, "cortex-a57", "pmu");
+        assert_has_feature(qts, "cortex-a57", "aarch64");
+
+        /* Test that features that depend on KVM generate errors without. */
+        assert_error(qts, "max",
+                     "'aarch64' feature cannot be disabled "
+                     "unless KVM is enabled and 32-bit EL1 "
+                     "is supported",
+                     "{ 'aarch64': false }");
+    }
+
+    qtest_quit(qts);
+}
+
+static void test_query_cpu_model_expansion_kvm(const void *data)
+{
+    QTestState *qts;
+
+    qts = qtest_init(MACHINE "-accel kvm -cpu host");
+
+    if (g_str_equal(qtest_get_arch(), "aarch64")) {
+        assert_has_feature(qts, "host", "aarch64");
+        assert_has_feature(qts, "host", "pmu");
+
+        assert_error(qts, "cortex-a15",
+            "We cannot guarantee the CPU type 'cortex-a15' works "
+            "with KVM on this host", NULL);
+    } else {
+        assert_has_not_feature(qts, "host", "aarch64");
+        assert_has_not_feature(qts, "host", "pmu");
+    }
+
+    qtest_quit(qts);
+}
+
+int main(int argc, char **argv)
+{
+    bool kvm_available = false;
+
+    if (!access("/dev/kvm",  R_OK | W_OK)) {
+#if defined(HOST_AARCH64)
+        kvm_available = g_str_equal(qtest_get_arch(), "aarch64");
+#elif defined(HOST_ARM)
+        kvm_available = g_str_equal(qtest_get_arch(), "arm");
+#endif
+    }
+
+    g_test_init(&argc, &argv, NULL);
+
+    qtest_add_data_func("/arm/query-cpu-model-expansion",
+                        NULL, test_query_cpu_model_expansion);
+
+    if (kvm_available) {
+        qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
+                            NULL, test_query_cpu_model_expansion_kvm);
+    }
+
+    return g_test_run();
+}
-- 
2.20.1

From: Andrew Jones <drjones@redhat.com>

Since 97a28b0eeac14 ("target/arm: Allow VFP and Neon to be disabled via
a CPU property") we can disable the 'max' CPU model's VFP and Neon
features, but there's no way to disable SVE. Add the 'sve=on|off'
property to give it that flexibility. We also rename
cpu_max_get/set_sve_vq to cpu_max_get/set_sve_max_vq so that they
follow the typical *_get/set_<property-name> pattern.
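
For illustration only, the effect of the new setter on the ID register
can be sketched outside of QEMU as follows. This is a minimal sketch,
not the code in this patch: the helper names are hypothetical, and it
assumes the SVE field of ID_AA64PFR0 occupies bits [35:32], which is
what FIELD_DP64(t, ID_AA64PFR0, SVE, value) is expected to deposit into.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the SVE field of ID_AA64PFR0 is 4 bits wide at bit 32. */
#define SVE_FIELD_SHIFT 32
#define SVE_FIELD_MASK  (0xfULL << SVE_FIELD_SHIFT)

/* Hypothetical stand-in for the FIELD_DP64() deposit done by the setter. */
static uint64_t set_sve_field(uint64_t pfr0, bool on)
{
    return (pfr0 & ~SVE_FIELD_MASK) | ((uint64_t)on << SVE_FIELD_SHIFT);
}

/* Hypothetical stand-in for cpu_isar_feature(aa64_sve, cpu). */
static bool sve_enabled(uint64_t pfr0)
{
    return (pfr0 & SVE_FIELD_MASK) != 0;
}
```

With this model, sve=off clears the field, so the getter reports false
and sve-max-vq reads back as 0.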

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Beata Michalska <beata.michalska@linaro.org>
Message-id: 20191024121808.9612-4-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c         |  3 ++-
 target/arm/cpu64.c       | 52 ++++++++++++++++++++++++++++++++++------
 target/arm/monitor.c     |  2 +-
 tests/arm-cpu-features.c |  1 +
 4 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
         env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
         env->cp15.cptr_el[3] |= CPTR_EZ;
         /* with maximum vector length */
-        env->vfp.zcr_el[1] = cpu->sve_max_vq - 1;
+        env->vfp.zcr_el[1] = cpu_isar_feature(aa64_sve, cpu) ?
+                             cpu->sve_max_vq - 1 : 0;
         env->vfp.zcr_el[2] = env->vfp.zcr_el[1];
         env->vfp.zcr_el[3] = env->vfp.zcr_el[1];
         /*
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
     define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
 }
 
-static void cpu_max_get_sve_vq(Object *obj, Visitor *v, const char *name,
-                               void *opaque, Error **errp)
+static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
+                                   void *opaque, Error **errp)
 {
     ARMCPU *cpu = ARM_CPU(obj);
-    visit_type_uint32(v, name, &cpu->sve_max_vq, errp);
+    uint32_t value;
+
+    /* All vector lengths are disabled when SVE is off. */
+    if (!cpu_isar_feature(aa64_sve, cpu)) {
+        value = 0;
+    } else {
+        value = cpu->sve_max_vq;
+    }
+    visit_type_uint32(v, name, &value, errp);
 }
 
-static void cpu_max_set_sve_vq(Object *obj, Visitor *v, const char *name,
-                               void *opaque, Error **errp)
+static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
+                                   void *opaque, Error **errp)
 {
     ARMCPU *cpu = ARM_CPU(obj);
     Error *err = NULL;
@@ -XXX,XX +XXX,XX @@ static void cpu_max_set_sve_vq(Object *obj, Visitor *v, const char *name,
     error_propagate(errp, err);
 }
 
+static void cpu_arm_get_sve(Object *obj, Visitor *v, const char *name,
+                            void *opaque, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    bool value = cpu_isar_feature(aa64_sve, cpu);
+
+    visit_type_bool(v, name, &value, errp);
+}
+
+static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
+                            void *opaque, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    Error *err = NULL;
+    bool value;
+    uint64_t t;
+
+    visit_type_bool(v, name, &value, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    t = cpu->isar.id_aa64pfr0;
+    t = FIELD_DP64(t, ID_AA64PFR0, SVE, value);
+    cpu->isar.id_aa64pfr0 = t;
+}
+
 /* -cpu max: if KVM is enabled, like -cpu host (best possible with this host);
  * otherwise, a CPU with as many features enabled as our emulation supports.
  * The version of '-cpu max' for qemu-system-arm is defined in cpu.c;
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
 #endif
 
         cpu->sve_max_vq = ARM_MAX_VQ;
-        object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_vq,
-                            cpu_max_set_sve_vq, NULL, NULL, &error_fatal);
+        object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
+                            cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
+        object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
+                            cpu_arm_set_sve, NULL, NULL, &error_fatal);
     }
 }
 
diff --git a/target/arm/monitor.c b/target/arm/monitor.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/monitor.c
+++ b/target/arm/monitor.c
@@ -XXX,XX +XXX,XX @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
  * then the order that considers those dependencies must be used.
  */
 static const char *cpu_model_advertised_features[] = {
-    "aarch64", "pmu",
+    "aarch64", "pmu", "sve",
     NULL
 };
 
diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/arm-cpu-features.c
+++ b/tests/arm-cpu-features.c
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion(const void *data)
 
     if (g_str_equal(qtest_get_arch(), "aarch64")) {
         assert_has_feature(qts, "max", "aarch64");
+        assert_has_feature(qts, "max", "sve");
         assert_has_feature(qts, "cortex-a57", "pmu");
         assert_has_feature(qts, "cortex-a57", "aarch64");
 
-- 
2.20.1

From: Andrew Jones <drjones@redhat.com>

Introduce CPU properties to give fine-grained control over SVE vector
lengths. We introduce a property for each valid length up to the current
maximum supported, which is 2048 bits. The properties are named, e.g.
sve128, sve256, sve384, sve512, ..., where the number is the number of
bits. See the updates to docs/arm-cpu-features.rst for a description
of the semantics and for example uses.
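
As a rough illustration of the naming scheme, a property name maps to a
vector length in quadwords (vq) by dividing the bit count by 128. The
helper below is a hypothetical, standalone sketch and is stricter than
the property accessors in this patch, which only ever see names that
were registered as sve<N>.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: map "sve<N>" to vq = N / 128.
 * Returns 0 if the name is not a valid sve<N> length property.
 */
static unsigned sve_name_to_vq(const char *name)
{
    int bits;

    if (strncmp(name, "sve", 3) != 0) {
        return 0;
    }
    bits = atoi(name + 3);
    /* Valid lengths are multiples of 128 bits, up to 2048 bits. */
    if (bits <= 0 || bits % 128 != 0 || bits > 2048) {
        return 0;
    }
    return bits / 128;
}
```

For example, sve384 corresponds to vq 3, which is bit 2 of a vector
length bitmap such as sve_vq_map.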

Note that sve-max-vq is still present, and we'd like to support
qmp_query_cpu_model_expansion for guests launched with e.g.
-cpu max,sve-max-vq=8 on their command lines. We therefore allow
sve-max-vq and sve<N> properties to be provided at the same time, but
this is not recommended, which is why sve-max-vq is not mentioned in
the document. If sve-max-vq is provided then it enables all lengths up
to and including the max and disables all larger lengths. It also has
the side effect that no larger lengths may be enabled and that the max
itself cannot be disabled. Smaller non-power-of-two lengths may,
however, be disabled, e.g. -cpu max,sve-max-vq=4,sve384=off provides a
guest the vector lengths 128, 256, and 512 bits.
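
The sve-max-vq=4,sve384=off example can be checked with a small
standalone sketch. The helper names are hypothetical and the code only
models the "enable everything up to the max, then drop what was
explicitly disabled" step; it does not perform the validation done by
the finalizer in this patch.

```c
#include <stdint.h>

/* Hypothetical helper: bit (vq - 1) stands for a length of vq * 128 bits. */
static uint32_t vq_bit(unsigned vq)
{
    return 1u << (vq - 1);
}

/*
 * Enable all lengths up to and including max_vq (1..16), then clear
 * the lengths that were explicitly disabled with sve<N>=off.
 */
static uint32_t sve_lengths(unsigned max_vq, uint32_t disabled)
{
    uint32_t all = (1u << max_vq) - 1;

    return all & ~disabled;
}
```

Here sve_lengths(4, vq_bit(3)) yields 0xb, i.e. bits 0, 1 and 3 set,
which corresponds to sve128, sve256 and sve512.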

This patch has been co-authored with Richard Henderson, who reworked
the target/arm/cpu64.c changes in order to push all the validation and
auto-enabling/disabling steps into the finalizer, resulting in a nice
LOC reduction.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Beata Michalska <beata.michalska@linaro.org>
Message-id: 20191024121808.9612-5-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/qemu/bitops.h     |   1 +
 target/arm/cpu.h          |  19 ++++
 target/arm/cpu.c          |  19 ++++
 target/arm/cpu64.c        | 192 ++++++++++++++++++++++++++++++++++++-
 target/arm/helper.c       |  10 +-
 target/arm/monitor.c      |  12 +++
 tests/arm-cpu-features.c  | 194 ++++++++++++++++++++++++++++++++++++++
 docs/arm-cpu-features.rst | 168 +++++++++++++++++++++++++++++++--
 8 files changed, 606 insertions(+), 9 deletions(-)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -XXX,XX +XXX,XX @@
 #define BITS_PER_LONG           (sizeof (unsigned long) * BITS_PER_BYTE)
 
 #define BIT(nr)                 (1UL << (nr))
+#define BIT_ULL(nr)             (1ULL << (nr))
 #define BIT_MASK(nr)            (1UL << ((nr) % BITS_PER_LONG))
 #define BIT_WORD(nr)            ((nr) / BITS_PER_LONG)
 #define BITS_TO_LONGS(nr)       DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct {
 
 #ifdef TARGET_AARCH64
 # define ARM_MAX_VQ    16
+void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp);
+uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq);
 #else
 # define ARM_MAX_VQ    1
+static inline void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp) { }
+static inline uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
+{ return 0; }
 #endif
 
 typedef struct ARMVectorReg {
@@ -XXX,XX +XXX,XX @@ struct ARMCPU {
 
     /* Used to set the maximum vector length the cpu will support.  */
     uint32_t sve_max_vq;
+
+    /*
+     * In sve_vq_map each set bit is a supported vector length of
+     * (bit-number + 1) * 16 bytes, i.e. each bit number + 1 is the vector
+     * length in quadwords.
+     *
+     * While processing properties during initialization, corresponding
+     * sve_vq_init bits are set for bits in sve_vq_map that have been
+     * set by properties.
+     */
+    DECLARE_BITMAP(sve_vq_map, ARM_MAX_VQ);
+    DECLARE_BITMAP(sve_vq_init, ARM_MAX_VQ);
 };
 
 void arm_cpu_post_init(Object *obj);
@@ -XXX,XX +XXX,XX @@ static inline int arm_feature(CPUARMState *env, int feature)
     return (env->features & (1ULL << feature)) != 0;
 }
 
+void arm_cpu_finalize_features(ARMCPU *cpu, Error **errp);
+
 #if !defined(CONFIG_USER_ONLY)
 /* Return true if exception levels below EL3 are in secure state,
  * or would be following an exception return to that level.
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_finalizefn(Object *obj)
 #endif
 }
 
+void arm_cpu_finalize_features(ARMCPU *cpu, Error **errp)
+{
+    Error *local_err = NULL;
+
+    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
+        arm_cpu_sve_finalize(cpu, &local_err);
+        if (local_err != NULL) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
 static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
 {
     CPUState *cs = CPU(dev);
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
+    arm_cpu_finalize_features(cpu, &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
     if (arm_feature(env, ARM_FEATURE_AARCH64) &&
         cpu->has_vfp != cpu->has_neon) {
         /*
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_a72_initfn(Object *obj)
     define_arm_cp_regs(cpu, cortex_a72_a57_a53_cp_reginfo);
 }
 
+void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
+{
+    /*
+     * If any vector lengths are explicitly enabled with sve<N> properties,
+     * then all other lengths are implicitly disabled.  If sve-max-vq is
+     * specified then it is the same as explicitly enabling all lengths
+     * up to and including the specified maximum, which means all larger
+     * lengths will be implicitly disabled.  If no sve<N> properties
+     * are enabled and sve-max-vq is not specified, then all lengths not
+     * explicitly disabled will be enabled.  Additionally, all power-of-two
+     * vector lengths less than the maximum enabled length will be
+     * automatically enabled and all vector lengths larger than the largest
+     * disabled power-of-two vector length will be automatically disabled.
+     * Errors are generated if the user provided input that interferes with
+     * any of the above.  Finally, if SVE is not disabled, then at least one
+     * vector length must be enabled.
+     */
+    DECLARE_BITMAP(tmp, ARM_MAX_VQ);
+    uint32_t vq, max_vq = 0;
+
+    /*
+     * Process explicit sve<N> properties.
+     * From the properties, sve_vq_map<N> implies sve_vq_init<N>.
+     * Check first for any sve<N> enabled.
+     */
+    if (!bitmap_empty(cpu->sve_vq_map, ARM_MAX_VQ)) {
+        max_vq = find_last_bit(cpu->sve_vq_map, ARM_MAX_VQ) + 1;
+
+        if (cpu->sve_max_vq && max_vq > cpu->sve_max_vq) {
+            error_setg(errp, "cannot enable sve%d", max_vq * 128);
+            error_append_hint(errp, "sve%d is larger than the maximum vector "
+                              "length, sve-max-vq=%d (%d bits)\n",
+                              max_vq * 128, cpu->sve_max_vq,
+                              cpu->sve_max_vq * 128);
+            return;
+        }
+
+        /* Propagate enabled bits down through required powers-of-two. */
+        for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
+            if (!test_bit(vq - 1, cpu->sve_vq_init)) {
+                set_bit(vq - 1, cpu->sve_vq_map);
+            }
+        }
+    } else if (cpu->sve_max_vq == 0) {
+        /*
+         * No explicit bits enabled, and no implicit bits from sve-max-vq.
+         */
+        if (!cpu_isar_feature(aa64_sve, cpu)) {
+            /* SVE is disabled and so are all vector lengths.  Good. */
+            return;
+        }
+
+        /* Disabling a power-of-two disables all larger lengths. */
+        if (test_bit(0, cpu->sve_vq_init)) {
+            error_setg(errp, "cannot disable sve128");
+            error_append_hint(errp, "Disabling sve128 results in all vector "
+                              "lengths being disabled.\n");
+            error_append_hint(errp, "With SVE enabled, at least one vector "
+                              "length must be enabled.\n");
+            return;
+        }
+        for (vq = 2; vq <= ARM_MAX_VQ; vq <<= 1) {
+            if (test_bit(vq - 1, cpu->sve_vq_init)) {
+                break;
+            }
+        }
+        max_vq = vq <= ARM_MAX_VQ ? vq - 1 : ARM_MAX_VQ;
+
+        bitmap_complement(cpu->sve_vq_map, cpu->sve_vq_init, max_vq);
+        max_vq = find_last_bit(cpu->sve_vq_map, max_vq) + 1;
+    }
+
+    /*
+     * Process the sve-max-vq property.
+     * Note that we know from the above that no bit above
+     * sve-max-vq is currently set.
+     */
+    if (cpu->sve_max_vq != 0) {
+        max_vq = cpu->sve_max_vq;
+
+        if (!test_bit(max_vq - 1, cpu->sve_vq_map) &&
+            test_bit(max_vq - 1, cpu->sve_vq_init)) {
+            error_setg(errp, "cannot disable sve%d", max_vq * 128);
+            error_append_hint(errp, "The maximum vector length must be "
+                              "enabled, sve-max-vq=%d (%d bits)\n",
+                              max_vq, max_vq * 128);
+            return;
+        }
+
+        /* Set all bits not explicitly set within sve-max-vq. */
+        bitmap_complement(tmp, cpu->sve_vq_init, max_vq);
+        bitmap_or(cpu->sve_vq_map, cpu->sve_vq_map, tmp, max_vq);
+    }
+
+    /*
+     * We should know what max-vq is now.  Also, as we're done
+     * manipulating sve-vq-map, we ensure any bits above max-vq
+     * are clear, just in case anybody looks.
+     */
+    assert(max_vq != 0);
+    bitmap_clear(cpu->sve_vq_map, max_vq, ARM_MAX_VQ - max_vq);
+
+    /* Ensure all required powers-of-two are enabled. */
+    for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
+        if (!test_bit(vq - 1, cpu->sve_vq_map)) {
+            error_setg(errp, "cannot disable sve%d", vq * 128);
+            error_append_hint(errp, "sve%d is required as it "
+                              "is a power-of-two length smaller than "
+                              "the maximum, sve%d\n",
+                              vq * 128, max_vq * 128);
+            return;
+        }
+    }
+
+    /*
+     * Now that we validated all our vector lengths, the only question
+     * left to answer is if we even want SVE at all.
+     */
+    if (!cpu_isar_feature(aa64_sve, cpu)) {
+        error_setg(errp, "cannot enable sve%d", max_vq * 128);
+        error_append_hint(errp, "SVE must be enabled to enable vector "
+                          "lengths.\n");
+        error_append_hint(errp, "Add sve=on to the CPU property list.\n");
+        return;
+    }
+
+    /* From now on sve_max_vq is the actual maximum supported length. */
+    cpu->sve_max_vq = max_vq;
+}
+
+uint32_t arm_cpu_vq_map_next_smaller(ARMCPU *cpu, uint32_t vq)
+{
+    uint32_t bitnum;
+
+    /*
+     * We allow vq == ARM_MAX_VQ + 1 to be input because the caller may want
+     * to find the maximum vq enabled, which may be ARM_MAX_VQ, but this
+     * function always returns the next smaller than the input.
+     */
+    assert(vq && vq <= ARM_MAX_VQ + 1);
+
+    bitnum = find_last_bit(cpu->sve_vq_map, vq - 1);
+    return bitnum == vq - 1 ? 0 : bitnum + 1;
+}
+
 static void cpu_max_get_sve_max_vq(Object *obj, Visitor *v, const char *name,
                                    void *opaque, Error **errp)
 {
@@ -XXX,XX +XXX,XX @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
     error_propagate(errp, err);
 }
 
+static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
+                               void *opaque, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    uint32_t vq = atoi(&name[3]) / 128;
+    bool value;
+
+    /* All vector lengths are disabled when SVE is off. */
+    if (!cpu_isar_feature(aa64_sve, cpu)) {
+        value = false;
+    } else {
+        value = test_bit(vq - 1, cpu->sve_vq_map);
+    }
+    visit_type_bool(v, name, &value, errp);
+}
+
+static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
+                               void *opaque, Error **errp)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    uint32_t vq = atoi(&name[3]) / 128;
+    Error *err = NULL;
+    bool value;
+
+    visit_type_bool(v, name, &value, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    if (value) {
+        set_bit(vq - 1, cpu->sve_vq_map);
+    } else {
+        clear_bit(vq - 1, cpu->sve_vq_map);
+    }
+    set_bit(vq - 1, cpu->sve_vq_init);
+}
+
 static void cpu_arm_get_sve(Object *obj, Visitor *v, const char *name,
                             void *opaque, Error **errp)
 {
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
 static void aarch64_max_initfn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
+    uint32_t vq;
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         cpu->dcz_blocksize = 7; /*  512 bytes */
 #endif
 
-        cpu->sve_max_vq = ARM_MAX_VQ;
         object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
                             cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
         object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
                             cpu_arm_set_sve, NULL, NULL, &error_fatal);
+
+        for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
+            char name[8];
+            sprintf(name, "sve%d", vq * 128);
+            object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
+                                cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
+        }
     }
 }
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ int sve_exception_el(CPUARMState *env, int el)
     return 0;
 }
 
+static uint32_t sve_zcr_get_valid_len(ARMCPU *cpu, uint32_t start_len)
+{
+    uint32_t start_vq = (start_len & 0xf) + 1;
+
+    return arm_cpu_vq_map_next_smaller(cpu, start_vq + 1) - 1;
+}
+
 /*
  * Given that SVE is enabled, return the vector length for EL.
  */
@@ -XXX,XX +XXX,XX @@ uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
     if (arm_feature(env, ARM_FEATURE_EL3)) {
         zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
     }
-    return zcr_len;
+
+    return sve_zcr_get_valid_len(cpu, zcr_len);
 }
 
 static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
diff --git a/target/arm/monitor.c b/target/arm/monitor.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/monitor.c
+++ b/target/arm/monitor.c
@@ -XXX,XX +XXX,XX @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
     return head;
 }
 
+QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
+
 /*
  * These are cpu model features we want to advertise. The order here
  * matters as this is the order in which qmp_query_cpu_model_expansion
@@ -XXX,XX +XXX,XX @@ GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
  */
 static const char *cpu_model_advertised_features[] = {
     "aarch64", "pmu", "sve",
+    "sve128", "sve256", "sve384", "sve512",
+    "sve640", "sve768", "sve896", "sve1024", "sve1152", "sve1280",
+    "sve1408", "sve1536", "sve1664", "sve1792", "sve1920", "sve2048",
     NULL
 };
 
@@ -XXX,XX +XXX,XX @@ CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
         if (!err) {
             visit_check_struct(visitor, &err);
         }
+        if (!err) {
+            arm_cpu_finalize_features(ARM_CPU(obj), &err);
+        }
         visit_end_struct(visitor, NULL);
         visit_free(visitor);
         if (err) {
@@ -XXX,XX +XXX,XX @@ CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
             error_propagate(errp, err);
             return NULL;
         }
+    } else {
+        Error *err = NULL;
+        arm_cpu_finalize_features(ARM_CPU(obj), &err);
+        assert(err == NULL);
     }
 
     expansion_info = g_new0(CpuModelExpansionInfo, 1);
diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/arm-cpu-features.c
+++ b/tests/arm-cpu-features.c
@@ -XXX,XX +XXX,XX @@
  * See the COPYING file in the top-level directory.
  */
 #include "qemu/osdep.h"
+#include "qemu/bitops.h"
 #include "libqtest.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 
+/*
+ * We expect the SVE max-vq to be 16. Also it must be <= 64
+ * for our test code, otherwise 'vls' can't just be a uint64_t.
+ */
+#define SVE_MAX_VQ 16
+
 #define MACHINE    "-machine virt,gic-version=max "
 #define QUERY_HEAD "{ 'execute': 'query-cpu-model-expansion', " \
                      "'arguments': { 'type': 'full', "
@@ -XXX,XX +XXX,XX @@ static void assert_bad_props(QTestState *qts, const char *cpu_type)
     qobject_unref(resp);
 }
 
+static uint64_t resp_get_sve_vls(QDict *resp)
+{
+    QDict *props;
+    const QDictEntry *e;
+    uint64_t vls = 0;
+    int n = 0;
+
+    g_assert(resp);
+    g_assert(resp_has_props(resp));
+
+    props = resp_get_props(resp);
+
+    for (e = qdict_first(props); e; e = qdict_next(props, e)) {
+        if (strlen(e->key) > 3 && !strncmp(e->key, "sve", 3) &&
+            g_ascii_isdigit(e->key[3])) {
+            char *endptr;
+            int bits;
+
+            bits = g_ascii_strtoll(&e->key[3], &endptr, 10);
+            if (!bits || *endptr != '\0') {
+                continue;
+            }
+
+            if (qdict_get_bool(props, e->key)) {
+                vls |= BIT_ULL((bits / 128) - 1);
+            }
+            ++n;
+        }
+    }
+
+    g_assert(n == SVE_MAX_VQ);
+
+    return vls;
+}
+
+#define assert_sve_vls(qts, cpu_type, expected_vls, fmt, ...)          \
+({                                                                     \
+    QDict *_resp = do_query(qts, cpu_type, fmt, ##__VA_ARGS__);        \
+    g_assert(_resp);                                                   \
+    g_assert(resp_has_props(_resp));                                   \
+    g_assert(resp_get_sve_vls(_resp) == expected_vls);                 \
+    qobject_unref(_resp);                                              \
+})
+
+static void sve_tests_default(QTestState *qts, const char *cpu_type)
+{
+    /*
+     * With no sve-max-vq or sve<N> properties on the command line
+     * the default is to have all vector lengths enabled. This also
+     * tests that 'sve' is 'on' by default.
+     */
+    assert_sve_vls(qts, cpu_type, BIT_ULL(SVE_MAX_VQ) - 1, NULL);
+
+    /* With SVE off, all vector lengths should also be off. */
+    assert_sve_vls(qts, cpu_type, 0, "{ 'sve': false }");
+
+    /* With SVE on, we must have at least one vector length enabled. */
+    assert_error(qts, cpu_type, "cannot disable sve128", "{ 'sve128': false }");
+
+    /* Basic enable/disable tests. */
+    assert_sve_vls(qts, cpu_type, 0x7, "{ 'sve384': true }");
+    assert_sve_vls(qts, cpu_type, ((BIT_ULL(SVE_MAX_VQ) - 1) & ~BIT_ULL(2)),
+                   "{ 'sve384': false }");
+
+    /*
+     * ---------------------------------------------------------------------
+     *               power-of-two(vq)   all-power-            can      can
+     *                                  of-two(< vq)        enable   disable
+     * ---------------------------------------------------------------------
+     * vq < max_vq      no                MUST*              yes      yes
+     * vq < max_vq      yes               MUST*              yes      no
+     * ---------------------------------------------------------------------
+     * vq == max_vq     n/a               MUST*              yes**    yes**
+     * ---------------------------------------------------------------------
+     * vq > max_vq      n/a               no                 no       yes
+     * vq > max_vq      n/a               yes                yes      yes
+     * ---------------------------------------------------------------------
+     *
+     * [*] "MUST" means this requirement must already be satisfied,
+     *     otherwise 'max_vq' couldn't itself be enabled.
+     *
+     * [**] Not testable with the QMP interface, only with the command line.
+     */
+
+    /* max_vq := 8 */
+    assert_sve_vls(qts, cpu_type, 0x8b, "{ 'sve1024': true }");
+
+    /* max_vq := 8, vq < max_vq, !power-of-two(vq) */
+    assert_sve_vls(qts, cpu_type, 0x8f,
+                   "{ 'sve1024': true, 'sve384': true }");
+    assert_sve_vls(qts, cpu_type, 0x8b,
+                   "{ 'sve1024': true, 'sve384': false }");
+
+    /* max_vq := 8, vq < max_vq, power-of-two(vq) */
+    assert_sve_vls(qts, cpu_type, 0x8b,
+                   "{ 'sve1024': true, 'sve256': true }");
+    assert_error(qts, cpu_type, "cannot disable sve256",
+                 "{ 'sve1024': true, 'sve256': false }");
+
+    /* max_vq := 3, vq > max_vq, !all-power-of-two(< vq) */
+    assert_error(qts, cpu_type, "cannot disable sve512",
+                 "{ 'sve384': true, 'sve512': false, 'sve640': true }");
+
+    /*
+     * We can disable power-of-two vector lengths when all larger lengths
+     * are also disabled. We only need to disable the power-of-two length,
+     * as all non-enabled larger lengths will then be auto-disabled.
+     */
+    assert_sve_vls(qts, cpu_type, 0x7, "{ 'sve512': false }");
+
+    /* max_vq := 3, vq > max_vq, all-power-of-two(< vq) */
+    assert_sve_vls(qts, cpu_type, 0x1f,
+                   "{ 'sve384': true, 'sve512': true, 'sve640': true }");
+    assert_sve_vls(qts, cpu_type, 0xf,
+                   "{ 'sve384': true, 'sve512': true, 'sve640': false }");
+}
+
+static void sve_tests_sve_max_vq_8(const void *data)
+{
+    QTestState *qts;
+
+    qts = qtest_init(MACHINE "-cpu max,sve-max-vq=8");
+
+    assert_sve_vls(qts, "max", BIT_ULL(8) - 1, NULL);
+
+    /*
+     * Disabling the max-vq set by sve-max-vq is not allowed, but
+     * of course enabling it is OK.
+     */
+    assert_error(qts, "max", "cannot disable sve1024", "{ 'sve1024': false }");
+    assert_sve_vls(qts, "max", 0xff, "{ 'sve1024': true }");
+
+    /*
+     * Enabling anything larger than max-vq set by sve-max-vq is not
+     * allowed, but of course disabling everything larger is OK.
+     */
+    assert_error(qts, "max", "cannot enable sve1152", "{ 'sve1152': true }");
+    assert_sve_vls(qts, "max", 0xff, "{ 'sve1152': false }");
+
+    /*
+     * We can enable/disable non power-of-two lengths smaller than the
+     * max-vq set by sve-max-vq, but, while we can enable power-of-two
+     * lengths, we can't disable them.
+     */
+    assert_sve_vls(qts, "max", 0xff, "{ 'sve384': true }");
+    assert_sve_vls(qts, "max", 0xfb, "{ 'sve384': false }");
+    assert_sve_vls(qts, "max", 0xff, "{ 'sve256': true }");
+    assert_error(qts, "max", "cannot disable sve256", "{ 'sve256': false }");
+
+    qtest_quit(qts);
+}
+
+static void sve_tests_sve_off(const void *data)
+{
+    QTestState *qts;
+
+    qts = qtest_init(MACHINE "-cpu max,sve=off");
+
+    /* SVE is off, so the map should be empty. */
+    assert_sve_vls(qts, "max", 0, NULL);
+
+    /* The map stays empty even if we turn lengths off. */
+    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
+
+    /* It's an error to enable lengths when SVE is off. */
+    assert_error(qts, "max", "cannot enable sve128", "{ 'sve128': true }");
+
+    /* With SVE re-enabled we should get all vector lengths enabled. */
+    assert_sve_vls(qts, "max", BIT_ULL(SVE_MAX_VQ) - 1, "{ 'sve': true }");
+
+    /* Or enable SVE with just specific vector lengths. */
+    assert_sve_vls(qts, "max", 0x3,
+                   "{ 'sve': true, 'sve128': true, 'sve256': true }");
+
+    qtest_quit(qts);
+}
+
 static void test_query_cpu_model_expansion(const void *data)
 {
     QTestState *qts;
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion(const void *data)
     if (g_str_equal(qtest_get_arch(), "aarch64")) {
         assert_has_feature(qts, "max", "aarch64");
         assert_has_feature(qts, "max", "sve");
+        assert_has_feature(qts, "max", "sve128");
         assert_has_feature(qts, "cortex-a57", "pmu");
         assert_has_feature(qts, "cortex-a57", "aarch64");
 
+        sve_tests_default(qts, "max");
+
         /* Test that features that depend on KVM generate errors without. */
         assert_error(qts, "max",
                      "'aarch64' feature cannot be disabled "
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
     qtest_add_data_func("/arm/query-cpu-model-expansion",
                         NULL, test_query_cpu_model_expansion);
 
+    if (g_str_equal(qtest_get_arch(), "aarch64")) {
+        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-max-vq-8",
+                            NULL, sve_tests_sve_max_vq_8);
+        qtest_add_data_func("/arm/max/query-cpu-model-expansion/sve-off",
+                            NULL, sve_tests_sve_off);
+    }
+
     if (kvm_available) {
         qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
                             NULL, test_query_cpu_model_expansion_kvm);
diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/arm-cpu-features.rst
+++ b/docs/arm-cpu-features.rst
@@ -XXX,XX +XXX,XX @@ block in the script for usage) is used to issue the QMP commands.
       (QEMU) query-cpu-model-expansion type=full model={"name":"max"}
       { "return": {
         "model": { "name": "max", "props": {
-        "pmu": true, "aarch64": true
+        "sve1664": true, "pmu": true, "sve1792": true, "sve1920": true,
+        "sve128": true, "aarch64": true, "sve1024": true, "sve": true,
+        "sve640": true, "sve768": true, "sve1408": true, "sve256": true,
+        "sve1152": true, "sve512": true, "sve384": true, "sve1536": true,
+        "sve896": true, "sve1280": true, "sve2048": true
       }}}}
 
-We see that the `max` CPU type has the `pmu` and `aarch64` CPU features.
-We also see that the CPU features are enabled, as they are all `true`.
+We see that the `max` CPU type has the `pmu`, `aarch64`, `sve`, and many
+`sve<N>` CPU features.  We also see that all the CPU features are
+enabled, as they are all `true`.  (The `sve<N>` CPU features are all
+optional SVE vector lengths (see "SVE CPU Properties").  While with TCG
+all SVE vector lengths can be supported, when KVM is in use it's more
+likely that only a few lengths will be supported, if SVE is supported at
+all.)
 
 (2) Let's try to disable the PMU::
 
       (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"pmu":false}}
       { "return": {
         "model": { "name": "max", "props": {
-        "pmu": false, "aarch64": true
+        "sve1664": true, "pmu": false, "sve1792": true, "sve1920": true,
+        "sve128": true, "aarch64": true, "sve1024": true, "sve": true,
+        "sve640": true, "sve768": true, "sve1408": true, "sve256": true,
+        "sve1152": true, "sve512": true, "sve384": true, "sve1536": true,
+        "sve896": true, "sve1280": true, "sve2048": true
       }}}}
 
 We see it worked, as `pmu` is now `false`.
@@ -XXX,XX +XXX,XX @@ We see it worked, as `pmu` is now `false`.
 It looks like this feature is limited to a configuration we do not
 currently have.
 
-(4) Let's try probing CPU features for the Cortex-A15 CPU type::
+(4) Let's disable `sve` and see what happens to all the optional SVE
+    vector lengths::
+
+      (QEMU) query-cpu-model-expansion type=full model={"name":"max","props":{"sve":false}}
+      { "return": {
+        "model": { "name": "max", "props": {
+        "sve1664": false, "pmu": true, "sve1792": false, "sve1920": false,
+        "sve128": false, "aarch64": true, "sve1024": false, "sve": false,
+        "sve640": false, "sve768": false, "sve1408": false, "sve256": false,
+        "sve1152": false, "sve512": false, "sve384": false, "sve1536": false,
+        "sve896": false, "sve1280": false, "sve2048": false
+      }}}}
+
+As expected they are now all `false`.
+
+(5) Let's try probing CPU features for the Cortex-A15 CPU type::
 
       (QEMU) query-cpu-model-expansion type=full model={"name":"cortex-a15"}
       {"return": {"model": {"name": "cortex-a15", "props": {"pmu": true}}}}
@@ -XXX,XX +XXX,XX @@ After determining which CPU features are available and supported for a
 given CPU type, then they may be selectively enabled or disabled on the
 QEMU command line with that CPU type::
 
-  $ qemu-system-aarch64 -M virt -cpu max,pmu=off
+  $ qemu-system-aarch64 -M virt -cpu max,pmu=off,sve=on,sve128=on,sve256=on
 
-The example above disables the PMU for the `max` CPU type.
+The example above disables the PMU and enables the first two SVE vector
+lengths for the `max` CPU type.  Note, the `sve=on` isn't actually
+necessary, because, as we observed above with our probe of the `max` CPU
+type, `sve` is already on by default.  Also, based on our probe of
+defaults, it would seem we need to disable many SVE vector lengths, rather
+than only enabling the two we want.  This isn't the case, because, as
+disabling many SVE vector lengths would be quite verbose, the `sve<N>` CPU
+properties have special semantics (see "SVE CPU Property Parsing
+Semantics").
+
+SVE CPU Properties
+==================
+
+There are two types of SVE CPU properties: `sve` and `sve<N>`.  The first
+is used to enable or disable the entire SVE feature, just as the `pmu`
+CPU property completely enables or disables the PMU.  The second type
+is used to enable or disable specific vector lengths, where `N` is the
+number of bits of the length.  The `sve<N>` CPU properties have special
+dependencies and constraints, see "SVE CPU Property Dependencies and
+Constraints" below.  Additionally, because we want all supported vector
+lengths to be enabled by default, we provide the parsing semantics listed
+in "SVE CPU Property Parsing Semantics" in order to avoid overly verbose
+command lines (command lines full of `sve<N>=off`, for all `N` not wanted).
+
+SVE CPU Property Dependencies and Constraints
+---------------------------------------------
+
+  1) At least one vector length must be enabled when `sve` is enabled.
+
+  2) If a vector length `N` is enabled, then all power-of-two vector
+     lengths smaller than `N` must also be enabled.  E.g. if `sve512`
+     is enabled, then the 128-bit and 256-bit vector lengths must also
+     be enabled.
+
+SVE CPU Property Parsing Semantics
+----------------------------------
+
+  1) If SVE is disabled (`sve=off`), then which SVE vector lengths
+     are enabled or disabled is irrelevant to the guest, as the entire
+     SVE feature is disabled and that disables all vector lengths for
+     the guest.  However QEMU will still track any `sve<N>` CPU
+     properties provided by the user.  If later an `sve=on` is provided,
+     then the guest will get only the enabled lengths.  If no `sve=on`
+     is provided and there are explicitly enabled vector lengths, then
+     an error is generated.
+
+  2) If SVE is enabled (`sve=on`), but no `sve<N>` CPU properties are
+     provided, then all supported vector lengths are enabled, including
+     the non-power-of-two lengths.
+
+  3) If SVE is enabled, then an error is generated when attempting to
+     disable the last enabled vector length (see constraint (1) of "SVE
+     CPU Property Dependencies and Constraints").
+
+  4) If one or more vector lengths have been explicitly enabled and
+     at least one of the dependency lengths of the maximum enabled length
+     has been explicitly disabled, then an error is generated (see
+     constraint (2) of "SVE CPU Property Dependencies and Constraints").
+
+  5) If one or more `sve<N>` CPU properties are set `off`, but no `sve<N>`
+     CPU properties are set `on`, then the specified vector lengths are
+     disabled but the default for any unspecified lengths remains enabled.
+     Disabling a power-of-two vector length also disables all vector
+     lengths larger than the power-of-two length (see constraint (2) of
+     "SVE CPU Property Dependencies and Constraints").
+
+  6) If one or more `sve<N>` CPU properties are set to `on`, then they
+     are enabled and all unspecified lengths default to disabled, except
+     for the required lengths per constraint (2) of "SVE CPU Property
+     Dependencies and Constraints", which will even be auto-enabled if
+     they were not explicitly enabled.
+
+  7) If SVE was disabled (`sve=off`), allowing all vector lengths to be
+     explicitly disabled (i.e. avoiding the error specified in (3) of
+     "SVE CPU Property Parsing Semantics"), then if later an `sve=on` is
+     provided an error will be generated.  To avoid this error, one must
+     enable at least one vector length prior to enabling SVE.
+
+SVE CPU Property Examples
+-------------------------
+
+  1) Disable SVE::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve=off
+
+  2) Implicitly enable all vector lengths for the `max` CPU type::
+
+     $ qemu-system-aarch64 -M virt -cpu max
+
+  3) Only enable the 128-bit vector length::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve128=on
+
+  4) Disable the 512-bit vector length and all larger vector lengths,
+     since 512 is a power-of-two.  This results in all the smaller,
+     uninitialized lengths (128, 256, and 384) defaulting to enabled::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve512=off
+
+  5) Enable the 128-bit, 256-bit, and 512-bit vector lengths::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve128=on,sve256=on,sve512=on
+
+  6) The same as (5), but since the 128-bit and 256-bit vector
+     lengths are required for the 512-bit vector length to be enabled,
+     then allow them to be auto-enabled::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve512=on
+
+  7) Do the same as (6), but by first disabling SVE and then re-enabling it::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve512=on,sve=on
+
+  8) Force errors regarding the last vector length::
+
+     $ qemu-system-aarch64 -M virt -cpu max,sve128=off
+     $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve128=off,sve=on
+
+SVE CPU Property Recommendations
+--------------------------------
+
+The examples in "SVE CPU Property Examples" exhibit many ways to select
+vector lengths which developers may find useful in order to avoid overly
+verbose command lines.  However, the recommended way to select vector
+lengths is to explicitly enable each desired length.  Therefore only
+examples (1), (3), and (5) exhibit recommended uses of the properties.
 
-- 
2.20.1
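
For readers checking the bitmaps asserted in the SVE qtest above (0x7 for
sve384, 0x8b for sve1024, 0x8f for both), here is a minimal sketch of how
those values follow from constraint (2) of the documentation. The helper
names vl_bit() and vls_with_deps() are hypothetical and exist only for this
illustration; they are not QEMU functions, and the sketch assumes a length
in bits that is a non-zero multiple of 128.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the bitmap bit for a vector length given in bits.
 * vq counts 128-bit quadwords, and length vq is tracked at bit (vq - 1).
 */
static uint64_t vl_bit(unsigned bits)
{
    unsigned vq = bits / 128;

    return UINT64_C(1) << (vq - 1);
}

/*
 * Hypothetical helper: the bitmap after explicitly enabling one length,
 * including the smaller power-of-two lengths that constraint (2) requires.
 */
static uint64_t vls_with_deps(unsigned bits)
{
    unsigned vq = bits / 128;
    uint64_t map = UINT64_C(1) << (vq - 1);

    /* Walk vq = 1, 2, 4, ... below the requested length. */
    for (unsigned dep = 1; dep < vq; dep <<= 1) {
        map |= UINT64_C(1) << (dep - 1);
    }
    return map;
}
```

Calling vls_with_deps(384) sets the bits for the 128-, 256- and 384-bit
lengths, and OR-ing the results for 1024 and 384 reproduces the combined
case from the test.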

From: Andrew Jones <drjones@redhat.com>

These are the SVE equivalents to kvm_arch_get/put_fpsimd. Note, the
swabbing is different than it is for fpsimd because the vector format
is a little-endian stream of words.
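
As a rough illustration of that layout (a sketch only, not the code in this
patch), the following shows how an array of host-endian 64-bit words maps
onto a byte stream in which byte i holds bits [8*i+7 : 8*i] of the register.
The helpers swap64(), host_is_big_endian() and words_to_stream() are
hypothetical names used for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the byte order of one 64-bit word. */
static uint64_t swap64(uint64_t v)
{
    v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
        ((v >> 8) & 0x00ff00ff00ff00ffULL);
    v = ((v & 0x0000ffff0000ffffULL) << 16) |
        ((v >> 16) & 0x0000ffff0000ffffULL);
    return (v << 32) | (v >> 32);
}

/* Hypothetical helper: returns 1 on a big-endian host, 0 otherwise. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/*
 * Serialize nr host-endian words into the little-endian byte stream.
 * Little-endian hosts already store each word in stream order, so only
 * big-endian hosts need the per-word swap.
 */
static void words_to_stream(uint8_t *stream, const uint64_t *words, size_t nr)
{
    for (size_t i = 0; i < nr; i++) {
        uint64_t w = host_is_big_endian() ? swap64(words[i]) : words[i];

        memcpy(stream + 8 * i, &w, sizeof(w));
    }
}
```

The word order is never changed, only the bytes within each word, which is
why a per-word swap is enough to complete the translation.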

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Message-id: 20191024121808.9612-6-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/kvm64.c | 185 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 156 insertions(+), 29 deletions(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ int kvm_arch_destroy_vcpu(CPUState *cs)
 bool kvm_arm_reg_syncs_via_cpreg_list(uint64_t regidx)
 {
     /* Return true if the regidx is a register we should synchronize
-     * via the cpreg_tuples array (ie is not a core reg we sync by
-     * hand in kvm_arch_get/put_registers())
+     * via the cpreg_tuples array (ie is not a core or sve reg that
+     * we sync by hand in kvm_arch_get/put_registers())
      */
     switch (regidx & KVM_REG_ARM_COPROC_MASK) {
     case KVM_REG_ARM_CORE:
+    case KVM_REG_ARM64_SVE:
         return false;
     default:
         return true;
@@ -XXX,XX +XXX,XX @@ int kvm_arm_cpreg_level(uint64_t regidx)
 
 static int kvm_arch_put_fpsimd(CPUState *cs)
 {
-    ARMCPU *cpu = ARM_CPU(cs);
-    CPUARMState *env = &cpu->env;
+    CPUARMState *env = &ARM_CPU(cs)->env;
     struct kvm_one_reg reg;
-    uint32_t fpr;
     int i, ret;
 
     for (i = 0; i < 32; i++) {
@@ -XXX,XX +XXX,XX @@ static int kvm_arch_put_fpsimd(CPUState *cs)
         }
     }
 
-    reg.addr = (uintptr_t)(&fpr);
-    fpr = vfp_get_fpsr(env);
-    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
-    ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
-    if (ret) {
-        return ret;
+    return 0;
+}
+
+/*
+ * SVE registers are encoded in KVM's memory in an endianness-invariant format.
+ * The byte at offset i from the start of the in-memory representation contains
+ * the bits [(7 + 8 * i) : (8 * i)] of the register value. As this means the
+ * lowest offsets are stored in the lowest memory addresses, then that nearly
+ * matches QEMU's representation, which is to use an array of host-endian
+ * uint64_t's, where the lower offsets are at the lower indices. To complete
+ * the translation we just need to byte swap the uint64_t's on big-endian hosts.
+ */
+static uint64_t *sve_bswap64(uint64_t *dst, uint64_t *src, int nr)
+{
+#ifdef HOST_WORDS_BIGENDIAN
+    int i;
+
+    for (i = 0; i < nr; ++i) {
+        dst[i] = bswap64(src[i]);
     }
 
-    reg.addr = (uintptr_t)(&fpr);
-    fpr = vfp_get_fpcr(env);
-    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
+    return dst;
+#else
+    return src;
+#endif
+}
+
+/*
+ * KVM SVE registers come in slices where ZREGs have a slice size of 2048 bits
+ * and PREGS and the FFR have a slice size of 256 bits. However we simply hard
+ * code the slice index to zero for now as it's unlikely we'll need more than
+ * one slice for quite some time.
+ */
+static int kvm_arch_put_sve(CPUState *cs)
+{
+    ARMCPU *cpu = ARM_CPU(cs);
+    CPUARMState *env = &cpu->env;
+    uint64_t tmp[ARM_MAX_VQ * 2];
+    uint64_t *r;
+    struct kvm_one_reg reg;
+    int n, ret;
+
+    for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; ++n) {
+        r = sve_bswap64(tmp, &env->vfp.zregs[n].d[0], cpu->sve_max_vq * 2);
+        reg.addr = (uintptr_t)r;
+        reg.id = KVM_REG_ARM64_SVE_ZREG(n, 0);
+        ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    for (n = 0; n < KVM_ARM64_SVE_NUM_PREGS; ++n) {
+        r = sve_bswap64(tmp, &env->vfp.pregs[n].p[0],
+                        DIV_ROUND_UP(cpu->sve_max_vq * 2, 8));
+        reg.addr = (uintptr_t)r;
+        reg.id = KVM_REG_ARM64_SVE_PREG(n, 0);
+        ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+        if (ret) {
+            return ret;
+        }
+    }
+
+    r = sve_bswap64(tmp, &env->vfp.pregs[FFR_PRED_NUM].p[0],
+                    DIV_ROUND_UP(cpu->sve_max_vq * 2, 8));
+    reg.addr = (uintptr_t)r;
+    reg.id = KVM_REG_ARM64_SVE_FFR(0);
     ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
     if (ret) {
         return ret;
@@ -XXX,XX +XXX,XX @@ int kvm_arch_put_registers(CPUState *cs, int level)
 {
     struct kvm_one_reg reg;
     uint64_t val;
+    uint32_t fpr;
     int i, ret;
     unsigned int el;
 
@@ -XXX,XX +XXX,XX @@ int kvm_arch_put_registers(CPUState *cs, int level)
         }
     }
 
-    ret = kvm_arch_put_fpsimd(cs);
+    if (cpu_isar_feature(aa64_sve, cpu)) {
+        ret = kvm_arch_put_sve(cs);
+    } else {
+        ret = kvm_arch_put_fpsimd(cs);
+    }
+    if (ret) {
+        return ret;
+    }
+
+    reg.addr = (uintptr_t)(&fpr);
+    fpr = vfp_get_fpsr(env);
+    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
+    ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+    if (ret) {
+        return ret;
+    }
+
+    reg.addr = (uintptr_t)(&fpr);
+    fpr = vfp_get_fpcr(env);
+    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
+    ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
     if (ret) {
         return ret;
     }
@@ -XXX,XX +XXX,XX @@ int kvm_arch_put_registers(CPUState *cs, int level)
 
 static int kvm_arch_get_fpsimd(CPUState *cs)
 {
-    ARMCPU *cpu = ARM_CPU(cs);
-    CPUARMState *env = &cpu->env;
+    CPUARMState *env = &ARM_CPU(cs)->env;
     struct kvm_one_reg reg;
-    uint32_t fpr;
     int i, ret;
 
     for (i = 0; i < 32; i++) {
@@ -XXX,XX +XXX,XX @@ static int kvm_arch_get_fpsimd(CPUState *cs)
         }
     }
 
-    reg.addr = (uintptr_t)(&fpr);
-    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
-    ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
-    if (ret) {
-        return ret;
-    }
-    vfp_set_fpsr(env, fpr);
+    return 0;
+}
 
-    reg.addr = (uintptr_t)(&fpr);
-    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
+/*
+ * KVM SVE registers come in slices where ZREGs have a slice size of 2048 bits
+ * and PREGS and the FFR have a slice size of 256 bits. However we simply hard
+ * code the slice index to zero for now as it's unlikely we'll need more than
+ * one slice for quite some time.
+ */
+static int kvm_arch_get_sve(CPUState *cs)
+{
+    ARMCPU *cpu = ARM_CPU(cs);
+    CPUARMState *env = &cpu->env;
+    struct kvm_one_reg reg;
+    uint64_t *r;
+    int n, ret;
+
+    for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; ++n) {
+        r = &env->vfp.zregs[n].d[0];
+        reg.addr = (uintptr_t)r;
+        reg.id = KVM_REG_ARM64_SVE_ZREG(n, 0);
+        ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+        if (ret) {
+            return ret;
+        }
+        sve_bswap64(r, r, cpu->sve_max_vq * 2);
+    }
+
+    for (n = 0; n < KVM_ARM64_SVE_NUM_PREGS; ++n) {
+        r = &env->vfp.pregs[n].p[0];
+        reg.addr = (uintptr_t)r;
+        reg.id = KVM_REG_ARM64_SVE_PREG(n, 0);
+        ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+        if (ret) {
+            return ret;
+        }
+        sve_bswap64(r, r, DIV_ROUND_UP(cpu->sve_max_vq * 2, 8));
+    }
+
+    r = &env->vfp.pregs[FFR_PRED_NUM].p[0];
+    reg.addr = (uintptr_t)r;
+    reg.id = KVM_REG_ARM64_SVE_FFR(0);
     ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
     if (ret) {
         return ret;
     }
-    vfp_set_fpcr(env, fpr);
+    sve_bswap64(r, r, DIV_ROUND_UP(cpu->sve_max_vq * 2, 8));
 
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ int kvm_arch_get_registers(CPUState *cs)
     struct kvm_one_reg reg;
     uint64_t val;
     unsigned int el;
+    uint32_t fpr;
     int i, ret;
 
     ARMCPU *cpu = ARM_CPU(cs);
@@ -XXX,XX +XXX,XX @@ int kvm_arch_get_registers(CPUState *cs)
         env->spsr = env->banked_spsr[i];
     }
 
-    ret = kvm_arch_get_fpsimd(cs);
+    if (cpu_isar_feature(aa64_sve, cpu)) {
+        ret = kvm_arch_get_sve(cs);
+    } else {
+        ret = kvm_arch_get_fpsimd(cs);
+    }
     if (ret) {
         return ret;
     }
 
+    reg.addr = (uintptr_t)(&fpr);
+    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpsr);
+    ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+    if (ret) {
+        return ret;
+    }
+    vfp_set_fpsr(env, fpr);
+
+    reg.addr = (uintptr_t)(&fpr);
+    reg.id = AARCH64_SIMD_CTRL_REG(fp_regs.fpcr);
+    ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+    if (ret) {
+        return ret;
+    }
+    vfp_set_fpcr(env, fpr);
+
     ret = kvm_get_vcpu_events(cpu);
     if (ret) {
         return ret;
-- 
2.20.1

From: Andrew Jones <drjones@redhat.com>

Enable SVE in the KVM guest when the 'max' cpu type is configured
and KVM supports it. KVM SVE requires use of the new finalize
vcpu ioctl, so we add that now too. For starters SVE can only be
turned on or off, getting all vector lengths the host CPU supports
when on. We'll add the other SVE CPU properties in later patches.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Beata Michalska <beata.michalska@linaro.org>
Message-id: 20191024121808.9612-7-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/kvm_arm.h     | 27 +++++++++++++++++++++++++++
 target/arm/cpu64.c       | 17 ++++++++++++++---
 target/arm/kvm.c         |  5 +++++
 target/arm/kvm64.c       | 20 +++++++++++++++++++-
 tests/arm-cpu-features.c |  4 ++++
 5 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -XXX,XX +XXX,XX @@
  */
 int kvm_arm_vcpu_init(CPUState *cs);
 
+/**
+ * kvm_arm_vcpu_finalize:
+ * @cs: CPUState
+ * @feature: int
+ *
+ * Finalizes the configuration of the specified VCPU feature by
+ * invoking the KVM_ARM_VCPU_FINALIZE ioctl. Features requiring
+ * this are documented in the "KVM_ARM_VCPU_FINALIZE" section of
+ * KVM's API documentation.
+ *
+ * Returns: 0 if success else < 0 error code
+ */
+int kvm_arm_vcpu_finalize(CPUState *cs, int feature);
+
 /**
  * kvm_arm_register_device:
  * @mr: memory region for this device
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_aarch32_supported(CPUState *cs);
  */
 bool kvm_arm_pmu_supported(CPUState *cs);
 
+/**
+ * kvm_arm_sve_supported:
+ * @cs: CPUState
+ *
+ * Returns true if the KVM VCPU can enable SVE and false otherwise.
+ */
+bool kvm_arm_sve_supported(CPUState *cs);
+
 /**
  * kvm_arm_get_max_vm_ipa_size - Returns the number of bits in the
  * IPA address space supported by KVM
@@ -XXX,XX +XXX,XX @@ static inline bool kvm_arm_pmu_supported(CPUState *cs)
     return false;
 }
 
+static inline bool kvm_arm_sve_supported(CPUState *cs)
+{
+    return false;
+}
+
 static inline int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
 {
     return -ENOENT;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
         return;
     }
 
+    if (value && kvm_enabled() && !kvm_arm_sve_supported(CPU(cpu))) {
+        error_setg(errp, "'sve' feature not supported by KVM on this host");
+        return;
+    }
+
     t = cpu->isar.id_aa64pfr0;
     t = FIELD_DP64(t, ID_AA64PFR0, SVE, value);
     cpu->isar.id_aa64pfr0 = t;
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
     uint32_t vq;
+    uint64_t t;
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
+        if (kvm_arm_sve_supported(CPU(cpu))) {
+            t = cpu->isar.id_aa64pfr0;
+            t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1);
+            cpu->isar.id_aa64pfr0 = t;
+        }
     } else {
-        uint64_t t;
         uint32_t u;
         aarch64_a57_initfn(obj);
 
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
 
         object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
                             cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
-        object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
-                            cpu_arm_set_sve, NULL, NULL, &error_fatal);
 
         for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
             char name[8];
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
                                 cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
         }
     }
+
+    object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
+                        cpu_arm_set_sve, NULL, NULL, &error_fatal);
 }
 
 struct ARMCPUInfo {
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ int kvm_arm_vcpu_init(CPUState *cs)
     return kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_INIT, &init);
 }
 
+int kvm_arm_vcpu_finalize(CPUState *cs, int feature)
+{
+    return kvm_vcpu_ioctl(cs, KVM_ARM_VCPU_FINALIZE, &feature);
+}
+
 void kvm_arm_init_serror_injection(CPUState *cs)
 {
     cap_has_inject_serror_esr = kvm_check_extension(cs->kvm_state,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_aarch32_supported(CPUState *cpu)
     return kvm_check_extension(s, KVM_CAP_ARM_EL1_32BIT);
 }
 
+bool kvm_arm_sve_supported(CPUState *cpu)
+{
+    KVMState *s = KVM_STATE(current_machine->accelerator);
+
+    return kvm_check_extension(s, KVM_CAP_ARM_SVE);
+}
+
 #define ARM_CPU_ID_MPIDR       3, 0, 0, 0, 5
 
 int kvm_arch_init_vcpu(CPUState *cs)
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
         cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_EL1_32BIT;
     }
     if (!kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PMU_V3)) {
-            cpu->has_pmu = false;
+        cpu->has_pmu = false;
     }
     if (cpu->has_pmu) {
         cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_PMU_V3;
     } else {
         unset_feature(&env->features, ARM_FEATURE_PMU);
     }
+    if (cpu_isar_feature(aa64_sve, cpu)) {
+        assert(kvm_arm_sve_supported(cs));
+        cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_SVE;
+    }
 
     /* Do KVM_ARM_VCPU_INIT ioctl */
     ret = kvm_arm_vcpu_init(cs);
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
         return ret;
     }
 
+    if (cpu_isar_feature(aa64_sve, cpu)) {
+        ret = kvm_arm_vcpu_finalize(cs, KVM_ARM_VCPU_SVE);
+        if (ret) {
+            return ret;
+        }
+    }
+
     /*
      * When KVM is in use, PSCI is emulated in-kernel and not by qemu.
      * Currently KVM has its own idea about MPIDR assignment, so we
diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/arm-cpu-features.c
+++ b/tests/arm-cpu-features.c
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
         assert_has_feature(qts, "host", "aarch64");
         assert_has_feature(qts, "host", "pmu");
 
+        assert_has_feature(qts, "max", "sve");
+
         assert_error(qts, "cortex-a15",
             "We cannot guarantee the CPU type 'cortex-a15' works "
             "with KVM on this host", NULL);
     } else {
         assert_has_not_feature(qts, "host", "aarch64");
         assert_has_not_feature(qts, "host", "pmu");
+
+        assert_has_not_feature(qts, "max", "sve");
     }
 
     qtest_quit(qts);
-- 
2.20.1

From: Andrew Jones <drjones@redhat.com>

kvm_arm_create_scratch_host_vcpu() takes a struct kvm_vcpu_init
parameter. Rather than just using it as an output parameter to
pass back the preferred target, use it also as an input parameter,
allowing a caller to pass a selected target if they wish and to
also pass cpu features. If the caller doesn't want to select a
target they can pass -1 for the target which indicates they want
to use the preferred target and have it passed back like before.
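
The in/out convention described above can be sketched in isolation as
follows. This is an assumption-laden illustration, not the patch itself:
struct init_sketch only mimics the shape of struct kvm_vcpu_init, and
TARGET_USE_PREFERRED and resolve_target() are hypothetical names. Because
the target field is unsigned, the -1 sentinel is stored as the all-ones
value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in mimicking the shape of struct kvm_vcpu_init. */
struct init_sketch {
    uint32_t target;
    uint32_t features[7];
};

/* What -1 becomes once stored in an unsigned 32-bit target field. */
#define TARGET_USE_PREFERRED UINT32_MAX

/*
 * Hypothetical helper: if the caller asked for the preferred target,
 * write it back; otherwise keep the caller's choice. The feature words
 * are the caller's input and are left untouched either way.
 */
static void resolve_target(struct init_sketch *init, uint32_t preferred)
{
    if (init->target == TARGET_USE_PREFERRED) {
        init->target = preferred;
    }
}
```

A caller that initializes the target to -1 reads the preferred target back
from the same struct, while a caller that sets an explicit target sees it
preserved.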

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
                                       int *fdarray,
                                       struct kvm_vcpu_init *init)
 {
-    int ret, kvmfd = -1, vmfd = -1, cpufd = -1;
+    int ret = 0, kvmfd = -1, vmfd = -1, cpufd = -1;
 
     kvmfd = qemu_open("/dev/kvm", O_RDWR);
     if (kvmfd < 0) {
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
         goto finish;
     }
 
-    ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, init);
+    if (init->target == -1) {
+        struct kvm_vcpu_init preferred;
+
+        ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, &preferred);
+        if (!ret) {
+            init->target = preferred.target;
+        }
+    }
     if (ret >= 0) {
         ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
         if (ret < 0) {
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
          * creating one kind of guest CPU which is its preferred
          * CPU type.
          */
+        struct kvm_vcpu_init try;
+
         while (*cpus_to_try != QEMU_KVM_ARM_TARGET_NONE) {
-            init->target = *cpus_to_try++;
-            memset(init->features, 0, sizeof(init->features));
-            ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
+            try.target = *cpus_to_try++;
+            memcpy(try.features, init->features, sizeof(init->features));
+            ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, &try);
             if (ret >= 0) {
                 break;
             }
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
         if (ret < 0) {
             goto err;
         }
+        init->target = try.target;
     } else {
         /* Treat a NULL cpus_to_try argument the same as an empty
          * list, which means we will fail the call since this must
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
         QEMU_KVM_ARM_TARGET_CORTEX_A15,
         QEMU_KVM_ARM_TARGET_NONE
     };
-    struct kvm_vcpu_init init;
+    /*
+     * target = -1 informs kvm_arm_create_scratch_host_vcpu()
+     * to use the preferred target
+     */
+    struct kvm_vcpu_init init = { .target = -1, };
 
     if (!kvm_arm_create_scratch_host_vcpu(cpus_to_try, fdarray, &init)) {
         return false;
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
         KVM_ARM_TARGET_CORTEX_A57,
         QEMU_KVM_ARM_TARGET_NONE
     };
-    struct kvm_vcpu_init init;
+    /*
+     * target = -1 informs kvm_arm_create_scratch_host_vcpu()
+     * to use the preferred target
+     */
+    struct kvm_vcpu_init init = { .target = -1, };
 
     if (!kvm_arm_create_scratch_host_vcpu(cpus_to_try, fdarray, &init)) {
         return false;
-- 
2.20.1

From: Andrew Jones <drjones@redhat.com>

Extend the SVE vq map initialization and validation with KVM's
supported vector lengths when KVM is enabled. In order to determine
and select supported lengths we add two new KVM functions for getting
and setting the KVM_REG_ARM64_SVE_VLS pseudo-register.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Message-id: 20191024121808.9612-9-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/kvm_arm.h      |  12 +++
 target/arm/cpu64.c        | 176 ++++++++++++++++++++++++++++----------
 target/arm/kvm64.c        | 100 +++++++++++++++++++++-
 tests/arm-cpu-features.c  | 106 ++++++++++++++++++++++-
 docs/arm-cpu-features.rst |  45 +++++++---
 5 files changed, 381 insertions(+), 58 deletions(-)
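
For readers unfamiliar with the layout of the vector length set, the
following sketch shows the (vq - 1) bit indexing that the get/set
helpers in this patch rely on. It is an illustration only: the
SKETCH_* constants and function names are hypothetical, and the values
8 and 16 are assumed to mirror KVM_ARM64_SVE_VLS_WORDS and ARM_MAX_VQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VLS_WORDS 8   /* assumed stand-in for KVM_ARM64_SVE_VLS_WORDS */
#define SKETCH_MAX_VQ    16  /* assumed stand-in for ARM_MAX_VQ */

/* Return true if vector length vq (in 128-bit quadwords) is set in vls[]. */
static bool sketch_vls_has_vq(const uint64_t *vls, uint32_t vq)
{
    assert(vq >= 1 && vq <= SKETCH_VLS_WORDS * 64);
    return (vls[(vq - 1) / 64] >> ((vq - 1) % 64)) & 1;
}

/* Mark vector length vq as present in vls[]. */
static void sketch_vls_set_vq(uint64_t *vls, uint32_t vq)
{
    assert(vq >= 1 && vq <= SKETCH_VLS_WORDS * 64);
    vls[(vq - 1) / 64] |= UINT64_C(1) << ((vq - 1) % 64);
}

/*
 * Collapse vls[] into a single word with bit (vq - 1) set for each
 * present length, ignoring anything larger than SKETCH_MAX_VQ.
 */
static uint32_t sketch_vls_to_map(const uint64_t *vls)
{
    uint32_t map = 0;

    for (uint32_t vq = 1; vq <= SKETCH_MAX_VQ; ++vq) {
        if (sketch_vls_has_vq(vls, vq)) {
            map |= UINT32_C(1) << (vq - 1);
        }
    }
    return map;
}
```

With this indexing, a host supporting 128, 256 and 512 bits (vq 1, 2
and 4) has bits 0, 1 and 3 set, and any length beyond the cap is simply
dropped from the map.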

diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -XXX,XX +XXX,XX @@ typedef struct ARMHostCPUFeatures {
  */
 bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf);
 
+/**
+ * kvm_arm_sve_get_vls:
+ * @cs: CPUState
+ * @map: bitmap to fill in
+ *
+ * Get all the SVE vector lengths supported by the KVM host, setting
+ * the bits corresponding to their length in quadwords minus one
+ * (vq - 1) in @map up to ARM_MAX_VQ.
+ */
+void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map);
+
 /**
  * kvm_arm_set_cpu_features_from_host:
  * @cpu: ARMCPU to set the features for
@@ -XXX,XX +XXX,XX @@ static inline int kvm_arm_vgic_probe(void)
 static inline void kvm_arm_pmu_set_irq(CPUState *cs, int irq) {}
 static inline void kvm_arm_pmu_init(CPUState *cs) {}
 
+static inline void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map) {}
 #endif
 
 static inline const char *gic_class_name(void)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
      * any of the above.  Finally, if SVE is not disabled, then at least one
      * vector length must be enabled.
      */
+    DECLARE_BITMAP(kvm_supported, ARM_MAX_VQ);
     DECLARE_BITMAP(tmp, ARM_MAX_VQ);
     uint32_t vq, max_vq = 0;
 
+    /* Collect the set of vector lengths supported by KVM. */
+    bitmap_zero(kvm_supported, ARM_MAX_VQ);
+    if (kvm_enabled() && kvm_arm_sve_supported(CPU(cpu))) {
+        kvm_arm_sve_get_vls(CPU(cpu), kvm_supported);
+    } else if (kvm_enabled()) {
+        assert(!cpu_isar_feature(aa64_sve, cpu));
+    }
+
     /*
      * Process explicit sve<N> properties.
      * From the properties, sve_vq_map<N> implies sve_vq_init<N>.
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
             return;
         }
 
-        /* Propagate enabled bits down through required powers-of-two. */
-        for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
-            if (!test_bit(vq - 1, cpu->sve_vq_init)) {
-                set_bit(vq - 1, cpu->sve_vq_map);
+        if (kvm_enabled()) {
+            /*
+             * For KVM we have to automatically enable all supported
+             * uninitialized lengths, even when the smaller lengths are not
+             * all powers-of-two.
+             */
+            bitmap_andnot(tmp, kvm_supported, cpu->sve_vq_init, max_vq);
+            bitmap_or(cpu->sve_vq_map, cpu->sve_vq_map, tmp, max_vq);
+        } else {
+            /* Propagate enabled bits down through required powers-of-two. */
+            for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
+                if (!test_bit(vq - 1, cpu->sve_vq_init)) {
+                    set_bit(vq - 1, cpu->sve_vq_map);
+                }
             }
         }
     } else if (cpu->sve_max_vq == 0) {
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
             return;
         }
 
-        /* Disabling a power-of-two disables all larger lengths. */
-        if (test_bit(0, cpu->sve_vq_init)) {
-            error_setg(errp, "cannot disable sve128");
-            error_append_hint(errp, "Disabling sve128 results in all vector "
-                              "lengths being disabled.\n");
-            error_append_hint(errp, "With SVE enabled, at least one vector "
-                              "length must be enabled.\n");
-            return;
-        }
-        for (vq = 2; vq <= ARM_MAX_VQ; vq <<= 1) {
-            if (test_bit(vq - 1, cpu->sve_vq_init)) {
-                break;
+        if (kvm_enabled()) {
+            /* Disabling a supported length disables all larger lengths. */
+            for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
+                if (test_bit(vq - 1, cpu->sve_vq_init) &&
+                    test_bit(vq - 1, kvm_supported)) {
+                    break;
+                }
             }
+            max_vq = vq <= ARM_MAX_VQ ? vq - 1 : ARM_MAX_VQ;
+            bitmap_andnot(cpu->sve_vq_map, kvm_supported,
+                          cpu->sve_vq_init, max_vq);
+            if (max_vq == 0 || bitmap_empty(cpu->sve_vq_map, max_vq)) {
+                error_setg(errp, "cannot disable sve%d", vq * 128);
+                error_append_hint(errp, "Disabling sve%d results in all "
+                                  "vector lengths being disabled.\n",
+                                  vq * 128);
+                error_append_hint(errp, "With SVE enabled, at least one "
+                                  "vector length must be enabled.\n");
+                return;
+            }
+        } else {
+            /* Disabling a power-of-two disables all larger lengths. */
+            if (test_bit(0, cpu->sve_vq_init)) {
+                error_setg(errp, "cannot disable sve128");
+                error_append_hint(errp, "Disabling sve128 results in all "
+                                  "vector lengths being disabled.\n");
+                error_append_hint(errp, "With SVE enabled, at least one "
+                                  "vector length must be enabled.\n");
+                return;
+            }
+            for (vq = 2; vq <= ARM_MAX_VQ; vq <<= 1) {
+                if (test_bit(vq - 1, cpu->sve_vq_init)) {
+                    break;
+                }
+            }
+            max_vq = vq <= ARM_MAX_VQ ? vq - 1 : ARM_MAX_VQ;
+            bitmap_complement(cpu->sve_vq_map, cpu->sve_vq_init, max_vq);
         }
-        max_vq = vq <= ARM_MAX_VQ ? vq - 1 : ARM_MAX_VQ;
 
-        bitmap_complement(cpu->sve_vq_map, cpu->sve_vq_init, max_vq);
         max_vq = find_last_bit(cpu->sve_vq_map, max_vq) + 1;
     }
 
@@ -XXX,XX +XXX,XX @@ void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
     assert(max_vq != 0);
     bitmap_clear(cpu->sve_vq_map, max_vq, ARM_MAX_VQ - max_vq);
 
-    /* Ensure all required powers-of-two are enabled. */
-    for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
-        if (!test_bit(vq - 1, cpu->sve_vq_map)) {
-            error_setg(errp, "cannot disable sve%d", vq * 128);
-            error_append_hint(errp, "sve%d is required as it "
-                              "is a power-of-two length smaller than "
-                              "the maximum, sve%d\n",
-                              vq * 128, max_vq * 128);
+    if (kvm_enabled()) {
+        /* Ensure the set of lengths matches what KVM supports. */
+        bitmap_xor(tmp, cpu->sve_vq_map, kvm_supported, max_vq);
+        if (!bitmap_empty(tmp, max_vq)) {
+            vq = find_last_bit(tmp, max_vq) + 1;
+            if (test_bit(vq - 1, cpu->sve_vq_map)) {
+                if (cpu->sve_max_vq) {
+                    error_setg(errp, "cannot set sve-max-vq=%d",
+                               cpu->sve_max_vq);
+                    error_append_hint(errp, "This KVM host does not support "
+                                      "the vector length %d-bits.\n",
+                                      vq * 128);
+                    error_append_hint(errp, "It may not be possible to use "
+                                      "sve-max-vq with this KVM host. Try "
+                                      "using only sve<N> properties.\n");
+                } else {
+                    error_setg(errp, "cannot enable sve%d", vq * 128);
+                    error_append_hint(errp, "This KVM host does not support "
+                                      "the vector length %d-bits.\n",
+                                      vq * 128);
+                }
+            } else {
+                error_setg(errp, "cannot disable sve%d", vq * 128);
+                error_append_hint(errp, "The KVM host requires all "
+                                  "supported vector lengths smaller "
+                                  "than %d bits to also be enabled.\n",
+                                  max_vq * 128);
+            }
             return;
         }
+    } else {
+        /* Ensure all required powers-of-two are enabled. */
+        for (vq = pow2floor(max_vq); vq >= 1; vq >>= 1) {
+            if (!test_bit(vq - 1, cpu->sve_vq_map)) {
+                error_setg(errp, "cannot disable sve%d", vq * 128);
+                error_append_hint(errp, "sve%d is required as it "
+                                  "is a power-of-two length smaller than "
+                                  "the maximum, sve%d\n",
+                                  vq * 128, max_vq * 128);
+                return;
+            }
+        }
     }
 
     /*
@@ -XXX,XX +XXX,XX @@ static void cpu_max_set_sve_max_vq(Object *obj, Visitor *v, const char *name,
 {
     ARMCPU *cpu = ARM_CPU(obj);
     Error *err = NULL;
+    uint32_t max_vq;
 
-    visit_type_uint32(v, name, &cpu->sve_max_vq, &err);
-
-    if (!err && (cpu->sve_max_vq == 0 || cpu->sve_max_vq > ARM_MAX_VQ)) {
-        error_setg(&err, "unsupported SVE vector length");
-        error_append_hint(&err, "Valid sve-max-vq in range [1-%d]\n",
-                          ARM_MAX_VQ);
+    visit_type_uint32(v, name, &max_vq, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
     }
-    error_propagate(errp, err);
+
+    if (kvm_enabled() && !kvm_arm_sve_supported(CPU(cpu))) {
+        error_setg(errp, "cannot set sve-max-vq");
+        error_append_hint(errp, "SVE not supported by KVM on this host\n");
+        return;
+    }
+
+    if (max_vq == 0 || max_vq > ARM_MAX_VQ) {
+        error_setg(errp, "unsupported SVE vector length");
+        error_append_hint(errp, "Valid sve-max-vq in range [1-%d]\n",
+                          ARM_MAX_VQ);
+        return;
+    }
+
+    cpu->sve_max_vq = max_vq;
 }
 
 static void cpu_arm_get_sve_vq(Object *obj, Visitor *v, const char *name,
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve_vq(Object *obj, Visitor *v, const char *name,
         return;
     }
 
+    if (value && kvm_enabled() && !kvm_arm_sve_supported(CPU(cpu))) {
+        error_setg(errp, "cannot enable %s", name);
+        error_append_hint(errp, "SVE not supported by KVM on this host\n");
+        return;
+    }
+
     if (value) {
         set_bit(vq - 1, cpu->sve_vq_map);
     } else {
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
         cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
         cpu->dcz_blocksize = 7; /*  512 bytes */
 #endif
-
-        object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
-                            cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
-
-        for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
-            char name[8];
-            sprintf(name, "sve%d", vq * 128);
-            object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
-                                cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
-        }
     }
 
     object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
                         cpu_arm_set_sve, NULL, NULL, &error_fatal);
+    object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
+                        cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
+
+    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
+        char name[8];
+        sprintf(name, "sve%d", vq * 128);
+        object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
+                            cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
+    }
 }
 
 struct ARMCPUInfo {
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_sve_supported(CPUState *cpu)
     return kvm_check_extension(s, KVM_CAP_ARM_SVE);
 }
 
+QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
+
+void kvm_arm_sve_get_vls(CPUState *cs, unsigned long *map)
+{
+    /* Only call this function if kvm_arm_sve_supported() returns true. */
+    static uint64_t vls[KVM_ARM64_SVE_VLS_WORDS];
+    static bool probed;
+    uint32_t vq = 0;
+    int i, j;
+
+    bitmap_clear(map, 0, ARM_MAX_VQ);
+
+    /*
+     * KVM ensures all host CPUs support the same set of vector lengths.
+     * So we only need to create the scratch VCPUs once and then cache
+     * the results.
+     */
+    if (!probed) {
+        struct kvm_vcpu_init init = {
+            .target = -1,
+            .features[0] = (1 << KVM_ARM_VCPU_SVE),
+        };
+        struct kvm_one_reg reg = {
+            .id = KVM_REG_ARM64_SVE_VLS,
+            .addr = (uint64_t)&vls[0],
+        };
+        int fdarray[3], ret;
+
+        probed = true;
+
+        if (!kvm_arm_create_scratch_host_vcpu(NULL, fdarray, &init)) {
+            error_report("failed to create scratch VCPU with SVE enabled");
+            abort();
+        }
+        ret = ioctl(fdarray[2], KVM_GET_ONE_REG, &reg);
+        kvm_arm_destroy_scratch_host_vcpu(fdarray);
+        if (ret) {
+            error_report("failed to get KVM_REG_ARM64_SVE_VLS: %s",
+                         strerror(errno));
+            abort();
+        }
+
+        for (i = KVM_ARM64_SVE_VLS_WORDS - 1; i >= 0; --i) {
+            if (vls[i]) {
+                vq = 64 - clz64(vls[i]) + i * 64;
+                break;
+            }
+        }
+        if (vq > ARM_MAX_VQ) {
+            warn_report("KVM supports vector lengths larger than "
+                        "QEMU can enable");
+        }
+    }
+
+    for (i = 0; i < KVM_ARM64_SVE_VLS_WORDS; ++i) {
+        if (!vls[i]) {
+            continue;
+        }
+        for (j = 1; j <= 64; ++j) {
+            vq = j + i * 64;
+            if (vq > ARM_MAX_VQ) {
+                return;
+            }
+            if (vls[i] & (1UL << (j - 1))) {
+                set_bit(vq - 1, map);
+            }
+        }
+    }
+}
+
+static int kvm_arm_sve_set_vls(CPUState *cs)
+{
+    uint64_t vls[KVM_ARM64_SVE_VLS_WORDS] = {0};
+    struct kvm_one_reg reg = {
+        .id = KVM_REG_ARM64_SVE_VLS,
+        .addr = (uint64_t)&vls[0],
+    };
+    ARMCPU *cpu = ARM_CPU(cs);
+    uint32_t vq;
+    int i, j;
+
+    assert(cpu->sve_max_vq <= KVM_ARM64_SVE_VQ_MAX);
+
+    for (vq = 1; vq <= cpu->sve_max_vq; ++vq) {
+        if (test_bit(vq - 1, cpu->sve_vq_map)) {
+            i = (vq - 1) / 64;
+            j = (vq - 1) % 64;
+            vls[i] |= 1UL << j;
+        }
+    }
+
+    return kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+}
+
 #define ARM_CPU_ID_MPIDR       3, 0, 0, 0, 5
 
 int kvm_arch_init_vcpu(CPUState *cs)
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
 
     if (cpu->kvm_target == QEMU_KVM_ARM_TARGET_NONE ||
         !object_dynamic_cast(OBJECT(cpu), TYPE_AARCH64_CPU)) {
-        fprintf(stderr, "KVM is not supported for this guest CPU type\n");
+        error_report("KVM is not supported for this guest CPU type");
         return -EINVAL;
     }
 
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
     }
 
     if (cpu_isar_feature(aa64_sve, cpu)) {
+        ret = kvm_arm_sve_set_vls(cs);
+        if (ret) {
+            return ret;
+        }
         ret = kvm_arm_vcpu_finalize(cs, KVM_ARM_VCPU_SVE);
         if (ret) {
             return ret;
diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/arm-cpu-features.c
+++ b/tests/arm-cpu-features.c
@@ -XXX,XX +XXX,XX @@ static QDict *resp_get_props(QDict *resp)
     return qdict;
 }
 
+static bool resp_get_feature(QDict *resp, const char *feature)
+{
+    QDict *props;
+
+    g_assert(resp);
+    g_assert(resp_has_props(resp));
+    props = resp_get_props(resp);
+    g_assert(qdict_get(props, feature));
+    return qdict_get_bool(props, feature);
+}
+
 #define assert_has_feature(qts, cpu_type, feature)                     \
 ({                                                                     \
     QDict *_resp = do_query_no_props(qts, cpu_type);                   \
@@ -XXX,XX +XXX,XX @@ static void sve_tests_sve_off(const void *data)
     qtest_quit(qts);
 }
 
+static void sve_tests_sve_off_kvm(const void *data)
+{
+    QTestState *qts;
+
+    qts = qtest_init(MACHINE "-accel kvm -cpu max,sve=off");
+
+    /*
+     * We don't know if this host supports SVE so we don't
+     * attempt to test enabling anything. We only test that
+     * everything is disabled (as it should be with sve=off)
+     * and that using sve<N>=off to explicitly disable vector
+     * lengths is OK too.
+     */
+    assert_sve_vls(qts, "max", 0, NULL);
+    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
+
+    qtest_quit(qts);
+}
+
 static void test_query_cpu_model_expansion(const void *data)
 {
     QTestState *qts;
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
     qts = qtest_init(MACHINE "-accel kvm -cpu host");
 
     if (g_str_equal(qtest_get_arch(), "aarch64")) {
+        bool kvm_supports_sve;
+        char max_name[8], name[8];
+        uint32_t max_vq, vq;
+        uint64_t vls;
+        QDict *resp;
+        char *error;
+
         assert_has_feature(qts, "host", "aarch64");
         assert_has_feature(qts, "host", "pmu");
 
-        assert_has_feature(qts, "max", "sve");
-
         assert_error(qts, "cortex-a15",
             "We cannot guarantee the CPU type 'cortex-a15' works "
             "with KVM on this host", NULL);
+
+        assert_has_feature(qts, "max", "sve");
+        resp = do_query_no_props(qts, "max");
+        kvm_supports_sve = resp_get_feature(resp, "sve");
+        vls = resp_get_sve_vls(resp);
+        qobject_unref(resp);
+
+        if (kvm_supports_sve) {
+            g_assert(vls != 0);
+            max_vq = 64 - __builtin_clzll(vls);
+            sprintf(max_name, "sve%d", max_vq * 128);
+
+            /* Enabling a supported length is of course fine. */
+            assert_sve_vls(qts, "max", vls, "{ %s: true }", max_name);
+
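+            /*
+             * Get the next supported length smaller than max-vq.
+             * __builtin_clzll(0) is undefined, so guard the empty case.
+             */
+            vq = (vls & ~BIT_ULL(max_vq - 1)) ?
+                 64 - __builtin_clzll(vls & ~BIT_ULL(max_vq - 1)) : 0;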
+            if (vq) {
+                /*
+                 * We have at least one length smaller than max-vq,
+                 * so we can disable max-vq.
+                 */
+                assert_sve_vls(qts, "max", (vls & ~BIT_ULL(max_vq - 1)),
+                               "{ %s: false }", max_name);
+
+                /*
+                 * Smaller, supported vector lengths cannot be disabled
+                 * unless all larger, supported vector lengths are also
+                 * disabled.
+                 */
+                sprintf(name, "sve%d", vq * 128);
+                error = g_strdup_printf("cannot disable %s", name);
+                assert_error(qts, "max", error,
+                             "{ %s: true, %s: false }",
+                             max_name, name);
+                g_free(error);
+            }
+
+            /*
+             * The smallest, supported vector length is required, because
+             * we need at least one vector length enabled.
+             */
+            vq = __builtin_ffsll(vls);
+            sprintf(name, "sve%d", vq * 128);
+            error = g_strdup_printf("cannot disable %s", name);
+            assert_error(qts, "max", error, "{ %s: false }", name);
+            g_free(error);
+
+            /* Get an unsupported length. */
+            for (vq = 1; vq <= max_vq; ++vq) {
+                if (!(vls & BIT_ULL(vq - 1))) {
+                    break;
+                }
+            }
+            if (vq <= SVE_MAX_VQ) {
+                sprintf(name, "sve%d", vq * 128);
+                error = g_strdup_printf("cannot enable %s", name);
+                assert_error(qts, "max", error, "{ %s: true }", name);
+                g_free(error);
+            }
+        } else {
+            g_assert(vls == 0);
+        }
     } else {
         assert_has_not_feature(qts, "host", "aarch64");
         assert_has_not_feature(qts, "host", "pmu");
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
     if (kvm_available) {
         qtest_add_data_func("/arm/kvm/query-cpu-model-expansion",
                             NULL, test_query_cpu_model_expansion_kvm);
+        if (g_str_equal(qtest_get_arch(), "aarch64")) {
+            qtest_add_data_func("/arm/kvm/query-cpu-model-expansion/sve-off",
+                                NULL, sve_tests_sve_off_kvm);
+        }
     }
 
     return g_test_run();
diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/arm-cpu-features.rst
+++ b/docs/arm-cpu-features.rst
@@ -XXX,XX +XXX,XX @@ SVE CPU Property Dependencies and Constraints
 
   1) At least one vector length must be enabled when `sve` is enabled.
 
-  2) If a vector length `N` is enabled, then all power-of-two vector
-     lengths smaller than `N` must also be enabled.  E.g. if `sve512`
-     is enabled, then the 128-bit and 256-bit vector lengths must also
-     be enabled.
+  2) If a vector length `N` is enabled, then, when KVM is enabled, all
+     smaller, host-supported vector lengths must also be enabled.  If
+     KVM is not enabled, then only the smaller, power-of-two vector
+     lengths must be enabled.  E.g. with KVM, if the host supports all
+     vector lengths up to 512 bits (128, 256, 384, 512), then if `sve512`
+     is enabled, the 128-bit, 256-bit, and 384-bit vector lengths must
+     also be enabled.  Without KVM, the 384-bit vector length would not
+     be required.
+
+  3) If KVM is enabled then only vector lengths that the host CPU type
+     supports may be enabled.  If SVE is not supported by the host, then
+     no `sve*` properties may be enabled.
 
 SVE CPU Property Parsing Semantics
 ----------------------------------
@@ -XXX,XX +XXX,XX @@ SVE CPU Property Parsing Semantics
      an error is generated.
 
   2) If SVE is enabled (`sve=on`), but no `sve<N>` CPU properties are
-     provided, then all supported vector lengths are enabled, including
-     the non-power-of-two lengths.
+     provided, then all supported vector lengths are enabled.  When KVM
+     is not in use, this includes the non-power-of-two lengths; when KVM
+     is in use, it means all vector lengths supported by the host
+     processor.
 
   3) If SVE is enabled, then an error is generated when attempting to
      disable the last enabled vector length (see constraint (1) of "SVE
@@ -XXX,XX +XXX,XX @@ SVE CPU Property Parsing Semantics
      has been explicitly disabled, then an error is generated (see
      constraint (2) of "SVE CPU Property Dependencies and Constraints").
 
-  5) If one or more `sve<N>` CPU properties are set `off`, but no `sve<N>`,
+  5) When KVM is enabled, if the host does not support SVE, then an error
+     is generated when attempting to enable any `sve*` properties (see
+     constraint (3) of "SVE CPU Property Dependencies and Constraints").
+
+  6) When KVM is enabled, if the host does support SVE, then an error is
+     generated when attempting to enable any vector lengths not supported
+     by the host (see constraint (3) of "SVE CPU Property Dependencies and
+     Constraints").
+
+  7) If one or more `sve<N>` CPU properties are set `off`, but no `sve<N>`
      CPU properties are set `on`, then the specified vector lengths are
      disabled but the default for any unspecified lengths remains enabled.
-     Disabling a power-of-two vector length also disables all vector
-     lengths larger than the power-of-two length (see constraint (2) of
-     "SVE CPU Property Dependencies and Constraints").
+     When KVM is not enabled, disabling a power-of-two vector length also
+     disables all vector lengths larger than the power-of-two length.
+     When KVM is enabled, then disabling any supported vector length also
+     disables all larger vector lengths (see constraint (2) of "SVE CPU
+     Property Dependencies and Constraints").
 
-  6) If one or more `sve<N>` CPU properties are set to `on`, then they
+  8) If one or more `sve<N>` CPU properties are set to `on`, then they
      are enabled and all unspecified lengths default to disabled, except
      for the required lengths per constraint (2) of "SVE CPU Property
      Dependencies and Constraints", which will even be auto-enabled if
      they were not explicitly enabled.
 
-  7) If SVE was disabled (`sve=off`), allowing all vector lengths to be
+  9) If SVE was disabled (`sve=off`), allowing all vector lengths to be
      explicitly disabled (i.e. avoiding the error specified in (3) of
      "SVE CPU Property Parsing Semantics"), then if later an `sve=on` is
      provided an error will be generated.  To avoid this error, one must
-- 
2.20.1

From: Andrew Jones <drjones@redhat.com>

Allow cpu 'host' to enable SVE when it's available, unless the
user chooses to disable it with the added 'sve=off' cpu property.
Also give the user the ability to select vector lengths with the
sve<N> properties. We don't adopt 'max' cpu's other sve property,
sve-max-vq, because that property is difficult to use with KVM.
That property assumes all vector lengths in the range from 1 up
to and including the specified maximum length are supported, but
there may be optional lengths not supported by the host in that
range. With KVM one must be more specific when enabling vector
lengths.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Message-id: 20191024121808.9612-10-drjones@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.h          |  2 ++
 target/arm/cpu.c          |  3 +++
 target/arm/cpu64.c        | 33 +++++++++++++++++----------------
 target/arm/kvm64.c        | 14 +++++++++++++-
 tests/arm-cpu-features.c  | 23 +++++++++++------------
 docs/arm-cpu-features.rst | 19 ++++++++++++-------
 6 files changed, 58 insertions(+), 36 deletions(-)
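
To make the sve-max-vq concern in the commit message concrete, here is
a small sketch under stated assumptions. The names are hypothetical and
the vector length sets are held in a single 32-bit word with bit
(vq - 1) per length; the real code uses QEMU bitmaps. The second helper
mirrors, in simplified form, the KVM rule that the enabled set must
match the host-supported set up to the largest enabled length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low max_vq bits set, i.e. lengths 1..max_vq. */
static uint32_t sketch_low_mask(uint32_t max_vq)
{
    assert(max_vq >= 1 && max_vq <= 32);
    return max_vq == 32 ? UINT32_MAX : (UINT32_C(1) << max_vq) - 1;
}

/*
 * sve-max-vq=N implies every length in 1..N is enabled, so it is only
 * usable when the host supports all of them.
 */
static bool sketch_max_vq_usable(uint32_t host_map, uint32_t max_vq)
{
    uint32_t want = sketch_low_mask(max_vq);

    return (host_map & want) == want;
}

/*
 * Simplified KVM rule: at least one length is enabled, and below the
 * largest enabled length the enabled set equals the host-supported set.
 */
static bool sketch_kvm_set_valid(uint32_t host_map, uint32_t enabled_map)
{
    uint32_t max_vq = 0;

    for (uint32_t vq = 1; vq <= 32; ++vq) {
        if (enabled_map & (UINT32_C(1) << (vq - 1))) {
            max_vq = vq;
        }
    }
    if (max_vq == 0) {
        return false;
    }
    return ((enabled_map ^ host_map) & sketch_low_mask(max_vq)) == 0;
}
```

On a host that supports 128, 256 and 512 bits but not 384, a maximum of
vq 4 is rejected because it would require the unsupported 384-bit
length, whereas naming the individual sve<N> lengths can describe
exactly the supported set.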

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ int aarch64_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq);
 void aarch64_sve_change_el(CPUARMState *env, int old_el,
                            int new_el, bool el0_a64);
+void aarch64_add_sve_properties(Object *obj);
 #else
 static inline void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq) { }
 static inline void aarch64_sve_change_el(CPUARMState *env, int o,
                                          int n, bool a)
 { }
+static inline void aarch64_add_sve_properties(Object *obj) { }
 #endif
 
 #if !defined(CONFIG_TCG)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
     ARMCPU *cpu = ARM_CPU(obj);
 
     kvm_arm_set_cpu_features_from_host(cpu);
+    if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
+        aarch64_add_sve_properties(obj);
+    }
     arm_cpu_post_init(obj);
 }
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
     cpu->isar.id_aa64pfr0 = t;
 }
 
+void aarch64_add_sve_properties(Object *obj)
+{
+    uint32_t vq;
+
+    object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
+                        cpu_arm_set_sve, NULL, NULL, &error_fatal);
+
+    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
+        char name[8];
+        sprintf(name, "sve%d", vq * 128);
+        object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
+                            cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
+    }
+}
+
 /* -cpu max: if KVM is enabled, like -cpu host (best possible with this host);
  * otherwise, a CPU with as many features enabled as our emulation supports.
  * The version of '-cpu max' for qemu-system-arm is defined in cpu.c;
@@ -XXX,XX +XXX,XX @@ static void cpu_arm_set_sve(Object *obj, Visitor *v, const char *name,
 static void aarch64_max_initfn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
-    uint32_t vq;
-    uint64_t t;
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
-        if (kvm_arm_sve_supported(CPU(cpu))) {
-            t = cpu->isar.id_aa64pfr0;
-            t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1);
-            cpu->isar.id_aa64pfr0 = t;
-        }
     } else {
+        uint64_t t;
         uint32_t u;
         aarch64_a57_initfn(obj);
 
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
 #endif
     }
 
-    object_property_add(obj, "sve", "bool", cpu_arm_get_sve,
-                        cpu_arm_set_sve, NULL, NULL, &error_fatal);
+    aarch64_add_sve_properties(obj);
     object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
                         cpu_max_set_sve_max_vq, NULL, NULL, &error_fatal);
-
-    for (vq = 1; vq <= ARM_MAX_VQ; ++vq) {
-        char name[8];
-        sprintf(name, "sve%d", vq * 128);
-        object_property_add(obj, name, "bool", cpu_arm_get_sve_vq,
-                            cpu_arm_set_sve_vq, NULL, NULL, &error_fatal);
-    }
 }
 
 struct ARMCPUInfo {
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
      * and then query that CPU for the relevant ID registers.
      */
     int fdarray[3];
+    bool sve_supported;
     uint64_t features = 0;
+    uint64_t t;
     int err;
 
     /* Old kernels may not know about the PREFERRED_TARGET ioctl: however
@@ -XXX,XX +XXX,XX @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
                               ARM64_SYS_REG(3, 0, 0, 3, 2));
     }
 
+    sve_supported = ioctl(fdarray[0], KVM_CHECK_EXTENSION, KVM_CAP_ARM_SVE) > 0;
+
     kvm_arm_destroy_scratch_host_vcpu(fdarray);
 
     if (err < 0) {
         return false;
     }
 
-   /* We can assume any KVM supporting CPU is at least a v8
+    /* Add feature bits that can't appear until after VCPU init. */
+    if (sve_supported) {
+        t = ahcf->isar.id_aa64pfr0;
+        t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1);
+        ahcf->isar.id_aa64pfr0 = t;
+    }
+
+    /*
+     * We can assume any KVM supporting CPU is at least a v8
      * with VFPv4+Neon; this in turn implies most of the other
      * feature bits.
      */
diff --git a/tests/arm-cpu-features.c b/tests/arm-cpu-features.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/arm-cpu-features.c
+++ b/tests/arm-cpu-features.c
@@ -XXX,XX +XXX,XX @@ static void sve_tests_sve_off_kvm(const void *data)
 {
     QTestState *qts;
 
-    qts = qtest_init(MACHINE "-accel kvm -cpu max,sve=off");
+    qts = qtest_init(MACHINE "-accel kvm -cpu host,sve=off");
 
     /*
      * We don't know if this host supports SVE so we don't
@@ -XXX,XX +XXX,XX @@ static void sve_tests_sve_off_kvm(const void *data)
      * and that using sve<N>=off to explicitly disable vector
      * lengths is OK too.
      */
-    assert_sve_vls(qts, "max", 0, NULL);
-    assert_sve_vls(qts, "max", 0, "{ 'sve128': false }");
+    assert_sve_vls(qts, "host", 0, NULL);
+    assert_sve_vls(qts, "host", 0, "{ 'sve128': false }");
 
     qtest_quit(qts);
 }
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
             "We cannot guarantee the CPU type 'cortex-a15' works "
             "with KVM on this host", NULL);
 
-        assert_has_feature(qts, "max", "sve");
-        resp = do_query_no_props(qts, "max");
+        assert_has_feature(qts, "host", "sve");
+        resp = do_query_no_props(qts, "host");
         kvm_supports_sve = resp_get_feature(resp, "sve");
         vls = resp_get_sve_vls(resp);
         qobject_unref(resp);
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
             sprintf(max_name, "sve%d", max_vq * 128);
 
             /* Enabling a supported length is of course fine. */
-            assert_sve_vls(qts, "max", vls, "{ %s: true }", max_name);
+            assert_sve_vls(qts, "host", vls, "{ %s: true }", max_name);
 
             /* Get the next supported length smaller than max-vq. */
             vq = 64 - __builtin_clzll(vls & ~BIT_ULL(max_vq - 1));
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
                  * We have at least one length smaller than max-vq,
                  * so we can disable max-vq.
                  */
-                assert_sve_vls(qts, "max", (vls & ~BIT_ULL(max_vq - 1)),
+                assert_sve_vls(qts, "host", (vls & ~BIT_ULL(max_vq - 1)),
                                "{ %s: false }", max_name);
 
                 /*
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
                  */
                 sprintf(name, "sve%d", vq * 128);
                 error = g_strdup_printf("cannot disable %s", name);
-                assert_error(qts, "max", error,
+                assert_error(qts, "host", error,
                              "{ %s: true, %s: false }",
                              max_name, name);
                 g_free(error);
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
             vq = __builtin_ffsll(vls);
             sprintf(name, "sve%d", vq * 128);
             error = g_strdup_printf("cannot disable %s", name);
-            assert_error(qts, "max", error, "{ %s: false }", name);
+            assert_error(qts, "host", error, "{ %s: false }", name);
             g_free(error);
 
             /* Get an unsupported length. */
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
             if (vq <= SVE_MAX_VQ) {
                 sprintf(name, "sve%d", vq * 128);
                 error = g_strdup_printf("cannot enable %s", name);
-                assert_error(qts, "max", error, "{ %s: true }", name);
+                assert_error(qts, "host", error, "{ %s: true }", name);
                 g_free(error);
             }
         } else {
@@ -XXX,XX +XXX,XX @@ static void test_query_cpu_model_expansion_kvm(const void *data)
     } else {
         assert_has_not_feature(qts, "host", "aarch64");
         assert_has_not_feature(qts, "host", "pmu");
-
-        assert_has_not_feature(qts, "max", "sve");
+        assert_has_not_feature(qts, "host", "sve");
     }
 
     qtest_quit(qts);
diff --git a/docs/arm-cpu-features.rst b/docs/arm-cpu-features.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/arm-cpu-features.rst
+++ b/docs/arm-cpu-features.rst
@@ -XXX,XX +XXX,XX @@ SVE CPU Property Examples
 
      $ qemu-system-aarch64 -M virt -cpu max
 
-  3) Only enable the 128-bit vector length::
+  3) When KVM is enabled, implicitly enable all host CPU supported vector
+     lengths with the `host` CPU type::
+
+     $ qemu-system-aarch64 -M virt,accel=kvm -cpu host
+
+  4) Only enable the 128-bit vector length::
 
      $ qemu-system-aarch64 -M virt -cpu max,sve128=on
 
-  4) Disable the 512-bit vector length and all larger vector lengths,
+  5) Disable the 512-bit vector length and all larger vector lengths,
      since 512 is a power-of-two.  This results in all the smaller,
      uninitialized lengths (128, 256, and 384) defaulting to enabled::
 
      $ qemu-system-aarch64 -M virt -cpu max,sve512=off
 
-  5) Enable the 128-bit, 256-bit, and 512-bit vector lengths::
+  6) Enable the 128-bit, 256-bit, and 512-bit vector lengths::
 
      $ qemu-system-aarch64 -M virt -cpu max,sve128=on,sve256=on,sve512=on
 
-  6) The same as (5), but since the 128-bit and 256-bit vector
+  7) The same as (6), but since the 128-bit and 256-bit vector
      lengths are required for the 512-bit vector length to be enabled,
      then allow them to be auto-enabled::
 
      $ qemu-system-aarch64 -M virt -cpu max,sve512=on
 
-  7) Do the same as (6), but by first disabling SVE and then re-enabling it::
+  8) Do the same as (7), but by first disabling SVE and then re-enabling it::
 
      $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve512=on,sve=on
 
-  8) Force errors regarding the last vector length::
+  9) Force errors regarding the last vector length::
 
      $ qemu-system-aarch64 -M virt -cpu max,sve128=off
      $ qemu-system-aarch64 -M virt -cpu max,sve=off,sve128=off,sve=on
@@ -XXX,XX +XXX,XX @@ The examples in "SVE CPU Property Examples" exhibit many ways to select
 vector lengths which developers may find useful in order to avoid overly
 verbose command lines.  However, the recommended way to select vector
 lengths is to explicitly enable each desired length.  Therefore only
-example's (1), (3), and (5) exhibit recommended uses of the properties.
+examples (1), (4), and (6) exhibit recommended uses of the properties.
 
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

We will soon implement the SYS_timer. This timer is used by Linux
in the thermal subsystem, so once available, the subsystem will be
enabled and poll the temperature sensors. We need to provide the
minimum required to keep Linux booting.

Add a dummy thermal sensor returning ~25°C based on:
https://github.com/raspberrypi/linux/blob/rpi-5.3.y/drivers/thermal/broadcom/bcm2835_thermal.c
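
As a rough illustration of why the fixed reading comes out at "~25°C",
the conversion can be sketched in isolation. This is a minimal
standalone sketch, not the device code itself: the helper names
temp_to_adc(), stat_value() and adc_to_millicelsius() are hypothetical,
and the guest-side formula assumes the driver applies the same offset
(412) and slope (-0.538) scaled to millidegrees.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from this patch (degrees Celsius, ADC slope). */
#define THERMAL_OFFSET_C 412
#define THERMAL_COEFF    (-0.538f)

/* Bit 10 of STAT marks the sample as valid, per the FIELD() layout. */
#define STAT_VALID_BIT   (1u << 10)

/* Device side: encode a temperature as the 10-bit STAT.DATA value. */
static uint16_t temp_to_adc(int temp_c)
{
    uint16_t adc = (uint16_t)((temp_c - THERMAL_OFFSET_C) / THERMAL_COEFF);

    assert(adc < (1u << 10)); /* must fit in STAT.DATA */
    return adc;
}

/* What a read of STAT would return for a given temperature. */
static uint32_t stat_value(int temp_c)
{
    return temp_to_adc(temp_c) | STAT_VALID_BIT;
}

/*
 * Guest side (assumed): linear conversion back to millidegrees,
 * using the same offset and slope scaled by 1000.
 */
static int adc_to_millicelsius(uint16_t adc)
{
    return 412000 - 538 * (int)adc;
}
```

The ADC value is truncated to an integer, so converting it back does
not return exactly 25000 millidegrees, which is where the "~" comes
from.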

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20191019234715.25750-2-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/Makefile.objs             |   1 +
 include/hw/misc/bcm2835_thermal.h |  27 ++++++
 hw/misc/bcm2835_thermal.c         | 135 ++++++++++++++++++++++++++++++
 3 files changed, 163 insertions(+)
 create mode 100644 include/hw/misc/bcm2835_thermal.h
 create mode 100644 hw/misc/bcm2835_thermal.c

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -XXX,XX +XXX,XX @@ common-obj-$(CONFIG_OMAP) += omap_tap.o
 common-obj-$(CONFIG_RASPI) += bcm2835_mbox.o
 common-obj-$(CONFIG_RASPI) += bcm2835_property.o
 common-obj-$(CONFIG_RASPI) += bcm2835_rng.o
+common-obj-$(CONFIG_RASPI) += bcm2835_thermal.o
 common-obj-$(CONFIG_SLAVIO) += slavio_misc.o
 common-obj-$(CONFIG_ZYNQ) += zynq_slcr.o
 common-obj-$(CONFIG_ZYNQ) += zynq-xadc.o
diff --git a/include/hw/misc/bcm2835_thermal.h b/include/hw/misc/bcm2835_thermal.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/misc/bcm2835_thermal.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * BCM2835 dummy thermal sensor
+ *
+ * Copyright (C) 2019 Philippe Mathieu-Daudé <f4bug@amsat.org>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_MISC_BCM2835_THERMAL_H
+#define HW_MISC_BCM2835_THERMAL_H
+
+#include "hw/sysbus.h"
+
+#define TYPE_BCM2835_THERMAL "bcm2835-thermal"
+
+#define BCM2835_THERMAL(obj) \
+    OBJECT_CHECK(Bcm2835ThermalState, (obj), TYPE_BCM2835_THERMAL)
+
+typedef struct {
+    /*< private >*/
+    SysBusDevice parent_obj;
+    /*< public >*/
+    MemoryRegion iomem;
+    uint32_t ctl;
+} Bcm2835ThermalState;
+
+#endif
diff --git a/hw/misc/bcm2835_thermal.c b/hw/misc/bcm2835_thermal.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/misc/bcm2835_thermal.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * BCM2835 dummy thermal sensor
+ *
+ * Copyright (C) 2019 Philippe Mathieu-Daudé <f4bug@amsat.org>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qapi/error.h"
+#include "hw/misc/bcm2835_thermal.h"
+#include "hw/registerfields.h"
+#include "migration/vmstate.h"
+
+REG32(CTL, 0)
+FIELD(CTL, POWER_DOWN, 0, 1)
+FIELD(CTL, RESET, 1, 1)
+FIELD(CTL, BANDGAP_CTRL, 2, 3)
+FIELD(CTL, INTERRUPT_ENABLE, 5, 1)
+FIELD(CTL, DIRECT, 6, 1)
+FIELD(CTL, INTERRUPT_CLEAR, 7, 1)
+FIELD(CTL, HOLD, 8, 10)
+FIELD(CTL, RESET_DELAY, 18, 8)
+FIELD(CTL, REGULATOR_ENABLE, 26, 1)
+
+REG32(STAT, 4)
+FIELD(STAT, DATA, 0, 10)
+FIELD(STAT, VALID, 10, 1)
+FIELD(STAT, INTERRUPT, 11, 1)
+
+#define THERMAL_OFFSET_C 412
+#define THERMAL_COEFF  (-0.538f)
+
+static uint16_t bcm2835_thermal_temp2adc(int temp_C)
+{
+    return (temp_C - THERMAL_OFFSET_C) / THERMAL_COEFF;
+}
+
+static uint64_t bcm2835_thermal_read(void *opaque, hwaddr addr, unsigned size)
+{
+    Bcm2835ThermalState *s = BCM2835_THERMAL(opaque);
+    uint32_t val = 0;
+
+    switch (addr) {
+    case A_CTL:
+        val = s->ctl;
+        break;
+    case A_STAT:
+        /* Temperature is constantly 25°C. */
+        val = FIELD_DP32(bcm2835_thermal_temp2adc(25), STAT, VALID, true);
+        break;
+    default:
+        /* MemoryRegionOps are aligned, so this can not happen. */
+        g_assert_not_reached();
+    }
+    return val;
+}
+
+static void bcm2835_thermal_write(void *opaque, hwaddr addr,
+                                  uint64_t value, unsigned size)
+{
+    Bcm2835ThermalState *s = BCM2835_THERMAL(opaque);
+
+    switch (addr) {
+    case A_CTL:
+        s->ctl = value;
+        break;
+    case A_STAT:
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: write 0x%" PRIx64
+                                       " to 0x%" HWADDR_PRIx "\n",
+                       __func__, value, addr);
+        break;
+    default:
+        /* MemoryRegionOps are aligned, so this can not happen. */
+        g_assert_not_reached();
+    }
+}
+
+static const MemoryRegionOps bcm2835_thermal_ops = {
+    .read = bcm2835_thermal_read,
+    .write = bcm2835_thermal_write,
+    .impl.max_access_size = 4,
+    .valid.min_access_size = 4,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static void bcm2835_thermal_reset(DeviceState *dev)
+{
+    Bcm2835ThermalState *s = BCM2835_THERMAL(dev);
+
+    s->ctl = 0;
+}
+
+static void bcm2835_thermal_realize(DeviceState *dev, Error **errp)
+{
+    Bcm2835ThermalState *s = BCM2835_THERMAL(dev);
+
+    memory_region_init_io(&s->iomem, OBJECT(s), &bcm2835_thermal_ops,
+                          s, TYPE_BCM2835_THERMAL, 8);
+    sysbus_init_mmio(SYS_BUS_DEVICE(s), &s->iomem);
+}
+
+static const VMStateDescription bcm2835_thermal_vmstate = {
+    .name = "bcm2835_thermal",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(ctl, Bcm2835ThermalState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void bcm2835_thermal_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = bcm2835_thermal_realize;
+    dc->reset = bcm2835_thermal_reset;
+    dc->vmsd = &bcm2835_thermal_vmstate;
+}
+
+static const TypeInfo bcm2835_thermal_info = {
+    .name = TYPE_BCM2835_THERMAL,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(Bcm2835ThermalState),
+    .class_init = bcm2835_thermal_class_init,
+};
+
+static void bcm2835_thermal_register_types(void)
+{
+    type_register_static(&bcm2835_thermal_info);
+}
+
+type_init(bcm2835_thermal_register_types)
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Map the thermal sensor in the BCM2835 block.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20191019234715.25750-3-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/bcm2835_peripherals.h |  2 ++
 include/hw/arm/raspi_platform.h      |  1 +
 hw/arm/bcm2835_peripherals.c         | 13 +++++++++++++
 3 files changed, 16 insertions(+)

diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/bcm2835_peripherals.h
+++ b/include/hw/arm/bcm2835_peripherals.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/misc/bcm2835_property.h"
 #include "hw/misc/bcm2835_rng.h"
 #include "hw/misc/bcm2835_mbox.h"
+#include "hw/misc/bcm2835_thermal.h"
 #include "hw/sd/sdhci.h"
 #include "hw/sd/bcm2835_sdhost.h"
 #include "hw/gpio/bcm2835_gpio.h"
@@ -XXX,XX +XXX,XX @@ typedef struct BCM2835PeripheralState {
     SDHCIState sdhci;
     BCM2835SDHostState sdhost;
     BCM2835GpioState gpio;
+    Bcm2835ThermalState thermal;
     UnimplementedDeviceState i2s;
     UnimplementedDeviceState spi[1];
     UnimplementedDeviceState i2c[3];
diff --git a/include/hw/arm/raspi_platform.h b/include/hw/arm/raspi_platform.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/raspi_platform.h
+++ b/include/hw/arm/raspi_platform.h
@@ -XXX,XX +XXX,XX @@
 #define SPI0_OFFSET             0x204000
 #define BSC0_OFFSET             0x205000 /* BSC0 I2C/TWI */
 #define OTP_OFFSET              0x20f000
+#define THERMAL_OFFSET          0x212000
 #define BSC_SL_OFFSET           0x214000 /* SPI slave */
 #define AUX_OFFSET              0x215000 /* AUX: UART1/SPI1/SPI2 */
 #define EMMC1_OFFSET            0x300000
diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_init(Object *obj)
     object_property_add_const_link(OBJECT(&s->dma), "dma-mr",
                                    OBJECT(&s->gpu_bus_mr), &error_abort);
 
+    /* Thermal */
+    sysbus_init_child_obj(obj, "thermal", &s->thermal, sizeof(s->thermal),
+                          TYPE_BCM2835_THERMAL);
+
     /* GPIO */
     sysbus_init_child_obj(obj, "gpio", &s->gpio, sizeof(s->gpio),
                           TYPE_BCM2835_GPIO);
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
                                                   INTERRUPT_DMA0 + n));
     }
 
+    /* THERMAL */
+    object_property_set_bool(OBJECT(&s->thermal), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    memory_region_add_subregion(&s->peri_mr, THERMAL_OFFSET,
+                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->thermal), 0));
+
     /* GPIO */
     object_property_set_bool(OBJECT(&s->gpio), true, "realized", &err);
     if (err) {
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Add the 64-bit free running timer. Do not model the COMPARE register
(no IRQ generated).
This timer is used by the Linux kernel and, more recently, by U-Boot:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clocksource/bcm2835_timer.c?h=v3.7
https://github.com/u-boot/u-boot/blob/v2019.07/include/configs/rpi.h#L19

Datasheet used:
https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
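
The way the 64-bit counter is exposed through two 32-bit registers can
be sketched outside of QEMU as follows. This is a simplified sketch
under assumptions: now_us stands in for the microsecond value of the
virtual clock, and systmr_counter_read() and combine_counter() are
hypothetical helper names, not functions added by this patch.

```c
#include <stdint.h>

/* Register offsets as declared by the REG32() lines in this patch. */
#define A_COUNTER_LOW  0x04
#define A_COUNTER_HIGH 0x08

/*
 * Return the 32-bit half of a 64-bit microsecond count selected by
 * the register offset. now_us stands in for the virtual clock value.
 */
static uint32_t systmr_counter_read(uint64_t now_us, unsigned offset)
{
    /* offset - A_COUNTER_LOW is 0 or 4 bytes, i.e. a shift of 0 or 32 bits */
    unsigned shift = 8 * (offset - A_COUNTER_LOW);

    return (uint32_t)(now_us >> shift);
}

/*
 * Hypothetical guest-side helper: combine two 32-bit reads into 64 bits.
 */
static uint64_t combine_counter(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}
```

A guest that reads the two halves separately may see the low word wrap
between the reads; handling that is left to the guest and is not part
of this sketch.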

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20191019234715.25750-4-f4bug@amsat.org
[PMM: squashed in switch to using memset in reset]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/timer/Makefile.objs            |   1 +
 include/hw/timer/bcm2835_systmr.h |  33 ++++++
 hw/timer/bcm2835_systmr.c         | 163 ++++++++++++++++++++++++++++++
 hw/timer/trace-events             |   5 +
 4 files changed, 202 insertions(+)
 create mode 100644 include/hw/timer/bcm2835_systmr.h
 create mode 100644 hw/timer/bcm2835_systmr.c

diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/Makefile.objs
+++ b/hw/timer/Makefile.objs
@@ -XXX,XX +XXX,XX @@ common-obj-$(CONFIG_SUN4V_RTC) += sun4v-rtc.o
 common-obj-$(CONFIG_CMSDK_APB_TIMER) += cmsdk-apb-timer.o
 common-obj-$(CONFIG_CMSDK_APB_DUALTIMER) += cmsdk-apb-dualtimer.o
 common-obj-$(CONFIG_MSF2) += mss-timer.o
+common-obj-$(CONFIG_RASPI) += bcm2835_systmr.o
diff --git a/include/hw/timer/bcm2835_systmr.h b/include/hw/timer/bcm2835_systmr.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/timer/bcm2835_systmr.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * BCM2835 SYS timer emulation
+ *
+ * Copyright (c) 2019 Philippe Mathieu-Daudé <f4bug@amsat.org>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef BCM2835_SYSTIMER_H
+#define BCM2835_SYSTIMER_H
+
+#include "hw/sysbus.h"
+#include "hw/irq.h"
+
+#define TYPE_BCM2835_SYSTIMER "bcm2835-sys-timer"
+#define BCM2835_SYSTIMER(obj) \
+    OBJECT_CHECK(BCM2835SystemTimerState, (obj), TYPE_BCM2835_SYSTIMER)
+
+typedef struct {
+    /*< private >*/
+    SysBusDevice parent_obj;
+
+    /*< public >*/
+    MemoryRegion iomem;
+    qemu_irq irq;
+
+    struct {
+        uint32_t status;
+        uint32_t compare[4];
+    } reg;
+} BCM2835SystemTimerState;
+
+#endif
diff --git a/hw/timer/bcm2835_systmr.c b/hw/timer/bcm2835_systmr.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/timer/bcm2835_systmr.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * BCM2835 SYS timer emulation
+ *
+ * Copyright (C) 2019 Philippe Mathieu-Daudé <f4bug@amsat.org>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Datasheet: BCM2835 ARM Peripherals (C6357-M-1398)
+ * https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
+ *
+ * Only the free running 64-bit counter is implemented.
+ * The 4 COMPARE registers and the interrupt are not implemented.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/timer.h"
+#include "hw/timer/bcm2835_systmr.h"
+#include "hw/registerfields.h"
+#include "migration/vmstate.h"
+#include "trace.h"
+
+REG32(CTRL_STATUS,  0x00)
+REG32(COUNTER_LOW,  0x04)
+REG32(COUNTER_HIGH, 0x08)
+REG32(COMPARE0,     0x0c)
+REG32(COMPARE1,     0x10)
+REG32(COMPARE2,     0x14)
+REG32(COMPARE3,     0x18)
+
+static void bcm2835_systmr_update_irq(BCM2835SystemTimerState *s)
+{
+    bool enable = !!s->reg.status;
+
+    trace_bcm2835_systmr_irq(enable);
+    qemu_set_irq(s->irq, enable);
+}
+
+static void bcm2835_systmr_update_compare(BCM2835SystemTimerState *s,
+                                          unsigned timer_index)
+{
+    /* TODO for now, since neither Linux nor U-Boot uses these timers. */
+    qemu_log_mask(LOG_UNIMP, "COMPARE register %u not implemented\n",
+                  timer_index);
+}
+
+static uint64_t bcm2835_systmr_read(void *opaque, hwaddr offset,
+                                    unsigned size)
+{
+    BCM2835SystemTimerState *s = BCM2835_SYSTIMER(opaque);
+    uint64_t r = 0;
+
+    switch (offset) {
+    case A_CTRL_STATUS:
+        r = s->reg.status;
+        break;
+    case A_COMPARE0 ... A_COMPARE3:
+        r = s->reg.compare[(offset - A_COMPARE0) >> 2];
+        break;
+    case A_COUNTER_LOW:
+    case A_COUNTER_HIGH:
+        /* Free running counter at 1MHz */
+        r = qemu_clock_get_us(QEMU_CLOCK_VIRTUAL);
+        r >>= 8 * (offset - A_COUNTER_LOW);
+        r &= UINT32_MAX;
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: bad offset 0x%" HWADDR_PRIx "\n",
+                      __func__, offset);
+        break;
+    }
+    trace_bcm2835_systmr_read(offset, r);
+
+    return r;
+}
+
+static void bcm2835_systmr_write(void *opaque, hwaddr offset,
+                                 uint64_t value, unsigned size)
+{
+    BCM2835SystemTimerState *s = BCM2835_SYSTIMER(opaque);
+
+    trace_bcm2835_systmr_write(offset, value);
+    switch (offset) {
+    case A_CTRL_STATUS:
+        s->reg.status &= ~value; /* Ack */
+        bcm2835_systmr_update_irq(s);
+        break;
+    case A_COMPARE0 ... A_COMPARE3:
+        s->reg.compare[(offset - A_COMPARE0) >> 2] = value;
+        bcm2835_systmr_update_compare(s, (offset - A_COMPARE0) >> 2);
+        break;
+    case A_COUNTER_LOW:
+    case A_COUNTER_HIGH:
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: read-only ofs 0x%" HWADDR_PRIx "\n",
+                      __func__, offset);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: bad offset 0x%" HWADDR_PRIx "\n",
+                      __func__, offset);
+        break;
+    }
+}
+
+static const MemoryRegionOps bcm2835_systmr_ops = {
+    .read = bcm2835_systmr_read,
+    .write = bcm2835_systmr_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static void bcm2835_systmr_reset(DeviceState *dev)
+{
+    BCM2835SystemTimerState *s = BCM2835_SYSTIMER(dev);
+
+    memset(&s->reg, 0, sizeof(s->reg));
+}
+
+static void bcm2835_systmr_realize(DeviceState *dev, Error **errp)
+{
+    BCM2835SystemTimerState *s = BCM2835_SYSTIMER(dev);
+
+    memory_region_init_io(&s->iomem, OBJECT(dev), &bcm2835_systmr_ops,
+                          s, "bcm2835-sys-timer", 0x20);
+    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->iomem);
+    sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->irq);
+}
+
+static const VMStateDescription bcm2835_systmr_vmstate = {
+    .name = "bcm2835_sys_timer",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(reg.status, BCM2835SystemTimerState),
+        VMSTATE_UINT32_ARRAY(reg.compare, BCM2835SystemTimerState, 4),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void bcm2835_systmr_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->realize = bcm2835_systmr_realize;
+    dc->reset = bcm2835_systmr_reset;
+    dc->vmsd = &bcm2835_systmr_vmstate;
+}
+
+static const TypeInfo bcm2835_systmr_info = {
+    .name = TYPE_BCM2835_SYSTIMER,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(BCM2835SystemTimerState),
+    .class_init = bcm2835_systmr_class_init,
+};
+
+static void bcm2835_systmr_register_types(void)
+{
+    type_register_static(&bcm2835_systmr_info);
+}
+
+type_init(bcm2835_systmr_register_types);
diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -XXX,XX +XXX,XX @@ pl031_read(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
 pl031_write(uint32_t addr, uint32_t value) "addr 0x%08x value 0x%08x"
 pl031_alarm_raised(void) "alarm raised"
 pl031_set_alarm(uint32_t ticks) "alarm set for %u ticks"
+
+# bcm2835_systmr.c
+bcm2835_systmr_irq(bool enable) "timer irq state %u"
+bcm2835_systmr_read(uint64_t offset, uint64_t data) "timer read: offset 0x%" PRIx64 " data 0x%" PRIx64
+bcm2835_systmr_write(uint64_t offset, uint64_t data) "timer write: offset 0x%" PRIx64 " data 0x%" PRIx64
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Connect the recently added SYS_timer.
U-Boot no longer hangs while polling a free-running counter
stuck at 0.
This timer is also used by the Linux kernel thermal subsystem.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20191019234715.25750-5-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/bcm2835_peripherals.h |  3 ++-
 hw/arm/bcm2835_peripherals.c         | 17 ++++++++++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/hw/arm/bcm2835_peripherals.h b/include/hw/arm/bcm2835_peripherals.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/bcm2835_peripherals.h
+++ b/include/hw/arm/bcm2835_peripherals.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/sd/sdhci.h"
 #include "hw/sd/bcm2835_sdhost.h"
 #include "hw/gpio/bcm2835_gpio.h"
+#include "hw/timer/bcm2835_systmr.h"
 #include "hw/misc/unimp.h"
 
 #define TYPE_BCM2835_PERIPHERALS "bcm2835-peripherals"
@@ -XXX,XX +XXX,XX @@ typedef struct BCM2835PeripheralState {
     MemoryRegion ram_alias[4];
     qemu_irq irq, fiq;
 
-    UnimplementedDeviceState systmr;
+    BCM2835SystemTimerState systmr;
     UnimplementedDeviceState armtmr;
     UnimplementedDeviceState cprman;
     UnimplementedDeviceState a2w;
diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/bcm2835_peripherals.c
+++ b/hw/arm/bcm2835_peripherals.c
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_init(Object *obj)
     /* Interrupt Controller */
     sysbus_init_child_obj(obj, "ic", &s->ic, sizeof(s->ic), TYPE_BCM2835_IC);
 
+    /* SYS Timer */
+    sysbus_init_child_obj(obj, "systimer", &s->systmr, sizeof(s->systmr),
+                          TYPE_BCM2835_SYSTIMER);
+
     /* UART0 */
     sysbus_init_child_obj(obj, "uart0", &s->uart0, sizeof(s->uart0),
                           TYPE_PL011);
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
                 sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->ic), 0));
     sysbus_pass_irq(SYS_BUS_DEVICE(s), SYS_BUS_DEVICE(&s->ic));
 
+    /* Sys Timer */
+    object_property_set_bool(OBJECT(&s->systmr), true, "realized", &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    memory_region_add_subregion(&s->peri_mr, ST_OFFSET,
+                sysbus_mmio_get_region(SYS_BUS_DEVICE(&s->systmr), 0));
+    sysbus_connect_irq(SYS_BUS_DEVICE(&s->systmr), 0,
+        qdev_get_gpio_in_named(DEVICE(&s->ic), BCM2835_IC_ARM_IRQ,
+                               INTERRUPT_ARM_TIMER));
+
     /* UART0 */
     qdev_prop_set_chr(DEVICE(&s->uart0), "chardev", serial_hd(0));
     object_property_set_bool(OBJECT(&s->uart0), true, "realized", &err);
@@ -XXX,XX +XXX,XX @@ static void bcm2835_peripherals_realize(DeviceState *dev, Error **errp)
     }
 
     create_unimp(s, &s->armtmr, "bcm2835-sp804", ARMCTRL_TIMER0_1_OFFSET, 0x40);
-    create_unimp(s, &s->systmr, "bcm2835-systimer", ST_OFFSET, 0x20);
     create_unimp(s, &s->cprman, "bcm2835-cprman", CPRMAN_OFFSET, 0x1000);
     create_unimp(s, &s->a2w, "bcm2835-a2w", A2W_OFFSET, 0x1000);
     create_unimp(s, &s->i2s, "bcm2835-i2s", I2S_OFFSET, 0x100);
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

This file creates the BCM2836/BCM2837 blocks.
The biggest differences from the BCM2838, which we are going to add,
are the base addresses of the interrupt controller and the peripherals.
Add these addresses in the BCM283XInfo structure to make this
block more modular. Remove the MSYNC_OFFSET offset as it is
not useful and rather confusing.
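
The idea of carrying per-SoC base addresses in a table, rather than in
file-wide #defines, can be sketched on its own like this. It is a
simplified stand-in, not the actual QEMU structure: SocInfo, soc_table
and soc_lookup() are hypothetical names, hwaddr is modelled as
uint64_t, and only the two address values are taken from this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for BCM283XInfo; hwaddr is modelled as uint64_t. */
typedef struct {
    const char *name;
    uint64_t peri_base; /* peripheral base address seen by the CPU */
    uint64_t ctrl_base; /* interrupt controller and mailboxes */
} SocInfo;

static const SocInfo soc_table[] = {
    { "bcm2836", 0x3f000000, 0x40000000 },
    { "bcm2837", 0x3f000000, 0x40000000 },
};

/* Look up a SoC by name; returns NULL when the name is unknown. */
static const SocInfo *soc_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(soc_table) / sizeof(soc_table[0]); i++) {
        if (strcmp(soc_table[i].name, name) == 0) {
            return &soc_table[i];
        }
    }
    return NULL;
}
```

With this layout, adding a SoC whose peripherals live at different
addresses only needs a new table entry; the realize code keeps reading
the addresses from the entry it was given.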

Reviewed-by: Esteban Bosse <estebanbosse@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20191019234715.25750-6-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/bcm2836.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/arm/raspi_platform.h"
 #include "hw/sysbus.h"
 
-/* Peripheral base address seen by the CPU */
-#define BCM2836_PERI_BASE       0x3F000000
-
-/* "QA7" (Pi2) interrupt controller and mailboxes etc. */
-#define BCM2836_CONTROL_BASE    0x40000000
-
 struct BCM283XInfo {
     const char *name;
     const char *cpu_type;
+    hwaddr peri_base; /* Peripheral base address seen by the CPU */
+    hwaddr ctrl_base; /* Interrupt controller and mailboxes etc. */
     int clusterid;
 };
 
@@ -XXX,XX +XXX,XX @@ static const BCM283XInfo bcm283x_socs[] = {
     {
         .name = TYPE_BCM2836,
         .cpu_type = ARM_CPU_TYPE_NAME("cortex-a7"),
+        .peri_base = 0x3f000000,
+        .ctrl_base = 0x40000000,
         .clusterid = 0xf,
     },
 #ifdef TARGET_AARCH64
     {
         .name = TYPE_BCM2837,
         .cpu_type = ARM_CPU_TYPE_NAME("cortex-a53"),
+        .peri_base = 0x3f000000,
+        .ctrl_base = 0x40000000,
         .clusterid = 0x0,
     },
 #endif
@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
     }
 
     sysbus_mmio_map_overlap(SYS_BUS_DEVICE(&s->peripherals), 0,
-                            BCM2836_PERI_BASE, 1);
+                            info->peri_base, 1);
 
     /* bcm2836 interrupt controller (and mailboxes, etc.) */
     object_property_set_bool(OBJECT(&s->control), true, "realized", &err);
@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
         return;
     }
 
-    sysbus_mmio_map(SYS_BUS_DEVICE(&s->control), 0, BCM2836_CONTROL_BASE);
+    sysbus_mmio_map(SYS_BUS_DEVICE(&s->control), 0, info->ctrl_base);
 
     sysbus_connect_irq(SYS_BUS_DEVICE(&s->peripherals), 0,
         qdev_get_gpio_in_named(DEVICE(&s->control), "gpu-irq", 0));
@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
 
         /* set periphbase/CBAR value for CPU-local registers */
         object_property_set_int(OBJECT(&s->cpus[n]),
-                                BCM2836_PERI_BASE + MSYNC_OFFSET,
+                                info->peri_base,
                                 "reset-cbar", &err);
         if (err) {
             error_propagate(errp, err);
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

As we are going to add more core-specific fields, add a 'cpu'
structure and move the ARMCPU field there as 'core'.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20191019234715.25750-7-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/bcm2836.h |  4 +++-
 hw/arm/bcm2836.c         | 26 ++++++++++++++------------
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/hw/arm/bcm2836.h b/include/hw/arm/bcm2836.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/bcm2836.h
+++ b/include/hw/arm/bcm2836.h
@@ -XXX,XX +XXX,XX @@ typedef struct BCM283XState {
     char *cpu_type;
     uint32_t enabled_cpus;
 
-    ARMCPU cpus[BCM283X_NCPUS];
+    struct {
+        ARMCPU core;
+    } cpu[BCM283X_NCPUS];
     BCM2836ControlState control;
     BCM2835PeripheralState peripherals;
 } BCM283XState;
diff --git a/hw/arm/bcm2836.c b/hw/arm/bcm2836.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/bcm2836.c
+++ b/hw/arm/bcm2836.c
@@ -XXX,XX +XXX,XX @@ static void bcm2836_init(Object *obj)
     int n;
 
     for (n = 0; n < BCM283X_NCPUS; n++) {
-        object_initialize_child(obj, "cpu[*]", &s->cpus[n], sizeof(s->cpus[n]),
-                                info->cpu_type, &error_abort, NULL);
+        object_initialize_child(obj, "cpu[*]", &s->cpu[n].core,
+                                sizeof(s->cpu[n].core), info->cpu_type,
+                                &error_abort, NULL);
     }
 
     sysbus_init_child_obj(obj, "control", &s->control, sizeof(s->control),
@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
 
     for (n = 0; n < BCM283X_NCPUS; n++) {
         /* TODO: this should be converted to a property of ARM_CPU */
-        s->cpus[n].mp_affinity = (info->clusterid << 8) | n;
+        s->cpu[n].core.mp_affinity = (info->clusterid << 8) | n;
 
         /* set periphbase/CBAR value for CPU-local registers */
-        object_property_set_int(OBJECT(&s->cpus[n]),
+        object_property_set_int(OBJECT(&s->cpu[n].core),
                                 info->peri_base,
                                 "reset-cbar", &err);
         if (err) {
@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
         }
 
         /* start powered off if not enabled */
-        object_property_set_bool(OBJECT(&s->cpus[n]), n >= s->enabled_cpus,
+        object_property_set_bool(OBJECT(&s->cpu[n].core), n >= s->enabled_cpus,
                                  "start-powered-off", &err);
         if (err) {
             error_propagate(errp, err);
             return;
         }
 
-        object_property_set_bool(OBJECT(&s->cpus[n]), true, "realized", &err);
+        object_property_set_bool(OBJECT(&s->cpu[n].core), true,
+                                 "realized", &err);
         if (err) {
             error_propagate(errp, err);
             return;
@@ -XXX,XX +XXX,XX @@ static void bcm2836_realize(DeviceState *dev, Error **errp)
 
         /* Connect irq/fiq outputs from the interrupt controller. */
         qdev_connect_gpio_out_named(DEVICE(&s->control), "irq", n,
-                qdev_get_gpio_in(DEVICE(&s->cpus[n]), ARM_CPU_IRQ));
+                qdev_get_gpio_in(DEVICE(&s->cpu[n].core), ARM_CPU_IRQ));
         qdev_connect_gpio_out_named(DEVICE(&s->control), "fiq", n,
-                qdev_get_gpio_in(DEVICE(&s->cpus[n]), ARM_CPU_FIQ));
+                qdev_get_gpio_in(DEVICE(&s->cpu[n].core), ARM_CPU_FIQ));
 
         /* Connect timers from the CPU to the interrupt controller */
-        qdev_connect_gpio_out(DEVICE(&s->cpus[n]), GTIMER_PHYS,
+        qdev_connect_gpio_out(DEVICE(&s->cpu[n].core), GTIMER_PHYS,
                 qdev_get_gpio_in_named(DEVICE(&s->control), "cntpnsirq", n));
-        qdev_connect_gpio_out(DEVICE(&s->cpus[n]), GTIMER_VIRT,
+        qdev_connect_gpio_out(DEVICE(&s->cpu[n].core), GTIMER_VIRT,
                 qdev_get_gpio_in_named(DEVICE(&s->control), "cntvirq", n));
-        qdev_connect_gpio_out(DEVICE(&s->cpus[n]), GTIMER_HYP,
+        qdev_connect_gpio_out(DEVICE(&s->cpu[n].core), GTIMER_HYP,
                 qdev_get_gpio_in_named(DEVICE(&s->control), "cnthpirq", n));
-        qdev_connect_gpio_out(DEVICE(&s->cpus[n]), GTIMER_SEC,
+        qdev_connect_gpio_out(DEVICE(&s->cpu[n].core), GTIMER_SEC,
                 qdev_get_gpio_in_named(DEVICE(&s->control), "cntpsirq", n));
     }
 }
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

write_secondary_boot() is used in SMP configurations where the
CPU address space might not be the main System Bus.
The rom_add_blob_fixed_as() function allows us to specify an
address space. Use it to write each boot blob in the corresponding
CPU address space.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20191019234715.25750-11-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/raspi.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/arm/raspi.c b/hw/arm/raspi.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/raspi.c
+++ b/hw/arm/raspi.c
@@ -XXX,XX +XXX,XX @@ static void write_smpboot(ARMCPU *cpu, const struct arm_boot_info *info)
     QEMU_BUILD_BUG_ON((BOARDSETUP_ADDR & 0xf) != 0
                       || (BOARDSETUP_ADDR >> 4) >= 0x100);
 
-    rom_add_blob_fixed("raspi_smpboot", smpboot, sizeof(smpboot),
-                       info->smp_loader_start);
+    rom_add_blob_fixed_as("raspi_smpboot", smpboot, sizeof(smpboot),
+                          info->smp_loader_start,
+                          arm_boot_address_space(cpu, info));
 }
 
 static void write_smpboot64(ARMCPU *cpu, const struct arm_boot_info *info)
 {
+    AddressSpace *as = arm_boot_address_space(cpu, info);
     /* Unlike the AArch32 version we don't need to call the board setup hook.
      * The mechanism for doing the spin-table is also entirely different.
      * We must have four 64-bit fields at absolute addresses
@@ -XXX,XX +XXX,XX @@ static void write_smpboot64(ARMCPU *cpu, const struct arm_boot_info *info)
         0, 0, 0, 0
     };
 
-    rom_add_blob_fixed("raspi_smpboot", smpboot, sizeof(smpboot),
-                       info->smp_loader_start);
-    rom_add_blob_fixed("raspi_spintables", spintables, sizeof(spintables),
-                       SPINTABLE_ADDR);
+    rom_add_blob_fixed_as("raspi_smpboot", smpboot, sizeof(smpboot),
+                          info->smp_loader_start, as);
+    rom_add_blob_fixed_as("raspi_spintables", spintables, sizeof(spintables),
+                          SPINTABLE_ADDR, as);
 }
 
 static void write_board_setup(ARMCPU *cpu, const struct arm_boot_info *info)
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20191019234715.25750-15-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/highbank.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/highbank.c
+++ b/hw/arm/highbank.c
@@ -XXX,XX +XXX,XX @@ static void hb_write_secondary(ARMCPU *cpu, const struct arm_boot_info *info)
     for (n = 0; n < ARRAY_SIZE(smpboot); n++) {
         smpboot[n] = tswap32(smpboot[n]);
     }
-    rom_add_blob_fixed("smpboot", smpboot, sizeof(smpboot), SMP_BOOT_ADDR);
+    rom_add_blob_fixed_as("smpboot", smpboot, sizeof(smpboot), SMP_BOOT_ADDR,
+                          arm_boot_address_space(cpu, info));
 }
 
 static void hb_reset_secondary(ARMCPU *cpu, const struct arm_boot_info *info)
-- 
2.20.1

Small pile of bug fixes for rc1. I've included my patches to get
our docs building with Sphinx 3, just for convenience...

-- PMM

The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:

Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102

for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:

tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)

----------------------------------------------------------------
target-arm queue:
 * target/arm: Fix Neon emulation bugs on big-endian hosts
 * target/arm: fix handling of HCR.FB
 * target/arm: fix LORID_EL1 access check
 * disas/capstone: Fix monitor disassembly of >32 bytes
 * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
 * hw/arm/boot: fix SVE for EL3 direct kernel boot
 * hw/display/omap_lcdc: Fix potential NULL pointer dereference
 * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
 * target/arm: Get correct MMU index for other-security-state
 * configure: Test that gio libs from pkg-config work
 * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
 * docs: Fix building with Sphinx 3
 * tests/qtest/npcm7xx_rng-test: Disable randomness tests

----------------------------------------------------------------
AlexChen (2):
      hw/display/omap_lcdc: Fix potential NULL pointer dereference
      hw/display/exynos4210_fimd: Fix potential NULL pointer dereference

Peter Maydell (9):
      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
      target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
      disas/capstone: Fix monitor disassembly of >32 bytes
      target/arm: Get correct MMU index for other-security-state
      configure: Test that gio libs from pkg-config work
      hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
      scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
      qemu-option-trace.rst.inc: Don't use option:: markup
      tests/qtest/npcm7xx_rng-test: Disable randomness tests

Philippe Mathieu-Daudé (1):
      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)

Richard Henderson (11):
      target/arm: Introduce neon_full_reg_offset
      target/arm: Move neon_element_offset to translate.c
      target/arm: Use neon_element_offset in neon_load/store_reg
      target/arm: Use neon_element_offset in vfp_reg_offset
      target/arm: Add read/write_neon_element32
      target/arm: Expand read/write_neon_element32 to all MemOp
      target/arm: Rename neon_load_reg32 to vfp_load_reg32
      target/arm: Add read/write_neon_element64
      target/arm: Rename neon_load_reg64 to vfp_load_reg64
      target/arm: Simplify do_long_3d and do_2scalar_long
      target/arm: Improve do_prewiden_3d

Rémi Denis-Courmont (3):
      target/arm: fix handling of HCR.FB
      target/arm: fix LORID_EL1 access check
      hw/arm/boot: fix SVE for EL3 direct kernel boot

From: Richard Henderson <richard.henderson@linaro.org>

This function makes it clear that we're talking about the whole
register, and not the 32-bit piece at index 0.  This fixes a bug
when running on a big-endian host.
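
As a rough illustration of why the two offsets differ, the sketch
below models the register file as plain byte offsets. The toy_* names,
the 16-byte register stride and the host_big_endian parameter are
assumptions made for this example only; the XOR step mirrors what
neon_element_offset() does under HOST_WORDS_BIGENDIAN.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed toy layout: each Q register is two 8-byte units, 16 bytes total. */
enum { TOY_DREG_BYTES = 8, TOY_QREG_BYTES = 16 };

/* Offset of the whole 64-bit D register, independent of host endianness. */
static long toy_full_reg_offset(unsigned reg)
{
    return (long)(reg >> 1) * TOY_QREG_BYTES + (long)(reg & 1) * TOY_DREG_BYTES;
}

/*
 * Offset of a (1 << log2_size)-byte element of a D register, where
 * element 0 is the least significant end of the register.
 */
static long toy_element_offset(unsigned reg, int element, int log2_size,
                               bool host_big_endian)
{
    int element_size;
    int ofs;

    assert(log2_size >= 0 && log2_size <= 3);
    element_size = 1 << log2_size;
    ofs = element * element_size;
    if (host_big_endian && element_size < 8) {
        /* Compute as if little-endian, then flip within the 8-byte unit. */
        ofs ^= 8 - element_size;
    }
    return toy_full_reg_offset(reg) + ofs;
}
```

In this model, toy_element_offset(3, 0, 2, false) equals
toy_full_reg_offset(3), but with host_big_endian set the same element
sits 4 bytes further in, so the "piece at index 0" is no longer the
start of the register.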

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  8 ++++++
 target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
 target/arm/translate-vfp.c.inc  |  2 +-
 3 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
     unallocated_encoding(s);
 }
 
+/*
+ * Return the offset of a "full" NEON Dreg.
+ */
+static long neon_full_reg_offset(unsigned reg)
+{
+    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
         ofs ^= 8 - element_size;
     }
 #endif
-    return neon_reg_offset(reg, 0) + ofs;
+    return neon_full_reg_offset(reg) + ofs;
 }
 
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
              * We cannot write 16 bytes at once because the
              * destination is unaligned.
              */
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  8, 8, tmp);
-            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
-                             neon_reg_offset(vd, 0), 8, 8);
+            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
+                             neon_full_reg_offset(vd), 8, 8);
         } else {
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  vec_size, vec_size, tmp);
         }
         tcg_gen_addi_i32(addr, addr, 1 << size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
 static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
 {
     /* Handle a 2-reg-shift insn which can be vectorized. */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
 {
     /* FP operations in 2-reg-and-shift group */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
     TCGv_ptr fpst;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
         return true;
     }
 
-    reg_ofs = neon_reg_offset(a->vd, 0);
+    reg_ofs = neon_full_reg_offset(a->vd);
     vec_size = a->q ? 16 : 8;
     imm = asimd_imm_const(a->imm, a->cmode, a->op);
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
         return true;
     }
 
-    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
-                       neon_reg_offset(a->vn, 0),
-                       neon_reg_offset(a->vm, 0),
+    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
+                       neon_full_reg_offset(a->vn),
+                       neon_full_reg_offset(a->vm),
                        16, 16, 0, fn_gvec);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
 {
     /* Two registers and a scalar, using gvec */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
     int rm_ofs;
     int idx;
     TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
     /* a->vm is M:Vm, which encodes both register and index */
     idx = extract32(a->vm, a->size + 2, 2);
     a->vm = extract32(a->vm, 0, a->size + 2);
-    rm_ofs = neon_reg_offset(a->vm, 0);
+    rm_ofs = neon_full_reg_offset(a->vm);
 
     fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
     tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
         return true;
     }
 
-    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
+    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                          neon_element_offset(a->vm, a->index, a->size),
                          a->q ? 16 : 8, a->q ? 16 : 8);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
 static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
     }
 
     tmp = load_reg(s, a->rt);
-    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
+    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                          vec_size, vec_size, tmp);
     tcg_temp_free_i32(tmp);
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This will shortly have users outside of translate-neon.c.inc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          | 20 ++++++++++++++++++++
 target/arm/translate-neon.c.inc | 19 -------------------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
 }
 
+/*
+ * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+ * where 0 is the least significant end of the register.
+ */
+static long neon_element_offset(int reg, int element, MemOp size)
+{
+    int element_size = 1 << size;
+    int ofs = element * element_size;
+#ifdef HOST_WORDS_BIGENDIAN
+    /*
+     * Calculate the offset assuming fully little-endian,
+     * then XOR to account for the order of the 8-byte units.
+     */
+    if (element_size < 8) {
+        ofs ^= 8 - element_size;
+    }
+#endif
+    return neon_full_reg_offset(reg) + ofs;
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
 #include "decode-neon-ls.c.inc"
 #include "decode-neon-shared.c.inc"
 
-/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
- * where 0 is the least significant end of the register.
- */
-static inline long
-neon_element_offset(int reg, int element, MemOp size)
-{
-    int element_size = 1 << size;
-    int ofs = element * element_size;
-#ifdef HOST_WORDS_BIGENDIAN
-    /* Calculate the offset assuming fully little-endian,
-     * then XOR to account for the order of the 8-byte units.
-     */
-    if (element_size < 8) {
-        ofs ^= 8 - element_size;
-    }
-#endif
-    return neon_full_reg_offset(reg) + ofs;
-}
-
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
 {
     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

These are the only users of neon_reg_offset, so remove that.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-/* Return the offset of a 32-bit piece of a NEON register.
-   zero is the least significant end of the register.  */
-static inline long
-neon_reg_offset (int reg, int n)
-{
-    int sreg;
-    sreg = reg * 2 + n;
-    return vfp_reg_offset(0, sreg);
-}
-
 static TCGv_i32 neon_load_reg(int reg, int pass)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
     return tmp;
 }
 
 static void neon_store_reg(int reg, int pass, TCGv_i32 var)
 {
-    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
     tcg_temp_free_i32(var);
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This seems a bit more readable than using offsetof CPU_DoubleU.
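
A small self-contained check that the two formulations should agree
for single-precision registers. The toy_* names and the byte-offset
model of the register file are assumptions for illustration, and the
l.lower/l.upper offsets (0/4 on little-endian hosts, 4/0 on big-endian
hosts) are modelled by hand rather than taken from CPU_DoubleU.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed toy layout: each Q register is two 8-byte units, 16 bytes total. */
enum { TOY_DREG_BYTES = 8, TOY_QREG_BYTES = 16 };

/* Old formulation: locate the D register, then add l.upper or l.lower. */
static long toy_sreg_offset_old(unsigned reg, bool host_big_endian)
{
    long ofs = (long)(reg >> 2) * TOY_QREG_BYTES
             + (long)((reg >> 1) & 1) * TOY_DREG_BYTES;
    long lower = host_big_endian ? 4 : 0;
    long upper = host_big_endian ? 0 : 4;

    return ofs + ((reg & 1) ? upper : lower);
}

/* New formulation: 32-bit element (reg & 1) of D register (reg >> 1). */
static long toy_sreg_offset_new(unsigned reg, bool host_big_endian)
{
    unsigned dreg = reg >> 1;
    long ofs = (long)(reg & 1) * 4;

    if (host_big_endian) {
        ofs ^= 8 - 4;
    }
    return (long)(dreg >> 1) * TOY_QREG_BYTES
         + (long)(dreg & 1) * TOY_DREG_BYTES + ofs;
}

/* Compare both formulations for S0..S63 on both host endiannesses. */
static bool toy_sreg_offsets_agree(void)
{
    for (unsigned reg = 0; reg < 64; reg++) {
        if (toy_sreg_offset_old(reg, false) != toy_sreg_offset_new(reg, false) ||
            toy_sreg_offset_old(reg, true) != toy_sreg_offset_new(reg, true)) {
            return false;
        }
    }
    return true;
}
```

The dp case needs no such check: for MO_64 the element size is 8, so
no XOR is applied and element 0 is the start of the D register.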

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
     return neon_full_reg_offset(reg) + ofs;
 }
 
-static inline long vfp_reg_offset(bool dp, unsigned reg)
+/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
+static long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
-        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+        return neon_element_offset(reg, 0, MO_64);
     } else {
-        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
-        if (reg & 1) {
-            ofs += offsetof(CPU_DoubleU, l.upper);
-        } else {
-            ofs += offsetof(CPU_DoubleU, l.lower);
-        }
-        return ofs;
+        return neon_element_offset(reg >> 1, reg & 1, MO_32);
     }
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Model these off the aa64 read/write_vec_element functions.
Use them within translate-neon.c.inc.  The new functions do
not allocate or free temps, so this rearranges the calling
code a bit.
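
The change in temp ownership can be sketched with a toy model that
only counts allocations. Everything named toy_* here is hypothetical
and stands in for the TCG temp API; it is not QEMU code, just an
illustration of the caller-owned pattern that the loops below move to.

```c
#include <assert.h>
#include <stdint.h>

enum { TOY_ELEMENTS = 4 };

static int toy_live_temps;   /* temps currently allocated */
static int toy_total_allocs; /* allocations made so far */

typedef struct {
    uint32_t val;
    int live;
} ToyTemp;

static ToyTemp toy_temp_new(void)
{
    toy_live_temps++;
    toy_total_allocs++;
    return (ToyTemp){ 0, 1 };
}

static void toy_temp_free(ToyTemp *t)
{
    assert(t->live);
    t->live = 0;
    toy_live_temps--;
}

/* Old style: the load allocates a temp and the store frees it. */
static ToyTemp toy_load_old(const uint32_t *reg, int pass)
{
    ToyTemp t = toy_temp_new();
    t.val = reg[pass];
    return t;
}

static void toy_store_old(uint32_t *reg, int pass, ToyTemp *t)
{
    reg[pass] = t->val;
    toy_temp_free(t);
}

/* New style: the caller owns the temp; read/write never allocate or free. */
static void toy_read_element32(ToyTemp *dest, const uint32_t *reg, int ele)
{
    assert(dest->live);
    dest->val = reg[ele];
}

static void toy_write_element32(const ToyTemp *src, uint32_t *reg, int ele)
{
    assert(src->live);
    reg[ele] = src->val;
}

/* Increment every element: one allocation per pass. */
static void toy_incr_old(uint32_t *rd, const uint32_t *rm)
{
    for (int pass = 0; pass < TOY_ELEMENTS; pass++) {
        ToyTemp tmp = toy_load_old(rm, pass);
        tmp.val += 1;
        toy_store_old(rd, pass, &tmp);
    }
}

/* Increment every element: one allocation hoisted out of the loop. */
static void toy_incr_new(uint32_t *rd, const uint32_t *rm)
{
    ToyTemp tmp = toy_temp_new();

    for (int pass = 0; pass < TOY_ELEMENTS; pass++) {
        toy_read_element32(&tmp, rm, pass);
        tmp.val += 1;
        toy_write_element32(&tmp, rd, pass);
    }
    toy_temp_free(&tmp);
}
```

In this model both variants produce the same results, but the caller
in the new style has to free the temp explicitly, since the write no
longer does so on its behalf.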

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  26 ++++
 target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
 2 files changed, 183 insertions(+), 99 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_ld_i32(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_st_i32(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
      * early. Since Q is 0 there are always just two passes, so instead
      * of a complicated loop over each pass we just unroll.
      */
-    tmp = neon_load_reg(a->vn, 0);
-    tmp2 = neon_load_reg(a->vn, 1);
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    tmp3 = tcg_temp_new_i32();
+
+    read_neon_element32(tmp, a->vn, 0, MO_32);
+    read_neon_element32(tmp2, a->vn, 1, MO_32);
     fn(tmp, tmp, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    tmp3 = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    read_neon_element32(tmp3, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     fn(tmp3, tmp3, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    neon_store_reg(a->vd, 0, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * 2-reg-and-shift operations, size < 3 case, where the
      * helper needs to be passed cpu_env.
      */
-    TCGv_i32 constimm;
+    TCGv_i32 constimm, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * by immediate using the variable shift operations.
      */
     constimm = tcg_const_i32(dup_const(a->size, a->shift));
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(constimm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i64(-a->shift);
     rm1 = tcg_temp_new_i64();
     rm2 = tcg_temp_new_i64();
+    rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
     neon_load_reg64(rm1, a->vm);
     neon_load_reg64(rm2, a->vm + 1);
 
     shiftfn(rm1, rm1, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm1);
-    neon_store_reg(a->vd, 0, rd);
+    write_neon_element32(rd, a->vd, 0, MO_32);
 
     shiftfn(rm2, rm2, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm2);
-    neon_store_reg(a->vd, 1, rd);
+    write_neon_element32(rd, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i64(rm1);
     tcg_temp_free_i64(rm2);
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i32(imm);
 
     /* Load all inputs first to avoid potential overwrite */
-    rm1 = neon_load_reg(a->vm, 0);
-    rm2 = neon_load_reg(a->vm, 1);
-    rm3 = neon_load_reg(a->vm + 1, 0);
-    rm4 = neon_load_reg(a->vm + 1, 1);
+    rm1 = tcg_temp_new_i32();
+    rm2 = tcg_temp_new_i32();
+    rm3 = tcg_temp_new_i32();
+    rm4 = tcg_temp_new_i32();
+    read_neon_element32(rm1, a->vm, 0, MO_32);
+    read_neon_element32(rm2, a->vm, 1, MO_32);
+    read_neon_element32(rm3, a->vm, 2, MO_32);
+    read_neon_element32(rm4, a->vm, 3, MO_32);
     rtmp = tcg_temp_new_i64();
 
     shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     tcg_temp_free_i32(rm2);
 
     narrowfn(rm1, cpu_env, rtmp);
-    neon_store_reg(a->vd, 0, rm1);
+    write_neon_element32(rm1, a->vd, 0, MO_32);
+    tcg_temp_free_i32(rm1);
 
     shiftfn(rm3, rm3, constimm);
     shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
 
     narrowfn(rm3, cpu_env, rtmp);
     tcg_temp_free_i64(rtmp);
-    neon_store_reg(a->vd, 1, rm3);
+    write_neon_element32(rm3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rm3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         widen_mask = dup_const(a->size + 1, widen_mask);
     }
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn0_64, a->vn);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 0);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 0, MO_32);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn1_64, a->vn + 1);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 1);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 1, MO_32);
 
     neon_store_reg64(rn0_64, a->vd);
 
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
 
     narrowfn(rd1, rn_64);
 
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rn_64);
     tcg_temp_free_i64(rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i64();
     rd1 = tcg_temp_new_i64();
 
-    rn = neon_load_reg(a->vn, 0);
-    rm = neon_load_reg(a->vm, 0);
+    rn = tcg_temp_new_i32();
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
+    read_neon_element32(rm, a->vm, 0, MO_32);
     opfn(rd0, rn, rm);
-    tcg_temp_free_i32(rn);
-    tcg_temp_free_i32(rm);
 
-    rn = neon_load_reg(a->vn, 1);
-    rm = neon_load_reg(a->vm, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
+    read_neon_element32(rm, a->vm, 1, MO_32);
     opfn(rd1, rn, rm);
     tcg_temp_free_i32(rn);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
 
 static inline TCGv_i32 neon_get_scalar(int size, int reg)
 {
-    TCGv_i32 tmp;
-    if (size == 1) {
-        tmp = neon_load_reg(reg & 7, reg >> 4);
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    if (size == MO_16) {
+        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
         if (reg & 8) {
             gen_neon_dup_high16(tmp);
         } else {
             gen_neon_dup_low16(tmp);
         }
     } else {
-        tmp = neon_load_reg(reg & 15, reg >> 4);
+        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
     }
     return tmp;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      * perform an accumulation operation of that result into the
      * destination.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
+        read_neon_element32(tmp, a->vn, pass, MO_32);
         opfn(tmp, tmp, scalar);
         if (accfn) {
-            TCGv_i32 rd = neon_load_reg(a->vd, pass);
+            TCGv_i32 rd = tcg_temp_new_i32();
+            read_neon_element32(rd, a->vd, pass, MO_32);
             accfn(tmp, rd, tmp);
             tcg_temp_free_i32(rd);
         }
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(scalar);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      * performs a kind of fused op-then-accumulate using a helper
      * function that takes all of rd, rn and the scalar at once.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, rn, rd;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    rn = tcg_temp_new_i32();
+    rd = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 rn = neon_load_reg(a->vn, pass);
-        TCGv_i32 rd = neon_load_reg(a->vd, pass);
+        read_neon_element32(rn, a->vn, pass, MO_32);
+        read_neon_element32(rd, a->vd, pass, MO_32);
         opfn(rd, cpu_env, rn, scalar, rd);
-        tcg_temp_free_i32(rn);
-        neon_store_reg(a->vd, pass, rd);
+        write_neon_element32(rd, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i32(scalar);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     scalar = neon_get_scalar(a->size, a->vm);
 
     /* Load all inputs before writing any outputs, in case of overlap */
-    rn = neon_load_reg(a->vn, 0);
+    rn = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
     rn0_64 = tcg_temp_new_i64();
     opfn(rn0_64, rn, scalar);
-    tcg_temp_free_i32(rn);
 
-    rn = neon_load_reg(a->vn, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
     rn1_64 = tcg_temp_new_i64();
     opfn(rn1_64, rn, scalar);
     tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
         return false;
     }
     n <<= 3;
+    tmp = tcg_temp_new_i32();
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 0);
+        read_neon_element32(tmp, a->vd, 0, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp2 = neon_load_reg(a->vm, 0);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 0, MO_32);
     ptr1 = vfp_reg_ptr(true, a->vn);
     tmp4 = tcg_const_i32(n);
     gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
-    tcg_temp_free_i32(tmp);
+
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 1);
+        read_neon_element32(tmp, a->vd, 1, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp3 = neon_load_reg(a->vm, 1);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 1, MO_32);
     gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(tmp4);
     tcg_temp_free_ptr(ptr1);
-    neon_store_reg(a->vd, 0, tmp2);
-    neon_store_reg(a->vd, 1, tmp3);
-    tcg_temp_free_i32(tmp);
+
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
 static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
 {
     int pass, half;
+    TCGv_i32 tmp[2];
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
         return true;
     }
 
-    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        TCGv_i32 tmp[2];
+    tmp[0] = tcg_temp_new_i32();
+    tmp[1] = tcg_temp_new_i32();
 
+    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
         for (half = 0; half < 2; half++) {
-            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
+            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
             switch (a->size) {
             case 0:
                 tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                 g_assert_not_reached();
             }
         }
-        neon_store_reg(a->vd, pass * 2, tmp[1]);
-        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
+        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
+        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
     }
+
+    tcg_temp_free_i32(tmp[0]);
+    tcg_temp_free_i32(tmp[1]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
         rm0_64 = tcg_temp_new_i64();
         rm1_64 = tcg_temp_new_i64();
         rd_64 = tcg_temp_new_i64();
-        tmp = neon_load_reg(a->vm, pass * 2);
+
+        tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
         widenfn(rm0_64, tmp);
-        tcg_temp_free_i32(tmp);
-        tmp = neon_load_reg(a->vm, pass * 2 + 1);
+        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
         widenfn(rm1_64, tmp);
         tcg_temp_free_i32(tmp);
+
         opfn(rd_64, rm0_64, rm1_64);
         tcg_temp_free_i64(rm0_64);
         tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     narrowfn(rd0, cpu_env, rm);
     neon_load_reg64(rm, a->vm + 1);
     narrowfn(rd1, cpu_env, rm);
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
     }
 
     rd = tcg_temp_new_i64();
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
 
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
-    tmp = neon_load_reg(a->vm, 0);
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
     tcg_gen_shli_i32(tmp2, tmp2, 16);
     tcg_gen_or_i32(tmp2, tmp2, tmp);
-    tcg_temp_free_i32(tmp);
-    tmp = neon_load_reg(a->vm, 2);
+    read_neon_element32(tmp, a->vm, 2, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp3 = neon_load_reg(a->vm, 3);
-    neon_store_reg(a->vd, 0, tmp2);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 3, MO_32);
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    tcg_temp_free_i32(tmp2);
     gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
     tcg_gen_shli_i32(tmp3, tmp3, 16);
     tcg_gen_or_i32(tmp3, tmp3, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
     tmp3 = tcg_temp_new_i32();
-    tmp = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     tcg_gen_ext16u_i32(tmp3, tmp);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 0, tmp3);
+    write_neon_element32(tmp3, a->vd, 0, MO_32);
     tcg_gen_shri_i32(tmp, tmp, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
-    neon_store_reg(a->vd, 1, tmp);
-    tmp3 = tcg_temp_new_i32();
+    write_neon_element32(tmp, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp);
     tcg_gen_ext16u_i32(tmp3, tmp2);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 2, tmp3);
+    write_neon_element32(tmp3, a->vd, 2, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_gen_shri_i32(tmp2, tmp2, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
-    neon_store_reg(a->vd, 3, tmp2);
+    write_neon_element32(tmp2, a->vd, 3, MO_32);
+    tcg_temp_free_i32(tmp2);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
 
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
 
 static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
 {
+    TCGv_i32 tmp;
     int pass;
 
     /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
         return true;
     }
 
+    tmp = tcg_temp_new_i32();
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, tmp);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
         return true;
     }
 
-    if (a->size == 2) {
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    if (a->size == MO_32) {
         for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass + 1);
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass + 1, tmp);
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
         }
     } else {
         for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass);
-            if (a->size == 0) {
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass, MO_32);
+            if (a->size == MO_8) {
                 gen_neon_trn_u8(tmp, tmp2);
             } else {
                 gen_neon_trn_u16(tmp, tmp2);
             }
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass, tmp);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass, MO_32);
         }
     }
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

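Extend read_neon_element32() and write_neon_element32() to accept
MemOp values other than MO_32: sign- and zero-extending 8-bit and
16-bit loads, and 8-bit and 16-bit stores.

We can then use this to improve VMOV (scalar to gp) and
VMOV (gp to scalar) so that we simply perform the memory
operation that we wanted, rather than extracting from or
inserting into a 32-bit quantity.

These were the last uses of neon_load_reg() and neon_store_reg(),
so remove both helpers.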

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
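Illustrative note, not part of the patch: the element offset
computation that this change generalises can be sketched roughly as
below. The sk_ names are hypothetical, the constants are assumed to
mirror QEMU's MemOp encoding, and the big-endian adjustment is an
assumption about how the "0 is the least significant end" contract is
met when registers are stored as host-endian 64-bit units. The base
offset of the register within the CPU state is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants, assumed to mirror QEMU's MemOp encoding. */
enum {
    SK_MO_8 = 0,
    SK_MO_16 = 1,
    SK_MO_32 = 2,
    SK_MO_SIZE = 3,   /* mask for the log2(size) field */
    SK_MO_SIGN = 4    /* set when a load should sign-extend */
};

/*
 * Byte offset of element `element` from the start of a register,
 * where element 0 is the least significant end.
 * host_big_endian is a runtime stand-in for HOST_WORDS_BIGENDIAN.
 */
static long sk_element_offset(int element, int memop, bool host_big_endian)
{
    /* Only the size field matters here; the sign bit is masked off. */
    int element_size = 1 << (memop & SK_MO_SIZE);
    long ofs = (long)element * element_size;

    assert(element >= 0);
    if (host_big_endian && element_size < 8) {
        /* Flip the position within each 8-byte unit (assumption). */
        ofs ^= 8 - element_size;
    }
    return ofs;
}
```

Masking with the size field is what lets callers pass a signed MemOp
such as size | sign without disturbing the offset calculation.
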
 target/arm/translate.c         | 50 +++++++++++++-----------
 target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
 2 files changed, 37 insertions(+), 84 deletions(-)
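
Illustrative note, not part of the patch: the VMOV (scalar to gp)
simplification can be checked with a small host-side model. This is a
sketch only: the sk_ functions are hypothetical, and a D register is
modelled as a plain uint64_t rather than as CPU state accessed through
TCG loads. sk_vmov_old follows the removed shift-and-extend sequence;
sk_vmov_new models a direct load of the element at its own width.

```c
#include <assert.h>
#include <stdint.h>

/* Zero- or sign-extend the low `bits` bits of v (bits is 8, 16 or 32). */
static uint32_t sk_extend(uint32_t v, unsigned bits, int is_signed)
{
    uint32_t mask, sign;

    if (bits == 32) {
        return v;
    }
    mask = (1u << bits) - 1;
    sign = 1u << (bits - 1);
    v &= mask;
    return is_signed ? (v ^ sign) - sign : v;
}

/* Old shape: fetch the containing 32-bit word, then shift and extend. */
static uint32_t sk_vmov_old(uint64_t dreg, unsigned index,
                            unsigned size_log2, int is_signed)
{
    unsigned byte_ofs = index << size_log2;
    unsigned pass = (byte_ofs >> 2) & 1;      /* which 32-bit half */
    unsigned shift = (byte_ofs & 3) * 8;      /* bit offset in that half */
    uint32_t word = (uint32_t)(dreg >> (32 * pass));

    return sk_extend(word >> shift, 8u << size_log2, is_signed);
}

/* New shape: read the element directly at its own width. */
static uint32_t sk_vmov_new(uint64_t dreg, unsigned index,
                            unsigned size_log2, int is_signed)
{
    unsigned bits = 8u << size_log2;
    uint32_t elem = (uint32_t)(dreg >> (index * bits));

    return sk_extend(elem, bits, is_signed);
}
```

Both functions are expected to agree for every valid index, size and
signedness; the new shape just avoids the intermediate 32-bit word.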

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
  * where 0 is the least significant end of the register.
  */
-static long neon_element_offset(int reg, int element, MemOp size)
+static long neon_element_offset(int reg, int element, MemOp memop)
 {
-    int element_size = 1 << size;
+    int element_size = 1 << (memop & MO_SIZE);
     int ofs = element * element_size;
 #ifdef HOST_WORDS_BIGENDIAN
     /*
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static TCGv_i32 neon_load_reg(int reg, int pass)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
-    return tmp;
-}
-
-static void neon_store_reg(int reg, int pass, TCGv_i32 var)
-{
-    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
-    tcg_temp_free_i32(var);
-}
-
 static inline void neon_load_reg64(TCGv_i64 var, int reg)
 {
     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
-    case MO_32:
+    switch (memop) {
+    case MO_SB:
+        tcg_gen_ld8s_i32(dest, cpu_env, off);
+        break;
+    case MO_UB:
+        tcg_gen_ld8u_i32(dest, cpu_env, off);
+        break;
+    case MO_SW:
+        tcg_gen_ld16s_i32(dest, cpu_env, off);
+        break;
+    case MO_UW:
+        tcg_gen_ld16u_i32(dest, cpu_env, off);
+        break;
+    case MO_UL:
+    case MO_SL:
         tcg_gen_ld_i32(dest, cpu_env, off);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
     }
 }
 
-static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
+    switch (memop) {
+    case MO_8:
+        tcg_gen_st8_i32(src, cpu_env, off);
+        break;
+    case MO_16:
+        tcg_gen_st16_i32(src, cpu_env, off);
+        break;
     case MO_32:
         tcg_gen_st_i32(src, cpu_env, off);
         break;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 {
     /* VMOV scalar to general purpose register */
     TCGv_i32 tmp;
-    int pass;
-    uint32_t offset;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
-    tmp = neon_load_reg(a->vn, pass);
-    switch (a->size) {
-    case 0:
-        if (offset) {
-            tcg_gen_shri_i32(tmp, tmp, offset);
-        }
-        if (a->u) {
-            gen_uxtb(tmp);
-        } else {
-            gen_sxtb(tmp);
-        }
-        break;
-    case 1:
-        if (a->u) {
-            if (offset) {
-                tcg_gen_shri_i32(tmp, tmp, 16);
-            } else {
-                gen_uxth(tmp);
-            }
-        } else {
-            if (offset) {
-                tcg_gen_sari_i32(tmp, tmp, 16);
-            } else {
-                gen_sxth(tmp);
-            }
-        }
-        break;
-    case 2:
-        break;
-    }
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
     store_reg(s, a->rt, tmp);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
 {
     /* VMOV general purpose register to scalar */
-    TCGv_i32 tmp, tmp2;
-    int pass;
-    uint32_t offset;
+    TCGv_i32 tmp;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
     tmp = load_reg(s, a->rt);
-    switch (a->size) {
-    case 0:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 1:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 2:
-        break;
-    }
-    neon_store_reg(a->vn, pass, tmp);
+    write_neon_element32(tmp, a->vn, a->index, a->size);
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The only uses of neon_load_reg32() and neon_store_reg32() are for
loading and storing VFP single-precision values, and have nothing
to do with NEON. Rename them to vfp_load_reg32() and
vfp_store_reg32().

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |   4 +-
 target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
 2 files changed, 94 insertions(+), 94 deletions(-)
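
Illustrative note, not part of the patch: the renamed helpers address
single-precision S registers, which architecturally alias the halves
of the D registers (S(2n) is the low half of D(n), S(2n+1) the high
half). That is why trans_VMOV_64_dp passes a->vm * 2 and
a->vm * 2 + 1. A minimal host-side model, with hypothetical sk_ names
and D registers held in a plain uint64_t array rather than CPU state,
might look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Read S register `sreg` from an array of modelled D registers. */
static uint32_t sk_sreg_read(const uint64_t *dregs, unsigned sreg)
{
    uint64_t d = dregs[sreg >> 1];

    /* Odd S registers are the upper 32 bits of the D register. */
    return (uint32_t)((sreg & 1) ? (d >> 32) : d);
}

/* Write S register `sreg`, leaving the other half of the D register. */
static void sk_sreg_write(uint64_t *dregs, unsigned sreg, uint32_t val)
{
    unsigned shift = (sreg & 1) * 32;
    uint64_t mask = (uint64_t)0xffffffffu << shift;

    dregs[sreg >> 1] = (dregs[sreg >> 1] & ~mask) | ((uint64_t)val << shift);
}
```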

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
 }
 
-static inline void neon_load_reg32(TCGv_i32 var, int reg)
+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static inline void neon_store_reg32(TCGv_i32 var, int reg)
+static inline void vfp_store_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         frn = tcg_temp_new_i32();
         frm = tcg_temp_new_i32();
         dest = tcg_temp_new_i32();
-        neon_load_reg32(frn, rn);
-        neon_load_reg32(frm, rm);
+        vfp_load_reg32(frn, rn);
+        vfp_load_reg32(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         if (sz == 1) {
             tcg_gen_andi_i32(dest, dest, 0xffff);
         }
-        neon_store_reg32(dest, rd);
+        vfp_store_reg32(dest, rd);
         tcg_temp_free_i32(frn);
         tcg_temp_free_i32(frm);
         tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i32 tcg_res;
         tcg_op = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_op, rm);
+        vfp_load_reg32(tcg_op, rm);
         if (sz == 1) {
             gen_helper_rinth(tcg_res, tcg_op, fpst);
         } else {
             gen_helper_rints(tcg_res, tcg_op, fpst);
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_op);
         tcg_temp_free_i32(tcg_res);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
             gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
         }
         tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
-        neon_store_reg32(tcg_tmp, rd);
+        vfp_store_reg32(tcg_tmp, rd);
         tcg_temp_free_i32(tcg_tmp);
         tcg_temp_free_i64(tcg_res);
         tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         TCGv_i32 tcg_single, tcg_res;
         tcg_single = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_single, rm);
+        vfp_load_reg32(tcg_single, rm);
         if (sz == 1) {
             if (is_signed) {
                 gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                 gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
             }
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_res);
         tcg_temp_free_i32(tcg_single);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
         store_reg(s, a->rt, tmp);
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         if (a->rt == 15) {
             /* Set the 4 flag bits in the CPSR.  */
             gen_set_nzcv(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm);
+        vfp_load_reg32(tmp, a->vm);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm + 1);
+        vfp_load_reg32(tmp, a->vm + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm);
+        vfp_store_reg32(tmp, a->vm);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm + 1);
+        vfp_store_reg32(tmp, a->vm + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2);
+        vfp_load_reg32(tmp, a->vm * 2);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2 + 1);
+        vfp_load_reg32(tmp, a->vm * 2 + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm * 2);
+        vfp_store_reg32(tmp, a->vm * 2);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm * 2 + 1);
+        vfp_store_reg32(tmp, a->vm * 2 + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st16(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st32(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-            neon_store_reg32(tmp, a->vd + i);
+            vfp_store_reg32(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg32(tmp, a->vd + i);
+            vfp_load_reg32(tmp, a->vd + i);
             gen_aa32_st32(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg32(fd, vd);
+            vfp_load_reg32(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vn = vfp_advance_sreg(vn, delta_d);
-        neon_load_reg32(f0, vn);
+        vfp_load_reg32(f0, vn);
         if (delta_m) {
             vm = vfp_advance_sreg(vm, delta_m);
-            neon_load_reg32(f1, vm);
+            vfp_load_reg32(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR_F16);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     if (reads_vd) {
-        neon_load_reg32(fd, vd);
+        vfp_load_reg32(fd, vd);
     }
     fn(fd, f0, f1, fpst);
-    neon_store_reg32(fd, vd);
+    vfp_store_reg32(fd, vd);
 
     tcg_temp_free_i32(f0);
     tcg_temp_free_i32(f1);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i32();
     fd = tcg_temp_new_i32();
 
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_sreg(vd, delta_d);
-                neon_store_reg32(fd, vd);
+                vfp_store_reg32(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vm = vfp_advance_sreg(vm, delta_m);
-        neon_load_reg32(f0, vm);
+        vfp_load_reg32(f0, vm);
     }
 
     tcg_temp_free_i32(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     }
 
     f0 = tcg_temp_new_i32();
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
     fn(f0, f0);
-    neon_store_reg32(f0, vd);
+    vfp_store_reg32(f0, vd);
     tcg_temp_free_i32(f0);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negh(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negh(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negs(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negs(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
     }
 
     fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
-    neon_store_reg32(fd, a->vd);
+    vfp_store_reg32(fd, a->vd);
     tcg_temp_free_i32(fd);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
     fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
 
     for (;;) {
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
     /* The T bit tells us if we want the low or high 16 bits of Vm */
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
     ahp_mode = get_ahp_flag();
     tmp = tcg_temp_new_i32();
 
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
     tcg_temp_free_i32(ahp_mode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rinth(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rints(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
     neon_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
     vm = tcg_temp_new_i64();
     neon_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     if (a->s) {
         /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f16 */
         gen_helper_vfp_uitoh(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f32 */
         gen_helper_vfp_uitos(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
     vd = tcg_temp_new_i32();
     neon_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_i32(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touih(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touis(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
             gen_helper_vfp_touid(vd, vm, fpst);
         }
     }
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
     /* Insert low half of Vm into high half of Vd */
     rm = tcg_temp_new_i32();
     rd = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
-    neon_load_reg32(rd, a->vd);
+    vfp_load_reg32(rm, a->vm);
+    vfp_load_reg32(rd, a->vd);
     tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
-    neon_store_reg32(rd, a->vd);
+    vfp_store_reg32(rd, a->vd);
     tcg_temp_free_i32(rm);
     tcg_temp_free_i32(rd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
 
     /* Set Vd to high half of Vm */
     rm = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
+    vfp_load_reg32(rm, a->vm);
     tcg_gen_shri_i32(rm, rm, 16);
-    neon_store_reg32(rm, a->vd);
+    vfp_store_reg32(rm, a->vd);
     tcg_temp_free_i32(rm);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.
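
As a rough illustration of why the old `a->vd + pass` form and the new `(a->vd, pass, MO_64)` form name the same storage, here is a minimal sketch built on a toy register file. The `Toy*` types and `toy_*` helpers are hypothetical and are not QEMU code. The layout (D register n is lane `n & 1` of Q register `n >> 1`) is a simplifying assumption, and the check only covers operands that start at an even D register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified register file: 16 Q registers, two 64-bit lanes each. */
typedef struct {
    uint64_t d[2];
} ToyQReg;

typedef struct {
    ToyQReg q[16];
} ToyRegFile;

/* Old style: byte offset of a whole D register, addressed by register number. */
static size_t toy_dreg_offset(int reg)
{
    return offsetof(ToyRegFile, q)
         + (size_t)(reg >> 1) * sizeof(ToyQReg)
         + (size_t)(reg & 1) * sizeof(uint64_t);
}

/* New style: byte offset of 64-bit element 'ele' of the vector starting at 'reg'. */
static size_t toy_element64_offset(int reg, int ele)
{
    return toy_dreg_offset(reg) + (size_t)ele * sizeof(uint64_t);
}

/* Returns 1 if "reg + pass" and "(reg, pass)" agree for every even base register. */
static int toy_offsets_agree(void)
{
    for (int reg = 0; reg < 32; reg += 2) {
        for (int pass = 0; pass < 2; pass++) {
            if (toy_dreg_offset(reg + pass) != toy_element64_offset(reg, pass)) {
                return 0;
            }
        }
    }
    return 1;
}
```

In this toy model `toy_offsets_agree()` returns 1, which is the property the mechanical replacement relies on. The real offsets come from `neon_element_offset()` and `vfp_reg_offset()`, whose definitions are not shown in this patch.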

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          | 26 +++++++++
 target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
 2 files changed, 73 insertions(+), 47 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
     }
 }
 
+static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_Q:
+        tcg_gen_ld_i64(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
     long off = neon_element_offset(reg, ele, memop);
@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
     }
 }
 
+static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_64:
+        tcg_gen_st_i64(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
     for (pass = 0; pass < a->q + 1; pass++) {
         TCGv_i64 tmp = tcg_temp_new_i64();
 
-        neon_load_reg64(tmp, a->vm + pass);
+        read_neon_element64(tmp, a->vm, pass, MO_64);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg64(tmp, a->vd + pass);
+        write_neon_element64(tmp, a->vd, pass, MO_64);
         tcg_temp_free_i64(tmp);
     }
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
-    neon_load_reg64(rm1, a->vm);
-    neon_load_reg64(rm2, a->vm + 1);
+    read_neon_element64(rm1, a->vm, 0, MO_64);
+    read_neon_element64(rm2, a->vm, 1, MO_64);
 
     shiftfn(rm1, rm1, constimm);
     narrowfn(rd, cpu_env, rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd);
+    write_neon_element64(tmp, a->vd, 0, MO_64);
 
     widenfn(tmp, rm1);
     tcg_temp_free_i32(rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd + 1);
+    write_neon_element64(tmp, a->vd, 1, MO_64);
     tcg_temp_free_i64(tmp);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm_64 = tcg_temp_new_i64();
 
     if (src1_wide) {
-        neon_load_reg64(rn0_64, a->vn);
+        read_neon_element64(rn0_64, a->vn, 0, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      * avoid incorrect results if a narrow input overlaps with the result.
      */
     if (src1_wide) {
-        neon_load_reg64(rn1_64, a->vn + 1);
+        read_neon_element64(rn1_64, a->vn, 1, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm = tcg_temp_new_i32();
     read_neon_element32(rm, a->vm, 1, MO_32);
 
-    neon_store_reg64(rn0_64, a->vd);
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
-    neon_store_reg64(rn1_64, a->vd + 1);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rn_64, a->vn);
-    neon_load_reg64(rm_64, a->vm);
+    read_neon_element64(rn_64, a->vn, 0, MO_64);
+    read_neon_element64(rm_64, a->vm, 0, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
     narrowfn(rd0, rn_64);
 
-    neon_load_reg64(rn_64, a->vn + 1);
-    neon_load_reg64(rm_64, a->vm + 1);
+    read_neon_element64(rn_64, a->vn, 1, MO_64);
+    read_neon_element64(rm_64, a->vm, 1, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     /* Don't store results until after all loads: they might overlap */
     if (accfn) {
         tmp = tcg_temp_new_i64();
-        neon_load_reg64(tmp, a->vd);
+        read_neon_element64(tmp, a->vd, 0, MO_64);
         accfn(tmp, tmp, rd0);
-        neon_store_reg64(tmp, a->vd);
-        neon_load_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 0, MO_64);
+        read_neon_element64(tmp, a->vd, 1, MO_64);
         accfn(tmp, tmp, rd1);
-        neon_store_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 1, MO_64);
         tcg_temp_free_i64(tmp);
     } else {
-        neon_store_reg64(rd0, a->vd);
-        neon_store_reg64(rd1, a->vd + 1);
+        write_neon_element64(rd0, a->vd, 0, MO_64);
+        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
     tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
 
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
-        neon_load_reg64(t64, a->vd);
+        read_neon_element64(t64, a->vd, 0, MO_64);
         accfn(t64, t64, rn0_64);
-        neon_store_reg64(t64, a->vd);
-        neon_load_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 0, MO_64);
+        read_neon_element64(t64, a->vd, 1, MO_64);
         accfn(t64, t64, rn1_64);
-        neon_store_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 1, MO_64);
         tcg_temp_free_i64(t64);
     } else {
-        neon_store_reg64(rn0_64, a->vd);
-        neon_store_reg64(rn1_64, a->vd + 1);
+        write_neon_element64(rn0_64, a->vd, 0, MO_64);
+        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         right = tcg_temp_new_i64();
         dest = tcg_temp_new_i64();
 
-        neon_load_reg64(right, a->vn);
-        neon_load_reg64(left, a->vm);
+        read_neon_element64(right, a->vn, 0, MO_64);
+        read_neon_element64(left, a->vm, 0, MO_64);
         tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
-        neon_store_reg64(dest, a->vd);
+        write_neon_element64(dest, a->vd, 0, MO_64);
 
         tcg_temp_free_i64(left);
         tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         destright = tcg_temp_new_i64();
 
         if (a->imm < 8) {
-            neon_load_reg64(right, a->vn);
-            neon_load_reg64(middle, a->vn + 1);
+            read_neon_element64(right, a->vn, 0, MO_64);
+            read_neon_element64(middle, a->vn, 1, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
-            neon_load_reg64(left, a->vm);
+            read_neon_element64(left, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
         } else {
-            neon_load_reg64(right, a->vn + 1);
-            neon_load_reg64(middle, a->vm);
+            read_neon_element64(right, a->vn, 1, MO_64);
+            read_neon_element64(middle, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
-            neon_load_reg64(left, a->vm + 1);
+            read_neon_element64(left, a->vm, 1, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
         }
 
-        neon_store_reg64(destright, a->vd);
-        neon_store_reg64(destleft, a->vd + 1);
+        write_neon_element64(destright, a->vd, 0, MO_64);
+        write_neon_element64(destleft, a->vd, 1, MO_64);
 
         tcg_temp_free_i64(destright);
         tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
 
         if (accfn) {
             TCGv_i64 tmp64 = tcg_temp_new_i64();
-            neon_load_reg64(tmp64, a->vd + pass);
+            read_neon_element64(tmp64, a->vd, pass, MO_64);
             accfn(rd_64, tmp64, rd_64);
             tcg_temp_free_i64(tmp64);
         }
-        neon_store_reg64(rd_64, a->vd + pass);
+        write_neon_element64(rd_64, a->vd, pass, MO_64);
         tcg_temp_free_i64(rd_64);
     }
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rm, a->vm);
+    read_neon_element64(rm, a->vm, 0, MO_64);
     narrowfn(rd0, cpu_env, rm);
-    neon_load_reg64(rm, a->vm + 1);
+    read_neon_element64(rm, a->vm, 1, MO_64);
     narrowfn(rd1, cpu_env, rm);
     write_neon_element32(rd0, a->vd, 0, MO_32);
     write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd);
+    write_neon_element64(rd, a->vd, 0, MO_64);
     widenfn(rd, rm1);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd + 1);
+    write_neon_element64(rd, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rd);
     tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
     rm = tcg_temp_new_i64();
     rd = tcg_temp_new_i64();
     for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        neon_load_reg64(rm, a->vm + pass);
-        neon_load_reg64(rd, a->vd + pass);
-        neon_store_reg64(rm, a->vd + pass);
-        neon_store_reg64(rd, a->vm + pass);
+        read_neon_element64(rm, a->vm, pass, MO_64);
+        read_neon_element64(rd, a->vd, pass, MO_64);
+        write_neon_element64(rm, a->vd, pass, MO_64);
+        write_neon_element64(rd, a->vm, pass, MO_64);
     }
     tcg_temp_free_i64(rm);
     tcg_temp_free_i64(rd);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The only uses of these functions are for loading and storing VFP
double-precision values; they have nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |  8 ++--
 target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
 2 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static inline void neon_load_reg64(TCGv_i64 var, int reg)
+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
-static inline void neon_store_reg64(TCGv_i64 var, int reg)
+static inline void vfp_store_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
 static inline void vfp_load_reg32(TCGv_i32 var, int reg)
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         tcg_gen_ext_i32_i64(nf, cpu_NF);
         tcg_gen_ext_i32_i64(vf, cpu_VF);
 
-        neon_load_reg64(frn, rn);
-        neon_load_reg64(frm, rm);
+        vfp_load_reg64(frn, rn);
+        vfp_load_reg64(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
             tcg_temp_free_i64(tmp);
             break;
         }
-        neon_store_reg64(dest, rd);
+        vfp_store_reg64(dest, rd);
         tcg_temp_free_i64(frn);
         tcg_temp_free_i64(frm);
         tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i64 tcg_res;
         tcg_op = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
-        neon_load_reg64(tcg_op, rm);
+        vfp_load_reg64(tcg_op, rm);
         gen_helper_rintd(tcg_res, tcg_op, fpst);
-        neon_store_reg64(tcg_res, rd);
+        vfp_store_reg64(tcg_res, rd);
         tcg_temp_free_i64(tcg_op);
         tcg_temp_free_i64(tcg_res);
     } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         tcg_double = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
         tcg_tmp = tcg_temp_new_i32();
-        neon_load_reg64(tcg_double, rm);
+        vfp_load_reg64(tcg_double, rm);
         if (is_signed) {
             gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
     tmp = tcg_temp_new_i64();
     if (a->l) {
         gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-        neon_store_reg64(tmp, a->vd);
+        vfp_store_reg64(tmp, a->vd);
     } else {
-        neon_load_reg64(tmp, a->vd);
+        vfp_load_reg64(tmp, a->vd);
         gen_aa32_st64(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i64(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-            neon_store_reg64(tmp, a->vd + i);
+            vfp_store_reg64(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg64(tmp, a->vd + i);
+            vfp_load_reg64(tmp, a->vd + i);
             gen_aa32_st64(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     fd = tcg_temp_new_i64();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg64(f0, vn);
-    neon_load_reg64(f1, vm);
+    vfp_load_reg64(f0, vn);
+    vfp_load_reg64(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg64(fd, vd);
+            vfp_load_reg64(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vn = vfp_advance_dreg(vn, delta_d);
-        neon_load_reg64(f0, vn);
+        vfp_load_reg64(f0, vn);
         if (delta_m) {
             vm = vfp_advance_dreg(vm, delta_m);
-            neon_load_reg64(f1, vm);
+            vfp_load_reg64(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i64();
     fd = tcg_temp_new_i64();
 
-    neon_load_reg64(f0, vm);
+    vfp_load_reg64(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_dreg(vd, delta_d);
-                neon_store_reg64(fd, vd);
+                vfp_store_reg64(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vd = vfp_advance_dreg(vm, delta_m);
-        neon_load_reg64(f0, vm);
+        vfp_load_reg64(f0, vm);
     }
 
     tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i64();
 
-    neon_load_reg64(vn, a->vn);
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vn, a->vn);
+    vfp_load_reg64(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negd(vn, vn);
     }
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negd(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
 
     for (;;) {
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
     vd = tcg_temp_new_i64();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i64(vm, 0);
     } else {
-        neon_load_reg64(vm, a->vm);
+        vfp_load_reg64(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     vd = tcg_temp_new_i64();
     gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
     tmp = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
     tcg_temp_free_i64(vm);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rintd(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd_exact(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
     vd = tcg_temp_new_i64();
     vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
 
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
         /* u32 -> f64 */
         gen_helper_vfp_uitod(vd, vm, fpst);
     }
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
 
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i64();
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i64(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

In both cases, we can sink the write-back and perform
the accumulate into the normal destination temps.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.c.inc | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     if (accfn) {
         tmp = tcg_temp_new_i64();
         read_neon_element64(tmp, a->vd, 0, MO_64);
-        accfn(tmp, tmp, rd0);
-        write_neon_element64(tmp, a->vd, 0, MO_64);
+        accfn(rd0, tmp, rd0);
         read_neon_element64(tmp, a->vd, 1, MO_64);
-        accfn(tmp, tmp, rd1);
-        write_neon_element64(tmp, a->vd, 1, MO_64);
+        accfn(rd1, tmp, rd1);
         tcg_temp_free_i64(tmp);
-    } else {
-        write_neon_element64(rd0, a->vd, 0, MO_64);
-        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
+    write_neon_element64(rd0, a->vd, 0, MO_64);
+    write_neon_element64(rd1, a->vd, 1, MO_64);
     tcg_temp_free_i64(rd0);
     tcg_temp_free_i64(rd1);
 
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
         read_neon_element64(t64, a->vd, 0, MO_64);
-        accfn(t64, t64, rn0_64);
-        write_neon_element64(t64, a->vd, 0, MO_64);
+        accfn(rn0_64, t64, rn0_64);
         read_neon_element64(t64, a->vd, 1, MO_64);
-        accfn(t64, t64, rn1_64);
-        write_neon_element64(t64, a->vd, 1, MO_64);
+        accfn(rn1_64, t64, rn1_64);
         tcg_temp_free_i64(t64);
-    } else {
-        write_neon_element64(rn0_64, a->vd, 0, MO_64);
-        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
+
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
     return true;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We can use proper widening loads to extend 32-bit inputs,
and skip the "widenfn" step.
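
As a rough illustration (not code from this series), the two widening
behaviours that the MO_SL and MO_UL loads select between can be
modelled in plain C. The function names are hypothetical, and the
signed case assumes the usual two's complement conversion:

```c
#include <stdint.h>

/* Hypothetical model of a sign-extending 32 -> 64 bit load (MO_SL). */
static uint64_t widen_s32(uint32_t raw)
{
    /* Reinterpret as signed, then let the conversion replicate bit 31. */
    return (uint64_t)(int64_t)(int32_t)raw;
}

/* Hypothetical model of a zero-extending 32 -> 64 bit load (MO_UL). */
static uint64_t widen_u32(uint32_t raw)
{
    /* The upper 32 bits are simply zero. */
    return (uint64_t)raw;
}
```

The two only differ for inputs with bit 31 set, which is why the
SIGN argument has to be carried through to the load.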

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  6 +++
 target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
 2 files changed, 43 insertions(+), 29 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
     long off = neon_element_offset(reg, ele, memop);
 
     switch (memop) {
+    case MO_SL:
+        tcg_gen_ld32s_i64(dest, cpu_env, off);
+        break;
+    case MO_UL:
+        tcg_gen_ld32u_i64(dest, cpu_env, off);
+        break;
     case MO_Q:
         tcg_gen_ld_i64(dest, cpu_env, off);
         break;
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
 static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
                            NeonGenWidenFn *widenfn,
                            NeonGenTwo64OpFn *opfn,
-                           bool src1_wide)
+                           int src1_mop, int src2_mop)
 {
     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
     TCGv_i64 rn0_64, rn1_64, rm_64;
-    TCGv_i32 rm;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
         return false;
     }
 
-    if (!widenfn || !opfn) {
+    if (!opfn) {
         /* size == 3 case, which is an entirely different insn group */
         return false;
     }
 
-    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rn1_64 = tcg_temp_new_i64();
     rm_64 = tcg_temp_new_i64();
 
-    if (src1_wide) {
-        read_neon_element64(rn0_64, a->vn, 0, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 0, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 0, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 0, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn0_64, rn0_64, rm_64);
 
     /*
      * Load second pass inputs before storing the first pass result, to
      * avoid incorrect results if a narrow input overlaps with the result.
      */
-    if (src1_wide) {
-        read_neon_element64(rn1_64, a->vn, 1, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 1, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 1, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 1, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
     write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
     write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     return true;
 }
 
-#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
     static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
     {                                                                   \
         static NeonGenWidenFn * const widenfn[] = {                     \
             gen_helper_neon_widen_##S##8,                               \
             gen_helper_neon_widen_##S##16,                              \
-            tcg_gen_##EXT##_i32_i64,                                    \
-            NULL,                                                       \
+            NULL, NULL,                                                 \
         };                                                              \
         static NeonGenTwo64OpFn * const addfn[] = {                     \
             gen_helper_neon_##OP##l_u16,                                \
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
             tcg_gen_##OP##_i64,                                         \
             NULL,                                                       \
         };                                                              \
-        return do_prewiden_3d(s, a, widenfn[a->size],                   \
-                              addfn[a->size], SRC1WIDE);                \
+        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
+        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
+                              SRC1WIDE ? MO_Q : narrow_mop,             \
+                              narrow_mop);                              \
     }
 
-DO_PREWIDEN(VADDL_S, s, ext, add, false)
-DO_PREWIDEN(VADDL_U, u, extu, add, false)
-DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
-DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
-DO_PREWIDEN(VADDW_S, s, ext, add, true)
-DO_PREWIDEN(VADDW_U, u, extu, add, true)
-DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
-DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
+DO_PREWIDEN(VADDL_U, u, add, false, 0)
+DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
+DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
+DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
+DO_PREWIDEN(VADDW_U, u, add, true, 0)
+DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
+DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
 
 static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
                          NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
-- 
2.20.1

In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
meant we were using the H4() address swizzler macro rather than the
H2() which is required for 2-byte data.  This had no effect on
little-endian hosts but meant we put the result data into the
destination Dreg in the wrong order on big-endian hosts.
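
A minimal sketch of why the swizzle width matters, assuming the
big-endian-host definitions H2(x) = x ^ 3 and H4(x) = x ^ 1. The
H2_BE/H4_BE macros and the helper functions are illustrative
stand-ins, not QEMU code; the byte buffer models how a big-endian
host lays out a 64-bit Dreg, so the sketch behaves the same on any
host:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the big-endian-host swizzle macros (assumed values). */
#define H2_BE(x) ((x) ^ 3)
#define H4_BE(x) ((x) ^ 1)

/* Lay out a 64-bit register value the way a big-endian host stores it. */
static void store_be64(uint8_t buf[8], uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Read host array element idx of a uint16_t[4] view of that memory. */
static uint16_t load_be16_elem(const uint8_t buf[8], int idx)
{
    assert(idx >= 0 && idx < 4);
    return (uint16_t)((buf[2 * idx] << 8) | buf[2 * idx + 1]);
}

/* Architectural lane 0 is the least significant 16 bits of the Dreg. */
static uint16_t lane16_h2(const uint8_t buf[8], int lane)
{
    return load_be16_elem(buf, H2_BE(lane));   /* correct for 2-byte data */
}

static uint16_t lane16_h4(const uint8_t buf[8], int lane)
{
    return load_be16_elem(buf, H4_BE(lane));   /* the cut-and-paste error */
}
```

With the register holding 0x4444333322221111, lane16_h2() returns
lanes in architectural order, while lane16_h4() picks a different
element for every lane.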

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
---
 target/arm/vec_helper.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
         r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
                                                                         \
-        d[H4(0)] = r0;                                                  \
-        d[H4(1)] = r1;                                                  \
-        d[H4(2)] = r2;                                                  \
-        d[H4(3)] = r3;                                                  \
+        d[H2(0)] = r0;                                                  \
+        d[H2(1)] = r1;                                                  \
+        d[H2(2)] = r2;                                                  \
+        d[H2(3)] = r3;                                                  \
     }
 
 DO_NEON_PAIRWISE(neon_padd, add)
-- 
2.20.1

The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index selects one group of four 8-bit values,
so 32 bits per indexed entity, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
---
 target/arm/vec_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     int8_t *n = vn;
-    int8_t *m_indexed = (int8_t *)vm + index * 4;
+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     uint8_t *n = vn;
-    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

HCR_EL2.FB should be applied when NS is set, not when it is cleared.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
 /*
  * Non-IS variants of TLB operations are upgraded to
- * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
+ * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
  * force broadcast of these operations.
  */
 static bool tlb_force_broadcast(CPUARMState *env)
 {
-    return (env->cp15.hcr_el2 & HCR_FB) &&
-        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
+    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
 }
 
 static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
future HCR_EL2.TLOR when S-EL2 is enabled.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
 #endif
 
 /* Shared logic between LORID and the rest of the LOR* registers.
- * Secure state has already been delt with.
+ * Secure state exclusion has already been dealt with.
  */
-static CPAccessResult access_lor_ns(CPUARMState *env)
+static CPAccessResult access_lor_ns(CPUARMState *env,
+                                    const ARMCPRegInfo *ri, bool isread)
 {
     int el = arm_current_el(env);
 
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
     return CP_ACCESS_OK;
 }
 
-static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
-                                   bool isread)
-{
-    if (arm_is_secure_below_el3(env)) {
-        /* Access ok in secure mode.  */
-        return CP_ACCESS_OK;
-    }
-    return access_lor_ns(env);
-}
-
 static CPAccessResult access_lor_other(CPUARMState *env,
                                        const ARMCPRegInfo *ri, bool isread)
 {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
         /* Access denied in secure mode.  */
         return CP_ACCESS_TRAP;
     }
-    return access_lor_ns(env);
+    return access_lor_ns(env, ri, isread);
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
-      .access = PL1_R, .accessfn = access_lorid,
+      .access = PL1_R, .accessfn = access_lor_ns,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     REGINFO_SENTINEL
 };
-- 
2.20.1

If we're using the capstone disassembler, disassembly of a run of
instructions more than 32 bytes long disassembles the wrong data for
instructions beyond the 32 byte mark:

(qemu) xp /16x 0x100
0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
(qemu) xp /16i 0x100
0x00000100: 00000005 andeq r0, r0, r5
0x00000104: 54410001 strbpl r0, [r1], #-1
0x00000108: 00000001 andeq r0, r0, r1
0x0000010c: 00001000 andeq r1, r0, r0
0x00000110: 00000000 andeq r0, r0, r0
0x00000114: 00000004 andeq r0, r0, r4
0x00000118: 54410002 strbpl r0, [r1], #-2
0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x00000120: 54410001 strbpl r0, [r1], #-1
0x00000124: 00000001 andeq r0, r0, r1
0x00000128: 00001000 andeq r1, r0, r0
0x0000012c: 00000000 andeq r0, r0, r0
0x00000130: 00000004 andeq r0, r0, r4
0x00000134: 54410002 strbpl r0, [r1], #-2
0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x0000013c: 00000000 andeq r0, r0, r0

Here the disassembly of 0x120..0x13f is using the data that is in
0x104..0x123.

This is caused by passing the wrong value to the read_memory_func().
The intention is that at this point in the loop the 'cap_buf' buffer
already contains 'csize' bytes of data for the instruction at guest
addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
time through the loop 'csize' happens to be zero, so the initial read
of 32 bytes into cap_buf is correct and as long as the disassembly
never needs to read more data we return the correct information.

Use the correct guest address in the call to read_memory_func().
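
The refill logic can be sketched in isolation as follows. This is a
simplified model, not the capstone glue itself: guest_mem,
read_guest(), refill() and consume() are hypothetical names, and the
"guest memory" is just an array whose byte at address i holds i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 32

/* Hypothetical guest memory: the byte at address i has the value i. */
static uint8_t guest_mem[256];

static void init_guest_mem(void)
{
    for (size_t i = 0; i < sizeof(guest_mem); i++) {
        guest_mem[i] = (uint8_t)i;
    }
}

/* Stand-in for read_memory_func(): copy len bytes from guest addr. */
static void read_guest(uint64_t addr, uint8_t *dst, size_t len)
{
    assert(addr + len <= sizeof(guest_mem));
    memcpy(dst, &guest_mem[addr], len);
}

/*
 * buf already holds csize valid bytes for guest address pc.  Top it
 * up to BUF_SIZE: the new bytes live at pc + csize, not at pc.
 */
static size_t refill(uint8_t *buf, size_t csize, uint64_t pc)
{
    size_t tsize;

    assert(csize <= BUF_SIZE);
    tsize = BUF_SIZE - csize;
    read_guest(pc + csize, buf + csize, tsize);
    return csize + tsize;
}

/* Drop n already-disassembled bytes from the front of the buffer. */
static size_t consume(uint8_t *buf, size_t csize, size_t n)
{
    assert(n <= csize);
    memmove(buf, buf + n, csize - n);
    return csize - n;
}
```

On the first call csize is 0, so pc and pc + csize coincide; the two
only diverge once a partially consumed buffer is topped up.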

Cc: qemu-stable@nongnu.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
---
 disas/capstone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/disas/capstone.c b/disas/capstone.c
index XXXXXXX..XXXXXXX 100644
--- a/disas/capstone.c
+++ b/disas/capstone.c
@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
 
         /* Make certain that we can make progress.  */
         assert(tsize != 0);
-        info->read_memory_func(pc, cap_buf + csize, tsize, info);
+        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
         csize += tsize;
 
         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):

CID 1432363 (#1 of 1): Unintentional integer overflow:

overflow_before_widen:
    Potentially overflowing expression 1 << scale with type int
    (32 bits, signed) is evaluated using 32-bit arithmetic, and
    then used in a context that expects an expression of type
    hwaddr (64 bits, unsigned).
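
A small standalone sketch of the difference, assuming a 5-bit scale
and num as in the range-invalidation command. MY_BIT_ULL and the two
function names are hypothetical; the narrow variant uses unsigned
32-bit arithmetic so that the wraparound is well defined rather than
undefined behaviour:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for BIT_ULL(): a 64-bit single-bit mask. */
#define MY_BIT_ULL(nr) (1ULL << (nr))

/* Multiply in 32 bits, then widen: the high bits are already lost. */
static uint64_t num_pages_narrow(uint32_t num, uint32_t scale)
{
    uint32_t pages;

    assert(scale < 32);
    pages = (num + 1) * (UINT32_C(1) << scale);
    return pages;
}

/* Widen first: the shift and the multiply both happen in 64 bits. */
static uint64_t num_pages_wide(uint32_t num, uint32_t scale)
{
    assert(scale < 32);
    return (num + 1) * MY_BIT_ULL(scale);
}
```

For num = 31 and scale = 31 the true result is 2^36, which the
narrow variant cannot represent.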

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20201030144617.1535064-1-philmd@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/smmuv3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bitops.h"
 #include "hw/irq.h"
 #include "hw/sysbus.h"
 #include "migration/vmstate.h"
@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
         scale = CMD_SCALE(cmd);
         num = CMD_NUM(cmd);
         ttl = CMD_TTL(cmd);
-        num_pages = (num + 1) * (1 << (scale));
+        num_pages = (num + 1) * BIT_ULL(scale);
     }
 
     if (type == SMMU_CMD_TLBI_NH_VA) {
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
that SVE will not trap to EL3.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030151541.11976-1-remi@remlab.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
                     if (cpu_isar_feature(aa64_mte, cpu)) {
                         env->cp15.scr_el3 |= SCR_ATA;
                     }
+                    if (cpu_isar_feature(aa64_sve, cpu)) {
+                        env->cp15.cptr_el[3] |= CPTR_EZ;
+                    }
                     /* AArch64 kernels never boot in secure mode */
                     assert(!info->secure_boot);
                     /* This hook is only supported for AArch32 currently:
-- 
2.20.1

From: AlexChen <alex.chen@huawei.com>

In omap_update_display(), the pointer omap_lcd is dereferenced before
it is checked for validity, which may lead to a NULL pointer dereference.
So move the assignment to surface after checking that omap_lcd is valid,
and move surface_bits_per_pixel(surface) to after the surface assignment.
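
The general shape of the fix, as a sketch with a hypothetical struct
and function (not the omap_lcdc code): declare the local without an
initializer and only compute it once the pointer has been validated.

```c
#include <stddef.h>

/* Hypothetical device state, for illustration only. */
struct panel {
    int enable;
    int width_minus_one;
};

static int panel_width_or_zero(const struct panel *p)
{
    int width;              /* no initializer that dereferences p */

    if (!p || !p->enable) {
        return 0;           /* bail out before touching p's fields */
    }

    width = p->width_minus_one + 1;
    return width;
}
```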

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: AlexChen <alex.chen@huawei.com>
Message-id: 5F9CDB8A.9000001@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/omap_lcdc.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/omap_lcdc.c
+++ b/hw/display/omap_lcdc.c
@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
 static void omap_update_display(void *opaque)
 {
     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
-    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
+    DisplaySurface *surface;
     draw_line_func draw_line;
     int size, height, first, last;
     int width, linesize, step, bpp, frame_offset;
     hwaddr frame_base;
 
-    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
-        !surface_bits_per_pixel(surface)) {
+    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
+        return;
+    }
+
+    surface = qemu_console_surface(omap_lcd->con);
+    if (!surface_bits_per_pixel(surface)) {
         return;
     }
 
-- 
2.20.1

From: AlexChen <alex.chen@huawei.com>

In exynos4210_fimd_update(), the pointer s is dereferenced before
it is checked for validity, which may lead to a NULL pointer dereference.
So move the assignment to global_width after checking that s is valid.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 5F9F8D88.9030102@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/exynos4210_fimd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/exynos4210_fimd.c
+++ b/hw/display/exynos4210_fimd.c
@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
     bool blend = false;
     uint8_t *host_fb_addr;
     bool is_dirty = false;
-    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
+    int global_width;
 
     if (!s || !s->console || !s->enabled ||
         surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
         return;
     }
+
+    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
     exynos4210_update_resolution(s);
     surface = qemu_console_surface(s->console);
 
-- 
2.20.1

In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
This is incorrect when the security state being queried is not the
current one, because arm_current_el() uses the current security state
to determine which of the banked CONTROL.nPRIV bits to look at.
The effect was that if (for instance) Secure state was in privileged
mode but Non-Secure was not then we would return the wrong MMU index.

The only places where we are using this function in a way that could
trigger this bug are for the stack loads during a v8M function-return
and for the instruction fetch of a v8M SG insn.

Fix the bug by expanding out the M-profile version of the
arm_current_el() logic inline so it can use the passed in secstate
rather than env->v7m.secure.
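
To illustrate the banking issue, here is a reduced model. The struct
and function names are hypothetical; it assumes only that CONTROL is
banked per security state (index 0 for Non-Secure, 1 for Secure) and
that bit 0 is nPRIV:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down M-profile state. */
struct m_state {
    bool handler_mode;      /* handler mode is always privileged */
    bool secure;            /* current security state */
    uint32_t control[2];    /* banked CONTROL: [0] = NS, [1] = S */
};

/* Privilege as seen by the security state being asked about. */
static bool is_priv_for_secstate(const struct m_state *st, bool secstate)
{
    assert(st != NULL);
    return st->handler_mode || !(st->control[secstate] & 1);
}

/* Privilege of the current state: what the old code effectively used. */
static bool is_priv_current(const struct m_state *st)
{
    return is_priv_for_secstate(st, st->secure);
}
```

With Secure privileged and Non-Secure unprivileged, asking about the
Non-Secure state while running in Secure state gives different
answers from the two functions.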

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
---
 target/arm/m_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
 /* Return the MMU index for a v7M CPU in the specified security state */
 ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 {
-    bool priv = arm_current_el(env) != 0;
+    bool priv = arm_v7m_is_handler_mode(env) ||
+        !(env->v7m.control[secstate] & 1);
 
     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
 }
-- 
2.20.1

On some hosts (e.g. Ubuntu Bionic) pkg-config returns a set of
libraries for gio-2.0 which don't actually work when compiling
statically. (Specifically, the returned library string includes
-lmount, but not -lblkid which -lmount depends upon, so linking
fails due to missing symbols.)

Check that the libraries work, and don't enable gio if they don't,
in the same way we do for gnutls.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
---
 configure | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index XXXXXXX..XXXXXXX 100755
--- a/configure
+++ b/configure
@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
 fi
 
 if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
-    gio=yes
     gio_cflags=$($pkg_config --cflags gio-2.0)
     gio_libs=$($pkg_config --libs gio-2.0)
     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
     if [ ! -x "$gdbus_codegen" ]; then
         gdbus_codegen=
     fi
+    # Check that the libraries actually work -- Ubuntu 18.04 ships
+    # with pkg-config --static --libs data for gio-2.0 that is missing
+    # -lblkid and will give a link error.
+    write_c_skeleton
+    if compile_prog "" "$gio_libs" ; then
+        gio=yes
+    else
+        gio=no
+    fi
 else
     gio=no
 fi
-- 
2.20.1

In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
into the GICv3CPUState struct's maintenance_irq field.  This will
only work if the board happens to have already wired up the CPU
maintenance IRQ before the GIC was realized.  Unfortunately this is
not the case for the 'virt' board, and so the value that gets copied
is NULL (since a qemu_irq is really a pointer to an IRQState struct
under the hood).  The effect is that the CPU interface code never
actually raises the maintenance interrupt line.

Instead, since the GICv3CPUState has a pointer to the CPUState, make
the dereference at the point where we want to raise the interrupt, to
avoid an implicit requirement on board code to wire things up in a
particular order.
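
The ordering hazard can be shown with a toy model. All names here
are hypothetical, and an int pointer stands in for a qemu_irq; as
with the real thing, setting a NULL line is treated as a no-op:

```c
#include <assert.h>
#include <stddef.h>

struct cpu {
    int *maint_line;            /* wired up by the board, maybe late */
};

struct cpuif {
    struct cpu *cpu;
    int *cached_line;           /* copy taken at init time */
};

/* Setting an unconnected (NULL) line silently does nothing. */
static void set_line(int *line, int level)
{
    if (line) {
        *line = level;
    }
}

static void cpuif_init(struct cpuif *cs, struct cpu *cpu)
{
    assert(cs && cpu);
    cs->cpu = cpu;
    cs->cached_line = cpu->maint_line;  /* NULL if not yet wired */
}

/* Old approach: use the copy made at init time. */
static void raise_cached(struct cpuif *cs, int level)
{
    set_line(cs->cached_line, level);
}

/* New approach: look the line up at the point of use. */
static void raise_late(struct cpuif *cs, int level)
{
    set_line(cs->cpu->maint_line, level);
}
```

If the board connects maint_line after cpuif_init() has run,
raise_cached() has no effect while raise_late() drives the line.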

Reported-by: Jose Martins <josemartins90@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
Reviewed-by: Luc Michel <luc@lmichel.fr>
---
 include/hw/intc/arm_gicv3_common.h | 1 -
 hw/intc/arm_gicv3_cpuif.c          | 5 ++---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
     qemu_irq parent_fiq;
     qemu_irq parent_virq;
     qemu_irq parent_vfiq;
-    qemu_irq maintenance_irq;
 
     /* Redistributor */
     uint32_t level;                  /* Current IRQ level */
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
     int irqlevel = 0;
     int fiqlevel = 0;
     int maintlevel = 0;
+    ARMCPU *cpu = ARM_CPU(cs->cpu);
 
     idx = hppvi_index(cs);
     trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
 
     qemu_set_irq(cs->parent_vfiq, fiqlevel);
     qemu_set_irq(cs->parent_virq, irqlevel);
-    qemu_set_irq(cs->maintenance_irq, maintlevel);
+    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
 }
 
 static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
             && cpu->gic_num_lrs) {
             int j;
 
-            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
-
             cs->num_list_regs = cpu->gic_num_lrs;
             cs->vpribits = cpu->gic_vpribits;
             cs->vprebits = cpu->gic_vprebits;
-- 
2.20.1

The kerneldoc script currently emits Sphinx markup for a macro with
arguments that uses the c:function directive. This is correct for
Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
documentation of macros with arguments and c:function is not picky
about the syntax of what it is passed. However, in Sphinx 3 the
c:macro directive was enhanced to support macros with arguments,
and c:function was made more picky about what syntax it accepted.

When kerneldoc is told that it needs to produce output for Sphinx
3 or later, make it emit c:function only for functions and c:macro
for macros with arguments. We assume that anything with a return
type is a function and anything without is a macro.
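
The selection rule can be sketched as follows. This is only an
illustration in C of the logic described above (the real change is in
the Perl kernel-doc script); pick_directive() and its parameters are
hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: choose the Sphinx directive for a function-like
 * declaration, given the Sphinx version string (e.g. "3.2.1") and the
 * declaration's return type ("" for a macro with arguments).
 */
const char *pick_directive(const char *sphinx_version,
                           const char *return_type)
{
    long major;

    assert(sphinx_version != NULL && return_type != NULL);

    /* strtol() stops at the first '.', leaving the major version. */
    major = strtol(sphinx_version, NULL, 10);

    /* Sphinx 3 and later: no return type means we treat it as a macro. */
    if (major >= 3 && strlen(return_type) == 0) {
        return ".. c:macro:: ";
    }
    /* Functions, and everything on older Sphinx, use c:function. */
    return ".. c:function:: ";
}
```

For example, pick_directive("3.2.1", "") selects c:macro, while the
same call with "1.8.5" keeps c:function.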

This fixes the Sphinx error:

/home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
If declarator-id with parameters (e.g., 'void f(int arg)'):
  Invalid C declaration: Expected identifier in nested name. [error at 25]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    -------------------------^
If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
  Error in declarator or parameters
  Invalid C declaration: Expecting "(" in parameters. [error at 39]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    ---------------------------------------^

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
---
 scripts/kernel-doc | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index XXXXXXX..XXXXXXX 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
 	output_highlight_rst($args{'purpose'});
 	$start = "\n\n**Syntax**\n\n  ``";
     } else {
-	print ".. c:function:: ";
+        if ((split(/\./, $sphinx_version))[0] >= 3) {
+            # Sphinx 3 and later distinguish macros and functions and
+            # complain if you use c:function with something that's not
+            # syntactically valid as a function declaration.
+            # We assume that anything with a return type is a function
+            # and anything without is a macro.
+            if ($args{'functiontype'} ne "") {
+                print ".. c:function:: ";
+            } else {
+                print ".. c:macro:: ";
+            }
+        } else {
+            # Older Sphinx versions don't support documenting macros that take
+            # arguments with c:macro, and don't complain about the use
+            # of c:function for this.
+            print ".. c:function:: ";
+        }
     }
     if ($args{'functiontype'} ne "") {
 	$start .= $args{'functiontype'} . " " . $args{'function'} . " (";
-- 
2.20.1

Sphinx 3.2 is pickier than earlier versions about the option:: markup,
and complains about our usage in qemu-option-trace.rst:

../../docs/qemu-option-trace.rst.inc:4:Malformed option description
  '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
  "/opt args" or "+opt args"

In this file, we're really trying to document the different parts of
the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
have already introduced with an option:: markup.  So it's not right
to use option:: here anyway.  Switch to a different markup
(definition lists) which gives about the same formatted output.

(Unlike option::, this markup doesn't produce index entries; but
at the moment we don't do anything much with indexes anyway, and
in any case I think it doesn't make much sense to have individual
index entries for the sub-parts of the --trace option.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
---
 docs/qemu-option-trace.rst.inc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
index XXXXXXX..XXXXXXX 100644
--- a/docs/qemu-option-trace.rst.inc
+++ b/docs/qemu-option-trace.rst.inc
@@ -XXX,XX +XXX,XX @@
 
 Specify tracing options.
 
-.. option:: [enable=]PATTERN
+``[enable=]PATTERN``
 
   Immediately enable events matching *PATTERN*
   (either event name or a globbing pattern).  This option is only
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
 
   Use :option:`-trace help` to print a list of names of trace points.
 
-.. option:: events=FILE
+``events=FILE``
 
   Immediately enable events listed in *FILE*.
   The file must contain one event name (as listed in the ``trace-events-all``
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
   available if QEMU has been compiled with the ``simple``, ``log`` or
   ``ftrace`` tracing backend.
 
-.. option:: file=FILE
+``file=FILE``
 
   Log output traces to *FILE*.
   This option is only available if QEMU has been compiled with
-- 
2.20.1

The randomness tests in the NPCM7xx RNG test fail intermittently
but fairly frequently. On my machine, running the test in a loop:
 while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done

will fail in less than a minute with an error like:
ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)

(Failures have been observed on all 4 of the randomness tests,
not just first_byte_runs.)

It's not clear why these tests are failing like this, but intermittent
failures make CI and merge testing awkward, so disable running them
unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
running the test suite, until we work out the cause.
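
The gating pattern, reduced to a standalone sketch (ToySuite,
toy_add_test() and toy_register() are hypothetical stand-ins for the
qtest registration calls; the environment variable name is passed in
as a parameter purely for illustration):

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_TESTS 8

/* Hypothetical miniature test registry, not the GLib/qtest API. */
typedef struct ToySuite {
    const char *names[MAX_TESTS];
    int count;
} ToySuite;

void toy_add_test(ToySuite *s, const char *name)
{
    assert(s->count < MAX_TESTS);
    s->names[s->count++] = name;
}

void toy_register(ToySuite *s, const char *flaky_env_var)
{
    /* Deterministic tests are always registered. */
    toy_add_test(s, "npcm7xx_rng/enable_disable");
    toy_add_test(s, "npcm7xx_rng/rosel");

    /* Any value, even an empty string, counts as an explicit request. */
    if (getenv(flaky_env_var)) {
        toy_add_test(s, "npcm7xx_rng/continuous/monobit");
        toy_add_test(s, "npcm7xx_rng/continuous/runs");
        toy_add_test(s, "npcm7xx_rng/first_byte/monobit");
        toy_add_test(s, "npcm7xx_rng/first_byte/runs");
    }
}
```

With the patch applied, prefixing the command above with
QEMU_TEST_FLAKY_RNG_TESTS=1 should run the randomness tests as before.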

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
---
 tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/npcm7xx_rng-test.c
+++ b/tests/qtest/npcm7xx_rng-test.c
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
 
     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
-    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
-    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
-    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
-    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    /*
+     * These tests fail intermittently; only run them on explicit
+     * request until we figure out why.
+     */
+    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
+        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
+        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    }
 
     qtest_start("-machine npcm750-evb");
     ret = g_test_run();
-- 
2.20.1