Series comparison

 [PULL 00/26] target-arm queue
-Small pile of bug fixes for rc1. I've included my patches to get
+The following changes since commit 64ada298b98a51eb2512607f6e6180cb330c47b1:
 our docs building with Sphinx 3, just for convenience...
--- PMM
+  Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into staging (2022-03-02 12:38:46 +0000)
 The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:
   Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20220302
-for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:
+for you to fetch changes up to 268c11984e67867c22f53beb3c7f8b98900d66b2:
-  tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)
+  ui/cocoa.m: Remove unnecessary NSAutoreleasePools (2022-03-02 19:27:37 +0000)
 ----------------------------------------------------------------
 target-arm queue:
- * target/arm: Fix Neon emulation bugs on big-endian hosts
+ * mps3-an547: Add missing user ahb interfaces
- * target/arm: fix handling of HCR.FB
+ * hw/arm/mps2-tz.c: Update AN547 documentation URL
- * target/arm: fix LORID_EL1 access check
+ * hw/input/tsc210x: Don't abort on bad SPI word widths
- * disas/capstone: Fix monitor disassembly of >32 bytes
+ * hw/i2c: flatten pca954x mux device
- * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+ * target/arm: Support PSCI 1.1 and SMCCC 1.0
- * hw/arm/boot: fix SVE for EL3 direct kernel boot
+ * target/arm: Fix early free of TCG temp in handle_simd_shift_fpint_conv()
- * hw/display/omap_lcdc: Fix potential NULL pointer dereference
+ * tests/qtest: add qtests for npcm7xx sdhci
- * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
+ * Implement FEAT_LVA
- * target/arm: Get correct MMU index for other-security-state
+ * Implement FEAT_LPA
- * configure: Test that gio libs from pkg-config work
+ * Implement FEAT_LPA2 (but do not enable it yet)
- * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
+ * Report KVM's actual PSCI version to guest in dtb
- * docs: Fix building with Sphinx 3
+ * ui/cocoa.m: Fix updateUIInfo threading issues
- * tests/qtest/npcm7xx_rng-test: Disable randomness tests
+ * ui/cocoa.m: Remove unnecessary NSAutoreleasePools
 ----------------------------------------------------------------
-AlexChen (2):
+Akihiko Odaki (1):
-      hw/display/omap_lcdc: Fix potential NULL pointer dereference
+      target/arm: Support PSCI 1.1 and SMCCC 1.0
       hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
-Peter Maydell (9):
+Jimmy Brisson (1):
-      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
+      mps3-an547: Add missing user ahb interfaces
       target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
       disas/capstone: Fix monitor disassembly of >32 bytes
       target/arm: Get correct MMU index for other-security-state
       configure: Test that gio libs from pkg-config work
       hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
       scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
       qemu-option-trace.rst.inc: Don't use option:: markup
       tests/qtest/npcm7xx_rng-test: Disable randomness tests
-Philippe Mathieu-Daudé (1):
+Patrick Venture (1):
-      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+      hw/i2c: flatten pca954x mux device
-Richard Henderson (11):
+Peter Maydell (5):
-      target/arm: Introduce neon_full_reg_offset
+      hw/arm/mps2-tz.c: Update AN547 documentation URL
-      target/arm: Move neon_element_offset to translate.c
+      hw/input/tsc210x: Don't abort on bad SPI word widths
-      target/arm: Use neon_element_offset in neon_load/store_reg
+      target/arm: Report KVM's actual PSCI version to guest in dtb
-      target/arm: Use neon_element_offset in vfp_reg_offset
+      ui/cocoa.m: Fix updateUIInfo threading issues
-      target/arm: Add read/write_neon_element32
+      ui/cocoa.m: Remove unnecessary NSAutoreleasePools
       target/arm: Expand read/write_neon_element32 to all MemOp
       target/arm: Rename neon_load_reg32 to vfp_load_reg32
       target/arm: Add read/write_neon_element64
       target/arm: Rename neon_load_reg64 to vfp_load_reg64
       target/arm: Simplify do_long_3d and do_2scalar_long
       target/arm: Improve do_prewiden_3d
-Rémi Denis-Courmont (3):
+Richard Henderson (16):
-      target/arm: fix handling of HCR.FB
+      hw/registerfields: Add FIELD_SEX<N> and FIELD_SDP<N>
-      target/arm: fix LORID_EL1 access check
+      target/arm: Set TCR_EL1.TSZ for user-only
-      hw/arm/boot: fix SVE for EL3 direct kernel boot
+      target/arm: Fault on invalid TCR_ELx.TxSZ
       target/arm: Move arm_pamax out of line
       target/arm: Pass outputsize down to check_s2_mmu_setup
       target/arm: Use MAKE_64BIT_MASK to compute indexmask
       target/arm: Honor TCR_ELx.{I}PS
       target/arm: Prepare DBGBVR and DBGWVR for FEAT_LVA
       target/arm: Implement FEAT_LVA
       target/arm: Implement FEAT_LPA
       target/arm: Extend arm_fi_to_lfsc to level -1
       target/arm: Introduce tlbi_aa64_get_range
       target/arm: Fix TLBIRange.base for 16k and 64k pages
       target/arm: Validate tlbi TG matches translation granule in use
       target/arm: Advertise all page sizes for -cpu max
       target/arm: Implement FEAT_LPA2
- docs/qemu-option-trace.rst.inc     |   6 +-
+Shengtan Mao (1):
- configure                          |  10 +-
+      tests/qtest: add qtests for npcm7xx sdhci
  include/hw/intc/arm_gicv3_common.h |   1 -
  disas/capstone.c                   |   2 +-
  hw/arm/boot.c                      |   3 +
  hw/arm/smmuv3.c                    |   3 +-
  hw/display/exynos4210_fimd.c       |   4 +-
  hw/display/omap_lcdc.c             |  10 +-
  hw/intc/arm_gicv3_cpuif.c          |   5 +-
  target/arm/helper.c                |  24 +-
  target/arm/m_helper.c              |   3 +-
  target/arm/translate.c             | 153 +++++++++---
  target/arm/vec_helper.c            |  12 +-
  tests/qtest/npcm7xx_rng-test.c     |  14 +-
  scripts/kernel-doc                 |  18 +-
  target/arm/translate-neon.c.inc    | 472 ++++++++++++++++++++-----------------
  target/arm/translate-vfp.c.inc     | 341 +++++++++++----------------
  17 files changed, 588 insertions(+), 493 deletions(-)
+Wentao_Liang (1):
+      target/arm: Fix early free of TCG temp in handle_simd_shift_fpint_conv()
+ docs/system/arm/emulation.rst    |   3 +
+ include/hw/registerfields.h      |  48 +++++-
+ target/arm/cpu-param.h           |   4 +-
+ target/arm/cpu.h                 |  27 ++++
+ target/arm/internals.h           |  58 ++++---
+ target/arm/kvm-consts.h          |  14 +-
+ hw/arm/boot.c                    |  11 +-
+ hw/arm/mps2-tz.c                 |   6 +-
+ hw/i2c/i2c_mux_pca954x.c         |  77 ++-------
+ hw/input/tsc210x.c               |   8 +-
+ target/arm/cpu.c                 |   8 +-
+ target/arm/cpu64.c               |   7 +-
+ target/arm/helper.c              | 332 ++++++++++++++++++++++++++++++---------
+ target/arm/hvf/hvf.c             |  27 +++-
+ target/arm/kvm64.c               |  14 +-
+ target/arm/psci.c                |  35 ++++-
+ target/arm/translate-a64.c       |   2 +-
+ tests/qtest/npcm7xx_sdhci-test.c | 215 +++++++++++++++++++++++++
+ tests/qtest/meson.build          |   1 +
+ ui/cocoa.m                       |  31 ++--
+ 20 files changed, 736 insertions(+), 192 deletions(-)
+ create mode 100644 tests/qtest/npcm7xx_sdhci-test.c

-[PULL 25/26] qemu-option-trace.rst.inc: Don't use option:: markup
+[PULL 01/26] mps3-an547: Add missing user ahb interfaces
-Sphinx 3.2 is pickier than earlier versions about the option:: markup,
+From: Jimmy Brisson <jimmy.brisson@linaro.org>
 and complains about our usage in qemu-option-trace.rst:
-../../docs/qemu-option-trace.rst.inc:4:Malformed option description
+With these interfaces missing, TFM would delegate peripherals 0, 1,
-  '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
+2, 3 and 8, and qemu would ignore the delegation of interface 8, as
-  "/opt args" or "+opt args"
+it thought interface 4 was eth & USB.
-In this file, we're really trying to document the different parts of
+This patch corrects this behavior and allows TFM to delegate the
-the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
+eth & USB peripheral to NS mode.
 have already introduced with an option:: markup.  So it's not right
 to use option:: here anyway.  Switch to a different markup
 (definition lists) which gives about the same formatted output.
-(Unlike option::, this markup doesn't produce index entries; but
+(The old QEMU behaviour was based on revision B of the AN547
-at the moment we don't do anything much with indexes anyway, and
+appnote; revision C corrects this error in the documentation,
-in any case I think it doesn't make much sense to have individual
+and this commit brings QEMU in to line with how the FPGA
-index entries for the sub-parts of the --trace option.)
+image really behaves.)
+Signed-off-by: Jimmy Brisson <jimmy.brisson@linaro.org>
+Message-id: 20220210210227.3203883-1-jimmy.brisson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+[PMM: added commit message note clarifying that the old behaviour
+was a docs issue, not because there were two different versions
+of the FPGA image]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
-Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
-Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
 ---
- docs/qemu-option-trace.rst.inc | 6 +++---
+ hw/arm/mps2-tz.c | 4 ++++
- 1 file changed, 3 insertions(+), 3 deletions(-)
+ 1 file changed, 4 insertions(+)
-diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
+diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
 index XXXXXXX..XXXXXXX 100644
---- a/docs/qemu-option-trace.rst.inc
+--- a/hw/arm/mps2-tz.c
-+++ b/docs/qemu-option-trace.rst.inc
++++ b/hw/arm/mps2-tz.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
+                 { "gpio1", make_unimp_dev, &mms->gpio[1], 0x41101000, 0x1000 },
- Specify tracing options.
+                 { "gpio2", make_unimp_dev, &mms->gpio[2], 0x41102000, 0x1000 },
+                 { "gpio3", make_unimp_dev, &mms->gpio[3], 0x41103000, 0x1000 },
--.. option:: [enable=]PATTERN
++                { /* port 4 USER AHB interface 0 */ },
-+``[enable=]PATTERN``
++                { /* port 5 USER AHB interface 1 */ },
++                { /* port 6 USER AHB interface 2 */ },
-   Immediately enable events matching *PATTERN*
++                { /* port 7 USER AHB interface 3 */ },
-   (either event name or a globbing pattern).  This option is only
+                 { "eth-usb", make_eth_usb, NULL, 0x41400000, 0x200000, { 49 } },
-@@ -XXX,XX +XXX,XX @@ Specify tracing options.
+             },
+         },
    Use :option:`-trace help` to print a list of names of trace points.
 -.. option:: events=FILE
 +``events=FILE``
    Immediately enable events listed in *FILE*.
    The file must contain one event name (as listed in the ``trace-events-all``
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
    available if QEMU has been compiled with the ``simple``, ``log`` or
    ``ftrace`` tracing backend.
 -.. option:: file=FILE
 +``file=FILE``
    Log output traces to *FILE*.
    This option is only available if QEMU has been compiled with
 --
-2.20.1
+2.25.1

-[PULL 16/26] disas/capstone: Fix monitor disassembly of >32 bytes
+[PULL 02/26] hw/arm/mps2-tz.c: Update AN547 documentation URL
-If we're using the capstone disassembler, disassembly of a run of
+The AN547 application note URL has changed: update our comment
-instructions more than 32 bytes long disassembles the wrong data for
+accordingly. (Rev B is still downloadable from the old URL,
-instructions beyond the 32 byte mark:
+but there is a new Rev C of the document now.)
-(qemu) xp /16x 0x100
-0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
-0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
-0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
-0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
-(qemu) xp /16i 0x100
-0x00000100: 00000005 andeq r0, r0, r5
-0x00000104: 54410001 strbpl r0, [r1], #-1
-0x00000108: 00000001 andeq r0, r0, r1
-0x0000010c: 00001000 andeq r1, r0, r0
-0x00000110: 00000000 andeq r0, r0, r0
-0x00000114: 00000004 andeq r0, r0, r4
-0x00000118: 54410002 strbpl r0, [r1], #-2
-0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
-0x00000120: 54410001 strbpl r0, [r1], #-1
-0x00000124: 00000001 andeq r0, r0, r1
-0x00000128: 00001000 andeq r1, r0, r0
-0x0000012c: 00000000 andeq r0, r0, r0
-0x00000130: 00000004 andeq r0, r0, r4
-0x00000134: 54410002 strbpl r0, [r1], #-2
-0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
-0x0000013c: 00000000 andeq r0, r0, r0
-Here the disassembly of 0x120..0x13f is using the data that is in
-0x104..0x123.
-This is caused by passing the wrong value to the read_memory_func().
-The intention is that at this point in the loop the 'cap_buf' buffer
-already contains 'csize' bytes of data for the instruction at guest
-addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
-extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
-time through the loop 'csize' happens to be zero, so the initial read
-of 32 bytes into cap_buf is correct and as long as the disassembly
-never needs to read more data we return the correct information.
-Use the correct guest address in the call to read_memory_func().
-Cc: qemu-stable@nongnu.org
-Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
+Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20220221094144.426191-1-peter.maydell@linaro.org
 ---
- disas/capstone.c | 2 +-
+ hw/arm/mps2-tz.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/disas/capstone.c b/disas/capstone.c
+diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
 index XXXXXXX..XXXXXXX 100644
---- a/disas/capstone.c
+--- a/hw/arm/mps2-tz.c
-+++ b/disas/capstone.c
++++ b/hw/arm/mps2-tz.c
-@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
+@@ -XXX,XX +XXX,XX @@
+  * Application Note AN524:
-         /* Make certain that we can make progress.  */
+  * https://developer.arm.com/documentation/dai0524/latest/
-         assert(tsize != 0);
+  * Application Note AN547:
--        info->read_memory_func(pc, cap_buf + csize, tsize, info);
+- * https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/DAI0547B_SSE300_PLUS_U55_FPGA_for_mps3.pdf
-+        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
++ * https://developer.arm.com/documentation/dai0547/latest/
-         csize += tsize;
+  *
+  * The AN505 defers to the Cortex-M33 processor ARMv8M IoT Kit FVP User Guide
-         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
+  * (ARM ECM0601256) for the details of some of the device layout:
 --
-2.20.1
+2.25.1
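
The read_memory_func() fix described in the disas/capstone patch above
can be illustrated with a minimal standalone C sketch. This is not QEMU
code: guest_mem, read_guest(), top_up() and fill_guest_mem() are
hypothetical stand-ins invented for the example, and read_guest() only
imitates the role of info->read_memory_func(). The point it shows is
that when the buffer already holds 'csize' bytes for guest address
'pc', the extra bytes must be fetched from 'pc + csize'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest memory; not part of QEMU. */
static uint8_t guest_mem[64];

/* Fill guest memory with distinct bytes so a misplaced read is visible. */
static void fill_guest_mem(void)
{
    for (size_t i = 0; i < sizeof(guest_mem); i++) {
        guest_mem[i] = (uint8_t)i;
    }
}

/* Hypothetical stand-in for info->read_memory_func(). */
static void read_guest(uint64_t addr, uint8_t *dst, size_t len)
{
    assert(addr + len <= sizeof(guest_mem));
    memcpy(dst, guest_mem + addr, len);
}

/*
 * 'buf' already holds 'csize' valid bytes starting at guest address
 * 'pc'. Append 'tsize' more bytes and return the new valid length.
 */
static size_t top_up(uint64_t pc, uint8_t *buf, size_t csize, size_t tsize)
{
    /* Make certain that we can make progress. */
    assert(tsize != 0);
    /* The extra bytes live at pc + csize, not at pc. */
    read_guest(pc + csize, buf + csize, tsize);
    return csize + tsize;
}
```

With the buggy form (reading from 'pc' on every call) the first top-up
is still correct because 'csize' is zero, which matches the observation
in the commit message that only data past the first 32 bytes was wrong.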

-[PULL 26/26] tests/qtest/npcm7xx_rng-test: Disable randomness tests
+[PULL 03/26] hw/input/tsc210x: Don't abort on bad SPI word widths
-The randomness tests in the NPCM7xx RNG test fail intermittently
+The tsc210x doesn't support anything other than 16-bit reads on the
-but fairly frequently. On my machine running the test in a loop:
+SPI bus, but the guest can program the SPI controller to attempt
- while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done
+them anyway. If this happens, don't abort QEMU, just log this as
 a guest error.
-will fail in less than a minute with an error like:
+This fixes our machine_arm_n8x0.py:N8x0Machine.test_n800
-ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
+acceptance test, which hits this assertion.
 assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)
-(Failures have been observed on all 4 of the randomness tests,
+The reason we hit the assertion is because the guest kernel thinks
-not just first_byte_runs.)
+there is a TSC2005 on this SPI bus address, not a TSC210x.  (The n810
 *does* have a TSC2005 at this address.) The TSC2005 supports the
 -bit accesses which the guest driver makes, and the TSC210x does
 not (that is, our TSC210x emulation is not missing support for a word
 width the hardware can handle).  It's not clear whether the problem
 here is that the guest kernel incorrectly thinks the n800 has the
 same device at this SPI bus address as the n810, or that QEMU's n810
 board model doesn't get the SPI devices right.  At this late date
 there no longer appears to be any reliable information on the web
 about the hardware behaviour, but I am inclined to think this is a
 guest kernel bug.  In any case, we prefer not to abort QEMU for
 guest-triggerable conditions, so logging the error is the right thing
 to do.
-It's not clear why these tests are failing like this, but intermittent
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/736
-failures make CI and merge testing awkward, so disable running them
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-running the test suite, until we work out the cause.
+Message-id: 20220221140750.514557-1-peter.maydell@linaro.org
 ---
  hw/input/tsc210x.c | 8 ++++++--
  1 file changed, 6 insertions(+), 2 deletions(-)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/hw/input/tsc210x.c b/hw/input/tsc210x.c
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
 Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
 ---
  tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
  1 file changed, 10 insertions(+), 4 deletions(-)
 diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/tests/qtest/npcm7xx_rng-test.c
+--- a/hw/input/tsc210x.c
-+++ b/tests/qtest/npcm7xx_rng-test.c
++++ b/hw/input/tsc210x.c
-@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
+@@ -XXX,XX +XXX,XX @@
+ #include "hw/hw.h"
-     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
+ #include "audio/audio.h"
-     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
+ #include "qemu/timer.h"
--    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
++#include "qemu/log.h"
--    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+ #include "sysemu/reset.h"
--    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
+ #include "ui/console.h"
--    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+ #include "hw/arm/omap.h"            /* For I2SCodec */
-+    /*
+@@ -XXX,XX +XXX,XX @@ uint32_t tsc210x_txrx(void *opaque, uint32_t value, int len)
-+     * These tests fail intermittently; only run them on explicit
+     TSC210xState *s = opaque;
-+     * request until we figure out why.
+     uint32_t ret = 0;
-+     */
-+    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
+-    if (len != 16)
-+        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+-        hw_error("%s: FIXME: bad SPI word width %i\n", __func__, len);
-+        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
++    if (len != 16) {
-+        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
++        qemu_log_mask(LOG_GUEST_ERROR,
-+        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
++                      "%s: bad SPI word width %i\n", __func__, len);
 +        return 0;
 +    }
-     qtest_start("-machine npcm750-evb");
+     /* TODO: sequential reads etc - how do we make sure the host doesn't
-     ret = g_test_run();
+      * unintentionally read out a conversion result from a register while
 --
-2.20.1
+2.25.1
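
The tsc210x change above replaces a fatal hw_error() with a logged
guest error and an early return. The following is a minimal standalone
sketch of that pattern, assuming nothing about QEMU's real logging
internals: log_guest_error(), guest_error_count and spi_txrx16() are
hypothetical names made up for this example, and the "echo the low 16
bits" return value is only a placeholder for real device behaviour.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counter so callers can observe that an error was logged. */
static unsigned guest_error_count;

/* Hypothetical stand-in for qemu_log_mask(LOG_GUEST_ERROR, ...). */
static void log_guest_error(const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    guest_error_count++;
}

/*
 * Hypothetical SPI transfer handler for a device that only supports
 * 16-bit words. A bad width is a guest-triggerable condition, so it is
 * logged and ignored rather than aborting the emulator.
 */
static uint32_t spi_txrx16(uint32_t value, int len)
{
    if (len != 16) {
        log_guest_error("%s: bad SPI word width %i\n", __func__, len);
        return 0;
    }
    /* Placeholder behaviour: echo the low 16 bits back. */
    return value & 0xffff;
}
```

The design choice is the one stated in the commit message: conditions a
guest can trigger should be reported, not turned into an abort.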

-[PULL 06/26] target/arm: Expand read/write_neon_element32 to all MemOp
+[PULL 04/26] hw/i2c: flatten pca954x mux device
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Patrick Venture <venture@google.com>
-We can then use this to improve VMOV (scalar to gp) and
+Previously this device created N subdevices which each owned an i2c bus.
-VMOV (gp to scalar) so that we simply perform the memory
+Now this device simply owns the N i2c busses directly.
 operation that we wanted, rather than inserting or
 extracting from a 32-bit quantity.
-These were the last uses of neon_load/store_reg, so remove them.
+Tested: Verified devices behind mux are still accessible via qmp and i2c
 from within an arm32 SoC.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Hao Wu <wuhaotsh@google.com>
-Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
+Signed-off-by: Patrick Venture <venture@google.com>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20220202164533.1283668-1-venture@google.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         | 50 +++++++++++++-----------
+ hw/i2c/i2c_mux_pca954x.c | 77 +++++++---------------------------------
- target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
+ 1 file changed, 13 insertions(+), 64 deletions(-)
  2 files changed, 37 insertions(+), 84 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/hw/i2c/i2c_mux_pca954x.c b/hw/i2c/i2c_mux_pca954x.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/hw/i2c/i2c_mux_pca954x.c
-+++ b/target/arm/translate.c
++++ b/hw/i2c/i2c_mux_pca954x.c
-@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+@@ -XXX,XX +XXX,XX @@
-  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+ #define PCA9548_CHANNEL_COUNT 8
-  * where 0 is the least significant end of the register.
+ #define PCA9546_CHANNEL_COUNT 4
-  */
--static long neon_element_offset(int reg, int element, MemOp size)
+-/*
-+static long neon_element_offset(int reg, int element, MemOp memop)
+- * struct Pca954xChannel - The i2c mux device will have N of these states
- {
+- * that own the i2c channel bus.
--    int element_size = 1 << size;
+- * @bus: The owned channel bus.
-+    int element_size = 1 << (memop & MO_SIZE);
+- * @enabled: Is this channel active?
-     int ofs = element * element_size;
+- */
- #ifdef HOST_WORDS_BIGENDIAN
+-typedef struct Pca954xChannel {
-     /*
+-    SysBusDevice parent;
-@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
+-
 -    I2CBus       *bus;
 -
 -    bool         enabled;
 -} Pca954xChannel;
 -
 -#define TYPE_PCA954X_CHANNEL "pca954x-channel"
 -#define PCA954X_CHANNEL(obj) \
 -    OBJECT_CHECK(Pca954xChannel, (obj), TYPE_PCA954X_CHANNEL)
 -
  /*
   * struct Pca954xState - The pca954x state object.
   * @control: The value written to the mux control.
@@ -XXX,XX +XXX,XX @@ typedef struct Pca954xState {
      uint8_t control;
 -    /* The channel i2c buses. */
 -    Pca954xChannel channel[PCA9548_CHANNEL_COUNT];
 +    bool enabled[PCA9548_CHANNEL_COUNT];
 +    I2CBus *bus[PCA9548_CHANNEL_COUNT];
  } Pca954xState;
  /*
@@ -XXX,XX +XXX,XX @@ static bool pca954x_match(I2CSlave *candidate, uint8_t address,
      }
      for (i = 0; i < mc->nchans; i++) {
 -        if (!mux->channel[i].enabled) {
 +        if (!mux->enabled[i]) {
              continue;
          }
 -        if (i2c_scan_bus(mux->channel[i].bus, address, broadcast,
 +        if (i2c_scan_bus(mux->bus[i], address, broadcast,
                           current_devs)) {
              if (!broadcast) {
                  return true;
@@ -XXX,XX +XXX,XX @@ static void pca954x_enable_channel(Pca954xState *s, uint8_t enable_mask)
       */
      for (i = 0; i < mc->nchans; i++) {
          if (enable_mask & (1 << i)) {
 -            s->channel[i].enabled = true;
 +            s->enabled[i] = true;
          } else {
 -            s->channel[i].enabled = false;
 +            s->enabled[i] = false;
          }
      }
  }
+@@ -XXX,XX +XXX,XX @@ I2CBus *pca954x_i2c_get_bus(I2CSlave *mux, uint8_t channel)
--static TCGv_i32 neon_load_reg(int reg, int pass)
+     Pca954xState *pca954x = PCA954X(mux);
--{
--    TCGv_i32 tmp = tcg_temp_new_i32();
+     g_assert(channel < pc->nchans);
--    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
+-    return I2C_BUS(qdev_get_child_bus(DEVICE(&pca954x->channel[channel]),
--    return tmp;
+-                                      "i2c-bus"));
 -}
 -
--static void neon_store_reg(int reg, int pass, TCGv_i32 var)
+-static void pca954x_channel_init(Object *obj)
 -{
--    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
+-    Pca954xChannel *s = PCA954X_CHANNEL(obj);
--    tcg_temp_free_i32(var);
+-    s->bus = i2c_init_bus(DEVICE(s), "i2c-bus");
 -
 -    /* Start all channels as disabled. */
 -    s->enabled = false;
 -}
 -
- static inline void neon_load_reg64(TCGv_i64 var, int reg)
+-static void pca954x_channel_class_init(ObjectClass *klass, void *data)
 -{
 -    DeviceClass *dc = DEVICE_CLASS(klass);
 -    dc->desc = "Pca954x Channel";
 +    return pca954x->bus[channel];
  }
  static void pca9546_class_init(ObjectClass *klass, void *data)
@@ -XXX,XX +XXX,XX @@ static void pca9548_class_init(ObjectClass *klass, void *data)
      s->nchans = PCA9548_CHANNEL_COUNT;
  }
 -static void pca954x_realize(DeviceState *dev, Error **errp)
 -{
 -    Pca954xState *s = PCA954X(dev);
 -    Pca954xClass *c = PCA954X_GET_CLASS(s);
 -    int i;
 -
 -    /* SMBus modules. Cannot fail. */
 -    for (i = 0; i < c->nchans; i++) {
 -        sysbus_realize(SYS_BUS_DEVICE(&s->channel[i]), &error_abort);
 -    }
 -}
 -
  static void pca954x_init(Object *obj)
  {
-     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+     Pca954xState *s = PCA954X(obj);
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
+     Pca954xClass *c = PCA954X_GET_CLASS(obj);
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+     int i;
- }
+-    /* Only initialize the children we expect. */
--static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
++    /* SMBus modules. Cannot fail. */
-+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
+     for (i = 0; i < c->nchans; i++) {
- {
+-        object_initialize_child(obj, "channel[*]", &s->channel[i],
--    long off = neon_element_offset(reg, ele, size);
+-                                TYPE_PCA954X_CHANNEL);
-+    long off = neon_element_offset(reg, ele, memop);
++        g_autofree gchar *bus_name = g_strdup_printf("i2c.%d", i);
++
--    switch (size) {
++        /* start all channels as disabled. */
--    case MO_32:
++        s->enabled[i] = false;
-+    switch (memop) {
++        s->bus[i] = i2c_init_bus(DEVICE(s), bus_name);
 +    case MO_SB:
 +        tcg_gen_ld8s_i32(dest, cpu_env, off);
 +        break;
 +    case MO_UB:
 +        tcg_gen_ld8u_i32(dest, cpu_env, off);
 +        break;
 +    case MO_SW:
 +        tcg_gen_ld16s_i32(dest, cpu_env, off);
 +        break;
 +    case MO_UW:
 +        tcg_gen_ld16u_i32(dest, cpu_env, off);
 +        break;
 +    case MO_UL:
 +    case MO_SL:
          tcg_gen_ld_i32(dest, cpu_env, off);
          break;
      default:
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
      }
  }
--static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+@@ -XXX,XX +XXX,XX @@ static void pca954x_class_init(ObjectClass *klass, void *data)
-+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
+     rc->phases.enter = pca954x_enter_reset;
- {
--    long off = neon_element_offset(reg, ele, size);
+     dc->desc = "Pca954x i2c-mux";
-+    long off = neon_element_offset(reg, ele, memop);
+-    dc->realize = pca954x_realize;
--    switch (size) {
+     k->write_data = pca954x_write_data;
-+    switch (memop) {
+     k->receive_byte = pca954x_read_byte;
-+    case MO_8:
+@@ -XXX,XX +XXX,XX @@ static const TypeInfo pca954x_info[] = {
-+        tcg_gen_st8_i32(src, cpu_env, off);
+         .parent        = TYPE_PCA954X,
-+        break;
+         .class_init    = pca9548_class_init,
-+    case MO_16:
+     },
-+        tcg_gen_st16_i32(src, cpu_env, off);
+-    {
-+        break;
+-        .name = TYPE_PCA954X_CHANNEL,
-     case MO_32:
+-        .parent = TYPE_SYS_BUS_DEVICE,
-         tcg_gen_st_i32(src, cpu_env, off);
+-        .class_init = pca954x_channel_class_init,
-         break;
+-        .instance_size = sizeof(Pca954xChannel),
-diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
+-        .instance_init = pca954x_channel_init,
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
  {
      /* VMOV scalar to general purpose register */
      TCGv_i32 tmp;
 -    int pass;
 -    uint32_t offset;
 -    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == 2
 +    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == MO_32
          ? !dc_isar_feature(aa32_fpsp_v2, s)
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
 -    tmp = neon_load_reg(a->vn, pass);
 -    switch (a->size) {
 -    case 0:
 -        if (offset) {
 -            tcg_gen_shri_i32(tmp, tmp, offset);
 -        }
 -        if (a->u) {
 -            gen_uxtb(tmp);
 -        } else {
 -            gen_sxtb(tmp);
 -        }
 -        break;
 -    case 1:
 -        if (a->u) {
 -            if (offset) {
 -                tcg_gen_shri_i32(tmp, tmp, 16);
 -            } else {
 -                gen_uxth(tmp);
 -            }
 -        } else {
 -            if (offset) {
 -                tcg_gen_sari_i32(tmp, tmp, 16);
 -            } else {
 -                gen_sxth(tmp);
 -            }
 -        }
 -        break;
 -    case 2:
 -        break;
 -    }
-+    tmp = tcg_temp_new_i32();
+ };
-+    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
-     store_reg(s, a->rt, tmp);
+ DEFINE_TYPES(pca954x_info)
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
  static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
  {
      /* VMOV general purpose register to scalar */
 -    TCGv_i32 tmp, tmp2;
 -    int pass;
 -    uint32_t offset;
 +    TCGv_i32 tmp;
 -    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
 -    if (a->size == 2
 +    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
 +    if (a->size == MO_32
          ? !dc_isar_feature(aa32_fpsp_v2, s)
          : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
          return false;
      }
 -    offset = a->index << a->size;
 -    pass = extract32(offset, 2, 1);
 -    offset = extract32(offset, 0, 2) * 8;
 -
      if (!vfp_access_check(s)) {
          return true;
      }
      tmp = load_reg(s, a->rt);
 -    switch (a->size) {
 -    case 0:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 1:
 -        tmp2 = neon_load_reg(a->vn, pass);
 -        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
 -        tcg_temp_free_i32(tmp2);
 -        break;
 -    case 2:
 -        break;
 -    }
 -    neon_store_reg(a->vn, pass, tmp);
 +    write_neon_element32(tmp, a->vn, a->index, a->size);
 +    tcg_temp_free_i32(tmp);
      return true;
  }
 --
-.20.1
+.25.1
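
The open-coded arithmetic removed from trans_VMOV_to_gp() and trans_VMOV_from_gp() above located a scalar inside a 64-bit D register by hand. A minimal standalone sketch of that calculation follows. The struct and function names are hypothetical and are not QEMU API; only the arithmetic is taken from the removed lines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: where a Neon scalar lives inside a 64-bit D register. */
struct elem_pos {
    unsigned pass;       /* which 32-bit half of the D register (0 or 1) */
    unsigned bit_offset; /* bit position of the element within that half */
};

/*
 * size is log2 of the element width in bytes:
 * 0 = 8-bit, 1 = 16-bit, 2 = 32-bit.
 */
static struct elem_pos locate_element(unsigned index, unsigned size)
{
    unsigned byte_offset = index << size;   /* byte offset within the register */
    struct elem_pos pos;

    pos.pass = (byte_offset >> 2) & 1;      /* extract32(offset, 2, 1) */
    pos.bit_offset = (byte_offset & 3) * 8; /* extract32(offset, 0, 2) * 8 */
    return pos;
}

/* Small self-check: byte element 5 sits in the upper half, at bit 8. */
void locate_element_selfcheck(void)
{
    struct elem_pos pos = locate_element(5, 0);

    assert(pos.pass == 1);
    assert(pos.bit_offset == 8);
}
```

The replacement helpers take the element index and a MemOp-style size directly, so callers no longer need this split into a pass and a bit offset.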

-[PULL 18/26] hw/arm/boot: fix SVE for EL3 direct kernel boot
+[PULL 05/26] target/arm: Support PSCI 1.1 and SMCCC 1.0
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Akihiko Odaki <akihiko.odaki@gmail.com>
-When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
+Support the latest PSCI on TCG and HVF. A 64-bit function called from
-that SVE will not trap to EL3.
+AArch32 now returns NOT_SUPPORTED, which is necessary to adhere to SMC
 Calling Convention 1.0. It is still not compliant with SMCCC 1.3 since
 they do not implement mandatory functions.
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220213035753.34577-1-akihiko.odaki@gmail.com
-Message-id: 20201030151541.11976-1-remi@remlab.net
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 [PMM: update MISMATCH_CHECK checks on PSCI_VERSION macros to match]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/boot.c | 3 +++
+ target/arm/kvm-consts.h | 13 +++++++++----
-file changed, 3 insertions(+)
+ hw/arm/boot.c           | 12 +++++++++---
  target/arm/cpu.c        |  5 +++--
  target/arm/hvf/hvf.c    | 27 ++++++++++++++++++++++++++-
  target/arm/kvm64.c      |  2 +-
  target/arm/psci.c       | 35 ++++++++++++++++++++++++++++++++---
 files changed, 80 insertions(+), 14 deletions(-)
+diff --git a/target/arm/kvm-consts.h b/target/arm/kvm-consts.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/kvm-consts.h
++++ b/target/arm/kvm-consts.h
+@@ -XXX,XX +XXX,XX @@ MISMATCH_CHECK(QEMU_PSCI_0_1_FN_MIGRATE, KVM_PSCI_FN_MIGRATE);
+ #define QEMU_PSCI_0_2_FN64_AFFINITY_INFO QEMU_PSCI_0_2_FN64(4)
+ #define QEMU_PSCI_0_2_FN64_MIGRATE QEMU_PSCI_0_2_FN64(5)
++#define QEMU_PSCI_1_0_FN_PSCI_FEATURES QEMU_PSCI_0_2_FN(10)
++
+ MISMATCH_CHECK(QEMU_PSCI_0_2_FN_CPU_SUSPEND, PSCI_0_2_FN_CPU_SUSPEND);
+ MISMATCH_CHECK(QEMU_PSCI_0_2_FN_CPU_OFF, PSCI_0_2_FN_CPU_OFF);
+ MISMATCH_CHECK(QEMU_PSCI_0_2_FN_CPU_ON, PSCI_0_2_FN_CPU_ON);
+@@ -XXX,XX +XXX,XX @@ MISMATCH_CHECK(QEMU_PSCI_0_2_FN_MIGRATE, PSCI_0_2_FN_MIGRATE);
+ MISMATCH_CHECK(QEMU_PSCI_0_2_FN64_CPU_SUSPEND, PSCI_0_2_FN64_CPU_SUSPEND);
+ MISMATCH_CHECK(QEMU_PSCI_0_2_FN64_CPU_ON, PSCI_0_2_FN64_CPU_ON);
+ MISMATCH_CHECK(QEMU_PSCI_0_2_FN64_MIGRATE, PSCI_0_2_FN64_MIGRATE);
++MISMATCH_CHECK(QEMU_PSCI_1_0_FN_PSCI_FEATURES, PSCI_1_0_FN_PSCI_FEATURES);
+ /* PSCI v0.2 return values used by TCG emulation of PSCI */
+ /* No Trusted OS migration to worry about when offlining CPUs */
+ #define QEMU_PSCI_0_2_RET_TOS_MIGRATION_NOT_REQUIRED        2
+-/* We implement version 0.2 only */
+-#define QEMU_PSCI_0_2_RET_VERSION_0_2                       2
++#define QEMU_PSCI_VERSION_0_1                     0x00001
++#define QEMU_PSCI_VERSION_0_2                     0x00002
++#define QEMU_PSCI_VERSION_1_1                     0x10001
+ MISMATCH_CHECK(QEMU_PSCI_0_2_RET_TOS_MIGRATION_NOT_REQUIRED, PSCI_0_2_TOS_MP);
+-MISMATCH_CHECK(QEMU_PSCI_0_2_RET_VERSION_0_2,
+-               (PSCI_VERSION_MAJOR(0) | PSCI_VERSION_MINOR(2)));
++/* We don't bother to check every possible version value */
++MISMATCH_CHECK(QEMU_PSCI_VERSION_0_2, PSCI_VERSION(0, 2));
++MISMATCH_CHECK(QEMU_PSCI_VERSION_1_1, PSCI_VERSION(1, 1));
+ /* PSCI return values (inclusive of all PSCI versions) */
+ #define QEMU_PSCI_RET_SUCCESS                     0
 diff --git a/hw/arm/boot.c b/hw/arm/boot.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/boot.c
 +++ b/hw/arm/boot.c
-@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
+@@ -XXX,XX +XXX,XX @@ static void fdt_add_psci_node(void *fdt)
-                     if (cpu_isar_feature(aa64_mte, cpu)) {
+     }
-                         env->cp15.scr_el3 |= SCR_ATA;
-                     }
+     qemu_fdt_add_subnode(fdt, "/psci");
-+                    if (cpu_isar_feature(aa64_sve, cpu)) {
+-    if (armcpu->psci_version == 2) {
-+                        env->cp15.cptr_el[3] |= CPTR_EZ;
+-        const char comp[] = "arm,psci-0.2\0arm,psci";
-+                    }
+-        qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
-                     /* AArch64 kernels never boot in secure mode */
++    if (armcpu->psci_version == QEMU_PSCI_VERSION_0_2 ||
-                     assert(!info->secure_boot);
++        armcpu->psci_version == QEMU_PSCI_VERSION_1_1) {
-                     /* This hook is only supported for AArch32 currently:
++        if (armcpu->psci_version == QEMU_PSCI_VERSION_0_2) {
 +            const char comp[] = "arm,psci-0.2\0arm,psci";
 +            qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
 +        } else {
 +            const char comp[] = "arm,psci-1.0\0arm,psci-0.2\0arm,psci";
 +            qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
 +        }
          cpu_off_fn = QEMU_PSCI_0_2_FN_CPU_OFF;
          if (arm_feature(&armcpu->env, ARM_FEATURE_AARCH64)) {
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_initfn(Object *obj)
       * picky DTB consumer will also provide a helpful error message.
       */
      cpu->dtb_compatible = "qemu,unknown";
 -    cpu->psci_version = 1; /* By default assume PSCI v0.1 */
 +    cpu->psci_version = QEMU_PSCI_VERSION_0_1; /* By default assume PSCI v0.1 */
      cpu->kvm_target = QEMU_KVM_ARM_TARGET_NONE;
      if (tcg_enabled() || hvf_enabled()) {
 -        cpu->psci_version = 2; /* TCG and HVF implement PSCI 0.2 */
 +        /* TCG and HVF implement PSCI 1.1 */
 +        cpu->psci_version = QEMU_PSCI_VERSION_1_1;
      }
  }
 diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/hvf/hvf.c
 +++ b/target/arm/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ static bool hvf_handle_psci_call(CPUState *cpu)
      switch (param[0]) {
      case QEMU_PSCI_0_2_FN_PSCI_VERSION:
 -        ret = QEMU_PSCI_0_2_RET_VERSION_0_2;
 +        ret = QEMU_PSCI_VERSION_1_1;
          break;
      case QEMU_PSCI_0_2_FN_MIGRATE_INFO_TYPE:
          ret = QEMU_PSCI_0_2_RET_TOS_MIGRATION_NOT_REQUIRED; /* No trusted OS */
@@ -XXX,XX +XXX,XX @@ static bool hvf_handle_psci_call(CPUState *cpu)
      case QEMU_PSCI_0_2_FN_MIGRATE:
          ret = QEMU_PSCI_RET_NOT_SUPPORTED;
          break;
 +    case QEMU_PSCI_1_0_FN_PSCI_FEATURES:
 +        switch (param[1]) {
 +        case QEMU_PSCI_0_2_FN_PSCI_VERSION:
 +        case QEMU_PSCI_0_2_FN_MIGRATE_INFO_TYPE:
 +        case QEMU_PSCI_0_2_FN_AFFINITY_INFO:
 +        case QEMU_PSCI_0_2_FN64_AFFINITY_INFO:
 +        case QEMU_PSCI_0_2_FN_SYSTEM_RESET:
 +        case QEMU_PSCI_0_2_FN_SYSTEM_OFF:
 +        case QEMU_PSCI_0_1_FN_CPU_ON:
 +        case QEMU_PSCI_0_2_FN_CPU_ON:
 +        case QEMU_PSCI_0_2_FN64_CPU_ON:
 +        case QEMU_PSCI_0_1_FN_CPU_OFF:
 +        case QEMU_PSCI_0_2_FN_CPU_OFF:
 +        case QEMU_PSCI_0_1_FN_CPU_SUSPEND:
 +        case QEMU_PSCI_0_2_FN_CPU_SUSPEND:
 +        case QEMU_PSCI_0_2_FN64_CPU_SUSPEND:
 +        case QEMU_PSCI_1_0_FN_PSCI_FEATURES:
 +            ret = 0;
 +            break;
 +        case QEMU_PSCI_0_1_FN_MIGRATE:
 +        case QEMU_PSCI_0_2_FN_MIGRATE:
 +        default:
 +            ret = QEMU_PSCI_RET_NOT_SUPPORTED;
 +        }
 +        break;
      default:
          return false;
      }
 diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm64.c
 +++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
          cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_POWER_OFF;
      }
      if (kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PSCI_0_2)) {
 -        cpu->psci_version = 2;
 +        cpu->psci_version = QEMU_PSCI_VERSION_0_2;
          cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_PSCI_0_2;
      }
      if (!arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
 diff --git a/target/arm/psci.c b/target/arm/psci.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/psci.c
 +++ b/target/arm/psci.c
@@ -XXX,XX +XXX,XX @@ void arm_handle_psci_call(ARMCPU *cpu)
  {
      /*
       * This function partially implements the logic for dispatching Power State
 -     * Coordination Interface (PSCI) calls (as described in ARM DEN 0022B.b),
 +     * Coordination Interface (PSCI) calls (as described in ARM DEN 0022D.b),
       * to the extent required for bringing up and taking down secondary cores,
       * and for handling reset and poweroff requests.
       * Additional information about the calling convention used is available in
@@ -XXX,XX +XXX,XX @@ void arm_handle_psci_call(ARMCPU *cpu)
      }
      if ((param[0] & QEMU_PSCI_0_2_64BIT) && !is_a64(env)) {
 -        ret = QEMU_PSCI_RET_INVALID_PARAMS;
 +        ret = QEMU_PSCI_RET_NOT_SUPPORTED;
          goto err;
      }
@@ -XXX,XX +XXX,XX @@ void arm_handle_psci_call(ARMCPU *cpu)
          ARMCPU *target_cpu;
      case QEMU_PSCI_0_2_FN_PSCI_VERSION:
 -        ret = QEMU_PSCI_0_2_RET_VERSION_0_2;
 +        ret = QEMU_PSCI_VERSION_1_1;
          break;
      case QEMU_PSCI_0_2_FN_MIGRATE_INFO_TYPE:
          ret = QEMU_PSCI_0_2_RET_TOS_MIGRATION_NOT_REQUIRED; /* No trusted OS */
@@ -XXX,XX +XXX,XX @@ void arm_handle_psci_call(ARMCPU *cpu)
          }
          helper_wfi(env, 4);
          break;
 +    case QEMU_PSCI_1_0_FN_PSCI_FEATURES:
 +        switch (param[1]) {
 +        case QEMU_PSCI_0_2_FN_PSCI_VERSION:
 +        case QEMU_PSCI_0_2_FN_MIGRATE_INFO_TYPE:
 +        case QEMU_PSCI_0_2_FN_AFFINITY_INFO:
 +        case QEMU_PSCI_0_2_FN64_AFFINITY_INFO:
 +        case QEMU_PSCI_0_2_FN_SYSTEM_RESET:
 +        case QEMU_PSCI_0_2_FN_SYSTEM_OFF:
 +        case QEMU_PSCI_0_1_FN_CPU_ON:
 +        case QEMU_PSCI_0_2_FN_CPU_ON:
 +        case QEMU_PSCI_0_2_FN64_CPU_ON:
 +        case QEMU_PSCI_0_1_FN_CPU_OFF:
 +        case QEMU_PSCI_0_2_FN_CPU_OFF:
 +        case QEMU_PSCI_0_1_FN_CPU_SUSPEND:
 +        case QEMU_PSCI_0_2_FN_CPU_SUSPEND:
 +        case QEMU_PSCI_0_2_FN64_CPU_SUSPEND:
 +        case QEMU_PSCI_1_0_FN_PSCI_FEATURES:
 +            if (!(param[1] & QEMU_PSCI_0_2_64BIT) || is_a64(env)) {
 +                ret = 0;
 +                break;
 +            }
 +            /* fallthrough */
 +        case QEMU_PSCI_0_1_FN_MIGRATE:
 +        case QEMU_PSCI_0_2_FN_MIGRATE:
 +        default:
 +            ret = QEMU_PSCI_RET_NOT_SUPPORTED;
 +            break;
 +        }
 +        break;
      case QEMU_PSCI_0_1_FN_MIGRATE:
      case QEMU_PSCI_0_2_FN_MIGRATE:
      default:
 --
-.20.1
+.25.1
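
The version constants and the AArch32 check in the patch above rely on two encodings: a PSCI version packs the major number into bits [31:16] and the minor number into bits [15:0], and bit 30 of a function ID selects the 64-bit calling convention. The sketch below illustrates both under those assumptions. The macro and function names are hypothetical stand-ins, not the QEMU or Linux definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Major version in bits [31:16], minor version in bits [15:0]. */
#define PSCI_VERSION_ENCODE(major, minor) \
    ((((uint32_t)(major)) << 16) | ((uint32_t)(minor) & 0xffffu))

/* Bit 30 of a function ID marks a 64-bit (SMC64) call. */
#define SMCCC_64BIT (UINT32_C(1) << 30)

#define PSCI_RET_SUCCESS        0
#define PSCI_RET_NOT_SUPPORTED  (-1)

/*
 * Hypothetical helper mirroring the check in arm_handle_psci_call():
 * a 64-bit function ID issued from AArch32 is reported as unsupported.
 */
static int32_t check_calling_convention(uint32_t function_id,
                                        bool caller_is_a64)
{
    if ((function_id & SMCCC_64BIT) && !caller_is_a64) {
        return PSCI_RET_NOT_SUPPORTED;
    }
    return PSCI_RET_SUCCESS;
}

/* Small self-check: PSCI 1.1 encodes to the value used in the patch. */
void psci_encoding_selfcheck(void)
{
    assert(PSCI_VERSION_ENCODE(1, 1) == 0x10001);
    assert(check_calling_convention(0xC4000003u, false)
           == PSCI_RET_NOT_SUPPORTED);
}
```

Here 0xC4000003 is the 64-bit CPU_ON function ID and 0x84000003 is its 32-bit counterpart; only the former is rejected for an AArch32 caller.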

-[PULL 20/26] hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
+[PULL 06/26] target/arm: Fix early free of TCG temp in handle_simd_shift_fpint_conv()
-From: AlexChen <alex.chen@huawei.com>
+From: Wentao_Liang <Wentao_Liang_g@163.com>
-In exynos4210_fimd_update(), the pointer s is dereferinced before
+handle_simd_shift_fpint_conv() was accidentally freeing the TCG
-being check if it is valid, which may lead to NULL pointer dereference.
+temporary tcg_fpstatus too early, before the last use of it.  Move
-So move the assignment to global_width after checking that the s is valid.
+the free down to where it belongs.
-Reported-by: Euler Robot <euler.robot@huawei.com>
+Signed-off-by: Wentao_Liang <Wentao_Liang_g@163.com>
-Signed-off-by: Alex Chen <alex.chen@huawei.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+[PMM: cleaned up commit message]
 Message-id: 5F9F8D88.9030102@huawei.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/display/exynos4210_fimd.c | 4 +++-
+ target/arm/translate-a64.c | 2 +-
-file changed, 3 insertions(+), 1 deletion(-)
+file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
+diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/exynos4210_fimd.c
+--- a/target/arm/translate-a64.c
-+++ b/hw/display/exynos4210_fimd.c
++++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
+@@ -XXX,XX +XXX,XX @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
-     bool blend = false;
+         }
      uint8_t *host_fb_addr;
      bool is_dirty = false;
 -    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
 +    int global_width;
      if (!s || !s->console || !s->enabled ||
          surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
          return;
      }
-+
-+    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
+-    tcg_temp_free_ptr(tcg_fpstatus);
-     exynos4210_update_resolution(s);
+     tcg_temp_free_i32(tcg_shift);
-     surface = qemu_console_surface(s->console);
+     gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
 +    tcg_temp_free_ptr(tcg_fpstatus);
      tcg_temp_free_i32(tcg_rmode);
  }
 --
-.20.1
+.25.1

-[PULL 24/26] scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
+[PULL 07/26] tests/qtest: add qtests for npcm7xx sdhci
-The kerneldoc script currently emits Sphinx markup for a macro with
+From: Shengtan Mao <stmao@google.com>
 arguments that uses the c:function directive. This is correct for
 Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
 documentation of macros with arguments and c:function is not picky
 about the syntax of what it is passed. However, in Sphinx 3 the
 c:macro directive was enhanced to support macros with arguments,
 and c:function was made more picky about what syntax it accepted.
-When kerneldoc is told that it needs to produce output for Sphinx
+Reviewed-by: Hao Wu <wuhaotsh@google.com>
-or later, make it emit c:function only for functions and c:macro
+Reviewed-by: Chris Rauer <crauer@google.com>
-for macros with arguments. We assume that anything with a return
+Signed-off-by: Shengtan Mao <stmao@google.com>
-type is a function and anything without is a macro.
+Signed-off-by: Patrick Venture <venture@google.com>
 Message-id: 20220225174451.192304-1-wuhaotsh@google.com
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  tests/qtest/npcm7xx_sdhci-test.c | 215 +++++++++++++++++++++++++++++++
  tests/qtest/meson.build          |   1 +
 files changed, 216 insertions(+)
  create mode 100644 tests/qtest/npcm7xx_sdhci-test.c
-This fixes the Sphinx error:
+diff --git a/tests/qtest/npcm7xx_sdhci-test.c b/tests/qtest/npcm7xx_sdhci-test.c
+new file mode 100644
-/home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
+index XXXXXXX..XXXXXXX
-If declarator-id with parameters (e.g., 'void f(int arg)'):
+--- /dev/null
-  Invalid C declaration: Expected identifier in nested name. [error at 25]
++++ b/tests/qtest/npcm7xx_sdhci-test.c
-    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
+@@ -XXX,XX +XXX,XX @@
-    -------------------------^
++/*
-If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
++ * QTests for NPCM7xx SD-3.0 / MMC-4.51 Host Controller
-  Error in declarator or parameters
++ *
-  Invalid C declaration: Expecting "(" in parameters. [error at 39]
++ * Copyright (c) 2022 Google LLC
-    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
++ *
-    ---------------------------------------^
++ * This program is free software; you can redistribute it and/or modify it
++ * under the terms of the GNU General Public License as published by the
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
++ * Free Software Foundation; either version 2 of the License, or
-Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
++ * (at your option) any later version.
-Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
++ *
-Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
++ * This program is distributed in the hope that it will be useful, but WITHOUT
----
++ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- scripts/kernel-doc | 18 +++++++++++++++++-
++ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
-file changed, 17 insertions(+), 1 deletion(-)
++ * for more details.
++ */
-diff --git a/scripts/kernel-doc b/scripts/kernel-doc
++
-index XXXXXXX..XXXXXXX 100755
++#include "qemu/osdep.h"
---- a/scripts/kernel-doc
++#include "hw/sd/npcm7xx_sdhci.h"
-+++ b/scripts/kernel-doc
++
-@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
++#include "libqos/libqtest.h"
-     output_highlight_rst($args{'purpose'});
++#include "libqtest-single.h"
-     $start = "\n\n**Syntax**\n\n  ``";
++#include "libqos/sdhci-cmd.h"
-     } else {
++
--    print ".. c:function:: ";
++#define NPCM7XX_REG_SIZE 0x100
-+        if ((split(/\./, $sphinx_version))[0] >= 3) {
++#define NPCM7XX_MMC_BA 0xF0842000
-+            # Sphinx 3 and later distinguish macros and functions and
++#define NPCM7XX_BLK_SIZE 512
-+            # complain if you use c:function with something that's not
++#define NPCM7XX_TEST_IMAGE_SIZE (1 << 30)
-+            # syntactically valid as a function declaration.
++
-+            # We assume that anything with a return type is a function
++char *sd_path;
-+            # and anything without is a macro.
++
-+            if ($args{'functiontype'} ne "") {
++static QTestState *setup_sd_card(void)
-+                print ".. c:function:: ";
++{
-+            } else {
++    QTestState *qts = qtest_initf(
-+                print ".. c:macro:: ";
++        "-machine kudo-bmc "
 +        "-device sd-card,drive=drive0 "
 +        "-drive id=drive0,if=none,file=%s,format=raw,auto-read-only=off",
 +        sd_path);
 +
 +    qtest_writew(qts, NPCM7XX_MMC_BA + SDHC_SWRST, SDHC_RESET_ALL);
 +    qtest_writew(qts, NPCM7XX_MMC_BA + SDHC_CLKCON,
 +                 SDHC_CLOCK_SDCLK_EN | SDHC_CLOCK_INT_STABLE |
 +                     SDHC_CLOCK_INT_EN);
 +    sdhci_cmd_regs(qts, NPCM7XX_MMC_BA, 0, 0, 0, 0, SDHC_APP_CMD);
 +    sdhci_cmd_regs(qts, NPCM7XX_MMC_BA, 0, 0, 0x41200000, 0, (41 << 8));
 +    sdhci_cmd_regs(qts, NPCM7XX_MMC_BA, 0, 0, 0, 0, SDHC_ALL_SEND_CID);
 +    sdhci_cmd_regs(qts, NPCM7XX_MMC_BA, 0, 0, 0, 0, SDHC_SEND_RELATIVE_ADDR);
 +    sdhci_cmd_regs(qts, NPCM7XX_MMC_BA, 0, 0, 0x45670000, 0,
 +                   SDHC_SELECT_DESELECT_CARD);
 +
 +    return qts;
 +}
 +
 +static void write_sdread(QTestState *qts, const char *msg)
 +{
 +    int fd, ret;
 +    size_t len = strlen(msg);
 +    char *rmsg = g_malloc(len);
 +
 +    /* write message to sd */
 +    fd = open(sd_path, O_WRONLY);
 +    g_assert(fd >= 0);
 +    ret = write(fd, msg, len);
 +    close(fd);
 +    g_assert(ret == len);
 +
 +    /* read message using sdhci */
 +    ret = sdhci_read_cmd(qts, NPCM7XX_MMC_BA, rmsg, len);
 +    g_assert(ret == len);
 +    g_assert(!memcmp(rmsg, msg, len));
 +
 +    g_free(rmsg);
 +}
 +
 +/* Check MMC can read values from sd */
 +static void test_read_sd(void)
 +{
 +    QTestState *qts = setup_sd_card();
 +
 +    write_sdread(qts, "hello world");
 +    write_sdread(qts, "goodbye");
 +
 +    qtest_quit(qts);
 +}
 +
 +static void sdwrite_read(QTestState *qts, const char *msg)
 +{
 +    int fd, ret;
 +    size_t len = strlen(msg);
 +    char *rmsg = g_malloc(len);
 +
 +    /* write message using sdhci */
 +    sdhci_write_cmd(qts, NPCM7XX_MMC_BA, msg, len, NPCM7XX_BLK_SIZE);
 +
 +    /* read message from sd */
 +    fd = open(sd_path, O_RDONLY);
 +    g_assert(fd >= 0);
 +    ret = read(fd, rmsg, len);
 +    close(fd);
 +    g_assert(ret == len);
 +
 +    g_assert(!memcmp(rmsg, msg, len));
 +
 +    g_free(rmsg);
 +}
 +
 +/* Check MMC can write values to sd */
 +static void test_write_sd(void)
 +{
 +    QTestState *qts = setup_sd_card();
 +
 +    sdwrite_read(qts, "hello world");
 +    sdwrite_read(qts, "goodbye");
 +
 +    qtest_quit(qts);
 +}
 +
 +/* Check SDHCI has correct default values. */
 +static void test_reset(void)
 +{
 +    QTestState *qts = qtest_init("-machine kudo-bmc");
 +    uint64_t addr = NPCM7XX_MMC_BA;
 +    uint64_t end_addr = addr + NPCM7XX_REG_SIZE;
 +    uint16_t prstvals_resets[] = {NPCM7XX_PRSTVALS_0_RESET,
 +                                  NPCM7XX_PRSTVALS_1_RESET,
 +                                  0,
 +                                  NPCM7XX_PRSTVALS_3_RESET,
 +                                  0,
 +                                  0};
 +    int i;
 +    uint32_t mask;
 +
 +    while (addr < end_addr) {
 +        switch (addr - NPCM7XX_MMC_BA) {
 +        case SDHC_PRNSTS:
 +            /*
 +             * ignores bits 20 to 24: they are changed when reading registers
 +             */
 +            mask = 0x1f00000;
 +            g_assert_cmphex(qtest_readl(qts, addr) | mask, ==,
 +                            NPCM7XX_PRSNTS_RESET | mask);
 +            addr += 4;
 +            break;
 +        case SDHC_BLKGAP:
 +            g_assert_cmphex(qtest_readb(qts, addr), ==, NPCM7XX_BLKGAP_RESET);
 +            addr += 1;
 +            break;
 +        case SDHC_CAPAB:
 +            g_assert_cmphex(qtest_readq(qts, addr), ==, NPCM7XX_CAPAB_RESET);
 +            addr += 8;
 +            break;
 +        case SDHC_MAXCURR:
 +            g_assert_cmphex(qtest_readq(qts, addr), ==, NPCM7XX_MAXCURR_RESET);
 +            addr += 8;
 +            break;
 +        case SDHC_HCVER:
 +            g_assert_cmphex(qtest_readw(qts, addr), ==, NPCM7XX_HCVER_RESET);
 +            addr += 2;
 +            break;
 +        case NPCM7XX_PRSTVALS:
 +            for (i = 0; i < NPCM7XX_PRSTVALS_SIZE; ++i) {
 +                g_assert_cmphex(qtest_readw(qts, addr + 2 * i), ==,
 +                                prstvals_resets[i]);
 +            }
-+        } else {
++            addr += NPCM7XX_PRSTVALS_SIZE * 2;
-+            # Older Sphinx don't support documenting macros that take
++            break;
-+            # arguments with c:macro, and don't complain about the use
++        default:
-+            # of c:function for this.
++            g_assert_cmphex(qtest_readb(qts, addr), ==, 0);
-+            print ".. c:function:: ";
++            addr += 1;
 +        }
-     }
++    }
-     if ($args{'functiontype'} ne "") {
++
-     $start .= $args{'functiontype'} . " " . $args{'function'} . " (";
++    qtest_quit(qts);
 +}
 +
 +static void drive_destroy(void)
 +{
 +    unlink(sd_path);
 +    g_free(sd_path);
 +}
 +
 +static void drive_create(void)
 +{
 +    int fd, ret;
 +    GError *error = NULL;
 +
 +    /* Create a temporary raw image */
 +    fd = g_file_open_tmp("sdhci_XXXXXX", &sd_path, &error);
 +    if (fd == -1) {
 +        fprintf(stderr, "unable to create sdhci file: %s\n", error->message);
 +        g_error_free(error);
 +    }
 +    g_assert(sd_path != NULL);
 +
 +    ret = ftruncate(fd, NPCM7XX_TEST_IMAGE_SIZE);
 +    g_assert_cmpint(ret, ==, 0);
 +    g_message("%s", sd_path);
 +    close(fd);
 +}
 +
 +int main(int argc, char **argv)
 +{
 +    int ret;
 +
 +    drive_create();
 +
 +    g_test_init(&argc, &argv, NULL);
 +
 +    qtest_add_func("npcm7xx_sdhci/reset", test_reset);
 +    qtest_add_func("npcm7xx_sdhci/write_sd", test_write_sd);
 +    qtest_add_func("npcm7xx_sdhci/read_sd", test_read_sd);
 +
 +    ret = g_test_run();
 +    drive_destroy();
 +    return ret;
 +}
 diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qtest/meson.build
 +++ b/tests/qtest/meson.build
@@ -XXX,XX +XXX,XX @@ qtests_npcm7xx = \
     'npcm7xx_gpio-test',
     'npcm7xx_pwm-test',
     'npcm7xx_rng-test',
 +   'npcm7xx_sdhci-test',
     'npcm7xx_smbus-test',
     'npcm7xx_timer-test',
     'npcm7xx_watchdog_timer-test'] + \
 --
-.20.1
+.25.1
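
In test_reset() above, the SDHC_PRNSTS check ORs the same mask into both the value read and the expected reset value, so that bits 20 to 24 cannot cause a mismatch. A minimal sketch of that comparison in isolation is shown below. Both helper names are hypothetical and are not part of the qtest API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Build a mask covering bits lo..hi inclusive.
 * Assumes lo <= hi and a width (hi - lo + 1) of less than 32 bits.
 */
static uint32_t bit_range_mask(unsigned lo, unsigned hi)
{
    return ((UINT32_C(1) << (hi - lo + 1)) - 1) << lo;
}

/* Compare two register values while ignoring every bit set in ignore_mask. */
static bool matches_ignoring(uint32_t value, uint32_t expected,
                             uint32_t ignore_mask)
{
    return (value | ignore_mask) == (expected | ignore_mask);
}

/* Small self-check: bits 20..24 give the mask used by the test. */
void masked_compare_selfcheck(void)
{
    assert(bit_range_mask(20, 24) == 0x1f00000);
    assert(matches_ignoring(0x01ff0000, 0x000f0000, 0x1f00000));
}
```

Forcing the ignored bits to 1 on both sides is equivalent to clearing them on both sides; either form leaves only the bits of interest to decide the result.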

-[PULL 22/26] configure: Test that gio libs from pkg-config work
+[PULL 08/26] hw/registerfields: Add FIELD_SEX<N> and FIELD_SDP<N>
-On some hosts (eg Ubuntu Bionic) pkg-config returns a set of
+From: Richard Henderson <richard.henderson@linaro.org>
 libraries for gio-2.0 which don't actually work when compiling
 statically. (Specifically, the returned library string includes
 -lmount, but not -lblkid which -lmount depends upon, so linking
 fails due to missing symbols.)
-Check that the libraries work, and don't enable gio if they don't,
+Add new macros to manipulate signed fields within the register.
 in the same way we do for gnutls.
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220301215958.157011-2-richard.henderson@linaro.org
+Suggested-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
 ---
- configure | 10 +++++++++-
+ include/hw/registerfields.h | 48 ++++++++++++++++++++++++++++++++++++-
-file changed, 9 insertions(+), 1 deletion(-)
+file changed, 47 insertions(+), 1 deletion(-)
-diff --git a/configure b/configure
+diff --git a/include/hw/registerfields.h b/include/hw/registerfields.h
-index XXXXXXX..XXXXXXX 100755
+index XXXXXXX..XXXXXXX 100644
---- a/configure
+--- a/include/hw/registerfields.h
-+++ b/configure
++++ b/include/hw/registerfields.h
-@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
+@@ -XXX,XX +XXX,XX @@
- fi
+     extract64((storage), R_ ## reg ## _ ## field ## _SHIFT,               \
+               R_ ## reg ## _ ## field ## _LENGTH)
- if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
--    gio=yes
++#define FIELD_SEX8(storage, reg, field)                                   \
-     gio_cflags=$($pkg_config --cflags gio-2.0)
++    sextract8((storage), R_ ## reg ## _ ## field ## _SHIFT,               \
-     gio_libs=$($pkg_config --libs gio-2.0)
++              R_ ## reg ## _ ## field ## _LENGTH)
-     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
++#define FIELD_SEX16(storage, reg, field)                                  \
-     if [ ! -x "$gdbus_codegen" ]; then
++    sextract16((storage), R_ ## reg ## _ ## field ## _SHIFT,              \
-         gdbus_codegen=
++               R_ ## reg ## _ ## field ## _LENGTH)
-     fi
++#define FIELD_SEX32(storage, reg, field)                                  \
-+    # Check that the libraries actually work -- Ubuntu 18.04 ships
++    sextract32((storage), R_ ## reg ## _ ## field ## _SHIFT,              \
-+    # with pkg-config --static --libs data for gio-2.0 that is missing
++               R_ ## reg ## _ ## field ## _LENGTH)
-+    # -lblkid and will give a link error.
++#define FIELD_SEX64(storage, reg, field)                                  \
-+    write_c_skeleton
++    sextract64((storage), R_ ## reg ## _ ## field ## _SHIFT,              \
-+    if compile_prog "" "gio_libs" ; then
++               R_ ## reg ## _ ## field ## _LENGTH)
-+        gio=yes
++
-+    else
+ /* Extract a field from an array of registers */
-+        gio=no
+ #define ARRAY_FIELD_EX32(regs, reg, field)                                \
-+    fi
+     FIELD_EX32((regs)[R_ ## reg], reg, field)
- else
+@@ -XXX,XX +XXX,XX @@
-     gio=no
+     _d; })
- fi
+ #define FIELD_DP64(storage, reg, field, val) ({                           \
      struct {                                                              \
 -        uint64_t v:R_ ## reg ## _ ## field ## _LENGTH;                \
 +        uint64_t v:R_ ## reg ## _ ## field ## _LENGTH;                    \
 +    } _v = { .v = val };                                                  \
 +    uint64_t _d;                                                          \
 +    _d = deposit64((storage), R_ ## reg ## _ ## field ## _SHIFT,          \
 +                  R_ ## reg ## _ ## field ## _LENGTH, _v.v);              \
 +    _d; })
 +
 +#define FIELD_SDP8(storage, reg, field, val) ({                           \
 +    struct {                                                              \
 +        signed int v:R_ ## reg ## _ ## field ## _LENGTH;                  \
 +    } _v = { .v = val };                                                  \
 +    uint8_t _d;                                                           \
 +    _d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,          \
 +                  R_ ## reg ## _ ## field ## _LENGTH, _v.v);              \
 +    _d; })
 +#define FIELD_SDP16(storage, reg, field, val) ({                          \
 +    struct {                                                              \
 +        signed int v:R_ ## reg ## _ ## field ## _LENGTH;                  \
 +    } _v = { .v = val };                                                  \
 +    uint16_t _d;                                                          \
 +    _d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,          \
 +                  R_ ## reg ## _ ## field ## _LENGTH, _v.v);              \
 +    _d; })
 +#define FIELD_SDP32(storage, reg, field, val) ({                          \
 +    struct {                                                              \
 +        signed int v:R_ ## reg ## _ ## field ## _LENGTH;                  \
 +    } _v = { .v = val };                                                  \
 +    uint32_t _d;                                                          \
 +    _d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,          \
 +                  R_ ## reg ## _ ## field ## _LENGTH, _v.v);              \
 +    _d; })
 +#define FIELD_SDP64(storage, reg, field, val) ({                          \
 +    struct {                                                              \
 +        int64_t v:R_ ## reg ## _ ## field ## _LENGTH;                     \
      } _v = { .v = val };                                                  \
      uint64_t _d;                                                          \
      _d = deposit64((storage), R_ ## reg ## _ ## field ## _SHIFT,          \
 --
-.20.1
+.25.1
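
The FIELD_SEX<N> and FIELD_SDP<N> macros above read and write a register field as a signed quantity. The standalone sketch below shows the underlying idea for a 32-bit register. The function names are hypothetical, and the sign extension uses an XOR-and-subtract formulation rather than the shift-based sextract32() that the macros call, so it should be read as an equivalent illustration, not the QEMU implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low 'length' bits set; valid for 1 <= length <= 32. */
static uint32_t field_mask(unsigned length)
{
    return length < 32 ? (UINT32_C(1) << length) - 1 : UINT32_MAX;
}

/* Read bits [start, start + length) and sign-extend them to 32 bits. */
static int32_t sext_field32(uint32_t value, unsigned start, unsigned length)
{
    uint32_t field = (value >> start) & field_mask(length);
    uint32_t sign = UINT32_C(1) << (length - 1);

    /* Flipping the sign bit and subtracting it back propagates the sign. */
    return (int32_t)((field ^ sign) - sign);
}

/* Write a signed value into bits [start, start + length), truncating it. */
static uint32_t deposit_field32(uint32_t value, unsigned start,
                                unsigned length, int32_t fieldval)
{
    uint32_t mask = field_mask(length) << start;

    return (value & ~mask) | (((uint32_t)fieldval << start) & mask);
}

/* Small self-check: -2 survives a round trip through a 4-bit field. */
void signed_field_selfcheck(void)
{
    uint32_t reg = deposit_field32(0, 4, 4, -2);

    assert(reg == 0xE0);
    assert(sext_field32(reg, 4, 4) == -2);
}
```

The sketch assumes a two's-complement target, which is the case for every host QEMU supports.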

-[PULL 21/26] target/arm: Get correct MMU index for other-security-state
+[PULL 09/26] target/arm: Set TCR_EL1.TSZ for user-only
-In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
+From: Richard Henderson <richard.henderson@linaro.org>
 armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
 This is incorrect when the security state being queried is not the
 current one, because arm_current_el() uses the current security state
 to determine which of the banked CONTROL.nPRIV bits to look at.
 The effect was that if (for instance) Secure state was in privileged
 mode but Non-Secure was not then we would return the wrong MMU index.
-The only places where we are using this function in a way that could
+Set this as the kernel would, to 48 bits, to keep the computation
-trigger this bug are for the stack loads during a v8M function-return
+of the address space correct for PAuth.
 and for the instruction fetch of a v8M SG insn.
-Fix the bug by expanding out the M-profile version of the
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-arm_current_el() logic inline so it can use the passed in secstate
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-rather than env->v7m.secure.
+Message-id: 20220301215958.157011-3-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
 ---
- target/arm/m_helper.c | 3 ++-
+ target/arm/cpu.c | 3 ++-
 file changed, 2 insertions(+), 1 deletion(-)
-diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/m_helper.c
+--- a/target/arm/cpu.c
-+++ b/target/arm/m_helper.c
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
- /* Return the MMU index for a v7M CPU in the specified security state */
+                 aarch64_sve_zcr_get_valid_len(cpu, cpu->sve_default_vq - 1);
- ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
+         }
- {
+         /*
--    bool priv = arm_current_el(env) != 0;
++         * Enable 48-bit address space (TODO: take reserved_va into account).
-+    bool priv = arm_v7m_is_handler_mode(env) ||
+          * Enable TBI0 but not TBI1.
-+        !(env->v7m.control[secstate] & 1);
+          * Note that this must match useronly_clean_ptr.
+          */
-     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
+-        env->cp15.tcr_el[1].raw_tcr = (1ULL << 37);
- }
++        env->cp15.tcr_el[1].raw_tcr = 5 | (1ULL << 37);
          /* Enable MTE */
          if (cpu_isar_feature(aa64_mte, cpu)) {
 --
-.20.1
+.25.1

-[PULL 09/26] target/arm: Rename neon_load_reg64 to vfp_load_reg64
+[PULL 10/26] target/arm: Fault on invalid TCR_ELx.TxSZ
 From: Richard Henderson <richard.henderson@linaro.org>
-The only uses of this function are for loading VFP
+Without FEAT_LVA, the behaviour of programming an invalid value
-double-precision values, and nothing to do with NEON.
+is IMPLEMENTATION DEFINED.  With FEAT_LVA, programming an invalid
 minimum value requires a Translation fault.
+It is most self-consistent to choose to generate the fault always.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
+Message-id: 20220301215958.157011-4-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         |  8 ++--
+ target/arm/internals.h |  1 +
- target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
+ target/arm/helper.c    | 32 ++++++++++++++++++++++++++++----
-files changed, 46 insertions(+), 46 deletions(-)
+files changed, 29 insertions(+), 4 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/internals.h
-+++ b/target/arm/translate.c
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
+@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
      bool hpd        : 1;
      bool using16k   : 1;
      bool using64k   : 1;
 +    bool tsz_oob    : 1;  /* tsz has been clamped to legal range */
  } ARMVAParameters;
  ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
                                     ARMMMUIdx mmu_idx, bool data)
  {
      uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
 -    bool epd, hpd, using16k, using64k;
 -    int select, tsz, tbi, max_tsz;
 +    bool epd, hpd, using16k, using64k, tsz_oob;
 +    int select, tsz, tbi, max_tsz, min_tsz;
      if (!regime_has_2_ranges(mmu_idx)) {
          select = 0;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
      } else {
          max_tsz = 39;
      }
++    min_tsz = 16;  /* TODO: ARMv8.2-LVA  */
+-    tsz = MIN(tsz, max_tsz);
+-    tsz = MAX(tsz, 16);  /* TODO: ARMv8.2-LVA  */
++    if (tsz > max_tsz) {
++        tsz = max_tsz;
++        tsz_oob = true;
++    } else if (tsz < min_tsz) {
++        tsz = min_tsz;
++        tsz_oob = true;
++    } else {
++        tsz_oob = false;
++    }
+     /* Present TBI as a composite with TBID.  */
+     tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
+         .hpd = hpd,
+         .using16k = using16k,
+         .using64k = using64k,
++        .tsz_oob = tsz_oob,
+     };
  }
--static inline void neon_load_reg64(TCGv_i64 var, int reg)
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
+         param = aa64_va_parameters(env, address, mmu_idx,
- {
+                                    access_type != MMU_INST_FETCH);
--    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+         level = 0;
-+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
++
- }
++        /*
++         * If TxSZ is programmed to a value larger than the maximum,
--static inline void neon_store_reg64(TCGv_i64 var, int reg)
++         * or smaller than the effective minimum, it is IMPLEMENTATION
-+static inline void vfp_store_reg64(TCGv_i64 var, int reg)
++         * DEFINED whether we behave as if the field were programmed
- {
++         * within bounds, or if a level 0 Translation fault is generated.
--    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
++         *
-+    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
++         * With FEAT_LVA, fault on less than minimum becomes required,
- }
++         * so our choice is to always raise the fault.
++         */
- static inline void vfp_load_reg32(TCGv_i32 var, int reg)
++        if (param.tsz_oob) {
-diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
++            fault_type = ARMFault_Translation;
-index XXXXXXX..XXXXXXX 100644
++            goto do_fault;
---- a/target/arm/translate-vfp.c.inc
++        }
-+++ b/target/arm/translate-vfp.c.inc
++
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
+         addrsize = 64 - 8 * param.tbi;
-         tcg_gen_ext_i32_i64(nf, cpu_NF);
+         inputsize = 64 - param.tsz;
          tcg_gen_ext_i32_i64(vf, cpu_VF);
 -        neon_load_reg64(frn, rn);
 -        neon_load_reg64(frm, rm);
 +        vfp_load_reg64(frn, rn);
 +        vfp_load_reg64(frm, rm);
          switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
              tcg_temp_free_i64(tmp);
              break;
          }
 -        neon_store_reg64(dest, rd);
 +        vfp_store_reg64(dest, rd);
          tcg_temp_free_i64(frn);
          tcg_temp_free_i64(frm);
          tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i64 tcg_res;
          tcg_op = tcg_temp_new_i64();
          tcg_res = tcg_temp_new_i64();
 -        neon_load_reg64(tcg_op, rm);
 +        vfp_load_reg64(tcg_op, rm);
          gen_helper_rintd(tcg_res, tcg_op, fpst);
 -        neon_store_reg64(tcg_res, rd);
 +        vfp_store_reg64(tcg_res, rd);
          tcg_temp_free_i64(tcg_op);
          tcg_temp_free_i64(tcg_res);
      } else {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-         tcg_double = tcg_temp_new_i64();
-         tcg_res = tcg_temp_new_i64();
-         tcg_tmp = tcg_temp_new_i32();
--        neon_load_reg64(tcg_double, rm);
-+        vfp_load_reg64(tcg_double, rm);
-         if (is_signed) {
-             gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
-         } else {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
-     tmp = tcg_temp_new_i64();
-     if (a->l) {
-         gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
--        neon_store_reg64(tmp, a->vd);
-+        vfp_store_reg64(tmp, a->vd);
-     } else {
--        neon_load_reg64(tmp, a->vd);
-+        vfp_load_reg64(tmp, a->vd);
-         gen_aa32_st64(s, tmp, addr, get_mem_index(s));
-     }
-     tcg_temp_free_i64(tmp);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
-         if (a->l) {
-             /* load */
-             gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
--            neon_store_reg64(tmp, a->vd + i);
-+            vfp_store_reg64(tmp, a->vd + i);
-         } else {
-             /* store */
--            neon_load_reg64(tmp, a->vd + i);
-+            vfp_load_reg64(tmp, a->vd + i);
-             gen_aa32_st64(s, tmp, addr, get_mem_index(s));
-         }
-         tcg_gen_addi_i32(addr, addr, offset);
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
-     fd = tcg_temp_new_i64();
-     fpst = fpstatus_ptr(FPST_FPCR);
--    neon_load_reg64(f0, vn);
--    neon_load_reg64(f1, vm);
-+    vfp_load_reg64(f0, vn);
-+    vfp_load_reg64(f1, vm);
-     for (;;) {
-         if (reads_vd) {
--            neon_load_reg64(fd, vd);
-+            vfp_load_reg64(fd, vd);
-         }
-         fn(fd, f0, f1, fpst);
--        neon_store_reg64(fd, vd);
-+        vfp_store_reg64(fd, vd);
-         if (veclen == 0) {
-             break;
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
-         veclen--;
-         vd = vfp_advance_dreg(vd, delta_d);
-         vn = vfp_advance_dreg(vn, delta_d);
--        neon_load_reg64(f0, vn);
-+        vfp_load_reg64(f0, vn);
-         if (delta_m) {
-             vm = vfp_advance_dreg(vm, delta_m);
--            neon_load_reg64(f1, vm);
-+            vfp_load_reg64(f1, vm);
-         }
-     }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
-     f0 = tcg_temp_new_i64();
-     fd = tcg_temp_new_i64();
--    neon_load_reg64(f0, vm);
-+    vfp_load_reg64(f0, vm);
-     for (;;) {
-         fn(fd, f0);
--        neon_store_reg64(fd, vd);
-+        vfp_store_reg64(fd, vd);
-         if (veclen == 0) {
-             break;
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
-             /* single source one-many */
-             while (veclen--) {
-                 vd = vfp_advance_dreg(vd, delta_d);
--                neon_store_reg64(fd, vd);
-+                vfp_store_reg64(fd, vd);
-             }
-             break;
-         }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
-         veclen--;
-         vd = vfp_advance_dreg(vd, delta_d);
-         vd = vfp_advance_dreg(vm, delta_m);
--        neon_load_reg64(f0, vm);
-+        vfp_load_reg64(f0, vm);
-     }
-     tcg_temp_free_i64(f0);
-@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
-     vm = tcg_temp_new_i64();
-     vd = tcg_temp_new_i64();
--    neon_load_reg64(vn, a->vn);
--    neon_load_reg64(vm, a->vm);
-+    vfp_load_reg64(vn, a->vn);
-+    vfp_load_reg64(vm, a->vm);
-     if (neg_n) {
-         /* VFNMS, VFMS */
-         gen_helper_vfp_negd(vn, vn);
-     }
--    neon_load_reg64(vd, a->vd);
-+    vfp_load_reg64(vd, a->vd);
-     if (neg_d) {
-         /* VFNMA, VFNMS */
-         gen_helper_vfp_negd(vd, vd);
-     }
-     fpst = fpstatus_ptr(FPST_FPCR);
-     gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
--    neon_store_reg64(vd, a->vd);
-+    vfp_store_reg64(vd, a->vd);
-     tcg_temp_free_ptr(fpst);
-     tcg_temp_free_i64(vn);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
-     fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
-     for (;;) {
--        neon_store_reg64(fd, vd);
-+        vfp_store_reg64(fd, vd);
-         if (veclen == 0) {
-             break;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
-     vd = tcg_temp_new_i64();
-     vm = tcg_temp_new_i64();
--    neon_load_reg64(vd, a->vd);
-+    vfp_load_reg64(vd, a->vd);
-     if (a->z) {
-         tcg_gen_movi_i64(vm, 0);
-     } else {
--        neon_load_reg64(vm, a->vm);
-+        vfp_load_reg64(vm, a->vm);
-     }
-     if (a->e) {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
-     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
-     vd = tcg_temp_new_i64();
-     gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
--    neon_store_reg64(vd, a->vd);
-+    vfp_store_reg64(vd, a->vd);
-     tcg_temp_free_i32(ahp_mode);
-     tcg_temp_free_ptr(fpst);
-     tcg_temp_free_i32(tmp);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
-     tmp = tcg_temp_new_i32();
-     vm = tcg_temp_new_i64();
--    neon_load_reg64(vm, a->vm);
-+    vfp_load_reg64(vm, a->vm);
-     gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
-     tcg_temp_free_i64(vm);
-     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
-@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
-     }
-     tmp = tcg_temp_new_i64();
--    neon_load_reg64(tmp, a->vm);
-+    vfp_load_reg64(tmp, a->vm);
-     fpst = fpstatus_ptr(FPST_FPCR);
-     gen_helper_rintd(tmp, tmp, fpst);
--    neon_store_reg64(tmp, a->vd);
-+    vfp_store_reg64(tmp, a->vd);
-     tcg_temp_free_ptr(fpst);
-     tcg_temp_free_i64(tmp);
-     return true;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
-     }
-     tmp = tcg_temp_new_i64();
--    neon_load_reg64(tmp, a->vm);
-+    vfp_load_reg64(tmp, a->vm);
-     fpst = fpstatus_ptr(FPST_FPCR);
-     tcg_rmode = tcg_const_i32(float_round_to_zero);
-     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-     gen_helper_rintd(tmp, tmp, fpst);
-     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
--    neon_store_reg64(tmp, a->vd);
-+    vfp_store_reg64(tmp, a->vd);
-     tcg_temp_free_ptr(fpst);
-     tcg_temp_free_i64(tmp);
-     tcg_temp_free_i32(tcg_rmode);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
-     }
-     tmp = tcg_temp_new_i64();
--    neon_load_reg64(tmp, a->vm);
-+    vfp_load_reg64(tmp, a->vm);
-     fpst = fpstatus_ptr(FPST_FPCR);
-     gen_helper_rintd_exact(tmp, tmp, fpst);
--    neon_store_reg64(tmp, a->vd);
-+    vfp_store_reg64(tmp, a->vd);
-     tcg_temp_free_ptr(fpst);
-     tcg_temp_free_i64(tmp);
-     return true;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
-     vd = tcg_temp_new_i64();
-     vfp_load_reg32(vm, a->vm);
-     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
--    neon_store_reg64(vd, a->vd);
-+    vfp_store_reg64(vd, a->vd);
-     tcg_temp_free_i32(vm);
-     tcg_temp_free_i64(vd);
-     return true;
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
-     vd = tcg_temp_new_i32();
-     vm = tcg_temp_new_i64();
--    neon_load_reg64(vm, a->vm);
-+    vfp_load_reg64(vm, a->vm);
-     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
-     vfp_store_reg32(vd, a->vd);
-     tcg_temp_free_i32(vd);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
-         /* u32 -> f64 */
-         gen_helper_vfp_uitod(vd, vm, fpst);
-     }
--    neon_store_reg64(vd, a->vd);
-+    vfp_store_reg64(vd, a->vd);
-     tcg_temp_free_i32(vm);
-     tcg_temp_free_i64(vd);
-     tcg_temp_free_ptr(fpst);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
-     vm = tcg_temp_new_i64();
-     vd = tcg_temp_new_i32();
--    neon_load_reg64(vm, a->vm);
-+    vfp_load_reg64(vm, a->vm);
-     gen_helper_vjcvt(vd, vm, cpu_env);
-     vfp_store_reg32(vd, a->vd);
-     tcg_temp_free_i64(vm);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
-     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
-     vd = tcg_temp_new_i64();
--    neon_load_reg64(vd, a->vd);
-+    vfp_load_reg64(vd, a->vd);
-     fpst = fpstatus_ptr(FPST_FPCR);
-     shift = tcg_const_i32(frac_bits);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
-         g_assert_not_reached();
-     }
--    neon_store_reg64(vd, a->vd);
-+    vfp_store_reg64(vd, a->vd);
-     tcg_temp_free_i64(vd);
-     tcg_temp_free_i32(shift);
-     tcg_temp_free_ptr(fpst);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
-     fpst = fpstatus_ptr(FPST_FPCR);
-     vm = tcg_temp_new_i64();
-     vd = tcg_temp_new_i32();
--    neon_load_reg64(vm, a->vm);
-+    vfp_load_reg64(vm, a->vm);
-     if (a->s) {
-         if (a->rz) {
 --
-.20.1
+.25.1
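
The clamp-and-flag logic that patch 10/26 adds to aa64_va_parameters() can be sketched on its own. The TszResult type and the clamp_tsz() and input_size_bits() helpers are hypothetical names used only for illustration. In the patch the flag lives in ARMVAParameters, and get_phys_addr_lpae() raises a Translation fault when it is set.

```c
#include <stdbool.h>

/* Hypothetical result type; the patch stores these in ARMVAParameters. */
typedef struct {
    int tsz;       /* TxSZ after clamping into [min_tsz, max_tsz] */
    bool tsz_oob;  /* true if the programmed value was out of range */
} TszResult;

/* Clamp a programmed TxSZ value and record whether clamping happened. */
static TszResult clamp_tsz(int tsz, int min_tsz, int max_tsz)
{
    TszResult r = { .tsz = tsz, .tsz_oob = false };

    if (tsz > max_tsz) {
        r.tsz = max_tsz;
        r.tsz_oob = true;
    } else if (tsz < min_tsz) {
        r.tsz = min_tsz;
        r.tsz_oob = true;
    }
    return r;
}

/* Input address size in bits, as computed in the patch: 64 - tsz. */
static int input_size_bits(int tsz)
{
    return 64 - tsz;
}
```

With min_tsz = 16 and max_tsz = 39 (the values visible in the hunk), a programmed value of 25 passes through unchanged, while 48 is clamped to 39 and flagged.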

-[PULL 08/26] target/arm: Add read/write_neon_element64
+[PULL 11/26] target/arm: Move arm_pamax out of line
 From: Richard Henderson <richard.henderson@linaro.org>
-Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.
+We will shortly share parts of this function with other portions
 of address translation.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
+Message-id: 20220301215958.157011-5-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          | 26 +++++++++
+ target/arm/internals.h | 19 +------------------
- target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
+ target/arm/helper.c    | 22 ++++++++++++++++++++++
-files changed, 73 insertions(+), 47 deletions(-)
+files changed, 23 insertions(+), 18 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/internals.h
-+++ b/target/arm/translate.c
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
+@@ -XXX,XX +XXX,XX @@ static inline void update_spsel(CPUARMState *env, uint32_t imm)
-     }
+  * Returns the implementation defined bit-width of physical addresses.
   * The ARMv8 reference manuals refer to this as PAMax().
   */
 -static inline unsigned int arm_pamax(ARMCPU *cpu)
 -{
 -    static const unsigned int pamax_map[] = {
 -        [0] = 32,
 -        [1] = 36,
 -        [2] = 40,
 -        [3] = 42,
 -        [4] = 44,
 -        [5] = 48,
 -    };
 -    unsigned int parange =
 -        FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
 -
 -    /* id_aa64mmfr0 is a read-only register so values outside of the
 -     * supported mappings can be considered an implementation error.  */
 -    assert(parange < ARRAY_SIZE(pamax_map));
 -    return pamax_map[parange];
 -}
 +unsigned int arm_pamax(ARMCPU *cpu);
  /* Return true if extended addresses are enabled.
   * This is always the case if our translation regime is 64 bit,
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
  }
+ #endif /* !CONFIG_USER_ONLY */
-+static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
 +/* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
 +unsigned int arm_pamax(ARMCPU *cpu)
 +{
-+    long off = neon_element_offset(reg, ele, memop);
++    static const unsigned int pamax_map[] = {
 +        [0] = 32,
 +        [1] = 36,
 +        [2] = 40,
 +        [3] = 42,
 +        [4] = 44,
 +        [5] = 48,
 +    };
 +    unsigned int parange =
 +        FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
 +
-+    switch (memop) {
++    /*
-+    case MO_Q:
++     * id_aa64mmfr0 is a read-only register so values outside of the
-+        tcg_gen_ld_i64(dest, cpu_env, off);
++     * supported mappings can be considered an implementation error.
-+        break;
++     */
-+    default:
++    assert(parange < ARRAY_SIZE(pamax_map));
-+        g_assert_not_reached();
++    return pamax_map[parange];
 +    }
 +}
 +
- static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
+ static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
  {
-     long off = neon_element_offset(reg, ele, memop);
+     if (regime_has_2_ranges(mmu_idx)) {
@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
      }
  }
 +static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
 +{
 +    long off = neon_element_offset(reg, ele, memop);
 +
 +    switch (memop) {
 +    case MO_64:
 +        tcg_gen_st_i64(src, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
  static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
  {
      TCGv_ptr ret = tcg_temp_new_ptr();
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
      for (pass = 0; pass < a->q + 1; pass++) {
          TCGv_i64 tmp = tcg_temp_new_i64();
 -        neon_load_reg64(tmp, a->vm + pass);
 +        read_neon_element64(tmp, a->vm, pass, MO_64);
          fn(tmp, cpu_env, tmp, constimm);
 -        neon_store_reg64(tmp, a->vd + pass);
 +        write_neon_element64(tmp, a->vd, pass, MO_64);
          tcg_temp_free_i64(tmp);
      }
      tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
      rd = tcg_temp_new_i32();
      /* Load both inputs first to avoid potential overwrite if rm == rd */
 -    neon_load_reg64(rm1, a->vm);
 -    neon_load_reg64(rm2, a->vm + 1);
 +    read_neon_element64(rm1, a->vm, 0, MO_64);
 +    read_neon_element64(rm2, a->vm, 1, MO_64);
      shiftfn(rm1, rm1, constimm);
      narrowfn(rd, cpu_env, rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
      }
 -    neon_store_reg64(tmp, a->vd);
 +    write_neon_element64(tmp, a->vd, 0, MO_64);
      widenfn(tmp, rm1);
      tcg_temp_free_i32(rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          tcg_gen_shli_i64(tmp, tmp, a->shift);
          tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
      }
 -    neon_store_reg64(tmp, a->vd + 1);
 +    write_neon_element64(tmp, a->vd, 1, MO_64);
      tcg_temp_free_i64(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm_64 = tcg_temp_new_i64();
      if (src1_wide) {
 -        neon_load_reg64(rn0_64, a->vn);
 +        read_neon_element64(rn0_64, a->vn, 0, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
       * avoid incorrect results if a narrow input overlaps with the result.
       */
      if (src1_wide) {
 -        neon_load_reg64(rn1_64, a->vn + 1);
 +        read_neon_element64(rn1_64, a->vn, 1, MO_64);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      rm = tcg_temp_new_i32();
      read_neon_element32(rm, a->vm, 1, MO_32);
 -    neon_store_reg64(rn0_64, a->vd);
 +    write_neon_element64(rn0_64, a->vd, 0, MO_64);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
 -    neon_store_reg64(rn1_64, a->vd + 1);
 +    write_neon_element64(rn1_64, a->vd, 1, MO_64);
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rn_64, a->vn);
 -    neon_load_reg64(rm_64, a->vm);
 +    read_neon_element64(rn_64, a->vn, 0, MO_64);
 +    read_neon_element64(rm_64, a->vm, 0, MO_64);
      opfn(rn_64, rn_64, rm_64);
      narrowfn(rd0, rn_64);
 -    neon_load_reg64(rn_64, a->vn + 1);
 -    neon_load_reg64(rm_64, a->vm + 1);
 +    read_neon_element64(rn_64, a->vn, 1, MO_64);
 +    read_neon_element64(rm_64, a->vm, 1, MO_64);
      opfn(rn_64, rn_64, rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      /* Don't store results until after all loads: they might overlap */
      if (accfn) {
          tmp = tcg_temp_new_i64();
 -        neon_load_reg64(tmp, a->vd);
 +        read_neon_element64(tmp, a->vd, 0, MO_64);
          accfn(tmp, tmp, rd0);
 -        neon_store_reg64(tmp, a->vd);
 -        neon_load_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 0, MO_64);
 +        read_neon_element64(tmp, a->vd, 1, MO_64);
          accfn(tmp, tmp, rd1);
 -        neon_store_reg64(tmp, a->vd + 1);
 +        write_neon_element64(tmp, a->vd, 1, MO_64);
          tcg_temp_free_i64(tmp);
      } else {
 -        neon_store_reg64(rd0, a->vd);
 -        neon_store_reg64(rd1, a->vd + 1);
 +        write_neon_element64(rd0, a->vd, 0, MO_64);
 +        write_neon_element64(rd1, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      if (accfn) {
          TCGv_i64 t64 = tcg_temp_new_i64();
 -        neon_load_reg64(t64, a->vd);
 +        read_neon_element64(t64, a->vd, 0, MO_64);
          accfn(t64, t64, rn0_64);
 -        neon_store_reg64(t64, a->vd);
 -        neon_load_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 0, MO_64);
 +        read_neon_element64(t64, a->vd, 1, MO_64);
          accfn(t64, t64, rn1_64);
 -        neon_store_reg64(t64, a->vd + 1);
 +        write_neon_element64(t64, a->vd, 1, MO_64);
          tcg_temp_free_i64(t64);
      } else {
 -        neon_store_reg64(rn0_64, a->vd);
 -        neon_store_reg64(rn1_64, a->vd + 1);
 +        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 +        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
      tcg_temp_free_i64(rn0_64);
      tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          right = tcg_temp_new_i64();
          dest = tcg_temp_new_i64();
 -        neon_load_reg64(right, a->vn);
 -        neon_load_reg64(left, a->vm);
 +        read_neon_element64(right, a->vn, 0, MO_64);
 +        read_neon_element64(left, a->vm, 0, MO_64);
          tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
 -        neon_store_reg64(dest, a->vd);
 +        write_neon_element64(dest, a->vd, 0, MO_64);
          tcg_temp_free_i64(left);
          tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
          destright = tcg_temp_new_i64();
          if (a->imm < 8) {
 -            neon_load_reg64(right, a->vn);
 -            neon_load_reg64(middle, a->vn + 1);
 +            read_neon_element64(right, a->vn, 0, MO_64);
 +            read_neon_element64(middle, a->vn, 1, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
 -            neon_load_reg64(left, a->vm);
 +            read_neon_element64(left, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
          } else {
 -            neon_load_reg64(right, a->vn + 1);
 -            neon_load_reg64(middle, a->vm);
 +            read_neon_element64(right, a->vn, 1, MO_64);
 +            read_neon_element64(middle, a->vm, 0, MO_64);
              tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
 -            neon_load_reg64(left, a->vm + 1);
 +            read_neon_element64(left, a->vm, 1, MO_64);
              tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
          }
 -        neon_store_reg64(destright, a->vd);
 -        neon_store_reg64(destleft, a->vd + 1);
 +        write_neon_element64(destright, a->vd, 0, MO_64);
 +        write_neon_element64(destleft, a->vd, 1, MO_64);
          tcg_temp_free_i64(destright);
          tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          if (accfn) {
              TCGv_i64 tmp64 = tcg_temp_new_i64();
 -            neon_load_reg64(tmp64, a->vd + pass);
 +            read_neon_element64(tmp64, a->vd, pass, MO_64);
              accfn(rd_64, tmp64, rd_64);
              tcg_temp_free_i64(tmp64);
          }
 -        neon_store_reg64(rd_64, a->vd + pass);
 +        write_neon_element64(rd_64, a->vd, pass, MO_64);
          tcg_temp_free_i64(rd_64);
      }
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      rd0 = tcg_temp_new_i32();
      rd1 = tcg_temp_new_i32();
 -    neon_load_reg64(rm, a->vm);
 +    read_neon_element64(rm, a->vm, 0, MO_64);
      narrowfn(rd0, cpu_env, rm);
 -    neon_load_reg64(rm, a->vm + 1);
 +    read_neon_element64(rm, a->vm, 1, MO_64);
      narrowfn(rd1, cpu_env, rm);
      write_neon_element32(rd0, a->vd, 0, MO_32);
      write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd);
 +    write_neon_element64(rd, a->vd, 0, MO_64);
      widenfn(rd, rm1);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
 -    neon_store_reg64(rd, a->vd + 1);
 +    write_neon_element64(rd, a->vd, 1, MO_64);
      tcg_temp_free_i64(rd);
      tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
      rm = tcg_temp_new_i64();
      rd = tcg_temp_new_i64();
      for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        neon_load_reg64(rm, a->vm + pass);
 -        neon_load_reg64(rd, a->vd + pass);
 -        neon_store_reg64(rm, a->vd + pass);
 -        neon_store_reg64(rd, a->vm + pass);
 +        read_neon_element64(rm, a->vm, pass, MO_64);
 +        read_neon_element64(rd, a->vd, pass, MO_64);
 +        write_neon_element64(rm, a->vd, pass, MO_64);
 +        write_neon_element64(rd, a->vm, pass, MO_64);
      }
      tcg_temp_free_i64(rm);
      tcg_temp_free_i64(rd);
 --
-.20.1
+.25.1
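
The lookup that patch 11/26 moves out of line can be illustrated without the QEMU types. This sketch takes the PARANGE field value directly rather than an ARMCPU pointer. The name pamax_from_parange is hypothetical. The table values are the ones shown in the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical standalone version of the PARANGE -> PAMax mapping.
 * Takes the already-extracted ID_AA64MMFR0.PARANGE field value.
 */
static unsigned int pamax_from_parange(unsigned int parange)
{
    static const unsigned int pamax_map[] = {
        [0] = 32,
        [1] = 36,
        [2] = 40,
        [3] = 42,
        [4] = 44,
        [5] = 48,
    };

    /* Values outside the table are treated as an implementation error. */
    assert(parange < sizeof(pamax_map) / sizeof(pamax_map[0]));
    return pamax_map[parange];
}
```

For example, a PARANGE value of 5 maps to a 48-bit physical address width, and 0 maps to 32 bits.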

-[PULL 07/26] target/arm: Rename neon_load_reg32 to vfp_load_reg32
+[PULL 12/26] target/arm: Pass outputsize down to check_s2_mmu_setup
 From: Richard Henderson <richard.henderson@linaro.org>
-The only uses of this function are for loading VFP
+Pass down the width of the output address from translation.
-single-precision values, and nothing to do with NEON.
+For now this is still just PAMax, but a subsequent patch will
 compute the correct value from TCR_ELx.{I}PS.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
+Message-id: 20220301215958.157011-6-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c         |   4 +-
+ target/arm/helper.c | 21 ++++++++++-----------
- target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
+file changed, 10 insertions(+), 11 deletions(-)
 files changed, 94 insertions(+), 94 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
+@@ -XXX,XX +XXX,XX @@ do_fault:
-     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
+  * false otherwise.
- }
+  */
+ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
--static inline void neon_load_reg32(TCGv_i32 var, int reg)
+-                               int inputsize, int stride)
-+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
++                               int inputsize, int stride, int outputsize)
  {
-     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
+     const int grainsize = stride + 3;
- }
+     int startsizecheck;
+@@ -XXX,XX +XXX,XX @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
 -static inline void neon_store_reg32(TCGv_i32 var, int reg)
 +static inline void vfp_store_reg32(TCGv_i32 var, int reg)
  {
      tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
  }
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          frn = tcg_temp_new_i32();
          frm = tcg_temp_new_i32();
          dest = tcg_temp_new_i32();
 -        neon_load_reg32(frn, rn);
 -        neon_load_reg32(frm, rm);
 +        vfp_load_reg32(frn, rn);
 +        vfp_load_reg32(frm, rm);
          switch (a->cc) {
          case 0: /* eq: Z */
              tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
          if (sz == 1) {
              tcg_gen_andi_i32(dest, dest, 0xffff);
          }
 -        neon_store_reg32(dest, rd);
 +        vfp_store_reg32(dest, rd);
          tcg_temp_free_i32(frn);
          tcg_temp_free_i32(frm);
          tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
          TCGv_i32 tcg_res;
          tcg_op = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_op, rm);
 +        vfp_load_reg32(tcg_op, rm);
          if (sz == 1) {
              gen_helper_rinth(tcg_res, tcg_op, fpst);
          } else {
              gen_helper_rints(tcg_res, tcg_op, fpst);
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_op);
          tcg_temp_free_i32(tcg_res);
      }
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
-             gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
+     if (is_aa64) {
-         }
+-        CPUARMState *env = &cpu->env;
-         tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
+-        unsigned int pamax = arm_pamax(cpu);
--        neon_store_reg32(tcg_tmp, rd);
+-
-+        vfp_store_reg32(tcg_tmp, rd);
+         switch (stride) {
-         tcg_temp_free_i32(tcg_tmp);
+         case 13: /* 64KB Pages.  */
-         tcg_temp_free_i64(tcg_res);
+-            if (level == 0 || (level == 1 && pamax <= 42)) {
-         tcg_temp_free_i64(tcg_double);
++            if (level == 0 || (level == 1 && outputsize <= 42)) {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
+                 return false;
          TCGv_i32 tcg_single, tcg_res;
          tcg_single = tcg_temp_new_i32();
          tcg_res = tcg_temp_new_i32();
 -        neon_load_reg32(tcg_single, rm);
 +        vfp_load_reg32(tcg_single, rm);
          if (sz == 1) {
              if (is_signed) {
                  gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                  gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
              }
          }
 -        neon_store_reg32(tcg_res, rd);
 +        vfp_store_reg32(tcg_res, rd);
          tcg_temp_free_i32(tcg_res);
          tcg_temp_free_i32(tcg_single);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
      if (a->l) {
          /* VFP to general purpose register */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vn);
 +        vfp_load_reg32(tmp, a->vn);
          tcg_gen_andi_i32(tmp, tmp, 0xffff);
          store_reg(s, a->rt, tmp);
      } else {
          /* general purpose register to VFP */
          tmp = load_reg(s, a->rt);
          tcg_gen_andi_i32(tmp, tmp, 0xffff);
 -        neon_store_reg32(tmp, a->vn);
 +        vfp_store_reg32(tmp, a->vn);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
      if (a->l) {
          /* VFP to general purpose register */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vn);
 +        vfp_load_reg32(tmp, a->vn);
          if (a->rt == 15) {
              /* Set the 4 flag bits in the CPSR.  */
              gen_set_nzcv(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
      } else {
          /* general purpose register to VFP */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vn);
 +        vfp_store_reg32(tmp, a->vn);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
      if (a->op) {
          /* fpreg to gpreg */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm);
 +        vfp_load_reg32(tmp, a->vm);
          store_reg(s, a->rt, tmp);
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm + 1);
 +        vfp_load_reg32(tmp, a->vm + 1);
          store_reg(s, a->rt2, tmp);
      } else {
          /* gpreg to fpreg */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vm);
 +        vfp_store_reg32(tmp, a->vm);
          tcg_temp_free_i32(tmp);
          tmp = load_reg(s, a->rt2);
 -        neon_store_reg32(tmp, a->vm + 1);
 +        vfp_store_reg32(tmp, a->vm + 1);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
      if (a->op) {
          /* fpreg to gpreg */
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm * 2);
 +        vfp_load_reg32(tmp, a->vm * 2);
          store_reg(s, a->rt, tmp);
          tmp = tcg_temp_new_i32();
 -        neon_load_reg32(tmp, a->vm * 2 + 1);
 +        vfp_load_reg32(tmp, a->vm * 2 + 1);
          store_reg(s, a->rt2, tmp);
      } else {
          /* gpreg to fpreg */
          tmp = load_reg(s, a->rt);
 -        neon_store_reg32(tmp, a->vm * 2);
 +        vfp_store_reg32(tmp, a->vm * 2);
          tcg_temp_free_i32(tmp);
          tmp = load_reg(s, a->rt2);
 -        neon_store_reg32(tmp, a->vm * 2 + 1);
 +        vfp_store_reg32(tmp, a->vm * 2 + 1);
          tcg_temp_free_i32(tmp);
      }
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      tmp = tcg_temp_new_i32();
      if (a->l) {
          gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg32(tmp, a->vd);
 +        vfp_store_reg32(tmp, a->vd);
      } else {
 -        neon_load_reg32(tmp, a->vd);
 +        vfp_load_reg32(tmp, a->vd);
          gen_aa32_st16(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
      tmp = tcg_temp_new_i32();
      if (a->l) {
          gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
 -        neon_store_reg32(tmp, a->vd);
 +        vfp_store_reg32(tmp, a->vd);
      } else {
 -        neon_load_reg32(tmp, a->vd);
 +        vfp_load_reg32(tmp, a->vd);
          gen_aa32_st32(s, tmp, addr, get_mem_index(s));
      }
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
          if (a->l) {
              /* load */
              gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
 -            neon_store_reg32(tmp, a->vd + i);
 +            vfp_store_reg32(tmp, a->vd + i);
          } else {
              /* store */
 -            neon_load_reg32(tmp, a->vd + i);
 +            vfp_load_reg32(tmp, a->vd + i);
              gen_aa32_st32(s, tmp, addr, get_mem_index(s));
          }
          tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
      fd = tcg_temp_new_i32();
      fpst = fpstatus_ptr(FPST_FPCR);
 -    neon_load_reg32(f0, vn);
 -    neon_load_reg32(f1, vm);
 +    vfp_load_reg32(f0, vn);
 +    vfp_load_reg32(f1, vm);
      for (;;) {
          if (reads_vd) {
 -            neon_load_reg32(fd, vd);
 +            vfp_load_reg32(fd, vd);
          }
          fn(fd, f0, f1, fpst);
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
          veclen--;
          vd = vfp_advance_sreg(vd, delta_d);
          vn = vfp_advance_sreg(vn, delta_d);
 -        neon_load_reg32(f0, vn);
 +        vfp_load_reg32(f0, vn);
          if (delta_m) {
              vm = vfp_advance_sreg(vm, delta_m);
 -            neon_load_reg32(f1, vm);
 +            vfp_load_reg32(f1, vm);
          }
      }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
      fd = tcg_temp_new_i32();
      fpst = fpstatus_ptr(FPST_FPCR_F16);
 -    neon_load_reg32(f0, vn);
 -    neon_load_reg32(f1, vm);
 +    vfp_load_reg32(f0, vn);
 +    vfp_load_reg32(f1, vm);
      if (reads_vd) {
 -        neon_load_reg32(fd, vd);
 +        vfp_load_reg32(fd, vd);
      }
      fn(fd, f0, f1, fpst);
 -    neon_store_reg32(fd, vd);
 +    vfp_store_reg32(fd, vd);
      tcg_temp_free_i32(f0);
      tcg_temp_free_i32(f1);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
      f0 = tcg_temp_new_i32();
      fd = tcg_temp_new_i32();
 -    neon_load_reg32(f0, vm);
 +    vfp_load_reg32(f0, vm);
      for (;;) {
          fn(fd, f0);
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
              /* single source one-many */
              while (veclen--) {
                  vd = vfp_advance_sreg(vd, delta_d);
 -                neon_store_reg32(fd, vd);
 +                vfp_store_reg32(fd, vd);
              }
              break;
+         case 11: /* 16KB Pages.  */
+-            if (level == 0 || (level == 1 && pamax <= 40)) {
++            if (level == 0 || (level == 1 && outputsize <= 40)) {
+                 return false;
+             }
+             break;
+         case 9: /* 4KB Pages.  */
+-            if (level == 0 && pamax <= 42) {
++            if (level == 0 && outputsize <= 42) {
+                 return false;
+             }
+             break;
+@@ -XXX,XX +XXX,XX @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
          }
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
-         veclen--;
+         /* Inputsize checks.  */
-         vd = vfp_advance_sreg(vd, delta_d);
+-        if (inputsize > pamax &&
-         vm = vfp_advance_sreg(vm, delta_m);
+-            (arm_el_is_aa64(env, 1) || inputsize > 40)) {
--        neon_load_reg32(f0, vm);
++        if (inputsize > outputsize &&
-+        vfp_load_reg32(f0, vm);
++            (arm_el_is_aa64(&cpu->env, 1) || inputsize > 40)) {
              /* This is CONSTRAINED UNPREDICTABLE and we choose to fault.  */
              return false;
          }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
      target_ulong page_size;
      uint32_t attrs;
      int32_t stride;
 -    int addrsize, inputsize;
 +    int addrsize, inputsize, outputsize;
      TCR *tcr = regime_tcr(env, mmu_idx);
      int ap, ns, xn, pxn;
      uint32_t el = regime_el(env, mmu_idx);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
          addrsize = 64 - 8 * param.tbi;
          inputsize = 64 - param.tsz;
 +        outputsize = arm_pamax(cpu);
      } else {
          param = aa32_va_parameters(env, address, mmu_idx);
          level = 1;
          addrsize = (mmu_idx == ARMMMUIdx_Stage2 ? 40 : 32);
          inputsize = addrsize - param.tsz;
 +        outputsize = 40;
      }
-     tcg_temp_free_i32(f0);
+     /*
-@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-     }
+         /* Check that the starting level is valid. */
-     f0 = tcg_temp_new_i32();
+         ok = check_s2_mmu_setup(cpu, aarch64, startlevel,
--    neon_load_reg32(f0, vm);
+-                                inputsize, stride);
-+    vfp_load_reg32(f0, vm);
++                                inputsize, stride, outputsize);
-     fn(f0, f0);
+         if (!ok) {
--    neon_store_reg32(f0, vd);
+             fault_type = ARMFault_Translation;
-+    vfp_store_reg32(f0, vd);
+             goto do_fault;
      tcg_temp_free_i32(f0);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vn, a->vn);
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vn, a->vn);
 +    vfp_load_reg32(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negh(vn, vn);
      }
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negh(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vn, a->vn);
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vn, a->vn);
 +    vfp_load_reg32(vm, a->vm);
      if (neg_n) {
          /* VFNMS, VFMS */
          gen_helper_vfp_negs(vn, vn);
      }
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (neg_d) {
          /* VFNMA, VFNMS */
          gen_helper_vfp_negs(vd, vd);
      }
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
      }
      fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
 -    neon_store_reg32(fd, a->vd);
 +    vfp_store_reg32(fd, a->vd);
      tcg_temp_free_i32(fd);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
      fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
      for (;;) {
 -        neon_store_reg32(fd, vd);
 +        vfp_store_reg32(fd, vd);
          if (veclen == 0) {
              break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i32(vm, 0);
      } else {
 -        neon_load_reg32(vm, a->vm);
 +        vfp_load_reg32(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
      vd = tcg_temp_new_i32();
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      if (a->z) {
          tcg_gen_movi_i32(vm, 0);
      } else {
 -        neon_load_reg32(vm, a->vm);
 +        vfp_load_reg32(vm, a->vm);
      }
      if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
      /* The T bit tells us if we want the low or high 16 bits of Vm */
      tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_i32(ahp_mode);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
      ahp_mode = get_ahp_flag();
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
      tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
      tcg_temp_free_i32(ahp_mode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_rinth(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rinth(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      tcg_rmode = tcg_const_i32(float_round_to_zero);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
      gen_helper_rints(tmp, tmp, fpst);
      gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tcg_rmode);
      tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      gen_helper_rinth_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
      }
      tmp = tcg_temp_new_i32();
 -    neon_load_reg32(tmp, a->vm);
 +    vfp_load_reg32(tmp, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      gen_helper_rints_exact(tmp, tmp, fpst);
 -    neon_store_reg32(tmp, a->vd);
 +    vfp_store_reg32(tmp, a->vd);
      tcg_temp_free_ptr(fpst);
      tcg_temp_free_i32(tmp);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      gen_helper_vfp_fcvtds(vd, vm, cpu_env);
      neon_store_reg64(vd, a->vd);
      tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
      vm = tcg_temp_new_i64();
      neon_load_reg64(vm, a->vm);
      gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      if (a->s) {
          /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f16 */
          gen_helper_vfp_uitoh(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
      }
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
          /* u32 -> f32 */
          gen_helper_vfp_uitos(vm, vm, fpst);
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
      vm = tcg_temp_new_i32();
      vd = tcg_temp_new_i64();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      fpst = fpstatus_ptr(FPST_FPCR);
      if (a->s) {
          /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
      vd = tcg_temp_new_i32();
      neon_load_reg64(vm, a->vm);
      gen_helper_vjcvt(vd, vm, cpu_env);
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_i32(vd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
      frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
      vd = tcg_temp_new_i32();
 -    neon_load_reg32(vd, a->vd);
 +    vfp_load_reg32(vd, a->vd);
      fpst = fpstatus_ptr(FPST_FPCR);
      shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
          g_assert_not_reached();
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i32(shift);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR_F16);
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      if (a->s) {
          if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
              gen_helper_vfp_touih(vm, vm, fpst);
          }
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
      fpst = fpstatus_ptr(FPST_FPCR);
      vm = tcg_temp_new_i32();
 -    neon_load_reg32(vm, a->vm);
 +    vfp_load_reg32(vm, a->vm);
      if (a->s) {
          if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
              gen_helper_vfp_touis(vm, vm, fpst);
          }
      }
 -    neon_store_reg32(vm, a->vd);
 +    vfp_store_reg32(vm, a->vd);
      tcg_temp_free_i32(vm);
      tcg_temp_free_ptr(fpst);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
              gen_helper_vfp_touid(vd, vm, fpst);
          }
      }
 -    neon_store_reg32(vd, a->vd);
 +    vfp_store_reg32(vd, a->vd);
      tcg_temp_free_i32(vd);
      tcg_temp_free_i64(vm);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
      /* Insert low half of Vm into high half of Vd */
      rm = tcg_temp_new_i32();
      rd = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 -    neon_load_reg32(rd, a->vd);
 +    vfp_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rd, a->vd);
      tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
 -    neon_store_reg32(rd, a->vd);
 +    vfp_store_reg32(rd, a->vd);
      tcg_temp_free_i32(rm);
      tcg_temp_free_i32(rd);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
      /* Set Vd to high half of Vm */
      rm = tcg_temp_new_i32();
 -    neon_load_reg32(rm, a->vm);
 +    vfp_load_reg32(rm, a->vm);
      tcg_gen_shri_i32(rm, rm, 16);
 -    neon_store_reg32(rm, a->vd);
 +    vfp_store_reg32(rm, a->vd);
      tcg_temp_free_i32(rm);
      return true;
  }
 --
-.20.1
+.25.1
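
Note on the check_s2_mmu_setup change above: the visible hunks swap the
CPU-wide pamax for a caller-supplied outputsize in the AArch64 start-level
and input-size checks. The following is a minimal standalone sketch of just
those checks, not the QEMU function itself. The name s2_start_level_ok and
the el1_is_aa64 flag (standing in for arm_el_is_aa64(&cpu->env, 1)) are
hypothetical, and the remaining checks of the real function are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified restatement of the AArch64 branch shown in the
 * diff. Returns false where the real code would report a translation fault.
 */
bool s2_start_level_ok(int level, int inputsize, int stride,
                       int outputsize, bool el1_is_aa64)
{
    switch (stride) {
    case 13: /* 64KB pages */
        if (level == 0 || (level == 1 && outputsize <= 42)) {
            return false;
        }
        break;
    case 11: /* 16KB pages */
        if (level == 0 || (level == 1 && outputsize <= 40)) {
            return false;
        }
        break;
    case 9: /* 4KB pages */
        if (level == 0 && outputsize <= 42) {
            return false;
        }
        break;
    default:
        /* Other strides are not covered by this sketch. */
        return false;
    }

    /* Input size larger than the output size: the code chooses to fault. */
    if (inputsize > outputsize && (el1_is_aa64 || inputsize > 40)) {
        return false;
    }
    return true;
}

/* A few spot checks of the sketch. */
void s2_check_examples(void)
{
    assert(s2_start_level_ok(0, 40, 9, 48, true));   /* 4KB, wide output */
    assert(!s2_start_level_ok(0, 40, 9, 40, true));  /* 4KB, narrow output */
    assert(!s2_start_level_ok(1, 40, 13, 42, true)); /* 64KB, level 1 */
    assert(!s2_start_level_ok(1, 48, 9, 40, true));  /* inputsize too big */
}
```

Passing the size as a parameter means the same checks can later be driven
by a value derived from TCR_ELx.{I}PS without touching the function body.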

-[PULL 17/26] hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
+[PULL 13/26] target/arm: Use MAKE_64BIT_MASK to compute indexmask
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
+The macro is a bit more readable than the inlined computation.
 This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):
-  CID 1432363 (#1 of 1): Unintentional integer overflow:
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-  overflow_before_widen:
+Message-id: 20220301215958.157011-7-richard.henderson@linaro.org
     Potentially overflowing expression 1 << scale with type int
     (32 bits, signed) is evaluated using 32-bit arithmetic, and
     then used in a context that expects an expression of type
     hwaddr (64 bits, unsigned).
 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Acked-by: Eric Auger <eric.auger@redhat.com>
 Message-id: 20201030144617.1535064-1-philmd@redhat.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/smmuv3.c | 3 ++-
+ target/arm/helper.c | 4 ++--
-file changed, 2 insertions(+), 1 deletion(-)
+file changed, 2 insertions(+), 2 deletions(-)
-diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/smmuv3.c
+--- a/target/arm/helper.c
-+++ b/hw/arm/smmuv3.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-  */
+         level = startlevel;
  #include "qemu/osdep.h"
 +#include "qemu/bitops.h"
  #include "hw/irq.h"
  #include "hw/sysbus.h"
  #include "migration/vmstate.h"
@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
          scale = CMD_SCALE(cmd);
          num = CMD_NUM(cmd);
          ttl = CMD_TTL(cmd);
 -        num_pages = (num + 1) * (1 << (scale));
 +        num_pages = (num + 1) * BIT_ULL(scale);
      }
-     if (type == SMMU_CMD_TLBI_NH_VA) {
+-    indexmask_grainsize = (1ULL << (stride + 3)) - 1;
 -    indexmask = (1ULL << (inputsize - (stride * (4 - level)))) - 1;
 +    indexmask_grainsize = MAKE_64BIT_MASK(0, stride + 3);
 +    indexmask = MAKE_64BIT_MASK(0, inputsize - (stride * (4 - level)));
      /* Now we can extract the actual base address from the TTBR */
      descaddr = extract64(ttbr, 0, 48);
 --
-.20.1
+.25.1
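
Note on the indexmask change above: the patch replaces the open-coded
(1ULL << n) - 1 with MAKE_64BIT_MASK(0, n). The sketch below restates the
macro locally for illustration; the definition is assumed to match the one
in QEMU's include/qemu/bitops.h and should be checked against the tree. The
helpers open_coded_mask and index_mask are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy for illustration; assumed to mirror QEMU's definition. */
#define MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* The form used before the patch. Valid for 0 <= length <= 63. */
static inline uint64_t open_coded_mask(unsigned length)
{
    return (1ULL << length) - 1;
}

/* Hypothetical helper mirroring the indexmask expression in the diff. */
static inline uint64_t index_mask(int inputsize, int stride, int level)
{
    return MAKE_64BIT_MASK(0, inputsize - (stride * (4 - level)));
}

/* Both forms agree wherever both are well defined (length 1..63). */
void mask_check_examples(void)
{
    for (unsigned n = 1; n < 64; n++) {
        assert(MAKE_64BIT_MASK(0, n) == open_coded_mask(n));
    }
    /* 4KB granule: stride 9, so the grain-size mask covers 12 bits. */
    assert(MAKE_64BIT_MASK(0, 9 + 3) == 0xfffULL);
}
```

With this definition a length of 0 would shift by 64, which is undefined
in C, so the sketch only exercises lengths from 1 to 63.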

-[PULL 10/26] target/arm: Simplify do_long_3d and do_2scalar_long
+[PULL 14/26] target/arm: Honor TCR_ELx.{I}PS
 From: Richard Henderson <richard.henderson@linaro.org>
-In both cases, we can sink the write-back and perform
+This field controls the output (intermediate) physical address size
-the accumulate into the normal destination temps.
+of the translation process.  V8 requires raising an AddressSize
 fault if the page tables are programmed incorrectly, such that any
 intermediate descriptor address, or the final translated address,
 is out of range.
+Add a PS field to ARMVAParameters, and properly compute outputsize
+in get_phys_addr_lpae.  Test the descaddr as extracted from TTBR
+and from page table entries.
+Restrict descaddrmask so that we won't raise the fault for v7.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
+Message-id: 20220301215958.157011-8-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-neon.c.inc | 23 +++++++++--------------
+ target/arm/internals.h |  1 +
-file changed, 9 insertions(+), 14 deletions(-)
+ target/arm/helper.c    | 72 ++++++++++++++++++++++++++++++++----------
 files changed, 57 insertions(+), 16 deletions(-)
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
+--- a/target/arm/internals.h
-+++ b/target/arm/translate-neon.c.inc
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t aarch64_pstate_valid_mask(const ARMISARegisters *id)
-     if (accfn) {
+  */
-         tmp = tcg_temp_new_i64();
+ typedef struct ARMVAParameters {
-         read_neon_element64(tmp, a->vd, 0, MO_64);
+     unsigned tsz    : 8;
--        accfn(tmp, tmp, rd0);
++    unsigned ps     : 3;
--        write_neon_element64(tmp, a->vd, 0, MO_64);
+     unsigned select : 1;
-+        accfn(rd0, tmp, rd0);
+     bool tbi        : 1;
-         read_neon_element64(tmp, a->vd, 1, MO_64);
+     bool epd        : 1;
--        accfn(tmp, tmp, rd1);
+diff --git a/target/arm/helper.c b/target/arm/helper.c
--        write_neon_element64(tmp, a->vd, 1, MO_64);
+index XXXXXXX..XXXXXXX 100644
-+        accfn(rd1, tmp, rd1);
+--- a/target/arm/helper.c
-         tcg_temp_free_i64(tmp);
++++ b/target/arm/helper.c
--    } else {
+@@ -XXX,XX +XXX,XX @@ static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
--        write_neon_element64(rd0, a->vd, 0, MO_64);
+ }
--        write_neon_element64(rd1, a->vd, 1, MO_64);
+ #endif /* !CONFIG_USER_ONLY */
 +/* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
 +static const uint8_t pamax_map[] = {
 +    [0] = 32,
 +    [1] = 36,
 +    [2] = 40,
 +    [3] = 42,
 +    [4] = 44,
 +    [5] = 48,
 +};
 +
  /* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
  unsigned int arm_pamax(ARMCPU *cpu)
  {
 -    static const unsigned int pamax_map[] = {
 -        [0] = 32,
 -        [1] = 36,
 -        [2] = 40,
 -        [3] = 42,
 -        [4] = 44,
 -        [5] = 48,
 -    };
      unsigned int parange =
          FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
  {
      uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
      bool epd, hpd, using16k, using64k, tsz_oob;
 -    int select, tsz, tbi, max_tsz, min_tsz;
 +    int select, tsz, tbi, max_tsz, min_tsz, ps;
      if (!regime_has_2_ranges(mmu_idx)) {
          select = 0;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
              hpd = extract32(tcr, 24, 1);
          }
          epd = false;
 +        ps = extract32(tcr, 16, 3);
      } else {
          /*
           * Bit 55 is always between the two regions, and is canonical for
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
              epd = extract32(tcr, 23, 1);
              hpd = extract64(tcr, 42, 1);
          }
 +        ps = extract64(tcr, 32, 3);
      }
-+    write_neon_element64(rd0, a->vd, 0, MO_64);
+     if (cpu_isar_feature(aa64_st, env_archcpu(env))) {
-+    write_neon_element64(rd1, a->vd, 1, MO_64);
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
-     tcg_temp_free_i64(rd0);
-     tcg_temp_free_i64(rd1);
+     return (ARMVAParameters) {
+         .tsz = tsz,
-@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
++        .ps = ps,
-     if (accfn) {
+         .select = select,
-         TCGv_i64 t64 = tcg_temp_new_i64();
+         .tbi = tbi,
-         read_neon_element64(t64, a->vd, 0, MO_64);
+         .epd = epd,
--        accfn(t64, t64, rn0_64);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
--        write_neon_element64(t64, a->vd, 0, MO_64);
-+        accfn(rn0_64, t64, rn0_64);
+     /* TODO: This code does not support shareability levels. */
-         read_neon_element64(t64, a->vd, 1, MO_64);
+     if (aarch64) {
--        accfn(t64, t64, rn1_64);
++        int ps;
 -        write_neon_element64(t64, a->vd, 1, MO_64);
 +        accfn(rn1_64, t64, rn1_64);
          tcg_temp_free_i64(t64);
 -    } else {
 -        write_neon_element64(rn0_64, a->vd, 0, MO_64);
 -        write_neon_element64(rn1_64, a->vd, 1, MO_64);
      }
 +
-+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
+         param = aa64_va_parameters(env, address, mmu_idx,
-+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
+                                    access_type != MMU_INST_FETCH);
-     tcg_temp_free_i64(rn0_64);
+         level = 0;
-     tcg_temp_free_i64(rn1_64);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-     return true;
          addrsize = 64 - 8 * param.tbi;
          inputsize = 64 - param.tsz;
 -        outputsize = arm_pamax(cpu);
 +
 +        /*
 +         * Bound PS by PARANGE to find the effective output address size.
 +         * ID_AA64MMFR0 is a read-only register so values outside of the
 +         * supported mappings can be considered an implementation error.
 +         */
 +        ps = FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
 +        ps = MIN(ps, param.ps);
 +        assert(ps < ARRAY_SIZE(pamax_map));
 +        outputsize = pamax_map[ps];
      } else {
          param = aa32_va_parameters(env, address, mmu_idx);
          level = 1;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
      /* Now we can extract the actual base address from the TTBR */
      descaddr = extract64(ttbr, 0, 48);
 +
 +    /*
 +     * If the base address is out of range, raise AddressSizeFault.
 +     * In the pseudocode, this is !IsZero(baseregister<47:outputsize>),
 +     * but we've just cleared the bits above 47, so simplify the test.
 +     */
 +    if (descaddr >> outputsize) {
 +        level = 0;
 +        fault_type = ARMFault_AddressSize;
 +        goto do_fault;
 +    }
 +
      /*
       * We rely on this masking to clear the RES0 bits at the bottom of the TTBR
       * and also to mask out CnP (bit 0) which could validly be non-zero.
       */
      descaddr &= ~indexmask;
 -    /* The address field in the descriptor goes up to bit 39 for ARMv7
 -     * but up to bit 47 for ARMv8, but we use the descaddrmask
 -     * up to bit 39 for AArch32, because we don't need other bits in that case
 -     * to construct next descriptor address (anyway they should be all zeroes).
 +    /*
 +     * For AArch32, the address field in the descriptor goes up to bit 39
 +     * for both v7 and v8.  However, for v8 the SBZ bits [47:40] must be 0
 +     * or an AddressSize fault is raised.  So for v8 we extract those SBZ
 +     * bits as part of the address, which will be checked via outputsize.
 +     * For AArch64, the address field always goes up to bit 47 (with extra
 +     * bits for FEAT_LPA placed elsewhere).  AArch64 implies v8.
       */
 -    descaddrmask = ((1ull << (aarch64 ? 48 : 40)) - 1) &
 -                   ~indexmask_grainsize;
 +    if (arm_feature(env, ARM_FEATURE_V8)) {
 +        descaddrmask = MAKE_64BIT_MASK(0, 48);
 +    } else {
 +        descaddrmask = MAKE_64BIT_MASK(0, 40);
 +    }
 +    descaddrmask &= ~indexmask_grainsize;
      /* Secure accesses start with the page table in secure memory and
       * can be downgraded to non-secure at any step. Non-secure accesses
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
              /* Invalid, or the Reserved level 3 encoding */
              goto do_fault;
          }
 +
          descaddr = descriptor & descaddrmask;
 +        if (descaddr >> outputsize) {
 +            fault_type = ARMFault_AddressSize;
 +            goto do_fault;
 +        }
          if ((descriptor & 2) && (level < 3)) {
              /* Table entry. The top five bits are attributes which may
 --
-.20.1
+.25.1
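
Reader's note on the PS patch above: the core of the change is that the
output address size becomes the smaller of ID_AA64MMFR0.PARANGE and
TCR_ELx.{I}PS, and that any descriptor address with bits set at or above
that size raises an AddressSize fault. A minimal standalone sketch of that
logic follows. The helper names effective_outputsize() and
address_size_fault() are hypothetical and are not part of QEMU; only the
table contents are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same encoding as pamax_map in the patch: PARANGE/PS value -> address bits. */
static const uint8_t pa_bits_map[] = { 32, 36, 40, 42, 44, 48 };

/* Hypothetical helper: bound TCR_ELx.{I}PS by ID_AA64MMFR0.PARANGE. */
static unsigned effective_outputsize(unsigned parange, unsigned ps)
{
    unsigned idx = ps < parange ? ps : parange;

    /* PARANGE is read-only, so an out-of-range index is a model bug. */
    assert(idx < sizeof(pa_bits_map) / sizeof(pa_bits_map[0]));
    return pa_bits_map[idx];
}

/* Hypothetical helper: true if any bit at or above outputsize is set. */
static bool address_size_fault(uint64_t descaddr, unsigned outputsize)
{
    return (descaddr >> outputsize) != 0;
}
```

With PARANGE = 5 (48 bits) and PS = 2, the sketch yields a 40-bit output
size, so a descriptor address with bit 40 set would be reported as a fault.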

-[PULL 15/26] target/arm: fix LORID_EL1 access check
+[PULL 15/26] target/arm: Prepare DBGBVR and DBGWVR for FEAT_LVA
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
+The original A.a revision of the AArch64 ARM required that we
-future HCR_EL2.TLOR when S-EL2 is enabled.
+force-extend the addresses in these registers from 49 bits.
 This language has been loosened via a combination of IMPLEMENTATION
 DEFINED and CONSTRAINED UNPREDICTABLE to allow consideration of
 the entire aligned address.
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+This means that we do not have to consider whether or not FEAT_LVA
 is enabled, and decide from which bit an address might need to be
 extended.
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220301215958.157011-9-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 19 +++++--------------
+ target/arm/helper.c | 32 ++++++++++++++++++++++++--------
-file changed, 5 insertions(+), 14 deletions(-)
+file changed, 24 insertions(+), 8 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
+@@ -XXX,XX +XXX,XX @@ static void dbgwvr_write(CPUARMState *env, const ARMCPRegInfo *ri,
- #endif
+     ARMCPU *cpu = env_archcpu(env);
+     int i = ri->crm;
- /* Shared logic between LORID and the rest of the LOR* registers.
-- * Secure state has already been delt with.
+-    /* Bits [63:49] are hardwired to the value of bit [48]; that is, the
-+ * Secure state exclusion has already been dealt with.
+-     * register reads and behaves as if values written are sign extended.
-  */
++    /*
--static CPAccessResult access_lor_ns(CPUARMState *env)
+      * Bits [1:0] are RES0.
-+static CPAccessResult access_lor_ns(CPUARMState *env,
++     *
-+                                    const ARMCPRegInfo *ri, bool isread)
++     * It is IMPLEMENTATION DEFINED whether [63:49] ([63:53] with FEAT_LVA)
- {
++     * are hardwired to the value of bit [48] ([52] with FEAT_LVA), or if
-     int el = arm_current_el(env);
++     * they contain the value written.  It is CONSTRAINED UNPREDICTABLE
++     * whether the RESS bits are ignored when comparing an address.
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
++     *
-     return CP_ACCESS_OK;
++     * Therefore we are allowed to compare the entire register, which lets
- }
++     * us avoid considering whether or not FEAT_LVA is actually enabled.
+      */
--static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
+-    value = sextract64(value, 0, 49) & ~3ULL;
--                                   bool isread)
++    value &= ~3ULL;
--{
--    if (arm_is_secure_below_el3(env)) {
+     raw_write(env, ri, value);
--        /* Access ok in secure mode.  */
+     hw_watchpoint_update(cpu, i);
--        return CP_ACCESS_OK;
+@@ -XXX,XX +XXX,XX @@ void hw_breakpoint_update(ARMCPU *cpu, int n)
--    }
+     case 0: /* unlinked address match */
--    return access_lor_ns(env);
+     case 1: /* linked address match */
--}
+     {
--
+-        /* Bits [63:49] are hardwired to the value of bit [48]; that is,
- static CPAccessResult access_lor_other(CPUARMState *env,
+-         * we behave as if the register was sign extended. Bits [1:0] are
-                                        const ARMCPRegInfo *ri, bool isread)
+-         * RES0. The BAS field is used to allow setting breakpoints on 16
- {
+-         * bit wide instructions; it is CONSTRAINED UNPREDICTABLE whether
-@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
++        /*
-         /* Access denied in secure mode.  */
++         * Bits [1:0] are RES0.
-         return CP_ACCESS_TRAP;
++         *
-     }
++         * It is IMPLEMENTATION DEFINED whether bits [63:49]
--    return access_lor_ns(env);
++         * ([63:53] for FEAT_LVA) are hardwired to a copy of the sign bit
-+    return access_lor_ns(env, ri, isread);
++         * of the VA field ([48] or [52] for FEAT_LVA), or whether the
- }
++         * value is read as written.  It is CONSTRAINED UNPREDICTABLE
++         * whether the RESS bits are ignored when comparing an address.
- /*
++         * Therefore we are allowed to compare the entire register, which
-@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
++         * lets us avoid considering whether FEAT_LVA is actually enabled.
-       .type = ARM_CP_CONST, .resetvalue = 0 },
++         *
-     { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
++         * The BAS field is used to allow setting breakpoints on 16-bit
-       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
++         * wide instructions; it is CONSTRAINED UNPREDICTABLE whether
--      .access = PL1_R, .accessfn = access_lorid,
+          * a bp will fire if the addresses covered by the bp and the addresses
-+      .access = PL1_R, .accessfn = access_lor_ns,
+          * covered by the insn overlap but the insn doesn't start at the
-       .type = ARM_CP_CONST, .resetvalue = 0 },
+          * start of the bp address range. We choose to require the insn and
-     REGINFO_SENTINEL
+@@ -XXX,XX +XXX,XX @@ void hw_breakpoint_update(ARMCPU *cpu, int n)
- };
+          * See also figure D2-3 in the v8 ARM ARM (DDI0487A.c).
           */
          int bas = extract64(bcr, 5, 4);
 -        addr = sextract64(bvr, 0, 49) & ~3ULL;
 +        addr = bvr & ~3ULL;
          if (bas == 0) {
              return;
          }
 --
-.20.1
+.25.1
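
Reader's note on the DBGBVR/DBGWVR patch above: the behavioural difference
is between sign-extending the stored address from bit 48 and simply keeping
the written value with the RES0 bits [1:0] cleared. The sketch below
contrasts the two, assuming plain C with no QEMU headers. The function
names are hypothetical, and the first function only mirrors what
sextract64(value, 0, 49) & ~3ULL computed before the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: bits [63:49] become copies of bit 48, bits [1:0] cleared. */
static uint64_t dbg_addr_sign_extended(uint64_t value)
{
    const uint64_t sign = 1ULL << 48;
    uint64_t low = value & ((1ULL << 49) - 1);

    /* XOR/subtract sign extension, well defined on unsigned types. */
    return ((low ^ sign) - sign) & ~3ULL;
}

/* New behaviour: keep the whole register, clear only the RES0 bits [1:0]. */
static uint64_t dbg_addr_as_written(uint64_t value)
{
    return value & ~3ULL;
}
```

The two functions agree for canonical 48-bit addresses and differ only when
bits [63:49] are not already a copy of bit 48, which is the case the
architecture now leaves IMPLEMENTATION DEFINED.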

-[PULL 02/26] target/arm: Move neon_element_offset to translate.c
+[PULL 16/26] target/arm: Implement FEAT_LVA
 From: Richard Henderson <richard.henderson@linaro.org>
-This will shortly have users outside of translate-neon.c.inc.
+This feature is relatively small, as it applies only to
 64k pages and thus requires no additional changes to the
 table descriptor walking algorithm, only a change to the
 minimum TSZ (which is the inverse of the maximum virtual
 address space size).
+Note that this feature widens VBAR_ELx, but we already
+treat the register as being 64 bits wide.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
+Message-id: 20220301215958.157011-10-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          | 20 ++++++++++++++++++++
+ docs/system/arm/emulation.rst | 1 +
- target/arm/translate-neon.c.inc | 19 -------------------
+ target/arm/cpu-param.h        | 2 +-
-files changed, 20 insertions(+), 19 deletions(-)
+ target/arm/cpu.h              | 5 +++++
  target/arm/cpu64.c            | 1 +
  target/arm/helper.c           | 9 ++++++++-
 files changed, 16 insertions(+), 2 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/docs/system/arm/emulation.rst
-+++ b/target/arm/translate.c
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
-     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+ - FEAT_LRCPC (Load-acquire RCpc instructions)
  - FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
  - FEAT_LSE (Large System Extensions)
 +- FEAT_LVA (Large Virtual Address space)
  - FEAT_MTE (Memory Tagging Extension)
  - FEAT_MTE2 (Memory Tagging Extension)
  - FEAT_MTE3 (MTE Asymmetric Fault Handling)
 diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu-param.h
 +++ b/target/arm/cpu-param.h
@@ -XXX,XX +XXX,XX @@
  #ifdef TARGET_AARCH64
  # define TARGET_LONG_BITS             64
  # define TARGET_PHYS_ADDR_SPACE_BITS  48
 -# define TARGET_VIRT_ADDR_SPACE_BITS  48
 +# define TARGET_VIRT_ADDR_SPACE_BITS  52
  #else
  # define TARGET_LONG_BITS             32
  # define TARGET_PHYS_ADDR_SPACE_BITS  40
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
      return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
  }
-+/*
++static inline bool isar_feature_aa64_lva(const ARMISARegisters *id)
 + * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
 + * where 0 is the least significant end of the register.
 + */
 +static long neon_element_offset(int reg, int element, MemOp size)
 +{
-+    int element_size = 1 << size;
++    return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, VARANGE) != 0;
 +    int ofs = element * element_size;
 +#ifdef HOST_WORDS_BIGENDIAN
 +    /*
 +     * Calculate the offset assuming fully little-endian,
 +     * then XOR to account for the order of the 8-byte units.
 +     */
 +    if (element_size < 8) {
 +        ofs ^= 8 - element_size;
 +    }
 +#endif
 +    return neon_full_reg_offset(reg) + ofs;
 +}
 +
- static inline long vfp_reg_offset(bool dp, unsigned reg)
+ static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
  {
-     if (dp) {
+     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
+--- a/target/arm/cpu64.c
-+++ b/target/arm/translate-neon.c.inc
++++ b/target/arm/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
+@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
- #include "decode-neon-ls.c.inc"
+     t = FIELD_DP64(t, ID_AA64MMFR2, UAO, 1);
- #include "decode-neon-shared.c.inc"
+     t = FIELD_DP64(t, ID_AA64MMFR2, CNP, 1); /* TTCNP */
+     t = FIELD_DP64(t, ID_AA64MMFR2, ST, 1); /* TTST */
--/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
++    t = FIELD_DP64(t, ID_AA64MMFR2, VARANGE, 1); /* FEAT_LVA */
-- * where 0 is the least significant end of the register.
+     cpu->isar.id_aa64mmfr2 = t;
-- */
--static inline long
+     t = cpu->isar.id_aa64zfr0;
--neon_element_offset(int reg, int element, MemOp size)
+diff --git a/target/arm/helper.c b/target/arm/helper.c
--{
+index XXXXXXX..XXXXXXX 100644
--    int element_size = 1 << size;
+--- a/target/arm/helper.c
--    int ofs = element * element_size;
++++ b/target/arm/helper.c
--#ifdef HOST_WORDS_BIGENDIAN
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
--    /* Calculate the offset assuming fully little-endian,
+     } else {
--     * then XOR to account for the order of the 8-byte units.
+         max_tsz = 39;
--     */
+     }
--    if (element_size < 8) {
+-    min_tsz = 16;  /* TODO: ARMv8.2-LVA  */
--        ofs ^= 8 - element_size;
++
--    }
++    min_tsz = 16;
--#endif
++    if (using64k) {
--    return neon_full_reg_offset(reg) + ofs;
++        if (cpu_isar_feature(aa64_lva, env_archcpu(env))) {
--}
++            min_tsz = 12;
--
++        }
- static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
++    }
- {
++    /* TODO: FEAT_LPA2 */
-     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
      if (tsz > max_tsz) {
          tsz = max_tsz;
 --
-.20.1
+.25.1
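
Reader's note on the FEAT_LVA patch above: since the input address size is
64 - TSZ, lowering the minimum TSZ from 16 to 12 is what widens the virtual
address space from 48 to 52 bits. A minimal sketch of that relationship,
assuming standalone C; min_tsz_for() and va_bits() are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum TSZ as chosen by the patch: 12 only for 64k pages with FEAT_LVA. */
static int min_tsz_for(bool using64k, bool have_lva)
{
    return (using64k && have_lva) ? 12 : 16;
}

/* Input (virtual) address size in bits for a given TSZ. */
static int va_bits(int tsz)
{
    assert(tsz >= 0 && tsz < 64);
    return 64 - tsz;
}
```

For example, va_bits(min_tsz_for(true, true)) gives 52, while every other
combination gives 48.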

-[PULL 11/26] target/arm: Improve do_prewiden_3d
+[PULL 17/26] target/arm: Implement FEAT_LPA
 From: Richard Henderson <richard.henderson@linaro.org>
-We can use proper widening loads to extend 32-bit inputs,
+This feature widens physical addresses (and intermediate physical
-and skip the "widenfn" step.
+addresses for 2-stage translation) from 48 to 52 bits, when using
 64k pages.  The only thing left at this point is to handle the
 extra bits in the TTBR and in the table descriptors.
+Note that PAR_EL1 and HPFAR_EL2 are nominally extended, but we don't
+mask out the high bits when writing to those registers, so no changes
+are required there.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
+Message-id: 20220301215958.157011-11-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  6 +++
+ docs/system/arm/emulation.rst |  1 +
- target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
+ target/arm/cpu-param.h        |  2 +-
-files changed, 43 insertions(+), 29 deletions(-)
+ target/arm/cpu64.c            |  2 +-
  target/arm/helper.c           | 19 ++++++++++++++++---
 files changed, 19 insertions(+), 5 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/docs/system/arm/emulation.rst
-+++ b/target/arm/translate.c
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
-     long off = neon_element_offset(reg, ele, memop);
+ - FEAT_I8MM (AArch64 Int8 matrix multiplication instructions)
+ - FEAT_JSCVT (JavaScript conversion instructions)
-     switch (memop) {
+ - FEAT_LOR (Limited ordering regions)
-+    case MO_SL:
++- FEAT_LPA (Large Physical Address space)
-+        tcg_gen_ld32s_i64(dest, cpu_env, off);
+ - FEAT_LRCPC (Load-acquire RCpc instructions)
-+        break;
+ - FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
-+    case MO_UL:
+ - FEAT_LSE (Large System Extensions)
-+        tcg_gen_ld32u_i64(dest, cpu_env, off);
+diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
 +        break;
      case MO_Q:
          tcg_gen_ld_i64(dest, cpu_env, off);
          break;
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
+--- a/target/arm/cpu-param.h
-+++ b/target/arm/translate-neon.c.inc
++++ b/target/arm/cpu-param.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
+@@ -XXX,XX +XXX,XX @@
- static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
-                            NeonGenWidenFn *widenfn,
+ #ifdef TARGET_AARCH64
-                            NeonGenTwo64OpFn *opfn,
+ # define TARGET_LONG_BITS             64
--                           bool src1_wide)
+-# define TARGET_PHYS_ADDR_SPACE_BITS  48
-+                           int src1_mop, int src2_mop)
++# define TARGET_PHYS_ADDR_SPACE_BITS  52
- {
+ # define TARGET_VIRT_ADDR_SPACE_BITS  52
-     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
+ #else
-     TCGv_i64 rn0_64, rn1_64, rm_64;
+ # define TARGET_LONG_BITS             32
--    TCGv_i32 rm;
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
+index XXXXXXX..XXXXXXX 100644
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+--- a/target/arm/cpu64.c
-         return false;
++++ b/target/arm/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-         return false;
+     cpu->isar.id_aa64pfr1 = t;
-     }
+     t = cpu->isar.id_aa64mmfr0;
--    if (!widenfn || !opfn) {
+-    t = FIELD_DP64(t, ID_AA64MMFR0, PARANGE, 5); /* PARange: 48 bits */
-+    if (!opfn) {
++    t = FIELD_DP64(t, ID_AA64MMFR0, PARANGE, 6); /* FEAT_LPA: 52 bits */
-         /* size == 3 case, which is an entirely different insn group */
+     cpu->isar.id_aa64mmfr0 = t;
-         return false;
-     }
+     t = cpu->isar.id_aa64mmfr1;
+diff --git a/target/arm/helper.c b/target/arm/helper.c
--    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+index XXXXXXX..XXXXXXX 100644
-+    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
+--- a/target/arm/helper.c
-         return false;
++++ b/target/arm/helper.c
-     }
+@@ -XXX,XX +XXX,XX @@ static const uint8_t pamax_map[] = {
+     [3] = 42,
-@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+     [4] = 44,
-     rn1_64 = tcg_temp_new_i64();
+     [5] = 48,
-     rm_64 = tcg_temp_new_i64();
++    [6] = 52,
+ };
--    if (src1_wide) {
--        read_neon_element64(rn0_64, a->vn, 0, MO_64);
+ /* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
-+    if (src1_mop >= 0) {
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-+        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
+     descaddr = extract64(ttbr, 0, 48);
      } else {
          TCGv_i32 tmp = tcg_temp_new_i32();
          read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = tcg_temp_new_i32();
 -    read_neon_element32(rm, a->vm, 0, MO_32);
 +    if (src2_mop >= 0) {
 +        read_neon_element64(rm_64, a->vm, 0, src2_mop);
 +    } else {
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, 0, MO_32);
 +        widenfn(rm_64, tmp);
 +        tcg_temp_free_i32(tmp);
 +    }
 -    widenfn(rm_64, rm);
 -    tcg_temp_free_i32(rm);
      opfn(rn0_64, rn0_64, rm_64);
      /*
-      * Load second pass inputs before storing the first pass result, to
+-     * If the base address is out of range, raise AddressSizeFault.
-      * avoid incorrect results if a narrow input overlaps with the result.
++     * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [5:2] of TTBR.
 +     *
 +     * Otherwise, if the base address is out of range, raise AddressSizeFault.
       * In the pseudocode, this is !IsZero(baseregister<47:outputsize>),
       * but we've just cleared the bits above 47, so simplify the test.
       */
--    if (src1_wide) {
+-    if (descaddr >> outputsize) {
--        read_neon_element64(rn1_64, a->vn, 1, MO_64);
++    if (outputsize > 48) {
-+    if (src1_mop >= 0) {
++        descaddr |= extract64(ttbr, 2, 4) << 48;
-+        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
++    } else if (descaddr >> outputsize) {
-     } else {
+         level = 0;
-         TCGv_i32 tmp = tcg_temp_new_i32();
+         fault_type = ARMFault_AddressSize;
-         read_neon_element32(tmp, a->vn, 1, MO_32);
+         goto do_fault;
-         widenfn(rn1_64, tmp);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-         tcg_temp_free_i32(tmp);
+         }
-     }
--    rm = tcg_temp_new_i32();
+         descaddr = descriptor & descaddrmask;
--    read_neon_element32(rm, a->vm, 1, MO_32);
+-        if (descaddr >> outputsize) {
-+    if (src2_mop >= 0) {
++
-+        read_neon_element64(rm_64, a->vm, 1, src2_mop);
++        /*
-+    } else {
++         * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
-+        TCGv_i32 tmp = tcg_temp_new_i32();
++         * of descriptor.  Otherwise, if descaddr is out of range, raise
-+        read_neon_element32(tmp, a->vm, 1, MO_32);
++         * AddressSizeFault.
-+        widenfn(rm_64, tmp);
++         */
-+        tcg_temp_free_i32(tmp);
++        if (outputsize > 48) {
-+    }
++            descaddr |= extract64(descriptor, 12, 4) << 48;
++        } else if (descaddr >> outputsize) {
-     write_neon_element64(rn0_64, a->vd, 0, MO_64);
+             fault_type = ARMFault_AddressSize;
+             goto do_fault;
--    widenfn(rm_64, rm);
+         }
 -    tcg_temp_free_i32(rm);
      opfn(rn1_64, rn1_64, rm_64);
      write_neon_element64(rn1_64, a->vd, 1, MO_64);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      return true;
  }
 -#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
 +#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
      static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
      {                                                                   \
          static NeonGenWidenFn * const widenfn[] = {                     \
              gen_helper_neon_widen_##S##8,                               \
              gen_helper_neon_widen_##S##16,                              \
 -            tcg_gen_##EXT##_i32_i64,                                    \
 -            NULL,                                                       \
 +            NULL, NULL,                                                 \
          };                                                              \
          static NeonGenTwo64OpFn * const addfn[] = {                     \
              gen_helper_neon_##OP##l_u16,                                \
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
              tcg_gen_##OP##_i64,                                         \
              NULL,                                                       \
          };                                                              \
 -        return do_prewiden_3d(s, a, widenfn[a->size],                   \
 -                              addfn[a->size], SRC1WIDE);                \
 +        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
 +        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
 +                              SRC1WIDE ? MO_Q : narrow_mop,             \
 +                              narrow_mop);                              \
      }
 -DO_PREWIDEN(VADDL_S, s, ext, add, false)
 -DO_PREWIDEN(VADDL_U, u, extu, add, false)
 -DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
 -DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
 -DO_PREWIDEN(VADDW_S, s, ext, add, true)
 -DO_PREWIDEN(VADDW_U, u, extu, add, true)
 -DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 -DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
 +DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
 +DO_PREWIDEN(VADDL_U, u, add, false, 0)
 +DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
 +DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
 +DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
 +DO_PREWIDEN(VADDW_U, u, add, true, 0)
 +DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
 +DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
  static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
                           NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
 --
-.20.1
+.25.1
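
Reader's note on the FEAT_LPA patch above: when the output size exceeds 48
bits, address bits [51:48] are not stored contiguously. They come from
TTBR bits [5:2] for the table base and from descriptor bits [15:12] for
table and block entries. The sketch below reassembles both, assuming
standalone C. The function names are hypothetical, and the caller is
assumed to apply the low-bit masking (indexmask in the patch) separately.

```c
#include <assert.h>
#include <stdint.h>

/* Table base: bits [47:0] of TTBR, plus TTBR[5:2] placed at [51:48]. */
static uint64_t lpa_ttbr_base(uint64_t ttbr)
{
    uint64_t base = ttbr & ((1ULL << 48) - 1);

    base |= ((ttbr >> 2) & 0xf) << 48;
    return base;  /* low bits are still unmasked here */
}

/* Next-level address: masked descriptor, plus descriptor[15:12] at [51:48]. */
static uint64_t lpa_desc_addr(uint64_t descriptor, uint64_t descaddrmask)
{
    uint64_t addr = descriptor & descaddrmask;

    addr |= ((descriptor >> 12) & 0xf) << 48;
    return addr;
}
```

With a 64k granule the address mask clears bits [15:0], so descriptor bits
[15:12] never overlap the low part of the address and can be reused for the
high bits.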

-[PULL 13/26] target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
+[PULL 18/26] target/arm: Extend arm_fi_to_lfsc to level -1
-The helper functions for performing the udot/sdot operations against
+From: Richard Henderson <richard.henderson@linaro.org>
 a scalar were not using an address-swizzling macro when converting
 the index of the scalar element into a pointer into the vm array.
 This had no effect on little-endian hosts but meant we generated
 incorrect results on big-endian hosts.
-For these insns, the index is indexing over group of 4 8-bit values,
+With FEAT_LPA2, rather than introducing translation level 4,
-so 32 bits per indexed entity, and H4() is therefore what we want.
+we introduce level -1, below the current level 0.  Extend
-(For Neon the only possible input indexes are 0 and 1.)
+arm_fi_to_lfsc to handle these faults.
+Assert that this new translation level does not leak into
+fault types for which it is not defined, which allows some
+masking of fi->level to be removed.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220301215958.157011-12-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
 ---
- target/arm/vec_helper.c | 4 ++--
+ target/arm/internals.h | 35 +++++++++++++++++++++++++++++------
-file changed, 2 insertions(+), 2 deletions(-)
+file changed, 29 insertions(+), 6 deletions(-)
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
+--- a/target/arm/internals.h
-+++ b/target/arm/vec_helper.c
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t arm_fi_to_lfsc(ARMMMUFaultInfo *fi)
-     intptr_t index = simd_data(desc);
+     case ARMFault_None:
-     uint32_t *d = vd;
+         return 0;
-     int8_t *n = vn;
+     case ARMFault_AddressSize:
--    int8_t *m_indexed = (int8_t *)vm + index * 4;
+-        fsc = fi->level & 3;
-+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
++        assert(fi->level >= -1 && fi->level <= 3);
++        if (fi->level < 0) {
-     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
++            fsc = 0b101001;
-      * Otherwise opr_sz is a multiple of 16.
++        } else {
-@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
++            fsc = fi->level;
-     intptr_t index = simd_data(desc);
++        }
-     uint32_t *d = vd;
+         break;
-     uint8_t *n = vn;
+     case ARMFault_AccessFlag:
--    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+-        fsc = (fi->level & 3) | (0x2 << 2);
-+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
++        assert(fi->level >= 0 && fi->level <= 3);
++        fsc = 0b001000 | fi->level;
-     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
+         break;
-      * Otherwise opr_sz is a multiple of 16.
+     case ARMFault_Permission:
 -        fsc = (fi->level & 3) | (0x3 << 2);
 +        assert(fi->level >= 0 && fi->level <= 3);
 +        fsc = 0b001100 | fi->level;
          break;
      case ARMFault_Translation:
 -        fsc = (fi->level & 3) | (0x1 << 2);
 +        assert(fi->level >= -1 && fi->level <= 3);
 +        if (fi->level < 0) {
 +            fsc = 0b101011;
 +        } else {
 +            fsc = 0b000100 | fi->level;
 +        }
          break;
      case ARMFault_SyncExternal:
          fsc = 0x10 | (fi->ea << 12);
          break;
      case ARMFault_SyncExternalOnWalk:
 -        fsc = (fi->level & 3) | (0x5 << 2) | (fi->ea << 12);
 +        assert(fi->level >= -1 && fi->level <= 3);
 +        if (fi->level < 0) {
 +            fsc = 0b010011;
 +        } else {
 +            fsc = 0b010100 | fi->level;
 +        }
 +        fsc |= fi->ea << 12;
          break;
      case ARMFault_SyncParity:
          fsc = 0x18;
          break;
      case ARMFault_SyncParityOnWalk:
 -        fsc = (fi->level & 3) | (0x7 << 2);
 +        assert(fi->level >= -1 && fi->level <= 3);
 +        if (fi->level < 0) {
 +            fsc = 0b011011;
 +        } else {
 +            fsc = 0b011100 | fi->level;
 +        }
          break;
      case ARMFault_AsyncParity:
          fsc = 0x19;
 --
-.20.1
+.25.1
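The fault-status encoding described in the patch above can be sketched in isolation. This is only an illustration under stated assumptions: FaultType and fault_to_lfsc are hypothetical stand-ins for the ARMFault_* enum and arm_fi_to_lfsc, only four fault types are covered, and the binary constants from the patch are written in hex so the sketch does not rely on 0b literals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a few of the ARMFault_* values in the patch. */
typedef enum {
    FAULT_ADDRESS_SIZE,
    FAULT_ACCESS_FLAG,
    FAULT_PERMISSION,
    FAULT_TRANSLATION,
} FaultType;

/*
 * Map a fault type and translation level to a long-format fault status
 * code. Level -1 is only accepted for the fault types where the patch
 * defines an encoding for it; the asserts reject it everywhere else,
 * which is why no "& 3" masking of the level is needed.
 */
static uint32_t fault_to_lfsc(FaultType type, int level)
{
    switch (type) {
    case FAULT_ADDRESS_SIZE:
        assert(level >= -1 && level <= 3);
        return level < 0 ? 0x29 : (uint32_t)level;          /* 0b101001 */
    case FAULT_ACCESS_FLAG:
        assert(level >= 0 && level <= 3);
        return 0x08 | (uint32_t)level;                      /* 0b001000 */
    case FAULT_PERMISSION:
        assert(level >= 0 && level <= 3);
        return 0x0c | (uint32_t)level;                      /* 0b001100 */
    case FAULT_TRANSLATION:
        assert(level >= -1 && level <= 3);
        return level < 0 ? 0x2b : (0x04 | (uint32_t)level); /* 0b101011 */
    }
    return 0;
}
```

With this sketch, a level -1 translation fault yields 0x2b, while a level 2 translation fault yields 0x06, the same value the old masked expression produced.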

-[PULL 04/26] target/arm: Use neon_element_offset in vfp_reg_offset
+[PULL 19/26] target/arm: Introduce tlbi_aa64_get_range
 From: Richard Henderson <richard.henderson@linaro.org>
-This seems a bit more readable than using offsetof CPU_DoubleU.
+Merge tlbi_aa64_range_get_length and tlbi_aa64_range_get_base,
 returning a structure containing both results.  Pass in the
 ARMMMUIdx, rather than the digested two_ranges boolean.
+This is in preparation for FEAT_LPA2, where the interpretation
+of 'value' depends on the effective value of DS for the regime.
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
+Message-id: 20220301215958.157011-13-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c | 13 ++++---------
+ target/arm/helper.c | 58 +++++++++++++++++++--------------------------
-file changed, 4 insertions(+), 9 deletions(-)
+file changed, 24 insertions(+), 34 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
+@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
      return neon_full_reg_offset(reg) + ofs;
  }
--static inline long vfp_reg_offset(bool dp, unsigned reg)
+ #ifdef TARGET_AARCH64
-+/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
+-static uint64_t tlbi_aa64_range_get_length(CPUARMState *env,
-+static long vfp_reg_offset(bool dp, unsigned reg)
+-                                           uint64_t value)
 -{
 -    unsigned int page_shift;
 -    unsigned int page_size_granule;
 -    uint64_t num;
 -    uint64_t scale;
 -    uint64_t exponent;
 +typedef struct {
 +    uint64_t base;
      uint64_t length;
 +} TLBIRange;
 +
 +static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx,
 +                                     uint64_t value)
 +{
 +    unsigned int page_size_granule, page_shift, num, scale, exponent;
 +    TLBIRange ret = { };
 -    num = extract64(value, 39, 5);
 -    scale = extract64(value, 44, 2);
      page_size_granule = extract64(value, 46, 2);
      if (page_size_granule == 0) {
          qemu_log_mask(LOG_GUEST_ERROR, "Invalid page size granule %d\n",
                        page_size_granule);
 -        return 0;
 +        return ret;
      }
      page_shift = (page_size_granule - 1) * 2 + 12;
 -
 +    num = extract64(value, 39, 5);
 +    scale = extract64(value, 44, 2);
      exponent = (5 * scale) + 1;
 -    length = (num + 1) << (exponent + page_shift);
 -    return length;
 -}
 +    ret.length = (num + 1) << (exponent + page_shift);
 -static uint64_t tlbi_aa64_range_get_base(CPUARMState *env, uint64_t value,
 -                                        bool two_ranges)
 -{
 -    /* TODO: ARMv8.7 FEAT_LPA2 */
 -    uint64_t pageaddr;
 -
 -    if (two_ranges) {
 -        pageaddr = sextract64(value, 0, 37) << TARGET_PAGE_BITS;
 +    if (regime_has_2_ranges(mmuidx)) {
 +        ret.base = sextract64(value, 0, 37) << TARGET_PAGE_BITS;
      } else {
 -        pageaddr = extract64(value, 0, 37) << TARGET_PAGE_BITS;
 +        ret.base = extract64(value, 0, 37) << TARGET_PAGE_BITS;
      }
 -    return pageaddr;
 +    return ret;
  }
  static void do_rvae_write(CPUARMState *env, uint64_t value,
                            int idxmap, bool synced)
  {
-     if (dp) {
+     ARMMMUIdx one_idx = ARM_MMU_IDX_A | ctz32(idxmap);
--        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+-    bool two_ranges = regime_has_2_ranges(one_idx);
-+        return neon_element_offset(reg, 0, MO_64);
+-    uint64_t baseaddr, length;
 +    TLBIRange range;
      int bits;
 -    baseaddr = tlbi_aa64_range_get_base(env, value, two_ranges);
 -    length = tlbi_aa64_range_get_length(env, value);
 -    bits = tlbbits_for_regime(env, one_idx, baseaddr);
 +    range = tlbi_aa64_get_range(env, one_idx, value);
 +    bits = tlbbits_for_regime(env, one_idx, range.base);
      if (synced) {
          tlb_flush_range_by_mmuidx_all_cpus_synced(env_cpu(env),
 -                                                  baseaddr,
 -                                                  length,
 +                                                  range.base,
 +                                                  range.length,
                                                    idxmap,
                                                    bits);
      } else {
--        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
+-        tlb_flush_range_by_mmuidx(env_cpu(env), baseaddr,
--        if (reg & 1) {
+-                                  length, idxmap, bits);
--            ofs += offsetof(CPU_DoubleU, l.upper);
++        tlb_flush_range_by_mmuidx(env_cpu(env), range.base,
--        } else {
++                                  range.length, idxmap, bits);
 -            ofs += offsetof(CPU_DoubleU, l.lower);
 -        }
 -        return ofs;
 +        return neon_element_offset(reg >> 1, reg & 1, MO_32);
      }
  }
 --
-.20.1
+.25.1
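As a rough standalone illustration of the merged helper above, the sketch below decodes both results from a range TLBI operand in one pass. It assumes the field positions shown in the patch (BaseADDR in bits [36:0], NUM in [43:39], SCALE in [45:44], TG in [47:46]). The names field64, sketch_get_range and SKETCH_PAGE_BITS are hypothetical, the two_ranges flag replaces the regime_has_2_ranges(mmuidx) lookup, and all arithmetic is done in 64 bits as a choice of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12  /* assumed stand-in for TARGET_PAGE_BITS */

typedef struct {
    uint64_t base;
    uint64_t length;
} TLBIRange;

/* Zero-extending bitfield extract, in the spirit of extract64(). */
static uint64_t field64(uint64_t value, unsigned start, unsigned len)
{
    assert(len > 0 && len < 64 && start + len <= 64);
    return (value >> start) & ((1ULL << len) - 1);
}

static TLBIRange sketch_get_range(uint64_t value, bool two_ranges)
{
    TLBIRange ret = { 0, 0 };
    unsigned tg = (unsigned)field64(value, 46, 2);
    uint64_t num, scale, exponent, page_shift, base;

    if (tg == 0) {
        return ret;  /* reserved granule encoding: return an empty range */
    }
    page_shift = (tg - 1) * 2 + 12;
    num = field64(value, 39, 5);
    scale = field64(value, 44, 2);
    exponent = 5 * scale + 1;
    ret.length = (num + 1) << (exponent + page_shift);

    base = field64(value, 0, 37);
    if (two_ranges && (base & (1ULL << 36))) {
        base |= ~((1ULL << 37) - 1);  /* sign-extend from bit 36 */
    }
    ret.base = base << SKETCH_PAGE_BITS;
    return ret;
}
```

For example, an operand with TG = 1, SCALE = 0 and NUM = 0 describes a range of 1 << (1 + 12) = 8192 bytes.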

-[PULL 05/26] target/arm: Add read/write_neon_element32
+[PULL 20/26] target/arm: Fix TLBIRange.base for 16k and 64k pages
 From: Richard Henderson <richard.henderson@linaro.org>
-Model these off the aa64 read/write_vec_element functions.
+The shift of the BaseADDR field depends on the translation
-Use it within translate-neon.c.inc.  The new functions do
+granule in use.
 not allocate or free temps, so this rearranges the calling
 code a bit.
+Fixes: 84940ed8255 ("target/arm: Add support for FEAT_TLBIRANGE")
+Reported-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
+Message-id: 20220301215958.157011-14-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  26 ++++
+ target/arm/helper.c | 5 +++--
- target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
+file changed, 3 insertions(+), 2 deletions(-)
 files changed, 183 insertions(+), 99 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/helper.c
-+++ b/target/arm/translate.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
+@@ -XXX,XX +XXX,XX @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx,
-     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+     ret.length = (num + 1) << (exponent + page_shift);
- }
+     if (regime_has_2_ranges(mmuidx)) {
-+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+-        ret.base = sextract64(value, 0, 37) << TARGET_PAGE_BITS;
-+{
++        ret.base = sextract64(value, 0, 37);
-+    long off = neon_element_offset(reg, ele, size);
+     } else {
-+
+-        ret.base = extract64(value, 0, 37) << TARGET_PAGE_BITS;
-+    switch (size) {
++        ret.base = extract64(value, 0, 37);
 +    case MO_32:
 +        tcg_gen_ld_i32(dest, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
 +static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
 +{
 +    long off = neon_element_offset(reg, ele, size);
 +
 +    switch (size) {
 +    case MO_32:
 +        tcg_gen_st_i32(src, cpu_env, off);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +}
 +
  static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
  {
      TCGv_ptr ret = tcg_temp_new_ptr();
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
       * early. Since Q is 0 there are always just two passes, so instead
       * of a complicated loop over each pass we just unroll.
       */
 -    tmp = neon_load_reg(a->vn, 0);
 -    tmp2 = neon_load_reg(a->vn, 1);
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    tmp3 = tcg_temp_new_i32();
 +
 +    read_neon_element32(tmp, a->vn, 0, MO_32);
 +    read_neon_element32(tmp2, a->vn, 1, MO_32);
      fn(tmp, tmp, tmp2);
 -    tcg_temp_free_i32(tmp2);
 -    tmp3 = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(tmp3, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      fn(tmp3, tmp3, tmp2);
 -    tcg_temp_free_i32(tmp2);
 -    neon_store_reg(a->vd, 0, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
       * 2-reg-and-shift operations, size < 3 case, where the
       * helper needs to be passed cpu_env.
       */
 -    TCGv_i32 constimm;
 +    TCGv_i32 constimm, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
       * by immediate using the variable shift operations.
       */
      constimm = tcg_const_i32(dup_const(a->size, a->shift));
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, cpu_env, tmp, constimm);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
-+    tcg_temp_free_i32(tmp);
++    ret.base <<= page_shift;
-     tcg_temp_free_i32(constimm);
-     return true;
+     return ret;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
      constimm = tcg_const_i64(-a->shift);
      rm1 = tcg_temp_new_i64();
      rm2 = tcg_temp_new_i64();
 +    rd = tcg_temp_new_i32();
      /* Load both inputs first to avoid potential overwrite if rm == rd */
      neon_load_reg64(rm1, a->vm);
      neon_load_reg64(rm2, a->vm + 1);
      shiftfn(rm1, rm1, constimm);
 -    rd = tcg_temp_new_i32();
      narrowfn(rd, cpu_env, rm1);
 -    neon_store_reg(a->vd, 0, rd);
 +    write_neon_element32(rd, a->vd, 0, MO_32);
      shiftfn(rm2, rm2, constimm);
 -    rd = tcg_temp_new_i32();
      narrowfn(rd, cpu_env, rm2);
 -    neon_store_reg(a->vd, 1, rd);
 +    write_neon_element32(rd, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i64(rm1);
      tcg_temp_free_i64(rm2);
      tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      constimm = tcg_const_i32(imm);
      /* Load all inputs first to avoid potential overwrite */
 -    rm1 = neon_load_reg(a->vm, 0);
 -    rm2 = neon_load_reg(a->vm, 1);
 -    rm3 = neon_load_reg(a->vm + 1, 0);
 -    rm4 = neon_load_reg(a->vm + 1, 1);
 +    rm1 = tcg_temp_new_i32();
 +    rm2 = tcg_temp_new_i32();
 +    rm3 = tcg_temp_new_i32();
 +    rm4 = tcg_temp_new_i32();
 +    read_neon_element32(rm1, a->vm, 0, MO_32);
 +    read_neon_element32(rm2, a->vm, 1, MO_32);
 +    read_neon_element32(rm3, a->vm, 2, MO_32);
 +    read_neon_element32(rm4, a->vm, 3, MO_32);
      rtmp = tcg_temp_new_i64();
      shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      tcg_temp_free_i32(rm2);
      narrowfn(rm1, cpu_env, rtmp);
 -    neon_store_reg(a->vd, 0, rm1);
 +    write_neon_element32(rm1, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(rm1);
      shiftfn(rm3, rm3, constimm);
      shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
      narrowfn(rm3, cpu_env, rtmp);
      tcg_temp_free_i64(rtmp);
 -    neon_store_reg(a->vd, 1, rm3);
 +    write_neon_element32(rm3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rm3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
          widen_mask = dup_const(a->size + 1, widen_mask);
      }
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      tmp = tcg_temp_new_i64();
      widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn0_64, a->vn);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 0, MO_32);
          widenfn(rn0_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 0);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      widenfn(rm_64, rm);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      if (src1_wide) {
          neon_load_reg64(rn1_64, a->vn + 1);
      } else {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
 +        TCGv_i32 tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vn, 1, MO_32);
          widenfn(rn1_64, tmp);
          tcg_temp_free_i32(tmp);
      }
 -    rm = neon_load_reg(a->vm, 1);
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      neon_store_reg64(rn0_64, a->vd);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
      narrowfn(rd1, rn_64);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rn_64);
      tcg_temp_free_i64(rm_64);
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
      rd0 = tcg_temp_new_i64();
      rd1 = tcg_temp_new_i64();
 -    rn = neon_load_reg(a->vn, 0);
 -    rm = neon_load_reg(a->vm, 0);
 +    rn = tcg_temp_new_i32();
 +    rm = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
 +    read_neon_element32(rm, a->vm, 0, MO_32);
      opfn(rd0, rn, rm);
 -    tcg_temp_free_i32(rn);
 -    tcg_temp_free_i32(rm);
 -    rn = neon_load_reg(a->vn, 1);
 -    rm = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
 +    read_neon_element32(rm, a->vm, 1, MO_32);
      opfn(rd1, rn, rm);
      tcg_temp_free_i32(rn);
      tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
  static inline TCGv_i32 neon_get_scalar(int size, int reg)
  {
 -    TCGv_i32 tmp;
 -    if (size == 1) {
 -        tmp = neon_load_reg(reg & 7, reg >> 4);
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +    if (size == MO_16) {
 +        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
          if (reg & 8) {
              gen_neon_dup_high16(tmp);
          } else {
              gen_neon_dup_low16(tmp);
          }
      } else {
 -        tmp = neon_load_reg(reg & 15, reg >> 4);
 +        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
      }
      return tmp;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
       * perform an accumulation operation of that result into the
       * destination.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, tmp;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
 +        read_neon_element32(tmp, a->vn, pass, MO_32);
          opfn(tmp, tmp, scalar);
          if (accfn) {
 -            TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +            TCGv_i32 rd = tcg_temp_new_i32();
 +            read_neon_element32(rd, a->vd, pass, MO_32);
              accfn(tmp, rd, tmp);
              tcg_temp_free_i32(rd);
          }
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(scalar);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
       * performs a kind of fused op-then-accumulate using a helper
       * function that takes all of rd, rn and the scalar at once.
       */
 -    TCGv_i32 scalar;
 +    TCGv_i32 scalar, rn, rd;
      int pass;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      }
      scalar = neon_get_scalar(a->size, a->vm);
 +    rn = tcg_temp_new_i32();
 +    rd = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 rn = neon_load_reg(a->vn, pass);
 -        TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +        read_neon_element32(rn, a->vn, pass, MO_32);
 +        read_neon_element32(rd, a->vd, pass, MO_32);
          opfn(rd, cpu_env, rn, scalar, rd);
 -        tcg_temp_free_i32(rn);
 -        neon_store_reg(a->vd, pass, rd);
 +        write_neon_element32(rd, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(rd);
      tcg_temp_free_i32(scalar);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
      scalar = neon_get_scalar(a->size, a->vm);
      /* Load all inputs before writing any outputs, in case of overlap */
 -    rn = neon_load_reg(a->vn, 0);
 +    rn = tcg_temp_new_i32();
 +    read_neon_element32(rn, a->vn, 0, MO_32);
      rn0_64 = tcg_temp_new_i64();
      opfn(rn0_64, rn, scalar);
 -    tcg_temp_free_i32(rn);
 -    rn = neon_load_reg(a->vn, 1);
 +    read_neon_element32(rn, a->vn, 1, MO_32);
      rn1_64 = tcg_temp_new_i64();
      opfn(rn1_64, rn, scalar);
      tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
          return false;
      }
      n <<= 3;
 +    tmp = tcg_temp_new_i32();
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 0);
 +        read_neon_element32(tmp, a->vd, 0, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp2 = neon_load_reg(a->vm, 0);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 0, MO_32);
      ptr1 = vfp_reg_ptr(true, a->vn);
      tmp4 = tcg_const_i32(n);
      gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
 -    tcg_temp_free_i32(tmp);
 +
      if (a->op) {
 -        tmp = neon_load_reg(a->vd, 1);
 +        read_neon_element32(tmp, a->vd, 1, MO_32);
      } else {
 -        tmp = tcg_temp_new_i32();
          tcg_gen_movi_i32(tmp, 0);
      }
 -    tmp3 = neon_load_reg(a->vm, 1);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 1, MO_32);
      gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
 +    tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(tmp4);
      tcg_temp_free_ptr(ptr1);
 -    neon_store_reg(a->vd, 0, tmp2);
 -    neon_store_reg(a->vd, 1, tmp3);
 -    tcg_temp_free_i32(tmp);
 +
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp2);
 +    tcg_temp_free_i32(tmp3);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
  static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
  {
      int pass, half;
 +    TCGv_i32 tmp[2];
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
          return true;
      }
 -    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
 -        TCGv_i32 tmp[2];
 +    tmp[0] = tcg_temp_new_i32();
 +    tmp[1] = tcg_temp_new_i32();
 +    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
          for (half = 0; half < 2; half++) {
 -            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
 +            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
              switch (a->size) {
              case 0:
                  tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                  g_assert_not_reached();
              }
          }
 -        neon_store_reg(a->vd, pass * 2, tmp[1]);
 -        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
 +        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
 +        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
      }
 +
 +    tcg_temp_free_i32(tmp[0]);
 +    tcg_temp_free_i32(tmp[1]);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
          rm0_64 = tcg_temp_new_i64();
          rm1_64 = tcg_temp_new_i64();
          rd_64 = tcg_temp_new_i64();
 -        tmp = neon_load_reg(a->vm, pass * 2);
 +
 +        tmp = tcg_temp_new_i32();
 +        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
          widenfn(rm0_64, tmp);
 -        tcg_temp_free_i32(tmp);
 -        tmp = neon_load_reg(a->vm, pass * 2 + 1);
 +        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
          widenfn(rm1_64, tmp);
          tcg_temp_free_i32(tmp);
 +
          opfn(rd_64, rm0_64, rm1_64);
          tcg_temp_free_i64(rm0_64);
          tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
      narrowfn(rd0, cpu_env, rm);
      neon_load_reg64(rm, a->vm + 1);
      narrowfn(rd1, cpu_env, rm);
 -    neon_store_reg(a->vd, 0, rd0);
 -    neon_store_reg(a->vd, 1, rd1);
 +    write_neon_element32(rd0, a->vd, 0, MO_32);
 +    write_neon_element32(rd1, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(rd0);
 +    tcg_temp_free_i32(rd1);
      tcg_temp_free_i64(rm);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
      }
      rd = tcg_temp_new_i64();
 +    rm0 = tcg_temp_new_i32();
 +    rm1 = tcg_temp_new_i32();
 -    rm0 = neon_load_reg(a->vm, 0);
 -    rm1 = neon_load_reg(a->vm, 1);
 +    read_neon_element32(rm0, a->vm, 0, MO_32);
 +    read_neon_element32(rm1, a->vm, 1, MO_32);
      widenfn(rd, rm0);
      tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
 -    tmp = neon_load_reg(a->vm, 0);
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
      tcg_gen_shli_i32(tmp2, tmp2, 16);
      tcg_gen_or_i32(tmp2, tmp2, tmp);
 -    tcg_temp_free_i32(tmp);
 -    tmp = neon_load_reg(a->vm, 2);
 +    read_neon_element32(tmp, a->vm, 2, MO_32);
      gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
 -    tmp3 = neon_load_reg(a->vm, 3);
 -    neon_store_reg(a->vd, 0, tmp2);
 +    tmp3 = tcg_temp_new_i32();
 +    read_neon_element32(tmp3, a->vm, 3, MO_32);
 +    write_neon_element32(tmp2, a->vd, 0, MO_32);
 +    tcg_temp_free_i32(tmp2);
      gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
      tcg_gen_shli_i32(tmp3, tmp3, 16);
      tcg_gen_or_i32(tmp3, tmp3, tmp);
 -    neon_store_reg(a->vd, 1, tmp3);
 +    write_neon_element32(tmp3, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_temp_free_i32(tmp);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
      fpst = fpstatus_ptr(FPST_STD);
      ahp = get_ahp_flag();
      tmp3 = tcg_temp_new_i32();
 -    tmp = neon_load_reg(a->vm, 0);
 -    tmp2 = neon_load_reg(a->vm, 1);
 +    tmp2 = tcg_temp_new_i32();
 +    tmp = tcg_temp_new_i32();
 +    read_neon_element32(tmp, a->vm, 0, MO_32);
 +    read_neon_element32(tmp2, a->vm, 1, MO_32);
      tcg_gen_ext16u_i32(tmp3, tmp);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 0, tmp3);
 +    write_neon_element32(tmp3, a->vd, 0, MO_32);
      tcg_gen_shri_i32(tmp, tmp, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
 -    neon_store_reg(a->vd, 1, tmp);
 -    tmp3 = tcg_temp_new_i32();
 +    write_neon_element32(tmp, a->vd, 1, MO_32);
 +    tcg_temp_free_i32(tmp);
      tcg_gen_ext16u_i32(tmp3, tmp2);
      gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
 -    neon_store_reg(a->vd, 2, tmp3);
 +    write_neon_element32(tmp3, a->vd, 2, MO_32);
 +    tcg_temp_free_i32(tmp3);
      tcg_gen_shri_i32(tmp2, tmp2, 16);
      gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
 -    neon_store_reg(a->vd, 3, tmp2);
 +    write_neon_element32(tmp2, a->vd, 3, MO_32);
 +    tcg_temp_free_i32(tmp2);
      tcg_temp_free_i32(ahp);
      tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
  static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
  {
 +    TCGv_i32 tmp;
      int pass;
      /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
          return true;
      }
 +    tmp = tcg_temp_new_i32();
      for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
 +        read_neon_element32(tmp, a->vm, pass, MO_32);
          fn(tmp, tmp);
 -        neon_store_reg(a->vd, pass, tmp);
 +        write_neon_element32(tmp, a->vd, pass, MO_32);
      }
 +    tcg_temp_free_i32(tmp);
      return true;
  }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
          return true;
      }
 -    if (a->size == 2) {
 +    tmp = tcg_temp_new_i32();
 +    tmp2 = tcg_temp_new_i32();
 +    if (a->size == MO_32) {
          for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass + 1);
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass + 1, tmp);
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
          }
      } else {
          for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 -            tmp = neon_load_reg(a->vm, pass);
 -            tmp2 = neon_load_reg(a->vd, pass);
 -            if (a->size == 0) {
 +            read_neon_element32(tmp, a->vm, pass, MO_32);
 +            read_neon_element32(tmp2, a->vd, pass, MO_32);
 +            if (a->size == MO_8) {
                  gen_neon_trn_u8(tmp, tmp2);
              } else {
                  gen_neon_trn_u16(tmp, tmp2);
              }
 -            neon_store_reg(a->vm, pass, tmp2);
 -            neon_store_reg(a->vd, pass, tmp);
 +            write_neon_element32(tmp2, a->vm, pass, MO_32);
 +            write_neon_element32(tmp, a->vd, pass, MO_32);
          }
      }
 +    tcg_temp_free_i32(tmp);
 +    tcg_temp_free_i32(tmp2);
      return true;
  }
 --
-.20.1
+.25.1
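The effect of the granule-dependent shift in the TLBIRange.base fix can be shown with a small standalone sketch. granule_shift and range_base are hypothetical names, the TG encoding (1 = 4K, 2 = 16K, 3 = 64K) follows the page_shift formula used in the patch, and sign extension of BaseADDR is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Page shift for a TG encoding: 1 -> 12 (4K), 2 -> 14 (16K), 3 -> 16 (64K). */
static unsigned granule_shift(unsigned tg)
{
    assert(tg >= 1 && tg <= 3);
    return (tg - 1) * 2 + 12;
}

/*
 * Base address of a range TLBI operand: the 37-bit BaseADDR field
 * scaled by the granule named in TG, rather than by a fixed page size.
 * A reserved TG of 0 yields 0 here.
 */
static uint64_t range_base(uint64_t value)
{
    unsigned tg = (unsigned)((value >> 46) & 3);
    uint64_t field = value & ((1ULL << 37) - 1);

    if (tg == 0) {
        return 0;
    }
    return field << granule_shift(tg);
}
```

With BaseADDR = 1 and TG = 3, this gives 0x10000, whereas a fixed 12-bit shift would have given 0x1000.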

-[PULL 14/26] target/arm: fix handling of HCR.FB
+[PULL 21/26] target/arm: Validate tlbi TG matches translation granule in use
-From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-HCR should be applied when NS is set, not when it is cleared.
+For FEAT_LPA2, we will need other ARMVAParameters, which themselves
 depend on the translation granule in use.  We might as well validate
 that the given TG matches; the architecture "does not require that
 the instruction invalidates any entries" if this is not true.
-Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20220301215958.157011-15-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/helper.c | 5 ++---
+ target/arm/helper.c | 10 +++++++---
-file changed, 2 insertions(+), 3 deletions(-)
+file changed, 7 insertions(+), 3 deletions(-)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx,
+                                      uint64_t value)
  /*
   * Non-IS variants of TLB operations are upgraded to
 - * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
 + * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
   * force broadcast of these operations.
   */
  static bool tlb_force_broadcast(CPUARMState *env)
  {
--    return (env->cp15.hcr_el2 & HCR_FB) &&
+     unsigned int page_size_granule, page_shift, num, scale, exponent;
--        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
++    /* Extract one bit to represent the va selector in use. */
-+    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
++    uint64_t select = sextract64(value, 36, 1);
- }
++    ARMVAParameters param = aa64_va_parameters(env, select, mmuidx, true);
+     TLBIRange ret = { };
- static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
      page_size_granule = extract64(value, 46, 2);
 -    if (page_size_granule == 0) {
 -        qemu_log_mask(LOG_GUEST_ERROR, "Invalid page size granule %d\n",
 +    /* The granule encoded in value must match the granule in use. */
 +    if (page_size_granule != (param.using64k ? 3 : param.using16k ? 2 : 1)) {
 +        qemu_log_mask(LOG_GUEST_ERROR, "Invalid tlbi page size granule %d\n",
                        page_size_granule);
          return ret;
      }
@@ -XXX,XX +XXX,XX @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx,
      ret.length = (num + 1) << (exponent + page_shift);
 -    if (regime_has_2_ranges(mmuidx)) {
 +    if (param.select) {
          ret.base = sextract64(value, 0, 37);
      } else {
          ret.base = extract64(value, 0, 37);
 --
-.20.1
+.25.1

-[PULL 03/26] target/arm: Use neon_element_offset in neon_load/store_reg
+[PULL 22/26] target/arm: Advertise all page sizes for -cpu max
 From: Richard Henderson <richard.henderson@linaro.org>
-These are the only users of neon_reg_offset, so remove that.
+We support 16k pages, but do not advertise that in ID_AA64MMFR0.
 The value 0 in the TGRAN*_2 fields indicates that stage2 lookups defer
 to the same support as stage1 lookups.  This setting is deprecated, so
 indicate support for all stage2 page sizes directly.
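
To make the "defer to stage1" rule concrete, here is a minimal sketch of how a TGRAN*_2 value might be resolved. The function name is hypothetical. The value 2 ("stage2 supported") comes from the diff below; treating 1 as "not supported at stage 2" and 3 or more as supported is an assumption based on the architecture's encoding of these fields, not something this patch states.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Resolve stage 2 support for one granule size.
 *   tgran_2 == 0 : defer to the stage 1 field (the deprecated setting)
 *   tgran_2 == 1 : assumed to mean "not supported at stage 2"
 *   tgran_2 >= 2 : supported at stage 2
 */
static bool stage2_granule_supported(unsigned tgran_2, bool stage1_supported)
{
    assert(tgran_2 <= 0xf);  /* the ID register fields are 4 bits wide */
    if (tgran_2 == 0) {
        return stage1_supported;
    }
    return tgran_2 >= 2;
}
```

Writing 2 into each TGRAN*_2 field, as the patch does, makes the answer independent of the stage 1 fields.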
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20220301215958.157011-16-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c | 14 ++------------
+ target/arm/cpu64.c | 4 ++++
-file changed, 2 insertions(+), 12 deletions(-)
+file changed, 4 insertions(+)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/cpu64.c
-+++ b/target/arm/translate.c
++++ b/target/arm/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
+@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
-     }
- }
+     t = cpu->isar.id_aa64mmfr0;
+     t = FIELD_DP64(t, ID_AA64MMFR0, PARANGE, 6); /* FEAT_LPA: 52 bits */
--/* Return the offset of a 32-bit piece of a NEON register.
++    t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16, 1);   /* 16k pages supported */
--   zero is the least significant end of the register.  */
++    t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16_2, 2); /* 16k stage2 supported */
--static inline long
++    t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN64_2, 2); /* 64k stage2 supported */
--neon_reg_offset (int reg, int n)
++    t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN4_2, 2);  /*  4k stage2 supported */
--{
+     cpu->isar.id_aa64mmfr0 = t;
--    int sreg;
--    sreg = reg * 2 + n;
+     t = cpu->isar.id_aa64mmfr1;
 -    return vfp_reg_offset(0, sreg);
 -}
 -
  static TCGv_i32 neon_load_reg(int reg, int pass)
  {
      TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
 +    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
      return tmp;
  }
  static void neon_store_reg(int reg, int pass, TCGv_i32 var)
  {
 -    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
 +    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
      tcg_temp_free_i32(var);
  }
 --
-.20.1
+.25.1

-[PULL 01/26] target/arm: Introduce neon_full_reg_offset
+[PULL 23/26] target/arm: Implement FEAT_LPA2
 From: Richard Henderson <richard.henderson@linaro.org>
-This function makes it clear that we're talking about the whole
+This feature widens physical addresses (and intermediate physical
-register, and not the 32-bit piece at index 0.  This fixes a bug
+addresses for 2-stage translation) from 48 to 52 bits, when using
-when running on a big-endian host.
+4k or 16k pages.
 This introduces the DS bit to TCR_ELx, which is RES0 unless the
 page size is enabled and supports LPA2, resulting in the effective
 value of DS for a given table walk.  The DS bit changes the format
 of the page table descriptor slightly, moving the PS field out to
 TCR so that all pages have the same shareability and repurposing
 those bits of the page table descriptor for the highest bits of
 the output address.
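
The descriptor format change can be sketched as follows, assuming an effective DS of 1. The function name and the page_shift parameter are hypothetical; the bit positions (address bits up to bit 49 taken in place, bits [51:50] taken from descriptor bits [9:8]) follow the helper.c hunks later in this patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assemble a 52-bit output address from a page table descriptor when
 * the effective DS bit is set.  'page_shift' is log2 of the page or
 * block size; bits below it are attributes, not address bits.
 */
static uint64_t lpa2_output_address(uint64_t descriptor, unsigned page_shift)
{
    assert(page_shift >= 12 && page_shift < 50);

    /* Address bits [49:page_shift] sit in place in the descriptor. */
    uint64_t low = descriptor & ((1ULL << 50) - 1);
    low &= ~((1ULL << page_shift) - 1);

    /* Bits [51:50] reuse descriptor bits [9:8], the former SH field. */
    uint64_t high = (descriptor >> 8) & 0x3;

    return low | (high << 50);
}
```

Because bits [9:8] no longer carry shareability in this format, that attribute has to come from TCR_ELx instead, which is why the patch adds the sh field to ARMVAParameters.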
 Do not yet enable FEAT_LPA2; we need extra plumbing to avoid
 tickling an old kernel bug.
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
+Message-id: 20220301215958.157011-17-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate.c          |  8 ++++++
+ docs/system/arm/emulation.rst |   1 +
- target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
+ target/arm/cpu.h              |  22 ++++++++
- target/arm/translate-vfp.c.inc  |  2 +-
+ target/arm/internals.h        |   2 +
-files changed, 31 insertions(+), 23 deletions(-)
+ target/arm/helper.c           | 102 +++++++++++++++++++++++++++++-----
+files changed, 112 insertions(+), 15 deletions(-)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
 diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/docs/system/arm/emulation.rst
-+++ b/target/arm/translate.c
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
-     unallocated_encoding(s);
+ - FEAT_JSCVT (JavaScript conversion instructions)
  - FEAT_LOR (Limited ordering regions)
  - FEAT_LPA (Large Physical Address space)
 +- FEAT_LPA2 (Large Physical and virtual Address space v2)
  - FEAT_LRCPC (Load-acquire RCpc instructions)
  - FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
  - FEAT_LSE (Large System Extensions)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_i8mm(const ARMISARegisters *id)
      return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, I8MM) != 0;
  }
-+/*
++static inline bool isar_feature_aa64_tgran4_lpa2(const ARMISARegisters *id)
 + * Return the offset of a "full" NEON Dreg.
 + */
 +static long neon_full_reg_offset(unsigned reg)
 +{
-+    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
++    return FIELD_SEX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4) >= 1;
 +}
 +
- static inline long vfp_reg_offset(bool dp, unsigned reg)
++static inline bool isar_feature_aa64_tgran4_2_lpa2(const ARMISARegisters *id)
 +{
 +    unsigned t = FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4_2);
 +    return t >= 3 || (t == 0 && isar_feature_aa64_tgran4_lpa2(id));
 +}
 +
 +static inline bool isar_feature_aa64_tgran16_lpa2(const ARMISARegisters *id)
 +{
 +    return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN16) >= 2;
 +}
 +
 +static inline bool isar_feature_aa64_tgran16_2_lpa2(const ARMISARegisters *id)
 +{
 +    unsigned t = FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN16_2);
 +    return t >= 3 || (t == 0 && isar_feature_aa64_tgran16_lpa2(id));
 +}
 +
  static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
  {
-     if (dp) {
+     return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
-diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.c.inc
+--- a/target/arm/internals.h
-+++ b/target/arm/translate-neon.c.inc
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
+@@ -XXX,XX +XXX,XX @@ static inline uint32_t aarch64_pstate_valid_mask(const ARMISARegisters *id)
-         ofs ^= 8 - element_size;
+ typedef struct ARMVAParameters {
-     }
+     unsigned tsz    : 8;
- #endif
+     unsigned ps     : 3;
--    return neon_reg_offset(reg, 0) + ofs;
++    unsigned sh     : 2;
-+    return neon_full_reg_offset(reg) + ofs;
+     unsigned select : 1;
      bool tbi        : 1;
      bool epd        : 1;
@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
      bool using16k   : 1;
      bool using64k   : 1;
      bool tsz_oob    : 1;  /* tsz has been clamped to legal range */
 +    bool ds         : 1;
  } ARMVAParameters;
  ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx,
      } else {
          ret.base = extract64(value, 0, 37);
      }
 +    if (param.ds) {
 +        /*
 +         * With DS=1, BaseADDR is always shifted 16 so that it is able
 +         * to address all 52 va bits.  The input address is perforce
 +         * aligned on a 64k boundary regardless of translation granule.
 +         */
 +        page_shift = 16;
 +    }
      ret.base <<= page_shift;
      return ret;
@@ -XXX,XX +XXX,XX @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
      const int grainsize = stride + 3;
      int startsizecheck;
 -    /* Negative levels are never allowed.  */
 -    if (level < 0) {
 +    /*
 +     * Negative levels are usually not allowed...
 +     * Except for FEAT_LPA2, 4k page table, 52-bit address space, which
 +     * begins with level -1.  Note that previous feature tests will have
 +     * eliminated this combination if it is not enabled.
 +     */
 +    if (level < (inputsize == 52 && stride == 9 ? -1 : 0)) {
          return false;
      }
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
                                     ARMMMUIdx mmu_idx, bool data)
  {
      uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
 -    bool epd, hpd, using16k, using64k, tsz_oob;
 -    int select, tsz, tbi, max_tsz, min_tsz, ps;
 +    bool epd, hpd, using16k, using64k, tsz_oob, ds;
 +    int select, tsz, tbi, max_tsz, min_tsz, ps, sh;
 +    ARMCPU *cpu = env_archcpu(env);
      if (!regime_has_2_ranges(mmu_idx)) {
          select = 0;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
              hpd = extract32(tcr, 24, 1);
          }
          epd = false;
 +        sh = extract32(tcr, 12, 2);
          ps = extract32(tcr, 16, 3);
 +        ds = extract64(tcr, 32, 1);
      } else {
          /*
           * Bit 55 is always between the two regions, and is canonical for
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
          if (!select) {
              tsz = extract32(tcr, 0, 6);
              epd = extract32(tcr, 7, 1);
 +            sh = extract32(tcr, 12, 2);
              using64k = extract32(tcr, 14, 1);
              using16k = extract32(tcr, 15, 1);
              hpd = extract64(tcr, 41, 1);
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
              using64k = tg == 3;
              tsz = extract32(tcr, 16, 6);
              epd = extract32(tcr, 23, 1);
 +            sh = extract32(tcr, 28, 2);
              hpd = extract64(tcr, 42, 1);
          }
          ps = extract64(tcr, 32, 3);
 +        ds = extract64(tcr, 59, 1);
      }
 -    if (cpu_isar_feature(aa64_st, env_archcpu(env))) {
 +    if (cpu_isar_feature(aa64_st, cpu)) {
          max_tsz = 48 - using64k;
      } else {
          max_tsz = 39;
      }
 +    /*
 +     * DS is RES0 unless FEAT_LPA2 is supported for the given page size;
 +     * adjust the effective value of DS, as documented.
 +     */
      min_tsz = 16;
      if (using64k) {
 -        if (cpu_isar_feature(aa64_lva, env_archcpu(env))) {
 +        if (cpu_isar_feature(aa64_lva, cpu)) {
 +            min_tsz = 12;
 +        }
 +        ds = false;
 +    } else if (ds) {
 +        switch (mmu_idx) {
 +        case ARMMMUIdx_Stage2:
 +        case ARMMMUIdx_Stage2_S:
 +            if (using16k) {
 +                ds = cpu_isar_feature(aa64_tgran16_2_lpa2, cpu);
 +            } else {
 +                ds = cpu_isar_feature(aa64_tgran4_2_lpa2, cpu);
 +            }
 +            break;
 +        default:
 +            if (using16k) {
 +                ds = cpu_isar_feature(aa64_tgran16_lpa2, cpu);
 +            } else {
 +                ds = cpu_isar_feature(aa64_tgran4_lpa2, cpu);
 +            }
 +            break;
 +        }
 +        if (ds) {
              min_tsz = 12;
          }
      }
 -    /* TODO: FEAT_LPA2 */
      if (tsz > max_tsz) {
          tsz = max_tsz;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
      return (ARMVAParameters) {
          .tsz = tsz,
          .ps = ps,
 +        .sh = sh,
          .select = select,
          .tbi = tbi,
          .epd = epd,
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
          .using16k = using16k,
          .using64k = using64k,
          .tsz_oob = tsz_oob,
 +        .ds = ds,
      };
  }
- static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
+          * VTCR_EL2.SL0 field (whose interpretation depends on the page size)
-              * We cannot write 16 bytes at once because the
+          */
-              * destination is unaligned.
+         uint32_t sl0 = extract32(tcr->raw_tcr, 6, 2);
-              */
++        uint32_t sl2 = extract64(tcr->raw_tcr, 33, 1);
--            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+         uint32_t startlevel;
-+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
+         bool ok;
-, 8, tmp);
--            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
+-        if (!aarch64 || stride == 9) {
--                             neon_reg_offset(vd, 0), 8, 8);
++        /* SL2 is RES0 unless DS=1 & 4kb granule. */
-+            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
++        if (param.ds && stride == 9 && sl2) {
-+                             neon_full_reg_offset(vd), 8, 8);
++            if (sl0 != 0) {
-         } else {
++                level = 0;
--            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
++                fault_type = ARMFault_Translation;
-+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
++                goto do_fault;
-                                  vec_size, vec_size, tmp);
++            }
-         }
++            startlevel = -1;
-         tcg_gen_addi_i32(addr, addr, 1 << size);
++        } else if (!aarch64 || stride == 9) {
-@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
+             /* AArch32 or 4KB pages */
- static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
+             startlevel = 2 - sl0;
- {
-     int vec_size = a->q ? 16 : 8;
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
--    int rd_ofs = neon_reg_offset(a->vd, 0);
+      * for both v7 and v8.  However, for v8 the SBZ bits [47:40] must be 0
--    int rn_ofs = neon_reg_offset(a->vn, 0);
+      * or an AddressSize fault is raised.  So for v8 we extract those SBZ
--    int rm_ofs = neon_reg_offset(a->vm, 0);
+      * bits as part of the address, which will be checked via outputsize.
-+    int rd_ofs = neon_full_reg_offset(a->vd);
+-     * For AArch64, the address field always goes up to bit 47 (with extra
-+    int rn_ofs = neon_full_reg_offset(a->vn);
+-     * bits for FEAT_LPA placed elsewhere).  AArch64 implies v8.
-+    int rm_ofs = neon_full_reg_offset(a->vm);
++     * For AArch64, the address field goes up to bit 47, or 49 with FEAT_LPA2;
++     * the highest bits of a 52-bit output are placed elsewhere.
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+      */
-         return false;
+-    if (arm_feature(env, ARM_FEATURE_V8)) {
-@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
++    if (param.ds) {
- {
++        descaddrmask = MAKE_64BIT_MASK(0, 50);
-     /* Handle a 2-reg-shift insn which can be vectorized. */
++    } else if (arm_feature(env, ARM_FEATURE_V8)) {
-     int vec_size = a->q ? 16 : 8;
+         descaddrmask = MAKE_64BIT_MASK(0, 48);
--    int rd_ofs = neon_reg_offset(a->vd, 0);
+     } else {
--    int rm_ofs = neon_reg_offset(a->vm, 0);
+         descaddrmask = MAKE_64BIT_MASK(0, 40);
-+    int rd_ofs = neon_full_reg_offset(a->vd);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-+    int rm_ofs = neon_full_reg_offset(a->vm);
+         /*
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+          * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
-         return false;
+-         * of descriptor.  Otherwise, if descaddr is out of range, raise
-@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
+-         * AddressSizeFault.
- {
++         * of descriptor.  For FEAT_LPA2 and effective DS, bits [51:50] of
-     /* FP operations in 2-reg-and-shift group */
++         * descaddr are in [9:8].  Otherwise, if descaddr is out of range,
-     int vec_size = a->q ? 16 : 8;
++         * raise AddressSizeFault.
--    int rd_ofs = neon_reg_offset(a->vd, 0);
+          */
--    int rm_ofs = neon_reg_offset(a->vm, 0);
+         if (outputsize > 48) {
-+    int rd_ofs = neon_full_reg_offset(a->vd);
+-            descaddr |= extract64(descriptor, 12, 4) << 48;
-+    int rm_ofs = neon_full_reg_offset(a->vm);
++            if (param.ds) {
-     TCGv_ptr fpst;
++                descaddr |= extract64(descriptor, 8, 2) << 50;
++            } else {
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
++                descaddr |= extract64(descriptor, 12, 4) << 48;
-@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
++            }
-         return true;
+         } else if (descaddr >> outputsize) {
-     }
+             fault_type = ARMFault_AddressSize;
+             goto do_fault;
--    reg_ofs = neon_reg_offset(a->vd, 0);
+@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
-+    reg_ofs = neon_full_reg_offset(a->vd);
+         assert(attrindx <= 7);
-     vec_size = a->q ? 16 : 8;
+         cacheattrs->attrs = extract64(mair, attrindx * 8, 8);
-     imm = asimd_imm_const(a->imm, a->cmode, a->op);
+     }
+-    cacheattrs->shareability = extract32(attrs, 6, 2);
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
++
-         return true;
++    /*
-     }
++     * For FEAT_LPA2 and effective DS, the SH field in the attributes
++     * was re-purposed for output address bits.  The SH attribute in
--    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
++     * that case comes from TCR_ELx, which we extracted earlier.
--                       neon_reg_offset(a->vn, 0),
++     */
--                       neon_reg_offset(a->vm, 0),
++    if (param.ds) {
-+    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
++        cacheattrs->shareability = param.sh;
-+                       neon_full_reg_offset(a->vn),
++    } else {
-+                       neon_full_reg_offset(a->vm),
++        cacheattrs->shareability = extract32(attrs, 6, 2);
-, 16, 0, fn_gvec);
++    }
-     return true;
- }
+     *phys_ptr = descaddr;
-@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
+     *page_size_ptr = page_size;
  {
      /* Two registers and a scalar, using gvec */
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rn_ofs = neon_reg_offset(a->vn, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rn_ofs = neon_full_reg_offset(a->vn);
      int rm_ofs;
      int idx;
      TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
      /* a->vm is M:Vm, which encodes both register and index */
      idx = extract32(a->vm, a->size + 2, 2);
      a->vm = extract32(a->vm, 0, a->size + 2);
 -    rm_ofs = neon_reg_offset(a->vm, 0);
 +    rm_ofs = neon_full_reg_offset(a->vm);
      fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
      tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
          return true;
      }
 -    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
 +    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                           neon_element_offset(a->vm, a->index, a->size),
                           a->q ? 16 : 8, a->q ? 16 : 8);
      return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
  static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
  {
      int vec_size = a->q ? 16 : 8;
 -    int rd_ofs = neon_reg_offset(a->vd, 0);
 -    int rm_ofs = neon_reg_offset(a->vm, 0);
 +    int rd_ofs = neon_full_reg_offset(a->vd);
 +    int rm_ofs = neon_full_reg_offset(a->vm);
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return false;
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
      }
      tmp = load_reg(s, a->rt);
 -    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
 +    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                           vec_size, vec_size, tmp);
      tcg_temp_free_i32(tmp);
 --
-.20.1
+.25.1

-[PULL 12/26] target/arm: Fix float16 pairwise Neon ops on big-endian hosts
+[PULL 24/26] target/arm: Report KVM's actual PSCI version to guest in dtb
-In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
+When we're using KVM, the PSCI implementation is provided by the
-meant we were using the H4() address swizzler macro rather than the
+kernel, but QEMU has to tell the guest about it via the device tree.
-H2() which is required for 2-byte data.  This had no effect on
+Currently we look at the KVM_CAP_ARM_PSCI_0_2 capability to determine
-little-endian hosts but meant we put the result data into the
+if the kernel is providing at least PSCI 0.2, but if the kernel
-destination Dreg in the wrong order on big-endian hosts.
+provides a newer version than that we will still only tell the guest
 it has PSCI 0.2.  (This is fairly harmless; it just means the guest
 won't use newer parts of the PSCI API.)
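
For reference, a small sketch of how a PSCI version word might be classified for the device tree. The enum and function names are hypothetical; the version encoding (major in the upper 16 bits, minor in the lower 16 bits) and the two thresholds match the QEMU_PSCI_VERSION_* constants in the diff below.

```c
#include <assert.h>
#include <stdint.h>

#define PSCI_VERSION_0_2 0x00002u  /* major 0, minor 2 */
#define PSCI_VERSION_1_0 0x10000u  /* major 1, minor 0 */

/* The three classes a device tree "compatible" property can express. */
enum psci_dtb_class {
    PSCI_DTB_PRE_0_2,
    PSCI_DTB_0_2,
    PSCI_DTB_1_0,
};

static unsigned psci_major(uint32_t version) { return version >> 16; }
static unsigned psci_minor(uint32_t version) { return version & 0xffff; }

/* Versions compare correctly as plain integers in this encoding. */
static enum psci_dtb_class psci_dtb_class(uint32_t version)
{
    if (version < PSCI_VERSION_0_2) {
        return PSCI_DTB_PRE_0_2;
    }
    if (version < PSCI_VERSION_1_0) {
        return PSCI_DTB_0_2;
    }
    return PSCI_DTB_1_0;
}
```

Range comparisons, rather than equality tests against each known version, mean a kernel reporting a version newer than 1.1 still lands in the 1.0-compatible class.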
 The kernel exposes the specific PSCI version it is implementing via
 the ONE_REG API; use this to report in the dtb that the PSCI
 implementation is 1.0-compatible if appropriate.  (The device tree
 binding currently only distinguishes "pre-0.2", "0.2-compatible" and
 "1.0-compatible".)
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Marc Zyngier <maz@kernel.org>
+Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Andrew Jones <drjones@redhat.com>
-Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
+Message-id: 20220224134655.1207865-1-peter.maydell@linaro.org
 ---
- target/arm/vec_helper.c | 8 ++++----
+ target/arm/kvm-consts.h |  1 +
-file changed, 4 insertions(+), 4 deletions(-)
+ hw/arm/boot.c           |  5 ++---
  target/arm/kvm64.c      | 12 ++++++++++++
 files changed, 15 insertions(+), 3 deletions(-)
-diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
+diff --git a/target/arm/kvm-consts.h b/target/arm/kvm-consts.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/vec_helper.c
+--- a/target/arm/kvm-consts.h
-+++ b/target/arm/vec_helper.c
++++ b/target/arm/kvm-consts.h
-@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
+@@ -XXX,XX +XXX,XX @@ MISMATCH_CHECK(QEMU_PSCI_1_0_FN_PSCI_FEATURES, PSCI_1_0_FN_PSCI_FEATURES);
-         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
-         r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
+ #define QEMU_PSCI_VERSION_0_1                     0x00001
-                                                                         \
+ #define QEMU_PSCI_VERSION_0_2                     0x00002
--        d[H4(0)] = r0;                                                  \
++#define QEMU_PSCI_VERSION_1_0                     0x10000
--        d[H4(1)] = r1;                                                  \
+ #define QEMU_PSCI_VERSION_1_1                     0x10001
--        d[H4(2)] = r2;                                                  \
--        d[H4(3)] = r3;                                                  \
+ MISMATCH_CHECK(QEMU_PSCI_0_2_RET_TOS_MIGRATION_NOT_REQUIRED, PSCI_0_2_TOS_MP);
-+        d[H2(0)] = r0;                                                  \
+diff --git a/hw/arm/boot.c b/hw/arm/boot.c
-+        d[H2(1)] = r1;                                                  \
+index XXXXXXX..XXXXXXX 100644
-+        d[H2(2)] = r2;                                                  \
+--- a/hw/arm/boot.c
-+        d[H2(3)] = r3;                                                  \
++++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ static void fdt_add_psci_node(void *fdt)
      }
- DO_NEON_PAIRWISE(neon_padd, add)
+     qemu_fdt_add_subnode(fdt, "/psci");
 -    if (armcpu->psci_version == QEMU_PSCI_VERSION_0_2 ||
 -        armcpu->psci_version == QEMU_PSCI_VERSION_1_1) {
 -        if (armcpu->psci_version == QEMU_PSCI_VERSION_0_2) {
 +    if (armcpu->psci_version >= QEMU_PSCI_VERSION_0_2) {
 +        if (armcpu->psci_version < QEMU_PSCI_VERSION_1_0) {
              const char comp[] = "arm,psci-0.2\0arm,psci";
              qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
          } else {
 diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm64.c
 +++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
      uint64_t mpidr;
      ARMCPU *cpu = ARM_CPU(cs);
      CPUARMState *env = &cpu->env;
 +    uint64_t psciver;
      if (cpu->kvm_target == QEMU_KVM_ARM_TARGET_NONE ||
          !object_dynamic_cast(OBJECT(cpu), TYPE_AARCH64_CPU)) {
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
          }
      }
 +    /*
 +     * KVM reports the exact PSCI version it is implementing via a
 +     * special sysreg. If it is present, use its contents to determine
 +     * what to report to the guest in the dtb (it is the PSCI version,
 +     * in the same 15-bits major 16-bits minor format that PSCI_VERSION
 +     * returns).
 +     */
 +    if (!kvm_get_one_reg(cs, KVM_REG_ARM_PSCI_VERSION, &psciver)) {
 +        cpu->psci_version = psciver;
 +    }
 +
      /*
       * When KVM is in use, PSCI is emulated in-kernel and not by qemu.
       * Currently KVM has its own idea about MPIDR assignment, so we
 --
-.20.1
+.25.1

-[PULL 19/26] hw/display/omap_lcdc: Fix potential NULL pointer dereference
+[PULL 25/26] ui/cocoa.m: Fix updateUIInfo threading issues
-From: AlexChen <alex.chen@huawei.com>
+The updateUIInfo method makes Cocoa API calls.  It also calls back
 into QEMU functions like dpy_set_ui_info().  To do this safely, we
 need to follow two rules:
  * Cocoa API calls are made on the Cocoa UI thread
  * When calling back into QEMU we must hold the iothread lock
-In omap_lcd_interrupts(), the pointer omap_lcd is dereferenced before
+Fix the places where we got this wrong, by taking the iothread lock
-being checked for validity, which may lead to a NULL pointer dereference.
+while executing updateUIInfo, and moving the call in cocoa_switch()
-So move the assignment to surface after checking that the omap_lcd is valid
+inside the dispatch_async block.
 and move surface_bits_per_pixel(surface) to after the surface assignment.
-Reported-by: Euler Robot <euler.robot@huawei.com>
+Some of the Cocoa UI methods which call updateUIInfo are invoked as
-Signed-off-by: AlexChen <alex.chen@huawei.com>
+part of the initial application startup, while we're still doing the
-Message-id: 5F9CDB8A.9000001@huawei.com
+little cross-thread dance described in the comment just above
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+call_qemu_main().  This meant they were calling back into the QEMU UI
 layer before we'd actually finished initializing our display and
 registered the DisplayChangeListener, which isn't really valid.  Once
 updateUIInfo takes the iothread lock, we no longer get away with
 this, because during this startup phase the iothread lock is held by
 the QEMU main-loop thread which is waiting for us to finish our
 display initialization.  So we must suppress updateUIInfo until
 applicationDidFinishLaunching allows the QEMU main-loop thread to
 continue.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
+Tested-by: Akihiko Odaki <akihiko.odaki@gmail.com>
+Message-id: 20220224101330.967429-2-peter.maydell@linaro.org
 ---
- hw/display/omap_lcdc.c | 10 +++++++---
+ ui/cocoa.m | 25 ++++++++++++++++++++++---
-file changed, 7 insertions(+), 3 deletions(-)
+file changed, 22 insertions(+), 3 deletions(-)
-diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
+diff --git a/ui/cocoa.m b/ui/cocoa.m
 index XXXXXXX..XXXXXXX 100644
---- a/hw/display/omap_lcdc.c
+--- a/ui/cocoa.m
-+++ b/hw/display/omap_lcdc.c
++++ b/ui/cocoa.m
-@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
+@@ -XXX,XX +XXX,XX @@ QemuCocoaView *cocoaView;
- static void omap_update_display(void *opaque)
+     }
  }
 -- (void) updateUIInfo
 +- (void) updateUIInfoLocked
  {
-     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
++    /* Must be called with the iothread lock, i.e. via updateUIInfo */
--    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
+     NSSize frameSize;
-+    DisplaySurface *surface;
+     QemuUIInfo info;
-     draw_line_func draw_line;
-     int size, height, first, last;
+@@ -XXX,XX +XXX,XX @@ QemuCocoaView *cocoaView;
-     int width, linesize, step, bpp, frame_offset;
+     dpy_set_ui_info(dcl.con, &info, TRUE);
-     hwaddr frame_base;
+ }
--    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
++- (void) updateUIInfo
--        !surface_bits_per_pixel(surface)) {
++{
-+    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
++    if (!allow_events) {
 +        /*
 +         * Don't try to tell QEMU about UI information in the application
 +         * startup phase -- we haven't yet registered dcl with the QEMU UI
 +         * layer, and also trying to take the iothread lock would deadlock.
 +         * When cocoa_display_init() does register the dcl, the UI layer
 +         * will call cocoa_switch(), which will call updateUIInfo, so
 +         * we don't lose any information here.
 +         */
 +        return;
 +    }
 +
-+    surface = qemu_console_surface(omap_lcd->con);
++    with_iothread_lock(^{
-+    if (!surface_bits_per_pixel(surface)) {
++        [self updateUIInfoLocked];
-         return;
++    });
-     }
++}
++
  - (void)viewDidMoveToWindow
  {
      [self updateUIInfo];
@@ -XXX,XX +XXX,XX @@ static void cocoa_switch(DisplayChangeListener *dcl,
      COCOA_DEBUG("qemu_cocoa: cocoa_switch\n");
 -    [cocoaView updateUIInfo];
 -
      // The DisplaySurface will be freed as soon as this callback returns.
      // We take a reference to the underlying pixman image here so it does
      // not disappear from under our feet; the switchSurface method will
@@ -XXX,XX +XXX,XX @@ static void cocoa_switch(DisplayChangeListener *dcl,
      pixman_image_ref(image);
      dispatch_async(dispatch_get_main_queue(), ^{
 +        [cocoaView updateUIInfo];
          [cocoaView switchSurface:image];
      });
      [pool release];
 --
-.20.1
+.25.1

-[PULL 23/26] hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
+[PULL 26/26] ui/cocoa.m: Remove unnecessary NSAutoreleasePools
-In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
+In commit 6e657e64cdc478 in 2013 we added some autorelease pools to
-into the GICv3CPUState struct's maintenance_irq field.  This will
+deal with complaints from macOS when we made calls into Cocoa from
-only work if the board happens to have already wired up the CPU
+threads that didn't have automatically created autorelease pools.
-maintenance IRQ before the GIC was realized.  Unfortunately this is
+Later on, macOS got stricter about forbidding cross-thread Cocoa
-not the case for the 'virt' board, and so the value that gets copied
+calls, and in commit 5588840ff77800e839d8 we restructured the code to
-is NULL (since a qemu_irq is really a pointer to an IRQState struct
+avoid them.  This left the autorelease pool creation in several
-under the hood).  The effect is that the CPU interface code never
+functions without any purpose; delete it.
 actually raises the maintenance interrupt line.
-Instead, since the GICv3CPUState has a pointer to the CPUState, make
+We still need the pool in cocoa_refresh() for the clipboard related
-the dereference at the point where we want to raise the interrupt, to
+code which is called directly there.
 avoid an implicit requirement on board code to wire things up in a
 particular order.
-Reported-by: Jose Martins <josemartins90@gmail.com>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
+Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
-Reviewed-by: Luc Michel <luc@lmichel.fr>
+Tested-by: Akihiko Odaki <akihiko.odaki@gmail.com>
 Message-id: 20220224101330.967429-3-peter.maydell@linaro.org
 ---
- include/hw/intc/arm_gicv3_common.h | 1 -
+ ui/cocoa.m | 6 ------
- hw/intc/arm_gicv3_cpuif.c          | 5 ++---
+file changed, 6 deletions(-)
 files changed, 2 insertions(+), 4 deletions(-)
-diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
+diff --git a/ui/cocoa.m b/ui/cocoa.m
 index XXXXXXX..XXXXXXX 100644
---- a/include/hw/intc/arm_gicv3_common.h
+--- a/ui/cocoa.m
-+++ b/include/hw/intc/arm_gicv3_common.h
++++ b/ui/cocoa.m
-@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
+@@ -XXX,XX +XXX,XX @@ int main (int argc, char **argv) {
-     qemu_irq parent_fiq;
+ static void cocoa_update(DisplayChangeListener *dcl,
-     qemu_irq parent_virq;
+                          int x, int y, int w, int h)
-     qemu_irq parent_vfiq;
+ {
--    qemu_irq maintenance_irq;
+-    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
+-
-     /* Redistributor */
+     COCOA_DEBUG("qemu_cocoa: cocoa_update\n");
-     uint32_t level;                  /* Current IRQ level */
-diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
+     dispatch_async(dispatch_get_main_queue(), ^{
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@ static void cocoa_update(DisplayChangeListener *dcl,
---- a/hw/intc/arm_gicv3_cpuif.c
+         }
-+++ b/hw/intc/arm_gicv3_cpuif.c
+         [cocoaView setNeedsDisplayInRect:rect];
-@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
+     });
-     int irqlevel = 0;
+-
-     int fiqlevel = 0;
+-    [pool release];
      int maintlevel = 0;
 +    ARMCPU *cpu = ARM_CPU(cs->cpu);
      idx = hppvi_index(cs);
      trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
      qemu_set_irq(cs->parent_vfiq, fiqlevel);
      qemu_set_irq(cs->parent_virq, irqlevel);
 -    qemu_set_irq(cs->maintenance_irq, maintlevel);
 +    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
  }
- static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
+ static void cocoa_switch(DisplayChangeListener *dcl,
-@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
+                          DisplaySurface *surface)
-             && cpu->gic_num_lrs) {
+ {
-             int j;
+-    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
+     pixman_image_t *image = surface->image;
--            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
--
+     COCOA_DEBUG("qemu_cocoa: cocoa_switch\n");
-             cs->num_list_regs = cpu->gic_num_lrs;
+@@ -XXX,XX +XXX,XX @@ static void cocoa_switch(DisplayChangeListener *dcl,
-             cs->vpribits = cpu->gic_vpribits;
+         [cocoaView updateUIInfo];
-             cs->vprebits = cpu->gic_vprebits;
+         [cocoaView switchSurface:image];
      });
 -    [pool release];
  }
  static void cocoa_refresh(DisplayChangeListener *dcl)
 --
-.20.1
+.25.1

Small pile of bug fixes for rc1. I've included my patches to get
our docs building with Sphinx 3, just for convenience...

-- PMM

The following changes since commit b149dea55cce97cb226683d06af61984a1c11e96:

Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging (2020-11-02 10:57:48 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20201102

for you to fetch changes up to ffb4fbf90a2f63c9cb33e4bb9f854c79bf04ca4a:

tests/qtest/npcm7xx_rng-test: Disable randomness tests (2020-11-02 16:52:18 +0000)

----------------------------------------------------------------
target-arm queue:
 * target/arm: Fix Neon emulation bugs on big-endian hosts
 * target/arm: fix handling of HCR.FB
 * target/arm: fix LORID_EL1 access check
 * disas/capstone: Fix monitor disassembly of >32 bytes
 * hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
 * hw/arm/boot: fix SVE for EL3 direct kernel boot
 * hw/display/omap_lcdc: Fix potential NULL pointer dereference
 * hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
 * target/arm: Get correct MMU index for other-security-state
 * configure: Test that gio libs from pkg-config work
 * hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
 * docs: Fix building with Sphinx 3
 * tests/qtest/npcm7xx_rng-test: Disable randomness tests

----------------------------------------------------------------
AlexChen (2):
      hw/display/omap_lcdc: Fix potential NULL pointer dereference
      hw/display/exynos4210_fimd: Fix potential NULL pointer dereference

Peter Maydell (9):
      target/arm: Fix float16 pairwise Neon ops on big-endian hosts
      target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
      disas/capstone: Fix monitor disassembly of >32 bytes
      target/arm: Get correct MMU index for other-security-state
      configure: Test that gio libs from pkg-config work
      hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
      scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
      qemu-option-trace.rst.inc: Don't use option:: markup
      tests/qtest/npcm7xx_rng-test: Disable randomness tests

Philippe Mathieu-Daudé (1):
      hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)

Richard Henderson (11):
      target/arm: Introduce neon_full_reg_offset
      target/arm: Move neon_element_offset to translate.c
      target/arm: Use neon_element_offset in neon_load/store_reg
      target/arm: Use neon_element_offset in vfp_reg_offset
      target/arm: Add read/write_neon_element32
      target/arm: Expand read/write_neon_element32 to all MemOp
      target/arm: Rename neon_load_reg32 to vfp_load_reg32
      target/arm: Add read/write_neon_element64
      target/arm: Rename neon_load_reg64 to vfp_load_reg64
      target/arm: Simplify do_long_3d and do_2scalar_long
      target/arm: Improve do_prewiden_3d

Rémi Denis-Courmont (3):
      target/arm: fix handling of HCR.FB
      target/arm: fix LORID_EL1 access check
      hw/arm/boot: fix SVE for EL3 direct kernel boot

From: Richard Henderson <richard.henderson@linaro.org>

This function makes it clear that we're talking about the whole
register, and not the 32-bit piece at index 0.  This fixes a bug
when running on a big-endian host.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  8 ++++++
 target/arm/translate-neon.c.inc | 44 ++++++++++++++++-----------------
 target/arm/translate-vfp.c.inc  |  2 +-
 3 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
     unallocated_encoding(s);
 }
 
+/*
+ * Return the offset of a "full" NEON Dreg.
+ */
+static long neon_full_reg_offset(unsigned reg)
+{
+    return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ neon_element_offset(int reg, int element, MemOp size)
         ofs ^= 8 - element_size;
     }
 #endif
-    return neon_reg_offset(reg, 0) + ofs;
+    return neon_full_reg_offset(reg) + ofs;
 }
 
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
@@ -XXX,XX +XXX,XX @@ static bool trans_VLD_all_lanes(DisasContext *s, arg_VLD_all_lanes *a)
              * We cannot write 16 bytes at once because the
              * destination is unaligned.
              */
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  8, 8, tmp);
-            tcg_gen_gvec_mov(0, neon_reg_offset(vd + 1, 0),
-                             neon_reg_offset(vd, 0), 8, 8);
+            tcg_gen_gvec_mov(0, neon_full_reg_offset(vd + 1),
+                             neon_full_reg_offset(vd), 8, 8);
         } else {
-            tcg_gen_gvec_dup_i32(size, neon_reg_offset(vd, 0),
+            tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(vd),
                                  vec_size, vec_size, tmp);
         }
         tcg_gen_addi_i32(addr, addr, 1 << size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDST_single(DisasContext *s, arg_VLDST_single *a)
 static bool do_3same(DisasContext *s, arg_3same *a, GVecGen3Fn fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
 {
     /* Handle a 2-reg-shift insn which can be vectorized. */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_fp_2sh(DisasContext *s, arg_2reg_shift *a,
 {
     /* FP operations in 2-reg-and-shift group */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
     TCGv_ptr fpst;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_1reg_imm(DisasContext *s, arg_1reg_imm *a,
         return true;
     }
 
-    reg_ofs = neon_reg_offset(a->vd, 0);
+    reg_ofs = neon_full_reg_offset(a->vd);
     vec_size = a->q ? 16 : 8;
     imm = asimd_imm_const(a->imm, a->cmode, a->op);
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
         return true;
     }
 
-    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
-                       neon_reg_offset(a->vn, 0),
-                       neon_reg_offset(a->vm, 0),
+    tcg_gen_gvec_3_ool(neon_full_reg_offset(a->vd),
+                       neon_full_reg_offset(a->vn),
+                       neon_full_reg_offset(a->vm),
                        16, 16, 0, fn_gvec);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
 {
     /* Two registers and a scalar, using gvec */
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rn_ofs = neon_reg_offset(a->vn, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rn_ofs = neon_full_reg_offset(a->vn);
     int rm_ofs;
     int idx;
     TCGv_ptr fpstatus;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_fp_vec(DisasContext *s, arg_2scalar *a,
     /* a->vm is M:Vm, which encodes both register and index */
     idx = extract32(a->vm, a->size + 2, 2);
     a->vm = extract32(a->vm, 0, a->size + 2);
-    rm_ofs = neon_reg_offset(a->vm, 0);
+    rm_ofs = neon_full_reg_offset(a->vm);
 
     fpstatus = fpstatus_ptr(a->size == 1 ? FPST_STD_F16 : FPST_STD);
     tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, fpstatus,
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
         return true;
     }
 
-    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
+    tcg_gen_gvec_dup_mem(a->size, neon_full_reg_offset(a->vd),
                          neon_element_offset(a->vm, a->index, a->size),
                          a->q ? 16 : 8, a->q ? 16 : 8);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
 static bool do_2misc_vec(DisasContext *s, arg_2misc *a, GVecGen2Fn *fn)
 {
     int vec_size = a->q ? 16 : 8;
-    int rd_ofs = neon_reg_offset(a->vd, 0);
-    int rm_ofs = neon_reg_offset(a->vm, 0);
+    int rd_ofs = neon_full_reg_offset(a->vd);
+    int rm_ofs = neon_full_reg_offset(a->vm);
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP(DisasContext *s, arg_VDUP *a)
     }
 
     tmp = load_reg(s, a->rt);
-    tcg_gen_gvec_dup_i32(size, neon_reg_offset(a->vn, 0),
+    tcg_gen_gvec_dup_i32(size, neon_full_reg_offset(a->vn),
                          vec_size, vec_size, tmp);
     tcg_temp_free_i32(tmp);
 
-- 
2.20.1
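
The big-endian problem described in the commit message above can be illustrated with a small standalone model. This is only a sketch, not QEMU code: the function names are hypothetical, and it assumes that the old neon_reg_offset(reg, 0) resolved to the l.lower member of CPU_DoubleU, which sits in the second 32-bit word of the 64-bit value on a big-endian host.

```c
#include <stdbool.h>

/*
 * Hypothetical model: byte offset, relative to the start of one 64-bit
 * D register, of the 32-bit piece n (0 = least significant half).
 * Assumption: l.lower is at byte 0 on little-endian hosts and at byte 4
 * on big-endian hosts, with l.upper in the other word.
 */
static int old_piece_offset(int n, bool host_big_endian)
{
    int lower = host_big_endian ? 4 : 0;
    int upper = host_big_endian ? 0 : 4;
    return (n & 1) ? upper : lower;
}

/* The whole register starts at byte 0 of its own storage on any host. */
static int full_reg_offset_in_dreg(void)
{
    return 0;
}
```

In this model, "piece 0" and "start of the register" coincide only on a little-endian host. A gvec operation covering 8 or 16 bytes needs the start of the register, which is what neon_full_reg_offset() names explicitly.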

From: Richard Henderson <richard.henderson@linaro.org>

This will shortly have users outside of translate-neon.c.inc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          | 20 ++++++++++++++++++++
 target/arm/translate-neon.c.inc | 19 -------------------
 2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
 }
 
+/*
+ * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+ * where 0 is the least significant end of the register.
+ */
+static long neon_element_offset(int reg, int element, MemOp size)
+{
+    int element_size = 1 << size;
+    int ofs = element * element_size;
+#ifdef HOST_WORDS_BIGENDIAN
+    /*
+     * Calculate the offset assuming fully little-endian,
+     * then XOR to account for the order of the 8-byte units.
+     */
+    if (element_size < 8) {
+        ofs ^= 8 - element_size;
+    }
+#endif
+    return neon_full_reg_offset(reg) + ofs;
+}
+
 static inline long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
 #include "decode-neon-ls.c.inc"
 #include "decode-neon-shared.c.inc"
 
-/* Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
- * where 0 is the least significant end of the register.
- */
-static inline long
-neon_element_offset(int reg, int element, MemOp size)
-{
-    int element_size = 1 << size;
-    int ofs = element * element_size;
-#ifdef HOST_WORDS_BIGENDIAN
-    /* Calculate the offset assuming fully little-endian,
-     * then XOR to account for the order of the 8-byte units.
-     */
-    if (element_size < 8) {
-        ofs ^= 8 - element_size;
-    }
-#endif
-    return neon_full_reg_offset(reg) + ofs;
-}
-
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
 {
     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
-- 
2.20.1
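
The XOR adjustment in neon_element_offset() can be explored outside QEMU with the sketch below. It is a hypothetical standalone model: host endianness is passed as a runtime flag instead of the HOST_WORDS_BIGENDIAN macro, and the result is relative to the start of the register rather than to CPUARMState.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the offset calculation: element index and
 * log2(element size in bytes) in, byte offset from the start of the
 * register out.
 */
static int model_element_offset(int element, int size_log2,
                                bool host_big_endian)
{
    int element_size = 1 << size_log2;
    int ofs = element * element_size;

    /*
     * Compute the offset as if fully little-endian, then flip the
     * position inside each 8-byte unit for sub-64-bit elements.
     */
    if (host_big_endian && element_size < 8) {
        ofs ^= 8 - element_size;
    }
    return ofs;
}
```

Because the XOR mask is always smaller than 8, it only changes the low three bits of the offset. The element therefore stays inside its own 8-byte unit, and 64-bit elements are left untouched.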

From: Richard Henderson <richard.henderson@linaro.org>

These are the only users of neon_reg_offset, so remove that.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-/* Return the offset of a 32-bit piece of a NEON register.
-   zero is the least significant end of the register.  */
-static inline long
-neon_reg_offset (int reg, int n)
-{
-    int sreg;
-    sreg = reg * 2 + n;
-    return vfp_reg_offset(0, sreg);
-}
-
 static TCGv_i32 neon_load_reg(int reg, int pass)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
     return tmp;
 }
 
 static void neon_store_reg(int reg, int pass, TCGv_i32 var)
 {
-    tcg_gen_st_i32(var, cpu_env, neon_reg_offset(reg, pass));
+    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
     tcg_temp_free_i32(var);
 }
 
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

This seems a bit more readable than using offsetof CPU_DoubleU.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp size)
     return neon_full_reg_offset(reg) + ofs;
 }
 
-static inline long vfp_reg_offset(bool dp, unsigned reg)
+/* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
+static long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
-        return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+        return neon_element_offset(reg, 0, MO_64);
     } else {
-        long ofs = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
-        if (reg & 1) {
-            ofs += offsetof(CPU_DoubleU, l.upper);
-        } else {
-            ofs += offsetof(CPU_DoubleU, l.lower);
-        }
-        return ofs;
+        return neon_element_offset(reg >> 1, reg & 1, MO_32);
     }
 }
 
-- 
2.20.1
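
As a sanity check on the rewrite above, the following sketch compares the old and new Sreg offset calculations over a toy register file. Everything here is hypothetical: MODEL_ZREG_STRIDE stands in for the real size of one zregs[] entry, offsets are relative to the start of the register array, and the l.lower / l.upper placement is assumed to follow host endianness.

```c
#include <stdbool.h>

#define MODEL_ZREG_STRIDE 256   /* hypothetical size of one zregs[] entry */

/* Old scheme: index zregs[] and d[] directly, then pick l.lower or l.upper. */
static long model_old_sreg_offset(unsigned sreg, bool host_big_endian)
{
    long ofs = (long)(sreg >> 2) * MODEL_ZREG_STRIDE
             + (long)((sreg >> 1) & 1) * 8;
    long lower = host_big_endian ? 4 : 0;
    long upper = host_big_endian ? 0 : 4;

    return ofs + ((sreg & 1) ? upper : lower);
}

/* New scheme: Sreg n is 32-bit element (n & 1) of Dreg (n >> 1). */
static long model_new_sreg_offset(unsigned sreg, bool host_big_endian)
{
    unsigned dreg = sreg >> 1;
    long base = (long)(dreg >> 1) * MODEL_ZREG_STRIDE + (long)(dreg & 1) * 8;
    long ofs = (long)(sreg & 1) * 4;

    if (host_big_endian) {
        ofs ^= 4;               /* 8 - element_size for a 4-byte element */
    }
    return base + ofs;
}

/* Count the Sregs (0..63) for which the two schemes disagree. */
static int count_sreg_mismatches(bool host_big_endian)
{
    int mismatches = 0;

    for (unsigned sreg = 0; sreg < 64; sreg++) {
        if (model_old_sreg_offset(sreg, host_big_endian) !=
            model_new_sreg_offset(sreg, host_big_endian)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Under these assumptions the two schemes agree for every Sreg on both kinds of host, which is consistent with the patch being presented as a readability change.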

From: Richard Henderson <richard.henderson@linaro.org>

Model these off the aa64 read/write_vec_element functions.
Use them within translate-neon.c.inc.  The new functions do
not allocate or free temps, so this rearranges the calling
code a bit.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  26 ++++
 target/arm/translate-neon.c.inc | 256 ++++++++++++++++++++------------
 2 files changed, 183 insertions(+), 99 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_ld_i32(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+{
+    long off = neon_element_offset(reg, ele, size);
+
+    switch (size) {
+    case MO_32:
+        tcg_gen_st_i32(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
      * early. Since Q is 0 there are always just two passes, so instead
      * of a complicated loop over each pass we just unroll.
      */
-    tmp = neon_load_reg(a->vn, 0);
-    tmp2 = neon_load_reg(a->vn, 1);
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    tmp3 = tcg_temp_new_i32();
+
+    read_neon_element32(tmp, a->vn, 0, MO_32);
+    read_neon_element32(tmp2, a->vn, 1, MO_32);
     fn(tmp, tmp, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    tmp3 = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    read_neon_element32(tmp3, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     fn(tmp3, tmp3, tmp2);
-    tcg_temp_free_i32(tmp2);
 
-    neon_store_reg(a->vd, 0, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * 2-reg-and-shift operations, size < 3 case, where the
      * helper needs to be passed cpu_env.
      */
-    TCGv_i32 constimm;
+    TCGv_i32 constimm, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_32(DisasContext *s, arg_2reg_shift *a,
      * by immediate using the variable shift operations.
      */
     constimm = tcg_const_i32(dup_const(a->size, a->shift));
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(constimm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i64(-a->shift);
     rm1 = tcg_temp_new_i64();
     rm2 = tcg_temp_new_i64();
+    rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
     neon_load_reg64(rm1, a->vm);
     neon_load_reg64(rm2, a->vm + 1);
 
     shiftfn(rm1, rm1, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm1);
-    neon_store_reg(a->vd, 0, rd);
+    write_neon_element32(rd, a->vd, 0, MO_32);
 
     shiftfn(rm2, rm2, constimm);
-    rd = tcg_temp_new_i32();
     narrowfn(rd, cpu_env, rm2);
-    neon_store_reg(a->vd, 1, rd);
+    write_neon_element32(rd, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i64(rm1);
     tcg_temp_free_i64(rm2);
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     constimm = tcg_const_i32(imm);
 
     /* Load all inputs first to avoid potential overwrite */
-    rm1 = neon_load_reg(a->vm, 0);
-    rm2 = neon_load_reg(a->vm, 1);
-    rm3 = neon_load_reg(a->vm + 1, 0);
-    rm4 = neon_load_reg(a->vm + 1, 1);
+    rm1 = tcg_temp_new_i32();
+    rm2 = tcg_temp_new_i32();
+    rm3 = tcg_temp_new_i32();
+    rm4 = tcg_temp_new_i32();
+    read_neon_element32(rm1, a->vm, 0, MO_32);
+    read_neon_element32(rm2, a->vm, 1, MO_32);
+    read_neon_element32(rm3, a->vm, 2, MO_32);
+    read_neon_element32(rm4, a->vm, 3, MO_32);
     rtmp = tcg_temp_new_i64();
 
     shiftfn(rm1, rm1, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
     tcg_temp_free_i32(rm2);
 
     narrowfn(rm1, cpu_env, rtmp);
-    neon_store_reg(a->vd, 0, rm1);
+    write_neon_element32(rm1, a->vd, 0, MO_32);
+    tcg_temp_free_i32(rm1);
 
     shiftfn(rm3, rm3, constimm);
     shiftfn(rm4, rm4, constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_32(DisasContext *s, arg_2reg_shift *a,
 
     narrowfn(rm3, cpu_env, rtmp);
     tcg_temp_free_i64(rtmp);
-    neon_store_reg(a->vd, 1, rm3);
+    write_neon_element32(rm3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rm3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         widen_mask = dup_const(a->size + 1, widen_mask);
     }
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn0_64, a->vn);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 0);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 0, MO_32);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     if (src1_wide) {
         neon_load_reg64(rn1_64, a->vn + 1);
     } else {
-        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = neon_load_reg(a->vm, 1);
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rm, a->vm, 1, MO_32);
 
     neon_store_reg64(rn0_64, a->vd);
 
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
 
     narrowfn(rd1, rn_64);
 
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
 
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rn_64);
     tcg_temp_free_i64(rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i64();
     rd1 = tcg_temp_new_i64();
 
-    rn = neon_load_reg(a->vn, 0);
-    rm = neon_load_reg(a->vm, 0);
+    rn = tcg_temp_new_i32();
+    rm = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
+    read_neon_element32(rm, a->vm, 0, MO_32);
     opfn(rd0, rn, rm);
-    tcg_temp_free_i32(rn);
-    tcg_temp_free_i32(rm);
 
-    rn = neon_load_reg(a->vn, 1);
-    rm = neon_load_reg(a->vm, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
+    read_neon_element32(rm, a->vm, 1, MO_32);
     opfn(rd1, rn, rm);
     tcg_temp_free_i32(rn);
     tcg_temp_free_i32(rm);
@@ -XXX,XX +XXX,XX @@ static void gen_neon_dup_high16(TCGv_i32 var)
 
 static inline TCGv_i32 neon_get_scalar(int size, int reg)
 {
-    TCGv_i32 tmp;
-    if (size == 1) {
-        tmp = neon_load_reg(reg & 7, reg >> 4);
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    if (size == MO_16) {
+        read_neon_element32(tmp, reg & 7, reg >> 4, MO_32);
         if (reg & 8) {
             gen_neon_dup_high16(tmp);
         } else {
             gen_neon_dup_low16(tmp);
         }
     } else {
-        tmp = neon_load_reg(reg & 15, reg >> 4);
+        read_neon_element32(tmp, reg & 15, reg >> 4, MO_32);
     }
     return tmp;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
      * perform an accumulation operation of that result into the
      * destination.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, tmp;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    tmp = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
+        read_neon_element32(tmp, a->vn, pass, MO_32);
         opfn(tmp, tmp, scalar);
         if (accfn) {
-            TCGv_i32 rd = neon_load_reg(a->vd, pass);
+            TCGv_i32 rd = tcg_temp_new_i32();
+            read_neon_element32(rd, a->vd, pass, MO_32);
             accfn(tmp, rd, tmp);
             tcg_temp_free_i32(rd);
         }
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(scalar);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
      * performs a kind of fused op-then-accumulate using a helper
      * function that takes all of rd, rn and the scalar at once.
      */
-    TCGv_i32 scalar;
+    TCGv_i32 scalar, rn, rd;
     int pass;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
@@ -XXX,XX +XXX,XX @@ static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
     }
 
     scalar = neon_get_scalar(a->size, a->vm);
+    rn = tcg_temp_new_i32();
+    rd = tcg_temp_new_i32();
 
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 rn = neon_load_reg(a->vn, pass);
-        TCGv_i32 rd = neon_load_reg(a->vd, pass);
+        read_neon_element32(rn, a->vn, pass, MO_32);
+        read_neon_element32(rd, a->vd, pass, MO_32);
         opfn(rd, cpu_env, rn, scalar, rd);
-        tcg_temp_free_i32(rn);
-        neon_store_reg(a->vd, pass, rd);
+        write_neon_element32(rd, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rd);
     tcg_temp_free_i32(scalar);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     scalar = neon_get_scalar(a->size, a->vm);
 
     /* Load all inputs before writing any outputs, in case of overlap */
-    rn = neon_load_reg(a->vn, 0);
+    rn = tcg_temp_new_i32();
+    read_neon_element32(rn, a->vn, 0, MO_32);
     rn0_64 = tcg_temp_new_i64();
     opfn(rn0_64, rn, scalar);
-    tcg_temp_free_i32(rn);
 
-    rn = neon_load_reg(a->vn, 1);
+    read_neon_element32(rn, a->vn, 1, MO_32);
     rn1_64 = tcg_temp_new_i64();
     opfn(rn1_64, rn, scalar);
     tcg_temp_free_i32(rn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
         return false;
     }
     n <<= 3;
+    tmp = tcg_temp_new_i32();
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 0);
+        read_neon_element32(tmp, a->vd, 0, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp2 = neon_load_reg(a->vm, 0);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 0, MO_32);
     ptr1 = vfp_reg_ptr(true, a->vn);
     tmp4 = tcg_const_i32(n);
     gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
-    tcg_temp_free_i32(tmp);
+
     if (a->op) {
-        tmp = neon_load_reg(a->vd, 1);
+        read_neon_element32(tmp, a->vd, 1, MO_32);
     } else {
-        tmp = tcg_temp_new_i32();
         tcg_gen_movi_i32(tmp, 0);
     }
-    tmp3 = neon_load_reg(a->vm, 1);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 1, MO_32);
     gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(tmp4);
     tcg_temp_free_ptr(ptr1);
-    neon_store_reg(a->vd, 0, tmp2);
-    neon_store_reg(a->vd, 1, tmp3);
-    tcg_temp_free_i32(tmp);
+
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp2);
+    tcg_temp_free_i32(tmp3);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
 static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
 {
     int pass, half;
+    TCGv_i32 tmp[2];
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
         return true;
     }
 
-    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        TCGv_i32 tmp[2];
+    tmp[0] = tcg_temp_new_i32();
+    tmp[1] = tcg_temp_new_i32();
 
+    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
         for (half = 0; half < 2; half++) {
-            tmp[half] = neon_load_reg(a->vm, pass * 2 + half);
+            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
             switch (a->size) {
             case 0:
                 tcg_gen_bswap32_i32(tmp[half], tmp[half]);
@@ -XXX,XX +XXX,XX @@ static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
                 g_assert_not_reached();
             }
         }
-        neon_store_reg(a->vd, pass * 2, tmp[1]);
-        neon_store_reg(a->vd, pass * 2 + 1, tmp[0]);
+        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
+        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
     }
+
+    tcg_temp_free_i32(tmp[0]);
+    tcg_temp_free_i32(tmp[1]);
     return true;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
         rm0_64 = tcg_temp_new_i64();
         rm1_64 = tcg_temp_new_i64();
         rd_64 = tcg_temp_new_i64();
-        tmp = neon_load_reg(a->vm, pass * 2);
+
+        tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, pass * 2, MO_32);
         widenfn(rm0_64, tmp);
-        tcg_temp_free_i32(tmp);
-        tmp = neon_load_reg(a->vm, pass * 2 + 1);
+        read_neon_element32(tmp, a->vm, pass * 2 + 1, MO_32);
         widenfn(rm1_64, tmp);
         tcg_temp_free_i32(tmp);
+
         opfn(rd_64, rm0_64, rm1_64);
         tcg_temp_free_i64(rm0_64);
         tcg_temp_free_i64(rm1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     narrowfn(rd0, cpu_env, rm);
     neon_load_reg64(rm, a->vm + 1);
     narrowfn(rd1, cpu_env, rm);
-    neon_store_reg(a->vd, 0, rd0);
-    neon_store_reg(a->vd, 1, rd1);
+    write_neon_element32(rd0, a->vd, 0, MO_32);
+    write_neon_element32(rd1, a->vd, 1, MO_32);
+    tcg_temp_free_i32(rd0);
+    tcg_temp_free_i32(rd1);
     tcg_temp_free_i64(rm);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
     }
 
     rd = tcg_temp_new_i64();
+    rm0 = tcg_temp_new_i32();
+    rm1 = tcg_temp_new_i32();
 
-    rm0 = neon_load_reg(a->vm, 0);
-    rm1 = neon_load_reg(a->vm, 1);
+    read_neon_element32(rm0, a->vm, 0, MO_32);
+    read_neon_element32(rm1, a->vm, 1, MO_32);
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F16_F32(DisasContext *s, arg_2misc *a)
 
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
-    tmp = neon_load_reg(a->vm, 0);
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp2, tmp2, fpst, ahp);
     tcg_gen_shli_i32(tmp2, tmp2, 16);
     tcg_gen_or_i32(tmp2, tmp2, tmp);
-    tcg_temp_free_i32(tmp);
-    tmp = neon_load_reg(a->vm, 2);
+    read_neon_element32(tmp, a->vm, 2, MO_32);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp);
-    tmp3 = neon_load_reg(a->vm, 3);
-    neon_store_reg(a->vd, 0, tmp2);
+    tmp3 = tcg_temp_new_i32();
+    read_neon_element32(tmp3, a->vm, 3, MO_32);
+    write_neon_element32(tmp2, a->vd, 0, MO_32);
+    tcg_temp_free_i32(tmp2);
     gen_helper_vfp_fcvt_f32_to_f16(tmp3, tmp3, fpst, ahp);
     tcg_gen_shli_i32(tmp3, tmp3, 16);
     tcg_gen_or_i32(tmp3, tmp3, tmp);
-    neon_store_reg(a->vd, 1, tmp3);
+    write_neon_element32(tmp3, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_temp_free_i32(tmp);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_F32_F16(DisasContext *s, arg_2misc *a)
     fpst = fpstatus_ptr(FPST_STD);
     ahp = get_ahp_flag();
     tmp3 = tcg_temp_new_i32();
-    tmp = neon_load_reg(a->vm, 0);
-    tmp2 = neon_load_reg(a->vm, 1);
+    tmp2 = tcg_temp_new_i32();
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vm, 0, MO_32);
+    read_neon_element32(tmp2, a->vm, 1, MO_32);
     tcg_gen_ext16u_i32(tmp3, tmp);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 0, tmp3);
+    write_neon_element32(tmp3, a->vd, 0, MO_32);
     tcg_gen_shri_i32(tmp, tmp, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp);
-    neon_store_reg(a->vd, 1, tmp);
-    tmp3 = tcg_temp_new_i32();
+    write_neon_element32(tmp, a->vd, 1, MO_32);
+    tcg_temp_free_i32(tmp);
     tcg_gen_ext16u_i32(tmp3, tmp2);
     gen_helper_vfp_fcvt_f16_to_f32(tmp3, tmp3, fpst, ahp);
-    neon_store_reg(a->vd, 2, tmp3);
+    write_neon_element32(tmp3, a->vd, 2, MO_32);
+    tcg_temp_free_i32(tmp3);
     tcg_gen_shri_i32(tmp2, tmp2, 16);
     gen_helper_vfp_fcvt_f16_to_f32(tmp2, tmp2, fpst, ahp);
-    neon_store_reg(a->vd, 3, tmp2);
+    write_neon_element32(tmp2, a->vd, 3, MO_32);
+    tcg_temp_free_i32(tmp2);
     tcg_temp_free_i32(ahp);
     tcg_temp_free_ptr(fpst);
 
@@ -XXX,XX +XXX,XX @@ DO_2M_CRYPTO(SHA256SU0, aa32_sha2, 2)
 
 static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
 {
+    TCGv_i32 tmp;
     int pass;
 
     /* Handle a 2-reg-misc operation by iterating 32 bits at a time */
@@ -XXX,XX +XXX,XX @@ static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
         return true;
     }
 
+    tmp = tcg_temp_new_i32();
     for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-        TCGv_i32 tmp = neon_load_reg(a->vm, pass);
+        read_neon_element32(tmp, a->vm, pass, MO_32);
         fn(tmp, tmp);
-        neon_store_reg(a->vd, pass, tmp);
+        write_neon_element32(tmp, a->vd, pass, MO_32);
     }
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VTRN(DisasContext *s, arg_2misc *a)
         return true;
     }
 
-    if (a->size == 2) {
+    tmp = tcg_temp_new_i32();
+    tmp2 = tcg_temp_new_i32();
+    if (a->size == MO_32) {
         for (pass = 0; pass < (a->q ? 4 : 2); pass += 2) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass + 1);
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass + 1, tmp);
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass + 1, MO_32);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass + 1, MO_32);
         }
     } else {
         for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
-            tmp = neon_load_reg(a->vm, pass);
-            tmp2 = neon_load_reg(a->vd, pass);
-            if (a->size == 0) {
+            read_neon_element32(tmp, a->vm, pass, MO_32);
+            read_neon_element32(tmp2, a->vd, pass, MO_32);
+            if (a->size == MO_8) {
                 gen_neon_trn_u8(tmp, tmp2);
             } else {
                 gen_neon_trn_u16(tmp, tmp2);
             }
-            neon_store_reg(a->vm, pass, tmp2);
-            neon_store_reg(a->vd, pass, tmp);
+            write_neon_element32(tmp2, a->vm, pass, MO_32);
+            write_neon_element32(tmp, a->vd, pass, MO_32);
         }
     }
+    tcg_temp_free_i32(tmp);
+    tcg_temp_free_i32(tmp2);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

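Teach read_neon_element32 and write_neon_element32 to handle
8-bit and 16-bit elements as well as 32-bit ones. We can then
improve VMOV (scalar to gp) and VMOV (gp to scalar) so that
they simply perform the memory operation that we wanted, rather
than extracting from or inserting into a 32-bit quantity.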

These were the last uses of neon_load/store_reg, so remove them.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         | 50 +++++++++++++-----------
 target/arm/translate-vfp.c.inc | 71 +++++-----------------------------
 2 files changed, 37 insertions(+), 84 deletions(-)
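
Reviewer note, not part of the patch: a minimal standalone sketch of
the idea in the commit message, assuming a D register is simply a
uint64_t in host memory. It uses plain C loads and stores instead of
TCG ops, and every demo_* name is hypothetical (none exist in QEMU).
The "old" reader loads the containing 32-bit word, shifts and extends;
the "new" reader and writer touch only the element requested.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 64-bit D register held in host memory. */
typedef struct { uint64_t d; } DemoReg;

/* Byte offset of element `ele` (0 = least significant), 1 << size_log2 bytes each. */
static size_t demo_element_offset(int ele, int size_log2)
{
    size_t element_size = (size_t)1 << size_log2;
    size_t ofs = (size_t)ele * element_size;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    ofs ^= 8 - element_size;    /* same fixup idea as for big-endian hosts */
#endif
    return ofs;
}

/* Zero- or sign-extend the low (8 << size_log2) bits of v to 32 bits. */
static uint32_t demo_extend(uint32_t v, int size_log2, int is_signed)
{
    unsigned bits = 8u << size_log2;
    uint32_t mask;

    if (bits >= 32) {
        return v;
    }
    mask = (1u << bits) - 1;
    v &= mask;
    if (is_signed && (v >> (bits - 1))) {
        v |= ~mask;
    }
    return v;
}

/* Old shape: load the containing 32-bit word, shift, then extend. */
static uint32_t demo_read_old(const DemoReg *r, int index, int size_log2,
                              int is_signed)
{
    uint32_t byte_ofs = (uint32_t)index << size_log2;
    int pass = (byte_ofs >> 2) & 1;
    uint32_t word;

    memcpy(&word, (const unsigned char *)&r->d + demo_element_offset(pass, 2), 4);
    return demo_extend(word >> ((byte_ofs & 3) * 8), size_log2, is_signed);
}

/* New shape: load exactly the element that was asked for. */
static uint32_t demo_read_new(const DemoReg *r, int index, int size_log2,
                              int is_signed)
{
    const unsigned char *p =
        (const unsigned char *)&r->d + demo_element_offset(index, size_log2);
    uint8_t b;
    uint16_t h;
    uint32_t w;

    switch (size_log2) {
    case 0: memcpy(&b, p, 1); return demo_extend(b, 0, is_signed);
    case 1: memcpy(&h, p, 2); return demo_extend(h, 1, is_signed);
    case 2: memcpy(&w, p, 4); return w;
    default: assert(!"unsupported element size"); return 0;
    }
}

/* New shape for stores: write only the low bytes of val into the element. */
static void demo_write_new(DemoReg *r, int index, int size_log2, uint32_t val)
{
    unsigned char *p =
        (unsigned char *)&r->d + demo_element_offset(index, size_log2);
    uint8_t b = (uint8_t)val;
    uint16_t h = (uint16_t)val;

    switch (size_log2) {
    case 0: memcpy(p, &b, 1); break;
    case 1: memcpy(p, &h, 2); break;
    case 2: memcpy(p, &val, 4); break;
    default: assert(!"unsupported element size");
    }
}
```

Both readers should agree for every index, size and signedness; the
narrow store likewise leaves the neighbouring bytes alone without a
separate load and deposit step.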

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long neon_full_reg_offset(unsigned reg)
  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
  * where 0 is the least significant end of the register.
  */
-static long neon_element_offset(int reg, int element, MemOp size)
+static long neon_element_offset(int reg, int element, MemOp memop)
 {
-    int element_size = 1 << size;
+    int element_size = 1 << (memop & MO_SIZE);
     int ofs = element * element_size;
 #ifdef HOST_WORDS_BIGENDIAN
     /*
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static TCGv_i32 neon_load_reg(int reg, int pass)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, neon_element_offset(reg, pass, MO_32));
-    return tmp;
-}
-
-static void neon_store_reg(int reg, int pass, TCGv_i32 var)
-{
-    tcg_gen_st_i32(var, cpu_env, neon_element_offset(reg, pass, MO_32));
-    tcg_temp_free_i32(var);
-}
-
 static inline void neon_load_reg64(TCGv_i64 var, int reg)
 {
     tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
+static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
-    case MO_32:
+    switch (memop) {
+    case MO_SB:
+        tcg_gen_ld8s_i32(dest, cpu_env, off);
+        break;
+    case MO_UB:
+        tcg_gen_ld8u_i32(dest, cpu_env, off);
+        break;
+    case MO_SW:
+        tcg_gen_ld16s_i32(dest, cpu_env, off);
+        break;
+    case MO_UW:
+        tcg_gen_ld16u_i32(dest, cpu_env, off);
+        break;
+    case MO_UL:
+    case MO_SL:
         tcg_gen_ld_i32(dest, cpu_env, off);
         break;
     default:
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp size)
     }
 }
 
-static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp size)
+static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
-    long off = neon_element_offset(reg, ele, size);
+    long off = neon_element_offset(reg, ele, memop);
 
-    switch (size) {
+    switch (memop) {
+    case MO_8:
+        tcg_gen_st8_i32(src, cpu_env, off);
+        break;
+    case MO_16:
+        tcg_gen_st16_i32(src, cpu_env, off);
+        break;
     case MO_32:
         tcg_gen_st_i32(src, cpu_env, off);
         break;
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 {
     /* VMOV scalar to general purpose register */
     TCGv_i32 tmp;
-    int pass;
-    uint32_t offset;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
-    tmp = neon_load_reg(a->vn, pass);
-    switch (a->size) {
-    case 0:
-        if (offset) {
-            tcg_gen_shri_i32(tmp, tmp, offset);
-        }
-        if (a->u) {
-            gen_uxtb(tmp);
-        } else {
-            gen_sxtb(tmp);
-        }
-        break;
-    case 1:
-        if (a->u) {
-            if (offset) {
-                tcg_gen_shri_i32(tmp, tmp, 16);
-            } else {
-                gen_uxth(tmp);
-            }
-        } else {
-            if (offset) {
-                tcg_gen_sari_i32(tmp, tmp, 16);
-            } else {
-                gen_sxth(tmp);
-            }
-        }
-        break;
-    case 2:
-        break;
-    }
+    tmp = tcg_temp_new_i32();
+    read_neon_element32(tmp, a->vn, a->index, a->size | (a->u ? 0 : MO_SIGN));
     store_reg(s, a->rt, tmp);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_to_gp(DisasContext *s, arg_VMOV_to_gp *a)
 static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
 {
     /* VMOV general purpose register to scalar */
-    TCGv_i32 tmp, tmp2;
-    int pass;
-    uint32_t offset;
+    TCGv_i32 tmp;
 
-    /* SIZE == 2 is a VFP instruction; otherwise NEON.  */
-    if (a->size == 2
+    /* SIZE == MO_32 is a VFP instruction; otherwise NEON.  */
+    if (a->size == MO_32
         ? !dc_isar_feature(aa32_fpsp_v2, s)
         : !arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_from_gp(DisasContext *s, arg_VMOV_from_gp *a)
         return false;
     }
 
-    offset = a->index << a->size;
-    pass = extract32(offset, 2, 1);
-    offset = extract32(offset, 0, 2) * 8;
-
     if (!vfp_access_check(s)) {
         return true;
     }
 
     tmp = load_reg(s, a->rt);
-    switch (a->size) {
-    case 0:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 8);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 1:
-        tmp2 = neon_load_reg(a->vn, pass);
-        tcg_gen_deposit_i32(tmp, tmp2, tmp, offset, 16);
-        tcg_temp_free_i32(tmp2);
-        break;
-    case 2:
-        break;
-    }
-    neon_store_reg(a->vn, pass, tmp);
+    write_neon_element32(tmp, a->vn, a->index, a->size);
+    tcg_temp_free_i32(tmp);
 
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The only uses of these functions are for loading and storing
VFP single-precision values; they have nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |   4 +-
 target/arm/translate-vfp.c.inc | 184 ++++++++++++++++-----------------
 2 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
     tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
 }
 
-static inline void neon_load_reg32(TCGv_i32 var, int reg)
+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static inline void neon_store_reg32(TCGv_i32 var, int reg)
+static inline void vfp_store_reg32(TCGv_i32 var, int reg)
 {
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         frn = tcg_temp_new_i32();
         frm = tcg_temp_new_i32();
         dest = tcg_temp_new_i32();
-        neon_load_reg32(frn, rn);
-        neon_load_reg32(frm, rm);
+        vfp_load_reg32(frn, rn);
+        vfp_load_reg32(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i32(TCG_COND_EQ, dest, cpu_ZF, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         if (sz == 1) {
             tcg_gen_andi_i32(dest, dest, 0xffff);
         }
-        neon_store_reg32(dest, rd);
+        vfp_store_reg32(dest, rd);
         tcg_temp_free_i32(frn);
         tcg_temp_free_i32(frm);
         tcg_temp_free_i32(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i32 tcg_res;
         tcg_op = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_op, rm);
+        vfp_load_reg32(tcg_op, rm);
         if (sz == 1) {
             gen_helper_rinth(tcg_res, tcg_op, fpst);
         } else {
             gen_helper_rints(tcg_res, tcg_op, fpst);
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_op);
         tcg_temp_free_i32(tcg_res);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
             gen_helper_vfp_tould(tcg_res, tcg_double, tcg_shift, fpst);
         }
         tcg_gen_extrl_i64_i32(tcg_tmp, tcg_res);
-        neon_store_reg32(tcg_tmp, rd);
+        vfp_store_reg32(tcg_tmp, rd);
         tcg_temp_free_i32(tcg_tmp);
         tcg_temp_free_i64(tcg_res);
         tcg_temp_free_i64(tcg_double);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         TCGv_i32 tcg_single, tcg_res;
         tcg_single = tcg_temp_new_i32();
         tcg_res = tcg_temp_new_i32();
-        neon_load_reg32(tcg_single, rm);
+        vfp_load_reg32(tcg_single, rm);
         if (sz == 1) {
             if (is_signed) {
                 gen_helper_vfp_toslh(tcg_res, tcg_single, tcg_shift, fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
                 gen_helper_vfp_touls(tcg_res, tcg_single, tcg_shift, fpst);
             }
         }
-        neon_store_reg32(tcg_res, rd);
+        vfp_store_reg32(tcg_res, rd);
         tcg_temp_free_i32(tcg_res);
         tcg_temp_free_i32(tcg_single);
     }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_half(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
         store_reg(s, a->rt, tmp);
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
         tcg_gen_andi_i32(tmp, tmp, 0xffff);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     if (a->l) {
         /* VFP to general purpose register */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vn);
+        vfp_load_reg32(tmp, a->vn);
         if (a->rt == 15) {
             /* Set the 4 flag bits in the CPSR.  */
             gen_set_nzcv(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_single(DisasContext *s, arg_VMOV_single *a)
     } else {
         /* general purpose register to VFP */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vn);
+        vfp_store_reg32(tmp, a->vn);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_sp(DisasContext *s, arg_VMOV_64_sp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm);
+        vfp_load_reg32(tmp, a->vm);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm + 1);
+        vfp_load_reg32(tmp, a->vm + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm);
+        vfp_store_reg32(tmp, a->vm);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm + 1);
+        vfp_store_reg32(tmp, a->vm + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_64_dp(DisasContext *s, arg_VMOV_64_dp *a)
     if (a->op) {
         /* fpreg to gpreg */
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2);
+        vfp_load_reg32(tmp, a->vm * 2);
         store_reg(s, a->rt, tmp);
         tmp = tcg_temp_new_i32();
-        neon_load_reg32(tmp, a->vm * 2 + 1);
+        vfp_load_reg32(tmp, a->vm * 2 + 1);
         store_reg(s, a->rt2, tmp);
     } else {
         /* gpreg to fpreg */
         tmp = load_reg(s, a->rt);
-        neon_store_reg32(tmp, a->vm * 2);
+        vfp_store_reg32(tmp, a->vm * 2);
         tcg_temp_free_i32(tmp);
         tmp = load_reg(s, a->rt2);
-        neon_store_reg32(tmp, a->vm * 2 + 1);
+        vfp_store_reg32(tmp, a->vm * 2 + 1);
         tcg_temp_free_i32(tmp);
     }
 
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_hp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld16u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st16(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_sp(DisasContext *s, arg_VLDR_VSTR_sp *a)
     tmp = tcg_temp_new_i32();
     if (a->l) {
         gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-        neon_store_reg32(tmp, a->vd);
+        vfp_store_reg32(tmp, a->vd);
     } else {
-        neon_load_reg32(tmp, a->vd);
+        vfp_load_reg32(tmp, a->vd);
         gen_aa32_st32(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_sp(DisasContext *s, arg_VLDM_VSTM_sp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld32u(s, tmp, addr, get_mem_index(s));
-            neon_store_reg32(tmp, a->vd + i);
+            vfp_store_reg32(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg32(tmp, a->vd + i);
+            vfp_load_reg32(tmp, a->vd + i);
             gen_aa32_st32(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg32(fd, vd);
+            vfp_load_reg32(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vn = vfp_advance_sreg(vn, delta_d);
-        neon_load_reg32(f0, vn);
+        vfp_load_reg32(f0, vn);
         if (delta_m) {
             vm = vfp_advance_sreg(vm, delta_m);
-            neon_load_reg32(f1, vm);
+            vfp_load_reg32(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_hp(DisasContext *s, VFPGen3OpSPFn *fn,
     fd = tcg_temp_new_i32();
     fpst = fpstatus_ptr(FPST_FPCR_F16);
 
-    neon_load_reg32(f0, vn);
-    neon_load_reg32(f1, vm);
+    vfp_load_reg32(f0, vn);
+    vfp_load_reg32(f1, vm);
 
     if (reads_vd) {
-        neon_load_reg32(fd, vd);
+        vfp_load_reg32(fd, vd);
     }
     fn(fd, f0, f1, fpst);
-    neon_store_reg32(fd, vd);
+    vfp_store_reg32(fd, vd);
 
     tcg_temp_free_i32(f0);
     tcg_temp_free_i32(f1);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i32();
     fd = tcg_temp_new_i32();
 
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_sreg(vd, delta_d);
-                neon_store_reg32(fd, vd);
+                vfp_store_reg32(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_sreg(vd, delta_d);
         vm = vfp_advance_sreg(vm, delta_m);
-        neon_load_reg32(f0, vm);
+        vfp_load_reg32(f0, vm);
     }
 
     tcg_temp_free_i32(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     }
 
     f0 = tcg_temp_new_i32();
-    neon_load_reg32(f0, vm);
+    vfp_load_reg32(f0, vm);
     fn(f0, f0);
-    neon_store_reg32(f0, vd);
+    vfp_store_reg32(f0, vd);
     tcg_temp_free_i32(f0);
 
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negh(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negh(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_vfp_muladdh(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_sp(DisasContext *s, arg_VFMA_sp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i32();
 
-    neon_load_reg32(vn, a->vn);
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vn, a->vn);
+    vfp_load_reg32(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negs(vn, vn);
     }
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negs(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladds(vd, vn, vm, vd, fpst);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_hp(DisasContext *s, arg_VMOV_imm_sp *a)
     }
 
     fd = tcg_const_i32(vfp_expand_imm(MO_16, a->imm));
-    neon_store_reg32(fd, a->vd);
+    vfp_store_reg32(fd, a->vd);
     tcg_temp_free_i32(fd);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
     fd = tcg_const_i32(vfp_expand_imm(MO_32, a->imm));
 
     for (;;) {
-        neon_store_reg32(fd, vd);
+        vfp_store_reg32(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_sp(DisasContext *s, arg_VCMP_sp *a)
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i32();
 
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i32(vm, 0);
     } else {
-        neon_load_reg32(vm, a->vm);
+        vfp_load_reg32(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f32_f16(DisasContext *s, arg_VCVT_f32_f16 *a)
     /* The T bit tells us if we want the low or high 16 bits of Vm */
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     gen_helper_vfp_fcvt_f16_to_f32(tmp, tmp, fpst, ahp_mode);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f32(DisasContext *s, arg_VCVT_f16_f32 *a)
     ahp_mode = get_ahp_flag();
     tmp = tcg_temp_new_i32();
 
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     gen_helper_vfp_fcvt_f32_to_f16(tmp, tmp, fpst, ahp_mode);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
     tcg_temp_free_i32(ahp_mode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_hp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_sp(DisasContext *s, arg_VRINTR_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_hp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rinth(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_sp(DisasContext *s, arg_VRINTZ_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rints(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tcg_rmode);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_hp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     gen_helper_rinth_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_sp(DisasContext *s, arg_VRINTX_sp *a)
     }
 
     tmp = tcg_temp_new_i32();
-    neon_load_reg32(tmp, a->vm);
+    vfp_load_reg32(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rints_exact(tmp, tmp, fpst);
-    neon_store_reg32(tmp, a->vd);
+    vfp_store_reg32(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
     neon_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
     vm = tcg_temp_new_i64();
     neon_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     if (a->s) {
         /* i32 -> f16 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_hp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f16 */
         gen_helper_vfp_uitoh(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
     }
 
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f32 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_sp(DisasContext *s, arg_VCVT_int_sp *a)
         /* u32 -> f32 */
         gen_helper_vfp_uitos(vm, vm, fpst);
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
 
     vm = tcg_temp_new_i32();
     vd = tcg_temp_new_i64();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     if (a->s) {
         /* i32 -> f64 */
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
     vd = tcg_temp_new_i32();
     neon_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_i32(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_hp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i32();
-    neon_load_reg32(vd, a->vd);
+    vfp_load_reg32(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_sp(DisasContext *s, arg_VCVT_fix_sp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR_F16);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_hp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touih(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
 
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i32();
-    neon_load_reg32(vm, a->vm);
+    vfp_load_reg32(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp_int(DisasContext *s, arg_VCVT_sp_int *a)
             gen_helper_vfp_touis(vm, vm, fpst);
         }
     }
-    neon_store_reg32(vm, a->vd);
+    vfp_store_reg32(vm, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_ptr(fpst);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
             gen_helper_vfp_touid(vd, vm, fpst);
         }
     }
-    neon_store_reg32(vd, a->vd);
+    vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
     tcg_temp_free_i64(vm);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
     /* Insert low half of Vm into high half of Vd */
     rm = tcg_temp_new_i32();
     rd = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
-    neon_load_reg32(rd, a->vd);
+    vfp_load_reg32(rm, a->vm);
+    vfp_load_reg32(rd, a->vd);
     tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
-    neon_store_reg32(rd, a->vd);
+    vfp_store_reg32(rd, a->vd);
     tcg_temp_free_i32(rm);
     tcg_temp_free_i32(rd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOVX(DisasContext *s, arg_VINS *a)
 
     /* Set Vd to high half of Vm */
     rm = tcg_temp_new_i32();
-    neon_load_reg32(rm, a->vm);
+    vfp_load_reg32(rm, a->vm);
     tcg_gen_shri_i32(rm, rm, 16);
-    neon_store_reg32(rm, a->vd);
+    vfp_store_reg32(rm, a->vd);
     tcg_temp_free_i32(rm);
     return true;
 }
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.
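
As a rough illustration of why the two addressing styles are interchangeable
here, the sketch below models the register file as Q registers that each hold
two 64-bit D registers. The names ModelRegFile, model_dreg_offset and
model_element64_offset are hypothetical stand-ins, not QEMU APIs, and the
packed layout is a simplifying assumption. Under that model, addressing
D register (reg + pass) gives the same byte offset as addressing 64-bit
element pass of the Q-aligned register reg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each 128-bit Q register holds two 64-bit D registers. */
typedef struct {
    uint64_t d[2];
} ModelQReg;

typedef struct {
    ModelQReg q[16];
} ModelRegFile;

/* Byte offset of D register 'reg' (the old "reg + pass" addressing style). */
static size_t model_dreg_offset(int reg)
{
    return offsetof(ModelRegFile, q)
        + (size_t)(reg >> 1) * sizeof(ModelQReg)
        + (size_t)(reg & 1) * sizeof(uint64_t);
}

/* Byte offset of 64-bit element 'ele', counted from D register 'reg'. */
static size_t model_element64_offset(int reg, int ele)
{
    return model_dreg_offset(reg) + (size_t)ele * sizeof(uint64_t);
}

/* Returns 1 if both styles agree for every Q-aligned (even) register. */
static int model_styles_agree(void)
{
    for (int reg = 0; reg < 32; reg += 2) {
        for (int pass = 0; pass < 2; pass++) {
            if (model_dreg_offset(reg + pass) !=
                model_element64_offset(reg, pass)) {
                return 0;
            }
        }
    }
    return 1;
}
```

The check is restricted to even register numbers because that is the case
the Q-register passes in this patch rely on.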

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          | 26 +++++++++
 target/arm/translate-neon.c.inc | 94 ++++++++++++++++-----------------
 2 files changed, 73 insertions(+), 47 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
     }
 }
 
+static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_64:
+        tcg_gen_ld_i64(dest, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
     long off = neon_element_offset(reg, ele, memop);
@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
     }
 }
 
+static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
+{
+    long off = neon_element_offset(reg, ele, memop);
+
+    switch (memop) {
+    case MO_64:
+        tcg_gen_st_i64(src, cpu_env, off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 {
     TCGv_ptr ret = tcg_temp_new_ptr();
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
     for (pass = 0; pass < a->q + 1; pass++) {
         TCGv_i64 tmp = tcg_temp_new_i64();
 
-        neon_load_reg64(tmp, a->vm + pass);
+        read_neon_element64(tmp, a->vm, pass, MO_64);
         fn(tmp, cpu_env, tmp, constimm);
-        neon_store_reg64(tmp, a->vd + pass);
+        write_neon_element64(tmp, a->vd, pass, MO_64);
         tcg_temp_free_i64(tmp);
     }
     tcg_temp_free_i64(constimm);
@@ -XXX,XX +XXX,XX @@ static bool do_2shift_narrow_64(DisasContext *s, arg_2reg_shift *a,
     rd = tcg_temp_new_i32();
 
     /* Load both inputs first to avoid potential overwrite if rm == rd */
-    neon_load_reg64(rm1, a->vm);
-    neon_load_reg64(rm2, a->vm + 1);
+    read_neon_element64(rm1, a->vm, 0, MO_64);
+    read_neon_element64(rm2, a->vm, 1, MO_64);
 
     shiftfn(rm1, rm1, constimm);
     narrowfn(rd, cpu_env, rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd);
+    write_neon_element64(tmp, a->vd, 0, MO_64);
 
     widenfn(tmp, rm1);
     tcg_temp_free_i32(rm1);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
     }
-    neon_store_reg64(tmp, a->vd + 1);
+    write_neon_element64(tmp, a->vd, 1, MO_64);
     tcg_temp_free_i64(tmp);
     return true;
 }
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm_64 = tcg_temp_new_i64();
 
     if (src1_wide) {
-        neon_load_reg64(rn0_64, a->vn);
+        read_neon_element64(rn0_64, a->vn, 0, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
      * avoid incorrect results if a narrow input overlaps with the result.
      */
     if (src1_wide) {
-        neon_load_reg64(rn1_64, a->vn + 1);
+        read_neon_element64(rn1_64, a->vn, 1, MO_64);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rm = tcg_temp_new_i32();
     read_neon_element32(rm, a->vm, 1, MO_32);
 
-    neon_store_reg64(rn0_64, a->vd);
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
     widenfn(rm_64, rm);
     tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
-    neon_store_reg64(rn1_64, a->vd + 1);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rn_64, a->vn);
-    neon_load_reg64(rm_64, a->vm);
+    read_neon_element64(rn_64, a->vn, 0, MO_64);
+    read_neon_element64(rm_64, a->vm, 0, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
     narrowfn(rd0, rn_64);
 
-    neon_load_reg64(rn_64, a->vn + 1);
-    neon_load_reg64(rm_64, a->vm + 1);
+    read_neon_element64(rn_64, a->vn, 1, MO_64);
+    read_neon_element64(rm_64, a->vm, 1, MO_64);
 
     opfn(rn_64, rn_64, rm_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     /* Don't store results until after all loads: they might overlap */
     if (accfn) {
         tmp = tcg_temp_new_i64();
-        neon_load_reg64(tmp, a->vd);
+        read_neon_element64(tmp, a->vd, 0, MO_64);
         accfn(tmp, tmp, rd0);
-        neon_store_reg64(tmp, a->vd);
-        neon_load_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 0, MO_64);
+        read_neon_element64(tmp, a->vd, 1, MO_64);
         accfn(tmp, tmp, rd1);
-        neon_store_reg64(tmp, a->vd + 1);
+        write_neon_element64(tmp, a->vd, 1, MO_64);
         tcg_temp_free_i64(tmp);
     } else {
-        neon_store_reg64(rd0, a->vd);
-        neon_store_reg64(rd1, a->vd + 1);
+        write_neon_element64(rd0, a->vd, 0, MO_64);
+        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
     tcg_temp_free_i64(rd0);
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
 
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
-        neon_load_reg64(t64, a->vd);
+        read_neon_element64(t64, a->vd, 0, MO_64);
         accfn(t64, t64, rn0_64);
-        neon_store_reg64(t64, a->vd);
-        neon_load_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 0, MO_64);
+        read_neon_element64(t64, a->vd, 1, MO_64);
         accfn(t64, t64, rn1_64);
-        neon_store_reg64(t64, a->vd + 1);
+        write_neon_element64(t64, a->vd, 1, MO_64);
         tcg_temp_free_i64(t64);
     } else {
-        neon_store_reg64(rn0_64, a->vd);
-        neon_store_reg64(rn1_64, a->vd + 1);
+        write_neon_element64(rn0_64, a->vd, 0, MO_64);
+        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         right = tcg_temp_new_i64();
         dest = tcg_temp_new_i64();
 
-        neon_load_reg64(right, a->vn);
-        neon_load_reg64(left, a->vm);
+        read_neon_element64(right, a->vn, 0, MO_64);
+        read_neon_element64(left, a->vm, 0, MO_64);
         tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
-        neon_store_reg64(dest, a->vd);
+        write_neon_element64(dest, a->vd, 0, MO_64);
 
         tcg_temp_free_i64(left);
         tcg_temp_free_i64(right);
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
         destright = tcg_temp_new_i64();
 
         if (a->imm < 8) {
-            neon_load_reg64(right, a->vn);
-            neon_load_reg64(middle, a->vn + 1);
+            read_neon_element64(right, a->vn, 0, MO_64);
+            read_neon_element64(middle, a->vn, 1, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
-            neon_load_reg64(left, a->vm);
+            read_neon_element64(left, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
         } else {
-            neon_load_reg64(right, a->vn + 1);
-            neon_load_reg64(middle, a->vm);
+            read_neon_element64(right, a->vn, 1, MO_64);
+            read_neon_element64(middle, a->vm, 0, MO_64);
             tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
-            neon_load_reg64(left, a->vm + 1);
+            read_neon_element64(left, a->vm, 1, MO_64);
             tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
         }
 
-        neon_store_reg64(destright, a->vd);
-        neon_store_reg64(destleft, a->vd + 1);
+        write_neon_element64(destright, a->vd, 0, MO_64);
+        write_neon_element64(destleft, a->vd, 1, MO_64);
 
         tcg_temp_free_i64(destright);
         tcg_temp_free_i64(destleft);
@@ -XXX,XX +XXX,XX @@ static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
 
         if (accfn) {
             TCGv_i64 tmp64 = tcg_temp_new_i64();
-            neon_load_reg64(tmp64, a->vd + pass);
+            read_neon_element64(tmp64, a->vd, pass, MO_64);
             accfn(rd_64, tmp64, rd_64);
             tcg_temp_free_i64(tmp64);
         }
-        neon_store_reg64(rd_64, a->vd + pass);
+        write_neon_element64(rd_64, a->vd, pass, MO_64);
         tcg_temp_free_i64(rd_64);
     }
     return true;
@@ -XXX,XX +XXX,XX @@ static bool do_vmovn(DisasContext *s, arg_2misc *a,
     rd0 = tcg_temp_new_i32();
     rd1 = tcg_temp_new_i32();
 
-    neon_load_reg64(rm, a->vm);
+    read_neon_element64(rm, a->vm, 0, MO_64);
     narrowfn(rd0, cpu_env, rm);
-    neon_load_reg64(rm, a->vm + 1);
+    read_neon_element64(rm, a->vm, 1, MO_64);
     narrowfn(rd1, cpu_env, rm);
     write_neon_element32(rd0, a->vd, 0, MO_32);
     write_neon_element32(rd1, a->vd, 1, MO_32);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL(DisasContext *s, arg_2misc *a)
 
     widenfn(rd, rm0);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd);
+    write_neon_element64(rd, a->vd, 0, MO_64);
     widenfn(rd, rm1);
     tcg_gen_shli_i64(rd, rd, 8 << a->size);
-    neon_store_reg64(rd, a->vd + 1);
+    write_neon_element64(rd, a->vd, 1, MO_64);
 
     tcg_temp_free_i64(rd);
     tcg_temp_free_i32(rm0);
@@ -XXX,XX +XXX,XX @@ static bool trans_VSWP(DisasContext *s, arg_2misc *a)
     rm = tcg_temp_new_i64();
     rd = tcg_temp_new_i64();
     for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        neon_load_reg64(rm, a->vm + pass);
-        neon_load_reg64(rd, a->vd + pass);
-        neon_store_reg64(rm, a->vd + pass);
-        neon_store_reg64(rd, a->vm + pass);
+        read_neon_element64(rm, a->vm, pass, MO_64);
+        read_neon_element64(rd, a->vd, pass, MO_64);
+        write_neon_element64(rm, a->vd, pass, MO_64);
+        write_neon_element64(rd, a->vm, pass, MO_64);
     }
     tcg_temp_free_i64(rm);
     tcg_temp_free_i64(rd);
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

The only uses of these functions are for loading and storing VFP
double-precision values; they have nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c         |  8 ++--
 target/arm/translate-vfp.c.inc | 84 +++++++++++++++++-----------------
 2 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static inline void neon_load_reg64(TCGv_i64 var, int reg)
+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
-static inline void neon_store_reg64(TCGv_i64 var, int reg)
+static inline void vfp_store_reg64(TCGv_i64 var, int reg)
 {
-    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
+    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
 }
 
 static inline void vfp_load_reg32(TCGv_i32 var, int reg)
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
         tcg_gen_ext_i32_i64(nf, cpu_NF);
         tcg_gen_ext_i32_i64(vf, cpu_VF);
 
-        neon_load_reg64(frn, rn);
-        neon_load_reg64(frm, rm);
+        vfp_load_reg64(frn, rn);
+        vfp_load_reg64(frm, rm);
         switch (a->cc) {
         case 0: /* eq: Z */
             tcg_gen_movcond_i64(TCG_COND_EQ, dest, zf, zero,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSEL(DisasContext *s, arg_VSEL *a)
             tcg_temp_free_i64(tmp);
             break;
         }
-        neon_store_reg64(dest, rd);
+        vfp_store_reg64(dest, rd);
         tcg_temp_free_i64(frn);
         tcg_temp_free_i64(frm);
         tcg_temp_free_i64(dest);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINT(DisasContext *s, arg_VRINT *a)
         TCGv_i64 tcg_res;
         tcg_op = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
-        neon_load_reg64(tcg_op, rm);
+        vfp_load_reg64(tcg_op, rm);
         gen_helper_rintd(tcg_res, tcg_op, fpst);
-        neon_store_reg64(tcg_res, rd);
+        vfp_store_reg64(tcg_res, rd);
         tcg_temp_free_i64(tcg_op);
         tcg_temp_free_i64(tcg_res);
     } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT(DisasContext *s, arg_VCVT *a)
         tcg_double = tcg_temp_new_i64();
         tcg_res = tcg_temp_new_i64();
         tcg_tmp = tcg_temp_new_i32();
-        neon_load_reg64(tcg_double, rm);
+        vfp_load_reg64(tcg_double, rm);
         if (is_signed) {
             gen_helper_vfp_tosld(tcg_res, tcg_double, tcg_shift, fpst);
         } else {
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDR_VSTR_dp(DisasContext *s, arg_VLDR_VSTR_dp *a)
     tmp = tcg_temp_new_i64();
     if (a->l) {
         gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-        neon_store_reg64(tmp, a->vd);
+        vfp_store_reg64(tmp, a->vd);
     } else {
-        neon_load_reg64(tmp, a->vd);
+        vfp_load_reg64(tmp, a->vd);
         gen_aa32_st64(s, tmp, addr, get_mem_index(s));
     }
     tcg_temp_free_i64(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VLDM_VSTM_dp(DisasContext *s, arg_VLDM_VSTM_dp *a)
         if (a->l) {
             /* load */
             gen_aa32_ld64(s, tmp, addr, get_mem_index(s));
-            neon_store_reg64(tmp, a->vd + i);
+            vfp_store_reg64(tmp, a->vd + i);
         } else {
             /* store */
-            neon_load_reg64(tmp, a->vd + i);
+            vfp_load_reg64(tmp, a->vd + i);
             gen_aa32_st64(s, tmp, addr, get_mem_index(s));
         }
         tcg_gen_addi_i32(addr, addr, offset);
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     fd = tcg_temp_new_i64();
     fpst = fpstatus_ptr(FPST_FPCR);
 
-    neon_load_reg64(f0, vn);
-    neon_load_reg64(f1, vm);
+    vfp_load_reg64(f0, vn);
+    vfp_load_reg64(f1, vm);
 
     for (;;) {
         if (reads_vd) {
-            neon_load_reg64(fd, vd);
+            vfp_load_reg64(fd, vd);
         }
         fn(fd, f0, f1, fpst);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vn = vfp_advance_dreg(vn, delta_d);
-        neon_load_reg64(f0, vn);
+        vfp_load_reg64(f0, vn);
         if (delta_m) {
             vm = vfp_advance_dreg(vm, delta_m);
-            neon_load_reg64(f1, vm);
+            vfp_load_reg64(f1, vm);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     f0 = tcg_temp_new_i64();
     fd = tcg_temp_new_i64();
 
-    neon_load_reg64(f0, vm);
+    vfp_load_reg64(f0, vm);
 
     for (;;) {
         fn(fd, f0);
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
             /* single source one-many */
             while (veclen--) {
                 vd = vfp_advance_dreg(vd, delta_d);
-                neon_store_reg64(fd, vd);
+                vfp_store_reg64(fd, vd);
             }
             break;
         }
@@ -XXX,XX +XXX,XX @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
         veclen--;
         vd = vfp_advance_dreg(vd, delta_d);
         vm = vfp_advance_dreg(vm, delta_m);
-        neon_load_reg64(f0, vm);
+        vfp_load_reg64(f0, vm);
     }
 
     tcg_temp_free_i64(f0);
@@ -XXX,XX +XXX,XX @@ static bool do_vfm_dp(DisasContext *s, arg_VFMA_dp *a, bool neg_n, bool neg_d)
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i64();
 
-    neon_load_reg64(vn, a->vn);
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vn, a->vn);
+    vfp_load_reg64(vm, a->vm);
     if (neg_n) {
         /* VFNMS, VFMS */
         gen_helper_vfp_negd(vn, vn);
     }
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (neg_d) {
         /* VFNMA, VFNMS */
         gen_helper_vfp_negd(vd, vd);
     }
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_vfp_muladdd(vd, vn, vm, vd, fpst);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
 
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(vn);
@@ -XXX,XX +XXX,XX @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     fd = tcg_const_i64(vfp_expand_imm(MO_64, a->imm));
 
     for (;;) {
-        neon_store_reg64(fd, vd);
+        vfp_store_reg64(fd, vd);
 
         if (veclen == 0) {
             break;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCMP_dp(DisasContext *s, arg_VCMP_dp *a)
     vd = tcg_temp_new_i64();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
     if (a->z) {
         tcg_gen_movi_i64(vm, 0);
     } else {
-        neon_load_reg64(vm, a->vm);
+        vfp_load_reg64(vm, a->vm);
     }
 
     if (a->e) {
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f64_f16(DisasContext *s, arg_VCVT_f64_f16 *a)
     tcg_gen_ld16u_i32(tmp, cpu_env, vfp_f16_offset(a->vm, a->t));
     vd = tcg_temp_new_i64();
     gen_helper_vfp_fcvt_f16_to_f64(vd, tmp, fpst, ahp_mode);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(ahp_mode);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i32(tmp);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_f16_f64(DisasContext *s, arg_VCVT_f16_f64 *a)
     tmp = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
 
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvt_f64_to_f16(tmp, vm, fpst, ahp_mode);
     tcg_temp_free_i64(vm);
     tcg_gen_st16_i32(tmp, cpu_env, vfp_f16_offset(a->vd, a->t));
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTR_dp(DisasContext *s, arg_VRINTR_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTZ_dp(DisasContext *s, arg_VRINTZ_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     tcg_rmode = tcg_const_i32(float_round_to_zero);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
     gen_helper_rintd(tmp, tmp, fpst);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     tcg_temp_free_i32(tcg_rmode);
@@ -XXX,XX +XXX,XX @@ static bool trans_VRINTX_dp(DisasContext *s, arg_VRINTX_dp *a)
     }
 
     tmp = tcg_temp_new_i64();
-    neon_load_reg64(tmp, a->vm);
+    vfp_load_reg64(tmp, a->vm);
     fpst = fpstatus_ptr(FPST_FPCR);
     gen_helper_rintd_exact(tmp, tmp, fpst);
-    neon_store_reg64(tmp, a->vd);
+    vfp_store_reg64(tmp, a->vd);
     tcg_temp_free_ptr(fpst);
     tcg_temp_free_i64(tmp);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_sp(DisasContext *s, arg_VCVT_sp *a)
     vd = tcg_temp_new_i64();
     vfp_load_reg32(vm, a->vm);
     gen_helper_vfp_fcvtds(vd, vm, cpu_env);
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     return true;
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp(DisasContext *s, arg_VCVT_dp *a)
 
     vd = tcg_temp_new_i32();
     vm = tcg_temp_new_i64();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vfp_fcvtsd(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i32(vd);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_int_dp(DisasContext *s, arg_VCVT_int_dp *a)
         /* u32 -> f64 */
         gen_helper_vfp_uitod(vd, vm, fpst);
     }
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i32(vm);
     tcg_temp_free_i64(vd);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VJCVT(DisasContext *s, arg_VJCVT *a)
 
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
     gen_helper_vjcvt(vd, vm, cpu_env);
     vfp_store_reg32(vd, a->vd);
     tcg_temp_free_i64(vm);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
     frac_bits = (a->opc & 1) ? (32 - a->imm) : (16 - a->imm);
 
     vd = tcg_temp_new_i64();
-    neon_load_reg64(vd, a->vd);
+    vfp_load_reg64(vd, a->vd);
 
     fpst = fpstatus_ptr(FPST_FPCR);
     shift = tcg_const_i32(frac_bits);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_fix_dp(DisasContext *s, arg_VCVT_fix_dp *a)
         g_assert_not_reached();
     }
 
-    neon_store_reg64(vd, a->vd);
+    vfp_store_reg64(vd, a->vd);
     tcg_temp_free_i64(vd);
     tcg_temp_free_i32(shift);
     tcg_temp_free_ptr(fpst);
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
     fpst = fpstatus_ptr(FPST_FPCR);
     vm = tcg_temp_new_i64();
     vd = tcg_temp_new_i32();
-    neon_load_reg64(vm, a->vm);
+    vfp_load_reg64(vm, a->vm);
 
     if (a->s) {
         if (a->rz) {
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

In both cases, we can sink the write-back and perform
the accumulate into the normal destination temps.
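
As a rough illustration only (plain C rather than TCG, with hypothetical
names such as store_long_result() and acc_add() that do not exist in
QEMU), sinking the write-back means the optional accumulate updates the
result values in place and a single pair of stores serves both cases:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t acc_fn(uint64_t acc, uint64_t val);

/* Example accumulate operation (hypothetical, for illustration). */
static uint64_t acc_add(uint64_t acc, uint64_t val)
{
    return acc + val;
}

/*
 * Model of the restructured tail: if an accumulate function is given,
 * fold the old destination into the result values; then store once,
 * whether or not an accumulate happened.
 */
static void store_long_result(uint64_t vd[2], uint64_t r0, uint64_t r1,
                              acc_fn *accfn)
{
    assert(vd);
    if (accfn) {
        r0 = accfn(vd[0], r0);
        r1 = accfn(vd[1], r1);
    }
    vd[0] = r0;
    vd[1] = r1;
}
```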

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-neon.c.inc | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool do_long_3d(DisasContext *s, arg_3diff *a,
     if (accfn) {
         tmp = tcg_temp_new_i64();
         read_neon_element64(tmp, a->vd, 0, MO_64);
-        accfn(tmp, tmp, rd0);
-        write_neon_element64(tmp, a->vd, 0, MO_64);
+        accfn(rd0, tmp, rd0);
         read_neon_element64(tmp, a->vd, 1, MO_64);
-        accfn(tmp, tmp, rd1);
-        write_neon_element64(tmp, a->vd, 1, MO_64);
+        accfn(rd1, tmp, rd1);
         tcg_temp_free_i64(tmp);
-    } else {
-        write_neon_element64(rd0, a->vd, 0, MO_64);
-        write_neon_element64(rd1, a->vd, 1, MO_64);
     }
 
+    write_neon_element64(rd0, a->vd, 0, MO_64);
+    write_neon_element64(rd1, a->vd, 1, MO_64);
     tcg_temp_free_i64(rd0);
     tcg_temp_free_i64(rd1);
 
@@ -XXX,XX +XXX,XX @@ static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
     if (accfn) {
         TCGv_i64 t64 = tcg_temp_new_i64();
         read_neon_element64(t64, a->vd, 0, MO_64);
-        accfn(t64, t64, rn0_64);
-        write_neon_element64(t64, a->vd, 0, MO_64);
+        accfn(rn0_64, t64, rn0_64);
         read_neon_element64(t64, a->vd, 1, MO_64);
-        accfn(t64, t64, rn1_64);
-        write_neon_element64(t64, a->vd, 1, MO_64);
+        accfn(rn1_64, t64, rn1_64);
         tcg_temp_free_i64(t64);
-    } else {
-        write_neon_element64(rn0_64, a->vd, 0, MO_64);
-        write_neon_element64(rn1_64, a->vd, 1, MO_64);
     }
+
+    write_neon_element64(rn0_64, a->vd, 0, MO_64);
+    write_neon_element64(rn1_64, a->vd, 1, MO_64);
     tcg_temp_free_i64(rn0_64);
     tcg_temp_free_i64(rn1_64);
     return true;
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

We can use proper widening loads to extend 32-bit inputs,
and skip the "widenfn" step.
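
For illustration, the two kinds of widening load can be modelled in
plain C as below. This is only a sketch: load32_sext() and load32_zext()
are hypothetical names, standing in for what a sign-extending (MO_SL)
and a zero-extending (MO_UL) 32-bit load produce in a 64-bit temp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a sign-extending 32-bit load (cf. MO_SL). */
static int64_t load32_sext(const void *p)
{
    int32_t v;

    assert(p);
    memcpy(&v, p, sizeof(v));
    return (int64_t)v;      /* upper 32 bits copy the sign bit */
}

/* Hypothetical model of a zero-extending 32-bit load (cf. MO_UL). */
static uint64_t load32_zext(const void *p)
{
    uint32_t v;

    assert(p);
    memcpy(&v, p, sizeof(v));
    return (uint64_t)v;     /* upper 32 bits are zero */
}
```

With the extension folded into the load, no separate 32-bit temp or
widening step is needed for the size == 2 case.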

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate.c          |  6 +++
 target/arm/translate-neon.c.inc | 66 ++++++++++++++++++---------------
 2 files changed, 43 insertions(+), 29 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
     long off = neon_element_offset(reg, ele, memop);
 
     switch (memop) {
+    case MO_SL:
+        tcg_gen_ld32s_i64(dest, cpu_env, off);
+        break;
+    case MO_UL:
+        tcg_gen_ld32u_i64(dest, cpu_env, off);
+        break;
     case MO_Q:
         tcg_gen_ld_i64(dest, cpu_env, off);
         break;
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
 static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
                            NeonGenWidenFn *widenfn,
                            NeonGenTwo64OpFn *opfn,
-                           bool src1_wide)
+                           int src1_mop, int src2_mop)
 {
     /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VAADW/VSUBW) */
     TCGv_i64 rn0_64, rn1_64, rm_64;
-    TCGv_i32 rm;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return false;
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
         return false;
     }
 
-    if (!widenfn || !opfn) {
+    if (!opfn) {
         /* size == 3 case, which is an entirely different insn group */
         return false;
     }
 
-    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+    if ((a->vd & 1) || (src1_mop == MO_Q && (a->vn & 1))) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     rn1_64 = tcg_temp_new_i64();
     rm_64 = tcg_temp_new_i64();
 
-    if (src1_wide) {
-        read_neon_element64(rn0_64, a->vn, 0, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn0_64, a->vn, 0, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 0, MO_32);
         widenfn(rn0_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 0, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 0, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 0, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn0_64, rn0_64, rm_64);
 
     /*
      * Load second pass inputs before storing the first pass result, to
      * avoid incorrect results if a narrow input overlaps with the result.
      */
-    if (src1_wide) {
-        read_neon_element64(rn1_64, a->vn, 1, MO_64);
+    if (src1_mop >= 0) {
+        read_neon_element64(rn1_64, a->vn, 1, src1_mop);
     } else {
         TCGv_i32 tmp = tcg_temp_new_i32();
         read_neon_element32(tmp, a->vn, 1, MO_32);
         widenfn(rn1_64, tmp);
         tcg_temp_free_i32(tmp);
     }
-    rm = tcg_temp_new_i32();
-    read_neon_element32(rm, a->vm, 1, MO_32);
+    if (src2_mop >= 0) {
+        read_neon_element64(rm_64, a->vm, 1, src2_mop);
+    } else {
+        TCGv_i32 tmp = tcg_temp_new_i32();
+        read_neon_element32(tmp, a->vm, 1, MO_32);
+        widenfn(rm_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
 
     write_neon_element64(rn0_64, a->vd, 0, MO_64);
 
-    widenfn(rm_64, rm);
-    tcg_temp_free_i32(rm);
     opfn(rn1_64, rn1_64, rm_64);
     write_neon_element64(rn1_64, a->vd, 1, MO_64);
 
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
     return true;
 }
 
-#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+#define DO_PREWIDEN(INSN, S, OP, SRC1WIDE, SIGN)                        \
     static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
     {                                                                   \
         static NeonGenWidenFn * const widenfn[] = {                     \
             gen_helper_neon_widen_##S##8,                               \
             gen_helper_neon_widen_##S##16,                              \
-            tcg_gen_##EXT##_i32_i64,                                    \
-            NULL,                                                       \
+            NULL, NULL,                                                 \
         };                                                              \
         static NeonGenTwo64OpFn * const addfn[] = {                     \
             gen_helper_neon_##OP##l_u16,                                \
@@ -XXX,XX +XXX,XX @@ static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
             tcg_gen_##OP##_i64,                                         \
             NULL,                                                       \
         };                                                              \
-        return do_prewiden_3d(s, a, widenfn[a->size],                   \
-                              addfn[a->size], SRC1WIDE);                \
+        int narrow_mop = a->size == MO_32 ? MO_32 | SIGN : -1;          \
+        return do_prewiden_3d(s, a, widenfn[a->size], addfn[a->size],   \
+                              SRC1WIDE ? MO_Q : narrow_mop,             \
+                              narrow_mop);                              \
     }
 
-DO_PREWIDEN(VADDL_S, s, ext, add, false)
-DO_PREWIDEN(VADDL_U, u, extu, add, false)
-DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
-DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
-DO_PREWIDEN(VADDW_S, s, ext, add, true)
-DO_PREWIDEN(VADDW_U, u, extu, add, true)
-DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
-DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+DO_PREWIDEN(VADDL_S, s, add, false, MO_SIGN)
+DO_PREWIDEN(VADDL_U, u, add, false, 0)
+DO_PREWIDEN(VSUBL_S, s, sub, false, MO_SIGN)
+DO_PREWIDEN(VSUBL_U, u, sub, false, 0)
+DO_PREWIDEN(VADDW_S, s, add, true, MO_SIGN)
+DO_PREWIDEN(VADDW_U, u, add, true, 0)
+DO_PREWIDEN(VSUBW_S, s, sub, true, MO_SIGN)
+DO_PREWIDEN(VSUBW_U, u, sub, true, 0)
 
 static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
                          NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
-- 
2.20.1

In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
meant we were using the H4() address swizzler macro rather than the
H2() which is required for 2-byte data.  This had no effect on
little-endian hosts but meant we put the result data into the
destination Dreg in the wrong order on big-endian hosts.
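
A small sketch of the effect, assuming the big-endian-host definitions
of the swizzle macros are XOR with 3 for H2() and XOR with 1 for H4()
(they are the identity on little-endian hosts). The _BE macro names and
the slot16_*() helpers are made up for this example:

```c
#include <assert.h>

/* Assumed big-endian-host forms of the swizzle macros. */
#define H2_BE(x) ((x) ^ 3)   /* index into an array of 16-bit elements */
#define H4_BE(x) ((x) ^ 1)   /* index into an array of 32-bit elements */

/* Host array slot for logical 16-bit element i of a 64-bit Dreg. */
static int slot16_correct(int i)
{
    assert(i >= 0 && i < 4);
    return H2_BE(i);
}

/* Slot the old code used: the 32-bit swizzle applied to 16-bit data. */
static int slot16_buggy(int i)
{
    assert(i >= 0 && i < 4);
    return H4_BE(i);
}
```

Under that assumption the correct mapping reverses all four halfwords,
while the buggy one only swaps adjacent pairs, so every result landed
in the wrong slot.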

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-2-peter.maydell@linaro.org
---
 target/arm/vec_helper.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ DO_ABA(gvec_uaba_d, uint64_t)
         r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst);                    \
         r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst);                    \
                                                                         \
-        d[H4(0)] = r0;                                                  \
-        d[H4(1)] = r1;                                                  \
-        d[H4(2)] = r2;                                                  \
-        d[H4(3)] = r3;                                                  \
+        d[H2(0)] = r0;                                                  \
+        d[H2(1)] = r1;                                                  \
+        d[H2(2)] = r2;                                                  \
+        d[H2(3)] = r3;                                                  \
     }
 
 DO_NEON_PAIRWISE(neon_padd, add)
-- 
2.20.1

The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index selects a group of four 8-bit values,
so each indexed entity is 32 bits wide, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
---
 target/arm/vec_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     int8_t *n = vn;
-    int8_t *m_indexed = (int8_t *)vm + index * 4;
+    int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
@@ -XXX,XX +XXX,XX @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t index = simd_data(desc);
     uint32_t *d = vd;
     uint8_t *n = vn;
-    uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+    uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
 
     /* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
      * Otherwise opr_sz is a multiple of 16.
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

HCR should be applied when NS is set, not when it is cleared.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void tlbimvaa_is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
 /*
  * Non-IS variants of TLB operations are upgraded to
- * IS versions if we are at NS EL1 and HCR_EL2.FB is set to
+ * IS versions if we are at EL1 and HCR_EL2.FB is effectively set to
  * force broadcast of these operations.
  */
 static bool tlb_force_broadcast(CPUARMState *env)
 {
-    return (env->cp15.hcr_el2 & HCR_FB) &&
-        arm_current_el(env) == 1 && arm_is_secure_below_el3(env);
+    return arm_current_el(env) == 1 && (arm_hcr_el2_eff(env) & HCR_FB);
 }
 
 static void tlbiall_write(CPUARMState *env, const ARMCPRegInfo *ri,
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
future HCR_EL2.TLOR when S-EL2 is enabled.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t id_aa64pfr0_read(CPUARMState *env, const ARMCPRegInfo *ri)
 #endif
 
 /* Shared logic between LORID and the rest of the LOR* registers.
- * Secure state has already been delt with.
+ * Secure state exclusion has already been dealt with.
  */
-static CPAccessResult access_lor_ns(CPUARMState *env)
+static CPAccessResult access_lor_ns(CPUARMState *env,
+                                    const ARMCPRegInfo *ri, bool isread)
 {
     int el = arm_current_el(env);
 
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_ns(CPUARMState *env)
     return CP_ACCESS_OK;
 }
 
-static CPAccessResult access_lorid(CPUARMState *env, const ARMCPRegInfo *ri,
-                                   bool isread)
-{
-    if (arm_is_secure_below_el3(env)) {
-        /* Access ok in secure mode.  */
-        return CP_ACCESS_OK;
-    }
-    return access_lor_ns(env);
-}
-
 static CPAccessResult access_lor_other(CPUARMState *env,
                                        const ARMCPRegInfo *ri, bool isread)
 {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult access_lor_other(CPUARMState *env,
         /* Access denied in secure mode.  */
         return CP_ACCESS_TRAP;
     }
-    return access_lor_ns(env);
+    return access_lor_ns(env, ri, isread);
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lor_reginfo[] = {
       .type = ARM_CP_CONST, .resetvalue = 0 },
     { .name = "LORID_EL1", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .crn = 10, .crm = 4, .opc2 = 7,
-      .access = PL1_R, .accessfn = access_lorid,
+      .access = PL1_R, .accessfn = access_lor_ns,
       .type = ARM_CP_CONST, .resetvalue = 0 },
     REGINFO_SENTINEL
 };
-- 
2.20.1

If we're using the capstone disassembler, disassembly of a run of
instructions more than 32 bytes long disassembles the wrong data for
instructions beyond the 32 byte mark:

(qemu) xp /16x 0x100
0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
(qemu) xp /16i 0x100
0x00000100: 00000005 andeq r0, r0, r5
0x00000104: 54410001 strbpl r0, [r1], #-1
0x00000108: 00000001 andeq r0, r0, r1
0x0000010c: 00001000 andeq r1, r0, r0
0x00000110: 00000000 andeq r0, r0, r0
0x00000114: 00000004 andeq r0, r0, r4
0x00000118: 54410002 strbpl r0, [r1], #-2
0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x00000120: 54410001 strbpl r0, [r1], #-1
0x00000124: 00000001 andeq r0, r0, r1
0x00000128: 00001000 andeq r1, r0, r0
0x0000012c: 00000000 andeq r0, r0, r0
0x00000130: 00000004 andeq r0, r0, r4
0x00000134: 54410002 strbpl r0, [r1], #-2
0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x0000013c: 00000000 andeq r0, r0, r0

Here the disassembly of 0x120..0x13f is using the data that is in
0x104..0x123.

This is caused by passing the wrong value to the read_memory_func().
The intention is that at this point in the loop the 'cap_buf' buffer
already contains 'csize' bytes of data for the instruction at guest
addr 'pc', and we want to read in an extra 'tsize' bytes.  Those
extra bytes are therefore at 'pc + csize', not 'pc'.  On the first
time through the loop 'csize' happens to be zero, so the initial read
of 32 bytes into cap_buf is correct and as long as the disassembly
never needs to read more data we return the correct information.

Use the correct guest address in the call to read_memory_func().
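
The buffer bookkeeping can be sketched in isolation as follows. This is
a simplified model, not the capstone glue code: guest_mem, read_guest()
and refill() are hypothetical stand-ins for guest memory, for
read_memory_func() and for the refill step of the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest memory and reader. */
static uint8_t guest_mem[64];

static void read_guest(uint64_t addr, uint8_t *dst, size_t len)
{
    assert(addr + len <= sizeof(guest_mem));
    memcpy(dst, guest_mem + addr, len);
}

/*
 * buf already holds csize valid bytes for guest address pc; append
 * tsize more.  The new bytes follow the buffered ones, so they must be
 * read from pc + csize, not from pc.  Returns the new valid byte count.
 */
static size_t refill(uint8_t *buf, size_t csize, size_t tsize, uint64_t pc)
{
    read_guest(pc + csize, buf + csize, tsize);
    return csize + tsize;
}
```

Reading from pc instead would only be right while csize is zero, which
is why the first 32 bytes disassembled correctly.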

Cc: qemu-stable@nongnu.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201022132445.25039-1-peter.maydell@linaro.org
---
 disas/capstone.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/disas/capstone.c b/disas/capstone.c
index XXXXXXX..XXXXXXX 100644
--- a/disas/capstone.c
+++ b/disas/capstone.c
@@ -XXX,XX +XXX,XX @@ bool cap_disas_monitor(disassemble_info *info, uint64_t pc, int count)
 
         /* Make certain that we can make progress.  */
         assert(tsize != 0);
-        info->read_memory_func(pc, cap_buf + csize, tsize, info);
+        info->read_memory_func(pc + csize, cap_buf + csize, tsize, info);
         csize += tsize;
 
         if (cs_disasm_iter(handle, &cbuf, &csize, &pc, insn)) {
-- 
2.20.1

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):

CID 1432363 (#1 of 1): Unintentional integer overflow:

overflow_before_widen:
    Potentially overflowing expression 1 << scale with type int
    (32 bits, signed) is evaluated using 32-bit arithmetic, and
    then used in a context that expects an expression of type
    hwaddr (64 bits, unsigned).
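
A standalone sketch of the widened computation. MY_BIT_ULL() is a local
stand-in assumed to behave like BIT_ULL() (1ULL shifted left), and
range_pages() is a hypothetical helper, not a function in smmuv3.c:

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for BIT_ULL(): the shift is done on a 64-bit value. */
#define MY_BIT_ULL(nr) (1ULL << (nr))

/* Number of pages covered by a range command with the given fields. */
static uint64_t range_pages(unsigned num, unsigned scale)
{
    assert(scale < 64);
    return (uint64_t)(num + 1) * MY_BIT_ULL(scale);
}
```

With the original expression, a scale of 31 would shift a signed 32-bit
1 into the sign bit before the result was ever widened.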

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20201030144617.1535064-1-philmd@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/smmuv3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bitops.h"
 #include "hw/irq.h"
 #include "hw/sysbus.h"
 #include "migration/vmstate.h"
@@ -XXX,XX +XXX,XX @@ static void smmuv3_s1_range_inval(SMMUState *s, Cmd *cmd)
         scale = CMD_SCALE(cmd);
         num = CMD_NUM(cmd);
         ttl = CMD_TTL(cmd);
-        num_pages = (num + 1) * (1 << (scale));
+        num_pages = (num + 1) * BIT_ULL(scale);
     }
 
     if (type == SMMU_CMD_TLBI_NH_VA) {
-- 
2.20.1

From: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>

When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
that SVE will not trap to EL3.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030151541.11976-1-remi@remlab.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ static void do_cpu_reset(void *opaque)
                     if (cpu_isar_feature(aa64_mte, cpu)) {
                         env->cp15.scr_el3 |= SCR_ATA;
                     }
+                    if (cpu_isar_feature(aa64_sve, cpu)) {
+                        env->cp15.cptr_el[3] |= CPTR_EZ;
+                    }
                     /* AArch64 kernels never boot in secure mode */
                     assert(!info->secure_boot);
                     /* This hook is only supported for AArch32 currently:
-- 
2.20.1

From: AlexChen <alex.chen@huawei.com>

In omap_update_display(), the pointer omap_lcd is dereferenced before
it is checked for validity, which may lead to a NULL pointer dereference.
So move the assignment to surface to after the check that omap_lcd is valid,
and move the surface_bits_per_pixel(surface) test to after that assignment.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: AlexChen <alex.chen@huawei.com>
Message-id: 5F9CDB8A.9000001@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/omap_lcdc.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/display/omap_lcdc.c b/hw/display/omap_lcdc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/omap_lcdc.c
+++ b/hw/display/omap_lcdc.c
@@ -XXX,XX +XXX,XX @@ static void omap_lcd_interrupts(struct omap_lcd_panel_s *s)
 static void omap_update_display(void *opaque)
 {
     struct omap_lcd_panel_s *omap_lcd = (struct omap_lcd_panel_s *) opaque;
-    DisplaySurface *surface = qemu_console_surface(omap_lcd->con);
+    DisplaySurface *surface;
     draw_line_func draw_line;
     int size, height, first, last;
     int width, linesize, step, bpp, frame_offset;
     hwaddr frame_base;
 
-    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable ||
-        !surface_bits_per_pixel(surface)) {
+    if (!omap_lcd || omap_lcd->plm == 1 || !omap_lcd->enable) {
+        return;
+    }
+
+    surface = qemu_console_surface(omap_lcd->con);
+    if (!surface_bits_per_pixel(surface)) {
         return;
     }
 
-- 
2.20.1

From: AlexChen <alex.chen@huawei.com>

In exynos4210_fimd_update(), the pointer s is dereferenced before
it is checked for validity, which may lead to a NULL pointer dereference.
So move the assignment to global_width to after the check that s is valid.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 5F9F8D88.9030102@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/display/exynos4210_fimd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/display/exynos4210_fimd.c
+++ b/hw/display/exynos4210_fimd.c
@@ -XXX,XX +XXX,XX @@ static void exynos4210_fimd_update(void *opaque)
     bool blend = false;
     uint8_t *host_fb_addr;
     bool is_dirty = false;
-    const int global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
+    int global_width;
 
     if (!s || !s->console || !s->enabled ||
         surface_bits_per_pixel(qemu_console_surface(s->console)) == 0) {
         return;
     }
+
+    global_width = (s->vidtcon[2] & FIMD_VIDTCON2_SIZE_MASK) + 1;
     exynos4210_update_resolution(s);
     surface = qemu_console_surface(s->console);
 
-- 
2.20.1

In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
This is incorrect when the security state being queried is not the
current one, because arm_current_el() uses the current security state
to determine which of the banked CONTROL.nPRIV bits to look at.
The effect was that if (for instance) Secure state was in privileged
mode but Non-Secure was not then we would return the wrong MMU index.

The only places where we are using this function in a way that could
trigger this bug are for the stack loads during a v8M function-return
and for the instruction fetch of a v8M SG insn.

Fix the bug by expanding out the M-profile version of the
arm_current_el() logic inline so it can use the passed in secstate
rather than env->v7m.secure.
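
The privilege rule being expanded inline can be modelled like this. The
struct and function below are a simplified, hypothetical model (not
CPUARMState): handler mode is always privileged, and otherwise bit 0
(nPRIV) of the CONTROL register banked for the queried security state
decides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the banked M-profile state. */
struct m_state {
    bool handler_mode;      /* exception handlers run privileged */
    uint32_t control[2];    /* [0] = Non-secure, [1] = Secure; bit 0 = nPRIV */
};

/* Privilege for the requested security state, not the current one. */
static bool is_priv_for_secstate(const struct m_state *st, bool secstate)
{
    assert(st);
    return st->handler_mode || !(st->control[secstate] & 1);
}
```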

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201022164408.13214-1-peter.maydell@linaro.org
---
 target/arm/m_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
 /* Return the MMU index for a v7M CPU in the specified security state */
 ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 {
-    bool priv = arm_current_el(env) != 0;
+    bool priv = arm_v7m_is_handler_mode(env) ||
+        !(env->v7m.control[secstate] & 1);
 
     return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
 }
-- 
2.20.1

On some hosts (e.g. Ubuntu Bionic) pkg-config returns a set of
libraries for gio-2.0 which don't actually work when compiling
statically. (Specifically, the returned library string includes
-lmount, but not -lblkid which -lmount depends upon, so linking
fails due to missing symbols.)

Check that the libraries work, and don't enable gio if they don't,
in the same way we do for gnutls.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200928160402.7961-1-peter.maydell@linaro.org
---
 configure | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index XXXXXXX..XXXXXXX 100755
--- a/configure
+++ b/configure
@@ -XXX,XX +XXX,XX @@ if test "$static" = yes && test "$mingw32" = yes; then
 fi
 
 if $pkg_config --atleast-version=$glib_req_ver gio-2.0; then
-    gio=yes
     gio_cflags=$($pkg_config --cflags gio-2.0)
     gio_libs=$($pkg_config --libs gio-2.0)
     gdbus_codegen=$($pkg_config --variable=gdbus_codegen gio-2.0)
     if [ ! -x "$gdbus_codegen" ]; then
         gdbus_codegen=
     fi
+    # Check that the libraries actually work -- Ubuntu 18.04 ships
+    # with pkg-config --static --libs data for gio-2.0 that is missing
+    # -lblkid and will give a link error.
+    write_c_skeleton
+    if compile_prog "" "$gio_libs" ; then
+        gio=yes
+    else
+        gio=no
+    fi
 else
     gio=no
 fi
-- 
2.20.1

In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
into the GICv3CPUState struct's maintenance_irq field.  This will
only work if the board happens to have already wired up the CPU
maintenance IRQ before the GIC was realized.  Unfortunately this is
not the case for the 'virt' board, and so the value that gets copied
is NULL (since a qemu_irq is really a pointer to an IRQState struct
under the hood).  The effect is that the CPU interface code never
actually raises the maintenance interrupt line.

Instead, since the GICv3CPUState has a pointer to the CPUState, make
the dereference at the point where we want to raise the interrupt, to
avoid an implicit requirement on board code to wire things up in a
particular order.
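
The ordering problem and the fix can be shown with a toy model. The
types and functions here (cpu_model, cpuif_model, cpuif_init(),
cpuif_set_maint()) are hypothetical and much simpler than the real
ARMCPU and GICv3CPUState; an int pointer stands in for the qemu_irq.

```c
#include <assert.h>
#include <stddef.h>

struct cpu_model {
    int *maint_line;            /* NULL until the board wires it up */
};

struct cpuif_model {
    struct cpu_model *cpu;      /* keep the CPU, not a copy of its line */
};

static void cpuif_init(struct cpuif_model *cs, struct cpu_model *cpu)
{
    assert(cs && cpu);
    cs->cpu = cpu;              /* no snapshot of maint_line taken here */
}

static void cpuif_set_maint(struct cpuif_model *cs, int level)
{
    int *line = cs->cpu->maint_line;    /* looked up at the point of use */

    if (line) {
        *line = level;
    }
}
```

Because the line is fetched when it is raised, wiring it after
cpuif_init() still works, which is the situation on the 'virt' board.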

Reported-by: Jose Martins <josemartins90@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
Reviewed-by: Luc Michel <luc@lmichel.fr>
---
 include/hw/intc/arm_gicv3_common.h | 1 -
 hw/intc/arm_gicv3_cpuif.c          | 5 ++---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -XXX,XX +XXX,XX @@ struct GICv3CPUState {
     qemu_irq parent_fiq;
     qemu_irq parent_virq;
     qemu_irq parent_vfiq;
-    qemu_irq maintenance_irq;
 
     /* Redistributor */
     uint32_t level;                  /* Current IRQ level */
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
     int irqlevel = 0;
     int fiqlevel = 0;
     int maintlevel = 0;
+    ARMCPU *cpu = ARM_CPU(cs->cpu);
 
     idx = hppvi_index(cs);
     trace_gicv3_cpuif_virt_update(gicv3_redist_affid(cs), idx);
@@ -XXX,XX +XXX,XX @@ static void gicv3_cpuif_virt_update(GICv3CPUState *cs)
 
     qemu_set_irq(cs->parent_vfiq, fiqlevel);
     qemu_set_irq(cs->parent_virq, irqlevel);
-    qemu_set_irq(cs->maintenance_irq, maintlevel);
+    qemu_set_irq(cpu->gicv3_maintenance_interrupt, maintlevel);
 }
 
 static uint64_t icv_ap_read(CPUARMState *env, const ARMCPRegInfo *ri)
@@ -XXX,XX +XXX,XX @@ void gicv3_init_cpuif(GICv3State *s)
             && cpu->gic_num_lrs) {
             int j;
 
-            cs->maintenance_irq = cpu->gicv3_maintenance_interrupt;
-
             cs->num_list_regs = cpu->gic_num_lrs;
             cs->vpribits = cpu->gic_vpribits;
             cs->vprebits = cpu->gic_vprebits;
-- 
2.20.1

The kerneldoc script currently emits Sphinx markup for a macro with
arguments that uses the c:function directive. This is correct for
Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
documentation of macros with arguments and c:function is not picky
about the syntax of what it is passed. However, in Sphinx 3 the
c:macro directive was enhanced to support macros with arguments,
and c:function was made more picky about what syntax it accepted.

When kerneldoc is told that it needs to produce output for Sphinx
3 or later, make it emit c:function only for functions and c:macro
for macros with arguments. We assume that anything with a return
type is a function and anything without is a macro.

This fixes the Sphinx error:

/home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
If declarator-id with parameters (e.g., 'void f(int arg)'):
  Invalid C declaration: Expected identifier in nested name. [error at 25]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    -------------------------^
If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
  Error in declarator or parameters
  Invalid C declaration: Expecting "(" in parameters. [error at 39]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    ---------------------------------------^

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-2-peter.maydell@linaro.org
---
 scripts/kernel-doc | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index XXXXXXX..XXXXXXX 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -XXX,XX +XXX,XX @@ sub output_function_rst(%) {
 	output_highlight_rst($args{'purpose'});
 	$start = "\n\n**Syntax**\n\n  ``";
     } else {
-	print ".. c:function:: ";
+        if ((split(/\./, $sphinx_version))[0] >= 3) {
+            # Sphinx 3 and later distinguish macros and functions and
+            # complain if you use c:function with something that's not
+            # syntactically valid as a function declaration.
+            # We assume that anything with a return type is a function
+            # and anything without is a macro.
+            if ($args{'functiontype'} ne "") {
+                print ".. c:function:: ";
+            } else {
+                print ".. c:macro:: ";
+            }
+        } else {
+            # Older Sphinx don't support documenting macros that take
+            # arguments with c:macro, and don't complain about the use
+            # of c:function for this.
+            print ".. c:function:: ";
+        }
     }
     if ($args{'functiontype'} ne "") {
 	$start .= $args{'functiontype'} . " " . $args{'function'} . " (";
-- 
2.20.1

Sphinx 3.2 is pickier than earlier versions about the option:: markup,
and complains about our usage in qemu-option-trace.rst:

../../docs/qemu-option-trace.rst.inc:4:Malformed option description
  '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
  "/opt args" or "+opt args"

In this file, we're really trying to document the different parts of
the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
have already introduced with an option:: markup.  So it's not right
to use option:: here anyway.  Switch to a different markup
(definition lists) which gives about the same formatted output.

(Unlike option::, this markup doesn't produce index entries; but
at the moment we don't do anything much with indexes anyway, and
in any case I think it doesn't make much sense to have individual
index entries for the sub-parts of the --trace option.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-3-peter.maydell@linaro.org
---
 docs/qemu-option-trace.rst.inc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/qemu-option-trace.rst.inc b/docs/qemu-option-trace.rst.inc
index XXXXXXX..XXXXXXX 100644
--- a/docs/qemu-option-trace.rst.inc
+++ b/docs/qemu-option-trace.rst.inc
@@ -XXX,XX +XXX,XX @@
 
 Specify tracing options.
 
-.. option:: [enable=]PATTERN
+``[enable=]PATTERN``
 
   Immediately enable events matching *PATTERN*
   (either event name or a globbing pattern).  This option is only
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
 
   Use :option:`-trace help` to print a list of names of trace points.
 
-.. option:: events=FILE
+``events=FILE``
 
   Immediately enable events listed in *FILE*.
   The file must contain one event name (as listed in the ``trace-events-all``
@@ -XXX,XX +XXX,XX @@ Specify tracing options.
   available if QEMU has been compiled with the ``simple``, ``log`` or
   ``ftrace`` tracing backend.
 
-.. option:: file=FILE
+``file=FILE``
 
   Log output traces to *FILE*.
   This option is only available if QEMU has been compiled with
-- 
2.20.1

The randomness tests in the NPCM7xx RNG test fail intermittently
but fairly frequently. On my machine running the test in a loop:
 while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done

will fail in less than a minute with an error like:
ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)

(Failures have been observed on all 4 of the randomness tests,
not just first_byte_runs.)

It's not clear why these tests are failing like this, but intermittent
failures make CI and merge testing awkward, so disable running them
unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
running the test suite, until we work out the cause.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>
---
 tests/qtest/npcm7xx_rng-test.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/npcm7xx_rng-test.c b/tests/qtest/npcm7xx_rng-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/npcm7xx_rng-test.c
+++ b/tests/qtest/npcm7xx_rng-test.c
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
 
     qtest_add_func("npcm7xx_rng/enable_disable", test_enable_disable);
     qtest_add_func("npcm7xx_rng/rosel", test_rosel);
-    qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
-    qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
-    qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
-    qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    /*
+     * These tests fail intermittently; only run them on explicit
+     * request until we figure out why.
+     */
+    if (getenv("QEMU_TEST_FLAKY_RNG_TESTS")) {
+        qtest_add_func("npcm7xx_rng/continuous/monobit", test_continuous_monobit);
+        qtest_add_func("npcm7xx_rng/continuous/runs", test_continuous_runs);
+        qtest_add_func("npcm7xx_rng/first_byte/monobit", test_first_byte_monobit);
+        qtest_add_func("npcm7xx_rng/first_byte/runs", test_first_byte_runs);
+    }
 
     qtest_start("-machine npcm750-evb");
     ret = g_test_run();
-- 
2.20.1

The following changes since commit 64ada298b98a51eb2512607f6e6180cb330c47b1:

Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220302' into staging (2022-03-02 12:38:46 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20220302

for you to fetch changes up to 268c11984e67867c22f53beb3c7f8b98900d66b2:

ui/cocoa.m: Remove unnecessary NSAutoreleasePools (2022-03-02 19:27:37 +0000)

----------------------------------------------------------------
target-arm queue:
 * mps3-an547: Add missing user ahb interfaces
 * hw/arm/mps2-tz.c: Update AN547 documentation URL
 * hw/input/tsc210x: Don't abort on bad SPI word widths
 * hw/i2c: flatten pca954x mux device
 * target/arm: Support PSCI 1.1 and SMCCC 1.0
 * target/arm: Fix early free of TCG temp in handle_simd_shift_fpint_conv()
 * tests/qtest: add qtests for npcm7xx sdhci
 * Implement FEAT_LVA
 * Implement FEAT_LPA
 * Implement FEAT_LPA2 (but do not enable it yet)
 * Report KVM's actual PSCI version to guest in dtb
 * ui/cocoa.m: Fix updateUIInfo threading issues
 * ui/cocoa.m: Remove unnecessary NSAutoreleasePools

----------------------------------------------------------------
Akihiko Odaki (1):
      target/arm: Support PSCI 1.1 and SMCCC 1.0

Jimmy Brisson (1):
      mps3-an547: Add missing user ahb interfaces

Patrick Venture (1):
      hw/i2c: flatten pca954x mux device

Peter Maydell (5):
      hw/arm/mps2-tz.c: Update AN547 documentation URL
      hw/input/tsc210x: Don't abort on bad SPI word widths
      target/arm: Report KVM's actual PSCI version to guest in dtb
      ui/cocoa.m: Fix updateUIInfo threading issues
      ui/cocoa.m: Remove unnecessary NSAutoreleasePools

Richard Henderson (16):
      hw/registerfields: Add FIELD_SEX<N> and FIELD_SDP<N>
      target/arm: Set TCR_EL1.TSZ for user-only
      target/arm: Fault on invalid TCR_ELx.TxSZ
      target/arm: Move arm_pamax out of line
      target/arm: Pass outputsize down to check_s2_mmu_setup
      target/arm: Use MAKE_64BIT_MASK to compute indexmask
      target/arm: Honor TCR_ELx.{I}PS
      target/arm: Prepare DBGBVR and DBGWVR for FEAT_LVA
      target/arm: Implement FEAT_LVA
      target/arm: Implement FEAT_LPA
      target/arm: Extend arm_fi_to_lfsc to level -1
      target/arm: Introduce tlbi_aa64_get_range
      target/arm: Fix TLBIRange.base for 16k and 64k pages
      target/arm: Validate tlbi TG matches translation granule in use
      target/arm: Advertise all page sizes for -cpu max
      target/arm: Implement FEAT_LPA2

Shengtan Mao (1):
      tests/qtest: add qtests for npcm7xx sdhci

Wentao_Liang (1):
      target/arm: Fix early free of TCG temp in handle_simd_shift_fpint_conv()

From: Jimmy Brisson <jimmy.brisson@linaro.org>

With these interfaces missing, TFM would delegate peripherals 0, 1,
2, 3 and 8, and QEMU would ignore the delegation of interface 8, as
it thought interface 4 was eth & USB.

This patch corrects this behavior and allows TFM to delegate the
eth & USB peripheral to NS mode.

(The old QEMU behaviour was based on revision B of the AN547
appnote; revision C corrects this error in the documentation,
and this commit brings QEMU into line with how the FPGA
image really behaves.)
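
The reason placeholder entries are needed can be sketched as below: a
port's number is its position in the table, so unused ports must still
occupy a slot. This is a simplified, hypothetical stand-in for the table
in mps2-tz.c; SketchPort, sketch_ports and sketch_port_of are invented
for illustration, and "gpio0" at port 0 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified per-port entry; NULL name = nothing modelled. */
typedef struct SketchPort {
    const char *name;
} SketchPort;

/* The array index is the port number, so empty ports need placeholders. */
static const SketchPort sketch_ports[] = {
    { "gpio0" },    /* port 0 (assumed) */
    { "gpio1" },    /* port 1 */
    { "gpio2" },    /* port 2 */
    { "gpio3" },    /* port 3 */
    { NULL },       /* port 4: USER AHB interface 0 */
    { NULL },       /* port 5: USER AHB interface 1 */
    { NULL },       /* port 6: USER AHB interface 2 */
    { NULL },       /* port 7: USER AHB interface 3 */
    { "eth-usb" },  /* port 8 */
};

static_assert(sizeof(sketch_ports) / sizeof(sketch_ports[0]) == 9,
              "ports 0..8 must all have a slot");

/* Return the port number of a named device, or -1 if it is absent. */
int sketch_port_of(const char *name)
{
    size_t n = sizeof(sketch_ports) / sizeof(sketch_ports[0]);

    for (size_t i = 0; i < n; i++) {
        if (sketch_ports[i].name && strcmp(sketch_ports[i].name, name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Without the four placeholder rows, "eth-usb" would sit at index 4, which
is the mismatch the commit message describes.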

Signed-off-by: Jimmy Brisson <jimmy.brisson@linaro.org>
Message-id: 20220210210227.3203883-1-jimmy.brisson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: added commit message note clarifying that the old behaviour
was a docs issue, not because there were two different versions
of the FPGA image]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/mps2-tz.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
                 { "gpio1", make_unimp_dev, &mms->gpio[1], 0x41101000, 0x1000 },
                 { "gpio2", make_unimp_dev, &mms->gpio[2], 0x41102000, 0x1000 },
                 { "gpio3", make_unimp_dev, &mms->gpio[3], 0x41103000, 0x1000 },
+                { /* port 4 USER AHB interface 0 */ },
+                { /* port 5 USER AHB interface 1 */ },
+                { /* port 6 USER AHB interface 2 */ },
+                { /* port 7 USER AHB interface 3 */ },
                 { "eth-usb", make_eth_usb, NULL, 0x41400000, 0x200000, { 49 } },
             },
         },
-- 
2.25.1

The AN547 application note URL has changed: update our comment
accordingly. (Rev B is still downloadable from the old URL,
but there is a new Rev C of the document now.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220221094144.426191-1-peter.maydell@linaro.org
---
 hw/arm/mps2-tz.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
  * Application Note AN524:
  * https://developer.arm.com/documentation/dai0524/latest/
  * Application Note AN547:
- * https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/DAI0547B_SSE300_PLUS_U55_FPGA_for_mps3.pdf
+ * https://developer.arm.com/documentation/dai0547/latest/
  *
  * The AN505 defers to the Cortex-M33 processor ARMv8M IoT Kit FVP User Guide
  * (ARM ECM0601256) for the details of some of the device layout:
-- 
2.25.1

The tsc210x doesn't support anything other than 16-bit reads on the
SPI bus, but the guest can program the SPI controller to attempt
them anyway. If this happens, don't abort QEMU, just log this as
a guest error.

This fixes our machine_arm_n8x0.py:N8x0Machine.test_n800
acceptance test, which hits this assertion.

We hit the assertion because the guest kernel thinks
there is a TSC2005 on this SPI bus address, not a TSC210x.  (The n810
*does* have a TSC2005 at this address.) The TSC2005 supports the
24-bit accesses which the guest driver makes, and the TSC210x does
not (that is, our TSC210x emulation is not missing support for a word
width the hardware can handle).  It's not clear whether the problem
here is that the guest kernel incorrectly thinks the n800 has the
same device at this SPI bus address as the n810, or that QEMU's n810
board model doesn't get the SPI devices right.  At this late date
there no longer appears to be any reliable information on the web
about the hardware behaviour, but I am inclined to think this is a
guest kernel bug.  In any case, we prefer not to abort QEMU for
guest-triggerable conditions, so logging the error is the right thing
to do.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/736
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20220221140750.514557-1-peter.maydell@linaro.org
---
 hw/input/tsc210x.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/input/tsc210x.c b/hw/input/tsc210x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/input/tsc210x.c
+++ b/hw/input/tsc210x.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/hw.h"
 #include "audio/audio.h"
 #include "qemu/timer.h"
+#include "qemu/log.h"
 #include "sysemu/reset.h"
 #include "ui/console.h"
 #include "hw/arm/omap.h"            /* For I2SCodec */
@@ -XXX,XX +XXX,XX @@ uint32_t tsc210x_txrx(void *opaque, uint32_t value, int len)
     TSC210xState *s = opaque;
     uint32_t ret = 0;
 
-    if (len != 16)
-        hw_error("%s: FIXME: bad SPI word width %i\n", __func__, len);
+    if (len != 16) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: bad SPI word width %i\n", __func__, len);
+        return 0;
+    }
 
     /* TODO: sequential reads etc - how do we make sure the host doesn't
      * unintentionally read out a conversion result from a register while
-- 
2.25.1

From: Patrick Venture <venture@google.com>

Previously this device created N subdevices which each owned an i2c bus.
Now this device simply owns the N i2c busses directly.

Tested: Verified devices behind mux are still accessible via qmp and i2c
from within an arm32 SoC.

Reviewed-by: Hao Wu <wuhaotsh@google.com>
Signed-off-by: Patrick Venture <venture@google.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220202164533.1283668-1-venture@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/i2c/i2c_mux_pca954x.c | 77 +++++++---------------------------------
 1 file changed, 13 insertions(+), 64 deletions(-)

diff --git a/hw/i2c/i2c_mux_pca954x.c b/hw/i2c/i2c_mux_pca954x.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/i2c/i2c_mux_pca954x.c
+++ b/hw/i2c/i2c_mux_pca954x.c
@@ -XXX,XX +XXX,XX @@
 #define PCA9548_CHANNEL_COUNT 8
 #define PCA9546_CHANNEL_COUNT 4
 
-/*
- * struct Pca954xChannel - The i2c mux device will have N of these states
- * that own the i2c channel bus.
- * @bus: The owned channel bus.
- * @enabled: Is this channel active?
- */
-typedef struct Pca954xChannel {
-    SysBusDevice parent;
-
-    I2CBus       *bus;
-
-    bool         enabled;
-} Pca954xChannel;
-
-#define TYPE_PCA954X_CHANNEL "pca954x-channel"
-#define PCA954X_CHANNEL(obj) \
-    OBJECT_CHECK(Pca954xChannel, (obj), TYPE_PCA954X_CHANNEL)
-
 /*
  * struct Pca954xState - The pca954x state object.
  * @control: The value written to the mux control.
@@ -XXX,XX +XXX,XX @@ typedef struct Pca954xState {
 
     uint8_t control;
 
-    /* The channel i2c buses. */
-    Pca954xChannel channel[PCA9548_CHANNEL_COUNT];
+    bool enabled[PCA9548_CHANNEL_COUNT];
+    I2CBus *bus[PCA9548_CHANNEL_COUNT];
 } Pca954xState;
 
 /*
@@ -XXX,XX +XXX,XX @@ static bool pca954x_match(I2CSlave *candidate, uint8_t address,
     }
 
     for (i = 0; i < mc->nchans; i++) {
-        if (!mux->channel[i].enabled) {
+        if (!mux->enabled[i]) {
             continue;
         }
 
-        if (i2c_scan_bus(mux->channel[i].bus, address, broadcast,
+        if (i2c_scan_bus(mux->bus[i], address, broadcast,
                          current_devs)) {
             if (!broadcast) {
                 return true;
@@ -XXX,XX +XXX,XX @@ static void pca954x_enable_channel(Pca954xState *s, uint8_t enable_mask)
      */
     for (i = 0; i < mc->nchans; i++) {
         if (enable_mask & (1 << i)) {
-            s->channel[i].enabled = true;
+            s->enabled[i] = true;
         } else {
-            s->channel[i].enabled = false;
+            s->enabled[i] = false;
         }
     }
 }
@@ -XXX,XX +XXX,XX @@ I2CBus *pca954x_i2c_get_bus(I2CSlave *mux, uint8_t channel)
     Pca954xState *pca954x = PCA954X(mux);
 
     g_assert(channel < pc->nchans);
-    return I2C_BUS(qdev_get_child_bus(DEVICE(&pca954x->channel[channel]),
-                                      "i2c-bus"));
-}
-
-static void pca954x_channel_init(Object *obj)
-{
-    Pca954xChannel *s = PCA954X_CHANNEL(obj);
-    s->bus = i2c_init_bus(DEVICE(s), "i2c-bus");
-
-    /* Start all channels as disabled. */
-    s->enabled = false;
-}
-
-static void pca954x_channel_class_init(ObjectClass *klass, void *data)
-{
-    DeviceClass *dc = DEVICE_CLASS(klass);
-    dc->desc = "Pca954x Channel";
+    return pca954x->bus[channel];
 }
 
 static void pca9546_class_init(ObjectClass *klass, void *data)
@@ -XXX,XX +XXX,XX @@ static void pca9548_class_init(ObjectClass *klass, void *data)
     s->nchans = PCA9548_CHANNEL_COUNT;
 }
 
-static void pca954x_realize(DeviceState *dev, Error **errp)
-{
-    Pca954xState *s = PCA954X(dev);
-    Pca954xClass *c = PCA954X_GET_CLASS(s);
-    int i;
-
-    /* SMBus modules. Cannot fail. */
-    for (i = 0; i < c->nchans; i++) {
-        sysbus_realize(SYS_BUS_DEVICE(&s->channel[i]), &error_abort);
-    }
-}
-
 static void pca954x_init(Object *obj)
 {
     Pca954xState *s = PCA954X(obj);
     Pca954xClass *c = PCA954X_GET_CLASS(obj);
     int i;
 
-    /* Only initialize the children we expect. */
+    /* SMBus modules. Cannot fail. */
     for (i = 0; i < c->nchans; i++) {
-        object_initialize_child(obj, "channel[*]", &s->channel[i],
-                                TYPE_PCA954X_CHANNEL);
+        g_autofree gchar *bus_name = g_strdup_printf("i2c.%d", i);
+
+        /* start all channels as disabled. */
+        s->enabled[i] = false;
+        s->bus[i] = i2c_init_bus(DEVICE(s), bus_name);
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static void pca954x_class_init(ObjectClass *klass, void *data)
     rc->phases.enter = pca954x_enter_reset;
 
     dc->desc = "Pca954x i2c-mux";
-    dc->realize = pca954x_realize;
 
     k->write_data = pca954x_write_data;
     k->receive_byte = pca954x_read_byte;
@@ -XXX,XX +XXX,XX @@ static const TypeInfo pca954x_info[] = {
         .parent        = TYPE_PCA954X,
         .class_init    = pca9548_class_init,
     },
-    {
-        .name = TYPE_PCA954X_CHANNEL,
-        .parent = TYPE_SYS_BUS_DEVICE,
-        .class_init = pca954x_channel_class_init,
-        .instance_size = sizeof(Pca954xChannel),
-        .instance_init = pca954x_channel_init,
-    }
 };
 
 DEFINE_TYPES(pca954x_info)
-- 
2.25.1

From: Akihiko Odaki <akihiko.odaki@gmail.com>

Support the latest PSCI on TCG and HVF. A 64-bit function called from
AArch32 now returns NOT_SUPPORTED, which is necessary to adhere to SMC
Calling Convention 1.0. QEMU is still not compliant with SMCCC 1.3,
since TCG and HVF do not implement the functions that version makes
mandatory.
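
As a rough sketch of the two ideas above, assuming the usual PSCI
version encoding (major in the upper 16 bits, minor in the lower 16) and
bit 30 of the function ID marking the 64-bit calling convention: the
SKETCH_* names and sketch_check_calling_convention() are hypothetical,
and the real definitions are the QEMU_PSCI_* macros in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names only; see QEMU_PSCI_* in kvm-consts.h for the real ones. */
#define SKETCH_PSCI_VERSION(major, minor) \
    (((uint32_t)(major) << 16) | (minor))
#define SKETCH_SMCCC_64BIT        0x40000000u  /* bit 30: 64-bit function ID */
#define SKETCH_RET_SUCCESS        0
#define SKETCH_RET_NOT_SUPPORTED  (-1)

/* The encoding reproduces the version constants added by the patch. */
static_assert(SKETCH_PSCI_VERSION(0, 1) == 0x00001, "PSCI 0.1");
static_assert(SKETCH_PSCI_VERSION(0, 2) == 0x00002, "PSCI 0.2");
static_assert(SKETCH_PSCI_VERSION(1, 1) == 0x10001, "PSCI 1.1");

/* Reject 64-bit function IDs when the caller is executing in AArch32. */
int32_t sketch_check_calling_convention(uint32_t function_id,
                                        bool caller_is_a64)
{
    if ((function_id & SKETCH_SMCCC_64BIT) && !caller_is_a64) {
        return SKETCH_RET_NOT_SUPPORTED;
    }
    return SKETCH_RET_SUCCESS;
}
```

For example, 0xC4000003 (the 64-bit CPU_ON function ID) is rejected for
an AArch32 caller, while its 32-bit counterpart 0x84000003 passes the
check from either execution state.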

Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Message-id: 20220213035753.34577-1-akihiko.odaki@gmail.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: update MISMATCH_CHECK checks on PSCI_VERSION macros to match]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/kvm-consts.h | 13 +++++++++----
 hw/arm/boot.c           | 12 +++++++++---
 target/arm/cpu.c        |  5 +++--
 target/arm/hvf/hvf.c    | 27 ++++++++++++++++++++++++++-
 target/arm/kvm64.c      |  2 +-
 target/arm/psci.c       | 35 ++++++++++++++++++++++++++++++++---
 6 files changed, 80 insertions(+), 14 deletions(-)

diff --git a/target/arm/kvm-consts.h b/target/arm/kvm-consts.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm-consts.h
+++ b/target/arm/kvm-consts.h
@@ -XXX,XX +XXX,XX @@ MISMATCH_CHECK(QEMU_PSCI_0_1_FN_MIGRATE, KVM_PSCI_FN_MIGRATE);
 #define QEMU_PSCI_0_2_FN64_AFFINITY_INFO QEMU_PSCI_0_2_FN64(4)
 #define QEMU_PSCI_0_2_FN64_MIGRATE QEMU_PSCI_0_2_FN64(5)
 
+#define QEMU_PSCI_1_0_FN_PSCI_FEATURES QEMU_PSCI_0_2_FN(10)
+
 MISMATCH_CHECK(QEMU_PSCI_0_2_FN_CPU_SUSPEND, PSCI_0_2_FN_CPU_SUSPEND);
 MISMATCH_CHECK(QEMU_PSCI_0_2_FN_CPU_OFF, PSCI_0_2_FN_CPU_OFF);
 MISMATCH_CHECK(QEMU_PSCI_0_2_FN_CPU_ON, PSCI_0_2_FN_CPU_ON);
@@ -XXX,XX +XXX,XX @@ MISMATCH_CHECK(QEMU_PSCI_0_2_FN_MIGRATE, PSCI_0_2_FN_MIGRATE);
 MISMATCH_CHECK(QEMU_PSCI_0_2_FN64_CPU_SUSPEND, PSCI_0_2_FN64_CPU_SUSPEND);
 MISMATCH_CHECK(QEMU_PSCI_0_2_FN64_CPU_ON, PSCI_0_2_FN64_CPU_ON);
 MISMATCH_CHECK(QEMU_PSCI_0_2_FN64_MIGRATE, PSCI_0_2_FN64_MIGRATE);
+MISMATCH_CHECK(QEMU_PSCI_1_0_FN_PSCI_FEATURES, PSCI_1_0_FN_PSCI_FEATURES);
 
 /* PSCI v0.2 return values used by TCG emulation of PSCI */
 
 /* No Trusted OS migration to worry about when offlining CPUs */
 #define QEMU_PSCI_0_2_RET_TOS_MIGRATION_NOT_REQUIRED        2
 
-/* We implement version 0.2 only */
-#define QEMU_PSCI_0_2_RET_VERSION_0_2                       2
+#define QEMU_PSCI_VERSION_0_1                     0x00001
+#define QEMU_PSCI_VERSION_0_2                     0x00002
+#define QEMU_PSCI_VERSION_1_1                     0x10001
 
 MISMATCH_CHECK(QEMU_PSCI_0_2_RET_TOS_MIGRATION_NOT_REQUIRED, PSCI_0_2_TOS_MP);
-MISMATCH_CHECK(QEMU_PSCI_0_2_RET_VERSION_0_2,
-               (PSCI_VERSION_MAJOR(0) | PSCI_VERSION_MINOR(2)));
+/* We don't bother to check every possible version value */
+MISMATCH_CHECK(QEMU_PSCI_VERSION_0_2, PSCI_VERSION(0, 2));
+MISMATCH_CHECK(QEMU_PSCI_VERSION_1_1, PSCI_VERSION(1, 1));
 
 /* PSCI return values (inclusive of all PSCI versions) */
 #define QEMU_PSCI_RET_SUCCESS                     0
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ static void fdt_add_psci_node(void *fdt)
     }
 
     qemu_fdt_add_subnode(fdt, "/psci");
-    if (armcpu->psci_version == 2) {
-        const char comp[] = "arm,psci-0.2\0arm,psci";
-        qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
+    if (armcpu->psci_version == QEMU_PSCI_VERSION_0_2 ||
+        armcpu->psci_version == QEMU_PSCI_VERSION_1_1) {
+        if (armcpu->psci_version == QEMU_PSCI_VERSION_0_2) {
+            const char comp[] = "arm,psci-0.2\0arm,psci";
+            qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
+        } else {
+            const char comp[] = "arm,psci-1.0\0arm,psci-0.2\0arm,psci";
+            qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
+        }
 
         cpu_off_fn = QEMU_PSCI_0_2_FN_CPU_OFF;
         if (arm_feature(&armcpu->env, ARM_FEATURE_AARCH64)) {
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_initfn(Object *obj)
      * picky DTB consumer will also provide a helpful error message.
      */
     cpu->dtb_compatible = "qemu,unknown";
-    cpu->psci_version = 1; /* By default assume PSCI v0.1 */
+    cpu->psci_version = QEMU_PSCI_VERSION_0_1; /* By default assume PSCI v0.1 */
     cpu->kvm_target = QEMU_KVM_ARM_TARGET_NONE;
 
     if (tcg_enabled() || hvf_enabled()) {
-        cpu->psci_version = 2; /* TCG and HVF implement PSCI 0.2 */
+        /* TCG and HVF implement PSCI 1.1 */
+        cpu->psci_version = QEMU_PSCI_VERSION_1_1;
     }
 }
 
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -XXX,XX +XXX,XX @@ static bool hvf_handle_psci_call(CPUState *cpu)
 
     switch (param[0]) {
     case QEMU_PSCI_0_2_FN_PSCI_VERSION:
-        ret = QEMU_PSCI_0_2_RET_VERSION_0_2;
+        ret = QEMU_PSCI_VERSION_1_1;
         break;
     case QEMU_PSCI_0_2_FN_MIGRATE_INFO_TYPE:
         ret = QEMU_PSCI_0_2_RET_TOS_MIGRATION_NOT_REQUIRED; /* No trusted OS */
@@ -XXX,XX +XXX,XX @@ static bool hvf_handle_psci_call(CPUState *cpu)
     case QEMU_PSCI_0_2_FN_MIGRATE:
         ret = QEMU_PSCI_RET_NOT_SUPPORTED;
         break;
+    case QEMU_PSCI_1_0_FN_PSCI_FEATURES:
+        switch (param[1]) {
+        case QEMU_PSCI_0_2_FN_PSCI_VERSION:
+        case QEMU_PSCI_0_2_FN_MIGRATE_INFO_TYPE:
+        case QEMU_PSCI_0_2_FN_AFFINITY_INFO:
+        case QEMU_PSCI_0_2_FN64_AFFINITY_INFO:
+        case QEMU_PSCI_0_2_FN_SYSTEM_RESET:
+        case QEMU_PSCI_0_2_FN_SYSTEM_OFF:
+        case QEMU_PSCI_0_1_FN_CPU_ON:
+        case QEMU_PSCI_0_2_FN_CPU_ON:
+        case QEMU_PSCI_0_2_FN64_CPU_ON:
+        case QEMU_PSCI_0_1_FN_CPU_OFF:
+        case QEMU_PSCI_0_2_FN_CPU_OFF:
+        case QEMU_PSCI_0_1_FN_CPU_SUSPEND:
+        case QEMU_PSCI_0_2_FN_CPU_SUSPEND:
+        case QEMU_PSCI_0_2_FN64_CPU_SUSPEND:
+        case QEMU_PSCI_1_0_FN_PSCI_FEATURES:
+            ret = 0;
+            break;
+        case QEMU_PSCI_0_1_FN_MIGRATE:
+        case QEMU_PSCI_0_2_FN_MIGRATE:
+        default:
+            ret = QEMU_PSCI_RET_NOT_SUPPORTED;
+        }
+        break;
     default:
         return false;
     }
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
         cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_POWER_OFF;
     }
     if (kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_PSCI_0_2)) {
-        cpu->psci_version = 2;
+        cpu->psci_version = QEMU_PSCI_VERSION_0_2;
         cpu->kvm_init_features[0] |= 1 << KVM_ARM_VCPU_PSCI_0_2;
     }
     if (!arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
diff --git a/target/arm/psci.c b/target/arm/psci.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/psci.c
+++ b/target/arm/psci.c
@@ -XXX,XX +XXX,XX @@ void arm_handle_psci_call(ARMCPU *cpu)
 {
     /*
      * This function partially implements the logic for dispatching Power State
-     * Coordination Interface (PSCI) calls (as described in ARM DEN 0022B.b),
+     * Coordination Interface (PSCI) calls (as described in ARM DEN 0022D.b),
      * to the extent required for bringing up and taking down secondary cores,
      * and for handling reset and poweroff requests.
      * Additional information about the calling convention used is available in
@@ -XXX,XX +XXX,XX @@ void arm_handle_psci_call(ARMCPU *cpu)
     }
 
     if ((param[0] & QEMU_PSCI_0_2_64BIT) && !is_a64(env)) {
-        ret = QEMU_PSCI_RET_INVALID_PARAMS;
+        ret = QEMU_PSCI_RET_NOT_SUPPORTED;
         goto err;
     }
 
@@ -XXX,XX +XXX,XX @@ void arm_handle_psci_call(ARMCPU *cpu)
         ARMCPU *target_cpu;
 
     case QEMU_PSCI_0_2_FN_PSCI_VERSION:
-        ret = QEMU_PSCI_0_2_RET_VERSION_0_2;
+        ret = QEMU_PSCI_VERSION_1_1;
         break;
     case QEMU_PSCI_0_2_FN_MIGRATE_INFO_TYPE:
         ret = QEMU_PSCI_0_2_RET_TOS_MIGRATION_NOT_REQUIRED; /* No trusted OS */
@@ -XXX,XX +XXX,XX @@ void arm_handle_psci_call(ARMCPU *cpu)
         }
         helper_wfi(env, 4);
         break;
+    case QEMU_PSCI_1_0_FN_PSCI_FEATURES:
+        switch (param[1]) {
+        case QEMU_PSCI_0_2_FN_PSCI_VERSION:
+        case QEMU_PSCI_0_2_FN_MIGRATE_INFO_TYPE:
+        case QEMU_PSCI_0_2_FN_AFFINITY_INFO:
+        case QEMU_PSCI_0_2_FN64_AFFINITY_INFO:
+        case QEMU_PSCI_0_2_FN_SYSTEM_RESET:
+        case QEMU_PSCI_0_2_FN_SYSTEM_OFF:
+        case QEMU_PSCI_0_1_FN_CPU_ON:
+        case QEMU_PSCI_0_2_FN_CPU_ON:
+        case QEMU_PSCI_0_2_FN64_CPU_ON:
+        case QEMU_PSCI_0_1_FN_CPU_OFF:
+        case QEMU_PSCI_0_2_FN_CPU_OFF:
+        case QEMU_PSCI_0_1_FN_CPU_SUSPEND:
+        case QEMU_PSCI_0_2_FN_CPU_SUSPEND:
+        case QEMU_PSCI_0_2_FN64_CPU_SUSPEND:
+        case QEMU_PSCI_1_0_FN_PSCI_FEATURES:
+            if (!(param[1] & QEMU_PSCI_0_2_64BIT) || is_a64(env)) {
+                ret = 0;
+                break;
+            }
+            /* fallthrough */
+        case QEMU_PSCI_0_1_FN_MIGRATE:
+        case QEMU_PSCI_0_2_FN_MIGRATE:
+        default:
+            ret = QEMU_PSCI_RET_NOT_SUPPORTED;
+            break;
+        }
+        break;
     case QEMU_PSCI_0_1_FN_MIGRATE:
     case QEMU_PSCI_0_2_FN_MIGRATE:
     default:
-- 
2.25.1

From: Wentao_Liang <Wentao_Liang_g@163.com>

handle_simd_shift_fpint_conv() was accidentally freeing the TCG
temporary tcg_fpstatus too early, before the last use of it.  Move
the free down to where it belongs.

Signed-off-by: Wentao_Liang <Wentao_Liang_g@163.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[PMM: cleaned up commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_simd_shift_fpint_conv(DisasContext *s, bool is_scalar,
         }
     }
 
-    tcg_temp_free_ptr(tcg_fpstatus);
     tcg_temp_free_i32(tcg_shift);
     gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
+    tcg_temp_free_ptr(tcg_fpstatus);
     tcg_temp_free_i32(tcg_rmode);
 }
 
-- 
2.25.1

From: Shengtan Mao <stmao@google.com>

Reviewed-by: Hao Wu <wuhaotsh@google.com>
Reviewed-by: Chris Rauer <crauer@google.com>
Signed-off-by: Shengtan Mao <stmao@google.com>
Signed-off-by: Patrick Venture <venture@google.com>
Message-id: 20220225174451.192304-1-wuhaotsh@google.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/npcm7xx_sdhci-test.c | 215 +++++++++++++++++++++++++++++++
 tests/qtest/meson.build          |   1 +
 2 files changed, 216 insertions(+)
 create mode 100644 tests/qtest/npcm7xx_sdhci-test.c

diff --git a/tests/qtest/npcm7xx_sdhci-test.c b/tests/qtest/npcm7xx_sdhci-test.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qtest/npcm7xx_sdhci-test.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * QTests for NPCM7xx SD-3.0 / MMC-4.51 Host Controller
+ *
+ * Copyright (c) 2022 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sd/npcm7xx_sdhci.h"
+
+#include "libqos/libqtest.h"
+#include "libqtest-single.h"
+#include "libqos/sdhci-cmd.h"
+
+#define NPCM7XX_REG_SIZE 0x100
+#define NPCM7XX_MMC_BA 0xF0842000
+#define NPCM7XX_BLK_SIZE 512
+#define NPCM7XX_TEST_IMAGE_SIZE (1 << 30)
+
+char *sd_path;
+
+static QTestState *setup_sd_card(void)
+{
+    QTestState *qts = qtest_initf(
+        "-machine kudo-bmc "
+        "-device sd-card,drive=drive0 "
+        "-drive id=drive0,if=none,file=%s,format=raw,auto-read-only=off",
+        sd_path);
+
+    qtest_writew(qts, NPCM7XX_MMC_BA + SDHC_SWRST, SDHC_RESET_ALL);
+    qtest_writew(qts, NPCM7XX_MMC_BA + SDHC_CLKCON,
+                 SDHC_CLOCK_SDCLK_EN | SDHC_CLOCK_INT_STABLE |
+                     SDHC_CLOCK_INT_EN);
+    sdhci_cmd_regs(qts, NPCM7XX_MMC_BA, 0, 0, 0, 0, SDHC_APP_CMD);
+    sdhci_cmd_regs(qts, NPCM7XX_MMC_BA, 0, 0, 0x41200000, 0, (41 << 8));
+    sdhci_cmd_regs(qts, NPCM7XX_MMC_BA, 0, 0, 0, 0, SDHC_ALL_SEND_CID);
+    sdhci_cmd_regs(qts, NPCM7XX_MMC_BA, 0, 0, 0, 0, SDHC_SEND_RELATIVE_ADDR);
+    sdhci_cmd_regs(qts, NPCM7XX_MMC_BA, 0, 0, 0x45670000, 0,
+                   SDHC_SELECT_DESELECT_CARD);
+
+    return qts;
+}
+
+static void write_sdread(QTestState *qts, const char *msg)
+{
+    int fd, ret;
+    size_t len = strlen(msg);
+    char *rmsg = g_malloc(len);
+
+    /* write message to sd */
+    fd = open(sd_path, O_WRONLY);
+    g_assert(fd >= 0);
+    ret = write(fd, msg, len);
+    close(fd);
+    g_assert(ret == len);
+
+    /* read message using sdhci */
+    ret = sdhci_read_cmd(qts, NPCM7XX_MMC_BA, rmsg, len);
+    g_assert(ret == len);
+    g_assert(!memcmp(rmsg, msg, len));
+
+    g_free(rmsg);
+}
+
+/* Check MMC can read values from sd */
+static void test_read_sd(void)
+{
+    QTestState *qts = setup_sd_card();
+
+    write_sdread(qts, "hello world");
+    write_sdread(qts, "goodbye");
+
+    qtest_quit(qts);
+}
+
+static void sdwrite_read(QTestState *qts, const char *msg)
+{
+    int fd, ret;
+    size_t len = strlen(msg);
+    char *rmsg = g_malloc(len);
+
+    /* write message using sdhci */
+    sdhci_write_cmd(qts, NPCM7XX_MMC_BA, msg, len, NPCM7XX_BLK_SIZE);
+
+    /* read message from sd */
+    fd = open(sd_path, O_RDONLY);
+    g_assert(fd >= 0);
+    ret = read(fd, rmsg, len);
+    close(fd);
+    g_assert(ret == len);
+
+    g_assert(!memcmp(rmsg, msg, len));
+
+    g_free(rmsg);
+}
+
+/* Check MMC can write values to sd */
+static void test_write_sd(void)
+{
+    QTestState *qts = setup_sd_card();
+
+    sdwrite_read(qts, "hello world");
+    sdwrite_read(qts, "goodbye");
+
+    qtest_quit(qts);
+}
+
+/* Check SDHCI has correct default values. */
+static void test_reset(void)
+{
+    QTestState *qts = qtest_init("-machine kudo-bmc");
+    uint64_t addr = NPCM7XX_MMC_BA;
+    uint64_t end_addr = addr + NPCM7XX_REG_SIZE;
+    uint16_t prstvals_resets[] = {NPCM7XX_PRSTVALS_0_RESET,
+                                  NPCM7XX_PRSTVALS_1_RESET,
+                                  0,
+                                  NPCM7XX_PRSTVALS_3_RESET,
+                                  0,
+                                  0};
+    int i;
+    uint32_t mask;
+
+    while (addr < end_addr) {
+        switch (addr - NPCM7XX_MMC_BA) {
+        case SDHC_PRNSTS:
+            /*
+             * ignores bits 20 to 24: they are changed when reading registers
+             */
+            mask = 0x1f00000;
+            g_assert_cmphex(qtest_readl(qts, addr) | mask, ==,
+                            NPCM7XX_PRSNTS_RESET | mask);
+            addr += 4;
+            break;
+        case SDHC_BLKGAP:
+            g_assert_cmphex(qtest_readb(qts, addr), ==, NPCM7XX_BLKGAP_RESET);
+            addr += 1;
+            break;
+        case SDHC_CAPAB:
+            g_assert_cmphex(qtest_readq(qts, addr), ==, NPCM7XX_CAPAB_RESET);
+            addr += 8;
+            break;
+        case SDHC_MAXCURR:
+            g_assert_cmphex(qtest_readq(qts, addr), ==, NPCM7XX_MAXCURR_RESET);
+            addr += 8;
+            break;
+        case SDHC_HCVER:
+            g_assert_cmphex(qtest_readw(qts, addr), ==, NPCM7XX_HCVER_RESET);
+            addr += 2;
+            break;
+        case NPCM7XX_PRSTVALS:
+            for (i = 0; i < NPCM7XX_PRSTVALS_SIZE; ++i) {
+                g_assert_cmphex(qtest_readw(qts, addr + 2 * i), ==,
+                                prstvals_resets[i]);
+            }
+            addr += NPCM7XX_PRSTVALS_SIZE * 2;
+            break;
+        default:
+            g_assert_cmphex(qtest_readb(qts, addr), ==, 0);
+            addr += 1;
+        }
+    }
+
+    qtest_quit(qts);
+}
+
+static void drive_destroy(void)
+{
+    unlink(sd_path);
+    g_free(sd_path);
+}
+
+static void drive_create(void)
+{
+    int fd, ret;
+    GError *error = NULL;
+
+    /* Create a temporary raw image */
+    fd = g_file_open_tmp("sdhci_XXXXXX", &sd_path, &error);
+    if (fd == -1) {
+        fprintf(stderr, "unable to create sdhci file: %s\n", error->message);
+        g_error_free(error);
+    }
+    g_assert(sd_path != NULL);
+
+    ret = ftruncate(fd, NPCM7XX_TEST_IMAGE_SIZE);
+    g_assert_cmpint(ret, ==, 0);
+    g_message("%s", sd_path);
+    close(fd);
+}
+
+int main(int argc, char **argv)
+{
+    int ret;
+
+    drive_create();
+
+    g_test_init(&argc, &argv, NULL);
+
+    qtest_add_func("npcm7xx_sdhci/reset", test_reset);
+    qtest_add_func("npcm7xx_sdhci/write_sd", test_write_sd);
+    qtest_add_func("npcm7xx_sdhci/read_sd", test_read_sd);
+
+    ret = g_test_run();
+    drive_destroy();
+    return ret;
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -XXX,XX +XXX,XX @@ qtests_npcm7xx = \
    'npcm7xx_gpio-test',
    'npcm7xx_pwm-test',
    'npcm7xx_rng-test',
+   'npcm7xx_sdhci-test',
    'npcm7xx_smbus-test',
    'npcm7xx_timer-test',
    'npcm7xx_watchdog_timer-test'] + \
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Add new macros to manipulate signed fields within the register.
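
As a rough illustration of what a signed field read and write amount to,
here is a minimal standalone sketch. The demo_* names and the 4-bit field
at bits [7:4] are hypothetical and are not part of the patch; the helpers
only approximate sextract32()/deposit32(), and the real FIELD_SDP* macros
additionally route the value through a signed bit-field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-bit signed field at bits [7:4] of a 32-bit register. */
#define DEMO_FIELD_SHIFT  4
#define DEMO_FIELD_LENGTH 4

/*
 * Sign-extending extract: move the field to the top of the word, then
 * shift back down.  This assumes the compiler implements >> on negative
 * values as an arithmetic shift, which GCC and Clang do.
 */
static int32_t demo_sextract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (int32_t)(value << (32 - length - start)) >> (32 - length);
}

/* Replace the field bits of value with the low bits of fieldval. */
static uint32_t demo_deposit32(uint32_t value, int start, int length,
                               uint32_t fieldval)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Store a signed value into the demo field, leaving other bits alone. */
static uint32_t demo_field_write(uint32_t reg, int val)
{
    return demo_deposit32(reg, DEMO_FIELD_SHIFT, DEMO_FIELD_LENGTH,
                          (uint32_t)val);
}

/* Read the demo field back as a signed value. */
static int demo_field_read(uint32_t reg)
{
    return demo_sextract32(reg, DEMO_FIELD_SHIFT, DEMO_FIELD_LENGTH);
}
```

Writing -3 into the field and reading it back yields -3 again, whereas an
unsigned extract of the same bits would yield 13.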

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-2-richard.henderson@linaro.org
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/registerfields.h | 48 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/include/hw/registerfields.h b/include/hw/registerfields.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/registerfields.h
+++ b/include/hw/registerfields.h
@@ -XXX,XX +XXX,XX @@
     extract64((storage), R_ ## reg ## _ ## field ## _SHIFT,               \
               R_ ## reg ## _ ## field ## _LENGTH)
 
+#define FIELD_SEX8(storage, reg, field)                                   \
+    sextract8((storage), R_ ## reg ## _ ## field ## _SHIFT,               \
+              R_ ## reg ## _ ## field ## _LENGTH)
+#define FIELD_SEX16(storage, reg, field)                                  \
+    sextract16((storage), R_ ## reg ## _ ## field ## _SHIFT,              \
+               R_ ## reg ## _ ## field ## _LENGTH)
+#define FIELD_SEX32(storage, reg, field)                                  \
+    sextract32((storage), R_ ## reg ## _ ## field ## _SHIFT,              \
+               R_ ## reg ## _ ## field ## _LENGTH)
+#define FIELD_SEX64(storage, reg, field)                                  \
+    sextract64((storage), R_ ## reg ## _ ## field ## _SHIFT,              \
+               R_ ## reg ## _ ## field ## _LENGTH)
+
 /* Extract a field from an array of registers */
 #define ARRAY_FIELD_EX32(regs, reg, field)                                \
     FIELD_EX32((regs)[R_ ## reg], reg, field)
@@ -XXX,XX +XXX,XX @@
     _d; })
 #define FIELD_DP64(storage, reg, field, val) ({                           \
     struct {                                                              \
-        uint64_t v:R_ ## reg ## _ ## field ## _LENGTH;                \
+        uint64_t v:R_ ## reg ## _ ## field ## _LENGTH;                    \
+    } _v = { .v = val };                                                  \
+    uint64_t _d;                                                          \
+    _d = deposit64((storage), R_ ## reg ## _ ## field ## _SHIFT,          \
+                  R_ ## reg ## _ ## field ## _LENGTH, _v.v);              \
+    _d; })
+
+#define FIELD_SDP8(storage, reg, field, val) ({                           \
+    struct {                                                              \
+        signed int v:R_ ## reg ## _ ## field ## _LENGTH;                  \
+    } _v = { .v = val };                                                  \
+    uint8_t _d;                                                           \
+    _d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,          \
+                  R_ ## reg ## _ ## field ## _LENGTH, _v.v);              \
+    _d; })
+#define FIELD_SDP16(storage, reg, field, val) ({                          \
+    struct {                                                              \
+        signed int v:R_ ## reg ## _ ## field ## _LENGTH;                  \
+    } _v = { .v = val };                                                  \
+    uint16_t _d;                                                          \
+    _d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,          \
+                  R_ ## reg ## _ ## field ## _LENGTH, _v.v);              \
+    _d; })
+#define FIELD_SDP32(storage, reg, field, val) ({                          \
+    struct {                                                              \
+        signed int v:R_ ## reg ## _ ## field ## _LENGTH;                  \
+    } _v = { .v = val };                                                  \
+    uint32_t _d;                                                          \
+    _d = deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT,          \
+                  R_ ## reg ## _ ## field ## _LENGTH, _v.v);              \
+    _d; })
+#define FIELD_SDP64(storage, reg, field, val) ({                          \
+    struct {                                                              \
+        int64_t v:R_ ## reg ## _ ## field ## _LENGTH;                     \
     } _v = { .v = val };                                                  \
     uint64_t _d;                                                          \
     _d = deposit64((storage), R_ ## reg ## _ ## field ## _SHIFT,          \
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Set this as the kernel would, to 48 bits, to keep the computation
of the address space correct for PAuth.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(DeviceState *dev)
                 aarch64_sve_zcr_get_valid_len(cpu, cpu->sve_default_vq - 1);
         }
         /*
+         * Enable 48-bit address space (TODO: take reserved_va into account).
          * Enable TBI0 but not TBI1.
          * Note that this must match useronly_clean_ptr.
          */
-        env->cp15.tcr_el[1].raw_tcr = (1ULL << 37);
+        env->cp15.tcr_el[1].raw_tcr = 5 | (1ULL << 37);
 
         /* Enable MTE */
         if (cpu_isar_feature(aa64_mte, cpu)) {
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Without FEAT_LVA, the behaviour of programming an invalid value
is IMPLEMENTATION DEFINED.  With FEAT_LVA, programming an invalid
minimum value requires a Translation fault.

It is most self-consistent to choose to generate the fault always.
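
The clamp-and-flag logic can be sketched in isolation as follows. This is
only an illustration: DemoTsz and demo_clamp_tsz() are hypothetical names,
and the bounds are passed in as parameters rather than derived from the
translation regime as the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: the clamped value plus an out-of-bounds flag. */
typedef struct {
    int tsz;
    bool tsz_oob;
} DemoTsz;

/*
 * Clamp tsz into [min_tsz, max_tsz] and record whether clamping was
 * needed, so that the caller can raise a Translation fault.
 */
static DemoTsz demo_clamp_tsz(int tsz, int min_tsz, int max_tsz)
{
    DemoTsz r = { .tsz = tsz, .tsz_oob = false };

    assert(min_tsz <= max_tsz);
    if (tsz > max_tsz) {
        r.tsz = max_tsz;
        r.tsz_oob = true;
    } else if (tsz < min_tsz) {
        r.tsz = min_tsz;
        r.tsz_oob = true;
    }
    return r;
}
```

A caller would test tsz_oob before starting the table walk and fault
instead of walking with the clamped value.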

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h |  1 +
 target/arm/helper.c    | 32 ++++++++++++++++++++++++++++----
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
     bool hpd        : 1;
     bool using16k   : 1;
     bool using64k   : 1;
+    bool tsz_oob    : 1;  /* tsz has been clamped to legal range */
 } ARMVAParameters;
 
 ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
                                    ARMMMUIdx mmu_idx, bool data)
 {
     uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
-    bool epd, hpd, using16k, using64k;
-    int select, tsz, tbi, max_tsz;
+    bool epd, hpd, using16k, using64k, tsz_oob;
+    int select, tsz, tbi, max_tsz, min_tsz;
 
     if (!regime_has_2_ranges(mmu_idx)) {
         select = 0;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
     } else {
         max_tsz = 39;
     }
+    min_tsz = 16;  /* TODO: ARMv8.2-LVA  */
 
-    tsz = MIN(tsz, max_tsz);
-    tsz = MAX(tsz, 16);  /* TODO: ARMv8.2-LVA  */
+    if (tsz > max_tsz) {
+        tsz = max_tsz;
+        tsz_oob = true;
+    } else if (tsz < min_tsz) {
+        tsz = min_tsz;
+        tsz_oob = true;
+    } else {
+        tsz_oob = false;
+    }
 
     /* Present TBI as a composite with TBID.  */
     tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
         .hpd = hpd,
         .using16k = using16k,
         .using64k = using64k,
+        .tsz_oob = tsz_oob,
     };
 }
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
         param = aa64_va_parameters(env, address, mmu_idx,
                                    access_type != MMU_INST_FETCH);
         level = 0;
+
+        /*
+         * If TxSZ is programmed to a value larger than the maximum,
+         * or smaller than the effective minimum, it is IMPLEMENTATION
+         * DEFINED whether we behave as if the field were programmed
+         * within bounds, or if a level 0 Translation fault is generated.
+         *
+         * With FEAT_LVA, fault on less than minimum becomes required,
+         * so our choice is to always raise the fault.
+         */
+        if (param.tsz_oob) {
+            fault_type = ARMFault_Translation;
+            goto do_fault;
+        }
+
         addrsize = 64 - 8 * param.tbi;
         inputsize = 64 - param.tsz;
     } else {
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

We will shortly share parts of this function with other portions
of address translation.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h | 19 +------------------
 target/arm/helper.c    | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline void update_spsel(CPUARMState *env, uint32_t imm)
  * Returns the implementation defined bit-width of physical addresses.
  * The ARMv8 reference manuals refer to this as PAMax().
  */
-static inline unsigned int arm_pamax(ARMCPU *cpu)
-{
-    static const unsigned int pamax_map[] = {
-        [0] = 32,
-        [1] = 36,
-        [2] = 40,
-        [3] = 42,
-        [4] = 44,
-        [5] = 48,
-    };
-    unsigned int parange =
-        FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
-
-    /* id_aa64mmfr0 is a read-only register so values outside of the
-     * supported mappings can be considered an implementation error.  */
-    assert(parange < ARRAY_SIZE(pamax_map));
-    return pamax_map[parange];
-}
+unsigned int arm_pamax(ARMCPU *cpu);
 
 /* Return true if extended addresses are enabled.
  * This is always the case if our translation regime is 64 bit,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
 }
 #endif /* !CONFIG_USER_ONLY */
 
+/* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
+unsigned int arm_pamax(ARMCPU *cpu)
+{
+    static const unsigned int pamax_map[] = {
+        [0] = 32,
+        [1] = 36,
+        [2] = 40,
+        [3] = 42,
+        [4] = 44,
+        [5] = 48,
+    };
+    unsigned int parange =
+        FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
+
+    /*
+     * id_aa64mmfr0 is a read-only register so values outside of the
+     * supported mappings can be considered an implementation error.
+     */
+    assert(parange < ARRAY_SIZE(pamax_map));
+    return pamax_map[parange];
+}
+
 static int aa64_va_parameter_tbi(uint64_t tcr, ARMMMUIdx mmu_idx)
 {
     if (regime_has_2_ranges(mmu_idx)) {
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Pass down the width of the output address from translation.
For now this is still just PAMax, but a subsequent patch will
compute the correct value from TCR_ELx.{I}PS.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ do_fault:
  * false otherwise.
  */
 static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
-                               int inputsize, int stride)
+                               int inputsize, int stride, int outputsize)
 {
     const int grainsize = stride + 3;
     int startsizecheck;
@@ -XXX,XX +XXX,XX @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
     }
 
     if (is_aa64) {
-        CPUARMState *env = &cpu->env;
-        unsigned int pamax = arm_pamax(cpu);
-
         switch (stride) {
         case 13: /* 64KB Pages.  */
-            if (level == 0 || (level == 1 && pamax <= 42)) {
+            if (level == 0 || (level == 1 && outputsize <= 42)) {
                 return false;
             }
             break;
         case 11: /* 16KB Pages.  */
-            if (level == 0 || (level == 1 && pamax <= 40)) {
+            if (level == 0 || (level == 1 && outputsize <= 40)) {
                 return false;
             }
             break;
         case 9: /* 4KB Pages.  */
-            if (level == 0 && pamax <= 42) {
+            if (level == 0 && outputsize <= 42) {
                 return false;
             }
             break;
@@ -XXX,XX +XXX,XX @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
         }
 
         /* Inputsize checks.  */
-        if (inputsize > pamax &&
-            (arm_el_is_aa64(env, 1) || inputsize > 40)) {
+        if (inputsize > outputsize &&
+            (arm_el_is_aa64(&cpu->env, 1) || inputsize > 40)) {
             /* This is CONSTRAINED UNPREDICTABLE and we choose to fault.  */
             return false;
         }
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
     target_ulong page_size;
     uint32_t attrs;
     int32_t stride;
-    int addrsize, inputsize;
+    int addrsize, inputsize, outputsize;
     TCR *tcr = regime_tcr(env, mmu_idx);
     int ap, ns, xn, pxn;
     uint32_t el = regime_el(env, mmu_idx);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 
         addrsize = 64 - 8 * param.tbi;
         inputsize = 64 - param.tsz;
+        outputsize = arm_pamax(cpu);
     } else {
         param = aa32_va_parameters(env, address, mmu_idx);
         level = 1;
         addrsize = (mmu_idx == ARMMMUIdx_Stage2 ? 40 : 32);
         inputsize = addrsize - param.tsz;
+        outputsize = 40;
     }
 
     /*
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 
         /* Check that the starting level is valid. */
         ok = check_s2_mmu_setup(cpu, aarch64, startlevel,
-                                inputsize, stride);
+                                inputsize, stride, outputsize);
         if (!ok) {
             fault_type = ARMFault_Translation;
             goto do_fault;
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

The macro is a bit more readable than the inlined computation.
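
For reference, the two spellings can be compared in a standalone sketch.
DEMO_MAKE_64BIT_MASK below is assumed to mirror QEMU's MAKE_64BIT_MASK and
is written from memory, so treat it as an approximation; the demo_*
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror QEMU's MAKE_64BIT_MASK(shift, length). */
#define DEMO_MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* Open-coded low mask of n bits; only well defined for n < 64. */
static uint64_t demo_mask_open_coded(unsigned n)
{
    assert(n < 64);
    return (1ULL << n) - 1;
}

/* Macro-based low mask of n bits; only well defined for 1 <= n <= 64. */
static uint64_t demo_mask_macro(unsigned n)
{
    assert(n >= 1 && n <= 64);
    return DEMO_MAKE_64BIT_MASK(0, n);
}
```

For lengths between 1 and 63 both forms give the same mask. They differ
only at the extremes: the macro form accepts a length of 64, while the
open-coded form accepts 0.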

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
         level = startlevel;
     }
 
-    indexmask_grainsize = (1ULL << (stride + 3)) - 1;
-    indexmask = (1ULL << (inputsize - (stride * (4 - level)))) - 1;
+    indexmask_grainsize = MAKE_64BIT_MASK(0, stride + 3);
+    indexmask = MAKE_64BIT_MASK(0, inputsize - (stride * (4 - level)));
 
     /* Now we can extract the actual base address from the TTBR */
     descaddr = extract64(ttbr, 0, 48);
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

This field controls the output (intermediate) physical address size
of the translation process.  V8 requires that an AddressSize fault
be raised if the page tables are programmed incorrectly, such that
any intermediate descriptor address, or the final translated address,
is out of range.

Add a PS field to ARMVAParameters, and properly compute outputsize
in get_phys_addr_lpae.  Test the descaddr as extracted from TTBR
and from page table entries.

Restrict descaddrmask so that we won't raise the fault for v7.
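
The bounding of PS by PARANGE and the range test can be sketched on their
own as below. The demo_* names are hypothetical, and the raw field values
are passed in directly instead of being read from TCR_ELx and
ID_AA64MMFR0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encoding shared by ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
static const uint8_t demo_pamax_map[] = { 32, 36, 40, 42, 44, 48 };

/* Effective output address size: PS bounded by what the CPU implements. */
static unsigned demo_outputsize(unsigned parange, unsigned ps)
{
    unsigned eff = ps < parange ? ps : parange;

    assert(eff < sizeof(demo_pamax_map) / sizeof(demo_pamax_map[0]));
    return demo_pamax_map[eff];
}

/* True if any address bit at or above outputsize is set. */
static bool demo_address_size_fault(uint64_t descaddr, unsigned outputsize)
{
    assert(outputsize < 64);
    return (descaddr >> outputsize) != 0;
}
```

With PARANGE = 5 (48 bits) and PS = 2 (40 bits), the effective size is 40
bits, so a descriptor address with bit 40 set would fault.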

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h |  1 +
 target/arm/helper.c    | 72 ++++++++++++++++++++++++++++++++----------
 2 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline uint32_t aarch64_pstate_valid_mask(const ARMISARegisters *id)
  */
 typedef struct ARMVAParameters {
     unsigned tsz    : 8;
+    unsigned ps     : 3;
     unsigned select : 1;
     bool tbi        : 1;
     bool epd        : 1;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
 }
 #endif /* !CONFIG_USER_ONLY */
 
+/* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
+static const uint8_t pamax_map[] = {
+    [0] = 32,
+    [1] = 36,
+    [2] = 40,
+    [3] = 42,
+    [4] = 44,
+    [5] = 48,
+};
+
 /* The cpu-specific constant value of PAMax; also used by hw/arm/virt. */
 unsigned int arm_pamax(ARMCPU *cpu)
 {
-    static const unsigned int pamax_map[] = {
-        [0] = 32,
-        [1] = 36,
-        [2] = 40,
-        [3] = 42,
-        [4] = 44,
-        [5] = 48,
-    };
     unsigned int parange =
         FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
 
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
 {
     uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
     bool epd, hpd, using16k, using64k, tsz_oob;
-    int select, tsz, tbi, max_tsz, min_tsz;
+    int select, tsz, tbi, max_tsz, min_tsz, ps;
 
     if (!regime_has_2_ranges(mmu_idx)) {
         select = 0;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
             hpd = extract32(tcr, 24, 1);
         }
         epd = false;
+        ps = extract32(tcr, 16, 3);
     } else {
         /*
          * Bit 55 is always between the two regions, and is canonical for
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
             epd = extract32(tcr, 23, 1);
             hpd = extract64(tcr, 42, 1);
         }
+        ps = extract64(tcr, 32, 3);
     }
 
     if (cpu_isar_feature(aa64_st, env_archcpu(env))) {
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
 
     return (ARMVAParameters) {
         .tsz = tsz,
+        .ps = ps,
         .select = select,
         .tbi = tbi,
         .epd = epd,
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 
     /* TODO: This code does not support shareability levels. */
     if (aarch64) {
+        int ps;
+
         param = aa64_va_parameters(env, address, mmu_idx,
                                    access_type != MMU_INST_FETCH);
         level = 0;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 
         addrsize = 64 - 8 * param.tbi;
         inputsize = 64 - param.tsz;
-        outputsize = arm_pamax(cpu);
+
+        /*
+         * Bound PS by PARANGE to find the effective output address size.
+         * ID_AA64MMFR0 is a read-only register so values outside of the
+         * supported mappings can be considered an implementation error.
+         */
+        ps = FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE);
+        ps = MIN(ps, param.ps);
+        assert(ps < ARRAY_SIZE(pamax_map));
+        outputsize = pamax_map[ps];
     } else {
         param = aa32_va_parameters(env, address, mmu_idx);
         level = 1;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 
     /* Now we can extract the actual base address from the TTBR */
     descaddr = extract64(ttbr, 0, 48);
+
+    /*
+     * If the base address is out of range, raise AddressSizeFault.
+     * In the pseudocode, this is !IsZero(baseregister<47:outputsize>),
+     * but we've just cleared the bits above 47, so simplify the test.
+     */
+    if (descaddr >> outputsize) {
+        level = 0;
+        fault_type = ARMFault_AddressSize;
+        goto do_fault;
+    }
+
     /*
      * We rely on this masking to clear the RES0 bits at the bottom of the TTBR
      * and also to mask out CnP (bit 0) which could validly be non-zero.
      */
     descaddr &= ~indexmask;
 
-    /* The address field in the descriptor goes up to bit 39 for ARMv7
-     * but up to bit 47 for ARMv8, but we use the descaddrmask
-     * up to bit 39 for AArch32, because we don't need other bits in that case
-     * to construct next descriptor address (anyway they should be all zeroes).
+    /*
+     * For AArch32, the address field in the descriptor goes up to bit 39
+     * for both v7 and v8.  However, for v8 the SBZ bits [47:40] must be 0
+     * or an AddressSize fault is raised.  So for v8 we extract those SBZ
+     * bits as part of the address, which will be checked via outputsize.
+     * For AArch64, the address field always goes up to bit 47 (with extra
+     * bits for FEAT_LPA placed elsewhere).  AArch64 implies v8.
      */
-    descaddrmask = ((1ull << (aarch64 ? 48 : 40)) - 1) &
-                   ~indexmask_grainsize;
+    if (arm_feature(env, ARM_FEATURE_V8)) {
+        descaddrmask = MAKE_64BIT_MASK(0, 48);
+    } else {
+        descaddrmask = MAKE_64BIT_MASK(0, 40);
+    }
+    descaddrmask &= ~indexmask_grainsize;
 
     /* Secure accesses start with the page table in secure memory and
      * can be downgraded to non-secure at any step. Non-secure accesses
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
             /* Invalid, or the Reserved level 3 encoding */
             goto do_fault;
         }
+
         descaddr = descriptor & descaddrmask;
+        if (descaddr >> outputsize) {
+            fault_type = ARMFault_AddressSize;
+            goto do_fault;
+        }
 
         if ((descriptor & 2) && (level < 3)) {
             /* Table entry. The top five bits are attributes which may
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

The original A.a revision of the AArch64 ARM required that we
force-extend the addresses in these registers from 49 bits.
This language has been loosened via a combination of IMPLEMENTATION
DEFINED and CONSTRAINED UNPREDICTABLE to allow consideration of
the entire aligned address.

This means that we do not have to consider whether or not FEAT_LVA
is enabled, and decide from which bit an address might need to be
extended.
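
The difference between the old and new handling can be seen in a small
standalone sketch. The demo_* names are hypothetical; the old variant
open-codes the equivalent of sextract64(value, 0, 49) and assumes an
arithmetic right shift of negative values, as GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: sign-extend from bit 48, then clear RES0 bits [1:0]. */
static uint64_t demo_old_dbg_addr(uint64_t value)
{
    int64_t sext = (int64_t)(value << 15) >> 15;

    return (uint64_t)sext & ~3ULL;
}

/* New behaviour: keep the value as written, clearing only bits [1:0]. */
static uint64_t demo_new_dbg_addr(uint64_t value)
{
    return value & ~3ULL;
}
```

The two agree whenever bits [63:49] already match bit [48], and differ
only for values that the guest wrote without sign extension.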

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void dbgwvr_write(CPUARMState *env, const ARMCPRegInfo *ri,
     ARMCPU *cpu = env_archcpu(env);
     int i = ri->crm;
 
-    /* Bits [63:49] are hardwired to the value of bit [48]; that is, the
-     * register reads and behaves as if values written are sign extended.
+    /*
      * Bits [1:0] are RES0.
+     *
+     * It is IMPLEMENTATION DEFINED whether [63:49] ([63:53] with FEAT_LVA)
+     * are hardwired to the value of bit [48] ([52] with FEAT_LVA), or if
+     * they contain the value written.  It is CONSTRAINED UNPREDICTABLE
+     * whether the RESS bits are ignored when comparing an address.
+     *
+     * Therefore we are allowed to compare the entire register, which lets
+     * us avoid considering whether or not FEAT_LVA is actually enabled.
      */
-    value = sextract64(value, 0, 49) & ~3ULL;
+    value &= ~3ULL;
 
     raw_write(env, ri, value);
     hw_watchpoint_update(cpu, i);
@@ -XXX,XX +XXX,XX @@ void hw_breakpoint_update(ARMCPU *cpu, int n)
     case 0: /* unlinked address match */
     case 1: /* linked address match */
     {
-        /* Bits [63:49] are hardwired to the value of bit [48]; that is,
-         * we behave as if the register was sign extended. Bits [1:0] are
-         * RES0. The BAS field is used to allow setting breakpoints on 16
-         * bit wide instructions; it is CONSTRAINED UNPREDICTABLE whether
+        /*
+         * Bits [1:0] are RES0.
+         *
+         * It is IMPLEMENTATION DEFINED whether bits [63:49]
+         * ([63:53] for FEAT_LVA) are hardwired to a copy of the sign bit
+         * of the VA field ([48] or [52] for FEAT_LVA), or whether the
+         * value is read as written.  It is CONSTRAINED UNPREDICTABLE
+         * whether the RESS bits are ignored when comparing an address.
+         * Therefore we are allowed to compare the entire register, which
+         * lets us avoid considering whether FEAT_LVA is actually enabled.
+         *
+         * The BAS field is used to allow setting breakpoints on 16-bit
+         * wide instructions; it is CONSTRAINED UNPREDICTABLE whether
          * a bp will fire if the addresses covered by the bp and the addresses
          * covered by the insn overlap but the insn doesn't start at the
          * start of the bp address range. We choose to require the insn and
@@ -XXX,XX +XXX,XX @@ void hw_breakpoint_update(ARMCPU *cpu, int n)
          * See also figure D2-3 in the v8 ARM ARM (DDI0487A.c).
          */
         int bas = extract64(bcr, 5, 4);
-        addr = sextract64(bvr, 0, 49) & ~3ULL;
+        addr = bvr & ~3ULL;
         if (bas == 0) {
             return;
         }
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

This feature is relatively small, as it applies only to
64k pages and thus requires no additional changes to the
table descriptor walking algorithm, only a change to the
minimum TSZ (which is the inverse of the maximum virtual
address space size).

Note that this feature widens VBAR_ELx, but we already
treat the register as being 64 bits wide.
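
To illustrate the TSZ relationship described above, here is a minimal
sketch (not code from this patch; both helper names are hypothetical).
The virtual address width is 64 - TSZ, so lowering the minimum TSZ from
16 to 12 for 64k pages raises the maximum width from 48 to 52 bits:

```c
#include <stdbool.h>

/* Hypothetical helper: virtual address width selected by a TxSZ value. */
unsigned va_bits_from_tsz(unsigned tsz)
{
    return 64 - tsz;
}

/*
 * Hypothetical helper: smallest legal TSZ.  The larger address space is
 * only available with 64k pages, and only when the feature is present.
 */
unsigned min_tsz(bool using64k, bool have_lva)
{
    return (using64k && have_lva) ? 12 : 16;
}
```

With 4k or 16k pages the minimum stays at 16 in this sketch, i.e. a
48-bit virtual address space.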

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/cpu-param.h        | 2 +-
 target/arm/cpu.h              | 5 +++++
 target/arm/cpu64.c            | 1 +
 target/arm/helper.c           | 9 ++++++++-
 5 files changed, 16 insertions(+), 2 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

This feature widens physical addresses (and intermediate physical
addresses for 2-stage translation) from 48 to 52 bits, when using
64k pages.  The only thing left at this point is to handle the
extra bits in the TTBR and in the table descriptors.

Note that PAR_EL1 and HPFAR_EL2 are nominally extended, but we don't
mask out the high bits when writing to those registers, so no changes
are required there.
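
As a rough sketch of the "extra bits in the table descriptors" (an
illustration only, assuming a 64k-granule level 3 page descriptor with
52-bit output addresses in use; the function name is made up): bits
[51:48] of the output address are stored in descriptor bits [15:12].

```c
#include <stdint.h>

/*
 * Hypothetical helper: output address of a 64k-granule page descriptor.
 * Descriptor bits [47:16] hold OA[47:16]; bits [15:12] hold OA[51:48].
 */
uint64_t lpa_64k_output_address(uint64_t descriptor)
{
    uint64_t low  = descriptor & 0x0000ffffffff0000ULL;  /* OA[47:16] */
    uint64_t high = (descriptor >> 12) & 0xf;            /* OA[51:48] */

    return low | (high << 48);
}
```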

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/emulation.rst |  1 +
 target/arm/cpu-param.h        |  2 +-
 target/arm/cpu64.c            |  2 +-
 target/arm/helper.c           | 19 ++++++++++++++++---
 4 files changed, 19 insertions(+), 5 deletions(-)

From: Richard Henderson <richard.henderson@linaro.org>

With FEAT_LPA2, rather than introducing translation level 4,
we introduce level -1, below the current level 0.  Extend
arm_fi_to_lfsc to handle these faults.

Assert that this new translation level does not leak into
fault types for which it is not defined, which allows some
masking of fi->level to be removed.
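
The reason the masking has to go can be shown with a small standalone
sketch (the function name is hypothetical; the constants are the ones
used in the diff below, written in hex).  Levels 0 to 3 fit in the low
two bits of the fault status code, but level -1 has its own encoding,
and "level & 3" would silently turn -1 into 3:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: long-descriptor FSC for a translation fault. */
uint32_t translation_fault_fsc(int level)
{
    assert(level >= -1 && level <= 3);
    if (level < 0) {
        return 0x2b;                /* 0b101011: translation fault, level -1 */
    }
    return 0x04 | (uint32_t)level;  /* 0b0001LL: levels 0 to 3 */
}
```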

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/internals.h | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline uint32_t arm_fi_to_lfsc(ARMMMUFaultInfo *fi)
     case ARMFault_None:
         return 0;
     case ARMFault_AddressSize:
-        fsc = fi->level & 3;
+        assert(fi->level >= -1 && fi->level <= 3);
+        if (fi->level < 0) {
+            fsc = 0b101001;
+        } else {
+            fsc = fi->level;
+        }
         break;
     case ARMFault_AccessFlag:
-        fsc = (fi->level & 3) | (0x2 << 2);
+        assert(fi->level >= 0 && fi->level <= 3);
+        fsc = 0b001000 | fi->level;
         break;
     case ARMFault_Permission:
-        fsc = (fi->level & 3) | (0x3 << 2);
+        assert(fi->level >= 0 && fi->level <= 3);
+        fsc = 0b001100 | fi->level;
         break;
     case ARMFault_Translation:
-        fsc = (fi->level & 3) | (0x1 << 2);
+        assert(fi->level >= -1 && fi->level <= 3);
+        if (fi->level < 0) {
+            fsc = 0b101011;
+        } else {
+            fsc = 0b000100 | fi->level;
+        }
         break;
     case ARMFault_SyncExternal:
         fsc = 0x10 | (fi->ea << 12);
         break;
     case ARMFault_SyncExternalOnWalk:
-        fsc = (fi->level & 3) | (0x5 << 2) | (fi->ea << 12);
+        assert(fi->level >= -1 && fi->level <= 3);
+        if (fi->level < 0) {
+            fsc = 0b010011;
+        } else {
+            fsc = 0b010100 | fi->level;
+        }
+        fsc |= fi->ea << 12;
         break;
     case ARMFault_SyncParity:
         fsc = 0x18;
         break;
     case ARMFault_SyncParityOnWalk:
-        fsc = (fi->level & 3) | (0x7 << 2);
+        assert(fi->level >= -1 && fi->level <= 3);
+        if (fi->level < 0) {
+            fsc = 0b011011;
+        } else {
+            fsc = 0b011100 | fi->level;
+        }
         break;
     case ARMFault_AsyncParity:
         fsc = 0x19;
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

Merge tlbi_aa64_range_get_length and tlbi_aa64_range_get_base,
returning a structure containing both results.  Pass in the
ARMMMUIdx, rather than the digested two_ranges boolean.

This is in preparation for FEAT_LPA2, where the interpretation
of 'value' depends on the effective value of DS for the regime.
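
For readers unfamiliar with the operand layout, the length half of the
merged helper can be sketched standalone as below.  The field positions
are taken from the diff; the function names are hypothetical, and the
explicit widening to 64 bits before the shift is a choice made for this
sketch so that the largest ranges do not overflow a 32-bit int.

```c
#include <stdint.h>

/* Illustrative helper: extract `len` bits starting at bit `start`. */
static uint64_t bits(uint64_t v, unsigned start, unsigned len)
{
    return (v >> start) & ((1ULL << len) - 1);
}

/* Length in bytes of a range TLBI operand, or 0 for the reserved TG=0. */
uint64_t tlbi_range_length(uint64_t value)
{
    unsigned tg = bits(value, 46, 2);     /* 1 = 4k, 2 = 16k, 3 = 64k */
    unsigned scale = bits(value, 44, 2);
    unsigned num = bits(value, 39, 5);
    unsigned page_shift, exponent;

    if (tg == 0) {
        return 0;
    }
    page_shift = (tg - 1) * 2 + 12;
    exponent = 5 * scale + 1;
    return (uint64_t)(num + 1) << (exponent + page_shift);
}
```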

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 58 +++++++++++++++++++--------------------------
 1 file changed, 24 insertions(+), 34 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae3is_write(CPUARMState *env, const ARMCPRegInfo *ri,
 }
 
 #ifdef TARGET_AARCH64
-static uint64_t tlbi_aa64_range_get_length(CPUARMState *env,
-                                           uint64_t value)
-{
-    unsigned int page_shift;
-    unsigned int page_size_granule;
-    uint64_t num;
-    uint64_t scale;
-    uint64_t exponent;
+typedef struct {
+    uint64_t base;
     uint64_t length;
+} TLBIRange;
+
+static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx,
+                                     uint64_t value)
+{
+    unsigned int page_size_granule, page_shift, num, scale, exponent;
+    TLBIRange ret = { };
 
-    num = extract64(value, 39, 5);
-    scale = extract64(value, 44, 2);
     page_size_granule = extract64(value, 46, 2);
 
     if (page_size_granule == 0) {
         qemu_log_mask(LOG_GUEST_ERROR, "Invalid page size granule %d\n",
                       page_size_granule);
-        return 0;
+        return ret;
     }
 
     page_shift = (page_size_granule - 1) * 2 + 12;
-
+    num = extract64(value, 39, 5);
+    scale = extract64(value, 44, 2);
     exponent = (5 * scale) + 1;
-    length = (num + 1) << (exponent + page_shift);
 
-    return length;
-}
+    ret.length = (num + 1) << (exponent + page_shift);
 
-static uint64_t tlbi_aa64_range_get_base(CPUARMState *env, uint64_t value,
-                                        bool two_ranges)
-{
-    /* TODO: ARMv8.7 FEAT_LPA2 */
-    uint64_t pageaddr;
-
-    if (two_ranges) {
-        pageaddr = sextract64(value, 0, 37) << TARGET_PAGE_BITS;
+    if (regime_has_2_ranges(mmuidx)) {
+        ret.base = sextract64(value, 0, 37) << TARGET_PAGE_BITS;
     } else {
-        pageaddr = extract64(value, 0, 37) << TARGET_PAGE_BITS;
+        ret.base = extract64(value, 0, 37) << TARGET_PAGE_BITS;
     }
 
-    return pageaddr;
+    return ret;
 }
 
 static void do_rvae_write(CPUARMState *env, uint64_t value,
                           int idxmap, bool synced)
 {
     ARMMMUIdx one_idx = ARM_MMU_IDX_A | ctz32(idxmap);
-    bool two_ranges = regime_has_2_ranges(one_idx);
-    uint64_t baseaddr, length;
+    TLBIRange range;
     int bits;
 
-    baseaddr = tlbi_aa64_range_get_base(env, value, two_ranges);
-    length = tlbi_aa64_range_get_length(env, value);
-    bits = tlbbits_for_regime(env, one_idx, baseaddr);
+    range = tlbi_aa64_get_range(env, one_idx, value);
+    bits = tlbbits_for_regime(env, one_idx, range.base);
 
     if (synced) {
         tlb_flush_range_by_mmuidx_all_cpus_synced(env_cpu(env),
-                                                  baseaddr,
-                                                  length,
+                                                  range.base,
+                                                  range.length,
                                                   idxmap,
                                                   bits);
     } else {
-        tlb_flush_range_by_mmuidx(env_cpu(env), baseaddr,
-                                  length, idxmap, bits);
+        tlb_flush_range_by_mmuidx(env_cpu(env), range.base,
+                                  range.length, idxmap, bits);
     }
 }
 
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

The shift of the BaseADDR field depends on the translation
granule in use.
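
A small worked sketch of the difference (hypothetical function names,
not the code in this patch): the 37-bit BaseADDR field is a page number
in units of the granule named by TG, so the same field value yields a
different address for 4k and 64k pages.  Regimes with two address
ranges sign-extend the field first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page shift implied by the TG field: 1 -> 12, 2 -> 14, 3 -> 16. */
unsigned tg_page_shift(unsigned tg)
{
    return (tg - 1) * 2 + 12;
}

/* Base address named by a range TLBI operand; 0 for the reserved TG=0. */
uint64_t tlbi_range_base(uint64_t value, bool two_ranges)
{
    unsigned tg = (value >> 46) & 3;
    uint64_t field = value & ((1ULL << 37) - 1);

    if (tg == 0) {
        return 0;
    }
    if (two_ranges && (field >> 36)) {
        field |= ~((1ULL << 37) - 1);   /* sign-extend from bit 36 */
    }
    return field << tg_page_shift(tg);
}
```

With BaseADDR = 1, a 4k granule gives 0x1000 and a 64k granule gives
0x10000; a fixed 12-bit shift would give 0x1000 in both cases.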

Fixes: 84940ed8255 ("target/arm: Add support for FEAT_TLBIRANGE")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx,
     ret.length = (num + 1) << (exponent + page_shift);
 
     if (regime_has_2_ranges(mmuidx)) {
-        ret.base = sextract64(value, 0, 37) << TARGET_PAGE_BITS;
+        ret.base = sextract64(value, 0, 37);
     } else {
-        ret.base = extract64(value, 0, 37) << TARGET_PAGE_BITS;
+        ret.base = extract64(value, 0, 37);
     }
+    ret.base <<= page_shift;
 
     return ret;
 }
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

For FEAT_LPA2, we will need other ARMVAParameters, which themselves
depend on the translation granule in use.  We might as well validate
that the given TG matches; the architecture "does not require that
the instruction invalidates any entries" if this is not true.
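
The check amounts to the following (a sketch with hypothetical names;
the using64k/using16k flags stand in for the ARMVAParameters fields of
the same name).  Note that the reserved encoding TG=0 never matches, so
it needs no separate test:

```c
#include <stdbool.h>

/* TG encoding expected in a range TLBI operand for the granule in use. */
unsigned expected_tg(bool using64k, bool using16k)
{
    return using64k ? 3 : using16k ? 2 : 1;
}

/* True if the TG field of the operand names the granule in use. */
bool tlbi_tg_matches(unsigned tg_in_value, bool using64k, bool using16k)
{
    return tg_in_value == expected_tg(using64k, using16k);
}
```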

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-15-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx,
                                      uint64_t value)
 {
     unsigned int page_size_granule, page_shift, num, scale, exponent;
+    /* Extract one bit to represent the va selector in use. */
+    uint64_t select = sextract64(value, 36, 1);
+    ARMVAParameters param = aa64_va_parameters(env, select, mmuidx, true);
     TLBIRange ret = { };
 
     page_size_granule = extract64(value, 46, 2);
 
-    if (page_size_granule == 0) {
-        qemu_log_mask(LOG_GUEST_ERROR, "Invalid page size granule %d\n",
+    /* The granule encoded in value must match the granule in use. */
+    if (page_size_granule != (param.using64k ? 3 : param.using16k ? 2 : 1)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "Invalid tlbi page size granule %d\n",
                       page_size_granule);
         return ret;
     }
@@ -XXX,XX +XXX,XX @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx,
 
     ret.length = (num + 1) << (exponent + page_shift);
 
-    if (regime_has_2_ranges(mmuidx)) {
+    if (param.select) {
         ret.base = sextract64(value, 0, 37);
     } else {
         ret.base = extract64(value, 0, 37);
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

We support 16k pages, but do not advertise that in ID_AA64MMFR0.

The value 0 in the TGRAN*_2 fields indicates that stage2 lookups defer
to the same support as stage1 lookups.  This setting is deprecated, so
indicate support for all stage2 page sizes directly.
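
For reference, the effect on the register value can be sketched without
the FIELD_DP64 machinery.  The bit positions below (TGRAN16 at [23:20],
TGRAN16_2 at [35:32], TGRAN64_2 at [39:36], TGRAN4_2 at [43:40]) are my
reading of the ID_AA64MMFR0_EL1 layout and should be checked against the
Arm ARM; the helper names are hypothetical.

```c
#include <stdint.h>

/* Illustrative helper: write `val` into reg[start + len - 1 : start]. */
static uint64_t deposit(uint64_t reg, unsigned start, unsigned len,
                        uint64_t val)
{
    uint64_t mask = ((1ULL << len) - 1) << start;

    return (reg & ~mask) | ((val << start) & mask);
}

/* Apply the four field updates described above to an ID register value. */
uint64_t advertise_granules(uint64_t mmfr0)
{
    mmfr0 = deposit(mmfr0, 20, 4, 1);  /* TGRAN16:   16k pages supported */
    mmfr0 = deposit(mmfr0, 32, 4, 2);  /* TGRAN16_2: 16k stage2 supported */
    mmfr0 = deposit(mmfr0, 36, 4, 2);  /* TGRAN64_2: 64k stage2 supported */
    mmfr0 = deposit(mmfr0, 40, 4, 2);  /* TGRAN4_2:   4k stage2 supported */
    return mmfr0;
}
```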

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20220301215958.157011-16-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu64.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
 
     t = cpu->isar.id_aa64mmfr0;
     t = FIELD_DP64(t, ID_AA64MMFR0, PARANGE, 6); /* FEAT_LPA: 52 bits */
+    t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16, 1);   /* 16k pages supported */
+    t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN16_2, 2); /* 16k stage2 supported */
+    t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN64_2, 2); /* 64k stage2 supported */
+    t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN4_2, 2);  /*  4k stage2 supported */
     cpu->isar.id_aa64mmfr0 = t;
 
     t = cpu->isar.id_aa64mmfr1;
-- 
2.25.1

From: Richard Henderson <richard.henderson@linaro.org>

This feature widens physical addresses (and intermediate physical
addresses for 2-stage translation) from 48 to 52 bits, when using
4k or 16k pages.

This introduces the DS bit to TCR_ELx.  The bit is RES0 unless LPA2 is
supported for the page size in use, which gives an effective value of
DS for a given table walk.  The DS bit changes the format of the page
table descriptor slightly: the SH field moves out to TCR, so that all
pages have the same shareability, and those bits of the page table
descriptor are repurposed for the highest bits of the output address.
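
The descriptor change can be sketched as follows (an illustration for a
4k-granule page descriptor only; the struct and function names are
hypothetical, and tcr_sh stands for the SH value read from TCR_ELx).
With an effective DS of 1, descriptor bits [9:8] hold output address
bits [51:50] instead of the shareability field:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t output_address;
    unsigned shareability;
} PageAttrSketch;

/* Decode address and shareability from a 4k-granule page descriptor. */
PageAttrSketch decode_page_descriptor(uint64_t descriptor, bool ds,
                                      unsigned tcr_sh)
{
    PageAttrSketch r;

    if (ds) {
        r.output_address = descriptor & 0x0003fffffffff000ULL; /* OA[49:12] */
        r.output_address |= ((descriptor >> 8) & 3) << 50;     /* OA[51:50] */
        r.shareability = tcr_sh;            /* same for every page */
    } else {
        r.output_address = descriptor & 0x0000fffffffff000ULL; /* OA[47:12] */
        r.shareability = (descriptor >> 8) & 3;  /* per-page SH field */
    }
    return r;
}
```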

Do not yet enable FEAT_LPA2; we need extra plumbing to avoid
tickling an old kernel bug.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220301215958.157011-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/emulation.rst |   1 +
 target/arm/cpu.h              |  22 ++++++++
 target/arm/internals.h        |   2 +
 target/arm/helper.c           | 102 +++++++++++++++++++++++++++++-----
 4 files changed, 112 insertions(+), 15 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_JSCVT (JavaScript conversion instructions)
 - FEAT_LOR (Limited ordering regions)
 - FEAT_LPA (Large Physical Address space)
+- FEAT_LPA2 (Large Physical and virtual Address space v2)
 - FEAT_LRCPC (Load-acquire RCpc instructions)
 - FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
 - FEAT_LSE (Large System Extensions)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_i8mm(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, I8MM) != 0;
 }
 
+static inline bool isar_feature_aa64_tgran4_lpa2(const ARMISARegisters *id)
+{
+    return FIELD_SEX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4) >= 1;
+}
+
+static inline bool isar_feature_aa64_tgran4_2_lpa2(const ARMISARegisters *id)
+{
+    unsigned t = FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN4_2);
+    return t >= 3 || (t == 0 && isar_feature_aa64_tgran4_lpa2(id));
+}
+
+static inline bool isar_feature_aa64_tgran16_lpa2(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN16) >= 2;
+}
+
+static inline bool isar_feature_aa64_tgran16_2_lpa2(const ARMISARegisters *id)
+{
+    unsigned t = FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, TGRAN16_2);
+    return t >= 3 || (t == 0 && isar_feature_aa64_tgran16_lpa2(id));
+}
+
 static inline bool isar_feature_aa64_ccidx(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, CCIDX) != 0;
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline uint32_t aarch64_pstate_valid_mask(const ARMISARegisters *id)
 typedef struct ARMVAParameters {
     unsigned tsz    : 8;
     unsigned ps     : 3;
+    unsigned sh     : 2;
     unsigned select : 1;
     bool tbi        : 1;
     bool epd        : 1;
@@ -XXX,XX +XXX,XX @@ typedef struct ARMVAParameters {
     bool using16k   : 1;
     bool using64k   : 1;
     bool tsz_oob    : 1;  /* tsz has been clamped to legal range */
+    bool ds         : 1;
 } ARMVAParameters;
 
 ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static TLBIRange tlbi_aa64_get_range(CPUARMState *env, ARMMMUIdx mmuidx,
     } else {
         ret.base = extract64(value, 0, 37);
     }
+    if (param.ds) {
+        /*
+         * With DS=1, BaseADDR is always shifted 16 so that it is able
+         * to address all 52 va bits.  The input address is perforce
+         * aligned on a 64k boundary regardless of translation granule.
+         */
+        page_shift = 16;
+    }
     ret.base <<= page_shift;
 
     return ret;
@@ -XXX,XX +XXX,XX @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
     const int grainsize = stride + 3;
     int startsizecheck;
 
-    /* Negative levels are never allowed.  */
-    if (level < 0) {
+    /*
+     * Negative levels are usually not allowed...
+     * Except for FEAT_LPA2, 4k page table, 52-bit address space, which
+     * begins with level -1.  Note that previous feature tests will have
+     * eliminated this combination if it is not enabled.
+     */
+    if (level < (inputsize == 52 && stride == 9 ? -1 : 0)) {
         return false;
     }
 
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
                                    ARMMMUIdx mmu_idx, bool data)
 {
     uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
-    bool epd, hpd, using16k, using64k, tsz_oob;
-    int select, tsz, tbi, max_tsz, min_tsz, ps;
+    bool epd, hpd, using16k, using64k, tsz_oob, ds;
+    int select, tsz, tbi, max_tsz, min_tsz, ps, sh;
+    ARMCPU *cpu = env_archcpu(env);
 
     if (!regime_has_2_ranges(mmu_idx)) {
         select = 0;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
             hpd = extract32(tcr, 24, 1);
         }
         epd = false;
+        sh = extract32(tcr, 12, 2);
         ps = extract32(tcr, 16, 3);
+        ds = extract64(tcr, 32, 1);
     } else {
         /*
          * Bit 55 is always between the two regions, and is canonical for
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
         if (!select) {
             tsz = extract32(tcr, 0, 6);
             epd = extract32(tcr, 7, 1);
+            sh = extract32(tcr, 12, 2);
             using64k = extract32(tcr, 14, 1);
             using16k = extract32(tcr, 15, 1);
             hpd = extract64(tcr, 41, 1);
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
             using64k = tg == 3;
             tsz = extract32(tcr, 16, 6);
             epd = extract32(tcr, 23, 1);
+            sh = extract32(tcr, 28, 2);
             hpd = extract64(tcr, 42, 1);
         }
         ps = extract64(tcr, 32, 3);
+        ds = extract64(tcr, 59, 1);
     }
 
-    if (cpu_isar_feature(aa64_st, env_archcpu(env))) {
+    if (cpu_isar_feature(aa64_st, cpu)) {
         max_tsz = 48 - using64k;
     } else {
         max_tsz = 39;
     }
 
+    /*
+     * DS is RES0 unless FEAT_LPA2 is supported for the given page size;
+     * adjust the effective value of DS, as documented.
+     */
     min_tsz = 16;
     if (using64k) {
-        if (cpu_isar_feature(aa64_lva, env_archcpu(env))) {
+        if (cpu_isar_feature(aa64_lva, cpu)) {
+            min_tsz = 12;
+        }
+        ds = false;
+    } else if (ds) {
+        switch (mmu_idx) {
+        case ARMMMUIdx_Stage2:
+        case ARMMMUIdx_Stage2_S:
+            if (using16k) {
+                ds = cpu_isar_feature(aa64_tgran16_2_lpa2, cpu);
+            } else {
+                ds = cpu_isar_feature(aa64_tgran4_2_lpa2, cpu);
+            }
+            break;
+        default:
+            if (using16k) {
+                ds = cpu_isar_feature(aa64_tgran16_lpa2, cpu);
+            } else {
+                ds = cpu_isar_feature(aa64_tgran4_lpa2, cpu);
+            }
+            break;
+        }
+        if (ds) {
             min_tsz = 12;
         }
     }
-    /* TODO: FEAT_LPA2 */
 
     if (tsz > max_tsz) {
         tsz = max_tsz;
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
     return (ARMVAParameters) {
         .tsz = tsz,
         .ps = ps,
+        .sh = sh,
         .select = select,
         .tbi = tbi,
         .epd = epd,
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
         .using16k = using16k,
         .using64k = using64k,
         .tsz_oob = tsz_oob,
+        .ds = ds,
     };
 }
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
          * VTCR_EL2.SL0 field (whose interpretation depends on the page size)
          */
         uint32_t sl0 = extract32(tcr->raw_tcr, 6, 2);
+        uint32_t sl2 = extract64(tcr->raw_tcr, 33, 1);
         uint32_t startlevel;
         bool ok;
 
-        if (!aarch64 || stride == 9) {
+        /* SL2 is RES0 unless DS=1 & 4kb granule. */
+        if (param.ds && stride == 9 && sl2) {
+            if (sl0 != 0) {
+                level = 0;
+                fault_type = ARMFault_Translation;
+                goto do_fault;
+            }
+            startlevel = -1;
+        } else if (!aarch64 || stride == 9) {
             /* AArch32 or 4KB pages */
             startlevel = 2 - sl0;
 
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
      * for both v7 and v8.  However, for v8 the SBZ bits [47:40] must be 0
      * or an AddressSize fault is raised.  So for v8 we extract those SBZ
      * bits as part of the address, which will be checked via outputsize.
-     * For AArch64, the address field always goes up to bit 47 (with extra
-     * bits for FEAT_LPA placed elsewhere).  AArch64 implies v8.
+     * For AArch64, the address field goes up to bit 47, or 49 with FEAT_LPA2;
+     * the highest bits of a 52-bit output are placed elsewhere.
      */
-    if (arm_feature(env, ARM_FEATURE_V8)) {
+    if (param.ds) {
+        descaddrmask = MAKE_64BIT_MASK(0, 50);
+    } else if (arm_feature(env, ARM_FEATURE_V8)) {
         descaddrmask = MAKE_64BIT_MASK(0, 48);
     } else {
         descaddrmask = MAKE_64BIT_MASK(0, 40);
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 
         /*
          * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
-         * of descriptor.  Otherwise, if descaddr is out of range, raise
-         * AddressSizeFault.
+         * of descriptor.  For FEAT_LPA2 and effective DS, bits [51:50] of
+         * descaddr are in [9:8].  Otherwise, if descaddr is out of range,
+         * raise AddressSizeFault.
          */
         if (outputsize > 48) {
-            descaddr |= extract64(descriptor, 12, 4) << 48;
+            if (param.ds) {
+                descaddr |= extract64(descriptor, 8, 2) << 50;
+            } else {
+                descaddr |= extract64(descriptor, 12, 4) << 48;
+            }
         } else if (descaddr >> outputsize) {
             fault_type = ARMFault_AddressSize;
             goto do_fault;
@@ -XXX,XX +XXX,XX @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
         assert(attrindx <= 7);
         cacheattrs->attrs = extract64(mair, attrindx * 8, 8);
     }
-    cacheattrs->shareability = extract32(attrs, 6, 2);
+
+    /*
+     * For FEAT_LPA2 and effective DS, the SH field in the attributes
+     * was re-purposed for output address bits.  The SH attribute in
+     * that case comes from TCR_ELx, which we extracted earlier.
+     */
+    if (param.ds) {
+        cacheattrs->shareability = param.sh;
+    } else {
+        cacheattrs->shareability = extract32(attrs, 6, 2);
+    }
 
     *phys_ptr = descaddr;
     *page_size_ptr = page_size;
-- 
2.25.1

When we're using KVM, the PSCI implementation is provided by the
kernel, but QEMU has to tell the guest about it via the device tree.
Currently we look at the KVM_CAP_ARM_PSCI_0_2 capability to determine
if the kernel is providing at least PSCI 0.2, but if the kernel
provides a newer version than that we will still only tell the guest
it has PSCI 0.2.  (This is fairly harmless; it just means the guest
won't use newer parts of the PSCI API.)
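
As a sketch of the mapping involved (hypothetical names, not the code in
hw/arm/boot.c): a PSCI version word carries the major version in the
upper 16 bits and the minor version in the lower 16, and the most
specific device tree compatible string follows from a simple range
comparison.  The real node also lists the less specific strings as
fallbacks, which this sketch omits.

```c
#include <stdint.h>

/* Illustrative macro: build a PSCI version word from major and minor. */
#define SKETCH_PSCI_VERSION(major, minor) \
    (((uint32_t)(major) << 16) | (uint32_t)(minor))

/* Most specific "compatible" string for a given PSCI version word. */
const char *psci_compatible(uint32_t version)
{
    if (version >= SKETCH_PSCI_VERSION(1, 0)) {
        return "arm,psci-1.0";
    }
    if (version >= SKETCH_PSCI_VERSION(0, 2)) {
        return "arm,psci-0.2";
    }
    return "arm,psci";
}
```

Comparing ranges rather than exact values means a kernel that reports a
later 1.x version still gets the 1.0-compatible string.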

The kernel exposes the specific PSCI version it is implementing via
the ONE_REG API; use this to report in the dtb that the PSCI
implementation is 1.0-compatible if appropriate.  (The device tree
binding currently only distinguishes "pre-0.2", "0.2-compatible" and
"1.0-compatible".)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 20220224134655.1207865-1-peter.maydell@linaro.org
---
 target/arm/kvm-consts.h |  1 +
 hw/arm/boot.c           |  5 ++---
 target/arm/kvm64.c      | 12 ++++++++++++
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/target/arm/kvm-consts.h b/target/arm/kvm-consts.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm-consts.h
+++ b/target/arm/kvm-consts.h
@@ -XXX,XX +XXX,XX @@ MISMATCH_CHECK(QEMU_PSCI_1_0_FN_PSCI_FEATURES, PSCI_1_0_FN_PSCI_FEATURES);
 
 #define QEMU_PSCI_VERSION_0_1                     0x00001
 #define QEMU_PSCI_VERSION_0_2                     0x00002
+#define QEMU_PSCI_VERSION_1_0                     0x10000
 #define QEMU_PSCI_VERSION_1_1                     0x10001
 
 MISMATCH_CHECK(QEMU_PSCI_0_2_RET_TOS_MIGRATION_NOT_REQUIRED, PSCI_0_2_TOS_MP);
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -XXX,XX +XXX,XX @@ static void fdt_add_psci_node(void *fdt)
     }
 
     qemu_fdt_add_subnode(fdt, "/psci");
-    if (armcpu->psci_version == QEMU_PSCI_VERSION_0_2 ||
-        armcpu->psci_version == QEMU_PSCI_VERSION_1_1) {
-        if (armcpu->psci_version == QEMU_PSCI_VERSION_0_2) {
+    if (armcpu->psci_version >= QEMU_PSCI_VERSION_0_2) {
+        if (armcpu->psci_version < QEMU_PSCI_VERSION_1_0) {
             const char comp[] = "arm,psci-0.2\0arm,psci";
             qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
         } else {
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
     uint64_t mpidr;
     ARMCPU *cpu = ARM_CPU(cs);
     CPUARMState *env = &cpu->env;
+    uint64_t psciver;
 
     if (cpu->kvm_target == QEMU_KVM_ARM_TARGET_NONE ||
         !object_dynamic_cast(OBJECT(cpu), TYPE_AARCH64_CPU)) {
@@ -XXX,XX +XXX,XX @@ int kvm_arch_init_vcpu(CPUState *cs)
         }
     }
 
+    /*
+     * KVM reports the exact PSCI version it is implementing via a
+     * special sysreg. If it is present, use its contents to determine
+     * what to report to the guest in the dtb (it is the PSCI version,
+     * in the same 15-bits major 16-bits minor format that PSCI_VERSION
+     * returns).
+     */
+    if (!kvm_get_one_reg(cs, KVM_REG_ARM_PSCI_VERSION, &psciver)) {
+        cpu->psci_version = psciver;
+    }
+
     /*
      * When KVM is in use, PSCI is emulated in-kernel and not by qemu.
      * Currently KVM has its own idea about MPIDR assignment, so we
-- 
2.25.1

The updateUIInfo method makes Cocoa API calls.  It also calls back
into QEMU functions like dpy_set_ui_info().  To do this safely, we
need to follow two rules:
 * Cocoa API calls are made on the Cocoa UI thread
 * When calling back into QEMU we must hold the iothread lock

Fix the places where we got this wrong, by taking the iothread lock
while executing updateUIInfo, and moving the call in cocoa_switch()
inside the dispatch_async block.

Some of the Cocoa UI methods which call updateUIInfo are invoked as
part of the initial application startup, while we're still doing the
little cross-thread dance described in the comment just above
call_qemu_main().  This meant they were calling back into the QEMU UI
layer before we'd actually finished initializing our display and
registered the DisplayChangeListener, which isn't really valid.  Once
updateUIInfo takes the iothread lock, we no longer get away with
this, because during this startup phase the iothread lock is held by
the QEMU main-loop thread which is waiting for us to finish our
display initialization.  So we must suppress updateUIInfo until
applicationDidFinishLaunching allows the QEMU main-loop thread to
continue.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Tested-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Message-id: 20220224101330.967429-2-peter.maydell@linaro.org
---
 ui/cocoa.m | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index XXXXXXX..XXXXXXX 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -XXX,XX +XXX,XX @@ QemuCocoaView *cocoaView;
     }
 }
 
-- (void) updateUIInfo
+- (void) updateUIInfoLocked
 {
+    /* Must be called with the iothread lock, i.e. via updateUIInfo */
     NSSize frameSize;
     QemuUIInfo info;
 
@@ -XXX,XX +XXX,XX @@ QemuCocoaView *cocoaView;
     dpy_set_ui_info(dcl.con, &info, TRUE);
 }
 
+- (void) updateUIInfo
+{
+    if (!allow_events) {
+        /*
+         * Don't try to tell QEMU about UI information in the application
+         * startup phase -- we haven't yet registered dcl with the QEMU UI
+         * layer, and also trying to take the iothread lock would deadlock.
+         * When cocoa_display_init() does register the dcl, the UI layer
+         * will call cocoa_switch(), which will call updateUIInfo, so
+         * we don't lose any information here.
+         */
+        return;
+    }
+
+    with_iothread_lock(^{
+        [self updateUIInfoLocked];
+    });
+}
+
 - (void)viewDidMoveToWindow
 {
     [self updateUIInfo];
@@ -XXX,XX +XXX,XX @@ static void cocoa_switch(DisplayChangeListener *dcl,
 
     COCOA_DEBUG("qemu_cocoa: cocoa_switch\n");
 
-    [cocoaView updateUIInfo];
-
     // The DisplaySurface will be freed as soon as this callback returns.
     // We take a reference to the underlying pixman image here so it does
     // not disappear from under our feet; the switchSurface method will
@@ -XXX,XX +XXX,XX @@ static void cocoa_switch(DisplayChangeListener *dcl,
     pixman_image_ref(image);
 
     dispatch_async(dispatch_get_main_queue(), ^{
+        [cocoaView updateUIInfo];
         [cocoaView switchSurface:image];
     });
     [pool release];
-- 
2.25.1
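
The startup guard in the updateUIInfo patch above can be illustrated outside
of Cocoa. The following is a minimal sketch in plain C with pthreads, not the
QEMU implementation: big_lock, with_big_lock(), update_ui_info() and
ui_info_updates are hypothetical stand-ins for the iothread lock,
with_iothread_lock(), updateUIInfo and the call into the UI layer. Only the
allow_events flag name is taken from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the iothread lock and the cocoa UI state. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static bool allow_events;    /* false until startup has finished */
static int ui_info_updates;  /* counts updates that reached the "core" */

/* Run fn(opaque) with big_lock held. */
static void with_big_lock(void (*fn)(void *), void *opaque)
{
    pthread_mutex_lock(&big_lock);
    fn(opaque);
    pthread_mutex_unlock(&big_lock);
}

/* Must only be called with big_lock held, i.e. via update_ui_info(). */
static void update_ui_info_locked(void *opaque)
{
    (void)opaque;
    /* The guard in update_ui_info() means we never get here during startup. */
    assert(allow_events);
    ui_info_updates++;
}

/* Returns true if the update was delivered, false if it was suppressed. */
static bool update_ui_info(void)
{
    if (!allow_events) {
        /*
         * During startup another party holds big_lock and is waiting for
         * us, so trying to take the lock here would deadlock. Skip the
         * update; in the sketch the caller is expected to retry later.
         */
        return false;
    }
    with_big_lock(update_ui_info_locked, NULL);
    return true;
}
```

While allow_events is false, update_ui_info() returns without touching
big_lock, so it is safe to call even when the lock is already held elsewhere.
Once allow_events is set, each call takes the lock, bumps the counter and
releases the lock again.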

In commit 6e657e64cdc478 in 2013 we added some autorelease pools to
deal with complaints from macOS when we made calls into Cocoa from
threads that didn't have automatically created autorelease pools.
Later on, macOS got stricter about forbidding cross-thread Cocoa
calls, and in commit 5588840ff77800e839d8 we restructured the code to
avoid them.  This left the autorelease pool creation in several
functions without any purpose; delete it.

We still need the pool in cocoa_refresh() for the clipboard-related
code which is called directly there.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Tested-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Message-id: 20220224101330.967429-3-peter.maydell@linaro.org
---
 ui/cocoa.m | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index XXXXXXX..XXXXXXX 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -XXX,XX +XXX,XX @@ int main (int argc, char **argv) {
 static void cocoa_update(DisplayChangeListener *dcl,
                          int x, int y, int w, int h)
 {
-    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
-
     COCOA_DEBUG("qemu_cocoa: cocoa_update\n");
 
     dispatch_async(dispatch_get_main_queue(), ^{
@@ -XXX,XX +XXX,XX @@ static void cocoa_update(DisplayChangeListener *dcl,
         }
         [cocoaView setNeedsDisplayInRect:rect];
     });
-
-    [pool release];
 }
 
 static void cocoa_switch(DisplayChangeListener *dcl,
                          DisplaySurface *surface)
 {
-    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
     pixman_image_t *image = surface->image;
 
     COCOA_DEBUG("qemu_cocoa: cocoa_switch\n");
@@ -XXX,XX +XXX,XX @@ static void cocoa_switch(DisplayChangeListener *dcl,
         [cocoaView updateUIInfo];
         [cocoaView switchSurface:image];
     });
-    [pool release];
 }
 
 static void cocoa_refresh(DisplayChangeListener *dcl)
-- 
2.25.1