Series comparison

-[PULL 00/31] target-arm queue
+[PULL 00/11] target-arm queue
-The following changes since commit 11b8920ed2093848f79f93d106afe8a69a61a523:
+The following changes since commit 3214bec13d8d4c40f707d21d8350d04e4123ae97:
-  Merge tag 'pull-request-2024-11-04' of https://gitlab.com/thuth/qemu into staging (2024-11-04 17:37:59 +0000)
+  Merge tag 'migration-20250110-pull-request' of https://gitlab.com/farosas/qemu into staging (2025-01-10 13:39:19 -0500)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20241105
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20250113
-for you to fetch changes up to 374cdc8efe4a039510cca47e8399d54a1aeb4f2d:
+for you to fetch changes up to 435d260e7ec5ff9c79e3e62f1d66ec82d2d691ae:
-  target/arm: Enable FEAT_CMOW for -cpu max (2024-11-05 10:10:00 +0000)
+  docs/system/arm/virt: mention specific migration information (2025-01-13 12:35:35 +0000)
 ----------------------------------------------------------------
 target-arm queue:
- * Fix MMU indexes for AArch32 Secure PL1&0 in a less complex and buggy way
+ * hw/arm_sysctl: fix extracting 31th bit of val
- * Fix SVE SDOT/UDOT/USDOT (4-way, indexed)
+ * hw/misc: cast rpm to uint64_t
- * softfloat: set 2-operand NaN propagation rule at runtime
+ * tests/qtest/boot-serial-test: Improve ASM
- * disas: Fix build against Capstone v6 (again)
+ * target/arm: Move minor arithmetic helpers out of helper.c
- * hw/rtc/ds1338: Trace send and receive operations
+ * target/arm: change default pauth algorithm to impdef
  * hw/timer/imx_gpt: Convert DPRINTF to trace events
  * hw/watchdog/wdt_imx2: Remove redundant assignment
  * hw/sensor/tmp105: Convert printf() to trace event, add tracing for read/write access
  * hw/net/npcm_gmac: Change error log to trace event
  * target/arm: Enable FEAT_CMOW for -cpu max
 ----------------------------------------------------------------
-Bernhard Beschow (4):
+Anastasia Belova (1):
-      hw/rtc/ds1338: Trace send and receive operations
+      hw/arm_sysctl: fix extracting 31th bit of val
       hw/timer/imx_gpt: Convert DPRINTF to trace events
       hw/watchdog/wdt_imx2: Remove redundant assignment
       hw/sensor/tmp105: Convert printf() to trace event, add tracing for read/write access
-Gustavo Romero (1):
+Peter Maydell (2):
-      target/arm: Enable FEAT_CMOW for -cpu max
+      target/arm: Move minor arithmetic helpers out of helper.c
       tests/tcg/aarch64: force qarma5 for pauth-3 test
-Nabih Estefan (1):
+Philippe Mathieu-Daudé (4):
-      hw/net/npcm_gmac: Change error log to trace event
+      tests/qtest/boot-serial-test: Improve ASM comments of PL011 tests
       tests/qtest/boot-serial-test: Reduce for() loop in PL011 tests
       tests/qtest/boot-serial-test: Reorder pair of instructions in PL011 test
       tests/qtest/boot-serial-test: Initialize PL011 Control register
-Peter Maydell (24):
+Pierrick Bouvier (3):
-      softfloat: Allow 2-operand NaN propagation rule to be set at runtime
+      target/arm: add new property to select pauth-qarma5
-      tests/fp: Explicitly set 2-NaN propagation rule
+      target/arm: change default pauth algorithm to impdef
-      target/arm: Explicitly set 2-NaN propagation rule
+      docs/system/arm/virt: mention specific migration information
       target/mips: Explicitly set 2-NaN propagation rule
       target/loongarch: Explicitly set 2-NaN propagation rule
       target/hppa: Explicitly set 2-NaN propagation rule
       target/s390x: Explicitly set 2-NaN propagation rule
       target/ppc: Explicitly set 2-NaN propagation rule
       target/m68k: Explicitly set 2-NaN propagation rule
       target/m68k: Initialize float_status fields in gdb set/get functions
       target/sparc: Move cpu_put_fsr(env, 0) call to reset
       target/sparc: Explicitly set 2-NaN propagation rule
       target/xtensa: Factor out calls to set_use_first_nan()
       target/xtensa: Explicitly set 2-NaN propagation rule
       target/i386: Set 2-NaN propagation rule explicitly
       target/alpha: Explicitly set 2-NaN propagation rule
       target/microblaze: Move setting of float rounding mode to reset
       target/microblaze: Explicitly set 2-NaN propagation rule
       target/openrisc: Explicitly set 2-NaN propagation rule
       target/rx: Explicitly set 2-NaN propagation rule
       softfloat: Remove fallback rule from pickNaN()
       Revert "target/arm: Fix usage of MMU indexes when EL3 is AArch32"
       target/arm: Add new MMU indexes for AArch32 Secure PL1&0
       target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)
-Richard Henderson (1):
+Tigran Sogomonian (1):
-      disas: Fix build against Capstone v6 (again)
+      hw/misc: cast rpm to uint64_t
- docs/system/arm/emulation.rst     |   1 +
+ docs/system/arm/cpu-features.rst                |   7 +-
- meson.build                       |   1 +
+ docs/system/arm/virt.rst                        |   4 +
- hw/sensor/trace.h                 |   1 +
+ docs/system/introduction.rst                    |   2 +-
- include/disas/capstone.h          |   1 +
+ target/arm/cpu.h                                |   4 +
- include/fpu/softfloat-helpers.h   |  11 +++
+ hw/core/machine.c                               |   4 +-
- include/fpu/softfloat-types.h     |  38 ++++++++++
+ hw/misc/arm_sysctl.c                            |   2 +-
- target/arm/cpu-features.h         |   5 ++
+ hw/misc/npcm7xx_mft.c                           |   5 +-
- target/arm/cpu.h                  |  49 ++++++------
+ target/arm/arm-qmp-cmds.c                       |   2 +-
- target/arm/internals.h            |  41 +++++-----
+ target/arm/cpu.c                                |   2 +
- target/arm/tcg/translate.h        |   2 -
+ target/arm/cpu64.c                              |  38 ++-
- target/i386/cpu.h                 |   3 +
+ target/arm/helper.c                             | 285 -----------------------
- target/mips/fpu_helper.h          |  22 ++++++
+ target/arm/tcg/arith_helper.c                   | 296 ++++++++++++++++++++++++
- target/xtensa/cpu.h               |   6 ++
+ tests/qtest/arm-cpu-features.c                  |  15 +-
- hw/net/npcm_gmac.c                |   5 +-
+ tests/qtest/boot-serial-test.c                  |  23 +-
- hw/rtc/ds1338.c                   |   6 ++
+ target/arm/{op_addsub.h => tcg/op_addsub.c.inc} |   0
- hw/sensor/tmp105.c                |   7 +-
+ target/arm/tcg/meson.build                      |   1 +
- hw/timer/imx_gpt.c                |  18 ++---
+ tests/tcg/aarch64/Makefile.softmmu-target       |   3 +
- hw/watchdog/wdt_imx2.c            |   1 -
+ 17 files changed, 377 insertions(+), 316 deletions(-)
- linux-user/arm/nwfpe/fpa11.c      |  18 +++++
+ create mode 100644 target/arm/tcg/arith_helper.c
- target/alpha/cpu.c                |  11 +++
+ rename target/arm/{op_addsub.h => tcg/op_addsub.c.inc} (100%)
- target/arm/cpu.c                  |  25 ++++--
  target/arm/helper.c               |  73 ++++++++++++------
  target/arm/ptw.c                  |  10 +--
  target/arm/tcg/cpu64.c            |   1 +
  target/arm/tcg/hflags.c           |   4 -
  target/arm/tcg/op_helper.c        |  14 +++-
  target/arm/tcg/translate-a64.c    |   2 +-
  target/arm/tcg/translate.c        |  12 +--
  target/arm/tcg/vec_helper.c       |   9 ++-
  target/hppa/fpu_helper.c          |   6 ++
  target/i386/cpu.c                 |   4 +
  target/i386/tcg/fpu_helper.c      |  40 ++++++++++
  target/loongarch/tcg/fpu_helper.c |   1 +
  target/m68k/cpu.c                 |  16 ++++
  target/m68k/fpu_helper.c          |   1 +
  target/m68k/helper.c              |   4 +-
  target/microblaze/cpu.c           |  10 ++-
  target/mips/cpu.c                 |   2 +-
  target/mips/msa.c                 |  17 +++++
  target/openrisc/cpu.c             |   6 ++
  target/ppc/cpu_init.c             |   8 ++
  target/rx/cpu.c                   |   7 ++
  target/s390x/cpu.c                |   1 +
  target/sparc/cpu.c                |  10 ++-
  target/sparc/fop_helper.c         |  10 ++-
  target/xtensa/cpu.c               |   2 +-
  target/xtensa/fpu_helper.c        |  35 +++++----
  tests/fp/fp-bench.c               |   2 +
  tests/fp/fp-test-log2.c           |   1 +
  tests/fp/fp-test.c                |   2 +
  fpu/softfloat-specialize.c.inc    | 156 ++++++++++++++------------------------
  hw/net/trace-events               |   1 +
  hw/rtc/trace-events               |   4 +
  hw/sensor/trace-events            |   6 ++
  hw/timer/trace-events             |   6 ++
  55 files changed, 516 insertions(+), 239 deletions(-)
  create mode 100644 hw/sensor/trace.h
  create mode 100644 hw/sensor/trace-events

-[PULL 28/31] hw/watchdog/wdt_imx2: Remove redundant assignment
+[PULL 01/11] hw/arm_sysctl: fix extracting 31th bit of val
-From: Bernhard Beschow <shentey@gmail.com>
+From: Anastasia Belova <abelova@astralinux.ru>
-The same statement is executed unconditionally right before the if statement.
+1 << 31 is casted to uint64_t while bitwise and with val.
 So this value may become 0xffffffff80000000 but only
 31th "start" bit is required.
-Cc: Guenter Roeck <linux@roeck-us.net>
+This is not possible in practice because the MemoryRegionOps
-Reviewed-by: Guenter Roeck <linux@roeck-us.net>
+uses the default max access size of 4 bytes and so none
-Signed-off-by: Bernhard Beschow <shentey@gmail.com>
+of the upper bytes of val will be set, but the bitfield
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+extract API is clearer anyway.
-Message-id: 20241103143330.123596-4-shentey@gmail.com
 Use the bitfield extract() API instead.
 Found by Linux Verification Center (linuxtesting.org) with SVACE.
 Signed-off-by: Anastasia Belova <abelova@astralinux.ru>
 Message-id: 20241220125429.7552-1-abelova@astralinux.ru
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 [PMM: add clarification to commit message]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/watchdog/wdt_imx2.c | 1 -
+ hw/misc/arm_sysctl.c | 2 +-
- 1 file changed, 1 deletion(-)
+ 1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/hw/watchdog/wdt_imx2.c b/hw/watchdog/wdt_imx2.c
+diff --git a/hw/misc/arm_sysctl.c b/hw/misc/arm_sysctl.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/watchdog/wdt_imx2.c
+--- a/hw/misc/arm_sysctl.c
-+++ b/hw/watchdog/wdt_imx2.c
++++ b/hw/misc/arm_sysctl.c
-@@ -XXX,XX +XXX,XX @@ static void imx2_wdt_expired(void *opaque)
+@@ -XXX,XX +XXX,XX @@ static void arm_sysctl_write(void *opaque, hwaddr offset,
+          * as zero.
-     /* Perform watchdog action if watchdog is enabled */
+          */
-     if (s->wcr & IMX2_WDT_WCR_WDE) {
+         s->sys_cfgctrl = val & ~((3 << 18) | (1 << 31));
--        s->wrsr = IMX2_WDT_WRSR_TOUT;
+-        if (val & (1 << 31)) {
-         watchdog_perform_action();
++        if (extract64(val, 31, 1)) {
-     }
+             /* Start bit set -- actually do something */
- }
+             unsigned int dcc = extract32(s->sys_cfgctrl, 26, 4);
              unsigned int function = extract32(s->sys_cfgctrl, 20, 6);
 --
 2.34.1
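
As a sketch of the issue that patch 01/11 describes, the following C fragment
contrasts the two tests. It assumes that the int value of 1 << 31 is INT32_MIN
(wrapping shift semantics), and sketch_extract64() is a hypothetical stand-in
for QEMU's extract64(), not its actual implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for QEMU's extract64(); illustration only.
 * Valid for 0 < length <= 64 - start. */
static inline uint64_t sketch_extract64(uint64_t value, int start, int length)
{
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Mask obtained when the int result of 1 << 31 (assumed to be INT32_MIN)
 * is converted to uint64_t: the sign bit is extended into bits 32..63. */
static inline uint64_t sign_extended_bit31_mask(void)
{
    return (uint64_t)(int64_t)INT32_MIN;
}

/* Old test: true if bit 31 or any higher bit of val is set. */
static inline int start_bit_old(uint64_t val)
{
    return (val & sign_extended_bit31_mask()) != 0;
}

/* New test: looks at bit 31 only. */
static inline int start_bit_new(uint64_t val)
{
    return sketch_extract64(val, 31, 1) != 0;
}
```

For a val with bit 32 set and bit 31 clear, start_bit_old() reports the start
bit as set while start_bit_new() does not. The two agree for every val that
fits in 32 bits, which is consistent with the commit message's note that the
difference cannot occur in practice.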

-[PULL 27/31] hw/timer/imx_gpt: Convert DPRINTF to trace events
+[PULL 02/11] hw/misc: cast rpm to uint64_t
-From: Bernhard Beschow <shentey@gmail.com>
+From: Tigran Sogomonian <tsogomonian@astralinux.ru>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+The value of an arithmetic expression
-Signed-off-by: Bernhard Beschow <shentey@gmail.com>
+'rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION' is subject
-Message-id: 20241103143330.123596-3-shentey@gmail.com
+to overflow because its operands are not cast to
 a larger data type before performing arithmetic. Thus, need
 to cast rpm to uint64_t.
 Found by Linux Verification Center (linuxtesting.org) with SVACE.
 Signed-off-by: Tigran Sogomonian <tsogomonian@astralinux.ru>
 Reviewed-by: Patrick Leis <venture@google.com>
 Reviewed-by: Hao Wu <wuhaotsh@google.com>
 Message-id: 20241226130311.1349-1-tsogomonian@astralinux.ru
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/timer/imx_gpt.c    | 18 +++++-------------
+ hw/misc/npcm7xx_mft.c | 5 +++--
- hw/timer/trace-events |  6 ++++++
+ 1 file changed, 3 insertions(+), 2 deletions(-)
  2 files changed, 11 insertions(+), 13 deletions(-)
-diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c
+diff --git a/hw/misc/npcm7xx_mft.c b/hw/misc/npcm7xx_mft.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/timer/imx_gpt.c
+--- a/hw/misc/npcm7xx_mft.c
-+++ b/hw/timer/imx_gpt.c
++++ b/hw/misc/npcm7xx_mft.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static NPCM7xxMFTCaptureState npcm7xx_mft_compute_cnt(
- #include "migration/vmstate.h"
+          * RPM = revolution/min. The time for one revlution (in ns) is
- #include "qemu/module.h"
+          * MINUTE_TO_NANOSECOND / RPM.
- #include "qemu/log.h"
+          */
-+#include "trace.h"
+-        count = clock_ns_to_ticks(clock, (60 * NANOSECONDS_PER_SECOND) /
+-            (rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION));
- #ifndef DEBUG_IMX_GPT
++        count = clock_ns_to_ticks(clock,
- #define DEBUG_IMX_GPT 0
++            (uint64_t)(60 * NANOSECONDS_PER_SECOND) /
- #endif
++            ((uint64_t)rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION));
 -#define DPRINTF(fmt, args...) \
 -    do { \
 -        if (DEBUG_IMX_GPT) { \
 -            fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_GPT, \
 -                                             __func__, ##args); \
 -        } \
 -    } while (0)
 -
  static const char *imx_gpt_reg_name(uint32_t reg)
  {
      switch (reg) {
@@ -XXX,XX +XXX,XX @@ static void imx_gpt_set_freq(IMXGPTState *s)
      s->freq = imx_ccm_get_clock_frequency(s->ccm,
                                            s->clocks[clksrc]) / (1 + s->pr);
 -    DPRINTF("Setting clksrc %d to frequency %d\n", clksrc, s->freq);
 +    trace_imx_gpt_set_freq(clksrc, s->freq);
      if (s->freq) {
          ptimer_set_freq(s->timer, s->freq);
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_gpt_read(void *opaque, hwaddr offset, unsigned size)
          break;
      }
--    DPRINTF("(%s) = 0x%08x\n", imx_gpt_reg_name(offset >> 2), reg_value);
+     if (count > NPCM7XX_MFT_MAX_CNT) {
 +    trace_imx_gpt_read(imx_gpt_reg_name(offset >> 2), reg_value);
      return reg_value;
  }
@@ -XXX,XX +XXX,XX @@ static void imx_gpt_write(void *opaque, hwaddr offset, uint64_t value,
      IMXGPTState *s = IMX_GPT(opaque);
      uint32_t oldreg;
 -    DPRINTF("(%s, value = 0x%08x)\n", imx_gpt_reg_name(offset >> 2),
 -            (uint32_t)value);
 +    trace_imx_gpt_write(imx_gpt_reg_name(offset >> 2), (uint32_t)value);
      switch (offset >> 2) {
      case 0:
@@ -XXX,XX +XXX,XX @@ static void imx_gpt_timeout(void *opaque)
  {
      IMXGPTState *s = IMX_GPT(opaque);
 -    DPRINTF("\n");
 +    trace_imx_gpt_timeout();
      s->sr |= s->next_int;
      s->next_int = 0;
 diff --git a/hw/timer/trace-events b/hw/timer/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/timer/trace-events
 +++ b/hw/timer/trace-events
@@ -XXX,XX +XXX,XX @@ cmsdk_apb_dualtimer_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK A
  cmsdk_apb_dualtimer_write(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB dualtimer write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
  cmsdk_apb_dualtimer_reset(void) "CMSDK APB dualtimer: reset"
 +# imx_gpt.c
 +imx_gpt_set_freq(uint32_t clksrc, uint32_t freq) "Setting clksrc %u to %u Hz"
 +imx_gpt_read(const char *name, uint64_t value) "%s -> 0x%08" PRIx64
 +imx_gpt_write(const char *name, uint64_t value) "%s <- 0x%08" PRIx64
 +imx_gpt_timeout(void) ""
 +
  # npcm7xx_timer.c
  npcm7xx_timer_read(const char *id, uint64_t offset, uint64_t value) " %s offset: 0x%04" PRIx64 " value 0x%08" PRIx64
  npcm7xx_timer_write(const char *id, uint64_t offset, uint64_t value) "%s offset: 0x%04" PRIx64 " value 0x%08" PRIx64
 --
 2.34.1
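
The overflow that patch 02/11 guards against can be illustrated in isolation.
This is a minimal sketch that assumes rpm is a 32-bit unsigned value and that
int is 32 bits wide. SKETCH_PULSES_PER_REV is a hypothetical stand-in for
NPCM7XX_MFT_PULSE_PER_REVOLUTION, whose real value is not shown in the patch.

```c
#include <stdint.h>

/* Hypothetical value, for illustration only. */
#define SKETCH_PULSES_PER_REV 2u

/* Product computed in 32 bits: wraps modulo 2^32 before it is widened. */
static inline uint64_t pulses_narrow(uint32_t rpm)
{
    return rpm * SKETCH_PULSES_PER_REV;
}

/* Product computed in 64 bits: the cast widens rpm before the multiply. */
static inline uint64_t pulses_wide(uint32_t rpm)
{
    return (uint64_t)rpm * SKETCH_PULSES_PER_REV;
}
```

With rpm = 0x80000001 the narrow product wraps to 2, whereas the widened
product is 0x100000002. With rpm = 0x80000000 the narrow product wraps to 0,
which would make it unusable as a divisor.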

-[PULL 30/31] hw/net/npcm_gmac: Change error log to trace event
+[PULL 03/11] tests/qtest/boot-serial-test: Improve ASM comments of PL011 tests
-From: Nabih Estefan <nabihestefan@google.com>
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
-Convert the LOG_GUEST_ERROR for the "tx descriptor is owned
+Re-indent ASM comments adding the 'loop:' label.
 by software" to a trace message. This condition is normal
 when there is nothing to transmit, and we would
 otherwise spam the logs with it in that situation.
-Signed-off-by: Nabih Estefan <nabihestefan@google.com>
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Signed-off-by: Roque Arcudia Hernandez <roqueh@google.com>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Reviewed-by: Fabiano Rosas <farosas@suse.de>
 Message-id: 20241014184847.1594056-1-roqueh@google.com
 [PMM: tweaked commit message]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/net/npcm_gmac.c  | 5 ++---
+ tests/qtest/boot-serial-test.c | 18 +++++++++---------
- hw/net/trace-events | 1 +
+ 1 file changed, 9 insertions(+), 9 deletions(-)
  2 files changed, 3 insertions(+), 3 deletions(-)
-diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
+diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/npcm_gmac.c
+--- a/tests/qtest/boot-serial-test.c
-+++ b/hw/net/npcm_gmac.c
++++ b/tests/qtest/boot-serial-test.c
-@@ -XXX,XX +XXX,XX @@ static void gmac_try_send_next_packet(NPCMGMACState *gmac)
+@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
+ };
-         /* 1 = DMA Owned, 0 = Software Owned */
-         if (!(tx_desc.tdes0 & TX_DESC_TDES0_OWN)) {
+ static const uint8_t bios_raspi2[] = {
--            qemu_log_mask(LOG_GUEST_ERROR,
+-    0x08, 0x30, 0x9f, 0xe5,                 /* ldr   r3,[pc,#8]    Get base */
--                          "TX Descriptor @ 0x%x is owned by software\n",
+-    0x54, 0x20, 0xa0, 0xe3,                 /* mov     r2,#'T' */
--                          desc_addr);
+-    0x00, 0x20, 0xc3, 0xe5,                 /* strb    r2,[r3] */
-+            trace_npcm_gmac_tx_desc_owner(DEVICE(gmac)->canonical_path,
+-    0xfb, 0xff, 0xff, 0xea,                 /* b       loop */
-+                                          desc_addr);
+-    0x00, 0x10, 0x20, 0x3f,                 /* 0x3f201000 = UART0 base addr */
-             gmac->regs[R_NPCM_DMA_STATUS] |= NPCM_DMA_STATUS_TU;
++    0x08, 0x30, 0x9f, 0xe5,                 /* loop:  ldr     r3, [pc, #8]   Get &UART0 */
-             gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
++    0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
-                 NPCM_DMA_STATUS_TX_SUSPENDED_STATE);
++    0x00, 0x20, 0xc3, 0xe5,                 /*        strb    r2, [r3]       *TXDAT = 'T' */
-diff --git a/hw/net/trace-events b/hw/net/trace-events
++    0xfb, 0xff, 0xff, 0xea,                 /*        b       -12            (loop) */
-index XXXXXXX..XXXXXXX 100644
++    0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
---- a/hw/net/trace-events
+ };
-+++ b/hw/net/trace-events
-@@ -XXX,XX +XXX,XX @@ npcm_gmac_packet_received(const char* name, uint32_t len) "%s: Reception finishe
+ static const uint8_t kernel_aarch64[] = {
- npcm_gmac_packet_sent(const char* name, uint16_t len) "%s: TX packet sent!, length: 0x%04" PRIX16
+-    0x81, 0x0a, 0x80, 0x52,                 /* mov     w1, #0x54 */
- npcm_gmac_debug_desc_data(const char* name, void* addr, uint32_t des0, uint32_t des1, uint32_t des2, uint32_t des3)"%s: Address: %p Descriptor 0: 0x%04" PRIX32 " Descriptor 1: 0x%04" PRIX32 "Descriptor 2: 0x%04" PRIX32 " Descriptor 3: 0x%04" PRIX32
+-    0x02, 0x20, 0xa1, 0xd2,                 /* mov     x2, #0x9000000 */
- npcm_gmac_packet_tx_desc_data(const char* name, uint32_t tdes0, uint32_t tdes1) "%s: Tdes0: 0x%04" PRIX32 " Tdes1: 0x%04" PRIX32
+-    0x41, 0x00, 0x00, 0x39,                 /* strb    w1, [x2] */
-+npcm_gmac_tx_desc_owner(const char* name, uint32_t desc_addr) "%s: TX Descriptor @0x%04" PRIX32 " is owned by software"
+-    0xfd, 0xff, 0xff, 0x17,                 /* b       -12 (loop) */
++    0x81, 0x0a, 0x80, 0x52,                 /* loop:  mov    w1, #'T' */
- # npcm_pcs.c
++    0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
- npcm_pcs_reg_read(const char *name, uint16_t indirect_access_baes, uint64_t offset, uint16_t value) "%s: IND: 0x%02" PRIx16 " offset: 0x%04" PRIx64 " value: 0x%04" PRIx16
++    0x41, 0x00, 0x00, 0x39,                 /*        strb   w1, [x2]        *TXDAT = 'T' */
 +    0xfd, 0xff, 0xff, 0x17,                 /*        b      -12             (loop) */
  };
  static const uint8_t kernel_nrf51[] = {
 --
 2.34.1

-[PULL 29/31] hw/sensor/tmp105: Convert printf() to trace event, add tracing for read/write access
+[PULL 04/11] tests/qtest/boot-serial-test: Reduce for() loop in PL011 tests
-From: Bernhard Beschow <shentey@gmail.com>
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
-printf() unconditionally prints to the console which disturbs `-serial stdio`.
+Since registers are not modified, we don't need
-Fix that by converting into a trace event. While at it, add some tracing for
+to refill their values. Directly jump to the previous
-read and write access.
+store instruction to keep filling the TXDAT register.
-Fixes: 7e7c5e4c1ba5 "Nokia N800 machine support (ARM)."
+The equivalent C code remains:
-Signed-off-by: Bernhard Beschow <shentey@gmail.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+  while (true) {
-Message-id: 20241103143330.123596-5-shentey@gmail.com
+      *UART_DATA = 'T';
   }
 Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Fabiano Rosas <farosas@suse.de>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- meson.build            | 1 +
+ tests/qtest/boot-serial-test.c | 12 ++++++------
- hw/sensor/trace.h      | 1 +
+ 1 file changed, 6 insertions(+), 6 deletions(-)
  hw/sensor/tmp105.c     | 7 ++++++-
  hw/sensor/trace-events | 6 ++++++
  4 files changed, 14 insertions(+), 1 deletion(-)
  create mode 100644 hw/sensor/trace.h
  create mode 100644 hw/sensor/trace-events
-diff --git a/meson.build b/meson.build
+diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/meson.build
+--- a/tests/qtest/boot-serial-test.c
-+++ b/meson.build
++++ b/tests/qtest/boot-serial-test.c
-@@ -XXX,XX +XXX,XX @@ if have_system
+@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
-     'hw/s390x',
+ };
-     'hw/scsi',
-     'hw/sd',
+ static const uint8_t bios_raspi2[] = {
-+    'hw/sensor',
+-    0x08, 0x30, 0x9f, 0xe5,                 /* loop:  ldr     r3, [pc, #8]   Get &UART0 */
-     'hw/sh4',
++    0x08, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #8]   Get &UART0 */
-     'hw/sparc',
+     0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
-     'hw/sparc64',
+-    0x00, 0x20, 0xc3, 0xe5,                 /*        strb    r2, [r3]       *TXDAT = 'T' */
-diff --git a/hw/sensor/trace.h b/hw/sensor/trace.h
+-    0xfb, 0xff, 0xff, 0xea,                 /*        b       -12            (loop) */
-new file mode 100644
++    0x00, 0x20, 0xc3, 0xe5,                 /* loop:  strb    r2, [r3]       *TXDAT = 'T' */
-index XXXXXXX..XXXXXXX
++    0xff, 0xff, 0xff, 0xea,                 /*        b       -4             (loop) */
---- /dev/null
+     0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
-+++ b/hw/sensor/trace.h
+ };
-@@ -0,0 +1 @@
-+#include "trace/trace-hw_sensor.h"
+ static const uint8_t kernel_aarch64[] = {
-diff --git a/hw/sensor/tmp105.c b/hw/sensor/tmp105.c
+-    0x81, 0x0a, 0x80, 0x52,                 /* loop:  mov    w1, #'T' */
-index XXXXXXX..XXXXXXX 100644
++    0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
---- a/hw/sensor/tmp105.c
+     0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
-+++ b/hw/sensor/tmp105.c
+-    0x41, 0x00, 0x00, 0x39,                 /*        strb   w1, [x2]        *TXDAT = 'T' */
-@@ -XXX,XX +XXX,XX @@
+-    0xfd, 0xff, 0xff, 0x17,                 /*        b      -12             (loop) */
- #include "qapi/visitor.h"
++    0x41, 0x00, 0x00, 0x39,                 /* loop:  strb   w1, [x2]        *TXDAT = 'T' */
- #include "qemu/module.h"
++    0xff, 0xff, 0xff, 0x17,                 /*        b      -4              (loop) */
- #include "hw/registerfields.h"
+ };
-+#include "trace.h"
+ static const uint8_t kernel_nrf51[] = {
  FIELD(CONFIG, SHUTDOWN_MODE,        0, 1)
  FIELD(CONFIG, THERMOSTAT_MODE,      1, 1)
@@ -XXX,XX +XXX,XX @@ static void tmp105_read(TMP105State *s)
          s->buf[s->len++] = ((uint16_t) s->limit[1]) >> 0;
          break;
      }
 +
 +    trace_tmp105_read(s->i2c.address, s->pointer);
  }
  static void tmp105_write(TMP105State *s)
  {
 +    trace_tmp105_write(s->i2c.address, s->pointer);
 +
      switch (s->pointer & 3) {
      case TMP105_REG_TEMPERATURE:
          break;
      case TMP105_REG_CONFIG:
          if (FIELD_EX8(s->buf[0] & ~s->config, CONFIG, SHUTDOWN_MODE)) {
 -            printf("%s: TMP105 shutdown\n", __func__);
 +            trace_tmp105_write_shutdown(s->i2c.address);
          }
          s->config = FIELD_DP8(s->buf[0], CONFIG, ONE_SHOT, 0);
          s->faults = tmp105_faultq[FIELD_EX8(s->config, CONFIG, FAULT_QUEUE)];
 diff --git a/hw/sensor/trace-events b/hw/sensor/trace-events
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/hw/sensor/trace-events
@@ -XXX,XX +XXX,XX @@
 +# See docs/devel/tracing.rst for syntax documentation.
 +
 +# tmp105.c
 +tmp105_read(uint8_t dev, uint8_t addr) "device: 0x%02x, addr: 0x%02x"
 +tmp105_write(uint8_t dev, uint8_t addr) "device: 0x%02x, addr 0x%02x"
 +tmp105_write_shutdown(uint8_t dev) "device: 0x%02x"
 --
 2.34.1
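
The AArch64 branch comments in patch 04/11 ("b -12" becoming "b -4") can be
checked against the byte sequences in kernel_aarch64[]. The sketch below
decodes the signed offset of an A64 unconditional B instruction from its
little-endian bytes. The helper names are illustrative and are not part of the
test file; the 32-bit ARM encoding used by bios_raspi2[] follows different
rules and is not covered here.

```c
#include <stdint.h>

/* Assemble a 32-bit instruction word from four little-endian bytes. */
static inline uint32_t sketch_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Return 1 if insn is an A64 unconditional B (bits [31:26] == 0b000101). */
static inline int sketch_is_a64_b(uint32_t insn)
{
    return (insn >> 26) == 0x05;
}

/* Byte offset of the branch target relative to the B instruction itself. */
static inline int64_t sketch_a64_b_offset(uint32_t insn)
{
    int64_t imm26 = insn & 0x03ffffff;

    if (imm26 & 0x02000000) {       /* sign bit of the 26-bit field */
        imm26 -= 0x04000000;
    }
    return imm26 * 4;               /* the field counts 4-byte instructions */
}
```

Feeding in the bytes 0xfd, 0xff, 0xff, 0x17 gives an offset of -12 (back to
the first instruction), and 0xff, 0xff, 0xff, 0x17 gives -4 (back to the strb
immediately before the branch).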

-[PULL 11/31] target/sparc: Move cpu_put_fsr(env, 0) call to reset
+[PULL 05/11] tests/qtest/boot-serial-test: Reorder pair of instructions in PL011 test
-Currently we call cpu_put_fsr(0) in sparc_cpu_realizefn(), which
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
 initializes various fields in the CPU struct:
  * fsr_cexc_ftt
  * fcc[]
  * fsr_qne
  * fsr
 It also sets the rounding mode in env->fp_status.
-This is largely pointless, because when we later reset the CPU
+In the next commit we are going to use a different value
-this will zero out all the fields up until the "end_reset_fields"
+for the $w1 register, maintaining the same $x2 value. In
-label, which includes all of these (but not fp_status!)
+order to keep the next commit trivial to review, set $x2
 before $w1.
-Move the cpu_put_fsr(env, 0) call to reset, because that expresses
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-the logical requirement: we want to reset FSR to 0 on every reset.
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-This isn't a behaviour change because the fields are all zero anyway.
+Reviewed-by: Fabiano Rosas <farosas@suse.de>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-12-peter.maydell@linaro.org
 ---
- target/sparc/cpu.c | 2 +-
+ tests/qtest/boot-serial-test.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c
+diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/sparc/cpu.c
+--- a/tests/qtest/boot-serial-test.c
-+++ b/target/sparc/cpu.c
++++ b/tests/qtest/boot-serial-test.c
-@@ -XXX,XX +XXX,XX @@ static void sparc_cpu_reset_hold(Object *obj, ResetType type)
+@@ -XXX,XX +XXX,XX @@ static const uint8_t bios_raspi2[] = {
-     env->npc = env->pc + 4;
+ };
- #endif
-     env->cache_control = 0;
+ static const uint8_t kernel_aarch64[] = {
-+    cpu_put_fsr(env, 0);
+-    0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
- }
+     0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
++    0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
- #ifndef CONFIG_USER_ONLY
+     0x41, 0x00, 0x00, 0x39,                 /* loop:  strb   w1, [x2]        *TXDAT = 'T' */
-@@ -XXX,XX +XXX,XX @@ static void sparc_cpu_realizefn(DeviceState *dev, Error **errp)
+     0xff, 0xff, 0xff, 0x17,                 /*        b      -4              (loop) */
-     env->version |= env->def.maxtl << 8;
+ };
      env->version |= env->def.nwindows - 1;
  #endif
 -    cpu_put_fsr(env, 0);
      cpu_exec_realizefn(cs, &local_err);
      if (local_err != NULL) {
 --
 2.34.1

-[PULL 26/31] hw/rtc/ds1338: Trace send and receive operations
+[PULL 06/11] tests/qtest/boot-serial-test: Initialize PL011 Control register
-From: Bernhard Beschow <shentey@gmail.com>
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+The tests using the PL011 UART of the virt and raspi machines
-Signed-off-by: Bernhard Beschow <shentey@gmail.com>
+weren't properly enabling the UART and its transmitter previous
-Message-id: 20241103143330.123596-2-shentey@gmail.com
+to sending characters. Follow the PL011 manual initialization
 recommendation by setting the proper bits of the control register.
 Update the ASM code prefixing:
   *UART_CTRL = UART_ENABLE | TX_ENABLE;
 to:
   while (true) {
       *UART_DATA = 'T';
   }
 Note, since commit 51b61dd4d56 ("hw/char/pl011: Warn when using
 disabled transmitter") incomplete PL011 initialization can be
 logged using the '-d guest_errors' command line option.
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/rtc/ds1338.c     | 6 ++++++
+ tests/qtest/boot-serial-test.c | 7 ++++++-
- hw/rtc/trace-events | 4 ++++
+ 1 file changed, 6 insertions(+), 1 deletion(-)
  2 files changed, 10 insertions(+)
-diff --git a/hw/rtc/ds1338.c b/hw/rtc/ds1338.c
+diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/rtc/ds1338.c
+--- a/tests/qtest/boot-serial-test.c
-+++ b/hw/rtc/ds1338.c
++++ b/tests/qtest/boot-serial-test.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
- #include "qemu/module.h"
+ };
- #include "qom/object.h"
- #include "sysemu/rtc.h"
+ static const uint8_t bios_raspi2[] = {
-+#include "trace.h"
+-    0x08, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #8]   Get &UART0 */
++    0x10, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #16]  Get &UART0 */
- /* Size of NVRAM including both the user-accessible area and the
++    0x10, 0x20, 0x9f, 0xe5,                 /*        ldr     r2, [pc, #16]  Get &CR */
-  * secondary register area.
++    0xb0, 0x23, 0xc3, 0xe1,                 /*        strh    r2, [r3, #48]  Set CR */
-@@ -XXX,XX +XXX,XX @@ static uint8_t ds1338_recv(I2CSlave *i2c)
+     0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
-     uint8_t res;
+     0x00, 0x20, 0xc3, 0xe5,                 /* loop:  strb    r2, [r3]       *TXDAT = 'T' */
+     0xff, 0xff, 0xff, 0xea,                 /*        b       -4             (loop) */
-     res  = s->nvram[s->ptr];
+     0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
-+
++    0x01, 0x01, 0x00, 0x00,                 /* CR:    0x101 = UARTEN|TXE */
-+    trace_ds1338_recv(s->ptr, res);
+ };
-+
-     inc_regptr(s);
+ static const uint8_t kernel_aarch64[] = {
-     return res;
+     0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
- }
++    0x21, 0x20, 0x80, 0x52,                 /*        mov    w1, 0x101       CR = UARTEN|TXE */
-@@ -XXX,XX +XXX,XX @@ static int ds1338_send(I2CSlave *i2c, uint8_t data)
++    0x41, 0x60, 0x00, 0x79,                 /*        strh   w1, [x2, #48]   Set CR */
- {
+     0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
-     DS1338State *s = DS1338(i2c);
+     0x41, 0x00, 0x00, 0x39,                 /* loop:  strb   w1, [x2]        *TXDAT = 'T' */
+     0xff, 0xff, 0xff, 0x17,                 /*        b      -4              (loop) */
 +    trace_ds1338_send(s->ptr, data);
 +
      if (s->addr_byte) {
          s->ptr = data & (NVRAM_SIZE - 1);
          s->addr_byte = false;
 diff --git a/hw/rtc/trace-events b/hw/rtc/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/rtc/trace-events
 +++ b/hw/rtc/trace-events
@@ -XXX,XX +XXX,XX @@ pl031_set_alarm(uint32_t ticks) "alarm set for %u ticks"
  aspeed_rtc_read(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 0x%08" PRIx64
  aspeed_rtc_write(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 0x%08" PRIx64
 +# ds1338.c
 +ds1338_recv(uint32_t addr, uint8_t value) "[0x%" PRIx32 "] -> 0x%02" PRIx8
 +ds1338_send(uint32_t addr, uint8_t value) "[0x%" PRIx32 "] <- 0x%02" PRIx8
 +
  # m48t59.c
  m48txx_nvram_io_read(uint64_t addr, uint64_t value) "io read addr:0x%04" PRIx64 " value:0x%02" PRIx64
  m48txx_nvram_io_write(uint64_t addr, uint64_t value) "io write addr:0x%04" PRIx64 " value:0x%02" PRIx64
 --
 .34.1
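
Note on the bios_raspi2 change in the boot-serial-test patch above: the
literal-load offsets move from #8 to #16 because two instructions were
inserted ahead of the literal words. The following is a minimal sketch of
that arithmetic, not code from the patch. The function and enum names are
hypothetical; the only facts taken from the patch are the offsets, the #48
register offset and the comment "0x101 = UARTEN|TXE". The bit positions
(UARTEN at bit 0, TXE at bit 8) are the PL011 control register layout.

```c
#include <assert.h>
#include <stdint.h>

/*
 * In A32 state a read of PC yields the address of the current
 * instruction plus 8, so "ldr rX, [pc, #imm]" loads from
 * insn_addr + 8 + imm.  (Helper name is illustrative only.)
 */
static uint32_t a32_literal_addr(uint32_t insn_addr, uint32_t imm)
{
    return insn_addr + 8 + imm;
}

/* PL011 control register values used by the stubs (illustrative names). */
enum {
    PL011_CR_OFFSET = 0x30,     /* 48, as in "strh r2, [r3, #48]" */
    PL011_CR_UARTEN = 1 << 0,   /* UART enable */
    PL011_CR_TXE    = 1 << 8,   /* transmit enable */
};

/* Value the stubs store to CR before writing 'T' to the data register. */
static uint32_t pl011_cr_enable_tx(void)
{
    return PL011_CR_UARTEN | PL011_CR_TXE;
}
```

With six instructions in the new stub, the UART0 word sits at byte offset 24
and the CR word at 28: a32_literal_addr(0, 16) gives 24 for the first ldr and
a32_literal_addr(4, 16) gives 28 for the second. In the old four-instruction
stub the UART0 word was at offset 16, which is a32_literal_addr(0, 8).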

-[PULL 01/31] softfloat: Allow 2-operand NaN propagation rule to be set at runtime
+[PULL 07/11] target/arm: Move minor arithmetic helpers out of helper.c
-IEEE 754 does not define a fixed rule for which NaN to pick as the
+helper.c includes some small TCG helper functions used for mostly
-result if both operands of a 2-operand operation are NaNs.  As a
+arithmetic instructions.  These are TCG only and there's no need for
-result different architectures have ended up with different rules for
+them to be in the large and unwieldy helper.c.  Move them out to
-propagating NaNs.
+their own source file in the tcg/ subdirectory, together with the
 op_addsub.h multiply-included template header that they use.
-QEMU currently hardcodes the NaN propagation logic into the binary
+Since we are moving op_addsub.h, we take the opportunity to
-because pickNaN() has an ifdef ladder for different targets.  We want
+give it a name which matches our convention for files which
-to make the propagation rule instead be selectable at runtime,
+are not true header files but which are #included from other
-because:
+C files: op_addsub.c.inc.
  * this will let us have multiple targets in one QEMU binary
  * the Arm FEAT_AFP architectural feature includes letting
    the guest select a NaN propagation rule at runtime
  * x86 specifies different propagation rules for x87 FPU ops
    and for SSE ops, and specifying the rule in the float_status
    would let us emulate this, instead of wrongly using the
    x87 rules everywhere
-In this commit we add an enum for the propagation rule, the field in
+(Ironically, this means that helper.c no longer contains
-float_status, and the corresponding getters and setters.  We change
+any TCG helper function definitions at all.)
 pickNaN to honour this, but because all targets still leave this
 field at its default 0 value, the fallback logic will pick the rule
 type with the old ifdef ladder.
 It's valid not to set a propagation rule if default_nan_mode is
 enabled, because in that case there's no need to pick a NaN; all the
 callers of pickNaN() catch this case and skip calling it.  So we can
 already assert that we don't get into the "no rule defined" codepath
 for our four targets which always set default_nan_mode: Hexagon,
 RiscV, SH4 and Tricore, and for the one target which does not have FP
 at all: avr.  These targets will not need to be updated to call
 set_float_2nan_prop_rule().
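
As an illustration of the idea described above, here is a minimal sketch of
a 2-operand NaN picker whose rule is read from a status struct at runtime
rather than chosen by an ifdef ladder. All type and function names here are
simplified stand-ins, not the QEMU definitions. The x87 rule is omitted, and
where the real code falls back to the ifdef ladder when no rule is set, this
sketch simply returns -1.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified operand classification (stand-in for FloatClass). */
typedef enum { CLS_NORMAL, CLS_QNAN, CLS_SNAN } Cls;

/* Stand-in for the propagation rule enum; 0 means "not set". */
typedef enum {
    PROP_NONE = 0,
    PROP_S_AB,  /* prefer SNaN over QNaN, then A over B */
    PROP_S_BA,  /* prefer SNaN over QNaN, then B over A */
    PROP_AB,    /* prefer A over B regardless of SNaN vs QNaN */
    PROP_BA,    /* prefer B over A regardless of SNaN vs QNaN */
} PropRule;

/* Stand-in for the rule field in float_status. */
typedef struct {
    PropRule rule;
} Status;

static void set_rule(PropRule rule, Status *s)
{
    s->rule = rule;
}

/* Return 0 to propagate operand A, 1 for operand B, -1 if no rule is set. */
static int pick_nan(Cls a, Cls b, const Status *s)
{
    switch (s->rule) {
    case PROP_S_AB:
        if (a == CLS_SNAN) {
            return 0;
        } else if (b == CLS_SNAN) {
            return 1;
        }
        return a == CLS_QNAN ? 0 : 1;
    case PROP_S_BA:
        if (b == CLS_SNAN) {
            return 1;
        } else if (a == CLS_SNAN) {
            return 0;
        }
        return b == CLS_QNAN ? 1 : 0;
    case PROP_AB:
        return a != CLS_NORMAL ? 0 : 1;
    case PROP_BA:
        return b != CLS_NORMAL ? 1 : 0;
    default:
        return -1;
    }
}
```

The same pair of operands can then give different answers depending on the
rule stored in the status: with A quiet and B signaling, PROP_S_AB picks B
while PROP_AB picks A.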
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-2-peter.maydell@linaro.org
+Message-id: 20250110131211.2546314-1-peter.maydell@linaro.org
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 ---
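
For readers skimming the moved code: the signed saturating helpers work
entirely on unsigned types, so that wraparound is well defined, and detect
overflow from the sign bits. The sketch below restates the add16_sat logic
from the patch under a different (hypothetical) name, with the overflow
condition spelled out in comments; it is an illustration, not a replacement
for the helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 16-bit signed saturating addition on unsigned storage.
 * Overflow happened iff the operands have the same sign
 * and the wrapped result has a different sign from them.
 */
static uint16_t add16_sat_sketch(uint16_t a, uint16_t b)
{
    uint16_t res = (uint16_t)(a + b);          /* modulo 2^16 sum */
    int sign_changed = (res ^ a) & 0x8000;     /* result sign != sign of a */
    int same_sign_in = !((a ^ b) & 0x8000);    /* a and b share a sign */

    if (sign_changed && same_sign_in) {
        /* Negative operands clamp to INT16_MIN, positive to INT16_MAX. */
        res = (a & 0x8000) ? 0x8000 : 0x7fff;
    }
    return res;
}
```

For example, 0x7fff + 1 stays at 0x7fff, 0x8000 + 0xffff (that is, -32768 +
-1) stays at 0x8000, and 0x7fff + 0x8000 yields 0xffff (-1) with no clamping
because the operands have different signs.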
- include/fpu/softfloat-helpers.h |  11 ++
+ target/arm/helper.c                           | 285 -----------------
- include/fpu/softfloat-types.h   |  42 ++++++
+ target/arm/tcg/arith_helper.c                 | 296 ++++++++++++++++++
- fpu/softfloat-specialize.c.inc  | 229 ++++++++++++++++++--------------
+ .../arm/{op_addsub.h => tcg/op_addsub.c.inc}  |   0
- 3 files changed, 185 insertions(+), 97 deletions(-)
+ target/arm/tcg/meson.build                    |   1 +
  4 files changed, 297 insertions(+), 285 deletions(-)
  create mode 100644 target/arm/tcg/arith_helper.c
  rename target/arm/{op_addsub.h => tcg/op_addsub.c.inc} (100%)
-diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/fpu/softfloat-helpers.h
+--- a/target/arm/helper.c
-+++ b/include/fpu/softfloat-helpers.h
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static inline void set_floatx80_rounding_precision(FloatX80RoundPrec val,
+@@ -XXX,XX +XXX,XX @@
-     status->floatx80_rounding_precision = val;
+ #include "qemu/main-loop.h"
  #include "qemu/timer.h"
  #include "qemu/bitops.h"
 -#include "qemu/crc32c.h"
  #include "qemu/qemu-print.h"
  #include "exec/exec-all.h"
  #include "exec/translation-block.h"
 -#include <zlib.h> /* for crc32 */
  #include "hw/irq.h"
  #include "system/cpu-timers.h"
  #include "system/kvm.h"
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
      };
  }
-+static inline void set_float_2nan_prop_rule(Float2NaNPropRule rule,
+-/*
-+                                            float_status *status)
+- * Note that signed overflow is undefined in C.  The following routines are
-+{
+- * careful to use unsigned types where modulo arithmetic is required.
-+    status->float_2nan_prop_rule = rule;
+- * Failure to do so _will_ break on newer gcc.
-+}
+- */
-+
+-
- static inline void set_flush_to_zero(bool val, float_status *status)
+-/* Signed saturating arithmetic.  */
- {
+-
-     status->flush_to_zero = val;
+-/* Perform 16-bit signed saturating addition.  */
-@@ -XXX,XX +XXX,XX @@ get_floatx80_rounding_precision(float_status *status)
+-static inline uint16_t add16_sat(uint16_t a, uint16_t b)
-     return status->floatx80_rounding_precision;
+-{
- }
+-    uint16_t res;
+-
-+static inline Float2NaNPropRule get_float_2nan_prop_rule(float_status *status)
+-    res = a + b;
-+{
+-    if (((res ^ a) & 0x8000) && !((a ^ b) & 0x8000)) {
-+    return status->float_2nan_prop_rule;
+-        if (a & 0x8000) {
-+}
+-            res = 0x8000;
-+
+-        } else {
- static inline bool get_flush_to_zero(float_status *status)
+-            res = 0x7fff;
- {
+-        }
-     return status->flush_to_zero;
+-    }
-diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
+-    return res;
-index XXXXXXX..XXXXXXX 100644
+-}
---- a/include/fpu/softfloat-types.h
+-
-+++ b/include/fpu/softfloat-types.h
+-/* Perform 8-bit signed saturating addition.  */
-@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
+-static inline uint8_t add8_sat(uint8_t a, uint8_t b)
-     floatx80_precision_s,
+-{
- } FloatX80RoundPrec;
+-    uint8_t res;
+-
-+/*
+-    res = a + b;
-+ * 2-input NaN propagation rule. Individual architectures have
+-    if (((res ^ a) & 0x80) && !((a ^ b) & 0x80)) {
-+ * different rules for which input NaN is propagated to the output
+-        if (a & 0x80) {
-+ * when there is more than one NaN on the input.
+-            res = 0x80;
-+ *
+-        } else {
-+ * If default_nan_mode is enabled then it is valid not to set a
+-            res = 0x7f;
-+ * NaN propagation rule, because the softfloat code guarantees
+-        }
-+ * not to try to pick a NaN to propagate in default NaN mode.
+-    }
-+ *
+-    return res;
-+ * For transition, currently the 'none' rule will cause us to
+-}
-+ * fall back to picking the propagation rule based on the existing
+-
-+ * ifdef ladder. When all targets are converted it will be an error
+-/* Perform 16-bit signed saturating subtraction.  */
-+ * not to set the rule in float_status unless in default_nan_mode,
+-static inline uint16_t sub16_sat(uint16_t a, uint16_t b)
-+ * and we will assert if we need to handle an input NaN and no
+-{
-+ * rule was selected.
+-    uint16_t res;
-+ */
+-
-+typedef enum __attribute__((__packed__)) {
+-    res = a - b;
-+    /* No propagation rule specified */
+-    if (((res ^ a) & 0x8000) && ((a ^ b) & 0x8000)) {
-+    float_2nan_prop_none = 0,
+-        if (a & 0x8000) {
-+    /* Prefer SNaN over QNaN, then operand A over B */
+-            res = 0x8000;
-+    float_2nan_prop_s_ab,
+-        } else {
-+    /* Prefer SNaN over QNaN, then operand B over A */
+-            res = 0x7fff;
-+    float_2nan_prop_s_ba,
+-        }
-+    /* Prefer A over B regardless of SNaN vs QNaN */
+-    }
-+    float_2nan_prop_ab,
+-    return res;
-+    /* Prefer B over A regardless of SNaN vs QNaN */
+-}
-+    float_2nan_prop_ba,
+-
-+    /*
+-/* Perform 8-bit signed saturating subtraction.  */
-+     * This implements x87 NaN propagation rules:
+-static inline uint8_t sub8_sat(uint8_t a, uint8_t b)
-+     * SNaN + QNaN => return the QNaN
+-{
-+     * two SNaNs => return the one with the larger significand, silenced
+-    uint8_t res;
-+     * two QNaNs => return the one with the larger significand
+-
-+     * SNaN and a non-NaN => return the SNaN, silenced
+-    res = a - b;
-+     * QNaN and a non-NaN => return the QNaN
+-    if (((res ^ a) & 0x80) && ((a ^ b) & 0x80)) {
-+     *
+-        if (a & 0x80) {
-+     * If we get down to comparing significands and they are the same,
+-            res = 0x80;
-+     * return the NaN with the positive sign bit (if any).
+-        } else {
-+     */
+-            res = 0x7f;
-+    float_2nan_prop_x87,
+-        }
-+} Float2NaNPropRule;
+-    }
-+
+-    return res;
- /*
+-}
-  * Floating Point Status. Individual architectures may maintain
+-
-  * several versions of float_status for different functions. The
+-#define ADD16(a, b, n) RESULT(add16_sat(a, b), n, 16);
-@@ -XXX,XX +XXX,XX @@ typedef struct float_status {
+-#define SUB16(a, b, n) RESULT(sub16_sat(a, b), n, 16);
-     uint16_t float_exception_flags;
+-#define ADD8(a, b, n)  RESULT(add8_sat(a, b), n, 8);
-     FloatRoundMode float_rounding_mode;
+-#define SUB8(a, b, n)  RESULT(sub8_sat(a, b), n, 8);
-     FloatX80RoundPrec floatx80_rounding_precision;
+-#define PFX q
-+    Float2NaNPropRule float_2nan_prop_rule;
+-
-     bool tininess_before_rounding;
+-#include "op_addsub.h"
-     /* should denormalised results go to zero and set the inexact flag? */
+-
-     bool flush_to_zero;
+-/* Unsigned saturating arithmetic.  */
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
+-static inline uint16_t add16_usat(uint16_t a, uint16_t b)
-index XXXXXXX..XXXXXXX 100644
+-{
---- a/fpu/softfloat-specialize.c.inc
+-    uint16_t res;
-+++ b/fpu/softfloat-specialize.c.inc
+-    res = a + b;
-@@ -XXX,XX +XXX,XX @@ bool float32_is_signaling_nan(float32 a_, float_status *status)
+-    if (res < a) {
- static int pickNaN(FloatClass a_cls, FloatClass b_cls,
+-        res = 0xffff;
-                    bool aIsLargerSignificand, float_status *status)
+-    }
- {
+-    return res;
--#if defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_HPPA) || \
+-}
--    defined(TARGET_LOONGARCH64) || defined(TARGET_S390X)
+-
--    /* ARM mandated NaN propagation rules (see FPProcessNaNs()), take
+-static inline uint16_t sub16_usat(uint16_t a, uint16_t b)
--     * the first of:
+-{
--     *  1. A if it is signaling
+-    if (a > b) {
--     *  2. B if it is signaling
+-        return a - b;
 -     *  3. A (quiet)
 -     *  4. B (quiet)
 -     * A signaling NaN is always quietened before returning it.
 -     */
 -    /* According to MIPS specifications, if one of the two operands is
 -     * a sNaN, a new qNaN has to be generated. This is done in
 -     * floatXX_silence_nan(). For qNaN inputs the specifications
 -     * says: "When possible, this QNaN result is one of the operand QNaN
 -     * values." In practice it seems that most implementations choose
 -     * the first operand if both operands are qNaN. In short this gives
 -     * the following rules:
 -     *  1. A if it is signaling
 -     *  2. B if it is signaling
 -     *  3. A (quiet)
 -     *  4. B (quiet)
 -     * A signaling NaN is always silenced before returning it.
 -     */
 -    if (is_snan(a_cls)) {
 -        return 0;
 -    } else if (is_snan(b_cls)) {
 -        return 1;
 -    } else if (is_qnan(a_cls)) {
 -        return 0;
 -    } else {
 -        return 1;
 -    }
 -#elif defined(TARGET_PPC) || defined(TARGET_M68K)
 -    /* PowerPC propagation rules:
 -     *  1. A if it sNaN or qNaN
 -     *  2. B if it sNaN or qNaN
 -     * A signaling NaN is always silenced before returning it.
 -     */
 -    /* M68000 FAMILY PROGRAMMER'S REFERENCE MANUAL
 -     * 3.4 FLOATING-POINT INSTRUCTION DETAILS
 -     * If either operand, but not both operands, of an operation is a
 -     * nonsignaling NaN, then that NaN is returned as the result. If both
 -     * operands are nonsignaling NaNs, then the destination operand
 -     * nonsignaling NaN is returned as the result.
 -     * If either operand to an operation is a signaling NaN (SNaN), then the
 -     * SNaN bit is set in the FPSR EXC byte. If the SNaN exception enable bit
 -     * is set in the FPCR ENABLE byte, then the exception is taken and the
 -     * destination is not modified. If the SNaN exception enable bit is not
 -     * set, setting the SNaN bit in the operand to a one converts the SNaN to
 -     * a nonsignaling NaN. The operation then continues as described in the
 -     * preceding paragraph for nonsignaling NaNs.
 -     */
 -    if (is_nan(a_cls)) {
 -        return 0;
 -    } else {
 -        return 1;
 -    }
 -#elif defined(TARGET_SPARC)
 -    /* Prefer SNaN over QNaN, order B then A. */
 -    if (is_snan(b_cls)) {
 -        return 1;
 -    } else if (is_snan(a_cls)) {
 -        return 0;
 -    } else if (is_qnan(b_cls)) {
 -        return 1;
 -    } else {
 -        return 0;
 -    }
--#elif defined(TARGET_XTENSA)
+-}
-+    Float2NaNPropRule rule = status->float_2nan_prop_rule;
+-
-+
+-static inline uint8_t add8_usat(uint8_t a, uint8_t b)
-     /*
+-{
--     * Xtensa has two NaN propagation modes.
+-    uint8_t res;
--     * Which one is active is controlled by float_status::use_first_nan.
+-    res = a + b;
-+     * We guarantee not to require the target to tell us how to
+-    if (res < a) {
-+     * pick a NaN if we're always returning the default NaN.
+-        res = 0xff;
-      */
+-    }
--    if (status->use_first_nan) {
+-    return res;
-+    assert(!status->default_nan_mode);
+-}
-+
+-
-+    if (rule == float_2nan_prop_none) {
+-static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
-+        /* target didn't set the rule: fall back to old ifdef choices */
+-{
-+#if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
+-    if (a > b) {
-+    || defined(TARGET_RISCV) || defined(TARGET_SH4) \
+-        return a - b;
-+    || defined(TARGET_TRICORE)
+-    } else {
-+        g_assert_not_reached();
+-        return 0;
-+#elif defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_HPPA) || \
+-    }
-+    defined(TARGET_LOONGARCH64) || defined(TARGET_S390X)
+-}
-+        /*
+-
-+         * ARM mandated NaN propagation rules (see FPProcessNaNs()), take
+-#define ADD16(a, b, n) RESULT(add16_usat(a, b), n, 16);
-+         * the first of:
+-#define SUB16(a, b, n) RESULT(sub16_usat(a, b), n, 16);
-+         *  1. A if it is signaling
+-#define ADD8(a, b, n)  RESULT(add8_usat(a, b), n, 8);
-+         *  2. B if it is signaling
+-#define SUB8(a, b, n)  RESULT(sub8_usat(a, b), n, 8);
-+         *  3. A (quiet)
+-#define PFX uq
-+         *  4. B (quiet)
+-
-+         * A signaling NaN is always quietened before returning it.
+-#include "op_addsub.h"
-+         */
+-
-+        /*
+-/* Signed modulo arithmetic.  */
-+         * According to MIPS specifications, if one of the two operands is
+-#define SARITH16(a, b, n, op) do { \
-+         * a sNaN, a new qNaN has to be generated. This is done in
+-    int32_t sum; \
-+         * floatXX_silence_nan(). For qNaN inputs the specifications
+-    sum = (int32_t)(int16_t)(a) op (int32_t)(int16_t)(b); \
-+         * says: "When possible, this QNaN result is one of the operand QNaN
+-    RESULT(sum, n, 16); \
-+         * values." In practice it seems that most implementations choose
+-    if (sum >= 0) \
-+         * the first operand if both operands are qNaN. In short this gives
+-        ge |= 3 << (n * 2); \
-+         * the following rules:
+-    } while (0)
-+         *  1. A if it is signaling
+-
-+         *  2. B if it is signaling
+-#define SARITH8(a, b, n, op) do { \
-+         *  3. A (quiet)
+-    int32_t sum; \
-+         *  4. B (quiet)
+-    sum = (int32_t)(int8_t)(a) op (int32_t)(int8_t)(b); \
-+         * A signaling NaN is always silenced before returning it.
+-    RESULT(sum, n, 8); \
-+         */
+-    if (sum >= 0) \
-+        rule = float_2nan_prop_s_ab;
+-        ge |= 1 << n; \
-+#elif defined(TARGET_PPC) || defined(TARGET_M68K)
+-    } while (0)
-+        /*
+-
-+         * PowerPC propagation rules:
+-
-+         *  1. A if it sNaN or qNaN
+-#define ADD16(a, b, n) SARITH16(a, b, n, +)
-+         *  2. B if it sNaN or qNaN
+-#define SUB16(a, b, n) SARITH16(a, b, n, -)
-+         * A signaling NaN is always silenced before returning it.
+-#define ADD8(a, b, n)  SARITH8(a, b, n, +)
-+         */
+-#define SUB8(a, b, n)  SARITH8(a, b, n, -)
-+        /*
+-#define PFX s
-+         * M68000 FAMILY PROGRAMMER'S REFERENCE MANUAL
+-#define ARITH_GE
-+         * 3.4 FLOATING-POINT INSTRUCTION DETAILS
+-
-+         * If either operand, but not both operands, of an operation is a
+-#include "op_addsub.h"
-+         * nonsignaling NaN, then that NaN is returned as the result. If both
+-
-+         * operands are nonsignaling NaNs, then the destination operand
+-/* Unsigned modulo arithmetic.  */
-+         * nonsignaling NaN is returned as the result.
+-#define ADD16(a, b, n) do { \
-+         * If either operand to an operation is a signaling NaN (SNaN), then the
+-    uint32_t sum; \
-+         * SNaN bit is set in the FPSR EXC byte. If the SNaN exception enable bit
+-    sum = (uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b); \
-+         * is set in the FPCR ENABLE byte, then the exception is taken and the
+-    RESULT(sum, n, 16); \
-+         * destination is not modified. If the SNaN exception enable bit is not
+-    if ((sum >> 16) == 1) \
-+         * set, setting the SNaN bit in the operand to a one converts the SNaN to
+-        ge |= 3 << (n * 2); \
-+         * a nonsignaling NaN. The operation then continues as described in the
+-    } while (0)
-+         * preceding paragraph for nonsignaling NaNs.
+-
-+         */
+-#define ADD8(a, b, n) do { \
-+        rule = float_2nan_prop_ab;
+-    uint32_t sum; \
-+#elif defined(TARGET_SPARC)
+-    sum = (uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b); \
-+        /* Prefer SNaN over QNaN, order B then A. */
+-    RESULT(sum, n, 8); \
-+        rule = float_2nan_prop_s_ba;
+-    if ((sum >> 8) == 1) \
-+#elif defined(TARGET_XTENSA)
+-        ge |= 1 << n; \
-+        /*
+-    } while (0)
-+         * Xtensa has two NaN propagation modes.
+-
-+         * Which one is active is controlled by float_status::use_first_nan.
+-#define SUB16(a, b, n) do { \
-+         */
+-    uint32_t sum; \
-+        if (status->use_first_nan) {
+-    sum = (uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b); \
-+            rule = float_2nan_prop_ab;
+-    RESULT(sum, n, 16); \
 -    if ((sum >> 16) == 0) \
 -        ge |= 3 << (n * 2); \
 -    } while (0)
 -
 -#define SUB8(a, b, n) do { \
 -    uint32_t sum; \
 -    sum = (uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b); \
 -    RESULT(sum, n, 8); \
 -    if ((sum >> 8) == 0) \
 -        ge |= 1 << n; \
 -    } while (0)
 -
 -#define PFX u
 -#define ARITH_GE
 -
 -#include "op_addsub.h"
 -
 -/* Halved signed arithmetic.  */
 -#define ADD16(a, b, n) \
 -  RESULT(((int32_t)(int16_t)(a) + (int32_t)(int16_t)(b)) >> 1, n, 16)
 -#define SUB16(a, b, n) \
 -  RESULT(((int32_t)(int16_t)(a) - (int32_t)(int16_t)(b)) >> 1, n, 16)
 -#define ADD8(a, b, n) \
 -  RESULT(((int32_t)(int8_t)(a) + (int32_t)(int8_t)(b)) >> 1, n, 8)
 -#define SUB8(a, b, n) \
 -  RESULT(((int32_t)(int8_t)(a) - (int32_t)(int8_t)(b)) >> 1, n, 8)
 -#define PFX sh
 -
 -#include "op_addsub.h"
 -
 -/* Halved unsigned arithmetic.  */
 -#define ADD16(a, b, n) \
 -  RESULT(((uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b)) >> 1, n, 16)
 -#define SUB16(a, b, n) \
 -  RESULT(((uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b)) >> 1, n, 16)
 -#define ADD8(a, b, n) \
 -  RESULT(((uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b)) >> 1, n, 8)
 -#define SUB8(a, b, n) \
 -  RESULT(((uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b)) >> 1, n, 8)
 -#define PFX uh
 -
 -#include "op_addsub.h"
 -
 -static inline uint8_t do_usad(uint8_t a, uint8_t b)
 -{
 -    if (a > b) {
 -        return a - b;
 -    } else {
 -        return b - a;
 -    }
 -}
 -
 -/* Unsigned sum of absolute byte differences.  */
 -uint32_t HELPER(usad8)(uint32_t a, uint32_t b)
 -{
 -    uint32_t sum;
 -    sum = do_usad(a, b);
 -    sum += do_usad(a >> 8, b >> 8);
 -    sum += do_usad(a >> 16, b >> 16);
 -    sum += do_usad(a >> 24, b >> 24);
 -    return sum;
 -}
 -
 -/* For ARMv6 SEL instruction.  */
 -uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
 -{
 -    uint32_t mask;
 -
 -    mask = 0;
 -    if (flags & 1) {
 -        mask |= 0xff;
 -    }
 -    if (flags & 2) {
 -        mask |= 0xff00;
 -    }
 -    if (flags & 4) {
 -        mask |= 0xff0000;
 -    }
 -    if (flags & 8) {
 -        mask |= 0xff000000;
 -    }
 -    return (a & mask) | (b & ~mask);
 -}
 -
 -/*
 - * CRC helpers.
 - * The upper bytes of val (above the number specified by 'bytes') must have
 - * been zeroed out by the caller.
 - */
 -uint32_t HELPER(crc32)(uint32_t acc, uint32_t val, uint32_t bytes)
 -{
 -    uint8_t buf[4];
 -
 -    stl_le_p(buf, val);
 -
 -    /* zlib crc32 converts the accumulator and output to one's complement.  */
 -    return crc32(acc ^ 0xffffffff, buf, bytes) ^ 0xffffffff;
 -}
 -
 -uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
 -{
 -    uint8_t buf[4];
 -
 -    stl_le_p(buf, val);
 -
 -    /* Linux crc32c converts the output to one's complement.  */
 -    return crc32c(acc, buf, bytes) ^ 0xffffffff;
 -}
  /*
   * Return the exception level to which FP-disabled exceptions should
 diff --git a/target/arm/tcg/arith_helper.c b/target/arm/tcg/arith_helper.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/tcg/arith_helper.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * ARM generic helpers for various arithmetical operations.
 + *
 + * This code is licensed under the GNU GPL v2 or later.
 + *
 + * SPDX-License-Identifier: GPL-2.0-or-later
 + */
 +#include "qemu/osdep.h"
 +#include "cpu.h"
 +#include "exec/helper-proto.h"
 +#include "qemu/crc32c.h"
 +#include <zlib.h> /* for crc32 */
 +
 +/*
 + * Note that signed overflow is undefined in C.  The following routines are
 + * careful to use unsigned types where modulo arithmetic is required.
 + * Failure to do so _will_ break on newer gcc.
 + */
 +
 +/* Signed saturating arithmetic.  */
 +
 +/* Perform 16-bit signed saturating addition.  */
 +static inline uint16_t add16_sat(uint16_t a, uint16_t b)
 +{
 +    uint16_t res;
 +
 +    res = a + b;
 +    if (((res ^ a) & 0x8000) && !((a ^ b) & 0x8000)) {
 +        if (a & 0x8000) {
 +            res = 0x8000;
 +        } else {
-+            rule = float_2nan_prop_ba;
++            res = 0x7fff;
 +        }
-+#else
++    }
-+        rule = float_2nan_prop_x87;
++    return res;
-+#endif
++}
-+    }
++
-+
++/* Perform 8-bit signed saturating addition.  */
-+    switch (rule) {
++static inline uint8_t add8_sat(uint8_t a, uint8_t b)
-+    case float_2nan_prop_s_ab:
++{
-+        if (is_snan(a_cls)) {
++    uint8_t res;
-+            return 0;
++
-+        } else if (is_snan(b_cls)) {
++    res = a + b;
-+            return 1;
++    if (((res ^ a) & 0x80) && !((a ^ b) & 0x80)) {
-+        } else if (is_qnan(a_cls)) {
++        if (a & 0x80) {
-+            return 0;
++            res = 0x80;
 +        } else {
-+            return 1;
++            res = 0x7f;
 +        }
-+        break;
++    }
-+    case float_2nan_prop_s_ba:
++    return res;
-+        if (is_snan(b_cls)) {
++}
-+            return 1;
++
-+        } else if (is_snan(a_cls)) {
++/* Perform 16-bit signed saturating subtraction.  */
-+            return 0;
++static inline uint16_t sub16_sat(uint16_t a, uint16_t b)
-+        } else if (is_qnan(b_cls)) {
++{
-+            return 1;
++    uint16_t res;
 +
 +    res = a - b;
 +    if (((res ^ a) & 0x8000) && ((a ^ b) & 0x8000)) {
 +        if (a & 0x8000) {
 +            res = 0x8000;
 +        } else {
-+            return 0;
++            res = 0x7fff;
 +        }
-+        break;
++    }
-+    case float_2nan_prop_ab:
++    return res;
-         if (is_nan(a_cls)) {
++}
-             return 0;
++
-         } else {
++/* Perform 8-bit signed saturating subtraction.  */
-             return 1;
++static inline uint8_t sub8_sat(uint8_t a, uint8_t b)
-         }
++{
--    } else {
++    uint8_t res;
-+        break;
++
-+    case float_2nan_prop_ba:
++    res = a - b;
-         if (is_nan(b_cls)) {
++    if (((res ^ a) & 0x80) && ((a ^ b) & 0x80)) {
-             return 1;
++        if (a & 0x80) {
-         } else {
++            res = 0x80;
-             return 0;
++        } else {
-         }
++            res = 0x7f;
--    }
++        }
--#else
++    }
--    /* This implements x87 NaN propagation rules:
++    return res;
--     * SNaN + QNaN => return the QNaN
++}
--     * two SNaNs => return the one with the larger significand, silenced
++
--     * two QNaNs => return the one with the larger significand
++#define ADD16(a, b, n) RESULT(add16_sat(a, b), n, 16);
--     * SNaN and a non-NaN => return the SNaN, silenced
++#define SUB16(a, b, n) RESULT(sub16_sat(a, b), n, 16);
--     * QNaN and a non-NaN => return the QNaN
++#define ADD8(a, b, n)  RESULT(add8_sat(a, b), n, 8);
--     *
++#define SUB8(a, b, n)  RESULT(sub8_sat(a, b), n, 8);
--     * If we get down to comparing significands and they are the same,
++#define PFX q
--     * return the NaN with the positive sign bit (if any).
++
--     */
++#include "op_addsub.c.inc"
--    if (is_snan(a_cls)) {
++
--        if (is_snan(b_cls)) {
++/* Unsigned saturating arithmetic.  */
--            return aIsLargerSignificand ? 0 : 1;
++static inline uint16_t add16_usat(uint16_t a, uint16_t b)
--        }
++{
--        return is_qnan(b_cls) ? 1 : 0;
++    uint16_t res;
--    } else if (is_qnan(a_cls)) {
++    res = a + b;
--        if (is_snan(b_cls) || !is_qnan(b_cls)) {
++    if (res < a) {
--            return 0;
++        res = 0xffff;
-+        break;
++    }
-+    case float_2nan_prop_x87:
++    return res;
-+        /*
++}
-+         * This implements x87 NaN propagation rules:
++
-+         * SNaN + QNaN => return the QNaN
++static inline uint16_t sub16_usat(uint16_t a, uint16_t b)
-+         * two SNaNs => return the one with the larger significand, silenced
++{
-+         * two QNaNs => return the one with the larger significand
++    if (a > b) {
-+         * SNaN and a non-NaN => return the SNaN, silenced
++        return a - b;
-+         * QNaN and a non-NaN => return the QNaN
++    } else {
-+         *
++        return 0;
-+         * If we get down to comparing significands and they are the same,
++    }
-+         * return the NaN with the positive sign bit (if any).
++}
-+         */
++
-+        if (is_snan(a_cls)) {
++static inline uint8_t add8_usat(uint8_t a, uint8_t b)
-+            if (is_snan(b_cls)) {
++{
-+                return aIsLargerSignificand ? 0 : 1;
++    uint8_t res;
-+            }
++    res = a + b;
-+            return is_qnan(b_cls) ? 1 : 0;
++    if (res < a) {
-+        } else if (is_qnan(a_cls)) {
++        res = 0xff;
-+            if (is_snan(b_cls) || !is_qnan(b_cls)) {
++    }
-+                return 0;
++    return res;
-+            } else {
++}
-+                return aIsLargerSignificand ? 0 : 1;
++
-+            }
++static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
-         } else {
++{
--            return aIsLargerSignificand ? 0 : 1;
++    if (a > b) {
-+            return 1;
++        return a - b;
-         }
++    } else {
--    } else {
++        return 0;
--        return 1;
++    }
-+    default:
++}
-+        g_assert_not_reached();
++
-     }
++#define ADD16(a, b, n) RESULT(add16_usat(a, b), n, 16);
--#endif
++#define SUB16(a, b, n) RESULT(sub16_usat(a, b), n, 16);
- }
++#define ADD8(a, b, n)  RESULT(add8_usat(a, b), n, 8);
++#define SUB8(a, b, n)  RESULT(sub8_usat(a, b), n, 8);
- /*----------------------------------------------------------------------------
++#define PFX uq
 +
 +#include "op_addsub.c.inc"
 +
 +/* Signed modulo arithmetic.  */
 +#define SARITH16(a, b, n, op) do { \
 +    int32_t sum; \
 +    sum = (int32_t)(int16_t)(a) op (int32_t)(int16_t)(b); \
 +    RESULT(sum, n, 16); \
 +    if (sum >= 0) \
 +        ge |= 3 << (n * 2); \
 +    } while (0)
 +
 +#define SARITH8(a, b, n, op) do { \
 +    int32_t sum; \
 +    sum = (int32_t)(int8_t)(a) op (int32_t)(int8_t)(b); \
 +    RESULT(sum, n, 8); \
 +    if (sum >= 0) \
 +        ge |= 1 << n; \
 +    } while (0)
 +
 +
 +#define ADD16(a, b, n) SARITH16(a, b, n, +)
 +#define SUB16(a, b, n) SARITH16(a, b, n, -)
 +#define ADD8(a, b, n)  SARITH8(a, b, n, +)
 +#define SUB8(a, b, n)  SARITH8(a, b, n, -)
 +#define PFX s
 +#define ARITH_GE
 +
 +#include "op_addsub.c.inc"
 +
 +/* Unsigned modulo arithmetic.  */
 +#define ADD16(a, b, n) do { \
 +    uint32_t sum; \
 +    sum = (uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b); \
 +    RESULT(sum, n, 16); \
 +    if ((sum >> 16) == 1) \
 +        ge |= 3 << (n * 2); \
 +    } while (0)
 +
 +#define ADD8(a, b, n) do { \
 +    uint32_t sum; \
 +    sum = (uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b); \
 +    RESULT(sum, n, 8); \
 +    if ((sum >> 8) == 1) \
 +        ge |= 1 << n; \
 +    } while (0)
 +
 +#define SUB16(a, b, n) do { \
 +    uint32_t sum; \
 +    sum = (uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b); \
 +    RESULT(sum, n, 16); \
 +    if ((sum >> 16) == 0) \
 +        ge |= 3 << (n * 2); \
 +    } while (0)
 +
 +#define SUB8(a, b, n) do { \
 +    uint32_t sum; \
 +    sum = (uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b); \
 +    RESULT(sum, n, 8); \
 +    if ((sum >> 8) == 0) \
 +        ge |= 1 << n; \
 +    } while (0)
 +
 +#define PFX u
 +#define ARITH_GE
 +
 +#include "op_addsub.c.inc"
 +
 +/* Halved signed arithmetic.  */
 +#define ADD16(a, b, n) \
 +  RESULT(((int32_t)(int16_t)(a) + (int32_t)(int16_t)(b)) >> 1, n, 16)
 +#define SUB16(a, b, n) \
 +  RESULT(((int32_t)(int16_t)(a) - (int32_t)(int16_t)(b)) >> 1, n, 16)
 +#define ADD8(a, b, n) \
 +  RESULT(((int32_t)(int8_t)(a) + (int32_t)(int8_t)(b)) >> 1, n, 8)
 +#define SUB8(a, b, n) \
 +  RESULT(((int32_t)(int8_t)(a) - (int32_t)(int8_t)(b)) >> 1, n, 8)
 +#define PFX sh
 +
 +#include "op_addsub.c.inc"
 +
 +/* Halved unsigned arithmetic.  */
 +#define ADD16(a, b, n) \
 +  RESULT(((uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b)) >> 1, n, 16)
 +#define SUB16(a, b, n) \
 +  RESULT(((uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b)) >> 1, n, 16)
 +#define ADD8(a, b, n) \
 +  RESULT(((uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b)) >> 1, n, 8)
 +#define SUB8(a, b, n) \
 +  RESULT(((uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b)) >> 1, n, 8)
 +#define PFX uh
 +
 +#include "op_addsub.c.inc"
 +
 +static inline uint8_t do_usad(uint8_t a, uint8_t b)
 +{
 +    if (a > b) {
 +        return a - b;
 +    } else {
 +        return b - a;
 +    }
 +}
 +
 +/* Unsigned sum of absolute byte differences.  */
 +uint32_t HELPER(usad8)(uint32_t a, uint32_t b)
 +{
 +    uint32_t sum;
 +    sum = do_usad(a, b);
 +    sum += do_usad(a >> 8, b >> 8);
 +    sum += do_usad(a >> 16, b >> 16);
 +    sum += do_usad(a >> 24, b >> 24);
 +    return sum;
 +}
 +
 +/* For ARMv6 SEL instruction.  */
 +uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
 +{
 +    uint32_t mask;
 +
 +    mask = 0;
 +    if (flags & 1) {
 +        mask |= 0xff;
 +    }
 +    if (flags & 2) {
 +        mask |= 0xff00;
 +    }
 +    if (flags & 4) {
 +        mask |= 0xff0000;
 +    }
 +    if (flags & 8) {
 +        mask |= 0xff000000;
 +    }
 +    return (a & mask) | (b & ~mask);
 +}
 +
 +/*
 + * CRC helpers.
 + * The upper bytes of val (above the number specified by 'bytes') must have
 + * been zeroed out by the caller.
 + */
 +uint32_t HELPER(crc32)(uint32_t acc, uint32_t val, uint32_t bytes)
 +{
 +    uint8_t buf[4];
 +
 +    stl_le_p(buf, val);
 +
 +    /* zlib crc32 converts the accumulator and output to one's complement.  */
 +    return crc32(acc ^ 0xffffffff, buf, bytes) ^ 0xffffffff;
 +}
 +
 +uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
 +{
 +    uint8_t buf[4];
 +
 +    stl_le_p(buf, val);
 +
 +    /* Linux crc32c converts the output to one's complement.  */
 +    return crc32c(acc, buf, bytes) ^ 0xffffffff;
 +}
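
A note on the two one's-complement comments in the CRC helpers: the XORs appear to cancel the inversions that the library routines apply, so each helper should reduce to a plain reflected CRC update with no pre- or post-inversion. A minimal standalone sketch of that raw update follows. It assumes the usual reflected polynomials (0xEDB88320 for CRC32, 0x82F63B78 for CRC32C); crc_update_raw() and the two macros are hypothetical names for illustration, not QEMU or zlib functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed reflected polynomials; hypothetical macro names. */
#define CRC32_POLY_REFLECTED  0xEDB88320u
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/*
 * Bitwise reflected CRC update with no inversion of the accumulator
 * on entry or exit. Hypothetical helper, not part of QEMU.
 */
static uint32_t crc_update_raw(uint32_t acc, const uint8_t *buf, size_t len,
                               uint32_t poly)
{
    for (size_t i = 0; i < len; i++) {
        acc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* All-ones when the low bit is set, zero otherwise. */
            uint32_t lsb_mask = 0u - (acc & 1u);
            acc = (acc >> 1) ^ (poly & lsb_mask);
        }
    }
    return acc;
}
```

Wrapping this raw update with an all-ones initial value and a final inversion gives the conventional CRC-32 and CRC-32C checksums, which is a convenient way to sanity-check the sketch against the standard "123456789" check values.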
 diff --git a/target/arm/op_addsub.h b/target/arm/tcg/op_addsub.c.inc
 similarity index 100%
 rename from target/arm/op_addsub.h
 rename to target/arm/tcg/op_addsub.c.inc
 diff --git a/target/arm/tcg/meson.build b/target/arm/tcg/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/meson.build
 +++ b/target/arm/tcg/meson.build
@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
    'tlb_helper.c',
    'vec_helper.c',
    'tlb-insns.c',
 +  'arith_helper.c',
  ))
  arm_ss.add(when: 'TARGET_AARCH64', if_true: files(
 --
 .34.1
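
For readers unfamiliar with the GE flags: the unsigned modulo ADD8 macro above sets one GE bit per byte lane when that lane's addition carries out, and sel_flags then uses those bits to pick bytes from one operand or the other. The sketch below models that pairing in standalone C. The names uadd8_model() and sel_model() are hypothetical; the real helpers are generated from op_addsub.c.inc, which is not shown in this diff, so treat this as an approximation of the behaviour rather than the actual implementation.

```c
#include <stdint.h>

/*
 * Hypothetical model of a UADD8-style operation: adds four byte lanes
 * modulo 256 and records a GE bit for each lane that carried out.
 */
static uint32_t uadd8_model(uint32_t a, uint32_t b, uint32_t *ge_out)
{
    uint32_t res = 0;
    uint32_t ge = 0;

    for (int n = 0; n < 4; n++) {
        uint32_t sum = ((a >> (n * 8)) & 0xff) + ((b >> (n * 8)) & 0xff);

        res |= (sum & 0xff) << (n * 8);
        if ((sum >> 8) == 1) {
            ge |= 1u << n;
        }
    }
    *ge_out = ge;
    return res;
}

/*
 * Hypothetical model of SEL: for each lane, take the byte from a if
 * the matching GE bit is set, otherwise take it from b.
 */
static uint32_t sel_model(uint32_t flags, uint32_t a, uint32_t b)
{
    uint32_t mask = 0;

    for (int n = 0; n < 4; n++) {
        if (flags & (1u << n)) {
            mask |= 0xffu << (n * 8);
        }
    }
    return (a & mask) | (b & ~mask);
}
```

Adding 0x01ff80fe and 0x01017f03 carries out of lanes 0 and 2, so the model reports GE = 0b0101 and a following SEL takes lanes 0 and 2 from its first operand.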

-[PULL 02/31] tests/fp: Explicitly set 2-NaN propagation rule
+Deleted patch
-Explicitly set a 2-NaN propagation rule in the softfloat tests.  In
-meson.build we put -DTARGET_ARM in fpcflags, and so we should select
-here the Arm propagation rule of float_2nan_prop_s_ab.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-3-peter.maydell@linaro.org
----
- tests/fp/fp-bench.c     | 2 ++
- tests/fp/fp-test-log2.c | 1 +
- tests/fp/fp-test.c      | 2 ++
-files changed, 5 insertions(+)
-diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/fp/fp-bench.c
-+++ b/tests/fp/fp-bench.c
-@@ -XXX,XX +XXX,XX @@ static void run_bench(void)
- {
-     bench_func_t f;
-+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, &soft_status);
-+
-     f = bench_funcs[operation][precision];
-     g_assert(f);
-     f();
-diff --git a/tests/fp/fp-test-log2.c b/tests/fp/fp-test-log2.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/fp/fp-test-log2.c
-+++ b/tests/fp/fp-test-log2.c
-@@ -XXX,XX +XXX,XX @@ int main(int ac, char **av)
-     float_status qsf = {0};
-     int i;
-+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, &qsf);
-     set_float_rounding_mode(float_round_nearest_even, &qsf);
-     test.d = 0.0;
-diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
-index XXXXXXX..XXXXXXX 100644
---- a/tests/fp/fp-test.c
-+++ b/tests/fp/fp-test.c
-@@ -XXX,XX +XXX,XX @@ void run_test(void)
- {
-     unsigned int i;
-+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, &qsf);
-+
-     genCases_setLevel(test_level);
-     verCases_maxErrorCount = n_max_errors;
---
-.34.1

-[PULL 03/31] target/arm: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the 2-NaN propagation rule explicitly in the float_status words
-we use.  We wrap this plus the pre-existing setting of the
-tininess-before-rounding flag in a new function
-arm_set_default_fp_behaviours() to avoid repetition, since we have a
-lot of float_status words at this point.
-The situation with FPA11 emulation in linux-user is a little odd, and
-arguably "correct" behaviour there would be to exactly match a real
-Linux kernel's FPA11 emulation.  However FPA11 emulation is
-essentially dead at this point and so it seems better to continue
-with QEMU's current behaviour and leave a comment describing the
-situation.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-4-peter.maydell@linaro.org
----
- linux-user/arm/nwfpe/fpa11.c   | 18 ++++++++++++++++++
- target/arm/cpu.c               | 25 +++++++++++++++++--------
- fpu/softfloat-specialize.c.inc | 13 ++-----------
-files changed, 37 insertions(+), 19 deletions(-)
-diff --git a/linux-user/arm/nwfpe/fpa11.c b/linux-user/arm/nwfpe/fpa11.c
-index XXXXXXX..XXXXXXX 100644
---- a/linux-user/arm/nwfpe/fpa11.c
-+++ b/linux-user/arm/nwfpe/fpa11.c
-@@ -XXX,XX +XXX,XX @@ void resetFPA11(void)
- #ifdef MAINTAIN_FPCR
-   fpa11->fpcr = MASK_RESET;
- #endif
-+
-+  /*
-+   * Real FPA11 hardware does not handle NaNs, but always takes an
-+   * exception for them to be software-emulated (ARM7500FE datasheet
-+   * section 10.4). There is no documented architectural requirement
-+   * for NaN propagation rules and it will depend on how the OS
-+   * level software emulation opted to do it. We here use prop_s_ab
-+   * which matches the later VFP hardware choice and how QEMU's
-+   * fpa11 emulation has worked in the past. The real Linux kernel
-+   * does something slightly different: arch/arm/nwfpe/softfloat-specialize
-+   * propagateFloat64NaN() has the curious behaviour that it prefers
-+   * the QNaN over the SNaN, but if both are QNaN it picks A and
-+   * if both are SNaN it picks B. In theory we could add this as
-+   * a NaN propagation rule, but in practice FPA11 emulation is so
-+   * close to totally dead that it's not worth trying to match it at
-+   * this late date.
-+   */
-+  set_float_2nan_prop_rule(float_2nan_prop_s_ab, &fpa11->fp_status);
- }
- void SetRoundingMode(const unsigned int opcode)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
-+++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
-     QLIST_INSERT_HEAD(&cpu->el_change_hooks, entry, node);
- }
-+/*
-+ * Set the float_status behaviour to match the Arm defaults:
-+ *  * tininess-before-rounding
-+ *  * 2-input NaN propagation prefers SNaN over QNaN, and then
-+ *    operand A over operand B (see FPProcessNaNs() pseudocode)
-+ */
-+static void arm_set_default_fp_behaviours(float_status *s)
-+{
-+    set_float_detect_tininess(float_tininess_before_rounding, s);
-+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
-+}
-+
- static void cp_reg_reset(gpointer key, gpointer value, gpointer opaque)
- {
-     /* Reset a single ARMCPRegInfo register */
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
-     set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
-     set_default_nan_mode(1, &env->vfp.standard_fp_status);
-     set_default_nan_mode(1, &env->vfp.standard_fp_status_f16);
--    set_float_detect_tininess(float_tininess_before_rounding,
--                              &env->vfp.fp_status);
--    set_float_detect_tininess(float_tininess_before_rounding,
--                              &env->vfp.standard_fp_status);
--    set_float_detect_tininess(float_tininess_before_rounding,
--                              &env->vfp.fp_status_f16);
--    set_float_detect_tininess(float_tininess_before_rounding,
--                              &env->vfp.standard_fp_status_f16);
-+    arm_set_default_fp_behaviours(&env->vfp.fp_status);
-+    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
-+    arm_set_default_fp_behaviours(&env->vfp.fp_status_f16);
-+    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
-+
- #ifndef CONFIG_USER_ONLY
-     if (kvm_enabled()) {
-         kvm_arm_reset_vcpu(cpu);
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-         /* target didn't set the rule: fall back to old ifdef choices */
- #if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
-     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
--    || defined(TARGET_TRICORE)
-+    || defined(TARGET_TRICORE) || defined(TARGET_ARM)
-         g_assert_not_reached();
--#elif defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_HPPA) || \
-+#elif defined(TARGET_MIPS) || defined(TARGET_HPPA) || \
-     defined(TARGET_LOONGARCH64) || defined(TARGET_S390X)
--        /*
--         * ARM mandated NaN propagation rules (see FPProcessNaNs()), take
--         * the first of:
--         *  1. A if it is signaling
--         *  2. B if it is signaling
--         *  3. A (quiet)
--         *  4. B (quiet)
--         * A signaling NaN is always quietened before returning it.
--         */
-         /*
-          * According to MIPS specifications, if one of the two operands is
-          * a sNaN, a new qNaN has to be generated. This is done in
---
-.34.1

-[PULL 04/31] target/mips: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the 2-NaN propagation rule explicitly in the float_status words
-we use.
-For active_fpu.fp_status, we do this in a new fp_reset() function
-which mirrors the existing msa_reset() function in doing "first call
-restore to set the fp status parts that depend on CPU state, then set
-the fp status parts that are constant".
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Message-id: 20241025141254.2141506-5-peter.maydell@linaro.org
----
- target/mips/fpu_helper.h       | 22 ++++++++++++++++++++++
- target/mips/cpu.c              |  2 +-
- target/mips/msa.c              | 17 +++++++++++++++++
- fpu/softfloat-specialize.c.inc | 18 ++----------------
-files changed, 42 insertions(+), 17 deletions(-)
-diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/mips/fpu_helper.h
-+++ b/target/mips/fpu_helper.h
-@@ -XXX,XX +XXX,XX @@ static inline void restore_fp_status(CPUMIPSState *env)
-     restore_snan_bit_mode(env);
- }
-+static inline void fp_reset(CPUMIPSState *env)
-+{
-+    restore_fp_status(env);
-+
-+    /*
-+     * According to MIPS specifications, if one of the two operands is
-+     * a sNaN, a new qNaN has to be generated. This is done in
-+     * floatXX_silence_nan(). For qNaN inputs the specification
-+     * says: "When possible, this QNaN result is one of the operand QNaN
-+     * values." In practice it seems that most implementations choose
-+     * the first operand if both operands are qNaN. In short this gives
-+     * the following rules:
-+     *  1. A if it is signaling
-+     *  2. B if it is signaling
-+     *  3. A (quiet)
-+     *  4. B (quiet)
-+     * A signaling NaN is always silenced before returning it.
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_s_ab,
-+                             &env->active_fpu.fp_status);
-+}
-+
- /* MSA */
- enum CPUMIPSMSADataFormat {
-diff --git a/target/mips/cpu.c b/target/mips/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/mips/cpu.c
-+++ b/target/mips/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void mips_cpu_reset_hold(Object *obj, ResetType type)
-     }
-     msa_reset(env);
-+    fp_reset(env);
-     compute_hflags(env);
--    restore_fp_status(env);
-     restore_pamask(env);
-     cs->exception_index = EXCP_NONE;
-diff --git a/target/mips/msa.c b/target/mips/msa.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/mips/msa.c
-+++ b/target/mips/msa.c
-@@ -XXX,XX +XXX,XX @@ void msa_reset(CPUMIPSState *env)
-     set_float_detect_tininess(float_tininess_after_rounding,
-                               &env->active_tc.msa_fp_status);
-+    /*
-+     * According to MIPS specifications, if one of the two operands is
-+     * a sNaN, a new qNaN has to be generated. This is done in
-+     * floatXX_silence_nan(). For qNaN inputs the specification
-+     * says: "When possible, this QNaN result is one of the operand QNaN
-+     * values." In practice it seems that most implementations choose
-+     * the first operand if both operands are qNaN. In short this gives
-+     * the following rules:
-+     *  1. A if it is signaling
-+     *  2. B if it is signaling
-+     *  3. A (quiet)
-+     *  4. B (quiet)
-+     * A signaling NaN is always silenced before returning it.
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_s_ab,
-+                             &env->active_tc.msa_fp_status);
-+
-     /* clear float_status exception flags */
-     set_float_exception_flags(0, &env->active_tc.msa_fp_status);
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-         /* target didn't set the rule: fall back to old ifdef choices */
- #if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
-     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
--    || defined(TARGET_TRICORE) || defined(TARGET_ARM)
-+    || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS)
-         g_assert_not_reached();
--#elif defined(TARGET_MIPS) || defined(TARGET_HPPA) || \
-+#elif defined(TARGET_HPPA) || \
-     defined(TARGET_LOONGARCH64) || defined(TARGET_S390X)
--        /*
--         * According to MIPS specifications, if one of the two operands is
--         * a sNaN, a new qNaN has to be generated. This is done in
--         * floatXX_silence_nan(). For qNaN inputs the specification
--         * says: "When possible, this QNaN result is one of the operand QNaN
--         * values." In practice it seems that most implementations choose
--         * the first operand if both operands are qNaN. In short this gives
--         * the following rules:
--         *  1. A if it is signaling
--         *  2. B if it is signaling
--         *  3. A (quiet)
--         *  4. B (quiet)
--         * A signaling NaN is always silenced before returning it.
--         */
-         rule = float_2nan_prop_s_ab;
- #elif defined(TARGET_PPC) || defined(TARGET_M68K)
-         /*
---
-.34.1

-[PULL 05/31] target/loongarch: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the 2-NaN propagation rule explicitly in the float_status word we
-use.
-(There are a couple of places in fpu_helper.c where we create a
-dummy float_status word with "float_status *s = { };", but these
-are only used for calling float*_is_quiet_nan() so it doesn't
-matter that we don't set a 2-NaN propagation rule there.)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-6-peter.maydell@linaro.org
----
- target/loongarch/tcg/fpu_helper.c | 1 +
- fpu/softfloat-specialize.c.inc    | 6 +++---
-files changed, 4 insertions(+), 3 deletions(-)
-diff --git a/target/loongarch/tcg/fpu_helper.c b/target/loongarch/tcg/fpu_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/loongarch/tcg/fpu_helper.c
-+++ b/target/loongarch/tcg/fpu_helper.c
-@@ -XXX,XX +XXX,XX @@ void restore_fp_status(CPULoongArchState *env)
-     set_float_rounding_mode(ieee_rm[(env->fcsr0 >> FCSR0_RM) & 0x3],
-                             &env->fp_status);
-     set_flush_to_zero(0, &env->fp_status);
-+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, &env->fp_status);
- }
- int ieee_ex_to_loongarch(int xcpt)
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-         /* target didn't set the rule: fall back to old ifdef choices */
- #if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
-     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
--    || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS)
-+    || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
-+    || defined(TARGET_LOONGARCH64)
-         g_assert_not_reached();
--#elif defined(TARGET_HPPA) || \
--    defined(TARGET_LOONGARCH64) || defined(TARGET_S390X)
-+#elif defined(TARGET_HPPA) || defined(TARGET_S390X)
-         rule = float_2nan_prop_s_ab;
- #elif defined(TARGET_PPC) || defined(TARGET_M68K)
-         /*
---
-.34.1

-[PULL 06/31] target/hppa: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the 2-NaN propagation rule explicitly in env->fp_status.
-Really we only need to do this at CPU reset (after reset has zeroed
-out most of the CPU state struct, which typically includes fp_status
-fields).  However target/hppa does not currently implement CPU reset
-at all, so leave a TODO comment to note that this could be moved if
-we ever do implement reset.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-7-peter.maydell@linaro.org
----
- target/hppa/fpu_helper.c       | 6 ++++++
- fpu/softfloat-specialize.c.inc | 4 ++--
-files changed, 8 insertions(+), 2 deletions(-)
-diff --git a/target/hppa/fpu_helper.c b/target/hppa/fpu_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/hppa/fpu_helper.c
-+++ b/target/hppa/fpu_helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(loaded_fr0)(CPUHPPAState *env)
-     d = FIELD_EX32(shadow, FPSR, D);
-     set_flush_to_zero(d, &env->fp_status);
-     set_flush_inputs_to_zero(d, &env->fp_status);
-+
-+    /*
-+     * TODO: we only need to do this at CPU reset, but currently
-+     * HPPA does not implement a CPU reset method at all...
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, &env->fp_status);
- }
- void cpu_hppa_loaded_fr0(CPUHPPAState *env)
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
- #if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
-     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
-     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
--    || defined(TARGET_LOONGARCH64)
-+    || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA)
-         g_assert_not_reached();
--#elif defined(TARGET_HPPA) || defined(TARGET_S390X)
-+#elif defined(TARGET_S390X)
-         rule = float_2nan_prop_s_ab;
- #elif defined(TARGET_PPC) || defined(TARGET_M68K)
-         /*
---
-.34.1

-[PULL 07/31] target/s390x: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the 2-NaN propagation rule explicitly in env->fpu_status.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-8-peter.maydell@linaro.org
----
- target/s390x/cpu.c             | 1 +
- fpu/softfloat-specialize.c.inc | 5 ++---
-files changed, 3 insertions(+), 3 deletions(-)
-diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/s390x/cpu.c
-+++ b/target/s390x/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void s390_cpu_reset_hold(Object *obj, ResetType type)
-         /* tininess for underflow is detected before rounding */
-         set_float_detect_tininess(float_tininess_before_rounding,
-                                   &env->fpu_status);
-+        set_float_2nan_prop_rule(float_2nan_prop_s_ab, &env->fpu_status);
-        /* fall through */
-     case RESET_TYPE_S390_CPU_NORMAL:
-         env->psw.mask &= ~PSW_MASK_RI;
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
- #if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
-     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
-     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
--    || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA)
-+    || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
-+    || defined(TARGET_S390X)
-         g_assert_not_reached();
--#elif defined(TARGET_S390X)
--        rule = float_2nan_prop_s_ab;
- #elif defined(TARGET_PPC) || defined(TARGET_M68K)
-         /*
-          * PowerPC propagation rules:
---
-.34.1

-[PULL 08/31] target/ppc: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the 2-NaN propagation rule explicitly in env->fp_status
-and env->vec_status.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-9-peter.maydell@linaro.org
----
- target/ppc/cpu_init.c          |  8 ++++++++
- fpu/softfloat-specialize.c.inc | 10 ++--------
-files changed, 10 insertions(+), 8 deletions(-)
-diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/ppc/cpu_init.c
-+++ b/target/ppc/cpu_init.c
-@@ -XXX,XX +XXX,XX @@ static void ppc_cpu_reset_hold(Object *obj, ResetType type)
-     /* tininess for underflow is detected before rounding */
-     set_float_detect_tininess(float_tininess_before_rounding,
-                               &env->fp_status);
-+    /*
-+     * PowerPC propagation rules:
-+     *  1. A if it is sNaN or qNaN
-+     *  2. B if it is sNaN or qNaN
-+     * A signaling NaN is always silenced before returning it.
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_ab, &env->fp_status);
-+    set_float_2nan_prop_rule(float_2nan_prop_ab, &env->vec_status);
-     for (i = 0; i < ARRAY_SIZE(env->spr_cb); i++) {
-         ppc_spr_t *spr = &env->spr_cb[i];
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
-     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
-     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
--    || defined(TARGET_S390X)
-+    || defined(TARGET_S390X) || defined(TARGET_PPC)
-         g_assert_not_reached();
--#elif defined(TARGET_PPC) || defined(TARGET_M68K)
--        /*
--         * PowerPC propagation rules:
--         *  1. A if it is sNaN or qNaN
--         *  2. B if it is sNaN or qNaN
--         * A signaling NaN is always silenced before returning it.
--         */
-+#elif defined(TARGET_M68K)
-         /*
-          * M68000 FAMILY PROGRAMMER'S REFERENCE MANUAL
-          * 3.4 FLOATING-POINT INSTRUCTION DETAILS
---
-.34.1

-[PULL 09/31] target/m68k: Explicitly set 2-NaN propagation rule
+Deleted patch
-Explicitly set the 2-NaN propagation rule on env->fp_status
-and on the temporary fp_status that we use in frem (since
-we pass that to a division operation function).
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
----
- target/m68k/cpu.c              | 16 ++++++++++++++++
- target/m68k/fpu_helper.c       |  1 +
- fpu/softfloat-specialize.c.inc | 19 +------------------
-files changed, 18 insertions(+), 18 deletions(-)
-diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/m68k/cpu.c
-+++ b/target/m68k/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void m68k_cpu_reset_hold(Object *obj, ResetType type)
-         env->fregs[i].d = nan;
-     }
-     cpu_m68k_set_fpcr(env, 0);
-+    /*
-+     * M68000 FAMILY PROGRAMMER'S REFERENCE MANUAL
-+     * 3.4 FLOATING-POINT INSTRUCTION DETAILS
-+     * If either operand, but not both operands, of an operation is a
-+     * nonsignaling NaN, then that NaN is returned as the result. If both
-+     * operands are nonsignaling NaNs, then the destination operand
-+     * nonsignaling NaN is returned as the result.
-+     * If either operand to an operation is a signaling NaN (SNaN), then the
-+     * SNaN bit is set in the FPSR EXC byte. If the SNaN exception enable bit
-+     * is set in the FPCR ENABLE byte, then the exception is taken and the
-+     * destination is not modified. If the SNaN exception enable bit is not
-+     * set, setting the SNaN bit in the operand to a one converts the SNaN to
-+     * a nonsignaling NaN. The operation then continues as described in the
-+     * preceding paragraph for nonsignaling NaNs.
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_ab, &env->fp_status);
-     env->fpsr = 0;
-     /* TODO: We should set PC from the interrupt vector.  */
-diff --git a/target/m68k/fpu_helper.c b/target/m68k/fpu_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/m68k/fpu_helper.c
-+++ b/target/m68k/fpu_helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(frem)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
-         int sign;
-         /* Calculate quotient directly using round to nearest mode */
-+        set_float_2nan_prop_rule(float_2nan_prop_ab, &fp_status);
-         set_float_rounding_mode(float_round_nearest_even, &fp_status);
-         set_floatx80_rounding_precision(
-             get_floatx80_rounding_precision(&env->fp_status), &fp_status);
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
-     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
-     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
--    || defined(TARGET_S390X) || defined(TARGET_PPC)
-+    || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K)
-         g_assert_not_reached();
--#elif defined(TARGET_M68K)
--        /*
--         * M68000 FAMILY PROGRAMMER'S REFERENCE MANUAL
--         * 3.4 FLOATING-POINT INSTRUCTION DETAILS
--         * If either operand, but not both operands, of an operation is a
--         * nonsignaling NaN, then that NaN is returned as the result. If both
--         * operands are nonsignaling NaNs, then the destination operand
--         * nonsignaling NaN is returned as the result.
--         * If either operand to an operation is a signaling NaN (SNaN), then the
--         * SNaN bit is set in the FPSR EXC byte. If the SNaN exception enable bit
--         * is set in the FPCR ENABLE byte, then the exception is taken and the
--         * destination is not modified. If the SNaN exception enable bit is not
--         * set, setting the SNaN bit in the operand to a one converts the SNaN to
--         * a nonsignaling NaN. The operation then continues as described in the
--         * preceding paragraph for nonsignaling NaNs.
--         */
--        rule = float_2nan_prop_ab;
- #elif defined(TARGET_SPARC)
-         /* Prefer SNaN over QNaN, order B then A. */
-         rule = float_2nan_prop_s_ba;
---
-.34.1

-[PULL 10/31] target/m68k: Initialize float_status fields in gdb set/get functions
+Deleted patch
-In cf_fpu_gdb_get_reg() and cf_fpu_gdb_set_reg() we use a temporary
-float_status variable to pass to floatx80_to_float64() and
-float64_to_floatx80(), but we don't initialize it, meaning that those
-functions could access uninitialized data.  Zero-init the structs.
-(We don't need to set a NaN-propagation rule here because we
-don't use these with a 2-argument fpu operation.)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-11-peter.maydell@linaro.org
----
- target/m68k/helper.c | 4 ++--
-file changed, 2 insertions(+), 2 deletions(-)
-diff --git a/target/m68k/helper.c b/target/m68k/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/m68k/helper.c
-+++ b/target/m68k/helper.c
-@@ -XXX,XX +XXX,XX @@ static int cf_fpu_gdb_get_reg(CPUState *cs, GByteArray *mem_buf, int n)
-     CPUM68KState *env = &cpu->env;
-     if (n < 8) {
--        float_status s;
-+        float_status s = {};
-         return gdb_get_reg64(mem_buf, floatx80_to_float64(env->fregs[n].d, &s));
-     }
-     switch (n) {
-@@ -XXX,XX +XXX,XX @@ static int cf_fpu_gdb_set_reg(CPUState *cs, uint8_t *mem_buf, int n)
-     CPUM68KState *env = &cpu->env;
-     if (n < 8) {
--        float_status s;
-+        float_status s = {};
-         env->fregs[n].d = float64_to_floatx80(ldq_be_p(mem_buf), &s);
-         return 8;
-     }
---
-.34.1
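
The fix in this patch is the usual zero-initialization idiom: a status struct that a callee reads or ORs flags into must start from a known state. A small sketch of why that matters, using a hypothetical DemoStatus type rather than QEMU's float_status. Note that the patch uses the empty-brace form "= {}", which is a GNU extension before C23; the sketch uses "= { 0 }", which is portable to older C standards.

```c
#include <stdint.h>

enum { DEMO_FLAG_INEXACT = 1 << 0 };    /* hypothetical flag bit */

/* Hypothetical stand-in for a status struct; not QEMU's float_status. */
typedef struct DemoStatus {
    uint8_t rounding_mode;
    uint8_t exception_flags;
} DemoStatus;

/* The callee accumulates flags, so it depends on the caller's initial state. */
static void demo_convert(int inexact, DemoStatus *s)
{
    if (inexact) {
        s->exception_flags |= DEMO_FLAG_INEXACT;
    }
}

static unsigned demo_flags_after_convert(int inexact)
{
    DemoStatus s = { 0 };    /* without this, the flags would be indeterminate */

    demo_convert(inexact, &s);
    return s.exception_flags;
}
```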

-[PULL 12/31] target/sparc: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the NaN propagation rule explicitly in the float_status
-words we use.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-13-peter.maydell@linaro.org
----
- target/sparc/cpu.c             |  8 ++++++++
- target/sparc/fop_helper.c      | 10 ++++++++--
- fpu/softfloat-specialize.c.inc |  6 ++----
-files changed, 18 insertions(+), 6 deletions(-)
-diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/sparc/cpu.c
-+++ b/target/sparc/cpu.c
-@@ -XXX,XX +XXX,XX @@
- #include "hw/qdev-properties.h"
- #include "qapi/visitor.h"
- #include "tcg/tcg.h"
-+#include "fpu/softfloat.h"
- //#define DEBUG_FEATURES
-@@ -XXX,XX +XXX,XX @@ static void sparc_cpu_realizefn(DeviceState *dev, Error **errp)
-     env->version |= env->def.nwindows - 1;
- #endif
-+    /*
-+     * Prefer SNaN over QNaN, order B then A. It's OK to do this in realize
-+     * rather than reset, because fp_status is after 'end_reset_fields' in
-+     * the CPU state struct so it won't get zeroed on reset.
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_s_ba, &env->fp_status);
-+
-     cpu_exec_realizefn(cs, &local_err);
-     if (local_err != NULL) {
-         error_propagate(errp, local_err);
-diff --git a/target/sparc/fop_helper.c b/target/sparc/fop_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/sparc/fop_helper.c
-+++ b/target/sparc/fop_helper.c
-@@ -XXX,XX +XXX,XX @@ uint32_t helper_flcmps(float32 src1, float32 src2)
-      * Perform the comparison with a dummy fp environment.
-      */
-     float_status discard = { };
--    FloatRelation r = float32_compare_quiet(src1, src2, &discard);
-+    FloatRelation r;
-+
-+    set_float_2nan_prop_rule(float_2nan_prop_s_ba, &discard);
-+    r = float32_compare_quiet(src1, src2, &discard);
-     switch (r) {
-     case float_relation_equal:
-@@ -XXX,XX +XXX,XX @@ uint32_t helper_flcmps(float32 src1, float32 src2)
- uint32_t helper_flcmpd(float64 src1, float64 src2)
- {
-     float_status discard = { };
--    FloatRelation r = float64_compare_quiet(src1, src2, &discard);
-+    FloatRelation r;
-+
-+    set_float_2nan_prop_rule(float_2nan_prop_s_ba, &discard);
-+    r = float64_compare_quiet(src1, src2, &discard);
-     switch (r) {
-     case float_relation_equal:
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
-     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
-     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
--    || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K)
-+    || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
-+    || defined(TARGET_SPARC)
-         g_assert_not_reached();
--#elif defined(TARGET_SPARC)
--        /* Prefer SNaN over QNaN, order B then A. */
--        rule = float_2nan_prop_s_ba;
- #elif defined(TARGET_XTENSA)
-         /*
-          * Xtensa has two NaN propagation modes.
---
-.34.1
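
Note on the rule selected above: "prefer SNaN over QNaN, order B then A" can be illustrated with a small standalone sketch. The enum and function names here are hypothetical and are not QEMU's softfloat API; only the preference order is taken from the comment in the patch.

```c
#include <assert.h>

/* Hypothetical operand classes; QEMU's real FloatClass has more members. */
typedef enum { CLS_NORMAL, CLS_QNAN, CLS_SNAN } OperandClass;

/*
 * Return 0 to propagate operand A, 1 to propagate operand B.
 * Rule: a signaling NaN wins over a quiet NaN; within the same kind,
 * B is checked before A.
 * Assumption: callers only invoke this when at least one operand is a NaN.
 */
static int pick_nan_s_ba(OperandClass a, OperandClass b)
{
    assert(a != CLS_NORMAL || b != CLS_NORMAL);

    if (b == CLS_SNAN) {
        return 1;
    }
    if (a == CLS_SNAN) {
        return 0;
    }
    /* No signaling NaN present: fall back to B if it is a quiet NaN. */
    return b == CLS_QNAN ? 1 : 0;
}
```

With two quiet NaNs this picks B; with a signaling NaN in A and a quiet NaN in B it picks A.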

-[PULL 23/31] target/arm: Add new MMU indexes for AArch32 Secure PL1&0
+[PULL 08/11] target/arm: add new property to select pauth-qarma5
-Our current usage of MMU indexes when EL3 is AArch32 is confused.
+From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 Architecturally, when EL3 is AArch32, all Secure code runs under the
 Secure PL1&0 translation regime:
  * code at EL3, which might be Mon, or SVC, or any of the
    other privileged modes (PL1)
  * code at EL0 (Secure PL0)
-This is different from when EL3 is AArch64, in which case EL3 is its
+Before changing the default pauth algorithm, we need to make sure the
-own translation regime, and EL1 and EL0 (whether AArch32 or AArch64)
+current default (QARMA5) can still be selected.
 have their own regime.
-We claimed to be mapping Secure PL1 to our ARMMMUIdx_EL3, but didn't
+$ qemu-system-aarch64 -cpu max,pauth-qarma5=on ...
 do anything special about Secure PL0, which meant it used the same
 ARMMMUIdx_EL10_0 that NonSecure PL0 does.  This resulted in a bug
 where arm_sctlr() incorrectly picked the NonSecure SCTLR as the
 controlling register when in Secure PL0, which meant we were
 spuriously generating alignment faults because we were looking at the
 wrong SCTLR control bits.
-The use of ARMMMUIdx_EL3 for Secure PL1 also resulted in the bug that
+Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
-we wouldn't honour the PAN bit for Secure PL1, because there's no
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-equivalent _PAN mmu index for it.
+Message-id: 20241219183211.3493974-2-pierrick.bouvier@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  docs/system/arm/cpu-features.rst |  5 ++++-
  target/arm/cpu.h                 |  1 +
  target/arm/arm-qmp-cmds.c        |  2 +-
  target/arm/cpu64.c               | 20 ++++++++++++++------
  tests/qtest/arm-cpu-features.c   | 15 +++++++++++----
 files changed, 31 insertions(+), 12 deletions(-)
-Fix this by adding two new MMU indexes:
+diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
- * ARMMMUIdx_E30_0 is for Secure PL0
+index XXXXXXX..XXXXXXX 100644
- * ARMMMUIdx_E30_3_PAN is for Secure PL1 when PAN is enabled
+--- a/docs/system/arm/cpu-features.rst
-The existing ARMMMUIdx_E3 is used to mean "Secure PL1 without PAN"
++++ b/docs/system/arm/cpu-features.rst
-(and would be named ARMMMUIdx_E30_3 in an AArch32-centric scheme).
+@@ -XXX,XX +XXX,XX @@ Below is the list of TCG VCPU features and their descriptions.
+ ``pauth-qarma3``
-These extra two indexes bring us up to the maximum of 16 that the
+   When ``pauth`` is enabled, select the architected QARMA3 algorithm.
-core code can currently support.
+-Without either ``pauth-impdef`` or ``pauth-qarma3`` enabled,
-This commit:
++``pauth-qarma5``
- * adds the new MMU index handling to the various places
++  When ``pauth`` is enabled, select the architected QARMA5 algorithm.
-   where we deal in MMU index values
++
- * adds assertions that we aren't AArch32 EL3 in a couple of
++Without ``pauth-impdef``, ``pauth-qarma3`` or ``pauth-qarma5`` enabled,
-   places that currently use the E10 indexes, to document why
+ the architected QARMA5 algorithm is used.  The architected QARMA5
-   they don't also need to handle the E30 indexes
+ and QARMA3 algorithms have good cryptographic properties, but can
- * documents in a comment why regime_has_2_ranges() doesn't need
+ be quite slow to emulate.  The impdef algorithm used by QEMU is
    updating
 Notes for backporting: this commit depends on the preceding revert of
 4c2c04746932; that revert and this commit should probably be
 backported to everywhere that we originally backported 4c2c04746932.
 Cc: qemu-stable@nongnu.org
 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2326
 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2588
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Tested-by: Thomas Huth <thuth@redhat.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20241101142845.1712482-3-peter.maydell@linaro.org
 ---
  target/arm/cpu.h           | 31 ++++++++++++++++++-------------
  target/arm/internals.h     | 16 ++++++++++++++--
  target/arm/helper.c        | 38 ++++++++++++++++++++++++++++++++++----
  target/arm/ptw.c           |  4 ++++
  target/arm/tcg/op_helper.c | 14 +++++++++++++-
  target/arm/tcg/translate.c |  3 +++
 files changed, 86 insertions(+), 20 deletions(-)
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
+@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
-  *  + NonSecure PL1 & 0 stage 1
+     bool prop_pauth;
-  *  + NonSecure PL1 & 0 stage 2
+     bool prop_pauth_impdef;
-  *  + NonSecure PL2
+     bool prop_pauth_qarma3;
-- *  + Secure PL0
++    bool prop_pauth_qarma5;
-- *  + Secure PL1
+     bool prop_lpa2;
-+ *  + Secure PL1 & 0
-  * (reminder: for 32 bit EL3, Secure PL1 is *EL3*, not EL1.)
+     /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
-  *
+diff --git a/target/arm/arm-qmp-cmds.c b/target/arm/arm-qmp-cmds.c
   * For QEMU, an mmu_idx is not quite the same as a translation regime because:
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
   *
   * This gives us the following list of cases:
   *
 - * EL0 EL1&0 stage 1+2 (aka NS PL0)
 - * EL1 EL1&0 stage 1+2 (aka NS PL1)
 - * EL1 EL1&0 stage 1+2 +PAN
 + * EL0 EL1&0 stage 1+2 (aka NS PL0 PL1&0 stage 1+2)
 + * EL1 EL1&0 stage 1+2 (aka NS PL1 PL1&0 stage 1+2)
 + * EL1 EL1&0 stage 1+2 +PAN (aka NS PL1 PL1&0 stage 1+2 +PAN)
   * EL0 EL2&0
   * EL2 EL2&0
   * EL2 EL2&0 +PAN
   * EL2 (aka NS PL2)
 - * EL3 (aka S PL1)
 + * EL3 (aka AArch32 S PL1 PL1&0)
 + * AArch32 S PL0 PL1&0 (we call this EL30_0)
 + * AArch32 S PL1 PL1&0 +PAN (we call this EL30_3_PAN)
   * Stage2 Secure
   * Stage2 NonSecure
   * plus one TLB per Physical address space: S, NS, Realm, Root
   *
 - * for a total of 14 different mmu_idx.
 + * for a total of 16 different mmu_idx.
   *
   * R profile CPUs have an MPU, but can use the same set of MMU indexes
   * as A profile. They only need to distinguish EL0 and EL1 (and
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
      ARMMMUIdx_E20_2_PAN = 5 | ARM_MMU_IDX_A,
      ARMMMUIdx_E2        = 6 | ARM_MMU_IDX_A,
      ARMMMUIdx_E3        = 7 | ARM_MMU_IDX_A,
 +    ARMMMUIdx_E30_0     = 8 | ARM_MMU_IDX_A,
 +    ARMMMUIdx_E30_3_PAN = 9 | ARM_MMU_IDX_A,
      /*
       * Used for second stage of an S12 page table walk, or for descriptor
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
       * are in use simultaneously for SecureEL2: the security state for
       * the S2 ptw is selected by the NS bit from the S1 ptw.
       */
 -    ARMMMUIdx_Stage2_S  = 8 | ARM_MMU_IDX_A,
 -    ARMMMUIdx_Stage2    = 9 | ARM_MMU_IDX_A,
 +    ARMMMUIdx_Stage2_S  = 10 | ARM_MMU_IDX_A,
 +    ARMMMUIdx_Stage2    = 11 | ARM_MMU_IDX_A,
      /* TLBs with 1-1 mapping to the physical address spaces. */
 -    ARMMMUIdx_Phys_S     = 10 | ARM_MMU_IDX_A,
 -    ARMMMUIdx_Phys_NS    = 11 | ARM_MMU_IDX_A,
 -    ARMMMUIdx_Phys_Root  = 12 | ARM_MMU_IDX_A,
 -    ARMMMUIdx_Phys_Realm = 13 | ARM_MMU_IDX_A,
 +    ARMMMUIdx_Phys_S     = 12 | ARM_MMU_IDX_A,
 +    ARMMMUIdx_Phys_NS    = 13 | ARM_MMU_IDX_A,
 +    ARMMMUIdx_Phys_Root  = 14 | ARM_MMU_IDX_A,
 +    ARMMMUIdx_Phys_Realm = 15 | ARM_MMU_IDX_A,
      /*
       * These are not allocated TLBs and are used only for AT system
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdxBit {
      TO_CORE_BIT(E20_2),
      TO_CORE_BIT(E20_2_PAN),
      TO_CORE_BIT(E3),
 +    TO_CORE_BIT(E30_0),
 +    TO_CORE_BIT(E30_3_PAN),
      TO_CORE_BIT(Stage2),
      TO_CORE_BIT(Stage2_S),
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/internals.h
+--- a/target/arm/arm-qmp-cmds.c
-+++ b/target/arm/internals.h
++++ b/target/arm/arm-qmp-cmds.c
-@@ -XXX,XX +XXX,XX @@ static inline void arm_call_el_change_hook(ARMCPU *cpu)
+@@ -XXX,XX +XXX,XX @@ static const char *cpu_model_advertised_features[] = {
      "sve640", "sve768", "sve896", "sve1024", "sve1152", "sve1280",
      "sve1408", "sve1536", "sve1664", "sve1792", "sve1920", "sve2048",
      "kvm-no-adjvtime", "kvm-steal-time",
 -    "pauth", "pauth-impdef", "pauth-qarma3",
 +    "pauth", "pauth-impdef", "pauth-qarma3", "pauth-qarma5",
      NULL
  };
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
          }
          if (cpu->prop_pauth) {
 -            if (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) {
 +            if ((cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) ||
 +                (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma5) ||
 +                (cpu->prop_pauth_qarma3 && cpu->prop_pauth_qarma5)) {
                  error_setg(errp,
 -                           "cannot enable both pauth-impdef and pauth-qarma3");
 +                           "cannot enable pauth-impdef, pauth-qarma3 and "
 +                           "pauth-qarma5 at the same time");
                  return;
              }
@@ -XXX,XX +XXX,XX @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
              } else if (cpu->prop_pauth_qarma3) {
                  isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, APA3, features);
                  isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, GPA3, 1);
 -            } else {
 +            } else { /* default is pauth-qarma5 */
                  isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, APA, features);
                  isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPA, 1);
              }
 -        } else if (cpu->prop_pauth_impdef || cpu->prop_pauth_qarma3) {
 -            error_setg(errp, "cannot enable pauth-impdef or "
 -                       "pauth-qarma3 without pauth");
 +        } else if (cpu->prop_pauth_impdef ||
 +                   cpu->prop_pauth_qarma3 ||
 +                   cpu->prop_pauth_qarma5) {
 +            error_setg(errp, "cannot enable pauth-impdef, pauth-qarma3 or "
 +                       "pauth-qarma5 without pauth");
              error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
          }
      }
@@ -XXX,XX +XXX,XX @@ static const Property arm_cpu_pauth_impdef_property =
      DEFINE_PROP_BOOL("pauth-impdef", ARMCPU, prop_pauth_impdef, false);
  static const Property arm_cpu_pauth_qarma3_property =
      DEFINE_PROP_BOOL("pauth-qarma3", ARMCPU, prop_pauth_qarma3, false);
 +static Property arm_cpu_pauth_qarma5_property =
 +    DEFINE_PROP_BOOL("pauth-qarma5", ARMCPU, prop_pauth_qarma5, false);
  void aarch64_add_pauth_properties(Object *obj)
  {
@@ -XXX,XX +XXX,XX @@ void aarch64_add_pauth_properties(Object *obj)
      } else {
          qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_impdef_property);
          qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_qarma3_property);
 +        qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_qarma5_property);
      }
  }
--/* Return true if this address translation regime has two ranges.  */
+diff --git a/tests/qtest/arm-cpu-features.c b/tests/qtest/arm-cpu-features.c
 +/*
 + * Return true if this address translation regime has two ranges.
 + * Note that this will not return the correct answer for AArch32
 + * Secure PL1&0 (i.e. mmu indexes E3, E30_0, E30_3_PAN), but it is
 + * never called from a context where EL3 can be AArch32. (The
 + * correct return value for ARMMMUIdx_E3 would be different for
 + * that case, so we can't just make the function return the
 + * correct value anyway; we would need an extra "bool e3_is_aarch32"
 + * argument which all the current callsites would pass as 'false'.)
 + */
  static inline bool regime_has_2_ranges(ARMMMUIdx mmu_idx)
  {
      switch (mmu_idx) {
@@ -XXX,XX +XXX,XX @@ static inline bool regime_is_pan(CPUARMState *env, ARMMMUIdx mmu_idx)
      case ARMMMUIdx_Stage1_E1_PAN:
      case ARMMMUIdx_E10_1_PAN:
      case ARMMMUIdx_E20_2_PAN:
 +    case ARMMMUIdx_E30_3_PAN:
          return true;
      default:
          return false;
@@ -XXX,XX +XXX,XX @@ static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
      case ARMMMUIdx_E2:
          return 2;
      case ARMMMUIdx_E3:
 +    case ARMMMUIdx_E30_0:
 +    case ARMMMUIdx_E30_3_PAN:
          return 3;
      case ARMMMUIdx_E10_0:
      case ARMMMUIdx_Stage1_E0:
 -        return arm_el_is_aa64(env, 3) || !arm_is_secure_below_el3(env) ? 1 : 3;
      case ARMMMUIdx_Stage1_E1:
      case ARMMMUIdx_Stage1_E1_PAN:
      case ARMMMUIdx_E10_1:
@@ -XXX,XX +XXX,XX @@ static inline bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
      switch (mmu_idx) {
      case ARMMMUIdx_E10_0:
      case ARMMMUIdx_E20_0:
 +    case ARMMMUIdx_E30_0:
      case ARMMMUIdx_Stage1_E0:
      case ARMMMUIdx_MUser:
      case ARMMMUIdx_MSUser:
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/tests/qtest/arm-cpu-features.c
-+++ b/target/arm/helper.c
++++ b/tests/qtest/arm-cpu-features.c
-@@ -XXX,XX +XXX,XX @@ static int alle1_tlbmask(CPUARMState *env)
+@@ -XXX,XX +XXX,XX @@ static void pauth_tests_default(QTestState *qts, const char *cpu_type)
-      * Note that the 'ALL' scope must invalidate both stage 1 and
+     assert_has_feature_enabled(qts, cpu_type, "pauth");
-      * stage 2 translations, whereas most other scopes only invalidate
+     assert_has_feature_disabled(qts, cpu_type, "pauth-impdef");
-      * stage 1 translations.
+     assert_has_feature_disabled(qts, cpu_type, "pauth-qarma3");
-+     *
++    assert_has_feature_disabled(qts, cpu_type, "pauth-qarma5");
-+     * For AArch32 this is only used for TLBIALLNSNH and VTTBR
+     assert_set_feature(qts, cpu_type, "pauth", false);
-+     * writes, so only needs to apply to NS PL1&0, not S PL1&0.
+     assert_set_feature(qts, cpu_type, "pauth", true);
-      */
+     assert_set_feature(qts, cpu_type, "pauth-impdef", true);
-     return (ARMMMUIdxBit_E10_1 |
+     assert_set_feature(qts, cpu_type, "pauth-impdef", false);
-             ARMMMUIdxBit_E10_1_PAN |
+     assert_set_feature(qts, cpu_type, "pauth-qarma3", true);
-@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
+     assert_set_feature(qts, cpu_type, "pauth-qarma3", false);
-         /* stage 1 current state PL1: ATS1CPR, ATS1CPW, ATS1CPRP, ATS1CPWP */
++    assert_set_feature(qts, cpu_type, "pauth-qarma5", true);
-         switch (el) {
++    assert_set_feature(qts, cpu_type, "pauth-qarma5", false);
-         case 3:
+     assert_error(qts, cpu_type,
--            mmu_idx = ARMMMUIdx_E3;
+-                 "cannot enable pauth-impdef or pauth-qarma3 without pauth",
-+            if (ri->crm == 9 && arm_pan_enabled(env)) {
++                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
-+                mmu_idx = ARMMMUIdx_E30_3_PAN;
+                  "{ 'pauth': false, 'pauth-impdef': true }");
-+            } else {
+     assert_error(qts, cpu_type,
-+                mmu_idx = ARMMMUIdx_E3;
+-                 "cannot enable pauth-impdef or pauth-qarma3 without pauth",
-+            }
++                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
-             break;
+                  "{ 'pauth': false, 'pauth-qarma3': true }");
-         case 2:
+     assert_error(qts, cpu_type,
-             g_assert(ss != ARMSS_Secure);  /* ARMv8.4-SecEL2 is 64-bit only */
+-                 "cannot enable both pauth-impdef and pauth-qarma3",
-@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
+-                 "{ 'pauth': true, 'pauth-impdef': true, 'pauth-qarma3': true }");
-         /* stage 1 current state PL0: ATS1CUR, ATS1CUW */
++                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
-         switch (el) {
++                 "{ 'pauth': false, 'pauth-qarma5': true }");
-         case 3:
++    assert_error(qts, cpu_type,
--            mmu_idx = ARMMMUIdx_E10_0;
++                 "cannot enable pauth-impdef, pauth-qarma3 and pauth-qarma5 at the same time",
-+            mmu_idx = ARMMMUIdx_E30_0;
++                 "{ 'pauth': true, 'pauth-impdef': true, 'pauth-qarma3': true,"
-             break;
++                 "  'pauth-qarma5': true }");
          case 2:
              g_assert(ss != ARMSS_Secure);  /* ARMv8.4-SecEL2 is 64-bit only */
@@ -XXX,XX +XXX,XX @@ static int vae1_tlbmask(CPUARMState *env)
      uint64_t hcr = arm_hcr_el2_eff(env);
      uint16_t mask;
 +    assert(arm_feature(env, ARM_FEATURE_AARCH64));
 +
      if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
          mask = ARMMMUIdxBit_E20_2 |
                 ARMMMUIdxBit_E20_2_PAN |
                 ARMMMUIdxBit_E20_0;
      } else {
 +        /* This is AArch64 only, so we don't need to touch the EL30_x TLBs */
          mask = ARMMMUIdxBit_E10_1 |
                 ARMMMUIdxBit_E10_1_PAN |
                 ARMMMUIdxBit_E10_0;
@@ -XXX,XX +XXX,XX @@ static int vae1_tlbbits(CPUARMState *env, uint64_t addr)
      uint64_t hcr = arm_hcr_el2_eff(env);
      ARMMMUIdx mmu_idx;
 +    assert(arm_feature(env, ARM_FEATURE_AARCH64));
 +
      /* Only the regime of the mmu_idx below is significant. */
      if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
          mmu_idx = ARMMMUIdx_E20_0;
@@ -XXX,XX +XXX,XX @@ void arm_cpu_do_interrupt(CPUState *cs)
  uint64_t arm_sctlr(CPUARMState *env, int el)
  {
 -    /* Only EL0 needs to be adjusted for EL1&0 or EL2&0. */
 +    /* Only EL0 needs to be adjusted for EL1&0 or EL2&0 or EL3&0 */
      if (el == 0) {
          ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, 0);
 -        el = mmu_idx == ARMMMUIdx_E20_0 ? 2 : 1;
 +        switch (mmu_idx) {
 +        case ARMMMUIdx_E20_0:
 +            el = 2;
 +            break;
 +        case ARMMMUIdx_E30_0:
 +            el = 3;
 +            break;
 +        default:
 +            el = 1;
 +            break;
 +        }
      }
      return env->cp15.sctlr_el[el];
  }
-@@ -XXX,XX +XXX,XX @@ int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx)
-     switch (mmu_idx) {
+ static void test_query_cpu_model_expansion(const void *data)
      case ARMMMUIdx_E10_0:
      case ARMMMUIdx_E20_0:
 +    case ARMMMUIdx_E30_0:
          return 0;
      case ARMMMUIdx_E10_1:
      case ARMMMUIdx_E10_1_PAN:
@@ -XXX,XX +XXX,XX @@ int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx)
      case ARMMMUIdx_E20_2_PAN:
          return 2;
      case ARMMMUIdx_E3:
 +    case ARMMMUIdx_E30_3_PAN:
          return 3;
      default:
          g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
          hcr = arm_hcr_el2_eff(env);
          if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
              idx = ARMMMUIdx_E20_0;
 +        } else if (arm_is_secure_below_el3(env) &&
 +                   !arm_el_is_aa64(env, 3)) {
 +            idx = ARMMMUIdx_E30_0;
          } else {
              idx = ARMMMUIdx_E10_0;
          }
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
          }
          break;
      case 3:
 +        if (!arm_el_is_aa64(env, 3) && arm_pan_enabled(env)) {
 +            return ARMMMUIdx_E30_3_PAN;
 +        }
          return ARMMMUIdx_E3;
      default:
          g_assert_not_reached();
 diff --git a/target/arm/ptw.c b/target/arm/ptw.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/ptw.c
 +++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
      case ARMMMUIdx_E20_2_PAN:
      case ARMMMUIdx_E2:
      case ARMMMUIdx_E3:
 +    case ARMMMUIdx_E30_0:
 +    case ARMMMUIdx_E30_3_PAN:
          break;
      case ARMMMUIdx_Phys_S:
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, vaddr address,
          ss = ARMSS_Secure;
          break;
      case ARMMMUIdx_E3:
 +    case ARMMMUIdx_E30_0:
 +    case ARMMMUIdx_E30_3_PAN:
          if (arm_feature(env, ARM_FEATURE_AARCH64) &&
              cpu_isar_feature(aa64_rme, env_archcpu(env))) {
              ss = ARMSS_Root;
 diff --git a/target/arm/tcg/op_helper.c b/target/arm/tcg/op_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/op_helper.c
 +++ b/target/arm/tcg/op_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(tidcp_el0)(CPUARMState *env, uint32_t syndrome)
  {
      /* See arm_sctlr(), but we also need the sctlr el. */
      ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, 0);
 -    int target_el = mmu_idx == ARMMMUIdx_E20_0 ? 2 : 1;
 +    int target_el;
 +
 +    switch (mmu_idx) {
 +    case ARMMMUIdx_E20_0:
 +        target_el = 2;
 +        break;
 +    case ARMMMUIdx_E30_0:
 +        target_el = 3;
 +        break;
 +    default:
 +        target_el = 1;
 +        break;
 +    }
      /*
       * The bit is not valid unless the target el is aa64, but since the
 diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate.c
 +++ b/target/arm/tcg/translate.c
@@ -XXX,XX +XXX,XX @@ static inline int get_a32_user_mem_index(DisasContext *s)
       */
      switch (s->mmu_idx) {
      case ARMMMUIdx_E3:
 +    case ARMMMUIdx_E30_0:
 +    case ARMMMUIdx_E30_3_PAN:
 +        return arm_to_core_mmu_idx(ARMMMUIdx_E30_0);
      case ARMMMUIdx_E2:        /* this one is UNPREDICTABLE */
      case ARMMMUIdx_E10_0:
      case ARMMMUIdx_E10_1:
 --
 .34.1
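
Note on the pauth property validation in the "add new property to select pauth-qarma5" patch above: the pairwise checks in arm_cpu_pauth_finalize() amount to "at most one algorithm property may be set, and none may be set without pauth". A minimal standalone sketch of that rule follows. The struct and function names are hypothetical stand-ins, not QEMU code; only the error strings are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant ARMCPU property fields. */
typedef struct {
    bool pauth;
    bool impdef;
    bool qarma3;
    bool qarma5;
} PauthProps;

/*
 * Return NULL if the combination is acceptable, otherwise an error
 * string worded like the messages in the patch.
 */
static const char *pauth_props_check(const PauthProps *p)
{
    int selected;

    assert(p != NULL);
    /* Counting set flags is equivalent to the three pairwise tests. */
    selected = p->impdef + p->qarma3 + p->qarma5;

    if (p->pauth) {
        if (selected > 1) {
            return "cannot enable pauth-impdef, pauth-qarma3 and "
                   "pauth-qarma5 at the same time";
        }
        return NULL;
    }
    if (selected > 0) {
        return "cannot enable pauth-impdef, pauth-qarma3 or "
               "pauth-qarma5 without pauth";
    }
    return NULL;
}
```

Counting the flags instead of testing each pair is a design choice of this sketch; it would keep the check unchanged if a fourth algorithm property were ever added.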

-[PULL 25/31] disas: Fix build against Capstone v6 (again)
+[PULL 09/11] tests/tcg/aarch64: force qarma5 for pauth-3 test
-From: Richard Henderson <richard.henderson@linaro.org>
+The pauth-3 test explicitly tests that a computation of the
 pointer-authentication produces the expected result.  This means that
 it must be run with the QARMA5 algorithm.
-Like 9971cbac2f3, which set CAPSTONE_AARCH64_COMPAT_HEADER,
+Explicitly set the pauth algorithm when running this test, so that it
-also set CAPSTONE_SYSTEMZ_COMPAT_HEADER.  Fixes the build
+doesn't break when we change the default algorithm the 'max' CPU
-against capstone v6-alpha.
+uses.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Gustavo Romero <gustavo.romero@linaro.org>
-Message-id: 20241022013047.830273-1-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- include/disas/capstone.h | 1 +
+ tests/tcg/aarch64/Makefile.softmmu-target | 3 +++
-file changed, 1 insertion(+)
+file changed, 3 insertions(+)
-diff --git a/include/disas/capstone.h b/include/disas/capstone.h
+diff --git a/tests/tcg/aarch64/Makefile.softmmu-target b/tests/tcg/aarch64/Makefile.softmmu-target
 index XXXXXXX..XXXXXXX 100644
---- a/include/disas/capstone.h
+--- a/tests/tcg/aarch64/Makefile.softmmu-target
-+++ b/include/disas/capstone.h
++++ b/tests/tcg/aarch64/Makefile.softmmu-target
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ EXTRA_RUNS+=run-memory-replay
- #ifdef CONFIG_CAPSTONE
+ ifneq ($(CROSS_CC_HAS_ARMV8_3),)
- #define CAPSTONE_AARCH64_COMPAT_HEADER
+ pauth-3: CFLAGS += $(CROSS_CC_HAS_ARMV8_3)
-+#define CAPSTONE_SYSTEMZ_COMPAT_HEADER
++# This test explicitly checks the output of the pauth operation so we
- #include <capstone.h>
++# must force the use of the QARMA5 algorithm for it.
++run-pauth-3: QEMU_BASE_MACHINE=-M virt -cpu max,pauth-qarma5=on -display none
- #else
+ else
  pauth-3:
      $(call skip-test, "BUILD of $@", "missing compiler support")
 --
 .34.1
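
Note on why the test above pins QARMA5: the next patch (10/11) changes which algorithm is used when no algorithm property is given. A minimal sketch of that selection rule, assuming the final fallback is impdef as the commit message of 10/11 states. The enum and function names are hypothetical and not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ALGO_IMPDEF, ALGO_QARMA3, ALGO_QARMA5 } PauthAlgo;

/*
 * Pick the pauth algorithm from the three user-visible properties and
 * the machine-compat flag. Assumes pauth itself is enabled and that the
 * "at most one property set" validation has already passed.
 */
static PauthAlgo pauth_pick_algo(bool impdef, bool qarma3, bool qarma5,
                                 bool backcompat_default_qarma5)
{
    bool use_default = !impdef && !qarma3 && !qarma5;

    assert(impdef + qarma3 + qarma5 <= 1);

    if (qarma5 || (use_default && backcompat_default_qarma5)) {
        return ALGO_QARMA5;
    }
    if (qarma3) {
        return ALGO_QARMA3;
    }
    /* Explicit pauth-impdef=on, or the new default. */
    return ALGO_IMPDEF;
}
```

With the compat flag clear and no property set, the result is impdef, which is why a test that checks exact QARMA5 output has to request pauth-qarma5=on explicitly.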

-[PULL 31/31] target/arm: Enable FEAT_CMOW for -cpu max
+[PULL 10/11] target/arm: change default pauth algorithm to impdef
-From: Gustavo Romero <gustavo.romero@linaro.org>
+From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
-FEAT_CMOW introduces support for controlling cache maintenance
+Pointer authentication on aarch64 is pretty expensive (up to 50% of
-instructions executed in EL0/1 and is mandatory from Armv8.8.
+execution time) when running a virtual machine with tcg and -cpu max
 (which enables pauth=on).
-On real hardware, the main use for this feature is to prevent processes
+The advice is always: use pauth-impdef=on.
-from invalidating or flushing cache lines for addresses they only have
+Our documentation even mentions it "by default" in
-read permission, which can impact the performance of other processes.
+docs/system/introduction.rst.
-QEMU implements all cache instructions as NOPs, and, according to rule
+Thus, we change the default to impdef. This does not
-[1], which states that generating any Permission fault when a cache
+affect kvm or hvf acceleration, since the pauth algorithm used is the one
-instruction is implemented as a NOP is implementation-defined, no
+from the host CPU.
 Permission fault is generated for any cache instruction when it lacks
 read and write permissions.
-QEMU does not model any cache topology, so the PoU and PoC are before
+This change is backward compatible, in terms of the CLI, with previous
-any cache, and rules [2] apply. These rules state that generating any
+versions, as the semantics of -cpu max,pauth-impdef=on and -cpu
-MMU fault for cache instructions in this topology is also
+max,pauth-qarma3=on are preserved.
-implementation-defined. Therefore, for FEAT_CMOW, we do not generate any
+The new option introduced in the previous patch, which matches the old default, is
-MMU faults either, instead, we only advertise it in the feature
+-cpu max,pauth-qarma5=on.
-register.
+It is backward compatible with migration as well, by defining a backcompat
 property that uses qarma5 by default for virt machines <= 9.2.
 Tested by saving and restoring a vm from qemu 9.2.0 into qemu-master
 (10.0) for cpus neoverse-n2 and max.
-[1] Rule R_HGLYG of section D8.14.3, Arm ARM K.a.
+Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 [2] Rules R_MZTNR and R_DNZYL of section D8.14.3, Arm ARM K.a.
 Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241104142606.941638-1-gustavo.romero@linaro.org
+Message-id: 20241219183211.3493974-3-pierrick.bouvier@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- docs/system/arm/emulation.rst | 1 +
+ docs/system/arm/cpu-features.rst |  2 +-
- target/arm/cpu-features.h     | 5 +++++
+ docs/system/introduction.rst     |  2 +-
- target/arm/cpu.h              | 1 +
+ target/arm/cpu.h                 |  3 +++
- target/arm/helper.c           | 5 +++++
+ hw/core/machine.c                |  4 +++-
- target/arm/tcg/cpu64.c        | 1 +
+ target/arm/cpu.c                 |  2 ++
-files changed, 13 insertions(+)
+ target/arm/cpu64.c               | 22 ++++++++++++++++------
 files changed, 26 insertions(+), 9 deletions(-)
-diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
+diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
 index XXXXXXX..XXXXXXX 100644
---- a/docs/system/arm/emulation.rst
+--- a/docs/system/arm/cpu-features.rst
-+++ b/docs/system/arm/emulation.rst
++++ b/docs/system/arm/cpu-features.rst
-@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
+@@ -XXX,XX +XXX,XX @@ Below is the list of TCG VCPU features and their descriptions.
- - FEAT_BF16 (AArch64 BFloat16 instructions)
+   When ``pauth`` is enabled, select the architected QARMA5 algorithm.
- - FEAT_BTI (Branch Target Identification)
- - FEAT_CCIDX (Extended cache index)
+ Without ``pauth-impdef``, ``pauth-qarma3`` or ``pauth-qarma5`` enabled,
-+- FEAT_CMOW (Control for cache maintenance permission)
+-the architected QARMA5 algorithm is used.  The architected QARMA5
- - FEAT_CRC32 (CRC32 instructions)
++the QEMU impdef algorithm is used.  The architected QARMA5
- - FEAT_Crypto (Cryptographic Extension)
+ and QARMA3 algorithms have good cryptographic properties, but can
- - FEAT_CSV2 (Cache speculation variant 2)
+ be quite slow to emulate.  The impdef algorithm used by QEMU is
-diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
+ non-cryptographic but significantly faster.
 diff --git a/docs/system/introduction.rst b/docs/system/introduction.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu-features.h
+--- a/docs/system/introduction.rst
-+++ b/target/arm/cpu-features.h
++++ b/docs/system/introduction.rst
-@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
+@@ -XXX,XX +XXX,XX @@ would default to it anyway.
-     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
- }
+ .. code::
-+static inline bool isar_feature_aa64_cmow(const ARMISARegisters *id)
+- -cpu max,pauth-impdef=on \
-+{
++ -cpu max \
-+    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, CMOW) != 0;
+  -smp 4 \
-+}
+  -accel tcg \
-+
  static inline bool isar_feature_aa64_hafs(const ARMISARegisters *id)
  {
      return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) != 0;
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
+@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
- #define SCTLR_EnIB    (1U << 30) /* v8.3, AArch64 only */
+     /* QOM property to indicate we should use the back-compat CNTFRQ default */
- #define SCTLR_EnIA    (1U << 31) /* v8.3, AArch64 only */
+     bool backcompat_cntfrq;
- #define SCTLR_DSSBS_32 (1U << 31) /* v8.5, AArch32 only */
-+#define SCTLR_CMOW    (1ULL << 32) /* FEAT_CMOW */
++    /* QOM property to indicate we should use the back-compat QARMA5 default */
- #define SCTLR_MSCEN   (1ULL << 33) /* FEAT_MOPS */
++    bool backcompat_pauth_default_use_qarma5;
- #define SCTLR_BT0     (1ULL << 35) /* v8.5-BTI */
++
- #define SCTLR_BT1     (1ULL << 36) /* v8.5-BTI */
+     /* Specify the number of cores in this CPU cluster. Used for the L2CTLR
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+      * register.
       */
 diff --git a/hw/core/machine.c b/hw/core/machine.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/hw/core/machine.c
-+++ b/target/arm/helper.c
++++ b/hw/core/machine.c
-@@ -XXX,XX +XXX,XX @@ static void hcrx_write(CPUARMState *env, const ARMCPRegInfo *ri,
+@@ -XXX,XX +XXX,XX @@
-     if (cpu_isar_feature(aa64_nmi, cpu)) {
+ #include "hw/virtio/virtio-iommu.h"
-         valid_mask |= HCRX_TALLINT | HCRX_VINMI | HCRX_VFNMI;
+ #include "audio/audio.h"
-     }
-+    /* FEAT_CMOW adds CMOW */
+-GlobalProperty hw_compat_9_2[] = {};
 +GlobalProperty hw_compat_9_2[] = {
 +    {"arm-cpu", "backcompat-pauth-default-use-qarma5", "true"},
 +};
  const size_t hw_compat_9_2_len = G_N_ELEMENTS(hw_compat_9_2);
  GlobalProperty hw_compat_9_1[] = {
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static const Property arm_cpu_properties[] = {
      DEFINE_PROP_INT32("core-count", ARMCPU, core_count, -1),
      /* True to default to the backward-compat old CNTFRQ rather than 1Ghz */
      DEFINE_PROP_BOOL("backcompat-cntfrq", ARMCPU, backcompat_cntfrq, false),
 +    DEFINE_PROP_BOOL("backcompat-pauth-default-use-qarma5", ARMCPU,
 +                      backcompat_pauth_default_use_qarma5, false),
  };
  static const gchar *arm_gdb_arch_name(CPUState *cs)
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
                  return;
              }
 -            if (cpu->prop_pauth_impdef) {
 -                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, API, features);
 -                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPI, 1);
 +            bool use_default = !cpu->prop_pauth_qarma5 &&
 +                               !cpu->prop_pauth_qarma3 &&
 +                               !cpu->prop_pauth_impdef;
 +
-+    if (cpu_isar_feature(aa64_cmow, cpu)) {
++            if (cpu->prop_pauth_qarma5 ||
-+        valid_mask |= HCRX_CMOW;
++                (use_default &&
-+    }
++                 cpu->backcompat_pauth_default_use_qarma5)) {
++                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, APA, features);
-     /* Clear RES0 bits.  */
++                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPA, 1);
-     env->cp15.hcrx_el2 = value & valid_mask;
+             } else if (cpu->prop_pauth_qarma3) {
-diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
+                 isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, APA3, features);
-index XXXXXXX..XXXXXXX 100644
+                 isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, GPA3, 1);
---- a/target/arm/tcg/cpu64.c
+-            } else { /* default is pauth-qarma5 */
-+++ b/target/arm/tcg/cpu64.c
+-                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, APA, features);
-@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
+-                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPA, 1);
-     t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2);      /* FEAT_ETS2 */
++            } else if (cpu->prop_pauth_impdef ||
-     t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1);      /* FEAT_HCX */
++                       (use_default &&
-     t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1);   /* FEAT_TIDCP1 */
++                        !cpu->backcompat_pauth_default_use_qarma5)) {
-+    t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1);     /* FEAT_CMOW */
++                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, API, features);
-     cpu->isar.id_aa64mmfr1 = t;
++                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPI, 1);
++            } else {
-     t = cpu->isar.id_aa64mmfr2;
++                g_assert_not_reached();
              }
          } else if (cpu->prop_pauth_impdef ||
                     cpu->prop_pauth_qarma3 ||
 --
 .34.1
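
The selection order in the arm_cpu_pauth_finalize() hunk above can be read as a small decision function. The following is only a sketch of that branch order, not QEMU code: PauthAlgo, PauthProps and pauth_select_algorithm() are hypothetical names, and the struct carries just the four booleans the hunk tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type; QEMU writes ID register fields instead. */
typedef enum { PAUTH_QARMA5, PAUTH_QARMA3, PAUTH_IMPDEF } PauthAlgo;

/* Hypothetical holder for the four flags tested in the hunk. */
typedef struct {
    bool prop_pauth_qarma5;
    bool prop_pauth_qarma3;
    bool prop_pauth_impdef;
    bool backcompat_pauth_default_use_qarma5;
} PauthProps;

static PauthAlgo pauth_select_algorithm(const PauthProps *p)
{
    /* No algorithm was requested explicitly, so the default applies. */
    bool use_default = !p->prop_pauth_qarma5 &&
                       !p->prop_pauth_qarma3 &&
                       !p->prop_pauth_impdef;

    if (p->prop_pauth_qarma5 ||
        (use_default && p->backcompat_pauth_default_use_qarma5)) {
        return PAUTH_QARMA5;   /* explicit request, or old default */
    }
    if (p->prop_pauth_qarma3) {
        return PAUTH_QARMA3;
    }
    /* Only impdef (explicit, or the new default) can remain here. */
    assert(p->prop_pauth_impdef || use_default);
    return PAUTH_IMPDEF;
}
```

With all four flags false the sketch returns PAUTH_IMPDEF. Setting only the back-compat flag, as the hw_compat_9_2 entry does for older machine types, returns PAUTH_QARMA5.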

-[PULL 13/31] target/xtensa: Factor out calls to set_use_first_nan()
+[PULL 11/11] docs/system/arm/virt: mention specific migration information
-In xtensa we currently call set_use_first_nan() in a lot of
+From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 places where we want to switch the NaN-propagation handling.
 We're about to change the softfloat API we use to do that,
 so start by factoring all the calls out into a single
 xtensa_use_first_nan() function.
-The bulk of this change was done with
+Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
- sed -i -e 's/set_use_first_nan(\([^,]*\),[^)]*)/xtensa_use_first_nan(env, \1)/'  target/xtensa/fpu_helper.c
+Message-id: 20241219183211.3493974-4-pierrick.bouvier@linaro.org
 [PMM: Removed a paragraph about using non-versioned models.]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  docs/system/arm/virt.rst | 4 ++++
 file changed, 4 insertions(+)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
 Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20241025141254.2141506-14-peter.maydell@linaro.org
 ---
  target/xtensa/cpu.h        |  6 ++++++
  target/xtensa/cpu.c        |  2 +-
  target/xtensa/fpu_helper.c | 33 +++++++++++++++++++--------------
 files changed, 26 insertions(+), 15 deletions(-)
 diff --git a/target/xtensa/cpu.h b/target/xtensa/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/xtensa/cpu.h
+--- a/docs/system/arm/virt.rst
-+++ b/target/xtensa/cpu.h
++++ b/docs/system/arm/virt.rst
-@@ -XXX,XX +XXX,XX @@ static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, vaddr *pc,
+@@ -XXX,XX +XXX,XX @@ of the 5.0 release and ``virt-5.0`` of the 5.1 release. Migration
- XtensaCPU *xtensa_cpu_create_with_clock(const char *cpu_type,
+ is not guaranteed to work between different QEMU releases for
-                                         Clock *cpu_refclk);
+ the non-versioned ``virt`` machine type.
-+/*
++VM migration is not guaranteed when using ``-cpu max``, as features
-+ * Set the NaN propagation rule for future FPU operations:
++supported may change between QEMU versions.  To ensure your VM can be
-+ * use_first is true to pick the first NaN as the result if both
++migrated, it is recommended to use another cpu model instead.
 + * inputs are NaNs, false to pick the second.
 + */
 +void xtensa_use_first_nan(CPUXtensaState *env, bool use_first);
  #endif
 diff --git a/target/xtensa/cpu.c b/target/xtensa/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/xtensa/cpu.c
 +++ b/target/xtensa/cpu.c
@@ -XXX,XX +XXX,XX @@ static void xtensa_cpu_reset_hold(Object *obj, ResetType type)
      cs->halted = env->runstall;
  #endif
      set_no_signaling_nans(!dfpu, &env->fp_status);
 -    set_use_first_nan(!dfpu, &env->fp_status);
 +    xtensa_use_first_nan(env, !dfpu);
  }
  static ObjectClass *xtensa_cpu_class_by_name(const char *cpu_model)
 diff --git a/target/xtensa/fpu_helper.c b/target/xtensa/fpu_helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/xtensa/fpu_helper.c
 +++ b/target/xtensa/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ static const struct {
      { XTENSA_FP_V, float_flag_invalid, },
  };
 +void xtensa_use_first_nan(CPUXtensaState *env, bool use_first)
 +{
 +    set_use_first_nan(use_first, &env->fp_status);
 +}
 +
- void HELPER(wur_fpu2k_fcr)(CPUXtensaState *env, uint32_t v)
+ Supported devices
- {
+ """""""""""""""""
      static const int rounding_mode[] = {
@@ -XXX,XX +XXX,XX @@ float32 HELPER(fpu2k_msub_s)(CPUXtensaState *env,
  float64 HELPER(add_d)(CPUXtensaState *env, float64 a, float64 b)
  {
 -    set_use_first_nan(true, &env->fp_status);
 +    xtensa_use_first_nan(env, true);
      return float64_add(a, b, &env->fp_status);
  }
  float32 HELPER(add_s)(CPUXtensaState *env, float32 a, float32 b)
  {
 -    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
 +    xtensa_use_first_nan(env, env->config->use_first_nan);
      return float32_add(a, b, &env->fp_status);
  }
  float64 HELPER(sub_d)(CPUXtensaState *env, float64 a, float64 b)
  {
 -    set_use_first_nan(true, &env->fp_status);
 +    xtensa_use_first_nan(env, true);
      return float64_sub(a, b, &env->fp_status);
  }
  float32 HELPER(sub_s)(CPUXtensaState *env, float32 a, float32 b)
  {
 -    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
 +    xtensa_use_first_nan(env, env->config->use_first_nan);
      return float32_sub(a, b, &env->fp_status);
  }
  float64 HELPER(mul_d)(CPUXtensaState *env, float64 a, float64 b)
  {
 -    set_use_first_nan(true, &env->fp_status);
 +    xtensa_use_first_nan(env, true);
      return float64_mul(a, b, &env->fp_status);
  }
  float32 HELPER(mul_s)(CPUXtensaState *env, float32 a, float32 b)
  {
 -    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
 +    xtensa_use_first_nan(env, env->config->use_first_nan);
      return float32_mul(a, b, &env->fp_status);
  }
  float64 HELPER(madd_d)(CPUXtensaState *env, float64 a, float64 b, float64 c)
  {
 -    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
 +    xtensa_use_first_nan(env, env->config->use_first_nan);
      return float64_muladd(b, c, a, 0, &env->fp_status);
  }
  float32 HELPER(madd_s)(CPUXtensaState *env, float32 a, float32 b, float32 c)
  {
 -    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
 +    xtensa_use_first_nan(env, env->config->use_first_nan);
      return float32_muladd(b, c, a, 0, &env->fp_status);
  }
  float64 HELPER(msub_d)(CPUXtensaState *env, float64 a, float64 b, float64 c)
  {
 -    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
 +    xtensa_use_first_nan(env, env->config->use_first_nan);
      return float64_muladd(b, c, a, float_muladd_negate_product,
                            &env->fp_status);
  }
  float32 HELPER(msub_s)(CPUXtensaState *env, float32 a, float32 b, float32 c)
  {
 -    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
 +    xtensa_use_first_nan(env, env->config->use_first_nan);
      return float32_muladd(b, c, a, float_muladd_negate_product,
                            &env->fp_status);
  }
  float64 HELPER(mkdadj_d)(CPUXtensaState *env, float64 a, float64 b)
  {
 -    set_use_first_nan(true, &env->fp_status);
 +    xtensa_use_first_nan(env, true);
      return float64_div(b, a, &env->fp_status);
  }
  float32 HELPER(mkdadj_s)(CPUXtensaState *env, float32 a, float32 b)
  {
 -    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
 +    xtensa_use_first_nan(env, env->config->use_first_nan);
      return float32_div(b, a, &env->fp_status);
  }
  float64 HELPER(mksadj_d)(CPUXtensaState *env, float64 v)
  {
 -    set_use_first_nan(true, &env->fp_status);
 +    xtensa_use_first_nan(env, true);
      return float64_sqrt(v, &env->fp_status);
  }
  float32 HELPER(mksadj_s)(CPUXtensaState *env, float32 v)
  {
 -    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
 +    xtensa_use_first_nan(env, env->config->use_first_nan);
      return float32_sqrt(v, &env->fp_status);
  }
 --
 .34.1

-[PULL 14/31] target/xtensa: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the NaN propagation rule explicitly in xtensa_use_first_nan().
-(When we convert the softfloat pickNaNMulAdd routine to also
-select a NaN propagation rule at runtime, we will be able to
-remove the use_first_nan flag because the propagation rules
-will handle everything.)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-15-peter.maydell@linaro.org
----
- target/xtensa/fpu_helper.c     |  2 ++
- fpu/softfloat-specialize.c.inc | 12 +-----------
-files changed, 3 insertions(+), 11 deletions(-)
-diff --git a/target/xtensa/fpu_helper.c b/target/xtensa/fpu_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/xtensa/fpu_helper.c
-+++ b/target/xtensa/fpu_helper.c
-@@ -XXX,XX +XXX,XX @@ static const struct {
- void xtensa_use_first_nan(CPUXtensaState *env, bool use_first)
- {
-     set_use_first_nan(use_first, &env->fp_status);
-+    set_float_2nan_prop_rule(use_first ? float_2nan_prop_ab : float_2nan_prop_ba,
-+                             &env->fp_status);
- }
- void HELPER(wur_fpu2k_fcr)(CPUXtensaState *env, uint32_t v)
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
-     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
-     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
--    || defined(TARGET_SPARC)
-+    || defined(TARGET_SPARC) || defined(TARGET_XTENSA)
-         g_assert_not_reached();
--#elif defined(TARGET_XTENSA)
--        /*
--         * Xtensa has two NaN propagation modes.
--         * Which one is active is controlled by float_status::use_first_nan.
--         */
--        if (status->use_first_nan) {
--            rule = float_2nan_prop_ab;
--        } else {
--            rule = float_2nan_prop_ba;
--        }
- #else
-         rule = float_2nan_prop_x87;
- #endif
---
-.34.1
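
As a rough illustration of what the two rules selected in xtensa_use_first_nan() mean, the sketch below models them on plain booleans. TwoNaNRule, rule_for() and pick_nan() are hypothetical stand-ins, not the softfloat API, and the behaviour shown for the one-NaN case is an assumption about how the "ab" and "ba" rules are intended to work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for float_2nan_prop_ab / float_2nan_prop_ba. */
typedef enum { PROP_AB, PROP_BA } TwoNaNRule;

/* Mirrors the ternary in the patched xtensa_use_first_nan(). */
static TwoNaNRule rule_for(bool use_first)
{
    return use_first ? PROP_AB : PROP_BA;
}

/*
 * Return 0 to propagate operand a, 1 to propagate operand b.
 * At least one operand must be a NaN.
 */
static int pick_nan(TwoNaNRule rule, bool a_is_nan, bool b_is_nan)
{
    assert(a_is_nan || b_is_nan);
    if (rule == PROP_AB) {
        return a_is_nan ? 0 : 1;   /* prefer a, fall back to b */
    }
    return b_is_nan ? 1 : 0;       /* prefer b, fall back to a */
}
```

When both inputs are NaNs, pick_nan(rule_for(true), true, true) selects the first operand and pick_nan(rule_for(false), true, true) selects the second, matching the comment added to target/xtensa/cpu.h.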

-[PULL 15/31] target/i386: Set 2-NaN propagation rule explicitly
+Deleted patch
-Set the NaN propagation rule explicitly for the float_status words
-used in the x86 target.
-This is a no-behaviour-change commit, so we retain the existing
-behaviour of using the x87-style "prefer QNaN over SNaN, then prefer
-the NaN with the larger significand" for MMX and SSE.  This is
-however not the documented hardware behaviour, so we leave a TODO
-note about what we should be doing instead.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-16-peter.maydell@linaro.org
----
- target/i386/cpu.h              |  3 +++
- target/i386/cpu.c              |  4 ++++
- target/i386/tcg/fpu_helper.c   | 40 ++++++++++++++++++++++++++++++++++
- fpu/softfloat-specialize.c.inc |  3 ++-
-files changed, 49 insertions(+), 1 deletion(-)
-diff --git a/target/i386/cpu.h b/target/i386/cpu.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/cpu.h
-+++ b/target/i386/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline bool cpu_vmx_maybe_enabled(CPUX86State *env)
- int get_pg_mode(CPUX86State *env);
- /* fpu_helper.c */
-+
-+/* Set all non-runtime-variable float_status fields to x86 handling */
-+void cpu_init_fp_statuses(CPUX86State *env);
- void update_fp_status(CPUX86State *env);
- void update_mxcsr_status(CPUX86State *env);
- void update_mxcsr_from_sse_status(CPUX86State *env);
-diff --git a/target/i386/cpu.c b/target/i386/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/cpu.c
-+++ b/target/i386/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void x86_cpu_reset_hold(Object *obj, ResetType type)
-     memset(env, 0, offsetof(CPUX86State, end_reset_fields));
-+    if (tcg_enabled()) {
-+        cpu_init_fp_statuses(env);
-+    }
-+
-     env->old_exception = -1;
-     /* init to reset state */
-diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/i386/tcg/fpu_helper.c
-+++ b/target/i386/tcg/fpu_helper.c
-@@ -XXX,XX +XXX,XX @@ static void fpu_set_exception(CPUX86State *env, int mask)
-     }
- }
-+void cpu_init_fp_statuses(CPUX86State *env)
-+{
-+    /*
-+     * Initialise the non-runtime-varying fields of the various
-+     * float_status words to x86 behaviour. This must be called at
-+     * CPU reset because the float_status words are in the
-+     * "zeroed on reset" portion of the CPU state struct.
-+     * Fields in float_status that vary under guest control are set
-+     * via the codepath for setting that register, eg cpu_set_fpuc().
-+     */
-+    /*
-+     * Use x87 NaN propagation rules:
-+     * SNaN + QNaN => return the QNaN
-+     * two SNaNs => return the one with the larger significand, silenced
-+     * two QNaNs => return the one with the larger significand
-+     * SNaN and a non-NaN => return the SNaN, silenced
-+     * QNaN and a non-NaN => return the QNaN
-+     *
-+     * If we get down to comparing significands and they are the same,
-+     * return the NaN with the positive sign bit (if any).
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
-+    /*
-+     * TODO: These are incorrect: the x86 Software Developer's Manual vol 1
-+     * section 4.8.3.5 "Operating on SNaNs and QNaNs" says that the
-+     * "larger significand" behaviour is only used for x87 FPU operations.
-+     * For SSE the required behaviour is to always return the first NaN,
-+     * which is float_2nan_prop_ab.
-+     *
-+     * mmx_status is used only for the AMD 3DNow! instructions, which
-+     * are documented in the "3DNow! Technology Manual" as not supporting
-+     * NaNs or infinities as inputs. The result of passing two NaNs is
-+     * documented as "undefined", so we can do what we choose.
-+     * (Strictly there is some behaviour we don't implement correctly
-+     * for these "unsupported" NaN and Inf values, like "NaN * 0 == 0".)
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->mmx_status);
-+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->sse_status);
-+}
-+
- static inline uint8_t save_exception_flags(CPUX86State *env)
- {
-     uint8_t old_flags = get_float_exception_flags(&env->fp_status);
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
-     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
-     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
--    || defined(TARGET_SPARC) || defined(TARGET_XTENSA)
-+    || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
-+    || defined(TARGET_I386)
-         g_assert_not_reached();
- #else
-         rule = float_2nan_prop_x87;
---
-.34.1
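
The x87 rule listed in the cpu_init_fp_statuses() comment can be sketched as a function over operand classes. This is a simplified model under stated assumptions: NaNClass and pick_nan_x87() are hypothetical names, silencing of the result is not modelled, and the "positive sign bit" tie-break is assumed to be folded into a_is_larger_significand by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand classification. */
typedef enum { CLS_NOT_NAN, CLS_QNAN, CLS_SNAN } NaNClass;

/*
 * Return 0 to propagate operand a, 1 to propagate operand b.
 * At least one operand must be a NaN.
 */
static int pick_nan_x87(NaNClass a, NaNClass b, bool a_is_larger_significand)
{
    assert(a != CLS_NOT_NAN || b != CLS_NOT_NAN);

    if (a == CLS_SNAN) {
        if (b == CLS_SNAN) {
            /* two SNaNs: larger significand wins */
            return a_is_larger_significand ? 0 : 1;
        }
        /* SNaN + QNaN => the QNaN; SNaN + non-NaN => the SNaN */
        return b == CLS_QNAN ? 1 : 0;
    }
    if (a == CLS_QNAN) {
        if (b == CLS_QNAN) {
            /* two QNaNs: larger significand wins */
            return a_is_larger_significand ? 0 : 1;
        }
        /* QNaN + SNaN => the QNaN; QNaN + non-NaN => the QNaN */
        return 0;
    }
    /* a is not a NaN, so b must be */
    return 1;
}
```

By contrast, the float_2nan_prop_ab rule that the TODO note names for SSE would return the first NaN operand regardless of class or significand.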

-[PULL 16/31] target/alpha: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the NaN propagation rule explicitly for the float_status word
-used in this target.
-This is a no-behaviour-change commit, so we retain the existing
-behaviour of x87-style pick-largest-significand NaN propagation.
-This is however not the architecturally correct handling, so we leave
-a TODO note to that effect.
-We also leave a TODO note pointing out that all this code in the cpu
-initfn (including the existing setting up of env->flags and the FPCR)
-should be in a currently non-existent CPU reset function.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-17-peter.maydell@linaro.org
----
- target/alpha/cpu.c             | 11 +++++++++++
- fpu/softfloat-specialize.c.inc |  2 +-
-files changed, 12 insertions(+), 1 deletion(-)
-diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/alpha/cpu.c
-+++ b/target/alpha/cpu.c
-@@ -XXX,XX +XXX,XX @@
- #include "qemu/qemu-print.h"
- #include "cpu.h"
- #include "exec/exec-all.h"
-+#include "fpu/softfloat.h"
- static void alpha_cpu_set_pc(CPUState *cs, vaddr value)
-@@ -XXX,XX +XXX,XX @@ static void alpha_cpu_initfn(Object *obj)
- {
-     CPUAlphaState *env = cpu_env(CPU(obj));
-+    /* TODO all this should be done in reset, not init */
-+
-     env->lock_addr = -1;
-+
-+    /*
-+     * TODO: this is incorrect. The Alpha Architecture Handbook version 4
-+     * describes NaN propagation in section 4.7.10.4. We should prefer
-+     * the operand in Fb (whether it is a QNaN or an SNaN), then the
-+     * operand in Fa. That is float_2nan_prop_ba.
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
- #if defined(CONFIG_USER_ONLY)
-     env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
-     cpu_alpha_store_fpcr(env, (uint64_t)(FPCR_INVD | FPCR_DZED | FPCR_OVFD
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
-     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
-     || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
--    || defined(TARGET_I386)
-+    || defined(TARGET_I386) || defined(TARGET_ALPHA)
-         g_assert_not_reached();
- #else
-         rule = float_2nan_prop_x87;
---
-.34.1

-[PULL 17/31] target/microblaze: Move setting of float rounding mode to reset
+Deleted patch
-Although the floating point rounding mode for Microblaze is always
-nearest-even, we cannot set it just once in the CPU initfn.  This is
-because env->fp_status is in the part of the CPU state struct that is
-zeroed on reset.
-Move the call to set_float_rounding_mode() into the reset fn.
-(This had no guest-visible effects because it happens that the
-float_round_nearest_even enum value is 0, so when the struct was
-zeroed it didn't corrupt the setting.)
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-18-peter.maydell@linaro.org
----
- target/microblaze/cpu.c | 5 ++---
-file changed, 2 insertions(+), 3 deletions(-)
-diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/microblaze/cpu.c
-+++ b/target/microblaze/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void mb_cpu_reset_hold(Object *obj, ResetType type)
-     env->pc = cpu->cfg.base_vectors;
-+    set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
-+
- #if defined(CONFIG_USER_ONLY)
-     /* start in user mode with interrupts enabled.  */
-     mb_cpu_write_msr(env, MSR_EE | MSR_IE | MSR_VM | MSR_UM);
-@@ -XXX,XX +XXX,XX @@ static void mb_cpu_realizefn(DeviceState *dev, Error **errp)
- static void mb_cpu_initfn(Object *obj)
- {
-     MicroBlazeCPU *cpu = MICROBLAZE_CPU(obj);
--    CPUMBState *env = &cpu->env;
-     gdb_register_coprocessor(CPU(cpu), mb_cpu_gdb_read_stack_protect,
-                              mb_cpu_gdb_write_stack_protect,
-                              gdb_find_static_feature("microblaze-stack-protect.xml"),
-);
--    set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
--
- #ifndef CONFIG_USER_ONLY
-     /* Inbound IRQ and FIR lines */
-     qdev_init_gpio_in(DEVICE(cpu), microblaze_cpu_set_irq, 2);
---
-.34.1
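
The hazard described in the commit message above can be reproduced with a toy state struct. This is a sketch only: ToyCPUState, toy_init() and toy_reset() are hypothetical, and the rounding-mode value is deliberately nonzero so the loss is visible (the real float_round_nearest_even is 0, which is why the original code happened to work).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical nonzero setting, chosen so that zeroing is observable. */
enum { TOY_ROUND_MODE = 3 };

typedef struct {
    int rounding_mode;      /* inside the zeroed-on-reset region */
    int end_reset_fields;   /* marker: everything before it is zeroed */
    int config_value;       /* after the marker, survives reset */
} ToyCPUState;

static void toy_init(ToyCPUState *env)
{
    assert(env != NULL);
    env->end_reset_fields = 0;
    env->config_value = 42;
    env->rounding_mode = TOY_ROUND_MODE;   /* set once at init only */
}

static void toy_reset(ToyCPUState *env, bool reapply_in_reset)
{
    assert(env != NULL);
    /* Same pattern as memset(env, 0, offsetof(..., end_reset_fields)). */
    memset(env, 0, offsetof(ToyCPUState, end_reset_fields));
    if (reapply_in_reset) {
        env->rounding_mode = TOY_ROUND_MODE;   /* the patched behaviour */
    }
}
```

Calling toy_reset(&s, false) after toy_init(&s) leaves rounding_mode at 0, while toy_reset(&s, true) restores it, which is the reason the patch moves the setter from the initfn into the reset function.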

-[PULL 18/31] target/microblaze: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the NaN propagation rule explicitly for the float_status word
-used in the microblaze target.
-This is probably not the architecturally correct behaviour,
-but since this is a no-behaviour-change patch, we leave a
-TODO note to that effect.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-19-peter.maydell@linaro.org
----
- target/microblaze/cpu.c        | 5 +++++
- fpu/softfloat-specialize.c.inc | 3 ++-
-files changed, 7 insertions(+), 1 deletion(-)
-diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/microblaze/cpu.c
-+++ b/target/microblaze/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void mb_cpu_reset_hold(Object *obj, ResetType type)
-     env->pc = cpu->cfg.base_vectors;
-     set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
-+    /*
-+     * TODO: this is probably not the correct NaN propagation rule for
-+     * this architecture.
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
- #if defined(CONFIG_USER_ONLY)
-     /* start in user mode with interrupts enabled.  */
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
-     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
-     || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
--    || defined(TARGET_I386) || defined(TARGET_ALPHA)
-+    || defined(TARGET_I386) || defined(TARGET_ALPHA) \
-+    || defined(TARGET_MICROBLAZE)
-         g_assert_not_reached();
- #else
-         rule = float_2nan_prop_x87;
---
-.34.1

-[PULL 19/31] target/openrisc: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the NaN propagation rule explicitly for the float_status word
-used in the openrisc target.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-20-peter.maydell@linaro.org
----
- target/openrisc/cpu.c          | 6 ++++++
- fpu/softfloat-specialize.c.inc | 2 +-
-files changed, 7 insertions(+), 1 deletion(-)
-diff --git a/target/openrisc/cpu.c b/target/openrisc/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/openrisc/cpu.c
-+++ b/target/openrisc/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void openrisc_cpu_reset_hold(Object *obj, ResetType type)
-     set_float_detect_tininess(float_tininess_before_rounding,
-                               &cpu->env.fp_status);
-+    /*
-+     * TODO: this is probably not the correct NaN propagation rule for
-+     * this architecture.
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_x87, &cpu->env.fp_status);
-+
- #ifndef CONFIG_USER_ONLY
-     cpu->env.picmr = 0x00000000;
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
-     || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
-     || defined(TARGET_I386) || defined(TARGET_ALPHA) \
--    || defined(TARGET_MICROBLAZE)
-+    || defined(TARGET_MICROBLAZE) || defined(TARGET_OPENRISC)
-         g_assert_not_reached();
- #else
-         rule = float_2nan_prop_x87;
---
-.34.1

-[PULL 20/31] target/rx: Explicitly set 2-NaN propagation rule
+Deleted patch
-Set the NaN propagation rule explicitly for the float_status word
-used in the rx target.
-This is not the architecturally correct behaviour, but since this is a
-no-behaviour-change patch, we leave a TODO note to that effect.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-21-peter.maydell@linaro.org
----
- target/rx/cpu.c                | 7 +++++++
- fpu/softfloat-specialize.c.inc | 3 ++-
-files changed, 9 insertions(+), 1 deletion(-)
-diff --git a/target/rx/cpu.c b/target/rx/cpu.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/rx/cpu.c
-+++ b/target/rx/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void rx_cpu_reset_hold(Object *obj, ResetType type)
-     env->fpsw = 0;
-     set_flush_to_zero(1, &env->fp_status);
-     set_flush_inputs_to_zero(1, &env->fp_status);
-+    /*
-+     * TODO: this is not the correct NaN propagation rule for this
-+     * architecture. The "RX Family User's Manual: Software" table 1.6
-+     * defines the propagation rules as "prefer SNaN over QNaN;
-+     * then prefer dest over source", which is float_2nan_prop_s_ab.
-+     */
-+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
- }
- static ObjectClass *rx_cpu_class_by_name(const char *cpu_model)
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
-     || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
-     || defined(TARGET_I386) || defined(TARGET_ALPHA) \
--    || defined(TARGET_MICROBLAZE) || defined(TARGET_OPENRISC)
-+    || defined(TARGET_MICROBLAZE) || defined(TARGET_OPENRISC) \
-+    || defined(TARGET_RX)
-         g_assert_not_reached();
- #else
-         rule = float_2nan_prop_x87;
---
-.34.1

-[PULL 21/31] softfloat: Remove fallback rule from pickNaN()
+Deleted patch
-Now that all targets have been converted to explicitly set a NaN
-propagation rule, we can remove the set of target ifdefs (which now
-list every target) and clean up the references to fallback behaviour
-for float_2nan_prop_none.
-The "default" case in the switch will catch any remaining places
-where status->float_2nan_prop_rule was not set by the target.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241025141254.2141506-22-peter.maydell@linaro.org
----
- include/fpu/softfloat-types.h  | 10 +++-------
- fpu/softfloat-specialize.c.inc | 23 +++--------------------
-files changed, 6 insertions(+), 27 deletions(-)
-diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/fpu/softfloat-types.h
-+++ b/include/fpu/softfloat-types.h
-@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
-  * If default_nan_mode is enabled then it is valid not to set a
-  * NaN propagation rule, because the softfloat code guarantees
-  * not to try to pick a NaN to propagate in default NaN mode.
-- *
-- * For transition, currently the 'none' rule will cause us to
-- * fall back to picking the propagation rule based on the existing
-- * ifdef ladder. When all targets are converted it will be an error
-- * not to set the rule in float_status unless in default_nan_mode,
-- * and we will assert if we need to handle an input NaN and no
-- * rule was selected.
-+ * When not in default-NaN mode, it is an error for the target
-+ * not to set the rule in float_status, and we will assert if
-+ * we need to handle an input NaN and no rule was selected.
-  */
- typedef enum __attribute__((__packed__)) {
-     /* No propagation rule specified */
-diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat-specialize.c.inc
-+++ b/fpu/softfloat-specialize.c.inc
-@@ -XXX,XX +XXX,XX @@ bool float32_is_signaling_nan(float32 a_, float_status *status)
- static int pickNaN(FloatClass a_cls, FloatClass b_cls,
-                    bool aIsLargerSignificand, float_status *status)
- {
--    Float2NaNPropRule rule = status->float_2nan_prop_rule;
--
-     /*
-      * We guarantee not to require the target to tell us how to
-      * pick a NaN if we're always returning the default NaN.
-+     * But if we're not in default-NaN mode then the target must
-+     * specify via set_float_2nan_prop_rule().
-      */
-     assert(!status->default_nan_mode);
--    if (rule == float_2nan_prop_none) {
--        /* target didn't set the rule: fall back to old ifdef choices */
--#if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
--    || defined(TARGET_RISCV) || defined(TARGET_SH4) \
--    || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
--    || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
--    || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
--    || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
--    || defined(TARGET_I386) || defined(TARGET_ALPHA) \
--    || defined(TARGET_MICROBLAZE) || defined(TARGET_OPENRISC) \
--    || defined(TARGET_RX)
--        g_assert_not_reached();
--#else
--        rule = float_2nan_prop_x87;
--#endif
--    }
--
--    switch (rule) {
-+    switch (status->float_2nan_prop_rule) {
-     case float_2nan_prop_s_ab:
-         if (is_snan(a_cls)) {
-             return 0;
---
-.34.1

-[PULL 22/31] Revert "target/arm: Fix usage of MMU indexes when EL3 is AArch32"
+Deleted patch
-This reverts commit 4c2c0474693229c1f533239bb983495c5427784d.
-This commit tried to fix a problem with our usage of MMU indexes when
-EL3 is AArch32, using what it described as a "more complicated
-approach" where we share the same MMU index values for Secure PL1&0
-and NonSecure PL1&0. In theory this should work, but the change
-didn't account for (at least) two things:
-(1) The design change means we need to flush the TLBs at any point
-where the CPU state flips from one to the other.  We already flush
-the TLB when SCR.NS is changed, but we don't flush the TLB when we
-take an exception from NS PL1&0 into Mon or when we return from Mon
-to NS PL1&0, and the commit didn't add any code to do that.
-(2) The ATS12NS* address translate instructions allow Mon code (which
-is Secure) to do a stage 1+2 page table walk for NS.  I thought this
-was OK because do_ats_write() does a page table walk which doesn't
-use the TLBs, so because it can pass both the MMU index and also an
-ARMSecuritySpace argument we can tell the table walk that we want NS
-stage1+2, not S.  But that means that all the code within the ptw
-that needs to find e.g.  the regime EL cannot do so only with an
-mmu_idx -- all these functions like regime_sctlr(), regime_el(), etc
-would need to pass both an mmu_idx and the security_space, so they
-can tell whether this is a translation regime controlled by EL1 or
-EL3 (and so whether to look at SCTLR.S or SCTLR.NS, etc).
-In particular, because regime_el() wasn't updated to look at the
-ARMSecuritySpace it would return 1 even when the CPU was in Monitor
-mode (and the controlling EL is 3).  This meant that page table walks
-in Monitor mode would look at the wrong SCTLR, TCR, etc and would
-generally fault when they should not.
-Rather than trying to make the complicated changes needed to rescue
-the design of 4c2c04746932, we revert it in order to instead take the
-route that that commit describes as "the most straightforward" fix,
-where we add new MMU indexes EL30_0, EL30_3, EL30_3_PAN to correspond
-to "Secure PL1&0 at PL0", "Secure PL1&0 at PL1", and "Secure PL1&0 at
-PL1 with PAN".
-This revert will re-expose the "spurious alignment faults in
-Secure PL0" issue #2326; we'll fix it again in the next commit.
-Cc: qemu-stable@nongnu.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Tested-by: Thomas Huth <thuth@redhat.com>
-Message-id: 20241101142845.1712482-2-peter.maydell@linaro.org
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/cpu.h               | 31 +++++++++++++------------------
- target/arm/internals.h         | 27 ++++-----------------------
- target/arm/tcg/translate.h     |  2 --
- target/arm/helper.c            | 34 +++++++++++-----------------------
- target/arm/ptw.c               |  6 +-----
- target/arm/tcg/hflags.c        |  4 ----
- target/arm/tcg/translate-a64.c |  2 +-
- target/arm/tcg/translate.c     |  9 ++++-----
-8 files changed, 34 insertions(+), 81 deletions(-)
-diff --git a/target/arm/cpu.h b/target/arm/cpu.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.h
-+++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
-  *  + NonSecure PL1 & 0 stage 1
-  *  + NonSecure PL1 & 0 stage 2
-  *  + NonSecure PL2
-- *  + Secure PL1 & 0
-+ *  + Secure PL0
-+ *  + Secure PL1
-  * (reminder: for 32 bit EL3, Secure PL1 is *EL3*, not EL1.)
-  *
-  * For QEMU, an mmu_idx is not quite the same as a translation regime because:
-@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
-  *     The only use of stage 2 translations is either as part of an s1+2
-  *     lookup or when loading the descriptors during a stage 1 page table walk,
-  *     and in both those cases we don't use the TLB.
-- *  4. we want to be able to use the TLB for accesses done as part of a
-+ *  4. we can also safely fold together the "32 bit EL3" and "64 bit EL3"
-+ *     translation regimes, because they map reasonably well to each other
-+ *     and they can't both be active at the same time.
-+ *  5. we want to be able to use the TLB for accesses done as part of a
-  *     stage1 page table walk, rather than having to walk the stage2 page
-  *     table over and over.
-- *  5. we need separate EL1/EL2 mmu_idx for handling the Privileged Access
-+ *  6. we need separate EL1/EL2 mmu_idx for handling the Privileged Access
-  *     Never (PAN) bit within PSTATE.
-- *  6. we fold together most secure and non-secure regimes for A-profile,
-+ *  7. we fold together most secure and non-secure regimes for A-profile,
-  *     because there are no banked system registers for aarch64, so the
-  *     process of switching between secure and non-secure is
-  *     already heavyweight.
-- *  7. we cannot fold together Stage 2 Secure and Stage 2 NonSecure,
-+ *  8. we cannot fold together Stage 2 Secure and Stage 2 NonSecure,
-  *     because both are in use simultaneously for Secure EL2.
-  *
-  * This gives us the following list of cases:
-  *
-- * EL0 EL1&0 stage 1+2 (or AArch32 PL0 PL1&0 stage 1+2)
-- * EL1 EL1&0 stage 1+2 (or AArch32 PL1 PL1&0 stage 1+2)
-- * EL1 EL1&0 stage 1+2 +PAN (or AArch32 PL1 PL1&0 stage 1+2 +PAN)
-+ * EL0 EL1&0 stage 1+2 (aka NS PL0)
-+ * EL1 EL1&0 stage 1+2 (aka NS PL1)
-+ * EL1 EL1&0 stage 1+2 +PAN
-  * EL0 EL2&0
-  * EL2 EL2&0
-  * EL2 EL2&0 +PAN
-  * EL2 (aka NS PL2)
-- * EL3 (not used when EL3 is AArch32)
-+ * EL3 (aka S PL1)
-  * Stage2 Secure
-  * Stage2 NonSecure
-  * plus one TLB per Physical address space: S, NS, Realm, Root
-  *
-  * for a total of 14 different mmu_idx.
-  *
-- * Note that when EL3 is AArch32, the usage is potentially confusing
-- * because the MMU indexes are named for their AArch64 use, so code
-- * using the ARMMMUIdx_E10_1 might be at EL3, not EL1. This is because
-- * Secure PL1 is always at EL3.
-- *
-  * R profile CPUs have an MPU, but can use the same set of MMU indexes
-  * as A profile. They only need to distinguish EL0 and EL1 (and
-  * EL2 for cores like the Cortex-R52).
-@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, NS, 10, 1)
-  * This requires an SME trap from AArch32 mode when using NEON.
-  */
- FIELD(TBFLAG_A32, SME_TRAP_NONSTREAMING, 11, 1)
--/*
-- * Indicates whether we are in the Secure PL1&0 translation regime
-- */
--FIELD(TBFLAG_A32, S_PL1_0, 12, 1)
- /*
-  * Bit usage when in AArch32 state, for M-profile only.
-diff --git a/target/arm/internals.h b/target/arm/internals.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/internals.h
-+++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ FIELD(CNTHCTL, CNTPMASK, 19, 1)
- #define M_FAKE_FSR_NSC_EXEC 0xf /* NS executing in S&NSC memory */
- #define M_FAKE_FSR_SFAULT 0xe /* SecureFault INVTRAN, INVEP or AUVIOL */
--/**
-- * arm_aa32_secure_pl1_0(): Return true if in Secure PL1&0 regime
-- *
-- * Return true if the CPU is in the Secure PL1&0 translation regime.
-- * This requires that EL3 exists and is AArch32 and we are currently
-- * Secure. If this is the case then the ARMMMUIdx_E10* apply and
-- * mean we are in EL3, not EL1.
-- */
--static inline bool arm_aa32_secure_pl1_0(CPUARMState *env)
--{
--    return arm_feature(env, ARM_FEATURE_EL3) &&
--        !arm_el_is_aa64(env, 3) && arm_is_secure(env);
--}
--
- /**
-  * raise_exception: Raise the specified exception.
-  * Raise a guest exception with the specified value, syndrome register
-@@ -XXX,XX +XXX,XX @@ static inline ARMMMUIdx core_to_aa64_mmu_idx(int mmu_idx)
-     return mmu_idx | ARM_MMU_IDX_A;
- }
--/**
-- * Return the exception level we're running at if our current MMU index
-- * is @mmu_idx. @s_pl1_0 should be true if this is the AArch32
-- * Secure PL1&0 translation regime.
-- */
--int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx, bool s_pl1_0);
-+int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx);
- /* Return the MMU index for a v7M CPU in the specified security state */
- ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate);
-@@ -XXX,XX +XXX,XX @@ static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
-         return 3;
-     case ARMMMUIdx_E10_0:
-     case ARMMMUIdx_Stage1_E0:
--    case ARMMMUIdx_E10_1:
--    case ARMMMUIdx_E10_1_PAN:
-+        return arm_el_is_aa64(env, 3) || !arm_is_secure_below_el3(env) ? 1 : 3;
-     case ARMMMUIdx_Stage1_E1:
-     case ARMMMUIdx_Stage1_E1_PAN:
--        return arm_el_is_aa64(env, 3) || !arm_is_secure_below_el3(env) ? 1 : 3;
-+    case ARMMMUIdx_E10_1:
-+    case ARMMMUIdx_E10_1_PAN:
-     case ARMMMUIdx_MPrivNegPri:
-     case ARMMMUIdx_MUserNegPri:
-     case ARMMMUIdx_MPriv:
-diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/tcg/translate.h
-+++ b/target/arm/tcg/translate.h
-@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
-     uint8_t gm_blocksize;
-     /* True if the current insn_start has been updated. */
-     bool insn_start_updated;
--    /* True if this is the AArch32 Secure PL1&0 translation regime */
--    bool s_pl1_0;
-     /* Bottom two bits of XScale c15_cpar coprocessor access control reg */
-     int c15_cpar;
-     /* Offset from VNCR_EL2 when FEAT_NV2 redirects this reg to memory */
-diff --git a/target/arm/helper.c b/target/arm/helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
-+++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
-          */
-         format64 = arm_s1_regime_using_lpae_format(env, mmu_idx);
--        if (arm_feature(env, ARM_FEATURE_EL2) && !arm_aa32_secure_pl1_0(env)) {
-+        if (arm_feature(env, ARM_FEATURE_EL2)) {
-             if (mmu_idx == ARMMMUIdx_E10_0 ||
-                 mmu_idx == ARMMMUIdx_E10_1 ||
-                 mmu_idx == ARMMMUIdx_E10_1_PAN) {
-@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
-     case 0:
-         /* stage 1 current state PL1: ATS1CPR, ATS1CPW, ATS1CPRP, ATS1CPWP */
-         switch (el) {
-+        case 3:
-+            mmu_idx = ARMMMUIdx_E3;
-+            break;
-         case 2:
-             g_assert(ss != ARMSS_Secure);  /* ARMv8.4-SecEL2 is 64-bit only */
-             /* fall through */
-         case 1:
--        case 3:
-             if (ri->crm == 9 && arm_pan_enabled(env)) {
-                 mmu_idx = ARMMMUIdx_Stage1_E1_PAN;
-             } else {
-@@ -XXX,XX +XXX,XX @@ void arm_cpu_do_interrupt(CPUState *cs)
- uint64_t arm_sctlr(CPUARMState *env, int el)
- {
--    if (arm_aa32_secure_pl1_0(env)) {
--        /* In Secure PL1&0 SCTLR_S is always controlling */
--        el = 3;
--    } else if (el == 0) {
--        /* Only EL0 needs to be adjusted for EL1&0 or EL2&0. */
-+    /* Only EL0 needs to be adjusted for EL1&0 or EL2&0. */
-+    if (el == 0) {
-         ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, 0);
-         el = mmu_idx == ARMMMUIdx_E20_0 ? 2 : 1;
-     }
-@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
-     return 0;
- }
--/*
-- * Return the exception level we're running at if this is our mmu_idx.
-- * s_pl1_0 should be true if this is the AArch32 Secure PL1&0 translation
-- * regime.
-- */
--int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx, bool s_pl1_0)
-+/* Return the exception level we're running at if this is our mmu_idx */
-+int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx)
- {
-     if (mmu_idx & ARM_MMU_IDX_M) {
-         return mmu_idx & ARM_MMU_IDX_M_PRIV;
-@@ -XXX,XX +XXX,XX @@ int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx, bool s_pl1_0)
-         return 0;
-     case ARMMMUIdx_E10_1:
-     case ARMMMUIdx_E10_1_PAN:
--        return s_pl1_0 ? 3 : 1;
-+        return 1;
-     case ARMMMUIdx_E2:
-     case ARMMMUIdx_E20_2:
-     case ARMMMUIdx_E20_2_PAN:
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
-             idx = ARMMMUIdx_E10_0;
-         }
-         break;
--    case 3:
--        /*
--         * AArch64 EL3 has its own translation regime; AArch32 EL3
--         * uses the Secure PL1&0 translation regime.
--         */
--        if (arm_el_is_aa64(env, 3)) {
--            return ARMMMUIdx_E3;
--        }
--        /* fall through */
-     case 1:
-         if (arm_pan_enabled(env)) {
-             idx = ARMMMUIdx_E10_1_PAN;
-@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
-             idx = ARMMMUIdx_E2;
-         }
-         break;
-+    case 3:
-+        return ARMMMUIdx_E3;
-     default:
-         g_assert_not_reached();
-     }
-diff --git a/target/arm/ptw.c b/target/arm/ptw.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/ptw.c
-+++ b/target/arm/ptw.c
-@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, vaddr address,
-     case ARMMMUIdx_Stage1_E1:
-     case ARMMMUIdx_Stage1_E1_PAN:
-     case ARMMMUIdx_E2:
--        if (arm_aa32_secure_pl1_0(env)) {
--            ss = ARMSS_Secure;
--        } else {
--            ss = arm_security_space_below_el3(env);
--        }
-+        ss = arm_security_space_below_el3(env);
-         break;
-     case ARMMMUIdx_Stage2:
-         /*
-diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/tcg/hflags.c
-+++ b/target/arm/tcg/hflags.c
-@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
-         DP_TBFLAG_A32(flags, SME_TRAP_NONSTREAMING, 1);
-     }
--    if (arm_aa32_secure_pl1_0(env)) {
--        DP_TBFLAG_A32(flags, S_PL1_0, 1);
--    }
--
-     return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
- }
-diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/tcg/translate-a64.c
-+++ b/target/arm/tcg/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
-     dc->tbii = EX_TBFLAG_A64(tb_flags, TBII);
-     dc->tbid = EX_TBFLAG_A64(tb_flags, TBID);
-     dc->tcma = EX_TBFLAG_A64(tb_flags, TCMA);
--    dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx, false);
-+    dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx);
- #if !defined(CONFIG_USER_ONLY)
-     dc->user = (dc->current_el == 0);
- #endif
-diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/tcg/translate.c
-+++ b/target/arm/tcg/translate.c
-@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
-     core_mmu_idx = EX_TBFLAG_ANY(tb_flags, MMUIDX);
-     dc->mmu_idx = core_to_arm_mmu_idx(env, core_mmu_idx);
-+    dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx);
-+#if !defined(CONFIG_USER_ONLY)
-+    dc->user = (dc->current_el == 0);
-+#endif
-     dc->fp_excp_el = EX_TBFLAG_ANY(tb_flags, FPEXC_EL);
-     dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
-     dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
-@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
-         }
-         dc->sme_trap_nonstreaming =
-             EX_TBFLAG_A32(tb_flags, SME_TRAP_NONSTREAMING);
--        dc->s_pl1_0 = EX_TBFLAG_A32(tb_flags, S_PL1_0);
-     }
--    dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx, dc->s_pl1_0);
--#if !defined(CONFIG_USER_ONLY)
--    dc->user = (dc->current_el == 0);
--#endif
-     dc->lse2 = false; /* applies only to aarch64 */
-     dc->cp_regs = cpu->cp_regs;
-     dc->features = env->features;
---
-2.34.1
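
Note on the revert above: the commit message argues that functions such as regime_el() cannot work from an mmu_idx alone while Secure PL1&0 and NonSecure PL1&0 share index values, and that dedicated indexes avoid the problem. The toy model below illustrates only that point. It is not QEMU code: ToyMMUIdx, toy_regime_el() and toy_idx_to_el() are hypothetical names, and the index list is a reduced version of the set named in the commit message (the PAN variants are omitted).

```c
#include <assert.h>

/* Hypothetical, reduced index set; the names only echo the commit message. */
typedef enum {
    TOY_E10_0,   /* NonSecure PL1&0 regime, executing at PL0 */
    TOY_E10_1,   /* NonSecure PL1&0 regime, executing at PL1 */
    TOY_E30_0,   /* Secure PL1&0 regime, executing at PL0 */
    TOY_E30_3    /* Secure PL1&0 regime, executing at PL1 (EL3 when EL3 is AArch32) */
} ToyMMUIdx;

/* EL whose control registers govern the translation regime. */
static int toy_regime_el(ToyMMUIdx idx)
{
    switch (idx) {
    case TOY_E10_0:
    case TOY_E10_1:
        return 1;
    case TOY_E30_0:
    case TOY_E30_3:
        return 3;
    }
    assert(0 && "unknown index");
    return -1;
}

/* EL the CPU is executing at when this index is in use. */
static int toy_idx_to_el(ToyMMUIdx idx)
{
    switch (idx) {
    case TOY_E10_0:
    case TOY_E30_0:
        return 0;
    case TOY_E10_1:
        return 1;
    case TOY_E30_3:
        return 3;
    }
    assert(0 && "unknown index");
    return -1;
}
```

Because every index belongs to exactly one regime in this model, both lookups are pure functions of the index and need no extra security-state argument. That is the property the shared-index design lost.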

-[PULL 24/31] target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)
+Deleted patch
-Our implementation of the indexed version of SVE SDOT/UDOT/USDOT got
-the calculation of the inner loop terminator wrong.  Although we
-correctly account for the element size when we calculate the
-terminator for the first iteration:
-   intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);
-we don't do that when we move it forward after the first inner loop
-completes.  The intention is that we process the vector in 128-bit
-segments, which for a 64-bit element size should mean (1, 2), (3, 4),
-(5, 6), etc.  This bug meant that we would iterate (1, 2), (3, 4, 5,
-6), (7, 8, 9, 10) etc and apply the wrong indexed element to some of
-the operations, and also index off the end of the vector.
-You don't see this bug if the vector length is small enough that we
-don't need to iterate the outer loop, i.e.  if it is only 128 bits,
-or if it is the 64-bit special case from AA32/AA64 AdvSIMD.  If the
-vector length is 256 bits then we calculate the right results for the
-elements in the vector but do index off the end of the vector. Vector
-lengths greater than 256 bits see wrong answers. The instructions
-that produce 32-bit results behave correctly.
-Fix the recalculation of 'segend' for subsequent iterations, and
-restore a version of the comment that was lost in the refactor of
-commit 7020ffd656a5 that explains why we only need to clamp segend to
-opr_sz_n for the first iteration, not the later ones.
-Cc: qemu-stable@nongnu.org
-Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2595
-Fixes: 7020ffd656a5 ("target/arm: Macroize helper_gvec_{s,u}dot_idx_{b,h}")
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20241101185544.2130972-1-peter.maydell@linaro.org
----
- target/arm/tcg/vec_helper.c | 9 ++++++++-
-1 file changed, 8 insertions(+), 1 deletion(-)
-diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/tcg/vec_helper.c
-+++ b/target/arm/tcg/vec_helper.c
-@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
- {                                                                         \
-     intptr_t i = 0, opr_sz = simd_oprsz(desc);                            \
-     intptr_t opr_sz_n = opr_sz / sizeof(TYPED);                           \
-+    /*                                                                    \
-+     * Special case: opr_sz == 8 from AA64/AA32 advsimd means the         \
-+     * first iteration might not be a full 16 byte segment. But           \
-+     * for vector lengths beyond that this must be SVE and we know        \
-+     * opr_sz is a multiple of 16, so we need not clamp segend            \
-+     * to opr_sz_n when we advance it at the end of the loop.             \
-+     */                                                                   \
-     intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);                  \
-     intptr_t index = simd_data(desc);                                     \
-     TYPED *d = vd, *a = va;                                               \
-@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
-                     n[i * 4 + 2] * m2 +                                   \
-                     n[i * 4 + 3] * m3);                                   \
-         } while (++i < segend);                                           \
--        segend = i + 4;                                                   \
-+        segend = i + (16 / sizeof(TYPED));                                \
-     } while (i < opr_sz_n);                                               \
-     clear_tail(d, opr_sz, simd_maxsz(desc));                              \
- }
---
-2.34.1
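
Note on the SDOT/UDOT/USDOT fix above: the effect of the segend recalculation can be reproduced with a small standalone model of the loop structure. This is a sketch, not the QEMU helper: assign_segments() and its parameters are hypothetical, elements are numbered from 0 rather than 1, and the function records only which 128-bit segment each element is attributed to, plus how many inner-loop iterations ran past the end of the vector.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the helper's nested loops.
 * elem_size: size of one result element in bytes (4 or 8)
 * n_elems:   number of result elements in the vector
 * step:      amount added to segend after each inner loop
 * seg_of:    out array of n_elems entries, segment number per element
 * Returns the number of iterations that indexed past the vector end.
 */
static int assign_segments(size_t elem_size, size_t n_elems, size_t step,
                           int *seg_of)
{
    size_t per_seg, segend, i = 0;
    int seg = 0, overrun = 0;

    assert(elem_size == 4 || elem_size == 8);
    assert(n_elems > 0);

    per_seg = 16 / elem_size;
    /* First iteration: clamp to the vector length, as the helper does. */
    segend = per_seg < n_elems ? per_seg : n_elems;
    do {
        do {
            if (i < n_elems) {
                seg_of[i] = seg;
            } else {
                overrun++;      /* the real helper would read off the end */
            }
        } while (++i < segend);
        segend = i + step;      /* buggy: step == 4; fixed: 16 / elem_size */
        seg++;
    } while (i < n_elems);
    return overrun;
}
```

With 64-bit elements and a 512-bit vector (8 elements), a step of 16 / 8 attributes the elements to segments 0,0,1,1,2,2,3,3 with no overrun. A step of 4 gives 0,0,1,1,1,1,2,2 and two out-of-range iterations. With 32-bit elements the two steps coincide, which is consistent with the commit message's remark that the 32-bit result forms behave correctly.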

The following changes since commit 11b8920ed2093848f79f93d106afe8a69a61a523:

Merge tag 'pull-request-2024-11-04' of https://gitlab.com/thuth/qemu into staging (2024-11-04 17:37:59 +0000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20241105

for you to fetch changes up to 374cdc8efe4a039510cca47e8399d54a1aeb4f2d:

target/arm: Enable FEAT_CMOW for -cpu max (2024-11-05 10:10:00 +0000)

----------------------------------------------------------------
target-arm queue:
 * Fix MMU indexes for AArch32 Secure PL1&0 in a less complex and buggy way
 * Fix SVE SDOT/UDOT/USDOT (4-way, indexed)
 * softfloat: set 2-operand NaN propagation rule at runtime
 * disas: Fix build against Capstone v6 (again)
 * hw/rtc/ds1338: Trace send and receive operations
 * hw/timer/imx_gpt: Convert DPRINTF to trace events
 * hw/watchdog/wdt_imx2: Remove redundant assignment
 * hw/sensor/tmp105: Convert printf() to trace event, add tracing for read/write access
 * hw/net/npcm_gmac: Change error log to trace event
 * target/arm: Enable FEAT_CMOW for -cpu max

----------------------------------------------------------------
Bernhard Beschow (4):
      hw/rtc/ds1338: Trace send and receive operations
      hw/timer/imx_gpt: Convert DPRINTF to trace events
      hw/watchdog/wdt_imx2: Remove redundant assignment
      hw/sensor/tmp105: Convert printf() to trace event, add tracing for read/write access

Gustavo Romero (1):
      target/arm: Enable FEAT_CMOW for -cpu max

Nabih Estefan (1):
      hw/net/npcm_gmac: Change error log to trace event

Peter Maydell (24):
      softfloat: Allow 2-operand NaN propagation rule to be set at runtime
      tests/fp: Explicitly set 2-NaN propagation rule
      target/arm: Explicitly set 2-NaN propagation rule
      target/mips: Explicitly set 2-NaN propagation rule
      target/loongarch: Explicitly set 2-NaN propagation rule
      target/hppa: Explicitly set 2-NaN propagation rule
      target/s390x: Explicitly set 2-NaN propagation rule
      target/ppc: Explicitly set 2-NaN propagation rule
      target/m68k: Explicitly set 2-NaN propagation rule
      target/m68k: Initialize float_status fields in gdb set/get functions
      target/sparc: Move cpu_put_fsr(env, 0) call to reset
      target/sparc: Explicitly set 2-NaN propagation rule
      target/xtensa: Factor out calls to set_use_first_nan()
      target/xtensa: Explicitly set 2-NaN propagation rule
      target/i386: Set 2-NaN propagation rule explicitly
      target/alpha: Explicitly set 2-NaN propagation rule
      target/microblaze: Move setting of float rounding mode to reset
      target/microblaze: Explicitly set 2-NaN propagation rule
      target/openrisc: Explicitly set 2-NaN propagation rule
      target/rx: Explicitly set 2-NaN propagation rule
      softfloat: Remove fallback rule from pickNaN()
      Revert "target/arm: Fix usage of MMU indexes when EL3 is AArch32"
      target/arm: Add new MMU indexes for AArch32 Secure PL1&0
      target/arm: Fix SVE SDOT/UDOT/USDOT (4-way, indexed)

Richard Henderson (1):
      disas: Fix build against Capstone v6 (again)

IEEE 754 does not define a fixed rule for which NaN to pick as the
result if both operands of a 2-operand operation are NaNs.  As a
result different architectures have ended up with different rules for
propagating NaNs.

QEMU currently hardcodes the NaN propagation logic into the binary
because pickNaN() has an ifdef ladder for different targets.  We want
to make the propagation rule instead be selectable at runtime,
because:
 * this will let us have multiple targets in one QEMU binary
 * the Arm FEAT_AFP architectural feature includes letting
   the guest select a NaN propagation rule at runtime
 * x86 specifies different propagation rules for x87 FPU ops
   and for SSE ops, and specifying the rule in the float_status
   would let us emulate this, instead of wrongly using the
   x87 rules everywhere

In this commit we add an enum for the propagation rule, the field in
float_status, and the corresponding getters and setters.  We change
pickNaN to honour this, but because all targets still leave this
field at its default 0 value, the fallback logic will pick the rule
type with the old ifdef ladder.

It's valid not to set a propagation rule if default_nan_mode is
enabled, because in that case there's no need to pick a NaN; all the
callers of pickNaN() catch this case and skip calling it.  So we can
already assert that we don't get into the "no rule defined" codepath
for our four targets which always set default_nan_mode: Hexagon,
RiscV, SH4 and Tricore, and for the one target which does not have FP
at all: avr.  These targets will not need to be updated to call
set_float_2nan_prop_rule().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-2-peter.maydell@linaro.org
---
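Note (not part of the commit message): a minimal standalone sketch of a runtime-selected 2-NaN rule, for readers who want to see the four simple rules side by side. It is an illustration only. ToyCls, ToyRule and toy_pick_nan() are hypothetical stand-ins for FloatClass, Float2NaNPropRule and pickNaN(), and the x87 rule and the transitional ifdef fallback are left out.

```c
#include <assert.h>

/* Hypothetical stand-in for the operand classification. */
typedef enum { TOY_NORMAL, TOY_QNAN, TOY_SNAN } ToyCls;

/* Mirrors the simple rules of Float2NaNPropRule (x87 omitted). */
typedef enum {
    TOY_RULE_NONE = 0,  /* no rule set */
    TOY_RULE_S_AB,      /* prefer SNaN over QNaN, then A over B */
    TOY_RULE_S_BA,      /* prefer SNaN over QNaN, then B over A */
    TOY_RULE_AB,        /* prefer A over B regardless of SNaN vs QNaN */
    TOY_RULE_BA         /* prefer B over A regardless of SNaN vs QNaN */
} ToyRule;

/*
 * Returns 0 to propagate operand A, 1 to propagate operand B.
 * At least one operand is assumed to be a NaN.
 */
static int toy_pick_nan(ToyRule rule, ToyCls a, ToyCls b)
{
    switch (rule) {
    case TOY_RULE_S_AB:
        if (a == TOY_SNAN) {
            return 0;
        } else if (b == TOY_SNAN) {
            return 1;
        }
        return a == TOY_QNAN ? 0 : 1;
    case TOY_RULE_S_BA:
        if (b == TOY_SNAN) {
            return 1;
        } else if (a == TOY_SNAN) {
            return 0;
        }
        return b == TOY_QNAN ? 1 : 0;
    case TOY_RULE_AB:
        return a != TOY_NORMAL ? 0 : 1;
    case TOY_RULE_BA:
        return b != TOY_NORMAL ? 1 : 0;
    default:
        /* A caller that can see NaN inputs must have selected a rule. */
        assert(0 && "no NaN propagation rule set");
        return -1;
    }
}
```

The rule is an ordinary runtime value here, so two status objects in the same binary can use different rules. That is the property the commit message is after for multi-target binaries and for the x87 versus SSE split.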
 include/fpu/softfloat-helpers.h |  11 ++
 include/fpu/softfloat-types.h   |  42 ++++++
 fpu/softfloat-specialize.c.inc  | 229 ++++++++++++++++++--------------
 3 files changed, 185 insertions(+), 97 deletions(-)

diff --git a/include/fpu/softfloat-helpers.h b/include/fpu/softfloat-helpers.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-helpers.h
+++ b/include/fpu/softfloat-helpers.h
@@ -XXX,XX +XXX,XX @@ static inline void set_floatx80_rounding_precision(FloatX80RoundPrec val,
     status->floatx80_rounding_precision = val;
 }
 
+static inline void set_float_2nan_prop_rule(Float2NaNPropRule rule,
+                                            float_status *status)
+{
+    status->float_2nan_prop_rule = rule;
+}
+
 static inline void set_flush_to_zero(bool val, float_status *status)
 {
     status->flush_to_zero = val;
@@ -XXX,XX +XXX,XX @@ get_floatx80_rounding_precision(float_status *status)
     return status->floatx80_rounding_precision;
 }
 
+static inline Float2NaNPropRule get_float_2nan_prop_rule(float_status *status)
+{
+    return status->float_2nan_prop_rule;
+}
+
 static inline bool get_flush_to_zero(float_status *status)
 {
     return status->flush_to_zero;
diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
     floatx80_precision_s,
 } FloatX80RoundPrec;
 
+/*
+ * 2-input NaN propagation rule. Individual architectures have
+ * different rules for which input NaN is propagated to the output
+ * when there is more than one NaN on the input.
+ *
+ * If default_nan_mode is enabled then it is valid not to set a
+ * NaN propagation rule, because the softfloat code guarantees
+ * not to try to pick a NaN to propagate in default NaN mode.
+ *
+ * For transition, currently the 'none' rule will cause us to
+ * fall back to picking the propagation rule based on the existing
+ * ifdef ladder. When all targets are converted it will be an error
+ * not to set the rule in float_status unless in default_nan_mode,
+ * and we will assert if we need to handle an input NaN and no
+ * rule was selected.
+ */
+typedef enum __attribute__((__packed__)) {
+    /* No propagation rule specified */
+    float_2nan_prop_none = 0,
+    /* Prefer SNaN over QNaN, then operand A over B */
+    float_2nan_prop_s_ab,
+    /* Prefer SNaN over QNaN, then operand B over A */
+    float_2nan_prop_s_ba,
+    /* Prefer A over B regardless of SNaN vs QNaN */
+    float_2nan_prop_ab,
+    /* Prefer B over A regardless of SNaN vs QNaN */
+    float_2nan_prop_ba,
+    /*
+     * This implements x87 NaN propagation rules:
+     * SNaN + QNaN => return the QNaN
+     * two SNaNs => return the one with the larger significand, silenced
+     * two QNaNs => return the one with the larger significand
+     * SNaN and a non-NaN => return the SNaN, silenced
+     * QNaN and a non-NaN => return the QNaN
+     *
+     * If we get down to comparing significands and they are the same,
+     * return the NaN with the positive sign bit (if any).
+     */
+    float_2nan_prop_x87,
+} Float2NaNPropRule;
+
 /*
  * Floating Point Status. Individual architectures may maintain
  * several versions of float_status for different functions. The
@@ -XXX,XX +XXX,XX @@ typedef struct float_status {
     uint16_t float_exception_flags;
     FloatRoundMode float_rounding_mode;
     FloatX80RoundPrec floatx80_rounding_precision;
+    Float2NaNPropRule float_2nan_prop_rule;
     bool tininess_before_rounding;
     /* should denormalised results go to zero and set the inexact flag? */
     bool flush_to_zero;
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ bool float32_is_signaling_nan(float32 a_, float_status *status)
 static int pickNaN(FloatClass a_cls, FloatClass b_cls,
                    bool aIsLargerSignificand, float_status *status)
 {
-#if defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_HPPA) || \
-    defined(TARGET_LOONGARCH64) || defined(TARGET_S390X)
-    /* ARM mandated NaN propagation rules (see FPProcessNaNs()), take
-     * the first of:
-     *  1. A if it is signaling
-     *  2. B if it is signaling
-     *  3. A (quiet)
-     *  4. B (quiet)
-     * A signaling NaN is always quietened before returning it.
-     */
-    /* According to MIPS specifications, if one of the two operands is
-     * a sNaN, a new qNaN has to be generated. This is done in
-     * floatXX_silence_nan(). For qNaN inputs the specifications
-     * says: "When possible, this QNaN result is one of the operand QNaN
-     * values." In practice it seems that most implementations choose
-     * the first operand if both operands are qNaN. In short this gives
-     * the following rules:
-     *  1. A if it is signaling
-     *  2. B if it is signaling
-     *  3. A (quiet)
-     *  4. B (quiet)
-     * A signaling NaN is always silenced before returning it.
-     */
-    if (is_snan(a_cls)) {
-        return 0;
-    } else if (is_snan(b_cls)) {
-        return 1;
-    } else if (is_qnan(a_cls)) {
-        return 0;
-    } else {
-        return 1;
-    }
-#elif defined(TARGET_PPC) || defined(TARGET_M68K)
-    /* PowerPC propagation rules:
-     *  1. A if it sNaN or qNaN
-     *  2. B if it sNaN or qNaN
-     * A signaling NaN is always silenced before returning it.
-     */
-    /* M68000 FAMILY PROGRAMMER'S REFERENCE MANUAL
-     * 3.4 FLOATING-POINT INSTRUCTION DETAILS
-     * If either operand, but not both operands, of an operation is a
-     * nonsignaling NaN, then that NaN is returned as the result. If both
-     * operands are nonsignaling NaNs, then the destination operand
-     * nonsignaling NaN is returned as the result.
-     * If either operand to an operation is a signaling NaN (SNaN), then the
-     * SNaN bit is set in the FPSR EXC byte. If the SNaN exception enable bit
-     * is set in the FPCR ENABLE byte, then the exception is taken and the
-     * destination is not modified. If the SNaN exception enable bit is not
-     * set, setting the SNaN bit in the operand to a one converts the SNaN to
-     * a nonsignaling NaN. The operation then continues as described in the
-     * preceding paragraph for nonsignaling NaNs.
-     */
-    if (is_nan(a_cls)) {
-        return 0;
-    } else {
-        return 1;
-    }
-#elif defined(TARGET_SPARC)
-    /* Prefer SNaN over QNaN, order B then A. */
-    if (is_snan(b_cls)) {
-        return 1;
-    } else if (is_snan(a_cls)) {
-        return 0;
-    } else if (is_qnan(b_cls)) {
-        return 1;
-    } else {
-        return 0;
-    }
-#elif defined(TARGET_XTENSA)
+    Float2NaNPropRule rule = status->float_2nan_prop_rule;
+
     /*
-     * Xtensa has two NaN propagation modes.
-     * Which one is active is controlled by float_status::use_first_nan.
+     * We guarantee not to require the target to tell us how to
+     * pick a NaN if we're always returning the default NaN.
      */
-    if (status->use_first_nan) {
+    assert(!status->default_nan_mode);
+
+    if (rule == float_2nan_prop_none) {
+        /* target didn't set the rule: fall back to old ifdef choices */
+#if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
+    || defined(TARGET_RISCV) || defined(TARGET_SH4) \
+    || defined(TARGET_TRICORE)
+        g_assert_not_reached();
+#elif defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_HPPA) || \
+    defined(TARGET_LOONGARCH64) || defined(TARGET_S390X)
+        /*
+         * ARM mandated NaN propagation rules (see FPProcessNaNs()), take
+         * the first of:
+         *  1. A if it is signaling
+         *  2. B if it is signaling
+         *  3. A (quiet)
+         *  4. B (quiet)
+         * A signaling NaN is always quietened before returning it.
+         */
+        /*
+         * According to MIPS specifications, if one of the two operands is
+         * a sNaN, a new qNaN has to be generated. This is done in
+         * floatXX_silence_nan(). For qNaN inputs the specifications
+         * says: "When possible, this QNaN result is one of the operand QNaN
+         * values." In practice it seems that most implementations choose
+         * the first operand if both operands are qNaN. In short this gives
+         * the following rules:
+         *  1. A if it is signaling
+         *  2. B if it is signaling
+         *  3. A (quiet)
+         *  4. B (quiet)
+         * A signaling NaN is always silenced before returning it.
+         */
+        rule = float_2nan_prop_s_ab;
+#elif defined(TARGET_PPC) || defined(TARGET_M68K)
+        /*
+         * PowerPC propagation rules:
+         *  1. A if it sNaN or qNaN
+         *  2. B if it sNaN or qNaN
+         * A signaling NaN is always silenced before returning it.
+         */
+        /*
+         * M68000 FAMILY PROGRAMMER'S REFERENCE MANUAL
+         * 3.4 FLOATING-POINT INSTRUCTION DETAILS
+         * If either operand, but not both operands, of an operation is a
+         * nonsignaling NaN, then that NaN is returned as the result. If both
+         * operands are nonsignaling NaNs, then the destination operand
+         * nonsignaling NaN is returned as the result.
+         * If either operand to an operation is a signaling NaN (SNaN), then the
+         * SNaN bit is set in the FPSR EXC byte. If the SNaN exception enable bit
+         * is set in the FPCR ENABLE byte, then the exception is taken and the
+         * destination is not modified. If the SNaN exception enable bit is not
+         * set, setting the SNaN bit in the operand to a one converts the SNaN to
+         * a nonsignaling NaN. The operation then continues as described in the
+         * preceding paragraph for nonsignaling NaNs.
+         */
+        rule = float_2nan_prop_ab;
+#elif defined(TARGET_SPARC)
+        /* Prefer SNaN over QNaN, order B then A. */
+        rule = float_2nan_prop_s_ba;
+#elif defined(TARGET_XTENSA)
+        /*
+         * Xtensa has two NaN propagation modes.
+         * Which one is active is controlled by float_status::use_first_nan.
+         */
+        if (status->use_first_nan) {
+            rule = float_2nan_prop_ab;
+        } else {
+            rule = float_2nan_prop_ba;
+        }
+#else
+        rule = float_2nan_prop_x87;
+#endif
+    }
+
+    switch (rule) {
+    case float_2nan_prop_s_ab:
+        if (is_snan(a_cls)) {
+            return 0;
+        } else if (is_snan(b_cls)) {
+            return 1;
+        } else if (is_qnan(a_cls)) {
+            return 0;
+        } else {
+            return 1;
+        }
+        break;
+    case float_2nan_prop_s_ba:
+        if (is_snan(b_cls)) {
+            return 1;
+        } else if (is_snan(a_cls)) {
+            return 0;
+        } else if (is_qnan(b_cls)) {
+            return 1;
+        } else {
+            return 0;
+        }
+        break;
+    case float_2nan_prop_ab:
         if (is_nan(a_cls)) {
             return 0;
         } else {
             return 1;
         }
-    } else {
+        break;
+    case float_2nan_prop_ba:
         if (is_nan(b_cls)) {
             return 1;
         } else {
             return 0;
         }
-    }
-#else
-    /* This implements x87 NaN propagation rules:
-     * SNaN + QNaN => return the QNaN
-     * two SNaNs => return the one with the larger significand, silenced
-     * two QNaNs => return the one with the larger significand
-     * SNaN and a non-NaN => return the SNaN, silenced
-     * QNaN and a non-NaN => return the QNaN
-     *
-     * If we get down to comparing significands and they are the same,
-     * return the NaN with the positive sign bit (if any).
-     */
-    if (is_snan(a_cls)) {
-        if (is_snan(b_cls)) {
-            return aIsLargerSignificand ? 0 : 1;
-        }
-        return is_qnan(b_cls) ? 1 : 0;
-    } else if (is_qnan(a_cls)) {
-        if (is_snan(b_cls) || !is_qnan(b_cls)) {
-            return 0;
+        break;
+    case float_2nan_prop_x87:
+        /*
+         * This implements x87 NaN propagation rules:
+         * SNaN + QNaN => return the QNaN
+         * two SNaNs => return the one with the larger significand, silenced
+         * two QNaNs => return the one with the larger significand
+         * SNaN and a non-NaN => return the SNaN, silenced
+         * QNaN and a non-NaN => return the QNaN
+         *
+         * If we get down to comparing significands and they are the same,
+         * return the NaN with the positive sign bit (if any).
+         */
+        if (is_snan(a_cls)) {
+            if (is_snan(b_cls)) {
+                return aIsLargerSignificand ? 0 : 1;
+            }
+            return is_qnan(b_cls) ? 1 : 0;
+        } else if (is_qnan(a_cls)) {
+            if (is_snan(b_cls) || !is_qnan(b_cls)) {
+                return 0;
+            } else {
+                return aIsLargerSignificand ? 0 : 1;
+            }
         } else {
-            return aIsLargerSignificand ? 0 : 1;
+            return 1;
         }
-    } else {
-        return 1;
+    default:
+        g_assert_not_reached();
     }
-#endif
 }
 
 /*----------------------------------------------------------------------------
-- 
2.34.1

Explicitly set a 2-NaN propagation rule in the softfloat tests.  In
meson.build we put -DTARGET_ARM in fpcflags, and so we should select
here the Arm propagation rule of float_2nan_prop_s_ab.
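
For readers unfamiliar with the rule names, here is a minimal standalone
sketch of what selecting float_2nan_prop_s_ab means. It is modelled on the
switch in pickNaN() from the previous patch. The Toy* types and
toy_pick_nan() are hypothetical stand-ins, not QEMU's FloatClass or
Float2NaNPropRule, and the x87 rule is left out for brevity:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the softfloat class and rule enums. */
typedef enum { CLS_NORMAL, CLS_QNAN, CLS_SNAN } ToyClass;
typedef enum {
    TOY_PROP_NONE = 0,  /* what a zero-initialised status word holds */
    TOY_PROP_S_AB,      /* prefer SNaN over QNaN, then A over B */
    TOY_PROP_S_BA,      /* prefer SNaN over QNaN, then B over A */
    TOY_PROP_AB,        /* prefer A if it is any NaN, else B */
    TOY_PROP_BA         /* prefer B if it is any NaN, else A */
} ToyRule;

bool toy_is_nan(ToyClass c)
{
    return c == CLS_QNAN || c == CLS_SNAN;
}

/* Return 0 to propagate operand A, 1 to propagate operand B. */
int toy_pick_nan(ToyRule rule, ToyClass a, ToyClass b)
{
    switch (rule) {
    case TOY_PROP_S_AB:
        if (a == CLS_SNAN) {
            return 0;
        } else if (b == CLS_SNAN) {
            return 1;
        }
        return a == CLS_QNAN ? 0 : 1;
    case TOY_PROP_S_BA:
        if (b == CLS_SNAN) {
            return 1;
        } else if (a == CLS_SNAN) {
            return 0;
        }
        return b == CLS_QNAN ? 1 : 0;
    case TOY_PROP_AB:
        return toy_is_nan(a) ? 0 : 1;
    case TOY_PROP_BA:
        return toy_is_nan(b) ? 1 : 0;
    default:
        /* Rule never set: mirrors the g_assert_not_reached() path. */
        assert(0 && "2-NaN propagation rule was not set");
        return -1;
    }
}
```

In this model toy_pick_nan(TOY_PROP_S_AB, CLS_QNAN, CLS_SNAN) selects
operand B, because a signaling NaN wins over a quiet one. Leaving the rule
at TOY_PROP_NONE hits the assertion, which is the situation the test
programs avoid by setting a rule up front.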

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-3-peter.maydell@linaro.org
---
 tests/fp/fp-bench.c     | 2 ++
 tests/fp/fp-test-log2.c | 1 +
 tests/fp/fp-test.c      | 2 ++
 3 files changed, 5 insertions(+)

diff --git a/tests/fp/fp-bench.c b/tests/fp/fp-bench.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/fp/fp-bench.c
+++ b/tests/fp/fp-bench.c
@@ -XXX,XX +XXX,XX @@ static void run_bench(void)
 {
     bench_func_t f;
 
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, &soft_status);
+
     f = bench_funcs[operation][precision];
     g_assert(f);
     f();
diff --git a/tests/fp/fp-test-log2.c b/tests/fp/fp-test-log2.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/fp/fp-test-log2.c
+++ b/tests/fp/fp-test-log2.c
@@ -XXX,XX +XXX,XX @@ int main(int ac, char **av)
     float_status qsf = {0};
     int i;
 
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, &qsf);
     set_float_rounding_mode(float_round_nearest_even, &qsf);
 
     test.d = 0.0;
diff --git a/tests/fp/fp-test.c b/tests/fp/fp-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/fp/fp-test.c
+++ b/tests/fp/fp-test.c
@@ -XXX,XX +XXX,XX @@ void run_test(void)
 {
     unsigned int i;
 
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, &qsf);
+
     genCases_setLevel(test_level);
     verCases_maxErrorCount = n_max_errors;
 
-- 
2.34.1

Set the 2-NaN propagation rule explicitly in the float_status words
we use.  We wrap this plus the pre-existing setting of the
tininess-before-rounding flag in a new function
arm_set_default_fp_behaviours() to avoid repetition, since we have a
lot of float_status words at this point.
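
The shape of that refactoring can be sketched in isolation as follows.
This is only an illustration under stated assumptions: ToyFloatStatus,
ToyVfp and the toy_* functions are hypothetical and much simpler than
QEMU's float_status and the real reset code.

```c
#include <assert.h>

typedef enum {
    TOY_TININESS_AFTER_ROUNDING = 0,
    TOY_TININESS_BEFORE_ROUNDING
} ToyTininess;

typedef enum { TOY_PROP_NONE = 0, TOY_PROP_S_AB } ToyRule;

/* Hypothetical, reduced status word: only the two fields of interest. */
typedef struct {
    ToyTininess tininess;
    ToyRule nan_rule;
} ToyFloatStatus;

/* Hypothetical container with several status words, as on Arm. */
typedef struct {
    ToyFloatStatus fp_status;
    ToyFloatStatus fp_status_f16;
    ToyFloatStatus standard_fp_status;
    ToyFloatStatus standard_fp_status_f16;
} ToyVfp;

/* One place that knows the target's fixed floating-point behaviours. */
void toy_set_default_fp_behaviours(ToyFloatStatus *s)
{
    s->tininess = TOY_TININESS_BEFORE_ROUNDING;
    s->nan_rule = TOY_PROP_S_AB;
}

void toy_vfp_reset(ToyVfp *vfp)
{
    *vfp = (ToyVfp){0};  /* reset clears everything first */
    toy_set_default_fp_behaviours(&vfp->fp_status);
    toy_set_default_fp_behaviours(&vfp->fp_status_f16);
    toy_set_default_fp_behaviours(&vfp->standard_fp_status);
    toy_set_default_fp_behaviours(&vfp->standard_fp_status_f16);
}
```

The benefit is that adding a further fixed behaviour later means touching
one helper rather than one call per status word.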

The situation with FPA11 emulation in linux-user is a little odd, and
arguably "correct" behaviour there would be to exactly match a real
Linux kernel's FPA11 emulation.  However FPA11 emulation is
essentially dead at this point and so it seems better to continue
with QEMU's current behaviour and leave a comment describing the
situation.
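
To make the difference concrete, the sketch below contrasts prop_s_ab
with the kernel behaviour as it is described in the comment this patch
adds to resetFPA11(). It is written from that description only, not from
the kernel source, and the Toy* names are hypothetical:

```c
#include <assert.h>

typedef enum { CLS_NORMAL, CLS_QNAN, CLS_SNAN } ToyClass;

/* prop_s_ab: SNaN beats QNaN, then A beats B. Returns 0 for A, 1 for B. */
int toy_pick_s_ab(ToyClass a, ToyClass b)
{
    if (a == CLS_SNAN) {
        return 0;
    } else if (b == CLS_SNAN) {
        return 1;
    }
    return a == CLS_QNAN ? 0 : 1;
}

/*
 * The rule as described for the Linux nwfpe code: prefer the QNaN over
 * the SNaN; if both are QNaN pick A; if both are SNaN pick B.
 */
int toy_pick_nwfpe_as_described(ToyClass a, ToyClass b)
{
    /* As with pickNaN(), at least one operand is assumed to be a NaN. */
    assert(a != CLS_NORMAL || b != CLS_NORMAL);

    if (a == CLS_QNAN) {
        return 0;           /* covers "both QNaN: pick A" */
    } else if (b == CLS_QNAN) {
        return 1;           /* QNaN preferred over an SNaN in A */
    } else if (a == CLS_SNAN && b == CLS_SNAN) {
        return 1;           /* both SNaN: pick B */
    }
    return a == CLS_SNAN ? 0 : 1;   /* only one operand is a NaN */
}
```

The two functions disagree whenever a signaling NaN is paired with another
NaN: for (SNaN, QNaN), (QNaN, SNaN) and (SNaN, SNaN) inputs. They agree
when both operands are quiet NaNs or when only one operand is a NaN.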

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-4-peter.maydell@linaro.org
---
 linux-user/arm/nwfpe/fpa11.c   | 18 ++++++++++++++++++
 target/arm/cpu.c               | 25 +++++++++++++++++--------
 fpu/softfloat-specialize.c.inc | 13 ++-----------
 3 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/linux-user/arm/nwfpe/fpa11.c b/linux-user/arm/nwfpe/fpa11.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/arm/nwfpe/fpa11.c
+++ b/linux-user/arm/nwfpe/fpa11.c
@@ -XXX,XX +XXX,XX @@ void resetFPA11(void)
 #ifdef MAINTAIN_FPCR
   fpa11->fpcr = MASK_RESET;
 #endif
+
+  /*
+   * Real FPA11 hardware does not handle NaNs, but always takes an
+   * exception for them to be software-emulated (ARM7500FE datasheet
+   * section 10.4). There is no documented architectural requirement
+   * for NaN propagation rules and it will depend on how the OS
+   * level software emulation opted to do it. We here use prop_s_ab
+   * which matches the later VFP hardware choice and how QEMU's
+   * fpa11 emulation has worked in the past. The real Linux kernel
+   * does something slightly different: arch/arm/nwfpe/softfloat-specialize
+   * propagateFloat64NaN() has the curious behaviour that it prefers
+   * the QNaN over the SNaN, but if both are QNaN it picks A and
+   * if both are SNaN it picks B. In theory we could add this as
+   * a NaN propagation rule, but in practice FPA11 emulation is so
+   * close to totally dead that it's not worth trying to match it at
+   * this late date.
+   */
+  set_float_2nan_prop_rule(float_2nan_prop_s_ab, &fpa11->fp_status);
 }
 
 void SetRoundingMode(const unsigned int opcode)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_register_el_change_hook(ARMCPU *cpu, ARMELChangeHookFn *hook,
     QLIST_INSERT_HEAD(&cpu->el_change_hooks, entry, node);
 }
 
+/*
+ * Set the float_status behaviour to match the Arm defaults:
+ *  * tininess-before-rounding
+ *  * 2-input NaN propagation prefers SNaN over QNaN, and then
+ *    operand A over operand B (see FPProcessNaNs() pseudocode)
+ */
+static void arm_set_default_fp_behaviours(float_status *s)
+{
+    set_float_detect_tininess(float_tininess_before_rounding, s);
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, s);
+}
+
 static void cp_reg_reset(gpointer key, gpointer value, gpointer opaque)
 {
     /* Reset a single ARMCPRegInfo register */
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset_hold(Object *obj, ResetType type)
     set_flush_inputs_to_zero(1, &env->vfp.standard_fp_status);
     set_default_nan_mode(1, &env->vfp.standard_fp_status);
     set_default_nan_mode(1, &env->vfp.standard_fp_status_f16);
-    set_float_detect_tininess(float_tininess_before_rounding,
-                              &env->vfp.fp_status);
-    set_float_detect_tininess(float_tininess_before_rounding,
-                              &env->vfp.standard_fp_status);
-    set_float_detect_tininess(float_tininess_before_rounding,
-                              &env->vfp.fp_status_f16);
-    set_float_detect_tininess(float_tininess_before_rounding,
-                              &env->vfp.standard_fp_status_f16);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status);
+    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status);
+    arm_set_default_fp_behaviours(&env->vfp.fp_status_f16);
+    arm_set_default_fp_behaviours(&env->vfp.standard_fp_status_f16);
+
 #ifndef CONFIG_USER_ONLY
     if (kvm_enabled()) {
         kvm_arm_reset_vcpu(cpu);
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
         /* target didn't set the rule: fall back to old ifdef choices */
 #if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
-    || defined(TARGET_TRICORE)
+    || defined(TARGET_TRICORE) || defined(TARGET_ARM)
         g_assert_not_reached();
-#elif defined(TARGET_ARM) || defined(TARGET_MIPS) || defined(TARGET_HPPA) || \
+#elif defined(TARGET_MIPS) || defined(TARGET_HPPA) || \
     defined(TARGET_LOONGARCH64) || defined(TARGET_S390X)
-        /*
-         * ARM mandated NaN propagation rules (see FPProcessNaNs()), take
-         * the first of:
-         *  1. A if it is signaling
-         *  2. B if it is signaling
-         *  3. A (quiet)
-         *  4. B (quiet)
-         * A signaling NaN is always quietened before returning it.
-         */
         /*
          * According to MIPS specifications, if one of the two operands is
          * a sNaN, a new qNaN has to be generated. This is done in
-- 
2.34.1

Set the 2-NaN propagation rule explicitly in the float_status words
we use.

For active_fpu.fp_status, we do this in a new fp_reset() function
which mirrors the existing msa_reset() function in doing "first call
restore to set the fp status parts that depend on CPU state, then set
the fp status parts that are constant".
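
The "restore, then set the constant parts" pattern can be sketched on its
own like this. The types, the ctrl register and its 2-bit rounding field
are assumptions made for illustration and do not reflect the MIPS state
layout:

```c
#include <assert.h>

typedef enum { TOY_PROP_NONE = 0, TOY_PROP_S_AB } ToyRule;

typedef struct {
    int rounding_mode;   /* depends on guest-visible CPU state */
    ToyRule nan_rule;    /* constant for the lifetime of the CPU */
} ToyFloatStatus;

typedef struct {
    unsigned ctrl;       /* hypothetical FP control register */
    ToyFloatStatus fp_status;
} ToyCpu;

/* Called whenever the guest rewrites ctrl: only state-dependent parts. */
void toy_restore_fp_status(ToyCpu *cpu)
{
    cpu->fp_status.rounding_mode = (int)(cpu->ctrl & 3);
}

/* Called at reset: state-dependent parts first, then the constants. */
void toy_fp_reset(ToyCpu *cpu)
{
    toy_restore_fp_status(cpu);
    cpu->fp_status.nan_rule = TOY_PROP_S_AB;
}
```

Keeping the constant settings out of the restore function means they are
written once per reset rather than on every control register update.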

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20241025141254.2141506-5-peter.maydell@linaro.org
---
 target/mips/fpu_helper.h       | 22 ++++++++++++++++++++++
 target/mips/cpu.c              |  2 +-
 target/mips/msa.c              | 17 +++++++++++++++++
 fpu/softfloat-specialize.c.inc | 18 ++----------------
 4 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/target/mips/fpu_helper.h b/target/mips/fpu_helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/mips/fpu_helper.h
+++ b/target/mips/fpu_helper.h
@@ -XXX,XX +XXX,XX @@ static inline void restore_fp_status(CPUMIPSState *env)
     restore_snan_bit_mode(env);
 }
 
+static inline void fp_reset(CPUMIPSState *env)
+{
+    restore_fp_status(env);
+
+    /*
+     * According to MIPS specifications, if one of the two operands is
+     * a sNaN, a new qNaN has to be generated. This is done in
+     * floatXX_silence_nan(). For qNaN inputs the specifications
+     * say: "When possible, this QNaN result is one of the operand QNaN
+     * values." In practice it seems that most implementations choose
+     * the first operand if both operands are qNaN. In short this gives
+     * the following rules:
+     *  1. A if it is signaling
+     *  2. B if it is signaling
+     *  3. A (quiet)
+     *  4. B (quiet)
+     * A signaling NaN is always silenced before returning it.
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab,
+                             &env->active_fpu.fp_status);
+}
+
 /* MSA */
 
 enum CPUMIPSMSADataFormat {
diff --git a/target/mips/cpu.c b/target/mips/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/mips/cpu.c
+++ b/target/mips/cpu.c
@@ -XXX,XX +XXX,XX @@ static void mips_cpu_reset_hold(Object *obj, ResetType type)
     }
 
     msa_reset(env);
+    fp_reset(env);
 
     compute_hflags(env);
-    restore_fp_status(env);
     restore_pamask(env);
     cs->exception_index = EXCP_NONE;
 
diff --git a/target/mips/msa.c b/target/mips/msa.c
index XXXXXXX..XXXXXXX 100644
--- a/target/mips/msa.c
+++ b/target/mips/msa.c
@@ -XXX,XX +XXX,XX @@ void msa_reset(CPUMIPSState *env)
     set_float_detect_tininess(float_tininess_after_rounding,
                               &env->active_tc.msa_fp_status);
 
+    /*
+     * According to MIPS specifications, if one of the two operands is
+     * a sNaN, a new qNaN has to be generated. This is done in
+     * floatXX_silence_nan(). For qNaN inputs the specifications
+     * say: "When possible, this QNaN result is one of the operand QNaN
+     * values." In practice it seems that most implementations choose
+     * the first operand if both operands are qNaN. In short this gives
+     * the following rules:
+     *  1. A if it is signaling
+     *  2. B if it is signaling
+     *  3. A (quiet)
+     *  4. B (quiet)
+     * A signaling NaN is always silenced before returning it.
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab,
+                             &env->active_tc.msa_fp_status);
+
     /* clear float_status exception flags */
     set_float_exception_flags(0, &env->active_tc.msa_fp_status);
 
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
         /* target didn't set the rule: fall back to old ifdef choices */
 #if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
-    || defined(TARGET_TRICORE) || defined(TARGET_ARM)
+    || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS)
         g_assert_not_reached();
-#elif defined(TARGET_MIPS) || defined(TARGET_HPPA) || \
+#elif defined(TARGET_HPPA) || \
     defined(TARGET_LOONGARCH64) || defined(TARGET_S390X)
-        /*
-         * According to MIPS specifications, if one of the two operands is
-         * a sNaN, a new qNaN has to be generated. This is done in
-         * floatXX_silence_nan(). For qNaN inputs the specifications
-         * says: "When possible, this QNaN result is one of the operand QNaN
-         * values." In practice it seems that most implementations choose
-         * the first operand if both operands are qNaN. In short this gives
-         * the following rules:
-         *  1. A if it is signaling
-         *  2. B if it is signaling
-         *  3. A (quiet)
-         *  4. B (quiet)
-         * A signaling NaN is always silenced before returning it.
-         */
         rule = float_2nan_prop_s_ab;
 #elif defined(TARGET_PPC) || defined(TARGET_M68K)
         /*
-- 
2.34.1

Set the 2-NaN propagation rule explicitly in the float_status word we
use.

(There are a couple of places in fpu_helper.c where we create a
dummy float_status word with "float_status *s = { };", but these
are only used for calling float*_is_quiet_nan() so it doesn't
matter that we don't set a 2-NaN propagation rule there.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-6-peter.maydell@linaro.org
---
 target/loongarch/tcg/fpu_helper.c | 1 +
 fpu/softfloat-specialize.c.inc    | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/target/loongarch/tcg/fpu_helper.c b/target/loongarch/tcg/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/loongarch/tcg/fpu_helper.c
+++ b/target/loongarch/tcg/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void restore_fp_status(CPULoongArchState *env)
     set_float_rounding_mode(ieee_rm[(env->fcsr0 >> FCSR0_RM) & 0x3],
                             &env->fp_status);
     set_flush_to_zero(0, &env->fp_status);
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, &env->fp_status);
 }
 
 int ieee_ex_to_loongarch(int xcpt)
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
         /* target didn't set the rule: fall back to old ifdef choices */
 #if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
-    || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS)
+    || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
+    || defined(TARGET_LOONGARCH64)
         g_assert_not_reached();
-#elif defined(TARGET_HPPA) || \
-    defined(TARGET_LOONGARCH64) || defined(TARGET_S390X)
+#elif defined(TARGET_HPPA) || defined(TARGET_S390X)
         rule = float_2nan_prop_s_ab;
 #elif defined(TARGET_PPC) || defined(TARGET_M68K)
         /*
-- 
2.34.1

Set the 2-NaN propagation rule explicitly in env->fp_status.

Really we only need to do this at CPU reset (after reset has zeroed
out most of the CPU state struct, which typically includes fp_status
fields).  However target/hppa does not currently implement CPU reset
at all, so leave a TODO comment to note that this could be moved if
we ever do implement reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-7-peter.maydell@linaro.org
---
 target/hppa/fpu_helper.c       | 6 ++++++
 fpu/softfloat-specialize.c.inc | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/hppa/fpu_helper.c b/target/hppa/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/hppa/fpu_helper.c
+++ b/target/hppa/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(loaded_fr0)(CPUHPPAState *env)
     d = FIELD_EX32(shadow, FPSR, D);
     set_flush_to_zero(d, &env->fp_status);
     set_flush_inputs_to_zero(d, &env->fp_status);
+
+    /*
+     * TODO: we only need to do this at CPU reset, but currently
+     * HPPA does not implement a CPU reset method at all...
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_s_ab, &env->fp_status);
 }
 
 void cpu_hppa_loaded_fr0(CPUHPPAState *env)
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
 #if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
-    || defined(TARGET_LOONGARCH64)
+    || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA)
         g_assert_not_reached();
-#elif defined(TARGET_HPPA) || defined(TARGET_S390X)
+#elif defined(TARGET_S390X)
         rule = float_2nan_prop_s_ab;
 #elif defined(TARGET_PPC) || defined(TARGET_M68K)
         /*
-- 
2.34.1

Set the 2-NaN propagation rule explicitly in env->fpu_status.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-8-peter.maydell@linaro.org
---
 target/s390x/cpu.c             | 1 +
 fpu/softfloat-specialize.c.inc | 5 ++---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/s390x/cpu.c
+++ b/target/s390x/cpu.c
@@ -XXX,XX +XXX,XX @@ static void s390_cpu_reset_hold(Object *obj, ResetType type)
         /* tininess for underflow is detected before rounding */
         set_float_detect_tininess(float_tininess_before_rounding,
                                   &env->fpu_status);
+        set_float_2nan_prop_rule(float_2nan_prop_s_ab, &env->fpu_status);
        /* fall through */
     case RESET_TYPE_S390_CPU_NORMAL:
         env->psw.mask &= ~PSW_MASK_RI;
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
 #if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
-    || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA)
+    || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
+    || defined(TARGET_S390X)
         g_assert_not_reached();
-#elif defined(TARGET_S390X)
-        rule = float_2nan_prop_s_ab;
 #elif defined(TARGET_PPC) || defined(TARGET_M68K)
         /*
          * PowerPC propagation rules:
-- 
2.34.1

Set the 2-NaN propagation rule explicitly in env->fp_status
and env->vec_status.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-9-peter.maydell@linaro.org
---
 target/ppc/cpu_init.c          |  8 ++++++++
 fpu/softfloat-specialize.c.inc | 10 ++--------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index XXXXXXX..XXXXXXX 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -XXX,XX +XXX,XX @@ static void ppc_cpu_reset_hold(Object *obj, ResetType type)
     /* tininess for underflow is detected before rounding */
     set_float_detect_tininess(float_tininess_before_rounding,
                               &env->fp_status);
+    /*
+     * PowerPC propagation rules:
+     *  1. A if it is sNaN or qNaN
+     *  2. B if it is sNaN or qNaN
+     * A signaling NaN is always silenced before returning it.
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_ab, &env->fp_status);
+    set_float_2nan_prop_rule(float_2nan_prop_ab, &env->vec_status);
 
     for (i = 0; i < ARRAY_SIZE(env->spr_cb); i++) {
         ppc_spr_t *spr = &env->spr_cb[i];
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
-    || defined(TARGET_S390X)
+    || defined(TARGET_S390X) || defined(TARGET_PPC)
         g_assert_not_reached();
-#elif defined(TARGET_PPC) || defined(TARGET_M68K)
-        /*
-         * PowerPC propagation rules:
-         *  1. A if it sNaN or qNaN
-         *  2. B if it sNaN or qNaN
-         * A signaling NaN is always silenced before returning it.
-         */
+#elif defined(TARGET_M68K)
         /*
          * M68000 FAMILY PROGRAMMER'S REFERENCE MANUAL
          * 3.4 FLOATING-POINT INSTRUCTION DETAILS
-- 
2.34.1

Explicitly set the 2-NaN propagation rule on env->fp_status
and on the temporary fp_status that we use in frem (since
we pass that to a division operation function).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 target/m68k/cpu.c              | 16 ++++++++++++++++
 target/m68k/fpu_helper.c       |  1 +
 fpu/softfloat-specialize.c.inc | 19 +------------------
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/m68k/cpu.c
+++ b/target/m68k/cpu.c
@@ -XXX,XX +XXX,XX @@ static void m68k_cpu_reset_hold(Object *obj, ResetType type)
         env->fregs[i].d = nan;
     }
     cpu_m68k_set_fpcr(env, 0);
+    /*
+     * M68000 FAMILY PROGRAMMER'S REFERENCE MANUAL
+     * 3.4 FLOATING-POINT INSTRUCTION DETAILS
+     * If either operand, but not both operands, of an operation is a
+     * nonsignaling NaN, then that NaN is returned as the result. If both
+     * operands are nonsignaling NaNs, then the destination operand
+     * nonsignaling NaN is returned as the result.
+     * If either operand to an operation is a signaling NaN (SNaN), then the
+     * SNaN bit is set in the FPSR EXC byte. If the SNaN exception enable bit
+     * is set in the FPCR ENABLE byte, then the exception is taken and the
+     * destination is not modified. If the SNaN exception enable bit is not
+     * set, setting the SNaN bit in the operand to a one converts the SNaN to
+     * a nonsignaling NaN. The operation then continues as described in the
+     * preceding paragraph for nonsignaling NaNs.
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_ab, &env->fp_status);
     env->fpsr = 0;
 
     /* TODO: We should set PC from the interrupt vector.  */
diff --git a/target/m68k/fpu_helper.c b/target/m68k/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/m68k/fpu_helper.c
+++ b/target/m68k/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(frem)(CPUM68KState *env, FPReg *res, FPReg *val0, FPReg *val1)
         int sign;
 
         /* Calculate quotient directly using round to nearest mode */
+        set_float_2nan_prop_rule(float_2nan_prop_ab, &fp_status);
         set_float_rounding_mode(float_round_nearest_even, &fp_status);
         set_floatx80_rounding_precision(
             get_floatx80_rounding_precision(&env->fp_status), &fp_status);
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
-    || defined(TARGET_S390X) || defined(TARGET_PPC)
+    || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K)
         g_assert_not_reached();
-#elif defined(TARGET_M68K)
-        /*
-         * M68000 FAMILY PROGRAMMER'S REFERENCE MANUAL
-         * 3.4 FLOATING-POINT INSTRUCTION DETAILS
-         * If either operand, but not both operands, of an operation is a
-         * nonsignaling NaN, then that NaN is returned as the result. If both
-         * operands are nonsignaling NaNs, then the destination operand
-         * nonsignaling NaN is returned as the result.
-         * If either operand to an operation is a signaling NaN (SNaN), then the
-         * SNaN bit is set in the FPSR EXC byte. If the SNaN exception enable bit
-         * is set in the FPCR ENABLE byte, then the exception is taken and the
-         * destination is not modified. If the SNaN exception enable bit is not
-         * set, setting the SNaN bit in the operand to a one converts the SNaN to
-         * a nonsignaling NaN. The operation then continues as described in the
-         * preceding paragraph for nonsignaling NaNs.
-         */
-        rule = float_2nan_prop_ab;
 #elif defined(TARGET_SPARC)
         /* Prefer SNaN over QNaN, order B then A. */
         rule = float_2nan_prop_s_ba;
-- 
2.34.1

In cf_fpu_gdb_get_reg() and cf_fpu_gdb_set_reg() we use a temporary
float_status variable to pass to floatx80_to_float64() and
float64_to_floatx80(), but we don't initialize it, meaning that those
functions could access uninitialized data.  Zero-init the structs.

(We don't need to set a NaN-propagation rule here because we
don't use these with a 2-argument fpu operation.)
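
As a small illustration of why the zero-init matters, consider a
conversion routine that read-modify-writes a field of the status struct.
The sketch is hypothetical: ToyFloatStatus and toy_to_int() are not QEMU
APIs. It uses "= {0}", which is valid in every C standard, whereas the
empty "= {}" form used in the patch is a GNU extension that C23 made
standard.

```c
#include <assert.h>

enum { TOY_FLAG_INEXACT = 1 };

typedef struct {
    int rounding_mode;
    unsigned exception_flags;
} ToyFloatStatus;

/* Toy conversion: truncates, and ORs a flag into *s if a fraction is lost. */
int toy_to_int(double x, ToyFloatStatus *s)
{
    int r = (int)x;

    if ((double)r != x) {
        /* Reads the old flags, so they must have a defined value. */
        s->exception_flags |= TOY_FLAG_INEXACT;
    }
    return r;
}

int toy_get_reg(double reg)
{
    ToyFloatStatus s = {0};  /* every field starts at a known value */

    return toy_to_int(reg, &s);
}
```

Without the initialiser, the "|=" in toy_to_int() would read an
indeterminate value, which is the class of problem the patch removes.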

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-11-peter.maydell@linaro.org
---
 target/m68k/helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/m68k/helper.c b/target/m68k/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/m68k/helper.c
+++ b/target/m68k/helper.c
@@ -XXX,XX +XXX,XX @@ static int cf_fpu_gdb_get_reg(CPUState *cs, GByteArray *mem_buf, int n)
     CPUM68KState *env = &cpu->env;
 
     if (n < 8) {
-        float_status s;
+        float_status s = {};
         return gdb_get_reg64(mem_buf, floatx80_to_float64(env->fregs[n].d, &s));
     }
     switch (n) {
@@ -XXX,XX +XXX,XX @@ static int cf_fpu_gdb_set_reg(CPUState *cs, uint8_t *mem_buf, int n)
     CPUM68KState *env = &cpu->env;
 
     if (n < 8) {
-        float_status s;
+        float_status s = {};
         env->fregs[n].d = float64_to_floatx80(ldq_be_p(mem_buf), &s);
         return 8;
     }
-- 
2.34.1

Currently we call cpu_put_fsr(env, 0) in sparc_cpu_realizefn(), which
initializes various fields in the CPU struct:
 * fsr_cexc_ftt
 * fcc[]
 * fsr_qne
 * fsr
It also sets the rounding mode in env->fp_status.

This is largely pointless, because when we later reset the CPU
this will zero out all the fields up until the "end_reset_fields"
label, which includes all of these (but not fp_status!)

Move the cpu_put_fsr(env, 0) call to reset, because that expresses
the logical requirement: we want to reset FSR to 0 on every reset.
This isn't a behaviour change because the fields are all zero anyway.
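
The reset pattern described above can be sketched as follows. This is an illustration under stated assumptions: DemoCPUState, demo_put_fsr() and demo_reset() are hypothetical, heavily simplified stand-ins for the SPARC CPU state, cpu_put_fsr() and the reset hook, and the marker is a plain char member, not whatever the real struct uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified CPU state; field names are illustrative. */
typedef struct {
    unsigned fsr;            /* inside the zeroed-on-reset region */
    unsigned fcc[4];         /* inside the zeroed-on-reset region */
    char end_reset_fields;   /* marker: everything before this is zeroed */
    int fp_rounding_mode;    /* after the marker: NOT zeroed by reset */
} DemoCPUState;

/* Stand-in for cpu_put_fsr(env, val): also derives the rounding mode. */
static void demo_put_fsr(DemoCPUState *env, unsigned val)
{
    assert(env != NULL);
    env->fsr = val;
    env->fp_rounding_mode = (int)((val >> 30) & 3);
}

/* Stand-in for the reset hook: zero the reset region, then reset FSR. */
static void demo_reset(DemoCPUState *env)
{
    memset(env, 0, offsetof(DemoCPUState, end_reset_fields));
    demo_put_fsr(env, 0);
}
```

In this sketch the memset() alone would leave fp_rounding_mode untouched; it is the demo_put_fsr() call in reset that brings it back to its reset value.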

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-12-peter.maydell@linaro.org
---
 target/sparc/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/sparc/cpu.c
+++ b/target/sparc/cpu.c
@@ -XXX,XX +XXX,XX @@ static void sparc_cpu_reset_hold(Object *obj, ResetType type)
     env->npc = env->pc + 4;
 #endif
     env->cache_control = 0;
+    cpu_put_fsr(env, 0);
 }
 
 #ifndef CONFIG_USER_ONLY
@@ -XXX,XX +XXX,XX @@ static void sparc_cpu_realizefn(DeviceState *dev, Error **errp)
     env->version |= env->def.maxtl << 8;
     env->version |= env->def.nwindows - 1;
 #endif
-    cpu_put_fsr(env, 0);
 
     cpu_exec_realizefn(cs, &local_err);
     if (local_err != NULL) {
-- 
2.34.1

Set the NaN propagation rule explicitly in the float_status
words we use.
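
For readers unfamiliar with the rule name, the following is a minimal sketch of what "prefer SNaN over QNaN, order B then A" (float_2nan_prop_s_ba) means. The sparc_demo_class enum and demo_pick_s_ba() function are hypothetical and only model the selection logic, not the softfloat implementation.

```c
#include <assert.h>

/* Hypothetical operand classification; not softfloat's FloatClass. */
typedef enum {
    SPARC_DEMO_NOT_NAN,
    SPARC_DEMO_QNAN,
    SPARC_DEMO_SNAN
} sparc_demo_class;

/*
 * Returns 0 to propagate operand A, 1 to propagate operand B.
 * Precondition: at least one operand is a NaN.
 */
static int demo_pick_s_ba(sparc_demo_class a, sparc_demo_class b)
{
    assert(a != SPARC_DEMO_NOT_NAN || b != SPARC_DEMO_NOT_NAN);

    if (b == SPARC_DEMO_SNAN) {
        return 1;               /* SNaN in B wins outright */
    }
    if (a == SPARC_DEMO_SNAN) {
        return 0;               /* otherwise an SNaN in A */
    }
    if (b == SPARC_DEMO_QNAN) {
        return 1;               /* no SNaN: QNaN in B before QNaN in A */
    }
    return 0;
}
```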

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-13-peter.maydell@linaro.org
---
 target/sparc/cpu.c             |  8 ++++++++
 target/sparc/fop_helper.c      | 10 ++++++++--
 fpu/softfloat-specialize.c.inc |  6 ++----
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/sparc/cpu.c
+++ b/target/sparc/cpu.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/qdev-properties.h"
 #include "qapi/visitor.h"
 #include "tcg/tcg.h"
+#include "fpu/softfloat.h"
 
 //#define DEBUG_FEATURES
 
@@ -XXX,XX +XXX,XX @@ static void sparc_cpu_realizefn(DeviceState *dev, Error **errp)
     env->version |= env->def.nwindows - 1;
 #endif
 
+    /*
+     * Prefer SNaN over QNaN, order B then A. It's OK to do this in realize
+     * rather than reset, because fp_status is after 'end_reset_fields' in
+     * the CPU state struct so it won't get zeroed on reset.
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_s_ba, &env->fp_status);
+
     cpu_exec_realizefn(cs, &local_err);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
diff --git a/target/sparc/fop_helper.c b/target/sparc/fop_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/sparc/fop_helper.c
+++ b/target/sparc/fop_helper.c
@@ -XXX,XX +XXX,XX @@ uint32_t helper_flcmps(float32 src1, float32 src2)
      * Perform the comparison with a dummy fp environment.
      */
     float_status discard = { };
-    FloatRelation r = float32_compare_quiet(src1, src2, &discard);
+    FloatRelation r;
+
+    set_float_2nan_prop_rule(float_2nan_prop_s_ba, &discard);
+    r = float32_compare_quiet(src1, src2, &discard);
 
     switch (r) {
     case float_relation_equal:
@@ -XXX,XX +XXX,XX @@ uint32_t helper_flcmps(float32 src1, float32 src2)
 uint32_t helper_flcmpd(float64 src1, float64 src2)
 {
     float_status discard = { };
-    FloatRelation r = float64_compare_quiet(src1, src2, &discard);
+    FloatRelation r;
+
+    set_float_2nan_prop_rule(float_2nan_prop_s_ba, &discard);
+    r = float64_compare_quiet(src1, src2, &discard);
 
     switch (r) {
     case float_relation_equal:
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
     || defined(TARGET_RISCV) || defined(TARGET_SH4) \
     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
-    || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K)
+    || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
+    || defined(TARGET_SPARC)
         g_assert_not_reached();
-#elif defined(TARGET_SPARC)
-        /* Prefer SNaN over QNaN, order B then A. */
-        rule = float_2nan_prop_s_ba;
 #elif defined(TARGET_XTENSA)
         /*
          * Xtensa has two NaN propagation modes.
-- 
2.34.1

In xtensa we currently call set_use_first_nan() in a lot of
places where we want to switch the NaN-propagation handling.
We're about to change the softfloat API we use to do that,
so start by factoring all the calls out into a single
xtensa_use_first_nan() function.

The bulk of this change was done with
 sed -i -e 's/set_use_first_nan(\([^,]*\),[^)]*)/xtensa_use_first_nan(env, \1)/'  target/xtensa/fpu_helper.c

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-14-peter.maydell@linaro.org
---
 target/xtensa/cpu.h        |  6 ++++++
 target/xtensa/cpu.c        |  2 +-
 target/xtensa/fpu_helper.c | 33 +++++++++++++++++++--------------
 3 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/target/xtensa/cpu.h b/target/xtensa/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/xtensa/cpu.h
+++ b/target/xtensa/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, vaddr *pc,
 XtensaCPU *xtensa_cpu_create_with_clock(const char *cpu_type,
                                         Clock *cpu_refclk);
 
+/*
+ * Set the NaN propagation rule for future FPU operations:
+ * use_first is true to pick the first NaN as the result if both
+ * inputs are NaNs, false to pick the second.
+ */
+void xtensa_use_first_nan(CPUXtensaState *env, bool use_first);
 #endif
diff --git a/target/xtensa/cpu.c b/target/xtensa/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/xtensa/cpu.c
+++ b/target/xtensa/cpu.c
@@ -XXX,XX +XXX,XX @@ static void xtensa_cpu_reset_hold(Object *obj, ResetType type)
     cs->halted = env->runstall;
 #endif
     set_no_signaling_nans(!dfpu, &env->fp_status);
-    set_use_first_nan(!dfpu, &env->fp_status);
+    xtensa_use_first_nan(env, !dfpu);
 }
 
 static ObjectClass *xtensa_cpu_class_by_name(const char *cpu_model)
diff --git a/target/xtensa/fpu_helper.c b/target/xtensa/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/xtensa/fpu_helper.c
+++ b/target/xtensa/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ static const struct {
     { XTENSA_FP_V, float_flag_invalid, },
 };
 
+void xtensa_use_first_nan(CPUXtensaState *env, bool use_first)
+{
+    set_use_first_nan(use_first, &env->fp_status);
+}
+
 void HELPER(wur_fpu2k_fcr)(CPUXtensaState *env, uint32_t v)
 {
     static const int rounding_mode[] = {
@@ -XXX,XX +XXX,XX @@ float32 HELPER(fpu2k_msub_s)(CPUXtensaState *env,
 
 float64 HELPER(add_d)(CPUXtensaState *env, float64 a, float64 b)
 {
-    set_use_first_nan(true, &env->fp_status);
+    xtensa_use_first_nan(env, true);
     return float64_add(a, b, &env->fp_status);
 }
 
 float32 HELPER(add_s)(CPUXtensaState *env, float32 a, float32 b)
 {
-    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    xtensa_use_first_nan(env, env->config->use_first_nan);
     return float32_add(a, b, &env->fp_status);
 }
 
 float64 HELPER(sub_d)(CPUXtensaState *env, float64 a, float64 b)
 {
-    set_use_first_nan(true, &env->fp_status);
+    xtensa_use_first_nan(env, true);
     return float64_sub(a, b, &env->fp_status);
 }
 
 float32 HELPER(sub_s)(CPUXtensaState *env, float32 a, float32 b)
 {
-    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    xtensa_use_first_nan(env, env->config->use_first_nan);
     return float32_sub(a, b, &env->fp_status);
 }
 
 float64 HELPER(mul_d)(CPUXtensaState *env, float64 a, float64 b)
 {
-    set_use_first_nan(true, &env->fp_status);
+    xtensa_use_first_nan(env, true);
     return float64_mul(a, b, &env->fp_status);
 }
 
 float32 HELPER(mul_s)(CPUXtensaState *env, float32 a, float32 b)
 {
-    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    xtensa_use_first_nan(env, env->config->use_first_nan);
     return float32_mul(a, b, &env->fp_status);
 }
 
 float64 HELPER(madd_d)(CPUXtensaState *env, float64 a, float64 b, float64 c)
 {
-    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    xtensa_use_first_nan(env, env->config->use_first_nan);
     return float64_muladd(b, c, a, 0, &env->fp_status);
 }
 
 float32 HELPER(madd_s)(CPUXtensaState *env, float32 a, float32 b, float32 c)
 {
-    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    xtensa_use_first_nan(env, env->config->use_first_nan);
     return float32_muladd(b, c, a, 0, &env->fp_status);
 }
 
 float64 HELPER(msub_d)(CPUXtensaState *env, float64 a, float64 b, float64 c)
 {
-    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    xtensa_use_first_nan(env, env->config->use_first_nan);
     return float64_muladd(b, c, a, float_muladd_negate_product,
                           &env->fp_status);
 }
 
 float32 HELPER(msub_s)(CPUXtensaState *env, float32 a, float32 b, float32 c)
 {
-    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    xtensa_use_first_nan(env, env->config->use_first_nan);
     return float32_muladd(b, c, a, float_muladd_negate_product,
                           &env->fp_status);
 }
 
 float64 HELPER(mkdadj_d)(CPUXtensaState *env, float64 a, float64 b)
 {
-    set_use_first_nan(true, &env->fp_status);
+    xtensa_use_first_nan(env, true);
     return float64_div(b, a, &env->fp_status);
 }
 
 float32 HELPER(mkdadj_s)(CPUXtensaState *env, float32 a, float32 b)
 {
-    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    xtensa_use_first_nan(env, env->config->use_first_nan);
     return float32_div(b, a, &env->fp_status);
 }
 
 float64 HELPER(mksadj_d)(CPUXtensaState *env, float64 v)
 {
-    set_use_first_nan(true, &env->fp_status);
+    xtensa_use_first_nan(env, true);
     return float64_sqrt(v, &env->fp_status);
 }
 
 float32 HELPER(mksadj_s)(CPUXtensaState *env, float32 v)
 {
-    set_use_first_nan(env->config->use_first_nan, &env->fp_status);
+    xtensa_use_first_nan(env, env->config->use_first_nan);
     return float32_sqrt(v, &env->fp_status);
 }
 
-- 
2.34.1

Set the NaN propagation rule explicitly in xtensa_use_first_nan().

(When we convert the softfloat pickNaNMulAdd routine to also
select a NaN propagation rule at runtime, we will be able to
remove the use_first_nan flag because the propagation rules
will handle everything.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-15-peter.maydell@linaro.org
---
 target/xtensa/fpu_helper.c     |  2 ++
 fpu/softfloat-specialize.c.inc | 12 +-----------
 2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/target/xtensa/fpu_helper.c b/target/xtensa/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/xtensa/fpu_helper.c
+++ b/target/xtensa/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ static const struct {
 void xtensa_use_first_nan(CPUXtensaState *env, bool use_first)
 {
     set_use_first_nan(use_first, &env->fp_status);
+    set_float_2nan_prop_rule(use_first ? float_2nan_prop_ab : float_2nan_prop_ba,
+                             &env->fp_status);
 }
 
 void HELPER(wur_fpu2k_fcr)(CPUXtensaState *env, uint32_t v)
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
-    || defined(TARGET_SPARC)
+    || defined(TARGET_SPARC) || defined(TARGET_XTENSA)
         g_assert_not_reached();
-#elif defined(TARGET_XTENSA)
-        /*
-         * Xtensa has two NaN propagation modes.
-         * Which one is active is controlled by float_status::use_first_nan.
-         */
-        if (status->use_first_nan) {
-            rule = float_2nan_prop_ab;
-        } else {
-            rule = float_2nan_prop_ba;
-        }
 #else
         rule = float_2nan_prop_x87;
 #endif
-- 
2.34.1

Set the NaN propagation rule explicitly for the float_status words
used in the x86 target.

This is a no-behaviour-change commit, so we retain the existing
behaviour of using the x87-style "prefer QNaN over SNaN, then prefer
the NaN with the larger significand" for MMX and SSE.  This is
however not the documented hardware behaviour, so we leave a TODO
note about what we should be doing instead.
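
To make the difference concrete, the sketch below contrasts the x87-style choice retained here with the "always return the first NaN" choice (float_2nan_prop_ab). All names are hypothetical; the code models only the selection logic, and it assumes the caller has already compared significands (including the sign-bit tie-break) and passes the result as a_larger_sig.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand classification; not softfloat's FloatClass. */
typedef enum {
    X87_DEMO_NOT_NAN,
    X87_DEMO_QNAN,
    X87_DEMO_SNAN
} x87_demo_class;

/* Returns 0 to propagate operand A, 1 to propagate operand B. */
static int demo_pick_x87(x87_demo_class a, x87_demo_class b, bool a_larger_sig)
{
    assert(a != X87_DEMO_NOT_NAN || b != X87_DEMO_NOT_NAN);

    if (a == X87_DEMO_SNAN) {
        if (b == X87_DEMO_SNAN) {
            return a_larger_sig ? 0 : 1;    /* two SNaNs: larger significand */
        }
        return b == X87_DEMO_QNAN ? 1 : 0;  /* SNaN + QNaN: the QNaN */
    }
    if (a == X87_DEMO_QNAN) {
        if (b != X87_DEMO_QNAN) {
            return 0;                       /* QNaN beats SNaN and non-NaN */
        }
        return a_larger_sig ? 0 : 1;        /* two QNaNs: larger significand */
    }
    return 1;                               /* only B is a NaN */
}

/* "First NaN" rule: A if it is a NaN at all, otherwise B. */
static int demo_pick_ab(x87_demo_class a, x87_demo_class b)
{
    assert(a != X87_DEMO_NOT_NAN || b != X87_DEMO_NOT_NAN);
    return a != X87_DEMO_NOT_NAN ? 0 : 1;
}
```

With A an SNaN and B a QNaN, the two rules disagree: the x87-style rule picks B, the first-NaN rule picks A.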

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-16-peter.maydell@linaro.org
---
 target/i386/cpu.h              |  3 +++
 target/i386/cpu.c              |  4 ++++
 target/i386/tcg/fpu_helper.c   | 40 ++++++++++++++++++++++++++++++++++
 fpu/softfloat-specialize.c.inc |  3 ++-
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -XXX,XX +XXX,XX @@ static inline bool cpu_vmx_maybe_enabled(CPUX86State *env)
 int get_pg_mode(CPUX86State *env);
 
 /* fpu_helper.c */
+
+/* Set all non-runtime-variable float_status fields to x86 handling */
+void cpu_init_fp_statuses(CPUX86State *env);
 void update_fp_status(CPUX86State *env);
 void update_mxcsr_status(CPUX86State *env);
 void update_mxcsr_from_sse_status(CPUX86State *env);
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -XXX,XX +XXX,XX @@ static void x86_cpu_reset_hold(Object *obj, ResetType type)
 
     memset(env, 0, offsetof(CPUX86State, end_reset_fields));
 
+    if (tcg_enabled()) {
+        cpu_init_fp_statuses(env);
+    }
+
     env->old_exception = -1;
 
     /* init to reset state */
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -XXX,XX +XXX,XX @@ static void fpu_set_exception(CPUX86State *env, int mask)
     }
 }
 
+void cpu_init_fp_statuses(CPUX86State *env)
+{
+    /*
+     * Initialise the non-runtime-varying fields of the various
+     * float_status words to x86 behaviour. This must be called at
+     * CPU reset because the float_status words are in the
+     * "zeroed on reset" portion of the CPU state struct.
+     * Fields in float_status that vary under guest control are set
+     * via the codepath for setting that register, eg cpu_set_fpuc().
+     */
+    /*
+     * Use x87 NaN propagation rules:
+     * SNaN + QNaN => return the QNaN
+     * two SNaNs => return the one with the larger significand, silenced
+     * two QNaNs => return the one with the larger significand
+     * SNaN and a non-NaN => return the SNaN, silenced
+     * QNaN and a non-NaN => return the QNaN
+     *
+     * If we get down to comparing significands and they are the same,
+     * return the NaN with the positive sign bit (if any).
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
+    /*
+     * TODO: These are incorrect: the x86 Software Developer's Manual vol 1
+     * section 4.8.3.5 "Operating on SNaNs and QNaNs" says that the
+     * "larger significand" behaviour is only used for x87 FPU operations.
+     * For SSE the required behaviour is to always return the first NaN,
+     * which is float_2nan_prop_ab.
+     *
+     * mmx_status is used only for the AMD 3DNow! instructions, which
+     * are documented in the "3DNow! Technology Manual" as not supporting
+     * NaNs or infinities as inputs. The result of passing two NaNs is
+     * documented as "undefined", so we can do what we choose.
+     * (Strictly there is some behaviour we don't implement correctly
+     * for these "unsupported" NaN and Inf values, like "NaN * 0 == 0".)
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->mmx_status);
+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->sse_status);
+}
+
 static inline uint8_t save_exception_flags(CPUX86State *env)
 {
     uint8_t old_flags = get_float_exception_flags(&env->fp_status);
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
     || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
-    || defined(TARGET_SPARC) || defined(TARGET_XTENSA)
+    || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
+    || defined(TARGET_I386)
         g_assert_not_reached();
 #else
         rule = float_2nan_prop_x87;
-- 
2.34.1

Set the NaN propagation rule explicitly for the float_status word
used in this target.

This is a no-behaviour-change commit, so we retain the existing
behaviour of x87-style pick-largest-significand NaN propagation.
This is however not the architecturally correct handling, so we leave
a TODO note to that effect.

We also leave a TODO note pointing out that all this code in the cpu
initfn (including the existing setting up of env->flags and the FPCR)
should be in a currently non-existent CPU reset function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-17-peter.maydell@linaro.org
---
 target/alpha/cpu.c             | 11 +++++++++++
 fpu/softfloat-specialize.c.inc |  2 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/target/alpha/cpu.c b/target/alpha/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/alpha/cpu.c
+++ b/target/alpha/cpu.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/qemu-print.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
+#include "fpu/softfloat.h"
 
 
 static void alpha_cpu_set_pc(CPUState *cs, vaddr value)
@@ -XXX,XX +XXX,XX @@ static void alpha_cpu_initfn(Object *obj)
 {
     CPUAlphaState *env = cpu_env(CPU(obj));
 
+    /* TODO all this should be done in reset, not init */
+
     env->lock_addr = -1;
+
+    /*
+     * TODO: this is incorrect. The Alpha Architecture Handbook version 4
+     * describes NaN propagation in section 4.7.10.4. We should prefer
+     * the operand in Fb (whether it is a QNaN or an SNaN), then the
+     * operand in Fa. That is float_2nan_prop_ba.
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
 #if defined(CONFIG_USER_ONLY)
     env->flags = ENV_FLAG_PS_USER | ENV_FLAG_FEN;
     cpu_alpha_store_fpcr(env, (uint64_t)(FPCR_INVD | FPCR_DZED | FPCR_OVFD
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
     || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
-    || defined(TARGET_I386)
+    || defined(TARGET_I386) || defined(TARGET_ALPHA)
         g_assert_not_reached();
 #else
         rule = float_2nan_prop_x87;
-- 
2.34.1

Although the floating point rounding mode for Microblaze is always
nearest-even, we cannot set it just once in the CPU initfn.  This is
because env->fp_status is in the part of the CPU state struct that is
zeroed on reset.

Move the call to set_float_rounding_mode() into the reset fn.

(This had no guest-visible effects because it happens that the
float_round_nearest_even enum value is 0, so when the struct was
zeroed it didn't corrupt the setting.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-18-peter.maydell@linaro.org
---
 target/microblaze/cpu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -XXX,XX +XXX,XX @@ static void mb_cpu_reset_hold(Object *obj, ResetType type)
 
     env->pc = cpu->cfg.base_vectors;
 
+    set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
+
 #if defined(CONFIG_USER_ONLY)
     /* start in user mode with interrupts enabled.  */
     mb_cpu_write_msr(env, MSR_EE | MSR_IE | MSR_VM | MSR_UM);
@@ -XXX,XX +XXX,XX @@ static void mb_cpu_realizefn(DeviceState *dev, Error **errp)
 static void mb_cpu_initfn(Object *obj)
 {
     MicroBlazeCPU *cpu = MICROBLAZE_CPU(obj);
-    CPUMBState *env = &cpu->env;
 
     gdb_register_coprocessor(CPU(cpu), mb_cpu_gdb_read_stack_protect,
                              mb_cpu_gdb_write_stack_protect,
                              gdb_find_static_feature("microblaze-stack-protect.xml"),
                              0);
 
-    set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
-
 #ifndef CONFIG_USER_ONLY
     /* Inbound IRQ and FIR lines */
     qdev_init_gpio_in(DEVICE(cpu), microblaze_cpu_set_irq, 2);
-- 
2.34.1

Set the NaN propagation rule explicitly for the float_status word
used in the microblaze target.

This is probably not the architecturally correct behaviour,
but since this is a no-behaviour-change patch, we leave a
TODO note to that effect.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-19-peter.maydell@linaro.org
---
 target/microblaze/cpu.c        | 5 +++++
 fpu/softfloat-specialize.c.inc | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -XXX,XX +XXX,XX @@ static void mb_cpu_reset_hold(Object *obj, ResetType type)
     env->pc = cpu->cfg.base_vectors;
 
     set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
+    /*
+     * TODO: this is probably not the correct NaN propagation rule for
+     * this architecture.
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
 
 #if defined(CONFIG_USER_ONLY)
     /* start in user mode with interrupts enabled.  */
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
     || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
     || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
-    || defined(TARGET_I386) || defined(TARGET_ALPHA)
+    || defined(TARGET_I386) || defined(TARGET_ALPHA) \
+    || defined(TARGET_MICROBLAZE)
         g_assert_not_reached();
 #else
         rule = float_2nan_prop_x87;
-- 
2.34.1

Set the NaN propagation rule explicitly for the float_status word
used in the openrisc target.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-20-peter.maydell@linaro.org
---
 target/openrisc/cpu.c          | 6 ++++++
 fpu/softfloat-specialize.c.inc | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/openrisc/cpu.c b/target/openrisc/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/openrisc/cpu.c
+++ b/target/openrisc/cpu.c
@@ -XXX,XX +XXX,XX @@ static void openrisc_cpu_reset_hold(Object *obj, ResetType type)
 
     set_float_detect_tininess(float_tininess_before_rounding,
                               &cpu->env.fp_status);
+    /*
+     * TODO: this is probably not the correct NaN propagation rule for
+     * this architecture.
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_x87, &cpu->env.fp_status);
+
 
 #ifndef CONFIG_USER_ONLY
     cpu->env.picmr = 0x00000000;
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
     || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
     || defined(TARGET_I386) || defined(TARGET_ALPHA) \
-    || defined(TARGET_MICROBLAZE)
+    || defined(TARGET_MICROBLAZE) || defined(TARGET_OPENRISC)
         g_assert_not_reached();
 #else
         rule = float_2nan_prop_x87;
-- 
2.34.1

Set the NaN propagation rule explicitly for the float_status word
used in the rx target.

This is not the architecturally correct behaviour, but since this is a
no-behaviour-change patch, we leave a TODO note to that effect.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-21-peter.maydell@linaro.org
---
 target/rx/cpu.c                | 7 +++++++
 fpu/softfloat-specialize.c.inc | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/rx/cpu.c b/target/rx/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/rx/cpu.c
+++ b/target/rx/cpu.c
@@ -XXX,XX +XXX,XX @@ static void rx_cpu_reset_hold(Object *obj, ResetType type)
     env->fpsw = 0;
     set_flush_to_zero(1, &env->fp_status);
     set_flush_inputs_to_zero(1, &env->fp_status);
+    /*
+     * TODO: this is not the correct NaN propagation rule for this
+     * architecture. The "RX Family User's Manual: Software" table 1.6
+     * defines the propagation rules as "prefer SNaN over QNaN;
+     * then prefer dest over source", which is float_2nan_prop_s_ab.
+     */
+    set_float_2nan_prop_rule(float_2nan_prop_x87, &env->fp_status);
 }
 
 static ObjectClass *rx_cpu_class_by_name(const char *cpu_model)
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ static int pickNaN(FloatClass a_cls, FloatClass b_cls,
     || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
     || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
     || defined(TARGET_I386) || defined(TARGET_ALPHA) \
-    || defined(TARGET_MICROBLAZE) || defined(TARGET_OPENRISC)
+    || defined(TARGET_MICROBLAZE) || defined(TARGET_OPENRISC) \
+    || defined(TARGET_RX)
         g_assert_not_reached();
 #else
         rule = float_2nan_prop_x87;
-- 
2.34.1

Now that all targets have been converted to explicitly set a NaN
propagation rule, we can remove the set of target ifdefs (which now
list every target) and clean up the references to fallback behaviour
for float_2nan_prop_none.

The "default" case in the switch will catch any remaining places
where status->float_2nan_prop_rule was not set by the target.
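
The resulting shape can be sketched as below. This is a simplified model, not the softfloat code: the enum, struct and function names are hypothetical, only two rules are shown, and abort() stands in for whatever the real default case does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical subset of the rule enum; NONE is what zeroed state holds. */
typedef enum {
    DEMO_2NAN_NONE = 0,
    DEMO_2NAN_AB,
    DEMO_2NAN_BA
} demo_2nan_rule;

/* Hypothetical stand-in for float_status. */
typedef struct {
    bool default_nan_mode;
    demo_2nan_rule rule;
} demo_fp_status;

/* Stand-in for set_float_2nan_prop_rule(): the target calls this once. */
static void demo_set_2nan_rule(demo_2nan_rule rule, demo_fp_status *s)
{
    s->rule = rule;
}

/* Returns 0 to propagate operand A, 1 to propagate operand B. */
static int demo_pick_nan(bool a_is_nan, bool b_is_nan, const demo_fp_status *s)
{
    /* In default-NaN mode no operand is ever picked, so no rule is needed. */
    assert(!s->default_nan_mode);

    switch (s->rule) {
    case DEMO_2NAN_AB:
        return a_is_nan ? 0 : 1;
    case DEMO_2NAN_BA:
        return b_is_nan ? 1 : 0;
    default:
        /* No per-target fallback any more: an unset rule is a bug. */
        abort();
    }
}
```

The design point is that a target which forgets to set the rule fails loudly the first time two-operand NaN handling is needed, instead of silently inheriting another architecture's behaviour.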

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241025141254.2141506-22-peter.maydell@linaro.org
---
 include/fpu/softfloat-types.h  | 10 +++-------
 fpu/softfloat-specialize.c.inc | 23 +++--------------------
 2 files changed, 6 insertions(+), 27 deletions(-)

diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h
index XXXXXXX..XXXXXXX 100644
--- a/include/fpu/softfloat-types.h
+++ b/include/fpu/softfloat-types.h
@@ -XXX,XX +XXX,XX @@ typedef enum __attribute__((__packed__)) {
  * If default_nan_mode is enabled then it is valid not to set a
  * NaN propagation rule, because the softfloat code guarantees
  * not to try to pick a NaN to propagate in default NaN mode.
- *
- * For transition, currently the 'none' rule will cause us to
- * fall back to picking the propagation rule based on the existing
- * ifdef ladder. When all targets are converted it will be an error
- * not to set the rule in float_status unless in default_nan_mode,
- * and we will assert if we need to handle an input NaN and no
- * rule was selected.
+ * When not in default-NaN mode, it is an error for the target
+ * not to set the rule in float_status, and we will assert if
+ * we need to handle an input NaN and no rule was selected.
  */
 typedef enum __attribute__((__packed__)) {
     /* No propagation rule specified */
diff --git a/fpu/softfloat-specialize.c.inc b/fpu/softfloat-specialize.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat-specialize.c.inc
+++ b/fpu/softfloat-specialize.c.inc
@@ -XXX,XX +XXX,XX @@ bool float32_is_signaling_nan(float32 a_, float_status *status)
 static int pickNaN(FloatClass a_cls, FloatClass b_cls,
                    bool aIsLargerSignificand, float_status *status)
 {
-    Float2NaNPropRule rule = status->float_2nan_prop_rule;
-
     /*
      * We guarantee not to require the target to tell us how to
      * pick a NaN if we're always returning the default NaN.
+     * But if we're not in default-NaN mode then the target must
+     * specify via set_float_2nan_prop_rule().
      */
     assert(!status->default_nan_mode);
 
-    if (rule == float_2nan_prop_none) {
-        /* target didn't set the rule: fall back to old ifdef choices */
-#if defined(TARGET_AVR) || defined(TARGET_HEXAGON) \
-    || defined(TARGET_RISCV) || defined(TARGET_SH4) \
-    || defined(TARGET_TRICORE) || defined(TARGET_ARM) || defined(TARGET_MIPS) \
-    || defined(TARGET_LOONGARCH64) || defined(TARGET_HPPA) \
-    || defined(TARGET_S390X) || defined(TARGET_PPC) || defined(TARGET_M68K) \
-    || defined(TARGET_SPARC) || defined(TARGET_XTENSA) \
-    || defined(TARGET_I386) || defined(TARGET_ALPHA) \
-    || defined(TARGET_MICROBLAZE) || defined(TARGET_OPENRISC) \
-    || defined(TARGET_RX)
-        g_assert_not_reached();
-#else
-        rule = float_2nan_prop_x87;
-#endif
-    }
-
-    switch (rule) {
+    switch (status->float_2nan_prop_rule) {
     case float_2nan_prop_s_ab:
         if (is_snan(a_cls)) {
             return 0;
-- 
2.34.1

This reverts commit 4c2c0474693229c1f533239bb983495c5427784d.

This commit tried to fix a problem with our usage of MMU indexes when
EL3 is AArch32, using what it described as a "more complicated
approach" where we share the same MMU index values for Secure PL1&0
and NonSecure PL1&0. In theory this should work, but the change
didn't account for (at least) two things:

(1) The design change means we need to flush the TLBs at any point
where the CPU state flips from one to the other.  We already flush
the TLB when SCR.NS is changed, but we don't flush the TLB when we
take an exception from NS PL1&0 into Mon or when we return from Mon
to NS PL1&0, and the commit didn't add any code to do that.

(2) The ATS12NS* address translate instructions allow Mon code (which
is Secure) to do a stage 1+2 page table walk for NS.  I thought this
was OK because do_ats_write() does a page table walk which doesn't
use the TLBs, so because it can pass both the MMU index and also an
ARMSecuritySpace argument we can tell the table walk that we want NS
stage1+2, not S.  But that means that all the code within the ptw
that needs to find e.g. the regime EL cannot do so with only an
mmu_idx -- all these functions like regime_sctlr(), regime_el(), etc
would need to be passed both an mmu_idx and the security_space, so they
can tell whether this is a translation regime controlled by EL1 or
EL3 (and so whether to look at SCTLR.S or SCTLR.NS, etc).

In particular, because regime_el() wasn't updated to look at the
ARMSecuritySpace it would return 1 even when the CPU was in Monitor
mode (and the controlling EL is 3).  This meant that page table walks
in Monitor mode would look at the wrong SCTLR, TCR, etc and would
generally fault when they should not.
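
The ambiguity described above can be sketched in isolation. This is
an illustrative model only, not QEMU code: SketchIdx, SketchSpace and
both functions are hypothetical stand-ins for ARMMMUIdx,
ARMSecuritySpace and regime_el(), and the model assumes a single
shared index for Secure and NonSecure PL1&0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the real QEMU types. */
typedef enum { SK_IDX_PL10_SHARED, SK_IDX_EL2 } SketchIdx;
typedef enum { SK_NONSECURE, SK_SECURE } SketchSpace;

/*
 * With a shared index, the index alone cannot distinguish a
 * Secure walk (controlled by EL3) from a NonSecure one
 * (controlled by EL1), so this always answers 1.
 */
static int regime_el_idx_only(SketchIdx idx)
{
    return idx == SK_IDX_EL2 ? 2 : 1;
}

/* What every caller would have had to supply instead. */
static int regime_el_with_space(SketchIdx idx, SketchSpace ss,
                                bool el3_is_aa32)
{
    if (idx == SK_IDX_EL2) {
        /* As in the diff's g_assert: Secure EL2 is 64-bit only. */
        assert(!(el3_is_aa32 && ss == SK_SECURE));
        return 2;
    }
    return (el3_is_aa32 && ss == SK_SECURE) ? 3 : 1;
}
```

In this model a Monitor-mode walk through the shared index gets 1
from the first function and 3 from the second, which is the mismatch
that made Monitor-mode walks consult the wrong SCTLR and TCR.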

Rather than trying to make the complicated changes needed to rescue
the design of 4c2c04746932, we revert it in order to instead take the
route that that commit describes as "the most straightforward" fix,
where we add new MMU indexes E30_0, E30_3, E30_3_PAN to correspond
to "Secure PL1&0 at PL0", "Secure PL1&0 at PL1", and "Secure PL1&0 at
PL1 with PAN".

This revert will re-expose the "spurious alignment faults in
Secure PL0" issue #2326; we'll fix it again in the next commit.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-id: 20241101142845.1712482-2-peter.maydell@linaro.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h               | 31 +++++++++++++------------------
 target/arm/internals.h         | 27 ++++-----------------------
 target/arm/tcg/translate.h     |  2 --
 target/arm/helper.c            | 34 +++++++++++-----------------------
 target/arm/ptw.c               |  6 +-----
 target/arm/tcg/hflags.c        |  4 ----
 target/arm/tcg/translate-a64.c |  2 +-
 target/arm/tcg/translate.c     |  9 ++++-----
 8 files changed, 34 insertions(+), 81 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  *  + NonSecure PL1 & 0 stage 1
  *  + NonSecure PL1 & 0 stage 2
  *  + NonSecure PL2
- *  + Secure PL1 & 0
+ *  + Secure PL0
+ *  + Secure PL1
  * (reminder: for 32 bit EL3, Secure PL1 is *EL3*, not EL1.)
  *
  * For QEMU, an mmu_idx is not quite the same as a translation regime because:
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  *     The only use of stage 2 translations is either as part of an s1+2
  *     lookup or when loading the descriptors during a stage 1 page table walk,
  *     and in both those cases we don't use the TLB.
- *  4. we want to be able to use the TLB for accesses done as part of a
+ *  4. we can also safely fold together the "32 bit EL3" and "64 bit EL3"
+ *     translation regimes, because they map reasonably well to each other
+ *     and they can't both be active at the same time.
+ *  5. we want to be able to use the TLB for accesses done as part of a
  *     stage1 page table walk, rather than having to walk the stage2 page
  *     table over and over.
- *  5. we need separate EL1/EL2 mmu_idx for handling the Privileged Access
+ *  6. we need separate EL1/EL2 mmu_idx for handling the Privileged Access
  *     Never (PAN) bit within PSTATE.
- *  6. we fold together most secure and non-secure regimes for A-profile,
+ *  7. we fold together most secure and non-secure regimes for A-profile,
  *     because there are no banked system registers for aarch64, so the
  *     process of switching between secure and non-secure is
  *     already heavyweight.
- *  7. we cannot fold together Stage 2 Secure and Stage 2 NonSecure,
+ *  8. we cannot fold together Stage 2 Secure and Stage 2 NonSecure,
  *     because both are in use simultaneously for Secure EL2.
  *
  * This gives us the following list of cases:
  *
- * EL0 EL1&0 stage 1+2 (or AArch32 PL0 PL1&0 stage 1+2)
- * EL1 EL1&0 stage 1+2 (or AArch32 PL1 PL1&0 stage 1+2)
- * EL1 EL1&0 stage 1+2 +PAN (or AArch32 PL1 PL1&0 stage 1+2 +PAN)
+ * EL0 EL1&0 stage 1+2 (aka NS PL0)
+ * EL1 EL1&0 stage 1+2 (aka NS PL1)
+ * EL1 EL1&0 stage 1+2 +PAN
  * EL0 EL2&0
  * EL2 EL2&0
  * EL2 EL2&0 +PAN
  * EL2 (aka NS PL2)
- * EL3 (not used when EL3 is AArch32)
+ * EL3 (aka S PL1)
  * Stage2 Secure
  * Stage2 NonSecure
  * plus one TLB per Physical address space: S, NS, Realm, Root
  *
  * for a total of 14 different mmu_idx.
  *
- * Note that when EL3 is AArch32, the usage is potentially confusing
- * because the MMU indexes are named for their AArch64 use, so code
- * using the ARMMMUIdx_E10_1 might be at EL3, not EL1. This is because
- * Secure PL1 is always at EL3.
- *
  * R profile CPUs have an MPU, but can use the same set of MMU indexes
  * as A profile. They only need to distinguish EL0 and EL1 (and
  * EL2 for cores like the Cortex-R52).
@@ -XXX,XX +XXX,XX @@ FIELD(TBFLAG_A32, NS, 10, 1)
  * This requires an SME trap from AArch32 mode when using NEON.
  */
 FIELD(TBFLAG_A32, SME_TRAP_NONSTREAMING, 11, 1)
-/*
- * Indicates whether we are in the Secure PL1&0 translation regime
- */
-FIELD(TBFLAG_A32, S_PL1_0, 12, 1)
 
 /*
  * Bit usage when in AArch32 state, for M-profile only.
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ FIELD(CNTHCTL, CNTPMASK, 19, 1)
 #define M_FAKE_FSR_NSC_EXEC 0xf /* NS executing in S&NSC memory */
 #define M_FAKE_FSR_SFAULT 0xe /* SecureFault INVTRAN, INVEP or AUVIOL */
 
-/**
- * arm_aa32_secure_pl1_0(): Return true if in Secure PL1&0 regime
- *
- * Return true if the CPU is in the Secure PL1&0 translation regime.
- * This requires that EL3 exists and is AArch32 and we are currently
- * Secure. If this is the case then the ARMMMUIdx_E10* apply and
- * mean we are in EL3, not EL1.
- */
-static inline bool arm_aa32_secure_pl1_0(CPUARMState *env)
-{
-    return arm_feature(env, ARM_FEATURE_EL3) &&
-        !arm_el_is_aa64(env, 3) && arm_is_secure(env);
-}
-
 /**
  * raise_exception: Raise the specified exception.
  * Raise a guest exception with the specified value, syndrome register
@@ -XXX,XX +XXX,XX @@ static inline ARMMMUIdx core_to_aa64_mmu_idx(int mmu_idx)
     return mmu_idx | ARM_MMU_IDX_A;
 }
 
-/**
- * Return the exception level we're running at if our current MMU index
- * is @mmu_idx. @s_pl1_0 should be true if this is the AArch32
- * Secure PL1&0 translation regime.
- */
-int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx, bool s_pl1_0);
+int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx);
 
 /* Return the MMU index for a v7M CPU in the specified security state */
 ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate);
@@ -XXX,XX +XXX,XX @@ static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
         return 3;
     case ARMMMUIdx_E10_0:
     case ARMMMUIdx_Stage1_E0:
-    case ARMMMUIdx_E10_1:
-    case ARMMMUIdx_E10_1_PAN:
+        return arm_el_is_aa64(env, 3) || !arm_is_secure_below_el3(env) ? 1 : 3;
     case ARMMMUIdx_Stage1_E1:
     case ARMMMUIdx_Stage1_E1_PAN:
-        return arm_el_is_aa64(env, 3) || !arm_is_secure_below_el3(env) ? 1 : 3;
+    case ARMMMUIdx_E10_1:
+    case ARMMMUIdx_E10_1_PAN:
     case ARMMMUIdx_MPrivNegPri:
     case ARMMMUIdx_MUserNegPri:
     case ARMMMUIdx_MPriv:
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -XXX,XX +XXX,XX @@ typedef struct DisasContext {
     uint8_t gm_blocksize;
     /* True if the current insn_start has been updated. */
     bool insn_start_updated;
-    /* True if this is the AArch32 Secure PL1&0 translation regime */
-    bool s_pl1_0;
     /* Bottom two bits of XScale c15_cpar coprocessor access control reg */
     int c15_cpar;
     /* Offset from VNCR_EL2 when FEAT_NV2 redirects this reg to memory */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
          */
         format64 = arm_s1_regime_using_lpae_format(env, mmu_idx);
 
-        if (arm_feature(env, ARM_FEATURE_EL2) && !arm_aa32_secure_pl1_0(env)) {
+        if (arm_feature(env, ARM_FEATURE_EL2)) {
             if (mmu_idx == ARMMMUIdx_E10_0 ||
                 mmu_idx == ARMMMUIdx_E10_1 ||
                 mmu_idx == ARMMMUIdx_E10_1_PAN) {
@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
     case 0:
         /* stage 1 current state PL1: ATS1CPR, ATS1CPW, ATS1CPRP, ATS1CPWP */
         switch (el) {
+        case 3:
+            mmu_idx = ARMMMUIdx_E3;
+            break;
         case 2:
             g_assert(ss != ARMSS_Secure);  /* ARMv8.4-SecEL2 is 64-bit only */
             /* fall through */
         case 1:
-        case 3:
             if (ri->crm == 9 && arm_pan_enabled(env)) {
                 mmu_idx = ARMMMUIdx_Stage1_E1_PAN;
             } else {
@@ -XXX,XX +XXX,XX @@ void arm_cpu_do_interrupt(CPUState *cs)
 
 uint64_t arm_sctlr(CPUARMState *env, int el)
 {
-    if (arm_aa32_secure_pl1_0(env)) {
-        /* In Secure PL1&0 SCTLR_S is always controlling */
-        el = 3;
-    } else if (el == 0) {
-        /* Only EL0 needs to be adjusted for EL1&0 or EL2&0. */
+    /* Only EL0 needs to be adjusted for EL1&0 or EL2&0. */
+    if (el == 0) {
         ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, 0);
         el = mmu_idx == ARMMMUIdx_E20_0 ? 2 : 1;
     }
@@ -XXX,XX +XXX,XX @@ int fp_exception_el(CPUARMState *env, int cur_el)
     return 0;
 }
 
-/*
- * Return the exception level we're running at if this is our mmu_idx.
- * s_pl1_0 should be true if this is the AArch32 Secure PL1&0 translation
- * regime.
- */
-int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx, bool s_pl1_0)
+/* Return the exception level we're running at if this is our mmu_idx */
+int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx)
 {
     if (mmu_idx & ARM_MMU_IDX_M) {
         return mmu_idx & ARM_MMU_IDX_M_PRIV;
@@ -XXX,XX +XXX,XX @@ int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx, bool s_pl1_0)
         return 0;
     case ARMMMUIdx_E10_1:
     case ARMMMUIdx_E10_1_PAN:
-        return s_pl1_0 ? 3 : 1;
+        return 1;
     case ARMMMUIdx_E2:
     case ARMMMUIdx_E20_2:
     case ARMMMUIdx_E20_2_PAN:
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
             idx = ARMMMUIdx_E10_0;
         }
         break;
-    case 3:
-        /*
-         * AArch64 EL3 has its own translation regime; AArch32 EL3
-         * uses the Secure PL1&0 translation regime.
-         */
-        if (arm_el_is_aa64(env, 3)) {
-            return ARMMMUIdx_E3;
-        }
-        /* fall through */
     case 1:
         if (arm_pan_enabled(env)) {
             idx = ARMMMUIdx_E10_1_PAN;
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
             idx = ARMMMUIdx_E2;
         }
         break;
+    case 3:
+        return ARMMMUIdx_E3;
     default:
         g_assert_not_reached();
     }
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, vaddr address,
     case ARMMMUIdx_Stage1_E1:
     case ARMMMUIdx_Stage1_E1_PAN:
     case ARMMMUIdx_E2:
-        if (arm_aa32_secure_pl1_0(env)) {
-            ss = ARMSS_Secure;
-        } else {
-            ss = arm_security_space_below_el3(env);
-        }
+        ss = arm_security_space_below_el3(env);
         break;
     case ARMMMUIdx_Stage2:
         /*
diff --git a/target/arm/tcg/hflags.c b/target/arm/tcg/hflags.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/hflags.c
+++ b/target/arm/tcg/hflags.c
@@ -XXX,XX +XXX,XX @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
         DP_TBFLAG_A32(flags, SME_TRAP_NONSTREAMING, 1);
     }
 
-    if (arm_aa32_secure_pl1_0(env)) {
-        DP_TBFLAG_A32(flags, S_PL1_0, 1);
-    }
-
     return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
     dc->tbii = EX_TBFLAG_A64(tb_flags, TBII);
     dc->tbid = EX_TBFLAG_A64(tb_flags, TBID);
     dc->tcma = EX_TBFLAG_A64(tb_flags, TCMA);
-    dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx, false);
+    dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx);
 #if !defined(CONFIG_USER_ONLY)
     dc->user = (dc->current_el == 0);
 #endif
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
 
     core_mmu_idx = EX_TBFLAG_ANY(tb_flags, MMUIDX);
     dc->mmu_idx = core_to_arm_mmu_idx(env, core_mmu_idx);
+    dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx);
+#if !defined(CONFIG_USER_ONLY)
+    dc->user = (dc->current_el == 0);
+#endif
     dc->fp_excp_el = EX_TBFLAG_ANY(tb_flags, FPEXC_EL);
     dc->align_mem = EX_TBFLAG_ANY(tb_flags, ALIGN_MEM);
     dc->pstate_il = EX_TBFLAG_ANY(tb_flags, PSTATE__IL);
@@ -XXX,XX +XXX,XX @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
         }
         dc->sme_trap_nonstreaming =
             EX_TBFLAG_A32(tb_flags, SME_TRAP_NONSTREAMING);
-        dc->s_pl1_0 = EX_TBFLAG_A32(tb_flags, S_PL1_0);
     }
-    dc->current_el = arm_mmu_idx_to_el(dc->mmu_idx, dc->s_pl1_0);
-#if !defined(CONFIG_USER_ONLY)
-    dc->user = (dc->current_el == 0);
-#endif
     dc->lse2 = false; /* applies only to aarch64 */
     dc->cp_regs = cpu->cp_regs;
     dc->features = env->features;
-- 
2.34.1

Our current usage of MMU indexes when EL3 is AArch32 is confused.
Architecturally, when EL3 is AArch32, all Secure code runs under the
Secure PL1&0 translation regime:
 * code at EL3, which might be Mon, or SVC, or any of the
   other privileged modes (PL1)
 * code at EL0 (Secure PL0)

This is different from when EL3 is AArch64, in which case EL3 is its
own translation regime, and EL1 and EL0 (whether AArch32 or AArch64)
have their own regime.

We claimed to be mapping Secure PL1 to our ARMMMUIdx_E3, but didn't
do anything special about Secure PL0, which meant it used the same
ARMMMUIdx_E10_0 that NonSecure PL0 does.  This resulted in a bug
where arm_sctlr() incorrectly picked the NonSecure SCTLR as the
controlling register when in Secure PL0, which meant we were
spuriously generating alignment faults because we were looking at the
wrong SCTLR control bits.

The use of ARMMMUIdx_E3 for Secure PL1 also resulted in the bug that
we wouldn't honour the PAN bit for Secure PL1, because there's no
equivalent _PAN mmu index for it.

Fix this by adding two new MMU indexes:
 * ARMMMUIdx_E30_0 is for Secure PL0
 * ARMMMUIdx_E30_3_PAN is for Secure PL1 when PAN is enabled
The existing ARMMMUIdx_E3 is used to mean "Secure PL1 without PAN"
(and would be named ARMMMUIdx_E30_3 in an AArch32-centric scheme).

These extra two indexes bring us up to the maximum of 16 that the
core code can currently support.
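
As a rough sketch of how the new Secure PL0 index feeds into SCTLR
selection: the names below (SketchEl0Idx, sketch_el0_idx,
sketch_sctlr_el) are hypothetical and the CPU state is reduced to
three booleans, so this only mirrors the order of the checks in
arm_mmu_idx_el() and arm_sctlr() for EL0, not the real functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three "running at EL0/PL0" indexes. */
typedef enum {
    SK_E10_0,   /* EL0 in the EL1&0 regime (NS PL0 for AArch32 EL3) */
    SK_E20_0,   /* EL0 in the EL2&0 regime */
    SK_E30_0,   /* Secure PL0 when EL3 is AArch32 */
} SketchEl0Idx;

/* Pick the EL0 index, checking EL2&0 first, then Secure PL1&0. */
static SketchEl0Idx sketch_el0_idx(bool e2h_and_tge,
                                   bool secure_below_el3,
                                   bool el3_is_aa64)
{
    if (e2h_and_tge) {
        return SK_E20_0;
    }
    if (secure_below_el3 && !el3_is_aa64) {
        return SK_E30_0;
    }
    return SK_E10_0;
}

/* Which EL's SCTLR controls an access made from EL0. */
static int sketch_sctlr_el(SketchEl0Idx idx)
{
    int el;

    switch (idx) {
    case SK_E20_0:
        el = 2;
        break;
    case SK_E30_0:
        el = 3;
        break;
    default:
        el = 1;
        break;
    }
    /* The result indexes a per-EL SCTLR array, so it must be 1..3. */
    assert(el >= 1 && el <= 3);
    return el;
}
```

With a distinct index for Secure PL0, code in that state resolves to
EL 3 here rather than falling through to the NonSecure default of 1.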

This commit:
 * adds the new MMU index handling to the various places
   where we deal in MMU index values
 * adds assertions that we aren't AArch32 EL3 in a couple of
   places that currently use the E10 indexes, to document why
   they don't also need to handle the E30 indexes
 * documents in a comment why regime_has_2_ranges() doesn't need
   updating

Notes for backporting: this commit depends on the preceding revert of
4c2c04746932; that revert and this commit should probably be
backported to everywhere that we originally backported 4c2c04746932.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2326
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2588
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241101142845.1712482-3-peter.maydell@linaro.org
---
 target/arm/cpu.h           | 31 ++++++++++++++++++-------------
 target/arm/internals.h     | 16 ++++++++++++++--
 target/arm/helper.c        | 38 ++++++++++++++++++++++++++++++++++----
 target/arm/ptw.c           |  4 ++++
 target/arm/tcg/op_helper.c | 14 +++++++++++++-
 target/arm/tcg/translate.c |  3 +++
 6 files changed, 86 insertions(+), 20 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  *  + NonSecure PL1 & 0 stage 1
  *  + NonSecure PL1 & 0 stage 2
  *  + NonSecure PL2
- *  + Secure PL0
- *  + Secure PL1
+ *  + Secure PL1 & 0
  * (reminder: for 32 bit EL3, Secure PL1 is *EL3*, not EL1.)
  *
  * For QEMU, an mmu_idx is not quite the same as a translation regime because:
@@ -XXX,XX +XXX,XX @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  *
  * This gives us the following list of cases:
  *
- * EL0 EL1&0 stage 1+2 (aka NS PL0)
- * EL1 EL1&0 stage 1+2 (aka NS PL1)
- * EL1 EL1&0 stage 1+2 +PAN
+ * EL0 EL1&0 stage 1+2 (aka NS PL0 PL1&0 stage 1+2)
+ * EL1 EL1&0 stage 1+2 (aka NS PL1 PL1&0 stage 1+2)
+ * EL1 EL1&0 stage 1+2 +PAN (aka NS PL1 PL1&0 stage 1+2 +PAN)
  * EL0 EL2&0
  * EL2 EL2&0
  * EL2 EL2&0 +PAN
  * EL2 (aka NS PL2)
- * EL3 (aka S PL1)
+ * EL3 (aka AArch32 S PL1 PL1&0)
+ * AArch32 S PL0 PL1&0 (we call this E30_0)
+ * AArch32 S PL1 PL1&0 +PAN (we call this E30_3_PAN)
  * Stage2 Secure
  * Stage2 NonSecure
  * plus one TLB per Physical address space: S, NS, Realm, Root
  *
- * for a total of 14 different mmu_idx.
+ * for a total of 16 different mmu_idx.
  *
  * R profile CPUs have an MPU, but can use the same set of MMU indexes
  * as A profile. They only need to distinguish EL0 and EL1 (and
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
     ARMMMUIdx_E20_2_PAN = 5 | ARM_MMU_IDX_A,
     ARMMMUIdx_E2        = 6 | ARM_MMU_IDX_A,
     ARMMMUIdx_E3        = 7 | ARM_MMU_IDX_A,
+    ARMMMUIdx_E30_0     = 8 | ARM_MMU_IDX_A,
+    ARMMMUIdx_E30_3_PAN = 9 | ARM_MMU_IDX_A,
 
     /*
      * Used for second stage of an S12 page table walk, or for descriptor
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdx {
      * are in use simultaneously for SecureEL2: the security state for
      * the S2 ptw is selected by the NS bit from the S1 ptw.
      */
-    ARMMMUIdx_Stage2_S  = 8 | ARM_MMU_IDX_A,
-    ARMMMUIdx_Stage2    = 9 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Stage2_S  = 10 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Stage2    = 11 | ARM_MMU_IDX_A,
 
     /* TLBs with 1-1 mapping to the physical address spaces. */
-    ARMMMUIdx_Phys_S     = 10 | ARM_MMU_IDX_A,
-    ARMMMUIdx_Phys_NS    = 11 | ARM_MMU_IDX_A,
-    ARMMMUIdx_Phys_Root  = 12 | ARM_MMU_IDX_A,
-    ARMMMUIdx_Phys_Realm = 13 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Phys_S     = 12 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Phys_NS    = 13 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Phys_Root  = 14 | ARM_MMU_IDX_A,
+    ARMMMUIdx_Phys_Realm = 15 | ARM_MMU_IDX_A,
 
     /*
      * These are not allocated TLBs and are used only for AT system
@@ -XXX,XX +XXX,XX @@ typedef enum ARMMMUIdxBit {
     TO_CORE_BIT(E20_2),
     TO_CORE_BIT(E20_2_PAN),
     TO_CORE_BIT(E3),
+    TO_CORE_BIT(E30_0),
+    TO_CORE_BIT(E30_3_PAN),
     TO_CORE_BIT(Stage2),
     TO_CORE_BIT(Stage2_S),
 
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ static inline void arm_call_el_change_hook(ARMCPU *cpu)
     }
 }
 
-/* Return true if this address translation regime has two ranges.  */
+/*
+ * Return true if this address translation regime has two ranges.
+ * Note that this will not return the correct answer for AArch32
+ * Secure PL1&0 (i.e. mmu indexes E3, E30_0, E30_3_PAN), but it is
+ * never called from a context where EL3 can be AArch32. (The
+ * correct return value for ARMMMUIdx_E3 would be different for
+ * that case, so we can't just make the function return the
+ * correct value anyway; we would need an extra "bool e3_is_aarch32"
+ * argument which all the current callsites would pass as 'false'.)
+ */
 static inline bool regime_has_2_ranges(ARMMMUIdx mmu_idx)
 {
     switch (mmu_idx) {
@@ -XXX,XX +XXX,XX @@ static inline bool regime_is_pan(CPUARMState *env, ARMMMUIdx mmu_idx)
     case ARMMMUIdx_Stage1_E1_PAN:
     case ARMMMUIdx_E10_1_PAN:
     case ARMMMUIdx_E20_2_PAN:
+    case ARMMMUIdx_E30_3_PAN:
         return true;
     default:
         return false;
@@ -XXX,XX +XXX,XX @@ static inline uint32_t regime_el(CPUARMState *env, ARMMMUIdx mmu_idx)
     case ARMMMUIdx_E2:
         return 2;
     case ARMMMUIdx_E3:
+    case ARMMMUIdx_E30_0:
+    case ARMMMUIdx_E30_3_PAN:
         return 3;
     case ARMMMUIdx_E10_0:
     case ARMMMUIdx_Stage1_E0:
-        return arm_el_is_aa64(env, 3) || !arm_is_secure_below_el3(env) ? 1 : 3;
     case ARMMMUIdx_Stage1_E1:
     case ARMMMUIdx_Stage1_E1_PAN:
     case ARMMMUIdx_E10_1:
@@ -XXX,XX +XXX,XX @@ static inline bool regime_is_user(CPUARMState *env, ARMMMUIdx mmu_idx)
     switch (mmu_idx) {
     case ARMMMUIdx_E10_0:
     case ARMMMUIdx_E20_0:
+    case ARMMMUIdx_E30_0:
     case ARMMMUIdx_Stage1_E0:
     case ARMMMUIdx_MUser:
     case ARMMMUIdx_MSUser:
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static int alle1_tlbmask(CPUARMState *env)
      * Note that the 'ALL' scope must invalidate both stage 1 and
      * stage 2 translations, whereas most other scopes only invalidate
      * stage 1 translations.
+     *
+     * For AArch32 this is only used for TLBIALLNSNH and VTTBR
+     * writes, so only needs to apply to NS PL1&0, not S PL1&0.
      */
     return (ARMMMUIdxBit_E10_1 |
             ARMMMUIdxBit_E10_1_PAN |
@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
         /* stage 1 current state PL1: ATS1CPR, ATS1CPW, ATS1CPRP, ATS1CPWP */
         switch (el) {
         case 3:
-            mmu_idx = ARMMMUIdx_E3;
+            if (ri->crm == 9 && arm_pan_enabled(env)) {
+                mmu_idx = ARMMMUIdx_E30_3_PAN;
+            } else {
+                mmu_idx = ARMMMUIdx_E3;
+            }
             break;
         case 2:
             g_assert(ss != ARMSS_Secure);  /* ARMv8.4-SecEL2 is 64-bit only */
@@ -XXX,XX +XXX,XX @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
         /* stage 1 current state PL0: ATS1CUR, ATS1CUW */
         switch (el) {
         case 3:
-            mmu_idx = ARMMMUIdx_E10_0;
+            mmu_idx = ARMMMUIdx_E30_0;
             break;
         case 2:
             g_assert(ss != ARMSS_Secure);  /* ARMv8.4-SecEL2 is 64-bit only */
@@ -XXX,XX +XXX,XX @@ static int vae1_tlbmask(CPUARMState *env)
     uint64_t hcr = arm_hcr_el2_eff(env);
     uint16_t mask;
 
+    assert(arm_feature(env, ARM_FEATURE_AARCH64));
+
     if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
         mask = ARMMMUIdxBit_E20_2 |
                ARMMMUIdxBit_E20_2_PAN |
                ARMMMUIdxBit_E20_0;
     } else {
+        /* This is AArch64 only, so we don't need to touch the E30_x TLBs */
         mask = ARMMMUIdxBit_E10_1 |
                ARMMMUIdxBit_E10_1_PAN |
                ARMMMUIdxBit_E10_0;
@@ -XXX,XX +XXX,XX @@ static int vae1_tlbbits(CPUARMState *env, uint64_t addr)
     uint64_t hcr = arm_hcr_el2_eff(env);
     ARMMMUIdx mmu_idx;
 
+    assert(arm_feature(env, ARM_FEATURE_AARCH64));
+
     /* Only the regime of the mmu_idx below is significant. */
     if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
         mmu_idx = ARMMMUIdx_E20_0;
@@ -XXX,XX +XXX,XX @@ void arm_cpu_do_interrupt(CPUState *cs)
 
 uint64_t arm_sctlr(CPUARMState *env, int el)
 {
-    /* Only EL0 needs to be adjusted for EL1&0 or EL2&0. */
+    /* Only EL0 needs to be adjusted for EL1&0 or EL2&0 or EL3&0 */
     if (el == 0) {
         ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, 0);
-        el = mmu_idx == ARMMMUIdx_E20_0 ? 2 : 1;
+        switch (mmu_idx) {
+        case ARMMMUIdx_E20_0:
+            el = 2;
+            break;
+        case ARMMMUIdx_E30_0:
+            el = 3;
+            break;
+        default:
+            el = 1;
+            break;
+        }
     }
     return env->cp15.sctlr_el[el];
 }
@@ -XXX,XX +XXX,XX @@ int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx)
     switch (mmu_idx) {
     case ARMMMUIdx_E10_0:
     case ARMMMUIdx_E20_0:
+    case ARMMMUIdx_E30_0:
         return 0;
     case ARMMMUIdx_E10_1:
     case ARMMMUIdx_E10_1_PAN:
@@ -XXX,XX +XXX,XX @@ int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx)
     case ARMMMUIdx_E20_2_PAN:
         return 2;
     case ARMMMUIdx_E3:
+    case ARMMMUIdx_E30_3_PAN:
         return 3;
     default:
         g_assert_not_reached();
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
         hcr = arm_hcr_el2_eff(env);
         if ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
             idx = ARMMMUIdx_E20_0;
+        } else if (arm_is_secure_below_el3(env) &&
+                   !arm_el_is_aa64(env, 3)) {
+            idx = ARMMMUIdx_E30_0;
         } else {
             idx = ARMMMUIdx_E10_0;
         }
@@ -XXX,XX +XXX,XX @@ ARMMMUIdx arm_mmu_idx_el(CPUARMState *env, int el)
         }
         break;
     case 3:
+        if (!arm_el_is_aa64(env, 3) && arm_pan_enabled(env)) {
+            return ARMMMUIdx_E30_3_PAN;
+        }
         return ARMMMUIdx_E3;
     default:
         g_assert_not_reached();
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -XXX,XX +XXX,XX @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
     case ARMMMUIdx_E20_2_PAN:
     case ARMMMUIdx_E2:
     case ARMMMUIdx_E3:
+    case ARMMMUIdx_E30_0:
+    case ARMMMUIdx_E30_3_PAN:
         break;
 
     case ARMMMUIdx_Phys_S:
@@ -XXX,XX +XXX,XX @@ bool get_phys_addr(CPUARMState *env, vaddr address,
         ss = ARMSS_Secure;
         break;
     case ARMMMUIdx_E3:
+    case ARMMMUIdx_E30_0:
+    case ARMMMUIdx_E30_3_PAN:
         if (arm_feature(env, ARM_FEATURE_AARCH64) &&
             cpu_isar_feature(aa64_rme, env_archcpu(env))) {
             ss = ARMSS_Root;
diff --git a/target/arm/tcg/op_helper.c b/target/arm/tcg/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/op_helper.c
+++ b/target/arm/tcg/op_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(tidcp_el0)(CPUARMState *env, uint32_t syndrome)
 {
     /* See arm_sctlr(), but we also need the sctlr el. */
     ARMMMUIdx mmu_idx = arm_mmu_idx_el(env, 0);
-    int target_el = mmu_idx == ARMMMUIdx_E20_0 ? 2 : 1;
+    int target_el;
+
+    switch (mmu_idx) {
+    case ARMMMUIdx_E20_0:
+        target_el = 2;
+        break;
+    case ARMMMUIdx_E30_0:
+        target_el = 3;
+        break;
+    default:
+        target_el = 1;
+        break;
+    }
 
     /*
      * The bit is not valid unless the target el is aa64, but since the
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -XXX,XX +XXX,XX @@ static inline int get_a32_user_mem_index(DisasContext *s)
      */
     switch (s->mmu_idx) {
     case ARMMMUIdx_E3:
+    case ARMMMUIdx_E30_0:
+    case ARMMMUIdx_E30_3_PAN:
+        return arm_to_core_mmu_idx(ARMMMUIdx_E30_0);
     case ARMMMUIdx_E2:        /* this one is UNPREDICTABLE */
     case ARMMMUIdx_E10_0:
     case ARMMMUIdx_E10_1:
-- 
2.34.1

Our implementation of the indexed version of SVE SDOT/UDOT/USDOT got
the calculation of the inner loop terminator wrong.  Although we
correctly account for the element size when we calculate the
terminator for the first iteration:
   intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);
we don't do that when we move it forward after the first inner loop
completes.  The intention is that we process the vector in 128-bit
segments, which for a 64-bit element size should mean (1, 2), (3, 4),
(5, 6), etc.  This bug meant that we would iterate (1, 2), (3, 4, 5,
6), (7, 8, 9, 10) etc and apply the wrong indexed element to some of
the operations, and also index off the end of the vector.

You don't see this bug if the vector length is small enough that we
don't need to iterate the outer loop, i.e.  if it is only 128 bits,
or if it is the 64-bit special case from AA32/AA64 AdvSIMD.  If the
vector length is 256 bits then we calculate the right results for the
elements in the vector but do index off the end of the vector. Vector
lengths greater than 256 bits see wrong answers. The instructions
that produce 32-bit results behave correctly.

Fix the recalculation of 'segend' for subsequent iterations, and
restore a version of the comment that was lost in the refactor of
commit 7020ffd656a5 that explains why we only need to clamp segend to
opr_sz_n for the first iteration, not the later ones.
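
The effect of the terminator can be checked with a small standalone
model of the two nested loops. This is a sketch, not the helper
itself: walk_segments(), WalkResult and the 'step' parameter are
hypothetical, and the loop body only counts elements instead of doing
the dot product.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

typedef struct {
    size_t segments;  /* number of outer-loop iterations */
    size_t visited;   /* elements touched; more than opr_sz_n is an overrun */
} WalkResult;

/*
 * Walk a vector of opr_sz bytes holding elt_size-byte elements,
 * advancing the inner-loop terminator by 'step' elements after
 * each segment, as the macro in vec_helper.c does.
 */
static WalkResult walk_segments(size_t opr_sz, size_t elt_size, size_t step)
{
    WalkResult r = { 0, 0 };
    size_t i = 0;
    size_t opr_sz_n;
    size_t segend;

    assert(elt_size != 0 && opr_sz % elt_size == 0);
    opr_sz_n = opr_sz / elt_size;
    assert(opr_sz_n != 0);
    /* Only the first terminator is clamped to the vector length. */
    segend = SKETCH_MIN(16 / elt_size, opr_sz_n);

    do {
        do {
            r.visited++;
        } while (++i < segend);
        r.segments++;
        segend = i + step;
    } while (i < opr_sz_n);
    return r;
}
```

For a 512-bit vector of 64-bit elements, a step of 16 / 8 gives four
segments covering eight elements, while the old step of 4 gives
segments of two, four and four elements and touches ten. With 32-bit
elements the old step of 4 already equals 16 / 4, so that case was
unaffected.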

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2595
Fixes: 7020ffd656a5 ("target/arm: Macroize helper_gvec_{s,u}dot_idx_{b,h}")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241101185544.2130972-1-peter.maydell@linaro.org
---
 target/arm/tcg/vec_helper.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
 {                                                                         \
     intptr_t i = 0, opr_sz = simd_oprsz(desc);                            \
     intptr_t opr_sz_n = opr_sz / sizeof(TYPED);                           \
+    /*                                                                    \
+     * Special case: opr_sz == 8 from AA64/AA32 advsimd means the         \
+     * first iteration might not be a full 16 byte segment. But           \
+     * for vector lengths beyond that this must be SVE and we know        \
+     * opr_sz is a multiple of 16, so we need not clamp segend            \
+     * to opr_sz_n when we advance it at the end of the loop.             \
+     */                                                                   \
     intptr_t segend = MIN(16 / sizeof(TYPED), opr_sz_n);                  \
     intptr_t index = simd_data(desc);                                     \
     TYPED *d = vd, *a = va;                                               \
@@ -XXX,XX +XXX,XX @@ void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
                     n[i * 4 + 2] * m2 +                                   \
                     n[i * 4 + 3] * m3);                                   \
         } while (++i < segend);                                           \
-        segend = i + 4;                                                   \
+        segend = i + (16 / sizeof(TYPED));                                \
     } while (i < opr_sz_n);                                               \
     clear_tail(d, opr_sz, simd_maxsz(desc));                              \
 }
-- 
2.34.1

From: Bernhard Beschow <shentey@gmail.com>

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-id: 20241103143330.123596-2-shentey@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/rtc/ds1338.c     | 6 ++++++
 hw/rtc/trace-events | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/hw/rtc/ds1338.c b/hw/rtc/ds1338.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/rtc/ds1338.c
+++ b/hw/rtc/ds1338.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/module.h"
 #include "qom/object.h"
 #include "sysemu/rtc.h"
+#include "trace.h"
 
 /* Size of NVRAM including both the user-accessible area and the
  * secondary register area.
@@ -XXX,XX +XXX,XX @@ static uint8_t ds1338_recv(I2CSlave *i2c)
     uint8_t res;
 
     res  = s->nvram[s->ptr];
+
+    trace_ds1338_recv(s->ptr, res);
+
     inc_regptr(s);
     return res;
 }
@@ -XXX,XX +XXX,XX @@ static int ds1338_send(I2CSlave *i2c, uint8_t data)
 {
     DS1338State *s = DS1338(i2c);
 
+    trace_ds1338_send(s->ptr, data);
+
     if (s->addr_byte) {
         s->ptr = data & (NVRAM_SIZE - 1);
         s->addr_byte = false;
diff --git a/hw/rtc/trace-events b/hw/rtc/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/rtc/trace-events
+++ b/hw/rtc/trace-events
@@ -XXX,XX +XXX,XX @@ pl031_set_alarm(uint32_t ticks) "alarm set for %u ticks"
 aspeed_rtc_read(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 0x%08" PRIx64
 aspeed_rtc_write(uint64_t addr, uint64_t value) "addr 0x%02" PRIx64 " value 0x%08" PRIx64
 
+# ds1338.c
+ds1338_recv(uint32_t addr, uint8_t value) "[0x%" PRIx32 "] -> 0x%02" PRIx8
+ds1338_send(uint32_t addr, uint8_t value) "[0x%" PRIx32 "] <- 0x%02" PRIx8
+
 # m48t59.c
 m48txx_nvram_io_read(uint64_t addr, uint64_t value) "io read addr:0x%04" PRIx64 " value:0x%02" PRIx64
 m48txx_nvram_io_write(uint64_t addr, uint64_t value) "io write addr:0x%04" PRIx64 " value:0x%02" PRIx64
-- 
2.34.1

From: Bernhard Beschow <shentey@gmail.com>

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-id: 20241103143330.123596-3-shentey@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/timer/imx_gpt.c    | 18 +++++-------------
 hw/timer/trace-events |  6 ++++++
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/hw/timer/imx_gpt.c b/hw/timer/imx_gpt.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/imx_gpt.c
+++ b/hw/timer/imx_gpt.c
@@ -XXX,XX +XXX,XX @@
 #include "migration/vmstate.h"
 #include "qemu/module.h"
 #include "qemu/log.h"
+#include "trace.h"
 
 #ifndef DEBUG_IMX_GPT
 #define DEBUG_IMX_GPT 0
 #endif
 
-#define DPRINTF(fmt, args...) \
-    do { \
-        if (DEBUG_IMX_GPT) { \
-            fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_GPT, \
-                                             __func__, ##args); \
-        } \
-    } while (0)
-
 static const char *imx_gpt_reg_name(uint32_t reg)
 {
     switch (reg) {
@@ -XXX,XX +XXX,XX @@ static void imx_gpt_set_freq(IMXGPTState *s)
     s->freq = imx_ccm_get_clock_frequency(s->ccm,
                                           s->clocks[clksrc]) / (1 + s->pr);
 
-    DPRINTF("Setting clksrc %d to frequency %d\n", clksrc, s->freq);
+    trace_imx_gpt_set_freq(clksrc, s->freq);
 
     if (s->freq) {
         ptimer_set_freq(s->timer, s->freq);
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_gpt_read(void *opaque, hwaddr offset, unsigned size)
         break;
     }
 
-    DPRINTF("(%s) = 0x%08x\n", imx_gpt_reg_name(offset >> 2), reg_value);
+    trace_imx_gpt_read(imx_gpt_reg_name(offset >> 2), reg_value);
 
     return reg_value;
 }
@@ -XXX,XX +XXX,XX @@ static void imx_gpt_write(void *opaque, hwaddr offset, uint64_t value,
     IMXGPTState *s = IMX_GPT(opaque);
     uint32_t oldreg;
 
-    DPRINTF("(%s, value = 0x%08x)\n", imx_gpt_reg_name(offset >> 2),
-            (uint32_t)value);
+    trace_imx_gpt_write(imx_gpt_reg_name(offset >> 2), (uint32_t)value);
 
     switch (offset >> 2) {
     case 0:
@@ -XXX,XX +XXX,XX @@ static void imx_gpt_timeout(void *opaque)
 {
     IMXGPTState *s = IMX_GPT(opaque);
 
-    DPRINTF("\n");
+    trace_imx_gpt_timeout();
 
     s->sr |= s->next_int;
     s->next_int = 0;
diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -XXX,XX +XXX,XX @@ cmsdk_apb_dualtimer_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK A
 cmsdk_apb_dualtimer_write(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB dualtimer write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 cmsdk_apb_dualtimer_reset(void) "CMSDK APB dualtimer: reset"
 
+# imx_gpt.c
+imx_gpt_set_freq(uint32_t clksrc, uint32_t freq) "Setting clksrc %u to %u Hz"
+imx_gpt_read(const char *name, uint64_t value) "%s -> 0x%08" PRIx64
+imx_gpt_write(const char *name, uint64_t value) "%s <- 0x%08" PRIx64
+imx_gpt_timeout(void) ""
+
 # npcm7xx_timer.c
 npcm7xx_timer_read(const char *id, uint64_t offset, uint64_t value) " %s offset: 0x%04" PRIx64 " value 0x%08" PRIx64
 npcm7xx_timer_write(const char *id, uint64_t offset, uint64_t value) "%s offset: 0x%04" PRIx64 " value 0x%08" PRIx64
-- 
2.34.1

From: Bernhard Beschow <shentey@gmail.com>

printf() unconditionally prints to the console, which disturbs `-serial stdio`.
Fix that by converting it into a trace event. While at it, add some tracing for
read and write access.

Fixes: 7e7c5e4c1ba5 "Nokia N800 machine support (ARM)."
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20241103143330.123596-5-shentey@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 meson.build            | 1 +
 hw/sensor/trace.h      | 1 +
 hw/sensor/tmp105.c     | 7 ++++++-
 hw/sensor/trace-events | 6 ++++++
 4 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 hw/sensor/trace.h
 create mode 100644 hw/sensor/trace-events

diff --git a/meson.build b/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/meson.build
+++ b/meson.build
@@ -XXX,XX +XXX,XX @@ if have_system
     'hw/s390x',
     'hw/scsi',
     'hw/sd',
+    'hw/sensor',
     'hw/sh4',
     'hw/sparc',
     'hw/sparc64',
diff --git a/hw/sensor/trace.h b/hw/sensor/trace.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/sensor/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_sensor.h"
diff --git a/hw/sensor/tmp105.c b/hw/sensor/tmp105.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/sensor/tmp105.c
+++ b/hw/sensor/tmp105.c
@@ -XXX,XX +XXX,XX @@
 #include "qapi/visitor.h"
 #include "qemu/module.h"
 #include "hw/registerfields.h"
+#include "trace.h"
 
 FIELD(CONFIG, SHUTDOWN_MODE,        0, 1)
 FIELD(CONFIG, THERMOSTAT_MODE,      1, 1)
@@ -XXX,XX +XXX,XX @@ static void tmp105_read(TMP105State *s)
         s->buf[s->len++] = ((uint16_t) s->limit[1]) >> 0;
         break;
     }
+
+    trace_tmp105_read(s->i2c.address, s->pointer);
 }
 
 static void tmp105_write(TMP105State *s)
 {
+    trace_tmp105_write(s->i2c.address, s->pointer);
+
     switch (s->pointer & 3) {
     case TMP105_REG_TEMPERATURE:
         break;
 
     case TMP105_REG_CONFIG:
         if (FIELD_EX8(s->buf[0] & ~s->config, CONFIG, SHUTDOWN_MODE)) {
-            printf("%s: TMP105 shutdown\n", __func__);
+            trace_tmp105_write_shutdown(s->i2c.address);
         }
         s->config = FIELD_DP8(s->buf[0], CONFIG, ONE_SHOT, 0);
         s->faults = tmp105_faultq[FIELD_EX8(s->config, CONFIG, FAULT_QUEUE)];
diff --git a/hw/sensor/trace-events b/hw/sensor/trace-events
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/sensor/trace-events
@@ -XXX,XX +XXX,XX @@
+# See docs/devel/tracing.rst for syntax documentation.
+
+# tmp105.c
+tmp105_read(uint8_t dev, uint8_t addr) "device: 0x%02x, addr: 0x%02x"
+tmp105_write(uint8_t dev, uint8_t addr) "device: 0x%02x, addr 0x%02x"
+tmp105_write_shutdown(uint8_t dev) "device: 0x%02x"
-- 
2.34.1

From: Nabih Estefan <nabihestefan@google.com>

Convert the LOG_GUEST_ERROR for the "tx descriptor is owned
by software" to a trace message. This condition is normal
when there is nothing to transmit, and we would
otherwise spam the logs with it in that situation.

Signed-off-by: Nabih Estefan <nabihestefan@google.com>
Signed-off-by: Roque Arcudia Hernandez <roqueh@google.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20241014184847.1594056-1-roqueh@google.com
[PMM: tweaked commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/net/npcm_gmac.c  | 5 ++---
 hw/net/trace-events | 1 +
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/npcm_gmac.c
+++ b/hw/net/npcm_gmac.c
@@ -XXX,XX +XXX,XX @@ static void gmac_try_send_next_packet(NPCMGMACState *gmac)
 
         /* 1 = DMA Owned, 0 = Software Owned */
         if (!(tx_desc.tdes0 & TX_DESC_TDES0_OWN)) {
-            qemu_log_mask(LOG_GUEST_ERROR,
-                          "TX Descriptor @ 0x%x is owned by software\n",
-                          desc_addr);
+            trace_npcm_gmac_tx_desc_owner(DEVICE(gmac)->canonical_path,
+                                          desc_addr);
             gmac->regs[R_NPCM_DMA_STATUS] |= NPCM_DMA_STATUS_TU;
             gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
                 NPCM_DMA_STATUS_TX_SUSPENDED_STATE);
diff --git a/hw/net/trace-events b/hw/net/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -XXX,XX +XXX,XX @@ npcm_gmac_packet_received(const char* name, uint32_t len) "%s: Reception finishe
 npcm_gmac_packet_sent(const char* name, uint16_t len) "%s: TX packet sent!, length: 0x%04" PRIX16
 npcm_gmac_debug_desc_data(const char* name, void* addr, uint32_t des0, uint32_t des1, uint32_t des2, uint32_t des3)"%s: Address: %p Descriptor 0: 0x%04" PRIX32 " Descriptor 1: 0x%04" PRIX32 "Descriptor 2: 0x%04" PRIX32 " Descriptor 3: 0x%04" PRIX32
 npcm_gmac_packet_tx_desc_data(const char* name, uint32_t tdes0, uint32_t tdes1) "%s: Tdes0: 0x%04" PRIX32 " Tdes1: 0x%04" PRIX32
+npcm_gmac_tx_desc_owner(const char* name, uint32_t desc_addr) "%s: TX Descriptor @0x%04" PRIX32 " is owned by software"
 
 # npcm_pcs.c
 npcm_pcs_reg_read(const char *name, uint16_t indirect_access_baes, uint64_t offset, uint16_t value) "%s: IND: 0x%02" PRIx16 " offset: 0x%04" PRIx64 " value: 0x%04" PRIx16
-- 
2.34.1

From: Gustavo Romero <gustavo.romero@linaro.org>

FEAT_CMOW introduces support for controlling cache maintenance
instructions executed in EL0/1 and is mandatory from Armv8.8.

On real hardware, the main use for this feature is to prevent processes
from invalidating or flushing cache lines for addresses to which they have
only read permission, which can impact the performance of other processes.

QEMU implements all cache instructions as NOPs, and, according to rule
[1], which states that generating any Permission fault when a cache
instruction is implemented as a NOP is implementation-defined, no
Permission fault is generated for any cache instruction when it lacks
read and write permissions.

QEMU does not model any cache topology, so the PoU and PoC are before
any cache, and rules [2] apply. These rules state that generating any
MMU fault for cache instructions in this topology is also
implementation-defined. Therefore, for FEAT_CMOW, we do not generate any
MMU faults either; instead, we only advertise it in the feature
register.

[1] Rule R_HGLYG of section D8.14.3, Arm ARM K.a.
[2] Rules R_MZTNR and R_DNZYL of section D8.14.3, Arm ARM K.a.
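
As an illustration of what "only advertise it in the feature register"
amounts to, here is a minimal sketch of a 4-bit ID register field being
set and tested. The helper names are hypothetical stand-ins rather than
QEMU's FIELD_EX64/FIELD_DP64 macros, and the field position (bits [59:56]
of ID_AA64MMFR1) is an assumption taken from the Arm ARM, not from this
patch.

```c
#include <stdint.h>

/* Assumed field position: ID_AA64MMFR1.CMOW occupies bits [59:56]. */
#define CMOW_SHIFT  56
#define CMOW_LENGTH 4

/* Hypothetical helper: read a field of `length` bits (length < 64). */
static inline uint64_t field_extract(uint64_t reg, unsigned shift,
                                     unsigned length)
{
    return (reg >> shift) & ((UINT64_C(1) << length) - 1);
}

/* Hypothetical helper: return `reg` with the field replaced by `val`. */
static inline uint64_t field_deposit(uint64_t reg, unsigned shift,
                                     unsigned length, uint64_t val)
{
    uint64_t mask = ((UINT64_C(1) << length) - 1) << shift;

    return (reg & ~mask) | ((val << shift) & mask);
}

/* The feature is present when the field is non-zero. */
static inline int has_cmow(uint64_t id_aa64mmfr1)
{
    return field_extract(id_aa64mmfr1, CMOW_SHIFT, CMOW_LENGTH) != 0;
}
```

Depositing 1 into the field and then calling has_cmow() on the result
mirrors the pairing of the cpu64.c and cpu-features.h changes in this
patch.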

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241104142606.941638-1-gustavo.romero@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/cpu-features.h     | 5 +++++
 target/arm/cpu.h              | 1 +
 target/arm/helper.c           | 5 +++++
 target/arm/tcg/cpu64.c        | 1 +
 5 files changed, 13 insertions(+)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_BF16 (AArch64 BFloat16 instructions)
 - FEAT_BTI (Branch Target Identification)
 - FEAT_CCIDX (Extended cache index)
+- FEAT_CMOW (Control for cache maintenance permission)
 - FEAT_CRC32 (CRC32 instructions)
 - FEAT_Crypto (Cryptographic Extension)
 - FEAT_CSV2 (Cache speculation variant 2)
diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
 }
 
+static inline bool isar_feature_aa64_cmow(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, CMOW) != 0;
+}
+
 static inline bool isar_feature_aa64_hafs(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) != 0;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
 #define SCTLR_EnIB    (1U << 30) /* v8.3, AArch64 only */
 #define SCTLR_EnIA    (1U << 31) /* v8.3, AArch64 only */
 #define SCTLR_DSSBS_32 (1U << 31) /* v8.5, AArch32 only */
+#define SCTLR_CMOW    (1ULL << 32) /* FEAT_CMOW */
 #define SCTLR_MSCEN   (1ULL << 33) /* FEAT_MOPS */
 #define SCTLR_BT0     (1ULL << 35) /* v8.5-BTI */
 #define SCTLR_BT1     (1ULL << 36) /* v8.5-BTI */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void hcrx_write(CPUARMState *env, const ARMCPRegInfo *ri,
     if (cpu_isar_feature(aa64_nmi, cpu)) {
         valid_mask |= HCRX_TALLINT | HCRX_VINMI | HCRX_VFNMI;
     }
+    /* FEAT_CMOW adds CMOW */
+
+    if (cpu_isar_feature(aa64_cmow, cpu)) {
+        valid_mask |= HCRX_CMOW;
+    }
 
     /* Clear RES0 bits.  */
     env->cp15.hcrx_el2 = value & valid_mask;
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
     t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2);      /* FEAT_ETS2 */
     t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1);      /* FEAT_HCX */
     t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1);   /* FEAT_TIDCP1 */
+    t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1);     /* FEAT_CMOW */
     cpu->isar.id_aa64mmfr1 = t;
 
     t = cpu->isar.id_aa64mmfr2;
-- 
2.34.1

The following changes since commit 3214bec13d8d4c40f707d21d8350d04e4123ae97:

Merge tag 'migration-20250110-pull-request' of https://gitlab.com/farosas/qemu into staging (2025-01-10 13:39:19 -0500)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20250113

for you to fetch changes up to 435d260e7ec5ff9c79e3e62f1d66ec82d2d691ae:

docs/system/arm/virt: mention specific migration information (2025-01-13 12:35:35 +0000)

----------------------------------------------------------------
target-arm queue:
 * hw/arm_sysctl: fix extracting 31th bit of val
 * hw/misc: cast rpm to uint64_t
 * tests/qtest/boot-serial-test: Improve ASM
 * target/arm: Move minor arithmetic helpers out of helper.c
 * target/arm: change default pauth algorithm to impdef

----------------------------------------------------------------
Anastasia Belova (1):
      hw/arm_sysctl: fix extracting 31th bit of val

Peter Maydell (2):
      target/arm: Move minor arithmetic helpers out of helper.c
      tests/tcg/aarch64: force qarma5 for pauth-3 test

Philippe Mathieu-Daudé (4):
      tests/qtest/boot-serial-test: Improve ASM comments of PL011 tests
      tests/qtest/boot-serial-test: Reduce for() loop in PL011 tests
      tests/qtest/boot-serial-test: Reorder pair of instructions in PL011 test
      tests/qtest/boot-serial-test: Initialize PL011 Control register

Pierrick Bouvier (3):
      target/arm: add new property to select pauth-qarma5
      target/arm: change default pauth algorithm to impdef
      docs/system/arm/virt: mention specific migration information

Tigran Sogomonian (1):
      hw/misc: cast rpm to uint64_t

 docs/system/arm/cpu-features.rst                |   7 +-
 docs/system/arm/virt.rst                        |   4 +
 docs/system/introduction.rst                    |   2 +-
 target/arm/cpu.h                                |   4 +
 hw/core/machine.c                               |   4 +-
 hw/misc/arm_sysctl.c                            |   2 +-
 hw/misc/npcm7xx_mft.c                           |   5 +-
 target/arm/arm-qmp-cmds.c                       |   2 +-
 target/arm/cpu.c                                |   2 +
 target/arm/cpu64.c                              |  38 ++-
 target/arm/helper.c                             | 285 -----------------------
 target/arm/tcg/arith_helper.c                   | 296 ++++++++++++++++++++++++
 tests/qtest/arm-cpu-features.c                  |  15 +-
 tests/qtest/boot-serial-test.c                  |  23 +-
 target/arm/{op_addsub.h => tcg/op_addsub.c.inc} |   0
 target/arm/tcg/meson.build                      |   1 +
 tests/tcg/aarch64/Makefile.softmmu-target       |   3 +
 17 files changed, 377 insertions(+), 316 deletions(-)
 create mode 100644 target/arm/tcg/arith_helper.c
 rename target/arm/{op_addsub.h => tcg/op_addsub.c.inc} (100%)

From: Anastasia Belova <abelova@astralinux.ru>

1 << 31 is converted to uint64_t when it is bitwise ANDed with val,
so the mask may become 0xffffffff80000000, but only bit 31 (the
"start" bit) is required.

A false match is not possible in practice because the MemoryRegionOps
uses the default max access size of 4 bytes and so none
of the upper bytes of val will be set, but the bitfield
extract API is clearer anyway.

Use the bitfield extract() API instead.
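
The sign extension described above can be sketched as follows. This is
an illustration only: extract_bit() is a hypothetical stand-in for
extract64(value, start, 1), and INT32_MIN is used to spell the bit
pattern that 1 << 31 is intended to produce on a platform with 32-bit
int.

```c
#include <stdint.h>

/* Hypothetical stand-in for extract64(value, start, 1). */
static inline uint64_t extract_bit(uint64_t value, unsigned start)
{
    return (value >> start) & 1;
}

/*
 * The mask as the old test saw it: a signed 32-bit value with only
 * bit 31 set, converted to uint64_t for the bitwise AND. The
 * conversion sign-extends, so bits 63..32 end up set as well.
 */
static inline uint64_t old_mask(void)
{
    return (uint64_t)INT32_MIN;
}

/* Old-style test: true if any of bits 63..31 of val is set. */
static inline int old_start_test(uint64_t val)
{
    return (val & old_mask()) != 0;
}
```

With a value that has only bit 40 set, old_start_test() reports the
start bit as set while extract_bit(val, 31) does not.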

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Anastasia Belova <abelova@astralinux.ru>
Message-id: 20241220125429.7552-1-abelova@astralinux.ru
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: add clarification to commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/arm_sysctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/misc/arm_sysctl.c b/hw/misc/arm_sysctl.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/arm_sysctl.c
+++ b/hw/misc/arm_sysctl.c
@@ -XXX,XX +XXX,XX @@ static void arm_sysctl_write(void *opaque, hwaddr offset,
          * as zero.
          */
         s->sys_cfgctrl = val & ~((3 << 18) | (1 << 31));
-        if (val & (1 << 31)) {
+        if (extract64(val, 31, 1)) {
             /* Start bit set -- actually do something */
             unsigned int dcc = extract32(s->sys_cfgctrl, 26, 4);
             unsigned int function = extract32(s->sys_cfgctrl, 20, 6);
-- 
2.34.1

From: Tigran Sogomonian <tsogomonian@astralinux.ru>

The value of the arithmetic expression
'rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION' is subject to
overflow because its operands are not cast to a larger
data type before the multiplication is performed. Cast
rpm to uint64_t to avoid this.
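
A minimal sketch of the difference, assuming rpm is a 32-bit unsigned
value and int is 32 bits wide. PULSES_PER_REVOLUTION and both function
names are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

#define PULSES_PER_REVOLUTION 2   /* hypothetical value for illustration */

/* Product computed in 32-bit unsigned arithmetic: wraps modulo 2^32. */
static inline uint64_t pulses_narrow(uint32_t rpm)
{
    return rpm * (uint32_t)PULSES_PER_REVOLUTION;
}

/* Widen one operand first so the multiplication is done in 64 bits. */
static inline uint64_t pulses_wide(uint32_t rpm)
{
    return (uint64_t)rpm * PULSES_PER_REVOLUTION;
}
```

For small inputs the two agree; for rpm = 0x80000000 the narrow version
wraps to 0 (which would then be used as a divisor), while the wide
version yields 0x100000000.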

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Tigran Sogomonian <tsogomonian@astralinux.ru>
Reviewed-by: Patrick Leis <venture@google.com>
Reviewed-by: Hao Wu <wuhaotsh@google.com>
Message-id: 20241226130311.1349-1-tsogomonian@astralinux.ru
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/npcm7xx_mft.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/misc/npcm7xx_mft.c b/hw/misc/npcm7xx_mft.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/npcm7xx_mft.c
+++ b/hw/misc/npcm7xx_mft.c
@@ -XXX,XX +XXX,XX @@ static NPCM7xxMFTCaptureState npcm7xx_mft_compute_cnt(
          * RPM = revolution/min. The time for one revlution (in ns) is
          * MINUTE_TO_NANOSECOND / RPM.
          */
-        count = clock_ns_to_ticks(clock, (60 * NANOSECONDS_PER_SECOND) /
-            (rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION));
+        count = clock_ns_to_ticks(clock,
+            (uint64_t)(60 * NANOSECONDS_PER_SECOND) /
+            ((uint64_t)rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION));
     }
 
     if (count > NPCM7XX_MFT_MAX_CNT) {
-- 
2.34.1

From: Philippe Mathieu-Daudé <philmd@linaro.org>

Re-indent the ASM comments and add the 'loop:' label.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/boot-serial-test.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
 };
 
 static const uint8_t bios_raspi2[] = {
-    0x08, 0x30, 0x9f, 0xe5,                 /* ldr   r3,[pc,#8]    Get base */
-    0x54, 0x20, 0xa0, 0xe3,                 /* mov     r2,#'T' */
-    0x00, 0x20, 0xc3, 0xe5,                 /* strb    r2,[r3] */
-    0xfb, 0xff, 0xff, 0xea,                 /* b       loop */
-    0x00, 0x10, 0x20, 0x3f,                 /* 0x3f201000 = UART0 base addr */
+    0x08, 0x30, 0x9f, 0xe5,                 /* loop:  ldr     r3, [pc, #8]   Get &UART0 */
+    0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
+    0x00, 0x20, 0xc3, 0xe5,                 /*        strb    r2, [r3]       *TXDAT = 'T' */
+    0xfb, 0xff, 0xff, 0xea,                 /*        b       -12            (loop) */
+    0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
 };
 
 static const uint8_t kernel_aarch64[] = {
-    0x81, 0x0a, 0x80, 0x52,                 /* mov     w1, #0x54 */
-    0x02, 0x20, 0xa1, 0xd2,                 /* mov     x2, #0x9000000 */
-    0x41, 0x00, 0x00, 0x39,                 /* strb    w1, [x2] */
-    0xfd, 0xff, 0xff, 0x17,                 /* b       -12 (loop) */
+    0x81, 0x0a, 0x80, 0x52,                 /* loop:  mov    w1, #'T' */
+    0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
+    0x41, 0x00, 0x00, 0x39,                 /*        strb   w1, [x2]        *TXDAT = 'T' */
+    0xfd, 0xff, 0xff, 0x17,                 /*        b      -12             (loop) */
 };
 
 static const uint8_t kernel_nrf51[] = {
-- 
2.34.1

From: Philippe Mathieu-Daudé <philmd@linaro.org>

Since the registers are not modified inside the loop, we don't
need to reload their values. Jump directly to the previous
store instruction to keep filling the TXDAT register.

The equivalent C code remains:

  while (true) {
      *UART_DATA = 'T';
  }
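
For the AArch64 kernel, the change of branch target can be checked by
decoding the B instruction. The sketch below is an illustration only:
a64_b_offset() is a hypothetical helper, and it relies on the A64
encoding of B, where bits [25:0] hold a signed word offset relative to
the branch instruction itself. The bytes in the array are
little-endian, so 0xfd, 0xff, 0xff, 0x17 reads as 0x17fffffd.

```c
#include <stdint.h>

/*
 * Hypothetical helper: byte offset encoded in an AArch64 unconditional
 * B instruction. imm26 is sign-extended and scaled by 4.
 */
static inline int64_t a64_b_offset(uint32_t insn)
{
    int64_t imm26 = insn & 0x03ffffff;

    if (imm26 & 0x02000000) {
        imm26 -= 0x04000000;    /* sign-extend from 26 bits */
    }
    return imm26 * 4;
}
```

a64_b_offset(0x17fffffd) gives -12 (back to the first mov) and
a64_b_offset(0x17ffffff) gives -4 (back to the strb).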

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/boot-serial-test.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
 };
 
 static const uint8_t bios_raspi2[] = {
-    0x08, 0x30, 0x9f, 0xe5,                 /* loop:  ldr     r3, [pc, #8]   Get &UART0 */
+    0x08, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #8]   Get &UART0 */
     0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
-    0x00, 0x20, 0xc3, 0xe5,                 /*        strb    r2, [r3]       *TXDAT = 'T' */
-    0xfb, 0xff, 0xff, 0xea,                 /*        b       -12            (loop) */
+    0x00, 0x20, 0xc3, 0xe5,                 /* loop:  strb    r2, [r3]       *TXDAT = 'T' */
+    0xff, 0xff, 0xff, 0xea,                 /*        b       -4             (loop) */
     0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
 };
 
 static const uint8_t kernel_aarch64[] = {
-    0x81, 0x0a, 0x80, 0x52,                 /* loop:  mov    w1, #'T' */
+    0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
     0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
-    0x41, 0x00, 0x00, 0x39,                 /*        strb   w1, [x2]        *TXDAT = 'T' */
-    0xfd, 0xff, 0xff, 0x17,                 /*        b      -12             (loop) */
+    0x41, 0x00, 0x00, 0x39,                 /* loop:  strb   w1, [x2]        *TXDAT = 'T' */
+    0xff, 0xff, 0xff, 0x17,                 /*        b      -4              (loop) */
 };
 
 static const uint8_t kernel_nrf51[] = {
-- 
2.34.1

From: Philippe Mathieu-Daudé <philmd@linaro.org>

In the next commit we are going to use a different value
for the $w1 register, maintaining the same $x2 value. In
order to keep the next commit trivial to review, set $x2
before $w1.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/boot-serial-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

From: Philippe Mathieu-Daudé <philmd@linaro.org>

The tests using the PL011 UART of the virt and raspi machines
weren't properly enabling the UART and its transmitter before
sending characters. Follow the initialization sequence recommended
by the PL011 manual by setting the proper bits of the control register.

Update the ASM code by prepending:

  *UART_CTRL = UART_ENABLE | TX_ENABLE;

to:

  while (true) {
      *UART_DATA = 'T';
  }

Note, since commit 51b61dd4d56 ("hw/char/pl011: Warn when using
disabled transmitter") incomplete PL011 initialization can be
logged using the '-d guest_errors' command line option.
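
In C, the sequence the updated ASM performs could be sketched roughly
as below. The macro and function names are hypothetical; the control
register offset (48, i.e. 0x30) and the 0x101 = UARTEN|TXE value are
taken from the ASM comments in this patch.

```c
#include <stdint.h>

/* Hypothetical names; offsets and bits as used by the ASM in this patch. */
#define PL011_DR        0x00        /* data register (TXDAT) */
#define PL011_CR        0x30        /* control register, "#48" in the ASM */
#define PL011_CR_UARTEN (1u << 0)   /* UART enable */
#define PL011_CR_TXE    (1u << 8)   /* transmit enable */

/* Control register value written before the first character is sent. */
static inline uint16_t pl011_cr_tx_enable(void)
{
    return PL011_CR_UARTEN | PL011_CR_TXE;
}

/*
 * Enable the UART and its transmitter, then write one character.
 * `base` is assumed to point at the start of the PL011 register block.
 */
static inline void pl011_init_and_putc(volatile uint8_t *base, char c)
{
    *(volatile uint16_t *)(base + PL011_CR) = pl011_cr_tx_enable();
    *(volatile uint8_t *)(base + PL011_DR) = (uint8_t)c;
}
```

The 16-bit store matches the strh instructions added to both test
images.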

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/boot-serial-test.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
 };
 
 static const uint8_t bios_raspi2[] = {
-    0x08, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #8]   Get &UART0 */
+    0x10, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #16]  Get &UART0 */
+    0x10, 0x20, 0x9f, 0xe5,                 /*        ldr     r2, [pc, #16]  Get &CR */
+    0xb0, 0x23, 0xc3, 0xe1,                 /*        strh    r2, [r3, #48]  Set CR */
     0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
     0x00, 0x20, 0xc3, 0xe5,                 /* loop:  strb    r2, [r3]       *TXDAT = 'T' */
     0xff, 0xff, 0xff, 0xea,                 /*        b       -4             (loop) */
     0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
+    0x01, 0x01, 0x00, 0x00,                 /* CR:    0x101 = UARTEN|TXE */
 };
 
 static const uint8_t kernel_aarch64[] = {
     0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
+    0x21, 0x20, 0x80, 0x52,                 /*        mov    w1, 0x101       CR = UARTEN|TXE */
+    0x41, 0x60, 0x00, 0x79,                 /*        strh   w1, [x2, #48]   Set CR */
     0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
     0x41, 0x00, 0x00, 0x39,                 /* loop:  strb   w1, [x2]        *TXDAT = 'T' */
     0xff, 0xff, 0xff, 0x17,                 /*        b      -4              (loop) */
-- 
2.34.1

helper.c includes some small TCG helper functions used for mostly
arithmetic instructions.  These are TCG only and there's no need for
them to be in the large and unwieldy helper.c.  Move them out to
their own source file in the tcg/ subdirectory, together with the
op_addsub.h multiply-included template header that they use.

Since we are moving op_addsub.h, we take the opportunity to
give it a name which matches our convention for files which
are not true header files but which are #included from other
C files: op_addsub.c.inc.

(Ironically, this means that helper.c no longer contains
any TCG helper function definitions at all.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250110131211.2546314-1-peter.maydell@linaro.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---
 target/arm/helper.c                           | 285 -----------------
 target/arm/tcg/arith_helper.c                 | 296 ++++++++++++++++++
 .../arm/{op_addsub.h => tcg/op_addsub.c.inc}  |   0
 target/arm/tcg/meson.build                    |   1 +
 4 files changed, 297 insertions(+), 285 deletions(-)
 create mode 100644 target/arm/tcg/arith_helper.c
 rename target/arm/{op_addsub.h => tcg/op_addsub.c.inc} (100%)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/main-loop.h"
 #include "qemu/timer.h"
 #include "qemu/bitops.h"
-#include "qemu/crc32c.h"
 #include "qemu/qemu-print.h"
 #include "exec/exec-all.h"
 #include "exec/translation-block.h"
-#include <zlib.h> /* for crc32 */
 #include "hw/irq.h"
 #include "system/cpu-timers.h"
 #include "system/kvm.h"
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
     };
 }
 
-/*
- * Note that signed overflow is undefined in C.  The following routines are
- * careful to use unsigned types where modulo arithmetic is required.
- * Failure to do so _will_ break on newer gcc.
- */
-
-/* Signed saturating arithmetic.  */
-
-/* Perform 16-bit signed saturating addition.  */
-static inline uint16_t add16_sat(uint16_t a, uint16_t b)
-{
-    uint16_t res;
-
-    res = a + b;
-    if (((res ^ a) & 0x8000) && !((a ^ b) & 0x8000)) {
-        if (a & 0x8000) {
-            res = 0x8000;
-        } else {
-            res = 0x7fff;
-        }
-    }
-    return res;
-}
-
-/* Perform 8-bit signed saturating addition.  */
-static inline uint8_t add8_sat(uint8_t a, uint8_t b)
-{
-    uint8_t res;
-
-    res = a + b;
-    if (((res ^ a) & 0x80) && !((a ^ b) & 0x80)) {
-        if (a & 0x80) {
-            res = 0x80;
-        } else {
-            res = 0x7f;
-        }
-    }
-    return res;
-}
-
-/* Perform 16-bit signed saturating subtraction.  */
-static inline uint16_t sub16_sat(uint16_t a, uint16_t b)
-{
-    uint16_t res;
-
-    res = a - b;
-    if (((res ^ a) & 0x8000) && ((a ^ b) & 0x8000)) {
-        if (a & 0x8000) {
-            res = 0x8000;
-        } else {
-            res = 0x7fff;
-        }
-    }
-    return res;
-}
-
-/* Perform 8-bit signed saturating subtraction.  */
-static inline uint8_t sub8_sat(uint8_t a, uint8_t b)
-{
-    uint8_t res;
-
-    res = a - b;
-    if (((res ^ a) & 0x80) && ((a ^ b) & 0x80)) {
-        if (a & 0x80) {
-            res = 0x80;
-        } else {
-            res = 0x7f;
-        }
-    }
-    return res;
-}
-
-#define ADD16(a, b, n) RESULT(add16_sat(a, b), n, 16);
-#define SUB16(a, b, n) RESULT(sub16_sat(a, b), n, 16);
-#define ADD8(a, b, n)  RESULT(add8_sat(a, b), n, 8);
-#define SUB8(a, b, n)  RESULT(sub8_sat(a, b), n, 8);
-#define PFX q
-
-#include "op_addsub.h"
-
-/* Unsigned saturating arithmetic.  */
-static inline uint16_t add16_usat(uint16_t a, uint16_t b)
-{
-    uint16_t res;
-    res = a + b;
-    if (res < a) {
-        res = 0xffff;
-    }
-    return res;
-}
-
-static inline uint16_t sub16_usat(uint16_t a, uint16_t b)
-{
-    if (a > b) {
-        return a - b;
-    } else {
-        return 0;
-    }
-}
-
-static inline uint8_t add8_usat(uint8_t a, uint8_t b)
-{
-    uint8_t res;
-    res = a + b;
-    if (res < a) {
-        res = 0xff;
-    }
-    return res;
-}
-
-static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
-{
-    if (a > b) {
-        return a - b;
-    } else {
-        return 0;
-    }
-}
-
-#define ADD16(a, b, n) RESULT(add16_usat(a, b), n, 16);
-#define SUB16(a, b, n) RESULT(sub16_usat(a, b), n, 16);
-#define ADD8(a, b, n)  RESULT(add8_usat(a, b), n, 8);
-#define SUB8(a, b, n)  RESULT(sub8_usat(a, b), n, 8);
-#define PFX uq
-
-#include "op_addsub.h"
-
-/* Signed modulo arithmetic.  */
-#define SARITH16(a, b, n, op) do { \
-    int32_t sum; \
-    sum = (int32_t)(int16_t)(a) op (int32_t)(int16_t)(b); \
-    RESULT(sum, n, 16); \
-    if (sum >= 0) \
-        ge |= 3 << (n * 2); \
-    } while (0)
-
-#define SARITH8(a, b, n, op) do { \
-    int32_t sum; \
-    sum = (int32_t)(int8_t)(a) op (int32_t)(int8_t)(b); \
-    RESULT(sum, n, 8); \
-    if (sum >= 0) \
-        ge |= 1 << n; \
-    } while (0)
-
-
-#define ADD16(a, b, n) SARITH16(a, b, n, +)
-#define SUB16(a, b, n) SARITH16(a, b, n, -)
-#define ADD8(a, b, n)  SARITH8(a, b, n, +)
-#define SUB8(a, b, n)  SARITH8(a, b, n, -)
-#define PFX s
-#define ARITH_GE
-
-#include "op_addsub.h"
-
-/* Unsigned modulo arithmetic.  */
-#define ADD16(a, b, n) do { \
-    uint32_t sum; \
-    sum = (uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b); \
-    RESULT(sum, n, 16); \
-    if ((sum >> 16) == 1) \
-        ge |= 3 << (n * 2); \
-    } while (0)
-
-#define ADD8(a, b, n) do { \
-    uint32_t sum; \
-    sum = (uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b); \
-    RESULT(sum, n, 8); \
-    if ((sum >> 8) == 1) \
-        ge |= 1 << n; \
-    } while (0)
-
-#define SUB16(a, b, n) do { \
-    uint32_t sum; \
-    sum = (uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b); \
-    RESULT(sum, n, 16); \
-    if ((sum >> 16) == 0) \
-        ge |= 3 << (n * 2); \
-    } while (0)
-
-#define SUB8(a, b, n) do { \
-    uint32_t sum; \
-    sum = (uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b); \
-    RESULT(sum, n, 8); \
-    if ((sum >> 8) == 0) \
-        ge |= 1 << n; \
-    } while (0)
-
-#define PFX u
-#define ARITH_GE
-
-#include "op_addsub.h"
-
-/* Halved signed arithmetic.  */
-#define ADD16(a, b, n) \
-  RESULT(((int32_t)(int16_t)(a) + (int32_t)(int16_t)(b)) >> 1, n, 16)
-#define SUB16(a, b, n) \
-  RESULT(((int32_t)(int16_t)(a) - (int32_t)(int16_t)(b)) >> 1, n, 16)
-#define ADD8(a, b, n) \
-  RESULT(((int32_t)(int8_t)(a) + (int32_t)(int8_t)(b)) >> 1, n, 8)
-#define SUB8(a, b, n) \
-  RESULT(((int32_t)(int8_t)(a) - (int32_t)(int8_t)(b)) >> 1, n, 8)
-#define PFX sh
-
-#include "op_addsub.h"
-
-/* Halved unsigned arithmetic.  */
-#define ADD16(a, b, n) \
-  RESULT(((uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b)) >> 1, n, 16)
-#define SUB16(a, b, n) \
-  RESULT(((uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b)) >> 1, n, 16)
-#define ADD8(a, b, n) \
-  RESULT(((uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b)) >> 1, n, 8)
-#define SUB8(a, b, n) \
-  RESULT(((uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b)) >> 1, n, 8)
-#define PFX uh
-
-#include "op_addsub.h"
-
-static inline uint8_t do_usad(uint8_t a, uint8_t b)
-{
-    if (a > b) {
-        return a - b;
-    } else {
-        return b - a;
-    }
-}
-
-/* Unsigned sum of absolute byte differences.  */
-uint32_t HELPER(usad8)(uint32_t a, uint32_t b)
-{
-    uint32_t sum;
-    sum = do_usad(a, b);
-    sum += do_usad(a >> 8, b >> 8);
-    sum += do_usad(a >> 16, b >> 16);
-    sum += do_usad(a >> 24, b >> 24);
-    return sum;
-}
-
-/* For ARMv6 SEL instruction.  */
-uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
-{
-    uint32_t mask;
-
-    mask = 0;
-    if (flags & 1) {
-        mask |= 0xff;
-    }
-    if (flags & 2) {
-        mask |= 0xff00;
-    }
-    if (flags & 4) {
-        mask |= 0xff0000;
-    }
-    if (flags & 8) {
-        mask |= 0xff000000;
-    }
-    return (a & mask) | (b & ~mask);
-}
-
-/*
- * CRC helpers.
- * The upper bytes of val (above the number specified by 'bytes') must have
- * been zeroed out by the caller.
- */
-uint32_t HELPER(crc32)(uint32_t acc, uint32_t val, uint32_t bytes)
-{
-    uint8_t buf[4];
-
-    stl_le_p(buf, val);
-
-    /* zlib crc32 converts the accumulator and output to one's complement.  */
-    return crc32(acc ^ 0xffffffff, buf, bytes) ^ 0xffffffff;
-}
-
-uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
-{
-    uint8_t buf[4];
-
-    stl_le_p(buf, val);
-
-    /* Linux crc32c converts the output to one's complement.  */
-    return crc32c(acc, buf, bytes) ^ 0xffffffff;
-}
 
 /*
  * Return the exception level to which FP-disabled exceptions should
diff --git a/target/arm/tcg/arith_helper.c b/target/arm/tcg/arith_helper.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/tcg/arith_helper.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM generic helpers for various arithmetical operations.
+ *
+ * This code is licensed under the GNU GPL v2 or later.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/helper-proto.h"
+#include "qemu/crc32c.h"
+#include <zlib.h> /* for crc32 */
+
+/*
+ * Note that signed overflow is undefined in C.  The following routines are
+ * careful to use unsigned types where modulo arithmetic is required.
+ * Failure to do so _will_ break on newer gcc.
+ */
+
+/* Signed saturating arithmetic.  */
+
+/* Perform 16-bit signed saturating addition.  */
+static inline uint16_t add16_sat(uint16_t a, uint16_t b)
+{
+    uint16_t res;
+
+    res = a + b;
+    if (((res ^ a) & 0x8000) && !((a ^ b) & 0x8000)) {
+        if (a & 0x8000) {
+            res = 0x8000;
+        } else {
+            res = 0x7fff;
+        }
+    }
+    return res;
+}
+
+/* Perform 8-bit signed saturating addition.  */
+static inline uint8_t add8_sat(uint8_t a, uint8_t b)
+{
+    uint8_t res;
+
+    res = a + b;
+    if (((res ^ a) & 0x80) && !((a ^ b) & 0x80)) {
+        if (a & 0x80) {
+            res = 0x80;
+        } else {
+            res = 0x7f;
+        }
+    }
+    return res;
+}
+
+/* Perform 16-bit signed saturating subtraction.  */
+static inline uint16_t sub16_sat(uint16_t a, uint16_t b)
+{
+    uint16_t res;
+
+    res = a - b;
+    if (((res ^ a) & 0x8000) && ((a ^ b) & 0x8000)) {
+        if (a & 0x8000) {
+            res = 0x8000;
+        } else {
+            res = 0x7fff;
+        }
+    }
+    return res;
+}
+
+/* Perform 8-bit signed saturating subtraction.  */
+static inline uint8_t sub8_sat(uint8_t a, uint8_t b)
+{
+    uint8_t res;
+
+    res = a - b;
+    if (((res ^ a) & 0x80) && ((a ^ b) & 0x80)) {
+        if (a & 0x80) {
+            res = 0x80;
+        } else {
+            res = 0x7f;
+        }
+    }
+    return res;
+}
+
+#define ADD16(a, b, n) RESULT(add16_sat(a, b), n, 16);
+#define SUB16(a, b, n) RESULT(sub16_sat(a, b), n, 16);
+#define ADD8(a, b, n)  RESULT(add8_sat(a, b), n, 8);
+#define SUB8(a, b, n)  RESULT(sub8_sat(a, b), n, 8);
+#define PFX q
+
+#include "op_addsub.c.inc"
+
+/* Unsigned saturating arithmetic.  */
+static inline uint16_t add16_usat(uint16_t a, uint16_t b)
+{
+    uint16_t res;
+    res = a + b;
+    if (res < a) {
+        res = 0xffff;
+    }
+    return res;
+}
+
+static inline uint16_t sub16_usat(uint16_t a, uint16_t b)
+{
+    if (a > b) {
+        return a - b;
+    } else {
+        return 0;
+    }
+}
+
+static inline uint8_t add8_usat(uint8_t a, uint8_t b)
+{
+    uint8_t res;
+    res = a + b;
+    if (res < a) {
+        res = 0xff;
+    }
+    return res;
+}
+
+static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
+{
+    if (a > b) {
+        return a - b;
+    } else {
+        return 0;
+    }
+}
+
+#define ADD16(a, b, n) RESULT(add16_usat(a, b), n, 16);
+#define SUB16(a, b, n) RESULT(sub16_usat(a, b), n, 16);
+#define ADD8(a, b, n)  RESULT(add8_usat(a, b), n, 8);
+#define SUB8(a, b, n)  RESULT(sub8_usat(a, b), n, 8);
+#define PFX uq
+
+#include "op_addsub.c.inc"
+
+/* Signed modulo arithmetic.  */
+#define SARITH16(a, b, n, op) do { \
+    int32_t sum; \
+    sum = (int32_t)(int16_t)(a) op (int32_t)(int16_t)(b); \
+    RESULT(sum, n, 16); \
+    if (sum >= 0) \
+        ge |= 3 << (n * 2); \
+    } while (0)
+
+#define SARITH8(a, b, n, op) do { \
+    int32_t sum; \
+    sum = (int32_t)(int8_t)(a) op (int32_t)(int8_t)(b); \
+    RESULT(sum, n, 8); \
+    if (sum >= 0) \
+        ge |= 1 << n; \
+    } while (0)
+
+
+#define ADD16(a, b, n) SARITH16(a, b, n, +)
+#define SUB16(a, b, n) SARITH16(a, b, n, -)
+#define ADD8(a, b, n)  SARITH8(a, b, n, +)
+#define SUB8(a, b, n)  SARITH8(a, b, n, -)
+#define PFX s
+#define ARITH_GE
+
+#include "op_addsub.c.inc"
+
+/* Unsigned modulo arithmetic.  */
+#define ADD16(a, b, n) do { \
+    uint32_t sum; \
+    sum = (uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b); \
+    RESULT(sum, n, 16); \
+    if ((sum >> 16) == 1) \
+        ge |= 3 << (n * 2); \
+    } while (0)
+
+#define ADD8(a, b, n) do { \
+    uint32_t sum; \
+    sum = (uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b); \
+    RESULT(sum, n, 8); \
+    if ((sum >> 8) == 1) \
+        ge |= 1 << n; \
+    } while (0)
+
+#define SUB16(a, b, n) do { \
+    uint32_t sum; \
+    sum = (uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b); \
+    RESULT(sum, n, 16); \
+    if ((sum >> 16) == 0) \
+        ge |= 3 << (n * 2); \
+    } while (0)
+
+#define SUB8(a, b, n) do { \
+    uint32_t sum; \
+    sum = (uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b); \
+    RESULT(sum, n, 8); \
+    if ((sum >> 8) == 0) \
+        ge |= 1 << n; \
+    } while (0)
+
+#define PFX u
+#define ARITH_GE
+
+#include "op_addsub.c.inc"
+
+/* Halved signed arithmetic.  */
+#define ADD16(a, b, n) \
+  RESULT(((int32_t)(int16_t)(a) + (int32_t)(int16_t)(b)) >> 1, n, 16)
+#define SUB16(a, b, n) \
+  RESULT(((int32_t)(int16_t)(a) - (int32_t)(int16_t)(b)) >> 1, n, 16)
+#define ADD8(a, b, n) \
+  RESULT(((int32_t)(int8_t)(a) + (int32_t)(int8_t)(b)) >> 1, n, 8)
+#define SUB8(a, b, n) \
+  RESULT(((int32_t)(int8_t)(a) - (int32_t)(int8_t)(b)) >> 1, n, 8)
+#define PFX sh
+
+#include "op_addsub.c.inc"
+
+/* Halved unsigned arithmetic.  */
+#define ADD16(a, b, n) \
+  RESULT(((uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b)) >> 1, n, 16)
+#define SUB16(a, b, n) \
+  RESULT(((uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b)) >> 1, n, 16)
+#define ADD8(a, b, n) \
+  RESULT(((uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b)) >> 1, n, 8)
+#define SUB8(a, b, n) \
+  RESULT(((uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b)) >> 1, n, 8)
+#define PFX uh
+
+#include "op_addsub.c.inc"
+
+static inline uint8_t do_usad(uint8_t a, uint8_t b)
+{
+    if (a > b) {
+        return a - b;
+    } else {
+        return b - a;
+    }
+}
+
+/* Unsigned sum of absolute byte differences.  */
+uint32_t HELPER(usad8)(uint32_t a, uint32_t b)
+{
+    uint32_t sum;
+    sum = do_usad(a, b);
+    sum += do_usad(a >> 8, b >> 8);
+    sum += do_usad(a >> 16, b >> 16);
+    sum += do_usad(a >> 24, b >> 24);
+    return sum;
+}
+
+/* For ARMv6 SEL instruction.  */
+uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
+{
+    uint32_t mask;
+
+    mask = 0;
+    if (flags & 1) {
+        mask |= 0xff;
+    }
+    if (flags & 2) {
+        mask |= 0xff00;
+    }
+    if (flags & 4) {
+        mask |= 0xff0000;
+    }
+    if (flags & 8) {
+        mask |= 0xff000000;
+    }
+    return (a & mask) | (b & ~mask);
+}
+
+/*
+ * CRC helpers.
+ * The upper bytes of val (above the number specified by 'bytes') must have
+ * been zeroed out by the caller.
+ */
+uint32_t HELPER(crc32)(uint32_t acc, uint32_t val, uint32_t bytes)
+{
+    uint8_t buf[4];
+
+    stl_le_p(buf, val);
+
+    /* zlib crc32 converts the accumulator and output to one's complement.  */
+    return crc32(acc ^ 0xffffffff, buf, bytes) ^ 0xffffffff;
+}
+
+uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
+{
+    uint8_t buf[4];
+
+    stl_le_p(buf, val);
+
+    /* Linux crc32c converts the output to one's complement.  */
+    return crc32c(acc, buf, bytes) ^ 0xffffffff;
+}
diff --git a/target/arm/op_addsub.h b/target/arm/tcg/op_addsub.c.inc
similarity index 100%
rename from target/arm/op_addsub.h
rename to target/arm/tcg/op_addsub.c.inc
diff --git a/target/arm/tcg/meson.build b/target/arm/tcg/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/meson.build
+++ b/target/arm/tcg/meson.build
@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
   'tlb_helper.c',
   'vec_helper.c',
   'tlb-insns.c',
+  'arith_helper.c',
 ))
 
 arm_ss.add(when: 'TARGET_AARCH64', if_true: files(
-- 
2.34.1

From: Pierrick Bouvier <pierrick.bouvier@linaro.org>

Before changing the default pauth algorithm, we need to make sure the
current default (QARMA5) can still be selected explicitly.

$ qemu-system-aarch64 -cpu max,pauth-qarma5=on ...
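
The constraint enforced in arm_cpu_pauth_finalize() is that at most one
algorithm property may be set, and none may be set without pauth. A
minimal standalone sketch of that rule follows; the struct and function
names are hypothetical stand-ins rather than QEMU code, and it counts
the selected properties instead of comparing them pairwise as the patch
does:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the boolean CPU properties in ArchCPU. */
struct pauth_props {
    bool pauth;
    bool impdef;
    bool qarma3;
    bool qarma5;
};

/*
 * Return NULL if the combination is acceptable, otherwise an error
 * string mirroring the messages used in the patch.
 */
static const char *pauth_check(const struct pauth_props *p)
{
    /* Each bool promotes to 0 or 1, so this counts the selections. */
    int selected = p->impdef + p->qarma3 + p->qarma5;

    if (!p->pauth) {
        if (selected) {
            return "cannot enable pauth-impdef, pauth-qarma3 or "
                   "pauth-qarma5 without pauth";
        }
        return NULL;
    }
    if (selected > 1) {
        return "cannot enable pauth-impdef, pauth-qarma3 and "
               "pauth-qarma5 at the same time";
    }
    return NULL;
}
```

Counting scales more simply than pairwise tests if a fourth algorithm
is ever added, at the cost of being less explicit about which pair
conflicted.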

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241219183211.3493974-2-pierrick.bouvier@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/cpu-features.rst |  5 ++++-
 target/arm/cpu.h                 |  1 +
 target/arm/arm-qmp-cmds.c        |  2 +-
 target/arm/cpu64.c               | 20 ++++++++++++++------
 tests/qtest/arm-cpu-features.c   | 15 +++++++++++----
 5 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -XXX,XX +XXX,XX @@ Below is the list of TCG VCPU features and their descriptions.
 ``pauth-qarma3``
   When ``pauth`` is enabled, select the architected QARMA3 algorithm.
 
-Without either ``pauth-impdef`` or ``pauth-qarma3`` enabled,
+``pauth-qarma5``
+  When ``pauth`` is enabled, select the architected QARMA5 algorithm.
+
+Without ``pauth-impdef``, ``pauth-qarma3`` or ``pauth-qarma5`` enabled,
 the architected QARMA5 algorithm is used.  The architected QARMA5
 and QARMA3 algorithms have good cryptographic properties, but can
 be quite slow to emulate.  The impdef algorithm used by QEMU is
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
     bool prop_pauth;
     bool prop_pauth_impdef;
     bool prop_pauth_qarma3;
+    bool prop_pauth_qarma5;
     bool prop_lpa2;
 
     /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
diff --git a/target/arm/arm-qmp-cmds.c b/target/arm/arm-qmp-cmds.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/arm-qmp-cmds.c
+++ b/target/arm/arm-qmp-cmds.c
@@ -XXX,XX +XXX,XX @@ static const char *cpu_model_advertised_features[] = {
     "sve640", "sve768", "sve896", "sve1024", "sve1152", "sve1280",
     "sve1408", "sve1536", "sve1664", "sve1792", "sve1920", "sve2048",
     "kvm-no-adjvtime", "kvm-steal-time",
-    "pauth", "pauth-impdef", "pauth-qarma3",
+    "pauth", "pauth-impdef", "pauth-qarma3", "pauth-qarma5",
     NULL
 };
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
         }
 
         if (cpu->prop_pauth) {
-            if (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) {
+            if ((cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) ||
+                (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma5) ||
+                (cpu->prop_pauth_qarma3 && cpu->prop_pauth_qarma5)) {
                 error_setg(errp,
-                           "cannot enable both pauth-impdef and pauth-qarma3");
+                           "cannot enable pauth-impdef, pauth-qarma3 and "
+                           "pauth-qarma5 at the same time");
                 return;
             }
 
@@ -XXX,XX +XXX,XX @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
             } else if (cpu->prop_pauth_qarma3) {
                 isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, APA3, features);
                 isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, GPA3, 1);
-            } else {
+            } else { /* default is pauth-qarma5 */
                 isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, APA, features);
                 isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPA, 1);
             }
-        } else if (cpu->prop_pauth_impdef || cpu->prop_pauth_qarma3) {
-            error_setg(errp, "cannot enable pauth-impdef or "
-                       "pauth-qarma3 without pauth");
+        } else if (cpu->prop_pauth_impdef ||
+                   cpu->prop_pauth_qarma3 ||
+                   cpu->prop_pauth_qarma5) {
+            error_setg(errp, "cannot enable pauth-impdef, pauth-qarma3 or "
+                       "pauth-qarma5 without pauth");
             error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
         }
     }
@@ -XXX,XX +XXX,XX @@ static const Property arm_cpu_pauth_impdef_property =
     DEFINE_PROP_BOOL("pauth-impdef", ARMCPU, prop_pauth_impdef, false);
 static const Property arm_cpu_pauth_qarma3_property =
     DEFINE_PROP_BOOL("pauth-qarma3", ARMCPU, prop_pauth_qarma3, false);
+static const Property arm_cpu_pauth_qarma5_property =
+    DEFINE_PROP_BOOL("pauth-qarma5", ARMCPU, prop_pauth_qarma5, false);
 
 void aarch64_add_pauth_properties(Object *obj)
 {
@@ -XXX,XX +XXX,XX @@ void aarch64_add_pauth_properties(Object *obj)
     } else {
         qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_impdef_property);
         qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_qarma3_property);
+        qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_qarma5_property);
     }
 }
 
diff --git a/tests/qtest/arm-cpu-features.c b/tests/qtest/arm-cpu-features.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/arm-cpu-features.c
+++ b/tests/qtest/arm-cpu-features.c
@@ -XXX,XX +XXX,XX @@ static void pauth_tests_default(QTestState *qts, const char *cpu_type)
     assert_has_feature_enabled(qts, cpu_type, "pauth");
     assert_has_feature_disabled(qts, cpu_type, "pauth-impdef");
     assert_has_feature_disabled(qts, cpu_type, "pauth-qarma3");
+    assert_has_feature_disabled(qts, cpu_type, "pauth-qarma5");
     assert_set_feature(qts, cpu_type, "pauth", false);
     assert_set_feature(qts, cpu_type, "pauth", true);
     assert_set_feature(qts, cpu_type, "pauth-impdef", true);
     assert_set_feature(qts, cpu_type, "pauth-impdef", false);
     assert_set_feature(qts, cpu_type, "pauth-qarma3", true);
     assert_set_feature(qts, cpu_type, "pauth-qarma3", false);
+    assert_set_feature(qts, cpu_type, "pauth-qarma5", true);
+    assert_set_feature(qts, cpu_type, "pauth-qarma5", false);
     assert_error(qts, cpu_type,
-                 "cannot enable pauth-impdef or pauth-qarma3 without pauth",
+                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
                  "{ 'pauth': false, 'pauth-impdef': true }");
     assert_error(qts, cpu_type,
-                 "cannot enable pauth-impdef or pauth-qarma3 without pauth",
+                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
                  "{ 'pauth': false, 'pauth-qarma3': true }");
     assert_error(qts, cpu_type,
-                 "cannot enable both pauth-impdef and pauth-qarma3",
-                 "{ 'pauth': true, 'pauth-impdef': true, 'pauth-qarma3': true }");
+                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
+                 "{ 'pauth': false, 'pauth-qarma5': true }");
+    assert_error(qts, cpu_type,
+                 "cannot enable pauth-impdef, pauth-qarma3 and pauth-qarma5 at the same time",
+                 "{ 'pauth': true, 'pauth-impdef': true, 'pauth-qarma3': true,"
+                 "  'pauth-qarma5': true }");
 }
 
 static void test_query_cpu_model_expansion(const void *data)
-- 
2.34.1

The pauth-3 test explicitly checks that a pointer-authentication
computation produces the expected result.  This means that it must
be run with the QARMA5 algorithm.

Explicitly set the pauth algorithm when running this test, so that it
doesn't break when we change the default algorithm the 'max' CPU
uses.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/tcg/aarch64/Makefile.softmmu-target | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/tcg/aarch64/Makefile.softmmu-target b/tests/tcg/aarch64/Makefile.softmmu-target
index XXXXXXX..XXXXXXX 100644
--- a/tests/tcg/aarch64/Makefile.softmmu-target
+++ b/tests/tcg/aarch64/Makefile.softmmu-target
@@ -XXX,XX +XXX,XX @@ EXTRA_RUNS+=run-memory-replay
 
 ifneq ($(CROSS_CC_HAS_ARMV8_3),)
 pauth-3: CFLAGS += $(CROSS_CC_HAS_ARMV8_3)
+# This test explicitly checks the output of the pauth operation so we
+# must force the use of the QARMA5 algorithm for it.
+run-pauth-3: QEMU_BASE_MACHINE=-M virt -cpu max,pauth-qarma5=on -display none
 else
 pauth-3:
 	$(call skip-test, "BUILD of $@", "missing compiler support")
-- 
2.34.1

From: Pierrick Bouvier <pierrick.bouvier@linaro.org>

Pointer authentication on aarch64 is pretty expensive (up to 50% of
execution time) when running a virtual machine with tcg and -cpu max
(which enables pauth=on).

The advice is always: use pauth-impdef=on.
Our documentation even mentions it "by default" in
docs/system/introduction.rst.

Thus, we change the default algorithm to impdef. This does not
affect KVM or HVF acceleration, since the pauth algorithm used there
is the one provided by the host CPU.

This change is backward compatible, in terms of CLI, with previous
versions, as the semantics of -cpu max,pauth-impdef=on and -cpu
max,pauth-qarma3=on are preserved.
The new option introduced in the previous patch, which matches the old
default, is -cpu max,pauth-qarma5=on.
It is backward compatible with migration as well: a backcompat
property makes virt machine versions <= 9.2 use qarma5 by default.
Tested by saving and restoring a VM from QEMU 9.2.0 into qemu-master
(10.0) for CPUs neoverse-n2 and max.
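
The resulting selection order can be sketched as below. This is an
illustration only, not the code in cpu64.c: the enum, the struct and
the legacy_default_qarma5 field are hypothetical names standing in for
the CPU properties and the backcompat property described above, and
the sketch assumes the "at most one explicit algorithm" check has
already passed.

```c
#include <stdbool.h>

enum pauth_algo { PAUTH_IMPDEF, PAUTH_QARMA3, PAUTH_QARMA5 };

/* Hypothetical stand-in for the relevant CPU properties. */
struct pauth_choice {
    bool impdef;
    bool qarma3;
    bool qarma5;
    /* Assumed to be set by compat properties for virt machines <= 9.2. */
    bool legacy_default_qarma5;
};

static enum pauth_algo pauth_select(const struct pauth_choice *c)
{
    /* An explicit user choice always wins. */
    if (c->impdef) {
        return PAUTH_IMPDEF;
    }
    if (c->qarma3) {
        return PAUTH_QARMA3;
    }
    if (c->qarma5) {
        return PAUTH_QARMA5;
    }
    /* No explicit choice: old machine types keep QARMA5, new ones use impdef. */
    return c->legacy_default_qarma5 ? PAUTH_QARMA5 : PAUTH_IMPDEF;
}
```

Because explicit properties are tested first, existing command lines
keep their meaning on both old and new machine types; only the
no-option case differs.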

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241219183211.3493974-3-pierrick.bouvier@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/cpu-features.rst |  2 +-
 docs/system/introduction.rst     |  2 +-
 target/arm/cpu.h                 |  3 +++
 hw/core/machine.c                |  4 +++-
 target/arm/cpu.c                 |  2 ++
 target/arm/cpu64.c               | 22 ++++++++++++++++------
 6 files changed, 26 insertions(+), 9 deletions(-)