Series comparison

-[PULL 0/4] target-arm queue
+[PULL 00/11] target-arm queue
-Just 4 bug fixes here...
+The following changes since commit 3214bec13d8d4c40f707d21d8350d04e4123ae97:
-thanks
+  Merge tag 'migration-20250110-pull-request' of https://gitlab.com/farosas/qemu into staging (2025-01-10 13:39:19 -0500)
 -- PMM
 The following changes since commit e9d2db818ff934afb366aea566d0b33acf7bced1:
   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2024-08-01 07:31:49 +1000)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240801
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20250113
-for you to fetch changes up to 5e8e4f098d872818aa9a138a171200068b81c8d1:
+for you to fetch changes up to 435d260e7ec5ff9c79e3e62f1d66ec82d2d691ae:
-  target/xtensa: Correct assert condition in handle_interrupt() (2024-08-01 10:59:01 +0100)
+  docs/system/arm/virt: mention specific migration information (2025-01-13 12:35:35 +0000)
 ----------------------------------------------------------------
 target-arm queue:
- * hw/arm/mps2-tz.c: fix RX/TX interrupts order
+ * hw/arm_sysctl: fix extracting 31th bit of val
- * accel/kvm/kvm-all: Fixes the missing break in vCPU unpark logic
+ * hw/misc: cast rpm to uint64_t
- * target/arm: Handle denormals correctly for FMOPA (widening)
+ * tests/qtest/boot-serial-test: Improve ASM
- * target/xtensa: Correct assert condition in handle_interrupt()
+ * target/arm: Move minor arithmetic helpers out of helper.c
  * target/arm: change default pauth algorithm to impdef
 ----------------------------------------------------------------
-Marco Palumbi (1):
+Anastasia Belova (1):
-      hw/arm/mps2-tz.c: fix RX/TX interrupts order
+      hw/arm_sysctl: fix extracting 31th bit of val
 Peter Maydell (2):
-      target/arm: Handle denormals correctly for FMOPA (widening)
+      target/arm: Move minor arithmetic helpers out of helper.c
-      target/xtensa: Correct assert condition in handle_interrupt()
+      tests/tcg/aarch64: force qarma5 for pauth-3 test
-Salil Mehta (1):
+Philippe Mathieu-Daudé (4):
-      accel/kvm/kvm-all: Fixes the missing break in vCPU unpark logic
+      tests/qtest/boot-serial-test: Improve ASM comments of PL011 tests
       tests/qtest/boot-serial-test: Reduce for() loop in PL011 tests
       tests/qtest/boot-serial-test: Reorder pair of instructions in PL011 test
       tests/qtest/boot-serial-test: Initialize PL011 Control register
- target/arm/tcg/helper-sme.h    |  2 +-
+Pierrick Bouvier (3):
- accel/kvm/kvm-all.c            |  1 +
+      target/arm: add new property to select pauth-qarma5
- hw/arm/mps2-tz.c               |  6 +++---
+      target/arm: change default pauth algorithm to impdef
- target/arm/tcg/sme_helper.c    | 39 +++++++++++++++++++++++++++------------
+      docs/system/arm/virt: mention specific migration information
- target/arm/tcg/translate-sme.c | 25 +++++++++++++++++++++++--
- target/xtensa/exc_helper.c     |  2 +-
+Tigran Sogomonian (1):
- 6 files changed, 56 insertions(+), 19 deletions(-)
+      hw/misc: cast rpm to uint64_t
  docs/system/arm/cpu-features.rst                |   7 +-
  docs/system/arm/virt.rst                        |   4 +
  docs/system/introduction.rst                    |   2 +-
  target/arm/cpu.h                                |   4 +
  hw/core/machine.c                               |   4 +-
  hw/misc/arm_sysctl.c                            |   2 +-
  hw/misc/npcm7xx_mft.c                           |   5 +-
  target/arm/arm-qmp-cmds.c                       |   2 +-
  target/arm/cpu.c                                |   2 +
  target/arm/cpu64.c                              |  38 ++-
  target/arm/helper.c                             | 285 -----------------------
  target/arm/tcg/arith_helper.c                   | 296 ++++++++++++++++++++++++
  tests/qtest/arm-cpu-features.c                  |  15 +-
  tests/qtest/boot-serial-test.c                  |  23 +-
  target/arm/{op_addsub.h => tcg/op_addsub.c.inc} |   0
  target/arm/tcg/meson.build                      |   1 +
  tests/tcg/aarch64/Makefile.softmmu-target       |   3 +
  17 files changed, 377 insertions(+), 316 deletions(-)
  create mode 100644 target/arm/tcg/arith_helper.c
  rename target/arm/{op_addsub.h => tcg/op_addsub.c.inc} (100%)

-New patch
+[PULL 01/11] hw/arm_sysctl: fix extracting 31th bit of val
+From: Anastasia Belova <abelova@astralinux.ru>
+1 << 31 is casted to uint64_t while bitwise and with val.
+So this value may become 0xffffffff80000000 but only
+31th "start" bit is required.
+This is not possible in practice because the MemoryRegionOps
+uses the default max access size of 4 bytes and so none
+of the upper bytes of val will be set, but the bitfield
+extract API is clearer anyway.
+Use the bitfield extract() API instead.
+Found by Linux Verification Center (linuxtesting.org) with SVACE.
+Signed-off-by: Anastasia Belova <abelova@astralinux.ru>
+Message-id: 20241220125429.7552-1-abelova@astralinux.ru
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+[PMM: add clarification to commit message]
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/misc/arm_sysctl.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/hw/misc/arm_sysctl.c b/hw/misc/arm_sysctl.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/misc/arm_sysctl.c
++++ b/hw/misc/arm_sysctl.c
+@@ -XXX,XX +XXX,XX @@ static void arm_sysctl_write(void *opaque, hwaddr offset,
+          * as zero.
+          */
+         s->sys_cfgctrl = val & ~((3 << 18) | (1 << 31));
+-        if (val & (1 << 31)) {
++        if (extract64(val, 31, 1)) {
+             /* Start bit set -- actually do something */
+             unsigned int dcc = extract32(s->sys_cfgctrl, 26, 4);
+             unsigned int function = extract32(s->sys_cfgctrl, 20, 6);
+--
+.34.1
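
The sign-extension issue described in the commit message above can be sketched in isolation. This is a minimal illustration, not QEMU code: `START_MASK_SIGN_EXTENDED`, `start_bit_old()` and `start_bit_new()` are hypothetical names, the mask is written out as the constant the commit message quotes (0xffffffff80000000) rather than computed from an int shift, and `start_bit_new()` is only a stand-in for what `extract64(val, 31, 1)` examines.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical constant: the value the commit message says the int
 * operand takes once it is converted to uint64_t (sign bit extended
 * into bits 63..32).
 */
#define START_MASK_SIGN_EXTENDED UINT64_C(0xffffffff80000000)

/* Old-style test: true if bit 31 OR any higher bit of val is set. */
static bool start_bit_old(uint64_t val)
{
    return (val & START_MASK_SIGN_EXTENDED) != 0;
}

/* Stand-in for extract64(val, 31, 1): only bit 31 is examined. */
static bool start_bit_new(uint64_t val)
{
    return ((val >> 31) & 1) != 0;
}
```

With a value such as `UINT64_C(1) << 40` the two tests disagree, which is the case the commit message notes cannot arise in practice because accesses are limited to 4 bytes.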

-New patch
+[PULL 02/11] hw/misc: cast rpm to uint64_t
+From: Tigran Sogomonian <tsogomonian@astralinux.ru>
+The value of an arithmetic expression
+'rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION' is a subject
+to overflow because its operands are not cast to
+a larger data type before performing arithmetic. Thus, need
+to cast rpm to uint64_t.
+Found by Linux Verification Center (linuxtesting.org) with SVACE.
+Signed-off-by: Tigran Sogomonian <tsogomonian@astralinux.ru>
+Reviewed-by: Patrick Leis <venture@google.com>
+Reviewed-by: Hao Wu <wuhaotsh@google.com>
+Message-id: 20241226130311.1349-1-tsogomonian@astralinux.ru
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ hw/misc/npcm7xx_mft.c | 5 +++--
+ 1 file changed, 3 insertions(+), 2 deletions(-)
+diff --git a/hw/misc/npcm7xx_mft.c b/hw/misc/npcm7xx_mft.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/misc/npcm7xx_mft.c
++++ b/hw/misc/npcm7xx_mft.c
+@@ -XXX,XX +XXX,XX @@ static NPCM7xxMFTCaptureState npcm7xx_mft_compute_cnt(
+          * RPM = revolution/min. The time for one revlution (in ns) is
+          * MINUTE_TO_NANOSECOND / RPM.
+          */
+-        count = clock_ns_to_ticks(clock, (60 * NANOSECONDS_PER_SECOND) /
+-            (rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION));
++        count = clock_ns_to_ticks(clock,
++            (uint64_t)(60 * NANOSECONDS_PER_SECOND) /
++            ((uint64_t)rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION));
+     }
+     if (count > NPCM7XX_MFT_MAX_CNT) {
+--
+.34.1
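
The effect of widening before the multiplication can be shown with a small standalone sketch. It assumes, for illustration only, that `rpm` is a 32-bit unsigned value and that the pulses-per-revolution constant is 2; `PULSES_PER_REV`, `pulses_32bit()` and `pulses_64bit()` are hypothetical names and are not taken from npcm7xx_mft.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for NPCM7XX_MFT_PULSE_PER_REVOLUTION. */
#define PULSES_PER_REV 2u

/* Product computed in 32 bits: the result wraps modulo 2^32. */
static uint64_t pulses_32bit(uint32_t rpm)
{
    return (uint32_t)(rpm * PULSES_PER_REV);
}

/* Product computed in 64 bits: rpm is widened before multiplying. */
static uint64_t pulses_64bit(uint32_t rpm)
{
    return (uint64_t)rpm * PULSES_PER_REV;
}
```

Under these assumptions an input of 0x80000000 makes the 32-bit product wrap to 0, while the widened product is 0x100000000. A wrapped product used as a divisor would give a wrong quotient, or a zero divisor in that extreme case.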

-New patch
+[PULL 03/11] tests/qtest/boot-serial-test: Improve ASM comments of PL011 tests
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
+Re-indent ASM comments adding the 'loop:' label.
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Fabiano Rosas <farosas@suse.de>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ tests/qtest/boot-serial-test.c | 18 +++++++++---------
+ 1 file changed, 9 insertions(+), 9 deletions(-)
+diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/qtest/boot-serial-test.c
++++ b/tests/qtest/boot-serial-test.c
+@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
+ };
+ static const uint8_t bios_raspi2[] = {
+-    0x08, 0x30, 0x9f, 0xe5,                 /* ldr   r3,[pc,#8]    Get base */
+-    0x54, 0x20, 0xa0, 0xe3,                 /* mov     r2,#'T' */
+-    0x00, 0x20, 0xc3, 0xe5,                 /* strb    r2,[r3] */
+-    0xfb, 0xff, 0xff, 0xea,                 /* b       loop */
+-    0x00, 0x10, 0x20, 0x3f,                 /* 0x3f201000 = UART0 base addr */
++    0x08, 0x30, 0x9f, 0xe5,                 /* loop:  ldr     r3, [pc, #8]   Get &UART0 */
++    0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
++    0x00, 0x20, 0xc3, 0xe5,                 /*        strb    r2, [r3]       *TXDAT = 'T' */
++    0xfb, 0xff, 0xff, 0xea,                 /*        b       -12            (loop) */
++    0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
+ };
+ static const uint8_t kernel_aarch64[] = {
+-    0x81, 0x0a, 0x80, 0x52,                 /* mov     w1, #0x54 */
+-    0x02, 0x20, 0xa1, 0xd2,                 /* mov     x2, #0x9000000 */
+-    0x41, 0x00, 0x00, 0x39,                 /* strb    w1, [x2] */
+-    0xfd, 0xff, 0xff, 0x17,                 /* b       -12 (loop) */
++    0x81, 0x0a, 0x80, 0x52,                 /* loop:  mov    w1, #'T' */
++    0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
++    0x41, 0x00, 0x00, 0x39,                 /*        strb   w1, [x2]        *TXDAT = 'T' */
++    0xfd, 0xff, 0xff, 0x17,                 /*        b      -12             (loop) */
+ };
+ static const uint8_t kernel_nrf51[] = {
+--
+.34.1

-New patch
+[PULL 04/11] tests/qtest/boot-serial-test: Reduce for() loop in PL011 tests
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
+Since registers are not modified, we don't need
+to refill their values. Directly jump to the previous
+store instruction to keep filling the TXDAT register.
+The equivalent C code remains:
+  while (true) {
+      *UART_DATA = 'T';
+  }
+Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Fabiano Rosas <farosas@suse.de>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ tests/qtest/boot-serial-test.c | 12 ++++++------
+ 1 file changed, 6 insertions(+), 6 deletions(-)
+diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/qtest/boot-serial-test.c
++++ b/tests/qtest/boot-serial-test.c
+@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
+ };
+ static const uint8_t bios_raspi2[] = {
+-    0x08, 0x30, 0x9f, 0xe5,                 /* loop:  ldr     r3, [pc, #8]   Get &UART0 */
++    0x08, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #8]   Get &UART0 */
+     0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
+-    0x00, 0x20, 0xc3, 0xe5,                 /*        strb    r2, [r3]       *TXDAT = 'T' */
+-    0xfb, 0xff, 0xff, 0xea,                 /*        b       -12            (loop) */
++    0x00, 0x20, 0xc3, 0xe5,                 /* loop:  strb    r2, [r3]       *TXDAT = 'T' */
++    0xff, 0xff, 0xff, 0xea,                 /*        b       -4             (loop) */
+     0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
+ };
+ static const uint8_t kernel_aarch64[] = {
+-    0x81, 0x0a, 0x80, 0x52,                 /* loop:  mov    w1, #'T' */
++    0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
+     0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
+-    0x41, 0x00, 0x00, 0x39,                 /*        strb   w1, [x2]        *TXDAT = 'T' */
+-    0xfd, 0xff, 0xff, 0x17,                 /*        b      -12             (loop) */
++    0x41, 0x00, 0x00, 0x39,                 /* loop:  strb   w1, [x2]        *TXDAT = 'T' */
++    0xff, 0xff, 0xff, 0x17,                 /*        b      -4              (loop) */
+ };
+ static const uint8_t kernel_nrf51[] = {
+--
+.34.1
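
The branch offsets quoted in the `kernel_aarch64` comments can be cross-checked by decoding the instruction bytes. The sketch below covers only the A64 unconditional `B` encoding (opcode in bits 31..26, signed 26-bit word offset in bits 25..0, relative to the branch's own address); the helper names `insn_from_le_bytes()` and `a64_b_offset()` are hypothetical and do not exist in the test file.

```c
#include <stdint.h>

/* Assemble a 32-bit instruction word from four little-endian bytes. */
static uint32_t insn_from_le_bytes(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Byte offset of an A64 unconditional B: sign-extended imm26 times 4. */
static int32_t a64_b_offset(uint32_t insn)
{
    int32_t imm26 = (int32_t)(insn & 0x03ffffff);

    if (imm26 & 0x02000000) {
        imm26 -= 0x04000000;    /* sign-extend from 26 bits */
    }
    return imm26 * 4;
}
```

Decoding `0xfd, 0xff, 0xff, 0x17` this way gives an offset of -12 bytes (back to the first `mov`), and `0xff, 0xff, 0xff, 0x17` gives -4 bytes (back to the `strb`), matching the comments in the AArch64 array before and after this patch.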

-[PULL 4/4] target/xtensa: Correct assert condition in handle_interrupt()
+[PULL 05/11] tests/qtest/boot-serial-test: Reorder pair of instructions in PL011 test
-In commit ad18376b90c8101 we added an assert that the level value was
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
 in-bounds for the array we're about to index into.  However, the
 assert condition is wrong -- env->config->interrupt_vector is an
 array of uint32_t, so we should bounds check the index against
 ARRAY_SIZE(...), not against sizeof().
-Resolves: Coverity CID 1507131
+In the next commit we are going to use a different value
-Fixes: ad18376b90c8101 ("target/xtensa: Assert that interrupt level is within bounds")
+for the $w1 register, maintaining the same $x2 value. In
 order to keep the next commit trivial to review, set $x2
 before $w1.
 Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Fabiano Rosas <farosas@suse.de>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Acked-by: Max Filippov <jcmvbkbc@gmail.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Message-id: 20240731172246.3682311-1-peter.maydell@linaro.org
 ---
- target/xtensa/exc_helper.c | 2 +-
+ tests/qtest/boot-serial-test.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/target/xtensa/exc_helper.c b/target/xtensa/exc_helper.c
+diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/xtensa/exc_helper.c
+--- a/tests/qtest/boot-serial-test.c
-+++ b/target/xtensa/exc_helper.c
++++ b/tests/qtest/boot-serial-test.c
-@@ -XXX,XX +XXX,XX @@ static void handle_interrupt(CPUXtensaState *env)
+@@ -XXX,XX +XXX,XX @@ static const uint8_t bios_raspi2[] = {
+ };
-         if (level > 1) {
-             /* env->config->nlevel check should have ensured this */
+ static const uint8_t kernel_aarch64[] = {
--            assert(level < sizeof(env->config->interrupt_vector));
+-    0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
-+            assert(level < ARRAY_SIZE(env->config->interrupt_vector));
+     0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
++    0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
-             env->sregs[EPC1 + level - 1] = env->pc;
+     0x41, 0x00, 0x00, 0x39,                 /* loop:  strb   w1, [x2]        *TXDAT = 'T' */
-             env->sregs[EPS2 + level - 2] = env->sregs[PS];
+     0xff, 0xff, 0xff, 0x17,                 /*        b      -4              (loop) */
  };
 --
 .34.1

-[PULL 2/4] accel/kvm/kvm-all: Fixes the missing break in vCPU unpark logic
+[PULL 06/11] tests/qtest/boot-serial-test: Initialize PL011 Control register
-From: Salil Mehta <salil.mehta@huawei.com>
+From: Philippe Mathieu-Daudé <philmd@linaro.org>
-Loop should exit prematurely on successfully finding out the parked vCPU (struct
+The tests using the PL011 UART of the virt and raspi machines
-KVMParkedVcpu) in the 'struct KVMState' maintained 'kvm_parked_vcpus' list of
+weren't properly enabling the UART and its transmitter previous
-parked vCPUs.
+to sending characters. Follow the PL011 manual initialization
 recommendation by setting the proper bits of the control register.
-Fixes: Coverity CID 1558552
+Update the ASM code prefixing:
-Fixes: 08c3286822 ("accel/kvm: Extract common KVM vCPU {creation,parking} code")
-Reported-by: Peter Maydell <peter.maydell@linaro.org>
+  *UART_CTRL = UART_ENABLE | TX_ENABLE;
-Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
+to:
-Reviewed-by: Gavin Shan <gshan@redhat.com>
-Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
+  while (true) {
-Reviewed-by: Igor Mammedov <imammedo@redhat.com>
+      *UART_DATA = 'T';
-Message-id: 20240725145132.99355-1-salil.mehta@huawei.com
+  }
-Suggested-by: Peter Maydell <peter.maydell@linaro.org>
-Message-ID: <CAFEAcA-3_d1c7XSXWkFubD-LsW5c5i95e6xxV09r2C9yGtzcdA@mail.gmail.com>
+Note, since commit 51b61dd4d56 ("hw/char/pl011: Warn when using
-Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
+disabled transmitter") incomplete PL011 initialization can be
 logged using the '-d guest_errors' command line option.
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- accel/kvm/kvm-all.c | 1 +
+ tests/qtest/boot-serial-test.c | 7 ++++++-
- 1 file changed, 1 insertion(+)
+ 1 file changed, 6 insertions(+), 1 deletion(-)
-diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
+diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/accel/kvm/kvm-all.c
+--- a/tests/qtest/boot-serial-test.c
-+++ b/accel/kvm/kvm-all.c
++++ b/tests/qtest/boot-serial-test.c
-@@ -XXX,XX +XXX,XX @@ int kvm_unpark_vcpu(KVMState *s, unsigned long vcpu_id)
+@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
-             QLIST_REMOVE(cpu, node);
+ };
-             kvm_fd = cpu->kvm_fd;
-             g_free(cpu);
+ static const uint8_t bios_raspi2[] = {
-+            break;
+-    0x08, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #8]   Get &UART0 */
-         }
++    0x10, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #16]  Get &UART0 */
-     }
++    0x10, 0x20, 0x9f, 0xe5,                 /*        ldr     r2, [pc, #16]  Get &CR */
++    0xb0, 0x23, 0xc3, 0xe1,                 /*        strh    r2, [r3, #48]  Set CR */
      0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
      0x00, 0x20, 0xc3, 0xe5,                 /* loop:  strb    r2, [r3]       *TXDAT = 'T' */
      0xff, 0xff, 0xff, 0xea,                 /*        b       -4             (loop) */
      0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
 +    0x01, 0x01, 0x00, 0x00,                 /* CR:    0x101 = UARTEN|TXE */
  };
  static const uint8_t kernel_aarch64[] = {
      0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
 +    0x21, 0x20, 0x80, 0x52,                 /*        mov    w1, 0x101       CR = UARTEN|TXE */
 +    0x41, 0x60, 0x00, 0x79,                 /*        strh   w1, [x2, #48]   Set CR */
      0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
      0x41, 0x00, 0x00, 0x39,                 /* loop:  strb   w1, [x2]        *TXDAT = 'T' */
      0xff, 0xff, 0xff, 0x17,                 /*        b      -4              (loop) */
 --
 .34.1
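
As a rough C rendering of what the added instructions do, the sketch below writes the control value and then one character into a fake register window held in ordinary memory. It is an illustration under stated assumptions, not the test's code: the macro names and `pl011_enable_and_send()` are hypothetical, the bit positions (UARTEN at bit 0, TXE at bit 8) follow the `0x101 = UARTEN|TXE` comment in the patch, and the control register offset 0x30 follows the `#48` store offset.

```c
#include <stdint.h>

/* Hypothetical names; the values follow the comments in the patch. */
#define PL011_DR_OFFSET  0x00u      /* data register (TXDAT) */
#define PL011_CR_OFFSET  0x30u      /* control register, #48 in the asm */
#define PL011_CR_UARTEN  (1u << 0)  /* UART enable */
#define PL011_CR_TXE     (1u << 8)  /* transmit enable */

/*
 * Mimic the guest sequence on a byte-addressed register window:
 * a 16-bit little-endian store to CR, then a byte store to DR.
 */
static void pl011_enable_and_send(volatile uint8_t *base, char c)
{
    uint16_t cr = PL011_CR_UARTEN | PL011_CR_TXE;

    base[PL011_CR_OFFSET] = (uint8_t)(cr & 0xff);   /* strh, low byte */
    base[PL011_CR_OFFSET + 1] = (uint8_t)(cr >> 8); /* strh, high byte */
    base[PL011_DR_OFFSET] = (uint8_t)c;             /* strb */
}
```

On real hardware or under emulation `base` would point at the UART's MMIO region and the stores would need matching access sizes; the plain array here only makes the ordering and the 0x101 value easy to inspect.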

-New patch
+[PULL 07/11] target/arm: Move minor arithmetic helpers out of helper.c
+helper.c includes some small TCG helper functions used for mostly
+arithmetic instructions.  These are TCG only and there's no need for
+them to be in the large and unwieldy helper.c.  Move them out to
+their own source file in the tcg/ subdirectory, together with the
+op_addsub.h multiply-included template header that they use.
+Since we are moving op_addsub.h, we take the opportunity to
+give it a name which matches our convention for files which
+are not true header files but which are #included from other
+C files: op_addsub.c.inc.
+(Ironically, this means that helper.c no longer contains
+any TCG helper function definitions at all.)
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20250110131211.2546314-1-peter.maydell@linaro.org
+Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+---
+ target/arm/helper.c                           | 285 -----------------
+ target/arm/tcg/arith_helper.c                 | 296 ++++++++++++++++++
+ .../arm/{op_addsub.h => tcg/op_addsub.c.inc}  |   0
+ target/arm/tcg/meson.build                    |   1 +
+ 4 files changed, 297 insertions(+), 285 deletions(-)
+ create mode 100644 target/arm/tcg/arith_helper.c
+ rename target/arm/{op_addsub.h => tcg/op_addsub.c.inc} (100%)
+diff --git a/target/arm/helper.c b/target/arm/helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/helper.c
++++ b/target/arm/helper.c
+@@ -XXX,XX +XXX,XX @@
+ #include "qemu/main-loop.h"
+ #include "qemu/timer.h"
+ #include "qemu/bitops.h"
+-#include "qemu/crc32c.h"
+ #include "qemu/qemu-print.h"
+ #include "exec/exec-all.h"
+ #include "exec/translation-block.h"
+-#include <zlib.h> /* for crc32 */
+ #include "hw/irq.h"
+ #include "system/cpu-timers.h"
+ #include "system/kvm.h"
+@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
+     };
+ }
+-/*
+- * Note that signed overflow is undefined in C.  The following routines are
+- * careful to use unsigned types where modulo arithmetic is required.
+- * Failure to do so _will_ break on newer gcc.
+- */
+-
+-/* Signed saturating arithmetic.  */
+-
+-/* Perform 16-bit signed saturating addition.  */
+-static inline uint16_t add16_sat(uint16_t a, uint16_t b)
+-{
+-    uint16_t res;
+-
+-    res = a + b;
+-    if (((res ^ a) & 0x8000) && !((a ^ b) & 0x8000)) {
+-        if (a & 0x8000) {
+-            res = 0x8000;
+-        } else {
+-            res = 0x7fff;
+-        }
+-    }
+-    return res;
+-}
+-
+-/* Perform 8-bit signed saturating addition.  */
+-static inline uint8_t add8_sat(uint8_t a, uint8_t b)
+-{
+-    uint8_t res;
+-
+-    res = a + b;
+-    if (((res ^ a) & 0x80) && !((a ^ b) & 0x80)) {
+-        if (a & 0x80) {
+-            res = 0x80;
+-        } else {
+-            res = 0x7f;
+-        }
+-    }
+-    return res;
+-}
+-
+-/* Perform 16-bit signed saturating subtraction.  */
+-static inline uint16_t sub16_sat(uint16_t a, uint16_t b)
+-{
+-    uint16_t res;
+-
+-    res = a - b;
+-    if (((res ^ a) & 0x8000) && ((a ^ b) & 0x8000)) {
+-        if (a & 0x8000) {
+-            res = 0x8000;
+-        } else {
+-            res = 0x7fff;
+-        }
+-    }
+-    return res;
+-}
+-
+-/* Perform 8-bit signed saturating subtraction.  */
+-static inline uint8_t sub8_sat(uint8_t a, uint8_t b)
+-{
+-    uint8_t res;
+-
+-    res = a - b;
+-    if (((res ^ a) & 0x80) && ((a ^ b) & 0x80)) {
+-        if (a & 0x80) {
+-            res = 0x80;
+-        } else {
+-            res = 0x7f;
+-        }
+-    }
+-    return res;
+-}
+-
+-#define ADD16(a, b, n) RESULT(add16_sat(a, b), n, 16);
+-#define SUB16(a, b, n) RESULT(sub16_sat(a, b), n, 16);
+-#define ADD8(a, b, n)  RESULT(add8_sat(a, b), n, 8);
+-#define SUB8(a, b, n)  RESULT(sub8_sat(a, b), n, 8);
+-#define PFX q
+-
+-#include "op_addsub.h"
+-
+-/* Unsigned saturating arithmetic.  */
+-static inline uint16_t add16_usat(uint16_t a, uint16_t b)
+-{
+-    uint16_t res;
+-    res = a + b;
+-    if (res < a) {
+-        res = 0xffff;
+-    }
+-    return res;
+-}
+-
+-static inline uint16_t sub16_usat(uint16_t a, uint16_t b)
+-{
+-    if (a > b) {
+-        return a - b;
+-    } else {
+-        return 0;
+-    }
+-}
+-
+-static inline uint8_t add8_usat(uint8_t a, uint8_t b)
+-{
+-    uint8_t res;
+-    res = a + b;
+-    if (res < a) {
+-        res = 0xff;
+-    }
+-    return res;
+-}
+-
+-static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
+-{
+-    if (a > b) {
+-        return a - b;
+-    } else {
+-        return 0;
+-    }
+-}
+-
+-#define ADD16(a, b, n) RESULT(add16_usat(a, b), n, 16);
+-#define SUB16(a, b, n) RESULT(sub16_usat(a, b), n, 16);
+-#define ADD8(a, b, n)  RESULT(add8_usat(a, b), n, 8);
+-#define SUB8(a, b, n)  RESULT(sub8_usat(a, b), n, 8);
+-#define PFX uq
+-
+-#include "op_addsub.h"
+-
+-/* Signed modulo arithmetic.  */
+-#define SARITH16(a, b, n, op) do { \
+-    int32_t sum; \
+-    sum = (int32_t)(int16_t)(a) op (int32_t)(int16_t)(b); \
+-    RESULT(sum, n, 16); \
+-    if (sum >= 0) \
+-        ge |= 3 << (n * 2); \
+-    } while (0)
+-
+-#define SARITH8(a, b, n, op) do { \
+-    int32_t sum; \
+-    sum = (int32_t)(int8_t)(a) op (int32_t)(int8_t)(b); \
+-    RESULT(sum, n, 8); \
+-    if (sum >= 0) \
+-        ge |= 1 << n; \
+-    } while (0)
+-
+-
+-#define ADD16(a, b, n) SARITH16(a, b, n, +)
+-#define SUB16(a, b, n) SARITH16(a, b, n, -)
+-#define ADD8(a, b, n)  SARITH8(a, b, n, +)
+-#define SUB8(a, b, n)  SARITH8(a, b, n, -)
+-#define PFX s
+-#define ARITH_GE
+-
+-#include "op_addsub.h"
+-
+-/* Unsigned modulo arithmetic.  */
+-#define ADD16(a, b, n) do { \
+-    uint32_t sum; \
+-    sum = (uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b); \
+-    RESULT(sum, n, 16); \
+-    if ((sum >> 16) == 1) \
+-        ge |= 3 << (n * 2); \
+-    } while (0)
+-
+-#define ADD8(a, b, n) do { \
+-    uint32_t sum; \
+-    sum = (uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b); \
+-    RESULT(sum, n, 8); \
+-    if ((sum >> 8) == 1) \
+-        ge |= 1 << n; \
+-    } while (0)
+-
+-#define SUB16(a, b, n) do { \
+-    uint32_t sum; \
+-    sum = (uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b); \
+-    RESULT(sum, n, 16); \
+-    if ((sum >> 16) == 0) \
+-        ge |= 3 << (n * 2); \
+-    } while (0)
+-
+-#define SUB8(a, b, n) do { \
+-    uint32_t sum; \
+-    sum = (uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b); \
+-    RESULT(sum, n, 8); \
+-    if ((sum >> 8) == 0) \
+-        ge |= 1 << n; \
+-    } while (0)
+-
+-#define PFX u
+-#define ARITH_GE
+-
+-#include "op_addsub.h"
+-
+-/* Halved signed arithmetic.  */
+-#define ADD16(a, b, n) \
+-  RESULT(((int32_t)(int16_t)(a) + (int32_t)(int16_t)(b)) >> 1, n, 16)
+-#define SUB16(a, b, n) \
+-  RESULT(((int32_t)(int16_t)(a) - (int32_t)(int16_t)(b)) >> 1, n, 16)
+-#define ADD8(a, b, n) \
+-  RESULT(((int32_t)(int8_t)(a) + (int32_t)(int8_t)(b)) >> 1, n, 8)
+-#define SUB8(a, b, n) \
+-  RESULT(((int32_t)(int8_t)(a) - (int32_t)(int8_t)(b)) >> 1, n, 8)
+-#define PFX sh
+-
+-#include "op_addsub.h"
+-
+-/* Halved unsigned arithmetic.  */
+-#define ADD16(a, b, n) \
+-  RESULT(((uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b)) >> 1, n, 16)
+-#define SUB16(a, b, n) \
+-  RESULT(((uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b)) >> 1, n, 16)
+-#define ADD8(a, b, n) \
+-  RESULT(((uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b)) >> 1, n, 8)
+-#define SUB8(a, b, n) \
+-  RESULT(((uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b)) >> 1, n, 8)
+-#define PFX uh
+-
+-#include "op_addsub.h"
+-
+-static inline uint8_t do_usad(uint8_t a, uint8_t b)
+-{
+-    if (a > b) {
+-        return a - b;
+-    } else {
+-        return b - a;
+-    }
+-}
+-
+-/* Unsigned sum of absolute byte differences.  */
+-uint32_t HELPER(usad8)(uint32_t a, uint32_t b)
+-{
+-    uint32_t sum;
+-    sum = do_usad(a, b);
+-    sum += do_usad(a >> 8, b >> 8);
+-    sum += do_usad(a >> 16, b >> 16);
+-    sum += do_usad(a >> 24, b >> 24);
+-    return sum;
+-}
+-
+-/* For ARMv6 SEL instruction.  */
+-uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
+-{
+-    uint32_t mask;
+-
+-    mask = 0;
+-    if (flags & 1) {
+-        mask |= 0xff;
+-    }
+-    if (flags & 2) {
+-        mask |= 0xff00;
+-    }
+-    if (flags & 4) {
+-        mask |= 0xff0000;
+-    }
+-    if (flags & 8) {
+-        mask |= 0xff000000;
+-    }
+-    return (a & mask) | (b & ~mask);
+-}
+-
+-/*
+- * CRC helpers.
+- * The upper bytes of val (above the number specified by 'bytes') must have
+- * been zeroed out by the caller.
+- */
+-uint32_t HELPER(crc32)(uint32_t acc, uint32_t val, uint32_t bytes)
+-{
+-    uint8_t buf[4];
+-
+-    stl_le_p(buf, val);
+-
+-    /* zlib crc32 converts the accumulator and output to one's complement.  */
+-    return crc32(acc ^ 0xffffffff, buf, bytes) ^ 0xffffffff;
+-}
+-
+-uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
+-{
+-    uint8_t buf[4];
+-
+-    stl_le_p(buf, val);
+-
+-    /* Linux crc32c converts the output to one's complement.  */
+-    return crc32c(acc, buf, bytes) ^ 0xffffffff;
+-}
+ /*
+  * Return the exception level to which FP-disabled exceptions should
+diff --git a/target/arm/tcg/arith_helper.c b/target/arm/tcg/arith_helper.c
+new file mode 100644
+index XXXXXXX..XXXXXXX
+--- /dev/null
++++ b/target/arm/tcg/arith_helper.c
+@@ -XXX,XX +XXX,XX @@
++/*
++ * ARM generic helpers for various arithmetical operations.
++ *
++ * This code is licensed under the GNU GPL v2 or later.
++ *
++ * SPDX-License-Identifier: GPL-2.0-or-later
++ */
++#include "qemu/osdep.h"
++#include "cpu.h"
++#include "exec/helper-proto.h"
++#include "qemu/crc32c.h"
++#include <zlib.h> /* for crc32 */
++
++/*
++ * Note that signed overflow is undefined in C.  The following routines are
++ * careful to use unsigned types where modulo arithmetic is required.
++ * Failure to do so _will_ break on newer gcc.
++ */
++
++/* Signed saturating arithmetic.  */
++
++/* Perform 16-bit signed saturating addition.  */
++static inline uint16_t add16_sat(uint16_t a, uint16_t b)
++{
++    uint16_t res;
++
++    res = a + b;
++    if (((res ^ a) & 0x8000) && !((a ^ b) & 0x8000)) {
++        if (a & 0x8000) {
++            res = 0x8000;
++        } else {
++            res = 0x7fff;
++        }
++    }
++    return res;
++}
++
++/* Perform 8-bit signed saturating addition.  */
++static inline uint8_t add8_sat(uint8_t a, uint8_t b)
++{
++    uint8_t res;
++
++    res = a + b;
++    if (((res ^ a) & 0x80) && !((a ^ b) & 0x80)) {
++        if (a & 0x80) {
++            res = 0x80;
++        } else {
++            res = 0x7f;
++        }
++    }
++    return res;
++}
++
++/* Perform 16-bit signed saturating subtraction.  */
++static inline uint16_t sub16_sat(uint16_t a, uint16_t b)
++{
++    uint16_t res;
++
++    res = a - b;
++    if (((res ^ a) & 0x8000) && ((a ^ b) & 0x8000)) {
++        if (a & 0x8000) {
++            res = 0x8000;
++        } else {
++            res = 0x7fff;
++        }
++    }
++    return res;
++}
++
++/* Perform 8-bit signed saturating subtraction.  */
++static inline uint8_t sub8_sat(uint8_t a, uint8_t b)
++{
++    uint8_t res;
++
++    res = a - b;
++    if (((res ^ a) & 0x80) && ((a ^ b) & 0x80)) {
++        if (a & 0x80) {
++            res = 0x80;
++        } else {
++            res = 0x7f;
++        }
++    }
++    return res;
++}
++
++#define ADD16(a, b, n) RESULT(add16_sat(a, b), n, 16);
++#define SUB16(a, b, n) RESULT(sub16_sat(a, b), n, 16);
++#define ADD8(a, b, n)  RESULT(add8_sat(a, b), n, 8);
++#define SUB8(a, b, n)  RESULT(sub8_sat(a, b), n, 8);
++#define PFX q
++
++#include "op_addsub.c.inc"
++
++/* Unsigned saturating arithmetic.  */
++static inline uint16_t add16_usat(uint16_t a, uint16_t b)
++{
++    uint16_t res;
++    res = a + b;
++    if (res < a) {
++        res = 0xffff;
++    }
++    return res;
++}
++
++static inline uint16_t sub16_usat(uint16_t a, uint16_t b)
++{
++    if (a > b) {
++        return a - b;
++    } else {
++        return 0;
++    }
++}
++
++static inline uint8_t add8_usat(uint8_t a, uint8_t b)
++{
++    uint8_t res;
++    res = a + b;
++    if (res < a) {
++        res = 0xff;
++    }
++    return res;
++}
++
++static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
++{
++    if (a > b) {
++        return a - b;
++    } else {
++        return 0;
++    }
++}
++
++#define ADD16(a, b, n) RESULT(add16_usat(a, b), n, 16);
++#define SUB16(a, b, n) RESULT(sub16_usat(a, b), n, 16);
++#define ADD8(a, b, n)  RESULT(add8_usat(a, b), n, 8);
++#define SUB8(a, b, n)  RESULT(sub8_usat(a, b), n, 8);
++#define PFX uq
++
++#include "op_addsub.c.inc"
++
++/* Signed modulo arithmetic.  */
++#define SARITH16(a, b, n, op) do { \
++    int32_t sum; \
++    sum = (int32_t)(int16_t)(a) op (int32_t)(int16_t)(b); \
++    RESULT(sum, n, 16); \
++    if (sum >= 0) \
++        ge |= 3 << (n * 2); \
++    } while (0)
++
++#define SARITH8(a, b, n, op) do { \
++    int32_t sum; \
++    sum = (int32_t)(int8_t)(a) op (int32_t)(int8_t)(b); \
++    RESULT(sum, n, 8); \
++    if (sum >= 0) \
++        ge |= 1 << n; \
++    } while (0)
++
++
++#define ADD16(a, b, n) SARITH16(a, b, n, +)
++#define SUB16(a, b, n) SARITH16(a, b, n, -)
++#define ADD8(a, b, n)  SARITH8(a, b, n, +)
++#define SUB8(a, b, n)  SARITH8(a, b, n, -)
++#define PFX s
++#define ARITH_GE
++
++#include "op_addsub.c.inc"
++
++/* Unsigned modulo arithmetic.  */
++#define ADD16(a, b, n) do { \
++    uint32_t sum; \
++    sum = (uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b); \
++    RESULT(sum, n, 16); \
++    if ((sum >> 16) == 1) \
++        ge |= 3 << (n * 2); \
++    } while (0)
++
++#define ADD8(a, b, n) do { \
++    uint32_t sum; \
++    sum = (uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b); \
++    RESULT(sum, n, 8); \
++    if ((sum >> 8) == 1) \
++        ge |= 1 << n; \
++    } while (0)
++
++#define SUB16(a, b, n) do { \
++    uint32_t sum; \
++    sum = (uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b); \
++    RESULT(sum, n, 16); \
++    if ((sum >> 16) == 0) \
++        ge |= 3 << (n * 2); \
++    } while (0)
++
++#define SUB8(a, b, n) do { \
++    uint32_t sum; \
++    sum = (uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b); \
++    RESULT(sum, n, 8); \
++    if ((sum >> 8) == 0) \
++        ge |= 1 << n; \
++    } while (0)
++
++#define PFX u
++#define ARITH_GE
++
++#include "op_addsub.c.inc"
++
++/* Halved signed arithmetic.  */
++#define ADD16(a, b, n) \
++  RESULT(((int32_t)(int16_t)(a) + (int32_t)(int16_t)(b)) >> 1, n, 16)
++#define SUB16(a, b, n) \
++  RESULT(((int32_t)(int16_t)(a) - (int32_t)(int16_t)(b)) >> 1, n, 16)
++#define ADD8(a, b, n) \
++  RESULT(((int32_t)(int8_t)(a) + (int32_t)(int8_t)(b)) >> 1, n, 8)
++#define SUB8(a, b, n) \
++  RESULT(((int32_t)(int8_t)(a) - (int32_t)(int8_t)(b)) >> 1, n, 8)
++#define PFX sh
++
++#include "op_addsub.c.inc"
++
++/* Halved unsigned arithmetic.  */
++#define ADD16(a, b, n) \
++  RESULT(((uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b)) >> 1, n, 16)
++#define SUB16(a, b, n) \
++  RESULT(((uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b)) >> 1, n, 16)
++#define ADD8(a, b, n) \
++  RESULT(((uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b)) >> 1, n, 8)
++#define SUB8(a, b, n) \
++  RESULT(((uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b)) >> 1, n, 8)
++#define PFX uh
++
++#include "op_addsub.c.inc"
++
++static inline uint8_t do_usad(uint8_t a, uint8_t b)
++{
++    if (a > b) {
++        return a - b;
++    } else {
++        return b - a;
++    }
++}
++
++/* Unsigned sum of absolute byte differences.  */
++uint32_t HELPER(usad8)(uint32_t a, uint32_t b)
++{
++    uint32_t sum;
++    sum = do_usad(a, b);
++    sum += do_usad(a >> 8, b >> 8);
++    sum += do_usad(a >> 16, b >> 16);
++    sum += do_usad(a >> 24, b >> 24);
++    return sum;
++}
++
++/* For ARMv6 SEL instruction.  */
++uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
++{
++    uint32_t mask;
++
++    mask = 0;
++    if (flags & 1) {
++        mask |= 0xff;
++    }
++    if (flags & 2) {
++        mask |= 0xff00;
++    }
++    if (flags & 4) {
++        mask |= 0xff0000;
++    }
++    if (flags & 8) {
++        mask |= 0xff000000;
++    }
++    return (a & mask) | (b & ~mask);
++}
++
++/*
++ * CRC helpers.
++ * The upper bytes of val (above the number specified by 'bytes') must have
++ * been zeroed out by the caller.
++ */
++uint32_t HELPER(crc32)(uint32_t acc, uint32_t val, uint32_t bytes)
++{
++    uint8_t buf[4];
++
++    stl_le_p(buf, val);
++
++    /* zlib crc32 converts the accumulator and output to one's complement.  */
++    return crc32(acc ^ 0xffffffff, buf, bytes) ^ 0xffffffff;
++}
++
++uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
++{
++    uint8_t buf[4];
++
++    stl_le_p(buf, val);
++
++    /* Linux crc32c converts the output to one's complement.  */
++    return crc32c(acc, buf, bytes) ^ 0xffffffff;
++}
+diff --git a/target/arm/op_addsub.h b/target/arm/tcg/op_addsub.c.inc
+similarity index 100%
+rename from target/arm/op_addsub.h
+rename to target/arm/tcg/op_addsub.c.inc
+diff --git a/target/arm/tcg/meson.build b/target/arm/tcg/meson.build
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/tcg/meson.build
++++ b/target/arm/tcg/meson.build
+@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
+   'tlb_helper.c',
+   'vec_helper.c',
+   'tlb-insns.c',
++  'arith_helper.c',
+ ))
+ arm_ss.add(when: 'TARGET_AARCH64', if_true: files(
+--
+.34.1
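
For readers who want to check the lane arithmetic moved by this patch, here is a minimal standalone sketch in plain C, outside QEMU. The names sat_add_s8, sat_add_u8 and usad8_sketch are hypothetical and do not appear in the patch. The signed variant clamps after widening, whereas the helper in the patch tests sign bits; the results are intended to match, but this is an illustration rather than the QEMU code.

```c
#include <stdint.h>

/* Signed saturating 8-bit add: widen, add, then clamp to [-128, 127]. */
int8_t sat_add_s8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;

    if (sum > INT8_MAX) {
        return INT8_MAX;
    }
    if (sum < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)sum;
}

/* Unsigned saturating 8-bit add: a wrapped result is smaller than an input. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    uint8_t res = (uint8_t)(a + b);

    return res < a ? UINT8_MAX : res;
}

/* Sum of absolute differences over the four bytes of two 32-bit words. */
uint32_t usad8_sketch(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;

    for (int i = 0; i < 4; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));

        sum += x > y ? (uint32_t)(x - y) : (uint32_t)(y - x);
    }
    return sum;
}
```

The unsigned case relies on the same wraparound test as add8_usat() in the patch: if the truncated sum is smaller than one operand, the addition overflowed.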

-[PULL 3/4] target/arm: Handle denormals correctly for FMOPA (widening)
+[PULL 08/11] target/arm: add new property to select pauth-qarma5
-The FMOPA (widening) SME instruction takes pairs of half-precision
+From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 floating point values, widens them to single-precision, does a
 two-way dot product and accumulates the results into a
 single-precision destination.  We don't quite correctly handle the
 FPCR bits FZ and FZ16 which control flushing of denormal inputs and
 outputs.  This is because at the moment we pass a single float_status
 value to the helper function, which then uses that configuration for
 all the fp operations it does.  However, because the inputs to this
 operation are float16 and the outputs are float32 we need to use the
 fp_status_f16 for the float16 input widening but the normal fp_status
 for everything else.  Otherwise we will apply the flushing control
 FPCR.FZ16 to the 32-bit output rather than the FPCR.FZ control, and
 incorrectly flush a denormal output to zero when we should not (or
 vice-versa).
-(In commit 207d30b5fdb5b we tried to fix the FZ handling but
+Before changing the default pauth algorithm, we need to make sure the current
-didn't get it right, switching from "use FPCR.FZ for everything" to
+default one (QARMA5) can still be selected.
 "use FPCR.FZ16 for everything".)
-Pass the CPU env to the sme_fmopa_h helper instead of an fp_status
+$ qemu-system-aarch64 -cpu max,pauth-qarma5=on ...
 pointer, and have the helper pass an extra fp_status into the
 f16_dotadd() function so that we can use the right status for the
 right parts of this operation.
-Cc: qemu-stable@nongnu.org
+Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
-Fixes: 207d30b5fdb5 ("target/arm: Use FPST_F16 for SME FMOPA (widening)")
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2373
+Message-id: 20241219183211.3493974-2-pierrick.bouvier@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
- target/arm/tcg/helper-sme.h    |  2 +-
+ docs/system/arm/cpu-features.rst |  5 ++++-
- target/arm/tcg/sme_helper.c    | 39 +++++++++++++++++++++++-----------
+ target/arm/cpu.h                 |  1 +
- target/arm/tcg/translate-sme.c | 25 ++++++++++++++++++++--
+ target/arm/arm-qmp-cmds.c        |  2 +-
-files changed, 51 insertions(+), 15 deletions(-)
+ target/arm/cpu64.c               | 20 ++++++++++++++------
  tests/qtest/arm-cpu-features.c   | 15 +++++++++++----
 files changed, 31 insertions(+), 12 deletions(-)
-diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h
+diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/tcg/helper-sme.h
+--- a/docs/system/arm/cpu-features.rst
-+++ b/target/arm/tcg/helper-sme.h
++++ b/docs/system/arm/cpu-features.rst
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+@@ -XXX,XX +XXX,XX @@ Below is the list of TCG VCPU features and their descriptions.
- DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+ ``pauth-qarma3``
+   When ``pauth`` is enabled, select the architected QARMA3 algorithm.
- DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG,
--                   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+-Without either ``pauth-impdef`` or ``pauth-qarma3`` enabled,
-+                   void, ptr, ptr, ptr, ptr, ptr, env, i32)
++``pauth-qarma5``
- DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG,
++  When ``pauth`` is enabled, select the architected QARMA5 algorithm.
-                    void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
++
- DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
++Without ``pauth-impdef``, ``pauth-qarma3`` or ``pauth-qarma5`` enabled,
-diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
+ the architected QARMA5 algorithm is used.  The architected QARMA5
  and QARMA3 algorithms have good cryptographic properties, but can
  be quite slow to emulate.  The impdef algorithm used by QEMU is
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/tcg/sme_helper.c
+--- a/target/arm/cpu.h
-+++ b/target/arm/tcg/sme_helper.c
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg)
+@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
      bool prop_pauth;
      bool prop_pauth_impdef;
      bool prop_pauth_qarma3;
 +    bool prop_pauth_qarma5;
      bool prop_lpa2;
      /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
 diff --git a/target/arm/arm-qmp-cmds.c b/target/arm/arm-qmp-cmds.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/arm-qmp-cmds.c
 +++ b/target/arm/arm-qmp-cmds.c
@@ -XXX,XX +XXX,XX @@ static const char *cpu_model_advertised_features[] = {
      "sve640", "sve768", "sve896", "sve1024", "sve1152", "sve1280",
      "sve1408", "sve1536", "sve1664", "sve1792", "sve1920", "sve2048",
      "kvm-no-adjvtime", "kvm-steal-time",
 -    "pauth", "pauth-impdef", "pauth-qarma3",
 +    "pauth", "pauth-impdef", "pauth-qarma3", "pauth-qarma5",
      NULL
  };
 diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu64.c
 +++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
          }
          if (cpu->prop_pauth) {
 -            if (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) {
 +            if ((cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) ||
 +                (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma5) ||
 +                (cpu->prop_pauth_qarma3 && cpu->prop_pauth_qarma5)) {
                  error_setg(errp,
 -                           "cannot enable both pauth-impdef and pauth-qarma3");
 +                           "cannot enable pauth-impdef, pauth-qarma3 and "
 +                           "pauth-qarma5 at the same time");
                  return;
              }
@@ -XXX,XX +XXX,XX @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
              } else if (cpu->prop_pauth_qarma3) {
                  isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, APA3, features);
                  isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, GPA3, 1);
 -            } else {
 +            } else { /* default is pauth-qarma5 */
                  isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, APA, features);
                  isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPA, 1);
              }
 -        } else if (cpu->prop_pauth_impdef || cpu->prop_pauth_qarma3) {
 -            error_setg(errp, "cannot enable pauth-impdef or "
 -                       "pauth-qarma3 without pauth");
 +        } else if (cpu->prop_pauth_impdef ||
 +                   cpu->prop_pauth_qarma3 ||
 +                   cpu->prop_pauth_qarma5) {
 +            error_setg(errp, "cannot enable pauth-impdef, pauth-qarma3 or "
 +                       "pauth-qarma5 without pauth");
              error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
          }
      }
@@ -XXX,XX +XXX,XX @@ static const Property arm_cpu_pauth_impdef_property =
      DEFINE_PROP_BOOL("pauth-impdef", ARMCPU, prop_pauth_impdef, false);
  static const Property arm_cpu_pauth_qarma3_property =
      DEFINE_PROP_BOOL("pauth-qarma3", ARMCPU, prop_pauth_qarma3, false);
 +static Property arm_cpu_pauth_qarma5_property =
 +    DEFINE_PROP_BOOL("pauth-qarma5", ARMCPU, prop_pauth_qarma5, false);
  void aarch64_add_pauth_properties(Object *obj)
  {
@@ -XXX,XX +XXX,XX @@ void aarch64_add_pauth_properties(Object *obj)
      } else {
          qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_impdef_property);
          qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_qarma3_property);
 +        qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_qarma5_property);
      }
  }
- static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2,
+diff --git a/tests/qtest/arm-cpu-features.c b/tests/qtest/arm-cpu-features.c
--                          float_status *s_std, float_status *s_odd)
+index XXXXXXX..XXXXXXX 100644
-+                          float_status *s_f16, float_status *s_std,
+--- a/tests/qtest/arm-cpu-features.c
-+                          float_status *s_odd)
++++ b/tests/qtest/arm-cpu-features.c
- {
+@@ -XXX,XX +XXX,XX @@ static void pauth_tests_default(QTestState *qts, const char *cpu_type)
--    float64 e1r = float16_to_float64(e1 & 0xffff, true, s_std);
+     assert_has_feature_enabled(qts, cpu_type, "pauth");
--    float64 e1c = float16_to_float64(e1 >> 16, true, s_std);
+     assert_has_feature_disabled(qts, cpu_type, "pauth-impdef");
--    float64 e2r = float16_to_float64(e2 & 0xffff, true, s_std);
+     assert_has_feature_disabled(qts, cpu_type, "pauth-qarma3");
--    float64 e2c = float16_to_float64(e2 >> 16, true, s_std);
++    assert_has_feature_disabled(qts, cpu_type, "pauth-qarma5");
-+    /*
+     assert_set_feature(qts, cpu_type, "pauth", false);
-+     * We need three different float_status for different parts of this
+     assert_set_feature(qts, cpu_type, "pauth", true);
-+     * operation:
+     assert_set_feature(qts, cpu_type, "pauth-impdef", true);
-+     *  - the input conversion of the float16 values must use the
+     assert_set_feature(qts, cpu_type, "pauth-impdef", false);
-+     *    f16-specific float_status, so that the FPCR.FZ16 control is applied
+     assert_set_feature(qts, cpu_type, "pauth-qarma3", true);
-+     *  - operations on float32 including the final accumulation must use
+     assert_set_feature(qts, cpu_type, "pauth-qarma3", false);
-+     *    the normal float_status, so that FPCR.FZ is applied
++    assert_set_feature(qts, cpu_type, "pauth-qarma5", true);
-+     *  - we have pre-set-up copy of s_std which is set to round-to-odd,
++    assert_set_feature(qts, cpu_type, "pauth-qarma5", false);
-+     *    for the multiply (see below)
+     assert_error(qts, cpu_type,
-+     */
+-                 "cannot enable pauth-impdef or pauth-qarma3 without pauth",
-+    float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16);
++                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
-+    float64 e1c = float16_to_float64(e1 >> 16, true, s_f16);
+                  "{ 'pauth': false, 'pauth-impdef': true }");
-+    float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16);
+     assert_error(qts, cpu_type,
-+    float64 e2c = float16_to_float64(e2 >> 16, true, s_f16);
+-                 "cannot enable pauth-impdef or pauth-qarma3 without pauth",
-     float64 t64;
++                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
-     float32 t32;
+                  "{ 'pauth': false, 'pauth-qarma3': true }");
+     assert_error(qts, cpu_type,
-@@ -XXX,XX +XXX,XX @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2,
+-                 "cannot enable both pauth-impdef and pauth-qarma3",
 -                 "{ 'pauth': true, 'pauth-impdef': true, 'pauth-qarma3': true }");
 +                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
 +                 "{ 'pauth': false, 'pauth-qarma5': true }");
 +    assert_error(qts, cpu_type,
 +                 "cannot enable pauth-impdef, pauth-qarma3 and pauth-qarma5 at the same time",
 +                 "{ 'pauth': true, 'pauth-impdef': true, 'pauth-qarma3': true,"
 +                 "  'pauth-qarma5': true }");
  }
- void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
+ static void test_query_cpu_model_expansion(const void *data)
 -                         void *vpm, void *vst, uint32_t desc)
 +                         void *vpm, CPUARMState *env, uint32_t desc)
  {
      intptr_t row, col, oprsz = simd_maxsz(desc);
      uint32_t neg = simd_data(desc) * 0x80008000u;
      uint16_t *pn = vpn, *pm = vpm;
 -    float_status fpst_odd, fpst_std;
 +    float_status fpst_odd, fpst_std, fpst_f16;
      /*
 -     * Make a copy of float_status because this operation does not
 -     * update the cumulative fp exception status.  It also produces
 -     * default nans.  Make a second copy with round-to-odd -- see above.
 +     * Make copies of fp_status and fp_status_f16, because this operation
 +     * does not update the cumulative fp exception status.  It also
 +     * produces default NaNs. We also need a second copy of fp_status with
 +     * round-to-odd -- see above.
       */
 -    fpst_std = *(float_status *)vst;
 +    fpst_f16 = env->vfp.fp_status_f16;
 +    fpst_std = env->vfp.fp_status;
      set_default_nan_mode(true, &fpst_std);
 +    set_default_nan_mode(true, &fpst_f16);
      fpst_odd = fpst_std;
      set_float_rounding_mode(float_round_to_odd, &fpst_odd);
@@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
                          uint32_t m = *(uint32_t *)(vzm + H1_4(col));
                          m = f16mop_adj_pair(m, pcol, 0);
 -                        *a = f16_dotadd(*a, n, m, &fpst_std, &fpst_odd);
 +                        *a = f16_dotadd(*a, n, m,
 +                                        &fpst_f16, &fpst_std, &fpst_odd);
                      }
                      col += 4;
                      pcol >>= 4;
 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/tcg/translate-sme.c
 +++ b/target/arm/tcg/translate-sme.c
@@ -XXX,XX +XXX,XX @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
      return true;
  }
 -TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a,
 -           MO_32, FPST_FPCR_F16, gen_helper_sme_fmopa_h)
 +static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz,
 +                           gen_helper_gvec_5_ptr *fn)
 +{
 +    int svl = streaming_vec_reg_size(s);
 +    uint32_t desc = simd_desc(svl, svl, a->sub);
 +    TCGv_ptr za, zn, zm, pn, pm;
 +
 +    if (!sme_smza_enabled_check(s)) {
 +        return true;
 +    }
 +
 +    za = get_tile(s, esz, a->zad);
 +    zn = vec_full_reg_ptr(s, a->zn);
 +    zm = vec_full_reg_ptr(s, a->zm);
 +    pn = pred_full_reg_ptr(s, a->pn);
 +    pm = pred_full_reg_ptr(s, a->pm);
 +
 +    fn(za, zn, zm, pn, pm, tcg_env, tcg_constant_i32(desc));
 +    return true;
 +}
 +
 +TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a,
 +           MO_32, gen_helper_sme_fmopa_h)
  TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a,
             MO_32, FPST_FPCR, gen_helper_sme_fmopa_s)
  TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a,
 --
 .34.1
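
The pauth-qarma5 patch extends the check in arm_cpu_pauth_finalize() from one pair of properties to three pairwise tests. As a rough sketch of the same rule, assuming only that at most one algorithm property may be enabled at a time, the condition can be written as a count. The function name pauth_algorithms_conflict is hypothetical and is not part of the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true when more than one pauth algorithm
 * flag is set. For three booleans this is equivalent to OR-ing the three
 * pairwise AND tests used in the patch.
 */
bool pauth_algorithms_conflict(bool impdef, bool qarma3, bool qarma5)
{
    return (int)impdef + (int)qarma3 + (int)qarma5 > 1;
}
```

Counting scales more easily if a further algorithm property is ever added, while the pairwise form in the patch keeps each conflicting pair visible.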

-New patch
+[PULL 09/11] tests/tcg/aarch64: force qarma5 for pauth-3 test
+The pauth-3 test explicitly tests that a computation of the
+pointer-authentication produces the expected result.  This means that
+it must be run with the QARMA5 algorithm.
+Explicitly set the pauth algorithm when running this test, so that it
+doesn't break when we change the default algorithm the 'max' CPU
+uses.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ tests/tcg/aarch64/Makefile.softmmu-target | 3 +++
+file changed, 3 insertions(+)
+diff --git a/tests/tcg/aarch64/Makefile.softmmu-target b/tests/tcg/aarch64/Makefile.softmmu-target
+index XXXXXXX..XXXXXXX 100644
+--- a/tests/tcg/aarch64/Makefile.softmmu-target
++++ b/tests/tcg/aarch64/Makefile.softmmu-target
+@@ -XXX,XX +XXX,XX @@ EXTRA_RUNS+=run-memory-replay
+ ifneq ($(CROSS_CC_HAS_ARMV8_3),)
+ pauth-3: CFLAGS += $(CROSS_CC_HAS_ARMV8_3)
++# This test explicitly checks the output of the pauth operation so we
++# must force the use of the QARMA5 algorithm for it.
++run-pauth-3: QEMU_BASE_MACHINE=-M virt -cpu max,pauth-qarma5=on -display none
+ else
+ pauth-3:
+     $(call skip-test, "BUILD of $@", "missing compiler support")
+--
+.34.1

-New patch
+[PULL 10/11] target/arm: change default pauth algorithm to impdef
+From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
+Pointer authentication on aarch64 is pretty expensive (up to 50% of
+execution time) when running a virtual machine with tcg and -cpu max
+(which enables pauth=on).
+The advice is always: use pauth-impdef=on.
+Our documentation even mentions it "by default" in
+docs/system/introduction.rst.
+Thus, we make impdef the default algorithm. This does not
+affect kvm or hvf acceleration, since the pauth algorithm used there is
+the one from the host cpu.
+This change is backward compatible, in terms of cli, with previous
+versions, as the semantics of -cpu max,pauth-impdef=on and -cpu
+max,pauth-qarma3=on are preserved.
+The new option introduced in the previous patch, which matches the old
+default, is -cpu max,pauth-qarma5=on.
+It is backward compatible with migration as well: a backcompat
+property makes qarma5 the default for virt machine versions <= 9.2.
+Tested by saving and restoring a vm from qemu 9.2.0 into qemu-master
+(10.0) for cpus neoverse-n2 and max.
+Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20241219183211.3493974-3-pierrick.bouvier@linaro.org
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+---
+ docs/system/arm/cpu-features.rst |  2 +-
+ docs/system/introduction.rst     |  2 +-
+ target/arm/cpu.h                 |  3 +++
+ hw/core/machine.c                |  4 +++-
+ target/arm/cpu.c                 |  2 ++
+ target/arm/cpu64.c               | 22 ++++++++++++++++------
+files changed, 26 insertions(+), 9 deletions(-)
+diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
+index XXXXXXX..XXXXXXX 100644
+--- a/docs/system/arm/cpu-features.rst
++++ b/docs/system/arm/cpu-features.rst
+@@ -XXX,XX +XXX,XX @@ Below is the list of TCG VCPU features and their descriptions.
+   When ``pauth`` is enabled, select the architected QARMA5 algorithm.
+ Without ``pauth-impdef``, ``pauth-qarma3`` or ``pauth-qarma5`` enabled,
+-the architected QARMA5 algorithm is used.  The architected QARMA5
++the QEMU impdef algorithm is used.  The architected QARMA5
+ and QARMA3 algorithms have good cryptographic properties, but can
+ be quite slow to emulate.  The impdef algorithm used by QEMU is
+ non-cryptographic but significantly faster.
+diff --git a/docs/system/introduction.rst b/docs/system/introduction.rst
+index XXXXXXX..XXXXXXX 100644
+--- a/docs/system/introduction.rst
++++ b/docs/system/introduction.rst
+@@ -XXX,XX +XXX,XX @@ would default to it anyway.
+ .. code::
+- -cpu max,pauth-impdef=on \
++ -cpu max \
+  -smp 4 \
+  -accel tcg \
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.h
++++ b/target/arm/cpu.h
+@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
+     /* QOM property to indicate we should use the back-compat CNTFRQ default */
+     bool backcompat_cntfrq;
++    /* QOM property to indicate we should use the back-compat QARMA5 default */
++    bool backcompat_pauth_default_use_qarma5;
++
+     /* Specify the number of cores in this CPU cluster. Used for the L2CTLR
+      * register.
+      */
+diff --git a/hw/core/machine.c b/hw/core/machine.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/core/machine.c
++++ b/hw/core/machine.c
+@@ -XXX,XX +XXX,XX @@
+ #include "hw/virtio/virtio-iommu.h"
+ #include "audio/audio.h"
+-GlobalProperty hw_compat_9_2[] = {};
++GlobalProperty hw_compat_9_2[] = {
++    {"arm-cpu", "backcompat-pauth-default-use-qarma5", "true"},
++};
+ const size_t hw_compat_9_2_len = G_N_ELEMENTS(hw_compat_9_2);
+ GlobalProperty hw_compat_9_1[] = {
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu.c
++++ b/target/arm/cpu.c
+@@ -XXX,XX +XXX,XX @@ static const Property arm_cpu_properties[] = {
+     DEFINE_PROP_INT32("core-count", ARMCPU, core_count, -1),
+     /* True to default to the backward-compat old CNTFRQ rather than 1Ghz */
+     DEFINE_PROP_BOOL("backcompat-cntfrq", ARMCPU, backcompat_cntfrq, false),
++    DEFINE_PROP_BOOL("backcompat-pauth-default-use-qarma5", ARMCPU,
++                      backcompat_pauth_default_use_qarma5, false),
+ };
+ static const gchar *arm_gdb_arch_name(CPUState *cs)
+diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/cpu64.c
++++ b/target/arm/cpu64.c
+@@ -XXX,XX +XXX,XX @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
+                 return;
+             }
+-            if (cpu->prop_pauth_impdef) {
+-                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, API, features);
+-                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPI, 1);
++            bool use_default = !cpu->prop_pauth_qarma5 &&
++                               !cpu->prop_pauth_qarma3 &&
++                               !cpu->prop_pauth_impdef;
++
++            if (cpu->prop_pauth_qarma5 ||
++                (use_default &&
++                 cpu->backcompat_pauth_default_use_qarma5)) {
++                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, APA, features);
++                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPA, 1);
+             } else if (cpu->prop_pauth_qarma3) {
+                 isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, APA3, features);
+                 isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, GPA3, 1);
+-            } else { /* default is pauth-qarma5 */
+-                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, APA, features);
+-                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPA, 1);
++            } else if (cpu->prop_pauth_impdef ||
++                       (use_default &&
++                        !cpu->backcompat_pauth_default_use_qarma5)) {
++                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, API, features);
++                isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPI, 1);
++            } else {
++                g_assert_not_reached();
+             }
+         } else if (cpu->prop_pauth_impdef ||
+                    cpu->prop_pauth_qarma3 ||
+--
+.34.1
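
The selection logic in the patch above can be summarised as a small decision function. The sketch below is a condensed illustration, assuming the earlier conflict check has already rejected any combination with more than one explicit flag set. The enum and the function name pauth_select are hypothetical and are not part of QEMU.

```c
#include <stdbool.h>

enum pauth_algo {
    PAUTH_IMPDEF,
    PAUTH_QARMA3,
    PAUTH_QARMA5,
};

/*
 * Hypothetical summary of the choice made in arm_cpu_pauth_finalize():
 * an explicit property always wins; with no explicit property, the
 * back-compat flag (set for older versioned machine types) keeps QARMA5,
 * and otherwise the new default is impdef.
 */
enum pauth_algo pauth_select(bool impdef, bool qarma3, bool qarma5,
                             bool backcompat_qarma5)
{
    if (qarma5) {
        return PAUTH_QARMA5;
    }
    if (qarma3) {
        return PAUTH_QARMA3;
    }
    if (impdef) {
        return PAUTH_IMPDEF;
    }
    return backcompat_qarma5 ? PAUTH_QARMA5 : PAUTH_IMPDEF;
}
```

Under this reading, a plain -cpu max on a current machine type yields impdef, while the same command line on a machine type covered by hw_compat_9_2 still yields QARMA5.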

-[PULL 1/4] hw/arm/mps2-tz.c: fix RX/TX interrupts order
+[PULL 11/11] docs/system/arm/virt: mention specific migration information
-From: Marco Palumbi <Marco.Palumbi@tii.ae>
+From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
-The order of the RX and TX interrupts is swapped.
+Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
-This commit fixes the order as per the following documents:
+Message-id: 20241219183211.3493974-4-pierrick.bouvier@linaro.org
- * https://developer.arm.com/documentation/dai0505/latest/
+[PMM: Removed a paragraph about using non-versioned models.]
  * https://developer.arm.com/documentation/dai0521/latest/
  * https://developer.arm.com/documentation/dai0524/latest/
  * https://developer.arm.com/documentation/dai0547/latest/
 Cc: qemu-stable@nongnu.org
 Signed-off-by: Marco Palumbi <Marco.Palumbi@tii.ae>
 Message-id: 20240730073123.72992-1-marco@palumbi.it
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/mps2-tz.c | 6 +++---
+ docs/system/arm/virt.rst | 4 ++++
-file changed, 3 insertions(+), 3 deletions(-)
+file changed, 4 insertions(+)
-diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
+diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/mps2-tz.c
+--- a/docs/system/arm/virt.rst
-+++ b/hw/arm/mps2-tz.c
++++ b/docs/system/arm/virt.rst
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
+@@ -XXX,XX +XXX,XX @@ of the 5.0 release and ``virt-5.0`` of the 5.1 release. Migration
-                                const char *name, hwaddr size,
+ is not guaranteed to work between different QEMU releases for
-                                const int *irqs, const PPCExtraData *extradata)
+ the non-versioned ``virt`` machine type.
- {
--    /* The irq[] array is tx, rx, combined, in that order */
++VM migration is not guaranteed when using ``-cpu max``, as features
-+    /* The irq[] array is rx, tx, combined, in that order */
++supported may change between QEMU versions.  To ensure your VM can be
-     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
++migrated, it is recommended to use another cpu model instead.
-     CMSDKAPBUART *uart = opaque;
++
-     int i = uart - &mms->uart[0];
+ Supported devices
-@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
+ """""""""""""""""
-     qdev_prop_set_uint32(DEVICE(uart), "pclk-frq", mmc->apb_periph_frq);
      sysbus_realize(SYS_BUS_DEVICE(uart), &error_fatal);
      s = SYS_BUS_DEVICE(uart);
 -    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
 -    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, irqs[1]));
 +    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[1]));
 +    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, irqs[0]));
      sysbus_connect_irq(s, 2, qdev_get_gpio_in(orgate_dev, i * 2));
      sysbus_connect_irq(s, 3, qdev_get_gpio_in(orgate_dev, i * 2 + 1));
      sysbus_connect_irq(s, 4, get_sse_irq_in(mms, irqs[2]));
 --
 .34.1

Just 4 bug fixes here...

thanks
-- PMM

The following changes since commit e9d2db818ff934afb366aea566d0b33acf7bced1:

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2024-08-01 07:31:49 +1000)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240801

for you to fetch changes up to 5e8e4f098d872818aa9a138a171200068b81c8d1:

target/xtensa: Correct assert condition in handle_interrupt() (2024-08-01 10:59:01 +0100)

----------------------------------------------------------------
target-arm queue:
 * hw/arm/mps2-tz.c: fix RX/TX interrupts order
 * accel/kvm/kvm-all: Fixes the missing break in vCPU unpark logic
 * target/arm: Handle denormals correctly for FMOPA (widening)
 * target/xtensa: Correct assert condition in handle_interrupt()

----------------------------------------------------------------
Marco Palumbi (1):
      hw/arm/mps2-tz.c: fix RX/TX interrupts order

Peter Maydell (2):
      target/arm: Handle denormals correctly for FMOPA (widening)
      target/xtensa: Correct assert condition in handle_interrupt()

Salil Mehta (1):
      accel/kvm/kvm-all: Fixes the missing break in vCPU unpark logic

 target/arm/tcg/helper-sme.h    |  2 +-
 accel/kvm/kvm-all.c            |  1 +
 hw/arm/mps2-tz.c               |  6 +++---
 target/arm/tcg/sme_helper.c    | 39 +++++++++++++++++++++++++++------------
 target/arm/tcg/translate-sme.c | 25 +++++++++++++++++++++++--
 target/xtensa/exc_helper.c     |  2 +-
 6 files changed, 56 insertions(+), 19 deletions(-)

From: Marco Palumbi <Marco.Palumbi@tii.ae>

The order of the RX and TX interrupts is swapped.
This commit fixes the order as per the following documents:
 * https://developer.arm.com/documentation/dai0505/latest/
 * https://developer.arm.com/documentation/dai0521/latest/
 * https://developer.arm.com/documentation/dai0524/latest/
 * https://developer.arm.com/documentation/dai0547/latest/

Cc: qemu-stable@nongnu.org
Signed-off-by: Marco Palumbi <Marco.Palumbi@tii.ae>
Message-id: 20240730073123.72992-1-marco@palumbi.it
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/mps2-tz.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
                                const char *name, hwaddr size,
                                const int *irqs, const PPCExtraData *extradata)
 {
-    /* The irq[] array is tx, rx, combined, in that order */
+    /* The irq[] array is rx, tx, combined, in that order */
     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
     CMSDKAPBUART *uart = opaque;
     int i = uart - &mms->uart[0];
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_uart(MPS2TZMachineState *mms, void *opaque,
     qdev_prop_set_uint32(DEVICE(uart), "pclk-frq", mmc->apb_periph_frq);
     sysbus_realize(SYS_BUS_DEVICE(uart), &error_fatal);
     s = SYS_BUS_DEVICE(uart);
-    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[0]));
-    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, irqs[1]));
+    sysbus_connect_irq(s, 0, get_sse_irq_in(mms, irqs[1]));
+    sysbus_connect_irq(s, 1, get_sse_irq_in(mms, irqs[0]));
     sysbus_connect_irq(s, 2, qdev_get_gpio_in(orgate_dev, i * 2));
     sysbus_connect_irq(s, 3, qdev_get_gpio_in(orgate_dev, i * 2 + 1));
     sysbus_connect_irq(s, 4, get_sse_irq_in(mms, irqs[2]));
-- 
2.34.1

From: Salil Mehta <salil.mehta@huawei.com>

The loop should exit as soon as it finds the parked vCPU (struct
KVMParkedVcpu) in the 'kvm_parked_vcpus' list of parked vCPUs that
'struct KVMState' maintains.
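
As a rough illustration of why the early exit matters, here is a minimal
sketch that uses a hand-rolled singly linked list instead of QEMU's QLIST
macros. The names ParkedNode, park() and unpark() are hypothetical and are
not the code in kvm-all.c; the point is only that once the matching node has
been freed, the loop must not advance through it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parked-vCPU list entry. */
typedef struct ParkedNode {
    unsigned long vcpu_id;
    int fd;
    struct ParkedNode *next;
} ParkedNode;

/* Push a new entry on the front of the list; returns the new head. */
static ParkedNode *park(ParkedNode *head, unsigned long vcpu_id, int fd)
{
    ParkedNode *n = malloc(sizeof(*n));

    assert(n != NULL);
    n->vcpu_id = vcpu_id;
    n->fd = fd;
    n->next = head;
    return n;
}

/*
 * Remove the entry for vcpu_id and return its fd, or -1 if it is absent.
 * The break is essential: without it the loop increment would read
 * cur->next from memory that has just been freed.
 */
static int unpark(ParkedNode **head, unsigned long vcpu_id)
{
    int fd = -1;
    ParkedNode **link = head;
    ParkedNode *cur;

    for (cur = *head; cur != NULL; link = &cur->next, cur = cur->next) {
        if (cur->vcpu_id == vcpu_id) {
            *link = cur->next;   /* unlink the node */
            fd = cur->fd;
            free(cur);
            break;               /* cur is no longer valid */
        }
    }
    return fd;
}
```

Looking up the same id a second time returns -1, since the entry has
already been removed from the list.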

Fixes: Coverity CID 1558552
Fixes: 08c3286822 ("accel/kvm: Extract common KVM vCPU {creation,parking} code")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20240725145132.99355-1-salil.mehta@huawei.com
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Message-ID: <CAFEAcA-3_d1c7XSXWkFubD-LsW5c5i95e6xxV09r2C9yGtzcdA@mail.gmail.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 accel/kvm/kvm-all.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -XXX,XX +XXX,XX @@ int kvm_unpark_vcpu(KVMState *s, unsigned long vcpu_id)
             QLIST_REMOVE(cpu, node);
             kvm_fd = cpu->kvm_fd;
             g_free(cpu);
+            break;
         }
     }
 
-- 
2.34.1

The FMOPA (widening) SME instruction takes pairs of half-precision
floating point values, widens them to single-precision, does a
two-way dot product and accumulates the results into a
single-precision destination.  We don't quite correctly handle the
FPCR bits FZ and FZ16 which control flushing of denormal inputs and
outputs.  This is because at the moment we pass a single float_status
value to the helper function, which then uses that configuration for
all the fp operations it does.  However, because the inputs to this
operation are float16 and the outputs are float32 we need to use the
fp_status_f16 for the float16 input widening but the normal fp_status
for everything else.  Otherwise we will apply the flushing control
FPCR.FZ16 to the 32-bit output rather than the FPCR.FZ control, and
incorrectly flush a denormal output to zero when we should not (or
vice-versa).

(In commit 207d30b5fdb5b we tried to fix the FZ handling but
didn't get it right, switching from "use FPCR.FZ for everything" to
"use FPCR.FZ16 for everything".)

Pass the CPU env to the sme_fmopa_h helper instead of an fp_status
pointer, and have the helper pass an extra fp_status into the
f16_dotadd() function so that we can use the right status for the
right parts of this operation.

Cc: qemu-stable@nongnu.org
Fixes: 207d30b5fdb5 ("target/arm: Use FPST_F16 for SME FMOPA (widening)")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2373
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-sme.h    |  2 +-
 target/arm/tcg/sme_helper.c    | 39 +++++++++++++++++++++++-----------
 target/arm/tcg/translate-sme.c | 25 ++++++++++++++++++++--
 3 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/helper-sme.h
+++ b/target/arm/tcg/helper-sme.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG,
-                   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+                   void, ptr, ptr, ptr, ptr, ptr, env, i32)
 DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG,
                    void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -XXX,XX +XXX,XX @@ static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg)
 }
 
 static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2,
-                          float_status *s_std, float_status *s_odd)
+                          float_status *s_f16, float_status *s_std,
+                          float_status *s_odd)
 {
-    float64 e1r = float16_to_float64(e1 & 0xffff, true, s_std);
-    float64 e1c = float16_to_float64(e1 >> 16, true, s_std);
-    float64 e2r = float16_to_float64(e2 & 0xffff, true, s_std);
-    float64 e2c = float16_to_float64(e2 >> 16, true, s_std);
+    /*
+     * We need three different float_status for different parts of this
+     * operation:
+     *  - the input conversion of the float16 values must use the
+     *    f16-specific float_status, so that the FPCR.FZ16 control is applied
+     *  - operations on float32 including the final accumulation must use
+     *    the normal float_status, so that FPCR.FZ is applied
+     *  - we have pre-set-up copy of s_std which is set to round-to-odd,
+     *    for the multiply (see below)
+     */
+    float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16);
+    float64 e1c = float16_to_float64(e1 >> 16, true, s_f16);
+    float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16);
+    float64 e2c = float16_to_float64(e2 >> 16, true, s_f16);
     float64 t64;
     float32 t32;
 
@@ -XXX,XX +XXX,XX @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2,
 }
 
 void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
-                         void *vpm, void *vst, uint32_t desc)
+                         void *vpm, CPUARMState *env, uint32_t desc)
 {
     intptr_t row, col, oprsz = simd_maxsz(desc);
     uint32_t neg = simd_data(desc) * 0x80008000u;
     uint16_t *pn = vpn, *pm = vpm;
-    float_status fpst_odd, fpst_std;
+    float_status fpst_odd, fpst_std, fpst_f16;
 
     /*
-     * Make a copy of float_status because this operation does not
-     * update the cumulative fp exception status.  It also produces
-     * default nans.  Make a second copy with round-to-odd -- see above.
+     * Make copies of fp_status and fp_status_f16, because this operation
+     * does not update the cumulative fp exception status.  It also
+     * produces default NaNs. We also need a second copy of fp_status with
+     * round-to-odd -- see above.
      */
-    fpst_std = *(float_status *)vst;
+    fpst_f16 = env->vfp.fp_status_f16;
+    fpst_std = env->vfp.fp_status;
     set_default_nan_mode(true, &fpst_std);
+    set_default_nan_mode(true, &fpst_f16);
     fpst_odd = fpst_std;
     set_float_rounding_mode(float_round_to_odd, &fpst_odd);
 
@@ -XXX,XX +XXX,XX @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
                         uint32_t m = *(uint32_t *)(vzm + H1_4(col));
 
                         m = f16mop_adj_pair(m, pcol, 0);
-                        *a = f16_dotadd(*a, n, m, &fpst_std, &fpst_odd);
+                        *a = f16_dotadd(*a, n, m,
+                                        &fpst_f16, &fpst_std, &fpst_odd);
                     }
                     col += 4;
                     pcol >>= 4;
diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -XXX,XX +XXX,XX @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
     return true;
 }
 
-TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a,
-           MO_32, FPST_FPCR_F16, gen_helper_sme_fmopa_h)
+static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz,
+                           gen_helper_gvec_5_ptr *fn)
+{
+    int svl = streaming_vec_reg_size(s);
+    uint32_t desc = simd_desc(svl, svl, a->sub);
+    TCGv_ptr za, zn, zm, pn, pm;
+
+    if (!sme_smza_enabled_check(s)) {
+        return true;
+    }
+
+    za = get_tile(s, esz, a->zad);
+    zn = vec_full_reg_ptr(s, a->zn);
+    zm = vec_full_reg_ptr(s, a->zm);
+    pn = pred_full_reg_ptr(s, a->pn);
+    pm = pred_full_reg_ptr(s, a->pm);
+
+    fn(za, zn, zm, pn, pm, tcg_env, tcg_constant_i32(desc));
+    return true;
+}
+
+TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a,
+           MO_32, gen_helper_sme_fmopa_h)
 TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a,
            MO_32, FPST_FPCR, gen_helper_sme_fmopa_s)
 TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a,
-- 
2.34.1

In commit ad18376b90c8101 we added an assert that the level value was
in-bounds for the array we're about to index into.  However, the
assert condition is wrong -- env->config->interrupt_vector is an
array of uint32_t, so we should bounds check the index against
ARRAY_SIZE(...), not against sizeof().
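
The difference between the two bounds can be sketched as follows. This is
an illustration only: the Config type, the NUM_LEVELS element count and the
local ARRAY_SIZE definition are assumptions made for the example, not the
xtensa definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for an ARRAY_SIZE() macro. */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

#define NUM_LEVELS 7   /* hypothetical element count */

typedef struct {
    uint32_t interrupt_vector[NUM_LEVELS];
} Config;

/* Bound used by the old assert: the array size in bytes. */
static size_t bound_in_bytes(const Config *c)
{
    return sizeof(c->interrupt_vector);
}

/* Bound used by the corrected assert: the number of elements. */
static size_t bound_in_elements(const Config *c)
{
    return ARRAY_SIZE(c->interrupt_vector);
}

/* Bounds-checked lookup in the style of the corrected assert. */
static uint32_t vector_for_level(const Config *c, unsigned level)
{
    assert(level < ARRAY_SIZE(c->interrupt_vector));
    return c->interrupt_vector[level];
}
```

With 4-byte elements the byte count is four times the element count, so
a check against sizeof() would accept indexes well past the end of the
array.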

Resolves: Coverity CID 1507131
Fixes: ad18376b90c8101 ("target/xtensa: Assert that interrupt level is within bounds")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240731172246.3682311-1-peter.maydell@linaro.org
---
 target/xtensa/exc_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/xtensa/exc_helper.c b/target/xtensa/exc_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/xtensa/exc_helper.c
+++ b/target/xtensa/exc_helper.c
@@ -XXX,XX +XXX,XX @@ static void handle_interrupt(CPUXtensaState *env)
 
         if (level > 1) {
             /* env->config->nlevel check should have ensured this */
-            assert(level < sizeof(env->config->interrupt_vector));
+            assert(level < ARRAY_SIZE(env->config->interrupt_vector));
 
             env->sregs[EPC1 + level - 1] = env->pc;
             env->sregs[EPS2 + level - 2] = env->sregs[PS];
-- 
2.34.1

The following changes since commit 3214bec13d8d4c40f707d21d8350d04e4123ae97:

Merge tag 'migration-20250110-pull-request' of https://gitlab.com/farosas/qemu into staging (2025-01-10 13:39:19 -0500)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20250113

for you to fetch changes up to 435d260e7ec5ff9c79e3e62f1d66ec82d2d691ae:

docs/system/arm/virt: mention specific migration information (2025-01-13 12:35:35 +0000)

----------------------------------------------------------------
target-arm queue:
 * hw/arm_sysctl: fix extracting 31th bit of val
 * hw/misc: cast rpm to uint64_t
 * tests/qtest/boot-serial-test: Improve ASM
 * target/arm: Move minor arithmetic helpers out of helper.c
 * target/arm: change default pauth algorithm to impdef

----------------------------------------------------------------
Anastasia Belova (1):
      hw/arm_sysctl: fix extracting 31th bit of val

Peter Maydell (2):
      target/arm: Move minor arithmetic helpers out of helper.c
      tests/tcg/aarch64: force qarma5 for pauth-3 test

Philippe Mathieu-Daudé (4):
      tests/qtest/boot-serial-test: Improve ASM comments of PL011 tests
      tests/qtest/boot-serial-test: Reduce for() loop in PL011 tests
      tests/qtest/boot-serial-test: Reorder pair of instructions in PL011 test
      tests/qtest/boot-serial-test: Initialize PL011 Control register

Pierrick Bouvier (3):
      target/arm: add new property to select pauth-qarma5
      target/arm: change default pauth algorithm to impdef
      docs/system/arm/virt: mention specific migration information

Tigran Sogomonian (1):
      hw/misc: cast rpm to uint64_t

 docs/system/arm/cpu-features.rst                |   7 +-
 docs/system/arm/virt.rst                        |   4 +
 docs/system/introduction.rst                    |   2 +-
 target/arm/cpu.h                                |   4 +
 hw/core/machine.c                               |   4 +-
 hw/misc/arm_sysctl.c                            |   2 +-
 hw/misc/npcm7xx_mft.c                           |   5 +-
 target/arm/arm-qmp-cmds.c                       |   2 +-
 target/arm/cpu.c                                |   2 +
 target/arm/cpu64.c                              |  38 ++-
 target/arm/helper.c                             | 285 -----------------------
 target/arm/tcg/arith_helper.c                   | 296 ++++++++++++++++++++++++
 tests/qtest/arm-cpu-features.c                  |  15 +-
 tests/qtest/boot-serial-test.c                  |  23 +-
 target/arm/{op_addsub.h => tcg/op_addsub.c.inc} |   0
 target/arm/tcg/meson.build                      |   1 +
 tests/tcg/aarch64/Makefile.softmmu-target       |   3 +
 17 files changed, 377 insertions(+), 316 deletions(-)
 create mode 100644 target/arm/tcg/arith_helper.c
 rename target/arm/{op_addsub.h => tcg/op_addsub.c.inc} (100%)

From: Anastasia Belova <abelova@astralinux.ru>

1 << 31 is converted to uint64_t when it is bitwise ANDed with val,
so the mask may become 0xffffffff80000000, although only bit 31
(the "start" bit) is required.

A wrong result is not possible in practice because the MemoryRegionOps
uses the default max access size of 4 bytes and so none
of the upper bytes of val will be set, but the bitfield
extract API is clearer anyway.

Use the bitfield extract() API instead.
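
The sign-extension effect can be sketched in isolation. In this
illustration start_mask is written as INT32_MIN, on the assumption that
this is the value the int expression (1 << 31) takes when signed shifts
wrap; extract64_sketch() is a local stand-in modelled on the
extract64(value, start, length) call in the patch, not QEMU's own
implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed value of the int expression (1 << 31) when signed shifts wrap.
 * Written as INT32_MIN so the sketch itself has no undefined behaviour.
 */
static const int32_t start_mask = INT32_MIN;

/* Old-style test: the int mask is converted to uint64_t before the AND. */
static uint64_t masked_with_int(uint64_t val)
{
    return val & (uint64_t)start_mask;
}

/* Local stand-in: return 'length' bits of 'value' starting at bit 'start'. */
static uint64_t extract64_sketch(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}
```

A value with only bit 32 set makes the masked test non-zero, while the
extract-style test looks at bit 31 alone.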

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Anastasia Belova <abelova@astralinux.ru>
Message-id: 20241220125429.7552-1-abelova@astralinux.ru
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: add clarification to commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/arm_sysctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/misc/arm_sysctl.c b/hw/misc/arm_sysctl.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/arm_sysctl.c
+++ b/hw/misc/arm_sysctl.c
@@ -XXX,XX +XXX,XX @@ static void arm_sysctl_write(void *opaque, hwaddr offset,
          * as zero.
          */
         s->sys_cfgctrl = val & ~((3 << 18) | (1 << 31));
-        if (val & (1 << 31)) {
+        if (extract64(val, 31, 1)) {
             /* Start bit set -- actually do something */
             unsigned int dcc = extract32(s->sys_cfgctrl, 26, 4);
             unsigned int function = extract32(s->sys_cfgctrl, 20, 6);
-- 
2.34.1

From: Tigran Sogomonian <tsogomonian@astralinux.ru>

The arithmetic expression
'rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION' is subject
to overflow because its operands are not cast to
a larger data type before the multiplication. Cast rpm
to uint64_t to avoid this.
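
A small sketch of the overflow, under stated assumptions: rpm is taken to
be a 32-bit unsigned value and PULSES_PER_REV is a hypothetical constant of
2; neither is copied from npcm7xx_mft.c. If the real operands were signed,
the narrow multiply would be undefined rather than wrapping.

```c
#include <assert.h>
#include <stdint.h>

#define PULSES_PER_REV 2u                         /* hypothetical value */
#define NS_PER_MINUTE  (60ULL * 1000000000ULL)

/* Product computed in 32 bits: wraps modulo 2^32 for a large rpm. */
static uint32_t pulses_narrow(uint32_t rpm)
{
    return rpm * PULSES_PER_REV;
}

/* Product computed in 64 bits: the cast widens rpm before the multiply. */
static uint64_t pulses_wide(uint32_t rpm)
{
    return (uint64_t)rpm * PULSES_PER_REV;
}

/* Nanoseconds per pulse, using the widened product as the divisor. */
static uint64_t ns_per_pulse(uint32_t rpm)
{
    assert(rpm != 0);
    return NS_PER_MINUTE / pulses_wide(rpm);
}
```

For rpm = 0x80000000 the narrow product wraps to 0, which would then be
used as a divisor, whereas the widened product is 0x100000000.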

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Tigran Sogomonian <tsogomonian@astralinux.ru>
Reviewed-by: Patrick Leis <venture@google.com>
Reviewed-by: Hao Wu <wuhaotsh@google.com>
Message-id: 20241226130311.1349-1-tsogomonian@astralinux.ru
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/npcm7xx_mft.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/misc/npcm7xx_mft.c b/hw/misc/npcm7xx_mft.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/npcm7xx_mft.c
+++ b/hw/misc/npcm7xx_mft.c
@@ -XXX,XX +XXX,XX @@ static NPCM7xxMFTCaptureState npcm7xx_mft_compute_cnt(
          * RPM = revolution/min. The time for one revlution (in ns) is
          * MINUTE_TO_NANOSECOND / RPM.
          */
-        count = clock_ns_to_ticks(clock, (60 * NANOSECONDS_PER_SECOND) /
-            (rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION));
+        count = clock_ns_to_ticks(clock,
+            (uint64_t)(60 * NANOSECONDS_PER_SECOND) /
+            ((uint64_t)rpm * NPCM7XX_MFT_PULSE_PER_REVOLUTION));
     }
 
     if (count > NPCM7XX_MFT_MAX_CNT) {
-- 
2.34.1

From: Philippe Mathieu-Daudé <philmd@linaro.org>

Re-indent ASM comments adding the 'loop:' label.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/boot-serial-test.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
 };
 
 static const uint8_t bios_raspi2[] = {
-    0x08, 0x30, 0x9f, 0xe5,                 /* ldr   r3,[pc,#8]    Get base */
-    0x54, 0x20, 0xa0, 0xe3,                 /* mov     r2,#'T' */
-    0x00, 0x20, 0xc3, 0xe5,                 /* strb    r2,[r3] */
-    0xfb, 0xff, 0xff, 0xea,                 /* b       loop */
-    0x00, 0x10, 0x20, 0x3f,                 /* 0x3f201000 = UART0 base addr */
+    0x08, 0x30, 0x9f, 0xe5,                 /* loop:  ldr     r3, [pc, #8]   Get &UART0 */
+    0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
+    0x00, 0x20, 0xc3, 0xe5,                 /*        strb    r2, [r3]       *TXDAT = 'T' */
+    0xfb, 0xff, 0xff, 0xea,                 /*        b       -12            (loop) */
+    0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
 };
 
 static const uint8_t kernel_aarch64[] = {
-    0x81, 0x0a, 0x80, 0x52,                 /* mov     w1, #0x54 */
-    0x02, 0x20, 0xa1, 0xd2,                 /* mov     x2, #0x9000000 */
-    0x41, 0x00, 0x00, 0x39,                 /* strb    w1, [x2] */
-    0xfd, 0xff, 0xff, 0x17,                 /* b       -12 (loop) */
+    0x81, 0x0a, 0x80, 0x52,                 /* loop:  mov    w1, #'T' */
+    0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
+    0x41, 0x00, 0x00, 0x39,                 /*        strb   w1, [x2]        *TXDAT = 'T' */
+    0xfd, 0xff, 0xff, 0x17,                 /*        b      -12             (loop) */
 };
 
 static const uint8_t kernel_nrf51[] = {
-- 
2.34.1

From: Philippe Mathieu-Daudé <philmd@linaro.org>

Since registers are not modified, we don't need
to refill their values. Directly jump to the previous
store instruction to keep filling the TXDAT register.

The equivalent C code remains:

  while (true) {
      *UART_DATA = 'T';
  }

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/boot-serial-test.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
 };
 
 static const uint8_t bios_raspi2[] = {
-    0x08, 0x30, 0x9f, 0xe5,                 /* loop:  ldr     r3, [pc, #8]   Get &UART0 */
+    0x08, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #8]   Get &UART0 */
     0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
-    0x00, 0x20, 0xc3, 0xe5,                 /*        strb    r2, [r3]       *TXDAT = 'T' */
-    0xfb, 0xff, 0xff, 0xea,                 /*        b       -12            (loop) */
+    0x00, 0x20, 0xc3, 0xe5,                 /* loop:  strb    r2, [r3]       *TXDAT = 'T' */
+    0xff, 0xff, 0xff, 0xea,                 /*        b       -4             (loop) */
     0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
 };
 
 static const uint8_t kernel_aarch64[] = {
-    0x81, 0x0a, 0x80, 0x52,                 /* loop:  mov    w1, #'T' */
+    0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
     0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
-    0x41, 0x00, 0x00, 0x39,                 /*        strb   w1, [x2]        *TXDAT = 'T' */
-    0xfd, 0xff, 0xff, 0x17,                 /*        b      -12             (loop) */
+    0x41, 0x00, 0x00, 0x39,                 /* loop:  strb   w1, [x2]        *TXDAT = 'T' */
+    0xff, 0xff, 0xff, 0x17,                 /*        b      -4              (loop) */
 };
 
 static const uint8_t kernel_nrf51[] = {
-- 
2.34.1

From: Philippe Mathieu-Daudé <philmd@linaro.org>

In the next commit we are going to use a different value
for the $w1 register, maintaining the same $x2 value. In
order to keep the next commit trivial to review, set $x2
before $w1.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/boot-serial-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

From: Philippe Mathieu-Daudé <philmd@linaro.org>

The tests using the PL011 UART of the virt and raspi machines
weren't properly enabling the UART and its transmitter before
sending characters. Follow the initialization sequence recommended
by the PL011 manual by setting the proper bits of the control register.

Update the ASM code by prefixing:

  *UART_CTRL = UART_ENABLE | TX_ENABLE;

to:

  while (true) {
      *UART_DATA = 'T';
  }

Note, since commit 51b61dd4d56 ("hw/char/pl011: Warn when using
disabled transmitter") incomplete PL011 initialization can be
logged using the '-d guest_errors' command line option.
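
The register write can be sketched in C against a fake register window.
The offsets and bit positions follow the PL011 register layout (UARTCR at
0x30, UARTEN in bit 0, TXE in bit 8), which matches the 0x101 constant and
the #48 offset in the payloads; FakeUart, write8(), write16() and
uart_init_and_send() are hypothetical helpers for this illustration, and
the 16-bit store is assumed to be little-endian.

```c
#include <assert.h>
#include <stdint.h>

/* PL011 register offsets and control register bits. */
#define UARTDR_OFFSET   0x00u
#define UARTCR_OFFSET   0x30u          /* 48 decimal */
#define UARTCR_UARTEN   (1u << 0)
#define UARTCR_TXE      (1u << 8)

#define REG_SPACE_SIZE  0x1000u        /* hypothetical window size */

/* Hypothetical in-memory stand-in for the device's register window. */
typedef struct {
    uint8_t regs[REG_SPACE_SIZE];
} FakeUart;

/* Byte store, like the strb in the test payloads. */
static void write8(FakeUart *u, uint32_t offset, uint8_t v)
{
    assert(offset < REG_SPACE_SIZE);
    u->regs[offset] = v;
}

/* Little-endian halfword store, like the strh in the test payloads. */
static void write16(FakeUart *u, uint32_t offset, uint16_t v)
{
    assert(offset + 1 < REG_SPACE_SIZE);
    u->regs[offset] = v & 0xff;
    u->regs[offset + 1] = v >> 8;
}

/* Enable the UART and its transmitter, then send a single character. */
static void uart_init_and_send(FakeUart *u, char c)
{
    write16(u, UARTCR_OFFSET, UARTCR_UARTEN | UARTCR_TXE);
    write8(u, UARTDR_OFFSET, (uint8_t)c);
}
```

Unlike the test payloads, the sketch sends one character and returns
instead of looping forever.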

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/boot-serial-test.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/boot-serial-test.c
+++ b/tests/qtest/boot-serial-test.c
@@ -XXX,XX +XXX,XX @@ static const uint8_t kernel_plml605[] = {
 };
 
 static const uint8_t bios_raspi2[] = {
-    0x08, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #8]   Get &UART0 */
+    0x10, 0x30, 0x9f, 0xe5,                 /*        ldr     r3, [pc, #16]  Get &UART0 */
+    0x10, 0x20, 0x9f, 0xe5,                 /*        ldr     r2, [pc, #16]  Get &CR */
+    0xb0, 0x23, 0xc3, 0xe1,                 /*        strh    r2, [r3, #48]  Set CR */
     0x54, 0x20, 0xa0, 0xe3,                 /*        mov     r2, #'T' */
     0x00, 0x20, 0xc3, 0xe5,                 /* loop:  strb    r2, [r3]       *TXDAT = 'T' */
     0xff, 0xff, 0xff, 0xea,                 /*        b       -4             (loop) */
     0x00, 0x10, 0x20, 0x3f,                 /* UART0: 0x3f201000 */
+    0x01, 0x01, 0x00, 0x00,                 /* CR:    0x101 = UARTEN|TXE */
 };
 
 static const uint8_t kernel_aarch64[] = {
     0x02, 0x20, 0xa1, 0xd2,                 /*        mov    x2, #0x9000000  Load UART0 */
+    0x21, 0x20, 0x80, 0x52,                 /*        mov    w1, 0x101       CR = UARTEN|TXE */
+    0x41, 0x60, 0x00, 0x79,                 /*        strh   w1, [x2, #48]   Set CR */
     0x81, 0x0a, 0x80, 0x52,                 /*        mov    w1, #'T' */
     0x41, 0x00, 0x00, 0x39,                 /* loop:  strb   w1, [x2]        *TXDAT = 'T' */
     0xff, 0xff, 0xff, 0x17,                 /*        b      -4              (loop) */
-- 
2.34.1

helper.c includes some small TCG helper functions used for mostly
arithmetic instructions.  These are TCG only and there's no need for
them to be in the large and unwieldy helper.c.  Move them out to
their own source file in the tcg/ subdirectory, together with the
op_addsub.h multiply-included template header that they use.

Since we are moving op_addsub.h, we take the opportunity to
give it a name which matches our convention for files which
are not true header files but which are #included from other
C files: op_addsub.c.inc.

(Ironically, this means that helper.c no longer contains
any TCG helper function definitions at all.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20250110131211.2546314-1-peter.maydell@linaro.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---
 target/arm/helper.c                           | 285 -----------------
 target/arm/tcg/arith_helper.c                 | 296 ++++++++++++++++++
 .../arm/{op_addsub.h => tcg/op_addsub.c.inc}  |   0
 target/arm/tcg/meson.build                    |   1 +
 4 files changed, 297 insertions(+), 285 deletions(-)
 create mode 100644 target/arm/tcg/arith_helper.c
 rename target/arm/{op_addsub.h => tcg/op_addsub.c.inc} (100%)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/main-loop.h"
 #include "qemu/timer.h"
 #include "qemu/bitops.h"
-#include "qemu/crc32c.h"
 #include "qemu/qemu-print.h"
 #include "exec/exec-all.h"
 #include "exec/translation-block.h"
-#include <zlib.h> /* for crc32 */
 #include "hw/irq.h"
 #include "system/cpu-timers.h"
 #include "system/kvm.h"
@@ -XXX,XX +XXX,XX @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
     };
 }
 
-/*
- * Note that signed overflow is undefined in C.  The following routines are
- * careful to use unsigned types where modulo arithmetic is required.
- * Failure to do so _will_ break on newer gcc.
- */
-
-/* Signed saturating arithmetic.  */
-
-/* Perform 16-bit signed saturating addition.  */
-static inline uint16_t add16_sat(uint16_t a, uint16_t b)
-{
-    uint16_t res;
-
-    res = a + b;
-    if (((res ^ a) & 0x8000) && !((a ^ b) & 0x8000)) {
-        if (a & 0x8000) {
-            res = 0x8000;
-        } else {
-            res = 0x7fff;
-        }
-    }
-    return res;
-}
-
-/* Perform 8-bit signed saturating addition.  */
-static inline uint8_t add8_sat(uint8_t a, uint8_t b)
-{
-    uint8_t res;
-
-    res = a + b;
-    if (((res ^ a) & 0x80) && !((a ^ b) & 0x80)) {
-        if (a & 0x80) {
-            res = 0x80;
-        } else {
-            res = 0x7f;
-        }
-    }
-    return res;
-}
-
-/* Perform 16-bit signed saturating subtraction.  */
-static inline uint16_t sub16_sat(uint16_t a, uint16_t b)
-{
-    uint16_t res;
-
-    res = a - b;
-    if (((res ^ a) & 0x8000) && ((a ^ b) & 0x8000)) {
-        if (a & 0x8000) {
-            res = 0x8000;
-        } else {
-            res = 0x7fff;
-        }
-    }
-    return res;
-}
-
-/* Perform 8-bit signed saturating subtraction.  */
-static inline uint8_t sub8_sat(uint8_t a, uint8_t b)
-{
-    uint8_t res;
-
-    res = a - b;
-    if (((res ^ a) & 0x80) && ((a ^ b) & 0x80)) {
-        if (a & 0x80) {
-            res = 0x80;
-        } else {
-            res = 0x7f;
-        }
-    }
-    return res;
-}
-
-#define ADD16(a, b, n) RESULT(add16_sat(a, b), n, 16);
-#define SUB16(a, b, n) RESULT(sub16_sat(a, b), n, 16);
-#define ADD8(a, b, n)  RESULT(add8_sat(a, b), n, 8);
-#define SUB8(a, b, n)  RESULT(sub8_sat(a, b), n, 8);
-#define PFX q
-
-#include "op_addsub.h"
-
-/* Unsigned saturating arithmetic.  */
-static inline uint16_t add16_usat(uint16_t a, uint16_t b)
-{
-    uint16_t res;
-    res = a + b;
-    if (res < a) {
-        res = 0xffff;
-    }
-    return res;
-}
-
-static inline uint16_t sub16_usat(uint16_t a, uint16_t b)
-{
-    if (a > b) {
-        return a - b;
-    } else {
-        return 0;
-    }
-}
-
-static inline uint8_t add8_usat(uint8_t a, uint8_t b)
-{
-    uint8_t res;
-    res = a + b;
-    if (res < a) {
-        res = 0xff;
-    }
-    return res;
-}
-
-static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
-{
-    if (a > b) {
-        return a - b;
-    } else {
-        return 0;
-    }
-}
-
-#define ADD16(a, b, n) RESULT(add16_usat(a, b), n, 16);
-#define SUB16(a, b, n) RESULT(sub16_usat(a, b), n, 16);
-#define ADD8(a, b, n)  RESULT(add8_usat(a, b), n, 8);
-#define SUB8(a, b, n)  RESULT(sub8_usat(a, b), n, 8);
-#define PFX uq
-
-#include "op_addsub.h"
-
-/* Signed modulo arithmetic.  */
-#define SARITH16(a, b, n, op) do { \
-    int32_t sum; \
-    sum = (int32_t)(int16_t)(a) op (int32_t)(int16_t)(b); \
-    RESULT(sum, n, 16); \
-    if (sum >= 0) \
-        ge |= 3 << (n * 2); \
-    } while (0)
-
-#define SARITH8(a, b, n, op) do { \
-    int32_t sum; \
-    sum = (int32_t)(int8_t)(a) op (int32_t)(int8_t)(b); \
-    RESULT(sum, n, 8); \
-    if (sum >= 0) \
-        ge |= 1 << n; \
-    } while (0)
-
-
-#define ADD16(a, b, n) SARITH16(a, b, n, +)
-#define SUB16(a, b, n) SARITH16(a, b, n, -)
-#define ADD8(a, b, n)  SARITH8(a, b, n, +)
-#define SUB8(a, b, n)  SARITH8(a, b, n, -)
-#define PFX s
-#define ARITH_GE
-
-#include "op_addsub.h"
-
-/* Unsigned modulo arithmetic.  */
-#define ADD16(a, b, n) do { \
-    uint32_t sum; \
-    sum = (uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b); \
-    RESULT(sum, n, 16); \
-    if ((sum >> 16) == 1) \
-        ge |= 3 << (n * 2); \
-    } while (0)
-
-#define ADD8(a, b, n) do { \
-    uint32_t sum; \
-    sum = (uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b); \
-    RESULT(sum, n, 8); \
-    if ((sum >> 8) == 1) \
-        ge |= 1 << n; \
-    } while (0)
-
-#define SUB16(a, b, n) do { \
-    uint32_t sum; \
-    sum = (uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b); \
-    RESULT(sum, n, 16); \
-    if ((sum >> 16) == 0) \
-        ge |= 3 << (n * 2); \
-    } while (0)
-
-#define SUB8(a, b, n) do { \
-    uint32_t sum; \
-    sum = (uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b); \
-    RESULT(sum, n, 8); \
-    if ((sum >> 8) == 0) \
-        ge |= 1 << n; \
-    } while (0)
-
-#define PFX u
-#define ARITH_GE
-
-#include "op_addsub.h"
-
-/* Halved signed arithmetic.  */
-#define ADD16(a, b, n) \
-  RESULT(((int32_t)(int16_t)(a) + (int32_t)(int16_t)(b)) >> 1, n, 16)
-#define SUB16(a, b, n) \
-  RESULT(((int32_t)(int16_t)(a) - (int32_t)(int16_t)(b)) >> 1, n, 16)
-#define ADD8(a, b, n) \
-  RESULT(((int32_t)(int8_t)(a) + (int32_t)(int8_t)(b)) >> 1, n, 8)
-#define SUB8(a, b, n) \
-  RESULT(((int32_t)(int8_t)(a) - (int32_t)(int8_t)(b)) >> 1, n, 8)
-#define PFX sh
-
-#include "op_addsub.h"
-
-/* Halved unsigned arithmetic.  */
-#define ADD16(a, b, n) \
-  RESULT(((uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b)) >> 1, n, 16)
-#define SUB16(a, b, n) \
-  RESULT(((uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b)) >> 1, n, 16)
-#define ADD8(a, b, n) \
-  RESULT(((uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b)) >> 1, n, 8)
-#define SUB8(a, b, n) \
-  RESULT(((uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b)) >> 1, n, 8)
-#define PFX uh
-
-#include "op_addsub.h"
-
-static inline uint8_t do_usad(uint8_t a, uint8_t b)
-{
-    if (a > b) {
-        return a - b;
-    } else {
-        return b - a;
-    }
-}
-
-/* Unsigned sum of absolute byte differences.  */
-uint32_t HELPER(usad8)(uint32_t a, uint32_t b)
-{
-    uint32_t sum;
-    sum = do_usad(a, b);
-    sum += do_usad(a >> 8, b >> 8);
-    sum += do_usad(a >> 16, b >> 16);
-    sum += do_usad(a >> 24, b >> 24);
-    return sum;
-}
-
-/* For ARMv6 SEL instruction.  */
-uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
-{
-    uint32_t mask;
-
-    mask = 0;
-    if (flags & 1) {
-        mask |= 0xff;
-    }
-    if (flags & 2) {
-        mask |= 0xff00;
-    }
-    if (flags & 4) {
-        mask |= 0xff0000;
-    }
-    if (flags & 8) {
-        mask |= 0xff000000;
-    }
-    return (a & mask) | (b & ~mask);
-}
-
-/*
- * CRC helpers.
- * The upper bytes of val (above the number specified by 'bytes') must have
- * been zeroed out by the caller.
- */
-uint32_t HELPER(crc32)(uint32_t acc, uint32_t val, uint32_t bytes)
-{
-    uint8_t buf[4];
-
-    stl_le_p(buf, val);
-
-    /* zlib crc32 converts the accumulator and output to one's complement.  */
-    return crc32(acc ^ 0xffffffff, buf, bytes) ^ 0xffffffff;
-}
-
-uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
-{
-    uint8_t buf[4];
-
-    stl_le_p(buf, val);
-
-    /* Linux crc32c converts the output to one's complement.  */
-    return crc32c(acc, buf, bytes) ^ 0xffffffff;
-}
 
 /*
  * Return the exception level to which FP-disabled exceptions should
diff --git a/target/arm/tcg/arith_helper.c b/target/arm/tcg/arith_helper.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/tcg/arith_helper.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ARM generic helpers for various arithmetical operations.
+ *
+ * This code is licensed under the GNU GPL v2 or later.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/helper-proto.h"
+#include "qemu/crc32c.h"
+#include <zlib.h> /* for crc32 */
+
+/*
+ * Note that signed overflow is undefined in C.  The following routines are
+ * careful to use unsigned types where modulo arithmetic is required.
+ * Failure to do so _will_ break on newer gcc.
+ */
+
+/* Signed saturating arithmetic.  */
+
+/* Perform 16-bit signed saturating addition.  */
+static inline uint16_t add16_sat(uint16_t a, uint16_t b)
+{
+    uint16_t res;
+
+    res = a + b;
+    if (((res ^ a) & 0x8000) && !((a ^ b) & 0x8000)) {
+        if (a & 0x8000) {
+            res = 0x8000;
+        } else {
+            res = 0x7fff;
+        }
+    }
+    return res;
+}
+
+/* Perform 8-bit signed saturating addition.  */
+static inline uint8_t add8_sat(uint8_t a, uint8_t b)
+{
+    uint8_t res;
+
+    res = a + b;
+    if (((res ^ a) & 0x80) && !((a ^ b) & 0x80)) {
+        if (a & 0x80) {
+            res = 0x80;
+        } else {
+            res = 0x7f;
+        }
+    }
+    return res;
+}
+
+/* Perform 16-bit signed saturating subtraction.  */
+static inline uint16_t sub16_sat(uint16_t a, uint16_t b)
+{
+    uint16_t res;
+
+    res = a - b;
+    if (((res ^ a) & 0x8000) && ((a ^ b) & 0x8000)) {
+        if (a & 0x8000) {
+            res = 0x8000;
+        } else {
+            res = 0x7fff;
+        }
+    }
+    return res;
+}
+
+/* Perform 8-bit signed saturating subtraction.  */
+static inline uint8_t sub8_sat(uint8_t a, uint8_t b)
+{
+    uint8_t res;
+
+    res = a - b;
+    if (((res ^ a) & 0x80) && ((a ^ b) & 0x80)) {
+        if (a & 0x80) {
+            res = 0x80;
+        } else {
+            res = 0x7f;
+        }
+    }
+    return res;
+}
+
+#define ADD16(a, b, n) RESULT(add16_sat(a, b), n, 16);
+#define SUB16(a, b, n) RESULT(sub16_sat(a, b), n, 16);
+#define ADD8(a, b, n)  RESULT(add8_sat(a, b), n, 8);
+#define SUB8(a, b, n)  RESULT(sub8_sat(a, b), n, 8);
+#define PFX q
+
+#include "op_addsub.c.inc"
+
+/* Unsigned saturating arithmetic.  */
+static inline uint16_t add16_usat(uint16_t a, uint16_t b)
+{
+    uint16_t res;
+    res = a + b;
+    if (res < a) {
+        res = 0xffff;
+    }
+    return res;
+}
+
+static inline uint16_t sub16_usat(uint16_t a, uint16_t b)
+{
+    if (a > b) {
+        return a - b;
+    } else {
+        return 0;
+    }
+}
+
+static inline uint8_t add8_usat(uint8_t a, uint8_t b)
+{
+    uint8_t res;
+    res = a + b;
+    if (res < a) {
+        res = 0xff;
+    }
+    return res;
+}
+
+static inline uint8_t sub8_usat(uint8_t a, uint8_t b)
+{
+    if (a > b) {
+        return a - b;
+    } else {
+        return 0;
+    }
+}
+
+#define ADD16(a, b, n) RESULT(add16_usat(a, b), n, 16);
+#define SUB16(a, b, n) RESULT(sub16_usat(a, b), n, 16);
+#define ADD8(a, b, n)  RESULT(add8_usat(a, b), n, 8);
+#define SUB8(a, b, n)  RESULT(sub8_usat(a, b), n, 8);
+#define PFX uq
+
+#include "op_addsub.c.inc"
+
+/* Signed modulo arithmetic.  */
+#define SARITH16(a, b, n, op) do { \
+    int32_t sum; \
+    sum = (int32_t)(int16_t)(a) op (int32_t)(int16_t)(b); \
+    RESULT(sum, n, 16); \
+    if (sum >= 0) \
+        ge |= 3 << (n * 2); \
+    } while (0)
+
+#define SARITH8(a, b, n, op) do { \
+    int32_t sum; \
+    sum = (int32_t)(int8_t)(a) op (int32_t)(int8_t)(b); \
+    RESULT(sum, n, 8); \
+    if (sum >= 0) \
+        ge |= 1 << n; \
+    } while (0)
+
+
+#define ADD16(a, b, n) SARITH16(a, b, n, +)
+#define SUB16(a, b, n) SARITH16(a, b, n, -)
+#define ADD8(a, b, n)  SARITH8(a, b, n, +)
+#define SUB8(a, b, n)  SARITH8(a, b, n, -)
+#define PFX s
+#define ARITH_GE
+
+#include "op_addsub.c.inc"
+
+/* Unsigned modulo arithmetic.  */
+#define ADD16(a, b, n) do { \
+    uint32_t sum; \
+    sum = (uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b); \
+    RESULT(sum, n, 16); \
+    if ((sum >> 16) == 1) \
+        ge |= 3 << (n * 2); \
+    } while (0)
+
+#define ADD8(a, b, n) do { \
+    uint32_t sum; \
+    sum = (uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b); \
+    RESULT(sum, n, 8); \
+    if ((sum >> 8) == 1) \
+        ge |= 1 << n; \
+    } while (0)
+
+#define SUB16(a, b, n) do { \
+    uint32_t sum; \
+    sum = (uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b); \
+    RESULT(sum, n, 16); \
+    if ((sum >> 16) == 0) \
+        ge |= 3 << (n * 2); \
+    } while (0)
+
+#define SUB8(a, b, n) do { \
+    uint32_t sum; \
+    sum = (uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b); \
+    RESULT(sum, n, 8); \
+    if ((sum >> 8) == 0) \
+        ge |= 1 << n; \
+    } while (0)
+
+#define PFX u
+#define ARITH_GE
+
+#include "op_addsub.c.inc"
+
+/* Halved signed arithmetic.  */
+#define ADD16(a, b, n) \
+  RESULT(((int32_t)(int16_t)(a) + (int32_t)(int16_t)(b)) >> 1, n, 16)
+#define SUB16(a, b, n) \
+  RESULT(((int32_t)(int16_t)(a) - (int32_t)(int16_t)(b)) >> 1, n, 16)
+#define ADD8(a, b, n) \
+  RESULT(((int32_t)(int8_t)(a) + (int32_t)(int8_t)(b)) >> 1, n, 8)
+#define SUB8(a, b, n) \
+  RESULT(((int32_t)(int8_t)(a) - (int32_t)(int8_t)(b)) >> 1, n, 8)
+#define PFX sh
+
+#include "op_addsub.c.inc"
+
+/* Halved unsigned arithmetic.  */
+#define ADD16(a, b, n) \
+  RESULT(((uint32_t)(uint16_t)(a) + (uint32_t)(uint16_t)(b)) >> 1, n, 16)
+#define SUB16(a, b, n) \
+  RESULT(((uint32_t)(uint16_t)(a) - (uint32_t)(uint16_t)(b)) >> 1, n, 16)
+#define ADD8(a, b, n) \
+  RESULT(((uint32_t)(uint8_t)(a) + (uint32_t)(uint8_t)(b)) >> 1, n, 8)
+#define SUB8(a, b, n) \
+  RESULT(((uint32_t)(uint8_t)(a) - (uint32_t)(uint8_t)(b)) >> 1, n, 8)
+#define PFX uh
+
+#include "op_addsub.c.inc"
+
+static inline uint8_t do_usad(uint8_t a, uint8_t b)
+{
+    if (a > b) {
+        return a - b;
+    } else {
+        return b - a;
+    }
+}
+
+/* Unsigned sum of absolute byte differences.  */
+uint32_t HELPER(usad8)(uint32_t a, uint32_t b)
+{
+    uint32_t sum;
+    sum = do_usad(a, b);
+    sum += do_usad(a >> 8, b >> 8);
+    sum += do_usad(a >> 16, b >> 16);
+    sum += do_usad(a >> 24, b >> 24);
+    return sum;
+}
+
+/* For ARMv6 SEL instruction.  */
+uint32_t HELPER(sel_flags)(uint32_t flags, uint32_t a, uint32_t b)
+{
+    uint32_t mask;
+
+    mask = 0;
+    if (flags & 1) {
+        mask |= 0xff;
+    }
+    if (flags & 2) {
+        mask |= 0xff00;
+    }
+    if (flags & 4) {
+        mask |= 0xff0000;
+    }
+    if (flags & 8) {
+        mask |= 0xff000000;
+    }
+    return (a & mask) | (b & ~mask);
+}
+
+/*
+ * CRC helpers.
+ * The upper bytes of val (above the number specified by 'bytes') must have
+ * been zeroed out by the caller.
+ */
+uint32_t HELPER(crc32)(uint32_t acc, uint32_t val, uint32_t bytes)
+{
+    uint8_t buf[4];
+
+    stl_le_p(buf, val);
+
+    /* zlib crc32 converts the accumulator and output to one's complement.  */
+    return crc32(acc ^ 0xffffffff, buf, bytes) ^ 0xffffffff;
+}
+
+uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
+{
+    uint8_t buf[4];
+
+    stl_le_p(buf, val);
+
+    /* Linux crc32c converts the output to one's complement.  */
+    return crc32c(acc, buf, bytes) ^ 0xffffffff;
+}
diff --git a/target/arm/op_addsub.h b/target/arm/tcg/op_addsub.c.inc
similarity index 100%
rename from target/arm/op_addsub.h
rename to target/arm/tcg/op_addsub.c.inc
diff --git a/target/arm/tcg/meson.build b/target/arm/tcg/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/meson.build
+++ b/target/arm/tcg/meson.build
@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
   'tlb_helper.c',
   'vec_helper.c',
   'tlb-insns.c',
+  'arith_helper.c',
 ))
 
 arm_ss.add(when: 'TARGET_AARCH64', if_true: files(
-- 
2.34.1

From: Pierrick Bouvier <pierrick.bouvier@linaro.org>

Before changing the default pauth algorithm, we need to make sure the
current default (QARMA5) can still be selected explicitly.

$ qemu-system-aarch64 -cpu max,pauth-qarma5=on ...
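
With a third selector, at most one of pauth-impdef, pauth-qarma3 and
pauth-qarma5 may be enabled at a time. As a rough, standalone sketch of
that constraint (the function name is hypothetical and this is not the
code in arm_cpu_pauth_finalize(), which spells the check out pairwise):

```c
#include <stdbool.h>

/*
 * Hypothetical illustration only: returns true when more than one of
 * the three pauth algorithm selectors is set.
 */
static bool pauth_algo_conflict(bool impdef, bool qarma3, bool qarma5)
{
    /* Each bool promotes to 0 or 1, so the sum counts enabled flags. */
    return (impdef + qarma3 + qarma5) > 1;
}
```

Counting the flags gives the same answer as the three pairwise tests
and would not need another clause if a further algorithm were added.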

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241219183211.3493974-2-pierrick.bouvier@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/cpu-features.rst |  5 ++++-
 target/arm/cpu.h                 |  1 +
 target/arm/arm-qmp-cmds.c        |  2 +-
 target/arm/cpu64.c               | 20 ++++++++++++++------
 tests/qtest/arm-cpu-features.c   | 15 +++++++++++----
 5 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -XXX,XX +XXX,XX @@ Below is the list of TCG VCPU features and their descriptions.
 ``pauth-qarma3``
   When ``pauth`` is enabled, select the architected QARMA3 algorithm.
 
-Without either ``pauth-impdef`` or ``pauth-qarma3`` enabled,
+``pauth-qarma5``
+  When ``pauth`` is enabled, select the architected QARMA5 algorithm.
+
+Without ``pauth-impdef``, ``pauth-qarma3`` or ``pauth-qarma5`` enabled,
 the architected QARMA5 algorithm is used.  The architected QARMA5
 and QARMA3 algorithms have good cryptographic properties, but can
 be quite slow to emulate.  The impdef algorithm used by QEMU is
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ struct ArchCPU {
     bool prop_pauth;
     bool prop_pauth_impdef;
     bool prop_pauth_qarma3;
+    bool prop_pauth_qarma5;
     bool prop_lpa2;
 
     /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
diff --git a/target/arm/arm-qmp-cmds.c b/target/arm/arm-qmp-cmds.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/arm-qmp-cmds.c
+++ b/target/arm/arm-qmp-cmds.c
@@ -XXX,XX +XXX,XX @@ static const char *cpu_model_advertised_features[] = {
     "sve640", "sve768", "sve896", "sve1024", "sve1152", "sve1280",
     "sve1408", "sve1536", "sve1664", "sve1792", "sve1920", "sve2048",
     "kvm-no-adjvtime", "kvm-steal-time",
-    "pauth", "pauth-impdef", "pauth-qarma3",
+    "pauth", "pauth-impdef", "pauth-qarma3", "pauth-qarma5",
     NULL
 };
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
         }
 
         if (cpu->prop_pauth) {
-            if (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) {
+            if ((cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) ||
+                (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma5) ||
+                (cpu->prop_pauth_qarma3 && cpu->prop_pauth_qarma5)) {
                 error_setg(errp,
-                           "cannot enable both pauth-impdef and pauth-qarma3");
+                           "cannot enable pauth-impdef, pauth-qarma3 and "
+                           "pauth-qarma5 at the same time");
                 return;
             }
 
@@ -XXX,XX +XXX,XX @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
             } else if (cpu->prop_pauth_qarma3) {
                 isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, APA3, features);
                 isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, GPA3, 1);
-            } else {
+            } else { /* default is pauth-qarma5 */
                 isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, APA, features);
                 isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPA, 1);
             }
-        } else if (cpu->prop_pauth_impdef || cpu->prop_pauth_qarma3) {
-            error_setg(errp, "cannot enable pauth-impdef or "
-                       "pauth-qarma3 without pauth");
+        } else if (cpu->prop_pauth_impdef ||
+                   cpu->prop_pauth_qarma3 ||
+                   cpu->prop_pauth_qarma5) {
+            error_setg(errp, "cannot enable pauth-impdef, pauth-qarma3 or "
+                       "pauth-qarma5 without pauth");
             error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
         }
     }
@@ -XXX,XX +XXX,XX @@ static const Property arm_cpu_pauth_impdef_property =
     DEFINE_PROP_BOOL("pauth-impdef", ARMCPU, prop_pauth_impdef, false);
 static const Property arm_cpu_pauth_qarma3_property =
     DEFINE_PROP_BOOL("pauth-qarma3", ARMCPU, prop_pauth_qarma3, false);
+static const Property arm_cpu_pauth_qarma5_property =
+    DEFINE_PROP_BOOL("pauth-qarma5", ARMCPU, prop_pauth_qarma5, false);
 
 void aarch64_add_pauth_properties(Object *obj)
 {
@@ -XXX,XX +XXX,XX @@ void aarch64_add_pauth_properties(Object *obj)
     } else {
         qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_impdef_property);
         qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_qarma3_property);
+        qdev_property_add_static(DEVICE(obj), &arm_cpu_pauth_qarma5_property);
     }
 }
 
diff --git a/tests/qtest/arm-cpu-features.c b/tests/qtest/arm-cpu-features.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/arm-cpu-features.c
+++ b/tests/qtest/arm-cpu-features.c
@@ -XXX,XX +XXX,XX @@ static void pauth_tests_default(QTestState *qts, const char *cpu_type)
     assert_has_feature_enabled(qts, cpu_type, "pauth");
     assert_has_feature_disabled(qts, cpu_type, "pauth-impdef");
     assert_has_feature_disabled(qts, cpu_type, "pauth-qarma3");
+    assert_has_feature_disabled(qts, cpu_type, "pauth-qarma5");
     assert_set_feature(qts, cpu_type, "pauth", false);
     assert_set_feature(qts, cpu_type, "pauth", true);
     assert_set_feature(qts, cpu_type, "pauth-impdef", true);
     assert_set_feature(qts, cpu_type, "pauth-impdef", false);
     assert_set_feature(qts, cpu_type, "pauth-qarma3", true);
     assert_set_feature(qts, cpu_type, "pauth-qarma3", false);
+    assert_set_feature(qts, cpu_type, "pauth-qarma5", true);
+    assert_set_feature(qts, cpu_type, "pauth-qarma5", false);
     assert_error(qts, cpu_type,
-                 "cannot enable pauth-impdef or pauth-qarma3 without pauth",
+                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
                  "{ 'pauth': false, 'pauth-impdef': true }");
     assert_error(qts, cpu_type,
-                 "cannot enable pauth-impdef or pauth-qarma3 without pauth",
+                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
                  "{ 'pauth': false, 'pauth-qarma3': true }");
     assert_error(qts, cpu_type,
-                 "cannot enable both pauth-impdef and pauth-qarma3",
-                 "{ 'pauth': true, 'pauth-impdef': true, 'pauth-qarma3': true }");
+                 "cannot enable pauth-impdef, pauth-qarma3 or pauth-qarma5 without pauth",
+                 "{ 'pauth': false, 'pauth-qarma5': true }");
+    assert_error(qts, cpu_type,
+                 "cannot enable pauth-impdef, pauth-qarma3 and pauth-qarma5 at the same time",
+                 "{ 'pauth': true, 'pauth-impdef': true, 'pauth-qarma3': true,"
+                 "  'pauth-qarma5': true }");
 }
 
 static void test_query_cpu_model_expansion(const void *data)
-- 
2.34.1

The pauth-3 test explicitly checks that a pointer-authentication
computation produces the expected result.  This means that it must be
run with the QARMA5 algorithm.

Explicitly set the pauth algorithm when running this test, so that it
doesn't break when we change the default algorithm the 'max' CPU
uses.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/tcg/aarch64/Makefile.softmmu-target | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/tcg/aarch64/Makefile.softmmu-target b/tests/tcg/aarch64/Makefile.softmmu-target
index XXXXXXX..XXXXXXX 100644
--- a/tests/tcg/aarch64/Makefile.softmmu-target
+++ b/tests/tcg/aarch64/Makefile.softmmu-target
@@ -XXX,XX +XXX,XX @@ EXTRA_RUNS+=run-memory-replay
 
 ifneq ($(CROSS_CC_HAS_ARMV8_3),)
 pauth-3: CFLAGS += $(CROSS_CC_HAS_ARMV8_3)
+# This test explicitly checks the output of the pauth operation so we
+# must force the use of the QARMA5 algorithm for it.
+run-pauth-3: QEMU_BASE_MACHINE=-M virt -cpu max,pauth-qarma5=on -display none
 else
 pauth-3:
 	$(call skip-test, "BUILD of $@", "missing compiler support")
-- 
2.34.1

From: Pierrick Bouvier <pierrick.bouvier@linaro.org>

Pointer authentication on aarch64 is pretty expensive (up to 50% of
execution time) when running a virtual machine with tcg and -cpu max
(which enables pauth=on).

The advice is always: use pauth-impdef=on.
Our documentation even mentions it "by default" in
docs/system/introduction.rst.

Thus, we change the default algorithm to impdef. This does not affect
kvm or hvf acceleration, since the pauth algorithm used there is the
one from the host cpu.

This change is backward compatible, in terms of cli, with previous
versions, as the semantics of -cpu max,pauth-impdef=on and -cpu
max,pauth-qarma3=on are preserved.
The new option introduced in the previous patch, which matches the old
default, is -cpu max,pauth-qarma5=on.
It is backward compatible with migration as well: a backcompat
property makes virt machine versions <= 9.2 use qarma5 by default.
Tested by saving and restoring a vm from qemu 9.2.0 into qemu-master
(10.0) for cpus neoverse-n2 and max.
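
The resulting selection order can be sketched as follows. This is only
an illustration under stated assumptions: all names are hypothetical,
conflicting selectors are assumed to have been rejected already, and
the compat flag stands in for the backcompat property set by older
machine types.

```c
#include <stdbool.h>

/* Hypothetical names throughout; not the QEMU implementation. */
enum pauth_algo { PAUTH_IMPDEF, PAUTH_QARMA3, PAUTH_QARMA5 };

struct pauth_opts {
    bool impdef;         /* pauth-impdef=on given explicitly */
    bool qarma3;         /* pauth-qarma3=on given explicitly */
    bool qarma5;         /* pauth-qarma5=on given explicitly */
    bool compat_qarma5;  /* assumed backcompat switch (older machines) */
};

static enum pauth_algo pauth_pick(const struct pauth_opts *o)
{
    /* An explicit selector always wins. */
    if (o->impdef) {
        return PAUTH_IMPDEF;
    }
    if (o->qarma3) {
        return PAUTH_QARMA3;
    }
    /* Explicit qarma5, or the old default kept for compatibility. */
    if (o->qarma5 || o->compat_qarma5) {
        return PAUTH_QARMA5;
    }
    /* New default when nothing was requested. */
    return PAUTH_IMPDEF;
}
```

Under this ordering an explicit pauth-impdef=on still overrides the
compatibility default, which is what keeps existing command lines
behaving as before.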

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20241219183211.3493974-3-pierrick.bouvier@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/cpu-features.rst |  2 +-
 docs/system/introduction.rst     |  2 +-
 target/arm/cpu.h                 |  3 +++
 hw/core/machine.c                |  4 +++-
 target/arm/cpu.c                 |  2 ++
 target/arm/cpu64.c               | 22 ++++++++++++++++------
 6 files changed, 26 insertions(+), 9 deletions(-)