Series comparison

-[Qemu-devel] [PULL 00/16] target-arm queue
+[PULL 00/11] target-arm queue
-The following changes since commit ad1b4ec39caa5b3f17cbd8160283a03a3dcfe2ae:
+Just a collection of bug fixes this time around...
-  Merge remote-tracking branch 'remotes/kraxel/tags/input-20180515-pull-request' into staging (2018-05-15 12:50:06 +0100)
+thanks
 -- PMM
 The following changes since commit 2a6ae69154542caa91dd17c40fd3f5ffbec300de:
   Merge tag 'pull-maintainer-ominbus-030723-1' of https://gitlab.com/stsquad/qemu into staging (2023-07-04 08:36:44 +0200)
 are available in the Git repository at:
-  git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20180515
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230704
-for you to fetch changes up to ae7651804748c6b479d5ae09aeac4edb9c44f76e:
+for you to fetch changes up to 86a78272f094857b4eda79d721c116e93942aa9a:
-  tcg: Optionally log FPU state in TCG -d cpu logging (2018-05-15 14:58:44 +0100)
+  target/xtensa: Assert that interrupt level is within bounds (2023-07-04 14:27:08 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * Fix coverity nit in int_to_float code
+ * Add raw_writes ops for register whose write induce TLB maintenance
- * Don't set Invalid for float-to-int(MAXINT)
+ * hw/arm/sbsa-ref: use XHCI to replace EHCI
- * Fix fp_status_f16 tininess before rounding
+ * Avoid splitting Zregs across lines in dump
- * Add various missing insns from the v8.2-FP16 extension
+ * Dump ZA[] when active
- * Fix sqrt_f16 exception raising
+ * Fix SME full tile indexing
- * sdcard: Correct CRC16 offset in sd_function_switch()
+ * Handle IC IVAU to improve compatibility with JITs
- * tcg: Optionally log FPU state in TCG -d cpu logging
+ * xlnx-canfd-test: Fix code coverity issues
  * gdbstub: Guard M-profile code with CONFIG_TCG
  * allwinner-sramc: Set class_size
  * target/xtensa: Assert that interrupt level is within bounds
 ----------------------------------------------------------------
-Alex Bennée (5):
+Akihiko Odaki (1):
-      fpu/softfloat: int_to_float ensure r fully initialised
+      hw: arm: allwinner-sramc: Set class_size
       target/arm: Implement FCMP for fp16
       target/arm: Implement FCSEL for fp16
       target/arm: Implement FMOV (immediate) for fp16
       target/arm: Fix sqrt_f16 exception raising
-Peter Maydell (3):
+Eric Auger (1):
-      fpu/softfloat: Don't set Invalid for float-to-int(MAXINT)
+      target/arm: Add raw_writes ops for register whose write induce TLB maintenance
       target/arm: Fix fp_status_f16 tininess before rounding
       tcg: Optionally log FPU state in TCG -d cpu logging
-Philippe Mathieu-Daudé (1):
+Fabiano Rosas (1):
-      sdcard: Correct CRC16 offset in sd_function_switch()
+      target/arm: gdbstub: Guard M-profile code with CONFIG_TCG
-Richard Henderson (7):
+John Högberg (2):
-      target/arm: Implement FMOV (general) for fp16
+      target/arm: Handle IC IVAU to improve compatibility with JITs
-      target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
+      tests/tcg/aarch64: Add testcases for IC IVAU and dual-mapped code
       target/arm: Implement FCVT (scalar, integer) for fp16
       target/arm: Implement FCVT (scalar, fixed-point) for fp16
       target/arm: Introduce and use read_fp_hreg
       target/arm: Implement FP data-processing (2 source) for fp16
       target/arm: Implement FP data-processing (3 source) for fp16
- include/qemu/log.h         |   1 +
+Peter Maydell (1):
- target/arm/helper-a64.h    |   2 +
+      target/xtensa: Assert that interrupt level is within bounds
  target/arm/helper.h        |   6 +
  accel/tcg/cpu-exec.c       |   9 +-
  fpu/softfloat.c            |   6 +-
  hw/sd/sd.c                 |   2 +-
  target/arm/cpu.c           |   2 +
  target/arm/helper-a64.c    |  10 ++
  target/arm/helper.c        |  38 +++-
  target/arm/translate-a64.c | 421 ++++++++++++++++++++++++++++++++++++++-------
  util/log.c                 |   2 +
  11 files changed, 428 insertions(+), 71 deletions(-)
+Richard Henderson (3):
+      target/arm: Avoid splitting Zregs across lines in dump
+      target/arm: Dump ZA[] when active
+      target/arm: Fix SME full tile indexing
+Vikram Garhwal (1):
+      tests/qtest: xlnx-canfd-test: Fix code coverity issues
+Yuquan Wang (1):
+      hw/arm/sbsa-ref: use XHCI to replace EHCI
+ docs/system/arm/sbsa.rst          |   5 +-
+ hw/arm/sbsa-ref.c                 |  23 +++--
+ hw/misc/allwinner-sramc.c         |   1 +
+ target/arm/cpu.c                  |  65 ++++++++-----
+ target/arm/gdbstub.c              |   4 +
+ target/arm/helper.c               |  70 +++++++++++---
+ target/arm/tcg/translate-sme.c    |  24 +++--
+ target/xtensa/exc_helper.c        |   3 +
+ tests/qtest/xlnx-canfd-test.c     |  33 +++----
+ tests/tcg/aarch64/icivau.c        | 189 ++++++++++++++++++++++++++++++++++++++
+ tests/tcg/aarch64/sme-outprod1.c  |  83 +++++++++++++++++
+ hw/arm/Kconfig                    |   2 +-
+ tests/tcg/aarch64/Makefile.target |  13 ++-
+ 13 files changed, 436 insertions(+), 79 deletions(-)
+ create mode 100644 tests/tcg/aarch64/icivau.c
+ create mode 100644 tests/tcg/aarch64/sme-outprod1.c

-[Qemu-devel] [PULL 01/16] fpu/softfloat: int_to_float ensure r fully initialised
+Deleted patch
-From: Alex Bennée <alex.bennee@linaro.org>
-Reported by Coverity (CID1390635). We ensure this for uint_to_float
-later on so we might as well mirror that.
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- fpu/softfloat.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/fpu/softfloat.c b/fpu/softfloat.c
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat.c
-+++ b/fpu/softfloat.c
-@@ -XXX,XX +XXX,XX @@ FLOAT_TO_UINT(64, 64)
- static FloatParts int_to_float(int64_t a, float_status *status)
- {
--    FloatParts r;
-+    FloatParts r = {};
-     if (a == 0) {
-         r.cls = float_class_zero;
-         r.sign = false;
---
-2.17.0

-[Qemu-devel] [PULL 02/16] fpu/softfloat: Don't set Invalid for float-to-int(MAXINT)
+Deleted patch
-In float-to-integer conversion, if the floating point input
-converts exactly to the largest or smallest integer that
-fits in to the result type, this is not an overflow.
-In this situation we were producing the correct result value,
-but were incorrectly setting the Invalid flag.
-For example for Arm A64, "FCVTAS w0, d0" on an input of
-0x41dfffffffc00000 should produce 0x7fffffff and set no flags.
-Fix the boundary case to take the right half of the if()
-statements.
-This fixes a regression from 2.11 introduced by the softfloat
-refactoring.
-Cc: qemu-stable@nongnu.org
-Fixes: ab52f973a50
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180510140141.12120-1-peter.maydell@linaro.org
----
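The boundary case described in the commit message above can be sketched in isolation. This is a minimal illustration, not the softfloat code: `saturate_to_int32` and `invalid_flag` are hypothetical names, and the input is assumed to be an already-rounded magnitude plus a sign. It shows why the comparisons need `<=`: a magnitude exactly equal to the limit is representable and must not raise Invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the softfloat Invalid exception flag. */
static bool invalid_flag;

/*
 * Saturate an already-rounded magnitude r (sign carried separately)
 * to int32_t. A magnitude exactly equal to the limit fits in the
 * result type, so the comparisons are <= rather than <.
 */
static int32_t saturate_to_int32(uint64_t r, bool sign)
{
    const uint64_t max_mag = (uint64_t)INT32_MAX;      /* 0x7fffffff */
    const uint64_t min_mag = (uint64_t)INT32_MAX + 1;  /* magnitude of INT32_MIN */

    if (sign) {
        if (r <= min_mag) {
            return (int32_t)(-(int64_t)r);
        }
        invalid_flag = true;    /* genuinely out of range */
        return INT32_MIN;
    }
    if (r <= max_mag) {
        return (int32_t)r;
    }
    invalid_flag = true;        /* genuinely out of range */
    return INT32_MAX;
}
```

With this sketch, `saturate_to_int32(0x7fffffff, false)` returns `INT32_MAX` and leaves `invalid_flag` clear, while a magnitude of `0x80000000` with a positive sign saturates and sets the flag.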
- fpu/softfloat.c | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-diff --git a/fpu/softfloat.c b/fpu/softfloat.c
-index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat.c
-+++ b/fpu/softfloat.c
-@@ -XXX,XX +XXX,XX @@ static int64_t round_to_int_and_pack(FloatParts in, int rmode,
-             r = UINT64_MAX;
-         }
-         if (p.sign) {
--            if (r < -(uint64_t) min) {
-+            if (r <= -(uint64_t) min) {
-                 return -r;
-             } else {
-                 s->float_exception_flags = orig_flags | float_flag_invalid;
-                 return min;
-             }
-         } else {
--            if (r < max) {
-+            if (r <= max) {
-                 return r;
-             } else {
-                 s->float_exception_flags = orig_flags | float_flag_invalid;
---
-2.17.0

-[Qemu-devel] [PULL 06/16] target/arm: Implement FCVT (scalar, integer) for fp16
+[PULL 01/11] target/arm: Add raw_writes ops for register whose write induce TLB maintenance
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Eric Auger <eric.auger@redhat.com>
-Cc: qemu-stable@nongnu.org
+Some registers whose 'cooked' writefns induce TLB maintenance do
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+not have raw_writefn ops defined. If only the writefn ops is set
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+(i.e. no raw_writefn is provided), it is assumed the cooked one also
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
+work as the raw one. For those registers it is not obvious the
-Message-id: 20180512003217.9105-4-richard.henderson@linaro.org
+tlb_flush works in KVM mode, so it is better and safer to set the raw write.
 Signed-off-by: Eric Auger <eric.auger@redhat.com>
 Suggested-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
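The fallback that the commit message describes can be sketched as follows. This is a simplified model under stated assumptions, not QEMU's implementation: `Reg`, `cooked_write`, `write_raw_reg` and the `tlb_flushes` counter are hypothetical names, and only `raw_write`, `writefn` and `raw_writefn` echo identifiers from the patch. It shows how a register without a raw hook ends up running the cooked write, side effects included, on the raw path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much simplified register description. */
typedef struct Reg Reg;
typedef void WriteFn(Reg *r, uint64_t value);

struct Reg {
    uint64_t value;
    int tlb_flushes;       /* counts simulated TLB maintenance */
    WriteFn *writefn;      /* "cooked" write, has side effects */
    WriteFn *raw_writefn;  /* plain state update, may be NULL */
};

/* Raw write: only update the stored value. */
static void raw_write(Reg *r, uint64_t value)
{
    r->value = value;
}

/* Cooked write: update the value and trigger the side effect. */
static void cooked_write(Reg *r, uint64_t value)
{
    r->tlb_flushes++;
    raw_write(r, value);
}

/*
 * Raw write path (for example state synchronisation): prefer the raw
 * hook, fall back to the cooked one, and finally to a plain store.
 */
static void write_raw_reg(Reg *r, uint64_t value)
{
    assert(r != NULL);
    if (r->raw_writefn) {
        r->raw_writefn(r, value);
    } else if (r->writefn) {
        r->writefn(r, value);
    } else {
        raw_write(r, value);
    }
}
```

In this model, a register that sets only `.writefn = cooked_write` records a TLB flush on every raw write, whereas one that also sets `.raw_writefn = raw_write` does not. That is the behaviour change the patch makes for the affected registers.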
- target/arm/helper.h        |  6 +++
+ target/arm/helper.c | 23 +++++++++++++----------
- target/arm/helper.c        | 38 ++++++++++++++-
+ 1 file changed, 13 insertions(+), 10 deletions(-)
  target/arm/translate-a64.c | 96 +++++++++++++++++++++++++++++++-------
  3 files changed, 122 insertions(+), 18 deletions(-)
-diff --git a/target/arm/helper.h b/target/arm/helper.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
-+++ b/target/arm/helper.h
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_touhd_round_to_zero, i64, f64, i32, ptr)
- DEF_HELPER_3(vfp_tould_round_to_zero, i64, f64, i32, ptr)
- DEF_HELPER_3(vfp_touhh, i32, f16, i32, ptr)
- DEF_HELPER_3(vfp_toshh, i32, f16, i32, ptr)
-+DEF_HELPER_3(vfp_toulh, i32, f16, i32, ptr)
-+DEF_HELPER_3(vfp_toslh, i32, f16, i32, ptr)
-+DEF_HELPER_3(vfp_touqh, i64, f16, i32, ptr)
-+DEF_HELPER_3(vfp_tosqh, i64, f16, i32, ptr)
- DEF_HELPER_3(vfp_toshs, i32, f32, i32, ptr)
- DEF_HELPER_3(vfp_tosls, i32, f32, i32, ptr)
- DEF_HELPER_3(vfp_tosqs, i64, f32, i32, ptr)
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_ultod, f64, i64, i32, ptr)
- DEF_HELPER_3(vfp_uqtod, f64, i64, i32, ptr)
- DEF_HELPER_3(vfp_sltoh, f16, i32, i32, ptr)
- DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
-+DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
-+DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
- DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
- DEF_HELPER_FLAGS_2(set_neon_rmode, TCG_CALL_NO_RWG, i32, i32, env)
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
- #undef VFP_CONV_FIX_A64
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 0,
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
- /* Conversion to/from f16 can overflow to infinity before/after scaling.
+       .fgt = FGT_TTBR0_EL1,
-- * Therefore we convert to f64 (which does not round), scale,
+-      .writefn = vmsa_ttbr_write, .resetvalue = 0,
-- * and then convert f64 to f16 (which may round).
++      .writefn = vmsa_ttbr_write, .resetvalue = 0, .raw_writefn = raw_write,
-+ * Therefore we convert to f64, scale, and then convert f64 to f16; or
+       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
-+ * vice versa for conversion to integer.
+                              offsetof(CPUARMState, cp15.ttbr0_ns) } },
-+ *
+     { .name = "TTBR1_EL1", .state = ARM_CP_STATE_BOTH,
-+ * For 16- and 32-bit integers, the conversion to f64 never rounds.
+       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 1,
-+ * For 64-bit integers, any integer that would cause rounding will also
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
-+ * overflow to f16 infinity, so there is no double rounding problem.
+       .fgt = FGT_TTBR1_EL1,
-  */
+-      .writefn = vmsa_ttbr_write, .resetvalue = 0,
++      .writefn = vmsa_ttbr_write, .resetvalue = 0, .raw_writefn = raw_write,
- static float16 do_postscale_fp16(float64 f, int shift, float_status *fpst)
+       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr1_s),
-@@ -XXX,XX +XXX,XX @@ float16 HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
+                              offsetof(CPUARMState, cp15.ttbr1_ns) } },
-     return do_postscale_fp16(uint32_to_float64(x, fpst), shift, fpst);
+     { .name = "TCR_EL1", .state = ARM_CP_STATE_AA64,
- }
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lpae_cp_reginfo[] = {
+       .type = ARM_CP_64BIT | ARM_CP_ALIAS,
-+float16 HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
+       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
-+{
+                              offsetof(CPUARMState, cp15.ttbr0_ns) },
-+    return do_postscale_fp16(int64_to_float64(x, fpst), shift, fpst);
+-      .writefn = vmsa_ttbr_write, },
-+}
++      .writefn = vmsa_ttbr_write, .raw_writefn = raw_write },
-+
+     { .name = "TTBR1", .cp = 15, .crm = 2, .opc1 = 1,
-+float16 HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
+       .access = PL1_RW, .accessfn = access_tvm_trvm,
-+{
+       .type = ARM_CP_64BIT | ARM_CP_ALIAS,
-+    return do_postscale_fp16(uint64_to_float64(x, fpst), shift, fpst);
+       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr1_s),
-+}
+                              offsetof(CPUARMState, cp15.ttbr1_ns) },
-+
+-      .writefn = vmsa_ttbr_write, },
- static float64 do_prescale_fp16(float16 f, int shift, float_status *fpst)
++      .writefn = vmsa_ttbr_write, .raw_writefn = raw_write },
- {
+ };
-     if (unlikely(float16_is_any_nan(f))) {
-@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_touhh)(float16 x, uint32_t shift, void *fpst)
+ static uint64_t aa64_fpcr_read(CPUARMState *env, const ARMCPRegInfo *ri)
-     return float64_to_uint16(do_prescale_fp16(x, shift, fpst), fpst);
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
- }
+       .type = ARM_CP_IO,
+       .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0,
-+uint32_t HELPER(vfp_toslh)(float16 x, uint32_t shift, void *fpst)
+       .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.hcr_el2),
-+{
+-      .writefn = hcr_write },
-+    return float64_to_int32(do_prescale_fp16(x, shift, fpst), fpst);
++      .writefn = hcr_write, .raw_writefn = raw_write },
-+}
+     { .name = "HCR", .state = ARM_CP_STATE_AA32,
-+
+       .type = ARM_CP_ALIAS | ARM_CP_IO,
-+uint32_t HELPER(vfp_toulh)(float16 x, uint32_t shift, void *fpst)
+       .cp = 15, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0,
-+{
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
-+    return float64_to_uint32(do_prescale_fp16(x, shift, fpst), fpst);
+     { .name = "TCR_EL2", .state = ARM_CP_STATE_BOTH,
-+}
+       .opc0 = 3, .opc1 = 4, .crn = 2, .crm = 0, .opc2 = 2,
-+
+       .access = PL2_RW, .writefn = vmsa_tcr_el12_write,
-+uint64_t HELPER(vfp_tosqh)(float16 x, uint32_t shift, void *fpst)
++      .raw_writefn = raw_write,
-+{
+       .fieldoffset = offsetof(CPUARMState, cp15.tcr_el[2]) },
-+    return float64_to_int64(do_prescale_fp16(x, shift, fpst), fpst);
+     { .name = "VTCR", .state = ARM_CP_STATE_AA32,
-+}
+       .cp = 15, .opc1 = 4, .crn = 2, .crm = 1, .opc2 = 2,
-+
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
-+uint64_t HELPER(vfp_touqh)(float16 x, uint32_t shift, void *fpst)
+       .type = ARM_CP_64BIT | ARM_CP_ALIAS,
-+{
+       .access = PL2_RW, .accessfn = access_el3_aa32ns,
-+    return float64_to_uint64(do_prescale_fp16(x, shift, fpst), fpst);
+       .fieldoffset = offsetof(CPUARMState, cp15.vttbr_el2),
-+}
+-      .writefn = vttbr_write },
-+
++      .writefn = vttbr_write, .raw_writefn = raw_write },
- /* Set the current fp rounding mode and return the old one.
+     { .name = "VTTBR_EL2", .state = ARM_CP_STATE_AA64,
-  * The argument is a softfloat float_round_ value.
+       .opc0 = 3, .opc1 = 4, .crn = 2, .crm = 1, .opc2 = 0,
-  */
+-      .access = PL2_RW, .writefn = vttbr_write,
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
++      .access = PL2_RW, .writefn = vttbr_write, .raw_writefn = raw_write,
-index XXXXXXX..XXXXXXX 100644
+       .fieldoffset = offsetof(CPUARMState, cp15.vttbr_el2) },
---- a/target/arm/translate-a64.c
+     { .name = "SCTLR_EL2", .state = ARM_CP_STATE_BOTH,
-+++ b/target/arm/translate-a64.c
+       .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 0, .opc2 = 0,
-@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
-                            bool itof, int rmode, int scale, int sf, int type)
+       .fieldoffset = offsetof(CPUARMState, cp15.tpidr_el[2]) },
- {
+     { .name = "TTBR0_EL2", .state = ARM_CP_STATE_AA64,
-     bool is_signed = !(opcode & 1);
+       .opc0 = 3, .opc1 = 4, .crn = 2, .crm = 0, .opc2 = 0,
--    bool is_double = type;
+-      .access = PL2_RW, .resetvalue = 0, .writefn = vmsa_tcr_ttbr_el2_write,
-     TCGv_ptr tcg_fpstatus;
++      .access = PL2_RW, .resetvalue = 0,
--    TCGv_i32 tcg_shift;
++      .writefn = vmsa_tcr_ttbr_el2_write, .raw_writefn = raw_write,
-+    TCGv_i32 tcg_shift, tcg_single;
+       .fieldoffset = offsetof(CPUARMState, cp15.ttbr0_el[2]) },
-+    TCGv_i64 tcg_double;
+     { .name = "HTTBR", .cp = 15, .opc1 = 4, .crm = 2,
+       .access = PL2_RW, .type = ARM_CP_64BIT | ARM_CP_ALIAS,
--    tcg_fpstatus = get_fpstatus_ptr(false);
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el3_cp_reginfo[] = {
-+    tcg_fpstatus = get_fpstatus_ptr(type == 3);
+     { .name = "SCR_EL3", .state = ARM_CP_STATE_AA64,
+       .opc0 = 3, .opc1 = 6, .crn = 1, .crm = 1, .opc2 = 0,
-     tcg_shift = tcg_const_i32(64 - scale);
+       .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.scr_el3),
+-      .resetfn = scr_reset, .writefn = scr_write },
-@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
++      .resetfn = scr_reset, .writefn = scr_write, .raw_writefn = raw_write },
-             tcg_int = tcg_extend;
+     { .name = "SCR",  .type = ARM_CP_ALIAS | ARM_CP_NEWEL,
-         }
+       .cp = 15, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 0,
+       .access = PL1_RW, .accessfn = access_trap_aa32s_el1,
--        if (is_double) {
+       .fieldoffset = offsetoflow32(CPUARMState, cp15.scr_el3),
--            TCGv_i64 tcg_double = tcg_temp_new_i64();
+-      .writefn = scr_write },
-+        switch (type) {
++      .writefn = scr_write, .raw_writefn = raw_write },
-+        case 1: /* float64 */
+     { .name = "SDER32_EL3", .state = ARM_CP_STATE_AA64,
-+            tcg_double = tcg_temp_new_i64();
+       .opc0 = 3, .opc1 = 6, .crn = 1, .crm = 1, .opc2 = 1,
-             if (is_signed) {
+       .access = PL3_RW, .resetvalue = 0,
-                 gen_helper_vfp_sqtod(tcg_double, tcg_int,
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vhe_reginfo[] = {
-                                      tcg_shift, tcg_fpstatus);
+     { .name = "TTBR1_EL2", .state = ARM_CP_STATE_AA64,
-@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
+       .opc0 = 3, .opc1 = 4, .crn = 2, .crm = 0, .opc2 = 1,
-             }
+       .access = PL2_RW, .writefn = vmsa_tcr_ttbr_el2_write,
-             write_fp_dreg(s, rd, tcg_double);
++      .raw_writefn = raw_write,
-             tcg_temp_free_i64(tcg_double);
+       .fieldoffset = offsetof(CPUARMState, cp15.ttbr1_el[2]) },
--        } else {
+ #ifndef CONFIG_USER_ONLY
--            TCGv_i32 tcg_single = tcg_temp_new_i32();
+     { .name = "CNTHV_CVAL_EL2", .state = ARM_CP_STATE_AA64,
 +            break;
 +
 +        case 0: /* float32 */
 +            tcg_single = tcg_temp_new_i32();
              if (is_signed) {
                  gen_helper_vfp_sqtos(tcg_single, tcg_int,
                                       tcg_shift, tcg_fpstatus);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
              }
              write_fp_sreg(s, rd, tcg_single);
              tcg_temp_free_i32(tcg_single);
 +            break;
 +
 +        case 3: /* float16 */
 +            tcg_single = tcg_temp_new_i32();
 +            if (is_signed) {
 +                gen_helper_vfp_sqtoh(tcg_single, tcg_int,
 +                                     tcg_shift, tcg_fpstatus);
 +            } else {
 +                gen_helper_vfp_uqtoh(tcg_single, tcg_int,
 +                                     tcg_shift, tcg_fpstatus);
 +            }
 +            write_fp_sreg(s, rd, tcg_single);
 +            tcg_temp_free_i32(tcg_single);
 +            break;
 +
 +        default:
 +            g_assert_not_reached();
          }
      } else {
          TCGv_i64 tcg_int = cpu_reg(s, rd);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
          gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
 -        if (is_double) {
 -            TCGv_i64 tcg_double = read_fp_dreg(s, rn);
 +        switch (type) {
 +        case 1: /* float64 */
 +            tcg_double = read_fp_dreg(s, rn);
              if (is_signed) {
                  if (!sf) {
                      gen_helper_vfp_tosld(tcg_int, tcg_double,
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                                           tcg_shift, tcg_fpstatus);
                  }
              }
 +            if (!sf) {
 +                tcg_gen_ext32u_i64(tcg_int, tcg_int);
 +            }
              tcg_temp_free_i64(tcg_double);
 -        } else {
 -            TCGv_i32 tcg_single = read_fp_sreg(s, rn);
 +            break;
 +
 +        case 0: /* float32 */
 +            tcg_single = read_fp_sreg(s, rn);
              if (sf) {
                  if (is_signed) {
                      gen_helper_vfp_tosqs(tcg_int, tcg_single,
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                  tcg_temp_free_i32(tcg_dest);
              }
              tcg_temp_free_i32(tcg_single);
 +            break;
 +
 +        case 3: /* float16 */
 +            tcg_single = read_fp_sreg(s, rn);
 +            if (sf) {
 +                if (is_signed) {
 +                    gen_helper_vfp_tosqh(tcg_int, tcg_single,
 +                                         tcg_shift, tcg_fpstatus);
 +                } else {
 +                    gen_helper_vfp_touqh(tcg_int, tcg_single,
 +                                         tcg_shift, tcg_fpstatus);
 +                }
 +            } else {
 +                TCGv_i32 tcg_dest = tcg_temp_new_i32();
 +                if (is_signed) {
 +                    gen_helper_vfp_toslh(tcg_dest, tcg_single,
 +                                         tcg_shift, tcg_fpstatus);
 +                } else {
 +                    gen_helper_vfp_toulh(tcg_dest, tcg_single,
 +                                         tcg_shift, tcg_fpstatus);
 +                }
 +                tcg_gen_extu_i32_i64(tcg_int, tcg_dest);
 +                tcg_temp_free_i32(tcg_dest);
 +            }
 +            tcg_temp_free_i32(tcg_single);
 +            break;
 +
 +        default:
 +            g_assert_not_reached();
          }
          gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
          tcg_temp_free_i32(tcg_rmode);
 -
 -        if (!sf) {
 -            tcg_gen_ext32u_i64(tcg_int, tcg_int);
 -        }
      }
      tcg_temp_free_ptr(tcg_fpstatus);
@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
          /* actual FP conversions */
          bool itof = extract32(opcode, 1, 1);
 -        if (type > 1 || (rmode != 0 && opcode > 1)) {
 +        if (rmode != 0 && opcode > 1) {
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        switch (type) {
 +        case 0: /* float32 */
 +        case 1: /* float64 */
 +            break;
 +        case 3: /* float16 */
 +            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
 +                break;
 +            }
 +            /* fallthru */
 +        default:
              unallocated_encoding(s);
              return;
          }
 --
-2.17.0
+2.34.1

-[Qemu-devel] [PULL 11/16] target/arm: Implement FCMP for fp16
+[PULL 02/11] hw/arm/sbsa-ref: use XHCI to replace EHCI
-From: Alex Bennée <alex.bennee@linaro.org>
+From: Yuquan Wang <wangyuquan1236@phytium.com.cn>
-These were missed out from the rest of the half-precision work.
+The current sbsa-ref cannot use the EHCI controller, which is only
 able to do 32-bit DMA, since sbsa-ref doesn't have RAM below 4GB.
 Hence, this uses XHCI to provide a usb controller with 64-bit
 DMA capability instead of EHCI.
-Cc: qemu-stable@nongnu.org
+We bump the platform version to 0.3 with this change.  Although the
 hardware at the USB controller address changes, the firmware and
 Linux can both cope with this -- on an older non-XHCI-aware
 firmware/kernel setup the probe routine simply fails and the guest
 proceeds without any USB.  (This isn't a loss of functionality,
 because the old USB controller never worked in the first place.) So
 we can call this a backwards-compatible change and only bump the
 minor version.
 Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
 Message-id: 20230621103847.447508-2-wangyuquan1236@phytium.com.cn
 [PMM: tweaked commit message; add line to docs about what
  changes in platform version 0.3]
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180512003217.9105-9-richard.henderson@linaro.org
-[rth: Diagnose lack of FP16 before fp_access_check]
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
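The DMA argument in the commit message can be made concrete with a small reachability check. This is an illustrative sketch only: `dma_can_reach` is a hypothetical helper, and the RAM base used in the note below is an assumed value chosen to sit at the 4 GiB boundary, not the board's actual memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a device that can drive dma_bits address bits is able
 * to reach every byte of the region [base, base + size).
 */
static bool dma_can_reach(unsigned dma_bits, uint64_t base, uint64_t size)
{
    assert(dma_bits >= 1 && dma_bits <= 64 && size > 0);

    /* Highest address the device can generate. */
    uint64_t limit = dma_bits == 64 ? UINT64_MAX
                                    : (UINT64_C(1) << dma_bits) - 1;
    /* Last byte of the region; reject regions that wrap around. */
    uint64_t last = base + (size - 1);
    if (last < base) {
        return false;
    }
    return last <= limit;
}
```

Assuming RAM that starts at `0x100000000` (4 GiB), `dma_can_reach(32, 0x100000000, 4096)` is false and `dma_can_reach(64, 0x100000000, 4096)` is true, which is the distinction between a 32-bit-only controller and a 64-bit capable one that the patch relies on.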
- target/arm/helper-a64.h    |  2 +
+ docs/system/arm/sbsa.rst |  5 ++++-
- target/arm/helper-a64.c    | 10 +++++
+ hw/arm/sbsa-ref.c        | 23 +++++++++++++----------
- target/arm/translate-a64.c | 88 ++++++++++++++++++++++++++++++--------
+ hw/arm/Kconfig           |  2 +-
- 3 files changed, 83 insertions(+), 17 deletions(-)
+ 3 files changed, 18 insertions(+), 12 deletions(-)
-diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
+diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-a64.h
+--- a/docs/system/arm/sbsa.rst
-+++ b/target/arm/helper-a64.h
++++ b/docs/system/arm/sbsa.rst
@@ -XXX,XX +XXX,XX @@ The ``sbsa-ref`` board supports:
    - A configurable number of AArch64 CPUs
    - GIC version 3
    - System bus AHCI controller
 -  - System bus EHCI controller
 +  - System bus XHCI controller
    - CDROM and hard disc on AHCI bus
    - E1000E ethernet card on PCIe bus
    - Bochs display adapter on PCIe bus
@@ -XXX,XX +XXX,XX @@ Platform version changes:
 0.2
    GIC ITS information is present in devicetree.
 +
 +0.3
 +  The USB controller is an XHCI device, not EHCI
 diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/sbsa-ref.c
 +++ b/hw/arm/sbsa-ref.c
 @@ -XXX,XX +XXX,XX @@
- DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+ #include "hw/pci-host/gpex.h"
- DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
+ #include "hw/qdev-properties.h"
- DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
+ #include "hw/usb.h"
-+DEF_HELPER_3(vfp_cmph_a64, i64, f16, f16, ptr)
++#include "hw/usb/xhci.h"
-+DEF_HELPER_3(vfp_cmpeh_a64, i64, f16, f16, ptr)
+ #include "hw/char/pl011.h"
- DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
+ #include "hw/watchdog/sbsa_gwdt.h"
- DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
+ #include "net/net.h"
- DEF_HELPER_3(vfp_cmpd_a64, i64, f64, f64, ptr)
+@@ -XXX,XX +XXX,XX @@ enum {
-diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
+     SBSA_SECURE_UART_MM,
-index XXXXXXX..XXXXXXX 100644
+     SBSA_SECURE_MEM,
---- a/target/arm/helper-a64.c
+     SBSA_AHCI,
-+++ b/target/arm/helper-a64.c
+-    SBSA_EHCI,
-@@ -XXX,XX +XXX,XX @@ static inline uint32_t float_rel_to_flags(int res)
++    SBSA_XHCI,
-     return flags;
+ };
- }
+ struct SBSAMachineState {
-+uint64_t HELPER(vfp_cmph_a64)(float16 x, float16 y, void *fp_status)
+@@ -XXX,XX +XXX,XX @@ static const MemMapEntry sbsa_ref_memmap[] = {
-+{
+     [SBSA_SMMU] =               { 0x60050000, 0x00020000 },
-+    return float_rel_to_flags(float16_compare_quiet(x, y, fp_status));
+     /* Space here reserved for more SMMUs */
-+}
+     [SBSA_AHCI] =               { 0x60100000, 0x00010000 },
-+
+-    [SBSA_EHCI] =               { 0x60110000, 0x00010000 },
-+uint64_t HELPER(vfp_cmpeh_a64)(float16 x, float16 y, void *fp_status)
++    [SBSA_XHCI] =               { 0x60110000, 0x00010000 },
-+{
+     /* Space here reserved for other devices */
-+    return float_rel_to_flags(float16_compare(x, y, fp_status));
+     [SBSA_PCIE_PIO] =           { 0x7fff0000, 0x00010000 },
-+}
+     /* 32-bit address PCIE MMIO space */
-+
+@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
- uint64_t HELPER(vfp_cmps_a64)(float32 x, float32 y, void *fp_status)
+     [SBSA_SECURE_UART] = 8,
- {
+     [SBSA_SECURE_UART_MM] = 9,
-     return float_rel_to_flags(float32_compare_quiet(x, y, fp_status));
+     [SBSA_AHCI] = 10,
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+-    [SBSA_EHCI] = 11,
-index XXXXXXX..XXXXXXX 100644
++    [SBSA_XHCI] = 11,
---- a/target/arm/translate-a64.c
+     [SBSA_SMMU] = 12, /* ... to 15 */
-+++ b/target/arm/translate-a64.c
+     [SBSA_GWDT_WS0] = 16,
-@@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn)
+ };
@@ -XXX,XX +XXX,XX @@ static void create_fdt(SBSAMachineState *sms)
       *                        fw compatibility.
       */
      qemu_fdt_setprop_cell(fdt, "/", "machine-version-major", 0);
 -    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 2);
 +    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 3);
      if (ms->numa_state->have_numa_distance) {
          int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
@@ -XXX,XX +XXX,XX @@ static void create_ahci(const SBSAMachineState *sms)
      }
  }
--static void handle_fp_compare(DisasContext *s, bool is_double,
+-static void create_ehci(const SBSAMachineState *sms)
-+static void handle_fp_compare(DisasContext *s, int size,
++static void create_xhci(const SBSAMachineState *sms)
                                unsigned int rn, unsigned int rm,
                                bool cmp_with_zero, bool signal_all_nans)
  {
-     TCGv_i64 tcg_flags = tcg_temp_new_i64();
+-    hwaddr base = sbsa_ref_memmap[SBSA_EHCI].base;
--    TCGv_ptr fpst = get_fpstatus_ptr(false);
+-    int irq = sbsa_ref_irqmap[SBSA_EHCI];
-+    TCGv_ptr fpst = get_fpstatus_ptr(size == MO_16);
++    hwaddr base = sbsa_ref_memmap[SBSA_XHCI].base;
++    int irq = sbsa_ref_irqmap[SBSA_XHCI];
--    if (is_double) {
++    DeviceState *dev = qdev_new(TYPE_XHCI_SYSBUS);
-+    if (size == MO_64) {
-         TCGv_i64 tcg_vn, tcg_vm;
+-    sysbus_create_simple("platform-ehci-usb", base,
+-                         qdev_get_gpio_in(sms->gic, irq));
-         tcg_vn = read_fp_dreg(s, rn);
++    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
-@@ -XXX,XX +XXX,XX @@ static void handle_fp_compare(DisasContext *s, bool is_double,
++    sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, base);
-         tcg_temp_free_i64(tcg_vn);
++    sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, qdev_get_gpio_in(sms->gic, irq));
          tcg_temp_free_i64(tcg_vm);
      } else {
 -        TCGv_i32 tcg_vn, tcg_vm;
 +        TCGv_i32 tcg_vn = tcg_temp_new_i32();
 +        TCGv_i32 tcg_vm = tcg_temp_new_i32();
 -        tcg_vn = read_fp_sreg(s, rn);
 +        read_vec_element_i32(s, tcg_vn, rn, 0, size);
          if (cmp_with_zero) {
 -            tcg_vm = tcg_const_i32(0);
 +            tcg_gen_movi_i32(tcg_vm, 0);
          } else {
 -            tcg_vm = read_fp_sreg(s, rm);
 +            read_vec_element_i32(s, tcg_vm, rm, 0, size);
          }
 -        if (signal_all_nans) {
 -            gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 -        } else {
 -            gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 +
 +        switch (size) {
 +        case MO_32:
 +            if (signal_all_nans) {
 +                gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 +            } else {
 +                gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 +            }
 +            break;
 +        case MO_16:
 +            if (signal_all_nans) {
 +                gen_helper_vfp_cmpeh_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 +            } else {
 +                gen_helper_vfp_cmph_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 +            }
 +            break;
 +        default:
 +            g_assert_not_reached();
          }
 +
          tcg_temp_free_i32(tcg_vn);
          tcg_temp_free_i32(tcg_vm);
      }
@@ -XXX,XX +XXX,XX @@ static void handle_fp_compare(DisasContext *s, bool is_double,
  static void disas_fp_compare(DisasContext *s, uint32_t insn)
  {
      unsigned int mos, type, rm, op, rn, opc, op2r;
 +    int size;
      mos = extract32(insn, 29, 3);
 -    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
 +    type = extract32(insn, 22, 2);
      rm = extract32(insn, 16, 5);
      op = extract32(insn, 14, 2);
      rn = extract32(insn, 5, 5);
      opc = extract32(insn, 3, 2);
      op2r = extract32(insn, 0, 3);
 -    if (mos || op || op2r || type > 1) {
 +    if (mos || op || op2r) {
 +        unallocated_encoding(s);
 +        return;
 +    }
 +
 +    switch (type) {
 +    case 0:
 +        size = MO_32;
 +        break;
 +    case 1:
 +        size = MO_64;
 +        break;
 +    case 3:
 +        size = MO_16;
 +        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
 +            break;
 +        }
 +        /* fallthru */
 +    default:
          unallocated_encoding(s);
          return;
      }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_compare(DisasContext *s, uint32_t insn)
          return;
      }
 -    handle_fp_compare(s, type, rn, rm, opc & 1, opc & 2);
 +    handle_fp_compare(s, size, rn, rm, opc & 1, opc & 2);
  }
- /* Floating point conditional compare
+ static void create_smmu(const SBSAMachineState *sms, PCIBus *bus)
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_init(MachineState *machine)
-     unsigned int mos, type, rm, cond, rn, op, nzcv;
-     TCGv_i64 tcg_flags;
+     create_ahci(sms);
-     TCGLabel *label_continue = NULL;
-+    int size;
+-    create_ehci(sms);
++    create_xhci(sms);
-     mos = extract32(insn, 29, 3);
--    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
+     create_pcie(sms);
-+    type = extract32(insn, 22, 2);
-     rm = extract32(insn, 16, 5);
+diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
-     cond = extract32(insn, 12, 4);
+index XXXXXXX..XXXXXXX 100644
-     rn = extract32(insn, 5, 5);
+--- a/hw/arm/Kconfig
-     op = extract32(insn, 4, 1);
++++ b/hw/arm/Kconfig
-     nzcv = extract32(insn, 0, 4);
+@@ -XXX,XX +XXX,XX @@ config SBSA_REF
+     select PL011 # UART
--    if (mos || type > 1) {
+     select PL031 # RTC
-+    if (mos) {
+     select PL061 # GPIO
-+        unallocated_encoding(s);
+-    select USB_EHCI_SYSBUS
-+        return;
++    select USB_XHCI_SYSBUS
-+    }
+     select WDT_SBSA
-+
+     select BOCHS_DISPLAY
-+    switch (type) {
 +    case 0:
 +        size = MO_32;
 +        break;
 +    case 1:
 +        size = MO_64;
 +        break;
 +    case 3:
 +        size = MO_16;
 +        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
 +            break;
 +        }
 +        /* fallthru */
 +    default:
          unallocated_encoding(s);
          return;
      }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
          gen_set_label(label_match);
      }
 -    handle_fp_compare(s, type, rn, rm, false, op);
 +    handle_fp_compare(s, size, rn, rm, false, op);
      if (cond < 0x0e) {
          gen_set_label(label_continue);
 --
-.17.0
+.34.1

-[Qemu-devel] [PULL 05/16] target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
+[PULL 03/11] target/arm: Avoid splitting Zregs across lines in dump
 From: Richard Henderson <richard.henderson@linaro.org>
-No sense in emitting code after the exception.
+Allow the line length to extend to 548 columns.  While annoyingly wide,
 it's still less confusing than the continuations we print.  Also, the
 default VL used by Linux (and max for A64FX) uses only 140 columns.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
+Message-id: 20230622151201.1578522-2-richard.henderson@linaro.org
 Message-id: 20180512003217.9105-3-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 2 +-
+ target/arm/cpu.c | 36 ++++++++++++++----------------------
-file changed, 1 insertion(+), 1 deletion(-)
+file changed, 14 insertions(+), 22 deletions(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/cpu.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
-         default:
+     ARMCPU *cpu = ARM_CPU(cs);
-             /* all other sf/type/rmode combinations are invalid */
+     CPUARMState *env = &cpu->env;
-             unallocated_encoding(s);
+     uint32_t psr = pstate_read(env);
--            break;
+-    int i;
-+            return;
++    int i, j;
      int el = arm_current_el(env);
      const char *ns_status;
      bool sve;
@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
      }
      if (sve) {
 -        int j, zcr_len = sve_vqm1_for_el(env, el);
 +        int zcr_len = sve_vqm1_for_el(env, el);
          for (i = 0; i <= FFR_PRED_NUM; i++) {
              bool eol;
@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
              }
          }
-         if (!fp_access_check(s)) {
+-        for (i = 0; i < 32; i++) {
 -            if (zcr_len == 0) {
 +        if (zcr_len == 0) {
 +            /*
 +             * With vl=16, there are only 37 columns per register,
 +             * so output two registers per line.
 +             */
 +            for (i = 0; i < 32; i++) {
                  qemu_fprintf(f, "Z%02d=%016" PRIx64 ":%016" PRIx64 "%s",
                               i, env->vfp.zregs[i].d[1],
                               env->vfp.zregs[i].d[0], i & 1 ? "\n" : " ");
 -            } else if (zcr_len == 1) {
 -                qemu_fprintf(f, "Z%02d=%016" PRIx64 ":%016" PRIx64
 -                             ":%016" PRIx64 ":%016" PRIx64 "\n",
 -                             i, env->vfp.zregs[i].d[3], env->vfp.zregs[i].d[2],
 -                             env->vfp.zregs[i].d[1], env->vfp.zregs[i].d[0]);
 -            } else {
 +            }
 +        } else {
 +            for (i = 0; i < 32; i++) {
 +                qemu_fprintf(f, "Z%02d=", i);
                  for (j = zcr_len; j >= 0; j--) {
 -                    bool odd = (zcr_len - j) % 2 != 0;
 -                    if (j == zcr_len) {
 -                        qemu_fprintf(f, "Z%02d[%x-%x]=", i, j, j - 1);
 -                    } else if (!odd) {
 -                        if (j > 0) {
 -                            qemu_fprintf(f, "   [%x-%x]=", j, j - 1);
 -                        } else {
 -                            qemu_fprintf(f, "     [%x]=", j);
 -                        }
 -                    }
                      qemu_fprintf(f, "%016" PRIx64 ":%016" PRIx64 "%s",
                                   env->vfp.zregs[i].d[j * 2 + 1],
 -                                 env->vfp.zregs[i].d[j * 2],
 -                                 odd || j == 0 ? "\n" : ":");
 +                                 env->vfp.zregs[i].d[j * 2 + 0],
 +                                 j ? ":" : "\n");
                  }
              }
          }
 --
-.17.0
+.34.1
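
The column counts quoted in the commit message for this patch can be checked by hand. The following is a minimal sketch, assuming each 128-bit chunk is printed as two 16-digit hex words joined by ':' and followed by one separator character (':' or newline), as the new loop does. zreg_line_chars is a hypothetical helper for illustration only, not part of QEMU.

```c
#include <assert.h>

/*
 * Characters emitted for one Z register when the whole register is
 * printed on a single line, including the trailing newline.
 * vq is the vector length in 128-bit chunks (zcr_len + 1 in the patch).
 */
static int zreg_line_chars(int vq)
{
    const int label = 4;                  /* "Z%02d=", e.g. "Z07=" */
    const int per_chunk = 16 + 1 + 16 + 1; /* hi, ':', lo, ':' or '\n' */

    /* Assumption: 1..16 chunks, i.e. 128 to 2048 bits. */
    assert(vq >= 1 && vq <= 16);
    return label + per_chunk * vq;
}
```

With vq = 16 (2048 bits) this gives 548, and with vq = 4 (512 bits, the A64FX case) it gives 140. Both counts include the newline; for vq = 1 the result is 38, which is the 37 visible columns mentioned in the code comment plus one separator.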

-[Qemu-devel] [PULL 08/16] target/arm: Introduce and use read_fp_hreg
+[PULL 04/11] target/arm: Dump ZA[] when active
 From: Richard Henderson <richard.henderson@linaro.org>
-Cc: qemu-stable@nongnu.org
+Always print each matrix row whole, one per line, so that we
 get the entire matrix in the proper shape.
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20230622151201.1578522-3-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Message-id: 20180512003217.9105-6-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 30 ++++++++++++++----------------
+ target/arm/cpu.c | 18 ++++++++++++++++++
-file changed, 14 insertions(+), 16 deletions(-)
+file changed, 18 insertions(+)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/cpu.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static TCGv_i32 read_fp_sreg(DisasContext *s, int reg)
+@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
-     return v;
+                          i, q[1], q[0], (i & 1 ? "\n" : " "));
          }
      }
 +
 +    if (cpu_isar_feature(aa64_sme, cpu) &&
 +        FIELD_EX64(env->svcr, SVCR, ZA) &&
 +        sme_exception_el(env, el) == 0) {
 +        int zcr_len = sve_vqm1_for_el_sm(env, el, true);
 +        int svl = (zcr_len + 1) * 16;
 +        int svl_lg10 = svl < 100 ? 2 : 3;
 +
 +        for (i = 0; i < svl; i++) {
 +            qemu_fprintf(f, "ZA[%0*d]=", svl_lg10, i);
 +            for (j = zcr_len; j >= 0; --j) {
 +                qemu_fprintf(f, "%016" PRIx64 ":%016" PRIx64 "%c",
 +                             env->zarray[i].d[2 * j + 1],
 +                             env->zarray[i].d[2 * j],
 +                             j ? ':' : '\n');
 +            }
 +        }
 +    }
  }
-+static TCGv_i32 read_fp_hreg(DisasContext *s, int reg)
+ #else
 +{
 +    TCGv_i32 v = tcg_temp_new_i32();
 +
 +    tcg_gen_ld16u_i32(v, cpu_env, fp_reg_offset(s, reg, MO_16));
 +    return v;
 +}
 +
  /* Clear the bits above an N-bit vector, for N = (is_q ? 128 : 64).
   * If SVE is not enabled, then there are only 128 bits in the vector.
   */
@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
  static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
  {
      TCGv_ptr fpst = NULL;
 -    TCGv_i32 tcg_op = tcg_temp_new_i32();
 +    TCGv_i32 tcg_op = read_fp_hreg(s, rn);
      TCGv_i32 tcg_res = tcg_temp_new_i32();
 -    read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
 -
      switch (opcode) {
      case 0x0: /* FMOV */
          tcg_gen_mov_i32(tcg_res, tcg_op);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
          tcg_temp_free_i64(tcg_op2);
          tcg_temp_free_i64(tcg_res);
      } else {
 -        TCGv_i32 tcg_op1 = tcg_temp_new_i32();
 -        TCGv_i32 tcg_op2 = tcg_temp_new_i32();
 +        TCGv_i32 tcg_op1 = read_fp_hreg(s, rn);
 +        TCGv_i32 tcg_op2 = read_fp_hreg(s, rm);
          TCGv_i64 tcg_res = tcg_temp_new_i64();
 -        read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
 -        read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
 -
          gen_helper_neon_mull_s16(tcg_res, tcg_op1, tcg_op2);
          gen_helper_neon_addl_saturate_s32(tcg_res, cpu_env, tcg_res, tcg_res);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_same_fp16(DisasContext *s,
      fpst = get_fpstatus_ptr(true);
 -    tcg_op1 = tcg_temp_new_i32();
 -    tcg_op2 = tcg_temp_new_i32();
 +    tcg_op1 = read_fp_hreg(s, rn);
 +    tcg_op2 = read_fp_hreg(s, rm);
      tcg_res = tcg_temp_new_i32();
 -    read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
 -    read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
 -
      switch (fpopcode) {
      case 0x03: /* FMULX */
          gen_helper_advsimd_mulxh(tcg_res, tcg_op1, tcg_op2, fpst);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn)
      }
      if (is_scalar) {
 -        TCGv_i32 tcg_op = tcg_temp_new_i32();
 +        TCGv_i32 tcg_op = read_fp_hreg(s, rn);
          TCGv_i32 tcg_res = tcg_temp_new_i32();
 -        read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
 -
          switch (fpop) {
          case 0x1a: /* FCVTNS */
          case 0x1b: /* FCVTMS */
 --
-.17.0
+.34.1
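
The shape of the ZA dump follows directly from the arithmetic in the hunk above: one line per row, and as many 128-bit chunks per line as there are chunks in the streaming vector length. A rough sketch of that arithmetic, where ZaDumpGeometry and za_dump_geometry are hypothetical names introduced here and the upper bound on zcr_len is an assumption:

```c
#include <assert.h>

typedef struct {
    int rows;         /* number of ZA rows printed (svl in the patch) */
    int label_width;  /* digits in the row index (svl_lg10 in the patch) */
    int line_chars;   /* characters per line, including the newline */
} ZaDumpGeometry;

static ZaDumpGeometry za_dump_geometry(int zcr_len)
{
    ZaDumpGeometry g;
    int vq = zcr_len + 1;                 /* 128-bit chunks per row */

    /* Assumption: zcr_len is in the range 0..15. */
    assert(zcr_len >= 0 && zcr_len <= 15);
    g.rows = vq * 16;
    g.label_width = g.rows < 100 ? 2 : 3;
    /* "ZA[" + index + "]=" followed by 34 characters per chunk */
    g.line_chars = 3 + g.label_width + 2 + 34 * vq;
    return g;
}
```

For zcr_len = 3 this yields 64 rows of 143 characters each; for zcr_len = 15 it yields 256 rows of 552 characters each.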

-[Qemu-devel] [PULL 10/16] target/arm: Implement FP data-processing (3 source) for fp16
+[PULL 05/11] target/arm: Fix SME full tile indexing
 From: Richard Henderson <richard.henderson@linaro.org>
-We missed all of the scalar fp16 fma operations.
+For the outer product set of insns, which take an entire matrix
 tile as output, the argument is not a combined tile+column.
 Therefore using get_tile_rowcol was incorrect, as we extracted
 the tile number from itself.
 The test case relies only on assembler support for SME, since
 no release of GCC recognizes -march=armv9-a+sme yet.
 Cc: qemu-stable@nongnu.org
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1620
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
+Message-id: 20230622151201.1578522-5-richard.henderson@linaro.org
-Message-id: 20180512003217.9105-8-richard.henderson@linaro.org
+Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 48 ++++++++++++++++++++++++++++++++++++++
+ target/arm/tcg/translate-sme.c    | 24 ++++++---
-file changed, 48 insertions(+)
+ tests/tcg/aarch64/sme-outprod1.c  | 83 +++++++++++++++++++++++++++++++
+ tests/tcg/aarch64/Makefile.target | 10 ++--
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+files changed, 108 insertions(+), 9 deletions(-)
  create mode 100644 tests/tcg/aarch64/sme-outprod1.c
 diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/tcg/translate-sme.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/tcg/translate-sme.c
-@@ -XXX,XX +XXX,XX @@ static void handle_fp_3src_double(DisasContext *s, bool o0, bool o1,
+@@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs,
-     tcg_temp_free_i64(tcg_res);
+     return addr;
  }
-+/* Floating-point data-processing (3 source) - half precision */
++/*
-+static void handle_fp_3src_half(DisasContext *s, bool o0, bool o1,
++ * Resolve tile.size[0] to a host pointer.
-+                                int rd, int rn, int rm, int ra)
++ * Used by e.g. outer product insns where we require the entire tile.
 + */
 +static TCGv_ptr get_tile(DisasContext *s, int esz, int tile)
 +{
-+    TCGv_i32 tcg_op1, tcg_op2, tcg_op3;
++    TCGv_ptr addr = tcg_temp_new_ptr();
-+    TCGv_i32 tcg_res = tcg_temp_new_i32();
++    int offset;
-+    TCGv_ptr fpst = get_fpstatus_ptr(true);
++
-+
++    offset = tile * sizeof(ARMVectorReg) + offsetof(CPUARMState, zarray);
-+    tcg_op1 = read_fp_hreg(s, rn);
++
-+    tcg_op2 = read_fp_hreg(s, rm);
++    tcg_gen_addi_ptr(addr, cpu_env, offset);
-+    tcg_op3 = read_fp_hreg(s, ra);
++    return addr;
-+
++}
-+    /* These are fused multiply-add, and must be done as one
++
-+     * floating point operation with no rounding between the
+ static bool trans_ZERO(DisasContext *s, arg_ZERO *a)
-+     * multiplication and addition steps.
+ {
-+     * NB that doing the negations here as separate steps is
+     if (!dc_isar_feature(aa64_sme, s)) {
-+     * correct : an input NaN should come out with its sign bit
+@@ -XXX,XX +XXX,XX @@ static bool do_adda(DisasContext *s, arg_adda *a, MemOp esz,
-+     * flipped if it is a negated-input.
+         return true;
-+     */
+     }
-+    if (o1 == true) {
-+        tcg_gen_xori_i32(tcg_op3, tcg_op3, 0x8000);
+-    /* Sum XZR+zad to find ZAd. */
 -    za = get_tile_rowcol(s, esz, 31, a->zad, false);
 +    za = get_tile(s, esz, a->zad);
      zn = vec_full_reg_ptr(s, a->zn);
      pn = pred_full_reg_ptr(s, a->pn);
      pm = pred_full_reg_ptr(s, a->pm);
@@ -XXX,XX +XXX,XX @@ static bool do_outprod(DisasContext *s, arg_op *a, MemOp esz,
          return true;
      }
 -    /* Sum XZR+zad to find ZAd. */
 -    za = get_tile_rowcol(s, esz, 31, a->zad, false);
 +    za = get_tile(s, esz, a->zad);
      zn = vec_full_reg_ptr(s, a->zn);
      zm = vec_full_reg_ptr(s, a->zm);
      pn = pred_full_reg_ptr(s, a->pn);
@@ -XXX,XX +XXX,XX @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
          return true;
      }
 -    /* Sum XZR+zad to find ZAd. */
 -    za = get_tile_rowcol(s, esz, 31, a->zad, false);
 +    za = get_tile(s, esz, a->zad);
      zn = vec_full_reg_ptr(s, a->zn);
      zm = vec_full_reg_ptr(s, a->zm);
      pn = pred_full_reg_ptr(s, a->pn);
 diff --git a/tests/tcg/aarch64/sme-outprod1.c b/tests/tcg/aarch64/sme-outprod1.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/tests/tcg/aarch64/sme-outprod1.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * SME outer product, 1 x 1.
 + * SPDX-License-Identifier: GPL-2.0-or-later
 + */
 +
 +#include <stdio.h>
 +
 +extern void foo(float *dst);
 +
 +asm(
 +"    .arch_extension sme\n"
 +"    .type foo, @function\n"
 +"foo:\n"
 +"    stp x29, x30, [sp, -80]!\n"
 +"    mov x29, sp\n"
 +"    stp d8, d9, [sp, 16]\n"
 +"    stp d10, d11, [sp, 32]\n"
 +"    stp d12, d13, [sp, 48]\n"
 +"    stp d14, d15, [sp, 64]\n"
 +"    smstart\n"
 +"    ptrue p0.s, vl4\n"
 +"    fmov z0.s, #1.0\n"
 +/*
 + * An outer product of a vector of 1.0 by itself should be a matrix of 1.0.
 + * Note that we are using tile 1 here (za1.s) rather than tile 0.
 + */
 +"    zero {za}\n"
 +"    fmopa za1.s, p0/m, p0/m, z0.s, z0.s\n"
 +/*
 + * Read the first 4x4 sub-matrix of elements from tile 1:
 + * Note that za1h should be interchangeable here.
 + */
 +"    mov w12, #0\n"
 +"    mova z0.s, p0/m, za1v.s[w12, #0]\n"
 +"    mova z1.s, p0/m, za1v.s[w12, #1]\n"
 +"    mova z2.s, p0/m, za1v.s[w12, #2]\n"
 +"    mova z3.s, p0/m, za1v.s[w12, #3]\n"
 +/*
 + * And store them to the input pointer (dst in the C code):
 + */
 +"    st1w {z0.s}, p0, [x0]\n"
 +"    add x0, x0, #16\n"
 +"    st1w {z1.s}, p0, [x0]\n"
 +"    add x0, x0, #16\n"
 +"    st1w {z2.s}, p0, [x0]\n"
 +"    add x0, x0, #16\n"
 +"    st1w {z3.s}, p0, [x0]\n"
 +"    smstop\n"
 +"    ldp d8, d9, [sp, 16]\n"
 +"    ldp d10, d11, [sp, 32]\n"
 +"    ldp d12, d13, [sp, 48]\n"
 +"    ldp d14, d15, [sp, 64]\n"
 +"    ldp x29, x30, [sp], 80\n"
 +"    ret\n"
 +"    .size foo, . - foo"
 +);
 +
 +int main()
 +{
 +    float dst[16];
 +    int i, j;
 +
 +    foo(dst);
 +
 +    for (i = 0; i < 16; i++) {
 +        if (dst[i] != 1.0f) {
 +            break;
 +        }
 +    }
 +
-+    if (o0 != o1) {
++    if (i == 16) {
-+        tcg_gen_xori_i32(tcg_op1, tcg_op1, 0x8000);
++        return 0; /* success */
 +    }
 +
-+    gen_helper_advsimd_muladdh(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
++    /* failure */
-+
++    for (i = 0; i < 4; ++i) {
-+    write_fp_sreg(s, rd, tcg_res);
++        for (j = 0; j < 4; ++j) {
-+
++            printf("%f ", (double)dst[i * 4 + j]);
-+    tcg_temp_free_ptr(fpst);
++        }
-+    tcg_temp_free_i32(tcg_op1);
++        printf("\n");
-+    tcg_temp_free_i32(tcg_op2);
++    }
-+    tcg_temp_free_i32(tcg_op3);
++    return 1;
 +    tcg_temp_free_i32(tcg_res);
 +}
-+
+diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
- /* Floating point data-processing (3 source)
+index XXXXXXX..XXXXXXX 100644
-  *   31  30  29 28       24 23  22  21  20  16  15  14  10 9    5 4    0
+--- a/tests/tcg/aarch64/Makefile.target
-  * +---+---+---+-----------+------+----+------+----+------+------+------+
++++ b/tests/tcg/aarch64/Makefile.target
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_3src(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ config-cc.mak: Makefile
-         }
+         $(call cc-option,-march=armv8.5-a,              CROSS_CC_HAS_ARMV8_5); \
-         handle_fp_3src_double(s, o0, o1, rd, rn, rm, ra);
+         $(call cc-option,-mbranch-protection=standard,  CROSS_CC_HAS_ARMV8_BTI); \
-         break;
+         $(call cc-option,-march=armv8.5-a+memtag,       CROSS_CC_HAS_ARMV8_MTE); \
-+    case 3:
+-        $(call cc-option,-march=armv9-a+sme,            CROSS_CC_HAS_ARMV9_SME)) 3> config-cc.mak
-+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
++        $(call cc-option,-Wa$(COMMA)-march=armv9-a+sme, CROSS_AS_HAS_ARMV9_SME)) 3> config-cc.mak
-+            unallocated_encoding(s);
+ -include config-cc.mak
-+            return;
-+        }
+ ifneq ($(CROSS_CC_HAS_ARMV8_2),)
-+        if (!fp_access_check(s)) {
+@@ -XXX,XX +XXX,XX @@ AARCH64_TESTS += mte-1 mte-2 mte-3 mte-4 mte-5 mte-6 mte-7
-+            return;
+ mte-%: CFLAGS += -march=armv8.5-a+memtag
-+        }
+ endif
-+        handle_fp_3src_half(s, o0, o1, rd, rn, rm, ra);
-+        break;
++ifneq ($(CROSS_AS_HAS_ARMV9_SME),)
-     default:
++AARCH64_TESTS += sme-outprod1
-         unallocated_encoding(s);
++endif
-     }
++
  ifneq ($(CROSS_CC_HAS_SVE),)
  # System Registers Tests
  AARCH64_TESTS += sysregs
 -ifneq ($(CROSS_CC_HAS_ARMV9_SME),)
 -sysregs: CFLAGS+=-march=armv9-a+sme -DHAS_ARMV9_SME
 +ifneq ($(CROSS_AS_HAS_ARMV9_SME),)
 +sysregs: CFLAGS+=-Wa,-march=armv9-a+sme -DHAS_ARMV9_SME
  else
  sysregs: CFLAGS+=-march=armv8.1-a+sve
  endif
 --
-.17.0
+.34.1
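
The offset computed by the new get_tile() can be illustrated outside QEMU. This is a sketch under stated assumptions: MockVectorReg and MockCPUState are hypothetical stand-ins for ARMVectorReg and CPUARMState, and their sizes and field layout are invented for the example. Only the formula itself is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's types; sizes are assumptions. */
typedef struct {
    uint64_t d[32];              /* one 256-byte row */
} MockVectorReg;

typedef struct {
    uint64_t other_state[8];     /* placeholder for unrelated fields */
    MockVectorReg zarray[256];   /* ZA storage, one entry per row */
} MockCPUState;

/* Byte offset of tile.size[0], mirroring the formula in get_tile(). */
static size_t tile_base_offset(int tile)
{
    assert(tile >= 0 && tile < 256);
    return (size_t)tile * sizeof(MockVectorReg)
           + offsetof(MockCPUState, zarray);
}
```

With this layout, tile 0 starts at byte 64 and tile 1 at byte 320, one row further on. The new test case uses za1.s, so it only passes if the tile number is carried through to the offset rather than collapsing to tile 0.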

-[Qemu-devel] [PULL 03/16] target/arm: Fix fp_status_f16 tininess before rounding
+[PULL 06/11] target/arm: Handle IC IVAU to improve compatibility with JITs
-In commit d81ce0ef2c4f105 we added an extra float_status field
+From: John Högberg <john.hogberg@ericsson.com>
 fp_status_fp16 for Arm, but forgot to initialize it correctly
 by setting it to float_tininess_before_rounding. This currently
 will only cause problems for the new V8_FP16 feature, since the
 float-to-float conversion code doesn't use it yet. The effect
 would be that we failed to set the Underflow IEEE exception flag
 in all the cases where we should.
-Add the missing initialization.
+Unlike architectures with precise self-modifying code semantics
 (e.g. x86) ARM processors do not maintain coherency for instruction
 execution and memory, requiring an instruction synchronization
 barrier on every core that will execute the new code, and on many
 models also the explicit use of cache management instructions.
-Fixes: d81ce0ef2c4f105
+While this is required to make JITs work on actual hardware, QEMU
-Cc: qemu-stable@nongnu.org
+has gotten away with not handling this since it does not emulate
 caches, and unconditionally invalidates code whenever the softmmu
 or the user-mode page protection logic detects that code has been
 modified.
 Unfortunately the latter does not work in the face of dual-mapped
 code (a common W^X workaround), where one page is executable and
 the other is writable: user-mode has no way to connect one with the
 other as that is only known to the kernel and the emulated
 application.
 This commit works around the issue by telling software that
 instruction cache invalidation is required by clearing the
 CTR_EL0.DIC flag (regardless of whether the emulated processor
 needs it), and then invalidating code in IC IVAU instructions.
 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1034
 Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: John Högberg <john.hogberg@ericsson.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Message-id: 168778890374.24232.3402138851538068785-1@git.sr.ht
 [PMM: removed unnecessary AArch64 feature check; moved
  "clear CTR_EL0.DIC" code up a bit so it's not in the middle
  of the vfp/neon related tests]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Message-id: 20180512004311.9299-16-richard.henderson@linaro.org
 ---
- target/arm/cpu.c | 2 ++
+ target/arm/cpu.c    | 11 +++++++++++
-file changed, 2 insertions(+)
+ target/arm/helper.c | 47 ++++++++++++++++++++++++++++++++++++++++++---
 files changed, 55 insertions(+), 3 deletions(-)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.c
 +++ b/target/arm/cpu.c
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
+@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
-                               &env->vfp.fp_status);
+         return;
-     set_float_detect_tininess(float_tininess_before_rounding,
+     }
-                               &env->vfp.standard_fp_status);
-+    set_float_detect_tininess(float_tininess_before_rounding,
++#ifdef CONFIG_USER_ONLY
-+                              &env->vfp.fp_status_f16);
++    /*
- #ifndef CONFIG_USER_ONLY
++     * User mode relies on IC IVAU instructions to catch modification of
-     if (kvm_enabled()) {
++     * dual-mapped code.
-         kvm_arm_reset_vcpu(cpu);
++     *
 +     * Clear CTR_EL0.DIC to ensure that software that honors these flags uses
 +     * IC IVAU even if the emulated processor does not normally require it.
 +     */
 +    cpu->ctr = FIELD_DP64(cpu->ctr, CTR_EL0, DIC, 0);
 +#endif
 +
      if (arm_feature(env, ARM_FEATURE_AARCH64) &&
          cpu->has_vfp != cpu->has_neon) {
          /*
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void mdcr_el2_write(CPUARMState *env, const ARMCPRegInfo *ri,
      }
  }
 +#ifdef CONFIG_USER_ONLY
 +/*
 + * `IC IVAU` is handled to improve compatibility with JITs that dual-map their
 + * code to get around W^X restrictions, where one region is writable and the
 + * other is executable.
 + *
 + * Since the executable region is never written to we cannot detect code
 + * changes when running in user mode, and rely on the emulated JIT telling us
 + * that the code has changed by executing this instruction.
 + */
 +static void ic_ivau_write(CPUARMState *env, const ARMCPRegInfo *ri,
 +                          uint64_t value)
 +{
 +    uint64_t icache_line_mask, start_address, end_address;
 +    const ARMCPU *cpu;
 +
 +    cpu = env_archcpu(env);
 +
 +    icache_line_mask = (4 << extract32(cpu->ctr, 0, 4)) - 1;
 +    start_address = value & ~icache_line_mask;
 +    end_address = value | icache_line_mask;
 +
 +    mmap_lock();
 +
 +    tb_invalidate_phys_range(start_address, end_address);
 +
 +    mmap_unlock();
 +}
 +#endif
 +
  static const ARMCPRegInfo v8_cp_reginfo[] = {
      /*
       * Minimal set of EL0-visible registers. This will need to be expanded
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
      { .name = "CURRENTEL", .state = ARM_CP_STATE_AA64,
        .opc0 = 3, .opc1 = 0, .opc2 = 2, .crn = 4, .crm = 2,
        .access = PL1_R, .type = ARM_CP_CURRENTEL },
 -    /* Cache ops: all NOPs since we don't emulate caches */
 +    /*
 +     * Instruction cache ops. All of these except `IC IVAU` are NOPs because
 +     * we don't emulate caches.
 +     */
      { .name = "IC_IALLUIS", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
        .access = PL1_W, .type = ARM_CP_NOP,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
        .accessfn = access_tocu },
      { .name = "IC_IVAU", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 5, .opc2 = 1,
 -      .access = PL0_W, .type = ARM_CP_NOP,
 +      .access = PL0_W,
        .fgt = FGT_ICIVAU,
 -      .accessfn = access_tocu },
 +      .accessfn = access_tocu,
 +#ifdef CONFIG_USER_ONLY
 +      .type = ARM_CP_NO_RAW,
 +      .writefn = ic_ivau_write
 +#else
 +      .type = ARM_CP_NOP
 +#endif
 +    },
 +    /* Cache ops: all NOPs since we don't emulate caches */
      { .name = "DC_IVAC", .state = ARM_CP_STATE_AA64,
        .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
        .access = PL1_W, .accessfn = aa64_cacheop_poc_access,
 --
-.17.0
+.34.1
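
The address arithmetic in ic_ivau_write() can be tried in isolation. A minimal sketch, assuming only what the hunk above shows: the low four bits of CTR_EL0 (IminLine) hold log2 of the number of 4-byte words in the smallest instruction cache line. LineRange and icache_line_range are hypothetical names for illustration, and the call into the translation-block invalidation code is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t start;   /* first byte of the cache line */
    uint64_t end;     /* last byte of the cache line */
} LineRange;

/* Round addr out to the instruction cache line that contains it. */
static LineRange icache_line_range(uint64_t ctr, uint64_t addr)
{
    uint64_t mask = ((uint64_t)4 << (ctr & 0xf)) - 1;
    LineRange r = { addr & ~mask, addr | mask };

    /* The address always lies inside the range it was rounded to. */
    assert(r.start <= addr && addr <= r.end);
    return r;
}
```

For IminLine = 4 the line is 64 bytes, so address 0x1234 maps to the range 0x1200 to 0x123f. A guest that loops over a buffer in IminLine-sized steps, as the test in the next patch does, therefore covers every line it modified.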

-[Qemu-devel] [PULL 15/16] sdcard: Correct CRC16 offset in sd_function_switch()
+[PULL 07/11] tests/tcg/aarch64: Add testcases for IC IVAU and dual-mapped code
-From: Philippe Mathieu-Daudé <f4bug@amsat.org>
+From: John Högberg <john.hogberg@ericsson.com>
-Per the Physical Layer Simplified Spec. "4.3.10.4 Switch Function Status":
+https://gitlab.com/qemu-project/qemu/-/issues/1034
-  The block length is predefined to 512 bits
+Signed-off-by: John Högberg <john.hogberg@ericsson.com>
+Message-id: 168778890374.24232.3402138851538068785-2@git.sr.ht
 and "4.10.2 SD Status":
   The SD Status contains status bits that are related to the SD Memory Card
   proprietary features and may be used for future application-specific usage.
   The size of the SD Status is one data block of 512 bit. The content of this
   register is transmitted to the Host over the DAT bus along with a 16-bit CRC.
 Thus the 16-bit CRC goes at offset 64.
 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Message-id: 20180509060104.4458-3-f4bug@amsat.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+[PMM: fixed typo in comment]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/sd/sd.c | 2 +-
+ tests/tcg/aarch64/icivau.c        | 189 ++++++++++++++++++++++++++++++
-file changed, 1 insertion(+), 1 deletion(-)
+ tests/tcg/aarch64/Makefile.target |   3 +-
+files changed, 191 insertions(+), 1 deletion(-)
-diff --git a/hw/sd/sd.c b/hw/sd/sd.c
+ create mode 100644 tests/tcg/aarch64/icivau.c
 diff --git a/tests/tcg/aarch64/icivau.c b/tests/tcg/aarch64/icivau.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/tests/tcg/aarch64/icivau.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * Tests the IC IVAU-driven workaround for catching changes made to dual-mapped
 + * code that would otherwise go unnoticed in user mode.
 + *
 + * Copyright (c) 2023 Ericsson AB
 + * SPDX-License-Identifier: GPL-2.0-or-later
 + */
 +
 +#include <sys/mman.h>
 +#include <sys/stat.h>
 +#include <string.h>
 +#include <stdint.h>
 +#include <stdlib.h>
 +#include <unistd.h>
 +#include <fcntl.h>
 +
 +#define MAX_CODE_SIZE 128
 +
 +typedef int (SelfModTest)(uint32_t, uint32_t*);
 +typedef int (BasicTest)(int);
 +
 +static void mark_code_modified(const uint32_t *exec_data, size_t length)
 +{
 +    int dc_required, ic_required;
 +    unsigned long ctr_el0;
 +
 +    /*
 +     * Clear the data/instruction cache, as indicated by the CTR_EL0.{DIC,IDC}
 +     * flags.
 +     *
 +     * For completeness we might be tempted to assert that we should fail when
 +     * the whole code update sequence is omitted, but that would make the test
 +     * flaky as it can succeed by coincidence on actual hardware.
 +     */
 +    asm ("mrs %0, ctr_el0\n" : "=r"(ctr_el0));
 +
 +    /* CTR_EL0.IDC */
 +    dc_required = !((ctr_el0 >> 28) & 1);
 +
 +    /* CTR_EL0.DIC */
 +    ic_required = !((ctr_el0 >> 29) & 1);
 +
 +    if (dc_required) {
 +        size_t dcache_stride, i;
 +
 +        /*
 +         * Step according to the minimum cache line size, as cache maintenance
 +         * instructions operate on the cache line of the given address.
 +         *
 +         * We assume that exec_data is properly aligned.
 +         */
 +        dcache_stride = (4 << ((ctr_el0 >> 16) & 0xF));
 +
 +        for (i = 0; i < length; i += dcache_stride) {
 +            const char *dc_addr = &((const char *)exec_data)[i];
 +            asm volatile ("dc cvau, %x[dc_addr]\n"
 +                          : /* no outputs */
 +                          : [dc_addr] "r"(dc_addr)
 +                          : "memory");
 +        }
 +
 +        asm volatile ("dmb ish\n");
 +    }
 +
 +    if (ic_required) {
 +        size_t icache_stride, i;
 +
 +        icache_stride = (4 << (ctr_el0 & 0xF));
 +
 +        for (i = 0; i < length; i += icache_stride) {
 +            const char *ic_addr = &((const char *)exec_data)[i];
 +            asm volatile ("ic ivau, %x[ic_addr]\n"
 +                          : /* no outputs */
 +                          : [ic_addr] "r"(ic_addr)
 +                          : "memory");
 +        }
 +
 +        asm volatile ("dmb ish\n");
 +    }
 +
 +    asm volatile ("isb sy\n");
 +}
 +
 +static int basic_test(uint32_t *rw_data, const uint32_t *exec_data)
 +{
 +    /*
 +     * As user mode only misbehaved for dual-mapped code when previously
 +     * translated code had been changed, we'll start off with this basic test
 +     * function to ensure that there's already some translated code at
 +     * exec_data before the next test. This should cause the next test to fail
 +     * if `mark_code_modified` fails to invalidate the code.
 +     *
 +     * Note that the payload is in binary form instead of inline assembler
 +     * because we cannot use __attribute__((naked)) on this platform and the
 +     * workarounds are at least as ugly as this is.
 +     */
 +    static const uint32_t basic_payload[] = {
 +        0xD65F03C0 /* 0x00: RET */
 +    };
 +
 +    BasicTest *copied_ptr = (BasicTest *)exec_data;
 +
 +    memcpy(rw_data, basic_payload, sizeof(basic_payload));
 +    mark_code_modified(exec_data, sizeof(basic_payload));
 +
 +    return copied_ptr(1234) == 1234;
 +}
 +
 +static int self_modification_test(uint32_t *rw_data, const uint32_t *exec_data)
 +{
 +    /*
 +     * This test is self-modifying in an attempt to cover an edge case where
 +     * the IC IVAU instruction invalidates itself.
 +     *
 +     * Note that the IC IVAU instruction is 16 bytes into the function, in what
 +     * will be the same cache line as the modified instruction on machines with
 +     * a cache line size >= 16 bytes.
 +     */
 +    static const uint32_t self_mod_payload[] = {
 +        /* Overwrite the placeholder instruction with the new one. */
 +        0xB9001C20, /* 0x00: STR w0, [x1, 0x1C] */
 +
 +        /* Get the executable address of the modified instruction. */
 +        0x100000A8, /* 0x04: ADR x8, <0x1C> */
 +
 +        /* Mark the modified instruction as updated. */
 +        0xD50B7B28, /* 0x08: DC CVAU x8 */
 +        0xD5033BBF, /* 0x0C: DMB ISH */
 +        0xD50B7528, /* 0x10: IC IVAU x8 */
 +        0xD5033BBF, /* 0x14: DMB ISH */
 +        0xD5033FDF, /* 0x18: ISB */
 +
 +        /* Placeholder instruction, overwritten above. */
 +        0x52800000, /* 0x1C: MOV w0, 0 */
 +
 +        0xD65F03C0  /* 0x20: RET */
 +    };
 +
 +    SelfModTest *copied_ptr = (SelfModTest *)exec_data;
 +    int i;
 +
 +    memcpy(rw_data, self_mod_payload, sizeof(self_mod_payload));
 +    mark_code_modified(exec_data, sizeof(self_mod_payload));
 +
 +    for (i = 1; i < 10; i++) {
 +        /* Replace the placeholder instruction with `MOV w0, i` */
 +        uint32_t new_instr = 0x52800000 | (i << 5);
 +
 +        if (copied_ptr(new_instr, rw_data) != i) {
 +            return 0;
 +        }
 +    }
 +
 +    return 1;
 +}
 +
 +int main(int argc, char **argv)
 +{
 +    const char *shm_name = "qemu-test-tcg-aarch64-icivau";
 +    int fd;
 +
 +    fd = shm_open(shm_name, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
 +
 +    if (fd < 0) {
 +        return EXIT_FAILURE;
 +    }
 +
 +    /* Unlink early to avoid leaving garbage in case the test crashes. */
 +    shm_unlink(shm_name);
 +
 +    if (ftruncate(fd, MAX_CODE_SIZE) == 0) {
 +        const uint32_t *exec_data;
 +        uint32_t *rw_data;
 +
 +        rw_data = mmap(0, MAX_CODE_SIZE, PROT_READ | PROT_WRITE,
 +                       MAP_SHARED, fd, 0);
 +        exec_data = mmap(0, MAX_CODE_SIZE, PROT_READ | PROT_EXEC,
 +                         MAP_SHARED, fd, 0);
 +
 +        if (rw_data != MAP_FAILED && exec_data != MAP_FAILED) {
 +            if (basic_test(rw_data, exec_data) &&
 +                self_modification_test(rw_data, exec_data)) {
 +                return EXIT_SUCCESS;
 +            }
 +        }
 +    }
 +
 +    return EXIT_FAILURE;
 +}
 diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
 index XXXXXXX..XXXXXXX 100644
---- a/hw/sd/sd.c
+--- a/tests/tcg/aarch64/Makefile.target
-+++ b/hw/sd/sd.c
++++ b/tests/tcg/aarch64/Makefile.target
-@@ -XXX,XX +XXX,XX @@ static void sd_function_switch(SDState *sd, uint32_t arg)
+@@ -XXX,XX +XXX,XX @@ AARCH64_SRC=$(SRC_PATH)/tests/tcg/aarch64
-         sd->data[14 + (i >> 1)] = new_func << ((i * 4) & 4);
+ VPATH         += $(AARCH64_SRC)
-     }
-     memset(&sd->data[17], 0, 47);
+ # Base architecture tests
--    stw_be_p(sd->data + 65, sd_crc16(sd->data, 64));
+-AARCH64_TESTS=fcvt pcalign-a64
-+    stw_be_p(sd->data + 64, sd_crc16(sd->data, 64));
++AARCH64_TESTS=fcvt pcalign-a64 icivau
- }
+ fcvt: LDFLAGS+=-lm
- static inline bool sd_wp_addr(SDState *sd, uint64_t addr)
++icivau: LDFLAGS+=-lrt
  run-fcvt: fcvt
      $(call run-test,$<,$(QEMU) $<, "$< on $(TARGET_NAME)")
 --
-.17.0
+.34.1
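
The stride arithmetic in mark_code_modified() can be checked on any host by decoding a CTR_EL0 value in plain C. This is only an illustrative sketch: `CacheInfo` and `decode_ctr_el0()` are hypothetical names that are not part of the patch, and the register value is passed in as an argument rather than read with `mrs`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type; not part of the patch. */
typedef struct CacheInfo {
    bool dc_required;      /* CTR_EL0.IDC (bit 28) clear: DC CVAU needed */
    bool ic_required;      /* CTR_EL0.DIC (bit 29) clear: IC IVAU needed */
    size_t dcache_stride;  /* bytes, derived from DminLine (bits 19:16) */
    size_t icache_stride;  /* bytes, derived from IminLine (bits 3:0) */
} CacheInfo;

static CacheInfo decode_ctr_el0(uint64_t ctr_el0)
{
    CacheInfo info;

    info.dc_required = !((ctr_el0 >> 28) & 1);
    info.ic_required = !((ctr_el0 >> 29) & 1);
    /* Both fields hold log2 of the line size counted in 4-byte words. */
    info.dcache_stride = (size_t)4 << ((ctr_el0 >> 16) & 0xF);
    info.icache_stride = (size_t)4 << (ctr_el0 & 0xF);
    assert(info.dcache_stride >= 4 && info.icache_stride >= 4);
    return info;
}
```

With DminLine and IminLine both equal to 4, each stride is 64 bytes, so the 128-byte test buffer needs two maintenance operations per cache.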

-[Qemu-devel] [PULL 04/16] target/arm: Implement FMOV (general) for fp16
+[PULL 08/11] tests/qtest: xlnx-canfd-test: Fix code coverity issues
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Vikram Garhwal <vikram.garhwal@amd.com>
-Adding the fp16 moves to/from general registers.
+The following changes fix the Coverity issues:
 . Change read_data to fix CID 1512899: Out-of-bounds access (OVERRUN)
 . Fix match_rx_tx_data to fix CID 1512900: Logically dead code (DEADCODE)
 . Replace rand() in generate_random_data() with g_random_int()
-Cc: qemu-stable@nongnu.org
+Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230628202758.16398-1-vikram.garhwal@amd.com
 Tested-by: Alex Bennée <alex.bennee@linaro.org>
 Message-id: 20180512003217.9105-2-richard.henderson@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 21 +++++++++++++++++++++
+ tests/qtest/xlnx-canfd-test.c | 33 +++++++++++----------------------
-file changed, 21 insertions(+)
+file changed, 11 insertions(+), 22 deletions(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/tests/qtest/xlnx-canfd-test.c b/tests/qtest/xlnx-canfd-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/tests/qtest/xlnx-canfd-test.c
-+++ b/target/arm/translate-a64.c
++++ b/tests/qtest/xlnx-canfd-test.c
-@@ -XXX,XX +XXX,XX @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
+@@ -XXX,XX +XXX,XX @@ static void generate_random_data(uint32_t *buf_tx, bool is_canfd_frame)
-             tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_hi_offset(s, rd));
+     /* Generate random TX data for CANFD frame. */
-             clear_vec_high(s, true, rd);
+     if (is_canfd_frame) {
-             break;
+         for (int i = 0; i < CANFD_FRAME_SIZE - 2; i++) {
-+        case 3:
+-            buf_tx[2 + i] = rand();
-+            /* 16 bit */
++            buf_tx[2 + i] = g_random_int();
 +            tmp = tcg_temp_new_i64();
 +            tcg_gen_ext16u_i64(tmp, tcg_rn);
 +            write_fp_dreg(s, rd, tmp);
 +            tcg_temp_free_i64(tmp);
 +            break;
 +        default:
 +            g_assert_not_reached();
          }
      } else {
-         TCGv_i64 tcg_rd = cpu_reg(s, rd);
+         /* Generate random TX data for CAN frame. */
-@@ -XXX,XX +XXX,XX @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
+         for (int i = 0; i < CAN_FRAME_SIZE - 2; i++) {
-             /* 64 bits from top half */
+-            buf_tx[2 + i] = rand();
-             tcg_gen_ld_i64(tcg_rd, cpu_env, fp_reg_hi_offset(s, rn));
++            buf_tx[2 + i] = g_random_int();
              break;
 +        case 3:
 +            /* 16 bit */
 +            tcg_gen_ld16u_i64(tcg_rd, cpu_env, fp_reg_offset(s, rn, MO_16));
 +            break;
 +        default:
 +            g_assert_not_reached();
          }
      }
  }
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
-         case 0xa: /* 64 bit */
+-static void read_data(QTestState *qts, uint64_t can_base_addr, uint32_t *buf_rx)
-         case 0xd: /* 64 bit to top half of quad */
++static void read_data(QTestState *qts, uint64_t can_base_addr, uint32_t *buf_rx,
-             break;
++                      uint32_t frame_size)
-+        case 0x6: /* 16-bit float, 32-bit int */
+ {
-+        case 0xe: /* 16-bit float, 64-bit int */
+     uint32_t int_status;
-+            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+     uint32_t fifo_status_reg_value;
-+                break;
+     /* At which RX FIFO the received data is stored. */
-+            }
+     uint8_t store_ind = 0;
-+            /* fallthru */
+-    bool is_canfd_frame = false;
-         default:
-             /* all other sf/type/rmode combinations are invalid */
+     /* Read the interrupt on CANFD rx. */
-             unallocated_encoding(s);
+     int_status = qtest_readl(qts, can_base_addr + R_ISR_OFFSET) & ISR_RXOK;
@@ -XXX,XX +XXX,XX @@ static void read_data(QTestState *qts, uint64_t can_base_addr, uint32_t *buf_rx)
      buf_rx[0] = qtest_readl(qts, can_base_addr + R_RX0_ID_OFFSET);
      buf_rx[1] = qtest_readl(qts, can_base_addr + R_RX0_DLC_OFFSET);
 -    is_canfd_frame = (buf_rx[1] >> DLC_FD_BIT_SHIFT) & 1;
 -
 -    if (is_canfd_frame) {
 -        for (int i = 0; i < CANFD_FRAME_SIZE - 2; i++) {
 -            buf_rx[i + 2] = qtest_readl(qts,
 -                                    can_base_addr + R_RX0_DATA1_OFFSET + 4 * i);
 -        }
 -    } else {
 -        buf_rx[2] = qtest_readl(qts, can_base_addr + R_RX0_DATA1_OFFSET);
 -        buf_rx[3] = qtest_readl(qts, can_base_addr + R_RX0_DATA2_OFFSET);
 +    for (int i = 0; i < frame_size - 2; i++) {
 +        buf_rx[i + 2] = qtest_readl(qts,
 +                                can_base_addr + R_RX0_DATA1_OFFSET + 4 * i);
      }
      /* Clear the RX interrupt. */
@@ -XXX,XX +XXX,XX @@ static void match_rx_tx_data(const uint32_t *buf_tx, const uint32_t *buf_rx,
              g_assert_cmpint((buf_rx[size] & DLC_FD_BIT_MASK), ==,
                              (buf_tx[size] & DLC_FD_BIT_MASK));
          } else {
 -            if (!is_canfd_frame && size == 4) {
 -                break;
 -            }
 -
              g_assert_cmpint(buf_rx[size], ==, buf_tx[size]);
          }
@@ -XXX,XX +XXX,XX @@ static void test_can_data_transfer(void)
      write_data(qts, CANFD0_BASE_ADDR, buf_tx, false);
      send_data(qts, CANFD0_BASE_ADDR);
 -    read_data(qts, CANFD1_BASE_ADDR, buf_rx);
 +    read_data(qts, CANFD1_BASE_ADDR, buf_rx, CAN_FRAME_SIZE);
      match_rx_tx_data(buf_tx, buf_rx, false);
      qtest_quit(qts);
@@ -XXX,XX +XXX,XX @@ static void test_canfd_data_transfer(void)
      write_data(qts, CANFD0_BASE_ADDR, buf_tx, true);
      send_data(qts, CANFD0_BASE_ADDR);
 -    read_data(qts, CANFD1_BASE_ADDR, buf_rx);
 +    read_data(qts, CANFD1_BASE_ADDR, buf_rx, CANFD_FRAME_SIZE);
      match_rx_tx_data(buf_tx, buf_rx, true);
      qtest_quit(qts);
@@ -XXX,XX +XXX,XX @@ static void test_can_loopback(void)
      write_data(qts, CANFD0_BASE_ADDR, buf_tx, true);
      send_data(qts, CANFD0_BASE_ADDR);
 -    read_data(qts, CANFD0_BASE_ADDR, buf_rx);
 +    read_data(qts, CANFD0_BASE_ADDR, buf_rx, CANFD_FRAME_SIZE);
      match_rx_tx_data(buf_tx, buf_rx, true);
      generate_random_data(buf_tx, true);
@@ -XXX,XX +XXX,XX @@ static void test_can_loopback(void)
      write_data(qts, CANFD1_BASE_ADDR, buf_tx, true);
      send_data(qts, CANFD1_BASE_ADDR);
 -    read_data(qts, CANFD1_BASE_ADDR, buf_rx);
 +    read_data(qts, CANFD1_BASE_ADDR, buf_rx, CANFD_FRAME_SIZE);
      match_rx_tx_data(buf_tx, buf_rx, true);
      qtest_quit(qts);
 --
-.17.0
+.34.1
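
The reworked read_data() lets the caller state how many words the receive buffer holds, so the loop bound no longer depends on a bit read back from the device. A minimal sketch of that pattern follows. Everything in it is hypothetical: `toy_readl()` stands in for a register read, and the `TOY_*` sizes are illustrative values rather than the constants used by the test.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CAN_FRAME_SIZE   4   /* hypothetical word counts */
#define TOY_CANFD_FRAME_SIZE 18

/* Hypothetical register read: returns a value derived from the offset. */
static uint32_t toy_readl(uint32_t offset)
{
    return 0xA0000000u | offset;
}

/* Fill buf_rx with exactly frame_size words: ID, DLC, then data words. */
static void toy_read_frame(uint32_t *buf_rx, uint32_t frame_size)
{
    assert(frame_size >= 2);
    buf_rx[0] = toy_readl(0x0);  /* ID word */
    buf_rx[1] = toy_readl(0x4);  /* DLC word */
    for (uint32_t i = 0; i < frame_size - 2; i++) {
        buf_rx[i + 2] = toy_readl(0x8 + 4 * i);
    }
}
```

Because the bound comes from the caller, a buffer sized for a classic CAN frame is never written past its end, whatever the device reports in the DLC word.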

-[Qemu-devel] [PULL 07/16] target/arm: Implement FCVT (scalar, fixed-point) for fp16
+Deleted patch
-From: Richard Henderson <richard.henderson@linaro.org>
-Cc: qemu-stable@nongnu.org
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Message-id: 20180512003217.9105-5-richard.henderson@linaro.org
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/translate-a64.c | 17 +++++++++++++++--
-file changed, 15 insertions(+), 2 deletions(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
-+++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_fixed_conv(DisasContext *s, uint32_t insn)
-     bool sf = extract32(insn, 31, 1);
-     bool itof;
--    if (sbit || (type > 1)
--        || (!sf && scale < 32)) {
-+    if (sbit || (!sf && scale < 32)) {
-+        unallocated_encoding(s);
-+        return;
-+    }
-+
-+    switch (type) {
-+    case 0: /* float32 */
-+    case 1: /* float64 */
-+        break;
-+    case 3: /* float16 */
-+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
-+            break;
-+        }
-+        /* fallthru */
-+    default:
-         unallocated_encoding(s);
-         return;
-     }
---
-.17.0

-[Qemu-devel] [PULL 09/16] target/arm: Implement FP data-processing (2 source) for fp16
+[PULL 09/11] target/arm: gdbstub: Guard M-profile code with CONFIG_TCG
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Fabiano Rosas <farosas@suse.de>
-We missed all of the scalar fp16 binary operations.
+This code is only relevant when TCG is present in the build. Building
 with --disable-tcg --enable-xen on an x86 host we get:
-Cc: qemu-stable@nongnu.org
+$ ../configure --target-list=x86_64-softmmu,aarch64-softmmu --disable-tcg --enable-xen
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+$ make -j$(nproc)
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+...
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
+libqemu-aarch64-softmmu.fa.p/target_arm_gdbstub.c.o: in function `m_sysreg_ptr':
-Message-id: 20180512003217.9105-7-richard.henderson@linaro.org
+ ../target/arm/gdbstub.c:358: undefined reference to `arm_v7m_get_sp_ptr'
  ../target/arm/gdbstub.c:361: undefined reference to `arm_v7m_get_sp_ptr'
 libqemu-aarch64-softmmu.fa.p/target_arm_gdbstub.c.o: in function `arm_gdb_get_m_systemreg':
 ../target/arm/gdbstub.c:405: undefined reference to `arm_v7m_mrs_control'
 Signed-off-by: Fabiano Rosas <farosas@suse.de>
 Message-id: 20230628164821.16771-1-farosas@suse.de
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 65 ++++++++++++++++++++++++++++++++++++++
+ target/arm/gdbstub.c | 4 ++++
-file changed, 65 insertions(+)
+file changed, 4 insertions(+)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/gdbstub.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/gdbstub.c
-@@ -XXX,XX +XXX,XX @@ static void handle_fp_2src_double(DisasContext *s, int opcode,
+@@ -XXX,XX +XXX,XX @@ static int arm_gen_dynamic_sysreg_xml(CPUState *cs, int base_reg)
-     tcg_temp_free_i64(tcg_res);
+     return cpu->dyn_sysreg_xml.num;
  }
-+/* Floating-point data-processing (2 source) - half precision */
++#ifdef CONFIG_TCG
-+static void handle_fp_2src_half(DisasContext *s, int opcode,
+ typedef enum {
-+                                int rd, int rn, int rm)
+     M_SYSREG_MSP,
-+{
+     M_SYSREG_PSP,
-+    TCGv_i32 tcg_op1;
+@@ -XXX,XX +XXX,XX @@ static int arm_gen_dynamic_m_secextreg_xml(CPUState *cs, int orig_base_reg)
-+    TCGv_i32 tcg_op2;
+     return cpu->dyn_m_secextreg_xml.num;
-+    TCGv_i32 tcg_res;
+ }
-+    TCGv_ptr fpst;
+ #endif
-+
++#endif /* CONFIG_TCG */
-+    tcg_res = tcg_temp_new_i32();
-+    fpst = get_fpstatus_ptr(true);
+ const char *arm_gdb_get_dynamic_xml(CPUState *cs, const char *xmlname)
-+    tcg_op1 = read_fp_hreg(s, rn);
+ {
-+    tcg_op2 = read_fp_hreg(s, rm);
+@@ -XXX,XX +XXX,XX @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
-+
+                              arm_gen_dynamic_sysreg_xml(cs, cs->gdb_num_regs),
-+    switch (opcode) {
+                              "system-registers.xml", 0);
-+    case 0x0: /* FMUL */
-+        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
++#ifdef CONFIG_TCG
-+        break;
+     if (arm_feature(env, ARM_FEATURE_M) && tcg_enabled()) {
-+    case 0x1: /* FDIV */
+         gdb_register_coprocessor(cs,
-+        gen_helper_advsimd_divh(tcg_res, tcg_op1, tcg_op2, fpst);
+             arm_gdb_get_m_systemreg, arm_gdb_set_m_systemreg,
-+        break;
+@@ -XXX,XX +XXX,XX @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
 +    case 0x2: /* FADD */
 +        gen_helper_advsimd_addh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x3: /* FSUB */
 +        gen_helper_advsimd_subh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x4: /* FMAX */
 +        gen_helper_advsimd_maxh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x5: /* FMIN */
 +        gen_helper_advsimd_minh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x6: /* FMAXNM */
 +        gen_helper_advsimd_maxnumh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x7: /* FMINNM */
 +        gen_helper_advsimd_minnumh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x8: /* FNMUL */
 +        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        tcg_gen_xori_i32(tcg_res, tcg_res, 0x8000);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +
 +    write_fp_sreg(s, rd, tcg_res);
 +
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tcg_op1);
 +    tcg_temp_free_i32(tcg_op2);
 +    tcg_temp_free_i32(tcg_res);
 +}
 +
  /* Floating point data-processing (2 source)
   *   31  30  29 28       24 23  22  21 20  16 15    12 11 10 9    5 4    0
   * +---+---+---+-----------+------+---+------+--------+-----+------+------+
@@ -XXX,XX +XXX,XX @@ static void disas_fp_2src(DisasContext *s, uint32_t insn)
          }
-         handle_fp_2src_double(s, opcode, rd, rn, rm);
+ #endif
          break;
 +    case 3:
 +        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        if (!fp_access_check(s)) {
 +            return;
 +        }
 +        handle_fp_2src_half(s, opcode, rd, rn, rm);
 +        break;
      default:
          unallocated_encoding(s);
      }
++#endif /* CONFIG_TCG */
+ }
 --
-.17.0
+.34.1

-[Qemu-devel] [PULL 12/16] target/arm: Implement FCSEL for fp16
+Deleted patch
-From: Alex Bennée <alex.bennee@linaro.org>
-These were missed out from the rest of the half-precision work.
-Cc: qemu-stable@nongnu.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180512003217.9105-10-richard.henderson@linaro.org
-[rth: Fix erroneous check vs type]
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/translate-a64.c | 31 +++++++++++++++++++++++++------
-file changed, 25 insertions(+), 6 deletions(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
-+++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
-     unsigned int mos, type, rm, cond, rn, rd;
-     TCGv_i64 t_true, t_false, t_zero;
-     DisasCompare64 c;
-+    TCGMemOp sz;
-     mos = extract32(insn, 29, 3);
--    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
-+    type = extract32(insn, 22, 2);
-     rm = extract32(insn, 16, 5);
-     cond = extract32(insn, 12, 4);
-     rn = extract32(insn, 5, 5);
-     rd = extract32(insn, 0, 5);
--    if (mos || type > 1) {
-+    if (mos) {
-+        unallocated_encoding(s);
-+        return;
-+    }
-+
-+    switch (type) {
-+    case 0:
-+        sz = MO_32;
-+        break;
-+    case 1:
-+        sz = MO_64;
-+        break;
-+    case 3:
-+        sz = MO_16;
-+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
-+            break;
-+        }
-+        /* fallthru */
-+    default:
-         unallocated_encoding(s);
-         return;
-     }
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
-         return;
-     }
--    /* Zero extend sreg inputs to 64 bits now.  */
-+    /* Zero extend sreg & hreg inputs to 64 bits now.  */
-     t_true = tcg_temp_new_i64();
-     t_false = tcg_temp_new_i64();
--    read_vec_element(s, t_true, rn, 0, type ? MO_64 : MO_32);
--    read_vec_element(s, t_false, rm, 0, type ? MO_64 : MO_32);
-+    read_vec_element(s, t_true, rn, 0, sz);
-+    read_vec_element(s, t_false, rm, 0, sz);
-     a64_test_cc(&c, cond);
-     t_zero = tcg_const_i64(0);
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
-     tcg_temp_free_i64(t_false);
-     a64_free_cc(&c);
--    /* Note that sregs write back zeros to the high bits,
-+    /* Note that sregs & hregs write back zeros to the high bits,
-        and we've already done the zero-extension.  */
-     write_fp_dreg(s, rd, t_true);
-     tcg_temp_free_i64(t_true);
---
-.17.0

-[Qemu-devel] [PULL 13/16] target/arm: Implement FMOV (immediate) for fp16
+Deleted patch
-From: Alex Bennée <alex.bennee@linaro.org>
-All the hard work is already done by vfp_expand_imm, we just need to
-make sure we pick up the correct size.
-Cc: qemu-stable@nongnu.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180512003217.9105-11-richard.henderson@linaro.org
-[rth: Merge unallocated_encoding check with TCGMemOp conversion.]
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
- target/arm/translate-a64.c | 20 +++++++++++++++++---
-file changed, 17 insertions(+), 3 deletions(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
-+++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_imm(DisasContext *s, uint32_t insn)
- {
-     int rd = extract32(insn, 0, 5);
-     int imm8 = extract32(insn, 13, 8);
--    int is_double = extract32(insn, 22, 2);
-+    int type = extract32(insn, 22, 2);
-     uint64_t imm;
-     TCGv_i64 tcg_res;
-+    TCGMemOp sz;
--    if (is_double > 1) {
-+    switch (type) {
-+    case 0:
-+        sz = MO_32;
-+        break;
-+    case 1:
-+        sz = MO_64;
-+        break;
-+    case 3:
-+        sz = MO_16;
-+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
-+            break;
-+        }
-+        /* fallthru */
-+    default:
-         unallocated_encoding(s);
-         return;
-     }
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_imm(DisasContext *s, uint32_t insn)
-         return;
-     }
--    imm = vfp_expand_imm(MO_32 + is_double, imm8);
-+    imm = vfp_expand_imm(sz, imm8);
-     tcg_res = tcg_const_i64(imm);
-     write_fp_dreg(s, rd, tcg_res);
---
-.17.0

-[Qemu-devel] [PULL 14/16] target/arm: Fix sqrt_f16 exception raising
+[PULL 10/11] hw: arm: allwinner-sramc: Set class_size
-From: Alex Bennée <alex.bennee@linaro.org>
+From: Akihiko Odaki <akihiko.odaki@daynix.com>
-We are meant to explicitly pass fpst, not cpu_env.
+AwSRAMCClass is larger than SysBusDeviceClass so the class size must be
 advertised accordingly.
-Cc: qemu-stable@nongnu.org
+Fixes: 05def917e1 ("hw: arm: allwinner-sramc: Add SRAM Controller support for R40")
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
+Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20230628110905.38125-1-akihiko.odaki@daynix.com
 Tested-by: Alex Bennée <alex.bennee@linaro.org>
 Message-id: 20180512003217.9105-12-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 3 ++-
+ hw/misc/allwinner-sramc.c | 1 +
-file changed, 2 insertions(+), 1 deletion(-)
+file changed, 1 insertion(+)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/hw/misc/allwinner-sramc.c b/hw/misc/allwinner-sramc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/hw/misc/allwinner-sramc.c
-+++ b/target/arm/translate-a64.c
++++ b/hw/misc/allwinner-sramc.c
-@@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
+@@ -XXX,XX +XXX,XX @@ static const TypeInfo allwinner_sramc_info = {
-         tcg_gen_xori_i32(tcg_res, tcg_op, 0x8000);
+     .parent        = TYPE_SYS_BUS_DEVICE,
-         break;
+     .instance_init = allwinner_sramc_init,
-     case 0x3: /* FSQRT */
+     .instance_size = sizeof(AwSRAMCState),
--        gen_helper_sqrt_f16(tcg_res, tcg_op, cpu_env);
++    .class_size    = sizeof(AwSRAMCClass),
-+        fpst = get_fpstatus_ptr(true);
+     .class_init    = allwinner_sramc_class_init,
-+        gen_helper_sqrt_f16(tcg_res, tcg_op, fpst);
+ };
-         break;
      case 0x8: /* FRINTN */
      case 0x9: /* FRINTP */
 --
-.17.0
+.34.1
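
The failure mode described in the allwinner-sramc commit message can be illustrated without QOM itself. The sketch below assumes a simplified type registry: `ToyTypeInfo`, `ParentClass`, `ChildClass` and the `toy_*` functions are hypothetical stand-ins, not QEMU APIs. It shows only that a class structure allocated with the parent's size is too small when the subclass adds fields and no larger size is advertised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ParentClass {
    void (*reset)(void);
} ParentClass;

typedef struct ChildClass {
    ParentClass parent_class;
    int extra_field;             /* makes the subclass larger */
} ChildClass;

/* Hypothetical registry entry; 0 means "inherit the parent's size". */
typedef struct ToyTypeInfo {
    size_t class_size;
} ToyTypeInfo;

static size_t toy_effective_class_size(const ToyTypeInfo *ti,
                                       size_t parent_size)
{
    assert(ti != NULL);
    return ti->class_size ? ti->class_size : parent_size;
}

static void *toy_class_alloc(const ToyTypeInfo *ti, size_t parent_size)
{
    return calloc(1, toy_effective_class_size(ti, parent_size));
}
```

With `class_size` left at zero the allocation covers only `ParentClass`, so any write to `extra_field` would land outside the allocated block. Setting it to `sizeof(ChildClass)` removes the mismatch.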

-[Qemu-devel] [PULL 16/16] tcg: Optionally log FPU state in TCG -d cpu logging
+[PULL 11/11] target/xtensa: Assert that interrupt level is within bounds
-Usually the logging of the CPU state produced by -d cpu is sufficient
+In handle_interrupt() we use level as an index into the interrupt_vector[]
-to diagnose problems, but sometimes you want to see the state of
+array. This is safe because we have checked it against env->config->nlevel,
-the floating point registers as well. We don't want to enable that
+but Coverity can't see that (and it is only true because each CPU config
-by default as it adds a lot of extra data to the log; instead,
+sets its XCHAL_NUM_INTLEVELS to something less than MAX_NLEVELS), so it
-allow it to be optionally enabled via -d fpu.
+complains about a possible array overrun (CID 1507131).
 Add an assert() which will make Coverity happy and catch the unlikely
 case of a mis-set XCHAL_NUM_INTLEVELS in future.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Acked-by: Max Filippov <jcmvbkbc@gmail.com>
-Message-id: 20180510130024.31678-1-peter.maydell@linaro.org
+Message-id: 20230623154135.1930261-1-peter.maydell@linaro.org
 ---
- include/qemu/log.h   | 1 +
+ target/xtensa/exc_helper.c | 3 +++
- accel/tcg/cpu-exec.c | 9 ++++++---
+file changed, 3 insertions(+)
  util/log.c           | 2 ++
 files changed, 9 insertions(+), 3 deletions(-)
-diff --git a/include/qemu/log.h b/include/qemu/log.h
+diff --git a/target/xtensa/exc_helper.c b/target/xtensa/exc_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/qemu/log.h
+--- a/target/xtensa/exc_helper.c
-+++ b/include/qemu/log.h
++++ b/target/xtensa/exc_helper.c
-@@ -XXX,XX +XXX,XX @@ static inline bool qemu_log_separate(void)
+@@ -XXX,XX +XXX,XX @@ static void handle_interrupt(CPUXtensaState *env)
- #define CPU_LOG_PAGE       (1 << 14)
+         CPUState *cs = env_cpu(env);
- /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
- #define CPU_LOG_TB_OP_IND  (1 << 16)
+         if (level > 1) {
-+#define CPU_LOG_TB_FPU     (1 << 17)
++            /* env->config->nlevel check should have ensured this */
++            assert(level < ARRAY_SIZE(env->config->interrupt_vector));
- /* Lock output for a series of related logs.  Since this is not needed
++
-  * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
+             env->sregs[EPC1 + level - 1] = env->pc;
-diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
+             env->sregs[EPS2 + level - 2] = env->sregs[PS];
-index XXXXXXX..XXXXXXX 100644
+             env->sregs[PS] =
 --- a/accel/tcg/cpu-exec.c
 +++ b/accel/tcg/cpu-exec.c
@@ -XXX,XX +XXX,XX @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
      if (qemu_loglevel_mask(CPU_LOG_TB_CPU)
          && qemu_log_in_addr_range(itb->pc)) {
          qemu_log_lock();
 +        int flags = 0;
 +        if (qemu_loglevel_mask(CPU_LOG_TB_FPU)) {
 +            flags |= CPU_DUMP_FPU;
 +        }
  #if defined(TARGET_I386)
 -        log_cpu_state(cpu, CPU_DUMP_CCOP);
 -#else
 -        log_cpu_state(cpu, 0);
 +        flags |= CPU_DUMP_CCOP;
  #endif
 +        log_cpu_state(cpu, flags);
          qemu_log_unlock();
      }
  #endif /* DEBUG_DISAS */
 diff --git a/util/log.c b/util/log.c
 index XXXXXXX..XXXXXXX 100644
 --- a/util/log.c
 +++ b/util/log.c
@@ -XXX,XX +XXX,XX @@ const QEMULogItem qemu_log_items[] = {
        "show trace before each executed TB (lots of logs)" },
      { CPU_LOG_TB_CPU, "cpu",
        "show CPU registers before entering a TB (lots of logs)" },
 +    { CPU_LOG_TB_FPU, "fpu",
 +      "include FPU registers in the 'cpu' logging" },
      { CPU_LOG_MMU, "mmu",
        "log MMU-related activities" },
      { CPU_LOG_PCALL, "pcall",
 --
-.17.0
+.34.1

The following changes since commit ad1b4ec39caa5b3f17cbd8160283a03a3dcfe2ae:

Merge remote-tracking branch 'remotes/kraxel/tags/input-20180515-pull-request' into staging (2018-05-15 12:50:06 +0100)

are available in the Git repository at:

git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20180515

for you to fetch changes up to ae7651804748c6b479d5ae09aeac4edb9c44f76e:

tcg: Optionally log FPU state in TCG -d cpu logging (2018-05-15 14:58:44 +0100)

----------------------------------------------------------------
target-arm queue:
 * Fix coverity nit in int_to_float code
 * Don't set Invalid for float-to-int(MAXINT)
 * Fix fp_status_f16 tininess before rounding
 * Add various missing insns from the v8.2-FP16 extension
 * Fix sqrt_f16 exception raising
 * sdcard: Correct CRC16 offset in sd_function_switch()
 * tcg: Optionally log FPU state in TCG -d cpu logging

----------------------------------------------------------------
Alex Bennée (5):
      fpu/softfloat: int_to_float ensure r fully initialised
      target/arm: Implement FCMP for fp16
      target/arm: Implement FCSEL for fp16
      target/arm: Implement FMOV (immediate) for fp16
      target/arm: Fix sqrt_f16 exception raising

Peter Maydell (3):
      fpu/softfloat: Don't set Invalid for float-to-int(MAXINT)
      target/arm: Fix fp_status_f16 tininess before rounding
      tcg: Optionally log FPU state in TCG -d cpu logging

Philippe Mathieu-Daudé (1):
      sdcard: Correct CRC16 offset in sd_function_switch()

Richard Henderson (7):
      target/arm: Implement FMOV (general) for fp16
      target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
      target/arm: Implement FCVT (scalar, integer) for fp16
      target/arm: Implement FCVT (scalar, fixed-point) for fp16
      target/arm: Introduce and use read_fp_hreg
      target/arm: Implement FP data-processing (2 source) for fp16
      target/arm: Implement FP data-processing (3 source) for fp16

In float-to-integer conversion, if the floating point input
converts exactly to the largest or smallest integer that
fits into the result type, this is not an overflow.
In this situation we were producing the correct result value,
but were incorrectly setting the Invalid flag.
For example for Arm A64, "FCVTAS w0, d0" on an input of
0x41dfffffffc00000 should produce 0x7fffffff and set no flags.

Fix the boundary case to take the right half of the if()
statements.

This fixes a regression from 2.11 introduced by the softfloat
refactoring.
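
As an illustration only, the boundary rule can be sketched outside of
softfloat. The helper below is hypothetical (sat_to_int32 and its
sign/magnitude interface are assumptions for this sketch, not QEMU
code); it shows that a magnitude exactly equal to the limit must take
the in-range branch and leave the Invalid indication clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not QEMU code: saturate a sign/magnitude pair
 * to int32_t. *invalid is set only when the value is out of range. */
static int32_t sat_to_int32(bool sign, uint64_t r, bool *invalid)
{
    const uint64_t max = INT32_MAX;                    /* 0x7fffffff */
    const uint64_t min_mag = (uint64_t)INT32_MAX + 1;  /* 0x80000000 */

    *invalid = false;
    if (sign) {
        /* "<=": a magnitude of exactly 2^31 is INT32_MIN, not an overflow */
        if (r <= min_mag) {
            return (int32_t)(-(int64_t)r);
        }
        *invalid = true;
        return INT32_MIN;
    }
    /* "<=": a magnitude of exactly 2^31 - 1 is INT32_MAX, not an overflow */
    if (r <= max) {
        return (int32_t)r;
    }
    *invalid = true;
    return INT32_MAX;
}
```

With "<" in place of "<=", both exact-limit inputs would still return
the right value (the saturated result happens to be the same), but
would wrongly report Invalid, which is the behaviour described above.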

Cc: qemu-stable@nongnu.org
Fixes: ab52f973a50
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180510140141.12120-1-peter.maydell@linaro.org
---
 fpu/softfloat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -XXX,XX +XXX,XX @@ static int64_t round_to_int_and_pack(FloatParts in, int rmode,
             r = UINT64_MAX;
         }
         if (p.sign) {
-            if (r < -(uint64_t) min) {
+            if (r <= -(uint64_t) min) {
                 return -r;
             } else {
                 s->float_exception_flags = orig_flags | float_flag_invalid;
                 return min;
             }
         } else {
-            if (r < max) {
+            if (r <= max) {
                 return r;
             } else {
                 s->float_exception_flags = orig_flags | float_flag_invalid;
-- 
2.17.0

In commit d81ce0ef2c4f105 we added an extra float_status field
fp_status_f16 for Arm, but forgot to initialize it correctly
by setting it to float_tininess_before_rounding. This currently
will only cause problems for the new V8_FP16 feature, since the
float-to-float conversion code doesn't use it yet. The effect
would be that we failed to set the Underflow IEEE exception flag
in some of the cases where we should.

Add the missing initialization.

Fixes: d81ce0ef2c4f105
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180512004311.9299-16-richard.henderson@linaro.org
---
 target/arm/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
                               &env->vfp.fp_status);
     set_float_detect_tininess(float_tininess_before_rounding,
                               &env->vfp.standard_fp_status);
+    set_float_detect_tininess(float_tininess_before_rounding,
+                              &env->vfp.fp_status_f16);
 #ifndef CONFIG_USER_ONLY
     if (kvm_enabled()) {
         kvm_arm_reset_vcpu(cpu);
-- 
2.17.0

From: Richard Henderson <richard.henderson@linaro.org>

Add the fp16 moves to/from general registers.
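
A rough model of the intended data movement, as a sketch only: the
VecReg type and both function names below are hypothetical and stand
in for the TCG operations in the patch. Moving into the FP register
keeps bits [15:0] and zeroes the rest of the vector; moving out
zero-extends the low 16 bits.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register as two halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} VecReg;

/* General register -> Hd: keep bits [15:0], zero everything above. */
static void fmov_h_from_gpr(VecReg *vd, uint64_t xn)
{
    vd->lo = xn & 0xffff;
    vd->hi = 0;
}

/* Hn -> general register: zero-extend the low 16 bits. */
static uint64_t fmov_gpr_from_h(const VecReg *vn)
{
    return vn->lo & 0xffff;
}
```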

Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
             tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_hi_offset(s, rd));
             clear_vec_high(s, true, rd);
             break;
+        case 3:
+            /* 16 bit */
+            tmp = tcg_temp_new_i64();
+            tcg_gen_ext16u_i64(tmp, tcg_rn);
+            write_fp_dreg(s, rd, tmp);
+            tcg_temp_free_i64(tmp);
+            break;
+        default:
+            g_assert_not_reached();
         }
     } else {
         TCGv_i64 tcg_rd = cpu_reg(s, rd);
@@ -XXX,XX +XXX,XX @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
             /* 64 bits from top half */
             tcg_gen_ld_i64(tcg_rd, cpu_env, fp_reg_hi_offset(s, rn));
             break;
+        case 3:
+            /* 16 bit */
+            tcg_gen_ld16u_i64(tcg_rd, cpu_env, fp_reg_offset(s, rn, MO_16));
+            break;
+        default:
+            g_assert_not_reached();
         }
     }
 }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
         case 0xa: /* 64 bit */
         case 0xd: /* 64 bit to top half of quad */
             break;
+        case 0x6: /* 16-bit float, 32-bit int */
+        case 0xe: /* 16-bit float, 64-bit int */
+            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+                break;
+            }
+            /* fallthru */
         default:
             /* all other sf/type/rmode combinations are invalid */
             unallocated_encoding(s);
-- 
2.17.0

From: Richard Henderson <richard.henderson@linaro.org>

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  6 +++
 target/arm/helper.c        | 38 ++++++++++++++-
 target/arm/translate-a64.c | 96 +++++++++++++++++++++++++++++++-------
 3 files changed, 122 insertions(+), 18 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_touhd_round_to_zero, i64, f64, i32, ptr)
 DEF_HELPER_3(vfp_tould_round_to_zero, i64, f64, i32, ptr)
 DEF_HELPER_3(vfp_touhh, i32, f16, i32, ptr)
 DEF_HELPER_3(vfp_toshh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_toulh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_toslh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_touqh, i64, f16, i32, ptr)
+DEF_HELPER_3(vfp_tosqh, i64, f16, i32, ptr)
 DEF_HELPER_3(vfp_toshs, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosls, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosqs, i64, f32, i32, ptr)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_ultod, f64, i64, i32, ptr)
 DEF_HELPER_3(vfp_uqtod, f64, i64, i32, ptr)
 DEF_HELPER_3(vfp_sltoh, f16, i32, i32, ptr)
 DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
+DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
+DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
 
 DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
 DEF_HELPER_FLAGS_2(set_neon_rmode, TCG_CALL_NO_RWG, i32, i32, env)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
 #undef VFP_CONV_FIX_A64
 
 /* Conversion to/from f16 can overflow to infinity before/after scaling.
- * Therefore we convert to f64 (which does not round), scale,
- * and then convert f64 to f16 (which may round).
+ * Therefore we convert to f64, scale, and then convert f64 to f16; or
+ * vice versa for conversion to integer.
+ *
+ * For 16- and 32-bit integers, the conversion to f64 never rounds.
+ * For 64-bit integers, any integer that would cause rounding will also
+ * overflow to f16 infinity, so there is no double rounding problem.
  */
 
 static float16 do_postscale_fp16(float64 f, int shift, float_status *fpst)
@@ -XXX,XX +XXX,XX @@ float16 HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
     return do_postscale_fp16(uint32_to_float64(x, fpst), shift, fpst);
 }
 
+float16 HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
+{
+    return do_postscale_fp16(int64_to_float64(x, fpst), shift, fpst);
+}
+
+float16 HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
+{
+    return do_postscale_fp16(uint64_to_float64(x, fpst), shift, fpst);
+}
+
 static float64 do_prescale_fp16(float16 f, int shift, float_status *fpst)
 {
     if (unlikely(float16_is_any_nan(f))) {
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_touhh)(float16 x, uint32_t shift, void *fpst)
     return float64_to_uint16(do_prescale_fp16(x, shift, fpst), fpst);
 }
 
+uint32_t HELPER(vfp_toslh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_int32(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint32_t HELPER(vfp_toulh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_uint32(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint64_t HELPER(vfp_tosqh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_int64(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint64_t HELPER(vfp_touqh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_uint64(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
 /* Set the current fp rounding mode and return the old one.
  * The argument is a softfloat float_round_ value.
  */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                            bool itof, int rmode, int scale, int sf, int type)
 {
     bool is_signed = !(opcode & 1);
-    bool is_double = type;
     TCGv_ptr tcg_fpstatus;
-    TCGv_i32 tcg_shift;
+    TCGv_i32 tcg_shift, tcg_single;
+    TCGv_i64 tcg_double;
 
-    tcg_fpstatus = get_fpstatus_ptr(false);
+    tcg_fpstatus = get_fpstatus_ptr(type == 3);
 
     tcg_shift = tcg_const_i32(64 - scale);
 
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             tcg_int = tcg_extend;
         }
 
-        if (is_double) {
-            TCGv_i64 tcg_double = tcg_temp_new_i64();
+        switch (type) {
+        case 1: /* float64 */
+            tcg_double = tcg_temp_new_i64();
             if (is_signed) {
                 gen_helper_vfp_sqtod(tcg_double, tcg_int,
                                      tcg_shift, tcg_fpstatus);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             }
             write_fp_dreg(s, rd, tcg_double);
             tcg_temp_free_i64(tcg_double);
-        } else {
-            TCGv_i32 tcg_single = tcg_temp_new_i32();
+            break;
+
+        case 0: /* float32 */
+            tcg_single = tcg_temp_new_i32();
             if (is_signed) {
                 gen_helper_vfp_sqtos(tcg_single, tcg_int,
                                      tcg_shift, tcg_fpstatus);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             }
             write_fp_sreg(s, rd, tcg_single);
             tcg_temp_free_i32(tcg_single);
+            break;
+
+        case 3: /* float16 */
+            tcg_single = tcg_temp_new_i32();
+            if (is_signed) {
+                gen_helper_vfp_sqtoh(tcg_single, tcg_int,
+                                     tcg_shift, tcg_fpstatus);
+            } else {
+                gen_helper_vfp_uqtoh(tcg_single, tcg_int,
+                                     tcg_shift, tcg_fpstatus);
+            }
+            write_fp_sreg(s, rd, tcg_single);
+            tcg_temp_free_i32(tcg_single);
+            break;
+
+        default:
+            g_assert_not_reached();
         }
     } else {
         TCGv_i64 tcg_int = cpu_reg(s, rd);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
 
         gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
 
-        if (is_double) {
-            TCGv_i64 tcg_double = read_fp_dreg(s, rn);
+        switch (type) {
+        case 1: /* float64 */
+            tcg_double = read_fp_dreg(s, rn);
             if (is_signed) {
                 if (!sf) {
                     gen_helper_vfp_tosld(tcg_int, tcg_double,
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                                          tcg_shift, tcg_fpstatus);
                 }
             }
+            if (!sf) {
+                tcg_gen_ext32u_i64(tcg_int, tcg_int);
+            }
             tcg_temp_free_i64(tcg_double);
-        } else {
-            TCGv_i32 tcg_single = read_fp_sreg(s, rn);
+            break;
+
+        case 0: /* float32 */
+            tcg_single = read_fp_sreg(s, rn);
             if (sf) {
                 if (is_signed) {
                     gen_helper_vfp_tosqs(tcg_int, tcg_single,
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                 tcg_temp_free_i32(tcg_dest);
             }
             tcg_temp_free_i32(tcg_single);
+            break;
+
+        case 3: /* float16 */
+            tcg_single = read_fp_sreg(s, rn);
+            if (sf) {
+                if (is_signed) {
+                    gen_helper_vfp_tosqh(tcg_int, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                } else {
+                    gen_helper_vfp_touqh(tcg_int, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                }
+            } else {
+                TCGv_i32 tcg_dest = tcg_temp_new_i32();
+                if (is_signed) {
+                    gen_helper_vfp_toslh(tcg_dest, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                } else {
+                    gen_helper_vfp_toulh(tcg_dest, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                }
+                tcg_gen_extu_i32_i64(tcg_int, tcg_dest);
+                tcg_temp_free_i32(tcg_dest);
+            }
+            tcg_temp_free_i32(tcg_single);
+            break;
+
+        default:
+            g_assert_not_reached();
         }
 
         gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
         tcg_temp_free_i32(tcg_rmode);
-
-        if (!sf) {
-            tcg_gen_ext32u_i64(tcg_int, tcg_int);
-        }
     }
 
     tcg_temp_free_ptr(tcg_fpstatus);
@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
         /* actual FP conversions */
         bool itof = extract32(opcode, 1, 1);
 
-        if (type > 1 || (rmode != 0 && opcode > 1)) {
+        if (rmode != 0 && opcode > 1) {
+            unallocated_encoding(s);
+            return;
+        }
+        switch (type) {
+        case 0: /* float32 */
+        case 1: /* float64 */
+            break;
+        case 3: /* float16 */
+            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+                break;
+            }
+            /* fallthru */
+        default:
             unallocated_encoding(s);
             return;
         }
-- 
2.17.0

From: Richard Henderson <richard.henderson@linaro.org>

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_fp_fixed_conv(DisasContext *s, uint32_t insn)
     bool sf = extract32(insn, 31, 1);
     bool itof;
 
-    if (sbit || (type > 1)
-        || (!sf && scale < 32)) {
+    if (sbit || (!sf && scale < 32)) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0: /* float32 */
+    case 1: /* float64 */
+        break;
+    case 3: /* float16 */
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
-- 
2.17.0

From: Richard Henderson <richard.henderson@linaro.org>

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 read_fp_sreg(DisasContext *s, int reg)
     return v;
 }
 
+static TCGv_i32 read_fp_hreg(DisasContext *s, int reg)
+{
+    TCGv_i32 v = tcg_temp_new_i32();
+
+    tcg_gen_ld16u_i32(v, cpu_env, fp_reg_offset(s, reg, MO_16));
+    return v;
+}
+
 /* Clear the bits above an N-bit vector, for N = (is_q ? 128 : 64).
  * If SVE is not enabled, then there are only 128 bits in the vector.
  */
@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
 static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
 {
     TCGv_ptr fpst = NULL;
-    TCGv_i32 tcg_op = tcg_temp_new_i32();
+    TCGv_i32 tcg_op = read_fp_hreg(s, rn);
     TCGv_i32 tcg_res = tcg_temp_new_i32();
 
-    read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
-
     switch (opcode) {
     case 0x0: /* FMOV */
         tcg_gen_mov_i32(tcg_res, tcg_op);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
         tcg_temp_free_i64(tcg_op2);
         tcg_temp_free_i64(tcg_res);
     } else {
-        TCGv_i32 tcg_op1 = tcg_temp_new_i32();
-        TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+        TCGv_i32 tcg_op1 = read_fp_hreg(s, rn);
+        TCGv_i32 tcg_op2 = read_fp_hreg(s, rm);
         TCGv_i64 tcg_res = tcg_temp_new_i64();
 
-        read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
-        read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
-
         gen_helper_neon_mull_s16(tcg_res, tcg_op1, tcg_op2);
         gen_helper_neon_addl_saturate_s32(tcg_res, cpu_env, tcg_res, tcg_res);
 
@@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_same_fp16(DisasContext *s,
 
     fpst = get_fpstatus_ptr(true);
 
-    tcg_op1 = tcg_temp_new_i32();
-    tcg_op2 = tcg_temp_new_i32();
+    tcg_op1 = read_fp_hreg(s, rn);
+    tcg_op2 = read_fp_hreg(s, rm);
     tcg_res = tcg_temp_new_i32();
 
-    read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
-    read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
-
     switch (fpopcode) {
     case 0x03: /* FMULX */
         gen_helper_advsimd_mulxh(tcg_res, tcg_op1, tcg_op2, fpst);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn)
     }
 
     if (is_scalar) {
-        TCGv_i32 tcg_op = tcg_temp_new_i32();
+        TCGv_i32 tcg_op = read_fp_hreg(s, rn);
         TCGv_i32 tcg_res = tcg_temp_new_i32();
 
-        read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
-
         switch (fpop) {
         case 0x1a: /* FCVTNS */
         case 0x1b: /* FCVTMS */
-- 
2.17.0

From: Richard Henderson <richard.henderson@linaro.org>

We missed all of the scalar fp16 binary operations.

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 65 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fp_2src_double(DisasContext *s, int opcode,
     tcg_temp_free_i64(tcg_res);
 }
 
+/* Floating-point data-processing (2 source) - half precision */
+static void handle_fp_2src_half(DisasContext *s, int opcode,
+                                int rd, int rn, int rm)
+{
+    TCGv_i32 tcg_op1;
+    TCGv_i32 tcg_op2;
+    TCGv_i32 tcg_res;
+    TCGv_ptr fpst;
+
+    tcg_res = tcg_temp_new_i32();
+    fpst = get_fpstatus_ptr(true);
+    tcg_op1 = read_fp_hreg(s, rn);
+    tcg_op2 = read_fp_hreg(s, rm);
+
+    switch (opcode) {
+    case 0x0: /* FMUL */
+        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x1: /* FDIV */
+        gen_helper_advsimd_divh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x2: /* FADD */
+        gen_helper_advsimd_addh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x3: /* FSUB */
+        gen_helper_advsimd_subh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x4: /* FMAX */
+        gen_helper_advsimd_maxh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x5: /* FMIN */
+        gen_helper_advsimd_minh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x6: /* FMAXNM */
+        gen_helper_advsimd_maxnumh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x7: /* FMINNM */
+        gen_helper_advsimd_minnumh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x8: /* FNMUL */
+        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
+        tcg_gen_xori_i32(tcg_res, tcg_res, 0x8000);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    write_fp_sreg(s, rd, tcg_res);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tcg_op1);
+    tcg_temp_free_i32(tcg_op2);
+    tcg_temp_free_i32(tcg_res);
+}
+
 /* Floating point data-processing (2 source)
  *   31  30  29 28       24 23  22  21 20  16 15    12 11 10 9    5 4    0
  * +---+---+---+-----------+------+---+------+--------+-----+------+------+
@@ -XXX,XX +XXX,XX @@ static void disas_fp_2src(DisasContext *s, uint32_t insn)
         }
         handle_fp_2src_double(s, opcode, rd, rn, rm);
         break;
+    case 3:
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            unallocated_encoding(s);
+            return;
+        }
+        if (!fp_access_check(s)) {
+            return;
+        }
+        handle_fp_2src_half(s, opcode, rd, rn, rm);
+        break;
     default:
         unallocated_encoding(s);
     }
-- 
2.17.0

From: Richard Henderson <richard.henderson@linaro.org>

We missed all of the scalar fp16 fma operations.
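
For reference, a sketch of how the o0/o1 bits select the operand
negations before the single fused multiply-add n * m + a. The struct
and function names are hypothetical; only the sign-bit flips mirror
the patch. Working on raw bit patterns means a NaN input also has its
sign bit flipped.

```c
#include <stdbool.h>
#include <stdint.h>

#define F16_SIGN 0x8000u   /* bit 15: sign bit of a half-precision value */

/* Hypothetical container for the three raw fp16 operands. */
typedef struct {
    uint16_t n;   /* first multiplicand (Rn) */
    uint16_t m;   /* second multiplicand (Rm) */
    uint16_t a;   /* addend (Ra) */
} Fma16Operands;

/* o1=0,o0=0: FMADD    n*m + a   (no negation)
 * o1=0,o0=1: FMSUB   -n*m + a   (negate n)
 * o1=1,o0=0: FNMADD  -n*m - a   (negate n and a)
 * o1=1,o0=1: FNMSUB   n*m - a   (negate a)
 */
static Fma16Operands fma16_prepare(bool o0, bool o1,
                                   uint16_t n, uint16_t m, uint16_t a)
{
    Fma16Operands ops = { n, m, a };

    if (o1) {
        ops.a ^= F16_SIGN;
    }
    if (o0 != o1) {
        ops.n ^= F16_SIGN;
    }
    return ops;
}
```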

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 48 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fp_3src_double(DisasContext *s, bool o0, bool o1,
     tcg_temp_free_i64(tcg_res);
 }
 
+/* Floating-point data-processing (3 source) - half precision */
+static void handle_fp_3src_half(DisasContext *s, bool o0, bool o1,
+                                int rd, int rn, int rm, int ra)
+{
+    TCGv_i32 tcg_op1, tcg_op2, tcg_op3;
+    TCGv_i32 tcg_res = tcg_temp_new_i32();
+    TCGv_ptr fpst = get_fpstatus_ptr(true);
+
+    tcg_op1 = read_fp_hreg(s, rn);
+    tcg_op2 = read_fp_hreg(s, rm);
+    tcg_op3 = read_fp_hreg(s, ra);
+
+    /* These are fused multiply-add, and must be done as one
+     * floating point operation with no rounding between the
+     * multiplication and addition steps.
+     * NB that doing the negations here as separate steps is
+     * correct : an input NaN should come out with its sign bit
+     * flipped if it is a negated-input.
+     */
+    if (o1 == true) {
+        tcg_gen_xori_i32(tcg_op3, tcg_op3, 0x8000);
+    }
+
+    if (o0 != o1) {
+        tcg_gen_xori_i32(tcg_op1, tcg_op1, 0x8000);
+    }
+
+    gen_helper_advsimd_muladdh(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
+
+    write_fp_sreg(s, rd, tcg_res);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tcg_op1);
+    tcg_temp_free_i32(tcg_op2);
+    tcg_temp_free_i32(tcg_op3);
+    tcg_temp_free_i32(tcg_res);
+}
+
 /* Floating point data-processing (3 source)
  *   31  30  29 28       24 23  22  21  20  16  15  14  10 9    5 4    0
  * +---+---+---+-----------+------+----+------+----+------+------+------+
@@ -XXX,XX +XXX,XX @@ static void disas_fp_3src(DisasContext *s, uint32_t insn)
         }
         handle_fp_3src_double(s, o0, o1, rd, rn, rm, ra);
         break;
+    case 3:
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            unallocated_encoding(s);
+            return;
+        }
+        if (!fp_access_check(s)) {
+            return;
+        }
+        handle_fp_3src_half(s, o0, o1, rd, rn, rm, ra);
+        break;
     default:
         unallocated_encoding(s);
     }
-- 
2.17.0

From: Alex Bennée <alex.bennee@linaro.org>

These were missed out from the rest of the half-precision work.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180512003217.9105-9-richard.henderson@linaro.org
[rth: Diagnose lack of FP16 before fp_access_check]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-a64.h    |  2 +
 target/arm/helper-a64.c    | 10 +++++
 target/arm/translate-a64.c | 88 ++++++++++++++++++++++++++++++--------
 3 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -XXX,XX +XXX,XX @@
 DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
+DEF_HELPER_3(vfp_cmph_a64, i64, f16, f16, ptr)
+DEF_HELPER_3(vfp_cmpeh_a64, i64, f16, f16, ptr)
 DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpd_a64, i64, f64, f64, ptr)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -XXX,XX +XXX,XX @@ static inline uint32_t float_rel_to_flags(int res)
     return flags;
 }
 
+uint64_t HELPER(vfp_cmph_a64)(float16 x, float16 y, void *fp_status)
+{
+    return float_rel_to_flags(float16_compare_quiet(x, y, fp_status));
+}
+
+uint64_t HELPER(vfp_cmpeh_a64)(float16 x, float16 y, void *fp_status)
+{
+    return float_rel_to_flags(float16_compare(x, y, fp_status));
+}
+
 uint64_t HELPER(vfp_cmps_a64)(float32 x, float32 y, void *fp_status)
 {
     return float_rel_to_flags(float32_compare_quiet(x, y, fp_status));
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn)
     }
 }
 
-static void handle_fp_compare(DisasContext *s, bool is_double,
+static void handle_fp_compare(DisasContext *s, int size,
                               unsigned int rn, unsigned int rm,
                               bool cmp_with_zero, bool signal_all_nans)
 {
     TCGv_i64 tcg_flags = tcg_temp_new_i64();
-    TCGv_ptr fpst = get_fpstatus_ptr(false);
+    TCGv_ptr fpst = get_fpstatus_ptr(size == MO_16);
 
-    if (is_double) {
+    if (size == MO_64) {
         TCGv_i64 tcg_vn, tcg_vm;
 
         tcg_vn = read_fp_dreg(s, rn);
@@ -XXX,XX +XXX,XX @@ static void handle_fp_compare(DisasContext *s, bool is_double,
         tcg_temp_free_i64(tcg_vn);
         tcg_temp_free_i64(tcg_vm);
     } else {
-        TCGv_i32 tcg_vn, tcg_vm;
+        TCGv_i32 tcg_vn = tcg_temp_new_i32();
+        TCGv_i32 tcg_vm = tcg_temp_new_i32();
 
-        tcg_vn = read_fp_sreg(s, rn);
+        read_vec_element_i32(s, tcg_vn, rn, 0, size);
         if (cmp_with_zero) {
-            tcg_vm = tcg_const_i32(0);
+            tcg_gen_movi_i32(tcg_vm, 0);
         } else {
-            tcg_vm = read_fp_sreg(s, rm);
+            read_vec_element_i32(s, tcg_vm, rm, 0, size);
         }
-        if (signal_all_nans) {
-            gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
-        } else {
-            gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+
+        switch (size) {
+        case MO_32:
+            if (signal_all_nans) {
+                gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            } else {
+                gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            }
+            break;
+        case MO_16:
+            if (signal_all_nans) {
+                gen_helper_vfp_cmpeh_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            } else {
+                gen_helper_vfp_cmph_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            }
+            break;
+        default:
+            g_assert_not_reached();
         }
+
         tcg_temp_free_i32(tcg_vn);
         tcg_temp_free_i32(tcg_vm);
     }
@@ -XXX,XX +XXX,XX @@ static void handle_fp_compare(DisasContext *s, bool is_double,
 static void disas_fp_compare(DisasContext *s, uint32_t insn)
 {
     unsigned int mos, type, rm, op, rn, opc, op2r;
+    int size;
 
     mos = extract32(insn, 29, 3);
-    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
+    type = extract32(insn, 22, 2);
     rm = extract32(insn, 16, 5);
     op = extract32(insn, 14, 2);
     rn = extract32(insn, 5, 5);
     opc = extract32(insn, 3, 2);
     op2r = extract32(insn, 0, 3);
 
-    if (mos || op || op2r || type > 1) {
+    if (mos || op || op2r) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0:
+        size = MO_32;
+        break;
+    case 1:
+        size = MO_64;
+        break;
+    case 3:
+        size = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_compare(DisasContext *s, uint32_t insn)
         return;
     }
 
-    handle_fp_compare(s, type, rn, rm, opc & 1, opc & 2);
+    handle_fp_compare(s, size, rn, rm, opc & 1, opc & 2);
 }
 
 /* Floating point conditional compare
@@ -XXX,XX +XXX,XX @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
     unsigned int mos, type, rm, cond, rn, op, nzcv;
     TCGv_i64 tcg_flags;
     TCGLabel *label_continue = NULL;
+    int size;
 
     mos = extract32(insn, 29, 3);
-    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
+    type = extract32(insn, 22, 2);
     rm = extract32(insn, 16, 5);
     cond = extract32(insn, 12, 4);
     rn = extract32(insn, 5, 5);
     op = extract32(insn, 4, 1);
     nzcv = extract32(insn, 0, 4);
 
-    if (mos || type > 1) {
+    if (mos) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0:
+        size = MO_32;
+        break;
+    case 1:
+        size = MO_64;
+        break;
+    case 3:
+        size = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
         gen_set_label(label_match);
     }
 
-    handle_fp_compare(s, type, rn, rm, false, op);
+    handle_fp_compare(s, size, rn, rm, false, op);
 
     if (cond < 0x0e) {
         gen_set_label(label_continue);
-- 
2.17.0

From: Alex Bennée <alex.bennee@linaro.org>

These were missed out from the rest of the half-precision work.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180512003217.9105-10-richard.henderson@linaro.org
[rth: Fix erroneous check vs type]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
     unsigned int mos, type, rm, cond, rn, rd;
     TCGv_i64 t_true, t_false, t_zero;
     DisasCompare64 c;
+    TCGMemOp sz;
 
     mos = extract32(insn, 29, 3);
-    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
+    type = extract32(insn, 22, 2);
     rm = extract32(insn, 16, 5);
     cond = extract32(insn, 12, 4);
     rn = extract32(insn, 5, 5);
     rd = extract32(insn, 0, 5);
 
-    if (mos || type > 1) {
+    if (mos) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0:
+        sz = MO_32;
+        break;
+    case 1:
+        sz = MO_64;
+        break;
+    case 3:
+        sz = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
         return;
     }
 
-    /* Zero extend sreg inputs to 64 bits now.  */
+    /* Zero extend sreg & hreg inputs to 64 bits now.  */
     t_true = tcg_temp_new_i64();
     t_false = tcg_temp_new_i64();
-    read_vec_element(s, t_true, rn, 0, type ? MO_64 : MO_32);
-    read_vec_element(s, t_false, rm, 0, type ? MO_64 : MO_32);
+    read_vec_element(s, t_true, rn, 0, sz);
+    read_vec_element(s, t_false, rm, 0, sz);
 
     a64_test_cc(&c, cond);
     t_zero = tcg_const_i64(0);
@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
     tcg_temp_free_i64(t_false);
     a64_free_cc(&c);
 
-    /* Note that sregs write back zeros to the high bits,
+    /* Note that sregs & hregs write back zeros to the high bits,
        and we've already done the zero-extension.  */
     write_fp_dreg(s, rd, t_true);
     tcg_temp_free_i64(t_true);
-- 
2.17.0

From: Alex Bennée <alex.bennee@linaro.org>

All the hard work is already done by vfp_expand_imm, we just need to
make sure we pick up the correct size.
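
The decode step this patch adds can be sketched in isolation. This is
a minimal illustration, not the QEMU code: fp_type_to_size() and the
FP_SIZE_* names are hypothetical stand-ins for the switch on 'type'
and the MO_16/MO_32/MO_64 values, and have_fp16 stands in for the
arm_dc_feature(s, ARM_FEATURE_V8_FP16) check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MO_16 / MO_32 / MO_64; -1 means unallocated. */
enum {
    FP_SIZE_INVALID = -1,
    FP_SIZE_16 = 1,
    FP_SIZE_32 = 2,
    FP_SIZE_64 = 3,
};

/*
 * Map the 2-bit 'type' field (insn bits [23:22]) to an operand size.
 * type 3 is half precision and is only valid when FP16 is implemented;
 * type 2 is always unallocated.
 */
static int fp_type_to_size(unsigned type, bool have_fp16)
{
    assert(type < 4); /* the field is only two bits wide */

    switch (type) {
    case 0:
        return FP_SIZE_32;
    case 1:
        return FP_SIZE_64;
    case 3:
        return have_fp16 ? FP_SIZE_16 : FP_SIZE_INVALID;
    default:
        return FP_SIZE_INVALID;
    }
}
```

A caller would treat FP_SIZE_INVALID the way the patch treats the
default case, i.e. as an unallocated encoding.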

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180512003217.9105-11-richard.henderson@linaro.org
[rth: Merge unallocated_encoding check with TCGMemOp conversion.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_fp_imm(DisasContext *s, uint32_t insn)
 {
     int rd = extract32(insn, 0, 5);
     int imm8 = extract32(insn, 13, 8);
-    int is_double = extract32(insn, 22, 2);
+    int type = extract32(insn, 22, 2);
     uint64_t imm;
     TCGv_i64 tcg_res;
+    TCGMemOp sz;
 
-    if (is_double > 1) {
+    switch (type) {
+    case 0:
+        sz = MO_32;
+        break;
+    case 1:
+        sz = MO_64;
+        break;
+    case 3:
+        sz = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_imm(DisasContext *s, uint32_t insn)
         return;
     }
 
-    imm = vfp_expand_imm(MO_32 + is_double, imm8);
+    imm = vfp_expand_imm(sz, imm8);
 
     tcg_res = tcg_const_i64(imm);
     write_fp_dreg(s, rd, tcg_res);
-- 
2.17.0

From: Alex Bennée <alex.bennee@linaro.org>

We are meant to explicitly pass fpst, not cpu_env.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
         tcg_gen_xori_i32(tcg_res, tcg_op, 0x8000);
         break;
     case 0x3: /* FSQRT */
-        gen_helper_sqrt_f16(tcg_res, tcg_op, cpu_env);
+        fpst = get_fpstatus_ptr(true);
+        gen_helper_sqrt_f16(tcg_res, tcg_op, fpst);
         break;
     case 0x8: /* FRINTN */
     case 0x9: /* FRINTP */
-- 
2.17.0

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Per the Physical Layer Simplified Spec. "4.3.10.4 Switch Function Status":

The block length is predefined to 512 bits

and "4.10.2 SD Status":

The SD Status contains status bits that are related to the SD Memory Card
  proprietary features and may be used for future application-specific usage.
  The size of the SD Status is one data block of 512 bit. The content of this
  register is transmitted to the Host over the DAT bus along with a 16-bit CRC.

Thus the 16-bit CRC goes at offset 64.
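
The offset arithmetic can be checked with a small standalone sketch.
The helper names below are hypothetical; store_be16() only mimics what
stw_be_p() does in the patch, and the CRC value is supplied by the
caller rather than computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 512 bits of status data is 64 bytes, so the CRC starts at byte 64. */
enum { STATUS_BITS = 512, STATUS_BYTES = STATUS_BITS / 8 };

/* Store a 16-bit value big-endian (most significant byte first). */
static void store_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/*
 * Write the CRC immediately after the status block and return the
 * byte offset that was used. The buffer must hold the 64 status
 * bytes plus the 2 CRC bytes.
 */
static size_t append_status_crc(uint8_t *buf, size_t buf_len, uint16_t crc)
{
    assert(buf_len >= (size_t)STATUS_BYTES + 2);
    store_be16(buf + STATUS_BYTES, crc);
    return STATUS_BYTES;
}
```

With the old offset of 65, byte 64 would have been left untouched and
the CRC would have ended one byte past the 66-byte window.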

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180509060104.4458-3-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/sd/sd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -XXX,XX +XXX,XX @@ static void sd_function_switch(SDState *sd, uint32_t arg)
         sd->data[14 + (i >> 1)] = new_func << ((i * 4) & 4);
     }
     memset(&sd->data[17], 0, 47);
-    stw_be_p(sd->data + 65, sd_crc16(sd->data, 64));
+    stw_be_p(sd->data + 64, sd_crc16(sd->data, 64));
 }
 
 static inline bool sd_wp_addr(SDState *sd, uint64_t addr)
-- 
2.17.0

Usually the logging of the CPU state produced by -d cpu is sufficient
to diagnose problems, but sometimes you want to see the state of
the floating point registers as well. We don't want to enable that
by default as it adds a lot of extra data to the log; instead,
allow it to be optionally enabled via -d fpu.
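
How the dump flags are composed can be sketched on its own. Only the
CPU_LOG_TB_FPU value is taken from this patch; the other three
constants below are placeholder values chosen for illustration, and
cpu_dump_flags() is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <stdbool.h>

#define CPU_LOG_TB_CPU  (1 << 8)   /* placeholder value for illustration */
#define CPU_LOG_TB_FPU  (1 << 17)  /* value added by this patch */
#define CPU_DUMP_FPU    0x1        /* placeholder value for illustration */
#define CPU_DUMP_CCOP   0x2        /* placeholder value for illustration */

/*
 * Compute the flags passed to the CPU state dump. This models the body
 * of the 'if' in cpu_tb_exec(), so it must only be called when 'cpu'
 * logging is already enabled.
 */
static int cpu_dump_flags(int loglevel, bool target_i386)
{
    int flags = 0;

    assert((loglevel & CPU_LOG_TB_CPU) != 0);

    if (loglevel & CPU_LOG_TB_FPU) {
        flags |= CPU_DUMP_FPU;   /* '-d fpu' was also requested */
    }
    if (target_i386) {
        flags |= CPU_DUMP_CCOP;  /* x86 always dumps the condition-code op */
    }
    return flags;
}
```

Note that 'fpu' on its own has no effect in this model: it only
modifies what 'cpu' logging prints.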

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180510130024.31678-1-peter.maydell@linaro.org
---
 include/qemu/log.h   | 1 +
 accel/tcg/cpu-exec.c | 9 ++++++---
 util/log.c           | 2 ++
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/qemu/log.h b/include/qemu/log.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -XXX,XX +XXX,XX @@ static inline bool qemu_log_separate(void)
 #define CPU_LOG_PAGE       (1 << 14)
 /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
 #define CPU_LOG_TB_OP_IND  (1 << 16)
+#define CPU_LOG_TB_FPU     (1 << 17)
 
 /* Lock output for a series of related logs.  Since this is not needed
  * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -XXX,XX +XXX,XX @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
     if (qemu_loglevel_mask(CPU_LOG_TB_CPU)
         && qemu_log_in_addr_range(itb->pc)) {
         qemu_log_lock();
+        int flags = 0;
+        if (qemu_loglevel_mask(CPU_LOG_TB_FPU)) {
+            flags |= CPU_DUMP_FPU;
+        }
 #if defined(TARGET_I386)
-        log_cpu_state(cpu, CPU_DUMP_CCOP);
-#else
-        log_cpu_state(cpu, 0);
+        flags |= CPU_DUMP_CCOP;
 #endif
+        log_cpu_state(cpu, flags);
         qemu_log_unlock();
     }
 #endif /* DEBUG_DISAS */
diff --git a/util/log.c b/util/log.c
index XXXXXXX..XXXXXXX 100644
--- a/util/log.c
+++ b/util/log.c
@@ -XXX,XX +XXX,XX @@ const QEMULogItem qemu_log_items[] = {
       "show trace before each executed TB (lots of logs)" },
     { CPU_LOG_TB_CPU, "cpu",
       "show CPU registers before entering a TB (lots of logs)" },
+    { CPU_LOG_TB_FPU, "fpu",
+      "include FPU registers in the 'cpu' logging" },
     { CPU_LOG_MMU, "mmu",
       "log MMU-related activities" },
     { CPU_LOG_PCALL, "pcall",
-- 
2.17.0

Just a collection of bug fixes this time around...

thanks
-- PMM

The following changes since commit 2a6ae69154542caa91dd17c40fd3f5ffbec300de:

Merge tag 'pull-maintainer-ominbus-030723-1' of https://gitlab.com/stsquad/qemu into staging (2023-07-04 08:36:44 +0200)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20230704

for you to fetch changes up to 86a78272f094857b4eda79d721c116e93942aa9a:

target/xtensa: Assert that interrupt level is within bounds (2023-07-04 14:27:08 +0100)

----------------------------------------------------------------
target-arm queue:
 * Add raw_writes ops for register whose write induce TLB maintenance
 * hw/arm/sbsa-ref: use XHCI to replace EHCI
 * Avoid splitting Zregs across lines in dump
 * Dump ZA[] when active
 * Fix SME full tile indexing
 * Handle IC IVAU to improve compatibility with JITs
 * xlnx-canfd-test: Fix code coverity issues
 * gdbstub: Guard M-profile code with CONFIG_TCG
 * allwinner-sramc: Set class_size
 * target/xtensa: Assert that interrupt level is within bounds

----------------------------------------------------------------
Akihiko Odaki (1):
      hw: arm: allwinner-sramc: Set class_size

Eric Auger (1):
      target/arm: Add raw_writes ops for register whose write induce TLB maintenance

Fabiano Rosas (1):
      target/arm: gdbstub: Guard M-profile code with CONFIG_TCG

John Högberg (2):
      target/arm: Handle IC IVAU to improve compatibility with JITs
      tests/tcg/aarch64: Add testcases for IC IVAU and dual-mapped code

Peter Maydell (1):
      target/xtensa: Assert that interrupt level is within bounds

Richard Henderson (3):
      target/arm: Avoid splitting Zregs across lines in dump
      target/arm: Dump ZA[] when active
      target/arm: Fix SME full tile indexing

Vikram Garhwal (1):
      tests/qtest: xlnx-canfd-test: Fix code coverity issues

Yuquan Wang (1):
      hw/arm/sbsa-ref: use XHCI to replace EHCI

docs/system/arm/sbsa.rst          |   5 +-
 hw/arm/sbsa-ref.c                 |  23 +++--
 hw/misc/allwinner-sramc.c         |   1 +
 target/arm/cpu.c                  |  65 ++++++++-----
 target/arm/gdbstub.c              |   4 +
 target/arm/helper.c               |  70 +++++++++++---
 target/arm/tcg/translate-sme.c    |  24 +++--
 target/xtensa/exc_helper.c        |   3 +
 tests/qtest/xlnx-canfd-test.c     |  33 +++----
 tests/tcg/aarch64/icivau.c        | 189 ++++++++++++++++++++++++++++++++++++++
 tests/tcg/aarch64/sme-outprod1.c  |  83 +++++++++++++++++
 hw/arm/Kconfig                    |   2 +-
 tests/tcg/aarch64/Makefile.target |  13 ++-
 13 files changed, 436 insertions(+), 79 deletions(-)
 create mode 100644 tests/tcg/aarch64/icivau.c
 create mode 100644 tests/tcg/aarch64/sme-outprod1.c

From: Eric Auger <eric.auger@redhat.com>

Some registers whose 'cooked' writefns induce TLB maintenance do
not have a raw_writefn defined. If only the writefn is set
(i.e. no raw_writefn is provided), the cooked writefn is assumed
to also work as the raw one. For those registers it is not obvious
that tlb_flush works in KVM mode, so it is safer to set the raw write.
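
The fallback described above can be modelled in a few lines. This is a
simplified sketch under the assumption that a raw write prefers
raw_writefn and otherwise falls back to writefn; RegModel and the
model_* functions are hypothetical and are not the ARMCPRegInfo API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct RegModel RegModel;
typedef void WriteFn(RegModel *r, uint64_t value);

struct RegModel {
    uint64_t value;        /* stored register contents */
    unsigned tlb_flushes;  /* side effects triggered so far */
    WriteFn *writefn;      /* 'cooked' write, may have side effects */
    WriteFn *raw_writefn;  /* plain store, no side effects */
};

/* Plain store, the counterpart of raw_write in the patch. */
static void model_raw_write(RegModel *r, uint64_t value)
{
    r->value = value;
}

/* Cooked write: flush first, then store. */
static void model_cooked_write(RegModel *r, uint64_t value)
{
    r->tlb_flushes++;
    r->value = value;
}

/* Raw access path: use raw_writefn if present, else fall back. */
static void model_write_raw(RegModel *r, uint64_t value)
{
    assert(r != NULL);
    if (r->raw_writefn) {
        r->raw_writefn(r, value);
    } else if (r->writefn) {
        r->writefn(r, value);
    } else {
        r->value = value;
    }
}
```

In this model a register without raw_writefn still triggers the flush
on a raw write, which is the situation the patch avoids by setting
.raw_writefn explicitly.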

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vmsa_cp_reginfo[] = {
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
       .fgt = FGT_TTBR0_EL1,
-      .writefn = vmsa_ttbr_write, .resetvalue = 0,
+      .writefn = vmsa_ttbr_write, .resetvalue = 0, .raw_writefn = raw_write,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
                              offsetof(CPUARMState, cp15.ttbr0_ns) } },
     { .name = "TTBR1_EL1", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 0, .opc2 = 1,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
       .fgt = FGT_TTBR1_EL1,
-      .writefn = vmsa_ttbr_write, .resetvalue = 0,
+      .writefn = vmsa_ttbr_write, .resetvalue = 0, .raw_writefn = raw_write,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr1_s),
                              offsetof(CPUARMState, cp15.ttbr1_ns) } },
     { .name = "TCR_EL1", .state = ARM_CP_STATE_AA64,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo lpae_cp_reginfo[] = {
       .type = ARM_CP_64BIT | ARM_CP_ALIAS,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr0_s),
                              offsetof(CPUARMState, cp15.ttbr0_ns) },
-      .writefn = vmsa_ttbr_write, },
+      .writefn = vmsa_ttbr_write, .raw_writefn = raw_write },
     { .name = "TTBR1", .cp = 15, .crm = 2, .opc1 = 1,
       .access = PL1_RW, .accessfn = access_tvm_trvm,
       .type = ARM_CP_64BIT | ARM_CP_ALIAS,
       .bank_fieldoffsets = { offsetof(CPUARMState, cp15.ttbr1_s),
                              offsetof(CPUARMState, cp15.ttbr1_ns) },
-      .writefn = vmsa_ttbr_write, },
+      .writefn = vmsa_ttbr_write, .raw_writefn = raw_write },
 };
 
 static uint64_t aa64_fpcr_read(CPUARMState *env, const ARMCPRegInfo *ri)
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
       .type = ARM_CP_IO,
       .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0,
       .access = PL2_RW, .fieldoffset = offsetof(CPUARMState, cp15.hcr_el2),
-      .writefn = hcr_write },
+      .writefn = hcr_write, .raw_writefn = raw_write },
     { .name = "HCR", .state = ARM_CP_STATE_AA32,
       .type = ARM_CP_ALIAS | ARM_CP_IO,
       .cp = 15, .opc1 = 4, .crn = 1, .crm = 1, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
     { .name = "TCR_EL2", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 4, .crn = 2, .crm = 0, .opc2 = 2,
       .access = PL2_RW, .writefn = vmsa_tcr_el12_write,
+      .raw_writefn = raw_write,
       .fieldoffset = offsetof(CPUARMState, cp15.tcr_el[2]) },
     { .name = "VTCR", .state = ARM_CP_STATE_AA32,
       .cp = 15, .opc1 = 4, .crn = 2, .crm = 1, .opc2 = 2,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
       .type = ARM_CP_64BIT | ARM_CP_ALIAS,
       .access = PL2_RW, .accessfn = access_el3_aa32ns,
       .fieldoffset = offsetof(CPUARMState, cp15.vttbr_el2),
-      .writefn = vttbr_write },
+      .writefn = vttbr_write, .raw_writefn = raw_write },
     { .name = "VTTBR_EL2", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 4, .crn = 2, .crm = 1, .opc2 = 0,
-      .access = PL2_RW, .writefn = vttbr_write,
+      .access = PL2_RW, .writefn = vttbr_write, .raw_writefn = raw_write,
       .fieldoffset = offsetof(CPUARMState, cp15.vttbr_el2) },
     { .name = "SCTLR_EL2", .state = ARM_CP_STATE_BOTH,
       .opc0 = 3, .opc1 = 4, .crn = 1, .crm = 0, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el2_cp_reginfo[] = {
       .fieldoffset = offsetof(CPUARMState, cp15.tpidr_el[2]) },
     { .name = "TTBR0_EL2", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 4, .crn = 2, .crm = 0, .opc2 = 0,
-      .access = PL2_RW, .resetvalue = 0, .writefn = vmsa_tcr_ttbr_el2_write,
+      .access = PL2_RW, .resetvalue = 0,
+      .writefn = vmsa_tcr_ttbr_el2_write, .raw_writefn = raw_write,
       .fieldoffset = offsetof(CPUARMState, cp15.ttbr0_el[2]) },
     { .name = "HTTBR", .cp = 15, .opc1 = 4, .crm = 2,
       .access = PL2_RW, .type = ARM_CP_64BIT | ARM_CP_ALIAS,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo el3_cp_reginfo[] = {
     { .name = "SCR_EL3", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 6, .crn = 1, .crm = 1, .opc2 = 0,
       .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.scr_el3),
-      .resetfn = scr_reset, .writefn = scr_write },
+      .resetfn = scr_reset, .writefn = scr_write, .raw_writefn = raw_write },
     { .name = "SCR",  .type = ARM_CP_ALIAS | ARM_CP_NEWEL,
       .cp = 15, .opc1 = 0, .crn = 1, .crm = 1, .opc2 = 0,
       .access = PL1_RW, .accessfn = access_trap_aa32s_el1,
       .fieldoffset = offsetoflow32(CPUARMState, cp15.scr_el3),
-      .writefn = scr_write },
+      .writefn = scr_write, .raw_writefn = raw_write },
     { .name = "SDER32_EL3", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 6, .crn = 1, .crm = 1, .opc2 = 1,
       .access = PL3_RW, .resetvalue = 0,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vhe_reginfo[] = {
     { .name = "TTBR1_EL2", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 4, .crn = 2, .crm = 0, .opc2 = 1,
       .access = PL2_RW, .writefn = vmsa_tcr_ttbr_el2_write,
+      .raw_writefn = raw_write,
       .fieldoffset = offsetof(CPUARMState, cp15.ttbr1_el[2]) },
 #ifndef CONFIG_USER_ONLY
     { .name = "CNTHV_CVAL_EL2", .state = ARM_CP_STATE_AA64,
-- 
2.34.1

From: Yuquan Wang <wangyuquan1236@phytium.com.cn>

The current sbsa-ref cannot use the EHCI controller, which is only
able to do 32-bit DMA, since sbsa-ref doesn't have RAM below 4GB.
Hence, this uses XHCI to provide a USB controller with 64-bit
DMA capability instead of EHCI.

We bump the platform version to 0.3 with this change.  Although the
hardware at the USB controller address changes, the firmware and
Linux can both cope with this -- on an older non-XHCI-aware
firmware/kernel setup the probe routine simply fails and the guest
proceeds without any USB.  (This isn't a loss of functionality,
because the old USB controller never worked in the first place.) So
we can call this a backwards-compatible change and only bump the
minor version.

Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn>
Message-id: 20230621103847.447508-2-wangyuquan1236@phytium.com.cn
[PMM: tweaked commit message; add line to docs about what
 changes in platform version 0.3]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/sbsa.rst |  5 ++++-
 hw/arm/sbsa-ref.c        | 23 +++++++++++++----------
 hw/arm/Kconfig           |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/sbsa.rst
+++ b/docs/system/arm/sbsa.rst
@@ -XXX,XX +XXX,XX @@ The ``sbsa-ref`` board supports:
   - A configurable number of AArch64 CPUs
   - GIC version 3
   - System bus AHCI controller
-  - System bus EHCI controller
+  - System bus XHCI controller
   - CDROM and hard disc on AHCI bus
   - E1000E ethernet card on PCIe bus
   - Bochs display adapter on PCIe bus
@@ -XXX,XX +XXX,XX @@ Platform version changes:
 
 0.2
   GIC ITS information is present in devicetree.
+
+0.3
+  The USB controller is an XHCI device, not EHCI
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/pci-host/gpex.h"
 #include "hw/qdev-properties.h"
 #include "hw/usb.h"
+#include "hw/usb/xhci.h"
 #include "hw/char/pl011.h"
 #include "hw/watchdog/sbsa_gwdt.h"
 #include "net/net.h"
@@ -XXX,XX +XXX,XX @@ enum {
     SBSA_SECURE_UART_MM,
     SBSA_SECURE_MEM,
     SBSA_AHCI,
-    SBSA_EHCI,
+    SBSA_XHCI,
 };
 
 struct SBSAMachineState {
@@ -XXX,XX +XXX,XX @@ static const MemMapEntry sbsa_ref_memmap[] = {
     [SBSA_SMMU] =               { 0x60050000, 0x00020000 },
     /* Space here reserved for more SMMUs */
     [SBSA_AHCI] =               { 0x60100000, 0x00010000 },
-    [SBSA_EHCI] =               { 0x60110000, 0x00010000 },
+    [SBSA_XHCI] =               { 0x60110000, 0x00010000 },
     /* Space here reserved for other devices */
     [SBSA_PCIE_PIO] =           { 0x7fff0000, 0x00010000 },
     /* 32-bit address PCIE MMIO space */
@@ -XXX,XX +XXX,XX @@ static const int sbsa_ref_irqmap[] = {
     [SBSA_SECURE_UART] = 8,
     [SBSA_SECURE_UART_MM] = 9,
     [SBSA_AHCI] = 10,
-    [SBSA_EHCI] = 11,
+    [SBSA_XHCI] = 11,
     [SBSA_SMMU] = 12, /* ... to 15 */
     [SBSA_GWDT_WS0] = 16,
 };
@@ -XXX,XX +XXX,XX @@ static void create_fdt(SBSAMachineState *sms)
      *                        fw compatibility.
      */
     qemu_fdt_setprop_cell(fdt, "/", "machine-version-major", 0);
-    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 2);
+    qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 3);
 
     if (ms->numa_state->have_numa_distance) {
         int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
@@ -XXX,XX +XXX,XX @@ static void create_ahci(const SBSAMachineState *sms)
     }
 }
 
-static void create_ehci(const SBSAMachineState *sms)
+static void create_xhci(const SBSAMachineState *sms)
 {
-    hwaddr base = sbsa_ref_memmap[SBSA_EHCI].base;
-    int irq = sbsa_ref_irqmap[SBSA_EHCI];
+    hwaddr base = sbsa_ref_memmap[SBSA_XHCI].base;
+    int irq = sbsa_ref_irqmap[SBSA_XHCI];
+    DeviceState *dev = qdev_new(TYPE_XHCI_SYSBUS);
 
-    sysbus_create_simple("platform-ehci-usb", base,
-                         qdev_get_gpio_in(sms->gic, irq));
+    sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+    sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, base);
+    sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, qdev_get_gpio_in(sms->gic, irq));
 }
 
 static void create_smmu(const SBSAMachineState *sms, PCIBus *bus)
@@ -XXX,XX +XXX,XX @@ static void sbsa_ref_init(MachineState *machine)
 
     create_ahci(sms);
 
-    create_ehci(sms);
+    create_xhci(sms);
 
     create_pcie(sms);
 
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -XXX,XX +XXX,XX @@ config SBSA_REF
     select PL011 # UART
     select PL031 # RTC
     select PL061 # GPIO
-    select USB_EHCI_SYSBUS
+    select USB_XHCI_SYSBUS
     select WDT_SBSA
     select BOCHS_DISPLAY
 
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Allow the line length to extend to 548 columns.  While annoyingly wide,
it's still less confusing than the continuations we print.  Also, the
default VL used by Linux (and max for A64FX) uses only 140 columns.
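
The column counts can be reproduced from the format strings in the
diff. The helper below is a hypothetical sketch: it counts the 4
characters of the "Z%02d=" prefix plus, for every 128-bit quadword,
two 16-digit hex fields, the ':' between them and the trailing
separator (':' or the newline).

```c
#include <assert.h>

/*
 * Characters emitted per Z register line for a vector length given in
 * bytes, counting the final newline as one character.
 */
static int zreg_line_width(int vl_bytes)
{
    assert(vl_bytes >= 16 && vl_bytes % 16 == 0);

    int quads = vl_bytes / 16;    /* 128-bit quadwords per register */
    int prefix = 4;               /* 'Z', two digits, '=' */
    int per_quad = 16 + 1 + 16 + 1;

    return prefix + quads * per_quad;
}
```

This gives 548 for the architectural maximum of 256 bytes and 140 for
a 64-byte VL. For vl=16 it gives 38, i.e. the 37 visible columns
mentioned in the code comment plus the separator.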

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230622151201.1578522-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 36 ++++++++++++++----------------------
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
     ARMCPU *cpu = ARM_CPU(cs);
     CPUARMState *env = &cpu->env;
     uint32_t psr = pstate_read(env);
-    int i;
+    int i, j;
     int el = arm_current_el(env);
     const char *ns_status;
     bool sve;
@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
     }
 
     if (sve) {
-        int j, zcr_len = sve_vqm1_for_el(env, el);
+        int zcr_len = sve_vqm1_for_el(env, el);
 
         for (i = 0; i <= FFR_PRED_NUM; i++) {
             bool eol;
@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
             }
         }
 
-        for (i = 0; i < 32; i++) {
-            if (zcr_len == 0) {
+        if (zcr_len == 0) {
+            /*
+             * With vl=16, there are only 37 columns per register,
+             * so output two registers per line.
+             */
+            for (i = 0; i < 32; i++) {
                 qemu_fprintf(f, "Z%02d=%016" PRIx64 ":%016" PRIx64 "%s",
                              i, env->vfp.zregs[i].d[1],
                              env->vfp.zregs[i].d[0], i & 1 ? "\n" : " ");
-            } else if (zcr_len == 1) {
-                qemu_fprintf(f, "Z%02d=%016" PRIx64 ":%016" PRIx64
-                             ":%016" PRIx64 ":%016" PRIx64 "\n",
-                             i, env->vfp.zregs[i].d[3], env->vfp.zregs[i].d[2],
-                             env->vfp.zregs[i].d[1], env->vfp.zregs[i].d[0]);
-            } else {
+            }
+        } else {
+            for (i = 0; i < 32; i++) {
+                qemu_fprintf(f, "Z%02d=", i);
                 for (j = zcr_len; j >= 0; j--) {
-                    bool odd = (zcr_len - j) % 2 != 0;
-                    if (j == zcr_len) {
-                        qemu_fprintf(f, "Z%02d[%x-%x]=", i, j, j - 1);
-                    } else if (!odd) {
-                        if (j > 0) {
-                            qemu_fprintf(f, "   [%x-%x]=", j, j - 1);
-                        } else {
-                            qemu_fprintf(f, "     [%x]=", j);
-                        }
-                    }
                     qemu_fprintf(f, "%016" PRIx64 ":%016" PRIx64 "%s",
                                  env->vfp.zregs[i].d[j * 2 + 1],
-                                 env->vfp.zregs[i].d[j * 2],
-                                 odd || j == 0 ? "\n" : ":");
+                                 env->vfp.zregs[i].d[j * 2 + 0],
+                                 j ? ":" : "\n");
                 }
             }
         }
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

Always print each matrix row whole, one per line, so that we
get the entire matrix in the proper shape.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230622151201.1578522-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
                          i, q[1], q[0], (i & 1 ? "\n" : " "));
         }
     }
+
+    if (cpu_isar_feature(aa64_sme, cpu) &&
+        FIELD_EX64(env->svcr, SVCR, ZA) &&
+        sme_exception_el(env, el) == 0) {
+        int zcr_len = sve_vqm1_for_el_sm(env, el, true);
+        int svl = (zcr_len + 1) * 16;
+        int svl_lg10 = svl < 100 ? 2 : 3;
+
+        for (i = 0; i < svl; i++) {
+            qemu_fprintf(f, "ZA[%0*d]=", svl_lg10, i);
+            for (j = zcr_len; j >= 0; --j) {
+                qemu_fprintf(f, "%016" PRIx64 ":%016" PRIx64 "%c",
+                             env->zarray[i].d[2 * j + 1],
+                             env->zarray[i].d[2 * j],
+                             j ? ':' : '\n');
+            }
+        }
+    }
 }
 
 #else
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

For the outer product set of insns, which take an entire matrix
tile as output, the argument is not a combined tile+column.
Therefore using get_tile_rowcol was incorrect, as we extracted
the tile number from itself.
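
The address computation used by the new get_tile() can be sketched
with stand-in types. VecRegModel and EnvModel below are hypothetical
simplifications of ARMVectorReg and CPUARMState; only the shape of the
formula (tile * sizeof(row) + offsetof(env, zarray)) comes from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for ARMVectorReg: one row of ZA storage. */
typedef struct {
    uint64_t d[32];
} VecRegModel;

/* Stand-in for CPUARMState: some unrelated state, then the ZA array. */
typedef struct {
    uint64_t other_state[8];
    VecRegModel zarray[256];
} EnvModel;

/*
 * Byte offset of the first row of a tile, relative to the start of
 * the environment. The tile number is used directly and is not
 * combined with a row or column index.
 */
static size_t tile_offset(int tile)
{
    assert(tile >= 0);
    return (size_t)tile * sizeof(VecRegModel) + offsetof(EnvModel, zarray);
}
```

In this model consecutive tiles start exactly one row apart, and tile
0 starts at the beginning of zarray.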

The test case relies only on assembler support for SME, since
no release of GCC recognizes -march=armv9-a+sme yet.

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1620
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230622151201.1578522-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/translate-sme.c    | 24 ++++++---
 tests/tcg/aarch64/sme-outprod1.c  | 83 +++++++++++++++++++++++++++++++
 tests/tcg/aarch64/Makefile.target | 10 ++--
 3 files changed, 108 insertions(+), 9 deletions(-)
 create mode 100644 tests/tcg/aarch64/sme-outprod1.c

diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/translate-sme.c
+++ b/target/arm/tcg/translate-sme.c
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr get_tile_rowcol(DisasContext *s, int esz, int rs,
     return addr;
 }
 
+/*
+ * Resolve tile.size[0] to a host pointer.
+ * Used by e.g. outer product insns where we require the entire tile.
+ */
+static TCGv_ptr get_tile(DisasContext *s, int esz, int tile)
+{
+    TCGv_ptr addr = tcg_temp_new_ptr();
+    int offset;
+
+    offset = tile * sizeof(ARMVectorReg) + offsetof(CPUARMState, zarray);
+
+    tcg_gen_addi_ptr(addr, cpu_env, offset);
+    return addr;
+}
+
 static bool trans_ZERO(DisasContext *s, arg_ZERO *a)
 {
     if (!dc_isar_feature(aa64_sme, s)) {
@@ -XXX,XX +XXX,XX @@ static bool do_adda(DisasContext *s, arg_adda *a, MemOp esz,
         return true;
     }
 
-    /* Sum XZR+zad to find ZAd. */
-    za = get_tile_rowcol(s, esz, 31, a->zad, false);
+    za = get_tile(s, esz, a->zad);
     zn = vec_full_reg_ptr(s, a->zn);
     pn = pred_full_reg_ptr(s, a->pn);
     pm = pred_full_reg_ptr(s, a->pm);
@@ -XXX,XX +XXX,XX @@ static bool do_outprod(DisasContext *s, arg_op *a, MemOp esz,
         return true;
     }
 
-    /* Sum XZR+zad to find ZAd. */
-    za = get_tile_rowcol(s, esz, 31, a->zad, false);
+    za = get_tile(s, esz, a->zad);
     zn = vec_full_reg_ptr(s, a->zn);
     zm = vec_full_reg_ptr(s, a->zm);
     pn = pred_full_reg_ptr(s, a->pn);
@@ -XXX,XX +XXX,XX @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
         return true;
     }
 
-    /* Sum XZR+zad to find ZAd. */
-    za = get_tile_rowcol(s, esz, 31, a->zad, false);
+    za = get_tile(s, esz, a->zad);
     zn = vec_full_reg_ptr(s, a->zn);
     zm = vec_full_reg_ptr(s, a->zm);
     pn = pred_full_reg_ptr(s, a->pn);
diff --git a/tests/tcg/aarch64/sme-outprod1.c b/tests/tcg/aarch64/sme-outprod1.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/tcg/aarch64/sme-outprod1.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * SME outer product, 1 x 1.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <stdio.h>
+
+extern void foo(float *dst);
+
+asm(
+"	.arch_extension sme\n"
+"	.type foo, @function\n"
+"foo:\n"
+"	stp x29, x30, [sp, -80]!\n"
+"	mov x29, sp\n"
+"	stp d8, d9, [sp, 16]\n"
+"	stp d10, d11, [sp, 32]\n"
+"	stp d12, d13, [sp, 48]\n"
+"	stp d14, d15, [sp, 64]\n"
+"	smstart\n"
+"	ptrue p0.s, vl4\n"
+"	fmov z0.s, #1.0\n"
+/*
+ * An outer product of a vector of 1.0 by itself should be a matrix of 1.0.
+ * Note that we are using tile 1 here (za1.s) rather than tile 0.
+ */
+"	zero {za}\n"
+"	fmopa za1.s, p0/m, p0/m, z0.s, z0.s\n"
+/*
+ * Read the first 4x4 sub-matrix of elements from tile 1:
+ * Note that za1h should be interchangeable here.
+ */
+"	mov w12, #0\n"
+"	mova z0.s, p0/m, za1v.s[w12, #0]\n"
+"	mova z1.s, p0/m, za1v.s[w12, #1]\n"
+"	mova z2.s, p0/m, za1v.s[w12, #2]\n"
+"	mova z3.s, p0/m, za1v.s[w12, #3]\n"
+/*
+ * And store them to the input pointer (dst in the C code):
+ */
+"	st1w {z0.s}, p0, [x0]\n"
+"	add x0, x0, #16\n"
+"	st1w {z1.s}, p0, [x0]\n"
+"	add x0, x0, #16\n"
+"	st1w {z2.s}, p0, [x0]\n"
+"	add x0, x0, #16\n"
+"	st1w {z3.s}, p0, [x0]\n"
+"	smstop\n"
+"	ldp d8, d9, [sp, 16]\n"
+"	ldp d10, d11, [sp, 32]\n"
+"	ldp d12, d13, [sp, 48]\n"
+"	ldp d14, d15, [sp, 64]\n"
+"	ldp x29, x30, [sp], 80\n"
+"	ret\n"
+"	.size foo, . - foo"
+);
+
+int main()
+{
+    float dst[16];
+    int i, j;
+
+    foo(dst);
+
+    for (i = 0; i < 16; i++) {
+        if (dst[i] != 1.0f) {
+            break;
+        }
+    }
+
+    if (i == 16) {
+        return 0; /* success */
+    }
+
+    /* failure */
+    for (i = 0; i < 4; ++i) {
+        for (j = 0; j < 4; ++j) {
+            printf("%f ", (double)dst[i * 4 + j]);
+        }
+        printf("\n");
+    }
+    return 1;
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index XXXXXXX..XXXXXXX 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -XXX,XX +XXX,XX @@ config-cc.mak: Makefile
 	    $(call cc-option,-march=armv8.5-a,              CROSS_CC_HAS_ARMV8_5); \
 	    $(call cc-option,-mbranch-protection=standard,  CROSS_CC_HAS_ARMV8_BTI); \
 	    $(call cc-option,-march=armv8.5-a+memtag,       CROSS_CC_HAS_ARMV8_MTE); \
-	    $(call cc-option,-march=armv9-a+sme,            CROSS_CC_HAS_ARMV9_SME)) 3> config-cc.mak
+	    $(call cc-option,-Wa$(COMMA)-march=armv9-a+sme, CROSS_AS_HAS_ARMV9_SME)) 3> config-cc.mak
 -include config-cc.mak
 
 ifneq ($(CROSS_CC_HAS_ARMV8_2),)
@@ -XXX,XX +XXX,XX @@ AARCH64_TESTS += mte-1 mte-2 mte-3 mte-4 mte-5 mte-6 mte-7
 mte-%: CFLAGS += -march=armv8.5-a+memtag
 endif
 
+ifneq ($(CROSS_AS_HAS_ARMV9_SME),)
+AARCH64_TESTS += sme-outprod1
+endif
+
 ifneq ($(CROSS_CC_HAS_SVE),)
 # System Registers Tests
 AARCH64_TESTS += sysregs
-ifneq ($(CROSS_CC_HAS_ARMV9_SME),)
-sysregs: CFLAGS+=-march=armv9-a+sme -DHAS_ARMV9_SME
+ifneq ($(CROSS_AS_HAS_ARMV9_SME),)
+sysregs: CFLAGS+=-Wa,-march=armv9-a+sme -DHAS_ARMV9_SME
 else
 sysregs: CFLAGS+=-march=armv8.1-a+sve
 endif
-- 
2.34.1

From: John Högberg <john.hogberg@ericsson.com>

Unlike architectures with precise self-modifying code semantics
(e.g. x86), ARM processors do not maintain coherency for instruction
execution and memory, requiring an instruction synchronization
barrier on every core that will execute the new code, and on many
models also the explicit use of cache management instructions.

While this is required to make JITs work on actual hardware, QEMU
has gotten away with not handling this since it does not emulate
caches, and unconditionally invalidates code whenever the softmmu
or the user-mode page protection logic detects that code has been
modified.

Unfortunately the latter does not work in the face of dual-mapped
code (a common W^X workaround), where one page is executable and
the other is writable: user-mode has no way to connect one with the
other as that is only known to the kernel and the emulated
application.

This commit works around the issue by telling software that
instruction cache invalidation is required by clearing the
CTR_EL0.DIC flag (regardless of whether the emulated processor
needs it), and then invalidating code in IC IVAU instructions.
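
As a rough illustration of the address arithmetic involved (a sketch
only, not the code in this patch): CTR_EL0.IminLine, bits [3:0],
holds log2 of the smallest instruction cache line size in 4-byte
words, so software covering a modified range needs one IC IVAU per
line touched. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest instruction cache line size in bytes,
 * decoded from CTR_EL0.IminLine (log2 of the line size in 4-byte words).
 */
static uint64_t icache_line_bytes(uint64_t ctr_el0)
{
    return UINT64_C(4) << (ctr_el0 & 0xF);
}

/* Hypothetical helper: address of the first byte of the line holding addr. */
static uint64_t icache_line_start(uint64_t ctr_el0, uint64_t addr)
{
    return addr & ~(icache_line_bytes(ctr_el0) - 1);
}

/*
 * Hypothetical helper: number of cache lines, and therefore IC IVAU
 * operations, needed to cover [start, start + length).
 */
static size_t icache_lines_in_range(uint64_t ctr_el0, uint64_t start,
                                    size_t length)
{
    uint64_t first, last;

    assert(length > 0);
    first = icache_line_start(ctr_el0, start);
    last = icache_line_start(ctr_el0, start + length - 1);
    return (size_t)((last - first) / icache_line_bytes(ctr_el0)) + 1;
}
```

With IminLine = 4 (64-byte lines), a 36-byte range starting at a
line-aligned address touches one line, while an 8-byte range starting
at offset 0x3C within a line straddles two.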

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1034

Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: John Högberg <john.hogberg@ericsson.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 168778890374.24232.3402138851538068785-1@git.sr.ht
[PMM: removed unnecessary AArch64 feature check; moved
 "clear CTR_EL0.DIC" code up a bit so it's not in the middle
 of the vfp/neon related tests]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c    | 11 +++++++++++
 target/arm/helper.c | 47 ++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
+#ifdef CONFIG_USER_ONLY
+    /*
+     * User mode relies on IC IVAU instructions to catch modification of
+     * dual-mapped code.
+     *
+     * Clear CTR_EL0.DIC to ensure that software that honors these flags uses
+     * IC IVAU even if the emulated processor does not normally require it.
+     */
+    cpu->ctr = FIELD_DP64(cpu->ctr, CTR_EL0, DIC, 0);
+#endif
+
     if (arm_feature(env, ARM_FEATURE_AARCH64) &&
         cpu->has_vfp != cpu->has_neon) {
         /*
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void mdcr_el2_write(CPUARMState *env, const ARMCPRegInfo *ri,
     }
 }
 
+#ifdef CONFIG_USER_ONLY
+/*
+ * `IC IVAU` is handled to improve compatibility with JITs that dual-map their
+ * code to get around W^X restrictions, where one region is writable and the
+ * other is executable.
+ *
+ * Since the executable region is never written to we cannot detect code
+ * changes when running in user mode, and rely on the emulated JIT telling us
+ * that the code has changed by executing this instruction.
+ */
+static void ic_ivau_write(CPUARMState *env, const ARMCPRegInfo *ri,
+                          uint64_t value)
+{
+    uint64_t icache_line_mask, start_address, end_address;
+    const ARMCPU *cpu;
+
+    cpu = env_archcpu(env);
+
+    icache_line_mask = (4 << extract32(cpu->ctr, 0, 4)) - 1;
+    start_address = value & ~icache_line_mask;
+    end_address = value | icache_line_mask;
+
+    mmap_lock();
+
+    tb_invalidate_phys_range(start_address, end_address);
+
+    mmap_unlock();
+}
+#endif
+
 static const ARMCPRegInfo v8_cp_reginfo[] = {
     /*
      * Minimal set of EL0-visible registers. This will need to be expanded
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
     { .name = "CURRENTEL", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 0, .opc2 = 2, .crn = 4, .crm = 2,
       .access = PL1_R, .type = ARM_CP_CURRENTEL },
-    /* Cache ops: all NOPs since we don't emulate caches */
+    /*
+     * Instruction cache ops. All of these except `IC IVAU` NOP because we
+     * don't emulate caches.
+     */
     { .name = "IC_IALLUIS", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 1, .opc2 = 0,
       .access = PL1_W, .type = ARM_CP_NOP,
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo v8_cp_reginfo[] = {
       .accessfn = access_tocu },
     { .name = "IC_IVAU", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 5, .opc2 = 1,
-      .access = PL0_W, .type = ARM_CP_NOP,
+      .access = PL0_W,
       .fgt = FGT_ICIVAU,
-      .accessfn = access_tocu },
+      .accessfn = access_tocu,
+#ifdef CONFIG_USER_ONLY
+      .type = ARM_CP_NO_RAW,
+      .writefn = ic_ivau_write
+#else
+      .type = ARM_CP_NOP
+#endif
+    },
+    /* Cache ops: all NOPs since we don't emulate caches */
     { .name = "DC_IVAC", .state = ARM_CP_STATE_AA64,
       .opc0 = 1, .opc1 = 0, .crn = 7, .crm = 6, .opc2 = 1,
       .access = PL1_W, .accessfn = aa64_cacheop_poc_access,
-- 
2.34.1

From: John Högberg <john.hogberg@ericsson.com>

Test the IC IVAU-driven workaround for catching changes made to
dual-mapped code in user mode.

https://gitlab.com/qemu-project/qemu/-/issues/1034

Signed-off-by: John Högberg <john.hogberg@ericsson.com>
Message-id: 168778890374.24232.3402138851538068785-2@git.sr.ht
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: fixed typo in comment]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/tcg/aarch64/icivau.c        | 189 ++++++++++++++++++++++++++++++
 tests/tcg/aarch64/Makefile.target |   3 +-
 2 files changed, 191 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/aarch64/icivau.c

diff --git a/tests/tcg/aarch64/icivau.c b/tests/tcg/aarch64/icivau.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/tcg/aarch64/icivau.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Tests the IC IVAU-driven workaround for catching changes made to dual-mapped
+ * code that would otherwise go unnoticed in user mode.
+ *
+ * Copyright (c) 2023 Ericsson AB
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <string.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+
+#define MAX_CODE_SIZE 128
+
+typedef int (SelfModTest)(uint32_t, uint32_t*);
+typedef int (BasicTest)(int);
+
+static void mark_code_modified(const uint32_t *exec_data, size_t length)
+{
+    int dc_required, ic_required;
+    unsigned long ctr_el0;
+
+    /*
+     * Clear the data/instruction cache, as indicated by the CTR_EL0.{DIC,IDC}
+     * flags.
+     *
+     * For completeness we might be tempted to assert that we should fail when
+     * the whole code update sequence is omitted, but that would make the test
+     * flaky as it can succeed by coincidence on actual hardware.
+     */
+    asm ("mrs %0, ctr_el0\n" : "=r"(ctr_el0));
+
+    /* CTR_EL0.IDC */
+    dc_required = !((ctr_el0 >> 28) & 1);
+
+    /* CTR_EL0.DIC */
+    ic_required = !((ctr_el0 >> 29) & 1);
+
+    if (dc_required) {
+        size_t dcache_stride, i;
+
+        /*
+         * Step by the minimum cache line size, as the cache maintenance
+         * instructions operate on the cache line of the given address.
+         *
+         * We assume that exec_data is properly aligned.
+         */
+        dcache_stride = (4 << ((ctr_el0 >> 16) & 0xF));
+
+        for (i = 0; i < length; i += dcache_stride) {
+            const char *dc_addr = &((const char *)exec_data)[i];
+            asm volatile ("dc cvau, %x[dc_addr]\n"
+                          : /* no outputs */
+                          : [dc_addr] "r"(dc_addr)
+                          : "memory");
+        }
+
+        asm volatile ("dmb ish\n");
+    }
+
+    if (ic_required) {
+        size_t icache_stride, i;
+
+        icache_stride = (4 << (ctr_el0 & 0xF));
+
+        for (i = 0; i < length; i += icache_stride) {
+            const char *ic_addr = &((const char *)exec_data)[i];
+            asm volatile ("ic ivau, %x[ic_addr]\n"
+                          : /* no outputs */
+                          : [ic_addr] "r"(ic_addr)
+                          : "memory");
+        }
+
+        asm volatile ("dmb ish\n");
+    }
+
+    asm volatile ("isb sy\n");
+}
+
+static int basic_test(uint32_t *rw_data, const uint32_t *exec_data)
+{
+    /*
+     * As user mode only misbehaved for dual-mapped code when previously
+     * translated code had been changed, we'll start off with this basic test
+     * function to ensure that there's already some translated code at
+     * exec_data before the next test. This should cause the next test to fail
+     * if `mark_code_modified` fails to invalidate the code.
+     *
+     * Note that the payload is in binary form instead of inline assembler
+     * because we cannot use __attribute__((naked)) on this platform and the
+     * workarounds are at least as ugly as this is.
+     */
+    static const uint32_t basic_payload[] = {
+        0xD65F03C0 /* 0x00: RET */
+    };
+
+    BasicTest *copied_ptr = (BasicTest *)exec_data;
+
+    memcpy(rw_data, basic_payload, sizeof(basic_payload));
+    mark_code_modified(exec_data, sizeof(basic_payload));
+
+    return copied_ptr(1234) == 1234;
+}
+
+static int self_modification_test(uint32_t *rw_data, const uint32_t *exec_data)
+{
+    /*
+     * This test is self-modifying in an attempt to cover an edge case where
+     * the IC IVAU instruction invalidates itself.
+     *
+     * Note that the IC IVAU instruction is 16 bytes into the function, in what
+     * will be the same cache line as the modified instruction on machines with
+     * a cache line size >= 16 bytes.
+     */
+    static const uint32_t self_mod_payload[] = {
+        /* Overwrite the placeholder instruction with the new one. */
+        0xB9001C20, /* 0x00: STR w0, [x1, 0x1C] */
+
+        /* Get the executable address of the modified instruction. */
+        0x100000A8, /* 0x04: ADR x8, <0x1C> */
+
+        /* Mark the modified instruction as updated. */
+        0xD50B7B28, /* 0x08: DC CVAU x8 */
+        0xD5033BBF, /* 0x0C: DMB ISH */
+        0xD50B7528, /* 0x10: IC IVAU x8 */
+        0xD5033BBF, /* 0x14: DMB ISH */
+        0xD5033FDF, /* 0x18: ISB */
+
+        /* Placeholder instruction, overwritten above. */
+        0x52800000, /* 0x1C: MOV w0, 0 */
+
+        0xD65F03C0  /* 0x20: RET */
+    };
+
+    SelfModTest *copied_ptr = (SelfModTest *)exec_data;
+    int i;
+
+    memcpy(rw_data, self_mod_payload, sizeof(self_mod_payload));
+    mark_code_modified(exec_data, sizeof(self_mod_payload));
+
+    for (i = 1; i < 10; i++) {
+        /* Replace the placeholder instruction with `MOV w0, i` */
+        uint32_t new_instr = 0x52800000 | (i << 5);
+
+        if (copied_ptr(new_instr, rw_data) != i) {
+            return 0;
+        }
+    }
+
+    return 1;
+}
+
+int main(int argc, char **argv)
+{
+    const char *shm_name = "qemu-test-tcg-aarch64-icivau";
+    int fd;
+
+    fd = shm_open(shm_name, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
+
+    if (fd < 0) {
+        return EXIT_FAILURE;
+    }
+
+    /* Unlink early to avoid leaving garbage in case the test crashes. */
+    shm_unlink(shm_name);
+
+    if (ftruncate(fd, MAX_CODE_SIZE) == 0) {
+        const uint32_t *exec_data;
+        uint32_t *rw_data;
+
+        rw_data = mmap(0, MAX_CODE_SIZE, PROT_READ | PROT_WRITE,
+                       MAP_SHARED, fd, 0);
+        exec_data = mmap(0, MAX_CODE_SIZE, PROT_READ | PROT_EXEC,
+                         MAP_SHARED, fd, 0);
+
+        if (rw_data != MAP_FAILED && exec_data != MAP_FAILED) {
+            if (basic_test(rw_data, exec_data) &&
+                self_modification_test(rw_data, exec_data)) {
+                return EXIT_SUCCESS;
+            }
+        }
+    }
+
+    return EXIT_FAILURE;
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index XXXXXXX..XXXXXXX 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -XXX,XX +XXX,XX @@ AARCH64_SRC=$(SRC_PATH)/tests/tcg/aarch64
 VPATH 		+= $(AARCH64_SRC)
 
 # Base architecture tests
-AARCH64_TESTS=fcvt pcalign-a64
+AARCH64_TESTS=fcvt pcalign-a64 icivau
 
 fcvt: LDFLAGS+=-lm
+icivau: LDFLAGS+=-lrt
 
 run-fcvt: fcvt
 	$(call run-test,$<,$(QEMU) $<, "$< on $(TARGET_NAME)")
-- 
2.34.1

From: Vikram Garhwal <vikram.garhwal@amd.com>

The following changes fix the Coverity issues:
1. Change read_data() to fix CID 1512899: Out-of-bounds access (OVERRUN)
2. Change match_rx_tx_data() to fix CID 1512900: Logically dead code (DEADCODE)
3. Replace rand() in generate_random_data() with g_random_int()

Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com>
Message-id: 20230628202758.16398-1-vikram.garhwal@amd.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/xlnx-canfd-test.c | 33 +++++++++++----------------------
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/tests/qtest/xlnx-canfd-test.c b/tests/qtest/xlnx-canfd-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/xlnx-canfd-test.c
+++ b/tests/qtest/xlnx-canfd-test.c
@@ -XXX,XX +XXX,XX @@ static void generate_random_data(uint32_t *buf_tx, bool is_canfd_frame)
     /* Generate random TX data for CANFD frame. */
     if (is_canfd_frame) {
         for (int i = 0; i < CANFD_FRAME_SIZE - 2; i++) {
-            buf_tx[2 + i] = rand();
+            buf_tx[2 + i] = g_random_int();
         }
     } else {
         /* Generate random TX data for CAN frame. */
         for (int i = 0; i < CAN_FRAME_SIZE - 2; i++) {
-            buf_tx[2 + i] = rand();
+            buf_tx[2 + i] = g_random_int();
         }
     }
 }
 
-static void read_data(QTestState *qts, uint64_t can_base_addr, uint32_t *buf_rx)
+static void read_data(QTestState *qts, uint64_t can_base_addr, uint32_t *buf_rx,
+                      uint32_t frame_size)
 {
     uint32_t int_status;
     uint32_t fifo_status_reg_value;
     /* At which RX FIFO the received data is stored. */
     uint8_t store_ind = 0;
-    bool is_canfd_frame = false;
 
     /* Read the interrupt on CANFD rx. */
     int_status = qtest_readl(qts, can_base_addr + R_ISR_OFFSET) & ISR_RXOK;
@@ -XXX,XX +XXX,XX @@ static void read_data(QTestState *qts, uint64_t can_base_addr, uint32_t *buf_rx)
     buf_rx[0] = qtest_readl(qts, can_base_addr + R_RX0_ID_OFFSET);
     buf_rx[1] = qtest_readl(qts, can_base_addr + R_RX0_DLC_OFFSET);
 
-    is_canfd_frame = (buf_rx[1] >> DLC_FD_BIT_SHIFT) & 1;
-
-    if (is_canfd_frame) {
-        for (int i = 0; i < CANFD_FRAME_SIZE - 2; i++) {
-            buf_rx[i + 2] = qtest_readl(qts,
-                                    can_base_addr + R_RX0_DATA1_OFFSET + 4 * i);
-        }
-    } else {
-        buf_rx[2] = qtest_readl(qts, can_base_addr + R_RX0_DATA1_OFFSET);
-        buf_rx[3] = qtest_readl(qts, can_base_addr + R_RX0_DATA2_OFFSET);
+    for (int i = 0; i < frame_size - 2; i++) {
+        buf_rx[i + 2] = qtest_readl(qts,
+                                can_base_addr + R_RX0_DATA1_OFFSET + 4 * i);
     }
 
     /* Clear the RX interrupt. */
@@ -XXX,XX +XXX,XX @@ static void match_rx_tx_data(const uint32_t *buf_tx, const uint32_t *buf_rx,
             g_assert_cmpint((buf_rx[size] & DLC_FD_BIT_MASK), ==,
                             (buf_tx[size] & DLC_FD_BIT_MASK));
         } else {
-            if (!is_canfd_frame && size == 4) {
-                break;
-            }
-
             g_assert_cmpint(buf_rx[size], ==, buf_tx[size]);
         }
 
@@ -XXX,XX +XXX,XX @@ static void test_can_data_transfer(void)
     write_data(qts, CANFD0_BASE_ADDR, buf_tx, false);
 
     send_data(qts, CANFD0_BASE_ADDR);
-    read_data(qts, CANFD1_BASE_ADDR, buf_rx);
+    read_data(qts, CANFD1_BASE_ADDR, buf_rx, CAN_FRAME_SIZE);
     match_rx_tx_data(buf_tx, buf_rx, false);
 
     qtest_quit(qts);
@@ -XXX,XX +XXX,XX @@ static void test_canfd_data_transfer(void)
     write_data(qts, CANFD0_BASE_ADDR, buf_tx, true);
 
     send_data(qts, CANFD0_BASE_ADDR);
-    read_data(qts, CANFD1_BASE_ADDR, buf_rx);
+    read_data(qts, CANFD1_BASE_ADDR, buf_rx, CANFD_FRAME_SIZE);
     match_rx_tx_data(buf_tx, buf_rx, true);
 
     qtest_quit(qts);
@@ -XXX,XX +XXX,XX @@ static void test_can_loopback(void)
     write_data(qts, CANFD0_BASE_ADDR, buf_tx, true);
 
     send_data(qts, CANFD0_BASE_ADDR);
-    read_data(qts, CANFD0_BASE_ADDR, buf_rx);
+    read_data(qts, CANFD0_BASE_ADDR, buf_rx, CANFD_FRAME_SIZE);
     match_rx_tx_data(buf_tx, buf_rx, true);
 
     generate_random_data(buf_tx, true);
@@ -XXX,XX +XXX,XX @@ static void test_can_loopback(void)
     write_data(qts, CANFD1_BASE_ADDR, buf_tx, true);
 
     send_data(qts, CANFD1_BASE_ADDR);
-    read_data(qts, CANFD1_BASE_ADDR, buf_rx);
+    read_data(qts, CANFD1_BASE_ADDR, buf_rx, CANFD_FRAME_SIZE);
     match_rx_tx_data(buf_tx, buf_rx, true);
 
     qtest_quit(qts);
-- 
2.34.1

From: Fabiano Rosas <farosas@suse.de>

This code is only relevant when TCG is present in the build. Building
with --disable-tcg --enable-xen on an x86 host we get:

$ ../configure --target-list=x86_64-softmmu,aarch64-softmmu --disable-tcg --enable-xen
$ make -j$(nproc)
...
libqemu-aarch64-softmmu.fa.p/target_arm_gdbstub.c.o: in function `m_sysreg_ptr':
 ../target/arm/gdbstub.c:358: undefined reference to `arm_v7m_get_sp_ptr'
 ../target/arm/gdbstub.c:361: undefined reference to `arm_v7m_get_sp_ptr'

libqemu-aarch64-softmmu.fa.p/target_arm_gdbstub.c.o: in function `arm_gdb_get_m_systemreg':
../target/arm/gdbstub.c:405: undefined reference to `arm_v7m_mrs_control'

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-id: 20230628164821.16771-1-farosas@suse.de
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/gdbstub.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/gdbstub.c
+++ b/target/arm/gdbstub.c
@@ -XXX,XX +XXX,XX @@ static int arm_gen_dynamic_sysreg_xml(CPUState *cs, int base_reg)
     return cpu->dyn_sysreg_xml.num;
 }
 
+#ifdef CONFIG_TCG
 typedef enum {
     M_SYSREG_MSP,
     M_SYSREG_PSP,
@@ -XXX,XX +XXX,XX @@ static int arm_gen_dynamic_m_secextreg_xml(CPUState *cs, int orig_base_reg)
     return cpu->dyn_m_secextreg_xml.num;
 }
 #endif
+#endif /* CONFIG_TCG */
 
 const char *arm_gdb_get_dynamic_xml(CPUState *cs, const char *xmlname)
 {
@@ -XXX,XX +XXX,XX @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
                              arm_gen_dynamic_sysreg_xml(cs, cs->gdb_num_regs),
                              "system-registers.xml", 0);
 
+#ifdef CONFIG_TCG
     if (arm_feature(env, ARM_FEATURE_M) && tcg_enabled()) {
         gdb_register_coprocessor(cs,
             arm_gdb_get_m_systemreg, arm_gdb_set_m_systemreg,
@@ -XXX,XX +XXX,XX @@ void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
         }
 #endif
     }
+#endif /* CONFIG_TCG */
 }
-- 
2.34.1

From: Akihiko Odaki <akihiko.odaki@daynix.com>

AwSRAMCClass is larger than SysBusDeviceClass so the class size must be
advertised accordingly.
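
The failure mode can be sketched with a simplified model (an
illustration under stated assumptions, not QOM's actual
implementation): if a type leaves its class size unset, the class
structure is allocated with the size inherited from its parent, so any
field the subclass adds lies beyond the allocation. All names below
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a type description. */
typedef struct SketchTypeInfo {
    const struct SketchTypeInfo *parent;
    size_t class_size;              /* 0 means "not set" */
} SketchTypeInfo;

/* Hypothetical parent class and a child class that adds one field. */
typedef struct SketchParentClass {
    void *slots[4];
} SketchParentClass;

typedef struct SketchChildClass {
    SketchParentClass parent_class;
    unsigned extra_field;
} SketchChildClass;

/*
 * Size used to allocate the class structure in this model: the type's
 * own class_size if set, otherwise the nearest ancestor's.
 */
static size_t sketch_class_alloc_size(const SketchTypeInfo *ti)
{
    assert(ti != NULL);
    while (ti && ti->class_size == 0) {
        ti = ti->parent;
    }
    return ti ? ti->class_size : 0;
}
```

In this model a child type with class_size left at zero gets an
allocation of sizeof(SketchParentClass), which is too small to hold
extra_field; setting class_size to sizeof(SketchChildClass) removes
the mismatch.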

Fixes: 05def917e1 ("hw: arm: allwinner-sramc: Add SRAM Controller support for R40")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230628110905.38125-1-akihiko.odaki@daynix.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/allwinner-sramc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/misc/allwinner-sramc.c b/hw/misc/allwinner-sramc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/allwinner-sramc.c
+++ b/hw/misc/allwinner-sramc.c
@@ -XXX,XX +XXX,XX @@ static const TypeInfo allwinner_sramc_info = {
     .parent        = TYPE_SYS_BUS_DEVICE,
     .instance_init = allwinner_sramc_init,
     .instance_size = sizeof(AwSRAMCState),
+    .class_size    = sizeof(AwSRAMCClass),
     .class_init    = allwinner_sramc_class_init,
 };
 
-- 
2.34.1

In handle_interrupt() we use level as an index into the interrupt_vector[]
array. This is safe because we have checked it against env->config->nlevel,
but Coverity can't see that (and it is only true because each CPU config
sets its XCHAL_NUM_INTLEVELS to something less than MAX_NLEVELS), so it
complains about a possible array overrun (CID 1507131).

Add an assert() which will make Coverity happy and catch the unlikely
case of a mis-set XCHAL_NUM_INTLEVELS in future.
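
A minimal sketch of this kind of bounds assertion, with the bound
counted in array elements rather than bytes. The structure, macro and
function names are hypothetical stand-ins, not the xtensa definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NLEVEL 6     /* hypothetical limit */

/* Element count of a true array (not a pointer). */
#define SKETCH_N_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for a per-CPU configuration. */
typedef struct SketchConfig {
    unsigned nlevel;
    uint32_t interrupt_vector[SKETCH_MAX_NLEVEL + 1];
} SketchConfig;

/*
 * Look up the vector for an interrupt level. The assert documents the
 * invariant for static analysers and traps a misconfigured nlevel.
 */
static uint32_t sketch_vector_for_level(const SketchConfig *cfg,
                                        unsigned level)
{
    assert(level < SKETCH_N_ELEMS(cfg->interrupt_vector));
    return cfg->interrupt_vector[level];
}
```

Comparing against sizeof(array) alone would give a bound in bytes,
which for a uint32_t array is four times the number of valid indices.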

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Message-id: 20230623154135.1930261-1-peter.maydell@linaro.org
---
 target/xtensa/exc_helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/xtensa/exc_helper.c b/target/xtensa/exc_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/xtensa/exc_helper.c
+++ b/target/xtensa/exc_helper.c
@@ -XXX,XX +XXX,XX @@ static void handle_interrupt(CPUXtensaState *env)
         CPUState *cs = env_cpu(env);
 
         if (level > 1) {
+            /* env->config->nlevel check should have ensured this */
+            assert(level < ARRAY_SIZE(env->config->interrupt_vector));
+
             env->sregs[EPC1 + level - 1] = env->pc;
             env->sregs[EPS2 + level - 2] = env->sregs[PS];
             env->sregs[PS] =
-- 
2.34.1