Series comparison

-[PULL 00/23] target-arm queue
+[PULL 00/14] target-arm queue
-Mostly my decodetree stuff, but also some patches for various
+The following changes since commit 8f6330a807f2642dc2a3cdf33347aa28a4c00a87:
 smaller bugs/features from others.
-thanks
+  Merge tag 'pull-maintainer-updates-060324-1' of https://gitlab.com/stsquad/qemu into staging (2024-03-06 16:56:20 +0000)
 -- PMM
 The following changes since commit 53550e81e2cafe7c03a39526b95cd21b5194d9b1:
   Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2020-06-15 16:36:34 +0100)
 are available in the Git repository at:
-  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200616
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240308
-for you to fetch changes up to 64b397417a26509bcdff44ab94356a35c7901c79:
+for you to fetch changes up to bbf6c6dbead82292a20951eb1204442a6b838de9:
-  hw: arm: Set vendor property for IMX SDHCI emulations (2020-06-16 10:32:29 +0100)
+  target/arm: Move v7m-related code from cpu32.c into a separate file (2024-03-08 14:45:03 +0000)
 ----------------------------------------------------------------
- * hw: arm: Set vendor property for IMX SDHCI emulations
+target-arm queue:
- * sd: sdhci: Implement basic vendor specific register support
+ * Implement FEAT_ECV
- * hw/net/imx_fec: Convert debug fprintf() to trace events
+ * STM32L4x5: Implement GPIO device
- * target/arm/cpu: adjust virtual time for all KVM arm cpus
+ * Fix 32-bit SMOPA
- * Implement configurable descriptor size in ftgmac100
+ * Refactor v7m related code from cpu32.c into its own file
- * hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+ * hw/rtc/sun4v-rtc: Relicense to GPLv2-or-later
  * target/arm: More Neon decodetree conversion work
 ----------------------------------------------------------------
-Erik Smit (1):
+Inès Varhol (3):
-      Implement configurable descriptor size in ftgmac100
+      hw/gpio: Implement STM32L4x5 GPIO
       hw/arm: Connect STM32L4x5 GPIO to STM32L4x5 SoC
       tests/qtest: Add STM32L4x5 GPIO QTest testcase
-Guenter Roeck (2):
+Peter Maydell (9):
-      sd: sdhci: Implement basic vendor specific register support
+      target/arm: Move some register related defines to internals.h
-      hw: arm: Set vendor property for IMX SDHCI emulations
+      target/arm: Timer _EL02 registers UNDEF for E2H == 0
       target/arm: use FIELD macro for CNTHCTL bit definitions
       target/arm: Don't allow RES0 CNTHCTL_EL2 bits to be written
       target/arm: Implement new FEAT_ECV trap bits
       target/arm: Define CNTPCTSS_EL0 and CNTVCTSS_EL0
       target/arm: Implement FEAT_ECV CNTPOFF_EL2 handling
       target/arm: Enable FEAT_ECV for 'max' CPU
       hw/rtc/sun4v-rtc: Relicense to GPLv2-or-later
-Jean-Christophe Dubois (2):
+Richard Henderson (1):
-      hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+      target/arm: Fix 32-bit SMOPA
       hw/net/imx_fec: Convert debug fprintf() to trace events
-Peter Maydell (17):
+Thomas Huth (1):
-      target/arm: Fix missing temp frees in do_vshll_2sh
+      target/arm: Move v7m-related code from cpu32.c into a separate file
       target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
       target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
       target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
       target/arm: Convert Neon 3-reg-diff long multiplies
       target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
       target/arm: Convert Neon 3-reg-diff polynomial VMULL
       target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
       target/arm: Add missing TCG temp free in do_2shift_env_64()
       target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
       target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
       target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
       target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
       target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
       target/arm: Convert Neon VEXT to decodetree
       target/arm: Convert Neon VTBL, VTBX to decodetree
       target/arm: Convert Neon VDUP (scalar) to decodetree
-fangying (1):
+ MAINTAINERS                        |   1 +
-      target/arm/cpu: adjust virtual time for all KVM arm cpus
+ docs/system/arm/b-l475e-iot01a.rst |   2 +-
  docs/system/arm/emulation.rst      |   1 +
  include/hw/arm/stm32l4x5_soc.h     |   2 +
  include/hw/gpio/stm32l4x5_gpio.h   |  71 +++++
  include/hw/misc/stm32l4x5_syscfg.h |   3 +-
  include/hw/rtc/sun4v-rtc.h         |   2 +-
  target/arm/cpu-features.h          |  10 +
  target/arm/cpu.h                   | 129 +--------
  target/arm/internals.h             | 151 ++++++++++
  hw/arm/stm32l4x5_soc.c             |  71 ++++-
  hw/gpio/stm32l4x5_gpio.c           | 477 ++++++++++++++++++++++++++++++++
  hw/misc/stm32l4x5_syscfg.c         |   1 +
  hw/rtc/sun4v-rtc.c                 |   2 +-
  target/arm/helper.c                | 189 ++++++++++++-
  target/arm/tcg/cpu-v7m.c           | 290 +++++++++++++++++++
  target/arm/tcg/cpu32.c             | 261 ------------------
  target/arm/tcg/cpu64.c             |   1 +
  target/arm/tcg/sme_helper.c        |  77 +++---
  tests/qtest/stm32l4x5_gpio-test.c  | 551 +++++++++++++++++++++++++++++++++++++
  tests/tcg/aarch64/sme-smopa-1.c    |  47 ++++
  tests/tcg/aarch64/sme-smopa-2.c    |  54 ++++
  hw/arm/Kconfig                     |   3 +-
  hw/gpio/Kconfig                    |   3 +
  hw/gpio/meson.build                |   1 +
  hw/gpio/trace-events               |   6 +
  target/arm/meson.build             |   3 +
  target/arm/tcg/meson.build         |   3 +
  target/arm/trace-events            |   1 +
  tests/qtest/meson.build            |   3 +-
  tests/tcg/aarch64/Makefile.target  |   2 +-
 files changed, 1962 insertions(+), 456 deletions(-)
  create mode 100644 include/hw/gpio/stm32l4x5_gpio.h
  create mode 100644 hw/gpio/stm32l4x5_gpio.c
  create mode 100644 target/arm/tcg/cpu-v7m.c
  create mode 100644 tests/qtest/stm32l4x5_gpio-test.c
  create mode 100644 tests/tcg/aarch64/sme-smopa-1.c
  create mode 100644 tests/tcg/aarch64/sme-smopa-2.c
- hw/sd/sdhci-internal.h          |    5 +
- include/hw/sd/sdhci.h           |    5 +
- target/arm/translate.h          |    1 +
- target/arm/neon-dp.decode       |  130 +++++
- hw/arm/fsl-imx25.c              |    6 +
- hw/arm/fsl-imx6.c               |    6 +
- hw/arm/fsl-imx6ul.c             |    2 +
- hw/arm/fsl-imx7.c               |    2 +
- hw/misc/imx6ul_ccm.c            |   76 ++-
- hw/net/ftgmac100.c              |   26 +-
- hw/net/imx_fec.c                |  106 ++--
- hw/sd/sdhci.c                   |   18 +-
- target/arm/cpu.c                |    6 +-
- target/arm/cpu64.c              |    1 -
- target/arm/kvm.c                |   21 +-
- target/arm/translate-neon.inc.c | 1148 ++++++++++++++++++++++++++++++++++++++-
- target/arm/translate.c          |  684 +----------------------
- hw/net/trace-events             |   18 +
-files changed, 1495 insertions(+), 766 deletions(-)

-[PULL 01/23] target/arm: Fix missing temp frees in do_vshll_2sh
+Deleted patch
-The widenfn() in do_vshll_2sh() does not free the input 32-bit
-TCGv, so we need to do this in the calling code.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
----
- target/arm/translate-neon.inc.c | 2 ++
-file changed, 2 insertions(+)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
-     tmp = tcg_temp_new_i64();
-     widenfn(tmp, rm0);
-+    tcg_temp_free_i32(rm0);
-     if (a->shift != 0) {
-         tcg_gen_shli_i64(tmp, tmp, a->shift);
-         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
-     neon_store_reg64(tmp, a->vd);
-     widenfn(tmp, rm1);
-+    tcg_temp_free_i32(rm1);
-     if (a->shift != 0) {
-         tcg_gen_shli_i64(tmp, tmp, a->shift);
-         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
---
-.20.1

-[PULL 02/23] target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
+Deleted patch
-Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
-in the Neon 3-registers-different-lengths group to decodetree.
-These insns work by widening one or both inputs to double their
-size, performing an add or subtract at the doubled size and
-then storing the double-size result.
-As usual, rather than copying the loop of the original decoder
-(which needs awkward code to avoid problems when source and
-destination registers overlap) we just unroll the two passes.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/neon-dp.decode       |  43 +++++++++++++
- target/arm/translate-neon.inc.c | 104 ++++++++++++++++++++++++++++++++
- target/arm/translate.c          |  16 ++---
-files changed, 151 insertions(+), 12 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ VCVT_FU_2sh      1111 001 1 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
- # So we have a single decode line and check the cmode/op in the
- # trans function.
- Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-+
-+######################################################################
-+# Within the "two registers, or three registers of different lengths"
-+# grouping ([23,4]=0b10), bits [21:20] are either part of the opcode
-+# decode: 0b11 for VEXT, two-reg-misc, VTBL, and duplicate-scalar;
-+# or they are a size field for the three-reg-different-lengths and
-+# two-reg-and-scalar insn groups (where size cannot be 0b11). This
-+# is slightly awkward for decodetree: we handle it with this
-+# non-exclusive group which contains within it two exclusive groups:
-+# one for the size=0b11 patterns, and one for the size-not-0b11
-+# patterns. This allows us to check that none of the insns within
-+# each subgroup accidentally overlap each other. Note that all the
-+# trans functions for the size-not-0b11 patterns must check and
-+# return false for size==3.
-+######################################################################
-+{
-+  # 0b11 subgroup will go here
-+
-+  # Subgroup for size != 0b11
-+  [
-+    ##################################################################
-+    # 3-reg-different-length grouping:
-+    # 1111 001 U 1 D sz!=11 Vn:4 Vd:4 opc:4 N 0 M 0 Vm:4
-+    ##################################################################
-+
-+    &3diff vm vn vd size
-+
-+    @3diff       .... ... . . . size:2 .... .... .... . . . . .... \
-+                 &3diff vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
-+    VADDL_S_3d   1111 001 0 1 . .. .... .... 0000 . 0 . 0 .... @3diff
-+    VADDL_U_3d   1111 001 1 1 . .. .... .... 0000 . 0 . 0 .... @3diff
-+
-+    VADDW_S_3d   1111 001 0 1 . .. .... .... 0001 . 0 . 0 .... @3diff
-+    VADDW_U_3d   1111 001 1 1 . .. .... .... 0001 . 0 . 0 .... @3diff
-+
-+    VSUBL_S_3d   1111 001 0 1 . .. .... .... 0010 . 0 . 0 .... @3diff
-+    VSUBL_U_3d   1111 001 1 1 . .. .... .... 0010 . 0 . 0 .... @3diff
-+
-+    VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
-+    VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
-+  ]
-+}
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
-     }
-     return do_1reg_imm(s, a, fn);
- }
-+
-+static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
-+                           NeonGenWidenFn *widenfn,
-+                           NeonGenTwo64OpFn *opfn,
-+                           bool src1_wide)
-+{
-+    /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
-+    TCGv_i64 rn0_64, rn1_64, rm_64;
-+    TCGv_i32 rm;
-+
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if (!widenfn || !opfn) {
-+        /* size == 3 case, which is an entirely different insn group */
-+        return false;
-+    }
-+
-+    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    rn0_64 = tcg_temp_new_i64();
-+    rn1_64 = tcg_temp_new_i64();
-+    rm_64 = tcg_temp_new_i64();
-+
-+    if (src1_wide) {
-+        neon_load_reg64(rn0_64, a->vn);
-+    } else {
-+        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
-+        widenfn(rn0_64, tmp);
-+        tcg_temp_free_i32(tmp);
-+    }
-+    rm = neon_load_reg(a->vm, 0);
-+
-+    widenfn(rm_64, rm);
-+    tcg_temp_free_i32(rm);
-+    opfn(rn0_64, rn0_64, rm_64);
-+
-+    /*
-+     * Load second pass inputs before storing the first pass result, to
-+     * avoid incorrect results if a narrow input overlaps with the result.
-+     */
-+    if (src1_wide) {
-+        neon_load_reg64(rn1_64, a->vn + 1);
-+    } else {
-+        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
-+        widenfn(rn1_64, tmp);
-+        tcg_temp_free_i32(tmp);
-+    }
-+    rm = neon_load_reg(a->vm, 1);
-+
-+    neon_store_reg64(rn0_64, a->vd);
-+
-+    widenfn(rm_64, rm);
-+    tcg_temp_free_i32(rm);
-+    opfn(rn1_64, rn1_64, rm_64);
-+    neon_store_reg64(rn1_64, a->vd + 1);
-+
-+    tcg_temp_free_i64(rn0_64);
-+    tcg_temp_free_i64(rn1_64);
-+    tcg_temp_free_i64(rm_64);
-+
-+    return true;
-+}
-+
-+#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
-+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
-+    {                                                                   \
-+        static NeonGenWidenFn * const widenfn[] = {                     \
-+            gen_helper_neon_widen_##S##8,                               \
-+            gen_helper_neon_widen_##S##16,                              \
-+            tcg_gen_##EXT##_i32_i64,                                    \
-+            NULL,                                                       \
-+        };                                                              \
-+        static NeonGenTwo64OpFn * const addfn[] = {                     \
-+            gen_helper_neon_##OP##l_u16,                                \
-+            gen_helper_neon_##OP##l_u32,                                \
-+            tcg_gen_##OP##_i64,                                         \
-+            NULL,                                                       \
-+        };                                                              \
-+        return do_prewiden_3d(s, a, widenfn[a->size],                   \
-+                              addfn[a->size], SRC1WIDE);                \
-+    }
-+
-+DO_PREWIDEN(VADDL_S, s, ext, add, false)
-+DO_PREWIDEN(VADDL_U, u, extu, add, false)
-+DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
-+DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
-+DO_PREWIDEN(VADDW_S, s, ext, add, true)
-+DO_PREWIDEN(VADDW_U, u, extu, add, true)
-+DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
-+DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                 /* Three registers of different lengths.  */
-                 int src1_wide;
-                 int src2_wide;
--                int prewiden;
-                 /* undefreq: bit 0 : UNDEF if size == 0
-                  *           bit 1 : UNDEF if size == 1
-                  *           bit 2 : UNDEF if size == 2
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                 int undefreq;
-                 /* prewiden, src1_wide, src2_wide, undefreq */
-                 static const int neon_3reg_wide[16][4] = {
--                    {1, 0, 0, 0}, /* VADDL */
--                    {1, 1, 0, 0}, /* VADDW */
--                    {1, 0, 0, 0}, /* VSUBL */
--                    {1, 1, 0, 0}, /* VSUBW */
-+                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
-+                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
-+                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
-+                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
-                     {0, 1, 1, 0}, /* VADDHN */
-                     {0, 0, 0, 0}, /* VABAL */
-                     {0, 1, 1, 0}, /* VSUBHN */
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
-                 };
--                prewiden = neon_3reg_wide[op][0];
-                 src1_wide = neon_3reg_wide[op][1];
-                 src2_wide = neon_3reg_wide[op][2];
-                 undefreq = neon_3reg_wide[op][3];
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                         } else {
-                             tmp = neon_load_reg(rn, pass);
-                         }
--                        if (prewiden) {
--                            gen_neon_widen(cpu_V0, tmp, size, u);
--                        }
-                     }
-                     if (src2_wide) {
-                         neon_load_reg64(cpu_V1, rm + pass);
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                         } else {
-                             tmp2 = neon_load_reg(rm, pass);
-                         }
--                        if (prewiden) {
--                            gen_neon_widen(cpu_V1, tmp2, size, u);
--                        }
-                     }
-                     switch (op) {
-                     case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
---
-.20.1
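
Note on the pre-widening scheme described in the commit message above: the standalone C sketch below models a signed 16-bit VADDL on a plain array of 64-bit values. It is not QEMU code. `RegFile`, `load_half`, `widen_s16`, `addl_u32` and `vaddl_s16_model` are hypothetical names chosen for illustration, and the register file is assumed to be a flat array of D registers. The sketch only aims to show why the two passes are unrolled and why the second pass inputs are read before the first pass result is stored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a D-register file: 32 registers of 64 bits each. */
typedef struct {
    uint64_t d[32];
} RegFile;

/* Read 32-bit half 'pass' (0 = low, 1 = high) of D register 'reg'. */
static uint32_t load_half(const RegFile *rf, int reg, int pass)
{
    return (uint32_t)(rf->d[reg] >> (32 * pass));
}

/* Sign-extend two packed 16-bit lanes into two packed 32-bit lanes. */
static uint64_t widen_s16(uint32_t x)
{
    uint64_t lo = (uint32_t)(int32_t)(int16_t)(x & 0xffff);
    uint64_t hi = (uint32_t)(int32_t)(int16_t)(x >> 16);
    return lo | (hi << 32);
}

/* Lane-wise add of two packed 32-bit lanes; each lane wraps on overflow. */
static uint64_t addl_u32(uint64_t a, uint64_t b)
{
    uint64_t lo = (uint32_t)((uint32_t)a + (uint32_t)b);
    uint64_t hi = (uint32_t)((uint32_t)(a >> 32) + (uint32_t)(b >> 32));
    return lo | (hi << 32);
}

/* Model of VADDL.S16 Qd, Dn, Dm, where Qd is the pair D[vd], D[vd + 1]. */
static void vaddl_s16_model(RegFile *rf, int vd, int vn, int vm)
{
    /* The wide destination must be an even-numbered D register pair. */
    assert(vd >= 0 && vd < 31 && (vd & 1) == 0);

    /* Pass 0: widen the low halves and add at the doubled size. */
    uint64_t r0 = addl_u32(widen_s16(load_half(rf, vn, 0)),
                           widen_s16(load_half(rf, vm, 0)));

    /*
     * Read the pass 1 inputs before storing the pass 0 result, so a
     * narrow source that overlaps the destination is not clobbered.
     */
    uint32_t n1 = load_half(rf, vn, 1);
    uint32_t m1 = load_half(rf, vm, 1);

    rf->d[vd] = r0;
    rf->d[vd + 1] = addl_u32(widen_s16(n1), widen_s16(m1));
}
```

Calling `vaddl_s16_model(&rf, 0, 0, 1)` makes the destination pair overlap both narrow sources. If the store to `d[vd]` were moved above the two `load_half` calls for pass 1, the high half of the first source would already have been overwritten.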

-[PULL 17/23] target/arm: Convert Neon VDUP (scalar) to decodetree
+[PULL 01/14] target/arm: Move some register related defines to internals.h
-Convert the Neon VDUP (scalar) insn to decodetree.  (Note that we
+cpu.h has a lot of #defines relating to CPU register fields.
-can't call this just "VDUP" as we used that already in vfp.decode for
+Most of these aren't actually used outside target/arm code,
-the "VDUP (general purpose register)" insn.)
+so there's no point in cluttering up the cpu.h file with them.
 Move some easy ones to internals.h.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240301183219.2424889-2-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  7 +++++++
+ target/arm/cpu.h       | 128 -----------------------------------------
- target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
+ target/arm/internals.h | 128 +++++++++++++++++++++++++++++++++++++++++
- target/arm/translate.c          | 25 +------------------------
+files changed, 128 insertions(+), 128 deletions(-)
 files changed, 34 insertions(+), 24 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/cpu.h
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/cpu.h
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ typedef struct ARMGenericTimer {
+     uint64_t ctl; /* Timer Control register */
-     VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
+ } ARMGenericTimer;
-                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
-+
+-#define VTCR_NSW (1u << 29)
-+    VDUP_scalar  1111 001 1 1 . 11 index:3 1 .... 11 000 q:1 . 0 .... \
+-#define VTCR_NSA (1u << 30)
-+                 vm=%vm_dp vd=%vd_dp size=0
+-#define VSTCR_SW VTCR_NSW
-+    VDUP_scalar  1111 001 1 1 . 11 index:2 10 .... 11 000 q:1 . 0 .... \
+-#define VSTCR_SA VTCR_NSA
-+                 vm=%vm_dp vd=%vd_dp size=1
+-
-+    VDUP_scalar  1111 001 1 1 . 11 index:1 100 .... 11 000 q:1 . 0 .... \
+ /* Define a maximum sized vector register.
-+                 vm=%vm_dp vd=%vd_dp size=2
+  * For 32-bit, this is a 128-bit NEON/AdvSIMD register.
-   ]
+  * For 64-bit, this is a 2048-bit SVE register.
+@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
-   # Subgroup for size != 0b11
+ #define SCTLR_SPINTMASK (1ULL << 62) /* FEAT_NMI */
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+ #define SCTLR_TIDCP   (1ULL << 63) /* FEAT_TIDCP1 */
 -/* Bit definitions for CPACR (AArch32 only) */
 -FIELD(CPACR, CP10, 20, 2)
 -FIELD(CPACR, CP11, 22, 2)
 -FIELD(CPACR, TRCDIS, 28, 1)    /* matches CPACR_EL1.TTA */
 -FIELD(CPACR, D32DIS, 30, 1)    /* up to v7; RAZ in v8 */
 -FIELD(CPACR, ASEDIS, 31, 1)
 -
 -/* Bit definitions for CPACR_EL1 (AArch64 only) */
 -FIELD(CPACR_EL1, ZEN, 16, 2)
 -FIELD(CPACR_EL1, FPEN, 20, 2)
 -FIELD(CPACR_EL1, SMEN, 24, 2)
 -FIELD(CPACR_EL1, TTA, 28, 1)   /* matches CPACR.TRCDIS */
 -
 -/* Bit definitions for HCPTR (AArch32 only) */
 -FIELD(HCPTR, TCP10, 10, 1)
 -FIELD(HCPTR, TCP11, 11, 1)
 -FIELD(HCPTR, TASE, 15, 1)
 -FIELD(HCPTR, TTA, 20, 1)
 -FIELD(HCPTR, TAM, 30, 1)       /* matches CPTR_EL2.TAM */
 -FIELD(HCPTR, TCPAC, 31, 1)     /* matches CPTR_EL2.TCPAC */
 -
 -/* Bit definitions for CPTR_EL2 (AArch64 only) */
 -FIELD(CPTR_EL2, TZ, 8, 1)      /* !E2H */
 -FIELD(CPTR_EL2, TFP, 10, 1)    /* !E2H, matches HCPTR.TCP10 */
 -FIELD(CPTR_EL2, TSM, 12, 1)    /* !E2H */
 -FIELD(CPTR_EL2, ZEN, 16, 2)    /* E2H */
 -FIELD(CPTR_EL2, FPEN, 20, 2)   /* E2H */
 -FIELD(CPTR_EL2, SMEN, 24, 2)   /* E2H */
 -FIELD(CPTR_EL2, TTA, 28, 1)
 -FIELD(CPTR_EL2, TAM, 30, 1)    /* matches HCPTR.TAM */
 -FIELD(CPTR_EL2, TCPAC, 31, 1)  /* matches HCPTR.TCPAC */
 -
 -/* Bit definitions for CPTR_EL3 (AArch64 only) */
 -FIELD(CPTR_EL3, EZ, 8, 1)
 -FIELD(CPTR_EL3, TFP, 10, 1)
 -FIELD(CPTR_EL3, ESM, 12, 1)
 -FIELD(CPTR_EL3, TTA, 20, 1)
 -FIELD(CPTR_EL3, TAM, 30, 1)
 -FIELD(CPTR_EL3, TCPAC, 31, 1)
 -
 -#define MDCR_MTPME    (1U << 28)
 -#define MDCR_TDCC     (1U << 27)
 -#define MDCR_HLP      (1U << 26)  /* MDCR_EL2 */
 -#define MDCR_SCCD     (1U << 23)  /* MDCR_EL3 */
 -#define MDCR_HCCD     (1U << 23)  /* MDCR_EL2 */
 -#define MDCR_EPMAD    (1U << 21)
 -#define MDCR_EDAD     (1U << 20)
 -#define MDCR_TTRF     (1U << 19)
 -#define MDCR_STE      (1U << 18)  /* MDCR_EL3 */
 -#define MDCR_SPME     (1U << 17)  /* MDCR_EL3 */
 -#define MDCR_HPMD     (1U << 17)  /* MDCR_EL2 */
 -#define MDCR_SDD      (1U << 16)
 -#define MDCR_SPD      (3U << 14)
 -#define MDCR_TDRA     (1U << 11)
 -#define MDCR_TDOSA    (1U << 10)
 -#define MDCR_TDA      (1U << 9)
 -#define MDCR_TDE      (1U << 8)
 -#define MDCR_HPME     (1U << 7)
 -#define MDCR_TPM      (1U << 6)
 -#define MDCR_TPMCR    (1U << 5)
 -#define MDCR_HPMN     (0x1fU)
 -
 -/* Not all of the MDCR_EL3 bits are present in the 32-bit SDCR */
 -#define SDCR_VALID_MASK (MDCR_MTPME | MDCR_TDCC | MDCR_SCCD | \
 -                         MDCR_EPMAD | MDCR_EDAD | MDCR_TTRF | \
 -                         MDCR_STE | MDCR_SPME | MDCR_SPD)
 -
  #define CPSR_M (0x1fU)
  #define CPSR_T (1U << 5)
  #define CPSR_F (1U << 6)
@@ -XXX,XX +XXX,XX @@ FIELD(CPTR_EL3, TCPAC, 31, 1)
  #define XPSR_NZCV CPSR_NZCV
  #define XPSR_IT CPSR_IT
 -#define TTBCR_N      (7U << 0) /* TTBCR.EAE==0 */
 -#define TTBCR_T0SZ   (7U << 0) /* TTBCR.EAE==1 */
 -#define TTBCR_PD0    (1U << 4)
 -#define TTBCR_PD1    (1U << 5)
 -#define TTBCR_EPD0   (1U << 7)
 -#define TTBCR_IRGN0  (3U << 8)
 -#define TTBCR_ORGN0  (3U << 10)
 -#define TTBCR_SH0    (3U << 12)
 -#define TTBCR_T1SZ   (3U << 16)
 -#define TTBCR_A1     (1U << 22)
 -#define TTBCR_EPD1   (1U << 23)
 -#define TTBCR_IRGN1  (3U << 24)
 -#define TTBCR_ORGN1  (3U << 26)
 -#define TTBCR_SH1    (1U << 28)
 -#define TTBCR_EAE    (1U << 31)
 -
 -FIELD(VTCR, T0SZ, 0, 6)
 -FIELD(VTCR, SL0, 6, 2)
 -FIELD(VTCR, IRGN0, 8, 2)
 -FIELD(VTCR, ORGN0, 10, 2)
 -FIELD(VTCR, SH0, 12, 2)
 -FIELD(VTCR, TG0, 14, 2)
 -FIELD(VTCR, PS, 16, 3)
 -FIELD(VTCR, VS, 19, 1)
 -FIELD(VTCR, HA, 21, 1)
 -FIELD(VTCR, HD, 22, 1)
 -FIELD(VTCR, HWU59, 25, 1)
 -FIELD(VTCR, HWU60, 26, 1)
 -FIELD(VTCR, HWU61, 27, 1)
 -FIELD(VTCR, HWU62, 28, 1)
 -FIELD(VTCR, NSW, 29, 1)
 -FIELD(VTCR, NSA, 30, 1)
 -FIELD(VTCR, DS, 32, 1)
 -FIELD(VTCR, SL2, 33, 1)
 -
  /* Bit definitions for ARMv8 SPSR (PSTATE) format.
   * Only these are valid when in AArch64 mode; in
   * AArch32 mode SPSRs are basically CPSR-format.
@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
  #define HCR_TWEDEN    (1ULL << 59)
  #define HCR_TWEDEL    MAKE_64BIT_MASK(60, 4)
 -#define HCRX_ENAS0    (1ULL << 0)
 -#define HCRX_ENALS    (1ULL << 1)
 -#define HCRX_ENASR    (1ULL << 2)
 -#define HCRX_FNXS     (1ULL << 3)
 -#define HCRX_FGTNXS   (1ULL << 4)
 -#define HCRX_SMPME    (1ULL << 5)
 -#define HCRX_TALLINT  (1ULL << 6)
 -#define HCRX_VINMI    (1ULL << 7)
 -#define HCRX_VFNMI    (1ULL << 8)
 -#define HCRX_CMOW     (1ULL << 9)
 -#define HCRX_MCE2     (1ULL << 10)
 -#define HCRX_MSCEN    (1ULL << 11)
 -
 -#define HPFAR_NS      (1ULL << 63)
 -
  #define SCR_NS                (1ULL << 0)
  #define SCR_IRQ               (1ULL << 1)
  #define SCR_FIQ               (1ULL << 2)
@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
  #define SCR_GPF               (1ULL << 48)
  #define SCR_NSE               (1ULL << 62)
 -#define HSTR_TTEE (1 << 16)
 -#define HSTR_TJDBX (1 << 17)
 -
 -#define CNTHCTL_CNTVMASK      (1 << 18)
 -#define CNTHCTL_CNTPMASK      (1 << 19)
 -
  /* Return the current FPSCR value.  */
  uint32_t vfp_get_fpscr(CPUARMState *env);
  void vfp_set_fpscr(CPUARMState *env, uint32_t val);
 diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/internals.h
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
+@@ -XXX,XX +XXX,XX @@ FIELD(DBGWCR, WT, 20, 1)
-     tcg_temp_free_i32(tmp);
+ FIELD(DBGWCR, MASK, 24, 5)
-     return true;
+ FIELD(DBGWCR, SSCE, 29, 1)
- }
-+
++#define VTCR_NSW (1u << 29)
-+static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
++#define VTCR_NSA (1u << 30)
-+{
++#define VSTCR_SW VTCR_NSW
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
++#define VSTCR_SA VTCR_NSA
-+        return false;
++
-+    }
++/* Bit definitions for CPACR (AArch32 only) */
-+
++FIELD(CPACR, CP10, 20, 2)
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++FIELD(CPACR, CP11, 22, 2)
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
++FIELD(CPACR, TRCDIS, 28, 1)    /* matches CPACR_EL1.TTA */
-+        ((a->vd | a->vm) & 0x10)) {
++FIELD(CPACR, D32DIS, 30, 1)    /* up to v7; RAZ in v8 */
-+        return false;
++FIELD(CPACR, ASEDIS, 31, 1)
-+    }
++
-+
++/* Bit definitions for CPACR_EL1 (AArch64 only) */
-+    if (a->vd & a->q) {
++FIELD(CPACR_EL1, ZEN, 16, 2)
-+        return false;
++FIELD(CPACR_EL1, FPEN, 20, 2)
-+    }
++FIELD(CPACR_EL1, SMEN, 24, 2)
-+
++FIELD(CPACR_EL1, TTA, 28, 1)   /* matches CPACR.TRCDIS */
-+    if (!vfp_access_check(s)) {
++
-+        return true;
++/* Bit definitions for HCPTR (AArch32 only) */
-+    }
++FIELD(HCPTR, TCP10, 10, 1)
-+
++FIELD(HCPTR, TCP11, 11, 1)
-+    tcg_gen_gvec_dup_mem(a->size, neon_reg_offset(a->vd, 0),
++FIELD(HCPTR, TASE, 15, 1)
-+                         neon_element_offset(a->vm, a->index, a->size),
++FIELD(HCPTR, TTA, 20, 1)
-+                         a->q ? 16 : 8, a->q ? 16 : 8);
++FIELD(HCPTR, TAM, 30, 1)       /* matches CPTR_EL2.TAM */
-+    return true;
++FIELD(HCPTR, TCPAC, 31, 1)     /* matches CPTR_EL2.TCPAC */
-+}
++
-diff --git a/target/arm/translate.c b/target/arm/translate.c
++/* Bit definitions for CPTR_EL2 (AArch64 only) */
-index XXXXXXX..XXXXXXX 100644
++FIELD(CPTR_EL2, TZ, 8, 1)      /* !E2H */
---- a/target/arm/translate.c
++FIELD(CPTR_EL2, TFP, 10, 1)    /* !E2H, matches HCPTR.TCP10 */
-+++ b/target/arm/translate.c
++FIELD(CPTR_EL2, TSM, 12, 1)    /* !E2H */
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
++FIELD(CPTR_EL2, ZEN, 16, 2)    /* E2H */
-                     }
++FIELD(CPTR_EL2, FPEN, 20, 2)   /* E2H */
-                     break;
++FIELD(CPTR_EL2, SMEN, 24, 2)   /* E2H */
-                 }
++FIELD(CPTR_EL2, TTA, 28, 1)
--            } else if ((insn & (1 << 10)) == 0) {
++FIELD(CPTR_EL2, TAM, 30, 1)    /* matches HCPTR.TAM */
--                /* VTBL, VTBX: handled by decodetree */
++FIELD(CPTR_EL2, TCPAC, 31, 1)  /* matches HCPTR.TCPAC */
--                return 1;
++
--            } else if ((insn & 0x380) == 0) {
++/* Bit definitions for CPTR_EL3 (AArch64 only) */
--                /* VDUP */
++FIELD(CPTR_EL3, EZ, 8, 1)
--                int element;
++FIELD(CPTR_EL3, TFP, 10, 1)
--                MemOp size;
++FIELD(CPTR_EL3, ESM, 12, 1)
--
++FIELD(CPTR_EL3, TTA, 20, 1)
--                if ((insn & (7 << 16)) == 0 || (q && (rd & 1))) {
++FIELD(CPTR_EL3, TAM, 30, 1)
--                    return 1;
++FIELD(CPTR_EL3, TCPAC, 31, 1)
--                }
++
--                if (insn & (1 << 16)) {
++#define MDCR_MTPME    (1U << 28)
--                    size = MO_8;
++#define MDCR_TDCC     (1U << 27)
--                    element = (insn >> 17) & 7;
++#define MDCR_HLP      (1U << 26)  /* MDCR_EL2 */
--                } else if (insn & (1 << 17)) {
++#define MDCR_SCCD     (1U << 23)  /* MDCR_EL3 */
--                    size = MO_16;
++#define MDCR_HCCD     (1U << 23)  /* MDCR_EL2 */
--                    element = (insn >> 18) & 3;
++#define MDCR_EPMAD    (1U << 21)
--                } else {
++#define MDCR_EDAD     (1U << 20)
--                    size = MO_32;
++#define MDCR_TTRF     (1U << 19)
--                    element = (insn >> 19) & 1;
++#define MDCR_STE      (1U << 18)  /* MDCR_EL3 */
--                }
++#define MDCR_SPME     (1U << 17)  /* MDCR_EL3 */
--                tcg_gen_gvec_dup_mem(size, neon_reg_offset(rd, 0),
++#define MDCR_HPMD     (1U << 17)  /* MDCR_EL2 */
--                                     neon_element_offset(rm, element, size),
++#define MDCR_SDD      (1U << 16)
--                                     q ? 16 : 8, q ? 16 : 8);
++#define MDCR_SPD      (3U << 14)
-             } else {
++#define MDCR_TDRA     (1U << 11)
-+                /* VTBL, VTBX, VDUP: handled by decodetree */
++#define MDCR_TDOSA    (1U << 10)
-                 return 1;
++#define MDCR_TDA      (1U << 9)
-             }
++#define MDCR_TDE      (1U << 8)
-         }
++#define MDCR_HPME     (1U << 7)
 +#define MDCR_TPM      (1U << 6)
 +#define MDCR_TPMCR    (1U << 5)
 +#define MDCR_HPMN     (0x1fU)
 +
 +/* Not all of the MDCR_EL3 bits are present in the 32-bit SDCR */
 +#define SDCR_VALID_MASK (MDCR_MTPME | MDCR_TDCC | MDCR_SCCD | \
 +                         MDCR_EPMAD | MDCR_EDAD | MDCR_TTRF | \
 +                         MDCR_STE | MDCR_SPME | MDCR_SPD)
 +
 +#define TTBCR_N      (7U << 0) /* TTBCR.EAE==0 */
 +#define TTBCR_T0SZ   (7U << 0) /* TTBCR.EAE==1 */
 +#define TTBCR_PD0    (1U << 4)
 +#define TTBCR_PD1    (1U << 5)
 +#define TTBCR_EPD0   (1U << 7)
 +#define TTBCR_IRGN0  (3U << 8)
 +#define TTBCR_ORGN0  (3U << 10)
 +#define TTBCR_SH0    (3U << 12)
 +#define TTBCR_T1SZ   (3U << 16)
 +#define TTBCR_A1     (1U << 22)
 +#define TTBCR_EPD1   (1U << 23)
 +#define TTBCR_IRGN1  (3U << 24)
 +#define TTBCR_ORGN1  (3U << 26)
 +#define TTBCR_SH1    (1U << 28)
 +#define TTBCR_EAE    (1U << 31)
 +
 +FIELD(VTCR, T0SZ, 0, 6)
 +FIELD(VTCR, SL0, 6, 2)
 +FIELD(VTCR, IRGN0, 8, 2)
 +FIELD(VTCR, ORGN0, 10, 2)
 +FIELD(VTCR, SH0, 12, 2)
 +FIELD(VTCR, TG0, 14, 2)
 +FIELD(VTCR, PS, 16, 3)
 +FIELD(VTCR, VS, 19, 1)
 +FIELD(VTCR, HA, 21, 1)
 +FIELD(VTCR, HD, 22, 1)
 +FIELD(VTCR, HWU59, 25, 1)
 +FIELD(VTCR, HWU60, 26, 1)
 +FIELD(VTCR, HWU61, 27, 1)
 +FIELD(VTCR, HWU62, 28, 1)
 +FIELD(VTCR, NSW, 29, 1)
 +FIELD(VTCR, NSA, 30, 1)
 +FIELD(VTCR, DS, 32, 1)
 +FIELD(VTCR, SL2, 33, 1)
 +
 +#define HCRX_ENAS0    (1ULL << 0)
 +#define HCRX_ENALS    (1ULL << 1)
 +#define HCRX_ENASR    (1ULL << 2)
 +#define HCRX_FNXS     (1ULL << 3)
 +#define HCRX_FGTNXS   (1ULL << 4)
 +#define HCRX_SMPME    (1ULL << 5)
 +#define HCRX_TALLINT  (1ULL << 6)
 +#define HCRX_VINMI    (1ULL << 7)
 +#define HCRX_VFNMI    (1ULL << 8)
 +#define HCRX_CMOW     (1ULL << 9)
 +#define HCRX_MCE2     (1ULL << 10)
 +#define HCRX_MSCEN    (1ULL << 11)
 +
 +#define HPFAR_NS      (1ULL << 63)
 +
 +#define HSTR_TTEE (1 << 16)
 +#define HSTR_TJDBX (1 << 17)
 +
 +#define CNTHCTL_CNTVMASK      (1 << 18)
 +#define CNTHCTL_CNTPMASK      (1 << 19)
 +
  /* We use a few fake FSR values for internal purposes in M profile.
   * M profile cores don't have A/R format FSRs, but currently our
   * get_phys_addr() code assumes A/R profile and reports failures via
 --
-.20.1
+.34.1

-[PULL 16/23] target/arm: Convert Neon VTBL, VTBX to decodetree
+[PULL 02/14] target/arm: Timer _EL02 registers UNDEF for E2H == 0
-Convert the Neon VTBL, VTBX instructions to decodetree.  The actual
+The timer _EL02 registers should UNDEF for invalid accesses from EL2
-implementation of the insn is copied across to the new trans function
+or EL3 when HCR_EL2.E2H == 0, not take a cp access trap.  We were
-unchanged except for renaming 'tmp5' to 'tmp4'.
+delivering the exception to EL2 with the wrong syndrome.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240301183219.2424889-3-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  3 ++
+ target/arm/helper.c | 2 +-
- target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++++
+file changed, 1 insertion(+), 1 deletion(-)
  target/arm/translate.c          | 41 +++---------------------
 files changed, 63 insertions(+), 37 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/helper.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static CPAccessResult e2h_access(CPUARMState *env, const ARMCPRegInfo *ri,
-     ##################################################################
+         return CP_ACCESS_OK;
      VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
                   vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +
 +    VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
 +                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
    ]
    # Subgroup for size != 0b11
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
      }
-     return true;
+     if (!(arm_hcr_el2_eff(env) & HCR_E2H)) {
 -        return CP_ACCESS_TRAP;
 +        return CP_ACCESS_TRAP_UNCATEGORIZED;
      }
      return CP_ACCESS_OK;
  }
-+
-+static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
-+{
-+    int n;
-+    TCGv_i32 tmp, tmp2, tmp3, tmp4;
-+    TCGv_ptr ptr1;
-+
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    n = a->len + 1;
-+    if ((a->vn + n) > 32) {
-+        /*
-+         * This is UNPREDICTABLE; we choose to UNDEF to avoid the
-+         * helper function running off the end of the register file.
-+         */
-+        return false;
-+    }
-+    n <<= 3;
-+    if (a->op) {
-+        tmp = neon_load_reg(a->vd, 0);
-+    } else {
-+        tmp = tcg_temp_new_i32();
-+        tcg_gen_movi_i32(tmp, 0);
-+    }
-+    tmp2 = neon_load_reg(a->vm, 0);
-+    ptr1 = vfp_reg_ptr(true, a->vn);
-+    tmp4 = tcg_const_i32(n);
-+    gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
-+    tcg_temp_free_i32(tmp);
-+    if (a->op) {
-+        tmp = neon_load_reg(a->vd, 1);
-+    } else {
-+        tmp = tcg_temp_new_i32();
-+        tcg_gen_movi_i32(tmp, 0);
-+    }
-+    tmp3 = neon_load_reg(a->vm, 1);
-+    gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
-+    tcg_temp_free_i32(tmp4);
-+    tcg_temp_free_ptr(ptr1);
-+    neon_store_reg(a->vd, 0, tmp2);
-+    neon_store_reg(a->vd, 1, tmp3);
-+    tcg_temp_free_i32(tmp);
-+    return true;
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
- {
-     int op;
-     int q;
--    int rd, rn, rm, rd_ofs, rm_ofs;
-+    int rd, rm, rd_ofs, rm_ofs;
-     int size;
-     int pass;
-     int u;
-     int vec_size;
--    TCGv_i32 tmp, tmp2, tmp3, tmp5;
--    TCGv_ptr ptr1;
-+    TCGv_i32 tmp, tmp2, tmp3;
-     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-         return 1;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-     q = (insn & (1 << 6)) != 0;
-     u = (insn >> 24) & 1;
-     VFP_DREG_D(rd, insn);
--    VFP_DREG_N(rn, insn);
-     VFP_DREG_M(rm, insn);
-     size = (insn >> 20) & 3;
-     vec_size = q ? 16 : 8;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     break;
-                 }
-             } else if ((insn & (1 << 10)) == 0) {
--                /* VTBL, VTBX.  */
--                int n = ((insn >> 8) & 3) + 1;
--                if ((rn + n) > 32) {
--                    /* This is UNPREDICTABLE; we choose to UNDEF to avoid the
--                     * helper function running off the end of the register file.
--                     */
--                    return 1;
--                }
--                n <<= 3;
--                if (insn & (1 << 6)) {
--                    tmp = neon_load_reg(rd, 0);
--                } else {
--                    tmp = tcg_temp_new_i32();
--                    tcg_gen_movi_i32(tmp, 0);
--                }
--                tmp2 = neon_load_reg(rm, 0);
--                ptr1 = vfp_reg_ptr(true, rn);
--                tmp5 = tcg_const_i32(n);
--                gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
--                tcg_temp_free_i32(tmp);
--                if (insn & (1 << 6)) {
--                    tmp = neon_load_reg(rd, 1);
--                } else {
--                    tmp = tcg_temp_new_i32();
--                    tcg_gen_movi_i32(tmp, 0);
--                }
--                tmp3 = neon_load_reg(rm, 1);
--                gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
--                tcg_temp_free_i32(tmp5);
--                tcg_temp_free_ptr(ptr1);
--                neon_store_reg(rd, 0, tmp2);
--                neon_store_reg(rd, 1, tmp3);
--                tcg_temp_free_i32(tmp);
-+                /* VTBL, VTBX: handled by decodetree */
-+                return 1;
-             } else if ((insn & 0x380) == 0) {
-                 /* VDUP */
-                 int element;
 --
-.20.1
+.34.1
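
Note on 02/14: the fix changes only which failure code the E2H check returns. A minimal sketch of that decision follows, assuming simplified stand-in types. The `Sketch`/`sketch_` names are hypothetical and are not QEMU's API; only the E2H test and the two result names mirror the hunk above. Bit 34 for HCR_EL2.E2H is taken from the architecture definition.

```c
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's CPAccessResult values. */
typedef enum {
    SKETCH_CP_ACCESS_OK,
    SKETCH_CP_ACCESS_TRAP,               /* old result: cp access trap */
    SKETCH_CP_ACCESS_TRAP_UNCATEGORIZED, /* new result: plain UNDEF */
} SketchCPAccessResult;

/* HCR_EL2.E2H is bit 34 of the register. */
#define SKETCH_HCR_E2H (1ULL << 34)

/*
 * Models only the part of the check touched by the patch: when E2H is
 * clear, an _EL02 timer register access is reported as UNDEF rather
 * than as a trap with a coprocessor-access syndrome.
 */
static SketchCPAccessResult sketch_e2h_access(uint64_t hcr_el2_eff)
{
    if (!(hcr_el2_eff & SKETCH_HCR_E2H)) {
        return SKETCH_CP_ACCESS_TRAP_UNCATEGORIZED;
    }
    return SKETCH_CP_ACCESS_OK;
}
```

The sketch takes the effective HCR_EL2 value as a plain argument so it stays self-contained. The real function reads it from the CPU state via `arm_hcr_el2_eff(env)`, as shown in the hunk.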

-[PULL 03/23] target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
+[PULL 03/14] target/arm: use FIELD macro for CNTHCTL bit definitions
-Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
+We prefer the FIELD macro over ad-hoc #defines for register bits;
-VRSUBHN in the Neon 3-registers-different-lengths group to
+switch CNTHCTL to that style before we add any more bits.
 decodetree.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240301183219.2424889-4-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  6 +++
+ target/arm/internals.h | 27 +++++++++++++++++++++++++--
- target/arm/translate-neon.inc.c | 87 +++++++++++++++++++++++++++++++
+ target/arm/helper.c    |  9 ++++-----
- target/arm/translate.c          | 91 ++++-----------------------------
+files changed, 29 insertions(+), 7 deletions(-)
 files changed, 104 insertions(+), 80 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/internals.h b/target/arm/internals.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/internals.h
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/internals.h
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ FIELD(VTCR, SL2, 33, 1)
+ #define HSTR_TTEE (1 << 16)
-     VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+ #define HSTR_TJDBX (1 << 17)
-     VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
-+
+-#define CNTHCTL_CNTVMASK      (1 << 18)
-+    VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+-#define CNTHCTL_CNTPMASK      (1 << 19)
-+    VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
++/*
-+
++ * Depending on the value of HCR_EL2.E2H, bits 0 and 1
-+    VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
++ * have different bit definitions, and EL1PCTEN might be
-+    VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
++ * bit 0 or bit 10. We use _E2H1 and _E2H0 suffixes to
-   ]
++ * disambiguate if necessary.
- }
++ */
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
++FIELD(CNTHCTL, EL0PCTEN_E2H1, 0, 1)
 +FIELD(CNTHCTL, EL0VCTEN_E2H1, 1, 1)
 +FIELD(CNTHCTL, EL1PCTEN_E2H0, 0, 1)
 +FIELD(CNTHCTL, EL1PCEN_E2H0, 1, 1)
 +FIELD(CNTHCTL, EVNTEN, 2, 1)
 +FIELD(CNTHCTL, EVNTDIR, 3, 1)
 +FIELD(CNTHCTL, EVNTI, 4, 4)
 +FIELD(CNTHCTL, EL0VTEN, 8, 1)
 +FIELD(CNTHCTL, EL0PTEN, 9, 1)
 +FIELD(CNTHCTL, EL1PCTEN_E2H1, 10, 1)
 +FIELD(CNTHCTL, EL1PTEN, 11, 1)
 +FIELD(CNTHCTL, ECV, 12, 1)
 +FIELD(CNTHCTL, EL1TVT, 13, 1)
 +FIELD(CNTHCTL, EL1TVCT, 14, 1)
 +FIELD(CNTHCTL, EL1NVPCT, 15, 1)
 +FIELD(CNTHCTL, EL1NVVCT, 16, 1)
 +FIELD(CNTHCTL, EVNTIS, 17, 1)
 +FIELD(CNTHCTL, CNTVMASK, 18, 1)
 +FIELD(CNTHCTL, CNTPMASK, 19, 1)
  /* We use a few fake FSR values for internal purposes in M profile.
   * M profile cores don't have A/R format FSRs, but currently our
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/helper.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ DO_PREWIDEN(VADDW_S, s, ext, add, true)
+@@ -XXX,XX +XXX,XX @@ static void gt_update_irq(ARMCPU *cpu, int timeridx)
- DO_PREWIDEN(VADDW_U, u, extu, add, true)
+      * It is RES0 in Secure and NonSecure state.
- DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+      */
- DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+     if ((ss == ARMSS_Root || ss == ARMSS_Realm) &&
-+
+-        ((timeridx == GTIMER_VIRT && (cnthctl & CNTHCTL_CNTVMASK)) ||
-+static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
+-         (timeridx == GTIMER_PHYS && (cnthctl & CNTHCTL_CNTPMASK)))) {
-+                         NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
++        ((timeridx == GTIMER_VIRT && (cnthctl & R_CNTHCTL_CNTVMASK_MASK)) ||
-+{
++         (timeridx == GTIMER_PHYS && (cnthctl & R_CNTHCTL_CNTPMASK_MASK)))) {
-+    /* 3-regs different lengths, narrowing (VADDHN/VSUBHN/VRADDHN/VRSUBHN) */
+         irqstate = 0;
-+    TCGv_i64 rn_64, rm_64;
+     }
-+    TCGv_i32 rd0, rd1;
-+
+@@ -XXX,XX +XXX,XX @@ static void gt_cnthctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+ {
-+        return false;
+     ARMCPU *cpu = env_archcpu(env);
-+    }
+     uint32_t oldval = env->cp15.cnthctl_el2;
-+
+-
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
+     raw_write(env, ri, value);
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
+-    if ((oldval ^ value) & CNTHCTL_CNTVMASK) {
-+        return false;
++    if ((oldval ^ value) & R_CNTHCTL_CNTVMASK_MASK) {
-+    }
+         gt_update_irq(cpu, GTIMER_VIRT);
-+
+-    } else if ((oldval ^ value) & CNTHCTL_CNTPMASK) {
-+    if (!opfn || !narrowfn) {
++    } else if ((oldval ^ value) & R_CNTHCTL_CNTPMASK_MASK) {
-+        /* size == 3 case, which is an entirely different insn group */
+         gt_update_irq(cpu, GTIMER_PHYS);
 +        return false;
 +    }
 +
 +    if ((a->vn | a->vm) & 1) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    rn_64 = tcg_temp_new_i64();
 +    rm_64 = tcg_temp_new_i64();
 +    rd0 = tcg_temp_new_i32();
 +    rd1 = tcg_temp_new_i32();
 +
 +    neon_load_reg64(rn_64, a->vn);
 +    neon_load_reg64(rm_64, a->vm);
 +
 +    opfn(rn_64, rn_64, rm_64);
 +
 +    narrowfn(rd0, rn_64);
 +
 +    neon_load_reg64(rn_64, a->vn + 1);
 +    neon_load_reg64(rm_64, a->vm + 1);
 +
 +    opfn(rn_64, rn_64, rm_64);
 +
 +    narrowfn(rd1, rn_64);
 +
 +    neon_store_reg(a->vd, 0, rd0);
 +    neon_store_reg(a->vd, 1, rd1);
 +
 +    tcg_temp_free_i64(rn_64);
 +    tcg_temp_free_i64(rm_64);
 +
 +    return true;
 +}
 +
 +#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP)                       \
 +    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
 +    {                                                                   \
 +        static NeonGenTwo64OpFn * const addfn[] = {                     \
 +            gen_helper_neon_##OP##l_u16,                                \
 +            gen_helper_neon_##OP##l_u32,                                \
 +            tcg_gen_##OP##_i64,                                         \
 +            NULL,                                                       \
 +        };                                                              \
 +        static NeonGenNarrowFn * const narrowfn[] = {                   \
 +            gen_helper_neon_##NARROWTYPE##_high_u8,                     \
 +            gen_helper_neon_##NARROWTYPE##_high_u16,                    \
 +            EXTOP,                                                      \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]);   \
 +    }
 +
 +static void gen_narrow_round_high_u32(TCGv_i32 rd, TCGv_i64 rn)
 +{
 +    tcg_gen_addi_i64(rn, rn, 1u << 31);
 +    tcg_gen_extrh_i64_i32(rd, rn);
 +}
 +
 +DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
 +DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
 +DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
 +DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
      }
  }
--static inline void gen_neon_subl(int size)
--{
--    switch (size) {
--    case 0: gen_helper_neon_subl_u16(CPU_V001); break;
--    case 1: gen_helper_neon_subl_u32(CPU_V001); break;
--    case 2: tcg_gen_sub_i64(CPU_V001); break;
--    default: abort();
--    }
--}
--
- static inline void gen_neon_negl(TCGv_i64 var, int size)
- {
-     switch (size) {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-             op = (insn >> 8) & 0xf;
-             if ((insn & (1 << 6)) == 0) {
-                 /* Three registers of different lengths.  */
--                int src1_wide;
--                int src2_wide;
-                 /* undefreq: bit 0 : UNDEF if size == 0
-                  *           bit 1 : UNDEF if size == 1
-                  *           bit 2 : UNDEF if size == 2
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     {0, 0, 0, 7}, /* VADDW: handled by decodetree */
-                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
-                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
--                    {0, 1, 1, 0}, /* VADDHN */
-+                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
-                     {0, 0, 0, 0}, /* VABAL */
--                    {0, 1, 1, 0}, /* VSUBHN */
-+                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
-                     {0, 0, 0, 0}, /* VABDL */
-                     {0, 0, 0, 0}, /* VMLAL */
-                     {0, 0, 0, 9}, /* VQDMLAL */
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
-                 };
--                src1_wide = neon_3reg_wide[op][1];
--                src2_wide = neon_3reg_wide[op][2];
-                 undefreq = neon_3reg_wide[op][3];
-                 if ((undefreq & (1 << size)) ||
-                     ((undefreq & 8) && u)) {
-                     return 1;
-                 }
--                if ((src1_wide && (rn & 1)) ||
--                    (src2_wide && (rm & 1)) ||
--                    (!src2_wide && (rd & 1))) {
-+                if (rd & 1) {
-                     return 1;
-                 }
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                 /* Avoid overlapping operands.  Wide source operands are
-                    always aligned so will never overlap with wide
-                    destinations in problematic ways.  */
--                if (rd == rm && !src2_wide) {
-+                if (rd == rm) {
-                     tmp = neon_load_reg(rm, 1);
-                     neon_store_scratch(2, tmp);
--                } else if (rd == rn && !src1_wide) {
-+                } else if (rd == rn) {
-                     tmp = neon_load_reg(rn, 1);
-                     neon_store_scratch(2, tmp);
-                 }
-                 tmp3 = NULL;
-                 for (pass = 0; pass < 2; pass++) {
--                    if (src1_wide) {
--                        neon_load_reg64(cpu_V0, rn + pass);
--                        tmp = NULL;
-+                    if (pass == 1 && rd == rn) {
-+                        tmp = neon_load_scratch(2);
-                     } else {
--                        if (pass == 1 && rd == rn) {
--                            tmp = neon_load_scratch(2);
--                        } else {
--                            tmp = neon_load_reg(rn, pass);
--                        }
-+                        tmp = neon_load_reg(rn, pass);
-                     }
--                    if (src2_wide) {
--                        neon_load_reg64(cpu_V1, rm + pass);
--                        tmp2 = NULL;
-+                    if (pass == 1 && rd == rm) {
-+                        tmp2 = neon_load_scratch(2);
-                     } else {
--                        if (pass == 1 && rd == rm) {
--                            tmp2 = neon_load_scratch(2);
--                        } else {
--                            tmp2 = neon_load_reg(rm, pass);
--                        }
-+                        tmp2 = neon_load_reg(rm, pass);
-                     }
-                     switch (op) {
--                    case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
--                        gen_neon_addl(size);
--                        break;
--                    case 2: case 3: case 6: /* VSUBL, VSUBW, VSUBHN, VRSUBHN */
--                        gen_neon_subl(size);
--                        break;
-                     case 5: case 7: /* VABAL, VABDL */
-                         switch ((size << 1) | u) {
-                         case 0:
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                             abort();
-                         }
-                         neon_store_reg64(cpu_V0, rd + pass);
--                    } else if (op == 4 || op == 6) {
--                        /* Narrowing operation.  */
--                        tmp = tcg_temp_new_i32();
--                        if (!u) {
--                            switch (size) {
--                            case 0:
--                                gen_helper_neon_narrow_high_u8(tmp, cpu_V0);
--                                break;
--                            case 1:
--                                gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
--                                break;
--                            case 2:
--                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
--                                break;
--                            default: abort();
--                            }
--                        } else {
--                            switch (size) {
--                            case 0:
--                                gen_helper_neon_narrow_round_high_u8(tmp, cpu_V0);
--                                break;
--                            case 1:
--                                gen_helper_neon_narrow_round_high_u16(tmp, cpu_V0);
--                                break;
--                            case 2:
--                                tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
--                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
--                                break;
--                            default: abort();
--                            }
--                        }
--                        if (pass == 0) {
--                            tmp3 = tmp;
--                        } else {
--                            neon_store_reg(rd, 0, tmp3);
--                            neon_store_reg(rd, 1, tmp);
--                        }
-                     } else {
-                         /* Write back the result.  */
-                         neon_store_reg64(cpu_V0, rd + pass);
 --
-.20.1
+.34.1
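
Note on 03/14: the conversion replaces the ad-hoc `CNTHCTL_CNTVMASK` and `CNTHCTL_CNTPMASK` defines with `R_CNTHCTL_*_MASK` constants generated by `FIELD()`. The sketch below shows roughly how such a macro can produce those names. `SKETCH_FIELD` is a simplified, hypothetical stand-in and not QEMU's actual definition. It assumes fields narrow enough for the mask to fit in an `int`. `sketch_mask_changed()` is likewise only an illustration of the XOR test used in `gt_cnthctl_write()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified stand-in for a FIELD()-style macro: for each register
 * field it generates R_<reg>_<field>_SHIFT, _LENGTH and _MASK constants.
 */
#define SKETCH_FIELD(reg, field, shift, length)                          \
    enum { R_##reg##_##field##_SHIFT = (shift) };                        \
    enum { R_##reg##_##field##_LENGTH = (length) };                      \
    enum { R_##reg##_##field##_MASK =                                    \
               (((1u << (length)) - 1u) << (shift)) };

/* Bit positions as given in the patch. */
SKETCH_FIELD(CNTHCTL, CNTVMASK, 18, 1)
SKETCH_FIELD(CNTHCTL, CNTPMASK, 19, 1)

/* The generated mask matches the removed ad-hoc define (1 << 18). C11. */
static_assert(R_CNTHCTL_CNTVMASK_MASK == (1 << 18),
              "generated mask must match the old define");

/* True if the bits selected by mask differ between oldval and newval. */
static bool sketch_mask_changed(uint32_t oldval, uint32_t newval,
                                uint32_t mask)
{
    return ((oldval ^ newval) & mask) != 0;
}
```

Generating the shift, length and mask from one declaration keeps the three values consistent, which is the practical advantage over separate hand-written defines.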

-[PULL 04/23] target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
+Deleted patch
-Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
-Like almost all the remaining insns in this group, these are
-a combination of a two-input operation which returns a double width
-result and then a possible accumulation of that double width
-result into the destination.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate.h          |   1 +
- target/arm/neon-dp.decode       |   6 ++
- target/arm/translate-neon.inc.c | 132 ++++++++++++++++++++++++++++++++
- target/arm/translate.c          |  31 +-------
-files changed, 142 insertions(+), 28 deletions(-)
-diff --git a/target/arm/translate.h b/target/arm/translate.h
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.h
-+++ b/target/arm/translate.h
-@@ -XXX,XX +XXX,XX @@ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
- typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
- typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
- typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
-+typedef void NeonGenTwoOpWidenFn(TCGv_i64, TCGv_i32, TCGv_i32);
- typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
- typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
- typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
-     VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
-+    VABAL_S_3d   1111 001 0 1 . .. .... .... 0101 . 0 . 0 .... @3diff
-+    VABAL_U_3d   1111 001 1 1 . .. .... .... 0101 . 0 . 0 .... @3diff
-+
-     VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
-     VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
-+
-+    VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
-+    VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
-   ]
- }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
- DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
- DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
- DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
-+
-+static bool do_long_3d(DisasContext *s, arg_3diff *a,
-+                       NeonGenTwoOpWidenFn *opfn,
-+                       NeonGenTwo64OpFn *accfn)
-+{
-+    /*
-+     * 3-regs different lengths, long operations.
-+     * These perform an operation on two inputs that returns a double-width
-+     * result, and then possibly perform an accumulation operation of
-+     * that result into the double-width destination.
-+     */
-+    TCGv_i64 rd0, rd1, tmp;
-+    TCGv_i32 rn, rm;
-+
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if (!opfn) {
-+        /* size == 3 case, which is an entirely different insn group */
-+        return false;
-+    }
-+
-+    if (a->vd & 1) {
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    rd0 = tcg_temp_new_i64();
-+    rd1 = tcg_temp_new_i64();
-+
-+    rn = neon_load_reg(a->vn, 0);
-+    rm = neon_load_reg(a->vm, 0);
-+    opfn(rd0, rn, rm);
-+    tcg_temp_free_i32(rn);
-+    tcg_temp_free_i32(rm);
-+
-+    rn = neon_load_reg(a->vn, 1);
-+    rm = neon_load_reg(a->vm, 1);
-+    opfn(rd1, rn, rm);
-+    tcg_temp_free_i32(rn);
-+    tcg_temp_free_i32(rm);
-+
-+    /* Don't store results until after all loads: they might overlap */
-+    if (accfn) {
-+        tmp = tcg_temp_new_i64();
-+        neon_load_reg64(tmp, a->vd);
-+        accfn(tmp, tmp, rd0);
-+        neon_store_reg64(tmp, a->vd);
-+        neon_load_reg64(tmp, a->vd + 1);
-+        accfn(tmp, tmp, rd1);
-+        neon_store_reg64(tmp, a->vd + 1);
-+        tcg_temp_free_i64(tmp);
-+    } else {
-+        neon_store_reg64(rd0, a->vd);
-+        neon_store_reg64(rd1, a->vd + 1);
-+    }
-+
-+    tcg_temp_free_i64(rd0);
-+    tcg_temp_free_i64(rd1);
-+
-+    return true;
-+}
-+
-+static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        gen_helper_neon_abdl_s16,
-+        gen_helper_neon_abdl_s32,
-+        gen_helper_neon_abdl_s64,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], NULL);
-+}
-+
-+static bool trans_VABDL_U_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        gen_helper_neon_abdl_u16,
-+        gen_helper_neon_abdl_u32,
-+        gen_helper_neon_abdl_u64,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], NULL);
-+}
-+
-+static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        gen_helper_neon_abdl_s16,
-+        gen_helper_neon_abdl_s32,
-+        gen_helper_neon_abdl_s64,
-+        NULL,
-+    };
-+    static NeonGenTwo64OpFn * const addfn[] = {
-+        gen_helper_neon_addl_u16,
-+        gen_helper_neon_addl_u32,
-+        tcg_gen_add_i64,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
-+}
-+
-+static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        gen_helper_neon_abdl_u16,
-+        gen_helper_neon_abdl_u32,
-+        gen_helper_neon_abdl_u64,
-+        NULL,
-+    };
-+    static NeonGenTwo64OpFn * const addfn[] = {
-+        gen_helper_neon_addl_u16,
-+        gen_helper_neon_addl_u32,
-+        tcg_gen_add_i64,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
-                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
-                     {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
--                    {0, 0, 0, 0}, /* VABAL */
-+                    {0, 0, 0, 7}, /* VABAL */
-                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
--                    {0, 0, 0, 0}, /* VABDL */
-+                    {0, 0, 0, 7}, /* VABDL */
-                     {0, 0, 0, 0}, /* VMLAL */
-                     {0, 0, 0, 9}, /* VQDMLAL */
-                     {0, 0, 0, 0}, /* VMLSL */
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                         tmp2 = neon_load_reg(rm, pass);
-                     }
-                     switch (op) {
--                    case 5: case 7: /* VABAL, VABDL */
--                        switch ((size << 1) | u) {
--                        case 0:
--                            gen_helper_neon_abdl_s16(cpu_V0, tmp, tmp2);
--                            break;
--                        case 1:
--                            gen_helper_neon_abdl_u16(cpu_V0, tmp, tmp2);
--                            break;
--                        case 2:
--                            gen_helper_neon_abdl_s32(cpu_V0, tmp, tmp2);
--                            break;
--                        case 3:
--                            gen_helper_neon_abdl_u32(cpu_V0, tmp, tmp2);
--                            break;
--                        case 4:
--                            gen_helper_neon_abdl_s64(cpu_V0, tmp, tmp2);
--                            break;
--                        case 5:
--                            gen_helper_neon_abdl_u64(cpu_V0, tmp, tmp2);
--                            break;
--                        default: abort();
--                        }
--                        tcg_temp_free_i32(tmp2);
--                        tcg_temp_free_i32(tmp);
--                        break;
-                     case 8: case 9: case 10: case 11: case 12: case 13:
-                         /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
-                         gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                         case 10: /* VMLSL */
-                             gen_neon_negl(cpu_V0, size);
-                             /* Fall through */
--                        case 5: case 8: /* VABAL, VMLAL */
-+                        case 8: /* VABAL, VMLAL */
-                             gen_neon_addl(size);
-                             break;
-                         case 9: case 11: /* VQDMLAL, VQDMLSL */
---
-.20.1

-[PULL 05/23] target/arm: Convert Neon 3-reg-diff long multiplies
+Deleted patch
-Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
-a 32x32->64 multiply with possible accumulate.
-Note that for VMLSL we do the accumulate directly with a subtraction
-rather than doing a negate-then-add as the old code did.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/neon-dp.decode       |  9 +++++
- target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
- target/arm/translate.c          | 21 +++-------
-files changed, 86 insertions(+), 15 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
-     VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
-+
-+    VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
-+    VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
-+
-+    VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
-+    VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
-+
-+    VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
-+    VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
-   ]
- }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
-     return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
- }
-+
-+static void gen_mull_s32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
-+{
-+    TCGv_i32 lo = tcg_temp_new_i32();
-+    TCGv_i32 hi = tcg_temp_new_i32();
-+
-+    tcg_gen_muls2_i32(lo, hi, rn, rm);
-+    tcg_gen_concat_i32_i64(rd, lo, hi);
-+
-+    tcg_temp_free_i32(lo);
-+    tcg_temp_free_i32(hi);
-+}
-+
-+static void gen_mull_u32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
-+{
-+    TCGv_i32 lo = tcg_temp_new_i32();
-+    TCGv_i32 hi = tcg_temp_new_i32();
-+
-+    tcg_gen_mulu2_i32(lo, hi, rn, rm);
-+    tcg_gen_concat_i32_i64(rd, lo, hi);
-+
-+    tcg_temp_free_i32(lo);
-+    tcg_temp_free_i32(hi);
-+}
-+
-+static bool trans_VMULL_S_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        gen_helper_neon_mull_s8,
-+        gen_helper_neon_mull_s16,
-+        gen_mull_s32,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], NULL);
-+}
-+
-+static bool trans_VMULL_U_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        gen_helper_neon_mull_u8,
-+        gen_helper_neon_mull_u16,
-+        gen_mull_u32,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], NULL);
-+}
-+
-+#define DO_VMLAL(INSN,MULL,ACC)                                         \
-+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
-+    {                                                                   \
-+        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
-+            gen_helper_neon_##MULL##8,                                  \
-+            gen_helper_neon_##MULL##16,                                 \
-+            gen_##MULL##32,                                             \
-+            NULL,                                                       \
-+        };                                                              \
-+        static NeonGenTwo64OpFn * const accfn[] = {                     \
-+            gen_helper_neon_##ACC##l_u16,                               \
-+            gen_helper_neon_##ACC##l_u32,                               \
-+            tcg_gen_##ACC##_i64,                                        \
-+            NULL,                                                       \
-+        };                                                              \
-+        return do_long_3d(s, a, opfn[a->size], accfn[a->size]);         \
-+    }
-+
-+DO_VMLAL(VMLAL_S,mull_s,add)
-+DO_VMLAL(VMLAL_U,mull_u,add)
-+DO_VMLAL(VMLSL_S,mull_s,sub)
-+DO_VMLAL(VMLSL_U,mull_u,sub)
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     {0, 0, 0, 7}, /* VABAL */
-                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
-                     {0, 0, 0, 7}, /* VABDL */
--                    {0, 0, 0, 0}, /* VMLAL */
-+                    {0, 0, 0, 7}, /* VMLAL */
-                     {0, 0, 0, 9}, /* VQDMLAL */
--                    {0, 0, 0, 0}, /* VMLSL */
-+                    {0, 0, 0, 7}, /* VMLSL */
-                     {0, 0, 0, 9}, /* VQDMLSL */
--                    {0, 0, 0, 0}, /* Integer VMULL */
-+                    {0, 0, 0, 7}, /* Integer VMULL */
-                     {0, 0, 0, 9}, /* VQDMULL */
-                     {0, 0, 0, 0xa}, /* Polynomial VMULL */
-                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                         tmp2 = neon_load_reg(rm, pass);
-                     }
-                     switch (op) {
--                    case 8: case 9: case 10: case 11: case 12: case 13:
--                        /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
-+                    case 9: case 11: case 13:
-+                        /* VQDMLAL, VQDMLSL, VQDMULL */
-                         gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-                         break;
-                     default: /* 15 is RESERVED: caught earlier  */
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                         /* VQDMULL */
-                         gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                         neon_store_reg64(cpu_V0, rd + pass);
--                    } else if (op == 5 || (op >= 8 && op <= 11)) {
-+                    } else {
-                         /* Accumulate.  */
-                         neon_load_reg64(cpu_V1, rd + pass);
-                         switch (op) {
--                        case 10: /* VMLSL */
--                            gen_neon_negl(cpu_V0, size);
--                            /* Fall through */
--                        case 8: /* VABAL, VMLAL */
--                            gen_neon_addl(size);
--                            break;
-                         case 9: case 11: /* VQDMLAL, VQDMLSL */
-                             gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                             if (op == 11) {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                             abort();
-                         }
-                         neon_store_reg64(cpu_V0, rd + pass);
--                    } else {
--                        /* Write back the result.  */
--                        neon_store_reg64(cpu_V0, rd + pass);
-                     }
-                 }
-             } else {
---
-.20.1
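
The commit message above notes that VMLSL now accumulates with a direct subtraction instead of the old negate-then-add. A minimal sketch of why the two are interchangeable for the 32x32->64 case, assuming plain wrapping 64-bit arithmetic; the function names here are hypothetical illustrations and are not QEMU helpers:

```c
#include <stdint.h>

/* Signed 32x32->64 widening multiply: the per-element step of VMULL.S32. */
static int64_t mull_s32(int32_t n, int32_t m)
{
    return (int64_t)n * (int64_t)m;
}

/* Unsigned 32x32->64 widening multiply: the per-element step of VMULL.U32. */
static uint64_t mull_u32(uint32_t n, uint32_t m)
{
    return (uint64_t)n * (uint64_t)m;
}

/* VMLAL-style accumulate: acc + n*m, wrapping modulo 2^64. */
static uint64_t mlal_s32(uint64_t acc, int32_t n, int32_t m)
{
    return acc + (uint64_t)mull_s32(n, m);
}

/* VMLSL done the new way: subtract the product directly. */
static uint64_t mlsl_s32_sub(uint64_t acc, int32_t n, int32_t m)
{
    return acc - (uint64_t)mull_s32(n, m);
}

/* VMLSL done the old way: negate the product, then add. */
static uint64_t mlsl_s32_negadd(uint64_t acc, int32_t n, int32_t m)
{
    uint64_t prod = (uint64_t)mull_s32(n, m);
    return acc + (0 - prod);
}
```

Because unsigned arithmetic wraps modulo 2^64, `acc - prod` and `acc + (0 - prod)` always agree, so dropping the separate negate step should not change results for this element size.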

-[PULL 06/23] target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
+Deleted patch
-Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
-these are all saturating doubling long multiplies with a possible
-accumulate step.
-These are the last insns in the group which use the pass-over-each
-elements loop, so we can delete that code.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/neon-dp.decode       |  6 +++
- target/arm/translate-neon.inc.c | 82 +++++++++++++++++++++++++++++++++
- target/arm/translate.c          | 59 ++----------------------
-files changed, 92 insertions(+), 55 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     VMLAL_S_3d   1111 001 0 1 . .. .... .... 1000 . 0 . 0 .... @3diff
-     VMLAL_U_3d   1111 001 1 1 . .. .... .... 1000 . 0 . 0 .... @3diff
-+    VQDMLAL_3d   1111 001 0 1 . .. .... .... 1001 . 0 . 0 .... @3diff
-+
-     VMLSL_S_3d   1111 001 0 1 . .. .... .... 1010 . 0 . 0 .... @3diff
-     VMLSL_U_3d   1111 001 1 1 . .. .... .... 1010 . 0 . 0 .... @3diff
-+    VQDMLSL_3d   1111 001 0 1 . .. .... .... 1011 . 0 . 0 .... @3diff
-+
-     VMULL_S_3d   1111 001 0 1 . .. .... .... 1100 . 0 . 0 .... @3diff
-     VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
-+
-+    VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
-   ]
- }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ DO_VMLAL(VMLAL_S,mull_s,add)
- DO_VMLAL(VMLAL_U,mull_u,add)
- DO_VMLAL(VMLSL_S,mull_s,sub)
- DO_VMLAL(VMLSL_U,mull_u,sub)
-+
-+static void gen_VQDMULL_16(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
-+{
-+    gen_helper_neon_mull_s16(rd, rn, rm);
-+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rd, rd);
-+}
-+
-+static void gen_VQDMULL_32(TCGv_i64 rd, TCGv_i32 rn, TCGv_i32 rm)
-+{
-+    gen_mull_s32(rd, rn, rm);
-+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rd, rd);
-+}
-+
-+static bool trans_VQDMULL_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
-+        gen_VQDMULL_16,
-+        gen_VQDMULL_32,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], NULL);
-+}
-+
-+static void gen_VQDMLAL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-+{
-+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
-+}
-+
-+static void gen_VQDMLAL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-+{
-+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
-+}
-+
-+static bool trans_VQDMLAL_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
-+        gen_VQDMULL_16,
-+        gen_VQDMULL_32,
-+        NULL,
-+    };
-+    static NeonGenTwo64OpFn * const accfn[] = {
-+        NULL,
-+        gen_VQDMLAL_acc_16,
-+        gen_VQDMLAL_acc_32,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
-+}
-+
-+static void gen_VQDMLSL_acc_16(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-+{
-+    gen_helper_neon_negl_u32(rm, rm);
-+    gen_helper_neon_addl_saturate_s32(rd, cpu_env, rn, rm);
-+}
-+
-+static void gen_VQDMLSL_acc_32(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-+{
-+    tcg_gen_neg_i64(rm, rm);
-+    gen_helper_neon_addl_saturate_s64(rd, cpu_env, rn, rm);
-+}
-+
-+static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
-+{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
-+        NULL,
-+        gen_VQDMULL_16,
-+        gen_VQDMULL_32,
-+        NULL,
-+    };
-+    static NeonGenTwo64OpFn * const accfn[] = {
-+        NULL,
-+        gen_VQDMLSL_acc_16,
-+        gen_VQDMLSL_acc_32,
-+        NULL,
-+    };
-+
-+    return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
-                     {0, 0, 0, 7}, /* VABDL */
-                     {0, 0, 0, 7}, /* VMLAL */
--                    {0, 0, 0, 9}, /* VQDMLAL */
-+                    {0, 0, 0, 7}, /* VQDMLAL */
-                     {0, 0, 0, 7}, /* VMLSL */
--                    {0, 0, 0, 9}, /* VQDMLSL */
-+                    {0, 0, 0, 7}, /* VQDMLSL */
-                     {0, 0, 0, 7}, /* Integer VMULL */
--                    {0, 0, 0, 9}, /* VQDMULL */
-+                    {0, 0, 0, 7}, /* VQDMULL */
-                     {0, 0, 0, 0xa}, /* Polynomial VMULL */
-                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
-                 };
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                     }
-                     return 0;
-                 }
--
--                /* Avoid overlapping operands.  Wide source operands are
--                   always aligned so will never overlap with wide
--                   destinations in problematic ways.  */
--                if (rd == rm) {
--                    tmp = neon_load_reg(rm, 1);
--                    neon_store_scratch(2, tmp);
--                } else if (rd == rn) {
--                    tmp = neon_load_reg(rn, 1);
--                    neon_store_scratch(2, tmp);
--                }
--                tmp3 = NULL;
--                for (pass = 0; pass < 2; pass++) {
--                    if (pass == 1 && rd == rn) {
--                        tmp = neon_load_scratch(2);
--                    } else {
--                        tmp = neon_load_reg(rn, pass);
--                    }
--                    if (pass == 1 && rd == rm) {
--                        tmp2 = neon_load_scratch(2);
--                    } else {
--                        tmp2 = neon_load_reg(rm, pass);
--                    }
--                    switch (op) {
--                    case 9: case 11: case 13:
--                        /* VQDMLAL, VQDMLSL, VQDMULL */
--                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
--                        break;
--                    default: /* 15 is RESERVED: caught earlier  */
--                        abort();
--                    }
--                    if (op == 13) {
--                        /* VQDMULL */
--                        gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
--                        neon_store_reg64(cpu_V0, rd + pass);
--                    } else {
--                        /* Accumulate.  */
--                        neon_load_reg64(cpu_V1, rd + pass);
--                        switch (op) {
--                        case 9: case 11: /* VQDMLAL, VQDMLSL */
--                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
--                            if (op == 11) {
--                                gen_neon_negl(cpu_V0, size);
--                            }
--                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
--                            break;
--                        default:
--                            abort();
--                        }
--                        neon_store_reg64(cpu_V0, rd + pass);
--                    }
--                }
-+                abort(); /* all others handled by decodetree */
-             } else {
-                 /* Two registers and a scalar. NB that for ops of this form
-                  * the ARM ARM labels bit 24 as Q, but it is in our variable
---
-.20.1
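
The patch above builds VQDMULL as a widening multiply followed by a saturating add of the product to itself, and VQDMLAL/VQDMLSL as a further saturating accumulate. A minimal scalar sketch of that arithmetic for one 16-bit element, assuming nothing about QEMU internals (all function names are hypothetical, and the QC flag update that the real helpers perform on saturation is omitted):

```c
#include <stdint.h>

/* Saturating signed 32-bit add: clamp instead of wrapping. */
static int32_t sat_add_s32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) {
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)sum;
}

/* VQDMULL.S16 on one element: widen, multiply, then double with saturation. */
static int32_t qdmull_s16(int16_t n, int16_t m)
{
    int32_t prod = (int32_t)n * (int32_t)m; /* |prod| <= 2^30, cannot overflow */
    return sat_add_s32(prod, prod);
}

/* VQDMLAL.S16 on one element: saturating add of the doubled product. */
static int32_t qdmlal_s16(int32_t acc, int16_t n, int16_t m)
{
    return sat_add_s32(acc, qdmull_s16(n, m));
}

/* VQDMLSL.S16 on one element: negate the doubled product, then saturating add. */
static int32_t qdmlsl_s16(int32_t acc, int16_t n, int16_t m)
{
    /* The doubled product is never INT32_MIN, so the negation cannot overflow. */
    return sat_add_s32(acc, -qdmull_s16(n, m));
}
```

The only input pair that saturates the doubling step is `INT16_MIN * INT16_MIN`, whose doubled product 2^31 does not fit in a signed 32-bit value.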

-[PULL 07/23] target/arm: Convert Neon 3-reg-diff polynomial VMULL
+Deleted patch
-Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
-insn in this group to be converted.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/neon-dp.decode       |  2 ++
- target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
- target/arm/translate.c          | 60 ++-------------------------------
-files changed, 48 insertions(+), 57 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-     VMULL_U_3d   1111 001 1 1 . .. .... .... 1100 . 0 . 0 .... @3diff
-     VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
-+
-+    VMULL_P_3d   1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
-   ]
- }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_3d(DisasContext *s, arg_3diff *a)
-     return do_long_3d(s, a, opfn[a->size], accfn[a->size]);
- }
-+
-+static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
-+{
-+    gen_helper_gvec_3 *fn_gvec;
-+
-+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-+        return false;
-+    }
-+
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
-+        ((a->vd | a->vn | a->vm) & 0x10)) {
-+        return false;
-+    }
-+
-+    if (a->vd & 1) {
-+        return false;
-+    }
-+
-+    switch (a->size) {
-+    case 0:
-+        fn_gvec = gen_helper_neon_pmull_h;
-+        break;
-+    case 2:
-+        if (!dc_isar_feature(aa32_pmull, s)) {
-+            return false;
-+        }
-+        fn_gvec = gen_helper_gvec_pmull_q;
-+        break;
-+    default:
-+        return false;
-+    }
-+
-+    if (!vfp_access_check(s)) {
-+        return true;
-+    }
-+
-+    tcg_gen_gvec_3_ool(neon_reg_offset(a->vd, 0),
-+                       neon_reg_offset(a->vn, 0),
-+                       neon_reg_offset(a->vm, 0),
-+                       16, 16, 0, fn_gvec);
-+    return true;
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
- {
-     int op;
-     int q;
--    int rd, rn, rm, rd_ofs, rn_ofs, rm_ofs;
-+    int rd, rn, rm, rd_ofs, rm_ofs;
-     int size;
-     int pass;
-     int u;
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-     size = (insn >> 20) & 3;
-     vec_size = q ? 16 : 8;
-     rd_ofs = neon_reg_offset(rd, 0);
--    rn_ofs = neon_reg_offset(rn, 0);
-     rm_ofs = neon_reg_offset(rm, 0);
-     if ((insn & (1 << 23)) == 0) {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-         if (size != 3) {
-             op = (insn >> 8) & 0xf;
-             if ((insn & (1 << 6)) == 0) {
--                /* Three registers of different lengths.  */
--                /* undefreq: bit 0 : UNDEF if size == 0
--                 *           bit 1 : UNDEF if size == 1
--                 *           bit 2 : UNDEF if size == 2
--                 *           bit 3 : UNDEF if U == 1
--                 * Note that [2:0] set implies 'always UNDEF'
--                 */
--                int undefreq;
--                /* prewiden, src1_wide, src2_wide, undefreq */
--                static const int neon_3reg_wide[16][4] = {
--                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
--                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
--                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
--                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
--                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
--                    {0, 0, 0, 7}, /* VABAL */
--                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
--                    {0, 0, 0, 7}, /* VABDL */
--                    {0, 0, 0, 7}, /* VMLAL */
--                    {0, 0, 0, 7}, /* VQDMLAL */
--                    {0, 0, 0, 7}, /* VMLSL */
--                    {0, 0, 0, 7}, /* VQDMLSL */
--                    {0, 0, 0, 7}, /* Integer VMULL */
--                    {0, 0, 0, 7}, /* VQDMULL */
--                    {0, 0, 0, 0xa}, /* Polynomial VMULL */
--                    {0, 0, 0, 7}, /* Reserved: always UNDEF */
--                };
--
--                undefreq = neon_3reg_wide[op][3];
--
--                if ((undefreq & (1 << size)) ||
--                    ((undefreq & 8) && u)) {
--                    return 1;
--                }
--                if (rd & 1) {
--                    return 1;
--                }
--
--                /* Handle polynomial VMULL in a single pass.  */
--                if (op == 14) {
--                    if (size == 0) {
--                        /* VMULL.P8 */
--                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
--                                           0, gen_helper_neon_pmull_h);
--                    } else {
--                        /* VMULL.P64 */
--                        if (!dc_isar_feature(aa32_pmull, s)) {
--                            return 1;
--                        }
--                        tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, 16, 16,
--                                           0, gen_helper_gvec_pmull_q);
--                    }
--                    return 0;
--                }
--                abort(); /* all others handled by decodetree */
-+                /* Three registers of different lengths: handled by decodetree */
-+                return 1;
-             } else {
-                 /* Two registers and a scalar. NB that for ops of this form
-                  * the ARM ARM labels bit 24 as Q, but it is in our variable
---
-.20.1
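
For readers unfamiliar with the polynomial VMULL converted above: the size == 0 case (VMULL.P8) multiplies 8-bit operands as polynomials over GF(2), so partial products are combined with XOR rather than addition. A minimal sketch of one element, using a hypothetical function name and not the actual gen_helper_neon_pmull_h implementation:

```c
#include <stdint.h>

/* Carry-less (polynomial) 8x8->16 multiply of a single element. */
static uint16_t pmull_p8(uint8_t n, uint8_t m)
{
    uint16_t result = 0;

    for (int i = 0; i < 8; i++) {
        if (m & (1u << i)) {
            /* XOR in n shifted by i: addition in GF(2) has no carries. */
            result ^= (uint16_t)((uint16_t)n << i);
        }
    }
    return result;
}
```

For example, `pmull_p8(3, 3)` is (x + 1)^2 = x^2 + 1, that is 5, where an ordinary integer multiply would give 9.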

-[PULL 08/23] target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
+Deleted patch
-Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
-trans_VSHLL_U_2sh() as both 'static' and 'const'.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-neon.inc.c | 4 ++--
-file changed, 2 insertions(+), 2 deletions(-)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
- static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
- {
--    NeonGenWidenFn *widenfn[] = {
-+    static NeonGenWidenFn * const widenfn[] = {
-         gen_helper_neon_widen_s8,
-         gen_helper_neon_widen_s16,
-         tcg_gen_ext_i32_i64,
-@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
- static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
- {
--    NeonGenWidenFn *widenfn[] = {
-+    static NeonGenWidenFn * const widenfn[] = {
-         gen_helper_neon_widen_u8,
-         gen_helper_neon_widen_u16,
-         tcg_gen_extu_i32_i64,
---
-.20.1

-[PULL 09/23] target/arm: Add missing TCG temp free in do_2shift_env_64()
+Deleted patch
-In commit 37bfce81b10450071 we accidentally introduced a leak of a TCG
-temporary in do_2shift_env_64(); free it.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/translate-neon.inc.c | 1 +
-file changed, 1 insertion(+)
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool do_2shift_env_64(DisasContext *s, arg_2reg_shift *a,
-         neon_load_reg64(tmp, a->vm + pass);
-         fn(tmp, cpu_env, tmp, constimm);
-         neon_store_reg64(tmp, a->vd + pass);
-+        tcg_temp_free_i64(tmp);
-     }
-     tcg_temp_free_i64(constimm);
-     return true;
---
-.20.1

-[PULL 15/23] target/arm: Convert Neon VEXT to decodetree
+[PULL 04/14] target/arm: Don't allow RES0 CNTHCTL_EL2 bits to be written
-Convert the Neon VEXT insn to decodetree. Rather than keeping the
+Don't allow the guest to write CNTHCTL_EL2 bits which don't exist.
-old implementation which used fixed temporaries cpu_V0 and cpu_V1
+This is not strictly architecturally required, but it is how we've
-and did the extraction with by-hand shift and logic ops, we use
+tended to implement registers more recently.
 the TCG extract2 insn.
-We don't need to special case 0 or 8 immediates any more as the
+In particular, bits [19:18] are only present with FEAT_RME,
-optimizer is smart enough to throw away the dead code.
+and bits [17:12] will only be present with FEAT_ECV.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240301183219.2424889-5-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  8 +++-
+ target/arm/helper.c | 18 ++++++++++++++++++
- target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
+file changed, 18 insertions(+)
  target/arm/translate.c          | 58 +------------------------
 files changed, 85 insertions(+), 57 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/helper.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static void gt_cnthctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
  # return false for size==3.
  ######################################################################
  {
--  # 0b11 subgroup will go here
+     ARMCPU *cpu = env_archcpu(env);
-+  [
+     uint32_t oldval = env->cp15.cnthctl_el2;
-+    ##################################################################
++    uint32_t valid_mask =
-+    # Miscellaneous size=0b11 insns
++        R_CNTHCTL_EL0PCTEN_E2H1_MASK |
-+    ##################################################################
++        R_CNTHCTL_EL0VCTEN_E2H1_MASK |
-+    VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
++        R_CNTHCTL_EVNTEN_MASK |
-+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
++        R_CNTHCTL_EVNTDIR_MASK |
-+  ]
++        R_CNTHCTL_EVNTI_MASK |
++        R_CNTHCTL_EL0VTEN_MASK |
-   # Subgroup for size != 0b11
++        R_CNTHCTL_EL0PTEN_MASK |
-   [
++        R_CNTHCTL_EL1PCTEN_E2H1_MASK |
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
++        R_CNTHCTL_EL1PTEN_MASK;
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
      return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
  }
 +
-+static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
++    if (cpu_isar_feature(aa64_rme, cpu)) {
-+{
++        valid_mask |= R_CNTHCTL_CNTVMASK_MASK | R_CNTHCTL_CNTPMASK_MASK;
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
-+    /* UNDEF accesses to D16-D31 if they don't exist. */
++    /* Clear RES0 bits */
-+    if (!dc_isar_feature(aa32_simd_r32, s) &&
++    value &= valid_mask;
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
-+    if ((a->vn | a->vm | a->vd) & a->q) {
+     raw_write(env, ri, value);
-+        return false;
-+    }
+     if ((oldval ^ value) & R_CNTHCTL_CNTVMASK_MASK) {
 +
 +    if (a->imm > 7 && !a->q) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    if (!a->q) {
 +        /* Extract 64 bits from <Vm:Vn> */
 +        TCGv_i64 left, right, dest;
 +
 +        left = tcg_temp_new_i64();
 +        right = tcg_temp_new_i64();
 +        dest = tcg_temp_new_i64();
 +
 +        neon_load_reg64(right, a->vn);
 +        neon_load_reg64(left, a->vm);
 +        tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
 +        neon_store_reg64(dest, a->vd);
 +
 +        tcg_temp_free_i64(left);
 +        tcg_temp_free_i64(right);
 +        tcg_temp_free_i64(dest);
 +    } else {
 +        /* Extract 128 bits from <Vm+1:Vm:Vn+1:Vn> */
 +        TCGv_i64 left, middle, right, destleft, destright;
 +
 +        left = tcg_temp_new_i64();
 +        middle = tcg_temp_new_i64();
 +        right = tcg_temp_new_i64();
 +        destleft = tcg_temp_new_i64();
 +        destright = tcg_temp_new_i64();
 +
 +        if (a->imm < 8) {
 +            neon_load_reg64(right, a->vn);
 +            neon_load_reg64(middle, a->vn + 1);
 +            tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
 +            neon_load_reg64(left, a->vm);
 +            tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
 +        } else {
 +            neon_load_reg64(right, a->vn + 1);
 +            neon_load_reg64(middle, a->vm);
 +            tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
 +            neon_load_reg64(left, a->vm + 1);
 +            tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
 +        }
 +
 +        neon_store_reg64(destright, a->vd);
 +        neon_store_reg64(destleft, a->vd + 1);
 +
 +        tcg_temp_free_i64(destright);
 +        tcg_temp_free_i64(destleft);
 +        tcg_temp_free_i64(right);
 +        tcg_temp_free_i64(middle);
 +        tcg_temp_free_i64(left);
 +    }
 +    return true;
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
      int pass;
      int u;
      int vec_size;
 -    uint32_t imm;
      TCGv_i32 tmp, tmp2, tmp3, tmp5;
      TCGv_ptr ptr1;
 -    TCGv_i64 tmp64;
      if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
          return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
              return 1;
          } else { /* size == 3 */
              if (!u) {
 -                /* Extract.  */
 -                imm = (insn >> 8) & 0xf;
 -
 -                if (imm > 7 && !q)
 -                    return 1;
 -
 -                if (q && ((rd | rn | rm) & 1)) {
 -                    return 1;
 -                }
 -
 -                if (imm == 0) {
 -                    neon_load_reg64(cpu_V0, rn);
 -                    if (q) {
 -                        neon_load_reg64(cpu_V1, rn + 1);
 -                    }
 -                } else if (imm == 8) {
 -                    neon_load_reg64(cpu_V0, rn + 1);
 -                    if (q) {
 -                        neon_load_reg64(cpu_V1, rm);
 -                    }
 -                } else if (q) {
 -                    tmp64 = tcg_temp_new_i64();
 -                    if (imm < 8) {
 -                        neon_load_reg64(cpu_V0, rn);
 -                        neon_load_reg64(tmp64, rn + 1);
 -                    } else {
 -                        neon_load_reg64(cpu_V0, rn + 1);
 -                        neon_load_reg64(tmp64, rm);
 -                    }
 -                    tcg_gen_shri_i64(cpu_V0, cpu_V0, (imm & 7) * 8);
 -                    tcg_gen_shli_i64(cpu_V1, tmp64, 64 - ((imm & 7) * 8));
 -                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
 -                    if (imm < 8) {
 -                        neon_load_reg64(cpu_V1, rm);
 -                    } else {
 -                        neon_load_reg64(cpu_V1, rm + 1);
 -                        imm -= 8;
 -                    }
 -                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
 -                    tcg_gen_shri_i64(tmp64, tmp64, imm * 8);
 -                    tcg_gen_or_i64(cpu_V1, cpu_V1, tmp64);
 -                    tcg_temp_free_i64(tmp64);
 -                } else {
 -                    /* BUGFIX */
 -                    neon_load_reg64(cpu_V0, rn);
 -                    tcg_gen_shri_i64(cpu_V0, cpu_V0, imm * 8);
 -                    neon_load_reg64(cpu_V1, rm);
 -                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
 -                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
 -                }
 -                neon_store_reg64(cpu_V0, rd);
 -                if (q) {
 -                    neon_store_reg64(cpu_V1, rd + 1);
 -                }
 +                /* Extract: handled by decodetree */
 +                return 1;
              } else if ((insn & (1 << 11)) == 0) {
                  /* Two register misc.  */
                  op = ((insn >> 12) & 0x30) | ((insn >> 7) & 0xf);
 --
-.20.1
+.34.1

-[PULL 14/23] target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
+[PULL 05/14] target/arm: Implement new FEAT_ECV trap bits
-Convert the Neon 2-reg-scalar long multiplies to decodetree.
+The functionality defined by ID_AA64MMFR0_EL1.ECV == 1 is:
-These are the last instructions in the group.
+ * four new trap bits for various counter and timer registers
  * the CNTHCTL_EL2.EVNTIS and CNTKCTL_EL1.EVNTIS bits which control
    scaling of the event stream. This is a no-op for us, because we don't
    implement the event stream (our WFE is a NOP): all we need to do is
    allow CNTHCTL_EL2.EVNTIS to be read and written.
  * extensions to PMSCR_EL1.PCT, PMSCR_EL2.PCT, TRFCR_EL1.TS and
    TRFCR_EL2.TS: these are all no-ops for us, because we don't implement
    FEAT_SPE or FEAT_TRF.
  * new registers CNTPCTSS_EL0 and CNTVCTSS_EL0 which are
    "self-synchronizing" views of CNTPCT_EL0 and CNTVCT_EL0, meaning
    that no barriers are needed around their accesses. For us these
    are just the same as the normal views, because all our sysregs are
    inherently self-synchronizing.
 In this commit we implement the trap handling and permit the new
 CNTHCTL_EL2 bits to be written.
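
 As a rough illustration of the two mechanisms named above (write
 masking of the new CNTHCTL_EL2 bits and the new EL1TVCT trap check),
 here is a minimal standalone sketch. It is not the QEMU code: the
 SK_* macros and sk_* helpers are hypothetical names, the bit
 positions are assumptions to be checked against the Arm ARM, and
 base_mask stands in for whatever bits are writable without FEAT_ECV.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed CNTHCTL_EL2 bit positions; verify against the Arm ARM. */
#define SK_CNTHCTL_EL1TVT    (1ULL << 13)
#define SK_CNTHCTL_EL1TVCT   (1ULL << 14)
#define SK_CNTHCTL_EL1NVPCT  (1ULL << 15)
#define SK_CNTHCTL_EL1NVVCT  (1ULL << 16)
#define SK_CNTHCTL_EVNTIS    (1ULL << 17)

#define SK_CNTHCTL_ECV_BITS                                   \
    (SK_CNTHCTL_EL1TVT | SK_CNTHCTL_EL1TVCT |                 \
     SK_CNTHCTL_EL1NVPCT | SK_CNTHCTL_EL1NVVCT |              \
     SK_CNTHCTL_EVNTIS)

/*
 * Hypothetical helper: compute the value stored by a CNTHCTL_EL2
 * write. Bits outside the valid mask are treated as RES0 and dropped;
 * the FEAT_ECV bits only join the mask when the feature is present.
 */
static inline uint64_t sk_cnthctl_write(uint64_t base_mask, bool have_ecv,
                                        uint64_t value)
{
    uint64_t valid_mask = base_mask;

    if (have_ecv) {
        valid_mask |= SK_CNTHCTL_ECV_BITS;
    }
    return value & valid_mask;
}

/*
 * Hypothetical helper modelling only the newly added check: a read of
 * the virtual counter from below EL2 traps to EL2 when EL2 exists and
 * CNTHCTL_EL2.EL1TVCT is set.
 */
static inline bool sk_vct_read_traps(uint64_t cnthctl, bool has_el2)
{
    return has_el2 && (cnthctl & SK_CNTHCTL_EL1TVCT) != 0;
}
```

 With this model, a guest that sets EL1TVCT on a CPU without FEAT_ECV
 sees the bit read back as zero, so the trap can never be armed.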
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240301183219.2424889-6-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  18 ++++
+ target/arm/cpu-features.h |  5 ++++
- target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
+ target/arm/helper.c       | 51 +++++++++++++++++++++++++++++++++++----
- target/arm/translate.c          | 182 ++------------------------------
+files changed, 51 insertions(+), 5 deletions(-)
 files changed, 187 insertions(+), 176 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/cpu-features.h
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/cpu-features.h
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_fgt(const ARMISARegisters *id)
+     return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, FGT) != 0;
      @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
                   &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +    # For the 'long' ops the Q bit is part of insn decode
 +    @2scalar_q0  .... ... . . . size:2 .... .... .... . . . . .... \
 +                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
      VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
      VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
 +    VMLAL_S_2sc  1111 001 0 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
 +    VMLAL_U_2sc  1111 001 1 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
 +
 +    VQDMLAL_2sc  1111 001 0 1 . .. .... .... 0011 . 1 . 0 .... @2scalar_q0
 +
      VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
      VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
 +    VMLSL_S_2sc  1111 001 0 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
 +    VMLSL_U_2sc  1111 001 1 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
 +
 +    VQDMLSL_2sc  1111 001 0 1 . .. .... .... 0111 . 1 . 0 .... @2scalar_q0
 +
      VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
      VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
 +    VMULL_S_2sc  1111 001 0 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
 +    VMULL_U_2sc  1111 001 1 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
 +
 +    VQDMULL_2sc  1111 001 0 1 . .. .... .... 1011 . 1 . 0 .... @2scalar_q0
 +
      VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
      VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
      };
      return do_vqrdmlah_2sc(s, a, opfn[a->size]);
  }
-+
-+static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
++static inline bool isar_feature_aa64_ecv_traps(const ARMISARegisters *id)
 +                            NeonGenTwoOpWidenFn *opfn,
 +                            NeonGenTwo64OpFn *accfn)
 +{
-+    /*
++    return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, ECV) > 0;
 +     * Two registers and a scalar, long operations: perform an
 +     * operation on the input elements and the scalar which produces
 +     * a double-width result, and then possibly perform an accumulation
 +     * operation of that result into the destination.
 +     */
 +    TCGv_i32 scalar, rn;
 +    TCGv_i64 rn0_64, rn1_64;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!opfn) {
 +        /* Bad size (including size == 3, which is a different insn group) */
 +        return false;
 +    }
 +
 +    if (a->vd & 1) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    scalar = neon_get_scalar(a->size, a->vm);
 +
 +    /* Load all inputs before writing any outputs, in case of overlap */
 +    rn = neon_load_reg(a->vn, 0);
 +    rn0_64 = tcg_temp_new_i64();
 +    opfn(rn0_64, rn, scalar);
 +    tcg_temp_free_i32(rn);
 +
 +    rn = neon_load_reg(a->vn, 1);
 +    rn1_64 = tcg_temp_new_i64();
 +    opfn(rn1_64, rn, scalar);
 +    tcg_temp_free_i32(rn);
 +    tcg_temp_free_i32(scalar);
 +
 +    if (accfn) {
 +        TCGv_i64 t64 = tcg_temp_new_i64();
 +        neon_load_reg64(t64, a->vd);
 +        accfn(t64, t64, rn0_64);
 +        neon_store_reg64(t64, a->vd);
 +        neon_load_reg64(t64, a->vd + 1);
 +        accfn(t64, t64, rn1_64);
 +        neon_store_reg64(t64, a->vd + 1);
 +        tcg_temp_free_i64(t64);
 +    } else {
 +        neon_store_reg64(rn0_64, a->vd);
 +        neon_store_reg64(rn1_64, a->vd + 1);
 +    }
 +    tcg_temp_free_i64(rn0_64);
 +    tcg_temp_free_i64(rn1_64);
 +    return true;
 +}
 +
-+static bool trans_VMULL_S_2sc(DisasContext *s, arg_2scalar *a)
+ static inline bool isar_feature_aa64_vh(const ARMISARegisters *id)
  {
      return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, VH) != 0;
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult gt_counter_access(CPUARMState *env, int timeridx,
               : !extract32(env->cp15.cnthctl_el2, 0, 1))) {
              return CP_ACCESS_TRAP_EL2;
          }
 +        if (has_el2 && timeridx == GTIMER_VIRT) {
 +            if (FIELD_EX64(env->cp15.cnthctl_el2, CNTHCTL, EL1TVCT)) {
 +                return CP_ACCESS_TRAP_EL2;
 +            }
 +        }
          break;
      }
      return CP_ACCESS_OK;
@@ -XXX,XX +XXX,XX @@ static CPAccessResult gt_timer_access(CPUARMState *env, int timeridx,
                  }
              }
          }
 +        if (has_el2 && timeridx == GTIMER_VIRT) {
 +            if (FIELD_EX64(env->cp15.cnthctl_el2, CNTHCTL, EL1TVT)) {
 +                return CP_ACCESS_TRAP_EL2;
 +            }
 +        }
          break;
      }
      return CP_ACCESS_OK;
@@ -XXX,XX +XXX,XX @@ static void gt_cnthctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
      if (cpu_isar_feature(aa64_rme, cpu)) {
          valid_mask |= R_CNTHCTL_CNTVMASK_MASK | R_CNTHCTL_CNTPMASK_MASK;
      }
 +    if (cpu_isar_feature(aa64_ecv_traps, cpu)) {
 +        valid_mask |=
 +            R_CNTHCTL_EL1TVT_MASK |
 +            R_CNTHCTL_EL1TVCT_MASK |
 +            R_CNTHCTL_EL1NVPCT_MASK |
 +            R_CNTHCTL_EL1NVVCT_MASK |
 +            R_CNTHCTL_EVNTIS_MASK;
 +    }
      /* Clear RES0 bits */
      value &= valid_mask;
@@ -XXX,XX +XXX,XX @@ static CPAccessResult e2h_access(CPUARMState *env, const ARMCPRegInfo *ri,
  {
      if (arm_current_el(env) == 1) {
          /* This must be a FEAT_NV access */
 -        /* TODO: FEAT_ECV will need to check CNTHCTL_EL2 here */
          return CP_ACCESS_OK;
      }
      if (!(arm_hcr_el2_eff(env) & HCR_E2H)) {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult e2h_access(CPUARMState *env, const ARMCPRegInfo *ri,
      return CP_ACCESS_OK;
  }
 +static CPAccessResult access_el1nvpct(CPUARMState *env, const ARMCPRegInfo *ri,
 +                                      bool isread)
 +{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++    if (arm_current_el(env) == 1) {
-+        NULL,
++        /* This must be a FEAT_NV access with NVx == 101 */
-+        gen_helper_neon_mull_s16,
++        if (FIELD_EX64(env->cp15.cnthctl_el2, CNTHCTL, EL1NVPCT)) {
-+        gen_mull_s32,
++            return CP_ACCESS_TRAP_EL2;
-+        NULL,
++        }
-+    };
++    }
-+
++    return e2h_access(env, ri, isread);
 +    return do_2scalar_long(s, a, opfn[a->size], NULL);
 +}
 +
-+static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a)
++static CPAccessResult access_el1nvvct(CPUARMState *env, const ARMCPRegInfo *ri,
 +                                      bool isread)
 +{
-+    static NeonGenTwoOpWidenFn * const opfn[] = {
++    if (arm_current_el(env) == 1) {
-+        NULL,
++        /* This must be a FEAT_NV access with NVx == 101 */
-+        gen_helper_neon_mull_u16,
++        if (FIELD_EX64(env->cp15.cnthctl_el2, CNTHCTL, EL1NVVCT)) {
-+        gen_mull_u32,
++            return CP_ACCESS_TRAP_EL2;
-+        NULL,
++        }
-+    };
++    }
-+
++    return e2h_access(env, ri, isread);
 +    return do_2scalar_long(s, a, opfn[a->size], NULL);
 +}
 +
-+#define DO_VMLAL_2SC(INSN, MULL, ACC)                                   \
+ /* Test if system register redirection is to occur in the current state.  */
-+    static bool trans_##INSN##_2sc(DisasContext *s, arg_2scalar *a)     \
+ static bool redirect_for_e2h(CPUARMState *env)
 +    {                                                                   \
 +        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
 +            NULL,                                                       \
 +            gen_helper_neon_##MULL##16,                                 \
 +            gen_##MULL##32,                                             \
 +            NULL,                                                       \
 +        };                                                              \
 +        static NeonGenTwo64OpFn * const accfn[] = {                     \
 +            NULL,                                                       \
 +            gen_helper_neon_##ACC##l_u32,                               \
 +            tcg_gen_##ACC##_i64,                                        \
 +            NULL,                                                       \
 +        };                                                              \
 +        return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);    \
 +    }
 +
 +DO_VMLAL_2SC(VMLAL_S, mull_s, add)
 +DO_VMLAL_2SC(VMLAL_U, mull_u, add)
 +DO_VMLAL_2SC(VMLSL_S, mull_s, sub)
 +DO_VMLAL_2SC(VMLSL_U, mull_u, sub)
 +
 +static bool trans_VQDMULL_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULL_16,
 +        gen_VQDMULL_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar_long(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VQDMLAL_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULL_16,
 +        gen_VQDMULL_32,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const accfn[] = {
 +        NULL,
 +        gen_VQDMLAL_acc_16,
 +        gen_VQDMLAL_acc_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
 +}
 +
 +static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpWidenFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULL_16,
 +        gen_VQDMULL_32,
 +        NULL,
 +    };
 +    static NeonGenTwo64OpFn * const accfn[] = {
 +        NULL,
 +        gen_VQDMLSL_acc_16,
 +        gen_VQDMLSL_acc_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
      tcg_gen_ext16s_i32(dest, var);
  }
 -/* 32x32->64 multiply.  Marks inputs as dead.  */
 -static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
 -{
 -    TCGv_i32 lo = tcg_temp_new_i32();
 -    TCGv_i32 hi = tcg_temp_new_i32();
 -    TCGv_i64 ret;
 -
 -    tcg_gen_mulu2_i32(lo, hi, a, b);
 -    tcg_temp_free_i32(a);
 -    tcg_temp_free_i32(b);
 -
 -    ret = tcg_temp_new_i64();
 -    tcg_gen_concat_i32_i64(ret, lo, hi);
 -    tcg_temp_free_i32(lo);
 -    tcg_temp_free_i32(hi);
 -
 -    return ret;
 -}
 -
 -static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
 -{
 -    TCGv_i32 lo = tcg_temp_new_i32();
 -    TCGv_i32 hi = tcg_temp_new_i32();
 -    TCGv_i64 ret;
 -
 -    tcg_gen_muls2_i32(lo, hi, a, b);
 -    tcg_temp_free_i32(a);
 -    tcg_temp_free_i32(b);
 -
 -    ret = tcg_temp_new_i64();
 -    tcg_gen_concat_i32_i64(ret, lo, hi);
 -    tcg_temp_free_i32(lo);
 -    tcg_temp_free_i32(hi);
 -
 -    return ret;
 -}
 -
  /* Swap low and high halfwords.  */
  static void gen_swap_half(TCGv_i32 var)
  {
-@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vhe_reginfo[] = {
-     }
+     { .name = "CNTP_CTL_EL02", .state = ARM_CP_STATE_AA64,
- }
+       .opc0 = 3, .opc1 = 5, .crn = 14, .crm = 2, .opc2 = 1,
+       .type = ARM_CP_IO | ARM_CP_ALIAS,
--static inline void gen_neon_negl(TCGv_i64 var, int size)
+-      .access = PL2_RW, .accessfn = e2h_access,
--{
++      .access = PL2_RW, .accessfn = access_el1nvpct,
--    switch (size) {
+       .nv2_redirect_offset = 0x180 | NV2_REDIR_NO_NV1,
--    case 0: gen_helper_neon_negl_u16(var, var); break;
+       .fieldoffset = offsetof(CPUARMState, cp15.c14_timer[GTIMER_PHYS].ctl),
--    case 1: gen_helper_neon_negl_u32(var, var); break;
+       .writefn = gt_phys_ctl_write, .raw_writefn = raw_write },
--    case 2:
+     { .name = "CNTV_CTL_EL02", .state = ARM_CP_STATE_AA64,
--        tcg_gen_neg_i64(var, var);
+       .opc0 = 3, .opc1 = 5, .crn = 14, .crm = 3, .opc2 = 1,
--        break;
+       .type = ARM_CP_IO | ARM_CP_ALIAS,
--    default: abort();
+-      .access = PL2_RW, .accessfn = e2h_access,
--    }
++      .access = PL2_RW, .accessfn = access_el1nvvct,
--}
+       .nv2_redirect_offset = 0x170 | NV2_REDIR_NO_NV1,
--
+       .fieldoffset = offsetof(CPUARMState, cp15.c14_timer[GTIMER_VIRT].ctl),
--static inline void gen_neon_addl_saturate(TCGv_i64 op0, TCGv_i64 op1, int size)
+       .writefn = gt_virt_ctl_write, .raw_writefn = raw_write },
--{
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vhe_reginfo[] = {
--    switch (size) {
+       .type = ARM_CP_IO | ARM_CP_ALIAS,
--    case 1: gen_helper_neon_addl_saturate_s32(op0, cpu_env, op0, op1); break;
+       .fieldoffset = offsetof(CPUARMState, cp15.c14_timer[GTIMER_PHYS].cval),
--    case 2: gen_helper_neon_addl_saturate_s64(op0, cpu_env, op0, op1); break;
+       .nv2_redirect_offset = 0x178 | NV2_REDIR_NO_NV1,
--    default: abort();
+-      .access = PL2_RW, .accessfn = e2h_access,
--    }
++      .access = PL2_RW, .accessfn = access_el1nvpct,
--}
+       .writefn = gt_phys_cval_write, .raw_writefn = raw_write },
--
+     { .name = "CNTV_CVAL_EL02", .state = ARM_CP_STATE_AA64,
--static inline void gen_neon_mull(TCGv_i64 dest, TCGv_i32 a, TCGv_i32 b,
+       .opc0 = 3, .opc1 = 5, .crn = 14, .crm = 3, .opc2 = 2,
--                                 int size, int u)
+       .type = ARM_CP_IO | ARM_CP_ALIAS,
--{
+       .nv2_redirect_offset = 0x168 | NV2_REDIR_NO_NV1,
--    TCGv_i64 tmp;
+       .fieldoffset = offsetof(CPUARMState, cp15.c14_timer[GTIMER_VIRT].cval),
--
+-      .access = PL2_RW, .accessfn = e2h_access,
--    switch ((size << 1) | u) {
++      .access = PL2_RW, .accessfn = access_el1nvvct,
--    case 0: gen_helper_neon_mull_s8(dest, a, b); break;
+       .writefn = gt_virt_cval_write, .raw_writefn = raw_write },
--    case 1: gen_helper_neon_mull_u8(dest, a, b); break;
+ #endif
--    case 2: gen_helper_neon_mull_s16(dest, a, b); break;
+ };
 -    case 3: gen_helper_neon_mull_u16(dest, a, b); break;
 -    case 4:
 -        tmp = gen_muls_i64_i32(a, b);
 -        tcg_gen_mov_i64(dest, tmp);
 -        tcg_temp_free_i64(tmp);
 -        break;
 -    case 5:
 -        tmp = gen_mulu_i64_i32(a, b);
 -        tcg_gen_mov_i64(dest, tmp);
 -        tcg_temp_free_i64(tmp);
 -        break;
 -    default: abort();
 -    }
 -
 -    /* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
 -       Don't forget to clean them now.  */
 -    if (size < 2) {
 -        tcg_temp_free_i32(a);
 -        tcg_temp_free_i32(b);
 -    }
 -}
 -
  static void gen_neon_narrow_op(int op, int u, int size,
                                 TCGv_i32 dest, TCGv_i64 src)
  {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
      int u;
      int vec_size;
      uint32_t imm;
 -    TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
 +    TCGv_i32 tmp, tmp2, tmp3, tmp5;
      TCGv_ptr ptr1;
      TCGv_i64 tmp64;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
          return 1;
      } else { /* (insn & 0x00800010 == 0x00800000) */
          if (size != 3) {
 -            op = (insn >> 8) & 0xf;
 -            if ((insn & (1 << 6)) == 0) {
 -                /* Three registers of different lengths: handled by decodetree */
 -                return 1;
 -            } else {
 -                /* Two registers and a scalar. NB that for ops of this form
 -                 * the ARM ARM labels bit 24 as Q, but it is in our variable
 -                 * 'u', not 'q'.
 -                 */
 -                if (size == 0) {
 -                    return 1;
 -                }
 -                switch (op) {
 -                case 0: /* Integer VMLA scalar */
 -                case 4: /* Integer VMLS scalar */
 -                case 8: /* Integer VMUL scalar */
 -                case 1: /* Float VMLA scalar */
 -                case 5: /* Floating point VMLS scalar */
 -                case 9: /* Floating point VMUL scalar */
 -                case 12: /* VQDMULH scalar */
 -                case 13: /* VQRDMULH scalar */
 -                case 14: /* VQRDMLAH scalar */
 -                case 15: /* VQRDMLSH scalar */
 -                    return 1; /* handled by decodetree */
 -
 -                case 3: /* VQDMLAL scalar */
 -                case 7: /* VQDMLSL scalar */
 -                case 11: /* VQDMULL scalar */
 -                    if (u == 1) {
 -                        return 1;
 -                    }
 -                    /* fall through */
 -                case 2: /* VMLAL sclar */
 -                case 6: /* VMLSL scalar */
 -                case 10: /* VMULL scalar */
 -                    if (rd & 1) {
 -                        return 1;
 -                    }
 -                    tmp2 = neon_get_scalar(size, rm);
 -                    /* We need a copy of tmp2 because gen_neon_mull
 -                     * deletes it during pass 0.  */
 -                    tmp4 = tcg_temp_new_i32();
 -                    tcg_gen_mov_i32(tmp4, tmp2);
 -                    tmp3 = neon_load_reg(rn, 1);
 -
 -                    for (pass = 0; pass < 2; pass++) {
 -                        if (pass == 0) {
 -                            tmp = neon_load_reg(rn, 0);
 -                        } else {
 -                            tmp = tmp3;
 -                            tmp2 = tmp4;
 -                        }
 -                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
 -                        if (op != 11) {
 -                            neon_load_reg64(cpu_V1, rd + pass);
 -                        }
 -                        switch (op) {
 -                        case 6:
 -                            gen_neon_negl(cpu_V0, size);
 -                            /* Fall through */
 -                        case 2:
 -                            gen_neon_addl(size);
 -                            break;
 -                        case 3: case 7:
 -                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
 -                            if (op == 7) {
 -                                gen_neon_negl(cpu_V0, size);
 -                            }
 -                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
 -                            break;
 -                        case 10:
 -                            /* no-op */
 -                            break;
 -                        case 11:
 -                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
 -                            break;
 -                        default:
 -                            abort();
 -                        }
 -                        neon_store_reg64(cpu_V0, rd + pass);
 -                    }
 -                    break;
 -                default:
 -                    g_assert_not_reached();
 -                }
 -            }
 +            /*
 +             * Three registers of different lengths, or two registers and
 +             * a scalar: handled by decodetree
 +             */
 +            return 1;
          } else { /* size == 3 */
              if (!u) {
                  /* Extract.  */
 --
-.20.1
+.34.1

-[PULL 13/23] target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
+[PULL 06/14] target/arm: Define CNTPCTSS_EL0 and CNTVCTSS_EL0
-Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
+For FEAT_ECV, new registers CNTPCTSS_EL0 and CNTVCTSS_EL0 are
-group to decodetree.
+defined, which are "self-synchronized" views of the physical and
 virtual counts as seen in the CNTPCT_EL0 and CNTVCT_EL0 registers
 (meaning that no barriers are needed around accesses to them to
 ensure that reads of them do not occur speculatively and out-of-order
 with other instructions).
 For QEMU, all our system registers are self-synchronized, so we can
 simply copy the existing implementation of CNTPCT_EL0 and CNTVCT_EL0
 to the new register encodings.
 This means we now implement all the functionality required for
 ID_AA64MMFR0_EL1.ECV == 0b0001.
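
 The feature levels referred to here can be sketched as a decode of
 the ECV ID field. This is an illustrative standalone example, not
 the QEMU helper: the sk_* names are hypothetical, and the field
 position (bits [63:60] of ID_AA64MMFR0_EL1) is an assumption taken
 from the Arm ARM rather than from this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the 4-bit ECV field, assumed to be ID_AA64MMFR0_EL1[63:60]. */
static inline unsigned sk_ecv_field(uint64_t id_aa64mmfr0)
{
    return (unsigned)((id_aa64mmfr0 >> 60) & 0xf);
}

/*
 * ECV >= 0b0001: the new trap bits and the self-synchronized
 * CNTPCTSS_EL0 / CNTVCTSS_EL0 views are present.
 */
static inline bool sk_have_ecv_traps(uint64_t id_aa64mmfr0)
{
    return sk_ecv_field(id_aa64mmfr0) > 0;
}

/* ECV >= 0b0010: CNTPOFF_EL2 is implemented as well. */
static inline bool sk_have_ecv_cntpoff(uint64_t id_aa64mmfr0)
{
    return sk_ecv_field(id_aa64mmfr0) > 1;
}
```

 Comparing with "> 0" rather than "== 1" keeps the check true for
 higher field values, which matches how ID register fields are
 normally interpreted.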
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240301183219.2424889-7-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  3 ++
+ target/arm/helper.c | 43 +++++++++++++++++++++++++++++++++++++++++++
- target/arm/translate-neon.inc.c | 74 +++++++++++++++++++++++++++++++++
+file changed, 43 insertions(+)
  target/arm/translate.c          | 38 +----------------
 files changed, 79 insertions(+), 36 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/helper.c
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = {
+     },
-     VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
+ };
-     VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
 +/*
 + * FEAT_ECV adds extra views of CNTVCT_EL0 and CNTPCT_EL0 which
 + * are "self-synchronizing". For QEMU all sysregs are self-synchronizing,
 + * so our implementations here are identical to the normal registers.
 + */
 +static const ARMCPRegInfo gen_timer_ecv_cp_reginfo[] = {
 +    { .name = "CNTVCTSS", .cp = 15, .crm = 14, .opc1 = 9,
 +      .access = PL0_R, .type = ARM_CP_64BIT | ARM_CP_NO_RAW | ARM_CP_IO,
 +      .accessfn = gt_vct_access,
 +      .readfn = gt_virt_cnt_read, .resetfn = arm_cp_reset_ignore,
 +    },
 +    { .name = "CNTVCTSS_EL0", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 0, .opc2 = 6,
 +      .access = PL0_R, .type = ARM_CP_NO_RAW | ARM_CP_IO,
 +      .accessfn = gt_vct_access, .readfn = gt_virt_cnt_read,
 +    },
 +    { .name = "CNTPCTSS", .cp = 15, .crm = 14, .opc1 = 8,
 +      .access = PL0_R, .type = ARM_CP_64BIT | ARM_CP_NO_RAW | ARM_CP_IO,
 +      .accessfn = gt_pct_access,
 +      .readfn = gt_cnt_read, .resetfn = arm_cp_reset_ignore,
 +    },
 +    { .name = "CNTPCTSS_EL0", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 0, .opc2 = 5,
 +      .access = PL0_R, .type = ARM_CP_NO_RAW | ARM_CP_IO,
 +      .accessfn = gt_pct_access, .readfn = gt_cnt_read,
 +    },
 +};
 +
-+    VQRDMLAH_2sc 1111 001 . 1 . .. .... .... 1110 . 1 . 0 .... @2scalar
+ #else
-+    VQRDMLSH_2sc 1111 001 . 1 . .. .... .... 1111 . 1 . 0 .... @2scalar
-   ]
+ /*
- }
+@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = {
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
+     },
-index XXXXXXX..XXXXXXX 100644
+ };
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
++/*
-@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
++ * CNTVCTSS_EL0 has the same trap conditions as CNTVCT_EL0, so it also
++ * is exposed to userspace by Linux.
-     return do_2scalar(s, a, opfn[a->size], NULL);
++ */
- }
++static const ARMCPRegInfo gen_timer_ecv_cp_reginfo[] = {
 +    { .name = "CNTVCTSS_EL0", .state = ARM_CP_STATE_AA64,
 +      .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 0, .opc2 = 6,
 +      .access = PL0_R, .type = ARM_CP_NO_RAW | ARM_CP_IO,
 +      .readfn = gt_virt_cnt_read,
 +    },
 +};
 +
-+static bool do_vqrdmlah_2sc(DisasContext *s, arg_2scalar *a,
+ #endif
-+                            NeonGenThreeOpEnvFn *opfn)
-+{
+ static void par_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
-+    /*
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
-+     * VQRDMLAH/VQRDMLSH: this is like do_2scalar, but the opfn
+     if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
-+     * performs a kind of fused op-then-accumulate using a helper
+         define_arm_cp_regs(cpu, generic_timer_cp_reginfo);
-+     * function that takes all of rd, rn and the scalar at once.
+     }
-+     */
++    if (cpu_isar_feature(aa64_ecv_traps, cpu)) {
-+    TCGv_i32 scalar;
++        define_arm_cp_regs(cpu, gen_timer_ecv_cp_reginfo);
 +    int pass;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
-+
+     if (arm_feature(env, ARM_FEATURE_VAPA)) {
-+    if (!dc_isar_feature(aa32_rdm, s)) {
+         ARMCPRegInfo vapa_cp_reginfo[] = {
-+        return false;
+             { .name = "PAR", .cp = 15, .crn = 7, .crm = 4, .opc1 = 0, .opc2 = 0,
 +    }
 +
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!opfn) {
 +        /* Bad size (including size == 3, which is a different insn group) */
 +        return false;
 +    }
 +
 +    if (a->q && ((a->vd | a->vn) & 1)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    scalar = neon_get_scalar(a->size, a->vm);
 +
 +    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 +        TCGv_i32 rn = neon_load_reg(a->vn, pass);
 +        TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +        opfn(rd, cpu_env, rn, scalar, rd);
 +        tcg_temp_free_i32(rn);
 +        neon_store_reg(a->vd, pass, rd);
 +    }
 +    tcg_temp_free_i32(scalar);
 +
 +    return true;
 +}
 +
 +static bool trans_VQRDMLAH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenThreeOpEnvFn *opfn[] = {
 +        NULL,
 +        gen_helper_neon_qrdmlah_s16,
 +        gen_helper_neon_qrdmlah_s32,
 +        NULL,
 +    };
 +    return do_vqrdmlah_2sc(s, a, opfn[a->size]);
 +}
 +
 +static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenThreeOpEnvFn *opfn[] = {
 +        NULL,
 +        gen_helper_neon_qrdmlsh_s16,
 +        gen_helper_neon_qrdmlsh_s32,
 +        NULL,
 +    };
 +    return do_vqrdmlah_2sc(s, a, opfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  case 9: /* Floating point VMUL scalar */
                  case 12: /* VQDMULH scalar */
                  case 13: /* VQRDMULH scalar */
 +                case 14: /* VQRDMLAH scalar */
 +                case 15: /* VQRDMLSH scalar */
                      return 1; /* handled by decodetree */
                  case 3: /* VQDMLAL scalar */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          neon_store_reg64(cpu_V0, rd + pass);
                      }
                      break;
 -                case 14: /* VQRDMLAH scalar */
 -                case 15: /* VQRDMLSH scalar */
 -                    {
 -                        NeonGenThreeOpEnvFn *fn;
 -
 -                        if (!dc_isar_feature(aa32_rdm, s)) {
 -                            return 1;
 -                        }
 -                        if (u && ((rd | rn) & 1)) {
 -                            return 1;
 -                        }
 -                        if (op == 14) {
 -                            if (size == 1) {
 -                                fn = gen_helper_neon_qrdmlah_s16;
 -                            } else {
 -                                fn = gen_helper_neon_qrdmlah_s32;
 -                            }
 -                        } else {
 -                            if (size == 1) {
 -                                fn = gen_helper_neon_qrdmlsh_s16;
 -                            } else {
 -                                fn = gen_helper_neon_qrdmlsh_s32;
 -                            }
 -                        }
 -
 -                        tmp2 = neon_get_scalar(size, rm);
 -                        for (pass = 0; pass < (u ? 4 : 2); pass++) {
 -                            tmp = neon_load_reg(rn, pass);
 -                            tmp3 = neon_load_reg(rd, pass);
 -                            fn(tmp, cpu_env, tmp, tmp2, tmp3);
 -                            tcg_temp_free_i32(tmp3);
 -                            neon_store_reg(rd, pass, tmp);
 -                        }
 -                        tcg_temp_free_i32(tmp2);
 -                    }
 -                    break;
                  default:
                      g_assert_not_reached();
                  }
 --
-.20.1
+.34.1
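
The size-indexed function table used by trans_VQRDMLSH_2sc() above (NULL entries for element sizes the insn does not support) can be sketched in standalone C. This is a minimal illustration only: the names SketchThreeOpFn, sketch_mls16, sketch_mls32 and sketch_dispatch are hypothetical, and the arithmetic is a plain wrapping multiply-subtract, not the saturating rounding behaviour of the real gen_helper_neon_qrdmlsh_* helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-operand op type, standing in for NeonGenThreeOpEnvFn. */
typedef uint32_t SketchThreeOpFn(uint32_t n, uint32_t m, uint32_t acc);

/* Placeholder 16-bit op: wrapping multiply-subtract (NOT saturating). */
static uint32_t sketch_mls16(uint32_t n, uint32_t m, uint32_t acc)
{
    return (acc - n * m) & 0xffff;
}

/* Placeholder 32-bit op: wrapping multiply-subtract (NOT saturating). */
static uint32_t sketch_mls32(uint32_t n, uint32_t m, uint32_t acc)
{
    return acc - n * m;
}

/*
 * Look up the op by element size. A NULL slot means "no such encoding",
 * so the caller can treat a false return as an UNDEF.
 */
static bool sketch_dispatch(unsigned size, uint32_t n, uint32_t m,
                            uint32_t acc, uint32_t *out)
{
    static SketchThreeOpFn * const opfn[] = {
        NULL,           /* size 0: 8-bit, not supported */
        sketch_mls16,   /* size 1: 16-bit */
        sketch_mls32,   /* size 2: 32-bit */
        NULL,           /* size 3: belongs to a different insn group */
    };

    if (size >= sizeof(opfn) / sizeof(opfn[0]) || !opfn[size]) {
        return false;
    }
    *out = opfn[size](n, m, acc);
    return true;
}
```

Keeping the validity check next to the table lookup is what lets the converted code drop the nested if/else chain on 'op' and 'size' from the old decoder.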

-[PULL 10/23] target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
+[PULL 07/14] target/arm: Implement FEAT_ECV CNTPOFF_EL2 handling
-Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
+When ID_AA64MMFR0_EL1.ECV is 0b0010, a new register CNTPOFF_EL2 is
-scalar" group to decodetree.  These are 32x32->32 operations where
+implemented.  This is similar to the existing CNTVOFF_EL2, except
-one of the inputs is the scalar, followed by a possible accumulate
+that it controls a hypervisor-adjustable offset made to the physical
-operation of the 32-bit result.
+counter and timer.
-The refactoring removes some of the oddities of the old decoder:
+Implement the handling for this register, which includes control/trap
- * operands to the operation and accumulation were often
+bits in SCR_EL3 and CNTHCTL_EL2.
    reversed (taking advantage of the fact that most of these ops
    are commutative); the new code follows the pseudocode order
  * the Q bit in the insn was in a local variable 'u'; in the
    new code it is decoded into a->q
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240301183219.2424889-8-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  15 ++++
+ target/arm/cpu-features.h |  5 +++
- target/arm/translate-neon.inc.c | 133 ++++++++++++++++++++++++++++++++
+ target/arm/cpu.h          |  1 +
- target/arm/translate.c          |  77 ++----------------
+ target/arm/helper.c       | 68 +++++++++++++++++++++++++++++++++++++--
-files changed, 154 insertions(+), 71 deletions(-)
+ target/arm/trace-events   |  1 +
 files changed, 73 insertions(+), 2 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/target/arm/cpu-features.h
-+++ b/target/arm/neon-dp.decode
++++ b/target/arm/cpu-features.h
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ecv_traps(const ARMISARegisters *id)
-     VQDMULL_3d   1111 001 0 1 . .. .... .... 1101 . 0 . 0 .... @3diff
+     return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, ECV) > 0;
      VMULL_P_3d   1111 001 0 1 . .. .... .... 1110 . 0 . 0 .... @3diff
 +
 +    ##################################################################
 +    # 2-regs-plus-scalar grouping:
 +    # 1111 001 Q 1 D sz!=11 Vn:4 Vd:4 opc:4 N 1 M 0 Vm:4
 +    ##################################################################
 +    &2scalar vm vn vd size q
 +
 +    @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
 +                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
 +
 +    VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
 +
 +    VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
 +
 +    VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
    ]
  }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
++static inline bool isar_feature_aa64_ecv(const ARMISARegisters *id)
 --- a/target/arm/translate-neon.inc.c
 +++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VMULL_P_3d(DisasContext *s, arg_3diff *a)
 , 16, 0, fn_gvec);
      return true;
  }
 +
 +static void gen_neon_dup_low16(TCGv_i32 var)
 +{
-+    TCGv_i32 tmp = tcg_temp_new_i32();
++    return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, ECV) > 1;
 +    tcg_gen_ext16u_i32(var, var);
 +    tcg_gen_shli_i32(tmp, var, 16);
 +    tcg_gen_or_i32(var, var, tmp);
 +    tcg_temp_free_i32(tmp);
 +}
 +
-+static void gen_neon_dup_high16(TCGv_i32 var)
+ static inline bool isar_feature_aa64_vh(const ARMISARegisters *id)
  {
      return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, VH) != 0;
 diff --git a/target/arm/cpu.h b/target/arm/cpu.h
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/cpu.h
 +++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
          uint64_t c14_cntkctl; /* Timer Control register */
          uint64_t cnthctl_el2; /* Counter/Timer Hyp Control register */
          uint64_t cntvoff_el2; /* Counter Virtual Offset register */
 +        uint64_t cntpoff_el2; /* Counter Physical Offset register */
          ARMGenericTimer c14_timer[NUM_GTIMERS];
          uint32_t c15_cpar; /* XScale Coprocessor Access Register */
          uint32_t c15_ticonfig; /* TI925T configuration byte.  */
 diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper.c
 +++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
          if (cpu_isar_feature(aa64_rme, cpu)) {
              valid_mask |= SCR_NSE | SCR_GPF;
          }
 +        if (cpu_isar_feature(aa64_ecv, cpu)) {
 +            valid_mask |= SCR_ECVEN;
 +        }
      } else {
          valid_mask &= ~(SCR_RW | SCR_ST);
          if (cpu_isar_feature(aa32_ras, cpu)) {
@@ -XXX,XX +XXX,XX @@ void gt_rme_post_el_change(ARMCPU *cpu, void *ignored)
      gt_update_irq(cpu, GTIMER_PHYS);
  }
 +static uint64_t gt_phys_raw_cnt_offset(CPUARMState *env)
 +{
-+    TCGv_i32 tmp = tcg_temp_new_i32();
++    if ((env->cp15.scr_el3 & SCR_ECVEN) &&
-+    tcg_gen_andi_i32(var, var, 0xffff0000);
++        FIELD_EX64(env->cp15.cnthctl_el2, CNTHCTL, ECV) &&
-+    tcg_gen_shri_i32(tmp, var, 16);
++        arm_is_el2_enabled(env) &&
-+    tcg_gen_or_i32(var, var, tmp);
++        (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
-+    tcg_temp_free_i32(tmp);
++        return env->cp15.cntpoff_el2;
 +    }
 +    return 0;
 +}
 +
-+static inline TCGv_i32 neon_get_scalar(int size, int reg)
++static uint64_t gt_phys_cnt_offset(CPUARMState *env)
 +{
-+    TCGv_i32 tmp;
++    if (arm_current_el(env) >= 2) {
-+    if (size == 1) {
++        return 0;
 +        tmp = neon_load_reg(reg & 7, reg >> 4);
 +        if (reg & 8) {
 +            gen_neon_dup_high16(tmp);
 +        } else {
 +            gen_neon_dup_low16(tmp);
 +        }
 +    } else {
 +        tmp = neon_load_reg(reg & 15, reg >> 4);
 +    }
-+    return tmp;
++    return gt_phys_raw_cnt_offset(env);
 +}
 +
-+static bool do_2scalar(DisasContext *s, arg_2scalar *a,
+ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
-+                       NeonGenTwoOpFn *opfn, NeonGenTwoOpFn *accfn)
+ {
      ARMGenericTimer *gt = &cpu->env.cp15.c14_timer[timeridx];
@@ -XXX,XX +XXX,XX @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
           * reset timer to when ISTATUS next has to change
           */
          uint64_t offset = timeridx == GTIMER_VIRT ?
 -                                      cpu->env.cp15.cntvoff_el2 : 0;
 +            cpu->env.cp15.cntvoff_el2 : gt_phys_raw_cnt_offset(&cpu->env);
          uint64_t count = gt_get_countervalue(&cpu->env);
          /* Note that this must be unsigned 64 bit arithmetic: */
          int istatus = count - offset >= gt->cval;
@@ -XXX,XX +XXX,XX @@ static void gt_timer_reset(CPUARMState *env, const ARMCPRegInfo *ri,
  static uint64_t gt_cnt_read(CPUARMState *env, const ARMCPRegInfo *ri)
  {
 -    return gt_get_countervalue(env);
 +    return gt_get_countervalue(env) - gt_phys_cnt_offset(env);
  }
  static uint64_t gt_virt_cnt_offset(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static uint64_t gt_tval_read(CPUARMState *env, const ARMCPRegInfo *ri,
      case GTIMER_HYPVIRT:
          offset = gt_virt_cnt_offset(env);
          break;
 +    case GTIMER_PHYS:
 +        offset = gt_phys_cnt_offset(env);
 +        break;
      }
      return (uint32_t)(env->cp15.c14_timer[timeridx].cval -
@@ -XXX,XX +XXX,XX @@ static void gt_tval_write(CPUARMState *env, const ARMCPRegInfo *ri,
      case GTIMER_HYPVIRT:
          offset = gt_virt_cnt_offset(env);
          break;
 +    case GTIMER_PHYS:
 +        offset = gt_phys_cnt_offset(env);
 +        break;
      }
      trace_arm_gt_tval_write(timeridx, value);
@@ -XXX,XX +XXX,XX @@ static void gt_cnthctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
              R_CNTHCTL_EL1NVVCT_MASK |
              R_CNTHCTL_EVNTIS_MASK;
      }
 +    if (cpu_isar_feature(aa64_ecv, cpu)) {
 +        valid_mask |= R_CNTHCTL_ECV_MASK;
 +    }
      /* Clear RES0 bits */
      value &= valid_mask;
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo gen_timer_ecv_cp_reginfo[] = {
      },
  };
 +static CPAccessResult gt_cntpoff_access(CPUARMState *env,
 +                                        const ARMCPRegInfo *ri,
 +                                        bool isread)
 +{
-+    /*
++    if (arm_current_el(env) == 2 && !(env->cp15.scr_el3 & SCR_ECVEN)) {
-+     * Two registers and a scalar: perform an operation between
++        return CP_ACCESS_TRAP_EL3;
 +     * the input elements and the scalar, and then possibly
 +     * perform an accumulation operation of that result into the
 +     * destination.
 +     */
 +    TCGv_i32 scalar;
 +    int pass;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
 +        return false;
 +    }
-+
++    return CP_ACCESS_OK;
 +    /* UNDEF accesses to D16-D31 if they don't exist. */
 +    if (!dc_isar_feature(aa32_simd_r32, s) &&
 +        ((a->vd | a->vn | a->vm) & 0x10)) {
 +        return false;
 +    }
 +
 +    if (!opfn) {
 +        /* Bad size (including size == 3, which is a different insn group) */
 +        return false;
 +    }
 +
 +    if (a->q && ((a->vd | a->vn) & 1)) {
 +        return false;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    scalar = neon_get_scalar(a->size, a->vm);
 +
 +    for (pass = 0; pass < (a->q ? 4 : 2); pass++) {
 +        TCGv_i32 tmp = neon_load_reg(a->vn, pass);
 +        opfn(tmp, tmp, scalar);
 +        if (accfn) {
 +            TCGv_i32 rd = neon_load_reg(a->vd, pass);
 +            accfn(tmp, rd, tmp);
 +            tcg_temp_free_i32(rd);
 +        }
 +        neon_store_reg(a->vd, pass, tmp);
 +    }
 +    tcg_temp_free_i32(scalar);
 +    return true;
 +}
 +
-+static bool trans_VMUL_2sc(DisasContext *s, arg_2scalar *a)
++static void gt_cntpoff_write(CPUARMState *env, const ARMCPRegInfo *ri,
 +                              uint64_t value)
 +{
-+    static NeonGenTwoOpFn * const opfn[] = {
++    ARMCPU *cpu = env_archcpu(env);
 +        NULL,
 +        gen_helper_neon_mul_u16,
 +        tcg_gen_mul_i32,
 +        NULL,
 +    };
 +
-+    return do_2scalar(s, a, opfn[a->size], NULL);
++    trace_arm_gt_cntpoff_write(value);
 +    raw_write(env, ri, value);
 +    gt_recalc_timer(cpu, GTIMER_PHYS);
 +}
 +
-+static bool trans_VMLA_2sc(DisasContext *s, arg_2scalar *a)
++static const ARMCPRegInfo gen_timer_cntpoff_reginfo = {
-+{
++    .name = "CNTPOFF_EL2", .state = ARM_CP_STATE_AA64,
-+    static NeonGenTwoOpFn * const opfn[] = {
++    .opc0 = 3, .opc1 = 4, .crn = 14, .crm = 0, .opc2 = 6,
-+        NULL,
++    .access = PL2_RW, .type = ARM_CP_IO, .resetvalue = 0,
-+        gen_helper_neon_mul_u16,
++    .accessfn = gt_cntpoff_access, .writefn = gt_cntpoff_write,
-+        tcg_gen_mul_i32,
++    .nv2_redirect_offset = 0x1a8,
-+        NULL,
++    .fieldoffset = offsetof(CPUARMState, cp15.cntpoff_el2),
-+    };
++};
-+    static NeonGenTwoOpFn * const accfn[] = {
+ #else
-+        NULL,
-+        gen_helper_neon_add_u16,
+ /*
-+        tcg_gen_add_i32,
+@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
-+        NULL,
+     if (cpu_isar_feature(aa64_ecv_traps, cpu)) {
-+    };
+         define_arm_cp_regs(cpu, gen_timer_ecv_cp_reginfo);
-+
+     }
-+    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
++#ifndef CONFIG_USER_ONLY
-+}
++    if (cpu_isar_feature(aa64_ecv, cpu)) {
-+
++        define_one_arm_cp_reg(cpu, &gen_timer_cntpoff_reginfo);
-+static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
++    }
-+{
++#endif
-+    static NeonGenTwoOpFn * const opfn[] = {
+     if (arm_feature(env, ARM_FEATURE_VAPA)) {
-+        NULL,
+         ARMCPRegInfo vapa_cp_reginfo[] = {
-+        gen_helper_neon_mul_u16,
+             { .name = "PAR", .cp = 15, .crn = 7, .crm = 4, .opc1 = 0, .opc2 = 0,
-+        tcg_gen_mul_i32,
+diff --git a/target/arm/trace-events b/target/arm/trace-events
 +        NULL,
 +    };
 +    static NeonGenTwoOpFn * const accfn[] = {
 +        NULL,
 +        gen_helper_neon_sub_u16,
 +        tcg_gen_sub_i32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
+--- a/target/arm/trace-events
-+++ b/target/arm/translate.c
++++ b/target/arm/trace-events
-@@ -XXX,XX +XXX,XX @@ static int disas_dsp_insn(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ arm_gt_tval_write(int timer, uint64_t value) "gt_tval_write: timer %d value 0x%"
- #define VFP_DREG_N(reg, insn) VFP_DREG(reg, insn, 16,  7)
+ arm_gt_ctl_write(int timer, uint64_t value) "gt_ctl_write: timer %d value 0x%" PRIx64
- #define VFP_DREG_M(reg, insn) VFP_DREG(reg, insn,  0,  5)
+ arm_gt_imask_toggle(int timer) "gt_ctl_write: timer %d IMASK toggle"
+ arm_gt_cntvoff_write(uint64_t value) "gt_cntvoff_write: value 0x%" PRIx64
--static void gen_neon_dup_low16(TCGv_i32 var)
++arm_gt_cntpoff_write(uint64_t value) "gt_cntpoff_write: value 0x%" PRIx64
--{
+ arm_gt_update_irq(int timer, int irqstate) "gt_update_irq: timer %d irqstate %d"
--    TCGv_i32 tmp = tcg_temp_new_i32();
--    tcg_gen_ext16u_i32(var, var);
+ # kvm.c
 -    tcg_gen_shli_i32(tmp, var, 16);
 -    tcg_gen_or_i32(var, var, tmp);
 -    tcg_temp_free_i32(tmp);
 -}
 -
 -static void gen_neon_dup_high16(TCGv_i32 var)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_andi_i32(var, var, 0xffff0000);
 -    tcg_gen_shri_i32(tmp, var, 16);
 -    tcg_gen_or_i32(var, var, tmp);
 -    tcg_temp_free_i32(tmp);
 -}
 -
  static inline bool use_goto_tb(DisasContext *s, target_ulong dest)
  {
  #ifndef CONFIG_USER_ONLY
@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
  #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
 -static inline void gen_neon_add(int size, TCGv_i32 t0, TCGv_i32 t1)
 -{
 -    switch (size) {
 -    case 0: gen_helper_neon_add_u8(t0, t0, t1); break;
 -    case 1: gen_helper_neon_add_u16(t0, t0, t1); break;
 -    case 2: tcg_gen_add_i32(t0, t0, t1); break;
 -    default: abort();
 -    }
 -}
 -
 -static inline void gen_neon_rsb(int size, TCGv_i32 t0, TCGv_i32 t1)
 -{
 -    switch (size) {
 -    case 0: gen_helper_neon_sub_u8(t0, t1, t0); break;
 -    case 1: gen_helper_neon_sub_u16(t0, t1, t0); break;
 -    case 2: tcg_gen_sub_i32(t0, t1, t0); break;
 -    default: return;
 -    }
 -}
 -
  static TCGv_i32 neon_load_scratch(int scratch)
  {
      TCGv_i32 tmp = tcg_temp_new_i32();
@@ -XXX,XX +XXX,XX @@ static void neon_store_scratch(int scratch, TCGv_i32 var)
      tcg_temp_free_i32(var);
  }
 -static inline TCGv_i32 neon_get_scalar(int size, int reg)
 -{
 -    TCGv_i32 tmp;
 -    if (size == 1) {
 -        tmp = neon_load_reg(reg & 7, reg >> 4);
 -        if (reg & 8) {
 -            gen_neon_dup_high16(tmp);
 -        } else {
 -            gen_neon_dup_low16(tmp);
 -        }
 -    } else {
 -        tmp = neon_load_reg(reg & 15, reg >> 4);
 -    }
 -    return tmp;
 -}
 -
  static int gen_neon_unzip(int rd, int rm, int size, int q)
  {
      TCGv_ptr pd, pm;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                      return 1;
                  }
                  switch (op) {
 +                case 0: /* Integer VMLA scalar */
 +                case 4: /* Integer VMLS scalar */
 +                case 8: /* Integer VMUL scalar */
 +                    return 1; /* handled by decodetree */
 +
                  case 1: /* Float VMLA scalar */
                  case 5: /* Floating point VMLS scalar */
                  case 9: /* Floating point VMUL scalar */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                          return 1;
                      }
                      /* fall through */
 -                case 0: /* Integer VMLA scalar */
 -                case 4: /* Integer VMLS scalar */
 -                case 8: /* Integer VMUL scalar */
                  case 12: /* VQDMULH scalar */
                  case 13: /* VQRDMULH scalar */
                      if (u && ((rd | rn) & 1)) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                              } else {
                                  gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
                              }
 -                        } else if (op & 1) {
 +                        } else {
                              TCGv_ptr fpstatus = get_fpstatus_ptr(1);
                              gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
                              tcg_temp_free_ptr(fpstatus);
 -                        } else {
 -                            switch (size) {
 -                            case 0: gen_helper_neon_mul_u8(tmp, tmp, tmp2); break;
 -                            case 1: gen_helper_neon_mul_u16(tmp, tmp, tmp2); break;
 -                            case 2: tcg_gen_mul_i32(tmp, tmp, tmp2); break;
 -                            default: abort();
 -                            }
                          }
                          tcg_temp_free_i32(tmp2);
                          if (op < 8) {
                              /* Accumulate.  */
                              tmp2 = neon_load_reg(rd, pass);
                              switch (op) {
 -                            case 0:
 -                                gen_neon_add(size, tmp, tmp2);
 -                                break;
                              case 1:
                              {
                                  TCGv_ptr fpstatus = get_fpstatus_ptr(1);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                                  tcg_temp_free_ptr(fpstatus);
                                  break;
                              }
 -                            case 4:
 -                                gen_neon_rsb(size, tmp, tmp2);
 -                                break;
                              case 5:
                              {
                                  TCGv_ptr fpstatus = get_fpstatus_ptr(1);
 --
-.20.1
+.34.1
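
The offset selection added by this patch in gt_phys_raw_cnt_offset() and gt_phys_cnt_offset() can be sketched as standalone C. This is a simplified model, not QEMU code: SketchTimerState and the sketch_* functions are hypothetical, and the boolean fields stand in for the SCR_EL3, CNTHCTL_EL2 and effective HCR_EL2 checks that the real code performs on CPUARMState.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened stand-in for the CPU state the patch consults. */
typedef struct {
    int current_el;       /* current exception level, 0..3 */
    bool scr_ecven;       /* SCR_EL3.ECVEn */
    bool cnthctl_ecv;     /* CNTHCTL_EL2.ECV */
    bool el2_enabled;     /* what arm_is_el2_enabled() would return */
    bool hcr_e2h;         /* effective HCR_EL2.E2H */
    bool hcr_tge;         /* effective HCR_EL2.TGE */
    uint64_t cntpoff_el2; /* CNTPOFF_EL2 contents */
} SketchTimerState;

/* Offset that applies to the physical timer regardless of current EL. */
static uint64_t sketch_phys_raw_cnt_offset(const SketchTimerState *st)
{
    if (st->scr_ecven && st->cnthctl_ecv && st->el2_enabled &&
        !(st->hcr_e2h && st->hcr_tge)) {
        return st->cntpoff_el2;
    }
    return 0;
}

/* Offset seen by a register access: EL2 and EL3 see no offset. */
static uint64_t sketch_phys_cnt_offset(const SketchTimerState *st)
{
    if (st->current_el >= 2) {
        return 0;
    }
    return sketch_phys_raw_cnt_offset(st);
}

/* Physical counter read, mirroring the gt_cnt_read() change. */
static uint64_t sketch_cnt_read(const SketchTimerState *st, uint64_t count)
{
    return count - sketch_phys_cnt_offset(st);
}

/* Timer condition as in gt_recalc_timer(): unsigned 64-bit arithmetic. */
static bool sketch_istatus(const SketchTimerState *st, uint64_t count,
                           uint64_t cval)
{
    return count - sketch_phys_raw_cnt_offset(st) >= cval;
}
```

The split between the two helpers mirrors the patch: timer recalculation uses the raw offset because it does not depend on which EL happens to be executing, while register reads and TVAL accesses depend on the current EL.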

-[PULL 11/23] target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
+Deleted patch
-Convert the float versions of VMLA, VMLS and VMUL in the Neon
-2-reg-scalar group to decodetree.
-Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
----
-As noted in the comment on the WRAP_FP_FN macro, we could have
-had a do_2scalar_fp() function, but for 3 insns it seemed
-simpler to just do the wrapping to get hold of the fpstatus ptr.
-(These are the only fp insns in the group.)
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
----
- target/arm/neon-dp.decode       |  3 ++
- target/arm/translate-neon.inc.c | 65 +++++++++++++++++++++++++++++++++
- target/arm/translate.c          | 37 ++-----------------
-files changed, 71 insertions(+), 34 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
-+++ b/target/arm/neon-dp.decode
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
-                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
-     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
-+    VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
-     VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
-+    VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
-     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
-+    VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
-   ]
- }
-diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
-+++ b/target/arm/translate-neon.inc.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_2sc(DisasContext *s, arg_2scalar *a)
-     return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
- }
-+
-+/*
-+ * Rather than have a float-specific version of do_2scalar just for
-+ * three insns, we wrap a NeonGenTwoSingleOpFn to turn it into
-+ * a NeonGenTwoOpFn.
-+ */
-+#define WRAP_FP_FN(WRAPNAME, FUNC)                              \
-+    static void WRAPNAME(TCGv_i32 rd, TCGv_i32 rn, TCGv_i32 rm) \
-+    {                                                           \
-+        TCGv_ptr fpstatus = get_fpstatus_ptr(1);                \
-+        FUNC(rd, rn, rm, fpstatus);                             \
-+        tcg_temp_free_ptr(fpstatus);                            \
-+    }
-+
-+WRAP_FP_FN(gen_VMUL_F_mul, gen_helper_vfp_muls)
-+WRAP_FP_FN(gen_VMUL_F_add, gen_helper_vfp_adds)
-+WRAP_FP_FN(gen_VMUL_F_sub, gen_helper_vfp_subs)
-+
-+static bool trans_VMUL_F_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenTwoOpFn * const opfn[] = {
-+        NULL,
-+        NULL, /* TODO: fp16 support */
-+        gen_VMUL_F_mul,
-+        NULL,
-+    };
-+
-+    return do_2scalar(s, a, opfn[a->size], NULL);
-+}
-+
-+static bool trans_VMLA_F_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenTwoOpFn * const opfn[] = {
-+        NULL,
-+        NULL, /* TODO: fp16 support */
-+        gen_VMUL_F_mul,
-+        NULL,
-+    };
-+    static NeonGenTwoOpFn * const accfn[] = {
-+        NULL,
-+        NULL, /* TODO: fp16 support */
-+        gen_VMUL_F_add,
-+        NULL,
-+    };
-+
-+    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
-+}
-+
-+static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
-+{
-+    static NeonGenTwoOpFn * const opfn[] = {
-+        NULL,
-+        NULL, /* TODO: fp16 support */
-+        gen_VMUL_F_mul,
-+        NULL,
-+    };
-+    static NeonGenTwoOpFn * const accfn[] = {
-+        NULL,
-+        NULL, /* TODO: fp16 support */
-+        gen_VMUL_F_sub,
-+        NULL,
-+    };
-+
-+    return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
-+}
-diff --git a/target/arm/translate.c b/target/arm/translate.c
-index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate.c
-+++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                 case 0: /* Integer VMLA scalar */
-                 case 4: /* Integer VMLS scalar */
-                 case 8: /* Integer VMUL scalar */
--                    return 1; /* handled by decodetree */
--
-                 case 1: /* Float VMLA scalar */
-                 case 5: /* Floating point VMLS scalar */
-                 case 9: /* Floating point VMUL scalar */
--                    if (size == 1) {
--                        return 1;
--                    }
--                    /* fall through */
-+                    return 1; /* handled by decodetree */
-+
-                 case 12: /* VQDMULH scalar */
-                 case 13: /* VQRDMULH scalar */
-                     if (u && ((rd | rn) & 1)) {
-@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
-                             } else {
-                                 gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                             }
--                        } else if (op == 13) {
-+                        } else {
-                             if (size == 1) {
-                                 gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
-                             } else {
-                                 gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
-                             }
--                        } else {
--                            TCGv_ptr fpstatus = get_fpstatus_ptr(1);
--                            gen_helper_vfp_muls(tmp, tmp, tmp2, fpstatus);
--                            tcg_temp_free_ptr(fpstatus);
-                         }
-                         tcg_temp_free_i32(tmp2);
--                        if (op < 8) {
--                            /* Accumulate.  */
--                            tmp2 = neon_load_reg(rd, pass);
--                            switch (op) {
--                            case 1:
--                            {
--                                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
--                                gen_helper_vfp_adds(tmp, tmp, tmp2, fpstatus);
--                                tcg_temp_free_ptr(fpstatus);
--                                break;
--                            }
--                            case 5:
--                            {
--                                TCGv_ptr fpstatus = get_fpstatus_ptr(1);
--                                gen_helper_vfp_subs(tmp, tmp2, tmp, fpstatus);
--                                tcg_temp_free_ptr(fpstatus);
--                                break;
--                            }
--                            default:
--                                abort();
--                            }
--                            tcg_temp_free_i32(tmp2);
--                        }
-                         neon_store_reg(rd, pass, tmp);
-                     }
-                     break;
---
-.20.1

-[PULL 12/23] target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
+[PULL 08/14] target/arm: Enable FEAT_ECV for 'max' CPU
-Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
+Enable all FEAT_ECV features on the 'max' CPU.
 to decodetree.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20240301183219.2424889-9-peter.maydell@linaro.org
 ---
- target/arm/neon-dp.decode       |  3 +++
+ docs/system/arm/emulation.rst | 1 +
- target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
+ target/arm/tcg/cpu64.c        | 1 +
- target/arm/translate.c          | 42 ++-------------------------------
+files changed, 2 insertions(+)
 files changed, 34 insertions(+), 40 deletions(-)
-diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
+diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/neon-dp.decode
+--- a/docs/system/arm/emulation.rst
-+++ b/target/arm/neon-dp.decode
++++ b/docs/system/arm/emulation.rst
-@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
+ - FEAT_DotProd (Advanced SIMD dot product instructions)
-     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
+ - FEAT_DoubleFault (Double Fault Extension)
-     VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
+ - FEAT_E0PD (Preventing EL0 access to halves of address maps)
-+
++- FEAT_ECV (Enhanced Counter Virtualization)
-+    VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
+ - FEAT_EPAC (Enhanced pointer authentication)
-+    VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
+ - FEAT_ETS (Enhanced Translation Synchronization)
-   ]
+ - FEAT_EVT (Enhanced Virtualization Traps)
- }
+diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
 diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-neon.inc.c
+--- a/target/arm/tcg/cpu64.c
-+++ b/target/arm/translate-neon.inc.c
++++ b/target/arm/tcg/cpu64.c
-@@ -XXX,XX +XXX,XX @@ static bool trans_VMLS_F_2sc(DisasContext *s, arg_2scalar *a)
+@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
+     t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN64_2, 2); /* 64k stage2 supported */
-     return do_2scalar(s, a, opfn[a->size], accfn[a->size]);
+     t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN4_2, 2);  /*  4k stage2 supported */
- }
+     t = FIELD_DP64(t, ID_AA64MMFR0, FGT, 1);       /* FEAT_FGT */
-+
++    t = FIELD_DP64(t, ID_AA64MMFR0, ECV, 2);       /* FEAT_ECV */
-+WRAP_ENV_FN(gen_VQDMULH_16, gen_helper_neon_qdmulh_s16)
+     cpu->isar.id_aa64mmfr0 = t;
-+WRAP_ENV_FN(gen_VQDMULH_32, gen_helper_neon_qdmulh_s32)
-+WRAP_ENV_FN(gen_VQRDMULH_16, gen_helper_neon_qrdmulh_s16)
+     t = cpu->isar.id_aa64mmfr1;
 +WRAP_ENV_FN(gen_VQRDMULH_32, gen_helper_neon_qrdmulh_s32)
 +
 +static bool trans_VQDMULH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_VQDMULH_16,
 +        gen_VQDMULH_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
 +
 +static bool trans_VQRDMULH_2sc(DisasContext *s, arg_2scalar *a)
 +{
 +    static NeonGenTwoOpFn * const opfn[] = {
 +        NULL,
 +        gen_VQRDMULH_16,
 +        gen_VQRDMULH_32,
 +        NULL,
 +    };
 +
 +    return do_2scalar(s, a, opfn[a->size], NULL);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_exception_return(DisasContext *s, TCGv_i32 pc)
  #define CPU_V001 cpu_V0, cpu_V0, cpu_V1
 -static TCGv_i32 neon_load_scratch(int scratch)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
 -    return tmp;
 -}
 -
 -static void neon_store_scratch(int scratch, TCGv_i32 var)
 -{
 -    tcg_gen_st_i32(var, cpu_env, offsetof(CPUARMState, vfp.scratch[scratch]));
 -    tcg_temp_free_i32(var);
 -}
 -
  static int gen_neon_unzip(int rd, int rm, int size, int q)
  {
      TCGv_ptr pd, pm;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                  case 1: /* Float VMLA scalar */
                  case 5: /* Floating point VMLS scalar */
                  case 9: /* Floating point VMUL scalar */
 -                    return 1; /* handled by decodetree */
 -
                  case 12: /* VQDMULH scalar */
                  case 13: /* VQRDMULH scalar */
 -                    if (u && ((rd | rn) & 1)) {
 -                        return 1;
 -                    }
 -                    tmp = neon_get_scalar(size, rm);
 -                    neon_store_scratch(0, tmp);
 -                    for (pass = 0; pass < (u ? 4 : 2); pass++) {
 -                        tmp = neon_load_scratch(0);
 -                        tmp2 = neon_load_reg(rn, pass);
 -                        if (op == 12) {
 -                            if (size == 1) {
 -                                gen_helper_neon_qdmulh_s16(tmp, cpu_env, tmp, tmp2);
 -                            } else {
 -                                gen_helper_neon_qdmulh_s32(tmp, cpu_env, tmp, tmp2);
 -                            }
 -                        } else {
 -                            if (size == 1) {
 -                                gen_helper_neon_qrdmulh_s16(tmp, cpu_env, tmp, tmp2);
 -                            } else {
 -                                gen_helper_neon_qrdmulh_s32(tmp, cpu_env, tmp, tmp2);
 -                            }
 -                        }
 -                        tcg_temp_free_i32(tmp2);
 -                        neon_store_reg(rd, pass, tmp);
 -                    }
 -                    break;
 +                    return 1; /* handled by decodetree */
 +
                  case 3: /* VQDMLAL scalar */
                  case 7: /* VQDMLSL scalar */
                  case 11: /* VQDMULL scalar */
 --
-.20.1
+.34.1
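
For readers following the conversion, the per-element arithmetic that the removed VQDMULH/VQRDMULH scalar loop delegated to the qdmulh/qrdmulh helpers can be sketched as below for the 16-bit case. This is a minimal sketch, not the QEMU helpers: the names sat16, qdmulh16, qrdmulh16 and qdmulh_examples are hypothetical, the sticky saturation (QC) flag that the real helpers update through cpu_env is not modelled, and right shift of a negative value is assumed to be arithmetic (as with GCC and Clang).

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a 64-bit intermediate to the signed 16-bit range. */
int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) {
        return INT16_MAX;
    }
    if (v < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)v;
}

/* Saturating doubling multiply, returning the high half. */
int16_t qdmulh16(int16_t a, int16_t b)
{
    int64_t prod = 2 * (int64_t)a * b;
    return sat16(prod >> 16);   /* assumes arithmetic shift */
}

/* Same as above, but rounds by adding half of the discarded range first. */
int16_t qrdmulh16(int16_t a, int16_t b)
{
    int64_t prod = 2 * (int64_t)a * b + (1 << 15);
    return sat16(prod >> 16);   /* assumes arithmetic shift */
}

/* Hypothetical usage checks. */
void qdmulh_examples(void)
{
    assert(qdmulh16(0x4000, 0x4000) == 8192);
    /* The only saturating input pair: INT16_MIN * INT16_MIN. */
    assert(qdmulh16(INT16_MIN, INT16_MIN) == INT16_MAX);
    /* Rounding changes the result when the discarded bits are >= 0x8000. */
    assert(qdmulh16(1, 16384) == 0);
    assert(qrdmulh16(1, 16384) == 1);
}
```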

-[PULL 18/23] hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
+[PULL 09/14] hw/gpio: Implement STM32L4x5 GPIO
-From: Jean-Christophe Dubois <jcd@tribudubois.net>
+From: Inès Varhol <ines.varhol@telecom-paris.fr>
-Some bits of the CCM registers are non writable.
+Features supported:
 - the 8 STM32L4x5 GPIOs are initialized with their reset values
     (except IDR, see below)
 - input mode: setting a pin in input mode "externally" (using input
     irqs) results in an out irq (transmitted to SYSCFG)
 - output mode: setting a bit in ODR sets the corresponding out irq
     (if this line is configured in output mode)
 - pull-up, pull-down
 - push-pull, open-drain
-This was left undone in the initial commit (all bits of registers were
+Differences from the real GPIOs:
-writable).
+- Alternate Function and Analog modes aren't implemented:
     pins in AF/Analog behave like pins in input mode
 - floating pins stay at their last value
 - the IDR reset values differ from the real hardware:
     they are consistent with the reset values of the other registers
     and with the fact that AF/Analog modes aren't implemented
 - setting I/O output speed isn't supported
 - locking port bits isn't supported
 - ADC function isn't supported
 - GPIOH has 16 pins instead of 2 pins
 - writing to registers LCKR, AFRL, AFRH and ASCR is ineffective
-This patch adds the required code to protect the non writable bits.
+Signed-off-by: Arnaud Minier <arnaud.minier@telecom-paris.fr>
+Signed-off-by: Inès Varhol <ines.varhol@telecom-paris.fr>
-Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Message-id: 20200608133508.550046-1-jcd@tribudubois.net
+Acked-by: Alistair Francis <alistair.francis@wdc.com>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Message-id: 20240305210444.310665-2-ines.varhol@telecom-paris.fr
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
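
The pin rules listed in the commit message, and implemented by update_gpio_idr() and stm32l4x5_gpio_set() in this patch, can be sketched as a standalone model of a single pin. This is a minimal sketch under stated assumptions, not the device code: the names pin_cfg, pin_level, pin_pull, pull_level, resolve_pin and can_drive_externally are hypothetical, and a "floating" result stands for the case where the device keeps the previous IDR bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pin_level { PIN_LOW, PIN_HIGH, PIN_FLOATING };
enum pin_pull { PULL_NONE, PULL_UP, PULL_DOWN };

/* Hypothetical per-pin view of the MODER/OTYPER/PUPDR/ODR state. */
struct pin_cfg {
    bool output;          /* MODER field selects output mode */
    bool open_drain;      /* OTYPER bit: open-drain if set, else push-pull */
    enum pin_pull pull;   /* PUPDR field */
    bool odr;             /* ODR bit */
    bool connected;       /* pin is driven externally */
    bool ext_high;        /* external level, meaningful if connected */
};

/* Level imposed by the pull resistor alone. */
enum pin_level pull_level(enum pin_pull pull)
{
    switch (pull) {
    case PULL_UP:
        return PIN_HIGH;
    case PULL_DOWN:
        return PIN_LOW;
    default:
        return PIN_FLOATING;
    }
}

/* Resolve the level that would be reflected in IDR for one pin. */
enum pin_level resolve_pin(const struct pin_cfg *p)
{
    assert(p != NULL);
    if (p->output) {
        /* Push-pull, or open-drain actively pulling low: ODR decides. */
        if (!p->open_drain || !p->odr) {
            return p->odr ? PIN_HIGH : PIN_LOW;
        }
        /* Open-drain with ODR 1: an external low wins, else the pull. */
        if (p->connected && !p->ext_high) {
            return PIN_LOW;
        }
        return pull_level(p->pull);
    }
    /* Input (or AF/Analog, which behave like input in this model). */
    if (p->connected) {
        return p->ext_high ? PIN_HIGH : PIN_LOW;
    }
    return pull_level(p->pull);
}

/* An output pin accepts external drive only if open-drain and driven low. */
bool can_drive_externally(const struct pin_cfg *p, bool level)
{
    assert(p != NULL);
    return !p->output || (p->open_drain && !level);
}
```

Returning a three-valued level keeps the floating case explicit, whereas the patch encodes it by leaving the corresponding bit out of new_idr_mask.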
- hw/misc/imx6ul_ccm.c | 76 ++++++++++++++++++++++++++++++++++++--------
+ MAINTAINERS                        |   1 +
-file changed, 63 insertions(+), 13 deletions(-)
+ docs/system/arm/b-l475e-iot01a.rst |   2 +-
  include/hw/gpio/stm32l4x5_gpio.h   |  70 +++++
  hw/gpio/stm32l4x5_gpio.c           | 477 +++++++++++++++++++++++++++++
  hw/gpio/Kconfig                    |   3 +
  hw/gpio/meson.build                |   1 +
  hw/gpio/trace-events               |   6 +
 files changed, 559 insertions(+), 1 deletion(-)
  create mode 100644 include/hw/gpio/stm32l4x5_gpio.h
  create mode 100644 hw/gpio/stm32l4x5_gpio.c
-diff --git a/hw/misc/imx6ul_ccm.c b/hw/misc/imx6ul_ccm.c
+diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
---- a/hw/misc/imx6ul_ccm.c
+--- a/MAINTAINERS
-+++ b/hw/misc/imx6ul_ccm.c
++++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: hw/arm/stm32l4x5_soc.c
  F: hw/misc/stm32l4x5_exti.c
  F: hw/misc/stm32l4x5_syscfg.c
  F: hw/misc/stm32l4x5_rcc.c
 +F: hw/gpio/stm32l4x5_gpio.c
  F: include/hw/*/stm32l4x5_*.h
  B-L475E-IOT01A IoT Node
 diff --git a/docs/system/arm/b-l475e-iot01a.rst b/docs/system/arm/b-l475e-iot01a.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/system/arm/b-l475e-iot01a.rst
 +++ b/docs/system/arm/b-l475e-iot01a.rst
@@ -XXX,XX +XXX,XX @@ Currently B-L475E-IOT01A machine's only supports the following devices:
  - STM32L4x5 EXTI (Extended interrupts and events controller)
  - STM32L4x5 SYSCFG (System configuration controller)
  - STM32L4x5 RCC (Reset and clock control)
 +- STM32L4x5 GPIOs (General-purpose I/Os)
  Missing devices
  """""""""""""""
@@ -XXX,XX +XXX,XX @@ Missing devices
  The B-L475E-IOT01A does *not* support the following devices:
  - Serial ports (UART)
 -- General-purpose I/Os (GPIO)
  - Analog to Digital Converter (ADC)
  - SPI controller
  - Timer controller (TIMER)
 diff --git a/include/hw/gpio/stm32l4x5_gpio.h b/include/hw/gpio/stm32l4x5_gpio.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/include/hw/gpio/stm32l4x5_gpio.h
 @@ -XXX,XX +XXX,XX @@
++/*
- #include "trace.h"
++ * STM32L4x5 GPIO (General Purpose Input/Output)
++ *
-+static const uint32_t ccm_mask[CCM_MAX] = {
++ * Copyright (c) 2024 Arnaud Minier <arnaud.minier@telecom-paris.fr>
-+    [CCM_CCR] = 0xf01fef80,
++ * Copyright (c) 2024 Inès Varhol <ines.varhol@telecom-paris.fr>
-+    [CCM_CCDR] = 0xfffeffff,
++ *
-+    [CCM_CSR] = 0xffffffff,
++ * SPDX-License-Identifier: GPL-2.0-or-later
-+    [CCM_CCSR] = 0xfffffef2,
++ *
-+    [CCM_CACRR] = 0xfffffff8,
++ * This work is licensed under the terms of the GNU GPL, version 2 or later.
-+    [CCM_CBCDR] = 0xc1f8e000,
++ * See the COPYING file in the top-level directory.
-+    [CCM_CBCMR] = 0xfc03cfff,
++ */
-+    [CCM_CSCMR1] = 0x80700000,
++
-+    [CCM_CSCMR2] = 0xe01ff003,
++/*
-+    [CCM_CSCDR1] = 0xfe00c780,
++ * The reference used is the STMicroElectronics RM0351 Reference manual
-+    [CCM_CS1CDR] = 0xfe00fe00,
++ * for STM32L4x5 and STM32L4x6 advanced Arm®-based 32-bit MCUs.
-+    [CCM_CS2CDR] = 0xf8007000,
++ * https://www.st.com/en/microcontrollers-microprocessors/stm32l4x5/documentation.html
-+    [CCM_CDCDR] = 0xf00fffff,
++ */
-+    [CCM_CHSCCDR] = 0xfffc01ff,
++
-+    [CCM_CSCDR2] = 0xfe0001ff,
++#ifndef HW_STM32L4X5_GPIO_H
-+    [CCM_CSCDR3] = 0xffffc1ff,
++#define HW_STM32L4X5_GPIO_H
-+    [CCM_CDHIPR] = 0xffffffff,
++
-+    [CCM_CTOR] = 0x00000000,
++#include "hw/sysbus.h"
-+    [CCM_CLPCR] = 0xf39ff01c,
++#include "qom/object.h"
-+    [CCM_CISR] = 0xfb85ffbe,
++
-+    [CCM_CIMR] = 0xfb85ffbf,
++#define TYPE_STM32L4X5_GPIO "stm32l4x5-gpio"
-+    [CCM_CCOSR] = 0xfe00fe00,
++OBJECT_DECLARE_SIMPLE_TYPE(Stm32l4x5GpioState, STM32L4X5_GPIO)
-+    [CCM_CGPR] = 0xfffc3fea,
++
-+    [CCM_CCGR0] = 0x00000000,
++#define GPIO_NUM_PINS 16
-+    [CCM_CCGR1] = 0x00000000,
++
-+    [CCM_CCGR2] = 0x00000000,
++struct Stm32l4x5GpioState {
-+    [CCM_CCGR3] = 0x00000000,
++    SysBusDevice parent_obj;
-+    [CCM_CCGR4] = 0x00000000,
++
-+    [CCM_CCGR5] = 0x00000000,
++    MemoryRegion mmio;
-+    [CCM_CCGR6] = 0x00000000,
++
-+    [CCM_CMEOR] = 0xafffff1f,
++    /* GPIO registers */
 +    uint32_t moder;
 +    uint32_t otyper;
 +    uint32_t ospeedr;
 +    uint32_t pupdr;
 +    uint32_t idr;
 +    uint32_t odr;
 +    uint32_t lckr;
 +    uint32_t afrl;
 +    uint32_t afrh;
 +    uint32_t ascr;
 +
 +    /* GPIO registers reset values */
 +    uint32_t moder_reset;
 +    uint32_t ospeedr_reset;
 +    uint32_t pupdr_reset;
 +
 +    /*
 +     * External driving of pins.
 +     * The pins can be set externally through the device
 +     * anonymous input GPIOs lines under certain conditions.
 +     * The pin must not be in push-pull output mode,
 +     * and can't be set high in open-drain mode.
 +     * Pins driven externally and configured to
 +     * output mode will in general be "disconnected"
 +     * (see `get_gpio_pinmask_to_disconnect()`)
 +     */
 +    uint16_t disconnected_pins;
 +    uint16_t pins_connected_high;
 +
 +    char *name;
 +    Clock *clk;
 +    qemu_irq pin[GPIO_NUM_PINS];
 +};
 +
-+static const uint32_t analog_mask[CCM_ANALOG_MAX] = {
++#endif
-+    [CCM_ANALOG_PLL_ARM] = 0xfff60f80,
+diff --git a/hw/gpio/stm32l4x5_gpio.c b/hw/gpio/stm32l4x5_gpio.c
-+    [CCM_ANALOG_PLL_USB1] = 0xfffe0fbc,
+new file mode 100644
-+    [CCM_ANALOG_PLL_USB2] = 0xfffe0fbc,
+index XXXXXXX..XXXXXXX
-+    [CCM_ANALOG_PLL_SYS] = 0xfffa0ffe,
+--- /dev/null
-+    [CCM_ANALOG_PLL_SYS_SS] = 0x00000000,
++++ b/hw/gpio/stm32l4x5_gpio.c
-+    [CCM_ANALOG_PLL_SYS_NUM] = 0xc0000000,
+@@ -XXX,XX +XXX,XX @@
-+    [CCM_ANALOG_PLL_SYS_DENOM] = 0xc0000000,
++/*
-+    [CCM_ANALOG_PLL_AUDIO] = 0xffe20f80,
++ * STM32L4x5 GPIO (General Purpose Input/Output)
-+    [CCM_ANALOG_PLL_AUDIO_NUM] = 0xc0000000,
++ *
-+    [CCM_ANALOG_PLL_AUDIO_DENOM] = 0xc0000000,
++ * Copyright (c) 2024 Arnaud Minier <arnaud.minier@telecom-paris.fr>
-+    [CCM_ANALOG_PLL_VIDEO] = 0xffe20f80,
++ * Copyright (c) 2024 Inès Varhol <ines.varhol@telecom-paris.fr>
-+    [CCM_ANALOG_PLL_VIDEO_NUM] = 0xc0000000,
++ *
-+    [CCM_ANALOG_PLL_VIDEO_DENOM] = 0xc0000000,
++ * SPDX-License-Identifier: GPL-2.0-or-later
-+    [CCM_ANALOG_PLL_ENET] = 0xffc20ff0,
++ *
-+    [CCM_ANALOG_PFD_480] = 0x40404040,
++ * This work is licensed under the terms of the GNU GPL, version 2 or later.
-+    [CCM_ANALOG_PFD_528] = 0x40404040,
++ * See the COPYING file in the top-level directory.
-+    [PMU_MISC0] = 0x01fe8306,
++ */
-+    [PMU_MISC1] = 0x07fcede0,
++
-+    [PMU_MISC2] = 0x005f5f5f,
++/*
 + * The reference used is the STMicroElectronics RM0351 Reference manual
 + * for STM32L4x5 and STM32L4x6 advanced Arm®-based 32-bit MCUs.
 + * https://www.st.com/en/microcontrollers-microprocessors/stm32l4x5/documentation.html
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/log.h"
 +#include "hw/gpio/stm32l4x5_gpio.h"
 +#include "hw/irq.h"
 +#include "hw/qdev-clock.h"
 +#include "hw/qdev-properties.h"
 +#include "qapi/visitor.h"
 +#include "qapi/error.h"
 +#include "migration/vmstate.h"
 +#include "trace.h"
 +
 +#define GPIO_MODER 0x00
 +#define GPIO_OTYPER 0x04
 +#define GPIO_OSPEEDR 0x08
 +#define GPIO_PUPDR 0x0C
 +#define GPIO_IDR 0x10
 +#define GPIO_ODR 0x14
 +#define GPIO_BSRR 0x18
 +#define GPIO_LCKR 0x1C
 +#define GPIO_AFRL 0x20
 +#define GPIO_AFRH 0x24
 +#define GPIO_BRR 0x28
 +#define GPIO_ASCR 0x2C
 +
 +/* 0b11111111_11111111_00000000_00000000 */
 +#define RESERVED_BITS_MASK 0xFFFF0000
 +
 +static void update_gpio_idr(Stm32l4x5GpioState *s);
 +
 +static bool is_pull_up(Stm32l4x5GpioState *s, unsigned pin)
 +{
 +    return extract32(s->pupdr, 2 * pin, 2) == 1;
 +}
 +
 +static bool is_pull_down(Stm32l4x5GpioState *s, unsigned pin)
 +{
 +    return extract32(s->pupdr, 2 * pin, 2) == 2;
 +}
 +
 +static bool is_output(Stm32l4x5GpioState *s, unsigned pin)
 +{
 +    return extract32(s->moder, 2 * pin, 2) == 1;
 +}
 +
 +static bool is_open_drain(Stm32l4x5GpioState *s, unsigned pin)
 +{
 +    return extract32(s->otyper, pin, 1) == 1;
 +}
 +
 +static bool is_push_pull(Stm32l4x5GpioState *s, unsigned pin)
 +{
 +    return extract32(s->otyper, pin, 1) == 0;
 +}
 +
 +static void stm32l4x5_gpio_reset_hold(Object *obj)
 +{
 +    Stm32l4x5GpioState *s = STM32L4X5_GPIO(obj);
 +
 +    s->moder = s->moder_reset;
 +    s->otyper = 0x00000000;
 +    s->ospeedr = s->ospeedr_reset;
 +    s->pupdr = s->pupdr_reset;
 +    s->idr = 0x00000000;
 +    s->odr = 0x00000000;
 +    s->lckr = 0x00000000;
 +    s->afrl = 0x00000000;
 +    s->afrh = 0x00000000;
 +    s->ascr = 0x00000000;
 +
 +    s->disconnected_pins = 0xFFFF;
 +    s->pins_connected_high = 0x0000;
 +    update_gpio_idr(s);
 +}
 +
 +static void stm32l4x5_gpio_set(void *opaque, int line, int level)
 +{
 +    Stm32l4x5GpioState *s = opaque;
 +    /*
 +     * The pin isn't set if line is configured in output mode
 +     * except if level is 0 and the output is open-drain.
 +     * This way there will be no short-circuit prone situations.
 +     */
 +    if (is_output(s, line) && !(is_open_drain(s, line) && (level == 0))) {
 +        qemu_log_mask(LOG_GUEST_ERROR, "Line %d can't be driven externally\n",
 +                      line);
 +        return;
 +    }
 +
 +    s->disconnected_pins &= ~(1 << line);
 +    if (level) {
 +        s->pins_connected_high |= (1 << line);
 +    } else {
 +        s->pins_connected_high &= ~(1 << line);
 +    }
 +    trace_stm32l4x5_gpio_pins(s->name, s->disconnected_pins,
 +                              s->pins_connected_high);
 +    update_gpio_idr(s);
 +}
 +
 +
 +static void update_gpio_idr(Stm32l4x5GpioState *s)
 +{
 +    uint32_t new_idr_mask = 0;
 +    uint32_t new_idr = s->odr;
 +    uint32_t old_idr = s->idr;
 +    int new_pin_state, old_pin_state;
 +
 +    for (int i = 0; i < GPIO_NUM_PINS; i++) {
 +        if (is_output(s, i)) {
 +            if (is_push_pull(s, i)) {
 +                new_idr_mask |= (1 << i);
 +            } else if (!(s->odr & (1 << i))) {
 +                /* open-drain ODR 0 */
 +                new_idr_mask |= (1 << i);
 +            /* open-drain ODR 1 */
 +            } else if (!(s->disconnected_pins & (1 << i)) &&
 +                       !(s->pins_connected_high & (1 << i))) {
 +                /* open-drain ODR 1 with pin connected low */
 +                new_idr_mask |= (1 << i);
 +                new_idr &= ~(1 << i);
 +            /* open-drain ODR 1 with inactive pin */
 +            } else if (is_pull_up(s, i)) {
 +                new_idr_mask |= (1 << i);
 +            } else if (is_pull_down(s, i)) {
 +                new_idr_mask |= (1 << i);
 +                new_idr &= ~(1 << i);
 +            }
 +            /*
 +             * The only case left is for open-drain ODR 1
 +             * with inactive pin without pull-up or pull-down:
 +             * the value is floating.
 +             */
 +        /* input or analog mode with connected pin */
 +        } else if (!(s->disconnected_pins & (1 << i))) {
 +            if (s->pins_connected_high & (1 << i)) {
 +                /* pin high */
 +                new_idr_mask |= (1 << i);
 +                new_idr |= (1 << i);
 +            } else {
 +                /* pin low */
 +                new_idr_mask |= (1 << i);
 +                new_idr &= ~(1 << i);
 +            }
 +        /* input or analog mode with disconnected pin */
 +        } else {
 +            if (is_pull_up(s, i)) {
 +                /* pull-up */
 +                new_idr_mask |= (1 << i);
 +                new_idr |= (1 << i);
 +            } else if (is_pull_down(s, i)) {
 +                /* pull-down */
 +                new_idr_mask |= (1 << i);
 +                new_idr &= ~(1 << i);
 +            }
 +            /*
 +             * The only case left is for a disconnected pin
 +             * without pull-up or pull-down :
 +             * the value is floating.
 +             */
 +        }
 +    }
 +
 +    s->idr = (old_idr & ~new_idr_mask) | (new_idr & new_idr_mask);
 +    trace_stm32l4x5_gpio_update_idr(s->name, old_idr, s->idr);
 +
 +    for (int i = 0; i < GPIO_NUM_PINS; i++) {
 +        if (new_idr_mask & (1 << i)) {
 +            new_pin_state = (new_idr & (1 << i)) > 0;
 +            old_pin_state = (old_idr & (1 << i)) > 0;
 +            if (new_pin_state > old_pin_state) {
 +                qemu_irq_raise(s->pin[i]);
 +            } else if (new_pin_state < old_pin_state) {
 +                qemu_irq_lower(s->pin[i]);
 +            }
 +        }
 +    }
 +}
 +
 +/*
 + * Return mask of pins that are both configured in output
 + * mode and externally driven (except pins in open-drain
 + * mode externally set to 0).
 + */
 +static uint32_t get_gpio_pinmask_to_disconnect(Stm32l4x5GpioState *s)
 +{
 +    uint32_t pins_to_disconnect = 0;
 +    for (int i = 0; i < GPIO_NUM_PINS; i++) {
 +        /* for each connected pin in output mode */
 +        if (!(s->disconnected_pins & (1 << i)) && is_output(s, i)) {
 +            /* if either push-pull or high level */
 +            if (is_push_pull(s, i) || s->pins_connected_high & (1 << i)) {
 +                pins_to_disconnect |= (1 << i);
 +                qemu_log_mask(LOG_GUEST_ERROR,
 +                              "Line %d can't be driven externally\n",
 +                              i);
 +            }
 +        }
 +    }
 +    return pins_to_disconnect;
 +}
 +
 +/*
 + * Set field `disconnected_pins` and call `update_gpio_idr()`
 + */
 +static void disconnect_gpio_pins(Stm32l4x5GpioState *s, uint16_t lines)
 +{
 +    s->disconnected_pins |= lines;
 +    trace_stm32l4x5_gpio_pins(s->name, s->disconnected_pins,
 +                              s->pins_connected_high);
 +    update_gpio_idr(s);
 +}
 +
 +static void disconnected_pins_set(Object *obj, Visitor *v,
 +    const char *name, void *opaque, Error **errp)
 +{
 +    Stm32l4x5GpioState *s = STM32L4X5_GPIO(obj);
 +    uint16_t value;
 +    if (!visit_type_uint16(v, name, &value, errp)) {
 +        return;
 +    }
 +    disconnect_gpio_pins(s, value);
 +}
 +
 +static void disconnected_pins_get(Object *obj, Visitor *v,
 +    const char *name, void *opaque, Error **errp)
 +{
 +    visit_type_uint16(v, name, (uint16_t *)opaque, errp);
 +}
 +
 +static void clock_freq_get(Object *obj, Visitor *v,
 +    const char *name, void *opaque, Error **errp)
 +{
 +    Stm32l4x5GpioState *s = STM32L4X5_GPIO(obj);
 +    uint32_t clock_freq_hz = clock_get_hz(s->clk);
 +    visit_type_uint32(v, name, &clock_freq_hz, errp);
 +}
 +
 +static void stm32l4x5_gpio_write(void *opaque, hwaddr addr,
 +                                 uint64_t val64, unsigned int size)
 +{
 +    Stm32l4x5GpioState *s = opaque;
 +
 +    uint32_t value = val64;
 +    trace_stm32l4x5_gpio_write(s->name, addr, val64);
 +
 +    switch (addr) {
 +    case GPIO_MODER:
 +        s->moder = value;
 +        disconnect_gpio_pins(s, get_gpio_pinmask_to_disconnect(s));
 +        qemu_log_mask(LOG_UNIMP,
 +                      "%s: Analog and AF modes aren't supported\n"
 +                      "Analog and AF mode behave like input mode\n",
 +                      __func__);
 +        return;
 +    case GPIO_OTYPER:
 +        s->otyper = value & ~RESERVED_BITS_MASK;
 +        disconnect_gpio_pins(s, get_gpio_pinmask_to_disconnect(s));
 +        return;
 +    case GPIO_OSPEEDR:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "%s: Changing I/O output speed isn't supported\n"
 +                      "I/O speed is already maximal\n",
 +                      __func__);
 +        s->ospeedr = value;
 +        return;
 +    case GPIO_PUPDR:
 +        s->pupdr = value;
 +        update_gpio_idr(s);
 +        return;
 +    case GPIO_IDR:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "%s: GPIO->IDR is read-only\n",
 +                      __func__);
 +        return;
 +    case GPIO_ODR:
 +        s->odr = value & ~RESERVED_BITS_MASK;
 +        update_gpio_idr(s);
 +        return;
 +    case GPIO_BSRR: {
 +        uint32_t bits_to_reset = (value & RESERVED_BITS_MASK) >> GPIO_NUM_PINS;
 +        uint32_t bits_to_set = value & ~RESERVED_BITS_MASK;
 +        /* If both BSx and BRx are set, BSx has priority. */
 +        s->odr &= ~bits_to_reset;
 +        s->odr |= bits_to_set;
 +        update_gpio_idr(s);
 +        return;
 +    }
 +    case GPIO_LCKR:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "%s: Locking port bits configuration isn't supported\n",
 +                      __func__);
 +        s->lckr = value & ~RESERVED_BITS_MASK;
 +        return;
 +    case GPIO_AFRL:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "%s: Alternate functions aren't supported\n",
 +                      __func__);
 +        s->afrl = value;
 +        return;
 +    case GPIO_AFRH:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "%s: Alternate functions aren't supported\n",
 +                      __func__);
 +        s->afrh = value;
 +        return;
 +    case GPIO_BRR: {
 +        uint32_t bits_to_reset = value & ~RESERVED_BITS_MASK;
 +        s->odr &= ~bits_to_reset;
 +        update_gpio_idr(s);
 +        return;
 +    }
 +    case GPIO_ASCR:
 +        qemu_log_mask(LOG_UNIMP,
 +                      "%s: ADC function isn't supported\n",
 +                      __func__);
 +        s->ascr = value & ~RESERVED_BITS_MASK;
 +        return;
 +    default:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "%s: Bad offset 0x%" HWADDR_PRIx "\n", __func__, addr);
 +    }
 +}
 +
 +static uint64_t stm32l4x5_gpio_read(void *opaque, hwaddr addr,
 +                                    unsigned int size)
 +{
 +    Stm32l4x5GpioState *s = opaque;
 +
 +    trace_stm32l4x5_gpio_read(s->name, addr);
 +
 +    switch (addr) {
 +    case GPIO_MODER:
 +        return s->moder;
 +    case GPIO_OTYPER:
 +        return s->otyper;
 +    case GPIO_OSPEEDR:
 +        return s->ospeedr;
 +    case GPIO_PUPDR:
 +        return s->pupdr;
 +    case GPIO_IDR:
 +        return s->idr;
 +    case GPIO_ODR:
 +        return s->odr;
 +    case GPIO_BSRR:
 +        return 0;
 +    case GPIO_LCKR:
 +        return s->lckr;
 +    case GPIO_AFRL:
 +        return s->afrl;
 +    case GPIO_AFRH:
 +        return s->afrh;
 +    case GPIO_BRR:
 +        return 0;
 +    case GPIO_ASCR:
 +        return s->ascr;
 +    default:
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "%s: Bad offset 0x%" HWADDR_PRIx "\n", __func__, addr);
 +        return 0;
 +    }
 +}
 +
 +static const MemoryRegionOps stm32l4x5_gpio_ops = {
 +    .read = stm32l4x5_gpio_read,
 +    .write = stm32l4x5_gpio_write,
 +    .endianness = DEVICE_NATIVE_ENDIAN,
 +    .impl = {
 +        .min_access_size = 4,
 +        .max_access_size = 4,
 +        .unaligned = false,
 +    },
 +    .valid = {
 +        .min_access_size = 4,
 +        .max_access_size = 4,
 +        .unaligned = false,
 +    },
 +};
 +
- static const char *imx6ul_ccm_reg_name(uint32_t reg)
++static void stm32l4x5_gpio_init(Object *obj)
- {
++{
-     static char unknown[20];
++    Stm32l4x5GpioState *s = STM32L4X5_GPIO(obj);
-@@ -XXX,XX +XXX,XX @@ static void imx6ul_ccm_write(void *opaque, hwaddr offset, uint64_t value,
++
++    memory_region_init_io(&s->mmio, obj, &stm32l4x5_gpio_ops, s,
-     trace_ccm_write_reg(imx6ul_ccm_reg_name(index), (uint32_t)value);
++                          TYPE_STM32L4X5_GPIO, 0x400);
++
--    /*
++    sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->mmio);
--     * We will do a better implementation later. In particular some bits
++
--     * cannot be written to.
++    qdev_init_gpio_out(DEVICE(obj), s->pin, GPIO_NUM_PINS);
--     */
++    qdev_init_gpio_in(DEVICE(obj), stm32l4x5_gpio_set, GPIO_NUM_PINS);
--    s->ccm[index] = (uint32_t)value;
++
-+    s->ccm[index] = (s->ccm[index] & ccm_mask[index]) |
++    s->clk = qdev_init_clock_in(DEVICE(s), "clk", NULL, s, 0);
-+                           ((uint32_t)value & ~ccm_mask[index]);
++
- }
++    object_property_add(obj, "disconnected-pins", "uint16",
++                        disconnected_pins_get, disconnected_pins_set,
- static uint64_t imx6ul_analog_read(void *opaque, hwaddr offset, unsigned size)
++                        NULL, &s->disconnected_pins);
-@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
++    object_property_add(obj, "clock-freq-hz", "uint32",
-          * the REG_NAME register. So we change the value of the
++                        clock_freq_get, NULL, NULL, NULL);
-          * REG_NAME register, setting bits passed in the value.
++}
-          */
++
--        s->analog[index - 1] |= value;
++static void stm32l4x5_gpio_realize(DeviceState *dev, Error **errp)
-+        s->analog[index - 1] |= (value & ~analog_mask[index - 1]);
++{
-         break;
++    Stm32l4x5GpioState *s = STM32L4X5_GPIO(dev);
-     case CCM_ANALOG_PLL_ARM_CLR:
++    if (!clock_has_source(s->clk)) {
-     case CCM_ANALOG_PLL_USB1_CLR:
++        error_setg(errp, "GPIO: clk input must be connected");
-@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
++        return;
-          * the REG_NAME register. So we change the value of the
++    }
-          * REG_NAME register, unsetting bits passed in the value.
++}
-          */
++
--        s->analog[index - 2] &= ~value;
++static const VMStateDescription vmstate_stm32l4x5_gpio = {
-+        s->analog[index - 2] &= ~(value & ~analog_mask[index - 2]);
++    .name = TYPE_STM32L4X5_GPIO,
-         break;
++    .version_id = 1,
-     case CCM_ANALOG_PLL_ARM_TOG:
++    .minimum_version_id = 1,
-     case CCM_ANALOG_PLL_USB1_TOG:
++    .fields = (VMStateField[]){
-@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
++        VMSTATE_UINT32(moder, Stm32l4x5GpioState),
-          * the REG_NAME register. So we change the value of the
++        VMSTATE_UINT32(otyper, Stm32l4x5GpioState),
-          * REG_NAME register, toggling bits passed in the value.
++        VMSTATE_UINT32(ospeedr, Stm32l4x5GpioState),
-          */
++        VMSTATE_UINT32(pupdr, Stm32l4x5GpioState),
--        s->analog[index - 3] ^= value;
++        VMSTATE_UINT32(idr, Stm32l4x5GpioState),
-+        s->analog[index - 3] ^= (value & ~analog_mask[index - 3]);
++        VMSTATE_UINT32(odr, Stm32l4x5GpioState),
-         break;
++        VMSTATE_UINT32(lckr, Stm32l4x5GpioState),
-     default:
++        VMSTATE_UINT32(afrl, Stm32l4x5GpioState),
--        /*
++        VMSTATE_UINT32(afrh, Stm32l4x5GpioState),
--         * We will do a better implementation later. In particular some bits
++        VMSTATE_UINT32(ascr, Stm32l4x5GpioState),
--         * cannot be written to.
++        VMSTATE_UINT16(disconnected_pins, Stm32l4x5GpioState),
--         */
++        VMSTATE_UINT16(pins_connected_high, Stm32l4x5GpioState),
--        s->analog[index] = value;
++        VMSTATE_END_OF_LIST()
-+        s->analog[index] = (s->analog[index] & analog_mask[index]) |
++    }
-+                           (value & ~analog_mask[index]);
++};
-         break;
++
-     }
++static Property stm32l4x5_gpio_properties[] = {
- }
++    DEFINE_PROP_STRING("name", Stm32l4x5GpioState, name),
 +    DEFINE_PROP_UINT32("mode-reset", Stm32l4x5GpioState, moder_reset, 0),
 +    DEFINE_PROP_UINT32("ospeed-reset", Stm32l4x5GpioState, ospeedr_reset, 0),
 +    DEFINE_PROP_UINT32("pupd-reset", Stm32l4x5GpioState, pupdr_reset, 0),
 +    DEFINE_PROP_END_OF_LIST(),
 +};
 +
 +static void stm32l4x5_gpio_class_init(ObjectClass *klass, void *data)
 +{
 +    DeviceClass *dc = DEVICE_CLASS(klass);
 +    ResettableClass *rc = RESETTABLE_CLASS(klass);
 +
 +    device_class_set_props(dc, stm32l4x5_gpio_properties);
 +    dc->vmsd = &vmstate_stm32l4x5_gpio;
 +    dc->realize = stm32l4x5_gpio_realize;
 +    rc->phases.hold = stm32l4x5_gpio_reset_hold;
 +}
 +
 +static const TypeInfo stm32l4x5_gpio_types[] = {
 +    {
 +        .name = TYPE_STM32L4X5_GPIO,
 +        .parent = TYPE_SYS_BUS_DEVICE,
 +        .instance_size = sizeof(Stm32l4x5GpioState),
 +        .instance_init = stm32l4x5_gpio_init,
 +        .class_init = stm32l4x5_gpio_class_init,
 +    },
 +};
 +
 +DEFINE_TYPES(stm32l4x5_gpio_types)
 diff --git a/hw/gpio/Kconfig b/hw/gpio/Kconfig
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/gpio/Kconfig
 +++ b/hw/gpio/Kconfig
@@ -XXX,XX +XXX,XX @@ config GPIO_PWR
  config SIFIVE_GPIO
      bool
 +
 +config STM32L4X5_GPIO
 +    bool
 diff --git a/hw/gpio/meson.build b/hw/gpio/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/gpio/meson.build
 +++ b/hw/gpio/meson.build
@@ -XXX,XX +XXX,XX @@ system_ss.add(when: 'CONFIG_RASPI', if_true: files(
      'bcm2835_gpio.c',
      'bcm2838_gpio.c'
  ))
 +system_ss.add(when: 'CONFIG_STM32L4X5_SOC', if_true: files('stm32l4x5_gpio.c'))
  system_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files('aspeed_gpio.c'))
  system_ss.add(when: 'CONFIG_SIFIVE_GPIO', if_true: files('sifive_gpio.c'))
 diff --git a/hw/gpio/trace-events b/hw/gpio/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/gpio/trace-events
 +++ b/hw/gpio/trace-events
@@ -XXX,XX +XXX,XX @@ sifive_gpio_update_output_irq(int64_t line, int64_t value) "line %" PRIi64 " val
  # aspeed_gpio.c
  aspeed_gpio_read(uint64_t offset, uint64_t value) "offset: 0x%" PRIx64 " value 0x%" PRIx64
  aspeed_gpio_write(uint64_t offset, uint64_t value) "offset: 0x%" PRIx64 " value 0x%" PRIx64
 +
 +# stm32l4x5_gpio.c
 +stm32l4x5_gpio_read(char *gpio, uint64_t addr) "GPIO%s addr: 0x%" PRIx64
 +stm32l4x5_gpio_write(char *gpio, uint64_t addr, uint64_t data) "GPIO%s addr: 0x%" PRIx64 " val: 0x%" PRIx64
 +stm32l4x5_gpio_update_idr(char *gpio, uint32_t old_idr, uint32_t new_idr) "GPIO%s from: 0x%x to: 0x%x"
 +stm32l4x5_gpio_pins(char *gpio, uint16_t disconnected, uint16_t high) "GPIO%s disconnected pins: 0x%x levels: 0x%x"
 --
-.20.1
+.34.1
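
The BSRR and BRR write paths in stm32l4x5_gpio_write() above reduce to a few bit operations on ODR. The following is a minimal standalone sketch of that arithmetic, assuming a 16-pin port; the function names bsrr_write, brr_write and bsrr_examples are hypothetical, and the IDR/irq update that the device performs afterwards is left out.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PINS 16   /* matches GPIO_NUM_PINS in the patch */

/*
 * BSRR: the low half sets ODR bits, the high half resets them.
 * Resetting first and setting second gives the set bits priority
 * when both are requested for the same pin.
 */
uint16_t bsrr_write(uint16_t odr, uint32_t value)
{
    uint16_t bits_to_reset = (uint16_t)(value >> NUM_PINS);
    uint16_t bits_to_set = (uint16_t)(value & 0xFFFF);

    odr &= (uint16_t)~bits_to_reset;
    odr |= bits_to_set;
    return odr;
}

/* BRR: the low half resets ODR bits, the high half is reserved. */
uint16_t brr_write(uint16_t odr, uint32_t value)
{
    uint16_t bits_to_reset = (uint16_t)(value & 0xFFFF);

    return odr & (uint16_t)~bits_to_reset;
}

/* Hypothetical usage checks. */
void bsrr_examples(void)
{
    /* Reset pin 4 and set pin 0 in a single write. */
    assert(bsrr_write(0x00F0, 0x00100001) == 0x00E1);
    /* Set and reset requested for pin 0: set wins. */
    assert(bsrr_write(0x0000, 0x00010001) == 0x0001);
    /* BRR clears the low nibble and ignores the reserved half. */
    assert(brr_write(0x00FF, 0x0000000F) == 0x00F0);
    assert(brr_write(0x00FF, 0xFFFF0000) == 0x00FF);
}
```

Because a single BSRR write can set some pins and clear others, a guest can update several outputs without a read-modify-write of ODR.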

-[PULL 19/23] Implement configurable descriptor size in ftgmac100
+[PULL 10/14] hw/arm: Connect STM32L4x5 GPIO to STM32L4x5 SoC
-From: Erik Smit <erik.lucas.smit@gmail.com>
+From: Inès Varhol <ines.varhol@telecom-paris.fr>
-The hardware supports configurable descriptor sizes, configured in the DBLAC
+Signed-off-by: Arnaud Minier <arnaud.minier@telecom-paris.fr>
-register.
+Signed-off-by: Inès Varhol <ines.varhol@telecom-paris.fr>
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
-Most drivers use the default 4 word descriptor, which is currently hardcoded,
+Acked-by: Alistair Francis <alistair.francis@wdc.com>
-but Aspeed SDK configures 8 words to store extra data.
+Message-id: 20240305210444.310665-3-ines.varhol@telecom-paris.fr
 Signed-off-by: Erik Smit <erik.lucas.smit@gmail.com>
 Reviewed-by: Cédric Le Goater <clg@kaod.org>
 [PMM: removed unnecessary parens]
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/net/ftgmac100.c | 26 ++++++++++++++++++++++++--
+ include/hw/arm/stm32l4x5_soc.h     |  2 +
-file changed, 24 insertions(+), 2 deletions(-)
+ include/hw/gpio/stm32l4x5_gpio.h   |  1 +
+ include/hw/misc/stm32l4x5_syscfg.h |  3 +-
-diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
+ hw/arm/stm32l4x5_soc.c             | 71 +++++++++++++++++++++++-------
-index XXXXXXX..XXXXXXX 100644
+ hw/misc/stm32l4x5_syscfg.c         |  1 +
---- a/hw/net/ftgmac100.c
+ hw/arm/Kconfig                     |  3 +-
-+++ b/hw/net/ftgmac100.c
+files changed, 63 insertions(+), 18 deletions(-)
-@@ -XXX,XX +XXX,XX @@
- #define FTGMAC100_APTC_TXPOLL_CNT(x)        (((x) >> 8) & 0xf)
+diff --git a/include/hw/arm/stm32l4x5_soc.h b/include/hw/arm/stm32l4x5_soc.h
- #define FTGMAC100_APTC_TXPOLL_TIME_SEL      (1 << 12)
+index XXXXXXX..XXXXXXX 100644
+--- a/include/hw/arm/stm32l4x5_soc.h
-+/*
++++ b/include/hw/arm/stm32l4x5_soc.h
-+ * DMA burst length and arbitration control register
+@@ -XXX,XX +XXX,XX @@
-+ */
+ #include "hw/misc/stm32l4x5_syscfg.h"
-+#define FTGMAC100_DBLAC_RXBURST_SIZE(x)     (((x) >> 8) & 0x3)
+ #include "hw/misc/stm32l4x5_exti.h"
-+#define FTGMAC100_DBLAC_TXBURST_SIZE(x)     (((x) >> 10) & 0x3)
+ #include "hw/misc/stm32l4x5_rcc.h"
-+#define FTGMAC100_DBLAC_RXDES_SIZE(x)       ((((x) >> 12) & 0xf) * 8)
++#include "hw/gpio/stm32l4x5_gpio.h"
-+#define FTGMAC100_DBLAC_TXDES_SIZE(x)       ((((x) >> 16) & 0xf) * 8)
+ #include "qom/object.h"
-+#define FTGMAC100_DBLAC_IFG_CNT(x)          (((x) >> 20) & 0x7)
-+#define FTGMAC100_DBLAC_IFG_INC             (1 << 23)
+ #define TYPE_STM32L4X5_SOC "stm32l4x5-soc"
-+
+@@ -XXX,XX +XXX,XX @@ struct Stm32l4x5SocState {
- /*
+     OrIRQState exti_or_gates[NUM_EXTI_OR_GATES];
-  * PHY control register
+     Stm32l4x5SyscfgState syscfg;
-  */
+     Stm32l4x5RccState rcc;
-@@ -XXX,XX +XXX,XX @@ static void ftgmac100_do_tx(FTGMAC100State *s, uint32_t tx_ring,
++    Stm32l4x5GpioState gpio[NUM_GPIOS];
-         if (bd.des0 & s->txdes0_edotr) {
-             addr = tx_ring;
+     MemoryRegion sram1;
-         } else {
+     MemoryRegion sram2;
--            addr += sizeof(FTGMAC100Desc);
+diff --git a/include/hw/gpio/stm32l4x5_gpio.h b/include/hw/gpio/stm32l4x5_gpio.h
-+            addr += FTGMAC100_DBLAC_TXDES_SIZE(s->dblac);
+index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/gpio/stm32l4x5_gpio.h
 +++ b/include/hw/gpio/stm32l4x5_gpio.h
@@ -XXX,XX +XXX,XX @@
  #define TYPE_STM32L4X5_GPIO "stm32l4x5-gpio"
  OBJECT_DECLARE_SIMPLE_TYPE(Stm32l4x5GpioState, STM32L4X5_GPIO)
 +#define NUM_GPIOS 8
  #define GPIO_NUM_PINS 16
  struct Stm32l4x5GpioState {
 diff --git a/include/hw/misc/stm32l4x5_syscfg.h b/include/hw/misc/stm32l4x5_syscfg.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/misc/stm32l4x5_syscfg.h
 +++ b/include/hw/misc/stm32l4x5_syscfg.h
@@ -XXX,XX +XXX,XX @@
  #include "hw/sysbus.h"
  #include "qom/object.h"
 +#include "hw/gpio/stm32l4x5_gpio.h"
  #define TYPE_STM32L4X5_SYSCFG "stm32l4x5-syscfg"
  OBJECT_DECLARE_SIMPLE_TYPE(Stm32l4x5SyscfgState, STM32L4X5_SYSCFG)
 -#define NUM_GPIOS 8
 -#define GPIO_NUM_PINS 16
  #define SYSCFG_NUM_EXTICR 4
  struct Stm32l4x5SyscfgState {
 diff --git a/hw/arm/stm32l4x5_soc.c b/hw/arm/stm32l4x5_soc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/stm32l4x5_soc.c
 +++ b/hw/arm/stm32l4x5_soc.c
@@ -XXX,XX +XXX,XX @@
  #include "sysemu/sysemu.h"
  #include "hw/or-irq.h"
  #include "hw/arm/stm32l4x5_soc.h"
 +#include "hw/gpio/stm32l4x5_gpio.h"
  #include "hw/qdev-clock.h"
  #include "hw/misc/unimp.h"
@@ -XXX,XX +XXX,XX @@ static const int exti_or_gate1_lines_in[EXTI_OR_GATE1_NUM_LINES_IN] = {
 , 35, 36, 37, 38,
  };
 +static const struct {
 +    uint32_t addr;
 +    uint32_t moder_reset;
 +    uint32_t ospeedr_reset;
 +    uint32_t pupdr_reset;
 +} stm32l4x5_gpio_cfg[NUM_GPIOS] = {
 +    { 0x48000000, 0xABFFFFFF, 0x0C000000, 0x64000000 },
 +    { 0x48000400, 0xFFFFFEBF, 0x00000000, 0x00000100 },
 +    { 0x48000800, 0xFFFFFFFF, 0x00000000, 0x00000000 },
 +    { 0x48000C00, 0xFFFFFFFF, 0x00000000, 0x00000000 },
 +    { 0x48001000, 0xFFFFFFFF, 0x00000000, 0x00000000 },
 +    { 0x48001400, 0xFFFFFFFF, 0x00000000, 0x00000000 },
 +    { 0x48001800, 0xFFFFFFFF, 0x00000000, 0x00000000 },
 +    { 0x48001C00, 0x0000000F, 0x00000000, 0x00000000 },
 +};
 +
  static void stm32l4x5_soc_initfn(Object *obj)
  {
      Stm32l4x5SocState *s = STM32L4X5_SOC(obj);
@@ -XXX,XX +XXX,XX @@ static void stm32l4x5_soc_initfn(Object *obj)
      }
      object_initialize_child(obj, "syscfg", &s->syscfg, TYPE_STM32L4X5_SYSCFG);
      object_initialize_child(obj, "rcc", &s->rcc, TYPE_STM32L4X5_RCC);
 +
 +    for (unsigned i = 0; i < NUM_GPIOS; i++) {
 +        g_autofree char *name = g_strdup_printf("gpio%c", 'a' + i);
 +        object_initialize_child(obj, name, &s->gpio[i], TYPE_STM32L4X5_GPIO);
 +    }
  }
  static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
      Stm32l4x5SocState *s = STM32L4X5_SOC(dev_soc);
      const Stm32l4x5SocClass *sc = STM32L4X5_SOC_GET_CLASS(dev_soc);
      MemoryRegion *system_memory = get_system_memory();
 -    DeviceState *armv7m;
 +    DeviceState *armv7m, *dev;
      SysBusDevice *busdev;
 +    uint32_t pin_index;
      if (!memory_region_init_rom(&s->flash, OBJECT(dev_soc), "flash",
                                  sc->flash_size, errp)) {
@@ -XXX,XX +XXX,XX @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
          return;
      }
 +    /* GPIOs */
 +    for (unsigned i = 0; i < NUM_GPIOS; i++) {
 +        g_autofree char *name = g_strdup_printf("%c", 'A' + i);
 +        dev = DEVICE(&s->gpio[i]);
 +        qdev_prop_set_string(dev, "name", name);
 +        qdev_prop_set_uint32(dev, "mode-reset",
 +                             stm32l4x5_gpio_cfg[i].moder_reset);
 +        qdev_prop_set_uint32(dev, "ospeed-reset",
 +                             stm32l4x5_gpio_cfg[i].ospeedr_reset);
 +        qdev_prop_set_uint32(dev, "pupd-reset",
 +                            stm32l4x5_gpio_cfg[i].pupdr_reset);
 +        busdev = SYS_BUS_DEVICE(&s->gpio[i]);
 +        g_free(name);
 +        name = g_strdup_printf("gpio%c-out", 'a' + i);
 +        qdev_connect_clock_in(DEVICE(&s->gpio[i]), "clk",
 +            qdev_get_clock_out(DEVICE(&(s->rcc)), name));
 +        if (!sysbus_realize(busdev, errp)) {
 +            return;
 +        }
 +        sysbus_mmio_map(busdev, 0, stm32l4x5_gpio_cfg[i].addr);
 +    }
 +
      /* System configuration controller */
      busdev = SYS_BUS_DEVICE(&s->syscfg);
      if (!sysbus_realize(busdev, errp)) {
          return;
      }
      sysbus_mmio_map(busdev, 0, SYSCFG_ADDR);
 -    /*
 -     * TODO: when the GPIO device is implemented, connect it
 -     * to SYCFG using `qdev_connect_gpio_out`, NUM_GPIOS and
 -     * GPIO_NUM_PINS.
 -     */
 +
 +    for (unsigned i = 0; i < NUM_GPIOS; i++) {
 +        for (unsigned j = 0; j < GPIO_NUM_PINS; j++) {
 +            pin_index = GPIO_NUM_PINS * i + j;
 +            qdev_connect_gpio_out(DEVICE(&s->gpio[i]), j,
 +                                  qdev_get_gpio_in(DEVICE(&s->syscfg),
 +                                  pin_index));
 +        }
 +    }
      /* EXTI device */
      busdev = SYS_BUS_DEVICE(&s->exti);
@@ -XXX,XX +XXX,XX @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
          }
      }
-@@ -XXX,XX +XXX,XX @@ static void ftgmac100_write(void *opaque, hwaddr addr,
+-    for (unsigned i = 0; i < 16; i++) {
-         s->phydata = value & 0xffff;
++    for (unsigned i = 0; i < GPIO_NUM_PINS; i++) {
-         break;
+         qdev_connect_gpio_out(DEVICE(&s->syscfg), i,
-     case FTGMAC100_DBLAC: /* DMA Burst Length and Arbitration Control */
+                               qdev_get_gpio_in(DEVICE(&s->exti), i));
-+        if (FTGMAC100_DBLAC_TXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
+     }
-+            qemu_log_mask(LOG_GUEST_ERROR,
+@@ -XXX,XX +XXX,XX @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
-+                          "%s: transmit descriptor too small : %d bytes\n",
+     /* RESERVED:    0x40024400, 0x7FDBC00 */
-+                          __func__, FTGMAC100_DBLAC_TXDES_SIZE(s->dblac));
-+            break;
+     /* AHB2 BUS */
-+        }
+-    create_unimplemented_device("GPIOA",     0x48000000, 0x400);
-+        if (FTGMAC100_DBLAC_RXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
+-    create_unimplemented_device("GPIOB",     0x48000400, 0x400);
-+            qemu_log_mask(LOG_GUEST_ERROR,
+-    create_unimplemented_device("GPIOC",     0x48000800, 0x400);
-+                          "%s: receive descriptor too small : %d bytes\n",
+-    create_unimplemented_device("GPIOD",     0x48000C00, 0x400);
-+                          __func__, FTGMAC100_DBLAC_RXDES_SIZE(s->dblac));
+-    create_unimplemented_device("GPIOE",     0x48001000, 0x400);
-+            break;
+-    create_unimplemented_device("GPIOF",     0x48001400, 0x400);
-+        }
+-    create_unimplemented_device("GPIOG",     0x48001800, 0x400);
-         s->dblac = value;
+-    create_unimplemented_device("GPIOH",     0x48001C00, 0x400);
-         break;
+     /* RESERVED:    0x48002000, 0x7FDBC00 */
-     case FTGMAC100_REVR:  /* Feature Register */
+     create_unimplemented_device("OTG_FS",    0x50000000, 0x40000);
-@@ -XXX,XX +XXX,XX @@ static ssize_t ftgmac100_receive(NetClientState *nc, const uint8_t *buf,
+     create_unimplemented_device("ADC",       0x50040000, 0x400);
-         if (bd.des0 & s->rxdes0_edorr) {
+diff --git a/hw/misc/stm32l4x5_syscfg.c b/hw/misc/stm32l4x5_syscfg.c
-             addr = s->rx_ring;
+index XXXXXXX..XXXXXXX 100644
-         } else {
+--- a/hw/misc/stm32l4x5_syscfg.c
--            addr += sizeof(FTGMAC100Desc);
++++ b/hw/misc/stm32l4x5_syscfg.c
-+            addr += FTGMAC100_DBLAC_RXDES_SIZE(s->dblac);
+@@ -XXX,XX +XXX,XX @@
-         }
+ #include "hw/irq.h"
-     }
+ #include "migration/vmstate.h"
-     s->rx_descriptor = addr;
+ #include "hw/misc/stm32l4x5_syscfg.h"
 +#include "hw/gpio/stm32l4x5_gpio.h"
  #define SYSCFG_MEMRMP 0x00
  #define SYSCFG_CFGR1 0x04
 diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/Kconfig
 +++ b/hw/arm/Kconfig
@@ -XXX,XX +XXX,XX @@ config STM32L4X5_SOC
      bool
      select ARM_V7M
      select OR_IRQ
 -    select STM32L4X5_SYSCFG
      select STM32L4X5_EXTI
 +    select STM32L4X5_SYSCFG
      select STM32L4X5_RCC
 +    select STM32L4X5_GPIO
  config XLNX_ZYNQMP_ARM
      bool
 --
-.20.1
+.34.1
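
Reviewer note: the SoC wiring in the patch above uses two pieces of arithmetic. Each GPIO port sits 0x400 bytes after the previous one, starting at 0x48000000, and pin j of port i drives SYSCFG input GPIO_NUM_PINS * i + j. The sketch below restates that arithmetic as standalone C. The helper names `gpio_port_base` and `syscfg_input_index` are hypothetical and do not exist in QEMU; the constants are copied from the patch and from the qtest that follows.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_GPIOS      8
#define GPIO_NUM_PINS  16
#define GPIO_BASE_ADDR 0x48000000u
#define GPIO_SIZE      0x400u

/* Hypothetical helper: MMIO base of port `port` (0 = GPIOA ... 7 = GPIOH). */
static uint32_t gpio_port_base(unsigned port)
{
    assert(port < NUM_GPIOS);
    return GPIO_BASE_ADDR + port * GPIO_SIZE;
}

/* Hypothetical helper: SYSCFG input line driven by pin `pin` of `port`. */
static unsigned syscfg_input_index(unsigned port, unsigned pin)
{
    assert(port < NUM_GPIOS && pin < GPIO_NUM_PINS);
    return GPIO_NUM_PINS * port + pin;
}
```

Under these assumptions GPIOH maps to 0x48001C00, matching the last row of `stm32l4x5_gpio_cfg`, and GPIOC pin 5 feeds SYSCFG input 37.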

-[PULL 22/23] sd: sdhci: Implement basic vendor specific register support
+[PULL 11/14] tests/qtest: Add STM32L4x5 GPIO QTest testcase
-From: Guenter Roeck <linux@roeck-us.net>
+From: Inès Varhol <ines.varhol@telecom-paris.fr>
-The Linux kernel's IMX code now uses vendor specific commands.
+The testcase contains :
-This results in endless warnings when booting the Linux kernel.
+- `test_idr_reset_value()` :
 Checks the reset values of MODER, OTYPER, PUPDR, ODR and IDR.
 - `test_gpio_output_mode()` :
 Checks that writing a bit in register ODR results in the corresponding
 pin rising or lowering, if this pin is configured in output mode.
 - `test_gpio_input_mode()` :
 Checks that an input pin set high or low externally results
 in the pin rising and lowering.
 - `test_pull_up_pull_down()` :
 Checks that a floating pin in pull-up/down mode is actually high/low.
 - `test_push_pull()` :
 Checks that a pin set externally is disconnected when configured in
 push-pull output mode, and can't be set externally while in this mode.
 - `test_open_drain()` :
 Checks that a pin set externally high is disconnected when configured
 in open-drain output mode, and can't be set high while in this mode.
 - `test_bsrr_brr()` :
 Checks that writing to BSRR and BRR has the desired result in ODR.
 - `test_clock_enable()` :
 Checks that GPIO clock is at the right frequency after enabling it.
-sdhci-esdhc-imx 2194000.usdhc: esdhc_wait_for_card_clock_gate_off:
+Acked-by: Thomas Huth <thuth@redhat.com>
-    card clock still not gate off in 100us!.
+Signed-off-by: Arnaud Minier <arnaud.minier@telecom-paris.fr>
+Signed-off-by: Inès Varhol <ines.varhol@telecom-paris.fr>
-Implement support for the vendor specific command implemented in IMX hardware
+Message-id: 20240305210444.310665-4-ines.varhol@telecom-paris.fr
 to be able to avoid this warning.
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Guenter Roeck <linux@roeck-us.net>
 Message-id: 20200603145258.195920-2-linux@roeck-us.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
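
Reviewer note: the BSRR/BRR behaviour that `test_bsrr_brr()` checks can be modelled in a few lines. This is only an illustrative sketch of the register semantics as the test describes them (low half of BSRR sets bits, high half resets them, and a set bit wins over a reset bit for the same pin). The function names `model_bsrr_write` and `model_brr_write` are hypothetical and this is not the device code from the series.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_GPIO_PINS 16

/* Illustrative model: return the new ODR value after a write to BSRR. */
static uint16_t model_bsrr_write(uint16_t odr, uint32_t value)
{
    uint16_t bits_to_set = value & 0xFFFF;            /* BSx: bits 0..15  */
    uint16_t bits_to_reset = value >> NUM_GPIO_PINS;  /* BRx: bits 16..31 */

    /* Reset first, then set, so BSx has priority over BRx. */
    return (uint16_t)((odr & ~bits_to_reset) | bits_to_set);
}

/* Illustrative model: return the new ODR value after a write to BRR. */
static uint16_t model_brr_write(uint16_t odr, uint32_t value)
{
    return (uint16_t)(odr & ~(value & 0xFFFF));
}
```

Writing `(1 << pin) | (1 << (pin + NUM_GPIO_PINS))` to BSRR therefore leaves the pin's ODR bit set, which is the priority case the test asserts.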
- hw/sd/sdhci-internal.h |  5 +++++
+ tests/qtest/stm32l4x5_gpio-test.c | 551 ++++++++++++++++++++++++++++++
- include/hw/sd/sdhci.h  |  5 +++++
+ tests/qtest/meson.build           |   3 +-
- hw/sd/sdhci.c          | 18 +++++++++++++++++-
+files changed, 553 insertions(+), 1 deletion(-)
-files changed, 27 insertions(+), 1 deletion(-)
+ create mode 100644 tests/qtest/stm32l4x5_gpio-test.c
-diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
+diff --git a/tests/qtest/stm32l4x5_gpio-test.c b/tests/qtest/stm32l4x5_gpio-test.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/tests/qtest/stm32l4x5_gpio-test.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * QTest testcase for STM32L4x5_GPIO
 + *
 + * Copyright (c) 2024 Arnaud Minier <arnaud.minier@telecom-paris.fr>
 + * Copyright (c) 2024 Inès Varhol <ines.varhol@telecom-paris.fr>
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "libqtest-single.h"
 +
 +#define GPIO_BASE_ADDR 0x48000000
 +#define GPIO_SIZE      0x400
 +#define NUM_GPIOS      8
 +#define NUM_GPIO_PINS  16
 +
 +#define GPIO_A 0x48000000
 +#define GPIO_B 0x48000400
 +#define GPIO_C 0x48000800
 +#define GPIO_D 0x48000C00
 +#define GPIO_E 0x48001000
 +#define GPIO_F 0x48001400
 +#define GPIO_G 0x48001800
 +#define GPIO_H 0x48001C00
 +
 +#define MODER 0x00
 +#define OTYPER 0x04
 +#define PUPDR 0x0C
 +#define IDR 0x10
 +#define ODR 0x14
 +#define BSRR 0x18
 +#define BRR 0x28
 +
 +#define MODER_INPUT 0
 +#define MODER_OUTPUT 1
 +
 +#define PUPDR_NONE 0
 +#define PUPDR_PULLUP 1
 +#define PUPDR_PULLDOWN 2
 +
 +#define OTYPER_PUSH_PULL 0
 +#define OTYPER_OPEN_DRAIN 1
 +
 +const uint32_t moder_reset[NUM_GPIOS] = {
 +    0xABFFFFFF,
 +    0xFFFFFEBF,
 +    0xFFFFFFFF,
 +    0xFFFFFFFF,
 +    0xFFFFFFFF,
 +    0xFFFFFFFF,
 +    0xFFFFFFFF,
 +    0x0000000F
 +};
 +
 +const uint32_t pupdr_reset[NUM_GPIOS] = {
 +    0x64000000,
 +    0x00000100,
 +    0x00000000,
 +    0x00000000,
 +    0x00000000,
 +    0x00000000,
 +    0x00000000,
 +    0x00000000
 +};
 +
 +const uint32_t idr_reset[NUM_GPIOS] = {
 +    0x0000A000,
 +    0x00000010,
 +    0x00000000,
 +    0x00000000,
 +    0x00000000,
 +    0x00000000,
 +    0x00000000,
 +    0x00000000
 +};
 +
 +static uint32_t gpio_readl(unsigned int gpio, unsigned int offset)
 +{
 +    return readl(gpio + offset);
 +}
 +
 +static void gpio_writel(unsigned int gpio, unsigned int offset, uint32_t value)
 +{
 +    writel(gpio + offset, value);
 +}
 +
 +static void gpio_set_bit(unsigned int gpio, unsigned int reg,
 +                         unsigned int pin, uint32_t value)
 +{
 +    uint32_t mask = 0xFFFFFFFF & ~(0x1 << pin);
 +    gpio_writel(gpio, reg, (gpio_readl(gpio, reg) & mask) | value << pin);
 +}
 +
 +static void gpio_set_2bits(unsigned int gpio, unsigned int reg,
 +                           unsigned int pin, uint32_t value)
 +{
 +    uint32_t offset = 2 * pin;
 +    uint32_t mask = 0xFFFFFFFF & ~(0x3 << offset);
 +    gpio_writel(gpio, reg, (gpio_readl(gpio, reg) & mask) | value << offset);
 +}
 +
 +static unsigned int get_gpio_id(uint32_t gpio_addr)
 +{
 +    return (gpio_addr - GPIO_BASE_ADDR) / GPIO_SIZE;
 +}
 +
 +static void gpio_set_irq(unsigned int gpio, int num, int level)
 +{
 +    g_autofree char *name = g_strdup_printf("/machine/soc/gpio%c",
 +                                            get_gpio_id(gpio) + 'a');
 +    qtest_set_irq_in(global_qtest, name, NULL, num, level);
 +}
 +
 +static void disconnect_all_pins(unsigned int gpio)
 +{
 +    g_autofree char *path = g_strdup_printf("/machine/soc/gpio%c",
 +                                            get_gpio_id(gpio) + 'a');
 +    QDict *r;
 +
 +    r = qtest_qmp(global_qtest, "{ 'execute': 'qom-set', 'arguments': "
 +        "{ 'path': %s, 'property': 'disconnected-pins', 'value': %d } }",
 +        path, 0xFFFF);
 +    g_assert_false(qdict_haskey(r, "error"));
 +    qobject_unref(r);
 +}
 +
 +static uint32_t get_disconnected_pins(unsigned int gpio)
 +{
 +    g_autofree char *path = g_strdup_printf("/machine/soc/gpio%c",
 +                                            get_gpio_id(gpio) + 'a');
 +    uint32_t disconnected_pins = 0;
 +    QDict *r;
 +
 +    r = qtest_qmp(global_qtest, "{ 'execute': 'qom-get', 'arguments':"
 +        " { 'path': %s, 'property': 'disconnected-pins'} }", path);
 +    g_assert_false(qdict_haskey(r, "error"));
 +    disconnected_pins = qdict_get_int(r, "return");
 +    qobject_unref(r);
 +    return disconnected_pins;
 +}
 +
 +static uint32_t reset(uint32_t gpio, unsigned int offset)
 +{
 +    switch (offset) {
 +    case MODER:
 +        return moder_reset[get_gpio_id(gpio)];
 +    case PUPDR:
 +        return pupdr_reset[get_gpio_id(gpio)];
 +    case IDR:
 +        return idr_reset[get_gpio_id(gpio)];
 +    }
 +    return 0x0;
 +}
 +
 +static void system_reset(void)
 +{
 +    QDict *r;
 +    r = qtest_qmp(global_qtest, "{'execute': 'system_reset'}");
 +    g_assert_false(qdict_haskey(r, "error"));
 +    qobject_unref(r);
 +}
 +
 +static void test_idr_reset_value(void)
 +{
 +    /*
 +     * Checks that the values in MODER, OTYPER, PUPDR and ODR
 +     * after reset are correct, and that the value in IDR is
 +     * coherent.
 +     * Since AF and analog modes aren't implemented, IDR reset
 +     * values aren't the same as with a real board.
 +     *
 +     * Register IDR contains the actual values of all GPIO pins.
 +     * Its value depends on the pins' configuration
 +     * (input/output/analog : register MODER, push-pull/open-drain :
 +     * register OTYPER, pull-up/pull-down/none : register PUPDR)
 +     * and on the values stored in register ODR
 +     * (in case the pin is in output mode).
 +     */
 +
 +    gpio_writel(GPIO_A, MODER, 0xDEADBEEF);
 +    gpio_writel(GPIO_A, ODR, 0xDEADBEEF);
 +    gpio_writel(GPIO_A, OTYPER, 0xDEADBEEF);
 +    gpio_writel(GPIO_A, PUPDR, 0xDEADBEEF);
 +
 +    gpio_writel(GPIO_B, MODER, 0xDEADBEEF);
 +    gpio_writel(GPIO_B, ODR, 0xDEADBEEF);
 +    gpio_writel(GPIO_B, OTYPER, 0xDEADBEEF);
 +    gpio_writel(GPIO_B, PUPDR, 0xDEADBEEF);
 +
 +    gpio_writel(GPIO_C, MODER, 0xDEADBEEF);
 +    gpio_writel(GPIO_C, ODR, 0xDEADBEEF);
 +    gpio_writel(GPIO_C, OTYPER, 0xDEADBEEF);
 +    gpio_writel(GPIO_C, PUPDR, 0xDEADBEEF);
 +
 +    gpio_writel(GPIO_H, MODER, 0xDEADBEEF);
 +    gpio_writel(GPIO_H, ODR, 0xDEADBEEF);
 +    gpio_writel(GPIO_H, OTYPER, 0xDEADBEEF);
 +    gpio_writel(GPIO_H, PUPDR, 0xDEADBEEF);
 +
 +    system_reset();
 +
 +    uint32_t moder = gpio_readl(GPIO_A, MODER);
 +    uint32_t odr = gpio_readl(GPIO_A, ODR);
 +    uint32_t otyper = gpio_readl(GPIO_A, OTYPER);
 +    uint32_t pupdr = gpio_readl(GPIO_A, PUPDR);
 +    uint32_t idr = gpio_readl(GPIO_A, IDR);
 +    /* 15: AF, 14: AF, 13: AF, 12: Analog ... */
 +    /* here AF is the same as Analog and Input mode */
 +    g_assert_cmphex(moder, ==, reset(GPIO_A, MODER));
 +    g_assert_cmphex(odr, ==, reset(GPIO_A, ODR));
 +    g_assert_cmphex(otyper, ==, reset(GPIO_A, OTYPER));
 +    /* 15: pull-up, 14: pull-down, 13: pull-up, 12: neither ... */
 +    g_assert_cmphex(pupdr, ==, reset(GPIO_A, PUPDR));
 +    /* 15 : 1, 14: 0, 13: 1, 12 : reset value ... */
 +    g_assert_cmphex(idr, ==, reset(GPIO_A, IDR));
 +
 +    moder = gpio_readl(GPIO_B, MODER);
 +    odr = gpio_readl(GPIO_B, ODR);
 +    otyper = gpio_readl(GPIO_B, OTYPER);
 +    pupdr = gpio_readl(GPIO_B, PUPDR);
 +    idr = gpio_readl(GPIO_B, IDR);
 +    /* ... 5: Analog, 4: AF, 3: AF, 2: Analog ... */
 +    /* here AF is the same as Analog and Input mode */
 +    g_assert_cmphex(moder, ==, reset(GPIO_B, MODER));
 +    g_assert_cmphex(odr, ==, reset(GPIO_B, ODR));
 +    g_assert_cmphex(otyper, ==, reset(GPIO_B, OTYPER));
 +    /* ... 5: neither, 4: pull-up, 3: neither ... */
 +    g_assert_cmphex(pupdr, ==, reset(GPIO_B, PUPDR));
 +    /* ... 5 : reset value, 4 : 1, 3 : reset value ... */
 +    g_assert_cmphex(idr, ==, reset(GPIO_B, IDR));
 +
 +    moder = gpio_readl(GPIO_C, MODER);
 +    odr = gpio_readl(GPIO_C, ODR);
 +    otyper = gpio_readl(GPIO_C, OTYPER);
 +    pupdr = gpio_readl(GPIO_C, PUPDR);
 +    idr = gpio_readl(GPIO_C, IDR);
 +    /* Analog, same as Input mode */
 +    g_assert_cmphex(moder, ==, reset(GPIO_C, MODER));
 +    g_assert_cmphex(odr, ==, reset(GPIO_C, ODR));
 +    g_assert_cmphex(otyper, ==, reset(GPIO_C, OTYPER));
 +    /* no pull-up or pull-down */
 +    g_assert_cmphex(pupdr, ==, reset(GPIO_C, PUPDR));
 +    /* reset value */
 +    g_assert_cmphex(idr, ==, reset(GPIO_C, IDR));
 +
 +    moder = gpio_readl(GPIO_H, MODER);
 +    odr = gpio_readl(GPIO_H, ODR);
 +    otyper = gpio_readl(GPIO_H, OTYPER);
 +    pupdr = gpio_readl(GPIO_H, PUPDR);
 +    idr = gpio_readl(GPIO_H, IDR);
 +    /* Analog, same as Input mode */
 +    g_assert_cmphex(moder, ==, reset(GPIO_H, MODER));
 +    g_assert_cmphex(odr, ==, reset(GPIO_H, ODR));
 +    g_assert_cmphex(otyper, ==, reset(GPIO_H, OTYPER));
 +    /* no pull-up or pull-down */
 +    g_assert_cmphex(pupdr, ==, reset(GPIO_H, PUPDR));
 +    /* reset value */
 +    g_assert_cmphex(idr, ==, reset(GPIO_H, IDR));
 +}
 +
 +static void test_gpio_output_mode(const void *data)
 +{
 +    /*
 +     * Checks that setting a bit in ODR sets the corresponding
 +     * GPIO line high : it should set the right bit in IDR
 +     * and send an irq to syscfg.
 +     * Additionally, it checks that values written to ODR
 +     * when not in output mode are stored and not discarded.
 +     */
 +    unsigned int pin = ((uint64_t)data) & 0xF;
 +    uint32_t gpio = ((uint64_t)data) >> 32;
 +    unsigned int gpio_id = get_gpio_id(gpio);
 +
 +    qtest_irq_intercept_in(global_qtest, "/machine/soc/syscfg");
 +
 +    /* Set a bit in ODR and check nothing happens */
 +    gpio_set_bit(gpio, ODR, pin, 1);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR));
 +    g_assert_false(get_irq(gpio_id * NUM_GPIO_PINS + pin));
 +
 +    /* Configure the relevant line as output and check the pin is high */
 +    gpio_set_2bits(gpio, MODER, pin, MODER_OUTPUT);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) | (1 << pin));
 +    g_assert_true(get_irq(gpio_id * NUM_GPIO_PINS + pin));
 +
 +    /* Reset the bit in ODR and check the pin is low */
 +    gpio_set_bit(gpio, ODR, pin, 0);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
 +    g_assert_false(get_irq(gpio_id * NUM_GPIO_PINS + pin));
 +
 +    /* Clean the test */
 +    gpio_writel(gpio, ODR, reset(gpio, ODR));
 +    gpio_writel(gpio, MODER, reset(gpio, MODER));
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR));
 +    g_assert_false(get_irq(gpio_id * NUM_GPIO_PINS + pin));
 +}
 +
 +static void test_gpio_input_mode(const void *data)
 +{
 +    /*
 +     * Test that setting a line high/low externally sets the
 +     * corresponding GPIO line high/low : it should set the
 +     * right bit in IDR and send an irq to syscfg.
 +     */
 +    unsigned int pin = ((uint64_t)data) & 0xF;
 +    uint32_t gpio = ((uint64_t)data) >> 32;
 +    unsigned int gpio_id = get_gpio_id(gpio);
 +
 +    qtest_irq_intercept_in(global_qtest, "/machine/soc/syscfg");
 +
 +    /* Configure a line as input, raise it, and check that the pin is high */
 +    gpio_set_2bits(gpio, MODER, pin, MODER_INPUT);
 +    gpio_set_irq(gpio, pin, 1);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) | (1 << pin));
 +    g_assert_true(get_irq(gpio_id * NUM_GPIO_PINS + pin));
 +
 +    /* Lower the line and check that the pin is low */
 +    gpio_set_irq(gpio, pin, 0);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
 +    g_assert_false(get_irq(gpio_id * NUM_GPIO_PINS + pin));
 +
 +    /* Clean the test */
 +    gpio_writel(gpio, MODER, reset(gpio, MODER));
 +    disconnect_all_pins(gpio);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR));
 +}
 +
 +static void test_pull_up_pull_down(const void *data)
 +{
 +    /*
 +     * Test that a floating pin with pull-up sets the pin
 +     * high and vice-versa.
 +     */
 +    unsigned int pin = ((uint64_t)data) & 0xF;
 +    uint32_t gpio = ((uint64_t)data) >> 32;
 +    unsigned int gpio_id = get_gpio_id(gpio);
 +
 +    qtest_irq_intercept_in(global_qtest, "/machine/soc/syscfg");
 +
 +    /* Configure a line as input with pull-up, check the line is set high */
 +    gpio_set_2bits(gpio, MODER, pin, MODER_INPUT);
 +    gpio_set_2bits(gpio, PUPDR, pin, PUPDR_PULLUP);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) | (1 << pin));
 +    g_assert_true(get_irq(gpio_id * NUM_GPIO_PINS + pin));
 +
 +    /* Configure the line with pull-down, check the line is low */
 +    gpio_set_2bits(gpio, PUPDR, pin, PUPDR_PULLDOWN);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
 +    g_assert_false(get_irq(gpio_id * NUM_GPIO_PINS + pin));
 +
 +    /* Clean the test */
 +    gpio_writel(gpio, MODER, reset(gpio, MODER));
 +    gpio_writel(gpio, PUPDR, reset(gpio, PUPDR));
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR));
 +}
 +
 +static void test_push_pull(const void *data)
 +{
 +    /*
 +     * Test that configuring a line in push-pull output mode
 +     * disconnects the pin, and that the pin can't be set or reset
 +     * externally afterwards.
 +     */
 +    unsigned int pin = ((uint64_t)data) & 0xF;
 +    uint32_t gpio = ((uint64_t)data) >> 32;
 +    uint32_t gpio2 = GPIO_BASE_ADDR + (GPIO_H - gpio);
 +
 +    qtest_irq_intercept_in(global_qtest, "/machine/soc/syscfg");
 +
 +    /* Setting a line high externally, configuring it in push-pull output */
 +    /* And checking the pin was disconnected */
 +    gpio_set_irq(gpio, pin, 1);
 +    gpio_set_2bits(gpio, MODER, pin, MODER_OUTPUT);
 +    g_assert_cmphex(get_disconnected_pins(gpio), ==, 0xFFFF);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
 +
 +    /* Setting a line low externally, configuring it in push-pull output */
 +    /* And checking the pin was disconnected */
 +    gpio_set_irq(gpio2, pin, 0);
 +    gpio_set_bit(gpio2, ODR, pin, 1);
 +    gpio_set_2bits(gpio2, MODER, pin, MODER_OUTPUT);
 +    g_assert_cmphex(get_disconnected_pins(gpio2), ==, 0xFFFF);
 +    g_assert_cmphex(gpio_readl(gpio2, IDR), ==, reset(gpio2, IDR) | (1 << pin));
 +
 +    /* Trying to set a push-pull output pin, checking it doesn't work */
 +    gpio_set_irq(gpio, pin, 1);
 +    g_assert_cmphex(get_disconnected_pins(gpio), ==, 0xFFFF);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
 +
 +    /* Trying to reset a push-pull output pin, checking it doesn't work */
 +    gpio_set_irq(gpio2, pin, 0);
 +    g_assert_cmphex(get_disconnected_pins(gpio2), ==, 0xFFFF);
 +    g_assert_cmphex(gpio_readl(gpio2, IDR), ==, reset(gpio2, IDR) | (1 << pin));
 +
 +    /* Clean the test */
 +    gpio_writel(gpio, MODER, reset(gpio, MODER));
 +    gpio_writel(gpio2, ODR, reset(gpio2, ODR));
 +    gpio_writel(gpio2, MODER, reset(gpio2, MODER));
 +}
 +
 +static void test_open_drain(const void *data)
 +{
 +    /*
 +     * Test that configuring a line in open-drain output mode
 +     * disconnects a pin set high externally and that the pin
 +     * can't be set high externally while configured in open-drain.
 +     *
 +     * However a pin set low externally shouldn't be disconnected,
 +     * and it can be set low externally when in open-drain mode.
 +     */
 +    unsigned int pin = ((uint64_t)data) & 0xF;
 +    uint32_t gpio = ((uint64_t)data) >> 32;
 +    uint32_t gpio2 = GPIO_BASE_ADDR + (GPIO_H - gpio);
 +
 +    qtest_irq_intercept_in(global_qtest, "/machine/soc/syscfg");
 +
 +    /* Setting a line high externally, configuring it in open-drain output */
 +    /* And checking the pin was disconnected */
 +    gpio_set_irq(gpio, pin, 1);
 +    gpio_set_bit(gpio, OTYPER, pin, OTYPER_OPEN_DRAIN);
 +    gpio_set_2bits(gpio, MODER, pin, MODER_OUTPUT);
 +    g_assert_cmphex(get_disconnected_pins(gpio), ==, 0xFFFF);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
 +
 +    /* Setting a line low externally, configuring it in open-drain output */
 +    /* And checking the pin wasn't disconnected */
 +    gpio_set_irq(gpio2, pin, 0);
 +    gpio_set_bit(gpio2, ODR, pin, 1);
 +    gpio_set_bit(gpio2, OTYPER, pin, OTYPER_OPEN_DRAIN);
 +    gpio_set_2bits(gpio2, MODER, pin, MODER_OUTPUT);
 +    g_assert_cmphex(get_disconnected_pins(gpio2), ==, 0xFFFF & ~(1 << pin));
 +    g_assert_cmphex(gpio_readl(gpio2, IDR), ==,
 +                               reset(gpio2, IDR) & ~(1 << pin));
 +
 +    /* Trying to set an open-drain output pin, checking it doesn't work */
 +    gpio_set_irq(gpio, pin, 1);
 +    g_assert_cmphex(get_disconnected_pins(gpio), ==, 0xFFFF);
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
 +
 +    /* Trying to reset an open-drain output pin, checking it works */
 +    gpio_set_bit(gpio, ODR, pin, 1);
 +    gpio_set_irq(gpio, pin, 0);
 +    g_assert_cmphex(get_disconnected_pins(gpio2), ==, 0xFFFF & ~(1 << pin));
 +    g_assert_cmphex(gpio_readl(gpio2, IDR), ==,
 +                               reset(gpio2, IDR) & ~(1 << pin));
 +
 +    /* Clean the test */
 +    disconnect_all_pins(gpio2);
 +    gpio_writel(gpio2, OTYPER, reset(gpio2, OTYPER));
 +    gpio_writel(gpio2, ODR, reset(gpio2, ODR));
 +    gpio_writel(gpio2, MODER, reset(gpio2, MODER));
 +    g_assert_cmphex(gpio_readl(gpio2, IDR), ==, reset(gpio2, IDR));
 +    disconnect_all_pins(gpio);
 +    gpio_writel(gpio, OTYPER, reset(gpio, OTYPER));
 +    gpio_writel(gpio, ODR, reset(gpio, ODR));
 +    gpio_writel(gpio, MODER, reset(gpio, MODER));
 +    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR));
 +}
 +
 +static void test_bsrr_brr(const void *data)
 +{
 +    /*
 +     * Test that writing a '1' in BRR and BSRR
 +     * has the desired effect on ODR.
 +     * In BSRR, BSx has priority over BRx.
 +     */
 +    unsigned int pin = ((uint64_t)data) & 0xF;
 +    uint32_t gpio = ((uint64_t)data) >> 32;
 +
 +    gpio_writel(gpio, BSRR, (1 << pin));
 +    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR) | (1 << pin));
 +
 +    gpio_writel(gpio, BSRR, (1 << (pin + NUM_GPIO_PINS)));
 +    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR));
 +
 +    gpio_writel(gpio, BSRR, (1 << pin));
 +    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR) | (1 << pin));
 +
 +    gpio_writel(gpio, BRR, (1 << pin));
 +    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR));
 +
 +    /* BSx should have priority over BRx */
 +    gpio_writel(gpio, BSRR, (1 << pin) | (1 << (pin + NUM_GPIO_PINS)));
 +    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR) | (1 << pin));
 +
 +    gpio_writel(gpio, BRR, (1 << pin));
 +    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR));
 +
 +    gpio_writel(gpio, ODR, reset(gpio, ODR));
 +}
 +
 +int main(int argc, char **argv)
 +{
 +    int ret;
 +
 +    g_test_init(&argc, &argv, NULL);
 +    g_test_set_nonfatal_assertions();
 +    qtest_add_func("stm32l4x5/gpio/test_idr_reset_value",
 +                   test_idr_reset_value);
 +    /*
 +     * The inputs for the tests (gpio and pin) can be changed,
 +     * but the tests don't work for pins that are high at reset
 +     * (GPIOA15, GPIOA13 and GPIOB4).
 +     * Specifically, raising the pin then checking `get_irq()`
 +     * is problematic since the pin was already high.
 +     */
 +    qtest_add_data_func("stm32l4x5/gpio/test_gpioc5_output_mode",
 +                        (void *)((uint64_t)GPIO_C << 32 | 5),
 +                        test_gpio_output_mode);
 +    qtest_add_data_func("stm32l4x5/gpio/test_gpioh3_output_mode",
 +                        (void *)((uint64_t)GPIO_H << 32 | 3),
 +                        test_gpio_output_mode);
 +    qtest_add_data_func("stm32l4x5/gpio/test_gpio_input_mode1",
 +                        (void *)((uint64_t)GPIO_D << 32 | 6),
 +                        test_gpio_input_mode);
 +    qtest_add_data_func("stm32l4x5/gpio/test_gpio_input_mode2",
 +                        (void *)((uint64_t)GPIO_C << 32 | 10),
 +                        test_gpio_input_mode);
 +    qtest_add_data_func("stm32l4x5/gpio/test_gpio_pull_up_pull_down1",
 +                        (void *)((uint64_t)GPIO_B << 32 | 5),
 +                        test_pull_up_pull_down);
 +    qtest_add_data_func("stm32l4x5/gpio/test_gpio_pull_up_pull_down2",
 +                        (void *)((uint64_t)GPIO_F << 32 | 1),
 +                        test_pull_up_pull_down);
 +    qtest_add_data_func("stm32l4x5/gpio/test_gpio_push_pull1",
 +                        (void *)((uint64_t)GPIO_G << 32 | 6),
 +                        test_push_pull);
 +    qtest_add_data_func("stm32l4x5/gpio/test_gpio_push_pull2",
 +                        (void *)((uint64_t)GPIO_H << 32 | 3),
 +                        test_push_pull);
 +    qtest_add_data_func("stm32l4x5/gpio/test_gpio_open_drain1",
 +                        (void *)((uint64_t)GPIO_C << 32 | 4),
 +                        test_open_drain);
 +    qtest_add_data_func("stm32l4x5/gpio/test_gpio_open_drain2",
 +                        (void *)((uint64_t)GPIO_E << 32 | 11),
 +                        test_open_drain);
 +    qtest_add_data_func("stm32l4x5/gpio/test_bsrr_brr1",
 +                        (void *)((uint64_t)GPIO_A << 32 | 12),
 +                        test_bsrr_brr);
 +    qtest_add_data_func("stm32l4x5/gpio/test_bsrr_brr2",
 +                        (void *)((uint64_t)GPIO_D << 32 | 0),
 +                        test_bsrr_brr);
 +
 +    qtest_start("-machine b-l475e-iot01a");
 +    ret = g_test_run();
 +    qtest_end();
 +
 +    return ret;
 +}
 diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
 index XXXXXXX..XXXXXXX 100644
---- a/hw/sd/sdhci-internal.h
+--- a/tests/qtest/meson.build
-+++ b/hw/sd/sdhci-internal.h
++++ b/tests/qtest/meson.build
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ qtests_aspeed = \
- #define SDHC_CMD_INHIBIT               0x00000001
+ qtests_stm32l4x5 = \
- #define SDHC_DATA_INHIBIT              0x00000002
+   ['stm32l4x5_exti-test',
- #define SDHC_DAT_LINE_ACTIVE           0x00000004
+    'stm32l4x5_syscfg-test',
-+#define SDHC_IMX_CLOCK_GATE_OFF        0x00000080
+-   'stm32l4x5_rcc-test']
- #define SDHC_DOING_WRITE               0x00000100
++   'stm32l4x5_rcc-test',
- #define SDHC_DOING_READ                0x00000200
++   'stm32l4x5_gpio-test']
- #define SDHC_SPACE_AVAILABLE           0x00000400
-@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
+ qtests_arm = \
+   (config_all_devices.has_key('CONFIG_MPS2') ? ['sse-timer-test'] : []) + \
  #define ESDHC_MIX_CTRL                  0x48
 +
  #define ESDHC_VENDOR_SPEC               0xc0
 +#define ESDHC_IMX_FRC_SDCLK_ON          (1 << 8)
 +
  #define ESDHC_DLL_CTRL                  0x60
  #define ESDHC_TUNING_CTRL               0xcc
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
  #define DEFINE_SDHCI_COMMON_PROPERTIES(_state) \
      DEFINE_PROP_UINT8("sd-spec-version", _state, sd_spec_version, 2), \
      DEFINE_PROP_UINT8("uhs", _state, uhs_mode, UHS_NOT_SUPPORTED), \
 +    DEFINE_PROP_UINT8("vendor", _state, vendor, SDHCI_VENDOR_NONE), \
      \
      /* Capabilities registers provide information on supported
       * features of this specific host controller implementation */ \
 diff --git a/include/hw/sd/sdhci.h b/include/hw/sd/sdhci.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/sd/sdhci.h
 +++ b/include/hw/sd/sdhci.h
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
      uint16_t acmd12errsts; /* Auto CMD12 error status register */
      uint16_t hostctl2;     /* Host Control 2 */
      uint64_t admasysaddr;  /* ADMA System Address Register */
 +    uint16_t vendor_spec;  /* Vendor specific register */
      /* Read-only registers */
      uint64_t capareg;      /* Capabilities Register */
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
      uint32_t quirks;
      uint8_t sd_spec_version;
      uint8_t uhs_mode;
 +    uint8_t vendor;        /* For vendor specific functionality */
  } SDHCIState;
 +#define SDHCI_VENDOR_NONE       0
 +#define SDHCI_VENDOR_IMX        1
 +
  /*
   * Controller does not provide transfer-complete interrupt when not
   * busy.
 diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/sd/sdhci.c
 +++ b/hw/sd/sdhci.c
@@ -XXX,XX +XXX,XX @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
          }
          break;
 +    case ESDHC_VENDOR_SPEC:
 +        ret = s->vendor_spec;
 +        break;
      case ESDHC_DLL_CTRL:
      case ESDHC_TUNE_CTRL_STATUS:
      case ESDHC_UNDOCUMENTED_REG27:
      case ESDHC_TUNING_CTRL:
 -    case ESDHC_VENDOR_SPEC:
      case ESDHC_MIX_CTRL:
      case ESDHC_WTMK_LVL:
          ret = 0;
@@ -XXX,XX +XXX,XX @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
      case ESDHC_UNDOCUMENTED_REG27:
      case ESDHC_TUNING_CTRL:
      case ESDHC_WTMK_LVL:
 +        break;
 +
      case ESDHC_VENDOR_SPEC:
 +        s->vendor_spec = value;
 +        switch (s->vendor) {
 +        case SDHCI_VENDOR_IMX:
 +            if (value & ESDHC_IMX_FRC_SDCLK_ON) {
 +                s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
 +            } else {
 +                s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
 +            }
 +            break;
 +        default:
 +            break;
 +        }
          break;
      case SDHC_HOSTCTL:
 --
-.20.1
+.34.1

-[PULL 20/23] target/arm/cpu: adjust virtual time for all KVM arm cpus
+[PULL 12/14] target/arm: Fix 32-bit SMOPA
-From: fangying <fangying1@huawei.com>
+From: Richard Henderson <richard.henderson@linaro.org>
-Virtual time adjustment was implemented for virt-5.0 machine type,
+While the 8-bit input elements are sequential in the input vector,
-but the cpu property was enabled only for host-passthrough and max
+the 32-bit output elements are not sequential in the output matrix.
-cpu model.  Let's add it for any KVM arm cpu which has the generic
+Do not attempt to compute 2 32-bit outputs at the same time.
-timer feature enabled.
+Cc: qemu-stable@nongnu.org
-Signed-off-by: Ying Fang <fangying1@huawei.com>
+Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product")
-Reviewed-by: Andrew Jones <drjones@redhat.com>
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083
-Message-id: 20200608121243.2076-1-fangying1@huawei.com
+Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-[PMM: minor commit message tweak, removed inaccurate
+Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
- suggested-by tag]
+Message-id: 20240305163931.242795-1-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/cpu.c   |  6 ++++--
+ target/arm/tcg/sme_helper.c       | 77 ++++++++++++++++++-------------
- target/arm/cpu64.c |  1 -
+ tests/tcg/aarch64/sme-smopa-1.c   | 47 +++++++++++++++++++
- target/arm/kvm.c   | 21 +++++++++++----------
+ tests/tcg/aarch64/sme-smopa-2.c   | 54 ++++++++++++++++++++++
-files changed, 15 insertions(+), 13 deletions(-)
+ tests/tcg/aarch64/Makefile.target |  2 +-
+files changed, 147 insertions(+), 33 deletions(-)
-diff --git a/target/arm/cpu.c b/target/arm/cpu.c
+ create mode 100644 tests/tcg/aarch64/sme-smopa-1.c
  create mode 100644 tests/tcg/aarch64/sme-smopa-2.c
 diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/target/arm/tcg/sme_helper.c
-+++ b/target/arm/cpu.c
++++ b/target/arm/tcg/sme_helper.c
-@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn,
      if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
          qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
      }
-+
-+    if (kvm_enabled()) {
-+        kvm_arm_add_vcpu_properties(obj);
-+    }
  }
- static void arm_cpu_finalizefn(Object *obj)
+-typedef uint64_t IMOPFn(uint64_t, uint64_t, uint64_t, uint8_t, bool);
-@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
++typedef uint32_t IMOPFn32(uint32_t, uint32_t, uint32_t, uint8_t, bool);
++static inline void do_imopa_s(uint32_t *za, uint32_t *zn, uint32_t *zm,
-     if (kvm_enabled()) {
++                              uint8_t *pn, uint8_t *pm,
-         kvm_arm_set_cpu_features_from_host(cpu);
++                              uint32_t desc, IMOPFn32 *fn)
--        kvm_arm_add_vcpu_properties(obj);
++{
-     } else {
++    intptr_t row, col, oprsz = simd_oprsz(desc) / 4;
-         cortex_a15_initfn(obj);
++    bool neg = simd_data(desc);
-@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
+-static inline void do_imopa(uint64_t *za, uint64_t *zn, uint64_t *zm,
-     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
+-                            uint8_t *pn, uint8_t *pm,
-         aarch64_add_sve_properties(obj);
+-                            uint32_t desc, IMOPFn *fn)
-     }
++    for (row = 0; row < oprsz; ++row) {
--    kvm_arm_add_vcpu_properties(obj);
++        uint8_t pa = (pn[H1(row >> 1)] >> ((row & 1) * 4)) & 0xf;
-     arm_cpu_post_init(obj);
++        uint32_t *za_row = &za[tile_vslice_index(row)];
 +        uint32_t n = zn[H4(row)];
 +
 +        for (col = 0; col < oprsz; ++col) {
 +            uint8_t pb = pm[H1(col >> 1)] >> ((col & 1) * 4);
 +            uint32_t *a = &za_row[H4(col)];
 +
 +            *a = fn(n, zm[H4(col)], *a, pa & pb, neg);
 +        }
 +    }
 +}
 +
 +typedef uint64_t IMOPFn64(uint64_t, uint64_t, uint64_t, uint8_t, bool);
 +static inline void do_imopa_d(uint64_t *za, uint64_t *zn, uint64_t *zm,
 +                              uint8_t *pn, uint8_t *pm,
 +                              uint32_t desc, IMOPFn64 *fn)
  {
      intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
      bool neg = simd_data(desc);
@@ -XXX,XX +XXX,XX @@ static inline void do_imopa(uint64_t *za, uint64_t *zn, uint64_t *zm,
  }
-diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
+ #define DEF_IMOP_32(NAME, NTYPE, MTYPE) \
 -static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \
 +static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \
  {                                                                           \
 -    uint32_t sum0 = 0, sum1 = 0;                                            \
 +    uint32_t sum = 0;                                                       \
      /* Apply P to N as a mask, making the inactive elements 0. */           \
      n &= expand_pred_b(p);                                                  \
 -    sum0 += (NTYPE)(n >> 0) * (MTYPE)(m >> 0);                              \
 -    sum0 += (NTYPE)(n >> 8) * (MTYPE)(m >> 8);                              \
 -    sum0 += (NTYPE)(n >> 16) * (MTYPE)(m >> 16);                            \
 -    sum0 += (NTYPE)(n >> 24) * (MTYPE)(m >> 24);                            \
 -    sum1 += (NTYPE)(n >> 32) * (MTYPE)(m >> 32);                            \
 -    sum1 += (NTYPE)(n >> 40) * (MTYPE)(m >> 40);                            \
 -    sum1 += (NTYPE)(n >> 48) * (MTYPE)(m >> 48);                            \
 -    sum1 += (NTYPE)(n >> 56) * (MTYPE)(m >> 56);                            \
 -    if (neg) {                                                              \
 -        sum0 = (uint32_t)a - sum0, sum1 = (uint32_t)(a >> 32) - sum1;       \
 -    } else {                                                                \
 -        sum0 = (uint32_t)a + sum0, sum1 = (uint32_t)(a >> 32) + sum1;       \
 -    }                                                                       \
 -    return ((uint64_t)sum1 << 32) | sum0;                                   \
 +    sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0);                               \
 +    sum += (NTYPE)(n >> 8) * (MTYPE)(m >> 8);                               \
 +    sum += (NTYPE)(n >> 16) * (MTYPE)(m >> 16);                             \
 +    sum += (NTYPE)(n >> 24) * (MTYPE)(m >> 24);                             \
 +    return neg ? a - sum : a + sum;                                         \
  }
  #define DEF_IMOP_64(NAME, NTYPE, MTYPE) \
@@ -XXX,XX +XXX,XX @@ DEF_IMOP_64(umopa_d, uint16_t, uint16_t)
  DEF_IMOP_64(sumopa_d, int16_t, uint16_t)
  DEF_IMOP_64(usmopa_d, uint16_t, int16_t)
 -#define DEF_IMOPH(NAME) \
 -    void HELPER(sme_##NAME)(void *vza, void *vzn, void *vzm, void *vpn,      \
 -                            void *vpm, uint32_t desc)                        \
 -    { do_imopa(vza, vzn, vzm, vpn, vpm, desc, NAME); }
 +#define DEF_IMOPH(NAME, S) \
 +    void HELPER(sme_##NAME##_##S)(void *vza, void *vzn, void *vzm,          \
 +                                  void *vpn, void *vpm, uint32_t desc)      \
 +    { do_imopa_##S(vza, vzn, vzm, vpn, vpm, desc, NAME##_##S); }
 -DEF_IMOPH(smopa_s)
 -DEF_IMOPH(umopa_s)
 -DEF_IMOPH(sumopa_s)
 -DEF_IMOPH(usmopa_s)
 -DEF_IMOPH(smopa_d)
 -DEF_IMOPH(umopa_d)
 -DEF_IMOPH(sumopa_d)
 -DEF_IMOPH(usmopa_d)
 +DEF_IMOPH(smopa, s)
 +DEF_IMOPH(umopa, s)
 +DEF_IMOPH(sumopa, s)
 +DEF_IMOPH(usmopa, s)
 +
 +DEF_IMOPH(smopa, d)
 +DEF_IMOPH(umopa, d)
 +DEF_IMOPH(sumopa, d)
 +DEF_IMOPH(usmopa, d)
 diff --git a/tests/tcg/aarch64/sme-smopa-1.c b/tests/tcg/aarch64/sme-smopa-1.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/tests/tcg/aarch64/sme-smopa-1.c
@@ -XXX,XX +XXX,XX @@
 +#include <stdio.h>
 +#include <string.h>
 +
 +int main()
 +{
 +    static const int cmp[4][4] = {
 +        {  110,  134,  158,  182 },
 +        {  390,  478,  566,  654 },
 +        {  670,  822,  974, 1126 },
 +        {  950, 1166, 1382, 1598 }
 +    };
 +    int dst[4][4];
 +    int *tmp = &dst[0][0];
 +
 +    asm volatile(
 +        ".arch armv8-r+sme\n\t"
 +        "smstart\n\t"
 +        "index z0.b, #0, #1\n\t"
 +        "movprfx z1, z0\n\t"
 +        "add z1.b, z1.b, #16\n\t"
 +        "ptrue p0.b\n\t"
 +        "smopa za0.s, p0/m, p0/m, z0.b, z1.b\n\t"
 +        "ptrue p0.s, vl4\n\t"
 +        "mov w12, #0\n\t"
 +        "st1w { za0h.s[w12, #0] }, p0, [%0]\n\t"
 +        "add %0, %0, #16\n\t"
 +        "st1w { za0h.s[w12, #1] }, p0, [%0]\n\t"
 +        "add %0, %0, #16\n\t"
 +        "st1w { za0h.s[w12, #2] }, p0, [%0]\n\t"
 +        "add %0, %0, #16\n\t"
 +        "st1w { za0h.s[w12, #3] }, p0, [%0]\n\t"
 +        "smstop"
 +        : "+r"(tmp) : : "memory");
 +
 +    if (memcmp(cmp, dst, sizeof(dst)) == 0) {
 +        return 0;
 +    }
 +
 +    /* See above for correct results. */
 +    for (int i = 0; i < 4; ++i) {
 +        for (int j = 0; j < 4; ++j) {
 +            printf("%6d", dst[i][j]);
 +        }
 +        printf("\n");
 +    }
 +    return 1;
 +}
 diff --git a/tests/tcg/aarch64/sme-smopa-2.c b/tests/tcg/aarch64/sme-smopa-2.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/tests/tcg/aarch64/sme-smopa-2.c
@@ -XXX,XX +XXX,XX @@
 +#include <stdio.h>
 +#include <string.h>
 +
 +int main()
 +{
 +    static const long cmp[4][4] = {
 +        {  110,  134,  158,  182 },
 +        {  390,  478,  566,  654 },
 +        {  670,  822,  974, 1126 },
 +        {  950, 1166, 1382, 1598 }
 +    };
 +    long dst[4][4];
 +    long *tmp = &dst[0][0];
 +    long svl;
 +
 +    /* Validate that we have a wide enough vector for 4 elements. */
 +    asm(".arch armv8-r+sme-i64\n\trdsvl %0, #1" : "=r"(svl));
 +    if (svl < 32) {
 +        return 0;
 +    }
 +
 +    asm volatile(
 +        "smstart\n\t"
 +        "index z0.h, #0, #1\n\t"
 +        "movprfx z1, z0\n\t"
 +        "add z1.h, z1.h, #16\n\t"
 +        "ptrue p0.b\n\t"
 +        "smopa za0.d, p0/m, p0/m, z0.h, z1.h\n\t"
 +        "ptrue p0.d, vl4\n\t"
 +        "mov w12, #0\n\t"
 +        "st1d { za0h.d[w12, #0] }, p0, [%0]\n\t"
 +        "add %0, %0, #32\n\t"
 +        "st1d { za0h.d[w12, #1] }, p0, [%0]\n\t"
 +        "mov w12, #2\n\t"
 +        "add %0, %0, #32\n\t"
 +        "st1d { za0h.d[w12, #0] }, p0, [%0]\n\t"
 +        "add %0, %0, #32\n\t"
 +        "st1d { za0h.d[w12, #1] }, p0, [%0]\n\t"
 +        "smstop"
 +        : "+r"(tmp) : : "memory");
 +
 +    if (memcmp(cmp, dst, sizeof(dst)) == 0) {
 +        return 0;
 +    }
 +
 +    /* See above for correct results. */
 +    for (int i = 0; i < 4; ++i) {
 +        for (int j = 0; j < 4; ++j) {
 +            printf("%6ld", dst[i][j]);
 +        }
 +        printf("\n");
 +    }
 +    return 1;
 +}
 diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu64.c
+--- a/tests/tcg/aarch64/Makefile.target
-+++ b/target/arm/cpu64.c
++++ b/tests/tcg/aarch64/Makefile.target
-@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
+@@ -XXX,XX +XXX,XX @@ endif
-     if (kvm_enabled()) {
+ # SME Tests
-         kvm_arm_set_cpu_features_from_host(cpu);
+ ifneq ($(CROSS_AS_HAS_ARMV9_SME),)
--        kvm_arm_add_vcpu_properties(obj);
+-AARCH64_TESTS += sme-outprod1
-     } else {
++AARCH64_TESTS += sme-outprod1 sme-smopa-1 sme-smopa-2
-         uint64_t t;
+ endif
-         uint32_t u;
-diff --git a/target/arm/kvm.c b/target/arm/kvm.c
+ # System Registers Tests
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/kvm.c
 +++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
  /* KVM VCPU properties should be prefixed with "kvm-". */
  void kvm_arm_add_vcpu_properties(Object *obj)
  {
 -    if (!kvm_enabled()) {
 -        return;
 -    }
 +    ARMCPU *cpu = ARM_CPU(obj);
 +    CPUARMState *env = &cpu->env;
 -    ARM_CPU(obj)->kvm_adjvtime = true;
 -    object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
 -                             kvm_no_adjvtime_set);
 -    object_property_set_description(obj, "kvm-no-adjvtime",
 -                                    "Set on to disable the adjustment of "
 -                                    "the virtual counter. VM stopped time "
 -                                    "will be counted.");
 +    if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
 +        cpu->kvm_adjvtime = true;
 +        object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
 +                                 kvm_no_adjvtime_set);
 +        object_property_set_description(obj, "kvm-no-adjvtime",
 +                                        "Set on to disable the adjustment of "
 +                                        "the virtual counter. VM stopped time "
 +                                        "will be counted.");
 +    }
  }
  bool kvm_arm_pmu_supported(CPUState *cpu)
 --
-.20.1
+.34.1
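
The commit message of the 32-bit SMOPA fix ([PULL 12/14]) says that each 32-bit output element has to be computed on its own. The following is a minimal sketch of that per-element step, not the QEMU helper itself. The name `smopa_s_element` is hypothetical, and the predicate masking that the real helper applies through `expand_pred_b()` is left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone model of one signed 32-bit SMOPA element:
 * n and m each hold four packed signed 8-bit inputs, a is the current
 * accumulator value taken from the ZA tile. Predication is omitted.
 */
static uint32_t smopa_s_element(uint32_t n, uint32_t m, uint32_t a, bool neg)
{
    uint32_t sum = 0;

    /* Sum of four 8-bit x 8-bit products, one per byte lane. */
    for (int i = 0; i < 4; i++) {
        sum += (int8_t)(n >> (8 * i)) * (int8_t)(m >> (8 * i));
    }

    /* SMOPS subtracts the sum from the accumulator, SMOPA adds it. */
    return neg ? a - sum : a + sum;
}
```

With the inputs used by sme-smopa-1.c (bytes 0..3 of z0 packed as 0x03020100 and bytes 16..19 of z1 packed as 0x13121110), this sketch gives 110, which matches the first entry of the `cmp` table in that test.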

-[PULL 23/23] hw: arm: Set vendor property for IMX SDHCI emulations
+[PULL 13/14] hw/rtc/sun4v-rtc: Relicense to GPLv2-or-later
-From: Guenter Roeck <linux@roeck-us.net>
+The sun4v RTC device model added under commit a0e893039cf2ce0 in 2016
 was unfortunately added with a license of GPL-v3-or-later, which is
 not compatible with other QEMU code which has a GPL-v2-only license.
-Set vendor property to IMX to enable IMX specific functionality
+Relicense the code in the .c and the .h file to GPL-v2-or-later,
-in sdhci code.
+to make it compatible with the rest of QEMU.
-Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Cc: qemu-stable@nongnu.org
-Signed-off-by: Guenter Roeck <linux@roeck-us.net>
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Signed-off-by: Paolo Bonzini (for Red Hat) <pbonzini@redhat.com>
-Message-id: 20200603145258.195920-3-linux@roeck-us.net
+Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
 Signed-off-by: Markus Armbruster <armbru@redhat.com>
 Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
 Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
 Acked-by: Alex Bennée <alex.bennee@linaro.org>
 Message-id: 20240223161300.938542-1-peter.maydell@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/arm/fsl-imx25.c  | 6 ++++++
+ include/hw/rtc/sun4v-rtc.h | 2 +-
- hw/arm/fsl-imx6.c   | 6 ++++++
+ hw/rtc/sun4v-rtc.c         | 2 +-
- hw/arm/fsl-imx6ul.c | 2 ++
+files changed, 2 insertions(+), 2 deletions(-)
  hw/arm/fsl-imx7.c   | 2 ++
 files changed, 16 insertions(+)
-diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
+diff --git a/include/hw/rtc/sun4v-rtc.h b/include/hw/rtc/sun4v-rtc.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/fsl-imx25.c
+--- a/include/hw/rtc/sun4v-rtc.h
-+++ b/hw/arm/fsl-imx25.c
++++ b/include/hw/rtc/sun4v-rtc.h
-@@ -XXX,XX +XXX,XX @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@
-                                  &err);
+  *
-         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX25_ESDHC_CAPABILITIES,
+  * Copyright (c) 2016 Artyom Tarasenko
-                                  "capareg", &err);
+  *
-+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+- * This code is licensed under the GNU GPL v3 or (at your option) any later
-+                                 "vendor", &err);
++ * This code is licensed under the GNU GPL v2 or (at your option) any later
-+        if (err) {
+  * version.
-+            error_propagate(errp, err);
+  */
-+            return;
-+        }
+diff --git a/hw/rtc/sun4v-rtc.c b/hw/rtc/sun4v-rtc.c
          object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
          if (err) {
              error_propagate(errp, err);
 diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/arm/fsl-imx6.c
+--- a/hw/rtc/sun4v-rtc.c
-+++ b/hw/arm/fsl-imx6.c
++++ b/hw/rtc/sun4v-rtc.c
-@@ -XXX,XX +XXX,XX @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@
-                                  &err);
+  *
-         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX6_ESDHC_CAPABILITIES,
+  * Copyright (c) 2016 Artyom Tarasenko
-                                  "capareg", &err);
+  *
-+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+- * This code is licensed under the GNU GPL v3 or (at your option) any later
-+                                 "vendor", &err);
++ * This code is licensed under the GNU GPL v2 or (at your option) any later
-+        if (err) {
+  * version.
-+            error_propagate(errp, err);
+  */
 +            return;
 +        }
          object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
          if (err) {
              error_propagate(errp, err);
 diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/fsl-imx6ul.c
 +++ b/hw/arm/fsl-imx6ul.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
              FSL_IMX6UL_USDHC2_IRQ,
          };
 +        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
 +                                        "vendor", &error_abort);
          object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                   &error_abort);
 diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/arm/fsl-imx7.c
 +++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
              FSL_IMX7_USDHC3_IRQ,
          };
 +        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
 +                                 "vendor", &error_abort);
          object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                   &error_abort);
 --
-.20.1
+.34.1

-[PULL 21/23] hw/net/imx_fec: Convert debug fprintf() to trace events
+[PULL 14/14] target/arm: Move v7m-related code from cpu32.c into a separate file
-From: Jean-Christophe Dubois <jcd@tribudubois.net>
+From: Thomas Huth <thuth@redhat.com>
-Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
+Move the code to a separate file so that we do not have to compile
-Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+it anymore if CONFIG_ARM_V7M is not set.
-Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-[PMD: Fixed 32-bit format string using PRIx32/PRIx64]
+Signed-off-by: Thomas Huth <thuth@redhat.com>
-Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20240308141051.536599-2-thuth@redhat.com
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/net/imx_fec.c    | 106 +++++++++++++++++++-------------------------
+ target/arm/tcg/cpu-v7m.c   | 290 +++++++++++++++++++++++++++++++++++++
- hw/net/trace-events |  18 ++++++++
+ target/arm/tcg/cpu32.c     | 261 ---------------------------------
-files changed, 63 insertions(+), 61 deletions(-)
+ target/arm/meson.build     |   3 +
  target/arm/tcg/meson.build |   3 +
 files changed, 296 insertions(+), 261 deletions(-)
  create mode 100644 target/arm/tcg/cpu-v7m.c
-diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
+diff --git a/target/arm/tcg/cpu-v7m.c b/target/arm/tcg/cpu-v7m.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/tcg/cpu-v7m.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * QEMU ARMv7-M TCG-only CPUs.
 + *
 + * Copyright (c) 2012 SUSE LINUX Products GmbH
 + *
 + * This code is licensed under the GNU GPL v2 or later.
 + *
 + * SPDX-License-Identifier: GPL-2.0-or-later
 + */
 +
 +#include "qemu/osdep.h"
 +#include "cpu.h"
 +#include "hw/core/tcg-cpu-ops.h"
 +#include "internals.h"
 +
 +#if !defined(CONFIG_USER_ONLY)
 +
 +#include "hw/intc/armv7m_nvic.h"
 +
 +static bool arm_v7m_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 +{
 +    CPUClass *cc = CPU_GET_CLASS(cs);
 +    ARMCPU *cpu = ARM_CPU(cs);
 +    CPUARMState *env = &cpu->env;
 +    bool ret = false;
 +
 +    /*
 +     * ARMv7-M interrupt masking works differently than -A or -R.
 +     * There is no FIQ/IRQ distinction. Instead of I and F bits
 +     * masking FIQ and IRQ interrupts, an exception is taken only
 +     * if it is higher priority than the current execution priority
 +     * (which depends on state like BASEPRI, FAULTMASK and the
 +     * currently active exception).
 +     */
 +    if (interrupt_request & CPU_INTERRUPT_HARD
 +        && (armv7m_nvic_can_take_pending_exception(env->nvic))) {
 +        cs->exception_index = EXCP_IRQ;
 +        cc->tcg_ops->do_interrupt(cs);
 +        ret = true;
 +    }
 +    return ret;
 +}
 +
 +#endif /* !CONFIG_USER_ONLY */
 +
 +static void cortex_m0_initfn(Object *obj)
 +{
 +    ARMCPU *cpu = ARM_CPU(obj);
 +    set_feature(&cpu->env, ARM_FEATURE_V6);
 +    set_feature(&cpu->env, ARM_FEATURE_M);
 +
 +    cpu->midr = 0x410cc200;
 +
 +    /*
 +     * These ID register values are not guest visible, because
 +     * we do not implement the Main Extension. They must be set
 +     * to values corresponding to the Cortex-M0's implemented
 +     * features, because QEMU generally controls its emulation
 +     * by looking at ID register fields. We use the same values as
 +     * for the M3.
 +     */
 +    cpu->isar.id_pfr0 = 0x00000030;
 +    cpu->isar.id_pfr1 = 0x00000200;
 +    cpu->isar.id_dfr0 = 0x00100000;
 +    cpu->id_afr0 = 0x00000000;
 +    cpu->isar.id_mmfr0 = 0x00000030;
 +    cpu->isar.id_mmfr1 = 0x00000000;
 +    cpu->isar.id_mmfr2 = 0x00000000;
 +    cpu->isar.id_mmfr3 = 0x00000000;
 +    cpu->isar.id_isar0 = 0x01141110;
 +    cpu->isar.id_isar1 = 0x02111000;
 +    cpu->isar.id_isar2 = 0x21112231;
 +    cpu->isar.id_isar3 = 0x01111110;
 +    cpu->isar.id_isar4 = 0x01310102;
 +    cpu->isar.id_isar5 = 0x00000000;
 +    cpu->isar.id_isar6 = 0x00000000;
 +}
 +
 +static void cortex_m3_initfn(Object *obj)
 +{
 +    ARMCPU *cpu = ARM_CPU(obj);
 +    set_feature(&cpu->env, ARM_FEATURE_V7);
 +    set_feature(&cpu->env, ARM_FEATURE_M);
 +    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
 +    cpu->midr = 0x410fc231;
 +    cpu->pmsav7_dregion = 8;
 +    cpu->isar.id_pfr0 = 0x00000030;
 +    cpu->isar.id_pfr1 = 0x00000200;
 +    cpu->isar.id_dfr0 = 0x00100000;
 +    cpu->id_afr0 = 0x00000000;
 +    cpu->isar.id_mmfr0 = 0x00000030;
 +    cpu->isar.id_mmfr1 = 0x00000000;
 +    cpu->isar.id_mmfr2 = 0x00000000;
 +    cpu->isar.id_mmfr3 = 0x00000000;
 +    cpu->isar.id_isar0 = 0x01141110;
 +    cpu->isar.id_isar1 = 0x02111000;
 +    cpu->isar.id_isar2 = 0x21112231;
 +    cpu->isar.id_isar3 = 0x01111110;
 +    cpu->isar.id_isar4 = 0x01310102;
 +    cpu->isar.id_isar5 = 0x00000000;
 +    cpu->isar.id_isar6 = 0x00000000;
 +}
 +
 +static void cortex_m4_initfn(Object *obj)
 +{
 +    ARMCPU *cpu = ARM_CPU(obj);
 +
 +    set_feature(&cpu->env, ARM_FEATURE_V7);
 +    set_feature(&cpu->env, ARM_FEATURE_M);
 +    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
 +    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
 +    cpu->midr = 0x410fc240; /* r0p0 */
 +    cpu->pmsav7_dregion = 8;
 +    cpu->isar.mvfr0 = 0x10110021;
 +    cpu->isar.mvfr1 = 0x11000011;
 +    cpu->isar.mvfr2 = 0x00000000;
 +    cpu->isar.id_pfr0 = 0x00000030;
 +    cpu->isar.id_pfr1 = 0x00000200;
 +    cpu->isar.id_dfr0 = 0x00100000;
 +    cpu->id_afr0 = 0x00000000;
 +    cpu->isar.id_mmfr0 = 0x00000030;
 +    cpu->isar.id_mmfr1 = 0x00000000;
 +    cpu->isar.id_mmfr2 = 0x00000000;
 +    cpu->isar.id_mmfr3 = 0x00000000;
 +    cpu->isar.id_isar0 = 0x01141110;
 +    cpu->isar.id_isar1 = 0x02111000;
 +    cpu->isar.id_isar2 = 0x21112231;
 +    cpu->isar.id_isar3 = 0x01111110;
 +    cpu->isar.id_isar4 = 0x01310102;
 +    cpu->isar.id_isar5 = 0x00000000;
 +    cpu->isar.id_isar6 = 0x00000000;
 +}
 +
 +static void cortex_m7_initfn(Object *obj)
 +{
 +    ARMCPU *cpu = ARM_CPU(obj);
 +
 +    set_feature(&cpu->env, ARM_FEATURE_V7);
 +    set_feature(&cpu->env, ARM_FEATURE_M);
 +    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
 +    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
 +    cpu->midr = 0x411fc272; /* r1p2 */
 +    cpu->pmsav7_dregion = 8;
 +    cpu->isar.mvfr0 = 0x10110221;
 +    cpu->isar.mvfr1 = 0x12000011;
 +    cpu->isar.mvfr2 = 0x00000040;
 +    cpu->isar.id_pfr0 = 0x00000030;
 +    cpu->isar.id_pfr1 = 0x00000200;
 +    cpu->isar.id_dfr0 = 0x00100000;
 +    cpu->id_afr0 = 0x00000000;
 +    cpu->isar.id_mmfr0 = 0x00100030;
 +    cpu->isar.id_mmfr1 = 0x00000000;
 +    cpu->isar.id_mmfr2 = 0x01000000;
 +    cpu->isar.id_mmfr3 = 0x00000000;
 +    cpu->isar.id_isar0 = 0x01101110;
 +    cpu->isar.id_isar1 = 0x02112000;
 +    cpu->isar.id_isar2 = 0x20232231;
 +    cpu->isar.id_isar3 = 0x01111131;
 +    cpu->isar.id_isar4 = 0x01310132;
 +    cpu->isar.id_isar5 = 0x00000000;
 +    cpu->isar.id_isar6 = 0x00000000;
 +}
 +
 +static void cortex_m33_initfn(Object *obj)
 +{
 +    ARMCPU *cpu = ARM_CPU(obj);
 +
 +    set_feature(&cpu->env, ARM_FEATURE_V8);
 +    set_feature(&cpu->env, ARM_FEATURE_M);
 +    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
 +    set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
 +    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
 +    cpu->midr = 0x410fd213; /* r0p3 */
 +    cpu->pmsav7_dregion = 16;
 +    cpu->sau_sregion = 8;
 +    cpu->isar.mvfr0 = 0x10110021;
 +    cpu->isar.mvfr1 = 0x11000011;
 +    cpu->isar.mvfr2 = 0x00000040;
 +    cpu->isar.id_pfr0 = 0x00000030;
 +    cpu->isar.id_pfr1 = 0x00000210;
 +    cpu->isar.id_dfr0 = 0x00200000;
 +    cpu->id_afr0 = 0x00000000;
 +    cpu->isar.id_mmfr0 = 0x00101F40;
 +    cpu->isar.id_mmfr1 = 0x00000000;
 +    cpu->isar.id_mmfr2 = 0x01000000;
 +    cpu->isar.id_mmfr3 = 0x00000000;
 +    cpu->isar.id_isar0 = 0x01101110;
 +    cpu->isar.id_isar1 = 0x02212000;
 +    cpu->isar.id_isar2 = 0x20232232;
 +    cpu->isar.id_isar3 = 0x01111131;
 +    cpu->isar.id_isar4 = 0x01310132;
 +    cpu->isar.id_isar5 = 0x00000000;
 +    cpu->isar.id_isar6 = 0x00000000;
 +    cpu->clidr = 0x00000000;
 +    cpu->ctr = 0x8000c000;
 +}
 +
 +static void cortex_m55_initfn(Object *obj)
 +{
 +    ARMCPU *cpu = ARM_CPU(obj);
 +
 +    set_feature(&cpu->env, ARM_FEATURE_V8);
 +    set_feature(&cpu->env, ARM_FEATURE_V8_1M);
 +    set_feature(&cpu->env, ARM_FEATURE_M);
 +    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
 +    set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
 +    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
 +    cpu->midr = 0x410fd221; /* r0p1 */
 +    cpu->revidr = 0;
 +    cpu->pmsav7_dregion = 16;
 +    cpu->sau_sregion = 8;
 +    /* These are the MVFR* values for the FPU + full MVE configuration */
 +    cpu->isar.mvfr0 = 0x10110221;
 +    cpu->isar.mvfr1 = 0x12100211;
 +    cpu->isar.mvfr2 = 0x00000040;
 +    cpu->isar.id_pfr0 = 0x20000030;
 +    cpu->isar.id_pfr1 = 0x00000230;
 +    cpu->isar.id_dfr0 = 0x10200000;
 +    cpu->id_afr0 = 0x00000000;
 +    cpu->isar.id_mmfr0 = 0x00111040;
 +    cpu->isar.id_mmfr1 = 0x00000000;
 +    cpu->isar.id_mmfr2 = 0x01000000;
 +    cpu->isar.id_mmfr3 = 0x00000011;
 +    cpu->isar.id_isar0 = 0x01103110;
 +    cpu->isar.id_isar1 = 0x02212000;
 +    cpu->isar.id_isar2 = 0x20232232;
 +    cpu->isar.id_isar3 = 0x01111131;
 +    cpu->isar.id_isar4 = 0x01310132;
 +    cpu->isar.id_isar5 = 0x00000000;
 +    cpu->isar.id_isar6 = 0x00000000;
 +    cpu->clidr = 0x00000000; /* caches not implemented */
 +    cpu->ctr = 0x8303c003;
 +}
 +
 +static const TCGCPUOps arm_v7m_tcg_ops = {
 +    .initialize = arm_translate_init,
 +    .synchronize_from_tb = arm_cpu_synchronize_from_tb,
 +    .debug_excp_handler = arm_debug_excp_handler,
 +    .restore_state_to_opc = arm_restore_state_to_opc,
 +
 +#ifdef CONFIG_USER_ONLY
 +    .record_sigsegv = arm_cpu_record_sigsegv,
 +    .record_sigbus = arm_cpu_record_sigbus,
 +#else
 +    .tlb_fill = arm_cpu_tlb_fill,
 +    .cpu_exec_interrupt = arm_v7m_cpu_exec_interrupt,
 +    .do_interrupt = arm_v7m_cpu_do_interrupt,
 +    .do_transaction_failed = arm_cpu_do_transaction_failed,
 +    .do_unaligned_access = arm_cpu_do_unaligned_access,
 +    .adjust_watchpoint_address = arm_adjust_watchpoint_address,
 +    .debug_check_watchpoint = arm_debug_check_watchpoint,
 +    .debug_check_breakpoint = arm_debug_check_breakpoint,
 +#endif /* !CONFIG_USER_ONLY */
 +};
 +
 +static void arm_v7m_class_init(ObjectClass *oc, void *data)
 +{
 +    ARMCPUClass *acc = ARM_CPU_CLASS(oc);
 +    CPUClass *cc = CPU_CLASS(oc);
 +
 +    acc->info = data;
 +    cc->tcg_ops = &arm_v7m_tcg_ops;
 +    cc->gdb_core_xml_file = "arm-m-profile.xml";
 +}
 +
 +static const ARMCPUInfo arm_v7m_cpus[] = {
 +    { .name = "cortex-m0",   .initfn = cortex_m0_initfn,
 +                             .class_init = arm_v7m_class_init },
 +    { .name = "cortex-m3",   .initfn = cortex_m3_initfn,
 +                             .class_init = arm_v7m_class_init },
 +    { .name = "cortex-m4",   .initfn = cortex_m4_initfn,
 +                             .class_init = arm_v7m_class_init },
 +    { .name = "cortex-m7",   .initfn = cortex_m7_initfn,
 +                             .class_init = arm_v7m_class_init },
 +    { .name = "cortex-m33",  .initfn = cortex_m33_initfn,
 +                             .class_init = arm_v7m_class_init },
 +    { .name = "cortex-m55",  .initfn = cortex_m55_initfn,
 +                             .class_init = arm_v7m_class_init },
 +};
 +
 +static void arm_v7m_cpu_register_types(void)
 +{
 +    size_t i;
 +
 +    for (i = 0; i < ARRAY_SIZE(arm_v7m_cpus); ++i) {
 +        arm_cpu_register(&arm_v7m_cpus[i]);
 +    }
 +}
 +
 +type_init(arm_v7m_cpu_register_types)
 diff --git a/target/arm/tcg/cpu32.c b/target/arm/tcg/cpu32.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/imx_fec.c
+--- a/target/arm/tcg/cpu32.c
-+++ b/hw/net/imx_fec.c
++++ b/target/arm/tcg/cpu32.c
 @@ -XXX,XX +XXX,XX @@
- #include "qemu/module.h"
+ #include "hw/boards.h"
- #include "net/checksum.h"
+ #endif
- #include "net/eth.h"
+ #include "cpregs.h"
-+#include "trace.h"
+-#if !defined(CONFIG_USER_ONLY) && defined(CONFIG_TCG)
+-#include "hw/intc/armv7m_nvic.h"
  /* For crc32 */
  #include <zlib.h>
 -#ifndef DEBUG_IMX_FEC
 -#define DEBUG_IMX_FEC 0
 -#endif
--
--#define FEC_PRINTF(fmt, args...) \
--    do { \
+ /* Share AArch32 -cpu max features with AArch64. */
--        if (DEBUG_IMX_FEC) { \
+@@ -XXX,XX +XXX,XX @@ void aa32_max_features(ARMCPU *cpu)
--            fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_FEC, \
+ /* CPU models. These are not needed for the AArch64 linux-user build. */
--                                             __func__, ##args); \
+ #if !defined(CONFIG_USER_ONLY) || !defined(TARGET_AARCH64)
--        } \
--    } while (0)
+-#if !defined(CONFIG_USER_ONLY)
--
+-static bool arm_v7m_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
--#ifndef DEBUG_IMX_PHY
+-{
--#define DEBUG_IMX_PHY 0
+-    CPUClass *cc = CPU_GET_CLASS(cs);
--#endif
+-    ARMCPU *cpu = ARM_CPU(cs);
--
+-    CPUARMState *env = &cpu->env;
--#define PHY_PRINTF(fmt, args...) \
+-    bool ret = false;
--    do { \
+-
--        if (DEBUG_IMX_PHY) { \
+-    /*
--            fprintf(stderr, "[%s.phy]%s: " fmt , TYPE_IMX_FEC, \
+-     * ARMv7-M interrupt masking works differently than -A or -R.
--                                                 __func__, ##args); \
+-     * There is no FIQ/IRQ distinction. Instead of I and F bits
--        } \
+-     * masking FIQ and IRQ interrupts, an exception is taken only
--    } while (0)
+-     * if it is higher priority than the current execution priority
--
+-     * (which depends on state like BASEPRI, FAULTMASK and the
- #define IMX_MAX_DESC    1024
+-     * currently active exception).
+-     */
- static const char *imx_default_reg_name(IMXFECState *s, uint32_t index)
+-    if (interrupt_request & CPU_INTERRUPT_HARD
-@@ -XXX,XX +XXX,XX @@ static void imx_eth_update(IMXFECState *s);
+-        && (armv7m_nvic_can_take_pending_exception(env->nvic))) {
-  * For now we don't handle any GPIO/interrupt line, so the OS will
+-        cs->exception_index = EXCP_IRQ;
-  * have to poll for the PHY status.
+-        cc->tcg_ops->do_interrupt(cs);
-  */
+-        ret = true;
--static void phy_update_irq(IMXFECState *s)
+-    }
-+static void imx_phy_update_irq(IMXFECState *s)
+-    return ret;
 -}
 -#endif /* !CONFIG_USER_ONLY */
 -
  static void arm926_initfn(Object *obj)
  {
-     imx_eth_update(s);
+     ARMCPU *cpu = ARM_CPU(obj);
@@ -XXX,XX +XXX,XX @@ static void cortex_a15_initfn(Object *obj)
      define_arm_cp_regs(cpu, cortexa15_cp_reginfo);
  }
--static void phy_update_link(IMXFECState *s)
+-static void cortex_m0_initfn(Object *obj)
-+static void imx_phy_update_link(IMXFECState *s)
+-{
- {
+-    ARMCPU *cpu = ARM_CPU(obj);
-     /* Autonegotiation status mirrors link status.  */
+-    set_feature(&cpu->env, ARM_FEATURE_V6);
-     if (qemu_get_queue(s->nic)->link_down) {
+-    set_feature(&cpu->env, ARM_FEATURE_M);
--        PHY_PRINTF("link is down\n");
+-
-+        trace_imx_phy_update_link("down");
+-    cpu->midr = 0x410cc200;
-         s->phy_status &= ~0x0024;
+-
-         s->phy_int |= PHY_INT_DOWN;
+-    /*
-     } else {
+-     * These ID register values are not guest visible, because
--        PHY_PRINTF("link is up\n");
+-     * we do not implement the Main Extension. They must be set
-+        trace_imx_phy_update_link("up");
+-     * to values corresponding to the Cortex-M0's implemented
-         s->phy_status |= 0x0024;
+-     * features, because QEMU generally controls its emulation
-         s->phy_int |= PHY_INT_ENERGYON;
+-     * by looking at ID register fields. We use the same values as
-         s->phy_int |= PHY_INT_AUTONEG_COMPLETE;
+-     * for the M3.
-     }
+-     */
--    phy_update_irq(s);
+-    cpu->isar.id_pfr0 = 0x00000030;
-+    imx_phy_update_irq(s);
+-    cpu->isar.id_pfr1 = 0x00000200;
 -    cpu->isar.id_dfr0 = 0x00100000;
 -    cpu->id_afr0 = 0x00000000;
 -    cpu->isar.id_mmfr0 = 0x00000030;
 -    cpu->isar.id_mmfr1 = 0x00000000;
 -    cpu->isar.id_mmfr2 = 0x00000000;
 -    cpu->isar.id_mmfr3 = 0x00000000;
 -    cpu->isar.id_isar0 = 0x01141110;
 -    cpu->isar.id_isar1 = 0x02111000;
 -    cpu->isar.id_isar2 = 0x21112231;
 -    cpu->isar.id_isar3 = 0x01111110;
 -    cpu->isar.id_isar4 = 0x01310102;
 -    cpu->isar.id_isar5 = 0x00000000;
 -    cpu->isar.id_isar6 = 0x00000000;
 -}
 -
 -static void cortex_m3_initfn(Object *obj)
 -{
 -    ARMCPU *cpu = ARM_CPU(obj);
 -    set_feature(&cpu->env, ARM_FEATURE_V7);
 -    set_feature(&cpu->env, ARM_FEATURE_M);
 -    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
 -    cpu->midr = 0x410fc231;
 -    cpu->pmsav7_dregion = 8;
 -    cpu->isar.id_pfr0 = 0x00000030;
 -    cpu->isar.id_pfr1 = 0x00000200;
 -    cpu->isar.id_dfr0 = 0x00100000;
 -    cpu->id_afr0 = 0x00000000;
 -    cpu->isar.id_mmfr0 = 0x00000030;
 -    cpu->isar.id_mmfr1 = 0x00000000;
 -    cpu->isar.id_mmfr2 = 0x00000000;
 -    cpu->isar.id_mmfr3 = 0x00000000;
 -    cpu->isar.id_isar0 = 0x01141110;
 -    cpu->isar.id_isar1 = 0x02111000;
 -    cpu->isar.id_isar2 = 0x21112231;
 -    cpu->isar.id_isar3 = 0x01111110;
 -    cpu->isar.id_isar4 = 0x01310102;
 -    cpu->isar.id_isar5 = 0x00000000;
 -    cpu->isar.id_isar6 = 0x00000000;
 -}
 -
 -static void cortex_m4_initfn(Object *obj)
 -{
 -    ARMCPU *cpu = ARM_CPU(obj);
 -
 -    set_feature(&cpu->env, ARM_FEATURE_V7);
 -    set_feature(&cpu->env, ARM_FEATURE_M);
 -    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
 -    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
 -    cpu->midr = 0x410fc240; /* r0p0 */
 -    cpu->pmsav7_dregion = 8;
 -    cpu->isar.mvfr0 = 0x10110021;
 -    cpu->isar.mvfr1 = 0x11000011;
 -    cpu->isar.mvfr2 = 0x00000000;
 -    cpu->isar.id_pfr0 = 0x00000030;
 -    cpu->isar.id_pfr1 = 0x00000200;
 -    cpu->isar.id_dfr0 = 0x00100000;
 -    cpu->id_afr0 = 0x00000000;
 -    cpu->isar.id_mmfr0 = 0x00000030;
 -    cpu->isar.id_mmfr1 = 0x00000000;
 -    cpu->isar.id_mmfr2 = 0x00000000;
 -    cpu->isar.id_mmfr3 = 0x00000000;
 -    cpu->isar.id_isar0 = 0x01141110;
 -    cpu->isar.id_isar1 = 0x02111000;
 -    cpu->isar.id_isar2 = 0x21112231;
 -    cpu->isar.id_isar3 = 0x01111110;
 -    cpu->isar.id_isar4 = 0x01310102;
 -    cpu->isar.id_isar5 = 0x00000000;
 -    cpu->isar.id_isar6 = 0x00000000;
 -}
 -
 -static void cortex_m7_initfn(Object *obj)
 -{
 -    ARMCPU *cpu = ARM_CPU(obj);
 -
 -    set_feature(&cpu->env, ARM_FEATURE_V7);
 -    set_feature(&cpu->env, ARM_FEATURE_M);
 -    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
 -    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
 -    cpu->midr = 0x411fc272; /* r1p2 */
 -    cpu->pmsav7_dregion = 8;
 -    cpu->isar.mvfr0 = 0x10110221;
 -    cpu->isar.mvfr1 = 0x12000011;
 -    cpu->isar.mvfr2 = 0x00000040;
 -    cpu->isar.id_pfr0 = 0x00000030;
 -    cpu->isar.id_pfr1 = 0x00000200;
 -    cpu->isar.id_dfr0 = 0x00100000;
 -    cpu->id_afr0 = 0x00000000;
 -    cpu->isar.id_mmfr0 = 0x00100030;
 -    cpu->isar.id_mmfr1 = 0x00000000;
 -    cpu->isar.id_mmfr2 = 0x01000000;
 -    cpu->isar.id_mmfr3 = 0x00000000;
 -    cpu->isar.id_isar0 = 0x01101110;
 -    cpu->isar.id_isar1 = 0x02112000;
 -    cpu->isar.id_isar2 = 0x20232231;
 -    cpu->isar.id_isar3 = 0x01111131;
 -    cpu->isar.id_isar4 = 0x01310132;
 -    cpu->isar.id_isar5 = 0x00000000;
 -    cpu->isar.id_isar6 = 0x00000000;
 -}
 -
 -static void cortex_m33_initfn(Object *obj)
 -{
 -    ARMCPU *cpu = ARM_CPU(obj);
 -
 -    set_feature(&cpu->env, ARM_FEATURE_V8);
 -    set_feature(&cpu->env, ARM_FEATURE_M);
 -    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
 -    set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
 -    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
 -    cpu->midr = 0x410fd213; /* r0p3 */
 -    cpu->pmsav7_dregion = 16;
 -    cpu->sau_sregion = 8;
 -    cpu->isar.mvfr0 = 0x10110021;
 -    cpu->isar.mvfr1 = 0x11000011;
 -    cpu->isar.mvfr2 = 0x00000040;
 -    cpu->isar.id_pfr0 = 0x00000030;
 -    cpu->isar.id_pfr1 = 0x00000210;
 -    cpu->isar.id_dfr0 = 0x00200000;
 -    cpu->id_afr0 = 0x00000000;
 -    cpu->isar.id_mmfr0 = 0x00101F40;
 -    cpu->isar.id_mmfr1 = 0x00000000;
 -    cpu->isar.id_mmfr2 = 0x01000000;
 -    cpu->isar.id_mmfr3 = 0x00000000;
 -    cpu->isar.id_isar0 = 0x01101110;
 -    cpu->isar.id_isar1 = 0x02212000;
 -    cpu->isar.id_isar2 = 0x20232232;
 -    cpu->isar.id_isar3 = 0x01111131;
 -    cpu->isar.id_isar4 = 0x01310132;
 -    cpu->isar.id_isar5 = 0x00000000;
 -    cpu->isar.id_isar6 = 0x00000000;
 -    cpu->clidr = 0x00000000;
 -    cpu->ctr = 0x8000c000;
 -}
 -
 -static void cortex_m55_initfn(Object *obj)
 -{
 -    ARMCPU *cpu = ARM_CPU(obj);
 -
 -    set_feature(&cpu->env, ARM_FEATURE_V8);
 -    set_feature(&cpu->env, ARM_FEATURE_V8_1M);
 -    set_feature(&cpu->env, ARM_FEATURE_M);
 -    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
 -    set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
 -    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
 -    cpu->midr = 0x410fd221; /* r0p1 */
 -    cpu->revidr = 0;
 -    cpu->pmsav7_dregion = 16;
 -    cpu->sau_sregion = 8;
 -    /* These are the MVFR* values for the FPU + full MVE configuration */
 -    cpu->isar.mvfr0 = 0x10110221;
 -    cpu->isar.mvfr1 = 0x12100211;
 -    cpu->isar.mvfr2 = 0x00000040;
 -    cpu->isar.id_pfr0 = 0x20000030;
 -    cpu->isar.id_pfr1 = 0x00000230;
 -    cpu->isar.id_dfr0 = 0x10200000;
 -    cpu->id_afr0 = 0x00000000;
 -    cpu->isar.id_mmfr0 = 0x00111040;
 -    cpu->isar.id_mmfr1 = 0x00000000;
 -    cpu->isar.id_mmfr2 = 0x01000000;
 -    cpu->isar.id_mmfr3 = 0x00000011;
 -    cpu->isar.id_isar0 = 0x01103110;
 -    cpu->isar.id_isar1 = 0x02212000;
 -    cpu->isar.id_isar2 = 0x20232232;
 -    cpu->isar.id_isar3 = 0x01111131;
 -    cpu->isar.id_isar4 = 0x01310132;
 -    cpu->isar.id_isar5 = 0x00000000;
 -    cpu->isar.id_isar6 = 0x00000000;
 -    cpu->clidr = 0x00000000; /* caches not implemented */
 -    cpu->ctr = 0x8303c003;
 -}
 -
  static const ARMCPRegInfo cortexr5_cp_reginfo[] = {
      /* Dummy the TCM region regs for the moment */
      { .name = "ATCM", .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static void pxa270c5_initfn(Object *obj)
      cpu->reset_sctlr = 0x00000078;
  }
- static void imx_eth_set_link(NetClientState *nc)
+-static const TCGCPUOps arm_v7m_tcg_ops = {
- {
+-    .initialize = arm_translate_init,
--    phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
+-    .synchronize_from_tb = arm_cpu_synchronize_from_tb,
-+    imx_phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
+-    .debug_excp_handler = arm_debug_excp_handler,
- }
+-    .restore_state_to_opc = arm_restore_state_to_opc,
+-
--static void phy_reset(IMXFECState *s)
+-#ifdef CONFIG_USER_ONLY
-+static void imx_phy_reset(IMXFECState *s)
+-    .record_sigsegv = arm_cpu_record_sigsegv,
- {
+-    .record_sigbus = arm_cpu_record_sigbus,
-+    trace_imx_phy_reset();
+-#else
-+
+-    .tlb_fill = arm_cpu_tlb_fill,
-     s->phy_status = 0x7809;
+-    .cpu_exec_interrupt = arm_v7m_cpu_exec_interrupt,
-     s->phy_control = 0x3000;
+-    .do_interrupt = arm_v7m_cpu_do_interrupt,
-     s->phy_advertise = 0x01e1;
+-    .do_transaction_failed = arm_cpu_do_transaction_failed,
-     s->phy_int_mask = 0;
+-    .do_unaligned_access = arm_cpu_do_unaligned_access,
-     s->phy_int = 0;
+-    .adjust_watchpoint_address = arm_adjust_watchpoint_address,
--    phy_update_link(s);
+-    .debug_check_watchpoint = arm_debug_check_watchpoint,
-+    imx_phy_update_link(s);
+-    .debug_check_breakpoint = arm_debug_check_breakpoint,
- }
+-#endif /* !CONFIG_USER_ONLY */
+-};
--static uint32_t do_phy_read(IMXFECState *s, int reg)
+-
-+static uint32_t imx_phy_read(IMXFECState *s, int reg)
+-static void arm_v7m_class_init(ObjectClass *oc, void *data)
- {
+-{
-     uint32_t val;
+-    ARMCPUClass *acc = ARM_CPU_CLASS(oc);
+-    CPUClass *cc = CPU_CLASS(oc);
-@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
+-
-     case 29:    /* Interrupt source.  */
+-    acc->info = data;
-         val = s->phy_int;
+-    cc->tcg_ops = &arm_v7m_tcg_ops;
-         s->phy_int = 0;
+-    cc->gdb_core_xml_file = "arm-m-profile.xml";
--        phy_update_irq(s);
+-}
-+        imx_phy_update_irq(s);
+-
-         break;
+ #ifndef TARGET_AARCH64
-     case 30:    /* Interrupt mask */
+ /*
-         val = s->phy_int_mask;
+  * -cpu max: a CPU with as many features enabled as our emulation supports.
-@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
+@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_tcg_cpus[] = {
-         break;
+     { .name = "cortex-a8",   .initfn = cortex_a8_initfn },
-     }
+     { .name = "cortex-a9",   .initfn = cortex_a9_initfn },
+     { .name = "cortex-a15",  .initfn = cortex_a15_initfn },
--    PHY_PRINTF("read 0x%04x @ %d\n", val, reg);
+-    { .name = "cortex-m0",   .initfn = cortex_m0_initfn,
-+    trace_imx_phy_read(val, reg);
+-                             .class_init = arm_v7m_class_init },
+-    { .name = "cortex-m3",   .initfn = cortex_m3_initfn,
-     return val;
+-                             .class_init = arm_v7m_class_init },
- }
+-    { .name = "cortex-m4",   .initfn = cortex_m4_initfn,
+-                             .class_init = arm_v7m_class_init },
--static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
+-    { .name = "cortex-m7",   .initfn = cortex_m7_initfn,
-+static void imx_phy_write(IMXFECState *s, int reg, uint32_t val)
+-                             .class_init = arm_v7m_class_init },
- {
+-    { .name = "cortex-m33",  .initfn = cortex_m33_initfn,
--    PHY_PRINTF("write 0x%04x @ %d\n", val, reg);
+-                             .class_init = arm_v7m_class_init },
-+    trace_imx_phy_write(val, reg);
+-    { .name = "cortex-m55",  .initfn = cortex_m55_initfn,
+-                             .class_init = arm_v7m_class_init },
-     if (reg > 31) {
+     { .name = "cortex-r5",   .initfn = cortex_r5_initfn },
-         /* we only advertise one phy */
+     { .name = "cortex-r5f",  .initfn = cortex_r5f_initfn },
-@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
+     { .name = "cortex-r52",  .initfn = cortex_r52_initfn },
-     switch (reg) {
+diff --git a/target/arm/meson.build b/target/arm/meson.build
      case 0:     /* Basic Control */
          if (val & 0x8000) {
 -            phy_reset(s);
 +            imx_phy_reset(s);
          } else {
              s->phy_control = val & 0x7980;
              /* Complete autonegotiation immediately.  */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
          break;
      case 30:    /* Interrupt mask */
          s->phy_int_mask = val & 0xff;
 -        phy_update_irq(s);
 +        imx_phy_update_irq(s);
          break;
      case 17:
      case 18:
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
  static void imx_fec_read_bd(IMXFECBufDesc *bd, dma_addr_t addr)
  {
      dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
 +
 +    trace_imx_fec_read_bd(addr, bd->flags, bd->length, bd->data);
  }
  static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
  static void imx_enet_read_bd(IMXENETBufDesc *bd, dma_addr_t addr)
  {
      dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
 +
 +    trace_imx_enet_read_bd(addr, bd->flags, bd->length, bd->data,
 +                   bd->option, bd->status);
  }
  static void imx_enet_write_bd(IMXENETBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_do_tx(IMXFECState *s)
          int len;
          imx_fec_read_bd(&bd, addr);
 -        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x\n",
 -                   addr, bd.flags, bd.length, bd.data);
          if ((bd.flags & ENET_BD_R) == 0) {
 +
              /* Run out of descriptors to transmit.  */
 -            FEC_PRINTF("tx_bd ran out of descriptors to transmit\n");
 +            trace_imx_eth_tx_bd_busy();
 +
              break;
          }
          len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_enet_do_tx(IMXFECState *s, uint32_t index)
          int len;
          imx_enet_read_bd(&bd, addr);
 -        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x option %04x "
 -                   "status %04x\n", addr, bd.flags, bd.length, bd.data,
 -                   bd.option, bd.status);
          if ((bd.flags & ENET_BD_R) == 0) {
              /* Run out of descriptors to transmit.  */
 +
 +            trace_imx_eth_tx_bd_busy();
 +
              break;
          }
          len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_eth_enable_rx(IMXFECState *s, bool flush)
      s->regs[ENET_RDAR] = (bd.flags & ENET_BD_E) ? ENET_RDAR_RDAR : 0;
      if (!s->regs[ENET_RDAR]) {
 -        FEC_PRINTF("RX buffer full\n");
 +        trace_imx_eth_rx_bd_full();
      } else if (flush) {
          qemu_flush_queued_packets(qemu_get_queue(s->nic));
      }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_reset(DeviceState *d)
      memset(s->tx_descriptor, 0, sizeof(s->tx_descriptor));
      /* We also reset the PHY */
 -    phy_reset(s);
 +    imx_phy_reset(s);
  }
  static uint32_t imx_default_read(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_eth_read(void *opaque, hwaddr offset, unsigned size)
          break;
      }
 -    FEC_PRINTF("reg[%s] => 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
 -                                              value);
 +    trace_imx_eth_read(index, imx_eth_reg_name(s, index), value);
      return value;
  }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
      const bool single_tx_ring = !imx_eth_is_multi_tx_ring(s);
      uint32_t index = offset >> 2;
 -    FEC_PRINTF("reg[%s] <= 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
 -                (uint32_t)value);
 +    trace_imx_eth_write(index, imx_eth_reg_name(s, index), value);
      switch (index) {
      case ENET_EIR:
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
          if (extract32(value, 29, 1)) {
              /* This is a read operation */
              s->regs[ENET_MMFR] = deposit32(s->regs[ENET_MMFR], 0, 16,
 -                                           do_phy_read(s,
 +                                           imx_phy_read(s,
                                                         extract32(value,
                                                                  18, 10)));
          } else {
              /* This a write operation */
 -            do_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
 +            imx_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
          }
          /* raise the interrupt as the PHY operation is done */
          s->regs[ENET_EIR] |= ENET_INT_MII;
@@ -XXX,XX +XXX,XX @@ static bool imx_eth_can_receive(NetClientState *nc)
  {
      IMXFECState *s = IMX_FEC(qemu_get_nic_opaque(nc));
 -    FEC_PRINTF("\n");
 -
      return !!s->regs[ENET_RDAR];
  }
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
      unsigned int buf_len;
      size_t size = len;
 -    FEC_PRINTF("len %d\n", (int)size);
 +    trace_imx_fec_receive(size);
      if (!s->regs[ENET_RDAR]) {
          qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
          bd.length = buf_len;
          size -= buf_len;
 -        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
 +        trace_imx_fec_receive_len(addr, bd.length);
          /* The last 4 bytes are the CRC.  */
          if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
          if (size == 0) {
              /* Last buffer in frame.  */
              bd.flags |= flags | ENET_BD_L;
 -            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
 +
 +            trace_imx_fec_receive_last(bd.flags);
 +
              s->regs[ENET_EIR] |= ENET_INT_RXF;
          } else {
              s->regs[ENET_EIR] |= ENET_INT_RXB;
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
      size_t size = len;
      bool shift16 = s->regs[ENET_RACC] & ENET_RACC_SHIFT16;
 -    FEC_PRINTF("len %d\n", (int)size);
 +    trace_imx_enet_receive(size);
      if (!s->regs[ENET_RDAR]) {
          qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
          bd.length = buf_len;
          size -= buf_len;
 -        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
 +        trace_imx_enet_receive_len(addr, bd.length);
          /* The last 4 bytes are the CRC.  */
          if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
          if (size == 0) {
              /* Last buffer in frame.  */
              bd.flags |= flags | ENET_BD_L;
 -            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
 +
 +            trace_imx_enet_receive_last(bd.flags);
 +
              /* Indicate that we've updated the last buffer descriptor. */
              bd.last_buffer = ENET_BD_BDU;
              if (bd.option & ENET_BD_RX_INT) {
 diff --git a/hw/net/trace-events b/hw/net/trace-events
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/trace-events
+--- a/target/arm/meson.build
-+++ b/hw/net/trace-events
++++ b/target/arm/meson.build
-@@ -XXX,XX +XXX,XX @@ i82596_receive_packet(size_t sz) "len=%zu"
+@@ -XXX,XX +XXX,XX @@ arm_system_ss.add(files(
- i82596_new_mac(const char *id_with_mac) "New MAC for: %s"
+   'ptw.c',
- i82596_set_multicast(uint16_t count) "Added %d multicast entries"
+ ))
- i82596_channel_attention(void *s) "%p: Received CHANNEL ATTENTION"
-+
++arm_user_ss = ss.source_set()
-+# imx_fec.c
++
-+imx_phy_read(uint32_t val, int reg) "0x%04"PRIx32" <= reg[%d]"
+ subdir('hvf')
-+imx_phy_write(uint32_t val, int reg) "0x%04"PRIx32" => reg[%d]"
-+imx_phy_update_link(const char *s) "%s"
+ if 'CONFIG_TCG' in config_all_accel
-+imx_phy_reset(void) ""
+@@ -XXX,XX +XXX,XX @@ endif
-+imx_fec_read_bd(uint64_t addr, int flags, int len, int data) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x"
-+imx_enet_read_bd(uint64_t addr, int flags, int len, int data, int options, int status) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x option 0x%04x status 0x%04x"
+ target_arch += {'arm': arm_ss}
-+imx_eth_tx_bd_busy(void) "tx_bd ran out of descriptors to transmit"
+ target_system_arch += {'arm': arm_system_ss}
-+imx_eth_rx_bd_full(void) "RX buffer is full"
++target_user_arch += {'arm': arm_user_ss}
-+imx_eth_read(int reg, const char *reg_name, uint32_t value) "reg[%d:%s] => 0x%08"PRIx32
+diff --git a/target/arm/tcg/meson.build b/target/arm/tcg/meson.build
-+imx_eth_write(int reg, const char *reg_name, uint64_t value) "reg[%d:%s] <= 0x%08"PRIx64
+index XXXXXXX..XXXXXXX 100644
-+imx_fec_receive(size_t size) "len %zu"
+--- a/target/arm/tcg/meson.build
-+imx_fec_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
++++ b/target/arm/tcg/meson.build
-+imx_fec_receive_last(int last) "rx frame flags 0x%04x"
+@@ -XXX,XX +XXX,XX @@ arm_ss.add(when: 'TARGET_AARCH64', if_true: files(
-+imx_enet_receive(size_t size) "len %zu"
+ arm_system_ss.add(files(
-+imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+   'psci.c',
-+imx_enet_receive_last(int last) "rx frame flags 0x%04x"
+ ))
 +
 +arm_system_ss.add(when: 'CONFIG_ARM_V7M', if_true: files('cpu-v7m.c'))
 +arm_user_ss.add(when: 'TARGET_AARCH64', if_false: files('cpu-v7m.c'))
 --
-2.20.1
+2.34.1

Mostly my decodetree stuff, but also some patches for various
smaller bugs/features from others.

thanks
-- PMM

The following changes since commit 53550e81e2cafe7c03a39526b95cd21b5194d9b1:

Merge remote-tracking branch 'remotes/berrange/tags/qcrypto-next-pull-request' into staging (2020-06-15 16:36:34 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20200616

for you to fetch changes up to 64b397417a26509bcdff44ab94356a35c7901c79:

hw: arm: Set vendor property for IMX SDHCI emulations (2020-06-16 10:32:29 +0100)

----------------------------------------------------------------
 * hw: arm: Set vendor property for IMX SDHCI emulations
 * sd: sdhci: Implement basic vendor specific register support
 * hw/net/imx_fec: Convert debug fprintf() to trace events
 * target/arm/cpu: adjust virtual time for all KVM arm cpus
 * Implement configurable descriptor size in ftgmac100
 * hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
 * target/arm: More Neon decodetree conversion work

----------------------------------------------------------------
Erik Smit (1):
      Implement configurable descriptor size in ftgmac100

Guenter Roeck (2):
      sd: sdhci: Implement basic vendor specific register support
      hw: arm: Set vendor property for IMX SDHCI emulations

Jean-Christophe Dubois (2):
      hw/misc/imx6ul_ccm: Implement non writable bits in CCM registers
      hw/net/imx_fec: Convert debug fprintf() to trace events

Peter Maydell (17):
      target/arm: Fix missing temp frees in do_vshll_2sh
      target/arm: Convert Neon 3-reg-diff prewidening ops to decodetree
      target/arm: Convert Neon 3-reg-diff narrowing ops to decodetree
      target/arm: Convert Neon 3-reg-diff VABAL, VABDL to decodetree
      target/arm: Convert Neon 3-reg-diff long multiplies
      target/arm: Convert Neon 3-reg-diff saturating doubling multiplies
      target/arm: Convert Neon 3-reg-diff polynomial VMULL
      target/arm: Add 'static' and 'const' annotations to VSHLL function arrays
      target/arm: Add missing TCG temp free in do_2shift_env_64()
      target/arm: Convert Neon 2-reg-scalar integer multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar float multiplies to decodetree
      target/arm: Convert Neon 2-reg-scalar VQDMULH, VQRDMULH to decodetree
      target/arm: Convert Neon 2-reg-scalar VQRDMLAH, VQRDMLSH to decodetree
      target/arm: Convert Neon 2-reg-scalar long multiplies to decodetree
      target/arm: Convert Neon VEXT to decodetree
      target/arm: Convert Neon VTBL, VTBX to decodetree
      target/arm: Convert Neon VDUP (scalar) to decodetree

fangying (1):
      target/arm/cpu: adjust virtual time for all KVM arm cpus

The widenfn() in do_vshll_2sh() does not free the input 32-bit
TCGv, so we need to do this in the calling code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 target/arm/translate-neon.inc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     tmp = tcg_temp_new_i64();
 
     widenfn(tmp, rm0);
+    tcg_temp_free_i32(rm0);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
     neon_store_reg64(tmp, a->vd);
 
     widenfn(tmp, rm1);
+    tcg_temp_free_i32(rm1);
     if (a->shift != 0) {
         tcg_gen_shli_i64(tmp, tmp, a->shift);
         tcg_gen_andi_i64(tmp, tmp, ~widen_mask);
-- 
2.20.1

Convert the "pre-widening" insns VADDL, VSUBL, VADDW and VSUBW
in the Neon 3-registers-different-lengths group to decodetree.
These insns work by widening one or both inputs to double their
size, performing an add or subtract at the doubled size and
then storing the double-size result.

As usual, rather than copying the loop of the original decoder
(which needs awkward code to avoid problems when source and
destination registers overlap) we just unroll the two passes.
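
For illustration only, here is a minimal standalone C model of the
signed 16-bit VADDL case. It is not the TCG code in this patch: the
dreg[] array and the helper names widen_s16(), add_u32x2() and
vaddl_s16() are made up for the sketch. It shows why reading the
inputs for both passes before storing any result sidesteps the
overlap problem.

```c
#include <stdint.h>

/* Hypothetical register model: Qd is the pair dreg[vd], dreg[vd + 1]. */
static uint64_t dreg[32];

/* Sign-extend two packed 16-bit lanes into two packed 32-bit lanes. */
static uint64_t widen_s16(uint32_t x)
{
    uint32_t lo = (uint32_t)(int32_t)(int16_t)(x & 0xffff);
    uint32_t hi = (uint32_t)(int32_t)(int16_t)(x >> 16);
    return ((uint64_t)hi << 32) | lo;
}

/* Lane-wise add of two 32-bit lanes packed into 64 bits. */
static uint64_t add_u32x2(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)a + (uint32_t)b;
    uint32_t hi = (uint32_t)(a >> 32) + (uint32_t)(b >> 32);
    return ((uint64_t)hi << 32) | lo;
}

/* Model of VADDL.S16 Qd, Dn, Dm with the two passes unrolled. */
static void vaddl_s16(int vd, int vn, int vm)
{
    /* Read every input before writing any part of the destination. */
    uint64_t n = dreg[vn];
    uint64_t m = dreg[vm];

    /* Pass 0: low halves of the sources. */
    uint64_t r0 = add_u32x2(widen_s16((uint32_t)n),
                            widen_s16((uint32_t)m));
    /* Pass 1: high halves of the sources. */
    uint64_t r1 = add_u32x2(widen_s16((uint32_t)(n >> 32)),
                            widen_s16((uint32_t)(m >> 32)));

    /* Store only after both results are computed. */
    dreg[vd] = r0;
    dreg[vd + 1] = r1;
}
```

In this model a call such as vaddl_s16(2, 2, 3) is safe even though
the destination pair overlaps both sources, because nothing is
stored until both passes have been computed from the original values.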

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  43 +++++++++++++
 target/arm/translate-neon.inc.c | 104 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  16 ++---
 3 files changed, 151 insertions(+), 12 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ VCVT_FU_2sh      1111 001 1 1 . ...... .... 1111 0 . . 1 .... @2reg_vcvt
 # So we have a single decode line and check the cmode/op in the
 # trans function.
 Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
+
+######################################################################
+# Within the "two registers, or three registers of different lengths"
+# grouping ([23,4]=0b10), bits [21:20] are either part of the opcode
+# decode: 0b11 for VEXT, two-reg-misc, VTBL, and duplicate-scalar;
+# or they are a size field for the three-reg-different-lengths and
+# two-reg-and-scalar insn groups (where size cannot be 0b11). This
+# is slightly awkward for decodetree: we handle it with this
+# non-exclusive group which contains within it two exclusive groups:
+# one for the size=0b11 patterns, and one for the size-not-0b11
+# patterns. This allows us to check that none of the insns within
+# each subgroup accidentally overlap each other. Note that all the
+# trans functions for the size-not-0b11 patterns must check and
+# return false for size==3.
+######################################################################
+{
+  # 0b11 subgroup will go here
+
+  # Subgroup for size != 0b11
+  [
+    ##################################################################
+    # 3-reg-different-length grouping:
+    # 1111 001 U 1 D sz!=11 Vn:4 Vd:4 opc:4 N 0 M 0 Vm:4
+    ##################################################################
+
+    &3diff vm vn vd size
+
+    @3diff       .... ... . . . size:2 .... .... .... . . . . .... \
+                 &3diff vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VADDL_S_3d   1111 001 0 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+    VADDL_U_3d   1111 001 1 1 . .. .... .... 0000 . 0 . 0 .... @3diff
+
+    VADDW_S_3d   1111 001 0 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+    VADDW_U_3d   1111 001 1 1 . .. .... .... 0001 . 0 . 0 .... @3diff
+
+    VSUBL_S_3d   1111 001 0 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+    VSUBL_U_3d   1111 001 1 1 . .. .... .... 0010 . 0 . 0 .... @3diff
+
+    VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+    VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+  ]
+}
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_Vimm_1r(DisasContext *s, arg_1reg_imm *a)
     }
     return do_1reg_imm(s, a, fn);
 }
+
+static bool do_prewiden_3d(DisasContext *s, arg_3diff *a,
+                           NeonGenWidenFn *widenfn,
+                           NeonGenTwo64OpFn *opfn,
+                           bool src1_wide)
+{
+    /* 3-regs different lengths, prewidening case (VADDL/VSUBL/VADDW/VSUBW) */
+    TCGv_i64 rn0_64, rn1_64, rm_64;
+    TCGv_i32 rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!widenfn || !opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vd & 1) || (src1_wide && (a->vn & 1))) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn0_64 = tcg_temp_new_i64();
+    rn1_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+
+    if (src1_wide) {
+        neon_load_reg64(rn0_64, a->vn);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 0);
+        widenfn(rn0_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 0);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn0_64, rn0_64, rm_64);
+
+    /*
+     * Load second pass inputs before storing the first pass result, to
+     * avoid incorrect results if a narrow input overlaps with the result.
+     */
+    if (src1_wide) {
+        neon_load_reg64(rn1_64, a->vn + 1);
+    } else {
+        TCGv_i32 tmp = neon_load_reg(a->vn, 1);
+        widenfn(rn1_64, tmp);
+        tcg_temp_free_i32(tmp);
+    }
+    rm = neon_load_reg(a->vm, 1);
+
+    neon_store_reg64(rn0_64, a->vd);
+
+    widenfn(rm_64, rm);
+    tcg_temp_free_i32(rm);
+    opfn(rn1_64, rn1_64, rm_64);
+    neon_store_reg64(rn1_64, a->vd + 1);
+
+    tcg_temp_free_i64(rn0_64);
+    tcg_temp_free_i64(rn1_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_PREWIDEN(INSN, S, EXT, OP, SRC1WIDE)                         \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+    {                                                                   \
+        static NeonGenWidenFn * const widenfn[] = {                     \
+            gen_helper_neon_widen_##S##8,                               \
+            gen_helper_neon_widen_##S##16,                              \
+            tcg_gen_##EXT##_i32_i64,                                    \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenTwo64OpFn * const addfn[] = {                     \
+            gen_helper_neon_##OP##l_u16,                                \
+            gen_helper_neon_##OP##l_u32,                                \
+            tcg_gen_##OP##_i64,                                         \
+            NULL,                                                       \
+        };                                                              \
+        return do_prewiden_3d(s, a, widenfn[a->size],                   \
+                              addfn[a->size], SRC1WIDE);                \
+    }
+
+DO_PREWIDEN(VADDL_S, s, ext, add, false)
+DO_PREWIDEN(VADDL_U, u, extu, add, false)
+DO_PREWIDEN(VSUBL_S, s, ext, sub, false)
+DO_PREWIDEN(VSUBL_U, u, extu, sub, false)
+DO_PREWIDEN(VADDW_S, s, ext, add, true)
+DO_PREWIDEN(VADDW_U, u, extu, add, true)
+DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
+DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 /* Three registers of different lengths.  */
                 int src1_wide;
                 int src2_wide;
-                int prewiden;
                 /* undefreq: bit 0 : UNDEF if size == 0
                  *           bit 1 : UNDEF if size == 1
                  *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 int undefreq;
                 /* prewiden, src1_wide, src2_wide, undefreq */
                 static const int neon_3reg_wide[16][4] = {
-                    {1, 0, 0, 0}, /* VADDL */
-                    {1, 1, 0, 0}, /* VADDW */
-                    {1, 0, 0, 0}, /* VSUBL */
-                    {1, 1, 0, 0}, /* VSUBW */
+                    {0, 0, 0, 7}, /* VADDL: handled by decodetree */
+                    {0, 0, 0, 7}, /* VADDW: handled by decodetree */
+                    {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
+                    {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                     {0, 1, 1, 0}, /* VADDHN */
                     {0, 0, 0, 0}, /* VABAL */
                     {0, 1, 1, 0}, /* VSUBHN */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
 
-                prewiden = neon_3reg_wide[op][0];
                 src1_wide = neon_3reg_wide[op][1];
                 src2_wide = neon_3reg_wide[op][2];
                 undefreq = neon_3reg_wide[op][3];
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         } else {
                             tmp = neon_load_reg(rn, pass);
                         }
-                        if (prewiden) {
-                            gen_neon_widen(cpu_V0, tmp, size, u);
-                        }
                     }
                     if (src2_wide) {
                         neon_load_reg64(cpu_V1, rm + pass);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         } else {
                             tmp2 = neon_load_reg(rm, pass);
                         }
-                        if (prewiden) {
-                            gen_neon_widen(cpu_V1, tmp2, size, u);
-                        }
                     }
                     switch (op) {
                     case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-- 
2.20.1

Convert the narrow-to-high-half insns VADDHN, VSUBHN, VRADDHN,
VRSUBHN in the Neon 3-registers-different-lengths group to
decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
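Reviewer note, not part of the patch: a minimal sketch of what the
narrow-to-high-half step computes for the 64-bit to 32-bit case. The
function names addhn_u32() and raddhn_u32() are hypothetical and model
a single lane in plain C; the smaller element sizes go through helper
functions in the real code and are not modelled here.

```c
#include <stdint.h>

/* VADDHN, 64-bit lane: keep the top 32 bits of the sum. */
static uint32_t addhn_u32(uint64_t n, uint64_t m)
{
    return (uint32_t)((n + m) >> 32);
}

/* VRADDHN, 64-bit lane: add the rounding constant, then keep the top half. */
static uint32_t raddhn_u32(uint64_t n, uint64_t m)
{
    uint64_t sum = n + m;       /* wraps modulo 2^64, like the 64-bit add */

    sum += UINT64_C(1) << 31;   /* round to nearest on the discarded half */
    return (uint32_t)(sum >> 32);
}
```

The rounding variant corresponds to the addi-then-extract-high sequence
in gen_narrow_round_high_u32() in the diff below.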
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 87 +++++++++++++++++++++++++++++++
 target/arm/translate.c          | 91 ++++-----------------------------
 3 files changed, 104 insertions(+), 80 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     VSUBW_S_3d   1111 001 0 1 . .. .... .... 0011 . 0 . 0 .... @3diff
     VSUBW_U_3d   1111 001 1 1 . .. .... .... 0011 . 0 . 0 .... @3diff
+
+    VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+    VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
+
+    VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+    VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_PREWIDEN(VADDW_S, s, ext, add, true)
 DO_PREWIDEN(VADDW_U, u, extu, add, true)
 DO_PREWIDEN(VSUBW_S, s, ext, sub, true)
 DO_PREWIDEN(VSUBW_U, u, extu, sub, true)
+
+static bool do_narrow_3d(DisasContext *s, arg_3diff *a,
+                         NeonGenTwo64OpFn *opfn, NeonGenNarrowFn *narrowfn)
+{
+    /* 3-regs different lengths, narrowing (VADDHN/VSUBHN/VRADDHN/VRSUBHN) */
+    TCGv_i64 rn_64, rm_64;
+    TCGv_i32 rd0, rd1;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn || !narrowfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if ((a->vn | a->vm) & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rn_64 = tcg_temp_new_i64();
+    rm_64 = tcg_temp_new_i64();
+    rd0 = tcg_temp_new_i32();
+    rd1 = tcg_temp_new_i32();
+
+    neon_load_reg64(rn_64, a->vn);
+    neon_load_reg64(rm_64, a->vm);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd0, rn_64);
+
+    neon_load_reg64(rn_64, a->vn + 1);
+    neon_load_reg64(rm_64, a->vm + 1);
+
+    opfn(rn_64, rn_64, rm_64);
+
+    narrowfn(rd1, rn_64);
+
+    neon_store_reg(a->vd, 0, rd0);
+    neon_store_reg(a->vd, 1, rd1);
+
+    tcg_temp_free_i64(rn_64);
+    tcg_temp_free_i64(rm_64);
+
+    return true;
+}
+
+#define DO_NARROW_3D(INSN, OP, NARROWTYPE, EXTOP)                       \
+    static bool trans_##INSN##_3d(DisasContext *s, arg_3diff *a)        \
+    {                                                                   \
+        static NeonGenTwo64OpFn * const addfn[] = {                     \
+            gen_helper_neon_##OP##l_u16,                                \
+            gen_helper_neon_##OP##l_u32,                                \
+            tcg_gen_##OP##_i64,                                         \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenNarrowFn * const narrowfn[] = {                   \
+            gen_helper_neon_##NARROWTYPE##_high_u8,                     \
+            gen_helper_neon_##NARROWTYPE##_high_u16,                    \
+            EXTOP,                                                      \
+            NULL,                                                       \
+        };                                                              \
+        return do_narrow_3d(s, a, addfn[a->size], narrowfn[a->size]);   \
+    }
+
+static void gen_narrow_round_high_u32(TCGv_i32 rd, TCGv_i64 rn)
+{
+    tcg_gen_addi_i64(rn, rn, 1u << 31);
+    tcg_gen_extrh_i64_i32(rd, rn);
+}
+
+DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
+DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
+DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
     }
 }
 
-static inline void gen_neon_subl(int size)
-{
-    switch (size) {
-    case 0: gen_helper_neon_subl_u16(CPU_V001); break;
-    case 1: gen_helper_neon_subl_u32(CPU_V001); break;
-    case 2: tcg_gen_sub_i64(CPU_V001); break;
-    default: abort();
-    }
-}
-
 static inline void gen_neon_negl(TCGv_i64 var, int size)
 {
     switch (size) {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             op = (insn >> 8) & 0xf;
             if ((insn & (1 << 6)) == 0) {
                 /* Three registers of different lengths.  */
-                int src1_wide;
-                int src2_wide;
                 /* undefreq: bit 0 : UNDEF if size == 0
                  *           bit 1 : UNDEF if size == 1
                  *           bit 2 : UNDEF if size == 2
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VADDW: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
-                    {0, 1, 1, 0}, /* VADDHN */
+                    {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
                     {0, 0, 0, 0}, /* VABAL */
-                    {0, 1, 1, 0}, /* VSUBHN */
+                    {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
                     {0, 0, 0, 0}, /* VABDL */
                     {0, 0, 0, 0}, /* VMLAL */
                     {0, 0, 0, 9}, /* VQDMLAL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* Reserved: always UNDEF */
                 };
 
-                src1_wide = neon_3reg_wide[op][1];
-                src2_wide = neon_3reg_wide[op][2];
                 undefreq = neon_3reg_wide[op][3];
 
                 if ((undefreq & (1 << size)) ||
                     ((undefreq & 8) && u)) {
                     return 1;
                 }
-                if ((src1_wide && (rn & 1)) ||
-                    (src2_wide && (rm & 1)) ||
-                    (!src2_wide && (rd & 1))) {
+                if (rd & 1) {
                     return 1;
                 }
 
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                 /* Avoid overlapping operands.  Wide source operands are
                    always aligned so will never overlap with wide
                    destinations in problematic ways.  */
-                if (rd == rm && !src2_wide) {
+                if (rd == rm) {
                     tmp = neon_load_reg(rm, 1);
                     neon_store_scratch(2, tmp);
-                } else if (rd == rn && !src1_wide) {
+                } else if (rd == rn) {
                     tmp = neon_load_reg(rn, 1);
                     neon_store_scratch(2, tmp);
                 }
                 tmp3 = NULL;
                 for (pass = 0; pass < 2; pass++) {
-                    if (src1_wide) {
-                        neon_load_reg64(cpu_V0, rn + pass);
-                        tmp = NULL;
+                    if (pass == 1 && rd == rn) {
+                        tmp = neon_load_scratch(2);
                     } else {
-                        if (pass == 1 && rd == rn) {
-                            tmp = neon_load_scratch(2);
-                        } else {
-                            tmp = neon_load_reg(rn, pass);
-                        }
+                        tmp = neon_load_reg(rn, pass);
                     }
-                    if (src2_wide) {
-                        neon_load_reg64(cpu_V1, rm + pass);
-                        tmp2 = NULL;
+                    if (pass == 1 && rd == rm) {
+                        tmp2 = neon_load_scratch(2);
                     } else {
-                        if (pass == 1 && rd == rm) {
-                            tmp2 = neon_load_scratch(2);
-                        } else {
-                            tmp2 = neon_load_reg(rm, pass);
-                        }
+                        tmp2 = neon_load_reg(rm, pass);
                     }
                     switch (op) {
-                    case 0: case 1: case 4: /* VADDL, VADDW, VADDHN, VRADDHN */
-                        gen_neon_addl(size);
-                        break;
-                    case 2: case 3: case 6: /* VSUBL, VSUBW, VSUBHN, VRSUBHN */
-                        gen_neon_subl(size);
-                        break;
                     case 5: case 7: /* VABAL, VABDL */
                         switch ((size << 1) | u) {
                         case 0:
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                             abort();
                         }
                         neon_store_reg64(cpu_V0, rd + pass);
-                    } else if (op == 4 || op == 6) {
-                        /* Narrowing operation.  */
-                        tmp = tcg_temp_new_i32();
-                        if (!u) {
-                            switch (size) {
-                            case 0:
-                                gen_helper_neon_narrow_high_u8(tmp, cpu_V0);
-                                break;
-                            case 1:
-                                gen_helper_neon_narrow_high_u16(tmp, cpu_V0);
-                                break;
-                            case 2:
-                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                                break;
-                            default: abort();
-                            }
-                        } else {
-                            switch (size) {
-                            case 0:
-                                gen_helper_neon_narrow_round_high_u8(tmp, cpu_V0);
-                                break;
-                            case 1:
-                                gen_helper_neon_narrow_round_high_u16(tmp, cpu_V0);
-                                break;
-                            case 2:
-                                tcg_gen_addi_i64(cpu_V0, cpu_V0, 1u << 31);
-                                tcg_gen_extrh_i64_i32(tmp, cpu_V0);
-                                break;
-                            default: abort();
-                            }
-                        }
-                        if (pass == 0) {
-                            tmp3 = tmp;
-                        } else {
-                            neon_store_reg(rd, 0, tmp3);
-                            neon_store_reg(rd, 1, tmp);
-                        }
                     } else {
                         /* Write back the result.  */
                         neon_store_reg64(cpu_V0, rd + pass);
-- 
2.20.1

Convert the Neon 3-reg-diff insns VABAL and VABDL to decodetree.
Like almost all the remaining insns in this group, these are
a combination of a two-input operation which returns a double width
result and then a possible accumulation of that double width
result into the destination.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
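Reviewer note, not part of the patch: the "two-input operation with a
double-width result, then an optional accumulate" shape can be sketched
in plain C as follows. LongOpFn, abdl_u64() and long_3d() are
hypothetical names for illustration; only the 32-bit element size and
the unsigned absolute-difference operation are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A two-input operation producing a double-width result. */
typedef uint64_t LongOpFn(uint32_t, uint32_t);

/* Unsigned absolute difference, widened to 64 bits. */
static uint64_t abdl_u64(uint32_t n, uint32_t m)
{
    return n > m ? (uint64_t)n - m : (uint64_t)m - n;
}

/*
 * Model of a long operation on one Q destination (two 64-bit lanes)
 * and two D sources (two 32-bit lanes each). If 'accumulate' is
 * nonzero the results are added into the destination (VABAL style),
 * otherwise they replace it (VABDL style).
 */
static void long_3d(uint64_t qd[2], const uint32_t dn[2],
                    const uint32_t dm[2], LongOpFn *opfn, int accumulate)
{
    assert(opfn != NULL);

    /* Compute both results before writing anything back. */
    uint64_t r0 = opfn(dn[0], dm[0]);
    uint64_t r1 = opfn(dn[1], dm[1]);

    if (accumulate) {
        qd[0] += r0;
        qd[1] += r1;
    } else {
        qd[0] = r0;
        qd[1] = r1;
    }
}
```

Passing accumulate as a flag here plays the role that a NULL or
non-NULL accfn plays in do_long_3d().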
 target/arm/translate.h          |   1 +
 target/arm/neon-dp.decode       |   6 ++
 target/arm/translate-neon.inc.c | 132 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  31 +-------
 4 files changed, 142 insertions(+), 28 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
 typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
 typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
 typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
+typedef void NeonGenTwoOpWidenFn(TCGv_i64, TCGv_i32, TCGv_i32);
 typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
 typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
 typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     VADDHN_3d    1111 001 0 1 . .. .... .... 0100 . 0 . 0 .... @3diff
     VRADDHN_3d   1111 001 1 1 . .. .... .... 0100 . 0 . 0 .... @3diff
 
+    VABAL_S_3d   1111 001 0 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+    VABAL_U_3d   1111 001 1 1 . .. .... .... 0101 . 0 . 0 .... @3diff
+
     VSUBHN_3d    1111 001 0 1 . .. .... .... 0110 . 0 . 0 .... @3diff
     VRSUBHN_3d   1111 001 1 1 . .. .... .... 0110 . 0 . 0 .... @3diff
+
+    VABDL_S_3d   1111 001 0 1 . .. .... .... 0111 . 0 . 0 .... @3diff
+    VABDL_U_3d   1111 001 1 1 . .. .... .... 0111 . 0 . 0 .... @3diff
   ]
 }
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ DO_NARROW_3D(VADDHN, add, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VSUBHN, sub, narrow, tcg_gen_extrh_i64_i32)
 DO_NARROW_3D(VRADDHN, add, narrow_round, gen_narrow_round_high_u32)
 DO_NARROW_3D(VRSUBHN, sub, narrow_round, gen_narrow_round_high_u32)
+
+static bool do_long_3d(DisasContext *s, arg_3diff *a,
+                       NeonGenTwoOpWidenFn *opfn,
+                       NeonGenTwo64OpFn *accfn)
+{
+    /*
+     * 3-regs different lengths, long operations.
+     * These perform an operation on two inputs that returns a double-width
+     * result, and then possibly perform an accumulation operation of
+     * that result into the double-width destination.
+     */
+    TCGv_i64 rd0, rd1, tmp;
+    TCGv_i32 rn, rm;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* size == 3 case, which is an entirely different insn group */
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    rd0 = tcg_temp_new_i64();
+    rd1 = tcg_temp_new_i64();
+
+    rn = neon_load_reg(a->vn, 0);
+    rm = neon_load_reg(a->vm, 0);
+    opfn(rd0, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    rn = neon_load_reg(a->vn, 1);
+    rm = neon_load_reg(a->vm, 1);
+    opfn(rd1, rn, rm);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(rm);
+
+    /* Don't store results until after all loads: they might overlap */
+    if (accfn) {
+        tmp = tcg_temp_new_i64();
+        neon_load_reg64(tmp, a->vd);
+        accfn(tmp, tmp, rd0);
+        neon_store_reg64(tmp, a->vd);
+        neon_load_reg64(tmp, a->vd + 1);
+        accfn(tmp, tmp, rd1);
+        neon_store_reg64(tmp, a->vd + 1);
+        tcg_temp_free_i64(tmp);
+    } else {
+        neon_store_reg64(rd0, a->vd);
+        neon_store_reg64(rd1, a->vd + 1);
+    }
+
+    tcg_temp_free_i64(rd0);
+    tcg_temp_free_i64(rd1);
+
+    return true;
+}
+
+static bool trans_VABDL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABDL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VABAL_S_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_s16,
+        gen_helper_neon_abdl_s32,
+        gen_helper_neon_abdl_s64,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
+
+static bool trans_VABAL_U_3d(DisasContext *s, arg_3diff *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        gen_helper_neon_abdl_u16,
+        gen_helper_neon_abdl_u32,
+        gen_helper_neon_abdl_u64,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const addfn[] = {
+        gen_helper_neon_addl_u16,
+        gen_helper_neon_addl_u32,
+        tcg_gen_add_i64,
+        NULL,
+    };
+
+    return do_long_3d(s, a, opfn[a->size], addfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     {0, 0, 0, 7}, /* VSUBL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBW: handled by decodetree */
                     {0, 0, 0, 7}, /* VADDHN: handled by decodetree */
-                    {0, 0, 0, 0}, /* VABAL */
+                    {0, 0, 0, 7}, /* VABAL: handled by decodetree */
                     {0, 0, 0, 7}, /* VSUBHN: handled by decodetree */
-                    {0, 0, 0, 0}, /* VABDL */
+                    {0, 0, 0, 7}, /* VABDL: handled by decodetree */
                     {0, 0, 0, 0}, /* VMLAL */
                     {0, 0, 0, 9}, /* VQDMLAL */
                     {0, 0, 0, 0}, /* VMLSL */
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         tmp2 = neon_load_reg(rm, pass);
                     }
                     switch (op) {
-                    case 5: case 7: /* VABAL, VABDL */
-                        switch ((size << 1) | u) {
-                        case 0:
-                            gen_helper_neon_abdl_s16(cpu_V0, tmp, tmp2);
-                            break;
-                        case 1:
-                            gen_helper_neon_abdl_u16(cpu_V0, tmp, tmp2);
-                            break;
-                        case 2:
-                            gen_helper_neon_abdl_s32(cpu_V0, tmp, tmp2);
-                            break;
-                        case 3:
-                            gen_helper_neon_abdl_u32(cpu_V0, tmp, tmp2);
-                            break;
-                        case 4:
-                            gen_helper_neon_abdl_s64(cpu_V0, tmp, tmp2);
-                            break;
-                        case 5:
-                            gen_helper_neon_abdl_u64(cpu_V0, tmp, tmp2);
-                            break;
-                        default: abort();
-                        }
-                        tcg_temp_free_i32(tmp2);
-                        tcg_temp_free_i32(tmp);
-                        break;
                     case 8: case 9: case 10: case 11: case 12: case 13:
                         /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
                         gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                         case 10: /* VMLSL */
                             gen_neon_negl(cpu_V0, size);
                             /* Fall through */
-                        case 5: case 8: /* VABAL, VMLAL */
+                        case 8: /* VMLAL */
                             gen_neon_addl(size);
                             break;
                         case 9: case 11: /* VQDMLAL, VQDMLSL */
-- 
2.20.1

Convert the Neon 3-reg-diff insns VMULL, VMLAL and VMLSL; these perform
a 32x32->64 multiply with possible accumulate.

Note that for VMLSL we do the accumulate directly with a subtraction
rather than doing a negate-then-add as the old code did.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
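Reviewer note, not part of the patch: the claim that a direct
subtraction is equivalent to the old negate-then-add can be checked
with a small model of the 32x32->64 case. The function names are
hypothetical, and the smaller element sizes (which use per-lane
helpers) are not modelled.

```c
#include <stdint.h>

/* Unsigned 32x32->64 multiply, as in VMULL.U32 on one lane. */
static uint64_t mull_u32(uint32_t n, uint32_t m)
{
    return (uint64_t)n * m;
}

/* VMLSL as a direct subtraction from the accumulator. */
static uint64_t mlsl_sub(uint64_t acc, uint32_t n, uint32_t m)
{
    return acc - mull_u32(n, m);
}

/* VMLSL as negate-then-add, the shape of the old code. */
static uint64_t mlsl_negate_add(uint64_t acc, uint32_t n, uint32_t m)
{
    uint64_t neg = 0 - mull_u32(n, m);   /* two's complement negate */

    return acc + neg;
}
```

Both forms agree because unsigned 64-bit arithmetic wraps modulo 2^64.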
 target/arm/neon-dp.decode       |  9 +++++
 target/arm/translate-neon.inc.c | 71 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 21 +++-------
 3 files changed, 86 insertions(+), 15 deletions(-)

Convert the Neon 3-reg-diff insns VQDMULL, VQDMLAL and VQDMLSL:
these are all saturating doubling long multiplies with a possible
accumulate step.

These are the last insns in the group which use the pass-over-each-element
loop, so we can delete that code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
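Reviewer note, not part of the patch: a sketch of what "saturating
doubling long multiply with a possible accumulate" means for signed
16-bit inputs. qdmull_s16() and qadd_s32() are hypothetical names; the
'qc' out-parameter stands in for the cumulative saturation flag and is
an assumption of this model, not how the helpers in the patch report
saturation.

```c
#include <stdint.h>

/* Saturating doubling multiply long: 16x16 -> 32, result = sat(2*n*m). */
static int32_t qdmull_s16(int16_t n, int16_t m, int *qc)
{
    int32_t p = (int32_t)n * m;   /* always fits: |p| <= 2^30 */

    if (p == (INT32_C(1) << 30)) {
        /* Only -32768 * -32768 overflows when doubled. */
        *qc = 1;
        return INT32_MAX;
    }
    return p * 2;
}

/* Saturating 32-bit add, as used by the accumulate step. */
static int32_t qadd_s32(int32_t a, int32_t b, int *qc)
{
    int64_t s = (int64_t)a + b;

    if (s > INT32_MAX) {
        *qc = 1;
        return INT32_MAX;
    }
    if (s < INT32_MIN) {
        *qc = 1;
        return INT32_MIN;
    }
    return (int32_t)s;
}
```

In this model VQDMLAL on one lane would be
qadd_s32(acc, qdmull_s16(n, m, &qc), &qc), and VQDMLSL would negate the
product (with saturation) before the add.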
 target/arm/neon-dp.decode       |  6 +++
 target/arm/translate-neon.inc.c | 82 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 59 ++----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)

Convert the Neon 3-reg-diff insn polynomial VMULL. This is the last
insn in this group to be converted.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
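Reviewer note, not part of the patch: for readers unfamiliar with the
polynomial variant, this is a minimal model of an 8x8->16 carry-less
(GF(2) polynomial) multiply on one lane. pmull_8() is a hypothetical
name and this bit-by-bit loop is only an illustration, not the helper
the patch calls.

```c
#include <stdint.h>

/* Polynomial multiply over GF(2): XOR replaces addition, so no carries. */
static uint16_t pmull_8(uint8_t n, uint8_t m)
{
    uint16_t result = 0;

    for (int i = 0; i < 8; i++) {
        if (m & (1u << i)) {
            /* Add (XOR) n shifted by the position of this set bit. */
            result ^= (uint16_t)((uint16_t)n << i);
        }
    }
    return result;
}
```

For example, (x + 1) * (x + 1) = x^2 + 1 in GF(2)[x], so multiplying
0x03 by 0x03 gives 0x05 rather than 9.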
 target/arm/neon-dp.decode       |  2 ++
 target/arm/translate-neon.inc.c | 43 +++++++++++++++++++++++
 target/arm/translate.c          | 60 ++-------------------------------
 3 files changed, 48 insertions(+), 57 deletions(-)

Mark the arrays of function pointers in trans_VSHLL_S_2sh() and
trans_VSHLL_U_2sh() as both 'static' and 'const'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
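Reviewer note, not part of the patch: a standalone sketch of the table
pattern this series uses, a 'static ... * const' array of function
pointers indexed by the size field with a NULL entry for size == 3.
All names here (WidenFn, widen_u8(), widen_u16(), widen_u32(), widen())
are hypothetical models, not the QEMU helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t WidenFn(uint32_t);

/* Zero-extend four 8-bit lanes into four 16-bit lanes. */
static uint64_t widen_u8(uint32_t x)
{
    uint64_t r = 0;

    for (int i = 0; i < 4; i++) {
        r |= (uint64_t)((x >> (8 * i)) & 0xff) << (16 * i);
    }
    return r;
}

/* Zero-extend two 16-bit lanes into two 32-bit lanes. */
static uint64_t widen_u16(uint32_t x)
{
    return (uint64_t)(x & 0xffff) | ((uint64_t)(x >> 16) << 32);
}

/* Zero-extend one 32-bit lane into one 64-bit lane. */
static uint64_t widen_u32(uint32_t x)
{
    return x;
}

/* Returns 0 (meaning "not this insn") for size == 3, else 1. */
static int widen(unsigned size, uint32_t in, uint64_t *out)
{
    /* 'static' and 'const': built once, placed in read-only data. */
    static WidenFn * const table[] = {
        widen_u8, widen_u16, widen_u32, NULL,
    };

    assert(size < 4);
    if (!table[size]) {
        return 0;
    }
    *out = table[size](in);
    return 1;
}
```

Without 'static', a local array like this is re-initialized on the
stack on every call; with 'static const' it is a single read-only
object.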
 target/arm/translate-neon.inc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool do_vshll_2sh(DisasContext *s, arg_2reg_shift *a,
 
 static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_s8,
         gen_helper_neon_widen_s16,
         tcg_gen_ext_i32_i64,
@@ -XXX,XX +XXX,XX @@ static bool trans_VSHLL_S_2sh(DisasContext *s, arg_2reg_shift *a)
 
 static bool trans_VSHLL_U_2sh(DisasContext *s, arg_2reg_shift *a)
 {
-    NeonGenWidenFn *widenfn[] = {
+    static NeonGenWidenFn * const widenfn[] = {
         gen_helper_neon_widen_u8,
         gen_helper_neon_widen_u16,
         tcg_gen_extu_i32_i64,
-- 
2.20.1

Convert the VMLA, VMLS and VMUL insns in the Neon "2 registers and a
scalar" group to decodetree.  These are 32x32->32 operations where
one of the inputs is the scalar, followed by a possible accumulate
operation of the 32-bit result.

The refactoring removes some of the oddities of the old decoder:
 * operands to the operation and accumulation were often
   reversed (taking advantage of the fact that most of these ops
   are commutative); the new code follows the pseudocode order
 * the Q bit in the insn was in a local variable 'u'; in the
   new code it is decoded into a->q

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
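Reviewer note, not part of the patch: a minimal model of the
"32x32->32 operation with the scalar, followed by an accumulate" shape
for 32-bit elements, written in pseudocode operand order (destination
first, then the product). vmla_scalar_u32() and vmls_scalar_u32() are
hypothetical names, and the arrays stand in for the D or Q register
contents.

```c
#include <stddef.h>
#include <stdint.h>

/* d[i] = d[i] + n[i] * scalar; q selects 4 elements (Q reg) or 2 (D reg). */
static void vmla_scalar_u32(uint32_t *d, const uint32_t *n,
                            uint32_t scalar, int q)
{
    size_t count = q ? 4 : 2;

    for (size_t i = 0; i < count; i++) {
        uint32_t prod = n[i] * scalar;   /* truncated to 32 bits */

        d[i] = d[i] + prod;
    }
}

/* d[i] = d[i] - n[i] * scalar: the subtraction is not commutative. */
static void vmls_scalar_u32(uint32_t *d, const uint32_t *n,
                            uint32_t scalar, int q)
{
    size_t count = q ? 4 : 2;

    for (size_t i = 0; i < count; i++) {
        uint32_t prod = n[i] * scalar;

        d[i] = d[i] - prod;
    }
}
```

VMLS is where the operand order matters, since d - prod and prod - d
differ.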
 target/arm/neon-dp.decode       |  15 ++++
 target/arm/translate-neon.inc.c | 133 ++++++++++++++++++++++++++++++++
 target/arm/translate.c          |  77 ++----------------
 3 files changed, 154 insertions(+), 71 deletions(-)

Convert the float versions of VMLA, VMLS and VMUL in the Neon
2-reg-scalar group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
As noted in the comment on the WRAP_FP_FN macro, we could have
had a do_2scalar_fp() function, but for 3 insns it seemed
simpler to just do the wrapping to get hold of the fpstatus ptr.
(These are the only fp insns in the group.)
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 65 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 37 ++-----------------
 3 files changed, 71 insertions(+), 34 deletions(-)

Convert the VQDMULH and VQRDMULH insns in the 2-reg-scalar group
to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 +++
 target/arm/translate-neon.inc.c | 29 +++++++++++++++++++++++
 target/arm/translate.c          | 42 ++-------------------------------
 3 files changed, 34 insertions(+), 40 deletions(-)

Convert the VQRDMLAH and VQRDMLSH insns in the 2-reg-scalar
group to decodetree.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 74 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 38 +----------------
 3 files changed, 79 insertions(+), 36 deletions(-)

Convert the Neon 2-reg-scalar long multiplies to decodetree.
These are the last instructions in the group.
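
As a rough illustration of what the long ops compute (this is a
standalone C model, not the TCG code in this patch): for the
size == 2 case each 32-bit element of Vn is multiplied by the scalar
to give a 64-bit product, which is then optionally accumulated into
the destination. The names vmull_s32_model and vmlal_s32_model are
hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a signed 32x32->64 "multiply by scalar" long op. */
static void vmull_s32_model(uint64_t dst[2], const int32_t vn[2],
                            int32_t scalar)
{
    assert(dst != NULL && vn != NULL);

    /* Compute both products before storing, in case dst overlaps vn. */
    uint64_t p0 = (uint64_t)((int64_t)vn[0] * scalar);
    uint64_t p1 = (uint64_t)((int64_t)vn[1] * scalar);

    dst[0] = p0;
    dst[1] = p1;
}

/* Hypothetical model of the accumulating form: acc += widen(vn * scalar). */
static void vmlal_s32_model(uint64_t acc[2], const int32_t vn[2],
                            int32_t scalar)
{
    uint64_t prod[2];

    vmull_s32_model(prod, vn, scalar);
    /* 64-bit accumulate with wraparound, like a plain 64-bit add. */
    acc[0] += prod[0];
    acc[1] += prod[1];
}
```

The model mirrors the split in do_2scalar_long() between a widening
op and an optional accumulate op; the saturating variants would need
different functions in both positions.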

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  18 ++++
 target/arm/translate-neon.inc.c | 163 ++++++++++++++++++++++++++++
 target/arm/translate.c          | 182 ++------------------------------
 3 files changed, 187 insertions(+), 176 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 
     @2scalar     .... ... q:1 . . size:2 .... .... .... . . . . .... \
                  &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp
+    # For the 'long' ops the Q bit is part of insn decode
+    @2scalar_q0  .... ... . . . size:2 .... .... .... . . . . .... \
+                 &2scalar vm=%vm_dp vn=%vn_dp vd=%vd_dp q=0
 
     VMLA_2sc     1111 001 . 1 . .. .... .... 0000 . 1 . 0 .... @2scalar
     VMLA_F_2sc   1111 001 . 1 . .. .... .... 0001 . 1 . 0 .... @2scalar
 
+    VMLAL_S_2sc  1111 001 0 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+    VMLAL_U_2sc  1111 001 1 1 . .. .... .... 0010 . 1 . 0 .... @2scalar_q0
+
+    VQDMLAL_2sc  1111 001 0 1 . .. .... .... 0011 . 1 . 0 .... @2scalar_q0
+
     VMLS_2sc     1111 001 . 1 . .. .... .... 0100 . 1 . 0 .... @2scalar
     VMLS_F_2sc   1111 001 . 1 . .. .... .... 0101 . 1 . 0 .... @2scalar
 
+    VMLSL_S_2sc  1111 001 0 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
+    VMLSL_U_2sc  1111 001 1 1 . .. .... .... 0110 . 1 . 0 .... @2scalar_q0
+
+    VQDMLSL_2sc  1111 001 0 1 . .. .... .... 0111 . 1 . 0 .... @2scalar_q0
+
     VMUL_2sc     1111 001 . 1 . .. .... .... 1000 . 1 . 0 .... @2scalar
     VMUL_F_2sc   1111 001 . 1 . .. .... .... 1001 . 1 . 0 .... @2scalar
 
+    VMULL_S_2sc  1111 001 0 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
+    VMULL_U_2sc  1111 001 1 1 . .. .... .... 1010 . 1 . 0 .... @2scalar_q0
+
+    VQDMULL_2sc  1111 001 0 1 . .. .... .... 1011 . 1 . 0 .... @2scalar_q0
+
     VQDMULH_2sc  1111 001 . 1 . .. .... .... 1100 . 1 . 0 .... @2scalar
     VQRDMULH_2sc 1111 001 . 1 . .. .... .... 1101 . 1 . 0 .... @2scalar
 
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQRDMLSH_2sc(DisasContext *s, arg_2scalar *a)
     };
     return do_vqrdmlah_2sc(s, a, opfn[a->size]);
 }
+
+static bool do_2scalar_long(DisasContext *s, arg_2scalar *a,
+                            NeonGenTwoOpWidenFn *opfn,
+                            NeonGenTwo64OpFn *accfn)
+{
+    /*
+     * Two registers and a scalar, long operations: perform an
+     * operation on the input elements and the scalar which produces
+     * a double-width result, and then possibly perform an accumulation
+     * operation of that result into the destination.
+     */
+    TCGv_i32 scalar, rn;
+    TCGv_i64 rn0_64, rn1_64;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!opfn) {
+        /* Bad size (including size == 3, which is a different insn group) */
+        return false;
+    }
+
+    if (a->vd & 1) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    scalar = neon_get_scalar(a->size, a->vm);
+
+    /* Load all inputs before writing any outputs, in case of overlap */
+    rn = neon_load_reg(a->vn, 0);
+    rn0_64 = tcg_temp_new_i64();
+    opfn(rn0_64, rn, scalar);
+    tcg_temp_free_i32(rn);
+
+    rn = neon_load_reg(a->vn, 1);
+    rn1_64 = tcg_temp_new_i64();
+    opfn(rn1_64, rn, scalar);
+    tcg_temp_free_i32(rn);
+    tcg_temp_free_i32(scalar);
+
+    if (accfn) {
+        TCGv_i64 t64 = tcg_temp_new_i64();
+        neon_load_reg64(t64, a->vd);
+        accfn(t64, t64, rn0_64);
+        neon_store_reg64(t64, a->vd);
+        neon_load_reg64(t64, a->vd + 1);
+        accfn(t64, t64, rn1_64);
+        neon_store_reg64(t64, a->vd + 1);
+        tcg_temp_free_i64(t64);
+    } else {
+        neon_store_reg64(rn0_64, a->vd);
+        neon_store_reg64(rn1_64, a->vd + 1);
+    }
+    tcg_temp_free_i64(rn0_64);
+    tcg_temp_free_i64(rn1_64);
+    return true;
+}
+
+static bool trans_VMULL_S_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mull_s16,
+        gen_mull_s32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VMULL_U_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_helper_neon_mull_u16,
+        gen_mull_u32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+#define DO_VMLAL_2SC(INSN, MULL, ACC)                                   \
+    static bool trans_##INSN##_2sc(DisasContext *s, arg_2scalar *a)     \
+    {                                                                   \
+        static NeonGenTwoOpWidenFn * const opfn[] = {                   \
+            NULL,                                                       \
+            gen_helper_neon_##MULL##16,                                 \
+            gen_##MULL##32,                                             \
+            NULL,                                                       \
+        };                                                              \
+        static NeonGenTwo64OpFn * const accfn[] = {                     \
+            NULL,                                                       \
+            gen_helper_neon_##ACC##l_u32,                               \
+            tcg_gen_##ACC##_i64,                                        \
+            NULL,                                                       \
+        };                                                              \
+        return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);    \
+    }
+
+DO_VMLAL_2SC(VMLAL_S, mull_s, add)
+DO_VMLAL_2SC(VMLAL_U, mull_u, add)
+DO_VMLAL_2SC(VMLSL_S, mull_s, sub)
+DO_VMLAL_2SC(VMLSL_U, mull_u, sub)
+
+static bool trans_VQDMULL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], NULL);
+}
+
+static bool trans_VQDMLAL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLAL_acc_16,
+        gen_VQDMLAL_acc_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
+}
+
+static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
+{
+    static NeonGenTwoOpWidenFn * const opfn[] = {
+        NULL,
+        gen_VQDMULL_16,
+        gen_VQDMULL_32,
+        NULL,
+    };
+    static NeonGenTwo64OpFn * const accfn[] = {
+        NULL,
+        gen_VQDMLSL_acc_16,
+        gen_VQDMLSL_acc_32,
+        NULL,
+    };
+
+    return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
     tcg_gen_ext16s_i32(dest, var);
 }
 
-/* 32x32->64 multiply.  Marks inputs as dead.  */
-static TCGv_i64 gen_mulu_i64_i32(TCGv_i32 a, TCGv_i32 b)
-{
-    TCGv_i32 lo = tcg_temp_new_i32();
-    TCGv_i32 hi = tcg_temp_new_i32();
-    TCGv_i64 ret;
-
-    tcg_gen_mulu2_i32(lo, hi, a, b);
-    tcg_temp_free_i32(a);
-    tcg_temp_free_i32(b);
-
-    ret = tcg_temp_new_i64();
-    tcg_gen_concat_i32_i64(ret, lo, hi);
-    tcg_temp_free_i32(lo);
-    tcg_temp_free_i32(hi);
-
-    return ret;
-}
-
-static TCGv_i64 gen_muls_i64_i32(TCGv_i32 a, TCGv_i32 b)
-{
-    TCGv_i32 lo = tcg_temp_new_i32();
-    TCGv_i32 hi = tcg_temp_new_i32();
-    TCGv_i64 ret;
-
-    tcg_gen_muls2_i32(lo, hi, a, b);
-    tcg_temp_free_i32(a);
-    tcg_temp_free_i32(b);
-
-    ret = tcg_temp_new_i64();
-    tcg_gen_concat_i32_i64(ret, lo, hi);
-    tcg_temp_free_i32(lo);
-    tcg_temp_free_i32(hi);
-
-    return ret;
-}
-
 /* Swap low and high halfwords.  */
 static void gen_swap_half(TCGv_i32 var)
 {
@@ -XXX,XX +XXX,XX @@ static inline void gen_neon_addl(int size)
     }
 }
 
-static inline void gen_neon_negl(TCGv_i64 var, int size)
-{
-    switch (size) {
-    case 0: gen_helper_neon_negl_u16(var, var); break;
-    case 1: gen_helper_neon_negl_u32(var, var); break;
-    case 2:
-        tcg_gen_neg_i64(var, var);
-        break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_addl_saturate(TCGv_i64 op0, TCGv_i64 op1, int size)
-{
-    switch (size) {
-    case 1: gen_helper_neon_addl_saturate_s32(op0, cpu_env, op0, op1); break;
-    case 2: gen_helper_neon_addl_saturate_s64(op0, cpu_env, op0, op1); break;
-    default: abort();
-    }
-}
-
-static inline void gen_neon_mull(TCGv_i64 dest, TCGv_i32 a, TCGv_i32 b,
-                                 int size, int u)
-{
-    TCGv_i64 tmp;
-
-    switch ((size << 1) | u) {
-    case 0: gen_helper_neon_mull_s8(dest, a, b); break;
-    case 1: gen_helper_neon_mull_u8(dest, a, b); break;
-    case 2: gen_helper_neon_mull_s16(dest, a, b); break;
-    case 3: gen_helper_neon_mull_u16(dest, a, b); break;
-    case 4:
-        tmp = gen_muls_i64_i32(a, b);
-        tcg_gen_mov_i64(dest, tmp);
-        tcg_temp_free_i64(tmp);
-        break;
-    case 5:
-        tmp = gen_mulu_i64_i32(a, b);
-        tcg_gen_mov_i64(dest, tmp);
-        tcg_temp_free_i64(tmp);
-        break;
-    default: abort();
-    }
-
-    /* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
-       Don't forget to clean them now.  */
-    if (size < 2) {
-        tcg_temp_free_i32(a);
-        tcg_temp_free_i32(b);
-    }
-}
-
 static void gen_neon_narrow_op(int op, int u, int size,
                                TCGv_i32 dest, TCGv_i64 src)
 {
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int u;
     int vec_size;
     uint32_t imm;
-    TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
+    TCGv_i32 tmp, tmp2, tmp3, tmp5;
     TCGv_ptr ptr1;
     TCGv_i64 tmp64;
 
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
         return 1;
     } else { /* (insn & 0x00800010 == 0x00800000) */
         if (size != 3) {
-            op = (insn >> 8) & 0xf;
-            if ((insn & (1 << 6)) == 0) {
-                /* Three registers of different lengths: handled by decodetree */
-                return 1;
-            } else {
-                /* Two registers and a scalar. NB that for ops of this form
-                 * the ARM ARM labels bit 24 as Q, but it is in our variable
-                 * 'u', not 'q'.
-                 */
-                if (size == 0) {
-                    return 1;
-                }
-                switch (op) {
-                case 0: /* Integer VMLA scalar */
-                case 4: /* Integer VMLS scalar */
-                case 8: /* Integer VMUL scalar */
-                case 1: /* Float VMLA scalar */
-                case 5: /* Floating point VMLS scalar */
-                case 9: /* Floating point VMUL scalar */
-                case 12: /* VQDMULH scalar */
-                case 13: /* VQRDMULH scalar */
-                case 14: /* VQRDMLAH scalar */
-                case 15: /* VQRDMLSH scalar */
-                    return 1; /* handled by decodetree */
-
-                case 3: /* VQDMLAL scalar */
-                case 7: /* VQDMLSL scalar */
-                case 11: /* VQDMULL scalar */
-                    if (u == 1) {
-                        return 1;
-                    }
-                    /* fall through */
-                case 2: /* VMLAL sclar */
-                case 6: /* VMLSL scalar */
-                case 10: /* VMULL scalar */
-                    if (rd & 1) {
-                        return 1;
-                    }
-                    tmp2 = neon_get_scalar(size, rm);
-                    /* We need a copy of tmp2 because gen_neon_mull
-                     * deletes it during pass 0.  */
-                    tmp4 = tcg_temp_new_i32();
-                    tcg_gen_mov_i32(tmp4, tmp2);
-                    tmp3 = neon_load_reg(rn, 1);
-
-                    for (pass = 0; pass < 2; pass++) {
-                        if (pass == 0) {
-                            tmp = neon_load_reg(rn, 0);
-                        } else {
-                            tmp = tmp3;
-                            tmp2 = tmp4;
-                        }
-                        gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-                        if (op != 11) {
-                            neon_load_reg64(cpu_V1, rd + pass);
-                        }
-                        switch (op) {
-                        case 6:
-                            gen_neon_negl(cpu_V0, size);
-                            /* Fall through */
-                        case 2:
-                            gen_neon_addl(size);
-                            break;
-                        case 3: case 7:
-                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                            if (op == 7) {
-                                gen_neon_negl(cpu_V0, size);
-                            }
-                            gen_neon_addl_saturate(cpu_V0, cpu_V1, size);
-                            break;
-                        case 10:
-                            /* no-op */
-                            break;
-                        case 11:
-                            gen_neon_addl_saturate(cpu_V0, cpu_V0, size);
-                            break;
-                        default:
-                            abort();
-                        }
-                        neon_store_reg64(cpu_V0, rd + pass);
-                    }
-                    break;
-                default:
-                    g_assert_not_reached();
-                }
-            }
+            /*
+             * Three registers of different lengths, or two registers and
+             * a scalar: handled by decodetree
+             */
+            return 1;
         } else { /* size == 3 */
             if (!u) {
                 /* Extract.  */
-- 
2.20.1

Convert the Neon VEXT insn to decodetree. Rather than keeping the
old implementation which used fixed temporaries cpu_V0 and cpu_V1
and did the extraction with by-hand shift and logic ops, we use
the TCG extract2 insn.

We no longer need to special-case immediates of 0 or 8, as the
optimizer is smart enough to throw away the dead code.
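
For readers unfamiliar with extract2, here is a host-side C model of
the idea (a sketch, not the code in this patch): extract2 takes 64
bits out of the 128-bit concatenation of two inputs starting at a
given bit offset, and VEXT is that operation with a byte-granular
offset. The function names extract2_u64, vext_d and vext_q are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of extract2: 64 bits of the pair hi:lo, starting at bit ofs. */
static uint64_t extract2_u64(uint64_t lo, uint64_t hi, unsigned ofs)
{
    assert(ofs <= 64);
    if (ofs == 0) {
        return lo;
    }
    if (ofs == 64) {
        return hi;
    }
    /* Shifts by 1..63 only, so neither shift is undefined in C. */
    return (lo >> ofs) | (hi << (64 - ofs));
}

/* D-register form: 8 bytes of <Vm:Vn> starting at byte imm (0..7). */
static uint64_t vext_d(uint64_t vn, uint64_t vm, unsigned imm)
{
    assert(imm < 8);
    return extract2_u64(vn, vm, imm * 8);
}

/* Q-register form: 16 bytes of <Vm+1:Vm:Vn+1:Vn> from byte imm (0..15). */
static void vext_q(uint64_t dst[2], const uint64_t vn[2],
                   const uint64_t vm[2], unsigned imm)
{
    uint64_t lo, hi;

    assert(imm < 16);
    if (imm < 8) {
        lo = extract2_u64(vn[0], vn[1], imm * 8);
        hi = extract2_u64(vn[1], vm[0], imm * 8);
    } else {
        lo = extract2_u64(vn[1], vm[0], (imm - 8) * 8);
        hi = extract2_u64(vm[0], vm[1], (imm - 8) * 8);
    }
    /* Store only after both halves are computed, in case dst aliases. */
    dst[0] = lo;
    dst[1] = hi;
}
```

With an offset of 0 the model simply returns the low input, which is
why no separate code path is needed for those immediates.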

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  8 +++-
 target/arm/translate-neon.inc.c | 76 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 58 +------------------------
 3 files changed, 85 insertions(+), 57 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
 # return false for size==3.
 ######################################################################
 {
-  # 0b11 subgroup will go here
+  [
+    ##################################################################
+    # Miscellaneous size=0b11 insns
+    ##################################################################
+    VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
+  ]
 
   # Subgroup for size != 0b11
   [
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VQDMLSL_2sc(DisasContext *s, arg_2scalar *a)
 
     return do_2scalar_long(s, a, opfn[a->size], accfn[a->size]);
 }
+
+static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
+{
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if ((a->vn | a->vm | a->vd) & a->q) {
+        return false;
+    }
+
+    if (a->imm > 7 && !a->q) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    if (!a->q) {
+        /* Extract 64 bits from <Vm:Vn> */
+        TCGv_i64 left, right, dest;
+
+        left = tcg_temp_new_i64();
+        right = tcg_temp_new_i64();
+        dest = tcg_temp_new_i64();
+
+        neon_load_reg64(right, a->vn);
+        neon_load_reg64(left, a->vm);
+        tcg_gen_extract2_i64(dest, right, left, a->imm * 8);
+        neon_store_reg64(dest, a->vd);
+
+        tcg_temp_free_i64(left);
+        tcg_temp_free_i64(right);
+        tcg_temp_free_i64(dest);
+    } else {
+        /* Extract 128 bits from <Vm+1:Vm:Vn+1:Vn> */
+        TCGv_i64 left, middle, right, destleft, destright;
+
+        left = tcg_temp_new_i64();
+        middle = tcg_temp_new_i64();
+        right = tcg_temp_new_i64();
+        destleft = tcg_temp_new_i64();
+        destright = tcg_temp_new_i64();
+
+        if (a->imm < 8) {
+            neon_load_reg64(right, a->vn);
+            neon_load_reg64(middle, a->vn + 1);
+            tcg_gen_extract2_i64(destright, right, middle, a->imm * 8);
+            neon_load_reg64(left, a->vm);
+            tcg_gen_extract2_i64(destleft, middle, left, a->imm * 8);
+        } else {
+            neon_load_reg64(right, a->vn + 1);
+            neon_load_reg64(middle, a->vm);
+            tcg_gen_extract2_i64(destright, right, middle, (a->imm - 8) * 8);
+            neon_load_reg64(left, a->vm + 1);
+            tcg_gen_extract2_i64(destleft, middle, left, (a->imm - 8) * 8);
+        }
+
+        neon_store_reg64(destright, a->vd);
+        neon_store_reg64(destleft, a->vd + 1);
+
+        tcg_temp_free_i64(destright);
+        tcg_temp_free_i64(destleft);
+        tcg_temp_free_i64(right);
+        tcg_temp_free_i64(middle);
+        tcg_temp_free_i64(left);
+    }
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     int pass;
     int u;
     int vec_size;
-    uint32_t imm;
     TCGv_i32 tmp, tmp2, tmp3, tmp5;
     TCGv_ptr ptr1;
-    TCGv_i64 tmp64;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
             return 1;
         } else { /* size == 3 */
             if (!u) {
-                /* Extract.  */
-                imm = (insn >> 8) & 0xf;
-
-                if (imm > 7 && !q)
-                    return 1;
-
-                if (q && ((rd | rn | rm) & 1)) {
-                    return 1;
-                }
-
-                if (imm == 0) {
-                    neon_load_reg64(cpu_V0, rn);
-                    if (q) {
-                        neon_load_reg64(cpu_V1, rn + 1);
-                    }
-                } else if (imm == 8) {
-                    neon_load_reg64(cpu_V0, rn + 1);
-                    if (q) {
-                        neon_load_reg64(cpu_V1, rm);
-                    }
-                } else if (q) {
-                    tmp64 = tcg_temp_new_i64();
-                    if (imm < 8) {
-                        neon_load_reg64(cpu_V0, rn);
-                        neon_load_reg64(tmp64, rn + 1);
-                    } else {
-                        neon_load_reg64(cpu_V0, rn + 1);
-                        neon_load_reg64(tmp64, rm);
-                    }
-                    tcg_gen_shri_i64(cpu_V0, cpu_V0, (imm & 7) * 8);
-                    tcg_gen_shli_i64(cpu_V1, tmp64, 64 - ((imm & 7) * 8));
-                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
-                    if (imm < 8) {
-                        neon_load_reg64(cpu_V1, rm);
-                    } else {
-                        neon_load_reg64(cpu_V1, rm + 1);
-                        imm -= 8;
-                    }
-                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
-                    tcg_gen_shri_i64(tmp64, tmp64, imm * 8);
-                    tcg_gen_or_i64(cpu_V1, cpu_V1, tmp64);
-                    tcg_temp_free_i64(tmp64);
-                } else {
-                    /* BUGFIX */
-                    neon_load_reg64(cpu_V0, rn);
-                    tcg_gen_shri_i64(cpu_V0, cpu_V0, imm * 8);
-                    neon_load_reg64(cpu_V1, rm);
-                    tcg_gen_shli_i64(cpu_V1, cpu_V1, 64 - (imm * 8));
-                    tcg_gen_or_i64(cpu_V0, cpu_V0, cpu_V1);
-                }
-                neon_store_reg64(cpu_V0, rd);
-                if (q) {
-                    neon_store_reg64(cpu_V1, rd + 1);
-                }
+                /* Extract: handled by decodetree */
+                return 1;
             } else if ((insn & (1 << 11)) == 0) {
                 /* Two register misc.  */
                 op = ((insn >> 12) & 0x30) | ((insn >> 7) & 0xf);
-- 
2.20.1

Convert the Neon VTBL, VTBX instructions to decodetree.  The actual
implementation of the insn is copied across to the new trans function
unchanged except for renaming 'tmp5' to 'tmp4'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  3 ++
 target/arm/translate-neon.inc.c | 56 +++++++++++++++++++++++++++++++++
 target/arm/translate.c          | 41 +++---------------------
 3 files changed, 63 insertions(+), 37 deletions(-)

diff --git a/target/arm/neon-dp.decode b/target/arm/neon-dp.decode
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/neon-dp.decode
+++ b/target/arm/neon-dp.decode
@@ -XXX,XX +XXX,XX @@ Vimm_1r          1111 001 . 1 . 000 ... .... cmode:4 0 . op:1 1 .... @1reg_imm
     ##################################################################
     VEXT         1111 001 0 1 . 11 .... .... imm:4 . q:1 . 0 .... \
                  vm=%vm_dp vn=%vn_dp vd=%vd_dp
+
+    VTBL         1111 001 1 1 . 11 .... .... 10 len:2 . op:1 . 0 .... \
+                 vm=%vm_dp vn=%vn_dp vd=%vd_dp
   ]
 
   # Subgroup for size != 0b11
diff --git a/target/arm/translate-neon.inc.c b/target/arm/translate-neon.inc.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.inc.c
+++ b/target/arm/translate-neon.inc.c
@@ -XXX,XX +XXX,XX @@ static bool trans_VEXT(DisasContext *s, arg_VEXT *a)
     }
     return true;
 }
+
+static bool trans_VTBL(DisasContext *s, arg_VTBL *a)
+{
+    int n;
+    TCGv_i32 tmp, tmp2, tmp3, tmp4;
+    TCGv_ptr ptr1;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
+        return false;
+    }
+
+    /* UNDEF accesses to D16-D31 if they don't exist. */
+    if (!dc_isar_feature(aa32_simd_r32, s) &&
+        ((a->vd | a->vn | a->vm) & 0x10)) {
+        return false;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    n = a->len + 1;
+    if ((a->vn + n) > 32) {
+        /*
+         * This is UNPREDICTABLE; we choose to UNDEF to avoid the
+         * helper function running off the end of the register file.
+         */
+        return false;
+    }
+    n <<= 3;
+    if (a->op) {
+        tmp = neon_load_reg(a->vd, 0);
+    } else {
+        tmp = tcg_temp_new_i32();
+        tcg_gen_movi_i32(tmp, 0);
+    }
+    tmp2 = neon_load_reg(a->vm, 0);
+    ptr1 = vfp_reg_ptr(true, a->vn);
+    tmp4 = tcg_const_i32(n);
+    gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp);
+    if (a->op) {
+        tmp = neon_load_reg(a->vd, 1);
+    } else {
+        tmp = tcg_temp_new_i32();
+        tcg_gen_movi_i32(tmp, 0);
+    }
+    tmp3 = neon_load_reg(a->vm, 1);
+    gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp4);
+    tcg_temp_free_i32(tmp4);
+    tcg_temp_free_ptr(ptr1);
+    neon_store_reg(a->vd, 0, tmp2);
+    neon_store_reg(a->vd, 1, tmp3);
+    tcg_temp_free_i32(tmp);
+    return true;
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
 {
     int op;
     int q;
-    int rd, rn, rm, rd_ofs, rm_ofs;
+    int rd, rm, rd_ofs, rm_ofs;
     int size;
     int pass;
     int u;
     int vec_size;
-    TCGv_i32 tmp, tmp2, tmp3, tmp5;
-    TCGv_ptr ptr1;
+    TCGv_i32 tmp, tmp2, tmp3;
 
     if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
         return 1;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
     q = (insn & (1 << 6)) != 0;
     u = (insn >> 24) & 1;
     VFP_DREG_D(rd, insn);
-    VFP_DREG_N(rn, insn);
     VFP_DREG_M(rm, insn);
     size = (insn >> 20) & 3;
     vec_size = q ? 16 : 8;
@@ -XXX,XX +XXX,XX @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
                     break;
                 }
             } else if ((insn & (1 << 10)) == 0) {
-                /* VTBL, VTBX.  */
-                int n = ((insn >> 8) & 3) + 1;
-                if ((rn + n) > 32) {
-                    /* This is UNPREDICTABLE; we choose to UNDEF to avoid the
-                     * helper function running off the end of the register file.
-                     */
-                    return 1;
-                }
-                n <<= 3;
-                if (insn & (1 << 6)) {
-                    tmp = neon_load_reg(rd, 0);
-                } else {
-                    tmp = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(tmp, 0);
-                }
-                tmp2 = neon_load_reg(rm, 0);
-                ptr1 = vfp_reg_ptr(true, rn);
-                tmp5 = tcg_const_i32(n);
-                gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
-                tcg_temp_free_i32(tmp);
-                if (insn & (1 << 6)) {
-                    tmp = neon_load_reg(rd, 1);
-                } else {
-                    tmp = tcg_temp_new_i32();
-                    tcg_gen_movi_i32(tmp, 0);
-                }
-                tmp3 = neon_load_reg(rm, 1);
-                gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
-                tcg_temp_free_i32(tmp5);
-                tcg_temp_free_ptr(ptr1);
-                neon_store_reg(rd, 0, tmp2);
-                neon_store_reg(rd, 1, tmp3);
-                tcg_temp_free_i32(tmp);
+                /* VTBL, VTBX: handled by decodetree */
+                return 1;
             } else if ((insn & 0x380) == 0) {
                 /* VDUP */
                 int element;
-- 
2.20.1

Convert the Neon VDUP (scalar) insn to decodetree.  (Note that we
can't call this just "VDUP" as we used that already in vfp.decode for
the "VDUP (general purpose register)" insn.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/neon-dp.decode       |  7 +++++++
 target/arm/translate-neon.inc.c | 26 ++++++++++++++++++++++++++
 target/arm/translate.c          | 25 +------------------------
 3 files changed, 34 insertions(+), 24 deletions(-)

From: Jean-Christophe Dubois <jcd@tribudubois.net>

Some bits of the CCM registers are not writable.

The initial commit left this unimplemented, so all bits of all
registers were writable.

This patch adds the required code to protect the non-writable bits.
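
The masking scheme can be sketched in standalone C as follows. This
is an illustration, not the device code: a set bit in the mask marks
a bit the guest cannot change, matching the way ccm_mask and
analog_mask are used in the diff. The helper names (masked_write,
masked_set, masked_clr, regs_write) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain write: keep the protected bits of old, take the rest from value. */
static uint32_t masked_write(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return (old & ro_mask) | (value & ~ro_mask);
}

/* SET-style alias register: only unprotected bits can be set. */
static uint32_t masked_set(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return old | (value & ~ro_mask);
}

/* CLR-style alias register: only unprotected bits can be cleared. */
static uint32_t masked_clr(uint32_t old, uint32_t value, uint32_t ro_mask)
{
    return old & ~(value & ~ro_mask);
}

/* Hypothetical register-file wrapper around masked_write(). */
static void regs_write(uint32_t *regs, const uint32_t *ro_masks,
                       size_t nregs, size_t index, uint32_t value)
{
    assert(index < nregs);
    regs[index] = masked_write(regs[index], value, ro_masks[index]);
}
```

A mask of 0x00000000 leaves a register fully writable, and a mask of
0xffffffff makes it read-only.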

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Message-id: 20200608133508.550046-1-jcd@tribudubois.net
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/misc/imx6ul_ccm.c | 76 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 13 deletions(-)

diff --git a/hw/misc/imx6ul_ccm.c b/hw/misc/imx6ul_ccm.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/imx6ul_ccm.c
+++ b/hw/misc/imx6ul_ccm.c
@@ -XXX,XX +XXX,XX @@
 
 #include "trace.h"
 
+static const uint32_t ccm_mask[CCM_MAX] = {
+    [CCM_CCR] = 0xf01fef80,
+    [CCM_CCDR] = 0xfffeffff,
+    [CCM_CSR] = 0xffffffff,
+    [CCM_CCSR] = 0xfffffef2,
+    [CCM_CACRR] = 0xfffffff8,
+    [CCM_CBCDR] = 0xc1f8e000,
+    [CCM_CBCMR] = 0xfc03cfff,
+    [CCM_CSCMR1] = 0x80700000,
+    [CCM_CSCMR2] = 0xe01ff003,
+    [CCM_CSCDR1] = 0xfe00c780,
+    [CCM_CS1CDR] = 0xfe00fe00,
+    [CCM_CS2CDR] = 0xf8007000,
+    [CCM_CDCDR] = 0xf00fffff,
+    [CCM_CHSCCDR] = 0xfffc01ff,
+    [CCM_CSCDR2] = 0xfe0001ff,
+    [CCM_CSCDR3] = 0xffffc1ff,
+    [CCM_CDHIPR] = 0xffffffff,
+    [CCM_CTOR] = 0x00000000,
+    [CCM_CLPCR] = 0xf39ff01c,
+    [CCM_CISR] = 0xfb85ffbe,
+    [CCM_CIMR] = 0xfb85ffbf,
+    [CCM_CCOSR] = 0xfe00fe00,
+    [CCM_CGPR] = 0xfffc3fea,
+    [CCM_CCGR0] = 0x00000000,
+    [CCM_CCGR1] = 0x00000000,
+    [CCM_CCGR2] = 0x00000000,
+    [CCM_CCGR3] = 0x00000000,
+    [CCM_CCGR4] = 0x00000000,
+    [CCM_CCGR5] = 0x00000000,
+    [CCM_CCGR6] = 0x00000000,
+    [CCM_CMEOR] = 0xafffff1f,
+};
+
+static const uint32_t analog_mask[CCM_ANALOG_MAX] = {
+    [CCM_ANALOG_PLL_ARM] = 0xfff60f80,
+    [CCM_ANALOG_PLL_USB1] = 0xfffe0fbc,
+    [CCM_ANALOG_PLL_USB2] = 0xfffe0fbc,
+    [CCM_ANALOG_PLL_SYS] = 0xfffa0ffe,
+    [CCM_ANALOG_PLL_SYS_SS] = 0x00000000,
+    [CCM_ANALOG_PLL_SYS_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_SYS_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_AUDIO] = 0xffe20f80,
+    [CCM_ANALOG_PLL_AUDIO_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_AUDIO_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_VIDEO] = 0xffe20f80,
+    [CCM_ANALOG_PLL_VIDEO_NUM] = 0xc0000000,
+    [CCM_ANALOG_PLL_VIDEO_DENOM] = 0xc0000000,
+    [CCM_ANALOG_PLL_ENET] = 0xffc20ff0,
+    [CCM_ANALOG_PFD_480] = 0x40404040,
+    [CCM_ANALOG_PFD_528] = 0x40404040,
+    [PMU_MISC0] = 0x01fe8306,
+    [PMU_MISC1] = 0x07fcede0,
+    [PMU_MISC2] = 0x005f5f5f,
+};
+
 static const char *imx6ul_ccm_reg_name(uint32_t reg)
 {
     static char unknown[20];
@@ -XXX,XX +XXX,XX @@ static void imx6ul_ccm_write(void *opaque, hwaddr offset, uint64_t value,
 
     trace_ccm_write_reg(imx6ul_ccm_reg_name(index), (uint32_t)value);
 
-    /*
-     * We will do a better implementation later. In particular some bits
-     * cannot be written to.
-     */
-    s->ccm[index] = (uint32_t)value;
+    s->ccm[index] = (s->ccm[index] & ccm_mask[index]) |
+                           ((uint32_t)value & ~ccm_mask[index]);
 }
 
 static uint64_t imx6ul_analog_read(void *opaque, hwaddr offset, unsigned size)
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, setting bits passed in the value.
          */
-        s->analog[index - 1] |= value;
+        s->analog[index - 1] |= (value & ~analog_mask[index - 1]);
         break;
     case CCM_ANALOG_PLL_ARM_CLR:
     case CCM_ANALOG_PLL_USB1_CLR:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, unsetting bits passed in the value.
          */
-        s->analog[index - 2] &= ~value;
+        s->analog[index - 2] &= ~(value & ~analog_mask[index - 2]);
         break;
     case CCM_ANALOG_PLL_ARM_TOG:
     case CCM_ANALOG_PLL_USB1_TOG:
@@ -XXX,XX +XXX,XX @@ static void imx6ul_analog_write(void *opaque, hwaddr offset, uint64_t value,
          * the REG_NAME register. So we change the value of the
          * REG_NAME register, toggling bits passed in the value.
          */
-        s->analog[index - 3] ^= value;
+        s->analog[index - 3] ^= (value & ~analog_mask[index - 3]);
         break;
     default:
-        /*
-         * We will do a better implementation later. In particular some bits
-         * cannot be written to.
-         */
-        s->analog[index] = value;
+        s->analog[index] = (s->analog[index] & analog_mask[index]) |
+                           (value & ~analog_mask[index]);
         break;
     }
 }
-- 
2.20.1
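
The read-only-bit handling in the patch above can be illustrated in isolation. The following is a minimal sketch, not the QEMU code: the register names, mask values and `demo_*` helpers are hypothetical and chosen only to show the merge rule. It assumes, as the patch does, that a set bit in the mask marks a bit the guest cannot change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-register device; names and mask values are made up. */
enum { DEMO_REG_CTRL, DEMO_REG_STATUS, DEMO_REG_MAX };

/* A set bit marks a read-only bit, mirroring ccm_mask[] / analog_mask[]. */
static const uint32_t demo_ro_mask[DEMO_REG_MAX] = {
    [DEMO_REG_CTRL]   = 0xffff0000,  /* upper half is read-only */
    [DEMO_REG_STATUS] = 0xffffffff,  /* entirely read-only */
};

static uint32_t demo_regs[DEMO_REG_MAX] = {
    [DEMO_REG_CTRL]   = 0x12340000,
    [DEMO_REG_STATUS] = 0x00000001,
};

/* Plain write: keep the read-only bits, take writable bits from value. */
static void demo_write(unsigned index, uint32_t value)
{
    assert(index < DEMO_REG_MAX);
    demo_regs[index] = (demo_regs[index] & demo_ro_mask[index]) |
                       (value & ~demo_ro_mask[index]);
}

/* SET alias: only writable bits may be set. */
static void demo_set(unsigned index, uint32_t value)
{
    assert(index < DEMO_REG_MAX);
    demo_regs[index] |= value & ~demo_ro_mask[index];
}

/* CLR alias: only writable bits may be cleared. */
static void demo_clear(unsigned index, uint32_t value)
{
    assert(index < DEMO_REG_MAX);
    demo_regs[index] &= ~(value & ~demo_ro_mask[index]);
}

/* TOG alias: only writable bits may be toggled. */
static void demo_toggle(unsigned index, uint32_t value)
{
    assert(index < DEMO_REG_MAX);
    demo_regs[index] ^= value & ~demo_ro_mask[index];
}
```

Filtering the incoming value with the inverted mask before applying the set, clear or toggle operation means all four access paths share one definition of which bits are writable.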

From: Erik Smit <erik.lucas.smit@gmail.com>

The hardware supports configurable descriptor sizes, configured in the DBLAC
register.

Most drivers use the default 4 word descriptor, which is currently hardcoded,
but Aspeed SDK configures 8 words to store extra data.

Signed-off-by: Erik Smit <erik.lucas.smit@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[PMM: removed unnecessary parens]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/net/ftgmac100.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/ftgmac100.c
+++ b/hw/net/ftgmac100.c
@@ -XXX,XX +XXX,XX @@
 #define FTGMAC100_APTC_TXPOLL_CNT(x)        (((x) >> 8) & 0xf)
 #define FTGMAC100_APTC_TXPOLL_TIME_SEL      (1 << 12)
 
+/*
+ * DMA burst length and arbitration control register
+ */
+#define FTGMAC100_DBLAC_RXBURST_SIZE(x)     (((x) >> 8) & 0x3)
+#define FTGMAC100_DBLAC_TXBURST_SIZE(x)     (((x) >> 10) & 0x3)
+#define FTGMAC100_DBLAC_RXDES_SIZE(x)       ((((x) >> 12) & 0xf) * 8)
+#define FTGMAC100_DBLAC_TXDES_SIZE(x)       ((((x) >> 16) & 0xf) * 8)
+#define FTGMAC100_DBLAC_IFG_CNT(x)          (((x) >> 20) & 0x7)
+#define FTGMAC100_DBLAC_IFG_INC             (1 << 23)
+
 /*
  * PHY control register
  */
@@ -XXX,XX +XXX,XX @@ static void ftgmac100_do_tx(FTGMAC100State *s, uint32_t tx_ring,
         if (bd.des0 & s->txdes0_edotr) {
             addr = tx_ring;
         } else {
-            addr += sizeof(FTGMAC100Desc);
+            addr += FTGMAC100_DBLAC_TXDES_SIZE(s->dblac);
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static void ftgmac100_write(void *opaque, hwaddr addr,
         s->phydata = value & 0xffff;
         break;
     case FTGMAC100_DBLAC: /* DMA Burst Length and Arbitration Control */
+        if (FTGMAC100_DBLAC_TXDES_SIZE(value) < sizeof(FTGMAC100Desc)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s: transmit descriptor too small: %d bytes\n",
+                          __func__, (int)FTGMAC100_DBLAC_TXDES_SIZE(value));
+            break;
+        }
+        if (FTGMAC100_DBLAC_RXDES_SIZE(value) < sizeof(FTGMAC100Desc)) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s: receive descriptor too small: %d bytes\n",
+                          __func__, (int)FTGMAC100_DBLAC_RXDES_SIZE(value));
+            break;
+        }
         s->dblac = value;
         break;
     case FTGMAC100_REVR:  /* Feature Register */
@@ -XXX,XX +XXX,XX @@ static ssize_t ftgmac100_receive(NetClientState *nc, const uint8_t *buf,
         if (bd.des0 & s->rxdes0_edorr) {
             addr = s->rx_ring;
         } else {
-            addr += sizeof(FTGMAC100Desc);
+            addr += FTGMAC100_DBLAC_RXDES_SIZE(s->dblac);
         }
     }
     s->rx_descriptor = addr;
-- 
2.20.1
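
How the descriptor stride follows from the DBLAC register can be sketched on its own. This is an illustrative sketch rather than the device model: the `DEMO_*` macros copy the field layout of the `FTGMAC100_DBLAC_*DES_SIZE` macros in the patch (a 4-bit field counted in units of 8 bytes), while the helper names and the 16-byte minimum (four 32-bit words, the default descriptor mentioned in the commit message) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same field layout as the patch: 4-bit size fields in units of 8 bytes. */
#define DEMO_RXDES_SIZE(x)  ((((x) >> 12) & 0xf) * 8)
#define DEMO_TXDES_SIZE(x)  ((((x) >> 16) & 0xf) * 8)

/* Assumed minimum: the default descriptor of four 32-bit words. */
#define DEMO_MIN_DESC_SIZE  16u

/* Check a candidate DBLAC value before it is stored. */
static bool demo_dblac_valid(uint32_t value)
{
    return DEMO_TXDES_SIZE(value) >= DEMO_MIN_DESC_SIZE &&
           DEMO_RXDES_SIZE(value) >= DEMO_MIN_DESC_SIZE;
}

/*
 * Advance to the next transmit descriptor: wrap to the ring base on the
 * end-of-ring flag, otherwise step by the configured descriptor size.
 */
static uint32_t demo_next_tx_desc(uint32_t addr, uint32_t ring_base,
                                  uint32_t dblac, bool end_of_ring)
{
    assert(demo_dblac_valid(dblac));
    return end_of_ring ? ring_base : addr + DEMO_TXDES_SIZE(dblac);
}
```

With both size fields set to 2 the stride is 16 bytes, matching the previously hardcoded `sizeof(FTGMAC100Desc)`; a field value of 4 gives the 8-word (32-byte) descriptors that the commit message attributes to the Aspeed SDK.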

From: fangying <fangying1@huawei.com>

Virtual time adjustment was implemented for the virt-5.0 machine type,
but the CPU property was enabled only for the host-passthrough and max
CPU models.  Add it for any KVM Arm CPU that has the generic timer
feature enabled.

Signed-off-by: Ying Fang <fangying1@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Message-id: 20200608121243.2076-1-fangying1@huawei.com
[PMM: minor commit message tweak, removed inaccurate
 suggested-by tag]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/cpu.c   |  6 ++++--
 target/arm/cpu64.c |  1 -
 target/arm/kvm.c   | 21 +++++++++++----------
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ void arm_cpu_post_init(Object *obj)
     if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) {
         qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property);
     }
+
+    if (kvm_enabled()) {
+        kvm_arm_add_vcpu_properties(obj);
+    }
 }
 
 static void arm_cpu_finalizefn(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void arm_max_initfn(Object *obj)
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
-        kvm_arm_add_vcpu_properties(obj);
     } else {
         cortex_a15_initfn(obj);
 
@@ -XXX,XX +XXX,XX @@ static void arm_host_initfn(Object *obj)
     if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) {
         aarch64_add_sve_properties(obj);
     }
-    kvm_arm_add_vcpu_properties(obj);
     arm_cpu_post_init(obj);
 }
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -XXX,XX +XXX,XX @@ static void aarch64_max_initfn(Object *obj)
 
     if (kvm_enabled()) {
         kvm_arm_set_cpu_features_from_host(cpu);
-        kvm_arm_add_vcpu_properties(obj);
     } else {
         uint64_t t;
         uint32_t u;
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -XXX,XX +XXX,XX @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp)
 /* KVM VCPU properties should be prefixed with "kvm-". */
 void kvm_arm_add_vcpu_properties(Object *obj)
 {
-    if (!kvm_enabled()) {
-        return;
-    }
+    ARMCPU *cpu = ARM_CPU(obj);
+    CPUARMState *env = &cpu->env;
 
-    ARM_CPU(obj)->kvm_adjvtime = true;
-    object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
-                             kvm_no_adjvtime_set);
-    object_property_set_description(obj, "kvm-no-adjvtime",
-                                    "Set on to disable the adjustment of "
-                                    "the virtual counter. VM stopped time "
-                                    "will be counted.");
+    if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
+        cpu->kvm_adjvtime = true;
+        object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get,
+                                 kvm_no_adjvtime_set);
+        object_property_set_description(obj, "kvm-no-adjvtime",
+                                        "Set on to disable the adjustment of "
+                                        "the virtual counter. VM stopped time "
+                                        "will be counted.");
+    }
 }
 
 bool kvm_arm_pmu_supported(CPUState *cpu)
-- 
2.20.1

From: Jean-Christophe Dubois <jcd@tribudubois.net>

Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
[PMD: Fixed 32-bit format string using PRIx32/PRIx64]
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/net/imx_fec.c    | 106 +++++++++++++++++++-------------------------
 hw/net/trace-events |  18 ++++++++
 2 files changed, 63 insertions(+), 61 deletions(-)

diff --git a/hw/net/imx_fec.c b/hw/net/imx_fec.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/imx_fec.c
+++ b/hw/net/imx_fec.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/module.h"
 #include "net/checksum.h"
 #include "net/eth.h"
+#include "trace.h"
 
 /* For crc32 */
 #include <zlib.h>
 
-#ifndef DEBUG_IMX_FEC
-#define DEBUG_IMX_FEC 0
-#endif
-
-#define FEC_PRINTF(fmt, args...) \
-    do { \
-        if (DEBUG_IMX_FEC) { \
-            fprintf(stderr, "[%s]%s: " fmt , TYPE_IMX_FEC, \
-                                             __func__, ##args); \
-        } \
-    } while (0)
-
-#ifndef DEBUG_IMX_PHY
-#define DEBUG_IMX_PHY 0
-#endif
-
-#define PHY_PRINTF(fmt, args...) \
-    do { \
-        if (DEBUG_IMX_PHY) { \
-            fprintf(stderr, "[%s.phy]%s: " fmt , TYPE_IMX_FEC, \
-                                                 __func__, ##args); \
-        } \
-    } while (0)
-
 #define IMX_MAX_DESC    1024
 
 static const char *imx_default_reg_name(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static void imx_eth_update(IMXFECState *s);
  * For now we don't handle any GPIO/interrupt line, so the OS will
  * have to poll for the PHY status.
  */
-static void phy_update_irq(IMXFECState *s)
+static void imx_phy_update_irq(IMXFECState *s)
 {
     imx_eth_update(s);
 }
 
-static void phy_update_link(IMXFECState *s)
+static void imx_phy_update_link(IMXFECState *s)
 {
     /* Autonegotiation status mirrors link status.  */
     if (qemu_get_queue(s->nic)->link_down) {
-        PHY_PRINTF("link is down\n");
+        trace_imx_phy_update_link("down");
         s->phy_status &= ~0x0024;
         s->phy_int |= PHY_INT_DOWN;
     } else {
-        PHY_PRINTF("link is up\n");
+        trace_imx_phy_update_link("up");
         s->phy_status |= 0x0024;
         s->phy_int |= PHY_INT_ENERGYON;
         s->phy_int |= PHY_INT_AUTONEG_COMPLETE;
     }
-    phy_update_irq(s);
+    imx_phy_update_irq(s);
 }
 
 static void imx_eth_set_link(NetClientState *nc)
 {
-    phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
+    imx_phy_update_link(IMX_FEC(qemu_get_nic_opaque(nc)));
 }
 
-static void phy_reset(IMXFECState *s)
+static void imx_phy_reset(IMXFECState *s)
 {
+    trace_imx_phy_reset();
+
     s->phy_status = 0x7809;
     s->phy_control = 0x3000;
     s->phy_advertise = 0x01e1;
     s->phy_int_mask = 0;
     s->phy_int = 0;
-    phy_update_link(s);
+    imx_phy_update_link(s);
 }
 
-static uint32_t do_phy_read(IMXFECState *s, int reg)
+static uint32_t imx_phy_read(IMXFECState *s, int reg)
 {
     uint32_t val;
 
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
     case 29:    /* Interrupt source.  */
         val = s->phy_int;
         s->phy_int = 0;
-        phy_update_irq(s);
+        imx_phy_update_irq(s);
         break;
     case 30:    /* Interrupt mask */
         val = s->phy_int_mask;
@@ -XXX,XX +XXX,XX @@ static uint32_t do_phy_read(IMXFECState *s, int reg)
         break;
     }
 
-    PHY_PRINTF("read 0x%04x @ %d\n", val, reg);
+    trace_imx_phy_read(val, reg);
 
     return val;
 }
 
-static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
+static void imx_phy_write(IMXFECState *s, int reg, uint32_t val)
 {
-    PHY_PRINTF("write 0x%04x @ %d\n", val, reg);
+    trace_imx_phy_write(val, reg);
 
     if (reg > 31) {
         /* we only advertise one phy */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
     switch (reg) {
     case 0:     /* Basic Control */
         if (val & 0x8000) {
-            phy_reset(s);
+            imx_phy_reset(s);
         } else {
             s->phy_control = val & 0x7980;
             /* Complete autonegotiation immediately.  */
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
         break;
     case 30:    /* Interrupt mask */
         s->phy_int_mask = val & 0xff;
-        phy_update_irq(s);
+        imx_phy_update_irq(s);
         break;
     case 17:
     case 18:
@@ -XXX,XX +XXX,XX @@ static void do_phy_write(IMXFECState *s, int reg, uint32_t val)
 static void imx_fec_read_bd(IMXFECBufDesc *bd, dma_addr_t addr)
 {
     dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
+
+    trace_imx_fec_read_bd(addr, bd->flags, bd->length, bd->data);
 }
 
 static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_write_bd(IMXFECBufDesc *bd, dma_addr_t addr)
 static void imx_enet_read_bd(IMXENETBufDesc *bd, dma_addr_t addr)
 {
     dma_memory_read(&address_space_memory, addr, bd, sizeof(*bd));
+
+    trace_imx_enet_read_bd(addr, bd->flags, bd->length, bd->data,
+                   bd->option, bd->status);
 }
 
 static void imx_enet_write_bd(IMXENETBufDesc *bd, dma_addr_t addr)
@@ -XXX,XX +XXX,XX @@ static void imx_fec_do_tx(IMXFECState *s)
         int len;
 
         imx_fec_read_bd(&bd, addr);
-        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x\n",
-                   addr, bd.flags, bd.length, bd.data);
         if ((bd.flags & ENET_BD_R) == 0) {
+
             /* Run out of descriptors to transmit.  */
-            FEC_PRINTF("tx_bd ran out of descriptors to transmit\n");
+            trace_imx_eth_tx_bd_busy();
+
             break;
         }
         len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_enet_do_tx(IMXFECState *s, uint32_t index)
         int len;
 
         imx_enet_read_bd(&bd, addr);
-        FEC_PRINTF("tx_bd %x flags %04x len %d data %08x option %04x "
-                   "status %04x\n", addr, bd.flags, bd.length, bd.data,
-                   bd.option, bd.status);
         if ((bd.flags & ENET_BD_R) == 0) {
             /* Run out of descriptors to transmit.  */
+
+            trace_imx_eth_tx_bd_busy();
+
             break;
         }
         len = bd.length;
@@ -XXX,XX +XXX,XX @@ static void imx_eth_enable_rx(IMXFECState *s, bool flush)
     s->regs[ENET_RDAR] = (bd.flags & ENET_BD_E) ? ENET_RDAR_RDAR : 0;
 
     if (!s->regs[ENET_RDAR]) {
-        FEC_PRINTF("RX buffer full\n");
+        trace_imx_eth_rx_bd_full();
     } else if (flush) {
         qemu_flush_queued_packets(qemu_get_queue(s->nic));
     }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_reset(DeviceState *d)
     memset(s->tx_descriptor, 0, sizeof(s->tx_descriptor));
 
     /* We also reset the PHY */
-    phy_reset(s);
+    imx_phy_reset(s);
 }
 
 static uint32_t imx_default_read(IMXFECState *s, uint32_t index)
@@ -XXX,XX +XXX,XX @@ static uint64_t imx_eth_read(void *opaque, hwaddr offset, unsigned size)
         break;
     }
 
-    FEC_PRINTF("reg[%s] => 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
-                                              value);
+    trace_imx_eth_read(index, imx_eth_reg_name(s, index), value);
 
     return value;
 }
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
     const bool single_tx_ring = !imx_eth_is_multi_tx_ring(s);
     uint32_t index = offset >> 2;
 
-    FEC_PRINTF("reg[%s] <= 0x%" PRIx32 "\n", imx_eth_reg_name(s, index),
-                (uint32_t)value);
+    trace_imx_eth_write(index, imx_eth_reg_name(s, index), value);
 
     switch (index) {
     case ENET_EIR:
@@ -XXX,XX +XXX,XX @@ static void imx_eth_write(void *opaque, hwaddr offset, uint64_t value,
         if (extract32(value, 29, 1)) {
             /* This is a read operation */
             s->regs[ENET_MMFR] = deposit32(s->regs[ENET_MMFR], 0, 16,
-                                           do_phy_read(s,
+                                           imx_phy_read(s,
                                                        extract32(value,
                                                                  18, 10)));
         } else {
             /* This a write operation */
-            do_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
+            imx_phy_write(s, extract32(value, 18, 10), extract32(value, 0, 16));
         }
         /* raise the interrupt as the PHY operation is done */
         s->regs[ENET_EIR] |= ENET_INT_MII;
@@ -XXX,XX +XXX,XX @@ static bool imx_eth_can_receive(NetClientState *nc)
 {
     IMXFECState *s = IMX_FEC(qemu_get_nic_opaque(nc));
 
-    FEC_PRINTF("\n");
-
     return !!s->regs[ENET_RDAR];
 }
 
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
     unsigned int buf_len;
     size_t size = len;
 
-    FEC_PRINTF("len %d\n", (int)size);
+    trace_imx_fec_receive(size);
 
     if (!s->regs[ENET_RDAR]) {
         qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
         bd.length = buf_len;
         size -= buf_len;
 
-        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
+        trace_imx_fec_receive_len(addr, bd.length);
 
         /* The last 4 bytes are the CRC.  */
         if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_fec_receive(NetClientState *nc, const uint8_t *buf,
         if (size == 0) {
             /* Last buffer in frame.  */
             bd.flags |= flags | ENET_BD_L;
-            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
+
+            trace_imx_fec_receive_last(bd.flags);
+
             s->regs[ENET_EIR] |= ENET_INT_RXF;
         } else {
             s->regs[ENET_EIR] |= ENET_INT_RXB;
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
     size_t size = len;
     bool shift16 = s->regs[ENET_RACC] & ENET_RACC_SHIFT16;
 
-    FEC_PRINTF("len %d\n", (int)size);
+    trace_imx_enet_receive(size);
 
     if (!s->regs[ENET_RDAR]) {
         qemu_log_mask(LOG_GUEST_ERROR, "[%s]%s: Unexpected packet\n",
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
         bd.length = buf_len;
         size -= buf_len;
 
-        FEC_PRINTF("rx_bd 0x%x length %d\n", addr, bd.length);
+        trace_imx_enet_receive_len(addr, bd.length);
 
         /* The last 4 bytes are the CRC.  */
         if (size < 4) {
@@ -XXX,XX +XXX,XX @@ static ssize_t imx_enet_receive(NetClientState *nc, const uint8_t *buf,
         if (size == 0) {
             /* Last buffer in frame.  */
             bd.flags |= flags | ENET_BD_L;
-            FEC_PRINTF("rx frame flags %04x\n", bd.flags);
+
+            trace_imx_enet_receive_last(bd.flags);
+
             /* Indicate that we've updated the last buffer descriptor. */
             bd.last_buffer = ENET_BD_BDU;
             if (bd.option & ENET_BD_RX_INT) {
diff --git a/hw/net/trace-events b/hw/net/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -XXX,XX +XXX,XX @@ i82596_receive_packet(size_t sz) "len=%zu"
 i82596_new_mac(const char *id_with_mac) "New MAC for: %s"
 i82596_set_multicast(uint16_t count) "Added %d multicast entries"
 i82596_channel_attention(void *s) "%p: Received CHANNEL ATTENTION"
+
+# imx_fec.c
+imx_phy_read(uint32_t val, int reg) "0x%04"PRIx32" <= reg[%d]"
+imx_phy_write(uint32_t val, int reg) "0x%04"PRIx32" => reg[%d]"
+imx_phy_update_link(const char *s) "%s"
+imx_phy_reset(void) ""
+imx_fec_read_bd(uint64_t addr, int flags, int len, int data) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x"
+imx_enet_read_bd(uint64_t addr, int flags, int len, int data, int options, int status) "tx_bd 0x%"PRIx64" flags 0x%04x len %d data 0x%08x option 0x%04x status 0x%04x"
+imx_eth_tx_bd_busy(void) "tx_bd ran out of descriptors to transmit"
+imx_eth_rx_bd_full(void) "RX buffer is full"
+imx_eth_read(int reg, const char *reg_name, uint32_t value) "reg[%d:%s] => 0x%08"PRIx32
+imx_eth_write(int reg, const char *reg_name, uint64_t value) "reg[%d:%s] <= 0x%08"PRIx64
+imx_fec_receive(size_t size) "len %zu"
+imx_fec_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+imx_fec_receive_last(int last) "rx frame flags 0x%04x"
+imx_enet_receive(size_t size) "len %zu"
+imx_enet_receive_len(uint64_t addr, int len) "rx_bd 0x%"PRIx64" length %d"
+imx_enet_receive_last(int last) "rx frame flags 0x%04x"
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

The Linux kernel's IMX code now uses vendor specific commands.
This results in endless warnings when booting the Linux kernel.

sdhci-esdhc-imx 2194000.usdhc: esdhc_wait_for_card_clock_gate_off:
	card clock still not gate off in 100us!.

Implement support for the vendor specific command provided by IMX hardware
so that this warning is no longer triggered.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Message-id: 20200603145258.195920-2-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/sd/sdhci-internal.h |  5 +++++
 include/hw/sd/sdhci.h  |  5 +++++
 hw/sd/sdhci.c          | 18 +++++++++++++++++-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/sdhci-internal.h
+++ b/hw/sd/sdhci-internal.h
@@ -XXX,XX +XXX,XX @@
 #define SDHC_CMD_INHIBIT               0x00000001
 #define SDHC_DATA_INHIBIT              0x00000002
 #define SDHC_DAT_LINE_ACTIVE           0x00000004
+#define SDHC_IMX_CLOCK_GATE_OFF        0x00000080
 #define SDHC_DOING_WRITE               0x00000100
 #define SDHC_DOING_READ                0x00000200
 #define SDHC_SPACE_AVAILABLE           0x00000400
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
 
 
 #define ESDHC_MIX_CTRL                  0x48
+
 #define ESDHC_VENDOR_SPEC               0xc0
+#define ESDHC_IMX_FRC_SDCLK_ON          (1 << 8)
+
 #define ESDHC_DLL_CTRL                  0x60
 
 #define ESDHC_TUNING_CTRL               0xcc
@@ -XXX,XX +XXX,XX @@ extern const VMStateDescription sdhci_vmstate;
 #define DEFINE_SDHCI_COMMON_PROPERTIES(_state) \
     DEFINE_PROP_UINT8("sd-spec-version", _state, sd_spec_version, 2), \
     DEFINE_PROP_UINT8("uhs", _state, uhs_mode, UHS_NOT_SUPPORTED), \
+    DEFINE_PROP_UINT8("vendor", _state, vendor, SDHCI_VENDOR_NONE), \
     \
     /* Capabilities registers provide information on supported
      * features of this specific host controller implementation */ \
diff --git a/include/hw/sd/sdhci.h b/include/hw/sd/sdhci.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/sd/sdhci.h
+++ b/include/hw/sd/sdhci.h
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
     uint16_t acmd12errsts; /* Auto CMD12 error status register */
     uint16_t hostctl2;     /* Host Control 2 */
     uint64_t admasysaddr;  /* ADMA System Address Register */
+    uint16_t vendor_spec;  /* Vendor specific register */
 
     /* Read-only registers */
     uint64_t capareg;      /* Capabilities Register */
@@ -XXX,XX +XXX,XX @@ typedef struct SDHCIState {
     uint32_t quirks;
     uint8_t sd_spec_version;
     uint8_t uhs_mode;
+    uint8_t vendor;        /* For vendor specific functionality */
 } SDHCIState;
 
+#define SDHCI_VENDOR_NONE       0
+#define SDHCI_VENDOR_IMX        1
+
 /*
  * Controller does not provide transfer-complete interrupt when not
  * busy.
diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -XXX,XX +XXX,XX @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
         }
         break;
 
+    case ESDHC_VENDOR_SPEC:
+        ret = s->vendor_spec;
+        break;
     case ESDHC_DLL_CTRL:
     case ESDHC_TUNE_CTRL_STATUS:
     case ESDHC_UNDOCUMENTED_REG27:
     case ESDHC_TUNING_CTRL:
-    case ESDHC_VENDOR_SPEC:
     case ESDHC_MIX_CTRL:
     case ESDHC_WTMK_LVL:
         ret = 0;
@@ -XXX,XX +XXX,XX @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
     case ESDHC_UNDOCUMENTED_REG27:
     case ESDHC_TUNING_CTRL:
     case ESDHC_WTMK_LVL:
+        break;
+
     case ESDHC_VENDOR_SPEC:
+        s->vendor_spec = value;
+        switch (s->vendor) {
+        case SDHCI_VENDOR_IMX:
+            if (value & ESDHC_IMX_FRC_SDCLK_ON) {
+                s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
+            } else {
+                s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
+            }
+            break;
+        default:
+            break;
+        }
         break;
 
     case SDHC_HOSTCTL:
-- 
2.20.1
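
The effect of the vendor specific register write on the present-state register can be shown with a small standalone sketch. The struct, enum and function below are hypothetical stand-ins for the QEMU types; only the two bit positions (0x80 in the present-state register, bit 8 in the vendor specific register) are taken from the defines added by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CLOCK_GATE_OFF  0x00000080u  /* present-state bit, as in the patch */
#define DEMO_FRC_SDCLK_ON    (1u << 8)    /* vendor-spec bit, as in the patch */

enum { DEMO_VENDOR_NONE, DEMO_VENDOR_IMX };

/* Hypothetical, cut-down controller state. */
typedef struct {
    uint32_t prnsts;       /* present state */
    uint16_t vendor_spec;  /* vendor specific register */
    uint8_t  vendor;       /* selects vendor specific behaviour */
} DemoSDHCI;

/*
 * Store the written value, then for IMX controllers mirror the inverse of
 * the "force SD clock on" bit into the "clock gate off" status bit.
 */
static void demo_vendor_spec_write(DemoSDHCI *s, uint32_t value)
{
    assert(s);
    s->vendor_spec = (uint16_t)value;

    if (s->vendor != DEMO_VENDOR_IMX) {
        return;
    }
    if (value & DEMO_FRC_SDCLK_ON) {
        s->prnsts &= ~DEMO_CLOCK_GATE_OFF;
    } else {
        s->prnsts |= DEMO_CLOCK_GATE_OFF;
    }
}
```

A guest that clears the force-clock bit and then polls the present-state register therefore sees the gate-off bit set at once, which is presumably what stops the kernel's "card clock still not gate off" warning quoted in the commit message.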

From: Guenter Roeck <linux@roeck-us.net>

Set vendor property to IMX to enable IMX specific functionality
in sdhci code.

Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20200603145258.195920-3-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/fsl-imx25.c  | 6 ++++++
 hw/arm/fsl-imx6.c   | 6 ++++++
 hw/arm/fsl-imx6ul.c | 2 ++
 hw/arm/fsl-imx7.c   | 2 ++
 4 files changed, 16 insertions(+)

diff --git a/hw/arm/fsl-imx25.c b/hw/arm/fsl-imx25.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx25.c
+++ b/hw/arm/fsl-imx25.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx25_realize(DeviceState *dev, Error **errp)
                                  &err);
         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX25_ESDHC_CAPABILITIES,
                                  "capareg", &err);
+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
         object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
         if (err) {
             error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6.c
+++ b/hw/arm/fsl-imx6.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6_realize(DeviceState *dev, Error **errp)
                                  &err);
         object_property_set_uint(OBJECT(&s->esdhc[i]), IMX6_ESDHC_CAPABILITIES,
                                  "capareg", &err);
+        object_property_set_uint(OBJECT(&s->esdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
         object_property_set_bool(OBJECT(&s->esdhc[i]), true, "realized", &err);
         if (err) {
             error_propagate(errp, err);
diff --git a/hw/arm/fsl-imx6ul.c b/hw/arm/fsl-imx6ul.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx6ul.c
+++ b/hw/arm/fsl-imx6ul.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx6ul_realize(DeviceState *dev, Error **errp)
             FSL_IMX6UL_USDHC2_IRQ,
         };
 
+        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+                                        "vendor", &error_abort);
         object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                  &error_abort);
 
diff --git a/hw/arm/fsl-imx7.c b/hw/arm/fsl-imx7.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/fsl-imx7.c
+++ b/hw/arm/fsl-imx7.c
@@ -XXX,XX +XXX,XX @@ static void fsl_imx7_realize(DeviceState *dev, Error **errp)
             FSL_IMX7_USDHC3_IRQ,
         };
 
+        object_property_set_uint(OBJECT(&s->usdhc[i]), SDHCI_VENDOR_IMX,
+                                 "vendor", &error_abort);
         object_property_set_bool(OBJECT(&s->usdhc[i]), true, "realized",
                                  &error_abort);
 
-- 
2.20.1

The following changes since commit 8f6330a807f2642dc2a3cdf33347aa28a4c00a87:

  Merge tag 'pull-maintainer-updates-060324-1' of https://gitlab.com/stsquad/qemu into staging (2024-03-06 16:56:20 +0000)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240308

for you to fetch changes up to bbf6c6dbead82292a20951eb1204442a6b838de9:

  target/arm: Move v7m-related code from cpu32.c into a separate file (2024-03-08 14:45:03 +0000)

----------------------------------------------------------------
target-arm queue:
 * Implement FEAT_ECV
 * STM32L4x5: Implement GPIO device
 * Fix 32-bit SMOPA
 * Refactor v7m related code from cpu32.c into its own file
 * hw/rtc/sun4v-rtc: Relicense to GPLv2-or-later

----------------------------------------------------------------
Inès Varhol (3):
      hw/gpio: Implement STM32L4x5 GPIO
      hw/arm: Connect STM32L4x5 GPIO to STM32L4x5 SoC
      tests/qtest: Add STM32L4x5 GPIO QTest testcase

Peter Maydell (9):
      target/arm: Move some register related defines to internals.h
      target/arm: Timer _EL02 registers UNDEF for E2H == 0
      target/arm: use FIELD macro for CNTHCTL bit definitions
      target/arm: Don't allow RES0 CNTHCTL_EL2 bits to be written
      target/arm: Implement new FEAT_ECV trap bits
      target/arm: Define CNTPCTSS_EL0 and CNTVCTSS_EL0
      target/arm: Implement FEAT_ECV CNTPOFF_EL2 handling
      target/arm: Enable FEAT_ECV for 'max' CPU
      hw/rtc/sun4v-rtc: Relicense to GPLv2-or-later

Richard Henderson (1):
      target/arm: Fix 32-bit SMOPA

Thomas Huth (1):
      target/arm: Move v7m-related code from cpu32.c into a separate file

 MAINTAINERS                        |   1 +
 docs/system/arm/b-l475e-iot01a.rst |   2 +-
 docs/system/arm/emulation.rst      |   1 +
 include/hw/arm/stm32l4x5_soc.h     |   2 +
 include/hw/gpio/stm32l4x5_gpio.h   |  71 +++++
 include/hw/misc/stm32l4x5_syscfg.h |   3 +-
 include/hw/rtc/sun4v-rtc.h         |   2 +-
 target/arm/cpu-features.h          |  10 +
 target/arm/cpu.h                   | 129 +--------
 target/arm/internals.h             | 151 ++++++++++
 hw/arm/stm32l4x5_soc.c             |  71 ++++-
 hw/gpio/stm32l4x5_gpio.c           | 477 ++++++++++++++++++++++++++++++++
 hw/misc/stm32l4x5_syscfg.c         |   1 +
 hw/rtc/sun4v-rtc.c                 |   2 +-
 target/arm/helper.c                | 189 ++++++++++++-
 target/arm/tcg/cpu-v7m.c           | 290 +++++++++++++++++++
 target/arm/tcg/cpu32.c             | 261 ------------------
 target/arm/tcg/cpu64.c             |   1 +
 target/arm/tcg/sme_helper.c        |  77 +++---
 tests/qtest/stm32l4x5_gpio-test.c  | 551 +++++++++++++++++++++++++++++++++++++
 tests/tcg/aarch64/sme-smopa-1.c    |  47 ++++
 tests/tcg/aarch64/sme-smopa-2.c    |  54 ++++
 hw/arm/Kconfig                     |   3 +-
 hw/gpio/Kconfig                    |   3 +
 hw/gpio/meson.build                |   1 +
 hw/gpio/trace-events               |   6 +
 target/arm/meson.build             |   3 +
 target/arm/tcg/meson.build         |   3 +
 target/arm/trace-events            |   1 +
 tests/qtest/meson.build            |   3 +-
 tests/tcg/aarch64/Makefile.target  |   2 +-
 31 files changed, 1962 insertions(+), 456 deletions(-)
 create mode 100644 include/hw/gpio/stm32l4x5_gpio.h
 create mode 100644 hw/gpio/stm32l4x5_gpio.c
 create mode 100644 target/arm/tcg/cpu-v7m.c
 create mode 100644 tests/qtest/stm32l4x5_gpio-test.c
 create mode 100644 tests/tcg/aarch64/sme-smopa-1.c
 create mode 100644 tests/tcg/aarch64/sme-smopa-2.c

cpu.h has a lot of #defines relating to CPU register fields.
Most of these aren't actually used outside target/arm code,
so there's no point in cluttering up the cpu.h file with them.
Move some easy ones to internals.h.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240301183219.2424889-2-peter.maydell@linaro.org
---
 target/arm/cpu.h       | 128 -----------------------------------------
 target/arm/internals.h | 128 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 128 insertions(+), 128 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct ARMGenericTimer {
     uint64_t ctl; /* Timer Control register */
 } ARMGenericTimer;
 
-#define VTCR_NSW (1u << 29)
-#define VTCR_NSA (1u << 30)
-#define VSTCR_SW VTCR_NSW
-#define VSTCR_SA VTCR_NSA
-
 /* Define a maximum sized vector register.
  * For 32-bit, this is a 128-bit NEON/AdvSIMD register.
  * For 64-bit, this is a 2048-bit SVE register.
@@ -XXX,XX +XXX,XX @@ void pmu_init(ARMCPU *cpu);
 #define SCTLR_SPINTMASK (1ULL << 62) /* FEAT_NMI */
 #define SCTLR_TIDCP   (1ULL << 63) /* FEAT_TIDCP1 */
 
-/* Bit definitions for CPACR (AArch32 only) */
-FIELD(CPACR, CP10, 20, 2)
-FIELD(CPACR, CP11, 22, 2)
-FIELD(CPACR, TRCDIS, 28, 1)    /* matches CPACR_EL1.TTA */
-FIELD(CPACR, D32DIS, 30, 1)    /* up to v7; RAZ in v8 */
-FIELD(CPACR, ASEDIS, 31, 1)
-
-/* Bit definitions for CPACR_EL1 (AArch64 only) */
-FIELD(CPACR_EL1, ZEN, 16, 2)
-FIELD(CPACR_EL1, FPEN, 20, 2)
-FIELD(CPACR_EL1, SMEN, 24, 2)
-FIELD(CPACR_EL1, TTA, 28, 1)   /* matches CPACR.TRCDIS */
-
-/* Bit definitions for HCPTR (AArch32 only) */
-FIELD(HCPTR, TCP10, 10, 1)
-FIELD(HCPTR, TCP11, 11, 1)
-FIELD(HCPTR, TASE, 15, 1)
-FIELD(HCPTR, TTA, 20, 1)
-FIELD(HCPTR, TAM, 30, 1)       /* matches CPTR_EL2.TAM */
-FIELD(HCPTR, TCPAC, 31, 1)     /* matches CPTR_EL2.TCPAC */
-
-/* Bit definitions for CPTR_EL2 (AArch64 only) */
-FIELD(CPTR_EL2, TZ, 8, 1)      /* !E2H */
-FIELD(CPTR_EL2, TFP, 10, 1)    /* !E2H, matches HCPTR.TCP10 */
-FIELD(CPTR_EL2, TSM, 12, 1)    /* !E2H */
-FIELD(CPTR_EL2, ZEN, 16, 2)    /* E2H */
-FIELD(CPTR_EL2, FPEN, 20, 2)   /* E2H */
-FIELD(CPTR_EL2, SMEN, 24, 2)   /* E2H */
-FIELD(CPTR_EL2, TTA, 28, 1)
-FIELD(CPTR_EL2, TAM, 30, 1)    /* matches HCPTR.TAM */
-FIELD(CPTR_EL2, TCPAC, 31, 1)  /* matches HCPTR.TCPAC */
-
-/* Bit definitions for CPTR_EL3 (AArch64 only) */
-FIELD(CPTR_EL3, EZ, 8, 1)
-FIELD(CPTR_EL3, TFP, 10, 1)
-FIELD(CPTR_EL3, ESM, 12, 1)
-FIELD(CPTR_EL3, TTA, 20, 1)
-FIELD(CPTR_EL3, TAM, 30, 1)
-FIELD(CPTR_EL3, TCPAC, 31, 1)
-
-#define MDCR_MTPME    (1U << 28)
-#define MDCR_TDCC     (1U << 27)
-#define MDCR_HLP      (1U << 26)  /* MDCR_EL2 */
-#define MDCR_SCCD     (1U << 23)  /* MDCR_EL3 */
-#define MDCR_HCCD     (1U << 23)  /* MDCR_EL2 */
-#define MDCR_EPMAD    (1U << 21)
-#define MDCR_EDAD     (1U << 20)
-#define MDCR_TTRF     (1U << 19)
-#define MDCR_STE      (1U << 18)  /* MDCR_EL3 */
-#define MDCR_SPME     (1U << 17)  /* MDCR_EL3 */
-#define MDCR_HPMD     (1U << 17)  /* MDCR_EL2 */
-#define MDCR_SDD      (1U << 16)
-#define MDCR_SPD      (3U << 14)
-#define MDCR_TDRA     (1U << 11)
-#define MDCR_TDOSA    (1U << 10)
-#define MDCR_TDA      (1U << 9)
-#define MDCR_TDE      (1U << 8)
-#define MDCR_HPME     (1U << 7)
-#define MDCR_TPM      (1U << 6)
-#define MDCR_TPMCR    (1U << 5)
-#define MDCR_HPMN     (0x1fU)
-
-/* Not all of the MDCR_EL3 bits are present in the 32-bit SDCR */
-#define SDCR_VALID_MASK (MDCR_MTPME | MDCR_TDCC | MDCR_SCCD | \
-                         MDCR_EPMAD | MDCR_EDAD | MDCR_TTRF | \
-                         MDCR_STE | MDCR_SPME | MDCR_SPD)
-
 #define CPSR_M (0x1fU)
 #define CPSR_T (1U << 5)
 #define CPSR_F (1U << 6)
@@ -XXX,XX +XXX,XX @@ FIELD(CPTR_EL3, TCPAC, 31, 1)
 #define XPSR_NZCV CPSR_NZCV
 #define XPSR_IT CPSR_IT
 
-#define TTBCR_N      (7U << 0) /* TTBCR.EAE==0 */
-#define TTBCR_T0SZ   (7U << 0) /* TTBCR.EAE==1 */
-#define TTBCR_PD0    (1U << 4)
-#define TTBCR_PD1    (1U << 5)
-#define TTBCR_EPD0   (1U << 7)
-#define TTBCR_IRGN0  (3U << 8)
-#define TTBCR_ORGN0  (3U << 10)
-#define TTBCR_SH0    (3U << 12)
-#define TTBCR_T1SZ   (3U << 16)
-#define TTBCR_A1     (1U << 22)
-#define TTBCR_EPD1   (1U << 23)
-#define TTBCR_IRGN1  (3U << 24)
-#define TTBCR_ORGN1  (3U << 26)
-#define TTBCR_SH1    (1U << 28)
-#define TTBCR_EAE    (1U << 31)
-
-FIELD(VTCR, T0SZ, 0, 6)
-FIELD(VTCR, SL0, 6, 2)
-FIELD(VTCR, IRGN0, 8, 2)
-FIELD(VTCR, ORGN0, 10, 2)
-FIELD(VTCR, SH0, 12, 2)
-FIELD(VTCR, TG0, 14, 2)
-FIELD(VTCR, PS, 16, 3)
-FIELD(VTCR, VS, 19, 1)
-FIELD(VTCR, HA, 21, 1)
-FIELD(VTCR, HD, 22, 1)
-FIELD(VTCR, HWU59, 25, 1)
-FIELD(VTCR, HWU60, 26, 1)
-FIELD(VTCR, HWU61, 27, 1)
-FIELD(VTCR, HWU62, 28, 1)
-FIELD(VTCR, NSW, 29, 1)
-FIELD(VTCR, NSA, 30, 1)
-FIELD(VTCR, DS, 32, 1)
-FIELD(VTCR, SL2, 33, 1)
-
 /* Bit definitions for ARMv8 SPSR (PSTATE) format.
  * Only these are valid when in AArch64 mode; in
  * AArch32 mode SPSRs are basically CPSR-format.
@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
 #define HCR_TWEDEN    (1ULL << 59)
 #define HCR_TWEDEL    MAKE_64BIT_MASK(60, 4)
 
-#define HCRX_ENAS0    (1ULL << 0)
-#define HCRX_ENALS    (1ULL << 1)
-#define HCRX_ENASR    (1ULL << 2)
-#define HCRX_FNXS     (1ULL << 3)
-#define HCRX_FGTNXS   (1ULL << 4)
-#define HCRX_SMPME    (1ULL << 5)
-#define HCRX_TALLINT  (1ULL << 6)
-#define HCRX_VINMI    (1ULL << 7)
-#define HCRX_VFNMI    (1ULL << 8)
-#define HCRX_CMOW     (1ULL << 9)
-#define HCRX_MCE2     (1ULL << 10)
-#define HCRX_MSCEN    (1ULL << 11)
-
-#define HPFAR_NS      (1ULL << 63)
-
 #define SCR_NS                (1ULL << 0)
 #define SCR_IRQ               (1ULL << 1)
 #define SCR_FIQ               (1ULL << 2)
@@ -XXX,XX +XXX,XX @@ static inline void xpsr_write(CPUARMState *env, uint32_t val, uint32_t mask)
 #define SCR_GPF               (1ULL << 48)
 #define SCR_NSE               (1ULL << 62)
 
-#define HSTR_TTEE (1 << 16)
-#define HSTR_TJDBX (1 << 17)
-
-#define CNTHCTL_CNTVMASK      (1 << 18)
-#define CNTHCTL_CNTPMASK      (1 << 19)
-
 /* Return the current FPSCR value.  */
 uint32_t vfp_get_fpscr(CPUARMState *env);
 void vfp_set_fpscr(CPUARMState *env, uint32_t val);
diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ FIELD(DBGWCR, WT, 20, 1)
 FIELD(DBGWCR, MASK, 24, 5)
 FIELD(DBGWCR, SSCE, 29, 1)
 
+#define VTCR_NSW (1u << 29)
+#define VTCR_NSA (1u << 30)
+#define VSTCR_SW VTCR_NSW
+#define VSTCR_SA VTCR_NSA
+
+/* Bit definitions for CPACR (AArch32 only) */
+FIELD(CPACR, CP10, 20, 2)
+FIELD(CPACR, CP11, 22, 2)
+FIELD(CPACR, TRCDIS, 28, 1)    /* matches CPACR_EL1.TTA */
+FIELD(CPACR, D32DIS, 30, 1)    /* up to v7; RAZ in v8 */
+FIELD(CPACR, ASEDIS, 31, 1)
+
+/* Bit definitions for CPACR_EL1 (AArch64 only) */
+FIELD(CPACR_EL1, ZEN, 16, 2)
+FIELD(CPACR_EL1, FPEN, 20, 2)
+FIELD(CPACR_EL1, SMEN, 24, 2)
+FIELD(CPACR_EL1, TTA, 28, 1)   /* matches CPACR.TRCDIS */
+
+/* Bit definitions for HCPTR (AArch32 only) */
+FIELD(HCPTR, TCP10, 10, 1)
+FIELD(HCPTR, TCP11, 11, 1)
+FIELD(HCPTR, TASE, 15, 1)
+FIELD(HCPTR, TTA, 20, 1)
+FIELD(HCPTR, TAM, 30, 1)       /* matches CPTR_EL2.TAM */
+FIELD(HCPTR, TCPAC, 31, 1)     /* matches CPTR_EL2.TCPAC */
+
+/* Bit definitions for CPTR_EL2 (AArch64 only) */
+FIELD(CPTR_EL2, TZ, 8, 1)      /* !E2H */
+FIELD(CPTR_EL2, TFP, 10, 1)    /* !E2H, matches HCPTR.TCP10 */
+FIELD(CPTR_EL2, TSM, 12, 1)    /* !E2H */
+FIELD(CPTR_EL2, ZEN, 16, 2)    /* E2H */
+FIELD(CPTR_EL2, FPEN, 20, 2)   /* E2H */
+FIELD(CPTR_EL2, SMEN, 24, 2)   /* E2H */
+FIELD(CPTR_EL2, TTA, 28, 1)
+FIELD(CPTR_EL2, TAM, 30, 1)    /* matches HCPTR.TAM */
+FIELD(CPTR_EL2, TCPAC, 31, 1)  /* matches HCPTR.TCPAC */
+
+/* Bit definitions for CPTR_EL3 (AArch64 only) */
+FIELD(CPTR_EL3, EZ, 8, 1)
+FIELD(CPTR_EL3, TFP, 10, 1)
+FIELD(CPTR_EL3, ESM, 12, 1)
+FIELD(CPTR_EL3, TTA, 20, 1)
+FIELD(CPTR_EL3, TAM, 30, 1)
+FIELD(CPTR_EL3, TCPAC, 31, 1)
+
+#define MDCR_MTPME    (1U << 28)
+#define MDCR_TDCC     (1U << 27)
+#define MDCR_HLP      (1U << 26)  /* MDCR_EL2 */
+#define MDCR_SCCD     (1U << 23)  /* MDCR_EL3 */
+#define MDCR_HCCD     (1U << 23)  /* MDCR_EL2 */
+#define MDCR_EPMAD    (1U << 21)
+#define MDCR_EDAD     (1U << 20)
+#define MDCR_TTRF     (1U << 19)
+#define MDCR_STE      (1U << 18)  /* MDCR_EL3 */
+#define MDCR_SPME     (1U << 17)  /* MDCR_EL3 */
+#define MDCR_HPMD     (1U << 17)  /* MDCR_EL2 */
+#define MDCR_SDD      (1U << 16)
+#define MDCR_SPD      (3U << 14)
+#define MDCR_TDRA     (1U << 11)
+#define MDCR_TDOSA    (1U << 10)
+#define MDCR_TDA      (1U << 9)
+#define MDCR_TDE      (1U << 8)
+#define MDCR_HPME     (1U << 7)
+#define MDCR_TPM      (1U << 6)
+#define MDCR_TPMCR    (1U << 5)
+#define MDCR_HPMN     (0x1fU)
+
+/* Not all of the MDCR_EL3 bits are present in the 32-bit SDCR */
+#define SDCR_VALID_MASK (MDCR_MTPME | MDCR_TDCC | MDCR_SCCD | \
+                         MDCR_EPMAD | MDCR_EDAD | MDCR_TTRF | \
+                         MDCR_STE | MDCR_SPME | MDCR_SPD)
+
+#define TTBCR_N      (7U << 0) /* TTBCR.EAE==0 */
+#define TTBCR_T0SZ   (7U << 0) /* TTBCR.EAE==1 */
+#define TTBCR_PD0    (1U << 4)
+#define TTBCR_PD1    (1U << 5)
+#define TTBCR_EPD0   (1U << 7)
+#define TTBCR_IRGN0  (3U << 8)
+#define TTBCR_ORGN0  (3U << 10)
+#define TTBCR_SH0    (3U << 12)
+#define TTBCR_T1SZ   (3U << 16)
+#define TTBCR_A1     (1U << 22)
+#define TTBCR_EPD1   (1U << 23)
+#define TTBCR_IRGN1  (3U << 24)
+#define TTBCR_ORGN1  (3U << 26)
+#define TTBCR_SH1    (1U << 28)
+#define TTBCR_EAE    (1U << 31)
+
+FIELD(VTCR, T0SZ, 0, 6)
+FIELD(VTCR, SL0, 6, 2)
+FIELD(VTCR, IRGN0, 8, 2)
+FIELD(VTCR, ORGN0, 10, 2)
+FIELD(VTCR, SH0, 12, 2)
+FIELD(VTCR, TG0, 14, 2)
+FIELD(VTCR, PS, 16, 3)
+FIELD(VTCR, VS, 19, 1)
+FIELD(VTCR, HA, 21, 1)
+FIELD(VTCR, HD, 22, 1)
+FIELD(VTCR, HWU59, 25, 1)
+FIELD(VTCR, HWU60, 26, 1)
+FIELD(VTCR, HWU61, 27, 1)
+FIELD(VTCR, HWU62, 28, 1)
+FIELD(VTCR, NSW, 29, 1)
+FIELD(VTCR, NSA, 30, 1)
+FIELD(VTCR, DS, 32, 1)
+FIELD(VTCR, SL2, 33, 1)
+
+#define HCRX_ENAS0    (1ULL << 0)
+#define HCRX_ENALS    (1ULL << 1)
+#define HCRX_ENASR    (1ULL << 2)
+#define HCRX_FNXS     (1ULL << 3)
+#define HCRX_FGTNXS   (1ULL << 4)
+#define HCRX_SMPME    (1ULL << 5)
+#define HCRX_TALLINT  (1ULL << 6)
+#define HCRX_VINMI    (1ULL << 7)
+#define HCRX_VFNMI    (1ULL << 8)
+#define HCRX_CMOW     (1ULL << 9)
+#define HCRX_MCE2     (1ULL << 10)
+#define HCRX_MSCEN    (1ULL << 11)
+
+#define HPFAR_NS      (1ULL << 63)
+
+#define HSTR_TTEE (1 << 16)
+#define HSTR_TJDBX (1 << 17)
+
+#define CNTHCTL_CNTVMASK      (1 << 18)
+#define CNTHCTL_CNTPMASK      (1 << 19)
+
 /* We use a few fake FSR values for internal purposes in M profile.
  * M profile cores don't have A/R format FSRs, but currently our
  * get_phys_addr() code assumes A/R profile and reports failures via
-- 
2.34.1

We prefer the FIELD macro over ad-hoc #defines for register bits;
switch CNTHCTL to that style before we add any more bits.
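
As a rough illustration of the FIELD style, the sketch below uses a
simplified stand-in macro. SKETCH_FIELD, sketch_extract64 and
SKETCH_FIELD_EX64 are hypothetical names for this example only and may
differ in detail from QEMU's own macros. The point is that one declaration
yields the shift, length and mask names (such as R_CNTHCTL_CNTVMASK_MASK)
that the converted code uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified stand-in for a FIELD-style macro (hypothetical name).
 * One declaration produces the shift, length and mask constants
 * for a single register field.
 */
#define SKETCH_FIELD(reg, field, shift, length)                     \
    enum { R_##reg##_##field##_SHIFT = (shift) };                   \
    enum { R_##reg##_##field##_LENGTH = (length) };                 \
    static const uint64_t R_##reg##_##field##_MASK =                \
        ((~0ULL) >> (64 - (length))) << (shift);

SKETCH_FIELD(CNTHCTL, EVNTI, 4, 4)
SKETCH_FIELD(CNTHCTL, CNTVMASK, 18, 1)
SKETCH_FIELD(CNTHCTL, CNTPMASK, 19, 1)

/* Extract a field of 'length' bits starting at bit 'shift'. */
static inline uint64_t sketch_extract64(uint64_t value, int shift, int length)
{
    assert(length > 0 && length <= 64 - shift);
    return (value >> shift) & ((~0ULL) >> (64 - length));
}

/* Field extraction by name, built on the generated constants. */
#define SKETCH_FIELD_EX64(value, reg, field)                        \
    sketch_extract64((value), R_##reg##_##field##_SHIFT,            \
                     R_##reg##_##field##_LENGTH)
```

With this style, adding a new bit is a single declaration, and multi-bit
fields such as EVNTI get a correct mask without hand-written constants.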

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240301183219.2424889-4-peter.maydell@linaro.org
---
 target/arm/internals.h | 27 +++++++++++++++++++++++++--
 target/arm/helper.c    |  9 ++++-----
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -XXX,XX +XXX,XX @@ FIELD(VTCR, SL2, 33, 1)
 #define HSTR_TTEE (1 << 16)
 #define HSTR_TJDBX (1 << 17)
 
-#define CNTHCTL_CNTVMASK      (1 << 18)
-#define CNTHCTL_CNTPMASK      (1 << 19)
+/*
+ * Depending on the value of HCR_EL2.E2H, bits 0 and 1
+ * have different bit definitions, and EL1PCTEN might be
+ * bit 0 or bit 10. We use _E2H1 and _E2H0 suffixes to
+ * disambiguate if necessary.
+ */
+FIELD(CNTHCTL, EL0PCTEN_E2H1, 0, 1)
+FIELD(CNTHCTL, EL0VCTEN_E2H1, 1, 1)
+FIELD(CNTHCTL, EL1PCTEN_E2H0, 0, 1)
+FIELD(CNTHCTL, EL1PCEN_E2H0, 1, 1)
+FIELD(CNTHCTL, EVNTEN, 2, 1)
+FIELD(CNTHCTL, EVNTDIR, 3, 1)
+FIELD(CNTHCTL, EVNTI, 4, 4)
+FIELD(CNTHCTL, EL0VTEN, 8, 1)
+FIELD(CNTHCTL, EL0PTEN, 9, 1)
+FIELD(CNTHCTL, EL1PCTEN_E2H1, 10, 1)
+FIELD(CNTHCTL, EL1PTEN, 11, 1)
+FIELD(CNTHCTL, ECV, 12, 1)
+FIELD(CNTHCTL, EL1TVT, 13, 1)
+FIELD(CNTHCTL, EL1TVCT, 14, 1)
+FIELD(CNTHCTL, EL1NVPCT, 15, 1)
+FIELD(CNTHCTL, EL1NVVCT, 16, 1)
+FIELD(CNTHCTL, EVNTIS, 17, 1)
+FIELD(CNTHCTL, CNTVMASK, 18, 1)
+FIELD(CNTHCTL, CNTPMASK, 19, 1)
 
 /* We use a few fake FSR values for internal purposes in M profile.
  * M profile cores don't have A/R format FSRs, but currently our
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void gt_update_irq(ARMCPU *cpu, int timeridx)
      * It is RES0 in Secure and NonSecure state.
      */
     if ((ss == ARMSS_Root || ss == ARMSS_Realm) &&
-        ((timeridx == GTIMER_VIRT && (cnthctl & CNTHCTL_CNTVMASK)) ||
-         (timeridx == GTIMER_PHYS && (cnthctl & CNTHCTL_CNTPMASK)))) {
+        ((timeridx == GTIMER_VIRT && (cnthctl & R_CNTHCTL_CNTVMASK_MASK)) ||
+         (timeridx == GTIMER_PHYS && (cnthctl & R_CNTHCTL_CNTPMASK_MASK)))) {
         irqstate = 0;
     }
 
@@ -XXX,XX +XXX,XX @@ static void gt_cnthctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
 {
     ARMCPU *cpu = env_archcpu(env);
     uint32_t oldval = env->cp15.cnthctl_el2;
-
     raw_write(env, ri, value);
 
-    if ((oldval ^ value) & CNTHCTL_CNTVMASK) {
+    if ((oldval ^ value) & R_CNTHCTL_CNTVMASK_MASK) {
         gt_update_irq(cpu, GTIMER_VIRT);
-    } else if ((oldval ^ value) & CNTHCTL_CNTPMASK) {
+    } else if ((oldval ^ value) & R_CNTHCTL_CNTPMASK_MASK) {
         gt_update_irq(cpu, GTIMER_PHYS);
     }
 }
-- 
2.34.1

Don't allow the guest to write CNTHCTL_EL2 bits which don't exist.
This is not strictly architecturally required, but it is how we've
tended to implement registers more recently.

In particular, bits [19:18] are only present with FEAT_RME,
and bits [17:12] will only be present with FEAT_ECV.
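
A minimal sketch of the masking idea, under stated assumptions: the
function and macro names are hypothetical, the feature check is reduced
to a plain bool, and bits [11:0] are treated as always writable. Bits
[17:12] are left out because FEAT_ECV is not yet implemented at this
point in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CNTHCTL_BASE_BITS 0x00000fffu  /* bits [11:0] */
#define SKETCH_CNTHCTL_RME_BITS  0x000c0000u  /* bits [19:18], FEAT_RME only */

/* Return the value that would actually be stored for a guest write. */
static uint32_t sketch_cnthctl_sanitize(uint32_t value, bool has_rme)
{
    uint32_t valid_mask = SKETCH_CNTHCTL_BASE_BITS;

    if (has_rme) {
        valid_mask |= SKETCH_CNTHCTL_RME_BITS;
    }

    /* Clear RES0 bits so they always read back as zero. */
    value &= valid_mask;
    assert((value & ~valid_mask) == 0);
    return value;
}
```

Writing all-ones without FEAT_RME would then store 0xfff, and with
FEAT_RME it would store 0xc0fff.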

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240301183219.2424889-5-peter.maydell@linaro.org
---
 target/arm/helper.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void gt_cnthctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
 {
     ARMCPU *cpu = env_archcpu(env);
     uint32_t oldval = env->cp15.cnthctl_el2;
+    uint32_t valid_mask =
+        R_CNTHCTL_EL0PCTEN_E2H1_MASK |
+        R_CNTHCTL_EL0VCTEN_E2H1_MASK |
+        R_CNTHCTL_EVNTEN_MASK |
+        R_CNTHCTL_EVNTDIR_MASK |
+        R_CNTHCTL_EVNTI_MASK |
+        R_CNTHCTL_EL0VTEN_MASK |
+        R_CNTHCTL_EL0PTEN_MASK |
+        R_CNTHCTL_EL1PCTEN_E2H1_MASK |
+        R_CNTHCTL_EL1PTEN_MASK;
+
+    if (cpu_isar_feature(aa64_rme, cpu)) {
+        valid_mask |= R_CNTHCTL_CNTVMASK_MASK | R_CNTHCTL_CNTPMASK_MASK;
+    }
+
+    /* Clear RES0 bits */
+    value &= valid_mask;
+
     raw_write(env, ri, value);
 
     if ((oldval ^ value) & R_CNTHCTL_CNTVMASK_MASK) {
-- 
2.34.1

The functionality defined by ID_AA64MMFR0_EL1.ECV == 1 is:
 * four new trap bits for various counter and timer registers
 * the CNTHCTL_EL2.EVNTIS and CNTKCTL_EL1.EVNTIS bits which control
   scaling of the event stream. This is a no-op for us, because we don't
   implement the event stream (our WFE is a NOP): all we need to do is
   allow CNTHCTL_EL2.EVNTIS to be read and written.
 * extensions to PMSCR_EL1.PCT, PMSCR_EL2.PCT, TRFCR_EL1.TS and
   TRFCR_EL2.TS: these are all no-ops for us, because we don't implement
   FEAT_SPE or FEAT_TRF.
 * new registers CNTPCTSS_EL0 and CNTVCTSS_EL0 which are
   "self-synchronizing" views of CNTPCT_EL0 and CNTVCT_EL0, meaning
   that no barriers are needed around their accesses. For us these
   are just the same as the normal views, because all our sysregs are
   inherently self-synchronizing.

In this commit we implement the trap handling and permit the new
CNTHCTL_EL2 bits to be written.
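
The shape of the new virtual counter/timer trap checks can be sketched
as follows. This is a simplified model, not the helper.c code: all names
are hypothetical, and it assumes the check applies to any access from
below EL2 when EL2 is present. The bit positions are the ones used for
EL1TVT and EL1TVCT in the CNTHCTL field definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CNTHCTL_EL1TVT   (1ULL << 13)  /* traps virtual timer regs */
#define SKETCH_CNTHCTL_EL1TVCT  (1ULL << 14)  /* traps virtual counter reg */

typedef enum { SKETCH_ACCESS_OK, SKETCH_ACCESS_TRAP_EL2 } SketchAccess;
typedef enum { SKETCH_VIRT_TIMER, SKETCH_VIRT_COUNTER } SketchReg;

/* Decide whether an access to a virtual timer/counter register traps. */
static SketchAccess sketch_virt_access(int cur_el, bool has_el2,
                                       uint64_t cnthctl, SketchReg reg)
{
    uint64_t trap_bit = (reg == SKETCH_VIRT_COUNTER)
        ? SKETCH_CNTHCTL_EL1TVCT : SKETCH_CNTHCTL_EL1TVT;

    assert(cur_el >= 0 && cur_el <= 3);
    /* Assumption: only accesses from below EL2 are subject to the trap. */
    if (cur_el < 2 && has_el2 && (cnthctl & trap_bit)) {
        return SKETCH_ACCESS_TRAP_EL2;
    }
    return SKETCH_ACCESS_OK;
}
```

Each trap bit covers only its own register class, so setting EL1TVCT
alone leaves virtual timer register accesses untrapped in this model.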

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240301183219.2424889-6-peter.maydell@linaro.org
---
 target/arm/cpu-features.h |  5 ++++
 target/arm/helper.c       | 51 +++++++++++++++++++++++++++++++++++----
 2 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_fgt(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, FGT) != 0;
 }
 
+static inline bool isar_feature_aa64_ecv_traps(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, ECV) > 0;
+}
+
 static inline bool isar_feature_aa64_vh(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, VH) != 0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static CPAccessResult gt_counter_access(CPUARMState *env, int timeridx,
              : !extract32(env->cp15.cnthctl_el2, 0, 1))) {
             return CP_ACCESS_TRAP_EL2;
         }
+        if (has_el2 && timeridx == GTIMER_VIRT) {
+            if (FIELD_EX64(env->cp15.cnthctl_el2, CNTHCTL, EL1TVCT)) {
+                return CP_ACCESS_TRAP_EL2;
+            }
+        }
         break;
     }
     return CP_ACCESS_OK;
@@ -XXX,XX +XXX,XX @@ static CPAccessResult gt_timer_access(CPUARMState *env, int timeridx,
                 }
             }
         }
+        if (has_el2 && timeridx == GTIMER_VIRT) {
+            if (FIELD_EX64(env->cp15.cnthctl_el2, CNTHCTL, EL1TVT)) {
+                return CP_ACCESS_TRAP_EL2;
+            }
+        }
         break;
     }
     return CP_ACCESS_OK;
@@ -XXX,XX +XXX,XX @@ static void gt_cnthctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
     if (cpu_isar_feature(aa64_rme, cpu)) {
         valid_mask |= R_CNTHCTL_CNTVMASK_MASK | R_CNTHCTL_CNTPMASK_MASK;
     }
+    if (cpu_isar_feature(aa64_ecv_traps, cpu)) {
+        valid_mask |=
+            R_CNTHCTL_EL1TVT_MASK |
+            R_CNTHCTL_EL1TVCT_MASK |
+            R_CNTHCTL_EL1NVPCT_MASK |
+            R_CNTHCTL_EL1NVVCT_MASK |
+            R_CNTHCTL_EVNTIS_MASK;
+    }
 
     /* Clear RES0 bits */
     value &= valid_mask;
@@ -XXX,XX +XXX,XX @@ static CPAccessResult e2h_access(CPUARMState *env, const ARMCPRegInfo *ri,
 {
     if (arm_current_el(env) == 1) {
         /* This must be a FEAT_NV access */
-        /* TODO: FEAT_ECV will need to check CNTHCTL_EL2 here */
         return CP_ACCESS_OK;
     }
     if (!(arm_hcr_el2_eff(env) & HCR_E2H)) {
@@ -XXX,XX +XXX,XX @@ static CPAccessResult e2h_access(CPUARMState *env, const ARMCPRegInfo *ri,
     return CP_ACCESS_OK;
 }
 
+static CPAccessResult access_el1nvpct(CPUARMState *env, const ARMCPRegInfo *ri,
+                                      bool isread)
+{
+    if (arm_current_el(env) == 1) {
+        /* This must be a FEAT_NV access with NVx == 101 */
+        if (FIELD_EX64(env->cp15.cnthctl_el2, CNTHCTL, EL1NVPCT)) {
+            return CP_ACCESS_TRAP_EL2;
+        }
+    }
+    return e2h_access(env, ri, isread);
+}
+
+static CPAccessResult access_el1nvvct(CPUARMState *env, const ARMCPRegInfo *ri,
+                                      bool isread)
+{
+    if (arm_current_el(env) == 1) {
+        /* This must be a FEAT_NV access with NVx == 101 */
+        if (FIELD_EX64(env->cp15.cnthctl_el2, CNTHCTL, EL1NVVCT)) {
+            return CP_ACCESS_TRAP_EL2;
+        }
+    }
+    return e2h_access(env, ri, isread);
+}
+
 /* Test if system register redirection is to occur in the current state.  */
 static bool redirect_for_e2h(CPUARMState *env)
 {
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vhe_reginfo[] = {
     { .name = "CNTP_CTL_EL02", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 5, .crn = 14, .crm = 2, .opc2 = 1,
       .type = ARM_CP_IO | ARM_CP_ALIAS,
-      .access = PL2_RW, .accessfn = e2h_access,
+      .access = PL2_RW, .accessfn = access_el1nvpct,
       .nv2_redirect_offset = 0x180 | NV2_REDIR_NO_NV1,
       .fieldoffset = offsetof(CPUARMState, cp15.c14_timer[GTIMER_PHYS].ctl),
       .writefn = gt_phys_ctl_write, .raw_writefn = raw_write },
     { .name = "CNTV_CTL_EL02", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 5, .crn = 14, .crm = 3, .opc2 = 1,
       .type = ARM_CP_IO | ARM_CP_ALIAS,
-      .access = PL2_RW, .accessfn = e2h_access,
+      .access = PL2_RW, .accessfn = access_el1nvvct,
       .nv2_redirect_offset = 0x170 | NV2_REDIR_NO_NV1,
       .fieldoffset = offsetof(CPUARMState, cp15.c14_timer[GTIMER_VIRT].ctl),
       .writefn = gt_virt_ctl_write, .raw_writefn = raw_write },
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo vhe_reginfo[] = {
       .type = ARM_CP_IO | ARM_CP_ALIAS,
       .fieldoffset = offsetof(CPUARMState, cp15.c14_timer[GTIMER_PHYS].cval),
       .nv2_redirect_offset = 0x178 | NV2_REDIR_NO_NV1,
-      .access = PL2_RW, .accessfn = e2h_access,
+      .access = PL2_RW, .accessfn = access_el1nvpct,
       .writefn = gt_phys_cval_write, .raw_writefn = raw_write },
     { .name = "CNTV_CVAL_EL02", .state = ARM_CP_STATE_AA64,
       .opc0 = 3, .opc1 = 5, .crn = 14, .crm = 3, .opc2 = 2,
       .type = ARM_CP_IO | ARM_CP_ALIAS,
       .nv2_redirect_offset = 0x168 | NV2_REDIR_NO_NV1,
       .fieldoffset = offsetof(CPUARMState, cp15.c14_timer[GTIMER_VIRT].cval),
-      .access = PL2_RW, .accessfn = e2h_access,
+      .access = PL2_RW, .accessfn = access_el1nvvct,
       .writefn = gt_virt_cval_write, .raw_writefn = raw_write },
 #endif
 };
-- 
2.34.1

For FEAT_ECV, new registers CNTPCTSS_EL0 and CNTVCTSS_EL0 are
defined, which are "self-synchronized" views of the physical and
virtual counts as seen in the CNTPCT_EL0 and CNTVCT_EL0 registers
(meaning that no barriers are needed around accesses to them to
ensure that reads of them do not occur speculatively and out-of-order
with other instructions).

For QEMU, all our system registers are self-synchronized, so we can
simply copy the existing implementation of CNTPCT_EL0 and CNTVCT_EL0
to the new register encodings.

This means we now implement all the functionality required for
ID_AA64MMFR0_EL1.ECV == 0b0001.
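
For reference, the two ECV levels used in this series can be modelled
with the short sketch below. The helper names are hypothetical, and the
field position (bits [63:60] of ID_AA64MMFR0_EL1) is taken from the
architecture rather than from this patch. A value of 1 or more provides
the traps and the self-synchronized counter views, while a value of 2 or
more additionally provides CNTPOFF_EL2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ECV_SHIFT 60   /* ECV is a 4-bit field at bits [63:60] */

/* Read the ECV level out of an ID_AA64MMFR0_EL1 value. */
static unsigned sketch_ecv_level(uint64_t id_aa64mmfr0)
{
    return (unsigned)((id_aa64mmfr0 >> SKETCH_ECV_SHIFT) & 0xf);
}

/* Return a copy of the ID value with the ECV field set to 'level'. */
static uint64_t sketch_set_ecv(uint64_t id_aa64mmfr0, unsigned level)
{
    assert(level <= 0xf);
    id_aa64mmfr0 &= ~(0xfULL << SKETCH_ECV_SHIFT);
    return id_aa64mmfr0 | ((uint64_t)level << SKETCH_ECV_SHIFT);
}

/* Level 1 and above: trap bits plus CNTPCTSS_EL0/CNTVCTSS_EL0. */
static bool sketch_has_ecv_traps(uint64_t id_aa64mmfr0)
{
    return sketch_ecv_level(id_aa64mmfr0) > 0;
}

/* Level 2 and above: CNTPOFF_EL2 is also implemented. */
static bool sketch_has_cntpoff(uint64_t id_aa64mmfr0)
{
    return sketch_ecv_level(id_aa64mmfr0) > 1;
}
```

Using "greater than" comparisons rather than equality keeps the checks
valid if a CPU reports a higher ECV value than the ones handled here.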

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240301183219.2424889-7-peter.maydell@linaro.org
---
 target/arm/helper.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = {
     },
 };
 
+/*
+ * FEAT_ECV adds extra views of CNTVCT_EL0 and CNTPCT_EL0 which
+ * are "self-synchronizing". For QEMU all sysregs are self-synchronizing,
+ * so our implementations here are identical to the normal registers.
+ */
+static const ARMCPRegInfo gen_timer_ecv_cp_reginfo[] = {
+    { .name = "CNTVCTSS", .cp = 15, .crm = 14, .opc1 = 9,
+      .access = PL0_R, .type = ARM_CP_64BIT | ARM_CP_NO_RAW | ARM_CP_IO,
+      .accessfn = gt_vct_access,
+      .readfn = gt_virt_cnt_read, .resetfn = arm_cp_reset_ignore,
+    },
+    { .name = "CNTVCTSS_EL0", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 0, .opc2 = 6,
+      .access = PL0_R, .type = ARM_CP_NO_RAW | ARM_CP_IO,
+      .accessfn = gt_vct_access, .readfn = gt_virt_cnt_read,
+    },
+    { .name = "CNTPCTSS", .cp = 15, .crm = 14, .opc1 = 8,
+      .access = PL0_R, .type = ARM_CP_64BIT | ARM_CP_NO_RAW | ARM_CP_IO,
+      .accessfn = gt_pct_access,
+      .readfn = gt_cnt_read, .resetfn = arm_cp_reset_ignore,
+    },
+    { .name = "CNTPCTSS_EL0", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 0, .opc2 = 5,
+      .access = PL0_R, .type = ARM_CP_NO_RAW | ARM_CP_IO,
+      .accessfn = gt_pct_access, .readfn = gt_cnt_read,
+    },
+};
+
 #else
 
 /*
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo generic_timer_cp_reginfo[] = {
     },
 };
 
+/*
+ * CNTVCTSS_EL0 has the same trap conditions as CNTVCT_EL0, so it also
+ * is exposed to userspace by Linux.
+ */
+static const ARMCPRegInfo gen_timer_ecv_cp_reginfo[] = {
+    { .name = "CNTVCTSS_EL0", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 3, .crn = 14, .crm = 0, .opc2 = 6,
+      .access = PL0_R, .type = ARM_CP_NO_RAW | ARM_CP_IO,
+      .readfn = gt_virt_cnt_read,
+    },
+};
+
 #endif
 
 static void par_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
     if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) {
         define_arm_cp_regs(cpu, generic_timer_cp_reginfo);
     }
+    if (cpu_isar_feature(aa64_ecv_traps, cpu)) {
+        define_arm_cp_regs(cpu, gen_timer_ecv_cp_reginfo);
+    }
     if (arm_feature(env, ARM_FEATURE_VAPA)) {
         ARMCPRegInfo vapa_cp_reginfo[] = {
             { .name = "PAR", .cp = 15, .crn = 7, .crm = 4, .opc1 = 0, .opc2 = 0,
-- 
2.34.1

When ID_AA64MMFR0_EL1.ECV is 0b0010, a new register CNTPOFF_EL2 is
implemented.  This is similar to the existing CNTVOFF_EL2, except
that it controls a hypervisor-adjustable offset applied to the physical
counter and timer.

Implement the handling for this register, which includes control/trap
bits in SCR_EL3 and CNTHCTL_EL2.
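
The conditions under which the offset takes effect can be sketched in
isolation as below. This is a simplified model under stated assumptions:
the struct and function names are hypothetical, and the control bits are
reduced to plain booleans instead of being read from SCR_EL3, CNTHCTL_EL2
and the effective HCR_EL2 value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of the state that gates CNTPOFF_EL2. */
typedef struct {
    bool scr_ecven;     /* SCR_EL3.ECVEn */
    bool cnthctl_ecv;   /* CNTHCTL_EL2.ECV */
    bool el2_enabled;   /* EL2 is enabled in the current security state */
    bool hcr_e2h;       /* effective HCR_EL2.E2H */
    bool hcr_tge;       /* effective HCR_EL2.TGE */
    uint64_t cntpoff;   /* CNTPOFF_EL2 */
    int cur_el;         /* current exception level, 0..3 */
} SketchTimerState;

/* Offset used for the timer itself, independent of the current EL. */
static uint64_t sketch_phys_raw_offset(const SketchTimerState *s)
{
    if (s->scr_ecven && s->cnthctl_ecv && s->el2_enabled &&
        !(s->hcr_e2h && s->hcr_tge)) {
        return s->cntpoff;
    }
    return 0;
}

/* Offset seen by a register read: EL2 and EL3 see no offset. */
static uint64_t sketch_phys_offset(const SketchTimerState *s)
{
    assert(s->cur_el >= 0 && s->cur_el <= 3);
    if (s->cur_el >= 2) {
        return 0;
    }
    return sketch_phys_raw_offset(s);
}

/* Physical count as read; unsigned 64-bit arithmetic wraps by design. */
static uint64_t sketch_phys_count(const SketchTimerState *s, uint64_t raw)
{
    return raw - sketch_phys_offset(s);
}
```

Keeping a "raw" offset separate from the EL-dependent one lets timer
recalculation use the same value regardless of which exception level
happens to be running when the recalculation occurs.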

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240301183219.2424889-8-peter.maydell@linaro.org
---
 target/arm/cpu-features.h |  5 +++
 target/arm/cpu.h          |  1 +
 target/arm/helper.c       | 68 +++++++++++++++++++++++++++++++++++++--
 target/arm/trace-events   |  1 +
 4 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -XXX,XX +XXX,XX @@ static inline bool isar_feature_aa64_ecv_traps(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, ECV) > 0;
 }
 
+static inline bool isar_feature_aa64_ecv(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr0, ID_AA64MMFR0, ECV) > 1;
+}
+
 static inline bool isar_feature_aa64_vh(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, VH) != 0;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -XXX,XX +XXX,XX @@ typedef struct CPUArchState {
         uint64_t c14_cntkctl; /* Timer Control register */
         uint64_t cnthctl_el2; /* Counter/Timer Hyp Control register */
         uint64_t cntvoff_el2; /* Counter Virtual Offset register */
+        uint64_t cntpoff_el2; /* Counter Physical Offset register */
         ARMGenericTimer c14_timer[NUM_GTIMERS];
         uint32_t c15_cpar; /* XScale Coprocessor Access Register */
         uint32_t c15_ticonfig; /* TI925T configuration byte.  */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void scr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
         if (cpu_isar_feature(aa64_rme, cpu)) {
             valid_mask |= SCR_NSE | SCR_GPF;
         }
+        if (cpu_isar_feature(aa64_ecv, cpu)) {
+            valid_mask |= SCR_ECVEN;
+        }
     } else {
         valid_mask &= ~(SCR_RW | SCR_ST);
         if (cpu_isar_feature(aa32_ras, cpu)) {
@@ -XXX,XX +XXX,XX @@ void gt_rme_post_el_change(ARMCPU *cpu, void *ignored)
     gt_update_irq(cpu, GTIMER_PHYS);
 }
 
+static uint64_t gt_phys_raw_cnt_offset(CPUARMState *env)
+{
+    if ((env->cp15.scr_el3 & SCR_ECVEN) &&
+        FIELD_EX64(env->cp15.cnthctl_el2, CNTHCTL, ECV) &&
+        arm_is_el2_enabled(env) &&
+        (arm_hcr_el2_eff(env) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
+        return env->cp15.cntpoff_el2;
+    }
+    return 0;
+}
+
+static uint64_t gt_phys_cnt_offset(CPUARMState *env)
+{
+    if (arm_current_el(env) >= 2) {
+        return 0;
+    }
+    return gt_phys_raw_cnt_offset(env);
+}
+
 static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
 {
     ARMGenericTimer *gt = &cpu->env.cp15.c14_timer[timeridx];
@@ -XXX,XX +XXX,XX @@ static void gt_recalc_timer(ARMCPU *cpu, int timeridx)
          * reset timer to when ISTATUS next has to change
          */
         uint64_t offset = timeridx == GTIMER_VIRT ?
-                                      cpu->env.cp15.cntvoff_el2 : 0;
+            cpu->env.cp15.cntvoff_el2 : gt_phys_raw_cnt_offset(&cpu->env);
         uint64_t count = gt_get_countervalue(&cpu->env);
         /* Note that this must be unsigned 64 bit arithmetic: */
         int istatus = count - offset >= gt->cval;
@@ -XXX,XX +XXX,XX @@ static void gt_timer_reset(CPUARMState *env, const ARMCPRegInfo *ri,
 
 static uint64_t gt_cnt_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
-    return gt_get_countervalue(env);
+    return gt_get_countervalue(env) - gt_phys_cnt_offset(env);
 }
 
 static uint64_t gt_virt_cnt_offset(CPUARMState *env)
@@ -XXX,XX +XXX,XX @@ static uint64_t gt_tval_read(CPUARMState *env, const ARMCPRegInfo *ri,
     case GTIMER_HYPVIRT:
         offset = gt_virt_cnt_offset(env);
         break;
+    case GTIMER_PHYS:
+        offset = gt_phys_cnt_offset(env);
+        break;
     }
 
     return (uint32_t)(env->cp15.c14_timer[timeridx].cval -
@@ -XXX,XX +XXX,XX @@ static void gt_tval_write(CPUARMState *env, const ARMCPRegInfo *ri,
     case GTIMER_HYPVIRT:
         offset = gt_virt_cnt_offset(env);
         break;
+    case GTIMER_PHYS:
+        offset = gt_phys_cnt_offset(env);
+        break;
     }
 
     trace_arm_gt_tval_write(timeridx, value);
@@ -XXX,XX +XXX,XX @@ static void gt_cnthctl_write(CPUARMState *env, const ARMCPRegInfo *ri,
             R_CNTHCTL_EL1NVVCT_MASK |
             R_CNTHCTL_EVNTIS_MASK;
     }
+    if (cpu_isar_feature(aa64_ecv, cpu)) {
+        valid_mask |= R_CNTHCTL_ECV_MASK;
+    }
 
     /* Clear RES0 bits */
     value &= valid_mask;
@@ -XXX,XX +XXX,XX @@ static const ARMCPRegInfo gen_timer_ecv_cp_reginfo[] = {
     },
 };
 
+static CPAccessResult gt_cntpoff_access(CPUARMState *env,
+                                        const ARMCPRegInfo *ri,
+                                        bool isread)
+{
+    if (arm_current_el(env) == 2 && !(env->cp15.scr_el3 & SCR_ECVEN)) {
+        return CP_ACCESS_TRAP_EL3;
+    }
+    return CP_ACCESS_OK;
+}
+
+static void gt_cntpoff_write(CPUARMState *env, const ARMCPRegInfo *ri,
+                              uint64_t value)
+{
+    ARMCPU *cpu = env_archcpu(env);
+
+    trace_arm_gt_cntpoff_write(value);
+    raw_write(env, ri, value);
+    gt_recalc_timer(cpu, GTIMER_PHYS);
+}
+
+static const ARMCPRegInfo gen_timer_cntpoff_reginfo = {
+    .name = "CNTPOFF_EL2", .state = ARM_CP_STATE_AA64,
+    .opc0 = 3, .opc1 = 4, .crn = 14, .crm = 0, .opc2 = 6,
+    .access = PL2_RW, .type = ARM_CP_IO, .resetvalue = 0,
+    .accessfn = gt_cntpoff_access, .writefn = gt_cntpoff_write,
+    .nv2_redirect_offset = 0x1a8,
+    .fieldoffset = offsetof(CPUARMState, cp15.cntpoff_el2),
+};
 #else
 
 /*
@@ -XXX,XX +XXX,XX @@ void register_cp_regs_for_features(ARMCPU *cpu)
     if (cpu_isar_feature(aa64_ecv_traps, cpu)) {
         define_arm_cp_regs(cpu, gen_timer_ecv_cp_reginfo);
     }
+#ifndef CONFIG_USER_ONLY
+    if (cpu_isar_feature(aa64_ecv, cpu)) {
+        define_one_arm_cp_reg(cpu, &gen_timer_cntpoff_reginfo);
+    }
+#endif
     if (arm_feature(env, ARM_FEATURE_VAPA)) {
         ARMCPRegInfo vapa_cp_reginfo[] = {
             { .name = "PAR", .cp = 15, .crn = 7, .crm = 4, .opc1 = 0, .opc2 = 0,
diff --git a/target/arm/trace-events b/target/arm/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/trace-events
+++ b/target/arm/trace-events
@@ -XXX,XX +XXX,XX @@ arm_gt_tval_write(int timer, uint64_t value) "gt_tval_write: timer %d value 0x%"
 arm_gt_ctl_write(int timer, uint64_t value) "gt_ctl_write: timer %d value 0x%" PRIx64
 arm_gt_imask_toggle(int timer) "gt_ctl_write: timer %d IMASK toggle"
 arm_gt_cntvoff_write(uint64_t value) "gt_cntvoff_write: value 0x%" PRIx64
+arm_gt_cntpoff_write(uint64_t value) "gt_cntpoff_write: value 0x%" PRIx64
 arm_gt_update_irq(int timer, int irqstate) "gt_update_irq: timer %d irqstate %d"
 
 # kvm.c
-- 
2.34.1

Enable all FEAT_ECV features on the 'max' CPU.
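
As a rough illustration of what "all" means here: the ECV field of
ID_AA64MMFR0_EL1 sits in bits [63:60], where 1 advertises the ECV traps
and 2 additionally advertises CNTPOFF_EL2. The standalone sketch below
decodes that field. The helper names (ecv_field, has_ecv_traps,
has_ecv_cntpoff, set_ecv_field) are hypothetical and only mirror the
aa64_ecv_traps / aa64_ecv tests used in this series; they are not
QEMU's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64MMFR0_EL1.ECV occupies bits [63:60]. */
#define ECV_SHIFT 60
#define ECV_MASK  0xfULL

/* Extract the 4-bit ECV field from a raw ID register value. */
static unsigned ecv_field(uint64_t id_aa64mmfr0)
{
    return (unsigned)((id_aa64mmfr0 >> ECV_SHIFT) & ECV_MASK);
}

/* Hypothetical predicate: ECV >= 1 means the ECV traps are present. */
static bool has_ecv_traps(uint64_t id_aa64mmfr0)
{
    return ecv_field(id_aa64mmfr0) >= 1;
}

/* Hypothetical predicate: ECV >= 2 means CNTPOFF_EL2 is present too. */
static bool has_ecv_cntpoff(uint64_t id_aa64mmfr0)
{
    return ecv_field(id_aa64mmfr0) >= 2;
}

/* Deposit a new ECV value, leaving every other field untouched. */
static uint64_t set_ecv_field(uint64_t id_aa64mmfr0, unsigned value)
{
    assert(value <= ECV_MASK);
    id_aa64mmfr0 &= ~(ECV_MASK << ECV_SHIFT);
    return id_aa64mmfr0 | ((uint64_t)value << ECV_SHIFT);
}
```

With the value 2 written by this patch, both predicates hold, which is
why the CNTPOFF_EL2 register added earlier in the series becomes
visible on 'max'.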

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240301183219.2424889-9-peter.maydell@linaro.org
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/tcg/cpu64.c        | 1 +
 2 files changed, 2 insertions(+)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -XXX,XX +XXX,XX @@ the following architecture extensions:
 - FEAT_DotProd (Advanced SIMD dot product instructions)
 - FEAT_DoubleFault (Double Fault Extension)
 - FEAT_E0PD (Preventing EL0 access to halves of address maps)
+- FEAT_ECV (Enhanced Counter Virtualization)
 - FEAT_EPAC (Enhanced pointer authentication)
 - FEAT_ETS (Enhanced Translation Synchronization)
 - FEAT_EVT (Enhanced Virtualization Traps)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -XXX,XX +XXX,XX @@ void aarch64_max_tcg_initfn(Object *obj)
     t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN64_2, 2); /* 64k stage2 supported */
     t = FIELD_DP64(t, ID_AA64MMFR0, TGRAN4_2, 2);  /*  4k stage2 supported */
     t = FIELD_DP64(t, ID_AA64MMFR0, FGT, 1);       /* FEAT_FGT */
+    t = FIELD_DP64(t, ID_AA64MMFR0, ECV, 2);       /* FEAT_ECV */
     cpu->isar.id_aa64mmfr0 = t;
 
     t = cpu->isar.id_aa64mmfr1;
-- 
2.34.1

From: Inès Varhol <ines.varhol@telecom-paris.fr>

Features supported:
- the 8 STM32L4x5 GPIOs are initialized with their reset values
    (except IDR, see below)
- input mode: driving a pin configured in input mode "externally"
    (using input irqs) results in an out irq (transmitted to SYSCFG)
- output mode: setting a bit in ODR sets the corresponding out irq
    (if this line is configured in output mode)
- pull-up, pull-down
- push-pull, open-drain
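
Besides direct ODR writes, the write handler in this patch also accepts
BSRR and BRR, which change ODR bit by bit. The sketch below restates
that decoding outside QEMU, as a rough illustration only. The function
names apply_bsrr and apply_brr and the NUM_PINS / LOW_HALF macros are
hypothetical; the set-over-reset priority follows the comment in
stm32l4x5_gpio_write().

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PINS 16
#define LOW_HALF 0x0000FFFFu   /* the 16 pin bits; upper half is reserved */

/*
 * BSRR: bits [15:0] set ODR bits, bits [31:16] reset them.
 * Resets are applied first, so a set wins when both are requested.
 */
static uint32_t apply_bsrr(uint32_t odr, uint32_t value)
{
    uint32_t bits_to_reset = (value >> NUM_PINS) & LOW_HALF;
    uint32_t bits_to_set = value & LOW_HALF;

    assert((odr & ~LOW_HALF) == 0);
    odr &= ~bits_to_reset;
    odr |= bits_to_set;
    return odr;
}

/* BRR: bits [15:0] reset ODR bits, the upper half is ignored. */
static uint32_t apply_brr(uint32_t odr, uint32_t value)
{
    assert((odr & ~LOW_HALF) == 0);
    return odr & ~(value & LOW_HALF);
}
```

In the device model each of these updates is followed by a
recomputation of IDR, which is what raises or lowers the out irqs.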

Differences from the real GPIOs:
- Alternate Function and Analog modes aren't implemented:
    pins in AF/Analog mode behave like pins in input mode
- floating pins stay at their last value
- the reset values of register IDR differ from the real ones:
    they are consistent with the reset values of the other registers
    and with the fact that AF/Analog modes aren't implemented
- setting I/O output speed isn't supported
- locking port bits isn't supported
- ADC function isn't supported
- GPIOH has 16 pins instead of 2 pins
- writing to registers LCKR, AFRL, AFRH and ASCR is ineffective
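
The "floating pins stay at their last value" rule comes down to a
masked merge when IDR is recomputed. The following is a minimal
standalone sketch of that idea, not the device code itself: the names
merge_idr and resolve_disconnected_input are hypothetical, and the
PUPDR encoding (1 = pull-up, 2 = pull-down) is taken from the
is_pull_up()/is_pull_down() helpers in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Level of a disconnected input pin, given its 2-bit PUPDR field.
 * Returns false when the pin floats, in which case *level is untouched.
 */
static bool resolve_disconnected_input(unsigned pupd, bool *level)
{
    assert(level != NULL);
    if (pupd == 1) {            /* pull-up */
        *level = true;
        return true;
    }
    if (pupd == 2) {            /* pull-down */
        *level = false;
        return true;
    }
    return false;               /* no pull resistor: floating */
}

/*
 * Merge freshly resolved levels into IDR. Pins whose bit is clear in
 * resolved_mask are floating and therefore keep their previous value.
 */
static uint32_t merge_idr(uint32_t old_idr, uint32_t new_levels,
                          uint32_t resolved_mask)
{
    return (old_idr & ~resolved_mask) | (new_levels & resolved_mask);
}
```

A pin that was last read high and then loses both its driver and its
pull resistor therefore keeps reading high, unlike real hardware where
the value would be undefined.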

Signed-off-by: Arnaud Minier <arnaud.minier@telecom-paris.fr>
Signed-off-by: Inès Varhol <ines.varhol@telecom-paris.fr>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20240305210444.310665-2-ines.varhol@telecom-paris.fr
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 MAINTAINERS                        |   1 +
 docs/system/arm/b-l475e-iot01a.rst |   2 +-
 include/hw/gpio/stm32l4x5_gpio.h   |  70 +++++
 hw/gpio/stm32l4x5_gpio.c           | 477 +++++++++++++++++++++++++++++
 hw/gpio/Kconfig                    |   3 +
 hw/gpio/meson.build                |   1 +
 hw/gpio/trace-events               |   6 +
 7 files changed, 559 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/gpio/stm32l4x5_gpio.h
 create mode 100644 hw/gpio/stm32l4x5_gpio.c

diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: hw/arm/stm32l4x5_soc.c
 F: hw/misc/stm32l4x5_exti.c
 F: hw/misc/stm32l4x5_syscfg.c
 F: hw/misc/stm32l4x5_rcc.c
+F: hw/gpio/stm32l4x5_gpio.c
 F: include/hw/*/stm32l4x5_*.h
 
 B-L475E-IOT01A IoT Node
diff --git a/docs/system/arm/b-l475e-iot01a.rst b/docs/system/arm/b-l475e-iot01a.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/b-l475e-iot01a.rst
+++ b/docs/system/arm/b-l475e-iot01a.rst
@@ -XXX,XX +XXX,XX @@ Currently B-L475E-IOT01A machine's only supports the following devices:
 - STM32L4x5 EXTI (Extended interrupts and events controller)
 - STM32L4x5 SYSCFG (System configuration controller)
 - STM32L4x5 RCC (Reset and clock control)
+- STM32L4x5 GPIOs (General-purpose I/Os)
 
 Missing devices
 """""""""""""""
@@ -XXX,XX +XXX,XX @@ Missing devices
 The B-L475E-IOT01A does *not* support the following devices:
 
 - Serial ports (UART)
-- General-purpose I/Os (GPIO)
 - Analog to Digital Converter (ADC)
 - SPI controller
 - Timer controller (TIMER)
diff --git a/include/hw/gpio/stm32l4x5_gpio.h b/include/hw/gpio/stm32l4x5_gpio.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/hw/gpio/stm32l4x5_gpio.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * STM32L4x5 GPIO (General Purpose Input/Output)
+ *
+ * Copyright (c) 2024 Arnaud Minier <arnaud.minier@telecom-paris.fr>
+ * Copyright (c) 2024 Inès Varhol <ines.varhol@telecom-paris.fr>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+/*
+ * The reference used is the STMicroelectronics RM0351 Reference manual
+ * for STM32L4x5 and STM32L4x6 advanced Arm®-based 32-bit MCUs.
+ * https://www.st.com/en/microcontrollers-microprocessors/stm32l4x5/documentation.html
+ */
+
+#ifndef HW_STM32L4X5_GPIO_H
+#define HW_STM32L4X5_GPIO_H
+
+#include "hw/sysbus.h"
+#include "qom/object.h"
+
+#define TYPE_STM32L4X5_GPIO "stm32l4x5-gpio"
+OBJECT_DECLARE_SIMPLE_TYPE(Stm32l4x5GpioState, STM32L4X5_GPIO)
+
+#define GPIO_NUM_PINS 16
+
+struct Stm32l4x5GpioState {
+    SysBusDevice parent_obj;
+
+    MemoryRegion mmio;
+
+    /* GPIO registers */
+    uint32_t moder;
+    uint32_t otyper;
+    uint32_t ospeedr;
+    uint32_t pupdr;
+    uint32_t idr;
+    uint32_t odr;
+    uint32_t lckr;
+    uint32_t afrl;
+    uint32_t afrh;
+    uint32_t ascr;
+
+    /* GPIO registers reset values */
+    uint32_t moder_reset;
+    uint32_t ospeedr_reset;
+    uint32_t pupdr_reset;
+
+    /*
+     * External driving of pins.
+     * The pins can be set externally through the device's
+     * anonymous input GPIO lines under certain conditions.
+     * The pin must not be in push-pull output mode,
+     * and can't be set high in open-drain mode.
+     * Pins driven externally and configured to
+     * output mode will in general be "disconnected"
+     * (see `get_gpio_pinmask_to_disconnect()`)
+     */
+    uint16_t disconnected_pins;
+    uint16_t pins_connected_high;
+
+    char *name;
+    Clock *clk;
+    qemu_irq pin[GPIO_NUM_PINS];
+};
+
+#endif
diff --git a/hw/gpio/stm32l4x5_gpio.c b/hw/gpio/stm32l4x5_gpio.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/hw/gpio/stm32l4x5_gpio.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * STM32L4x5 GPIO (General Purpose Input/Output)
+ *
+ * Copyright (c) 2024 Arnaud Minier <arnaud.minier@telecom-paris.fr>
+ * Copyright (c) 2024 Inès Varhol <ines.varhol@telecom-paris.fr>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+/*
+ * The reference used is the STMicroelectronics RM0351 Reference manual
+ * for STM32L4x5 and STM32L4x6 advanced Arm®-based 32-bit MCUs.
+ * https://www.st.com/en/microcontrollers-microprocessors/stm32l4x5/documentation.html
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/gpio/stm32l4x5_gpio.h"
+#include "hw/irq.h"
+#include "hw/qdev-clock.h"
+#include "hw/qdev-properties.h"
+#include "qapi/visitor.h"
+#include "qapi/error.h"
+#include "migration/vmstate.h"
+#include "trace.h"
+
+#define GPIO_MODER 0x00
+#define GPIO_OTYPER 0x04
+#define GPIO_OSPEEDR 0x08
+#define GPIO_PUPDR 0x0C
+#define GPIO_IDR 0x10
+#define GPIO_ODR 0x14
+#define GPIO_BSRR 0x18
+#define GPIO_LCKR 0x1C
+#define GPIO_AFRL 0x20
+#define GPIO_AFRH 0x24
+#define GPIO_BRR 0x28
+#define GPIO_ASCR 0x2C
+
+/* 0b11111111_11111111_00000000_00000000 */
+#define RESERVED_BITS_MASK 0xFFFF0000
+
+static void update_gpio_idr(Stm32l4x5GpioState *s);
+
+static bool is_pull_up(Stm32l4x5GpioState *s, unsigned pin)
+{
+    return extract32(s->pupdr, 2 * pin, 2) == 1;
+}
+
+static bool is_pull_down(Stm32l4x5GpioState *s, unsigned pin)
+{
+    return extract32(s->pupdr, 2 * pin, 2) == 2;
+}
+
+static bool is_output(Stm32l4x5GpioState *s, unsigned pin)
+{
+    return extract32(s->moder, 2 * pin, 2) == 1;
+}
+
+static bool is_open_drain(Stm32l4x5GpioState *s, unsigned pin)
+{
+    return extract32(s->otyper, pin, 1) == 1;
+}
+
+static bool is_push_pull(Stm32l4x5GpioState *s, unsigned pin)
+{
+    return extract32(s->otyper, pin, 1) == 0;
+}
+
+static void stm32l4x5_gpio_reset_hold(Object *obj)
+{
+    Stm32l4x5GpioState *s = STM32L4X5_GPIO(obj);
+
+    s->moder = s->moder_reset;
+    s->otyper = 0x00000000;
+    s->ospeedr = s->ospeedr_reset;
+    s->pupdr = s->pupdr_reset;
+    s->idr = 0x00000000;
+    s->odr = 0x00000000;
+    s->lckr = 0x00000000;
+    s->afrl = 0x00000000;
+    s->afrh = 0x00000000;
+    s->ascr = 0x00000000;
+
+    s->disconnected_pins = 0xFFFF;
+    s->pins_connected_high = 0x0000;
+    update_gpio_idr(s);
+}
+
+static void stm32l4x5_gpio_set(void *opaque, int line, int level)
+{
+    Stm32l4x5GpioState *s = opaque;
+    /*
+     * The pin isn't set if line is configured in output mode
+     * except if level is 0 and the output is open-drain.
+     * This way there will be no short-circuit prone situations.
+     */
+    if (is_output(s, line) && !(is_open_drain(s, line) && (level == 0))) {
+        qemu_log_mask(LOG_GUEST_ERROR, "Line %d can't be driven externally\n",
+                      line);
+        return;
+    }
+
+    s->disconnected_pins &= ~(1 << line);
+    if (level) {
+        s->pins_connected_high |= (1 << line);
+    } else {
+        s->pins_connected_high &= ~(1 << line);
+    }
+    trace_stm32l4x5_gpio_pins(s->name, s->disconnected_pins,
+                              s->pins_connected_high);
+    update_gpio_idr(s);
+}
+
+
+static void update_gpio_idr(Stm32l4x5GpioState *s)
+{
+    uint32_t new_idr_mask = 0;
+    uint32_t new_idr = s->odr;
+    uint32_t old_idr = s->idr;
+    int new_pin_state, old_pin_state;
+
+    for (int i = 0; i < GPIO_NUM_PINS; i++) {
+        if (is_output(s, i)) {
+            if (is_push_pull(s, i)) {
+                new_idr_mask |= (1 << i);
+            } else if (!(s->odr & (1 << i))) {
+                /* open-drain ODR 0 */
+                new_idr_mask |= (1 << i);
+            /* open-drain ODR 1 */
+            } else if (!(s->disconnected_pins & (1 << i)) &&
+                       !(s->pins_connected_high & (1 << i))) {
+                /* open-drain ODR 1 with pin connected low */
+                new_idr_mask |= (1 << i);
+                new_idr &= ~(1 << i);
+            /* open-drain ODR 1 with inactive pin */
+            } else if (is_pull_up(s, i)) {
+                new_idr_mask |= (1 << i);
+            } else if (is_pull_down(s, i)) {
+                new_idr_mask |= (1 << i);
+                new_idr &= ~(1 << i);
+            }
+            /*
+             * The only case left is for open-drain ODR 1
+             * with inactive pin without pull-up or pull-down:
+             * the value is floating.
+             */
+        /* input or analog mode with connected pin */
+        } else if (!(s->disconnected_pins & (1 << i))) {
+            if (s->pins_connected_high & (1 << i)) {
+                /* pin high */
+                new_idr_mask |= (1 << i);
+                new_idr |= (1 << i);
+            } else {
+                /* pin low */
+                new_idr_mask |= (1 << i);
+                new_idr &= ~(1 << i);
+            }
+        /* input or analog mode with disconnected pin */
+        } else {
+            if (is_pull_up(s, i)) {
+                /* pull-up */
+                new_idr_mask |= (1 << i);
+                new_idr |= (1 << i);
+            } else if (is_pull_down(s, i)) {
+                /* pull-down */
+                new_idr_mask |= (1 << i);
+                new_idr &= ~(1 << i);
+            }
+            /*
+             * The only case left is for a disconnected pin
+             * without pull-up or pull-down :
+             * the value is floating.
+             */
+        }
+    }
+
+    s->idr = (old_idr & ~new_idr_mask) | (new_idr & new_idr_mask);
+    trace_stm32l4x5_gpio_update_idr(s->name, old_idr, s->idr);
+
+    for (int i = 0; i < GPIO_NUM_PINS; i++) {
+        if (new_idr_mask & (1 << i)) {
+            new_pin_state = (new_idr & (1 << i)) > 0;
+            old_pin_state = (old_idr & (1 << i)) > 0;
+            if (new_pin_state > old_pin_state) {
+                qemu_irq_raise(s->pin[i]);
+            } else if (new_pin_state < old_pin_state) {
+                qemu_irq_lower(s->pin[i]);
+            }
+        }
+    }
+}
+
+/*
+ * Return mask of pins that are both configured in output
+ * mode and externally driven (except pins in open-drain
+ * mode externally set to 0).
+ */
+static uint32_t get_gpio_pinmask_to_disconnect(Stm32l4x5GpioState *s)
+{
+    uint32_t pins_to_disconnect = 0;
+    for (int i = 0; i < GPIO_NUM_PINS; i++) {
+        /* for each connected pin in output mode */
+        if (!(s->disconnected_pins & (1 << i)) && is_output(s, i)) {
+            /* if either push-pull or high level */
+            if (is_push_pull(s, i) || s->pins_connected_high & (1 << i)) {
+                pins_to_disconnect |= (1 << i);
+                qemu_log_mask(LOG_GUEST_ERROR,
+                              "Line %d can't be driven externally\n",
+                              i);
+            }
+        }
+    }
+    return pins_to_disconnect;
+}
+
+/*
+ * Set field `disconnected_pins` and call `update_gpio_idr()`
+ */
+static void disconnect_gpio_pins(Stm32l4x5GpioState *s, uint16_t lines)
+{
+    s->disconnected_pins |= lines;
+    trace_stm32l4x5_gpio_pins(s->name, s->disconnected_pins,
+                              s->pins_connected_high);
+    update_gpio_idr(s);
+}
+
+static void disconnected_pins_set(Object *obj, Visitor *v,
+    const char *name, void *opaque, Error **errp)
+{
+    Stm32l4x5GpioState *s = STM32L4X5_GPIO(obj);
+    uint16_t value;
+    if (!visit_type_uint16(v, name, &value, errp)) {
+        return;
+    }
+    disconnect_gpio_pins(s, value);
+}
+
+static void disconnected_pins_get(Object *obj, Visitor *v,
+    const char *name, void *opaque, Error **errp)
+{
+    visit_type_uint16(v, name, (uint16_t *)opaque, errp);
+}
+
+static void clock_freq_get(Object *obj, Visitor *v,
+    const char *name, void *opaque, Error **errp)
+{
+    Stm32l4x5GpioState *s = STM32L4X5_GPIO(obj);
+    uint32_t clock_freq_hz = clock_get_hz(s->clk);
+    visit_type_uint32(v, name, &clock_freq_hz, errp);
+}
+
+static void stm32l4x5_gpio_write(void *opaque, hwaddr addr,
+                                 uint64_t val64, unsigned int size)
+{
+    Stm32l4x5GpioState *s = opaque;
+
+    uint32_t value = val64;
+    trace_stm32l4x5_gpio_write(s->name, addr, val64);
+
+    switch (addr) {
+    case GPIO_MODER:
+        s->moder = value;
+        disconnect_gpio_pins(s, get_gpio_pinmask_to_disconnect(s));
+        qemu_log_mask(LOG_UNIMP,
+                      "%s: Analog and AF modes aren't supported\n"
+                      "Analog and AF mode behave like input mode\n",
+                      __func__);
+        return;
+    case GPIO_OTYPER:
+        s->otyper = value & ~RESERVED_BITS_MASK;
+        disconnect_gpio_pins(s, get_gpio_pinmask_to_disconnect(s));
+        return;
+    case GPIO_OSPEEDR:
+        qemu_log_mask(LOG_UNIMP,
+                      "%s: Changing I/O output speed isn't supported\n"
+                      "I/O speed is already maximal\n",
+                      __func__);
+        s->ospeedr = value;
+        return;
+    case GPIO_PUPDR:
+        s->pupdr = value;
+        update_gpio_idr(s);
+        return;
+    case GPIO_IDR:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: GPIO->IDR is read-only\n",
+                      __func__);
+        return;
+    case GPIO_ODR:
+        s->odr = value & ~RESERVED_BITS_MASK;
+        update_gpio_idr(s);
+        return;
+    case GPIO_BSRR: {
+        uint32_t bits_to_reset = (value & RESERVED_BITS_MASK) >> GPIO_NUM_PINS;
+        uint32_t bits_to_set = value & ~RESERVED_BITS_MASK;
+        /* If both BSx and BRx are set, BSx has priority. */
+        s->odr &= ~bits_to_reset;
+        s->odr |= bits_to_set;
+        update_gpio_idr(s);
+        return;
+    }
+    case GPIO_LCKR:
+        qemu_log_mask(LOG_UNIMP,
+                      "%s: Locking port bits configuration isn't supported\n",
+                      __func__);
+        s->lckr = value & ~RESERVED_BITS_MASK;
+        return;
+    case GPIO_AFRL:
+        qemu_log_mask(LOG_UNIMP,
+                      "%s: Alternate functions aren't supported\n",
+                      __func__);
+        s->afrl = value;
+        return;
+    case GPIO_AFRH:
+        qemu_log_mask(LOG_UNIMP,
+                      "%s: Alternate functions aren't supported\n",
+                      __func__);
+        s->afrh = value;
+        return;
+    case GPIO_BRR: {
+        uint32_t bits_to_reset = value & ~RESERVED_BITS_MASK;
+        s->odr &= ~bits_to_reset;
+        update_gpio_idr(s);
+        return;
+    }
+    case GPIO_ASCR:
+        qemu_log_mask(LOG_UNIMP,
+                      "%s: ADC function isn't supported\n",
+                      __func__);
+        s->ascr = value & ~RESERVED_BITS_MASK;
+        return;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: Bad offset 0x%" HWADDR_PRIx "\n", __func__, addr);
+    }
+}
+
+static uint64_t stm32l4x5_gpio_read(void *opaque, hwaddr addr,
+                                    unsigned int size)
+{
+    Stm32l4x5GpioState *s = opaque;
+
+    trace_stm32l4x5_gpio_read(s->name, addr);
+
+    switch (addr) {
+    case GPIO_MODER:
+        return s->moder;
+    case GPIO_OTYPER:
+        return s->otyper;
+    case GPIO_OSPEEDR:
+        return s->ospeedr;
+    case GPIO_PUPDR:
+        return s->pupdr;
+    case GPIO_IDR:
+        return s->idr;
+    case GPIO_ODR:
+        return s->odr;
+    case GPIO_BSRR:
+        return 0;
+    case GPIO_LCKR:
+        return s->lckr;
+    case GPIO_AFRL:
+        return s->afrl;
+    case GPIO_AFRH:
+        return s->afrh;
+    case GPIO_BRR:
+        return 0;
+    case GPIO_ASCR:
+        return s->ascr;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: Bad offset 0x%" HWADDR_PRIx "\n", __func__, addr);
+        return 0;
+    }
+}
+
+static const MemoryRegionOps stm32l4x5_gpio_ops = {
+    .read = stm32l4x5_gpio_read,
+    .write = stm32l4x5_gpio_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+        .unaligned = false,
+    },
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+        .unaligned = false,
+    },
+};
+
+static void stm32l4x5_gpio_init(Object *obj)
+{
+    Stm32l4x5GpioState *s = STM32L4X5_GPIO(obj);
+
+    memory_region_init_io(&s->mmio, obj, &stm32l4x5_gpio_ops, s,
+                          TYPE_STM32L4X5_GPIO, 0x400);
+
+    sysbus_init_mmio(SYS_BUS_DEVICE(obj), &s->mmio);
+
+    qdev_init_gpio_out(DEVICE(obj), s->pin, GPIO_NUM_PINS);
+    qdev_init_gpio_in(DEVICE(obj), stm32l4x5_gpio_set, GPIO_NUM_PINS);
+
+    s->clk = qdev_init_clock_in(DEVICE(s), "clk", NULL, s, 0);
+
+    object_property_add(obj, "disconnected-pins", "uint16",
+                        disconnected_pins_get, disconnected_pins_set,
+                        NULL, &s->disconnected_pins);
+    object_property_add(obj, "clock-freq-hz", "uint32",
+                        clock_freq_get, NULL, NULL, NULL);
+}
+
+static void stm32l4x5_gpio_realize(DeviceState *dev, Error **errp)
+{
+    Stm32l4x5GpioState *s = STM32L4X5_GPIO(dev);
+    if (!clock_has_source(s->clk)) {
+        error_setg(errp, "GPIO: clk input must be connected");
+        return;
+    }
+}
+
+static const VMStateDescription vmstate_stm32l4x5_gpio = {
+    .name = TYPE_STM32L4X5_GPIO,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]){
+        VMSTATE_UINT32(moder, Stm32l4x5GpioState),
+        VMSTATE_UINT32(otyper, Stm32l4x5GpioState),
+        VMSTATE_UINT32(ospeedr, Stm32l4x5GpioState),
+        VMSTATE_UINT32(pupdr, Stm32l4x5GpioState),
+        VMSTATE_UINT32(idr, Stm32l4x5GpioState),
+        VMSTATE_UINT32(odr, Stm32l4x5GpioState),
+        VMSTATE_UINT32(lckr, Stm32l4x5GpioState),
+        VMSTATE_UINT32(afrl, Stm32l4x5GpioState),
+        VMSTATE_UINT32(afrh, Stm32l4x5GpioState),
+        VMSTATE_UINT32(ascr, Stm32l4x5GpioState),
+        VMSTATE_UINT16(disconnected_pins, Stm32l4x5GpioState),
+        VMSTATE_UINT16(pins_connected_high, Stm32l4x5GpioState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static Property stm32l4x5_gpio_properties[] = {
+    DEFINE_PROP_STRING("name", Stm32l4x5GpioState, name),
+    DEFINE_PROP_UINT32("mode-reset", Stm32l4x5GpioState, moder_reset, 0),
+    DEFINE_PROP_UINT32("ospeed-reset", Stm32l4x5GpioState, ospeedr_reset, 0),
+    DEFINE_PROP_UINT32("pupd-reset", Stm32l4x5GpioState, pupdr_reset, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void stm32l4x5_gpio_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    ResettableClass *rc = RESETTABLE_CLASS(klass);
+
+    device_class_set_props(dc, stm32l4x5_gpio_properties);
+    dc->vmsd = &vmstate_stm32l4x5_gpio;
+    dc->realize = stm32l4x5_gpio_realize;
+    rc->phases.hold = stm32l4x5_gpio_reset_hold;
+}
+
+static const TypeInfo stm32l4x5_gpio_types[] = {
+    {
+        .name = TYPE_STM32L4X5_GPIO,
+        .parent = TYPE_SYS_BUS_DEVICE,
+        .instance_size = sizeof(Stm32l4x5GpioState),
+        .instance_init = stm32l4x5_gpio_init,
+        .class_init = stm32l4x5_gpio_class_init,
+    },
+};
+
+DEFINE_TYPES(stm32l4x5_gpio_types)
diff --git a/hw/gpio/Kconfig b/hw/gpio/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/hw/gpio/Kconfig
+++ b/hw/gpio/Kconfig
@@ -XXX,XX +XXX,XX @@ config GPIO_PWR
 
 config SIFIVE_GPIO
     bool
+
+config STM32L4X5_GPIO
+    bool
diff --git a/hw/gpio/meson.build b/hw/gpio/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/hw/gpio/meson.build
+++ b/hw/gpio/meson.build
@@ -XXX,XX +XXX,XX @@ system_ss.add(when: 'CONFIG_RASPI', if_true: files(
     'bcm2835_gpio.c',
     'bcm2838_gpio.c'
 ))
+system_ss.add(when: 'CONFIG_STM32L4X5_SOC', if_true: files('stm32l4x5_gpio.c'))
 system_ss.add(when: 'CONFIG_ASPEED_SOC', if_true: files('aspeed_gpio.c'))
 system_ss.add(when: 'CONFIG_SIFIVE_GPIO', if_true: files('sifive_gpio.c'))
diff --git a/hw/gpio/trace-events b/hw/gpio/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/gpio/trace-events
+++ b/hw/gpio/trace-events
@@ -XXX,XX +XXX,XX @@ sifive_gpio_update_output_irq(int64_t line, int64_t value) "line %" PRIi64 " val
 # aspeed_gpio.c
 aspeed_gpio_read(uint64_t offset, uint64_t value) "offset: 0x%" PRIx64 " value 0x%" PRIx64
 aspeed_gpio_write(uint64_t offset, uint64_t value) "offset: 0x%" PRIx64 " value 0x%" PRIx64
+
+# stm32l4x5_gpio.c
+stm32l4x5_gpio_read(char *gpio, uint64_t addr) "GPIO%s addr: 0x%" PRIx64 " "
+stm32l4x5_gpio_write(char *gpio, uint64_t addr, uint64_t data) "GPIO%s addr: 0x%" PRIx64 " val: 0x%" PRIx64 ""
+stm32l4x5_gpio_update_idr(char *gpio, uint32_t old_idr, uint32_t new_idr) "GPIO%s from: 0x%x to: 0x%x"
+stm32l4x5_gpio_pins(char *gpio, uint16_t disconnected, uint16_t high) "GPIO%s disconnected pins: 0x%x levels: 0x%x"
-- 
2.34.1

From: Inès Varhol <ines.varhol@telecom-paris.fr>

Signed-off-by: Arnaud Minier <arnaud.minier@telecom-paris.fr>
Signed-off-by: Inès Varhol <ines.varhol@telecom-paris.fr>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20240305210444.310665-3-ines.varhol@telecom-paris.fr
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/arm/stm32l4x5_soc.h     |  2 +
 include/hw/gpio/stm32l4x5_gpio.h   |  1 +
 include/hw/misc/stm32l4x5_syscfg.h |  3 +-
 hw/arm/stm32l4x5_soc.c             | 71 +++++++++++++++++++++++-------
 hw/misc/stm32l4x5_syscfg.c         |  1 +
 hw/arm/Kconfig                     |  3 +-
 6 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/include/hw/arm/stm32l4x5_soc.h b/include/hw/arm/stm32l4x5_soc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/arm/stm32l4x5_soc.h
+++ b/include/hw/arm/stm32l4x5_soc.h
@@ -XXX,XX +XXX,XX @@
 #include "hw/misc/stm32l4x5_syscfg.h"
 #include "hw/misc/stm32l4x5_exti.h"
 #include "hw/misc/stm32l4x5_rcc.h"
+#include "hw/gpio/stm32l4x5_gpio.h"
 #include "qom/object.h"
 
 #define TYPE_STM32L4X5_SOC "stm32l4x5-soc"
@@ -XXX,XX +XXX,XX @@ struct Stm32l4x5SocState {
     OrIRQState exti_or_gates[NUM_EXTI_OR_GATES];
     Stm32l4x5SyscfgState syscfg;
     Stm32l4x5RccState rcc;
+    Stm32l4x5GpioState gpio[NUM_GPIOS];
 
     MemoryRegion sram1;
     MemoryRegion sram2;
diff --git a/include/hw/gpio/stm32l4x5_gpio.h b/include/hw/gpio/stm32l4x5_gpio.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/gpio/stm32l4x5_gpio.h
+++ b/include/hw/gpio/stm32l4x5_gpio.h
@@ -XXX,XX +XXX,XX @@
 #define TYPE_STM32L4X5_GPIO "stm32l4x5-gpio"
 OBJECT_DECLARE_SIMPLE_TYPE(Stm32l4x5GpioState, STM32L4X5_GPIO)
 
+#define NUM_GPIOS 8
 #define GPIO_NUM_PINS 16
 
 struct Stm32l4x5GpioState {
diff --git a/include/hw/misc/stm32l4x5_syscfg.h b/include/hw/misc/stm32l4x5_syscfg.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/stm32l4x5_syscfg.h
+++ b/include/hw/misc/stm32l4x5_syscfg.h
@@ -XXX,XX +XXX,XX @@
 
 #include "hw/sysbus.h"
 #include "qom/object.h"
+#include "hw/gpio/stm32l4x5_gpio.h"
 
 #define TYPE_STM32L4X5_SYSCFG "stm32l4x5-syscfg"
 OBJECT_DECLARE_SIMPLE_TYPE(Stm32l4x5SyscfgState, STM32L4X5_SYSCFG)
 
-#define NUM_GPIOS 8
-#define GPIO_NUM_PINS 16
 #define SYSCFG_NUM_EXTICR 4
 
 struct Stm32l4x5SyscfgState {
diff --git a/hw/arm/stm32l4x5_soc.c b/hw/arm/stm32l4x5_soc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/stm32l4x5_soc.c
+++ b/hw/arm/stm32l4x5_soc.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/sysemu.h"
 #include "hw/or-irq.h"
 #include "hw/arm/stm32l4x5_soc.h"
+#include "hw/gpio/stm32l4x5_gpio.h"
 #include "hw/qdev-clock.h"
 #include "hw/misc/unimp.h"
 
@@ -XXX,XX +XXX,XX @@ static const int exti_or_gate1_lines_in[EXTI_OR_GATE1_NUM_LINES_IN] = {
     16, 35, 36, 37, 38,
 };
 
+static const struct {
+    uint32_t addr;
+    uint32_t moder_reset;
+    uint32_t ospeedr_reset;
+    uint32_t pupdr_reset;
+} stm32l4x5_gpio_cfg[NUM_GPIOS] = {
+    { 0x48000000, 0xABFFFFFF, 0x0C000000, 0x64000000 },
+    { 0x48000400, 0xFFFFFEBF, 0x00000000, 0x00000100 },
+    { 0x48000800, 0xFFFFFFFF, 0x00000000, 0x00000000 },
+    { 0x48000C00, 0xFFFFFFFF, 0x00000000, 0x00000000 },
+    { 0x48001000, 0xFFFFFFFF, 0x00000000, 0x00000000 },
+    { 0x48001400, 0xFFFFFFFF, 0x00000000, 0x00000000 },
+    { 0x48001800, 0xFFFFFFFF, 0x00000000, 0x00000000 },
+    { 0x48001C00, 0x0000000F, 0x00000000, 0x00000000 },
+};
+
 static void stm32l4x5_soc_initfn(Object *obj)
 {
     Stm32l4x5SocState *s = STM32L4X5_SOC(obj);
@@ -XXX,XX +XXX,XX @@ static void stm32l4x5_soc_initfn(Object *obj)
     }
     object_initialize_child(obj, "syscfg", &s->syscfg, TYPE_STM32L4X5_SYSCFG);
     object_initialize_child(obj, "rcc", &s->rcc, TYPE_STM32L4X5_RCC);
+
+    for (unsigned i = 0; i < NUM_GPIOS; i++) {
+        g_autofree char *name = g_strdup_printf("gpio%c", 'a' + i);
+        object_initialize_child(obj, name, &s->gpio[i], TYPE_STM32L4X5_GPIO);
+    }
 }
 
 static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
     Stm32l4x5SocState *s = STM32L4X5_SOC(dev_soc);
     const Stm32l4x5SocClass *sc = STM32L4X5_SOC_GET_CLASS(dev_soc);
     MemoryRegion *system_memory = get_system_memory();
-    DeviceState *armv7m;
+    DeviceState *armv7m, *dev;
     SysBusDevice *busdev;
+    uint32_t pin_index;
 
     if (!memory_region_init_rom(&s->flash, OBJECT(dev_soc), "flash",
                                 sc->flash_size, errp)) {
@@ -XXX,XX +XXX,XX @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
         return;
     }
 
+    /* GPIOs */
+    for (unsigned i = 0; i < NUM_GPIOS; i++) {
+        g_autofree char *name = g_strdup_printf("%c", 'A' + i);
+        dev = DEVICE(&s->gpio[i]);
+        qdev_prop_set_string(dev, "name", name);
+        qdev_prop_set_uint32(dev, "mode-reset",
+                             stm32l4x5_gpio_cfg[i].moder_reset);
+        qdev_prop_set_uint32(dev, "ospeed-reset",
+                             stm32l4x5_gpio_cfg[i].ospeedr_reset);
+        qdev_prop_set_uint32(dev, "pupd-reset",
+                            stm32l4x5_gpio_cfg[i].pupdr_reset);
+        busdev = SYS_BUS_DEVICE(&s->gpio[i]);
+        g_free(name);
+        name = g_strdup_printf("gpio%c-out", 'a' + i);
+        qdev_connect_clock_in(DEVICE(&s->gpio[i]), "clk",
+            qdev_get_clock_out(DEVICE(&(s->rcc)), name));
+        if (!sysbus_realize(busdev, errp)) {
+            return;
+        }
+        sysbus_mmio_map(busdev, 0, stm32l4x5_gpio_cfg[i].addr);
+    }
+
     /* System configuration controller */
     busdev = SYS_BUS_DEVICE(&s->syscfg);
     if (!sysbus_realize(busdev, errp)) {
         return;
     }
     sysbus_mmio_map(busdev, 0, SYSCFG_ADDR);
-    /*
-     * TODO: when the GPIO device is implemented, connect it
-     * to SYCFG using `qdev_connect_gpio_out`, NUM_GPIOS and
-     * GPIO_NUM_PINS.
-     */
+
+    for (unsigned i = 0; i < NUM_GPIOS; i++) {
+        for (unsigned j = 0; j < GPIO_NUM_PINS; j++) {
+            pin_index = GPIO_NUM_PINS * i + j;
+            qdev_connect_gpio_out(DEVICE(&s->gpio[i]), j,
+                                  qdev_get_gpio_in(DEVICE(&s->syscfg),
+                                  pin_index));
+        }
+    }
 
     /* EXTI device */
     busdev = SYS_BUS_DEVICE(&s->exti);
@@ -XXX,XX +XXX,XX @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
         }
     }
 
-    for (unsigned i = 0; i < 16; i++) {
+    for (unsigned i = 0; i < GPIO_NUM_PINS; i++) {
         qdev_connect_gpio_out(DEVICE(&s->syscfg), i,
                               qdev_get_gpio_in(DEVICE(&s->exti), i));
     }
@@ -XXX,XX +XXX,XX @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, Error **errp)
     /* RESERVED:    0x40024400, 0x7FDBC00 */
 
     /* AHB2 BUS */
-    create_unimplemented_device("GPIOA",     0x48000000, 0x400);
-    create_unimplemented_device("GPIOB",     0x48000400, 0x400);
-    create_unimplemented_device("GPIOC",     0x48000800, 0x400);
-    create_unimplemented_device("GPIOD",     0x48000C00, 0x400);
-    create_unimplemented_device("GPIOE",     0x48001000, 0x400);
-    create_unimplemented_device("GPIOF",     0x48001400, 0x400);
-    create_unimplemented_device("GPIOG",     0x48001800, 0x400);
-    create_unimplemented_device("GPIOH",     0x48001C00, 0x400);
     /* RESERVED:    0x48002000, 0x7FDBC00 */
     create_unimplemented_device("OTG_FS",    0x50000000, 0x40000);
     create_unimplemented_device("ADC",       0x50040000, 0x400);
diff --git a/hw/misc/stm32l4x5_syscfg.c b/hw/misc/stm32l4x5_syscfg.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/stm32l4x5_syscfg.c
+++ b/hw/misc/stm32l4x5_syscfg.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/irq.h"
 #include "migration/vmstate.h"
 #include "hw/misc/stm32l4x5_syscfg.h"
+#include "hw/gpio/stm32l4x5_gpio.h"
 
 #define SYSCFG_MEMRMP 0x00
 #define SYSCFG_CFGR1 0x04
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -XXX,XX +XXX,XX @@ config STM32L4X5_SOC
     bool
     select ARM_V7M
     select OR_IRQ
-    select STM32L4X5_SYSCFG
     select STM32L4X5_EXTI
+    select STM32L4X5_SYSCFG
     select STM32L4X5_RCC
+    select STM32L4X5_GPIO
 
 config XLNX_ZYNQMP_ARM
     bool
-- 
2.34.1

From: Inès Varhol <ines.varhol@telecom-paris.fr>

The testcase contains :
- `test_idr_reset_value()` :
Checks the reset values of MODER, OTYPER, PUPDR, ODR and IDR.
- `test_gpio_output_mode()` :
Checks that writing a bit in register ODR results in the corresponding
pin rising or lowering, if this pin is configured in output mode.
- `test_gpio_input_mode()` :
Checks that an input pin set high or low externally results
in the pin rising and lowering.
- `test_pull_up_pull_down()` :
Checks that a floating pin in pull-up/down mode is actually high/low.
- `test_push_pull()` :
Checks that a pin set externally is disconnected when configured in
push-pull output mode, and can't be set externally while in this mode.
- `test_open_drain()` :
Checks that a pin set externally high is disconnected when configured
in open-drain output mode, and can't be set high while in this mode.
- `test_bsrr_brr()` :
Checks that writing to BSRR and BRR has the desired result in ODR.
- `test_clock_enable()` :
Checks that GPIO clock is at the right frequency after enabling it.
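
For reference, the BSRR/BRR behaviour that `test_bsrr_brr()` expects can
be modelled roughly as below. This is only an illustrative sketch in
plain C, not the device code from hw/gpio: the helper names
`model_bsrr_write()` and `model_brr_write()` are hypothetical, and ODR is
assumed to hold 16 pin bits.

```c
#include <stdint.h>

/*
 * Hypothetical model of a BSRR write (not the QEMU implementation).
 * Bits [15:0] (BSx) set ODR bits, bits [31:16] (BRx) clear them.
 * The set mask is applied last, so BSx wins when both are written.
 */
static uint32_t model_bsrr_write(uint32_t odr, uint32_t value)
{
    uint32_t set_mask = value & 0xFFFF;
    uint32_t reset_mask = (value >> 16) & 0xFFFF;

    odr &= ~reset_mask;
    odr |= set_mask;
    return odr & 0xFFFF;
}

/* Hypothetical model of a BRR write: bits [15:0] clear ODR bits. */
static uint32_t model_brr_write(uint32_t odr, uint32_t value)
{
    return odr & ~(value & 0xFFFF) & 0xFFFF;
}
```

Applying the set mask after the reset mask is what gives BSx priority
over BRx when both bits for the same pin are written at once.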

Acked-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Arnaud Minier <arnaud.minier@telecom-paris.fr>
Signed-off-by: Inès Varhol <ines.varhol@telecom-paris.fr>
Message-id: 20240305210444.310665-4-ines.varhol@telecom-paris.fr
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tests/qtest/stm32l4x5_gpio-test.c | 551 ++++++++++++++++++++++++++++++
 tests/qtest/meson.build           |   3 +-
 2 files changed, 553 insertions(+), 1 deletion(-)
 create mode 100644 tests/qtest/stm32l4x5_gpio-test.c

diff --git a/tests/qtest/stm32l4x5_gpio-test.c b/tests/qtest/stm32l4x5_gpio-test.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qtest/stm32l4x5_gpio-test.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * QTest testcase for STM32L4x5_GPIO
+ *
+ * Copyright (c) 2024 Arnaud Minier <arnaud.minier@telecom-paris.fr>
+ * Copyright (c) 2024 Inès Varhol <ines.varhol@telecom-paris.fr>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest-single.h"
+
+#define GPIO_BASE_ADDR 0x48000000
+#define GPIO_SIZE      0x400
+#define NUM_GPIOS      8
+#define NUM_GPIO_PINS  16
+
+#define GPIO_A 0x48000000
+#define GPIO_B 0x48000400
+#define GPIO_C 0x48000800
+#define GPIO_D 0x48000C00
+#define GPIO_E 0x48001000
+#define GPIO_F 0x48001400
+#define GPIO_G 0x48001800
+#define GPIO_H 0x48001C00
+
+#define MODER 0x00
+#define OTYPER 0x04
+#define PUPDR 0x0C
+#define IDR 0x10
+#define ODR 0x14
+#define BSRR 0x18
+#define BRR 0x28
+
+#define MODER_INPUT 0
+#define MODER_OUTPUT 1
+
+#define PUPDR_NONE 0
+#define PUPDR_PULLUP 1
+#define PUPDR_PULLDOWN 2
+
+#define OTYPER_PUSH_PULL 0
+#define OTYPER_OPEN_DRAIN 1
+
+const uint32_t moder_reset[NUM_GPIOS] = {
+    0xABFFFFFF,
+    0xFFFFFEBF,
+    0xFFFFFFFF,
+    0xFFFFFFFF,
+    0xFFFFFFFF,
+    0xFFFFFFFF,
+    0xFFFFFFFF,
+    0x0000000F
+};
+
+const uint32_t pupdr_reset[NUM_GPIOS] = {
+    0x64000000,
+    0x00000100,
+    0x00000000,
+    0x00000000,
+    0x00000000,
+    0x00000000,
+    0x00000000,
+    0x00000000
+};
+
+const uint32_t idr_reset[NUM_GPIOS] = {
+    0x0000A000,
+    0x00000010,
+    0x00000000,
+    0x00000000,
+    0x00000000,
+    0x00000000,
+    0x00000000,
+    0x00000000
+};
+
+static uint32_t gpio_readl(unsigned int gpio, unsigned int offset)
+{
+    return readl(gpio + offset);
+}
+
+static void gpio_writel(unsigned int gpio, unsigned int offset, uint32_t value)
+{
+    writel(gpio + offset, value);
+}
+
+static void gpio_set_bit(unsigned int gpio, unsigned int reg,
+                         unsigned int pin, uint32_t value)
+{
+    uint32_t mask = 0xFFFFFFFF & ~(0x1 << pin);
+    gpio_writel(gpio, reg, (gpio_readl(gpio, reg) & mask) | value << pin);
+}
+
+static void gpio_set_2bits(unsigned int gpio, unsigned int reg,
+                           unsigned int pin, uint32_t value)
+{
+    uint32_t offset = 2 * pin;
+    uint32_t mask = 0xFFFFFFFF & ~(0x3 << offset);
+    gpio_writel(gpio, reg, (gpio_readl(gpio, reg) & mask) | value << offset);
+}
+
+static unsigned int get_gpio_id(uint32_t gpio_addr)
+{
+    return (gpio_addr - GPIO_BASE_ADDR) / GPIO_SIZE;
+}
+
+static void gpio_set_irq(unsigned int gpio, int num, int level)
+{
+    g_autofree char *name = g_strdup_printf("/machine/soc/gpio%c",
+                                            get_gpio_id(gpio) + 'a');
+    qtest_set_irq_in(global_qtest, name, NULL, num, level);
+}
+
+static void disconnect_all_pins(unsigned int gpio)
+{
+    g_autofree char *path = g_strdup_printf("/machine/soc/gpio%c",
+                                            get_gpio_id(gpio) + 'a');
+    QDict *r;
+
+    r = qtest_qmp(global_qtest, "{ 'execute': 'qom-set', 'arguments': "
+        "{ 'path': %s, 'property': 'disconnected-pins', 'value': %d } }",
+        path, 0xFFFF);
+    g_assert_false(qdict_haskey(r, "error"));
+    qobject_unref(r);
+}
+
+static uint32_t get_disconnected_pins(unsigned int gpio)
+{
+    g_autofree char *path = g_strdup_printf("/machine/soc/gpio%c",
+                                            get_gpio_id(gpio) + 'a');
+    uint32_t disconnected_pins = 0;
+    QDict *r;
+
+    r = qtest_qmp(global_qtest, "{ 'execute': 'qom-get', 'arguments':"
+        " { 'path': %s, 'property': 'disconnected-pins'} }", path);
+    g_assert_false(qdict_haskey(r, "error"));
+    disconnected_pins = qdict_get_int(r, "return");
+    qobject_unref(r);
+    return disconnected_pins;
+}
+
+static uint32_t reset(uint32_t gpio, unsigned int offset)
+{
+    switch (offset) {
+    case MODER:
+        return moder_reset[get_gpio_id(gpio)];
+    case PUPDR:
+        return pupdr_reset[get_gpio_id(gpio)];
+    case IDR:
+        return idr_reset[get_gpio_id(gpio)];
+    }
+    return 0x0;
+}
+
+static void system_reset(void)
+{
+    QDict *r;
+    r = qtest_qmp(global_qtest, "{'execute': 'system_reset'}");
+    g_assert_false(qdict_haskey(r, "error"));
+    qobject_unref(r);
+}
+
+static void test_idr_reset_value(void)
+{
+    /*
+     * Checks that the values in MODER, OTYPER, PUPDR and ODR
+     * after reset are correct, and that the value in IDR is
+     * coherent.
+     * Since AF and analog modes aren't implemented, IDR reset
+     * values aren't the same as with a real board.
+     *
+     * Register IDR contains the actual values of all GPIO pins.
+     * Its value depends on the pins' configuration
+     * (input/output/analog : register MODER, push-pull/open-drain :
+     * register OTYPER, pull-up/pull-down/none : register PUPDR)
+     * and on the values stored in register ODR
+     * (in case the pin is in output mode).
+     */
+
+    gpio_writel(GPIO_A, MODER, 0xDEADBEEF);
+    gpio_writel(GPIO_A, ODR, 0xDEADBEEF);
+    gpio_writel(GPIO_A, OTYPER, 0xDEADBEEF);
+    gpio_writel(GPIO_A, PUPDR, 0xDEADBEEF);
+
+    gpio_writel(GPIO_B, MODER, 0xDEADBEEF);
+    gpio_writel(GPIO_B, ODR, 0xDEADBEEF);
+    gpio_writel(GPIO_B, OTYPER, 0xDEADBEEF);
+    gpio_writel(GPIO_B, PUPDR, 0xDEADBEEF);
+
+    gpio_writel(GPIO_C, MODER, 0xDEADBEEF);
+    gpio_writel(GPIO_C, ODR, 0xDEADBEEF);
+    gpio_writel(GPIO_C, OTYPER, 0xDEADBEEF);
+    gpio_writel(GPIO_C, PUPDR, 0xDEADBEEF);
+
+    gpio_writel(GPIO_H, MODER, 0xDEADBEEF);
+    gpio_writel(GPIO_H, ODR, 0xDEADBEEF);
+    gpio_writel(GPIO_H, OTYPER, 0xDEADBEEF);
+    gpio_writel(GPIO_H, PUPDR, 0xDEADBEEF);
+
+    system_reset();
+
+    uint32_t moder = gpio_readl(GPIO_A, MODER);
+    uint32_t odr = gpio_readl(GPIO_A, ODR);
+    uint32_t otyper = gpio_readl(GPIO_A, OTYPER);
+    uint32_t pupdr = gpio_readl(GPIO_A, PUPDR);
+    uint32_t idr = gpio_readl(GPIO_A, IDR);
+    /* 15: AF, 14: AF, 13: AF, 12: Analog ... */
+    /* here AF is the same as Analog and Input mode */
+    g_assert_cmphex(moder, ==, reset(GPIO_A, MODER));
+    g_assert_cmphex(odr, ==, reset(GPIO_A, ODR));
+    g_assert_cmphex(otyper, ==, reset(GPIO_A, OTYPER));
+    /* 15: pull-up, 14: pull-down, 13: pull-up, 12: neither ... */
+    g_assert_cmphex(pupdr, ==, reset(GPIO_A, PUPDR));
+    /* 15 : 1, 14: 0, 13: 1, 12 : reset value ... */
+    g_assert_cmphex(idr, ==, reset(GPIO_A, IDR));
+
+    moder = gpio_readl(GPIO_B, MODER);
+    odr = gpio_readl(GPIO_B, ODR);
+    otyper = gpio_readl(GPIO_B, OTYPER);
+    pupdr = gpio_readl(GPIO_B, PUPDR);
+    idr = gpio_readl(GPIO_B, IDR);
+    /* ... 5: Analog, 4: AF, 3: AF, 2: Analog ... */
+    /* here AF is the same as Analog and Input mode */
+    g_assert_cmphex(moder, ==, reset(GPIO_B, MODER));
+    g_assert_cmphex(odr, ==, reset(GPIO_B, ODR));
+    g_assert_cmphex(otyper, ==, reset(GPIO_B, OTYPER));
+    /* ... 5: neither, 4: pull-up, 3: neither ... */
+    g_assert_cmphex(pupdr, ==, reset(GPIO_B, PUPDR));
+    /* ... 5 : reset value, 4 : 1, 3 : reset value ... */
+    g_assert_cmphex(idr, ==, reset(GPIO_B, IDR));
+
+    moder = gpio_readl(GPIO_C, MODER);
+    odr = gpio_readl(GPIO_C, ODR);
+    otyper = gpio_readl(GPIO_C, OTYPER);
+    pupdr = gpio_readl(GPIO_C, PUPDR);
+    idr = gpio_readl(GPIO_C, IDR);
+    /* Analog, same as Input mode */
+    g_assert_cmphex(moder, ==, reset(GPIO_C, MODER));
+    g_assert_cmphex(odr, ==, reset(GPIO_C, ODR));
+    g_assert_cmphex(otyper, ==, reset(GPIO_C, OTYPER));
+    /* no pull-up or pull-down */
+    g_assert_cmphex(pupdr, ==, reset(GPIO_C, PUPDR));
+    /* reset value */
+    g_assert_cmphex(idr, ==, reset(GPIO_C, IDR));
+
+    moder = gpio_readl(GPIO_H, MODER);
+    odr = gpio_readl(GPIO_H, ODR);
+    otyper = gpio_readl(GPIO_H, OTYPER);
+    pupdr = gpio_readl(GPIO_H, PUPDR);
+    idr = gpio_readl(GPIO_H, IDR);
+    /* Analog, same as Input mode */
+    g_assert_cmphex(moder, ==, reset(GPIO_H, MODER));
+    g_assert_cmphex(odr, ==, reset(GPIO_H, ODR));
+    g_assert_cmphex(otyper, ==, reset(GPIO_H, OTYPER));
+    /* no pull-up or pull-down */
+    g_assert_cmphex(pupdr, ==, reset(GPIO_H, PUPDR));
+    /* reset value */
+    g_assert_cmphex(idr, ==, reset(GPIO_H, IDR));
+}
+
+static void test_gpio_output_mode(const void *data)
+{
+    /*
+     * Checks that setting a bit in ODR sets the corresponding
+     * GPIO line high : it should set the right bit in IDR
+     * and send an irq to syscfg.
+     * Additionally, it checks that values written to ODR
+     * when not in output mode are stored and not discarded.
+     */
+    unsigned int pin = ((uint64_t)data) & 0xF;
+    uint32_t gpio = ((uint64_t)data) >> 32;
+    unsigned int gpio_id = get_gpio_id(gpio);
+
+    qtest_irq_intercept_in(global_qtest, "/machine/soc/syscfg");
+
+    /* Set a bit in ODR and check nothing happens */
+    gpio_set_bit(gpio, ODR, pin, 1);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR));
+    g_assert_false(get_irq(gpio_id * NUM_GPIO_PINS + pin));
+
+    /* Configure the relevant line as output and check the pin is high */
+    gpio_set_2bits(gpio, MODER, pin, MODER_OUTPUT);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) | (1 << pin));
+    g_assert_true(get_irq(gpio_id * NUM_GPIO_PINS + pin));
+
+    /* Reset the bit in ODR and check the pin is low */
+    gpio_set_bit(gpio, ODR, pin, 0);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
+    g_assert_false(get_irq(gpio_id * NUM_GPIO_PINS + pin));
+
+    /* Clean the test */
+    gpio_writel(gpio, ODR, reset(gpio, ODR));
+    gpio_writel(gpio, MODER, reset(gpio, MODER));
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR));
+    g_assert_false(get_irq(gpio_id * NUM_GPIO_PINS + pin));
+}
+
+static void test_gpio_input_mode(const void *data)
+{
+    /*
+     * Test that setting a line high/low externally sets the
+     * corresponding GPIO line high/low : it should set the
+     * right bit in IDR and send an irq to syscfg.
+     */
+    unsigned int pin = ((uint64_t)data) & 0xF;
+    uint32_t gpio = ((uint64_t)data) >> 32;
+    unsigned int gpio_id = get_gpio_id(gpio);
+
+    qtest_irq_intercept_in(global_qtest, "/machine/soc/syscfg");
+
+    /* Configure a line as input, raise it, and check that the pin is high */
+    gpio_set_2bits(gpio, MODER, pin, MODER_INPUT);
+    gpio_set_irq(gpio, pin, 1);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) | (1 << pin));
+    g_assert_true(get_irq(gpio_id * NUM_GPIO_PINS + pin));
+
+    /* Lower the line and check that the pin is low */
+    gpio_set_irq(gpio, pin, 0);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
+    g_assert_false(get_irq(gpio_id * NUM_GPIO_PINS + pin));
+
+    /* Clean the test */
+    gpio_writel(gpio, MODER, reset(gpio, MODER));
+    disconnect_all_pins(gpio);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR));
+}
+
+static void test_pull_up_pull_down(const void *data)
+{
+    /*
+     * Test that a floating pin reads high with pull-up
+     * and low with pull-down.
+     */
+    unsigned int pin = ((uint64_t)data) & 0xF;
+    uint32_t gpio = ((uint64_t)data) >> 32;
+    unsigned int gpio_id = get_gpio_id(gpio);
+
+    qtest_irq_intercept_in(global_qtest, "/machine/soc/syscfg");
+
+    /* Configure a line as input with pull-up, check the line is set high */
+    gpio_set_2bits(gpio, MODER, pin, MODER_INPUT);
+    gpio_set_2bits(gpio, PUPDR, pin, PUPDR_PULLUP);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) | (1 << pin));
+    g_assert_true(get_irq(gpio_id * NUM_GPIO_PINS + pin));
+
+    /* Configure the line with pull-down, check the line is low */
+    gpio_set_2bits(gpio, PUPDR, pin, PUPDR_PULLDOWN);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
+    g_assert_false(get_irq(gpio_id * NUM_GPIO_PINS + pin));
+
+    /* Clean the test */
+    gpio_writel(gpio, MODER, reset(gpio, MODER));
+    gpio_writel(gpio, PUPDR, reset(gpio, PUPDR));
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR));
+}
+
+static void test_push_pull(const void *data)
+{
+    /*
+     * Test that configuring a line in push-pull output mode
+     * disconnects the pin, and that the pin can't be set or
+     * reset externally afterwards.
+     */
+    unsigned int pin = ((uint64_t)data) & 0xF;
+    uint32_t gpio = ((uint64_t)data) >> 32;
+    uint32_t gpio2 = GPIO_BASE_ADDR + (GPIO_H - gpio);
+
+    qtest_irq_intercept_in(global_qtest, "/machine/soc/syscfg");
+
+    /* Setting a line high externally, configuring it in push-pull output */
+    /* And checking the pin was disconnected */
+    gpio_set_irq(gpio, pin, 1);
+    gpio_set_2bits(gpio, MODER, pin, MODER_OUTPUT);
+    g_assert_cmphex(get_disconnected_pins(gpio), ==, 0xFFFF);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
+
+    /* Setting a line low externally, configuring it in push-pull output */
+    /* And checking the pin was disconnected */
+    gpio_set_irq(gpio2, pin, 0);
+    gpio_set_bit(gpio2, ODR, pin, 1);
+    gpio_set_2bits(gpio2, MODER, pin, MODER_OUTPUT);
+    g_assert_cmphex(get_disconnected_pins(gpio2), ==, 0xFFFF);
+    g_assert_cmphex(gpio_readl(gpio2, IDR), ==, reset(gpio2, IDR) | (1 << pin));
+
+    /* Trying to set a push-pull output pin, checking it doesn't work */
+    gpio_set_irq(gpio, pin, 1);
+    g_assert_cmphex(get_disconnected_pins(gpio), ==, 0xFFFF);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
+
+    /* Trying to reset a push-pull output pin, checking it doesn't work */
+    gpio_set_irq(gpio2, pin, 0);
+    g_assert_cmphex(get_disconnected_pins(gpio2), ==, 0xFFFF);
+    g_assert_cmphex(gpio_readl(gpio2, IDR), ==, reset(gpio2, IDR) | (1 << pin));
+
+    /* Clean the test */
+    gpio_writel(gpio, MODER, reset(gpio, MODER));
+    gpio_writel(gpio2, ODR, reset(gpio2, ODR));
+    gpio_writel(gpio2, MODER, reset(gpio2, MODER));
+}
+
+static void test_open_drain(const void *data)
+{
+    /*
+     * Test that configuring a line in open-drain output mode
+     * disconnects a pin set high externally and that the pin
+     * can't be set high externally while configured in open-drain.
+     *
+     * However a pin set low externally shouldn't be disconnected,
+     * and it can be set low externally when in open-drain mode.
+     */
+    unsigned int pin = ((uint64_t)data) & 0xF;
+    uint32_t gpio = ((uint64_t)data) >> 32;
+    uint32_t gpio2 = GPIO_BASE_ADDR + (GPIO_H - gpio);
+
+    qtest_irq_intercept_in(global_qtest, "/machine/soc/syscfg");
+
+    /* Setting a line high externally, configuring it in open-drain output */
+    /* And checking the pin was disconnected */
+    gpio_set_irq(gpio, pin, 1);
+    gpio_set_bit(gpio, OTYPER, pin, OTYPER_OPEN_DRAIN);
+    gpio_set_2bits(gpio, MODER, pin, MODER_OUTPUT);
+    g_assert_cmphex(get_disconnected_pins(gpio), ==, 0xFFFF);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
+
+    /* Setting a line low externally, configuring it in open-drain output */
+    /* And checking the pin wasn't disconnected */
+    gpio_set_irq(gpio2, pin, 0);
+    gpio_set_bit(gpio2, ODR, pin, 1);
+    gpio_set_bit(gpio2, OTYPER, pin, OTYPER_OPEN_DRAIN);
+    gpio_set_2bits(gpio2, MODER, pin, MODER_OUTPUT);
+    g_assert_cmphex(get_disconnected_pins(gpio2), ==, 0xFFFF & ~(1 << pin));
+    g_assert_cmphex(gpio_readl(gpio2, IDR), ==,
+                               reset(gpio2, IDR) & ~(1 << pin));
+
+    /* Trying to set an open-drain output pin, checking it doesn't work */
+    gpio_set_irq(gpio, pin, 1);
+    g_assert_cmphex(get_disconnected_pins(gpio), ==, 0xFFFF);
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR) & ~(1 << pin));
+
+    /* Trying to reset an open-drain output pin, checking it works */
+    gpio_set_bit(gpio, ODR, pin, 1);
+    gpio_set_irq(gpio, pin, 0);
+    g_assert_cmphex(get_disconnected_pins(gpio2), ==, 0xFFFF & ~(1 << pin));
+    g_assert_cmphex(gpio_readl(gpio2, IDR), ==,
+                               reset(gpio2, IDR) & ~(1 << pin));
+
+    /* Clean the test */
+    disconnect_all_pins(gpio2);
+    gpio_writel(gpio2, OTYPER, reset(gpio2, OTYPER));
+    gpio_writel(gpio2, ODR, reset(gpio2, ODR));
+    gpio_writel(gpio2, MODER, reset(gpio2, MODER));
+    g_assert_cmphex(gpio_readl(gpio2, IDR), ==, reset(gpio2, IDR));
+    disconnect_all_pins(gpio);
+    gpio_writel(gpio, OTYPER, reset(gpio, OTYPER));
+    gpio_writel(gpio, ODR, reset(gpio, ODR));
+    gpio_writel(gpio, MODER, reset(gpio, MODER));
+    g_assert_cmphex(gpio_readl(gpio, IDR), ==, reset(gpio, IDR));
+}
+
+static void test_bsrr_brr(const void *data)
+{
+    /*
+     * Test that writing a '1' in BRR and BSRR
+     * has the desired effect on ODR.
+     * In BSRR, BSx has priority over BRx.
+     */
+    unsigned int pin = ((uint64_t)data) & 0xF;
+    uint32_t gpio = ((uint64_t)data) >> 32;
+
+    gpio_writel(gpio, BSRR, (1 << pin));
+    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR) | (1 << pin));
+
+    gpio_writel(gpio, BSRR, (1 << (pin + NUM_GPIO_PINS)));
+    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR));
+
+    gpio_writel(gpio, BSRR, (1 << pin));
+    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR) | (1 << pin));
+
+    gpio_writel(gpio, BRR, (1 << pin));
+    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR));
+
+    /* BSx should have priority over BRx */
+    gpio_writel(gpio, BSRR, (1 << pin) | (1 << (pin + NUM_GPIO_PINS)));
+    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR) | (1 << pin));
+
+    gpio_writel(gpio, BRR, (1 << pin));
+    g_assert_cmphex(gpio_readl(gpio, ODR), ==, reset(gpio, ODR));
+
+    gpio_writel(gpio, ODR, reset(gpio, ODR));
+}
+
+int main(int argc, char **argv)
+{
+    int ret;
+
+    g_test_init(&argc, &argv, NULL);
+    g_test_set_nonfatal_assertions();
+    qtest_add_func("stm32l4x5/gpio/test_idr_reset_value",
+                   test_idr_reset_value);
+    /*
+     * The inputs for the tests (gpio and pin) can be changed,
+     * but the tests don't work for pins that are high at reset
+     * (GPIOA15, GPIOA13 and GPIOB4).
+     * Specifically, raising the pin then checking `get_irq()`
+     * is problematic since the pin was already high.
+     */
+    qtest_add_data_func("stm32l4x5/gpio/test_gpioc5_output_mode",
+                        (void *)((uint64_t)GPIO_C << 32 | 5),
+                        test_gpio_output_mode);
+    qtest_add_data_func("stm32l4x5/gpio/test_gpioh3_output_mode",
+                        (void *)((uint64_t)GPIO_H << 32 | 3),
+                        test_gpio_output_mode);
+    qtest_add_data_func("stm32l4x5/gpio/test_gpio_input_mode1",
+                        (void *)((uint64_t)GPIO_D << 32 | 6),
+                        test_gpio_input_mode);
+    qtest_add_data_func("stm32l4x5/gpio/test_gpio_input_mode2",
+                        (void *)((uint64_t)GPIO_C << 32 | 10),
+                        test_gpio_input_mode);
+    qtest_add_data_func("stm32l4x5/gpio/test_gpio_pull_up_pull_down1",
+                        (void *)((uint64_t)GPIO_B << 32 | 5),
+                        test_pull_up_pull_down);
+    qtest_add_data_func("stm32l4x5/gpio/test_gpio_pull_up_pull_down2",
+                        (void *)((uint64_t)GPIO_F << 32 | 1),
+                        test_pull_up_pull_down);
+    qtest_add_data_func("stm32l4x5/gpio/test_gpio_push_pull1",
+                        (void *)((uint64_t)GPIO_G << 32 | 6),
+                        test_push_pull);
+    qtest_add_data_func("stm32l4x5/gpio/test_gpio_push_pull2",
+                        (void *)((uint64_t)GPIO_H << 32 | 3),
+                        test_push_pull);
+    qtest_add_data_func("stm32l4x5/gpio/test_gpio_open_drain1",
+                        (void *)((uint64_t)GPIO_C << 32 | 4),
+                        test_open_drain);
+    qtest_add_data_func("stm32l4x5/gpio/test_gpio_open_drain2",
+                        (void *)((uint64_t)GPIO_E << 32 | 11),
+                        test_open_drain);
+    qtest_add_data_func("stm32l4x5/gpio/test_bsrr_brr1",
+                        (void *)((uint64_t)GPIO_A << 32 | 12),
+                        test_bsrr_brr);
+    qtest_add_data_func("stm32l4x5/gpio/test_bsrr_brr2",
+                        (void *)((uint64_t)GPIO_D << 32 | 0),
+                        test_bsrr_brr);
+
+    qtest_start("-machine b-l475e-iot01a");
+    ret = g_test_run();
+    qtest_end();
+
+    return ret;
+}
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -XXX,XX +XXX,XX @@ qtests_aspeed = \
 qtests_stm32l4x5 = \
   ['stm32l4x5_exti-test',
    'stm32l4x5_syscfg-test',
-   'stm32l4x5_rcc-test']
+   'stm32l4x5_rcc-test',
+   'stm32l4x5_gpio-test']
 
 qtests_arm = \
   (config_all_devices.has_key('CONFIG_MPS2') ? ['sse-timer-test'] : []) + \
-- 
2.34.1

From: Richard Henderson <richard.henderson@linaro.org>

While the 8-bit input elements are sequential in the input vector,
the 32-bit output elements are not sequential in the output matrix.
Do not attempt to compute 2 32-bit outputs at the same time.
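
As an illustration of the intended result, a scalar reference model of
32-bit SMOPA might look like the sketch below. It is a simplification
under stated assumptions, not the helper in sme_helper.c: the name
`smopa_s_ref()` is hypothetical, all predicate elements are taken as
active, and only a 4x4 corner of the tile is modelled. With the inputs
used by sme-smopa-1.c (bytes 0..15 and 16..31) it yields the `cmp`
matrix from that test.

```c
#include <stdint.h>

enum { SMOPA_N = 4 };   /* 32-bit elements modelled per vector (assumption) */

/*
 * Hypothetical reference model: each 32-bit output element za[row][col]
 * accumulates the sum of four signed 8-bit products, taken from the
 * row-th group of four bytes in zn and the col-th group in zm.
 * Every output element is computed on its own, one at a time.
 */
static void smopa_s_ref(int32_t za[SMOPA_N][SMOPA_N],
                        const int8_t zn[4 * SMOPA_N],
                        const int8_t zm[4 * SMOPA_N])
{
    for (int row = 0; row < SMOPA_N; row++) {
        for (int col = 0; col < SMOPA_N; col++) {
            int32_t sum = 0;

            for (int k = 0; k < 4; k++) {
                sum += (int32_t)zn[4 * row + k] * (int32_t)zm[4 * col + k];
            }
            za[row][col] += sum;
        }
    }
}
```

The input bytes for one output are adjacent in zn and zm, but adjacent
outputs in a row come from different byte groups of zm, which is why the
model handles one 32-bit accumulator per call of the inner sum.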

Cc: qemu-stable@nongnu.org
Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20240305163931.242795-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/tcg/sme_helper.c       | 77 ++++++++++++++++++-------------
 tests/tcg/aarch64/sme-smopa-1.c   | 47 +++++++++++++++++++
 tests/tcg/aarch64/sme-smopa-2.c   | 54 ++++++++++++++++++++++
 tests/tcg/aarch64/Makefile.target |  2 +-
 4 files changed, 147 insertions(+), 33 deletions(-)
 create mode 100644 tests/tcg/aarch64/sme-smopa-1.c
 create mode 100644 tests/tcg/aarch64/sme-smopa-2.c

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -XXX,XX +XXX,XX @@ void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn,
     }
 }
 
-typedef uint64_t IMOPFn(uint64_t, uint64_t, uint64_t, uint8_t, bool);
+typedef uint32_t IMOPFn32(uint32_t, uint32_t, uint32_t, uint8_t, bool);
+static inline void do_imopa_s(uint32_t *za, uint32_t *zn, uint32_t *zm,
+                              uint8_t *pn, uint8_t *pm,
+                              uint32_t desc, IMOPFn32 *fn)
+{
+    intptr_t row, col, oprsz = simd_oprsz(desc) / 4;
+    bool neg = simd_data(desc);
 
-static inline void do_imopa(uint64_t *za, uint64_t *zn, uint64_t *zm,
-                            uint8_t *pn, uint8_t *pm,
-                            uint32_t desc, IMOPFn *fn)
+    for (row = 0; row < oprsz; ++row) {
+        uint8_t pa = (pn[H1(row >> 1)] >> ((row & 1) * 4)) & 0xf;
+        uint32_t *za_row = &za[tile_vslice_index(row)];
+        uint32_t n = zn[H4(row)];
+
+        for (col = 0; col < oprsz; ++col) {
+            uint8_t pb = pm[H1(col >> 1)] >> ((col & 1) * 4);
+            uint32_t *a = &za_row[H4(col)];
+
+            *a = fn(n, zm[H4(col)], *a, pa & pb, neg);
+        }
+    }
+}
+
+typedef uint64_t IMOPFn64(uint64_t, uint64_t, uint64_t, uint8_t, bool);
+static inline void do_imopa_d(uint64_t *za, uint64_t *zn, uint64_t *zm,
+                              uint8_t *pn, uint8_t *pm,
+                              uint32_t desc, IMOPFn64 *fn)
 {
     intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
     bool neg = simd_data(desc);
@@ -XXX,XX +XXX,XX @@ static inline void do_imopa(uint64_t *za, uint64_t *zn, uint64_t *zm,
 }
 
 #define DEF_IMOP_32(NAME, NTYPE, MTYPE) \
-static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \
+static uint32_t NAME(uint32_t n, uint32_t m, uint32_t a, uint8_t p, bool neg) \
 {                                                                           \
-    uint32_t sum0 = 0, sum1 = 0;                                            \
+    uint32_t sum = 0;                                                       \
     /* Apply P to N as a mask, making the inactive elements 0. */           \
     n &= expand_pred_b(p);                                                  \
-    sum0 += (NTYPE)(n >> 0) * (MTYPE)(m >> 0);                              \
-    sum0 += (NTYPE)(n >> 8) * (MTYPE)(m >> 8);                              \
-    sum0 += (NTYPE)(n >> 16) * (MTYPE)(m >> 16);                            \
-    sum0 += (NTYPE)(n >> 24) * (MTYPE)(m >> 24);                            \
-    sum1 += (NTYPE)(n >> 32) * (MTYPE)(m >> 32);                            \
-    sum1 += (NTYPE)(n >> 40) * (MTYPE)(m >> 40);                            \
-    sum1 += (NTYPE)(n >> 48) * (MTYPE)(m >> 48);                            \
-    sum1 += (NTYPE)(n >> 56) * (MTYPE)(m >> 56);                            \
-    if (neg) {                                                              \
-        sum0 = (uint32_t)a - sum0, sum1 = (uint32_t)(a >> 32) - sum1;       \
-    } else {                                                                \
-        sum0 = (uint32_t)a + sum0, sum1 = (uint32_t)(a >> 32) + sum1;       \
-    }                                                                       \
-    return ((uint64_t)sum1 << 32) | sum0;                                   \
+    sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0);                               \
+    sum += (NTYPE)(n >> 8) * (MTYPE)(m >> 8);                               \
+    sum += (NTYPE)(n >> 16) * (MTYPE)(m >> 16);                             \
+    sum += (NTYPE)(n >> 24) * (MTYPE)(m >> 24);                             \
+    return neg ? a - sum : a + sum;                                         \
 }
 
 #define DEF_IMOP_64(NAME, NTYPE, MTYPE) \
@@ -XXX,XX +XXX,XX @@ DEF_IMOP_64(umopa_d, uint16_t, uint16_t)
 DEF_IMOP_64(sumopa_d, int16_t, uint16_t)
 DEF_IMOP_64(usmopa_d, uint16_t, int16_t)
 
-#define DEF_IMOPH(NAME) \
-    void HELPER(sme_##NAME)(void *vza, void *vzn, void *vzm, void *vpn,      \
-                            void *vpm, uint32_t desc)                        \
-    { do_imopa(vza, vzn, vzm, vpn, vpm, desc, NAME); }
+#define DEF_IMOPH(NAME, S) \
+    void HELPER(sme_##NAME##_##S)(void *vza, void *vzn, void *vzm,          \
+                                  void *vpn, void *vpm, uint32_t desc)      \
+    { do_imopa_##S(vza, vzn, vzm, vpn, vpm, desc, NAME##_##S); }
 
-DEF_IMOPH(smopa_s)
-DEF_IMOPH(umopa_s)
-DEF_IMOPH(sumopa_s)
-DEF_IMOPH(usmopa_s)
-DEF_IMOPH(smopa_d)
-DEF_IMOPH(umopa_d)
-DEF_IMOPH(sumopa_d)
-DEF_IMOPH(usmopa_d)
+DEF_IMOPH(smopa, s)
+DEF_IMOPH(umopa, s)
+DEF_IMOPH(sumopa, s)
+DEF_IMOPH(usmopa, s)
+
+DEF_IMOPH(smopa, d)
+DEF_IMOPH(umopa, d)
+DEF_IMOPH(sumopa, d)
+DEF_IMOPH(usmopa, d)
diff --git a/tests/tcg/aarch64/sme-smopa-1.c b/tests/tcg/aarch64/sme-smopa-1.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/tcg/aarch64/sme-smopa-1.c
@@ -XXX,XX +XXX,XX @@
+#include <stdio.h>
+#include <string.h>
+
+int main()
+{
+    static const int cmp[4][4] = {
+        {  110,  134,  158,  182 },
+        {  390,  478,  566,  654 },
+        {  670,  822,  974, 1126 },
+        {  950, 1166, 1382, 1598 }
+    };
+    int dst[4][4];
+    int *tmp = &dst[0][0];
+
+    asm volatile(
+        ".arch armv8-r+sme\n\t"
+        "smstart\n\t"
+        "index z0.b, #0, #1\n\t"
+        "movprfx z1, z0\n\t"
+        "add z1.b, z1.b, #16\n\t"
+        "ptrue p0.b\n\t"
+        "smopa za0.s, p0/m, p0/m, z0.b, z1.b\n\t"
+        "ptrue p0.s, vl4\n\t"
+        "mov w12, #0\n\t"
+        "st1w { za0h.s[w12, #0] }, p0, [%0]\n\t"
+        "add %0, %0, #16\n\t"
+        "st1w { za0h.s[w12, #1] }, p0, [%0]\n\t"
+        "add %0, %0, #16\n\t"
+        "st1w { za0h.s[w12, #2] }, p0, [%0]\n\t"
+        "add %0, %0, #16\n\t"
+        "st1w { za0h.s[w12, #3] }, p0, [%0]\n\t"
+        "smstop"
+        : "+r"(tmp) : : "memory");
+
+    if (memcmp(cmp, dst, sizeof(dst)) == 0) {
+        return 0;
+    }
+
+    /* See above for correct results. */
+    for (int i = 0; i < 4; ++i) {
+        for (int j = 0; j < 4; ++j) {
+            printf("%6d", dst[i][j]);
+        }
+        printf("\n");
+    }
+    return 1;
+}
diff --git a/tests/tcg/aarch64/sme-smopa-2.c b/tests/tcg/aarch64/sme-smopa-2.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/tcg/aarch64/sme-smopa-2.c
@@ -XXX,XX +XXX,XX @@
+#include <stdio.h>
+#include <string.h>
+
+int main()
+{
+    static const long cmp[4][4] = {
+        {  110,  134,  158,  182 },
+        {  390,  478,  566,  654 },
+        {  670,  822,  974, 1126 },
+        {  950, 1166, 1382, 1598 }
+    };
+    long dst[4][4];
+    long *tmp = &dst[0][0];
+    long svl;
+
+    /* Validate that we have a wide enough vector for 4 elements. */
+    asm(".arch armv8-r+sme-i64\n\trdsvl %0, #1" : "=r"(svl));
+    if (svl < 32) {
+        return 0;
+    }
+
+    asm volatile(
+        "smstart\n\t"
+        "index z0.h, #0, #1\n\t"
+        "movprfx z1, z0\n\t"
+        "add z1.h, z1.h, #16\n\t"
+        "ptrue p0.b\n\t"
+        "smopa za0.d, p0/m, p0/m, z0.h, z1.h\n\t"
+        "ptrue p0.d, vl4\n\t"
+        "mov w12, #0\n\t"
+        "st1d { za0h.d[w12, #0] }, p0, [%0]\n\t"
+        "add %0, %0, #32\n\t"
+        "st1d { za0h.d[w12, #1] }, p0, [%0]\n\t"
+        "mov w12, #2\n\t"
+        "add %0, %0, #32\n\t"
+        "st1d { za0h.d[w12, #0] }, p0, [%0]\n\t"
+        "add %0, %0, #32\n\t"
+        "st1d { za0h.d[w12, #1] }, p0, [%0]\n\t"
+        "smstop"
+        : "+r"(tmp) : : "memory");
+
+    if (memcmp(cmp, dst, sizeof(dst)) == 0) {
+        return 0;
+    }
+
+    /* See above for correct results. */
+    for (int i = 0; i < 4; ++i) {
+        for (int j = 0; j < 4; ++j) {
+            printf("%6ld", dst[i][j]);
+        }
+        printf("\n");
+    }
+    return 1;
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index XXXXXXX..XXXXXXX 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -XXX,XX +XXX,XX @@ endif
 
 # SME Tests
 ifneq ($(CROSS_AS_HAS_ARMV9_SME),)
-AARCH64_TESTS += sme-outprod1
+AARCH64_TESTS += sme-outprod1 sme-smopa-1 sme-smopa-2
 endif
 
 # System Registers Tests
-- 
2.34.1
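
Note on the two SMOPA tests above (editorial, not part of any patch):
the cmp[][] table shared by sme-smopa-1.c and sme-smopa-2.c can be
reproduced on the host. The sketch below assumes that each ZA element
is the sum of four products of adjacent source elements (four bytes per
32-bit element in the first test, four halfwords per 64-bit element in
the second), with z0 = 0, 1, 2, ... from INDEX and z1 = z0 + 16. The
function names are hypothetical and do not exist in QEMU.

```c
#include <stdio.h>

/*
 * Hypothetical reference model, not QEMU code.
 * Element [row][col] of the tile is the sum of four products of
 * adjacent source elements: z0[4*row + k] * z1[4*col + k], k = 0..3,
 * where z0[n] = n and z1[n] = n + 16 as set up by the tests' asm.
 */
long smopa_expected(int row, int col)
{
    long sum = 0;

    for (int k = 0; k < 4; ++k) {
        long zn = 4 * row + k;          /* element of z0 */
        long zm = 4 * col + k + 16;     /* element of z1 */
        sum += zn * zm;
    }
    return sum;
}

/* Print the 4x4 table in the layout the tests use when they fail. */
void smopa_print_expected(void)
{
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            printf("%6ld", smopa_expected(i, j));
        }
        printf("\n");
    }
}
```

As a spot check, smopa_expected(0, 0) is 0*16 + 1*17 + 2*18 + 3*19 = 110,
which matches the first entry of cmp[][]. Calling smopa_print_expected()
from a main() prints the whole table for comparison.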

The sun4v RTC device model added under commit a0e893039cf2ce0 in 2016
was unfortunately added with a license of GPL-v3-or-later, which is
not compatible with other QEMU code which has a GPL-v2-only license.

Relicense the code in the .c and the .h file to GPL-v2-or-later,
to make it compatible with the rest of QEMU.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini (for Red Hat) <pbonzini@redhat.com>
Signed-off-by: Artyom Tarasenko <atar4qemu@gmail.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20240223161300.938542-1-peter.maydell@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 include/hw/rtc/sun4v-rtc.h | 2 +-
 hw/rtc/sun4v-rtc.c         | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/hw/rtc/sun4v-rtc.h b/include/hw/rtc/sun4v-rtc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/rtc/sun4v-rtc.h
+++ b/include/hw/rtc/sun4v-rtc.h
@@ -XXX,XX +XXX,XX @@
  *
  * Copyright (c) 2016 Artyom Tarasenko
  *
- * This code is licensed under the GNU GPL v3 or (at your option) any later
+ * This code is licensed under the GNU GPL v2 or (at your option) any later
  * version.
  */
 
diff --git a/hw/rtc/sun4v-rtc.c b/hw/rtc/sun4v-rtc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/rtc/sun4v-rtc.c
+++ b/hw/rtc/sun4v-rtc.c
@@ -XXX,XX +XXX,XX @@
  *
  * Copyright (c) 2016 Artyom Tarasenko
  *
- * This code is licensed under the GNU GPL v3 or (at your option) any later
+ * This code is licensed under the GNU GPL v2 or (at your option) any later
  * version.
  */
 
-- 
2.34.1

From: Thomas Huth <thuth@redhat.com>

Move the code to a separate file so that we do not have to compile
it anymore if CONFIG_ARM_V7M is not set.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 20240308141051.536599-2-thuth@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
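Editorial note, not part of the commit: the rNpM comments beside the
midr values in the moved init functions can be checked against the MIDR
field layout, which is implementer in bits [31:24], variant in [23:20],
architecture in [19:16], part number in [15:4] and revision in [3:0].
The sketch below is a standalone decoder; struct midr_fields and
midr_decode() are hypothetical names, not QEMU APIs.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the decoded MIDR fields. */
struct midr_fields {
    unsigned implementer;   /* bits [31:24], 0x41 is Arm */
    unsigned variant;       /* bits [23:20], the N in rNpM */
    unsigned architecture;  /* bits [19:16] */
    unsigned partno;        /* bits [15:4] */
    unsigned revision;      /* bits [3:0], the M in rNpM */
};

/* Split a 32-bit MIDR value into its fields. */
struct midr_fields midr_decode(uint32_t midr)
{
    struct midr_fields f;

    f.implementer  = (midr >> 24) & 0xff;
    f.variant      = (midr >> 20) & 0xf;
    f.architecture = (midr >> 16) & 0xf;
    f.partno       = (midr >> 4) & 0xfff;
    f.revision     = midr & 0xf;
    return f;
}

/* Print a value in the "rNpM" form used by the source comments. */
void midr_print(uint32_t midr)
{
    struct midr_fields f = midr_decode(midr);

    printf("0x%08x: part 0x%03x r%up%u\n",
           (unsigned)midr, f.partno, f.variant, f.revision);
}
```

For example, 0x411fc272 (cortex-m7) decodes to variant 1 and revision 2,
that is r1p2, and 0x410fd213 (cortex-m33) decodes to r0p3, in agreement
with the comments in cpu-v7m.c.
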
 target/arm/tcg/cpu-v7m.c   | 290 +++++++++++++++++++++++++++++++++++++
 target/arm/tcg/cpu32.c     | 261 ---------------------------------
 target/arm/meson.build     |   3 +
 target/arm/tcg/meson.build |   3 +
 4 files changed, 296 insertions(+), 261 deletions(-)
 create mode 100644 target/arm/tcg/cpu-v7m.c

diff --git a/target/arm/tcg/cpu-v7m.c b/target/arm/tcg/cpu-v7m.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/tcg/cpu-v7m.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * QEMU ARMv7-M TCG-only CPUs.
+ *
+ * Copyright (c) 2012 SUSE LINUX Products GmbH
+ *
+ * This code is licensed under the GNU GPL v2 or later.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "hw/core/tcg-cpu-ops.h"
+#include "internals.h"
+
+#if !defined(CONFIG_USER_ONLY)
+
+#include "hw/intc/armv7m_nvic.h"
+
+static bool arm_v7m_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
+{
+    CPUClass *cc = CPU_GET_CLASS(cs);
+    ARMCPU *cpu = ARM_CPU(cs);
+    CPUARMState *env = &cpu->env;
+    bool ret = false;
+
+    /*
+     * ARMv7-M interrupt masking works differently than -A or -R.
+     * There is no FIQ/IRQ distinction. Instead of I and F bits
+     * masking FIQ and IRQ interrupts, an exception is taken only
+     * if it is higher priority than the current execution priority
+     * (which depends on state like BASEPRI, FAULTMASK and the
+     * currently active exception).
+     */
+    if (interrupt_request & CPU_INTERRUPT_HARD
+        && (armv7m_nvic_can_take_pending_exception(env->nvic))) {
+        cs->exception_index = EXCP_IRQ;
+        cc->tcg_ops->do_interrupt(cs);
+        ret = true;
+    }
+    return ret;
+}
+
+#endif /* !CONFIG_USER_ONLY */
+
+static void cortex_m0_initfn(Object *obj)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    set_feature(&cpu->env, ARM_FEATURE_V6);
+    set_feature(&cpu->env, ARM_FEATURE_M);
+
+    cpu->midr = 0x410cc200;
+
+    /*
+     * These ID register values are not guest visible, because
+     * we do not implement the Main Extension. They must be set
+     * to values corresponding to the Cortex-M0's implemented
+     * features, because QEMU generally controls its emulation
+     * by looking at ID register fields. We use the same values as
+     * for the M3.
+     */
+    cpu->isar.id_pfr0 = 0x00000030;
+    cpu->isar.id_pfr1 = 0x00000200;
+    cpu->isar.id_dfr0 = 0x00100000;
+    cpu->id_afr0 = 0x00000000;
+    cpu->isar.id_mmfr0 = 0x00000030;
+    cpu->isar.id_mmfr1 = 0x00000000;
+    cpu->isar.id_mmfr2 = 0x00000000;
+    cpu->isar.id_mmfr3 = 0x00000000;
+    cpu->isar.id_isar0 = 0x01141110;
+    cpu->isar.id_isar1 = 0x02111000;
+    cpu->isar.id_isar2 = 0x21112231;
+    cpu->isar.id_isar3 = 0x01111110;
+    cpu->isar.id_isar4 = 0x01310102;
+    cpu->isar.id_isar5 = 0x00000000;
+    cpu->isar.id_isar6 = 0x00000000;
+}
+
+static void cortex_m3_initfn(Object *obj)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+    set_feature(&cpu->env, ARM_FEATURE_V7);
+    set_feature(&cpu->env, ARM_FEATURE_M);
+    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
+    cpu->midr = 0x410fc231;
+    cpu->pmsav7_dregion = 8;
+    cpu->isar.id_pfr0 = 0x00000030;
+    cpu->isar.id_pfr1 = 0x00000200;
+    cpu->isar.id_dfr0 = 0x00100000;
+    cpu->id_afr0 = 0x00000000;
+    cpu->isar.id_mmfr0 = 0x00000030;
+    cpu->isar.id_mmfr1 = 0x00000000;
+    cpu->isar.id_mmfr2 = 0x00000000;
+    cpu->isar.id_mmfr3 = 0x00000000;
+    cpu->isar.id_isar0 = 0x01141110;
+    cpu->isar.id_isar1 = 0x02111000;
+    cpu->isar.id_isar2 = 0x21112231;
+    cpu->isar.id_isar3 = 0x01111110;
+    cpu->isar.id_isar4 = 0x01310102;
+    cpu->isar.id_isar5 = 0x00000000;
+    cpu->isar.id_isar6 = 0x00000000;
+}
+
+static void cortex_m4_initfn(Object *obj)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+
+    set_feature(&cpu->env, ARM_FEATURE_V7);
+    set_feature(&cpu->env, ARM_FEATURE_M);
+    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
+    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
+    cpu->midr = 0x410fc240; /* r0p0 */
+    cpu->pmsav7_dregion = 8;
+    cpu->isar.mvfr0 = 0x10110021;
+    cpu->isar.mvfr1 = 0x11000011;
+    cpu->isar.mvfr2 = 0x00000000;
+    cpu->isar.id_pfr0 = 0x00000030;
+    cpu->isar.id_pfr1 = 0x00000200;
+    cpu->isar.id_dfr0 = 0x00100000;
+    cpu->id_afr0 = 0x00000000;
+    cpu->isar.id_mmfr0 = 0x00000030;
+    cpu->isar.id_mmfr1 = 0x00000000;
+    cpu->isar.id_mmfr2 = 0x00000000;
+    cpu->isar.id_mmfr3 = 0x00000000;
+    cpu->isar.id_isar0 = 0x01141110;
+    cpu->isar.id_isar1 = 0x02111000;
+    cpu->isar.id_isar2 = 0x21112231;
+    cpu->isar.id_isar3 = 0x01111110;
+    cpu->isar.id_isar4 = 0x01310102;
+    cpu->isar.id_isar5 = 0x00000000;
+    cpu->isar.id_isar6 = 0x00000000;
+}
+
+static void cortex_m7_initfn(Object *obj)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+
+    set_feature(&cpu->env, ARM_FEATURE_V7);
+    set_feature(&cpu->env, ARM_FEATURE_M);
+    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
+    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
+    cpu->midr = 0x411fc272; /* r1p2 */
+    cpu->pmsav7_dregion = 8;
+    cpu->isar.mvfr0 = 0x10110221;
+    cpu->isar.mvfr1 = 0x12000011;
+    cpu->isar.mvfr2 = 0x00000040;
+    cpu->isar.id_pfr0 = 0x00000030;
+    cpu->isar.id_pfr1 = 0x00000200;
+    cpu->isar.id_dfr0 = 0x00100000;
+    cpu->id_afr0 = 0x00000000;
+    cpu->isar.id_mmfr0 = 0x00100030;
+    cpu->isar.id_mmfr1 = 0x00000000;
+    cpu->isar.id_mmfr2 = 0x01000000;
+    cpu->isar.id_mmfr3 = 0x00000000;
+    cpu->isar.id_isar0 = 0x01101110;
+    cpu->isar.id_isar1 = 0x02112000;
+    cpu->isar.id_isar2 = 0x20232231;
+    cpu->isar.id_isar3 = 0x01111131;
+    cpu->isar.id_isar4 = 0x01310132;
+    cpu->isar.id_isar5 = 0x00000000;
+    cpu->isar.id_isar6 = 0x00000000;
+}
+
+static void cortex_m33_initfn(Object *obj)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+
+    set_feature(&cpu->env, ARM_FEATURE_V8);
+    set_feature(&cpu->env, ARM_FEATURE_M);
+    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
+    set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
+    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
+    cpu->midr = 0x410fd213; /* r0p3 */
+    cpu->pmsav7_dregion = 16;
+    cpu->sau_sregion = 8;
+    cpu->isar.mvfr0 = 0x10110021;
+    cpu->isar.mvfr1 = 0x11000011;
+    cpu->isar.mvfr2 = 0x00000040;
+    cpu->isar.id_pfr0 = 0x00000030;
+    cpu->isar.id_pfr1 = 0x00000210;
+    cpu->isar.id_dfr0 = 0x00200000;
+    cpu->id_afr0 = 0x00000000;
+    cpu->isar.id_mmfr0 = 0x00101F40;
+    cpu->isar.id_mmfr1 = 0x00000000;
+    cpu->isar.id_mmfr2 = 0x01000000;
+    cpu->isar.id_mmfr3 = 0x00000000;
+    cpu->isar.id_isar0 = 0x01101110;
+    cpu->isar.id_isar1 = 0x02212000;
+    cpu->isar.id_isar2 = 0x20232232;
+    cpu->isar.id_isar3 = 0x01111131;
+    cpu->isar.id_isar4 = 0x01310132;
+    cpu->isar.id_isar5 = 0x00000000;
+    cpu->isar.id_isar6 = 0x00000000;
+    cpu->clidr = 0x00000000;
+    cpu->ctr = 0x8000c000;
+}
+
+static void cortex_m55_initfn(Object *obj)
+{
+    ARMCPU *cpu = ARM_CPU(obj);
+
+    set_feature(&cpu->env, ARM_FEATURE_V8);
+    set_feature(&cpu->env, ARM_FEATURE_V8_1M);
+    set_feature(&cpu->env, ARM_FEATURE_M);
+    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
+    set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
+    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
+    cpu->midr = 0x410fd221; /* r0p1 */
+    cpu->revidr = 0;
+    cpu->pmsav7_dregion = 16;
+    cpu->sau_sregion = 8;
+    /* These are the MVFR* values for the FPU + full MVE configuration */
+    cpu->isar.mvfr0 = 0x10110221;
+    cpu->isar.mvfr1 = 0x12100211;
+    cpu->isar.mvfr2 = 0x00000040;
+    cpu->isar.id_pfr0 = 0x20000030;
+    cpu->isar.id_pfr1 = 0x00000230;
+    cpu->isar.id_dfr0 = 0x10200000;
+    cpu->id_afr0 = 0x00000000;
+    cpu->isar.id_mmfr0 = 0x00111040;
+    cpu->isar.id_mmfr1 = 0x00000000;
+    cpu->isar.id_mmfr2 = 0x01000000;
+    cpu->isar.id_mmfr3 = 0x00000011;
+    cpu->isar.id_isar0 = 0x01103110;
+    cpu->isar.id_isar1 = 0x02212000;
+    cpu->isar.id_isar2 = 0x20232232;
+    cpu->isar.id_isar3 = 0x01111131;
+    cpu->isar.id_isar4 = 0x01310132;
+    cpu->isar.id_isar5 = 0x00000000;
+    cpu->isar.id_isar6 = 0x00000000;
+    cpu->clidr = 0x00000000; /* caches not implemented */
+    cpu->ctr = 0x8303c003;
+}
+
+static const TCGCPUOps arm_v7m_tcg_ops = {
+    .initialize = arm_translate_init,
+    .synchronize_from_tb = arm_cpu_synchronize_from_tb,
+    .debug_excp_handler = arm_debug_excp_handler,
+    .restore_state_to_opc = arm_restore_state_to_opc,
+
+#ifdef CONFIG_USER_ONLY
+    .record_sigsegv = arm_cpu_record_sigsegv,
+    .record_sigbus = arm_cpu_record_sigbus,
+#else
+    .tlb_fill = arm_cpu_tlb_fill,
+    .cpu_exec_interrupt = arm_v7m_cpu_exec_interrupt,
+    .do_interrupt = arm_v7m_cpu_do_interrupt,
+    .do_transaction_failed = arm_cpu_do_transaction_failed,
+    .do_unaligned_access = arm_cpu_do_unaligned_access,
+    .adjust_watchpoint_address = arm_adjust_watchpoint_address,
+    .debug_check_watchpoint = arm_debug_check_watchpoint,
+    .debug_check_breakpoint = arm_debug_check_breakpoint,
+#endif /* !CONFIG_USER_ONLY */
+};
+
+static void arm_v7m_class_init(ObjectClass *oc, void *data)
+{
+    ARMCPUClass *acc = ARM_CPU_CLASS(oc);
+    CPUClass *cc = CPU_CLASS(oc);
+
+    acc->info = data;
+    cc->tcg_ops = &arm_v7m_tcg_ops;
+    cc->gdb_core_xml_file = "arm-m-profile.xml";
+}
+
+static const ARMCPUInfo arm_v7m_cpus[] = {
+    { .name = "cortex-m0",   .initfn = cortex_m0_initfn,
+                             .class_init = arm_v7m_class_init },
+    { .name = "cortex-m3",   .initfn = cortex_m3_initfn,
+                             .class_init = arm_v7m_class_init },
+    { .name = "cortex-m4",   .initfn = cortex_m4_initfn,
+                             .class_init = arm_v7m_class_init },
+    { .name = "cortex-m7",   .initfn = cortex_m7_initfn,
+                             .class_init = arm_v7m_class_init },
+    { .name = "cortex-m33",  .initfn = cortex_m33_initfn,
+                             .class_init = arm_v7m_class_init },
+    { .name = "cortex-m55",  .initfn = cortex_m55_initfn,
+                             .class_init = arm_v7m_class_init },
+};
+
+static void arm_v7m_cpu_register_types(void)
+{
+    size_t i;
+
+    for (i = 0; i < ARRAY_SIZE(arm_v7m_cpus); ++i) {
+        arm_cpu_register(&arm_v7m_cpus[i]);
+    }
+}
+
+type_init(arm_v7m_cpu_register_types)
diff --git a/target/arm/tcg/cpu32.c b/target/arm/tcg/cpu32.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/cpu32.c
+++ b/target/arm/tcg/cpu32.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/boards.h"
 #endif
 #include "cpregs.h"
-#if !defined(CONFIG_USER_ONLY) && defined(CONFIG_TCG)
-#include "hw/intc/armv7m_nvic.h"
-#endif
 
 
 /* Share AArch32 -cpu max features with AArch64. */
@@ -XXX,XX +XXX,XX @@ void aa32_max_features(ARMCPU *cpu)
 /* CPU models. These are not needed for the AArch64 linux-user build. */
 #if !defined(CONFIG_USER_ONLY) || !defined(TARGET_AARCH64)
 
-#if !defined(CONFIG_USER_ONLY)
-static bool arm_v7m_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
-{
-    CPUClass *cc = CPU_GET_CLASS(cs);
-    ARMCPU *cpu = ARM_CPU(cs);
-    CPUARMState *env = &cpu->env;
-    bool ret = false;
-
-    /*
-     * ARMv7-M interrupt masking works differently than -A or -R.
-     * There is no FIQ/IRQ distinction. Instead of I and F bits
-     * masking FIQ and IRQ interrupts, an exception is taken only
-     * if it is higher priority than the current execution priority
-     * (which depends on state like BASEPRI, FAULTMASK and the
-     * currently active exception).
-     */
-    if (interrupt_request & CPU_INTERRUPT_HARD
-        && (armv7m_nvic_can_take_pending_exception(env->nvic))) {
-        cs->exception_index = EXCP_IRQ;
-        cc->tcg_ops->do_interrupt(cs);
-        ret = true;
-    }
-    return ret;
-}
-#endif /* !CONFIG_USER_ONLY */
-
 static void arm926_initfn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
@@ -XXX,XX +XXX,XX @@ static void cortex_a15_initfn(Object *obj)
     define_arm_cp_regs(cpu, cortexa15_cp_reginfo);
 }
 
-static void cortex_m0_initfn(Object *obj)
-{
-    ARMCPU *cpu = ARM_CPU(obj);
-    set_feature(&cpu->env, ARM_FEATURE_V6);
-    set_feature(&cpu->env, ARM_FEATURE_M);
-
-    cpu->midr = 0x410cc200;
-
-    /*
-     * These ID register values are not guest visible, because
-     * we do not implement the Main Extension. They must be set
-     * to values corresponding to the Cortex-M0's implemented
-     * features, because QEMU generally controls its emulation
-     * by looking at ID register fields. We use the same values as
-     * for the M3.
-     */
-    cpu->isar.id_pfr0 = 0x00000030;
-    cpu->isar.id_pfr1 = 0x00000200;
-    cpu->isar.id_dfr0 = 0x00100000;
-    cpu->id_afr0 = 0x00000000;
-    cpu->isar.id_mmfr0 = 0x00000030;
-    cpu->isar.id_mmfr1 = 0x00000000;
-    cpu->isar.id_mmfr2 = 0x00000000;
-    cpu->isar.id_mmfr3 = 0x00000000;
-    cpu->isar.id_isar0 = 0x01141110;
-    cpu->isar.id_isar1 = 0x02111000;
-    cpu->isar.id_isar2 = 0x21112231;
-    cpu->isar.id_isar3 = 0x01111110;
-    cpu->isar.id_isar4 = 0x01310102;
-    cpu->isar.id_isar5 = 0x00000000;
-    cpu->isar.id_isar6 = 0x00000000;
-}
-
-static void cortex_m3_initfn(Object *obj)
-{
-    ARMCPU *cpu = ARM_CPU(obj);
-    set_feature(&cpu->env, ARM_FEATURE_V7);
-    set_feature(&cpu->env, ARM_FEATURE_M);
-    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
-    cpu->midr = 0x410fc231;
-    cpu->pmsav7_dregion = 8;
-    cpu->isar.id_pfr0 = 0x00000030;
-    cpu->isar.id_pfr1 = 0x00000200;
-    cpu->isar.id_dfr0 = 0x00100000;
-    cpu->id_afr0 = 0x00000000;
-    cpu->isar.id_mmfr0 = 0x00000030;
-    cpu->isar.id_mmfr1 = 0x00000000;
-    cpu->isar.id_mmfr2 = 0x00000000;
-    cpu->isar.id_mmfr3 = 0x00000000;
-    cpu->isar.id_isar0 = 0x01141110;
-    cpu->isar.id_isar1 = 0x02111000;
-    cpu->isar.id_isar2 = 0x21112231;
-    cpu->isar.id_isar3 = 0x01111110;
-    cpu->isar.id_isar4 = 0x01310102;
-    cpu->isar.id_isar5 = 0x00000000;
-    cpu->isar.id_isar6 = 0x00000000;
-}
-
-static void cortex_m4_initfn(Object *obj)
-{
-    ARMCPU *cpu = ARM_CPU(obj);
-
-    set_feature(&cpu->env, ARM_FEATURE_V7);
-    set_feature(&cpu->env, ARM_FEATURE_M);
-    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
-    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
-    cpu->midr = 0x410fc240; /* r0p0 */
-    cpu->pmsav7_dregion = 8;
-    cpu->isar.mvfr0 = 0x10110021;
-    cpu->isar.mvfr1 = 0x11000011;
-    cpu->isar.mvfr2 = 0x00000000;
-    cpu->isar.id_pfr0 = 0x00000030;
-    cpu->isar.id_pfr1 = 0x00000200;
-    cpu->isar.id_dfr0 = 0x00100000;
-    cpu->id_afr0 = 0x00000000;
-    cpu->isar.id_mmfr0 = 0x00000030;
-    cpu->isar.id_mmfr1 = 0x00000000;
-    cpu->isar.id_mmfr2 = 0x00000000;
-    cpu->isar.id_mmfr3 = 0x00000000;
-    cpu->isar.id_isar0 = 0x01141110;
-    cpu->isar.id_isar1 = 0x02111000;
-    cpu->isar.id_isar2 = 0x21112231;
-    cpu->isar.id_isar3 = 0x01111110;
-    cpu->isar.id_isar4 = 0x01310102;
-    cpu->isar.id_isar5 = 0x00000000;
-    cpu->isar.id_isar6 = 0x00000000;
-}
-
-static void cortex_m7_initfn(Object *obj)
-{
-    ARMCPU *cpu = ARM_CPU(obj);
-
-    set_feature(&cpu->env, ARM_FEATURE_V7);
-    set_feature(&cpu->env, ARM_FEATURE_M);
-    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
-    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
-    cpu->midr = 0x411fc272; /* r1p2 */
-    cpu->pmsav7_dregion = 8;
-    cpu->isar.mvfr0 = 0x10110221;
-    cpu->isar.mvfr1 = 0x12000011;
-    cpu->isar.mvfr2 = 0x00000040;
-    cpu->isar.id_pfr0 = 0x00000030;
-    cpu->isar.id_pfr1 = 0x00000200;
-    cpu->isar.id_dfr0 = 0x00100000;
-    cpu->id_afr0 = 0x00000000;
-    cpu->isar.id_mmfr0 = 0x00100030;
-    cpu->isar.id_mmfr1 = 0x00000000;
-    cpu->isar.id_mmfr2 = 0x01000000;
-    cpu->isar.id_mmfr3 = 0x00000000;
-    cpu->isar.id_isar0 = 0x01101110;
-    cpu->isar.id_isar1 = 0x02112000;
-    cpu->isar.id_isar2 = 0x20232231;
-    cpu->isar.id_isar3 = 0x01111131;
-    cpu->isar.id_isar4 = 0x01310132;
-    cpu->isar.id_isar5 = 0x00000000;
-    cpu->isar.id_isar6 = 0x00000000;
-}
-
-static void cortex_m33_initfn(Object *obj)
-{
-    ARMCPU *cpu = ARM_CPU(obj);
-
-    set_feature(&cpu->env, ARM_FEATURE_V8);
-    set_feature(&cpu->env, ARM_FEATURE_M);
-    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
-    set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
-    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
-    cpu->midr = 0x410fd213; /* r0p3 */
-    cpu->pmsav7_dregion = 16;
-    cpu->sau_sregion = 8;
-    cpu->isar.mvfr0 = 0x10110021;
-    cpu->isar.mvfr1 = 0x11000011;
-    cpu->isar.mvfr2 = 0x00000040;
-    cpu->isar.id_pfr0 = 0x00000030;
-    cpu->isar.id_pfr1 = 0x00000210;
-    cpu->isar.id_dfr0 = 0x00200000;
-    cpu->id_afr0 = 0x00000000;
-    cpu->isar.id_mmfr0 = 0x00101F40;
-    cpu->isar.id_mmfr1 = 0x00000000;
-    cpu->isar.id_mmfr2 = 0x01000000;
-    cpu->isar.id_mmfr3 = 0x00000000;
-    cpu->isar.id_isar0 = 0x01101110;
-    cpu->isar.id_isar1 = 0x02212000;
-    cpu->isar.id_isar2 = 0x20232232;
-    cpu->isar.id_isar3 = 0x01111131;
-    cpu->isar.id_isar4 = 0x01310132;
-    cpu->isar.id_isar5 = 0x00000000;
-    cpu->isar.id_isar6 = 0x00000000;
-    cpu->clidr = 0x00000000;
-    cpu->ctr = 0x8000c000;
-}
-
-static void cortex_m55_initfn(Object *obj)
-{
-    ARMCPU *cpu = ARM_CPU(obj);
-
-    set_feature(&cpu->env, ARM_FEATURE_V8);
-    set_feature(&cpu->env, ARM_FEATURE_V8_1M);
-    set_feature(&cpu->env, ARM_FEATURE_M);
-    set_feature(&cpu->env, ARM_FEATURE_M_MAIN);
-    set_feature(&cpu->env, ARM_FEATURE_M_SECURITY);
-    set_feature(&cpu->env, ARM_FEATURE_THUMB_DSP);
-    cpu->midr = 0x410fd221; /* r0p1 */
-    cpu->revidr = 0;
-    cpu->pmsav7_dregion = 16;
-    cpu->sau_sregion = 8;
-    /* These are the MVFR* values for the FPU + full MVE configuration */
-    cpu->isar.mvfr0 = 0x10110221;
-    cpu->isar.mvfr1 = 0x12100211;
-    cpu->isar.mvfr2 = 0x00000040;
-    cpu->isar.id_pfr0 = 0x20000030;
-    cpu->isar.id_pfr1 = 0x00000230;
-    cpu->isar.id_dfr0 = 0x10200000;
-    cpu->id_afr0 = 0x00000000;
-    cpu->isar.id_mmfr0 = 0x00111040;
-    cpu->isar.id_mmfr1 = 0x00000000;
-    cpu->isar.id_mmfr2 = 0x01000000;
-    cpu->isar.id_mmfr3 = 0x00000011;
-    cpu->isar.id_isar0 = 0x01103110;
-    cpu->isar.id_isar1 = 0x02212000;
-    cpu->isar.id_isar2 = 0x20232232;
-    cpu->isar.id_isar3 = 0x01111131;
-    cpu->isar.id_isar4 = 0x01310132;
-    cpu->isar.id_isar5 = 0x00000000;
-    cpu->isar.id_isar6 = 0x00000000;
-    cpu->clidr = 0x00000000; /* caches not implemented */
-    cpu->ctr = 0x8303c003;
-}
-
 static const ARMCPRegInfo cortexr5_cp_reginfo[] = {
     /* Dummy the TCM region regs for the moment */
     { .name = "ATCM", .cp = 15, .opc1 = 0, .crn = 9, .crm = 1, .opc2 = 0,
@@ -XXX,XX +XXX,XX @@ static void pxa270c5_initfn(Object *obj)
     cpu->reset_sctlr = 0x00000078;
 }
 
-static const TCGCPUOps arm_v7m_tcg_ops = {
-    .initialize = arm_translate_init,
-    .synchronize_from_tb = arm_cpu_synchronize_from_tb,
-    .debug_excp_handler = arm_debug_excp_handler,
-    .restore_state_to_opc = arm_restore_state_to_opc,
-
-#ifdef CONFIG_USER_ONLY
-    .record_sigsegv = arm_cpu_record_sigsegv,
-    .record_sigbus = arm_cpu_record_sigbus,
-#else
-    .tlb_fill = arm_cpu_tlb_fill,
-    .cpu_exec_interrupt = arm_v7m_cpu_exec_interrupt,
-    .do_interrupt = arm_v7m_cpu_do_interrupt,
-    .do_transaction_failed = arm_cpu_do_transaction_failed,
-    .do_unaligned_access = arm_cpu_do_unaligned_access,
-    .adjust_watchpoint_address = arm_adjust_watchpoint_address,
-    .debug_check_watchpoint = arm_debug_check_watchpoint,
-    .debug_check_breakpoint = arm_debug_check_breakpoint,
-#endif /* !CONFIG_USER_ONLY */
-};
-
-static void arm_v7m_class_init(ObjectClass *oc, void *data)
-{
-    ARMCPUClass *acc = ARM_CPU_CLASS(oc);
-    CPUClass *cc = CPU_CLASS(oc);
-
-    acc->info = data;
-    cc->tcg_ops = &arm_v7m_tcg_ops;
-    cc->gdb_core_xml_file = "arm-m-profile.xml";
-}
-
 #ifndef TARGET_AARCH64
 /*
  * -cpu max: a CPU with as many features enabled as our emulation supports.
@@ -XXX,XX +XXX,XX @@ static const ARMCPUInfo arm_tcg_cpus[] = {
     { .name = "cortex-a8",   .initfn = cortex_a8_initfn },
     { .name = "cortex-a9",   .initfn = cortex_a9_initfn },
     { .name = "cortex-a15",  .initfn = cortex_a15_initfn },
-    { .name = "cortex-m0",   .initfn = cortex_m0_initfn,
-                             .class_init = arm_v7m_class_init },
-    { .name = "cortex-m3",   .initfn = cortex_m3_initfn,
-                             .class_init = arm_v7m_class_init },
-    { .name = "cortex-m4",   .initfn = cortex_m4_initfn,
-                             .class_init = arm_v7m_class_init },
-    { .name = "cortex-m7",   .initfn = cortex_m7_initfn,
-                             .class_init = arm_v7m_class_init },
-    { .name = "cortex-m33",  .initfn = cortex_m33_initfn,
-                             .class_init = arm_v7m_class_init },
-    { .name = "cortex-m55",  .initfn = cortex_m55_initfn,
-                             .class_init = arm_v7m_class_init },
     { .name = "cortex-r5",   .initfn = cortex_r5_initfn },
     { .name = "cortex-r5f",  .initfn = cortex_r5f_initfn },
     { .name = "cortex-r52",  .initfn = cortex_r52_initfn },
diff --git a/target/arm/meson.build b/target/arm/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -XXX,XX +XXX,XX @@ arm_system_ss.add(files(
   'ptw.c',
 ))
 
+arm_user_ss = ss.source_set()
+
 subdir('hvf')
 
 if 'CONFIG_TCG' in config_all_accel
@@ -XXX,XX +XXX,XX @@ endif
 
 target_arch += {'arm': arm_ss}
 target_system_arch += {'arm': arm_system_ss}
+target_user_arch += {'arm': arm_user_ss}
diff --git a/target/arm/tcg/meson.build b/target/arm/tcg/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/tcg/meson.build
+++ b/target/arm/tcg/meson.build
@@ -XXX,XX +XXX,XX @@ arm_ss.add(when: 'TARGET_AARCH64', if_true: files(
 arm_system_ss.add(files(
   'psci.c',
 ))
+
+arm_system_ss.add(when: 'CONFIG_ARM_V7M', if_true: files('cpu-v7m.c'))
+arm_user_ss.add(when: 'TARGET_AARCH64', if_false: files('cpu-v7m.c'))
-- 
2.34.1