Series comparison

-[Qemu-devel] [PULL 00/16] target-arm queue
+[PULL 00/26] target-arm queue
-The following changes since commit ad1b4ec39caa5b3f17cbd8160283a03a3dcfe2ae:
+The following changes since commit 4cc10cae64c51e17844dc4358481c393d7bf1ed4:
-  Merge remote-tracking branch 'remotes/kraxel/tags/input-20180515-pull-request' into staging (2018-05-15 12:50:06 +0100)
+  Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging (2021-05-06 18:56:17 +0100)
 are available in the Git repository at:
-  git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20180515
+  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210510
-for you to fetch changes up to ae7651804748c6b479d5ae09aeac4edb9c44f76e:
+for you to fetch changes up to 8f96812baa53005f32aece3e30b140826c20aa19:
-  tcg: Optionally log FPU state in TCG -d cpu logging (2018-05-15 14:58:44 +0100)
+  hw/arm/xlnx: Fix PHY address for xilinx-zynq-a9 (2021-05-10 13:24:09 +0100)
 ----------------------------------------------------------------
 target-arm queue:
- * Fix coverity nit in int_to_float code
+ * docs: fix link in sbsa description
- * Don't set Invalid for float-to-int(MAXINT)
+ * linux-user/aarch64: Enable hwcap for RND, BTI, and MTE
- * Fix fp_status_f16 tininess before rounding
+ * target/arm: Fix tlbbits calculation in tlbi_aa64_vae2is_write()
- * Add various missing insns from the v8.2-FP16 extension
+ * target/arm: Split neon and vfp translation to their own
- * Fix sqrt_f16 exception raising
+   compilation units
- * sdcard: Correct CRC16 offset in sd_function_switch()
+ * target/arm: Make WFI a NOP for userspace emulators
- * tcg: Optionally log FPU state in TCG -d cpu logging
+ * hw/sd/omap_mmc: Use device_cold_reset() instead of
    device_legacy_reset()
  * include: More fixes for 'extern "C"' block use
  * hw/arm/imx25_pdk: Fix error message for invalid RAM size
  * hw/arm/mps2-tz: Implement AN524 memory remapping via machine property
  * hw/arm/xlnx: Fix PHY address for xilinx-zynq-a9
 ----------------------------------------------------------------
-Alex Bennée (5):
+Alex Bennée (1):
-      fpu/softfloat: int_to_float ensure r fully initialised
+      docs: fix link in sbsa description
       target/arm: Implement FCMP for fp16
       target/arm: Implement FCSEL for fp16
       target/arm: Implement FMOV (immediate) for fp16
       target/arm: Fix sqrt_f16 exception raising
-Peter Maydell (3):
+Guenter Roeck (1):
-      fpu/softfloat: Don't set Invalid for float-to-int(MAXINT)
+      hw/arm/xlnx: Fix PHY address for xilinx-zynq-a9
-      target/arm: Fix fp_status_f16 tininess before rounding
-      tcg: Optionally log FPU state in TCG -d cpu logging
+Peter Maydell (22):
       target/arm: Fix tlbbits calculation in tlbi_aa64_vae2is_write()
       target/arm: Move constant expanders to translate.h
       target/arm: Share unallocated_encoding() and gen_exception_insn()
       target/arm: Make functions used by m-nocp global
       target/arm: Split m-nocp trans functions into their own file
       target/arm: Move gen_aa32 functions to translate-a32.h
       target/arm: Move vfp_{load, store}_reg{32, 64} to translate-vfp.c.inc
       target/arm: Make functions used by translate-vfp global
       target/arm: Make translate-vfp.c.inc its own compilation unit
       target/arm: Move vfp_reg_ptr() to translate-neon.c.inc
       target/arm: Delete unused typedef
       target/arm: Move NeonGenThreeOpEnvFn typedef to translate.h
       target/arm: Make functions used by translate-neon global
       target/arm: Make translate-neon.c.inc its own compilation unit
       target/arm: Make WFI a NOP for userspace emulators
       hw/sd/omap_mmc: Use device_cold_reset() instead of device_legacy_reset()
       osdep: Make os-win32.h and os-posix.h handle 'extern "C"' themselves
       include/qemu/bswap.h: Handle being included outside extern "C" block
       include/disas/dis-asm.h: Handle being included outside 'extern "C"'
       hw/misc/mps2-scc: Add "QEMU interface" comment
       hw/misc/mps2-scc: Support using CFG0 bit 0 for remapping
       hw/arm/mps2-tz: Implement AN524 memory remapping via machine property
 Philippe Mathieu-Daudé (1):
-      sdcard: Correct CRC16 offset in sd_function_switch()
+      hw/arm/imx25_pdk: Fix error message for invalid RAM size
-Richard Henderson (7):
+Richard Henderson (1):
-      target/arm: Implement FMOV (general) for fp16
+      linux-user/aarch64: Enable hwcap for RND, BTI, and MTE
       target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
       target/arm: Implement FCVT (scalar, integer) for fp16
       target/arm: Implement FCVT (scalar, fixed-point) for fp16
       target/arm: Introduce and use read_fp_hreg
       target/arm: Implement FP data-processing (2 source) for fp16
       target/arm: Implement FP data-processing (3 source) for fp16
- include/qemu/log.h         |   1 +
+ docs/system/arm/mps2.rst                           |  10 +
- target/arm/helper-a64.h    |   2 +
+ docs/system/arm/sbsa.rst                           |   2 +-
- target/arm/helper.h        |   6 +
+ include/disas/dis-asm.h                            |  12 +-
- accel/tcg/cpu-exec.c       |   9 +-
+ include/hw/misc/mps2-scc.h                         |  21 ++
- fpu/softfloat.c            |   6 +-
+ include/qemu/bswap.h                               |  26 ++-
- hw/sd/sd.c                 |   2 +-
+ include/qemu/osdep.h                               |   8 +-
- target/arm/cpu.c           |   2 +
+ include/sysemu/os-posix.h                          |   8 +
- target/arm/helper-a64.c    |  10 ++
+ include/sysemu/os-win32.h                          |   8 +
- target/arm/helper.c        |  38 +++-
+ target/arm/translate-a32.h                         | 144 +++++++++++++
- target/arm/translate-a64.c | 421 ++++++++++++++++++++++++++++++++++++++-------
+ target/arm/translate-a64.h                         |   2 -
- util/log.c                 |   2 +
+ target/arm/translate.h                             |  29 +++
-files changed, 428 insertions(+), 71 deletions(-)
+ hw/arm/imx25_pdk.c                                 |   5 +-
  hw/arm/mps2-tz.c                                   | 108 +++++++++-
  hw/arm/xilinx_zynq.c                               |   2 +-
  hw/misc/mps2-scc.c                                 |  13 +-
  hw/sd/omap_mmc.c                                   |   2 +-
  linux-user/elfload.c                               |  13 ++
  target/arm/helper.c                                |   2 +-
  target/arm/op_helper.c                             |  12 ++
  target/arm/translate-a64.c                         |  15 --
  target/arm/translate-m-nocp.c                      | 221 ++++++++++++++++++++
  .../arm/{translate-neon.c.inc => translate-neon.c} |  19 +-
  .../arm/{translate-vfp.c.inc => translate-vfp.c}   | 230 +++------------------
  target/arm/translate.c                             | 200 ++++--------------
  disas/arm-a64.cc                                   |   2 -
  disas/nanomips.cpp                                 |   2 -
  target/arm/meson.build                             |  15 +-
 files changed, 718 insertions(+), 413 deletions(-)
  create mode 100644 target/arm/translate-a32.h
  create mode 100644 target/arm/translate-m-nocp.c
  rename target/arm/{translate-neon.c.inc => translate-neon.c} (99%)
  rename target/arm/{translate-vfp.c.inc => translate-vfp.c} (94%)

-[Qemu-devel] [PULL 01/16] fpu/softfloat: int_to_float ensure r fully initialised
+[PULL 01/26] docs: fix link in sbsa description
 From: Alex Bennée <alex.bennee@linaro.org>
-Reported by Coverity (CID1390635). We ensure this for uint_to_float
+A trailing _ makes all the difference to the rendered link.
 later on so we might as well mirror that.
 Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
-Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210428131316.31390-1-alex.bennee@linaro.org
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- fpu/softfloat.c | 2 +-
+ docs/system/arm/sbsa.rst | 2 +-
 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/fpu/softfloat.c b/fpu/softfloat.c
+diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst
 index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat.c
+--- a/docs/system/arm/sbsa.rst
-+++ b/fpu/softfloat.c
++++ b/docs/system/arm/sbsa.rst
-@@ -XXX,XX +XXX,XX @@ FLOAT_TO_UINT(64, 64)
+@@ -XXX,XX +XXX,XX @@ Arm Server Base System Architecture Reference board (``sbsa-ref``)
+ While the `virt` board is a generic board platform that doesn't match
- static FloatParts int_to_float(int64_t a, float_status *status)
+ any real hardware the `sbsa-ref` board intends to look like real
- {
+ hardware. The `Server Base System Architecture
--    FloatParts r;
+-<https://developer.arm.com/documentation/den0029/latest>` defines a
-+    FloatParts r = {};
++<https://developer.arm.com/documentation/den0029/latest>`_ defines a
-     if (a == 0) {
+ minimum base line of hardware support and importantly how the firmware
-         r.cls = float_class_zero;
+ reports that to any operating system. It is a static system that
-         r.sign = false;
+ reports a very minimal DT to the firmware for non-discoverable
 --
-.17.0
+.20.1

-[Qemu-devel] [PULL 07/16] target/arm: Implement FCVT (scalar, fixed-point) for fp16
+[PULL 02/26] linux-user/aarch64: Enable hwcap for RND, BTI, and MTE
 From: Richard Henderson <richard.henderson@linaro.org>
-Cc: qemu-stable@nongnu.org
+These three features are already enabled by TCG, but are missing
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+their hwcap bits.  Update HWCAP2 from linux v5.12.
 Cc: qemu-stable@nongnu.org (for 6.0.1)
 Buglink: https://bugs.launchpad.net/bugs/1926044
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
+Message-id: 20210427214108.88503-1-richard.henderson@linaro.org
 Message-id: 20180512003217.9105-5-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 17 +++++++++++++++--
+ linux-user/elfload.c | 13 +++++++++++++
-file changed, 15 insertions(+), 2 deletions(-)
+file changed, 13 insertions(+)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/linux-user/elfload.c b/linux-user/elfload.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/linux-user/elfload.c
-+++ b/target/arm/translate-a64.c
++++ b/linux-user/elfload.c
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_fixed_conv(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ enum {
-     bool sf = extract32(insn, 31, 1);
+     ARM_HWCAP2_A64_SVESM4       = 1 << 6,
-     bool itof;
+     ARM_HWCAP2_A64_FLAGM2       = 1 << 7,
+     ARM_HWCAP2_A64_FRINT        = 1 << 8,
--    if (sbit || (type > 1)
++    ARM_HWCAP2_A64_SVEI8MM      = 1 << 9,
--        || (!sf && scale < 32)) {
++    ARM_HWCAP2_A64_SVEF32MM     = 1 << 10,
-+    if (sbit || (!sf && scale < 32)) {
++    ARM_HWCAP2_A64_SVEF64MM     = 1 << 11,
-+        unallocated_encoding(s);
++    ARM_HWCAP2_A64_SVEBF16      = 1 << 12,
-+        return;
++    ARM_HWCAP2_A64_I8MM         = 1 << 13,
-+    }
++    ARM_HWCAP2_A64_BF16         = 1 << 14,
-+
++    ARM_HWCAP2_A64_DGH          = 1 << 15,
-+    switch (type) {
++    ARM_HWCAP2_A64_RNG          = 1 << 16,
-+    case 0: /* float32 */
++    ARM_HWCAP2_A64_BTI          = 1 << 17,
-+    case 1: /* float64 */
++    ARM_HWCAP2_A64_MTE          = 1 << 18,
-+        break;
+ };
-+    case 3: /* float16 */
-+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+ #define ELF_HWCAP   get_elf_hwcap()
-+            break;
+@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap2(void)
-+        }
+     GET_FEATURE_ID(aa64_dcpodp, ARM_HWCAP2_A64_DCPODP);
-+        /* fallthru */
+     GET_FEATURE_ID(aa64_condm_5, ARM_HWCAP2_A64_FLAGM2);
-+    default:
+     GET_FEATURE_ID(aa64_frint, ARM_HWCAP2_A64_FRINT);
-         unallocated_encoding(s);
++    GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
-         return;
++    GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
-     }
++    GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
      return hwcaps;
  }
 --
-.17.0
+.20.1
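
The relationship between implemented CPU features and the advertised HWCAP2 bits can be sketched in standalone C. This is a minimal illustration, not QEMU's code: the three bit values are copied from the enum in the patch above, while `struct cpu_features` and `build_hwcap2()` are hypothetical stand-ins for the ID-register tests and the `GET_FEATURE_ID()` macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the enum added by the patch above. */
enum {
    ARM_HWCAP2_A64_RNG = 1 << 16,
    ARM_HWCAP2_A64_BTI = 1 << 17,
    ARM_HWCAP2_A64_MTE = 1 << 18,
};

/* Hypothetical stand-in for the emulated CPU's ID-register checks. */
struct cpu_features {
    bool rndr;
    bool bti;
    bool mte;
};

/* Hypothetical helper: set one hwcap bit per implemented feature. */
static uint32_t build_hwcap2(const struct cpu_features *f)
{
    uint32_t hwcaps = 0;

    if (f->rndr) {
        hwcaps |= ARM_HWCAP2_A64_RNG;
    }
    if (f->bti) {
        hwcaps |= ARM_HWCAP2_A64_BTI;
    }
    if (f->mte) {
        hwcaps |= ARM_HWCAP2_A64_MTE;
    }
    return hwcaps;
}
```

A guest program would normally observe the resulting value through `getauxval(AT_HWCAP2)`; without the bits set, it has no portable way to discover that the features are usable.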

-[Qemu-devel] [PULL 14/16] target/arm: Fix sqrt_f16 exception raising
+[PULL 03/26] target/arm: Fix tlbbits calculation in tlbi_aa64_vae2is_write()
-From: Alex Bennée <alex.bennee@linaro.org>
+In tlbi_aa64_vae2is_write() the calculation
   bits = tlbbits_for_regime(env, secure ? ARMMMUIdx_E2 : ARMMMUIdx_SE2,
                             pageaddr)
-We are meant to explicitly pass fpst, not cpu_env.
+has the two arms of the ?: expression reversed. Fix the bug.
-Cc: qemu-stable@nongnu.org
+Fixes: b6ad6062f1e5
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
+Reported-by: Rebecca Cran <rebecca@nuviainc.com>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Tested-by: Alex Bennée <alex.bennee@linaro.org>
 Message-id: 20180512003217.9105-12-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
+Reviewed-by: Rebecca Cran <rebecca@nuviainc.com>
+Message-id: 20210420123106.10861-1-peter.maydell@linaro.org
 ---
- target/arm/translate-a64.c | 3 ++-
+ target/arm/helper.c | 2 +-
-file changed, 2 insertions(+), 1 deletion(-)
+file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/target/arm/helper.c b/target/arm/helper.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/helper.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/helper.c
-@@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
+@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
-         tcg_gen_xori_i32(tcg_res, tcg_op, 0x8000);
+     uint64_t pageaddr = sextract64(value << 12, 0, 56);
-         break;
+     bool secure = arm_is_secure_below_el3(env);
-     case 0x3: /* FSQRT */
+     int mask = secure ? ARMMMUIdxBit_SE2 : ARMMMUIdxBit_E2;
--        gen_helper_sqrt_f16(tcg_res, tcg_op, cpu_env);
+-    int bits = tlbbits_for_regime(env, secure ? ARMMMUIdx_E2 : ARMMMUIdx_SE2,
-+        fpst = get_fpstatus_ptr(true);
++    int bits = tlbbits_for_regime(env, secure ? ARMMMUIdx_SE2 : ARMMMUIdx_E2,
-+        gen_helper_sqrt_f16(tcg_res, tcg_op, fpst);
+                                   pageaddr);
-         break;
-     case 0x8: /* FRINTN */
+     tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits);
      case 0x9: /* FRINTP */
 --
-.17.0
+.20.1
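
The effect of the swapped ?: arms can be shown with a small standalone sketch. The enum and both helper functions are hypothetical simplifications introduced for illustration; only the shape of the conditional is taken from the patch above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the Secure and Non-secure EL2 regimes. */
typedef enum {
    MMU_IDX_E2,   /* Non-secure EL2 */
    MMU_IDX_SE2,  /* Secure EL2 */
} SketchMMUIdx;

/* Shape of the old code: a Secure caller gets the Non-secure regime. */
static SketchMMUIdx regime_buggy(bool secure)
{
    return secure ? MMU_IDX_E2 : MMU_IDX_SE2;
}

/* Shape of the fixed code: the regime follows the security state. */
static SketchMMUIdx regime_fixed(bool secure)
{
    return secure ? MMU_IDX_SE2 : MMU_IDX_E2;
}
```

Note that the `mask` computation in the same function already used the Secure-first order, so the fix also makes the two conditionals consistent with each other.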

-[Qemu-devel] [PULL 11/16] target/arm: Implement FCMP for fp16
+[PULL 04/26] target/arm: Move constant expanders to translate.h
-From: Alex Bennée <alex.bennee@linaro.org>
+Some of the constant expanders defined in translate.c are generically
 useful and will be used by the separate C files for VFP and Neon once
 they are created; move the expander definitions to translate.h.
-These were missed out from the rest of the half-precision work.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210430132740.10391-2-peter.maydell@linaro.org
 ---
  target/arm/translate.h | 24 ++++++++++++++++++++++++
  target/arm/translate.c | 24 ------------------------
 files changed, 24 insertions(+), 24 deletions(-)
-Cc: qemu-stable@nongnu.org
+diff --git a/target/arm/translate.h b/target/arm/translate.h
 Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
 Tested-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20180512003217.9105-9-richard.henderson@linaro.org
 [rth: Diagnose lack of FP16 before fp_access_check]
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/helper-a64.h    |  2 +
  target/arm/helper-a64.c    | 10 +++++
  target/arm/translate-a64.c | 88 ++++++++++++++++++++++++++++++--------
 files changed, 83 insertions(+), 17 deletions(-)
 diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper-a64.h
+--- a/target/arm/translate.h
-+++ b/target/arm/helper-a64.h
++++ b/target/arm/translate.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ extern TCGv_i32 cpu_NF, cpu_ZF, cpu_CF, cpu_VF;
- DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+ extern TCGv_i64 cpu_exclusive_addr;
- DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
+ extern TCGv_i64 cpu_exclusive_val;
- DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
-+DEF_HELPER_3(vfp_cmph_a64, i64, f16, f16, ptr)
++/*
-+DEF_HELPER_3(vfp_cmpeh_a64, i64, f16, f16, ptr)
++ * Constant expanders for the decoders.
- DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
++ */
- DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
++
- DEF_HELPER_3(vfp_cmpd_a64, i64, f64, f64, ptr)
++static inline int negate(DisasContext *s, int x)
 diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/helper-a64.c
 +++ b/target/arm/helper-a64.c
@@ -XXX,XX +XXX,XX @@ static inline uint32_t float_rel_to_flags(int res)
      return flags;
  }
 +uint64_t HELPER(vfp_cmph_a64)(float16 x, float16 y, void *fp_status)
 +{
-+    return float_rel_to_flags(float16_compare_quiet(x, y, fp_status));
++    return -x;
 +}
 +
-+uint64_t HELPER(vfp_cmpeh_a64)(float16 x, float16 y, void *fp_status)
++static inline int plus_2(DisasContext *s, int x)
 +{
-+    return float_rel_to_flags(float16_compare(x, y, fp_status));
++    return x + 2;
 +}
 +
- uint64_t HELPER(vfp_cmps_a64)(float32 x, float32 y, void *fp_status)
++static inline int times_2(DisasContext *s, int x)
 +{
 +    return x * 2;
 +}
 +
 +static inline int times_4(DisasContext *s, int x)
 +{
 +    return x * 4;
 +}
 +
  static inline int arm_dc_feature(DisasContext *dc, int feature)
  {
-     return float_rel_to_flags(float32_compare_quiet(x, y, fp_status));
+     return (dc->features & (1ULL << feature)) != 0;
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/translate.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void arm_gen_condlabel(DisasContext *s)
      }
  }
--static void handle_fp_compare(DisasContext *s, bool is_double,
+-/*
-+static void handle_fp_compare(DisasContext *s, int size,
+- * Constant expanders for the decoders.
-                               unsigned int rn, unsigned int rm,
+- */
-                               bool cmp_with_zero, bool signal_all_nans)
+-
- {
+-static int negate(DisasContext *s, int x)
-     TCGv_i64 tcg_flags = tcg_temp_new_i64();
+-{
--    TCGv_ptr fpst = get_fpstatus_ptr(false);
+-    return -x;
-+    TCGv_ptr fpst = get_fpstatus_ptr(size == MO_16);
+-}
+-
--    if (is_double) {
+-static int plus_2(DisasContext *s, int x)
-+    if (size == MO_64) {
+-{
-         TCGv_i64 tcg_vn, tcg_vm;
+-    return x + 2;
+-}
-         tcg_vn = read_fp_dreg(s, rn);
+-
-@@ -XXX,XX +XXX,XX @@ static void handle_fp_compare(DisasContext *s, bool is_double,
+-static int times_2(DisasContext *s, int x)
-         tcg_temp_free_i64(tcg_vn);
+-{
-         tcg_temp_free_i64(tcg_vm);
+-    return x * 2;
-     } else {
+-}
--        TCGv_i32 tcg_vn, tcg_vm;
+-
-+        TCGv_i32 tcg_vn = tcg_temp_new_i32();
+-static int times_4(DisasContext *s, int x)
-+        TCGv_i32 tcg_vm = tcg_temp_new_i32();
+-{
+-    return x * 4;
--        tcg_vn = read_fp_sreg(s, rn);
+-}
-+        read_vec_element_i32(s, tcg_vn, rn, 0, size);
+-
-         if (cmp_with_zero) {
+ /* Flags for the disas_set_da_iss info argument:
--            tcg_vm = tcg_const_i32(0);
+  * lower bits hold the Rt register number, higher bits are flags.
-+            tcg_gen_movi_i32(tcg_vm, 0);
+  */
          } else {
 -            tcg_vm = read_fp_sreg(s, rm);
 +            read_vec_element_i32(s, tcg_vm, rm, 0, size);
          }
 -        if (signal_all_nans) {
 -            gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 -        } else {
 -            gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 +
 +        switch (size) {
 +        case MO_32:
 +            if (signal_all_nans) {
 +                gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 +            } else {
 +                gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 +            }
 +            break;
 +        case MO_16:
 +            if (signal_all_nans) {
 +                gen_helper_vfp_cmpeh_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 +            } else {
 +                gen_helper_vfp_cmph_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
 +            }
 +            break;
 +        default:
 +            g_assert_not_reached();
          }
 +
          tcg_temp_free_i32(tcg_vn);
          tcg_temp_free_i32(tcg_vm);
      }
@@ -XXX,XX +XXX,XX @@ static void handle_fp_compare(DisasContext *s, bool is_double,
  static void disas_fp_compare(DisasContext *s, uint32_t insn)
  {
      unsigned int mos, type, rm, op, rn, opc, op2r;
 +    int size;
      mos = extract32(insn, 29, 3);
 -    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
 +    type = extract32(insn, 22, 2);
      rm = extract32(insn, 16, 5);
      op = extract32(insn, 14, 2);
      rn = extract32(insn, 5, 5);
      opc = extract32(insn, 3, 2);
      op2r = extract32(insn, 0, 3);
 -    if (mos || op || op2r || type > 1) {
 +    if (mos || op || op2r) {
 +        unallocated_encoding(s);
 +        return;
 +    }
 +
 +    switch (type) {
 +    case 0:
 +        size = MO_32;
 +        break;
 +    case 1:
 +        size = MO_64;
 +        break;
 +    case 3:
 +        size = MO_16;
 +        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
 +            break;
 +        }
 +        /* fallthru */
 +    default:
          unallocated_encoding(s);
          return;
      }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_compare(DisasContext *s, uint32_t insn)
          return;
      }
 -    handle_fp_compare(s, type, rn, rm, opc & 1, opc & 2);
 +    handle_fp_compare(s, size, rn, rm, opc & 1, opc & 2);
  }
  /* Floating point conditional compare
@@ -XXX,XX +XXX,XX @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
      unsigned int mos, type, rm, cond, rn, op, nzcv;
      TCGv_i64 tcg_flags;
      TCGLabel *label_continue = NULL;
 +    int size;
      mos = extract32(insn, 29, 3);
 -    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
 +    type = extract32(insn, 22, 2);
      rm = extract32(insn, 16, 5);
      cond = extract32(insn, 12, 4);
      rn = extract32(insn, 5, 5);
      op = extract32(insn, 4, 1);
      nzcv = extract32(insn, 0, 4);
 -    if (mos || type > 1) {
 +    if (mos) {
 +        unallocated_encoding(s);
 +        return;
 +    }
 +
 +    switch (type) {
 +    case 0:
 +        size = MO_32;
 +        break;
 +    case 1:
 +        size = MO_64;
 +        break;
 +    case 3:
 +        size = MO_16;
 +        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
 +            break;
 +        }
 +        /* fallthru */
 +    default:
          unallocated_encoding(s);
          return;
      }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
          gen_set_label(label_match);
      }
 -    handle_fp_compare(s, type, rn, rm, false, op);
 +    handle_fp_compare(s, size, rn, rm, false, op);
      if (cond < 0x0e) {
          gen_set_label(label_continue);
 --
-.17.0
+.20.1
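
As a rough illustration of how a constant expander is used, the sketch below applies one to a raw instruction field. The two expanders mirror those moved to translate.h in patch 04; `decode_scaled_imm8()` and the field layout are hypothetical, and `DisasContext` is left opaque because the expanders ignore it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in; the expanders below never dereference it. */
typedef struct DisasContext DisasContext;

/* Same shape as the expanders in the patch: transform a decoded field. */
static inline int times_4(DisasContext *s, int x)
{
    (void)s;
    return x * 4;
}

static inline int negate(DisasContext *s, int x)
{
    (void)s;
    return -x;
}

/*
 * Hypothetical decoder fragment: take an 8-bit immediate from bits [7:0]
 * of the instruction and scale it to a byte offset.
 */
static int decode_scaled_imm8(DisasContext *s, uint32_t insn)
{
    int imm8 = (int)(insn & 0xff);

    return times_4(s, imm8);
}
```

Keeping the expanders as `static inline` functions in a shared header lets each new compilation unit use them without adding exported symbols.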

-[Qemu-devel] [PULL 04/16] target/arm: Implement FMOV (general) for fp16
+[PULL 05/26] target/arm: Share unallocated_encoding() and gen_exception_insn()
-From: Richard Henderson <richard.henderson@linaro.org>
+The unallocated_encoding() function is the same in both
 translate-a64.c and translate.c; make the translate.c function global
 and drop the translate-a64.c version.  To do this we need to also
 share gen_exception_insn(), which currently exists in two slightly
 different versions for A32 and A64: merge those into a single
 function that can work for both.
-Adding the fp16 moves to/from general registers.
+This will be useful for splitting up translate.c, which will require
 unallocated_encoding() to no longer be file-local.  It's also
 hopefully less confusing to have only one version of the function
 rather than two.
-Cc: qemu-stable@nongnu.org
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Message-id: 20180512003217.9105-2-richard.henderson@linaro.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210430132740.10391-3-peter.maydell@linaro.org
 ---
- target/arm/translate-a64.c | 21 +++++++++++++++++++++
+ target/arm/translate-a64.h |  2 --
-file changed, 21 insertions(+)
+ target/arm/translate.h     |  3 +++
  target/arm/translate-a64.c | 15 ---------------
  target/arm/translate.c     | 14 +++++++++-----
 files changed, 12 insertions(+), 22 deletions(-)
+diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a64.h
++++ b/target/arm/translate-a64.h
+@@ -XXX,XX +XXX,XX @@
+ #ifndef TARGET_ARM_TRANSLATE_A64_H
+ #define TARGET_ARM_TRANSLATE_A64_H
+-void unallocated_encoding(DisasContext *s);
+-
+ #define unsupported_encoding(s, insn)                                    \
+     do {                                                                 \
+         qemu_log_mask(LOG_UNIMP,                                         \
+diff --git a/target/arm/translate.h b/target/arm/translate.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.h
++++ b/target/arm/translate.h
+@@ -XXX,XX +XXX,XX @@ void arm_free_cc(DisasCompare *cmp);
+ void arm_jump_cc(DisasCompare *cmp, TCGLabel *label);
+ void arm_gen_test_cc(int cc, TCGLabel *label);
+ MemOp pow2_align(unsigned i);
++void unallocated_encoding(DisasContext *s);
++void gen_exception_insn(DisasContext *s, uint64_t pc, int excp,
++                        uint32_t syn, uint32_t target_el);
+ /* Return state of Alternate Half-precision flag, caller frees result */
+ static inline TCGv_i32 get_ahp_flag(void)
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
-@@ -XXX,XX +XXX,XX @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
+@@ -XXX,XX +XXX,XX @@ static void gen_exception_internal_insn(DisasContext *s, uint64_t pc, int excp)
-             tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_hi_offset(s, rd));
+     s->base.is_jmp = DISAS_NORETURN;
-             clear_vec_high(s, true, rd);
+ }
-             break;
-+        case 3:
+-static void gen_exception_insn(DisasContext *s, uint64_t pc, int excp,
-+            /* 16 bit */
+-                               uint32_t syndrome, uint32_t target_el)
-+            tmp = tcg_temp_new_i64();
+-{
-+            tcg_gen_ext16u_i64(tmp, tcg_rn);
+-    gen_a64_set_pc_im(pc);
-+            write_fp_dreg(s, rd, tmp);
+-    gen_exception(excp, syndrome, target_el);
-+            tcg_temp_free_i64(tmp);
+-    s->base.is_jmp = DISAS_NORETURN;
-+            break;
+-}
-+        default:
+-
-+            g_assert_not_reached();
+ static void gen_exception_bkpt_insn(DisasContext *s, uint32_t syndrome)
-         }
+ {
-     } else {
+     TCGv_i32 tcg_syn;
-         TCGv_i64 tcg_rd = cpu_reg(s, rd);
+@@ -XXX,XX +XXX,XX @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
@@ -XXX,XX +XXX,XX @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
              /* 64 bits from top half */
              tcg_gen_ld_i64(tcg_rd, cpu_env, fp_reg_hi_offset(s, rn));
              break;
 +        case 3:
 +            /* 16 bit */
 +            tcg_gen_ld16u_i64(tcg_rd, cpu_env, fp_reg_offset(s, rn, MO_16));
 +            break;
 +        default:
 +            g_assert_not_reached();
          }
      }
  }
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
-         case 0xa: /* 64 bit */
+-void unallocated_encoding(DisasContext *s)
-         case 0xd: /* 64 bit to top half of quad */
+-{
-             break;
+-    /* Unallocated and reserved encodings are uncategorized */
-+        case 0x6: /* 16-bit float, 32-bit int */
+-    gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
-+        case 0xe: /* 16-bit float, 64-bit int */
+-                       default_exception_el(s));
-+            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+-}
-+                break;
+-
-+            }
+ static void init_tmp_a64_array(DisasContext *s)
-+            /* fallthru */
+ {
-         default:
+ #ifdef CONFIG_DEBUG_TCG
-             /* all other sf/type/rmode combinations are invalid */
+diff --git a/target/arm/translate.c b/target/arm/translate.c
-             unallocated_encoding(s);
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_exception_internal_insn(DisasContext *s, uint32_t pc, int excp)
      s->base.is_jmp = DISAS_NORETURN;
  }
 -static void gen_exception_insn(DisasContext *s, uint32_t pc, int excp,
 -                               int syn, uint32_t target_el)
 +void gen_exception_insn(DisasContext *s, uint64_t pc, int excp,
 +                        uint32_t syn, uint32_t target_el)
  {
 -    gen_set_condexec(s);
 -    gen_set_pc_im(s, pc);
 +    if (s->aarch64) {
 +        gen_a64_set_pc_im(pc);
 +    } else {
 +        gen_set_condexec(s);
 +        gen_set_pc_im(s, pc);
 +    }
      gen_exception(excp, syn, target_el);
      s->base.is_jmp = DISAS_NORETURN;
  }
@@ -XXX,XX +XXX,XX @@ static void gen_exception_bkpt_insn(DisasContext *s, uint32_t syn)
      s->base.is_jmp = DISAS_NORETURN;
  }
 -static void unallocated_encoding(DisasContext *s)
 +void unallocated_encoding(DisasContext *s)
  {
      /* Unallocated and reserved encodings are uncategorized */
      gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
 --
-.17.0
+.20.1
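
The merged gen_exception_insn() chooses between the A64 and A32 paths at run time. The sketch below models only that control flow: `SketchContext` and its recording fields are hypothetical, and the assignments stand in for the TCG code the real function emits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified translation context. */
typedef struct {
    bool aarch64;         /* true when translating A64 code */
    uint64_t pc_written;  /* PC value the sketch "emitted" */
    bool condexec_saved;  /* whether condexec state was saved */
    bool noreturn;        /* stands in for is_jmp = DISAS_NORETURN */
} SketchContext;

/* One function for both instruction sets, branching on the mode flag. */
static void sketch_exception_insn(SketchContext *s, uint64_t pc)
{
    if (s->aarch64) {
        /* A64 path: write the full 64-bit PC, no condexec state. */
        s->pc_written = pc;
    } else {
        /* A32 path: save condexec state, then write a 32-bit PC. */
        s->condexec_saved = true;
        s->pc_written = (uint32_t)pc;
    }
    /* Common tail: the exception ends the translation block. */
    s->noreturn = true;
}
```

Widening the `pc` parameter to 64 bits is what allows a single signature to serve both callers.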

-New patch
+[PULL 06/26] target/arm: Make functions used by m-nocp global
+We want to split out the .c.inc files which are currently included
 into translate.c so they are separate compilation units.  To do this
 we need to make some functions which are currently file-local to
 translate.c have global scope; create a translate-a32.h paralleling
 the existing translate-a64.h as a place for these declarations to
 live, so that code moved into the new compilation units can call
 them.
 The functions made global here are those required by the
 m-nocp.decode functions, except that I have converted the whole
 family of {read,write}_neon_element* and also both the load_cpu and
 store_cpu functions for consistency, even though m-nocp only wants a
 few functions from each.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210430132740.10391-4-peter.maydell@linaro.org
 ---
  target/arm/translate-a32.h     | 57 ++++++++++++++++++++++++++++++++++
  target/arm/translate.c         | 39 +++++------------------
  target/arm/translate-vfp.c.inc |  2 +-
  3 files changed, 65 insertions(+), 33 deletions(-)
  create mode 100644 target/arm/translate-a32.h
 diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + *  AArch32 translation, common definitions.
 + *
 + * Copyright (c) 2021 Linaro, Ltd.
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2.1 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +#ifndef TARGET_ARM_TRANSLATE_A64_H
 +#define TARGET_ARM_TRANSLATE_A64_H
 +
 +void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
 +void arm_gen_condlabel(DisasContext *s);
 +bool vfp_access_check(DisasContext *s);
 +void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop);
 +void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop);
 +void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop);
 +void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop);
 +
 +static inline TCGv_i32 load_cpu_offset(int offset)
 +{
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +    tcg_gen_ld_i32(tmp, cpu_env, offset);
 +    return tmp;
 +}
 +
 +#define load_cpu_field(name) load_cpu_offset(offsetof(CPUARMState, name))
 +
 +static inline void store_cpu_offset(TCGv_i32 var, int offset)
 +{
 +    tcg_gen_st_i32(var, cpu_env, offset);
 +    tcg_temp_free_i32(var);
 +}
 +
 +#define store_cpu_field(var, name) \
 +    store_cpu_offset(var, offsetof(CPUARMState, name))
 +
 +/* Create a new temporary and set it to the value of a CPU register.  */
 +static inline TCGv_i32 load_reg(DisasContext *s, int reg)
 +{
 +    TCGv_i32 tmp = tcg_temp_new_i32();
 +    load_reg_var(s, tmp, reg);
 +    return tmp;
 +}
 +
 +#endif
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@
  #define ENABLE_ARCH_8     arm_dc_feature(s, ARM_FEATURE_V8)
  #include "translate.h"
 +#include "translate-a32.h"
  #if defined(CONFIG_USER_ONLY)
  #define IS_USER(s) 1
@@ -XXX,XX +XXX,XX @@ void arm_translate_init(void)
  }
  /* Generate a label used for skipping this instruction */
 -static void arm_gen_condlabel(DisasContext *s)
 +void arm_gen_condlabel(DisasContext *s)
  {
      if (!s->condjmp) {
          s->condlabel = gen_new_label();
@@ -XXX,XX +XXX,XX @@ static inline int get_a32_user_mem_index(DisasContext *s)
      }
  }
 -static inline TCGv_i32 load_cpu_offset(int offset)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    tcg_gen_ld_i32(tmp, cpu_env, offset);
 -    return tmp;
 -}
 -
 -#define load_cpu_field(name) load_cpu_offset(offsetof(CPUARMState, name))
 -
 -static inline void store_cpu_offset(TCGv_i32 var, int offset)
 -{
 -    tcg_gen_st_i32(var, cpu_env, offset);
 -    tcg_temp_free_i32(var);
 -}
 -
 -#define store_cpu_field(var, name) \
 -    store_cpu_offset(var, offsetof(CPUARMState, name))
 -
  /* The architectural value of PC.  */
  static uint32_t read_pc(DisasContext *s)
  {
@@ -XXX,XX +XXX,XX @@ static uint32_t read_pc(DisasContext *s)
  }
  /* Set a variable to the value of a CPU register.  */
 -static void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
 +void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
  {
      if (reg == 15) {
          tcg_gen_movi_i32(var, read_pc(s));
@@ -XXX,XX +XXX,XX @@ static void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
      }
  }
 -/* Create a new temporary and set it to the value of a CPU register.  */
 -static inline TCGv_i32 load_reg(DisasContext *s, int reg)
 -{
 -    TCGv_i32 tmp = tcg_temp_new_i32();
 -    load_reg_var(s, tmp, reg);
 -    return tmp;
 -}
 -
  /*
   * Create a new temp, REG + OFS, except PC is ALIGN(PC, 4).
   * This is used for load/store for which use of PC implies (literal),
@@ -XXX,XX +XXX,XX @@ static inline void vfp_store_reg32(TCGv_i32 var, int reg)
      tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
  }
 -static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
 +void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
  {
      long off = neon_element_offset(reg, ele, memop);
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
      }
  }
 -static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
 +void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
  {
      long off = neon_element_offset(reg, ele, memop);
@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
      }
  }
 -static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 +void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
  {
      long off = neon_element_offset(reg, ele, memop);
@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
      }
  }
 -static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
 +void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
  {
      long off = neon_element_offset(reg, ele, memop);
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
   * The most usual kind of VFP access check, for everything except
   * FMXR/FMRX to the always-available special registers.
   */
 -static bool vfp_access_check(DisasContext *s)
 +bool vfp_access_check(DisasContext *s)
  {
      return full_vfp_access_check(s, false);
  }
 --
 .20.1

-[Qemu-devel] [PULL 13/16] target/arm: Implement FMOV (immediate) for fp16
+[PULL 07/26] target/arm: Split m-nocp trans functions into their own file
-From: Alex Bennée <alex.bennee@linaro.org>
+Currently the trans functions for m-nocp.decode all live in
 translate-vfp.c.inc; move them out into their own translation unit,
 translate-m-nocp.c.
-All the hard work is already done by vfp_expand_imm, we just need to
+The trans_* functions here are pure code motion with no changes.
 make sure we pick up the correct size.
-Cc: qemu-stable@nongnu.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180512003217.9105-11-richard.henderson@linaro.org
-[rth: Merge unallocated_encoding check with TCGMemOp conversion.]
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210430132740.10391-5-peter.maydell@linaro.org
 ---
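Note (illustration only, not part of the patch): the tail of trans_VSCCLRM()
in the diff below zeroes an inclusive range of single-precision registers,
using a 32-bit write for an odd first register or an even last register and
64-bit writes for the aligned pairs in between. The sketch below models that
loop on a plain array, assuming btmreg <= topreg on entry. MockVfpRegs and the
mock_* helpers are hypothetical stand-ins for write_neon_element64(), and the
write counters exist only so the behaviour can be checked.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_NUM_SREGS 64

/* Hypothetical flat model: sregs[2n] and sregs[2n + 1] together form Dreg n. */
typedef struct MockVfpRegs {
    uint32_t sregs[MOCK_NUM_SREGS];
    int writes32;   /* number of single-register writes issued */
    int writes64;   /* number of register-pair writes issued */
} MockVfpRegs;

/* Stand-in for a 32-bit element write. */
static void mock_zero_sreg(MockVfpRegs *r, int sreg)
{
    r->sregs[sreg] = 0;
    r->writes32++;
}

/* Stand-in for a 64-bit element write covering both halves of a Dreg. */
static void mock_zero_dreg(MockVfpRegs *r, int dreg)
{
    r->sregs[2 * dreg] = 0;
    r->sregs[2 * dreg + 1] = 0;
    r->writes64++;
}

/* Zero the Sregs from btmreg to topreg inclusive (btmreg <= topreg). */
static void mock_zero_range(MockVfpRegs *r, int btmreg, int topreg)
{
    if (btmreg & 1) {
        /* Odd start: clear the upper half of that Dreg on its own. */
        mock_zero_sreg(r, btmreg);
        btmreg++;
    }
    for (; btmreg + 1 <= topreg; btmreg += 2) {
        /* Aligned pair fully inside the range: one 64-bit write. */
        mock_zero_dreg(r, btmreg >> 1);
    }
    if (btmreg == topreg) {
        /* Even end: clear the lower half of the last Dreg on its own. */
        mock_zero_sreg(r, btmreg);
        btmreg++;
    }
    assert(btmreg == topreg + 1);
}
```

For the range S3..S8 this issues two 32-bit writes (S3 and S8) and two 64-bit
writes (D2 and D3), rather than six separate 32-bit writes.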
- target/arm/translate-a64.c | 20 +++++++++++++++++---
+ target/arm/translate-a32.h     |   3 +
- 1 file changed, 17 insertions(+), 3 deletions(-)
+ target/arm/translate-m-nocp.c  | 221 +++++++++++++++++++++++++++++++++
  target/arm/translate.c         |   1 -
  target/arm/translate-vfp.c.inc | 196 -----------------------------
  target/arm/meson.build         |   3 +-
  5 files changed, 226 insertions(+), 198 deletions(-)
  create mode 100644 target/arm/translate-m-nocp.c
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/translate-a32.h
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/translate-a32.h
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_imm(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@
  #ifndef TARGET_ARM_TRANSLATE_A64_H
  #define TARGET_ARM_TRANSLATE_A64_H
 +/* Prototypes for autogenerated disassembler functions */
 +bool disas_m_nocp(DisasContext *dc, uint32_t insn);
 +
  void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
  void arm_gen_condlabel(DisasContext *s);
  bool vfp_access_check(DisasContext *s);
 diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/target/arm/translate-m-nocp.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + *  ARM translation: M-profile NOCP special-case instructions
 + *
 + *  Copyright (c) 2020 Linaro, Ltd.
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2.1 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "tcg/tcg-op.h"
 +#include "translate.h"
 +#include "translate-a32.h"
 +
 +#include "decode-m-nocp.c.inc"
 +
 +/*
 + * Decode VLLDM and VLSTM are nonstandard because:
 + *  * if there is no FPU then these insns must NOP in
 + *    Secure state and UNDEF in Nonsecure state
 + *  * if there is an FPU then these insns do not have
 + *    the usual behaviour that vfp_access_check() provides of
 + *    being controlled by CPACR/NSACR enable bits or the
 + *    lazy-stacking logic.
 + */
 +static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
 +{
 +    TCGv_i32 fptr;
 +
 +    if (!arm_dc_feature(s, ARM_FEATURE_M) ||
 +        !arm_dc_feature(s, ARM_FEATURE_V8)) {
 +        return false;
 +    }
 +
 +    if (a->op) {
 +        /*
 +         * T2 encoding ({D0-D31} reglist): v8.1M and up. We choose not
 +         * to take the IMPDEF option to make memory accesses to the stack
 +         * slots that correspond to the D16-D31 registers (discarding
 +         * read data and writing UNKNOWN values), so for us the T2
 +         * encoding behaves identically to the T1 encoding.
 +         */
 +        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +            return false;
 +        }
 +    } else {
 +        /*
 +         * T1 encoding ({D0-D15} reglist); undef if we have 32 Dregs.
 +         * This is currently architecturally impossible, but we add the
 +         * check to stay in line with the pseudocode. Note that we must
 +         * emit code for the UNDEF so it takes precedence over the NOCP.
 +         */
 +        if (dc_isar_feature(aa32_simd_r32, s)) {
 +            unallocated_encoding(s);
 +            return true;
 +        }
 +    }
 +
 +    /*
 +     * If not secure, UNDEF. We must emit code for this
 +     * rather than returning false so that this takes
 +     * precedence over the m-nocp.decode NOCP fallback.
 +     */
 +    if (!s->v8m_secure) {
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +    /* If no fpu, NOP. */
 +    if (!dc_isar_feature(aa32_vfp, s)) {
 +        return true;
 +    }
 +
 +    fptr = load_reg(s, a->rn);
 +    if (a->l) {
 +        gen_helper_v7m_vlldm(cpu_env, fptr);
 +    } else {
 +        gen_helper_v7m_vlstm(cpu_env, fptr);
 +    }
 +    tcg_temp_free_i32(fptr);
 +
 +    /* End the TB, because we have updated FP control bits */
 +    s->base.is_jmp = DISAS_UPDATE_EXIT;
 +    return true;
 +}
 +
 +static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
 +{
 +    int btmreg, topreg;
 +    TCGv_i64 zero;
 +    TCGv_i32 aspen, sfpa;
 +
 +    if (!dc_isar_feature(aa32_m_sec_state, s)) {
 +        /* Before v8.1M, fall through in decode to NOCP check */
 +        return false;
 +    }
 +
 +    /* Explicitly UNDEF because this takes precedence over NOCP */
 +    if (!arm_dc_feature(s, ARM_FEATURE_M_MAIN) || !s->v8m_secure) {
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
 +    if (!dc_isar_feature(aa32_vfp_simd, s)) {
 +        /* NOP if we have neither FP nor MVE */
 +        return true;
 +    }
 +
 +    /*
 +     * If FPCCR.ASPEN != 0 && CONTROL_S.SFPA == 0 then there is no
 +     * active floating point context so we must NOP (without doing
 +     * any lazy state preservation or the NOCP check).
 +     */
 +    aspen = load_cpu_field(v7m.fpccr[M_REG_S]);
 +    sfpa = load_cpu_field(v7m.control[M_REG_S]);
 +    tcg_gen_andi_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
 +    tcg_gen_xori_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
 +    tcg_gen_andi_i32(sfpa, sfpa, R_V7M_CONTROL_SFPA_MASK);
 +    tcg_gen_or_i32(sfpa, sfpa, aspen);
 +    arm_gen_condlabel(s);
 +    tcg_gen_brcondi_i32(TCG_COND_EQ, sfpa, 0, s->condlabel);
 +
 +    if (s->fp_excp_el != 0) {
 +        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 +                           syn_uncategorized(), s->fp_excp_el);
 +        return true;
 +    }
 +
 +    topreg = a->vd + a->imm - 1;
 +    btmreg = a->vd;
 +
 +    /* Convert to Sreg numbers if the insn specified in Dregs */
 +    if (a->size == 3) {
 +        topreg = topreg * 2 + 1;
 +        btmreg *= 2;
 +    }
 +
 +    if (topreg > 63 || (topreg > 31 && !(topreg & 1))) {
 +        /* UNPREDICTABLE: we choose to undef */
 +        unallocated_encoding(s);
 +        return true;
 +    }
 +
 +    /* Silently ignore requests to clear D16-D31 if they don't exist */
 +    if (topreg > 31 && !dc_isar_feature(aa32_simd_r32, s)) {
 +        topreg = 31;
 +    }
 +
 +    if (!vfp_access_check(s)) {
 +        return true;
 +    }
 +
 +    /* Zero the Sregs from btmreg to topreg inclusive. */
 +    zero = tcg_const_i64(0);
 +    if (btmreg & 1) {
 +        write_neon_element64(zero, btmreg >> 1, 1, MO_32);
 +        btmreg++;
 +    }
 +    for (; btmreg + 1 <= topreg; btmreg += 2) {
 +        write_neon_element64(zero, btmreg >> 1, 0, MO_64);
 +    }
 +    if (btmreg == topreg) {
 +        write_neon_element64(zero, btmreg >> 1, 0, MO_32);
 +        btmreg++;
 +    }
 +    assert(btmreg == topreg + 1);
 +    /* TODO: when MVE is implemented, zero VPR here */
 +    return true;
 +}
 +
 +static bool trans_NOCP(DisasContext *s, arg_nocp *a)
 +{
 +    /*
 +     * Handle M-profile early check for disabled coprocessor:
 +     * all we need to do here is emit the NOCP exception if
 +     * the coprocessor is disabled. Otherwise we return false
 +     * and the real VFP/etc decode will handle the insn.
 +     */
 +    assert(arm_dc_feature(s, ARM_FEATURE_M));
 +
 +    if (a->cp == 11) {
 +        a->cp = 10;
 +    }
 +    if (arm_dc_feature(s, ARM_FEATURE_V8_1M) &&
 +        (a->cp == 8 || a->cp == 9 || a->cp == 14 || a->cp == 15)) {
 +        /* in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
 +        a->cp = 10;
 +    }
 +
 +    if (a->cp != 10) {
 +        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 +                           syn_uncategorized(), default_exception_el(s));
 +        return true;
 +    }
 +
 +    if (s->fp_excp_el != 0) {
 +        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 +                           syn_uncategorized(), s->fp_excp_el);
 +        return true;
 +    }
 +
 +    return false;
 +}
 +
 +static bool trans_NOCP_8_1(DisasContext *s, arg_nocp *a)
 +{
 +    /* This range needs a coprocessor check for v8.1M and later only */
 +    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 +        return false;
 +    }
 +    return trans_NOCP(s, a);
 +}
 diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
  #define ARM_CP_RW_BIT   (1 << 20)
  /* Include the VFP and Neon decoders */
 -#include "decode-m-nocp.c.inc"
  #include "translate-vfp.c.inc"
  #include "translate-neon.c.inc"
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-vfp.c.inc
 +++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
      return true;
  }
 -/*
 - * Decode VLLDM and VLSTM are nonstandard because:
 - *  * if there is no FPU then these insns must NOP in
 - *    Secure state and UNDEF in Nonsecure state
 - *  * if there is an FPU then these insns do not have
 - *    the usual behaviour that vfp_access_check() provides of
 - *    being controlled by CPACR/NSACR enable bits or the
 - *    lazy-stacking logic.
 - */
 -static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
 -{
 -    TCGv_i32 fptr;
 -
 -    if (!arm_dc_feature(s, ARM_FEATURE_M) ||
 -        !arm_dc_feature(s, ARM_FEATURE_V8)) {
 -        return false;
 -    }
 -
 -    if (a->op) {
 -        /*
 -         * T2 encoding ({D0-D31} reglist): v8.1M and up. We choose not
 -         * to take the IMPDEF option to make memory accesses to the stack
 -         * slots that correspond to the D16-D31 registers (discarding
 -         * read data and writing UNKNOWN values), so for us the T2
 -         * encoding behaves identically to the T1 encoding.
 -         */
 -        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 -            return false;
 -        }
 -    } else {
 -        /*
 -         * T1 encoding ({D0-D15} reglist); undef if we have 32 Dregs.
 -         * This is currently architecturally impossible, but we add the
 -         * check to stay in line with the pseudocode. Note that we must
 -         * emit code for the UNDEF so it takes precedence over the NOCP.
 -         */
 -        if (dc_isar_feature(aa32_simd_r32, s)) {
 -            unallocated_encoding(s);
 -            return true;
 -        }
 -    }
 -
 -    /*
 -     * If not secure, UNDEF. We must emit code for this
 -     * rather than returning false so that this takes
 -     * precedence over the m-nocp.decode NOCP fallback.
 -     */
 -    if (!s->v8m_secure) {
 -        unallocated_encoding(s);
 -        return true;
 -    }
 -    /* If no fpu, NOP. */
 -    if (!dc_isar_feature(aa32_vfp, s)) {
 -        return true;
 -    }
 -
 -    fptr = load_reg(s, a->rn);
 -    if (a->l) {
 -        gen_helper_v7m_vlldm(cpu_env, fptr);
 -    } else {
 -        gen_helper_v7m_vlstm(cpu_env, fptr);
 -    }
 -    tcg_temp_free_i32(fptr);
 -
 -    /* End the TB, because we have updated FP control bits */
 -    s->base.is_jmp = DISAS_UPDATE_EXIT;
 -    return true;
 -}
 -
 -static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
 -{
 -    int btmreg, topreg;
 -    TCGv_i64 zero;
 -    TCGv_i32 aspen, sfpa;
 -
 -    if (!dc_isar_feature(aa32_m_sec_state, s)) {
 -        /* Before v8.1M, fall through in decode to NOCP check */
 -        return false;
 -    }
 -
 -    /* Explicitly UNDEF because this takes precedence over NOCP */
 -    if (!arm_dc_feature(s, ARM_FEATURE_M_MAIN) || !s->v8m_secure) {
 -        unallocated_encoding(s);
 -        return true;
 -    }
 -
 -    if (!dc_isar_feature(aa32_vfp_simd, s)) {
 -        /* NOP if we have neither FP nor MVE */
 -        return true;
 -    }
 -
 -    /*
 -     * If FPCCR.ASPEN != 0 && CONTROL_S.SFPA == 0 then there is no
 -     * active floating point context so we must NOP (without doing
 -     * any lazy state preservation or the NOCP check).
 -     */
 -    aspen = load_cpu_field(v7m.fpccr[M_REG_S]);
 -    sfpa = load_cpu_field(v7m.control[M_REG_S]);
 -    tcg_gen_andi_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
 -    tcg_gen_xori_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
 -    tcg_gen_andi_i32(sfpa, sfpa, R_V7M_CONTROL_SFPA_MASK);
 -    tcg_gen_or_i32(sfpa, sfpa, aspen);
 -    arm_gen_condlabel(s);
 -    tcg_gen_brcondi_i32(TCG_COND_EQ, sfpa, 0, s->condlabel);
 -
 -    if (s->fp_excp_el != 0) {
 -        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 -                           syn_uncategorized(), s->fp_excp_el);
 -        return true;
 -    }
 -
 -    topreg = a->vd + a->imm - 1;
 -    btmreg = a->vd;
 -
 -    /* Convert to Sreg numbers if the insn specified in Dregs */
 -    if (a->size == 3) {
 -        topreg = topreg * 2 + 1;
 -        btmreg *= 2;
 -    }
 -
 -    if (topreg > 63 || (topreg > 31 && !(topreg & 1))) {
 -        /* UNPREDICTABLE: we choose to undef */
 -        unallocated_encoding(s);
 -        return true;
 -    }
 -
 -    /* Silently ignore requests to clear D16-D31 if they don't exist */
 -    if (topreg > 31 && !dc_isar_feature(aa32_simd_r32, s)) {
 -        topreg = 31;
 -    }
 -
 -    if (!vfp_access_check(s)) {
 -        return true;
 -    }
 -
 -    /* Zero the Sregs from btmreg to topreg inclusive. */
 -    zero = tcg_const_i64(0);
 -    if (btmreg & 1) {
 -        write_neon_element64(zero, btmreg >> 1, 1, MO_32);
 -        btmreg++;
 -    }
 -    for (; btmreg + 1 <= topreg; btmreg += 2) {
 -        write_neon_element64(zero, btmreg >> 1, 0, MO_64);
 -    }
 -    if (btmreg == topreg) {
 -        write_neon_element64(zero, btmreg >> 1, 0, MO_32);
 -        btmreg++;
 -    }
 -    assert(btmreg == topreg + 1);
 -    /* TODO: when MVE is implemented, zero VPR here */
 -    return true;
 -}
 -
 -static bool trans_NOCP(DisasContext *s, arg_nocp *a)
 -{
 -    /*
 -     * Handle M-profile early check for disabled coprocessor:
 -     * all we need to do here is emit the NOCP exception if
 -     * the coprocessor is disabled. Otherwise we return false
 -     * and the real VFP/etc decode will handle the insn.
 -     */
 -    assert(arm_dc_feature(s, ARM_FEATURE_M));
 -
 -    if (a->cp == 11) {
 -        a->cp = 10;
 -    }
 -    if (arm_dc_feature(s, ARM_FEATURE_V8_1M) &&
 -        (a->cp == 8 || a->cp == 9 || a->cp == 14 || a->cp == 15)) {
 -        /* in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
 -        a->cp = 10;
 -    }
 -
 -    if (a->cp != 10) {
 -        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 -                           syn_uncategorized(), default_exception_el(s));
 -        return true;
 -    }
 -
 -    if (s->fp_excp_el != 0) {
 -        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
 -                           syn_uncategorized(), s->fp_excp_el);
 -        return true;
 -    }
 -
 -    return false;
 -}
 -
 -static bool trans_NOCP_8_1(DisasContext *s, arg_nocp *a)
 -{
 -    /* This range needs a coprocessor check for v8.1M and later only */
 -    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
 -        return false;
 -    }
 -    return trans_NOCP(s, a);
 -}
 -
  static bool trans_VINS(DisasContext *s, arg_VINS *a)
  {
-     int rd = extract32(insn, 0, 5);
+     TCGv_i32 rd, rm;
-     int imm8 = extract32(insn, 13, 8);
+diff --git a/target/arm/meson.build b/target/arm/meson.build
--    int is_double = extract32(insn, 22, 2);
+index XXXXXXX..XXXXXXX 100644
-+    int type = extract32(insn, 22, 2);
+--- a/target/arm/meson.build
-     uint64_t imm;
++++ b/target/arm/meson.build
-     TCGv_i64 tcg_res;
+@@ -XXX,XX +XXX,XX @@ gen = [
-+    TCGMemOp sz;
+   decodetree.process('neon-ls.decode', extra_args: '--static-decode=disas_neon_ls'),
+   decodetree.process('vfp.decode', extra_args: '--static-decode=disas_vfp'),
--    if (is_double > 1) {
+   decodetree.process('vfp-uncond.decode', extra_args: '--static-decode=disas_vfp_uncond'),
-+    switch (type) {
+-  decodetree.process('m-nocp.decode', extra_args: '--static-decode=disas_m_nocp'),
-+    case 0:
++  decodetree.process('m-nocp.decode', extra_args: '--decode=disas_m_nocp'),
-+        sz = MO_32;
+   decodetree.process('a32.decode', extra_args: '--static-decode=disas_a32'),
-+        break;
+   decodetree.process('a32-uncond.decode', extra_args: '--static-decode=disas_a32_uncond'),
-+    case 1:
+   decodetree.process('t32.decode', extra_args: '--static-decode=disas_t32'),
-+        sz = MO_64;
+@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
-+        break;
+   'op_helper.c',
-+    case 3:
+   'tlb_helper.c',
-+        sz = MO_16;
+   'translate.c',
-+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
++  'translate-m-nocp.c',
-+            break;
+   'vec_helper.c',
-+        }
+   'vfp_helper.c',
-+        /* fallthru */
+   'cpu_tcg.c',
 +    default:
          unallocated_encoding(s);
          return;
      }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_imm(DisasContext *s, uint32_t insn)
          return;
      }
 -    imm = vfp_expand_imm(MO_32 + is_double, imm8);
 +    imm = vfp_expand_imm(sz, imm8);
      tcg_res = tcg_const_i64(imm);
      write_fp_dreg(s, rd, tcg_res);
 --
-.17.0
+.20.1
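
Note (illustration only, not part of the patch): trans_NOCP() in the patch
above first folds several coprocessor numbers onto cp10 and then decides
whether to raise NOCP or let the normal VFP decode continue. The sketch below
restates that decision as plain C, assuming the flow shown in the patch. The
mock_* names and the MockNocpResult enum are hypothetical, and the sketch
ignores which exception level the real code targets (default_exception_el()
for non-FP coprocessors, fp_excp_el for cp10).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: raise NOCP now, or fall through to VFP decode. */
typedef enum {
    MOCK_FALL_THROUGH,
    MOCK_RAISE_NOCP
} MockNocpResult;

/* Map a coprocessor number onto the one whose enable bit governs it. */
static int mock_effective_cp(int cp, bool has_v8_1m)
{
    if (cp == 11) {
        /* cp11 shares the cp10 enable. */
        return 10;
    }
    if (has_v8_1m && (cp == 8 || cp == 9 || cp == 14 || cp == 15)) {
        /* From v8.1M, cp 8, 9, 14 and 15 are also governed by cp10. */
        return 10;
    }
    return cp;
}

/*
 * fp_excp_el mirrors the field of the same name in the patch: nonzero
 * means FP access is currently disabled and must trap.
 */
static MockNocpResult mock_nocp_check(int cp, bool has_v8_1m, int fp_excp_el)
{
    if (mock_effective_cp(cp, has_v8_1m) != 10) {
        /* Not an FP coprocessor: this path always raises NOCP. */
        return MOCK_RAISE_NOCP;
    }
    if (fp_excp_el != 0) {
        return MOCK_RAISE_NOCP;
    }
    return MOCK_FALL_THROUGH;
}
```

The same encoding can therefore give different results on different cores:
cp14 raises NOCP unconditionally before v8.1M, but falls through to the real
decoder on v8.1M when FP access is enabled.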

-[Qemu-devel] [PULL 10/16] target/arm: Implement FP data-processing (3 source) for fp16
+[PULL 08/26] target/arm: Move gen_aa32 functions to translate-a32.h
-From: Richard Henderson <richard.henderson@linaro.org>
+Move the various gen_aa32* functions and macros out of translate.c
 and into translate-a32.h.
-We missed all of the scalar fp16 fma operations.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210430132740.10391-6-peter.maydell@linaro.org
 ---
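Note (illustration only, not part of the patch): the DO_GEN_LD and DO_GEN_ST
macros moved by this patch use the preprocessor's ## operator to stamp out one
small wrapper per access size, each forwarding to a generic routine with a
fixed size argument. The sketch below shows the same token-pasting pattern in
plain C. MockSize, mock_ld() and MOCK_GEN_LD are hypothetical; the generic
routine here does a little-endian, zero-extending read from a byte buffer
instead of emitting a guest memory access.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size selector; each value is the access width in bytes. */
typedef enum {
    MOCK_UB = 1,
    MOCK_UW = 2,
    MOCK_UL = 4
} MockSize;

/* Generic load: little-endian, zero-extended to 32 bits. */
static uint32_t mock_ld(const uint8_t *mem, MockSize size)
{
    uint32_t val = 0;
    for (int i = 0; i < (int)size; i++) {
        val |= (uint32_t)mem[i] << (8 * i);
    }
    return val;
}

/* Paste SUFF onto the function name to create one wrapper per size. */
#define MOCK_GEN_LD(SUFF, SIZE)                              \
    static inline uint32_t mock_ld##SUFF(const uint8_t *mem) \
    {                                                        \
        return mock_ld(mem, SIZE);                           \
    }

MOCK_GEN_LD(8u, MOCK_UB)    /* defines mock_ld8u()  */
MOCK_GEN_LD(16u, MOCK_UW)   /* defines mock_ld16u() */
MOCK_GEN_LD(32u, MOCK_UL)   /* defines mock_ld32u() */

#undef MOCK_GEN_LD
```

Undefining the generator macro after use, as the patch does with DO_GEN_LD and
DO_GEN_ST, keeps it from leaking into every file that includes the header.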
  target/arm/translate-a32.h | 53 ++++++++++++++++++++++++++++++++++++++
  target/arm/translate.c     | 51 ++++++++++++------------------------
  2 files changed, 69 insertions(+), 35 deletions(-)
-Cc: qemu-stable@nongnu.org
+diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Tested-by: Alex Bennée <alex.bennee@linaro.org>
 Message-id: 20180512003217.9105-8-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/translate-a64.c | 48 ++++++++++++++++++++++++++++++++++++++
  1 file changed, 48 insertions(+)
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/translate-a32.h
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/translate-a32.h
-@@ -XXX,XX +XXX,XX @@ static void handle_fp_3src_double(DisasContext *s, bool o0, bool o1,
+@@ -XXX,XX +XXX,XX @@ static inline TCGv_i32 load_reg(DisasContext *s, int reg)
-     tcg_temp_free_i64(tcg_res);
+     return tmp;
  }
-+/* Floating-point data-processing (3 source) - half precision */
++void gen_aa32_ld_internal_i32(DisasContext *s, TCGv_i32 val,
-+static void handle_fp_3src_half(DisasContext *s, bool o0, bool o1,
++                              TCGv_i32 a32, int index, MemOp opc);
-+                                int rd, int rn, int rm, int ra)
++void gen_aa32_st_internal_i32(DisasContext *s, TCGv_i32 val,
-+{
++                              TCGv_i32 a32, int index, MemOp opc);
-+    TCGv_i32 tcg_op1, tcg_op2, tcg_op3;
++void gen_aa32_ld_internal_i64(DisasContext *s, TCGv_i64 val,
-+    TCGv_i32 tcg_res = tcg_temp_new_i32();
++                              TCGv_i32 a32, int index, MemOp opc);
-+    TCGv_ptr fpst = get_fpstatus_ptr(true);
++void gen_aa32_st_internal_i64(DisasContext *s, TCGv_i64 val,
 +                              TCGv_i32 a32, int index, MemOp opc);
 +void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
 +                     int index, MemOp opc);
 +void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
 +                     int index, MemOp opc);
 +void gen_aa32_ld_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
 +                     int index, MemOp opc);
 +void gen_aa32_st_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
 +                     int index, MemOp opc);
 +
-+    tcg_op1 = read_fp_hreg(s, rn);
++#define DO_GEN_LD(SUFF, OPC)                                            \
-+    tcg_op2 = read_fp_hreg(s, rm);
++    static inline void gen_aa32_ld##SUFF(DisasContext *s, TCGv_i32 val, \
-+    tcg_op3 = read_fp_hreg(s, ra);
++                                         TCGv_i32 a32, int index)       \
-+
++    {                                                                   \
-+    /* These are fused multiply-add, and must be done as one
++        gen_aa32_ld_i32(s, val, a32, index, OPC);                       \
 +     * floating point operation with no rounding between the
 +     * multiplication and addition steps.
 +     * NB that doing the negations here as separate steps is
 +     * correct : an input NaN should come out with its sign bit
 +     * flipped if it is a negated-input.
 +     */
 +    if (o1 == true) {
 +        tcg_gen_xori_i32(tcg_op3, tcg_op3, 0x8000);
 +    }
 +
-+    if (o0 != o1) {
++#define DO_GEN_ST(SUFF, OPC)                                            \
-+        tcg_gen_xori_i32(tcg_op1, tcg_op1, 0x8000);
++    static inline void gen_aa32_st##SUFF(DisasContext *s, TCGv_i32 val, \
 +                                         TCGv_i32 a32, int index)       \
 +    {                                                                   \
 +        gen_aa32_st_i32(s, val, a32, index, OPC);                       \
 +    }
 +
-+    gen_helper_advsimd_muladdh(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
++static inline void gen_aa32_ld64(DisasContext *s, TCGv_i64 val,
-+
++                                 TCGv_i32 a32, int index)
-+    write_fp_sreg(s, rd, tcg_res);
++{
-+
++    gen_aa32_ld_i64(s, val, a32, index, MO_Q);
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tcg_op1);
 +    tcg_temp_free_i32(tcg_op2);
 +    tcg_temp_free_i32(tcg_op3);
 +    tcg_temp_free_i32(tcg_res);
 +}
 +
- /* Floating point data-processing (3 source)
++static inline void gen_aa32_st64(DisasContext *s, TCGv_i64 val,
-  *   31  30  29 28       24 23  22  21  20  16  15  14  10 9    5 4    0
++                                 TCGv_i32 a32, int index)
-  * +---+---+---+-----------+------+----+------+----+------+------+------+
++{
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_3src(DisasContext *s, uint32_t insn)
++    gen_aa32_st_i64(s, val, a32, index, MO_Q);
-         }
++}
-         handle_fp_3src_double(s, o0, o1, rd, rn, rm, ra);
++
-         break;
++DO_GEN_LD(8u, MO_UB)
-+    case 3:
++DO_GEN_LD(16u, MO_UW)
-+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
++DO_GEN_LD(32u, MO_UL)
-+            unallocated_encoding(s);
++DO_GEN_ST(8, MO_UB)
-+            return;
++DO_GEN_ST(16, MO_UW)
-+        }
++DO_GEN_ST(32, MO_UL)
-+        if (!fp_access_check(s)) {
++
-+            return;
++#undef DO_GEN_LD
-+        }
++#undef DO_GEN_ST
-+        handle_fp_3src_half(s, o0, o1, rd, rn, rm, ra);
++
-+        break;
+ #endif
-     default:
+diff --git a/target/arm/translate.c b/target/arm/translate.c
-         unallocated_encoding(s);
+index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate.c
 +++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv gen_aa32_addr(DisasContext *s, TCGv_i32 a32, MemOp op)
   * Internal routines are used for NEON cases where the endianness
   * and/or alignment has already been taken into account and manipulated.
   */
 -static void gen_aa32_ld_internal_i32(DisasContext *s, TCGv_i32 val,
 -                                     TCGv_i32 a32, int index, MemOp opc)
 +void gen_aa32_ld_internal_i32(DisasContext *s, TCGv_i32 val,
 +                              TCGv_i32 a32, int index, MemOp opc)
  {
      TCGv addr = gen_aa32_addr(s, a32, opc);
      tcg_gen_qemu_ld_i32(val, addr, index, opc);
      tcg_temp_free(addr);
  }
 -static void gen_aa32_st_internal_i32(DisasContext *s, TCGv_i32 val,
 -                                     TCGv_i32 a32, int index, MemOp opc)
 +void gen_aa32_st_internal_i32(DisasContext *s, TCGv_i32 val,
 +                              TCGv_i32 a32, int index, MemOp opc)
  {
      TCGv addr = gen_aa32_addr(s, a32, opc);
      tcg_gen_qemu_st_i32(val, addr, index, opc);
      tcg_temp_free(addr);
  }
 -static void gen_aa32_ld_internal_i64(DisasContext *s, TCGv_i64 val,
 -                                     TCGv_i32 a32, int index, MemOp opc)
 +void gen_aa32_ld_internal_i64(DisasContext *s, TCGv_i64 val,
 +                              TCGv_i32 a32, int index, MemOp opc)
  {
      TCGv addr = gen_aa32_addr(s, a32, opc);
@@ -XXX,XX +XXX,XX @@ static void gen_aa32_ld_internal_i64(DisasContext *s, TCGv_i64 val,
      tcg_temp_free(addr);
  }
 -static void gen_aa32_st_internal_i64(DisasContext *s, TCGv_i64 val,
 -                                     TCGv_i32 a32, int index, MemOp opc)
 +void gen_aa32_st_internal_i64(DisasContext *s, TCGv_i64 val,
 +                              TCGv_i32 a32, int index, MemOp opc)
  {
      TCGv addr = gen_aa32_addr(s, a32, opc);
@@ -XXX,XX +XXX,XX @@ static void gen_aa32_st_internal_i64(DisasContext *s, TCGv_i64 val,
      tcg_temp_free(addr);
  }
 -static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
 -                            int index, MemOp opc)
 +void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
 +                     int index, MemOp opc)
  {
      gen_aa32_ld_internal_i32(s, val, a32, index, finalize_memop(s, opc));
  }
 -static void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
 -                            int index, MemOp opc)
 +void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
 +                     int index, MemOp opc)
  {
      gen_aa32_st_internal_i32(s, val, a32, index, finalize_memop(s, opc));
  }
 -static void gen_aa32_ld_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
 -                            int index, MemOp opc)
 +void gen_aa32_ld_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
 +                     int index, MemOp opc)
  {
      gen_aa32_ld_internal_i64(s, val, a32, index, finalize_memop(s, opc));
  }
 -static void gen_aa32_st_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
 -                            int index, MemOp opc)
 +void gen_aa32_st_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
 +                     int index, MemOp opc)
  {
      gen_aa32_st_internal_i64(s, val, a32, index, finalize_memop(s, opc));
  }
@@ -XXX,XX +XXX,XX @@ static void gen_aa32_st_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
          gen_aa32_st_i32(s, val, a32, index, OPC);                       \
      }
+-static inline void gen_aa32_ld64(DisasContext *s, TCGv_i64 val,
+-                                 TCGv_i32 a32, int index)
+-{
+-    gen_aa32_ld_i64(s, val, a32, index, MO_Q);
+-}
+-
+-static inline void gen_aa32_st64(DisasContext *s, TCGv_i64 val,
+-                                 TCGv_i32 a32, int index)
+-{
+-    gen_aa32_st_i64(s, val, a32, index, MO_Q);
+-}
+-
+-DO_GEN_LD(8u, MO_UB)
+-DO_GEN_LD(16u, MO_UW)
+-DO_GEN_LD(32u, MO_UL)
+-DO_GEN_ST(8, MO_UB)
+-DO_GEN_ST(16, MO_UW)
+-DO_GEN_ST(32, MO_UL)
+-
+ static inline void gen_hvc(DisasContext *s, int imm16)
+ {
+     /* The pre HVC helper handles cases when HVC gets trapped
 --
-.17.0
+.20.1

-[Qemu-devel] [PULL 06/16] target/arm: Implement FCVT (scalar, integer) for fp16
+[PULL 09/26] target/arm: Move vfp_{load, store}_reg{32, 64} to translate-vfp.c.inc
-From: Richard Henderson <richard.henderson@linaro.org>
+The functions vfp_load_reg32(), vfp_load_reg64(), vfp_store_reg32()
 and vfp_store_reg64() are used only in translate-vfp.c.inc. Move
 them to that file.
-Cc: qemu-stable@nongnu.org
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Message-id: 20180512003217.9105-4-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210430132740.10391-7-peter.maydell@linaro.org
 ---
- target/arm/helper.h        |  6 +++
+ target/arm/translate.c         | 20 --------------------
- target/arm/helper.c        | 38 ++++++++++++++-
+ target/arm/translate-vfp.c.inc | 20 ++++++++++++++++++++
- target/arm/translate-a64.c | 96 +++++++++++++++++++++++++++++++-------
+files changed, 20 insertions(+), 20 deletions(-)
 files changed, 122 insertions(+), 18 deletions(-)
-diff --git a/target/arm/helper.h b/target/arm/helper.h
+diff --git a/target/arm/translate.c b/target/arm/translate.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.h
+--- a/target/arm/translate.c
-+++ b/target/arm/helper.h
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_touhd_round_to_zero, i64, f64, i32, ptr)
+@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
- DEF_HELPER_3(vfp_tould_round_to_zero, i64, f64, i32, ptr)
+     }
- DEF_HELPER_3(vfp_touhh, i32, f16, i32, ptr)
+ }
- DEF_HELPER_3(vfp_toshh, i32, f16, i32, ptr)
-+DEF_HELPER_3(vfp_toulh, i32, f16, i32, ptr)
+-static inline void vfp_load_reg64(TCGv_i64 var, int reg)
-+DEF_HELPER_3(vfp_toslh, i32, f16, i32, ptr)
+-{
-+DEF_HELPER_3(vfp_touqh, i64, f16, i32, ptr)
+-    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
-+DEF_HELPER_3(vfp_tosqh, i64, f16, i32, ptr)
+-}
- DEF_HELPER_3(vfp_toshs, i32, f32, i32, ptr)
+-
- DEF_HELPER_3(vfp_tosls, i32, f32, i32, ptr)
+-static inline void vfp_store_reg64(TCGv_i64 var, int reg)
- DEF_HELPER_3(vfp_tosqs, i64, f32, i32, ptr)
+-{
-@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_ultod, f64, i64, i32, ptr)
+-    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
- DEF_HELPER_3(vfp_uqtod, f64, i64, i32, ptr)
+-}
- DEF_HELPER_3(vfp_sltoh, f16, i32, i32, ptr)
+-
- DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
+-static inline void vfp_load_reg32(TCGv_i32 var, int reg)
-+DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
+-{
-+DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
+-    tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
+-}
- DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
+-
- DEF_HELPER_FLAGS_2(set_neon_rmode, TCG_CALL_NO_RWG, i32, i32, env)
+-static inline void vfp_store_reg32(TCGv_i32 var, int reg)
-diff --git a/target/arm/helper.c b/target/arm/helper.c
+-{
 -    tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 -}
 -
  void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
  {
      long off = neon_element_offset(reg, ele, memop);
 diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/helper.c
+--- a/target/arm/translate-vfp.c.inc
-+++ b/target/arm/helper.c
++++ b/target/arm/translate-vfp.c.inc
-@@ -XXX,XX +XXX,XX @@ VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
+@@ -XXX,XX +XXX,XX @@
- #undef VFP_CONV_FIX_A64
+ #include "decode-vfp.c.inc"
+ #include "decode-vfp-uncond.c.inc"
- /* Conversion to/from f16 can overflow to infinity before/after scaling.
-- * Therefore we convert to f64 (which does not round), scale,
++static inline void vfp_load_reg64(TCGv_i64 var, int reg)
 - * and then convert f64 to f16 (which may round).
 + * Therefore we convert to f64, scale, and then convert f64 to f16; or
 + * vice versa for conversion to integer.
 + *
 + * For 16- and 32-bit integers, the conversion to f64 never rounds.
 + * For 64-bit integers, any integer that would cause rounding will also
 + * overflow to f16 infinity, so there is no double rounding problem.
   */
  static float16 do_postscale_fp16(float64 f, int shift, float_status *fpst)
@@ -XXX,XX +XXX,XX @@ float16 HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
      return do_postscale_fp16(uint32_to_float64(x, fpst), shift, fpst);
  }
 +float16 HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
 +{
-+    return do_postscale_fp16(int64_to_float64(x, fpst), shift, fpst);
++    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
 +}
 +
-+float16 HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
++static inline void vfp_store_reg64(TCGv_i64 var, int reg)
 +{
-+    return do_postscale_fp16(uint64_to_float64(x, fpst), shift, fpst);
++    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
 +}
 +
- static float64 do_prescale_fp16(float16 f, int shift, float_status *fpst)
++static inline void vfp_load_reg32(TCGv_i32 var, int reg)
  {
      if (unlikely(float16_is_any_nan(f))) {
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_touhh)(float16 x, uint32_t shift, void *fpst)
      return float64_to_uint16(do_prescale_fp16(x, shift, fpst), fpst);
  }
 +uint32_t HELPER(vfp_toslh)(float16 x, uint32_t shift, void *fpst)
 +{
-+    return float64_to_int32(do_prescale_fp16(x, shift, fpst), fpst);
++    tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
 +}
 +
-+uint32_t HELPER(vfp_toulh)(float16 x, uint32_t shift, void *fpst)
++static inline void vfp_store_reg32(TCGv_i32 var, int reg)
 +{
-+    return float64_to_uint32(do_prescale_fp16(x, shift, fpst), fpst);
++    tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 +}
 +
-+uint64_t HELPER(vfp_tosqh)(float16 x, uint32_t shift, void *fpst)
+ /*
-+{
+  * The imm8 encodes the sign bit, enough bits to represent an exponent in
-+    return float64_to_int64(do_prescale_fp16(x, shift, fpst), fpst);
+  * the range 01....1xx to 10....0xx, and the most significant 4 bits of
 +}
 +
 +uint64_t HELPER(vfp_touqh)(float16 x, uint32_t shift, void *fpst)
 +{
 +    return float64_to_uint64(do_prescale_fp16(x, shift, fpst), fpst);
 +}
 +
  /* Set the current fp rounding mode and return the old one.
   * The argument is a softfloat float_round_ value.
   */
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-a64.c
 +++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                             bool itof, int rmode, int scale, int sf, int type)
  {
      bool is_signed = !(opcode & 1);
 -    bool is_double = type;
      TCGv_ptr tcg_fpstatus;
 -    TCGv_i32 tcg_shift;
 +    TCGv_i32 tcg_shift, tcg_single;
 +    TCGv_i64 tcg_double;
 -    tcg_fpstatus = get_fpstatus_ptr(false);
 +    tcg_fpstatus = get_fpstatus_ptr(type == 3);
      tcg_shift = tcg_const_i32(64 - scale);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
              tcg_int = tcg_extend;
          }
 -        if (is_double) {
 -            TCGv_i64 tcg_double = tcg_temp_new_i64();
 +        switch (type) {
 +        case 1: /* float64 */
 +            tcg_double = tcg_temp_new_i64();
              if (is_signed) {
                  gen_helper_vfp_sqtod(tcg_double, tcg_int,
                                       tcg_shift, tcg_fpstatus);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
              }
              write_fp_dreg(s, rd, tcg_double);
              tcg_temp_free_i64(tcg_double);
 -        } else {
 -            TCGv_i32 tcg_single = tcg_temp_new_i32();
 +            break;
 +
 +        case 0: /* float32 */
 +            tcg_single = tcg_temp_new_i32();
              if (is_signed) {
                  gen_helper_vfp_sqtos(tcg_single, tcg_int,
                                       tcg_shift, tcg_fpstatus);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
              }
              write_fp_sreg(s, rd, tcg_single);
              tcg_temp_free_i32(tcg_single);
 +            break;
 +
 +        case 3: /* float16 */
 +            tcg_single = tcg_temp_new_i32();
 +            if (is_signed) {
 +                gen_helper_vfp_sqtoh(tcg_single, tcg_int,
 +                                     tcg_shift, tcg_fpstatus);
 +            } else {
 +                gen_helper_vfp_uqtoh(tcg_single, tcg_int,
 +                                     tcg_shift, tcg_fpstatus);
 +            }
 +            write_fp_sreg(s, rd, tcg_single);
 +            tcg_temp_free_i32(tcg_single);
 +            break;
 +
 +        default:
 +            g_assert_not_reached();
          }
      } else {
          TCGv_i64 tcg_int = cpu_reg(s, rd);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
          gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
 -        if (is_double) {
 -            TCGv_i64 tcg_double = read_fp_dreg(s, rn);
 +        switch (type) {
 +        case 1: /* float64 */
 +            tcg_double = read_fp_dreg(s, rn);
              if (is_signed) {
                  if (!sf) {
                      gen_helper_vfp_tosld(tcg_int, tcg_double,
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                                           tcg_shift, tcg_fpstatus);
                  }
              }
 +            if (!sf) {
 +                tcg_gen_ext32u_i64(tcg_int, tcg_int);
 +            }
              tcg_temp_free_i64(tcg_double);
 -        } else {
 -            TCGv_i32 tcg_single = read_fp_sreg(s, rn);
 +            break;
 +
 +        case 0: /* float32 */
 +            tcg_single = read_fp_sreg(s, rn);
              if (sf) {
                  if (is_signed) {
                      gen_helper_vfp_tosqs(tcg_int, tcg_single,
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                  tcg_temp_free_i32(tcg_dest);
              }
              tcg_temp_free_i32(tcg_single);
 +            break;
 +
 +        case 3: /* float16 */
 +            tcg_single = read_fp_sreg(s, rn);
 +            if (sf) {
 +                if (is_signed) {
 +                    gen_helper_vfp_tosqh(tcg_int, tcg_single,
 +                                         tcg_shift, tcg_fpstatus);
 +                } else {
 +                    gen_helper_vfp_touqh(tcg_int, tcg_single,
 +                                         tcg_shift, tcg_fpstatus);
 +                }
 +            } else {
 +                TCGv_i32 tcg_dest = tcg_temp_new_i32();
 +                if (is_signed) {
 +                    gen_helper_vfp_toslh(tcg_dest, tcg_single,
 +                                         tcg_shift, tcg_fpstatus);
 +                } else {
 +                    gen_helper_vfp_toulh(tcg_dest, tcg_single,
 +                                         tcg_shift, tcg_fpstatus);
 +                }
 +                tcg_gen_extu_i32_i64(tcg_int, tcg_dest);
 +                tcg_temp_free_i32(tcg_dest);
 +            }
 +            tcg_temp_free_i32(tcg_single);
 +            break;
 +
 +        default:
 +            g_assert_not_reached();
          }
          gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
          tcg_temp_free_i32(tcg_rmode);
 -
 -        if (!sf) {
 -            tcg_gen_ext32u_i64(tcg_int, tcg_int);
 -        }
      }
      tcg_temp_free_ptr(tcg_fpstatus);
@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
          /* actual FP conversions */
          bool itof = extract32(opcode, 1, 1);
 -        if (type > 1 || (rmode != 0 && opcode > 1)) {
 +        if (rmode != 0 && opcode > 1) {
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        switch (type) {
 +        case 0: /* float32 */
 +        case 1: /* float64 */
 +            break;
 +        case 3: /* float16 */
 +            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
 +                break;
 +            }
 +            /* fallthru */
 +        default:
              unallocated_encoding(s);
              return;
          }
 --
-.17.0
+.20.1
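
The helper comment in the older series argues that going through float64 is safe for fp16 conversions: 16- and 32-bit integers convert to float64 without rounding, and any 64-bit integer that float64 would round is already far outside the finite float16 range. The following is a minimal sketch of that arithmetic for the unscaled case (shift of 0) only. All demo_* names are hypothetical and are not QEMU or softfloat functions; host double is assumed to be IEEE 754 binary64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest finite IEEE 754 binary16 (float16) value. */
#define DEMO_F16_MAX 65504.0

/* True if converting x to double and back recovers x exactly. */
static bool demo_i32_exact_in_f64(int32_t x)
{
    return (int64_t)(double)x == (int64_t)x;
}

/*
 * Smallest positive integer that binary64 cannot represent: 2^53 + 1.
 * Every integer of smaller magnitude converts without rounding.
 */
static int64_t demo_first_inexact_i64(void)
{
    return (INT64_C(1) << 53) + 1;
}

/* True if v already lies beyond the finite float16 range. */
static bool demo_exceeds_f16(double v)
{
    return v > DEMO_F16_MAX || v < -DEMO_F16_MAX;
}
```

Under these assumptions, the first integer that rounds on the way to float64 is many orders of magnitude above 65504, so the second conversion (float64 to float16) produces infinity regardless of how the first one rounded.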

-New patch
+[PULL 10/26] target/arm: Make functions used by translate-vfp global
+Make the remaining functions which are needed by translate-vfp.c.inc
+global.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210430132740.10391-8-peter.maydell@linaro.org
+---
+ target/arm/translate-a32.h | 18 ++++++++++++++++++
+ target/arm/translate.c     | 25 ++++++++-----------------
+files changed, 26 insertions(+), 17 deletions(-)
+diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a32.h
++++ b/target/arm/translate-a32.h
+@@ -XXX,XX +XXX,XX @@ void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop);
+ void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop);
+ void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop);
+ void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop);
++TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs);
++void gen_set_cpsr(TCGv_i32 var, uint32_t mask);
++void gen_set_condexec(DisasContext *s);
++void gen_set_pc_im(DisasContext *s, target_ulong val);
++void gen_lookup_tb(DisasContext *s);
++long vfp_reg_offset(bool dp, unsigned reg);
++long neon_full_reg_offset(unsigned reg);
+ static inline TCGv_i32 load_cpu_offset(int offset)
+ {
+@@ -XXX,XX +XXX,XX @@ static inline TCGv_i32 load_reg(DisasContext *s, int reg)
+     return tmp;
+ }
++void store_reg(DisasContext *s, int reg, TCGv_i32 var);
++
+ void gen_aa32_ld_internal_i32(DisasContext *s, TCGv_i32 val,
+                               TCGv_i32 a32, int index, MemOp opc);
+ void gen_aa32_st_internal_i32(DisasContext *s, TCGv_i32 val,
+@@ -XXX,XX +XXX,XX @@ DO_GEN_ST(32, MO_UL)
+ #undef DO_GEN_LD
+ #undef DO_GEN_ST
++#if defined(CONFIG_USER_ONLY)
++#define IS_USER(s) 1
++#else
++#define IS_USER(s) (s->user)
++#endif
++
++/* Set NZCV flags from the high 4 bits of var.  */
++#define gen_set_nzcv(var) gen_set_cpsr(var, CPSR_NZCV)
++
+ #endif
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@
+ #include "translate.h"
+ #include "translate-a32.h"
+-#if defined(CONFIG_USER_ONLY)
+-#define IS_USER(s) 1
+-#else
+-#define IS_USER(s) (s->user)
+-#endif
+-
+ /* These are TCG temporaries used only by the legacy iwMMXt decoder */
+ static TCGv_i64 cpu_V0, cpu_V1, cpu_M0;
+ /* These are TCG globals which alias CPUARMState fields */
+@@ -XXX,XX +XXX,XX @@ void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
+  * This is used for load/store for which use of PC implies (literal),
+  * or ADD that implies ADR.
+  */
+-static TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs)
++TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs)
+ {
+     TCGv_i32 tmp = tcg_temp_new_i32();
+@@ -XXX,XX +XXX,XX @@ static TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs)
+ /* Set a CPU register.  The source must be a temporary and will be
+    marked as dead.  */
+-static void store_reg(DisasContext *s, int reg, TCGv_i32 var)
++void store_reg(DisasContext *s, int reg, TCGv_i32 var)
+ {
+     if (reg == 15) {
+         /* In Thumb mode, we must ignore bit 0.
+@@ -XXX,XX +XXX,XX @@ static void store_sp_checked(DisasContext *s, TCGv_i32 var)
+ #define gen_sxtb16(var) gen_helper_sxtb16(var, var)
+ #define gen_uxtb16(var) gen_helper_uxtb16(var, var)
+-
+-static inline void gen_set_cpsr(TCGv_i32 var, uint32_t mask)
++void gen_set_cpsr(TCGv_i32 var, uint32_t mask)
+ {
+     TCGv_i32 tmp_mask = tcg_const_i32(mask);
+     gen_helper_cpsr_write(cpu_env, var, tmp_mask);
+     tcg_temp_free_i32(tmp_mask);
+ }
+-/* Set NZCV flags from the high 4 bits of var.  */
+-#define gen_set_nzcv(var) gen_set_cpsr(var, CPSR_NZCV)
+ static void gen_exception_internal(int excp)
+ {
+@@ -XXX,XX +XXX,XX @@ void arm_gen_test_cc(int cc, TCGLabel *label)
+     arm_free_cc(&cmp);
+ }
+-static inline void gen_set_condexec(DisasContext *s)
++void gen_set_condexec(DisasContext *s)
+ {
+     if (s->condexec_mask) {
+         uint32_t val = (s->condexec_cond << 4) | (s->condexec_mask >> 1);
+@@ -XXX,XX +XXX,XX @@ static inline void gen_set_condexec(DisasContext *s)
+     }
+ }
+-static inline void gen_set_pc_im(DisasContext *s, target_ulong val)
++void gen_set_pc_im(DisasContext *s, target_ulong val)
+ {
+     tcg_gen_movi_i32(cpu_R[15], val);
+ }
+@@ -XXX,XX +XXX,XX @@ static void gen_exception_el(DisasContext *s, int excp, uint32_t syn,
+ }
+ /* Force a TB lookup after an instruction that changes the CPU state.  */
+-static inline void gen_lookup_tb(DisasContext *s)
++void gen_lookup_tb(DisasContext *s)
+ {
+     tcg_gen_movi_i32(cpu_R[15], s->base.pc_next);
+     s->base.is_jmp = DISAS_EXIT;
+@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
+ /*
+  * Return the offset of a "full" NEON Dreg.
+  */
+-static long neon_full_reg_offset(unsigned reg)
++long neon_full_reg_offset(unsigned reg)
+ {
+     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
+ }
+@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp memop)
+ }
+ /* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
+-static long vfp_reg_offset(bool dp, unsigned reg)
++long vfp_reg_offset(bool dp, unsigned reg)
+ {
+     if (dp) {
+         return neon_element_offset(reg, 0, MO_64);
+--
+.20.1
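
The change in this patch follows one pattern throughout: a function loses its `static` (and `inline`) qualifier in translate.c and gains a prototype in translate-a32.h so that another compilation unit can call it. A single-file sketch of that pattern is below. DemoState and demo_reg_offset are hypothetical stand-ins, not QEMU identifiers, and the register layout is a simplified assumption that ignores host endianness; the real vfp_reg_offset() delegates to neon_element_offset().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ---- would live in the shared header (translate-a32.h in the series) ---- */

/* Hypothetical stand-in for the CPU state: 32 double-width registers. */
typedef struct DemoState {
    uint64_t d[32];
} DemoState;

/* Prototype: makes the function visible to other compilation units. */
long demo_reg_offset(bool dp, unsigned reg);

/* ---- would live in translate.c: definition is no longer 'static' ---- */

/*
 * Return the byte offset of a double register (dp = true) or of a
 * single register (dp = false), two singles packed per double slot.
 */
long demo_reg_offset(bool dp, unsigned reg)
{
    if (dp) {
        return offsetof(DemoState, d) + reg * sizeof(uint64_t);
    }
    return offsetof(DemoState, d)
           + (reg >> 1) * sizeof(uint64_t)
           + (reg & 1) * sizeof(uint32_t);
}
```

Small macros such as IS_USER() and gen_set_nzcv() move into the header as-is, since a macro has no linkage and only needs to be visible where it is expanded.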

-New patch
+[PULL 11/26] target/arm: Make translate-vfp.c.inc its own compilation unit
+Switch translate-vfp.c.inc from being #included into translate.c
+to being its own compilation unit.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210430132740.10391-9-peter.maydell@linaro.org
+---
+ target/arm/translate-a32.h                          |  2 ++
+ target/arm/{translate-vfp.c.inc => translate-vfp.c} | 12 +++++++-----
+ target/arm/translate.c                              |  3 +--
+ target/arm/meson.build                              |  5 +++--
+files changed, 13 insertions(+), 9 deletions(-)
+ rename target/arm/{translate-vfp.c.inc => translate-vfp.c} (99%)
+diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a32.h
++++ b/target/arm/translate-a32.h
+@@ -XXX,XX +XXX,XX @@
+ /* Prototypes for autogenerated disassembler functions */
+ bool disas_m_nocp(DisasContext *dc, uint32_t insn);
++bool disas_vfp(DisasContext *s, uint32_t insn);
++bool disas_vfp_uncond(DisasContext *s, uint32_t insn);
+ void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
+ void arm_gen_condlabel(DisasContext *s);
+diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c
+similarity index 99%
+rename from target/arm/translate-vfp.c.inc
+rename to target/arm/translate-vfp.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-vfp.c.inc
++++ b/target/arm/translate-vfp.c
+@@ -XXX,XX +XXX,XX @@
+  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+  */
+-/*
+- * This file is intended to be included from translate.c; it uses
+- * some macros and definitions provided by that file.
+- * It might be possible to convert it to a standalone .c file eventually.
+- */
++#include "qemu/osdep.h"
++#include "tcg/tcg-op.h"
++#include "tcg/tcg-op-gvec.h"
++#include "exec/exec-all.h"
++#include "exec/gen-icount.h"
++#include "translate.h"
++#include "translate-a32.h"
+ /* Include the generated VFP decoder */
+ #include "decode-vfp.c.inc"
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
+ #define ARM_CP_RW_BIT   (1 << 20)
+-/* Include the VFP and Neon decoders */
+-#include "translate-vfp.c.inc"
++/* Include the Neon decoder */
+ #include "translate-neon.c.inc"
+ static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
+diff --git a/target/arm/meson.build b/target/arm/meson.build
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/meson.build
++++ b/target/arm/meson.build
+@@ -XXX,XX +XXX,XX @@ gen = [
+   decodetree.process('neon-shared.decode', extra_args: '--static-decode=disas_neon_shared'),
+   decodetree.process('neon-dp.decode', extra_args: '--static-decode=disas_neon_dp'),
+   decodetree.process('neon-ls.decode', extra_args: '--static-decode=disas_neon_ls'),
+-  decodetree.process('vfp.decode', extra_args: '--static-decode=disas_vfp'),
+-  decodetree.process('vfp-uncond.decode', extra_args: '--static-decode=disas_vfp_uncond'),
++  decodetree.process('vfp.decode', extra_args: '--decode=disas_vfp'),
++  decodetree.process('vfp-uncond.decode', extra_args: '--decode=disas_vfp_uncond'),
+   decodetree.process('m-nocp.decode', extra_args: '--decode=disas_m_nocp'),
+   decodetree.process('a32.decode', extra_args: '--static-decode=disas_a32'),
+   decodetree.process('a32-uncond.decode', extra_args: '--static-decode=disas_a32_uncond'),
+@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
+   'tlb_helper.c',
+   'translate.c',
+   'translate-m-nocp.c',
++  'translate-vfp.c',
+   'vec_helper.c',
+   'vfp_helper.c',
+   'cpu_tcg.c',
+--
+.20.1

-[Qemu-devel] [PULL 09/16] target/arm: Implement FP data-processing (2 source) for fp16
+[PULL 12/26] target/arm: Move vfp_reg_ptr() to translate-neon.c.inc
-From: Richard Henderson <richard.henderson@linaro.org>
+The function vfp_reg_ptr() is used only in translate-neon.c.inc;
 move it there.
-We missed all of the scalar fp16 binary operations.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Message-id: 20210430132740.10391-10-peter.maydell@linaro.org
 ---
  target/arm/translate.c          | 7 -------
  target/arm/translate-neon.c.inc | 7 +++++++
 files changed, 7 insertions(+), 7 deletions(-)
-Cc: qemu-stable@nongnu.org
+diff --git a/target/arm/translate.c b/target/arm/translate.c
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
 Tested-by: Alex Bennée <alex.bennee@linaro.org>
 Message-id: 20180512003217.9105-7-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
  target/arm/translate-a64.c | 65 ++++++++++++++++++++++++++++++++++++++
 file changed, 65 insertions(+)
 diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/target/arm/translate.c
-+++ b/target/arm/translate-a64.c
++++ b/target/arm/translate.c
-@@ -XXX,XX +XXX,XX @@ static void handle_fp_2src_double(DisasContext *s, int opcode,
+@@ -XXX,XX +XXX,XX @@ void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
-     tcg_temp_free_i64(tcg_res);
+     }
  }
-+/* Floating-point data-processing (2 source) - half precision */
+-static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
-+static void handle_fp_2src_half(DisasContext *s, int opcode,
+-{
-+                                int rd, int rn, int rm)
+-    TCGv_ptr ret = tcg_temp_new_ptr();
 -    tcg_gen_addi_ptr(ret, cpu_env, vfp_reg_offset(dp, reg));
 -    return ret;
 -}
 -
  #define ARM_CP_RW_BIT   (1 << 20)
  /* Include the Neon decoder */
 diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
 index XXXXXXX..XXXXXXX 100644
 --- a/target/arm/translate-neon.c.inc
 +++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
  #include "decode-neon-ls.c.inc"
  #include "decode-neon-shared.c.inc"
 +static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 +{
-+    TCGv_i32 tcg_op1;
++    TCGv_ptr ret = tcg_temp_new_ptr();
-+    TCGv_i32 tcg_op2;
++    tcg_gen_addi_ptr(ret, cpu_env, vfp_reg_offset(dp, reg));
-+    TCGv_i32 tcg_res;
++    return ret;
 +    TCGv_ptr fpst;
 +
 +    tcg_res = tcg_temp_new_i32();
 +    fpst = get_fpstatus_ptr(true);
 +    tcg_op1 = read_fp_hreg(s, rn);
 +    tcg_op2 = read_fp_hreg(s, rm);
 +
 +    switch (opcode) {
 +    case 0x0: /* FMUL */
 +        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x1: /* FDIV */
 +        gen_helper_advsimd_divh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x2: /* FADD */
 +        gen_helper_advsimd_addh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x3: /* FSUB */
 +        gen_helper_advsimd_subh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x4: /* FMAX */
 +        gen_helper_advsimd_maxh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x5: /* FMIN */
 +        gen_helper_advsimd_minh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x6: /* FMAXNM */
 +        gen_helper_advsimd_maxnumh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x7: /* FMINNM */
 +        gen_helper_advsimd_minnumh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        break;
 +    case 0x8: /* FNMUL */
 +        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
 +        tcg_gen_xori_i32(tcg_res, tcg_res, 0x8000);
 +        break;
 +    default:
 +        g_assert_not_reached();
 +    }
 +
 +    write_fp_sreg(s, rd, tcg_res);
 +
 +    tcg_temp_free_ptr(fpst);
 +    tcg_temp_free_i32(tcg_op1);
 +    tcg_temp_free_i32(tcg_op2);
 +    tcg_temp_free_i32(tcg_res);
 +}
 +
- /* Floating point data-processing (2 source)
+ static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
-  *   31  30  29 28       24 23  22  21 20  16 15    12 11 10 9    5 4    0
+ {
-  * +---+---+---+-----------+------+---+------+--------+-----+------+------+
+     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
@@ -XXX,XX +XXX,XX @@ static void disas_fp_2src(DisasContext *s, uint32_t insn)
          }
          handle_fp_2src_double(s, opcode, rd, rn, rm);
          break;
 +    case 3:
 +        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
 +            unallocated_encoding(s);
 +            return;
 +        }
 +        if (!fp_access_check(s)) {
 +            return;
 +        }
 +        handle_fp_2src_half(s, opcode, rd, rn, rm);
 +        break;
      default:
          unallocated_encoding(s);
      }
 --
-.17.0
+.20.1

-New patch
+[PULL 13/26] target/arm: Delete unused typedef
+The VFPGenFixPointFn typedef is unused; delete it.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210430132740.10391-11-peter.maydell@linaro.org
+---
+ target/arm/translate.c | 2 --
+file changed, 2 deletions(-)
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static const char * const regnames[] =
+ /* Function prototypes for gen_ functions calling Neon helpers.  */
+ typedef void NeonGenThreeOpEnvFn(TCGv_i32, TCGv_env, TCGv_i32,
+                                  TCGv_i32, TCGv_i32);
+-/* Function prototypes for gen_ functions for fix point conversions */
+-typedef void VFPGenFixPointFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+ /* initialize TCG globals.  */
+ void arm_translate_init(void)
+--
+.20.1

-New patch
+[PULL 14/26] target/arm: Move NeonGenThreeOpEnvFn typedef to translate.h
+Move the NeonGenThreeOpEnvFn typedef to translate.h together
+with the other similar typedefs.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210430132740.10391-12-peter.maydell@linaro.org
+---
+ target/arm/translate.h | 2 ++
+ target/arm/translate.c | 3 ---
+files changed, 2 insertions(+), 3 deletions(-)
+diff --git a/target/arm/translate.h b/target/arm/translate.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.h
++++ b/target/arm/translate.h
+@@ -XXX,XX +XXX,XX @@ typedef void NeonGenOneOpFn(TCGv_i32, TCGv_i32);
+ typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
+ typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
+ typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
++typedef void NeonGenThreeOpEnvFn(TCGv_i32, TCGv_env, TCGv_i32,
++                                 TCGv_i32, TCGv_i32);
+ typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
+ typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
+ typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static const char * const regnames[] =
+     { "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
+       "r8", "r9", "r10", "r11", "r12", "r13", "r14", "pc" };
+-/* Function prototypes for gen_ functions calling Neon helpers.  */
+-typedef void NeonGenThreeOpEnvFn(TCGv_i32, TCGv_env, TCGv_i32,
+-                                 TCGv_i32, TCGv_i32);
+ /* initialize TCG globals.  */
+ void arm_translate_init(void)
+--
+.20.1

-New patch
+[PULL 15/26] target/arm: Make functions used by translate-neon global
+Make the remaining functions needed by the translate-neon code
+global.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210430132740.10391-13-peter.maydell@linaro.org
+---
+ target/arm/translate-a32.h |  8 ++++++++
+ target/arm/translate.c     | 10 ++--------
+files changed, 10 insertions(+), 8 deletions(-)
+diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a32.h
++++ b/target/arm/translate-a32.h
+@@ -XXX,XX +XXX,XX @@ void gen_set_pc_im(DisasContext *s, target_ulong val);
+ void gen_lookup_tb(DisasContext *s);
+ long vfp_reg_offset(bool dp, unsigned reg);
+ long neon_full_reg_offset(unsigned reg);
++long neon_element_offset(int reg, int element, MemOp memop);
++void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
+ static inline TCGv_i32 load_cpu_offset(int offset)
+ {
+@@ -XXX,XX +XXX,XX @@ DO_GEN_ST(32, MO_UL)
+ /* Set NZCV flags from the high 4 bits of var.  */
+ #define gen_set_nzcv(var) gen_set_cpsr(var, CPSR_NZCV)
++/* Swap low and high halfwords.  */
++static inline void gen_swap_half(TCGv_i32 dest, TCGv_i32 var)
++{
++    tcg_gen_rotri_i32(dest, var, 16);
++}
++
+ #endif
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ static void gen_smul_dual(TCGv_i32 a, TCGv_i32 b)
+ }
+ /* Byteswap each halfword.  */
+-static void gen_rev16(TCGv_i32 dest, TCGv_i32 var)
++void gen_rev16(TCGv_i32 dest, TCGv_i32 var)
+ {
+     TCGv_i32 tmp = tcg_temp_new_i32();
+     TCGv_i32 mask = tcg_const_i32(0x00ff00ff);
+@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
+     tcg_gen_ext16s_i32(dest, var);
+ }
+-/* Swap low and high halfwords.  */
+-static void gen_swap_half(TCGv_i32 dest, TCGv_i32 var)
+-{
+-    tcg_gen_rotri_i32(dest, var, 16);
+-}
+-
+ /* Dual 16-bit add.  Result placed in t0 and t1 is marked as dead.
+     tmp = (t0 ^ t1) & 0x8000;
+     t0 &= ~0x8000;
+@@ -XXX,XX +XXX,XX @@ long neon_full_reg_offset(unsigned reg)
+  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
+  * where 0 is the least significant end of the register.
+  */
+-static long neon_element_offset(int reg, int element, MemOp memop)
++long neon_element_offset(int reg, int element, MemOp memop)
+ {
+     int element_size = 1 << (memop & MO_SIZE);
+     int ofs = element * element_size;
+--
+.20.1
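As an aside on the two halfword helpers touched by this patch, their effect can be modelled on plain integers. The sketch below is not QEMU code: the function names are hypothetical, and the shift-and-mask form of the rev16 model is an assumption, since only the 0x00ff00ff mask constant is visible in the hunk above.

```c
#include <stdint.h>

/*
 * Plain-integer model of gen_swap_half(): rotating a 32-bit value
 * right by 16 exchanges its low and high halfwords.
 */
static inline uint32_t swap_half_model(uint32_t x)
{
    return (x >> 16) | (x << 16);
}

/*
 * Plain-integer model of gen_rev16(): byteswap each halfword.
 * Assumption: the usual shift-and-mask formulation using the
 * 0x00ff00ff mask; the rest of the real function body is not
 * shown in the patch.
 */
static inline uint32_t rev16_model(uint32_t x)
{
    const uint32_t mask = 0x00ff00ffu;
    uint32_t high_bytes_moved_down = (x >> 8) & mask;
    uint32_t low_bytes_moved_up = (x & mask) << 8;

    return high_bytes_moved_down | low_bytes_moved_up;
}
```

For example, swap_half_model(0x12345678) gives 0x56781234, while rev16_model(0x12345678) gives 0x34127856.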

-New patch
+[PULL 16/26] target/arm: Make translate-neon.c.inc its own compilation unit
+Switch translate-neon.c.inc from being #included into translate.c
+to being its own compilation unit.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210430132740.10391-14-peter.maydell@linaro.org
+---
+ target/arm/translate-a32.h                           |  3 +++
+ .../arm/{translate-neon.c.inc => translate-neon.c}   | 12 +++++++-----
+ target/arm/translate.c                               |  3 ---
+ target/arm/meson.build                               |  7 ++++---
+files changed, 14 insertions(+), 11 deletions(-)
+ rename target/arm/{translate-neon.c.inc => translate-neon.c} (99%)
+diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-a32.h
++++ b/target/arm/translate-a32.h
+@@ -XXX,XX +XXX,XX @@
+ bool disas_m_nocp(DisasContext *dc, uint32_t insn);
+ bool disas_vfp(DisasContext *s, uint32_t insn);
+ bool disas_vfp_uncond(DisasContext *s, uint32_t insn);
++bool disas_neon_dp(DisasContext *s, uint32_t insn);
++bool disas_neon_ls(DisasContext *s, uint32_t insn);
++bool disas_neon_shared(DisasContext *s, uint32_t insn);
+ void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
+ void arm_gen_condlabel(DisasContext *s);
+diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c
+similarity index 99%
+rename from target/arm/translate-neon.c.inc
+rename to target/arm/translate-neon.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate-neon.c.inc
++++ b/target/arm/translate-neon.c
+@@ -XXX,XX +XXX,XX @@
+  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+  */
+-/*
+- * This file is intended to be included from translate.c; it uses
+- * some macros and definitions provided by that file.
+- * It might be possible to convert it to a standalone .c file eventually.
+- */
++#include "qemu/osdep.h"
++#include "tcg/tcg-op.h"
++#include "tcg/tcg-op-gvec.h"
++#include "exec/exec-all.h"
++#include "exec/gen-icount.h"
++#include "translate.h"
++#include "translate-a32.h"
+ static inline int plus1(DisasContext *s, int x)
+ {
+diff --git a/target/arm/translate.c b/target/arm/translate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/translate.c
++++ b/target/arm/translate.c
+@@ -XXX,XX +XXX,XX @@ void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
+ #define ARM_CP_RW_BIT   (1 << 20)
+-/* Include the Neon decoder */
+-#include "translate-neon.c.inc"
+-
+ static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
+ {
+     tcg_gen_ld_i64(var, cpu_env, offsetof(CPUARMState, iwmmxt.regs[reg]));
+diff --git a/target/arm/meson.build b/target/arm/meson.build
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/meson.build
++++ b/target/arm/meson.build
+@@ -XXX,XX +XXX,XX @@
+ gen = [
+   decodetree.process('sve.decode', extra_args: '--decode=disas_sve'),
+-  decodetree.process('neon-shared.decode', extra_args: '--static-decode=disas_neon_shared'),
+-  decodetree.process('neon-dp.decode', extra_args: '--static-decode=disas_neon_dp'),
+-  decodetree.process('neon-ls.decode', extra_args: '--static-decode=disas_neon_ls'),
++  decodetree.process('neon-shared.decode', extra_args: '--decode=disas_neon_shared'),
++  decodetree.process('neon-dp.decode', extra_args: '--decode=disas_neon_dp'),
++  decodetree.process('neon-ls.decode', extra_args: '--decode=disas_neon_ls'),
+   decodetree.process('vfp.decode', extra_args: '--decode=disas_vfp'),
+   decodetree.process('vfp-uncond.decode', extra_args: '--decode=disas_vfp_uncond'),
+   decodetree.process('m-nocp.decode', extra_args: '--decode=disas_m_nocp'),
+@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
+   'tlb_helper.c',
+   'translate.c',
+   'translate-m-nocp.c',
++  'translate-neon.c',
+   'translate-vfp.c',
+   'vec_helper.c',
+   'vfp_helper.c',
+--
+.20.1

-New patch
+[PULL 17/26] target/arm: Make WFI a NOP for userspace emulators
+The WFI insn is not system-mode only, though it doesn't usually make
+a huge amount of sense for userspace code to execute it.  Currently
+if you try it in qemu-arm then the helper function will raise an
+EXCP_HLT exception, which is not covered by the switch in cpu_loop()
+and results in an abort:
+qemu: unhandled CPU exception 0x10001 - aborting
+R00=00000001 R01=408003e4 R02=408003ec R03=000102ec
+R04=00010a28 R05=00010158 R06=00087460 R07=00010158
+R08=00000000 R09=00000000 R10=00085b7c R11=408002a4
+R12=408002b8 R13=408002a0 R14=0001057c R15=000102f8
+PSR=60000010 -ZC- A usr32
+qemu:handle_cpu_signal received signal outside vCPU context @ pc=0x7fcbfa4f0a12
+Make the WFI helper function return immediately in the usermode
+emulator. This turns WFI into a NOP, which is OK because:
+ * architecturally "WFI is a NOP" is a permitted implementation
+ * aarch64 Linux kernels use the SCTLR_EL1.nTWI bit to trap
+   userspace WFI and NOP it (though aarch32 kernels currently
+   just let WFI do whatever it would do)
+We could in theory make the translate.c code special case user-mode
+emulation and NOP the insn entirely rather than making the helper
+do nothing, but because no real world code will be trying to
+execute WFI we don't care about efficiency and the helper provides
+a single place where we can make the change rather than having
+to touch multiple places in translate.c and translate-a64.c.
+Fixes: https://bugs.launchpad.net/qemu/+bug/1926759
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Message-id: 20210430162212.825-1-peter.maydell@linaro.org
+---
+ target/arm/op_helper.c | 12 ++++++++++++
+file changed, 12 insertions(+)
+diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
+index XXXXXXX..XXXXXXX 100644
+--- a/target/arm/op_helper.c
++++ b/target/arm/op_helper.c
+@@ -XXX,XX +XXX,XX @@ static inline int check_wfx_trap(CPUARMState *env, bool is_wfe)
+ void HELPER(wfi)(CPUARMState *env, uint32_t insn_len)
+ {
++#ifdef CONFIG_USER_ONLY
++    /*
++     * WFI in the user-mode emulator is technically permitted but not
++     * something any real-world code would do. AArch64 Linux kernels
+     * trap it via SCTLR_EL1.nTWI and make it an (expensive) NOP;
++     * AArch32 kernels don't trap it so it will delay a bit.
++     * For QEMU, make it NOP here, because trying to raise EXCP_HLT
++     * would trigger an abort.
++     */
++    return;
++#else
+     CPUState *cs = env_cpu(env);
+     int target_el = check_wfx_trap(env, false);
+@@ -XXX,XX +XXX,XX @@ void HELPER(wfi)(CPUARMState *env, uint32_t insn_len)
+     cs->exception_index = EXCP_HLT;
+     cs->halted = 1;
+     cpu_loop_exit(cs);
++#endif
+ }
+ void HELPER(wfe)(CPUARMState *env)
+--
+.20.1
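The "single place to make the change" argument in the commit message above can be illustrated with a small standalone model. This is a sketch only: every name prefixed with example_ or EXAMPLE_ is hypothetical, EXAMPLE_USER_ONLY stands in for CONFIG_USER_ONLY, and the two caller functions merely stand in for the A32 and A64 translators.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_USER_ONLY; set to 0 to model system mode. */
#define EXAMPLE_USER_ONLY 1

/* 0x10001 is the exception number reported in the abort message above. */
enum { EXAMPLE_EXCP_NONE = -1, EXAMPLE_EXCP_HLT = 0x10001 };

struct example_cpu {
    bool halted;
    int exception_index;
};

/* The one helper shared by both instruction sets. */
static void example_helper_wfi(struct example_cpu *cpu)
{
#if EXAMPLE_USER_ONLY
    /* User-mode model: WFI is a NOP, so leave the CPU state untouched. */
    (void)cpu;
    return;
#else
    /* System-mode model: halt the CPU until an interrupt arrives. */
    cpu->exception_index = EXAMPLE_EXCP_HLT;
    cpu->halted = true;
#endif
}

/* Both "translators" reach the same helper, so neither needs a special case. */
static void example_a32_wfi(struct example_cpu *cpu)
{
    example_helper_wfi(cpu);
}

static void example_a64_wfi(struct example_cpu *cpu)
{
    example_helper_wfi(cpu);
}
```

With EXAMPLE_USER_ONLY set, neither caller changes the CPU state, which is the behaviour the patch wants without touching either translator.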

-New patch
+[PULL 18/26] hw/sd/omap_mmc: Use device_cold_reset() instead of device_legacy_reset()
+The omap_mmc_reset() function resets its SD card via
+device_legacy_reset().  We know that the SD card does not have a qbus
+of its own, so the new device_cold_reset() function (which resets
+both the device and its child buses) is equivalent here to
+device_legacy_reset() and we can just switch to the new API.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210430222348.8514-1-peter.maydell@linaro.org
+---
+ hw/sd/omap_mmc.c | 2 +-
+file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/hw/sd/omap_mmc.c b/hw/sd/omap_mmc.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/sd/omap_mmc.c
++++ b/hw/sd/omap_mmc.c
+@@ -XXX,XX +XXX,XX @@ void omap_mmc_reset(struct omap_mmc_s *host)
+      * into any bus, and we must reset it manually. When omap_mmc is
+      * QOMified this must move into the QOM reset function.
+      */
+-    device_legacy_reset(DEVICE(host->card));
++    device_cold_reset(DEVICE(host->card));
+ }
+ static uint64_t omap_mmc_read(void *opaque, hwaddr offset,
+--
+.20.1
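The equivalence claimed in the commit message above (a device with no child bus behaves the same under either reset call) can be seen in a toy model. This is a sketch, not the QEMU reset API: the struct and both functions are hypothetical, and the traversal order inside the cold reset is an assumption.

```c
#include <stddef.h>

/* Toy device: counts how often it was reset and lists devices on its child buses. */
struct example_device {
    int reset_count;
    struct example_device **children;
    size_t num_children;
};

/* Models the legacy call: resets this device only. */
static void example_legacy_reset(struct example_device *dev)
{
    dev->reset_count++;
}

/* Models the cold reset: resets this device and everything below it. */
static void example_cold_reset(struct example_device *dev)
{
    dev->reset_count++;
    for (size_t i = 0; i < dev->num_children; i++) {
        example_cold_reset(dev->children[i]);
    }
}
```

For a leaf device such as the SD card here (num_children == 0) the loop body never runs, so both functions have the same effect; they differ only for devices that own a bus.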

-New patch
+[PULL 19/26] osdep: Make os-win32.h and os-posix.h handle 'extern "C"' themselves
+Both os-win32.h and os-posix.h include system header files. Instead
+of having osdep.h include them inside its 'extern "C"' block, make
+these headers handle that themselves, so that we don't include the
+system headers inside 'extern "C"'.
+This doesn't fix any current problems, but it's conceptually the
+right way to handle system headers.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+---
+ include/qemu/osdep.h      | 8 ++++----
+ include/sysemu/os-posix.h | 8 ++++++++
+ include/sysemu/os-win32.h | 8 ++++++++
+files changed, 20 insertions(+), 4 deletions(-)
+diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/qemu/osdep.h
++++ b/include/qemu/osdep.h
+@@ -XXX,XX +XXX,XX @@ QEMU_EXTERN_C int daemon(int, int);
+  */
+ #include "glib-compat.h"
+-#ifdef __cplusplus
+-extern "C" {
+-#endif
+-
+ #ifdef _WIN32
+ #include "sysemu/os-win32.h"
+ #endif
+@@ -XXX,XX +XXX,XX @@ extern "C" {
+ #include "sysemu/os-posix.h"
+ #endif
++#ifdef __cplusplus
++extern "C" {
++#endif
++
+ #include "qemu/typedefs.h"
+ /*
+diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/sysemu/os-posix.h
++++ b/include/sysemu/os-posix.h
+@@ -XXX,XX +XXX,XX @@
+ #include <sys/sysmacros.h>
+ #endif
++#ifdef __cplusplus
++extern "C" {
++#endif
++
+ void os_set_line_buffering(void);
+ void os_set_proc_name(const char *s);
+ void os_setup_signal_handling(void);
+@@ -XXX,XX +XXX,XX @@ static inline void qemu_funlockfile(FILE *f)
+     funlockfile(f);
+ }
++#ifdef __cplusplus
++}
++#endif
++
+ #endif
+diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/sysemu/os-win32.h
++++ b/include/sysemu/os-win32.h
+@@ -XXX,XX +XXX,XX @@
+ #include <windows.h>
+ #include <ws2tcpip.h>
++#ifdef __cplusplus
++extern "C" {
++#endif
++
+ #if defined(_WIN64)
+ /* On w64, setjmp is implemented by _setjmp which needs a second parameter.
+  * If this parameter is NULL, longjump does no stack unwinding.
+@@ -XXX,XX +XXX,XX @@ ssize_t qemu_recv_wrap(int sockfd, void *buf, size_t len, int flags);
+ ssize_t qemu_recvfrom_wrap(int sockfd, void *buf, size_t len, int flags,
+                            struct sockaddr *addr, socklen_t *addrlen);
++#ifdef __cplusplus
++}
++#endif
++
+ #endif
+--
+.20.1

-[Qemu-devel] [PULL 16/16] tcg: Optionally log FPU state in TCG -d cpu logging
+[PULL 20/26] include/qemu/bswap.h: Handle being included outside extern "C" block
-Usually the logging of the CPU state produced by -d cpu is sufficient
+Make bswap.h handle being included outside an 'extern "C"' block:
-to diagnose problems, but sometimes you want to see the state of
+all system headers are included first, then all declarations are
-the floating point registers as well. We don't want to enable that
+put inside an 'extern "C"' block.
-by default as it adds a lot of extra data to the log; instead,
-allow it to be optionally enabled via -d fpu.
+This requires a little rearrangement as currently we have an ifdef
 ladder that has some system includes and some local declarations
 or definitions, and we need to separate those out.
 We want to do this because dis-asm.h includes bswap.h, dis-asm.h
 may need to be included from C++ files, and system headers should
 not be included within 'extern "C"' blocks.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180510130024.31678-1-peter.maydell@linaro.org
 ---
- include/qemu/log.h   | 1 +
+ include/qemu/bswap.h | 26 ++++++++++++++++++++++----
- accel/tcg/cpu-exec.c | 9 ++++++---
+file changed, 22 insertions(+), 4 deletions(-)
  util/log.c           | 2 ++
 files changed, 9 insertions(+), 3 deletions(-)
-diff --git a/include/qemu/log.h b/include/qemu/log.h
+diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/qemu/log.h
+--- a/include/qemu/bswap.h
-+++ b/include/qemu/log.h
++++ b/include/qemu/bswap.h
-@@ -XXX,XX +XXX,XX @@ static inline bool qemu_log_separate(void)
+@@ -XXX,XX +XXX,XX @@
- #define CPU_LOG_PAGE       (1 << 14)
+ #ifndef BSWAP_H
- /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
+ #define BSWAP_H
- #define CPU_LOG_TB_OP_IND  (1 << 16)
-+#define CPU_LOG_TB_FPU     (1 << 17)
+-#include "fpu/softfloat-types.h"
+-
- /* Lock output for a series of related logs.  Since this is not needed
+ #ifdef CONFIG_MACHINE_BSWAP_H
-  * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
+ # include <sys/endian.h>
-diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
+ # include <machine/bswap.h>
-index XXXXXXX..XXXXXXX 100644
+@@ -XXX,XX +XXX,XX @@
---- a/accel/tcg/cpu-exec.c
+ # include <endian.h>
-+++ b/accel/tcg/cpu-exec.c
+ #elif defined(CONFIG_BYTESWAP_H)
-@@ -XXX,XX +XXX,XX @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
+ # include <byteswap.h>
-     if (qemu_loglevel_mask(CPU_LOG_TB_CPU)
++#define BSWAP_FROM_BYTESWAP
-         && qemu_log_in_addr_range(itb->pc)) {
++# else
-         qemu_log_lock();
++#define BSWAP_FROM_FALLBACKS
-+        int flags = 0;
++#endif /* ! CONFIG_MACHINE_BSWAP_H */
-+        if (qemu_loglevel_mask(CPU_LOG_TB_FPU)) {
-+            flags |= CPU_DUMP_FPU;
++#ifdef __cplusplus
-+        }
++extern "C" {
- #if defined(TARGET_I386)
++#endif
--        log_cpu_state(cpu, CPU_DUMP_CCOP);
++
--#else
++#include "fpu/softfloat-types.h"
--        log_cpu_state(cpu, 0);
++
-+        flags |= CPU_DUMP_CCOP;
++#ifdef BSWAP_FROM_BYTESWAP
- #endif
+ static inline uint16_t bswap16(uint16_t x)
-+        log_cpu_state(cpu, flags);
+ {
-         qemu_log_unlock();
+     return bswap_16(x);
-     }
+@@ -XXX,XX +XXX,XX @@ static inline uint64_t bswap64(uint64_t x)
- #endif /* DEBUG_DISAS */
+ {
-diff --git a/util/log.c b/util/log.c
+     return bswap_64(x);
-index XXXXXXX..XXXXXXX 100644
+ }
---- a/util/log.c
+-# else
-+++ b/util/log.c
++#endif
-@@ -XXX,XX +XXX,XX @@ const QEMULogItem qemu_log_items[] = {
++
-       "show trace before each executed TB (lots of logs)" },
++#ifdef BSWAP_FROM_FALLBACKS
-     { CPU_LOG_TB_CPU, "cpu",
+ static inline uint16_t bswap16(uint16_t x)
-       "show CPU registers before entering a TB (lots of logs)" },
+ {
-+    { CPU_LOG_TB_FPU, "fpu",
+     return (((x & 0x00ff) << 8) |
-+      "include FPU registers in the 'cpu' logging" },
+@@ -XXX,XX +XXX,XX @@ static inline uint64_t bswap64(uint64_t x)
-     { CPU_LOG_MMU, "mmu",
+             ((x & 0x00ff000000000000ULL) >> 40) |
-       "log MMU-related activities" },
+             ((x & 0xff00000000000000ULL) >> 56));
-     { CPU_LOG_PCALL, "pcall",
+ }
 -#endif /* ! CONFIG_MACHINE_BSWAP_H */
 +#endif
 +
 +#undef BSWAP_FROM_BYTESWAP
 +#undef BSWAP_FROM_FALLBACKS
  static inline void bswap16s(uint16_t *s)
  {
@@ -XXX,XX +XXX,XX @@ DO_STN_LDN_P(be)
  #undef le_bswaps
  #undef be_bswaps
 +#ifdef __cplusplus
 +}
 +#endif
 +
  #endif /* BSWAP_H */
 --
-.17.0
+.20.1
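The rearrangement described in this commit message follows a general header pattern: the ifdef ladder only pulls in system headers and records which implementation was chosen, and the definitions come later inside the 'extern "C"' block. A minimal sketch of that pattern follows; all EXAMPLE_ and example_ names are hypothetical, and EXAMPLE_HAVE_BYTESWAP_H stands in for a configure-time macro such as CONFIG_BYTESWAP_H.

```c
#ifndef EXAMPLE_BSWAP_H
#define EXAMPLE_BSWAP_H

/* Step 1: system headers only, all outside any extern "C" block. */
#include <stdint.h>

#if defined(EXAMPLE_HAVE_BYTESWAP_H)
# include <byteswap.h>
# define EXAMPLE_FROM_BYTESWAP      /* remember the choice for step 2 */
#else
# define EXAMPLE_FROM_FALLBACKS
#endif

/* Step 2: our own declarations and definitions, inside extern "C". */
#ifdef __cplusplus
extern "C" {
#endif

#ifdef EXAMPLE_FROM_BYTESWAP
static inline uint16_t example_bswap16(uint16_t x)
{
    return bswap_16(x);
}
#endif

#ifdef EXAMPLE_FROM_FALLBACKS
static inline uint16_t example_bswap16(uint16_t x)
{
    return (uint16_t)(((x & 0x00ff) << 8) | ((x & 0xff00) >> 8));
}
#endif

/* The marker macros are internal to this header, so drop them again. */
#undef EXAMPLE_FROM_BYTESWAP
#undef EXAMPLE_FROM_FALLBACKS

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_BSWAP_H */
```

Either branch yields the same result, for instance example_bswap16(0x1234) is 0x3412; the marker macros exist only so that the include step and the definition step can be separated.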

-[Qemu-devel] [PULL 03/16] target/arm: Fix fp_status_f16 tininess before rounding
+[PULL 21/26] include/disas/dis-asm.h: Handle being included outside 'extern "C"'
-In commit d81ce0ef2c4f105 we added an extra float_status field
+Make dis-asm.h handle being included outside an 'extern "C"' block;
-fp_status_fp16 for Arm, but forgot to initialize it correctly
+this allows us to remove the 'extern "C"' blocks that our two C++
-by setting it to float_tininess_before_rounding. This currently
+files that include it are using.
 will only cause problems for the new V8_FP16 feature, since the
 float-to-float conversion code doesn't use it yet. The effect
 would be that we failed to set the Underflow IEEE exception flag
 in all the cases where we should.
-Add the missing initialization.
+Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 ---
  include/disas/dis-asm.h | 12 ++++++++++--
  disas/arm-a64.cc        |  2 --
  disas/nanomips.cpp      |  2 --
 files changed, 10 insertions(+), 6 deletions(-)
-Fixes: d81ce0ef2c4f105
+diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
 Cc: qemu-stable@nongnu.org
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
 Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Message-id: 20180512004311.9299-16-richard.henderson@linaro.org
 ---
  target/arm/cpu.c | 2 ++
 file changed, 2 insertions(+)
 diff --git a/target/arm/cpu.c b/target/arm/cpu.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/cpu.c
+--- a/include/disas/dis-asm.h
-+++ b/target/arm/cpu.c
++++ b/include/disas/dis-asm.h
-@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
+@@ -XXX,XX +XXX,XX @@
-                               &env->vfp.fp_status);
+ #ifndef DISAS_DIS_ASM_H
-     set_float_detect_tininess(float_tininess_before_rounding,
+ #define DISAS_DIS_ASM_H
-                               &env->vfp.standard_fp_status);
-+    set_float_detect_tininess(float_tininess_before_rounding,
++#include "qemu/bswap.h"
-+                              &env->vfp.fp_status_f16);
++
- #ifndef CONFIG_USER_ONLY
++#ifdef __cplusplus
-     if (kvm_enabled()) {
++extern "C" {
-         kvm_arm_reset_vcpu(cpu);
++#endif
 +
  typedef void *PTR;
  typedef uint64_t bfd_vma;
  typedef int64_t bfd_signed_vma;
@@ -XXX,XX +XXX,XX @@ bool cap_disas_plugin(disassemble_info *info, uint64_t pc, size_t size);
  /* from libbfd */
 -#include "qemu/bswap.h"
 -
  static inline bfd_vma bfd_getl64(const bfd_byte *addr)
  {
      return ldq_le_p(addr);
@@ -XXX,XX +XXX,XX @@ static inline bfd_vma bfd_getb16(const bfd_byte *addr)
  typedef bool bfd_boolean;
 +#ifdef __cplusplus
 +}
 +#endif
 +
  #endif /* DISAS_DIS_ASM_H */
 diff --git a/disas/arm-a64.cc b/disas/arm-a64.cc
 index XXXXXXX..XXXXXXX 100644
 --- a/disas/arm-a64.cc
 +++ b/disas/arm-a64.cc
@@ -XXX,XX +XXX,XX @@
   */
  #include "qemu/osdep.h"
 -extern "C" {
  #include "disas/dis-asm.h"
 -}
  #include "vixl/a64/disasm-a64.h"
 diff --git a/disas/nanomips.cpp b/disas/nanomips.cpp
 index XXXXXXX..XXXXXXX 100644
 --- a/disas/nanomips.cpp
 +++ b/disas/nanomips.cpp
@@ -XXX,XX +XXX,XX @@
   */
  #include "qemu/osdep.h"
 -extern "C" {
  #include "disas/dis-asm.h"
 -}
  #include <cstring>
  #include <stdexcept>
 --
-.17.0
+.20.1

-[Qemu-devel] [PULL 15/16] sdcard: Correct CRC16 offset in sd_function_switch()
+[PULL 22/26] hw/arm/imx25_pdk: Fix error message for invalid RAM size
 From: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Per the Physical Layer Simplified Spec. "4.3.10.4 Switch Function Status":
+The i.MX25 PDK board has 2 banks for SDRAM, each of which can
 address up to 256 MiB, so the total RAM usable for this
 board is 512 MiB. When we ask for more we get a misleading
 error message:
-  The block length is predefined to 512 bits
+  $ qemu-system-arm -M imx25-pdk -m 513M
   qemu-system-arm: Invalid RAM size, should be 128 MiB
-and "4.10.2 SD Status":
+Update the error message to better match the reality:
-  The SD Status contains status bits that are related to the SD Memory Card
+  $ qemu-system-arm -M imx25-pdk -m 513M
-  proprietary features and may be used for future application-specific usage.
+  qemu-system-arm: RAM size more than 512 MiB is not supported
   The size of the SD Status is one data block of 512 bit. The content of this
   register is transmitted to the Host over the DAT bus along with a 16-bit CRC.
-Thus the 16-bit CRC goes at offset 64.
+Fixes: bf350daae02 ("arm/imx25_pdk: drop RAM size fixup")
 Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
-Message-id: 20180509060104.4458-3-f4bug@amsat.org
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Igor Mammedov <imammedo@redhat.com>
 Message-id: 20210407225608.1882855-1-f4bug@amsat.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- hw/sd/sd.c | 2 +-
+ hw/arm/imx25_pdk.c | 5 ++---
-file changed, 1 insertion(+), 1 deletion(-)
+file changed, 2 insertions(+), 3 deletions(-)
-diff --git a/hw/sd/sd.c b/hw/sd/sd.c
+diff --git a/hw/arm/imx25_pdk.c b/hw/arm/imx25_pdk.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/sd/sd.c
+--- a/hw/arm/imx25_pdk.c
-+++ b/hw/sd/sd.c
++++ b/hw/arm/imx25_pdk.c
-@@ -XXX,XX +XXX,XX @@ static void sd_function_switch(SDState *sd, uint32_t arg)
+@@ -XXX,XX +XXX,XX @@ static struct arm_boot_info imx25_pdk_binfo;
-         sd->data[14 + (i >> 1)] = new_func << ((i * 4) & 4);
  static void imx25_pdk_init(MachineState *machine)
  {
 -    MachineClass *mc = MACHINE_GET_CLASS(machine);
      IMX25PDK *s = g_new0(IMX25PDK, 1);
      unsigned int ram_size;
      unsigned int alias_offset;
@@ -XXX,XX +XXX,XX @@ static void imx25_pdk_init(MachineState *machine)
      /* We need to initialize our memory */
      if (machine->ram_size > (FSL_IMX25_SDRAM0_SIZE + FSL_IMX25_SDRAM1_SIZE)) {
 -        char *sz = size_to_str(mc->default_ram_size);
 -        error_report("Invalid RAM size, should be %s", sz);
 +        char *sz = size_to_str(FSL_IMX25_SDRAM0_SIZE + FSL_IMX25_SDRAM1_SIZE);
 +        error_report("RAM size more than %s is not supported", sz);
          g_free(sz);
          exit(EXIT_FAILURE);
      }
-     memset(&sd->data[17], 0, 47);
--    stw_be_p(sd->data + 65, sd_crc16(sd->data, 64));
-+    stw_be_p(sd->data + 64, sd_crc16(sd->data, 64));
- }
- static inline bool sd_wp_addr(SDState *sd, uint64_t addr)
 --
-.17.0
+.20.1
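The limit reported by the new error message follows directly from the two bank sizes given in the commit message. A minimal standalone sketch of the check, assuming 256 MiB per bank as stated there; the EXAMPLE_ macros and example_ functions are hypothetical and do not use the real FSL_IMX25_* definitions:

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_MIB         (1024ULL * 1024ULL)
#define EXAMPLE_SDRAM0_SIZE (256 * EXAMPLE_MIB)   /* bank 0, per the commit message */
#define EXAMPLE_SDRAM1_SIZE (256 * EXAMPLE_MIB)   /* bank 1, per the commit message */

/* Largest RAM size the two banks can back together: 512 MiB. */
static inline uint64_t example_max_ram_size(void)
{
    return EXAMPLE_SDRAM0_SIZE + EXAMPLE_SDRAM1_SIZE;
}

/* Mirrors the board check: anything above the combined bank size is rejected. */
static inline bool example_ram_size_supported(uint64_t ram_size)
{
    return ram_size <= example_max_ram_size();
}
```

Under these assumptions a request for 512 MiB passes and the 513 MiB request from the example above is rejected, which is why the message now names 512 MiB rather than the 128 MiB default.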

-[Qemu-devel] [PULL 02/16] fpu/softfloat: Don't set Invalid for float-to-int(MAXINT)
+[PULL 23/26] hw/misc/mps2-scc: Add "QEMU interface" comment
-In float-to-integer conversion, if the floating point input
+The MPS2 SCC device doesn't have any documentation of its properties;
-converts exactly to the largest or smallest integer that
+add a "QEMU interface" format comment describing them.
 fits in to the result type, this is not an overflow.
 In this situation we were producing the correct result value,
 but were incorrectly setting the Invalid flag.
 For example for Arm A64, "FCVTAS w0, d0" on an input of
 x41dfffffffc00000 should produce 0x7fffffff and set no flags.
-Fix the boundary case to take the right half of the if()
-statements.
-This fixes a regression from 2.11 introduced by the softfloat
-refactoring.
-Cc: qemu-stable@nongnu.org
-Fixes: ab52f973a50
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
-Message-id: 20180510140141.12120-1-peter.maydell@linaro.org
+Message-id: 20210504120912.23094-2-peter.maydell@linaro.org
 ---
- fpu/softfloat.c | 4 ++--
+ include/hw/misc/mps2-scc.h | 12 ++++++++++++
-file changed, 2 insertions(+), 2 deletions(-)
+file changed, 12 insertions(+)
-diff --git a/fpu/softfloat.c b/fpu/softfloat.c
+diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
 index XXXXXXX..XXXXXXX 100644
---- a/fpu/softfloat.c
+--- a/include/hw/misc/mps2-scc.h
-+++ b/fpu/softfloat.c
++++ b/include/hw/misc/mps2-scc.h
-@@ -XXX,XX +XXX,XX @@ static int64_t round_to_int_and_pack(FloatParts in, int rmode,
+@@ -XXX,XX +XXX,XX @@
-             r = UINT64_MAX;
+  *  (at your option) any later version.
-         }
+  */
-         if (p.sign) {
--            if (r < -(uint64_t) min) {
++/*
-+            if (r <= -(uint64_t) min) {
++ * This is a model of the Serial Communication Controller (SCC)
-                 return -r;
++ * block found in most MPS FPGA images.
-             } else {
++ *
-                 s->float_exception_flags = orig_flags | float_flag_invalid;
++ * QEMU interface:
-                 return min;
++ *  + sysbus MMIO region 0: the register bank
-             }
++ *  + QOM property "scc-cfg4": value of the read-only CFG4 register
-         } else {
++ *  + QOM property "scc-aid": value of the read-only SCC_AID register
--            if (r < max) {
++ *  + QOM property "scc-id": value of the read-only SCC_ID register
-+            if (r <= max) {
++ *  + QOM property array "oscclk": reset values of the OSCCLK registers
-                 return r;
++ *    (which are accessed via the SYS_CFG channel provided by this device)
-             } else {
++ */
-                 s->float_exception_flags = orig_flags | float_flag_invalid;
+ #ifndef MPS2_SCC_H
  #define MPS2_SCC_H
 --
-.17.0
+.20.1

-[Qemu-devel] [PULL 08/16] target/arm: Introduce and use read_fp_hreg
+[PULL 24/26] hw/misc/mps2-scc: Support using CFG0 bit 0 for remapping
-From: Richard Henderson <richard.henderson@linaro.org>
+On some boards, SCC config register CFG0 bit 0 controls whether
 parts of the board memory map are remapped. Support this with:
  * a device property scc-cfg0 so the board can specify the
    initial value of the CFG0 register
  * an outbound GPIO line which tracks bit 0 and which the board
    can wire up to provide the remapping
-Cc: qemu-stable@nongnu.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Message-id: 20180512003217.9105-6-richard.henderson@linaro.org
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210504120912.23094-3-peter.maydell@linaro.org
 ---
- target/arm/translate-a64.c | 30 ++++++++++++++----------------
+ include/hw/misc/mps2-scc.h |  9 +++++++++
-file changed, 14 insertions(+), 16 deletions(-)
+ hw/misc/mps2-scc.c         | 13 ++++++++++---
 files changed, 19 insertions(+), 3 deletions(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/include/hw/misc/mps2-scc.h
-+++ b/target/arm/translate-a64.c
++++ b/include/hw/misc/mps2-scc.h
-@@ -XXX,XX +XXX,XX @@ static TCGv_i32 read_fp_sreg(DisasContext *s, int reg)
+@@ -XXX,XX +XXX,XX @@
-     return v;
+  *  + QOM property "scc-cfg4": value of the read-only CFG4 register
   *  + QOM property "scc-aid": value of the read-only SCC_AID register
   *  + QOM property "scc-id": value of the read-only SCC_ID register
 + *  + QOM property "scc-cfg0": reset value of the CFG0 register
   *  + QOM property array "oscclk": reset values of the OSCCLK registers
   *    (which are accessed via the SYS_CFG channel provided by this device)
 + *  + named GPIO output "remap": this tracks the value of CFG0 register
 + *    bit 0. Boards where this bit controls memory remapping should
 + *    connect this GPIO line to a function performing that mapping.
 + *    Boards where bit 0 has no special function should leave the GPIO
 + *    output disconnected.
   */
  #ifndef MPS2_SCC_H
  #define MPS2_SCC_H
@@ -XXX,XX +XXX,XX @@ struct MPS2SCC {
      uint32_t num_oscclk;
      uint32_t *oscclk;
      uint32_t *oscclk_reset;
 +    uint32_t cfg0_reset;
 +
 +    qemu_irq remap;
  };
  #endif
 diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/misc/mps2-scc.c
 +++ b/hw/misc/mps2-scc.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/bitops.h"
  #include "trace.h"
  #include "hw/sysbus.h"
 +#include "hw/irq.h"
  #include "migration/vmstate.h"
  #include "hw/registerfields.h"
  #include "hw/misc/mps2-scc.h"
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
      switch (offset) {
      case A_CFG0:
          /*
 -         * TODO on some boards bit 0 controls RAM remapping;
 -         * on others bit 1 is CPU_WAIT.
 +         * On some boards bit 0 controls board-specific remapping;
 +         * we always reflect bit 0 in the 'remap' GPIO output line,
 +         * and let the board wire it up or not as it chooses.
 +         * TODO on some boards bit 1 is CPU_WAIT.
           */
          s->cfg0 = value;
 +        qemu_set_irq(s->remap, s->cfg0 & 1);
          break;
      case A_CFG1:
          s->cfg1 = value;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_reset(DeviceState *dev)
      int i;
      trace_mps2_scc_reset();
 -    s->cfg0 = 0;
 +    s->cfg0 = s->cfg0_reset;
      s->cfg1 = 0;
      s->cfg2 = 0;
      s->cfg5 = 0;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_init(Object *obj)
      memory_region_init_io(&s->iomem, obj, &mps2_scc_ops, s, "mps2-scc", 0x1000);
      sysbus_init_mmio(sbd, &s->iomem);
 +    qdev_init_gpio_out_named(DEVICE(obj), &s->remap, "remap", 1);
  }
-+static TCGv_i32 read_fp_hreg(DisasContext *s, int reg)
+ static void mps2_scc_realize(DeviceState *dev, Error **errp)
-+{
+@@ -XXX,XX +XXX,XX @@ static Property mps2_scc_properties[] = {
-+    TCGv_i32 v = tcg_temp_new_i32();
+     DEFINE_PROP_UINT32("scc-cfg4", MPS2SCC, cfg4, 0),
-+
+     DEFINE_PROP_UINT32("scc-aid", MPS2SCC, aid, 0),
-+    tcg_gen_ld16u_i32(v, cpu_env, fp_reg_offset(s, reg, MO_16));
+     DEFINE_PROP_UINT32("scc-id", MPS2SCC, id, 0),
-+    return v;
++    /* Reset value for CFG0 register */
-+}
++    DEFINE_PROP_UINT32("scc-cfg0", MPS2SCC, cfg0_reset, 0),
-+
+     /*
- /* Clear the bits above an N-bit vector, for N = (is_q ? 128 : 64).
+      * These are the initial settings for the source clocks on the board.
-  * If SVE is not enabled, then there are only 128 bits in the vector.
+      * In hardware they can be configured via a config file read by the
   */
@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
  static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
  {
      TCGv_ptr fpst = NULL;
 -    TCGv_i32 tcg_op = tcg_temp_new_i32();
 +    TCGv_i32 tcg_op = read_fp_hreg(s, rn);
      TCGv_i32 tcg_res = tcg_temp_new_i32();
 -    read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
 -
      switch (opcode) {
      case 0x0: /* FMOV */
          tcg_gen_mov_i32(tcg_res, tcg_op);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
          tcg_temp_free_i64(tcg_op2);
          tcg_temp_free_i64(tcg_res);
      } else {
 -        TCGv_i32 tcg_op1 = tcg_temp_new_i32();
 -        TCGv_i32 tcg_op2 = tcg_temp_new_i32();
 +        TCGv_i32 tcg_op1 = read_fp_hreg(s, rn);
 +        TCGv_i32 tcg_op2 = read_fp_hreg(s, rm);
          TCGv_i64 tcg_res = tcg_temp_new_i64();
 -        read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
 -        read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
 -
          gen_helper_neon_mull_s16(tcg_res, tcg_op1, tcg_op2);
          gen_helper_neon_addl_saturate_s32(tcg_res, cpu_env, tcg_res, tcg_res);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_same_fp16(DisasContext *s,
      fpst = get_fpstatus_ptr(true);
 -    tcg_op1 = tcg_temp_new_i32();
 -    tcg_op2 = tcg_temp_new_i32();
 +    tcg_op1 = read_fp_hreg(s, rn);
 +    tcg_op2 = read_fp_hreg(s, rm);
      tcg_res = tcg_temp_new_i32();
 -    read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
 -    read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
 -
      switch (fpopcode) {
      case 0x03: /* FMULX */
          gen_helper_advsimd_mulxh(tcg_res, tcg_op1, tcg_op2, fpst);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn)
      }
      if (is_scalar) {
 -        TCGv_i32 tcg_op = tcg_temp_new_i32();
 +        TCGv_i32 tcg_op = read_fp_hreg(s, rn);
          TCGv_i32 tcg_res = tcg_temp_new_i32();
 -        read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
 -
          switch (fpop) {
          case 0x1a: /* FCVTNS */
          case 0x1b: /* FCVTMS */
 --
-2.17.0
+2.20.1
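
The SCC change above can be read as a small pattern: a register write stores the value and mirrors bit 0 onto an output line, while reset restores a board-chosen value without touching the line. The following is a minimal sketch of that pattern in plain C, not the QEMU implementation. The names ToyScc, line_handler, toy_set_line, toy_scc_write_cfg0, toy_scc_reset and record_level are hypothetical, and a function pointer stands in for the real qemu_irq machinery.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a GPIO output line: a callback plus context. */
typedef void (*line_handler)(void *opaque, int level);

typedef struct {
    uint32_t cfg0;        /* current CFG0 register value */
    uint32_t cfg0_reset;  /* value CFG0 takes on reset (the "scc-cfg0" idea) */
    line_handler remap;   /* NULL models a board that leaves the line unconnected */
    void *remap_opaque;   /* context passed back to the handler */
} ToyScc;

/* Drive the output line with the given level, if anything is connected. */
static void toy_set_line(ToyScc *s, int level)
{
    if (s->remap != NULL) {
        s->remap(s->remap_opaque, level);
    }
}

/* Guest write to CFG0: store the value and reflect bit 0 on the line. */
static void toy_scc_write_cfg0(ToyScc *s, uint32_t value)
{
    s->cfg0 = value;
    toy_set_line(s, (int)(s->cfg0 & 1));
}

/* Reset: CFG0 returns to the board-selected value; the line is not driven. */
static void toy_scc_reset(ToyScc *s)
{
    s->cfg0 = s->cfg0_reset;
}

/* Example board-side handler: records the last level it was given. */
static void record_level(void *opaque, int level)
{
    *(int *)opaque = level;
}
```

A board that cares about bit 0 would install a handler such as record_level; a board where bit 0 has no special function would leave the handler NULL, which matches the "leave the GPIO output disconnected" wording in the header comment.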

-[Qemu-devel] [PULL 12/16] target/arm: Implement FCSEL for fp16
+[PULL 25/26] hw/arm/mps2-tz: Implement AN524 memory remapping via machine property
-From: Alex Bennée <alex.bennee@linaro.org>
+The AN524 FPGA image supports two memory maps, which differ in where
+the QSPI and BRAM are.  In the default map, the BRAM is at
-These were missed out from the rest of the half-precision work.
+0x0000_0000, and the QSPI at 0x2800_0000.  In the second map, they
+are the other way around.
-Cc: qemu-stable@nongnu.org
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+In hardware, the initial mapping can be selected by the user by
-Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
+writing either "REMAP: BRAM" (the default) or "REMAP: QSPI" in the
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
+board configuration file.  The board config file is acted on by the
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+"Motherboard Configuration Controller", which is an entirely separate
-Message-id: 20180512003217.9105-10-richard.henderson@linaro.org
+microcontroller on the dev board but outside the FPGA.
-[rth: Fix erroneous check vs type]
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+The guest can also dynamically change the mapping via the SCC
 CFG_REG0 register.
 Implement this functionality for QEMU, using a machine property
 "remap" with valid values "BRAM" and "QSPI" to allow the user to set
 the initial mapping, in the same way they can on the FPGA, and
 wiring up the bit from the SCC register to also switch the mapping.
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
+Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
+Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
+Message-id: 20210504120912.23094-4-peter.maydell@linaro.org
 ---
- target/arm/translate-a64.c | 31 +++++++++++++++++++++++++------
+ docs/system/arm/mps2.rst |  10 ++++
- 1 file changed, 25 insertions(+), 6 deletions(-)
+ hw/arm/mps2-tz.c         | 108 ++++++++++++++++++++++++++++++++++++++-
+ 2 files changed, 117 insertions(+), 1 deletion(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
 diff --git a/docs/system/arm/mps2.rst b/docs/system/arm/mps2.rst
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/docs/system/arm/mps2.rst
-+++ b/target/arm/translate-a64.c
++++ b/docs/system/arm/mps2.rst
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ Differences between QEMU and real hardware:
-     unsigned int mos, type, rm, cond, rn, rd;
+   flash, but only as simple ROM, so attempting to rewrite the flash
-     TCGv_i64 t_true, t_false, t_zero;
+   from the guest will fail
-     DisasCompare64 c;
+ - QEMU does not model the USB controller in MPS3 boards
-+    TCGMemOp sz;
++
++Machine-specific options
-     mos = extract32(insn, 29, 3);
++""""""""""""""""""""""""
--    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
++
-+    type = extract32(insn, 22, 2);
++The following machine-specific options are supported:
-     rm = extract32(insn, 16, 5);
++
-     cond = extract32(insn, 12, 4);
++remap
-     rn = extract32(insn, 5, 5);
++  Supported for ``mps3-an524`` only.
-     rd = extract32(insn, 0, 5);
++  Set ``BRAM``/``QSPI`` to select the initial memory mapping. The
++  default is ``BRAM``.
--    if (mos || type > 1) {
+diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
-+    if (mos) {
+index XXXXXXX..XXXXXXX 100644
-+        unallocated_encoding(s);
+--- a/hw/arm/mps2-tz.c
 +++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
  #include "hw/boards.h"
  #include "exec/address-spaces.h"
  #include "sysemu/sysemu.h"
 +#include "sysemu/reset.h"
  #include "hw/misc/unimp.h"
  #include "hw/char/cmsdk-apb-uart.h"
  #include "hw/timer/cmsdk-apb-timer.h"
@@ -XXX,XX +XXX,XX @@
  #include "hw/core/split-irq.h"
  #include "hw/qdev-clock.h"
  #include "qom/object.h"
 +#include "hw/irq.h"
  #define MPS2TZ_NUMIRQ_MAX 96
  #define MPS2TZ_RAM_MAX 5
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
      SplitIRQ cpu_irq_splitter[MPS2TZ_NUMIRQ_MAX];
      Clock *sysclk;
      Clock *s32kclk;
 +
 +    bool remap;
 +    qemu_irq remap_irq;
  };
  #define TYPE_MPS2TZ_MACHINE "mps2tz"
@@ -XXX,XX +XXX,XX @@ static const RAMInfo an505_raminfo[] = { {
      },
  };
 +/*
 + * Note that the addresses and MPC numbering here should match up
 + * with those used in remap_memory(), which can swap the BRAM and QSPI.
 + */
  static const RAMInfo an524_raminfo[] = { {
          .name = "bram",
          .base = 0x00000000,
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
      object_initialize_child(OBJECT(mms), "scc", scc, TYPE_MPS2_SCC);
      sccdev = DEVICE(scc);
 +    qdev_prop_set_uint32(sccdev, "scc-cfg0", mms->remap ? 1 : 0);
      qdev_prop_set_uint32(sccdev, "scc-cfg4", 0x2);
      qdev_prop_set_uint32(sccdev, "scc-aid", 0x00200008);
      qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
      return sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 0);
  }
 +static hwaddr boot_mem_base(MPS2TZMachineState *mms)
 +{
 +    /*
 +     * Return the canonical address of the block which will be mapped
 +     * at address 0x0 (i.e. where the vector table is).
 +     * This is usually 0, but if the AN524 alternate memory map is
 +     * enabled it will be the base address of the QSPI block.
 +     */
 +    return mms->remap ? 0x28000000 : 0;
 +}
 +
 +static void remap_memory(MPS2TZMachineState *mms, int map)
 +{
 +    /*
 +     * Remap the memory for the AN524. 'map' is the value of
 +     * SCC CFG_REG0 bit 0, i.e. 0 for the default map and 1
 +     * for the "option 1" mapping where QSPI is at address 0.
 +     *
 +     * Effectively we need to swap around the "upstream" ends of
 +     * MPC 0 and MPC 1.
 +     */
 +    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
 +    int i;
 +
 +    if (mmc->fpga_type != FPGA_AN524) {
 +        return;
 +    }
 +
-+    switch (type) {
++    memory_region_transaction_begin();
-+    case 0:
++    for (i = 0; i < 2; i++) {
-+        sz = MO_32;
++        TZMPC *mpc = &mms->mpc[i];
-+        break;
++        MemoryRegion *upstream = sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 1);
-+    case 1:
++        hwaddr addr = (i ^ map) ? 0x28000000 : 0;
-+        sz = MO_64;
++
-+        break;
++        memory_region_set_address(upstream, addr);
-+    case 3:
++    }
-+        sz = MO_16;
++    memory_region_transaction_commit();
-+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
++}
-+            break;
++
-+        }
++static void remap_irq_fn(void *opaque, int n, int level)
-+        /* fallthru */
++{
-+    default:
++    MPS2TZMachineState *mms = opaque;
-         unallocated_encoding(s);
++
-         return;
++    remap_memory(mms, level);
 +}
 +
  static MemoryRegion *make_dma(MPS2TZMachineState *mms, void *opaque,
                                const char *name, hwaddr size,
                                const int *irqs)
@@ -XXX,XX +XXX,XX @@ static uint32_t boot_ram_size(MPS2TZMachineState *mms)
      MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
      for (p = mmc->raminfo; p->name; p++) {
 -        if (p->base == 0) {
 +        if (p->base == boot_mem_base(mms)) {
              return p->size;
          }
      }
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
-         return;
-     }
+     create_non_mpc_ram(mms);
--    /* Zero extend sreg inputs to 64 bits now.  */
++    if (mmc->fpga_type == FPGA_AN524) {
-+    /* Zero extend sreg & hreg inputs to 64 bits now.  */
++        /*
-     t_true = tcg_temp_new_i64();
++         * Connect the line from the SCC so that we can remap when the
-     t_false = tcg_temp_new_i64();
++         * guest updates that register.
--    read_vec_element(s, t_true, rn, 0, type ? MO_64 : MO_32);
++         */
--    read_vec_element(s, t_false, rm, 0, type ? MO_64 : MO_32);
++        mms->remap_irq = qemu_allocate_irq(remap_irq_fn, mms, 0);
-+    read_vec_element(s, t_true, rn, 0, sz);
++        qdev_connect_gpio_out_named(DEVICE(&mms->scc), "remap", 0,
-+    read_vec_element(s, t_false, rm, 0, sz);
++                                    mms->remap_irq);
++    }
-     a64_test_cc(&c, cond);
++
-     t_zero = tcg_const_i64(0);
+     armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename,
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
+                        boot_ram_size(mms));
-     tcg_temp_free_i64(t_false);
+ }
-     a64_free_cc(&c);
+@@ -XXX,XX +XXX,XX @@ static void mps2_tz_idau_check(IDAUInterface *ii, uint32_t address,
+     *iregion = region;
--    /* Note that sregs write back zeros to the high bits,
+ }
-+    /* Note that sregs & hregs write back zeros to the high bits,
-        and we've already done the zero-extension.  */
++static char *mps2_get_remap(Object *obj, Error **errp)
-     write_fp_dreg(s, rd, t_true);
++{
-     tcg_temp_free_i64(t_true);
++    MPS2TZMachineState *mms = MPS2TZ_MACHINE(obj);
 +    const char *val = mms->remap ? "QSPI" : "BRAM";
 +    return g_strdup(val);
 +}
 +
 +static void mps2_set_remap(Object *obj, const char *value, Error **errp)
 +{
 +    MPS2TZMachineState *mms = MPS2TZ_MACHINE(obj);
 +
 +    if (!strcmp(value, "BRAM")) {
 +        mms->remap = false;
 +    } else if (!strcmp(value, "QSPI")) {
 +        mms->remap = true;
 +    } else {
 +        error_setg(errp, "Invalid remap value");
 +        error_append_hint(errp, "Valid values are BRAM and QSPI.\n");
 +    }
 +}
 +
 +static void mps2_machine_reset(MachineState *machine)
 +{
 +    MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
 +
 +    /*
 +     * Set the initial memory mapping before triggering the reset of
 +     * the rest of the system, so that the guest image loader and CPU
 +     * reset see the correct mapping.
 +     */
 +    remap_memory(mms, mms->remap);
 +    qemu_devices_reset();
 +}
 +
  static void mps2tz_class_init(ObjectClass *oc, void *data)
  {
      MachineClass *mc = MACHINE_CLASS(oc);
      IDAUInterfaceClass *iic = IDAU_INTERFACE_CLASS(oc);
      mc->init = mps2tz_common_init;
 +    mc->reset = mps2_machine_reset;
      iic->check = mps2_tz_idau_check;
  }
@@ -XXX,XX +XXX,XX @@ static void mps3tz_an524_class_init(ObjectClass *oc, void *data)
      mmc->raminfo = an524_raminfo;
      mmc->armsse_type = TYPE_SSE200;
      mps2tz_set_default_ram_info(mmc);
 +
 +    object_class_property_add_str(oc, "remap", mps2_get_remap, mps2_set_remap);
 +    object_class_property_set_description(oc, "remap",
 +                                          "Set memory mapping. Valid values "
 +                                          "are BRAM (default) and QSPI.");
  }
  static void mps3tz_an547_class_init(ObjectClass *oc, void *data)
 --
-2.17.0
+2.20.1
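
The address selection in remap_memory() and the string handling in mps2_set_remap() can be checked in isolation. The sketch below is a standalone illustration, not QEMU code: toy_upstream_addr, toy_parse_remap and TOY_QSPI_BASE are hypothetical names, and it assumes (as the default map implies) that MPC 0 fronts the BRAM and MPC 1 fronts the QSPI.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Alternate base address quoted in the patch for the QSPI block. */
#define TOY_QSPI_BASE 0x28000000u

/*
 * Address for the upstream end of MPC 'i' (assumed: 0 = BRAM, 1 = QSPI)
 * under mapping 'map' (0 = default map, 1 = QSPI at address 0).
 * XOR-ing the index with the map bit swaps the two addresses when map is 1.
 */
static uint32_t toy_upstream_addr(int i, int map)
{
    return (i ^ map) ? TOY_QSPI_BASE : 0;
}

/*
 * Parse the "remap" option value. Returns true and stores the result for
 * "BRAM" or "QSPI"; returns false and leaves *remap untouched otherwise.
 */
static bool toy_parse_remap(const char *value, bool *remap)
{
    if (strcmp(value, "BRAM") == 0) {
        *remap = false;
        return true;
    }
    if (strcmp(value, "QSPI") == 0) {
        *remap = true;
        return true;
    }
    return false;
}
```

With map 0 the two MPCs sit at 0 and 0x28000000 respectively; with map 1 they trade places, which is the "swap the upstream ends" behaviour described in the comment above remap_memory().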

-[Qemu-devel] [PULL 05/16] target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
+[PULL 26/26] hw/arm/xlnx: Fix PHY address for xilinx-zynq-a9
-From: Richard Henderson <richard.henderson@linaro.org>
+From: Guenter Roeck <linux@roeck-us.net>
-No sense in emitting code after the exception.
+Commit dfc388797cc4 ("hw/arm: xlnx: Set all boards' GEM 'phy-addr'
 property value to 23") configured the PHY address for xilinx-zynq-a9
 to 23. When trying to boot xilinx-zynq-a9 with zynq-zc702.dtb or
 zynq-zc706.dtb, this results in the following error message when
 trying to use the Ethernet interface.
-Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
+macb e000b000.ethernet eth0: Could not attach PHY (-19)
-Tested-by: Alex Bennée <alex.bennee@linaro.org>
-Message-id: 20180512003217.9105-3-richard.henderson@linaro.org
+The devicetree files for ZC702 and ZC706 configure PHY address 7. The
-Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
+documentation for the ZC702 and ZC706 evaluation boards suggests that the
 PHY address is 7, not 23. Other boards use PHY address 0, 1, 3, or 7.
 I was unable to find any documentation or devicetree file suggesting
 or using PHY address 23. The Ethernet interface starts working with
 zynq-zc702.dtb and zynq-zc706.dtb when setting the PHY address to 7,
 so let's use it.
 Cc: Bin Meng <bin.meng@windriver.com>
 Signed-off-by: Guenter Roeck <linux@roeck-us.net>
 Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
 Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
 Message-id: 20210504124140.1100346-1-linux@roeck-us.net
 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
 ---
- target/arm/translate-a64.c | 2 +-
+ hw/arm/xilinx_zynq.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
+diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
 index XXXXXXX..XXXXXXX 100644
---- a/target/arm/translate-a64.c
+--- a/hw/arm/xilinx_zynq.c
-+++ b/target/arm/translate-a64.c
++++ b/hw/arm/xilinx_zynq.c
-@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
+@@ -XXX,XX +XXX,XX @@ static void gem_init(NICInfo *nd, uint32_t base, qemu_irq irq)
-         default:
+         qemu_check_nic_model(nd, TYPE_CADENCE_GEM);
-             /* all other sf/type/rmode combinations are invalid */
+         qdev_set_nic_properties(dev, nd);
-             unallocated_encoding(s);
+     }
--            break;
+-    object_property_set_int(OBJECT(dev), "phy-addr", 23, &error_abort);
-+            return;
++    object_property_set_int(OBJECT(dev), "phy-addr", 7, &error_abort);
-         }
+     s = SYS_BUS_DEVICE(dev);
+     sysbus_realize_and_unref(s, &error_fatal);
-         if (!fp_access_check(s)) {
+     sysbus_mmio_map(s, 0, base);
 --
-2.17.0
+2.20.1

The following changes since commit ad1b4ec39caa5b3f17cbd8160283a03a3dcfe2ae:

Merge remote-tracking branch 'remotes/kraxel/tags/input-20180515-pull-request' into staging (2018-05-15 12:50:06 +0100)

are available in the Git repository at:

git://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20180515

for you to fetch changes up to ae7651804748c6b479d5ae09aeac4edb9c44f76e:

tcg: Optionally log FPU state in TCG -d cpu logging (2018-05-15 14:58:44 +0100)

----------------------------------------------------------------
target-arm queue:
 * Fix coverity nit in int_to_float code
 * Don't set Invalid for float-to-int(MAXINT)
 * Fix fp_status_f16 tininess before rounding
 * Add various missing insns from the v8.2-FP16 extension
 * Fix sqrt_f16 exception raising
 * sdcard: Correct CRC16 offset in sd_function_switch()
 * tcg: Optionally log FPU state in TCG -d cpu logging

----------------------------------------------------------------
Alex Bennée (5):
      fpu/softfloat: int_to_float ensure r fully initialised
      target/arm: Implement FCMP for fp16
      target/arm: Implement FCSEL for fp16
      target/arm: Implement FMOV (immediate) for fp16
      target/arm: Fix sqrt_f16 exception raising

Peter Maydell (3):
      fpu/softfloat: Don't set Invalid for float-to-int(MAXINT)
      target/arm: Fix fp_status_f16 tininess before rounding
      tcg: Optionally log FPU state in TCG -d cpu logging

Philippe Mathieu-Daudé (1):
      sdcard: Correct CRC16 offset in sd_function_switch()

Richard Henderson (7):
      target/arm: Implement FMOV (general) for fp16
      target/arm: Early exit after unallocated_encoding in disas_fp_int_conv
      target/arm: Implement FCVT (scalar, integer) for fp16
      target/arm: Implement FCVT (scalar, fixed-point) for fp16
      target/arm: Introduce and use read_fp_hreg
      target/arm: Implement FP data-processing (2 source) for fp16
      target/arm: Implement FP data-processing (3 source) for fp16

In float-to-integer conversion, if the floating point input
converts exactly to the largest or smallest integer that
fits into the result type, this is not an overflow.
In this situation we were producing the correct result value,
but were incorrectly setting the Invalid flag.
For example for Arm A64, "FCVTAS w0, d0" on an input of
0x41dfffffffc00000 should produce 0x7fffffff and set no flags.

Fix the boundary case to take the right half of the if()
statements.

This fixes a regression from 2.11 introduced by the softfloat
refactoring.

Cc: qemu-stable@nongnu.org
Fixes: ab52f973a50
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180510140141.12120-1-peter.maydell@linaro.org
---
 fpu/softfloat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index XXXXXXX..XXXXXXX 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -XXX,XX +XXX,XX @@ static int64_t round_to_int_and_pack(FloatParts in, int rmode,
             r = UINT64_MAX;
         }
         if (p.sign) {
-            if (r < -(uint64_t) min) {
+            if (r <= -(uint64_t) min) {
                 return -r;
             } else {
                 s->float_exception_flags = orig_flags | float_flag_invalid;
                 return min;
             }
         } else {
-            if (r < max) {
+            if (r <= max) {
                 return r;
             } else {
                 s->float_exception_flags = orig_flags | float_flag_invalid;
-- 
2.17.0
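
The boundary case fixed above can be reproduced with a small standalone range check. This is a sketch under stated assumptions, not the softfloat code: toy_pack is a hypothetical name, 'r' stands for the already-rounded magnitude, and a bool stands in for the Invalid exception flag.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy version of the final range check. 'r' is the rounded magnitude,
 * 'sign' is true for negative inputs, and [min, max] is the target range.
 * Values that fit are returned exactly with *invalid cleared; values that
 * do not fit set *invalid and saturate to min or max.
 */
static int64_t toy_pack(uint64_t r, bool sign, int64_t min, int64_t max,
                        bool *invalid)
{
    *invalid = false;
    if (sign) {
        /* -(uint64_t)min is the magnitude of the most negative value. */
        if (r <= -(uint64_t)min) {
            if (r == 0) {
                return 0;
            }
            /* Negate without overflowing int64_t when r is 2^63. */
            return -(int64_t)(r - 1) - 1;
        }
        *invalid = true;
        return min;
    }
    /* '<=' rather than '<': a magnitude equal to max still fits. */
    if (r <= (uint64_t)max) {
        return (int64_t)r;
    }
    *invalid = true;
    return max;
}
```

For the example in the commit message, a magnitude of 0x7fffffff checked against a 32-bit signed range is returned unchanged with the flag clear, whereas a strict '<' comparison would have taken the saturating branch and raised the flag.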

In commit d81ce0ef2c4f105 we added an extra float_status field
fp_status_f16 for Arm, but forgot to initialize it correctly
by setting it to float_tininess_before_rounding. This currently
will only cause problems for the new V8_FP16 feature, since the
float-to-float conversion code doesn't use it yet. The effect
would be that we failed to set the Underflow IEEE exception flag
in all the cases where we should.

Add the missing initialization.

Fixes: d81ce0ef2c4f105
Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180512004311.9299-16-richard.henderson@linaro.org
---
 target/arm/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -XXX,XX +XXX,XX @@ static void arm_cpu_reset(CPUState *s)
                               &env->vfp.fp_status);
     set_float_detect_tininess(float_tininess_before_rounding,
                               &env->vfp.standard_fp_status);
+    set_float_detect_tininess(float_tininess_before_rounding,
+                              &env->vfp.fp_status_f16);
 #ifndef CONFIG_USER_ONLY
     if (kvm_enabled()) {
         kvm_arm_reset_vcpu(cpu);
-- 
2.17.0

From: Richard Henderson <richard.henderson@linaro.org>

Adding the fp16 moves to/from general registers.

Cc: qemu-stable@nongnu.org
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
             tcg_gen_st_i64(tcg_rn, cpu_env, fp_reg_hi_offset(s, rd));
             clear_vec_high(s, true, rd);
             break;
+        case 3:
+            /* 16 bit */
+            tmp = tcg_temp_new_i64();
+            tcg_gen_ext16u_i64(tmp, tcg_rn);
+            write_fp_dreg(s, rd, tmp);
+            tcg_temp_free_i64(tmp);
+            break;
+        default:
+            g_assert_not_reached();
         }
     } else {
         TCGv_i64 tcg_rd = cpu_reg(s, rd);
@@ -XXX,XX +XXX,XX @@ static void handle_fmov(DisasContext *s, int rd, int rn, int type, bool itof)
             /* 64 bits from top half */
             tcg_gen_ld_i64(tcg_rd, cpu_env, fp_reg_hi_offset(s, rn));
             break;
+        case 3:
+            /* 16 bit */
+            tcg_gen_ld16u_i64(tcg_rd, cpu_env, fp_reg_offset(s, rn, MO_16));
+            break;
+        default:
+            g_assert_not_reached();
         }
     }
 }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
         case 0xa: /* 64 bit */
         case 0xd: /* 64 bit to top half of quad */
             break;
+        case 0x6: /* 16-bit float, 32-bit int */
+        case 0xe: /* 16-bit float, 64-bit int */
+            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+                break;
+            }
+            /* fallthru */
         default:
             /* all other sf/type/rmode combinations are invalid */
             unallocated_encoding(s);
-- 
2.17.0
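
The data movement added for the 16-bit FMOV cases amounts to a zero-extension in each direction. The sketch below models only the low 64 bits of the vector register with plain integers; toy_fmov_x_to_h and toy_fmov_h_to_x are hypothetical names, and nothing here uses TCG.

```c
#include <stdint.h>

/*
 * General register to FP register: keep the low 16 bits and zero the rest,
 * the same effect as a 16-bit unsigned extension before the register write.
 */
static uint64_t toy_fmov_x_to_h(uint64_t xn)
{
    return xn & 0xffffu;
}

/*
 * FP register to general register: take the low 16 bits of the source
 * and zero-extend them to 64 bits.
 */
static uint64_t toy_fmov_h_to_x(uint64_t vn_low64)
{
    return (uint16_t)vn_low64;
}
```

In both directions the half-precision bit pattern is carried unchanged; for instance 0x3c00 (1.0 in fp16) survives either move regardless of what the upper bits of the source held.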

From: Richard Henderson <richard.henderson@linaro.org>

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper.h        |  6 +++
 target/arm/helper.c        | 38 ++++++++++++++-
 target/arm/translate-a64.c | 96 +++++++++++++++++++++++++++++++-------
 3 files changed, 122 insertions(+), 18 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_touhd_round_to_zero, i64, f64, i32, ptr)
 DEF_HELPER_3(vfp_tould_round_to_zero, i64, f64, i32, ptr)
 DEF_HELPER_3(vfp_touhh, i32, f16, i32, ptr)
 DEF_HELPER_3(vfp_toshh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_toulh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_toslh, i32, f16, i32, ptr)
+DEF_HELPER_3(vfp_touqh, i64, f16, i32, ptr)
+DEF_HELPER_3(vfp_tosqh, i64, f16, i32, ptr)
 DEF_HELPER_3(vfp_toshs, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosls, i32, f32, i32, ptr)
 DEF_HELPER_3(vfp_tosqs, i64, f32, i32, ptr)
@@ -XXX,XX +XXX,XX @@ DEF_HELPER_3(vfp_ultod, f64, i64, i32, ptr)
 DEF_HELPER_3(vfp_uqtod, f64, i64, i32, ptr)
 DEF_HELPER_3(vfp_sltoh, f16, i32, i32, ptr)
 DEF_HELPER_3(vfp_ultoh, f16, i32, i32, ptr)
+DEF_HELPER_3(vfp_sqtoh, f16, i64, i32, ptr)
+DEF_HELPER_3(vfp_uqtoh, f16, i64, i32, ptr)
 
 DEF_HELPER_FLAGS_2(set_rmode, TCG_CALL_NO_RWG, i32, i32, ptr)
 DEF_HELPER_FLAGS_2(set_neon_rmode, TCG_CALL_NO_RWG, i32, i32, env)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ VFP_CONV_FIX_A64(uq, s, 32, 64, uint64)
 #undef VFP_CONV_FIX_A64
 
 /* Conversion to/from f16 can overflow to infinity before/after scaling.
- * Therefore we convert to f64 (which does not round), scale,
- * and then convert f64 to f16 (which may round).
+ * Therefore we convert to f64, scale, and then convert f64 to f16; or
+ * vice versa for conversion to integer.
+ *
+ * For 16- and 32-bit integers, the conversion to f64 never rounds.
+ * For 64-bit integers, any integer that would cause rounding will also
+ * overflow to f16 infinity, so there is no double rounding problem.
  */
 
 static float16 do_postscale_fp16(float64 f, int shift, float_status *fpst)
@@ -XXX,XX +XXX,XX @@ float16 HELPER(vfp_ultoh)(uint32_t x, uint32_t shift, void *fpst)
     return do_postscale_fp16(uint32_to_float64(x, fpst), shift, fpst);
 }
 
+float16 HELPER(vfp_sqtoh)(uint64_t x, uint32_t shift, void *fpst)
+{
+    return do_postscale_fp16(int64_to_float64(x, fpst), shift, fpst);
+}
+
+float16 HELPER(vfp_uqtoh)(uint64_t x, uint32_t shift, void *fpst)
+{
+    return do_postscale_fp16(uint64_to_float64(x, fpst), shift, fpst);
+}
+
 static float64 do_prescale_fp16(float16 f, int shift, float_status *fpst)
 {
     if (unlikely(float16_is_any_nan(f))) {
@@ -XXX,XX +XXX,XX @@ uint32_t HELPER(vfp_touhh)(float16 x, uint32_t shift, void *fpst)
     return float64_to_uint16(do_prescale_fp16(x, shift, fpst), fpst);
 }
 
+uint32_t HELPER(vfp_toslh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_int32(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint32_t HELPER(vfp_toulh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_uint32(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint64_t HELPER(vfp_tosqh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_int64(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
+uint64_t HELPER(vfp_touqh)(float16 x, uint32_t shift, void *fpst)
+{
+    return float64_to_uint64(do_prescale_fp16(x, shift, fpst), fpst);
+}
+
 /* Set the current fp rounding mode and return the old one.
  * The argument is a softfloat float_round_ value.
  */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                            bool itof, int rmode, int scale, int sf, int type)
 {
     bool is_signed = !(opcode & 1);
-    bool is_double = type;
     TCGv_ptr tcg_fpstatus;
-    TCGv_i32 tcg_shift;
+    TCGv_i32 tcg_shift, tcg_single;
+    TCGv_i64 tcg_double;
 
-    tcg_fpstatus = get_fpstatus_ptr(false);
+    tcg_fpstatus = get_fpstatus_ptr(type == 3);
 
     tcg_shift = tcg_const_i32(64 - scale);
 
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             tcg_int = tcg_extend;
         }
 
-        if (is_double) {
-            TCGv_i64 tcg_double = tcg_temp_new_i64();
+        switch (type) {
+        case 1: /* float64 */
+            tcg_double = tcg_temp_new_i64();
             if (is_signed) {
                 gen_helper_vfp_sqtod(tcg_double, tcg_int,
                                      tcg_shift, tcg_fpstatus);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             }
             write_fp_dreg(s, rd, tcg_double);
             tcg_temp_free_i64(tcg_double);
-        } else {
-            TCGv_i32 tcg_single = tcg_temp_new_i32();
+            break;
+
+        case 0: /* float32 */
+            tcg_single = tcg_temp_new_i32();
             if (is_signed) {
                 gen_helper_vfp_sqtos(tcg_single, tcg_int,
                                      tcg_shift, tcg_fpstatus);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
             }
             write_fp_sreg(s, rd, tcg_single);
             tcg_temp_free_i32(tcg_single);
+            break;
+
+        case 3: /* float16 */
+            tcg_single = tcg_temp_new_i32();
+            if (is_signed) {
+                gen_helper_vfp_sqtoh(tcg_single, tcg_int,
+                                     tcg_shift, tcg_fpstatus);
+            } else {
+                gen_helper_vfp_uqtoh(tcg_single, tcg_int,
+                                     tcg_shift, tcg_fpstatus);
+            }
+            write_fp_sreg(s, rd, tcg_single);
+            tcg_temp_free_i32(tcg_single);
+            break;
+
+        default:
+            g_assert_not_reached();
         }
     } else {
         TCGv_i64 tcg_int = cpu_reg(s, rd);
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
 
         gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
 
-        if (is_double) {
-            TCGv_i64 tcg_double = read_fp_dreg(s, rn);
+        switch (type) {
+        case 1: /* float64 */
+            tcg_double = read_fp_dreg(s, rn);
             if (is_signed) {
                 if (!sf) {
                     gen_helper_vfp_tosld(tcg_int, tcg_double,
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                                          tcg_shift, tcg_fpstatus);
                 }
             }
+            if (!sf) {
+                tcg_gen_ext32u_i64(tcg_int, tcg_int);
+            }
             tcg_temp_free_i64(tcg_double);
-        } else {
-            TCGv_i32 tcg_single = read_fp_sreg(s, rn);
+            break;
+
+        case 0: /* float32 */
+            tcg_single = read_fp_sreg(s, rn);
             if (sf) {
                 if (is_signed) {
                     gen_helper_vfp_tosqs(tcg_int, tcg_single,
@@ -XXX,XX +XXX,XX @@ static void handle_fpfpcvt(DisasContext *s, int rd, int rn, int opcode,
                 tcg_temp_free_i32(tcg_dest);
             }
             tcg_temp_free_i32(tcg_single);
+            break;
+
+        case 3: /* float16 */
+            tcg_single = read_fp_sreg(s, rn);
+            if (sf) {
+                if (is_signed) {
+                    gen_helper_vfp_tosqh(tcg_int, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                } else {
+                    gen_helper_vfp_touqh(tcg_int, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                }
+            } else {
+                TCGv_i32 tcg_dest = tcg_temp_new_i32();
+                if (is_signed) {
+                    gen_helper_vfp_toslh(tcg_dest, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                } else {
+                    gen_helper_vfp_toulh(tcg_dest, tcg_single,
+                                         tcg_shift, tcg_fpstatus);
+                }
+                tcg_gen_extu_i32_i64(tcg_int, tcg_dest);
+                tcg_temp_free_i32(tcg_dest);
+            }
+            tcg_temp_free_i32(tcg_single);
+            break;
+
+        default:
+            g_assert_not_reached();
         }
 
         gen_helper_set_rmode(tcg_rmode, tcg_rmode, tcg_fpstatus);
         tcg_temp_free_i32(tcg_rmode);
-
-        if (!sf) {
-            tcg_gen_ext32u_i64(tcg_int, tcg_int);
-        }
     }
 
     tcg_temp_free_ptr(tcg_fpstatus);
@@ -XXX,XX +XXX,XX @@ static void disas_fp_int_conv(DisasContext *s, uint32_t insn)
         /* actual FP conversions */
         bool itof = extract32(opcode, 1, 1);
 
-        if (type > 1 || (rmode != 0 && opcode > 1)) {
+        if (rmode != 0 && opcode > 1) {
+            unallocated_encoding(s);
+            return;
+        }
+        switch (type) {
+        case 0: /* float32 */
+        case 1: /* float64 */
+            break;
+        case 3: /* float16 */
+            if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+                break;
+            }
+            /* fallthru */
+        default:
             unallocated_encoding(s);
             return;
         }
-- 
2.17.0

From: Richard Henderson <richard.henderson@linaro.org>

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_fp_fixed_conv(DisasContext *s, uint32_t insn)
     bool sf = extract32(insn, 31, 1);
     bool itof;
 
-    if (sbit || (type > 1)
-        || (!sf && scale < 32)) {
+    if (sbit || (!sf && scale < 32)) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0: /* float32 */
+    case 1: /* float64 */
+        break;
+    case 3: /* float16 */
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
-- 
2.17.0

From: Richard Henderson <richard.henderson@linaro.org>

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 read_fp_sreg(DisasContext *s, int reg)
     return v;
 }
 
+static TCGv_i32 read_fp_hreg(DisasContext *s, int reg)
+{
+    TCGv_i32 v = tcg_temp_new_i32();
+
+    tcg_gen_ld16u_i32(v, cpu_env, fp_reg_offset(s, reg, MO_16));
+    return v;
+}
+
 /* Clear the bits above an N-bit vector, for N = (is_q ? 128 : 64).
  * If SVE is not enabled, then there are only 128 bits in the vector.
  */
@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
 static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
 {
     TCGv_ptr fpst = NULL;
-    TCGv_i32 tcg_op = tcg_temp_new_i32();
+    TCGv_i32 tcg_op = read_fp_hreg(s, rn);
     TCGv_i32 tcg_res = tcg_temp_new_i32();
 
-    read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
-
     switch (opcode) {
     case 0x0: /* FMOV */
         tcg_gen_mov_i32(tcg_res, tcg_op);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
         tcg_temp_free_i64(tcg_op2);
         tcg_temp_free_i64(tcg_res);
     } else {
-        TCGv_i32 tcg_op1 = tcg_temp_new_i32();
-        TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+        TCGv_i32 tcg_op1 = read_fp_hreg(s, rn);
+        TCGv_i32 tcg_op2 = read_fp_hreg(s, rm);
         TCGv_i64 tcg_res = tcg_temp_new_i64();
 
-        read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
-        read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
-
         gen_helper_neon_mull_s16(tcg_res, tcg_op1, tcg_op2);
         gen_helper_neon_addl_saturate_s32(tcg_res, cpu_env, tcg_res, tcg_res);
 
@@ -XXX,XX +XXX,XX @@ static void disas_simd_scalar_three_reg_same_fp16(DisasContext *s,
 
     fpst = get_fpstatus_ptr(true);
 
-    tcg_op1 = tcg_temp_new_i32();
-    tcg_op2 = tcg_temp_new_i32();
+    tcg_op1 = read_fp_hreg(s, rn);
+    tcg_op2 = read_fp_hreg(s, rm);
     tcg_res = tcg_temp_new_i32();
 
-    read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
-    read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
-
     switch (fpopcode) {
     case 0x03: /* FMULX */
         gen_helper_advsimd_mulxh(tcg_res, tcg_op1, tcg_op2, fpst);
@@ -XXX,XX +XXX,XX @@ static void disas_simd_two_reg_misc_fp16(DisasContext *s, uint32_t insn)
     }
 
     if (is_scalar) {
-        TCGv_i32 tcg_op = tcg_temp_new_i32();
+        TCGv_i32 tcg_op = read_fp_hreg(s, rn);
         TCGv_i32 tcg_res = tcg_temp_new_i32();
 
-        read_vec_element_i32(s, tcg_op, rn, 0, MO_16);
-
         switch (fpop) {
         case 0x1a: /* FCVTNS */
         case 0x1b: /* FCVTMS */
-- 
2.17.0

From: Richard Henderson <richard.henderson@linaro.org>

We missed all of the scalar fp16 binary operations.

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 65 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fp_2src_double(DisasContext *s, int opcode,
     tcg_temp_free_i64(tcg_res);
 }
 
+/* Floating-point data-processing (2 source) - half precision */
+static void handle_fp_2src_half(DisasContext *s, int opcode,
+                                int rd, int rn, int rm)
+{
+    TCGv_i32 tcg_op1;
+    TCGv_i32 tcg_op2;
+    TCGv_i32 tcg_res;
+    TCGv_ptr fpst;
+
+    tcg_res = tcg_temp_new_i32();
+    fpst = get_fpstatus_ptr(true);
+    tcg_op1 = read_fp_hreg(s, rn);
+    tcg_op2 = read_fp_hreg(s, rm);
+
+    switch (opcode) {
+    case 0x0: /* FMUL */
+        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x1: /* FDIV */
+        gen_helper_advsimd_divh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x2: /* FADD */
+        gen_helper_advsimd_addh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x3: /* FSUB */
+        gen_helper_advsimd_subh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x4: /* FMAX */
+        gen_helper_advsimd_maxh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x5: /* FMIN */
+        gen_helper_advsimd_minh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x6: /* FMAXNM */
+        gen_helper_advsimd_maxnumh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x7: /* FMINNM */
+        gen_helper_advsimd_minnumh(tcg_res, tcg_op1, tcg_op2, fpst);
+        break;
+    case 0x8: /* FNMUL */
+        gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
+        tcg_gen_xori_i32(tcg_res, tcg_res, 0x8000);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    write_fp_sreg(s, rd, tcg_res);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tcg_op1);
+    tcg_temp_free_i32(tcg_op2);
+    tcg_temp_free_i32(tcg_res);
+}
+
 /* Floating point data-processing (2 source)
  *   31  30  29 28       24 23  22  21 20  16 15    12 11 10 9    5 4    0
  * +---+---+---+-----------+------+---+------+--------+-----+------+------+
@@ -XXX,XX +XXX,XX @@ static void disas_fp_2src(DisasContext *s, uint32_t insn)
         }
         handle_fp_2src_double(s, opcode, rd, rn, rm);
         break;
+    case 3:
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            unallocated_encoding(s);
+            return;
+        }
+        if (!fp_access_check(s)) {
+            return;
+        }
+        handle_fp_2src_half(s, opcode, rd, rn, rm);
+        break;
     default:
         unallocated_encoding(s);
     }
-- 
2.17.0

From: Richard Henderson <richard.henderson@linaro.org>

We missed all of the scalar fp16 fma operations.
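
The handler added below derives all four fused multiply-add variants from one helper call by flipping operand sign bits first. A minimal sketch of that selection on raw half-precision bit patterns follows; the struct and function names are hypothetical and not part of QEMU, and the mapping of (o1, o0) to FMADD/FMSUB/FNMADD/FNMSUB is the usual A64 encoding as I understand it.

```c
#include <stdbool.h>
#include <stdint.h>

#define F16_SIGN 0x8000u   /* sign bit of an IEEE half-precision value */

/* Hypothetical container for the three raw fp16 inputs (Rn, Rm, Ra). */
typedef struct {
    uint16_t n, m, a;
} Fp16Operands;

/*
 * Apply the same negations as the patch: o1 negates the addend,
 * and o0 != o1 negates the first multiplicand. The XOR touches only
 * the sign bit, so a NaN input keeps its payload but changes sign.
 */
static Fp16Operands apply_fma_negations(Fp16Operands in, bool o0, bool o1)
{
    if (o1) {
        in.a ^= F16_SIGN;
    }
    if (o0 != o1) {
        in.n ^= F16_SIGN;
    }
    return in;
}
```

With o1:o0 = 00 nothing is negated (FMADD), 01 negates Rn (FMSUB), 10 negates both Rn and Ra (FNMADD), and 11 negates only Ra (FNMSUB).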

Cc: qemu-stable@nongnu.org
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 48 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fp_3src_double(DisasContext *s, bool o0, bool o1,
     tcg_temp_free_i64(tcg_res);
 }
 
+/* Floating-point data-processing (3 source) - half precision */
+static void handle_fp_3src_half(DisasContext *s, bool o0, bool o1,
+                                int rd, int rn, int rm, int ra)
+{
+    TCGv_i32 tcg_op1, tcg_op2, tcg_op3;
+    TCGv_i32 tcg_res = tcg_temp_new_i32();
+    TCGv_ptr fpst = get_fpstatus_ptr(true);
+
+    tcg_op1 = read_fp_hreg(s, rn);
+    tcg_op2 = read_fp_hreg(s, rm);
+    tcg_op3 = read_fp_hreg(s, ra);
+
+    /* These are fused multiply-add, and must be done as one
+     * floating point operation with no rounding between the
+     * multiplication and addition steps.
+     * NB that doing the negations here as separate steps is
+     * correct : an input NaN should come out with its sign bit
+     * flipped if it is a negated-input.
+     */
+    if (o1 == true) {
+        tcg_gen_xori_i32(tcg_op3, tcg_op3, 0x8000);
+    }
+
+    if (o0 != o1) {
+        tcg_gen_xori_i32(tcg_op1, tcg_op1, 0x8000);
+    }
+
+    gen_helper_advsimd_muladdh(tcg_res, tcg_op1, tcg_op2, tcg_op3, fpst);
+
+    write_fp_sreg(s, rd, tcg_res);
+
+    tcg_temp_free_ptr(fpst);
+    tcg_temp_free_i32(tcg_op1);
+    tcg_temp_free_i32(tcg_op2);
+    tcg_temp_free_i32(tcg_op3);
+    tcg_temp_free_i32(tcg_res);
+}
+
 /* Floating point data-processing (3 source)
  *   31  30  29 28       24 23  22  21  20  16  15  14  10 9    5 4    0
  * +---+---+---+-----------+------+----+------+----+------+------+------+
@@ -XXX,XX +XXX,XX @@ static void disas_fp_3src(DisasContext *s, uint32_t insn)
         }
         handle_fp_3src_double(s, o0, o1, rd, rn, rm, ra);
         break;
+    case 3:
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            unallocated_encoding(s);
+            return;
+        }
+        if (!fp_access_check(s)) {
+            return;
+        }
+        handle_fp_3src_half(s, o0, o1, rd, rn, rm, ra);
+        break;
     default:
         unallocated_encoding(s);
     }
-- 
2.17.0

From: Alex Bennée <alex.bennee@linaro.org>

These were missed out from the rest of the half-precision work.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180512003217.9105-9-richard.henderson@linaro.org
[rth: Diagnose lack of FP16 before fp_access_check]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/helper-a64.h    |  2 +
 target/arm/helper-a64.c    | 10 +++++
 target/arm/translate-a64.c | 88 ++++++++++++++++++++++++++++++--------
 3 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -XXX,XX +XXX,XX @@
 DEF_HELPER_FLAGS_2(udiv64, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sdiv64, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_1(rbit64, TCG_CALL_NO_RWG_SE, i64, i64)
+DEF_HELPER_3(vfp_cmph_a64, i64, f16, f16, ptr)
+DEF_HELPER_3(vfp_cmpeh_a64, i64, f16, f16, ptr)
 DEF_HELPER_3(vfp_cmps_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
 DEF_HELPER_3(vfp_cmpd_a64, i64, f64, f64, ptr)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -XXX,XX +XXX,XX @@ static inline uint32_t float_rel_to_flags(int res)
     return flags;
 }
 
+uint64_t HELPER(vfp_cmph_a64)(float16 x, float16 y, void *fp_status)
+{
+    return float_rel_to_flags(float16_compare_quiet(x, y, fp_status));
+}
+
+uint64_t HELPER(vfp_cmpeh_a64)(float16 x, float16 y, void *fp_status)
+{
+    return float_rel_to_flags(float16_compare(x, y, fp_status));
+}
+
 uint64_t HELPER(vfp_cmps_a64)(float32 x, float32 y, void *fp_status)
 {
     return float_rel_to_flags(float32_compare_quiet(x, y, fp_status));
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_data_proc_reg(DisasContext *s, uint32_t insn)
     }
 }
 
-static void handle_fp_compare(DisasContext *s, bool is_double,
+static void handle_fp_compare(DisasContext *s, int size,
                               unsigned int rn, unsigned int rm,
                               bool cmp_with_zero, bool signal_all_nans)
 {
     TCGv_i64 tcg_flags = tcg_temp_new_i64();
-    TCGv_ptr fpst = get_fpstatus_ptr(false);
+    TCGv_ptr fpst = get_fpstatus_ptr(size == MO_16);
 
-    if (is_double) {
+    if (size == MO_64) {
         TCGv_i64 tcg_vn, tcg_vm;
 
         tcg_vn = read_fp_dreg(s, rn);
@@ -XXX,XX +XXX,XX @@ static void handle_fp_compare(DisasContext *s, bool is_double,
         tcg_temp_free_i64(tcg_vn);
         tcg_temp_free_i64(tcg_vm);
     } else {
-        TCGv_i32 tcg_vn, tcg_vm;
+        TCGv_i32 tcg_vn = tcg_temp_new_i32();
+        TCGv_i32 tcg_vm = tcg_temp_new_i32();
 
-        tcg_vn = read_fp_sreg(s, rn);
+        read_vec_element_i32(s, tcg_vn, rn, 0, size);
         if (cmp_with_zero) {
-            tcg_vm = tcg_const_i32(0);
+            tcg_gen_movi_i32(tcg_vm, 0);
         } else {
-            tcg_vm = read_fp_sreg(s, rm);
+            read_vec_element_i32(s, tcg_vm, rm, 0, size);
         }
-        if (signal_all_nans) {
-            gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
-        } else {
-            gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+
+        switch (size) {
+        case MO_32:
+            if (signal_all_nans) {
+                gen_helper_vfp_cmpes_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            } else {
+                gen_helper_vfp_cmps_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            }
+            break;
+        case MO_16:
+            if (signal_all_nans) {
+                gen_helper_vfp_cmpeh_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            } else {
+                gen_helper_vfp_cmph_a64(tcg_flags, tcg_vn, tcg_vm, fpst);
+            }
+            break;
+        default:
+            g_assert_not_reached();
         }
+
         tcg_temp_free_i32(tcg_vn);
         tcg_temp_free_i32(tcg_vm);
     }
@@ -XXX,XX +XXX,XX @@ static void handle_fp_compare(DisasContext *s, bool is_double,
 static void disas_fp_compare(DisasContext *s, uint32_t insn)
 {
     unsigned int mos, type, rm, op, rn, opc, op2r;
+    int size;
 
     mos = extract32(insn, 29, 3);
-    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
+    type = extract32(insn, 22, 2);
     rm = extract32(insn, 16, 5);
     op = extract32(insn, 14, 2);
     rn = extract32(insn, 5, 5);
     opc = extract32(insn, 3, 2);
     op2r = extract32(insn, 0, 3);
 
-    if (mos || op || op2r || type > 1) {
+    if (mos || op || op2r) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0:
+        size = MO_32;
+        break;
+    case 1:
+        size = MO_64;
+        break;
+    case 3:
+        size = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_compare(DisasContext *s, uint32_t insn)
         return;
     }
 
-    handle_fp_compare(s, type, rn, rm, opc & 1, opc & 2);
+    handle_fp_compare(s, size, rn, rm, opc & 1, opc & 2);
 }
 
 /* Floating point conditional compare
@@ -XXX,XX +XXX,XX @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
     unsigned int mos, type, rm, cond, rn, op, nzcv;
     TCGv_i64 tcg_flags;
     TCGLabel *label_continue = NULL;
+    int size;
 
     mos = extract32(insn, 29, 3);
-    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
+    type = extract32(insn, 22, 2);
     rm = extract32(insn, 16, 5);
     cond = extract32(insn, 12, 4);
     rn = extract32(insn, 5, 5);
     op = extract32(insn, 4, 1);
     nzcv = extract32(insn, 0, 4);
 
-    if (mos || type > 1) {
+    if (mos) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0:
+        size = MO_32;
+        break;
+    case 1:
+        size = MO_64;
+        break;
+    case 3:
+        size = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_ccomp(DisasContext *s, uint32_t insn)
         gen_set_label(label_match);
     }
 
-    handle_fp_compare(s, type, rn, rm, false, op);
+    handle_fp_compare(s, size, rn, rm, false, op);
 
     if (cond < 0x0e) {
         gen_set_label(label_continue);
-- 
2.17.0

From: Alex Bennée <alex.bennee@linaro.org>

These were missed out from the rest of the half-precision work.

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180512003217.9105-10-richard.henderson@linaro.org
[rth: Fix erroneous check vs type]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
     unsigned int mos, type, rm, cond, rn, rd;
     TCGv_i64 t_true, t_false, t_zero;
     DisasCompare64 c;
+    TCGMemOp sz;
 
     mos = extract32(insn, 29, 3);
-    type = extract32(insn, 22, 2); /* 0 = single, 1 = double */
+    type = extract32(insn, 22, 2);
     rm = extract32(insn, 16, 5);
     cond = extract32(insn, 12, 4);
     rn = extract32(insn, 5, 5);
     rd = extract32(insn, 0, 5);
 
-    if (mos || type > 1) {
+    if (mos) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (type) {
+    case 0:
+        sz = MO_32;
+        break;
+    case 1:
+        sz = MO_64;
+        break;
+    case 3:
+        sz = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
         return;
     }
 
-    /* Zero extend sreg inputs to 64 bits now.  */
+    /* Zero extend sreg & hreg inputs to 64 bits now.  */
     t_true = tcg_temp_new_i64();
     t_false = tcg_temp_new_i64();
-    read_vec_element(s, t_true, rn, 0, type ? MO_64 : MO_32);
-    read_vec_element(s, t_false, rm, 0, type ? MO_64 : MO_32);
+    read_vec_element(s, t_true, rn, 0, sz);
+    read_vec_element(s, t_false, rm, 0, sz);
 
     a64_test_cc(&c, cond);
     t_zero = tcg_const_i64(0);
@@ -XXX,XX +XXX,XX @@ static void disas_fp_csel(DisasContext *s, uint32_t insn)
     tcg_temp_free_i64(t_false);
     a64_free_cc(&c);
 
-    /* Note that sregs write back zeros to the high bits,
+    /* Note that sregs & hregs write back zeros to the high bits,
        and we've already done the zero-extension.  */
     write_fp_dreg(s, rd, t_true);
     tcg_temp_free_i64(t_true);
-- 
2.17.0

From: Alex Bennée <alex.bennee@linaro.org>

All the hard work is already done by vfp_expand_imm, we just need to
make sure we pick up the correct size.
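
For readers unfamiliar with the immediate expansion, here is a rough standalone sketch of how an 8-bit FMOV immediate widens to each element size. It follows the architectural VFPExpandImm pseudocode as I understand it, it is not the body of QEMU's vfp_expand_imm, and the name expand_fp_imm8 is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * imm8 is laid out as a:b:c:d:e:f:g:h.
 *   sign     = a
 *   exponent = NOT(b) : Replicate(b, E - 3) : c:d
 *   fraction = e:f:g:h followed by zeros
 * where E is the exponent width of the target format.
 */
static uint64_t expand_fp_imm8(unsigned esize, uint8_t imm8)
{
    assert(esize == 16 || esize == 32 || esize == 64);

    unsigned exp_bits = (esize == 16) ? 5 : (esize == 32) ? 8 : 11;
    unsigned frac_bits = esize - exp_bits - 1;

    uint64_t sign = (imm8 >> 7) & 1;
    uint64_t b = (imm8 >> 6) & 1;
    uint64_t cd = (imm8 >> 4) & 3;
    uint64_t efgh = imm8 & 0xf;

    /* Replicate(b, E - 3): all ones or all zeros, E - 3 bits wide. */
    uint64_t rep = b ? ((1ull << (exp_bits - 3)) - 1) : 0;
    uint64_t exp = ((b ^ 1) << (exp_bits - 1)) | (rep << 2) | cd;
    uint64_t frac = efgh << (frac_bits - 4);

    return (sign << (esize - 1)) | (exp << frac_bits) | frac;
}
```

For example, imm8 = 0x70 expands to 0x3C00 for esize 16 and to 0x3F800000 for esize 32, both of which encode 1.0. This is why only the size selection needs fixing in the patch.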

Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180512003217.9105-11-richard.henderson@linaro.org
[rth: Merge unallocated_encoding check with TCGMemOp conversion.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void disas_fp_imm(DisasContext *s, uint32_t insn)
 {
     int rd = extract32(insn, 0, 5);
     int imm8 = extract32(insn, 13, 8);
-    int is_double = extract32(insn, 22, 2);
+    int type = extract32(insn, 22, 2);
     uint64_t imm;
     TCGv_i64 tcg_res;
+    TCGMemOp sz;
 
-    if (is_double > 1) {
+    switch (type) {
+    case 0:
+        sz = MO_32;
+        break;
+    case 1:
+        sz = MO_64;
+        break;
+    case 3:
+        sz = MO_16;
+        if (arm_dc_feature(s, ARM_FEATURE_V8_FP16)) {
+            break;
+        }
+        /* fallthru */
+    default:
         unallocated_encoding(s);
         return;
     }
@@ -XXX,XX +XXX,XX @@ static void disas_fp_imm(DisasContext *s, uint32_t insn)
         return;
     }
 
-    imm = vfp_expand_imm(MO_32 + is_double, imm8);
+    imm = vfp_expand_imm(sz, imm8);
 
     tcg_res = tcg_const_i64(imm);
     write_fp_dreg(s, rd, tcg_res);
-- 
2.17.0

From: Alex Bennée <alex.bennee@linaro.org>

The sqrt_f16 helper expects a pointer to the float status, so we are
meant to explicitly pass fpst, not cpu_env.

Cc: qemu-stable@nongnu.org
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180512003217.9105-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target/arm/translate-a64.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
         tcg_gen_xori_i32(tcg_res, tcg_op, 0x8000);
         break;
     case 0x3: /* FSQRT */
-        gen_helper_sqrt_f16(tcg_res, tcg_op, cpu_env);
+        fpst = get_fpstatus_ptr(true);
+        gen_helper_sqrt_f16(tcg_res, tcg_op, fpst);
         break;
     case 0x8: /* FRINTN */
     case 0x9: /* FRINTP */
-- 
2.17.0

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Per the Physical Layer Simplified Spec. "4.3.10.4 Switch Function Status":

The block length is predefined to 512 bits

and "4.10.2 SD Status":

The SD Status contains status bits that are related to the SD Memory Card
  proprietary features and may be used for future application-specific usage.
  The size of the SD Status is one data block of 512 bit. The content of this
  register is transmitted to the Host over the DAT bus along with a 16-bit CRC.

A 512-bit block is 64 bytes, thus the 16-bit CRC goes at byte offset 64.
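
As a standalone illustration of the layout (not the QEMU code), the sketch below appends a big-endian CRC directly after a 64-byte status block. It assumes the CRC16-CCITT polynomial 0x1021 with a zero initial value, which is what the SD data lines use as far as I know; crc16_ccitt and append_status_crc are hypothetical names and make no claim about how sd_crc16() is implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STATUS_BYTES 64   /* 512 bits / 8 */

/* Bitwise CRC16-CCITT, MSB first, initial value 0 (assumed variant). */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/*
 * Hypothetical helper: the data occupies bytes 0..63, so the CRC's
 * high byte lands at offset 64 and its low byte at offset 65.
 */
static void append_status_crc(uint8_t buf[STATUS_BYTES + 2])
{
    uint16_t crc = crc16_ccitt(buf, STATUS_BYTES);

    buf[STATUS_BYTES] = (uint8_t)(crc >> 8);
    buf[STATUS_BYTES + 1] = (uint8_t)(crc & 0xff);
}
```

Writing the 16-bit value at offset 65 instead would leave byte 64 untouched and spill the low byte to offset 66, one past the 66-byte block-plus-CRC.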

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180509060104.4458-3-f4bug@amsat.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/sd/sd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -XXX,XX +XXX,XX @@ static void sd_function_switch(SDState *sd, uint32_t arg)
         sd->data[14 + (i >> 1)] = new_func << ((i * 4) & 4);
     }
     memset(&sd->data[17], 0, 47);
-    stw_be_p(sd->data + 65, sd_crc16(sd->data, 64));
+    stw_be_p(sd->data + 64, sd_crc16(sd->data, 64));
 }
 
 static inline bool sd_wp_addr(SDState *sd, uint64_t addr)
-- 
2.17.0

Usually the logging of the CPU state produced by -d cpu is sufficient
to diagnose problems, but sometimes you want to see the state of
the floating point registers as well. We don't want to enable that
by default as it adds a lot of extra data to the log; instead,
allow it to be optionally enabled via -d fpu.
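
The flag handling amounts to the following standalone sketch. The names are hypothetical, and apart from the (1 << 17) value for the fpu log bit introduced by this patch, the bit values are placeholders rather than the real QEMU constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Log mask bits; only the FPU value is taken from this patch. */
#define LOG_TB_CPU (1u << 8)    /* placeholder value */
#define LOG_TB_FPU (1u << 17)

/* Dump flag bits; placeholder values for illustration only. */
#define DUMP_FPU   (1 << 0)
#define DUMP_CCOP  (1 << 1)

/* The register dump happens only when '-d cpu' is active. */
static bool cpu_dump_requested(uint32_t loglevel)
{
    return (loglevel & LOG_TB_CPU) != 0;
}

/*
 * Build the dump flags: FPU state is included only when '-d fpu' was
 * also given, and the i386 condition-code flag is target specific.
 */
static int cpu_dump_flags(uint32_t loglevel, bool target_is_i386)
{
    int flags = 0;

    if (loglevel & LOG_TB_FPU) {
        flags |= DUMP_FPU;
    }
    if (target_is_i386) {
        flags |= DUMP_CCOP;
    }
    return flags;
}
```

Note that '-d fpu' on its own produces no output in this scheme; it only modifies what '-d cpu' prints, so the expected usage is '-d cpu,fpu'.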

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180510130024.31678-1-peter.maydell@linaro.org
---
 include/qemu/log.h   | 1 +
 accel/tcg/cpu-exec.c | 9 ++++++---
 util/log.c           | 2 ++
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/qemu/log.h b/include/qemu/log.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -XXX,XX +XXX,XX @@ static inline bool qemu_log_separate(void)
 #define CPU_LOG_PAGE       (1 << 14)
 /* LOG_TRACE (1 << 15) is defined in log-for-trace.h */
 #define CPU_LOG_TB_OP_IND  (1 << 16)
+#define CPU_LOG_TB_FPU     (1 << 17)
 
 /* Lock output for a series of related logs.  Since this is not needed
  * for a single qemu_log / qemu_log_mask / qemu_log_mask_and_addr, we
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index XXXXXXX..XXXXXXX 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -XXX,XX +XXX,XX @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
     if (qemu_loglevel_mask(CPU_LOG_TB_CPU)
         && qemu_log_in_addr_range(itb->pc)) {
         qemu_log_lock();
+        int flags = 0;
+        if (qemu_loglevel_mask(CPU_LOG_TB_FPU)) {
+            flags |= CPU_DUMP_FPU;
+        }
 #if defined(TARGET_I386)
-        log_cpu_state(cpu, CPU_DUMP_CCOP);
-#else
-        log_cpu_state(cpu, 0);
+        flags |= CPU_DUMP_CCOP;
 #endif
+        log_cpu_state(cpu, flags);
         qemu_log_unlock();
     }
 #endif /* DEBUG_DISAS */
diff --git a/util/log.c b/util/log.c
index XXXXXXX..XXXXXXX 100644
--- a/util/log.c
+++ b/util/log.c
@@ -XXX,XX +XXX,XX @@ const QEMULogItem qemu_log_items[] = {
       "show trace before each executed TB (lots of logs)" },
     { CPU_LOG_TB_CPU, "cpu",
       "show CPU registers before entering a TB (lots of logs)" },
+    { CPU_LOG_TB_FPU, "fpu",
+      "include FPU registers in the 'cpu' logging" },
     { CPU_LOG_MMU, "mmu",
       "log MMU-related activities" },
     { CPU_LOG_PCALL, "pcall",
-- 
2.17.0

The following changes since commit 4cc10cae64c51e17844dc4358481c393d7bf1ed4:

Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging (2021-05-06 18:56:17 +0100)

are available in the Git repository at:

https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210510

for you to fetch changes up to 8f96812baa53005f32aece3e30b140826c20aa19:

hw/arm/xlnx: Fix PHY address for xilinx-zynq-a9 (2021-05-10 13:24:09 +0100)

----------------------------------------------------------------
target-arm queue:
 * docs: fix link in sbsa description
 * linux-user/aarch64: Enable hwcap for RND, BTI, and MTE
 * target/arm: Fix tlbbits calculation in tlbi_aa64_vae2is_write()
 * target/arm: Split neon and vfp translation to their own
   compilation units
 * target/arm: Make WFI a NOP for userspace emulators
 * hw/sd/omap_mmc: Use device_cold_reset() instead of
   device_legacy_reset()
 * include: More fixes for 'extern "C"' block use
 * hw/arm/imx25_pdk: Fix error message for invalid RAM size
 * hw/arm/mps2-tz: Implement AN524 memory remapping via machine property
 * hw/arm/xlnx: Fix PHY address for xilinx-zynq-a9

----------------------------------------------------------------
Alex Bennée (1):
      docs: fix link in sbsa description

Guenter Roeck (1):
      hw/arm/xlnx: Fix PHY address for xilinx-zynq-a9

Peter Maydell (22):
      target/arm: Fix tlbbits calculation in tlbi_aa64_vae2is_write()
      target/arm: Move constant expanders to translate.h
      target/arm: Share unallocated_encoding() and gen_exception_insn()
      target/arm: Make functions used by m-nocp global
      target/arm: Split m-nocp trans functions into their own file
      target/arm: Move gen_aa32 functions to translate-a32.h
      target/arm: Move vfp_{load, store}_reg{32, 64} to translate-vfp.c.inc
      target/arm: Make functions used by translate-vfp global
      target/arm: Make translate-vfp.c.inc its own compilation unit
      target/arm: Move vfp_reg_ptr() to translate-neon.c.inc
      target/arm: Delete unused typedef
      target/arm: Move NeonGenThreeOpEnvFn typedef to translate.h
      target/arm: Make functions used by translate-neon global
      target/arm: Make translate-neon.c.inc its own compilation unit
      target/arm: Make WFI a NOP for userspace emulators
      hw/sd/omap_mmc: Use device_cold_reset() instead of device_legacy_reset()
      osdep: Make os-win32.h and os-posix.h handle 'extern "C"' themselves
      include/qemu/bswap.h: Handle being included outside extern "C" block
      include/disas/dis-asm.h: Handle being included outside 'extern "C"'
      hw/misc/mps2-scc: Add "QEMU interface" comment
      hw/misc/mps2-scc: Support using CFG0 bit 0 for remapping
      hw/arm/mps2-tz: Implement AN524 memory remapping via machine property

Philippe Mathieu-Daudé (1):
      hw/arm/imx25_pdk: Fix error message for invalid RAM size

Richard Henderson (1):
      linux-user/aarch64: Enable hwcap for RND, BTI, and MTE

 docs/system/arm/mps2.rst                           |  10 +
 docs/system/arm/sbsa.rst                           |   2 +-
 include/disas/dis-asm.h                            |  12 +-
 include/hw/misc/mps2-scc.h                         |  21 ++
 include/qemu/bswap.h                               |  26 ++-
 include/qemu/osdep.h                               |   8 +-
 include/sysemu/os-posix.h                          |   8 +
 include/sysemu/os-win32.h                          |   8 +
 target/arm/translate-a32.h                         | 144 +++++++++++++
 target/arm/translate-a64.h                         |   2 -
 target/arm/translate.h                             |  29 +++
 hw/arm/imx25_pdk.c                                 |   5 +-
 hw/arm/mps2-tz.c                                   | 108 +++++++++-
 hw/arm/xilinx_zynq.c                               |   2 +-
 hw/misc/mps2-scc.c                                 |  13 +-
 hw/sd/omap_mmc.c                                   |   2 +-
 linux-user/elfload.c                               |  13 ++
 target/arm/helper.c                                |   2 +-
 target/arm/op_helper.c                             |  12 ++
 target/arm/translate-a64.c                         |  15 --
 target/arm/translate-m-nocp.c                      | 221 ++++++++++++++++++++
 .../arm/{translate-neon.c.inc => translate-neon.c} |  19 +-
 .../arm/{translate-vfp.c.inc => translate-vfp.c}   | 230 +++------------------
 target/arm/translate.c                             | 200 ++++--------------
 disas/arm-a64.cc                                   |   2 -
 disas/nanomips.cpp                                 |   2 -
 target/arm/meson.build                             |  15 +-
 27 files changed, 718 insertions(+), 413 deletions(-)
 create mode 100644 target/arm/translate-a32.h
 create mode 100644 target/arm/translate-m-nocp.c
 rename target/arm/{translate-neon.c.inc => translate-neon.c} (99%)
 rename target/arm/{translate-vfp.c.inc => translate-vfp.c} (94%)

From: Alex Bennée <alex.bennee@linaro.org>

A trailing _ makes all the difference to the rendered link.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20210428131316.31390-1-alex.bennee@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/system/arm/sbsa.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/sbsa.rst
+++ b/docs/system/arm/sbsa.rst
@@ -XXX,XX +XXX,XX @@ Arm Server Base System Architecture Reference board (``sbsa-ref``)
 While the `virt` board is a generic board platform that doesn't match
 any real hardware the `sbsa-ref` board intends to look like real
 hardware. The `Server Base System Architecture
-<https://developer.arm.com/documentation/den0029/latest>` defines a
+<https://developer.arm.com/documentation/den0029/latest>`_ defines a
 minimum base line of hardware support and importantly how the firmware
 reports that to any operating system. It is a static system that
 reports a very minimal DT to the firmware for non-discoverable
-- 
2.20.1

From: Richard Henderson <richard.henderson@linaro.org>

These three features are already enabled by TCG, but are missing
their hwcap bits.  Update HWCAP2 from linux v5.12.
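
For illustration only, a minimal sketch of how a guest program might test
the newly exposed bits. The bit positions are copied from the enum added
below; the HWCAP2_* names and the hwcap2_has() helper are hypothetical and
not part of this patch. A real guest would obtain the mask from
getauxval(AT_HWCAP2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the enum added by this patch. */
enum {
    HWCAP2_RNG = 1u << 16,
    HWCAP2_BTI = 1u << 17,
    HWCAP2_MTE = 1u << 18,
};

/*
 * Hypothetical helper: returns true only if every bit in 'wanted'
 * is also set in 'hwcap2'.
 */
static bool hwcap2_has(uint32_t hwcap2, uint32_t wanted)
{
    assert(wanted != 0);    /* an empty query is a caller bug */
    return (hwcap2 & wanted) == wanted;
}
```

Without the hwcap bits a guest that probes this way would conclude the
features are absent even though TCG implements them.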

Cc: qemu-stable@nongnu.org (for 6.0.1)
Buglink: https://bugs.launchpad.net/bugs/1926044
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210427214108.88503-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 linux-user/elfload.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index XXXXXXX..XXXXXXX 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -XXX,XX +XXX,XX @@ enum {
     ARM_HWCAP2_A64_SVESM4       = 1 << 6,
     ARM_HWCAP2_A64_FLAGM2       = 1 << 7,
     ARM_HWCAP2_A64_FRINT        = 1 << 8,
+    ARM_HWCAP2_A64_SVEI8MM      = 1 << 9,
+    ARM_HWCAP2_A64_SVEF32MM     = 1 << 10,
+    ARM_HWCAP2_A64_SVEF64MM     = 1 << 11,
+    ARM_HWCAP2_A64_SVEBF16      = 1 << 12,
+    ARM_HWCAP2_A64_I8MM         = 1 << 13,
+    ARM_HWCAP2_A64_BF16         = 1 << 14,
+    ARM_HWCAP2_A64_DGH          = 1 << 15,
+    ARM_HWCAP2_A64_RNG          = 1 << 16,
+    ARM_HWCAP2_A64_BTI          = 1 << 17,
+    ARM_HWCAP2_A64_MTE          = 1 << 18,
 };
 
 #define ELF_HWCAP   get_elf_hwcap()
@@ -XXX,XX +XXX,XX @@ static uint32_t get_elf_hwcap2(void)
     GET_FEATURE_ID(aa64_dcpodp, ARM_HWCAP2_A64_DCPODP);
     GET_FEATURE_ID(aa64_condm_5, ARM_HWCAP2_A64_FLAGM2);
     GET_FEATURE_ID(aa64_frint, ARM_HWCAP2_A64_FRINT);
+    GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
+    GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
+    GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
 
     return hwcaps;
 }
-- 
2.20.1

In tlbi_aa64_vae2is_write() the calculation
  bits = tlbbits_for_regime(env, secure ? ARMMMUIdx_E2 : ARMMMUIdx_SE2,
                            pageaddr)

has the two arms of the ?: expression reversed, so the bits were
computed for the opposite security state from the one used for the
flush mask. Swap the arms.
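
As a rough sketch of the invariant involved: the regime used for the bits
calculation and the regime used for the flush mask should both follow the
same 'secure' selection. The enum values and helper names here are
hypothetical stand-ins, not QEMU's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two EL2 regimes; values are arbitrary. */
typedef enum { IDX_E2 = 0, IDX_SE2 = 1 } MMUIdx;

/* Mask bit corresponding to a regime index. */
static unsigned idx_bit(MMUIdx idx)
{
    assert(idx == IDX_E2 || idx == IDX_SE2);
    return 1u << idx;
}

/* Secure state selects the secure regime, otherwise the non-secure one. */
static MMUIdx el2_regime(bool secure)
{
    return secure ? IDX_SE2 : IDX_E2;
}

/* True when the mask and the regime agree for this security state. */
static bool mask_matches_regime(bool secure)
{
    unsigned mask = secure ? idx_bit(IDX_SE2) : idx_bit(IDX_E2);
    return mask == idx_bit(el2_regime(secure));
}
```

Deriving both values from one selection helper, as el2_regime() does in
this sketch, is one way to keep the two arms from drifting apart.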

Fixes: b6ad6062f1e5
Reported-by: Rebecca Cran <rebecca@nuviainc.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Rebecca Cran <rebecca@nuviainc.com>
Message-id: 20210420123106.10861-1-peter.maydell@linaro.org
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -XXX,XX +XXX,XX @@ static void tlbi_aa64_vae2is_write(CPUARMState *env, const ARMCPRegInfo *ri,
     uint64_t pageaddr = sextract64(value << 12, 0, 56);
     bool secure = arm_is_secure_below_el3(env);
     int mask = secure ? ARMMMUIdxBit_SE2 : ARMMMUIdxBit_E2;
-    int bits = tlbbits_for_regime(env, secure ? ARMMMUIdx_E2 : ARMMMUIdx_SE2,
+    int bits = tlbbits_for_regime(env, secure ? ARMMMUIdx_SE2 : ARMMMUIdx_E2,
                                   pageaddr);
 
     tlb_flush_page_bits_by_mmuidx_all_cpus_synced(cs, pageaddr, mask, bits);
-- 
2.20.1

Some of the constant expanders defined in translate.c are generically
useful and will be used by the separate C files for VFP and Neon once
they are created; move the expander definitions to translate.h.
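
For readers unfamiliar with the expanders: each one takes a raw field
value pulled out of the instruction and returns the value the trans
function should see. A minimal sketch follows, assuming a made-up 8-bit
immediate field; the DisasContext stand-in and decode_imm8_x4() are
hypothetical, and only times_4() mirrors the code being moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; the real DisasContext has many more fields. */
typedef struct DisasContext { int unused; } DisasContext;

/* Same shape as the expander moved by this patch. */
static inline int times_4(DisasContext *s, int x)
{
    (void)s;    /* the context is part of the signature but unused here */
    return x * 4;
}

/*
 * Hypothetical decode step: extract the low 8 bits of the insn
 * and scale them to a byte offset with the expander.
 */
static int decode_imm8_x4(DisasContext *s, uint32_t insn)
{
    assert(s != NULL);
    int field = insn & 0xff;
    return times_4(s, field);
}
```

Keeping the expanders as static inline functions in the header lets each
new compilation unit use them without adding exported symbols.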

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ extern TCGv_i32 cpu_NF, cpu_ZF, cpu_CF, cpu_VF;
 extern TCGv_i64 cpu_exclusive_addr;
 extern TCGv_i64 cpu_exclusive_val;
 
+/*
+ * Constant expanders for the decoders.
+ */
+
+static inline int negate(DisasContext *s, int x)
+{
+    return -x;
+}
+
+static inline int plus_2(DisasContext *s, int x)
+{
+    return x + 2;
+}
+
+static inline int times_2(DisasContext *s, int x)
+{
+    return x * 2;
+}
+
+static inline int times_4(DisasContext *s, int x)
+{
+    return x * 4;
+}
+
 static inline int arm_dc_feature(DisasContext *dc, int feature)
 {
     return (dc->features & (1ULL << feature)) != 0;
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void arm_gen_condlabel(DisasContext *s)
     }
 }
 
-/*
- * Constant expanders for the decoders.
- */
-
-static int negate(DisasContext *s, int x)
-{
-    return -x;
-}
-
-static int plus_2(DisasContext *s, int x)
-{
-    return x + 2;
-}
-
-static int times_2(DisasContext *s, int x)
-{
-    return x * 2;
-}
-
-static int times_4(DisasContext *s, int x)
-{
-    return x * 4;
-}
-
 /* Flags for the disas_set_da_iss info argument:
  * lower bits hold the Rt register number, higher bits are flags.
  */
-- 
2.20.1

The unallocated_encoding() function is the same in both
translate-a64.c and translate.c; make the translate.c function global
and drop the translate-a64.c version.  To do this we need to also
share gen_exception_insn(), which currently exists in two slightly
different versions for A32 and A64: merge those into a single
function that can work for both.

This will be useful for splitting up translate.c, which will require
unallocated_encoding() to no longer be file-local.  It's also
hopefully less confusing to have only one version of the function
rather than two.
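
The shape of the merge can be sketched as below: one entry point that
branches on the execution state and widens the PC argument to 64 bits so
both callers fit. The Ctx type and all function names are hypothetical
stand-ins that record state instead of emitting TCG ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the translation context. */
typedef struct {
    bool aarch64;          /* true when translating A64 code */
    uint64_t pc;           /* PC value that would be written back */
    bool condexec_saved;   /* A32 only: IT/condexec state synced */
    bool noreturn;         /* translation of this block ends here */
} Ctx;

static void set_pc_a64(Ctx *s, uint64_t pc) { s->pc = pc; }
static void set_pc_a32(Ctx *s, uint32_t pc) { s->pc = pc; }
static void save_condexec(Ctx *s) { s->condexec_saved = true; }

/* Single function serving both states, branching on s->aarch64. */
static void raise_insn_exception(Ctx *s, uint64_t pc)
{
    if (s->aarch64) {
        set_pc_a64(s, pc);
    } else {
        assert(pc <= UINT32_MAX);   /* A32 PCs fit in 32 bits */
        save_condexec(s);
        set_pc_a32(s, (uint32_t)pc);
    }
    s->noreturn = true;
}
```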

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-3-peter.maydell@linaro.org
---
 target/arm/translate-a64.h |  2 --
 target/arm/translate.h     |  3 +++
 target/arm/translate-a64.c | 15 ---------------
 target/arm/translate.c     | 14 +++++++++-----
 4 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -XXX,XX +XXX,XX @@
 #ifndef TARGET_ARM_TRANSLATE_A64_H
 #define TARGET_ARM_TRANSLATE_A64_H
 
-void unallocated_encoding(DisasContext *s);
-
 #define unsupported_encoding(s, insn)                                    \
     do {                                                                 \
         qemu_log_mask(LOG_UNIMP,                                         \
diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ void arm_free_cc(DisasCompare *cmp);
 void arm_jump_cc(DisasCompare *cmp, TCGLabel *label);
 void arm_gen_test_cc(int cc, TCGLabel *label);
 MemOp pow2_align(unsigned i);
+void unallocated_encoding(DisasContext *s);
+void gen_exception_insn(DisasContext *s, uint64_t pc, int excp,
+                        uint32_t syn, uint32_t target_el);
 
 /* Return state of Alternate Half-precision flag, caller frees result */
 static inline TCGv_i32 get_ahp_flag(void)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -XXX,XX +XXX,XX @@ static void gen_exception_internal_insn(DisasContext *s, uint64_t pc, int excp)
     s->base.is_jmp = DISAS_NORETURN;
 }
 
-static void gen_exception_insn(DisasContext *s, uint64_t pc, int excp,
-                               uint32_t syndrome, uint32_t target_el)
-{
-    gen_a64_set_pc_im(pc);
-    gen_exception(excp, syndrome, target_el);
-    s->base.is_jmp = DISAS_NORETURN;
-}
-
 static void gen_exception_bkpt_insn(DisasContext *s, uint32_t syndrome)
 {
     TCGv_i32 tcg_syn;
@@ -XXX,XX +XXX,XX @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
     }
 }
 
-void unallocated_encoding(DisasContext *s)
-{
-    /* Unallocated and reserved encodings are uncategorized */
-    gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
-                       default_exception_el(s));
-}
-
 static void init_tmp_a64_array(DisasContext *s)
 {
 #ifdef CONFIG_DEBUG_TCG
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_exception_internal_insn(DisasContext *s, uint32_t pc, int excp)
     s->base.is_jmp = DISAS_NORETURN;
 }
 
-static void gen_exception_insn(DisasContext *s, uint32_t pc, int excp,
-                               int syn, uint32_t target_el)
+void gen_exception_insn(DisasContext *s, uint64_t pc, int excp,
+                        uint32_t syn, uint32_t target_el)
 {
-    gen_set_condexec(s);
-    gen_set_pc_im(s, pc);
+    if (s->aarch64) {
+        gen_a64_set_pc_im(pc);
+    } else {
+        gen_set_condexec(s);
+        gen_set_pc_im(s, pc);
+    }
     gen_exception(excp, syn, target_el);
     s->base.is_jmp = DISAS_NORETURN;
 }
@@ -XXX,XX +XXX,XX @@ static void gen_exception_bkpt_insn(DisasContext *s, uint32_t syn)
     s->base.is_jmp = DISAS_NORETURN;
 }
 
-static void unallocated_encoding(DisasContext *s)
+void unallocated_encoding(DisasContext *s)
 {
     /* Unallocated and reserved encodings are uncategorized */
     gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syn_uncategorized(),
-- 
2.20.1

We want to split out the .c.inc files which are currently included
into translate.c so they are separate compilation units.  To do this
we need to make some functions which are currently file-local to
translate.c have global scope; create a translate-a32.h paralleling
the existing translate-a64.h as a place for these declarations to
live, so that code moved into the new compilation units can call
them.

The functions made global here are those required by the
m-nocp.decode functions, except that I have converted the whole
family of {read,write}_neon_element* and also both the load_cpu and
store_cpu functions for consistency, even though m-nocp only wants a
few functions from each.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-4-peter.maydell@linaro.org
---
 target/arm/translate-a32.h     | 57 ++++++++++++++++++++++++++++++++++
 target/arm/translate.c         | 39 +++++------------------
 target/arm/translate-vfp.c.inc |  2 +-
 3 files changed, 65 insertions(+), 33 deletions(-)
 create mode 100644 target/arm/translate-a32.h

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@
+/*
+ *  AArch32 translation, common definitions.
+ *
+ * Copyright (c) 2021 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef TARGET_ARM_TRANSLATE_A64_H
+#define TARGET_ARM_TRANSLATE_A64_H
+
+void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
+void arm_gen_condlabel(DisasContext *s);
+bool vfp_access_check(DisasContext *s);
+void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop);
+void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop);
+void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop);
+void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop);
+
+static inline TCGv_i32 load_cpu_offset(int offset)
+{
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    tcg_gen_ld_i32(tmp, cpu_env, offset);
+    return tmp;
+}
+
+#define load_cpu_field(name) load_cpu_offset(offsetof(CPUARMState, name))
+
+static inline void store_cpu_offset(TCGv_i32 var, int offset)
+{
+    tcg_gen_st_i32(var, cpu_env, offset);
+    tcg_temp_free_i32(var);
+}
+
+#define store_cpu_field(var, name) \
+    store_cpu_offset(var, offsetof(CPUARMState, name))
+
+/* Create a new temporary and set it to the value of a CPU register.  */
+static inline TCGv_i32 load_reg(DisasContext *s, int reg)
+{
+    TCGv_i32 tmp = tcg_temp_new_i32();
+    load_reg_var(s, tmp, reg);
+    return tmp;
+}
+
+#endif
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@
 #define ENABLE_ARCH_8     arm_dc_feature(s, ARM_FEATURE_V8)
 
 #include "translate.h"
+#include "translate-a32.h"
 
 #if defined(CONFIG_USER_ONLY)
 #define IS_USER(s) 1
@@ -XXX,XX +XXX,XX @@ void arm_translate_init(void)
 }
 
 /* Generate a label used for skipping this instruction */
-static void arm_gen_condlabel(DisasContext *s)
+void arm_gen_condlabel(DisasContext *s)
 {
     if (!s->condjmp) {
         s->condlabel = gen_new_label();
@@ -XXX,XX +XXX,XX @@ static inline int get_a32_user_mem_index(DisasContext *s)
     }
 }
 
-static inline TCGv_i32 load_cpu_offset(int offset)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    tcg_gen_ld_i32(tmp, cpu_env, offset);
-    return tmp;
-}
-
-#define load_cpu_field(name) load_cpu_offset(offsetof(CPUARMState, name))
-
-static inline void store_cpu_offset(TCGv_i32 var, int offset)
-{
-    tcg_gen_st_i32(var, cpu_env, offset);
-    tcg_temp_free_i32(var);
-}
-
-#define store_cpu_field(var, name) \
-    store_cpu_offset(var, offsetof(CPUARMState, name))
-
 /* The architectural value of PC.  */
 static uint32_t read_pc(DisasContext *s)
 {
@@ -XXX,XX +XXX,XX @@ static uint32_t read_pc(DisasContext *s)
 }
 
 /* Set a variable to the value of a CPU register.  */
-static void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
+void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
 {
     if (reg == 15) {
         tcg_gen_movi_i32(var, read_pc(s));
@@ -XXX,XX +XXX,XX @@ static void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
     }
 }
 
-/* Create a new temporary and set it to the value of a CPU register.  */
-static inline TCGv_i32 load_reg(DisasContext *s, int reg)
-{
-    TCGv_i32 tmp = tcg_temp_new_i32();
-    load_reg_var(s, tmp, reg);
-    return tmp;
-}
-
 /*
  * Create a new temp, REG + OFS, except PC is ALIGN(PC, 4).
  * This is used for load/store for which use of PC implies (literal),
@@ -XXX,XX +XXX,XX @@ static inline void vfp_store_reg32(TCGv_i32 var, int reg)
     tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
 }
 
-static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
+void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
 {
     long off = neon_element_offset(reg, ele, memop);
 
@@ -XXX,XX +XXX,XX @@ static void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
     }
 }
 
-static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
+void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
 {
     long off = neon_element_offset(reg, ele, memop);
 
@@ -XXX,XX +XXX,XX @@ static void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop)
     }
 }
 
-static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
+void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
 {
     long off = neon_element_offset(reg, ele, memop);
 
@@ -XXX,XX +XXX,XX @@ static void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop)
     }
 }
 
-static void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
+void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
 {
     long off = neon_element_offset(reg, ele, memop);
 
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool full_vfp_access_check(DisasContext *s, bool ignore_vfp_enabled)
  * The most usual kind of VFP access check, for everything except
  * FMXR/FMRX to the always-available special registers.
  */
-static bool vfp_access_check(DisasContext *s)
+bool vfp_access_check(DisasContext *s)
 {
     return full_vfp_access_check(s, false);
 }
-- 
2.20.1

Currently the trans functions for m-nocp.decode all live in
translate-vfp.c.inc; move them out into their own translation unit,
translate-m-nocp.c.

The trans_* functions here are pure code motion with no changes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-5-peter.maydell@linaro.org
---
 target/arm/translate-a32.h     |   3 +
 target/arm/translate-m-nocp.c  | 221 +++++++++++++++++++++++++++++++++
 target/arm/translate.c         |   1 -
 target/arm/translate-vfp.c.inc | 196 -----------------------------
 target/arm/meson.build         |   3 +-
 5 files changed, 226 insertions(+), 198 deletions(-)
 create mode 100644 target/arm/translate-m-nocp.c

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@
 #ifndef TARGET_ARM_TRANSLATE_A64_H
 #define TARGET_ARM_TRANSLATE_A64_H
 
+/* Prototypes for autogenerated disassembler functions */
+bool disas_m_nocp(DisasContext *dc, uint32_t insn);
+
 void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
 void arm_gen_condlabel(DisasContext *s);
 bool vfp_access_check(DisasContext *s);
diff --git a/target/arm/translate-m-nocp.c b/target/arm/translate-m-nocp.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/target/arm/translate-m-nocp.c
@@ -XXX,XX +XXX,XX @@
+/*
+ *  ARM translation: M-profile NOCP special-case instructions
+ *
+ *  Copyright (c) 2020 Linaro, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "tcg/tcg-op.h"
+#include "translate.h"
+#include "translate-a32.h"
+
+#include "decode-m-nocp.c.inc"
+
+/*
+ * Decode VLLDM and VLSTM are nonstandard because:
+ *  * if there is no FPU then these insns must NOP in
+ *    Secure state and UNDEF in Nonsecure state
+ *  * if there is an FPU then these insns do not have
+ *    the usual behaviour that vfp_access_check() provides of
+ *    being controlled by CPACR/NSACR enable bits or the
+ *    lazy-stacking logic.
+ */
+static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
+{
+    TCGv_i32 fptr;
+
+    if (!arm_dc_feature(s, ARM_FEATURE_M) ||
+        !arm_dc_feature(s, ARM_FEATURE_V8)) {
+        return false;
+    }
+
+    if (a->op) {
+        /*
+         * T2 encoding ({D0-D31} reglist): v8.1M and up. We choose not
+         * to take the IMPDEF option to make memory accesses to the stack
+         * slots that correspond to the D16-D31 registers (discarding
+         * read data and writing UNKNOWN values), so for us the T2
+         * encoding behaves identically to the T1 encoding.
+         */
+        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+            return false;
+        }
+    } else {
+        /*
+         * T1 encoding ({D0-D15} reglist); undef if we have 32 Dregs.
+         * This is currently architecturally impossible, but we add the
+         * check to stay in line with the pseudocode. Note that we must
+         * emit code for the UNDEF so it takes precedence over the NOCP.
+         */
+        if (dc_isar_feature(aa32_simd_r32, s)) {
+            unallocated_encoding(s);
+            return true;
+        }
+    }
+
+    /*
+     * If not secure, UNDEF. We must emit code for this
+     * rather than returning false so that this takes
+     * precedence over the m-nocp.decode NOCP fallback.
+     */
+    if (!s->v8m_secure) {
+        unallocated_encoding(s);
+        return true;
+    }
+    /* If no fpu, NOP. */
+    if (!dc_isar_feature(aa32_vfp, s)) {
+        return true;
+    }
+
+    fptr = load_reg(s, a->rn);
+    if (a->l) {
+        gen_helper_v7m_vlldm(cpu_env, fptr);
+    } else {
+        gen_helper_v7m_vlstm(cpu_env, fptr);
+    }
+    tcg_temp_free_i32(fptr);
+
+    /* End the TB, because we have updated FP control bits */
+    s->base.is_jmp = DISAS_UPDATE_EXIT;
+    return true;
+}
+
+static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
+{
+    int btmreg, topreg;
+    TCGv_i64 zero;
+    TCGv_i32 aspen, sfpa;
+
+    if (!dc_isar_feature(aa32_m_sec_state, s)) {
+        /* Before v8.1M, fall through in decode to NOCP check */
+        return false;
+    }
+
+    /* Explicitly UNDEF because this takes precedence over NOCP */
+    if (!arm_dc_feature(s, ARM_FEATURE_M_MAIN) || !s->v8m_secure) {
+        unallocated_encoding(s);
+        return true;
+    }
+
+    if (!dc_isar_feature(aa32_vfp_simd, s)) {
+        /* NOP if we have neither FP nor MVE */
+        return true;
+    }
+
+    /*
+     * If FPCCR.ASPEN != 0 && CONTROL_S.SFPA == 0 then there is no
+     * active floating point context so we must NOP (without doing
+     * any lazy state preservation or the NOCP check).
+     */
+    aspen = load_cpu_field(v7m.fpccr[M_REG_S]);
+    sfpa = load_cpu_field(v7m.control[M_REG_S]);
+    tcg_gen_andi_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
+    tcg_gen_xori_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
+    tcg_gen_andi_i32(sfpa, sfpa, R_V7M_CONTROL_SFPA_MASK);
+    tcg_gen_or_i32(sfpa, sfpa, aspen);
+    arm_gen_condlabel(s);
+    tcg_gen_brcondi_i32(TCG_COND_EQ, sfpa, 0, s->condlabel);
+
+    if (s->fp_excp_el != 0) {
+        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+                           syn_uncategorized(), s->fp_excp_el);
+        return true;
+    }
+
+    topreg = a->vd + a->imm - 1;
+    btmreg = a->vd;
+
+    /* Convert to Sreg numbers if the insn specified in Dregs */
+    if (a->size == 3) {
+        topreg = topreg * 2 + 1;
+        btmreg *= 2;
+    }
+
+    if (topreg > 63 || (topreg > 31 && !(topreg & 1))) {
+        /* UNPREDICTABLE: we choose to undef */
+        unallocated_encoding(s);
+        return true;
+    }
+
+    /* Silently ignore requests to clear D16-D31 if they don't exist */
+    if (topreg > 31 && !dc_isar_feature(aa32_simd_r32, s)) {
+        topreg = 31;
+    }
+
+    if (!vfp_access_check(s)) {
+        return true;
+    }
+
+    /* Zero the Sregs from btmreg to topreg inclusive. */
+    zero = tcg_const_i64(0);
+    if (btmreg & 1) {
+        write_neon_element64(zero, btmreg >> 1, 1, MO_32);
+        btmreg++;
+    }
+    for (; btmreg + 1 <= topreg; btmreg += 2) {
+        write_neon_element64(zero, btmreg >> 1, 0, MO_64);
+    }
+    if (btmreg == topreg) {
+        write_neon_element64(zero, btmreg >> 1, 0, MO_32);
+        btmreg++;
+    }
+    assert(btmreg == topreg + 1);
+    /* TODO: when MVE is implemented, zero VPR here */
+    return true;
+}
+
+static bool trans_NOCP(DisasContext *s, arg_nocp *a)
+{
+    /*
+     * Handle M-profile early check for disabled coprocessor:
+     * all we need to do here is emit the NOCP exception if
+     * the coprocessor is disabled. Otherwise we return false
+     * and the real VFP/etc decode will handle the insn.
+     */
+    assert(arm_dc_feature(s, ARM_FEATURE_M));
+
+    if (a->cp == 11) {
+        a->cp = 10;
+    }
+    if (arm_dc_feature(s, ARM_FEATURE_V8_1M) &&
+        (a->cp == 8 || a->cp == 9 || a->cp == 14 || a->cp == 15)) {
+        /* in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
+        a->cp = 10;
+    }
+
+    if (a->cp != 10) {
+        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+                           syn_uncategorized(), default_exception_el(s));
+        return true;
+    }
+
+    if (s->fp_excp_el != 0) {
+        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
+                           syn_uncategorized(), s->fp_excp_el);
+        return true;
+    }
+
+    return false;
+}
+
+static bool trans_NOCP_8_1(DisasContext *s, arg_nocp *a)
+{
+    /* This range needs a coprocessor check for v8.1M and later only */
+    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
+        return false;
+    }
+    return trans_NOCP(s, a);
+}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 #define ARM_CP_RW_BIT   (1 << 20)
 
 /* Include the VFP and Neon decoders */
-#include "decode-m-nocp.c.inc"
 #include "translate-vfp.c.inc"
 #include "translate-neon.c.inc"
 
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@ static bool trans_VCVT_dp_int(DisasContext *s, arg_VCVT_dp_int *a)
     return true;
 }
 
-/*
- * Decode VLLDM and VLSTM are nonstandard because:
- *  * if there is no FPU then these insns must NOP in
- *    Secure state and UNDEF in Nonsecure state
- *  * if there is an FPU then these insns do not have
- *    the usual behaviour that vfp_access_check() provides of
- *    being controlled by CPACR/NSACR enable bits or the
- *    lazy-stacking logic.
- */
-static bool trans_VLLDM_VLSTM(DisasContext *s, arg_VLLDM_VLSTM *a)
-{
-    TCGv_i32 fptr;
-
-    if (!arm_dc_feature(s, ARM_FEATURE_M) ||
-        !arm_dc_feature(s, ARM_FEATURE_V8)) {
-        return false;
-    }
-
-    if (a->op) {
-        /*
-         * T2 encoding ({D0-D31} reglist): v8.1M and up. We choose not
-         * to take the IMPDEF option to make memory accesses to the stack
-         * slots that correspond to the D16-D31 registers (discarding
-         * read data and writing UNKNOWN values), so for us the T2
-         * encoding behaves identically to the T1 encoding.
-         */
-        if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-            return false;
-        }
-    } else {
-        /*
-         * T1 encoding ({D0-D15} reglist); undef if we have 32 Dregs.
-         * This is currently architecturally impossible, but we add the
-         * check to stay in line with the pseudocode. Note that we must
-         * emit code for the UNDEF so it takes precedence over the NOCP.
-         */
-        if (dc_isar_feature(aa32_simd_r32, s)) {
-            unallocated_encoding(s);
-            return true;
-        }
-    }
-
-    /*
-     * If not secure, UNDEF. We must emit code for this
-     * rather than returning false so that this takes
-     * precedence over the m-nocp.decode NOCP fallback.
-     */
-    if (!s->v8m_secure) {
-        unallocated_encoding(s);
-        return true;
-    }
-    /* If no fpu, NOP. */
-    if (!dc_isar_feature(aa32_vfp, s)) {
-        return true;
-    }
-
-    fptr = load_reg(s, a->rn);
-    if (a->l) {
-        gen_helper_v7m_vlldm(cpu_env, fptr);
-    } else {
-        gen_helper_v7m_vlstm(cpu_env, fptr);
-    }
-    tcg_temp_free_i32(fptr);
-
-    /* End the TB, because we have updated FP control bits */
-    s->base.is_jmp = DISAS_UPDATE_EXIT;
-    return true;
-}
-
-static bool trans_VSCCLRM(DisasContext *s, arg_VSCCLRM *a)
-{
-    int btmreg, topreg;
-    TCGv_i64 zero;
-    TCGv_i32 aspen, sfpa;
-
-    if (!dc_isar_feature(aa32_m_sec_state, s)) {
-        /* Before v8.1M, fall through in decode to NOCP check */
-        return false;
-    }
-
-    /* Explicitly UNDEF because this takes precedence over NOCP */
-    if (!arm_dc_feature(s, ARM_FEATURE_M_MAIN) || !s->v8m_secure) {
-        unallocated_encoding(s);
-        return true;
-    }
-
-    if (!dc_isar_feature(aa32_vfp_simd, s)) {
-        /* NOP if we have neither FP nor MVE */
-        return true;
-    }
-
-    /*
-     * If FPCCR.ASPEN != 0 && CONTROL_S.SFPA == 0 then there is no
-     * active floating point context so we must NOP (without doing
-     * any lazy state preservation or the NOCP check).
-     */
-    aspen = load_cpu_field(v7m.fpccr[M_REG_S]);
-    sfpa = load_cpu_field(v7m.control[M_REG_S]);
-    tcg_gen_andi_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
-    tcg_gen_xori_i32(aspen, aspen, R_V7M_FPCCR_ASPEN_MASK);
-    tcg_gen_andi_i32(sfpa, sfpa, R_V7M_CONTROL_SFPA_MASK);
-    tcg_gen_or_i32(sfpa, sfpa, aspen);
-    arm_gen_condlabel(s);
-    tcg_gen_brcondi_i32(TCG_COND_EQ, sfpa, 0, s->condlabel);
-
-    if (s->fp_excp_el != 0) {
-        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
-                           syn_uncategorized(), s->fp_excp_el);
-        return true;
-    }
-
-    topreg = a->vd + a->imm - 1;
-    btmreg = a->vd;
-
-    /* Convert to Sreg numbers if the insn specified in Dregs */
-    if (a->size == 3) {
-        topreg = topreg * 2 + 1;
-        btmreg *= 2;
-    }
-
-    if (topreg > 63 || (topreg > 31 && !(topreg & 1))) {
-        /* UNPREDICTABLE: we choose to undef */
-        unallocated_encoding(s);
-        return true;
-    }
-
-    /* Silently ignore requests to clear D16-D31 if they don't exist */
-    if (topreg > 31 && !dc_isar_feature(aa32_simd_r32, s)) {
-        topreg = 31;
-    }
-
-    if (!vfp_access_check(s)) {
-        return true;
-    }
-
-    /* Zero the Sregs from btmreg to topreg inclusive. */
-    zero = tcg_const_i64(0);
-    if (btmreg & 1) {
-        write_neon_element64(zero, btmreg >> 1, 1, MO_32);
-        btmreg++;
-    }
-    for (; btmreg + 1 <= topreg; btmreg += 2) {
-        write_neon_element64(zero, btmreg >> 1, 0, MO_64);
-    }
-    if (btmreg == topreg) {
-        write_neon_element64(zero, btmreg >> 1, 0, MO_32);
-        btmreg++;
-    }
-    assert(btmreg == topreg + 1);
-    /* TODO: when MVE is implemented, zero VPR here */
-    return true;
-}
-
-static bool trans_NOCP(DisasContext *s, arg_nocp *a)
-{
-    /*
-     * Handle M-profile early check for disabled coprocessor:
-     * all we need to do here is emit the NOCP exception if
-     * the coprocessor is disabled. Otherwise we return false
-     * and the real VFP/etc decode will handle the insn.
-     */
-    assert(arm_dc_feature(s, ARM_FEATURE_M));
-
-    if (a->cp == 11) {
-        a->cp = 10;
-    }
-    if (arm_dc_feature(s, ARM_FEATURE_V8_1M) &&
-        (a->cp == 8 || a->cp == 9 || a->cp == 14 || a->cp == 15)) {
-        /* in v8.1M cp 8, 9, 14, 15 also are governed by the cp10 enable */
-        a->cp = 10;
-    }
-
-    if (a->cp != 10) {
-        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
-                           syn_uncategorized(), default_exception_el(s));
-        return true;
-    }
-
-    if (s->fp_excp_el != 0) {
-        gen_exception_insn(s, s->pc_curr, EXCP_NOCP,
-                           syn_uncategorized(), s->fp_excp_el);
-        return true;
-    }
-
-    return false;
-}
-
-static bool trans_NOCP_8_1(DisasContext *s, arg_nocp *a)
-{
-    /* This range needs a coprocessor check for v8.1M and later only */
-    if (!arm_dc_feature(s, ARM_FEATURE_V8_1M)) {
-        return false;
-    }
-    return trans_NOCP(s, a);
-}
-
 static bool trans_VINS(DisasContext *s, arg_VINS *a)
 {
     TCGv_i32 rd, rm;
diff --git a/target/arm/meson.build b/target/arm/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -XXX,XX +XXX,XX @@ gen = [
   decodetree.process('neon-ls.decode', extra_args: '--static-decode=disas_neon_ls'),
   decodetree.process('vfp.decode', extra_args: '--static-decode=disas_vfp'),
   decodetree.process('vfp-uncond.decode', extra_args: '--static-decode=disas_vfp_uncond'),
-  decodetree.process('m-nocp.decode', extra_args: '--static-decode=disas_m_nocp'),
+  decodetree.process('m-nocp.decode', extra_args: '--decode=disas_m_nocp'),
   decodetree.process('a32.decode', extra_args: '--static-decode=disas_a32'),
   decodetree.process('a32-uncond.decode', extra_args: '--static-decode=disas_a32_uncond'),
   decodetree.process('t32.decode', extra_args: '--static-decode=disas_t32'),
@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
   'op_helper.c',
   'tlb_helper.c',
   'translate.c',
+  'translate-m-nocp.c',
   'vec_helper.c',
   'vfp_helper.c',
   'cpu_tcg.c',
-- 
2.20.1

Move the various gen_aa32* functions and macros out of translate.c
and into translate-a32.h.
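
For readers unfamiliar with the DO_GEN_LD/DO_GEN_ST idiom being moved
here, the following is a minimal standalone sketch of the same
token-pasting pattern. It is not QEMU code: DemoOp, demo_ld() and
DEMO_GEN_LD are hypothetical stand-ins, and the "load" just assembles
little-endian bytes from a buffer rather than emitting TCG ops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a sized memory operation (not QEMU's MemOp). */
typedef enum { OP_U8 = 1, OP_U16 = 2, OP_U32 = 4 } DemoOp;

/* Generic load: assemble 'op' little-endian bytes into a 32-bit value. */
static inline uint32_t demo_ld(const uint8_t *buf, DemoOp op)
{
    uint32_t val = 0;

    assert(op == OP_U8 || op == OP_U16 || op == OP_U32);
    for (int i = 0; i < (int)op; i++) {
        val |= (uint32_t)buf[i] << (8 * i);
    }
    return val;
}

/*
 * Generate one fixed-size wrapper per suffix, in the same way that
 * DO_GEN_LD generates gen_aa32_ld8u(), gen_aa32_ld16u() and so on.
 */
#define DEMO_GEN_LD(SUFF, OP)                                   \
    static inline uint32_t demo_ld##SUFF(const uint8_t *buf)    \
    {                                                           \
        return demo_ld(buf, OP);                                \
    }

DEMO_GEN_LD(8u, OP_U8)
DEMO_GEN_LD(16u, OP_U16)
DEMO_GEN_LD(32u, OP_U32)

#undef DEMO_GEN_LD
```

Because the generated wrappers are static inline, defining them in a
header lets every file that includes it use them without creating
duplicate external symbols.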

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@ static inline TCGv_i32 load_reg(DisasContext *s, int reg)
     return tmp;
 }
 
+void gen_aa32_ld_internal_i32(DisasContext *s, TCGv_i32 val,
+                              TCGv_i32 a32, int index, MemOp opc);
+void gen_aa32_st_internal_i32(DisasContext *s, TCGv_i32 val,
+                              TCGv_i32 a32, int index, MemOp opc);
+void gen_aa32_ld_internal_i64(DisasContext *s, TCGv_i64 val,
+                              TCGv_i32 a32, int index, MemOp opc);
+void gen_aa32_st_internal_i64(DisasContext *s, TCGv_i64 val,
+                              TCGv_i32 a32, int index, MemOp opc);
+void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
+                     int index, MemOp opc);
+void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
+                     int index, MemOp opc);
+void gen_aa32_ld_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
+                     int index, MemOp opc);
+void gen_aa32_st_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
+                     int index, MemOp opc);
+
+#define DO_GEN_LD(SUFF, OPC)                                            \
+    static inline void gen_aa32_ld##SUFF(DisasContext *s, TCGv_i32 val, \
+                                         TCGv_i32 a32, int index)       \
+    {                                                                   \
+        gen_aa32_ld_i32(s, val, a32, index, OPC);                       \
+    }
+
+#define DO_GEN_ST(SUFF, OPC)                                            \
+    static inline void gen_aa32_st##SUFF(DisasContext *s, TCGv_i32 val, \
+                                         TCGv_i32 a32, int index)       \
+    {                                                                   \
+        gen_aa32_st_i32(s, val, a32, index, OPC);                       \
+    }
+
+static inline void gen_aa32_ld64(DisasContext *s, TCGv_i64 val,
+                                 TCGv_i32 a32, int index)
+{
+    gen_aa32_ld_i64(s, val, a32, index, MO_Q);
+}
+
+static inline void gen_aa32_st64(DisasContext *s, TCGv_i64 val,
+                                 TCGv_i32 a32, int index)
+{
+    gen_aa32_st_i64(s, val, a32, index, MO_Q);
+}
+
+DO_GEN_LD(8u, MO_UB)
+DO_GEN_LD(16u, MO_UW)
+DO_GEN_LD(32u, MO_UL)
+DO_GEN_ST(8, MO_UB)
+DO_GEN_ST(16, MO_UW)
+DO_GEN_ST(32, MO_UL)
+
+#undef DO_GEN_LD
+#undef DO_GEN_ST
+
 #endif
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv gen_aa32_addr(DisasContext *s, TCGv_i32 a32, MemOp op)
  * Internal routines are used for NEON cases where the endianness
  * and/or alignment has already been taken into account and manipulated.
  */
-static void gen_aa32_ld_internal_i32(DisasContext *s, TCGv_i32 val,
-                                     TCGv_i32 a32, int index, MemOp opc)
+void gen_aa32_ld_internal_i32(DisasContext *s, TCGv_i32 val,
+                              TCGv_i32 a32, int index, MemOp opc)
 {
     TCGv addr = gen_aa32_addr(s, a32, opc);
     tcg_gen_qemu_ld_i32(val, addr, index, opc);
     tcg_temp_free(addr);
 }
 
-static void gen_aa32_st_internal_i32(DisasContext *s, TCGv_i32 val,
-                                     TCGv_i32 a32, int index, MemOp opc)
+void gen_aa32_st_internal_i32(DisasContext *s, TCGv_i32 val,
+                              TCGv_i32 a32, int index, MemOp opc)
 {
     TCGv addr = gen_aa32_addr(s, a32, opc);
     tcg_gen_qemu_st_i32(val, addr, index, opc);
     tcg_temp_free(addr);
 }
 
-static void gen_aa32_ld_internal_i64(DisasContext *s, TCGv_i64 val,
-                                     TCGv_i32 a32, int index, MemOp opc)
+void gen_aa32_ld_internal_i64(DisasContext *s, TCGv_i64 val,
+                              TCGv_i32 a32, int index, MemOp opc)
 {
     TCGv addr = gen_aa32_addr(s, a32, opc);
 
@@ -XXX,XX +XXX,XX @@ static void gen_aa32_ld_internal_i64(DisasContext *s, TCGv_i64 val,
     tcg_temp_free(addr);
 }
 
-static void gen_aa32_st_internal_i64(DisasContext *s, TCGv_i64 val,
-                                     TCGv_i32 a32, int index, MemOp opc)
+void gen_aa32_st_internal_i64(DisasContext *s, TCGv_i64 val,
+                              TCGv_i32 a32, int index, MemOp opc)
 {
     TCGv addr = gen_aa32_addr(s, a32, opc);
 
@@ -XXX,XX +XXX,XX @@ static void gen_aa32_st_internal_i64(DisasContext *s, TCGv_i64 val,
     tcg_temp_free(addr);
 }
 
-static void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
-                            int index, MemOp opc)
+void gen_aa32_ld_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
+                     int index, MemOp opc)
 {
     gen_aa32_ld_internal_i32(s, val, a32, index, finalize_memop(s, opc));
 }
 
-static void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
-                            int index, MemOp opc)
+void gen_aa32_st_i32(DisasContext *s, TCGv_i32 val, TCGv_i32 a32,
+                     int index, MemOp opc)
 {
     gen_aa32_st_internal_i32(s, val, a32, index, finalize_memop(s, opc));
 }
 
-static void gen_aa32_ld_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
-                            int index, MemOp opc)
+void gen_aa32_ld_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
+                     int index, MemOp opc)
 {
     gen_aa32_ld_internal_i64(s, val, a32, index, finalize_memop(s, opc));
 }
 
-static void gen_aa32_st_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
-                            int index, MemOp opc)
+void gen_aa32_st_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
+                     int index, MemOp opc)
 {
     gen_aa32_st_internal_i64(s, val, a32, index, finalize_memop(s, opc));
 }
@@ -XXX,XX +XXX,XX @@ static void gen_aa32_st_i64(DisasContext *s, TCGv_i64 val, TCGv_i32 a32,
         gen_aa32_st_i32(s, val, a32, index, OPC);                       \
     }
 
-static inline void gen_aa32_ld64(DisasContext *s, TCGv_i64 val,
-                                 TCGv_i32 a32, int index)
-{
-    gen_aa32_ld_i64(s, val, a32, index, MO_Q);
-}
-
-static inline void gen_aa32_st64(DisasContext *s, TCGv_i64 val,
-                                 TCGv_i32 a32, int index)
-{
-    gen_aa32_st_i64(s, val, a32, index, MO_Q);
-}
-
-DO_GEN_LD(8u, MO_UB)
-DO_GEN_LD(16u, MO_UW)
-DO_GEN_LD(32u, MO_UL)
-DO_GEN_ST(8, MO_UB)
-DO_GEN_ST(16, MO_UW)
-DO_GEN_ST(32, MO_UL)
-
 static inline void gen_hvc(DisasContext *s, int imm16)
 {
     /* The pre HVC helper handles cases when HVC gets trapped
-- 
2.20.1

The functions vfp_load_reg32(), vfp_load_reg64(), vfp_store_reg32()
and vfp_store_reg64() are used only in translate-vfp.c.inc. Move
them to that file.

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static long vfp_reg_offset(bool dp, unsigned reg)
     }
 }
 
-static inline void vfp_load_reg64(TCGv_i64 var, int reg)
-{
-    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
-}
-
-static inline void vfp_store_reg64(TCGv_i64 var, int reg)
-{
-    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
-}
-
-static inline void vfp_load_reg32(TCGv_i32 var, int reg)
-{
-    tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
-}
-
-static inline void vfp_store_reg32(TCGv_i32 var, int reg)
-{
-    tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
-}
-
 void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop)
 {
     long off = neon_element_offset(reg, ele, memop);
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c.inc
@@ -XXX,XX +XXX,XX @@
 #include "decode-vfp.c.inc"
 #include "decode-vfp-uncond.c.inc"
 
+static inline void vfp_load_reg64(TCGv_i64 var, int reg)
+{
+    tcg_gen_ld_i64(var, cpu_env, vfp_reg_offset(true, reg));
+}
+
+static inline void vfp_store_reg64(TCGv_i64 var, int reg)
+{
+    tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(true, reg));
+}
+
+static inline void vfp_load_reg32(TCGv_i32 var, int reg)
+{
+    tcg_gen_ld_i32(var, cpu_env, vfp_reg_offset(false, reg));
+}
+
+static inline void vfp_store_reg32(TCGv_i32 var, int reg)
+{
+    tcg_gen_st_i32(var, cpu_env, vfp_reg_offset(false, reg));
+}
+
 /*
  * The imm8 encodes the sign bit, enough bits to represent an exponent in
  * the range 01....1xx to 10....0xx, and the most significant 4 bits of
-- 
2.20.1

Make the remaining functions which are needed by translate-vfp.c.inc
global.

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@ void read_neon_element32(TCGv_i32 dest, int reg, int ele, MemOp memop);
 void read_neon_element64(TCGv_i64 dest, int reg, int ele, MemOp memop);
 void write_neon_element32(TCGv_i32 src, int reg, int ele, MemOp memop);
 void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop);
+TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs);
+void gen_set_cpsr(TCGv_i32 var, uint32_t mask);
+void gen_set_condexec(DisasContext *s);
+void gen_set_pc_im(DisasContext *s, target_ulong val);
+void gen_lookup_tb(DisasContext *s);
+long vfp_reg_offset(bool dp, unsigned reg);
+long neon_full_reg_offset(unsigned reg);
 
 static inline TCGv_i32 load_cpu_offset(int offset)
 {
@@ -XXX,XX +XXX,XX @@ static inline TCGv_i32 load_reg(DisasContext *s, int reg)
     return tmp;
 }
 
+void store_reg(DisasContext *s, int reg, TCGv_i32 var);
+
 void gen_aa32_ld_internal_i32(DisasContext *s, TCGv_i32 val,
                               TCGv_i32 a32, int index, MemOp opc);
 void gen_aa32_st_internal_i32(DisasContext *s, TCGv_i32 val,
@@ -XXX,XX +XXX,XX @@ DO_GEN_ST(32, MO_UL)
 #undef DO_GEN_LD
 #undef DO_GEN_ST
 
+#if defined(CONFIG_USER_ONLY)
+#define IS_USER(s) 1
+#else
+#define IS_USER(s) (s->user)
+#endif
+
+/* Set NZCV flags from the high 4 bits of var.  */
+#define gen_set_nzcv(var) gen_set_cpsr(var, CPSR_NZCV)
+
 #endif
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@
 #include "translate.h"
 #include "translate-a32.h"
 
-#if defined(CONFIG_USER_ONLY)
-#define IS_USER(s) 1
-#else
-#define IS_USER(s) (s->user)
-#endif
-
 /* These are TCG temporaries used only by the legacy iwMMXt decoder */
 static TCGv_i64 cpu_V0, cpu_V1, cpu_M0;
 /* These are TCG globals which alias CPUARMState fields */
@@ -XXX,XX +XXX,XX @@ void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
  * This is used for load/store for which use of PC implies (literal),
  * or ADD that implies ADR.
  */
-static TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs)
+TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
 
@@ -XXX,XX +XXX,XX @@ static TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs)
 
 /* Set a CPU register.  The source must be a temporary and will be
    marked as dead.  */
-static void store_reg(DisasContext *s, int reg, TCGv_i32 var)
+void store_reg(DisasContext *s, int reg, TCGv_i32 var)
 {
     if (reg == 15) {
         /* In Thumb mode, we must ignore bit 0.
@@ -XXX,XX +XXX,XX @@ static void store_sp_checked(DisasContext *s, TCGv_i32 var)
 #define gen_sxtb16(var) gen_helper_sxtb16(var, var)
 #define gen_uxtb16(var) gen_helper_uxtb16(var, var)
 
-
-static inline void gen_set_cpsr(TCGv_i32 var, uint32_t mask)
+void gen_set_cpsr(TCGv_i32 var, uint32_t mask)
 {
     TCGv_i32 tmp_mask = tcg_const_i32(mask);
     gen_helper_cpsr_write(cpu_env, var, tmp_mask);
     tcg_temp_free_i32(tmp_mask);
 }
-/* Set NZCV flags from the high 4 bits of var.  */
-#define gen_set_nzcv(var) gen_set_cpsr(var, CPSR_NZCV)
 
 static void gen_exception_internal(int excp)
 {
@@ -XXX,XX +XXX,XX @@ void arm_gen_test_cc(int cc, TCGLabel *label)
     arm_free_cc(&cmp);
 }
 
-static inline void gen_set_condexec(DisasContext *s)
+void gen_set_condexec(DisasContext *s)
 {
     if (s->condexec_mask) {
         uint32_t val = (s->condexec_cond << 4) | (s->condexec_mask >> 1);
@@ -XXX,XX +XXX,XX @@ static inline void gen_set_condexec(DisasContext *s)
     }
 }
 
-static inline void gen_set_pc_im(DisasContext *s, target_ulong val)
+void gen_set_pc_im(DisasContext *s, target_ulong val)
 {
     tcg_gen_movi_i32(cpu_R[15], val);
 }
@@ -XXX,XX +XXX,XX @@ static void gen_exception_el(DisasContext *s, int excp, uint32_t syn,
 }
 
 /* Force a TB lookup after an instruction that changes the CPU state.  */
-static inline void gen_lookup_tb(DisasContext *s)
+void gen_lookup_tb(DisasContext *s)
 {
     tcg_gen_movi_i32(cpu_R[15], s->base.pc_next);
     s->base.is_jmp = DISAS_EXIT;
@@ -XXX,XX +XXX,XX @@ static inline void gen_hlt(DisasContext *s, int imm)
 /*
  * Return the offset of a "full" NEON Dreg.
  */
-static long neon_full_reg_offset(unsigned reg)
+long neon_full_reg_offset(unsigned reg)
 {
     return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
 }
@@ -XXX,XX +XXX,XX @@ static long neon_element_offset(int reg, int element, MemOp memop)
 }
 
 /* Return the offset of a VFP Dreg (dp = true) or VFP Sreg (dp = false). */
-static long vfp_reg_offset(bool dp, unsigned reg)
+long vfp_reg_offset(bool dp, unsigned reg)
 {
     if (dp) {
         return neon_element_offset(reg, 0, MO_64);
-- 
2.20.1

Switch translate-vfp.c.inc from being #included into translate.c
to being its own compilation unit.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-9-peter.maydell@linaro.org
---
 target/arm/translate-a32.h                          |  2 ++
 target/arm/{translate-vfp.c.inc => translate-vfp.c} | 12 +++++++-----
 target/arm/translate.c                              |  3 +--
 target/arm/meson.build                              |  5 +++--
 4 files changed, 13 insertions(+), 9 deletions(-)
 rename target/arm/{translate-vfp.c.inc => translate-vfp.c} (99%)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@
 
 /* Prototypes for autogenerated disassembler functions */
 bool disas_m_nocp(DisasContext *dc, uint32_t insn);
+bool disas_vfp(DisasContext *s, uint32_t insn);
+bool disas_vfp_uncond(DisasContext *s, uint32_t insn);
 
 void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
 void arm_gen_condlabel(DisasContext *s);
diff --git a/target/arm/translate-vfp.c.inc b/target/arm/translate-vfp.c
similarity index 99%
rename from target/arm/translate-vfp.c.inc
rename to target/arm/translate-vfp.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-vfp.c.inc
+++ b/target/arm/translate-vfp.c
@@ -XXX,XX +XXX,XX @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
 
-/*
- * This file is intended to be included from translate.c; it uses
- * some macros and definitions provided by that file.
- * It might be possible to convert it to a standalone .c file eventually.
- */
+#include "qemu/osdep.h"
+#include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
+#include "exec/exec-all.h"
+#include "exec/gen-icount.h"
+#include "translate.h"
+#include "translate-a32.h"
 
 /* Include the generated VFP decoder */
 #include "decode-vfp.c.inc"
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
 
 #define ARM_CP_RW_BIT   (1 << 20)
 
-/* Include the VFP and Neon decoders */
-#include "translate-vfp.c.inc"
+/* Include the Neon decoder */
 #include "translate-neon.c.inc"
 
 static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
diff --git a/target/arm/meson.build b/target/arm/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -XXX,XX +XXX,XX @@ gen = [
   decodetree.process('neon-shared.decode', extra_args: '--static-decode=disas_neon_shared'),
   decodetree.process('neon-dp.decode', extra_args: '--static-decode=disas_neon_dp'),
   decodetree.process('neon-ls.decode', extra_args: '--static-decode=disas_neon_ls'),
-  decodetree.process('vfp.decode', extra_args: '--static-decode=disas_vfp'),
-  decodetree.process('vfp-uncond.decode', extra_args: '--static-decode=disas_vfp_uncond'),
+  decodetree.process('vfp.decode', extra_args: '--decode=disas_vfp'),
+  decodetree.process('vfp-uncond.decode', extra_args: '--decode=disas_vfp_uncond'),
   decodetree.process('m-nocp.decode', extra_args: '--decode=disas_m_nocp'),
   decodetree.process('a32.decode', extra_args: '--static-decode=disas_a32'),
   decodetree.process('a32-uncond.decode', extra_args: '--static-decode=disas_a32_uncond'),
@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
   'tlb_helper.c',
   'translate.c',
   'translate-m-nocp.c',
+  'translate-vfp.c',
   'vec_helper.c',
   'vfp_helper.c',
   'cpu_tcg.c',
-- 
2.20.1

The function vfp_reg_ptr() is used only in translate-neon.c.inc;
move it there.

diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
     }
 }
 
-static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
-{
-    TCGv_ptr ret = tcg_temp_new_ptr();
-    tcg_gen_addi_ptr(ret, cpu_env, vfp_reg_offset(dp, reg));
-    return ret;
-}
-
 #define ARM_CP_RW_BIT   (1 << 20)
 
 /* Include the Neon decoder */
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c.inc
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c.inc
@@ -XXX,XX +XXX,XX @@ static inline int neon_3same_fp_size(DisasContext *s, int x)
 #include "decode-neon-ls.c.inc"
 #include "decode-neon-shared.c.inc"
 
+static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
+{
+    TCGv_ptr ret = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(ret, cpu_env, vfp_reg_offset(dp, reg));
+    return ret;
+}
+
 static void neon_load_element(TCGv_i32 var, int reg, int ele, MemOp mop)
 {
     long offset = neon_element_offset(reg, ele, mop & MO_SIZE);
-- 
2.20.1

Move the NeonGenThreeOpEnvFn typedef to translate.h together
with the other similar typedefs.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210430132740.10391-12-peter.maydell@linaro.org
---
 target/arm/translate.h | 2 ++
 target/arm/translate.c | 3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -XXX,XX +XXX,XX @@ typedef void NeonGenOneOpFn(TCGv_i32, TCGv_i32);
 typedef void NeonGenOneOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32);
 typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
 typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
+typedef void NeonGenThreeOpEnvFn(TCGv_i32, TCGv_env, TCGv_i32,
+                                 TCGv_i32, TCGv_i32);
 typedef void NeonGenTwo64OpFn(TCGv_i64, TCGv_i64, TCGv_i64);
 typedef void NeonGenTwo64OpEnvFn(TCGv_i64, TCGv_ptr, TCGv_i64, TCGv_i64);
 typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static const char * const regnames[] =
     { "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
       "r8", "r9", "r10", "r11", "r12", "r13", "r14", "pc" };
 
-/* Function prototypes for gen_ functions calling Neon helpers.  */
-typedef void NeonGenThreeOpEnvFn(TCGv_i32, TCGv_env, TCGv_i32,
-                                 TCGv_i32, TCGv_i32);
 
 /* initialize TCG globals.  */
 void arm_translate_init(void)
-- 
2.20.1

Make the remaining functions needed by the translate-neon code
global.

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@ void gen_set_pc_im(DisasContext *s, target_ulong val);
 void gen_lookup_tb(DisasContext *s);
 long vfp_reg_offset(bool dp, unsigned reg);
 long neon_full_reg_offset(unsigned reg);
+long neon_element_offset(int reg, int element, MemOp memop);
+void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
 
 static inline TCGv_i32 load_cpu_offset(int offset)
 {
@@ -XXX,XX +XXX,XX @@ DO_GEN_ST(32, MO_UL)
 /* Set NZCV flags from the high 4 bits of var.  */
 #define gen_set_nzcv(var) gen_set_cpsr(var, CPSR_NZCV)
 
+/* Swap low and high halfwords.  */
+static inline void gen_swap_half(TCGv_i32 dest, TCGv_i32 var)
+{
+    tcg_gen_rotri_i32(dest, var, 16);
+}
+
 #endif
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ static void gen_smul_dual(TCGv_i32 a, TCGv_i32 b)
 }
 
 /* Byteswap each halfword.  */
-static void gen_rev16(TCGv_i32 dest, TCGv_i32 var)
+void gen_rev16(TCGv_i32 dest, TCGv_i32 var)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
     TCGv_i32 mask = tcg_const_i32(0x00ff00ff);
@@ -XXX,XX +XXX,XX @@ static void gen_revsh(TCGv_i32 dest, TCGv_i32 var)
     tcg_gen_ext16s_i32(dest, var);
 }
 
-/* Swap low and high halfwords.  */
-static void gen_swap_half(TCGv_i32 dest, TCGv_i32 var)
-{
-    tcg_gen_rotri_i32(dest, var, 16);
-}
-
 /* Dual 16-bit add.  Result placed in t0 and t1 is marked as dead.
     tmp = (t0 ^ t1) & 0x8000;
     t0 &= ~0x8000;
@@ -XXX,XX +XXX,XX @@ long neon_full_reg_offset(unsigned reg)
  * Return the offset of a 2**SIZE piece of a NEON register, at index ELE,
  * where 0 is the least significant end of the register.
  */
-static long neon_element_offset(int reg, int element, MemOp memop)
+long neon_element_offset(int reg, int element, MemOp memop)
 {
     int element_size = 1 << (memop & MO_SIZE);
     int ofs = element * element_size;
-- 
2.20.1

Switch translate-neon.c.inc from being #included into translate.c
to being its own compilation unit.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-14-peter.maydell@linaro.org
---
 target/arm/translate-a32.h                           |  3 +++
 .../arm/{translate-neon.c.inc => translate-neon.c}   | 12 +++++++-----
 target/arm/translate.c                               |  3 ---
 target/arm/meson.build                               |  7 ++++---
 4 files changed, 14 insertions(+), 11 deletions(-)
 rename target/arm/{translate-neon.c.inc => translate-neon.c} (99%)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -XXX,XX +XXX,XX @@
 bool disas_m_nocp(DisasContext *dc, uint32_t insn);
 bool disas_vfp(DisasContext *s, uint32_t insn);
 bool disas_vfp_uncond(DisasContext *s, uint32_t insn);
+bool disas_neon_dp(DisasContext *s, uint32_t insn);
+bool disas_neon_ls(DisasContext *s, uint32_t insn);
+bool disas_neon_shared(DisasContext *s, uint32_t insn);
 
 void load_reg_var(DisasContext *s, TCGv_i32 var, int reg);
 void arm_gen_condlabel(DisasContext *s);
diff --git a/target/arm/translate-neon.c.inc b/target/arm/translate-neon.c
similarity index 99%
rename from target/arm/translate-neon.c.inc
rename to target/arm/translate-neon.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate-neon.c.inc
+++ b/target/arm/translate-neon.c
@@ -XXX,XX +XXX,XX @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.
  */
 
-/*
- * This file is intended to be included from translate.c; it uses
- * some macros and definitions provided by that file.
- * It might be possible to convert it to a standalone .c file eventually.
- */
+#include "qemu/osdep.h"
+#include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
+#include "exec/exec-all.h"
+#include "exec/gen-icount.h"
+#include "translate.h"
+#include "translate-a32.h"
 
 static inline int plus1(DisasContext *s, int x)
 {
diff --git a/target/arm/translate.c b/target/arm/translate.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -XXX,XX +XXX,XX @@ void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop)
 
 #define ARM_CP_RW_BIT   (1 << 20)
 
-/* Include the Neon decoder */
-#include "translate-neon.c.inc"
-
 static inline void iwmmxt_load_reg(TCGv_i64 var, int reg)
 {
     tcg_gen_ld_i64(var, cpu_env, offsetof(CPUARMState, iwmmxt.regs[reg]));
diff --git a/target/arm/meson.build b/target/arm/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -XXX,XX +XXX,XX @@
 gen = [
   decodetree.process('sve.decode', extra_args: '--decode=disas_sve'),
-  decodetree.process('neon-shared.decode', extra_args: '--static-decode=disas_neon_shared'),
-  decodetree.process('neon-dp.decode', extra_args: '--static-decode=disas_neon_dp'),
-  decodetree.process('neon-ls.decode', extra_args: '--static-decode=disas_neon_ls'),
+  decodetree.process('neon-shared.decode', extra_args: '--decode=disas_neon_shared'),
+  decodetree.process('neon-dp.decode', extra_args: '--decode=disas_neon_dp'),
+  decodetree.process('neon-ls.decode', extra_args: '--decode=disas_neon_ls'),
   decodetree.process('vfp.decode', extra_args: '--decode=disas_vfp'),
   decodetree.process('vfp-uncond.decode', extra_args: '--decode=disas_vfp_uncond'),
   decodetree.process('m-nocp.decode', extra_args: '--decode=disas_m_nocp'),
@@ -XXX,XX +XXX,XX @@ arm_ss.add(files(
   'tlb_helper.c',
   'translate.c',
   'translate-m-nocp.c',
+  'translate-neon.c',
   'translate-vfp.c',
   'vec_helper.c',
   'vfp_helper.c',
-- 
2.20.1

The WFI insn is not system-mode only, though it doesn't usually make
a huge amount of sense for userspace code to execute it.  Currently
if you try it in qemu-arm then the helper function will raise an
EXCP_HLT exception, which is not covered by the switch in cpu_loop()
and results in an abort:

qemu: unhandled CPU exception 0x10001 - aborting
R00=00000001 R01=408003e4 R02=408003ec R03=000102ec
R04=00010a28 R05=00010158 R06=00087460 R07=00010158
R08=00000000 R09=00000000 R10=00085b7c R11=408002a4
R12=408002b8 R13=408002a0 R14=0001057c R15=000102f8
PSR=60000010 -ZC- A usr32
qemu:handle_cpu_signal received signal outside vCPU context @ pc=0x7fcbfa4f0a12

Make the WFI helper function return immediately in the usermode
emulator. This turns WFI into a NOP, which is OK because:
 * architecturally "WFI is a NOP" is a permitted implementation
 * aarch64 Linux kernels use the SCTLR_EL1.nTWI bit to trap
   userspace WFI and NOP it (though aarch32 kernels currently
   just let WFI do whatever it would do)

We could in theory make the translate.c code special case user-mode
emulation and NOP the insn entirely rather than making the helper
do nothing, but because no real-world code will be trying to
execute WFI we don't care about efficiency and the helper provides
a single place where we can make the change rather than having
to touch multiple places in translate.c and translate-a64.c.
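
As an illustration only, the approach can be sketched outside QEMU as
below. SKETCH_USER_ONLY, struct sketch_cpu and sketch_wfi() are
hypothetical stand-ins for CONFIG_USER_ONLY, the CPU state and the real
helper; this is not the code in op_helper.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_USER_ONLY; defined here so that the
 * sketch takes the user-mode path. */
#define SKETCH_USER_ONLY 1

/* Hypothetical, minimal stand-in for the CPU state the helper touches. */
struct sketch_cpu {
    bool halted;
};

/* Sketch of a WFI helper: a NOP in user-mode, a halt otherwise. */
static void sketch_wfi(struct sketch_cpu *cpu)
{
#ifdef SKETCH_USER_ONLY
    /* User-mode: "WFI is a NOP" is permitted, so do nothing at all. */
    (void)cpu;
    return;
#else
    /* System-mode: mark the CPU halted until an interrupt arrives. */
    cpu->halted = true;
#endif
}
```

Keeping the conditional inside the helper means the callers do not need
to know which emulation mode they were built for.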

Fixes: https://bugs.launchpad.net/qemu/+bug/1926759
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430162212.825-1-peter.maydell@linaro.org
---
 target/arm/op_helper.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index XXXXXXX..XXXXXXX 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -XXX,XX +XXX,XX @@ static inline int check_wfx_trap(CPUARMState *env, bool is_wfe)
 
 void HELPER(wfi)(CPUARMState *env, uint32_t insn_len)
 {
+#ifdef CONFIG_USER_ONLY
+    /*
+     * WFI in the user-mode emulator is technically permitted but not
+     * something any real-world code would do. AArch64 Linux kernels
+     * trap it via SCTLR_EL1.nTWI and make it an (expensive) NOP;
+     * AArch32 kernels don't trap it so it will delay a bit.
+     * For QEMU, make it NOP here, because trying to raise EXCP_HLT
+     * would trigger an abort.
+     */
+    return;
+#else
     CPUState *cs = env_cpu(env);
     int target_el = check_wfx_trap(env, false);
 
@@ -XXX,XX +XXX,XX @@ void HELPER(wfi)(CPUARMState *env, uint32_t insn_len)
     cs->exception_index = EXCP_HLT;
     cs->halted = 1;
     cpu_loop_exit(cs);
+#endif
 }
 
 void HELPER(wfe)(CPUARMState *env)
-- 
2.20.1

The omap_mmc_reset() function resets its SD card via
device_legacy_reset().  We know that the SD card does not have a qbus
of its own, so the new device_cold_reset() function (which resets
both the device and its child buses) is equivalent here to
device_legacy_reset() and we can just switch to the new API.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210430222348.8514-1-peter.maydell@linaro.org
---
 hw/sd/omap_mmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/sd/omap_mmc.c b/hw/sd/omap_mmc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/sd/omap_mmc.c
+++ b/hw/sd/omap_mmc.c
@@ -XXX,XX +XXX,XX @@ void omap_mmc_reset(struct omap_mmc_s *host)
      * into any bus, and we must reset it manually. When omap_mmc is
      * QOMified this must move into the QOM reset function.
      */
-    device_legacy_reset(DEVICE(host->card));
+    device_cold_reset(DEVICE(host->card));
 }
 
 static uint64_t omap_mmc_read(void *opaque, hwaddr offset,
-- 
2.20.1

Both os-win32.h and os-posix.h include system header files. Instead
of having osdep.h include them inside its 'extern "C"' block, make
these headers handle that themselves, so that we don't include the
system headers inside 'extern "C"'.

This doesn't fix any current problems, but it's conceptually the
right way to handle system headers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/osdep.h      | 8 ++++----
 include/sysemu/os-posix.h | 8 ++++++++
 include/sysemu/os-win32.h | 8 ++++++++
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -XXX,XX +XXX,XX @@ QEMU_EXTERN_C int daemon(int, int);
  */
 #include "glib-compat.h"
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef _WIN32
 #include "sysemu/os-win32.h"
 #endif
@@ -XXX,XX +XXX,XX @@ extern "C" {
 #include "sysemu/os-posix.h"
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #include "qemu/typedefs.h"
 
 /*
diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -XXX,XX +XXX,XX @@
 #include <sys/sysmacros.h>
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 void os_set_line_buffering(void);
 void os_set_proc_name(const char *s);
 void os_setup_signal_handling(void);
@@ -XXX,XX +XXX,XX @@ static inline void qemu_funlockfile(FILE *f)
     funlockfile(f);
 }
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif
diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/os-win32.h
+++ b/include/sysemu/os-win32.h
@@ -XXX,XX +XXX,XX @@
 #include <windows.h>
 #include <ws2tcpip.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #if defined(_WIN64)
 /* On w64, setjmp is implemented by _setjmp which needs a second parameter.
  * If this parameter is NULL, longjump does no stack unwinding.
@@ -XXX,XX +XXX,XX @@ ssize_t qemu_recv_wrap(int sockfd, void *buf, size_t len, int flags);
 ssize_t qemu_recvfrom_wrap(int sockfd, void *buf, size_t len, int flags,
                            struct sockaddr *addr, socklen_t *addrlen);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif
-- 
2.20.1

Make bswap.h handle being included outside an 'extern "C"' block:
all system headers are included first, then all declarations are
put inside an 'extern "C"' block.

This requires a little rearrangement as currently we have an ifdef
ladder that has some system includes and some local declarations
or definitions, and we need to separate those out.

We want to do this because dis-asm.h includes bswap.h, dis-asm.h
may need to be included from C++ files, and system headers should
not be included within 'extern "C"' blocks.
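
The resulting header layout can be sketched roughly as follows. The
header and the sketch_bswap16() name are hypothetical and only show the
ordering (system includes first, then one 'extern "C"' block around the
local definitions); the real bswap.h has more cases.

```c
/* Hypothetical header layout, for illustration only. */
#include <stdint.h>   /* system headers come first, outside extern "C" */

#ifdef __cplusplus
extern "C" {
#endif

/* Local declarations and definitions live inside the block. */
static inline uint16_t sketch_bswap16(uint16_t x)
{
    /* Swap the two bytes of a 16-bit value. */
    return (uint16_t)(((x & 0x00ff) << 8) | ((x & 0xff00) >> 8));
}

#ifdef __cplusplus
}
#endif
```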

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/bswap.h | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -XXX,XX +XXX,XX @@
 #ifndef BSWAP_H
 #define BSWAP_H
 
-#include "fpu/softfloat-types.h"
-
 #ifdef CONFIG_MACHINE_BSWAP_H
 # include <sys/endian.h>
 # include <machine/bswap.h>
@@ -XXX,XX +XXX,XX @@
 # include <endian.h>
 #elif defined(CONFIG_BYTESWAP_H)
 # include <byteswap.h>
+#define BSWAP_FROM_BYTESWAP
+# else
+#define BSWAP_FROM_FALLBACKS
+#endif /* ! CONFIG_MACHINE_BSWAP_H */
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "fpu/softfloat-types.h"
+
+#ifdef BSWAP_FROM_BYTESWAP
 static inline uint16_t bswap16(uint16_t x)
 {
     return bswap_16(x);
@@ -XXX,XX +XXX,XX @@ static inline uint64_t bswap64(uint64_t x)
 {
     return bswap_64(x);
 }
-# else
+#endif
+
+#ifdef BSWAP_FROM_FALLBACKS
 static inline uint16_t bswap16(uint16_t x)
 {
     return (((x & 0x00ff) << 8) |
@@ -XXX,XX +XXX,XX @@ static inline uint64_t bswap64(uint64_t x)
             ((x & 0x00ff000000000000ULL) >> 40) |
             ((x & 0xff00000000000000ULL) >> 56));
 }
-#endif /* ! CONFIG_MACHINE_BSWAP_H */
+#endif
+
+#undef BSWAP_FROM_BYTESWAP
+#undef BSWAP_FROM_FALLBACKS
 
 static inline void bswap16s(uint16_t *s)
 {
@@ -XXX,XX +XXX,XX @@ DO_STN_LDN_P(be)
 #undef le_bswaps
 #undef be_bswaps
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* BSWAP_H */
-- 
2.20.1

Make dis-asm.h handle being included outside an 'extern "C"' block;
this allows us to remove the 'extern "C"' blocks that our two C++
files that include it are using.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/disas/dis-asm.h | 12 ++++++++++--
 disas/arm-a64.cc        |  2 --
 disas/nanomips.cpp      |  2 --
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
index XXXXXXX..XXXXXXX 100644
--- a/include/disas/dis-asm.h
+++ b/include/disas/dis-asm.h
@@ -XXX,XX +XXX,XX @@
 #ifndef DISAS_DIS_ASM_H
 #define DISAS_DIS_ASM_H
 
+#include "qemu/bswap.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 typedef void *PTR;
 typedef uint64_t bfd_vma;
 typedef int64_t bfd_signed_vma;
@@ -XXX,XX +XXX,XX @@ bool cap_disas_plugin(disassemble_info *info, uint64_t pc, size_t size);
 
 /* from libbfd */
 
-#include "qemu/bswap.h"
-
 static inline bfd_vma bfd_getl64(const bfd_byte *addr)
 {
     return ldq_le_p(addr);
@@ -XXX,XX +XXX,XX @@ static inline bfd_vma bfd_getb16(const bfd_byte *addr)
 
 typedef bool bfd_boolean;
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* DISAS_DIS_ASM_H */
diff --git a/disas/arm-a64.cc b/disas/arm-a64.cc
index XXXXXXX..XXXXXXX 100644
--- a/disas/arm-a64.cc
+++ b/disas/arm-a64.cc
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
-extern "C" {
 #include "disas/dis-asm.h"
-}
 
 #include "vixl/a64/disasm-a64.h"
 
diff --git a/disas/nanomips.cpp b/disas/nanomips.cpp
index XXXXXXX..XXXXXXX 100644
--- a/disas/nanomips.cpp
+++ b/disas/nanomips.cpp
@@ -XXX,XX +XXX,XX @@
  */
 
 #include "qemu/osdep.h"
-extern "C" {
 #include "disas/dis-asm.h"
-}
 
 #include <cstring>
 #include <stdexcept>
-- 
2.20.1

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

The i.MX25 PDK board has 2 banks of SDRAM, each of which can
address up to 256 MiB, so the total RAM usable for this
board is 512 MiB. When we ask for more we get a misleading
error message:

$ qemu-system-arm -M imx25-pdk -m 513M
  qemu-system-arm: Invalid RAM size, should be 128 MiB

Update the error message to better match the reality:

$ qemu-system-arm -M imx25-pdk -m 513M
  qemu-system-arm: RAM size more than 512 MiB is not supported
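
The limit being checked can be sketched as below, assuming two banks of
256 MiB each as described above. The SKETCH_* macros and
sketch_ram_size_ok() are hypothetical names, not the FSL_IMX25_*
definitions used by the board code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: two SDRAM banks of 256 MiB each. */
#define SKETCH_MIB          (1024ULL * 1024ULL)
#define SKETCH_SDRAM0_SIZE  (256 * SKETCH_MIB)
#define SKETCH_SDRAM1_SIZE  (256 * SKETCH_MIB)

/* Return true if the requested RAM fits in the two banks combined. */
static bool sketch_ram_size_ok(uint64_t ram_size)
{
    return ram_size <= SKETCH_SDRAM0_SIZE + SKETCH_SDRAM1_SIZE;
}
```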

Fixes: bf350daae02 ("arm/imx25_pdk: drop RAM size fixup")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20210407225608.1882855-1-f4bug@amsat.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/imx25_pdk.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/arm/imx25_pdk.c b/hw/arm/imx25_pdk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/imx25_pdk.c
+++ b/hw/arm/imx25_pdk.c
@@ -XXX,XX +XXX,XX @@ static struct arm_boot_info imx25_pdk_binfo;
 
 static void imx25_pdk_init(MachineState *machine)
 {
-    MachineClass *mc = MACHINE_GET_CLASS(machine);
     IMX25PDK *s = g_new0(IMX25PDK, 1);
     unsigned int ram_size;
     unsigned int alias_offset;
@@ -XXX,XX +XXX,XX @@ static void imx25_pdk_init(MachineState *machine)
 
     /* We need to initialize our memory */
     if (machine->ram_size > (FSL_IMX25_SDRAM0_SIZE + FSL_IMX25_SDRAM1_SIZE)) {
-        char *sz = size_to_str(mc->default_ram_size);
-        error_report("Invalid RAM size, should be %s", sz);
+        char *sz = size_to_str(FSL_IMX25_SDRAM0_SIZE + FSL_IMX25_SDRAM1_SIZE);
+        error_report("RAM size more than %s is not supported", sz);
         g_free(sz);
         exit(EXIT_FAILURE);
     }
-- 
2.20.1

The MPS2 SCC device doesn't have any documentation of its properties;
add a "QEMU interface" format comment describing them.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210504120912.23094-2-peter.maydell@linaro.org
---
 include/hw/misc/mps2-scc.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/mps2-scc.h
+++ b/include/hw/misc/mps2-scc.h
@@ -XXX,XX +XXX,XX @@
  *  (at your option) any later version.
  */
 
+/*
+ * This is a model of the Serial Communication Controller (SCC)
+ * block found in most MPS FPGA images.
+ *
+ * QEMU interface:
+ *  + sysbus MMIO region 0: the register bank
+ *  + QOM property "scc-cfg4": value of the read-only CFG4 register
+ *  + QOM property "scc-aid": value of the read-only SCC_AID register
+ *  + QOM property "scc-id": value of the read-only SCC_ID register
+ *  + QOM property array "oscclk": reset values of the OSCCLK registers
+ *    (which are accessed via the SYS_CFG channel provided by this device)
+ */
 #ifndef MPS2_SCC_H
 #define MPS2_SCC_H
 
-- 
2.20.1

On some boards, SCC config register CFG0 bit 0 controls whether
parts of the board memory map are remapped. Support this with:
 * a device property scc-cfg0 so the board can specify the
   initial value of the CFG0 register
 * an outbound GPIO line which tracks bit 0 and which the board
   can wire up to provide the remapping
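
A standalone sketch of that behaviour, with a plain callback standing in
for the GPIO line. All sketch_* names are hypothetical; the real device
uses qemu_irq and the qdev GPIO API rather than a function pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an outbound line: a callback plus context. */
typedef void (*sketch_line_fn)(void *opaque, int level);

struct sketch_scc {
    uint32_t cfg0;
    uint32_t cfg0_reset;      /* board-supplied reset value of CFG0 */
    sketch_line_fn remap;     /* NULL if the board leaves it unconnected */
    void *remap_opaque;
};

/* Reset restores CFG0 to the value the board configured. */
static void sketch_scc_reset(struct sketch_scc *s)
{
    s->cfg0 = s->cfg0_reset;
}

/* A guest write to CFG0 always reflects bit 0 on the line. */
static void sketch_scc_write_cfg0(struct sketch_scc *s, uint32_t value)
{
    s->cfg0 = value;
    if (s->remap != NULL) {
        s->remap(s->remap_opaque, s->cfg0 & 1);
    }
}

/* Example board-side handler: records the last level it was given. */
static void sketch_record_level(void *opaque, int level)
{
    *(int *)opaque = level;
}
```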

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210504120912.23094-3-peter.maydell@linaro.org
---
 include/hw/misc/mps2-scc.h |  9 +++++++++
 hw/misc/mps2-scc.c         | 13 ++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/include/hw/misc/mps2-scc.h b/include/hw/misc/mps2-scc.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/misc/mps2-scc.h
+++ b/include/hw/misc/mps2-scc.h
@@ -XXX,XX +XXX,XX @@
  *  + QOM property "scc-cfg4": value of the read-only CFG4 register
  *  + QOM property "scc-aid": value of the read-only SCC_AID register
  *  + QOM property "scc-id": value of the read-only SCC_ID register
+ *  + QOM property "scc-cfg0": reset value of the CFG0 register
  *  + QOM property array "oscclk": reset values of the OSCCLK registers
  *    (which are accessed via the SYS_CFG channel provided by this device)
+ *  + named GPIO output "remap": this tracks the value of CFG0 register
+ *    bit 0. Boards where this bit controls memory remapping should
+ *    connect this GPIO line to a function performing that mapping.
+ *    Boards where bit 0 has no special function should leave the GPIO
+ *    output disconnected.
  */
 #ifndef MPS2_SCC_H
 #define MPS2_SCC_H
@@ -XXX,XX +XXX,XX @@ struct MPS2SCC {
     uint32_t num_oscclk;
     uint32_t *oscclk;
     uint32_t *oscclk_reset;
+    uint32_t cfg0_reset;
+
+    qemu_irq remap;
 };
 
 #endif
diff --git a/hw/misc/mps2-scc.c b/hw/misc/mps2-scc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/misc/mps2-scc.c
+++ b/hw/misc/mps2-scc.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/bitops.h"
 #include "trace.h"
 #include "hw/sysbus.h"
+#include "hw/irq.h"
 #include "migration/vmstate.h"
 #include "hw/registerfields.h"
 #include "hw/misc/mps2-scc.h"
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_write(void *opaque, hwaddr offset, uint64_t value,
     switch (offset) {
     case A_CFG0:
         /*
-         * TODO on some boards bit 0 controls RAM remapping;
-         * on others bit 1 is CPU_WAIT.
+         * On some boards bit 0 controls board-specific remapping;
+         * we always reflect bit 0 in the 'remap' GPIO output line,
+         * and let the board wire it up or not as it chooses.
+         * TODO on some boards bit 1 is CPU_WAIT.
          */
         s->cfg0 = value;
+        qemu_set_irq(s->remap, s->cfg0 & 1);
         break;
     case A_CFG1:
         s->cfg1 = value;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_reset(DeviceState *dev)
     int i;
 
     trace_mps2_scc_reset();
-    s->cfg0 = 0;
+    s->cfg0 = s->cfg0_reset;
     s->cfg1 = 0;
     s->cfg2 = 0;
     s->cfg5 = 0;
@@ -XXX,XX +XXX,XX @@ static void mps2_scc_init(Object *obj)
 
     memory_region_init_io(&s->iomem, obj, &mps2_scc_ops, s, "mps2-scc", 0x1000);
     sysbus_init_mmio(sbd, &s->iomem);
+    qdev_init_gpio_out_named(DEVICE(obj), &s->remap, "remap", 1);
 }
 
 static void mps2_scc_realize(DeviceState *dev, Error **errp)
@@ -XXX,XX +XXX,XX @@ static Property mps2_scc_properties[] = {
     DEFINE_PROP_UINT32("scc-cfg4", MPS2SCC, cfg4, 0),
     DEFINE_PROP_UINT32("scc-aid", MPS2SCC, aid, 0),
     DEFINE_PROP_UINT32("scc-id", MPS2SCC, id, 0),
+    /* Reset value for CFG0 register */
+    DEFINE_PROP_UINT32("scc-cfg0", MPS2SCC, cfg0_reset, 0),
     /*
      * These are the initial settings for the source clocks on the board.
      * In hardware they can be configured via a config file read by the
-- 
2.20.1

The AN524 FPGA image supports two memory maps, which differ in where
the QSPI and BRAM are.  In the default map, the BRAM is at
0x0000_0000, and the QSPI at 0x2800_0000.  In the second map, they
are the other way around.

In hardware, the initial mapping can be selected by the user by
writing either "REMAP: BRAM" (the default) or "REMAP: QSPI" in the
board configuration file.  The board config file is acted on by the
"Motherboard Configuration Controller", which is an entirely separate
microcontroller on the dev board but outside the FPGA.

The guest can also dynamically change the mapping via the SCC
CFG_REG0 register.

Implement this functionality for QEMU, using a machine property
"remap" with valid values "BRAM" and "QSPI" to allow the user to set
the initial mapping, in the same way they can on the FPGA, and
wiring up the bit from the SCC register to also switch the mapping.
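
The address selection behind the swap can be sketched as below. The
sketch_* names are hypothetical, and the sketch assumes that MPC 0
fronts the BRAM and MPC 1 fronts the QSPI; it only computes addresses
and does not touch any memory regions.

```c
#include <stdint.h>

/* Base addresses of the two blocks in the default AN524 map. */
#define SKETCH_BRAM_BASE 0x00000000u
#define SKETCH_QSPI_BASE 0x28000000u

/*
 * Return where the upstream end of MPC 'mpc_index' (0 or 1) should be
 * placed for mapping 'map' (0 = default, 1 = QSPI at address 0).
 * XOR-ing the index with the map bit swaps the two blocks.
 */
static uint32_t sketch_upstream_addr(int mpc_index, int map)
{
    return (mpc_index ^ map) ? SKETCH_QSPI_BASE : SKETCH_BRAM_BASE;
}
```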

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210504120912.23094-4-peter.maydell@linaro.org
---
 docs/system/arm/mps2.rst |  10 ++++
 hw/arm/mps2-tz.c         | 108 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/docs/system/arm/mps2.rst b/docs/system/arm/mps2.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/system/arm/mps2.rst
+++ b/docs/system/arm/mps2.rst
@@ -XXX,XX +XXX,XX @@ Differences between QEMU and real hardware:
   flash, but only as simple ROM, so attempting to rewrite the flash
   from the guest will fail
 - QEMU does not model the USB controller in MPS3 boards
+
+Machine-specific options
+""""""""""""""""""""""""
+
+The following machine-specific options are supported:
+
+remap
+  Supported for ``mps3-an524`` only.
+  Set ``BRAM``/``QSPI`` to select the initial memory mapping. The
+  default is ``BRAM``.
diff --git a/hw/arm/mps2-tz.c b/hw/arm/mps2-tz.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/mps2-tz.c
+++ b/hw/arm/mps2-tz.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/boards.h"
 #include "exec/address-spaces.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/reset.h"
 #include "hw/misc/unimp.h"
 #include "hw/char/cmsdk-apb-uart.h"
 #include "hw/timer/cmsdk-apb-timer.h"
@@ -XXX,XX +XXX,XX @@
 #include "hw/core/split-irq.h"
 #include "hw/qdev-clock.h"
 #include "qom/object.h"
+#include "hw/irq.h"
 
 #define MPS2TZ_NUMIRQ_MAX 96
 #define MPS2TZ_RAM_MAX 5
@@ -XXX,XX +XXX,XX @@ struct MPS2TZMachineState {
     SplitIRQ cpu_irq_splitter[MPS2TZ_NUMIRQ_MAX];
     Clock *sysclk;
     Clock *s32kclk;
+
+    bool remap;
+    qemu_irq remap_irq;
 };
 
 #define TYPE_MPS2TZ_MACHINE "mps2tz"
@@ -XXX,XX +XXX,XX @@ static const RAMInfo an505_raminfo[] = { {
     },
 };
 
+/*
+ * Note that the addresses and MPC numbering here should match up
+ * with those used in remap_memory(), which can swap the BRAM and QSPI.
+ */
 static const RAMInfo an524_raminfo[] = { {
         .name = "bram",
         .base = 0x00000000,
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_scc(MPS2TZMachineState *mms, void *opaque,
 
     object_initialize_child(OBJECT(mms), "scc", scc, TYPE_MPS2_SCC);
     sccdev = DEVICE(scc);
+    qdev_prop_set_uint32(sccdev, "scc-cfg0", mms->remap ? 1 : 0);
     qdev_prop_set_uint32(sccdev, "scc-cfg4", 0x2);
     qdev_prop_set_uint32(sccdev, "scc-aid", 0x00200008);
     qdev_prop_set_uint32(sccdev, "scc-id", mmc->scc_id);
@@ -XXX,XX +XXX,XX @@ static MemoryRegion *make_mpc(MPS2TZMachineState *mms, void *opaque,
     return sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 0);
 }
 
+static hwaddr boot_mem_base(MPS2TZMachineState *mms)
+{
+    /*
+     * Return the canonical address of the block which will be mapped
+     * at address 0x0 (i.e. where the vector table is).
+     * This is usually 0, but if the AN524 alternate memory map is
+     * enabled it will be the base address of the QSPI block.
+     */
+    return mms->remap ? 0x28000000 : 0;
+}
+
+static void remap_memory(MPS2TZMachineState *mms, int map)
+{
+    /*
+     * Remap the memory for the AN524. 'map' is the value of
+     * SCC CFG_REG0 bit 0, i.e. 0 for the default map and 1
+     * for the "option 1" mapping where QSPI is at address 0.
+     *
+     * Effectively we need to swap around the "upstream" ends of
+     * MPC 0 and MPC 1.
+     */
+    MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
+    int i;
+
+    if (mmc->fpga_type != FPGA_AN524) {
+        return;
+    }
+
+    memory_region_transaction_begin();
+    for (i = 0; i < 2; i++) {
+        TZMPC *mpc = &mms->mpc[i];
+        MemoryRegion *upstream = sysbus_mmio_get_region(SYS_BUS_DEVICE(mpc), 1);
+        hwaddr addr = (i ^ map) ? 0x28000000 : 0;
+
+        memory_region_set_address(upstream, addr);
+    }
+    memory_region_transaction_commit();
+}
+
+static void remap_irq_fn(void *opaque, int n, int level)
+{
+    MPS2TZMachineState *mms = opaque;
+
+    remap_memory(mms, level);
+}
+
 static MemoryRegion *make_dma(MPS2TZMachineState *mms, void *opaque,
                               const char *name, hwaddr size,
                               const int *irqs)
@@ -XXX,XX +XXX,XX @@ static uint32_t boot_ram_size(MPS2TZMachineState *mms)
     MPS2TZMachineClass *mmc = MPS2TZ_MACHINE_GET_CLASS(mms);
 
     for (p = mmc->raminfo; p->name; p++) {
-        if (p->base == 0) {
+        if (p->base == boot_mem_base(mms)) {
             return p->size;
         }
     }
@@ -XXX,XX +XXX,XX @@ static void mps2tz_common_init(MachineState *machine)
 
     create_non_mpc_ram(mms);
 
+    if (mmc->fpga_type == FPGA_AN524) {
+        /*
+         * Connect the line from the SCC so that we can remap when the
+         * guest updates that register.
+         */
+        mms->remap_irq = qemu_allocate_irq(remap_irq_fn, mms, 0);
+        qdev_connect_gpio_out_named(DEVICE(&mms->scc), "remap", 0,
+                                    mms->remap_irq);
+    }
+
     armv7m_load_kernel(ARM_CPU(first_cpu), machine->kernel_filename,
                        boot_ram_size(mms));
 }
@@ -XXX,XX +XXX,XX @@ static void mps2_tz_idau_check(IDAUInterface *ii, uint32_t address,
     *iregion = region;
 }
 
+static char *mps2_get_remap(Object *obj, Error **errp)
+{
+    MPS2TZMachineState *mms = MPS2TZ_MACHINE(obj);
+    const char *val = mms->remap ? "QSPI" : "BRAM";
+    return g_strdup(val);
+}
+
+static void mps2_set_remap(Object *obj, const char *value, Error **errp)
+{
+    MPS2TZMachineState *mms = MPS2TZ_MACHINE(obj);
+
+    if (!strcmp(value, "BRAM")) {
+        mms->remap = false;
+    } else if (!strcmp(value, "QSPI")) {
+        mms->remap = true;
+    } else {
+        error_setg(errp, "Invalid remap value");
+        error_append_hint(errp, "Valid values are BRAM and QSPI.\n");
+    }
+}
+
+static void mps2_machine_reset(MachineState *machine)
+{
+    MPS2TZMachineState *mms = MPS2TZ_MACHINE(machine);
+
+    /*
+     * Set the initial memory mapping before triggering the reset of
+     * the rest of the system, so that the guest image loader and CPU
+     * reset see the correct mapping.
+     */
+    remap_memory(mms, mms->remap);
+    qemu_devices_reset();
+}
+
 static void mps2tz_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
     IDAUInterfaceClass *iic = IDAU_INTERFACE_CLASS(oc);
 
     mc->init = mps2tz_common_init;
+    mc->reset = mps2_machine_reset;
     iic->check = mps2_tz_idau_check;
 }
 
@@ -XXX,XX +XXX,XX @@ static void mps3tz_an524_class_init(ObjectClass *oc, void *data)
     mmc->raminfo = an524_raminfo;
     mmc->armsse_type = TYPE_SSE200;
     mps2tz_set_default_ram_info(mmc);
+
+    object_class_property_add_str(oc, "remap", mps2_get_remap, mps2_set_remap);
+    object_class_property_set_description(oc, "remap",
+                                          "Set memory mapping. Valid values "
+                                          "are BRAM (default) and QSPI.");
 }
 
 static void mps3tz_an547_class_init(ObjectClass *oc, void *data)
-- 
2.20.1

From: Guenter Roeck <linux@roeck-us.net>

Commit dfc388797cc4 ("hw/arm: xlnx: Set all boards' GEM 'phy-addr'
property value to 23") configured the PHY address for xilinx-zynq-a9
to 23. When trying to boot xilinx-zynq-a9 with zynq-zc702.dtb or
zynq-zc706.dtb, this results in the following error message when
trying to use the Ethernet interface:

macb e000b000.ethernet eth0: Could not attach PHY (-19)

The devicetree files for ZC702 and ZC706 configure PHY address 7. The
documentation for the ZC702 and ZC706 evaluation boards suggests that the
PHY address is 7, not 23. Other boards use PHY address 0, 1, 3, or 7.
I was unable to find any documentation or devicetree file suggesting
or using PHY address 23. The Ethernet interface starts working with
zynq-zc702.dtb and zynq-zc706.dtb when setting the PHY address to 7,
so let's use it.

Cc: Bin Meng <bin.meng@windriver.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 20210504124140.1100346-1-linux@roeck-us.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/xilinx_zynq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -XXX,XX +XXX,XX @@ static void gem_init(NICInfo *nd, uint32_t base, qemu_irq irq)
         qemu_check_nic_model(nd, TYPE_CADENCE_GEM);
         qdev_set_nic_properties(dev, nd);
     }
-    object_property_set_int(OBJECT(dev), "phy-addr", 23, &error_abort);
+    object_property_set_int(OBJECT(dev), "phy-addr", 7, &error_abort);
     s = SYS_BUS_DEVICE(dev);
     sysbus_realize_and_unref(s, &error_fatal);
     sysbus_mmio_map(s, 0, base);
-- 
2.20.1